Metadata-Version: 2.1
Name: data_prep_toolkit_transforms
Version: 0.2.1.dev1
Summary: Data Preparation Toolkit Transforms
Author-email: Maroun Touma <touma@us.ibm.com>
License: Apache-2.0
Keywords: transforms,data preprocessing,data preparation,llm,generative,ai,fine-tuning,llmapps
Requires-Python: <3.12,>=3.10
Description-Content-Type: text/markdown
Requires-Dist: data-prep-toolkit==0.2.1.dev0
Requires-Dist: argparse
Requires-Dist: boto3==1.34.69
Requires-Dist: bs4==0.0.2
Requires-Dist: clamd==1.0.2
Requires-Dist: docling[ocr]==1.1.2
Requires-Dist: duckdb==0.10.1
Requires-Dist: fasttext==0.9.2
Requires-Dist: filetype<2.0.0,>=1.2.0
Requires-Dist: huggingface-hub<1.0.0,>=0.21.4
Requires-Dist: langcodes==3.3.0
Requires-Dist: mmh3==4.1.0
Requires-Dist: numpy==1.26.4
Requires-Dist: pandas
Requires-Dist: parameterized
Requires-Dist: pyarrow==16.1.0
Requires-Dist: python-dateutil>=2.8.2
Requires-Dist: pytz>=2020.1
Requires-Dist: quackling==0.1.0
Requires-Dist: sentence-transformers==3.0.1
Requires-Dist: transformers==4.38.2
Requires-Dist: tzdata>=2022.7
Requires-Dist: xxhash==3.4.1
Requires-Dist: scancode-toolkit==32.1.0; platform_system != "Darwin"

# DPK Python Transforms

## Installation

The [transforms](https://github.com/IBM/data-prep-kit/blob/dev/transforms/README.md) are delivered as a standard Python library available on PyPI and can be installed with pip:

`python -m pip install data-prep-toolkit-transforms`

Installing the Python transforms also installs `data-prep-toolkit`.
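
To confirm that both distributions are present after installation, a small check along the following lines may help. This is a minimal sketch that uses only the standard library's `importlib.metadata`; the helper names `installed_version`, `check_dpk_install`, and `python_supported` are hypothetical and not part of the toolkit, and the supported Python range is assumed from this package's `Requires-Python` metadata (`>=3.10,<3.12`).

```python
import sys
from importlib import metadata
from typing import Dict, Optional, Tuple


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_dpk_install() -> Dict[str, Optional[str]]:
    """Look up the transforms package and the core toolkit it pulls in."""
    names = ("data-prep-toolkit-transforms", "data-prep-toolkit")
    return {name: installed_version(name) for name in names}


def python_supported(version: Tuple[int, int] = sys.version_info[:2]) -> bool:
    """Assumed range, taken from the package metadata: >=3.10 and <3.12."""
    return (3, 10) <= tuple(version) < (3, 12)


if __name__ == "__main__":
    print(f"Python version supported: {python_supported()}")
    for name, version in check_dpk_install().items():
        print(f"{name}: {version or 'not installed'}")
```

If either distribution is reported as not installed, rerunning the pip command above in the same interpreter's environment is a reasonable first step.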

## List of Transforms in the Current Package

* code
    * [code2parquet](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/code2parquet/python/README.md)
    * header_cleanser (not available on macOS)
    * code_quality
    * proglang_select
* language
    * doc_chunk
    * doc_quality
    * lang_id
    * pdf2parquet
    * text_encoder
* universal
    * ededup
    * filter
    * resize
    * tokenization
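
Because `header_cleanser` is not available on macOS, a pipeline that runs on mixed platforms may want to guard that step. The sketch below is one way to do so with the standard library; `header_cleanser_supported` is a hypothetical helper, not a toolkit function, and it assumes the restriction corresponds to `platform.system()` reporting `"Darwin"`.

```python
import platform
from typing import Optional


def header_cleanser_supported(system_name: Optional[str] = None) -> bool:
    """Return False on macOS, where header_cleanser is listed as unavailable.

    system_name can be passed explicitly for testing; by default the
    current platform is used. platform.system() returns "Darwin" on macOS.
    """
    name = platform.system() if system_name is None else system_name
    return name != "Darwin"


if __name__ == "__main__":
    if header_cleanser_supported():
        print("header_cleanser can be used on this platform")
    else:
        print("header_cleanser is not available on macOS; skipping")
```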




 
