Metadata-Version: 2.1
Name: data_prep_toolkit_transforms
Version: 0.2.1.dev3
Summary: Data Preparation Toolkit Transforms
Author-email: Maroun Touma <touma@us.ibm.com>
License: Apache-2.0
Keywords: transforms,data preprocessing,data preparation,llm,generative,ai,fine-tuning,llmapps
Requires-Python: <3.12,>=3.10
Description-Content-Type: text/markdown
Requires-Dist: data-prep-toolkit>=0.2.1.dev3
Requires-Dist: bs4==0.0.2
Requires-Dist: docling-core==1.2.0
Requires-Dist: docling==1.11.0
Requires-Dist: filetype<2.0.0,>=1.2.0
Requires-Dist: quackling==0.4.0
Requires-Dist: duckdb==0.10.1
Requires-Dist: fasttext==0.9.2
Requires-Dist: huggingface-hub<1.0.0,>=0.21.4
Requires-Dist: langcodes==3.3.0
Requires-Dist: mmh3==4.1.0
Requires-Dist: numpy==1.26.4
Requires-Dist: pandas
Requires-Dist: parameterized
Requires-Dist: sentence-transformers==3.0.1
Requires-Dist: transformers==4.38.2
Requires-Dist: xxhash==3.4.1
Requires-Dist: presidio-analyzer>=2.2.355
Requires-Dist: presidio-anonymizer>=2.2.355
Requires-Dist: flair>=0.14.0
Requires-Dist: pandas>=2.2.2
Requires-Dist: scancode-toolkit==32.1.0; platform_system != "Darwin"

# DPK Python Transforms

## Installation

The [transforms](https://github.com/IBM/data-prep-kit/blob/dev/transforms/README.md) are delivered as a standard Python library available on PyPI and can be installed with pip:

`python -m pip install data-prep-toolkit-transforms`

Installing the Python transforms also installs `data-prep-toolkit`.
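
To confirm that the install worked, a minimal sketch along the following lines may help. It uses only the standard library. The distribution names and the supported Python range (`>=3.10,<3.12`) are taken from this package's metadata; the helper names `python_version_supported` and `installed_version` are hypothetical and are not part of the toolkit.

```python
import sys
from importlib import metadata


def python_version_supported(version_info=None):
    """Return True if the interpreter falls in the range >=3.10,<3.12."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return (3, 10) <= major_minor < (3, 12)


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python version supported:", python_version_supported())
    # Both distributions should be present after a successful install.
    for name in ("data-prep-toolkit-transforms", "data-prep-toolkit"):
        print(name, "->", installed_version(name) or "not installed")
```

If either distribution is reported as not installed, rerunning the pip command above in the same environment is the first thing to try.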

## List of Transforms in the current package

* code
    * [code2parquet](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/code2parquet/python/README.md)
    * [header_cleanser (not available on macOS)](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/header_cleanser/python/README.md)
    * [code_quality](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/code_quality/python/README.md)
    * [proglang_select](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/proglang_select/python/README.md)
* language
    * [doc_chunk](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/doc_chunk/python/README.md)
    * [doc_quality](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/doc_quality/python/README.md)
    * [lang_id](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/lang_id/python/README.md)
    * [pdf2parquet](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/pdf2parquet/python/README.md)
    * [text_encoder](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/text_encoder/python/README.md)
    * [pii_redactor](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/pii_redactor/python/README.md)
* universal
    * [ededup](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/ededup/python/README.md)
    * [filter](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/filter/python/README.md)
    * [resize](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/resize/python/README.md)
    * [tokenization](https://github.com/IBM/data-prep-kit/blob/dev/transforms/tokenization/doc_chunk/python/README.md)
    * [doc_id](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/doc_id/python/README.md)
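
Because `header_cleanser` is listed as unavailable on macOS, and the package metadata installs `scancode-toolkit` only where `platform_system != "Darwin"`, code that depends on that transform may want to check the platform first. The sketch below mirrors that marker with the standard library. The function name `header_cleanser_supported` is hypothetical, and treating the marker as the only availability condition is an assumption.

```python
import platform


def header_cleanser_supported(system=None):
    """Return True unless running on macOS, where platform.system() is 'Darwin'."""
    if system is None:
        system = platform.system()
    # Assumption: availability follows the `platform_system != "Darwin"` marker.
    return system != "Darwin"


if __name__ == "__main__":
    print("header_cleanser expected to be available:", header_cleanser_supported())
```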
