Metadata-Version: 2.1
Name: data_prep_toolkit_transforms_ray
Version: 0.2.1.dev3
Summary: Data Preparation Toolkit Transforms using Ray
Author-email: Maroun Touma <touma@us.ibm.com>
License: Apache-2.0
Keywords: transforms,data preprocessing,data preparation,llm,generative,ai,fine-tuning,llmapps
Requires-Python: <3.12,>=3.10
Description-Content-Type: text/markdown
Requires-Dist: data-prep-toolkit-ray>=0.2.1.dev3
Requires-Dist: data-prep-toolkit-transforms>=0.2.1.dev3
Requires-Dist: parameterized
Requires-Dist: tqdm==4.66.3
Requires-Dist: mmh3==4.1.0
Requires-Dist: xxhash==3.4.1
Requires-Dist: scipy>=1.12.0
Requires-Dist: networkx==3.3
Requires-Dist: colorlog==6.8.2
Requires-Dist: func-timeout==4.3.5
Requires-Dist: pandas==2.2.2
Requires-Dist: emerge-viz==2.0.0
Requires-Dist: scancode-toolkit==32.1.0; platform_system != "Darwin"

# DPK Ray Transforms

## Installation

The [transforms](https://github.com/IBM/data-prep-kit/blob/dev/transforms/README.md) are delivered as a standard Python library available on PyPI and can be installed with pip:

`python -m pip install data-prep-toolkit-transforms-ray`

Installing the Ray transforms also installs `data-prep-toolkit-transforms` and `data-prep-toolkit-ray`.
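
To confirm that the install pulled in all three distributions, a minimal sketch along the following lines may help. It relies only on the standard-library `importlib.metadata` module; the `installed_versions` helper and the `PACKAGES` tuple are illustrative names, not part of the toolkit.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as given to pip (hypothetical constant for this sketch).
PACKAGES = (
    "data-prep-toolkit-transforms-ray",
    "data-prep-toolkit-transforms",
    "data-prep-toolkit-ray",
)


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

A `None` entry suggests that the corresponding distribution is missing from the active environment.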

## List of Ray Transforms available in the current package

* code
	* [code2parquet](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/code2parquet/ray/README.md)
	* [proglang_select](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/proglang_select/ray/README.md)
	* [header_cleanser (not available on macOS)](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/header_cleanser/ray/README.md)
	* [code_quality](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/code_quality/ray/README.md)
	* [repo_level_ordering](https://github.com/IBM/data-prep-kit/blob/dev/transforms/code/repo_level_ordering/ray/README.md)
* language
	* [doc_quality](https://github.com/IBM/data-prep-kit/blob/dev/transforms/language/doc_quality/ray/README.md)
	* [doc_chunk](https://github.com/IBM/data-prep-kit/blob/dev/transforms/language/doc_chunk/ray/README.md)
	* [lang_id](https://github.com/IBM/data-prep-kit/blob/dev/transforms/language/lang_id/ray/README.md)
	* [text_encoder](https://github.com/IBM/data-prep-kit/blob/dev/transforms/language/text_encoder/ray/README.md)
	* [pdf2parquet](https://github.com/IBM/data-prep-kit/blob/dev/transforms/language/pdf2parquet/ray/README.md)
	* [pii_redactor](https://github.com/IBM/data-prep-kit/blob/dev/transforms/language/pii_redactor/ray/README.md)
* universal
	* [fdedup](https://github.com/IBM/data-prep-kit/blob/dev/transforms/universal/fdedup/ray/README.md)
	* [tokenization](https://github.com/IBM/data-prep-kit/blob/dev/transforms/universal/tokenization/ray/README.md)
	* [ededup](https://github.com/IBM/data-prep-kit/blob/dev/transforms/universal/ededup/ray/README.md)
	* [profiler](https://github.com/IBM/data-prep-kit/blob/dev/transforms/universal/profiler/ray/README.md)
	* [doc_id](https://github.com/IBM/data-prep-kit/blob/dev/transforms/universal/doc_id/ray/README.md)
	* [filter](https://github.com/IBM/data-prep-kit/blob/dev/transforms/universal/filter/ray/README.md)
	* [resize](https://github.com/IBM/data-prep-kit/blob/dev/transforms/universal/resize/ray/README.md)
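
Because the list above marks `header_cleanser` as unavailable on macOS, a pipeline script may want to check the platform before selecting it. The sketch below is one way to do that with the standard-library `platform` module, which reports macOS as `"Darwin"`; the `header_cleanser_supported` function is a hypothetical helper, not an API of this package.

```python
import platform


def header_cleanser_supported(system=None):
    """Return False on macOS (reported as "Darwin"), True on other platforms.

    `system` can be passed explicitly for testing; by default the
    current platform is queried.
    """
    system = platform.system() if system is None else system
    return system != "Darwin"


if __name__ == "__main__":
    if header_cleanser_supported():
        print("header_cleanser can be used on this platform")
    else:
        print("header_cleanser is not available on macOS; skipping it")
```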
