Metadata-Version: 2.1
Name: data_prep_toolkit
Version: 0.2.2.dev2
Summary: Data Preparation Toolkit Library for Ray and Python
Author-email: Maroun Touma <touma@us.ibm.com>
License: Apache-2.0
Keywords: data,data preprocessing,data preparation,llm,generative,ai,fine-tuning,llmapps
Requires-Python: <3.13,>=3.10
Description-Content-Type: text/markdown
Requires-Dist: numpy<1.29.0
Requires-Dist: pyarrow==16.1.0
Requires-Dist: boto3==1.34.69
Requires-Dist: argparse
Requires-Dist: mmh3
Requires-Dist: psutil
Provides-Extra: dev
Requires-Dist: twine; extra == "dev"
Requires-Dist: pytest>=7.3.2; extra == "dev"
Requires-Dist: pytest-dotenv>=0.5.2; extra == "dev"
Requires-Dist: pytest-env>=1.0.0; extra == "dev"
Requires-Dist: pre-commit>=3.3.2; extra == "dev"
Requires-Dist: pytest-cov>=4.1.0; extra == "dev"
Requires-Dist: pytest-mock>=3.10.0; extra == "dev"
Requires-Dist: moto==5.0.5; extra == "dev"
Requires-Dist: markupsafe==2.0.1; extra == "dev"
Provides-Extra: ray
Requires-Dist: ray[default]==2.36.1; extra == "ray"
Requires-Dist: fastapi>=0.110.2; extra == "ray"
Requires-Dist: pillow>=10.3.0; extra == "ray"
Provides-Extra: spark
Requires-Dist: pyspark>=3.5.2; extra == "spark"
Requires-Dist: psutil>=6.0.0; extra == "spark"
Requires-Dist: PyYAML>=6.0.2; extra == "spark"

# Data Processing Library
This library provides a Python framework for developing _transforms_
on data stored in files (currently only parquet files are supported)
and running them in a [Ray](https://www.ray.io/) cluster.
Data files may be stored in the local file system or in COS/S3.
For more details see the [documentation](../doc/overview.md).

## Virtual Environment
The project uses `pyproject.toml` and a `Makefile` for operations.
For development, first create the virtual environment:
```shell
make venv
```
and then either activate it:
```shell
source venv/bin/activate
```
or set up your IDE to use the `venv` directory when developing in this project.

## Library Artifact Build and Publish
To test, build and publish the library:
```shell
make test build publish
```

To bump the version number, edit the `Makefile` to change `VERSION` and rerun
the above. This requires committing both the `Makefile` and the
automatically updated `pyproject.toml` file.
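
The version edit described above could be scripted roughly as follows. This is only a sketch: the `bump_version` helper is hypothetical and not part of the project, and it assumes the `Makefile` defines the version on a line that starts with `VERSION` followed by `=`, `:=` or `?=`. Check the actual `Makefile` before relying on it.

```shell
# Hypothetical helper, not part of this project.
# Assumption: the Makefile has a line of the form "VERSION=<value>"
# (optionally written with ":=" or "?=" and surrounding spaces).
bump_version() {
    makefile="$1"      # path to the Makefile to edit
    new_version="$2"   # version string to write, e.g. 0.2.3

    # Rewrite the VERSION assignment in place; the original file is
    # kept next to it with a .bak suffix.
    sed -i.bak -E "s/^VERSION[[:space:]]*[:?]?=.*/VERSION=${new_version}/" "$makefile"
}
```

A typical use might be `bump_version Makefile 0.2.3`, followed by `make test build publish` and a commit that includes both `Makefile` and `pyproject.toml`. Delete the `Makefile.bak` backup once the change looks right.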



