Metadata-Version: 2.1
Name: data_prep_toolkit
Version: 0.0.1.dev1
Summary: Data Preparation Toolkit Library
Author-email: David Wood <dawood@us.ibm.com>, Boris Lublinsky <blublinsky@ibm.com>
License: Apache-2.0
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: ray[default] ==2.9.3
Requires-Dist: pyarrow ==15.0.2
Requires-Dist: boto3 ==1.34.69
Requires-Dist: argparse
Requires-Dist: mmh3
Requires-Dist: fastapi >=0.109.1
Requires-Dist: pillow >=10.2.0
Provides-Extra: dev
Requires-Dist: twine ; extra == 'dev'
Requires-Dist: pytest >=7.3.2 ; extra == 'dev'
Requires-Dist: pytest-dotenv >=0.5.2 ; extra == 'dev'
Requires-Dist: pytest-env >=1.0.0 ; extra == 'dev'
Requires-Dist: pre-commit >=3.3.2 ; extra == 'dev'
Requires-Dist: pytest-cov >=4.1.0 ; extra == 'dev'
Requires-Dist: pytest-mock >=3.10.0 ; extra == 'dev'
Requires-Dist: moto ==5.0.5 ; extra == 'dev'
Requires-Dist: markupsafe ==2.0.1 ; extra == 'dev'

# Data Processing Library
This provides a Python framework for developing _transforms_
on data stored in files (currently Parquet files are supported)
and running them in a [Ray](https://ray.com) cluster.
Data files may be stored in the local file system or in COS/S3.
For more details see the [documentation](doc/overview.md).

## Virtual Environment
The project uses `pyproject.toml` and a Makefile for operations.
For development, first create the virtual environment:
```shell
make venv
```
and then either activate it
```shell
source venv/bin/activate
```
or set up your IDE to use the `venv` directory when developing in this project.

## Library Artifact Build and Publish
To test, build, and publish the library to Artifactory:
```shell
make test build publish
```
To bump the version number, edit the `Makefile` to change `VERSION` and rerun
the above command. This requires committing both the `Makefile` and the
automatically updated `pyproject.toml` file.


