Metadata-Version: 2.1
Name: one-data-processing
Version: 0.0.3
Summary: Data Processing is used for data processing through MinIO, databases, Web APIs, etc.
Home-page: https://github.com/kubeagi/arcadia
Keywords: PDF WORD WEB parsing preprocessing
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.11
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: pandas ==2.1.2
Requires-Dist: numpy ==1.26.1
Requires-Dist: sanic ==23.6.0
Requires-Dist: sanic-cors ==2.2.0
Requires-Dist: aiohttp ==3.8.6
Requires-Dist: ulid ==1.1
Requires-Dist: minio ==7.1.17
Requires-Dist: zhipuai ==1.0.7
Requires-Dist: langchain ==0.0.354
Requires-Dist: spacy ==3.5.4
Requires-Dist: pypdf ==3.17.1
Requires-Dist: emoji ==2.2.0
Requires-Dist: ftfy ==6.1.1
Requires-Dist: psycopg2-binary ==2.9.9
Requires-Dist: kubernetes ==25.3.0
Requires-Dist: duckdb ==0.9.2
Requires-Dist: DBUtils ==3.0.3
Requires-Dist: pyyaml ==6.0.1
Requires-Dist: opencc ==0.2
Requires-Dist: opencc-python-reimplemented ==0.1.7
Requires-Dist: selectolax ==0.3.17
Requires-Dist: openai ==1.3.7
Requires-Dist: python-docx ==1.1.0
Requires-Dist: bs4 ==0.0.1
Requires-Dist: playwright ==1.40.0
Requires-Dist: pillow ==10.2.0
Requires-Dist: html2text ==2020.1.16

# Main Features of the Current Version

Data Processing processes data accessed through MinIO, databases, Web APIs, and similar sources. The supported data types are:
- txt
- json
- doc
- html
- excel
- csv
- pdf
- markdown
- ppt

## Current Text Processing

Text data currently goes through four processing steps: cleaning abnormal data, filtering, de-duplication, and anonymization.
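The four steps can be illustrated with a minimal, standard-library-only sketch. This is not the package's own API: the function names (`clean`, `keep`, `anonymize`, `process`), the regular expressions, the `min_chars` threshold, and the `[EMAIL]` placeholder are all assumptions chosen for illustration.

```python
import hashlib
import re
import unicodedata

# Hypothetical patterns for illustration; not taken from one-data-processing.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CONTROL_RE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")
WHITESPACE_RE = re.compile(r"\s+")


def clean(text: str) -> str:
    """Normalize Unicode, drop control characters, collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)
    text = CONTROL_RE.sub("", text)
    return WHITESPACE_RE.sub(" ", text).strip()


def keep(text: str, min_chars: int = 10) -> bool:
    """Filter rule (assumed): keep only records with enough content."""
    return len(text) >= min_chars


def anonymize(text: str) -> str:
    """Mask e-mail addresses with a fixed placeholder."""
    return EMAIL_RE.sub("[EMAIL]", text)


def process(records: list[str]) -> list[str]:
    """Run clean -> filter -> de-duplicate -> anonymize, preserving order."""
    seen: set[str] = set()
    result: list[str] = []
    for raw in records:
        text = clean(raw)
        if not keep(text):
            continue
        # De-duplicate on a hash of the cleaned text (exact matches only).
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        result.append(anonymize(text))
    return result
```

De-duplication runs after cleaning so that records differing only in whitespace or control characters collapse to one entry. Anonymization runs last so that the hash is computed on the original content rather than on the masked text.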

## Install with pip

```bash
pip install one-data-processing
```
