Metadata-Version: 2.1
Name: code-mixed-text-toolkit
Version: 0.3.5
Summary: A library for processing code-mixed text. Still in development!
Home-page: https://code-mixed-text-toolkit.readthedocs.io/
Author: Reuben Devanesan
Author-email: reubendevanesan@gmail.com
License: MIT
Platform: UNKNOWN
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: requests
Requires-Dist: tqdm


[![forthebadge made-with-python](http://ForTheBadge.com/images/badges/made-with-python.svg)](https://www.python.org/)

<div align="center">
<a href="https://github.com/plugyawn"><img width="600" height="180" src="https://user-images.githubusercontent.com/76529011/185376373-787f65d5-b78b-4f11-a7fb-e9aa19dc3a04.png"></a>
</div>

-----------------------------------------
[![code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)![Compatibility](https://img.shields.io/badge/compatible%20with-python3.9.x-blue.svg)

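CMTT (Code-Mixed Text Toolkit) is a wrapper library for processing code-mixed text, i.e. text that switches between two or more languages. It currently provides helpers for loading and downloading datasets, tokenizing text, and searching a corpus for words. More documentation is coming soon!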

### Installation
```bash
pip install code-mixed-text-toolkit
```
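To confirm that the package installed correctly, you can ask the standard library for its version. The sketch below is not part of CMTT: `installed_version` is a hypothetical helper built on `importlib.metadata` (Python 3.8+). Note that the distribution name used with pip (`code-mixed-text-toolkit`) differs from the import name (`code_mixed_text_toolkit`).

```python
# Hypothetical helper (not part of CMTT): report the installed version of a
# distribution using only the standard library.
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "code-mixed-text-toolkit") -> Optional[str]:
    """Return the installed version of `dist_name`, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "code-mixed-text-toolkit is not installed")
```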

### Get started
The example below shows the main entry points in the `data` and `preprocessing` modules:

```python
import code_mixed_text_toolkit.data as cmtt_data
import code_mixed_text_toolkit.preprocessing as cmtt_pp

# Load a JSON file from a URL
result_json = cmtt_data.load("https://world.openfoodfacts.org/api/v0/product/5060292302201.json")

# Load a CSV file from a URL
result_csv = cmtt_data.load("https://gist.githubusercontent.com/rnirmal/e01acfdaf54a6f9b24e91ba4cae63518/raw/b589a5c5a851711e20c5eb28f9d54742d1fe2dc/datasets.csv")

# List all available datasets
cmtt_data.list_datasets(show_key="url")

# Download specific datasets by name
cmtt_data.download("openfoodfacts")
cmtt_data.download("rnirmal")

# Load a plain-text dataset and tokenize it into words
result_txt = cmtt_data.load("https://www.w3.org/TR/PNG/iso_8859-1.txt")
result_txt_tokenized = cmtt_pp.tokenizer.word_tokenize(result_txt)

# Search for a target word in the text corpus
cmtt_pp.search.search_word(result_txt, "with", tokenize=True, width=3)
```
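The tokenize-then-search flow above can be illustrated with a small, self-contained sketch. This is not CMTT's implementation: `simple_word_tokenize` and `keyword_in_context` are hypothetical helpers, the regex tokenizer is a deliberately simple stand-in, and we *assume* that `width` means the number of tokens shown on each side of a match, as in a keyword-in-context (concordance) view.

```python
import re
from typing import List, Tuple

# Hypothetical stand-in tokenizer: runs of word characters, or single
# punctuation marks. Whitespace is dropped.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def simple_word_tokenize(text: str) -> List[str]:
    """Split text into word tokens and standalone punctuation marks."""
    return _TOKEN_RE.findall(text)


def keyword_in_context(
    tokens: List[str], target: str, width: int = 3
) -> List[Tuple[List[str], str, List[str]]]:
    """Return (left context, match, right context) for each case-insensitive match.

    `width` is assumed to be the number of tokens kept on each side of a match.
    """
    target_lower = target.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == target_lower:
            left = tokens[max(0, i - width):i]  # clamp at the start of the corpus
            right = tokens[i + 1:i + 1 + width]  # slicing past the end is safe
            hits.append((left, token, right))
    return hits


if __name__ == "__main__":
    # A short Hindi-English (Hinglish) code-mixed sentence in Latin script.
    text = "Main kal office jaunga with my friend, phir with family dinner karenge."
    tokens = simple_word_tokenize(text)
    for left, match, right in keyword_in_context(tokens, "with", width=3):
        print(" ".join(left), f"[{match}]", " ".join(right))
    # Prints:
    # kal office jaunga [with] my friend ,
    # friend , phir [with] family dinner karenge
```

A real tokenizer for code-mixed text usually needs more care (for example with contractions, hashtags, or non-Latin scripts), so treat this only as an illustration of searching with a context window.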

### Contributors
 - [Paras Gupta](https://github.com/paras-gupt)
 - [Tarun Sharma](https://github.com/tarun2001sharma)
 - [Reuben Devanesan](https://github.com/Reuben27)

