Metadata-Version: 2.1
Name: Text_Preprocessing_Toolkit
Version: 0.0.1
Summary: A package that automates text preprocessing
Home-page: https://github.com/Gaurav-Jaiswal-1/Text-Preprocessing-Toolkit.git
Author: Gaurav Jaiswal
Author-email: Gaurav Jaiswal <jaiswalgaurav863@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/Gaurav-Jaiswal-1/Text-Preprocessing-Toolkit
Project-URL: Issues, https://github.com/Gaurav-Jaiswal-1/Text-Preprocessing-Toolkit/issues
Keywords: text preprocessing,toolkit,automation
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: spellchecker
Requires-Dist: matplotlib
Requires-Dist: pandas
Requires-Dist: spacy
Requires-Dist: nltk



# Text Preprocessing Toolkit

This repository contains a Python package for text preprocessing tasks. The toolkit includes functions for various preprocessing steps such as tokenization, lemmatization, stopword removal, text normalization, and more. It aims to provide a convenient and customizable solution for preparing text data for downstream tasks like natural language processing (NLP) and machine learning.

## Features

- **Lowercasing**: Convert all text to lowercase.
- **Punctuation Removal**: Remove punctuation marks from text.
- **Stopword Removal**: Remove common words (e.g., "and", "the") that do not contribute much meaning.
- **Lemmatization**: Reduce words to their base or root form (e.g., "running" -> "run").
- **Spell Correction**: Correct misspelled words in the text.
- **URL and HTML Tag Removal**: Clean URLs and HTML tags from text.
- **Special Character Removal**: Remove non-alphanumeric characters.

## Requirements

- Python 3.8 or higher
- `flake8` for linting
- `pytest` for testing
- Any dependencies defined in `requirements.txt`

## Installation

To install the package, clone the repository and install the necessary dependencies.

### Clone the repository:

```bash
git clone https://github.com/your-username/text-preprocessing-toolkit.git
cd text-preprocessing-toolkit
```

### Install dependencies:

```bash
pip install -r requirements.txt
```

Alternatively, if you want to install the package globally:

```bash
pip install .
```

## Usage

You can use this toolkit in your Python project by importing the preprocessing functions:

```python
from text_preprocessing_toolkit import processor

text = "Your sample text goes here!"

# Preprocess text
cleaned_text = processor.preprocess(text, steps=[
    "lowercase",
    "remove_punctuation",
    "remove_stopwords",
    "lemmatize_text",
    "remove_special_characters",
    "remove_url",
    "remove_html_tags",
    "correct_spellings"
])

print(cleaned_text)
```

### Available Preprocessing Steps:

- **lowercase**: Convert text to lowercase.
- **remove_punctuation**: Remove punctuation characters.
- **remove_stopwords**: Remove stopwords (common words like 'the', 'and', etc.).
- **lemmatize_text**: Lemmatize words (reduce to base form).
- **remove_special_characters**: Remove special characters from text.
- **remove_url**: Remove URLs from text.
- **remove_html_tags**: Remove HTML tags.
- **correct_spellings**: Correct common spelling mistakes.

## Running Tests

This repository includes unit and integration tests using `pytest`. To run the tests:

1. Install `pytest` if you haven't already:

```bash
pip install pytest
```

2. Run the tests:

```bash
pytest
```

Tests are located in the `tests/` directory.

## Code Linting

This project uses `flake8` for linting. To check the code for style issues:

```bash
flake8 text_preprocessing_toolkit
```

## CI/CD

This repository is integrated with GitHub Actions for continuous integration and continuous deployment (CI/CD). Every time a new commit is pushed or a pull request is created to the `main` branch, the following steps will be automatically performed:

- **Linting**: Code will be checked for style issues using `flake8`.
- **Testing**: Unit tests will be run using `pytest`.
- **Build**: The package will be built using `python -m build`.
- **Publish**: The package will be uploaded to PyPI (if a release is created).

## Contributing

We welcome contributions! If you'd like to contribute to the project, please follow these steps:

1. Fork the repository.
2. Create a new branch (`git checkout -b feature-name`).
3. Make your changes and commit them (`git commit -m 'Add feature'`).
4. Push to your forked repository (`git push origin feature-name`).
5. Create a pull request.

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

---

### Notes:
- Replace the repository URL in the `git clone` command with your actual GitHub repository URL.
- Update any project-specific features or configurations that might be necessary.


# ________________________________________________________________________________________________
