Metadata-Version: 2.1
Name: Amazon-DenseClus
Version: 0.0.5
Summary: Dense Clustering for Mixed Data Types
Home-page: https://github.com/awslabs/amazon-denseclus
Author: Charles Frenzel
License: UNKNOWN
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: umap-learn (>=0.5.1)
Requires-Dist: numpy (>=1.20.2)
Requires-Dist: hdbscan (>=0.8.27)
Requires-Dist: numba (>=0.51.2)
Requires-Dist: pandas (>=1.2.4)
Requires-Dist: scikit-learn (>=0.24.2)

# Amazon DenseClus

[![build](https://github.com/awslabs/amazon-denseclus/actions/workflows/tests.yml/badge.svg)](https://github.com/awslabs/amazon-denseclus/actions/workflows/tests.yml) ![PyPI - Python Version](https://img.shields.io/pypi/pyversions/Amazon-DenseClus) [![PyPI version](https://badge.fury.io/py/Amazon-DenseClus.svg)](https://badge.fury.io/py/Amazon-DenseClus) ![PyPI - Wheel](https://img.shields.io/pypi/wheel/Amazon-DenseClus) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black) ![PyPI - License](https://img.shields.io/pypi/l/Amazon-DenseClus)

DenseClus is a Python module for clustering mixed-type data using [UMAP](https://github.com/lmcinnes/umap) and [HDBSCAN](https://github.com/scikit-learn-contrib/hdbscan). Because it accepts both categorical and numerical data, DenseClus makes it possible to incorporate all features in clustering.

## Installation

```bash
pip install Amazon-DenseClus
```

## Usage

DenseClus requires a pandas DataFrame with both numerical and categorical columns as input.
All preprocessing and extraction are done under the hood. Just call `fit` and then retrieve the clusters!

```python
from denseclus.DenseClus import DenseClus

clf = DenseClus(
    umap_combine_method="intersection_union_mapper",
)
# df is a pandas DataFrame with both numerical and categorical columns
clf.fit(df)

print(clf.score())
```
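
The snippet above assumes a `df` already exists. As a minimal sketch of what such an input might look like, the following builds a toy mixed-type DataFrame. The column names, values, and the `cluster` helper are hypothetical and chosen only for illustration. The DenseClus call mirrors the usage shown above and is imported lazily inside the helper, so the data-preparation part runs even where the package is not installed.

```python
import numpy as np
import pandas as pd

# Toy data: all column names and values here are made up for illustration.
rng = np.random.default_rng(42)
n_rows = 200
df = pd.DataFrame(
    {
        "age": rng.integers(18, 80, size=n_rows),
        "income": rng.normal(50_000, 15_000, size=n_rows),
        "segment": rng.choice(["basic", "plus", "premium"], size=n_rows),
        "region": rng.choice(["north", "south", "east", "west"], size=n_rows),
    }
)

# Check which columns pandas treats as numerical and which as categorical.
numerical_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()


def cluster(frame: pd.DataFrame):
    """Hypothetical helper: fit DenseClus as in the usage snippet and return score()."""
    # Imported here so the lines above run without Amazon-DenseClus installed.
    from denseclus.DenseClus import DenseClus

    clf = DenseClus(umap_combine_method="intersection_union_mapper")
    clf.fit(frame)
    return clf.score()
```

With the package installed, `cluster(df)` would run the same `fit` and `score` calls as the snippet above. Inspecting `numerical_cols` and `categorical_cols` beforehand is a quick way to confirm that the frame really contains both kinds of columns.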

## Examples

A hands-on example with an overview of how to use DenseClus is available as a [Jupyter notebook](notebooks/DenseClus%20Example%20NB.ipynb).

## References

```bibtex
@article{mcinnes2018umap-software,
  title={UMAP: Uniform Manifold Approximation and Projection},
  author={McInnes, Leland and Healy, John and Saul, Nathaniel and Grossberger, Lukas},
  journal={The Journal of Open Source Software},
  volume={3},
  number={29},
  pages={861},
  year={2018}
}
```

```bibtex
@article{mcinnes2017hdbscan,
  title={hdbscan: Hierarchical density based clustering},
  author={McInnes, Leland and Healy, John and Astels, Steve},
  journal={The Journal of Open Source Software},
  volume={2},
  number={11},
  pages={205},
  year={2017}
}
```


