Metadata-Version: 2.1
Name: dm_utils
Version: 0.1.1
Summary: Data Mining Utils
Home-page: https://pypi.org/project/dm_utils/
Author: Mingze He
Author-email: hemingze126@126.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: catboost ~=1.2.5
Requires-Dist: colorama ~=0.4.6
Requires-Dist: ipython ~=8.26.0
Requires-Dist: joblib ~=1.4.2
Requires-Dist: lightgbm ~=4.3.0
Requires-Dist: matplotlib ~=3.9.0
Requires-Dist: ngboost ~=0.5.1
Requires-Dist: numpy ~=1.26.4
Requires-Dist: pandas ~=2.2.2
Requires-Dist: pytorch-tabnet ~=4.1.0
Requires-Dist: scikit-learn ~=1.4.0
Requires-Dist: scipy ~=1.12.0
Requires-Dist: seaborn ~=0.13.2
Requires-Dist: tqdm ~=4.66.5
Requires-Dist: xgboost ~=2.0.3

# README

`dm_utils` is a collection of utilities for data mining.

## Installation

```bash
pip install dm_utils
```

## Usage

- `dm_utils.hom` : hold-out method

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from dm_utils.hom import HOM

x, y = load_iris(return_X_y=True, as_frame=True)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=42)
# Classification task with one XGBoost and one LightGBM model
hom = HOM(task='cls', model=['xgb', 'lgb'])
hom.fit(xtrain, ytrain, record_time=True)
# Assuming predict returns one score column per class, pick the highest-scoring class
ypred = hom.predict(xtest).argmax(axis=1)
print(accuracy_score(ytest, ypred))
```
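
For readers new to the term, the hold-out method sets aside part of the data for validation and trains on the rest. The sketch below shows only that general idea with the standard library; `holdout_split` and its parameters are hypothetical names for illustration, and this is not a description of how `HOM` splits data internally.

```python
import random

def holdout_split(n_samples, valid_size=0.2, seed=42):
    """Return (train_idx, valid_idx) for a single shuffled hold-out split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_valid = max(1, int(round(n_samples * valid_size)))
    # The first n_valid shuffled indices are held out; the rest are for training
    return indices[n_valid:], indices[:n_valid]

train_idx, valid_idx = holdout_split(10, valid_size=0.2)
print(len(train_idx), len(valid_idx))  # 8 2
```

A model would be fitted on the rows in `train_idx` and scored on the rows in `valid_idx`, which it never saw during training.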

- `dm_utils.oof` : out-of-fold prediction

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from dm_utils.oof import OOF

x, y = load_breast_cancer(return_X_y=True, as_frame=True)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=42)
# Classification task, 5-fold OOF with two XGBoost, two LightGBM and one CatBoost model
oof = OOF(task='cls', model=['xgb', 'xgb', 'lgb', 'lgb', 'cb'])
oof.fit(xtrain, ytrain, record_time=True)
# Binary task: threshold the predicted score at 0.5
ypred = oof.predict(xtest) > 0.5
print(accuracy_score(ytest, ypred))
```
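
As a rough illustration of what out-of-fold prediction means in general, the sketch below splits the data into k folds, fits a model on k-1 folds, and predicts the remaining fold, so every training sample gets a prediction from a model that never saw it. The toy "model" here just predicts the mean of its training targets, and averaging the fold models for new data is one common convention. `kfold_indices` and `oof_predict` are hypothetical helpers, not part of `dm_utils`, and `OOF` may work differently.

```python
import random

def kfold_indices(n_samples, n_splits=5, seed=42):
    """Shuffle the sample indices and deal them into n_splits folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_splits] for i in range(n_splits)]

def oof_predict(y, n_splits=5, seed=42):
    """Return (out-of-fold predictions, averaged prediction for new data)."""
    oof_pred = [None] * len(y)
    fold_models = []
    for valid_idx in kfold_indices(len(y), n_splits, seed):
        held_out = set(valid_idx)
        train_y = [y[i] for i in range(len(y)) if i not in held_out]
        model = sum(train_y) / len(train_y)  # "fit": mean of the training targets
        fold_models.append(model)
        for i in valid_idx:
            oof_pred[i] = model  # prediction from a model that never saw sample i
    # New data: average the predictions of all fold models
    test_pred = sum(fold_models) / len(fold_models)
    return oof_pred, test_pred

y = [0, 1, 1, 0, 1, 1, 0, 1, 0, 1]
oof_pred, test_pred = oof_predict(y, n_splits=5)
print(oof_pred, test_pred)
```

Because each entry of `oof_pred` comes from a model trained without that sample, scoring `oof_pred` against `y` gives a validation estimate over the whole training set.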

## Features

Supported model libraries: `scikit-learn`, `xgboost`, `lightgbm`, `catboost`, `ngboost` and `pytorch-tabnet`.
