Metadata-Version: 2.1
Name: dpmmpython
Version: 0.1.9
Summary: Python wrapper for DPMMSubClusters julia package
Home-page: https://github.com/BGU-CS-VIL/DPMMPython
Author: Or Dinari
Author-email: dinari@post.bgu.ac.il
License: MIT
Project-URL: Source, https://github.com/BGU-CS-VIL/DPMMPython
Project-URL: Tracker, https://github.com/BGU-CS-VIL/issues
Project-URL: Documentation, https://bgu-cs-vil.github.io/DPMMSubClusters.jl/latest/
Keywords: julia python
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: julia
Requires-Dist: wget
Provides-Extra: test
Requires-Dist: numpy ; extra == 'test'
Requires-Dist: ipython ; extra == 'test'
Requires-Dist: pytest (>=4.4) ; extra == 'test'
Requires-Dist: mock ; extra == 'test'

<br>
<p align="center">
<img src="https://www.cs.bgu.ac.il/~dinari/images/clusters_low_slow.gif" alt="DPGMM SubClusters 2d example">
</p>

## DPMMSubClusters

This package is a Python wrapper for the [DPMMSubClusters.jl](https://github.com/BGU-CS-VIL/DPMMSubClusters.jl) Julia package and for the [DPMMSubClusters_GPU](https://github.com/BGU-CS-VIL/DPMMSubClusters_GPU) CUDA/C++ package.<br>

### Motivation

Working on a subset of 100K images from ImageNet, containing 79 classes, we have created embeddings using [SWAV](https://github.com/facebookresearch/swav), and reduced the dimension to 128 using PCA. We have compared our method with the popular scikit-learn [GMM](https://scikit-learn.org/stable/modules/generated/sklearn.mixture.GaussianMixture.html) and [DPGMM](https://scikit-learn.org/stable/modules/generated/sklearn.mixture.BayesianGaussianMixture.html) with the following results:
<p align="center">
  
| Method                                              | Timing (sec) | NMI (higher is better) |
|-----------------------------------------------------|--------------|------------------------|
| *Scikit-learn's GMM* (using EM, and given the True K) | 2523         | 0.695                   |
| *Scikit-learn's DPGMM*                                | 6108         | 0.683                   | 
| DPMMpython                                          | 475           | 0.705                   | 

</p>


### Installation

If you wish to use only the CPU version, you may skip all the GPU related steps.

1. Install CUDA version 11.2 (or higher) from https://developer.nvidia.com/CUDA-downloads
2. git clone https://github.com/BGU-CS-VIL/DPMMSubClusters_GPU
3. Install Julia from: https://julialang.org/downloads/platform
4. Add our DPMMSubCluster package from within a Julia terminal via Julia package manager:
```
] add DPMMSubClusters
```
5. Add our dpmmpython package in python: pip install dpmmpython
6. Add Environment Variables:
	#### On Linux:
	1. Add "CUDA_VERSION" with the value of the version of your CUDA installation (e.g., 11.6).
	2. Add to the "PATH" environment variable the path to the Julia executable (e.g., in .bashrc add: export PATH =$PATH:$HOME/julia/julia-1.6.0/bin).
	3. Make sure that CUDA_PATH exist. If it is missing add it with a path to CUDA (e.g., export CUDA_PATH=/usr/local/cuda-11.6/).
	4. Make sure that the relevant CUDA paths are included in $PATH and $LD_LIBRARY_PATH (e.g., export PATH=/usr/local/cuda-11.6/bin:$PATH, export LD_LIBRARY_PATH=/usr/local/cuda-
11.6/lib64:$LD_LIBRARY_PATH).
	#### On Windows:
	1. Add "CUDA_VERSION" with the value of the version of your CUDA installation (e.g., 11.6).
	2. Add to the "PATH" environment variable the path to the Julia executable (e.g., C:\Users\<USER>\AppData\Local\Programs\Julia\Julia-1.6.0\bin).
	3. Make sure that CUDA_PATH exists. If it is missing add it with a path to CUDA (e.g., C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6).
7. Install cmake if necessary.
8. Install PyJulia from within a Python terminal:
```
	import julia;julia.install();
```
9. For Windows only (optional, used on for debugging purposes): Install OpenCV
	1. run Git Bash
	2. cd <YOUR_PATH_TO_DPMMSubClusters_GPU>/DPMMSubClusters
	3. ./installOCV.sh

### Building
For Windows for the CUDA/C++ package both of the build options below are viable. For Linux use
Option 2.
#### Option 1:
DPMMSubClusters.sln - Solution file for Visual Studio 2019
#### Option 2:
CMakeLists.txt
1. Run in the terminal:
```
cd <YOUR_PATH_TO_DPMMSubClusters_GPU>/DPMMSubClusters
mkdir build
cd build
cmake -S ../
```
2. Build:
* Windows: 
```cmake --build . --config Release --target ALL_BUILD```
* Linux: ```cmake --build . --config Release --target all```

### Post Build
Add Environment Variable:
* On Linux:</BR>
Add "DPMM_GPU_FULL_PATH_TO_PACKAGE_IN_LINUX" with the value of the path to the binary of the package DPMMSubClusters_GPU.</BR>
The path is: <YOUR_PATH_TO_DPMMSubClusters_GPU>/DPMMSubClusters/DPMMSubClusters.
* On Windows:</BR>
Add "DPMM_GPU_FULL_PATH_TO_PACKAGE_IN_WINDOWS" with the value of the path to the exe of the package DPMMSubClusters_GPU.</BR>
The path is: <YOUR_PATH_TO_DPMMSubClusters_GPU>\DPMMSubClusters\build\Release
\DPMMSubClusters.exe.


### Usage Example:

```
from dpmmpython.dpmmwrapper import DPMMPython
from dpmmpython.priors import niw
import numpy as np

data,gt = DPMMPython.generate_gaussian_data(10000, 2, 10, 100.0)
prior = niw(1,np.zeros(2),4,np.eye(2))
labels,_,results = DPMMPython.fit(data,100,prior = prior,verbose = True, gt = gt, gpu = False)
  
```
```
Iteration: 1 || Clusters count: 1 || Log posterior: -71190.14226686998 || Vi score: 1.990707323192506 || NMI score: 6.69243345834295e-16 || Iter Time:0.004499912261962891 || Total time:0.004499912261962891
Iteration: 2 || Clusters count: 1 || Log posterior: -71190.14226686998 || Vi score: 1.990707323192506 || NMI score: 6.69243345834295e-16 || Iter Time:0.0038819313049316406 || Total time:0.008381843566894531
...
Iteration: 98 || Clusters count: 9 || Log posterior: -40607.39498126549 || Vi score: 0.11887067921133423 || NMI score: 0.9692247699387838 || Iter Time:0.015907764434814453 || Total time:0.5749104022979736
Iteration: 99 || Clusters count: 9 || Log posterior: -40607.39498126549 || Vi score: 0.11887067921133423 || NMI score: 0.9692247699387838 || Iter Time:0.01072382926940918 || Total time:0.5856342315673828
Iteration: 100 || Clusters count: 9 || Log posterior: -40607.39498126549 || Vi score: 0.11887067921133423 || NMI score: 0.9692247699387838 || Iter Time:0.010260820388793945 || Total time:0.5958950519561768
```
```
predictions = DPMMPython.predict(results[-1],data)
```

You can modify the number of processes by using `DPMMPython.add_procs(procs_count)`, note that you can only scale it upwards.

#### Additional Examples:
[Clustering](https://nbviewer.jupyter.org/github/BGU-CS-VIL/dpmmpython/blob/master/examples/clustering_example.ipynb)
<br>
[Multi-Process](https://nbviewer.jupyter.org/github/BGU-CS-VIL/dpmmpython/blob/master/examples/multi_process.ipynb)


#### Python 3.8/3.9
If you are having problems with the above Python version, please update PyJulia and PyCall to the latest versions, this should fix it.

### Misc

For any questions: dinari@post.bgu.ac.il

Contributions, feature requests, suggestion etc.. are welcomed.

If you use this code for your work, please cite the following:

```
@inproceedings{dinari2019distributed,
  title={Distributed MCMC Inference in Dirichlet Process Mixture Models Using Julia},
  author={Dinari, Or and Yu, Angel and Freifeld, Oren and Fisher III, John W},
  booktitle={2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)},
  pages={518--525},
  year={2019}
}
```


