Metadata-Version: 2.1
Name: multidimensionalks
Version: 0.0.4
Summary: Multidimensional KS test module in python
Author: Tomasz Pawlowski
Author-email: t.pawlowski@mimuw.edu.pl
License: MIT
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: License :: OSI Approved :: MIT License
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy (==1.19)

# multidimensionalks
Python C extension providing a method for calculating the multidimensional Kolmogorov-Smirnov test.

`multidimensionalks.test(rvs, cdf=None, counts_rvs=None, counts_cdf=None, n_jobs=1, permutation_samples=0, binomial_significance=False, use_avx=True, max_alpha_beta=False, scale_result=True, deduplicate_data=True)`

# Example usage

```python
from multidimensionalks import test
import numpy as np

# First sample is passed as `rvs` (positional), second sample as `cdf`.
test(np.array([[1, 2, 3], [1, 3, 2]]), cdf=np.array([[1, 2, 2]]))
```

# Parameters

* `rvs`: 2-dimensional numeric numpy array whose rows are `d`-dimensional observations,
* `cdf`: 2-dimensional numeric numpy array whose rows are the `d`-dimensional observations of the second sample. Defaults to `None`,
* `counts_rvs`: if `rvs` contains many duplicates, it can be passed without duplicates, with the number of occurrences of each row given in this separate array of counts,
* `counts_cdf`: if `cdf` contains many duplicates, it can be passed without duplicates, with the number of occurrences of each row given in this separate array of counts. If `cdf` is not given, `counts_cdf` is taken as the counts of the elements of the `rvs` array,
* `n_jobs`: number of threads used during the calculation. Defaults to `1`,
* `permutation_samples`: number of times the data is shuffled and the statistic recalculated to estimate the p-value. Defaults to `0`,
* `binomial_significance`: boolean value indicating whether statistical significance should be calculated. Defaults to `False`,
* `use_avx`: boolean value indicating whether `AVX2` instructions should be used during the calculations. Defaults to `True`,
* `max_alpha_beta`: boolean value indicating how the λ and β values should be combined. `True` results in `max(λ, β)`; otherwise `λ+β` is used. Defaults to `False`,
* `scale_result`: boolean value indicating whether to scale the statistic by $\sqrt{\frac{|rvs|+|cdf|}{|rvs|\times|cdf|}}$. Defaults to `True`,
* `deduplicate_data`: boolean value indicating whether to deduplicate data points before running the algorithms. Defaults to `True`.
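
The `counts_rvs` and `counts_cdf` parameters take a sample split into unique rows and their counts. The sketch below shows one way to do that split using only the standard library. The helper name `split_duplicates` is hypothetical and not part of this package, and the exact dtype and shape the extension expects for the counts array is not stated here, so treat the conversion step as an assumption.

```python
from collections import Counter


def split_duplicates(rows):
    """Return (unique_rows, counts), keeping first-seen order of the rows.

    Hypothetical helper for illustration; not part of multidimensionalks.
    """
    # Rows are converted to tuples so they can be used as dictionary keys.
    counter = Counter(tuple(row) for row in rows)
    unique_rows = [list(row) for row in counter]
    counts = [counter[tuple(row)] for row in unique_rows]
    return unique_rows, counts


sample = [[1, 2, 3], [1, 3, 2], [1, 2, 3], [1, 2, 3]]
unique_rows, counts = split_duplicates(sample)
print(unique_rows)  # [[1, 2, 3], [1, 3, 2]]
print(counts)       # [3, 1]
```

The two lists would then presumably be converted with `np.asarray` and passed as `rvs` and `counts_rvs` (or `cdf` and `counts_cdf`). For data that is already a numpy array, `np.unique(data, axis=0, return_counts=True)` produces the same split, with the rows in sorted order.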

# Return

A 3-element tuple containing:

* the KS statistic,
* the p-value calculated using the statistical method, or `-1` if `binomial_significance` is set to `False`,
* the p-value calculated using the permutation method, or `-1` if `permutation_samples` is set to `0`.
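
Because a p-value that was not requested comes back as `-1`, it can be convenient to map that sentinel to `None` before using the result. The following is a minimal sketch of such a wrapper. `KSResult` and `unpack_result` are hypothetical names introduced for illustration and are not provided by this package.

```python
from typing import NamedTuple, Optional


class KSResult(NamedTuple):
    """Hypothetical container for the 3-element tuple described above."""
    statistic: float
    pvalue_binomial: Optional[float]
    pvalue_permutation: Optional[float]


def unpack_result(raw):
    """Turn the raw tuple into a KSResult, replacing -1 sentinels with None."""
    statistic, p_binomial, p_permutation = raw
    return KSResult(
        statistic,
        None if p_binomial == -1 else p_binomial,
        None if p_permutation == -1 else p_permutation,
    )


# Made-up tuple for illustration only; not the output of a real run.
result = unpack_result((0.5, -1, 0.04))
print(result.pvalue_binomial)     # None
print(result.pvalue_permutation)  # 0.04
```

In practice the argument would be the value returned by `test(...)`.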
