Metadata-Version: 2.1
Name: quality-covers
Version: 3.1.0
Summary: Python Package with quality covers C++ extension
Home-page: UNKNOWN
Author: Nicolas Gros
Author-email: nicolas.gros01@u-bourgogne.fr
License: UNKNOWN
Platform: UNKNOWN
Description-Content-Type: text/markdown
Requires-Dist: numpy

# Quality covers

Quality covers is a pattern mining algorithm.

# Install

```shell
pip3 install --upgrade quality_covers
```

## Transactional file

If each line of your file lists the items of one transaction, like this

chess.dat: 
```
1 3 5 7 10 
1 3 5 7 10 
1 3 5 8 9 
1 3 6 7 9 
1 3 6 8 9 
```

or

```
P30968
P48551 P17181
P05121 Q03405 P00747 P02671
Q02643
P48551 P17181
```

call the function with `False` as the second argument:

```python
import quality_covers

quality_covers.run_classic_size("chess.dat", False)
```

## Binary file

If each line of your file is a row of a 0/1 matrix (one column per item), like this

chess.data: 
```
1 0 1 0 1 0 1 0 0 1
1 0 1 0 1 0 1 0 0 1
1 0 1 0 1 0 0 1 1 0
1 0 1 0 0 1 1 0 1 0
1 0 1 0 0 1 0 1 1 0
```

call the function with `True` as the second argument:

```python
import quality_covers

quality_covers.run_classic_size("chess.data", True)
```
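
The two sample files above appear to describe the same data in two encodings. As a minimal sketch of that relationship, the hypothetical helper below (`transactions_to_binary` is not part of `quality_covers`) converts transactional rows into binary rows. It assumes that items are positive integers and that column `j` of the binary file corresponds to item `j`.

```python
def transactions_to_binary(lines):
    """Convert transactional rows (item ids) into rows of 0/1 values."""
    rows = [[int(tok) for tok in line.split()] for line in lines if line.strip()]
    n_items = max(max(row) for row in rows)
    # Column j (1-based) is 1 when item j occurs in the transaction.
    return [[1 if j in row else 0 for j in range(1, n_items + 1)] for row in rows]


transactional = [
    "1 3 5 7 10",
    "1 3 5 7 10",
    "1 3 5 8 9",
    "1 3 6 7 9",
    "1 3 6 8 9",
]
binary = transactions_to_binary(transactional)
for row in binary:
    print(" ".join(map(str, row)))
```

With the `chess.dat` rows as input, the printed rows match the `chess.data` sample.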

## Output of the functions

The functions create two files in the current directory, named after the input file:
- *chess.data.out*: the result file
- *chess.data.clock*: information about execution time

# Extract binary matrices

You can obtain binary matrices by calling `extract_binary_matrices` on the output file:

```python
quality_covers.extract_binary_matrices('chess.data.out')
```

# Optional arguments

## Threshold coverage

You can provide a coverage threshold as the third argument.

```python
# 60% of coverage
quality_covers.run_classic_size("chess.data", True, 0.6)
```

## Measures

You can also request measures for each pattern by passing `True` as the fourth argument:
- frequency
- monocle
- separation
- object uniformity

```python
quality_covers.run_classic_size("chess.data", True, 0.6, True)
```

```
3,4,9 ; 4,5,6,7,8#Object Uniformity=0.81944; Monocle=91.00000; Frequency=0.33333; Separation=0.48387
2,9 ; 1,3,7#Object Uniformity=0.68750; Monocle=28.00000; Frequency=0.22222; Separation=0.27273
1,6,9 ; 2,7#Object Uniformity=0.63889; Monocle=28.00000; Frequency=0.33333; Separation=0.31579
# Mandatory: 0
# Non-mandatory: 3
# Total: 3
# Coverage: 25/38(65.78947%)
# Mean frequency: 0.29630
# Mean monocle: 49.00000
# Mean object uniformity: 0.71528
# Mean separation: 0.35746
```
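
To post-process a result file, it can help to load it into Python structures. The sketch below is not part of `quality_covers`; `parse_results` is a hypothetical helper that assumes the layout shown above: pattern lines of the form `extent ; intent#Measure=value; ...` and summary lines starting with `#`. Extents are read as integer row numbers and intents are kept as strings, since items may be non-numeric identifiers.

```python
def parse_results(text):
    """Split the text of a result file into patterns and summary entries."""
    patterns, summary = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Summary line, e.g. "# Total: 3"
            key, _, value = line[1:].partition(":")
            summary[key.strip()] = value.strip()
            continue
        # Pattern line: "<extent> ; <intent>#<measure>=<value>; ..."
        body, _, measures_part = line.partition("#")
        extent_part, _, intent_part = body.partition(";")
        measures = {}
        for entry in measures_part.split(";"):
            if "=" in entry:
                name, _, value = entry.partition("=")
                measures[name.strip()] = float(value)
        patterns.append({
            "extent": [int(x) for x in extent_part.split(",")],
            "intent": [x.strip() for x in intent_part.split(",")],
            "measures": measures,
        })
    return patterns, summary


sample = """\
3,4,9 ; 4,5,6,7,8#Object Uniformity=0.81944; Monocle=91.00000; Frequency=0.33333; Separation=0.48387
2,9 ; 1,3,7#Object Uniformity=0.68750; Monocle=28.00000; Frequency=0.22222; Separation=0.27273
# Total: 3
# Coverage: 25/38(65.78947%)
"""
patterns, summary = parse_results(sample)
print(patterns[0]["extent"], patterns[0]["intent"])
print(summary["Coverage"])
```

Pattern lines written without measures (no `#` part) are still parsed; their `measures` dictionary is simply empty.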

# Different algorithms

There are currently four different algorithms:

- `run_classic_size`
- `run_approximate_size`
- `run_fca_cemb_with_mandatory`
- `run_fca_cemb_without_mandatory`

# Examples

## Transactional file with 80% coverage and measures information with approximate size algorithm

### Data file

```
1 3 5 7 10 
1 3 5 7 10 
1 3 5 8 9 
1 3 6 7 9 
1 3 6 8 9 
1 4 5 7 10 
1 4 5 7 10 
1 4 5 8 9 
1 4 6 7 9 
1 4 6 8 9 
2 3 5 7 10 
2 3 5 7 10 
2 3 5 8 9 
2 3 6 7 9 
2 3 6 8 9 
2 4 5 7 10 
2 4 5 7 10 
2 4 5 8 9 
2 4 6 7 9 
2 4 6 8 9 
```

### Python commands

```python
import quality_covers

# False: the input is a transactional file
quality_covers.run_approximate_size('file.data', False, 0.8, True)
```

### Results file.data.out

```
1,2,6,7,11,12,16,17 ; 5,7,10#Object Uniformity=0.60000; Monocle=648.00000; Frequency=0.40000; Separation=0.50000
4,5,9,10,14,15,19,20 ; 9,6#Object Uniformity=0.40000; Monocle=352.00000; Frequency=0.40000; Separation=0.36364
3,5,8,10,13,15,18,20 ; 8,9#Object Uniformity=0.40000; Monocle=352.00000; Frequency=0.40000; Separation=0.36364
11,12,13,14,15,16,17,18,19,20 ; 2#Object Uniformity=0.20000; Monocle=228.00000; Frequency=0.50000; Separation=0.20000
6,7,8,9,10,16,17,18,19,20 ; 4#Object Uniformity=0.20000; Monocle=258.00000; Frequency=0.50000; Separation=0.20000
1,2,3,4,5,11,12,13,14,15 ; 3#Object Uniformity=0.20000; Monocle=258.00000; Frequency=0.50000; Separation=0.20000
# Mandatory: 0
# Non-mandatory: 6
# Total: 6
# Coverage: 82/100(82.00000%)
# Mean frequency: 0.45000
# Mean monocle: 349.33334
# Mean object uniformity: 0.33333
# Mean separation: 0.30455
```
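
The `# Coverage: 82/100` line can be checked by hand. The sketch below assumes that coverage is the number of (object, item) cells of the data covered by at least one pattern, divided by the total number of such cells. That definition is an assumption (it is not stated in this README), but it reproduces the reported figure for this example.

```python
# Transactions from the data file above; row i (1-based) is object i.
data = """\
1 3 5 7 10
1 3 5 7 10
1 3 5 8 9
1 3 6 7 9
1 3 6 8 9
1 4 5 7 10
1 4 5 7 10
1 4 5 8 9
1 4 6 7 9
1 4 6 8 9
2 3 5 7 10
2 3 5 7 10
2 3 5 8 9
2 3 6 7 9
2 3 6 8 9
2 4 5 7 10
2 4 5 7 10
2 4 5 8 9
2 4 6 7 9
2 4 6 8 9
"""
rows = [[int(tok) for tok in line.split()] for line in data.splitlines()]

# (extent, intent) pairs copied from file.data.out above.
patterns = [
    ({1, 2, 6, 7, 11, 12, 16, 17}, {5, 7, 10}),
    ({4, 5, 9, 10, 14, 15, 19, 20}, {9, 6}),
    ({3, 5, 8, 10, 13, 15, 18, 20}, {8, 9}),
    ({11, 12, 13, 14, 15, 16, 17, 18, 19, 20}, {2}),
    ({6, 7, 8, 9, 10, 16, 17, 18, 19, 20}, {4}),
    ({1, 2, 3, 4, 5, 11, 12, 13, 14, 15}, {3}),
]

# Every (object, item) cell present in the data.
ones = {(obj, item) for obj, row in enumerate(rows, start=1) for item in row}
# Cells covered by at least one pattern; overlapping cells are counted once.
covered = {(o, i) for extent, intent in patterns for o in extent for i in intent} & ones

print(f"{len(covered)}/{len(ones)}")  # prints 82/100
```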

### Extract binary matrices

```python
import quality_covers

quality_covers.extract_binary_matrices('file.data.out')
```

#### Result binary matrices extent

```
1 0 0 0 0 1
1 0 0 0 0 1
0 0 1 0 0 1
0 1 0 0 0 1
0 1 1 0 0 1
1 0 0 0 1 0
1 0 0 0 1 0
0 0 1 0 1 0
0 1 0 0 1 0
0 1 1 0 1 0
1 0 0 1 0 1
1 0 0 1 0 1
0 0 1 1 0 1
0 1 0 1 0 1
0 1 1 1 0 1
1 0 0 1 1 0
1 0 0 1 1 0
0 0 1 1 1 0
0 1 0 1 1 0
0 1 1 1 1 0
```

#### Result binary matrices intent

The first line gives the column names (the item identifiers).

```
5 7 10 9 6 8 2 4 3
1 1 1 0 0 0 0 0 0
0 0 0 1 1 0 0 0 0
0 0 0 1 0 1 0 0 0
0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 1
```
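
To make the layout of the two matrices explicit, the sketch below rebuilds them in plain Python from the patterns in `file.data.out`. It does not call `extract_binary_matrices` and is not the library's implementation. It assumes that the first matrix has one row per object and one column per pattern, that the second has one row per pattern and one column per item, and that items are ordered by first appearance in the result file (an ordering inferred from this example).

```python
# (extent, intent) pairs copied from file.data.out, in file order.
patterns = [
    ([1, 2, 6, 7, 11, 12, 16, 17], [5, 7, 10]),
    ([4, 5, 9, 10, 14, 15, 19, 20], [9, 6]),
    ([3, 5, 8, 10, 13, 15, 18, 20], [8, 9]),
    ([11, 12, 13, 14, 15, 16, 17, 18, 19, 20], [2]),
    ([6, 7, 8, 9, 10, 16, 17, 18, 19, 20], [4]),
    ([1, 2, 3, 4, 5, 11, 12, 13, 14, 15], [3]),
]
n_objects = 20  # number of transactions in the data file

# Object-by-pattern matrix: cell is 1 when the object belongs to the pattern's extent.
extent_matrix = [
    [1 if obj in extent else 0 for extent, _ in patterns]
    for obj in range(1, n_objects + 1)
]

# Column names of the second matrix: items in order of first appearance.
items = []
for _, intent in patterns:
    for item in intent:
        if item not in items:
            items.append(item)

# Pattern-by-item matrix: cell is 1 when the item belongs to the pattern's intent.
intent_matrix = [[1 if item in intent else 0 for item in items] for _, intent in patterns]

print(" ".join(map(str, items)))
for row in intent_matrix:
    print(" ".join(map(str, row)))
```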

# More info

## Paper associated

To come

## Research lab

- http://www.ciad-lab.fr/

## More tools about association rules

- https://marm.checksem.fr/api/ui/
- https://app.marm.checksem.fr/
- https://quality-cover.checksem.fr/api/ui

## Authors

- Amira Mouakher (<amira.mouakher@u-bourgogne.fr>)
- Nicolas Gros (<nicolas.gros01@u-bourgogne.fr>)
- Sebastien Gerin (<sebastien.gerin@sayens.fr>)


