Metadata-Version: 2.1
Name: CNVoyant
Version: 1.1.5
Summary: Copy Number Variant Pathogenicity Classifier
Home-page: https://github.com/nch-igm/CNVoyant
Author: Rob Schuetz
Author-email: robert.schuetz@nationwidechildrens.org
License: MIT
Project-URL: Bug Tracker, https://github.com/nch-igm/CNVoyant/issues
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Requires-Dist: onnxruntime
Requires-Dist: pandas
Requires-Dist: progressbar
Requires-Dist: requests
Requires-Dist: pyvcf3
Requires-Dist: pyBigWig
Requires-Dist: pybedtools
Requires-Dist: scikit-learn ==1.3.2
Requires-Dist: pickleshare
Requires-Dist: uuid
Requires-Dist: pysam
Requires-Dist: shap
Requires-Dist: pyarrow
Requires-Dist: matplotlib
Requires-Dist: seaborn
Requires-Dist: tqdm

# CNVoyant
A collection of tools to annotate Copy Number Variants (CNVs), predict their clinical significance, and explain those predictions. Models were trained on the January 2023 release of ClinVar, with separate models for deletions and duplications. For details on features and benchmarking results, please see our recent publication in JOURNAL_LINK. The graphical abstract of the project is shown below:

![image](https://github.com/nch-igm/CNVoyant/assets/72405035/2b97d807-6817-4916-b089-548088684036)

## Dependencies
Python dependencies are managed with the conda package manager. The easiest way to create an environment with all required dependencies is with conda or mamba (a faster, conda-compatible package manager). Create a new environment containing CNVoyant with this command:
```
mamba create -n CNVoyant -c conda-forge -c bioconda python=3.10 schuetz.12::cnvoyant
```
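After the environment is created, a quick sanity check is to confirm that the runtime dependencies listed in the package metadata above can be found. The sketch below uses only the standard library. The mapping from distribution names (e.g. `scikit-learn`, `pyvcf3`) to import names (`sklearn`, `vcf`) is our assumption and may need adjusting, and `check_imports` is a hypothetical helper, not part of CNVoyant.

```python
import importlib.util

# Assumed import names for the runtime dependencies listed in the
# package metadata (distribution name -> module name). Adjust if needed.
ASSUMED_MODULES = [
    "CNVoyant", "numpy", "onnxruntime", "pandas", "progressbar", "requests",
    "vcf",          # pyvcf3
    "pyBigWig", "pybedtools",
    "sklearn",      # scikit-learn
    "pickleshare", "uuid", "pysam", "shap", "pyarrow",
    "matplotlib", "seaborn", "tqdm",
]


def check_imports(module_names):
    """Return the subset of module_names that cannot be found on sys.path.

    find_spec only locates each top-level module without importing it,
    so a module that is present but fails at import time is not reported.
    """
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = check_imports(ASSUMED_MODULES)
    print("All dependencies found" if not missing else f"Missing: {missing}")
```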

## Download Databases
CNVoyant requires ClinVar, conservation scores, functional region boundaries, gnomAD SV, and a GRCh38 reference genome to annotate input CNVs. To download these resources, pass a dependency directory to the `DependencyBuilder` constructor and call its `build_all` method.
```
from CNVoyant import DependencyBuilder

data_dir = '/path/to/cnvoyant_dependencies'
db = DependencyBuilder(data_dir)
db.build_all()
```
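Because `build_all` downloads several resources, including a GRCh38 reference genome, it can help to confirm beforehand that the dependency directory exists, is writable, and has free space. The README does not state how much space the databases need, so the `min_free_gb` threshold below is a placeholder assumption, and `prepare_data_dir` is a hypothetical helper rather than part of CNVoyant.

```python
import os
import shutil
import tempfile


def prepare_data_dir(path, min_free_gb=50.0):
    """Create path if needed and check it is writable with enough free space.

    min_free_gb is a placeholder; the actual size of the CNVoyant
    dependencies is not documented here. Returns free space in GB.
    """
    os.makedirs(path, exist_ok=True)
    if not os.access(path, os.W_OK):
        raise PermissionError(f"{path} is not writable")
    free_gb = shutil.disk_usage(path).free / 1e9
    if free_gb < min_free_gb:
        raise RuntimeError(
            f"Only {free_gb:.1f} GB free in {path}; expected at least {min_free_gb} GB"
        )
    return free_gb


if __name__ == "__main__":
    # Demo against a temporary directory; use your real data_dir in practice.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "cnvoyant_dependencies")
        print(f"{prepare_data_dir(target, min_free_gb=0):.1f} GB free")
```

If the check passes, the same path can be handed to `DependencyBuilder` as `data_dir`.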

## Build Features
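Features must be generated before predictions can be made. Input CNVs are supplied as a pandas DataFrame with `CHROMOSOME`, `START`, `END`, and `CHANGE` (`DEL` or `DUP`) columns; features are then generated by calling the `get_features` method of a `FeatureBuilder` instance.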
```
import pandas as pd
from CNVoyant import FeatureBuilder

# Create sample data
cnv_df = pd.DataFrame({
  'CHROMOSOME': ['chr1','chr2','chr3','chr4','chr3'],
  'START': [100000, 100000, 100000, 100000, 179197182],
  'END': [200000, 200000, 200000, 200000, 179236784],
  'CHANGE': ['DEL','DEL','DUP','DUP','DEL']
})

# Initialize CNVoyant FeatureBuilder instance
fb = FeatureBuilder(variant_df = cnv_df, data_dir = data_dir)

# Generate features
fb.get_features()
```
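Before calling `get_features`, it can be worth checking that each input row matches the format used in the example: the four columns `CHROMOSOME`, `START`, `END`, and `CHANGE`, with `CHANGE` set to `DEL` or `DUP`. The extra rules below (integer coordinates with `START < END`, `chr`-prefixed chromosome names as in the example) are our assumptions rather than documented CNVoyant requirements, and `validate_cnvs` is a hypothetical helper. It works on plain dicts, so a DataFrame can be checked with `validate_cnvs(cnv_df.to_dict("records"))`.

```python
import numbers

REQUIRED_COLUMNS = ("CHROMOSOME", "START", "END", "CHANGE")
VALID_CHANGES = {"DEL", "DUP"}


def _is_int(value):
    # numbers.Integral also covers NumPy integer types; exclude bools.
    return isinstance(value, numbers.Integral) and not isinstance(value, bool)


def validate_cnvs(records):
    """Return a list of human-readable problems found in CNV records (dicts)."""
    problems = []
    for i, rec in enumerate(records):
        missing = [c for c in REQUIRED_COLUMNS if c not in rec]
        if missing:
            problems.append(f"row {i}: missing columns {missing}")
            continue
        # Assumption: chromosome names follow the 'chr1' style of the example.
        if not str(rec["CHROMOSOME"]).startswith("chr"):
            problems.append(f"row {i}: chromosome {rec['CHROMOSOME']!r} lacks 'chr' prefix")
        if rec["CHANGE"] not in VALID_CHANGES:
            problems.append(f"row {i}: CHANGE {rec['CHANGE']!r} is not DEL or DUP")
        start, end = rec["START"], rec["END"]
        if not (_is_int(start) and _is_int(end)):
            problems.append(f"row {i}: START/END must be integers")
        elif start >= end:
            problems.append(f"row {i}: START ({start}) must be less than END ({end})")
    return problems


if __name__ == "__main__":
    # Two rows from the sample cnv_df above plus one deliberately malformed row.
    sample = [
        {"CHROMOSOME": "chr1", "START": 100000, "END": 200000, "CHANGE": "DEL"},
        {"CHROMOSOME": "chr3", "START": 179197182, "END": 179236784, "CHANGE": "DEL"},
        {"CHROMOSOME": "3", "START": 500, "END": 100, "CHANGE": "INV"},
    ]
    for problem in validate_cnvs(sample):
        print(problem)
```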

## Generate Predictions
Pretrained models are included for generating predictions. Predictions are generated by passing the feature table (`fb.feature_df`) to the `predict` method of a `Classifier` instance.
```
from CNVoyant import Classifier

# Initialize CNVoyant Classifier instance
cl = Classifier(data_dir)

# Generate predictions
cnvoyant_preds = cl.predict(fb.feature_df)
```

### Retrain CNVoyant Classifier
The CNVoyant models can be retrained on your own set of labeled variants. Label values must be 'Benign', 'VUS', or 'Pathogenic', and the name of the label column is passed to the `train` method of the `Classifier` object via the `label` argument.
```
import pandas as pd
from CNVoyant import FeatureBuilder, Classifier

# Sample data
cnv_train_df = pd.DataFrame({
  'CHROMOSOME': ['chr1','chr2','chr3','chr4','chr3','chr8','chr8','chr8'],
  'START': [100000,100000,100000,100000,179197182,60680919,38458191,37878455],
  'END': [200000,200000,200000,200000,179236784,60738964,38470707,38884501],
  'CHANGE': ['DUP','DEL','DUP','DUP','DEL','DEL','DUP','DUP'],
  'LABEL': ['Benign','Benign','Benign','Benign','Pathogenic','VUS','VUS','Pathogenic']
})

# Initialize CNVoyant FeatureBuilder instance
fb_train = FeatureBuilder(variant_df = cnv_train_df, data_dir = data_dir)

# Generate features
fb_train.get_features()

# Initialize CNVoyant Classifier instance
cl_retrained = Classifier(data_dir)

# Retrain models
cl_retrained.train(fb_train.feature_df, label = 'LABEL')

# Generate predictions
cnvoyant_retrained_preds = cl_retrained.predict(fb.feature_df)
```
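Because separate models are trained for deletions and duplications, a training set presumably needs examples of each label for each `CHANGE` type; the sample data above has at least one `Benign`, `VUS`, and `Pathogenic` variant for both `DEL` and `DUP`. The hypothetical `check_training_labels` helper below flags invalid label values and any missing label/`CHANGE` combinations. Treat the per-type coverage requirement as our assumption rather than a documented CNVoyant constraint.

```python
from collections import Counter

VALID_LABELS = ("Benign", "VUS", "Pathogenic")
CHANGE_TYPES = ("DEL", "DUP")


def check_training_labels(records, label="LABEL"):
    """Count labels per CHANGE type and report problems.

    Returns (counts, problems): counts is a Counter keyed by
    (CHANGE, label); problems lists invalid labels and any
    (CHANGE, label) combination with no examples.
    """
    counts = Counter()
    problems = []
    for i, rec in enumerate(records):
        value = rec.get(label)
        if value not in VALID_LABELS:
            problems.append(f"row {i}: invalid label {value!r}")
            continue
        counts[(rec.get("CHANGE"), value)] += 1
    # Assumption: each CHANGE-specific model should see every label.
    for change in CHANGE_TYPES:
        for lab in VALID_LABELS:
            if counts[(change, lab)] == 0:
                problems.append(f"no {lab} examples for {change}")
    return counts, problems


if __name__ == "__main__":
    # CHANGE and LABEL columns of the sample cnv_train_df above.
    changes = ["DUP", "DEL", "DUP", "DUP", "DEL", "DEL", "DUP", "DUP"]
    labels = ["Benign", "Benign", "Benign", "Benign", "Pathogenic", "VUS", "VUS", "Pathogenic"]
    records = [{"CHANGE": c, "LABEL": lab} for c, lab in zip(changes, labels)]
    counts, problems = check_training_labels(records)
    print(dict(counts))
    print(problems or "all label/CHANGE combinations present")
```

With a DataFrame, the same check can be run as `check_training_labels(cnv_train_df.to_dict("records"))`.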

### Generate CNVoyant Explanations
A key feature of CNVoyant is its ability to explain its clinical significance predictions. Explanations are provided as SHAP force plots, which show the features that drove the prediction of each class for a given CNV.
```
from CNVoyant import Explainer

cnv_coordinates = {
    'CHROMOSOME': 'chr3',
    'START': 179197182,
    'END': 179236784,
    'CHANGE': 'DEL'
}

expl = Explainer(
    cnv_coordinates = cnv_coordinates,
    output_dir = '/path/to/output',
    classifier = cl
)

expl.explain()
```
The output looks like this:<br>
<img width="468" alt="image" src="https://github.com/nch-igm/CNVoyant/assets/72405035/f8342f0a-2c14-4108-b4df-dd9fffe7ba77">
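SHAP force plots rest on Shapley values from cooperative game theory: each feature receives a contribution, and the contributions sum to the difference between the model's output for the CNV and a baseline output. The sketch below computes exact Shapley values for a hypothetical three-feature toy model by enumerating feature subsets, filling "absent" features from a single baseline point. It only illustrates the additivity that a force plot displays; the shap library (and presumably CNVoyant's `Explainer`) uses its own, more efficient algorithms and background data, and the toy model and feature names here are invented.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, with absent features set to baseline.

    Enumerates all subsets, so cost grows as 2**n; fine only for small n.
    """
    n = len(x)

    def value(present):
        # Model output when only features in `present` take their values from x.
        z = [x[j] if j in present else baseline[j] for j in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    # Hypothetical score: one additive feature and one interaction.
    return 0.5 * z[0] + 2.0 * z[1] * z[2]


if __name__ == "__main__":
    names = ["feature_a", "feature_b", "feature_c"]  # hypothetical names
    x, baseline = [2.0, 1.0, 3.0], [0.0, 0.0, 0.0]
    phi = shapley_values(toy_model, x, baseline)
    for name, contribution in zip(names, phi):
        print(f"{name}: {contribution:+.2f}")
    # Additivity: baseline output plus contributions equals the model output.
    print(toy_model(baseline) + sum(phi), toy_model(x))
```

For this input, the additive feature contributes 0.5 × 2 = 1, while the interaction term 2 × 1 × 3 = 6 is split equally between the two interacting features (3 each), so the contributions add up to the model output of 7.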

