Metadata-Version: 2.1
Name: CNVoyant
Version: 1.0.30
Summary: Copy Number Variant Pathogenicity Classifier
Home-page: https://github.com/nch-igm/CNVoyant
Author: Rob Schuetz
Author-email: robert.schuetz@nationwidechildrens.org
License: MIT
Project-URL: Bug Tracker, https://github.com/nch-igm/CNVoyant/issues
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Requires-Dist: onnxruntime
Requires-Dist: pandas
Requires-Dist: progressbar
Requires-Dist: requests
Requires-Dist: pyvcf3
Requires-Dist: pyBigWig
Requires-Dist: pybedtools
Requires-Dist: scikit-learn ==1.3.2
Requires-Dist: pickleshare
Requires-Dist: uuid
Requires-Dist: pysam
Requires-Dist: shap
Requires-Dist: pyarrow
Requires-Dist: matplotlib
Requires-Dist: tqdm

# CNVoyant
CNVoyant is a set of tools for annotating Copy Number Variants (CNVs) and predicting their clinical significance. The models were trained on the January 2023 release of ClinVar, with separate models for deletions and duplications. To read more about the features and benchmarking results, please see our recent publication in JOURNAL_LINK. The graphical abstract of the project is shown here:

![figure2](https://github.com/nch-igm/CNVoyant/assets/72405035/9d779d1d-c4dc-4a0a-b141-6d58087684a5)

## Dependencies
Python dependencies are handled via pip, but CNVoyant also needs a few non-Python dependencies to run properly. The easiest way to create an environment with everything installed is with conda or mamba. An `environment.yml` file is included in this repository and can be used to create a CNVoyant environment:
```
mamba env create -n CNVoyant -f environment.yml
```

## Download Databases
CNVoyant requires ClinVar, phastCons, phyloP, and gnomAD SV data to annotate input CNVs. To download these resources, pass a dependency directory to the `DependencyBuilder` object as `data_dir`, then call its `get_databases` method.
```
from CNVoyant import DependencyBuilder

db = DependencyBuilder(data_dir = '/path/to/datadir')
db.get_databases()
```

Other intermediate files are also required. They can be built by calling `build_dependencies`.
```
db.build_dependencies()
```

## Build Features
CNVoyant features must be generated before predictions can be made. To generate them, call the `get_features` method of a `FeatureBuilder` object.
```
import pandas as pd
from CNVoyant import FeatureBuilder

# Initialize CNVoyant FeatureBuilder instance
fb = FeatureBuilder(data_dir = '/path/to/datadir')

# Create sample data
cnv_df = pd.DataFrame({
  'CHROM': ['chr1','chr2','chr3','chr4'],
  'START': [100000, 100000, 100000, 100000],
  'END': [200000, 200000, 200000, 200000],
  'CHANGE': ['DEL','DEL','DUP','DUP']
})

# Generate features
fb.get_features(cnv_df)
```
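Since `get_features` takes one row per CNV with `CHROM`, `START`, `END`, and `CHANGE` columns, it can help to sanity-check the input before building features. The following is a minimal sketch and not part of CNVoyant: `validate_cnv_records` is a hypothetical helper, and the rules it enforces (a `chr` prefix, only `DEL` or `DUP` in `CHANGE`, `START` smaller than `END`) are assumptions inferred from the sample data rather than documented requirements.

```python
REQUIRED_COLUMNS = ("CHROM", "START", "END", "CHANGE")
ALLOWED_CHANGES = {"DEL", "DUP"}  # assumption: only the values used in the sample data


def validate_cnv_records(records):
    """Return a list of human-readable problems found in a list of CNV dicts.

    Hypothetical helper, not part of the CNVoyant API.
    """
    problems = []
    for i, rec in enumerate(records):
        missing = [col for col in REQUIRED_COLUMNS if col not in rec]
        if missing:
            problems.append(f"row {i}: missing columns {missing}")
            continue
        # Assumption: chromosome names carry a 'chr' prefix, as in the sample data
        if not str(rec["CHROM"]).startswith("chr"):
            problems.append(f"row {i}: CHROM {rec['CHROM']!r} lacks 'chr' prefix")
        if rec["CHANGE"] not in ALLOWED_CHANGES:
            problems.append(f"row {i}: CHANGE {rec['CHANGE']!r} is not DEL or DUP")
        if not (isinstance(rec["START"], int) and isinstance(rec["END"], int)):
            problems.append(f"row {i}: START and END must be integers")
        elif rec["START"] >= rec["END"]:
            problems.append(f"row {i}: START must be smaller than END")
    return problems


records = [
    {"CHROM": "chr1", "START": 100000, "END": 200000, "CHANGE": "DEL"},
    {"CHROM": "chr3", "START": 200000, "END": 100000, "CHANGE": "DUP"},
    {"CHROM": "4", "START": 100000, "END": 200000, "CHANGE": "INS"},
]
problems = validate_cnv_records(records)
for problem in problems:
    print(problem)
```

Records that pass the check can be turned into the input frame with `pd.DataFrame(records)` and handed to `get_features`.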

## Generate Predictions
Pretrained models are included with CNVoyant. To generate predictions, call the `predict` method of a `Classifier` object.
```
from CNVoyant import Classifier

# Initialize CNVoyant Classifier instance
cl = Classifier()

# Generate predictions
cl.predict(fb.feature_df)
cnvoyant_preds = cl.preds
```

### Retrain CNVoyant Classifier
The CNVoyant models can be retrained on a user-specified set of variants, provided that a label is available for each variant. Label values must be 'Benign', 'VUS', or 'Pathogenic'. The name of the label column must be passed to the `train` method of the `Classifier` object via the `label` argument.
```
import pandas as pd
from CNVoyant import FeatureBuilder, Classifier

# Sample data
cnv_train_df = pd.DataFrame({
  'CHROM': ['chr1','chr2','chr3','chr4'],
  'START': [100000, 100000, 100000, 100000],
  'END': [200000, 200000, 200000, 200000],
  'CHANGE': ['DEL','DEL','DUP','DUP'],
  'LABEL': ['Benign','VUS','Pathogenic','Benign']
})

fb_train = FeatureBuilder(data_dir = '/path/to/datadir')
fb_train.get_features(cnv_train_df)

# Initialize CNVoyant Classifier instance
cl_retrained = Classifier()

# Retrain models
cl_retrained.train(fb_train.feature_df, label = 'LABEL')

# Generate predictions
cl_retrained.predict(fb.feature_df)
cnvoyant_retrained_preds = cl_retrained.preds
```
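Because retraining accepts only the labels 'Benign', 'VUS', and 'Pathogenic', it may be worth checking the label column before calling `train`. The sketch below is illustrative and not part of CNVoyant: `summarize_labels` is a hypothetical helper, and treating the labels as case-sensitive is an assumption.

```python
from collections import Counter

VALID_LABELS = {"Benign", "VUS", "Pathogenic"}  # the three accepted label values


def summarize_labels(labels):
    """Count labels per class and raise if any value is not an accepted label.

    Hypothetical helper, not part of the CNVoyant API.
    Assumption: label matching is case-sensitive.
    """
    invalid = sorted({str(label) for label in labels if label not in VALID_LABELS})
    if invalid:
        raise ValueError(f"unexpected label values: {invalid}")
    return Counter(labels)


counts = summarize_labels(["Benign", "VUS", "Pathogenic", "Benign"])
print(dict(counts))
```

With a training frame such as `cnv_train_df`, the same check could be run as `summarize_labels(cnv_train_df['LABEL'].tolist())`; the per-class counts also give a quick view of class balance before retraining.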
