Metadata-Version: 2.4
Name: rbceq2
Version: 2.3.1rc1
Summary: Call ISBT alleles from VCF/s
Author-email: Liam McIntyre <limcintyre@redcrossblood.org.au>
License: Copyright 2025 Australian Red Cross Lifeblood
        
        Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas>=2.2.2
Requires-Dist: polars>=0.20.26
Requires-Dist: icecream>=2.1
Requires-Dist: loguru>=0.7.2
Requires-Dist: pyarrow>=18.1
Requires-Dist: reportlab>=4.3.1
Provides-Extra: dev
Requires-Dist: flake8>=3.9.2; extra == "dev"
Requires-Dist: coverage>=7.0; extra == "dev"
Dynamic: license-file

<table>
  <tr>
    <td>
      <h1>RBCeq2: blood group allele inference</h1>
    </td>
    <td align="right">
      <img src="images/Lifeblood-R_Primary_Keyline_RGB.jpg" alt="Lifeblood Logo" width="150">
    </td>
  </tr>
</table>

> [!WARNING]
> NOT FOR CLINICAL USE

## Version v2.3.1

RBCeq2 reads genomic variant data from Variant Call Format (VCF) files and outputs blood group (BG) genotype and phenotype inferences.

At the highest level, RBCeq2 finds all possible alleles, then filters out those that fail certain logic checks. This leaves an auditable trail of why it reached a given result. Every effort has been made to be explicit, both in encoding alleles in our database and in writing the code. This results in verbose but unambiguous output. Finally, some liberties have been taken to standardise syntax and nomenclature across blood groups.
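
The find-then-filter idea can be illustrated with a toy sketch. This is not RBCeq2's actual code: the `Allele` class, the `not_subsumed` check, the allele names and the variant labels are all hypothetical placeholders, chosen only to show how recording the outcome of every check gives an audit trail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Allele:
    name: str                     # placeholder label, not a real ISBT allele
    defining_variants: frozenset  # placeholder variant identifiers


def find_candidates(database, observed):
    """Step 1: keep every allele whose defining variants were all observed."""
    return [a for a in database if a.defining_variants <= observed]


def not_subsumed(allele, candidates):
    """Illustrative check (an assumption, not a documented RBCeq2 rule):
    fail if another candidate's variants strictly contain this allele's."""
    return not any(
        allele.defining_variants < other.defining_variants for other in candidates
    )


def filter_with_audit(candidates, checks):
    """Step 2: run every check on every candidate and record the outcome."""
    kept, audit = [], []
    for allele in candidates:
        failed = [name for name, check in checks if not check(allele, candidates)]
        outcome = "kept" if not failed else "removed: " + ", ".join(failed)
        audit.append((allele.name, outcome))
        if not failed:
            kept.append(allele)
    return kept, audit


# Toy database and observations (entirely made up for illustration).
database = [
    Allele("TOY*01", frozenset({"v1"})),
    Allele("TOY*02", frozenset({"v1", "v2"})),
    Allele("TOY*03", frozenset({"v3"})),
]
observed = frozenset({"v1", "v2"})

candidates = find_candidates(database, observed)
kept, audit = filter_with_audit(candidates, [("not_subsumed", not_subsumed)])
for name, outcome in audit:
    print(name, outcome)
```

In this toy case `TOY*03` never becomes a candidate, `TOY*01` is removed with the reason recorded, and `TOY*02` is kept.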

The initial release of RBCeq2 focused on perfecting the calling of International Society of Blood Transfusion (ISBT) defined BG alleles from simple variants: single nucleotide variants (SNVs) and small insertions and deletions (indels). It also supported long-read-derived VCFs (i.e. large indels and phased data), although these features were less polished. This release (v2.3.1) includes major improvements to the phasing logic; see section 7 of the docs and the change log for details.

## Bugs

This software is extensively tested and accurately reports genotypes/phenotypes based on our in-house definitions of the ‘correct’ answer. However, there are some cases where the ‘correct’ answer is subjective. The docs are detailed: if you find what you think is a bug in the results from RBCeq2, please take the time to understand whether it is in line with what we intended (use --debug and look to see what happened). We will endeavor to fix any black-and-white bugs ASAP. Most of these will be rare variants that are encoded incorrectly in our database. We value any and all feedback and feature requests.

## Documentation

Documentation can be downloaded from the release page; you will need to be signed in to GitHub to access it.

## How To

Install via pip (Python 3.12+) or clone the git repository:

```bash
pip install RBCeq2

rbceq2 -h

usage: rbceq2 --vcf example_multi_sample.vcf.gz --out example --reference_genome GRCh37

options:
  -h, --help            show this help message and exit
  -v, --version         Show programs version number and exit.
  --vcf VCF             Path to VCF file/s. Give a folder if you want to pass multiple separate files (file names must end in .vcf or .vcf.gz), or alternatively give a file if using a multi-sample VCF. (default: None)
  --out OUT             Prefix for output files (default: None)
  --depth DEPTH         Minimum number of reads for a variant (default: 10)
  --quality QUALITY     Minimum average genotype quality for a variant (default: 10)
  --processes PROCESSES
                        Number of processes. I.e., how many CPUs are available? ~1GB RAM required per process (default: 1)
  --reference_genome {GRCh37,GRCh38}
                        GRCh37/8 (default: None)
  --phased              Use phase information (default: False)
  --microarray          Input is from a microarray. (default: False)
  --debug               Enable debug logging. If not set, logging will be at info level. (default: False)
  --validate            Enable VCF validation. Doubles run time. Might help you identify input issues (default: False)
  --PDFs                Generate a per sample PDF report (default: False)
  --HPAs                Generate results for HPA (default: False)
```
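
When driving RBCeq2 from a script, it can help to check the input folder and assemble the command line before launching anything. The sketch below uses only flags shown in the help text above. The helper names `list_vcfs` and `build_command` are hypothetical and not part of RBCeq2. The block only builds and prints the argument list; it does not run the tool.

```python
from pathlib import Path


def list_vcfs(folder):
    """Return files in `folder` whose names end in .vcf or .vcf.gz, sorted."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.name.endswith((".vcf", ".vcf.gz"))
    )


def build_command(vcf, out, reference_genome, phased=False, processes=1):
    """Assemble an rbceq2 argument list from the documented options."""
    if reference_genome not in ("GRCh37", "GRCh38"):
        raise ValueError("reference_genome must be GRCh37 or GRCh38")
    cmd = [
        "rbceq2",
        "--vcf", str(vcf),
        "--out", str(out),
        "--reference_genome", reference_genome,
        "--processes", str(processes),
    ]
    if phased:
        cmd.append("--phased")  # only meaningful for phased input
    return cmd


# Mirrors the usage line from the help text.
cmd = build_command("example_multi_sample.vcf.gz", "example", "GRCh37")
print(" ".join(cmd))
```

Once RBCeq2 is installed, the resulting list can be passed to `subprocess.run(cmd, check=True)`.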
