Metadata-Version: 2.1
Name: fuc
Version: 0.14.0
Summary: Frequently used commands in bioinformatics
Home-page: https://github.com/sbslee/fuc
Author: Seung-been "Steven" Lee
Author-email: sbstevenlee@gmail.com
License: MIT
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Description-Content-Type: text/x-rst
Requires-Dist: biopython
Requires-Dist: lxml
Requires-Dist: matplotlib
Requires-Dist: matplotlib-venn
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: pyranges
Requires-Dist: pysam
Requires-Dist: scipy
Requires-Dist: seaborn

..
   This file was automatically generated by docs/create.py.

README
******

.. image:: https://badge.fury.io/py/fuc.svg
    :target: https://badge.fury.io/py/fuc

.. image:: https://readthedocs.org/projects/sbslee-fuc/badge/?version=latest
   :target: https://sbslee-fuc.readthedocs.io/en/latest/?badge=latest
   :alt: Documentation Status

.. image:: https://anaconda.org/bioconda/fuc/badges/version.svg
   :target: https://anaconda.org/bioconda/fuc

.. image:: https://anaconda.org/bioconda/fuc/badges/license.svg
   :target: https://github.com/sbslee/fuc/blob/main/LICENSE

.. image:: https://anaconda.org/bioconda/fuc/badges/downloads.svg
   :target: https://anaconda.org/bioconda/fuc/files

.. image:: https://anaconda.org/bioconda/fuc/badges/installer/conda.svg
   :target: https://conda.anaconda.org/bioconda

Introduction
============

The main goal of the fuc package (pronounced "eff-you-see") is to wrap some of the most **f**\ requently **u**\ sed **c**\ ommands in the field of bioinformatics into one place.

The package supports both a command line interface (CLI) and an application programming interface (API), whose documentation is available at `Read the Docs <https://sbslee-fuc.readthedocs.io/en/latest/>`_.

Currently, fuc can be used to analyze, summarize, visualize, and manipulate the following file formats:

- Sequence Alignment/Map (SAM)
- Binary Alignment/Map (BAM)
- CRAM
- Variant Call Format (VCF)
- Mutation Annotation Format (MAF)
- Browser Extensible Data (BED)
- FASTQ
- FASTA
- Delimiter-separated values (e.g. comma-separated values or CSV)

Additionally, fuc can be used to parse output data from the following programs:

- Ensembl Variant Effect Predictor (VEP)
- SnpEff
- bcl2fastq and bcl2fastq2

Your contributions (e.g. feature ideas, pull requests) are most welcome.

| Author: Seung-been "Steven" Lee
| Email: sbstevenlee@gmail.com
| License: MIT License

Installation
============

The following packages are required to run fuc:

.. parsed-literal::

   biopython
   lxml
   matplotlib
   matplotlib-venn
   numpy
   pandas
   pyranges
   pysam
   scipy
   seaborn

There are various ways you can install fuc. The recommended way is via conda:

.. code-block:: text

   $ conda install -c bioconda fuc

The above command will also download and install all of the dependencies automatically. Alternatively, you can use pip to install fuc and all of its dependencies:

.. code-block:: text

   $ pip install fuc

Finally, you can clone the GitHub repository and then install fuc locally:

.. code-block:: text

   $ git clone https://github.com/sbslee/fuc
   $ cd fuc
   $ pip install .

The advantage of this approach is that you will have access to development versions that are not yet available on Anaconda or PyPI. For example, you can switch to a development branch with the ``git checkout`` command. When you do this, please make sure that your environment already has all of the dependencies installed.
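
One way to confirm that the dependencies are importable before switching branches is a short check like the one below. It is a minimal sketch and not part of fuc; it assumes the import names ``Bio`` (for biopython) and ``matplotlib_venn`` (for matplotlib-venn), which differ from the distribution names listed above.

.. code:: python3

   import importlib.util

   # Illustrative sketch, not part of fuc. Maps each distribution name to the
   # module name used at import time (assumed: biopython -> Bio,
   # matplotlib-venn -> matplotlib_venn).
   REQUIRED = {
       "biopython": "Bio",
       "lxml": "lxml",
       "matplotlib": "matplotlib",
       "matplotlib-venn": "matplotlib_venn",
       "numpy": "numpy",
       "pandas": "pandas",
       "pyranges": "pyranges",
       "pysam": "pysam",
       "scipy": "scipy",
       "seaborn": "seaborn",
   }

   def missing_dependencies(required=REQUIRED):
       """Return the distribution names whose modules cannot be found."""
       return [dist for dist, module in required.items()
               if importlib.util.find_spec(module) is None]

   if __name__ == "__main__":
       missing = missing_dependencies()
       if missing:
           print("Missing:", ", ".join(missing))
       else:
           print("All dependencies are installed.")

An empty list from ``missing_dependencies()`` means every package was found in the current environment.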

Getting Help
============

For detailed documentation on the fuc package's CLI and API, please refer to `Read the Docs <https://sbslee-fuc.readthedocs.io/en/latest/>`_.

To get help on the fuc CLI:

.. code-block:: text

   $ fuc -h
   usage: fuc [-h] [-v] COMMAND ...

   positional arguments:
     COMMAND        name of the command
       bam_head     [BAM] print the header of a SAM/BAM/CRAM file
       bam_index    [BAM] index a SAM/BAM/CRAM file
       bam_rename   [BAM] rename the samples in a SAM/BAM/CRAM file
       bam_slice    [BAM] slice a SAM/BAM/CRAM file
       bed_intxn    [BED] find intersection of two or more BED files
       bed_sum      [BED] summarize a BED file
       fq_count     [FASTQ] count sequence reads in FASTQ files
       fq_sum       [FASTQ] summarize a FASTQ file
       fuc_compf    [FUC] compare contents of two files
       fuc_demux    [FUC] parse Reports directory from bcl2fastq or bcl2fastq2
       fuc_exist    [FUC] check whether files/directories exist
       fuc_find     [FUC] find files with certain extension recursively
       maf_maf2vcf  [MAF] convert a MAF file to a VCF file
       maf_oncoplt  [MAF] create an oncoplot with a MAF file
       maf_sumplt   [MAF] create a summary plot with a MAF file
       maf_vcf2maf  [MAF] convert an annotated VCF file to a MAF file
       tbl_merge    [TABLE] merge two table files
       tbl_sum      [TABLE] summarize a table file
       vcf_filter   [VCF] filter a VCF file
       vcf_merge    [VCF] merge two or more VCF files
       vcf_rename   [VCF] rename the samples in a VCF file.
       vcf_slice    [VCF] slice a VCF file
       vcf_vcf2bed  [VCF] convert a VCF file to a BED file
       vcf_vep      [VCF] filter a VCF file annotated by Ensemble VEP

   optional arguments:
     -h, --help     show this help message and exit
     -v, --version  show the version number and exit

To get help on a specific command (e.g. vcf_merge):

.. code-block:: text

   $ fuc vcf_merge -h

Below is the list of submodules available in the fuc API:

- **common** : The common submodule is used by other fuc submodules such as pyvcf and pybed. It also provides many day-to-day actions used in the field of bioinformatics.
- **pybam** : The pybam submodule is designed for working with sequence alignment files (SAM/BAM/CRAM). It essentially wraps the `pysam <https://pysam.readthedocs.io/en/latest/api.html>`_ package to allow fast computation and easy manipulation.
- **pybed** : The pybed submodule is designed for working with BED files. It implements ``pybed.BedFrame`` which stores BED data as ``pandas.DataFrame`` via the `pyranges <https://github.com/biocore-ntnu/pyranges>`_ package to allow fast computation and easy manipulation. The submodule strictly adheres to the standard `BED specification <https://genome.ucsc.edu/FAQ/FAQformat.html>`_.
- **pycov** : The pycov submodule is designed for working with depth of coverage data from sequence alignment files (SAM/BAM/CRAM). It implements ``pycov.CovFrame`` which stores read depth data as ``pandas.DataFrame`` via the `pysam <https://pysam.readthedocs.io/en/latest/api.html>`_ package to allow fast computation and easy manipulation.
- **pyfq** : The pyfq submodule is designed for working with FASTQ files. It implements ``pyfq.FqFrame`` which stores FASTQ data as ``pandas.DataFrame`` to allow fast computation and easy manipulation.
- **pymaf** : The pymaf submodule is designed for working with MAF files. It implements ``pymaf.MafFrame`` which stores MAF data as ``pandas.DataFrame`` to allow fast computation and easy manipulation. The ``pymaf.MafFrame`` class also contains many useful plotting methods such as ``MafFrame.plot_oncoplot`` and ``MafFrame.plot_summary``. The submodule strictly adheres to the standard `MAF specification <https://docs.gdc.cancer.gov/Data/File_Formats/MAF_Format/>`_.
- **pysnpeff** : The pysnpeff submodule is designed for parsing VCF annotation data from the `SnpEff <https://pcingola.github.io/SnpEff/>`_ program. It should be used with ``pyvcf.VcfFrame``.
- **pyvcf** : The pyvcf submodule is designed for working with VCF files. It implements ``pyvcf.VcfFrame`` which stores VCF data as ``pandas.DataFrame`` to allow fast computation and easy manipulation. The ``pyvcf.VcfFrame`` class also contains many useful plotting methods such as ``VcfFrame.plot_comparison`` and ``VcfFrame.plot_tmb``. The submodule strictly adheres to the standard `VCF specification <https://samtools.github.io/hts-specs/VCFv4.3.pdf>`_.
- **pyvep** : The pyvep submodule is designed for parsing VCF annotation data from the `Ensembl VEP <https://asia.ensembl.org/info/docs/tools/vep/index.html>`_ program. It should be used with ``pyvcf.VcfFrame``.

To get help on a specific submodule (e.g. pyvcf):

.. code:: python3

   >>> from fuc import pyvcf
   >>> help(pyvcf)

CLI Examples
============

**BAM**

To print the header of a SAM file:

.. code-block:: text

   $ fuc bam_head in.sam

To index a CRAM file:

.. code-block:: text

   $ fuc bam_index in.cram

To slice a BAM file:

.. code-block:: text

   $ fuc bam_slice in.bam chr1:100-200 out.bam
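
The ``chr1:100-200`` argument uses the ``contig:start-end`` region notation familiar from samtools, in which coordinates are 1-based and both ends are inclusive (assumed here to apply to ``bam_slice`` as well). The sketch below uses a hypothetical ``parse_region`` helper, which is not part of fuc, to show how such a string breaks down:

.. code:: python3

   import re

   # Hypothetical helper (not a fuc function). Assumes samtools-style regions
   # with 1-based, inclusive coordinates, e.g. "chr1:100-200" or just "chr1".
   # Contig names containing ":" are not supported by this simple pattern.
   REGION_PATTERN = re.compile(r"^(?P<contig>[^:]+)(?::(?P<start>\d+)-(?P<end>\d+))?$")

   def parse_region(region):
       """Return (contig, start, end); start and end are None for a whole contig."""
       match = REGION_PATTERN.match(region.replace(",", ""))  # allow "1,000"
       if match is None:
           raise ValueError(f"Invalid region: {region!r}")
       contig = match.group("contig")
       if match.group("start") is None:
           return contig, None, None
       start, end = int(match.group("start")), int(match.group("end"))
       if start > end:
           raise ValueError(f"Start is greater than end: {region!r}")
       return contig, start, end

For example, ``parse_region('chr1:100-200')`` returns ``('chr1', 100, 200)``, a 101-base span because both ends are included.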

**BED**

To find the intersection of BED files:

.. code-block:: text

   $ fuc bed_intxn 1.bed 2.bed 3.bed > intersect.bed
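
Conceptually, intersecting BED files keeps only the regions covered by every input. The following is a minimal sketch of that idea on in-memory intervals, assuming each input is a list of 0-based, half-open ``(chrom, start, end)`` tuples as in the BED specification. It is an illustration only; the pybed submodule builds on pyranges rather than on code like this, and the function names are hypothetical.

.. code:: python3

   from collections import defaultdict
   from functools import reduce

   # Illustrative sketch, not fuc's implementation.

   def _merge(intervals):
       """Group intervals by chromosome, then sort and merge overlapping ones."""
       by_chrom = defaultdict(list)
       for chrom, start, end in intervals:
           by_chrom[chrom].append((start, end))
       merged = {}
       for chrom, spans in by_chrom.items():
           spans.sort()
           out = [list(spans[0])]
           for start, end in spans[1:]:
               if start <= out[-1][1]:  # overlaps or touches the previous span
                   out[-1][1] = max(out[-1][1], end)
               else:
                   out.append([start, end])
           merged[chrom] = out
       return merged

   def intersect_two(a, b):
       """Return the 0-based, half-open regions covered by both interval lists."""
       a, b = _merge(a), _merge(b)
       result = []
       for chrom in sorted(a.keys() & b.keys()):
           x, y = a[chrom], b[chrom]
           i = j = 0
           while i < len(x) and j < len(y):
               start = max(x[i][0], y[j][0])
               end = min(x[i][1], y[j][1])
               if start < end:  # half-open: an empty overlap has start == end
                   result.append((chrom, start, end))
               # Advance whichever span ends first.
               if x[i][1] < y[j][1]:
                   i += 1
               else:
                   j += 1
       return result

   def intersect_all(*interval_lists):
       """Intersect two or more interval lists, like intersecting BED files."""
       return reduce(intersect_two, interval_lists)

With ``a = [('chr1', 100, 200), ('chr1', 300, 400)]`` and ``b = [('chr1', 150, 350)]``, ``intersect_all(a, b)`` returns ``[('chr1', 150, 200), ('chr1', 300, 350)]``.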

**FASTQ**

To count sequence reads in a FASTQ file:

.. code-block:: text

   $ fuc fq_count example.fastq
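
A FASTQ record normally spans four lines (header, sequence, ``+`` separator, quality), so counting reads amounts to counting lines and dividing by four. The sketch below assumes that standard four-line layout (no wrapped sequence lines) and opens gzip input based on the ``.gz`` extension; it is an illustration with a hypothetical function name, not the ``fq_count`` implementation.

.. code:: python3

   import gzip

   # Illustrative sketch, not fuc's implementation.
   # Assumes every record occupies exactly four lines.
   def count_fastq_reads(path):
       """Count reads in a (optionally gzipped) FASTQ file."""
       opener = gzip.open if path.endswith(".gz") else open
       with opener(path, "rt") as handle:
           n_lines = sum(1 for _ in handle)
       if n_lines % 4 != 0:
           raise ValueError(f"{path}: line count {n_lines} is not a multiple of 4")
       return n_lines // 4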

**FUC**

To check whether a file exists in the operating system:

.. code-block:: text

   $ fuc fuc_exist example.txt

To find all VCF files within the current directory recursively:

.. code-block:: text

   $ fuc fuc_find .vcf.gz
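
For comparison, a similar recursive search can be written with Python's standard library. This sketch assumes that ``fuc_find`` matches files whose names end with the given extension; check ``fuc fuc_find -h`` for the command's exact behavior. The ``find_files`` name is hypothetical.

.. code:: python3

   from pathlib import Path

   # Illustrative sketch, not fuc's implementation.
   def find_files(root, extension):
       """Recursively list files under root whose names end with extension."""
       return sorted(str(p) for p in Path(root).rglob("*")
                     if p.is_file() and p.name.endswith(extension))

Calling ``find_files('.', '.vcf.gz')`` returns the sorted paths of all matching files below the current directory.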

**TABLE**

To merge two tab-delimited files:

.. code-block:: text

   $ fuc tbl_merge left.tsv right.tsv > merged.tsv
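
Merging two tables joins rows that share a value in a common column. The following minimal sketch performs an inner join of two tab-delimited files on one shared key column using only the ``csv`` module. The function name and its ``on`` parameter are hypothetical, and the join types and options of ``tbl_merge`` itself may differ (see ``fuc tbl_merge -h``).

.. code:: python3

   import csv

   # Illustrative sketch, not fuc's implementation.
   def merge_tsv(left_path, right_path, on, out_path):
       """Inner-join two TSV files on the shared column `on`; return rows written."""
       with open(right_path, newline="") as handle:
           right = csv.DictReader(handle, delimiter="\t")
           right_extra = [c for c in right.fieldnames if c != on]
           index = {}  # key value -> list of matching right-hand rows
           for row in right:
               index.setdefault(row[on], []).append(row)

       written = 0
       with open(left_path, newline="") as lh, open(out_path, "w", newline="") as oh:
           left = csv.DictReader(lh, delimiter="\t")
           fields = left.fieldnames + [c for c in right_extra if c not in left.fieldnames]
           writer = csv.DictWriter(oh, fieldnames=fields, delimiter="\t")
           writer.writeheader()
           for row in left:
               # Unmatched left rows are dropped (inner join); on a column-name
               # clash, the right-hand value wins.
               for match in index.get(row[on], []):
                   merged = dict(row)
                   merged.update({c: match[c] for c in right_extra})
                   writer.writerow(merged)
                   written += 1
       return written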

**VCF**

To merge VCF files:

.. code-block:: text

   $ fuc vcf_merge 1.vcf 2.vcf 3.vcf > merged.vcf

To filter a VCF file annotated by Ensembl VEP:

.. code-block:: text

   $ fuc vcf_vep in.vcf 'SYMBOL == "TP53"' > out.vcf
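
Ensembl VEP stores its annotations in the ``CSQ`` INFO field. Multiple annotations (e.g. one per transcript) are separated by commas, the values within each annotation are separated by pipe characters, and the field order is declared after ``Format:`` in the description of the ``##INFO=<ID=CSQ,...>`` header line. As a rough sketch of what a query such as ``SYMBOL == "TP53"`` has to do under that layout, here is a standard-library filter that keeps a record when any of its annotations matches. The helper names are hypothetical and this is not how ``vcf_vep`` is implemented.

.. code:: python3

   import re

   # Illustrative sketch, not fuc's implementation.

   def csq_fields(header_lines):
       """Return the CSQ field names declared in the VEP header line."""
       for line in header_lines:
           if line.startswith("##INFO=<ID=CSQ"):
               match = re.search(r'Format: ([^"]+)"', line)
               if match:
                   return match.group(1).split("|")
       raise ValueError("No CSQ header line found")

   def has_symbol(record, fields, symbol):
       """True if any CSQ annotation of a VCF data line has SYMBOL == symbol."""
       info = record.rstrip("\n").split("\t")[7]
       for item in info.split(";"):
           if item.startswith("CSQ="):
               for annotation in item[len("CSQ="):].split(","):
                   values = dict(zip(fields, annotation.split("|")))
                   if values.get("SYMBOL") == symbol:
                       return True
       return False

   def filter_vcf_by_symbol(lines, symbol):
       """Yield header lines plus data lines annotated with the given symbol."""
       lines = list(lines)
       fields = csq_fields([line for line in lines if line.startswith("#")])
       for line in lines:
           if line.startswith("#") or has_symbol(line, fields, symbol):
               yield line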

API Examples
============

**BAM**

To create a read depth profile of a region from a CRAM file:

.. code:: python3

    >>> from fuc import pycov
    >>> cf = pycov.CovFrame.from_file('HG00525.final.cram', zero=True,
    ...    region='chr12:21161194-21239796', names=['HG00525'])
    >>> cf.plot_region('chr12', start=21161194, end=21239796)

.. image:: https://raw.githubusercontent.com/sbslee/fuc-data/main/images/coverage.png
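
A read depth profile records, for each position, how many aligned reads cover it. The sketch below illustrates that idea with a difference array over hypothetical 0-based, half-open read spans. In practice ``pycov`` obtains depth from the alignment file via pysam, so this is only a conceptual illustration; ``depth_profile`` is not a fuc function.

.. code:: python3

   # Illustrative sketch, not fuc's implementation.
   def depth_profile(read_spans, region_start, region_end):
       """Per-base read depth over [region_start, region_end).

       read_spans: iterable of (start, end) alignments, 0-based and half-open.
       Returns a list whose i-th item is the depth at region_start + i,
       including positions with zero coverage.
       """
       length = region_end - region_start
       diff = [0] * (length + 1)
       for start, end in read_spans:
           # Clip each read to the region before recording it.
           s = max(start, region_start) - region_start
           e = min(end, region_end) - region_start
           if s < e:
               diff[s] += 1  # coverage begins here
               diff[e] -= 1  # and stops just before here
       depth, running = [], 0
       for i in range(length):
           running += diff[i]
           depth.append(running)
       return depth

For two overlapping reads, ``depth_profile([(0, 3), (2, 5)], 0, 6)`` returns ``[1, 1, 2, 1, 1, 0]``.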

**VCF**

To filter a VCF file based on a BED file:

.. code:: python3

   >>> from fuc import pyvcf
   >>> vf = pyvcf.VcfFrame.from_file('original.vcf')
   >>> filtered_vf = vf.filter_bed('targets.bed')
   >>> filtered_vf.to_file('filtered.vcf')
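
A detail worth keeping in mind when filtering a VCF file with a BED file is that VCF positions are 1-based while BED intervals are 0-based and half-open, so a record at ``POS`` lies in the BED interval ``[start, end)`` when ``start < POS <= end``. The standard-library sketch below shows only that coordinate check on in-memory data; the helper names are hypothetical, indel length is ignored, and it does not reproduce how ``filter_bed`` performs the filtering.

.. code:: python3

   # Illustrative sketch, not fuc's implementation.

   def in_bed(chrom, pos, bed_intervals):
       """Return True if 1-based VCF position `pos` falls in any BED interval.

       BED intervals are (chrom, start, end) in 0-based, half-open coordinates,
       so 1-based base number `pos` is covered when start < pos <= end.
       """
       return any(c == chrom and start < pos <= end
                  for c, start, end in bed_intervals)

   def filter_records(records, bed_intervals):
       """Keep (chrom, pos, ...) records whose position lies in a BED interval."""
       return [r for r in records if in_bed(r[0], r[1], bed_intervals)]

For example, the BED line ``chr1 0 100`` covers VCF positions 1 through 100 on chr1.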

To remove indels from a VCF file:

.. code:: python3

   >>> from fuc import pyvcf
   >>> vf = pyvcf.VcfFrame.from_file('with_indels.vcf')
   >>> filtered_vf = vf.filter_indel()
   >>> filtered_vf.to_file('no_indels.vcf')
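
A variant is commonly treated as an indel when its REF and ALT alleles differ in length. The sketch below applies that simplified rule to the REF and comma-separated ALT columns of each VCF data line. The rule and the function names are assumptions for illustration; ``filter_indel`` may handle cases such as symbolic alleles (e.g. ``<DEL>``) or multiallelic sites differently.

.. code:: python3

   # Illustrative sketch, not fuc's implementation.

   def is_indel(ref, alt):
       """Return True if any ALT allele differs in length from REF.

       Symbolic alleles such as <DEL>, the spanning-deletion allele '*',
       and the missing value '.' are skipped rather than classified.
       """
       for allele in alt.split(","):
           if allele.startswith("<") or allele in ("*", "."):
               continue
           if len(allele) != len(ref):
               return True
       return False

   def drop_indels(vcf_lines):
       """Keep header lines and data lines that are not indels."""
       kept = []
       for line in vcf_lines:
           if line.startswith("#"):
               kept.append(line)
               continue
           fields = line.split("\t")
           if not is_indel(fields[3], fields[4]):  # REF and ALT columns
               kept.append(line)
       return kept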

To create a Venn diagram showing genotype concordance between groups:

.. code:: python3

    >>> from fuc import pyvcf, common
    >>> common.load_dataset('pyvcf')
    >>> f = '~/fuc-data/pyvcf/plot_comparison.vcf'
    >>> vf = pyvcf.VcfFrame.from_file(f)
    >>> a = ['Steven_A', 'John_A', 'Sara_A']
    >>> b = ['Steven_B', 'John_B', 'Sara_B']
    >>> c = ['Steven_C', 'John_C', 'Sara_C']
    >>> vf.plot_comparison(a, b, c)

.. image:: https://raw.githubusercontent.com/sbslee/fuc-data/main/images/plot_comparison.png
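
The Venn diagram compares the genotype calls of matched samples across groups. As a rough sketch of the counting involved, the code below represents each call as its site plus genotype, skips homozygous-reference and missing calls, ignores phasing, and counts the seven Venn regions for one sample per group. These choices, the record layout, and the function names are assumptions of this illustration, not a description of how ``plot_comparison`` counts variants.

.. code:: python3

   # Illustrative sketch, not fuc's implementation.

   def called_genotypes(records, sample):
       """Collect (chrom, pos, ref, alt, genotype) keys with a non-reference call.

       records: list of dicts mapping VCF column names to values (hypothetical
       layout). Phasing is ignored by normalising '|' to '/'.
       """
       keys = set()
       for r in records:
           gt = r[sample].split(":")[0].replace("|", "/")
           if gt not in ("0/0", "./.", "."):
               keys.add((r["CHROM"], r["POS"], r["REF"], r["ALT"], gt))
       return keys

   def venn3_counts(a, b, c):
       """Count items in each region of a three-set Venn diagram.

       Keys use an "abc" membership code: "100" = only in a,
       "110" = in a and b but not c, "111" = in all three, and so on.
       """
       a, b, c = set(a), set(b), set(c)
       return {
           "100": len(a - b - c),
           "010": len(b - a - c),
           "001": len(c - a - b),
           "110": len((a & b) - c),
           "101": len((a & c) - b),
           "011": len((b & c) - a),
           "111": len(a & b & c),
       }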

To create various figures for normal-tumor analysis:

.. code:: python3

    >>> import matplotlib.pyplot as plt
    >>> from fuc import common, pyvcf
    >>> common.load_dataset('pyvcf')
    >>> vf = pyvcf.VcfFrame.from_file('~/fuc-data/pyvcf/normal-tumor.vcf')
    >>> af = pyvcf.AnnFrame.from_file('~/fuc-data/pyvcf/normal-tumor-annot.tsv', 'Sample')
    >>> normal = af.df[af.df.Tissue == 'Normal'].index
    >>> tumor = af.df[af.df.Tissue == 'Tumor'].index
    >>> fig, [[ax1, ax2], [ax3, ax4]] = plt.subplots(2, 2, figsize=(10, 10))
    >>> vf.plot_tmb(ax=ax1)
    >>> vf.plot_tmb(ax=ax2, af=af, hue='Tissue')
    >>> vf.plot_hist('DP', ax=ax3, af=af, hue='Tissue')
    >>> vf.plot_regplot(normal, tumor, ax=ax4)
    >>> plt.tight_layout()

.. image:: https://raw.githubusercontent.com/sbslee/fuc-data/main/images/normal-tumor.png

**MAF**

To create an oncoplot with a MAF file:

.. code:: python3

    >>> from fuc import common, pymaf
    >>> common.load_dataset('tcga-laml')
    >>> f = '~/fuc-data/tcga-laml/tcga_laml.maf.gz'
    >>> mf = pymaf.MafFrame.from_file(f)
    >>> mf.plot_oncoplot()

.. image:: https://raw.githubusercontent.com/sbslee/fuc-data/main/images/oncoplot.png
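
An oncoplot is essentially a gene-by-sample grid colored by mutation type. The sketch below builds such a grid from MAF text using the standard MAF columns ``Hugo_Symbol``, ``Tumor_Sample_Barcode`` and ``Variant_Classification``. Labelling several mutations in the same gene and sample as ``Multi_Hit`` and ranking genes by the number of mutated samples are assumptions of this illustration, and ``oncoplot_matrix`` is a hypothetical name rather than part of ``pymaf``.

.. code:: python3

   import csv
   from collections import defaultdict

   # Illustrative sketch, not fuc's implementation.
   def oncoplot_matrix(maf_lines, top=10):
       """Build a {gene: {sample: classification}} grid from MAF text lines."""
       rows = csv.DictReader(
           (line for line in maf_lines if not line.startswith("#")),  # skip "#version"
           delimiter="\t")
       hits = defaultdict(lambda: defaultdict(list))
       for row in rows:
           hits[row["Hugo_Symbol"]][row["Tumor_Sample_Barcode"]].append(
               row["Variant_Classification"])
       # Rank genes by the number of mutated samples (ties broken by gene name).
       genes = sorted(hits, key=lambda g: (-len(hits[g]), g))[:top]
       return {
           gene: {sample: calls[0] if len(calls) == 1 else "Multi_Hit"
                  for sample, calls in hits[gene].items()}
           for gene in genes
       }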

To create a customized oncoplot with a MAF file, see the 'Create customized oncoplot' tutorial:

.. image:: https://raw.githubusercontent.com/sbslee/fuc-data/main/images/customized_oncoplot.png

To create a summary figure for a MAF file:

.. code:: python3

    >>> from fuc import common, pymaf
    >>> common.load_dataset('tcga-laml')
    >>> f = '~/fuc-data/tcga-laml/tcga_laml.maf.gz'
    >>> mf = pymaf.MafFrame.from_file(f)
    >>> mf.plot_summary()

.. image:: https://raw.githubusercontent.com/sbslee/fuc-data/main/images/maf_summary.png



