Metadata-Version: 2.1
Name: poola-be
Version: 0.0.1
Summary: Python package to analyze the results of base editor screens
Home-page: https://github.com/gpprnd/poola_be/tree/master/
Author: Mudra Hegde
Author-email: mhegde@broadinstitute.org
License: Apache Software License 2.0
Keywords: CRISPR,python,base editor
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: License :: OSI Approved :: Apache Software License
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: numpy (>=1.19.5)
Requires-Dist: pandas (>=1.0.0)
Requires-Dist: scikit-learn (>=0.24.1)

# poola_be
> Python package for base editor screens


## Install

`pip install poola_be`

## How to use

To demonstrate these functions, we start from a base editor tiling library whose guides tile transcript ENST00000380152 of BRCA2. Each guide is annotated with the edits predicted for a C>T base editor with an editing window spanning nucleotides 4-8.

```python
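# Import the core module; the alias `pool_be` is used in the rest of this walkthrough
from poola_be import core as pool_be
import pandas as pd

# Load the tab-delimited guide design file with predicted edits
design_df = pd.read_csv('sample_input/crisprbe-guides.txt', sep='\t')
design_df.head()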
```




<div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Input</th>
      <th>CRISPR Enzyme</th>
      <th>Edit Type</th>
      <th>Edit Window</th>
      <th>Target Assembly</th>
      <th>Target Genome Sequence</th>
      <th>Target Gene ID</th>
      <th>Target Gene Symbol</th>
      <th>Target Gene Strand</th>
      <th>Target Transcript ID</th>
      <th>...</th>
      <th>PAM Sequence</th>
      <th>sgRNA Target Sequence Start Pos. (global)</th>
      <th>sgRNA Orientation</th>
      <th>Nucleotide Edits (global)</th>
      <th>Guide Edits</th>
      <th>Nucleotide Edits</th>
      <th>Amino Acid Edits</th>
      <th>Mutation Category</th>
      <th>Constraint Violations</th>
      <th>Note</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>ENST00000380152</td>
      <td>SpyoCas9</td>
      <td>C-T</td>
      <td>4..8</td>
      <td>GRCh38 (9606)</td>
      <td>NC_000013.11</td>
      <td>ENSG00000139618</td>
      <td>BRCA2</td>
      <td>+</td>
      <td>ENST00000380152.8</td>
      <td>...</td>
      <td>TGG</td>
      <td>32316449</td>
      <td>sense</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>1</th>
      <td>ENST00000380152</td>
      <td>SpyoCas9</td>
      <td>C-T</td>
      <td>4..8</td>
      <td>GRCh38 (9606)</td>
      <td>NC_000013.11</td>
      <td>ENSG00000139618</td>
      <td>BRCA2</td>
      <td>+</td>
      <td>ENST00000380152.8</td>
      <td>...</td>
      <td>AGG</td>
      <td>32316462</td>
      <td>sense</td>
      <td>32316465C&gt;T</td>
      <td>C_4</td>
      <td>5C&gt;T</td>
      <td>Pro2Leu</td>
      <td>Missense</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>2</th>
      <td>ENST00000380152</td>
      <td>SpyoCas9</td>
      <td>C-T</td>
      <td>4..8</td>
      <td>GRCh38 (9606)</td>
      <td>NC_000013.11</td>
      <td>ENSG00000139618</td>
      <td>BRCA2</td>
      <td>+</td>
      <td>ENST00000380152.8</td>
      <td>...</td>
      <td>AGG</td>
      <td>32316467</td>
      <td>antisense</td>
      <td>32316479G&gt;A;32316481G&gt;A, 32316483G&gt;A</td>
      <td>C_8_6, C_4</td>
      <td>19G&gt;A;21G&gt;A, 23G&gt;A</td>
      <td>Glu7Lys, Arg8Lys</td>
      <td>Missense, Missense</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>3</th>
      <td>ENST00000380152</td>
      <td>SpyoCas9</td>
      <td>C-T</td>
      <td>4..8</td>
      <td>GRCh38 (9606)</td>
      <td>NC_000013.11</td>
      <td>ENSG00000139618</td>
      <td>BRCA2</td>
      <td>+</td>
      <td>ENST00000380152.8</td>
      <td>...</td>
      <td>TGG</td>
      <td>32316477</td>
      <td>antisense</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>4</th>
      <td>ENST00000380152</td>
      <td>SpyoCas9</td>
      <td>C-T</td>
      <td>4..8</td>
      <td>GRCh38 (9606)</td>
      <td>NC_000013.11</td>
      <td>ENSG00000139618</td>
      <td>BRCA2</td>
      <td>+</td>
      <td>ENST00000380152.8</td>
      <td>...</td>
      <td>TGG</td>
      <td>32316488</td>
      <td>antisense</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>NaN</td>
    </tr>
  </tbody>
</table>
<p>5 rows × 23 columns</p>
</div>
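
The `Guide Edits` column in the rows above is consistent with counting positions from 1 at the first nucleotide of the guide and looking for Cs at positions 4 through 8. The following is a minimal sketch of that check, not part of `poola_be`; the function name `editable_positions` and the position convention are assumptions made for illustration.

```python
def editable_positions(guide, window=(4, 8), base="C"):
    """Return 1-based positions of `base` inside the inclusive edit window.

    Hypothetical helper for illustration only; it assumes positions are
    counted from the first nucleotide of the guide sequence.
    """
    start, end = window
    return [
        pos
        for pos, nucleotide in enumerate(guide.upper(), start=1)
        if start <= pos <= end and nucleotide == base
    ]


# Guides taken from the design table above
print(editable_positions("TGCCTATTGGATCCAAAGAG"))  # [4]
print(editable_positions("GGCCTCTCTTTGGATCCAAT"))  # [4, 6, 8]
print(editable_positions("TCGTAGGTAAAAATGCCTAT"))  # []
```

Guides with no C in the window, such as the first row, have `NaN` in every edit column.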



# Assign most severe mutation bin

As noted in the "Mutation Category" column, each guide is predicted to make one or more types of mutations if Cs are present in the editing window. We can then annotate each guide with its most severe mutation bin, ranked in the order Nonsense > Splice site > Missense > Intron > Silent > UTR > No edits.

```python
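# Collapse each guide's comma-separated mutation categories into its single most severe bin
design_df['Mutation Bin'] = design_df['Mutation Category'].apply(pool_be.get_most_severe_mutation_type)
# Preview the new column next to the raw categories
design_df[['sgRNA Target Sequence','Mutation Category','Mutation Bin']].head()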
```




<div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>sgRNA Target Sequence</th>
      <th>Mutation Category</th>
      <th>Mutation Bin</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>TCGTAGGTAAAAATGCCTAT</td>
      <td>NaN</td>
      <td>No edits</td>
    </tr>
    <tr>
      <th>1</th>
      <td>TGCCTATTGGATCCAAAGAG</td>
      <td>Missense</td>
      <td>Missense</td>
    </tr>
    <tr>
      <th>2</th>
      <td>GGCCTCTCTTTGGATCCAAT</td>
      <td>Missense, Missense</td>
      <td>Missense</td>
    </tr>
    <tr>
      <th>3</th>
      <td>AAAAAATGTTGGCCTCTCTT</td>
      <td>NaN</td>
      <td>No edits</td>
    </tr>
    <tr>
      <th>4</th>
      <td>TTAAAAATTTCAAAAAATGT</td>
      <td>NaN</td>
      <td>No edits</td>
    </tr>
  </tbody>
</table>
</div>
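
To make the ranking concrete, here is a rough sketch of how a severity-ordered lookup could work. It is not the implementation of `get_most_severe_mutation_type`; the function name `most_severe_bin`, the exact category labels, and the handling of unrecognized labels are all assumptions.

```python
import math
import re

# Severity ranking from the text above, most severe first.
# The exact label strings are an assumption; check them against your design file.
SEVERITY_ORDER = ["Nonsense", "Splice site", "Missense", "Intron", "Silent", "UTR"]


def most_severe_bin(mutation_category):
    """Pick the most severe label from a comma-separated category string.

    Hypothetical stand-in for illustration; missing values map to "No edits".
    """
    if mutation_category is None or (
        isinstance(mutation_category, float) and math.isnan(mutation_category)
    ):
        return "No edits"
    # Split on commas (and semicolons, as a precaution) and trim whitespace
    labels = {label.strip() for label in re.split(r"[;,]", str(mutation_category))}
    for candidate in SEVERITY_ORDER:
        if candidate in labels:
            return candidate
    # In this sketch, labels outside the ranking also fall through to "No edits"
    return "No edits"


print(most_severe_bin("Missense, Missense"))  # Missense
print(most_severe_bin("Silent, Nonsense"))    # Nonsense
print(most_severe_bin(float("nan")))          # No edits
```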



# Calculate median residue

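We can then compute the median residue position of each guide's predicted amino acid edits. For example, a guide predicted to make Glu7Lys and Arg8Lys has a median residue of 7.5, while guides with no predicted edits get NaN.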

```python
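# Row-wise: compute the median residue position from each guide's bin and amino acid edits
design_df['Median Residue'] = design_df.apply(lambda x: pool_be.get_median_residues(x['Mutation Bin'], x['Amino Acid Edits']), axis=1)
# Preview the first 15 guides with their bins and median residues
design_df[['sgRNA Target Sequence','Amino Acid Edits','Mutation Category','Mutation Bin','Median Residue']].head(15)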
```




<div>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>sgRNA Target Sequence</th>
      <th>Amino Acid Edits</th>
      <th>Mutation Category</th>
      <th>Mutation Bin</th>
      <th>Median Residue</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>TCGTAGGTAAAAATGCCTAT</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>No edits</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>1</th>
      <td>TGCCTATTGGATCCAAAGAG</td>
      <td>Pro2Leu</td>
      <td>Missense</td>
      <td>Missense</td>
      <td>2.0</td>
    </tr>
    <tr>
      <th>2</th>
      <td>GGCCTCTCTTTGGATCCAAT</td>
      <td>Glu7Lys, Arg8Lys</td>
      <td>Missense, Missense</td>
      <td>Missense</td>
      <td>7.5</td>
    </tr>
    <tr>
      <th>3</th>
      <td>AAAAAATGTTGGCCTCTCTT</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>No edits</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>4</th>
      <td>TTAAAAATTTCAAAAAATGT</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>No edits</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>5</th>
      <td>AAGACACGCTGCAACAAAGC</td>
      <td>Thr17Ile, Arg18Cys</td>
      <td>Missense, Missense</td>
      <td>Missense</td>
      <td>17.5</td>
    </tr>
    <tr>
      <th>6</th>
      <td>TTTTTTTTTTAAATAGATTT</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>No edits</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>7</th>
      <td>TAGGACCAATAAGTCTTAAT</td>
      <td>Pro26Leu</td>
      <td>Missense</td>
      <td>Missense</td>
      <td>26.0</td>
    </tr>
    <tr>
      <th>8</th>
      <td>TCAAACCAATTAAGACTTAT</td>
      <td>Trp31Ter</td>
      <td>Nonsense</td>
      <td>Nonsense</td>
      <td>31.0</td>
    </tr>
    <tr>
      <th>9</th>
      <td>GCAGGTTCAGAATTATAGGG</td>
      <td>Glu45Lys</td>
      <td>Missense</td>
      <td>Missense</td>
      <td>45.0</td>
    </tr>
    <tr>
      <th>10</th>
      <td>TCTGCAGGTTCAGAATTATA</td>
      <td>Ala47Thr</td>
      <td>Missense</td>
      <td>Missense</td>
      <td>47.0</td>
    </tr>
    <tr>
      <th>11</th>
      <td>TTCTGCAGGTTCAGAATTAT</td>
      <td>Ala47Thr</td>
      <td>Missense</td>
      <td>Missense</td>
      <td>47.0</td>
    </tr>
    <tr>
      <th>12</th>
      <td>TTATGTTCAGATTCTTCTGC</td>
      <td>Glu51Lys</td>
      <td>Missense</td>
      <td>Missense</td>
      <td>51.0</td>
    </tr>
    <tr>
      <th>13</th>
      <td>TGTGGAGTTTTAAATAGGTT</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>No edits</td>
      <td>NaN</td>
    </tr>
    <tr>
      <th>14</th>
      <td>ACCTATTTAAAACTCCACAA</td>
      <td>NaN</td>
      <td>NaN</td>
      <td>No edits</td>
      <td>NaN</td>
    </tr>
  </tbody>
</table>
</div>
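
The values in the `Median Residue` column can be reproduced by hand from the `Amino Acid Edits` strings. The sketch below shows one way to do that; it is not the implementation of `get_median_residues`, and unlike that function it ignores the mutation bin. The name `median_residue` and the assumed edit format (three-letter code, position, three-letter code) are illustrative.

```python
import re
from statistics import median


def median_residue(aa_edits):
    """Median of the residue positions in a string such as "Glu7Lys, Arg8Lys".

    Hypothetical helper for illustration; returns NaN for missing input.
    """
    if not isinstance(aa_edits, str):
        return float("nan")
    # Capture the number between the reference and edited amino acid codes
    positions = [int(p) for p in re.findall(r"[A-Za-z]{3}(\d+)[A-Za-z]{3}", aa_edits)]
    return float(median(positions)) if positions else float("nan")


print(median_residue("Pro2Leu"))             # 2.0
print(median_residue("Glu7Lys, Arg8Lys"))    # 7.5
print(median_residue("Thr17Ile, Arg18Cys"))  # 17.5
```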




