Metadata-Version: 2.0
Name: biometalib
Version: 0.0.3
Summary: A set of helper functions for working with biological metadata from the SRA.
Home-page: https://github.com/jfear/biometalib
Author: Justin Fear
Author-email: justin.m.fear@gmail.com
License: MIT license
Platform: UNKNOWN
Requires-Dist: bumpversion (==0.5.3)
Requires-Dist: fuzzywuzzy (>=0.15.0)
Requires-Dist: mongoengine (>=0.11.0)
Requires-Dist: pymongo (>=3.3.0)
Requires-Dist: pytest (>=3.0.5)
Requires-Dist: pytest-runner (>=2.11)
Requires-Dist: pyyaml (>=3.12)
Requires-Dist: ruamel.yaml (<0.15.0)
Requires-Dist: sramongo (>=0.0.1)

# Biometalib

Biometalib is a set of useful libraries and tools for working with SRA biological metadata.

## Installation

This library is designed for python 3+ and can be installed with `pip` or `conda`.

### Pip

If using pip you must first install my `sramongo` package.

```bash
pip install git+https://github.com/jfear/sramongo
pip install git+https://github.com/jfear/biometalib
```

### Conda [Suggested]

First make sure you have a working installation of Anaconda, I suggest
[Miniconda](https://conda.io/miniconda.html).

```bash
conda install -c jfear biometalib
```

## Attribute Selector

Attribute selector is a helper script for selecting which attributes you want
to focus on for a project. The biological metadata submitted by users contain a
variety of different types of attributes. Sometimes these include things like
misspellings or different forms of a word, it also includes attributes that are
unique to a single project. This tool is to be used to quickly curate these
columns. Attribute selector uses a `YAML` formatted file to store attribute
decisions.

In the YAML file, **selected attributes** will be the keys. When merging
multiple attributes into a single **selected attribute** they will be stored as
values. For example:

```
sex:
- sex
- Sex
- gender
```

Here the **selected attribute** `sex` has the attributes `Sex` and `gender`
associated with it.  There is also a special **selected attribute** `ignore` that will store a list
of attributes that you want to ignore.

Using the BioSample selection sheet I have created a starting YAML that can be
used when running `attribute_selector`.

To run the attributes selector on my public version of the Biometa database type:

```bash
# Download example YAML
$ wget -O my_attribute_selection.yaml https://raw.githubusercontent.com/jfear/biometalib/master/data/flybase_example.yaml
$ attribute_selector --host mongo.geneticsunderground.com --port 27022 --db sra --username sra --password oliver --authenticationDatabase user-data --config my_attribute_selection.yaml
```

`attribute_selector` is an interactive command line tool. Iterates overall
attribute column names that are not already **selected attributes** in the
YAML. The current attribute is displayed in red. At the prompt you can type:

* `k` to set the current attribute as a **selected attribute** [keep]
* `r` to rename the current attribute, this will set the current attribute as
  value of the renamed **selected attribute** [rename]
* `i` adds current attribute to ignore list [ignore]
* `e` show example values listed under the current attribute [example]
* `s` show attributes with similar names (fuzzy string match). Here **selected attributes** will appear in yellow [similar]
* `n` skip and go to the next attribute [next]
* `quit` exit out of the program, but save progress.



