Metadata-Version: 2.1
Name: ntdownload
Version: 1.7
Summary: Easily updatable local NCBI taxonomy dumps file copy
Home-page: https://github.com/ggonnella/fastsubtrees/tree/main/ntdowload
Author: Giorgio Gonnella
Author-email: gonnella@zbh.uni-hamburg.de
License: ISC
Keywords: ncbi taxonomy dumpfiles mirror
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: End Users/Desktop
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: ISC License (ISCL)
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Topic :: Software Development :: Libraries
Description-Content-Type: text/markdown
License-File: LICENSE.txt
License-File: AUTHORS.txt

# NtDownload

NtDownload is a tool for downloading and keeping up-to-date a local
version of the NCBI Taxonomy database dump files

After download it creates a timestamp file, so that the next time the download
is repeated only if a newer version is available.

## Installation

The software is distributed as a Python 3 package and can be installed
using ``pip install ntdownload``.

## Command line interface

### ntdownload

To download the dump files, use the ``ntdownload`` script. Thereby the
output directory is passed as CLI argument. If it does not exist, it is
created. The file is downloaded using FTP, unless this does not work
or the option ``--force-https`` is used, in which case HTTPS is used.

The dump files archive is unpacked after download and deleted, unless
the option ``--no-unpack`` is used.

If the option ``--exitcode`` is used, then the exit code of the script is 100
if no newer version of the dump files was found, and thus nothing was
downloaded. Otherwise the exit code is always 0 (or 1 on error).

### ntnames

The script ``ntnames`` is provided, which, after download using ``ntdownload``
can be used for creating a list of taxonomy IDs and scientific names,
which can be used as attribute source file for fastsubtrees.

## API

### Downloader

To download the dump files, use the Downloader class:
```
from ntdownload import Downloader
d = Downloader(output_directory_name)
has_downloaded = d.run()
```

The output directory is created if it does not exist. The dump files archive is
unpacked and the archive deleted, except if the option ``unpack=False``
is used.

The download protocol is FTP unless it is not working or the option
``force_https=True`` is used, in which case HTTPS is used.

The return value of ``run()`` is ``True`` if a dump file was downloaded,
``False`` if no newer version was available.

### Scientific names iterator

The function ``yield_scientific_names_from_dump(ntdumps_dir)``
yields tuples in the form ``(tax_id, scientific_name)`` reading
them from the ``names.dmp`` file.

## Test suite

The test suite is run using ``pytest``.


