Metadata-Version: 2.1
Name: corpuscula
Version: 1.0.1
Summary: A toolkit that simplifies corpus processing
Home-page: https://github.com/fostroll/corpuscula
Author: Sergei Ternovykh
Author-email: fostroll@gmail.com
License: BSD
Keywords: natural-language-processing nlp conllu corpora
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Information Technology
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Text Processing
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Topic :: Software Development
Classifier: Topic :: Software Development :: Libraries
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Requires-Python: >=3.6
Description-Content-Type: text/markdown

<div align="right"><strong>RuMor: Russian Morphology project</strong></div>
<h2 align="center">Corpuscula: a Python NLP library for corpus processing</h2>

[PyPI Version](https://img.shields.io/pypi/v/corpuscula?color=blue)
[Python Version](https://img.shields.io/pypi/pyversions/corpuscula?color=blue)

Part of the ***RuMor*** project. It contains tools that simplify corpus
processing. Highlights:

* full [*CONLL-U*](https://universaldependencies.org/format.html) support
(includes *CONLL-U Plus*)
* wrappers for well-known Russian-language corpora
* parser and wrapper for the Russian part of *Wikipedia*
* *Corpus Dictionary* that can be used for further morphology processing
* simple database to keep named entities

## Installation

### pip

***Corpuscula*** supports *Python 3.6* or later. To install it via *pip*, run:
```sh
$ pip install corpuscula
```

If you currently have a previous version of ***Corpuscula*** installed, use:
```sh
$ pip install corpuscula -U
```

### From Source

Alternatively, you can install ***Corpuscula*** from the source in this *git
repository*:
```sh
$ git clone https://github.com/fostroll/corpuscula.git
$ cd corpuscula
$ pip install -e .
```
This gives you access to examples and data that are not included in the
*PyPI* package.

## Setup

After installation, specify the directory where downloaded corpora should be
stored:
```python
>>> import corpuscula.corpus_utils as cu
>>> cu.set_root_dir(<path>)  # We will keep corpora here
```
**NB:** this creates or updates the config file `.rumor` in your home directory.

If you skip this step, ***Corpuscula*** will try to keep corpora in the
directory where it is installed.
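
As a minimal sketch of the setup step, the directory can be prepared before it
is passed to `set_root_dir()`. The helper names `prepare_corpora_dir` and
`configure_corpuscula` are hypothetical and not part of ***Corpuscula***.
Whether `set_root_dir()` itself creates a missing directory or expands `~` is
not stated here, so the sketch does both up front as a precaution:
```python
from pathlib import Path


def prepare_corpora_dir(path):
    """Create the directory if needed and return its absolute path as a string.

    Hypothetical helper: it only prepares the path and does not touch
    Corpuscula's configuration.
    """
    root = Path(path).expanduser().resolve()
    root.mkdir(parents=True, exist_ok=True)
    return str(root)


def configure_corpuscula(path):
    """Point Corpuscula at the prepared directory.

    Hypothetical wrapper around cu.set_root_dir(); calling it creates or
    updates the `.rumor` config file in the home directory.
    """
    # Imported lazily so that the helper above works without Corpuscula.
    import corpuscula.corpus_utils as cu

    root = prepare_corpora_dir(path)
    cu.set_root_dir(root)
    return root


# Example call (the location is an arbitrary choice):
# configure_corpuscula('~/corpora')
```
Passing an absolute path avoids any dependence on the working directory from
which later scripts are run.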

## Usage

[*CONLL-U* Support](https://github.com/fostroll/corpuscula/blob/master/doc/README_CONLLU.md)

[Management of Corpora](https://github.com/fostroll/corpuscula/blob/master/doc/README_CORPORA.md)

[Wrapper for *Wikipedia*](https://github.com/fostroll/corpuscula/blob/master/doc/README_WIKIPEDIA.md)

[*Corpus Dictionary*](https://github.com/fostroll/corpuscula/blob/master/doc/README_CDICT.md)

[Utilities](https://github.com/fostroll/corpuscula/blob/master/doc/README_UTILS.md)

[*Items* database](https://github.com/fostroll/corpuscula/blob/master/doc/README_ITEMS.md)


