Metadata-Version: 1.1
Name: nagisa
Version: 0.2.3
Summary: A Japanese tokenizer based on recurrent neural networks
Home-page: https://github.com/taishi-i/nagisa
Author: Taishi Ikeda
Author-email: taishi.ikeda.0323@gmail.com
License: MIT License
Download-URL: https://github.com/taishi-i/nagisa/archive/0.2.3.tar.gz
Description: | |Codacy Badge|
        | |Build Status|
        | |Build status|
        | |Coverage Status|
        | |Documentation Status|
        | |PyPI|
        | |PyPI - Downloads|
        | |FOSSA Status|
        
        | Nagisa is a Python module for Japanese word segmentation/POS-tagging.
        | It is designed to be a simple and easy-to-use tool.
        
        This tool has the following features.
        
        -  Based on recurrent neural networks.
        -  The word segmentation model uses character- and word-level features
           `[池田+] <http://www.anlp.jp/proceedings/annual_meeting/2017/pdf_dir/B6-2.pdf>`__.
        -  The POS-tagging model uses tag dictionary information
           `[Inoue+] <http://www.aclweb.org/anthology/K17-1042>`__.
        
        For more details refer to the following links.
        
        -  The article in Japanese is available
           `here <https://qiita.com/taishi-i/items/5b9275a606b392f7f58e>`__.
        -  The documentation is available
           `here <https://nagisa.readthedocs.io/en/latest/?badge=latest>`__.
        
        Installation
        ============
        
        | Python 2.7.x or 3.5+ is required.
        | This tool uses `DyNet <https://github.com/clab/dynet>`__ (the Dynamic
          Neural Network Toolkit) to calculate neural networks.
        | You can install nagisa by using the following command.
        
        .. code:: bash
        
            pip install nagisa
        
        On Windows, use the 64-bit version of Python 3.6 or later.
        
        Basic usage
        ===========
        
        The following sample performs word segmentation and POS-tagging on Japanese text.
        
        .. code:: python
        
            import nagisa
        
            text = 'Pythonで簡単に使えるツールです'
            words = nagisa.tagging(text)
            print(words)
            #=> Python/名詞 で/助詞 簡単/形状詞 に/助動詞 使える/動詞 ツール/名詞 です/助動詞
        
            # Get a list of words
            print(words.words)
            #=> ['Python', 'で', '簡単', 'に', '使える', 'ツール', 'です']
        
            # Get a list of POS-tags
            print(words.postags)
            #=> ['名詞', '助詞', '形状詞', '助動詞', '動詞', '名詞', '助動詞']
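        
        The two lists shown above appear to be aligned position by position. Assuming that holds, the following minimal sketch pairs each word with its tag and counts the tags. The lists are copied from the sample output so that the snippet runs without nagisa installed; ``tokens``, ``pairs`` and ``tag_counts`` are illustrative names, not part of nagisa.
        
        .. code:: python
        
            from collections import Counter
        
            # Stand-ins for words.words and words.postags, copied from the output above.
            tokens = ['Python', 'で', '簡単', 'に', '使える', 'ツール', 'です']
            postags = ['名詞', '助詞', '形状詞', '助動詞', '動詞', '名詞', '助動詞']
        
            # Pair each word with the tag at the same position.
            pairs = list(zip(tokens, postags))
            print(pairs[0])
            #=> ('Python', '名詞')
        
            # Count how often each POS tag occurs in the sentence.
            tag_counts = Counter(postags)
            print(tag_counts['名詞'], tag_counts['助動詞'])
            #=> 2 2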
        
        Post-processing functions
        =========================
        
        Filter and extract words by specific POS tags.
        
        .. code:: python
        
            # Remove the words that have the specified POS tags.
            words = nagisa.filter(text, filter_postags=['助詞', '助動詞'])
            print(words)
            #=> Python/名詞 簡単/形状詞 使える/動詞 ツール/名詞
        
            # Extract only nouns.
            words = nagisa.extract(text, extract_postags=['名詞'])
            print(words)
            #=> Python/名詞 ツール/名詞
        
            # This is a list of available POS-tags in nagisa.
            print(nagisa.tagger.postags)
            #=> ['補助記号', '名詞', ... , 'URL']
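        
        Judging from the outputs above, ``filter`` drops the words whose tag is listed and ``extract`` keeps only those words. The sketch below reproduces that behaviour on plain (word, tag) pairs so the difference is easy to see. It is an illustration only, not nagisa's implementation; ``drop_tags`` and ``keep_tags`` are hypothetical helper names.
        
        .. code:: python
        
            # (word, tag) pairs copied from the tagging output in "Basic usage".
            tagged = [('Python', '名詞'), ('で', '助詞'), ('簡単', '形状詞'), ('に', '助動詞'),
                      ('使える', '動詞'), ('ツール', '名詞'), ('です', '助動詞')]
        
            def drop_tags(pairs, postags):
                """Return the pairs whose tag is NOT in postags (filter-like)."""
                return [(w, t) for w, t in pairs if t not in postags]
        
            def keep_tags(pairs, postags):
                """Return the pairs whose tag IS in postags (extract-like)."""
                return [(w, t) for w, t in pairs if t in postags]
        
            kept = drop_tags(tagged, ['助詞', '助動詞'])
            print(' '.join('{}/{}'.format(w, t) for w, t in kept))
            #=> Python/名詞 簡単/形状詞 使える/動詞 ツール/名詞
        
            nouns_only = keep_tags(tagged, ['名詞'])
            print(' '.join('{}/{}'.format(w, t) for w, t in nouns_only))
            #=> Python/名詞 ツール/名詞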
        
        A user dictionary can be added easily.
        
        .. code:: python
        
            # default
            text = "3月に見た「3月のライオン」"
            print(nagisa.tagging(text))
            #=> 3/名詞 月/名詞 に/助詞 見/動詞 た/助動詞 「/補助記号 3/名詞 月/名詞 の/助詞 ライオン/名詞 」/補助記号
        
            # If a word ("3月のライオン") is included in the single_word_list, it is recognized as a single word.
            new_tagger = nagisa.Tagger(single_word_list=['3月のライオン'])
            print(new_tagger.tagging(text))
            #=> 3/名詞 月/名詞 に/助詞 見/動詞 た/助動詞 「/補助記号 3月のライオン/名詞 」/補助記号
        
        Train a model
        =============
        
        | Nagisa (v0.2.0+) provides a simple training method
        | for a joint word segmentation and sequence labeling (e.g., POS-tagging,
          NER) model.
        
        | The train/dev/test files are in TSV format.
        | Each line holds one ``word`` and its ``tag``, written as
          ``word`` \\t(tab) ``tag``.
        | Note that sentences are separated by a line containing only ``EOS``.
        | Refer to the `sample datasets </nagisa/data/sample_datasets>`__ and the
          `tutorial (Train a model for Universal
          Dependencies) <https://nagisa.readthedocs.io/en/latest/tutorial.html>`__.
        
        ::
        
            $ cat sample.train
            唯一  NOUN
            の   ADP
            趣味  NOUN
            は   ADP
            料理  NOUN
            EOS
            とても ADV
            おいしかっ   ADJ
            た   AUX
            です  AUX
            。   PUNCT
            EOS
            ドル  NOUN
            は   ADP
            主要  ADJ
            通貨  NOUN
            EOS
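        
        A file in the layout shown above can also be produced from data held in memory. The following is a minimal sketch, assuming each sentence is a list of (word, tag) pairs; ``format_dataset`` is a hypothetical helper written for this example and is not provided by nagisa.
        
        .. code:: python
        
            def format_dataset(sentences):
                """Render sentences as 'word<TAB>tag' lines, with an EOS line after each sentence."""
                lines = []
                for sentence in sentences:
                    for word, tag in sentence:
                        lines.append('{}\t{}'.format(word, tag))
                    lines.append('EOS')
                return '\n'.join(lines) + '\n'
        
            sentences = [
                [('唯一', 'NOUN'), ('の', 'ADP'), ('趣味', 'NOUN'), ('は', 'ADP'), ('料理', 'NOUN')],
                [('ドル', 'NOUN'), ('は', 'ADP'), ('主要', 'ADJ'), ('通貨', 'NOUN')],
            ]
        
            dataset_text = format_dataset(sentences)
            print(dataset_text.splitlines()[:2])
            #=> ['唯一\tNOUN', 'の\tADP']
        
        Writing ``dataset_text`` to a UTF-8 encoded file gives a file that follows the same word, tab, tag and ``EOS`` layout as the sample.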
        
        .. code:: python
        
            # After training finishes, three model files are saved (*.vocabs, *.params, *.hp).
            nagisa.fit(train_file="sample.train", dev_file="sample.dev", test_file="sample.test", model_name="sample")
        
            # Build the tagger by loading the trained model files.
            sample_tagger = nagisa.Tagger(vocabs='sample.vocabs', params='sample.params', hp='sample.hp')
        
            text = "福岡・博多の観光情報"
            words = sample_tagger.tagging(text)
            print(words)
            #=> 福岡/PROPN ・/SYM 博多/PROPN の/ADP 観光/NOUN 情報/NOUN
        
        .. |Codacy Badge| image:: https://api.codacy.com/project/badge/Grade/769dd003c7184d4d81dad74fd8a322a1
           :target: https://app.codacy.com/app/taishi-i/nagisa?utm_source=github.com&utm_medium=referral&utm_content=taishi-i/nagisa&utm_campaign=Badge_Grade_Dashboard
        .. |Build Status| image:: https://travis-ci.org/taishi-i/nagisa.svg?branch=master
           :target: https://travis-ci.org/taishi-i/nagisa
        .. |Build status| image:: https://ci.appveyor.com/api/projects/status/6k35hmxl1juf1hqf?svg=true
           :target: https://ci.appveyor.com/project/taishi-i/nagisa
        .. |Coverage Status| image:: https://coveralls.io/repos/github/taishi-i/nagisa/badge.svg?branch=master
           :target: https://coveralls.io/github/taishi-i/nagisa?branch=master
        .. |Documentation Status| image:: https://readthedocs.org/projects/nagisa/badge/?version=latest
           :target: https://nagisa.readthedocs.io/en/latest/?badge=latest
        .. |PyPI| image:: https://img.shields.io/pypi/v/nagisa.svg
           :target: https://pypi.python.org/pypi/nagisa
        .. |PyPI - Downloads| image:: https://img.shields.io/pypi/dm/nagisa.svg
           :target: https://img.shields.io/pypi/dm/nagisa.svg
        .. |FOSSA Status| image:: https://app.fossa.io/api/projects/git%2Bgithub.com%2Ftaishi-i%2Fnagisa.svg?type=shield
           :target: https://app.fossa.io/projects/git%2Bgithub.com%2Ftaishi-i%2Fnagisa?ref=badge_shield
        
Platform: Unix
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: Japanese
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Operating System :: Unix
Classifier: Operating System :: Microsoft :: Windows
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Topic :: Software Development :: Libraries :: Python Modules
