Metadata-Version: 1.1
Name: nagisa
Version: 0.1.1
Summary: A Japanese tokenizer based on recurrent neural networks
Home-page: https://github.com/taishi-i/nagisa
Author: Taishi Ikeda
Author-email: taishi.ikeda.0323@gmail.com
License: MIT License
Download-URL: https://github.com/taishi-i/nagisa/archive/0.1.1.tar.gz
Description: .. figure:: /nagisa/data/nagisa_image.jpg
           :alt: nagisa
        
        
        --------------
        
        |Build Status| |Documentation Status| |PyPI|
        
        Nagisa is a Python module for Japanese word segmentation/POS-tagging. It
        is designed to be a simple and easy-to-use tool.
        
        This tool has the following features.
        
        - Based on recurrent neural networks.
        - The word segmentation model uses character- and word-level features
          `[池田+] <http://www.anlp.jp/proceedings/annual_meeting/2017/pdf_dir/B6-2.pdf>`__.
        - The POS-tagging model uses tag dictionary information
          `[Inoue+] <http://www.aclweb.org/anthology/K17-1042>`__.
        
        For more details, refer to the following links.
        
        - The slides (in Japanese) are available
          `here <https://drive.google.com/open?id=1AzR5wh5502u_OI_Jxwsq24t-er_rnJBP>`__.
        - The documentation is available
          `here <https://nagisa.readthedocs.io/en/latest/?badge=latest>`__.
        
        Installation
        ============
        
        Python 2.7.x or 3.5+ is required. This tool uses
        `DyNet <https://github.com/clab/dynet>`__ (the Dynamic Neural Network
        Toolkit) to compute its neural networks. You can install nagisa with
        the following command.
        
        .. code:: bash
        
            pip install nagisa
        
        If you use nagisa on Windows, please run it with Python 3.5+.
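        
        As a rough illustration of the requirements above, the sketch below
        checks an interpreter version against them. The helper
        ``is_supported`` is hypothetical and not part of nagisa; it only
        encodes the version rules stated in this section.
        
        .. code:: python
        
            import sys
        
            def is_supported(version_info, platform):
                """Hypothetical helper: True if the version meets the stated rules."""
                version = (version_info[0], version_info[1])
                if platform.startswith('win'):
                    # On Windows, Python 3.5+ is recommended.
                    return version >= (3, 5)
                # Elsewhere, Python 2.7.x or 3.5+ is required.
                return version == (2, 7) or version >= (3, 5)
        
            # Check the running interpreter before installing.
            print(is_supported(sys.version_info, sys.platform))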
        
        Usage
        =====
        
        Basic usage.
        
        .. code:: python
        
            import nagisa
        
            # Sample of word segmentation and POS-tagging for Japanese
            text = 'Pythonで簡単に使えるツールです'
            words = nagisa.tagging(text)
            print(words)
            #=> Python/名詞 で/助詞 簡単/形状詞 に/助動詞 使える/動詞 ツール/名詞 です/助動詞
        
            # Get a list of words
            print(words.words)
            #=> ['Python', 'で', '簡単', 'に', '使える', 'ツール', 'です']
        
            # Get a list of POS-tags
            print(words.postags)
            #=> ['名詞', '助詞', '形状詞', '助動詞', '動詞', '名詞', '助動詞']
        
            # The nagisa.wakati method is faster than the nagisa.tagging method.
            words = nagisa.wakati(text)
            print(words)
            #=> ['Python', 'で', '簡単', 'に', '使える', 'ツール', 'です']
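        
        The two lists returned by ``words.words`` and ``words.postags`` are
        aligned by position, so they can be combined with plain Python. The
        sketch below does not call nagisa; it assumes the lists have the
        values shown in the sample output above and uses Python 3 syntax.
        
        .. code:: python
        
            from collections import Counter
        
            # Values copied from the sample output above (an assumption,
            # not produced by calling nagisa here).
            words = ['Python', 'で', '簡単', 'に', '使える', 'ツール', 'です']
            postags = ['名詞', '助詞', '形状詞', '助動詞', '動詞', '名詞', '助動詞']
        
            # Pair each word with the POS-tag at the same position.
            pairs = list(zip(words, postags))
        
            # Count how often each POS-tag occurs.
            tag_counts = Counter(postags)
        
            # Rebuild the "word/tag" string representation.
            joined = ' '.join('{}/{}'.format(w, t) for w, t in pairs)
            print(joined)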
        
        Post-processing functions.
        
        .. code:: python
        
            # Extracting all nouns from a text
            words = nagisa.extract(text, extract_postags=['名詞'])
            print(words)
            #=> Python/名詞 ツール/名詞
        
            # Filtering specific POS-tags from a text
            words = nagisa.filter(text, filter_postags=['助詞', '助動詞'])
            print(words)
            #=> Python/名詞 簡単/形状詞 使える/動詞 ツール/名詞
        
            # A list of available POS-tags
            print(nagisa.tagger.postags)
            #=> ['補助記号', '名詞', ... , 'URL']
        
            # You can force a string to be recognized as a single word.
            # By default, this compound is split into two words.
            text = 'ニューラルネットワークを使ってます。'
            print(nagisa.tagging(text))
            #=> ニューラル/名詞 ネットワーク/名詞 を/助詞 使っ/動詞 て/助動詞 ます/助動詞 。/補助記号
        
            # If a word is included in the single_word_list, it is recognized as a single word.
            tagger_nn = nagisa.Tagger(single_word_list=['ニューラルネットワーク'])
            print(tagger_nn.tagging(text))
            #=> ニューラルネットワーク/名詞 を/助詞 使っ/動詞 て/助動詞 ます/助動詞 。/補助記号
        
            # Nagisa is good at capturing URLs and kaomoji in an input text.
            url = 'https://github.com/taishi-i/nagisaでコードを公開中(๑¯ω¯๑)'
            words = nagisa.tagging(url)
            print(words)
            #=> https://github.com/taishi-i/nagisa/URL で/助詞 コード/名詞 を/助詞 公開/名詞 中/接尾辞 (๑　̄ω　̄๑)/補助記号
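        
        Conceptually, ``nagisa.extract`` keeps the words whose POS-tag is
        listed, and ``nagisa.filter`` drops them. The sketch below mimics
        that behaviour on plain ``(word, postag)`` pairs. It is not nagisa's
        implementation: the helpers ``keep_postags`` and ``drop_postags``
        are hypothetical, and the pairs are copied from the sample output
        above.
        
        .. code:: python
        
            def keep_postags(pairs, postags):
                """Hypothetical helper: keep pairs whose tag is in postags."""
                return [(w, t) for w, t in pairs if t in postags]
        
            def drop_postags(pairs, postags):
                """Hypothetical helper: drop pairs whose tag is in postags."""
                return [(w, t) for w, t in pairs if t not in postags]
        
            # Pairs copied from the tagging output shown earlier.
            pairs = [('Python', '名詞'), ('で', '助詞'), ('簡単', '形状詞'),
                     ('に', '助動詞'), ('使える', '動詞'), ('ツール', '名詞'),
                     ('です', '助動詞')]
        
            # Same selection as extract_postags=['名詞'].
            nouns = keep_postags(pairs, ['名詞'])
        
            # Same selection as filter_postags=['助詞', '助動詞'].
            content = drop_postags(pairs, ['助詞', '助動詞'])
        
            print(nouns)
            print(content)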
        
        .. |Build Status| image:: https://travis-ci.org/taishi-i/nagisa.svg?branch=master
           :target: https://travis-ci.org/taishi-i/nagisa
        .. |Documentation Status| image:: https://readthedocs.org/projects/nagisa/badge/?version=latest
           :target: https://nagisa.readthedocs.io/en/latest/?badge=latest
        .. |PyPI| image:: https://img.shields.io/pypi/v/nagisa.svg
           :target: https://pypi.python.org/pypi/nagisa
        
Platform: Unix
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: Japanese
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Operating System :: Unix
Classifier: Operating System :: Microsoft :: Windows
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Topic :: Software Development :: Libraries :: Python Modules
