Metadata-Version: 2.1
Name: hdx-python-utilities
Version: 1.6.0
Summary: HDX Python Utilities
Home-page: https://github.com/OCHA-DAP/hdx-python-utilities
Author: Michael Rans
Author-email: rans@email.com
License: MIT
Keywords: HDX,utilities,library,country,iso 3166
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Dist: basicauth
Requires-Dist: beautifulsoup4
Requires-Dist: cchardet (>=2.1.4)
Requires-Dist: colorlog
Requires-Dist: email-validator
Requires-Dist: html5lib
Requires-Dist: psycopg2-binary
Requires-Dist: pyaml
Requires-Dist: six (>=1.11.0)
Requires-Dist: sshtunnel
Requires-Dist: tabulator (>=1.19.0)
Requires-Dist: typing
Requires-Dist: yamlloader

|Build_Status| |Coverage_Status|

The HDX Python Utilities Library provides a range of helpful utilities:

1. `Easy downloading of files with support for authentication, streaming and hashing <#downloading-files>`__
#. `Loading and saving JSON and YAML (inc. with OrderedDict) <#loading-and-saving-json-and-yaml>`__
#. `Database utilities (inc. connecting through SSH and SQLAlchemy helpers) <#database-utilities>`__
#. `Dictionary and list utilities <#dictionary-and-list-utilities>`__
#. `HTML utilities (inc. BeautifulSoup helper) <#html-utilities>`__
#. `Compare files (eg. for testing) <#compare-files>`__
#. `Simple emailing <#emailing>`__
#. `Easy logging setup <#configuring-logging>`__
#. `Path utilities <#path-utilities>`__
#. `Text processing <#text-processing>`__


This library is part of the `Humanitarian Data Exchange`_ (HDX) project. If you have
humanitarian related data, please upload your datasets to HDX.

Usage
-----

Detailed API documentation is available at
http://ocha-dap.github.io/hdx-python-utilities/. The source code is hosted at
https://github.com/OCHA-DAP/hdx-python-utilities.

Downloading files
~~~~~~~~~~~~~~~~~

Various utilities to help with downloading files. Includes retrying by default.

For example, given YAML file extraparams.yml:
::

    mykey:
        basic_auth: "XXXXXXXX"
        locale: "en"

We can create a downloader as shown below that will use the authentication defined
in basic_auth and add the parameter locale=en to each request
(eg. for get request http://myurl/lala?param1=p1&locale=en):
::

    with Download(extra_params_yaml='extraparams.yml', extra_params_lookup='mykey') as downloader:
        response = downloader.download(url)  # get requests library response
        json = response.json()

        # Download file to folder/filename
        f = downloader.download_file('http://myurl', post=False,
                                     parameters=OrderedDict([('b', '4'), ('d', '3')]),
                                     folder=tmpdir, filename=filename)
        filepath = abspath(f)

        # Read row by row from tabular file
        for row in downloader.get_tabular_rows('http://myurl/my.csv', dict_rows=True, headers=1):
            a = row['col']

Other useful functions:

::

    # Build get url from url and dictionary of parameters
    Download.get_url_for_get('http://www.lala.com/hdfa?a=3&b=4',
                             OrderedDict([('c', 'e'), ('d', 'f')]))
        # == 'http://www.lala.com/hdfa?a=3&b=4&c=e&d=f'

    # Extract url and dictionary of parameters from get url
    Download.get_url_params_for_post('http://www.lala.com/hdfa?a=3&b=4',
                                     OrderedDict([('c', 'e'), ('d', 'f')]))
        # == ('http://www.lala.com/hdfa',
        #     OrderedDict([('a', '3'), ('b', '4'), ('c', 'e'), ('d', 'f')]))
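
The behaviour of the two URL helpers above can be approximated with the standard library alone. The sketch below is an illustration for Python 3, not the library's implementation; the function names ``build_get_url`` and ``split_url_for_post`` are hypothetical.

```python
from collections import OrderedDict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_get_url(url, parameters):
    """Append parameters to whatever query string url already has."""
    parts = urlsplit(url)
    query = OrderedDict(parse_qsl(parts.query))
    query.update(parameters)
    return urlunsplit(parts._replace(query=urlencode(query)))


def split_url_for_post(url, parameters):
    """Return the url without its query string plus all parameters merged."""
    parts = urlsplit(url)
    query = OrderedDict(parse_qsl(parts.query))
    query.update(parameters)
    return urlunsplit(parts._replace(query='')), query


extra = OrderedDict([('c', 'e'), ('d', 'f')])
get_url = build_get_url('http://www.lala.com/hdfa?a=3&b=4', extra)
post_url, post_params = split_url_for_post('http://www.lala.com/hdfa?a=3&b=4', extra)
```

Keeping the parameters in an OrderedDict makes the resulting query string deterministic, which is convenient when URLs are compared in tests.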

Loading and Saving JSON and YAML
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Examples:
::

    # Load YAML
    mydict = load_yaml('my_yaml.yml')

    # Load 2 YAMLs and merge into dictionary
    mydict = load_and_merge_yaml('my_yaml1.yml', 'my_yaml2.yml')

    # Load YAML into existing dictionary
    mydict = load_yaml_into_existing_dict(existing_dict, 'my_yaml.yml')

    # Load JSON
    mydict = load_json('my_json.json')

    # Load 2 JSONs and merge into dictionary
    mydict = load_and_merge_json('my_json1.json', 'my_json2.json')

    # Load JSON into existing dictionary
    mydict = load_json_into_existing_dict(existing_dict, 'my_json.json')

    # Save dictionary to YAML file in pretty format
    # preserving order if it is an OrderedDict
    save_yaml(mydict, 'mypath.yml', pretty=True, sortkeys=False)

    # Save dictionary to JSON file in compact form
    # sorting the keys
    save_json(mydict, 'mypath.json', pretty=False, sortkeys=True)
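
To show what "pretty" versus "compact" and "sorted" versus "order preserved" mean in practice, here is a minimal sketch using only the standard ``json`` module. The helper ``dump_json`` is hypothetical and the exact output format of the library's save functions may differ.

```python
import json
from collections import OrderedDict


def dump_json(data, pretty=False, sortkeys=False):
    """Serialise data to a JSON string, indented or compact."""
    if pretty:
        # Indented, one item per line
        return json.dumps(data, indent=2, sort_keys=sortkeys)
    # Compact: no whitespace after separators
    return json.dumps(data, separators=(',', ':'), sort_keys=sortkeys)


data = OrderedDict([('b', 1), ('a', [1, 2])])
compact_ordered = dump_json(data)                # '{"b":1,"a":[1,2]}'
compact_sorted = dump_json(data, sortkeys=True)  # '{"a":[1,2],"b":1}'
pretty_ordered = dump_json(data, pretty=True)
```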


Database utilities
~~~~~~~~~~~~~~~~~~

These are built on top of SQLAlchemy and simplify its setup.

Your SQLAlchemy database tables must inherit from Base in
hdx.utilities.database eg.
::

    from sqlalchemy import Column, ForeignKey, Integer

    from hdx.utilities.database import Base

    class MyTable(Base):
        my_col = Column(Integer, ForeignKey(MyTable2.col2), primary_key=True)


Examples:
::

    # Get SQLAlchemy session object given database parameters and
    # if needed SSH parameters. If database is PostgreSQL, will poll
    # till it is up.
    with Database(database='db', host='1.2.3.4', username='user', password='pass',
                  driver='driver', ssh_host='5.6.7.8', ssh_port=2222,
                  ssh_username='sshuser', ssh_private_key='path_to_key') as session:
        session.query(...)

    # Extract dictionary of parameters from SQLAlchemy url
    result = Database.get_params_from_sqlalchemy_url(TestDatabase.sqlalchemy_url)

    # Build SQLAlchemy url from dictionary of parameters
    result = Database.get_sqlalchemy_url(**TestDatabase.params)

    # Wait until PostgreSQL is up
    Database.wait_for_postgres('mydatabase', 'myserver', 5432, 'myuser', 'mypass')
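
As an illustration of what extracting parameters from a database URL involves, the sketch below splits a SQLAlchemy-style URL with the standard library. It is not the library's implementation; the function name ``params_from_db_url``, the dictionary keys and the example URL are assumptions made for this example.

```python
from urllib.parse import urlsplit


def params_from_db_url(db_url):
    """Split dialect+driver://user:pass@host:port/database into its parts."""
    parts = urlsplit(db_url)
    # The scheme may carry an optional driver after a plus sign
    dialect, _, driver = parts.scheme.partition('+')
    return {
        'dialect': dialect,
        'driver': driver or None,
        'username': parts.username,
        'password': parts.password,
        'host': parts.hostname,
        'port': parts.port,
        'database': parts.path.lstrip('/') or None,
    }


params = params_from_db_url(
    'postgresql+psycopg2://myuser:mypass@myserver:5432/mydatabase')
```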

Dictionary and list utilities
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Examples:
::

    # Merge dictionaries
    d1 = {1: 1, 2: 2, 3: 3, 4: ['a', 'b', 'c']}
    d2 = {2: 6, 5: 8, 6: 9, 4: ['d', 'e']}
    result = merge_dictionaries([d1, d2])
    assert result == {1: 1, 2: 6, 3: 3, 4: ['d', 'e'], 5: 8, 6: 9}

    # Diff dictionaries
    d1 = {1: 1, 2: 2, 3: 3, 4: {'a': 1, 'b': 'c'}}
    d2 = {4: {'a': 1, 'b': 'c'}, 2: 2, 3: 3, 1: 1}
    diff = dict_diff(d1, d2)
    assert diff == {}
    d2[3] = 4
    diff = dict_diff(d1, d2)
    assert diff == {3: (3, 4)}

    # Add element to list in dict
    d = dict()
    dict_of_lists_add(d, 'a', 1)
    assert d == {'a': [1]}
    dict_of_lists_add(d, 2, 'b')
    assert d == {'a': [1], 2: ['b']}
    dict_of_lists_add(d, 'a', 2)
    assert d == {'a': [1, 2], 2: ['b']}

    # Spread items in list so similar items are further apart
    input_list = [3, 1, 1, 1, 2, 2]
    result = list_distribute_contents(input_list)
    assert result == [1, 2, 1, 2, 1, 3]

    # Get values for the same key in all dicts in list
    input_list = [{'key': 'd', 1: 5}, {'key': 'd', 1: 1}, {'key': 'g', 1: 2},
                  {'key': 'a', 1: 2}, {'key': 'a', 1: 3}, {'key': 'b', 1: 5}]
    result = extract_list_from_list_of_dict(input_list, 'key')
    assert result == ['d', 'd', 'g', 'a', 'a', 'b']

    # Cast either keys or values or both in dictionary to type
    d1 = {1: 2, 2: 2.0, 3: 5, 'la': 4}
    assert key_value_convert(d1, keyfn=int) == {1: 2, 2: 2.0, 3: 5, 'la': 4}
    assert key_value_convert(d1, keyfn=int, dropfailedkeys=True) == {1: 2, 2: 2.0, 3: 5}
    d1 = {1: 2, 2: 2.0, 3: 5, 4: 'la'}
    assert key_value_convert(d1, valuefn=int) == {1: 2, 2: 2.0, 3: 5, 4: 'la'}
    assert key_value_convert(d1, valuefn=int, dropfailedvalues=True) == {1: 2, 2: 2.0, 3: 5}

    # Cast keys in dictionary to integer
    d1 = {1: 1, 2: 1.5, 3.5: 3, '4': 4}
    assert integer_key_convert(d1) == {1: 1, 2: 1.5, 3: 3, 4: 4}

    # Cast values in dictionary to integer
    d1 = {1: 1, 2: 1.5, 3: '3', 4: 4}
    assert integer_value_convert(d1) == {1: 1, 2: 1, 3: 3, 4: 4}

    # Cast values in dictionary to float
    d1 = {1: 1, 2: 1.5, 3: '3', 4: 4}
    assert float_value_convert(d1) == {1: 1.0, 2: 1.5, 3: 3.0, 4: 4.0}

    # Average values by key in two dictionaries
    d1 = {1: 1, 2: 1.0, 3: 3, 4: 4}
    d2 = {1: 2, 2: 2.0, 3: 5, 4: 4, 7: 3}
    assert avg_dicts(d1, d2) == {1: 1.5, 2: 1.5, 3: 4, 4: 4}

    # Read and write lists to csv
    l = [[1, 2, 3, 'a'],
         [4, 5, 6, 'b'],
         [7, 8, 9, 'c']]
    write_list_to_csv(l, filepath, headers=['h1', 'h2', 'h3', 'h4'])
    newll = read_list_from_csv(filepath)
    newld = read_list_from_csv(filepath, dict_form=True, headers=1)
    assert newll == [['h1', 'h2', 'h3', 'h4'], ['1', '2', '3', 'a'], ['4', '5', '6', 'b'], ['7', '8', '9', 'c']]
    assert newld == [{'h1': '1', 'h2': '2', 'h4': 'a', 'h3': '3'},
                    {'h1': '4', 'h2': '5', 'h4': 'b', 'h3': '6'},
                    {'h1': '7', 'h2': '8', 'h4': 'c', 'h3': '9'}]

    # Convert command line arguments to dictionary
    args = 'a=1,big=hello,1=3'
    assert args_to_dict(args) == {'a': '1', 'big': 'hello', '1': '3'}
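
For readers curious how a dictionary diff of the shape shown above can be computed, here is a minimal sketch. The name ``simple_dict_diff`` and the ``no_key`` marker are hypothetical; the library's own function may handle more cases.

```python
def simple_dict_diff(d1, d2, no_key='<KEYNOTFOUND>'):
    """Map each differing key to a tuple (value in d1, value in d2)."""
    diff = {}
    for key in set(d1) | set(d2):
        value1 = d1.get(key, no_key)
        value2 = d2.get(key, no_key)
        if value1 != value2:
            diff[key] = (value1, value2)
    return diff


d1 = {1: 1, 2: 2, 3: 3, 4: {'a': 1, 'b': 'c'}}
d2 = {4: {'a': 1, 'b': 'c'}, 2: 2, 3: 3, 1: 1}
same = simple_dict_diff(d1, d2)     # {}
d2[3] = 4
changed = simple_dict_diff(d1, d2)  # {3: (3, 4)}
```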

HTML utilities
~~~~~~~~~~~~~~

These are built on top of BeautifulSoup and simplify its setup.

Examples:

::

    # Get soup for url with optional kwarg downloader=Download() object
    soup = get_soup('http://myurl')
    tag = soup.find(id='mytag')

    # Get text of tag stripped of leading and trailing whitespace
    # and newlines and with &nbsp; replaced with space
    result = get_text(tag)

    # Extract HTML table as list of dictionaries
    result = extract_table(tabletag)

Compare files
~~~~~~~~~~~~~

Compare two files:
::

    result = compare_files(testfile1, testfile2)
    # Result is of form eg.:
    # ["- coal   ,3      ,7.4    ,'needed'\n", '?         ^\n',
    #  "+ coal   ,1      ,7.4    ,'notneeded'\n", '?         ^                +++\n']
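
The result format above resembles the output of ``difflib`` from the standard library. A minimal sketch of producing such a list of differing lines follows; the function ``changed_lines`` is hypothetical and is not claimed to be how the library does it.

```python
import difflib


def changed_lines(lines1, lines2):
    """Return ndiff output without the lines common to both inputs."""
    return [line for line in difflib.ndiff(lines1, lines2)
            if not line.startswith('  ')]


result = changed_lines(['coal,3\n', 'oil,2\n'],
                       ['coal,1\n', 'oil,2\n'])
```

Lines starting with ``-`` come from the first input, lines starting with ``+`` from the second, and lines starting with ``?`` are hints marking where the two differ.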

Emailing
~~~~~~~~

Example of setup and sending email:
::

    smtp_initargs = {
        'host': 'localhost',
        'port': 123,
        'local_hostname': 'mycomputer.fqdn.com',
        'timeout': 3,
        'source_address': ('machine', 456),
    }
    username = 'user@user.com'
    password = 'pass'
    email_config_dict = {
        'connection_type': 'ssl',
        'username': username,
        'password': password
    }
    email_config_dict.update(smtp_initargs)

    recipients = ['larry@gmail.com', 'moe@gmail.com', 'curly@gmail.com']
    subject = 'hello'
    text_body = 'hello there'
    html_body = """\
    <html>
      <head></head>
      <body>
        <p>Hi!<br>
           How are you?<br>
           Here is the <a href="https://www.python.org">link</a> you wanted.
        </p>
      </body>
    </html>
    """
    sender = 'me@gmail.com'

    with Email(email_config_dict=email_config_dict) as email:
        email.send(recipients, subject, text_body, html_body=html_body, sender=sender)
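
To make clear what a message with both a text body and an HTML body looks like, the sketch below builds one with the standard ``email`` package without sending it. The helper ``build_message`` is hypothetical and only illustrates the general technique.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message(sender, recipients, subject, text_body, html_body=None):
    """Build a plain text message, or text plus HTML alternative."""
    if html_body is None:
        msg = MIMEText(text_body, 'plain')
    else:
        # Mail clients display the last alternative they support
        msg = MIMEMultipart('alternative')
        msg.attach(MIMEText(text_body, 'plain'))
        msg.attach(MIMEText(html_body, 'html'))
    msg['Subject'] = subject
    msg['From'] = sender
    msg['To'] = ', '.join(recipients)
    return msg


msg = build_message('me@gmail.com', ['larry@gmail.com', 'moe@gmail.com'],
                    'hello', 'hello there', '<p>Hi!</p>')
```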

Configuring Logging
~~~~~~~~~~~~~~~~~~~

The library provides coloured logs with a simple default setup which
should be adequate for most cases. If you wish to change the logging
configuration from the defaults, you will need to call 
\ **setup_logging** with arguments.

::

    import logging

    from hdx.utilities.easy_logging import setup_logging
    ...
    logger = logging.getLogger(__name__)
    setup_logging(KEYWORD ARGUMENTS)

**KEYWORD ARGUMENTS** can be:

+-----------+-----------------------+------+--------------------------+----------------------------+
| Choose    | Argument              | Type | Value                    | Default                    |
|           |                       |      |                          |                            |
+===========+=======================+======+==========================+============================+
| One of:   | logging\_config\_dict | dict | Logging configuration    |                            |
|           |                       |      | dictionary               |                            |
+-----------+-----------------------+------+--------------------------+----------------------------+
| or        | logging\_config\_json | str  | Path to JSON Logging     |                            |
|           |                       |      | configuration            |                            |
+-----------+-----------------------+------+--------------------------+----------------------------+
| or        | logging\_config\_yaml | str  | Path to YAML Logging     | Library's internal         |
|           |                       |      | configuration            | logging\_configuration.yml |
+-----------+-----------------------+------+--------------------------+----------------------------+
| One of:   | smtp\_config\_dict    | dict | Email Logging            |                            |
|           |                       |      | configuration dictionary |                            |
+-----------+-----------------------+------+--------------------------+----------------------------+
| or        | smtp\_config\_json    | str  | Path to JSON Email       |                            |
|           |                       |      | Logging configuration    |                            |
+-----------+-----------------------+------+--------------------------+----------------------------+
| or        | smtp\_config\_yaml    | str  | Path to YAML Email       |                            |
|           |                       |      | Logging configuration    |                            |
+-----------+-----------------------+------+--------------------------+----------------------------+

Do not supply **smtp_config_dict**, **smtp_config_json** or
**smtp_config_yaml** unless you are using the default logging
configuration!

If you are using the default logging configuration, you have the option
to have a default SMTP handler that sends an email in the event of a
CRITICAL error by supplying either **smtp_config_dict**,
**smtp_config_json** or **smtp_config_yaml**. Here is a template of a
YAML file that can be passed as the **smtp_config_yaml** parameter:

::

    handlers:
        error_mail_handler:
            toaddrs: EMAIL_ADDRESSES
            subject: "RUN FAILED: MY_PROJECT_NAME"

Unless you override it, the mail server **mailhost** for the default
SMTP handler is **localhost** and the from address **fromaddr** is
**noreply@localhost**.
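
For comparison, a handler that emails only CRITICAL errors can be set up with the standard library alone. The sketch below assumes nothing about the library's internal configuration file: the function ``smtp_logging_config``, the logger name ``sketch`` and the address ``ops@example.com`` are made up for illustration. Only the handler name and the mail host and from address defaults are taken from the text above.

```python
import logging
import logging.config


def smtp_logging_config(toaddrs, subject, mailhost='localhost',
                        fromaddr='noreply@localhost'):
    """Return a dictConfig dictionary with console and email handlers."""
    return {
        'version': 1,
        'disable_existing_loggers': False,
        'formatters': {
            'plain': {'format': '%(asctime)s %(levelname)s %(name)s: %(message)s'},
        },
        'handlers': {
            'console': {
                'class': 'logging.StreamHandler',
                'level': 'DEBUG',
                'formatter': 'plain',
            },
            # Only records at CRITICAL level reach the mail server
            'error_mail_handler': {
                'class': 'logging.handlers.SMTPHandler',
                'level': 'CRITICAL',
                'formatter': 'plain',
                'mailhost': mailhost,
                'fromaddr': fromaddr,
                'toaddrs': toaddrs,
                'subject': subject,
            },
        },
        'loggers': {
            'sketch': {'level': 'DEBUG',
                       'handlers': ['console', 'error_mail_handler']},
        },
    }


config = smtp_logging_config('ops@example.com', 'RUN FAILED: MY_PROJECT_NAME')
logging.config.dictConfig(config)
logger = logging.getLogger('sketch')
logger.info('INFORMATION message')  # goes to the console, no email is sent
```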

To use logging in your files, simply add the line below to the top of
each Python file:

::

    logger = logging.getLogger(__name__)

Then use the logger like this:

::

    logger.debug('DEBUG message')
    logger.info('INFORMATION message')
    logger.warning('WARNING message')
    logger.error('ERROR message')
    logger.critical('CRITICAL error message')

Path utilities
~~~~~~~~~~~~~~

Examples:
::

    # Get current directory of script
    dir = script_dir(ANY_PYTHON_OBJECT_IN_SCRIPT)

    # Get current directory of script with filename appended
    path = script_dir_plus_file('myfile.txt', ANY_PYTHON_OBJECT_IN_SCRIPT)

    # Get temporary directory from environment variable
    # TEMP_DIR, falling back to the operating system default
    temp_folder = temp_dir()
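
The idea behind these helpers can be sketched with the standard library: an object defined in a file is used to locate that file's directory, and an environment variable overrides the default temporary directory. The names ``object_dir``, ``object_dir_plus_file`` and ``temp_folder`` are hypothetical and this is not the library's code.

```python
import inspect
import os
import tempfile


def object_dir(pyobject):
    """Directory of the file in which pyobject is defined."""
    return os.path.dirname(os.path.realpath(inspect.getabsfile(pyobject)))


def object_dir_plus_file(filename, pyobject):
    """Path to filename located next to the file defining pyobject."""
    return os.path.join(object_dir(pyobject), filename)


def temp_folder():
    """TEMP_DIR environment variable if set, else the system default."""
    return os.environ.get('TEMP_DIR', tempfile.gettempdir())
```

Passing an object rather than relying on the current working directory means the result does not change with where the script is launched from.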

Text processing
~~~~~~~~~~~~~~~

Examples:
::

    # Extract words from a string sentence into a list
    result = get_words_in_sentence("Korea (Democratic People's Republic of)")
    assert result == ['Korea', 'Democratic', "People's", 'Republic', 'of']

    # Find matching text in strings
    a = 'The quick brown fox jumped over the lazy dog. It was so fast!'
    b = 'The quicker brown fox leapt over the slower fox. It was so fast!'
    c = 'The quick brown fox climbed over the lazy dog. It was so fast!'
    result = get_matching_text([a, b, c], match_min_size=10)
    assert result == ' brown fox  over the  It was so fast!'
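
Word extraction of this kind can be approximated with a single regular expression. The sketch below is an assumption about one way to do it (the name ``words_in_sentence`` is hypothetical) and may treat edge cases differently from the library.

```python
import re


def words_in_sentence(sentence):
    """Return runs of word characters, keeping apostrophes inside words."""
    return re.findall(r"[\w']+", sentence)


words = words_in_sentence("Korea (Democratic People's Republic of)")
# ['Korea', 'Democratic', "People's", 'Republic', 'of']
```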

.. |Build_Status| image:: https://travis-ci.org/OCHA-DAP/hdx-python-utilities.svg?branch=master
    :alt: Travis-CI Build Status
    :target: https://travis-ci.org/OCHA-DAP/hdx-python-utilities
.. |Coverage_Status| image:: https://coveralls.io/repos/github/OCHA-DAP/hdx-python-utilities/badge.svg?branch=master
    :alt: Coveralls Build Status
    :target: https://coveralls.io/github/OCHA-DAP/hdx-python-utilities?branch=master
.. _Humanitarian Data Exchange: https://data.humdata.org/



