Metadata-Version: 2.0
Name: textdata
Version: 1.0.3
Summary: Get clean line or text data from multi-line strings
Home-page: https://bitbucket.org/jeunice/textdata
Author: Jonathan Eunice
Author-email: jonathan.eunice@gmail.com
License: UNKNOWN
Keywords: text data lines dedent
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Operating System :: OS Independent
Classifier: License :: OSI Approved :: BSD License
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 2.6
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3.2
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Topic :: Software Development :: Libraries :: Python Modules

It's very common to need to extract text or text lines from within
program source. The way Python likes to have its text indented,
however, means that there will often be extra spaces appended to
the beginning of each line, as well as possibly extra lines at the
start and end of the text that are present for inclusion in the program
source, but not useful in the resulting data.

Python string methods give easy ways to clean this text up, but
it's no joy reinventing that particular text-cleanup wheel every
time you need it--especially since many of the details are nitsy,
dropping the code down into low-level constructs rather than
just "give me the text!"

This module helps clean up included text (or text lines) in a simple,
reusable way that won't muck up your programs with extra code, or
require constant wheel-reinvention.

Usage
=====

::

    data = lines("""
        There was an old woman who lived in a shoe.
        She had so many children, she didn't know what to do;
        She gave them some broth without any bread;
        Then whipped them all soundly and put them to bed.
    """)

will result in::

    ['There was an old woman who lived in a shoe.',
     "She had so many children, she didn't know what to do;",
     'She gave them some broth without any bread;',
     'Then whipped them all soundly and put them to bed.']

If instead you used ``textlines()``, the result is the same, but
joined by newlines into into a single string::

    "There was an old woman who lived in a shoe.\nShe ... to bed."
    # where the ... abbreviates exactly the characters you'd expect

``textlines`` is an optional entry point, as ``lines`` has a ``join``
kwarg that, if set, joins the lines with that string.

Both routines provide  typically-desired cleanups:

  * remove blank lines default), but at least first and last blanks
    (which usually appear due to Python formatting)
  * remove common line prefix (default)
  * strip leading/trailing spaces (leading by request, trailing by default)
  * (optionally) join the lines together with your choice of separator string

The API
=======

``lines(text, noblanks=True, dedent=True, lstrip=False, rstrip=True, join=False)``

    Returns text as a series of cleaned-up lines.

    * ``text`` is the text to be processed.
    * ``noblanks`` => all blank lines are eliminated, not just starting and ending ones. (default ``True``).
    * ``dedent`` => strip a common prefix (usually whitespace) from each line (default ``True``).
    * ``lstrip`` => strip all left (leading) space from each line (default ``False``).
      Note that ``lstrip`` and ``dedent`` are  mutually exclusive ways of handling leading space.
    * ``rstrip`` => strip all right (trailing) space from each line (default ``True``)
    * ``join`` => either ``False`` (do nothing), ``True`` (concatenate lines), or a string that will be used to join the resulting lines (default ``False``)

``textlines(text, noblanks=True, dedent=True, lstrip=False, rstrip=True, join=False)``

    Does the same helpful cleanups as ``lines()``, but returns
    result as a single string, with lines separated by newlines (by
    default) and without a trailing newline.

Notes
=====

  * Automated multi-version testing accomplished with
    `pytest <http://pypi.python.org/pypi/pytest>`_ and `tox
    <http://pypi.python.org/pypi/tox>`_. Latest version successfully
    tested against Python 2.6, 2.7, 3.2, 3.3, 3.4, and PyPy 2.4.0 (based
    on 2.7.8). It should also work on Python 2.5, though
    it's no longer officially supported; time to upgrade!

  * Common line prefix is now computed without considering blank
    lines, so blank lines need not have any indentation on them
    just to "make things work."

  * The tricky case where all lines have a common prefix, but it's
    not entirely composed of whitespace, now properly handled.
    This is useful for lines that are already "quoted" such as
    with leading `"|"` or `">"` symbols (common in Markdown
    and old-school email usage styles)/

  * ``textlines()`` is now somewhat superfluous, now that ``lines()``
    has a ``join`` kwarg.  But you may prefer it for the implicit
    indication that it's turning lines into text.

  * It's tempting to define a constant such as ``Dedent`` that might
    be the default for the ``lstrip`` parameter, instead of having
    separate ``dedent`` and ``lstrip`` Booleans. The more I use
    singleton classes in Python as designated special values, the
    more useful they seem.

  * The author, `Jonathan Eunice <mailto:jonathan.eunice@gmail.com>`_
    or `@jeunice on Twitter <http://twitter.com/jeunice>`_ welcomes
    your comments and suggestions.

Installation
============

::

    pip install -U textdata

To ``easy_install`` under a specific Python version (3.3 in this example)::

    python3.3 -m easy_install --upgrade textdata

(You may need to prefix these with "sudo " to authorize installation.)


