
.. _debug_faq:

=========================================
Debugging Theano: FAQ and Troubleshooting
=========================================

There are many kinds of bugs that might come up in a computer program.
This page is structured as a FAQ.  It provides recipes to tackle common
problems, and introduces some of the tools that we use to find problems in our
own Theano code, and even (it happens) in Theano's internals, in
:ref:`using_debugmode`.

Isolating the Problem/Testing Theano Compiler
---------------------------------------------

You can run your Theano function in a :ref:`DebugMode<using_debugmode>`.
This tests the Theano optimizations and helps to find where NaN, inf and other problems come from.


Using Test Values
-----------------

As of v.0.4.0, Theano has a new mechanism by which graphs are executed
on-the-fly, before a ``theano.function`` is ever compiled. Since optimizations
haven't been applied at this stage, it is easier for the user to locate the
source of some bug. This functionality is enabled through the config flag
``theano.config.compute_test_value``. Its use is best shown through the
following example.


.. code-block:: python

    # compute_test_value is 'off' by default, meaning this feature is inactive
    theano.config.compute_test_value = 'off'

    # configure shared variables
    W1val = numpy.random.rand(2, 10, 10).astype(theano.config.floatX)
    W1 = theano.shared(W1val, 'W1')
    W2val = numpy.random.rand(15, 20).astype(theano.config.floatX)
    W2 = theano.shared(W2val, 'W2')

    # input which will be of shape (5,10)
    x  = T.matrix('x')

    # transform the shared variable in some way. Theano does not
    # know off hand that the matrix func_of_W1 has shape (20, 10)
    func_of_W1 = W1.dimshuffle(2, 0, 1).flatten(2).T

    # source of error: dot product of 5x10 with 20x10
    h1 = T.dot(x, func_of_W1)

    # do more stuff
    h2 = T.dot(h1, W2.T)

    # compile and call the actual function
    f = theano.function([x], h2)
    f(numpy.random.rand(5, 10))

Running the above code generates the following error message:

.. code-block:: bash

    Definition in:
      File "/u/desjagui/workspace/PYTHON/theano/gof/opt.py", line 1102, in apply
        lopt_change = self.process_node(fgraph, node, lopt)
      File "/u/desjagui/workspace/PYTHON/theano/gof/opt.py", line 882, in process_node
        replacements = lopt.transform(node)
      File "/u/desjagui/workspace/PYTHON/Theano/theano/tensor/blas.py", line 1030, in local_dot_to_dot22
        return [_dot22(*node.inputs)]
      File "/u/desjagui/workspace/PYTHON/Theano/theano/gof/op.py", line 324, in __call__
        self.add_tag_trace(node)
    For the full definition stack trace set the Theano flags traceback.limit to -1

    Traceback (most recent call last):
      File "test.py", line 29, in <module>
        f(numpy.random.rand(5,10))
      File "/u/desjagui/workspace/PYTHON/theano/compile/function_module.py", line 596, in __call__
        self.fn()
      File "/u/desjagui/workspace/PYTHON/theano/gof/link.py", line 288, in streamline_default_f
        raise_with_op(node)
      File "/u/desjagui/workspace/PYTHON/theano/gof/link.py", line 284, in streamline_default_f
        thunk()
      File "/u/desjagui/workspace/PYTHON/Theano/theano/gof/cc.py", line 1111, in execute
        raise exc_type, exc_value, exc_trace
    ValueError: ('Shape mismatch: x has 10 cols but y has 20 rows',
    _dot22(x, <TensorType(float64, matrix)>), [_dot22.0],
    _dot22(x, InplaceDimShuffle{1,0}.0), 'Sequence id of Apply node=4')

Needless to say, the above is not very informative and does not provide much in
the way of guidance. However, by instrumenting the code ever so slightly, we
can get Theano to reveal the exact source of the error.

.. code-block:: python

    # enable on-the-fly graph computations
    theano.config.compute_test_value = 'warn'

    ...

    # input which will be of shape (5, 10)
    x  = T.matrix('x')
    # provide Theano with a default test-value
    x.tag.test_value = numpy.random.rand(5, 10)

In the above, we are tagging the symbolic matrix *x* with a special test
value. This allows Theano to evaluate symbolic expressions on-the-fly (by
calling the ``perform`` method of each op), as they are being defined. Sources
of error can thus be identified with much more precision and much earlier in
the compilation pipeline. For example, running the above code yields the
following error message, which properly identifies *line 23* as the culprit.

.. code-block:: bash

    Traceback (most recent call last):
      File "test2.py", line 23, in <module>
        h1 = T.dot(x,func_of_W1)
      File "/u/desjagui/workspace/PYTHON/Theano/theano/gof/op.py", line 360, in __call__
        node.op.perform(node, input_vals, output_storage)
      File "/u/desjagui/workspace/PYTHON/Theano/theano/tensor/basic.py", line 4458, in perform
        z[0] = numpy.asarray(numpy.dot(x, y))
    ValueError: ('matrices are not aligned', (5, 10), (20, 10))

The ``compute_test_value`` mechanism works as follows:

* Theano ``constants`` and ``shared`` variables are used as is. No need to instrument them.
* A Theano *variable* (i.e. ``dmatrix``, ``vector``, etc.) should be
  given a special test value through the attribute ``tag.test_value``.
* Theano automatically instruments intermediate results. As such, any quantity
  derived from *x* will be given a ``tag.test_value`` automatically.

``compute_test_value`` can take the following values:

* ``off``: Default behavior. This debugging mechanism is inactive.
* ``raise``: Compute test values on the fly. Any variable for which a test
  value is required, but not provided by the user, is treated as an error. An
  exception is raised accordingly.
* ``warn``: Idem, but a warning is issued instead of an *Exception*.
* ``ignore``: Silently ignore the computation of intermediate test values, if a
  variable is missing a test value.

.. note::
  This feature is currently incompatible with ``Scan`` and also with ops
  which do not implement a ``perform`` method.


"How do I Print an Intermediate Value in a Function/Method?"
------------------------------------------------------------

Theano provides a 'Print' op to do this.

.. code-block:: python

    x = theano.tensor.dvector('x')

    x_printed = theano.printing.Print('this is a very important value')(x)

    f = theano.function([x], x * 5)
    f_with_print = theano.function([x], x_printed * 5)

    #this runs the graph without any printing
    assert numpy.all( f([1, 2, 3]) == [5, 10, 15])

    #this runs the graph with the message, and value printed
    assert numpy.all( f_with_print([1, 2, 3]) == [5, 10, 15])


Since Theano runs your program in a topological order, you won't have precise
control over the order in which multiple ``Print()`` ops are evaluted.  For a more
precise inspection of what's being computed where, when, and how, see the discussion
:ref:`faq_monitormode`.

.. warning::

    Using this ``Print`` Theano Op can prevent some Theano
    optimization from being applied. This can also happen with
    stability optimization. So if you use this Print and have nan, try
    to remove them to know if this is the cause or not.


"How do I Print a Graph?" (before or after compilation)
-------------------------------------------------------

.. TODO: dead links in the next paragraph

Theano provides two functions (:func:`theano.pp` and
:func:`theano.printing.debugprint`) to print a graph to the terminal before or after
compilation.  These two functions print expression graphs in different ways:
:func:`pp` is more compact and math-like, :func:`debugprint` is more verbose.
Theano also provides :func:`theano.printing.pydotprint` that creates a png image of the function.

You can read about them in :ref:`libdoc_printing`.



"The Function I Compiled is Too Slow, what's up?"
-------------------------------------------------

First, make sure you're running in ``FAST_RUN`` mode. Even though
``FAST_RUN`` is the default mode, insist by passing ``mode='FAST_RUN'``
to ``theano.function`` (or ``theano.make``) or by setting :attr:`config.mode`
to ``FAST_RUN``.

Second, try the Theano :ref:`using_profilemode`.  This will tell you which
``Apply`` nodes, and which ops are eating up your CPU cycles.

Tips:

* Use the flags ``floatX=float32`` to require type *float32* instead of *float64*;
  Use the Theano constructors matrix(),vector(),... instead of dmatrix(), dvector(),...
  since they respectively involve the default types *float32* and *float64*.
* Check in the ``profile`` mode that there is no ``Dot`` op in the post-compilation
  graph while you are multiplying two matrices of the same type. ``Dot`` should be
  optimized to ``dot22`` when the inputs are matrices and of the same type. This can
  still happen when using ``floatX=float32`` when one of the inputs of the graph is
  of type *float64*.


.. _faq_monitormode:

"How do I Step through a Compiled Function?"
--------------------------------------------

You can use ``MonitorMode`` to inspect the inputs and outputs of each
node being executed when the function is called. The code snipped below
shows how to print all inputs and outputs:

.. code-block:: python

    import theano

    def inspect_inputs(i, node, fn):
        print i, node, "input(s) value(s):", [input[0] for input in fn.inputs],

    def inspect_outputs(i, node, fn):
        print "output(s) value(s):", [output[0] for output in fn.outputs]

    x = theano.tensor.dscalar('x')
    f = theano.function([x], [5 * x],
                        mode=theano.compile.MonitorMode(
                            pre_func=inspect_inputs,
                            post_func=inspect_outputs))
    f(3)

    # The code will print the following:
    #   0 Elemwise{mul,no_inplace}(TensorConstant{5.0}, x) [array(5.0), array(3.0)] [array(15.0)]

When using these ``inspect_inputs`` and ``inspect_outputs`` functions
with ``MonitorMode``, you should see [potentially a lot of] printed output.
Every ``Apply`` node will be printed out,
along with its position in the graph, the arguments to the functions ``perform`` or
``c_code`` and the output it computed.
Admittedly, this may be a huge amount of
output to read through if you are using big tensors... but you can choose to
add logic that would, for instance, print
something out only if a certain kind of op were used, at a certain program
position, or only if a particular value showed up in one of the inputs or outputs.
A typical example is to detect when NaN values are added into computations, which
can be achieved as follows:

.. code-block:: python

    import numpy

    import theano

    def detect_nan(i, node, fn):
        for output in fn.outputs:
            if numpy.isnan(output[0]).any():
                print '*** NaN detected ***'
                theano.printing.debugprint(node)
                print 'Inputs : %s' % [input[0] for input in fn.inputs]
                print 'Outputs: %s' % [output[0] for output in fn.outputs]
                break

    x = theano.tensor.dscalar('x')
    f = theano.function([x], [theano.tensor.log(x) * x],
                        mode=theano.compile.MonitorMode(
                            post_func=detect_nan))
    f(0)  # log(0) * 0 = -inf * 0 = NaN

    # The code above will print:
    #   *** NaN detected ***
    #   Elemwise{Composite{[mul(log(i0), i0)]}} [@A] ''
    #    |x [@B]
    #   Inputs : [array(0.0)]
    #   Outputs: [array(nan)]

To help understand what is happening in your graph, you can
disable the ``local_elemwise_fusion`` and all ``inplace``
optimizations. The first is a speed optimization that merges elemwise
operations together. This makes it harder to know which particular
elemwise causes the problem. The second optimization makes some ops'
outputs overwrite their inputs. So, if an op creates a bad output, you
will not be able to see the input that was overwriten in the ``post_func``
function. To disable those optimizations (with a Theano version after
0.6rc3), define the MonitorMode like this:

.. code-block:: python

   mode = theano.compile.MonitorMode(post_func=detect_nan).excluding(
       'local_elemwise_fusion', 'inplace)
    f = theano.function([x], [theano.tensor.log(x) * x],
                        mode=mode)

.. note::

    The Theano flags ``optimizer_including``, ``optimizer_excluding``
    and ``optimizer_requiring`` aren't used by the MonitorMode, they
    are used only by the ``default`` mode. You can't use the ``default``
    mode with MonitorMode, as you need to define what you monitor.

To be sure all inputs of the node are available during the call to
``post_func``, you must also disable the garbage collector. Otherwise,
the execution of the node can garbage collect its inputs that aren't
needed anymore by the Theano function. This can be done with the Theano
flag:

.. code-block:: cfg

   allow_gc=False



.. TODO: documentation for link.WrapLinkerMany


How to Use pdb
--------------

In the majority of cases, you won't be executing from the interactive shell
but from a set of Python scripts. In such cases, the use of the Python
debugger can come in handy, especially as your models become more complex.
Intermediate results don't necessarily have a clear name and you can get
exceptions which are hard to decipher, due to the "compiled" nature of the
functions.

Consider this example script ("ex.py"):

.. code-block:: python

        import theano
        import numpy
        import theano.tensor as T

        a = T.dmatrix('a')
        b = T.dmatrix('b')

        f = theano.function([a, b], [a * b])

        # matrices chosen so dimensions are unsuitable for multiplication
        mat1 = numpy.arange(12).reshape((3, 4))
        mat2 = numpy.arange(25).reshape((5, 5))

        f(mat1, mat2)

This is actually so simple the debugging could be done easily, but it's for
illustrative purposes. As the matrices can't be multiplied element-wise
(unsuitable shapes), we get the following exception:

.. code-block:: text

    File "ex.py", line 14, in <module>
      f(mat1, mat2)
    File "/u/username/Theano/theano/compile/function_module.py", line 451, in __call__
    File "/u/username/Theano/theano/gof/link.py", line 271, in streamline_default_f
    File "/u/username/Theano/theano/gof/link.py", line 267, in streamline_default_f
    File "/u/username/Theano/theano/gof/cc.py", line 1049, in execute ValueError: ('Input dimension mis-match. (input[0].shape[0] = 3, input[1].shape[0] = 5)', Elemwise{mul,no_inplace}(a, b), Elemwise{mul,no_inplace}(a, b))

The call stack contains some useful information to trace back the source
of the error. There's the script where the compiled function was called --
but if you're using (improperly parameterized) prebuilt modules, the error
might originate from ops in these modules, not this script. The last line
tells us about the op that caused the exception. In this case it's a "mul"
involving variables with names "a" and "b". But suppose we instead had an
intermediate result to which we hadn't given a name.

After learning a few things about the graph structure in Theano, we can use
the Python debugger to explore the graph, and then we can get runtime
information about the error. Matrix dimensions, especially, are useful to
pinpoint the source of the error. In the printout, there are also 2 of the 4
dimensions of the matrices involved, but for the sake of example say we'd
need the other dimensions to pinpoint the error. First, we re-launch with
the debugger module and run the program with "c":

.. code-block:: text

    python -m pdb ex.py
    > /u/username/experiments/doctmp1/ex.py(1)<module>()
    -> import theano
    (Pdb) c

Then we get back the above error printout, but the interpreter breaks in
that state. Useful commands here are

* "up" and "down" (to move up and down the call stack),
* "l" (to print code around the line in the current stack position),
* "p variable_name" (to print the string representation of 'variable_name'),
* "p dir(object_name)", using the Python dir() function to print the list of an object's members

Here, for example, I do "up", and a simple "l" shows me there's a local
variable "node". This is the "node" from the computation graph, so by
following the "node.inputs", "node.owner" and "node.outputs" links I can
explore around the graph.

That graph is purely symbolic (no data, just symbols to manipulate it
abstractly). To get information about the actual parameters, you explore the
"thunk" objects, which bind the storage for the inputs (and outputs) with
the function itself (a "thunk" is a concept related to closures). Here, to
get the current node's first input's shape, you'd therefore do "p
thunk.inputs[0][0].shape", which prints out "(3, 4)".

