
.. _debug_faq:

=========================================
Debugging Theano: FAQ and Troubleshooting
=========================================

There are many kinds of bugs that might come up in a computer program.
This page is structured as an FAQ.  It should provide recipes to tackle common
problems, and introduce some of the tools that we use to find problems in our
Theano code, and even (it happens) in Theano's internals, such as
:ref:`using_debugmode`.

Isolating the problem/Testing Theano compiler
---------------------------------------------

You can run your Theano function in a DebugMode(:ref:`using_debugmode`). This test the Theano optimizations and help to find where NaN, inf and other problem come from.


How do I print an intermediate value in a Function/Method?
----------------------------------------------------------

Theano provides a 'Print' Op to do this.

.. code-block:: python

    x = theano.tensor.dvector('x')

    x_printed = theano.printing.Print('this is a very important value')(x)

    f = theano.function([x], x * 5)
    f_with_print = theano.function([x], x_printed * 5)

    #this runs the graph without any printing
    assert numpy.all( f([1,2,3]) == [5, 10, 15])

    #this runs the graph with the message, and value printed
    assert numpy.all( f_with_print([1,2,3]) == [5, 10, 15])


Since Theano runs your program in a topological order, you won't have precise
control over the order in which multiple Print() Ops are evaluted.  For a more
precise inspection of what's being computed where, when, and how, see the
:ref:`faq_wraplinker`.


How do I print a graph (before or after compilation)?
----------------------------------------------------------

Theano provides two functions (:func:`theano.pp` and
:func:`theano.printing.debugprint`) to print a graph to the terminal before or after
compilation.  These two functions print expression graphs in different ways:
:func:`pp` is more compact and math-like, :func:`debugprint` is more verbose.
Theano also provides :func:`pydotprint` that creates a png image of the function.

You can read about them in :ref:`libdoc_printing`.



The function I compiled is too slow, what's up?
-----------------------------------------------
First, make sure you're running in FAST_RUN mode.  
FAST_RUN is the default mode, but make sure by passing ``mode='FAST_RUN'``
to ``theano.function`` (or ``theano.make``) or by setting :attr:`config.mode`
to ``FAST_RUN``.

Second, try the theano :ref:`using_profilemode`.  This will tell you which
Apply nodes, and which Ops are eating up your CPU cycles.

Tips:

* use the flags floatX=float32 to use float32 instead of float64 for the theano type matrix(),vector(),...(if you used dmatrix, dvector() they stay at float64).
* Check that in the profile mode that there is no Dot operation and you're multiplying two matrices of the same type. Dot should be optimized to dot22 when the inputs are matrices and of the same type. This can happen when using floatX=float32 and something in the graph makes one of the inputs float64.

.. _faq_wraplinker:

How do I step through a compiled function with the WrapLinker?
--------------------------------------------------------------

This is not exactly an FAQ, but the doc is here for now...
It's pretty easy to roll-your-own evaluation mode.
Check out this one:

.. code-block:: python

    class PrintEverythingMode(Mode):
        def __init__(self):
            def print_eval(i, node, fn):
                print i, node, [input[0] for input in fn.inputs],
                fn()
                print [output[0] for output in fn.outputs]
            wrap_linker = theano.gof.WrapLinkerMany([theano.gof.OpWiseCLinker()], [print_eval])
            super(PrintEverythingMode, self).__init__(wrap_linker, optimizer='fast_run')

When you use ``mode=PrintEverythingMode()`` as the mode for Function or Method,
then you should see [potentially a lot of] output.  Every Apply node will be printed out,
along with its position in the graph, the arguments to the ``perform`` or
``c_code`` and the output it computed.  

>>> x = T.dscalar('x')
>>> f = function([x], [5*x], mode=PrintEverythingMode())
>>> f(3)
>>> # print: 0 Elemwise{mul,no_inplace}(5, x) [array(5, dtype=int8), array(3.0)] [array(15.0)]
>>> # print: [array(15.0)]

Admittedly, this may be a huge amount of
output to read through if you are using big tensors... but you can choose to
put logic inside of the print_eval function  that would, for example, only
print something out if a certain kind of Op was used, at a certain program
position, or if a particular value shows up in one of the inputs or outputs.
Use your imagination :)

.. TODO: documentation for link.WrapLinkerMany

This can be a really powerful debugging tool.
Note the call to ``fn`` inside the call to ``print_eval``; without it, the graph wouldn't get computed at all!

How to use pdb ?
----------------

In the majority of cases, you won't be executing from the interactive shell
but from a set of Python scripts. In such cases, the use of the Python
debugger can come in handy, especially as your models become more complex.
Intermediate results don't necessarily have a clear name and you can get
exceptions which are hard to decipher, due to the "compiled" nature of
functions.

Consider this example script ("ex.py"):

.. code-block:: python

        import theano
        import numpy
        import theano.tensor as T

        a = T.dmatrix('a')
        b = T.dmatrix('b')

        f = theano.function([a,b], [a*b])

        # matrices chosen so dimensions are unsuitable for multiplication
        mat1 = numpy.arange(12).reshape((3,4))
        mat2 = numpy.arange(25).reshape((5,5))

        f(mat1, mat2)

This is actually so simple the debugging could be done easily, but it's for
illustrative purposes. As the matrices can't be element-wise multiplied
(unsuitable shapes), we get the following exception:

.. code-block:: text

    File "ex.py", line 14, in <module>
      f(mat1, mat2)
    File "/u/username/Theano/theano/compile/function_module.py", line 451, in __call__
    File "/u/username/Theano/theano/gof/link.py", line 271, in streamline_default_f
    File "/u/username/Theano/theano/gof/link.py", line 267, in streamline_default_f
    File "/u/username/Theano/theano/gof/cc.py", line 1049, in execute ValueError: ('Input dimension mis-match. (input[0].shape[0] = 3, input[1].shape[0] = 5)', Elemwise{mul,no_inplace}(a, b), Elemwise{mul,no_inplace}(a, b))

The call stack contains a few useful informations to trace back the source
of the error. There's the script where the compiled function was called --
but if you're using (improperly parameterized) prebuilt modules, the error
might originate from ops in these modules, not this script. The last line
tells us about the Op that caused the exception. In thise case it's a "mul"
involving Variables name "a" and "b". But suppose we instead had an
intermediate result to which we hadn't given a name.

After learning a few things about the graph structure in Theano, we can use
the Python debugger to explore the graph, and then we can get runtime
information about the error. Matrix dimensions, especially, are useful to
pinpoint the source of the error. In the printout, there are also 2 of the 4
dimensions of the matrices involved, but for the sake of example say we'd
need the other dimensions to pinpoint the error. First, we re-launch with
the debugger module and run the program with "c":

.. code-block:: text

    python -m pdb ex.py
    > /u/username/experiments/doctmp1/ex.py(1)<module>()
    -> import theano
    (Pdb) c

Then we get back the above error printout, but the interpreter breaks in
that state. Useful commands here are

* "up" and "down" (to move up and down the call stack),
* "l" (to print code around the line in the current stack position),
* "p variable_name" (to print the string representation of 'variable_name'),
* "p dir(object_name)", using the Python dir() function to print the list of an object's members

Here, for example, I do "up", and a simple "l" shows me there's a local
variable "node". This is the "node" from the computation graph, so by
following the "node.inputs", "node.owner" and "node.outputs" links I can
explore around the graph.

That graph is purely symbolic (no data, just symbols to manipulate it
abstractly). To get information about the actual parameters, you explore the
"thunks" objects, which bind the storage for the inputs (and outputs) with
the function itself (a "thunk" is a concept related to closures). Here, to
get the current node's first input's shape, you'd therefore do "p
thunk.inputs[0][0].shape", which prints out "(3, 4)".

