.. _parallel-computation:

Parallel Computation With YT
============================

YT has been instrumented to compute many (indeed, most) quantities in
parallel.  It uses the package
`mpi4py <http://code.google.com/p/mpi4py>`_ to parallelize via the Message
Passing Interface (MPI), which is typically installed on clusters.

.. _capabilities:

Capabilities
------------

Currently, YT is able to perform the following actions in parallel:

 * Projections
 * Slices
 * Cutting planes (oblique slices)
 * Derived Quantities (total mass, angular momentum, etc.)
 * 1-, 2- and 3-D profiles
 * Halo finding

This list covers just about every action YT can take!  Additionally, almost all
scripts will benefit from parallelization without any modification.  The goal
of Parallel-YT has been to retain API compatibility and to abstract away all
parallelism.

Setting Up Parallel YT
----------------------

To run scripts in parallel, you must first install `mpi4py <http://code.google.com/p/mpi4py>`_.
Instructions for doing so are provided on the mpi4py website.  Once that is
done, no further setup is needed: just launch your scripts with ``mpirun`` and
pass the ``--parallel`` flag to signal to YT that you want them to run in
parallel.
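
Before running YT itself, it can help to confirm that mpi4py works under
``mpirun``.  The following is a minimal sketch that is not part of YT; the file
name ``check_mpi.py`` and the helper ``mpi_status`` are hypothetical.  It uses
mpi4py's ``MPI.COMM_WORLD`` communicator to report each process's rank and the
total number of processes:

.. code-block:: python

   # check_mpi.py (hypothetical name): a quick mpi4py sanity check, not part of YT.


   def mpi_status():
       """Return (rank, size) for this process, or None if mpi4py is unavailable."""
       try:
           from mpi4py import MPI
       except ImportError:
           return None
       comm = MPI.COMM_WORLD  # communicator spanning every process mpirun started
       return comm.Get_rank(), comm.Get_size()


   if __name__ == "__main__":
       status = mpi_status()
       if status is None:
           print("mpi4py is not installed")
       else:
           print("process %d of %d" % status)

Launched with ``mpirun -np 4 python check_mpi.py``, a working installation
should print four lines, one per process, with ranks 0 through 3.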

For instance, consider the following script, which we'll save as ``my_script.py``:

.. code-block:: python

   from yt.mods import *
   # Load the dataset
   pf = load("RD0035/RedshiftOutput0035")
   # Find the maximum density value (v) and its location (c)
   v, c = pf.h.find_max("Density")
   print v, c
   # Project density along the x-axis (axis 0) and save the image
   pc = PlotCollection(pf, center = [0.5, 0.5, 0.5])
   pc.add_projection("Density", 0)
   pc.save()

When launched in parallel, this script performs both the maximum-density search
and the projection in parallel.  To launch it on 16 processors, execute the
following at the command line:

.. code-block:: bash

   $ mpirun -np 16 python2.6 my_script.py --parallel

If you run into problems, you can use :ref:`remote-debugging` to examine what
went wrong.

.. warning:: If you interact with the filesystem directly rather than through
   YT, you must ensure that those operations execute only on the root
   processor.  You can do this with the function :func:`only_on_root`.
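
The pattern behind ``only_on_root`` can be sketched as follows.  This is a
hypothetical stand-in (``only_on_root_sketch`` and ``write_summary`` are
illustrative names, not YT functions) that queries mpi4py directly and treats a
run without mpi4py as a single root process; in a real script, prefer YT's own
:func:`only_on_root`.

.. code-block:: python

   import os
   import tempfile

   try:
       from mpi4py import MPI
       RANK = MPI.COMM_WORLD.Get_rank()
   except ImportError:
       RANK = 0  # no mpi4py: behave as a single process, which is the root


   def only_on_root_sketch(func, *args, **kwargs):
       """Hypothetical helper: call func on rank 0 only; other ranks get None."""
       if RANK != 0:
           return None
       return func(*args, **kwargs)


   def write_summary(path, text):
       """Hypothetical file-writing task that should run exactly once."""
       with open(path, "w") as f:
           f.write(text)
       return path


   if __name__ == "__main__":
       out = os.path.join(tempfile.gettempdir(), "summary.txt")
       # Without the guard, every process would try to write the same file.
       only_on_root_sketch(write_summary, out, "maximum density found\n")

Returning ``None`` on the other processors keeps the call safe to make
unconditionally from every process.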

All of the actions listed in :ref:`capabilities` work in parallel, and no
additional work is necessary to parallelize them.  Furthermore, the ``yt``
command itself recognizes the ``--parallel`` option, so its commands will work
in parallel as well.

To run in parallel, the Derived Quantities and Profile objects must both be
instantiated with the ``lazy_reader`` option set to ``True``.  This option makes
them operate grid by grid, processing each grid separately rather than all of
the data at once.  In ``yt`` version 1.5 and in the trunk, ``lazy_reader=True``
is now the default.
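
To illustrate what operating grid by grid means, here is a minimal,
self-contained sketch of a total-mass calculation that holds only one grid in
memory at a time and then combines per-processor partial sums.  It does not use
YT: the grid data, ``load_grid``, and ``total_mass_lazy`` are hypothetical, and
the cross-processor reduction that MPI would perform is simulated with a plain
``sum``.

.. code-block:: python

   # Hypothetical grid data: grid id -> (cell densities, cell volume).
   GRIDS = {
       0: ([1.0, 2.0, 3.0], 0.5),
       1: ([4.0, 5.0], 0.25),
       2: ([6.0], 0.125),
       3: ([7.0, 8.0], 0.5),
   }


   def load_grid(grid_id):
       """Stand-in for reading a single grid from disk."""
       return GRIDS[grid_id]


   def total_mass_lazy(grid_ids):
       """Accumulate mass one grid at a time instead of loading everything."""
       total = 0.0
       for gid in grid_ids:
           densities, cell_volume = load_grid(gid)  # only this grid is in memory
           total += sum(densities) * cell_volume
       return total


   def total_mass_parallel(grid_ids, n_procs):
       """Simulate n_procs processors, each summing its own subset of grids."""
       partials = [total_mass_lazy(grid_ids[rank::n_procs])
                   for rank in range(n_procs)]
       return sum(partials)  # with MPI this would be a reduction such as allreduce


   serial = total_mass_lazy(sorted(GRIDS))                 # 13.5
   parallel = total_mass_parallel(sorted(GRIDS), n_procs=2)  # 13.5

Because mass is additive, the partial sums combine with a single reduction; a
weighted average would instead need both the weighted sum and the total weight
carried through the reduction.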

Types of Parallelism
--------------------

To divide up the work, YT sends different tasks to different processors.  To
minimize inter-process communication, however, YT decomposes the data in
different ways depending on the task.

Spatial Decomposition
+++++++++++++++++++++

In spatial decomposition, the hierarchy is divided either along all three axes
or, for projections, along the image plane.  This type of parallelism is
generally less efficient than grid-based parallelism, but it has been shown to
give good results.
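
A rough sketch of spatial decomposition follows.  It splits the unit cube among
``n_procs`` processors either along all three axes or, for a projection, only
along the two image-plane axes, so each processor keeps the full line of sight.
The factorization strategy and the names ``split_counts`` and ``subdomain`` are
assumptions for illustration, not YT's actual algorithm.

.. code-block:: python

   def split_counts(n_procs, n_axes):
       """Factor n_procs into n_axes cut counts whose product is n_procs."""
       primes, remaining, f = [], n_procs, 2
       while remaining > 1:
           while remaining % f == 0:
               primes.append(f)
               remaining //= f
           f += 1
       counts = [1] * n_axes
       # Give each prime factor (largest first) to the axis with the fewest cuts.
       for p in sorted(primes, reverse=True):
           i = counts.index(min(counts))
           counts[i] *= p
       return counts


   def subdomain(rank, n_procs, axes=(0, 1, 2)):
       """Return (left_edge, right_edge) of this rank's piece of the unit cube.

       Only the listed axes are cut; for a projection along axis k, pass the
       two image-plane axes so the line-of-sight axis stays whole.
       """
       counts = split_counts(n_procs, len(axes))
       left, right = [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]
       index = rank
       for axis, n in zip(axes, counts):
           i, index = index % n, index // n  # unravel rank into a per-axis index
           left[axis], right[axis] = i / float(n), (i + 1) / float(n)
       return left, right


   # 16 processors, projection along x (axis 0): cut only y and z.
   left, right = subdomain(5, 16, axes=(1, 2))
   # left == [0.0, 0.25, 0.25], right == [1.0, 0.5, 0.5]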

Grid Decomposition
++++++++++++++++++

The alternative to spatial decomposition is a simple round-robin distribution
of the grids among processors.  This allows YT to pool data access to a given
Enzo data file, which ultimately results in faster read times and better
parallelism.
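
A round-robin assignment can be sketched in a few lines.  In this hypothetical
example each grid is a ``(grid_id, filename)`` pair; processor ``rank`` out of
``size`` takes every ``size``-th grid starting at ``rank``, and then groups its
grids by file so that each data file it needs is opened once.  The names and
the grouping step are illustrative assumptions, not YT's internals.

.. code-block:: python

   from collections import defaultdict


   def my_grids(grids, rank, size):
       """Round-robin: processor `rank` of `size` gets grids rank, rank+size, ..."""
       return grids[rank::size]


   def group_by_file(grids):
       """Group this processor's grid ids by data file, preserving order."""
       by_file = defaultdict(list)
       for grid_id, filename in grids:
           by_file[filename].append(grid_id)
       return dict(by_file)


   # Hypothetical hierarchy: 6 grids spread over 2 Enzo data files.
   grids = [(0, "cpu0000"), (1, "cpu0000"), (2, "cpu0000"),
            (3, "cpu0001"), (4, "cpu0001"), (5, "cpu0001")]
   mine = my_grids(grids, rank=1, size=2)
   # mine == [(1, "cpu0000"), (3, "cpu0001"), (5, "cpu0001")]
   # group_by_file(mine) == {"cpu0000": [1], "cpu0001": [3, 5]}

Unlike spatial decomposition, this requires no knowledge of grid positions,
only the list of grids, which is part of why it is simple.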
