Analysis Philosophy
===================
.. sectionauthor:: J. S. Oishi <jsoishi@astro.berkeley.edu>

There are many tools available for analysis and visualization of AMR
data; there are many just for ``enzo``. So why ``yt``? Along the road
to answering that question, we shall take a somewhat philosophical
scenic route. For the more pragmatically minded, the answer is simple:
what ``yt`` does not yet do, you can make it do. This is not as
glib as it may seem: it is in fact the main philosophical tenet that
underlies ``yt``. In this section, it is not our goal to show you just
how much ``yt`` already does. Instead, we will discuss how it is that
``yt`` does anything at all. In doing so, we hope to give you a sense
of whether or not ``yt`` will align with your science goals.

At its core, ``yt`` is not a set of scripts to visualize AMR data, nor
is it a set of low-level routines that return a homo- or even
heterogeneous set of gridded data to your favorite scientific
programming language, though ``yt`` incorporates both of these things
if your favorite scientific language is Python. Instead, ``yt``
provides a series of objects, some common AMR code structures (such as
hierarchies and levels in a nested mesh) and some physical (a
cylinder, cube, or sphere somewhere in the problem domain), that allow
you to process AMR data in order to get at the fundamental underlying
physics. 


Design Goals
------------

``yt`` evolved naturally out of three design goals, though when Matt
was busy writing it, he never really thought about them.  Over
time, it became clear that they are real and furthermore that they
are important to understanding how to use ``yt``.  These three goals
are directed analysis, repeatability, and data exploration. 

Directed Analysis: Answering a Question
+++++++++++++++++++++++++++++++++++++++

One of the main motivators for ``yt`` is to make it possible to sit
down with a definite question about an AMR dataset and code up a
script that will provide an answer to that question. Indeed, much of its
object-oriented nature can be viewed as a way to perform operations on a
data object. Given that AMR simulations are usually used to track some
kind of structure formation, be it shocks, stars, or galaxies, the
data object may not be the entire domain, but some region within it
that is interesting. With this data object in hand, ``yt`` makes it easy
(or, for some tasks, merely *possible*) to manipulate the data in such
a way as to answer a question related to your research.

Repeatability
+++++++++++++

In any scientific analysis, being able to repeat the set of steps that
prepared an answer or physical quantity is essential.  To that end,
much of the usage of ``yt`` is focused around running scripts,
describing actions and plots programmatically.  Being able to write a
script or run a set of commands that will reproduce identical
results is fundamentally important, and ``yt`` attempts to make
that easy.  It's for this reason that the interactive features of
``yt`` are not always as advanced as they might otherwise be. We are
actively working on integrating the SAGE notebook system into ``yt``,
which our preliminary tests suggest is a nice compromise between
interactivity and repeatability. 

Exploration
+++++++++++

However, science is often serendipitous, and the right question is
frequently not obvious at first. This is certainly true for
astrophysical simulation, and especially so for simulations of structure
formation. What are we looking for, and how will we know when we find
it? 

Quite often, the best way forward is to explore the simulation data as
freely as possible.  Without the ability for spot-examination,
serendipitous discovery or general wandering, the code would be simply
a pipeline, rather than a general tool. The flexible extensibility of
``yt`` (that is, the ability to create new derived quantities easily),
together with the ability to extract and display data regions in a
variety of ways, allows for this exploration.

.. _philo-objects:

Object Methodology
------------------

``yt`` follows a strong object-oriented methodology.  There is no real
global state of ``yt``; all state is contained within objects that
encapsulate an AMR code object or physical region.

Physical Objects vs Code Objects
++++++++++++++++++++++++++++++++

The best way to think about doing things with ``yt`` is to think first
of objects. The AMR code puts a number of objects on disk, and ``yt``
has a matching set of objects to mimic these as closely as possible. Your
code runs (hopefully) a simulacrum of the physical universe, and thus
in order to make sense of the output data, ``yt`` provides a set of
objects meant to mimic the kinds of physical regions and processes you
are interested in. For example, in a simulation of star formation out
of some larger structure (the cosmic dark matter web, a turbulent
molecular cloud), you might be interested in a sphere one parsec in
radius around the point of maximum density. In a simulation of an
accretion disk, you might want a cylindrical region of 1000 AU in
radius and 10 AU in height with its axial vector aligned with the net
angular momentum vector, which may be arbitrary with respect to the
simulation cardinal axes. These are physical objects, and ``yt`` has a
set of these too. Finally, you may wish to reduce the data to a compact
product that represents a specific process. These
reductions are also objects, and they are included in ``yt`` as well.

Somewhat separate from this, but in the same spirit, are plots. In
``yt``, plots are also objects that one can create, manipulate, and
save. In the case of plots, however, you tell ``yt`` what you want to
see, and it can fetch data from the appropriate source. 

In list form,

   Code Objects
     These are things that are on the disk that the AMR code knows about --
     things like grids, data dumps, the grid hierarchy and so on.
   Physical Objects
     These are objects like spheres, rectangular prisms, slices, and
     so on. These are collections of related data arranged by physical
     properties, and they are not necessarily associated with a single
     code object.
   Reduced Objects
     These are objects created by taking a set of data and reducing it
     into a smaller format, suitable for a specific purpose.
     Histograms, 1-D profiles, and averages are all members of this
     category.
   Plots
     Plots are somewhat different from other objects, as they are
     neither physical nor code. Instead, the plotting interface
     accepts information about what you want to see, then goes and
     fetches what it needs from code, physical, and reduced
     objects.
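
To make these categories concrete, here is a minimal, self-contained sketch in plain Python. It is *not* ``yt``'s API: the ``Grid`` and ``Sphere`` classes and the ``radial_profile`` function are hypothetical stand-ins for a code object, a physical object, and a reduced object, assuming a single uniform grid of cubic cells.

.. code-block:: python

    import math
    from dataclasses import dataclass


    @dataclass
    class Grid:
        """Hypothetical code object: one uniform grid of cubic cells."""
        left_edge: tuple  # (x, y, z) of the grid's lower corner
        dx: float         # cell width along every axis
        density: list     # nested list indexed as density[i][j][k]

        def cells(self):
            """Yield (cell_center, density) for every cell in the grid."""
            ni, nj, nk = len(self.density), len(self.density[0]), len(self.density[0][0])
            for i in range(ni):
                for j in range(nj):
                    for k in range(nk):
                        center = tuple(edge + (idx + 0.5) * self.dx
                                       for edge, idx in zip(self.left_edge, (i, j, k)))
                        yield center, self.density[i][j][k]


    class Sphere:
        """Hypothetical physical object: every cell whose center lies within radius."""

        def __init__(self, grids, center, radius):
            self.grids, self.center, self.radius = grids, center, radius

        def cells(self):
            # A physical region may draw cells from any number of code objects.
            for grid in self.grids:
                for c, rho in grid.cells():
                    if math.dist(c, self.center) <= self.radius:
                        yield c, rho


    def radial_profile(region, n_bins):
        """Hypothetical reduced object: mean density in equal-width radial bins."""
        sums, counts = [0.0] * n_bins, [0] * n_bins
        for c, rho in region.cells():
            b = min(int(math.dist(c, region.center) / region.radius * n_bins), n_bins - 1)
            sums[b] += rho
            counts[b] += 1
        return [s / n if n else 0.0 for s, n in zip(sums, counts)]


    # A 4x4x4 grid of unit density containing one dense cell.
    rho = [[[1.0] * 4 for _ in range(4)] for _ in range(4)]
    rho[1][1][1] = 10.0
    grid = Grid(left_edge=(0.0, 0.0, 0.0), dx=0.25, density=rho)

    # A sphere around the point of maximum density, reduced to a radial profile.
    peak, _ = max(grid.cells(), key=lambda cell: cell[1])
    sphere = Sphere([grid], center=peak, radius=0.3)
    profile = radial_profile(sphere, n_bins=2)

In this sketch the sphere never copies the grid; it filters cells on demand, so the physical region stays independent of how the code happened to lay out its grids.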

Flexible Projections: an Example of Reusable Data Reduction
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

AMR is best applied when the dynamic range in a quantity of
interest (length or mass scales, typically) is large, but the volume
filling factor of such interesting regions is small. In astronomy,
virtually all available observations are projections on the sky, with
little radial information about the observed structure. In order to
compare with these observations, *projections* are an extremely useful
data reduction for simulations. It is often useful to project at a
given resolution, which may be as fine as that of the most highly refined
grids in the AMR dataset. However, projecting in a given direction through the
full AMR dataset can be quite costly in computing time. ``yt``'s
projection tool saves an *adaptive* projection when it completes this
costly step, allowing you to make 2D images at whatever resolution you
like with very modest computational resources. This idea, that of
saving as much information as you need (and no more) to make the data
reduction flexible for reuse is another core idea behind ``yt``. You
should not have to spend computational resources and precious time to
replot a projection from a 1000x1000 image to a 2000x2000 image. As a
side note, in this specific case, because the 2D data product ``yt``
produces is "smart", it never needs to use an array in memory as large
as the full effective AMR resolution (which could be very large, and
nearly devoid of unique information).
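
The reuse idea can be illustrated with a small sketch. Here we *assume* the adaptive projection is stored as a flat list of non-overlapping square leaves on the image plane, each at the resolution of the finest grid that covered it, and we resample it onto an image of any size by area-weighted averaging. The ``ProjectedCell`` class and ``pixelize`` function are hypothetical and simplified; they are not ``yt``'s internal data structures or its pixelization routine.

.. code-block:: python

    import math
    from dataclasses import dataclass


    @dataclass(frozen=True)
    class ProjectedCell:
        """Hypothetical leaf of an adaptive projection on the unit image plane."""
        x: float      # left edge
        y: float      # bottom edge
        width: float  # side length; finer AMR levels give smaller leaves
        value: float  # projected (line-integrated) field through this leaf


    def pixelize(leaves, n):
        """Resample the stored leaves onto an n x n image, area-weighted.

        Work scales with the number of leaves and the pixels each one touches,
        not with the full effective AMR resolution.
        """
        pix = 1.0 / n
        image = [[0.0] * n for _ in range(n)]
        for leaf in leaves:
            # Pixel index range this leaf can touch (may over-cover by one; harmless).
            c0, c1 = max(0, int(leaf.x * n)), min(n, math.ceil((leaf.x + leaf.width) * n))
            r0, r1 = max(0, int(leaf.y * n)), min(n, math.ceil((leaf.y + leaf.width) * n))
            for row in range(r0, r1):
                oy = min((row + 1) * pix, leaf.y + leaf.width) - max(row * pix, leaf.y)
                for col in range(c0, c1):
                    ox = min((col + 1) * pix, leaf.x + leaf.width) - max(col * pix, leaf.x)
                    if ox > 0 and oy > 0:
                        image[row][col] += leaf.value * ox * oy / (pix * pix)
        return image


    # One coarse level (three 0.5-wide leaves) plus a refined lower-left quadrant.
    leaves = [
        ProjectedCell(0.5, 0.0, 0.5, 1.0),
        ProjectedCell(0.0, 0.5, 0.5, 1.0),
        ProjectedCell(0.5, 0.5, 0.5, 1.0),
        ProjectedCell(0.0, 0.0, 0.25, 2.0),
        ProjectedCell(0.25, 0.0, 0.25, 3.0),
        ProjectedCell(0.0, 0.25, 0.25, 4.0),
        ProjectedCell(0.25, 0.25, 0.25, 5.0),
    ]

    # The same stored projection, rendered at two resolutions without re-projecting.
    small = pixelize(leaves, 2)
    large = pixelize(leaves, 4)

Both images come from the same seven stored leaves: the refined quadrant is averaged down in the 2x2 image and fully resolved in the 4x4 one, and the area-integrated total agrees between them. Area weighting is simply our choice for this sketch; other resampling schemes are possible.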

.. _philo-derived-fields:

Derived Fields and Derived Quantities
-------------------------------------

While the heart of ``yt`` is the large set of basic code, physical,
reduced, and plot objects already developed, in a metaphorical sense,
its 'soul' is the fact that any of the objects can be used as starting
points for creating fields and quantities of your own devising. Derived
quantities and derived fields are the data products ``yt`` computes
from the *primitive* variables the AMR code stores. These may or may
not be the so-called primitive variables of fluid dynamics (density,
velocity, energy): they are whatever your AMR code writes to
disk.

Derived quantities are those data products derived from these
variables such that the total amount of returned data is *less* than
the number of cells. Derived fields, on the other hand, return a field
with the *same* number of cells and the same geometry as the primitive
variables from which they are derived. For example, ``yt`` could
reconstruct the gravitational potential at every point in space from the
density field; by contrast, the total mass within a sphere is a derived
quantity.
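
The distinction can be shown in a few lines of plain Python. This sketch assumes a flat list of cells with hypothetical ``density``, ``velocity`` (one component, for brevity), and ``cell_volume`` arrays; kinetic energy density stands in for a derived field here only because the gravitational potential would require a Poisson solve.

.. code-block:: python

    def kinetic_energy_density(density, velocity):
        """Derived field: one output value per input cell, same layout."""
        return [0.5 * rho * v * v for rho, v in zip(density, velocity)]


    def total_mass(density, cell_volume):
        """Derived quantity: many cells reduced to a single number."""
        return sum(rho * dv for rho, dv in zip(density, cell_volume))


    density = [1.0, 2.0, 4.0]
    velocity = [2.0, 1.0, 0.5]
    cell_volume = [1.0, 1.0, 0.125]  # the last cell sits on a finer AMR level

    ke = kinetic_energy_density(density, velocity)  # three values, one per cell
    mass = total_mass(density, cell_volume)         # a single number

Weighting by cell volume matters in the reduction because AMR cells on different levels have different sizes.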

``yt`` already includes a large number of both derived fields and
quantities, but its real power is that it is easy to create your
own. See :ref:`creating-derived-fields` for detailed instructions on creating
derived fields. 
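
As an illustration of the general pattern only, and *not* of ``yt``'s actual field-registration interface (which :ref:`creating-derived-fields` documents), here is a hypothetical sketch: a registry maps field names to functions, and a data container computes a derived field the first time it is requested, possibly from other derived fields. The names ``DERIVED_FIELDS``, ``derived_field``, and ``DataContainer`` are invented for this example.

.. code-block:: python

    # Hypothetical registry: field name -> function(data) returning a list of values.
    DERIVED_FIELDS = {}


    def derived_field(name):
        """Decorator registering a function as the recipe for field `name`."""
        def register(func):
            DERIVED_FIELDS[name] = func
            return func
        return register


    class DataContainer:
        """Holds on-disk fields; computes derived ones on first access and caches them."""

        def __init__(self, on_disk):
            self._fields = dict(on_disk)

        def __getitem__(self, name):
            if name not in self._fields:
                if name not in DERIVED_FIELDS:
                    raise KeyError(name)
                self._fields[name] = DERIVED_FIELDS[name](self)
            return self._fields[name]


    @derived_field("momentum")
    def _momentum(data):
        return [rho * v for rho, v in zip(data["density"], data["velocity"])]


    @derived_field("kinetic_energy")
    def _kinetic_energy(data):
        # Built on another derived field, resolved recursively via __getitem__.
        return [0.5 * p * v for p, v in zip(data["momentum"], data["velocity"])]


    data = DataContainer({"density": [1.0, 2.0], "velocity": [2.0, 3.0]})
    ke = data["kinetic_energy"]

Because each recipe only asks the container for the fields it needs, new fields can be layered on top of existing ones without touching the data on disk.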
