Modification in the trunk since the last release

Theano 0.3.1rc2 (2011-02-18)
-------------------

Deprecation:
 * The theano shared variable attribute `value` is deprecated, use `get_value()` or `set_value()`!
    http://deeplearning.net/software/theano/tutorial/aliasing.html

Bugs fixed:
 * The random number generator in theano/sandbox/rng_mrg.py did not always return the same sequence of number on the CPU and GPU.
    * In some cases, there was a small fraction of garbage in the returned sequence, 
      but that garbage looked random. So if your usage did not depend too much on the random properties, you might be OK.
 * In python mode (not the default mode) when input of elemwise operation was an empty ndarray, we were not returning an empty ndarray.
 * Some segfault at exit with GPU code.
 * Some bugs in Scan:
    * Scan was incorrectly caching the number of steps to execute
      This affect you only if you change the number of step of a compiled scan op. Constant number of step were ok.
    * others: Razvan?
 * In GpuConv, errors in conv_patch_stack_reduce when the entire kernel doesn't fit into shared memory.
   The error was not found before as the impact was less then the relative tolerance of 1e-3. Now the relative tolerance is 1e-5.

Crash fixed:
 * Add a feature to not have an exception that makes Theano crash when taking the gradient on DimShuffle in some particular case.
 * Compilation crash for GpuElemwise with tensor with high number of dimensions(~6 or more).
 * Disabled C code generator that make gcc crash on complex type.
 * Crash in optimization when an Op has no input.
 * output shape is now computed correctly for matrix-vector multiplication on GPU.
 * In Scan, when using numbers as inputs, not symbolic variables
 * In GpuSum, bug in calculation of n_blocks for the 10 pattern
   (Sum on the row of a matrix)

Optimization:
 * New SpecifyShape op that allow to pass more shape info in the graph.
 * Speed up gemv by a work around scipy gemv slowness when the matrix is in C order (the default).
 * Remove join of only 1 element
 * During optimization, consider one more case in get_constant_value.

GPU:
 * cuda_shared.value = X now works inplace!
     * cuda_shared_var.set_value(new_ndarray) will overwrite the old value inplace in the most common case.
 * Allow to create a CudaNdarraySharedVariable from a CudaNdarray.
 * new init_gpu_device theano flags.
 * Fuse GpuElemwise more often (in the case where there are so many inputs that fusing them all would bust the 256 bytes limit of parameter to gpu function).
 * Cpu join of only 1 element that was not moved to the gpu.

New features:
 * Tensor.reshape now makes dimensions of length broadcastable (fixes #434).
 * Tensor.prod now implements the gradient
 * DebugMode now warns if an Op declared itself as returning a view of the input but did not do so.
    * This behaviour is a problem, because it can block other Ops from being inplace on the same inputs. This could lower the reuse of memory.
 * Sparse.structured_dot now works when both matrices are sparse
 * Sparse type is now supported by the shape op, and the ShapeFeature optimizer works correctly with them.
 * New 3D convolution ops, with CPU and GPU implementations.
 * New colors in pydotprint.

Documentation:
 * Documented lib.amdlibm and (new) init_gpu_device config variables.
 * A new page (was done for 0.3 but an error was hiding it on the web page) on the memory aliasing contract of Theano.
 * Revision to the Windows installation instructions.
 * The cuda documentation is now generated on the web server.
 * Better documentation of .theanorc and its sections.

Unit tests:
 * Stop usage of deprecated functions or syntax in the unit tests.
 * Better testing of GPU convolution nets.
 * Make more tests able to use different random seeds.
 * Tests of sparse now use default mode, not a hard-coded one.
 * Remove some tests of unimplemented features.

Other:
 * The name of compiledir now includes the Python version to make it easier for people with many Python versions
 * Added theano.tensor.std as a shortcut to sqrt(var(input=input, axis=axis)).
 * Whitespace, tabulation and indentation clean-up in the code.
 * Better detection of memory sharing between variables.
