.. index:: Views, AMPS; Views

.. include:: ./macros.inc
.. _#ug-views:

13. Aggregating and Analyzing Data in AMPS
==========================================

AMPS contains a high-performance aggregation engine, which can be used
to project one SOW topic onto another, similar to the ``CREATE VIEW``
functionality found in most RDBMS software. The aggregation engine can
join input from multiple topics, of the same or different message types,
and can produce output in different message types.

View topics are part of the AMPS State of the World, which means that
views support delta subscriptions and out-of-focus (OOF) tracking. A
view can also be used as the underlying topic for another view.


In addition, for the limited cases where a view is not practical, AMPS
allows an individual subscription to request aggregation and projection
of a single SOW topic.

Note that the features described in this chapter are designed for cases
where an application needs to aggregate data across messages, or to perform
a calculation on an individual message where the result should not be
preserved as a part of that message.

To modify a message as it is published to AMPS, use *preprocessing or
enrichment*.  To simply retrieve a subset of the fields in a message, use
*select lists*.

.. _#ug-views-understanding:

Understanding Views
--------------------

Views allow you to aggregate messages from one or more SOW topics in
AMPS and present the aggregation as a new SOW topic. AMPS stores the
contents of the view as serialized messages in memory, similar to a
materialized view in RDBMS software.

Views are often used to simplify subscriber implementation and can
reduce the network traffic to subscribers. For example, if some clients
will only process orders where the total cost of the order exceeds a
certain value, you can both simplify subscriber code and reduce network
traffic by creating a view that contains a calculated field for the
total cost. Rather than receiving all messages and calculating the cost,
subscribers can filter on the calculated field. You can also combine
information from multiple topics. For example, you could create a view
that contains orders from high-priority customers that exceed a certain
dollar amount.
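
The total cost example can be modeled in a few lines of Python. This is a
conceptual sketch only, not AMPS configuration or client code: the field
names (``price``, ``qty``, ``total``), the sample orders, and the
``project_order`` helper are all hypothetical and chosen for illustration.

.. code-block:: python

   # Conceptual model of a view with a calculated field (not AMPS code).
   # Underlying "topic": one message per order key. All values are made up.
   orders = {
       "o-1": {"id": "o-1", "price": 25.0, "qty": 2},
       "o-2": {"id": "o-2", "price": 400.0, "qty": 5},
       "o-3": {"id": "o-3", "price": 90.0, "qty": 20},
   }


   def project_order(order):
       """Project an order into a view record with a calculated total."""
       return {"id": order["id"], "total": order["price"] * order["qty"]}


   # The "view": one projected record per underlying message.
   order_totals = {key: project_order(order) for key, order in orders.items()}

   # A subscriber filters on the calculated field instead of computing it.
   large_orders = [v["id"] for v in order_totals.values() if v["total"] > 1000]

In this model the subscriber never sees ``price`` or ``qty`` and never
receives ``o-1``, which mirrors the reduction in subscriber code and
network traffic described above.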

AMPS sends messages to view topics the same way that AMPS sends messages
to SOW topics: when a publish arrives for a message that is used to
calculate the view, AMPS recalculates the values in the view as
necessary and sends a message on the view topic. Likewise,
you can query a view the same way that you query a SOW topic.


Defining a view is straightforward. You specify the name of the view and
the SOW topic or topics from which messages originate, then describe how
you want to aggregate, or project, the messages. AMPS creates a topic and
projects the messages as requested.

A view requires one or more underlying topics.

.. caution::

   All message types that you specify in a view must support view
   creation.

Because AMPS uses the SOW topics of the underlying messages to determine
when to update the view, the underlying topics used in a view must be
in the State of the World. Any topic, view, conflated topic, or queue defined
in the ``SOW`` section can be the underlying topic for a view.
However, the underlying topic or topics must be defined in the
AMPS configuration file before the view is defined.

AMPS updates each view after a publish or delta publish to a message in
an underlying topic. Updates are processed for each view in the order in
which AMPS processed the updates to the underlying topic. AMPS processes
these updates asynchronously, after each SOW update is persisted. For
additional performance, AMPS provides the ability to conflate updates to
views that process high velocity updates, as described in
:ref:`#inlineviewconflation`.
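
The general idea of conflation can be sketched as follows. This is a
simplified model, not a description of how AMPS implements the feature:
it assumes that updates waiting to be applied to the view are held as
``(key, value)`` pairs, and that a newer update for a key replaces an
older pending update for the same key. The ``conflate`` function and the
sample data are hypothetical.

.. code-block:: python

   # Simplified model of conflating pending updates (not AMPS internals).
   def conflate(updates):
       """Keep only the most recent pending update for each key.

       Keys stay in the order in which they were first seen, because a
       dict preserves insertion order and reassigning a key keeps its slot.
       """
       latest = {}
       for key, value in updates:
           latest[key] = value
       return list(latest.items())


   # Four pending updates, three of them for the same underlying message.
   pending = [("o-1", 1), ("o-2", 5), ("o-1", 2), ("o-1", 3)]
   applied = conflate(pending)

Here the view would be recalculated twice rather than four times, and
subscribers would see only the final state of ``o-1``.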

As with a SOW topic, an incoming publish that does not change the value
of the underlying message or the calculated value in the view is
considered to be an update to the view topic.  If an application
needs to see only changed fields, that application should use a
delta subscription (with the ``no_empties`` option).
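
The difference between the two subscription styles can be modeled as
shown below. This is an illustrative sketch rather than AMPS client code:
the ``changed_fields`` and ``deliveries`` helpers are hypothetical, and
the model ignores details of real delta messages (such as how key fields
are handled).

.. code-block:: python

   # Conceptual model of regular vs. delta delivery (not AMPS client code).
   def changed_fields(previous, current):
       """Return only the fields whose values differ from the previous record."""
       return {f: v for f, v in current.items() if previous.get(f) != v}


   def deliveries(previous, current):
       """Model what each kind of subscriber receives for one view update."""
       delta = changed_fields(previous, current)
       return {
           # A regular subscription receives every update, changed or not.
           "regular": current,
           # A delta subscription that suppresses empty updates receives
           # nothing when no field changed.
           "delta_no_empties": delta if delta else None,
       }


   before = {"id": "o-1", "total": 50.0}
   unchanged = deliveries(before, {"id": "o-1", "total": 50.0})
   changed = deliveries(before, {"id": "o-1", "total": 75.0})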

.. tip::

   When the underlying topic for a view is a queue,
   the view will show only the messages in the queue
   that are not currently leased to subscribers.    

.. include:: ./joins.inc

Constructing Fields
-------------------

The AMPS expression language is used to construct fields in aggregates,
as described in :ref:`Chapter 4 Constructing Fields <#ug-constructing-fields>`.

Best Practices for Views
----------------------------

When creating a view, consider the following best practices:

* AMPS must compute and serialize each field in the view. Smaller
  numbers of fields, and less expensive calculations, may provide
  better performance.

* AMPS must determine the update to the view for each change
  to an underlying topic, so the work required for a view grows
  with the rate of updates to its underlying topics.

* For views that join multiple topics, consider the amount
  of work produced by an update to each topic in the join.
  You can estimate this by paying attention to the number
  of matching messages on each side of the join.
  
  Consider a view that joins a large ``orders`` topic to 
  an ``order_type`` topic with a much smaller number of
  messages.  If, for each ``order_type``, there are
  10000 matching messages in the ``orders`` topic,
  then a publish that updates a message in the ``order_type``
  topic would produce 10000 updates to a view joining these
  two topics together. In cases like this, avoid making
  unnecessary updates to an underlying topic with messages
  that match a large number of messages on the other side
  of the join. For example, publishing a changed value to
  ``order_type`` is a necessary update. Periodically republishing
  the same messages to ``order_type``, by contrast, produces a
  large number of updates to the view without changing the
  results, and is likely to be unnecessary work.

* If an underlying message can have frequent updates
  and subscribers only need to receive the final state
  of the message in the view, consider using
  the ``InlineConflation`` option to allow the view
  to avoid processing intermediate changes where possible.

* For topics that are the underlying topics of views,
  avoid publishing updates that do not change the values
  of the fields used in the view. Each update to an
  underlying topic causes an update to the view. In
  particular, an approach such as republishing a set
  of lookup values every few minutes will produce a
  large amount of work (while AMPS fully recalculates
  the view) without a change in the results. In general,
  unless the set of values that was republished is
  significant, avoid republishing values to a topic
  underlying a view.
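
The join fan-out described in the list above can be estimated with a
short calculation. The sketch below is a rough model under stated
assumptions: the view has one record per order (no grouping), each order
matches exactly one ``order_type`` message, and the join is on a single
field. The ``join_fanout`` and ``view_updates`` helpers and the sample
data are hypothetical; the actual number of updates depends on how the
view is defined.

.. code-block:: python

   # Rough estimate of view updates per publish for a two-topic join.
   from collections import Counter


   def join_fanout(orders):
       """Count the orders that match each order_type value."""
       return Counter(order["order_type"] for order in orders)


   def view_updates(topic, message, fanout):
       """Estimate the view updates produced by one publish to `topic`."""
       if topic == "orders":
           # One order corresponds to one record in the view.
           return 1
       if topic == "order_type":
           # Every order that joins to this order_type must be recalculated.
           return fanout[message["order_type"]]
       raise ValueError(f"unknown topic: {topic}")


   # 20000 made-up orders, split evenly between two order types.
   orders = [
       {"id": i, "order_type": "limit" if i % 2 else "market"}
       for i in range(20000)
   ]
   fanout = join_fanout(orders)

   per_order_publish = view_updates("orders", orders[0], fanout)
   per_type_publish = view_updates("order_type", {"order_type": "limit"}, fanout)

With this data, one publish to ``orders`` touches a single view record,
while one publish to ``order_type`` touches 10000 of them, which is why
republishing unchanged lookup values is expensive.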

.. include:: ./view-examples.inc
.. include:: ./aggregated_subscriptions.inc

