Overview
============

Workspaces is composed of two major services and a suite of utilities.

- The **Capability Service**, which provides stateful, concurrency-limited, high-level processing.
- The **Workflow Service**, which abstracts the actual running of jobs in the cluster.

Apart from these two, there is also the **Notification Service**, a shared service for sending notifications.

.. image:: overview/images/overall.png
   :width: 567px
   :height: 404px
   :alt: diagram showing users and DAs interacting with workspaces UI

The workflow service eventually runs executables in the cluster environment. Some of these executables are
externally provided, such as CASA. Many external tools have complex environmental requirements; for these we
provide wrappers, generally called "envoys," that perform the environmental setup and then hand off to the
external tool. Workspaces also has a number of internal tools it uses for fetching data from the archive or
delivering data to the user.

.. contents:: Table of Contents

.. toctree::
    :maxdepth: 2

    overview/overall-architecture.rst


The Workflow Service
--------------------

The simpler of the two is the Workflow Service, which exists to make it straightforward to launch processing jobs
without knowing too much about how this is done. The workflow service provides several features that the
legacy workflow system did not:

- Workflow state is never lost, even if the workflow server goes offline after launching a workflow
- Users can attach arbitrary files to a workflow execution
- Workflow definitions consist entirely of templates and thus are quick and easy to create from scratch
- Workflows that fail due to transient or hardware problems are automatically restarted
- Workflows can run in various datacenters without requiring special work by the workflow service
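
As the list suggests, a client needs only a workflow's name and its arguments; the service handles templating,
cluster submission, and retries behind the scenes. Below is a minimal sketch of such a client, assuming a REST
interface; the host, port, route, and payload shape here are hypothetical stand-ins, not the service's actual API.

.. code-block:: python

    import json
    import urllib.request

    # Hypothetical base URL and route; consult the workflow service's
    # actual REST API for the real endpoints.
    WORKFLOW_API = "http://localhost:3456/workflows"

    def launch_workflow(name: str, arguments: dict) -> dict:
        """Submit a workflow execution request and return the service's response."""
        body = json.dumps(arguments).encode("utf-8")
        request = urllib.request.Request(
            f"{WORKFLOW_API}/{name}/requests/submit",
            data=body,
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)

    # The caller never sees the cluster scheduler, templates, or retry logic.
    print(launch_workflow("null", {"greeting": "hello"}))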

.. toctree::
    :maxdepth: 2

    overview/workflow-schema
    overview/how-workflows-run-in-the-dsoc-or-naasc
    overview/workflow-creation

The Capability Service
----------------------

The capability service is where most of the interesting complexity comes into play.
Like the workflow service, the capability service manages the execution of processes.
However, there are some important distinctions:

- Workflows are *Aristotelian*; they run once and succeed or fail. Capabilities are *Platonic*;
  many versions of the capability can be executed until the ideal outcome is obtained.
- Workflows are run by machines. Capabilities are run by humans.

You can think of the capability system as the *control system* for Workspaces. The capability service, at its core,
dispatches behaviors based on events it receives from AMQP. In fact, one of the core design goals of the capability
system was to be invulnerable to service outages. This works because the capability system is data-driven: the existing
state is kept in a database, and pending changes of state are persisted by the AMQP system until the capability service
is able to process them. This is what allows us to weather up to (say) 24 hours of downtime of the entire
capability and workflow systems.
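
The sketch below illustrates this data-driven dispatch loop, assuming the ``pika`` AMQP client; the queue name,
event shape, and handler are hypothetical stand-ins for the service's real wiring. Declaring the queue durable is
what lets pending events survive an outage: the broker holds each message until a consumer acknowledges it.

.. code-block:: python

    import json

    import pika  # AMQP client library

    # Handlers keyed by event type; each handler reads and updates persisted
    # state, so no state lives only in this process.
    HANDLERS = {
        "workflow-complete": lambda event: print("advance request", event["request_id"]),
    }

    connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
    channel = connection.channel()
    # A durable queue survives broker restarts and buffers events while
    # the capability service itself is down.
    channel.queue_declare(queue="capability.events", durable=True)

    def on_event(ch, method, properties, body):
        event = json.loads(body)
        HANDLERS.get(event.get("type"), lambda e: None)(event)
        # Acknowledge only after handling, so an unprocessed event is
        # redelivered rather than lost if this process dies mid-handler.
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue="capability.events", on_message_callback=on_event)
    channel.start_consuming()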

Three characteristics differentiate capabilities from workflows:

1. Versions. The :doc:`capability version system <overview/capability-versions>` allows users to refine and resubmit
   requests if the obtained result is unacceptable. This is vital for iterative processing like calibration and
   imaging, where, due to RFI or instrument failures, new flagging files must be applied to secure the proper
   outcome.
2. States. The :doc:`capability state system <overview/capability-states>` allows us to define arbitrary "workflows" for
   handling capability requests. These workflows are high-level, like "we must QA before ingesting images," and can
   include accepting human input and decisions as well as processing.
3. Queues. The capability queue system allows our stakeholders to place very high-level constraints on processing,
   including pausing capability requests altogether or choosing the number of concurrent requests to permit (a toy
   model follows this list). This is useful during move configurations, when processing should be paused, and after
   move configurations, to prevent the standard calibration capability from swamping the cluster completely.
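
The toy model below captures just the queue semantics from item 3: a pause flag and a concurrency limit. It is an
illustration of the behavior described above, not the capability service's implementation; all names are invented.

.. code-block:: python

    from collections import deque
    from dataclasses import dataclass, field

    @dataclass
    class CapabilityQueue:
        """Toy model of a pausable, concurrency-limited request queue."""
        concurrency_limit: int
        paused: bool = False
        waiting: deque = field(default_factory=deque)
        running: set = field(default_factory=set)

        def submit(self, request_id: str) -> None:
            self.waiting.append(request_id)
            self._pump()

        def complete(self, request_id: str) -> None:
            self.running.discard(request_id)
            self._pump()

        def set_paused(self, paused: bool) -> None:
            self.paused = paused
            self._pump()

        def _pump(self) -> None:
            # Start waiting requests only while unpaused and under the limit.
            while (not self.paused and self.waiting
                   and len(self.running) < self.concurrency_limit):
                self.running.add(self.waiting.popleft())

    # With a limit of 2, the third request waits until one completes.
    queue = CapabilityQueue(concurrency_limit=2)
    for request_id in ("r1", "r2", "r3"):
        queue.submit(request_id)
    assert queue.running == {"r1", "r2"} and list(queue.waiting) == ["r3"]
    queue.complete("r1")
    assert queue.running == {"r2", "r3"}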

.. toctree::
    :maxdepth: 2

    overview/capability-schema
    overview/capability-states
    overview/capability-versions
    overview/restriction-engine

Utilities
---------

We can group the tools into two broad families: tools that are executed remotely on the cluster, and tools that
developers and DAs use to inspect or affect the system.

Utilities for the Cluster
~~~~~~~~~~~~~~~~~~~~~~~~~

Scientific processing often requires a significant amount of environmental setup before it can run. However, to
use heterogeneous clusters, we must assume as little as possible about the environment in which these programs
execute. Some sort of bridge is needed that encapsulates the environmental setup and can run in different
environments. This led us to a design pattern we call "envoys," documented at
:doc:`overview/envoys`. The ones that currently exist are the :doc:`CARTA envoy <tools/carta_envoy>`,
:doc:`CASA envoy <tools/casa_envoy>`, and :doc:`ingest envoy <tools/ingest_envoy>`.
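
As a concrete illustration of the pattern, here is a minimal, hypothetical envoy: every path, flag, and variable
below is a placeholder, not the real CASA envoy's configuration.

.. code-block:: python

    import os
    import subprocess
    import sys

    def main() -> int:
        # 1. Environmental setup that the bare cluster node does not provide.
        env = dict(os.environ)
        env["CASA_HOME"] = "/opt/casa"  # placeholder install location
        env["PATH"] = env["CASA_HOME"] + "/bin:" + env["PATH"]

        # 2. Hand off to the external tool. Propagating its exit status means
        #    the workflow system sees failures as if it ran the tool directly.
        result = subprocess.run(["casa", "--nogui", "-c", "recipe.py"], env=env)
        return result.returncode

    if __name__ == "__main__":
        sys.exit(main())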

Some tools are intended to directly support workflows:

- :doc:`conveyor <tools/conveyor>` moves data in and out of the QA cache area
- :doc:`deliver <tools/deliver>` copies results to the delivery area for the user
- :doc:`ingest <tools/ingest>` moves data from disk to the archive
- :doc:`productfetcher <tools/productfetcher>` retrieves data from the archive
- :doc:`ws_annihilator <tools/ws_annihilator>` is a cleanup program that is run automatically by cron
- :doc:`iiwf_trigger <tools/iiwf_trigger>` is a system-invoked tool that starts an image ingestion

Some tools are for testing:

- :doc:`null <tools/null>` supports the "null" testing workflow
- :doc:`vela <tools/vela>` emulates CASA's behaviors but runs instantly

Utilities for Developers and DAs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- :doc:`wf_inspector <tools/wf_inspector>` makes it easy to get into an executing workflow
- :doc:`ws_metrics <tools/ws_metrics>` is a tool for retrieving Workspaces metrics
- :doc:`mediator <tools/mediator>` allows Workspaces requests to be destructively modified
- :doc:`mod_analyst <tools/mod_analyst>` manages the DAs and AODs in the stopgap users table
- :doc:`seci_ingestion_status <tools/seci_ingestion_status>` checks on the ingestion status of a SECI imaging job

.. toctree::
    :maxdepth: 2

    overview/envoys