Overview
============
Workspaces is composed of two major services and a suite of utilities:

- The **Capability Service**, which provides stateful, concurrency-limited, high-level processing.
- The **Workflow Service**, which abstracts the actual running of jobs in the cluster.

Apart from these two, there is also the **Notification Service**, a shared service for sending notifications.
.. (Figure omitted: diagram showing users and DAs interacting with the Workspaces UI.)
The workflow service ultimately runs executables in the cluster environment. Some of these executables are
externally provided, such as CASA, and external tools often have complex environmental requirements. Where this
occurs, we provide wrappers, generally called "envoys," that perform the environmental setup and then hand off to
the external tool. Workspaces also has a number of internal tools it uses for fetching data from the archive and
delivering data to the user.
.. contents:: Table of Contents

.. toctree::
   :maxdepth: 2

   overview/overall-architecture
The Workflow Service
--------------------
The first and simpler service is the Workflow Service, which exists to make it straightforward to launch processing
jobs without knowing too much about how this is done. The workflow service provides a number of features that the
legacy workflow system did not:

- Workflow state is never lost, even if the workflow server goes offline after launching a workflow
- Users can attach arbitrary files to a workflow execution
- Workflow definitions consist entirely of templates and thus are quick and easy to create from scratch
- Workflows that fail due to transient or hardware problems are automatically restarted
- Workflows can run in various datacenters without requiring special work by the workflow service
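The template-driven nature of workflow definitions can be illustrated with a minimal sketch. The field names and the HTCondor-style submit format below are assumptions for illustration, not the actual Workspaces workflow schema:

```python
from string import Template

# Hypothetical workflow definition: an HTCondor-style submit file expressed
# as a pure template. The real Workspaces templates and field names differ.
SUBMIT_TEMPLATE = Template(
    "executable = $executable\n"
    "arguments = $arguments\n"
    "queue\n"
)


def render_workflow(executable: str, arguments: str) -> str:
    """Fill the template with per-execution values, producing the text
    that would be handed to the cluster scheduler."""
    return SUBMIT_TEMPLATE.substitute(executable=executable, arguments=arguments)
```

Because a definition is just text plus substitution fields, creating a new workflow amounts to writing a new template rather than writing code.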
.. toctree::
   :maxdepth: 2

   overview/workflow-schema
   overview/how-workflows-run-in-the-dsoc-or-naasc
   overview/workflow-creation
The Capability Service
----------------------
The capability service is where most of the interesting complexity comes into play. Like the workflow service, the
capability service manages the execution of processes. However, there are some important distinctions:

* Workflows are *Aristotelian*; they run once and succeed or fail. Capabilities are *Platonic*;
  many versions of the capability can be executed until the ideal outcome is obtained.
* Workflows are run by machines. Capabilities are run by humans.

You can think of the capability system as the *control system* for Workspaces. The capability service, at its core,
dispatches behaviors based on events it receives from AMQP. In fact, one of the core design goals of the capability
system was to be resilient to service outages. This works because the capability system is data-driven: the existing
state is kept in a database, and pending changes of state are persisted by the AMQP system until the capability service
is able to process them. This is what allows us to weather up to (say) 24 hours of downtime of the entire capability
and workflow systems.
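The data-driven design can be sketched in miniature. The class and event shapes below are hypothetical stand-ins; in the real system the queue is AMQP and the state store is a relational database:

```python
from collections import deque

class CapabilityStateStore:
    """Stand-in for the database that holds the current state of each request."""

    def __init__(self):
        self.states = {}

    def apply(self, event):
        # Each event carries a request id and the state it moves that request to.
        self.states[event["request_id"]] = event["state"]


def drain(queue, store):
    """Process every pending event in order. Because the queue (AMQP in the
    real system) retains events across any service downtime, replaying the
    backlog on restart brings the store back up to date."""
    while queue:
        store.apply(queue.popleft())
```

The key property is that no state lives only in the service's memory: if the service dies, the queued events and the stored states together reconstruct everything.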
There are three main characteristics that differentiate capabilities from workflows:

1. Versions. The :doc:`capability version system <overview/capability-versions>` allows users to refine and resubmit
   requests if the obtained result is unacceptable. This is vital for iterative processing like calibration and
   imaging, where, due to RFI or instrument failures, new flagging files need to be applied to secure the proper
   outcome.
2. States. The :doc:`capability state system <overview/capability-states>` allows us to define arbitrary "workflows"
   for handling capability requests. These workflows are high-level, like "we must QA before ingesting images," and
   can include accepting human input and decisions as well as processing.
3. Queues. The capability queue system allows our stakeholders to specify very high-level constraints on processing,
   including pausing capability requests altogether, or choosing a number of concurrent requests to permit. This is
   useful during move configurations, when processing should be paused, and afterwards to prevent the standard
   calibration capability from swamping the cluster completely.
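As a rough illustration of the queue semantics described in point 3, a capability queue combines a pause flag with a concurrency limit. The class and method names here are invented for this sketch:

```python
class CapabilityQueue:
    """Hypothetical sketch of capability queue semantics: a pause flag and a
    concurrency limit together decide whether the next request may launch."""

    def __init__(self, concurrency_limit: int):
        self.concurrency_limit = concurrency_limit
        self.paused = False
        self.running = 0

    def may_launch(self) -> bool:
        # Paused queues launch nothing; otherwise stay under the limit.
        return not self.paused and self.running < self.concurrency_limit

    def launch(self) -> bool:
        """Try to start one request; returns False if it must wait."""
        if not self.may_launch():
            return False
        self.running += 1
        return True

    def complete(self) -> None:
        """Record that a running request finished, freeing a slot."""
        self.running -= 1
```

Pausing during a move configuration is then just setting ``paused``; requests accumulate and launch again, up to the limit, once the flag is cleared.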
.. toctree::
   :maxdepth: 2

   overview/capability-states
   overview/capability-versions
   overview/restriction-engine
Utilities
---------
We can group the tools into two broad families: tools that are executed remotely, on the cluster; and tools that are
intended to be used by developers and DAs to inspect or affect the system.
Utilities for the Cluster
~~~~~~~~~~~~~~~~~~~~~~~~~
Scientific processing often requires a significant amount of environmental setup prior to being run. However, to access
heterogeneous clusters, it is necessary to assume as little as possible about the environment in which these programs
are executed. Some sort of bridge is necessary that encompasses the environmental setup and can run in different
environments. This led us to a design pattern we call "envoys," which is documented at
:doc:`overview/envoys`. The ones that currently exist are the :doc:`CARTA envoy <tools/carta_envoy>`,
:doc:`CASA envoy <tools/casa_envoy>` and :doc:`ingest envoy <tools/ingest_envoy>`.
Some tools are intended to directly support workflows:

- :doc:`conveyor <tools/conveyor>` moves data in and out of the QA cache area
- :doc:`deliver <tools/deliver>` copies results to the delivery area for the user
- :doc:`ingest <tools/ingest>` moves data from disk to the archive
- :doc:`productfetcher <tools/productfetcher>` retrieves data from the archive
- :doc:`ws_annihilator <tools/ws_annihilator>` is a cleanup program that is run automatically by cron
- :doc:`iiwf_trigger <tools/iiwf_trigger>` is a tool to start an image ingestion, called by the system
Some tools are for testing:

- :doc:`null <tools/null>` supports the "null" testing workflow
- :doc:`vela <tools/vela>` emulates CASA's behaviors but runs instantly
Utilities for Developers and DAs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- :doc:`wf_inspector <tools/wf_inspector>` makes it easy to get into an executing workflow
- :doc:`ws_metrics <tools/ws_metrics>` is a tool for retrieving Workspaces metrics
- :doc:`mediator <tools/mediator>` allows Workspaces requests to be destructively modified
- :doc:`mod_analyst <tools/mod_analyst>` manages the DAs and AODs in the stopgap users table
- :doc:`seci_ingestion_status <tools/seci_ingestion_status>` checks on the ingestion status of a SECI imaging job
.. toctree::
   :maxdepth: 2

   overview/envoys