Overview
========

Workspaces is composed of two major services and a suite of utilities.

- The **Capability Service**, which provides stateful, concurrency-limited, high-level processing.
- The **Workflow Service**, which abstracts the actual running of jobs in the cluster.

Apart from these two, there is also the **notification service**, a shared service for sending notifications.

.. image:: overview/images/overall.png
   :width: 567px
   :height: 404px
   :alt: diagram showing users and DAs interacting with workspaces UI

The workflow service eventually runs executables in the cluster environment. Some of these executables are externally provided, such as CASA. External tools often have complex environmental requirements, so we provide wrappers, generally called "envoys," that perform the environmental setup and then hand off to the external tool. Workspaces also has a number of internal tools for fetching data from the archive or delivering data to the user.

.. contents:: Table of Contents

.. toctree::
   :maxdepth: 2

   overview/overall-architecture.rst

The Workflow Service
--------------------

The first and simpler service is the workflow service, which exists to make it straightforward to launch processing jobs without needing to know much about how this is done. The workflow service provides a number of capabilities that the legacy workflow system did not:

- Workflow state is never lost, even if the workflow server goes offline after launching a workflow
- Users can attach arbitrary files to a workflow execution
- Workflow definitions consist entirely of templates and are therefore quick and easy to create from scratch
- Workflows that fail due to transient or hardware problems are automatically restarted
- Workflows can run in various datacenters without requiring special work by the workflow service

.. toctree::
   :maxdepth: 2

   overview/workflow-schema
   overview/how-workflows-run-in-the-dsoc-or-naasc
   overview/workflow-creation

The Capability Service
----------------------

The capability service is where most of the interesting complexity comes into play. Like the workflow service, the capability service manages the execution of processes. However, there are some important distinctions:

* Workflows are *Aristotelian*: they run once and succeed or fail. Capabilities are *Platonic*: many versions of a capability can be executed until the ideal outcome is obtained.
* Workflows are run by machines. Capabilities are run by humans.

You can think of the capability system as the *control system* for Workspaces. At its core, the capability service dispatches behaviors based on events it receives from AMQP. In fact, one of the core design goals of the capability system was to be invulnerable to service outages. This works because the capability system is data-driven: the existing state is kept in a database, and pending state changes are persisted by the AMQP system until the capability service is able to process them. This is what allows us to weather up to (say) 24 hours of downtime of the entire capability and workflow systems.

Three characteristics differentiate capabilities from workflows:

1. Versions. The :doc:`capability version system <overview/capability-versions>` allows users to refine and resubmit requests if the obtained result is unacceptable. This is vital for iterative processing like calibration and imaging, where, due to RFI or instrument failures, new flagging files need to be applied to secure the proper outcome.
2. States. The :doc:`capability state system <overview/capability-states>` allows us to define arbitrary "workflows" for handling capability requests. These workflows are high-level, like "we must QA before ingesting images," and can include accepting human input and decisions as well as processing.
3. Queues. The capability queue system allows our stakeholders to specify very high-level constraints on processing, such as pausing capability requests altogether or limiting the number of concurrent requests. This is useful during move configurations, when processing should be paused, and after move configurations, to prevent the standard calibration capability from swamping the cluster.

.. toctree::
   :maxdepth: 2

   overview/capability-schema
   overview/capability-states
   overview/capability-versions
   overview/restriction-engine

Utilities
---------

We can group the tools into two broad families: tools that are executed remotely, on the cluster, and tools that are intended to be used by developers and DAs to inspect or affect the system.

Utilities for the Cluster
~~~~~~~~~~~~~~~~~~~~~~~~~

Scientific processing often requires a significant amount of environmental setup before it can run. However, to access heterogeneous clusters, we must assume as little as possible about the environment in which these programs execute. Some sort of bridge is needed that encapsulates the environmental setup and can run in different environments. This led us to a design pattern we call "envoys," which is documented at :doc:`overview/envoys`. The ones that currently exist are the :doc:`CARTA envoy <tools/carta_envoy>`, :doc:`CASA envoy <tools/casa_envoy>`, and :doc:`ingest envoy <tools/ingest_envoy>`.
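The envoy pattern can be sketched in a few lines: perform the environmental setup, then hand off to the external tool so that the tool's exit status flows back to the workflow system. This is a minimal illustration only, not the code of any actual envoy; the tool command, environment variables, and paths below are hypothetical.

```python
import os
import subprocess


def build_environment(base_env, setup_vars):
    """Merge tool-specific settings over the inherited environment.

    ``setup_vars`` stands in for whatever environmental setup a real
    envoy performs (paths, scratch directories, library settings, etc.).
    """
    env = dict(base_env)
    env.update(setup_vars)
    return env


def run_envoy(tool_argv, setup_vars, workdir="."):
    """Perform environmental setup, then hand off to the external tool.

    The envoy returns the wrapped tool's exit status, so the workflow
    system sees the external tool's success or failure directly.
    """
    env = build_environment(os.environ, setup_vars)
    result = subprocess.run(tool_argv, env=env, cwd=workdir)
    return result.returncode


if __name__ == "__main__":
    # Hypothetical hand-off to CASA; command and variables are examples only.
    status = run_envoy(
        ["casa", "--nogui", "-c", "recipe.py"],
        {"CASA_HOME": "/opt/casa", "OMP_NUM_THREADS": "4"},
    )
    raise SystemExit(status)
```

Keeping the setup in a wrapper like this is what lets the same workflow definition run in different cluster environments: only the envoy needs to know the local details.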
Some tools directly support workflows:

- :doc:`conveyor <tools/conveyor>` moves data in and out of the QA cache area
- :doc:`deliver <tools/deliver>` copies results to the delivery area for the user
- :doc:`ingest <tools/ingest>` moves data from disk to the archive
- :doc:`productfetcher <tools/productfetcher>` retrieves data from the archive
- :doc:`ws_annihilator <tools/ws_annihilator>` is a cleanup program that is run automatically by cron
- :doc:`iiwf_trigger <tools/iiwf_trigger>` is a tool to start an image ingestion, called by the system

Some tools are for testing:

- :doc:`null <tools/null>` supports the "null" testing workflow
- :doc:`vela <tools/vela>` emulates CASA's behaviors but runs instantly

Utilities for Developers and DAs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- :doc:`wf_inspector <tools/wf_inspector>` makes it easy to get into an executing workflow
- :doc:`ws_metrics <tools/ws_metrics>` is a tool for retrieving Workspaces metrics
- :doc:`mediator <tools/mediator>` allows Workspaces requests to be destructively modified
- :doc:`mod_analyst <tools/mod_analyst>` manages the DAs and AODs in the stopgap users table
- :doc:`seci_ingestion_status <tools/seci_ingestion_status>` checks on the ingestion status of a SECI imaging job

.. toctree::
   :maxdepth: 2

   overview/envoys
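To make the queue semantics described in The Capability Service section concrete, here is a minimal sketch of a queue with a concurrency limit and a pause switch. The class and method names are hypothetical and do not reflect the actual capability service API; this only illustrates the semantics of "pause everything" and "permit N concurrent requests."

```python
from collections import deque


class CapabilityQueue:
    """Sketch of pause and concurrency-limit semantics.

    Requests wait in FIFO order; at most ``concurrency_limit`` run at
    once, and a paused queue dispatches nothing until resumed.
    """

    def __init__(self, concurrency_limit):
        self.concurrency_limit = concurrency_limit
        self.paused = False
        self.waiting = deque()
        self.running = set()

    def submit(self, request_id):
        self.waiting.append(request_id)
        self._dispatch()

    def complete(self, request_id):
        self.running.discard(request_id)
        self._dispatch()

    def pause(self):
        self.paused = True

    def resume(self):
        self.paused = False
        self._dispatch()

    def _dispatch(self):
        # Start waiting requests until the limit is hit or the queue pauses.
        while (not self.paused
               and self.waiting
               and len(self.running) < self.concurrency_limit):
            self.running.add(self.waiting.popleft())
```

In these terms, an operator would ``pause()`` the standard calibration queue during a move configuration, then ``resume()`` it afterward with a small ``concurrency_limit`` so reprocessing cannot swamp the cluster.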