


Capability Schema
=================

The capability database schema resembles :doc:`the workflow database schema <workflow-schema>`. The difference is
that, as the overview describes, the capability system adds versions and states on top of it, with a corresponding
increase in complexity. This results in the following schema:

Let's break this down into parts.

.. contents:: Contents
   :local:


Core tables
-----------

The core works like the workflow schema and consists of the tables ``capabilities``, ``capability_requests`` and
``capability_templates``.

The capability table
Once again, we have the primary key ``capability_name``.
The capability queue system is mostly implemented on this table, using the following columns:

- ``max_jobs`` defines the maximum number of concurrent workflow requests this capability is allowed to run. Note that
  any number of capability requests for this capability can exist in various states; the only thing that is
  rate-limited is the execution of workflows.
- ``enabled`` controls whether new requests can be made.
- ``paused`` controls whether the queue is being run. Again, note that this only prevents workflows from being
  executed; other actions can always run.
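
The queue columns above can be read as a simple admission rule. The sketch below is a minimal illustration under
stated assumptions, not the actual scheduler: the ``Capability`` dataclass, the function names and the
``running``/``waiting`` counts are hypothetical, and it assumes ``max_jobs`` caps the number of workflows executing at
once for a single capability.

.. code-block:: python

    from dataclasses import dataclass


    @dataclass
    class Capability:
        """Hypothetical in-memory view of the queue-related columns of a ``capabilities`` row."""
        capability_name: str
        max_jobs: int
        enabled: bool
        paused: bool


    def can_accept_request(cap: Capability) -> bool:
        # ``enabled`` only gates the creation of new capability requests.
        return cap.enabled


    def workflows_to_start(cap: Capability, running: int, waiting: int) -> int:
        """Return how many queued workflows may be started right now.

        ``running`` is the number of workflows currently executing for this capability and
        ``waiting`` is the number of executions waiting in the queue (both hypothetical inputs).
        """
        if cap.paused:
            # A paused queue starts no workflows; other state-machine actions are unaffected.
            return 0
        free_slots = max(cap.max_jobs - running, 0)
        return min(free_slots, waiting)

With ``max_jobs=2``, one workflow running and five waiting, ``workflows_to_start`` returns 1; pausing the capability
drops that to 0 without touching the waiting requests themselves.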

The :doc:`restriction engine <restriction-engine>` uses two fields, ``requires_qa`` and ``single_version_only``, to
decide whether new versions can be made.
The field ``has_image_products`` controls whether we provide a CARTA link.
And finally, the field ``start_state`` tells the :doc:`capability state system <capability-states>` where to begin
a new request.

The capability_requests table
The request ID is stored in the ``capability_request_id`` field, and ``capability_name`` refers back to the
``capabilities`` table to specify which capability this is a request for. The ``created_at`` and ``updated_at`` fields
are timestamps that are updated automatically.
The boolean fields ``ingested`` and ``sealed`` record two supplemental facts about the capability request: sealed
requests cannot have new versions or executions made, and ingested requests have their results in the archive.
Finally, we have the temporary fields ``stage_1_reviewer`` and ``stage_2_reviewer``, which are used by the QA system.
Wondering where the JSON argument lives? It's in the ``capability_versions`` table.
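
To make the relationship between the first two core tables concrete, here is a simplified, hypothetical rendering of
them as SQLite DDL driven from Python. The column names come from this page; the column types, defaults, the
``updated_at`` trigger and the use of SQLite are assumptions for illustration and do not describe the production
database.

.. code-block:: python

    import sqlite3

    # Hypothetical DDL: column names are from this page, types/defaults/trigger are assumptions.
    SCHEMA = """
    CREATE TABLE capabilities (
        capability_name     TEXT PRIMARY KEY,
        max_jobs            INTEGER NOT NULL,
        enabled             BOOLEAN NOT NULL DEFAULT 1,
        paused              BOOLEAN NOT NULL DEFAULT 0,
        requires_qa         BOOLEAN NOT NULL DEFAULT 0,
        single_version_only BOOLEAN NOT NULL DEFAULT 0,
        has_image_products  BOOLEAN NOT NULL DEFAULT 0,
        start_state         TEXT
    );

    CREATE TABLE capability_requests (
        capability_request_id INTEGER PRIMARY KEY,
        capability_name       TEXT NOT NULL REFERENCES capabilities (capability_name),
        created_at            TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        updated_at            TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        ingested              BOOLEAN NOT NULL DEFAULT 0,
        sealed                BOOLEAN NOT NULL DEFAULT 0,
        stage_1_reviewer      TEXT,
        stage_2_reviewer      TEXT
    );

    -- Refresh updated_at on every update (the real mechanism may differ).
    CREATE TRIGGER capability_requests_touch AFTER UPDATE ON capability_requests
    BEGIN
        UPDATE capability_requests SET updated_at = CURRENT_TIMESTAMP
        WHERE capability_request_id = NEW.capability_request_id;
    END;
    """


    def connect() -> sqlite3.Connection:
        conn = sqlite3.connect(":memory:")
        # SQLite only enforces REFERENCES clauses when this pragma is on.
        conn.execute("PRAGMA foreign_keys = ON")
        conn.executescript(SCHEMA)
        return conn

Inserting a ``capability_requests`` row whose ``capability_name`` does not exist in ``capabilities`` fails with
``sqlite3.IntegrityError`` once foreign-key enforcement is switched on.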

The capability_templates table
This works like the corresponding functionality in the workflow system. The difference is that keeping the templates
at the capability level makes it possible to edit the generated file before the request is sent. The only existing
examples are PPR.xml (pipeline processing request) files, which occasionally need to be edited before running or
modified between versions; that would not be possible if the generated files were kept at the workflow level.

Version-related tables
----------------------

Unlike the workflow system, capabilities can be refined and rerun until the requester obtains the desired results.
Capabilities were designed to enable a Platonic world-view in which the ideal products are out there, somewhere; if
multiple executions of the same request are needed to obtain them, so be it.
This is why we have not just a capability request, but also a capability version. All of the user-modifiable
information is kept at the version level, so when a request is made, we immediately create version 1 of the request
with the supplied JSON argument.
Because hardware problems can happen, we have not only versions but also executions. An execution represents a single
attempt to run a workflow. If it fails, the design allows another execution to be created, but this functionality does
not exist yet; at the moment, each version always has exactly one execution.
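
The request, version and execution hierarchy can be pictured as nested records. This is a minimal in-memory sketch,
not the service's model classes: all class and function names are hypothetical, and it assumes for simplicity that the
single execution is created at the same time as version 1.

.. code-block:: python

    from dataclasses import dataclass, field
    from typing import Any


    @dataclass
    class Execution:
        """One attempt to run a workflow (hypothetical in-memory model)."""
        execution_id: int


    @dataclass
    class Version:
        version_number: int
        parameters: dict[str, Any]  # the user-modifiable JSON argument lives here
        executions: list[Execution] = field(default_factory=list)


    @dataclass
    class Request:
        capability_request_id: int
        capability_name: str
        versions: list[Version] = field(default_factory=list)


    def create_request(request_id: int, capability_name: str, argument: dict[str, Any]) -> Request:
        """Create a request together with its version 1 and that version's single execution."""
        request = Request(request_id, capability_name)
        version = Version(version_number=1, parameters=dict(argument))
        # Today every version carries exactly one execution; retries are not implemented.
        version.executions.append(Execution(execution_id=1))
        request.versions.append(version)
        return request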

The capability_versions table
Versions are uniquely identified by the tuple ``(capability_request_id, version_number)``, where
``version_number`` starts at 1 and increases with each new version.
Version 1 holds the ``parameters`` from the initial request. Parameters are immutable once a version exists, but when
creating the next version you can start from the current version's parameters and modify them.
We then have the ``workflow_metadata`` field, which is a place for storing whatever the last workflow wanted to
communicate back to us.
The ``state`` field is one of ``Complete``, ``Created``, ``Running``, ``Cancelled``, ``Failed`` or ``Error``, with the
obvious meaning.
And once again we have ``sealed``, which controls whether new versions can be made.
Finally, the ``internal_notes`` column is for storing notes from the DAs about this version.
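
The versioning rules above (a version is keyed by ``(capability_request_id, version_number)``, its parameters never
change, and the next version starts from the current one's parameters) can be sketched as follows. The class and
function names are hypothetical, a shallow dictionary merge stands in for whatever editing the real system allows, and
starting each new version in ``Created`` is an assumption.

.. code-block:: python

    from dataclasses import dataclass
    from enum import Enum
    from types import MappingProxyType
    from typing import Any, Mapping


    class VersionState(Enum):
        # The six values listed for the ``state`` column.
        COMPLETE = "Complete"
        CREATED = "Created"
        RUNNING = "Running"
        CANCELLED = "Cancelled"
        FAILED = "Failed"
        ERROR = "Error"


    @dataclass(frozen=True)
    class CapabilityVersion:
        capability_request_id: int
        version_number: int
        parameters: Mapping[str, Any]
        state: VersionState = VersionState.CREATED

        @property
        def key(self) -> tuple[int, int]:
            """The tuple that uniquely identifies a version."""
            return (self.capability_request_id, self.version_number)


    def next_version(current: CapabilityVersion, changes: Mapping[str, Any]) -> CapabilityVersion:
        """Build version N+1 from version N's parameters plus the requested changes."""
        merged = {**current.parameters, **changes}  # a copy; the current version is left untouched
        return CapabilityVersion(
            capability_request_id=current.capability_request_id,
            version_number=current.version_number + 1,
            parameters=MappingProxyType(merged),  # read-only view mimics immutability
        )

The frozen dataclass and the read-only ``MappingProxyType`` make accidental edits to an existing version raise an
error, which mirrors the rule that parameters are fixed once a version exists.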

The capability_version_files table
Yet another "files" table, with the same warning to not store anything very large.

The capability_executions table
Because each execution corresponds to exactly one workflow run, this table keeps a copy of some of the important
attributes of the workflow request. Delivery is also tracked here.
The ``state`` field hooks up to the :doc:`capability state system <capability-states>`.
The ``current_workflow_request_id`` field contains the request ID of the corresponding request in the workflow
service.
``queue_state`` is one of ``NotQueued``, ``Queued`` or ``Running``, with the expected meaning.
We also have two delivery settings here, ``delivery_url`` and ``delivery_path``, one of which is filled out once the
execution has been delivered to users.
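
The ``queue_state`` values and the delivery settings can be modelled as below. This is a hypothetical sketch, not the
service's code: the class and method names are invented, and it assumes that an undelivered execution has both
``delivery_url`` and ``delivery_path`` empty (``None``).

.. code-block:: python

    from dataclasses import dataclass
    from enum import Enum
    from typing import Optional


    class QueueState(Enum):
        # The three values listed for the ``queue_state`` column.
        NOT_QUEUED = "NotQueued"
        QUEUED = "Queued"
        RUNNING = "Running"


    @dataclass
    class CapabilityExecution:
        current_workflow_request_id: Optional[int]
        queue_state: QueueState = QueueState.NOT_QUEUED
        delivery_url: Optional[str] = None
        delivery_path: Optional[str] = None

        def is_delivered(self) -> bool:
            # One of the two delivery columns is filled in once the products reach users.
            return self.delivery_url is not None or self.delivery_path is not None

        def delivery_location(self) -> Optional[str]:
            """Return whichever delivery setting is present, preferring the URL (an arbitrary choice)."""
            return self.delivery_url or self.delivery_path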

State-related tables
--------------------

The state system exists to implement a transition table defining the state machine (or "workflow", if the term weren't
so overloaded in this system) that a capability offers. This is what lets some capabilities require QA, others use a
double QA regime with data analysts and astronomers-on-duty, and others simply run a one-shot workflow. Any directed
graph topology is possible. More details can be found in the
:doc:`capability state machine documentation <capability-states>`.

The capability_states table
Each row holds just the name of a state and the capability it pertains to, but the other state tables refer to these
rows through foreign keys.

The capability_state_transitions table
The idea behind our version of the Mealy machine is that we have a set of events flowing through the system. When we
are in a transition's ``from_state`` and catch an event matching its ``pattern``, we transition to its ``to_state``.
Before doing so, we run the transition's actions, which are stored in the next table, ``capability_state_actions``.
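
A minimal sketch of one step of this Mealy-style machine follows. It assumes, hypothetically, that events are
dictionaries, that a ``pattern`` matches when every key/value pair in it appears in the event, and that the first
matching transition wins; the real pattern syntax and tie-breaking rules are described in the
:doc:`capability state machine documentation <capability-states>` and may differ. Actions are plain callables here,
and the state names in any usage are made up.

.. code-block:: python

    from dataclasses import dataclass, field
    from typing import Any, Callable, Mapping

    Event = Mapping[str, Any]
    Action = Callable[[Event], None]


    @dataclass
    class Transition:
        from_state: str
        pattern: Mapping[str, Any]  # hypothetical: every key/value here must appear in the event
        to_state: str
        actions: list[Action] = field(default_factory=list)


    def matches(pattern: Mapping[str, Any], event: Event) -> bool:
        return all(key in event and event[key] == value for key, value in pattern.items())


    def step(state: str, event: Event, transitions: list[Transition]) -> str:
        """Fire the first transition out of ``state`` whose pattern matches ``event``."""
        for t in transitions:
            if t.from_state == state and matches(t.pattern, event):
                for action in t.actions:  # actions run before the state changes
                    action(event)
                return t.to_state
        return state  # no matching transition: the event is ignored

An event that matches no transition out of the current state leaves the state unchanged, and a transition's actions
always run before the new state is returned.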

The capability_state_actions table
There is a 1:N relationship between transitions and actions; that is, we can run more than one action on a single
transition between two states.
Each row here represents one of those actions.
Each action consists of a reference to the action type, which is a Python class, and an argument: a string that is
passed to the action instance before it executes. For more details, see the
:doc:`capability state machine documentation <capability-states>`.
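
The following sketch shows one way an (action type, argument) row could be turned into a running action: look up the
class by name, hand it the string argument, then execute it. The base class, the two action classes, the registry and
passing the argument through the constructor are all hypothetical, and it assumes a transition's actions run in row
order.

.. code-block:: python

    from abc import ABC, abstractmethod


    class StateAction(ABC):
        """Hypothetical base class for action types."""

        def __init__(self, argument: str) -> None:
            self.argument = argument  # the row's string argument, handed over before execution

        @abstractmethod
        def execute(self, event: dict) -> str: ...


    class SendNotification(StateAction):
        def execute(self, event: dict) -> str:
            return f"notify template={self.argument} for request {event['request_id']}"


    class QueueWorkflow(StateAction):
        def execute(self, event: dict) -> str:
            return f"queue workflow {self.argument}"


    # Maps the action-type string stored in a row to its Python class.
    ACTION_TYPES: dict[str, type[StateAction]] = {
        cls.__name__: cls for cls in (SendNotification, QueueWorkflow)
    }


    def run_actions(rows: list[tuple[str, str]], event: dict) -> list[str]:
        """Instantiate and execute each (action_type, argument) row in order."""
        results = []
        for action_type, argument in rows:
            action = ACTION_TYPES[action_type](argument)
            results.append(action.execute(event))
        return results

Keeping the argument as an opaque string lets one generic table hold actions of any type; each class decides how to
interpret it.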

The capability_state_machines table
Because the state machines get complex quickly, we have established certain patterns for their construction. These
patterns are called state machines and live in the codebase. This database table contains the information necessary
to generate the rows in the other state tables from that code and a few parameters:

- ``capability_name`` is the capability for which the other rows are generated.
- ``machine_type`` is a string that identifies the template in the code.
- ``associated_workflows`` indicates the workflow to be inserted into the template.
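
As an illustration of how a single ``capability_state_machines`` row could expand into rows for the other state
tables, here is a sketch with one made-up template. The ``one_shot`` machine type, its state names, its event
patterns, the ``QueueWorkflow`` action name and the tuple layout of the generated rows are all hypothetical; only the
three inputs mirror the columns described above.

.. code-block:: python

    from typing import Callable

    # Generated rows, keyed by the table they belong to.
    Rows = dict[str, list[tuple]]


    def one_shot(capability_name: str, workflow: str) -> Rows:
        """Hypothetical template: run the workflow once, then finish."""
        states = ["Created", "Executing", "Complete"]
        return {
            # (capability_name, state_name)
            "capability_states": [(capability_name, s) for s in states],
            # (capability_name, from_state, pattern, to_state)
            "capability_state_transitions": [
                (capability_name, "Created", "submitted", "Executing"),
                (capability_name, "Executing", "workflow-complete", "Complete"),
            ],
            # (capability_name, from_state, to_state, action_type, argument)
            "capability_state_actions": [
                (capability_name, "Created", "Executing", "QueueWorkflow", workflow),
            ],
        }


    # machine_type string -> template function
    TEMPLATES: dict[str, Callable[[str, str], Rows]] = {"one_shot": one_shot}


    def generate_rows(capability_name: str, machine_type: str, associated_workflows: str) -> Rows:
        """Expand one capability_state_machines row into rows for the other state tables."""
        return TEMPLATES[machine_type](capability_name, associated_workflows)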