Charlotte Hausman authored
Capability Schema
=================

The capability database schema resembles :doc:`the workflow database schema <workflow-schema>`. The difference is that, as the overview describes, the capability system layers versions and states on top, with a corresponding increase in complexity. This results in the following schema:

Let's break this down into parts.

Core tables
-----------

The core works like the workflow schema and consists of the tables ``capabilities``, ``capability_requests`` and ``capability_templates``.
The capability
table
Once again, we have the primary key ``capability_name``.

The capability queue system is mostly implemented on this table, using the following columns:

- ``max_jobs`` defines the maximum number of concurrent workflow requests this capability is allowed to run. (Note that any number of capability requests can exist in various states for this capability; the only thing that is rate-limited is the execution of workflows.)
- ``enabled`` controls whether new requests can be made.
- ``paused`` controls whether the queue is being run. Again, this only prevents workflows from being executed; other actions can always run.

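
As a minimal sketch of how these three columns could gate the queue, assuming a hypothetical in-memory ``Capability`` record and hypothetical helper names (``can_accept_request``, ``slots_available``) that do not come from the codebase:

.. code-block:: python

    from dataclasses import dataclass


    @dataclass
    class Capability:
        """Hypothetical mirror of the queue-related columns of ``capabilities``."""

        capability_name: str
        max_jobs: int
        enabled: bool
        paused: bool


    def can_accept_request(cap: Capability) -> bool:
        # ``enabled`` only gates the creation of new requests.
        return cap.enabled


    def slots_available(cap: Capability, running_workflows: int) -> int:
        """Number of additional workflows this capability may start right now."""
        # A paused queue starts no workflows; other actions are unaffected.
        if cap.paused:
            return 0
        # Only executing workflows count against max_jobs; requests sitting in
        # other states are not rate-limited.
        return max(0, cap.max_jobs - running_workflows)

For example, with ``max_jobs=3`` and one workflow already running, ``slots_available`` returns 2; setting ``paused`` drops it to 0 while leaving ``can_accept_request`` unchanged.
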
The :doc:`restriction engine <restriction-engine>` uses two fields, ``requires_qa`` and ``single_version_only``, to decide whether new versions can be made.

The field ``has_image_products`` controls whether we provide a CARTA link.

Finally, the field ``start_state`` tells the :doc:`capability state system <capability-states>` where to begin a new request.
The capability_requests
table
We have the request ID in the ``capability_request_id`` field, and ``capability_name`` refers back to the ``capabilities`` table to specify which capability this is a request for. The ``created_at`` and ``updated_at`` fields are timestamps that are maintained automatically.

The fields ``ingested`` and ``sealed`` are booleans that record supplemental facts about the capability request. Sealed requests cannot have new versions or executions made. Ingested requests have their results in the archive.

Finally, we have the temporary fields ``stage_1_reviewer`` and ``stage_2_reviewer``, which are used by the QA system.

Wondering where the JSON argument lives? It's in the ``capability_versions`` table.
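
The two tables described so far can be sketched as DDL. This is a hedged illustration only: it uses SQLite (through Python's standard ``sqlite3`` module) so that it runs self-contained, the column types and defaults are guesses, and the trigger is just one possible way to keep ``updated_at`` current. The production database and its migrations may differ.

.. code-block:: python

    import sqlite3

    # Assumed column types and defaults; only the column names come from this page.
    SCHEMA = """
    CREATE TABLE capabilities (
        capability_name     TEXT PRIMARY KEY,
        max_jobs            INTEGER NOT NULL,
        enabled             BOOLEAN NOT NULL DEFAULT 1,
        paused              BOOLEAN NOT NULL DEFAULT 0,
        requires_qa         BOOLEAN NOT NULL DEFAULT 0,
        single_version_only BOOLEAN NOT NULL DEFAULT 0,
        has_image_products  BOOLEAN NOT NULL DEFAULT 0,
        start_state         TEXT NOT NULL
    );

    CREATE TABLE capability_requests (
        capability_request_id INTEGER PRIMARY KEY,
        capability_name       TEXT NOT NULL
                              REFERENCES capabilities (capability_name),
        created_at            TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        updated_at            TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        ingested              BOOLEAN NOT NULL DEFAULT 0,
        sealed                BOOLEAN NOT NULL DEFAULT 0,
        stage_1_reviewer      TEXT,
        stage_2_reviewer      TEXT
    );

    -- Refresh updated_at whenever a request row changes.
    CREATE TRIGGER capability_requests_touch
    AFTER UPDATE ON capability_requests
    BEGIN
        UPDATE capability_requests
        SET updated_at = CURRENT_TIMESTAMP
        WHERE capability_request_id = NEW.capability_request_id;
    END;
    """


    def create_schema(conn: sqlite3.Connection) -> None:
        conn.executescript(SCHEMA)

Inserting a request with only ``capability_name`` set yields ``sealed = 0`` and ``ingested = 0``, with both timestamps filled in by their defaults.
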
The capability_templates
table
This works exactly like the corresponding functionality in the workflow system. The difference is that here it enables editing the file before the request is sent. The only existing examples are ``PPR.xml`` (pipeline processing request) files, which occasionally need to be edited before running or modified between versions; neither would be possible if the generated files were kept at the workflow level.

Version-related tables
----------------------

Unlike workflows, capabilities can be refined and rerun until the requester obtains the desired results. Capabilities were designed around a Platonic world-view in which the ideal products are out there, somewhere; if multiple executions of the same request are needed to obtain them, so be it.

This is why we have not just a capability request but also a capability version. All of the user-modifiable information is kept at the version level, so when a request is made, we immediately create version 1 of it with the supplied JSON argument.

Because hardware problems can happen, we have not only versions but also executions. An execution represents a single attempt to run a workflow. If it fails, the intent is that another execution can be created, but this functionality doesn't exist yet, so at the moment each version always has exactly one execution.
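
The relationship between requests, versions and executions can be sketched as plain Python objects. The class and function names are hypothetical, and so is the assumption that an execution begins in the capability's ``start_state``; the point is only that creating a request immediately yields version 1, which today carries exactly one execution.

.. code-block:: python

    from dataclasses import dataclass, field
    from typing import Any, Dict, List


    @dataclass
    class Execution:
        state: str  # assumed to start at the capability's start_state


    @dataclass
    class Version:
        version_number: int
        parameters: Dict[str, Any]
        executions: List[Execution] = field(default_factory=list)


    @dataclass
    class Request:
        capability_name: str
        versions: List[Version] = field(default_factory=list)


    def create_request(capability_name: str, parameters: Dict[str, Any],
                       start_state: str) -> Request:
        request = Request(capability_name)
        # Version 1 is created immediately from the supplied JSON argument.
        version = Version(1, dict(parameters))
        # Retries are not implemented yet, so each version has one execution.
        version.executions.append(Execution(start_state))
        request.versions.append(version)
        return request
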
The capability_versions
table
Versions are uniquely identified by the tuple (``capability_request_id``, ``version_number``), where ``version_number`` starts at 1 and increases with each new version.

The ``parameters`` column holds the JSON argument; on version 1, these are the parameters from the initial request. They are immutable once the version exists, but you can modify the current version's parameters when creating the next version's parameters.
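
A minimal sketch of the composite key and of deriving the next version's parameters from the current ones, again using SQLite only so the example is self-contained. The DDL covers just the columns discussed here, ``create_next_version`` is a hypothetical helper, and the shallow dictionary merge it uses to "modify" parameters is an illustrative choice, not necessarily how the service edits them.

.. code-block:: python

    import json
    import sqlite3
    from typing import Any, Dict

    VERSIONS_DDL = """
    CREATE TABLE capability_versions (
        capability_request_id INTEGER NOT NULL,
        version_number        INTEGER NOT NULL,
        parameters            TEXT NOT NULL,  -- JSON argument, never updated
        PRIMARY KEY (capability_request_id, version_number)
    );
    """


    def create_next_version(conn: sqlite3.Connection, request_id: int,
                            changes: Dict[str, Any]) -> int:
        """Insert version N+1 whose parameters are version N's plus ``changes``."""
        row = conn.execute(
            "SELECT version_number, parameters FROM capability_versions"
            " WHERE capability_request_id = ?"
            " ORDER BY version_number DESC LIMIT 1",
            (request_id,),
        ).fetchone()
        if row is None:
            raise LookupError(f"request {request_id} has no versions")
        current_number, current_params = row[0], json.loads(row[1])
        # Leave the existing row untouched; build the new parameters from a copy.
        new_params = {**current_params, **changes}
        new_number = current_number + 1
        conn.execute(
            "INSERT INTO capability_versions VALUES (?, ?, ?)",
            (request_id, new_number, json.dumps(new_params)),
        )
        return new_number

Because the primary key is the pair, a second row for the same request and version number is rejected by the database rather than silently overwriting the immutable parameters.
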

We then have the ``workflow_metadata`` field, which stores whatever the last workflow wanted to communicate back to us.

The ``state`` field is one of ``Complete``, ``Created``, ``Running``, ``Cancelled``, ``Failed`` or ``Error``, with the obvious meaning.

And once again we have ``sealed``, which controls whether new versions can be made.
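
Taken together with the request-level ``sealed`` flag and the restriction engine's ``single_version_only`` field, a check for whether another version may be created might look like this. It is hypothetical: the function and record names are invented, the restriction engine's real logic lives elsewhere, and reading ``single_version_only`` as "no version beyond version 1" is an assumption based on the field's name.

.. code-block:: python

    from dataclasses import dataclass


    @dataclass
    class VersionContext:
        """Hypothetical bundle of the flags that matter for a new version."""

        request_sealed: bool         # capability_requests.sealed
        latest_version_sealed: bool  # capability_versions.sealed, latest version
        single_version_only: bool    # capabilities.single_version_only
        version_count: int           # versions that already exist


    def can_create_new_version(ctx: VersionContext) -> bool:
        # A seal on either the request or its latest version blocks new versions.
        if ctx.request_sealed or ctx.latest_version_sealed:
            return False
        # Assumed meaning: a single-version capability never gets version 2.
        if ctx.single_version_only and ctx.version_count >= 1:
            return False
        return True
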

Finally, the ``internal_notes`` column stores notes from the DAs (data analysts) about this version.
The capability_version_files
table
Yet another "files" table, with the same warning to not store anything very large.
The capability_executions
table
This table exists because we need a 1:1 relationship between executions and workflow runs; it keeps a copy of some of the important attributes of the workflow request. Delivery is also tracked here.

The ``state`` field hooks into the :doc:`capability state system <capability-states>`.

The ``current_workflow_request_id`` field contains the request ID of the corresponding request in the workflow service.

``queue_state`` is one of ``NotQueued``, ``Queued`` or ``Running``, with the expected meaning.

We also have some delivery settings here: ``delivery_url`` and ``delivery_path``, one of which is filled in once the execution has been delivered to users.
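
The queue and delivery columns map naturally onto a small record type. In this sketch the enum values are the state names listed above, while the class names, the ``delivered`` property and the ``delivery_location`` helper are hypothetical conveniences rather than parts of the actual service.

.. code-block:: python

    from dataclasses import dataclass
    from enum import Enum
    from typing import Optional


    class QueueState(Enum):
        NOT_QUEUED = "NotQueued"
        QUEUED = "Queued"
        RUNNING = "Running"


    @dataclass
    class CapabilityExecution:
        current_workflow_request_id: Optional[int]
        queue_state: QueueState
        delivery_url: Optional[str] = None
        delivery_path: Optional[str] = None

        @property
        def delivered(self) -> bool:
            # One of the two delivery columns is filled in after delivery.
            return self.delivery_url is not None or self.delivery_path is not None

        def delivery_location(self) -> Optional[str]:
            # Prefer the URL when both happen to be set.
            return self.delivery_url or self.delivery_path

Looking up ``QueueState("Queued")`` converts a stored column value back into the enum member.
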

State-related tables
--------------------

The state system implements a transition table defining the state machine (or "workflow", if the term weren't so overloaded in this system) that a capability offers. This is what lets some capabilities require QA, others use a double QA regime with data analysts and astronomers-on-duty, and others run as a simple one-shot workflow. Any directed graph topology is possible. More details can be found in the :doc:`capability state machine documentation <capability-states>`.
The capability_states
table
This table holds just the name of the state and the capability it pertains to, but other tables refer to it through foreign keys.
The capability_state_transitions
table
The idea behind our version of the Mealy machine is that we have a set of events flowing through the system. When we are in an anticipated ``from_state`` and catch an event matching ``pattern``, we transition to ``to_state``. But first, we run the actions in the next table, ``capability_state_actions``.
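
One step of this Mealy machine can be sketched as follows. The matching rule is an assumption made for illustration: here ``pattern`` is treated as a regular expression matched against an event-type string, the first matching row wins, and an event with no matching row leaves the state unchanged. The real pattern format and tie-breaking may differ; see the :doc:`capability state machine documentation <capability-states>`.

.. code-block:: python

    import re
    from dataclasses import dataclass
    from typing import Callable, Iterable, Tuple


    @dataclass(frozen=True)
    class Transition:
        from_state: str
        pattern: str               # assumed: a regex over the event type
        to_state: str
        actions: Tuple[str, ...] = ()


    def step(state: str, event_type: str, transitions: Iterable[Transition],
             run_action: Callable[[str], None]) -> str:
        """Consume one event and return the next state."""
        for t in transitions:
            if t.from_state == state and re.fullmatch(t.pattern, event_type):
                # Mealy machine: the output (actions) depends on state and input,
                # and runs before we move to the target state.
                for action in t.actions:
                    run_action(action)
                return t.to_state
        # No matching transition: ignore the event in this sketch.
        return state
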
The capability_state_actions
table
There is a 1:N relationship between transitions and actions; that is, a single transition between two states can run more than one action. Each row here represents one of those actions. An action is simply a reference to the action type, which is a Python class, plus an argument. The argument is a string that is passed to the action instance before execution. For more details, see the :doc:`capability state machine documentation <capability-states>`.
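
The row-to-action dispatch can be sketched with a registry that maps action-type names to classes. Everything named here (``Action``, ``configure``, ``SendMessage``, ``ACTION_TYPES``, ``run_action_row``) is hypothetical; the sketch only mirrors the two facts stated above: the action type names a Python class, and the string argument reaches the instance before it executes.

.. code-block:: python

    from typing import Dict, Type


    class Action:
        """Hypothetical base class for state-machine actions."""

        def __init__(self) -> None:
            self.argument = ""

        def configure(self, argument: str) -> None:
            # The row's string argument is handed over before execution.
            self.argument = argument

        def execute(self, request_id: int) -> str:
            raise NotImplementedError


    class SendMessage(Action):
        """Hypothetical action that would emit a message named by its argument."""

        def execute(self, request_id: int) -> str:
            return f"send {self.argument!r} for request {request_id}"


    ACTION_TYPES: Dict[str, Type[Action]] = {"SendMessage": SendMessage}


    def run_action_row(action_type: str, argument: str, request_id: int) -> str:
        action = ACTION_TYPES[action_type]()  # resolve the class named in the row
        action.configure(argument)
        return action.execute(request_id)

With this registry, ``run_action_row("SendMessage", "qa-ready", 5)`` returns ``"send 'qa-ready' for request 5"``.
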
The capability_state_machines
table
Because state machines get complex quickly, we have established certain patterns for constructing them. These patterns, also referred to as state machines, are templates that exist in the codebase. This table contains the information necessary to generate the rows in the other state tables from one of these templates and a few parameters.

- ``capability_name`` is the capability for which this row generates the other rows.
- ``machine_type`` is a string that identifies the template in the code.
- ``associated_workflows`` indicates the workflow to be inserted into the template.
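
Expanding a ``capability_state_machines`` row into transition rows could look like this sketch. The ``one_shot`` template, its states, its event patterns and the ``StartWorkflow`` action are all invented for illustration, and treating ``associated_workflows`` as a list is an assumption; the real templates live in the codebase.

.. code-block:: python

    from dataclasses import dataclass
    from typing import Callable, Dict, List, Tuple


    @dataclass(frozen=True)
    class TransitionRow:
        capability_name: str
        from_state: str
        pattern: str
        to_state: str
        actions: Tuple[Tuple[str, str], ...]  # (action type, argument) pairs


    def one_shot(capability_name: str, workflows: List[str]) -> List[TransitionRow]:
        """Hypothetical template: run a single workflow, then finish or fail."""
        (workflow,) = workflows  # this template expects exactly one workflow
        return [
            TransitionRow(capability_name, "Created", "submit", "Running",
                          (("StartWorkflow", workflow),)),
            TransitionRow(capability_name, "Running", "workflow-complete",
                          "Complete", ()),
            TransitionRow(capability_name, "Running", "workflow-failed",
                          "Failed", ()),
        ]


    TEMPLATES: Dict[str, Callable[[str, List[str]], List[TransitionRow]]] = {
        "one_shot": one_shot,
    }


    def expand(capability_name: str, machine_type: str,
               associated_workflows: List[str]) -> List[TransitionRow]:
        # machine_type selects the template; the other columns parameterize it.
        return TEMPLATES[machine_type](capability_name, associated_workflows)
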