The CASA Version Matrix is intended to provide SSA's data processing infrastructure with dynamic support of CASA versions: past,
present, and future. As CASA has evolved and grown, the ways in which it runs processing have changed. In order to
provide support for user-driven processing, in which a user may choose any available CASA version, it has become
necessary to define a system that can validate, verify, and adapt to changes between various CASA versions and our own
infrastructure.
A user is allowed to select their desired CASA version for most types of processing surfaced through the Archive. At
this time, the Archive supports the following types of user-driven processing:

* Basic Restores
* CMS Restores
* ALMA User Driven Imaging (AUDI)\*

\*AUDI currently only allows use of the specified default version.
What is CASA?
-------------
Common Astronomy Software Applications (CASA) is a radio astronomy data processing package built and released by NRAO
and our partner institutions.
It is used for all standard and user-driven processing of observational data from the EVLA and ALMA telescopes. CASA is
the primary processing package supported by the NRAO Archive at this time, and all of our data processing workflows rely
on it.
Note that there are two different types of CASA releases to pay attention to: the standalone CASA release, and the
CASA release with a packaged pipeline. We are only concerned with the pipeline releases, as they are what we use within
workflows. When we refer to a 'CASA version' within the Matrix, we mean the **CASA version+pipeline** rather than the
standalone CASA release.
More information on CASA can be found at the official website here: `<https://casa.nrao.edu>`_
Interacting Systems
-------------------
Two major systems currently interact with the CASA Version Matrix.
NRAO Archive
^^^^^^^^^^^^
The Archive allows several types of user-driven processing, such as restoring calibrated data and reprocessing images
with certain inputs. In such cases, users can select a CASA version from the Archive dropdowns. The Matrix should
provide the Original Processing Version (OPV) as the recommended version unless that version is no longer valid, in
which case it should recommend the default for the requested type of processing.
Workspaces Capability Service
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Although the CASA Version Matrix is served from within the WS Capability Service Docker container, it is treated as a
separate system, in the same way that the Archive Service, which handles automatic calibration submission upon
ingestion completion, is treated as a separate system.
For requests originating within the Workspaces system, the Capability Service interacts with the Matrix in two ways:

1. Validating the requested CASA version, as with the Archive above
2. Dynamically determining and rendering standard CASA procedure recipes for PPR.xml files

For requests originating from the Archive, only the second applies, as validation was already handled before the
request was submitted.
What are the Matrix's responsibilities?
---------------------------------------
As the CASA Version Matrix is the single source of truth for which versions of CASA are valid for processing, it has a
number of responsibilities. These include, but are not necessarily limited to:

* tracking which versions+pipelines are capable of running on the current NRAO clusters
* tracking which versions+pipelines support a given processing capability
* tracking and providing access to processing recipes for PPR generation
* maintaining SSA's CASA processing links
Maintenance of SSA's CASA Links
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
To provide access to, and processing of, ALMA data in a timely fashion, we run all ALMA processing workflows on the NRAO
Charlottesville cluster, where the North American ALMA Science Center (NAASC) is located and our copy of ALMA's raw data
is easily accessible. Otherwise, we would have to stream the data to DSOC, which slows everything down.
The standard NRAO CASA installs at each site (DSOC vs NAASC) are not kept in locations with the same
access paths. As SSA does not, and has no desire to, control the install locations of CASA, we maintain our own area in
each site's processing center with symlinks to the CASA installations which are relevant to our systems.
This area is */home/ssa/casa/*, and we regularly sync the links from DSOC to NAASC to provide a standard access point
across NRAO sites.
Under the current Archive system, these links are created and maintained by a cron job running a bash script that was
last updated in 2020. The Matrix is designed to subsume this script by replacing it with a service endpoint, and may
also replace the cron job in the future.
Determination of Valid CASA Versions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
In order to provide our users with the most accurate data products possible, we need to make every effort to ensure
their processing requests succeed. As processing results depend on CASA, we must verify that the CASA version used is
the most appropriate one for each request.
What makes a CASA version valid?
________________________________
In order for a CASA version to be valid for processing, the following conditions must be met:

1. It *must* be capable of running on the current cluster's operating system.

   - All processing currently occurs within one of NRAO's processing clusters. In the future, we plan to enable fully
     remote processing, but at this time we are limited by the conditions of our clusters' resources. The current
     clusters run RHEL8, and we have had issues running 6.1.x versions on this OS, so these versions are considered
     invalid for processing.

2. It *must* be installed and linked to the SSA home area.

   - Even if the requested version can run on the current cluster, if it is not installed in the expected area and
     linked to SSA home, it does not exist as far as processing is concerned.

3. It *must* be able to run the requested type of processing.

   - Over time, CASA gains new processing capabilities. Even if a CASA version meets the other validity criteria, it is
     useless if it predates support for the requested processing type, as the needed recipe is not available. For
     example, EVLA Standard Imaging only began support around CASA 5.4.x, so any earlier version is invalid for
     Standard Imaging.
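The three validity rules could be expressed as a single predicate, as in this sketch. The `CasaVersion` record, the
capability name `std_imaging`, and both lookup tables are hypothetical; the 6.1.x exclusion and the approximate 5.4
start for EVLA Standard Imaging come from the rules above, while real data would come from the Matrix's own tracking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CasaVersion:
    """Hypothetical record for one CASA version+pipeline release."""
    name: str                 # display name, e.g. "6.5.4"
    release: tuple[int, ...]  # numeric CASA version, e.g. (6, 5, 4)
    linked_in_ssa_home: bool  # installed and symlinked under /home/ssa/casa/


# Assumption: CASA series known not to run on the current cluster OS (RHEL8).
UNSUPPORTED_ON_CLUSTER = {(6, 1)}

# Illustrative: earliest release supporting each processing type.
FIRST_SUPPORTED = {"std_imaging": (5, 4)}


def is_valid(version: CasaVersion, capability: str) -> bool:
    """Apply the three validity rules in order."""
    # Rule 1: must run on the current cluster's operating system.
    if version.release[:2] in UNSUPPORTED_ON_CLUSTER:
        return False
    # Rule 2: must be installed and linked to the SSA home area.
    if not version.linked_in_ssa_home:
        return False
    # Rule 3: must not predate support for the requested processing type.
    first = FIRST_SUPPORTED.get(capability)
    return first is not None and version.release >= first
```

Treating an unknown processing type as invalid is a conservative choice in this sketch: the Matrix would rather reject
a request than run it with a version it cannot vouch for.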
Tracking of Original Processing Versions (OPV)
______________________________________________
All types of processing currently serviced from the Archive are reprocessings of existing, ingested data products
(i.e., calibrations or images). The primary case we are concerned with at this time is Restores.
Ideally, we should reproduce these products using the *same* CASA version that originally produced them.
Until now, we have not retained the CASA version used at ingestion, despite calibrations already having a database
field for it.
With the implementation of the Matrix, we will also begin ingestion of the CASA version and pipeline used to produce
the product in question, as this is essential information for Matrix functionality. We will also be back-populating
existing calibration products to fill in their missing version and pipeline information.
What if an OPV isn't valid?
___________________________
The first thing the Matrix will do is determine whether OPV information is available in the database for the requested
product. If it is, the Matrix will then verify whether the OPV still meets the validity criteria described above. Does it
run on the current cluster? Is it installed in SSA home? Is it valid for the requested type of reprocessing?
That last question might seem a bit strange. Clearly the product was produced using this version in the first place, so
shouldn't that mean the version *must* be valid for reprocessing of this product? Simply, no. Just because the OPV
was used to produce the product, does not guarantee it can successfully reproduce it.
There are known CASA versions which might have been used to produce calibrations which passed QA and were ingested,
which were only later determined to have processing errors, rendering the version unsuitable. The Data Analysts maintain
a list of usable CASA versions and known issues, found here: `Pipeline Version History <https://science.nrao.edu/facilities/vla/data-processing/pipeline/pipeline-version-history>`_
In the case where the OPV is *not* valid, **the Matrix shall fall back to the CAPO-defined default for the requested
processing type**.
If the CAPO-defined default is *also* not valid, **the Matrix shall fall back to the newest available version**.
Concisely, the version precedence chain is as follows:
1. The Original Processing Version shall be preferred above all else
2. If the OPV is not valid, use the CAPO defined default
3. If the default is not valid, use the most current installed version
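A minimal sketch of this precedence chain, assuming a validity check like the one described above is passed in as
`is_valid`. The function name and parameters are hypothetical, and the last step assumes that "the most current
installed version" means the newest installed version that also passes validation.

```python
from typing import Callable, Optional, Sequence


def choose_version(
    opv: Optional[str],
    capo_default: Optional[str],
    installed_newest_first: Sequence[str],
    is_valid: Callable[[str], bool],
) -> Optional[str]:
    """Pick a CASA version following the OPV -> default -> newest chain."""
    # 1. The Original Processing Version is preferred above all else.
    if opv is not None and is_valid(opv):
        return opv
    # 2. Otherwise use the CAPO-defined default for this processing type.
    if capo_default is not None and is_valid(capo_default):
        return capo_default
    # 3. Otherwise use the newest installed version that is still valid.
    for candidate in installed_newest_first:
        if is_valid(candidate):
            return candidate
    return None  # nothing usable; the caller must reject the request
```

For example, with an invalid OPV and a valid default, the default is returned; only when both fail does the installed
list get consulted.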
System-Specified Defaults
_________________________
There are a number of system capability defaults specified by CAPO properties for ease of use and consistency with
current systems. Processing-specific defaults can be set for Basic Restores, CMS Restores, AUDI, VUDI, and Standard
Calibration/Imaging (Standard Calibration and Standard Imaging share one default).
All CASA version defaults use the *edu.nrao.archive.workflow.config.CasaVersions* prefix and start with *homeFor*.
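Assuming CAPO properties are available as a flat key/value mapping, and that each full key is the prefix, a dot, and a
`homeFor...` name, the defaults could be collected as below. The dot separator and the example key names in the usage
are assumptions for illustration, not the real property names.

```python
# Prefix from the CAPO convention above; the trailing dot is an assumption.
PREFIX = "edu.nrao.archive.workflow.config.CasaVersions."
MARKER = "homeFor"


def casa_defaults(props: dict[str, str]) -> dict[str, str]:
    """Map each processing-type suffix to its configured CASA default."""
    defaults = {}
    for key, value in props.items():
        if not key.startswith(PREFIX):
            continue  # unrelated CAPO property
        name = key[len(PREFIX):]
        if name.startswith(MARKER):
            # e.g. a hypothetical "homeForBasicRestore" -> "BasicRestore"
            defaults[name[len(MARKER):]] = value
    return defaults
```

For instance, a hypothetical key `edu.nrao.archive.workflow.config.CasaVersions.homeForBasicRestore` would yield a
`BasicRestore` entry, while properties outside the prefix are ignored.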
Dynamic PPR Generation
^^^^^^^^^^^^^^^^^^^^^^
As the Matrix is responsible for everything related to the CASA versions used by our processing infrastructure, it
shall also handle the determination, location, and provision of CASA processing recipes.
Each CASA pipeline ships with a set of standard recipe templates for various types of processing. These can change
between releases, so to support multiple versions we must be able to pull the needed recipe from the requested version.
After verifying the requested version's validity, the Matrix shall determine whether a recipe exists for the requested
type of processing. If a recipe should exist, the Matrix shall attempt to locate the needed file in that CASA version's
installation on the cluster and return it to the calling service (most likely the WS Capability Service) as text
formatted for rendering in a mustache template. If the expected recipe file cannot be found, the Matrix shall return an
empty object and inform the calling system to use the hardcoded default template (which should be kept current with the
most recent CASA version).
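The recipe lookup and its empty-object fallback might be sketched as follows. The `RECIPE_PATHS` table and its relative
file paths are hypothetical placeholders, since real recipe locations depend on the pipeline release; an empty dict
stands in for the "empty object" that tells the caller to use its hardcoded default template. Mustache rendering is
left to the calling service.

```python
from pathlib import Path

# Hypothetical: processing type -> recipe path relative to a CASA+pipeline install.
RECIPE_PATHS = {
    "std_calibration": "pipeline/recipes/standard_calibration.xml",
    "std_imaging": "pipeline/recipes/standard_imaging.xml",
}


def find_recipe(casa_home: Path, processing_type: str) -> dict[str, str]:
    """Return {"recipe": text} if found, else {} (caller uses its default)."""
    relpath = RECIPE_PATHS.get(processing_type)
    if relpath is None:
        return {}  # no recipe is expected for this processing type
    try:
        text = (casa_home / relpath).read_text(encoding="utf-8")
    except FileNotFoundError:
        return {}  # expected recipe is missing from this install
    return {"recipe": text}
```

A caller could then write `find_recipe(casa_home, "std_imaging").get("recipe", default_template)`, where
`default_template` is its own hardcoded template text, and render the result with its mustache engine.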