Commit 01e91fac authored by Charlotte Hausman

envoy documentation

parent a7e38dd2
1 merge request: !412 envoy documentation
Pipeline #2481 passed
......@@ -28,19 +28,19 @@ Setup can be further divided into three essential parts: Auditing, Environment,
There are two types of auditing that occur in CASA Envoy. The first, Directory Auditing, ensures that the required
directory structure of *rawdata*, *working*, and *products* is contained in the processing directory.
The second type is File Auditing. CASA Envoy must be provided to different files at inital call, *metadata.json* and *PPR.xml*. These files must be
audited to ensure that all required fields are present. *PPR.xml* must be submitted directly to CASA so it is important
to make sure that it is sstructured correctly before submission. While *metadata.json* is not submitted to CASA, it does
contain all the information for results delivery post CASA and needs to be audited as well.
The second type is File Auditing. CASA Envoy must be provided two different files at initial call, *metadata.json* and *PPR.xml*. These files must be
audited to ensure that all required fields are present. *PPR.xml* must be submitted directly to CASA, so it is important
to make sure that it is structured correctly before submission. While *metadata.json* is not submitted to CASA, it does
contain all the information for results delivery post-CASA and needs to be audited as well.
A further function of the File Auditing is correcting *PPR.xml* for processing with HTCondor. CASA requires the name of
A further function of File Auditing is correcting *PPR.xml* for processing with HTCondor. CASA requires the name of
the processing directory it is running in - unfortunately, with HTCondor it isn't possible to know that name prior to
submitting a condor job. Therefore, *PPR.xml* is corrected after submission to condor with the correct directory
information. This corrected *PPR.xml* is placed into the *working* directory where it is then used by CASA, and the
unaltered original remains in the parent processing directory.
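
The Directory Auditing step described above can be sketched roughly as follows. This is a minimal illustration, not the envoy's actual auditor: the function name `missing_directories` is hypothetical, and only the three directory names come from the text.

```python
from pathlib import Path
from typing import List

# Directory names taken from the description of Directory Auditing.
REQUIRED_DIRECTORIES = ("rawdata", "working", "products")


def missing_directories(processing_dir: Path) -> List[str]:
    """Return the required subdirectories that are absent from processing_dir.

    Hypothetical helper: an empty list means the directory audit passes.
    """
    return [name for name in REQUIRED_DIRECTORIES if not (processing_dir / name).is_dir()]
```

A caller could treat a non-empty result as an audit failure and report the missing names before CASA is ever launched.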
### Environment
CASA has several environment variable that are required to be set by whatever system it is running on.
CASA has several environment variables that are required to be set by whatever system it is running on.
These include:
| ENV Variables | Description |
......@@ -58,9 +58,8 @@ that all input data is in the correct location pre-processing for CASA to find w
## Launch
Since the only difference between calibration and imaging processing is the contents of *PPR.xml*, CASA Envoy contains
a single CASA Launcher class which is utilized by the two type launcher classes: Calibration Launcher and Imaging
a single CASA Launcher class which is utilized by the two typed launcher classes: Calibration Launcher and Imaging
Launcher. Each type launcher handles both standard and restore or integration types of processing. The two type classes ensure that setup is correct for each product type, as described above, and then call
the CASA Launcher.
<br/>
<br/>
Post CASA processing, the casa log is checked for error flags and CASA envoy exits.
# Ingestion Envoy
# Ingest Envoy: The Workspaces NGAS & Metadata Ingestion System
Ingest Envoy is responsible for setup and launch of all types of file ingestion for the Workspaces System.
Currently, this includes standard calibration and standard image ingestion.
```
usage: ingest_envoy [-h] [--calibration CALIBRATION] [--image IMAGE]

Workspaces Ingestion System

optional arguments:
  -h, --help            show this help message and exit
  --calibration CALIBRATION
                        run ingestion for a calibration product
  --image IMAGE         run ingestion for an image product
```
Ingest Envoy makes use of the existing *ingest* functionality of the AAT-PPI which simply takes an
*ingestion manifest* as input. While this is consistent regardless of ingestion type, the manifest itself,
as well as the ingestion staging requirements, differ between the types of files to be ingested. For this reason,
Ingest Envoy's functionality can be broken into two underlying parts: Setup and Launch.
## Setup
Setup can be further divided into two essential components: Product Staging and Manifest Generation.
### Product Staging
Product Staging is the step that collects all ingestible products and places them in the workspaces staging area located at:
``` /lustre/aoc/cluster/pipeline/<capo-profile>/workspaces/staging ```
This collection is most often performed via a shell script such as the ```calibration-table-collector.sh```
for calibration ingestion or the ```image-product-collector.sh``` for image ingestion.
In the case of calibration ingestion, the collection script creates a tar file containing all the calibration tables
and then creates a new weblog tar file to ensure that only the most recent version is ingested with the tables.
Both tar files are then placed in the staging area.
In the case of image ingestion, the collection script copies the image fits files to the staging area along with a new
weblog tar file as with calibration ingestion, but it also creates a *pipeline artifacts* tar file which contains other
files produced during a CASA imaging run, such as *casa_pipescript.py* and the CASA produced PPR file
*unknown.hifv_contimage.pprequest.xml*. There is also an extra metadata file, *aux_image_metadata.json*, required for image ingestion which must
be transferred to the staging area.
Once product staging is done, the envoy is ready to produce the ingestion manifest file.
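
The text says staging is usually done by shell scripts. Purely as an illustration of the calibration case, the tar-and-stage step might look like the sketch below. The function name `stage_tables` and its parameters are assumptions and do not describe the actual collector scripts.

```python
import tarfile
from pathlib import Path
from typing import Iterable


def stage_tables(tables: Iterable[Path], staging_dir: Path, tar_name: str) -> Path:
    """Bundle the given calibration tables into one tar file inside staging_dir.

    Hypothetical helper; the real collection is done by a shell script.
    """
    staging_dir.mkdir(parents=True, exist_ok=True)
    tar_path = staging_dir / tar_name
    with tarfile.open(tar_path, "w") as tar:
        for table in tables:
            # arcname keeps only the final path component, so the archive
            # does not embed the original absolute paths.
            tar.add(table, arcname=table.name)
    return tar_path
```

Because `tarfile.TarFile.add` recurses into directories by default, the same call would work whether a table is a single file or a directory.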
### Manifest Generation
The Ingestion Manifest is essentially the master instruction list for an ingestion request. It contains the names,
locations, and types of all products to be ingested into NGAS and the NRAO metadata database for retrieval via
the new NRAO Archive.
There are three main sections to an ingestion manifest: Parameters, Input Group, and Output Group.
The Parameters section sets parameters such as ingestion path, telescope, and whether there is an additional metadata file.
The Input Group section defines the input group association for the files to be ingested. This section contains the
input science product that was used to create the file to be ingested. For calibrations, this should be an execution
block locator, and for images, this should be the calibration locator.
The Output Group section defines all files related to the main science product being ingested. This section contains
the type and file name of the main science product and the type and file name of any associated ancillary products.
An ancillary product is anything that is related to the main science product that is worth ingesting, such as weblogs
and the pipeline and ingestion artifacts tar files.
After the manifest is properly generated, the manifest and any additional metadata files are tarred up into the
*ingestion artifacts* tar file and both the manifest and the new artifact tar file are placed in the staging area.
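
To make the three-section layout concrete, here is a rough sketch of a manifest assembled as a Python dictionary. Only the section names (Parameters, Input Group, Output Group) come from the text. Every key name, the function `build_manifest`, and the sample values in the usage note are assumptions made for illustration, not the real manifest schema.

```python
import json
from typing import Dict, List


def build_manifest(
    ingestion_path: str,
    telescope: str,
    input_locator: str,
    product_type: str,
    product_file: str,
    ancillary: List[Dict[str, str]],
) -> dict:
    """Assemble the three manifest sections as a plain dictionary.

    All key names below are illustrative assumptions.
    """
    return {
        # Parameters: where to ingest from and for which telescope.
        "parameters": {"ingestion_path": ingestion_path, "telescope": telescope},
        # Input Group: the science product the new files were created from.
        "input_group": {"science_products": [{"locator": input_locator}]},
        # Output Group: the main science product plus its ancillary products.
        "output_group": {
            "science_products": [{"type": product_type, "filename": product_file}],
            "ancillary_products": ancillary,
        },
    }
```

Calling `json.dumps(build_manifest(...), indent=2)` on such a dictionary would give a file that could be written to the staging area alongside the products.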
## Launch
Calibration and Image ingestion are initiated in exactly the same way - by providing the staging area directory name to
*ingest*. Because of this, Ingest Envoy contains a single function for launching the *ingest* pex which is called by
all typed ingestion launchers.
Ingest Envoy has two types of Launcher classes: IngestCalibrationLauncher and IngestImageLauncher. Each typed launcher
handles the type-specific setup, as described above, and then calls the shared ingestion function. Upon *ingest*'s
completion, Ingest Envoy checks the return code and logs either a successful or failed ingestion and exits.
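
A shared launch function of the kind described above might look roughly like this sketch. The name `run_and_report` is hypothetical, and the exact command line used to invoke *ingest* is not given in the text, so the command is left as a parameter.

```python
import logging
import subprocess
from typing import Sequence

logger = logging.getLogger("ingest_envoy_sketch")


def run_and_report(command: Sequence[str]) -> int:
    """Run an external command, log success or failure, and return its exit code.

    Hypothetical helper; the actual ingest invocation is an assumption.
    """
    result = subprocess.run(list(command))
    if result.returncode == 0:
        logger.info("Ingestion finished successfully.")
    else:
        logger.error("Ingestion failed with return code %d.", result.returncode)
    return result.returncode
```

Both typed launchers could then pass the staging directory name as part of `command` and exit with the returned code.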
This is a Python port of the ingestion manifest builder in archive-metaproject.
See https://open-confluence.nrao.edu/display/SSA/Ingestion+Manifests.
......@@ -30,7 +30,7 @@ def _get_settings(filename: str, arg_type: str) -> dict:
def arg_parser() -> argparse.ArgumentParser:
parser = argparse.ArgumentParser(
description="Workspaces Ingestion handler", formatter_class=argparse.RawTextHelpFormatter
description="Workspaces Ingestion System", formatter_class=argparse.RawTextHelpFormatter
)
parser.add_argument(
"--calibration",
......@@ -44,7 +44,7 @@ def arg_parser() -> argparse.ArgumentParser:
nargs=1,
action="store",
required=False,
help="run ingestion for an image product (anticipated functionality)",
help="run ingestion for an image product",
)
return parser
......
# Envoy Architecture
## What is an Envoy?
An Envoy is defined by the Merriam-Webster dictionary as "a person delegated to represent one government in its
dealings with another".
In the context of the Workspaces System, an Envoy is a python executable program that is delegated the task of
interacting with a system external to Workspaces from within a workflow. This interaction includes making sure all
required inputs are present and correct as well as launching the external system in question.
## When should I build a new Envoy?
If you need to interact with a system external to Workspaces, i.e. a system which is *not built and maintained
within the workspaces project*, you should create an envoy. Examples of external systems Workspaces
interacts with are CASA or the *ingest* system.
## How do I build a new Envoy?
An Envoy is essentially a bridge between Workspaces and the system of interest. As such, a Workspaces Envoy has two
essential functions: Setup and Launch.
### Basic Envoy Architecture
An envoy is expected to be extensible if required, i.e. it should be able to launch multiple types of calls to the
system of interest. As such, envoys should typically have the following structure:
- A main entry point which determines which type of call to make based on the supplied parser input. This file is typically
named after the envoy (unless it's the main file for CASA Envoy, which is aptly named "palaver").
- A number of typed launcher classes which handle the type-specific setup (typically broken up into other classes)
and then make the call for the system to execute. These classes are typically contained in a file called *launchers.py* but can
also rely on any number of other classes for setup functionality.
Example: CASA Envoy has two main modules: *palaver.py*, the entry point, and *launchers.py*, which contains the
CalibrationLauncher and ImageLauncher classes. However, the package also contains the setup helper modules *auditor.py*
and *foundation.py*. The former audits the input files for required fields and corrects them for HTCondor submission; the
latter ensures all data is in the required locations for CASA execution.
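
The structure described above can be sketched as a small skeleton. All names here (`TypedLauncher`, `ExampleCalibrationLauncher`, `ExampleImageLauncher`, `dispatch`) are hypothetical and chosen for illustration. They are not the classes in *launchers.py*.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Type


class TypedLauncher(ABC):
    """Base for typed launchers: type-specific setup, then one shared launch call."""

    def __init__(self, launch: Callable[[], int]) -> None:
        self._launch = launch

    @abstractmethod
    def setup(self) -> None:
        """Prepare everything the external system needs."""

    def run(self) -> int:
        self.setup()
        return self._launch()


class ExampleCalibrationLauncher(TypedLauncher):
    def setup(self) -> None:
        # Placeholder: a real envoy would audit inputs and stage data here.
        pass


class ExampleImageLauncher(TypedLauncher):
    def setup(self) -> None:
        # Placeholder for image-specific setup.
        pass


def dispatch(call_type: str, launch: Callable[[], int]) -> int:
    """Entry-point logic: pick the typed launcher from the parsed call type."""
    launchers: Dict[str, Type[TypedLauncher]] = {
        "calibration": ExampleCalibrationLauncher,
        "image": ExampleImageLauncher,
    }
    return launchers[call_type](launch).run()
```

Passing the launch callable in keeps the typed launchers focused on setup, which mirrors the idea that there is only one way to launch the external system.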
### What should the setup phase cover?
This is the most complex part of any envoy and typically encapsulates multiple stages.
Because the external system already exists, we cannot dictate the inputs provided for execution nor the
environment needed for it to run successfully. Therefore, we need to gather all required information and ensure it's in place
before making the call to run the system.
Steps that might occur in the setup phase include:
- setting environment variables required by the system being launched
- generating required input files, such as tar files
- ensuring file placement in specific locations
- ensuring required files contain all required information
- ensuring required files are corrected for HTCondor processing if necessary
### What should the launch phase cover?
Once setup is complete all requirements to call the system should be satisfied. The *only* thing that should happen in
the launch phase is the execution call to the system of interest.
Currently, launching an external system has been found to have little variation when there are multiple types of action
that the system is being asked to perform. For example, CASA calibration and CASA imaging have the same input, *PPR.xml*, and
therefore only one way of launching CASA has been needed. However, it is possible that a system might need to be
launched with different inputs which might require different launchers.
### What happens after the external system exits?
This step can be viewed as optional as not all envoys might need it. Sometimes there are post execution steps that
should be completed within the envoy before it exits as they might impact the completion status.
Example: CASA likes to pretend everything is gloriously fine when it exits and doesn't throw error codes. Therefore, the
envoy is required to check the casa log after CASA exits to make sure it really really really did complete successfully.
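
A post-execution log check of this kind could be sketched as below. The function name `log_has_errors` and the default marker string `"SEVERE"` are assumptions for illustration. The text does not say which error flags CASA Envoy actually looks for.

```python
from pathlib import Path
from typing import Iterable


def log_has_errors(log_path: Path, markers: Iterable[str] = ("SEVERE",)) -> bool:
    """Return True if any line of the log contains one of the marker strings.

    Hypothetical helper; the default marker is an assumed example.
    """
    wanted = tuple(markers)
    # errors="replace" avoids crashing on stray bytes in a large log file.
    with log_path.open(errors="replace") as log:
        return any(any(marker in line for marker in wanted) for line in log)
```

An envoy could map a `True` result to a non-zero exit status so the surrounding workflow sees the failure even though the external system exited cleanly.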