Delivery: A system for packaging images and data
What is delivery?
Delivery is what happens after the active processing portion of the workflow concludes. It is the step that moves the retrieved or generated products from the processing area to a place where the requesting user can access them.
Most workflows proceed by retrieving some files from NGAS and running CASA on those files to produce new products. The files are large and CASA is quite heavy, so we retrieve the files into a spool area on the Lustre filesystem and then launch the CASA jobs on the cluster. Once CASA is finished, the files the user wants are still sitting in that spool area on Lustre. Delivery is what gets the files from there to where the user can retrieve them.
Concept
Delivery starts from a directory with some products in it. Delivery then identifies the products in that directory. Using knowledge from Capo about different destinations, delivery copies the data into those destinations, in the correct format for the product type. Delivery also accepts some arguments to filter out products that aren't interesting or to perform simple packaging steps like creating tar archives containing the data.
Usage
usage: deliver [-h] [--prefix PREFIX] [-p | -P] [-l LOCAL_DESTINATION] [-t] [-r] SOURCE_DIRECTORY

positional arguments:
  SOURCE_DIRECTORY      The directory where the products to be delivered are located

optional arguments:
  -h, --help            show this help message and exit
  --prefix PREFIX       Prefix for the destination (a request ID perhaps)
  -p, --use-piperesults
                        Use the CASA piperesults file, if present
  -P, --ignore-piperesults
                        Ignore the CASA piperesults file

Destination options:
  -l LOCAL_DESTINATION, --local-destination LOCAL_DESTINATION
                        Deliver to this local directory instead of the appropriate web root
  -t, --tar             Archive the delivered items as a tar file

Product filtering options:
  -r, --rawdata         Deliver the rawdata instead of the products
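
The usage message has the shape of Python argparse output, so a parser along the following lines would accept the same command line. This is a minimal sketch rather than the actual delivery source: the function name build_parser, the shared use_piperesults attribute, and its default of None when neither -p nor -P is given are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented `deliver` usage (illustrative only)."""
    parser = argparse.ArgumentParser(prog="deliver")
    parser.add_argument(
        "--prefix", help="Prefix for the destination (a request ID perhaps)"
    )

    # -p and -P are mutually exclusive, matching "[-p | -P]" in the usage line.
    # Assumption: both write to one attribute, which stays None if neither is given.
    piperesults = parser.add_mutually_exclusive_group()
    piperesults.add_argument(
        "-p", "--use-piperesults", dest="use_piperesults",
        action="store_true", default=None,
        help="Use the CASA piperesults file, if present",
    )
    piperesults.add_argument(
        "-P", "--ignore-piperesults", dest="use_piperesults",
        action="store_false", default=None,
        help="Ignore the CASA piperesults file",
    )

    destination = parser.add_argument_group("Destination options")
    destination.add_argument(
        "-l", "--local-destination",
        help="Deliver to this local directory instead of the appropriate web root",
    )
    destination.add_argument(
        "-t", "--tar", action="store_true",
        help="Archive the delivered items as a tar file",
    )

    filtering = parser.add_argument_group("Product filtering options")
    filtering.add_argument(
        "-r", "--rawdata", action="store_true",
        help="Deliver the rawdata instead of the products",
    )

    parser.add_argument(
        "source_directory", metavar="SOURCE_DIRECTORY",
        help="The directory where the products to be delivered are located",
    )
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

Passing both -p and -P to this sketch makes argparse exit with an error, which is the behavior the usage line implies.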
The deliver command must be called with a mandatory source directory. This is the location containing the files to be delivered.
If the user has specified a destination, pass -l <dir> to tell delivery where to write the files. Without this argument, delivery uses the path in the Capo setting edu.nrao.workspaces.DeliverySettings.downloadDirectory, which is currently set to /lustre/aoc/cluster/pipeline/$PROFILE/downloads. In that case a download URL is generated from the Capo setting edu.nrao.workspaces.DeliverySettings.downloadUrl, which is currently set to https://dl-nrao.aoc.nrao.edu.
If the user has requested a single tar archive, call delivery with -t to force it to generate one. The default behavior is to simply copy the files.
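
The difference between the default copy and the -t behavior can be sketched with the Python standard library. This is an illustration under stated assumptions, not the delivery implementation: the function name deliver_files, the archive name delivery.tar, and the choice to preserve paths relative to the source directory are all hypothetical.

```python
import shutil
import tarfile
from pathlib import Path
from typing import List


def deliver_files(
    source: Path,
    destination: Path,
    use_tar: bool = False,
    archive_name: str = "delivery.tar",  # hypothetical name, not from the docs
) -> List[Path]:
    """Copy every file under `source` into `destination`, or bundle them in one tar."""
    destination.mkdir(parents=True, exist_ok=True)
    files = sorted(p for p in source.rglob("*") if p.is_file())

    if use_tar:
        # One uncompressed archive holding every file, with paths relative to the source.
        archive = destination / archive_name
        with tarfile.open(archive, "w") as tar:
            for path in files:
                tar.add(path, arcname=path.relative_to(source).as_posix())
        return [archive]

    # Default behavior: plain copy that keeps the directory layout.
    delivered = []
    for path in files:
        target = destination / path.relative_to(source)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)
        delivered.append(target)
    return delivered
```

The sketch assumes the destination is not inside the source directory; otherwise the recursive listing would pick up files that were just delivered.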
Delivery can use CASA's "piperesults" file to discern the location and type of generated products. If you want this behavior, call deliver -p. If you want delivery to ignore the file instead, use deliver -P.
Delivery also supports a --prefix argument, which allows you to generate intermediate directories between the requested or implied delivery root and where the files are ultimately placed.
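
How the delivery root and the prefix might combine into a final directory can be sketched as follows. The function name resolve_destination and its parameters are hypothetical. The download_directory argument stands in for the value of the Capo setting edu.nrao.workspaces.DeliverySettings.downloadDirectory, which this sketch expects to be looked up and expanded by the caller. Generation of the download URL is not modeled here.

```python
from pathlib import Path
from typing import Optional


def resolve_destination(
    download_directory: str,
    local_destination: Optional[str] = None,
    prefix: Optional[str] = None,
) -> Path:
    """Pick the delivery root, then append the optional prefix directories."""
    # A requested local destination (-l) wins over the implied default root.
    root = Path(local_destination) if local_destination else Path(download_directory)
    # The prefix adds intermediate directories between the root and the files.
    return root / prefix if prefix else root
```

For example, resolve_destination("/downloads", prefix="req-42") yields the path /downloads/req-42, while supplying a local destination replaces the /downloads root and keeps the prefix underneath it.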
References
Discussion of the new design can be found in Confluence under Proposed Delivery Redesign and some important supporting documentation around how the directories are built can be found at Delivery Directory Improvements.