Delivery: A system for packaging images and data

What is delivery?

Delivery is what happens after the active processing portion of the workflow concludes. It is the step that moves the retrieved or generated products from the processing area to a place where the requesting user can access them.

Most workflows proceed by retrieving some files from NGAS and running CASA on those files to produce new products. The files are large and CASA is quite heavy, so we retrieve the files into a spool area on the Lustre filesystem and then launch the CASA jobs on the cluster. Once CASA is finished, the files the user wants are still sitting in that spool area on Lustre. Delivery is what gets the files from there to where the user can retrieve them.

Concept

Delivery starts from a directory with some products in it. Delivery then identifies the products in that directory. Using knowledge from Capo about different destinations, delivery copies the data into those destinations, in the correct format for the product type. Delivery also accepts some arguments to filter out products that aren't interesting or to perform simple packaging steps like creating tar archives containing the data.

Usage

usage: deliver [-h] [--prefix PREFIX] [-p | -P] [-l LOCAL_DESTINATION] [-t] [-r] SOURCE_DIRECTORY

positional arguments:
  SOURCE_DIRECTORY      The directory where the products to be delivered are located

optional arguments:
  -h, --help            show this help message and exit
  --prefix PREFIX       Prefix for the destination (a request ID perhaps)
  -p, --use-piperesults
                        Use the CASA piperesults file, if present
  -P, --ignore-piperesults
                        Ignore the CASA piperesults file

Destination options:
  -l LOCAL_DESTINATION, --local-destination LOCAL_DESTINATION
                        Deliver to this local directory instead of the appropriate web root
  -t, --tar             Archive the delivered items as a tar file

Product filtering options:
  -r, --rawdata         Deliver the rawdata instead of the products

The command deliver must be called with a mandatory source directory. This is the location containing the files to be delivered.
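As an illustration of the interface above, here is a minimal argparse sketch that reproduces the documented options. It is not the actual implementation: the function name build_parser and the attribute names (source, use_piperesults, ignore_piperesults) are assumptions chosen for this example, and only the flags and help strings come from the usage text.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented `deliver` usage (illustrative only)."""
    parser = argparse.ArgumentParser(prog="deliver")
    parser.add_argument(
        "source",
        metavar="SOURCE_DIRECTORY",
        help="The directory where the products to be delivered are located",
    )
    parser.add_argument(
        "--prefix",
        metavar="PREFIX",
        help="Prefix for the destination (a request ID perhaps)",
    )

    # -p and -P are shown as [-p | -P] in the usage line, so they exclude each other.
    piperesults = parser.add_mutually_exclusive_group()
    piperesults.add_argument(
        "-p", "--use-piperesults", action="store_true",
        help="Use the CASA piperesults file, if present",
    )
    piperesults.add_argument(
        "-P", "--ignore-piperesults", action="store_true",
        help="Ignore the CASA piperesults file",
    )

    destination = parser.add_argument_group("Destination options")
    destination.add_argument(
        "-l", "--local-destination", metavar="LOCAL_DESTINATION",
        help="Deliver to this local directory instead of the appropriate web root",
    )
    destination.add_argument(
        "-t", "--tar", action="store_true",
        help="Archive the delivered items as a tar file",
    )

    filtering = parser.add_argument_group("Product filtering options")
    filtering.add_argument(
        "-r", "--rawdata", action="store_true",
        help="Deliver the rawdata instead of the products",
    )
    return parser
```

With this sketch, build_parser().parse_args(["-t", "--prefix", "req42", "/some/spool/dir"]) yields a namespace with tar set to True, prefix set to "req42", and source set to the spool directory, while passing both -p and -P is rejected by argparse.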

If the user has specified a destination, pass -l <dir> to tell delivery where to write the files. Without this argument, delivery uses a path from Capo, specifically edu.nrao.workspaces.DeliverySettings.downloadDirectory, which is currently set to /lustre/aoc/cluster/pipeline/$PROFILE/downloads. A download URL is then generated from the Capo setting edu.nrao.workspaces.DeliverySettings.downloadUrl, which is currently set to https://dl-nrao.aoc.nrao.edu.
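The choice between a local destination and the Capo-configured web root can be sketched as follows. This is a hedged example, not the real code: a plain dictionary stands in for Capo (no Capo client API is used), the function name resolve_destination is hypothetical, and expanding $PROFILE by string replacement is an assumption about how the profile name reaches the path. How the final download URL is derived from the base URL is not described here, so the sketch only returns the base.

```python
from pathlib import Path
from typing import Mapping, Optional, Tuple

# Setting names taken from the text above.
DOWNLOAD_DIRECTORY = "edu.nrao.workspaces.DeliverySettings.downloadDirectory"
DOWNLOAD_URL = "edu.nrao.workspaces.DeliverySettings.downloadUrl"


def resolve_destination(
    local_destination: Optional[str],
    settings: Mapping[str, str],
    profile: str,
) -> Tuple[Path, Optional[str]]:
    """Return (delivery root, base download URL or None).

    `settings` is a stand-in for Capo; `profile` fills in $PROFILE (assumption).
    """
    if local_destination is not None:
        # -l was given: write there, and there is no web URL to report.
        return Path(local_destination), None
    root = settings[DOWNLOAD_DIRECTORY].replace("$PROFILE", profile)
    return Path(root), settings[DOWNLOAD_URL]
```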

If the user has requested a single tar archive, then call delivery with -t to force it to generate a tar archive. The default behavior is to simply copy the files.
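The difference between the default copy and the -t behavior might look roughly like this. It is a sketch under stated assumptions: the function name deliver_files and the archive name (the source directory name plus ".tar") are hypothetical, and the real tool may lay out the archive differently.

```python
import shutil
import tarfile
from pathlib import Path
from typing import List


def deliver_files(source: Path, destination: Path, use_tar: bool = False) -> List[Path]:
    """Copy the contents of `source` into `destination`, or bundle them into one tar."""
    destination.mkdir(parents=True, exist_ok=True)

    if use_tar:
        # Hypothetical naming: <source directory name>.tar inside the destination.
        archive = destination / f"{source.name}.tar"
        with tarfile.open(str(archive), "w") as tar:
            tar.add(str(source), arcname=source.name)
        return [archive]

    delivered = []
    for item in sorted(source.iterdir()):
        target = destination / item.name
        if item.is_dir():
            shutil.copytree(str(item), str(target))
        else:
            shutil.copy2(str(item), str(target))
        delivered.append(target)
    return delivered
```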

Delivery can use CASA's "piperesults" file to discern the location and type of generated products. To enable this behavior, call deliver -p. To have delivery ignore the file instead, call deliver -P. The two options are mutually exclusive.

Delivery also supports a --prefix argument, which allows you to generate intermediate directories between the requested or implied delivery root and where the files are ultimately placed.
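One way to picture the effect of --prefix is as a path join between the delivery root and the final location. The helper below is a hypothetical sketch; the name destination_path and the stripping of leading and trailing slashes (so that the prefix cannot replace the root) are assumptions, not documented behavior.

```python
from pathlib import Path
from typing import Optional


def destination_path(root: Path, prefix: Optional[str] = None) -> Path:
    """Place `prefix` (for example a request ID) between the root and the files."""
    if not prefix:
        return root
    # Strip slashes so an absolute-looking prefix stays under the root (assumption).
    return root / prefix.strip("/")
```

For example, a root of /downloads with a prefix of req42 would give /downloads/req42, and the delivered files would be written beneath that directory.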

References

Discussion of the new design can be found in Confluence under Proposed Delivery Redesign and some important supporting documentation around how the directories are built can be found at Delivery Directory Improvements.