
NRAO Archive and Pipeline Processing Interface

Detailed documentation is available on our Confluence page.

Overview

The NRAO archive system lets users search, download, and reprocess radio astronomical observations made with NRAO-affiliated instruments: the Very Large Array (VLA), the Very Long Baseline Array (VLBA), the Atacama Large Millimeter Array (ALMA), and the Green Bank Telescope (GBT).

Components

The archive system is a constellation of subsystems, each performing a critical task.

  • amygdala is the messaging core of the system: it receives messages and decides how to act on them
  • archiveIface is the web-based user interface for starting downloads and reprocessing
  • the data-fetcher is responsible for retrieving archive files from NGAS
  • deployment is our system for putting this system online
  • logback-utils and logback-servlet-utils provide logging services
  • mail is a templated mailing system
  • messaging provides messaging services the other components rely on
  • Model provides Java models for the entities in the system
  • archive-solr provides Solr indexing services for fast lookups
  • NGRH-ALMA-10_8 is the request handler; it shows users which step their download or reprocessing request has reached
  • opencadc and tap-server provide Virtual Observatory services
  • pipeline-manifest-lib and ppr-schema generate and parse reprocessing requests and their results
  • schema is the database schema used by the archive system
  • pyat is the Python interface to the archive as well as the ingestion system
  • workflow-all provides cluster-based workflows for downloads, imaging and calibration

How are requests processed?

To give a quick view of how the system works, let's walk through a single request.

  1. The user arrives at the archiveIface at archive-new.nrao.edu wanting to fetch some data.

  2. The user searches for a particular observation, such as 13B-014.

    Behind the scenes, the archiveIface makes a request to a Solr index, built by archive-solr, to find observations for 13B-014, which it then presents to the user.

  3. The user selects a data set and chooses download and reprocessing options provided by archiveIface and clicks either Download or Reprocess.

  4. archiveIface sends the request to NGRH-ALMA-10_8 (the request handler).

  5. NGRH-ALMA-10_8 sends a workflow-start message to workflow-all.

  6. workflow-all runs a sequence of workflow steps:

    1. ppr-schema is used to generate a pipeline processing request (PPR) for the user's request
    2. data-fetcher is used in the cluster to obtain the user's data
    3. Other workflow tasks and jobs are used to run CASA and obtain the results
    4. Finally, a message is sent back to NGRH-ALMA-10_8 with the results
  7. NGRH-ALMA-10_8 shows the user that their request is complete and how to obtain the files.
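
The Solr lookup behind step 2 can be sketched as a simple select query. Note that the endpoint, core name, and field name below are assumptions for illustration only; the actual archive-solr schema may use different names.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; the real archive-solr host and core name may differ.
SOLR_SELECT = "http://localhost:8983/solr/archive/select"

def build_search_url(project_code: str, rows: int = 10) -> str:
    """Build a Solr select URL searching for observations by project code.

    The field name 'project_code' is a placeholder, not the real index schema.
    """
    params = {
        "q": f"project_code:{project_code}",
        "wt": "json",   # ask Solr for a JSON response
        "rows": rows,   # cap the number of returned documents
    }
    return f"{SOLR_SELECT}?{urlencode(params)}"

url = build_search_url("13B-014")
```

archiveIface would issue a GET against a URL like this and render the matching observations for the user.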

All of the work is coordinated using AMQP messaging (via messaging) and the database (defined by schema).
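
To make that coordination concrete, here is a minimal sketch of what a workflow-start message (step 5) might look like before it is published over AMQP. The field names are hypothetical, not the actual messaging schema; a real sender would publish this JSON to an exchange via the messaging component.

```python
import json

def make_workflow_start_message(request_id: str, workflow: str, parameters: dict) -> str:
    """Serialize a hypothetical workflow-start message as JSON.

    In the real system a payload like this would be published to an AMQP
    exchange for workflow-all to consume; all names here are illustrative.
    """
    message = {
        "type": "workflow-start",
        "request_id": request_id,   # assigned by the request handler
        "workflow": workflow,       # e.g. "download" or "reprocess"
        "parameters": parameters,   # user-selected options from archiveIface
    }
    return json.dumps(message)

payload = make_workflow_start_message("req-0001", "download", {"project_code": "13B-014"})
```

The consumer on the other end parses the JSON, looks the request up in the database, and starts the matching workflow.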

More details are available in our Confluence documentation.