Andrew Kapuscinski authored
Product Fetcher Architecture
The product fetcher is designed to retrieve data products from our archives. As the name suggests, the input to the
product fetcher is a science product locator, a string like
uid://evla/execblock/27561b56-4c6a-4614-bc26-67e436b5e92c. The science product locator is decoded by a service called
the locator service, which uses the archive's knowledge of different instruments and their storage locations to produce
a location report. The location report contains a list of the files associated with the science
product, and information about where they can be obtained. The job of the product fetcher is to interpret the report and
retrieve the files from wherever they may be.
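To make the shape of a science product locator concrete, here is a minimal sketch that splits the example string into its parts. In the real system the locator is decoded by the locator service, not by the fetcher; the names ProductLocator and parse_locator, and the meaning assigned to each component (instrument, product type, UUID), are assumptions read off the single example above rather than a documented format.

```python
from dataclasses import dataclass
from urllib.parse import urlparse
from uuid import UUID


@dataclass(frozen=True)
class ProductLocator:
    """Hypothetical parsed form of a science product locator."""

    instrument: str    # assumed meaning of the first component, e.g. "evla"
    product_type: str  # assumed meaning of the second component, e.g. "execblock"
    product_id: UUID   # assumed to be a UUID, as in the example locator


def parse_locator(locator: str) -> ProductLocator:
    """Split a locator of the form uid://<instrument>/<type>/<uuid>."""
    parts = urlparse(locator)
    if parts.scheme != "uid":
        raise ValueError(f"not a science product locator: {locator!r}")
    # The path looks like "/execblock/<uuid>"; drop the leading slash and split once.
    product_type, _, product_id = parts.path.lstrip("/").partition("/")
    # UUID() raises ValueError if the last component is not a well-formed UUID.
    return ProductLocator(parts.netloc, product_type, UUID(product_id))


locator = parse_locator("uid://evla/execblock/27561b56-4c6a-4614-bc26-67e436b5e92c")
print(locator.instrument, locator.product_type, locator.product_id)
```

Nothing in the fetcher needs to depend on this structure: the location report, not the locator, tells it which files to retrieve and from where.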
The goals for the product fetcher are:
- Accuracy: retrieving the files correctly, including retrying as necessary and verifying file content
- Speed: retrieving the files as quickly as possible without sacrificing accuracy
Because the work is mostly I/O bound and accesses many servers, the product fetcher depends on a high degree of concurrency to achieve speed.
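As an illustration of how concurrency and retrying can combine for I/O-bound work, here is a minimal sketch built on a thread pool from the Python standard library. It is not the product fetcher's actual code: fetch_with_retries and fetch_all are hypothetical names, the fetch callable stands in for whatever retrieves one file, and treating OSError as the retryable failure is an assumption. Content verification, the other half of the accuracy goal, is left out.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def fetch_with_retries(fetch: Callable[[str], bytes], name: str, attempts: int = 3) -> bytes:
    """Call fetch(name), retrying on OSError; re-raise after the last attempt."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch(name)
        except OSError:
            if attempt == attempts:
                raise
    raise AssertionError("unreachable")


def fetch_all(
    fetch: Callable[[str], bytes], names: Iterable[str], max_workers: int = 8
) -> Dict[str, bytes]:
    """Fetch every name concurrently and return a name -> content mapping."""
    results: Dict[str, bytes] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the name it is fetching.
        futures = {pool.submit(fetch_with_retries, fetch, n): n for n in names}
        for future in as_completed(futures):
            # result() re-raises any exception that survived the retries.
            results[futures[future]] = future.result()
    return results
```

Threads suit this case because each worker spends most of its time waiting on a server or a disk, so many requests can be in flight at once.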
Map
I divide the fetching process into two stages: in the first stage we generate a plan, and in the second stage we execute it. The "meat" of the program lies in the second stage, which accounts for the bulk of the time and effort and is built out of the following pieces:
FileFetcher
The core of the program is what happens inside a FileFetcher. A FileFetcher retrieves a single file. There are several different ways files can be stored and there is a FileFetcher for each storage medium and access method. At the moment, this means there are three implementations of FileFetcher:
- NgasStreamingFileFetcher, which does a web request against an NGAS resource and writes the result to disk
- NgasDirectCopyFileFetcher, which asks NGAS to write a resource to a certain path on disk
- OracleXmlFileFetcher, which queries Oracle for a value in a certain row of a certain table and writes the result to a certain path on disk
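The "one fetcher per storage medium and access method" idea can be sketched as an abstract base class with one subclass per implementation. The sketch below is an assumption about the shape of the interface, not the real code: the fetch method, the destination attribute, and the constructor arguments are hypothetical, and the streaming fetcher simply takes a URL because the actual NGAS request format is not described here. Only the streaming variant is filled in; the direct-copy and Oracle variants would implement the same method.

```python
import shutil
import urllib.request
from abc import ABC, abstractmethod
from pathlib import Path


class FileFetcher(ABC):
    """Retrieves a single file and writes it to `destination`."""

    def __init__(self, destination: Path):
        self.destination = Path(destination)

    @abstractmethod
    def fetch(self) -> Path:
        """Retrieve the file and return the path it was written to."""


class NgasStreamingFileFetcher(FileFetcher):
    """Issues a web request and streams the response body to disk."""

    def __init__(self, url: str, destination: Path):
        super().__init__(destination)
        self.url = url  # hypothetical: stands in for the real NGAS request

    def fetch(self) -> Path:
        self.destination.parent.mkdir(parents=True, exist_ok=True)
        # Copy in chunks so large files are never held in memory all at once.
        with urllib.request.urlopen(self.url) as response, open(self.destination, "wb") as out:
            shutil.copyfileobj(response, out)
        return self.destination
```

Because every implementation exposes the same fetch method, the code that executes the plan can treat all files uniformly, whichever storage system they come from.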