DMTN-202: Use cases and science requirements on a user batch facility

Note

This technote is not yet published.

The DM-SST has examined the long-under-defined user batch facility and produced a proposed set of use cases and quasi-requirements on it.

This document is not binding on the construction or operations project until further consideration; it is intended as a more concrete starting point for design efforts and commitments.

What we colloquially call “User Batch” is a shorthand for a set of user-facing computational capabilities called for in the SRD, LSR, OSS, and DMSR using very generic language.

The relevant existing requirements are summarized in the Confluence document “Level 3 Definition and Traceability” (note that this page also covers Level 3 / “User Generated” data products).

This note proposes that we recognize that “User Batch” should cover the following capabilities:

The user computing capability should allow running in bulk over catalog data.
The user computing capability should allow running in bulk over image data.
The system capacity is defined as an “amount of computing capacity equivalent to at least userComputingFraction (10%) of the total LSST data processing capacity (computing and storage) for the purpose of scientific analysis of LSST data and the production of Level 3 Data Products by external users”.
We have to provide a software framework to facilitate both catalog- and image-based user computation, which has to support systematic runs over collections of data and has to preserve provenance.
The framework (or frameworks) has to support re-running standard computations from the pipelines in addition to running more free-form user jobs.
There has to be a resource allocation mechanism to allow users to be given quotas, which can be modified per-user.
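As an illustration of the capacity and quota items above, the following is a minimal sketch of how a per-user quota table with modifiable overrides might be modelled. Everything here is hypothetical: the `QuotaTable` class, its field names, the default quota, and the capacity units are assumptions made for illustration and are not part of any existing LSST or Rubin software. Only the 10% value of userComputingFraction is taken from the requirement quoted above.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable

# userComputingFraction (10%) as quoted in the requirement above.
USER_COMPUTING_FRACTION = 0.10


@dataclass
class QuotaTable:
    """Hypothetical per-user quota table (illustrative only)."""

    total_capacity: float  # total data processing capacity, arbitrary units
    default_quota: float   # assumed default allocation per user, same units
    overrides: Dict[str, float] = field(default_factory=dict)

    @property
    def min_user_capacity(self) -> float:
        """Minimum capacity reserved for users ("at least" this fraction)."""
        return USER_COMPUTING_FRACTION * self.total_capacity

    def quota_for(self, user: str) -> float:
        """Return the user's override if one is set, else the default."""
        return self.overrides.get(user, self.default_quota)

    def set_quota(self, user: str, quota: float) -> None:
        """Modify the quota for a single user."""
        if quota < 0:
            raise ValueError("quota must be non-negative")
        self.overrides[user] = quota

    def fits(self, users: Iterable[str]) -> bool:
        """Check that the summed quotas stay within the minimum user capacity."""
        return sum(self.quota_for(u) for u in users) <= self.min_user_capacity


# Usage: one user is granted a larger allocation than the default.
table = QuotaTable(total_capacity=1000.0, default_quota=1.0)
table.set_quota("alice", 5.0)
print(table.quota_for("alice"), table.quota_for("bob"))
```

The sketch keeps the default and the per-user overrides separate so that changing one user's quota leaves every other user's allocation untouched. Whether quotas should be allowed to oversubscribe the reserved capacity is a policy question that this note does not settle.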