Abstract
Computational science is being revolutionized by the integration of AI
and simulation and, in particular, by deep learning surrogates that can
replace all or part of traditional large-scale HPC computations.
Surrogates can achieve remarkable performance improvements (in some
cases several orders of magnitude) and so save both time and energy.
The Surrogate Benchmark Initiative (SBI) project will create a community
repository and FAIR data ecosystem for HPC application surrogate
benchmarks, including the data, code, and all other collateral artifacts
that the science and engineering community needs to use and reuse these
data sets and surrogates. We intend our repositories to stimulate active
research both by the participants in our project and by the broad
community of AI and domain scientists.
By collaborating with MLPerf, the major industry organization in this
area, and mirroring its process as much as possible, we will both
increase the value of the SBI benchmarks and secure industry involvement
in them. MLPerf is a major effort with over 1400 members from more than
80 member institutions (mainly from industry) and strong existing
involvement of the Department of Energy (DOE) laboratories through the
MLPerf HPC and science data working groups. We will build tutorials
around each deposited benchmark that will allow users from a broad range
of fields (as shown in the collaboration letters) to create new
surrogates and new SBI benchmarks, building on the initial set of four
benchmarks that we will produce in house. We will also establish working
groups and other community activities to advance all aspects of the
surrogate concept.
In particular, we will use the requirements exhibited by the benchmarks
to produce general middleware that supports both the generation
(training from HPC simulations) and the use of surrogates. This
middleware will also make it easier for general users to develop new
surrogates and so make the major performance increases pervasive across
DOE computational science. In this way, SBI will benefit both
application communities and computer systems research.
SBI will also support several AI research areas that we will advance
in our project. First, our benchmarks will drive research on efficient
generic surrogate architectures and on how well they map onto different
hardware systems. Second, we will research uncertainty quantification
(UQ) of surrogate estimates; we expect future surrogates always to come
with UQ built in. Third, we will study how much training data is needed
to obtain reliable surrogates at a given target accuracy. Finally, we
have already derived some simple but effective performance models of
surrogates, but these will need to be extended as deeper uses of
surrogates become understood and are exhibited in the benchmarks
deposited in our repository.