Meeting Notes 07-26-2021
Minutes of Meeting July 26, 2021
Shantenu led a discussion of surrogates, noting that his work was delayed by the loss of a postdoc. He divided surrogates into 3 areas:
- **MLinHPC**, as in the climate case in the Oxford paper [2001.08055] Building high accuracy emulators for scientific simulations with deep neural architecture search, which gives speedups of over a billion, although there are some curious features of this work. The surrogate can directly replace the full simulation, but it can also calculate potentials as in DeepMD https://arxiv.org/pdf/2005.00223.pdf, where 80-90% of the computational cost is the potential computation.
- Docking with Austin Clyde and Argonne group (including Shantenu and Rick) [2106.07036] Protein-Ligand Docking Surrogate Models: A SARS-CoV-2 Benchmark for Deep Learning Accelerated Virtual Screening
- Factor of 10 speedup with no loss of accuracy
- NVIDIA helped on performance
- **MLaboutHPC**: here ML guides the simulation, such as in choosing the ensemble; his DeepDriveMD Westpahl algorithm shows a factor of 100 compared to Anton
- Shantenu used a VAE, but slide 8 lists 7 methods
- **MLoutHPC**: Shantenu gave one example in which a campaign is optimized across scales using reinforcement learning, with the Austin Clyde model at the top
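
As a rough illustration of the potentials case above: when a surrogate replaces only the potential computation, the overall gain is capped by the part of the run that is not replaced. The sketch below applies Amdahl's law to the 80-90% figure. The function name `amdahl_speedup` and the surrogate speedup value are illustrative assumptions, not numbers from the talk.

```python
def amdahl_speedup(fraction_replaced: float, surrogate_speedup: float) -> float:
    """Overall speedup when only part of the runtime is accelerated (Amdahl's law).

    fraction_replaced: share of the original runtime handled by the surrogate
                       (0.8 to 0.9 for the potential computation, per the notes).
    surrogate_speedup: how much faster the surrogate is than the code it replaces
                       (hypothetical value, chosen by the caller).
    """
    if not 0.0 <= fraction_replaced <= 1.0:
        raise ValueError("fraction_replaced must be between 0 and 1")
    if surrogate_speedup <= 0.0:
        raise ValueError("surrogate_speedup must be positive")
    remaining = 1.0 - fraction_replaced  # part of the run left untouched
    return 1.0 / (remaining + fraction_replaced / surrogate_speedup)


if __name__ == "__main__":
    for fraction in (0.8, 0.9):
        # Even a very fast surrogate cannot beat 1 / (1 - fraction) overall.
        overall = amdahl_speedup(fraction, surrogate_speedup=1e6)
        print(f"fraction replaced {fraction:.0%}: overall speedup {overall:.2f}x")
```

Under these assumptions the overall speedup approaches about 5x when 80% of the cost is replaced and about 10x when 90% is replaced, which is why replacing the full simulation can give much larger factors than replacing the potential alone.
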
Shantenu presented PY2 and PY3 plans
In PY2 the primary goals are:
- (mini-)Review of surrogates in HPC – Volunteers? See later
- Formalizing Performance measures (MLinHPC)
- Three scenarios discussed above: Climate, Docking, Potentials
- Experimenting with Performance (MLoutHPC)
- Use DeepDriveMD to support different surrogates (Table 1) for a common physical model (system)
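
One possible way to make a performance measure concrete, offered as a sketch and not as the measure the project will adopt, is an effective speedup that charges the surrogate for the simulations used to generate training data and for the training itself. All names and numbers below are hypothetical.

```python
def effective_speedup(t_sim: float, t_infer: float, n_train: int,
                      n_infer: int, t_learn: float = 0.0) -> float:
    """Speedup of a surrogate campaign over running every case with the simulation.

    Assumed cost model (an illustration, not a definition from the meeting):
      baseline  = n_infer * t_sim
      surrogate = n_train * t_sim   (simulations run to produce training data)
                + t_learn           (time to train the model)
                + n_infer * t_infer (surrogate evaluations)
    """
    if n_infer <= 0:
        raise ValueError("n_infer must be positive")
    baseline = n_infer * t_sim
    surrogate = n_train * t_sim + t_learn + n_infer * t_infer
    return baseline / surrogate


if __name__ == "__main__":
    # Hypothetical numbers: 100 s per simulation, 1 ms per inference.
    raw = 100.0 / 0.001
    eff = effective_speedup(t_sim=100.0, t_infer=0.001,
                            n_train=1_000, n_infer=1_000_000, t_learn=3_600.0)
    print(f"raw per-call speedup: {raw:.0f}x")
    print(f"effective speedup including training cost: {eff:.0f}x")
```

With this cost model the effective speedup only approaches the raw per-call ratio when the number of surrogate evaluations is large compared with the number of training simulations.
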
In PY3:
- Tackle the (more) complex problem of MLoutHPC
AlphaFold2 (Google DeepMind) and RoseTTAFold (Baker at Washington) have both been released; see the news article "DeepMind’s AI for protein structure is coming to the masses".
CASP said that AlphaFold2 solved protein folding, but RoseTTAFold is cheaper and as good as AlphaFold2. This could be an opportunity.
Beckman noted we see a science transformation using FAIR Methodology.
Rick Stevens has posed the challenge “How much did Go AI cost?”
Dataset size is a serious issue.
- The deepmind/alphafold repository (open source code for AlphaFold) notes: "The total download size for the full databases is around 415 GB and the total size when unzipped is 2.2 TB. Please make sure you have a large enough hard drive space, bandwidth and time to download. We recommend using an SSD for better genetic search performance."
- Hurricane simulation will become inference
- DOE strategy: train where the data is and leave the data in place, similar to medical federated learning
- Vikram noted that materials science leads to smaller datasets, as only the final results are output and not the full trajectory
We discussed having a session at The Argonne Training Program on Extreme-Scale Computing (ATPESC) in 2022
Next month we will consider implications for the project. Vikram and Shantenu volunteered.