Meeting Notes 07-31-2023
Minutes of the SBI-FAIR Meeting, July 31, 2023
Next meetings:
- Monday, August 28, 2023, https://virginia.zoom.us/my/gc.fox
- Monday, September 25, 2023, https://virginia.zoom.us/my/gc.fox
Present: Geoffrey Fox, Gregor von Laszewski, Przemek Porebski, Kamil Iskra, Xiaodong Yu, Baixi Sun, Piotr Luszczek, Shantenu Jha
Apologies: Vikram Jadhao
Virginia
- Geoffrey presented the Virginia Update https://docs.google.com/presentation/d/132erkV49Lgd0ZFx-AtNWJPRwTrxc480m-rU6jmvMmYA/edit?usp=sharing, which also included Indiana (see below)
- Good progress with Argonne Surrogates
- We have added PtychoNN to SABATH, and we have run AutoPhaseNN on Rivanna
- We reviewed other surrogates from Virginia including OSMIBench and a new Calorimeter simulation
- We are working well with Tennessee on SABATH
- Gregor completed a short study on the use of Rivanna, the Virginia supercomputer
Indiana
- Vikram is still traveling in India and was not able to join today’s meeting. He shared the following updates by email; these are included in the Virginia presentation.
- The nanoconfinement surrogate repository has been updated with the latest results from the sample-size study published in JCTC, “Probing Accuracy-Speedup Tradeoff in Machine Learning Surrogates for Molecular Dynamics Simulations”
- https://github.com/softmaterialslab/nanoconfinement-md/tree/master/python/surrogate_samplesize
- Coordinate with Fanbo Sun, who is leading the development of this surrogate and conducted the sample-size study that I have shared in our meetings.
- Working on preparing the dataset for the follow-up study to JCTC
- Literature review: for those interested, the JCTC special issue on machine learning for molecular simulation (Vol. 19, No. 14) has many interesting papers, including several on surrogates
Argonne
- Argonne’s funding is essentially exhausted
- Xiaodong Yu is moving to Stevens
- New compression study comparing error-bounded and non-error-bounded methods; their performance differs by a factor of 4-6
- Baixi gave an update presentation, SSO: A Highly Scalable Second-order Optimization Framework for Deep Neural Networks via Communication Reduction
- Quantized Stochastic Gradient Descent (QSGD) is not error bounded
- Model accuracy versus compression tradeoff
- Error feedback cannot be used because GPU memory is already filled by large models and large batch sizes.
- Looked at different rounding methods
- Stochastic rounding preserves the gradient direction better because it produces fewer zeroed components (see the sketch at the end of this section)
- Revised the I/O paper (SOLAR) based on the reviews; it is being resubmitted to PPoPP’24 with new experiments and an improved writeup
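To make the rounding discussion above concrete, here is a minimal sketch of QSGD-style uniform gradient quantization, comparing deterministic round-to-nearest with unbiased stochastic rounding. This is not the SSO implementation; the 4 quantization levels and the synthetic heavy-tailed gradient are assumptions chosen only to make the effect visible.

```python
# Minimal illustrative sketch (not the project's SSO code): QSGD-style uniform
# gradient quantization, comparing deterministic round-to-nearest with unbiased
# stochastic rounding. The 4 levels and the synthetic gradient are assumptions
# chosen only for illustration.
import numpy as np


def quantize(grad, num_levels=4, stochastic=True, rng=None):
    """Quantize each entry onto num_levels uniform magnitude levels (plus sign)."""
    rng = rng or np.random.default_rng()
    scale = np.max(np.abs(grad))
    if scale == 0.0:
        return np.zeros_like(grad)
    pos = np.abs(grad) / scale * num_levels            # position on the level grid
    lower = np.floor(pos)
    if stochastic:
        # Round up with probability equal to the fractional part -> unbiased.
        rounded = lower + (rng.random(grad.shape) < (pos - lower))
    else:
        rounded = np.rint(pos)                         # deterministic round-to-nearest
    return np.sign(grad) * rounded * scale / num_levels


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(0)
# Synthetic heavy-tailed gradient: one large component and many small ones.
g = np.concatenate(([10.0], rng.normal(scale=0.1, size=9_999)))

q_nearest = quantize(g, stochastic=False)
q_single = quantize(g, stochastic=True, rng=rng)
# Average several independent stochastic quantizations, roughly what accumulates
# across workers and iterations when the quantizer is unbiased.
q_stoch_avg = np.mean([quantize(g, stochastic=True, rng=rng) for _ in range(64)], axis=0)

print("round-to-nearest: zero fraction =", float(np.mean(q_nearest == 0.0)),
      "cosine to g =", round(cosine(g, q_nearest), 3))
print("stochastic:       zero fraction =", float(np.mean(q_single == 0.0)),
      "cosine of 64-draw average to g =", round(cosine(g, q_stoch_avg), 3))
```

The comparison highlights the design point raised in the discussion: stochastic rounding is unbiased, so updates accumulated across workers or iterations track the true gradient direction, whereas round-to-nearest deterministically zeroes the many small components.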
Rutgers
- The surrogate survey paper is making good progress with DeepDriveMD and other motifs.
- Andre Merzky is working on the associated miniapps (surrogates)
- Will work with MLCommons in October
Tennessee
- Piotr presented his group’s work https://drive.google.com/file/d/1ep9zxdv25I3MJmPt5YcJi32SHu5BAF4J/view?usp=sharing
- MiniWeatherML runs with MPI, with or without CUDA.
- No external dataset is required
- SABATH making good progress in collaboration with Virginia
- They are working on CosmoFlow
- Piotr noted that the sites continuing with the project will need to submit a project report very soon. Geoffrey shared his project report to help establish a common story.