Meeting Notes 07-31-2023
Minutes of the SBI-FAIR Meeting, July 31, 2023
Next meetings:
- Monday, August 28, 2023, https://virginia.zoom.us/my/gc.fox
- Monday, September 25, 2023, https://virginia.zoom.us/my/gc.fox
Present: Geoffrey Fox, Gregor von Laszewski, Przemek Porebski, Kamil Iskra, Xiaodong Yu, Baixi Sun, Piotr Luszczek, Shantenu Jha
Apologies: Vikram Jadhao
Virginia
- Geoffrey presented the Virginia Update https://docs.google.com/presentation/d/132erkV49Lgd0ZFx-AtNWJPRwTrxc480m-rU6jmvMmYA/edit?usp=sharing, which also included Indiana (see below)
- Good progress with Argonne Surrogates
- We have added PtychoNN to SABATH, and we have run AutoPhaseNN on Rivanna
- We reviewed other surrogates from Virginia including OSMIBench and a new Calorimeter simulation
- We are working well with Tennessee on SABATH
- Gregor completed a short study on the use of Rivanna, the Virginia supercomputer
Indiana
- Vikram is still traveling in India and was not able to join today’s meeting. He shared the following updates by email; these are included in the Virginia presentation.
- The nanoconfinement surrogate repository has been updated with the latest results from the sample-size study published in JCTC, “Probing Accuracy-Speedup Tradeoff in Machine Learning Surrogates for Molecular Dynamics Simulations”
- https://github.com/softmaterialslab/nanoconfinement-md/tree/master/python/surrogate_samplesize
- Coordinate with Fanbo Sun, who is leading the development of this surrogate and conducted the sample-size study that I have shared in our meetings.
- Working on preparing the dataset for the follow-up study to JCTC
- Literature review: for those interested, the JCTC special issue on machine learning for molecular simulation (Vol. 19, No. 14) has many interesting papers, including several on surrogates
Argonne
- Argonne’s funding is essentially exhausted
- Xiaodong Yu is moving to Stevens
- New compression study comparing error-bounded and non-error-bounded methods; their performance differs by a factor of 4-6
- Baixi gave an update presentation, SSO: A Highly Scalable Second-order Optimization Framework for Deep Neural Networks via Communication Reduction
- Quantized Stochastic Gradient Descent (QSGD) is not error bounded
- Model accuracy versus compression tradeoff
- Error feedback cannot be used because GPU memory is already filled by large models and large batch sizes.
- Looked at different rounding methods
- Stochastic rounding preserves the gradient direction better because it produces fewer zeroed components (see the sketch at the end of this section)
- Revised the I/O paper (SOLAR) based on the reviews; it is being resubmitted to PPoPP’24 with new experiments and an improved writeup
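To make the rounding discussion above concrete, here is a minimal sketch of QSGD-style uniform gradient quantization, comparing deterministic round-to-nearest with unbiased stochastic rounding. This is not the SSO implementation; the 4 quantization levels and the synthetic heavy-tailed gradient are assumptions chosen only to make the effect visible.

```python
# Minimal illustrative sketch (not the project's SSO code): QSGD-style uniform
# gradient quantization, comparing deterministic round-to-nearest with unbiased
# stochastic rounding. The 4 levels and the synthetic gradient are assumptions
# chosen only for illustration.
import numpy as np


def quantize(grad, num_levels=4, stochastic=True, rng=None):
    """Quantize each entry onto num_levels uniform magnitude levels (plus sign)."""
    rng = rng or np.random.default_rng()
    scale = np.max(np.abs(grad))
    if scale == 0.0:
        return np.zeros_like(grad)
    pos = np.abs(grad) / scale * num_levels            # position on the level grid
    lower = np.floor(pos)
    if stochastic:
        # Round up with probability equal to the fractional part -> unbiased.
        rounded = lower + (rng.random(grad.shape) < (pos - lower))
    else:
        rounded = np.rint(pos)                         # deterministic round-to-nearest
    return np.sign(grad) * rounded * scale / num_levels


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(0)
# Synthetic heavy-tailed gradient: one large component and many small ones.
g = np.concatenate(([10.0], rng.normal(scale=0.1, size=9_999)))

q_nearest = quantize(g, stochastic=False)
q_single = quantize(g, stochastic=True, rng=rng)
# Average several independent stochastic quantizations, roughly what accumulates
# across workers and iterations when the quantizer is unbiased.
q_stoch_avg = np.mean([quantize(g, stochastic=True, rng=rng) for _ in range(64)], axis=0)

print("round-to-nearest: zero fraction =", float(np.mean(q_nearest == 0.0)),
      "cosine to g =", round(cosine(g, q_nearest), 3))
print("stochastic:       zero fraction =", float(np.mean(q_single == 0.0)),
      "cosine of 64-draw average to g =", round(cosine(g, q_stoch_avg), 3))
```

The comparison highlights the design point raised in the discussion: stochastic rounding is unbiased, so updates accumulated across workers or iterations track the true gradient direction, whereas round-to-nearest deterministically zeroes the many small components.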
Rutgers
- The surrogate survey paper is making good progress with DeepDriveMD and other motifs.
- Andre Merzky is working on the associated miniapps (surrogates)
- Will work with MLCommons in October
Tennessee
- Piotr presented his group’s work https://drive.google.com/file/d/1ep9zxdv25I3MJmPt5YcJi32SHu5BAF4J/view?usp=sharing
- MiniWeatherML runs with MPI, with or without CUDA.
- No external dataset is required
- SABATH making good progress in collaboration with Virginia
- They are working on CosmoFlow
- Piotr noted that the sites continuing with the project will need to submit a project report very soon. Geoffrey shared his project report to help establish a common story.