Meeting Notes 07-31-2023

Meeting Notes from 07-31-2023

Minutes of SBI-FAIR July 31 2023, Meeting

Present: Geoffrey Fox, Gregor von Laszewski, Przemek Porebski, Kamil Iskra, Xiaodong Yu, Baixi Sun. Piotr Luszczek, Shantenu Jha

Apologies: Vikram Jadhao,

Virginia

  • Geoffrey presented the Virginia Update https://docs.google.com/presentation/d/132erkV49Lgd0ZFx-AtNWJPRwTrxc480m-rU6jmvMmYA/edit?usp=sharing, which also included Indiana (see below)
  • Good progress with Argonne Surrogates
    • We have added PtychoNN to SABATH, and we have run AutoPhaseNN on Rivanna
  • We reviewed other surrogates from Virginia including OSMIBench and a new Calorimeter simulation
  • We are working well with Tennessee on SABATH
  • Gregor finished with a little study on use of Rivanna – the Virginia Supercomputer

Indiana

Argonne

  • Argonne’s funds have essentially finished
  • Xiaodong Yu is moving to Stevens
  • New compression study comparing methods that are error bounded or nott – their performance differs by a factor of 4-6
  • Baixi gave an update presentation SSO: A Highly Scalable Second-order Optimization Framework ffor Deep Neural Networks via Communication Reduction
  • Quantized Stochastic Gradient Descent QSGD Non error bounded
  • Model accuracy versus compression tradeoff
  • Unable to utilize error-feedback due to GPU memory being filled by large models and large batch size.
  • Looked at different rounding methods
    • Stochastic rounding preserves direction better as not so many zeros
  • Revised our I/O paper i.e., SOLAR based on the reviews, submitting to ppopp’24 with new experiments and better writeup

Rutgers

  • The surrogate survey paper is making good progress with DeepDriveMD other motifs.
  • Andre Merzy is working on associated Miniapps (surrogates)
  • Will work with MLCommons in October

Tennessee

  • Piotr presented his groups work https://drive.google.com/file/d/1ep9zxdv25I3MJmPt5YcJi32SHu5BAF4J/view?usp=sharing
  • MiniWeatherML running with MPI and with or without CUDA.
    • No external dataset is required
  • SABATH making good progress in collaboration with Virginia
  • They are working on Cosmoflow
  • Piotr noted that those sites that are continuing with the project will need to submit a project report very soon. Geoffrey shared his project report to allow a common story
Last modified January 26, 2024: add notes (fa4a2ea)