Meeting Notes 06-26-2023

Minutes of SBI-FAIR June 26, 2023, Meeting

**Present:** Geoffrey Fox, Gregor von Laszewski, Przemek Porebski, Kamil Iskra, Xiaodong Yu, Baixi Sun, Vikram Jadhao, Piotr Luszczek, Shantenu Jha, Margaret Lentz

Virginia

  • This was presented by Geoffrey
  • He described work on new surrogates, including LHC Calorimeter, Epidemiology, Extended virtual tissue, and Earthquake
  • He described work on the repository and SABATH
  • This involved two existing AI models, CloudMask and OSMIBench
  • Shantenu Jha asked about the number of inferences per second.
    • From the MLCommons Science Working Group minutes, we find for OSMIBench:
    • On Summit, with 6 GPUs per node, one runs 6 TensorFlow server instances per node, using batch sizes around 250K, with a goal of a billion inferences per second
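
The throughput goal quoted above implies a per-server batch rate that is easy to work out. The following is a minimal sketch of that arithmetic. The node count of 100 is a hypothetical value chosen for illustration (the notes do not give one), and the sketch assumes the billion-per-second goal is an aggregate across all nodes.

```python
def required_batches_per_second(target_inferences_per_s, nodes,
                                servers_per_node, batch_size):
    """Batches each server instance must finish per second to hit the target."""
    total_servers = nodes * servers_per_node
    return target_inferences_per_s / (total_servers * batch_size)


# Values from the notes: 6 server instances per node, batch size 250K,
# goal of 1e9 inferences per second.
# Assumption (not in the notes): 100 nodes are used.
rate = required_batches_per_second(
    target_inferences_per_s=1e9,
    nodes=100,
    servers_per_node=6,
    batch_size=250_000,
)
print(f"{rate:.2f} batches/s per server, {1000 / rate:.0f} ms per batch")
```

Under these assumptions each server instance would need to complete roughly 6.7 batches per second, or about 150 ms per batch. Changing `nodes` shows how the latency budget scales with the size of the allocation.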

Argonne

  • Work continues on the Second-order Optimization Framework for Deep Neural Networks with Communication Reduction
  • Baixi Sun presented the details
  • He introduced quantization to lower precision (QSGD), which gives encouraging results, although in one case the quantization method failed in the eigenvalue stage
  • We removed Rick Stevens from the Google Group
  • Geoffrey mentioned his ongoing work on improving shuffling using Arrow vector format; he will share the paper when ready
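
To make the QSGD quantization mentioned in the Argonne notes concrete, here is a minimal sketch of the generic QSGD stochastic rounding scheme in plain Python. It is an illustration under stated assumptions, not the Argonne implementation: the function names, the choice of 4 levels, and the example gradient are all hypothetical, and the notes do not say how QSGD is applied inside the second-order framework.

```python
import math
import random


def qsgd_quantize(values, levels, rng):
    """Stochastically quantize `values` onto `levels` uniform steps of the norm.

    Returns (norm, signs, integer_levels). This triple is what would be
    communicated instead of the full-precision values.
    """
    norm = math.sqrt(sum(v * v for v in values))
    if norm == 0.0:
        return 0.0, [1] * len(values), [0] * len(values)
    signs, ints = [], []
    for v in values:
        scaled = abs(v) / norm * levels  # lies in [0, levels]
        lower = math.floor(scaled)
        # Round up with probability equal to the fractional part, so the
        # dequantized value equals v in expectation (unbiased estimator).
        level = lower + (1 if rng.random() < scaled - lower else 0)
        signs.append(-1 if v < 0 else 1)
        ints.append(level)
    return norm, signs, ints


def qsgd_dequantize(norm, signs, ints, levels):
    """Rebuild an approximate vector from the quantized representation."""
    return [norm * s * l / levels for s, l in zip(signs, ints)]


# Hypothetical gradient and level count, for illustration only.
rng = random.Random(0)
grad = [0.3, -0.4, 0.0, 1.2]
norm, signs, ints = qsgd_quantize(grad, levels=4, rng=rng)
approx = qsgd_dequantize(norm, signs, ints, levels=4)
print(ints, approx)
```

Each entry is reduced to a sign and a small integer, plus one shared norm per vector, which is where the communication saving comes from. The rounding is unbiased but adds variance, which is one plausible reason a numerically sensitive step such as an eigenvalue computation could be affected.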

Indiana

Rutgers

  • Shantenu presented
  • Nice paper on surrogate classes with Wes Brewer, who works with Geoffrey on OSMIBench
  • Mini-apps for each of the 6 motifs that need FAIR metadata
    • 5 motifs use surrogates; one generates them
  • He described an interesting workshop on molecular simulations
  • He noted the Aurora effort to train a trillion-parameter foundation model for science
  • LLMs need 10^8 exaflops: need to optimize!
  • Vikram noted the paper "Simulation Intelligence: Towards a New Generation of Scientific Methods"

Tennessee

  • Piotr presented slides
  • CosmoFlow on 8 GPUs is running well
  • He introduced the MiniWeatherML mini-app
    • CUDA-aware pointers must be explicitly specified in the FAIR schema
    • Test in PETSc leaves threaded MPI in an invalid state
    • Alternative MPIX query interface varies between MPI implementations
    • GPU Direct copy support is optional
  • SABATH system is moving ahead with a focus on adding MPI support
  • Piotr is now the PI of this project at UTK. We removed Cade Brown, Jack Dongarra, and Deborah Penchof from the Google Group
Last modified January 26, 2024: add notes (fa4a2ea)