Meeting Notes 04-25-2022

Minutes of SBI-FAIR April 25, 2022, Meeting

Present: Kamil Iskra, Deborah Penchoff, Vikram Jadhao, Shantenu Jha, Geoffrey Fox, Xiaodong Yu, Piotr Luszczek, Cade Brown, Baixi Sun, Jack Dongarra

Updates

Virginia

  • Discussed continued work on diffusion surrogate with Glazier and Javier Toledo (Edmonton)
  • Discussed Fusion surrogate benchmark from Lawrence Livermore

Tennessee

  • Cade Brown presented an update
  • Discussed Sentinel 3 benchmark based on UK Cloudmask from MLCommons
  • Then discussed the FAIR benchmark platform SLIP, which has been extended to become SABATH
  • Described report structure
    • Model format - how universal is this
  • Has done UK cloudmask and looked at TEvol (2 MLCommons benchmarks)
  • Deal with Jupyter notebooks with nbconvert
  • Add callbacks to model.fit
  • How to do FAIR
  • Use JSON
  • Relation to SciML-Bench (GitHub: stfc-sciml/sciml-bench, SciML Benchmarking Suite for AI for Science) and to MLCube from MLCommons
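
The items above on adding callbacks to model.fit and using JSON can be illustrated with a minimal sketch. It is not the SABATH implementation or its report format: the class name EpochTimingLogger, the field names, and the benchmark label are hypothetical. The method names mirror the Keras callback hooks on_epoch_begin and on_epoch_end, so the same logic could be placed in a subclass of keras.callbacks.Callback and passed to model.fit through its callbacks argument; here it is driven by a stand-in loop so that it runs without any ML framework.

```python
import json
import time


class EpochTimingLogger:
    """Collect per-epoch wall-clock time and metrics, then export them as JSON.

    Hypothetical helper: the hook names follow the Keras callback interface,
    but the record layout is an assumption, not a SABATH or MLCommons format.
    """

    def __init__(self, benchmark_name):
        self.record = {"benchmark": benchmark_name, "epochs": []}
        self._start = None

    def on_epoch_begin(self, epoch, logs=None):
        # Remember when the epoch started.
        self._start = time.perf_counter()

    def on_epoch_end(self, epoch, logs=None):
        # Store elapsed time plus whatever metrics the training loop reports.
        entry = {"epoch": epoch, "seconds": time.perf_counter() - self._start}
        entry.update({name: float(value) for name, value in (logs or {}).items()})
        self.record["epochs"].append(entry)

    def to_json(self):
        return json.dumps(self.record, indent=2)


if __name__ == "__main__":
    # Stand-in for a training loop; the loss values are made up for the demo.
    logger = EpochTimingLogger("cloudmask-demo")
    for epoch, loss in enumerate([0.9, 0.6, 0.4]):
        logger.on_epoch_begin(epoch)
        time.sleep(0.01)  # placeholder for real training work
        logger.on_epoch_end(epoch, {"loss": loss})
    print(logger.to_json())
```

Keeping the record as a plain dictionary means the same data can be written to a file, merged with hardware metadata, or compared across runs without depending on the training framework.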

Rutgers

Indiana

Argonne

  • Baixi Sun gave a presentation
  • Described distributed training shuffling problem as a graph
  • Data loading time is a large part of the cost of training
  • Studied increasing standard deviation/mean by redistribution over nodes
  • Address imbalanced data loading by moving compute tasks to other nodes
  • Note large compute variance over GPUs even when the batch size is fixed, which seems surprising – why are some GPUs slow?
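
The imbalance measure mentioned above (standard deviation divided by mean) and the idea of moving tasks between nodes can be sketched as follows. This is an illustration only, not the method presented at the meeting: the task costs are invented, the function names are hypothetical, and the greedy "largest task to least-loaded node" rule is a swapped-in textbook heuristic that ignores where the data actually lives.

```python
from statistics import mean, pstdev


def coefficient_of_variation(loads):
    """Return std/mean of per-node loads; 0.0 means perfectly balanced."""
    m = mean(loads)
    return pstdev(loads) / m if m else 0.0


def rebalance(node_tasks):
    """Reassign tasks greedily: largest remaining task goes to the least-loaded node.

    node_tasks[i] is a list of hypothetical task costs (e.g. seconds) on node i.
    Returns a new list of lists with the same tasks spread more evenly.
    """
    all_tasks = sorted((cost for tasks in node_tasks for cost in tasks), reverse=True)
    balanced = [[] for _ in node_tasks]
    totals = [0.0] * len(node_tasks)
    for cost in all_tasks:
        target = totals.index(min(totals))  # least-loaded node so far
        balanced[target].append(cost)
        totals[target] += cost
    return balanced


if __name__ == "__main__":
    before = [[9.0, 8.0, 7.0], [1.0], [2.0, 1.0]]  # made-up per-node task costs
    after = rebalance(before)
    print("std/mean before:", coefficient_of_variation([sum(t) for t in before]))
    print("std/mean after: ", coefficient_of_variation([sum(t) for t in after]))
```

A real scheduler would also have to weigh the cost of moving data against the time saved, which this sketch leaves out.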
Last modified January 26, 2024: add notes (fa4a2ea)