Meeting Notes 02-27-2023

Minutes of the SBI-FAIR Meeting, February 27, 2023

**Present:** Geoffrey Fox, Piotr Luszczek, Gregor von Laszewski, Kamil Iskra, Xiaodong Yu, Baixi Sun, Vikram Jadhao

We discussed modifying the short summary of each project's status and plans to add a discussion of the timeline. Virginia prepared theirs as an example on slide 2.

Indiana

Vikram discussed recent activity, including responses to referee comments on their latest paper.

Virginia

Geoffrey noted two new surrogates: a diffusion surrogate (https://arxiv.org/abs/2302.03786) with James Glazier and J. Quetzalcoatl Toledo-Marin, and a computational fluid dynamics surrogate (https://code.ornl.gov/whb/osmi-bench) from Oak Ridge.

Geoffrey described issues arising from the diffusion surrogate above. We are trying to understand how deep learning can work for problems whose input or output values span a large range. Examples include COVID and flu case counts, images with a wide range of illumination, and surrogate solutions where function values range over several orders of magnitude and both the large and the small values matter. This range can appear across spatial positions (images) or across time (time series).

However, this does not work well in standard deep learning, where the activation nonlinearity has a characteristic scale of order 1. The weights cannot adjust to very different sizes of input values, so the nonlinearity of the activation is not exercised over the full range: a naively trained network chooses weights such that the activation nonlinearity only really affects a portion of the value range. One can think of several approaches (a sketch follows the list):

a) replace each value by value**n for n < 1, including log(value)

b) scale each value by an average value (found from a coarser scale when the data are labeled by position, as in an image)

c) a mixture of experts with a different activation scale for each expert, such as 0.001, 0.01, 0.1, 1
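A minimal sketch of options (a) and (b), plus the per-expert activation scales of (c), using NumPy on a hypothetical image-like field; the array sizes, block size, and scale values are illustrative assumptions, not the project's actual preprocessing.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 64x64 field whose values span several orders of magnitude
# (an illustrative stand-in for, e.g., a diffusion concentration field).
y = rng.lognormal(mean=0.0, sigma=3.0, size=(64, 64))

# (a) Compress the dynamic range: power law with n < 1, or a log transform.
n = 0.25
y_power = y ** n          # invert with y_power ** (1.0 / n)
y_log = np.log1p(y)       # log(1 + y); invert with np.expm1

# (b) Divide by a coarse-scale local average (here 8x8 blocks of the image),
# so the network only has to represent order-one local variations.
block = 8
coarse = y.reshape(64 // block, block, 64 // block, block).mean(axis=(1, 3))
coarse_up = np.repeat(np.repeat(coarse, block, axis=0), block, axis=1)
y_scaled = y / (coarse_up + 1e-12)   # invert by multiplying back

# (c) Per-expert activation scales: each expert's tanh saturates at a
# different magnitude, so together the experts cover the full value range.
scales = [1e-3, 1e-2, 1e-1, 1.0]
def scaled_tanh(x, s):
    return s * np.tanh(x / s)
```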

Tennessee

Piotr reported that the SABATH project had a new student and was ramping up.

Argonne

Baixi discussed second-order optimization using Kronecker-Factored Approximate Curvature (K-FAC), which significantly outperforms standard Stochastic Gradient Descent. This is coupled with compression to reduce communication costs.
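As a rough illustration of the Kronecker-factored idea (not the group's distributed implementation, and omitting the compression step), the sketch below preconditions the gradient of a single fully connected layer with the two Kronecker factors A = E[a aᵀ] (input activations) and G = E[g gᵀ] (output gradients); the layer sizes, damping, and learning rate are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy fully connected layer y = W a, with a batch of input activations and
# back-propagated output gradients (random stand-ins here).
batch, d_in, d_out = 32, 16, 8
a = rng.standard_normal((batch, d_in))    # layer inputs
g = rng.standard_normal((batch, d_out))   # gradients w.r.t. layer outputs

# Ordinary gradient of the loss w.r.t. W (averaged over the batch).
grad_W = g.T @ a / batch                  # shape (d_out, d_in)

# Kronecker factors: A over inputs, G over output gradients.
A = a.T @ a / batch
G = g.T @ g / batch

# Damping keeps the factor inverses well conditioned (Tikhonov style).
damping = 1e-2
A_inv = np.linalg.inv(A + damping * np.eye(d_in))
G_inv = np.linalg.inv(G + damping * np.eye(d_out))

# K-FAC preconditioned gradient: (G + dI)^-1  dL/dW  (A + dI)^-1,
# an approximation to the natural-gradient step for this layer.
precond_grad = G_inv @ grad_W @ A_inv

lr = 0.1
W = rng.standard_normal((d_out, d_in))
W -= lr * precond_grad                    # SGD-style step with the K-FAC preconditioner
```

In practice the factors are accumulated as running averages over iterations and inverted only periodically, and in a distributed setting the communication of these factors and gradients is where the compression mentioned above would apply.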
