Software

Some software that we developed

1: cloudmesh
2: sabath

A list of software we use to make things easier

1 - cloudmesh

Cloudmesh is a flexible framework for developing cloud and HPC programs in Python. It is built from a number of plugins.

Overview

Cloudmesh allows the creation of an extensible command-line and command-shell tool. Internally, the tool is built on a number of Python APIs that can be loaded conveniently through plugins.
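The idea of commands that are contributed by plugins and dispatched by a single tool can be illustrated with a minimal sketch. This is not the cloudmesh-cmd5 API: the COMMANDS registry, the command decorator, and the two toy commands are hypothetical names chosen only for this example.

```python
from typing import Callable, Dict, List

# Hypothetical registry for this sketch; cloudmesh-cmd5 has its own
# plugin discovery mechanism, which is not reproduced here.
COMMANDS: Dict[str, Callable[[List[str]], str]] = {}


def command(name: str):
    """Register a function under a command name (illustrative decorator)."""
    def decorator(func: Callable[[List[str]], str]):
        COMMANDS[name] = func
        return func
    return decorator


@command("hello")
def hello(args: List[str]) -> str:
    # Toy plugin command: greet the first argument, or "world" if none.
    return f"hello {args[0] if args else 'world'}"


@command("add")
def add(args: List[str]) -> str:
    # Toy plugin command: sum the integer arguments.
    return str(sum(int(a) for a in args))


def run(line: str) -> str:
    """Dispatch one input line the way a shell or CLI entry point might."""
    parts = line.split()
    if not parts:
        return ""
    name, args = parts[0], parts[1:]
    if name not in COMMANDS:
        return f"unknown command: {name}"
    return COMMANDS[name](args)
```

With this structure the same run function can serve a one-shot command line (one call per invocation) and an interactive shell (one call per input line), which is the property the paragraph above describes.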

Plugins useful for this effort include

cloudmesh-vpn¹ – a convenient way to configure a VPN
cloudmesh-common² – useful common libraries, including a StopWatch for benchmarking
cloudmesh-cmd5³ – a plugin manager that allows plugins to be integrated as a command-line tool or a command shell
cloudmesh-ee⁴ – a plugin to create AI grid searches using LSF and SLURM jobs
cloudmesh-cc⁵ – a plugin that allows benchmarks to be run in coordination on heterogeneous compute resources and multiple clusters
cloudmesh-apptainer⁶ – manage Apptainer containers via a Python API
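To make the grid search idea concrete, the following sketch expands a small hyperparameter grid into one parameter set per batch job. It uses only the Python standard library and does not call cloudmesh-ee; the functions expand_grid and job_name and the parameter names and values are assumptions made for illustration.

```python
from itertools import product
from typing import Dict, Iterator, List


def expand_grid(grid: Dict[str, List]) -> Iterator[Dict]:
    """Yield one dict per combination of the listed hyperparameter values."""
    names = list(grid)
    for values in product(*(grid[n] for n in names)):
        yield dict(zip(names, values))


def job_name(params: Dict) -> str:
    """Build a readable job name from one parameter combination."""
    return "_".join(f"{k}-{v}" for k, v in params.items())


# Hypothetical search space; names and values are placeholders.
grid = {"epochs": [10, 20], "lr": [0.01, 0.001], "gpu": ["a100"]}

# One entry per job that would be submitted to a scheduler such as LSF or SLURM.
jobs = [job_name(p) for p in expand_grid(grid)]
```

Each resulting parameter set would typically be written into its own batch script, so that the scheduler can run the combinations independently.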

Cloudmesh has over 100 plugins coordinated at http://github.com/cloudmesh ⁷

References

¹ https://github.com/cloudmesh-vpn
² https://github.com/cloudmesh-common
³ https://github.com/cloudmesh-cmd5
⁴ https://github.com/cloudmesh-ee
⁵ https://github.com/cloudmesh-cc
⁶ https://github.com/cloudmesh-apptainer
⁷ Gregor von Laszewski, J. P. Fleischer, and Geoffrey C. Fox. 2022. Hybrid Reusable Computational Analytics Workflow Management with Cloudmesh. https://doi.org/10.48550/ARXIV.2210.16941

2 - sabath

SABATH provides benchmarking infrastructure for evaluating scientific ML/AI models. It contains support for scientific machine learning surrogates from external repositories such as SciML-Bench.

Introduction

The software dependencies are explicitly exposed in the surrogate model definition, which allows the use of advanced optimization, communication, and hardware features. For example, distributed multi-GPU training may be enabled with Horovod. Surrogate models may be implemented using the TensorFlow, PyTorch, or MXNet frameworks.
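As a rough illustration of why explicit dependencies are useful, the sketch below decides how to train a model based on what its definition declares. The SurrogateModel class, its fields, and the launch_plan function are hypothetical and do not reflect SABATH's actual model schema or interface.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SurrogateModel:
    # Hypothetical fields for this sketch; not SABATH's real schema.
    name: str
    framework: str  # for example "tensorflow", "pytorch", or "mxnet"
    dependencies: List[str] = field(default_factory=list)


def supports_distributed(model: SurrogateModel) -> bool:
    """A model can train on several GPUs only if it declares Horovod."""
    return "horovod" in model.dependencies


def launch_plan(model: SurrogateModel, gpus: int) -> str:
    """Describe how the model would be trained on the given number of GPUs."""
    if gpus > 1 and supports_distributed(model):
        return f"distributed {model.framework} training on {gpus} GPUs with horovod"
    return f"single-process {model.framework} training"


# Placeholder model definitions used only to exercise the sketch.
with_horovod = SurrogateModel("demo-a", "pytorch", ["horovod"])
without_horovod = SurrogateModel("demo-b", "tensorflow")
```

Because the dependency list is part of the definition, a tool can make this decision before any training code runs.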

Models

The models collected so far are available at

https://github.com/icl-utk-edu/sabath/tree/main/var/sabath/assets/sabath/models

References

https://github.com/icl-utk-edu/sabath