Quarterly meeting and newsletter, October 2023

Please join us for the next Mochi quarterly meeting on Thursday, October 26, 2023, at 10am CT. Mochi quarterly meetings are a great opportunity to learn about community activities, share best practices, get help with problems, and find out what’s new in Mochi.

Please suggest agenda items on the Mochi Slack space or the [email protected] mailing list.

Microsoft Teams meeting
Join on your computer or mobile app
Click here to join the meeting
Or call in (audio only)
+1 630-556-7958,,254649841#

Mochi updates and agenda items

Upcoming Publications and Presentations
- “Mochi: A Case Study in Translational Computer Science for High-Performance Computing Data Management” (under preparation for an upcoming issue of IEEE Computing in Science and Engineering)
- If you are attending IEEE Cluster 2023 in Santa Fe, please consider stopping by the REX-IO workshop on Tuesday, October 31, for the keynote presentation “Anticipating and Adapting to Change in HPC Storage” by Phil Carns.
New Mochi Microservices
- Matthieu Dorier will present an overview of Warabi, a new blob storage microservice. Warabi has similarities to Bake, but has been designed from the ground up with a cleaner, more comprehensive API and seamless integration with the Bedrock ecosystem.
HPE Slingshot Status Update
- Communicating on a Slingshot network requires access to a Virtual Network Interface, or VNI, to authorize communication across processes. You may need to take additional steps to configure the VNI depending on your use case.
  - Communicating across processes that were launched together (e.g. in the same srun or mpiexec invocation):
    - Mercury and thus Mochi will use the same VNI allocated for use by MPI with no additional configuration needed.
    - You may need to use a “--single-node-vni” argument to mpiexec or a “--network=single_node_vni” argument to srun, depending on your platform, to make sure that a VNI is allocated even if the launcher believes that all processes will be executing on the same node.
  - Communicating across independently-launched processes within a job:
    - On the Aurora or Sunspot systems at ANL, no additional configuration is needed.
    - On HPE/SLURM based systems (e.g. Frontier and Perlmutter), additional configuration is needed because these systems use a unique VNI for each job step by default. You can instruct Mercury to use a job-level VNI instead by passing the special value “0:0” in the “auth_key” field of the Mercury JSON configuration in Mochi. This feature is already available in mercury@master and will also be included in the next point release. You must also enable the job-level VNI with the --network=job_vni option to the sbatch command or as a directive at the top of your job script.
  - Communicating across jobs:
    - We are still working with HPE on a general solution to enable communication across jobs.
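As an illustration of the job-level VNI setting described above for HPE/SLURM based systems, here is a minimal Python sketch that adds the special “0:0” value to a JSON configuration. It assumes that Mercury options sit under a top-level "mercury" section of the configuration; the helper name with_job_level_vni is hypothetical and is not part of any Mochi API. Check the documentation for your Mochi version before relying on this layout.

```python
import json

# Special value from the update above: asks Mercury to use the
# job-level VNI instead of the per-job-step VNI.
JOB_LEVEL_VNI_AUTH_KEY = "0:0"


def with_job_level_vni(config=None):
    """Return a copy of a JSON-style configuration with auth_key set.

    Assumption: Mercury options are stored under a top-level
    "mercury" section. The input dictionary is not modified.
    """
    merged = dict(config or {})
    mercury = dict(merged.get("mercury", {}))
    mercury["auth_key"] = JOB_LEVEL_VNI_AUTH_KEY
    merged["mercury"] = mercury
    return merged


if __name__ == "__main__":
    # Print a configuration fragment that could be merged into an
    # existing Mochi JSON configuration file.
    print(json.dumps(with_job_level_vni(), indent=2))
```

The job script would still need the --network=job_vni option (for example as a "#SBATCH --network=job_vni" directive) for the job-level VNI to exist; the configuration value alone is not enough.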
General Platform Updates
- A recipe for building and running Mochi on the ANL Sunspot system can now be found in the Mochi platform-configurations repository.