Quarterly newsletter, July 2021

Publication news:

  • Pierre Matri and Robert Ross. “Neon: Low-Latency Streaming Pipelines for HPC”, to appear in IEEE Cloud 2021, Sept 5-10 2021.
    • Introduces a new Mochi service for stream processing
  • Stay tuned for more Mochi-related publications at SC21 in November. More details will be posted once the SC21 technical program is announced.

Recent development updates:

  • A proof-of-concept implementation of UCX support in Mercury is available as the master-ucx version of Mercury in the Mochi Spack repository.
    • Please contact us if you are interested in this capability; it is under active development and should be considered experimental at this time.
  • The git origin/main branch of Margo includes new safety checks to ensure compatible Argobots runtime parameters if Argobots is initialized outside of Margo. This will be available in an upcoming release after coordinating updates to other Mochi packages.
  • Both Mochi and Margo have new Contributor License Agreement (CLA) documents available online as of July 2021, with more relaxed language than the previous versions. We plan to streamline these further with electronic forms integrated into the GitHub contribution process.

Debugging tips:

  • We have received several bug reports involving libfabric 1.13.0 in the last few days, especially with the RXM provider. Debugging is in progress; in the meantime, consider reverting to an earlier release if you encounter communication problems.
  • Recent libfabric releases also include a new PSM3 provider. PSM3 is not directly supported by Mercury/Mochi, and enabling it in libfabric may degrade the performance of the traditional PSM2 provider. To avoid this problem, the libfabric package in the Mochi Spack repository disables PSM3 by default for now.

Quarterly newsletter, April 2021

New presentation materials:

GitHub migration complete:

New software releases:

  • Argobots 1.1
    • Underlying user-level threading package for Mochi
    • includes performance improvements, broader platform support, and new profiling and debugging capabilities (more on that later)
  • Mercury 2.0.1rc3
    • Underlying RPC communication package for Mochi
    • improved logging and several performance optimizations
    • final 2.0.1 release coming soon
  • Mochi-sdskv 0.1.12
    • Key/Value store microservice
    • Bedrock support
    • various packaging (cmake, pkgconfig, and dependency) improvements
  • Bedrock 0.2.1
    • Flexible service composition tool
    • various packaging (cmake, pkgconfig) improvements
  • Sonata 0.6.2
    • Document store microservice
    • various packaging (cmake) improvements

Performance regressions reported in the previous quarterly newsletter are now resolved:

  • Power9 CPU mutex locking performance regression is resolved in Argobots 1.1
  • OmniPath network performance regression is resolved in Mercury 2.0.1rc3

New debugging/profiling/maintenance features:

  • Margo is now using munit for unit testing
    • Available in origin/main (or mochi-margo@main in Spack)
    • Coverage is limited for now but will be expanded over time
    • We plan to use this framework in other Mochi components as well
  • Recent Argobots updates include multiple (optional) stack guard methods
    • See the Argobots documentation or the Spack package variants. Notable options:
      • “mprotect”: real-time detection of stack overruns (incurs some performance overhead; use it only for debugging)
      • “canary”: lightweight, deferred stack overrun detection (lower overhead, but an overflow is not reported until shutdown)
  • margo_state_dump() function
    • Available in origin/main (or mochi-margo@main in Spack)
    • Can be called at any time to dump point-in-time state to a text file or stdout for debugging purposes
    • Output includes the Margo JSON configuration, Argobots configuration, current Argobots execution stream (ES) layout, Argobots performance profile, in-flight RPC counts, stack dumps for blocked user-level threads, etc. See https://github.com/mochi-hpc/mochi-margo/blob/main/doc/debugging.md for details.

The Mochi GitHub migration is complete

All Mochi source code repositories have been migrated to github.com at https://github.com/mochi-hpc/ as of March 22, 2021.

If you are already using Spack to install Mochi components, please update your Mochi repository at your earliest convenience:

spack repo rm mochi
git clone https://github.com/mochi-hpc/mochi-spack-packages.git
spack repo add mochi-spack-packages

The package names have not changed; this will just enable you to retrieve new versions as they are released by updating your cloned copy of the mochi-spack-packages repo.

Mochi BoF at the ECP Annual Meeting

The Mochi team will be hosting a BoF entitled “Using Mochi to build data services: Overview and Updates” at the (virtual) ECP Annual Meeting on Tue. Apr 13, 2021 at 2:30 PM.

If you are an ECP project member attending the meeting, you can find more information about the BoF in the meeting agenda.

We will be providing an overview of Mochi, highlighting new capabilities, and offering sign-ups for one-on-one sessions for anyone who would like more detailed information or help.

Quarterly newsletter, January 2021

Platform notes:

Logistics:

  • All of the ANL-hosted git repositories (https://xgitlab.cels.anl.gov/sds/) will be moving within the next few months. We will communicate when that happens.
    • There are no changes to policy or access (in fact, access will likely improve); the xgitlab.cels instance that we are using is simply being decommissioned
  • We are working on landing a Spack PR (https://github.com/spack/spack/pull/20273) that will introduce a “mochi-margo” package, maintained by us, to replace the out-of-date “margo” package
    • Once this is done, we will likely start upstreaming more packages that depend on mochi-margo

Mochi service development news:

  • Work continues on a new component called “Bedrock” that can be used to more easily bootstrap microservice compositions (https://xgitlab.cels.anl.gov/sds/bedrock).
    • Bedrock is already available, and we are in the process of updating existing services to use it.
    • You can think of Bedrock as a general-purpose Mochi daemon that takes a JSON configuration file describing how to spin up embedded microservices.
  • We are actively working on performance tuning of “Benvolio”, which you can think of as a runtime I/O delegation service (i.e., one that provides a more generic version of MPI-IO aggregation capabilities). https://xgitlab.cels.anl.gov/sds/benvolio

Upcoming training events:

  • We plan to host a BoF session at this year’s (virtual) ECP annual meeting in mid-April, with a mechanism for people to sign up for one-on-one sessions for more detailed interaction. (https://ecpannualmeeting.com/)
  • Please let us know what other kinds of outreach/training you are interested in this year.

Mochi at SC 2020

Highlighted events for the Mochi project at this year’s SC conference included the following:

  • Pascal Grosset, Jesus Pulido, and James Ahrens. 2020. “Personalized In Situ Steering for Analysis and Visualization,” in ISAV’20 In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization (ISAV’20). Association for Computing Machinery, New York, NY, USA, 1–6. DOI: https://doi.org/10.1145/3426462.3426463
  • Christopher Kelly, Sungsoo Ha, Kevin Huck, Hubertus Van Dam, Line Pouchard, Gyorgy Matyasfalvi, Li Tang, Nicholas D’Imperio, Wei Xu, Shinjae Yoo, and Kerstin Kleese Van Dam. 2020. “Chimbuko: A Workflow-Level Scalable Performance Trace Analysis Tool,” in ISAV’20 In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization (ISAV’20). Association for Computing Machinery, New York, NY, USA, 15–19. DOI: https://doi.org/10.1145/3426462.3426465
  • P. Carns, K. Harms, B. Settlemyer, B. Atkinson and R. Ross, “Keeping It Real: Why HPC Data Services Don’t Achieve I/O Microbenchmark Performance,” in 2020 IEEE/ACM Fifth International Parallel Data Systems Workshop (PDSW), GA, USA, 2020, pp. 1–6. DOI: https://doi.org/10.1109/PDSW51947.2020.00006