Case study

Improving fidelity scaling to extract insightful scientific knowledge

Argonne researchers use HPCWorks to smooth out quantum molecular dynamics simulations

Argonne National Laboratory

Argonne National Laboratory is home to groundbreaking discoveries and transformative technologies across many science domains. Its research spans from physics and chemistry to nuclear energy and transportation.

https://www.anl.gov/

Headquarters: Lemont, Illinois, United States

Products: HPCWorks PBS Professional


About the customer

Argonne National Laboratory (Argonne) is a United States (U.S.) Department of Energy (DOE) multidisciplinary science and engineering research center, where talented researchers work together to confront the biggest scientific questions facing humanity. Argonne’s computing resources, including the powerful Polaris system and Aurora exascale computer located at the Argonne Leadership Computing Facility (ALCF), leverage several innovations to support cutting-edge machine learning (ML) and data science workloads alongside traditional modeling and simulation capabilities. This drives progress in a variety of projects, from cosmology to drug discovery and beyond.

Their challenge

Materials science and engineering are getting a boost from quantum methods, as neural-network quantum molecular dynamics (NNQMD) simulations, based on machine learning, are revolutionizing atomistic materials simulations. A state-of-the-art (SOTA) model called “Allegro” (a musical term meaning fast and lively) demonstrates increased accuracy and speed – but it struggles with fidelity scaling on massively parallel supercomputers, a big problem in the era of exascale computing. A research group of experts from the University of Southern California, Sony Group Corporation and Argonne was determined to find a way to scale successfully.


Our solution

To address the fidelity scaling challenge, the researchers used ALCF’s computing resources, including the 34-petaflop Polaris supercomputer, to combine the Allegro model with sharpness-aware minimization (SAM), which increases the model’s smoothness and robustness. Polaris is equipped with graphics processing units (GPUs), and its workloads are orchestrated using HPCWorks™ PBS Professional™ software, which is part of the Siemens Xcelerator business platform of software, hardware and services. HPCWorks PBS Professional is a fast, powerful workload manager designed to improve productivity, optimize utilization and efficiency and simplify resource administration for even the biggest high-performance computing (HPC) workloads, including demanding materials modeling.
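For readers unfamiliar with sharpness-aware minimization, the general idea can be sketched in a few lines of Python. This is only an illustration of the generic SAM update on a toy quadratic loss, not the researchers’ training code: the functions `loss`, `grad` and `sam_step`, and the values of `lr` and `rho`, are hypothetical and chosen purely for demonstration.

```python
import math


def loss(w):
    # Toy loss, assumed for illustration only: a simple quadratic bowl.
    return sum(x * x for x in w)


def grad(w):
    # Analytic gradient of the toy loss above.
    return [2.0 * x for x in w]


def sam_step(w, lr=0.1, rho=0.05):
    """One SAM update: perturb toward higher loss, then descend
    using the gradient evaluated at the perturbed point."""
    g = grad(w)
    norm = math.sqrt(sum(x * x for x in g))
    if norm == 0.0:
        return list(w)  # stationary point: nothing to update
    # Ascent step: move a distance rho along the normalized gradient,
    # approximating the worst-case point in a small neighborhood.
    w_adv = [x + rho * gx / norm for x, gx in zip(w, g)]
    # Descent step: apply the gradient from the perturbed point
    # to the original (unperturbed) weights.
    g_adv = grad(w_adv)
    return [x - lr * gx for x, gx in zip(w, g_adv)]


w = [1.0, -2.0]
for _ in range(50):
    w = sam_step(w)
```

Because each update uses the gradient at a nearby higher-loss point rather than at the current weights, training is steered toward flatter regions of the loss landscape, which is the smoothness property the Allegro-Legato work relies on.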

The resulting “Allegro-Legato” model – legato means smooth – increases time to failure without compromising speed or accuracy. Allegro-Legato is much less dependent on problem size than the SOTA Allegro model, allowing larger and longer NNQMD simulations. In addition, Allegro-Legato retains Allegro’s SOTA accuracy and computational speed while enabling large spatiotemporal NNQMD simulations on leadership-scale computers like Aurora. An example is the study of the vibrational properties of ammonia, which has a higher energy density than liquid hydrogen but can be stored at a far less energy-intensive temperature (-33 °C versus -253 °C). To develop new technologies based on ammonia, it is essential to understand and model its complex physical and chemical interactions.


Left: While the Allegro model becomes unstable and eventually fails, Allegro-Legato maintains a nearly constant number of outliers, remaining stable.
Right: Time to failure lengthens by a factor of almost two with the smoother Allegro-Legato model.

Results

The delayed time to failure of the Allegro-Legato model has dramatically improved fidelity scaling, making it easier to extract insightful scientific knowledge from large-scale simulations on leadership-scale computers. Accurate models are vital for predicting a material’s thermodynamic behavior and how it can be used in energy, biological and pharmaceutical systems. The researchers’ scalable parallel implementation of Allegro-Legato, with excellent computational scaling and GPU acceleration on Polaris, combines accuracy, speed, robustness and scalability, allowing practical large spatiotemporal-scale NNQMD simulations for challenging applications on exascale computing platforms like Aurora.
