Researchers use the cloud to shed light on a long-standing mystery

June 1, 2011

It's been nearly 25 years since the European Muon Collaboration made a startling discovery: only a portion of a proton's spin comes from the quarks that make up the proton.

The revelation was a bit of a shock for physicists who had believed that the spin of a proton could be calculated simply by adding the spin states of the three constituent quarks. This is often described as the "proton spin crisis."

"At that time people realized protons are not just a sum of three quarks stuck together like Lego-blocks," said Jan Balewski, an MIT-based member of the Solenoidal Tracker At RHIC (STAR) experiment. "Protons are dynamic systems of interacting constituent quarks, gluons, and sea quarks."

Gluons are massless spin-1 particles that "glue" the parts of a proton together; in this case, those parts are two up quarks and one down quark. Sea quarks are quark-antiquark pairs that pop into existence and then annihilate each other almost immediately; their presence can contribute to the proton spin, making them a factor worth taking into consideration.

It has been postulated that the proton's spin includes not only the spin of the three quarks from which it is built, but also contributions from sea quarks and gluons. In fact, for a long time, physicists suspected that the remaining spin came from gluons. But as with the quark spin, experiments have shown that gluon spin can account for only a small fraction of the missing proton spin. The remaining proton spin should come from the orbital motion of the quarks, gluons, and sea quarks. At the moment, the only direct measurements scientists know how to make are of the contribution from the sea quarks.

"Since previous experiments could not distinguish between quark and antiquark contributions, part of the RHIC/STAR spin program was set to unravel this puzzle," said Balewski.

Unpacking quark spin contributions

The question was: how is it that the quark spins' contribution to the proton spin is only a small fraction of what was expected? To answer that, we need to learn more about where the quark spin contribution is coming from.

The three concurrent Relativistic Heavy Ion Collider experiments are ideally suited to answer that question, Balewski explained. RHIC, situated at Brookhaven National Laboratory, is the only collider in the world that can create polarized proton beams, in which the spin states of the majority of the protons are aligned with the direction of the beam. This allows the physicists to study the correlation between the spin orientation of the proton and that of its constituents.

The STAR collaboration consists of approximately 550 researchers at 55 institutions interested in exploring properties of the proton and also characterizing the quark-gluon plasma produced in collisions of heavier ions.

"I'm exploring with a group of spin-researchers at STAR the properties of W-boson events produced in about 1% of data recorded by the STAR detector from proton-proton collisions during this year's data taking period," Balewski said.

There are two kinds of W bosons. A W- boson is created when an up antiquark and a down quark from two colliding protons interact; conversely, when a down antiquark and an up quark interact, a W+ boson is produced. Since the only antiquarks in a proton are sea quarks, and sea quarks always occur in quark-antiquark pairs, analyzing the W boson events can tell researchers how much of a proton's spin comes from up and down sea quarks.

Although there are four other types of sea quarks (strange, charm, top, and bottom) which this measurement doesn't account for, they all occur less frequently than the up and down quarks, with the strange quark being the next most common. As a result, some uncertainty about the composition of the quark spin contribution will remain. Nonetheless, what we do learn from these experiments remains valuable. Spin is central to a variety of scientific concepts and technologies, including the Magnetic Resonance Imaging machines that are used in hospitals around the world.

"The visible matter of the universe consists predominantly of proton-like particles," Balewski said. "If the results of our experiment cause a revision of our understanding of the proton makeup this will impact how we describe visible matter in the universe."

From data to results

With the possibility of such a payoff — not just for the W experiment, but for other STAR experiments as well — it's only natural that STAR researchers are eager to analyze their data and find out what it shows. But after five months of data taking, they typically must wait another ten months to complete detector calibration, reconstruction, and analysis.

That's just one of the reasons why the STAR software team has been eager to explore how cloud computing might enable STAR experiments to elastically vary the computing resources they are using.

"What was more important for STAR was that almost-real time event processing would be achieved and the analysis of the W events provided one opportunity for feedback to the experiment," Balewski said. "We can see certain expected characteristics of measured W events and tell the crew taking STAR data that all detector components work well, or direct their attention to those which need to be fixed."

Unfortunately, real-time processing of all of the STAR data would require continuous access to about 10,000 cores. Given that the entire STAR collaboration shares a cluster of only 2,000 dual-core machines (4,000 cores in total), this simply wouldn't be possible.

To explore the opportunities the cloud presents, an MIT-based computing team led by Balewski adapted the W boson workflow to take advantage of the Magellan cloud.

Magellan consists of two government-funded cloud computing testbeds. The first, at the National Energy Research Scientific Computing Center (NERSC) in Berkeley, California, is based on Eucalyptus, a widely used open source cloud platform. The second, based at Argonne National Laboratory near Chicago, Illinois, hosts two clouds: one runs the OpenStack software while the other uses the Nimbus toolkit.

The result of the team's efforts was a real-time cloud-based data processing system that functions as a self-adjusting assembly line and handles variable throughput. No human intervention is needed, and there is no supervisor process that orchestrates the entire data flow. Every stage of the process is governed by local rules designed to handle time-outs and refusals from other elements by waiting a few minutes and then starting over.
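As a rough illustration of such a local rule, the Python sketch below retries a stage until it succeeds, with no supervisor involved. It is not the STAR team's code: the function name `run_stage`, the three-minute default wait, and the choice of exceptions are all assumptions made for this example.

```python
import time


def run_stage(action, wait_seconds=180, max_attempts=None, sleep=time.sleep):
    """Run `action` until it succeeds, waiting and starting over on failure.

    `action` is any callable that raises TimeoutError or
    ConnectionRefusedError when a neighbouring element times out or
    refuses the request (an assumption made for this sketch).
    """
    attempt = 0
    while max_attempts is None or attempt < max_attempts:
        attempt += 1
        try:
            return action()
        except (TimeoutError, ConnectionRefusedError):
            # Local rule: nobody is notified; just wait and start over.
            sleep(wait_seconds)
    raise RuntimeError(f"gave up after {attempt} attempts")
```

Because each stage only reacts to its own failures, a slow or unavailable neighbour delays the line without stopping it.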

Two independent processes on the compute cluster at Brookhaven check every half hour for new event files, and use Globus Online to transfer those they find to the scratch disk reserved at NERSC. Every two hours, a third independent process takes a snapshot of the calibration data stored at Brookhaven, which changes much less rapidly than the event data.

At NERSC, 20 eight-core virtual machines (VMs) are running the STAR analysis software. Once every 24 hours, at a fixed time chosen at random, a cron job running on each VM pulls the most recent calibration snapshot from the cluster at Brookhaven. Then the local copy of the calibration data on each VM is replaced; since each VM initiates this process at a different time, this ensures that the VMs always have a fairly recent copy.
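One way to picture the "fixed time chosen at random" is for each VM to pick a minute of the day once, at setup, and write it into its own crontab entry. The Python sketch below does that under stated assumptions: the helper names and the script path `/opt/star/pull_calibration.sh` are hypothetical and do not come from the STAR setup.

```python
import random


def pick_daily_slot(rng=None):
    """Pick one random (hour, minute) of the day, to be reused every day."""
    rng = rng or random.Random()
    minute_of_day = rng.randrange(24 * 60)
    return divmod(minute_of_day, 60)


def crontab_line(hour, minute, command):
    """Format a crontab entry that runs `command` daily at hour:minute."""
    # Crontab field order: minute, hour, day of month, month, day of week.
    return f"{minute} {hour} * * * {command}"


if __name__ == "__main__":
    hour, minute = pick_daily_slot()
    # Hypothetical script name, used only for illustration.
    print(crontab_line(hour, minute, "/opt/star/pull_calibration.sh"))
```

Since every VM draws its own slot, the pulls are spread across the day rather than arriving at Brookhaven all at once.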

Meanwhile, each VM can run eight jobs at a time to occupy all of its cores. When a VM "notices" that it is running fewer than eight jobs, it requests a new raw event file from the scratch disk. (The request specifies the last valid timestamp for which the VM has calibration data; the scratch disk is then searched for a file that meets that criterion.)
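The request logic can be sketched in a few lines of Python. This is an illustration only: the function names, the file names, and the idea of representing the scratch disk as a mapping from file name to recording timestamp are assumptions, not details reported by the team.

```python
def free_slots(running_jobs, cores=8):
    """Number of idle cores on a VM that runs one job per core."""
    return max(0, cores - running_jobs)


def eligible_files(files, last_valid_calibration):
    """Names of raw files the VM can calibrate, oldest first.

    `files` maps a file name to the timestamp at which it was recorded.
    """
    usable = [name for name, ts in files.items() if ts <= last_valid_calibration]
    return sorted(usable, key=files.get)


def next_requests(running_jobs, files, last_valid_calibration, cores=8):
    """Files a VM would ask for to fill its idle cores."""
    wanted = free_slots(running_jobs, cores)
    return eligible_files(files, last_valid_calibration)[:wanted]


if __name__ == "__main__":
    scratch = {"a.daq": 100, "b.daq": 250, "c.daq": 180}
    # One idle core, calibration valid up to timestamp 200.
    print(next_requests(7, scratch, 200))
```

Files recorded after the VM's latest calibration snapshot are simply left on the scratch disk until a VM with newer calibration data asks for them.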

"The main challenge was to preserve independent, unsupervised raw file reconstruction on different VMs without processing the same file multiple times," Balewski said.

They did this by using an atomic rename operation, which renames a selected raw event file and passes the new name to the VM that requested a new file. If multiple VMs try to access the same file at the same time, only one of the atomic rename operations will succeed. The remaining VMs will continue to request files periodically until their request succeeds; the result is that eventually, either all of the VMs will be analyzing data on all cores or the pool of events on the scratch disk will be empty.
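The claiming step can be sketched with Python's `os.rename`, which is atomic on POSIX systems when source and destination are on the same file system. Whether a given shared scratch disk preserves that guarantee depends on the file system, and the function name `claim_file` and the `.claimed-by-` suffix are assumptions made for this example rather than the team's naming.

```python
import os
import tempfile


def claim_file(scratch_dir, filename, vm_id):
    """Try to claim a raw event file; return its new path, or None if lost."""
    src = os.path.join(scratch_dir, filename)
    dst = os.path.join(scratch_dir, f"{filename}.claimed-by-{vm_id}")
    try:
        # Only one caller can rename `src`; afterwards it no longer exists.
        os.rename(src, dst)
    except FileNotFoundError:
        # Another VM claimed the file first (or it was never there).
        return None
    return dst


if __name__ == "__main__":
    scratch = tempfile.mkdtemp()
    open(os.path.join(scratch, "run1.daq"), "w").close()
    print(claim_file(scratch, "run1.daq", "vm01"))  # path of the claimed file
    print(claim_file(scratch, "run1.daq", "vm02"))  # None: already claimed
```

A VM that gets `None` back has lost nothing; it asks again later, which matches the wait-and-retry behavior described for the rest of the assembly line.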

The analyzed events are sent back to Brookhaven via Globus Online, where they are archived and made available to researchers.

The result

Over the last two months, the team has expanded this system. Today, they run a coherent cluster of over 100 VMs from three Magellan resource pools – Eucalyptus at NERSC, Nimbus at ANL, and OpenStack at ANL. The total number of cores has exceeded 800, and they expect to cross the threshold of 1000 parallel jobs soon.

If everything goes according to the new timeline, the W boson results should be ready to present at conferences six months earlier than in previous years. But, as noted earlier, the benefits go much further than that.

"The immediate access to reconstructed data has a significant psychological aspect," Balewski said. "We can discuss how many Ws we measured last week, check if they look the same as those measured two weeks ago, and conclude that the detector is stable. We can also work immediately on improving and fine-tuning the W-finding algorithm and clean up the results while data are being taken. This accelerates analysis by many months."

Said Balewski, "Everybody wants his results to be shown at conferences as soon as possible. Using cloud computing provides new means to accomplish it."