But blind searches are a lot more complicated, like hunting in a haystack without knowing what you are looking for.
To find what conventional computer algorithms and scientists may overlook in the huge volume of data collected in particle collider experiments, the particle physics community is turning to machine learning, an application of artificial intelligence that can teach itself to improve its searching skills as it sifts through a haystack of data.
In a machine learning challenge dubbed the 2020 Large Hadron Collider (LHC) Olympics, a team of cosmologists from the U.S. Department of Energy's Lawrence Berkeley National Laboratory developed a code that best identified a mock signal hidden in simulated particle-collision data.
Cosmologists? That's right.
"It was totally unexpected for us to perform so well", stated George Stein, a Berkeley Lab and UC Berkeley postdoctoral researcher who participated in the challenge with Uros Seljak, a Berkeley Lab cosmologist, UC Berkeley professor, and co-director of the Berkeley Center for Cosmological Physics, of which George Stein is a member.
Ten teams, composed mostly of particle physicists, took part in the competition, which ran from November 19, 2019, to January 12, 2020.
George Stein led the adaptation of a code that two other student researchers had developed under Uros Seljak's direction. The competition was launched by the organizers of the Machine Learning for Jets 2020 (ML4Jets2020) conference. Jets are narrow cones of particles produced in particle-collision experiments that particle physicists can trace back to measure the properties of their particle sources.
The competition results were announced during the conference, which was held at New York University January 15-17.
Ben Nachman, a Berkeley Lab postdoctoral researcher who is part of a group that works on ATLAS - a large detector at CERN's LHC - served as one of the event and contest organizers. David Shih, a physics and astronomy professor at Rutgers University now on a sabbatical at Berkeley Lab, and Gregor Kasieczka, a professor at the University of Hamburg in Germany, were co-organizers.
While some computing competitions allow participants to submit and test their codes multiple times to gauge whether they are getting closer to the correct results, the 2020 LHC Olympics competition gave teams just one shot to submit a solution.
"The cool thing is that we didn't use an off-the-shelf tool", Uros Seljak stated. "We used a tool that we had developed for our research."
He noted: "In my group we had been working on unsupervised machine learning. The idea is that you want to describe data where the data have no labels."
The tool that the team used is called sliced iterative optimal transport. "It's a form of deep learning, but a form where we do not optimize everything at once", Uros Seljak stated. Instead, he explained, the optimization is done iteratively, in stages.
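To give a flavor of what "iteratively, in stages" can mean, here is a minimal sketch of one sliced, iterative transport scheme. It is not the team's code. It assumes random projection directions and a simple rank-based mapping to a standard normal distribution, and the function names are made up for illustration. Each stage picks one direction, sorts the data along it (in one dimension, sorting gives the optimal transport map), and shifts the points so that their projection looks Gaussian.

```python
import math
import random
from statistics import NormalDist


def random_unit_vector(dim, rng):
    """Draw a random direction on the unit sphere (illustrative choice)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def gaussianize_along(points, direction):
    """Shift points along a unit `direction` so the 1D projection follows N(0, 1)."""
    n = len(points)
    proj = [sum(p_k * d_k for p_k, d_k in zip(p, direction)) for p in points]
    order = sorted(range(n), key=lambda i: proj[i])
    std_normal = NormalDist()
    target = [0.0] * n
    for rank, i in enumerate(order):
        # The point with a given rank is sent to the matching normal quantile.
        target[i] = std_normal.inv_cdf((rank + 0.5) / n)
    return [
        [p_k + (t - s) * d_k for p_k, d_k in zip(p, direction)]
        for p, s, t in zip(points, proj, target)
    ]


def sliced_iterative_transport(points, n_iter, rng):
    """Apply one 1D transport step per stage instead of fitting everything at once."""
    for _ in range(n_iter):
        direction = random_unit_vector(len(points[0]), rng)
        points = gaussianize_along(points, direction)
    return points


# Toy two-dimensional data with two clumps (hypothetical, for illustration only).
rng = random.Random(1)
toy_points = [[rng.gauss(-2.0, 0.5), rng.gauss(0.0, 1.0)] for _ in range(200)]
toy_points += [[rng.gauss(2.0, 0.5), rng.gauss(0.0, 1.0)] for _ in range(200)]
transported = sliced_iterative_transport(toy_points, n_iter=50, rng=rng)
```

Because each stage only solves a one-dimensional problem, the cost per stage stays small. That is one plausible reason a method of this kind can run on a laptop.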
The code is so efficient that it can run on a simple desktop or laptop computer. It was originally developed to calculate a statistical quantity known as the Bayesian evidence.
Uros Seljak offered an example: "Suppose you are looking at anomalies in a planet's transit time". The transit time is the time it takes for the planet to pass in front of a larger object from your viewpoint - like watching from Earth as Mercury moves in front of the sun.
"One solution requires that there be an extra planet", he stated, "and the other solution requires an extra moon, and they are both a good fit to the data, but they have very different parameters. How do I compare these two solutions?"
The Bayesian approach is to compute the evidence for both solutions and see which solution has a higher probability of being true.
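The comparison can be illustrated with a toy calculation. The sketch below is not the planet-versus-moon problem and not the team's code. It assumes Gaussian measurements with a known spread and two hypothetical models that differ only in the range they allow for a single parameter `mu`. The evidence for each model is the likelihood averaged over that model's prior, computed here with a plain midpoint sum.

```python
import math
import random


def log_likelihood(data, mu, sigma=1.0):
    """Gaussian log-likelihood of the data for mean `mu` and known `sigma`."""
    n = len(data)
    sq = sum((y - mu) ** 2 for y in data)
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - 0.5 * sq / sigma ** 2


def log_sum_exp(values):
    """Numerically stable log of a sum of exponentials."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_evidence_uniform_prior(data, lo, hi, steps=2000):
    """Log of the integral of likelihood times a uniform prior on [lo, hi]."""
    width = (hi - lo) / steps
    log_terms = [log_likelihood(data, lo + (i + 0.5) * width) for i in range(steps)]
    # Prior density 1/(hi - lo) times cell width gives a weight of 1/steps per cell.
    return log_sum_exp(log_terms) - math.log(steps)


# Hypothetical data: 20 measurements scattered around a true value of 3.
rng = random.Random(0)
data = [rng.gauss(3.0, 1.0) for _ in range(20)]

log_z_a = log_evidence_uniform_prior(data, -1.0, 1.0)  # model A: mu near 0
log_z_b = log_evidence_uniform_prior(data, 2.0, 4.0)   # model B: mu near 3
log_bayes_factor = log_z_b - log_z_a  # positive values favor model B
```

With one parameter, a brute-force sum like this is enough. With many parameters the integral becomes expensive, and that is the kind of calculation the article says the team's code is meant to speed up.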
"This kind of example comes up all of the time", Uros Seljak stated, and his team's code is designed to speed up the complex calculations required by conventional methods. "We were trying to improve upon something unrelated to particle physics, and we realized this could be used as a general machine learning tool."
He added: "Our solution is particularly useful for so-called anomaly detection: looking for very tiny signals in data that are somehow different than the other data."
In the 2020 LHC Olympics competition, participants first received a sample data set in which the particle signal events were labeled separately from the background events - both the needle and the haystack - so that they could test their codes.
Then they received the actual "black box" contest data: just the haystack. They were tasked with finding a different and entirely unknown kind of particle signal hidden in the background data, and with specifically describing the signal events that their methods turned up.
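A much simpler cousin of this task can be sketched in a few lines. The example below is not the method any team used. It assumes a background-only reference sample is available, uses a single made-up observable, and hides a hypothetical narrow signal near 3.1 in toy "black box" data. It then looks for the bin where the observed count most exceeds the count expected from background alone.

```python
import math
import random


def histogram(values, n_bins, width):
    """Count values in equal-width bins starting at 0; out-of-range values are dropped."""
    counts = [0] * n_bins
    for v in values:
        idx = int(v / width)
        if v >= 0 and idx < n_bins:
            counts[idx] += 1
    return counts


def excess_significance(observed, expected):
    """Rough per-bin excess in units of the expected statistical fluctuation."""
    return [(o - e) / math.sqrt(max(e, 1.0)) for o, e in zip(observed, expected)]


rng = random.Random(0)
N_BINS, WIDTH = 30, 0.2

# Toy samples (all numbers here are illustrative assumptions).
reference = [rng.expovariate(1.0) for _ in range(20000)]   # background only
black_box = [rng.expovariate(1.0) for _ in range(20000)]   # background ...
black_box += [rng.gauss(3.1, 0.05) for _ in range(300)]    # ... plus a hidden signal

# Scale the reference histogram to the size of the black-box sample.
scale = len(black_box) / len(reference)
expected = [scale * c for c in histogram(reference, N_BINS, WIDTH)]
observed = histogram(black_box, N_BINS, WIDTH)

z = excess_significance(observed, expected)
best_bin = max(range(N_BINS), key=lambda i: z[i])
signal_window = (best_bin * WIDTH, (best_bin + 1) * WIDTH)
```

Real collision events have many observables, and the signal need not show up as a bump in any single one of them. That is why the competition called for more flexible tools than a one-dimensional histogram comparison.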
Competition co-organizers David Shih and Ben Nachman noted that they had personally been working on an anomaly-detection method that uses a very similar approach - called "conditional density estimation" - to the technique developed by Uros Seljak and George Stein that was entered in the competition.
Uros Seljak and George Stein consulted with a number of particle physicists at the lab, including Ben Nachman, David Shih, and graduate student Patrick McCormack. They discussed, among other topics, how the high-energy physics community typically analyzes datasets like those used in the competition, but for the actual "black box" challenge Uros Seljak and George Stein were on their own.
As the competition was drawing toward a close, George Stein said: "We thought we found something about a week before the deadline."
George Stein and Uros Seljak submitted their results a few days before the conference, "but as we are not particle physicists, we were not planning to participate at the conference", Uros Seljak stated.
Then, George Stein received an e-mail from the conference organizers, who asked him to fly out and present a talk on the team's solution later that week. The organizers didn't share the results of the competition until all of the speakers had presented their results.
"My talk was originally first, and then shortly before the start of the session they moved me to last. I didn't know if that was a good thing", George Stein stated.
The code that the Berkeley Lab team entered picked up about 1,000 signal events, with an error margin of plus or minus 200; the correct answer was 843 events. Their code was the clear winner in that category.
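As a quick check of those figures, the snippet below treats the quoted error margin as a symmetric interval around the estimate, which is an assumption about how the margin was meant. It confirms that the true count falls inside that interval.

```python
# Figures reported in the article.
estimate = 1000     # signal events found by the Berkeley Lab code (approximate)
margin = 200        # quoted error margin, plus or minus
true_count = 843    # signal events actually hidden in the data

deviation = abs(estimate - true_count)   # how far the estimate was from the truth
within_margin = deviation <= margin      # True if the truth lies in [800, 1200]
fraction_of_margin = deviation / margin  # share of the quoted margin that was used
```

The estimate was off by 157 events, inside the quoted range of 800 to 1,200.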
Several teams were close in estimating the energy level, or "resonance mass", of the signal, and the Berkeley Lab team was closest in its estimate of the resonance mass for a secondary signal stemming from the main signal.
At the conference, George Stein noted: "There was a huge interest in the overall approach we took. It made waves."
Oz Amram, another competitor in the contest, quipped in a Twitter post: "The result of the LHC Olympics is that cosmologists are better at our job than we are." But contest organizers did not formally announce a winner.
Ben Nachman, one of the event organizers, stated: "Even though George and Uros clearly outperformed the other competitors, in the end it is likely that no one algorithm will cover every possibility - so we will need a diverse set of approaches to achieve broad sensitivity."
He added: "Particle physics has entered an interesting time where every prediction for new particles we have tested at the Large Hadron Collider has so far turned out to be not realized in nature - except the Standard Model of particle physics. While it is essential to continue the program of model-driven searches, we also have to develop a parallel program to be model-agnostic. That is the motivation for this challenge."
Uros Seljak said that his team is planning to publish a paper that details its machine learning code.
"We are definitely planning to apply this to many astrophysics problems", he stated. "We will look for interesting applications - anything with glitches or transients, anything anomalous. We will work to speed up the code and make it more powerful. These kinds of approaches can really help."