
Isolating Unisolated

This study presents the first analysis of anti-isolated Upsilon decays to two muons using machine learning anomaly detection on 2016 CMS Open Data from the LHC. The authors successfully elevate the signal significance from 1.6σ to 6.4σ by employing advanced ML techniques, demonstrating the potential for discovering real signals in collider data. Additionally, they provide a curated dataset and code to facilitate future research in anomaly detection methods.

MIT-CTP 5843

Isolating Unisolated Upsilons with Anomaly Detection in CMS Open Data

Rikab Gambhir,1,2,∗ Radha Mastandrea,3,4,† Benjamin Nachman,4,5,‡ and Jesse Thaler1,2,§

1 Center for Theoretical Physics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
2 The NSF AI Institute for Artificial Intelligence and Fundamental Interactions
3 Department of Physics, University of California, Berkeley, CA 94720, USA
4 Physics Division, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
5 Berkeley Institute for Data Science, University of California, Berkeley, CA 94720, USA

We present the first study of anti-isolated Upsilon decays to two muons (Υ → µ⁺µ⁻) in proton-proton collisions at the Large Hadron Collider. Using a machine learning (ML)-based anomaly detection strategy, we “rediscover” the Υ in 13 TeV CMS Open Data from 2016, despite overwhelming anti-isolated backgrounds. We elevate the signal significance to 6.4σ using these methods, starting from 1.6σ using the dimuon mass spectrum alone. Moreover, we demonstrate improved sensitivity from using an ML-based estimate of the multi-feature likelihood compared to traditional “cut-and-count” methods. Our work demonstrates that it is possible and practical to find real signals in experimental collider data using ML-based anomaly detection, and we distill a readily-accessible benchmark dataset from the CMS Open Data to facilitate future anomaly detection developments.

arXiv:2502.14036v1 [hep-ph] 19 Feb 2025

Quarkonia fragmentation within jets is an important test of quantum chromodynamics (QCD), since it probes the transition between perturbative and nonperturbative scales [1–4]. At the Large Hadron Collider (LHC), measurements of charmonium (e.g. J/ψ) production within jets by LHCb [5] and CMS [6] have led to improved fragmentation models [7] compared to baseline parton shower predictions with leading-order fragmentation functions [8]. To date, the only study of bottomonium (e.g. Υ) inside jets is an LHCb thesis that emphasized relatively isolated Υ’s [9]. To our knowledge, no published study has looked for anti-isolated Υ resonances, whose decay products are not well-separated from other collision activity, thereby offering a complementary probe of QCD fragmentation. Unlike anti-isolated J/ψ’s, whose signature in the dimuon channel is readily visible over the background, anti-isolated Υ’s are rarer and are more difficult to identify through simple event selections. This motivates the use of high-dimensional auxiliary features to reveal anti-isolated Υ’s in collider data.

In this paper, we present the first study of anti-isolated Υ → µ⁺µ⁻ decays at the LHC. Our analysis is based on 13 TeV proton-proton collision data taken in 2016, as accessed via the CMS Open Data [10]. By applying machine learning (ML)-based anomaly detection, we identify a statistically significant sample of anti-isolated Υ’s, elevating a marginal 1.6σ excess to well over the 5σ discovery threshold. The anomaly detection method we use, called CATHODE [11], is capable of detecting resonant features without requiring a simulation of the signal or background. We further show how to boost the signal significance by reweighting the data according to the learned multi-feature likelihood, which yields better sensitivity than a simple “cut-and-count” on the anomaly score.

The discovery potential of ML-based anomaly detection has been amply demonstrated on synthetic datasets for the LHC Olympics [12] and the Dark Machines Anomaly Score Challenge [13]; see Ref. [14] for a review of related techniques. At the LHC, CMS and ATLAS have started to apply ML-based anomaly detection to search for new physics in jet data, though no significant excesses have emerged [15–19]. Past work on CMS Open Data used ML-based anomaly detection to rediscover the top quark in already-known channels [20]. Our study of anti-isolated Υ’s builds on this literature by showing that ML-based anomaly detection is capable of finding real signals in real experimental data in regions of phase space not previously studied. To facilitate further ML method development and complement existing synthetic benchmarks, we publish a curated slice of the CMS Open Data used in this study along with the code to reproduce our analysis.

Anti-Isolated Upsilon Selection. The Υ’s are a series of spin-1 bottom-antibottom (bb̄) resonances with masses mΥ ≥ 2mb ≈ 10 GeV [21]. There are three main resonances with substantial dimuon branching ratios of ≈ 2%: Υ(1S), Υ(2S), and Υ(3S). (The Υ(4S), well-known for its role in B factory physics, has a thousand-fold smaller dimuon branching ratio.) Our analysis is focused on resonant anti-isolated Υ → µ⁺µ⁻ decays as seen by CMS in 2016 at √s = 13 TeV. This data is made public through the CERN Open Data Portal [22] as the DoubleMu primary dataset [10], corresponding to 8.7 fb⁻¹ of integrated luminosity.

As our primary analysis objects, we select the two highest transverse momentum (pT) muons that pass the MuonTightID criteria [23]. We also select events that pass the HLT_TrkMu15_DoubleTrkMu5NoFiltersNoVtx trigger, which requires at least 2 tracker muons with pT’s of at least 15 and 5 GeV, respectively, with no requirement that the muons come from the same primary vertex. To avoid trigger turn-on effects, we enforce further pT cuts on the muons of 17 and 8 GeV [24]. We refer to the higher (lower) pT muon as the harder (softer) one. We then divide the data into two channels: opposite sign (OS), where the two hardest muons have opposite electric charges (i.e. µ⁺µ⁻), and same sign (SS), where they have
the same charges (i.e. µ⁺µ⁺ or µ⁻µ⁻). The SS sample will serve as a validation sample for the study shown in Fig. 1 below.

Many searches for dimuon resonances require the muons to be isolated from the rest of the collision activity. Since we are interested in finding Υ’s produced through QCD fragmentation, we instead impose an anti-isolation criterion. Within a rapidity-azimuth (∆R) distance of 0.4 around each of the two muons, we require the ratio of non-muon pT to muon pT to be greater than 0.55. This threshold was chosen to be as stringent as possible while still retaining enough events to apply ML-based techniques; approximately 10k events are needed to ensure our models are well trained. Unlike Ref. [9], we do not require any matching to jets. See Ref. [25] for a previous CMS Open Data study that considered prompt dimuons (i.e. coming from the collision vertex), which has a substantial anti-isolated component, and Ref. [26] for another CMS Open Data study using ML to study the relation between isolation and promptness.

Anti-isolated Υ’s can only be seen at 1.6σ using the dimuon mass spectrum alone (see the right edge of Fig. 2b below). By comparison, without any cuts, Υ’s are clearly visible at the 28σ level. Anti-isolated Υ’s are typically found within QCD jets and are produced promptly during fragmentation. There are two primary backgrounds to anti-isolated Υ’s. The first is uncorrelated hadron decays, where QCD processes produce hadrons (mostly charged pions) that decay in flight to non-prompt muons; this produces approximately equal numbers of SS and OS dimuons. The second is Drell-Yan production, where virtual photons and Z bosons directly yield prompt dimuons via γ/Z∗ → µ⁺µ⁻; this produces only OS dimuons and is essentially an irreducible background. By comparing the OS and SS distributions (Figs. 1a and 2a below), we estimate that these two backgrounds are roughly the same size in the anti-isolated OS channel.

To enhance the visibility of anti-isolated Υ’s, we consider three auxiliary features (the dimuon transverse momentum pT(µ⁺µ⁻) and the 3D impact parameters (IP3D) of the harder and softer muons), which we refer to collectively as x. This feature set was chosen both for its performance in reconstructing the Υ and because it does not produce spurious peaks or sculpting artifacts. Other feature sets we considered include various combinations of single- and di-muon kinematic observables, as well as ∆R(µ⁺µ⁻) and ∆pT(µ⁺µ⁻). Because of the limited size of the sideband training data, using too many features leads to suboptimal CATHODE performance.

General Analysis Procedure. The core assumptions of our analysis are that the signal of interest is positive, localized in the dimuon mass m ≡ m(µ⁺µ⁻), and lives atop a smooth background. Even though we expect three distinct Υ peaks, they count as one resonant feature for a wide enough signal region (SR). In the sidebands (SBs), we assume the data is predominantly background, and we distinguish the low-mass sideband (SBL) from the high-mass sideband (SBH). These regions are separated by vertical black lines in Figs. 1a and 2a below.

For an ML-based resonance search (first outlined in [27, 28]), or extended bump hunt, the strategy is to mask the SR and only use the SB information (and the assumption of smoothness) to construct an estimate of the background distribution pBkg(x, m) for a set of auxiliary features x. With these assumptions, we can compare the observed data pData(x, m) to the background-only null hypothesis pBkg(x, m) in the SR to test for possible resonances. We compare our ML-based results to “classical” cuts on single features, for which only Steps 1 and 4a below are needed. The full analysis procedure is as follows:

1. Choose Signal and Sideband Regions. To define the bump hunt window, we choose the SR to contain the three dimuon Υ resonances and the SBL and SBH to avoid other known dimuon resonances, corresponding to: SBL = [5.0, 9.0] GeV, SR = [9.0, 10.6] GeV, and SBH = [10.6, 16.0] GeV.

2. Interpolate Background into SR. Following CATHODE [11], we train an ensemble of Normalizing Flows (NFs) [29–31] on the SBL and SBH data (with the SR masked out) to learn the conditional distribution pBkg(x|m). To estimate pBkg(m), we fit a quintic polynomial to a histogram of m in the SBs, using a relative bin width of 1.5%. This width was chosen to be above the 1% dimuon mass resolution (verified via J/ψ fits and consistent with similar dimuon studies [25, 32]). We can then generate interpolated background events in the SR according to pBkg(x, m) ≡ pBkg(x|m) pBkg(m) by sampling the NFs weighted by the polynomial mass fit.

3. Calculate Likelihood Ratios. Restricting our focus to the SR, we train an ensemble of Boosted Decision Trees (BDTs) [33, 34] to distinguish observed data pData(x) from the interpolated background pBkg(x). To do this, BDTs learn optimal sequences of cuts on their inputs. We choose BDTs over other architectures because they are fast to train, and they were found to be especially robust to noise in previous extended bump hunt studies [35, 36]. We do not provide the BDT with mass information. We assume the data distribution can be expressed as pData(x) = µ pSig(x) + (1 − µ) pBkg(x), where µ = NSig / (NSig + NBkg) is the signal fraction. In the asymptotic limit, the classifier score z(x) = pData(x) / [pData(x) + pBkg(x)] is monotonically related to the likelihood ratio:

$$\ell(x) \equiv \frac{z(x) - (1-\mu)\,\bigl(1 - z(x)\bigr)}{\mu\,\bigl(1 - z(x)\bigr)} = \frac{p_{\mathrm{Sig}}(x)}{p_{\mathrm{Bkg}}(x)}, \qquad (1)$$

which in turn is the most powerful mass-independent test statistic [37, 38]. To make the most of the limited SR data, we use 5-fold cross-validation [39]

[Figure 1. 2016 CMS Open Data DoubleMuon, same-sign muons, anti-isolated (Muon Iso 04 ≥ 0.55), 8.7 fb⁻¹, √s = 13 TeV, bin width 1.5%, quintic fit. Panel (a): events versus dimuon mass mµµ [GeV], with the Υ(1S), Υ(2S), and Υ(3S) positions marked, for false positive rates of 100% (0.53σ), 10.0% (0.79σ), 1.0% (1.25σ), and 0.1% (1.16σ). Panel (b): background-only p-value versus false positive rate for a random cut, dimuon pT, softer muon IP3D, harder muon IP3D, CATHODE, and ℓ-reweighting.]

FIG. 1. Results of the SS validation study. (a) Dimuon mass distributions after a series of cuts on the BDT classifier for
different FPR working points. For each FPR, a quintic polynomial is fit to the two SB regions and interpolated into the SR,
delineated by the vertical black lines. (b) The p-value (significance) as a function of FPR for CATHODE (blue), cuts on the
dimuon pT (red) and the muon 3D impact parameters (yellow and green), and a random cut as a baseline (black).
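
As a small numerical illustration of Eq. (1) in Step 3, the following Python sketch converts an idealized classifier score z into the likelihood ratio ℓ for a given signal fraction µ. It is not taken from the analysis code: the function names and the toy densities are hypothetical, and a real BDT score only approximates the ideal z.

```python
def classifier_score(p_data: float, p_bkg: float) -> float:
    """Ideal classifier output z = p_Data / (p_Data + p_Bkg) at one point x."""
    return p_data / (p_data + p_bkg)


def likelihood_ratio(z: float, mu: float) -> float:
    """Eq. (1): recover ell = p_Sig / p_Bkg from the score z and signal fraction mu."""
    if not 0.0 < mu <= 1.0:
        raise ValueError("mu must lie in (0, 1]")
    if not 0.0 <= z < 1.0:
        raise ValueError("z must lie in [0, 1)")
    return (z - (1.0 - mu) * (1.0 - z)) / (mu * (1.0 - z))


# Toy check (assumed numbers): signal is 3x more likely than background at x.
mu = 0.1
p_sig, p_bkg = 3.0, 1.0
p_data = mu * p_sig + (1.0 - mu) * p_bkg   # mixture model from Step 3
z = classifier_score(p_data, p_bkg)
print(likelihood_ratio(z, mu))             # recovers p_sig / p_bkg up to rounding

# A score below (1 - mu) / (2 - mu) gives a negative ell, which can only
# happen when the learned z is an imperfect estimate of the ideal score.
print(likelihood_ratio(0.4, mu))
```

The second call shows why an imperfect classifier can yield negative values of ℓ: the ideal score can never fall below (1 − µ)/(2 − µ), but a learned score can.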

to ensure that the classifiers are never trained on a subset of data they will evaluate.

4. Estimate Significances. We have two different methods to estimate the signal significance:

   a. Cut and Count. Here, we cut on a feature of interest, count the number of events that pass the cut, and compare to the background estimate. We place a cut either on the classifier score for the ML-based approach or on individual features for the classical comparisons. Working points are defined by the corresponding False-Positive Rate (FPR) in the SBs. To estimate the number of background events in the SR after the cut, we fit the SB mass spectrum using another quintic fit as above. Given the number of observed events in the SR after the cut, we estimate the one-sided p-value to reject the background-only hypothesis following Ref. [40].

   b. Reweight by the Likelihood. Instead of a single FPR working point, we can do a weighted analysis across different cut values. Following Ref. [41] in adapting the Matrix Element Method [42, 43], we assign each event i a weight wi = ℓ(xi) related to the estimated signal-to-background likelihood ratio from Eq. (1). These weights generalize the FPR cut in Step 4a: there, events have weight 1 if they pass a given threshold and 0 otherwise. Because the estimate of ℓ is imperfect, some weights may be negative, corresponding to the most background-like events. We impose a mild classifier cut to remove these negative weight events (equivalently, set negative weights to zero). We then fit the mass spectrum of the surviving events in the SB to estimate the background-only p-value in the SR; this is just like the fits of Step 4a, but now with weighted events. The Poisson binned likelihoods of Ref. [40] are replaced with Scaled Poisson Distributions [44] to account for the event weights.

The likelihood reweighting in Step 4b is expected to be more powerful than the cut-and-count analysis in Step 4a [41]. Step 4b, however, can only be performed for the ML-based method, since obtaining an estimate for the likelihood ratio requires an estimate of pBkg(x, m), which is only accessible via the NF. If the classifier-learned ℓ is a poor estimate of the likelihood ratio, this method is still valid, albeit with degraded statistical power.

To mitigate issues related to ML training, we take ensemble averages of our models (5 NFs and 100 BDTs). The NFs are ensembled after Step 2 by drawing samples equally from each NF, and the BDTs are ensembled after training in Step 3 by averaging their scores. For the quintic polynomial fits, we profile over the fit parameters. We briefly discuss alternate analysis choices in the End Matter, in particular the mass fit form and bin width.

Same-Sign Validation Study. To be confident in the statistical validity of our methods, we apply our analysis

[Figure 2. 2016 CMS Open Data DoubleMuon, opposite-sign muons, anti-isolated (Muon Iso 04 ≥ 0.55), 8.7 fb⁻¹, √s = 13 TeV, bin width 1.5%, quintic fit. Panel (a): events versus dimuon mass mµµ [GeV], with the Υ(1S), Υ(2S), and Υ(3S) positions marked, for false positive rates of 100% (1.6σ), 10.0% (5.13σ), 1.0% (4.82σ), and 0.1% (0.37σ). Panel (b): background-only p-value versus false positive rate, with 1σ, 3σ, 5σ, and 7σ reference lines, for a random cut, dimuon pT, softer muon IP3D, harder muon IP3D, CATHODE, and ℓ-reweighting.]

FIG. 2. Results of the OS bump hunt analysis, shown in the same format as Fig. 1. (a) After a cut on the BDT classifier, clear peaks emerge in the SR above the SB-fitted background. (b) Noting the different y-axis scale from Fig. 1b, the Υ significance grows from 1.6σ to well over 5σ with CATHODE, especially with likelihood reweighting.
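
Steps 4a and 4b differ only in the per-event weight, which the following Python sketch makes explicit. It is an illustration under stated assumptions rather than the analysis code: the function names are hypothetical, the scores are toy numbers, and the mass fits and Scaled Poisson treatment of the actual analysis are not reproduced.

```python
def threshold_at_fpr(sb_scores, fpr):
    """Score cut such that a fraction `fpr` of sideband events satisfy score >= cut.

    With tied scores, slightly more than `fpr` of the events may pass.
    """
    ordered = sorted(sb_scores, reverse=True)
    n_pass = max(1, int(round(fpr * len(ordered))))
    return ordered[n_pass - 1]


def cut_weights(scores, cut):
    """Step 4a as weights: 1 if the event passes the cut, 0 otherwise."""
    return [1.0 if s >= cut else 0.0 for s in scores]


def likelihood_weights(scores, mu):
    """Step 4b: w_i = ell(x_i) from Eq. (1), with negative weights set to zero."""
    weights = []
    for z in scores:
        z = min(z, 1.0 - 1e-12)  # guard against division by zero at z = 1
        ell = (z - (1.0 - mu) * (1.0 - z)) / (mu * (1.0 - z))
        weights.append(max(ell, 0.0))
    return weights


def weighted_yield(weights):
    """Sum of weights and sum of squared weights for one mass bin."""
    return sum(weights), sum(w * w for w in weights)


# Toy inputs (assumed): uniform sideband scores and four signal-region events.
sb_scores = [i / 1000 for i in range(1000)]
sr_scores = [0.2, 0.5, 0.6, 0.95]
cut = threshold_at_fpr(sb_scores, fpr=0.1)
print(cut, cut_weights(sr_scores, cut))
print(likelihood_weights(sr_scores, mu=0.1))
```

In this toy, only one of the four events survives the 10% FPR cut, whereas the likelihood weights retain graded information from three of them. The two sums returned by `weighted_yield` are the usual inputs to a weighted-count uncertainty.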

procedure to a “control” sample of SS dimuons, with roughly half the number of events as the OS sample. A doubly-charged resonance decaying to muons (Φ⁺⁺ → µ⁺µ⁺) at ∼ 10 GeV is strongly constrained by precision electroweak tests, such as the Z-pole width and running of αEM [21]. Such a resonance is also ruled out by model-specific searches for doubly-charged scalars [45, 46]. Thus, we seek to verify that our procedure does not find a signal in the SS channel where we expect there to be none.

In Fig. 1a, we show the dimuon mass spectrum after various BDT cuts. As we impose stricter criteria on the FPR (i.e. smaller fraction of background events passing the cut), no significant peaks are observed in the SR. We quantify this via the significance curves in Fig. 1b. The cut-and-count significances (Step 4a) are plotted as a function of the FPR, for both CATHODE and the classical tests, and no method finds evidence of a localized excess above 2.4σ. The ML-based likelihood reweighted significance (Step 4b) is plotted as a horizontal line at a modest 1.6σ. We conclude that our method successfully avoids sculpting spurious signals.

For completeness, we perform two additional validations. First, we use a classifier trained on OS data to look for resonances in SS data. We (successfully) do not find an SS signal, which implies that a classifier trained on (signal-containing) OS data does not sculpt peaks where none exist. Second, we use a classifier trained on SS data to look for resonances in OS data. There is indeed an initial 1.6σ Υ signal in the OS channel (as discussed below), but we (successfully) do not elevate the signal significance. We conclude that in order to reveal the Υ signal, the BDT classifier must be able to learn nontrivial correlations between auxiliary features, which are not present in the SS control sample.

Opposite-Sign Search Results. We now present search results in the OS channel for anti-isolated Υ → µ⁺µ⁻ decays. In Fig. 2a, we show the dimuon mass distribution after a sequence of BDT cuts. A modest initial excess in the SR is visually amplified by cuts on the classifier. The quantitative gains in signal significance are shown in Fig. 2b. CATHODE achieves a maximum significance of 5.7σ at a 7.5% FPR working point, which is increased to 6.4σ with likelihood reweighting. By contrast, none of the classical cuts surpass the nominal 5σ discovery threshold: cutting on the harder muon’s IP3D achieves at most 4.1σ significance. This demonstrates substantial gains from using a multidimensional ML-based approach compared to single-feature classical cuts.

To better understand how the BDT discriminates Υ-like events from background, we can inspect the impact of classifier cuts on the auxiliary features in the SR, shown in Fig. 3. With more stringent FPR cuts, the BDT selects events with moderate dimuon pT (≈ 60 GeV) and small IP3Ds (≈ 10⁻³ cm). In essence, the BDT is attempting to undo the effects of the initial anti-isolation condition as best it can to recover the Υ resonance. By focusing on small impact parameters, the BDT is mitigating the background from uncorrelated hadron decays. However, cutting on the IP3D alone does not reproduce the original peak at 10⁻³ cm; if it did, a classical IP3D cut would have

[Figure 3. 2016 CMS Open Data DoubleMuon, opposite sign (µ⁺µ⁻), Muon Iso 04 ≥ 0.55, 8.7 fb⁻¹, √s = 13 TeV, bin width 1.5%, quintic fit. Three panels of events versus dimuon pT [GeV], harder muon IP3D [cm], and softer muon IP3D [cm], shown before the anti-isolation cut and at false positive rates of 100%, 10.0%, 1.0%, and 0.1%.]

FIG. 3. Distributions in the SR of three auxiliary features (dimuon pT, harder muon IP3D, and softer muon IP3D) after increasingly stringent cuts on the BDT classifier score. The dashed curves show results before the initial anti-isolation cut.
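
The significances quoted for the OS channel and the p-value axis of Fig. 2b are related by the one-sided Gaussian tail probability. The Python sketch below shows that conversion, together with the asymptotic significance of a simple counting experiment with a known background. The counting formula is a simplified stand-in chosen for illustration: the analysis instead profiles a quintic sideband fit, and the function names and toy counts are hypothetical.

```python
import math
from statistics import NormalDist


def significance_to_p(z: float) -> float:
    """One-sided Gaussian tail probability for a significance of z sigma."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def p_to_significance(p: float) -> float:
    """Inverse of significance_to_p, valid for 0 < p < 1."""
    return -NormalDist().inv_cdf(p)


def counting_significance(n_obs: float, n_bkg: float) -> float:
    """Asymptotic one-sided significance for n_obs events over a known background.

    Simplification: the background expectation n_bkg is treated as exact.
    """
    if n_obs <= n_bkg:
        return 0.0
    return math.sqrt(2.0 * (n_obs * math.log(n_obs / n_bkg) - (n_obs - n_bkg)))


# Significances taken from the text, converted to one-sided p-values.
for z in (1.6, 4.1, 5.0, 5.7, 6.4):
    print(f"{z:.1f} sigma -> p = {significance_to_p(z):.3e}")

# Toy counts (assumed): 130 events observed over an expected background of 100.
z_toy = counting_significance(130, 100)
print(f"toy counting experiment: {z_toy:.2f} sigma")
```

Using `erfc` rather than `1 - cdf` avoids the loss of floating-point precision that would otherwise set in for the very small p-values at the bottom of the Fig. 2b axis.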

matched the performance of CATHODE. By focusing on moderate dimuon pT, the BDT is additionally rejecting low pT backgrounds. Both the uncorrelated hadron and Drell-Yan backgrounds fall off at high pT, although the uncorrelated hadron background does so more sharply, which we can confirm through studies of the SS control sample. Since the Drell-Yan background generates genuine prompt µ⁺µ⁻ signatures, the BDT would need access to additional kinematic features to mitigate it further.

Conclusions. Using an ML-based anomaly detection technique, we elevated an anti-isolated Υ → µ⁺µ⁻ signal to over 5σ significance in CMS Open Data. Our study represents the first analysis of anti-isolated Υ production at the LHC. This production channel is particularly useful as a probe for the transition region of QCD, and our work is a step towards measuring observables relevant for bottomonium fragmentation. Moreover, we showed that the statistical performance of multi-feature ML-based methods is superior to that of classical single-feature cut and count, with the best performance coming from reweighting events according to the estimated signal-to-background likelihood. This is all done without any simulation of signal or background. Without these novel methods, anti-isolated Υ’s are hard to find: the naive classical cuts we tested yield at best around 4σ significance.

There are a number of ways we could enhance the statistical sensitivity of our analysis. We chose fixed SR and SBs motivated by the known location of the Υ and other QCD dimuon resonances, but one could consider several candidate regions to find the optimal choice. When estimating backgrounds and likelihoods using NFs and BDTs, we used ensembling to mitigate ML training artifacts, but one could account for both statistical and systematic uncertainties in the architecture and training. This would improve sensitivity for the likelihood reweighting method, where uncertainties from the estimate of µ in Eq. (1) lead to suboptimal (though still valid) weights. While we do not expect these variations to substantially improve the quoted significances, they could be relevant for revealing more elusive signals.

Now that we have “rediscovered” the Υ, we plan to build upon this study in the future and perform a full scan over the dimuon mass range to search for resonances beyond the Standard Model. We encourage the anomaly detection community to use this study and this dataset for development and testing, since the Υ peak serves as a “standard candle” whose properties are well known. This dataset is complementary to previously publicized benchmark datasets, since it corresponds to real, noisy, detector-level experimental data. It also presents new challenges; for example, we found that not all auxiliary feature sets avoided sculpting, a hurdle which we had not encountered in previous studies on synthetic data. We hope this Υ analysis inspires further anomaly detection studies in real collider data.

Acknowledgments. We would like to thank Ed Witkowski for early discussions and contributions to this work, and Cari Cesarotti, Matt Strassler, and Daniel Whiteson for detailed feedback and encouragement. RG and JT are supported by the National Science Foundation (NSF) under Cooperative Agreement PHY-2019786 (The NSF AI Institute for Artificial Intelligence and Fundamental Interactions, http://iaifi.org/), and by the U.S. Department of Energy (DOE) Office of High Energy Physics under grant number DE-SC0012567. JT is additionally supported by the Simons Foundation through Investigator grant 929241, and in part by grant NSF PHY-2309135 to the Kavli Institute for Theoretical Physics
(KITP). RM and BN are supported by the U.S. DOE Office of Science under contract DE-AC02-05CH11231 and Grant No. 63038 from the John Templeton Foundation. This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231 using NERSC award HEP-ERCAP0021099.

End Matter: Code and Data. All of the code to reproduce the analyses and figures in this paper can be found at https://github.com/hep-lbdl/dimuonAD. The code consists of a series of numbered Python files and Jupyter notebooks that, when run in order, will completely reproduce our analyses from scratch. The BDTs are trained using the XGBoost package [47] and the CATHODE normalizing flow is built and trained with PyTorch [48]. The architectures and hyperparameters used in this analysis can be found in the GitHub repository in the configs folder. The datasets used in this paper are available on Zenodo at https://zenodo.org/records/14618719.

End Matter: Systematic Variations and Pseudoexperiments. To test the robustness of our anomaly detection strategy, we want to show its behavior under systematic variations. The baseline results in the main text were based on a quintic polynomial background fit and a dimuon relative bin width of 1.5%. For fit variations, we consider cubic and septic polynomials. For bin width variations, we consider 1.1% and 2.3%.

In Fig. 4, we plot the CATHODE p-value as a function of FPR for these different variations, analogous to the solid blue line in Fig. 2b. All of these variations yield similar results to the baseline approach, and all choices considered surpass the 5σ discovery threshold. This shows that the Υ significances are robust with respect to these choices. If anything, the baseline approach is conservative relative to these variations, likely due to a statistical fluctuation. We note that similar robustness to variations holds for the single-feature classical cuts and for the likelihood reweighting method.

Another potential concern is whether our computed p-values are valid even though we are using an ML-based approach. To verify that the standard asymptotic formulae [40] for p-values are still valid, we did a numerical analysis with pseudoexperiments. Specifically, we generated 1000 pseudoexperiments by sampling from a trained NF, which we take to be our null hypothesis. We then computed the observed distribution of the test statistic, which follows the expected asymptotic distribution out to at least the 3σ level.

[Figure 4. 2016 CMS Open Data DoubleMuon. Background-only p-value versus false positive rate for CATHODE, with 1σ, 2σ, 3σ, 4σ, 5σ, and 7σ reference lines, for cubic, quintic, and septic fits and bin widths of 1.1%, 1.5%, and 2.3%.]

FIG. 4. The CATHODE p-values as a function of FPR, for different choices of fit polynomial and bin width. The central fit and binning choice used in Fig. 2b is bolded in solid purple.

∗ [email protected]
† [email protected]
‡ [email protected]
§ [email protected]

[1] Per Ernstrom and Leif Lonnblad, “Generating heavy quarkonia in a perturbative QCD cascade,” Z. Phys. C 75, 51–56 (1997), arXiv:hep-ph/9606472.
[2] A. Andronic et al., “Heavy-flavour and quarkonium production in the LHC era: from proton–proton to heavy-ion collisions,” Eur. Phys. J. C 76, 107 (2016), arXiv:1506.03981 [nucl-ex].
[3] Reggie Bain, Lin Dai, Andrew Hornig, Adam K. Leibovich, Yiannis Makris, and Thomas Mehen, “Analytic and Monte Carlo Studies of Jets with Heavy Mesons and Quarkonia,” JHEP 06, 121 (2016), arXiv:1603.06981 [hep-ph].
[4] Francesco Giovanni Celiberto and Michael Fucilla, “Diffractive semi-hard production of a J/ψ or a Υ from single-parton fragmentation plus a jet in hybrid factorization,” Eur. Phys. J. C 82, 929 (2022), arXiv:2202.12227 [hep-ph].
[5] Roel Aaij et al. (LHCb), “Study of J/ψ Production in Jets,” Phys. Rev. Lett. 118, 192001 (2017), arXiv:1701.05116 [hep-ex].
[6] Armen Tumasyan et al. (CMS), “Fragmentation of jets containing a prompt J/ψ meson in PbPb and pp collisions at √sNN = 5.02 TeV,” Phys. Lett. B 825, 136842 (2022), arXiv:2106.13235 [hep-ex].
[7] Reggie Bain, Lin Dai, Adam Leibovich, Yiannis Makris, and Thomas Mehen, “NRQCD Confronts LHCb Data on Quarkonium Production within Jets,” Phys. Rev. Lett. 119, 032002 (2017), arXiv:1702.05525 [hep-ph].
[8] Geoffrey T. Bodwin, Eric Braaten, and G. Peter Lepage, “Rigorous QCD analysis of inclusive annihilation and production of heavy quarkonium,” Phys. Rev. D 51,
1125–1171 (1995), [Erratum: Phys.Rev.D 55, 5853 (1997)], proton collisions at 13 TeV,” Phys. Lett. B 796, 131–154
arXiv:hep-ph/9407339. (2019), arXiv:1812.00380 [hep-ex].
[9] Naomi Cooke, Measurements of Quarkonia and [25] Cari Cesarotti, Yotam Soreq, Matthew J. Strassler, Jesse
Tetraquark Production in Jets at LHCb, Ph.D. thesis, Thaler, and Wei Xue, “Searching in CMS Open Data for
Glasgow U. (2023). Dimuon Resonances with Substantial Transverse Momen-
[10] CMS Collaboration (2024). DoubleMuon pri- tum,” Phys. Rev. D 100, 015021 (2019), arXiv:1902.04222
mary dataset in NANOAOD format from [hep-ph].
RunH of 2016 (/DoubleMuon/Run2016H- [26] Edmund Witkowski, Benjamin Nachman, and Daniel
UL2016_MiniAODv2_NanoAODv9- Whiteson, “Learning to isolate muons in data,” Phys.
v1/NANOAOD). CERN Open Data Portal. Rev. D 108, 092008 (2023), arXiv:2306.15737 [hep-ex].
DOI:10.7483/OPENDATA.CMS.UZD7.Z50M. [27] Jack H. Collins, Kiel Howe, and Benjamin Nachman,
[11] Anna Hallin, Joshua Isaacson, Gregor Kasieczka, Claudius “Anomaly Detection for Resonant New Physics with Ma-
Krause, Benjamin Nachman, Tobias Quadfasel, Matthias chine Learning,” Phys. Rev. Lett. 121, 241803 (2018),
Schlaffer, David Shih, and Manuel Sommerhalder, “Clas- arXiv:1805.02664 [hep-ph].
sifying Anomalies THrough Outer Density Estimation [28] Jack H. Collins, Kiel Howe, and Benjamin Nachman,
(CATHODE),” (2021), arXiv:2109.00546 [hep-ph]. “Extending the search for new resonances with machine
[12] Gregor Kasieczka et al., “The LHC Olympics 2020: A learning,” Physical Review D 99 (2019), 10.1103/phys-
Community Challenge for Anomaly Detection in High revd.99.014038.
Energy Physics,” (2021), arXiv:2101.08320 [hep-ph]. [29] Esteban G. Tabak and Eric Vanden-Eijnden, “Density
[13] T. Aarrestad et al., “The Dark Machines Anomaly Score estimation by dual ascent of the log-likelihood,” Commu-
Challenge: Benchmark Data and Model Independent nications in Mathematical Sciences 8, 217 – 233 (2010).
Event Classification for the Large Hadron Collider,” [30] Ivan Kobyzev, Simon J.D. Prince, and Marcus A.
(2021), arXiv:2105.14027 [hep-ph]. Brubaker, “Normalizing flows: An introduction and re-
[14] Living Review of Machine Learning in High view of current methods,” IEEE Transactions on Pattern
Energy Physics. https://iml-wg.github.io/ Analysis and Machine Intelligence 43, 3964–3979 (2021).
HEPML-LivingReview. [31] George Papamakarios, Eric Nalisnick, Danilo Jimenez
[15] Georges Aad et al. (ATLAS), √ “Dijet resonance search Rezende, Shakir Mohamed, and Balaji Lakshmi-
with weak supervision using s = 13 TeV pp collisions narayanan, “Normalizing flows for probabilistic modeling
in the ATLAS detector,” Phys. Rev. Lett. 125, 131801 and inference,” (2021), arXiv:1912.02762 [stat.ML].
(2020), arXiv:2005.02983 [hep-ex]. [32] “Search for prompt production of a GeV scale resonance
[16] Georges Aad et al. (ATLAS), “Anomaly detection search decaying
√ to a pair of muons in proton-proton collisions at
for new resonances decaying into a Higgs boson and a s = 13 TeV,” CMS-PAS-EXO-21-005 (2023).
generic
√ new particle X in hadronic final states using [33] Leo Breiman, Jerome Friedman, Charles J. Stone, and
s = 13 TeV pp collisions with the ATLAS detector,” R.A. Olshen, Classification and Regression Trees (Chap-
Phys. Rev. D 108, 052009 (2023), arXiv:2306.03637 [hep- man and Hall/CRC, 1984).
ex]. [34] Jerome H. Friedman, “Greedy function approximation:
[17] Vladimir Chekhovsky et al. (CMS), “Model-agnostic A gradient boosting machine,” Annals of Statistics 29,
search for dijet resonances with anomalous
√ jet substruc- 1189–1232 (2000).
ture in proton-proton collisions at s = 13 TeV,” (2024), [35] Thorben Finke, Marie Hein, Gregor Kasieczka, Michael
arXiv:2412.03747 [hep-ex]. Krämer, Alexander Mück, Parada Prangchaikul, Tobias
[18] Georges Aad et al. (ATLAS), “Search for New Phenomena Quadfasel, David Shih, and Manuel Sommerhalder, “Tree-
in Two-Body Invariant Mass Distributions Using Unsu- based algorithms for weakly supervised anomaly detec-
pervised Machine Learning for Anomaly Detection at tion,” Phys. Rev. D 109, 034033 (2024), arXiv:2309.13111
s=13 TeV with the ATLAS Detector,” Phys. Rev. Lett. [hep-ph].
132, 081801 (2024), arXiv:2307.01612 [hep-ex]. [36] Marat Freytsis, Maxim Perelstein, and Yik Chuen San,
[19] Georges Aad et al. (ATLAS), “Weakly supervised anomaly “Anomaly detection in the presence of irrelevant features,”
detection for resonant new physics√in the dijet final state JHEP 02, 220 (2024), arXiv:2310.13057 [hep-ph].
using proton-proton collisions at s = 13 TeV with the [37] J. Neyman and E. S. Pearson, “On the problem of the
ATLAS detector,” (2025), arXiv:2502.09770 [hep-ex]. most efficient tests of statistical hypotheses,” Philosophi-
[20] Oliver Knapp, Guenther Dissertori, Olmo Cerri, Thong Q. cal Transactions of the Royal Society of London. Series A,
Nguyen, Jean-Roch Vlimant, and Maurizio Pierini, Containing Papers of a Mathematical or Physical Char-
“Adversarially Learned Anomaly Detection on CMS acter 231, 289–337 (1933).
Open Data: re-discovering the top quark,” (2020), [38] Eric M. Metodiev, Benjamin Nachman, and Jesse Thaler,
10.1140/epjp/s13360-021-01109-4, arXiv:2005.01598 [hep- “Classification without labels: Learning from mixed sam-
ex]. ples in high energy physics,” JHEP 10, 174 (2017),
[21] S. Navas et al. (Particle Data Group), “Review of particle arXiv:1708.02949 [hep-ph].
physics,” Phys. Rev. D 110, 030001 (2024). [39] Markus Ojala and Gemma C. Garriga, “Permutation tests
[22] CERN Open Data Portal. https://opendata.cern.ch. for studying classifier performance,” in 2009 Ninth IEEE
[23] A. M. Sirunyan et al. (CMS), “Performance of the CMS International Conference on Data Mining (2009) pp. 908–
muon detector and √ muon reconstruction with proton- 913.
proton collisions at s = 13 TeV,” JINST 13, P06015 [40] Glen Cowan, Kyle Cranmer, Eilam Gross, and Ofer
(2018), arXiv:1804.04528 [physics.ins-det]. Vitells, “Asymptotic formulae for likelihood-based tests
[24] Albert M Sirunyan et al. (CMS), “A search for pair pro- of new physics,” Eur. Phys. J. C 71, 1554 (2011), [Er-
duction of new light bosons decaying into muons in proton- ratum: Eur.Phys.J.C 73, 2501 (2013)], arXiv:1007.1727
8

[physics.data-an]. (1990).
[41] Marat Freytsis, Grigory Ovanesyan, and Jesse Thaler, [46] S. Atag and K. O. Ozansoy, “Realistic constraints on the
“Dark Force Detection in Low Energy e-p Collisions,” doubly charged bilepton couplings from Bhabha scatter-
JHEP 01, 111 (2010), arXiv:0909.2862 [hep-ph]. ing with LEP data,” Phys. Rev. D 68, 093008 (2003),
[42] K. Kondo, “Dynamical Likelihood Method for Reconstruc- arXiv:hep-ph/0310046.
tion of Events With Missing Momentum. 1: Method and [47] Tianqi Chen and Carlos Guestrin, “Xgboost: A scalable
Toy Models,” J. Phys. Soc. Jap. 57, 4126–4140 (1988). tree boosting system,” in Proceedings of the 22nd ACM
[43] K. Kondo, “Dynamical likelihood method for reconstruc- SIGKDD International Conference on Knowledge Discov-
tion of events with missing momentum. 2: Mass spectra ery and Data Mining, KDD ’16 (ACM, 2016) p. 785–794.
for 2 —> 2 processes,” J. Phys. Soc. Jap. 60, 836–844 [48] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer,
(1991). James Bradbury, Gregory Chanan, Trevor Killeen, Zeming
[44] G. Bohm and G. Zech, “Statistics of weighted Poisson Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison,
events and its applications,” Nucl. Instrum. Meth. A 748, Andreas Köpf, Edward Yang, Zach DeVito, Martin Raison,
1–6 (2014), arXiv:1309.1287 [physics.data-an]. Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner,
[45] Morris L. Swartz et al., “A Search for Doubly Charged Lu Fang, Junjie Bai, and Soumith Chintala, “Pytorch: An
Higgs Scalars in Z Decay,” Phys. Rev. Lett. 64, 2877–2880 imperative style, high-performance deep learning library,”
(2019), arXiv:1912.01703 [cs.LG].
