Detecting CAN Masquerade Attacks With Signal Clustering Similarity
Pablo Moriano∗ , Robert A. Bridges† , Michael D. Iannacone†
∗ Computer Science and Mathematics Division, † Cyber Resilience and Intelligence Division
Oak Ridge National Laboratory
{moriano, bridgesra, iannaconemd}@ornl.gov
Abstract: Vehicular Controller Area Networks (CANs) are susceptible to cyber attacks of different levels of sophistication. Fabrication attacks are the easiest to administer (an adversary simply sends extra frames on a CAN) but also the easiest to detect because they disrupt frame frequency. To overcome time-based detection methods, adversaries must administer masquerade attacks by sending frames in lieu of (and therefore at the expected time of) benign frames but with malicious payloads. Research efforts have shown that CAN attacks, and masquerade attacks in particular, can affect vehicle functionality. Examples include causing unintended acceleration, deactivating the vehicle's brakes, and steering the vehicle. We hypothesize that masquerade attacks modify the nuanced correlations of CAN signal time series and how they cluster together. Therefore, changes in cluster assignments should indicate anomalous behavior. We confirm this hypothesis by leveraging our previously developed capability for reverse engineering CAN signals (i.e., CAN-D [Controller Area Network Decoder]) and focus on advancing the state of the art for detecting masquerade attacks by analyzing time series extracted from raw CAN frames. Specifically, we demonstrate that masquerade attacks can be detected by computing time series clustering similarity using hierarchical clustering on the vehicle's CAN signals (time series) and comparing the clustering similarity across CAN captures with and without attacks. We test our approach in a previously collected CAN dataset with masquerade attacks (i.e., the ROAD dataset) and develop a forensic tool as a proof of concept to demonstrate the potential of the proposed approach for detecting CAN masquerade attacks.

arXiv:2201.02665v2 [cs.CR] 11 Mar 2022

This manuscript has been co-authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

Workshop on Automotive and Autonomous Vehicle Security (AutoSec) 2022
24 April 2022, San Diego, CA, USA
ISBN 1-891562-75-4
https://dx.doi.org/10.14722/autosec.2022.23028
www.ndss-symposium.org

I. INTRODUCTION
Modern vehicles are complex cyber-physical systems containing up to hundreds of electronic control units (ECUs). ECUs are embedded computers that communicate over one or a few Controller Area Networks (CANs) to help control vehicle functionality, including acceleration, braking, steering, and engine status, among others. CANs are vulnerable to cyber exploitation, both by adversaries with direct physical access (e.g., through the standard on-board diagnostic [OBD] II port) and remote access (e.g., Bluetooth, 5G). This increasing connectivity enables more advanced vehicle features at the expense of expanding the attack surface. By hijacking ECUs, attackers may stealthily manipulate CAN frames, resulting in life-threatening incidents. For example, malicious frame injection through cellular networks has resulted in unintended acceleration, vehicle brake deactivation, and rogue steering wheel turning [1], [2].

CAN attacks are commonly classified using a three-tiered taxonomy that includes fabrication, suspension, and masquerade attacks [3], [4]. Fabrication attacks inject extra frames, whereas suspension attacks remove benign frames; consequently, both categories usually disturb regular frame timing on the bus and can be accurately detected using time-based methods [5], [6], [7]. Masquerade attacks require the adversary to send frames in lieu of (and therefore at the expected time of) benign frames but with malicious payloads. In masquerade attacks, adversaries first suspend frames of a specific ID and then inject spoofed frames that modify the content of the frames instead of their timing patterns. Hence, masquerade attacks are the stealthiest CAN attacks.

Masquerade attacks can still be detected because they alter the regular relationships of a vehicle's subsystems. For example, in an attack from the ROAD dataset [4], an adversary that gains control of the ECU(s) that communicate the wheel speed signals (four nearly identical signals) can modify the frames to break the near-perfect correlation, which will stop the vehicle (regardless of the driver's actions). By understanding the regular relationships of the vehicle's CAN signals, this condition can be flagged as anomalous, even if the modified signals are not abnormal when considered individually.

The widespread dependence of modern vehicles on CANs, combined with these security vulnerabilities, has been met with a push to develop intrusion detection systems (IDSs) for CAN. Generally, there are two types of IDS methods: signature-based and machine learning (ML)-based. Signature-based methods rely on a predefined set of rules for attack conditions. Behavior that matches the expected signature is regarded as an attack [8], [9], [10]. However, given the heterogeneous nature of the CAN bus in terms of transmission rates and broadcasting, effective rules for detecting attacks are difficult to design, which contributes to high rates of false negatives [11]. In contrast, ML-based methods profile benign behavior to identify anomalies or generalized attack patterns when the traffic does not behave
as expected.

In doing this, many ML-based methods leverage the CAN's frame payloads [12], [13]. Note that in passenger vehicles, signals (sensor values communicated in CAN frames) are encoded into the frame payloads via proprietary (nonpublic, original equipment manufacturer-specified) mappings. Some IDSs operate on the binary payload (raw bits) [12], whereas others operate on the time series of signal values [13]. Processing the binary payload has a set of associated challenges. First, there is a semantic gap with respect to the signals encoded in the payload. This means that a single CAN frame's payload usually contains several signals encoded in different formats, including byte ordering, signedness, label and units, and scale and offset [14]. Second, detecting subtle masquerade attacks requires analyzing the payload content because the correlation between certain signals may change when the frame content is modified during an attack; hence, analyzing translated signals is a promising avenue. Thus, considering the relationship between signals is important for achieving a more effective defense against advanced masquerade attacks.

In this work, we propose a forensic framework to decide if recorded CAN traffic contains masquerade attacks. The proposed framework works at the signal level and leverages time series clustering similarity to arrive at statistical conclusions. In doing so, we use available and readable signal-level CAN traffic in benign and attack conditions to test our framework. The results obtained from our evaluation demonstrate the capability of the proposed framework to detect masquerade attacks in previously recorded CAN traffic with high accuracy. Our contributions in this paper are summarized as follows:
• We detail a CAN forensic framework based on time series clustering similarity for detecting masquerade attacks. The proposed framework is based on (1) clustering time series using agglomerative hierarchical clustering (AHC); (2) computing a clustering similarity; and (3) performing hypothesis testing using the clustering similarity distributions to decide between benign and attack conditions.
• We perform a sensitivity analysis of detection capabilities with respect to the type of AHC used. We report our results and offer possible explanations.
• We evaluate the proposed framework on masquerade attacks from the ROAD dataset [4]. Evaluation results show very high effectiveness of detecting attacks of different levels of sophistication. Our results indicate that the proposed forensic framework can be built upon to yield a viable real-time IDS, but using these results to craft a short-time-to-detection IDS is future work.

II. RELATED WORK
Our research is informed by past work leveraging time series signal correlations for context characterization of cyber-physical systems. Here, we provide an overview of related work in this area.

Ganesan et al. [15] introduced the notion of using pairwise correlations of vehicular sensor readings (e.g., speed, acceleration, steering) to characterize behavioral context. They used it for cluster analysis to identify distinct driver behaviors and detect potential attacks. Li et al. [16] leveraged correlations from multiple sensors to train a regression model that estimates a targeted sensor value. They used the difference between the estimated and observed sensor values as an anomaly signature. Sharma et al. [17] proposed to compute Pearson correlation matrices of geolocation-related signals (e.g., latitude, longitude, elevation, speed, heading) to estimate the state of neighboring vehicles and detect location forging misbehavior based on the distance between correlation matrices. Guo et al. [18] proposed Edge Computing Based Vehicle Anomaly Detection, which focuses on analyzing the time and frequency domains of sensor data to detect anomalies. In the first step, they flag abrupt changes in the correlations of sensor readings in the time domain as an indication of anomalies. For more accurate anomaly detection in the second step, they further analyze the sudden change in sensor readings by computing the change in power spectral density (PSD) of sensor data in the frequency domain. Under anomalous circumstances, the PSD is expected to be higher in the high-frequency band. He et al. [19] explored using correlations between heterogeneous sensors to identify consistency among sensor data (e.g., acceleration, engine RPM, vehicle speed, GPS) and then utilize the data to detect anomalous sensor measurements. They accomplished this by embedding the relationship of multiple sensors into an autoencoder and pinpointing anomalies based on the magnitude of the reconstruction loss. Leslie [20] developed an unsupervised learning method to detect malicious traffic over J1939 data. This method converts categorical features to numerical features with a one-hot encoding scheme and uses an ensemble AHC algorithm that integrates multiple linkage options.

Compared with the studies mentioned above, the present paper is unique in that we model temporal and signal-wise dependencies between CAN signals using time series clustering [21]. Specifically, we use AHC to generate a hierarchical relationship between signals known as a dendrogram [22]. Using a hypothesis test, we show that masquerade attacks are detectable by the resultant distribution of clustering similarities. In addition, our method is tested on real CAN data containing hundreds of signals, as opposed to previous methods that used a dozen signals at most.

III. METHODS
Our focus is on processing a set of N signals (i.e., time series, S = {X1, X2, ..., XN}) obtained from a CAN log captured during a vehicle's drive. The subsections below explain the mathematical details of each step of our method and the data source used to perform this research. The proposed framework applies AHC (see § III-A) to produce a dendrogram of clusters of S. Given two captures, each producing its corresponding dendrogram, we compute a similarity between the dendrograms using the CluSim method (see § III-B) [23]. Finally, the pairwise similarities from each capture's dendrograms are used to create a hypothesis test to distinguish between a benign CAN capture and an attack CAN capture.
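The first step of the framework (building one dendrogram per capture with AHC) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the distance between two signals is assumed to be 1 minus their Pearson correlation, the four toy signals are synthetic, and `signal_dendrogram` is a hypothetical helper name. SciPy runs Ward linkage on any condensed distance matrix, although Ward's criterion is strictly defined for Euclidean distances.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

LINKAGES = ("single", "complete", "average", "ward")


def signal_dendrogram(signals, method="ward"):
    """Build an AHC linkage matrix for signals of shape (N signals, T samples).

    Assumption: dissimilarity is 1 - Pearson correlation between signals.
    """
    dist = 1.0 - np.corrcoef(signals)
    np.fill_diagonal(dist, 0.0)          # remove floating-point residue on the diagonal
    dist = np.clip(dist, 0.0, None)      # guard against tiny negative values
    return linkage(squareform(dist, checks=False), method=method)


# Synthetic stand-ins for decoded CAN signals: X1 and X4 share one trend,
# X2 and X3 share another.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 500)
trend_a, trend_b = np.sin(t), np.cos(3.0 * t)
signals = np.vstack([
    trend_a + 0.05 * rng.standard_normal(t.size),  # X1
    trend_b + 0.05 * rng.standard_normal(t.size),  # X2
    trend_b + 0.05 * rng.standard_normal(t.size),  # X3
    trend_a + 0.05 * rng.standard_normal(t.size),  # X4
])

# One dendrogram per linkage option, then a flat cut into two clusters.
dendrograms = {m: signal_dendrogram(signals, m) for m in LINKAGES}
labels = {m: fcluster(Z, t=2, criterion="maxclust") for m, Z in dendrograms.items()}
for m in LINKAGES:
    print(m, labels[m])
```

On this toy input every linkage option groups X1 with X4 and X2 with X3; on real captures with hundreds of signals the linkage choice can change the tree, which is what the sensitivity analysis examines.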
A. Hierarchical Clustering
Hierarchical clustering is a method that outputs a hierarchy of clusters (i.e., a set of nested clusters that are organized in a tree-like diagram known as a dendrogram). It works by transforming a proximity matrix into a sequence of nested partitions. Figure 1 depicts the details of a hierarchical clustering and its subsequent dendrogram using the agglomerative approach. The mathematical formulation of hierarchical clustering can be found in Appendix A. Agglomerative

by this method exists in the range [0, 1], where 0 implies maximally dissimilar clusterings, and 1 corresponds to identical clusterings. We parametrized the clustering similarity method by letting r = −5.0 and α = 0.9. Figure 2 shows a comparison between similarity scores of three dendrograms. The key advantage of CluSim is that it does not suffer from critical biases found in previous methods (e.g., normalized mutual information) and works for hierarchical clusterings, including in conditions of skewed cluster sizes and a different number of clusters. We detail the main steps of CluSim in Appendix B.
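To make the idea of scoring two dendrograms against each other concrete, the sketch below uses a deliberately simpler stand-in than CluSim: the Pearson correlation between the cophenetic distances of two dendrograms built over the same leaves. This is not the CluSim element-centric similarity; it has no r or α parameter, and it ranges over [−1, 1] rather than [0, 1]. The helper name `dendrogram_agreement` and the toy points are assumptions made for illustration only.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage


def dendrogram_agreement(Z_a, Z_b):
    """Correlation of cophenetic distances for two dendrograms over the same leaves.

    Stand-in measure, not CluSim: 1.0 means the trees merge every pair of
    leaves at proportionally the same heights.
    """
    coph_a = cophenet(Z_a)  # condensed cophenetic distances of tree A
    coph_b = cophenet(Z_b)  # condensed cophenetic distances of tree B
    return float(np.corrcoef(coph_a, coph_b)[0, 1])


# Toy leaves: in capture A, leaves 0 and 1 are close and 2 and 3 are close.
points_a = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
# In capture B the pairing changes: 0 with 2, and 1 with 3.
points_b = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 1.0], [10.0, 1.0]])

Z_a = linkage(points_a, method="ward")
Z_b = linkage(points_b, method="ward")

same_score = dendrogram_agreement(Z_a, Z_a)
diff_score = dendrogram_agreement(Z_a, Z_b)
print(same_score, diff_score)
```

Identical trees score 1.0, while the regrouped tree scores lower, which mirrors the intuition that an attack altering signal groupings should reduce the similarity between a capture's dendrogram and a benign one.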
[Figure 1: dendrogram over the time series X1, X4, X2, X3. Axis labels: Clusterings, Distance, Time series. Clusterings shown, from coarsest to finest: {{X1, X4, X2, X3}}, {{X1, X4}, {X2, X3}}, {{X1, X4}, {X2}, {X3}}, {{X1}, {X4}, {X2}, {X3}}.]
Fig. 1. A hierarchical clustering using the agglomerative approach. (a) An
TABLE I
CAN CAPTURES USED FROM THE ROAD DATASET [4].
[Table columns: Description, # Files Used, Duration (min). Surviving cell fragments: 3, Training.]

4) Similarity Distribution Computation: Once each dendrogram has been computed for each file, we compute empirical distributions of similarity between pairs of dendrograms using
TABLE II
STATISTICAL HYPOTHESIS TEST RESULTS (p-VALUES).
A. Correlated Attack
Figure 4 shows the comparison of similarity distributions in the correlated attack. Among these, we found that the framework that used the average linkage (i.e., panel [c]) is not able to differentiate between benign and attack conditions. We also noticed that Ward's method produces the most distinctive difference (i.e., the smallest p-value).
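The comparison of two similarity distributions can be sketched with SciPy's Mann-Whitney U test, the test named in this section. The similarity values below are synthetic placeholders rather than ROAD results, the two-sided alternative is an assumption, and `distributions_differ` is a hypothetical helper name; only the 0.05 significance threshold comes from the text.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def distributions_differ(benign_sims, attack_sims, alpha=0.05):
    """Mann-Whitney U test on two samples of clustering similarities.

    Returns (p_value, significant). Assumption: a two-sided alternative.
    """
    _, p_value = mannwhitneyu(benign_sims, attack_sims, alternative="two-sided")
    return float(p_value), bool(p_value < alpha)


# Synthetic similarity scores in [0, 1]; NOT taken from the ROAD dataset.
rng = np.random.default_rng(42)
benign_sims = np.clip(rng.normal(0.90, 0.02, size=30), 0.0, 1.0)
attack_sims = np.clip(rng.normal(0.80, 0.03, size=30), 0.0, 1.0)

p_value, significant = distributions_differ(benign_sims, attack_sims)
print(f"p-value = {p_value:.3f}, significant = {significant}")
```

A rank-based test is a natural fit here because similarity scores are bounded and need not be normally distributed, so no distributional shape is assumed for either sample.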
[Figure 4: four density plots of similarity, one per linkage option. x-axis: Similarity; y-axis: Density; legend: Benign, Attack. Inset p-values: (a) p=0.005, (b) p=0.002, (c) p=0.123, (d) p=0.000.]
Fig. 4. Empirical distribution comparison of the correlated attack for each linkage selection: (a) single, (b) complete, (c) average, and (d) Ward. Results from these distributions appear in the first row of Table II.

Fig. 3. CluSim cluster similarity heatmap for each pair of files from the ROAD dataset (12 benign files, 13 masquerade attack files of five attack scenario types) depicted. For each file, a hierarchical clustering dendrogram is produced based on similarity of the file's CAN signals. For each pair of files, CluSim produces a similarity measure between the two files' dendrograms using Ward's linkage, r = −5.0, and α = 0.9, which is visualized by colors in the heatmap. Atop the heatmap is a dendrogram showing hierarchical clustering of all files based on their CluSim similarities. We used Euclidean distance on these similarities with Ward linkage. Notably, there are four main clusters. From left to right, the first and second contain all benign files except the Benign dyno reverse file, and only two attack files, i.e., Attack correlated masquerade 3 and Attack reverse light on 3, while the final two clusters contain all remaining attack files and the aforementioned benign file. As a preliminary analysis, this heatmap and clustering give positive results for masquerade attack detection by comparing Pearson correlation signal clusters.

(d) Ward. We also report the p-value, using three decimals, of the associated Mann-Whitney U test to compare the two distributions in the inset; statistically significant values (i.e., p-value < 0.05) are printed in bold. Recall that we fixed the scaling parameter r = −5 for comparing hierarchical clusterings. This is because we want to capture differences at higher levels of the dendrograms, in which the focus is on coarser groups of multiple correlated signals, instead of more fine-grained groupings of one to a few signals, in which correlations receive little emphasis.

Overall, we find that detecting attacks depends heavily on (1) the linkage function used to compute the hierarchical clusterings and (2) the severity of the attack in terms of the number of correlations perturbed. Specifically, out of the

B. Max Speedometer Attack
Table II shows that for the max speedometer attack, each linkage option produces statistically significant differences. We notice again that the Ward linkage produces the most distinctive results. We believe that speedometer readings correlate closely with wheel speed and engine readings, so when the speedometer value is manipulated (via attack) to appear maximal, correlations broken with these signals should be
five attacks studied, the method based on Ward’s linkage captured by the similarity distributions.
detected all of them (5 of 5), followed by complete linkage
C. Max Engine Coolant Attack
(4 of 5). Both single and average linkage methods detected
fewer attacks (3 of 5). We report the p-values resulting from Table II shows the results of the max engine coolant
running the forensic framework for the remaining attacks (i.e., temperature attack. We notice that only average and Ward
max speedometer, max engine coolant, reverse light on, and linkages detect significant differences. In this attack, the engine
reverse light off) for each linkage in Table II. We elaborate on coolant signal value is set to maximum, which may cause
each attack scenario below. correlations with other engine signals to differ.
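The per-linkage comparisons above rest on a two-sample Mann-Whitney U test between a benign and an attack similarity distribution. The following is a minimal, standard-library sketch of that test, not the authors' implementation: it uses the normal approximation without tie or continuity correction, and the similarity values are made up for illustration.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using a normal approximation.

    Assumption: no tie or continuity correction is applied, so the p-value
    is only approximate for small or heavily tied samples.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return min(u1, u2), p_value


# Hypothetical dendrogram similarities (illustrative values only).
benign = [0.91, 0.93, 0.95, 0.92, 0.94, 0.96, 0.90, 0.97]
attack = [0.81, 0.84, 0.86, 0.83, 0.88, 0.85, 0.82, 0.87]

u, p = mann_whitney_u(benign, attack)
print(f"U = {u:.1f}, p = {p:.3f}, significant at 0.05: {p < 0.05}")
```

A library routine with exact or tie-corrected p-values would be preferable in practice; the sketch only makes the decision rule (reject when p-value < 0.05) concrete.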
D. Reverse Light On Attack
Table II shows the comparison of similarity distributions in the reverse light on attack. Note that only the complete and Ward linkages produce statistically significant differences (i.e., [b] and [d]). We also note that, although statistically significant, these p-values are not as small as in the correlated attack, which is a consequence of having an attack that is more difficult to detect. This suggests that fewer correlated signals are affected under this attack (i.e., only a binary [1 bit] signal was targeted).

E. Reverse Light Off Attack
Table II shows that for the reverse light off attack, each linkage method produces statistically significant differences. Among these, Ward’s linkage produces the most significant difference, followed by average and complete linkages. The single linkage produces the least significant result, but it still meets the threshold.

TABLE III
CLASSIFICATION RESULTS (%).
Attack Scenario       Precision   Recall    F1 score
Correlated            88.00       100.00    93.62
Max Speedometer       88.00       100.00    93.62
Max Engine Coolant    87.18       92.73     89.87
Reverse Light On      87.23       93.18     90.11
Reverse Light Off     88.00       100.00    93.62
False positive rate (FPR), defined as $\frac{FP}{FP+TN}$, equals 13.64%. Because this method is unsupervised, the training set is defined by the benign files in a given fold, not the attack files; hence, the false positive counts and rate are independent of the attack scenarios.
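As a worked illustration of how the percentages in Table III follow from confusion counts, the sketch below applies the standard definitions (precision = TP/(TP+FP), recall = TP/(TP+FN), F1 as their harmonic mean, FPR = FP/(FP+TN)). The raw counts are not reported in the paper; the ones used here are hypothetical, chosen only because they are consistent with 220 folds of three benign and three attack test files and reproduce the reported correlated-attack row and the 13.64% FPR.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return precision, recall, F1, and FPR as percentages (two decimals)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)
    return {
        "precision": round(100 * precision, 2),
        "recall": round(100 * recall, 2),
        "f1": round(100 * f1, 2),
        "fpr": round(100 * fpr, 2),
    }


# Hypothetical counts (assumption, not from the paper):
# 220 folds x 3 attack files = 660 attack tests, all detected;
# 220 folds x 3 benign files = 660 benign tests, 90 flagged.
metrics = classification_metrics(tp=660, fp=90, fn=0, tn=570)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}%")
```

Because the false positives come only from held-out benign files, the same FP and TN counts would apply to every attack scenario with three attack files; only TP and FN change between rows.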
F. Detection Evaluation
We compute and compare the performance of the proposed framework for classifying benign and attack files. Recall that we use 12 benign files. To do so, we implement a cross-validation as follows: We set apart three benign files to be used for testing purposes (along with all attack files) and use the remaining nine files for training, that is, for computing the similarity distribution from benign files. We chose to hold out three benign files for testing to be consistent with the maximum number of attack files found in the attack dataset (i.e., correlated, max speedometer, reverse light on, and reverse light off attacks each have three attack files). We implement the above train-test split of our benign files for each of the $\binom{12}{9} = 220$ possible combinations. This experimental design allows us to decide if the difference between similarity distributions in benign and attack scenarios is statistically significant and further count the number of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN).

We use the best set of parameter values derived from our previous experiments, i.e., Ward linkage, r = −5.0, α = 0.9, and a significance level for the statistical hypothesis test of 0.05. We report the following micro-averaged classification metrics based on these numbers: Precision, defined as $\frac{TP}{TP+FP}$, gives the likelihood that the computed similarity distribution difference can be attributed to an attack; Recall, defined as $\frac{TP}{TP+FN}$, gives the likelihood that attack files are detected. Since higher precision often comes at the price of lower recall (and vice versa), it is important to consider a balance of both metrics, and the standard balanced metric is the F1 score, defined as $2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$. Table III summarizes these findings.

Because our method is unsupervised, the training set is defined by the benign files in a given fold, not the attack files; hence, the false positive counts and rate are independent of the attack scenarios. In our experiment, the overall FPR is 13.64%, with per-file FPR depicted in Figure 5.

Fig. 5. False positive rates for each benign file. All are at or below 20%, except Benign dyno drive benign anomaly at 27%. Comparing with Figure 3, we see that the lone benign file that clusters with attack files in the preliminary analysis (i.e., Benign dyno reverse) has FPR = 20%.

Unlike the false positive and true negative counts, the true positive and false negative results will vary across different attack scenarios. For correlated, max speedometer, and reverse light off attacks, our results are identical. In these attack scenarios, recall is 100.00%, meaning that all the attack configurations are detected even when changing the set of benign files used in training. In these attack scenarios, precision is 88.00%, meaning that a few of the detected configurations come from the set of benign files, i.e., are false positives. We obtain different results for the max engine coolant and reverse light on attacks. In particular, the lower results for the max engine coolant attack suggest that this attack was more difficult to detect when varying the set of benign files.

V. DISCUSSION
This work proposes a statistical forensic framework to detect masquerade attacks in the CAN bus. We quantify the empirical distribution of similarities of time series captures in benign and attack conditions. To accomplish this, we cluster time series using AHC and compute the similarity between their corresponding dendrograms. We find that masquerade attacks can be detected effectively using the proposed framework, and its discriminatory power depends on the linkage function being used in the AHC as well as the impact of the attacks on correlated signals.
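To make the clustering step concrete, the following is a minimal sketch of agglomerative hierarchical clustering of signals driven by Pearson correlation. It is an illustration under stated assumptions rather than the authors' pipeline: the distance 1 − r is an assumed choice, only single, complete, and average linkage are covered (Ward's linkage is omitted for brevity), and the signal names and values are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length series (neither may be constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def agglomerate(signals, linkage=max):
    """Naive agglomerative clustering; returns the list of merges in order.

    linkage reduces the pairwise distances between two clusters to one value:
    min (single), max (complete), or statistics.mean (average).
    Assumption: distance between two signals is 1 - Pearson r.
    """
    names = list(signals)
    dist = {
        frozenset((a, b)): 1 - pearson(signals[a], signals[b])
        for a, b in combinations(names, 2)
    }
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            d = linkage([dist[frozenset((a, b))] for a in c1 for b in c2])
            if best is None or d < best[0]:
                best = (d, c1, c2)
        d, c1, c2 = best
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
        merges.append((c1, c2, d))
    return merges


# Hypothetical signal time series (illustrative values only).
signals = {
    "wheel_fl": [10, 12, 15, 19, 24, 30],
    "wheel_fr": [10.2, 12.1, 15.3, 18.8, 24.1, 29.7],
    "engine_rpm": [900, 1500, 1200, 2100, 1800, 2600],
    "reverse_light": [0, 0, 1, 0, 0, 1],
}

merges = agglomerate(signals, linkage=max)  # complete linkage
for left, right, height in merges:
    print(f"merge {left} + {right} at distance {height:.3f}")
```

The ordered merge list is the information a dendrogram encodes; changing `linkage` changes which clusters are joined and at what height, which is the sensitivity to linkage choice that the results above report.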
These results suggest that the proposed framework is a viable approach for detecting masquerade attacks in a forensic setting. We assume that the time series signal translation (or at least a high-fidelity translation) is readily available for use. This seems feasible with current and upcoming work in reverse engineering CAN bus signals, such as CAN-D [14].

The proposed framework detects all masquerade attacks in the ROAD dataset when the Ward linkage is used. Note that Ward’s linkage (d) is an appropriate choice in this context because it tends to produce dense-enough clusters and enables the capture of meaningful changes in clustering assignations when attacks occur. In contrast, for the single linkage (a), clusters of signals tend to be spread out and often not compact enough, with clusters having disparate elements. In the complete linkage (b), clusters of signals tend to be compact, but not far enough apart, with clusters having similar members. Additionally, for the average linkage (c), clusters tend to be relatively compact and relatively far apart, which strikes a balance between single and complete linkages.

We note that the detection performance may also depend on specific attack features. Here, the detection difficulty is based on the potential number of correlated signals that are affected by the attack. Thus, an attack scenario in which wheel speed signals are modified, such as in the correlated attack, has a more noticeable effect of disrupting correlation with other signals than an attack that modifies the reverse lights because the wheel speed correlation attack manipulates four highly correlated signals (and seemingly strong correlations to many other signals), whereas the reverse light attacks modify a single signal that has correlation with gear selection but not many other signals.

Detection metrics are also affected by the number of files used to compute the similarity distribution. In other words, augmenting the number of files to estimate the similarity distribution helps to have better defined distributions that are later used for comparison purposes. This explains the lower results on the max engine coolant attack, which contains a single file.

To the best of our knowledge, the results from this research are the first to show systematic evidence of a forensic framework successfully detecting masquerade attacks based on time series clustering using a dataset of realistic and verified masquerade attacks. The following are some limitations of our work.

ROAD dataset conditions. The ROAD dataset was collected on a single vehicle while being exercised mostly on a dynamometer. We acknowledge that more comprehensive data collection using different vehicles may be necessary to generalize our findings. We also are aware that driving conditions may affect correlations of CAN signals, and the dynamometer conditions may be restrictive.

Parameter tuning. The proposed framework allows for flexible selection of linkage functions (e.g., single, complete, average, Ward) for computing the hierarchical clusterings and of the scaling parameter r and the parameter α, which controls the influence of hierarchical clusterings with shared lineages. Here, we fixed the values of r and α to focus on differences at higher levels of the dendrograms or in groups of correlated signals. However, we acknowledge that the optimal selection of these parameters may depend on the type of attack and driving conditions. We did not explore those variables in this research.

Not real-time detection. As currently presented, this is not a real-time detector.

Baseline comparison. We did not compare our proposed forensic framework with other methods.

VI. CONCLUSION
In this research, we proposed a forensics framework for the detection of masquerade attacks in the CAN bus. To evaluate it in experiments, we compute time series clustering similarity. We show that the similarity of time series clusters under benign conditions exhibits statistically significant differences from the similarity of time series clusters under attack conditions. We demonstrated these differences under different attack scenarios with different levels of sophistication using data from the ROAD dataset. This work shows that it is possible to detect masquerade attacks by effectively using the time series clustering representation of signals in the CAN bus and appropriate choices of parameters to group them.

Future work in this area includes the development of a real-time IDS that uses the principles described in this work. Additional work includes the translation of such developments to edge computing devices that can be integrated with real-world vehicle conditions.

VII. ACKNOWLEDGMENTS
This research was sponsored in part by Oak Ridge National Laboratory’s (ORNL’s) Laboratory Directed Research and Development Program. This research used resources of the Compute and Data Environment for Science (CADES) at ORNL, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

REFERENCES
[1] C. Miller and C. Valasek, “Remote exploitation of an unaltered passenger vehicle,” Black Hat USA, vol. 2015, p. 91, 2015.
[2] C. Miller and C. Valasek, “CAN message injection: OG dynamite edition,” Tech. Rep., 2016.
[3] K.-T. Cho and K. G. Shin, “Fingerprinting electronic control units for vehicle intrusion detection,” in Proceedings of the 25th USENIX Security Symposium, 2016, pp. 911–927.
[4] M. E. Verma, M. D. Iannacone, R. A. Bridges, S. C. Hollifield, P. Moriano, B. Kay, and F. L. Combs, “Addressing the Lack of Comparability & Testing in CAN Intrusion Detection Research: A Comprehensive Guide to CAN IDS Data & Introduction of the ROAD Dataset,” 2022, arXiv preprint arXiv:2012.14600, January 2022.
[5] H. M. Song, H. R. Kim, and H. K. Kim, “Intrusion detection system based on the analysis of time intervals of CAN messages for in-vehicle network,” in Proceedings of the International Conference on Information Networking (ICOIN), 2016, pp. 63–68.
[6] M. R. Moore, R. A. Bridges, F. L. Combs, M. S. Starr, and S. J. Prowell, “Modeling inter-signal arrival times for accurate detection of CAN bus signal injection attacks: a data-driven approach to in-vehicle intrusion detection,” in Proceedings of the 12th Annual Conference on Cyber and Information Security Research, 2017, pp. 1–4.
[7] D. H. Blevins, P. Moriano, R. A. Bridges, M. E. Verma, M. D. Iannacone, and S. C. Hollifield, “Time-based CAN intrusion detection benchmark,” in Proceedings of the Workshop on Automotive and Autonomous Vehicle Security (AutoSec), 2021, pp. 1–6.
[8] U. E. Larson, D. K. Nilsson, and E. Jonsson, “An approach to specification-based attack detection for in-vehicle networks,” in Proceedings of the IEEE Intelligent Vehicles Symposium, 2008, pp. 220–225.
[9] M. Bresch and N. Salman, “Design and implementation of an intrusion detection system (IDS) for in-vehicle networks,” Master’s thesis, University of Gothenburg, 2017.
[10] H. Olufowobi, C. Young, J. Zambreno, and G. Bloom, “SAIDuCANT: Specification-Based Automotive Intrusion Detection Using Controller Area Network (CAN) Timing,” IEEE Transactions on Vehicular Technology, vol. 69, no. 2, pp. 1484–1494, 2019.
[11] W. Wu, R. Li, G. Xie, J. An, Y. Bai, J. Zhou, and K. Li, “A survey of intrusion detection for in-vehicle networks,” IEEE Transactions on Intelligent Transportation Systems, vol. 21, no. 3, pp. 919–933, 2019.
[12] A. Taylor, S. Leblanc, and N. Japkowicz, “Anomaly detection in automobile control network data with long short-term memory networks,” in Proceedings of the IEEE International Conference on Data Science and Advanced Analytics (DSAA), 2016, pp. 130–139.
[13] M. Hanselmann, T. Strauss, K. Dormann, and H. Ulmer, “CANet: An unsupervised intrusion detection system for high dimensional CAN bus data,” IEEE Access, vol. 8, pp. 58 194–58 205, 2020.
[14] M. E. Verma, R. A. Bridges, J. J. Sosnowski, S. C. Hollifield, and M. D. Iannacone, “CAN-D: A Modular Four-Step Pipeline for Comprehensively Decoding Controller Area Network Data,” IEEE Transactions on Vehicular Technology, vol. 70, no. 10, pp. 9685–9700, 2021.
[15] A. Ganesan, J. Rao, and K. Shin, “Exploiting consistency among heterogeneous sensors for vehicle anomaly detection,” SAE Technical Paper, Tech. Rep., 2017.
[16] H. Li, L. Zhao, M. Juliato, S. Ahmed, M. R. Sastry, and L. L. Yang, “Poster: Intrusion detection system for in-vehicle networks using sensor correlation and integration,” in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, 2017, pp. 2531–2533.
[17] P. Sharma, J. Petit, and H. Liu, “Pearson correlation analysis to detect misbehavior in VANET,” in Proceedings of the 88th Vehicular Technology Conference (VTC-Fall), 2018, pp. 1–5.
[18] F. Guo, Z. Wang, S. Du, H. Li, H. Zhu, Q. Pei, Z. Cao, and J. Zhao, “Detecting vehicle anomaly in the edge via sensor consistency and frequency characteristic,” IEEE Transactions on Vehicular Technology, vol. 68, no. 6, pp. 5618–5628, 2019.
[19] T. He, L. Zhang, F. Kong, and A. Salekin, “Exploring inherent sensor redundancy for automotive anomaly detection,” in Proceedings of the 57th ACM/IEEE Design Automation Conference (DAC), 2020, pp. 1–6.
[20] N. Leslie, “An unsupervised learning approach for in-vehicle network intrusion detection,” in Proceedings of the 55th Annual Conference on Information Sciences and Systems (CISS), 2021, pp. 1–4.
[21] A. Javed, B. S. Lee, and D. M. Rizzo, “A benchmark study on time series clustering,” Machine Learning with Applications, vol. 1, p. 100001, 2020.
[22] L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons, 2009, vol. 344.
[23] A. J. Gates, I. B. Wood, W. P. Hetrick, and Y.-Y. Ahn, “Element-centric clustering comparison unifies overlaps and hierarchy,” Scientific Reports, vol. 9, no. 1, pp. 1–13, 2019.
[24] G. N. Lance and W. T. Williams, “A general theory of classificatory sorting strategies: 1. hierarchical systems,” The Computer Journal, vol. 9, no. 4, pp. 373–380, 1967.
[25] A. J. Gates and Y.-Y. Ahn, “CluSim: A Python package for calculating clustering similarity,” Journal of Open Source Software, vol. 4, no. 35, p. 1264, 2019.
[26] K. Pearson, “Notes on regression and inheritance in the case of two parents,” Proceedings of the Royal Society of London, vol. 58, pp. 240–242, 1895.
[27] H. B. Mann and D. R. Whitney, “On a test of whether one of two random variables is stochastically larger than the other,” The Annals of Mathematical Statistics, pp. 50–60, 1947.
[28] M. L. Waskom, “seaborn: statistical data visualization,” Journal of Open Source Software, vol. 6, no. 60, p. 3021, 2021.
[29] T. H. Haveliwala, “Topic-sensitive PageRank: A context-sensitive ranking algorithm for web search,” IEEE Transactions on Knowledge and Data Engineering, vol. 15, no. 4, pp. 784–796, 2003.

APPENDIX

A. Hierarchical Clustering Definition
Here we mathematically define hierarchical clustering. A partition $P$ of $S$ breaks $S$ into non-overlapping subsets $\{C_1, C_2, \ldots, C_m\}$, i.e., $S = \bigcup_{i \in \{1,2,\ldots,m\}} C_i$. A clustering is a partition, so the elements of the partition are called clusters. A partition $B$ of $S$ is nested in a partition $A$ of $S$ if every subset of $B$ is a subset of a subset of $A$, i.e., $\forall C_i \in B \; \exists j : C_i \subseteq C_j \in A$. A hierarchical clustering is then a sequence of partitions in which each partition is nested into the next partition in the sequence.

B. Brief CluSim Overview
Here we briefly describe how CluSim works. See Gates et al. [23] for full details. Given $S = \{X_1, X_2, \ldots, X_N\}$ and a clustering $A = \{C_1, C_2, \ldots, C_m\}$, first make the bipartite graph with elements of $S$ on the left, clustering assignments from $A$ on the right, and edges denoting containment (i.e., $(X_i, C_j)$ is an edge if and only if $X_i$ is in cluster $C_j$). Note that this can be naturally extended to a dendrogram representing a hierarchical clustering $A$ by using a weighted bipartite graph, where the weight of the edges is given by a hierarchy weighting function based on the level of the cluster assignation within the hierarchical clustering. Next, the bipartite graph is projected onto the $S$ elements, producing a weighted, directed graph that captures the inter-element relationships induced by common cluster memberships. Now equipped with a weighted, directed graph on $S$, the CluSim method captures high-order co-occurrences of elements by taking into account their paths to obtain an equilibrium distribution of a personalized diffusion process on the graph, or personalized PageRank (PPR) [29], i.e., for each $X_i$ in $S$, a PageRank version with restart to $X_i$ given by probability $1 - \alpha$ is used to produce the stationary distribution $p_i$. The element-wise similarity of an element $X_i$ in two different clusterings $A$ and $B$ is found by comparing the stationary distributions $p_i^A$ and $p_i^B$ using a variation of the $\ell^1$ metric for probability distributions. Finally, the similarity score of two clusterings $A$, $B$ is the average of element-wise similarities. CluSim is parametrized by specifying $r$ and $\alpha$. Here, $r$ is a scaling parameter that defines the relative importance of memberships at different levels of the hierarchy. That is, the larger $r$, the more emphasis on comparing lower levels of the dendrogram (zoom in). In addition, $\alpha$ is a parameter that controls the influence of hierarchical clusterings with shared lineages. That is, the larger $\alpha$, the further the process will explore from the focus data element, so more of the cluster structure is taken into account in the comparison. We used r = 5.0 and α = 0.9 in Figure 2.
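The element-centric comparison summarized in the CluSim overview can be sketched for the simplest case of two flat (non-hierarchical) partitions. This is an illustrative approximation, not the CluSim package API: the hierarchy weighting and the parameter r are left out, the personalized PageRank is computed by plain power iteration, and the normalization of the $\ell^1$ distance by $2\alpha$ is an assumption based on our reading of Gates et al. [23].

```python
def transition_matrix(labels):
    """Row-stochastic matrix of the projected graph for a flat partition.

    From element i the walker moves to a uniformly chosen member of the
    cluster containing i (including i itself).
    """
    sizes = {}
    for lab in labels:
        sizes[lab] = sizes.get(lab, 0) + 1
    n = len(labels)
    return [
        [1 / sizes[labels[i]] if labels[i] == labels[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def personalized_pagerank(P, source, alpha, iters=500):
    """Stationary distribution of a walk that restarts at source w.p. 1 - alpha."""
    n = len(P)
    p = [1.0 if j == source else 0.0 for j in range(n)]
    for _ in range(iters):
        walked = [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]
        p = [(1 - alpha) * (j == source) + alpha * walked[j] for j in range(n)]
    return p


def clustering_similarity(labels_a, labels_b, alpha=0.9):
    """Average element-wise similarity between two flat clusterings.

    Assumption: element score = 1 - L1(p_i^A, p_i^B) / (2 * alpha).
    """
    Pa, Pb = transition_matrix(labels_a), transition_matrix(labels_b)
    scores = []
    for i in range(len(labels_a)):
        pa = personalized_pagerank(Pa, i, alpha)
        pb = personalized_pagerank(Pb, i, alpha)
        l1 = sum(abs(a - b) for a, b in zip(pa, pb))
        scores.append(1 - l1 / (2 * alpha))
    return sum(scores) / len(scores)


# Cluster labels for four hypothetical signals under two clusterings.
same = clustering_similarity([0, 0, 1, 1], [0, 0, 1, 1])
split = clustering_similarity([0, 0, 1, 1], [0, 1, 2, 3])
print(f"identical clusterings: {same:.3f}")
print(f"pairs vs. singletons:  {split:.3f}")
```

For dendrograms, the same procedure would run on the weighted projected graph, with edge weights set by the hierarchy weighting function controlled by r.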