Detecting CAN Masquerade Attacks With Signal Clustering Similarity

The document discusses detecting masquerade attacks on vehicle Controller Area Networks (CANs). Masquerade attacks modify frames' content instead of timing patterns, making them stealthy. The document proposes detecting masquerade attacks by analyzing time series clustering similarity of CAN signals, and comparing clustering across captures with and without attacks. Testing on a dataset with masquerade attacks showed changes in cluster assignments can indicate anomalous behavior, demonstrating this approach for detecting masquerade attacks.

Detecting CAN Masquerade Attacks with Signal Clustering Similarity
Pablo Moriano∗, Robert A. Bridges†, Michael D. Iannacone†
∗Computer Science and Mathematics Division, †Cyber Resilience and Intelligence Division
Oak Ridge National Laboratory
{moriano, bridgesra, iannaconemd}@ornl.gov

arXiv:2201.02665v2 [cs.CR] 11 Mar 2022

Abstract: Vehicular Controller Area Networks (CANs) are susceptible to cyber attacks of different levels of sophistication. Fabrication attacks are the easiest to administer (an adversary simply sends extra frames on a CAN) but also the easiest to detect because they disrupt frame frequency. To overcome time-based detection methods, adversaries must administer masquerade attacks by sending frames in lieu of (and therefore at the expected time of) benign frames but with malicious payloads. Research efforts have proven that CAN attacks, and masquerade attacks in particular, can affect vehicle functionality. Examples include causing unintended acceleration, deactivating the vehicle's brakes, and steering the vehicle. We hypothesize that masquerade attacks modify the nuanced correlations of CAN signal time series and how they cluster together. Therefore, changes in cluster assignments should indicate anomalous behavior. We confirm this hypothesis by leveraging our previously developed capability for reverse engineering CAN signals (i.e., CAN-D [Controller Area Network Decoder]) and focus on advancing the state of the art for detecting masquerade attacks by analyzing time series extracted from raw CAN frames. Specifically, we demonstrate that masquerade attacks can be detected by computing time series clustering similarity using hierarchical clustering on the vehicle's CAN signals (time series) and comparing the clustering similarity across CAN captures with and without attacks. We test our approach in a previously collected CAN dataset with masquerade attacks (i.e., the ROAD dataset) and develop a forensic tool as a proof of concept to demonstrate the potential of the proposed approach for detecting CAN masquerade attacks.

This manuscript has been co-authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the US Department of Energy (DOE). The US government retains and the publisher, by accepting the article for publication, acknowledges that the US government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript, or allow others to do so, for US government purposes. DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

Workshop on Automotive and Autonomous Vehicle Security (AutoSec) 2022
24 April 2022, San Diego, CA, USA
ISBN 1-891562-75-4
https://dx.doi.org/10.14722/autosec.2022.23028
www.ndss-symposium.org

I. INTRODUCTION

Modern vehicles are complex cyber-physical systems containing up to hundreds of electronic control units (ECUs). ECUs are embedded computers that communicate over a (few) Controller Area Networks (CANs) to help control vehicle functionality, including acceleration, braking, steering, and engine status, among others. CANs are vulnerable to cyber exploitation, both by adversaries with direct physical access (e.g., through the standard on-board diagnostic [OBD] II port) and remote access (e.g., Bluetooth, 5G). This increasing connectivity enables more advanced vehicle features at the expense of expanding the attack surface. By hijacking ECUs, attackers may stealthily manipulate CAN frames, resulting in life-threatening incidents. For example, malicious frame injection through cellular networks has resulted in unintended acceleration, vehicle brake deactivation, and rogue steering wheel turning [1], [2].

CAN attacks are commonly classified using a three-tiered taxonomy that includes fabrication, suspension, and masquerade attacks [3], [4]. Fabrication attacks inject extra frames, whereas suspension attacks remove benign frames; consequently, both categories usually disturb regular frame timing on the bus and can be accurately detected using time-based methods [5], [6], [7]. Masquerade attacks require the adversary to send frames in lieu of (and therefore at the expected time of) benign frames but with malicious payloads. In masquerade attacks, adversaries first suspend frames of a specific ID and then inject spoofed frames that modify the content of the frames instead of their timing patterns. Hence, masquerade attacks are the stealthiest CAN attacks.

Masquerade attacks can still be detected because they alter the regular relationships of a vehicle's subsystems. Using an example attack from the ROAD [4] dataset, an adversary that gains control of the ECU(s) that communicate the wheel speed signals (four nearly identical signals) can modify the frames to break the near-perfect correlation, which will stop the vehicle (regardless of the driver's actions). By understanding the regular relationships of the vehicle's CAN signals, this condition can be flagged as anomalous, even if the modified signals are not abnormal when considered individually.

The widespread dependence of modern vehicles on CANs, combined with their security vulnerabilities, has been met with a push to develop intrusion detection systems (IDSs) for CAN. Generally, there are two types of IDS methods: signature and machine learning (ML). Signature-based methods rely on a predefined set of rules for attack conditions. Behavior that matches the expected signature is regarded as an attack [8], [9], [10]. However, given the heterogeneous nature of the CAN bus in terms of transmission rates and broadcasting, effective rules for detecting attacks are difficult to design, which contributes to high rates of false negatives [11]. In contrast, ML-based methods profile benign behavior to identify anomalies or generalized attack patterns when the traffic does not behave
as expected.

In doing this, many ML-based methods leverage the CAN's frame payloads [12], [13]. Note that in passenger vehicles, signals (sensor values communicated in CAN frames) are encoded into the frame payloads via proprietary (nonpublic, original equipment manufacturer-specified) mappings. Some IDSs operate on the binary payload (raw bits) [12], whereas others operate on the time series of signal values [13]. Processing the binary payload has a set of associated challenges. First, there is a semantic gap with respect to the signals encoded in the payload. This means that a single CAN frame's payload usually contains several signals encoded in different formats, including byte ordering, signedness, label and units, and scale and offset [14]. Second, detecting subtle masquerade attacks requires analyzing the payload content because the correlation between certain signals may change when the frame content is modified during an attack; hence, analyzing translated signals is a promising avenue. Thus, considering the relationship between signals is important for achieving a more effective defense against advanced masquerade attacks.

In this work, we propose a forensic framework to decide if recorded CAN traffic contains masquerade attacks. The proposed framework works at the signal level and leverages time series clustering similarity to arrive at statistical conclusions. In doing so, we use available and readable signal-level CAN traffic in benign and attack conditions to test our framework. The results obtained from our evaluation demonstrate the capability of the proposed framework to detect masquerade attacks in previously recorded CAN traffic with high accuracy.

Our contributions in this paper are summarized as follows:
• We detail a CAN forensic framework based on time series clustering similarity for detecting masquerade attacks. The proposed framework is based on (1) clustering time series using agglomerative hierarchical clustering (AHC); (2) computing a clustering similarity; and (3) performing hypothesis testing using the clustering similarity distributions to decide between benign and attack conditions.
• We perform a sensitivity analysis of detection capabilities with respect to the type of AHC used. We report our results and offer possible explanations.
• We evaluate the proposed framework on masquerade attacks from the ROAD dataset [4]. Evaluation results show very high effectiveness of detecting attacks of different levels of sophistication. Our results indicate that the proposed forensic framework can be built upon to yield a viable real-time IDS, but using these results to craft a short-time-to-detection IDS is future work.

II. RELATED WORK

Our research is informed by past work leveraging time series signal correlations for context characterization of cyber-physical systems. Here, we provide an overview of related work in this area.

Ganesan et al. [15] introduced the notion of using pairwise correlations of vehicular sensor readings (e.g., speed, acceleration, steering) to characterize behavioral context. They used it for cluster analysis to identify distinct driver behaviors and detect potential attacks. Li et al. [16] leveraged correlations from multiple sensors to train a regression model that estimates a targeted sensor value. They used the difference between the estimated and observed sensor values as an anomaly signature. Sharma et al. [17] proposed to compute Pearson correlation matrices of geolocation-related signals (e.g., latitude, longitude, elevation, speed, heading) to estimate the state of neighboring vehicles and detect location forging misbehavior based on correlation matrices' distance. Guo et al. [18] proposed Edge Computing Based Vehicle Anomaly Detection, which focuses on analyzing the time and frequency domains of sensor data to detect anomalies. In the first step, they flag abrupt changes in the correlations of sensor readings in the time domain as an indication of anomalies. For more accurate anomaly detection in the second step, they further analyze the sudden change in sensor readings by computing the change in power spectral density (PSD) of sensor data in the frequency domain. Under anomalous circumstances, the PSD is expected to be higher in the high-frequency band. He et al. [19] explored using correlations between heterogeneous sensors to identify consistency among sensor data (e.g., acceleration, engine RPM, vehicle speed, GPS) and then utilize the data to detect anomalous sensor measurements. They accomplished this by embedding the relationship of multiple sensors into an autoencoder and pinpointing anomalies based on the magnitude of the reconstruction loss. Leslie [20] developed an unsupervised learning method to detect malicious traffic over J1939 data. This method converts categorical features to numerical features with a one-hot encoding scheme and uses an ensemble AHC algorithm that integrates multiple linkage options.

Compared with the studies mentioned above, the present paper is unique in that we model temporal and signal-wise dependencies between CAN signals using time series clustering [21]. Specifically, we use AHC to generate a hierarchical relationship between signals known as a dendrogram [22]. Using a hypothesis test, we show that masquerade attacks are detectable by the resultant distribution of clustering similarities. In addition, our method is tested on real CAN data containing hundreds of signals, as opposed to previous methods that used a dozen signals at most.

III. METHODS

Our focus is on processing a set of N signals (i.e., time series, S = {X1, X2, ..., XN}) obtained from a CAN log captured during a vehicle's drive. The subsections below explain the mathematical details of each step of our method and the data source used to perform this research. The proposed framework applies AHC (see § III-A) to produce a dendrogram of clusters of S. Given two captures, each producing its corresponding dendrogram, we compute a similarity between the dendrograms using the CluSim method (see § III-B) [23]. Finally, the pairwise similarities from each capture's dendrograms are used to create a hypothesis test to distinguish between a benign CAN capture and an attack CAN capture.

A. Hierarchical Clustering
Hierarchical clustering is a method that outputs a hierarchy of clusters (i.e., a set of nested clusters that are organized in a tree-like diagram known as a dendrogram). It works by transforming a proximity matrix into a sequence of nested partitions. Figure 1 depicts the details of a hierarchical clustering and its subsequent dendrogram using the agglomerative approach. The mathematical formulation of hierarchical clustering can be found in Appendix A. Agglomerative

by this method exists in the range [0, 1], where 0 implies maximally dissimilar clusterings, and 1 corresponds to identical clusterings. We parametrized the clustering similarity method by letting r = −5.0 and α = 0.9. Figure 2 shows a comparison between similarity scores of three dendrograms. The key advantage of CluSim is that it does not suffer from critical biases found in previous methods (e.g., normalized mutual information) and works for hierarchical clusterings, including in conditions of skewed cluster sizes and a different number of clusters. We detail the main steps of CluSim in Appendix B.
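To make the idea of a similarity score in [0, 1] between two clusterings concrete, the following sketch compares two flat partitions of the same signals. It is a simplified stand-in, not the CluSim computation: it ignores the hierarchy and the parameters r and α, and the function name `partition_similarity` and the example partitions are hypothetical. For each signal, it measures the overlap between the cluster that contains it in each clustering and then averages over signals.

```python
def partition_similarity(labels_a, labels_b):
    """Average per-element overlap between two flat clusterings.

    labels_a and labels_b map each signal name to a cluster label.
    For every signal, let A and B be the clusters that contain it in the
    two clusterings; its score is |A & B| / max(|A|, |B|). The mean over
    signals lies in [0, 1] and equals 1 only for identical partitions.
    """
    if set(labels_a) != set(labels_b):
        raise ValueError("both clusterings must cover the same signals")

    def members(labels):
        # Map each signal to the full set of signals sharing its cluster.
        groups = {}
        for signal, cluster in labels.items():
            groups.setdefault(cluster, set()).add(signal)
        return {signal: groups[cluster] for signal, cluster in labels.items()}

    in_a, in_b = members(labels_a), members(labels_b)
    scores = [
        len(in_a[s] & in_b[s]) / max(len(in_a[s]), len(in_b[s]))
        for s in labels_a
    ]
    return sum(scores) / len(scores)


# Hypothetical partitions of four signals.
benign = {"X1": 0, "X4": 0, "X2": 1, "X3": 1}
relabeled = {"X1": "a", "X4": "a", "X2": "b", "X3": "b"}  # same grouping
shifted = {"X1": 0, "X4": 1, "X2": 1, "X3": 1}  # X4 changed cluster

print(partition_similarity(benign, relabeled))
print(partition_similarity(benign, shifted))
```

The score depends only on which signals are grouped together, not on the label names, so relabeling clusters leaves it unchanged while moving a single signal lowers it.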
Clusterings
<latexit sha1_base64="PO0NqngryUnjKe32h7rSS1UGwRM=">AAAB+3icbVBNS8NAEN3Ur1q/Yj16CRbBU0mkqMdCLx4r2A9oQ9lsJ+3SzSbsTqQl9K948aCIV/+IN/+N2zYHbX0w8Hhvhpl5QSK4Rtf9tgpb2zu7e8X90sHh0fGJfVpu6zhVDFosFrHqBlSD4BJayFFAN1FAo0BAJ5g0Fn7nCZTmsXzEWQJ+REeSh5xRNNLALvcRppg1RKoRFJcjPR/YFbfqLuFsEi8nFZKjObC/+sOYpRFIZIJq3fPcBP2MKuRMwLzUTzUklE3oCHqGShqB9rPl7XPn0ihDJ4yVKYnOUv09kdFI61kUmM6I4livewvxP6+XYnjnZ1wmKYJkq0VhKhyMnUUQzpArYChmhlCmuLnVYWOqKDM56JIJwVt/eZO0r6veTbX2UKvUa3kcRXJOLsgV8cgtqZN70iQtwsiUPJNX8mbNrRfr3fpYtRasfOaM/IH1+QP9GZUG</latexit>

<latexit sha1_base64="lh15BHjErcxaseqYjOIG8rS4yCs=">AAACfnicbVFNT9tAEJ24tID7BYVbL1ZRKyq5qTeCJrmh9lCOtGoAKbHQeDMOK9Zrs7uuiCz/Dq7tz+LfsHGgIqFPWunNmzea0dukkMLYKLppeU9Wnj5bXVv3n794+er1xuabY5OXmtOA5zLXpwkakkLRwAor6bTQhFki6SS5+Dbrn/wmbUSuftlpQXGGEyVSwdE6KR5ZurJJWu3ix9o/29iJ2lGD4DFhd2TnYDttcHS22RqOxjkvM1KWSzRmyKLCxhVqK7ik2h+VhgrkFzihoaMKMzJx1VxdB++dMg7SXLunbNCoDycqzIyZZolzZmjPzXJvJv6vNyxt2osroYrSkuLzRWkpA5sHswiCsdDErZw6glwLd2vAz1Ejty6ohS1X81N9fzSm1CXclBVOSsxQu7qufn7/Wledbsj2eyFj/XrRmWtUk38u1g37+2FnyVOUupD3HuY8jHVC1u3X7jvYcviPyXGnzb609364f9mDOdbgLbyDXWDQhQM4hCMYAIdLuIY/8NcD74P3yfs8t3qtu5ktWIDXuwW8B8KR</latexit> <latexit sha1_base64="yc6rs5zDFq7J5h+bSPd5R7Sqb9k=">AAACfnicbVFNT9tAEJ24tID7BYVbL1ZRKyq5qTeCJrmh9lCOtGoAKbHQeDMOK9Zrs7uuiCz/Dq7tz+LfsHGgIqFPWunNmzea0dukkMLYKLppeU9Wnj5bXVv3n794+er1xuabY5OXmtOA5zLXpwkakkLRwAor6bTQhFki6SS5+Dbrn/wmbUSuftlpQXGGEyVSwdE6KR5ZurJJWu0mH2v/bGMnakcNgseE3ZGdg+20wdHZZms4Gue8zEhZLtGYIYsKG1eoreCSan9UGiqQX+CEho4qzMjEVXN1Hbx3yjhIc+2eskGjPpyoMDNmmiXOmaE9N8u9mfi/3rC0aS+uhCpKS4rPF6WlDGwezCIIxkITt3LqCHIt3K0BP0eN3LqgFrZczU/1/dGYUpdwU1Y4KTFD7eq6+vn9a111uiHb74WM9etFZ65RTf65WDfs74edJU9R6kLee5jzMNYJWbdfu+9gy+E/JsedNvvS3vvh/mUP5liDt/AOdoFBFw7gEI5gABwu4Rr+wF8PvA/eJ+/z3Oq17ma2YAFe7xa+GMKS</latexit> <latexit sha1_base64="jNoJxxHFmZLNoi5fQ44jZY/YFjg=">AAACfnicbVFNT9tAEJ24tID7BYVbL1ZRKyq5qTeCJrmh9lCOtGoAKbHQeDMOK9Zrs7uuiCz/Dq7tz+LfsHGgIqFPWunNmzea0dukkMLYKLppeU9Wnj5bXVv3n794+er1xuabY5OXmtOA5zLXpwkakkLRwAor6bTQhFki6SS5+Dbrn/wmbUSuftlpQXGGEyVSwdE6KR5ZurJJWu3yj7V/trETtaMGwWPC7sjOwXba4OhsszUcjXNeZqQsl2jMkEWFjSvUVnBJtT8qDRXIL3BCQ0cVZmTiqrm6Dt47ZRykuXZP2aBRH05UmBkzzRLnzNCem+XeTPxfb1jatBdXQhWlJcXni9JSBjYPZhEEY6GJWzl1BLkW7taAn6NGbl1QC1uu5qf6/mhMqUu4KSuclJihdnVd/fz+ta463ZDt90LG+vWiM9eoJv9crBv298POkqcodSHvPcx5GOuErNuv3Xew5fAfk+NOm31p7/1w/7IHc6zBW3gHu8CgCwdwCEcwAA6XcA1/4K8H3gfvk/d5bvVadzNbsACvdwvAKcKT</latexit>

X1 (a) (b) (c)


<latexit sha1_base64="76yJpN9sto+vxFA0O6hXKkWatRs=">AAAB7HicbVBNS8NAEJ34WetX1aOXxSJ4KokU9Vjw4rGCaQttLJvtpl262YTdiVBCf4MXD4p49Qd589+4bXPQ1gcDj/dmmJkXplIYdN1vZ219Y3Nru7RT3t3bPzisHB23TJJpxn2WyER3Qmq4FIr7KFDyTqo5jUPJ2+H4dua3n7g2IlEPOEl5ENOhEpFgFK3kdx5zb9qvVN2aOwdZJV5BqlCg2a989QYJy2KukElqTNdzUwxyqlEwyaflXmZ4StmYDnnXUkVjboJ8fuyUnFtlQKJE21JI5urviZzGxkzi0HbGFEdm2ZuJ/3ndDKObIBcqzZArtlgUZZJgQmafk4HQnKGcWEKZFvZWwkZUU4Y2n7INwVt+eZW0LmveVa1+X6826kUcJTiFM7gAD66hAXfQBB8YCHiGV3hzlPPivDsfi9Y1p5g5gT9wPn8AnHeOhg==</latexit>
Distance

4
<latexit sha1_base64="buNJnrc6wr1cnqVqdrv8EfK/G8k=">AAAB7HicbVBNS8NAEJ34WetX1aOXxSJ4KokU9Vjw4rGCaQttLJvtpl262YTdiVBCf4MXD4p49Qd589+4bXPQ1gcDj/dmmJkXplIYdN1vZ219Y3Nru7RT3t3bPzisHB23TJJpxn2WyER3Qmq4FIr7KFDyTqo5jUPJ2+H4dua3n7g2IlEPOEl5ENOhEpFgFK3kdx7z+rRfqbo1dw6ySryCVKFAs1/56g0SlsVcIZPUmK7nphjkVKNgkk/LvczwlLIxHfKupYrG3AT5/NgpObfKgESJtqWQzNXfEzmNjZnEoe2MKY7MsjcT//O6GUY3QS5UmiFXbLEoyiTBhMw+JwOhOUM5sYQyLeythI2opgxtPmUbgrf88ippXda8q1r9vl5t1Is4SnAKZ3ABHlxDA+6gCT4wEPAMr/DmKOfFeXc+Fq1rTjFzAn/gfP4AoQaOiQ==</latexit>

2
<latexit sha1_base64="rRmNyVlC88pSJpvD9EBst9Lbs+8=">AAAB7HicbVBNS8NAEJ3Ur1q/qh69LBbBU0lKUY8FLx4rmLbQxrLZbtulm03YnQgl9Dd48aCIV3+QN/+N2zYHbX0w8Hhvhpl5YSKFQdf9dgobm1vbO8Xd0t7+weFR+fikZeJUM+6zWMa6E1LDpVDcR4GSdxLNaRRK3g4nt3O//cS1EbF6wGnCg4iOlBgKRtFKfucxq8365YpbdRcg68TLSQVyNPvlr94gZmnEFTJJjel6boJBRjUKJvms1EsNTyib0BHvWqpoxE2QLY6dkQurDMgw1rYUkoX6eyKjkTHTKLSdEcWxWfXm4n9eN8XhTZAJlaTIFVsuGqaSYEzmn5OB0JyhnFpCmRb2VsLGVFOGNp+SDcFbfXmdtGpV76pav69XGvU8jiKcwTlcggfX0IA7aIIPDAQ8wyu8Ocp5cd6dj2VrwclnTuEPnM8fnfyOhw==</latexit>

X X <latexit sha1_base64="ORpKQ/9xAsYZw1oBTpa4tcbORvA=">AAACj3icbVFNaxsxEJU3aZtuv+ykt+QgGgo2LGa1JFn7kGKaQ9tbWuokYBszK2sdEUm7SNoQs+ylv6bX9t/031Rep6V2+kDwZuYNM3qT5IIbG4a/Gt7W9qPHT3ae+s+ev3j5qtnavTBZoSkb0kxk+ioBwwRXbGi5Fewq1wxkIthlcnO2rF/eMm14pr7aRc4mEuaKp5yCdalp82Bs2Z0tDZdVuw2dALeTTgef4rDbi/xp8zDshjXwQ0LuyeHgdVrjfNpqjMazjBaSKUsFGDMiYW4nJWjLqWCVPy4My4HewJyNHFUgmZmU9Tcq/NZlZjjNtHvK4jr7b0cJ0piFTJxSgr02m7Vl8n+1UWHT3qTkKi8sU3Q1KC0EthleeoJnXDNqxcIRoJq7XTG9Bg3UOufWptytVvX98YylzvI6LGFegATt4qr88uF9VUZxQI57ASH9al2ZaVDzvyoSB/3jINrQ5IXOxR8NcRpCooDE/cqdg2ya/5BcRF1y0j367O5yhFbYQfvoDWojgmI0QB/RORoiir6h7+gH+um1vNh75w1WUq9x37OH1uB9+g0yZcZu</latexit> <latexit sha1_base64="kPtqeFbkxWGq/kxGNvMMkGoDLQ8=">AAACj3icbVFNTxsxEHWWltLtB0nprT1YRZUSaRWtI2DJARS1h5YbrRpASqJo1vEGC9u7sr2IaLUXfg1X+Df8mzobWpHAkyy9mXmjGb+JM8GNDcP7mrf24uX6q43X/pu3795v1hsfTkyaa8r6NBWpPovBMMEV61tuBTvLNAMZC3YaX3yf108vmTY8VX/sLGMjCVPFE07ButS4/nlo2ZUtDJdlsxm3AtykrRY+wGE72vPH9e2wHVbATwl5INu9j0mF43GjNhhOUppLpiwVYMyAhJkdFaAtp4KV/jA3LAN6AVM2cFSBZGZUVN8o8VeXmeAk1e4pi6vs444CpDEzGTulBHtuVmvz5HO1QW6T/VHBVZZbpuhiUJILbFM89wRPuGbUipkjQDV3u2J6Dhqodc4tTblarOr7wwlLnOVVWMA0BwnaxWXx+8e3suhEAdndDwjplsvKVIOa/leRKOjuBp0VTZbrTPzTEKchpBOQqFu6c5BV85+Sk06b7LV3frm77KAFNtAn9AU1EUER6qGf6Bj1EUXX6Abdojuv4UXeoddbSL3aQ88WWoJ39Bc8xsZz</latexit>

{{X 1 , X 4 , X 2 , X 3 }}
<latexit sha1_base64="tHtQS5nNUb+jjgkoA5NlxTAuHLY=">AAACq3icbVFLbxMxEHaWVwmvFLhxsaiQOCzReklJcqvgAMfySBuRDdGsM5ta9T5ke1Ejy3+FX8MV7vwbnN2CSMpIo/k8841n9E1aSaFNFP3qBNeu37h5a+92987de/cf9PYfnuiyVhwnvJSlmqagUYoCJ0YYidNKIeSpxNP0/M2mfvoVlRZl8cmsK5znsCpEJjgYn1r0RolNDF6Y5idb1aqS6KxN7PSLZS6kPgzaELfhpUuc877oHUT9qDF6FbBLcHD0OGvseLHfmSXLktc5FoZL0HrGosrMLSgjuB/aTWqNFfBzWOHMwwJy1HPbLOboM59Z0qxU3gtDm+y/HRZyrdd56pk5mDO9W9sk/1eb1SYbza0oqtpgwdtBWS2pKelGL7oUCrmRaw+AK+F3pfwMFHDjVd2actGu2u0mS8z8OVpJYVVDDsq/nf3w9rWz8TBkh6OQsbHbZpYKitVfFhuG48Mw3uH8OVDDYZ7DWByy4XhzDrYr/lVwEvfZq/7gvb/LgLS2R56Qp+Q5YWRIjsg7ckwmhJNv5Dv5QX4GL4KPwecgaalB57LnEdmyAH8DpmfVWQ==</latexit>

sim((a), (b)) = 0.82 sim((b), (c)) = 0.76

Distance
<latexit sha1_base64="e1gCTqy9Q3csip6NEZtqM4PlqD0=">AAAB+HicbVBNS8NAEN3Ur1o/GvXoJVgETyWRoh4LevBYwX5AG8pmO2mXbjZhdyLW0F/ixYMiXv0p3vw3btsctPXBwOO9GWbmBYngGl332yqsrW9sbhW3Szu7e/tl++CwpeNUMWiyWMSqE1ANgktoIkcBnUQBjQIB7WB8PfPbD6A0j+U9ThLwIzqUPOSMopH6drmH8IjZjdlEJYNp3664VXcOZ5V4OamQHI2+/dUbxCyNQCITVOuu5yboZ1QhZwKmpV6qIaFsTIfQNVTSCLSfzQ+fOqdGGThhrExJdObq74mMRlpPosB0RhRHetmbif953RTDKz/jMkkRJFssClPhYOzMUnAGXAFDMTGEMsXNrQ4bUUUZmqxKJgRv+eVV0jqvehfV2l2tUq/lcRTJMTkhZ8Qjl6RObkmDNAkjKXkmr+TNerJerHfrY9FasPKZI/IH1ucPV1GTgA==</latexit>

{{X 1 , X 4 }, {X 2 , X 3 }}
<latexit sha1_base64="nDdZAjrAeZjuo545xhGZsNfjglk=">AAACyXicbZFNbxMxEIadLR9l+UoLNy4WFRKHVbReUtLcKjhQiUtBJI2UDdGs402tej+wvSjB8om/xR/hypX+CJzdRmoSRrL8auaxZ/ROUgqudBj+bnl7d+7eu7//wH/46PGTp+2Dw6EqKknZgBaikKMEFBM8ZwPNtWCjUjLIEsEukqv3q/rFdyYVL/IvelmySQbznKecgnapaXsYGxxrttD1VwbmFWQg3WfWxGb01RAbYHd1bWydukUWEvL5mooa6s2Kwji2/rR9FHbCOvCuIDfi6PR5Wsf59KA1jmcFrTKWaypAqTEJSz0xIDWnglk/rhQrgV7BnI2dzCFjamLqWSx+5TIznBbSnVzjOnv7hYFMqWWWODIDfam2a6vk/2rjSqcnE8PzstIsp02jtBJYF3jlJp5xyagWSyeASu5mxfQSJFDtPN/osmhG9f14xlLn767fnz+8sybqBeT4JCCkbzfJtd8NRXpB/ziItpiykqVYM8QxhEQB6fWtWwfZNn9XDKMOedvpfnJ76aIm9tEL9BK9RgT10Ck6Q+dogCj6hf6gv+ja++h98xbejwb1WjdvnqGN8H7+A5Yh4dU=</latexit>

X3
<latexit sha1_base64="+cpSzqyrAVZ25KC2lJhMslHPYUk=">AAAB7HicbVBNS8NAEJ3Ur1q/qh69LBbBU0m0qMeCF48VTFtoY9lsp+3SzSbsboQS+hu8eFDEqz/Im//GbZuDtj4YeLw3w8y8MBFcG9f9dgpr6xubW8Xt0s7u3v5B+fCoqeNUMfRZLGLVDqlGwSX6hhuB7UQhjUKBrXB8O/NbT6g0j+WDmSQYRHQo+YAzaqzktx+zy2mvXHGr7hxklXg5qUCORq/81e3HLI1QGiao1h3PTUyQUWU4EzgtdVONCWVjOsSOpZJGqINsfuyUnFmlTwaxsiUNmau/JzIaaT2JQtsZUTPSy95M/M/rpGZwE2RcJqlByRaLBqkgJiazz0mfK2RGTCyhTHF7K2EjqigzNp+SDcFbfnmVNC+q3lW1dl+r1Gt5HEU4gVM4Bw+uoQ530AAfGHB4hld4c6Tz4rw7H4vWgpPPHMMfOJ8/n4GOiA==</latexit>

{{X 1 , X 4 }, {X 2 }, {X 3 }}
<latexit sha1_base64="4dxjUewpABYhO8aiZT32T2TitzA=">AAAC53ichZFLbxMxEMe9Wx7t8koLNy4WFRKHVbTepqS5VXCAY0GkjZQN0azjTa16vSvbixpZ/gzcEFc+FnwanE1BJEEwkuW/Zn7z0ExeC65NknwPwp1bt+/c3d2L7t1/8PBRZ//gXFeNomxIK1GpUQ6aCS7Z0HAj2KhWDMpcsIv86vUyfvGJKc0r+cEsajYpYS55wSkY75p2TGYzw65NW8nCvIESlK/lbGZHHy1xMfZfz2XOq3+R6f+RoyWSuSiadg6TbtIa3hbkRhyePilaO5vuB+NsVtGmZNJQAVqPSVKbiQVlOBXMRVmjWQ30CuZs7KWEkumJbedw+Ln3zHBRKf+kwa33zwwLpdaLMvdkCeZSb8aWzr/Fxo0pTiaWy7oxTNJVo6IR2FR4uWs844pRIxZeAFXcz4rpJSigxl9krcv1atQoymas8Bvb3uD7N6+cTfsxOT6JCRm4dbJSIOe/KdKPB8dxusHUjarFL4Z4hpA0Jv2B8+cgm8vfFudpl7zs9t75u/TQynbRU/QMvUAE9dEpeovO0BBR9CNAwV4QhTz8HH4Jv67QMLjJeYzWLPz2E6t27UY=</latexit>

<latexit sha1_base64="e1gCTqy9Q3csip6NEZtqM4PlqD0=">AAAB+HicbVBNS8NAEN3Ur1o/GvXoJVgETyWRoh4LevBYwX5AG8pmO2mXbjZhdyLW0F/ixYMiXv0p3vw3btsctPXBwOO9GWbmBYngGl332yqsrW9sbhW3Szu7e/tl++CwpeNUMWiyWMSqE1ANgktoIkcBnUQBjQIB7WB8PfPbD6A0j+U9ThLwIzqUPOSMopH6drmH8IjZjdlEJYNp3664VXcOZ5V4OamQHI2+/dUbxCyNQCITVOuu5yboZ1QhZwKmpV6qIaFsTIfQNVTSCLSfzQ+fOqdGGThhrExJdObq74mMRlpPosB0RhRHetmbif953RTDKz/jMkkRJFssClPhYOzMUnAGXAFDMTGEMsXNrQ4bUUUZmqxKJgRv+eVV0jqvehfV2l2tUq/lcRTJMTkhZ8Qjl6RObkmDNAkjKXkmr+TNerJerHfrY9FasPKZI/IH1ucPV1GTgA==</latexit>
{{X 1 }, {X 4 }, {X 2 }, {X 3 }}
<latexit sha1_base64="dsosNUe8GowCTi+iaKqoXwEi+mc=">AAACrnicbVFNb9NAEN2YAsV8pdBbLysqJA5W5DUpacSlag9wLBVJI8UmGm/WyarrtbW7RkQr/xh+Dddy5N+wcVKUpIx2pPdm3mhmZ9JScG3C8E/Le7D38NHj/Sf+02fPX7xsH7wa6qJSlA1oIQo1SkEzwSUbGG4EG5WKQZ4Kdp3eXCzz19+Z0ryQX82iZEkOM8kzTsG40KT9MbbYvdE3S2oc18GadDdJtEneLwl27vuT9nHYCRvD9wFZg+Ozw6yxy8lBaxxPC1rlTBoqQOsxCUuTWFCGU8FqP640K4HewIyNHZSQM53Y5pc1fusiU5wVyrk0uIluVljItV7kqVPmYOZ6N7cM/i83rkx2mlguy8owSVeNskpgU+DlyvCUK0aNWDgAVHE3K6ZzUECNW+xWlx+rUX0/nrLMXaShFmYV5KAcr+3Vp/PaRr2AnJwGhPTrbWWhQM7+qUgv6J8E0Y6mrFQp7jTEaQiJAtLr1+4cZHf598Ew6pAPne4Xd5cuWtk+OkJv0DtEUA+doc/oEg0QRT/RL3SLfnuhN/QSb7KSeq11zWu0Zd78LzJT0js=</latexit>

X1 X4 X2 X3
<latexit sha1_base64="76yJpN9sto+vxFA0O6hXKkWatRs=">AAAB7HicbVBNS8NAEJ34WetX1aOXxSJ4KokU9Vjw4rGCaQttLJvtpl262YTdiVBCf4MXD4p49Qd589+4bXPQ1gcDj/dmmJkXplIYdN1vZ219Y3Nru7RT3t3bPzisHB23TJJpxn2WyER3Qmq4FIr7KFDyTqo5jUPJ2+H4dua3n7g2IlEPOEl5ENOhEpFgFK3kdx5zb9qvVN2aOwdZJV5BqlCg2a989QYJy2KukElqTNdzUwxyqlEwyaflXmZ4StmYDnnXUkVjboJ8fuyUnFtlQKJE21JI5urviZzGxkzi0HbGFEdm2ZuJ/3ndDKObIBcqzZArtlgUZZJgQmafk4HQnKGcWEKZFvZWwkZUU4Y2n7INwVt+eZW0LmveVa1+X6826kUcJTiFM7gAD66hAXfQBB8YCHiGV3hzlPPivDsfi9Y1p5g5gT9wPn8AnHeOhg==</latexit> <latexit sha1_base64="buNJnrc6wr1cnqVqdrv8EfK/G8k=">AAAB7HicbVBNS8NAEJ34WetX1aOXxSJ4KokU9Vjw4rGCaQttLJvtpl262YTdiVBCf4MXD4p49Qd589+4bXPQ1gcDj/dmmJkXplIYdN1vZ219Y3Nru7RT3t3bPzisHB23TJJpxn2WyER3Qmq4FIr7KFDyTqo5jUPJ2+H4dua3n7g2IlEPOEl5ENOhEpFgFK3kdx7z+rRfqbo1dw6ySryCVKFAs1/56g0SlsVcIZPUmK7nphjkVKNgkk/LvczwlLIxHfKupYrG3AT5/NgpObfKgESJtqWQzNXfEzmNjZnEoe2MKY7MsjcT//O6GUY3QS5UmiFXbLEoyiTBhMw+JwOhOUM5sYQyLeythI2opgxtPmUbgrf88ippXda8q1r9vl5t1Is4SnAKZ3ABHlxDA+6gCT4wEPAMr/DmKOfFeXc+Fq1rTjFzAn/gfP4AoQaOiQ==</latexit> <latexit sha1_base64="rRmNyVlC88pSJpvD9EBst9Lbs+8=">AAAB7HicbVBNS8NAEJ3Ur1q/qh69LBbBU0lKUY8FLx4rmLbQxrLZbtulm03YnQgl9Dd48aCIV3+QN/+N2zYHbX0w8Hhvhpl5YSKFQdf9dgobm1vbO8Xd0t7+weFR+fikZeJUM+6zWMa6E1LDpVDcR4GSdxLNaRRK3g4nt3O//cS1EbF6wGnCg4iOlBgKRtFKfucxq8365YpbdRcg68TLSQVyNPvlr94gZmnEFTJJjel6boJBRjUKJvms1EsNTyib0BHvWqpoxE2QLY6dkQurDMgw1rYUkoX6eyKjkTHTKLSdEcWxWfXm4n9eN8XhTZAJlaTIFVsuGqaSYEzmn5OB0JyhnFpCmRb2VsLGVFOGNp+SDcFbfXmdtGpV76pav69XGvU8jiKcwTlcggfX0IA7aIIPDAQ8wyu8Ocp5cd6dj2VrwclnTuEPnM8fnfyOhw==</latexit> <latexit 
sha1_base64="+cpSzqyrAVZ25KC2lJhMslHPYUk=">AAAB7HicbVBNS8NAEJ3Ur1q/qh69LBbBU0m0qMeCF48VTFtoY9lsp+3SzSbsboQS+hu8eFDEqz/Im//GbZuDtj4YeLw3w8y8MBFcG9f9dgpr6xubW8Xt0s7u3v5B+fCoqeNUMfRZLGLVDqlGwSX6hhuB7UQhjUKBrXB8O/NbT6g0j+WDmSQYRHQo+YAzaqzktx+zy2mvXHGr7hxklXg5qUCORq/81e3HLI1QGiao1h3PTUyQUWU4EzgtdVONCWVjOsSOpZJGqINsfuyUnFmlTwaxsiUNmau/JzIaaT2JQtsZUTPSy95M/M/rpGZwE2RcJqlByRaLBqkgJiazz0mfK2RGTCyhTHF7K2EjqigzNp+SDcFbfnmVNC+q3lW1dl+r1Gt5HEU4gVM4Bw+uoQ530AAfGHB4hld4c6Tz4rw7H4vWgpPPHMMfOJ8/n4GOiA==</latexit>

Time series
<latexit sha1_base64="pw5LD79KoXv6qexSC9opI02A02g=">AAAB+3icbVDLSgNBEJyNrxhfazx6GQyCp7ArQT0GvHiMkBckS5iddJIhsw9meiVh2V/x4kERr/6IN//GSbIHTSxoKKq66e7yYyk0Os63Vdja3tndK+6XDg6Pjk/s03JbR4ni0OKRjFTXZxqkCKGFAiV0YwUs8CV0/On9wu88gdIiCps4j8EL2DgUI8EZGmlgl/sIM0ybIgCqQQnQ2cCuOFVnCbpJ3JxUSI7GwP7qDyOeBBAil0zrnuvE6KVMoeASslI/0RAzPmVj6BkasgC0ly5vz+ilUYZ0FClTIdKl+nsiZYHW88A3nQHDiV73FuJ/Xi/B0Z2XijBOEEK+WjRKJMWILoKgQ6GAo5wbwrgS5lbKJ0wxjiaukgnBXX95k7Svq+5NtfZYq9RreRxFck4uyBVxyS2pkwfSIC3CyYw8k1fyZmXWi/VufaxaC1Y+c0b+wPr8AXPylK0=</latexit>

X1 X4 X2 X3
<latexit sha1_base64="76yJpN9sto+vxFA0O6hXKkWatRs=">AAAB7HicbVBNS8NAEJ34WetX1aOXxSJ4KokU9Vjw4rGCaQttLJvtpl262YTdiVBCf4MXD4p49Qd589+4bXPQ1gcDj/dmmJkXplIYdN1vZ219Y3Nru7RT3t3bPzisHB23TJJpxn2WyER3Qmq4FIr7KFDyTqo5jUPJ2+H4dua3n7g2IlEPOEl5ENOhEpFgFK3kdx5zb9qvVN2aOwdZJV5BqlCg2a989QYJy2KukElqTNdzUwxyqlEwyaflXmZ4StmYDnnXUkVjboJ8fuyUnFtlQKJE21JI5urviZzGxkzi0HbGFEdm2ZuJ/3ndDKObIBcqzZArtlgUZZJgQmafk4HQnKGcWEKZFvZWwkZUU4Y2n7INwVt+eZW0LmveVa1+X6826kUcJTiFM7gAD66hAXfQBB8YCHiGV3hzlPPivDsfi9Y1p5g5gT9wPn8AnHeOhg==</latexit> <latexit sha1_base64="buNJnrc6wr1cnqVqdrv8EfK/G8k=">AAAB7HicbVBNS8NAEJ34WetX1aOXxSJ4KokU9Vjw4rGCaQttLJvtpl262YTdiVBCf4MXD4p49Qd589+4bXPQ1gcDj/dmmJkXplIYdN1vZ219Y3Nru7RT3t3bPzisHB23TJJpxn2WyER3Qmq4FIr7KFDyTqo5jUPJ2+H4dua3n7g2IlEPOEl5ENOhEpFgFK3kdx7z+rRfqbo1dw6ySryCVKFAs1/56g0SlsVcIZPUmK7nphjkVKNgkk/LvczwlLIxHfKupYrG3AT5/NgpObfKgESJtqWQzNXfEzmNjZnEoe2MKY7MsjcT//O6GUY3QS5UmiFXbLEoyiTBhMw+JwOhOUM5sYQyLeythI2opgxtPmUbgrf88ippXda8q1r9vl5t1Is4SnAKZ3ABHlxDA+6gCT4wEPAMr/DmKOfFeXc+Fq1rTjFzAn/gfP4AoQaOiQ==</latexit> <latexit sha1_base64="rRmNyVlC88pSJpvD9EBst9Lbs+8=">AAAB7HicbVBNS8NAEJ3Ur1q/qh69LBbBU0lKUY8FLx4rmLbQxrLZbtulm03YnQgl9Dd48aCIV3+QN/+N2zYHbX0w8Hhvhpl5YSKFQdf9dgobm1vbO8Xd0t7+weFR+fikZeJUM+6zWMa6E1LDpVDcR4GSdxLNaRRK3g4nt3O//cS1EbF6wGnCg4iOlBgKRtFKfucxq8365YpbdRcg68TLSQVyNPvlr94gZmnEFTJJjel6boJBRjUKJvms1EsNTyib0BHvWqpoxE2QLY6dkQurDMgw1rYUkoX6eyKjkTHTKLSdEcWxWfXm4n9eN8XhTZAJlaTIFVsuGqaSYEzmn5OB0JyhnFpCmRb2VsLGVFOGNp+SDcFbfXmdtGpV76pav69XGvU8jiKcwTlcggfX0IA7aIIPDAQ8wyu8Ocp5cd6dj2VrwclnTuEPnM8fnfyOhw==</latexit> <latexit 
sha1_base64="+cpSzqyrAVZ25KC2lJhMslHPYUk=">AAAB7HicbVBNS8NAEJ3Ur1q/qh69LBbBU0m0qMeCF48VTFtoY9lsp+3SzSbsboQS+hu8eFDEqz/Im//GbZuDtj4YeLw3w8y8MBFcG9f9dgpr6xubW8Xt0s7u3v5B+fCoqeNUMfRZLGLVDqlGwSX6hhuB7UQhjUKBrXB8O/NbT6g0j+WDmSQYRHQo+YAzaqzktx+zy2mvXHGr7hxklXg5qUCORq/81e3HLI1QGiao1h3PTUyQUWU4EzgtdVONCWVjOsSOpZJGqINsfuyUnFmlTwaxsiUNmau/JzIaaT2JQtsZUTPSy95M/M/rpGZwE2RcJqlByRaLBqkgJiazz0mfK2RGTCyhTHF7K2EjqigzNp+SDcFbfnmVNC+q3lW1dl+r1Gt5HEU4gVM4Bw+uoQ530AAfGHB4hld4c6Tz4rw7H4vWgpPPHMMfOJ8/n4GOiA==</latexit>


Fig. 1. A hierarchical clustering using the agglomerative approach. (a) An example of a hierarchical clustering: here, $\{X_1, X_2, X_3, X_4\}$ is a set of time series to be clustered. They are grouped in a hierarchy of clusters by color (i.e., aquamarine, orange, purple). The thickness of the clusters represents how the hierarchy is built: close time series (aquamarine), more distant time series (orange), and most distant time series (purple). (b) Corresponding dendrogram of the hierarchical clustering depicted in (a). Time series are placed in the x-axis, and their relative distance is shown in the y-axis. Cluster colors correspond with the ones in (a). (c) Different clusters at each level of the hierarchy written explicitly. Note that cutting the dendrogram horizontally creates clusterings.

Fig. 2. Hierarchical clustering similarity comparison using the CluSim method [23]. Both (a) and (c) depict slightly modified dendrograms from that of (b). The similarity between (a) and (b) dendrograms is 0.82, whereas the similarity between (b) and (c) dendrograms is 0.76. This reflects that (a) and (b) are more similar than (b) and (c).

algorithms¹ require a definition of dissimilarity between clusters called a linkage. The most popular linkages are the (a) single linkage, which is the smallest dissimilarity between two points in opposite clusters; (b) complete linkage, which is the largest dissimilarity between two points in opposite clusters; (c) average linkage, which is the average dissimilarity over all points in opposite groups; and (d) Ward’s linkage, which focuses on how the sum of squares will increase when opposite groups are merged (or on the analysis of cluster variance). Ward’s linkage tends to produce similar clusters as the k-means method [21].

¹ There are two main categories of hierarchical clustering: agglomerative and divisive. Agglomerative places each object in its own cluster and gradually merges these atomic clusters into larger ones until all objects belong to a single cluster. Divisive reverses the process starting with all objects belonging to a single cluster and dividing them into smaller pieces [24]. Here, we used the agglomerative approach by virtue of its simplicity.

Given a CAN capture that has been translated into its constituent signal time series, $S = \{X_1, X_2, \ldots, X_N\}$, we wish to cluster these time series to produce a dendrogram that represents their hierarchical structure. Each linkage choice (a)–(d) produces a potentially different dendrogram. Understanding the most effective choice is a research question we address.

B. Clustering Similarity
Given two hierarchical clusterings (dendrograms) of a set $S$, a clustering similarity quantifies a distance between them. We computed the similarity between dendrograms using the open-source CluSim method [25]. The similarity value provided

C. Dataset
We used the ROAD dataset [4] to test our proposed forensic framework. The ROAD dataset is an open set of CAN data collected from a real vehicle with fabrication attacks and a few advanced attacks (e.g., masquerade attacks). All of these attacks are physically verified (i.e., the effect of the CAN manipulation is observed and documented). Notably, masquerade attacks are also included but are simulated from the targeted ID fabrication attacks by removing the benign frames of the target ID. The ROAD dataset provides translated CAN time series following a similar schema used by Hanselmann et al. [13]. The fundamental advantage of using the ROAD dataset over prior released datasets is that it contains realistic, verified, and labeled attacks as opposed to synthetic ones. This opens the possibility for the evaluation, comparison, and validation of CAN signal-based IDS methods in realistic conditions.

We tested our forensic framework on the subset of masquerade attacks within the ROAD dataset. Each masquerade attack file in the ROAD dataset contains time series from hundreds of IDs that have a few to dozens of signals each. Table I shows the files we used from the ROAD dataset. Specifically, we tested the following attacks (in increasing order of detection difficulty): correlated signal, max speedometer, max engine coolant temperature, reverse light on, and reverse light off. In the correlated signal attack, the correlation of the four wheel speed values is altered by manipulating their individual values. In the max speedometer and max engine coolant attacks, the speedometer and coolant temperature values are modified to their maximum. In the reverse light attacks, the state of the reverse lights is altered to not match what gear the car is using
(i.e., the reverse light is on when the vehicle is not in reverse, and the reverse light is off when the vehicle is in reverse).

We used the complete set of 12 files to characterize the behavior in benign conditions (≈3 hours of data). To characterize the attack conditions, we used each of the files in the masquerade attack category: 3 files for the correlated attack (≈1.3 minutes), 3 files for the max speedometer attack (≈3.9 minutes), 1 file for the max engine coolant attack (≈0.4 minutes), 3 files for the reverse light on attack (≈3.2 minutes), and 3 files for the reverse light off attack (≈2.1 minutes).

TABLE I
CAN CAPTURES USED FROM THE ROAD DATASET [4].

Description                                    # Files   Used   Duration (min)
Training
  Dynamometer Various Ambient                  10        ✓      108.2
  Road Various Ambient                         2         ✓      70.6
  Total                                        12        12     178.8
Testing
  Correlated Signal Fabrication Attack         3         ✗      1.3
  Correlated Signal Masquerade Attack          3         ✓      1.3
  Fuzzing Fabrication Attack                   3         ✗      0.7
  Max Engine Coolant Temp Fabrication Attack   1         ✗      0.4
  Max Engine Coolant Temp Masquerade Attack    1         ✓      0.4
  Max Speedometer Fabrication Attack           3         ✗      3.9
  Max Speedometer Masquerade Attack            3         ✓      3.9
  Reverse Light Off Fabrication Attack         3         ✗      2.1
  Reverse Light Off Masquerade Attack          3         ✓      2.1
  Reverse Light On Fabrication Attack          3         ✗      3.2
  Reverse Light On Masquerade Attack           3         ✓      3.2
  Accelerator Attack (In Drive)                2         ✗      2.7
  Accelerator Attack (In Reverse)              2         ✗      3.2
  Total                                        33        13     10.9

D. Pipeline Detailed Steps
The following steps are performed to arrive at the results.

1) Same Length Time Series Transformation: Each ID has a characteristic frequency that is unique in most cases. We modified the time series to have the same frequency by linearly interpolating them in common timestamps. We chose a baseline frequency of 10 Hz because it is the lowest frequency in the IDs in this dataset. This ensures that $\forall X_i \in S$, $|X_i| = T$. Time series of the same length enable easier computation of similarity. After this, we also discarded any constant time series and normalized each remaining series to the unit norm.

2) Time Series Correlation Computation: We computed pairwise Pearson correlations [26] among time series. Time series that have a positive correlation are expected to move in tandem (i.e., when one measurement increases or decreases, the other measurement also increases or decreases). Pearson correlation values that are close to ±1.0 indicate strong positive or negative correlation. As vehicle subsystems are dependent, we expect (1) clusters of correlated signals (e.g., increasing speed of the vehicle matches increases in the speedometer reading and the speed of all four wheels), and (2) such relationships to be broken or significantly changed upon a cyber attack.

3) Hierarchical Clustering Computation: These pairwise correlations populate a correlation matrix, which is used as the input for AHC. The output is a dendrogram depicting hierarchies between clusters. We explored the effect of linkage selection (i.e., single, complete, average, Ward) in our detection framework.

4) Similarity Distribution Computation: Once each dendrogram has been computed for each file, we compute empirical distributions of similarity between pairs of dendrograms using the method described in § III-B. We focus on two distinct groups. The first group is composed of all dendrograms derived from files in benign conditions (i.e., 12 files). In doing so, we computed pairwise similarities of dendrograms in this group, that is, $\binom{12}{2} = 66$ possible combinations. The second group comes from the similarity between dendrograms in each category of attack (i.e., correlated, max speedometer, max engine coolant, reverse light on, reverse light off) and each of the files in benign conditions. This produces a varying number of combinations based on the number of files in each of the attack categories.

5) Hypothesis Testing: We used the Mann-Whitney U test [27] and set the significance level to 0.05 to test the null hypothesis that the distribution underlying benign conditions is the same as the distribution underlying attack conditions. The Mann-Whitney U test is a nonparametric test often used as a test of difference in location between distributions.

E. Motivational Preliminary Data Analysis
As a first step to investigate our hypothesis that masquerade attacks will disrupt clustering based on correlation of the CAN signals, we compute and visualize the CluSim similarity (§ III-B) between every pair of files in the dataset discussed above (12 benign files and 13 masquerade attack files of five different attack scenarios, all in their signal time series format). More specifically, we follow the steps described in § III-D1 to interpolate time series to identical time steps, § III-D2 to compute Pearson correlation for each pair of signals in a CAN file, and § III-D3 to produce a dendrogram for the signals in each file. We then apply CluSim and visualize the pairwise similarity results in Figure 3.

To see if the signal cluster dendrograms of the benign files do indeed “look” similar to each other but different from signal cluster dendrograms from masquerade attacks, we apply AHC to all the files based on their CluSim similarities to each other. Figure 3 shows the resulting dendrogram revealing four main clusters. Notably, the first two (from left to right) contain all but one benign file (11/12), and only two (of 13) attack files. This provides strong empirical motivation to pursue a more formal detection experiment based on hierarchical clustering of signals.

IV. RESULTS
Here we present our results on the efficacy of the proposed forensic framework for detecting masquerade attacks in the CAN bus. We focus on analyzing the detection capabilities for each of the attacks described in § III-C. Figure 4 plots the probability density functions in the correlated attack (in benign and attack conditions) using the Gaussian kernel density estimate implementation from seaborn [28] with a default bandwidth. We study the effect of the linkage selection (in the hierarchical clustering) for distinguishing between benign and attack conditions: (a) single, (b) complete, (c) average, and
(d) Ward. We also report the p-value, using three decimals, of the associated Mann-Whitney U test to compare the two distributions in the inset; statistically significant values (i.e., p-value < 0.05) are printed in bold. Recall that we fixed the scaling parameter r = −5 for comparing hierarchical clusterings. This is because we want to capture differences at higher levels of the dendrograms, in which the focus is on coarser groups of multiple correlated signals, instead of more fine-grained groupings of individual to a few signals, in which not much emphasis is on their correlations.

Overall, we find that detecting attacks depends heavily on (1) the linkage function used to compute the hierarchical clusterings and (2) the severity of the attack in terms of the number of correlations perturbed. Specifically, out of the five attacks studied, the method based on Ward’s linkage detected all of them (5 of 5), followed by complete linkage (4 of 5). Both single and average linkage methods detected fewer attacks (3 of 5). We report the p-values resulting from running the forensic framework for the remaining attacks (i.e., max speedometer, max engine coolant, reverse light on, and reverse light off) for each linkage in Table II. We elaborate on each attack scenario below.

Fig. 3. CluSim cluster similarity heatmap for each pair of files from the ROAD dataset (12 benign files, 13 masquerade attack files of five attack scenario types). For each file, a hierarchical clustering dendrogram is produced based on similarity of the file’s CAN signals. For each pair of files, CluSim produces a similarity measure between the two files’ dendrograms using Ward’s linkage, r = −5.0, and α = 0.9, which is visualized by colors in the heatmap. Atop the heatmap is a dendrogram showing hierarchical clustering of all files based on their CluSim similarities. We used Euclidean distance on these similarities with Ward linkage. Notably, there are four main clusters. From left to right, the first and second contain all benign files except Benign dyno reverse file, and only two attack files, i.e., Attack correlated masquerade 3 and Attack reverse light on 3, while the final two clusters contain all remaining attack files and the aforementioned benign file. As a preliminary analysis, this heatmap and clustering give positive results for masquerade attack detection by comparing Pearson correlation signal clusters.

TABLE II
STATISTICAL HYPOTHESIS TEST RESULTS (p-VALUES).

Attack Scenario      Single   Complete   Average   Ward
Correlated           0.005*   0.002*     0.123     0.000*
Max Speedometer      0.003*   0.017*     0.007*    0.000*
Max Engine Coolant   0.251    0.065      0.006*    0.008*
Reverse Light On     0.378    0.007*     0.057     0.004*
Reverse Light Off    0.039*   0.004*     0.004*    0.000*
Statistically significant values (p < 0.05) are marked with an asterisk.

A. Correlated Attack
Figure 4 shows the comparison of similarity distributions in the correlated attack. Among these, we found that the framework that used the average linkage (i.e., [c]) is not able to differentiate between benign and attack conditions. We also noticed that Ward’s method yields the most distinctive difference (i.e., the smallest p-value).

[Figure 4: four density panels, one per linkage; x-axis: Similarity; y-axis: Density; legend: Benign, Attack; inset p-values: (a) p=0.005, (b) p=0.002, (c) p=0.123, (d) p=0.000.]
Fig. 4. Empirical distribution comparison of the correlated attack for each linkage selection: (a) single, (b) complete, (c) average, and (d) Ward. Results from these distributions appear in the first row of Table II.

B. Max Speedometer Attack
Table II shows that for the max speedometer attack, each linkage option produces statistically significant differences. We notice again that the Ward linkage produces the most distinctive results. We believe that speedometer readings correlate closely with wheel speed and engine readings, so when the speedometer value is manipulated (via attack) to appear at its maximum, the broken correlations with these signals should be captured by the similarity distributions.

C. Max Engine Coolant Attack
Table II shows the results of the max engine coolant temperature attack. We notice that only average and Ward linkages detect significant differences. In this attack, the engine coolant signal value is set to maximum, which may cause correlations with other engine signals to differ.
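The mechanism invoked for the two "max" attacks (a signal pinned at its maximum stops moving in tandem with the signals it normally tracks) can be illustrated with a minimal Python sketch. Everything in it is a synthetic assumption made for illustration: the two signals, the 10 Hz grid, the attack onset, and the value `SPEEDO_MAX` are hypothetical and are not taken from the ROAD dataset, and the sketch is not the authors' implementation.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical 60 s capture sampled at 10 Hz.
t = [i / 10 for i in range(600)]
wheel_speed = [50 + 30 * math.sin(2 * math.pi * ti / 20) for ti in t]

# Benign case: the speedometer is a linear function of wheel speed.
speedometer = [0.62 * w for w in wheel_speed]

# Simulated masquerade: from t = 30 s on, the speedometer reports a
# constant maximum (SPEEDO_MAX is an assumed, illustrative value).
SPEEDO_MAX = 255.0
attack_start = 300
spoofed = speedometer[:attack_start] + [SPEEDO_MAX] * (len(t) - attack_start)

r_benign = pearson(wheel_speed, speedometer)
r_attack = pearson(wheel_speed, spoofed)
print(f"benign r = {r_benign:.3f}, attack r = {r_attack:.3f}")
```

In this toy setting the benign correlation is essentially 1, while the spoofed signal is only weakly (and negatively) correlated with wheel speed. A change of this kind in the correlation matrix is what the clustering step is expected to pick up.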
D. Reverse Light On Attack
Table II shows the comparison of similarity distributions in the reverse light on attack. Note that only the complete and Ward linkages produce statistically significant differences (i.e., [b] and [d]). We also note that, although statistically significant, these p-values are not as small as in the correlated attack, which is a consequence of having an attack that is more difficult to detect. This suggests that fewer correlated signals are affected under this attack (i.e., only a binary [1 bit] signal was targeted).

TABLE III
CLASSIFICATION RESULTS (%).

Attack Scenario      Precision   Recall   F1 score
Correlated           88.00       100.00   93.62
Max Speedometer      88.00       100.00   93.62
Max Engine Coolant   87.18       92.73    89.87
Reverse Light On     87.23       93.18    90.11
Reverse Light Off    88.00       100.00   93.62
False positive rate (FPR), defined as $\frac{FP}{FP+TN}$, equals 13.64%. Because this method is unsupervised, the training set is defined by the benign files in a given fold, not the attack files; hence, the false positive counts and rate are independent of the attack scenarios.

E. Reverse Light Off Attack
Table II shows that for the reverse light off attack, each
linkage method produces statistically significant differences.
Among these, Ward’s linkage produces the most significant
difference, followed by average and complete linkages. The
single linkage produces the least significant result, but it still
meets the threshold.
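The per-linkage detection counts quoted at the start of this section can be recomputed from Table II. The short Python sketch below does so; the p-values are copied from Table II and the 0.05 threshold is the significance level from § III-D5, while the dictionary layout and variable names are illustrative choices.

```python
ALPHA = 0.05  # significance level from § III-D5

# p-values copied from Table II, keyed by attack scenario and linkage.
P_VALUES = {
    "Correlated":         {"single": 0.005, "complete": 0.002, "average": 0.123, "ward": 0.000},
    "Max Speedometer":    {"single": 0.003, "complete": 0.017, "average": 0.007, "ward": 0.000},
    "Max Engine Coolant": {"single": 0.251, "complete": 0.065, "average": 0.006, "ward": 0.008},
    "Reverse Light On":   {"single": 0.378, "complete": 0.007, "average": 0.057, "ward": 0.004},
    "Reverse Light Off":  {"single": 0.039, "complete": 0.004, "average": 0.004, "ward": 0.000},
}

LINKAGES = ("single", "complete", "average", "ward")

# An attack counts as detected by a linkage when its p-value is below ALPHA.
detected = {
    linkage: sum(1 for row in P_VALUES.values() if row[linkage] < ALPHA)
    for linkage in LINKAGES
}

for linkage in LINKAGES:
    print(f"{linkage:>8}: {detected[linkage]} of {len(P_VALUES)} attacks detected")
```

The counts agree with the text: Ward detects 5 of 5, complete 4 of 5, and single and average 3 of 5 each.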
F. Detection Evaluation
We compute and compare the performance of the proposed framework for classifying benign and attack files. Recall that we use 12 benign files. To do so, we implement a cross-validation as follows: We set apart three benign files to be used for testing purposes (along with all attack files) and use the remaining nine files for training, that is, for computing the similarity distribution from benign files. We chose to hold out three benign files for testing to be consistent with the maximum number of attack files found in the attack dataset (i.e., correlated, max speedometer, reverse light on, and reverse light off attacks each have three attack files). We implement the above train-test split of our benign files for each of the $\binom{12}{9} = 220$ possible combinations. This experimental design allows us to decide if the difference between similarity distributions in benign and attack scenarios is statistically significant and further count the number of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN).

We use the best set of parameter values derived from our previous experiments, i.e., Ward linkage, r = −5.0, α = 0.9, and a significance level for the statistical hypothesis test of 0.05. We report the following micro-averaged classification metrics based on these numbers: Precision, defined as $\frac{TP}{TP+FP}$, gives the likelihood that the computed similarity distribution difference can be attributed to an attack; Recall, defined as $\frac{TP}{TP+FN}$, gives the likelihood that attack files are detected. Since higher precision often comes at the price of lower recall (and vice versa), it is important to consider a balance of both metrics, and the standard balanced metric is the F1 score, defined as $2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$. Table III summarizes these findings.

Because our method is unsupervised, the training set is defined by the benign files in a given fold, not the attack files; hence, the false positive counts and rate are independent of the attack scenarios. In our experiment, the overall FPR is 13.64%, with per-file FPR depicted in Figure 5.

Fig. 5. False positive rates for each benign file. All are at or below 20%, except Benign dyno drive benign anomaly at 27%. Comparing with Figure 3, we see the lone benign file to cluster with attack files in preliminary analysis (i.e., Benign dyno reverse) has FPR = 20%.

Unlike the false positive and true negative counts, the true positive and false negative results will vary across different attack scenarios. For correlated, max speedometer, and reverse light off attacks, our results are identical. In these attack scenarios, recall is 100.00%, meaning that all the attack configurations are detected even when changing the set of benign files used in training. In these attack scenarios, precision is 88%, meaning that among the detected configurations, a few come from the set of benign files (i.e., they are false positives). We obtain different results for the max engine coolant and reverse light on attacks. In particular, the lower results for the max engine coolant attack suggest that this attack was more difficult to detect when varying the set of benign files.

V. DISCUSSION
This work proposes a statistical forensic framework to detect masquerade attacks in the CAN bus. We quantify the empirical distribution of similarities of time series captures in benign and attack conditions. To accomplish this, we cluster time series using AHC and compute the similarity between their corresponding dendrograms. We find that masquerade attacks can be detected effectively using the proposed framework, and its discriminatory power depends on the linkage function being used in the AHC as well as the impact of the attacks on correlated signals.
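As a numerical aside on the detection evaluation of § IV-F, the metric definitions can be sanity-checked against Table III. The raw TP, FP, FN, and TN counts are not reported in the text, so the counts in the Python sketch below are assumptions: they are back-calculated under the further assumption of one benign decision and one attack decision per fold across the $\binom{12}{9} = 220$ folds, and were chosen only because they reproduce the reported percentages.

```python
import math

def rates(tp, fp, fn, tn):
    """Return (precision, recall, F1, FPR) in percent, rounded to 2 decimals."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)
    return tuple(round(100 * v, 2) for v in (precision, recall, f1, fpr))

# Number of train-test splits: choose 9 of the 12 benign files for training.
n_folds = math.comb(12, 9)

# ASSUMED counts (not reported in the paper), one decision per fold.
ASSUMED_FP = 30
ASSUMED_TN = n_folds - ASSUMED_FP
ASSUMED_TP = {
    "Correlated": 220,
    "Max Engine Coolant": 204,
    "Reverse Light On": 205,
}

results = {
    scenario: rates(tp, ASSUMED_FP, n_folds - tp, ASSUMED_TN)
    for scenario, tp in ASSUMED_TP.items()
}
for scenario, (precision, recall, f1, fpr) in results.items():
    print(f"{scenario}: P={precision} R={recall} F1={f1} FPR={fpr}")
```

Under these assumed counts, the sketch reproduces the Table III rows for the three scenarios and the 13.64% FPR. The same FP and TN values are reused for every scenario, which mirrors the statement that the false positive rate is independent of the attack scenario.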
These results suggest that the proposed framework is a viable approach for detecting masquerade attacks in a forensic setting. We assume that the time series signal translation (or at least a high-fidelity translation) is readily available for use. This seems feasible with current and upcoming work in reverse engineering CAN bus signals, such as CAN-D [14].

The proposed framework detects all masquerade attacks in the ROAD dataset when the Ward linkage is used. Note that Ward’s linkage (d) is an appropriate choice in this context because it tends to produce dense-enough clusters and enables the capture of meaningful changes in clustering assignations when attacks occur. In contrast, for the single linkage (a), clusters of signals tend to be spread out and often not compact enough, with clusters having disparate elements. In the complete linkage (b), clusters of signals tend to be compact, but not far enough apart, with clusters having similar members. Additionally, for the average linkage (c), clusters tend to be relatively compact and relatively far apart, which strikes a balance between single and complete linkages.

We note that the detection performance may also depend on specific attack features. Here, the detection difficulty is based on the potential number of correlated signals that are affected by the attack. Thus, an attack scenario in which wheel speed signals are modified, such as in the correlated attack, has a more noticeable effect of disrupting correlation with other signals than an attack that modifies the reverse lights because the wheel speed correlation attack manipulates four highly correlated signals (and seemingly strong correlations to many other signals), whereas the reverse light attacks modify a single signal that has correlation with gear selection but not many other signals.

Detection metrics are also affected by the number of files used to compute the similarity distribution. In other words, augmenting the number of files to estimate the similarity distribution helps to have better defined distributions that are later used for comparison purposes. This explains the lower results on the max engine coolant attack, which contains a single file.

To the best of our knowledge, the results from this research are the first to show systemic evidence of a forensic framework successfully detecting masquerade attacks based on time series clustering using a dataset of realistic and verified masquerade attacks. The following are some limitations of our work.

ROAD dataset conditions. The ROAD dataset was collected on a single vehicle while being exercised mostly on a dynamometer. We acknowledge that more comprehensive data collection using different vehicles may be necessary to generalize our findings. We also are aware that driving conditions may affect correlations of CAN signals, and the dynamometer conditions may be restrictive.

Parameter tuning. The proposed framework allows for flexible selection of linkage functions (e.g., single, complete, average, Ward) for computing the hierarchical clusterings and of the scaling parameters r and α to control the influence of hierarchical clusterings with shared lineages. Here, we fixed the values of r and α to focus on differences at higher levels of the dendrograms or in groups of correlated signals. However, we acknowledge that the optimal selection of these parameters may depend on the type of attack and driving conditions. We did not explore those variables in this research.

Not real-time detection. As is currently presented, this is not a real-time detector.

Baseline comparison. We did not compare our proposed forensic framework with other methods.

VI. CONCLUSION
In this research, we proposed a forensics framework for the detection of masquerade attacks in the CAN bus. To evaluate it experimentally, we compute time series clustering similarity. We show that the similarity of time series clusters under benign conditions exhibits statistically significant differences from the similarity of time series clusters under attack conditions. We demonstrated these differences under different attack scenarios with different levels of sophistication using data from the ROAD dataset. This work shows that it is possible to detect masquerade attacks by effectively using the time series clustering representation of signals in the CAN bus and appropriate choices of parameters to group them.

Future work in this area includes the development of a real-time IDS that uses the principles described in this work. Additional work includes the translation of such developments to edge computing devices that can be integrated with real-world vehicle conditions.

VII. ACKNOWLEDGMENTS
This research was sponsored in part by Oak Ridge National Laboratory’s (ORNL’s) Laboratory Directed Research and Development Program. This research used resources of the Compute and Data Environment for Science (CADES) at ORNL, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

REFERENCES
[1] C. Miller and C. Valasek, “Remote exploitation of an unaltered passenger vehicle,” Black Hat USA, vol. 2015, p. 91, 2015.
[2] C. Miller and C. Valasek, “CAN message injection: OG dynamite edition,” Tech. Rep., 2016.
[3] K.-T. Cho and K. G. Shin, “Fingerprinting electronic control units for vehicle intrusion detection,” in Proceedings of the 25th USENIX Security Symposium, 2016, pp. 911–927.
[4] M. E. Verma, M. D. Iannacone, R. A. Bridges, S. C. Hollifield, P. Moriano, B. Kay, and F. L. Combs, “Addressing the Lack of Comparability & Testing in CAN Intrusion Detection Research: A Comprehensive Guide to CAN IDS Data & Introduction of the ROAD Dataset,” arXiv preprint arXiv:2012.14600, January 2022.
[5] H. M. Song, H. R. Kim, and H. K. Kim, “Intrusion detection system based on the analysis of time intervals of CAN messages for in-vehicle network,” in Proceedings of the International Conference on Information Networking (ICOIN), 2016, pp. 63–68.
[6] M. R. Moore, R. A. Bridges, F. L. Combs, M. S. Starr, and S. J. Prowell, “Modeling inter-signal arrival times for accurate detection of CAN bus signal injection attacks: a data-driven approach to in-vehicle intrusion detection,” in Proceedings of the 12th Annual Conference on Cyber and Information Security Research, 2017, pp. 1–4.
[7] D. H. Blevins, P. Moriano, R. A. Bridges, M. E. Verma, M. D. Iannacone, and S. C. Hollifield, “Time-based can intrusion detection benchmark,” in Proceedings of the Workshop on Automotive and Autonomous Vehicle Security (AutoSec), 2021, pp. 1–6.
[8] U. E. Larson, D. K. Nilsson, and E. Jonsson, “An approach to specification-based attack detection for in-vehicle networks,” in Proceedings of the IEEE Intelligent Vehicles Symposium, 2008, pp. 220–225.
[9] M. Bresch and N. Salman, “Design and implementation of an intrusion detection system (ids) for in-vehicle networks,” Master’s thesis, University of Gothenburg, 2017.
[10] H. Olufowobi, C. Young, J. Zambreno, and G. Bloom, “SAIDuCANT: Specification-Based Automotive Intrusion Detection Using Controller Area Network (CAN) Timing,” IEEE Transactions on Vehicular Technology, vol. 69, no. 2, pp. 1484–1494, 2019.
[11] W. Wu, R. Li, G. Xie, J. An, Y. Bai, J. Zhou, and K. Li, “A survey of intrusion detection for in-vehicle networks,” IEEE Transactions on Intelligent Transportation Systems, vol. 21, no. 3, pp. 919–933, 2019.
[12] A. Taylor, S. Leblanc, and N. Japkowicz, “Anomaly detection in automobile control network data with long short-term memory networks,” in Proceedings of the IEEE International Conference on Data Science and Advanced Analytics (DSAA), 2016, pp. 130–139.
[13] M. Hanselmann, T. Strauss, K. Dormann, and H. Ulmer, “CANet: An unsupervised intrusion detection system for high dimensional CAN bus data,” IEEE Access, vol. 8, pp. 58194–58205, 2020.
[14] M. E. Verma, R. A. Bridges, J. J. Sosnowski, S. C. Hollifield, and M. D. Iannacone, “CAN-D: A Modular Four-Step Pipeline for Comprehensively Decoding Controller Area Network Data,” IEEE Transactions on Vehicular Technology, vol. 70, no. 10, pp. 9685–9700, 2021.
[15] A. Ganesan, J. Rao, and K. Shin, “Exploiting consistency among heterogeneous sensors for vehicle anomaly detection,” SAE Technical Paper, Tech. Rep., 2017.
[16] H. Li, L. Zhao, M. Juliato, S. Ahmed, M. R. Sastry, and L. L. Yang, “Poster: Intrusion detection system for in-vehicle networks using sensor correlation and integration,” in Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, 2017, pp. 2531–2533.

APPENDIX

A. Hierarchical Clustering Definition
Here we mathematically define hierarchical clustering. A partition $P$ of $S$ breaks $S$ into non-overlapping subsets $\{C_1, C_2, \ldots, C_m\}$, i.e., $S = \bigcup_{i \in \{1,2,\ldots,m\}} C_i$. A clustering is a partition, so the elements of the partition are called clusters. A partition $B$ of $S$ is nested in a partition $A$ of $S$ if every subset of $B$ is a subset of a subset of $A$, i.e., $\forall C_i \in B\ \exists j : C_i \subseteq C_j \in A$. A hierarchical clustering is then a sequence of partitions in which each partition is nested into the next partition in the sequence.

B. Brief CluSim Overview
Here we describe how CluSim works in brevity. See Gates et al. [23] for full details. Given $S = \{X_1, X_2, \ldots, X_N\}$ and a clustering $A = \{C_1, C_2, \ldots, C_m\}$, first make the bipartite graph with elements of $S$ on the left, clustering assignments from $A$ on the right, and edges denoting containment (i.e., $(X_i, C_j)$ is an edge if and only if $X_i$ is in cluster $C_j$). Note that this can be naturally extended to a dendrogram representing a hierarchical clustering $A$ by using a weighted bipartite graph, where the weight of the edges is given by a hierarchy weighting function based on the level of the cluster assignation within the hierarchical clustering. Next, the bipartite graph is projected into the $S$ elements producing a weighted, directed graph that captures the inter-element relationships induced by common cluster memberships. Now equipped with a weighted, directed graph on $S$, the CluSim method captures high-order co-occurrences of elements by taking into account their paths to obtain an equilibrium distribution of a personalized diffusion process on the graph, or personalized PageRank (PPR) [29], i.e., for each $X_i$ in $S$, a PageRank version with restart to $X_i$ given by probability
[17] P. Sharma, J. Petit, and H. Liu, “Pearson correlation analysis to detect
1 − α is used to produced stationary distribution pi . The element-
misbehavior in vanet,” in Proceedings of the 88th Vehicular Technology
Conference (VTC-Fall), 2018, pp. 1–5. wise similarity of an element X i in two different clusterings A and B
[18] F. Guo, Z. Wang, S. Du, H. Li, H. Zhu, Q. Pei, Z. Cao, and J. Zhao,
is found by comparing the stationary distributions pA B
i and pi using
1
“Detecting vehicle anomaly in the edge via sensor consistency and a variation of the ` metric for probability distributions. Finally, the
frequency characteristic,” IEEE Transactions on Vehicular Technology, similarity score of two clusterings A, B is the average of element-
vol. 68, no. 6, pp. 5618–5628, 2019. wise similarities. CluSim is parametrized by specifying r and α.
[19] T. He, L. Zhang, F. Kong, and A. Salekin, “Exploring inherent sensor Here, r is a scaling parameter that defines the relative importance of
redundancy for automotive anomaly detection,” in Proceedings of the memberships at different levels of the hierarchy. That is, the larger
57th ACM/IEEE Design Automation Conference (DAC), 2020, pp. 1–6. r, the more emphasis on comparing lower levels of the dendrogram
[20] N. Leslie, “An unsupervised learning approach for in-vehicle network (zoom in). In addition, α is a parameter that controls the influence of
intrusion detection,” in Proceedings of the 55th Annual Conference on hierarchical clusterings with shared lineages. That is, the larger α, the
Information Sciences and Systems (CISS), 2021, pp. 1–4. further the process will explore from the focus data element, so more
[21] A. Javed, B. S. Lee, and D. M. Rizzo, “A benchmark study on time series of the cluster structure is taken into account into the comparison. We
clustering,” Machine Learning with Applications, vol. 1, p. 100001, used r = 5.0 and α = 0.9 in Figure 2.
2020.
[22] L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: An Intro-
duction to Cluster Analysis. John Wiley & Sons, 2009, vol. 344.
[23] A. J. Gates, I. B. Wood, W. P. Hetrick, and Y.-Y. Ahn, “Element-centric
clustering comparison unifies overlaps and hierarchy,” Scientific Reports,
vol. 9, no. 1, pp. 1–13, 2019.
[24] G. N. Lance and W. T. Williams, “A general theory of classificatory
sorting strategies: 1. hierarchical systems,” The Computer Journal,
vol. 9, no. 4, pp. 373–380, 1967.
[25] A. J. Gates and Y.-Y. Ahn, “CluSim: A Python package for calculating
clustering similarity,” Journal of Open Source Software, vol. 4, no. 35,
p. 1264, 2019.
[26] K. Pearson, “Notes on regression and inheritance in the case of two
parents,” Proceedings of the Royal Society of London, vol. 58, pp. 240–
242, 1895.
[27] H. B. Mann and D. R. Whitney, “On a test of whether one of two
random variables is stochastically larger than the other,” The Annals of
Mathematical Statistics, pp. 50–60, 1947.
[28] M. L. Waskom, “seaborn: statistical data visualization,” Journal of Open
Source Software, vol. 6, no. 60, p. 3021, 2021.
[29] T. H. Haveliwala, “Topic-sensitive PageRank: A context-sensitive rank-
ing algorithm for web search,” IEEE Transactions on Knowledge and
Data Engineering, vol. 15, no. 4, pp. 784–796, 2003.
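The CluSim steps outlined in Appendix B (bipartite graph, projection onto elements, personalized PageRank, and $\ell^1$ comparison) can be illustrated with a minimal sketch for flat, non-overlapping clusterings. This is not the CluSim package's API: the function names are hypothetical, the hierarchy weighting controlled by $r$ is omitted, and the $1/(2\alpha)$ normalization of the $\ell^1$ distance is an assumption based on our reading of Gates et al. [23].

```python
import numpy as np


def ppr_affinities(labels, alpha=0.9):
    """Return an N x N matrix whose row i is the PPR distribution p_i.

    `labels[i]` is the cluster label of element X_i (flat clustering only).
    """
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    # Bipartite membership matrix: M[i, c] = 1 iff X_i is in cluster c.
    M = (labels[:, None] == clusters[None, :]).astype(float)
    # Projection onto elements: walk element -> cluster -> element.
    to_cluster = M / M.sum(axis=1, keepdims=True)       # N x m, rows sum to 1
    to_element = (M / M.sum(axis=0, keepdims=True)).T   # m x N, rows sum to 1
    W = to_cluster @ to_element                         # N x N, row-stochastic
    n = len(labels)
    # Stationary PPR with restart probability 1 - alpha:
    #   p_i = (1 - alpha) e_i + alpha p_i W
    #   =>  P = (1 - alpha) (I - alpha W)^{-1}
    return (1.0 - alpha) * np.linalg.inv(np.eye(n) - alpha * W)


def element_similarity(labels_a, labels_b, alpha=0.9):
    """Element-wise similarity: 1 - ||p_i^A - p_i^B||_1 / (2 alpha)."""
    pa = ppr_affinities(labels_a, alpha)
    pb = ppr_affinities(labels_b, alpha)
    l1 = np.abs(pa - pb).sum(axis=1)
    return 1.0 - l1 / (2.0 * alpha)


def clustering_similarity(labels_a, labels_b, alpha=0.9):
    """Similarity of two clusterings: mean of the element-wise similarities."""
    return float(element_similarity(labels_a, labels_b, alpha).mean())


if __name__ == "__main__":
    baseline = [0, 0, 1, 1]   # hypothetical signal-to-cluster assignment
    changed = [0, 0, 1, 2]    # the last cluster splits in a second capture
    print(element_similarity(baseline, changed))
    print(clustering_similarity(baseline, changed))
```

In this toy example, the elements whose cluster is unchanged keep an element-wise similarity of 1, while the two elements of the split cluster score lower, which is the kind of per-signal change the comparison is meant to expose.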
