Connecting The Dots in The Sky Website
Abstract—Despite the implementation of encrypted channels, such as those offered by anonymity networks like Tor, network adversaries have demonstrated the ability to compromise users’ browsing privacy through website fingerprinting attacks. This paper studies the susceptibility of Tor users to website fingerprinting when data is exchanged over low Earth orbit (LEO) satellite Internet links. Specifically, we design an experimental testbed that incorporates a Starlink satellite Internet connection, allowing us to collect a dataset for evaluating the success of website fingerprinting attacks in satellite environments compared to conventional fiber connections. Our findings suggest that Tor traffic transmitted via Starlink is as vulnerable to fingerprinting attacks as traffic over fiber links, despite the distinct networking characteristics of Starlink connections in contrast to fiber.

Workshop on Security of Space and Satellite Systems (SpaceSec) 2024
1 March 2024, San Diego, CA, USA
ISBN 979-8-9894372-1-4
https://dx.doi.org/10.14722/spacesec.2024.23xxx
www.ndss-symposium.org
Pre-print version (manuscript accepted for publication at the workshop)

I. INTRODUCTION

To ensure that Internet users can communicate securely in the face of network adversaries with the capabilities to intercept their communications, encryption protocols such as TLS [31] were devised to prevent adversaries from eavesdropping or manipulating exchanged messages. However, while encryption obscures the content of communications, network adversaries may still discern privacy-sensitive information about users (e.g., insights into a user’s health status or financial situation [42]), by simply tracking the sequence of websites a user visits over time. These attacks are possible mainly because widespread encryption protocols such as TLS are unable to hide communication metadata, such as the source and destination IPs of a given data exchange or the times at which these data exchanges take place.

To shield themselves from the above risks, savvy Internet users typically resort to privacy-enhancing technologies, such as the Tor anonymity network [8], to conceal the identity of the websites they access through the Internet. Specifically, Tor makes use of a technique known as onion routing to shroud the destination IP address of a user’s communication by routing the user’s traffic through multiple Internet nodes (or relays, usually three) that comprise a Tor circuit.

Even though Tor provides an enhanced level of privacy to its users, studies on website fingerprinting [11], [28] have shown that network eavesdroppers can still overcome Tor’s safeguards. Put briefly, an attacker can build a database of website fingerprints, i.e., a set of signatures drawn from the characteristics of the traffic observed when accessing a given website over Tor, and then attempt to match the traffic patterns generated by a Tor user with a fingerprint in this database.

Despite the risks posed by website fingerprinting attacks, their accuracy is known to be sensitive to the underlying conditions of the network segments under analysis [7], [14] (such as the available bandwidth, jitter, or packet drop rates), since these conditions can lead to modifications to the overall shape of the traffic patterns generated when accessing websites [13]. Thus, in the past, researchers have wondered whether (and to what extent) the risks of website fingerprinting attacks would transfer from traditional fiber connections to networking mediums with different transmission characteristics, such as wireless LTE/4G networks [17], [34]. Interestingly, these studies have shown that attackers were still able to accurately fingerprint users’ Tor traffic in such settings.

Today, we observe an increasing prevalence of satellite Internet solutions, powered by the launch of LEO (low Earth orbit) satellite constellations such as Starlink [39] and OneWeb [22]. These solutions have largely facilitated the provisioning of Internet access to users residing in remote regions, and continue to be enhanced through the launch of more capable satellites and upgrades to routing algorithms within the constellations themselves [2], [52]. While promising connectivity speeds similar to fiber networks, LEO satellite Internet makes use of wireless mediums which are known to be prone to several sources of interference [16], [20], [29], [48].

It remains unclear, however, what implications these recent satellite networking environments may have on the privacy of users, especially when considering adversaries with the ability to eavesdrop and analyze the metadata of Internet connections that are partly or entirely established via satellite links [26].

In this paper, we aim to shed light on whether LEO satellite Internet users are more vulnerable to website fingerprinting attacks than users of traditional fiber. To this end, we set up an experimental testbed including both a fiber and a Starlink connection, and use them to collect a dataset of synchronized website accesses over Tor. We leverage state-of-the-art website fingerprinting attacks over our collected traces to understand whether network adversaries able to inspect the ground links between users and the first hop of both kinds of connections (like a snooping satellite ISP) can identify which websites are being accessed by users. Lastly, we evaluate the security benefits and performance trade-offs of website fingerprinting defenses when applied to fiber and satellite Internet links.
Our findings suggest that Tor traffic exchanged over Starlink Internet links is equally vulnerable to website fingerprinting attacks as Tor traffic exchanged over traditional fiber links. We hypothesize that, despite the different connectivity characteristics of the ground-satellite link that connects our measurement node to the Tor network, most of the interference experienced in this link is absorbed by the network effects (e.g., added latency, jitter, etc.) inherent to Tor circuits.

Contributions. We deliver the following main contributions:

• We implement a testbed that includes a Starlink satellite dish, and use it to collect a novel dataset of Tor traffic over LEO satellite links. We open-source this dataset and our data collection code to foster further research [37].
• We perform a comparative study over the traffic characteristics observed in connections established with and without Tor, both via Starlink and traditional fiber links.
• We explore the success of state-of-the-art website fingerprinting attacks over satellite links, and analyze the suitability of existing website fingerprinting defenses to be deployed on LEO satellite-based Internet links.

II. RELATED WORK

Website fingerprinting. A website fingerprinting attack links a user to the websites they visit, thus defeating the privacy property Tor aims to provide. A website fingerprinting attack conducted over Tor usually requires the adversary to be located between the user and the entry node of the Tor circuit. The adversary has the ability to eavesdrop on communications but not modify, delete, or add packets. In preparation for an attack, the adversary repeatedly accesses a pool of websites it wishes to monitor, collecting the network traces generated upon each of these accesses. Then, the adversary extracts a set of attributes that characterize these traces (e.g., based on the timing, volume, and direction of traffic) to build a database of website fingerprints. Once this database is populated, the adversary employs machine learning techniques to create a model for predicting which website a given fingerprint corresponds to. Finally, to launch an attack, the adversary waits for the target user to access a website via the encrypted tunnel, extracts a fingerprint from the resulting traffic, and uses the trained model to identify which website the user has visited.

Attacks. The first wave of website fingerprinting attacks made extensive use of manually engineered features to train classical machine learning classifiers. Well-known instances include the k-NN [46], CUMUL [24], and k-fingerprinting [11] (k-FP) attacks. In turn, recent developments have adopted deep learning to actively automate the process of feature extraction from less pre-processed representations of traffic. The AWF [32] and DF [38] attacks used packets’ directional information as input to deep neural networks, while Tik-Tok [28] and Var-CNN [1] extended these notions by incorporating packet timing information alongside directional data. RF [35] makes use of a traffic aggregation matrix which also uses packet direction and timing, but splits traces into fixed-length time slots.

Defenses. Website fingerprinting defenses aim to prevent attacks by concealing the true shape of website traces. Some defenses, such as CS-BuFLO [4] and Tamaraw [5], obfuscate packet timing and burst characteristics by keeping a consistent rate of packet transmission. However, these defenses increase latency and bandwidth consumption, limiting their practicality. Conversely, adaptive and randomized padding defenses, exemplified by WTF-PAD [15] and FRONT [9], proactively introduce chaff to obfuscate the timing of the true packets belonging to a connection, homogenizing the access to various websites. While we use the aforementioned padding-centric defenses in this work, Mathews et al. [19] outline a more exhaustive examination of website fingerprinting defenses.

Website fingerprinting in wireless networks. Wireless networks, and mobile networks in particular, have previously been analyzed in the context of website fingerprinting. Rupprecht et al. [17] provided an analysis of potential attacks that can be targeted towards LTE (Long-Term Evolution), a prevalent mobile communication standard aimed at enhancing data transmission rates and connectivity beyond the capabilities of preceding 3G networks. By placing their focus on LTE’s data link layer, Rupprecht et al. were able to find protocol flaws that allow an eavesdropper to access a mobile device’s communication metadata, thus allowing for successful website fingerprinting attacks on encrypted LTE traffic.

LEO satellite connections’ characteristics. Emerging LEO constellations have the objective of offering broadband Internet services with decreased latency. Satellite constellation networks consist of ground stations and orbiting satellites, enabling worldwide communication. LEO satellites are positioned at varying altitudes, typically ranging from around 180 km to 2 000 km above the Earth’s surface. This close proximity provides the advantage of reduced communication delays and the potential for enhanced data throughput [44].

The recent and ongoing deployment of Starlink, a prominent LEO satellite constellation developed by SpaceX, has given birth to a flurry of measurement studies focused on analyzing the performance and operational effectiveness of Starlink’s satellite Internet service [16], [18], [20], [29], [48], [52]. For instance, Ma et al. [18] show that the throughput and latency experienced by Starlink users are highly dynamic, especially when compared to conventional terrestrial networks. These results have also been corroborated by other studies [16], which show that satellite Internet performance varies by geography (clients located in the USA were found to experience 2.3× higher communication latency than those in the UK, together with lower network throughput) and is prone to unusually high packet loss rates. In addition, Starlink’s performance is also influenced by environmental factors such as terrain, rain, clouds, and temperature [16], [52].

From the above, one can see that satellite communication links can be substantially more unstable than traditional fiber links. Our work aims to shed light on whether website fingerprinting attacks can succeed in such a setting.

III. METHODOLOGY

This section details the methodology of our study. In Section III-A, we describe our threat model and assumptions, while Section III-B presents our experimental testbed and details the process we followed for collecting and pre-processing websites’ network traces. Then, Section III-C describes the website fingerprinting attacks and defenses we consider, and Section III-D describes the metrics we use for measuring the success of attacks and the performance of defenses.
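The generic attack workflow described under "Website fingerprinting" in Section II (collect labelled traces, extract attributes, build a fingerprint database, classify an observed trace) can be sketched in a few lines of Python. This is only a minimal illustration and not the k-FP, DF, or Tik-Tok attacks evaluated in this paper: the trace format (pairs of timestamp and direction), the five summary features, the function names, and the plain nearest-neighbour vote are all assumptions chosen for brevity.

```python
from collections import Counter
from math import dist

# Assumed trace format: a list of (timestamp_seconds, direction) pairs,
# with direction +1 for outgoing packets and -1 for incoming packets.

def extract_features(trace):
    """Summarise a trace by volume, direction, and timing attributes."""
    total = len(trace)
    n_out = sum(1 for _, direction in trace if direction > 0)
    n_in = total - n_out
    in_fraction = n_in / total if total else 0.0
    duration = trace[-1][0] - trace[0][0] if total else 0.0
    return (float(total), float(n_out), float(n_in), in_fraction, duration)

def build_database(labelled_traces):
    """Turn (website, trace) pairs into (website, fingerprint) pairs."""
    return [(site, extract_features(trace)) for site, trace in labelled_traces]

def predict(database, trace, k=3):
    """Majority vote among the k fingerprints closest to the observed trace."""
    target = extract_features(trace)
    nearest = sorted(database, key=lambda entry: dist(entry[1], target))[:k]
    votes = Counter(site for site, _ in nearest)
    return votes.most_common(1)[0][0]

def toy_trace(n_out, n_in, gap):
    """Hypothetical synthetic trace: n_out outgoing, then n_in incoming packets."""
    directions = [1] * n_out + [-1] * n_in
    return [(i * gap, d) for i, d in enumerate(directions)]

# Toy training data standing in for repeated visits to two monitored websites.
training = [
    ("site-a", toy_trace(10, 100, 0.01)),
    ("site-a", toy_trace(12, 110, 0.01)),
    ("site-a", toy_trace(11, 105, 0.01)),
    ("site-b", toy_trace(40, 400, 0.02)),
    ("site-b", toy_trace(42, 390, 0.02)),
    ("site-b", toy_trace(41, 410, 0.02)),
]
database = build_database(training)
print(predict(database, toy_trace(11, 102, 0.01)))
```

The features here are left unscaled, so the packet counts dominate the distance; a real attack would use far richer features and a trained model, as discussed in Section II.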
Figure 1: Threat model overview.
Figure 2: Data collection testbed.
A. Assumptions and threat model

We follow the typical threat model for website fingerprinting attacks, albeit with one important change to the location and mode of operation of the adversary. This change is motivated by the real-world challenges in acquiring traffic metadata from Starlink signals.

More concretely, Starlink satellites beam data using sophisticated signal encryption schemes that allow only the target satellite dish receiver to decode the information being sent/received to/from the satellite [49]. In other words, even if the adversary places a Starlink satellite dish within the same geographical data transmission cell in which the target user’s satellite dish sits, the adversary would be unable to access the raw IP packet stream (or other well-formatted data comprising Starlink’s data-link layer) that is directed at the target user. This prevents a typical website fingerprinting adversary from inspecting users’ communications.

We note that in other cases, such as website fingerprinting attacks launched over LTE/4G networks [17], [34], the adversary is first required to tap into the radio signals exchanged between a user’s equipment (e.g., a smartphone) and the LTE base station. This capability, which can be obtained through the use of LTE software stacks implemented in software-defined radios (e.g., srsRAN [23]) in tandem with sniffer analysis frameworks (e.g., OWL [3]), allows the adversary to access and decode transmissions ranging from the physical layer up to the data-link layer, and then derive user-specific traffic metadata. However, to the best of our knowledge, such capabilities are not publicly available for Starlink satellite links, despite current advances in the reverse-engineering of Starlink downlink signals [12], [21]. For this reason, we introduce a variation of the website fingerprinting adversary model.

An ISP-based website fingerprinting adversary. While strong signal encryption may prevent most third parties from inspecting the traffic of satellite Internet users, ISPs operating the satellite networking service itself might be interested in launching website fingerprinting attacks. In this setting, it is possible that, despite allowing users to leverage privacy-preserving communication protocols such as those provided by Tor, snooping ISPs might wish to identify which content is being accessed by their users, e.g., towards preventing the access to websites used for streaming pirated DRM-protected content [33]. For instance, according to Starlink’s fair use policy [41], the company reserves the right to take additional network management measures as necessary to comply with applicable laws, including the analysis of traffic patterns.

Figure 1 illustrates this scenario, placing the eavesdropping adversary at the satellite provider’s infrastructure, with full access to the IP traffic exchanged by the satellite Internet users and their destinations.

Attack setting and other assumptions. Our study focuses on website fingerprinting attacks within the closed-world scenario, where the target user is assumed to visit a website from a predetermined set of websites monitored by the adversary. We assume that access to any monitored website in our closed-world setting is equally probable. We also assume that the attacker can separate the traces associated with the loading of different websites and determine which defense is in use by a Tor user. Thus, we consider the use of defenses like GLUE [9] to be outside the scope of our study.

B. Dataset collection and pre-processing

We collect a novel dataset of website accesses via Tor that contains two different sets of traces: those collected when the client uses a simple terrestrial fiber network, and those collected when the client uses a satellite link. Having both of these sets allows us to build a baseline of the effectiveness of website fingerprinting attacks on a typical Internet connection and further allows for direct comparisons with the effectiveness of the same attacks once deployed over Starlink traces.

Apart from Tor traces, we also collect additional website traces over direct connections to each website using plain Firefox. We collect these traces to characterize the overheads of using Tor instead of Firefox on both satellite and fiber connections (see Section IV). Later in our evaluation, we also use this data to compare the performance of a classifier when fingerprinting plain traffic vs. Tor traffic.

Experimental testbed. Figure 2 depicts a bird’s-eye overview of our experimental testbed. It essentially comprises a client machine under our control, which is used to access a set of websites included in our closed-world website list via the Tor network. The client machine executes two virtual machines (VMs), each with 16 GB of storage and 2 GB of RAM. It is equipped with two network interface cards, and each VM routes its traffic through a different interface. One of these cards is connected to a Starlink dish (satellite Internet connection), while the other is attached to our university’s fiber-based network (terrestrial Internet connection). On each VM, we deploy Docker containers that we orchestrate for simultaneously collecting network traces of a given website. Website access is only deemed successful if we correctly retrieve a website’s homepage both via fiber and Starlink.

The ability to collect traces for the same website simultaneously over the two different links enables us to collect traces that represent a given website at roughly the same instant of time. This way, we mitigate the effect of concept or
data drift [14] by minimizing the chance that our fingerprint database would include significantly different versions of a given website, should, for instance, all fiber traces be collected after all Starlink traces.

Considered websites. In our data collection procedure, we considered the top 125 websites found on the Tranco list [27], as of September 2022 [43]. We manually verified that each of these websites was active by sending a request to the website’s homepage and confirming it returned an HTTP 200 response code. We configured our scripts to collect a total of 125 instances of each of the 125 websites. However, the number of websites (and per-website samples) included in our dataset was later trimmed to account for transmission errors detected upon data pre-processing (discussed later in this section).

Collection of website traces. As stated before, we collect traces using the fiber and Starlink connections simultaneously, interleaving Tor- and plain Firefox-based requests. We visit each website via Firefox by automating web browser interactions via selenium, and visit each website via Tor by leveraging tbselenium, a headless wrapper around the Tor browser. We used the default configuration setup for Tor and, to ensure the freshness of each website visit, we restarted the tbselenium Tor driver after clearing its cache upon each visit and forcing the selection of different circuits. If a given visit to a website returns an explicit error to tbselenium (e.g., due to network instabilities), we try revisiting the website (up to a maximum of three times) towards receiving a valid response.

Pre-processing of network traces. We only deem a given website access as valid if we can confirm the successful access to the website’s homepage via Tor (on both interfaces) and Firefox (also on both interfaces). This effectively comprises a batch of four individual trace samples which we add to our dataset. In our pre-processing step, we aim to weed out from the dataset those traces that resulted in timeouts (we consider a request to time out if one minute has elapsed before the page can be successfully retrieved) or that include errors that prevented the website from being fetched correctly (but that did not trigger explicit errors in selenium or tbselenium).

After removing traces afflicted by the above issues, we obtained a dataset that includes 80 instances each of 75 different websites (listed in Appendix A) visited over both Starlink and terrestrial fiber, using both Tor and Firefox. We denote this dataset as TorFirefox-SatFiber_{4×75×80}, containing a total of 4 × 75 × 80 = 24 000 samples, and publicly released it [37].

Interestingly, our pre-processing step revealed that we were unable to access specific websites via Tor entirely, despite the per-access refresh of tbselenium and Tor circuitry. In line with previous findings, we conjecture that these websites may actively block accesses coming from Tor [50], [51].

C. Attacks and defenses

Website fingerprinting attacks. Our study employs prominent attacks in the literature, including those that leverage manually-engineered features (k-FP), as well as those that leverage latent feature spaces learned through deep learning (DF and Tik-Tok). While k-FP provides human-interpretable information on which traffic features lead the classifier to issue predictions, the latter attacks have been shown to be more effective in practice.

The k-FP attack leverages 150 different summary statistics extracted from network traces. The classifier operates by generating a fingerprint for each website using a modified version of the Random Forest algorithm. Then, the attack uses the k-Nearest Neighbours classifier to predict website accesses. Instead, DF and Tik-Tok are based on deep neural networks that directly extract latent features from input traces passed to the classifier during the training and inference steps. The DF attack accepts as input a direction vector representing the direction of packets in the trace, while the Tik-Tok model enriches this trace representation by computing the element-wise product of the direction and timing of packets in a trace. We leverage all attacks with their default hyperparameters.

Website fingerprinting defenses. Our study leverages a set of popular open-source implementations of website fingerprinting defenses, including: a) Tamaraw [5] and CS-BuFLO [4], two defense mechanisms that employ strategies based on fixed-rate packet transmissions to conceal timing patterns and packet burst behavior; b) FRONT [9], a defense which adds a variable number of randomly-padded dummy packets to the start of packet sequences; and c) WTF-PAD [15], a lightweight adaptive padding defense that inserts dummy packets to conceal the existing time gaps between packets.

To avoid the repeated collection of traffic traces for evaluating website fingerprinting defenses, these defenses’ authors released simulators that can turn undefended Tor traffic traces into their defended versions in an offline manner. Gong et al. [10] have recently compared the simulation and true implementation results for a set of WF defenses and reached the conclusion that simulators can accurately reflect the effectiveness of defenses on live traffic. We utilize the same defense simulators and configurations recently used in the work of Veicht et al. [45], which focused on the security analysis of website fingerprinting defenses. We refer the reader to Appendix B for details on the defenses’ parameters.

D. Evaluation procedure and metrics

Evaluation procedure. We make use of 10-fold cross-validation when training and testing our classifiers to minimize the effects of selection bias. In particular, we employ stratified cross-validation to ensure an equal distribution of instances across all the classes comprising our dataset. In each cross-validation fold, we use 80% of the data for training, 10% for the model’s validation, and the remaining 10% for testing.

Attack performance metrics. The main metric we pay attention to when analyzing the success of a website fingerprinting attack (whether a defense is being used or not) is accuracy. Accuracy is defined as the ratio of correctly predicted instances to the total number of instances in the dataset, and this metric has been extensively used in the website fingerprinting literature for determining the efficacy of both attacks and defenses in the closed-world scenario, carrying a rather intuitive meaning for an adversary: it quantifies the adversary’s success in discerning exactly which website a given user is accessing.

Defense performance metrics. Apart from a desirable reduction in attacks’ accuracy, website fingerprinting defenses may also be evaluated on the amount of overhead they impose over an undefended Tor network trace. For this reason, in our experiments, we also leverage bandwidth and latency
overheads as efficiency indicators of website fingerprinting defenses. Defenses are typically deemed to be practical if and only if they can substantially reduce an attack’s accuracy while having a small impact on latency (i.e., the time to load a website) and bandwidth overhead (i.e., the amount of additional data required to load a website).

Traffic analysis machine. To train and test our models on the network traces we collected for our study, we leverage a server machine with two AMD EPYC 7302 16-core CPUs, 512 GB of RAM, and an NVIDIA A100 GPU with 40 GB of memory.

IV. CHARACTERIZING FIBER AND STARLINK TRACES

This section presents a characterization of the traces included in our dataset. We aim to uncover the major differences between connections established over terrestrial fiber and Starlink, as well as highlight the performance drops expected when using Tor instead of plain Firefox in these different networking environments. We describe our main takeaways below.

Starlink is 33% slower than fiber (on our deployment). Figure 3(a) depicts the page load times observed when loading the 75 websites included in our dataset over plain Firefox and Tor, both for Starlink and fiber connections. We can observe that Starlink-based connections consistently reveal higher times-to-last-byte when compared to fiber connections, representing an average increase of 33.2% when considering website accesses established over Firefox, and an average increase of 32% when considering website accesses performed via Tor.

Tor is almost 4× slower than plain Firefox. Figure 3(a) shows that, for fiber connections, Tor accesses are on average 3.86× slower than those via plain Firefox. This difference is marginally less pronounced when considering Starlink connections, where Tor accesses are on average 3.83× slower than those via plain Firefox.

Starlink connections require more packet exchanges. Figure 3(b) summarizes the number of packets observed when accessing each of the 75 websites via plain Firefox and Tor, both for Starlink and fiber connections. We can see that the median number of packets exchanged when using Firefox more than doubles when using a Starlink connection (1 376.47 packets) when compared to the use of fiber (560.11 packets). For Tor, we are also able to observe an increase in exchanged packets when moving from the fiber to the Starlink setting, but this increase seems less pronounced (∼21% more packets).

Interestingly, website accesses using Firefox via Starlink reveal a smaller inter-quartile range than accesses via a terrestrial fiber connection, indicating a more concentrated distribution around the mean. The opposite is true for accesses over Tor where, albeit less evident, the distribution seems to be more concentrated around the mean for the connections making use of the fiber connection.

Starlink-exchanged packets tend to be smaller. Figure 3(c) depicts the length of IP packets observed when accessing each of the 75 websites over each of our networking configurations. We can see from the figure that the packets composing Tor traffic exhibit a rather concentrated size, with a median size of 1 501.79 when Tor data is exchanged via Starlink and a median size of 1 785.11 when exchanged via fiber connections. Interestingly, we observe that the size of plain Firefox packets exchanged over Starlink is also rather concentrated around a mean of 1 471.64, while plain Firefox packets exchanged over fiber connections exhibit a more variable (and typically larger) length, with a median of 2 188.94 and a size of 3 254.31 at the 75th percentile.

TCP retransmissions are more common in Starlink. Towards understanding the differences in the number of packets observed in our traces (Starlink vs. fiber), we conducted an additional analysis focused on the study of TCP retransmissions. In general, while retransmission requests are fairly rare throughout our traces, they are more common in Starlink connections. For instance, when accessing the website google-domains.com via plain Firefox, we observed that 0.02% of packets are retransmitted when using the fiber link, while 0.3% of packets are retransmitted when using Starlink. When accessing the same website using Tor, we find that 0.03% of packets are retransmitted over a fiber connection, while 0.7% of packets are retransmitted over Starlink. This shows that even if the percentage of retransmitted packets is not excessively high in any of the scenarios, there is a disparity of packet retransmissions (exceeding an order of magnitude in the example above) when using the Starlink connection instead of fiber. This alludes to the inherent noise previously found in Starlink Internet connections [16].

A comparable number of Tor cells are exchanged over Starlink and fiber connections. In contrast to the average number of IP packets exchanged, the average number of Tor cells transmitted through both fiber and Starlink exhibits a remarkable similarity (see Figure 3(d)). In addition, one can observe that 75% of the traces exhibit a number of Tor cells that is less than or equal to 5 436. As most deep-learning website fingerprinting attacks trim their input vectors to 5 000 cells, we posit the same trimming threshold should also work well for our dataset (we validate this claim in Section V-B).

Summary. Overall, our findings suggest that the use of Starlink imposes a larger relative penalty on plain Firefox connections as compared to Tor connections. We hypothesize that the variable latency and jitter which is introduced (and
Table I: Attack accuracy for Firefox traces (on TCP/IP data).
Dataset              k-FP     k-FP (w/ pkt. lengths)   DF       TikTok
Firefox w/fiber      0.8557   0.8892                   0.8837   0.7945
Firefox w/Starlink   0.4075   0.4285                   0.4915   0.4715

Table II: Attack accuracy using Tor cell data.
Dataset          k-FP     DF       TikTok
Tor w/fiber      0.7282   0.8738   0.8860
Tor w/Starlink   0.6426   0.8540   0.8682
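To make the comparison in Tables I and II easier to read, the short Python sketch below computes accuracy as defined in Section III-D (correct predictions divided by total instances) and then the fiber-to-Starlink accuracy gap per attack, in percentage points. The accuracy figures are copied from the two tables; the function and variable names are illustrative assumptions and not part of the released code.

```python
# Accuracy values copied from Tables I and II, stored as (fiber, Starlink).
FIREFOX = {
    "k-FP": (0.8557, 0.4075),
    "k-FP (w/ pkt. lengths)": (0.8892, 0.4285),
    "DF": (0.8837, 0.4915),
    "TikTok": (0.7945, 0.4715),
}
TOR = {
    "k-FP": (0.7282, 0.6426),
    "DF": (0.8738, 0.8540),
    "TikTok": (0.8860, 0.8682),
}

def accuracy(predicted, actual):
    """Ratio of correctly predicted instances to the total number of instances."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def fiber_starlink_gap(table):
    """Accuracy lost when moving from fiber to Starlink, in percentage points."""
    return {
        attack: round((fiber - starlink) * 100, 2)
        for attack, (fiber, starlink) in table.items()
    }

for name, table in (("Firefox", FIREFOX), ("Tor", TOR)):
    for attack, gap in fiber_starlink_gap(table).items():
        print(f"{name:8s}{attack:24s}{gap:6.2f} pp")
```

With the tabulated values, the gap is roughly 32 to 46 percentage points for plain Firefox traces, but under 9 points for Tor traces (and under 2 points for DF and TikTok).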
Figure 4: Top-20 most important features (Firefox traces). Panels: (a) Via fiber; (b) Via Starlink. Horizontal axis: Feature Importance.
Figure 5: Top-20 most important features (Tor traces). Panels: (a) Via fiber; (b) Via Starlink. Horizontal axis: Feature Importance.
compounded) by the multiple relays composing Tor circuits may help amortize the performance penalties incurred by clients that use Starlink’s up/downlinks to connect to the Internet via Tor. Next, we discuss the findings resulting from a set of experiments with state-of-the-art website fingerprinting attacks and defenses conducted over our dataset.

V. FINGERPRINTING FIBER AND STARLINK TRACES

In this section, we compare the susceptibility of fiber and Starlink connections to different website fingerprinting attacks.

A. Attacks based on classical machine learning

We start by providing the main takeaways of our experiments with k-FP on the different sets of traces composing our dataset. Besides assessing the success of this attack on Tor traffic, we also attempt to fingerprint plain Firefox connections. Note that, in practice, the destination of Firefox connections would be trivially disclosed to an adversary (e.g., by looking at the connection’s destination IP address). However, we do this as an exercise towards understanding how satellite connections affect traffic features and whether these effects degrade or improve our ability to fingerprint network traffic.

To perform the above comparisons with plain Firefox traffic, we modify the original implementation of the k-FP attack in two meaningful ways. First, we allow for features to be directly generated from TCP/IP header information (e.g., IP packet length, time between IP packets, etc.) instead of Tor cells as in the original attack. Second, we create a version of the k-FP classifier which takes packet lengths into account as features for building and matching website fingerprints. The rationale for these modifications to k-FP hinges on the fact that Tor exchanges data in cells padded to 512B, thus making packet length analysis irrelevant [47]. In contrast, Firefox does not exchange data via cells, thus providing a network data analyst with access to raw TCP/IP packet length information.

k-FP is more accurate for plain Firefox traffic over fiber. In this first experiment, we used raw TCP/IP packet header data to generate features for the k-FP attack. Table I depicts the accuracy of the website fingerprinting attacks we consider on Firefox, over both fiber and Starlink connections. We see that the original k-FP classifier achieves an accuracy of 85% when fingerprinting websites accessed via Firefox w/fiber, but achieves an accuracy of only 40% when fingerprinting websites accessed via Firefox w/Starlink. The inclusion of packet lengths in k-FP brings only marginal benefits for the attack in both settings, amounting to a ∼3% accuracy increase.

Figure 4(a) and Figure 4(b) show the top-20 most important features for the k-FP attack when launched over plain Firefox traffic exchanged via fiber and Starlink, respectively. We can observe that the two most important features for classifying website accesses on Firefox via fiber are the sum of all incoming packet sizes and the sum of all packet sizes in the data exchange, whereas the importance of these features is swapped for Firefox via Starlink traffic. We can also see from Figure 4(a) that 9 out of the 20 most important features focus on packet timing information, whereas only 7 timing-related features are within the top 20 most important features for Firefox traffic exchanged over Starlink. Interestingly, while timing features in the former case are mostly related to percentiles, timing features are more related to the average and standard deviation of packet arrivals for the latter.

The above observations warrant further work to ensure that our results are not specific to one site (we discuss a multisite study in Section VII), as well as further analysis into what characteristics of Firefox w/Starlink traffic can make it less fingerprintable (possibly borrowing some of these findings to develop a novel website fingerprinting defense).

k-FP is more accurate for Tor traffic over fiber. In this second experiment, we analyzed the effectiveness of the original k-FP attack on Tor traffic exchanged via fiber and Starlink. Thus, in this case, we extract the attack features based on our estimates of the Tor cells exchanged within these traces. The results in Table II show that the accuracy of k-FP on Tor w/fiber traffic is close to 73%, and around 64% for Tor w/Starlink. This discrepancy is consistent with the results observed for Firefox browsing, where the classifier had performed better for fingerprinting websites visited via the fiber connection.
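As an illustration of the TCP/IP-level features discussed in this section, the following minimal sketch computes a handful of k-FP-style statistics from a single packet trace. It is not the k-FP implementation: the trace format (a list of (timestamp, signed length) pairs, with positive lengths assumed to denote outgoing packets), the function names, and the exact definition of each statistic are assumptions made for illustration. The feature labels are modeled on those shown in Figures 4 and 5.

```python
from statistics import mean, pstdev


def percentile(sorted_values, q):
    """Nearest-rank style percentile (q in 0..100) of an already sorted list."""
    if not sorted_values:
        return 0.0
    index = round(q / 100 * (len(sorted_values) - 1))
    return sorted_values[index]


def extract_features(trace, use_lengths=True):
    """Compute a few k-FP-style statistics from one trace.

    trace: list of (timestamp_seconds, signed_length) pairs. By assumption,
    positive lengths are outgoing packets and negative lengths are incoming.
    """
    times = sorted(t for t, _ in trace)
    incoming = [(t, -s) for t, s in trace if s < 0]
    outgoing = [(t, s) for t, s in trace if s > 0]
    gaps = [b - a for a, b in zip(times, times[1:])]

    # Count- and timing-based features, available with or without lengths.
    features = {
        "in_count": len(incoming),
        "out_count": len(outgoing),
        "total_count": len(trace),
        "in_percentage": len(incoming) / len(trace) if trace else 0.0,
        "avg_total_interarrival": mean(gaps) if gaps else 0.0,
        "std_total_interarrival": pstdev(gaps) if gaps else 0.0,
    }
    for q in (25, 50, 75, 100):
        features[f"{q}th_percentile_total_times"] = percentile(times, q)

    # Size-based features only make sense when raw packet lengths are visible.
    if use_lengths:
        in_sizes = [s for _, s in incoming]
        out_sizes = [s for _, s in outgoing]
        features["in_size"] = sum(in_sizes)
        features["out_size"] = sum(out_sizes)
        features["total_size"] = sum(in_sizes) + sum(out_sizes)
        features["avg_in_size"] = mean(in_sizes) if in_sizes else 0.0
    return features


example = [(0.0, 100), (0.1, -1500), (0.2, -1500), (0.5, 60)]
print(extract_features(example))
```

Calling `extract_features(trace, use_lengths=False)` drops the size-based features, which loosely corresponds to the cell-based setting where fixed-size cells make packet lengths uninformative.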
A close look at the top 20 most important features for classifying Tor traffic (Figure 5) reveals that the cumulative average of incoming packets is the most important feature for classifying both kinds of connections. Moreover, 14 out of the top 20 features are shared between both (though not necessarily in the same order). This may be the case due to the similarity observed in Tor cell statistics for both fiber and Starlink traffic, as observed in Figure 3(d). Nevertheless, the remaining features in the top 20 exhibit some variations (e.g., the inclusion of the avg. inter-arrival time of outgoing packets in Starlink traffic) which may also be explained by the noise inherent to Starlink connections.

B. Attacks based on deep learning

We now focus on comparing the effectiveness of the DF and Tik-Tok attacks on the network traces we collected.

The success of deep learning attacks is comparable to k-FP on plain Firefox traffic. The results in Table I show that the DF and Tik-Tok deep learning attacks achieve a comparable accuracy to the classical machine learning attack k-FP when fingerprinting plain Firefox traffic. More closely, we see that the accuracy of DF is comparable to the accuracy obtained by k-FP when considering packet lengths. These results suggest that the application of deep learning attacks brings only marginal improvements, if any, for the classification of Firefox traces. For instance, Tik-Tok achieves an accuracy of only 79% over fiber, which is around 6 percentage points below the accuracy obtained by k-FP without considering packet size information (0.7945 vs. 0.8557 in Table I).

Deep learning attacks can successfully fingerprint Tor traffic via fiber and Starlink. The accuracy results reported in Table II reveal that the DF and Tik-Tok attacks achieve a similar performance when applied to Tor traffic regardless of whether the traces were collected via fiber or Starlink connections. Interestingly, we also observe that the DF classifier can achieve roughly the same accuracy for plain Firefox traffic collected over fiber (Table I) and Tor traffic, suggesting that users gain little benefit from using Tor to shield their browsing behaviors against website fingerprinting attacks.

Models trained on Starlink data are more robust. Table III presents the accuracy of the Tik-Tok attack when swapping the shares of the dataset used for training and testing the classifier. We can see that using Tor traces collected on the fiber connection to train an attack that aims to fingerprint Tor traffic exchanged via Starlink results in an accuracy decrease of about 5% when compared to the use of Starlink training data (0.8168 vs. 0.8682). In turn, using Tor traces collected on the Starlink connection to train an attack that aims to fingerprint Tor traffic exchanged via fiber results in an accuracy decrease of only 2% when compared to the use of fiber training data (0.8627 vs. 0.8860). The above results suggest that an adversary who trains the Tik-Tok attack on traces obtained via Starlink can obtain a relatively high accuracy when fingerprinting both Starlink and fiber traffic. A potential explanation for this fact is that the noise inherent to Starlink may contribute to an increased per-class trace diversity and an overall enhancement of the model’s robustness.

Table III: Tik-Tok’s accuracy when exchanging the sets of data used for training and, respectively, testing the classifier.

Training Data  Testing Data  Accuracy
Fiber          Fiber         0.8860
Starlink       Starlink      0.8682
Fiber          Starlink      0.8168
Starlink       Fiber         0.8627

Impact of trace length on fingerprinting accuracy. As mentioned in Section III-C, DF and Tik-Tok automatically extract latent features from a trace’s direction or direction+timing representation, respectively. In the original attacks, the number of cells considered in each trace (n) is set at 5 000 (samples with lengths below 5 000 are padded with zeros, and those with lengths greater than 5 000 are truncated). Our previous analysis in Section IV (Figure 3(d)) revealed that n = 5 000 roughly corresponded to the 75th percentile of cells across all Tor traces (both fiber and Starlink). Our experiments in Appendix C suggest that n = 5 000 is also adequate for classifying Tor traffic in both of our networking environments.

VI. DEFENDING FIBER AND STARLINK TRACES

This section presents the main takeaways of our experiments when assessing the effectiveness and efficiency of existing website fingerprinting defenses when deployed over Tor connections established via fiber and Starlink. As mentioned in Section III-C, we use a set of defense simulators that convert our undefended Tor cell traces into their defended versions.

A. Evaluating the effectiveness of defenses

Table IV lists the accuracy of Tik-Tok on defended Tor traffic. Overall, we can see that the accuracy obtained by the attack for Starlink traces is less than that observed for fiber traces. While this was also true for non-defended traffic (see Table II), constant-rate defenses such as Tamaraw and CS-BuFLO achieve similar accuracy reductions for both fiber and Starlink traces, bringing the attack’s accuracy down to approximately 10% and 16%, respectively. This is expected, as both defenses heavily shape the timing and sizes of packets sent to the network to obfuscate traffic patterns.

While other defenses can also moderately decrease the Tik-Tok attack’s accuracy, we can observe that the application of these defenses results in disparate effectiveness when applied to fiber and Starlink traces. For instance, we can see that the FRONT T1 and FRONT T2 defense variants reduce the attack’s accuracy by an extra 12% and 11%, respectively, when deployed on Starlink traces. While less pronounced, this trend can also be observed for the WTF-PAD defense, where its application to Starlink traces leads to an additional accuracy reduction of about 5%. These results suggest that the incorporation of dummy traffic, although generally effective on fiber, has a comparatively greater impact on the ability of traffic classifiers to accurately fingerprint Tor connections established over Starlink.

B. Evaluating the overhead imposed by defenses

After gauging the effectiveness of defenses over the Tor traces included in our dataset, we now turn our attention to the comparison of the overheads imposed by these defenses when applied to fiber and Starlink traces. When reporting our results, we present the bandwidth and latency overheads imposed by each defense as the median value of the bandwidth and latency values observed among the defended traces.
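To make the reporting convention above concrete, the sketch below computes median bandwidth and latency overheads from paired undefended and defended traces. The definitions used here are assumptions, since the text does not spell them out: the bandwidth overhead of a trace is taken as the ratio of defended to undefended cell counts, the latency overhead as the ratio of trace durations, and the median is taken over these per-trace ratios. The trace format and the function names are likewise hypothetical.

```python
from statistics import median


def duration(trace):
    """Time span (in seconds) covered by a trace of (timestamp, direction) cells."""
    if not trace:
        return 0.0
    times = [t for t, _ in trace]
    return max(times) - min(times)


def median_overheads(undefended, defended):
    """Return (bandwidth_overhead, latency_overhead) as multiplicative factors.

    undefended, defended: equally long lists of traces, where defended[i] is
    the simulated defended version of undefended[i]. A factor of 1.0 means
    that the defense adds no overhead.
    """
    pairs = list(zip(undefended, defended))
    bandwidth = [len(d) / len(u) for u, d in pairs if u]
    latency = [duration(d) / duration(u) for u, d in pairs if duration(u) > 0]
    return median(bandwidth), median(latency)


undefended = [
    [(0.0, 1), (1.0, -1)],
    [(0.0, 1), (2.0, -1)],
    [(0.0, 1), (4.0, -1)],
]
defended = [
    [(0.0, 1), (0.5, 1), (2.0, -1)],
    [(0.0, 1), (0.5, 1), (1.0, -1), (2.0, -1)],
    [(0.0, 1), (6.0, -1)],
]
print(median_overheads(undefended, defended))
```

Under these assumptions, a latency factor of 1.00 corresponds to a defense that does not delay the trace at all.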
Table IV: Tik-Tok accuracy against Tor with different defenses.

Defense     Fiber Traces  Starlink Traces
Tamaraw     0.1087        0.1008
CS-BuFLO    0.1655        0.1540
FRONT T1    0.5910        0.4700
FRONT T2    0.5462        0.4358
WTF-PAD     0.8360        0.7880
No defense  0.8860        0.8682

Table V: Defenses’ latency and bandwidth overheads.

            Latency Overhead (×)   Bandwidth Overhead (×)
Defense     Fiber    Starlink      Fiber    Starlink
Tamaraw     5.83     4.28          1.60     1.65
CS-BuFLO    30.54    21.50         1.48     1.47
FRONT T1    1.00     1.00          1.24     1.31
FRONT T2    1.00     1.00          1.34     1.46
WTF-PAD     1.00     1.00          1.18     1.21
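The comparisons drawn from Tables IV and V reduce to simple arithmetic over the reported values. The following sketch recomputes two of them: the gap in Tik-Tok accuracy between fiber and Starlink traces for the padding-based defenses, and the ratio between the fiber and Starlink latency overheads for the constant-rate defenses. The values are copied from the tables, while the dictionary layout and function names are only illustrative.

```python
# (fiber, Starlink) Tik-Tok accuracy per defense, copied from Table IV.
TIK_TOK_ACCURACY = {
    "FRONT T1": (0.5910, 0.4700),
    "FRONT T2": (0.5462, 0.4358),
    "WTF-PAD": (0.8360, 0.7880),
}

# (fiber, Starlink) latency overhead factors, copied from Table V.
LATENCY_OVERHEAD = {
    "Tamaraw": (5.83, 4.28),
    "CS-BuFLO": (30.54, 21.50),
}


def accuracy_gap(defense):
    """Accuracy on fiber minus accuracy on Starlink, in absolute terms."""
    fiber, starlink = TIK_TOK_ACCURACY[defense]
    return fiber - starlink


def latency_ratio(defense):
    """How many times larger the latency overhead is on fiber than on Starlink."""
    fiber, starlink = LATENCY_OVERHEAD[defense]
    return fiber / starlink


for name in TIK_TOK_ACCURACY:
    print(f"{name}: accuracy gap = {accuracy_gap(name):.3f}")
for name in LATENCY_OVERHEAD:
    print(f"{name}: fiber/Starlink latency ratio = {latency_ratio(name):.2f}")
```

The resulting gaps (about 0.12, 0.11, and 0.05) and ratios (about 1.36 and 1.42) match the figures quoted in the text.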
Defended Starlink traces impose a smaller latency overhead. Table V shows the latency overhead of the considered defenses when applied to fiber and Starlink traces. Overall, one can observe that the latency overhead of a defense tends to be the same or smaller on Starlink traces than on fiber traces. Note that the latency overhead is effectively zero (a factor of 1.00 in Table V) on both kinds of traces for adaptive and random padding defenses like the FRONT variants and WTF-PAD, since these defenses largely aim to avoid the introduction of communication delays. However, considering Tamaraw and CS-BuFLO, defended Starlink traces impose a latency overhead that is about 1.36 and 1.42 times smaller than that imposed on fiber traces, respectively.

Defended Starlink traces impose a larger bandwidth overhead. Table V also shows the bandwidth overhead of the defenses. For instance, it shows that Tamaraw, the most bandwidth-inefficient defense, imposes an overhead of 1.6 times that of a Tor undefended trace over fiber. We can also see from the table that Starlink traces impose an equivalent or slightly larger bandwidth overhead than that of fiber traces, for all the considered defenses. This increase in overhead is particularly noticeable for the FRONT T2 defense, where the bandwidth overhead is about 11% larger when the defense is applied to Starlink traces rather than fiber traces.

While the tested defenses allow for a reduction in attack accuracy on Starlink connections (see Table IV), the above analysis reveals that the defenses lead to a small increase in bandwidth usage when compared to their counterpart deployments over fiber. Still, this additional overhead can pose a concern for satellite Internet users (in particular, Starlink users [41]) whose satellite ISPs may apply data caps or limit the amount of high-speed data exchanged by customers towards balancing the supply and demand of traffic [16].

VII. LIMITATIONS AND FUTURE WORK

This section discusses the limitations of our study and points to several directions for future work.

Geo-distributed Starlink testbed. Our evaluation setup considered a single node connected to Starlink. Recent studies [16], [52] have shown that the performance of Starlink client nodes may vary across different continents (or even countries) due to the configuration of the satellite constellation and the number of active subscribers in specific regions. Future work includes the deployment of additional Starlink data collection nodes in different points of the globe, towards understanding whether our findings generalize across locations.

Considering the influence of weather. The weather experienced by our Starlink node during the data collection period was characterized by a clear sky. However, past studies on LEO satellite performance have reported that different weather conditions might affect satellite Internet performance (e.g., imposing additional jitter and latency) [16], [52]. An interesting direction for future work includes the collection of website access traces under different weather conditions (e.g., clouds, rain, snow, etc.) towards assessing whether these conditions result in significant differences in the ability of an adversary to perform accurate website fingerprinting.

Browsing Tor via satellite hopping. Our study is limited to investigating the effectiveness of website fingerprinting attacks over the traffic of users who are connected to the Internet via Starlink. However, it does not consider a potential scenario where some or all of the Tor nodes comprising a circuit are also connected via Starlink up/downlinks. Creating a testbed where multiple legs of a Tor circuit are connected via Starlink (e.g., enabled through Starlink Business fixed sites [40], which provide public IPs) is an interesting direction for future work.

Lack of open-world experiments. Our study focused on website fingerprinting in the closed-world setting. We aim to extend our study to also consider the open-world setting.

Considering performance enhancing proxies (PEPs). PEPs [30] are often deployed on satellite links to improve TCP’s performance over them [20]. QPEP [25] wraps users’ TCP traffic within a QUIC-based encrypted tunnel to improve performance while protecting users’ traffic against eavesdropping. However, QPEP does not actively attempt to shape traffic patterns, and QUIC traffic has been found to be vulnerable to fingerprinting [36]. Studying the resistance against website fingerprinting provided by QPEP deployments over Starlink is an interesting direction for future work.

VIII. CONCLUSIONS

In this paper, we evaluated the effectiveness of website fingerprinting attacks on Tor connections established via Starlink, a prominent LEO satellite constellation providing satellite Internet services. Through a synchronous collection of web browsing traces over traditional fiber and Starlink, we characterized Tor accesses over both kinds of links, and compared the effectiveness of website fingerprinting attacks when applied to these different networking settings. Our findings suggest that undefended Tor traffic is equally fingerprintable over Starlink and fiber and that defenses, while effective, may be further parameterized to trade off security against network efficiency.

ACKNOWLEDGMENTS

This work was supported in part by NSERC under grant DGECR-2023-00037, by the NRC-Waterloo Collaboration Center under project reference number 090755, and by the EPSRC New Investigator Award (ref: EP/V011294/1). We also thank Aravindh Raman for helpful discussions about this work.
REFERENCES

[1] S. Bhat, D. Lu, A. Kwon, and S. Devadas, “Var-cnn: A data-efficient website fingerprinting attack based on deep learning,” Proceedings on Privacy Enhancing Technologies, vol. 2019, no. 4, pp. 292–310, 2019.
[2] V. Bhosale, A. Saeed, K. Bhardwaj, and A. Gavrilovska, “A characterization of route variability in leo satellite networks,” in Proceedings of the International Conference on Passive and Active Network Measurement. Springer, 2023, pp. 313–342.
[3] N. Bui and J. Widmer, “Owl: A reliable online watcher for lte control channel measurements,” in Proceedings of the 5th Workshop on All Things Cellular: Operations, Applications and Challenges, 2016, pp. 25–30.
[4] X. Cai, R. Nithyanand, and R. Johnson, “Cs-buflo: A congestion sensitive website fingerprinting defense,” in Proceedings of the 13th Workshop on Privacy in the Electronic Society, 2014, pp. 121–130.
[5] X. Cai, R. Nithyanand, T. Wang, R. Johnson, and I. Goldberg, “A systematic approach to developing and evaluating website fingerprinting defenses,” in Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, 2014, pp. 227–238.
[6] G. Cherubin, “Bayes, not naïve: Security bounds on website fingerprinting defenses,” Proceedings on Privacy Enhancing Technologies, vol. 4, pp. 135–151, 2017.
[7] G. Cherubin, R. Jansen, and C. Troncoso, “Online website fingerprinting: Evaluating website fingerprinting attacks on tor in the real world,” in Proceedings of the 31st USENIX Security Symposium, 2022, pp. 753–770.
[8] R. Dingledine, N. Mathewson, P. F. Syverson et al., “Tor: The second-generation onion router.” in Proceedings of the USENIX Security Symposium, vol. 4, 2004, pp. 303–320.
[9] J. Gong and T. Wang, “Zero-delay lightweight defenses against website fingerprinting,” in Proceedings of the 29th USENIX Security Symposium, 2020, pp. 717–734.
[10] J. Gong, W. Zhang, C. Zhang, and T. Wang, “Wfdefproxy: Modularly implementing and empirically evaluating website fingerprinting defenses,” arXiv preprint arXiv:2111.12629, 2021.
[11] J. Hayes and G. Danezis, “k-fingerprinting: A robust scalable website fingerprinting technique.” in Proceedings of the 25th USENIX Security Symposium, 2016, pp. 1187–1203.
[12] T. E. Humphreys, P. A. Iannucci, Z. M. Komodromos, and A. M. Graff, “Signal structure of the starlink ku-band downlink,” IEEE Transactions on Aerospace and Electronic Systems, p. 1–16, 2023.
[13] R. Jansen and R. Wails, “Data-explainable website fingerprinting with network simulation,” Proceedings on Privacy Enhancing Technologies, vol. 4, pp. 559–577, 2023.
[14] M. Juarez, S. Afroz, G. Acar, C. Diaz, and R. Greenstadt, “A critical evaluation of website fingerprinting attacks,” in Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, Scottsdale, Arizona, USA, 2014, p. 263–274.
[15] M. Juarez, M. Imani, M. Perry, C. Diaz, and M. Wright, “Toward an efficient website fingerprinting defense,” in Proceedings of the European Symposium on Research in Computer Security, 2016, pp. 27–46.
[16] M. M. Kassem, A. Raman, D. Perino, and N. Sastry, “A browser-side view of starlink connectivity,” in Proceedings of the 22nd ACM Internet Measurement Conference, 2022, pp. 151–158.
[17] K. Kohls, D. Rupprecht, T. Holz, and C. Pöpper, “Lost traffic encryption: fingerprinting lte/4g traffic on layer two,” in Proceedings of the 12th Conference on Security and Privacy in Wireless and Mobile Networks, 2019, pp. 249–260.
[18] S. Ma, Y. C. Chou, H. Zhao, L. Chen, X. Ma, and J. Liu, “Network characteristics of leo satellite constellations: A starlink-based measurement from end users,” in Proceedings of the 2023 IEEE Conference on Computer Communications, 2023, pp. 1–10.
[19] N. Mathews, J. K. Holland, S. E. Oh, M. S. Rahman, N. Hopper, and M. Wright, “Sok: A critical evaluation of efficient website fingerprinting defenses,” in Proceedings of the 44th IEEE Symposium on Security and Privacy, 2023, pp. 969–986.
[20] F. Michel, M. Trevisan, D. Giordano, and O. Bonaventure, “A First Look at Starlink Performance,” in Proceedings of the 22nd ACM Internet Measurement Conference, Nice, France, October 2022, pp. 130–136.
[21] M. Neinavaie and Z. M. Kassas, “Signal mode transition detection in starlink leo satellite downlink signals,” in 2023 IEEE/ION Position, Location and Navigation Symposium (PLANS), 2023, pp. 360–364.
[22] One Web, https://oneweb.net/, last accessed: 2024-02-09.
[23] Open source SDR 4G software suite, https://github.com/srsran/srsRAN_4G, last accessed: 2024-02-09.
[24] A. Panchenko, F. Lanze, J. Pennekamp, T. Engel, A. Zinnen, M. Henze, and K. Wehrle, “Website fingerprinting at internet scale.” in Proceedings of the 23rd Annual Network and Distributed System Security Symposium, 2016.
[25] J. Pavur, M. Strohmeier, V. Lenders, and I. Martinovic, “QPEP: An Actionable Approach to Secure and Performant Broadband From Geostationary Orbit,” in Proceedings of the 28th Network and Distributed System Security Symposium, 2021.
[26] J. Pavur and I. Martinovic, “Sok: Building a launchpad for impactful satellite cyber-security research,” in arXiv cs.CR 2010.10872, 2020.
[27] V. L. Pochat, T. Van Goethem, S. Tajalizadehkhoob, M. Korczyński, and W. Joosen, “Tranco: A research-oriented top sites ranking hardened against manipulation,” in Proceedings of the Network and Distributed Systems Security Symposium, 2019.
[28] M. S. Rahman, P. Sirinam, N. Mathews, K. G. Gangadhara, and M. Wright, “Tik-tok: The utility of packet timing in website fingerprinting attacks,” Proceedings on Privacy Enhancing Technologies, vol. 2020, no. 3, 2020.
[29] A. Raman, M. Varvello, H. Chang, N. Sastry, and Y. Zaki, “Dissecting the performance of satellite network operators,” Proc. ACM Netw., vol. 1, no. CoNEXT3, nov 2023.
[30] RFC 3135 - Performance Enhancing Proxies Intended to Mitigate Link-Related Degradations, https://www.rfc-editor.org/rfc/rfc3135, last accessed: 2024-02-09.
[31] RFC 8446 - The Transport Layer Security (TLS) Protocol Version 1.3, https://www.rfc-editor.org/rfc/rfc8446, last accessed: 2024-02-09.
[32] V. Rimmer, D. Preuveneers, M. Juarez, T. Van Goethem, and W. Joosen, “Automated website fingerprinting through deep learning,” in Proceedings of the 25th Network and Distributed Systems Security Symposium, 2018.
[33] Rogers Media Inc. v. John Doe 1, 2022 FC 775 (CanLII), https://canlii.ca/t/jpncf, last accessed: 2024-02-09.
[34] D. Rupprecht, K. Kohls, T. Holz, and C. Pöpper, “Breaking lte on layer two,” in Proceedings of the IEEE Symposium on Security and Privacy, 2019, pp. 1121–1136.
[35] M. Shen, K. Ji, Z. Gao, Q. Li, L. Zhu, and K. Xu, “Subverting website fingerprinting defenses with robust traffic representation,” in Proceedings of the 32nd USENIX Security Symposium, 2023, pp. 607–624.
[36] S. Siby, L. Barman, C. Wood, M. Fayed, N. Sullivan, and C. Troncoso, “Evaluating practical quic website fingerprinting defenses for the masses,” Proceedings on Privacy Enhancing Technologies, vol. 4, pp. 79–95, 2023.
[37] P. Singh, D. Barradas, T. Elahi, and N. Limam, “Website access traces for Tor/Firefox over Starlink and fiber links,” Feb. 2024. [Online]. Available: https://doi.org/10.5281/zenodo.10641853
[38] P. Sirinam, M. Imani, M. Juarez, and M. Wright, “Deep fingerprinting: Undermining website fingerprinting defenses with deep learning,” in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, 2018, pp. 1928–1943.
[39] Starlink, https://www.starlink.com/, last accessed: 2024-02-09.
[40] Starlink Business - Fixed Site, https://www.starlink.com/business/fixed-site, last accessed: 2024-02-09.
[41] Starlink Fair Use Policy, https://www.starlink.com/legal/documents/DOC-1134-82708-70, last accessed: 2024-02-09.
[42] Q. Sun, D. R. Simon, Y.-M. Wang, W. Russell, V. N. Padmanabhan, and L. Qiu, “Statistical identification of encrypted web browsing traffic,” in Proceedings of the 23rd IEEE Symposium on Security and Privacy, Oakland, CA, USA, 2002, pp. 19–30.
[43] Tranco list generated on September 2022, https://tranco-list.eu/list/82GXV, last accessed: 2024-02-09.
[44] F. Vatalaro, G. Corazza, C. Caini, and C. Ferrarelli, “Analysis of leo, meo, and geo global mobile satellite systems in the presence of interference and fading,” IEEE Journal on Selected Areas in Communications, vol. 13, no. 2, pp. 291–300, 1995.
[45] A. Veicht, C. Renggli, and D. Barradas, “Deepse-wf: Unified security estimation for website fingerprinting defenses,” Proceedings on Privacy Enhancing Technologies, vol. 2023, no. 2, 2023.
[46] T. Wang, X. Cai, R. Nithyanand, R. Johnson, and I. Goldberg, “Effective attacks and provable defenses for website fingerprinting.” in Proceedings of the 23rd USENIX Security Symposium, 2014, pp. 143–157.
[47] T. Wang and I. Goldberg, “Improved website fingerprinting on tor,” in Proceedings of the 12th ACM workshop on Workshop on privacy in the electronic society, 2013, pp. 201–212.
[48] Y. Zhang, Q. Wu, Z. Lai, and H. Li, “Enabling Low-latency-capable Satellite-Ground Topology for Emerging LEO Satellite Networks,” in Proceedings of the 2022 IEEE Conference on Computer Communications, Virtual Event, May 2022, pp. 1329–1338.
[49] Y. Zhang, S. Zhao, J. He, Y. Zhang, Y. Shen, X. Jiang et al., “A survey of secure communications for satellite internet based on cryptography and physical layer security,” IET Information Security, vol. 2023, 2023.
[50] Z. Zhang, T. Vaidya, K. Subramanian, W. Zhou, and M. Sherr, “Ephemeral exit bridges for tor,” in Proceedings of the 50th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2020, pp. 253–265.
[51] Z. Zhang, W. Zhou, and M. Sherr, “Bypassing tor exit blocking with exit bridge onion services,” in Proceedings of the 2020 ACM SIGSAC Conference on Computer and Communications Security, 2020, p. 3–16.
[52] H. Zhao, H. Fang, F. Wang, and J. Liu, “Realtime multimedia services over starlink: A reality check,” in Proceedings of the 33rd Workshop on Network and Operating System Support for Digital Audio and Video, 2023, pp. 43–49.

APPENDIX

A. Websites included in our dataset

Listing 1 includes the websites contained in our dataset (drawn from the September 2022 Tranco list [43]). The majority of these websites are still considered relevant: 75% of the websites we consider are within the top 250 websites in the January 2024 Tranco list (the most up to date list upon this work’s submission for peer review).

 0. adobe.com              38. mozilla.org
 1. amazon.co.jp           39. msn.com
 2. amazon.in              40. naver.com
 3. apache.org             41. netflix.com
 4. apple.com              42. nytimes.com
 5. azure.com              43. office365.com
 6. bbc.co.uk              44. opera.com
 7. bbc.com                45. oracle.com
 8. bing.com               46. outlook.com
 9. bit.ly                 47. paypal.com
10. booking.com            48. pornhub.com
11. cdc.gov                49. reddit.com
12. cnn.com                50. reuters.com
13. digicert.com           51. salesforce.com
14. dnsmadeeasy.com        52. salesforceliveagent.com
15. doubleclick.net        53. skype.com
16. dropbox.com            54. soundcloud.com
17. ebay.com               55. sourceforge.net
18. etsy.com               56. spotify.com
19. facebook.com           57. stackoverflow.com
20. fandom.com             58. t.me
21. fastly.net             59. telegram.org
22. fbcdn.net              60. theguardian.com
23. flickr.com             61. tiktok.com
24. force.com              62. tumblr.com
25. gandi.net              63. twitch.tv
26. github.com             64. vimeo.com
27. github.io              65. w3.org
28. google-analytics.com   66. weebly.com
29. googledomains.com      67. wellsfargo.com
30. icloud.com             68. whatsapp.com
31. instagram.com          69. wikimedia.org
32. intuit.com             70. wikipedia.org
33. issuu.com              71. xvideos.com
34. linode.com             72. yahoo.co.jp
35. live.com               73. youtube.com
36. mail.ru                74. zemanta.com
37. microsoft.com

Listing 1: List of websites considered in our experiments.

over 86% for both connection types but that n = 5 000 provides an adequate trade-off between the length of input traces and the accuracy obtained by the classifier.
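For reference, the fixed-length input used by the deep learning attacks in Section V-B (traces zero-padded or truncated to n = 5 000 cells) can be sketched as follows. This is a minimal illustration and not the DF or Tik-Tok code: the trace format, a list of (timestamp, direction) pairs with direction +1 for outgoing and -1 for incoming cells, is an assumption, as is the choice of encoding the direction+timing representation as the timestamp multiplied by the direction.

```python
def to_fixed_length(values, n=5000):
    """Truncate to n entries, or pad with trailing zeros up to n entries."""
    return values[:n] + [0.0] * max(0, n - len(values))


def direction_representation(trace, n=5000):
    """Sequence of cell directions (+1.0 outgoing, -1.0 incoming), fixed to length n."""
    return to_fixed_length([float(direction) for _, direction in trace], n)


def directional_timing_representation(trace, n=5000):
    """Sequence of timestamp * direction values, fixed to length n (assumed encoding)."""
    return to_fixed_length([t * direction for t, direction in trace], n)


trace = [(0.0, 1), (0.5, -1), (1.2, -1)]
print(direction_representation(trace, n=5))
print(directional_timing_representation(trace, n=2))
```

With the default n, every trace maps to a vector of exactly 5 000 entries regardless of how many cells it originally contained.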