Snipuzz: Black-box Fuzzing of IoT Firmware via Message Snippet Inference

Xiaotao Feng∗ , Ruoxi Sun† , Xiaogang Zhu∗‡ , Minhui Xue† ,

Sheng Wen∗ , Dongxi Liu‡ , Surya Nepal‡ , Yang Xiang∗
∗ Swinburne University of Technology, Australia
† The University of Adelaide, Australia
‡ CSIRO Data61, Australia
arXiv:2105.05445v2 [cs.CR] 21 May 2021

ABSTRACT
The proliferation of Internet of Things (IoT) devices has made people’s lives more convenient, but it has also raised many security concerns. Due to the difficulty of obtaining and emulating IoT firmware, in the absence of internal execution information, black-box fuzzing of IoT devices has become a viable option. However, existing black-box fuzzers cannot form effective mutation optimization mechanisms to guide their testing processes, mainly due to the lack of feedback. In addition, because of the prevalent use of various and non-standard communication message formats in IoT devices, it is difficult or even impossible to apply existing grammar-based fuzzing strategies. Therefore, an efficient fuzzing approach with syntax inference is required in the IoT fuzzing domain.
To address these critical problems, we propose a novel automatic black-box fuzzing for IoT firmware, termed Snipuzz. Snipuzz runs as a client communicating with the devices and infers message snippets for mutation based on the responses. Each snippet refers to a block of consecutive bytes that reflect the approximate code coverage in fuzzing. This mutation strategy based on message snippets considerably narrows down the search space to change the probing messages. We compared Snipuzz with four state-of-the-art IoT fuzzing approaches, i.e., IoTFuzzer, BooFuzz, Doona, and Nemesys. Snipuzz not only inherits the advantages of app-based fuzzing (e.g., IoTFuzzer), but also utilizes communication responses to perform efficient mutation. Furthermore, Snipuzz is lightweight as its execution does not rely on any prerequisite operations, such as reverse engineering of apps. We also evaluated Snipuzz on 20 popular real-world IoT devices. Our results show that Snipuzz could identify 5 zero-day vulnerabilities, and 3 of them could be exposed only by Snipuzz. All the newly discovered vulnerabilities have been confirmed by their vendors.

ACM Reference Format:
Xiaotao Feng, Ruoxi Sun, Xiaogang Zhu, Minhui Xue, Sheng Wen, Dongxi Liu, Surya Nepal, and Yang Xiang. 2021. Snipuzz: Black-box Fuzzing of IoT Firmware via Message Snippet Inference. In 2021 ACM SIGSAC Conference on Computer and Communications Security (CCS ’21), November 14–19, 2021, Virtual Event, South Korea. ACM, New York, NY, USA, 15 pages. https://doi.org/10.1145/1122445.1122456

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
CCS 2021, 14 - 21 November, 2021, Seoul, South Korea
© 2021 Association for Computing Machinery.
ACM ISBN 978-1-4503-XXXX-X/18/06. . . $15.00
https://doi.org/10.1145/1122445.1122456

1 INTRODUCTION
The Internet of Things (IoT) refers to the billions of physical devices around the world which are now connected to the Internet, all collecting and sharing data. As early as 2017, IoT devices have outnumbered the world’s population [39], and by 2020, every person on this planet has four IoT devices on average [23]. While these devices enrich our lives and industries, unfortunately, they also introduce blind spots and security risks in the form of vulnerabilities. We take Mirai [25] as an example. Mirai is one of the most prominent types of IoT botnet malware. In 2016, Mirai took down widely-used websites in a distributed denial of service (DDoS) campaign consisting of thousands of compromised household IoT devices. In the case of Mirai, attackers exploited vulnerabilities to target IoT devices themselves and then weaponized the devices for larger campaigns or spreading malware to the network. In fact, attackers can also use vulnerable devices for lateral movement, allowing them to reach critical targets. For example, in the work-from-home scenarios during COVID-19, Trend Micro has reported that, introducing vulnerable IoT devices to the household will expose employees to malware and attacks that could slip into a company’s network [26]. Considering the ubiquity of IoT devices, we believe that these known security incidents and risky scenarios are nothing but a tip of the iceberg.
IoT vulnerabilities are normally about the implementation flaws within a device’s firmware. To launch new products as soon as possible, developers always tend to use open-source components in firmware development without good update plans [1]. This sacrifices the security of IoT devices and exposes them to vulnerabilities that security teams cannot remedy quickly. Even if vendors plan to fix the vulnerabilities in their products, the over-the-air patching is usually infeasible because IoT devices do not have reliable network connectivity [16]. As a result, half of the IoT devices in the market were reported to have vulnerabilities [28].
It is hence crucial to discover such vulnerabilities and fix them before an attacker does. However, most IoT software security tests heavily rely on the assumption of device firmware availability. In many cases, manufacturers tend not to release their product firmware and that makes various dynamic analysis methods based on code analysis [7, 13, 15, 18, 32, 46] (or emulation [8, 10, 20, 50, 51]) difficult. Among the existing defense techniques, fuzz testing has shown promises to overcome these issues and has been widely used as an efficient approach in finding vulnerabilities. Moreover, the ability of IoT devices to communicate with the outside world
offers us a new option, and that is to test device firmware through exchanging network messages. Therefore, an IoT fuzzer could be designed to send random communication messages to the target device in order to detect if it shows any symptoms of malfunctioning. Potential vulnerabilities could be exposed if crashes are triggered during execution or the device is pushed to send back abnormal messages.
However, using network communication to fuzz the firmware of IoT devices is very challenging. Since obtaining internal execution information from the device is not possible, most existing network IoT fuzzers [9, 31, 44] work in a black-box manner. This makes optimizing the mutation strategies very difficult. Because the selection of mutated seeds is entirely random, existing black-box IoT fuzzing approaches could become very hard to handle, and sometimes, even become more like brute force crack testing. In addition, IoT devices have strict grammatical specifications for inputs in communication. Most of the messages that are generated by random mutation will break the syntax rules of the input, and will be quickly rejected during syntax validation in the firmware before being executed. A grammar-based mutation strategy [2, 40] can effectively generate messages that meet the input requirements though. This can be done by learning the syntax via documented grammatical specifications or from a labeled training set. However, as shown in Table 1, many non-standard IoT device communication formats are being used in practice. Therefore, preparing enough learning materials for grammar-based mutation strategies is a huge workload, which makes the deployment of grammar-based IoT fuzzing difficult.

Table 1: Format requirements of IoT Devices.
#   Device Type       Vendor      Model          Firmware Version     Format
1   Smart Bulb        Yeelight    YLDP05YL       1.4.2_0016           JSON
2   Smart Bulb        Yeelight    YLDP13YL       1.4.2_0016           JSON
3   Smart Bulb        Philips     A60            1.46.13_r26312       JSON
4   Smart Bulb        LIFX        Mini C         v3.60                Custom Byte
5   Smart Bulb        FloodLight  BR30           35.V7.63.7189-A      Custom Byte
6   Home Bridge       Philips     Hue            1935144040           JSON
7   Home Bridge       Alro        Base Station   1.12.2.8_9_fc4b603   JSON
8   Smart Plug        Tplink      HS100          1.5.2                JSON
9   Smart Plug        Tplink      HS110          1.5.2                JSON∗
10  Smart Plug        Belkin      WeMo F7C027au  2.00.1821            SOAP
11  Smart Plug        Meross      MSS310         2.1.14               JSON∗
12  Smart Plug        Orvibo      B25AUS         v3.1.3               JSON
13  Smart Plug        Konke       Mini US        us1.1.0              String
14  Smart Plug        Broadlink   SP4L-AU        v57209               Custom Byte
15  Router            Netgear     R6400          1.0.1.46             SOAP∗
16  TA Assistant      ZKteco      WL10           ZLM-FX1-3.0.23       Custom Byte
17  Camera            Alro        Alro Pro 2     1.125.14.0_34_1189   JSON∗
18  Camera            Foscam      F19821W        2.21.1.127           JSON∗
19  NAS               QNAP        T-131P         4.3.6.0959           Key-value pairs
20  Universal Remote  BroadLink   RM mini 3      v44057               Custom Byte
∗: have randomness in response.

Challenges. In this paper, we focus on detecting vulnerabilities in IoT firmware by sending messages to IoT devices. To design an effective and efficient fuzzing method, several challenges have to be overcome.
• Challenge 1: Lack of a feedback mechanism. Without access to firmware, it is nearly impossible to obtain the internal execution information from IoT device to guide the fuzzing process (as is done in most typical fuzzers). Therefore, we need a lightweight solution to obtain feedback from device, and optimize the generation process.
• Challenge 2: Diverse message formats. Table 1 shows some message formats that are used in IoT communication, including JSON, SOAP, Key-value pairs, string, or even customized formats. In order to be applied to various devices, a solution should be able to infer the format from a raw message.
• Challenge 3: Randomness in responses. The response messages of an IoT device may contain random elements, such as timestamps or tokens. Such randomness results in different responses for the same message, and diminishes the effectiveness of fuzzing because the input generation of Snipuzz relies on responses.
Our approach. In this paper, we propose a novel and automatic black-box IoT fuzzing, named Snipuzz, to detect vulnerabilities in IoT firmware. Different from other existing IoT fuzzing approaches, Snipuzz implements a snippet-based mutation strategy which utilizes feedback from IoT devices to guide the fuzzing. Specifically, Snipuzz uses a novel heuristic algorithm to detect the role of each byte in the message. It will first mutate bytes in a message one by one to generate probe messages, and categorize the corresponding responses collected from device. Adjacent bytes that have the same role in the message form the initial message snippets, which is the basic unit of mutation. Moreover, Snipuzz utilizes a hierarchical clustering strategy to optimize mutation strategies and reduce the misclassification of categories caused by randomness in the response messages and the firmware’s internal mechanism. Therefore, Snipuzz, as a black-box fuzzer, can still effectively test the firmware of IoT devices without the support of grammatical rules and internal execution information of the device.
Snipuzz resolves Challenge 1 by using responses as the guidance to optimize the fuzzing process. Based on the responses, Snipuzz designs a novel heuristic algorithm to initially infer the role of each byte in the message, which resolves Challenge 2. Snipuzz utilizes edit distance [42] and agglomerative hierarchical clustering [43] to resolve Challenge 3. We summarize our main contributions as follows:
• Message snippet inference mechanism. The responses from IoT devices are related to code execution path in firmware. Based on responses, we infer the relationship between message snippets and code execution path in firmware. This novel mutation mechanism enables that Snipuzz does not need any syntax rules to infer the hidden grammatical structure of the input through the device responses. Compared with the actual syntax rules that determine the input string format, the result of snippet determination proposed by Snipuzz has a similarity of 87.1%.
• More effective IoT fuzzing. When testing IoT devices, the number of response categories is positively correlated with the number of code execution paths in the firmware. In the experiment, the number of response categories explored by Snipuzz far exceeded other methods on most devices, no matter how long the analysis duration was (in 10 minutes or 24 hours).
• Implementation and vulnerability findings. We implemented the prototype of Snipuzz (publicly available at https://github.com/XtEsco/Snipuzz). We used it to test 20 real-world consumer-grade IoT devices while comparing with the state-of-the-art fuzzing tools, i.e., IoTFuzzer, Doona, Boofuzz, and Nemesys. In 5 out of 20 devices, Snipuzz successfully found 5 zero-day vulnerabilities, including null pointer exceptions, denial
of service, and unknown crashes, and 3 of them could be exposed only by Snipuzz.

2 BACKGROUND

2.1 Fuzz Testing
Fuzzing is a powerful automatic testing tool to detect software vulnerabilities. After decades of development, fuzzing has been widely used as a base in several security testing domains, such as the OS kernel [12, 36], servers [33], and the blockchain [3].
In general, fuzzing feeds the target programs with numerous mutated inputs and monitors exceptions (e.g., crashes). If an execution reveals undesired behavior, a vulnerability could be detected. To discover vulnerabilities more effectively, fuzzing algorithms optimize the mutation process based on feedback of executions (e.g., coverage knowledge), instead of using a purely random mutation strategy. Moreover, fuzzers can judge from the feedback mechanism whether each test case generated by seed mutation is “interesting” (i.e., whether the test case has explored unseen execution states). If a test case is interesting, it will be reserved as a new seed to participate in future mutation. With the feedback, many fuzzers [4, 5, 29, 41, 49] steer the computing resources towards the interesting test cases and achieve higher possibility to discover vulnerabilities.

2.2 Generic Communication Architecture of IoT Devices
To react with external inputs, most IoT devices implement a similar high-level communication architecture. As per the pseudo code example presented in Figure 1, a typical implementation of the communication architecture may consist of four parts: 1) Sanitizer, 2) Function Switch, 3) Function Definitions, and 4) Replier.
When an IoT device receives an external input, Sanitizer starts parsing the input and performs regular matching. If the input format breaches the syntactic requirements, or an exception occurs during the parsing process, Sanitizer will directly notify Replier by sending a response message describing the input error and terminate the processing of input. If the input is syntactically correct, Function Switch transfers control to the corresponding Functions according to the attribute, Key, and corresponding value, val, extracted from the input. If Key cannot be matched, the processing of this input will be terminated, similarly as done by Replier. When Functions completes the processing, such as setFlow(), with the parameter val, it notifies Replier to generate the response message. Note that, the implementation of Functions is specific to IoT devices.
As described above, Replier is responsible for sending responses to the client (such as the user’s APP). Based on the calling situation (indicated by the parameter code in the example), Replier determines the content of response message to be sent.

3 MOTIVATION

3.1 Response-Based Feedback Mechanism
The interactive capabilities of IoT devices make it possible to test security of device firmware through the network. However, there are also some challenges when testing IoT devices using network-based fuzzers. Since most network fuzzing methods cannot directly obtain execution status of the device, it is hard to establish an effective feedback mechanism to guide the fuzzing process. Without feedback mechanism, the fuzzing tests could be blind in the selection of mutation targets, and may lean to a brute force random test.
As discussed previously, due to the lack of open-sourced firmware, it is difficult or even impossible to instrument the IoT devices. Therefore, the response messages returned by the firmware can be regarded as a valuable source of device status information at run-time. The Replier in Figure 1 will use the value of the variable code to determine the content of the response messages. The value of code comes from many different function blocks in the firmware. Parameters are passed when Sanitizer fails to parse the input or some exceptions are triggered; or when the Function Switch cannot match the key command characters in the input; or after each input is executed in the Functions. Therefore, through the content of the response message, the code block that has been executed in the firmware can be inferred. When the firmware source code is not available, the correspondence between the firmware execution and the response messages cannot be directly extracted. Moreover, the firmware may return the same response messages even executing different functions.
Although the response message cannot be equated to the execution path of the device, it can still play an important role in the black-box fuzz testing for IoT devices. Although it is hard to link the code execution path corresponding to each response message, if the two inputs get different response messages, we can deduce that the two inputs go to different firmware code execution paths.
Our approach. Snipuzz uses the response message to establish a new feedback mechanism. Snipuzz will collect every response, and when a new response is found, the input corresponding to the response will be queued as a seed for subsequent mutation testing.

3.2 Message Snippet Inference
The firmware of the IoT device can be regarded as a software program with strict syntax requirements for input. If the byte-based mutation strategies (such as mutating each byte in the input one by one or randomly selecting bytes for mutation testing) are used in the fuzz testing, the generated test cases could be rare to meet the input syntax requirements. The grammar-based fuzzers utilize detailed documents or a large training data set to learn the grammatical rules and use it to guide the generation of mutation [34, 40].
In many cases, the input syntax in IoT devices is diverse or non-standard. Table 1 shows the communication format requirements used in 20 IoT devices from different vendors. Some of them are using well-known formats such as JSON and SOAP, but some use Key-value pairs or even custom strings as communication format. Therefore, it is difficult to provide grammar specifications or establish training data sets that cover communication formats on a large scale for the grammar-based mutation strategy.
The best grammar guidance originates from the firmware itself. Responses from IoT devices suggest the execution results of messages. If we mutate a valid message byte by byte (i.e., breaching the format), we will get many different responses. If mutation of two different positions in the valid message receives the same response, these two positions have a high possibility that they are related to the same functionality in firmware. Therefore, those consecutive bytes with the same response can be merged into one snippet. This
method of inferring message snippets can clearly reflect the utility of each byte after entering the firmware. In addition, mutation based on message snippets can largely reduce the search space and improve the efficiency of fuzzing.
Our approach. Snipuzz merges consecutive bytes with the same response into one snippet. We also propose different mutation operators performing on snippets.

Figure 1: Interaction with IoT Firmware. Most implementations of IoT devices have a similar communication architecture, including Sanitizer, Function Switch, Function Definitions, and Replier. If the Sanitizer and the Function Switch perform correctly, corresponding functionalities will be executed. Except for crashes, the Replier will always send responses to clients.

4 METHODOLOGY
In order to clearly present our approach, we first introduce some notations while explaining the fuzzing process of Snipuzz. At a high level, Snipuzz performs as a client which sends a message sequence $M$ to request certain actions from IoT devices. Any message $m \in M$ requests the IoT device to perform a certain functionality, and all the messages $\bigcup_k m_k = M$ work together to request an action (or actions). Similarly to the typical fuzzers, we initialize a seed $S$ with an initial message sequence, and a seed corpus $C$ with all the seeds (Section 4.1). Meanwhile, restoring message sequences are collected for resetting the IoT device to a predefined status.
To fuzz effectively, as depicted in Figure 2, Snipuzz first conducts a snippet determination process. Concretely, Snipuzz selects a message $m$ in a seed $S \subset C$, from which a probe message $p_i$ and a corresponding sequence $M_i$ will be generated. Each message in $M_i$ will trigger a response message $r_i$ (response for short) containing the information about the execution output. Snipuzz assigns each message $m$ a response pool $R$, which is utilized to determine if a new response $r_i$ is unique. The uniqueness of a response indicates that it does not belong to any category of responses existing in the response pool. If $r_i$ is unique, Snipuzz will add $r_i$ into the pool $R$, and reserve the corresponding message sequence $M_i$ as a new seed. Snipuzz then divides the message $m$ into different snippets based on the responses (Section 4.2). Once the snippets are obtained, Snipuzz performs mutation according to various strategies, e.g., empty, bytes flip, data boundary, or havoc (detailed in Section 4.3). Throughout the fuzzing process, Snipuzz sets up a network monitor to detect crashes which may indicate vulnerabilities (Section 4.4).

4.1 Message Sequence Acquisition
The quality of initial seeds could influence the fuzzing campaigns significantly. Therefore, we aim to obtain high-quality initial seeds conforming to highly-structured formats required by IoT devices, as such inputs may exercise complex execution paths and enlarge the opportunity of exposing vulnerabilities deep in the code. Generating seeds based on companion app reverse-engineering [9] or accessible specifications (as mentioned in Section 3.2) could be intuitive solutions. However, they either require heavy engineering efforts or could be error-prone (e.g., seeds may violate the required formats or have the wrong order of messages).
Initial seed acquisition. Snipuzz proposes a lightweight solution to obtain initial valid seeds. Considering that many IoT devices have first- or third-party API documents as well as the test suites, the testing programs provided by both parties can effectively act as a client, sending control commands to IoT devices or remote servers. Most structural information (e.g., header, message content) and protocols (e.g., HTTP, HNAP, MQTT) of communication packets are defined in the API programs as message payloads. Therefore, Snipuzz leverages these test suites to communicate with the target devices, while at the same time, extracting the message sequences as initial seeds. For example, when using an API program to turn on a light bulb, the program first sends login information to the server or to the IoT device, then sends a message to locate a specific light bulb device, and finally sends a message to control the device to turn on the light. Snipuzz captures such a message sequence that triggers a functionality of IoT device as an initial seed.
Restoring message sequence acquisition. In order to replay a test case for the crash triage, Snipuzz ensures that the device under test has the same initial state in each round of testing. After sending any message sequence to the device, Snipuzz will send a restoring message sequence to reset the device to a predefined status.
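The response-pool feedback and the restoring step described in this section can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: `run_round`, `toy_device`, and the type aliases are hypothetical names, the `send` callable stands in for real network I/O, and exact byte equality stands in for the response categorization that Snipuzz actually uses.

```python
from typing import Callable, List, Sequence, Set

# Hypothetical types for illustration: a message is raw bytes and a
# message sequence is a list of messages sent in order.
Message = bytes
MessageSequence = List[Message]


def run_round(
    sequences: Sequence[MessageSequence],
    send: Callable[[MessageSequence], bytes],
    restore: MessageSequence,
    pool: Set[bytes],
    corpus: List[MessageSequence],
) -> None:
    """Send each sequence, keep those with unseen responses, then reset."""
    for seq in sequences:
        response = send(seq)
        # Assumption: exact equality replaces the category check.
        if response not in pool:
            pool.add(response)
            corpus.append(seq)  # reserve the sequence as a new seed
        send(restore)  # return the device to a predefined state


def toy_device(seq: MessageSequence) -> bytes:
    """Stand-in for a real device: only checks the outer braces."""
    last = seq[-1]
    if not (last.startswith(b"{") and last.endswith(b"}")):
        return b"invalid json"
    return b"ok"


if __name__ == "__main__":
    pool: Set[bytes] = set()
    corpus: List[MessageSequence] = []
    probes = [[b'{"on":true}'], [b'"on":true}'], [b'{"on":true']]
    run_round(probes, toy_device, [b'{"on":false}'], pool, corpus)
    print(len(pool), len(corpus))
```

Passing `send` as a parameter keeps the loop independent of the transport, so the same sketch could drive HTTP, MQTT, or a raw socket under the stated assumptions.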
Manual efforts. Although we strive to provide a lightweight fuzzer, Snipuzz still requires some manual efforts to obtain valid and usable initial seeds. First, we manually configure the programs from the test suites, such as setting up the IP address and the login information. Note that, we only need to configure these programs once per device. Second, to capture the message sequences dynamically, we need to manually define the specific format and protocol in the network traffic monitor. Finally, we filter out some message sequences that will mislead the fuzzing process. For instance, some API programs provide operations that can automatically update or restart the device. These operations will halt the device and thus no response will be sent back. This leads to false-positive crashes because we consider a no-response execution as a crash. The manual work costs roughly 5 man-hours per device and is only required during the message sequence acquisition phase of Snipuzz.

Figure 2: Workflow of Snipuzz. With the valid message sequences (seeds), Snipuzz performs snippet determination on each individual message. Then, Snipuzz mutates snippet(s) to generate new message sequences. By monitoring the network traffic, Snipuzz determines a crash when no responses are received.

4.2 Snippet Determination
The key idea of Snipuzz is to optimize the fuzzing process based on snippets determined by responses. Put differently, Snipuzz leverages snippet mutation to reduce the search space of inputs, while the snippets are automatically clustered via categorizing responses from IoT devices. The major challenge is to correctly understand the semantics of responses. For instance, due to the presence of a timestamp, two semantically identical responses will be classified into different categories if a simple string comparison is used. Therefore, Snipuzz utilizes a heuristic algorithm and a hierarchical clustering approach to determine the snippets in each message.

4.2.1 Initial Determination. The essence of a message snippet is the consecutive bytes in a message that enables the firmware to execute a specific code segment. For experienced experts, it is not difficult to segment message snippets according to the semantic definition in official documents. However, for algorithms that lack such knowledge, it is essential to apply some automatic approaches to identify the meaning of each byte in the message.
Snipuzz first uses a heuristic algorithm to roughly divide each message into initial snippets. The core idea of the heuristic algorithm is to generate probe messages $p_i$ by deleting a certain byte in the message $m$ ($m \in$ seed $S$). By categorizing the responses $r_i$ of each probe message, Snipuzz preliminarily determines the snippets in the message $m$.
For example, as shown in Table 2, to determine snippets in the message $m$ = {"on":true}, Snipuzz generates probe messages by removing the bytes in $m$ one by one. When the first byte ‘{’ in $m$ is deleted, the corresponding probe message $p_1$ is "on":true}. Similarly, when the second byte is deleted, the corresponding probe message $p_2$ is {on":true}. Therefore, the message $m$ with 11 bytes can generate 11 different probe messages ($p_1$ to $p_{11}$). Snipuzz will send the 11 corresponding message sequences ($M_1$ to $M_{11}$) containing the probe messages to the device and collect responses.
Snipuzz then distinguishes the snippets in the message $m$ by categorizing the responses. Specifically, the consecutive bytes with the same corresponding response type are merged into the same snippet. According to the examples illustrated in Table 2, the Response $r_1$, $r_2$, and $r_5$ are merged into one category that indicates an error in JSON syntax, while Response $r_3$ and $r_4$ are merged into another category which indicates an error of an invalid input parameter. Therefore, the consecutive bytes whose corresponding responses belong to the same category can form a message snippet. Through this heuristic approach, Snipuzz can determine all initial snippets in the message $m$.
A naive method to categorize responses is to utilize a string comparison, i.e., comparing the content of responses byte by byte. However, due to the existence of randomness in responses (e.g., timestamp and token), a simple string comparison may incorrectly distinguish the responses with the same semantic meaning into different categories. Therefore, a more advanced solution, Edit Distance [42], is introduced to determine the category of responses. As shown in Equation (1), a similarity score, $s_{kt}$, between two responses $r_k$ and $r_t$ is calculated.
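The edit-distance similarity score referred to as Equation (1) is one minus the edit distance between two responses divided by the length of the longer one. The following Python sketch shows one way to compute it; it is an illustration under stated assumptions rather than the authors' code. The function names `edit_distance` and `similarity` are chosen for this example, and the two sample strings are Responses $r_3$ and $r_4$ copied from Table 2.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(r_k: str, r_t: str) -> float:
    """Similarity score: 1 - edit_distance / length of the longer response."""
    longest = max(len(r_k), len(r_t))
    if longest == 0:  # two empty responses are treated as identical
        return 1.0
    return 1.0 - edit_distance(r_k, r_t) / longest


# Responses r_3 and r_4 as listed in Table 2.
r3 = ('{"error":{"type":6,"address":"/lights/1/state/n",'
      '"description":"parameter, n, not available"}}')
r4 = ('{"error":{"type":6,"address":"/lights/1/state/o",'
      '"description":"parameter, o, not available"}}')

if __name__ == "__main__":
    print(round(similarity(r3, r4), 3))  # 0.979
```

The two responses differ in two characters out of 94, so the score falls just below 1 even though both report the same kind of error.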
Table 2: Examples of probe messages and corresponding response messages.

Messages | Content | Responses | Content | Category
Message $m$ | {"on":true} | Response $r_0$ | {"success":"/lights/1/state/on":true} | 0
Probe message $p_1$ | "on":true} | Response $r_1$ | {"error":{"type":2,"address":"/lights/1/state","description":"body contains invalid json"}} | 1
Probe message $p_2$ | {on":true} | Response $r_2$ | {"error":{"type":2,"address":"/lights/1/state","description":"body contains invalid json"}} | 1
Probe message $p_3$ | {"n":true} | Response $r_3$ | {"error":{"type":6,"address":"/lights/1/state/n","description":"parameter, n, not available"}} | 2
Probe message $p_4$ | {"o":true} | Response $r_4$ | {"error":{"type":6,"address":"/lights/1/state/o","description":"parameter, o, not available"}} | 3
Probe message $p_5$ | {"on:true} | Response $r_5$ | {"error":{"type":2,"address":"/lights/1/state","description":"body contains invalid json"}} | 1
Probe message $p_{11}$ | {"on":true | Response $r_{11}$ | {"error":{"type":2,"address":"/lights/1/state","description":"body contains invalid json"}} | 1
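The probe messages in Table 2 come from deleting one byte at a time, and adjacent byte positions whose responses share a category are merged into one snippet. The Python sketch below illustrates that procedure under loud assumptions: `probe_messages`, `initial_snippets`, and `toy_categorize` are hypothetical names, and `toy_categorize` is a local stand-in that parses JSON instead of querying a real device, so its categories only approximate those in the table.

```python
import json
from typing import Callable, List, Tuple


def probe_messages(message: bytes) -> List[bytes]:
    """Probe i is the message with its i-th byte (0-based) removed."""
    return [message[:i] + message[i + 1:] for i in range(len(message))]


def initial_snippets(
    message: bytes, categorize: Callable[[bytes], str]
) -> List[Tuple[int, int]]:
    """Merge adjacent byte positions with the same response category.

    Returns half-open (start, end) byte ranges over the original message.
    """
    cats = [categorize(p) for p in probe_messages(message)]
    snippets: List[Tuple[int, int]] = []
    start = 0
    for i in range(1, len(cats) + 1):
        if i == len(cats) or cats[i] != cats[start]:
            snippets.append((start, i))
            start = i
    return snippets


def toy_categorize(probe: bytes) -> str:
    """Assumed device behaviour: reject bad JSON, then unknown keys."""
    try:
        body = json.loads(probe)
    except ValueError:
        return "invalid json"
    if not isinstance(body, dict) or "on" not in body:
        return "parameter not available"
    return "success"


if __name__ == "__main__":
    print(initial_snippets(b'{"on":true}', toy_categorize))
```

With this toy categorizer the key bytes `o` and `n` land in their own snippet, separate from the surrounding punctuation; a real device that distinguishes more error types would yield a finer split.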

Figure 3: An example of snippet determination.

$$s_{kt} = 1 - \frac{edit\_distance(r_k, r_t)}{max\_len(r_k, r_t)}, \quad (1)$$

where $max\_len()$ returns the length of the longer of the two responses and $edit\_distance()$ counts the minimum number of operations, including insertion, deletion, and substitution, required to transform one string into the other. Therefore, the more similar two responses are, the larger the value of $s_{kt}$ is.

Snipuzz first calculates a self-similarity score $s_{ii}$ for each probe message $p_i$. Note that $p_i$ is generated by mutating the $i$-th byte in the message $m$. Concretely, Snipuzz sends the same probe message $p_i$ twice within an interval of one second. Two responses $r_i$, $r_i'$ will be collected from the IoT device, correspondingly. The self-similarity score $s_{ii}$ is then calculated based on the two responses $r_i$, $r_i'$ according to Equation (1). Note that, due to the randomness in the responses, there could be differences between the two responses $r_i$, $r_i'$, even though they are from the same probe message. Therefore, the self-similarity score could be smaller than 1.

To determine whether two responses belong to the same category, Snipuzz computes the similarity score of the two responses and compares it with the self-similarity scores. For example, for two responses $r_i$ and $r_j$, Snipuzz uses Equation (1) to compute the similarity score $s_{ij}$. After that, $s_{ij}$ is compared with the self-similarity scores. If $s_{ij} \geq s_{ii}$ or $s_{ij} \geq s_{jj}$ holds, responses $r_i$ and $r_j$ are considered to belong to the same category; otherwise, they are assigned to different categories.

For a newly received response $r_i$, Snipuzz will compare it with all the responses in the corresponding response pool $R$ based on the similarity score. If the new response $r_i$ does not belong to any existing category, the response $r_i$ as well as the corresponding probe message $p_i$ will be added into the Response Pool.

With the response pool $R$, Snipuzz categorizes each byte in the message $m$. Specifically, the category of the $i$-th byte in message $m$ is assigned according to the category of response $r_i$. Then the consecutive bytes with the same category will be merged into one snippet. Figure 3 shows an example of the initial snippet determination on the message $m$ = {"on":true} according to the response categories in Table 2.

4.2.2 Hierarchical Clustering. Although Snipuzz utilizes similarity comparison to mitigate the mis-categorization caused by randomness in responses, two semantically identical responses may still be mis-categorized into different categories. This could occur when the responses contain contents extracted or copied from probe messages. For example, due to the quotation of specific error contents from probe messages, the heuristic algorithm will not assign them to one category. Specifically, the similarity score $s_{34}$ between responses $r_3$ and $r_4$ for $m$ = {"on":true} in Table 2 is 0.979, which is smaller than the self-similarity scores $s_{33} = 1.000$ and $s_{44} = 1.000$ (as there is no randomness in the responses). However, these two responses are semantically identical and should be identified into one category, i.e., they are both error messages, indicating parameter syntax errors are located in the probe messages and the device is executing the same code block.

In order to solve the aforementioned problem, Snipuzz uses agglomerative hierarchical clustering to refine message snippets. The core idea of hierarchical clustering is to continuously merge the two most similar clusters until only one cluster remains.

As shown in Algorithm 1, Snipuzz will initialize the snippets according to Initial Snippets determined in Section 4.2.1 (line 1). After that, each response category in the response pool will be initialized as a cluster (line 2). Snipuzz will convert the responses into feature vectors (line 3, detailed in the later paragraph) which will be used to compute the distance between each pair of clusters (lines 5-7). Then the two closest clusters will be merged and the cluster center will be updated accordingly (lines 8-10). After performing the cluster process, Snipuzz will generate new snippets according to the current cluster result and add the new snippets into the snippet segmentation result (line 11), which will be further used for mutation.

Concretely, Snipuzz first extracts features from responses, vectorizing each response into a tuple of the self-similarity score, the length of the response, the number of alphabetic segments, the number of numeric segments, and the number of symbol segments. Each segment consists of consecutive bytes that have the same type. For instance, "123" is 1 numeric segment, and there are 2 alphabetic segments and 1 numeric segment in "a1b". More specifically, the $r_1$ in Table 2 will be vectorized to $v_1 = (1, 91, 10, 2, 10)$. Similarly, responses $r_3$ and $r_4$ will be converted to $v_2 = (1, 94, 11, 2, 13)$ and $v_3 = (1, 94, 11, 2, 13)$.

Figure 4 shows an example of clustering according to the message $m$ = {"on":true} in Table 2. According to Algorithm 1, in the preparation round (0th round) of clustering, each category in the response pool will be initialized as a single cluster. In the 1st round, as clusters 2 and 3 are the two clusters with minimum distance

($\|v_2 - v_3\| = 0$), the two clusters are merged into a new cluster. Correspondingly, the message snippets 'o' and 'n' are merged into a new snippet, marked with index #4. Similarly, in the next round, the two closest clusters, cluster 1 and the new cluster, are merged, and a new snippet will also be generated. Finally, all snippets in the message are merged into one new snippet, i.e., the message itself. All the newly generated snippets together with the initial snippets will be used in message mutation in the next stage.

Algorithm 1: Hierarchical Clustering for Snippets
Input: Initial Snippets $F_0$, Response Pool $R$
Result: Snippets $F$
 1  F ← F_0;
 2  C ← categorize(F_0);
 3  V ← vectorize(R);
 4  while size(C) > 1 do
 5      for i ← size(C) to 2 do
 6          for j ← size(C)-1 to 1 do
 7              D ← distance_ij = ‖v_i − v_j‖;
            end
        end
 8      i, j = argmin(D);
 9      C ← merge_cluster(C, i, j);
10      V ← update_cluster_center(V, i, j);
11      F ← F + generate_snippets(C);
    end

Figure 4: An example of hierarchical clustering.

4.3 Mutation Schemes

Snippet Mutation. In order to conduct efficient fuzzing, Snipuzz mutates the snippets obtained in the stage of Snippet Determination. Note that the mutation schemes are performed on the entire snippet instead of a single byte in a message.
• Empty. An empty data domain may crash the firmware if the data domain is not properly checked. Therefore, Snipuzz deletes an entire snippet to empty the data domain.
• Byte Flip. To detect bugs in both the syntax parsers and the functional code, Snipuzz flips all bytes in a snippet. This changes the syntactic meaning of strings and will discover bugs when the parser does not properly check syntax. On the other hand, Byte Flip changes the values of data domains to examine firmware.
• Data Boundary. To detect the out-of-bound bugs that occur during assignment, Snipuzz modifies the values of numeric data to some boundary values (e.g., 65536).
• Dictionary. For the scheme of Dictionary, Snipuzz replaces a snippet with a pre-defined string such as "true" and "false", which may directly explore more code coverage.
• Repeat. In order to detect bugs in syntax parsers, Snipuzz repeats a snippet multiple times. Meanwhile, the repetition of a data domain can detect defects caused by out-of-boundary problems.

Havoc. The conditions for triggering bugs may be complicated. For example, it may require modifying different data domains in the same message to trigger a bug. The aforementioned snippet mutation schemes only mutate one snippet at a time. In contrast, the havoc mutation randomly selects several snippets in a message, and performs the aforementioned mutation schemes on each of the selected snippets. Havoc mutation will not stop until a new response category is found or the target IoT device crashes.

4.4 Network Traffic Monitor

The network communication of the device is monitored and a timeout is set to determine whether the device has crashed. In fact, the monitoring of device network communication is not a single step, and it occurs during the entire fuzzing process. In case of timeout, Snipuzz will resend the same message sequence three times, as the cause of timeout could be network fluctuations instead of device crashes. If the timeout occurs all three times, Snipuzz will use the control command to physically restart the device and send the same sequence of messages to the device again. If the device still does not return the message on time, Snipuzz will record the crash and the corresponding message sequence.

4.5 Implementation

The design of Snipuzz consists of four steps: Message Sequence Acquisition, Snippet Determination, Mutation, and Network Communication Monitoring. In the Message Sequence Acquisition step, we use WireShark [45] in the program to detect and record the communication packets between the API and the IoT device, and manually clean these message sequences. The remaining core functional steps are packaged in a prototype implemented with 4,000 lines of C# code. The network monitor will record every message sent to the device, and send the information to the device again when the device does not reply. A smart plug was used to implement the physical restart function of the target device. When Snipuzz needs to physically restart the device under test, it will send control messages to the smart plug, and the plug will be switched off and then on. In this way, the device under test will be powered off briefly and restarted.

5 EXPERIMENTAL EVALUATION

5.1 Experiment Setup

Environment setup. To initialize IoT devices, we use the applications provided by the manufacturers to complete the pairing. In order to better monitor the network communication, all devices under test are connected to a local router. Our automatic packet extractor and Snipuzz run on a Windows 10 desktop PC with Intel Core i7 six-core x 3.70 GHz CPU and 16 GB RAM. The PC is also connected to the router.
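
To make the snippet mutation schemes of Section 4.3 concrete, the sketch below applies each scheme to one snippet of the message {"on":true}. It is an illustrative Python sketch under stated assumptions, not the authors' C# code: the segmentation into five snippets is hypothetical, Byte Flip is assumed to mean a bitwise inversion of every byte, and `havoc_round` shows a single havoc round only (the stopping rule based on device responses is omitted).

```python
import random


def empty(s: bytes) -> bytes:
    """Empty: delete the whole snippet."""
    return b""


def byte_flip(s: bytes) -> bytes:
    """Byte Flip: invert every byte (assumed meaning of 'flips all bytes')."""
    return bytes(b ^ 0xFF for b in s)


def data_boundary(s: bytes, value: bytes = b"65536") -> bytes:
    """Data Boundary: replace numeric data with a boundary value."""
    return value if s.isdigit() else s


def dictionary(s: bytes, word: bytes = b"false") -> bytes:
    """Dictionary: replace the snippet with a pre-defined string."""
    return word


def repeat(s: bytes, times: int = 3) -> bytes:
    """Repeat: duplicate the snippet several times."""
    return s * times


SCHEMES = [empty, byte_flip, data_boundary, dictionary, repeat]


def mutate(snippets, index, scheme) -> bytes:
    """Apply one scheme to one whole snippet and rebuild the message."""
    out = list(snippets)
    out[index] = scheme(out[index])
    return b"".join(out)


def havoc_round(snippets, rng: random.Random) -> bytes:
    """One havoc round: mutate a random subset of snippets with random schemes."""
    out = list(snippets)
    for i in rng.sample(range(len(out)), rng.randint(1, len(out))):
        out[i] = rng.choice(SCHEMES)(out[i])
    return b"".join(out)


# Hypothetical segmentation of {"on":true} into snippets.
snippets = [b'{"', b"on", b'":', b"true", b"}"]
print(mutate(snippets, 3, empty))       # b'{"on":}'
print(mutate(snippets, 1, repeat))      # b'{"ononon":true}'
print(mutate(snippets, 3, dictionary))  # b'{"on":false}'
```

Operating on whole snippets rather than single bytes keeps the rest of the message intact, which is what lets a mutated data value reach the functional code instead of being rejected by the parser.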
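
Looking back at the hierarchical clustering of Section 4.2.2, the feature vectors and merge order can be reproduced with a short Python sketch. Several choices here are assumptions rather than details given in the paper: whitespace is treated as a separator that is not counted as a symbol segment (this reproduces the vectors quoted in Section 4.2.2), the distance is Euclidean, and a merged cluster's center is the mean of its members' vectors. The function names are hypothetical.

```python
import re
from itertools import combinations
from math import dist


def vectorize(response: str, self_similarity: float = 1.0) -> tuple:
    """(self-similarity, length, #alphabetic, #numeric, #symbol segments)."""
    alpha = re.findall(r"[A-Za-z]+", response)
    digit = re.findall(r"[0-9]+", response)
    symbol = re.findall(r"[^A-Za-z0-9\s]+", response)  # whitespace not counted (assumption)
    return (self_similarity, len(response), len(alpha), len(digit), len(symbol))


def agglomerate(vectors: dict) -> list:
    """Repeatedly merge the two closest clusters; return (a, b, new_id) per round."""
    clusters = {k: [k] for k in vectors}                 # cluster id -> member categories
    centers = {k: tuple(map(float, v)) for k, v in vectors.items()}
    merges, next_id = [], max(vectors) + 1
    while len(clusters) > 1:
        a, b = min(combinations(sorted(clusters), 2),
                   key=lambda p: dist(centers[p[0]], centers[p[1]]))
        members = clusters.pop(a) + clusters.pop(b)
        del centers[a], centers[b]
        clusters[next_id] = members
        # Cluster center: mean of the member vectors (assumption).
        centers[next_id] = tuple(sum(vectors[m][d] for m in members) / len(members)
                                 for d in range(5))
        merges.append((a, b, next_id))
        next_id += 1
    return merges


# One representative response per category in Table 2.
r0 = '{"success":"/lights/1/state/on":true}'
r1 = '{"error":{"type":2,"address":"/lights/1/state","description":"body contains invalid json"}}'
r3 = '{"error":{"type":6,"address":"/lights/1/state/n","description":"parameter, n, not available"}}'
r4 = '{"error":{"type":6,"address":"/lights/1/state/o","description":"parameter, o, not available"}}'

vectors = {0: vectorize(r0), 1: vectorize(r1), 2: vectorize(r3), 3: vectorize(r4)}
print(vectors[1])            # (1.0, 91, 10, 2, 10)
print(vectors[2])            # (1.0, 94, 11, 2, 13)
print(agglomerate(vectors))  # [(2, 3, 4), (1, 4, 5), (0, 5, 6)]
```

The merge order matches the description of Figure 4: categories 2 and 3 are at distance zero and merge first, the result then merges with category 1, and the last round merges everything into one cluster.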

IoT Devices under test. We have selected 20 popular consumer IoT devices from both online and offline markets worldwide, covering various well-known brands, such as Philips, Xiaomi, TP-Link, and Netgear. The types of selected IoT devices include smart plugs, smart bulbs, routers, home bridge, IP camera, fingerprint terminal, etc. These devices are either recommended items in Amazon or the best-selling products that can be bought in supermarkets. Table 1 details the information of the IoT devices under test.

Benchmark tools. In order to verify Snipuzz's performance in finding crashes and message segmentation, we used seven different fuzzing schemes as benchmarks.

• IoTFuzzer [9]. The core idea of IoTFuzzer is to find the functions that send control commands to the IoT device by static analysis of companion apps, and to mutate the value of specific variables to perform fuzzing test without breaking the message format. Note that our implementation of IoTFuzzer is the best effort to replicate since their code is not publicly available, and we acknowledge that this could provide slightly different results with respect to the original version.
  We implement the IoTFuzzer by replacing the mutation algorithm in Snipuzz framework with the mutation strategies in IoTFuzzer. Considering that the purpose of companion apps analysis in IoTFuzzer is to ensure that only the data domain in the communication message is mutated, to make the benchmark as fair as possible, we use the same seeds as the ones used in Snipuzz and manually segment the data domain of each seed message before feeding it to IoTFuzzer. We believe that such manual segmentation is sufficient to provide an upper bound performance of IoTFuzzer. Note that we remove the methods that are related to the feedback mechanism and snippet segmentation because these methods are not used in IoTFuzzer.
• Nemesys [22]. Nemesys is a protocol reverse engineering tool for network message analysis. It utilizes the distribution of value changes in a single message to infer the boundaries of each data domain. Considering that Nemesys is a protocol inference method instead of an off-the-shelf fuzzing tool, we implement the method of Nemesys based on the Snipuzz framework to infer the snippet boundary, replacing the corresponding snippet determination method (Section 4.2).
• BooFuzz [31]. As a successor of Sulley [19], BooFuzz is an excellent network protocol fuzzer that has been involved in several recent fuzzing research [9, 37, 48]. Different from other automatic fuzzers, BooFuzz requires human-guided message segmentation strategies as inputs. In our research, we leverage this property and manually define more fuzzing strategies to enrich the benchmark evaluation.
  – BooFuzz-Default. In this strategy, we set each message in the input as a complete string, that is, BooFuzz will use the message as a string for mutation testing.
  – BooFuzz-Byte. Each byte of the message in the input will be used for a mutation test individually.
  – BooFuzz-Reversal. Contrary to the idea of IoTFuzzer, in this strategy, we focus on the mutation of non-data domain in the message, while keeping data domain unchanged.
• Doona [44]. Doona is a fork of the Bruteforce Exploit Detector (BED) [6], which is designed to detect potential vulnerabilities related to buffer and formats in network protocol. Different from other tools, Doona does not take network communication packets as seeds. The test cases of Doona are required to be pre-defined for each device or protocol under test.
• Snipuzz-NoSnippet. Snipuzz uses the segmentation of message snippets to enhance the efficiency of fuzzing and the ability to find crashes. In order to verify whether the snippet determination indeed benefits fuzzing, we implement Snipuzz-NoSnippet based on Snipuzz. Snipuzz-NoSnippet does not have the component of snippet determination, and blindly mutates bytes in messages without the knowledge of responses.

Except for Doona, whose test cases are preset, all benchmark tools and Snipuzz are tested on the same input sets. These input sets may be in different formats (e.g., BooFuzz requires the input to be set manually, and Nemesys requires the input to be in the pcap file format), but the content is the same.

There are many other popular fuzzing tools which are able to test IoT devices via network communication, such as Peach [30] and AFLNET [33]. However, since they are grey-box fuzzers that require instrumenting the firmware, it is infeasible and unfair to regard those tools as baselines for black-box schemes.

5.2 Vulnerability Identification

5.2.1 Snipuzz. After performing fuzz testing using Snipuzz on each of the 20 IoT devices for 24 hours, we detected 13 crashes in 5 devices. As shown in Table 3, the detected crashes include 7 null pointer dereferences, 1 denial of service, and 5 unknown crashes that we further manually verified. The 13 crashes found by Snipuzz are triggered by providing malformed inputs. These malformed inputs break the message format in different ways, for example, by deleting placeholders, emptying the data domain, or fortuitously changing the type of a data value.

Note that all the crashes identified by Snipuzz are in JSON-based devices, although we successfully conducted experiments on the 20 IoT devices with various communication formats, such as JSON, SOAP, and K-V pair. The experiments also show that Snipuzz observes a higher number of response categories compared to the other fuzzers (as detailed in Section 5.3).

Null pointer dereferences. As shown in Table 3, the 7 crashes triggered by Snipuzz in TP-Link HS110 and HS100 are all caused by null pointer dereferences. After sending the test cases to HS110 and HS100, the devices crashed, unable to reply to any interaction. However, after a few minutes, the devices automatically restarted and recovered to the initial state. Based on the analysis of test cases, we found that the vulnerabilities are all triggered by messages that were mutated in JSON syntax. Put differently, when some important placeholders, such as curly braces and colons, or a part of the test message are mutated, the syntax structure and the semantic meaning of the message are broken. If the device cannot handle the mutated input message properly, it will crash. We reported the vulnerabilities to the device vendor, TP-Link, via email on June 13, 2020. They have confirmed the vulnerability and promised to fix it through a firmware update.

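Table 3: Experiment Results. Snipuzz discovers the most number of categories and exposes the most number of bugs.

| # | Device | Snipuzz T | Snipuzz C | Snipuzz 10/24 | IoTFuzzer T | IoTFuzzer C | IoTFuzzer 10/24 | Doona C | Doona 10/24 | BooFuzz-Default C | BooFuzz-Default 10/24 | BooFuzz-Byte C | BooFuzz-Byte 10/24 | BooFuzz-Reversal C | BooFuzz-Reversal 10/24 | Nemesys C | Nemesys 10/24 | Snipuzz-NoSnippet C | Snipuzz-NoSnippet 10/24 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | YLDP05YL | UC | 3∗ | 46/71 | UC | 1∗ | 31/33 | NA | NA/NA | 0 | 11/17 | 0 | 11/41 | 0 | 11/22 | 0 | 26/61 | 0 | 21/69 |
| 2 | YLDP13YL | UC | 2∗ | 35/76 | UC | 1∗ | 20/24 | NA | NA/NA | 0 | 8/18 | 0 | 8/42 | 0 | 8/22 | 0 | 18/62 | 0 | 22/70 |
| 3 | A60 | DoS | 1 | 28/41 | / | 0 | 18/22 | 0 | 5/16 | 0 | 7/13 | 0 | 8/33 | 0 | 5/21 | 0 | 22/36 | 0 | 20/39 |
| 4 | Mini C | / | 0 | 46/72 | / | 0 | 18/31 | 0 | 7/15 | 0 | 5/11 | 0 | 6/31 | 0 | 5/21 | 0 | 18/68 | 0 | 18/70 |
| 5 | BR30 | / | 0 | 28/51 | / | 0 | 8/19 | NA | NA/NA | 0 | 4/11 | 0 | 4/31 | 0 | 4/20 | 0 | 13/40 | 0 | 13/48 |
| 6 | Hue | / | 0 | 65/110 | / | 0 | 29/36 | 0 | 4/11 | 0 | 7/11 | 0 | 9/31 | 0 | 7/25 | 0 | 34/110 | 0 | 22/99 |
| 7 | Base Station | / | 0 | 34/51 | / | 0 | 29/33 | 0 | 7/16 | 0 | 6/9 | 0 | 9/17 | 0 | 7/13 | 0 | 19/38 | 0 | 23/50 |
| 8 | HS100 | NPD | 3 | 24/64 | / | 0 | 20/27 | NA | NA/NA | 0 | 6/13 | 0 | 6/31 | 0 | 6/22 | 0 | 20/64 | 0 | 19/71 |
| 9 | HS110 | NPD | 4 | 24/79 | / | 0 | 17/22 | NA | NA/NA | 0 | 6/14 | 0 | 9/33 | 0 | 6/22 | 0 | 20/62 | 0 | 19/78 |
| 10 | F7C027au | / | 0 | 13/21 | / | 0 | 7/10 | 0 | 6/14 | 0 | 8/12 | 0 | 6/18 | 0 | 6/15 | 0 | 8/14 | 0 | 12/21 |
| 11 | MSS310 | / | 0 | 42/61 | / | 0 | 15/17 | 0 | 8/16 | 0 | 5/11 | 0 | 8/45 | 0 | 8/21 | 0 | 30/59 | 0 | 20/61 |
| 12 | B25AUS | / | 0 | 19/42 | / | 0 | 8/13 | 0 | 7/19 | 0 | 7/14 | 0 | 11/17 | 0 | 7/11 | 0 | 16/36 | 0 | 9/41 |
| 13 | Mini US | / | 0 | 25/61 | / | 0 | 8/41 | NA | NA/NA | 0 | 7/16 | 0 | 7/35 | 0 | 7/22 | 0 | 9/55 | 0 | 8/49 |
| 14 | SP4L-AU | / | 0 | 37/43 | / | 0 | 18/32 | 0 | 5/11 | 0 | 5/17 | 0 | 7/32 | 0 | 5/23 | 0 | 23/40 | 0 | 17/40 |
| 15 | R6400 | / | 0 | 11/37 | / | 0 | 20/24 | 0 | 4/13 | 0 | 3/12 | 0 | 4/24 | 0 | 4/18 | 0 | 6/30 | 0 | 6/41 |
| 16 | WL100 | / | 0 | 53/81 | / | 0 | 38/44 | NA | NA/NA | 0 | 8/16 | 0 | 8/46 | 0 | 8/27 | 0 | 41/70 | 0 | 29/76 |
| 17 | Alro Pro 2 | / | 0 | 25/36 | / | 0 | 16/22 | 0 | 10/14 | 0 | 8/13 | 0 | 14/22 | 0 | 10/17 | 0 | 18/22 | 0 | 13/41 |
| 18 | F19821W | / | 0 | 39/75 | / | 0 | 36/33 | 0 | 7/13 | 0 | 5/11 | 0 | 7/23 | 0 | 7/14 | 0 | 27/65 | 0 | 21/76 |
| 19 | T-131P | / | 0 | 36/80 | / | 0 | 9/22 | 0 | 7/16 | 0 | 7/20 | 0 | 9/42 | 0 | 7/35 | 0 | 21/65 | 0 | 20/91 |
| 20 | RM mini 3 | / | 0 | 14/36 | / | 0 | 9/30 | NA | NA/NA | 0 | 10/17 | 0 | 14/31 | 0 | 10/23 | 0 | 6/30 | 0 | 5/35 |

UC: Unknown crash. NPD: Null pointer dereference. DoS: Denial of service. T: Vulnerability type. C: Number of crashes. 10/24: Number of response categories (10 minutes/24 hours).
∗: Remotely exploitable. NA: Since Doona is only applicable to some network protocols, devices that cannot be tested are represented by 'NA'.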

Table 4: Mutated messages of Snipuzz & IoTFuzzer.

| Generated by | Contents of mutated messages |
|---|---|
| Original Message | {"{"id": 0, "method": "start_cf", "params": ["4, 4, "1000, 2, 2700,100,500 ,1,255,10,5000,7,0,0,500,2,5000,1"]}" |
| Snipuzz | {"{"id": 0, "method": "start_cf", "params": ["4, , "1000, 2, 2700,100,500 ,1,255,10,5000,7,0,0,500,2,5000,1"]}" |
| IoTFuzzer | {"{"id": 0, "method": "start_cf", "params": [", 4, "1000, 2, 270000,100,500 ,1,255,10,5000,7,0,0,500,2,5000,1"]}" |

Denial of service. Another interesting finding is the denial of service vulnerability detected in the Philips A60 smart bulb. After being tested by Snipuzz for 24 hours, Philips' official companion app could not manage the device normally. Specifically, the device cannot be found in the app and if any further messages are sent through the app, the response in the app will keep asking to bind the device to a device group and no further interaction is available. However, we observe that if the message packet is sent directly to the device, the device can work normally. This indicates that the device does not completely crash but its service via the companion app is denied.

Unknown crashes. Snipuzz found 5 crashes on the Yeelight bulbs YLDP05YL and YLDP13YL. The devices crashed and restarted by themselves within roughly one minute. By analyzing the test cases, we found that the crashes are due to the deletion of certain data domains, such as the nullification of parameters, marked as red in Table 4. As the firmware of the 2 devices is not publicly available, the root cause of the vulnerability cannot be determined; however, we can still deduce that the vulnerability is due to the device reading in null values during the parsing process, causing a crash during the assignment. We also find that communication using a local network does not require any authentication, which means that the device can be crashed by any attacker in the local network. Therefore, we consider the vulnerabilities as 'remotely exploitable'.

5.2.2 Benchmark with state-of-the-art tools. As shown in Table 3, for 24 hours of fuzz testing on each device, none of the benchmark tools found a crash except for IoTFuzzer. They did not find crashes for various reasons. Doona focuses more on the mutation of communication protocols. Further, Doona cannot be applied on all devices, which also limits its capacity. Since BooFuzz directly replaces the specified positions in the message with a preset string, it can only trigger limited types of vulnerabilities. Nemesys offers a new idea of determining message snippets. However, since it determines message snippets by the distribution of values in messages, it is difficult for Nemesys to accurately decide the boundary between data and non-data domains. Therefore, Nemesys can hardly detect vulnerabilities that can only be triggered by mutating the data or non-data domains. Snipuzz-NoSnippet, which does not apply the snippet-based mutation method used in Snipuzz, is similar to the classic fuzzer AFL [24]. Since Snipuzz-NoSnippet does not infer the structure of the message but directly uses single or multiple consecutive bytes as the unit of mutation, most of the test cases generated by Snipuzz-NoSnippet destroy the structure of the messages. Such a method is difficult to work on devices that require highly-structured inputs.

IoTFuzzer detected 2 crashes in 2 smart bulb devices, i.e., the YLDP05YL and YLDP13YL. Due to the mutation strategy of IoTFuzzer, the malformed input provided by IoTFuzzer is obtained by emptying the data domain. According to the mutated messages listed in Table 4, we can see that the messages mutated by IoTFuzzer resemble the ones generated by Snipuzz. The mutated domains of messages from Snipuzz and IoTFuzzer in Table 4 are all in the data domain. In terms of the effect of the mutation test, Snipuzz and IoTFuzzer achieve the same goal on these two messages. However, Snipuzz can cover the mutation space of IoTFuzzer because IoTFuzzer only focuses on the data domain mutation while Snipuzz can mutate both the data and non-data domains.

To further determine the root cause of the crash, we obtained the firmware source code of HS100 and HS110, two typical market consumer-grade smart plugs manufactured by TP-Link, and conducted a case study which reflected the differences between Snipuzz and IoTFuzzer. We found that one of the crashes triggered by Snipuzz on the two devices is caused by breaking the syntax structure and mutating both data and non-data domains. More specifically, the mutated messages successfully bypassed the sanitizer and triggered the crash during function execution. We deduce

that this could be caused by an error-prone third-party sanitizer (more details could be found in Appendix B). On the other hand, due to the design of IoTFuzzer, the fuzzing is based on the grammatical rules, as IoTFuzzer gives first priority to satisfying the grammar requirements, in order not to be rejected by the sanitizer and to ensure that each test case can reach the functional execution part in the firmware. Such a strategy constrains the test range of fuzzing and its capacity to cover the sanitization part in comparison to Snipuzz. Therefore, we argue that considering the complexity of IoT firmware testing, a lightweight and effective black-box vulnerability detection tool, such as Snipuzz, is a pressing need.

Figure 5: The number of categories discovered over time.

5.3 Runtime Performance

Figure 5 shows how Snipuzz and the other seven fuzzers explored the device firmware during the first 10 minutes. Due to space limits, we only present the results of 5 devices here but plot results of all 20 devices in Appendix A. We repeated the fuzz testing 10 times and recorded the median values of the numbers of response categories discovered by each method, indicating that the coverage has been explored. We manually review the presented response categories to remove the mis-categorization caused by randomness in responses or the response mechanism of devices.

As shown in Figure 5, Doona can only detect a small number of response categories. Doona is a protocol-based fuzzing method, and its tests are more biased towards protocol content. The mutation test on the communication protocol has a high probability of being directly rejected or ignored by the device unilaterally, resulting in few categories of responses that can be received.

We implemented three fuzzing strategies based on BooFuzz, i.e., mutating the whole message as a string, mutating each byte of the message, and mutating non-data domain. However, the testing results indicate that all of them explored very limited categories of responses on each device. The limitation of category discovery is due to the mutation strategy of BooFuzz, which replaces the target contents with a specific pre-defined string. For example, using strings, such as "/./././././././.", to replace the content of messages in different strategies (e.g., replacement of the entire strings, a single byte, or a non-data domain), causes the violation of message format and could be easily rejected by the sanitizer. Therefore, most of the responses obtained by BooFuzz fall into the category of "error responses".

The number of response categories explored by IoTFuzzer grows rapidly within a short period of time and then stagnates. In the mutation stage, IoTFuzzer randomly selects a set of inputs from the original candidate inputs and randomly mutates the data domain for one or more message(s). It will continue to repeat this method until the device crashes or the time limit is reached. Such a method based on randomness helps IoTFuzzer to mutate and test a large number of message data domains in the original input and collect response message categories quickly in the beginning. However, the number of response categories found by IoTFuzzer soon reaches its limit because only the data domain is mutated.

On most devices, Snipuzz maintains a steady upward trend, and after a period of time the number of response categories found by Snipuzz exceeds that of IoTFuzzer. Unlike IoTFuzzer, Snipuzz mainly searches for the response categories through the Snippet Determination stage. As per the message snippet exploration strategy, Snipuzz first explores as many response categories of a certain message as possible. After the snippets of a message are obtained and tested by Snippet Mutation, the next message will be processed in the same way until all messages in the initial message sequence have been tested. Following this method, Snipuzz may not get a large number of response categories in a short time. When Snipuzz detects a message snippet, every byte in the message content will be included in the test. Therefore, as shown by the bold numbers in Table 3, for 15 out of 20 devices, Snipuzz covers the most number of response categories after 24-hour fuzz testing, compared to other state-of-the-art IoT fuzzing tools.

On 5 devices, Snipuzz-NoSnippet collected more response categories than Snipuzz within 24 hours. The mutation method used by Snipuzz-NoSnippet is similar to the classic fuzzer AFL [24]. It directly performs mutation on a single byte or several consecutive bytes. However, it is difficult for Snipuzz-NoSnippet to cover response categories that are not obtained by breaking the grammatical format (e.g., data out of bounds in the data domain). Theoretically, although the Snipuzz-NoSnippet mutation method is not so efficient, it still has the capability to explore the most categories of responses.

Nemesys explores more categories of responses than BooFuzz and Doona, but does not exceed Snipuzz. The Nemesys strategy performs deterministic mutations on each data domain of the messages in turn, which makes its trend of run-time performance similar to Snipuzz. However, the data domain determination strategy of Nemesys is not based on the responses from the IoT device. Thus, the distribution of byte values in messages does not benefit in covering more response categories. Therefore, the number of response categories collected by Nemesys is limited.

It is interesting to observe that, in the case of R6400, Snipuzz also enters a stagnation after only finding a few response categories. We

carefully checked the initial input message sequences and found that the average length of the message exceeds 400 bytes, forcing Snipuzz to generate and send a large number of probe messages to determine message snippets. Therefore, in the first 10 minutes, Snipuzz was still exploring the response category of the first few messages, so it did not exceed IotFuzzer.

Table 5: Inference results of Snipuzz and Nemesys.

| Method | Ave. Similarity | Example |
|---|---|---|
| Snipuzz | 87.1% | {" on ":true," sta ":140," bri ":254} |
| Nemesys | 64.5% | {"on": true ,"sta": 140 ,"bri" 254} |
| Grammar | 100.0% | {" on ": true ," sta ": 140 ," bri ": 254 } |

grammatical rules (i.e., 'true', '140' and '254') with some placeholders (such as double quotes and curly brackets). After analyzing the response messages, we found that the responses obtained after destroying these data domains and destroying placeholders are all about invalid format. This may be due to the fact that in the firmware, when an error occurs in the parsing format, the response does not report a detailed description of the error but instead returns a general format error.

On the other hand, Nemesys uses the distribution of value changes in the protocol to determine the boundary of different data domains, and to achieve the semantic segmentation of a message. The advantage of this method is that it does not require any other additional information, such as grammar rules or a large number of training data sets in addition to the message itself.
5.4 Assessment on Message Snippet Inference The average similarity result of Nemesys, 64.5%, is lower than
Among all strategies, Snipuzz and Nemesys utilize semantic seg- the Snipuzz result. Given the example shown in Table 5, when
mentation, to assess their performance of message snippet inference. segmenting messages in a format requires restricted syntax, such as
We compare the snippets they produce during the fuzzing process Json and XML, Nemesys can achieve a good semantic segmentation
with the grammar rules defined in API documents. Specifically, for performance, because the placeholders usually use symbols unusu-
some mature and popular languages, such as JSON, we establish ally used in data domains. This distribution of byte value enables
the grammar rules as per their standard syntax; for custom for- Nemesys to effectively find the boundaries between data domains.
mats, such as strings or custom bytes, we refer to the official API However, in IoT devices, customized formats are prevalent. For
documents and define the grammar rules based on the instructions. example, the smart bulb BR30 uses custom bytes as a means of com-
Equation (2) quantifies the quality of snippet inference, and munication, where each byte corresponds to a special meaning (i.e.,
Similarity indicates the percentage of correctly categorized bytes "0x61" represents "CHANGE_MODE" and "0x0f" represents "TRUE").
in a snippet-determined message, 𝑚, compared with the ground In such cases, the value distribution of characters can no longer be
truth, 𝑔, manually extracted from the grammar rules. used as a guidance for the data domain determination, and thus the
message segmentation determined by Nemesys is error-prone.
𝑐𝑜𝑢𝑛𝑡 [𝑐𝑎𝑡𝑒 (𝑚) ⊕ 𝑐𝑎𝑡𝑒 (𝑔)]
𝑆𝑖𝑚𝑖𝑙𝑎𝑟𝑖𝑡𝑦 (𝑚) = 1 − , (2)
𝑙𝑒𝑛(𝑚) 6 DISCUSSION AND LIMITATIONS
where 𝑐𝑎𝑡𝑒 () returns the category of each message byte in a series Snipuzz has successfully examined 20 different devices and exposed
of “0” and “1” bits, 𝑐𝑜𝑢𝑛𝑡 () counts the number of mis-categorized security vulnerabilities on five of them. However, there are still
bytes, and 𝑙𝑒𝑛() represents the length of a message. Note that in a some limitations relevant to efficiency and scalability of Snipuzz.
ground truth message, “0” indicates the non-data domain (marked We discuss the limitations in this section and propose solutions as
blue in Table 5), while “1” indicates the data domain (marked red future work.
in Table 5). Therefore, the ⊕ is the bitwise 𝑋𝑂𝑅 operation. Scalability and manual effort. IoT devices can be tested by Snipuzz
In addition, followed by Equation (2), we compute the average if the valid network packets are known. In our prototype, we capture
similarity of the snippets (or data domain) determined by Snipuzz communication packets by running API programs and monitoring
and Nemesys for all the 235 messages obtained from experiments. network communication (Note that packets can also be obtained by
Note that during the calculation of the average similarity, for each statically analyzing API programs without running them). In the
message, if there are multiple snippet sets determined, we will select absence of API programs or documents, we can recover the message
the snippet inference with the highest similarity value; therefore formats from the official Apps of IoT devices through decompilation
a snippet could reflect the grammatical rules as many as possible and taint analysis. Or as a second way, we can solve this problem by
and maximize the performance of message semantic segmentation. intercepting the communication between APPs and IoT devices, and
The average similarity result of Snipuzz, 87.1%, indicates that, then recovering message formats from the captured packets. The
by applying snippet inference based on the hierarchical cluster- second way is feasible and we have experimented it in TP-Link’s
ing approach, Snipuzz can effectively find the grammatical rules IoT control APP KASA, which can be further developed for more
hidden in the message. Ideally, in Snipuzz, the merging of clusters IoT devices. However, both methods could introduce overhead and
removes the influence caused by the randomness in responses and involve manual effort.
by the replying message mechanism itself. Therefore, the message Recall in Section 4.1 that Snipuzz requires manual effort, which
snippets will conform to the grammatical rules gradually, which takes 5 man-hours per device to collect the initial seeds during the
leads Snipuzz to a higher similarity result. message sequence acquisition phase. The manual effort is mainly
However, we also found some differences between the snippet referred to cleaning the packets from the API programs that are
inference method and the grammatical rules in some results. For obtained from publicly available first- and third-party resources. To
example, given the example shown in Table 5, the snippet inference mitigate this limitation when applying Snipuzz to IoT devices, tech-
method combines the strings belonging to the data domain in the niques such as crawlers could be used to automatically gather API
programs associated with the IoT devices in future work. Moreover, the process of cleaning the packets could also be improved by pre-processing keywords through scripts to achieve automatic collection of communication packets.

Threats to validity. As Snipuzz collects initial message sequences via API programs and network sniffers, the first threat comes from the absence of API programs. In this case, we can recover message formats based on the companion apps of IoT devices (similar to IoTFuzzer) but may need more manual effort. Second, the encryption in messages decreases the effectiveness of snippet determination because the semantic information could be corrupted. A potential solution to the encryption issue is to integrate decryption modules into Snipuzz. Finally, the code coverage of firmware could be subject to the accessibility of API programs, since Snipuzz can only examine the functionalities that are covered in API programs. Recombining the message snippets from different seeds to generate new valid inputs could mitigate this limitation.

Encryption. During Message Acquisition, we noticed that encryption is used to protect communication in some API programs. Encryption has no effect on the message sequence mutation process, but the snippet determination process basically fails, because the encryption algorithm disrupts the original format of the message while the segmentation of snippets is sensitive to the position of each character. Moreover, because the response messages from the device are also encrypted, Snipuzz cannot get useful feedback from them. To address this limitation, the encryption and decryption algorithms in the API program can be integrated into Snipuzz, or the difficulties caused by encryption can be addressed from the perspective of mutation strategy design.

Coverage. The code coverage of firmware explored by Snipuzz depends on the API programs. For example, if the API programs of a bulb only support the functionality of turning on the power, it is almost impossible to explore the functionality of adjusting the brightness by mutating the messages captured while the power is turned on. In future work, without the support of grammar, we will consider recombining the message snippets to try to generate new valid inputs. This method can help explore more firmware execution coverage beyond the original inputs provided.

Requirements on detailed responses. The detection effectiveness of Snipuzz depends on the quality of message snippets, which is contingent on how much information can be obtained from the responses of IoT devices. To put it differently, if the IoT device does not provide responses that are detailed enough, for example reporting all the errors with a uniform message, it could be hard for Snipuzz to determine the message snippets. Fortunately, in many IoT devices, advanced error descriptions can be obtained in debug mode, which will significantly improve the determination process of message snippets in Snipuzz.

7 RELATED WORK
Snipuzz works in a black-box manner to detect vulnerabilities in IoT devices. Unlike existing black-box fuzzers for IoT devices, which blindly mutate messages, Snipuzz optimizes the mutation process of black-box fuzzing by utilizing responses. This feedback mechanism improves the effectiveness of bug discovery. For instance, IoTFuzzer [9] obtains the data domain, on which IoTFuzzer performs blind mutation. Thus, IoTFuzzer lacks knowledge of the quality of the generated inputs, resulting in a waste of resources on low-quality inputs. There are also several dynamic analysis approaches focusing on the networking modules of IoT devices. For example, SPFuzz defines a new language for describing protocol specifications, protocol state transitions, and their correlations [37]. SPFuzz can ensure the correctness of the message format in the conversation state and the dependence of the protocol. IoTHunter is a grey-box approach to fuzz the state protocol of IoT firmware [47]. IoTHunter can constantly switch the protocol state to perform a feedback-based exploration of IoT devices. In a recent example, AFLnet acts as a client and continuously replays variations of the original message sequence sent to the target (i.e., server or device) [33]. AFLnet uses response codes, which are numbers indicating the execution states, to identify the execution status of targets and explore more regions of their networking modules.

Another research line for dynamic analysis of IoT devices is the usage of emulators. The disadvantages of emulation are the heavy engineering effort and the need to obtain the firmware, although the emulation of IoT firmware can analyze more thoroughly than black-box fuzzing. Two major challenges for emulation of IoT firmware are scalability and throughput. Therefore, the efforts in improving the performance of emulation include full-system emulation [8, 27], improvement of emulation success rates [21], hardware-independent emulation [17, 38], and combination of user- and system-mode emulation [51]. Based on the emulation, fuzzing can be integrated into those frameworks and can hunt for defects in firmware [38, 51].

Static analysis of firmware is the complementary approach to dynamic analysis. Semantic similarity is one of the major techniques that make static analysis successful. Researchers analyze semantic similarity via comparison of files and modules [13], Control Flow Graphs (CFGs) [14], parser and complex processing logic [11], and multi-binary interactions [35]. There are also many similarity-based approaches that can detect vulnerabilities across different firmware architectures. They usually extract various architecture-independent features from firmware for each node in a CFG to represent a function, and then check whether two functions’ CFG representations are similar [15, 32].

8 CONCLUSION
In this paper we have presented Snipuzz, a black-box fuzzing framework designed for detecting vulnerabilities hiding in IoT devices. Different from other black-box network fuzzers, Snipuzz uses the response messages returned by the device to establish a feedback mechanism for guiding the fuzzing mutation process. In addition, Snipuzz infers the grammatical role of each byte in the messages based on the responses from the device, so that Snipuzz can generate test cases that meet the device’s grammar without the guidance of grammatical rules. We have used 20 consumer-grade IoT devices from the market to test Snipuzz, and it has successfully found 5 zero-day vulnerabilities on 5 different devices.
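As a concrete illustration of Equation (2) in Section 5.4, the following minimal Python sketch computes the similarity between an inferred segmentation and a ground-truth labelling. The function names `similarity` and `best_similarity`, the string-of-bits encoding of cate(), and the sample labels are assumptions made for this illustration; they are not taken from the Snipuzz implementation.

```python
def similarity(inferred: str, truth: str) -> float:
    """Equation (2): 1 - count[cate(m) XOR cate(g)] / len(m).

    Both arguments are per-byte category strings in which "0" marks a
    non-data byte and "1" marks a data byte (hypothetical encoding).
    """
    if len(inferred) != len(truth):
        raise ValueError("category strings must have the same length")
    if not inferred:
        raise ValueError("message must not be empty")
    # The XOR of two category bits is 1 exactly when the labels disagree.
    mismatches = sum(int(a) ^ int(b) for a, b in zip(inferred, truth))
    return 1 - mismatches / len(inferred)


def best_similarity(candidates, truth: str) -> float:
    """Keep the highest-scoring snippet set for one message (Section 5.4)."""
    return max(similarity(c, truth) for c in candidates)


# Made-up 10-byte message: ground truth and three inferred labellings.
truth = "0001111000"
candidates = ["0001111000", "0011111000", "0000011000"]
scores = [similarity(c, truth) for c in candidates]
best = best_similarity(candidates, truth)
```

With these made-up labels the three candidates disagree with the ground truth on 0, 1, and 2 bytes, so the scores are 1.0, 0.9, and 0.8, and the best score for the message is 1.0. Averaging the best score over all messages would give a figure comparable to the averages reported in Section 5.4.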
REFERENCES
[1] 2020. The Three Software Stacks Required for IoT Architectures. IoT Eclipse (White Paper) (2020).
[2] Cornelius Aschermann, Tommaso Frassetto, Thorsten Holz, Patrick Jauernig, Ahmad-Reza Sadeghi, and Daniel Teuchert. 2019. NAUTILUS: Fishing for deep bugs with grammars. In The Network and Distributed System Security Symposium (NDSS).
[3] I. Ashraf, X. Ma, B. Jiang, and W. K. Chan. 2020. GasFuzzer: Fuzzing ethereum smart contract binaries to expose gas-oriented exception security vulnerabilities. IEEE Access (2020).
[4] Marcel Böhme, Van-Thuan Pham, Manh-Dung Nguyen, and Abhik Roychoudhury. 2017. Directed greybox fuzzing. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security.
[5] Marcel Böhme, Van-Thuan Pham, and Abhik Roychoudhury. 2016. Coverage-based greybox fuzzing as markov chain. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security.
[6] Kali Bot. 2019. bed. https://gitlab.com/kalilinux/packages/bed.
[7] Z. Berkay Celik, Patrick McDaniel, and Gang Tan. 2018. Soteria: Automated IoT safety and security analysis. In 2018 USENIX Annual Technical Conference (USENIX ATC 18).
[8] Daming D Chen, Maverick Woo, David Brumley, and Manuel Egele. 2016. Towards automated dynamic analysis for Linux-based embedded firmware. In The Network and Distributed System Security Symposium (NDSS).
[9] Jiongyi Chen, Wenrui Diao, Qingchuan Zhao, Chaoshun Zuo, Zhiqiang Lin, XiaoFeng Wang, Wing Cheong Lau, Menghan Sun, Ronghai Yang, and Kehuan Zhang. 2018. IOTFUZZER: Discovering memory corruptions in IoT through app-based fuzzing. In The Network and Distributed System Security Symposium (NDSS).
[10] Abraham Clements, Eric Gustafson, Tobias Scharnowski, Paul Grosen, David Fritz, Christopher Kruegel, Giovanni Vigna, Saurabh Bagchi, and Mathias Payer. 2020. HALucinator: Firmware re-hosting through abstraction layer emulation. In Proceedings of the 29th USENIX Security Symposium (USENIX ’20).
[11] Lucian Cojocar, Jonas Zaddach, Roel Verdult, Herbert Bos, Aurélien Francillon, and Davide Balzarotti. 2015. PIE: Parser identification in embedded systems.
[12] Jake Corina, Aravind Machiry, Christopher Salls, Yan Shoshitaishvili, Shuang Hao, Christopher Kruegel, and Giovanni Vigna. 2017. Difuze: Interface aware fuzzing for kernel drivers. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security.
[13] Andrei Costin, Jonas Zaddach, Aurélien Francillon, and Davide Balzarotti. 2014. A Large-Scale Analysis of the Security of Embedded Firmwares. In 23rd USENIX Security Symposium (USENIX Security 14).
[14] Thomas Dullien and Rolf Rolles. 2005. Graph-based comparison of executable objects (english version). Journal of Computer Virology and Hacking Techniques (2005).
[15] Sebastian Eschweiler, Khaled Yakdan, and Elmar Gerhards-Padilla. 2016. discovRE: Efficient cross-architecture identification of bugs in binary code. In Network and Distributed Systems Security (NDSS).
[16] Pwnie Express. 2020. What makes IoT so vulnerable to attack? Technical Report. Outpost24.
[17] Bo Feng, Alejandro Mera, and Long Lu. 2020. P2IM: Scalable and hardware-independent firmware testing via automatic peripheral interface modeling. In 29th USENIX Security Symposium (USENIX Security 20).
[18] Qian Feng, Rundong Zhou, Chengcheng Xu, Yao Cheng, Brian Testa, and Heng Yin. 2016. Scalable graph-based bug search for firmware images. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security.
[19] Fitblip. 2019. Sulley. https://github.com/OpenRCE/sulley.
[20] Eric Gustafson, Marius Muench, Chad Spensky, Nilo Redini, Aravind Machiry, Yanick Fratantonio, Davide Balzarotti, Aurélien Francillon, Yung Ryn Choe, Christophe Kruegel, et al. 2019. Toward the analysis of embedded firmware through automated re-hosting. In 22nd International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2019).
[21] Mingeun Kim, Dongkwan Kim, Eunsoo Kim, Suryeon Kim, Yeongjin Jang, and Yongdae Kim. 2020. FirmAE: Towards large-scale emulation of IoT firmware for dynamic analysis. In Annual Computer Security Applications Conference.
[22] Stephan Kleber, Henning Kopp, and Frank Kargl. 2018. NEMESYS: Network message syntax reverse engineering by analysis of the intrinsic structure of individual messages. In 12th USENIX Workshop on Offensive Technologies (WOOT 18).
[23] Karla Lant. 2017. By 2020, there will be 4 devices for every human on earth. Futurism (2017).
[24] lcamtuf. 2017. AFL. https://lcamtuf.coredump.cx/afl/.
[25] Trend Micro. 2020. Mirai botnet exploit weaponized to attack IoT devices via CVE-2020-5902. Technical Report. Security Intelligence Blog.
[26] Trend Micro. 2020. Smart yet flawed: IoT device vulnerabilities explained. Technical Report. Security News.
[27] Marius Muench, Jan Stijohann, Frank Kargl, Aurélien Francillon, and Davide Balzarotti. 2018. What you corrupt is not what you crash: Challenges in fuzzing embedded devices. In NDSS 2018, Network and Distributed Systems Security Symposium.
[28] Lindsey O’Donnell. 2020. More than half of IoT devices vulnerable to severe attacks. Technical Report. ThreatPost.
[29] Sebastian Österlund, Kaveh Razavi, Herbert Bos, and Cristiano Giuffrida. 2020. ParmeSan: Sanitizer-guided greybox fuzzing. In 29th USENIX Security Symposium (USENIX Security 20).
[30] Peachtech. 2021. PEACH: The PEACH fuzzer platform. https://www.peach.tech/products/peach-fuzzer/ Accessed: 2021-01.
[31] Joshua Pereyda. 2017. boofuzz: Network protocol fuzzing for humans. https://boofuzz.readthedocs.io/en/stable/.
[32] Jannik Pewny, Behrad Garmany, Robert Gawlik, Christian Rossow, and Thorsten Holz. 2015. Cross-architecture bug search in binary executables. In 2015 IEEE Symposium on Security and Privacy (SP).
[33] Van-Thuan Pham, Marcel Böhme, and Abhik Roychoudhury. 2020. AFLNET: A greybox fuzzer for network protocols. In IEEE International Conference on Software Testing, Verification and Validation (ICST) 2020.
[34] Van-Thuan Pham, Marcel Böhme, Andrew Edward Santosa, Alexandru Razvan Caciulescu, and Abhik Roychoudhury. 2019. Smart greybox fuzzing. IEEE Transactions on Software Engineering (2019).
[35] Nilo Redini, Aravind Machiry, Ruoyu Wang, Chad Spensky, Andrea Continella, Yan Shoshitaishvili, Christopher Kruegel, and Giovanni Vigna. 2020. Karonte: Detecting insecure multi-binary interactions in embedded firmware. In 2020 IEEE Symposium on Security and Privacy (SP).
[36] Sergej Schumilo, Cornelius Aschermann, Robert Gawlik, Sebastian Schinzel, and Thorsten Holz. 2017. kAFL: Hardware-assisted feedback fuzzing for OS Kernels. In 26th USENIX Security Symposium (USENIX Security 17).
[37] Congxi Song, Bo Yu, Xu Zhou, and Qiang Yang. 2019. SPFuzz: a hierarchical scheduling framework for stateful network protocol fuzzing. IEEE Access (2019).
[38] Prashast Srivastava, Hui Peng, Jiahao Li, Hamed Okhravi, Howard Shrobe, and Mathias Payer. 2019. FirmFuzz: automated IoT firmware introspection and analysis. In Proceedings of the 2nd International ACM Workshop on Security and Privacy for the Internet-of-Things.
[39] Liam Tung. 2017. IoT devices will outnumber the world’s population this year for the first time. Technical Report. ZDNet.
[40] Junjie Wang, Bihuan Chen, Lei Wei, and Yang Liu. 2017. Skyfire: Data-driven seed generation for fuzzing. In 2017 IEEE Symposium on Security and Privacy (SP).
[41] Yanhao Wang, Xiangkun Jia, Yuwei Liu, Kyle Zeng, Tiffany Bao, Dinghao Wu, and Purui Su. 2020. Not all coverage measurements are equal: Fuzzing by coverage accounting for input prioritization. In The Network and Distributed System Security Symposium (NDSS).
[42] Wikipedia. 2021. Edit distance. https://en.wikipedia.org/wiki/Edit_distance.
[43] Wikipedia. 2021. Hierarchical clustering. https://en.wikipedia.org/wiki/Hierarchical_clustering.
[44] wireghoul. 2019. Doona. https://github.com/wireghoul/doona.
[45] wireshark. 2020. About wireshark. https://www.wireshark.org/about.html.
[46] Xiaojun Xu, Chang Liu, Qian Feng, Heng Yin, Le Song, and Dawn Song. 2017. Neural network-based graph embedding for cross-platform binary code similarity detection. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security.
[47] Bo Yu, Pengfei Wang, Tai Yue, and Yong Tang. 2019. Poster: Fuzzing iot firmware via multi-stage message generation. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security.
[48] Y. Yu, Z. Chen, S. Gan, and X. Wang. 2020. SGPFuzzer: A state-driven smart graybox protocol fuzzer for network protocol implementations. IEEE Access (2020).
[49] Tai Yue, Pengfei Wang, Yong Tang, Enze Wang, Bo Yu, Kai Lu, and Xu Zhou. 2020. EcoFuzz: Adaptive energy-saving greybox fuzzing as a variant of the adversarial multi-armed bandit. In 29th USENIX Security Symposium (USENIX Security 20).
[50] Jonas Zaddach, Luca Bruno, Aurelien Francillon, Davide Balzarotti, et al. 2014. AVATAR: A framework to support dynamic security analysis of embedded systems’ firmwares. In The Network and Distributed System Security Symposium (NDSS).
[51] Yaowen Zheng, Ali Davanian, Heng Yin, Chengyu Song, Hongsong Zhu, and Limin Sun. 2019. FIRM-AFL: High-throughput greybox fuzzing of IoT firmware via augmented process emulation. In 28th USENIX Security Symposium (USENIX Security 19).

APPENDIX

A RUNTIME PERFORMANCE
Figure 6 shows the run-time performance of Snipuzz and the other seven baselines during the first 10 minutes. In most benchmarks, Snipuzz discovers the largest number of categories. Since Snipuzz spends
time on snippet determination, it discovers fewer categories than IoTFuzzer in the beginning. However, IoTFuzzer quickly reaches its peak and cannot discover new categories. In contrast, after the stage of snippet determination, Snipuzz gradually discovers more categories than IoTFuzzer and the other baselines. More detailed analysis can be found in Section 5.3.

Figure 6: Runtime performance. The number of categories discovered in 10 minutes on all the 20 IoT devices. Snipuzz performs the best on 19 devices.

B MUTATION EFFECTIVENESS: A CASE STUDY
The HS100 and HS110 manufactured by TP-Link are two classic consumer-grade smart plugs on the market. Chen et al. [9] used an HS110 with firmware version 1.3.1 to test IoTFuzzer. The results of their experiment show that IoTFuzzer triggered a vulnerability in the device by mutating the data domain in a message (changing “light” to 0).

However, in the updated version of the firmware (1.5.2), IoTFuzzer did not find any vulnerabilities but Snipuzz did. Figure 7 shows an example of the original input message and the mutated snippets (inside the red frame) in the mutated message that can trigger the vulnerability. In this case, Snipuzz triggered a vulnerability related to firmware input by breaking the JSON syntax structure in the message. The intention of the original message is to change some attributes (e.g., ‘stime_opt’ and ‘wday’) in a rule (indicated by ‘edit_rule’). In the mutated message, Snipuzz randomly deleted some content (inside the blue frame), which breaks the JSON syntax. This may cause message-parsing or parameter-passing errors that the firmware handles incorrectly and, consequently, crash the device.

Figure 7: An example of vulnerability triggering

Figure 8: A vulnerable code snippet from HS110 firmware.

To further determine the root cause of the crash, we obtained the firmware source code. Figure 8 shows a code snippet from the
firmware, using cJSON,² a popular open-source lightweight JSON parser (5.4k stars on GitHub), to interpret input message fragments. The jalr instruction will save the result of cJson_GetObjectItem in $t9 and jump to this address unconditionally (see line 3 in Figure 8), which means the firmware will pick the value corresponding to ‘schedule’. In the original message, the value corresponding to ‘schedule’ is a JSON object headed by ‘edit_rule’ (from line 4 to line 16). Note that the aforementioned snippet-based mutation strategy implemented in Snipuzz is able to break the syntax structure and mutate both data and non-data domains. Interestingly, although the removal of two left curly braces breaks the JSON syntax, this is not recognized by the cJSON parser, so the mutated message successfully bypasses the syntax validation and enters the functional code in the firmware. When the firmware tries to access the successor JSON object in ‘schedule’, i.e., the object that starts with ‘edit_rule’, a null pointer exception is triggered, since the corresponding value is no longer a JSON object but an array.

² https://github.com/DaveGamble/cJSON

Due to the design of IoTFuzzer, fuzzing based on grammatical rules gives priority to satisfying the grammar requirements in the mutation process in order not to be rejected by the firmware grammar detector. The advantage of this is to ensure that each test case can reach the functional execution part of the firmware. However, in this case, the test range of fuzzing based on grammatical rules cannot cover the input sanitising part of the firmware.

To conclude, the root cause of the crash has two factors: 1) the validation of message syntax heavily relies on a third-party library; 2) the firmware does not correctly handle the null pointer exception caused by data type mismatch. Although it is not reasonable to require a vendor to develop products purely from scratch, we argue that thorough testing and validation of the open-source library are essential. Considering the complexity of IoT firmware testing, a lightweight and effective black-box vulnerability detection tool, such as Snipuzz, is a pressing need.
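The data type mismatch described in this case study can be illustrated with a short sketch. The firmware itself is native code built on cJSON; the Python below is only an analogy, and the handler name `handle_schedule`, the error strings, the attribute values, and the shape of the mutated message are all hypothetical. It shows the kind of type check whose absence lets a wrongly typed value reach the functional code.

```python
def handle_schedule(message: dict) -> str:
    """Apply an 'edit_rule' request only if every level has the expected type."""
    schedule = message.get("schedule")
    if not isinstance(schedule, dict):
        # In the case study, the mutated message yields an array here
        # instead of an object; rejecting it avoids a blind dereference.
        return "error: 'schedule' must be an object"
    rule = schedule.get("edit_rule")
    if not isinstance(rule, dict):
        return "error: 'edit_rule' must be an object"
    return "ok: rule updated with keys " + ", ".join(sorted(rule))


# Hypothetical parse results; the attribute values are made up.
original = {"schedule": {"edit_rule": {"stime_opt": 0, "wday": [1, 0, 0, 0, 0, 0, 1]}}}
mutated = {"schedule": ["edit_rule", {"stime_opt": 0}]}

result_original = handle_schedule(original)
result_mutated = handle_schedule(mutated)
```

Under these assumptions the original message is accepted, while the mutated one is rejected with an explicit error instead of being processed as if it were an object.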

Studio 5000 Logix Designer - 37.00.00 (Released 9 - 2024)
No ratings yet
Studio 5000 Logix Designer - 37.00.00 (Released 9 - 2024)
24 pages
Saes o 313
100% (1)
Saes o 313
10 pages
IT Agile Project Manager Resume Example (FINRA) - Gaithersburg, Maryland
No ratings yet
IT Agile Project Manager Resume Example (FINRA) - Gaithersburg, Maryland
5 pages
2b22fda2-e00c-441a-b7a6-e61c0d12de57
No ratings yet
2b22fda2-e00c-441a-b7a6-e61c0d12de57
14 pages
applsci-12-06429
No ratings yet
applsci-12-06429
13 pages
Sensors 23 04117 v2
No ratings yet
Sensors 23 04117 v2
46 pages
sensors-23-06067-v2
No ratings yet
sensors-23-06067-v2
53 pages
Iotriskanalyzer: A Probabilistic Model Checking Based Framework For Formal Risk Analytics of The Internet of Things
No ratings yet
Iotriskanalyzer: A Probabilistic Model Checking Based Framework For Formal Risk Analytics of The Internet of Things
12 pages
MINOR PROJECT
No ratings yet
MINOR PROJECT
10 pages
FEMI SEMINAR REPORT
No ratings yet
FEMI SEMINAR REPORT
11 pages
Internet On things-WPS Office
No ratings yet
Internet On things-WPS Office
32 pages
Algorithms 15 00239
No ratings yet
Algorithms 15 00239
25 pages
Computers & Security: Omnia Abu Waraga, Meriem Bettayeb, Qassim Nasir, Manar Abu Talib
No ratings yet
Computers & Security: Omnia Abu Waraga, Meriem Bettayeb, Qassim Nasir, Manar Abu Talib
17 pages
CICIoT2023 A Real-Time Dataset and Benchmark For L
No ratings yet
CICIoT2023 A Real-Time Dataset and Benchmark For L
22 pages
Gyal Gen
No ratings yet
Gyal Gen
32 pages
Needed Paper
No ratings yet
Needed Paper
11 pages
Narayan Edit
No ratings yet
Narayan Edit
30 pages
Anjana Edit
No ratings yet
Anjana Edit
28 pages
Sensors 23 05941 With Cover
No ratings yet
Sensors 23 05941 With Cover
27 pages
Trusted Edge Computing System Based On Intelligent Risk Detection For Smart IoT
No ratings yet
Trusted Edge Computing System Based On Intelligent Risk Detection For Smart IoT
10 pages
Internet of Things Security and Forensics Concern and Challenges
No ratings yet
Internet of Things Security and Forensics Concern and Challenges
6 pages
Keysight IoT Security Assessment
No ratings yet
Keysight IoT Security Assessment
13 pages
Sensors 22 00567
No ratings yet
Sensors 22 00567
27 pages
Security Concerns
No ratings yet
Security Concerns
38 pages
IoT Dataset 2023
No ratings yet
IoT Dataset 2023
23 pages
Information Security Fundamental
No ratings yet
Information Security Fundamental
37 pages
Wang2017 - 0009835
No ratings yet
Wang2017 - 0009835
11 pages
Decentralized Actionable Cyber Threat Intelligence For Networks and The Internet of Things
No ratings yet
Decentralized Actionable Cyber Threat Intelligence For Networks and The Internet of Things
17 pages
Harbinger A Toolkit For Vulnerability Testing of IoT Devices and Web Clients
No ratings yet
Harbinger A Toolkit For Vulnerability Testing of IoT Devices and Web Clients
9 pages
Electronics 11 02023
No ratings yet
Electronics 11 02023
24 pages
Edge IIoTset DatasetFL
No ratings yet
Edge IIoTset DatasetFL
25 pages
[0] GEN Proposed Paper
No ratings yet
[0] GEN Proposed Paper
9 pages
CYBERSECURITY IN THE INTERNET OF THINGS
No ratings yet
CYBERSECURITY IN THE INTERNET OF THINGS
17 pages
EdgeIIOT Dataset Oct2023
No ratings yet
EdgeIIOT Dataset Oct2023
25 pages
Docnumero 9
No ratings yet
Docnumero 9
21 pages
Model-Based Security Testing in IoT Systems - A Rapid Review
No ratings yet
Model-Based Security Testing in IoT Systems - A Rapid Review
16 pages
04 Sep 2020 - Iotsecurity
No ratings yet
04 Sep 2020 - Iotsecurity
37 pages
Securing IoT Devices Against Exploitation for Cyber Attacks through Detection and Mitigation Strategies Case Study of Public Institutions in Rwanda
No ratings yet
Securing IoT Devices Against Exploitation for Cyber Attacks through Detection and Mitigation Strategies Case Study of Public Institutions in Rwanda
14 pages
WoS Paper1-River Publisher - Keerthi Vardhan
No ratings yet
WoS Paper1-River Publisher - Keerthi Vardhan
19 pages
EasyChair Preprint 11840
No ratings yet
EasyChair Preprint 11840
10 pages
Hids by Signature For Embedded Devices in Iot Networks
No ratings yet
Hids by Signature For Embedded Devices in Iot Networks
8 pages
Security
No ratings yet
Security
22 pages
Federated Learning-Based Anomaly Detection For IoT Security Attacks
No ratings yet
Federated Learning-Based Anomaly Detection For IoT Security Attacks
10 pages
Sensors 23 05941 v2
No ratings yet
Sensors 23 05941 v2
26 pages
SMART DEFENSES: MACHINE LEARNING-BASED PROACTIVE CYBER ATTACK DETECTION IN IOT SYSTEMS
No ratings yet
SMART DEFENSES: MACHINE LEARNING-BASED PROACTIVE CYBER ATTACK DETECTION IN IOT SYSTEMS
5 pages
Iot Security: Narudom Roongsiriwong, Cissp
No ratings yet
Iot Security: Narudom Roongsiriwong, Cissp
42 pages
IoT Network Attack Detection Using Supervised Machine Learning
No ratings yet
IoT Network Attack Detection Using Supervised Machine Learning
15 pages
CSUR2021 EmbeddedSystemsVul
No ratings yet
CSUR2021 EmbeddedSystemsVul
37 pages
Iot Security Techniques Based On Machine Learning
No ratings yet


