snipuzz
ABSTRACT
The proliferation of Internet of Things (IoT) devices has made people’s lives more convenient, but it has also raised many security concerns. Due to the difficulty of obtaining and emulating IoT firmware, in the absence of internal execution information, black-box fuzzing of IoT devices has become a viable option. However, existing black-box fuzzers cannot form effective mutation optimization mechanisms to guide their testing processes, mainly due to the lack of feedback. In addition, because of the prevalent use of various and non-standard communication message formats in IoT devices, it is difficult or even impossible to apply existing grammar-based fuzzing strategies. Therefore, an efficient fuzzing approach with syntax inference is required in the IoT fuzzing domain.
To address these critical problems, we propose a novel automatic black-box fuzzing for IoT firmware, termed Snipuzz. Snipuzz runs as a client communicating with the devices and infers message snippets for mutation based on the responses. Each snippet refers to a block of consecutive bytes that reflect the approximate code coverage in fuzzing. This mutation strategy based on message snippets considerably narrows down the search space to change the probing messages. We compared Snipuzz with four state-of-the-art IoT fuzzing approaches, i.e., IoTFuzzer, BooFuzz, Doona, and Nemesys. Snipuzz not only inherits the advantages of app-based fuzzing (e.g., IoTFuzzer), but also utilizes communication responses to perform efficient mutation. Furthermore, Snipuzz is lightweight as its execution does not rely on any prerequisite operations, such as reverse engineering of apps. We also evaluated Snipuzz on 20 popular real-world IoT devices. Our results show that Snipuzz could identify 5 zero-day vulnerabilities, and 3 of them could be exposed only by Snipuzz. All the newly discovered vulnerabilities have been confirmed by their vendors.

ACM Reference Format:
Xiaotao Feng, Ruoxi Sun, Xiaogang Zhu, Minhui Xue, Sheng Wen, Dongxi Liu, Surya Nepal, and Yang Xiang. 2021. Snipuzz: Black-box Fuzzing of IoT Firmware via Message Snippet Inference. In 2021 ACM SIGSAC Conference on Computer and Communications Security (CCS ’21), November 14–19, 2021, Virtual Event, South Korea. ACM, New York, NY, USA, 15 pages. https://doi.org/10.1145/1122445.1122456

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
CCS 2021, 14 - 21 November, 2021, Seoul, South Korea
© 2021 Association for Computing Machinery.
ACM ISBN 978-1-4503-XXXX-X/18/06. . . $15.00
https://doi.org/10.1145/1122445.1122456

1 INTRODUCTION
The Internet of Things (IoT) refers to the billions of physical devices around the world which are now connected to the Internet, all collecting and sharing data. As early as 2017, IoT devices have outnumbered the world’s population [39], and by 2020, every person on this planet has four IoT devices on average [23]. While these devices enrich our lives and industries, unfortunately, they also introduce blind spots and security risks in the form of vulnerabilities. We take Mirai [25] as an example. Mirai is one of the most prominent types of IoT botnet malware. In 2016, Mirai took down widely-used websites in a distributed denial of service (DDoS) campaign consisting of thousands of compromised household IoT devices. In the case of Mirai, attackers exploited vulnerabilities to target IoT devices themselves and then weaponized the devices for larger campaigns or spreading malware to the network. In fact, attackers can also use vulnerable devices for lateral movement, allowing them to reach critical targets. For example, in the work-from-home scenarios during COVID-19, Trend Micro has reported that, introducing vulnerable IoT devices to the household will expose employees to malware and attacks that could slip into a company’s network [26]. Considering the ubiquity of IoT devices, we believe that these known security incidents and risky scenarios are nothing but a tip of the iceberg.
IoT vulnerabilities are normally about the implementation flaws within a device’s firmware. To launch new products as soon as possible, developers always tend to use open-source components in firmware development without good update plans [1]. This sacrifices the security of IoT devices and exposes them to vulnerabilities that security teams cannot remedy quickly. Even if vendors plan to fix the vulnerabilities in their products, the over-the-air patching is usually infeasible because IoT devices do not have reliable network connectivity [16]. As a result, half of the IoT devices in the market were reported to have vulnerabilities [28].
It is hence crucial to discover such vulnerabilities and fix them before an attacker does. However, most IoT software security tests heavily rely on the assumption of device firmware availability. In many cases, manufacturers tend not to release their product firmware and that makes various dynamic analysis methods based on code analysis [7, 13, 15, 18, 32, 46] (or emulation [8, 10, 20, 50, 51]) difficult. Among the existing defense techniques, fuzz testing has shown promises to overcome these issues and has been widely used as an efficient approach in finding vulnerabilities. Moreover, the ability of IoT devices to communicate with the outside world
offers us a new option, and that is to test device firmware through exchanging network messages. Therefore, an IoT fuzzer could be designed to send random communication messages to the target device in order to detect if it shows any symptoms of malfunctioning. Potential vulnerabilities could be exposed if crashes are triggered during execution or the device is pushed to send back abnormal messages.
However, using network communication to fuzz the firmware of IoT devices is very challenging. Since obtaining internal execution information from the device is not possible, most existing network IoT fuzzers [9, 31, 44] work in a black-box manner. This makes optimizing the mutation strategies very difficult. Because the selection of mutated seeds is entirely random, existing black-box IoT fuzzing approaches could become very hard to handle, and sometimes, even become more like brute force crack testing. In addition, IoT devices have strict grammatical specifications for inputs in communication. Most of the messages that are generated by random mutation will break the syntax rules of the input, and will be quickly rejected during syntax validation in the firmware before being executed. A grammar-based mutation strategy [2, 40] can effectively generate messages that meet the input requirements though. This can be done by learning the syntax via documented grammatical specifications or from a labeled training set. However, as shown in Table 1, many non-standard IoT device communication formats are being used in practice. Therefore, preparing enough learning materials for grammar-based mutation strategies is a huge workload, which makes the deployment of grammar-based IoT fuzzing difficult.

Table 1: Format requirements of IoT Devices.
#   Device Type       Vendor      Model          Firmware Version    Format
1   Smart Bulb        Yeelight    YLDP05YL       1.4.2_0016          JSON
2   Smart Bulb        Yeelight    YLDP13YL       1.4.2_0016          JSON
3   Smart Bulb        Philips     A60            1.46.13_r26312      JSON
4   Smart Bulb        LIFX        Mini C         v3.60               Custom Byte
5   Smart Bulb        FloodLight  BR30           35.V7.63.7189-A     Custom Byte
6   Home Bridge       Philips     Hue            1935144040          JSON
7   Home Bridge       Alro        Base Station   1.12.2.8_9_fc4b603  JSON
8   Smart Plug        Tplink      HS100          1.5.2               JSON
9   Smart Plug        Tplink      HS110          1.5.2               JSON∗
10  Smart Plug        Belkin      WeMo F7C027au  2.00.1821           SOAP
11  Smart Plug        Meross      MSS310         2.1.14              JSON∗
12  Smart Plug        Orvibo      B25AUS         v3.1.3              JSON
13  Smart Plug        Konke       Mini US        us1.1.0             String
14  Smart Plug        Broadlink   SP4L-AU        v57209              Custom Byte
15  Router            Netgear     R6400          1.0.1.46            SOAP∗
16  TA Assistant      ZKteco      WL10           ZLM-FX1-3.0.23      Custom Byte
17  Camera            Alro        Alro Pro 2     1.125.14.0_34_1189  JSON∗
18  Camera            Foscam      F19821W        2.21.1.127          JSON∗
19  NAS               QNAP        T-131P         4.3.6.0959          Key-value pairs
20  Universal Remote  BroadLink   RM mini 3      v44057              Custom Byte
∗: have randomness in response.

Challenges. In this paper, we focus on detecting vulnerabilities in IoT firmware by sending messages to IoT devices. To design an effective and efficient fuzzing method, several challenges have to be overcome.
• Challenge 1: Lack of a feedback mechanism. Without access to firmware, it is nearly impossible to obtain the internal execution information from IoT device to guide the fuzzing process (as is done in most typical fuzzers). Therefore, we need a lightweight solution to obtain feedback from device, and optimize the generation process.
• Challenge 2: Diverse message formats. Table 1 shows some message formats that are used in IoT communication, including JSON, SOAP, Key-value pairs, string, or even customized formats. In order to be applied to various devices, a solution should be able to infer the format from a raw message.
• Challenge 3: Randomness in responses. The response messages of an IoT device may contain random elements, such as timestamps or tokens. Such randomness results in different responses for the same message, and diminishes the effectiveness of fuzzing because the input generation of Snipuzz relies on responses.
Our approach. In this paper, we propose a novel and automatic black-box IoT fuzzing, named Snipuzz, to detect vulnerabilities in IoT firmware. Different from other existing IoT fuzzing approaches, Snipuzz implements a snippet-based mutation strategy which utilizes feedback from IoT devices to guide the fuzzing. Specifically, Snipuzz uses a novel heuristic algorithm to detect the role of each byte in the message. It will first mutate bytes in a message one by one to generate probe messages, and categorize the corresponding responses collected from device. Adjacent bytes that have the same role in the message form the initial message snippets, which is the basic unit of mutation. Moreover, Snipuzz utilizes a hierarchical clustering strategy to optimize mutation strategies and reduce the misclassification of categories caused by randomness in the response messages and the firmware’s internal mechanism. Therefore, Snipuzz, as a black-box fuzzer, can still effectively test the firmware of IoT devices without the support of grammatical rules and internal execution information of the device.
Snipuzz resolves Challenge 1 by using responses as the guidance to optimize the fuzzing process. Based on the responses, Snipuzz designs a novel heuristic algorithm to initially infer the role of each byte in the message, which resolves Challenge 2. Snipuzz utilizes edit distance [42] and agglomerative hierarchical clustering [43] to resolve Challenge 3. We summarize our main contributions as follows:
• Message snippet inference mechanism. The responses from IoT devices are related to code execution path in firmware. Based on responses, we infer the relationship between message snippets and code execution path in firmware. This novel mutation mechanism enables that Snipuzz does not need any syntax rules to infer the hidden grammatical structure of the input through the device responses. Compared with the actual syntax rules that determine the input string format, the result of snippet determination proposed by Snipuzz has a similarity of 87.1%.
• More effective IoT fuzzing. When testing IoT devices, the number of response categories is positively correlated with the number of code execution paths in the firmware. In the experiment, the number of response categories explored by Snipuzz far exceeded other methods on most devices, no matter how long the analysis duration was (in 10 minutes or 24 hours).
• Implementation and vulnerability findings. We implemented the prototype of Snipuzz (publicly available at https://github.com/XtEsco/Snipuzz). We used it to test 20 real-world consumer-grade IoT devices while comparing with the state-of-the-art fuzzing tools, i.e., IoTFuzzer, Doona, Boofuzz, and Nemesys. In 5 out of 20 devices, Snipuzz successfully found 5 zero-day vulnerabilities, including null pointer exceptions, denial
of service, and unknown crashes, and 3 of them could be exposed only by Snipuzz.

2 BACKGROUND

2.1 Fuzz Testing
Fuzzing is a powerful automatic testing tool to detect software vulnerabilities. After decades of development, fuzzing has been widely used as a base in several security testing domains, such as the OS kernel [12, 36], servers [33], and the blockchain [3].
In general, fuzzing feeds the target programs with numerous mutated inputs and monitors exceptions (e.g., crashes). If an execution reveals undesired behavior, a vulnerability could be detected. To discover vulnerabilities more effectively, fuzzing algorithms optimize the mutation process based on feedback of executions (e.g., coverage knowledge), instead of using a purely random mutation strategy. Moreover, fuzzers can judge from the feedback mechanism whether each test case generated by seed mutation is “interesting” (i.e., whether the test case has explored unseen execution states). If a test case is interesting, it will be reserved as a new seed to participate in future mutation. With the feedback, many fuzzers [4, 5, 29, 41, 49] steer the computing resources towards the interesting test cases and achieve higher possibility to discover vulnerabilities.

2.2 Generic Communication Architecture of IoT Devices
To react with external inputs, most IoT devices implement a similar high-level communication architecture. As per the pseudo code example presented in Figure 1, a typical implementation of the communication architecture may consist of four parts: 1) Sanitizer, 2) Function Switch, 3) Function Definitions, and 4) Replier.
When an IoT device receives an external input, Sanitizer starts parsing the input and performs regular matching. If the input format breaches the syntactic requirements, or an exception occurs during the parsing process, Sanitizer will directly notify Replier by sending a response message describing the input error and terminate the processing of input. If the input is syntactically correct, Function Switch transfers control to the corresponding Functions according to the attribute, Key, and corresponding value, val, extracted from the input. If Key cannot be matched, the processing of this input will be terminated, similarly as done by Replier. When Functions completes the processing, such as setFlow(), with the parameter val, it notifies Replier to generate the response message. Note that, the implementation of Functions is specific to IoT devices.
As described above, Replier is responsible for sending responses to the client (such as the user’s APP). Based on the calling situation (indicated by the parameter code in the example), Replier determines the content of response message to be sent.

3 MOTIVATION

3.1 Response-Based Feedback Mechanism
The interactive capabilities of IoT devices make it possible to test security of device firmware through the network. However, there are also some challenges when testing IoT devices using network-based fuzzers. Since most network fuzzing methods cannot directly obtain execution status of the device, it is hard to establish an effective feedback mechanism to guide the fuzzing process. Without feedback mechanism, the fuzzing tests could be blind in the selection of mutation targets, and may lean to a brute force random test.
As discussed previously, due to the lack of open-sourced firmware, it is difficult or even impossible to instrument the IoT devices. Therefore, the response messages returned by the firmware can be regarded as a valuable source of device status information at run-time. The Replier in Figure 1 will use the value of the variable code to determine the content of the response messages. The value of code comes from many different function blocks in the firmware. Parameters are passed when Sanitizer fails to parse the input or some exceptions are triggered; or when the Function Switch cannot match the key command characters in the input; or after each input is executed in the Functions. Therefore, through the content of the response message, the code block that has been executed in the firmware can be inferred. When the firmware source code is not available, the correspondence between the firmware execution and the response messages cannot be directly extracted. Moreover, the firmware may return the same response messages even executing different functions.
Although the response message cannot be equated to the execution path of the device, it can still play an important role in the black-box fuzz testing for IoT devices. Although it is hard to link the code execution path corresponding to each response message, if the two inputs get different response messages, we can deduce that the two inputs go to different firmware code execution paths.
Our approach. Snipuzz uses the response message to establish a new feedback mechanism. Snipuzz will collect every response, and when a new response is found, the input corresponding to the response will be queued as a seed for subsequent mutation testing.

3.2 Message Snippet Inference
The firmware of the IoT device can be regarded as a software program with strict syntax requirements for input. If the byte-based mutation strategies (such as mutating each byte in the input one by one or randomly selecting bytes for mutation testing) are used in the fuzz testing, the generated test cases could be rare to meet the input syntax requirements. The grammar-based fuzzers utilize detailed documents or a large training data set to learn the grammatical rules and use it to guide the generation of mutation [34, 40].
In many cases, the input syntax in IoT devices is diverse or non-standard. Table 1 shows the communication format requirements used in 20 IoT devices from different vendors. Some of them are using well-known formats such as JSON and SOAP, but some use Key-value pairs or even custom strings as communication format. Therefore, it is difficult to provide grammar specifications or establish training data sets that cover communication formats on a large scale for the grammar-based mutation strategy.
The best grammar guidance originates from the firmware itself. Responses from IoT devices suggest the execution results of messages. If we mutate a valid message byte by byte (i.e., breaching the format), we will get many different responses. If mutation of two different positions in the valid message receives the same response, these two positions have a high possibility that they are related to the same functionality in firmware. Therefore, those consecutive bytes with the same response can be merged into one snippet. This
method of inferring message snippets can clearly reflect the utility of each byte after entering the firmware. In addition, mutation based on message snippets can largely reduce the search space and improve the efficiency of fuzzing.
Our approach. Snipuzz merges consecutive bytes with the same response into one snippet. We also propose different mutation operators performing on snippets.

Figure 1: Interaction with IoT Firmware. Most implementations of IoT devices have a similar communication architecture, including Sanitizer, Function Switch, Function Definitions, and Replier. If the Sanitizer and the Function Switch perform correctly, corresponding functionalities will be executed. Except for crashes, the Replier will always send responses to clients.

4 METHODOLOGY
In order to clearly present our approach, we first introduce some notations while explaining the fuzzing process of Snipuzz. At a high level, Snipuzz acts as a client which sends a message sequence 𝑀 to request certain actions from IoT devices. Any message 𝑚 ∈ 𝑀 requests the IoT device to perform a certain functionality, and all the messages ⋃_𝑘 𝑚_𝑘 = 𝑀 work together to request an action (or actions). Similarly to the typical fuzzers, we initialize a seed 𝑆 with an initial message sequence, and a seed corpus 𝐶 with all the seeds (Section 4.1). Meanwhile, restoring message sequences are collected for resetting the IoT device to a predefined status.
To fuzz effectively, as depicted in Figure 2, Snipuzz first conducts a snippet determination process. Concretely, Snipuzz selects a message 𝑚 in a seed 𝑆 ⊂ 𝐶, from which a probe message 𝑝_𝑖 and a corresponding sequence 𝑀_𝑖 will be generated. Each message in 𝑀_𝑖 will trigger a response message 𝑟_𝑖 (response for short) containing the information about the execution output. Snipuzz assigns each message 𝑚 a response pool 𝑅, which is utilized to determine if a new response 𝑟_𝑖 is unique. A response is unique if it does not belong to any category of responses already in the response pool. If 𝑟_𝑖 is unique, Snipuzz will add 𝑟_𝑖 into the pool 𝑅, and reserve the corresponding message sequence 𝑀_𝑖 as a new seed. Snipuzz then divides the message 𝑚 into different snippets based on the responses (Section 4.2). Once the snippets are obtained, Snipuzz performs mutation according to various strategies, e.g., empty, bytes flip, data boundary, or havoc (detailed in Section 4.3). Throughout the fuzzing process, Snipuzz sets up a network monitor to detect crashes which may indicate vulnerabilities (Section 4.4).

4.1 Message Sequence Acquisition
The quality of initial seeds could influence the fuzzing campaigns significantly. Therefore, we aim to obtain high-quality initial seeds conforming to highly-structured formats required by IoT devices, as such inputs may exercise complex execution paths and enlarge the opportunity of exposing vulnerabilities at deep code. Generating seeds based on companion app reverse-engineering [9] or accessible specifications (as mentioned in Section 3.2) could be intuitive solutions. However, they either require heavy engineering efforts or could be error-prone (e.g., seeds may violate the required formats or have the wrong order of messages).
Initial seed acquisition. Snipuzz proposes a lightweight solution to obtain initial valid seeds. Considering that many IoT devices have first- or third-party API documents as well as the test suites, the testing programs provided by both parties can effectively act as a client, sending control commands to IoT devices or remote servers. Most structural information (e.g., header, message content) and protocols (e.g., HTTP, HNAP, MQTT) of communication packets are defined in the API programs as message payloads. Therefore, Snipuzz leverages these test suites to communicate with the target devices, while at the same time, extracting the message sequences as initial seeds. For example, when using an API program to turn on a light bulb, the program first sends login information to the server or to the IoT device, then sends a message to locate a specific light bulb device, and finally sends a message to control the device to turn on the light. Snipuzz captures such a message sequence that triggers a functionality of IoT device as an initial seed.
Restoring message sequence acquisition. In order to replay a test case for the crash triage, Snipuzz ensures that the device under test has the same initial state in each round of testing. After sending any message sequence to the device, Snipuzz will send a restoring message sequence to reset the device to a predefined status.
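The round structure described above (send a message sequence, collect the responses, then send the restoring sequence, and keep the sequence as a seed if a response is new) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `run_round`, `fuzz_once`, and `fake_device` are hypothetical, the transport is abstracted as a callable, and response uniqueness is approximated by exact byte equality rather than the category-based check the paper uses.

```python
from typing import Callable, List, Set

Message = bytes
Sequence = List[Message]
# Hypothetical transport: sends one message and returns the raw response.
Send = Callable[[Message], bytes]


def run_round(send: Send, sequence: Sequence, restore: Sequence) -> List[bytes]:
    """Send one message sequence, then reset the device to its predefined state."""
    responses = [send(message) for message in sequence]
    for message in restore:
        send(message)  # responses to the restoring messages are ignored here
    return responses


def fuzz_once(send: Send, candidate: Sequence, restore: Sequence,
              pool: Set[bytes], corpus: List[Sequence]) -> bool:
    """Keep `candidate` as a new seed if it triggers an unseen response.

    Assumption: "unseen" means "not byte-identical to a pooled response".
    """
    responses = run_round(send, candidate, restore)
    unseen = [r for r in responses if r not in pool]
    if unseen:
        pool.update(unseen)
        corpus.append(candidate)
    return bool(unseen)


def fake_device(message: bytes) -> bytes:
    """Stand-in for a real device so the sketch runs without hardware."""
    return b"ok" if message.startswith(b"{") else b"syntax error"
```

Sending the restoring sequence inside every round is what keeps each test case replayable from the same device state.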
Figure 2: Workflow of Snipuzz. With the valid message sequences (seeds), Snipuzz performs snippet determination on each individual message. Then, Snipuzz mutates snippet(s) to generate new message sequences. By monitoring the network traffic, Snipuzz determines a crash when no responses are received.

Manual efforts. Although we try our best efforts to provide a lightweight fuzzer, Snipuzz still requires some manual efforts to obtain valid and usable initial seeds. First, we manually configure the programs from the test suites, such as setting up the IP address and the login information. Note that, we only need to configure these programs once per device. Second, to capture the message sequences dynamically, we need to manually define the specific format and protocol in the network traffic monitor. Finally, we filter out some message sequences that will mislead the fuzzing process. For instance, some API programs provide operations that can automatically update or restart the device. These operations will halt the device and thus no response will be sent back. This leads to false-positive crashes because we consider a no-response execution as a crash. The manual work costs roughly 5 man-hours per device and is only required during the message sequence acquisition phase of Snipuzz.

4.2 Snippet Determination
The key idea of Snipuzz is to optimize fuzzing process based on snippets determined by responses. Put differently, Snipuzz leverages snippet mutation to reduce the search space of inputs, while the snippets are automatically clustered via categorizing responses from IoT devices. The major challenge is to correctly understand the semantics of responses. For instance, due to the presence of timestamp, two semantically identical responses will be classified into different categories if utilizing a simple string comparison. Therefore, Snipuzz utilizes a heuristic algorithm and a hierarchical clustering approach to determine the snippets in each message.

4.2.1 Initial Determination. The essence of a message snippet is the consecutive bytes in a message that enables the firmware to execute a specific code segment. For experienced experts, it is not difficult to segment message snippets according to the semantic definition in official documents. However, for algorithms that lack such knowledge, it is essential to apply some automatic approaches to identify the meaning of each byte in the message.
Snipuzz first uses a heuristic algorithm to roughly divide each message into initial snippets. The core idea of the heuristic algorithm is to generate probe messages 𝑝_𝑖 by deleting a certain byte in the message 𝑚 (𝑚 ∈ 𝑠𝑒𝑒𝑑 𝑆). By categorizing the responses 𝑟_𝑖 of each probe message, Snipuzz preliminarily determines the snippets in the message 𝑚.
For example, as shown in Table 2, to determine snippets in the message 𝑚 = {"on":true}, Snipuzz generates probe messages by removing the bytes in 𝑚 one by one. When the first byte ‘{’ in 𝑚 is deleted, the corresponding probe message 𝑝_1 is "on":true}. Similarly, when the second byte is deleted, the corresponding probe message 𝑝_2 is {on":true}. Therefore, the message 𝑚 with 11 bytes can generate 11 different probe messages (𝑝_1 to 𝑝_11). Snipuzz will send the 11 corresponding message sequences (𝑀_1 to 𝑀_11) containing the probe messages to the device and collect responses.
Snipuzz then distinguishes the snippets in the message 𝑚 by categorizing the responses. Specifically, the consecutive bytes with the same corresponding response type are merged into the same snippet. According to the examples illustrated in Table 2, the Response 𝑟_1, 𝑟_2, and 𝑟_5 are merged into one category that indicates an error in JSON syntax, while Response 𝑟_3 and 𝑟_4 are merged into another category which indicates an error of an invalid input parameter. Therefore, the consecutive bytes whose corresponding responses belong to the same category can form a message snippet. Through this heuristic approach, Snipuzz can determine all initial snippets in the message 𝑚.
A naive method to categorize responses is to utilize a string comparison, i.e., comparing the content of responses byte by byte. However, due to the existence of randomness in responses (e.g., timestamp and token), a simple string comparison may incorrectly distinguish the responses with same semantic meaning into different categories. Therefore, a more advanced solution, Edit Distance [42], is introduced to determine the category of responses. As shown in Equation (1), a similarity score, 𝑠_𝑘𝑡, between two responses 𝑟_𝑘 and 𝑟_𝑡 is calculated.
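The initial determination steps above can be illustrated with a short sketch: build one probe per deleted byte, group the responses into categories, and merge consecutive bytes whose probes fall into the same category. Equation (1) is not reproduced in this excerpt, so the similarity used here (one minus the edit distance divided by the longer response length) and the 0.8 threshold are assumptions for illustration, as are all function names.

```python
from typing import List


def probes(message: bytes) -> List[bytes]:
    """Probe i is the message with byte i deleted."""
    return [message[:i] + message[i + 1:] for i in range(len(message))]


def levenshtein(a: bytes, b: bytes) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return prev[-1]


def similarity(a: bytes, b: bytes) -> float:
    """Assumed normalisation: 1 - distance / length of the longer response."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def categorize(responses: List[bytes], threshold: float = 0.8) -> List[int]:
    """Give each response the label of the first similar representative."""
    representatives: List[bytes] = []
    labels: List[int] = []
    for response in responses:
        for label, rep in enumerate(representatives):
            if similarity(response, rep) >= threshold:
                labels.append(label)
                break
        else:
            representatives.append(response)
            labels.append(len(representatives) - 1)
    return labels


def initial_snippets(message: bytes, labels: List[int]) -> List[bytes]:
    """Merge consecutive bytes whose probes share a response category."""
    snippets: List[bytes] = []
    start = 0
    for i in range(1, len(message) + 1):
        if i == len(message) or labels[i] != labels[start]:
            snippets.append(message[start:i])
            start = i
    return snippets
```

For the message {"on":true}, hypothetical labels [0, 0, 1, 1, 0, 0, 2, 2, 2, 2, 0] would split it into the snippets {" , on , ": , true and }. Only the first five labels follow the categories described for Table 2; the rest are invented for the example.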
5
CCS 2021, 14 - 21 November, 2021, Seoul, South Korea X. Feng, R. Sun, X. Zhu, M. Xue, S. Wen, D. Liu, S. Nepal, and Y. Xiang
Algorithm 1: Hierarchical Clustering for Snippets
Input: Initial Snippets 𝐹_0, Response Pool 𝑅
Result: Snippets 𝐹
𝐹 ← 𝐹_0;
𝐶 ← 𝑐𝑎𝑡𝑒𝑔𝑜𝑟𝑖𝑧𝑒(𝐹_0);
𝑉 ← 𝑣𝑒𝑐𝑡𝑜𝑟𝑖𝑧𝑒(𝑅);
while size(𝐶) > 1 do
    for 𝑖 ← 𝑠𝑖𝑧𝑒(𝐶) to 2 do
        for 𝑗 ← 𝑠𝑖𝑧𝑒(𝐶)-1 to 1 do
            𝐷 ← 𝑑𝑖𝑠𝑡𝑎𝑛𝑐𝑒_𝑖𝑗 = 𝑣_𝑖 − 𝑣_𝑗;
        end
    end
    𝑖, 𝑗 = 𝑎𝑟𝑔𝑚𝑖𝑛(𝐷);
    𝐶 ← 𝑚𝑒𝑟𝑔𝑒_𝑐𝑙𝑢𝑠𝑡𝑒𝑟(𝐶, 𝑖, 𝑗);
    𝑉 ← 𝑢𝑝𝑑𝑎𝑡𝑒_𝑐𝑙𝑢𝑠𝑡𝑒𝑟_𝑐𝑒𝑛𝑡𝑒𝑟(𝑉, 𝑖, 𝑗);
    𝐹 ← 𝐹 + 𝑔𝑒𝑛𝑒𝑟𝑎𝑡𝑒_𝑠𝑛𝑖𝑝𝑝𝑒𝑡𝑠(𝐶);
end

• Dictionary. For the scheme of Dictionary, Snipuzz replaces a snippet with a pre-defined string such as “true” and “false”, which may directly explore more code coverage.
• Repeat. In order to detect bugs in syntax parsers, Snipuzz repeats a snippet for multiple times. Meanwhile, the repetition of data domain can detect defects caused by out-of-boundary problems.
Havoc. The conditions for triggering bugs may be complicated. For example, it may require modifying different data domains in the same message to trigger a bug. The aforementioned snippet mutation schemes only mutate one snippet at a time. However, the havoc mutation randomly selects some random snippets in a message, and performs the aforementioned mutation schemes on each of the selected snippets. Havoc mutation will not stop until finding a new response category or the target IoT device crashes.

4.4 Network Traffic Monitor
The network communication of the device is monitored and a timeout is set to determine whether the device has crashed. The monitoring of device network communication is not a single step; it runs during the entire fuzzing process. In case of a timeout, Snipuzz resends the same message sequence three times, as the cause of the timeout could be network fluctuations instead of a device crash. If all three attempts time out, Snipuzz uses the control command to physically restart the device and sends the same sequence of messages to the device again. If the device still does not return a response in time, Snipuzz records the crash and the corresponding message sequence.
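The timeout handling just described amounts to a small decision procedure, sketched below. The helper names are hypothetical: `send_sequence` is assumed to return True when the device answers before the timeout, and `restart_device` stands in for the physical restart via the control command. How Snipuzz actually measures timeouts is not specified in this excerpt.

```python
from typing import Callable, List

Sequence = List[bytes]


def is_crash(send_sequence: Callable[[Sequence], bool],
             restart_device: Callable[[], None],
             sequence: Sequence,
             retries: int = 3) -> bool:
    """Return True when `sequence` should be recorded as crashing the device.

    `send_sequence` is a hypothetical helper that returns True if the device
    answered before the timeout and False otherwise.
    """
    if send_sequence(sequence):
        return False
    for _ in range(retries):
        if send_sequence(sequence):
            return False  # the earlier timeout was likely a network fluctuation
    restart_device()
    # Only a timeout that persists after a restart is recorded as a crash.
    return not send_sequence(sequence)
```

Retrying before restarting keeps transient network loss from being reported as a crash, at the cost of a few extra sends per timeout.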
IoT Devices under test. We have selected 20 popular consumer IoT devices from both online and offline markets worldwide, covering various well-known brands, such as Philips, Xiaomi, TP-Link, and Netgear. The types of selected IoT devices include smart plugs, smart bulbs, routers, home bridges, IP cameras, fingerprint terminals, etc. These devices are either recommended items on Amazon or best-selling products that can be bought in supermarkets. Table 1 details the information of the IoT devices under test.
Benchmark tools. In order to verify Snipuzz’s performance in finding crashes and message segmentation, we used seven different fuzzing schemes as benchmarks.
• IoTFuzzer [9]. The core idea of IoTFuzzer is to find the functions that send control commands to the IoT device by static analysis of companion apps, and to mutate the value of specific variables to perform fuzz testing without breaking the message format. Note that our implementation of IoTFuzzer is a best-effort replication since their code is not publicly available, and we acknowledge that this could produce slightly different results with respect to the original version.
  We implement IoTFuzzer by replacing the mutation algorithm in the Snipuzz framework with the mutation strategies of IoTFuzzer. Considering that the purpose of companion app analysis in IoTFuzzer is to ensure that only the data domain in the communication message is mutated, to make the benchmark as fair as possible, we use the same seeds as the ones used in Snipuzz and manually segment the data domain of each seed message before feeding it to IoTFuzzer. We believe that such manual segmentation is sufficient to provide an upper bound on the performance of IoTFuzzer. Note that we remove the methods that are related to the feedback mechanism and snippet segmentation because these methods are not used in IoTFuzzer.
• Nemesys [22]. Nemesys is a protocol reverse engineering tool for network message analysis. It utilizes the distribution of value changes in a single message to infer the boundaries of each data domain. Considering that Nemesys is a protocol inference method instead of an off-the-shelf fuzzing tool, we implement the method of Nemesys based on the Snipuzz framework to infer the snippet boundary, replacing the corresponding snippet determination method (Section 4.2).
• BooFuzz [31]. As a successor of Sulley [19], BooFuzz is an excellent network protocol fuzzer that has been involved in several recent fuzzing research works [9, 37, 48]. Different from other automatic fuzzers, BooFuzz requires human-guided message segmentation strategies as inputs. In our research, we leverage this property and manually define more fuzzing strategies to enrich the benchmark evaluation.
  – BooFuzz-Default. In this strategy, we set each message in the input as a complete string, that is, BooFuzz will use the message as a string for mutation testing.
  – BooFuzz-Byte. Each byte of the message in the input will be used for a mutation test individually.
  – BooFuzz-Reversal. Contrary to the idea of IoTFuzzer, in this strategy, we focus on the mutation of the non-data domain in the message, while keeping the data domain unchanged.
• Doona [44]. Doona is a fork of the Bruteforce Exploit Detector (BED) [6], which is designed to detect potential vulnerabilities related to buffers and formats in network protocols. Different from other tools, Doona does not take network communication packets as seeds. The test cases of Doona are required to be pre-defined for each device or protocol under test.
• Snipuzz-NoSnippet. Snipuzz uses the segmentation of message snippets to enhance the efficiency of fuzzing and the ability to find crashes. In order to verify whether the snippet determination indeed benefits fuzzing, we implement Snipuzz-NoSnippet based on Snipuzz. Snipuzz-NoSnippet does not have the component of snippet determination, and blindly mutates bytes in messages without the knowledge of responses.
Except for Doona, whose test cases are preset, all benchmark tools and Snipuzz are tested on the same input sets. These input sets may be in different formats (e.g., BooFuzz requires the input to be set manually, and Nemesys requires the input to be in the pcap file format), but the content is the same.
There are many other popular fuzzing tools which are able to test IoT devices via network communication, such as Peach [30] and AFLNET [33]. However, since they are grey-box fuzzers that require instrumenting the firmware, it is infeasible and unfair to regard those tools as baselines for black-box schemes.
5.2 Vulnerability Identification
5.2.1 Snipuzz. After performing fuzz testing using Snipuzz on each of the 20 IoT devices for 24 hours, we detected 13 crashes in 5 devices. As shown in Table 3, the detected crashes include 7 null pointer dereferences, 1 denial of service, and 5 unknown crashes that we further manually verified. The 13 crashes found by Snipuzz are triggered by providing malformed inputs. These malformed inputs break the message format in different ways. For example, deleting placeholders, emptying the data domain or fortunately changing the type of data value.
Note that all the crashes identified by Snipuzz are in JSON-based devices, although we successfully conducted experiments on the 20 IoT devices with various communication formats, such as JSON, SOAP, and K-V pairs. The experiments also show that Snipuzz observes a higher number of response categories compared to the other fuzzers (as detailed in Section 5.3).
Null pointer dereferences. As shown in Table 3, the 7 crashes triggered by Snipuzz in TP-Link HS110 and HS100 are all caused by null pointer dereferences. After the test cases were sent to HS110 and HS100, the devices crashed and were unable to reply to any interaction. However, after a few minutes, the devices automatically restarted and recovered to the initial state. Based on the analysis of test cases, we found that the vulnerabilities are all triggered by messages whose JSON syntax was mutated. Put differently, when some important placeholders, such as curly braces and colons, or a part of the test message are mutated, the syntax structure and the semantic meaning of the message are broken. If the device cannot handle the mutated input message properly, the device will crash. We reported the vulnerabilities to the device vendor, TP-Link, via email on June 13, 2020. They have confirmed the vulnerability and promised to fix it through a firmware update.
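As a minimal illustration of the malformed inputs described above (deleting placeholders, emptying the data domain, or changing the type of a data value), the following sketch applies each mutation to a JSON message. The seed message, its field names, and all helper functions are hypothetical and chosen only for illustration; they are not taken from any tested device or from the Snipuzz code base.

```python
import json

# Hypothetical seed message; the field names are illustrative only.
SEED = '{"cmd":"set","params":{"level":1}}'


def delete_placeholder(msg: str, index: int) -> str:
    """Remove the structural character (brace, colon, quote) at `index`."""
    return msg[:index] + msg[index + 1:]


def empty_data_domain(msg: str, value: str) -> str:
    """Replace the first occurrence of a data value with nothing."""
    return msg.replace(value, "", 1)


def change_value_type(msg: str, value: str, new_value: str) -> str:
    """Swap a data value for one of a different type (e.g. int -> string)."""
    return msg.replace(value, new_value, 1)


def is_valid_json(msg: str) -> bool:
    """Report whether a standard JSON parser still accepts the message."""
    try:
        json.loads(msg)
        return True
    except json.JSONDecodeError:
        return False


mutants = {
    "delete placeholder": delete_placeholder(SEED, SEED.index(":")),
    "empty data domain": empty_data_domain(SEED, "1"),
    "change value type": change_value_type(SEED, "1", '"on"'),
}
for name, mutant in mutants.items():
    print(f"{name:20s} valid JSON: {is_valid_json(mutant)}  {mutant}")
```

The first two mutants no longer parse as JSON, while the third stays syntactically valid but carries a value of an unexpected type; a device parser has to handle both situations gracefully to avoid crashes of the kind reported here.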
Snipuzz: Black-box Fuzzing of IoT Firmware via
Message Snippet Inference CCS 2021, 14 - 21 November, 2021, Seoul, South Korea
Table 3: Experiment results. Snipuzz discovers the largest number of response categories and exposes the largest number of bugs.
# | Device | Snipuzz (T C 10/24) | IoTFuzzer (T C 10/24) | Doona (C 10/24) | BooFuzz-Default (C 10/24) | BooFuzz-Byte (C 10/24) | BooFuzz-Reversal (C 10/24) | Nemesys (C 10/24) | Snipuzz-NoSnippet (C 10/24)
1 | YLDP05YL | UC 3∗ 46/71 | UC 1∗ 31/33 | NA NA/NA | 0 11/17 | 0 11/41 | 0 11/22 | 0 26/61 | 0 21/69
2 | YLDP13YL | UC 2∗ 35/76 | UC 1∗ 20/24 | NA NA/NA | 0 8/18 | 0 8/42 | 0 8/22 | 0 18/62 | 0 22/70
3 | A60 | DoS 1 28/41 | / 0 18/22 | 0 5/16 | 0 7/13 | 0 8/33 | 0 5/21 | 0 22/36 | 0 20/39
4 | Mini C | / 0 46/72 | / 0 18/31 | 0 7/15 | 0 5/11 | 0 6/31 | 0 5/21 | 0 18/68 | 0 18/70
5 | BR30 | / 0 28/51 | / 0 8/19 | NA NA/NA | 0 4/11 | 0 4/31 | 0 4/20 | 0 13/40 | 0 13/48
6 | Hue | / 0 65/110 | / 0 29/36 | 0 4/11 | 0 7/11 | 0 9/31 | 0 7/25 | 0 34/110 | 0 22/99
7 | Base Station | / 0 34/51 | / 0 29/33 | 0 7/16 | 0 6/9 | 0 9/17 | 0 7/13 | 0 19/38 | 0 23/50
8 | HS100 | NPD 3 24/64 | / 0 20/27 | NA NA/NA | 0 6/13 | 0 6/31 | 0 6/22 | 0 20/64 | 0 19/71
9 | HS110 | NPD 4 24/79 | / 0 17/22 | NA NA/NA | 0 6/14 | 0 9/33 | 0 6/22 | 0 20/62 | 0 19/78
10 | F7C027au | / 0 13/21 | / 0 7/10 | 0 6/14 | 0 8/12 | 0 6/18 | 0 6/15 | 0 8/14 | 0 12/21
11 | MSS310 | / 0 42/61 | / 0 15/17 | 0 8/16 | 0 5/11 | 0 8/45 | 0 8/21 | 0 30/59 | 0 20/61
12 | B25AUS | / 0 19/42 | / 0 8/13 | 0 7/19 | 0 7/14 | 0 11/17 | 0 7/11 | 0 16/36 | 0 9/41
13 | Mini US | / 0 25/61 | / 0 8/41 | NA NA/NA | 0 7/16 | 0 7/35 | 0 7/22 | 0 9/55 | 0 8/49
14 | SP4L-AU | / 0 37/43 | / 0 18/32 | 0 5/11 | 0 5/17 | 0 7/32 | 0 5/23 | 0 23/40 | 0 17/40
15 | R6400 | / 0 11/37 | / 0 20/24 | 0 4/13 | 0 3/12 | 0 4/24 | 0 4/18 | 0 6/30 | 0 6/41
16 | WL100 | / 0 53/81 | / 0 38/44 | NA NA/NA | 0 8/16 | 0 8/46 | 0 8/27 | 0 41/70 | 0 29/76
17 | Alro Pro 2 | / 0 25/36 | / 0 16/22 | 0 10/14 | 0 8/13 | 0 14/22 | 0 10/17 | 0 18/22 | 0 13/41
18 | F19821W | / 0 39/75 | / 0 36/33 | 0 7/13 | 0 5/11 | 0 7/23 | 0 7/14 | 0 27/65 | 0 21/76
19 | T-131P | / 0 36/80 | / 0 9/22 | 0 7/16 | 0 7/20 | 0 9/42 | 0 7/35 | 0 21/65 | 0 20/91
20 | RM mini 3 | / 0 14/36 | / 0 9/30 | NA NA/NA | 0 10/17 | 0 14/31 | 0 10/23 | 0 6/30 | 0 5/35
UC: Unknown crash. NPD: Null pointer dereference. DoS: Denial of service. T: Vulnerability type. C: Number of crashes. 10/24: Number of response categories (10 minutes/24 hours).
∗ : Remotely exploitable. NA: Since Doona is only applicable to some network protocols, devices that cannot be tested are represented by ‘NA’.
Table 4: Mutated messages of Snipuzz & IoTFuzzer.
Contents of mutated messages | Generated by
{"{"id": 0, "method": "start_cf", "params": ["4, 4, "1000, 2, 2700,100,500 ,1,255,10,5000,7,0,0,500,2,5000,1"]}" | Original Message
{"{"id": 0, "method": "start_cf", "params": ["4, , "1000, 2, 2700,100,500 ,1,255,10,5000,7,0,0,500,2,5000,1"]}" | Snipuzz
{"{"id": 0, "method": "start_cf", "params": [", 4, "1000, 2, 270000,100,500 ,1,255,10,5000,7,0,0,500,2,5000,1"]}" | IoTFuzzer
Denial of service. Another interesting finding is the denial of service vulnerability detected in the Philips A60 smart bulb. After the bulb was tested by Snipuzz for 24 hours, Philips’ official companion app could not manage the device normally. Specifically, the device cannot be found in the app, and if any further messages are sent through the app, the app keeps asking to bind the device to a device group and no further interaction is available. However, we observe that if the message packet is sent directly to the device, the device works normally. This indicates that the device does not completely crash but its service via the companion app is denied.
Unknown crashes. Snipuzz found 5 crashes on two Yeelight bulbs, YLDP05YL and YLDP13YL. The devices crashed and restarted by themselves within roughly one minute. By analyzing the test cases, we found that the crashes are due to the deletion of certain data domains, such as the nullification of parameters, marked as red in Table 4. As the firmware of the 2 devices is not publicly available, the root cause of the vulnerability cannot be determined; however, we can still deduce that the vulnerability is due to the device reading in null values during the parsing process, causing a crash during the assignment. We also find that communication over a local network does not require any authentication, which means that the device can be crashed by any attacker in the local network. Therefore, we consider the vulnerabilities ‘remotely exploitable’.
5.2.2 Benchmark with state-of-the-art tools. As shown in Table 3, after 24 hours of fuzz testing on each device, none of the benchmark tools found a crash except for IoTFuzzer. The other tools failed to find crashes for various reasons. Doona focuses more on the mutation of communication protocols. Further, Doona cannot be applied to all devices, which also limits its capacity. Since BooFuzz directly replaces the specified positions in the message with a preset string, it can only trigger limited types of vulnerabilities. Nemesys offers a new idea of determining message snippets. However, since it determines message snippets by the distribution of values in messages, it is difficult for Nemesys to accurately decide the boundary between data and non-data domains. Therefore, Nemesys can hardly detect vulnerabilities that can only be triggered by mutating the data or non-data domains. Snipuzz-NoSnippet, which does not apply the snippet-based mutation method used in Snipuzz, is similar to the classic fuzzer AFL [24]. Since Snipuzz-NoSnippet does not infer the structure of the message but directly uses single or multiple consecutive bytes as the unit of mutation, most of the test cases generated by Snipuzz-NoSnippet destroy the structure of the messages. Such a method struggles on devices that require highly-structured inputs.
IoTFuzzer detected 2 crashes in 2 smart bulb devices, i.e., the YLDP05YL and YLDP13YL. Due to the mutation strategy of IoTFuzzer, the malformed input provided by IoTFuzzer is obtained by emptying the data domain. According to the mutated messages listed in Table 4, we can see that the messages mutated by IoTFuzzer resemble the ones generated by Snipuzz. The mutated domains of messages from Snipuzz and IoTFuzzer in Table 4 are all in the data domain. In terms of the effect of the mutation test, Snipuzz and IoTFuzzer achieve the same goal on these two messages. However, Snipuzz can cover the mutation space of IoTFuzzer because IoTFuzzer only focuses on the data domain mutation while Snipuzz can mutate both the data and non-data domains.
To further determine the root cause of the crash, we obtained the firmware source code of HS100 and HS110, two typical consumer-grade smart plugs manufactured by TP-Link, and conducted a case study which reflected the differences between Snipuzz and IoTFuzzer. We found that one of the crashes triggered by Snipuzz on the two devices is caused by breaking the syntax structure and mutating both the data and non-data domains. More specifically, the mutated messages successfully bypassed the sanitizer and triggered the crash during function execution. We deduce
CCS 2021, 14 - 21 November, 2021, Seoul, South Korea X. Feng, R. Sun, X. Zhu, M. Xue, S. Wen, D. Liu, S. Nepal, and Y. Xiang
Table 5: Inference results of Snipuzz and Nemesys.
Method | Ave. Similarity | Example
Snipuzz | 87.1% | {" on ":true," sta ":140," bri ":254}
Nemesys | 64.5% | {"on": true ,"sta": 140 ,"bri" 254}
Grammar | 100.0% | {" on ": true ," sta ": 140 ," bri ": 254 }
carefully checked the initial input message sequences and found that the average length of the message exceeds 400 bytes, forcing Snipuzz to generate and send a large number of probe messages to determine message snippets. Therefore, in the first 10 minutes, Snipuzz was still exploring the response category of the first few messages, so it did not exceed IoTFuzzer.
5.4 Assessment on Message Snippet Inference
Among all strategies, Snipuzz and Nemesys utilize semantic segmentation. To assess their performance on message snippet inference, we compare the snippets they produce during the fuzzing process with the grammar rules defined in API documents. Specifically, for some mature and popular languages, such as JSON, we establish the grammar rules as per their standard syntax; for custom formats, such as strings or custom bytes, we refer to the official API documents and define the grammar rules based on the instructions.
Equation (2) quantifies the quality of snippet inference, and Similarity indicates the percentage of correctly categorized bytes in a snippet-determined message, 𝑚, compared with the ground truth, 𝑔, manually extracted from the grammar rules.
$$\mathit{Similarity}(m) = 1 - \frac{\mathit{count}[\mathit{cate}(m) \oplus \mathit{cate}(g)]}{\mathit{len}(m)}, \tag{2}$$
where 𝑐𝑎𝑡𝑒() returns the category of each message byte as a series of “0” and “1” bits, 𝑐𝑜𝑢𝑛𝑡() counts the number of mis-categorized bytes, and 𝑙𝑒𝑛() represents the length of a message. Note that in a ground truth message, “0” indicates the non-data domain (marked blue in Table 5), while “1” indicates the data domain (marked red in Table 5). Therefore, ⊕ is the bitwise 𝑋𝑂𝑅 operation.
In addition, following Equation (2), we compute the average similarity of the snippets (or data domains) determined by Snipuzz and Nemesys for all the 235 messages obtained from experiments. Note that during the calculation of the average similarity, for each message, if there are multiple snippet sets determined, we select the snippet inference with the highest similarity value, so that a snippet reflects as many grammatical rules as possible and maximizes the performance of message semantic segmentation.
The average similarity result of Snipuzz, 87.1%, indicates that, by applying snippet inference based on the hierarchical clustering approach, Snipuzz can effectively find the grammatical rules hidden in the message. Ideally, in Snipuzz, the merging of clusters removes the influence caused by the randomness in responses and by the replying message mechanism itself. Therefore, the message snippets will conform to the grammatical rules gradually, which leads Snipuzz to a higher similarity result.
However, we also found some differences between the snippet inference method and the grammatical rules in some results. For example, given the example shown in Table 5, the snippet inference method combines the strings belonging to the data domain in the grammatical rules (i.e., ‘true’, ‘140’ and ‘254’) with some placeholders (such as double quotes and curly brackets). After analyzing the response messages, we found that the responses obtained after destroying these data domains and destroying placeholders are all about invalid format. This may be due to the fact that in the firmware, when an error occurs in parsing the format, the response does not report a detailed description of the error but instead returns a general format error.
On the other hand, Nemesys uses the distribution of value changes in the protocol to determine the boundary of different data domains, and to achieve the semantic segmentation of a message. The advantage of this method is that it does not require any additional information beyond the message itself, such as grammar rules or a large number of training data sets.
The average similarity result of Nemesys, 64.5%, is lower than the Snipuzz result. Given the example shown in Table 5, when segmenting messages in a format with restricted syntax, such as JSON and XML, Nemesys can achieve a good semantic segmentation performance, because the placeholders usually use symbols rarely used in data domains. This distribution of byte values enables Nemesys to effectively find the boundaries between data domains. However, in IoT devices, customized formats are prevalent. For example, the smart bulb BR30 uses custom bytes as a means of communication, where each byte corresponds to a special meaning (i.e., "0x61" represents "CHANGE_MODE" and "0x0f" represents "TRUE"). In such cases, the value distribution of characters can no longer be used as guidance for the data domain determination, and thus the message segmentation determined by Nemesys is error-prone.
6 DISCUSSION AND LIMITATIONS
Snipuzz has successfully examined 20 different devices and exposed security vulnerabilities on five of them. However, there are still some limitations relevant to the efficiency and scalability of Snipuzz. We discuss the limitations in this section and propose solutions as future work.
Scalability and manual effort. IoT devices can be tested by Snipuzz if the valid network packets are known. In our prototype, we capture communication packets by running API programs and monitoring network communication (note that packets can also be obtained by statically analyzing API programs without running them). In the absence of API programs or documents, we can recover the message formats from the official apps of IoT devices through decompilation and taint analysis. As a second way, we can solve this problem by intercepting the communication between apps and IoT devices, and then recovering message formats from the captured packets. The second way is feasible and we have experimented with it on TP-Link’s IoT control app KASA; it can be further developed for more IoT devices. However, both methods could introduce overhead and involve manual effort.
Recall from Section 4.1 that Snipuzz requires manual effort, which takes 5 man-hours per device to collect the initial seeds during the message sequence acquisition phase. The manual effort mainly refers to cleaning the packets from the API programs that are obtained from publicly available first- and third-party resources. To mitigate this limitation when applying Snipuzz to IoT devices, techniques such as crawlers could be used to automatically gather API
programs associated with the IoT devices in the future work. Moreover, the process of cleaning the packets could also be improved by pre-processing keywords through scripts to achieve automatic collection of communication packets.
Threats to validity. As Snipuzz collects initial message sequences via API programs and network sniffers, the first threat comes from the absence of API programs. In this case, we can recover message formats based on the companion apps of IoT devices (similar to IoTFuzzer) but may need more manual effort. Second, the encryption in messages decreases the effectiveness of snippet determination because the semantic information could be corrupted. A potential solution to the encryption issue is to integrate decryption modules into Snipuzz. Finally, the code coverage of firmware could be subject to the accessibility of API programs, since Snipuzz can only examine the functionalities that are covered in API programs. Recombining the message snippets from different seeds to generate new valid inputs could mitigate this limitation.
Encryption. During Message Acquisition, we noticed that encryption is used to protect communication in some API programs. Encryption has no effect on the message sequence mutation process, but the snippet determination process basically fails. This is because the encryption algorithm disrupts the original format of the message, while the segmentation of snippets is sensitive to the position of each character. Moreover, because the response messages from the device are also encrypted, Snipuzz cannot get useful feedback from them. The encryption and decryption algorithms in the API program can be integrated into the Snipuzz module to address this limitation, or the difficulties caused by encryption can be addressed from the perspective of mutation strategy design.
Coverage. The code coverage of firmware explored by Snipuzz depends on the API programs. For example, if the API programs of a bulb only support the functionality of turning on the power, it is almost impossible to explore the functionality of adjusting the brightness by mutating the messages captured while the power is turned on. In future work, without the support of grammar, we will consider recombining the message snippets to try to generate new valid inputs. This method can help explore more firmware execution coverage in addition to the original inputs provided.
Requirements on detailed responses. The detection effectiveness of Snipuzz depends on the quality of message snippets, which is contingent on how much information can be obtained from the responses of IoT devices. Put differently, if the IoT device does not provide responses that are detailed enough, for example reporting all the errors with a uniform message, it could be hard for Snipuzz to determine the message snippets. Fortunately, in many IoT devices, advanced error descriptions can be obtained in debug mode, which will significantly improve the determination process of message snippets in Snipuzz.
7 RELATED WORK
Snipuzz detects vulnerabilities in IoT devices in a black-box manner. Unlike existing black-box fuzzing for IoT devices, which blindly mutates messages, Snipuzz optimizes the mutation process of black-box fuzzing by utilizing responses. This feedback mechanism improves the effectiveness of bug discovery. For instance, IoTFuzzer [9] obtains the data domain, on which IoTFuzzer performs blind mutation. Thus, IoTFuzzer lacks the knowledge of the quality of the generated inputs, resulting in a waste of resources on low-quality inputs. There are also several dynamic analysis approaches focusing on the networking modules of IoT devices. For example, SPFuzz defines a new language for describing protocol specifications, protocol state transitions, and their correlations [37]. SPFuzz can ensure the correctness of the message format in the conversation state and the dependence of the protocol. IoTHunter is a grey-box approach to fuzz the state protocol of IoT firmware [47]. IoTHunter can constantly switch the protocol state to perform a feedback-based exploration of IoT devices. In a recent example, AFLnet acts as a client and continuously replays variations of the original message sequence sent to the target (i.e., server or device) [33]. AFLnet uses response codes, which are the numbers indicating the execution states, to identify the execution status of targets and explore more regions of their networking modules.
Another research line for dynamic analysis of IoT devices is the usage of emulators. The disadvantages of emulation are the heavy engineering effort and the need for firmware, although the emulation of IoT firmware can analyze more thoroughly than black-box fuzzing. Two major challenges for emulation of IoT firmware are scalability and throughput. Therefore, the efforts in improving the performance of emulation include full-system emulation [8, 27], improvement of emulation success rates [21], hardware-independent emulation [17, 38], and combination of user- and system-mode emulation [51]. Based on the emulation, fuzzing can be integrated into those frameworks and can hunt defects in firmware [38, 51].
Static analysis of firmware is the complementary approach to dynamic analysis. Semantic similarity is one of the major techniques that make static analysis successful. Researchers analyze semantic similarity via comparison of files and modules [13], Control Flow Graphs (CFGs) [14], parser and complex processing logic [11], and multi-binary interactions [35]. There are also many similarity-based approaches that can detect vulnerabilities across different firmware architectures. They usually extract various architecture-independent features from firmware for each node in a CFG to represent a function, and then check whether two functions’ CFG representations are similar [15, 32].
8 CONCLUSION
In this paper we have presented Snipuzz, a black-box fuzzing framework designed for detecting vulnerabilities hiding in IoT devices. Different from other black-box network fuzz testing, Snipuzz uses the response messages returned by the device to establish a feedback mechanism for guiding the fuzzing mutation process. In addition, Snipuzz infers the grammatical role of each byte in the messages based on the responses from the device, so that Snipuzz can generate test cases that meet the device’s grammar without the guidance of grammatical rules. We have used 20 consumer-grade IoT devices from the market to test Snipuzz, and it has successfully found 5 zero-day vulnerabilities on 5 different devices.
REFERENCES
[1] 2020. The Three Software Stacks Required for IoT Architectures. IoT Eclipse (White Paper) (2020).
[2] Cornelius Aschermann, Tommaso Frassetto, Thorsten Holz, Patrick Jauernig, Ahmad-Reza Sadeghi, and Daniel Teuchert. 2019. NAUTILUS: Fishing for deep bugs with grammars. In The Network and Distributed System Security Symposium (NDSS).
[3] I. Ashraf, X. Ma, B. Jiang, and W. K. Chan. 2020. GasFuzzer: Fuzzing ethereum smart contract binaries to expose gas-oriented exception security vulnerabilities. IEEE Access (2020).
[4] Marcel Böhme, Van-Thuan Pham, Manh-Dung Nguyen, and Abhik Roychoudhury. 2017. Directed greybox fuzzing. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security.
[5] Marcel Böhme, Van-Thuan Pham, and Abhik Roychoudhury. 2016. Coverage-based greybox fuzzing as markov chain. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security.
[6] Kali Bot. 2019. bed. https://gitlab.com/kalilinux/packages/bed.
[7] Z. Berkay Celik, Patrick McDaniel, and Gang Tan. 2018. Soteria: Automated IoT safety and security analysis. In 2018 USENIX Annual Technical Conference (USENIX ATC 18).
[8] Daming D Chen, Maverick Woo, David Brumley, and Manuel Egele. 2016. Towards automated dynamic analysis for Linux-based embedded firmware. In The Network and Distributed System Security Symposium (NDSS).
[9] Jiongyi Chen, Wenrui Diao, Qingchuan Zhao, Chaoshun Zuo, Zhiqiang Lin, XiaoFeng Wang, Wing Cheong Lau, Menghan Sun, Ronghai Yang, and Kehuan Zhang. 2018. IOTFUZZER: Discovering memory corruptions in IoT through app-based fuzzing. In The Network and Distributed System Security Symposium (NDSS).
[10] Abraham Clements, Eric Gustafson, Tobias Scharnowski, Paul Grosen, David Fritz, Christopher Kruegel, Giovanni Vigna, Saurabh Bagchi, and Mathias Payer. 2020. HALucinator: Firmware re-hosting through abstraction layer emulation. In Proceedings of the 29th USENIX Security Symposium (USENIX ’20).
[11] Lucian Cojocar, Jonas Zaddach, Roel Verdult, Herbert Bos, Aurélien Francillon, and Davide Balzarotti. 2015. PIE: Parser identification in embedded systems.
[12] Jake Corina, Aravind Machiry, Christopher Salls, Yan Shoshitaishvili, Shuang Hao, Christopher Kruegel, and Giovanni Vigna. 2017. Difuze: Interface aware fuzzing for kernel drivers. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security.
[13] Andrei Costin, Jonas Zaddach, Aurélien Francillon, and Davide Balzarotti. 2014. A Large-Scale Analysis of the Security of Embedded Firmwares. In 23rd USENIX Security Symposium (USENIX Security 14).
[14] Thomas Dullien and Rolf Rolles. 2005. Graph-based comparison of executable objects (english version). Journal of Computer Virology and Hacking Techniques (2005).
[15] Sebastian Eschweiler, Khaled Yakdan, and Elmar Gerhards-Padilla. 2016. discovRE: Efficient cross-architecture identification of bugs in binary code. In Network and Distributed Systems Security (NDSS).
[16] Pwnie Express. 2020. What makes IoT so vulnerable to attack? Technical Report. Outpost24.
[17] Bo Feng, Alejandro Mera, and Long Lu. 2020. P2IM: Scalable and hardware-independent firmware testing via automatic peripheral interface modeling. In 29th USENIX Security Symposium (USENIX Security 20).
[18] Qian Feng, Rundong Zhou, Chengcheng Xu, Yao Cheng, Brian Testa, and Heng Yin. 2016. Scalable graph-based bug search for firmware images. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security.
[19] Fitblip. 2019. Sulley. https://github.com/OpenRCE/sulley.
[20] Eric Gustafson, Marius Muench, Chad Spensky, Nilo Redini, Aravind Machiry, Yanick Fratantonio, Davide Balzarotti, Aurélien Francillon, Yung Ryn Choe, Christophe Kruegel, et al. 2019. Toward the analysis of embedded firmware
embedded devices. In NDSS 2018, Network and Distributed Systems Security Symposium.
[28] Lindsey O’Donnell. 2020. More than half of IoT devices vulnerable to severe attacks. Technical Report. ThreatPost.
[29] Sebastian Österlund, Kaveh Razavi, Herbert Bos, and Cristiano Giuffrida. 2020. ParmeSan: Sanitizer-guided greybox fuzzing. In 29th USENIX Security Symposium (USENIX Security 20).
[30] Peachtech. 2021. PEACH: The PEACH fuzzer platform. https://www.peach.tech/products/peach-fuzzer/ Accessed: 2021-01.
[31] Joshua Pereyda. 2017. boofuzz: Network protocol fuzzing for humans. https://boofuzz.readthedocs.io/en/stable/.
[32] Jannik Pewny, Behrad Garmany, Robert Gawlik, Christian Rossow, and Thorsten Holz. 2015. Cross-architecture bug search in binary executables. In 2015 IEEE Symposium on Security and Privacy (SP).
[33] Van-Thuan Pham, Marcel Böhme, and Abhik Roychoudhury. 2020. AFLNET: A greybox fuzzer for network protocols. In IEEE International Conference on Software Testing, Verification and Validation (ICST) 2020.
[34] Van-Thuan Pham, Marcel Böhme, Andrew Edward Santosa, Alexandru Razvan Caciulescu, and Abhik Roychoudhury. 2019. Smart greybox fuzzing. IEEE Transactions on Software Engineering (2019).
[35] Nilo Redini, Aravind Machiry, Ruoyu Wang, Chad Spensky, Andrea Continella, Yan Shoshitaishvili, Christopher Kruegel, and Giovanni Vigna. 2020. Karonte: Detecting insecure multi-binary interactions in embedded firmware. In 2020 IEEE Symposium on Security and Privacy (SP).
[36] Sergej Schumilo, Cornelius Aschermann, Robert Gawlik, Sebastian Schinzel, and Thorsten Holz. 2017. kAFL: Hardware-assisted feedback fuzzing for OS Kernels. In 26th USENIX Security Symposium (USENIX Security 17).
[37] Congxi Song, Bo Yu, Xu Zhou, and Qiang Yang. 2019. SPFuzz: a hierarchical scheduling framework for stateful network protocol fuzzing. IEEE Access (2019).
[38] Prashast Srivastava, Hui Peng, Jiahao Li, Hamed Okhravi, Howard Shrobe, and Mathias Payer. 2019. FirmFuzz: automated IoT firmware introspection and analysis. In Proceedings of the 2nd International ACM Workshop on Security and Privacy for the Internet-of-Things.
[39] Liam Tung. 2017. IoT devices will outnumber the world’s population this year for the first time. Technical Report. ZDNet.
[40] Junjie Wang, Bihuan Chen, Lei Wei, and Yang Liu. 2017. Skyfire: Data-driven seed generation for fuzzing. In 2017 IEEE Symposium on Security and Privacy (SP).
[41] Yanhao Wang, Xiangkun Jia, Yuwei Liu, Kyle Zeng, Tiffany Bao, Dinghao Wu, and Purui Su. 2020. Not all coverage measurements are equal: Fuzzing by coverage accounting for input prioritization. In The Network and Distributed System Security Symposium (NDSS).
[42] Wikipedia. 2021. Edit distance. https://en.wikipedia.org/wiki/Edit_distance.
[43] Wikipedia. 2021. Hierarchical clustering. https://en.wikipedia.org/wiki/Hierarchical_clustering.
[44] wireghoul. 2019. Doona. https://github.com/wireghoul/doona.
[45] wireshark. 2020. About wireshark. https://www.wireshark.org/about.html.
[46] Xiaojun Xu, Chang Liu, Qian Feng, Heng Yin, Le Song, and Dawn Song. 2017. Neural network-based graph embedding for cross-platform binary code similarity detection. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security.
[47] Bo Yu, Pengfei Wang, Tai Yue, and Yong Tang. 2019. Poster: Fuzzing iot firmware via multi-stage message generation. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security.
[48] Y. Yu, Z. Chen, S. Gan, and X. Wang. 2020. SGPFuzzer: A state-driven smart graybox protocol fuzzer for network protocol implementations. IEEE Access (2020).
[49] Tai Yue, Pengfei Wang, Yong Tang, Enze Wang, Bo Yu, Kai Lu, and Xu Zhou. 2020. EcoFuzz: Adaptive energy-saving greybox fuzzing as a variant of the adversarial
through automated re-hosting. In 22nd International Symposium on Research in multi-armed bandit. In 29th USENIX Security Symposium (USENIX Security 20).
Attacks, Intrusions and Defenses ( {RAID } 2019). [50] Jonas Zaddach, Luca Bruno, Aurelien Francillon, Davide Balzarotti, et al. 2014.
[21] Mingeun Kim, Dongkwan Kim, Eunsoo Kim, Suryeon Kim, Yeongjin Jang, and AVATAR: A framework to support dynamic security analysis of embedded sys-
Yongdae Kim. 2020. FirmAE: Towards large-scale emulation of IoT firmware for tems’ firmwares. In The Network and Distributed System Security Symposium
dynamic analysis. In Annual Computer Security Applications Conference. (NDSS).
[22] Stephan Kleber, Henning Kopp, and Frank Kargl. 2018. NEMESYS: Network [51] Yaowen Zheng, Ali Davanian, Heng Yin, Chengyu Song, Hongsong Zhu, and
message syntax reverse engineering by analysis of the intrinsic structure of indi- Limin Sun. 2019. FIRM-AFL: High-throughput greybox fuzzing of IoT firmware
vidual messages. In 12th {USENIX } Workshop on Offensive Technologies ( {WOOT } via augmented process emulation. In 28th USENIX Security Symposium (USENIX
18). Security 19).
[23] Karla Lant. 2017. By 2020, there will be 4 devices for every human on earth.
Futurism (2017).
[24] lcamtuf. 2017. AFL. https://lcamtuf.coredump.cx/afl/. APPENDIX
[25] Trend Micro. 2020. Mirai botnet exploit weaponized to attack IoT devices via
CVE-2020-5902. Technical Report. Security Intelligence Blog. A RUNTIME PERFORMANCE
[26] Trend Micro. 2020. Smart yet flawed: IoT device vulnerabilities explained. Technical
Report. Security News. Fig 6 shows the run-time performance of Snipuzz and other seven
[27] Marius Muench, Jan Stijohann, Frank Kargl, Aurélien Francillon, and Davide baselines during the first 10 minutes. In most benchmarks, Snipuzz
Balzarotti. 2018. What you corrupt is not what you crash: Challenges in fuzzing
discovers the most number of categories. Since Snipuzz spends
CCS 2021, 14 - 21 November, 2021, Seoul, South Korea X. Feng, R. Sun, X. Zhu, M. Xue, S. Wen, D. Liu, S. Nepal, and Y. Xiang
Figure 6: Runtime performance. The number of categories discovered in 10 minutes on all the 20 IoT devices. Snipuzz performs
the best on 19 devices.
firmware, using cJSON (footnote 2), a popular open-source lightweight JSON parser (5.4k stars on GitHub), to interpret input message fragments. The jalr instruction will save the result of cJson_GetObjectItem in $t9 and jump to this address unconditionally (see line 3 in Figure 8), which means the firmware will pick the value corresponding to ‘schedule’. In the original message, the value corresponding to ‘schedule’ is a JSON object headed by ‘edit_rule’ (from line 4 to line 16). Note that the aforementioned snippet-based mutation strategy implemented in Snipuzz is able to break the syntax structure and mutate both data and non-data domains. Interestingly, although removing two left curly braces breaks the JSON syntax, the cJSON parser does not detect this, so the mutated message bypasses the syntax validation and enters the functional code in the firmware. When the firmware tries to access the successor JSON object in ‘schedule’, i.e., the object that starts with ‘edit_rule’, a null pointer exception is triggered because the corresponding value is no longer a JSON object but an array.

Due to the design of IoTFuzzer, fuzzing based on grammatical rules prioritizes satisfying the grammar requirements during mutation so that test cases are not rejected by the firmware grammar detector. The advantage is that each test case can reach the functional execution part of the firmware. However, the test range of grammar-based fuzzing then cannot cover the firmware sanitizing part.

To conclude, the crash has two root causes: 1) the validation of message syntax relies heavily on a third-party library; 2) the firmware does not correctly handle the null pointer exception caused by the data type mismatch. Although it is not reasonable to require a vendor to develop products purely from scratch, we argue that thorough testing and validation of open-source libraries are essential. Considering the complexity of IoT firmware testing, a lightweight and effective black-box vulnerability detection tool, such as Snipuzz, is a pressing need.

2 https://github.com/DaveGamble/cJSON
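
The type mismatch behind this crash can be illustrated with a minimal Python sketch of the check that the case study suggests was missing: confirming that the value under ‘schedule’ is an object before looking up ‘edit_rule’. The function name get_edit_rule and all three sample messages are hypothetical; they are not the device's actual message format or the firmware's code. Python's json module is also a strict parser, so unlike the lenient parsing described above it rejects the brace-stripped message outright.

```python
import json
from typing import Any, Optional


def get_edit_rule(raw_message: str) -> Optional[dict]:
    """Return the 'edit_rule' object nested under 'schedule', or None.

    Hypothetical helper: every lookup is preceded by a type check, so a
    value of the wrong type yields None instead of a failed dereference.
    """
    try:
        message: Any = json.loads(raw_message)
    except json.JSONDecodeError:
        # A strict parser rejects syntactically broken input up front.
        return None
    if not isinstance(message, dict):
        return None
    schedule = message.get("schedule")
    # Confirm the type before use: an array here must not be treated as an object.
    if not isinstance(schedule, dict):
        return None
    edit_rule = schedule.get("edit_rule")
    return edit_rule if isinstance(edit_rule, dict) else None


# Hypothetical messages, loosely modelled on the case study.
well_formed = '{"schedule": {"edit_rule": {"id": "1"}}}'
type_mismatch = '{"schedule": ["edit_rule", {"id": "1"}]}'
# Mimic a non-data mutation that deletes two left curly braces.
braces_removed = well_formed.replace("{", "", 2)

if __name__ == "__main__":
    for name, msg in [("well_formed", well_formed),
                      ("type_mismatch", type_mismatch),
                      ("braces_removed", braces_removed)]:
        print(name, "->", get_edit_rule(msg))
```

In this sketch both the array-valued ‘schedule’ and the brace-stripped message are turned into an explicit "no result" rather than an unchecked access, which is the behaviour the second root cause above calls for.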