Adventures in Bug Hunting
@joedamato http://timetobleed.com
whoami
http://timetobleed.com
@joedamato
first, a confession.
bprobe
boundary IPFIX flow meter. collects flow data by sniffing packets with libpcap. also collects low-level NIC data from the driver.
at a high level... the bonding driver creates a virtual device.
when a packet is sent, the bonding driver figures out which physical NIC to transmit the packet on.
when a packet comes in, the NICs pass the incoming packet up for the higher layers of the network stack to figure out.
Step 0
Step 1
Examine our assumptions: The packets are making it to the kernel. The packets are being handed up from the kernel to libpcap.
so, do they?
tcpdump
bonded ethernet interfaces (on linux) are virtual devices created by combining other devices. for example:
bond0
Everything is cool.
the only way to figure out where they are getting lost is to follow them through the kernel.
Step 2
Steps 3-5
Dig until you see something you haven't seen before.
let's just look at the simple case: an interrupt is raised when a packet arrives. both paths hand data up to the higher layers in similar ways.
e1000
netif_rx
queues packets up. another thread pulls packets off and processes them.
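The netif_rx step is a producer/consumer queue: the interrupt side only enqueues, and the heavier processing happens later when packets are dequeued. A minimal single-threaded sketch of that idea, assuming a tiny fixed-size ring; BACKLOG_MAX, struct backlog, backlog_enqueue and backlog_dequeue are hypothetical names, not kernel APIs. The real kernel keeps per-CPU queues of sk_buffs bounded by net.core.netdev_max_backlog and drops packets when a queue is full, which is what the drop counter imitates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tiny limit; the kernel's tunable is netdev_max_backlog. */
#define BACKLOG_MAX 4

struct packet { int id; };

struct backlog {
    struct packet slots[BACKLOG_MAX];
    size_t head;     /* index of the oldest queued packet */
    size_t count;    /* number of packets currently queued */
    size_t dropped;  /* packets rejected because the queue was full */
};

/* Interrupt side (the netif_rx role): queue the packet, or drop it if full. */
static bool backlog_enqueue(struct backlog *q, struct packet p)
{
    if (q->count == BACKLOG_MAX) {
        q->dropped++;
        return false;
    }
    q->slots[(q->head + q->count) % BACKLOG_MAX] = p;
    q->count++;
    assert(q->count <= BACKLOG_MAX);
    return true;
}

/* Deferred side: pull the oldest packet off so it can be handed up the stack. */
static bool backlog_dequeue(struct backlog *q, struct packet *out)
{
    if (q->count == 0)
        return false;
    *out = q->slots[q->head];
    q->head = (q->head + 1) % BACKLOG_MAX;
    q->count--;
    return true;
}
```

Enqueuing six packets into this four-slot queue keeps packets 0 through 3 and counts two drops; dequeuing then returns them oldest first.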
bprobe/tcpdump/etc
(userland)
call pcap_next_ex to get packets from libpcap. examine the packets and do stuff.
libpcap (userland)
creates a socket of type PF_PACKET. there are two ways to get packets from the kernel: one by one (slow), or via shared memory (fast). libpcap tries to use the fast method; if it fails, it falls back to the slow one.
the old way is getting set up because the new way failed to initialize.
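The fast/slow choice can be sketched against the Linux packet socket API. This is a hedged illustration of the idea, not libpcap's actual code: it opens a PF_PACKET socket, asks the kernel for a shared-memory receive ring with the PACKET_RX_RING socket option, and falls back to reading packets one by one if that fails. The function and enum names and the ring dimensions are assumptions; a real capture path would also mmap() the ring and bind() the socket to an interface. Opening the socket needs root or CAP_NET_RAW, so without privileges the sketch reports MODE_NONE.

```c
#include <arpa/inet.h>       /* htons */
#include <assert.h>
#include <linux/if_ether.h>  /* ETH_P_ALL */
#include <linux/if_packet.h> /* PACKET_RX_RING, struct tpacket_req */
#include <sys/socket.h>      /* socket, setsockopt, AF_PACKET, SOL_PACKET */
#include <unistd.h>          /* close, for callers */

enum capture_mode { MODE_NONE, MODE_RECV, MODE_MMAP };

/* Open a raw packet socket and prefer the shared-memory ring ("new way"),
 * falling back to per-packet reads ("old way") if the ring can't be set up. */
static enum capture_mode open_capture(int *fd_out)
{
    assert(fd_out != NULL);
    *fd_out = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (*fd_out < 0)
        return MODE_NONE;              /* usually EPERM without CAP_NET_RAW */

    /* Assumed ring geometry: 64 blocks of 4 KiB, 2 frames per block. */
    struct tpacket_req req = {
        .tp_block_size = 4096,
        .tp_block_nr   = 64,
        .tp_frame_size = 2048,
        .tp_frame_nr   = 128,          /* (4096 / 2048) * 64 */
    };
    if (setsockopt(*fd_out, SOL_PACKET, PACKET_RX_RING, &req, sizeof(req)) == 0)
        return MODE_MMAP;              /* next step would be mmap() on the fd */

    return MODE_RECV;                  /* fall back to recvfrom() per packet */
}
```

The caller owns the returned descriptor and closes it when capture ends.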
PF_PACKET (kernel)
libpcap creates the PF_PACKET socket.
the PF_PACKET code in the kernel (eventually) executes.
pulls packets off the backlog queue and calls netif_receive_skb(), which has some logic to determine who the real sender is when bonding is enabled, and then runs the protocol hooks.
we now know the path packets take so they can be examined by pcap apps.
before: skb->dev = eth0
after: skb->dev = bond0, and the code returns eth0 as orig_dev
Bug
We overwrite the packet's device with the bond device. The protocol hook check checks whether the hook is for the device on the packet. It isn't: we are sniffing eth0, but skb->dev was overwritten to bond0. That's why if you sniff bond0 you see packets, but if you sniff eth0 you see nothing.
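The bug can be reproduced with a toy model. This sketch uses hypothetical, heavily simplified stand-ins for struct net_device, struct sk_buff and struct packet_type (not the real kernel definitions): the bonding step moves the packet onto the bond device and returns the physical NIC as orig_dev, and the hook check compares the sniffer's device against skb->dev. The "fixed" check, which also accepts orig_dev, is one possible fix shown for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for kernel structures (hypothetical, heavily simplified). */
struct net_device {
    const char *name;
    struct net_device *master;   /* bond device this NIC is enslaved to, or NULL */
};
struct sk_buff { struct net_device *dev; };
struct packet_type { struct net_device *dev; };  /* NULL means "any device" */

/* Bonding step: move the packet onto the bond device, remember the real NIC. */
static struct net_device *bond_reassign(struct sk_buff *skb)
{
    struct net_device *orig_dev = skb->dev;
    assert(orig_dev != NULL);
    if (orig_dev->master != NULL)
        skb->dev = orig_dev->master;   /* eth0 -> bond0 */
    return orig_dev;
}

/* Buggy hook check: only looks at the (already overwritten) skb->dev. */
static bool hook_matches_buggy(const struct packet_type *pt,
                               const struct sk_buff *skb)
{
    return pt->dev == NULL || pt->dev == skb->dev;
}

/* One possible fix: also accept hooks bound to the device the packet arrived on. */
static bool hook_matches_fixed(const struct packet_type *pt,
                               const struct sk_buff *skb,
                               const struct net_device *orig_dev)
{
    return pt->dev == NULL || pt->dev == skb->dev || pt->dev == orig_dev;
}
```

With eth0 enslaved to bond0, the buggy check rejects a hook on eth0 and accepts one on bond0, matching the symptom; the fixed check accepts both.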
EASY FIX
Everything is cool.
NO
NEIN!
In real life I spent the next 4 days looking over the same kernel code, hundreds of times.
Every single day, I searched from the moment I woke up (9am) until I collapsed with exhaustion (3am).
Until I realized...
Step 0
Step 1
Examine our assumptions: The kernel code is still broken. The incoming packets are being queued up for libpcap to pull out of PF_PACKET properly. tcpdump.
Step 2
Steps 3-5
Dig until you see something you haven't seen before.
verify my assumption
modify libpcap to verify that the kernel really is still broken
i used apt-get source to retrieve the official source for debian lenny's libpcap, and I found something surprising.
lenny's libpcap is not new enough to use the fast (mmap) method, and therefore uses the old way to examine packets. unless it links against the libpcap version i want, my app will just perform worse on lenny.
the index of the bond device is different from the index of the physical device we are sniffing.
why?
this code exists to prevent a race condition when sniffing packets the old way in some kernels.
solution
boot into our fixed debian lenny kernel.
download a version of libpcap that is newer and supports the mmap method for packet sniffing. the new method doesn't have this race condition and has better performance.
link bprobe/tcpdump/other pcap apps against it.
summarize
kernel bug when overwriting the device the packet arrived on.
fixed this bug, but bprobe/tcpdump still failed.
libpcap bug when pulling packets out of the kernel the old way.
Step 0
Step 1-5
Examine your assumptions. Start digging. Keep going until you see something you haven't seen before.
Happy debugging!
questions?
twitter: @joedamato blog: http://timetobleed.com
a warmup bug
ipx_reader
a test program. links against yajl because it generates JSON output. works on ubuntu, but not on centos5.
but, wait.
here's another program that links fine to a lib in /usr/local/lib ON THE SAME SYSTEM.
W A T
We have 2 programs. Both link against libraries in /usr/local/lib/. Only one works. The broken program's library is in /usr/local/lib/.
Step 0
Step 1
Examine our assumptions: The programs and libraries are both 64-bit. /usr/local/lib/ is in the library search path.
program 1: ipx_reader
program 2: bprobe
So... ipx_reader doesn't work because /usr/local/lib is not in the search path.
Strange
This is confusing. bprobe should fail. But the list of shared libraries a particular binary dynamically links to at runtime is recorded in the binary itself.
So....
Step 2
Steps 3-5
Dig until you see something you haven't seen before.
rpath
ah ha!
bprobe works and can link because the binary is storing the library path inside of itself.
but now there are 2 more questions: how did the rpath tag get there? why doesn't ipx_reader have one?
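To see why an embedded path rescues bprobe but not ipx_reader, here is a hedged model of the order in which the glibc dynamic loader searches for a library, following the ld.so(8) manual page: DT_RPATH (only when there is no DT_RUNPATH), then LD_LIBRARY_PATH, then DT_RUNPATH, then the ld.so.cache built by ldconfig from /etc/ld.so.conf, then the default directories. Simplifications and assumptions: the cache is modelled as a plain list of directories, the 64-bit defaults /lib64 and /usr/lib64 are assumed, and the fake filesystem and library name (libyajl.so.1) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

typedef bool (*exists_fn)(const char *path);

/* Try each directory in a colon-separated list; on a hit, out holds the full path. */
static bool search_dirs(const char *dirs, const char *lib, exists_fn exists,
                        char *out, size_t outlen)
{
    if (dirs == NULL)
        return false;
    for (const char *p = dirs; *p != '\0'; ) {
        size_t len = strcspn(p, ":");
        if (len > 0) {
            snprintf(out, outlen, "%.*s/%s", (int)len, p, lib);
            if (exists(out))
                return true;
        }
        p += len;
        if (*p == ':')
            p++;
    }
    return false;
}

/* Simplified glibc-style search order (see ld.so(8)). */
static bool resolve_library(const char *lib, const char *rpath,
                            const char *runpath, const char *ld_library_path,
                            const char *cache_dirs, exists_fn exists,
                            char *out, size_t outlen)
{
    assert(out != NULL && outlen > 0);
    if (runpath == NULL && search_dirs(rpath, lib, exists, out, outlen))
        return true;   /* DT_RPATH is ignored when DT_RUNPATH is present */
    return search_dirs(ld_library_path, lib, exists, out, outlen)
        || search_dirs(runpath, lib, exists, out, outlen)
        || search_dirs(cache_dirs, lib, exists, out, outlen)
        || search_dirs("/lib64:/usr/lib64", lib, exists, out, outlen);
}

/* Hypothetical filesystem: the library only exists under /usr/local/lib. */
static bool fake_fs(const char *path)
{
    return strcmp(path, "/usr/local/lib/libyajl.so.1") == 0;
}
```

With no rpath and /usr/local/lib missing from the cache, resolution fails (the ipx_reader case); with an rpath of /usr/local/lib it succeeds (the bprobe case). On a real binary, readelf -d shows any RPATH or RUNPATH entries.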
almost forgot...