CueCat Output Decode and Decipherment
CueCat Output Decode and Decipherment
http://oilcan.org/cuecat/base64.html
Table of Contents
1.
2.
3.
4.
5.
6.
7.
8.
9.
Goal
There have been claims that some people have gone to extreme lengths to determine the output scheme for the Digital Convergence CueCat. This essay
intends to show that the determination of the output of the CueCat is simple and straightforward, and requires a minimum of effort and cryptoanalysis.
The other purpose of this essay is to show that the intent of the encoding scheme is not to "protect" the output from being misused, but instead the output was
designed to be placed into a URL with a minimum of post-processing on the part of the program that sends barcode output to a central site.
15/07/2012 11:29 PM
http://oilcan.org/cuecat/base64.html
characters.
Here are some examples of the output of the barcode wand:
alt+F10 .C3nZC3nZC3nZCNzYDNPYC3nX.CNf7.edCMmYSMlwmqiJCGkYyVlW. <enter>
alt+F10 .C3nZC3nZC3nZCNzYDNPYC3nX.aabI.F39-FN5+Fx19. <enter>
alt+F10 .C3nZC3nZC3nZCNzYDNPYC3nX.cGf2.ENr7CNz0DxjZD3rZDNzWENP6. <enter>
alt+F10 .C3nZC3nZC3nZCNzYDNPYC3nX.cGf2.ENr7CNz0DxjZD3v6ENzWENP6. <enter>
alt+F10 .C3nZC3nZC3nZCNzYDNPYC3nX.cGf2.ENr7CNz0DxjZD3v7CxzWENP6. <enter>
As is easy to inspect, the code consists of the "header", a period, a series of graphic characters, a second period, a much shorter series of graphic characters, a
third period, a series of graphic characters, a fourth period, and an ENTER code.
First conclusion: the period characters delimit fields of data. So let's break the data into the obvious fields and see what we get:
C3nZC3nZC3nZCNzYDNPYC3nX
CNf7
edCMmYSMlwmqiJCGkYyVlW
C3nZC3nZC3nZCNzYDNPYC3nX
aabI
F39-FN5+Fx192
C3nZC3nZC3nZCNzYDNPYC3nX
cGf2
ENr7CNz0DxjZD3rZDNzWENP6
C3nZC3nZC3nZCNzYDNPYC3nX
cGf2
ENr7CNz0DxjZD3v6ENzWENP6
C3nZC3nZC3nZCNzYDNPYC3nX
cGf2
ENr7CNz0DxjZD3v7CxzWENP6
The first column of data is constant. When different barcode wands are tried, though, this number changes, so it's obvious that this represents some sort of
identifier. The second and third fields change; the second field is a fixed-length field, while the third field varies. We don't show the barcodes that were
scanned, but the first one is a base-128 representation of the author's name, the second is a "test" barcode in Q-fur format, and the third is the ISBN
"BookLan" barcode of the author's book Linux IP Stacks Commentary. The fourth barcode is particularly interesting in that it is the BookLan scan of a sister
book in the Coriolis series, Linux Core Kernel Commendary -- interesting because the ISBN codes are exactly one digit apart. Finally, the last barcode is of
another companion book, Apache Server Commenary , which is one down from the IP Stacks number. In other words, we have an interesting bracketing of
numbers.
2 of 9
Title
ISBN
Wand coding
978157610468253999
ENr7CNz0DxjZD3v7CxzWENP6
15/07/2012 11:29 PM
http://oilcan.org/cuecat/base64.html
978157610469953999
ENr7CNz0DxjZD3v6ENzWENP6
978157610470553999
ENr7CNz0DxjZD3rZDNzWENP6
Now, let's eliminate all the "common" information and see what changes from code to code:
Title
ISBN delta
Apache Server
682
v7Cx
699
v6EN
Linux IP Stacks
705
rZDN
Let's look at the bit patterns. We'll take the bits of the expected data and group them into six bits, the standard for base-64 conversion. Here is what we get:
682
001101 100011 100000 110010
13 35 32 50
v 7 C x
699
001101 100011 100100 111001
13 35 36 57
v 6 E N
705
001101 110011 000000 110101
13 51 0 53
r Z D N
3 of 9
15/07/2012 11:29 PM
http://oilcan.org/cuecat/base64.html
in the first field, which suggests that the field also contains zeros in it -- for this particular wand, there appears to be at least nine leading zeros. So let's start
looking at bit patterns again:
000
00110000 00110000 00110000
001100 000011 000000 110000
12 3 0 48
C 3 n Z
Whatever scrambling is being done, it has to be done after the conversion from base-64 coding to 8-bit data. Whatever code is being used doesn't use any
form of cypher streaming, because identical input appears to generate identical output.
The generation of base-64 code sets usually follows a preset pattern. For example, the classic uuencode(1) program used the decode table
`!"#$%&'()*+,-./
0123456789:;<=>?
@ABCDEFGHIJKLMNO
PQRSTUVWXYZ[\]^_
Which is the ASCII collating sequence (except that the space is replaced by the backquote). Most variants of the base-64 scheme tend to use regular
sequences for their translate tables. The most common alternative for base-64 make use of the upper-case letters, lower-case letters, and two special
characters to make up the alphabet. So let's assume that the coding follows this pattern, so we have an initial table that looks like this:
ABCDEFGHIJKLMNOP
QRSTUVWXYZabcdef
ghijklmnopqrstuv
wxyz0123456789??
Now let's see what we get when we take our "zero sequence" and apply it to our data.
C 3 n Z
2 55 39 25
000010 110111 100111 010101
00001011 01111001 11010101
Nope, that didn't work; the three resulting characters are not the same. Let's reverse the capital and lower-case letters and try again.
abcdefghijklmnop
qrstuvwxyzABCDEF
GHIJKLMNOPQRSTUV
WXYZ0123456789??
4 of 9
15/07/2012 11:29 PM
http://oilcan.org/cuecat/base64.html
C 3 n Z
28 55 13 51
011100 110111 001101 110011
01110011 01110011 01110011
Eureka! We have a constant pattern. When we subtract out the desired value of the character '0', or 0011000, we get a delta pattern of 01000011 or the ASCII
character 'C'. Now the question is, how is this delta applied? Best guess is that we use an exclusive-OR operation on each character, because that way there
is no end-around wrap problem.
ISBN delta
Apache Server
468253
D3v7CxzW
469953
D3v6ENzW
Linux IP Stacks
470553
D3rZDNzW
Given our table of a..zA..Z0..9??, let's see if we can see the data come out correctly:
468253
00110100 00110110 00111000 00110010 00110101 00110011
01000011 01000011 01000011 01000011 01000011 01000011 pattern
01110111 01110101 01111011 01110001 01110110 01110000 XOR
011101 110111 010101 111011 011100 010111 011001 110000
29 55 21 59 28 23 25 48
D 3 v 7 C x z W
And we have a match. I leave the other two ISBN codes to you, the student, as an exercise.
5 of 9
15/07/2012 11:29 PM
http://oilcan.org/cuecat/base64.html
The best way to find out what's going on is to open this test document (PDF), print it on a 600-dpi laser printer, and (using the driver of your choice) swipe
the barcode. See for yourself what characters come back -- within the code report you will see the representation for both positions 62 and 63. If you see
something other that what's described on the page, let me know and I'll update this information.
6 of 9
15/07/2012 11:29 PM
http://oilcan.org/cuecat/base64.html
samples of barcode wands, we may or may not be able to justify this speculation.
The second field reveals itself to identify the particular symbology detected by the barcode wand when scanning the field. The capabilities page shows all the
symbology fields that have been confirmed by the author; others have reported additional report values as well that have not been confirmed and added to the
list.
Scanning several dozen different barcodes with human-readable values, including proprietary cues, shows that the code developed to the guidelines described
here works fine. This completes the analysis of the output of the CueCat.
Conclusions
The choice of character code set used with the CueCat is obvious on its face when you compare the codeset with the character set allowed in HTML forms
reply strings. First, let's review how HTML forms work.
When a forms-capable browser is asked to submit form data to the HTTP server, it does it in one of two ways. Using the GET method, the form data
becomes part of the URL itself. Using the POST method, the data is placed in a datastream. In either case, the data is in a very specific format. In general,
the data itself is defined in BNF as follows:
<stream>
::= <element>
| <stream> & <element>
<element> ::= <name> = <value>
Within each and , spaces are converted to plus signs, and any occurrence of an ampersand, equal sign, percent sign, or plus sign must be escapes using the
form "%XX" where the "X"s are hexidecimal values.
When building a GET method URL, there are a couple of additional considerations. For example, it's a good idea to avoid the slash, blackslash, and colon
characters because the presence of these characters in astonishing (to software) positions can screw up HTTP proxy servers. Avoiding the ampersand and
equal characters eliminate the need to post-process the CatCue output to generate escape sequences. The plus sign is problematical, because an unadorned
plus sign is supposed to be interpreted as a space code...but the CGI script or program can handle that one fairly easily. The comma character can be
problematic for the server programmer when CGI libraries are used that impose a meaning on comma.
The changes in the codes in positions 62 and 63 reflect obvious corrections to fix problems with HTML transmission. Originally the codes used in this
position were comma and slash. In other CueCats, these positions are now represented by plus and minus
In other words, the use of a base-64 strategy was to ease the programming of the client support software.
The reason for the XOR dodge is considerably more opaque. Perhaps it was a misguided attempt at trying to bring the output of the CueCat under the Digital
7 of 9
15/07/2012 11:29 PM
http://oilcan.org/cuecat/base64.html
Millenium Copyright Act, but such an attempt fails on its face because the information being "protected" does not fall under the classification of information
that can be copyrighted. The idea that a denial-of-service attack can be averted is doomed to fail because there is nothing that prevents, for example, a replay
attack.
From a theft-of-service perspective, the encryption doesn't make sense. The barcodes are unique to the CueCat, being a non-standard configuration of Code
128 symbology, so in theory no other scanner on the planet can read the damn things, let alone make sense of them. Yes, the encoding scheme using XOR
would prevent a casual third party from submitting codes for steering.
In short, I'm puzzled by what Digital Convergence was trying to do.
(Humor)
So What's Next?
Now I'm beginning to suspect that we will see future 'Cats with different 62/63 reports. What are possible additional 62/63 pairs? (Aren't you sorry you
asked?)
( ) For LISPers
< > Why HTML people didn't think of these I don't know
[ ] I so love array indexes
{ } For the blockheads in the group
\ ` Hey, these are slanted just like 'CatCues!
@ ! Wasn't this an organization at SF conventions?
^ ~ Not Not joke for C/Java programmers
& | For bit-ty C'ers
? : For Java programmers wanting to make up their mind
% = The ultimate in breaking HTML form strings
8 of 9
15/07/2012 11:29 PM
http://oilcan.org/cuecat/base64.html
9 of 9
15/07/2012 11:29 PM