Signatures
Introduction
CVD (ClamAV Virus Database) is a digitally signed container that includes signature databases in various text formats. The header of the container is a 512-byte string with colon-separated fields:

    ClamAV-VDB:build time:version:number of signatures:functionality level required:MD5 checksum:digital signature:builder name:build time (sec)

sigtool --info displays detailed information about a given CVD file:

    zolw@localhost:/usr/local/share/clamav$ sigtool -i main.cvd
    File: main.cvd
    Build time: 09 Dec 2007 15:50 +0000
    Version: 45
    Signatures: 169676
    Functionality level: 21
    Builder: sven
    MD5: b35429d8d5d60368eea9630062f7c75a
    Digital signature: dxsusO/HWP3/GAA7VuZpxYwVsE9b+tCk+tPN6OyjVF/U8
    JVh4vYmW8mZ62ZHYMlM903TMZFg5hZIxcjQB3SX0TapdF1SFNzoWjsyH53eXvMDY
    eaPVNe2ccXLfEegoda4xU2TezbGfbSEGoU1qolyQYLX674sNA2Ni6l6/CEKYYh
    Verification OK.

The ClamAV project distributes a number of CVD files, including main.cvd and daily.cvd.
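The header layout described above can be read with a few lines of Python. This is a minimal sketch, not the code ClamAV itself uses: the names CvdHeader and parse_cvd_header are made up for illustration, and it assumes the header is padded to 512 bytes with spaces and that no field contains a colon (so a build time would have to be stored without one, e.g. "15-50" rather than "15:50").

```python
from typing import NamedTuple

CVD_HEADER_SIZE = 512  # header length stated in the text above


class CvdHeader(NamedTuple):
    """Hypothetical container for the nine colon-separated header fields."""
    magic: str
    build_time: str
    version: int
    num_signatures: int
    functionality_level: int
    md5: str
    digital_signature: str
    builder: str
    build_time_sec: int


def parse_cvd_header(raw: bytes) -> CvdHeader:
    """Split the first 512 bytes of a CVD file into its header fields."""
    if len(raw) < CVD_HEADER_SIZE:
        raise ValueError("a CVD header is 512 bytes long")
    # Assumption: unused header space is padded with spaces (or NULs).
    text = raw[:CVD_HEADER_SIZE].decode("ascii", errors="replace").strip(" \x00\r\n")
    fields = text.split(":")
    if len(fields) != 9 or fields[0] != "ClamAV-VDB":
        raise ValueError("not a CVD header")
    return CvdHeader(
        magic=fields[0],
        build_time=fields[1],
        version=int(fields[2]),
        num_signatures=int(fields[3]),
        functionality_level=int(fields[4]),
        md5=fields[5],
        digital_signature=fields[6],
        builder=fields[7],
        build_time_sec=int(fields[8]),
    )
```

Typical use would be parse_cvd_header(open("main.cvd", "rb").read(512)). The sketch only splits the fields; it does not verify the MD5 checksum or the digital signature.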
In order to create efficient signatures for ClamAV it's important to understand how the engine handles input files. The best way to see how it works is to have a look at the debug information from libclamav. You can do it by calling clamscan with the --debug and --leave-temps flags. The first switch makes clamscan display all the interesting information from libclamav and the second one avoids deleting temporary files so they can be analyzed further. The relevant part of the output is:

    $ clamscan --debug attachment.exe
    [...]
    LibClamAV debug: Recognized MS-EXE/DLL file
    LibClamAV debug: Matched signature for file type PE
    LibClamAV debug: File type: Executable

The engine recognized a Windows executable.

    LibClamAV debug: Machine type: 80386
    LibClamAV debug: NumberOfSections: 3
    LibClamAV debug: TimeDateStamp: Fri Jan 10 04:57:55 2003
    LibClamAV debug: SizeOfOptionalHeader: e0
    LibClamAV debug: File format: PE
    LibClamAV debug: MajorLinkerVersion: 6
    LibClamAV debug: MinorLinkerVersion: 0
    LibClamAV debug: SizeOfCode: 0x9000
    LibClamAV debug: SizeOfInitializedData: 0x1000
    LibClamAV debug: SizeOfUninitializedData: 0x1e000
    LibClamAV debug: AddressOfEntryPoint: 0x27070
    LibClamAV debug: BaseOfCode: 0x1f000
    LibClamAV debug: SectionAlignment: 0x1000
    LibClamAV debug: FileAlignment: 0x200
    LibClamAV debug: MajorSubsystemVersion: 4
    LibClamAV debug: MinorSubsystemVersion: 0
    LibClamAV debug: SizeOfImage: 0x29000
    LibClamAV debug: SizeOfHeaders: 0x400
    LibClamAV debug: NumberOfRvaAndSizes: 16
    LibClamAV debug: Subsystem: Win32 GUI
    LibClamAV debug: -----------------------------------
    LibClamAV debug: Section 0
    LibClamAV debug: Section name: UPX0
    LibClamAV debug: Section data (from headers - in memory)
    LibClamAV debug: VirtualSize: 0x1e000 0x1e000
    LibClamAV debug: VirtualAddress: 0x1000 0x1000
    LibClamAV debug: SizeOfRawData: 0x0 0x0
    LibClamAV debug: PointerToRawData: 0x400 0x400
    LibClamAV debug: Sections memory is executable
    LibClamAV debug: Sections memory is writeable
    LibClamAV debug: -----------------------------------
    LibClamAV debug: Section 1
    LibClamAV debug: Section name: UPX1
    LibClamAV debug: Section data (from headers - in memory)
    LibClamAV debug: VirtualSize: 0x9000 0x9000
    LibClamAV debug: VirtualAddress: 0x1f000 0x1f000
    LibClamAV debug: SizeOfRawData: 0x8200 0x8200
    LibClamAV debug: PointerToRawData: 0x400 0x400
    LibClamAV debug: Sections memory is executable
    LibClamAV debug: Sections memory is writeable
    LibClamAV debug: -----------------------------------
    LibClamAV debug: Section 2
    LibClamAV debug: Section name: UPX2
    LibClamAV debug: Section data (from headers - in memory)
    LibClamAV debug: VirtualSize: 0x1000 0x1000
    LibClamAV debug: VirtualAddress: 0x28000 0x28000
    LibClamAV debug: SizeOfRawData: 0x200 0x1ff
    LibClamAV debug: PointerToRawData: 0x8600 0x8600
    LibClamAV debug: Sections memory is writeable
    LibClamAV debug: -----------------------------------
    LibClamAV debug: EntryPoint offset: 0x8470 (33904)
The section structure displayed above suggests the executable is packed with UPX.

    LibClamAV debug: -----------------------------------
    LibClamAV debug: EntryPoint offset: 0x8470 (33904)
    LibClamAV debug: UPX/FSG/MEW: empty section found - assuming compression
    LibClamAV debug: UPX: bad magic - scanning for imports
    LibClamAV debug: UPX: PE structure rebuilt from compressed file
    LibClamAV debug: UPX: Successfully decompressed with NRV2B
    LibClamAV debug: UPX/FSG: Decompressed data saved in /tmp/clamav-90d2d25c9dca42bae6fa9a764a4bcede
    LibClamAV debug: ***** Scanning decompressed file *****
    LibClamAV debug: Recognized MS-EXE/DLL file
    LibClamAV debug: Matched signature for file type PE
Indeed, libclamav recognizes the UPX data and saves the decompressed (and rebuilt) executable into /tmp/clamav-90d2d25c9dca42bae6fa9a764a4bcede. Then it continues by scanning this new file:

    LibClamAV debug: File type: Executable
    LibClamAV debug: Machine type: 80386
    LibClamAV debug: NumberOfSections: 3
    LibClamAV debug: TimeDateStamp: Thu Jan 27 11:43:15 2011
    LibClamAV debug: SizeOfOptionalHeader: e0
    LibClamAV debug: File format: PE
    LibClamAV debug: MajorLinkerVersion: 6
    LibClamAV debug: MinorLinkerVersion: 0
    LibClamAV debug: SizeOfCode: 0xc000
    LibClamAV debug: SizeOfInitializedData: 0x19000
    LibClamAV debug: SizeOfUninitializedData: 0x0
    LibClamAV debug: AddressOfEntryPoint: 0x7b9f
    LibClamAV debug: BaseOfCode: 0x1000
    LibClamAV debug: SectionAlignment: 0x1000
    LibClamAV debug: FileAlignment: 0x1000
    LibClamAV debug: MajorSubsystemVersion: 4
    LibClamAV debug: MinorSubsystemVersion: 0
    LibClamAV debug: SizeOfImage: 0x26000
    LibClamAV debug: SizeOfHeaders: 0x1000
    LibClamAV debug: NumberOfRvaAndSizes: 16
    LibClamAV debug: Subsystem: Win32 GUI
    LibClamAV debug: -----------------------------------
    LibClamAV debug: Section 0
    LibClamAV debug: Section name: .text
    LibClamAV debug: Section data (from headers - in memory)
    LibClamAV debug: VirtualSize: 0xc000 0xc000
    LibClamAV debug: VirtualAddress: 0x1000 0x1000
    LibClamAV debug: SizeOfRawData: 0xc000 0xc000
    LibClamAV debug: PointerToRawData: 0x1000 0x1000
    LibClamAV debug: Section contains executable code
    LibClamAV debug: Sections memory is executable
    LibClamAV debug: -----------------------------------
    LibClamAV debug: Section 1
    LibClamAV debug: Section name: .rdata
    LibClamAV debug: Section data (from headers - in memory)
    LibClamAV debug: VirtualSize: 0x2000 0x2000
    LibClamAV debug: VirtualAddress: 0xd000 0xd000
    LibClamAV debug: SizeOfRawData: 0x2000 0x2000
    LibClamAV debug: PointerToRawData: 0xd000 0xd000
    LibClamAV debug: -----------------------------------
    LibClamAV debug: Section 2
    LibClamAV debug: Section name: .data
    LibClamAV debug: Section data (from headers - in memory)
    LibClamAV debug: VirtualSize: 0x17000 0x17000
    LibClamAV debug: VirtualAddress: 0xf000 0xf000
    LibClamAV debug: SizeOfRawData: 0x17000 0x17000
    LibClamAV debug: PointerToRawData: 0xf000 0xf000
    LibClamAV debug: Sections memory is writeable
    LibClamAV debug: -----------------------------------
    LibClamAV debug: EntryPoint offset: 0x7b9f (31647)
    LibClamAV debug: Bytecode executing hook id 257 (0 hooks)
    attachment.exe: OK
    [...]

No additional files get created by libclamav. By writing a signature for the decompressed file you increase the chance that the engine will detect the target data when it gets compressed with another packer. This method should be applied to all files for which you want to create signatures. By analyzing the debug information you can quickly see how the engine recognizes and preprocesses the data and what additional files get created. Signatures created for bottom-level temporary files are usually more generic and should help detect the same malware in different forms.
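When a scan produces a lot of debug output, the temporary files left behind by --leave-temps can be picked out mechanically. The following Python sketch is only an illustration: the function name temp_files_from_debug is made up, and it assumes that every relevant message ends with "saved in <path>" as in the UPX line shown above. Other unpackers may word their messages differently.

```python
import re
from typing import List

# Assumption: messages of interest end with "saved in <path>", like the
# "UPX/FSG: Decompressed data saved in /tmp/clamav-..." line in the log above.
_SAVED_IN = re.compile(r"^LibClamAV debug: .*\bsaved in (\S+)\s*$")


def temp_files_from_debug(log: str) -> List[str]:
    """Return the paths named in 'saved in' debug lines, in log order."""
    paths = []
    for line in log.splitlines():
        match = _SAVED_IN.match(line)
        if match:
            paths.append(match.group(1))
    return paths
```

The debug text would typically be captured by redirecting the output of clamscan --debug --leave-temps to a file and passing its contents to the function.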
Signature formats
3.1 MD5
The easiest way to create signatures for ClamAV is to use MD5 checksums; however, this method can only be used against static malware. To create a signature for test.exe use the --md5 option of sigtool:

    zolw@localhost:/tmp/test$ sigtool --md5 test.exe > test.hdb
    zolw@localhost:/tmp/test$ cat test.hdb
    48c4533230e1ae1c118c741c0db19dfb:17387:test.exe

That's it! The signature is ready for use:

    zolw@localhost:/tmp/test$ clamscan -d test.hdb test.exe
    test.exe: test.exe FOUND

    ----------- SCAN SUMMARY -----------
    Known viruses: 1
    Scanned directories: 0
    Engine version: 0.92.1
    Scanned files: 1
    Infected files: 1
    Data scanned: 0.02 MB
    Time: 0.024 sec (0 m 0 s)

You can change the name (by default sigtool uses the name of the file) and place it inside a *.hdb file. A single database file can include any number of signatures. To get them automatically loaded each time clamscan/clamd starts just copy the database file(s) into the local virus database directory (e.g. /usr/local/share/clamav).

Hash-based signatures should not be used for text files, HTML and any other data that gets internally preprocessed before pattern matching. If you really want to use a hash signature in such a case, run clamscan with the --debug and --leave-temps flags as described above and create a signature for a preprocessed file left in /tmp. Please keep in mind that a hash signature will stop matching as soon as a single byte changes in the target file.
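The .hdb line shown above has the form MD5:size:name. As a rough illustration of what sigtool --md5 produces, the same line can be assembled in Python; the function name md5_signature is made up for this sketch, and sigtool remains the tool to use in practice.

```python
import hashlib


def md5_signature(data: bytes, name: str) -> str:
    """Build an .hdb-style line: MD5 hex digest, size in bytes, malware name."""
    return f"{hashlib.md5(data).hexdigest()}:{len(data)}:{name}"


def md5_signature_for_file(path: str, name: str) -> str:
    """Same as md5_signature, reading the target file from disk."""
    with open(path, "rb") as handle:
        return md5_signature(handle.read(), name)
```

Because both the digest and the size are derived from the exact file contents, changing a single byte of the target yields a different line, which is why such signatures only work for static malware.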
3.3.1 Hexadecimal format

You can use sigtool --hex-dump to convert any data into a hex-string:

    zolw@localhost:/tmp/test$ sigtool --hex-dump
    How do I look in hex?
    486f7720646f2049206c6f6f6b20696e206865783f0a
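The conversion itself is a plain byte-to-hex mapping, so it can be reproduced and reversed in Python when sigtool is not at hand. The helper names below are made up for illustration.

```python
def hex_dump(data: bytes) -> str:
    """Return the lowercase hex-string for the given bytes."""
    return data.hex()


def hex_undump(hex_string: str) -> bytes:
    """Turn a plain hex-string (no wildcards) back into bytes."""
    return bytes.fromhex(hex_string)


if __name__ == "__main__":
    # Same input as the sigtool session above, including the trailing newline.
    print(hex_dump(b"How do I look in hex?\n"))
    # prints 486f7720646f2049206c6f6f6b20696e206865783f0a
```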
3.3.2 Wildcards

ClamAV supports the following extensions for hex-signatures:

    ??
        Match any byte.
    a?
        Match a high nibble (the four high bits). IMPORTANT NOTE: The nibble matching is only available in libclamav with the functionality level 17 and higher, therefore please only use it with .ndb signatures followed by :17 (MinEngineFunctionalityLevel, see 3.3.5).
    ?a
        Match a low nibble (the four low bits).
    *
        Match any number of bytes.
    {n}
        Match n bytes.
    {-n}
        Match n or fewer bytes.
    {n-}
        Match n or more bytes.
    {n-m}
        Match between n and m bytes (m > n).
    (aa|bb|cc|..)
        Match aa or bb or cc..
    !(aa|bb|cc|..)
        Match any byte except aa and bb and cc.. (ClamAV 0.96)
    HEXSIG[x-y]aa or aa[x-y]HEXSIG
        Match aa anchored to a hex-signature, see https://wwws.clamav.net/bugzilla/show_bug.cgi?id=776 for discussion and examples.
    (B)
        Match word boundary (including file boundaries).
    (L)
        Match CR, CRLF or file boundaries.

The range signatures * and {} virtually separate a hex-signature into two parts, e.g. aabbcc*bbaacc is treated as two sub-signatures aabbcc and bbaacc with any number of bytes between them. It's a requirement that each sub-signature includes a block of two static characters somewhere in its body.

3.3.3 Basic signature format

The simplest (and now deprecated) signature format is:

    MalwareName=HexSignature

ClamAV will scan the entire file looking for HexSignature. All signatures of this type must be placed inside *.db files.

3.3.4 Extended signature format

The extended signature format allows for specification of additional information such as a target file type, virus offset or engine version, making the detection more reliable. The format is:

    MalwareName:TargetType:Offset:HexSignature[:MinFL:[MaxFL]]

where TargetType is one of the following numbers specifying the type of the target file:

    0 = any file
    1 = Portable Executable, both 32- and 64-bit.
    2 = file inside OLE2 container (e.g. image, embedded executable, VBA script). The OLE2 format is primarily used by MS Office and MSI installation files.
    3 = HTML (normalized: whitespace transformed to spaces, tags/tag attributes normalized, all lowercase). Javascript is normalized too: all strings are normalized (hex encoding is decoded), numbers are parsed and normalized, local variables/function names are normalized to n001 format, argument to eval() is parsed as JS again, unescape() is handled, some simple JS packers are handled, output is whitespace normalized.
    4 = Mail file
    5 = Graphics
    6 = ELF
    7 = ASCII text file (normalized)
    8 = Unused
    9 = Mach-O files

And Offset is an asterisk or a decimal number n possibly combined with a special modifier:

    * = any
    n = absolute offset
    EOF-n = end of file minus n bytes

Signatures for PE, ELF and Mach-O files additionally support:

    EP+n = entry point plus n bytes (EP+0 for EP)
    EP-n = entry point minus n bytes
    Sx+n = start of section x's (counted from 0) data plus n bytes
    Sx-n = start of section x's data minus n bytes
    SL+n = start of last section plus n bytes
    SL-n = start of last section minus n bytes
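To make the field layout concrete, the sketch below splits one extended-format line and classifies its Offset field into the forms listed above. It is an illustration only: the names Offset, NdbSignature, parse_offset and parse_ndb_line are invented here, the hex-signature is not validated, and only the plain offset forms from the list above are accepted (anything else raises ValueError).

```python
import re
from typing import NamedTuple, Optional


class Offset(NamedTuple):
    anchor: str  # "*", "abs", "EOF", "EP", "SL" or "S<x>" (section index x)
    delta: int   # signed distance in bytes from the anchor


class NdbSignature(NamedTuple):
    name: str
    target_type: int
    offset: Offset
    hex_signature: str
    min_fl: Optional[int]
    max_fl: Optional[int]


_OFFSET_RE = re.compile(r"^(?:(EOF)-(\d+)|(EP|SL|S\d+)([+-])(\d+)|(\d+))$")


def parse_offset(text: str) -> Offset:
    """Classify an Offset field: *, n, EOF-n, EP+/-n, Sx+/-n or SL+/-n."""
    if text == "*":
        return Offset("*", 0)
    match = _OFFSET_RE.match(text)
    if match is None:
        raise ValueError(f"unsupported offset: {text!r}")
    if match.group(1):  # EOF-n
        return Offset("EOF", -int(match.group(2)))
    if match.group(3):  # EP, SL or Sx with a sign
        sign = 1 if match.group(4) == "+" else -1
        return Offset(match.group(3), sign * int(match.group(5)))
    return Offset("abs", int(match.group(6)))  # plain decimal n


def parse_ndb_line(line: str) -> NdbSignature:
    """Split MalwareName:TargetType:Offset:HexSignature[:MinFL:[MaxFL]]."""
    parts = line.strip().split(":")
    if not 4 <= len(parts) <= 6:
        raise ValueError("expected 4 to 6 colon-separated fields")
    min_fl = int(parts[4]) if len(parts) > 4 and parts[4] else None
    max_fl = int(parts[5]) if len(parts) > 5 and parts[5] else None
    return NdbSignature(parts[0], int(parts[1]), parse_offset(parts[2]),
                        parts[3], min_fl, max_fl)
```

For a made-up line such as Test.Sig:1:EP+0:deadbeef:17 the result carries target type 1, an offset anchored at the entry point with delta 0, and a minimum functionality level of 17.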
All the above offsets except * can be turned into floating offsets and represented as Offset,MaxShift where MaxShift is an unsigned integer. A floating offset will match every offset between Offset and Offset+MaxShift, e.g. 10,5 will match all offsets from 10 to 15 and EP+n,y will match all offsets from EP+n to EP+n+y. Versions of ClamAV older than 0.91 will silently ignore the MaxShift extension and only use Offset. Optional MinFL and MaxFL parameters can restrict the signature to specific engine releases. All signatures in the extended format must be placed inside *.ndb files.

3.3.5 Logical signatures

Logical signatures allow combining of multiple signatures in extended format using logical operators. They can provide both more detailed and flexible pattern matching. The logical sigs are stored inside *.ldb files in the following format:

    SignatureName;TargetDescriptionBlock;LogicalExpression;Subsig0;Subsig1;Subsig2;...

where:

    TargetDescriptionBlock provides information about the engine and target file with comma-separated Arg:Val pairs; currently (as of 0.95.1) only Target:X and Engine:X-Y are supported.

    LogicalExpression specifies the logical expression describing the relationship between Subsig0...SubsigN.
        Basis clause: 0,1,...,N decimal indexes are SUB-EXPRESSIONS representing Subsig0, Subsig1,...,SubsigN respectively.
        Inductive clause: if A and B are SUB-EXPRESSIONS and X, Y are decimal numbers then (A&B), (A|B), A=X, A=X,Y, A>X, A>X,Y, A<X and A<X,Y are SUB-EXPRESSIONS.

    SubsigN is the n-th subsignature in extended format, possibly preceded by an offset. Up to 64 subsigs can be specified.

Keywords used in TargetDescriptionBlock:

    Target:X: Target file type
    Engine:X-Y: Required engine functionality (range; 0.96)
    FileSize:X-Y: Required file size (range in bytes; 0.96)
    EntryPoint: Entry point offset (range in bytes; 0.96)
    NumberOfSections: Required number of sections in executable (range; 0.96)
    Container:CL_TYPE_*: File type of the container which stores the scanned file

Modifiers for subexpressions:

    A=X: If the SUB-EXPRESSION A refers to a single signature then this signature must get matched exactly X times; if it refers to a (logical) block of signatures then this block must generate exactly X matches (with any of its sigs). A=0 specifies negation (signature or block of signatures cannot be matched).
    A=X,Y: If the SUB-EXPRESSION A refers to a single signature then this signature must be matched exactly X times; if it refers to a (logical) block of signatures then this block must generate X matches and at least Y different signatures must get matched.
    A>X: If the SUB-EXPRESSION A refers to a single signature then this signature must get matched more than X times; if it refers to a (logical) block of signatures then this block must generate more than X matches (with any of its sigs).
    A>X,Y: If the SUB-EXPRESSION A refers to a single signature then this signature must get matched more than X times; if it refers to a (logical) block of signatures then this block must generate more than X matches and at least Y different signatures must be matched.
    A<X and A<X,Y: as above, with "more" changed to "less".

Examples:

    Sig1;Target:0;(0&1&2&3)&(4|1);6b6f74656b;616c61;7a6f6c77;73746566616e;deadbeef

    Sig2;Target:0;((0|1|2)>5,2)&(3|1);6b6f74656b;616c61;7a6f6c77;73746566616e

    Sig3;Target:0;((0|1|2|3)=2)&(4|1);6b6f74656b;616c61;7a6f6c77;73746566616e;deadbeef

    Sig4;Target:1,Engine:18-20;((0|1)&(2|3))&4;EP+123:33c06834f04100f2aef7d14951684cf04100e8110a00;S2+78:22??232c2d252229{-15}6e6573(63|64)61706528;S+50:68efa311c3b9963cb1ee8e586d32aeb9043e;f9c58dcf43987e4f519d629b103375;SL+550:6300680065005c0046006900

ClamAV 0.96 introduced support for special macro subsignatures in the following format: ${min-max}MACROID$, where MACROID points to a group of signatures and {min-max} specifies the offset range at which one of the group signatures should match. The range is calculated against the match offset of the previous subsignature. The macro subsignature makes its preceding subsignature considered a match only if both of them get matched. For more information and examples please see https://wwws.clamav.net/bugzilla/show_bug.cgi?id=164.
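The semicolon-separated layout of a logical signature can be taken apart as follows. This Python sketch only splits the fields; it does not evaluate the logical expression or validate the subsignatures. The names LogicalSignature and parse_ldb_line are invented for the example, and treating everything before the first colon of a subsignature as its offset is an assumption of this sketch.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple

MAX_SUBSIGS = 64  # limit stated in the text above


class LogicalSignature(NamedTuple):
    name: str
    target_description: Dict[str, str]          # e.g. {"Target": "1", "Engine": "18-20"}
    expression: str                             # e.g. "((0|1)&(2|3))&4"
    subsigs: List[Tuple[Optional[str], str]]    # (offset or None, hex body)


def parse_ldb_line(line: str) -> LogicalSignature:
    """Split SignatureName;TargetDescriptionBlock;LogicalExpression;Subsig0;..."""
    fields = line.strip().split(";")
    if len(fields) < 4:
        raise ValueError("expected a name, a TDB, an expression and at least one subsig")
    name, tdb, expression = fields[0], fields[1], fields[2]
    raw_subsigs = fields[3:]
    if len(raw_subsigs) > MAX_SUBSIGS:
        raise ValueError("more than 64 subsignatures")

    target_description = {}
    for pair in tdb.split(","):
        key, sep, value = pair.partition(":")
        if not sep:
            raise ValueError(f"malformed Arg:Val pair: {pair!r}")
        target_description[key] = value

    subsigs = []
    for raw in raw_subsigs:
        # Assumption: a colon inside a subsig separates its offset from the body.
        offset, sep, body = raw.partition(":")
        subsigs.append((offset, body) if sep else (None, raw))
    return LogicalSignature(name, target_description, expression, subsigs)
```

Applied to the Sig1 example above, this yields the name Sig1, the description {"Target": "0"}, the expression (0&1&2&3)&(4|1) and five subsignatures without offsets.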
    0740069006f006e000000000045006e007400650072007400610069006e006d0065006e00740020005000610063006b0020004600720065006500430065006c006c002000470061006d0065000000
    VersionInfo (d396): FileVersion=5.1.2600.0 (xpclient.010817-1148) - VI:460069006c006500560065007200730069006f006e000000000035002e0031002e0032003600300030002e003000200028007800700063006c00690065006e0074002e003000310030003800310037002d00310031003400380029000000
    VersionInfo (d3fa): InternalName=freecell - VI:49006e007400650072006e0061006c004e0061006d00650000006600720065006500630065006c006c000000
    VersionInfo (d4ba): OriginalFilename=freecell - VI:4f0072006900670069006e0061006c00460069006c0065006e0061006d00650000006600720065006500630065006c006c000000
    VersionInfo (d4f6): ProductName=Sistema operativo Microsoft Windows - VI:500072006f0064007500630074004e0061006d00650000000000530069007300740065006d00610020006f0070006500720061007400690076006f0020004d006900630072006f0073006f0066007400ae002000570069006e0064006f0077007300ae000000
    VersionInfo (d562): ProductVersion=5.1.2600.0 - VI:500072006f006400750063007400560065007200730069006f006e00000035002e0031002e0032003600300030002e0030000000
    [...]

Although VI-based signatures are intended for use in logical signatures you can test them using ordinary .ndb files. For example:

    my_test_vi_sig:1:VI:paste_your_hex_sig_here

Final note. If you want to decode a VI-based signature into a human readable form you can use:

    echo hex_string | xxd -r -p | strings -el

For example:

    $ echo 460069006c0065004400650073006300720069007000740069006f006e000000000045006e007400650072007400610069006e006d0065006e00740020005000610063006b0020004600720065006500430065006c006c002000470061006d0065000000 | xxd -r -p | strings -el
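Where xxd and strings are not available, the same decoding can be approximated in Python. The sketch below assumes, as the hex dumps above suggest, that the VI data is UTF-16 little-endian text with NUL separators; the function name decode_vi is made up. Unlike strings -el it does not drop very short runs.

```python
from typing import List


def decode_vi(hex_string: str) -> List[str]:
    """Decode a VI hex-string into its NUL-separated UTF-16LE text parts."""
    cleaned = "".join(hex_string.split())  # tolerate spaces and line wraps
    text = bytes.fromhex(cleaned).decode("utf-16-le", errors="replace")
    return [part for part in text.split("\x00") if part]


if __name__ == "__main__":
    # "InternalName", NUL, "freecell", NUL, written one UTF-16 code unit at a time.
    sample = ("4900 6e00 7400 6500 7200 6e00 6100 6c00 4e00 6100 6d00 6500 0000 "
              "6600 7200 6500 6500 6300 6500 6c00 6c00 0000")
    print(decode_vi(sample))
```

Each VI hex-string shown above holds a key and its value separated by NULs, so the result is normally a two-element list.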
always use a -rarpwd suffix in the malware name for signatures of type rmd,
only use alphanumeric characters, dash (-), dot (.), underscores (_) in malware names, never use space, apostrophe or quote mark.
Special files
4.1 HTML
ClamAV contains special HTML normalisation code which helps to detect HTML exploits. Running sigtool --html-normalise on an HTML file should generate the following files:

    nocomment.html - the file is normalized, lower-case, with all comments and superfluous white space removed
    notags.html - as above but with all HTML tags removed

The code automatically decodes JScript.encode parts and char refs (e.g. &#102;). You need to create a signature against one of the created files. To eliminate potential false positive alerts the target type should be set to 3.