Cpio 5
NAME
cpio — format of cpio archive files
DESCRIPTION
The cpio archive format collects any number of files, directories, and other file system objects (symbolic
links, device nodes, etc.) into a single stream of bytes.
General Format
Each file system object in a cpio archive comprises a header record with basic numeric metadata, followed
by the full pathname of the entry and then any file data. The header record stores a series of integer values
that generally follow the fields in struct stat. (See stat(2) for details.) The variants differ primarily in how
they store those integers (binary, octal, or hexadecimal). The length of the pathname is stored in the header.
The end of the archive is indicated by a special record with the pathname “TRAILER!!!”.
PWB format
XXX Any documentation of the original PWB/UNIX 1.0 format? XXX
The fields are identical to those in the old binary format. The name and file body follow the fixed header.
Unlike the old binary format, there is no additional padding after the pathname or file contents. If the files
being archived are themselves entirely ASCII, then the resulting archive will be entirely ASCII, except for
the NUL byte that terminates the name field.
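The layout described above (fixed header, then the NUL-terminated name, then the file body, with no padding) can be sketched as a small reader. This is a minimal sketch, not a reference implementation. It assumes the portable ASCII ("odc") field layout: the magic string 070707 followed by octal fields of the widths listed in ODC_FIELDS. That table, and the helper names pack_odc and read_odc, are assumptions introduced for illustration and do not come from the text above.

```python
import io

ODC_MAGIC = b"070707"

# ASSUMED layout: (field name, width in octal digits), in on-disk order.
# 6 magic bytes + 70 digit bytes = 76-byte header.
ODC_FIELDS = [
    ("dev", 6), ("ino", 6), ("mode", 6), ("uid", 6), ("gid", 6),
    ("nlink", 6), ("rdev", 6), ("mtime", 11), ("namesize", 6), ("filesize", 11),
]


def pack_odc(name, data=b"", mode=0o100644):
    """Build one entry: header, NUL-terminated name, body, no padding."""
    raw_name = name.encode("ascii") + b"\0"
    values = {field: 0 for field, _ in ODC_FIELDS}
    values.update(mode=mode, nlink=1,
                  namesize=len(raw_name), filesize=len(data))
    header = ODC_MAGIC + b"".join(
        f"{values[field]:0{width}o}".encode("ascii")
        for field, width in ODC_FIELDS
    )
    return header + raw_name + data


def read_odc(stream):
    """Yield (name, header, data) for each entry before the trailer record."""
    while True:
        if stream.read(len(ODC_MAGIC)) != ODC_MAGIC:
            raise ValueError("bad magic or truncated archive")
        header = {field: int(stream.read(width), 8)
                  for field, width in ODC_FIELDS}
        # namesize counts the terminating NUL byte.
        name = stream.read(header["namesize"]).rstrip(b"\0").decode("ascii")
        data = stream.read(header["filesize"])
        if name == "TRAILER!!!":
            return
        yield name, header, data


if __name__ == "__main__":
    archive = pack_odc("hello.txt", b"hi\n") + pack_odc("TRAILER!!!", mode=0)
    for name, header, data in read_odc(io.BytesIO(archive)):
        print(name, oct(header["mode"]), data)
```

Because every header field is a run of octal digits, the only non-printable byte the sketch writes for ASCII input is the NUL that terminates each name.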
HP variants
The cpio implementation distributed with HPUX used XXXX but stored device numbers differently XXX.
SEE ALSO
cpio(1), tar(5)
STANDARDS
The cpio utility is no longer a part of POSIX or the Single UNIX Specification. It last appeared in Version 2
of the Single UNIX Specification (“SUSv2”). It has been supplanted in subsequent standards by pax(1). The
portable ASCII format is currently part of the specification for the pax(1) utility.
HISTORY
The original cpio utility was written by Dick Haight while working in AT&T’s Unix Support Group. It
appeared in 1977 as part of PWB/UNIX 1.0, the “Programmer’s Work Bench” derived from Version 6 AT&T
UNIX that was used internally at AT&T. Both the old binary and old character formats were in use by 1980,
according to the System III source released by SCO under their “Ancient Unix” license. The character format
was adopted as part of IEEE Std 1003.1-1988 (“POSIX.1”). XXX when did "newc" appear? Who
invented it? When did HP come out with their variant? When did Sun introduce ACLs and extended
attributes? XXX
BUGS
The “CRC” format is misnamed, as it uses a simple checksum and not a cyclic redundancy check.
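To illustrate the difference, the sketch below compares an additive checksum with a true CRC. It assumes the check value is the sum of all file data bytes, treated as unsigned, with only the low 32 bits kept. The function name cpio_check is hypothetical, and zlib.crc32 is used only as an example of a real cyclic redundancy check, not as anything cpio itself uses.

```python
import zlib


def cpio_check(data: bytes) -> int:
    """Additive checksum: sum of unsigned bytes, low 32 bits only (assumed)."""
    return sum(data) & 0xFFFFFFFF


original = b"cpio"
swapped = b"cpoi"  # same bytes, last two transposed

# An additive checksum cannot see reordering, because addition commutes.
print(cpio_check(original) == cpio_check(swapped))

# A CRC-32 detects this change: the two inputs differ within a 16-bit burst.
print(zlib.crc32(original) == zlib.crc32(swapped))
```

The first comparison prints True and the second prints False, so a transposition that a CRC would catch passes unnoticed under the simple sum.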
The old binary format is limited to 16 bits for user id, group id, device, and inode numbers. It is limited to 4
gigabyte file sizes.
The old ASCII format is limited to 18 bits for the user id, group id, device, and inode numbers. It is limited
to 8 gigabyte file sizes.
The new ASCII format is limited to 4 gigabyte file sizes.
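The limits above follow directly from the width of each header field. The sketch below works through the arithmetic. The field widths are assumptions not stated in this section: 16-bit binary fields with a 32-bit size split across two of them for the old binary format, 6 and 11 octal digits for the old ASCII format, and 8 hexadecimal digits for the new ASCII format. The names max_unsigned and LIMITS are illustrative only.

```python
def max_unsigned(digits: int, base: int) -> int:
    """Largest value representable in `digits` digits of the given base."""
    return base ** digits - 1


# ASSUMED field widths for each variant (see the lead-in).
LIMITS = {
    "old binary": {"id": max_unsigned(16, 2), "size": max_unsigned(32, 2)},
    "old ASCII": {"id": max_unsigned(6, 8), "size": max_unsigned(11, 8)},
    "new ASCII": {"size": max_unsigned(8, 16)},
}

for variant, fields in LIMITS.items():
    for field, limit in fields.items():
        print(f"{variant:10} {field:4} max {limit} ({limit.bit_length()} bits)")
```

Six octal digits carry 18 bits and eleven carry 33 bits, which is where the 18-bit id limit and the 8 gigabyte size limit of the old ASCII format come from.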
None of the cpio formats store user or group names, which are essential when moving files between systems
with dissimilar user or group numbering.
Especially when writing older cpio variants, it may be necessary to map actual device/inode values to
synthesized values that fit the available fields. With very large filesystems, this may be necessary even for
the newer formats.
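One way such a mapping might look is sketched below. This is a hypothetical approach, not a description of how any particular cpio implementation does it: each distinct (device, inode) pair is assigned the next small sequential number, so hard links to the same file still share one value. The class name InodeMapper and the default of 18 bits (the old ASCII limit mentioned above) are assumptions for illustration.

```python
class InodeMapper:
    """Map real (dev, ino) pairs to small synthesized inode numbers."""

    def __init__(self, bits: int = 18):
        self.limit = (1 << bits) - 1   # largest value the field can hold
        self.table = {}                # (dev, ino) -> synthesized inode

    def map(self, dev: int, ino: int) -> int:
        key = (dev, ino)
        if key not in self.table:
            synthesized = len(self.table) + 1  # start at 1, keep 0 unused
            if synthesized > self.limit:
                raise OverflowError("no synthesized inode numbers left")
            self.table[key] = synthesized
        return self.table[key]


if __name__ == "__main__":
    mapper = InodeMapper()
    print(mapper.map(0x10302, 9_000_000_000))  # too wide for 18 bits as-is
    print(mapper.map(0x10302, 9_000_000_000))  # hard link: same value again
    print(mapper.map(0x10303, 9_000_000_000))  # other device: new value
```

A writer that runs out of synthesized inode numbers could also vary the synthesized device number to extend the space. The sketch simply raises an error instead.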