A Brief History of Formats

The document discusses various data formats that can be used in SEGY files, including EBCDIC, ASCII, 16-bit integers, 32-bit integers, IBM floating point, and IEEE floating point. It also describes the structure of a standard SEGY file, which includes an EBCDIC header, binary header, trace header, and data samples for each trace. The binary header contains key parameters like sample interval, number of samples, and format code. Together, the headers and data samples allow seismic trace data to be exchanged in a standardized format.

Uploaded by

jifarina

A brief history of formats

Data can be stored in a computer in various formats. The SEGY format
itself can contain several formats within the same file depending on
what data is being represented.
A reminder:
There are 8 bits in a byte regardless of the computer.
1 Kilobyte is 1024 bytes or 2**10.
1 Megabyte is 1,048,576 bytes, or 1024 kilobytes, or 2**20.
1 Gigabyte is 1,073,741,824 bytes, or 1024 megabytes, or 2**30.
The formats most likely to be encountered are:
EBCDIC Stands for "Extended Binary Coded Decimal Interchange
Code". A format for representing text data that was used at the
time the SEGY standard was developed. It has been largely supplanted
by ASCII (see below). Text data in SEGY files is still written in EBCDIC
for reasons of backward compatibility, although some PC based
systems use ASCII.
ASCII Stands for "American Standard Code for Information
Interchange". This is the standard format for basic text information
on most modern computers. ASCII uses 7 bits to represent the letters
of the English alphabet, the numbers 0-9, and common special
characters and punctuation. The 8th bit of the byte is not part of
ASCII; it is normally zero, although it has been used as a parity bit
or to select extended characters. With 7 bits ASCII can represent only
128 characters, which is enough for English but is inadequate for
Asian languages with much larger character sets. ASCII is now being
replaced by Unicode, whose encodings use up to 32 bits per character
and can represent the characters of all the world's writing systems.
It is unlikely however that we will see these character sets in SEGY
for some time to come.
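As a rough illustration of the two text formats, the sketch below decodes a block of header text with Python's built-in codecs. It assumes that code page 037 ("cp037") is an acceptable stand-in for the EBCDIC variant used in the file, and that the text begins with the letter C, as is conventional for SEGY header lines; neither is guaranteed. The function name decode_text_header is hypothetical.

```python
def decode_text_header(raw: bytes) -> str:
    """Decode header text, guessing between EBCDIC and ASCII."""
    # Assumption: the text starts with the letter "C".
    # "C" is byte 0xC3 in EBCDIC (code page 037) and 0x43 in ASCII.
    encoding = "cp037" if raw[:1] == b"\xc3" else "ascii"
    # Undecodable bytes are replaced rather than raising an error.
    return raw.decode(encoding, errors="replace")


# The same text stored in the two formats.
ebcdic_sample = "C 1 CLIENT".encode("cp037")
ascii_sample = "C 1 CLIENT".encode("ascii")

print(decode_text_header(ebcdic_sample))  # C 1 CLIENT
print(decode_text_header(ascii_sample))   # C 1 CLIENT
```

Checking only the first byte is a deliberately simple heuristic; a more careful reader might test several lines before deciding.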
16 bit integer or short integer. A 16 bit (2 byte) integer. Short
integers are now largely obsolete for general computation. Modern
computers deal with numbers 32 bits (or more) at a time. In fact
modern computers can take longer to deal with 16 bit numbers than
with 32 bit numbers. A 16 bit integer can represent a range of values
from -32768 to +32767 (-2**15 to 2**15 - 1). The 16th bit is the sign
bit. Obviously a 16 bit integer cannot represent a UTM coordinate or a
trace number in a large 3D data volume. It was used primarily to save
space and because the most powerful computers in existence at the
time worked with data in 16 bit chunks.
32 bit or long integer. The format used for integers in modern
computers. This format can represent values from -2,147,483,648 to
+2,147,483,647 (-2**31 to 2**31 - 1). 32 bit integers have been used
since the inception of the SEGY standard for large values such as
UTM coordinates.
IBM floating point. A 32 bit (4 byte) floating point format. This was
the standard floating point format at the time the SEGY standard was
set down. It is not the native floating point format of modern PCs
and workstations. Most SEGY data is still written in IBM float,
however, and any program dealing with SEGY must be able to make the
conversion.
IEEE floating point. A 32 bit (4 byte) floating point format. The
modern standard for floating point values. Used internally by most
computers. Some SEGY data, particularly that which is written for a
PC based system, contains IEEE floating point data.
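The IBM to IEEE conversion mentioned above can be sketched as follows. The bit layout used here is not given in this document; it is the usual description of IBM single precision: 1 sign bit, a 7 bit exponent that is a power of 16 biased by 64, and a 24 bit fraction with no hidden bit. The function name ibm32_to_float is hypothetical, and the input is assumed to be stored high order byte first.

```python
import struct


def ibm32_to_float(raw: bytes) -> float:
    """Convert 4 bytes of IBM single precision (high byte first) to a float."""
    (word,) = struct.unpack(">I", raw)
    sign = -1.0 if word >> 31 else 1.0
    exponent = (word >> 24) & 0x7F   # power of 16, biased by 64
    fraction = word & 0x00FFFFFF     # 24 bit fraction, no hidden bit
    return sign * (fraction / float(1 << 24)) * 16.0 ** (exponent - 64)


print(ibm32_to_float(bytes.fromhex("41100000")))  # 1.0
print(ibm32_to_float(bytes.fromhex("C276A000")))  # -118.625
print(ibm32_to_float(bytes.fromhex("00000000")))  # 0.0
```

Python floats are IEEE double precision, so the result can be written back out as a 32 bit IEEE value with struct.pack(">f", value) if required. A production reader would normally convert whole traces at once rather than one sample at a time.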
A further consideration is byte order. Personal computers use "little
endian" or "low order byte first". Sun and other workstations use "big
endian" or "high order byte first".
To understand the difference consider the number "1" written in
binary as a 16 bit (2 byte) integer. In low byte order it would appear
as
00000001 00000000
In high byte order it would appear as
00000000 00000001
Obviously reading a low order byte number as high order would result
in an error. Instead of "1" the number would be interpreted as "256".
Most SEGY data has been written with the high order byte first but
this is changing as PCs are used more and more. Any program
dealing with SEGY data should be able to work with either byte order.
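The example above can be reproduced with Python's struct module, where "<" selects low order byte first and ">" selects high order byte first. This is only a sketch; the helper name read_int16 is hypothetical.

```python
import struct


def read_int16(raw: bytes, byte_order: str) -> int:
    """Read a 16 bit integer; byte_order is "<" (low first) or ">" (high first)."""
    return struct.unpack(byte_order + "h", raw)[0]


low_first = bytes([0b00000001, 0b00000000])   # "1" with the low order byte first
high_first = bytes([0b00000000, 0b00000001])  # "1" with the high order byte first

print(read_int16(low_first, "<"))   # 1
print(read_int16(high_first, ">"))  # 1
print(read_int16(low_first, ">"))   # 256, the wrong byte order was assumed
```

Passing the byte order as a parameter is one simple way for a program to handle files written either way.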

SEGY format overview

The SEGY format has been adopted by the SEG as a standard for
trace sequential seismic data. The SEGY format is widely supported
and is in fact used almost exclusively for the exchange of
seismic data. All geophysical interpretation workstations read SEGY
and some even use SEGY as their internal format.
With a standard so widely used there are, of course, millions of tapes
and disk files in existence containing SEGY data.
SegyTool does not read SEGY data from tape so the procedures for
reading SEGY data from tape will not be covered here. The essential
layout of a SEGY data set is the same whether on disk or tape.
The SEGY standard is made up of:
1. An EBCDIC format header of exactly 3200 bytes. There is one
EBCDIC header per SEGY file. This header contains text which
(hopefully) describes the area name, line name, shotpoint range,
recording parameters, and processing history. Not all EBCDIC headers
are so informative but most do contain the area name and line name.
Information is usually written in 40 lines of 80 characters each.

2. Binary header of exactly 400 bytes. There is only one binary header
per file. This header contains the number of samples, sample rate,
and format code. The layout of the binary header is as follows.
o Bytes 17-18 Sample interval in microseconds for this file.
o Bytes 19-20 Sample interval in microseconds as originally recorded in the field.
o Bytes 21-22 Number of samples for this file.
o Bytes 23-24 Number of samples as originally recorded in the field.
o Bytes 25-26 Format code.

    1 = IBM float
    2 = 32 bit integer
    3 = 16 bit integer
    6 = IEEE float (note that SEG-Y revision 1 assigns code 5 to 4 byte
        IEEE float, so both codes may be encountered)

There are many other data fields in the binary header but the above
represent the critical values for viewing and editing.

3. Trace header of exactly 240 bytes. There is one trace header per trace.
The header contains information about the trace such as shotpoint number,
CDP, and survey locations. The number of samples and sample rate for each
trace are also written in the header.

4. Data samples. Each trace consists of a trace header followed by n data
samples, where n is the number of samples per trace as defined in the trace
header. Note that most programs that read SEGY disk files, including
SegyTool, set the number of samples by the value in the binary header and
assume a consistent number of bytes for each trace. The number of bytes
per sample is dependent upon the format of the data samples. Floating
point and 32 bit integers use 4 bytes per sample, 16 bit integers use 2
bytes, and 8 bit integers only one. The most common sample formats are IBM
float and 16 bit integer, although SeisX uses IEEE float instead of IBM.
This is for performance reasons, as IEEE is the native floating point
format for the computers SeisX is run on. 32 bit and 8 bit integer samples
are rarely, if ever, seen. Note that the sample format has nothing to do
with the format of trace header values such as shotpoints or XY coordinates.
Items 3 and 4 above are repeated for each trace in the file.
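The binary header fields listed above can be pulled out of the 400 byte block with a few lines of Python. This is a minimal sketch, not SegyTool's own method: the function and field names are hypothetical, the values are read as signed 16 bit integers, and high order byte first is assumed unless the caller says otherwise.

```python
import struct

# Hypothetical names for the five fields at bytes 17-26 of the binary header.
FIELD_NAMES = (
    "sample_interval_us",
    "original_sample_interval_us",
    "samples_per_trace",
    "original_samples_per_trace",
    "format_code",
)


def parse_binary_header(header: bytes, byte_order: str = ">") -> dict:
    """Return the critical binary header values as a dictionary."""
    if len(header) != 400:
        raise ValueError("binary header must be exactly 400 bytes")
    # Bytes 17-26 (counting from 1) start at offset 16 (counting from 0)
    # and hold five consecutive 16 bit integers.
    values = struct.unpack_from(byte_order + "5h", header, 16)
    return dict(zip(FIELD_NAMES, values))


# Build a synthetic header for demonstration: 4000 microsecond sampling,
# 1500 samples per trace, format code 1 (IBM float).
demo = bytearray(400)
struct.pack_into(">5h", demo, 16, 4000, 4000, 1500, 1500, 1)
print(parse_binary_header(bytes(demo)))
```

In a disk file the 400 byte block would be read starting at offset 3200, immediately after the EBCDIC header.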
The number of samples multiplied by the sample rate in milliseconds yields
the record length. The number of bytes per trace can be computed from the
number of samples multiplied by the bytes per sample plus 240 bytes for
the trace header. The overall size of the file will be exactly the number of
bytes per trace times the number of traces plus 400 bytes for the binary
header and 3200 bytes for the EBCDIC header.
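The arithmetic above can be written out as a short Python sketch. The function names are hypothetical, and the table of bytes per sample covers only the format codes listed in this document. The last function runs the file size formula backwards to estimate the number of traces, which assumes every trace has the same length.

```python
# Bytes per sample for the format codes listed in this document.
BYTES_PER_SAMPLE = {1: 4, 2: 4, 3: 2, 6: 4}

EBCDIC_HEADER_BYTES = 3200
BINARY_HEADER_BYTES = 400
TRACE_HEADER_BYTES = 240


def record_length_ms(n_samples: int, sample_interval_us: int) -> float:
    """Record length in milliseconds from a sample interval in microseconds."""
    return n_samples * sample_interval_us / 1000.0


def trace_bytes(n_samples: int, format_code: int) -> int:
    """Bytes per trace: the trace header plus the data samples."""
    return TRACE_HEADER_BYTES + n_samples * BYTES_PER_SAMPLE[format_code]


def file_bytes(n_traces: int, n_samples: int, format_code: int) -> int:
    """Expected size of the whole file in bytes."""
    headers = EBCDIC_HEADER_BYTES + BINARY_HEADER_BYTES
    return headers + n_traces * trace_bytes(n_samples, format_code)


def trace_count(file_size: int, n_samples: int, format_code: int) -> int:
    """Number of traces implied by a file size, assuming equal length traces."""
    data_bytes = file_size - EBCDIC_HEADER_BYTES - BINARY_HEADER_BYTES
    count, remainder = divmod(data_bytes, trace_bytes(n_samples, format_code))
    if remainder:
        raise ValueError("file size is not a whole number of traces")
    return count


# 1500 samples at 4000 microseconds, IBM float, 1000 traces.
print(record_length_ms(1500, 4000))   # 6000.0
print(trace_bytes(1500, 1))           # 6240
print(file_bytes(1000, 1500, 1))      # 6243600
print(trace_count(6243600, 1500, 1))  # 1000
```

A file size that does not divide evenly is a useful warning sign that the sample count, the format code, or the assumption of equal length traces is wrong.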
