Managing Files of Records

This document discusses key concepts related to file structures: 1) Records can be identified by primary keys, which identify each record uniquely, or by secondary keys, which do not. 2) Direct access seeks straight to the start of a record when its relative record number is known, while sequential search examines records one after another. 3) File organization and access methods are interlinked: fixed-length records enable direct access by record number but are inflexible, while variable-length records are flexible but need sequential access or an index.

Fundamental File Structure 5

Managing Files of Records

CIS 256 (File Structures) 17

Record Access: Keys


►When looking for an individual record, it is convenient to identify the record with a key based on the record’s content (e.g., the Ames record).
►A primary key should uniquely identify a record and be unchanging.
►Records can also be searched based on a secondary key. Secondary keys typically do not identify a record uniquely.



Secondary Key
► Does not identify records uniquely.
► It is not dataless (a dataless key carries no real data).
► Has a canonical form (i.e., there are restrictions on the values that the key must take).
► By contrast, a primary key should be unchanging.


Primary and Secondary Keys


►Primary Key
  – A key that uniquely identifies a record.
►Secondary Key
  – Other keys that may be used for search.
►Note that
  – In general, not every field is a key.
  – Keys correspond to fields, or combinations of fields, that may be used in a search.
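The distinction between the two kinds of key can be sketched in a few lines of Python. This is only an illustration: the sample records, the field layout, and the helper names `canonical` and `build_indexes` are assumptions, not part of the lecture. A primary key maps to exactly one record, while a secondary key may map to several.

```python
from collections import defaultdict

# Hypothetical sample records: (id, last_name, city).
RECORDS = [
    ("S001", "Ames", "Stillwater"),
    ("S002", "Mason", "Ada"),
    ("S003", "Ames", "Ada"),
]

def canonical(key: str) -> str:
    """Put a key into one agreed form so that ' ames' and 'AMES' match."""
    return key.strip().upper()

def build_indexes(records):
    """Return (primary, secondary) lookup tables for the records."""
    primary = {}                    # primary key -> exactly one record
    secondary = defaultdict(list)   # secondary key -> possibly many records
    for rec in records:
        rec_id, last_name, _city = rec
        pk = canonical(rec_id)
        if pk in primary:
            raise ValueError(f"duplicate primary key: {pk}")
        primary[pk] = rec
        secondary[canonical(last_name)].append(rec)
    return primary, secondary

primary, secondary = build_indexes(RECORDS)
```

Here `primary[canonical("s002")]` yields a single record, whereas `secondary[canonical("ames")]` yields a list of two.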



Sequential Search
►Evaluating Performance of Sequential Search.
►Improving Sequential Search Performance with Record Blocking.
►When is Sequential Search Useful?

Answers to these questions are given in the class.


Direct Access
► How do we know where the beginning of the required record is?
  – It may be in an index (discussed in a different lecture).
  – We may know the relative record number (RRN).
    • RRNs are not useful when working with variable-length records: access is still sequential, O(n).
    • With fixed-length records, however, they are useful.



Direct Access by RRN
►Requires records of fixed length.
  – RRN = 30 (31st record)
  – Record length = 101 bytes
  – Byte offset = 30 × 101 = 3030
►Now, how do we go directly to byte 3030 in the file?
  – By seeking.
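The offset calculation and the seek can be sketched as follows. This is a minimal illustration, not the course's implementation: the in-memory `io.BytesIO` object stands in for a file on disk, and the names `byte_offset`, `read_record`, and `demo_file` are made up for the example.

```python
import io

RECORD_LENGTH = 101  # bytes per record, as in the example above

def byte_offset(rrn: int, record_length: int = RECORD_LENGTH) -> int:
    """Byte offset of the record with the given RRN (first record is RRN 0)."""
    return rrn * record_length

def read_record(f, rrn: int, record_length: int = RECORD_LENGTH) -> bytes:
    """Seek straight to the record and read it, without touching earlier records."""
    f.seek(byte_offset(rrn, record_length))
    data = f.read(record_length)
    if len(data) < record_length:
        raise IndexError(f"no record with RRN {rrn}")
    return data

# In-memory stand-in for a file of 40 space-padded, fixed-length records.
demo_file = io.BytesIO(
    b"".join(f"record {i}".encode("ascii").ljust(RECORD_LENGTH) for i in range(40))
)
rec = read_record(demo_file, 30)  # seeks to byte 30 * 101 = 3030
```

The same two functions work unchanged on a real file opened in binary mode, since only `seek` and `read` are used.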


Sequential Search and Direct Access (Comparison)


Search for a record matching a given key
►Sequential Search
  – Look at records sequentially until the matching record is found. Time is in O(n) for n records.
  – Appropriate for pattern matching and for files with few records.
►Direct Access
  – Being able to seek directly to the beginning of the record. Time is in O(1) for n records.
  – Possible when we know the Relative Record Number (RRN): the first record has RRN 0, the next has RRN 1, etc.



File Access and File Organization: A Summary
► File organization issues
  – Variable-length records.
  – Fixed-length records.
► File access issues
  – Direct access.
  – Sequential access.
► File organization depends on what use you want to make of the file.


File Access and File Organization: A Summary


► Since using a file implies accessing it, file access and file organization are intimately linked.
  – Direct access caused us to choose a fixed-length record file organization.
  – But we can also use fixed-length records with sequential access.
  – And a variable-length record organization can be used with direct access (through indexing).
► Example: although fixed-length records make direct access easier, they are not a good solution if the documents vary greatly in length. The application determines our choice of both access and organization, as do the limitations of the file system and the programming language.
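To illustrate the point about indexing, here is a rough sketch of direct access to variable-length records. It assumes each record is stored with a 2-byte length prefix and that the index is a simple in-memory dictionary from key to byte offset; both choices, and the sample data, are assumptions for the example rather than anything prescribed by the lecture.

```python
import io
import struct

def write_records(f, records):
    """Write variable-length records and return an index of key -> byte offset.

    Each record is stored as a 2-byte big-endian length followed by the data.
    """
    index = {}
    for key, payload in records:
        index[key] = f.tell()            # remember where this record starts
        data = payload.encode("utf-8")
        f.write(struct.pack(">H", len(data)))
        f.write(data)
    return index

def read_record(f, index, key):
    """Seek directly to the record for key and read exactly its bytes."""
    f.seek(index[key])
    (length,) = struct.unpack(">H", f.read(2))
    return f.read(length).decode("utf-8")

# Hypothetical records of different lengths.
demo = io.BytesIO()
index = write_records(demo, [
    ("Ames", "Ames|Mary|123 Maple|Stillwater|OK|74075"),
    ("Mason", "Mason|Alan|90 Eastgate|Ada|OK|74820"),
])
mason = read_record(demo, index, "Mason")
```

The records have different lengths, so no RRN arithmetic is possible, yet the index still lets us seek to "Mason" without reading "Ames" first.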



Record Structure
► Choosing a record structure and record length within a fixed-length record. Two approaches:
  – Fixed-length fields in the record (simple but problematic).
  – Varying field boundaries within the fixed-length record.
► Header records are often used at the beginning of the file to hold some general information about the file to assist in its future use.
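The two approaches can be compared with a small sketch. The 64-byte record size, the field widths, and the `|` delimiter are all assumptions chosen for illustration. It shows one reason fixed-length fields are problematic: a value longer than its field does not fit, even when the record as a whole has room.

```python
RECORD_LENGTH = 64           # hypothetical fixed record size in bytes
FIELD_WIDTHS = (20, 20, 24)  # hypothetical field widths; they sum to 64

def pack_fixed_fields(fields):
    """Approach 1: every field occupies a fixed number of bytes."""
    out = b""
    for value, width in zip(fields, FIELD_WIDTHS):
        data = value.encode("ascii")
        if len(data) > width:
            raise ValueError(f"{value!r} does not fit in {width} bytes")
        out += data.ljust(width)          # pad the field with spaces
    return out

def unpack_fixed_fields(record):
    fields, pos = [], 0
    for width in FIELD_WIDTHS:
        fields.append(record[pos:pos + width].rstrip().decode("ascii"))
        pos += width
    return fields

def pack_delimited(fields, delimiter=b"|"):
    """Approach 2: field boundaries vary; only the whole record is padded."""
    data = b"".join(f.encode("ascii") + delimiter for f in fields)
    if len(data) > RECORD_LENGTH:
        raise ValueError("fields do not fit in one record")
    return data.ljust(RECORD_LENGTH)

def unpack_delimited(record, delimiter=b"|"):
    # Strip the record padding, then drop the empty piece after the last delimiter.
    return [f.decode("ascii") for f in record.rstrip().split(delimiter)[:-1]]
```

Both packers produce records of exactly `RECORD_LENGTH` bytes, so either layout remains compatible with direct access by RRN.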


Beyond Record Structure


►Headers and Self-Describing Files
►Metadata



Headers & self-describing fields
– Want to keep the user from having to know about objects
  • One way is to put information in the file
  • Allows file-access software to understand objects.
  • Abstract data model: how the application views the data rather than how it is viewed by the medium.
– Put more information in the header
  • Makes the file self-describing
  • Information such as:
    – Name for each field
    – Width of each field
    – Number of fields per record
  • Can write a program to read and print
    – Regardless of number of fields per record
    – With any combination of fixed-length fields
  • The more information we put into a file’s header, the less our software needs to know about the specific structure of an individual file.
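A self-describing file of this kind might look like the following sketch. The header layout used here (one ASCII line of `name:width` pairs separated by commas) is an invented convention for the example, as are the function names and sample data. The reader learns the field names, widths, and field count from the header alone.

```python
import io

def write_file(f, field_specs, rows):
    """Write a header line describing the fields, then fixed-length records.

    field_specs is a list of (name, width) pairs. Names must not contain
    ':' or ','. Values longer than their field are truncated.
    """
    header = ",".join(f"{name}:{width}" for name, width in field_specs)
    f.write(header.encode("ascii") + b"\n")
    for row in rows:
        for value, (_name, width) in zip(row, field_specs):
            f.write(value.encode("ascii")[:width].ljust(width))

def read_file(f):
    """Read any file written by write_file, using only its header."""
    header = f.readline().decode("ascii").strip()
    specs = []
    for item in header.split(","):
        name, width = item.split(":")
        specs.append((name, int(width)))
    record_length = sum(width for _name, width in specs)
    records = []
    while True:
        raw = f.read(record_length)
        if len(raw) < record_length:      # end of file
            break
        rec, pos = {}, 0
        for name, width in specs:
            rec[name] = raw[pos:pos + width].rstrip().decode("ascii")
            pos += width
        records.append(rec)
    return records
```

Because `read_file` hard-codes nothing about the fields, the same reader handles a three-field address file and a ten-field inventory file alike.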

Metadata
►Data About Data
  – Usually in the form of a file header
  – Example in text
    • Astronomy image storage format
      – Where in the sky is the image from?
      – When was it made?
      – …
    • HTML format (name = value)
  – FITS (Flexible Image Transport System)
    • A standard adopted by the International Astronomical Union (IAU).
    • 2880-byte blocks of 80-byte ASCII records.
    • Each record contains a single piece of metadata.
    • See page 177.
Metadata
►Parsing this kind of data
  – Read field name; read field value
  – Convert ASCII value to the type required for storage & use
  – Store converted value into the right variable
  – Why use this type of header?
    • ASCII headers are easy to read and process.
    • If binary is used, we need further metadata to describe it!
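The parsing steps listed above can be sketched as follows. This is a simplified reader for 80-byte `name = value` records in the style of a FITS header, not a complete FITS parser: it ignores continuation records, does not handle a `/` inside a quoted string, and the conversion rules in `convert` are assumptions for illustration. The sample values are made up.

```python
CARD_LENGTH = 80  # each header record is 80 ASCII bytes

def convert(text):
    """Convert an ASCII value to a Python type (string, bool, int, or float)."""
    text = text.strip()
    if len(text) >= 2 and text.startswith("'") and text.endswith("'"):
        return text[1:-1].strip()         # quoted string
    if text in ("T", "F"):
        return text == "T"                # logical value
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text                           # leave anything else as text

def parse_header(block: bytes) -> dict:
    """Read name/value pairs from consecutive 80-byte records up to END."""
    header = {}
    for start in range(0, len(block), CARD_LENGTH):
        card = block[start:start + CARD_LENGTH].decode("ascii")
        name, sep, rest = card.partition("=")
        name = name.strip()
        if name == "END":
            break
        if not sep:
            continue                      # no '=' on this record: skip it
        value = rest.split("/", 1)[0]     # drop a trailing '/ comment'
        header[name] = convert(value)
    return header

# Hypothetical header records, each padded to 80 bytes.
cards = [
    "SIMPLE  =                    T / conforms to the standard",
    "BITPIX  =                   16",
    "NAXIS1  =                  512",
    "OBJECT  = 'M31     '",
    "END",
]
block = b"".join(c.ljust(CARD_LENGTH).encode("ascii") for c in cards)
header = parse_header(block)
```

Since every record is plain ASCII, the header can be inspected with any text tool, which is the readability advantage noted above.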


More Metadata
►Graphics Storage Formats
  – Data
    • Color values for each pixel in image
    • Data compression often used (GIF, JPG)
    • Different color “depth” possibilities
  – Metadata
    • Height & width of image
    • Number of bits per pixel (color depth)
    • If not true color (24 bits / pixel)
      – Color look-up table
        » Normally 256 entries
        » Indexed by values stored for each pixel (normally 1 byte)
        » Contains R/G/B values for color combination
      – Often formatted to be loaded directly into graphics RAM
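How the metadata is used to interpret the pixel data can be shown with a small sketch of a color look-up table. The function name, the tiny 2 × 2 image, and the four-entry palette are assumptions for illustration; a real table would normally have 256 entries.

```python
def decode_indexed_image(width, height, pixel_bytes, palette):
    """Expand 1-byte palette indexes into rows of (R, G, B) tuples.

    width and height come from the file's metadata; palette is the
    color look-up table, a list of (R, G, B) tuples.
    """
    if len(pixel_bytes) != width * height:
        raise ValueError("pixel data does not match the stated image size")
    rows = []
    for y in range(height):
        row = pixel_bytes[y * width:(y + 1) * width]
        rows.append([palette[index] for index in row])  # bytes iterate as ints
    return rows

# Hypothetical 4-entry palette and 2 x 2 image.
palette = [(0, 0, 0), (255, 0, 0), (0, 255, 0), (0, 0, 255)]
image = decode_indexed_image(2, 2, bytes([1, 2, 3, 0]), palette)
```

Without the width, height, and palette from the header, the four data bytes could not be turned back into colored pixels.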

File Portability


Portability and Standardization


►Want to be able to share files
  – Must be accessible on different computers.
  – Must be compatible with the different programs that will access them.
  – Several factors affect portability
    • Operating systems
    • Languages
    • Machine architectures



Factors affecting Portability
– Differences among operating systems
  • In Chapter 2:
    – Saw that DOS adds an extra line-feed character when it sees a CR
    – Not the case on most other file systems
  • The ultimate physical format of the same logical file can vary depending on the OS
– Differences among languages
  • Talked about C++ versus Pascal
    – C++ can have header and data records of different sizes
    – Pascal cannot
  • The physical layout of files may be constrained by the way languages allow file structure definitions


– Differences in machine architectures
  • Saw the problem of “endianness”
    – Multi-byte integers:
      » Store the high-order byte first or the low-order byte first?
      » A PC stores the low-order byte followed by the high-order byte (the 2-byte value hex 0020 is stored as 20 00).
      » A Sun stores the high-order byte followed by the low-order byte (00 20).
  • Word size may affect file layout
    – For a struct item, may allocate:
      » 8 bytes (64-bit word)
      » 4 bytes (32-bit word)
      » 3 bytes (24-bit word)
– Different encodings for text
  • ASCII
  • EBCDIC
  • Maybe other problems with international languages
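The byte-order difference can be reproduced with Python's standard `struct` module, where `<` requests little-endian and `>` requests big-endian layout. The variable names are illustrative only; the value is the 2-byte integer hex 0020 (decimal 32).

```python
import struct

value = 0x0020  # decimal 32

# "H" is an unsigned 2-byte integer.
little = struct.pack("<H", value)   # low-order byte first (PC style)
big = struct.pack(">H", value)      # high-order byte first (Sun style)

# Reading little-endian bytes as if they were big-endian gives a wrong value.
(misread,) = struct.unpack(">H", little)
```

The two byte strings hold the same number in opposite orders (`20 00` versus `00 20`), and a reader that assumes the wrong order sees hex 2000 (8192) instead of 32.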



Achieving Portability
– Must determine how to deal with differences among languages, OSs, and hardware
  • It is not a trivial matter
  • The text offers some guidelines
– Agree on a standard physical record format
  • A standard physical format is one that is represented the same physically, no matter what language, machine, or operating system is used.
  • FITS is a good example
    – Specifies physical format, keywords, order of keywords, and bit patterns for binary numbers
  • Once a standard is agreed on, stay with it
    – Make the standard extensible
    – Make it simple enough for a wide range of machines, languages, and OSs

– Agree on a standard binary encoding
  • ASCII vs EBCDIC for text
  • Binary numbers have more options
    – IEEE standard
      » Specifies formats for 32-, 64-, and 128-bit floating point
      » Specifies formats for 8-, 16-, and 32-bit integers
      » Most computers follow it
    – XDR (External Data Representation)
      » Specifies IEEE formats
      » Also provides routines to convert between XDR format and host machine format
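As a rough illustration of agreeing on one binary encoding, the sketch below writes a record in a fixed, machine-independent layout: a 4-byte big-endian integer followed by an 8-byte IEEE double, which are the representations XDR uses for these two types. It is not an XDR library, only Python's `struct` module with an explicit byte order; the record contents and the names `PORTABLE`, `encode`, and `decode` are assumptions.

```python
import struct

# ">" selects big-endian byte order with standard sizes and no padding:
# "i" is a 4-byte signed integer, "d" is an 8-byte IEEE 754 double.
PORTABLE = struct.Struct(">id")

def encode(count: int, measurement: float) -> bytes:
    """Convert host values to the agreed external format."""
    return PORTABLE.pack(count, measurement)

def decode(data: bytes):
    """Convert the agreed external format back to host values."""
    return PORTABLE.unpack(data)

encoded = encode(1, 1.5)
count, measurement = decode(encoded)
```

Every machine that uses this layout produces the same 12 bytes for the same values, whatever its native byte order or alignment rules.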



– Number and text conversion
  • May not want conversions all the time
    – Wastes time on every read/write
    – May lose some accuracy
  • But may need conversion for different platforms
    – Can write routines to convert among all encodings
      » n encodings require n(n-1) translators!
    – Better to use a standard intermediate format
      » Such as XDR
      » Fewer translators, but 2 translations for each transfer between platforms
      » 2n instead of n(n-1).
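The two counts are easy to compare numerically. The function names below are made up for this sketch; the formulas are the ones given above.

```python
def direct_translators(n: int) -> int:
    """One translator for each ordered pair of distinct encodings."""
    return n * (n - 1)

def intermediate_translators(n: int) -> int:
    """One translator to and one from the intermediate format per encoding."""
    return 2 * n

# Compare the two schemes for a few platform counts.
for n in (2, 3, 5, 10):
    print(n, direct_translators(n), intermediate_translators(n))
```

For five platforms the direct scheme needs 20 translators and the intermediate scheme needs 10. The two schemes break even at n = 3, and the intermediate format wins for every larger n.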


[Figure: two diagrams of file conversion among five platforms (IBM, Vax, Cray, Sun 3, IBM PC). Top: direct conversion between every pair of platforms. Bottom: conversion through XDR as an intermediate format.]