Untitled Document

GROUP 4: TOPIC - FILES

GROUP MEMBERS:
- BILL
- PETER
- ANDRICK
- MARIUS
- KINGSTON

File Storage (HDD, SSD, Flash Drive, Cloud Storage)

File storage is an essential aspect of modern computing, involving different types of media and
technologies to save, retrieve, and manage digital data. Below are the primary types of file
storage:

1. Hard Disk Drive (HDD)

● Structure: HDDs use spinning magnetic disks (platters) to store data. The read/write
head moves across these platters to locate and manipulate data.
● Capacity: Typically offers large storage capacities at a low cost per gigabyte (individual
drives of 20 TB or more are available).
● Speed: Slower than other forms of storage because it relies on mechanical movement. HDDs
have a typical sequential read/write speed of around 80-160 MB/s.
● Durability: Due to moving parts, HDDs are more prone to physical damage from drops
or impacts.
● Usage: Ideal for bulk storage where speed is not a critical factor, like backups, archiving,
or storing large media files.

2. Solid-State Drive (SSD)

● Structure: SSDs use NAND flash memory to store data, with no moving parts, making
them faster and more reliable than HDDs.
● Capacity: Typically offers lower storage capacities than HDDs at a similar price (consumer
models commonly reach a few terabytes), but this is improving with time.
● Speed: Much faster than HDDs, with read/write speeds ranging from about 200 MB/s to over
3,500 MB/s depending on the model and interface (SATA drives top out near 550 MB/s, while
NVMe drives go well beyond that).
● Durability: More durable against physical shock due to the lack of moving parts, although
flash memory cells tolerate only a limited number of write cycles.
● Usage: Ideal for system drives, gaming, or high-performance applications that require
fast data access.

3. Flash Drive (USB Drive)

● Structure: Flash drives use the same type of NAND flash memory as SSDs but come in
smaller, more portable formats.
● Capacity: Usually ranges from a few gigabytes to a couple of terabytes.
● Speed: Generally slower than SSDs, but some high-end USB drives offer competitive
speeds.
● Durability: Portable and fairly durable, although they can still be prone to damage if
mishandled.
● Usage: Used for portable storage, transferring files between computers, or booting
operating systems.

4. Cloud Storage

● Structure: Cloud storage refers to storing data on remote servers managed by
third-party providers, accessible via the internet.
● Capacity: Virtually limitless, depending on the service provider and subscription plan.
● Speed: Dependent on the user's internet speed and the provider’s infrastructure.
● Durability: Highly reliable, with most cloud services offering redundant storage across
multiple locations to protect against data loss.
● Usage: Ideal for remote access, collaboration, and backup of important files. Examples
include Google Drive, Dropbox, and Microsoft OneDrive.

Advanced File Concepts

As file storage technologies evolve, advanced file management and optimization techniques
become increasingly important. Here are some key concepts:

1. File Fragmentation

● Definition: Occurs when a file is broken into non-contiguous clusters on a storage
medium. As files grow, new data may be written to separate areas of the disk, leading to
slower file access times.
● Impact: Fragmentation matters most on HDDs, where the read/write head has to
move across different parts of the disk to read a fragmented file, slowing performance.
SSDs are far less affected due to their lack of moving parts, though heavy fragmentation
can still add some overhead.
● Solution: Defragmentation tools reorganize data so that files are stored in contiguous
blocks, improving access speeds on HDDs. Defragmenting an SSD is generally not
recommended, since it adds write wear for little benefit.

2. File Compression

● Definition: A process that reduces the size of a file by encoding data more efficiently,
which helps save storage space and reduce transmission time.
● Types:
○ Lossless Compression: Preserves the original data, allowing the file to be fully
restored to its original form. Example formats: ZIP, PNG.
○ Lossy Compression: Discards some data to achieve higher compression ratios,
usually used for media files like images, videos, or audio. Example formats:
JPEG, MP3.
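
The lossless case can be illustrated with a minimal sketch using Python's standard zlib module. The sample data and the helper name compress_roundtrip are assumptions made for illustration; the point is only that the decompressed bytes match the original exactly.

```python
import zlib

def compress_roundtrip(data):
    """Compress data losslessly, then restore it (hypothetical helper)."""
    compressed = zlib.compress(data, level=9)
    restored = zlib.decompress(compressed)
    return compressed, restored

# Repetitive data compresses well; this sample is made up for the example.
sample = b"file storage " * 200
compressed, restored = compress_roundtrip(sample)
print(len(sample), len(compressed), restored == sample)
```

Lossy formats such as JPEG or MP3 cannot pass this round-trip check, because some of the original data is discarded during encoding.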

3. File Encryption

● Definition: The process of converting files into unreadable formats (ciphertext) using
cryptographic algorithms, ensuring data security during storage or transmission.
● Types:
○ Symmetric Encryption: Uses the same key for both encryption and decryption
(e.g., AES).
○ Asymmetric Encryption: Uses a key pair: data encrypted with the public key can
only be decrypted with the matching private key (e.g., RSA).
● Usage: Encryption is critical for protecting sensitive data, especially in industries like
finance, healthcare, or when using cloud storage services.

4. File Caching

● Definition: A performance-enhancing technique where frequently accessed data is
stored in a temporary storage area (cache) to allow quicker access.
● Types:
○ Disk Caching: Stores frequently accessed disk data in faster memory (like RAM)
to speed up future access.
○ Web Caching: Temporarily stores web pages or content to reduce loading times
for subsequent visits.
○ Memory Caching: Keeps data in memory to avoid accessing the slower disk
storage repeatedly.
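
As a rough illustration of memory caching, the sketch below keeps a file's contents in a dictionary after the first read, so repeated requests do not touch the disk. The class name CachedReader and its disk_reads counter are hypothetical and exist only for this example.

```python
import os
import tempfile

class CachedReader:
    """Keeps file contents in memory after the first read (hypothetical helper)."""

    def __init__(self):
        self._cache = {}      # path -> bytes already read from disk
        self.disk_reads = 0   # how many times the disk was actually touched

    def read(self, path):
        if path not in self._cache:
            with open(path, "rb") as f:
                self._cache[path] = f.read()
            self.disk_reads += 1
        return self._cache[path]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    with open(path, "wb") as f:
        f.write(b"payload")
    reader = CachedReader()
    first = reader.read(path)    # served from disk
    second = reader.read(path)   # served from the cache
```

This sketch never invalidates its cache, so it would return stale data if the file changed on disk; real caches need an eviction or refresh policy.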

File Organization

1. Sequential File Organization

In sequential file organization, records are stored in a specific order, usually based on a key,
such as an ID or name. The records are arranged one after another in sequence. To retrieve a
record, the system reads the file starting from the beginning and continues in sequence until it
finds the desired record. This method is simple and often used when all records need to be
processed in a specific order, like in payroll or billing systems.

2. Indexed File Organization

In indexed file organization, an index is created to help locate records quickly. The index
contains pointers that indicate where records are stored in the main file. This allows the system
to find a record by searching the index, then going directly to the specific location in the file,
rather than reading the entire file sequentially. This method is ideal when random access to
records is needed, as the index makes searching more efficient.
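
The idea can be sketched in a few lines, assuming a text file with one comma-separated record per line and the key in the first field. The function names build_index and fetch are hypothetical; the index simply maps each key to the byte offset of its record.

```python
import io

def build_index(f):
    """Scan the file once and map each record key to its byte offset."""
    index = {}
    f.seek(0)
    while True:
        offset = f.tell()
        line = f.readline()
        if not line:
            break
        key = line.split(b",", 1)[0].decode()
        index[key] = offset
    return index

def fetch(f, index, key):
    """Jump directly to the record's offset instead of scanning the file."""
    f.seek(index[key])
    return f.readline().decode().rstrip("\n")

# An in-memory stand-in for the main data file.
data = io.BytesIO(b"101,Ada\n102,Grace\n103,Linus\n")
index = build_index(data)
print(fetch(data, index, "102"))
```

Production systems usually keep the index in a tree structure on disk rather than in a dictionary, but the lookup-then-seek pattern is the same.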

3. Hashed File Organization

In hashed file organization, records are stored based on a hash function, which transforms a
key (like an ID) into a specific location in the file. The hash function determines the position of
each record in the storage space, allowing for direct access to records without needing to
search sequentially or use an index. This method is particularly useful for scenarios that require
frequent lookups and quick access, such as databases where keys must be quickly matched
with their corresponding records.
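
A minimal sketch of this scheme follows. It assumes fixed-size slots, positive integer keys (key 0 is reserved here to mark an empty slot), a simple modulo hash, and linear probing when two keys land on the same slot. None of these choices come from the text above; they are one common way to realize it.

```python
import io
import struct

RECORD = struct.Struct("<i16s")   # assumed layout: 4-byte ID + 16-byte name
SLOTS = 8                         # assumed number of slots in the file

def new_file():
    """Create an in-memory file of empty (all-zero) slots."""
    return io.BytesIO(bytes(RECORD.size * SLOTS))

def slot_for(key, probe):
    """Hash the key to a slot, stepping forward on each collision."""
    return (key % SLOTS + probe) % SLOTS

def put(f, key, name):
    for probe in range(SLOTS):
        pos = slot_for(key, probe) * RECORD.size
        f.seek(pos)
        stored_key, _ = RECORD.unpack(f.read(RECORD.size))
        if stored_key in (0, key):        # empty slot, or same key: write here
            f.seek(pos)
            f.write(RECORD.pack(key, name.encode()))
            return
    raise RuntimeError("file is full")

def get(f, key):
    for probe in range(SLOTS):
        f.seek(slot_for(key, probe) * RECORD.size)
        stored_key, name = RECORD.unpack(f.read(RECORD.size))
        if stored_key == key:
            return name.rstrip(b"\0").decode()
        if stored_key == 0:               # reached an empty slot: key is absent
            return None
    return None
```

For example, keys 3 and 11 both hash to slot 3 when SLOTS is 8, so the second one is stored in the next free slot and is still found by get.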

File Structures

File structures refer to how data is organized, stored, and accessed in memory or on a storage
medium. They allow efficient data manipulation, retrieval, and storage. Below are explanations
of common file structures: B-Tree, B+ Tree, Hash Table, Linked List, Stack, and Queue.

1. B-Tree

A B-Tree is a self-balancing tree data structure that maintains sorted data and allows searches,
insertions, deletions, and sequential access in logarithmic time. It is widely used in databases
and file systems to manage large amounts of data efficiently.

● Structure: Each node in a B-Tree contains multiple keys and children, and all leaf nodes
are at the same level.
● Properties:
○ Every node contains a certain number of keys (within a pre-defined range).
○ Nodes are sorted, and searches are performed by comparing keys.
○ B-Trees are balanced, meaning the tree height grows logarithmically, ensuring
efficient operations.
● Use Case: Database indexing, where quick access to large datasets is required.
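
To make the search procedure concrete, here is a hedged sketch of lookup only (insertion and node splitting are omitted). The BTreeNode class and the small hand-built tree are assumptions for illustration.

```python
from bisect import bisect_left

class BTreeNode:
    def __init__(self, keys, children=None):
        self.keys = keys                 # sorted keys stored in this node
        self.children = children or []   # empty for a leaf, else len(keys) + 1 subtrees

def btree_search(node, key):
    """Return True if key is in the tree, descending one level per step."""
    while node is not None:
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return True
        if not node.children:            # reached a leaf without finding the key
            return False
        node = node.children[i]
    return False

# A hand-built two-level tree; all leaves sit at the same depth.
root = BTreeNode(
    [20, 40],
    [BTreeNode([5, 10]), BTreeNode([25, 30]), BTreeNode([45, 50])],
)
print(btree_search(root, 30), btree_search(root, 35))
```

Because each node holds several keys, the tree stays shallow, which keeps the number of node reads per lookup small.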

2. B+ Tree

A B+ Tree is an extension of the B-Tree structure, but with an important distinction: all data is
stored at the leaf level, and internal nodes only store keys for navigation.

● Structure:
○ Internal nodes contain keys that guide searches.
○ Leaf nodes store actual data and are linked together, enabling efficient sequential
access.
● Properties:
○ Like B-Trees, B+ Trees are balanced and maintain sorted data.
○ B+ Trees provide faster sequential access due to the linked leaf nodes.
● Use Case: File systems and database indexes, where both random and sequential
access are needed, such as in relational databases.
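
The benefit of linked leaves shows up in range queries. The sketch below hand-builds three leaves and a single internal node that holds only separator keys; the names Leaf, separators, and range_scan are hypothetical, and tree construction is left out.

```python
from bisect import bisect_right

class Leaf:
    def __init__(self, items):
        self.items = items   # sorted (key, value) pairs; all data lives in leaves
        self.next = None     # link to the next leaf for sequential scans

# Three linked leaves and one internal node that stores only separator keys.
leaf_a = Leaf([(5, "e"), (10, "j")])
leaf_b = Leaf([(20, "t"), (30, "x")])
leaf_c = Leaf([(40, "y"), (50, "z")])
leaf_a.next, leaf_b.next = leaf_b, leaf_c
separators, leaves = [20, 40], [leaf_a, leaf_b, leaf_c]

def range_scan(low, high):
    """Use the separators to find the first leaf, then follow the leaf links."""
    leaf = leaves[bisect_right(separators, low)]
    result = []
    while leaf is not None:
        for key, value in leaf.items:
            if key > high:
                return result
            if key >= low:
                result.append((key, value))
        leaf = leaf.next
    return result

print(range_scan(10, 40))
```

After the starting leaf is found, the scan never goes back up to the internal node, which is what makes sequential access cheap.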

3. Hash Table

A Hash Table is a data structure that maps keys to values using a hash function, which
computes an index (or hash code) from the key to determine where to store or retrieve the
corresponding value.

● Structure:
○ The table consists of an array of "buckets" or "slots" where data is stored.
○ A hash function computes the index from a key, determining where the value is
stored.
● Properties:
○ It provides constant-time complexity (O(1)) for search, insertion, and deletion
operations under ideal conditions.
○ Hash collisions, where two keys generate the same hash code, are handled
using techniques like chaining or open addressing.
● Use Case: Efficient key-value storage, such as in dictionaries, caches, and database
indexing.
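
Collision handling by chaining can be sketched as follows. The class name ChainedHashTable is an assumption; each bucket is a plain list that holds every key/value pair hashing to that slot.

```python
class ChainedHashTable:
    def __init__(self, buckets=8):
        self._buckets = [[] for _ in range(buckets)]

    def _bucket(self, key):
        # The hash function picks the bucket; colliding keys share a list.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:       # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for existing_key, value in self._bucket(key):
            if existing_key == key:
                return value
        return default
```

With only two buckets, the integer keys 1 and 3 collide, yet both remain retrievable because the chain is searched by key. Lookups stay close to O(1) only while the chains remain short.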

4. Linked List

A Linked List is a linear data structure consisting of nodes, where each node contains a data
element and a pointer/reference to the next node in the sequence.

● Structure:
○ Each node holds data and a reference (or pointer) to the next node.
○ The list can be singly linked (pointing only to the next node) or doubly linked (with
pointers to both the next and previous nodes).
● Properties:
○ Dynamic size: The size of a linked list can grow or shrink during runtime.
○ Efficient insertions and deletions: These operations can be performed in constant
time if the node position is known.
● Use Case: Managing dynamic collections of data, such as undo/redo functionality in
applications, or handling memory allocation.
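
A singly linked list and its constant-time insert and delete operations might look like the sketch below. The helper names insert_after, delete_after, and to_list are made up for this example.

```python
class Node:
    def __init__(self, data, next=None):
        self.data = data
        self.next = next   # reference to the following node, or None at the end

def insert_after(node, data):
    """O(1) insertion once the position (node) is already known."""
    node.next = Node(data, node.next)
    return node.next

def delete_after(node):
    """O(1) removal of the node that follows the given one."""
    if node.next is not None:
        node.next = node.next.next

def to_list(head):
    """Walk the chain from the head and collect the data elements."""
    out = []
    while head is not None:
        out.append(head.data)
        head = head.next
    return out
```

Finding the position in the first place still requires walking the list, so only the pointer update itself is constant time.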

5. Stack

A Stack is a linear data structure that follows the Last-In-First-Out (LIFO) principle, where the
last element added to the stack is the first to be removed.

● Structure:
○ Stack operations are mainly performed at one end (the "top").
○ Operations include push (add element to the top) and pop (remove the top
element).
● Properties:
○ Only the top element can be accessed directly.
○ LIFO access pattern, useful for problems where the most recent data is needed
first.
● Use Case: Used in function call management, expression evaluation, and backtracking
algorithms.
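
Bracket matching is a small example of the LIFO pattern in expression evaluation: the most recently opened bracket must be the first one closed. The function name is_balanced is an assumption, and a Python list stands in for the stack.

```python
def is_balanced(text):
    """Check that every closing bracket matches the most recent opener."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in text:
        if ch in pairs.values():
            stack.append(ch)                      # push an opening bracket
        elif ch in pairs:
            if not stack or stack.pop() != pairs[ch]:
                return False                      # pop must return the matching opener
    return not stack                              # leftover openers mean imbalance

print(is_balanced("(a[b]{c})"), is_balanced("(]"))
```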

6. Queue

A Queue is a linear data structure that follows the First-In-First-Out (FIFO) principle, where the
first element added is the first to be removed.

● Structure:
○ Elements are added at the back (enqueue) and removed from the front
(dequeue).
● Properties:
○ FIFO access pattern, making it suitable for situations where data must be
processed in the order it arrives.
● Use Case: Used in scheduling tasks, handling requests in systems (like print queues),
and managing buffers in communication systems.
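
A print queue can be sketched with collections.deque from the standard library, which supports efficient operations at both ends. The file names below are invented for the example.

```python
from collections import deque

print_queue = deque()
print_queue.append("report.pdf")    # enqueue at the back
print_queue.append("photo.png")

first = print_queue.popleft()       # dequeue from the front: the oldest job
print_queue.append("notes.txt")     # new jobs wait behind the existing ones

remaining = list(print_queue)
print(first, remaining)
```

A plain list would also work, but removing from the front of a list shifts every remaining element, while deque.popleft does not.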

File Access Methods

1. Sequential Access

In sequential access, records are accessed in the same order they are stored. The system
reads or processes data one record at a time, starting from the beginning of the file and
progressing through each record until the desired one is found.

● How It Works: The system must go through each record in sequence, meaning if you
want to access a record near the end of the file, you must first process all the earlier
records.
● Use Cases: Sequential access is useful when all data needs to be processed, such as
in payroll systems, log files, or billing systems where each record is processed one after
another. Batch processing systems often utilize this method.
● Analogy: Sequential access is like reading a book from page one onward—if you need
to get to chapter ten, you must flip through the previous chapters first.

2. Random Access

In random access, records can be accessed directly without having to read through preceding
records. Each record in the file is identified by a unique address, and the system can jump to
that address immediately, making retrieval fast and efficient.

● How It Works: When you want to access a specific record, the system calculates the
position of that record and goes straight to it, bypassing all other data.
● Use Cases: Random access is commonly used in database systems where records
need to be retrieved quickly based on specific queries. It's ideal for tasks where data is
accessed frequently but not necessarily in a specific order, such as file systems,
databases, and memory storage in programs.
● Analogy: This is like skipping directly to a specific page number in a book without
flipping through the earlier pages.
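
With fixed-size records, the position calculation is just the record number multiplied by the record size. The sketch below assumes a 24-byte layout (a 4-byte ID followed by a 20-byte name); the layout and function names are illustrative, not taken from any particular system.

```python
import io
import struct

RECORD = struct.Struct("<i20s")   # assumed layout: 4-byte ID + 20-byte name

def write_records(f, rows):
    for rec_id, name in rows:
        f.write(RECORD.pack(rec_id, name.encode()))

def read_record(f, n):
    """Jump straight to record n (0-based) without reading earlier records."""
    f.seek(n * RECORD.size)
    rec_id, name = RECORD.unpack(f.read(RECORD.size))
    return rec_id, name.rstrip(b"\0").decode()

f = io.BytesIO()   # in-memory stand-in for a file on disk
write_records(f, [(1, "Ada"), (2, "Grace"), (3, "Linus")])
print(read_record(f, 2))
```

Variable-length records break this arithmetic, which is one reason such files usually need an index to support random access.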

3. Direct Access

Direct access is similar to random access but is based on a key or identifier. Instead of going to
a physical position based on order, the system calculates the location of the data based on a
unique key (such as an ID or name).

● How It Works: A key, like a customer ID or product code, is provided, and the system
uses an algorithm (often a hash function) to determine where that record is stored. It
then accesses the record directly.
● Use Cases: Direct access is often used in database management systems (DBMS) and
large-scale file storage systems where records are indexed by keys. It's commonly
applied in tasks like searching a database for a specific user by ID.
● Analogy: This is like looking up a word in a dictionary. You don’t need to read every
word; you go directly to the section where the word is found based on its first letter.

File Manipulation

File manipulation refers to operations performed on the contents of a file, such as inserting,
deleting, modifying, and searching records. These operations allow for effective management
and updating of data within a file.

1. Insertion

Insertion is the process of adding new data or records to a file. How this is handled depends on
the organization of the file.

● Sequential File: In a sequential file, new records are usually added at the end of the file.
If the file is ordered by a key, the system may need to reorganize the entire file after
insertion to maintain the correct order.
● Indexed or Random Access File: In files that use random access or an index, new
records can be inserted directly into the proper location based on the key or index.
● How It Works: Insertion may involve either appending the record at the end of the file
(for unsorted files) or adjusting existing records to fit the new data while maintaining
order.
● Example: Adding a new student record to a school database, with the system
determining where to place the record based on the student ID or name.
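
For a file kept in key order, insertion means finding the right position and shifting later records. The sketch below models the file as an in-memory list of (student ID, name) tuples, which is an assumption for brevity, and uses bisect.insort from the standard library.

```python
import bisect

def insert_record(records, record):
    """Insert a (key, value) record while keeping the list sorted by key."""
    bisect.insort(records, record)   # binary search for the slot, then shift

students = [(1001, "Ada"), (1003, "Linus")]
insert_record(students, (1002, "Grace"))
print(students)
```

For an unsorted file, a plain append at the end is enough and avoids the shifting cost.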

2. Deletion

Deletion refers to removing a specific record from a file. Deletion processes vary depending on
the file organization.

● Sequential File: In sequential files, records are often marked as "deleted" but not
physically removed, creating gaps that might be filled by future insertions.
● Indexed or Random Access File: In random or indexed files, the record can be
removed directly, but the system may need to update or re-index the file to maintain
efficiency.
● How It Works: When a record is deleted, the file may leave empty space or gaps, which
can lead to fragmentation. This might require compaction or reorganization of the file
later.
● Example: Deleting a customer's details from an e-commerce platform’s database, with
the system freeing the space or marking it for future use.
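
The mark-then-compact approach can be sketched as below, again treating the file as a list of (key, value) records. Using None as the "deleted" marker and the names delete_record and compact are assumptions for illustration.

```python
def delete_record(records, key):
    """Logical delete: mark the slot as free instead of shifting other records."""
    for i, rec in enumerate(records):
        if rec is not None and rec[0] == key:
            records[i] = None
            return True
    return False

def compact(records):
    """Physically remove the gaps left behind by logical deletes."""
    return [rec for rec in records if rec is not None]

customers = [(1, "Ada"), (2, "Grace"), (3, "Linus")]
deleted = delete_record(customers, 2)
print(customers, compact(customers))
```

Marking is cheap at delete time, while compaction is deferred until the accumulated gaps justify rewriting the file.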

3. Modification

Modification involves changing the contents of an existing record in the file. Depending on the
size and type of file, this could be a simple operation or require more complex handling.

● Sequential File: If the file uses fixed-size records, modification is straightforward. If not,
modifying the size of a record may require rewriting portions of the file or reorganizing it.
● Indexed or Random Access File: In random access files, modification is performed by
locating the record via an index or key and updating it.
● How It Works: The system first finds the record, either sequentially or directly, and then
updates its contents. If the modified record is larger than the original, it may be
necessary to move the record to another location or reallocate space.
● Example: Updating an employee's salary in the HR database by modifying their existing
record.

4. Searching

Searching is the process of locating a specific record within a file. The method used depends
on how the file is organized.

● Sequential File: In sequential access files, the system scans each record one by one
until it finds the desired record.
● Indexed or Random Access File: In indexed files, the system uses the index to jump
directly to the correct record. In random access files, it retrieves the record based on its
address or key.
● How It Works: Searching can be time-consuming in sequential files since it requires
scanning each record in order. In random and indexed files, search efficiency is greatly
improved by using keys or indices to locate records.
● Example: Searching for a specific transaction in a bank's database by transaction ID,
where the system retrieves the record directly.

Data Structures Used in File Systems

File systems utilize specific data structures to efficiently organize, manage, and retrieve data.
Some of the key data structures used in file systems are Inode, Directory Structure, and FAT
(File Allocation Table).

1. Inode

An inode (index node) is a data structure used by many file systems, such as those in Unix and
Linux, to store information about a file or directory. It doesn’t store the file's name or actual data
but contains metadata and pointers to the file’s data blocks.

● Contents of an Inode:
○ File metadata, such as size, permissions, owner, timestamps, etc.
○ Pointers to the data blocks where the actual file content is stored.
● How It Works: When a file is created, an inode is allocated, which contains all relevant
file metadata. The system then uses this inode to locate and manage the file’s data on
disk.
● Use Case: Inodes are fundamental in file systems like ext3 or ext4 used in Linux,
enabling efficient file management by separating file metadata from the actual file data.
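
On Unix-like systems, much of what an inode holds can be inspected through os.stat. The sketch below creates a temporary file and reads a few fields back; the helper name describe and the choice of fields are assumptions for the example.

```python
import os
import stat
import tempfile

def describe(path):
    """Return a few of the fields the OS reports from the file's metadata."""
    info = os.stat(path)
    return {
        "inode": info.st_ino,                        # inode number
        "size": info.st_size,                        # size in bytes
        "permissions": stat.filemode(info.st_mode),  # e.g. a string like -rw-------
        "modified": info.st_mtime,                   # last modification time
    }

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")
    with open(path, "w") as f:
        f.write("hello")
    details = describe(path)

print(details)
```

Note that the file name does not appear among the returned fields, in line with the point that the inode does not store it.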

2. Directory Structure

A directory structure is the hierarchical layout that organizes files and directories within a file
system. Directories are essentially special types of files that list file names and corresponding
inodes.

● Structure:
○ Directories are organized in a tree-like structure, starting from a root directory ("/"
in Unix/Linux).
○ Each directory can contain files and other directories, forming a hierarchy.
● How It Works: The directory stores file names and references (e.g., inode numbers) to
locate the files. When you access a file, the system first checks the directory to retrieve
the inode and then fetches the data from the storage.
● Use Case: Directory structures are used universally in file systems (like NTFS, ext4,
HFS+) to enable hierarchical organization of data, making it easier to navigate through
files and directories.
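
The name-to-inode mapping held by a directory can be observed with os.scandir. This is a small sketch on a temporary directory tree; the function name list_directory and the entry names are hypothetical.

```python
import os
import tempfile

def list_directory(path):
    """Map each entry name in a directory to its inode number."""
    with os.scandir(path) as entries:
        return {entry.name: entry.inode() for entry in entries}

with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "docs"))
    with open(os.path.join(root, "docs", "a.txt"), "w") as f:
        f.write("x")
    top = list_directory(root)                          # contains only "docs"
    docs = list_directory(os.path.join(root, "docs"))   # contains only "a.txt"

print(top, docs)
```

Resolving a path such as docs/a.txt repeats this lookup once per path component, walking down the tree from the starting directory.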

3. FAT (File Allocation Table)

The File Allocation Table (FAT) is a data structure used in FAT file systems (e.g., FAT12,
FAT16, FAT32). It tracks where files are stored on the disk by maintaining a mapping of which
clusters (or blocks) belong to which files.

● Structure:
○ FAT is essentially an array where each entry corresponds to a block (cluster) on
the disk.
○ Each file consists of one or more clusters, and FAT entries point to the next
cluster of the file.
● How It Works: When a file is saved, the FAT is updated to record which clusters are
used by that file. When you access the file, the system reads the FAT to find where the
data blocks are stored.
● Use Case: FAT is widely used in simpler file systems, such as those on USB drives and
memory cards, due to its compatibility and ease of use across operating systems like
Windows and macOS.
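
Following a file's cluster chain through the table can be sketched as below. The table contents and the END marker are simplified assumptions; real FAT variants use reserved numeric ranges for end-of-chain and free clusters.

```python
END = -1   # assumed end-of-chain marker for this sketch

def cluster_chain(fat, start):
    """Follow FAT entries from a file's first cluster to its last."""
    chain = []
    cluster = start
    while cluster != END:
        chain.append(cluster)
        cluster = fat[cluster]   # each entry names the file's next cluster
    return chain

# Index = cluster number, value = next cluster of the same file.
fat = [None, None, 5, END, 3, 4]
print(cluster_chain(fat, 2))
```

The chain here visits clusters 2, 5, 4, and 3 in that order, showing that a file's clusters need not be contiguous on disk.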

Applications

The data structures mentioned above are vital to a wide range of applications, including:

1. Database Management Systems (DBMS)

In DBMS, file systems and data structures like inodes and directory structures are crucial for
organizing, storing, and retrieving large datasets efficiently.

● Use of Inodes/Directories: Files storing database records are organized using
hierarchical directory structures, and block pointers (stored in inodes) are used to locate
their data quickly.
● FAT Usage: In older or lightweight systems, FAT may be used to manage database logs
or backup files.

2. Operating Systems (OS)

Operating systems use file system data structures to manage all files on disk, including system
files, user files, and executable programs.

● Inodes: In Unix/Linux-based systems, inodes are heavily relied upon to store file
metadata.
● Directory Structures: The OS organizes files in a directory hierarchy, making navigation
intuitive for users.
● FAT: Earlier versions of Windows used FAT file systems, while modern systems use
more advanced structures like NTFS.

3. File Systems

File systems themselves, like ext4, NTFS, HFS+, or FAT32, are built around data structures
such as inodes, FAT, and directory structures to store and retrieve files efficiently.

● Inode-Based Systems: Unix/Linux file systems heavily depend on inodes for file
management.
● FAT Systems: FAT is often used in portable devices like flash drives, ensuring
compatibility with various OSs.
● Directory Structures: Most file systems use a directory structure to help users organize
and locate files in a hierarchical manner.

4. Compilers

Compilers use file systems to store and access source code, object files, and executables. Data
structures like directory trees and file pointers are crucial for managing multiple files during the
compilation process.

● How They Work: During the compilation process, the compiler reads source code files,
generates object code files, and links them into executables—all managed through file
systems that organize these files.

5. Text Editors

Text editors, such as Notepad or Vim, interact with the file system to open, modify, and save
files. Data structures like inodes and FAT ensure that the editor can efficiently read the file's
content and write back any changes.

● How They Work: When opening a text file, the editor asks the OS to open it and receives a
file descriptor; the file system then uses structures such as the inode or FAT to locate
the file's blocks on disk. When saving changes, the file system allocates space and
updates the relevant data structures to reflect the changes.

Key Terms

1. File Pointer

A file pointer is a variable that holds the address or location of the next byte to be read or
written in a file. It is essential for navigating within a file, especially for operations like reading
from or writing to different parts of a file.
● How It Works: The file pointer moves as data is read from or written to the file. In
random access files, it can be set to specific positions to enable quick access.
● Use Case: When a program opens a file, the file pointer is initialized to the beginning. As
the file is read, the pointer moves forward, and for random access files, it can jump to
specific locations.
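
The pointer's movement can be observed with tell and seek. The sketch uses an in-memory io.BytesIO object as a stand-in for a file opened in binary mode; the ten-byte content is made up for the example.

```python
import io

f = io.BytesIO(b"ABCDEFGHIJ")

start = f.tell()           # pointer starts at the beginning of the file
first = f.read(3)          # reading advances the pointer by three bytes
after_read = f.tell()

f.seek(7)                  # jump directly to byte offset 7
tail = f.read(2)

f.seek(-1, io.SEEK_END)    # position relative to the end of the file
last = f.read()

print(start, first, after_read, tail, last)
```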

2. File Descriptor

A file descriptor is an identifier assigned by the operating system to an open file (on
Unix-like systems, a small non-negative integer that is unique within the process). It
serves as a reference for performing operations like reading, writing, or closing the file.

● How It Works: When a file is opened, the OS assigns a file descriptor. The application
uses this descriptor to refer to the file in all subsequent operations (e.g., reading,
writing).
● Use Case: In Unix-based systems, file descriptors are commonly used in system calls,
such as open(), read(), and write().
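
Python's os module exposes thin wrappers over these system calls, which makes the descriptor visible as a plain integer. The sketch below writes and then reads a temporary file; the file name and permission bits are assumptions.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")

    # open() returns a descriptor; every later call refers to the file by it.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    os.write(fd, b"hello")
    os.close(fd)

    fd = os.open(path, os.O_RDONLY)
    fd_is_int = isinstance(fd, int)
    content = os.read(fd, 100)   # read up to 100 bytes
    os.close(fd)                 # release the descriptor when done

print(fd_is_int, content)
```

Higher-level file objects, such as those returned by Python's built-in open, wrap a descriptor like this one and add buffering on top.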

3. File Metadata

File metadata refers to information about a file, such as its name, size, creation date,
modification date, permissions, and ownership. Most of this data is stored in structures like
inodes (on Unix-like systems the name itself lives in the directory entry) and helps the
operating system manage the file.

● How It Works: When a file is created, the OS assigns metadata that helps track the file’s
properties. This metadata is updated whenever the file is modified or its attributes
change.
● Use Case: Metadata is critical in file searches, permissions handling, and system
backups.

4. File System Mounting

Mounting a file system means making it accessible to the operating system by attaching it to a
directory (often called a "mount point"). Once mounted, files within that file system can be
accessed as if they are part of the main file system.

● How It Works: When you mount a drive, such as a USB, the OS assigns a mount point,
allowing users and applications to interact with the drive’s files.
● Use Case: External drives, network shares, and partitions are mounted to make their
contents accessible within the operating system.
