Unit 3
Storage fundamentals
Data Storage
Information Retrieval (IR) can be defined as a software program that deals with the
organization, storage, retrieval, and evaluation of information from document
repositories, particularly textual information. Information retrieval is the activity of
obtaining material, usually documents of an unstructured nature (typically text), that
satisfies an information need from within large collections stored on computers. For
example, information retrieval takes place when a user enters a query into a search
system.
Information retrieval is no longer limited to librarians and professional searchers:
hundreds of millions of people engage in IR every day when they use web search
engines. Information retrieval is believed to be the dominant form of information
access. An IR system helps users find the information they require, but it does not
explicitly return answers to their questions. Instead, it reports the existence and
location of documents that might contain the required information. Information
retrieval also supports users in browsing or filtering document collections and in
processing a set of retrieved documents. A web search system searches over billions of
documents stored on millions of computers. An email program, for example, provides a
spam filter and manual or automatic means of classifying mail so that it can be placed
directly into particular folders.
An IR system has the ability to represent, store, organize, and access information
items. A set of keywords is required to search. Keywords are the terms people type
into search engines; they summarize the description of the information being sought.
What is an IR Model?
An Information Retrieval (IR) model selects and ranks the documents that the user
has asked for in the form of a query. Documents and queries are represented in a
similar manner, so that document selection and ranking can be formalized by a
matching function that returns a retrieval status value (RSV) for each document in
the collection. Many information retrieval systems represent document contents by a
set of descriptors, called terms, belonging to a vocabulary V. An IR model determines
the query-document matching function according to one of four main approaches. One
of them is the estimation of the probability of the user’s relevance rel for each
document d and query q with respect to a set R_q of training documents:
Prob(rel | d, q, R_q).
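To make the idea of a matching function concrete, here is a minimal sketch in Python. It does not implement the probabilistic approach named above; it assumes a much simpler rule in which the RSV is just the number of distinct query terms found in a document. The function names (tokenize, rsv, rank) and the sample collection are hypothetical and chosen only for illustration.

```python
def tokenize(text):
    """Lowercase a string and split it into terms on whitespace."""
    return text.lower().split()


def rsv(query, document):
    """Toy matching function (an assumption, not a standard model):
    the RSV is the number of distinct query terms found in the document."""
    return len(set(tokenize(query)) & set(tokenize(document)))


def rank(query, collection):
    """Return (doc_id, RSV) pairs for every document, highest RSV first."""
    scores = [(doc_id, rsv(query, text)) for doc_id, text in collection.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical three-document collection.
collection = {
    "d1": "primary memory is volatile",
    "d2": "secondary memory is non-volatile storage",
    "d3": "zip files use lossless compression",
}

ranking = rank("volatile primary memory", collection)
print(ranking)  # [('d1', 3), ('d2', 1), ('d3', 0)]
```

Every document receives a value, including those that do not match at all, which mirrors the statement that the matching function returns an RSV for each document in the collection.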
Components of an IR System
Acquisition: In this step, documents and other objects are selected from various
web resources that consist of text-based documents. The required data is collected
by web crawlers and stored in the database.
Representation: It consists of indexing, which uses free-text terms, controlled
vocabulary, and both manual and automatic techniques. For example, abstracting
involves summarizing, and a bibliographic description contains the author, title,
sources, data, and metadata.
File Organization: There are two basic file organization methods, which can also
be combined. Sequential: records are stored document by document. Inverted:
records are stored term by term, with a list of records under each term.
Combination: both methods are used together.
Query: An IR process starts when a user enters a query into the system. Queries
are formal statements of information needs, for example, search strings in web
search engines. In information retrieval, a query does not uniquely identify a single
object in the collection. Instead, several objects may match the query, perhaps with
different degrees of relevancy.
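The inverted file organization and the query step described above can be sketched in a few lines of Python. This is a minimal illustration under simple assumptions: terms are split on whitespace, and a document matches if it contains at least one query term. The helper names (build_inverted_index, search) and the sample documents are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return ids of documents containing at least one query term."""
    matches = set()
    for term in query.lower().split():
        matches.update(index.get(term, []))
    return sorted(matches)


# Sequential organization: one record per document.
documents = {
    1: "floppy disk magnetic storage",
    2: "compact disc optical storage",
    3: "zip archive compression",
}

# Inverted organization: one record per term.
index = build_inverted_index(documents)
print(index["storage"])                    # [1, 2]
print(search(index, "magnetic storage"))   # [1, 2]
```

Note that the query "magnetic storage" matches two documents rather than identifying a single object, which is the behaviour described for IR queries.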
Difference Between Information Retrieval and Data Retrieval
Information Retrieval | Data Retrieval
The software program that deals with the organization, storage, retrieval, and evaluation of information from document repositories, particularly textual information. | Data retrieval deals with obtaining data from a database management system such as ODBMS. It is a process of identifying and retrieving the data from the database, based on the query provided by the user or application.
Small errors are likely to go unnoticed. | A single error object means total failure.
Does not provide a solution to the user of the database system. | Provides solutions to the user of the database system.
The User Task: The user first has to translate the information need into a query. In
an information retrieval system, the query is a set of words that conveys the
semantics of the required information, whereas in a data retrieval system a query
expression conveys the constraints that the retrieved objects must satisfy.
Example: A user sets out to search for one thing but ends up looking at something
else. This means that the user is browsing and not searching. The above figure shows
the interaction of the user through different tasks.
Logical View of the Documents: A long time ago, documents were represented
through a set of index terms or keywords. Nowadays, modern computers can represent a
document by its full set of words. This full set can then be reduced to a smaller set
of representative keywords, for instance by eliminating stopwords such as articles
and connectives. These operations are called text operations. Text operations reduce
the complexity of the document representation from full text to a set of index terms.
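The text operation of stopword elimination can be sketched as follows. This is a minimal illustration, assuming a tiny hand-written stopword list; real systems use much longer lists. The names STOPWORDS and index_terms are hypothetical.

```python
# A deliberately small stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "and", "or", "but", "of", "in", "on",
             "is", "are", "to", "for"}


def index_terms(text):
    """Reduce full text to index terms: lowercase each word,
    strip surrounding punctuation, and drop stopwords."""
    words = (word.strip(".,;:!?").lower() for word in text.split())
    return [word for word in words if word and word not in STOPWORDS]


terms = index_terms(
    "The index is the data structure for faster retrieval of information."
)
print(terms)
# ['index', 'data', 'structure', 'faster', 'retrieval', 'information']
```

The twelve-word sentence is reduced to six index terms, which illustrates how text operations shrink the document representation.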
Past, Present, and Future of Information Retrieval
1. Early Developments: As the need for large amounts of information grew, it became
necessary to build data structures for faster access. The index is the data
structure for faster retrieval of information. For centuries, indexes were built by
manual categorization into hierarchies.
2. Information Retrieval In Libraries: Libraries were the first to adopt IR systems
for information retrieval. The first generation automated previous technologies,
and search was based on author name and title. The second generation added
searching by subject heading, keywords, etc. The third generation added graphical
interfaces, electronic forms, hypertext features, etc.
3. The Web and Digital Libraries: The web is cheaper than various other sources of
information, it provides greater access to networks thanks to digital communication,
and it gives free access to publish on a larger medium.
Advantages of Information Retrieval
1. Efficient Access: Information retrieval techniques make it possible for users to
easily locate and retrieve vast amounts of data or information.
2. Personalization of Results: User profiling and personalization techniques are used
in information retrieval models to tailor search results to individual preferences and
behaviors.
3. Scalability: Information retrieval models are capable of handling increasing data
volumes.
4. Precision: These systems can provide highly accurate and relevant search results,
reducing the likelihood of irrelevant information appearing in search results.
Disadvantages of Information Retrieval
1. Information Overload: When a lot of information is available, users often face
information overload, making it difficult to find the most useful and relevant material.
2. Lack of Context: Information retrieval systems may fail to understand the context
of a user’s query, potentially leading to inaccurate results.
3. Privacy and Security Concerns: As information retrieval systems often access
sensitive user data, they can raise privacy and security concerns.
4. Maintenance Challenges: Keeping these systems up-to-date and effective requires
ongoing efforts, including regular updates, data cleaning, and algorithm adjustments.
5. Bias and fairness: Ensuring that information retrieval systems do not exhibit
biases and provide fair and unbiased results is a crucial challenge, especially in
contexts like web search engines and recommendation systems.
Primary Memory
Primary storage, also known as main memory, is the part of the computer that stores
current data, programs, and instructions. Primary storage sits on the motherboard,
close to the processor, so data can be read from and written to it very quickly.
What is Primary Memory
Primary memory is a segment of computer memory that can be accessed directly by
the processor. In the memory hierarchy, primary memory has an access time shorter than
secondary memory and longer than cache memory. Generally, primary memory has a
storage capacity smaller than secondary memory and larger than cache memory.
Need of primary memory
In order to enhance the efficiency of the system, memory is organized so that the
access time for the ready process is minimized. The following approach is followed:
All programs, files, and data are stored in secondary storage, which is larger and
hence has greater access time.
Secondary memory cannot be accessed directly by the CPU or processor.
In order to execute any process, the operating system loads the process into primary
memory, which is smaller and can be accessed directly by the CPU.
Since only processes that are ready to be executed are loaded into primary memory,
the CPU can access them efficiently, and this optimizes the performance of the
system.
This organization of memory in a stepwise manner is known as Memory Hierarchy.
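The approach above can be mimicked with a toy model in Python. This is only a sketch under strong simplifications: memory is modelled as dictionaries, capacity is counted in number of programs, and the class and method names (ToySystem, install, load, execute) are hypothetical. It is not how an operating system is implemented.

```python
class ToySystem:
    """Toy model: programs live in secondary storage and must be loaded
    into a small primary memory before the CPU can run them."""

    def __init__(self, primary_capacity):
        self.secondary = {}  # program name -> code (large, persistent)
        self.primary = {}    # program name -> code (small, CPU-accessible)
        self.primary_capacity = primary_capacity

    def install(self, name, code):
        """Store a program in secondary storage."""
        self.secondary[name] = code

    def load(self, name):
        """Copy a ready process from secondary storage into primary memory."""
        if len(self.primary) >= self.primary_capacity:
            raise MemoryError("primary memory is full")
        self.primary[name] = self.secondary[name]

    def execute(self, name):
        """The CPU only reads primary memory, never secondary storage."""
        if name not in self.primary:
            raise RuntimeError(f"{name} is not loaded in primary memory")
        return f"running {self.primary[name]}"


system = ToySystem(primary_capacity=2)
system.install("browser", "browser code")
system.load("browser")
result = system.execute("browser")
print(result)  # running browser code
```

Calling execute on a program that has only been installed, but not loaded, raises an error, which reflects the rule that secondary memory cannot be accessed directly by the CPU.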
Primary Memory Example
Primary Memory examples are RAM, ROM, cache, PROM, EPROM, registers, etc.
Classification of Primary Memory
Primary memory can be broadly classified into two parts:
1. Read-Only Memory (ROM)
2. Random Access Memory (RAM)
Read-Only Memory
Any data that does not need to be altered is stored in ROM. ROM holds the programs
that run when the system boots (known as the bootstrap program, which initializes
the OS) along with data such as algorithms required by the OS. In its basic form,
anything stored in ROM cannot be altered or changed.
Types of ROM:
ROM can be broadly classified into 4 types based on their behavior:
MROM: Masked ROM is hardwired and pre-programmed ROM. Content, once written,
cannot be altered in any way.
PROM: Programmable ROM can be programmed once by the user. The user buys a
blank PROM and writes the desired content, but once written the content cannot be
altered.
EPROM: Erasable and Programmable ROM. Content can be changed by first erasing
the existing content, which is done by exposing the EPROM to ultraviolet (UV)
radiation. This exposure dissipates the charge on the ROM, after which content can
be rewritten.
EEPROM: Electrically Erasable and Programmable ROM. Content can be changed by
erasing the existing content electrically. However, content is erased one byte at a
time rather than all in one go, so reprogramming an EEPROM is a slow process.
Random Access Memory
Any process that needs to be executed is loaded into RAM, where the CPU processes it
according to the instructions in the program. For example, when we click on an
application such as a browser, the operating system first loads the browser code
into RAM, after which the CPU executes it and opens the browser.
Types of RAM:
RAM can be broadly classified into SRAM (Static RAM) and DRAM (Dynamic
RAM) based on their behavior:
DRAM: Dynamic RAM or DRAM must be refreshed periodically, every few
milliseconds, to retain data. DRAM is made up of capacitors and transistors; because
electric charge leaks from the capacitors, they need to be recharged periodically.
DRAM is widely used in home PCs and servers as it is cheaper than SRAM.
SRAM: Static RAM or SRAM keeps the data as long as power is supplied to the
system. SRAM uses sequential circuits such as flip-flops to store each bit and hence
need not be periodically refreshed. SRAM is expensive and hence only used where
speed is the utmost priority.
Is Primary Memory Volatile?
The content of primary memory may or may not vanish when power is lost, depending on
whether it is stored in RAM or ROM.
The content of ROM is non-volatile: it is retained even when power is lost.
The content of RAM is volatile: it vanishes when power is lost.
Secondary Memory
In a computer, memory refers to the physical devices that are used to store programs
or data on a temporary or permanent basis. It is a group of registers. Memory is of
two types: (i) primary memory and (ii) secondary memory. Primary memory is made up of
semiconductors and is divided into two types, Read-Only Memory (ROM) and
Random Access Memory (RAM). Secondary memory is a physical device for the
permanent storage of programs and data (hard disk, compact disc, flash drive, etc.).
Secondary memory is a type of computer memory that is used to store data and
programs that can be accessed or retrieved even after the computer is turned off.
Unlike primary memory, which is volatile and temporary, secondary memory is
non-volatile and can store data and programs for extended periods of time.
1. Some examples of secondary memory include hard disk drives (HDDs), solid-state
drives (SSDs), optical discs (such as CDs and DVDs), and flash memory (such as
USB drives and memory cards). These storage devices provide a much larger
capacity than primary memory and are typically used to store large amounts of
data, such as operating systems, application programs, media files, and other types
of digital content.
2. Secondary memory can be classified by storage technology. Two common types are
magnetic storage and solid-state storage. Magnetic storage devices, such as hard
disk drives and magnetic tapes, use magnetic fields to store and retrieve data.
Solid-state storage devices, such as solid-state drives and flash memory, use
semiconductor-based memory chips to store data.
3. One of the main advantages of secondary memory is its non-volatile nature, which
means that data and programs stored on secondary memory can be accessed even
after the computer is turned off. Additionally, secondary memory devices provide
a large storage capacity, making it possible to store large amounts of data and
programs.
However, there are also some disadvantages to secondary memory, such as slower
access times and lower read/write speeds compared to primary memory. Additionally,
secondary memory devices are often more prone to mechanical failures and data
corruption, which can result in data loss.
Overall, secondary memory plays an important role in modern computing systems and
is essential for storing large amounts of data and programs.
So far we have seen that primary memory is volatile and has limited capacity. It is
therefore important to have another form of memory that has a larger storage
capacity and from which data and programs are not lost when the computer is turned
off. This type of memory is called secondary memory, and it stores programs and
data. It is also called auxiliary memory. It differs from primary memory in that it
is not directly accessible by the CPU and is non-volatile. Secondary or external
storage devices have a much larger storage capacity, and secondary memory costs
less than primary memory.
Secondary memory is used for different purposes but the main purposes of using
secondary memory are:
Permanent storage: Primary memory holds data only while the power supply is on
and loses it when the power is off. So we need secondary memory to store data
permanently even when the power supply is off.
Large Storage: Secondary memory provides large storage space so that we can
store large data like videos, images, audio, files, etc. permanently.
Portable: Some secondary devices are removable. So, we can easily store or
transfer data from one computer or device to another.
Types of Secondary memory
A compact disc is a portable storage device used for recording, storing, and playing
back digital data such as video and audio. A compact disc can be described as a
disc-shaped memory device made from plastic material.
History of ZIP files: The ZIP file format was conceived by Phil Katz, the founder
of PKWARE. It replaced the earlier ARC compression format by Thom Henderson.
What are ZIP files and why are they used? A ZIP file is like a container that holds
one or more files or directories that have been compressed to reduce their size.
ZIP is an archive file format that supports lossless data compression. ZIP files
mostly use the file extension “.zip” or “.ZIP” and the MIME media type
application/zip. A number of compression algorithms are permitted in ZIP files, but
DEFLATE is the one most widely used and supported.
Where are they used? Today the ZIP file format is one of the most popular lossless
archive formats. ZIP files are used wherever storage or transfer is a problem, i.e.
where little space is available to store or send files. For example: suppose we
have a folder that contains 15 files and we have to email it to someone. We cannot
attach a whole folder to an email, so we would have to attach the 15 individual
files. This is where ZIP files come in: we can “zip up” those 15 files into a
single ZIP archive and then email it.
How to extract files from a ZIP file in Windows: The steps are:
First, right-click the ZIP file you want to extract.
Then, in the menu that appears, click the option Extract Here.
The files will be extracted into the current folder.
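The same zip-and-extract workflow can also be done programmatically. The sketch below uses the zipfile module from Python's standard library to pack 15 sample files into one DEFLATE-compressed archive and then extract them again. The helper names (zip_folder, extract_here), the folder and file names, and the sample file contents are assumptions made for illustration; everything runs inside a temporary directory.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_folder(folder, archive_path):
    """Pack every file in `folder` into one DEFLATE-compressed ZIP archive."""
    with zipfile.ZipFile(archive_path, "w",
                         compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(Path(folder).iterdir()):
            if path.is_file():
                archive.write(path, arcname=path.name)


def extract_here(archive_path, target_dir):
    """Extract all members of the archive into `target_dir`."""
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target_dir)
        return archive.namelist()


with tempfile.TemporaryDirectory() as workdir:
    # Create a hypothetical folder with 15 repetitive text files.
    folder = Path(workdir) / "to_send"
    folder.mkdir()
    for i in range(15):
        (folder / f"file_{i:02d}.txt").write_text("sample text " * 100)

    archive_path = Path(workdir) / "to_send.zip"
    zip_folder(folder, archive_path)

    out_dir = Path(workdir) / "extracted"
    out_dir.mkdir()
    names = extract_here(archive_path, out_dir)

    original_size = sum(p.stat().st_size for p in folder.iterdir())
    archive_size = archive_path.stat().st_size
    restored = (out_dir / "file_00.txt").read_text()

print(len(names))                    # 15
print(archive_size < original_size)  # True for this repetitive sample data
```

Because the compression is lossless, the extracted files are byte-for-byte identical to the originals; how much smaller the archive is depends entirely on how compressible the input data is.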
What is a Floppy Disk?
A floppy disk is a removable, flexible magnetic storage device that can hold
computer files or other electronic data. It consists of a thin, flexible magnetic
storage disk enclosed in a rectangular plastic carrier with a fabric lining for
increased sturdiness. Data can be read and written because the disk is magnetic.
A floppy disk drive is used to read and write floppy disks.
Difference Between Floppy Disk and Compact Disk
Floppy Disk | Compact Disk
Floppy disks are a type of magnetic storage medium used to store and read electronic data. | A compact disk is an optical disc that has the capacity to store data and can be accessed by computers.
Floppy Disk is comparatively smaller than Compact Disk. | Compact Disk is slightly bigger than Floppy Disk.