
Before getting started with what a data structure is, why we need data structures and
what their main functions are, we should first know some basic terminology related to
data organization. Every subject has some basic terms related to its curriculum; for
example, if we take the C language as the subject concerned, then keywords, tokens,
variables, compiler, etc. are some of its basic terms, which we should know before
digging deeper into the subject. The same is true of data structures. So, some basic
terminology is given in the following section, M1.1:

M1.1: Basic Terminology of Data Organization:

Data:
The term DATA simply refers to a value or a set of values. These values
may represent facts about something: it may be the roll number of a student,
the marks of a student, the name of an employee, the address of a person, the
phone number of Shahrukh Khan or Suraj Arora, and so on.

Data item:
A data item refers to a single unit of value. For example, roll number,
name, date of birth, age, address and marks in each subject are data items.
Data items that can be divided into subitems are called group items, whereas
those that cannot be divided into subitems are called elementary items. For
example, an address is a group item as it is usually divided into subitems
such as house-number, street number, locality, city, pin code etc. Likewise, a
date can be divided into day, month and year, a name can be divided into first
name and surname. On the other hand, roll number, marks, city, pin code, etc.
are normally treated as elementary items.

Entity:
An entity is something that has a distinct, separate existence, though it
need not be a material existence. An entity has certain attributes or
properties, which may be assigned values. The values assigned may be either
numeric or non-numeric. For example, a student is an entity. The possible
attributes for a student can be roll number, name, date of birth, sex and class.
The possible values for these attributes can be 32, Kanu, 12/03/84, F and 11.


Entity Set:
An entity set is a collection of similar entities. For example, the students of a
class or the employees of an organization form an entity set.

Record:
A record is a collection of related data items. For example, the roll number,
name, date of birth, sex and class of a particular student, such as 32, Kanu,
12/03/84, F, 11, together form a record. In fact, a record represents an entity.

File:
A file is a collection of related records. For example, a file containing
records of all students in a class, or a file containing records of all employees of
an organization. In fact, a file represents an entity set.
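
To make these terms concrete, the following small C sketch (the field names and
sizes are only illustrative assumptions) shows a structure that groups the related
data items of one student into a record, and an array of such structures that plays
the role of a file:

#include <stdio.h>

/* A record: related data items of one student grouped together */
struct Student {
    int  rollNo;           /* elementary item              */
    char name[30];         /* group item (first + surname) */
    char dateOfBirth[11];  /* group item (dd/mm/yy)        */
    char sex;              /* elementary item              */
    int  studentClass;     /* elementary item              */
};

int main(void)
{
    /* A "file": a collection of related records (the entity set) */
    struct Student class11[3] = {
        { 32, "Kanu",  "12/03/84", 'F', 11 },
        { 33, "Ravi",  "05/07/84", 'M', 11 },
        { 34, "Meera", "21/11/84", 'F', 11 }
    };
    int i;

    for (i = 0; i < 3; i++)                      /* traverse the file */
        printf("%d %s\n", class11[i].rollNo, class11[i].name);
    return 0;
}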


M1.2: Concept of Data-Type:
A data type in a programming language is an attribute of data which tells the
computer (and the programmer) important things about the data concerned: what
values it can take and what operations may be performed upon it. That is, it declares:
Set of values
Set of operations

Most programming languages require the programmer to declare the data type
of every data object, and most database systems require the user to specify the
type of each data field. The available data types vary from one programming
language to another, and from one database application to another, but the
following usually exist in one form or another:

Integer: A whole number; a number that has no fractional part. Its set of
values consists of the whole numbers. The operations on integers
include the arithmetic operations, i.e. addition (+), subtraction (-),
multiplication (*), and division (/).

Floating-point: A number with a decimal point. For example, 3 is an integer, but 0.5
is a floating-point number.

Character (text): Readable text





We can classify data types into different categories according to their use. Some
of the important forms of data types are as follows:

M1.2.1: Primitive Data-Type:
A primitive data type is also called a basic data type, built-in data type or
simple data type. A primitive data type is a data type for which the
programming language provides built-in support; i.e. you can directly declare and
use variables of these kinds. You need not define these data types before use.
So we can also say that a primitive data type is a data type that is predefined. These
primitive data types may be different for different programming languages. For
example, C programming language provides built-in support for integers (int, long),
reals (float, double) and characters (char).
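
For example, the following small C fragment (purely illustrative) declares and uses
some of these built-in types directly, without defining them first:

#include <stdio.h>

int main(void)
{
    int    count   = 42;        /* integer               */
    long   big     = 100000L;   /* long integer          */
    float  ratio   = 3.5f;      /* single-precision real */
    double average = 72.25;     /* double-precision real */
    char   grade   = 'A';       /* single character      */

    printf("%d %ld %f %f %c\n", count, big, ratio, average, grade);
    return 0;
}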

M1.2.2: Composite Data-Type:
In addition to the elementary data types, you can also assemble items of
different types to create composite data types such as structures, arrays, and
classes. You can build composite data types from elementary types and from
other composite types. For example, you can define an array of structure
elements, or a structure with array members.
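
As a small sketch of the two combinations mentioned above, in C one might write
(the names are assumptions for the example):

/* A structure with an array member */
struct Student {
    int rollNo;
    int marks[5];               /* array inside a structure */
};

/* An array of structure elements */
struct Student class11[60];     /* 60 student records */

Here, class11[5].marks[2] would refer to the third mark of the sixth student in the
array.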

M1.2.3: Abstract Data-Type:
In computing, an abstract data type (ADT) is a specification of a set of
data and the set of operations that can be performed on the data; and this is
organized in such a way that the specification of values and operations on those
values are separated from the representation of the values and the
implementation of the operations. For example, consider list abstract data type.
The primitive operations on a list may include adding new elements, deleting
elements, determining the number of elements in the list, etc. Here, we are not
concerned with how the list is represented or how the above-mentioned
operations are implemented. We only need to know that it is a list whose
elements are of a given type, and what we can do with the list.
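
As a rough sketch of this separation in C (the type and function names are
hypothetical), the specification of a list ADT could be kept in a header file, while its
representation and implementation live elsewhere and may change without affecting
users of the list:

/* list.h -- specification of a list ADT (illustrative names) */
#ifndef LIST_H
#define LIST_H

typedef struct List List;          /* representation hidden from the user */

List *list_create(void);           /* make an empty list             */
void  list_insert(List *l, int x); /* add a new element              */
void  list_delete(List *l, int x); /* delete an element              */
int   list_length(const List *l);  /* number of elements in the list */
void  list_destroy(List *l);

#endif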


M1.3: Data-Structure Defined:
A data structure is a particular way of storing and organizing data. Different
kinds of data structures are suited to different kinds of applications, and some are
highly specialized to certain tasks. Data structures are used in almost every program
or software system. The basic types of data structures include:




Arrays
Linked lists
Stacks
Queues
Trees
Graphs

The various data structures are divided into the following categories:
Linear Data-Structures:
A data structure whose elements form a sequence, so that every element
in the structure (except the first and the last) has a unique predecessor and a
unique successor. Examples of linear data structures are arrays, linked lists,
stacks and queues.
Non-linear Data-Structures:
A data structure whose elements do not form a sequence; there is no
unique predecessor or unique successor. Examples of non-linear data
structures are trees and graphs.
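
This difference is also visible in how the nodes of such structures refer to one
another; the following C declarations are only an illustrative sketch:

/* Linear: each node has one successor */
struct ListNode {
    int data;
    struct ListNode *next;      /* unique successor */
};

/* Non-linear: a node may have several successors */
struct TreeNode {
    int data;
    struct TreeNode *left;      /* two children, so no single */
    struct TreeNode *right;     /* unique successor           */
};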

M1.4: Common Operations on Data-Structures:
The various operations that can be performed on different data structures are
described below:
Traversal: Accessing each element exactly once in order to process it. This
operation is called visiting the element.

Searching: Finding the location of a given element.

Insertion: Adding a new element to the structure.

Deletion: Removing an existing element from the structure.

Sorting: Arranging the elements in some logical order. This logical order
may be ascending or descending in case of numeric key. In case of
alphanumeric key, it can be dictionary order.

Merging: Combining the elements of two similar sorted structures into a
single structure.
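
As a small illustrative sketch (the function names are assumptions), two of these
operations on an array might look like this in C:

#include <stdio.h>

/* Traversal: visit (here, print) each element exactly once */
void traverse(int a[], int n)
{
    int i;
    for (i = 0; i < n; i++)
        printf("%d ", a[i]);
    printf("\n");
}

/* Insertion: add a new element at position pos, shifting the rest right */
int insert(int a[], int n, int pos, int value)
{
    int i;
    for (i = n; i > pos; i--)
        a[i] = a[i - 1];
    a[pos] = value;
    return n + 1;                 /* new number of elements */
}

int main(void)
{
    int a[10] = { 10, 20, 40 };
    int n = 3;

    n = insert(a, n, 2, 30);      /* a becomes 10 20 30 40 */
    traverse(a, n);
    return 0;
}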

M1.5: Program Development Life Cycle (PDLC):
Every program in a computer is written to solve some kind of problem. There are some
steps for solving a problem through a program. We don't start typing a program
directly as soon as we get the problem. We should first understand the problem deeply,
then see whether it can be solved through a computer program or not. If it can, we make a
stepwise plan to solve the problem through a computer program, and then design the
program from that stepwise plan. This makes our life easier. The stepwise
plan is known as an algorithm. There are many ways to express an algorithm, e.g.
pseudocode, flowchart etc. The whole process from problem to program can be seen as a set of
predefined steps, known as the Program Development Life Cycle (PDLC). The program
development process is divided into the following phases:
o Defining the problem
o Designing the program
o Coding the program
o Testing and debugging the program
o Documenting the program
o Implementing and maintaining the program

In the first step, we try to precisely define the given problem in terms of the input data,
the processing that should take place, the format of the output, and the user interface.

In the second step, the programmer draws a conceptual plan, using tools such as an
algorithm, flowchart or pseudocode, to visualize how the program will do the assigned
job.

In the third step, the programmer writes the code in some programming language
according to the plan defined in the second step.

In the fourth step, the programmer tries to remove the bugs from the program. Syntax
errors and logical errors are collectively known as bugs. The process of identifying and
eliminating these errors is known as debugging.

In the fifth step, we document the program for future reference. The structured charts,
pseudocode, flowcharts and decision tables developed during the design phase
become the documentation. This phase ends with writing a manual that provides an
overview of the program's functionality, tutorials for the beginner, reference
documents, etc. This manual is given to the user when the program is installed.

In the final phase, the program is installed at the user's site. During program
maintenance, the programming team fixes program errors that users discover during
its day-to-day use.

The first step, defining the problem clearly, depends entirely on the sharpness of the
user's mind; there are no formal techniques to help here. So we start from the
second step of the PDLC. As we now understand, in the second step we plan how to
solve the given problem. There are many techniques for making this plan, e.g.
algorithm, pseudocode, flowchart, etc. Explaining all of these is outside the scope of
this book, so we now concentrate on the algorithm part only.

M1.6: Algorithm Defined:
An algorithm is a step-by-step solution to a certain problem. An algorithm can
be expressed in an English-like language called pseudocode, in a programming
language, or in the form of a flowchart. Every algorithm must satisfy the following
criteria:
Input: There are zero or more values, which are externally supplied.
Output: At least one value is produced.
Definiteness: Each step must be clear and unambiguous.
Finiteness: If we trace the steps of an algorithm, then for all cases, the
algorithm must terminate after a finite number of steps.

Effectiveness: Each step must be sufficiently basic that it can in principle be
carried out by a person using only paper and pencil.

We use algorithms every day. For example, a recipe for baking a cake is an
algorithm. Most programs, with the exception of some artificial intelligence
applications, consist of algorithms.

Simple Example of an algorithm:
One of the simplest algorithms is to find the largest of three given numbers.
We can write such an algorithm like this:

Algorithm MaxFind:
Let a, b and c be three integer numbers. This algorithm will find the maximum
number out of these three.

Step 1     : Begin
Step 2     : read: a, b, c                 // enter three integer values
Step 3     : if ( a > b ) then             // compare a with b
Step 3.1   :   if ( a > c ) then           // compare a with c, given a > b
Step 3.1.1 :     write: a, "is largest"    // a is largest
Step 3.1.2 :   else
Step 3.1.3 :     write: c, "is largest"    // c is largest
Step 3.1.4 :   end if
Step 3.2   : else                          // b is larger than a
Step 3.2.1 :   if ( b > c ) then           // compare b with c, given b > a
Step 3.2.2 :     write: b, "is largest"    // b is largest
Step 3.2.3 :   else
Step 3.2.4 :     write: c, "is largest"    // c is largest
Step 3.2.5 :   end if
Step 3.3   : end if
Step 4     : END.
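
For comparison, the same logic can be written in C; the following is only a sketch,
assuming the three integers are read from standard input:

#include <stdio.h>

int main(void)
{
    int a, b, c;
    scanf("%d %d %d", &a, &b, &c);   /* read a, b, c         */

    if (a > b) {                     /* compare a with b     */
        if (a > c)                   /* compare a with c     */
            printf("%d is largest\n", a);
        else
            printf("%d is largest\n", c);
    } else {                         /* b is larger than a   */
        if (b > c)                   /* compare b with c     */
            printf("%d is largest\n", b);
        else
            printf("%d is largest\n", c);
    }
    return 0;
}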

M1.7: Algorithm Designing Conventions:
As we know, an algorithm is just a step-by-step solution to a given problem, written in
simple English. However, formally written algorithms follow some conventions. It must
be noted that an efficient algorithm is one which is capable of giving the solution to
the problem using minimum system resources, such as memory and processor time.
The format for the presentation of the algorithms is language-free, well structured and
detailed. It enables the reader to translate it into a computer program using any
high-level language such as FORTRAN, Pascal or C/C++.

The format for the formal presentation of an algorithm consists of two parts:
The first part describes the input data and the purpose of the algorithm, and identifies
the variables used in the algorithm.
The second part is composed of a sequence of instructions that lead to the solution
of the problem.

The following description summarizes certain conventions used in presenting the
above algorithm.

Comments:
Each instruction may be followed by a comment. The comments begin with a
double slash, and explain the purpose of the instruction, such as:

// this is a sample comment

Appropriate use of comments enhances the readability of the algorithm, which
in turn helps in maintaining the algorithm.

Variable Names:
For variable names, we can use any descriptive names such as max, loc, etc.
We will write variable names in lowercase letters, such as max and loc,
whereas defined constants, if any, will be written in uppercase letters.

Assignment Statement:
The assignment statement will use the notation

Set max := a[i]

to assign the value of a[i] to max. The right-hand side of the assignment
statement can have a value, a variable or an expression.




Input/Output:
Data may be input and assigned to variables by means of a read
statement with the following format:

read: variable-list

where the variable-list consists of one or more variables separated by commas.

Similarly, the data held by the variables, and any messages enclosed in
double quotes, can be output by means of a write statement with the following
format:
write: message and/or variable-list
or
print: message and/or variable-list

where the message and the variables in the variable-list are separated by
commas.

Execution of Instructions:
The instructions in the algorithm are usually executed one after the other as
they appear in the algorithm. However, there may be instances when some
instructions are skipped or some instructions may be repeated as a result of
certain conditions.

Completion of the Algorithm:
The algorithm is completed with the execution of the last instruction. However,
the algorithm can be terminated at any intermediate state using the exit
instruction.

With the help of these conventions, one can write the algorithm easily in the
standard fashion.

M1.8: Algorithm Complexity:
As an algorithm is a sequence of steps to solve a problem, there may be more
than one algorithm for the same problem, so we have to choose one of them as the
solution. We would also like to choose the best algorithm in terms of the resources it
uses, and this requires analysing each algorithm carefully. Analysing an algorithm means
determining the amount of resources (such as time and storage) necessary to execute
it. Algorithm analysis helps us choose the best algorithm for a given application. The
two main factors on which the performance of a program depends are the amount of
computer memory it consumes and the time it requires to execute successfully. There
are two ways in which we can analyse the performance of an algorithm. One is to
carry out experiments with the algorithm by actually executing it and recording the
space and time required; this method can be used only after a successful
implementation. The other is the analytical method, in which we approximately
determine the space and time required before implementation. Implementing each
algorithm and then recording its complexity is not practical, as there may be thousands
of solutions (and hence algorithms) for a given problem. Algorithms should therefore
be compared at the pseudocode stage, i.e. by the analytical method.

The choice of a particular algorithm depends on the following considerations:
Performance requirements, i.e., time complexity
Memory requirements, i.e., space complexity
Programming requirements

Since programming requirements are difficult to analyze precisely, complexity theory
concentrates on performance and memory requirements. Performance requirements
are usually more critical than memory requirements; hence, in general, it is not
necessary to worry about memory unless memory requirements grow faster than
performance requirements. Therefore, in general, algorithms are analyzed only on the
basis of performance requirements, i.e., running-time efficiency.

M1.8.1: Space Complexity:
The space complexity of an algorithm is the amount of memory it needs to run
to completion. Some of the reasons for studying space complexity are:
If the program is to run on a multi-user system, it may be required to specify the
amount of memory to be allocated to the program.
We may be interested to know in advance whether sufficient memory is
available to run the program.
There may be several possible solutions with different space requirements.
Space analysis can be used to estimate the size of the largest problem that a program can solve.

In general, the total space needed by a program can be divided in two parts:
1. A static part, which is independent of the instance characteristics. This includes
the space required to store the code (which is compiler and machine dependent),
as well as constants, simple variables, fixed-size complex data types, etc.
2. A dynamic part, which consists of components whose memory requirements
depend on the instance of the problem being solved. Dynamic memory
allocations and recursion are components of this type. The factors that can
help us in determining the size of the problem instance are the number of inputs,
outputs, etc.
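
A small illustrative C sketch of these two parts (the variable names are assumptions):
the fixed variables form the static part, while the block allocated with malloc, whose
size depends on the instance n, forms the dynamic part.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int n, i, sum = 0;             /* static part: size fixed at compile time */
    int *a;                        /* the pointer itself is static ...        */

    scanf("%d", &n);
    a = malloc(n * sizeof(int));   /* ... but this block is dynamic:          */
    if (a == NULL)                 /* its size depends on the instance n      */
        return 1;

    for (i = 0; i < n; i++) {
        a[i] = i + 1;
        sum += a[i];
    }
    printf("sum = %d\n", sum);
    free(a);
    return 0;
}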




M1.8.2: Time Complexity:
The time complexity of a program is the amount of time it requires to execute
successfully. Some of the reasons for studying time complexity are:

We may be interested to know in advance whether the program will provide a
satisfactory real-time response. For example, an interactive program, such as an
editor, must provide such a response. If it takes even a few seconds to move the
cursor one page up or down, it will not be acceptable to the user.
There may be several possible solutions with different time requirements.

To measure the time complexity accurately, we can count all the operations
performed in an algorithm. If we know the time for each of the primitive
operations performed on a given computer, we can easily compute the time taken by
an algorithm to complete its execution. This time, however, will vary from system to system.

A more acceptable approach is to estimate the execution time of an algorithm
irrespective of the computer on which it will be run. Hence, the more reasonable
approach is to identify the key operations and count how many times they are
performed until the program completes its execution. A key operation in an algorithm is
an operation that takes the maximum time among all possible operations in the algorithm.
The time complexity can then be expressed as a function of the number of key
operations performed.
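
For example, in a linear search the key operation is the comparison of the key with an
array element; counting only this operation gives a machine-independent estimate of
the running time. The following sketch is only illustrative:

/* Linear search: the comparison a[i] == key is the key operation.
   In the worst case it is performed n times, so the time complexity
   can be expressed as f(n) = n key operations. */
int linear_search(int a[], int n, int key)
{
    int i;
    for (i = 0; i < n; i++)
        if (a[i] == key)          /* key operation */
            return i;
    return -1;
}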

M1.8.3: Time-space Trade-off:
The best algorithm, and hence the best program, to solve a given problem is one that
requires less space in memory and takes less time to complete its execution. But in
practice, it is not always possible to achieve both of these objectives. Also, there may
be more than one approach to solving the same problem. One approach may
require more space but take less time to complete its execution, while another
approach requires less space but takes more time to complete its execution. Thus,
we may have to sacrifice one at the cost of the other; that is, there
exists a time-space trade-off among algorithms.

Therefore, if space is our constraint, then we have to choose a program that
requires less space at the cost of more execution time. On the other hand, if time is
our constraint, such as in real-time systems, we have to choose a program that takes
less time to complete its execution at the cost of more space.

In the analysis of algorithms, we are interested in the average case, the amount
of time a program might be expected to take on typical input data, and in the worst
case, the amount of time a program would take on the worst possible input
configuration.

M1.9: Expressing Space and Time Complexity:
The space and/or time complexity is usually expressed in asymptotic
notation, which describes how a function behaves as its argument grows large. In
this notation the complexity is usually expressed in the form of a function f(n), where
n is the input size for a given instance of the problem being solved. Expressing
space and/or time complexity as a function f(n) is important for the following
reasons:
We may be interested to predict the rate of growth of complexity as the size of the
problem increases.
To compare the complexities of two or more algorithms solving the same problem
in order to find which is more efficient.

Time and space complexity are measured in terms of asymptotic notations. For
example, let us consider a program which stores n elements:

void store()
{
    int i, n;
    printf("Enter the number of elements: ");
    scanf("%d", &n);
    for (i = 0; i < n; i++)
    {
        // store the elements
    }
}

In the above program, space is required to store the executable code and to
store the n elements. The memory required to store the
executable code is static. The memory required to store the n elements depends on
the value of n. The time required to execute the code also depends on the value of
n. In the above program we have two executable statements, printf and scanf. Let us
assume there are x statements in the for loop. Then the number of statements
executed by the program will be x*n + 2.

Here, n is the instance characteristic. Finally, this calculated time can be expressed
in one of the asymptotic notations as a function f(n).

The most commonly used asymptotic notations are: Big Oh (O), Big Omega (Ω), and
Big Theta (Θ).

In these notations we use terms like upper bound and lower bound. An upper
bound gives the maximum time or space required by a program; a lower bound
gives the minimum time/space required.

The most important notation used to express this function f(n) is Big Oh notation,
which provides the upper bound for the complexity, and is described in next section.

Since memory is not a severe constraint in modern computers, our
analysis of algorithms will be on the basis of time complexity.

M1.9.1: Big-oh (O) notation:
The algorithm complexity can be determined ignoring the implementation
dependent factors. This is done by eliminating constant factors in the analysis of the
algorithms. Basically, these are the constant factors that differ from computer to
computer. Clearly, the complexity function f(n) of an algorithm increases as n
increases. It is the rate of growth of f(n) that we want to examine.

Big-O is the formal method of expressing the upper bound of an algorithm's running
time. It's a measure of the longest amount of time it could possibly take for the
algorithm to complete.

Based on Big Oh notation, the algorithms can be categorized as follows:
Constant time (O(1)) algorithms
Logarithmic time (O(log2n)) algorithms
Linear time (O(n)) algorithms
Polynomial time (O(n^k), for k > 1) algorithms
Exponential time (O(k^n), for k > 1) algorithms
Many algorithms are of O(n log2n)


(Fig M1.1- Rate of growth of some standard functions)

Observe that the logarithmic function log2n grows most slowly, whereas the
exponential function 2^n grows most rapidly, and the polynomial function n^k grows
according to the exponent k.
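
As a hedged illustration of the logarithmic category, a binary search over a sorted
array performs O(log2n) comparisons, because each step halves the remaining range
(contrast this with the O(n) linear search shown earlier):

/* O(log2n): halves the search range at every step (a must be sorted) */
int binary_search(int a[], int n, int key)
{
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = (low + high) / 2;
        if (a[mid] == key)
            return mid;           /* found at position mid  */
        else if (a[mid] < key)
            low = mid + 1;        /* discard the left half  */
        else
            high = mid - 1;       /* discard the right half */
    }
    return -1;                    /* not present            */
}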

Limitations of Big-Oh Notation:
Big Oh notation has two basic limitations:
It takes no account of programming effort.
It masks (hides) potentially important constants.

As an example of the latter limitation, imagine two algorithms, one taking 500000n^2
time and the other n^3 time. The first algorithm is O(n^2), which suggests that it will
take less time than the second algorithm, which is O(n^3). However, the second
algorithm will actually be faster for n < 500000, which covers many practical
applications.


M1.9.2: Big Omega (Ω) notation:
The asymptotic notation used to denote space or time complexity in terms of
a lower bound value is called big omega (Ω) notation.

M1.9.3: Big Theta (Θ) notation:
The asymptotic notation used to denote space or time complexity in terms of
a tight bound value is called big theta (Θ) notation. Theta requires both Big Oh and
Big Omega; that is why it is referred to as a tight bound (it must be both an upper
and a lower bound).

As discussed above, the most important notation used to express the function f(n)
is the Big Oh notation, so we will be using this notation from here onward.


Important Reminders:

Q.1: Define data types in detail. What do you mean by time and space complexity of
algorithms? Explain.

Q.2: Explain various Data types available in C language.

Q.3: Give difference between primitive and composite data types.

Q.4: Specify different primitive operations on some data-structure.

Q.5: Explain Big-Oh notation. What is the use of this notation?

Q.6: What are the characteristics of a well-designed algorithm?
