
MMPO-004

Management Information Systems
School of Management Studies

BLOCK 1
OVERVIEW OF MANAGEMENT INFORMATION
SYSTEM 7
BLOCK 2
BUSINESS INTELLIGENCE & DECISION MAKING 49
BLOCK 3
RELATIONAL DATABASE MANAGEMENT SYSTEM 149
BLOCK 4
EMERGING TECHNOLOGIES FOR BUSINESS 199
COURSE DESIGN AND PREPARATION TEAM
Prof. K. Ravi Sankar
Director, School of Management Studies,
IGNOU, New Delhi

Prof. Sourbhi Chaturvedi
Faculty of Management Studies,
Ganpat University, Mehsana, Gujarat

Prof. Deepak Jaroliya*
Prestige Institute of Management and Research,
Indore

Dr. P. Mary Jeyanthi*
Associate Professor - Business Analytics,
Jaipuria Institute of Management, Jaipur

Dr. Shaheen*
Associate Professor - IT & Analytics,
Institute of Public Enterprise, Hyderabad

Prof. Anurag Saxena
SOMS, IGNOU, New Delhi

Course Coordinator and Editor


Dr. Venkataiah Chittipaka
Associate Professor
School of Management Studies
IGNOU, New Delhi

Acknowledgement: The persons marked with (*) were the original contributors, and the profiles are as
they were on the initial print date.

PRINT PRODUCTION
Mr. Tilak Raj
Assistant Registrar
MPDD, IGNOU, New Delhi

May 2023
© Indira Gandhi National Open University, 2023
ISBN:
All rights reserved. No part of this work may be reproduced in any form, by mimeograph or any other
means, without permission in writing from the Indira Gandhi National Open University.
Further information on the Indira Gandhi National Open University courses may be obtained from the
University’s Office at Maidan Garhi, New Delhi – 110 068
Printed and published on behalf of the Indira Gandhi National Open University, New Delhi, by the
Registrar, MPDD, IGNOU.
Laser typeset by Tessa Media & Computers, C-206, A.F.E-II, Jamia Nagar, New Delhi -110025
Content
BLOCK 1 OVERVIEW OF MANAGEMENT INFORMATION 7
SYSTEM

Unit 1 Introduction to Information Systems 9


Unit 2 Introduction to MIS 21
Unit 3 System Development Life Cycle (SDLC) 33

BLOCK 2 BUSINESS INTELLIGENCE & DECISION MAKING 49

Unit 4 Introduction to Business Intelligence 51


Unit 5 Information & Decision Making 76
Unit 6 Spread Sheet Analysis 93

BLOCK 3 RELATIONAL DATABASE MANAGEMENT 149


SYSTEM (RDBMS)

Unit 7 Organizing Data 151


Unit 8 Structured Query Language (SQL) 168
Unit 9 DBMS Implementation and Future Trends 185

BLOCK 4 EMERGING TECHNOLOGIES FOR BUSINESS 199

Unit 10 Cloud Computing 201


Unit 11 Big Data 222
Unit 12 ERP 240
Unit 13 Applications of IoT, AI & VR 258
Unit 14 Block Chain 305
COURSE INTRODUCTION

The contents of this course are practical, relevant, and current. All the topics
discussed in this course are simple and intuitive. This course may help the
learners to improve their knowledge and skills in management information
systems.

This course consists of four blocks spread over 14 units. Block 1 is an


overview of the Management Information System consisting of three
units.

Unit 1 Introduction to Information Systems discusses the most important
aspects of information systems: every business organization, as well as the
people and society it serves, should understand what an information system is
and how it can be used to bring a competitive advantage. In Unit 2,
Introduction to MIS is introduced. In this unit, the learners may realise the
importance of Management Information Systems (MIS) by understanding that an MIS can have
a wide range of functions, from providing basic reports to conducting
complex data analysis and decision-making support. It involves the use of
hardware like computers and servers, and software like databases and
reporting tools. Unit 3 System Development Life Cycle (SDLC) discusses
the importance of SDLC in MIS. In this unit, the learners may understand
that the system development life cycle (SDLC) is a complex project
management model that includes 7 stages: planning, analysis, design,
development, testing, implementation, and maintenance. It should be tailored
to the needs of the project, team, and stakeholders involved in the process.

Block 2 is on Business Intelligence & Decision Making and consists of


three units. Unit 4 Introduction to Business Intelligence provides
a thorough overview of Business Intelligence and its significance in modern
business operations. It focuses on the key concepts, strategies, and
technologies involved in business intelligence and explains how they can be
used to gain a competitive advantage. Unit 5 Information & Decision
Making, focuses on how to effectively manage data and make informed
decisions in a variety of situations. It also discusses how important
information and decision-making are in today's society, and how these skills
are needed in fields ranging from healthcare to finance to education. Unit 6
Spread Sheet Analysis discusses how to navigate, input data, use formulas
and functions, format data, and use built-in features of spreadsheet software
like Excel. It also discusses how to analyse and organise data and understand
the essential functions and formulas in Excel. Further, it focuses on how to
create charts, graphs, and other visualisations to help communicate insights
and trends in the data.

Block 3 is on Relational Database Management System (RDBMS)


consisting of three units. Unit 7, Organizing Data, discusses the nature of
quantitative and qualitative data, the various methods of representing the
quantified data graphically, etc. Unit 8
Structured Query Language (SQL) discusses query languages, which
allow easy access to the information stored in the database. This unit also
illustrates the syntax used in SQL to run queries on single tables as well as
multiple tables. Embedding SQL statements in a host programming language
for batch processes is also discussed. Unit 9 DBMS Implementation and
Future Trends deals with the real-life managerial issues of product
selection and acquisition of emerging standards, and human aspects of
organizational resistance to DBMS tools. Block 4 is on Emerging
Technologies for Business consisting of five units. Unit 10 Cloud
Computing focuses on understanding cloud computing architecture,
comprehending the platforms for the development of cloud applications,
and listing the applications of cloud services. It also discusses the features and
associated risks of different cloud deployment and service models. Unit 11
Big Data deals with the concept of Big Data, its characteristics, and the
challenges associated with it. It also familiarizes learners with the Hadoop ecosystem
and its components. Further, it also deals with the basics of machine learning
algorithms for Big Data analytics. Unit 12 ERP focuses on understanding the
basics of ERP concepts, principles, components, and architecture of ERP
systems, as well as their benefits and limitations. Further, it also focuses on
how to implement and configure an ERP system and manage ERP projects to
optimize business processes using ERP systems. Unit 13 Applications of
IoT, AI & VR, focuses on understanding the architecture of the Internet of
Things and illustrates the real-time IoT applications to make the smart world.
Further, it discusses AI history, explores its evolution, and contributes to
comprehending what led to the AI impacts we have in society today. Lastly,
Unit 14 Block Chain discusses the fundamental concepts of blockchain
technology, including decentralized architecture, consensus mechanisms, and
cryptographic algorithms. It also analyses the real-world use cases of
blockchain, such as cryptocurrencies, supply chain management, digital
identity, and smart contracts. Further, it also discusses the economic and
social impact of blockchain technology.
BLOCK 1
OVERVIEW OF MANAGEMENT
INFORMATION SYSTEM
UNIT 1 INTRODUCTION TO

INFORMATION SYSTEM

Objectives
After studying this unit, you will be able to:
• Define what an information system is by identifying its major
components.
• Understand the information subsystems which could be defined within a
typical organization.
• Differentiate between various types and levels of information systems.

Structure
1.1 Introduction
1.2 Defining Information System
1.3 Types of Information
1.4 Dimensions of Information System
1.5 Operating Elements of Information Systems
1.6 Types of Information Systems
1.7 The Components of Information Systems
1.8 Major Processing Functions in Information Systems
1.9 How to Apply Information Systems in Business?
1.10 Facts of Information Systems
1.11 Summary
1.12 Self-Assessment Exercises
1.13 Further Readings

1.1 INTRODUCTION
Information systems (IS) are critical to the operation of modern
organizations. They are interconnected networks of hardware, software, data,
people, and procedures designed to collect, process, store, and disseminate
information to aid in decision-making, coordination, and control. The rise of
digital technologies, as well as the increased use of computers and the
internet, has altered how organizations operate and interact with their
stakeholders. In a rapidly changing business environment, information
systems have become critical tools for organizations of all sizes and types to
remain competitive, efficient, and effective. They assist organizations in
achieving their objectives by enhancing internal operations, facilitating
communication and collaboration, and assisting in strategic decision-making.
Information systems study is multidisciplinary, combining elements of
computer science, management, and information technology.

In today's business, information systems are critical because they allow
organizations to collect, store, and process data to make informed decisions.
These systems can be used to improve internal and external communication
and collaboration, as well as gain insights into customer behavior and market
trends. Furthermore, by providing real-time data and analysis, they can help
businesses become more agile, responsive to market changes, and
competitive. Information systems are critical for businesses to operate
effectively and efficiently in today's fast-paced and data-driven environment.
The combination of hardware, software, data, people, and procedures that
organizations use to collect, process, store, and disseminate information is
referred to as an information system. These systems aid in decision-making,
coordination, and control, and they assist organizations in achieving their
objectives. Simple manual systems to complex computer-based systems that
automate many business processes are examples of information systems.

Activity A
Write down examples of an information system that you know in real-time or
in your real life.
…………………………………………………………………………………
…………………………………………………………………………………
…………………………………………………………………………………
…………………………………………………………………………………
…………………………………………………………………………………
…………………………………………………………………………………
…………………………………………………………………………………

1.2 DEFINING INFORMATION SYSTEM


“An information system is a set of interrelated components that work
together to collect, process, store, and break down the information to support
decision making.”

“Information system (IS) is the study of complementary networks of


hardware and software that people and organizations use to collect, filter,
process, create, and distribute data.” [1]
“Information systems are combinations of hardware, software, and
telecommunications networks that people build and use to collect, create, and
distribute useful data, typically in organizational settings.” [2]
“Information systems are interrelated components working together to
collect, process, store, and disseminate information to support decision
making, coordination, control, analysis, and visualization in an organization.”
[3]

These definitions focus on two distinct aspects of information systems: the


components that comprise an information system and their role in an
organization.
1.3 TYPES OF INFORMATION

Internal information and external information are the two broad categories of
information. The illustration below depicts the scope of internal and external
information in the context of business organizations.

Internal Information: Internal Information is defined as information


generated by the organization's operations at various management levels in
various functional areas. Internal information is summarized and processed as
it progresses from the lowest to the highest levels of management. Internal
information is always about the organization's various operational units.
Production figures, sales figures, personnel, account, and material
information are all examples of internal information. This type of information
is typically consumed by middle and junior management levels. However,
top-level management consumes summarized internal information.

Fig 1.1: Types of information


External Information: External information is typically gathered from the
business organization's surroundings. External information is defined as
information that comes from outside the organization and has an impact on
its performance. External information includes government policies,
competition, economic status, and international market conditions. External
information is typically required by top management cadres and is useful in
developing long-term policy plans for organizations.

1.4 DIMENSIONS OF INFORMATION SYSTEM


The dimensions of information systems can be viewed as a framework for
analyzing and designing information systems. They are:

Organizational Dimension:
Organizations include information systems. The standard operating procedure
and culture of an organization will be embedded in an information system.
Functional specialities, business processes, culture, and political interest
groups are all part of this. This refers to the people, policies, and procedures
that govern how an organization's information system is used and managed.
This refers to how the information system fits into the organizational
structure and how it supports the organization's goals and objectives. A sales
management system, for example, is part of the organizational dimension
because it helps to improve sales performance.
Management Dimension:
Managers perceive environmental business challenges. Information systems
provide managers with the tools and information they need to allocate,
coordinate, and monitor their work, make decisions, create new products and
services, and make long-term strategic decisions. This dimension refers to the
policies, procedures, and rules that govern the use of the information system.
The management dimension includes things like passwords, backup
procedures, and data security policies.

Technology Dimension:
Management makes use of technology to carry out their duties. Computer
hardware/software, data management technology and networking/telecom
technology are all part of it. It is one of many tools used by managers to deal
with change. This includes the hardware, software, data, and network
components that comprise an information system's technical infrastructure. A
server, a personal computer, and database software, for example, are all
examples of technical dimensions.

Strategic Dimension:
This entails aligning information systems with an organization's overall goals
and strategies. This includes decision-making processes as well as the impact
of information systems on the competitiveness and success of the
organization.

User dimension:
This refers to the information system's end users and how they interact with
it. An e-commerce website, for example, is part of the user dimension
because it allows customers to purchase goods and services.

Each of these dimensions is interconnected and has an impact on an


information system's overall performance and effectiveness. To ensure that
an information system meets the needs of the organization and its users, it
should take into account all of these dimensions.

1.5 OPERATING ELEMENTS OF


INFORMATION SYSTEMS
The components that allow an information system to function effectively and
efficiently are known as its operating elements. They are as follows:
• Hardware: A system's physical components, such as computer
equipment, peripheral devices, and other supporting equipment.
• Software: A set of instructions that instructs the hardware on what to do.
System software (such as the operating system) and application software
are both included.
• Data: Information that the system stores and processes. It can include
12 both structured (like a database) and unstructured data (such as a text
Introduction to
document). Information System
• Procedures: The steps and processes that are followed to complete
specific tasks such as data entry, information processing, and report
generation.
• People: Those who use the system as well as those who support and
maintain it.
• Network: The communication channels that connect the various system
components and allow them to work together.
• Policies and security measures: The guidelines and measures that
ensure the confidentiality, integrity, and availability of the information
held in the system.
The following are the major processing functions in information systems:
• Business transaction processing: Capture, collect, record, store, and
process events of business interest so that their impact is reflected in
organizational performance records.
• Master file updates: The effect of these transactions is carried over to
the organizational performance status files. At any given time, master
files must reflect the status of any entity after incorporating the impact of
current transactions.
• Information report generation: After processing transactions and
updating master files, information reports are generated to assist
managers in making decisions.
• Processing of interactive inquiries: Online information processing
systems allow managers to respond to business queries raised on data
files, both master and transaction files.
• Providing interactive analytical support: Key decision makers require
not only interaction with data files for data extraction using scientific and
planning models but also online processing support to analyze the impact
of some potential actions. A Decision Support System is created when
the system can extract data from relevant files and address it to the
models selected by the user.
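As a rough illustration of the first three functions (transaction processing, master file updates, and report generation), the following Python sketch shows how captured transactions might update a master stock file and feed a simple status report. The record layout, item names, and function names are illustrative assumptions, not part of the course material.

# Illustrative sketch only: the record layout and function names are
# assumptions made for demonstration, not prescribed by this unit.
from collections import defaultdict

def process_transactions(master_file, transactions):
    """Apply business transactions to the master (status) file."""
    for txn in transactions:
        item = txn["item"]
        if txn["type"] == "receipt":      # goods received
            master_file[item] += txn["qty"]
        elif txn["type"] == "issue":      # goods issued or sold
            master_file[item] -= txn["qty"]
    return master_file

def stock_status_report(master_file):
    """Generate a simple information report for managers."""
    return [f"{item}: {qty} units on hand" for item, qty in sorted(master_file.items())]

# Example run: the master file reflects the status after current transactions.
master = defaultdict(int, {"bolts": 500, "nuts": 300})
txns = [
    {"item": "bolts", "type": "issue", "qty": 120},
    {"item": "nuts", "type": "receipt", "qty": 50},
]
process_transactions(master, txns)
print("\n".join(stock_status_report(master)))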

1.6 TYPES OF INFORMATION SYSTEMS


Information systems can be classified into several types based on their
functions, organizational level, and nature of data processed:
• Transaction Processing Systems (TPS)
• Management Information Systems (MIS)
• Decision Support Systems (DSS)
• Executive Information Systems (EIS)
• Expert Systems (ES)
• Artificial Intelligence Systems (AI)
• Enterprise Resource Planning Systems (ERP)
• Supply Chain Management Systems (SCM)
• Customer Relationship Management Systems (CRM)
• Knowledge Management Systems (KMS)

Transaction Processing System (TPS):


A transaction processing system is an information system that processes data
resulting from business transactions. Its goal is to process transactions
so that records can be updated and reports can be generated, i.e., to perform
storekeeping functions. Transactions are processed in one of two modes: batch
processing and online transaction processing.

Examples: Billing systems, payroll systems, stock control systems.

Management Information System (MIS):


A Management Information System is intended to take relatively raw data
available through a Transaction Processing System and summarize and
aggregate it for the manager, usually in the form of a report. Middle
management and operational supervisors are likely to use its reports. MIS
generates a wide range of report types. A summary report, an on-demand
report, an ad-hoc report, and an exception report are among the reports
available.

Examples: Sales management systems, Human resource management


systems.
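As a rough illustration of how an MIS might turn TPS data into a summary report and an exception report, consider the following Python sketch. The field names and the sales target used for the exception report are assumed for demonstration only.

# Hypothetical sketch of MIS-style reporting over TPS data; the field names
# and the target value are illustrative assumptions.
sales = [
    {"region": "North", "rep": "A", "amount": 12000},
    {"region": "North", "rep": "B", "amount": 4000},
    {"region": "South", "rep": "C", "amount": 9500},
]

# Summary report: total sales per region.
summary = {}
for row in sales:
    summary[row["region"]] = summary.get(row["region"], 0) + row["amount"]

# Exception report: reps whose sales fall below an assumed target of 5000.
TARGET = 5000
exceptions = [row["rep"] for row in sales if row["amount"] < TARGET]

print("Summary by region:", summary)
print("Below-target reps (exception report):", exceptions)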

Decision Support System (DSS):


A Decision Support System (DSS) is an interactive information system that
provides information, models, and data manipulation tools to assist decision-
making in semi-structured and unstructured situations. The end user is more
involved in creating DSS than an MIS because DSS includes tools and
techniques to assist in gathering relevant information and analyzing options
and alternatives.

Examples: Financial planning systems, Bank loan management systems.

Expert System:
Expert systems include expertise to assist managers in diagnosing and
solving problems. These systems are based on artificial intelligence research
principles. An expert system is a knowledge-based information system. It acts as an
expert consultant to users by applying its knowledge of a specific area. An
expert system's components are a knowledge base and software modules.
These modules perform knowledge inference and provide answers to user
questions.

Office Automation System:


An office automation system is a type of information system that automates
various administrative processes such as documenting, data recording, and
office transactions. The administrative and clerical activities are separated in
the office automation system. Email, voice mail, and word processing are
some of the business activities performed by this type of information system.

Executive Support System:


An Executive Support System (ESS) assists top-level executives in planning
and controlling workflow as well as making business decisions. It is similar
to the Management Information System (MIS).

• It provides advanced telecommunications, better computing capabilities, and


effective display options to executives, among other things.
• It provides information to them in the form of static reports, graphs, and
textual information on demand.
• It helps monitor performance, track competitor strategies, and forecast
future trends, among other things.

1.7 THE COMPONENTS OF INFORMATION


SYSTEMS
An information system is a collection of hardware, software, and
telecommunication networks that people construct to collect, create, and
distribute useful data, usually within an organization. It defines the
information flow within the system. An information system's goal is to
provide appropriate information to the user, gather data, process data, and
communicate information to the system's user.

Fig 1.2: Components of Information Systems

The components of the information system are as follows:


• Computer Hardware:
Physical equipment used for input, output, and processing. The
hardware structure depends upon the type and size of the organization. It
consists of input and output devices, a processor, and media devices. This
also includes computer peripheral
devices.
• Computer Software:
The system and application programs used to control and coordinate the
hardware components. Software is used for analyzing and processing the data.
These programs include a set of instructions used for processing
information.

1.8 MAJOR PROCESSING FUNCTIONS IN


INFORMATION SYSTEMS
In information systems, processing functions refer to the operations
performed on data, such as data input, manipulation, storage, and retrieval, to
produce meaningful information. It entails converting raw data into a format
that can be used for decision-making, reporting, or analysis. The goal is to
support an organization's information needs by making data accessible,
accurate, and useful.
The following are the primary functions of information systems:

• Input and capture of data: It is the process of entering data into a


computer system. This can be accomplished through a variety of
methods, including manual entry, scanning, and electronic transfer.
• Data Storage and retrieval: This is the process of storing data in a
system for later use. The data can be saved in a database, a file system,
or in the cloud.
• Data processing and analysis: This is the process of converting raw
data into useful information. Data validation, sorting, and calculation are
examples of such tasks.
• Decision-making and problem-solving: The process of selecting the
best option from a set of alternatives. The process of identifying and
resolving a problem or issue is known as problem-solving. Both
processes necessitate gathering information, weighing options, making a
decision, and putting a solution in place. Critical thinking and clear,
logical reasoning are required for effective decision-making and
problem-solving.
• Information output and dissemination: This is the process of
presenting processed data in a meaningful way, such as by creating
reports, visualizations, or sending notifications.
• Data Maintenance: This is the process of updating and managing data
in a system. Backups, data archiving, and data deletion are examples of
such tasks.
• Data security and protection: This refers to the process of preventing
unauthorized access to and modification of data stored in a system.
Encryption, authentication, and access control are examples of such
tasks.

These functions collaborate to ensure that data is collected, processed, stored,


and presented in a way that meets an organization's needs.
1.9 HOW TO APPLY INFORMATION SYSTEMS

IN BUSINESS?
Here are some of the business activities that require the intervention of an
information system.

Enterprise resource planning (ERP):


Enterprise Resource Planning (ERP) is a type of software that integrates
different functions of an organization into a single system. The purpose of
ERP is to streamline and automate business processes, such as financials,
human resources, procurement, supply chain management, and customer
relationship management. The goal of ERP is to provide a single source of
truth for an organization's data and to improve decision-making by giving
executives and managers real-time access to accurate information. ERP
systems can vary in complexity and scope, ranging from basic systems that
handle simple tasks to complex, multi-module systems that can manage the
entire operations of a large enterprise.
Many ERP systems are web-based and can be accessed from anywhere with
an internet connection. ERP implementation can be a complex and time-
consuming process, but it can bring many benefits to an organization,
including increased efficiency, reduced errors, better visibility into business
operations, and improved decision-making. However, it is important to
carefully evaluate an organization's needs and choose an ERP system that is
appropriate for the organization's size, budget, and goals.

Supply chain management (SCM):


Supply Chain Management (SCM) is the coordination and management of
activities involved in the production and delivery of products and services to
customers. It involves managing the flow of materials, information, and
financial capital from suppliers, through the organization, and out to
customers. SCM encompasses a wide range of activities, including
procurement, production planning, inventory management, transportation,
warehousing, and customer service. The goal of SCM is to optimize the flow
of goods and services, improve the efficiency of the supply chain, and
enhance the overall customer experience.

Effective SCM requires collaboration and communication between all


participants in the supply chain, including suppliers, manufacturers,
distributors, and customers. This can be achieved through the use of
technologies such as electronic data interchange (EDI), RFID (Radio
Frequency Identification), and cloud-based collaboration tools. In today's
fast-paced business environment, managing the supply chain is becoming
increasingly complex and challenging. Companies must be able to respond
quickly to changes in demand, minimize the risk of supply chain disruptions,
and meet the evolving needs of customers. A well-designed and efficiently
managed supply chain can help companies to improve their bottom line and
achieve a competitive advantage in their markets.

Customer relationship management (CRM):
Customer Relationship Management (CRM) is a strategy that organizations
use to manage their interactions with customers and potential customers. The
goal of CRM is to create and maintain strong, lasting relationships with
customers by understanding their needs and behaviors and by delivering the
products, services, and experiences that they value. CRM is typically
achieved through the use of software and technology. CRM systems can
collect and store data about customers, including demographic information,
purchase history, and interaction history with the organization. This
information can be used to inform business decisions, such as which products
to develop or which customers to target with marketing campaigns.

CRM can encompass a wide range of activities, including sales management,


marketing, customer service and support, and customer analytics. By
centralizing customer data and automating many of the processes involved in
managing customer interactions, organizations can improve the efficiency of
their customer-facing operations and provide a better customer experience. In
today's business environment, the effective management of customer
relationships is critical to success. With the increasing competition and the
rise of digital channels, companies must be able to effectively manage their
interactions with customers to build strong, long-lasting relationships and
stay ahead of the competition.

Activity B
What is the role of information systems in business and society?
…………………………………………………………………………………
…………………………………………………………………………………
…………………………………………………………………………………
…………………………………………………………………………………
…………………………………………………………………………………
…………………………………………………………………………………
…………………………………………………………………………………
…………………………………………………………………………………

1.10 FACTS OF INFORMATION SYSTEMS


The products of information technology are part of our daily lives. Here are
some of the facts about information systems.

• Necessary for businesses to grow


Every organization has computer-related operations that are critical to
getting the job done. In a business, there may be a need for computer
software, implementation of network architecture to achieve the
company’s objectives, or designing apps, websites, or games. So, any
company that is looking to secure its future needs to integrate a well-
designed information system.
• Better data storage and access
Such a system is also useful for storing operational data, documents,
communication records, and histories. As manual data handling can cost a
lot of time, information systems can be very helpful here. An information
system stores data in a structured manner, making the process of finding the
data much easier.
• Better decision making
An information system helps a business in its decision-making process.
With an information system, it is easier to deliver all the essential
information needed to make better decisions. In addition, an information
system allows employees to communicate effectively. As documents are
stored in shared folders, sharing them with employees and accessing them
is much easier.

1.11 SUMMARY
In this unit, you have been introduced to information systems. First, we have
reviewed several definitions, focusing on the components of information
systems: technology, people, and process. Next, we have studied how the
business use of information systems has evolved over the years, from the use
of large mainframe computers for number crunching, through the
introduction of the PC and networks, all the way to the era of mobile
computing. Software and technology innovations allowed businesses to
integrate technology more deeply during each phase.
We are now at a point where every company uses information systems and
asks: does it bring a competitive advantage? So, in the end, that is really
what this course is about: what every businessperson should understand, namely
what an information system is and how it can be used to bring a competitive
advantage.

1.12 SELF-ASSESSMENT EXERCISES


1. Of the five primary components of an information system (hardware,
software, data, people, process), which do you think is the most
important to the success of a business organization? Write a one-
paragraph answer to this question that includes an example from your
personal experience to support your answer.

2. The Walmart case study introduced you to how that company used
information systems to become the world’s leading retailer. Walmart has
continued to innovate and is still looked to as a leader in the use of
technology. Do some original research and write a one-page report
detailing a new technology that Walmart has recently implemented or is
pioneering.

3. "Internal information is used for day-to-day decision making whereas


external information is crucial for long-term planning." Comment.

1.13 FURTHER READINGS
1. “Does IT Matter?” by Nicholas Carr
2. "Information Systems: A Manager's Guide to Harnessing Technology"
by John Gallaugher
3. Wikipedia entry on "Information Systems," as displayed on August 19,
2012. Wikipedia: The Free Encyclopedia. San Francisco: Wikimedia
Foundation.
4. Information Systems Today - Managing in the Digital World, fourth
edition. Prentice-Hall, 2010.
5. Management Information Systems, twelfth edition, Prentice-Hall, 2012.

UNIT 2 INTRODUCTION TO MANAGEMENT
INFORMATION SYSTEM

Objectives
After studying this unit, you will be able to:
• Understand the organizational and strategic view of MIS.
• Understand the different components of MIS in depth.
• Understand the various terms involved in MIS.

Structure
2.1 Introduction to Management Information System (MIS)
2.2 Organizational & Strategic View of MIS
2.3 Information Systems and Technology
2.4 Database Management Systems
2.5 Data Analytics
2.6 Network and Telecommunications
2.7 Enterprise Resource Planning (ERP) Systems
2.8 Electronic Commerce (e-commerce)
2.9 Cybersecurity & Data Privacy
2.10 Business Intelligence & Decision Making
2.11 Project Management
2.12 System Development Life Cycle
2.13 IT Strategy and Management
2.14 Ethics and Legal Issues in Information Systems
2.15 Summary
2.16 Self-Assessment Exercises
2.17 Further Readings

2.1 INTRODUCTION TO MANAGEMENT


INFORMATION SYSTEM (MIS)
Definitions:
Management: Management has been defined as a process, a function, and a
profession concerned with the activity of accomplishing tasks with and
through people. Managers perform a variety of tasks such as planning,
directing, controlling, staffing, leading, and motivating.
Information: Information can be defined as collections of facts, figures, and
symbols that have been processed for the current decision-making situation.
The information is thought to be important in a specific situation.
System: A system is defined as a collection of related components, activities,
processes, and humans interacting to achieve a common goal.

When all three components are considered together, it is clear that


Management Information Systems are collections of related processes,
activities, individuals, or entities that interact to provide processed data to
individual managers at various levels in various functional areas. Some of the
characteristics to keep in mind when defining Management Information
Systems are as follows. Management information systems are primarily
intended to provide information derived from data after it has been processed;
information systems do not generate data. Data generated by business
operations in an organisation is collected, recorded, stored,
processed, and retrieved.
established for processing the data generated within the organisation.

Information systems are created for job positions rather than individuals.
Regardless of who holds the job position, information systems are designed
with the job responsibilities that the individual is supposed to perform in
mind and depend on the information needs of that position in the
organisational hierarchy.
different levels of management; they are intended to meet the information
needs of top, middle, and junior management decision-makers. The
information systems are intended to provide information to managers in
various functional areas. Managers in marketing, finance, production,
personnel, materials, logistics, and other areas receive the information.
Databases should be used to integrate information systems. Integrating
information systems eliminates data storage, processing, and report
generation redundancy. To reduce the likelihood of data integrity
discrepancies, ensure single-point data entry and upkeep of master data files.
Computers and other electronic devices help to facilitate information
systems.

Fig. 2.1: Overview of Information System

The importance of MIS:


Management Information System (MIS) refers to the use of information
technology to support and manage organisational operations. It is an
integrated approach to the use of technology, people, and data to provide
relevant and accurate information for effective decision-making and problem-
solving.

The importance of MIS can be understood through the following points:

• Improved Decision Making: MIS provides access to real-time data,


which allows organisations to make informed decisions based on
accurate and up-to-date information.
• Increased Efficiency: MIS automates many manual processes, freeing
up time and resources and reducing the risk of human error.
• Better Collaboration: MIS enables employees to share information and
collaborate on projects more effectively, regardless of physical location.
• Competitive Advantage: MIS helps organisations stay ahead of their
competition by providing them with the tools and information they need
to make strategic decisions.
• Improved Customer Service: MIS can help organisations understand
customer needs and preferences, leading to improved customer
satisfaction and increased customer loyalty.
• Cost Savings: MIS can help organisations reduce costs by automating
manual processes, reducing errors, and improving operational efficiency.

MIS plays a critical role in the success of modern organisations by providing


the information and tools needed for effective decision-making, improved
efficiency, and enhanced collaboration.

Disadvantages of MIS:
Management Information Systems (MIS) is a significant tool for
organisations to help them manage their data and information effectively, but
like any system, it has its disadvantages.

Some of the major disadvantages of MIS are:

• High Cost: Implementing and maintaining an MIS system can be


expensive. The costs involved include hardware, software, staffing, and
training expenses.
• Complexity: MIS systems can be complex and difficult to use,
especially for people with limited technical skills.
• Dependence on Technology: An MIS system relies on technology and
can be vulnerable to system failures, software bugs, and security
breaches.
• Inaccurate Data: The accuracy of the data in an MIS system depends on
the quality of the data that is entered into it. If the data is inaccurate, the
reports generated by the system will also be inaccurate.
• Resistance to Change: People within organisations may resist using an
MIS system due to fear of change, lack of trust in technology, or
difficulty in learning how to use it.
• Maintenance Requirements: MIS systems require ongoing
maintenance and upgrades to keep them functioning effectively, and this
can be time-consuming and costly.
• Limited Flexibility: MIS systems can be rigid and inflexible, making it
difficult to accommodate changes in the organisation's needs or
processes.

2.2 ORGANIZATIONAL & STRATEGIC VIEW


OF MIS
The organisational and strategic view of Management Information Systems (MIS) is
a top-down perspective that considers how the technology aligns with the
company's overall goals and objectives. This viewpoint regards MIS as a tool
for decision-making and assisting organisations in achieving their objectives.
The emphasis is on how technology is used to support the company's
strategies and how it is integrated into the overall organisational structure.
According to this viewpoint, MIS is more than just a collection of technical
systems; it is a critical component of overall business strategy.
A retail company that uses customer data to drive its marketing and sales
strategies is an example of an organisational and strategic view of MIS.
Through its POS systems, website, and social media presence, the company
collects data on customer purchases, demographics, and online behaviour.
The MIS department then analyses this data to identify patterns and trends
that can be used to inform the company's marketing decisions, such as
targeted promotions and personalised advertising.

The company can make data-driven decisions that can lead to increased sales,
customer loyalty, and competitive advantage by integrating technology and
data analysis into its overall business strategy. In this case, the MIS function
is viewed as a critical component of the company's overall growth and
success strategy.
In the following ways, Management Information System (MIS) is related to
organisations and strategy:
• Assists decision-making: MIS provides decision-makers with the
pertinent information they require to make informed decisions.
• Aligns with organisational strategy: MIS is designed and implemented
to support the overall strategy and goals of the organisation.
• Increases operational efficiency: MIS automates many business
processes, which reduces errors and increases efficiency.
• Promotes communication and collaboration: MIS promotes
communication and collaboration among departments and employees,
thereby improving organisational coordination and alignment.
• Improves competitiveness: MIS assists organisations in remaining
competitive by providing timely, accurate information and allowing them
to respond quickly to changing market conditions.

MIS is critical in assisting organisations in achieving their objectives by
providing the information and tools required for effective decision-making
and operational performance.

2.3 INFORMATION SYSTEMS AND


TECHNOLOGY
Computer hardware, software, and data management are among the
fundamentals of information systems and technology. Information Systems
(IS) are a collection of interconnected components that work together to
collect, process, store, and disseminate information to support decision-
making and control within an organisation. The tools and techniques used to
develop, operate, and maintain information systems are referred to as
technology. Hardware, software, and telecommunications equipment, as well
as the processes used to design, develop, and implement information systems,
are all included.

As managers, the majority of you will work for companies that heavily rely
on information systems and make significant investments in information
technology. You will undoubtedly want to learn how to invest this money
wisely. Your company can outperform competitors if you make wise
decisions. You will waste valuable capital if you make poor decisions. This
course is intended to assist you in making informed decisions about
information technology and information systems. Information technology
(IT) encompasses all of the hardware and software that a company requires to
meet its business objectives. This includes not only computer machines,
storage devices, and handheld mobile devices, but also software such as the
Windows or Linux operating systems, the Microsoft Office desktop
productivity suite, and the many thousands of computer programmes found in
the average large firm.

"Information systems" are more complex and are best understood by


examining them from both a technological and a business standpoint. The
Management Information Systems (MIS) Information Systems and
Technology (IS&T) perspective focuses on the technical and functional
aspects of information systems. This point of view is concerned with the
design, development, implementation, and maintenance of the technology
infrastructure that supports the information needs of the organisation.
Hardware, software, data management, and network systems are all included.

The IS&T viewpoint is concerned with how information systems can be used
to support an organisation's day-to-day operations, automate manual
processes, and provide real-time access to information. This viewpoint
regards information systems as a collection of tools that can be used to
improve efficiency, cut costs, and boost productivity. Implementing an
enterprise resource planning (ERP) system, developing a customer
relationship management (CRM) system, and deploying a cloud-based data
storage and management system are all examples of IS&T in the MIS
context. These technologies can be used to support organisational operations,
manage data, and provide information for decision-making.
2.4 DATABASE MANAGEMENT SYSTEMS
Database Management Systems are concerned with the design, development,
and administration of databases, including data models and SQL. A database
is a collection of data that has been organised to efficiently serve many
applications by centralising the data and controlling redundant data. Instead
of storing data in separate files for each application, data is stored in such a
way that it appears to users to be stored in only one location. A single
database can support multiple applications. Instead of storing employee data
in separate information systems and files for personnel, payroll, and benefits,
a company could create a single common human resources database. A
database management system (DBMS) is software that allows an organisation
to centralise data, manage it efficiently, and provide application programmes
with access to the stored data. The database management system (DBMS)
serves as a bridge between application programmes and physical data files.
When an application programme requests a data item, such as gross pay, the
DBMS locates it in the database and returns it to the application.
Using traditional data files, the programmer would need to specify the size
and format of each data element used in the programme, as well as tell the
computer where to find them. By separating the logical and physical views of
the data, the DBMS relieves the programmer or end user of the task of
understanding where and how the data are stored. The logical view depicts
data as it would be perceived by end users or business specialists, whereas
the physical view depicts data organisation and structure on physical storage
media.

Management Information Systems (MIS) from the Database Management


Systems (DBMS) perspective focuses on the management and organisation of
data within an organisation. This viewpoint regards the DBMS as the
foundation for efficiently and effectively storing, organising, and retrieving
data. The DBMS is in charge of ensuring the accuracy, reliability, and
consistency of the data stored in the system in the context of MIS. It also
includes tools for defining data relationships, enforcing data constraints, and
restricting data access. Organisations can use a DBMS to simplify the process
of storing, retrieving, and analysing data, resulting in more effective decision-
making.

The use of relational databases, such as Microsoft SQL Server and Oracle,
for storing and retrieving data, as well as NoSQL databases, such as
MongoDB and Cassandra, for storing and retrieving large amounts of
unstructured data, are examples of DBMS in the MIS context. The DBMS
chosen will be determined by the organisation's specific needs, such as the
size of the data set, the complexity of the data relationships, and the system's
performance requirements.
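A minimal sketch of these ideas, using Python's built-in sqlite3 module, is given below: a single common human resources table serves both a payroll query and a benefits query through SQL, which is the logical view the DBMS presents to applications. The table and column names are illustrative assumptions, not a prescribed schema.

# Minimal sketch using Python's built-in sqlite3 module; the table and column
# names are illustrative assumptions, not part of the course material.
import sqlite3

conn = sqlite3.connect(":memory:")   # a single common HR database
cur = conn.cursor()
cur.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, gross_pay REAL, benefit_plan TEXT)"
)
cur.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Asha", 52000.0, "standard"), (2, "Ravi", 61000.0, "premium")],
)
conn.commit()

# The payroll application and the benefits application both work against the
# same logical view, without knowing how the data is physically stored.
payroll = cur.execute("SELECT name, gross_pay FROM employee").fetchall()
benefits = cur.execute(
    "SELECT name, benefit_plan FROM employee WHERE benefit_plan = 'premium'"
).fetchall()

print("Payroll view:", payroll)
print("Benefits view:", benefits)
conn.close()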

2.5 DATA ANALYTICS


In the context of Management Information Systems (MIS), data analytics
refers to the process of gathering, cleaning, analysing, and visualising data to
extract insights and inform decision-making. This entails employing several
techniques and tools to identify patterns, relationships, and trends in data and
converting this information into actionable insights. Data analytics is an
important component of the MIS function because it allows organisations to
make data-driven decisions based on facts rather than intuition or
assumptions. Data Analytics' goal is to provide meaningful information that
can support decision-making, identify areas for improvement, and provide a
competitive advantage.

Examples of Data Analytics include the use of data mining to identify


patterns in customer behaviour, predictive analytics to forecast future trends,
and text analytics to extract insights from unstructured data such as social
media posts and customer reviews. Data analytics can be used in many
different business areas, including marketing, sales, customer service, and
operations.
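The following short Python sketch illustrates the gather-clean-analyse steps described above, assuming the widely used pandas library is available (it is not prescribed by this course); the sample data and column names are made up for demonstration.

# Illustrative data analytics sketch; assumes the pandas library is installed.
# The sample data and column names are invented for demonstration only.
import pandas as pd

raw = pd.DataFrame({
    "channel": ["web", "store", "web", "store", "web"],
    "revenue": [120.0, None, 85.5, 200.0, 95.0],   # one missing value to clean
})

clean = raw.dropna(subset=["revenue"])                                # cleaning step
summary = clean.groupby("channel")["revenue"].agg(["count", "mean", "sum"])  # analysis step

print(summary)   # a simple summary a manager could act on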

2.6 NETWORK AND TELECOMMUNICATIONS


The Network and Telecommunications perspective of Management Information
Systems (MIS) focuses on the communication and information-sharing aspects of
technology. This viewpoint takes into account the hardware and software that
enable organisations to exchange information and collaborate, both internally
and with external stakeholders. Network and telecommunications encompass
a wide range of technologies in the context of MIS, including local and wide
area networks, cloud computing, internet technologies, and mobile
communications. According to this viewpoint, these technologies are critical
enablers of information flow within the organisation and between the
organisation and its stakeholders.

Network and telecommunications examples in the MIS context include the


deployment of a virtual private network (VPN) to allow remote workers to
access the company's information systems, the use of cloud computing to
store and access data and applications, and the implementation of mobile
technologies to provide employees and customers with real-time access to
information. In the context of MIS, the goal of networks and
telecommunications is to support the flow of information and to enable
organisations to collaborate and share information efficiently and effectively.

2.7 ENTERPRISE RESOURCE PLANNING (ERP)


SYSTEMS
In the context of Management Information Systems (MIS), ERP Systems
refer to integrated software systems that manage a company's core business
processes such as finance, human resources, supply chain management, and
customer relationship management. ERP systems are intended to provide a
unified view of an organisation's data, obviating the need for separate
systems and databases.
ERP systems provide businesses with a centralised platform for managing
operations, streamlining processes, and increasing efficiency. ERP systems
enable organisations to make data-driven decisions, reduce manual processes,
and improve information accuracy by integrating information from various
departments and functions. SAP, Oracle, and Microsoft Dynamics are
examples of ERP systems. These systems are used in a variety of industries,
including manufacturing, retail, healthcare, and finance. The ERP system
chosen will be determined by the organisation's specific needs, such as its
size, the complexity of its operations, and the system's performance
requirements.

2.8 ELECTRONIC COMMERCE


This topic covers the principles of conducting business over the Internet,
including e-commerce strategies, online payment systems, and digital
marketing. E-commerce in the context of Management Information Systems
(MIS) refers to the buying and selling of goods and services over the Internet.
E-commerce enables organisations to reach a wider customer base, increase
sales, and reduce the costs associated with traditional brick-and-mortar retail.
From an MIS perspective, e-commerce requires the integration of a range of
information systems, including payment systems, customer relationship
management systems, and inventory management systems. This integration
allows organisations to provide a seamless customer experience, from
product discovery to purchase, and to manage the flow of information and
goods between different parts of the organisation.

Examples of e-commerce in the MIS context include online marketplaces,


such as Amazon and eBay, and online retail stores, such as Walmart and
Target. E-commerce also includes business-to-business (B2B) transactions,
such as the purchase of raw materials or supplies by manufacturers. The
growth of e-commerce has led to an increase in the use of mobile commerce,
which enables customers to make purchases using their mobile devices.

2.9 CYBERSECURITY & DATA PRIVACY


This topic discusses the various threats to computer systems and data,
including viruses, hackers, and malware, and the measures organisations can
take to protect their information systems. Cybersecurity refers to the
protection of digital devices, networks, and data from unauthorised access,
theft, and malicious attacks. In the context of Management Information
Systems (MIS), it involves implementing measures to secure computer
systems, networks, and data against cyber threats. Data privacy, on the other
hand, refers to the protection of personal information and sensitive data. In
MIS, it involves ensuring that data is collected, stored, and used in
compliance with regulations and laws such as GDPR, HIPAA, and others and
that the privacy rights of individuals are respected.

Both cybersecurity and data privacy are critical aspects of MIS as they help
protect an organisation's information assets, maintain the trust of customers
and stakeholders, and comply with legal requirements.
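As a small illustration of one authentication measure, the following Python sketch uses the standard hashlib and secrets modules to store and verify salted password hashes instead of plain passwords. The iteration count and storage format are illustrative assumptions; a production system would rely on a vetted authentication library and a broader set of controls.

# Minimal sketch of salted password hashing with Python's standard library.
# The iteration count and storage format are illustrative assumptions.
import hashlib
import secrets

def hash_password(password: str):
    """Return (salt, derived_key) for storage instead of the plain password."""
    salt = secrets.token_bytes(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)
    return salt, key

def verify_password(password: str, salt: bytes, stored_key: bytes) -> bool:
    """Check a login attempt against the stored salt and derived key."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)
    return secrets.compare_digest(candidate, stored_key)

salt, key = hash_password("s3cret!")
print(verify_password("s3cret!", salt, key))   # True
print(verify_password("wrong", salt, key))     # False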

2.10 BUSINESS INTELLIGENCE & DECISION
MAKING
This topic focuses on using data and analytics to make informed business
decisions, including data warehousing, business intelligence tools, and
dashboards. Business Intelligence (BI) in the context of Management
Information Systems (MIS) refers to the use of data, analytical tools, and
technology to support informed decision-making in an organisation. The goal
of BI is to turn data into actionable information that can inform and support
strategic, tactical, and operational decision-making. This can include
activities such as data collection and warehousing, data analysis, reporting,
and visualisation.

Decision-making in MIS involves using information and insights generated


from BI to make informed decisions about various aspects of an organisation
such as finance, operations, marketing, and others. The purpose of decision-
making in MIS is to support an organisation's ability to achieve its goals and
objectives by leveraging data and information to inform decisions. Both BI
and decision-making in MIS play critical roles in the success of an
organisation by providing the information and insights needed to make
informed decisions, monitor performance, and drive continuous improvement.

2.11 PROJECT MANAGEMENT


This topic covers the principles of managing projects, including project
planning, budgeting, and risk management. Project Management in the
context of Management Information Systems (MIS) refers to the application
of tools, techniques, and methodologies to plan, execute, and control
information technology (IT) projects. The goal of project management in
MIS is to ensure that IT projects are delivered on time, within budget, and to
the satisfaction of stakeholders. This can include activities such as project
planning, project scheduling, resource allocation, risk management, project
control, and project evaluation.

MIS project management involves the coordination of technical and business


resources to deliver IT solutions that support the goals and objectives of an
organisation. Effective project management in MIS requires clear
communication, collaboration, and alignment between technical and business
teams, and a focus on delivering value to stakeholders. Project management
in MIS is a critical aspect of the successful delivery of IT solutions and is
essential for ensuring that IT projects are delivered effectively, efficiently,
and to the satisfaction of stakeholders.

2.12 SYSTEM DEVELOPMENT LIFE CYCLE


This topic explores the various stages of developing a software system,
including requirements gathering, design, implementation, and testing. The
System Development Life Cycle (SDLC) in Management Information
Systems (MIS) refers to a process for creating and maintaining information
systems.
It consists of the following phases:
• Requirements gathering and analysis: This phase involves
understanding the needs and requirements of the stakeholders.
• Design: In this phase, the system is designed based on the requirements
gathered in the first phase.
• Development and testing: The system is developed and tested to ensure
it meets the design specifications.
• Implementation: The system is put into use in this phase.
• Maintenance: The system is monitored, maintained, and updated as
needed to keep it running smoothly.
• Retirement: The system is decommissioned when it is no longer needed.
The SDLC helps ensure that the system is developed in a systematic and
controlled manner, which reduces the risk of errors and improves the quality
of the final product. We will discuss more about SDLC in Unit – 3.

2.13 IT STRATEGY AND MANAGEMENT


This topic discusses the strategic role of information technology in an
organisation, including IT governance, alignment with business objectives,
and management of IT resources. IT strategy and management in the context
of Management Information Systems (MIS) refer to the planning and
implementation of technology systems to support and improve the
organisation's overall goals and objectives. IT strategy defines the long-term
direction for the use of technology in the organisation and involves the
following steps:
• Assessing current IT infrastructure and capabilities
• Determining business requirements
• Developing a vision for the use of technology
• Defining goals and objectives for IT
• Developing an action plan to achieve the goals

IT management involves the day-to-day operations and maintenance of the


technology systems and infrastructure, including:
• Resource allocation
• Project management
• Monitoring and controlling the technology systems
• Ensuring the security and availability of data and systems
• Providing support to users

The goal of IT strategy and management in the context of MIS is to align


technology with the organisation's goals and objectives, improve operational
efficiency, support decision-making, and enhance the overall competitiveness
of the organisation.
2.14 ETHICS AND LEGAL ISSUES IN INFORMATION SYSTEMS
This topic covers ethical and legal considerations related to information
technology, including data privacy, intellectual property, and cybercrime.
Ethics and legal issues in Information Systems (IS) in Management
Information Systems (MIS) are concerned with the responsible use and
protection of information and technology.

• Ethics: Refers to moral principles and values that guide behaviour in the
use of information and technology. This includes issues such as privacy,
accuracy, security and intellectual property.
• Privacy: Protecting the confidentiality and personal information of
individuals.
• Accuracy: Ensuring that the information stored and processed by IS is
accurate, up-to-date and free from errors.
• Security: Protecting the confidentiality, integrity and availability of
information and technology systems.
• Intellectual Property: Protecting the rights of creators and owners of
information, such as copyrights and patents.
Legal issues in IS in MIS include:
• Compliance with laws and regulations: Organisations must comply with
laws and regulations related to information and technology, such as data
protection laws and privacy regulations.
• Liability: Organisations can be held responsible for the misuse of
information or technology systems.
• Electronic contracts: The legality and enforceability of electronic
contracts and agreements.
The goal of addressing ethics and legal issues in IS in MIS is to ensure the
responsible and legal use of information and technology and to protect the
rights and interests of individuals and organisations.

2.15 SUMMARY
Management Information System (MIS) is a systematic approach to studying
the information needs and decision-making at all levels of an organization. It
helps managers make informed decisions through reports, dashboards, and
visualizations.
An MIS can have a wide range of functions, from providing basic reports to
conducting complex data analysis and decision-making support. It involves
the use of hardware like computers and servers, and software like databases
and reporting tools. Implementing an MIS can bring numerous benefits to an
organization, including enhanced decision-making, increased efficiency, and
better coordination among different departments. However, the effectiveness
of an MIS largely depends on the quality and precision of the data used, as
well as the organization's capability to utilize the information provided.

In conclusion, an MIS is essential for organizations of all sizes and industries. Providing timely and accurate information to managers can help
organizations to improve their operations, achieve their goals, and stay
competitive in today's fast-paced business environment.

2.16 SELF-ASSESSMENT EXERCISES


1) What is the role played by business information in an organisation?
2) Define a Management Information System and discuss various
characteristics expected of a good MIS.
3) How do Information Systems impact organisations and Business Firms?
Explain with examples.
4) "Internal information is used for day-to-day decision making whereas
external information is crucial for long-term planning". Comment.
5) "Finding better ways to innovate and develop new ideas is critical in a
marketplace" – Justify.
6) You work as a project manager in a large retail company that uses
various information systems to manage its operations, such as inventory
management, customer relationship management, and financial
reporting. As a project manager, you are responsible for ensuring that
these systems are implemented effectively and that they meet the
company's information needs.
i) Analyse the organisation's various information systems.
ii) Evaluate the data management and examine the decision-making processes.

2.17 FURTHER READINGS


• Berlind, David. "Google Apps Refresh Sets up Death Match with
Microsoft," Information Week, April 12, 2010.
• Easley, Robert F., Sarv Devaraj, and J. Michael Crant. "Relating Collaborative Technology Use to Teamwork Quality and Performance: An Empirical Analysis." Journal of Management Information Systems 19, no. 4 (Spring 2003).
• Clemons, Eric. "The Power of Patterns and Pattern Recognition When Developing Information-Based Strategy." Journal of Management Information Systems 27, no. 1 (Summer 2010).

UNIT 3 SYSTEM DEVELOPMENT LIFE CYCLE (SDLC)

Objectives
After studying this unit, you will be able to:
• Understand the importance of SDLC in MIS
• Learn in detail about the phases of SDLC
• Gain in-depth knowledge of the methodologies of the System Development Life Cycle

Structure
3.1 Introduction of System Development Life Cycle (SDLC)
3.2 Phases of SDLC
3.3 Methodologies of System Development Life Cycle
3.4 Benefits of System Development Life Cycle
3.5 Possible Drawbacks of SDLC
3.6 Summary
3.7 Self-Assessment Exercises
3.8 Further Readings

3.1 INTRODUCTION OF SYSTEM DEVELOPMENT LIFE CYCLE (SDLC)
The system development life cycle (SDLC), which was first introduced in the
1960s, has its roots in the development of the first software systems. Since then, the SDLC has evolved into a comprehensive process model that can be applied through a range of specific software development methodologies.
systems development life cycle (SDLC) is a project management conceptual
model describing the stages of an information system development project,
from initial feasibility studies to application maintenance. SDLC applies to
both technical and non-technical systems. In most cases, a system is an IT
technology that includes both hardware and software. SDLC is typically
attended by project and program managers, system and software engineers,
development teams, and end users.
Every hardware or software system will undergo a development process,
which is an iterative process with multiple steps. SDLC provides a rigid
structure and framework for defining the phases and steps involved in system
development. The acronym SDLC is also used for the software development life cycle, which is very similar to the systems development life cycle but focuses solely on software. (In networking, SDLC separately stands for Synchronous Data Link Control, which is unrelated.) An effective System Development Life Cycle
(SDLC) should result in a high-quality system that meets customer
expectations, is completed on time and within budget, and works effectively
and efficiently in the current and planned information technology infrastructure.

SDLC is a conceptual model that includes policies and procedures for


developing or changing systems throughout their life cycles. Analysts use
SDLC to create an information system. SDLC activities include the
following:
• Requirements
• Design
• Implementation
• Testing
• Deployment
• Operations
• Maintenance

3.2 PHASES OF SDLC


The Systems Development Life Cycle is a systematic approach that explicitly
breaks down the work required to implement a new or modified Information
System into phases. The System Development Life Cycle (SDLC) is a
comprehensive and systematic approach to developing and maintaining
information systems. The following are the seven stages of SDLC as shown
in fig 3.1:

Fig 3.1: System Development Life Cycle (SDLC)

• Planning: The organization identifies the need for a new system and
defines its objectives and scope during this stage. A feasibility study is
carried out to determine whether or not the project is feasible and the
resources required.
• Analysis: During this stage, the organization collects and analyses the
system requirements. Gathering requirements from stakeholders and
developing a detailed system specification are part of this stage.
• Design: The system design is created during this stage, which includes
the software and hardware architecture, database design, user interfaces,
and system security.
• Development: This stage entails the actual coding and development of
the system based on the previous stage's design. Developers design,
develop, debug, and test the system.
• Integration & Testing: The system is tested during this stage to ensure
that it meets the requirements and functions as expected. To validate the
system, various types of testing are performed, including unit testing,
integration testing, and acceptance testing.
• Implementation: The system is installed and deployed in a live
environment for end-users to use at this stage. The system is deployed in
a production environment and used by customers and end users.
• Maintenance: After the system has been deployed, this stage entails
providing support for it. The system may require maintenance and bug
fixes, as well as the addition of new features based on customer
feedback.

To summarise, the SDLC is a thorough and well-defined approach to system


development that ensures systems are delivered on time, within budget, and
with high quality.

Stage #1. Planning – What Are the Existing Problems?


Planning is one of the SDLC's core phases, covering nearly all of the
upcoming steps that developers must complete for a successful project
launch. This stage assists in setting up the problem or defining the pain that
the software can target, defining the system's objectives, and sketching a
rough plan of how the system will work. In other words, the planning process
aids in developing an idea of how a specific problem can be solved using a
specific software solution. This is critical for developers to better understand
the project before beginning to build software.

Furthermore, the planning stage includes an analysis of the resources and


costs required to complete the project, as well as an estimate of the overall
price of the software developed. Finally, the planning process clearly defines
the outline of system development, including setting deadlines and time
frames for each of the system development life cycle phases - all to launch
the product on time and present it to the market.

Stage #2. Analysis – What Do We Want?


When the planning phase is complete, the research and analysis phase begins.
This step entails gathering more specific data for your new system, such as
initial system prototype drafts, market research, competition analysis, and
so on. To complete the analysis and obtain all of the critical information for a
specific project, developers can generate system requirements, prioritize
them, draw alternatives, and determine the existing prototypes' pros and cons.
This also involves conducting market research to identify end-user pain points and needs, as well as developing concrete system goals and features to target.
Additionally, the SDLC analysis stage includes the creation of the Software Requirement Specification (SRS) document, which defines the upcoming
system development's software and hardware, functional, and network
requirements. In general, such a document shapes the project's strict
regulations and establishes the exact software model you want in the result.

Stage #3. Design – What Should It Look Like?


Design and prototyping are the next stages of a system development project.
Essentially, this process is a necessary precursor to the core developing stage,
which is why it is sometimes confused with the development process itself.
However, because it outlines the system interface, databases, core software features, user interface and usability, and the network and its requirements, this SDLC step can significantly reduce the time required to
develop the software. In general, these features aid in the finalization of the
SRS document as well as the creation of the prototype of the software to get a
general idea of how it should look.
The design process also includes the first reviews of previously drawn
ideas, as well as brainstorming some new concepts and solutions that fit
better - such an approach can significantly reduce the time and costs required
for the actual development of the software. Thus, once the design and
prototyping are completed and the operation plan is in place, the creators can
begin programming!
Stage #4. Development – Let’s Create It
The SDLC development stage focuses on the system creation process:
developers write code and build the app following the finalized requirements
and specification documents. In other words, it encompasses both the front-
end and the back-end development processes. Along with the core functions
of the software, it includes the application's UX/UI design - all of the basic
qualities the product must provide for its end-users.
Developers can use a variety of tools and programming environments written
in C++, PHP, Python, and other languages to fully meet the project
specifications and requirements. Though this phrase may appear to be simple,
it is still possible to implement minor changes and improvements if they
exist. However, the efficiency of the finalized version of the system created
can only be evaluated in the following stage - software testing.
Stage #5. Testing – Is It the Exact One We Needed?
The testing stage, like any other system development model, is one of the
most critical phases of SDLC. After the software is built, it is even more
critical to ensure that all of its features work correctly and coherently and that
they do not negatively impact the user experience. This process involves
detecting potential bugs, defects, and errors, searching for various
vulnerabilities, and so on, and can sometimes take even longer than the app-
building stage. Finally, developers usually produce a testing report that
includes a test case that lists all of the issues that were discovered and
resolved. You can also review the testing criteria to ensure that the software
product meets all of the requirements outlined in the SRS document.

Stage #6. Integration – How Will We Use It?

When the product is finished, it is time to integrate it into the specific


environment, which usually means installing it. At this point, the software
has completed its final testing in the training environment and is ready for
market presentation. As a result, the product is now available to a much
larger audience!

Stage #7. Maintenance – Let’s Make the Improvements


The final but not least important stage of the SDLC process is maintenance,
where the software is already in use by end users. Often, during the first few
months, developers will encounter issues that were not detected in the
testimonials, so they should immediately respond to the reported issues and
implement the changes required for the software's stable and convenient use.
This is especially important for large systems, which are typically more
difficult to test during the debugging stage.

3.3 SIX METHODOLOGIES OF SYSTEM DEVELOPMENT LIFE CYCLE
Now that you understand the fundamental SDLC phases and why they are
important, it's time to delve into the core methodologies of the system
development life cycle - the approaches that can assist you in delivering a
specific software model based on its major characteristics and features.
Overall, there are six popular SDLC methodologies that you can use. Let's go
over the main distinctions and peculiarities of each:

Waterfall Model
The Waterfall Model was the first to be introduced as a Process Model as
shown in Figure 3.2.

Fig 3.2: Waterfall Model


A linear-sequential life cycle model is another name for it. It is extremely simple to grasp and apply. In a waterfall model, each phase must be
completed before the next one can begin, and the phases do not overlap. The
Waterfall model was the first SDLC approach used in software development.
The waterfall Model depicts the software development process in a sequential
linear flow. This means that any phase of the development process can start
only after the previous phase is finished. The phases in this waterfall model
do not overlap.
The Waterfall model's sequential phases are as follows:

• Gathering and analyzing requirements: In this phase, all possible


requirements of the system to be developed are captured and documented
in a requirement specification document.
• System Design: In this phase, the requirement specifications from the
first phase are studied, and the system design is prepared. This system
design aids in the specification of hardware and system requirements, as
well as the definition of the overall system architecture.
• Implementation: Using system design inputs, the system is first
developed in small programs called units, which are then integrated into
the next phase. Unit Testing is the process of developing and testing each
unit for functionality.
• Testing: After testing each unit, all of the units developed during the
implementation phase are integrated into a system. Following
integration, the entire system is tested for flaws and failures.
• System deployment: After functional and non-functional testing is
completed, the product is deployed in the customer environment or
released to the market.
• Maintenance: There are a few issues that arise in the client
environment. Patches are released to address these issues. To improve
the product, newer versions are released. Maintenance is performed to
implement these changes in the customer environment.

Iterative Model
The Iterative model begins with a simple implementation of a small set of
software requirements and iteratively improves the evolving versions until
the entire system is implemented and ready for deployment. An iterative life
cycle model does not attempt to begin with a complete set of requirements.
Instead, development begins with specifying and implementing only a
portion of the software, which is then reviewed to identify additional
requirements. This process is then repeated, resulting in a new version of the
software at the end of each model iteration.

This includes a series of smaller "waterfalls" in which small portions of


changes are carefully analyzed, tested, and delivered via repeating
development cycles. Receiving early feedback from end users allows for the
elimination of issues and bugs in the early stages of software development.

Fig 3.3: Iterative model

Iterative and incremental development is a development model that combines


iterative design or iterative methods with an incremental build model. "More
than one iteration of the software development cycle may be in progress at
the same time during software development." This method is referred to as
"evolutionary acquisition" or "incremental build." The entire requirement is
divided into various builds in this incremental model. The development
module goes through the requirements, design, implementation, and testing
phases during each iteration. Each subsequent module release adds
functionality to the previous release. The process is repeated until the entire
system meets the requirements.

The key to using an iterative software development lifecycle successfully is


rigorous requirement validation, as well as verification and testing of each
version of the software against those requirements within each cycle of the
model. Tests must be repeated and extended as the software evolves through
successive cycles to verify each version of the software.

Spiral Model

Fig 3.4: Spiral Model


The spiral model combines the idea of iterative development with the waterfall model's systematic, controlled aspects. This spiral model is a hybrid
of the iterative development process model and the sequential linear
development model, also known as the waterfall model, with a heavy
emphasis on risk analysis. It enables incremental product releases or
incremental refinement with each iteration around the spiral.
The spiral model is best suited for large projects with similar, more
customized products, as it allows for repeated passage through the main
phases in a "spiral" motion. It allows for the simultaneous incorporation of
feedback on the first stages of a life cycle, significantly reducing the time and
costs required to implement the changes.
The spiral model is divided into four stages. A software project goes through
these phases in iterations known as Spirals.

• Identification:
This phase begins with gathering the baseline spiral's business
requirements. This phase is used to identify system requirements,
subsystem requirements, and unit requirements in subsequent spirals as
the product matures. This phase also includes continuous communication
between the customer and the system analyst to understand the system
requirements. The product is deployed in the identified market at the end
of the spiral.

• Design:
The Design phase begins with conceptual design in the baseline spiral
and progresses to architectural design, logical module design, physical
product design, and final design in subsequent spirals.

• Construct or Build:
At each spiral, the Construct phase refers to the production of the actual
software product. In the baseline spiral, when the product is still being
thought about and the design is being developed, a POC (Proof of
Concept) is created to solicit customer feedback. Then, in subsequent
spirals with greater clarity on requirements and design details, a working
model of the software known as a build with a version number is
produced. These prototypes are sent to the customer for review.

• Evaluation and Risk Analysis:


Identifying, estimating, and monitoring technical feasibility and
managing risks such as schedule slippage and cost overrun are all part of
risk analysis. After testing the build, the customer evaluates the software
and provides feedback at the end of the first iteration.

V-Model
The 'V-Model' is a modern version of the traditional software development
model. The letter 'V' represents verification and validation and is an extension
of the Waterfall model. The crux of the V-model is the connection between each phase of testing and the corresponding phase of development. The phases of testing are categorized as the "Validation Phase" and those of development as the
"Verification Phase". As a result, for each stage of development, a
corresponding test activity is planned ahead of time.

Fig 3.5: V-Model

Verification Phases:

• Requirement Analysis: The first step in software development is to


collect requirements. Requirements are business requirements that must
be met during the software development process. Business requirement
analysis is the process of understanding an aspect from a customer's
perspective by putting oneself in their shoes and thoroughly analyzing an
application's functionality from a user's perspective. An acceptance criteria layout is then prepared to correlate the tasks completed
during the development process with the overall effort's outcome.
• System Design: It entails creating a layout of the system/application
design that will be created. The goal of system design is to create
detailed hardware and software specifications. System design is further
classified into the following subcategories:
i) Architectural Design: Architectural design is concerned with the
development of technical methodologies to be used in the
completion of software development objectives. Architectural design
is frequently referred to as 'high-level design,' and it aims to provide
an overview of the solution, platform, system, product, and service.
ii) Module Design: Module design, also known as 'low-level design,'
aims to define the logic upon which the system will be built. We try
to depict the relationship between the modules and the order in
which they interact at this stage.

Validation Phases:
• Unit Testing Phase: Unit tests are designed to validate single modules
and identify and eliminate bugs. A unit test is simply running a piece of
code to see if it provides the desired functionality (a minimal sketch appears after this list).
• Integration Testing: Integration testing is the process of combining pieces of code to ensure that they perform as a single entity.
• System Testing: When the entire system is ready, the application is run
on the target environment in which it must operate, and a conclusion is
drawn to determine whether the system is capable of performing
efficiently with the shortest response time.
• User Acceptance Testing: The user acceptance test plan is created
during the requirement analysis phase because when the software is
ready to be delivered, it is tested against a set of tests that must be passed
to certify that the product met its goal.
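
To make the idea of a unit test concrete, the following is a minimal sketch in Python; the module, function, and fare values are hypothetical and used only for illustration, and real projects would organise such tests with a framework such as unittest or pytest.

# A hypothetical module under test: a tiny fare calculator.
def calculate_fare(distance_km: float, rate_per_km: float = 10.0) -> float:
    """Return the fare for a trip; negative distances are invalid."""
    if distance_km < 0:
        raise ValueError("distance cannot be negative")
    return round(distance_km * rate_per_km, 2)


# Unit tests that validate this single module in isolation.
import unittest

class TestCalculateFare(unittest.TestCase):
    def test_normal_trip(self):
        # 12.5 km at the default rate of 10.0 per km should cost 125.0
        self.assertEqual(calculate_fare(12.5), 125.0)

    def test_negative_distance_rejected(self):
        # invalid input should raise an error rather than return a fare
        with self.assertRaises(ValueError):
            calculate_fare(-1)

if __name__ == "__main__":
    unittest.main()

Each test exercises one small behaviour of one module, which is exactly the scope of the unit testing phase; integration and system testing then combine such tested units.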

The Big Bang Model


This model is ideal for clients who do not have a clear idea or vision of how
their final product should look. It is commonly used for creating and
delivering a wide range of ideas. As a result, delivering different system
variations that could more accurately define the final output provides a more
concrete vision of specific project completion. While it may be too expensive
to deliver a large project, this SDLC methodology is ideal for small or
experimental projects.
The Big bang model is an SDLC model that begins from scratch. It is the
most basic SDLC (Software Development Life Cycle) model because it
requires almost no planning. However, it requires more funds and coding, as
well as more time. The big bang model was named after the "Great Big
Bang," which resulted in the formation of galaxies, stars, planets, and so on.
Similarly, to build a product, this SDLC model combines time, effort, and
resources. The product is gradually built as the customer's requirements
arrive; however, the end product may not meet the actual requirements. The
diagram below depicts an overview of the Big Bang SDLC model.

Fig 3.6: The Big Bang Model

As new product requirements arrive, they are understood and implemented.


The entire module, or at least a portion of it, is integrated and tested. To
determine the cause, all modules are run separately and the defective ones are
removed. It is an appropriate model when the requirements are unclear and
the final release date is unknown. Simply put, it can be summed up in three steps. Specifically,
• Integrate each module to provide a unique integrated overview
• Test each module separately to identify any errors or defects
• If an error is discovered, isolate that module and determine the source of
the error.

Agile Model

This model is used for rapid and ongoing release cycles, to implement minor
but significant changes between releases. This implies more tests and
iterations and is mostly applicable to removing minor issues from larger,
more complex projects. As you can see, different SDLC methodologies are
used depending on the specifics of each project, its requirements, the client's
core vision, and other factors. Knowing the specific characteristics of each
SDLC model can assist in selecting the best one to deliver a high-quality,
effective product.
Agile is defined as quick or adaptable. The term "Agile process model" refers
to an iterative software development approach. Agile methods divide tasks
into smaller iterations or parts and do not involve long-term planning
directly. The project scope and requirements are established at the start of the
development process. The number of iterations, duration, and scope of each
iteration are all clearly defined in advance.

In the Agile process model, each iteration is considered a short time "frame,"
typically lasting one to four weeks. The division of the entire project into
smaller parts aids in reducing project risk and overall project delivery time
requirements. Before a working product is demonstrated to the client, each
iteration involves a team going through the entire software development life
cycle, including planning, requirements analysis, design, coding, and testing.

Fig 3.7: The Agile Model

i) Requirements gathering: You must define the requirements during this


phase. You should describe business opportunities and estimate the time
and effort required to complete the project. You can assess the technical
and economic feasibility based on this information.
ii) Design the requirements: Once the project has been identified,
collaborate with stakeholders to define requirements. You can use a user
flow diagram or a high-level UML diagram to demonstrate the functionality of new features and how they will interact with your existing system.
iii) Construction/ iteration: The work begins when the team defines the
requirements. Designers and developers begin work on their project,
which aims to deliver a functional product. The product will go through
several stages of development, so it will have simple, minimal
functionality.
iv) Testing: The Quality Assurance team examines the product's
performance and looks for bugs during this phase.
v) Deployment: During this phase, the team releases the working product into the user's work environment.
vi) Feedback: The final step after releasing the product is feedback. In this
stage, the team receives product feedback and works through it.

3.4 BENEFITS OF SYSTEM DEVELOPMENT LIFE CYCLE
The SDLC (System Development Life Cycle) is a methodical approach to
designing, developing, and maintaining software or information systems.
SDLC has the following advantages:
• Improved Quality: SDLC ensures that software is developed in a
systematic and organized manner, which helps to improve overall
software quality. This can result in fewer bugs and errors, as well as
improved overall performance.
• Reduced Costs: By using a structured approach to development, SDLC
assists in identifying and addressing problems early in the process,
lowering the cost of resolving issues later on.
• Improved Communication: SDLC encourages collaboration among
stakeholders, developers, and users, ensuring that everyone is on the
same page and working towards the same goals.
• Improved Control: Because SDLC employs a structured approach, it
provides greater control over the development process, making it easier
to manage resources, timelines, and budgets.
• Improved Maintenance: SDLC includes ongoing software maintenance
and support, which helps to ensure that it is up-to-date and functioning
properly.
• Reduced Risk: SDLC incorporates risk management into the
development process, allowing potential risks to be identified early and
mitigated.
• Improved Scalability: SDLC considers the need for software to be
scalable and flexible, which can assist in ensuring that it can grow and
adapt to changing needs over time.

In General, SDLC provides a structured approach to software development


that can aid in the improvement of quality, cost reduction, communication
and control, and risk reduction.

3.5 POSSIBLE DRAWBACKS OF SDLC

The System Development Life Cycle (SDLC) is a well-known software


development methodology that provides a structured framework for planning,
designing, implementing, testing, and maintaining software systems. While
SDLC has many advantages, it also has some disadvantages that can jeopardize the success of a software development project. SDLC may have the following
disadvantages:
• Time-consuming: SDLC can be a time-consuming process because it
entails several stages and activities that must be completed before the
project can proceed. This can cause project delays, which is problematic
for businesses that need to release software quickly.
• Rigid: SDLC can be a rigid process, with little flexibility or adaptability
to changing requirements. As a result, projects may fail to meet the needs
of stakeholders or become obsolete by the time they are released.
• Expensive: SDLC can be an expensive process because it necessitates a
significant investment in time, resources, and personnel. This can be a
barrier to entry for smaller businesses or startups that lack the financial
resources to support such an approach.
• Limited stakeholder involvement: SDLC can be a highly technical
process that does not always involve stakeholders outside of the
development team. This can result in projects that do not meet the needs
of end users or do not align with business objectives.
• Overemphasis on documentation: SDLC places a high value on
documentation, which can be time-consuming and does not always add
value to the project. This can also lead to an emphasis on process rather
than results, which can be counterproductive.

• Lack of testing focus: SDLC can sometimes overlook testing, resulting


in software that is buggy or does not function as intended. This can be a
major issue for businesses that rely on software to run their operations.

SDLC can be an effective approach to software development, but it is


important to be aware of its potential drawbacks and to tailor the approach to
the project's and organization's specific needs.

3.6 SUMMARY
To sum up, the system development life cycle is a complex project
management model that encompasses the system creation from its initial idea
to its finalized deployment and maintenance. The SDLC includes 7 different
stages: planning, analysis, design, development, testing, implementation, and
maintenance – all these are particularly important for delivering a high-
quality cost-effective product in the shortest time frames. Learning the basics
of the SDLC performance, its major methodologies, great benefits, and
possible drawbacks can help you to set up an ergonomic system development
process that will help you to deliver the best outcome.
The software development life cycle can be, and is, adapted by software development teams based on the philosophy, methodology, and framework
they use when developing a specific software product, or by organizations.
The SDLC is a project management tool that should be tailored to the needs
of the project, the team working on it, and other key stakeholders involved in
the process. The names of the phases, their order, and whether they are
distinct or merged into one another change. However, every software
development project has a life cycle, and you should now understand its role
in project management and as a tool for improving outcomes.

3.7 SELF-ASSESSMENT EXERCISES


Case Study:

Business Event: Customer Wants to Book a Taxi


Business Use Case: Make a Taxi Booking
Owner: Eva Josh, the chief despatcher

Regarding this case, other Stakeholders are the Accounting department for
details of customer accounts, Customers who use the taxi company's services,
Other despatchers who work for the taxi company, Public Carriage Office
who are responsible for setting the tariff for taxis.

As per the above case scenario, the owner wants to improve the performance
of the company and provide quality service through the System Development
Life Cycle. Address the questions below to ensure performance and quality.

1. Define the project scope and objectives: Write down the specific goals,
deliverables, and timeline for your project.

2. Analyze the requirements: Identify the stakeholders, their needs, and


expectations, and determine the functional and non-functional
requirements for the project.

3. Design the system: Create a high-level design of the system, including


architecture, data flow, and interfaces.

4. Choose the development methodology: Decide on the approach, such


as waterfall, agile, or hybrid, to be used in the project.
5. Build and test the system: Develop and test the system to ensure it
meets the requirements and is free of defects.

6. Deploy the system: Release the system to production and make it


available to end-users.

7. Monitor and maintain the system: Regularly monitor the system for
performance and take action to resolve any issues that arise.

8. Evaluate the project: Assess the success of the project by comparing


the actual results to the original goals and objectives.

3.8 FURTHER READINGS


• “Scenarios, Stories, Use Cases: Through the Systems Development Life-Cycle”, Wiley, September 2004, ISBN: 9780470861943
• https://www.linkedin.com/learning/software-development-life-cycle-sdlc

BLOCK 2
BUSINESS INTELLIGENCE AND
DECISION MAKING

UNIT 4 INTRODUCTION TO BUSINESS INTELLIGENCE

Objectives
After studying this unit, you will be able to:
• Understand Business Intelligence (BI) and related terminology in depth.
• Recognize the usage of various Business Intelligence tools and
techniques to collect, analyze, and interpret data from different sources.
• Gain insights into business operations, customer behaviour, market
trends, and other key areas of interest.
• Appreciate that the goal of BI is to provide decision-makers with the information they need to optimize business performance, reduce costs, increase revenue, and achieve strategic objectives.

Structure
4.1 Introduction to Business Intelligence
4.2 Data Warehousing
4.2.1 Data Modeling and schema design
4.3 Data Mining and Analytics
4.4 Data Governance and Security
4.5 Business Intelligence Applications
4.6 Summary
4.7 Self-Assessment Exercises
4.8 Further Readings

4.1 INTRODUCTION TO BUSINESS INTELLIGENCE
Business intelligence (BI) is the process of collecting, analyzing, and
presenting data to assist organizations in making informed business
decisions. Organizations can use BI tools and technologies to access and
analyze large amounts of data from various sources, transforming it into
actionable insights. To collect and analyze data, BI typically employs data
warehouses, data mining, and data visualization tools. It can assist businesses
in identifying trends, patterns, and relationships in their data, which can then
be used to inform strategic decisions and drive business growth.

Business intelligence can be used to analyze customer behaviour, track sales


performance, optimize supply chain operations, and monitor financial
performance. It is an essential component of modern business strategy, and
its importance is growing in today's data-driven business environment.
Business intelligence (BI) is a process that collects, organizes, and analyses
data to gain insights that can help organizations make informed decisions. BI systems collect data in a variety of ways, including data extraction from databases, web analytics, and social media platforms.

After the data is collected, it is converted into a format that can be analyzed
with BI tools such as data visualization software, dashboards, and reporting
tools. These tools enable businesses to analyze large datasets, identify trends,
and gain a better understanding of their operations. One of the most
significant advantages of business intelligence is that it allows organizations
to make data-driven decisions. Businesses can identify areas for improvement
in their operations and make changes as a result of real-time data analysis.
For example, if a company's sales are declining, it can use BI tools to identify
the root cause of the problem and take corrective action. Another advantage
of BI is that it can assist organizations in optimizing their operations.
Businesses, for example, can identify areas where they can improve customer
satisfaction and increase sales by analyzing customer behaviour. They can
also use business intelligence to track inventory levels and optimize supply
chain operations to save money.

Assume a retail company wants to increase sales in its physical stores. The
company can use business intelligence to analyze customer behaviour and
identify areas where operations can be improved. First, the company can
collect data from various sources, such as point-of-sale systems, customer
loyalty programs, and social media platforms, using data mining techniques.
This information may include customer demographics, purchasing habits, and
product preferences. The company can then use data visualization tools to
build dashboards and reports that highlight key performance metrics like
sales per store, sales per employee, and customer satisfaction ratings. These
reports can assist the company in identifying customer behaviour trends and
patterns, as well as tracking the effectiveness of marketing campaigns and
promotions.
The company can make data-driven decisions to improve its operations based
on the insights gained from data analysis. For example, if data shows that a
specific product sells well in one store but not in others, the company can
stock more of that product in the underperforming stores. They can also use
the data to identify peak shopping hours and adjust staffing levels
accordingly to ensure that customers are served as soon as possible. By
analyzing customer behavior with business intelligence, the company can
optimize operations and increase sales in its physical stores. They can also
use the data analysis insights to improve their online sales and marketing
efforts, resulting in higher revenue and profits.
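
As a minimal illustration of this kind of analysis, the short Python sketch below computes total sales per store from a handful of made-up point-of-sale records and flags underperforming stores; the column names, store names, and figures are assumptions for illustration only, and a real BI tool would present the same result as a dashboard chart.

import pandas as pd

# Hypothetical point-of-sale records; a real pipeline would pull these
# from transactional databases or a data warehouse.
sales = pd.DataFrame({
    "store":  ["North", "North", "South", "South", "East"],
    "amount": [1200.0, 450.0, 300.0, 150.0, 900.0],
})

# Key performance metric: total sales per store.
sales_per_store = sales.groupby("store")["amount"].sum().sort_values(ascending=False)
print(sales_per_store)

# Flag stores selling less than half of the average store total, for example
# as candidates for restocking, promotions, or staffing changes.
underperforming = sales_per_store[sales_per_store < 0.5 * sales_per_store.mean()]
print("Underperforming stores:", list(underperforming.index))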

Purpose of BI:
Business intelligence (BI) serves several functions, including the following:

Improved decision-making: Business intelligence (BI) provides


organizations with real-time insights into their operations, allowing decision-
makers to make more informed decisions. Organizations can identify trends,
patterns, and relationships in data that will help them identify opportunities
and potential risks.
Increased efficiency: Business intelligence can assist organizations in
optimizing their operations and increasing efficiency. Organizations can
identify areas where they can cut costs and streamline operations by
analyzing data on inventory levels, sales performance, and customer
behaviour. BI can provide organizations with a competitive advantage by
providing insights that help them make better decisions. By analyzing
competitor data, businesses can identify areas where they can improve their
operations and gain a competitive advantage.
Better customer insights: Business intelligence can assist organizations in
gaining a better understanding of their customers. Organizations can identify
preferences and trends in customer behaviour, which can help them tailor
their products and services to meet customer needs.

Improved collaboration: Business intelligence tools and technologies allow


organizations to share data and insights with employees throughout the
organization. This can improve collaboration and help teams collaborate
more effectively.

Faster reporting and analysis: BI tools can automate data collection,


analysis, and reporting, saving time and improving accuracy. Organizations
can generate reports faster and make decisions in real time by automating
these processes.

Historical development of BI:


Business intelligence (BI) has been around since the mid-nineteenth century.
An overview of BI's historical development is as:
• Early decision-making systems: Companies began using simple
decision-support systems to analyze financial data in the mid-twentieth century. These systems analyzed data and generated reports using basic
statistical methods.
• Databases are on the rise: Databases became more common in the
1960s and 1970s, and businesses began to use them to store and manage
large amounts of data. This resulted in the creation of more sophisticated
data analysis tools.
• The advent of data warehousing: Data warehousing first appeared in the 1980s as a method for businesses
to store and manage large amounts of data. Companies were able to
analyze data from multiple sources and generate more comprehensive
reports as a result of this.
• The rise of OLAP: Online analytical processing (OLAP) tools gained
popularity in the 1990s. These tools allowed users to quickly analyze
data from various perspectives and generate reports.
• The evolution of data mining: Data mining emerged in the late 1990s
and early 2000s as a method for businesses to identify patterns and
trends in large amounts of data. Companies were able to gain new
insights into their operations and make better decisions as a result of this.
• The emergence of big data: The growth of big data over the last decade
has resulted in the development of new BI tools and technologies. These
tools enable businesses to analyze massive amounts of data in real-time
and generate previously unattainable insights.

Fig 4.1: Historical development of BI

BI's historical development has been fueled by technological advances and an


increasing need for organizations to analyze data and make informed
decisions. As data volumes increase, BI will evolve, with new tools and
technologies emerging to assist organizations in gaining insights into their
operations and making better decisions.
Key components of a BI system:
A business intelligence (BI) system consists of several components that work
together to collect, process, and analyze data. Below are the key components
of a BI system:

Fig 4.2: Key components of BI System


i) Data sources:

The data sources are the first component of a BI system. This includes all
data-generating and data-capture systems and applications, such as
transactional databases, customer relationship management (CRM)
systems, and social media platforms. Data sources are the raw materials
from which a BI system generates insights. Internal databases, external
sources such as social media and market research, and data warehouses
are examples of these sources. To provide a comprehensive view of the
organization's performance, a BI system must be able to access and
integrate data from all of these sources.

ii) Data integration:


The data integration layer is the second component of a BI system. This
component collects data from various sources, formats it, and loads it
into a central data repository. The process of combining data from
various sources and transforming it into a usable format for analysis is
known as data integration. Cleaning, validating, and standardizing data is
required to ensure accuracy and consistency. Data integration aims to
create a unified view of data that can be used for analysis and reporting.

iii) Data warehouse:


The data warehouse is the third component of a BI system. This is a
central repository of integrated data that is optimized for reporting and
analysis. The data warehouse stores historical information and allows for
complex queries and analysis. A data warehouse is a central repository
where all integrated data is stored in a structured format that is optimized
for analysis. The data warehouse is intended to facilitate efficient
querying and reporting by organizing data into subject areas that
correspond to the organization's business processes. In addition, the data
warehouse offers a historical perspective on the data, allowing users to
analyze trends and patterns over time.

iv) Data modeling:


The data modelling layer is the fourth component of a BI system. This
component includes tools and technologies that enable users to create
data models that represent data element relationships. This encompasses
both logical and physical data models. Data modelling is a critical
component of Business Intelligence (BI) that entails developing a
conceptual representation of the data used in a BI system. A data model
is a graphical representation of data and the relationships between data
entities. The goal of data modelling is to provide a framework for
understanding and organizing data in a business intelligence system.

v) Business intelligence tools:


Business intelligence tools are the fifth component of a BI system. This
includes tools for reporting, dashboards, data visualization, and ad-hoc
querying. Users can use these tools to analyze data and generate reports
and visualizations to help them make decisions. BI tools are software applications that allow users to access, analyze, and visualize data to gain
insights into their organization's performance. BI tools are an important
part of a Business Intelligence (BI) system, which is designed to give
users the information they need to make data-driven decisions. Data from
a variety of sources, including databases, spreadsheets, and other data
sources, is processed using BI tools. They provide a range of capabilities,
including reporting, data visualization, data mining, predictive analytics,
and OLAP (Online Analytical Processing). BI tools are designed to make
it easy for users to access and analyze data without requiring advanced
technical skills.

vi) Analytics and data mining:


The analytics and data mining layer is the sixth component of a BI
system. This section contains tools for statistical analysis, predictive
modeling, and data mining algorithms. Users can use these tools to
identify patterns and trends in data and make predictions based on
historical data. Analytics in business intelligence entails the application
of statistical and mathematical techniques to analyze data and generate
insights that can assist organizations in making informed decisions.
Regression analysis, time series analysis, and clustering analysis are
examples of such techniques. BI analytics tools allow organizations to
analyze data in real-time, perform ad hoc analysis, and create custom
reports and dashboards. In business intelligence, data mining entails
using machine learning algorithms and statistical techniques to uncover
hidden patterns and relationships in data. Techniques such as association
rule mining, decision trees, and neural networks are examples of this.
Data mining tools in BI enable organizations to identify patterns and
insights that may not be obvious at first glance and to use this
information to make more informed decisions.

vii) Metadata management:


The metadata management layer is the seventh component of a BI
system. This component is in charge of the metadata, which is the
information that describes the data. This includes information about the
data's origin, data definitions, and data lineage. Metadata management is
a critical component of Business Intelligence (BI) that involves the
administration of metadata, which is data that describes other data.
Metadata describes the context, meaning, and structure of data in a
business intelligence system. The metadata repository is a centralized
location for storing and managing metadata. The metadata repository
contains information about the BI system's data sources, data models,
data transformations, and other components. It enables users to
comprehend and interpret the information presented in the system.
Organizations can ensure that their BI system delivers meaningful
insights that can drive better business decisions by providing accurate,
consistent, and comprehensive metadata.
These components work together to give users insights into their operations
and to help them make decisions.
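
To make the analytics and data-mining component above more concrete, here is a minimal Python sketch that groups customers into segments with k-means clustering; the customer figures and the choice of two clusters are assumptions for illustration, and scikit-learn is only one of many libraries that could be used.

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customer features: [annual spend, number of orders].
customers = np.array([
    [200,  2], [250,  3], [300,  4],       # looks like a low-value segment
    [5000, 40], [5200, 42], [4800, 38],    # looks like a high-value segment
])

# Group similar customers into two segments.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print("Segment assigned to each customer:", model.labels_)
print("Segment centres (spend, orders):", model.cluster_centers_)

Segments discovered in this way can then feed reports or dashboards, for example to target promotions at each group differently.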
4.2 DATA WAREHOUSING

Data warehousing is a critical component of Business Intelligence (BI) that


entails collecting, storing, and organizing large amounts of data from various
sources. A data warehouse is a centralized repository that stores historical
and current data from various sources in a structured format that can be
accessed and analyzed easily. Data warehouses typically collect information
from operational systems such as transactional databases, customer
relationship management (CRM) systems, and enterprise resource planning
(ERP) systems. After that, the data is transformed, cleaned, and integrated to
provide a unified view of the business. A data warehouse's primary function
is to provide a single source of truth for data analysis and reporting. This
provides business users with accurate, timely, and consistent data to generate
insights and inform decision-making. Data warehouses also serve as a
platform for advanced analytics and data mining, which can assist businesses
in identifying patterns and trends in their data.

In BI, data warehousing involves the use of a variety of technologies and


tools, such as Extract, Transform, Load (ETL) processes, data modeling, and
data visualization. ETL processes are used to extract data from source
systems, transform the data into a data warehouse-compatible format, and
load the data into the warehouse. Data modeling entails developing a logical
data model and a physical data model, which define the relationships between
data elements and the database structure. Data visualization tools are used to
create reports and dashboards that business users can use to communicate
insights.
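
As a minimal sketch of an ETL step, the Python code below extracts records from a hypothetical CSV export of an operational system, transforms them by dropping incomplete rows and standardising store names, and loads them into a warehouse table in SQLite; the file name, column names, and table name are assumptions for illustration, and production pipelines normally use dedicated ETL tools.

import csv
import sqlite3

# Extract: read raw records exported from a hypothetical operational system.
with open("daily_orders.csv", newline="") as f:
    raw_rows = list(csv.DictReader(f))  # expected columns: order_id, store, amount, order_date

# Transform: clean and standardise the data before it enters the warehouse.
clean_rows = []
for row in raw_rows:
    if not row["amount"]:               # drop records with missing amounts
        continue
    clean_rows.append((
        int(row["order_id"]),
        row["store"].strip().title(),   # standardise store names
        float(row["amount"]),
        row["order_date"],
    ))

# Load: append the cleaned records into a warehouse fact table.
warehouse = sqlite3.connect("warehouse.db")
warehouse.execute("""CREATE TABLE IF NOT EXISTS fact_orders
                     (order_id INTEGER, store TEXT, amount REAL, order_date TEXT)""")
warehouse.executemany("INSERT INTO fact_orders VALUES (?, ?, ?, ?)", clean_rows)
warehouse.commit()
warehouse.close()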

Data warehousing serves as a central repository of data that can be accessed


and analyzed to gain insights into business performance, trends, and
opportunities. A data warehouse provides a consolidated view of data from
various sources. This enables companies to combine data from various
systems and departments, such as sales, finance, and operations.

Fig 4.3: ETL Process & Data Warehouse

A data warehouse stores historical data over a long period, allowing


businesses to analyze performance trends and patterns over time. This can aid
in identifying areas for improvement as well as informing strategic decision-making. Data warehouses serve as the foundation for business intelligence
tools like dashboards and reports. These tools assist businesses in analyzing and visualizing data to gain insights and make informed decisions. Data warehouses also support data quality processes, such as data cleansing and validation, which contribute to the accuracy and dependability of the data used for analysis.

In the diagram above, data is collected from various sources, and the ETL
process converts unstructured data into structured and meaningful
information. This leads to the creation of a data warehouse, reporting, and
analysis to make better decisions.

Data modeling and schema design:


The process of creating a conceptual representation of data structures and
relationships that exist within a specific domain or system is known as data
modelling. It entails identifying and defining the entities, attributes, and
relationships required to effectively represent and manage data. The process
of translating a conceptual data model into a physical database schema that
can be implemented in a specific database management system is known as
schema design. It entails selecting appropriate data types, defining tables and
their columns, and establishing table relationships.
Typically, the data modelling process begins with identifying the entities that
will be represented in the system. A real-world object, concept, or event that
can be uniquely identified and described is referred to as an entity. For example, customers, orders, products, and invoices are typical entities in a customer relationship management system. After identifying the entities,
the next step is to define the attributes associated with each entity. A
characteristic or property of an entity that can be used to describe or
distinguish it is referred to as an attribute. A customer entity's attributes
might include, for example, name, address, phone number, and email address.
Following the definition of the entities and attributes, the next step is to
identify the relationships between the entities. The associations or
connections that exist between two or more entities are referred to as
relationships. In a customer relationship management system, for example, a
customer may place multiple orders, each of which may contain multiple
products.

Following the definition of the data model, the next step is to design the
schema that will be used to implement the data model in a specific database
management system. Selecting appropriate data types for each attribute,
defining tables and their columns, and establishing relationships between
tables using primary and foreign keys are all part of this process. Data Model
Schemas are commonly used to visually represent the architecture of a
database and serve as the foundation for an organization's Data Management
practice.

Choosing the right Data Model Schema can help to eliminate bottlenecks and
anomalies during software project execution. An incorrect Schema Design,
on the other hand, can cause several errors in an application and make
refactoring expensive. For example, if you didn't realize early on that your
application would require multiple table JOINs, your service will eventually stop working once you reach a certain number of users and a certain volume of data.

To resolve such complications, data will almost certainly need to be moved to


new tables, code will need to point to those new tables, and the tables will
require the necessary JOINs. This implies that you'll need a very strong test
environment (Database and Source Code) to test your changes, as well as a
strategy for managing Data Integrity while also upgrading your database and
source code. Once you begin migrating your database to a new schema, there
is almost no turning back. To avoid such complexities in the early stages of a
data project, it is critical to choose the appropriate schema, avoiding unforeseen bottlenecks.

The Data Model Schema design begins with a high level of abstraction and
progresses to become more concrete and specific, as with any design process.
Based on their level of abstraction, data models are generally classified into
three types. The process will start with a Conceptual Model, then a Logical
Model, and finally a Physical Model. Such data models provide a conceptual
framework within which a database user can specify the requirements,
structure, and set of attributes for configuring a Database Schema. A Data
Model also offers users a high-level design implementation that dictates what
can be included in the schema.
The following are some popular data model schemas:
• Hierarchical Schema
• Relational Schema
• Network Schema
• Object-Oriented Schema
• Entity-Relationship Schema

Fig 4.4: Types of Data Models

Hierarchical Schema:
A hierarchical schema is a type of database schema that organizes data in a
tree-like structure with a single root, with each node having one parent and
potentially multiple children. A tree schema or a parent-child schema is
another name for this type of schema. Data is organized top-down in a
hierarchical schema, with the parent node at the top of the tree representing
the most general information and child nodes below it representing more
specific information.

A hierarchical schema for a company, for example, might have "Company"
as the root node, with child nodes for "Departments," "Employees," and
"Projects." One of the primary benefits of a hierarchical schema is that it is
simple to understand and apply, making it ideal for small to medium-sized
databases with simple data relationships. However, when dealing with
complex relationships or when changes to the schema structure are required,
it can be limiting. Furthermore, because some data may need to be repeated at
multiple levels of the hierarchy, this type of schema can result in data
redundancy.

This data model arranges data in a tree-like structure, with the root node at the top. When there are multiple nodes at the top level, they are known as root segments. Nodes are linked together by branches; each node has one parent, which may have multiple children, creating a one-to-many relationship between different types of data. The information is saved as records that are linked together.

A hierarchical schema is a database schema that organizes data into a tree-


like structure with one or more child nodes for each node. Here's an example
of a hierarchical organizational structure for a company:

Fig 4.5: A hierarchical schema

The "Company" node is the root of the hierarchy in this schema, with three
child nodes representing the company's departments. Each department node
has a manager node as its first child, followed by one or more employee
nodes. This structure enables efficient querying of data related to specific
departments or employees, as well as easy navigation of the company's
organizational structure.
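The same structure can be sketched in a few lines of Python using nested dictionaries; the department and employee names below are purely illustrative assumptions, not data from the text:

# A minimal sketch of the hierarchy described above.
company = {
    "Company": {
        "Sales":   {"manager": "A. Rao",  "employees": ["B. Singh", "C. Mehta"]},
        "Finance": {"manager": "D. Iyer", "employees": ["E. Khan"]},
        "IT":      {"manager": "F. Das",  "employees": ["G. Paul", "H. Nair"]},
    }
}

# Navigation follows parent-to-child links, e.g. finding the Finance manager:
print(company["Company"]["Finance"]["manager"])   # D. Iyer

Note how every child (a department, a manager, an employee list) hangs off exactly one parent, which is what makes this a hierarchy rather than a network.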

Advantages:
• Easy to understand and implement: A hierarchical schema is a simple
and intuitive way to organize data. It is simple to comprehend and
implement.
• Efficient querying: Querying data becomes more efficient because the hierarchical schema is organized in a tree-like structure; we can easily navigate the hierarchy by following the links between parent and child nodes.
• Data Integrity: A hierarchical schema ensures that data is always
consistent and that data integrity is maintained. This is because each
child node can only have one parent node, preventing data duplication
and inconsistency.
• Improved Security: A hierarchical schema improves security because
access to nodes can be easily controlled by setting permissions at the
appropriate levels.

Challenges:
• Limited flexibility: Hierarchical schema has limited flexibility because
it can only represent data in a tree-like structure. This makes representing
complex data relationships difficult.
• Data redundancy: Because data may need to be duplicated at multiple
levels in the hierarchy, hierarchical schema can lead to data redundancy.
• Difficult to scale: Hierarchical schema can be difficult to scale because
adding new levels or nodes requires significant restructuring of the
schema.
• Inefficient updates: Updating data in a hierarchical schema can be
inefficient because changes to a parent node may necessitate updates to
all of its children nodes.

Relational Schema Model:


A relational schema is a formal description of how data in a relational
database is organized. It defines a database's structure, including the tables,
fields, and relationships between them. The schema defines the names of the
tables and columns, as well as the column data types and any constraints or
rules that apply to the data. It also specifies the primary and foreign keys that
are used to connect tables.

A simple relational schema might consist of two tables, "Customers" and


"Orders," for example. Columns in the "Customers" table would include
"customer_id," "first_name," "last_name," and "email," while columns in the
"Orders" table would include "order_id," "customer_id," "order_date," and
"total_amount." A foreign key would connect the "customer_id" column in
the "Orders" table to the "customer_id" column in the "Customers" table.

Relational schemas are important because they standardize the way data is
organized and accessed in a database. They make data management easier
and ensure data integrity by imposing rules and constraints on the data. They
also allow for efficient data querying and reporting, making it easier for
applications to retrieve the information they require.

A simple relational schema for a company's employee database is shown


below:

Employee table:

Field Name Data Type Description


employee_id Integer Unique identifier for each employee
first_name Varchar(50) First name of the employee
last_name Varchar(50) Last name of the employee
hire_date Date Date when the employee was hired
job_title Varchar(50) Job title of the employee
department_id Integer The ID of the department the
employee belongs to
manager_id Integer The ID of the employee's manager
Primary Key: employee_id

Department table:

Field Name Data Type Description


department_id Integer Unique identifier for each department
department_name Varchar(50) Name of the department
location Varchar(50) Location of the department
Primary Key: department_id

Foreign Key:
The field "department_id" in the "Employee" table is a foreign key that refers
to the field "department_id" in the "Department" table. This indicates that
each employee is assigned to a specific department. By joining the
"Employee" and "Department" tables on the "department_id" field, we can
answer questions like "What is the name of the department that employee
John Smith belongs to?" This is just a simple example; in practice, a
relational schema could be much more complex, with many more tables and
relationships.
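As an illustration only (the sample rows and the use of SQLite are assumptions for demonstration, not part of the schema definition above), the two tables and the join can be sketched with Python's built-in sqlite3 module:

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create the two tables with a primary key / foreign key relationship.
cur.executescript("""
CREATE TABLE Department (
    department_id   INTEGER PRIMARY KEY,
    department_name VARCHAR(50),
    location        VARCHAR(50)
);
CREATE TABLE Employee (
    employee_id   INTEGER PRIMARY KEY,
    first_name    VARCHAR(50),
    last_name     VARCHAR(50),
    hire_date     DATE,
    job_title     VARCHAR(50),
    department_id INTEGER REFERENCES Department(department_id),
    manager_id    INTEGER
);
""")

# Invented sample rows for the demonstration.
cur.execute("INSERT INTO Department VALUES (10, 'Marketing', 'Delhi')")
cur.execute("INSERT INTO Employee VALUES (1, 'John', 'Smith', '2021-04-01', 'Analyst', 10, NULL)")

# "What is the name of the department that employee John Smith belongs to?"
cur.execute("""
    SELECT d.department_name
    FROM Employee e JOIN Department d ON e.department_id = d.department_id
    WHERE e.first_name = 'John' AND e.last_name = 'Smith'
""")
print(cur.fetchone()[0])   # Marketing
conn.close()

The join on department_id is exactly the relationship that the foreign key expresses in the schema.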

Advantages:
• Standardization: Relational schema provides a standardized method of
organizing and accessing data in a database. This facilitates the
understanding of the data structure by developers and users, as well as
the access and manipulation of the data by applications.

• Data Integrity: Relational schema allows you to define constraints and


rules for your data. This ensures that the data in the database is correct,
consistent, and meets certain requirements. It aids in the prevention of
data duplication, loss, or corruption.

• Scalability: Relational databases can handle large amounts of data while


maintaining performance. The schema enables efficient data querying,
indexing, and searching.

• Flexibility: A relational schema stores data in tables that can be easily


manipulated, queried, and joined to other tables to extract the required
data. This makes it simple to adapt the database to changing needs.

Challenges:

• Complexity: Relational schema can be complicated, particularly in large


databases with many tables and relationships. Designing an optimal
schema that balances efficiency and flexibility can be difficult.
• Performance: Although relational databases are scalable, performance
issues can arise when dealing with very large datasets or complex
queries. This can be mitigated by indexing, caching, and other
optimizations, but it remains a concern.

• Maintenance and updates: Maintenance and updates are required for


the relational schema to ensure that the data remains accurate and
consistent over time. This can take a long time and requires specialized
knowledge and skills.
• Cost: Setting up and maintaining relational databases can be costly,
especially for large-scale applications. The cost of hardware, software,
and licensing can quickly add up.

Network Schema:
The network schema is a type of database schema that organizes data
similarly to the hierarchical schema, but with a more flexible and complex
structure. Data is organized as a graph in a network schema, with nodes
representing entities and edges representing relationships between them. In
contrast to the hierarchical schema, which allows only one parent for each
child, nodes in a network schema can have multiple parents, allowing for
more complex relationships between entities.

Entities are represented as records in a network schema, and relationships


between entities are represented as pointers or links. Each network schema
record contains two types of fields: data fields, which store the values
associated with the entity, and set fields, which store pointers to related
records. Set fields are used to represent the various relationships between
records and can have multiple values.

Consider the employee database of a company. Employee records in a


network schema would include data fields such as name, hire date, and job
title, as well as set fields for the employee's supervisor, department, and
projects they are working on. The department records would include data
fields such as name and location, as well as set fields for the department's
manager and the department's employees.
An example of a network schema for a company's employee database:

Employee record:

Field Name Data Type Description


Employee ID Integer Unique identifier for each employee
Name Varchar(50) Name of the employee
Hire Date Date Date when the employee was hired
Job Title Varchar(50) Job title of the employee
Set Fields:
Supervisor: This is a link to the supervisor's employee record.
Department: A pointer to the employee's department record.
Project: A reference to the project record on which the employee is working.
Department record:

Field Name Data Type Description


Department ID Integer Unique identifier for each department
Name Varchar(50) Name of the department
Location Varchar(50) Location of the department

Set Fields:
Manager: Pointer to the manager's employee record.
Employee: Pointer to the employee records that belong to the department.
Project record:

Field Name Data Type Description
Project ID Integer Unique identifier for each project
Name Varchar(50) Name of the project
Start Date Date Date when the project started
End Date Date Date when the project ended

Set Fields:
Manager: Pointer to the manager's employee record
Employee: Pointer to the employee records working on the project

We can use this schema to answer questions like "What are the names of the
employees working on project X?" by following the pointers from the project
record to the employee records, and "Who is the supervisor of employee Y?"
by following the pointer from the employee record to the supervisor's
employee record.
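A minimal Python sketch of this pointer-style navigation is given below; the record contents are invented for illustration, and the classes simply stand in for network-model record types:

# Records hold data fields plus "set fields" that point to related records.
class Employee:
    def __init__(self, name, job_title, supervisor=None):
        self.name = name
        self.job_title = job_title
        self.supervisor = supervisor      # set field: pointer to another Employee
        self.projects = []                # set field: pointers to Project records

class Project:
    def __init__(self, name):
        self.name = name
        self.employees = []               # set field: pointers to Employee records

manager = Employee("Asha", "Manager")
dev = Employee("Ravi", "Developer", supervisor=manager)
project_x = Project("Project X")
project_x.employees.append(dev)
dev.projects.append(project_x)            # many-to-many: a record can have multiple parents

# "What are the names of the employees working on Project X?"
print([e.name for e in project_x.employees])   # ['Ravi']
# "Who is the supervisor of employee Ravi?"
print(dev.supervisor.name)                     # Asha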
The network schema's flexibility in representing complex relationships
between entities is one of its advantages. Entities with multiple parents can
have more complex and flexible relationships than in the hierarchical schema.
Furthermore, the network schema supports many-to-many relationships,
which allow entities to have multiple relationships with other entities.

The network schema's complexity, on the other hand, can make it difficult to
understand and manage. It is more difficult to program and may be less
efficient than the hierarchical schema. Furthermore, the use of pointers or
links between records can make navigating and querying the data more
difficult.

Object-Oriented Schema:

An object-oriented schema is a type of data model used in computer


programming and software engineering. It is based on the principles of
object-oriented programming (OOP), which emphasizes the use of objects to
represent real-world entities and concepts. In an object-oriented schema, data
is organized into objects, each of which has a set of attributes or properties
and a set of methods or operations that can be performed on those attributes.
Objects are defined by classes, which are templates that define the attributes
and methods that all instances of the class will have.
Object-oriented schemas also support inheritance, which allows one class to
inherit properties and methods from another. Because common functionality
can be defined in a parent class and inherited by child classes, this helps to
reduce code duplication and improve code organization. One of the primary
advantages of using an object-oriented schema is that it can aid in the
modularity and maintainability of code. It is easier to modify and extend code
when it is broken down into objects with well-defined interfaces.

Object-oriented schemas are widely used in software development, especially


for large-scale applications and systems. Many popular programming
languages, including Java, C++, and Python, support object-oriented
programming and provide tools for creating and manipulating object-oriented
schemas.

Fig 4.6: Object-oriented schemas – relationships between the classes

We have three classes in this example: Account, CheckingAccount, and SavingsAccount. The deposit and withdraw methods update the account balance. CheckingAccount is a subclass of Account with the private attribute overdraft_limit; it overrides the withdraw method, allowing withdrawals up to the overdraft limit. SavingsAccount is a subclass of Account with the private attribute interest_rate. Fig 4.6 depicts the relationships between the classes: both CheckingAccount and SavingsAccount derive from Account.
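A minimal Python sketch of this class hierarchy is shown below. The exact method bodies (for example, the overdraft rule and the interest calculation) are assumptions made for illustration rather than a definitive design:

class Account:
    def __init__(self, balance=0.0):
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount

    def withdraw(self, amount):
        if amount > self.balance:
            raise ValueError("Insufficient funds")
        self.balance -= amount

class CheckingAccount(Account):
    def __init__(self, balance=0.0, overdraft_limit=500.0):
        super().__init__(balance)
        self._overdraft_limit = overdraft_limit   # private attribute

    def withdraw(self, amount):                   # overrides Account.withdraw
        if amount > self.balance + self._overdraft_limit:
            raise ValueError("Overdraft limit exceeded")
        self.balance -= amount

class SavingsAccount(Account):
    def __init__(self, balance=0.0, interest_rate=0.04):
        super().__init__(balance)
        self._interest_rate = interest_rate       # private attribute

    def add_interest(self):
        self.balance += self.balance * self._interest_rate

Because both subclasses inherit deposit from Account and only override what differs, the sketch also illustrates the inheritance and polymorphism points discussed below.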

Advantages:
• Encapsulation: One of the primary characteristics of object-oriented
schema is encapsulation. It allows data to be hidden from other parts of
the program, limiting access to only defined interfaces. This reduces
complexity, improves modularity, and boosts security.

• Reusability: Object-oriented schema encourages code reuse by allowing


developers to create classes that can be used throughout the programme.
This reduces the amount of redundant code that must be written and aids
in code maintenance.

• Inheritance: Classes can inherit properties and methods from other


classes. This encourages code reuse, simplifies code maintenance, and
makes it easier for developers to create complex programmes.

• Polymorphism: The ability to treat objects of different classes as if they were of the same type. This allows for the creation of generic code that can be applied to a wide variety of objects, increasing code flexibility and reducing the amount of code that must be written.

Challenges:

• Complexity: Object-oriented schema can be more complex than other


data models, making it more difficult for beginners to learn and apply
effectively.

• Performance: Because of the overhead associated with encapsulation,


inheritance, and polymorphism, object-oriented programmes can be
slower than other types of programmes at times.

• Over-engineering: Object-oriented schema can occasionally lead to


over-engineering, with developers producing overly complex code that is
difficult to understand and maintain.

• Difficulty in debugging: Because of the complexity of the code and the


interactions between different objects, object-oriented programmes can
be difficult to debug at times.

Entity-Relationship Schema:
An entity-relationship (ER) schema is a diagrammatic representation of a
database's entities, attributes, and relationships between them. It is a high-
level conceptual model of a database's structure. The ER schema is
commonly used to design relational databases and to communicate database
designs to developers and stakeholders. An entity in an ER schema is a real-
world object, concept, or event with its own identity and the ability to be
uniquely identified. Attributes describe the characteristics of entities and are
used to define their properties. Relationships describe the connection between
entities.
Entities, attributes, and relationships are the three main components of an ER
schema.

Entities: A rectangle represents an entity, and its name is written inside the
rectangle. An entity can be a person, a place, a thing, an event, or a concept.

Attributes are represented by an oval shape and are linked to their respective
entities by a line. An attribute defines an entity's properties and provides
additional information about it.

Relationships: A relationship connects two entities and is represented by a


diamond shape. It describes the relationship between the two entities and can
include constraints on cardinality and participation. The number of entities
that participate in the relationship is described by cardinality, whereas
participation constraints describe whether the entities are required or optional
in the relationship.

Consider a straightforward ER schema for a library database. Entities in this


schema could include "book," "author," and "borrower." Attributes such as
"book title," "author name," and "borrower ID" would be assigned to each
entity. The schema may include relationships such as "written by" between
"book" and "author" and "borrowed by" between "book" and "borrower."
These entities and relationships would be visually represented in the ER
schema, providing a high-level overview of the database structure.

Entities and attributes shown in the diagram:
Book: book_id, title, genre, publish_year, publisher_id
Author: author_id, name, nationality
Publisher: publisher_id, name, address, phone
Borrower: borrower_id, name, email
Borrowed Book: book_id, borrower_id, borrow_date, return_date
Borrowing Log: borrow_id, book_id, borrower_id, borrow_date, return_date

Fig 4.7: ER Schema

We have four entities in this ER diagram (Fig 4.7): "Book," "Author,"


"Publisher," and "Borrower." Each entity has its own set of attributes, and the
lines connecting them represent the relationships between the entities.
Attributes of the "Book" entity include "book_id," "title," "genre,"
"publish_year," and "publisher_id." Attributes of the "Publisher" entity
include "publisher_id," "name," "address," and "phone." Attributes of the
"Author" entity include "author_id," "name," and "nationality." Attributes of
the "Borrower" entity include "borrower_id," "name," and "email."

The lines connecting the entities represent their relationships. The "Book"
entity, for example, has a "publisher_id" attribute that links it to the
"Publisher" entity. The "Borrowed Book" entity has attributes "book_id" and
"borrower_id" that link it to the "Book" and "Borrower" entities, respectively.
Finally, the "Borrowing Log" entity has attributes that describe how
borrowers borrow and return books, and it is linked to both the "Book" and
"Borrower" entities.

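When an ER model like this is implemented, each entity typically becomes a table, and relationships become foreign keys or linking tables. A minimal sketch using Python's built-in sqlite3 module is shown below; the column types are assumptions inferred from the attribute names above:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Publisher (publisher_id INTEGER PRIMARY KEY, name TEXT, address TEXT, phone TEXT);
CREATE TABLE Author    (author_id    INTEGER PRIMARY KEY, name TEXT, nationality TEXT);
CREATE TABLE Borrower  (borrower_id  INTEGER PRIMARY KEY, name TEXT, email TEXT);
CREATE TABLE Book (
    book_id      INTEGER PRIMARY KEY,
    title        TEXT,
    genre        TEXT,
    publish_year INTEGER,
    publisher_id INTEGER REFERENCES Publisher(publisher_id)
);
-- The "borrowed by" relationship becomes its own table linking Book and Borrower.
CREATE TABLE BorrowingLog (
    borrow_id   INTEGER PRIMARY KEY,
    book_id     INTEGER REFERENCES Book(book_id),
    borrower_id INTEGER REFERENCES Borrower(borrower_id),
    borrow_date DATE,
    return_date DATE
);
""")
conn.close()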
Processes for extract, transform, and load (ETL):


ETL (extract, transform, load) is a data integration and data warehousing
process. It entails extracting data from various sources, transforming it to
meet the needs of the target system, and loading it into the destination
system.

ETL is divided into three stages:


• Extract: Data is extracted from various sources such as databases, files,
APIs, and web services at this stage. Connecting to the source systems
and retrieving the data in the required format may be required.
• Transform: The extracted data is transformed at this stage to meet the
specific requirements of the target system. This can include data
cleaning, validation, standardization, enrichment, aggregation, and
integration.
• Load: The transformed data is loaded into the target system, such as a
data warehouse or a data lake, at this stage. Loading data into tables or
files, creating indexes, and optimizing the database for faster query
performance are all examples of this.
Consider an e-commerce company with multiple data sources, such as sales
transactions, customer profiles, and product information, as an example of an
ETL process. The company intends to build a data warehouse to analyze this
data and gain insights to improve its business operations.

Fig 4.8: Stages in ETL Process

Now that the data is in the data warehouse, the e-commerce company can
analyze the data and gain insights into its business operations using tools
such as SQL queries, data visualization software, and machine learning
algorithms. This can assist the company in making data-driven decisions to
improve customer satisfaction, boost sales, and optimize its supply chain.
ETL is a Data Warehousing process that stands for Extract, Transform, and
Load. An ETL tool extracts data from various data source systems,
transforms it in the staging area, and then loads it into the Data Warehouse
system.
Extraction:

Extraction is the first step in the ETL process. In this step, data from various
source systems is extracted into the staging area in various formats such as
relational databases, NoSQL stores, XML, and flat files. Because the extracted data arrives in different formats and may be corrupted, it is important to store it first in the staging area rather than loading it directly into the data warehouse; loading corrupted data straight into the warehouse could damage it, and rolling back would be much more difficult. This makes extraction one of the most crucial steps in the ETL process.

Fig 4.9: ETL Process

Transformation:
Transformation is the second step in the ETL process. In this step, the
extracted data is subjected to a set of rules or functions to be converted into a
single standard format. It could include the following processes/tasks:
• Filtering is the process of loading only specific attributes into a data
warehouse.
• Cleaning entails replacing NULL values with default values, mapping
the U.S.A, United States, and America into the USA, and so on.
• Joining is the process of combining multiple attributes into one.
• Splitting is the process of dividing a single attribute into multiple
attributes.
• Sorting is the process of organizing tuples based on some attribute (generally the key attribute).

Loading:
Loading is the third and final step in the ETL process. The transformed data
is finally loaded into the data warehouse in this step. The data is sometimes
updated very frequently by loading it into the data warehouse, and other
times it is done at longer but regular intervals. The rate and duration of
loading are solely determined by the requirements and differ from system to
system.
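A compact end-to-end sketch of these three steps in Python is given below; the source records, cleaning rules, and warehouse table name are all illustrative assumptions:

import sqlite3

# --- Extract: pull raw records from a source system (a hard-coded list here
# stands in for a transactional database or flat file).
raw_orders = [
    {"order_id": 1, "country": "U.S.A",         "amount": "120.50"},
    {"order_id": 2, "country": "United States", "amount": None},
    {"order_id": 3, "country": "India",         "amount": "75.00"},
]

# --- Transform: clean NULLs, standardize country names, convert types, sort.
def transform(record):
    country_map = {"U.S.A": "USA", "United States": "USA", "America": "USA"}
    return {
        "order_id": record["order_id"],
        "country": country_map.get(record["country"], record["country"]),
        "amount": float(record["amount"]) if record["amount"] is not None else 0.0,
    }

clean_orders = sorted((transform(r) for r in raw_orders), key=lambda r: r["order_id"])

# --- Load: write the transformed rows into the warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE fact_orders (order_id INTEGER, country TEXT, amount REAL)")
warehouse.executemany(
    "INSERT INTO fact_orders VALUES (:order_id, :country, :amount)", clean_orders
)
print(warehouse.execute(
    "SELECT country, SUM(amount) FROM fact_orders GROUP BY country").fetchall())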
4.3 INTRODUCTION TO DATA MINING AND ANALYTICS
Data mining and analytics are two related fields that involve extracting
insights and knowledge from data using computer algorithms and statistical
techniques. While the two terms are frequently used interchangeably, there
are some distinctions between them. Data mining is the process of identifying
patterns and relationships in large datasets by using algorithms. The goal of
data mining is to extract previously unknown insights and knowledge from
data that can then be used to make better decisions and predictions.

Data mining techniques can be used for a variety of purposes, including fraud
detection, market segmentation, and customer churn prediction. Analytics, on
the other hand, entails analyzing and interpreting data using statistical and
mathematical techniques. Analytics can be used to spot trends, forecast future
outcomes, and test hypotheses. In business, analytics is frequently used to
inform decision-making, such as optimizing pricing strategies or improving
supply chain efficiency.

Data mining and analytics are inextricably linked because both involve
working with data and extracting insights from it. Many techniques, such as
clustering, classification, and regression, are also shared. Data mining, on the
other hand, focuses on identifying patterns and relationships, whereas
analytics focuses on analyzing and interpreting data to make informed
decisions. Data mining and analytics are both critical tools for businesses and
organizations looking to maximize the value of their data. Businesses can
gain valuable insights into customer behavior, market trends, and operational
efficiency by using these techniques, which can help them stay ahead of the
competition and make data-driven decisions.
As an example, suppose a business wants to improve customer retention by
identifying customers who are likely to cancel their subscriptions. They have
a large dataset with data on customer demographics, behaviour, and
transaction history. To use data mining techniques, the company could group
customers based on their behaviour and transaction history using clustering
algorithms. Customers who have made large purchases in the past are more
likely to renew their subscriptions, whereas customers who have recently
decreased their purchase activity are more likely to cancel.

The company could use analytics techniques such as regression analysis to


determine which customer attributes are most strongly correlated with
subscription cancellation. Customers who are in a certain age group, live in a
certain geographic area, or use a specific payment method may be more
likely to cancel. The company can gain a better understanding of customer
behavior and develop more effective retention strategies by combining data
mining and analytics techniques. For example, based on data mining and
analytics insights, they may create targeted promotions for customers who are
about to cancel their subscriptions.
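As a rough illustration of how such an analysis might be coded, the sketch below uses scikit-learn; the customer features, labels, and model choices are invented for demonstration and are not the only way to approach the problem:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Illustrative customer features: [months_as_customer, purchases_last_quarter]
X = np.array([[24, 12], [36, 15], [3, 1], [5, 0], [48, 20], [2, 1]])
# 1 = cancelled subscription, 0 = renewed (invented labels for the sketch)
y = np.array([0, 0, 1, 1, 0, 1])

# Data mining step: group customers by behaviour using k-means clustering.
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("Cluster assignments:", clusters)

# Analytics step: fit a simple model relating the features to cancellation.
model = LogisticRegression().fit(X, y)
print("Estimated churn probabilities:", model.predict_proba(X)[:, 1].round(2))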

4.4 DATA GOVERNANCE AND SECURITY

Data governance and security are two essential elements of any organization's
data management strategy. Data governance refers to the processes and
policies that ensure data is managed and used effectively and efficiently,
whereas data security refers to the safeguards put in place to prevent
unauthorized access, disclosure, modification, or destruction of data. The
creation of policies, procedures, and standards for managing data throughout
its lifecycle is referred to as data governance. It includes data quality,
privacy, security, and compliance management, as well as overall data asset
management. Data governance aims to ensure that data is managed
effectively and efficiently and that it is used to support business goals.

The following are the key components of data governance:

• Data ownership refers to the identification of individuals or teams who


are in charge of managing specific data sets.
• Data quality is the assurance that data is correct, complete, and
consistent.
• Data privacy is the protection of data following applicable laws and
regulations.
• Data security is the protection of data from unauthorized access,
disclosure, modification, or destruction.
• Data lifecycle management is the process of ensuring that data is
managed from creation to disposal.

Data governance also entails developing a data governance framework that


includes data management policies, processes, and procedures. This
framework should be backed up by appropriate technology solutions that help
automate data governance processes and ensure policy and standard
compliance.
Data security entails safeguarding data against unauthorized access,
disclosure, modification, or destruction. This can be accomplished through
several means, including:
• Access controls: Restricting data access to only those who require it.
• Encryption: The process of converting data into an unreadable format
that can only be decrypted using a key.
• Backup and recovery: Ensuring that data can be recovered promptly.

Importance of data governance:


Data governance is the process of managing an organization's data's
availability, usability, integrity, and security. Data governance is critical
because it ensures that organizations can effectively manage their data assets
to meet business objectives, comply with regulations, and maintain the trust
of their stakeholders. Here are some of the main reasons why data
governance is critical:
• Data integrity: Effective data governance ensures that data is of
high quality, accurate, complete, and consistent. This improves decision-
making and reduces the likelihood of errors and inefficiencies.

• Compliance: Organizations must follow various laws and regulations


regarding data privacy, security, and confidentiality. Effective data
governance ensures that data is managed by these standards, lowering the
risk of legal and financial penalties.

• Risk management: Effective data governance lowers the risk of data


breaches, unauthorized access, and other security threats, thereby
protecting an organization's reputation and limiting potential financial
losses.
• Efficiency: Effective data governance aids in the streamlining of
processes and the reduction of costs associated with data management,
resulting in increased efficiency and productivity.
• Business strategy: Data governance assists organizations in aligning
their data management practices with their business objectives, allowing
them to better leverage their data assets and gain a competitive
advantage.

Data governance framework:


A data governance framework is a collection of policies, procedures,
standards, and guidelines that define how a company manages and protects
its data assets. A data governance framework's purpose is to ensure that data
is managed consistently, effectively, and securely across an organization
while adhering to applicable laws, regulations, and industry standards.

A typical data governance framework consists of the following elements:


• Data policies are high-level statements that define the organization's
approach to data management, such as how data is collected, processed,
stored, and shared.
• Data standards are detailed specifications outlining the requirements for
data quality, security, and data management processes. Standards help to
ensure consistency and accuracy in data handling.
• Data management processes are the procedures and workflows that help
with the data lifecycle, which includes data acquisition, processing,
storage, and distribution.
• Data stewardship is the delegation of responsibility for the oversight and
management of specific data assets to individuals or teams within an
organization.
• Data security and privacy policies and procedures for protecting data
from unauthorized access, use, or disclosure, as well as compliance with
data privacy regulations, are included.
• Data architecture and technology refer to the infrastructure, systems, and
tools used to manage data throughout an organization.
• Data quality management encompasses processes for ensuring data accuracy, completeness, and consistency, as well as procedures for resolving data quality issues.
• Roles and responsibilities of data governance: This defines the roles and
responsibilities of various stakeholders involved in data management,
such as data owners, data stewards, and data users.

Data security and privacy:


Modern technology and the digital age rely heavily on data security and
privacy. They refer to safeguarding sensitive information against
unauthorized access, theft, and misuse. Data security refers to the safeguards
put in place to protect information from malicious attacks and cyber threats.
This can include using encryption to protect data, installing firewalls to
prevent unauthorized access, and establishing authentication mechanisms to
ensure only authorized users have access.
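As a small illustration of the encryption safeguard mentioned above, the sketch below uses the third-party cryptography package; the sample data and the in-memory key handling are simplified assumptions (in practice the key itself must be stored and managed securely):

from cryptography.fernet import Fernet

# Generate a symmetric key and encrypt a piece of sensitive data at rest.
key = Fernet.generate_key()
cipher = Fernet(key)

token = cipher.encrypt(b"customer_email=priya@example.com")
print(token)                          # unreadable ciphertext

# Only a holder of the key can decrypt the data.
print(cipher.decrypt(token).decode())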

Data privacy, on the other hand, is concerned with the management of


personal data, ensuring that it is collected, used, and shared in a manner that
respects individuals' privacy rights. This includes obtaining consent before
collecting data, securely storing data, and giving individuals control over how
their data is used and shared. Individuals and organizations can use several
best practices to ensure data security and privacy. These are some examples:
• Making use of strong passwords and enabling two-factor authentication.
• Updating software to prevent vulnerabilities from being exploited.
• Encryption is used to protect sensitive data both at rest and in transit.
• Only necessary data should be collected, and it should be securely stored.
• Providing clear and concise privacy policies, as well as obtaining consent
before data collection.
• Reviewing data protection policies and procedures regularly to ensure
they are up to date and effective.

4.5 BUSINESS INTELLIGENCE APPLICATIONS


Business intelligence (BI) applications are software tools that analyze,
process, and present large amounts of data from various sources to assist
organizations in making better business decisions. These applications
typically include features such as reporting, data visualization, dashboards,
and data mining that enable businesses to collect, process, and analyze data
from various departments within an organization. BI applications' primary
goal is to assist businesses in identifying trends and patterns in data and
making data-driven decisions that can lead to improved business
performance. Among the most common BI applications are:

• Tools for reporting: These applications assist businesses in producing


reports and presentations based on data gathered from a variety of
sources. These reports can be used to monitor key performance
indicators, identify areas for improvement, and make strategic decisions.
• Data visualization tools: These are used to create visual representations
of data to assist users in quickly understanding complex data sets.
Graphs, charts, and interactive dashboards are examples of this.

• Data mining tools: Extract information from large data sets to identify
patterns, correlations, and trends. This data can be used to inform
business decisions and improve operations.

• Performance management applications: These applications assist


businesses in tracking and measuring performance metrics across
departments and business units. Financial metrics, sales metrics, and
customer engagement metrics are examples of such metrics.

4.6 SUMMARY
This unit gives an overview of Business Intelligence (BI), which is the
process of gathering, analyzing, and transforming raw data into useful
information that businesses can use to make informed decisions. BI employs
a wide range of tools, technologies, and strategies to access and analyze data
from a variety of sources, including databases, spreadsheets, and data
repositories. This unit discusses the advantages of business intelligence, such
as its ability to provide insights into business operations, identify areas for
improvement, and enable data-driven decision-making, which can increase
revenue and profitability. Dashboards, reports, and data visualizations are
also highlighted as tools to assist decision-makers in interpreting complex
data and identifying patterns and trends.

The unit also discusses some common BI tools and technologies, such as data
warehouses, ETL (Extract, Transform, Load) tools, analytics software, and
data visualization platforms. It also discusses the significance of data quality,
data governance, and data security in business intelligence. Overall, this unit
provides a thorough overview of Business Intelligence and its significance in
modern business operations. It focuses on the key concepts, strategies, and
technologies involved in business intelligence and explains how they can be
used to gain a competitive advantage.

4.7 SELF-ASSESSMENT EXERCISES


• Consider any data set and assess your data analysis skills by
answering the following questions:
1. Can you interpret complex data sets and identify patterns and
trends?
2. Can you use statistical methods and tools to analyze data?
3. Are you familiar with data visualization techniques and tools?
• Case let: The Marketing Manager's Dilemma
Eva is the marketing manager for a large consumer goods company. She
is responsible for launching a new line of products and needs to make
some key decisions about the marketing strategy. She has access to a lot
of data, but she's not sure how to use it to make informed decisions.
Questions:
1. What is the problem that Eva is facing?
2. How can Business Intelligence (BI) help Eva solve this problem?
3. What data sources should Eva consider when making decisions about the
marketing strategy?
4. What are some potential insights that BI could provide to Eva?
5. What types of BI tools and technologies could Eva use to analyze the
data and generate insights?
6. How can Eva ensure that the data she is using is of high quality and
reliable?
7. What steps can Eva take to ensure that she effectively communicates the
insights from BI to stakeholders within the company?
8. How can Eva use BI to measure the marketing strategy's success and
make necessary adjustments?

4.8 FURTHER READINGS


• "Business Intelligence: A Managerial Perspective on Analytics" by
Ramesh Sharda, Dursun Delen, Efraim Turban.
• "The Data Warehouse Toolkit: The Definitive Guide to Dimensional
Modeling" by Ralph Kimball and Margy Ross.
• "Data Science for Business: What You Need to Know about Data
Mining and Data-Analytic Thinking" by Foster Provost and Tom
Fawcett.

UNIT 5 INFORMATION AND DECISION MAKING

Objectives
After studying this unit, you will be able to:
• Understand how to manage information and make effective decisions in
real-time.
• Understand the decision-making process and decision-making models.
• Develop skills to assess the quality and relevance of information,
including identifying biases and evaluating sources of information.

Structure
5.1 Introduction to Information & Decision Making
5.2 The Decision-Making Process
5.3 Information Sources and Systems
5.4 Decision-Making Models and Tools
5.5 Summary
5.6 Self-Assessment Exercises
5.7 Further Readings

5.1 INTRODUCTION TO INFORMATION &


DECISION MAKING
Businesses, organizations, and individuals all rely on accurate data and
effective decision-making processes to achieve their objectives. In today's
world, where data is plentiful and decision-making can be difficult, the ability
to manage information and make sound decisions is more important than
ever. This unit will look at Information and Decision Making, teaching
readers how to effectively manage data and make informed decisions in a
variety of situations. This unit discusses how important information and
decision-making are in today's society, and how these skills are needed in
fields ranging from healthcare to finance to education.

The processes of gathering, managing, and analyzing data to inform decision-


making processes are referred to as information and decision-making. It
entails using the information to make informed decisions, solve problems,
and achieve objectives. The ability to manage information and make sound
decisions is critical in today's world, where data is abundant, and decision-
making can be complex. Identifying the problem or question that needs to be
addressed, collecting relevant data, analyzing that data using appropriate
methods, evaluating potential solutions, and ultimately making a decision
based on the available information are all steps in effective information and
decision-making.

The Information & Decision-Making process can be applied in a variety of
contexts, ranging from personal decision-making to business strategy
development. It is a valuable skill set for both individuals and organizations
because it allows them to make informed decisions that can have a significant
impact on their success.

The significance of data in decision-making:


Information is essential in decision-making because it provides the data and
insights required to make informed decisions. Decision-making becomes
difficult and risky in the absence of accurate and reliable information, as
decisions are made based on assumptions or incomplete data. Information
availability can assist decision-makers in understanding the problem or
opportunity at hand, as well as evaluating potential solutions and their likely
outcomes. Decision-makers can identify patterns and trends in data, assess
risks and opportunities, and forecast future outcomes by analyzing data. This
enables them to make more informed decisions and reduce the risks
associated with decision-making.

Furthermore, having access to relevant information can assist decision-


makers in avoiding biases and personal preferences that could otherwise
influence their decisions. Data can provide an objective foundation for
decision-making, ensuring that decisions are based on evidence rather than
intuition or personal preferences. In many cases, the quality of a decision is directly related to the quality of the information used to make it. Access to
accurate, reliable, and timely information is thus critical to effective decision-
making at both the individual and organizational levels. It is impossible to
overestimate the value of information in decision-making. To make informed
decisions, minimize risk, and avoid biases, accurate and reliable information
is required. Decision-makers can improve their chances of success and
achieve their objectives by effectively leveraging information.

Role of technology in information & decision making


Technology is critical in information and decision-making because it allows
individuals and organizations to collect, process, and analyze large amounts
of data more efficiently and effectively than ever before. Technology has
transformed the way we access and use information, allowing us to make
more informed and timely decisions.

Some of the ways technology has influenced information and decision-


making are as follows:

• Data Collection: Technological advancements have made it easier to


collect and store data from a wide range of sources, including social
media, sensors, and other digital platforms. As a result, massive amounts
of data are now available to inform decision-making processes.

• Data Processing: Advances in technology have enabled large amounts
of data to be processed and analyzed quickly and efficiently. Advanced
algorithms and analytics tools can detect patterns and trends in data,
assisting decision-makers in better comprehending complex situations.

• Data Visualization: Technology has enabled complex data to be


presented in more accessible and user-friendly ways. Data visualization
tools can assist decision-makers in quickly interpreting and
comprehending data, making it easier to identify trends and patterns.
• Collaboration: Advances in technology have made it easier for
individuals and groups to work together on data analysis and decision-
making. People can share and access data in real-time, regardless of their
location, thanks to cloud-based platforms and tools.

Individuals and organizations can now make better use of data and make
more informed decisions thanks to advances in technology. It has improved
data processing speed and accuracy, and it has made it easier for decision-
makers to collaborate and share information. As technology advances, we can
expect it to play an increasingly important role in information and decision-
making, assisting in driving innovation and improving outcomes in a variety
of fields.

Decision-making in the Information Age: Challenges and Opportunities


The information age has resulted in an abundance of data and information,
which has changed the way decisions are made. Decision-making in the
information age is both difficult and advantageous. Here are some of the
issues and opportunities that come with making decisions in the information
age:

Challenges:
• Information Overload: With so much available information, it can be
difficult to separate the relevant information from the noise.

• Misinformation and bias: The availability of information does not


guarantee its accuracy, and decision-makers must be wary of
misinformation and bias.

• Complexity: In the information age, the interconnectedness of data and


information can lead to complexity and difficulty in making decisions.
• Cybersecurity: As people's reliance on technology and the internet has
grown, cybersecurity threats have become more common, and decision-
makers must consider the security of their data.

Opportunities:
• Big Data: The availability of big data allows decision-makers to make
more informed decisions.

• Artificial Intelligence and Machine Learning: AI and machine learning


algorithms are being developed to assist decision-makers in analyzing
large amounts of data and making more accurate predictions.

• Real-time Analytics: Real-time data analytics tools can provide timely


and accurate information to decision-makers.

• Collaboration: Decision-makers can collaborate across time zones and


locations thanks to the internet and other communication technologies.

• Innovation: The information age allows businesses to create new


products and services that leverage technology and data.

• Competitive advantage: The ability to leverage data and technology can give businesses a competitive advantage, but it also necessitates careful consideration of information accuracy and relevance.

5.2 THE DECISION-MAKING PROCESS


The decision-making process refers to the series of steps involved in selecting
a course of action from among several alternatives. Identifying a problem,
gathering information, analyzing alternatives, and selecting the best solution
are all part of the process. The decision-making process begins with
identifying a problem or opportunity that requires a decision, then gathering
relevant information about the situation, evaluating the available alternatives,
and choosing the best solution. Once a decision is made, it must be
implemented, and its effectiveness must be assessed to determine whether the
desired outcome was achieved.

Individuals and organizations must make effective decisions to achieve their


goals and objectives, and this requires critical thinking, analysis, and
consideration of the potential consequences of each option. The decision-
making process can be complex, involving multiple stakeholders with
varying perspectives and interests. To achieve the best results, it is critical to
ensure that the process is transparent, inclusive, and founded on sound
reasoning.

Fig 5.1: Steps in the decision-making process

Step 1: Defining the problem or opportunity:
The first step in the decision-making process is to define the problem or
opportunity, which involves identifying the issue or situation that requires a
decision. This is an important step because it lays the groundwork for the
entire decision-making process, and a well-defined problem or opportunity
can increase the likelihood of making an effective decision. Defining the
problem or opportunity entails several important steps, including:

• Identifying the specific problem or opportunity: This entails clearly


defining the problem or opportunity and identifying what needs to be
resolved or accomplished.

• Understanding the context: It is critical to comprehend the context of


the problem or opportunity, which includes any pertinent background
information, stakeholders, and potential constraints or challenges.

• Establishing goals and objectives: This entails establishing specific,


measurable goals and objectives for the decision-making process.

• Information gathering: It is critical to gather relevant information about


the problem or opportunity, which may entail conducting research,
consulting with experts, or analyzing data.

Defining the problem or opportunity necessitates careful consideration and


analysis of the current situation. This step is critical for ensuring that the
decision-making process is focused, effective, and results oriented.

Step 2: Gathering and analyzing information:

After identifying the problem, the next step is to gather information about it.
This is a critical step in the decision-making process because it provides the
necessary information and insights to make informed and effective decisions.
Gathering relevant data, analyzing it to identify patterns and trends, and using
that information to guide decision-making are all part of this step. In most
cases, several key components are involved in the process of gathering and
analyzing information:

• Identify the sources of information: Determine the information's


source and the key stakeholders who must be consulted. This could entail
conducting surveys, reviewing reports, or consulting with subject matter
experts.
• Gather the data: Gather the required data from the identified sources.
This may include both quantitative and qualitative data, such as statistics,
financial reports, and customer feedback.
• Examine the data: Analyze the data for trends, patterns, and insights.
This could include the use of data visualization software, statistical
analysis software, or other analytical methods.
• Draw conclusions and make recommendations: Draw conclusions
about the information gathered based on the analysis and use this to
inform the decision-making process. Based on the information gathered,
make recommendations for the best course of action.
Gathering and analyzing information enables decision-makers to make informed decisions based on data and
insights, increasing the likelihood of success while decreasing the risk of
failure.

Step 3: Evaluating alternatives:

After gathering information, the next step is to evaluate various options or


solutions to the problem. This step entails weighing the advantages and
disadvantages of each alternative and considering the possible outcomes.
Evaluating alternatives is an important step in the decision-making process
that involves assessing and comparing various options to determine the best
course of action. This step entails weighing the advantages and disadvantages
of each alternative, identifying potential risks and benefits, and selecting the
best option based on the criteria established earlier in the decision-making
process. Several key components are typically involved in the process of
evaluating alternatives:
• Establish criteria: Before evaluating alternatives, it is critical to define
the criteria that will be used to evaluate each option. Cost, feasibility,
impact on stakeholders, and alignment with organizational goals may be
among these criteria.
• Determine alternatives: Determine a variety of viable alternatives that
meet the established criteria. This could include brainstorming sessions,
expert consultations, or research.
• Evaluate the alternatives: Consider the pros and cons, potential risks and benefits, and any other relevant factors when evaluating each alternative against the established criteria. This may entail creating a decision matrix or comparing the alternatives using other analytical tools (a small weighted-scoring example follows below).

• Choose the best option: Select the alternative that best meets the
established criteria and is most likely to achieve the desired outcomes
based on the assessment. This may entail consulting with stakeholders or
conducting additional research to confirm the decision.

Evaluating alternatives is an important step in the decision-making process


because it allows decision-makers to make informed choices based on a
thorough examination of available options. Decision-makers can increase the
likelihood of success and avoid potential pitfalls by carefully considering the
pros and cons of each alternative and selecting the most appropriate option.
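The weighted-scoring decision matrix mentioned above can be computed in a few lines of Python; the criteria, weights, and scores are illustrative assumptions only:

# Weighted-scoring decision matrix: higher total = better alternative.
criteria_weights = {"cost": 0.40, "feasibility": 0.35, "stakeholder_impact": 0.25}

# Scores for each alternative on each criterion (scale 1-10, invented numbers).
alternatives = {
    "Option A": {"cost": 7, "feasibility": 8, "stakeholder_impact": 6},
    "Option B": {"cost": 9, "feasibility": 5, "stakeholder_impact": 7},
    "Option C": {"cost": 6, "feasibility": 9, "stakeholder_impact": 8},
}

totals = {
    name: sum(scores[c] * w for c, w in criteria_weights.items())
    for name, scores in alternatives.items()
}
best = max(totals, key=totals.get)
print(totals)
print("Best alternative by weighted score:", best)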

Step 4: Selecting the best alternative:


Choosing the best option is an important step in the decision-making
process. When making decisions, we frequently face multiple options,
each with its own set of benefits and drawbacks. The goal is to find the
solution that best meets our objectives and constraints. The following
steps help in selecting the best alternative:
• Consider the alternatives: Evaluate each alternative in light of the
objectives and criteria you've established. Consider the advantages and
disadvantages of each option.
• Consider the outcomes: Consider the potential outcomes of each
alternative. What are the most likely outcomes of each option?

• Evaluate the risks: Consider the risks associated with each option. What
are the possible negative outcomes of each option, and how likely are
they to occur?

• Analyze trade-offs: Think about any trade-offs you might need to make
between different options. One option, for example, may be less
expensive but take longer to implement, whereas another may be more
expensive but faster.

• Consider the feasibility of each alternative: Can you realistically
implement each option given your available resources, time, and other
constraints?

• Make a choice: Decide which option to select based on your
evaluation of the alternatives. Make a note of your decision-making
process and the reasoning behind it.

In a decision-making process, selecting the best alternative necessitates


careful consideration of the options, consideration of potential outcomes and
risks, analysis of trade-offs, and assessment of feasibility. By following these
steps, you can make an informed decision that best meets your objectives and
constraints. Based on the analysis, the decision-maker will select the
alternative that appears to be the best solution to the problem.

Step 5: Implementing the decision:


Following the decision, the next step is to put the chosen alternative into
action. Implementation is the final stage of the decision-making process,
when the decision is actually put into practice. It entails carrying out the
chosen course of action and monitoring the results to ensure that they
correspond to the desired outcome. The following steps may be useful in
effectively implementing a decision:

• Assign responsibilities: Clearly define who is responsible for carrying


out each component of the decision, including the tasks and actions
required to put the decision into action.
• Communicate the decision: Communicate the decision to all relevant
internal and external parties and stakeholders who may be affected by the
decision. This ensures that everyone understands the decision and their
role in carrying it out.
• Allocate resources: Ensure that the decision's resources, such as funds,
personnel, and materials, are available.
• Create an action plan: Create a detailed action plan outlining the
specific steps and timelines required to carry out the decision. This plan
should identify potential roadblocks and offer solutions to overcome
them.
• Put the decision into action: Follow the action plan and complete the
necessary tasks and actions to put the decision into action.

• Monitor and assess: Monitor the implementation process to ensure that


it is going as planned and evaluate the results to see if the desired
outcome was achieved. If the desired outcome is not achieved, changes
to the decision or the implementation plan may be required.

The final step in the decision-making process is implementation, which


entails carrying out the chosen course of action, monitoring the process, and
evaluating the results to ensure that they align with the desired outcome.

Step 6: Monitoring and evaluating the decision:

The final step in the decision-making process is to assess the decision's


outcomes. This step entails determining whether the decision was successful
in resolving the problem and producing the desired results. Monitoring and
evaluating are critical components of decision-making because they help to
ensure that decisions are effective and efficient.

Monitoring is the continuous process of gathering and analyzing data to


determine how well a decision is being implemented and whether it is
achieving its intended goals. It entails establishing systems and procedures to
track the decision's progress and identify any problems or issues that may
arise. For example, if a company decides to implement a new marketing
strategy, it must track sales and customer feedback to determine how
effective the strategy is.

The process of determining the effectiveness of a decision is referred to as


evaluating. It entails analyzing the data gathered through monitoring and
comparing it to the decision's original goals and objectives. This enables
decision-makers to assess whether the decision was successful and whether
any changes should be made in the future. For example, if a company
launches a new product line, it must analyze sales data and customer
feedback to determine whether the product is meeting expectations and
whether any changes to the product or marketing strategies are required.

Monitoring and evaluating decisions are critical components of the decision-


making process because they allow decision-makers to assess the
effectiveness of their decisions and make necessary adjustments.
Organizations can improve their decision-making processes and achieve
better results by monitoring and evaluating decisions.

5.3 INFORMATION SOURCES AND SYSTEMS


The means and methods used to gather, process, store, and disseminate
information are referred to as information sources and systems. Today's
information sources and systems include traditional print media, electronic
media, databases, search engines, and social media platforms, among others.
Information sources are classified as primary, secondary, or tertiary. First-
hand information, such as original research studies, interviews, or
government documents, is referred to as primary sources. Secondary sources
are data derived from primary sources, such as books, articles, or reviews.
Tertiary sources, such as encyclopedias, handbooks, or databases, aggregate
and synthesize information from primary and secondary sources.

In contrast, information systems are the tools and technologies used to


manage information. Databases, content management systems, search
engines, and information retrieval systems are examples of these systems.
These systems are intended to store, retrieve, and organize information to
make it more accessible and usable. In today's information-driven world,
effective use of information sources and systems is critical. These systems
are used by researchers, businesses, and individuals to collect and analyze
data, make informed decisions, and stay up to date on the latest trends and
developments in their fields. However, it is important to use these systems
critically and to assess the accuracy and dependability of the information
obtained.

Internal vs. external information sources:


Internal information sources are information generated within a company or
organization. This can include data on sales, operations, employee
performance, financial reports, and other information that only those within
the organization have access to. Internal information sources are frequently
used to improve decision-making processes within organizations because
they provide insights into the operations and performance of the organization.
External information sources, on the other hand, refer to data obtained from
sources outside the organization. This can include news articles, market
research reports, government statistics, customer feedback, and other publicly
available or obtainable data. External information sources are frequently used
to comprehend broader industry and market trends, identify potential threats
and opportunities, and make sound business decisions.

Fig 5.2: Internal vs. external information

While both internal and external information sources have advantages,


organizations often benefit from a combination of the two. Internal sources
can provide a more in-depth understanding of the organization's strengths and
weaknesses, whereas external sources can provide a broader view of market
and industry trends. Combining internal and external sources can provide a
more comprehensive understanding of the organization's environment,
resulting in more effective decision-making processes.

Information systems and databases


Databases and information systems are two interconnected concepts that are
critical in the worlds of technology and business. An information system is a
collection of people, processes, and technologies that work together within an
organization to organize, store, process, and disseminate information. An
information system's primary goal is to assist an organization in making
better decisions by providing accurate and timely information. Accounting,
inventory management, customer relationship management, and supply chain
management are all examples of how information systems can be used.
A database, on the other hand, is a collection of data that has been organized
and stored in such a way that it can be retrieved and manipulated efficiently.
A database management system (DBMS) typically manages the data and
provides tools for creating, managing, and querying the data. Databases can
be used for a variety of purposes, including customer data management,
inventory tracking, and financial data analysis.
Because an information system typically relies on a database to store and
manage its data, the two are inextricably linked. A database, for example, is
used by a customer relationship management (CRM) system to store
customer data such as names, addresses, and purchase histories. This data is
then used by the CRM system to provide insights into customer behavior,
such as which products are the most popular, which customers are the most
profitable, and which customers are most likely to churn.
Databases and information systems are both critical components of modern
businesses and organizations. A database is a structured and efficient way to
store and retrieve data, whereas an information system provides a framework
for managing and processing information. They work together to help
organizations make better decisions and run more efficiently.
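To make this link concrete, the small Python sketch below uses the built-in sqlite3 module to stand in for the database layer of a CRM-style information system; the table and figures are invented for illustration only:

    import sqlite3

    # The database layer: structured storage of customer purchase records.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE purchases (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO purchases VALUES (?, ?)",
        [("Asha", 1200.0), ("Ravi", 450.0), ("Asha", 300.0)],
    )

    # The information system layer: turning stored data into insight,
    # here the total spend per customer, ranked from highest to lowest.
    query = "SELECT customer, SUM(amount) FROM purchases GROUP BY customer ORDER BY 2 DESC"
    for customer, total in conn.execute(query):
        print(customer, total)
    conn.close()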

Fig 5.3: Critical components of an information system


The organization component refers to the structure of the business or
enterprise in which the information system operates.
includes the roles and responsibilities of various employees, the workflow of
business processes, and the organization's overall culture and values. To be
effective, an information system must be designed to fit within the context of
the organization.
Information Technology (IT) refers to the hardware, software, and
networking infrastructure that comprise an information system's technical
components. Computers, servers, databases, software applications, and
network devices such as routers and switches are all included. An
information system's IT component is in charge of storing, processing and
transmitting data and information.

Management refers to the processes of strategic planning and decision-


making that guide the use and development of an information system. Setting
goals and objectives, allocating resources, establishing policies and
procedures, and monitoring performance are all part of this process. Effective
information system management is essential for ensuring that it aligns with
the organization's goals and supports its business processes.

5.4 DECISION-MAKING MODELS AND TOOLS


Several techniques can be used to assist organizations in making informed
and effective decisions when it comes to decision-making models and tools
for information systems. Decision support systems (DSS), expert systems,
the analytic hierarchy process (AHP), and multi-criteria decision analysis
(MCDA) are some popular models and tools.

• Decision support systems (DSS):


These systems are computer-based tools that give decision-makers access to
data and analytical tools to help them make decisions. Among other things,
DSS can assist with data analysis, simulation, and optimization. A Decision
Support System (DSS) is an information system that assists individuals or
groups in making critical decisions. These systems are intended to assist
users in analyzing data, identifying patterns, and making more informed
decisions based on the information available. In business and management
contexts, DSS are frequently used to aid in decision-making processes such
as strategic planning, budgeting, and forecasting. They typically include
software tools and databases that can be tailored to a specific user's or
organization's needs.

A key feature of a DSS is that it allows users to interact with the system in
real-time, allowing them to ask questions, change inputs, and see the results
of their decisions right away. This interactivity assists users in better
understanding the consequences of their decisions and making more informed
decisions. Depending on the specific problem being addressed, DSS can be
designed to use a variety of decision-making techniques, such as
optimization, simulation, and artificial intelligence. They may also include
data visualization tools, such as charts and graphs, to assist users in
comprehending the information presented.

A decision support system (DSS) consists of three different parts:
• the knowledge base,
• the software system, and
• the user interface.

• Knowledge base: The knowledge base is an essential component of a
decision support system: a database that contains data from both internal
and external sources. It is a library of information on specific subjects
and the component of a DSS that stores information used by the system's
reasoning engine to determine a course of action.

• Software system: Model management systems make up the software


system. A model is a simulation of a real-world system created to learn
how it works and how it can be improved. Organizations use models to
predict how different system adjustments will affect outcomes. Models,
for example, can be useful in understanding systems that are too
complicated, expensive, or dangerous to fully explore in real life. That is
the concept behind computer simulations, which are used in scientific
research, engineering tests, weather forecasting, and a variety of other
applications. Models can also be used to represent and explore non-
existent systems, such as a proposed new technology, a planned factory,
or a company's supply chain. Businesses also use models to predict the
outcomes of various system changes, such as policies, risks, and
regulations, to make business decisions.

• User interface: The user interface makes it simple to navigate the system.
The primary goal of the user interface of the decision support system is
to make it simple for the user to manipulate the data stored on it. The
interface can be used by businesses to assess the effectiveness of DSS
transactions for end users. Simple windows, complex menu-driven
interfaces, and command-line interfaces are all examples of DSS
interfaces.

Below are the main types of decision support systems (DSS):

• Data-driven DSS: Decisions are based on data from internal or external
databases. Data mining techniques are used to identify trends and patterns
to forecast future events. Frequently used to aid in inventory, sales, and
other business decisions (a small sketch follows this list).
• Model-driven DSS: Customized to meet a specific set of user
requirements. Analyzes various scenarios to meet user requirements, such
as assisting with scheduling or developing financial statements.
• Communication-driven and group DSS: Allows multiple people to work
on the same task by utilizing a variety of communication tools. Improves
overall system efficiency and effectiveness by increasing collaboration
between users and the system.
• Knowledge-driven DSS: A knowledge management system maintains a
constantly updated knowledge base in which data is stored. Users are given
data that is consistent with a company's business processes and knowledge
base.
• Document-driven DSS: A type of information management system that
retrieves data from documents. Allows users to search webpages or
databases, as well as find specific search terms such as policies and
procedures, meeting minutes, and corporate records.
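As a rough sketch of the data-driven idea (the demand figures below are invented), a moving average computed in Python shows the kind of trend signal such a system would surface for an inventory decision:

    import pandas as pd

    # Hypothetical daily demand for one product.
    demand = pd.Series(
        [120, 135, 128, 150, 160, 172, 180],
        index=pd.date_range("2023-01-01", periods=7),
    )

    # A 3-day moving average smooths day-to-day noise so the upward trend,
    # and hence the case for a larger reorder quantity, is easier to see.
    print(demand.rolling(window=3).mean())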

Expert systems:
These are computer programs that use artificial intelligence to simulate a
human expert's decision-making abilities. Based on their knowledge and
experience, they can make recommendations and offer advice. Expert
systems are computer-based decision-making tools that solve complex
problems by applying knowledge and rules. These systems are intended to
mimic the decision-making abilities of a human expert in a particular domain.
An expert system captures and encodes a human expert's knowledge and
expertise in a computer program. This knowledge is typically represented by
rules and if-then statements, which the computer can use to reason and make
decisions.

Expert systems are used in a variety of fields, including medicine, finance,


engineering, and others, where decisions must be made based on complex
data and a thorough understanding of the subject matter. An expert system's
decision-making process typically includes the following steps:

• Knowledge acquisition: The expert system is programmed with
knowledge and rules obtained from one or more domain experts.

• Inference engine: The inference engine is at the heart of the expert


system. It makes decisions based on the data provided by applying
knowledge and rules.

• User interface: The expert system's user interface allows the user to
interact with it and provide input data.

• Explanation facility: The explanation feature provides the user with an


explanation of the reasoning behind the expert system's decision.

Because they provide a consistent and reliable way to make decisions based
on expert knowledge, expert systems can be an effective tool in decision-
making models. They are especially useful when there is a large amount of
data to be analyzed and the decision-making process necessitates a thorough
understanding of a particular domain.
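As a toy sketch of the rule-and-inference idea (the rules and facts below are invented and far simpler than any real expert system), the if-then structure can be expressed in Python like this:

    # A tiny rule base: each rule pairs a condition with a recommendation.
    rules = [
        (lambda f: f["fever"] and f["cough"], "Possible respiratory infection: order a chest X-ray"),
        (lambda f: f["fever"] and not f["cough"], "Run a blood culture"),
        (lambda f: not f["fever"], "No immediate action required"),
    ]

    facts = {"fever": True, "cough": False}

    # A very small "inference engine": fire the first rule whose condition holds.
    for condition, recommendation in rules:
        if condition(facts):
            print(recommendation)
            break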

Real-World Case Scenarios:

The MYCIN system, developed in the 1970s to aid doctors in diagnosing and
treating bacterial infections, is a classic example of an expert system.
MYCIN was designed to mimic an expert microbiologist's decision-making
process, and it used a knowledge base of rules to make treatment
recommendations based on the patient's symptoms and test results.
Another example is the Dendral system, which was developed in the 1960s to
use mass spectrometry to determine the molecular structure of organic
compounds. Dendral was created to mimic the decision-making process of
expert chemists by interpreting mass spectrometry data and identifying the
chemical structure of unknown compounds using a knowledge base of rules.
The Watson system, developed by IBM in recent years, is an example of an
expert system. Watson analyses large amounts of data and provides answers
to complex questions using natural language processing and machine learning
algorithms. Watson has been used in a variety of fields, including healthcare,
finance, and customer service.
Another example is the personal finance app Mint, which employs an expert
system to provide financial advice and assist users with money management.
Mint analyses a user's spending habits and provides personalized
recommendations for saving money and cutting expenses using a knowledge
base of rules and algorithms.

Analytic hierarchy process (AHP):


This is a decision-making technique that entails breaking down a complex
decision into smaller, more manageable parts and ranking them in order of
importance. Dr. Thomas Saaty developed the Analytic Hierarchy Process
(AHP) in the 1970s as a decision-making model and tool. AHP is a multi-
criteria decision-making method that breaks complex decisions down into
smaller, more manageable components to assist individuals or groups in
making complex decisions.

The AHP model functions by first identifying the decision criteria and then
evaluating the relative importance of each criterion. This is accomplished by
using pair-wise comparisons to compare the criteria to one another. In
pairwise comparisons, each criterion is compared to every other criterion, and
a score is assigned to indicate how much more important one criterion is than
the other. These scores are then used to rank the criteria in order of
importance. The AHP model is used to evaluate alternatives based on how
well they meet each criterion after the criteria have been prioritized. The
alternatives are ranked based on how well they perform on each criterion, and
the overall ranking is determined by a weighted sum of the rankings.
The AHP model incorporates both quantitative and qualitative data into the
decision-making process, which is a key feature. This is especially useful
when decisions must be made based on several factors, such as financial
considerations, technical requirements, and stakeholder preferences. A real-
world application of AHP is the evaluation of potential renewable energy
sources. AHP could be used by a team to compare the criteria for various
renewable energy sources such as wind, solar, and hydropower. They would
then assess the relative importance of each criterion and the performance of
each energy source on those criteria. They could make a decision about
which renewable energy source to pursue based on the results.

AHP is a decision-making model and tool that simplifies complex decisions


and provides a structured method for evaluating alternatives based on
multiple criteria. It is a useful decision-making tool in situations where
multiple factors must be considered, and it allows for the inclusion of both
quantitative and qualitative data.
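A minimal numerical sketch of the pairwise-comparison step, using an invented 3x3 judgement matrix and the common column-normalisation shortcut for the priority vector, might look like this in Python:

    import numpy as np

    # Invented pairwise comparisons for three criteria (cost, reliability, environment):
    # A[i, j] says how much more important criterion i is than criterion j.
    A = np.array([
        [1.0, 3.0, 5.0],
        [1 / 3, 1.0, 2.0],
        [1 / 5, 1 / 2, 1.0],
    ])

    # Normalise each column and average across rows to approximate the
    # principal eigenvector, i.e. the relative weight of each criterion.
    weights = (A / A.sum(axis=0)).mean(axis=1)
    print(weights.round(3))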

Multi-Criteria Decision Analysis (MCDA):


To determine the best option, this technique involves evaluating various
alternatives based on multiple criteria such as cost, risk, and impact. The
Multi-Criteria Decision Analysis (MCDA) model and tool are used to assist
individuals or groups in making complex decisions by taking into account
multiple criteria or factors. MCDA is a versatile approach that enables
decision-makers to consider both quantitative and qualitative data. The
MCDA model operates by first identifying the decision criteria and then
weighting each criterion based on its relative importance. Criteria of various
types, such as financial, technical, environmental, or social criteria, can be
used.
Following the prioritization of the criteria, the MCDA model is used to
evaluate the alternatives based on how well they perform on each criterion.
The alternatives are ranked based on their performance on each criterion, and
the overall ranking is determined by a weighted sum of the rankings. MCDA
offers a structured approach to decision-making and allows decision-makers
to weigh the trade-offs between various criteria. It can assist decision-makers
in understanding the implications of various options and making more
informed decisions.

A real-world application of MCDA is the selection of a supplier for a


manufacturing company. The price of the products, the quality of the
products, the supplier's dependability, and the supplier's environmental record
could all be decision criteria. Each criterion would be assigned a weight
based on its importance, and each supplier would be evaluated based on how
well it performed on each criterion. They could make an informed decision
about which supplier to choose based on the results.
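A minimal sketch of the weighted-sum step for this supplier example, with hypothetical weights and 1-5 performance scores, could be written in Python as:

    # Hypothetical criterion weights and 1-5 performance scores.
    weights = {"price": 0.35, "quality": 0.30, "reliability": 0.20, "environment": 0.15}
    suppliers = {
        "Supplier X": {"price": 4, "quality": 3, "reliability": 5, "environment": 2},
        "Supplier Y": {"price": 3, "quality": 5, "reliability": 4, "environment": 4},
    }

    # Overall score = weighted sum across all criteria; rank highest first.
    def overall(scores):
        return sum(weights[c] * scores[c] for c in weights)

    for name, scores in sorted(suppliers.items(), key=lambda kv: overall(kv[1]), reverse=True):
        print(name, round(overall(scores), 2))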

When making complex decisions, decision-makers can use the MCDA


decision-making model and tool to consider multiple criteria. It offers a
structured approach to decision-making and enables decision-makers to
weigh the trade-offs between various criteria. MCDA is a versatile approach
that can be used in a variety of contexts, including business, engineering, and
public policy. It is critical to select the appropriate model or tool for the
specific decision-making task at hand, taking into account factors such as the
decision's complexity, the amount of data available, and the decision-maker's
expertise.

Decision-making tools and software:


Decision-making software and decision-making tools are specialized
programmes or applications designed to assist individuals or organizations in
making better decisions. These tools and software come in a variety of forms,
ranging from spreadsheets and statistical analysis software to industry-
specific applications. Examples of decision-making tools and software include:

Excel (Microsoft): This popular spreadsheet software includes data analysis


tools, charts and graphs, and customizable templates for creating decision-
making models.

Tableau: Users can use this data visualization software to create interactive
dashboards and charts to analyze data and gain insights into complex
business problems.

IBM SPSS Statistics: Researchers and data scientists use this statistical
analysis software to analyze data and make predictions using a variety of
statistical techniques.
Gurobi: Organizations use this optimization software to optimize complex
operations such as supply chain management, logistics, and scheduling.
FICO Decision Management Suite: This software suite is intended to assist
businesses in automating and optimizing decision-making processes
throughout the organization, utilizing data and analytics to make informed
decisions.
Palantir: To make sense of large and complex datasets, government agencies
and organizations in industries such as finance, healthcare, and energy use
this data integration and analytics platform.
SAP Business Objects: This business intelligence software helps
organizations make informed decisions by providing tools for data
visualization, reporting, and analytics.

These are only a few of the many decision-making tools and software options
available today. The key is to select the appropriate tool for the decision-
making problem at hand and to use it effectively to gain insights and make
sound decisions.

5.5 SUMMARY
The unit on Information and Decision Making discusses the various types of
information, decision-making models, and decision-making tools. It
emphasizes the significance of information in decision-making, emphasizing
the need for relevant and trustworthy data. The rational decision-making
model, bounded rationality model, intuitive decision-making model, and
political decision-making model are all discussed in this unit. Each model
takes a different approach to decision-making, and the model chosen is
determined by the decision context.

The unit also discusses decision-making tools like expert systems, the
Analytic Hierarchy Process (AHP), and Multi-Criteria Decision Analysis
(MCDA). These decision-making tools provide a structured approach, allowing
decision-makers to evaluate multiple criteria and make informed choices. In
the end, the unit emphasizes the importance of effective information
management and decision-making for organizations to make sound, informed
decisions. It provides a decision-making framework, emphasizing the
importance of data, models, and tools in the decision-making process.

5.6 SELF-ASSESSMENT EXERCISES


Caselet:
You are a marketing manager at a pharmaceutical company, and your team
has been tasked with deciding whether to launch a new drug in the market.
The drug is effective in treating a rare disease, but the market for this disease
is small. The development cost of the drug was high, and the price to
manufacture it is expensive. Your team needs to decide whether to launch the
drug or not.

Questions:
1. What type of information do you need to make this decision?
2. Which decision-making model would you use to decide whether to
launch the drug or not?
3. What are the potential risks associated with launching the drug?
4. How can MCDA aid in making the decision to launch the drug?
5. How can expert systems be used to support the decision-making process
in this scenario?

5.7 FURTHER READINGS


• "Data Science for Business: What You Need to Know about Data
Mining and Data-Analytic Thinking" by Foster Provost and Tom
Fawcett
• "Expert Systems: Principles and Programming" by Joseph Giarratano
and Gary Riley
• "Applied Multiple Criteria Decision Making" edited by Carlos A. Bana e
Costa, João C. P. da Costa, and Jean-Marie Roussel
• "Decision Making: Cognitive Models and Explanations" by Steven J.
Sherman and Reid Hastie

UNIT 6 SPREADSHEET ANALYSIS

Objectives
After studying this unit, you will be able to:
• Learn how to navigate, input data, use formulas and functions, format
data, and use built-in features of spreadsheet software like Excel.
• Analyse and organise data and understand the essential functions and
formulas.
• Learn how to create charts, graphs, and other visualisations to help
communicate insights and trends in the data.

Structure
6.1 Introduction to Spreadsheet Analysis
6.2 Basic Functions and Formulas
6.3 Spreadsheet Design and Formatting
6.4 Pivot Tables and Data Analysis
6.5 Summary
6.6 Self-Assessment Exercises
6.7 Further Readings

6.1 INTRODUCTION TO SPREADSHEET ANALYSIS
Spreadsheet analysis is the process of using spreadsheets to organise,
manipulate, and analyse data. A spreadsheet is a computer program that
allows users to enter, manipulate, and analyse data in a tabular format.
Spreadsheets are widely used in many fields, including finance, accounting,
marketing, and operations. The basic structure of a spreadsheet consists of
rows and columns, with each intersection representing a cell. Each cell can
contain data such as numbers, text, and formulas. Formulas are equations that
perform calculations using the data in other cells.

Spreadsheets are helpful for many tasks, including budgeting, financial


forecasting, data analysis, and project management. They can be used to
create charts and graphs to help visualise data and to identify trends and
patterns. Spreadsheets also allow users to perform "what-if" analysis by
changing variables and observing the effects on the data. Excel is the most
widely used spreadsheet program, but other alternatives like Google Sheets or
OpenOffice Calc exist. Regardless of the software used, spreadsheet analysis
can help users make more informed decisions by providing a clear, organised
view of the data.

Spreadsheet analysis involves various tasks, including data entry, formatting,
manipulation, and analysis. Some standard techniques and tools used in
spreadsheet analysis are listed below:
• Data entry: Data needs to be entered into the spreadsheet before it can
be analysed. Data can be manually entered into cells, copied and pasted
from other sources, or imported from external sources.

• Formatting: Formatting tools such as font, colour, and cell borders can
be used to make data more readable and visually appealing. Formatting
can also be used to highlight important data or to differentiate between
different types of data.
• Functions and formulas: Functions and formulas are used to perform
calculations on data in the spreadsheet. Common functions include SUM,
AVERAGE, and COUNT, while formulas use operators such as +, -, *,
and / to perform more complex calculations.
• Pivot tables: Pivot tables are a powerful tool used to summarise and
analyse large amounts of data. Pivot tables allow users to group, filter,
and analyse data in various ways, making it easier to identify trends and
patterns.
• Data validation: Data validation is used to ensure that data entered into
the spreadsheet meets certain criteria, such as a specific format or range
of values. This helps prevent errors and ensures that the data is accurate.
• Charts and graphs: Charts and graphs are used to visualise data and
make it easier to interpret. Common types of charts include bar charts,
line charts, and pie charts.
• Conditional formatting: Conditional formatting allows users to format
cells based on specific criteria. For example, cells can be formatted to
turn red if the value is below a certain threshold, or to turn green if the
value is above a certain threshold.

Spreadsheet analysis is a powerful tool for organising and analysing data, and
is widely used in many fields. With the right tools and techniques, users can
gain valuable insights into their data and make informed decisions based on
the results.

Advantages of Spreadsheet Analysis:


Spreadsheet analysis offers several advantages for organising and analysing
data. Here are some of the main advantages:

• Organise large amounts of data: Spreadsheets allow users to organise


large amounts of data in a structured and easy-to-understand format.
Data can be entered, sorted, and filtered in a variety of ways to make it
more accessible.
• Perform complex calculations: Spreadsheets allow users to perform
complex calculations and mathematical operations using formulas and
functions. This can save time and increase accuracy compared to manual
calculations.
• Visualise data: Spreadsheets offer a range of tools for visualising data,
such as charts and graphs. These tools make it easier to identify patterns,
trends, and outliers in the data.
• Collaborate with others: Spreadsheets can be easily shared with others,
allowing for collaboration and teamwork. Multiple users can edit and
update the same spreadsheet simultaneously, and changes are
automatically saved in real-time.
• Customise analysis: Spreadsheets offer high customisation, allowing
users to tailor their analysis to their specific needs. Users can select
which data to include in the analysis, apply custom formatting, and
create custom calculations.
• Make data-driven decisions: Spreadsheet analysis can help users make
informed, data-driven decisions based on their analysis. Organising and
analysing data in a structured and logical way allows users to gain
valuable insights and make more accurate predictions.

Common Spreadsheet Analysis Tools:


There are several spreadsheet analysis tools available that can help users to
organise, manipulate, and analyse data. Here are some of the most common
tools:
• Pivot Tables: Pivot tables are used to summarise, analyse, and
manipulate large amounts of data in a spreadsheet. Users can group,
filter, and sort data, as well as calculate subtotals and totals.
• Formulas and Functions: Formulas and functions allow users to
perform calculations and manipulate data in a spreadsheet. Examples
include SUM, AVERAGE, COUNT, IF, and VLOOKUP.
• Charts and Graphs: Charts and graphs are visual tools that can help
users to better understand and analyse data. Examples include bar charts,
line graphs, pie charts, and scatter plots.
• Data Validation: Data validation allows users to ensure that data
entered into a spreadsheet is accurate and consistent. Users can set rules
to validate data, such as requiring a specific data format or setting a
range of acceptable values.

• Conditional Formatting: Conditional formatting allows users to apply


formatting to cells based on certain conditions or criteria. This can help
to highlight important data or identify trends.

• What-If Analysis: What-If Analysis allows users to model different


scenarios by changing input values and observing the effect on
calculated results. This can help users to make more informed decisions
based on potential outcomes.

• Macros: Macros are automated scripts that can be created in a


spreadsheet to automate repetitive tasks or perform complex operations.
Macros can save time and increase accuracy.

These tools are just a few examples of the many spreadsheet analysis tools
available. By using these tools and others, users can gain valuable insights
into their data and make more informed decisions based on the results.
Spreadsheet Analysis Use Cases:
Spreadsheet analysis can be used in a wide variety of contexts and industries.
Here are some examples of use cases:

• Financial Analysis: Spreadsheet analysis is commonly used in financial


analysis to track expenses, calculate budgets, forecast cash flow, and
analyse financial statements.
• Sales and Marketing: Sales and marketing professionals use
spreadsheet analysis to track sales data, analyse customer behavior, and
forecast sales trends. This can help them to identify opportunities for
growth and improve marketing strategies.
• Human Resources: Human resources professionals use spreadsheet
analysis to track employee data, analyse staffing needs, and monitor
employee performance. This can help them to make informed decisions
about hiring, training, and retention.
• Project Management: Project managers use spreadsheet analysis to
track project timelines, budgets, and progress. This can help them to
identify potential roadblocks and make adjustments to keep projects on
track.
• Inventory Management: Inventory managers use spreadsheet analysis
to track inventory levels, monitor product demand, and forecast future
inventory needs. This can help them to optimise inventory levels and
reduce waste.
• Education: Educators use spreadsheet analysis to track student grades,
attendance, and performance. This can help them to identify areas where
students may need additional support and tailor instruction to meet
individual student needs.
• Research and Data Analysis: Researchers and data analysts use
spreadsheet analysis to organise, clean, and analyse large amounts of
data. This can help them to identify patterns, trends, and relationships in
the data.

These are just a few examples of the many ways that spreadsheet analysis can
be used to analyse and interpret data in various industries and contexts.

6.2 BASIC FUNCTIONS AND FORMULAS


There are two basic ways to perform calculations in Excel: Formulas and
Functions.

Excel formulas allow you to identify relationships between values in your


spreadsheet’s cells, perform mathematical calculations with those values, and
return the resulting value in the cell of your choice. Sum, subtraction,
percentage, division, average, and even dates/times are among the formulas
that can be performed automatically. For example, =A1+A2+A3+A4+A5,
which finds the sum of the range of values from cell A1 to cell A5.

Excel Functions: A formula is a mathematical expression that computes the
value of a cell. Functions are predefined formulas that are already in Excel.
Functions carry out specific calculations in a specific order based on the
values specified as arguments or parameters. For example, =SUM (A1:A10).
This function adds up all the values in cells A1 through A10.
This horizontal menu, shown below, in more recent versions of Excel, allows
you to find and insert Excel formulas into specific cells of your spreadsheet.
On the Formulas tab, you can find all available Excel functions in the
Function Library:

The more you use Excel formulas, the easier it will be to remember and
perform them manually. Excel has over 400 functions, and the number is
increasing from version to version. Formulas can be inserted into Excel
using the following methods:
1. Simple insertion of the formula - Typing a formula in the cell:
Typing a formula into a cell or the formula bar is the simplest way to insert
basic Excel formulas. Typically, the process begins with typing an equal sign
followed by the name of an Excel function. Excel is quite intelligent in that it
displays a pop-up function hint when you begin typing the name of the
function.

2. Using the Insert Function option on the Formulas Tab:
If you want complete control over your function insertion, use the Excel
Insert Function dialogue box. To do so, go to the Formulas tab and select the
first menu, Insert Function. All the functions will be available in the dialogue
box.

3. Choosing a Formula from One of the Formula Groups in the Formula Tab:
This option is for those who want to quickly dive into their favourite
functions. Navigate to the Formulas tab and select your preferred group to
access this menu. Click to reveal a sub-menu containing a list of functions.
You can then choose your preference. If your preferred group isn’t on the tab,
click the More Functions option — it’s most likely hidden there.

4. Use Recently Used Tabs for Quick Insertion:
If retyping your most recent formula becomes tedious, use the Recently Used
menu. It’s on the Formulas tab, the third menu option after AutoSum.

Basic Excel Formulas and Functions:


1. SUM:
The SUM formula in Excel is one of the most fundamental formulas you can
use in a spreadsheet, allowing you to calculate the sum (or total) of two or
more values. To use the SUM formula, enter the values you want to add
together in the following format: =SUM(value 1, value 2,…..).
Example: In the example below, to calculate the total price of all the fruits,
type =SUM(B3:B8) in cell B9. This will calculate the sum of B3, B4, B5, B6,
B7, and B8. Press “Enter,” and the cell will produce the sum: 430.

2. SUBTRACTION:
To use the subtraction formula in Excel, enter the cells you want to subtract
in the format =SUM(A3, -B3). This subtracts one cell from the other inside the
SUM formula by appending a negative sign before the cell being subtracted.
For example, if A3 was 300 and B3 was 225, =SUM(A3, -B3) would perform
300 + (-225), returning a value of 75 in cell D3.

3. MULTIPLICATION:
In Excel, enter the cells to be multiplied in the format =A3*B3 to perform the
multiplication formula. An asterisk is used in this formula to multiply cell A3
by cell B3.
For example, if A3 was 300 and B3 was 225, =A3*B3 would return a value
of 67500.

Highlight an empty cell in an Excel spreadsheet to multiply two or more


values. Then, in the format =A1*B1…, enter the values or cells you want to
multiply together. The asterisk effectively multiplies each value in the
formula.
To return your desired product, press Enter. Take a look at the screenshot
above to see how this looks.

4. DIVISION:
To use the division formula in Excel, enter the dividing cells in the format
=A3/B3. This formula divides cell A3 by cell B3 with a forward slash, “/.”
For example, if A3 was 300 and B3 was 225, =A3/B3 would return a decimal
value of 1.333333333.

Division in Excel is one of the most basic functions available. To do so,


highlight an empty cell, enter an equals sign, “=,” and then the two (or more)
values you want to divide, separated by a forward slash, “/.” The output
should look like this: =A3/B3, as shown in the screenshot above.
5. AVERAGE:
The AVERAGE function finds the average, or arithmetic mean, of numbers. To
find the average of a set of numbers, type =AVERAGE(A3,B3,C3,...) and press
‘Enter’; the cell will display the average of those numbers.
For example, if A3 was 300, B3 was 225, C3 was 180, D3 was 350, and E3 was 400,
then =AVERAGE(A3,B3,C3,D3,E3) would produce 291.

6. IF formula:
In Excel, the IF formula is denoted as =IF(logical test, value if true, value if
false). This lets you enter a text value into a cell “if” something else in your
spreadsheet is true or false.
For example, you may need to know which values in column A are greater
than three. Using the =IF formula, you can quickly have Excel auto-populate
a “yes” for each cell with a value greater than 3 and a “no” for each cell with
a value less than 3.

7. PERCENTAGE:
To use the percentage formula in Excel, enter the cells you want to calculate
the percentage for in the format =A1/B1. To convert the decimal value to a
percentage, select the cell, click the Home tab, and then select “Percentage”
from the numbers dropdown. There is no specific Excel “formula” for
percentages, but Excel makes it simple to convert the value of any cell into a
percentage so you don’t have to calculate and reenter the numbers yourself.
The basic setting for converting a cell’s value to a percentage is found on the
Home tab of Excel. Select this tab, highlight the cell(s) you want to convert
to a percentage, and then open the Number Format dropdown menu in the
Number group (this menu button might say “General” at first). Then, from the
list of options that appears, choose “Percentage.” This will convert the value
of each highlighted cell into a percentage.


8. CONCATENATE:
CONCATENATE is a useful formula that combines values from multiple
cells into the same cell. For example, =CONCATENATE(A3,B3) will
combine Red and Apple to produce RedApple.

9. DATE:
The Excel DATE formula is =DATE(year, month, day). This formula
will return a date corresponding to the values entered in the parentheses,
including values referred to from other cells. For example, if A2 was 2019,
B2 was 8, and C2 was 15, =DATE(A2,B2,C2) would return 15-08-2019.

10. TRIM:
The TRIM formula in Excel is denoted =TRIM(text). This formula will
remove any spaces that have been entered before and after the text in the cell.
For example, if A2 includes the name ” Virat Kohli” with unwanted spaces
before the first name, =TRIM(A2) would return “Virat Kohli” with no spaces
in a new cell.

11. LEN:
LEN is the function to use when you want to count the number of characters
in a specific cell. =LEN(text) is the formula for this. Please keep in mind
that the LEN function in Excel counts all characters, including spaces.
For example, =LEN(A2) returns the total length of the text in cell A2,
including spaces.
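For readers who also work in Python, the pandas library offers rough equivalents of the functions above; the small data frame below is illustrative only:

    import pandas as pd

    # A small frame standing in for a worksheet; the values are illustrative.
    df = pd.DataFrame({"fruit": [" Apple", "Banana "], "price": [95, 40], "qty": [3, 5]})

    total = df["price"].sum()                                        # like =SUM(B3:B8)
    average = df["price"].mean()                                     # like =AVERAGE(...)
    flag = df["qty"].apply(lambda v: "yes" if v > 3 else "no")       # like =IF(A2>3,"yes","no")
    label = df["fruit"].str.strip() + "-" + df["price"].astype(str)  # TRIM + CONCATENATE
    length = df["fruit"].str.len()                                   # like =LEN(A2), spaces included

    print(total, average, list(flag), list(label), list(length), sep="\n")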

6.3 SPREADSHEET DESIGN AND FORMATTING


Design and formatting are important aspects of spreadsheet analysis. Well-
designed and formatted spreadsheets are easier to read, understand, and use.
Here are some tips for designing and formatting spreadsheets:

Apply cell borders:


Select the cell or range of cells that you want to add a border to. To quickly
select the whole worksheet, click the Select All button.

On the Home tab, in the Font group, click the arrow next to the Borders
button, and then click the border style that you want.

The Borders button displays the most recently used border style. You can
click the Borders button (not the arrow) to apply that style.
Change text color and alignment:
1. Select the cell or range of cells that contain (or will contain) the text that
you want to format. You can also select one or more portions of the text
within a cell and apply different text colors to those sections.
2. To change the color of text in the selected cells, on the Home tab, in
the Font group, click the arrow next to Font Color , and then
under Theme Colors or Standard Colors, click the color that you want to
use.
To apply a color other than the available theme colors and standard
colors, click More Colors, and then define the color that you want to use
on the Standard tab or Custom tab of the Colors dialog box.
3. To change the alignment of the text in the selected cells, on
the Home tab, in the Alignment group, click the alignment option that
you want.

For example, to change the horizontal alignment of cell contents, click Align
Text Left, Center, or Align Text Right.

Apply cell shading:


1. Select the cell or range of cells that you want to apply cell shading to.
2. On the Home tab, in the Font group, click the arrow next to Fill Color ,
and then under Theme Colors or Standard Colors, click the color that you
want.
Add or change the background color of cells:
You can highlight data in cells by using Fill Color to add or change the
background color or pattern of cells. Here's how:
1. Select the cells you want to highlight.
To use a different background color for the whole worksheet, click the Select
All button. This will hide the gridlines, but you can improve worksheet
readability by displaying cell borders around all cells.

2. Under Theme Colors or Standard Colors, pick the color you want.

To use a custom color, click More Colors, and then in the Colors dialog box
select the color you want. To apply the most recently selected color, you can
just click Fill Color. You'll also find up to 10 most recently selected custom
colors under Recent Colors.

Apply a pattern or fill effects:


When you want something more than just a solid color fill, try applying a
pattern or fill effects.

1. Select the cell or range of cells you want to format.


2. Click Home > Format Cells dialog launcher, or press Ctrl+Shift+F.

3. On the Fill tab, under Background Color, pick the color you want.

4. To use a pattern with two colors, pick a color in the Pattern Color box,
and then pick a pattern in the Pattern Style box.
To use a pattern with special effects, click Fill Effects, and then pick the
options you want.
Remove cell colors, patterns, or fill effects:
To remove any background colors, patterns, or fill effects from cells, just
select the cells. Then click Home > arrow next to Fill Color, and then
pick No Fill.

Print cell colors, patterns, or fill effects in color

If print options are set to Black and white or Draft quality — either on
purpose, or because the workbook has large or complex worksheets and
charts that caused draft mode to be turned on automatically — cells won't
print in color. Here's how you can fix that:
1. Click Page Layout > Page Setup dialog box launcher.

2. On the Sheet tab, under Print, uncheck the Black and white and Draft
quality check boxes.

Formatting Techniques for Effective Analysis:


Conditional Formatting: To apply conditional formatting in Excel, select the
cells you want to format, then click the "Conditional Formatting" button in
the "Home" tab of the ribbon. From there, you can choose a formatting rule
based on the data you want to format, such as highlighting cells that contain
specific text, values or formulas. The browser version of Excel provides a
number of built-in conditions and appearances:

The web browser version of Excel only offers a selection of built-in


conditional formatting options. The Excel application has the option of
creating fully customised conditional formatting rules. Example: the Speed
value of each Pokemon is formatted with a Color Scale:

Conditional formatting, step by step: 1. Select the range of Speed values


C2:C9

2. Click on the Conditional Formatting icon in the ribbon, from


the Home menu
3. Select the Color Scales from the drop-down menu

There are 12 Color Scale options with different color variations.

The color on the top of the icon will apply to the highest values.
4. Click on the "Green - Yellow - Red Colour Scale" icon

Now, the Speed value cells will have a colored background highlighting:

Dark green is used for the highest values, and dark red for the lowest values.
Charizard has the highest Speed value (100) and Squirtle has the lowest
Speed value (43). All the cells in the range gradually change color from
green, yellow, orange, then red.
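The steps above use the Excel interface; the same kind of colour scale can also be applied programmatically. The sketch below assumes the third-party openpyxl Python library, and the cell range, colours, and file name are illustrative:

    from openpyxl import Workbook
    from openpyxl.formatting.rule import ColorScaleRule

    wb = Workbook()
    ws = wb.active
    for row, speed in enumerate([100, 43, 65, 80, 58, 90, 70, 55], start=2):
        ws.cell(row=row, column=3, value=speed)          # Speed values in C2:C9

    # Three-colour scale: red for the lowest value, yellow in the middle, green for the highest.
    rule = ColorScaleRule(
        start_type="min", start_color="F8696B",
        mid_type="percentile", mid_value=50, mid_color="FFEB84",
        end_type="max", end_color="63BE7B",
    )
    ws.conditional_formatting.add("C2:C9", rule)
    wb.save("speed_colour_scale.xlsx")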

Aligning Columns and Rows: To align columns or rows in Excel, select the
cells you want to align, then click the "Align Text" button in the "Home" tab
of the ribbon. From there, you can choose to left, center, or right align text
and data.

If you’d like to realign text in a cell to enhance the visual presentation of your
data, here’s how you can do it:

1. Select the cells that have the text you want aligned.
2. On the Home tab choose one of the following alignment options:

3. To vertically align text, pick Top Align, Middle Align, or Bottom Align.
4. To horizontally align text, pick Align Text Left, Center, or Align Text Right.
5. When you have a long line of text, part of the text might not be visible.
To fix this without changing the column width, click Wrap Text.
6. To center text spanning several columns or rows, click Merge & Center.

Undo alignment changes:


• To remove an alignment change immediately after you apply it, click
Undo.
• To make alignment changes later, select the cell or cell range you want
to change, and click Clear > Clear Formats.

Creating Charts and Graphs:


In Excel, a graph or chart is a visual representation of data that makes it
easier to understand and analyse. Graphs and charts are used to display data
in a way that allows you to identify trends and patterns. Excel provides a
wide range of chart types, including column charts, line charts, pie charts, bar
charts, area charts, scatter plots, and more. You can choose the chart type that
best suits your data and your needs. To create a chart in Excel, you first need
to select the data you want to use. Once you have selected the data, you can
choose the chart type and customise the chart to suit your needs. You can
change the chart's title, axis labels, and other properties, and you can also
format the chart to make it more visually appealing. Excel also provides a
range of tools for analysing data in charts, such as trendlines, error bars, and
data labels. These tools can help you identify trends and outliers in your data
and make it easier to communicate your findings to others.

To create a chart or graph in Excel, select the data you want to chart, then
click the "Insert" tab on the ribbon. From there, you can choose the type of
chart or graph you want to create, such as a bar chart, line graph, or pie chart.
You can then use the formatting tools in the "Chart Tools" tab to customise
the appearance of your chart or graph.
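Chart creation can also be scripted. The sketch below again assumes the third-party openpyxl Python library; the product names and profit figures are invented for illustration:

    from openpyxl import Workbook
    from openpyxl.chart import BarChart, Reference

    wb = Workbook()
    ws = wb.active
    ws.append(["Product", "2016", "2017"])    # header row
    ws.append(["Alpha", 120, 150])            # illustrative profit figures
    ws.append(["Beta", 90, 110])

    chart = BarChart()
    chart.title = "Profit by product"
    data = Reference(ws, min_col=2, max_col=3, min_row=1, max_row=3)
    categories = Reference(ws, min_col=1, min_row=2, max_row=3)
    chart.add_data(data, titles_from_data=True)
    chart.set_categories(categories)
    ws.add_chart(chart, "E2")                 # place the chart at cell E2
    wb.save("profit_chart.xlsx")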

Usage of Each Chart and Graph Type in Excel:


Excel offers a large library of chart and graph types to display your data.
While multiple chart types might work for a given data set, you should select
the chart that best fits the story that the data is telling.
In Excel 2016, there are five main categories of charts or graphs:
• Column Charts: Some of the most commonly used charts, column
charts, are best used to compare information or if you have multiple
categories of one variable (for example, multiple products or genres).
Excel offers seven different column chart types: clustered, stacked, 100%
stacked, 3-D clustered, 3-D stacked, 3-D 100% stacked, and 3-D,
pictured below. Pick the visualisation that will best tell your data’s story.

• Bar Charts: The main difference between bar charts and column charts
are that the bars are horizontal instead of vertical. You can often use bar
charts interchangeably with column charts, although some prefer column
charts when working with negative values because it is easier to visualise
negatives vertically, on a y-axis.

• Pie Charts: Use pie charts to compare percentages of a whole (“whole”


is the total of the values in your data). Each value is represented as a
piece of the pie so you can identify the proportions. There are five pie
chart types: pie, pie of pie (this breaks out one piece of the pie into
another pie to show its sub-category proportions), bar of pie, 3-D pie,
and doughnut.


• Line Charts: A line chart is most useful for showing trends over time,
rather than static data points. The lines connect each data point so that
you can see how the value(s) increased or decreased over a period of
time. The seven line chart options are line, stacked line, 100% stacked
line, line with markers, stacked line with markers, 100% stacked line
with markers, and 3-D line.

• Scatter Charts: Similar to line graphs, because they are useful for
showing change in variables over time, scatter charts are used
specifically to show how one variable affects another. (This is called
correlation.) Note that bubble charts, a popular chart type, are categorised
under scatter. There are seven scatter chart options: scatter, scatter with
smooth lines and markers, scatter with smooth lines, scatter with straight
lines and markers, scatter with straight lines, bubble, and 3-D bubble.


There are also four minor categories. These charts are more use case-specific:
• Area: Like line charts, area charts show changes in values over time.
However, because the area beneath each line is solid, area charts are
useful to call attention to the differences in change among multiple
variables. There are six area charts: area, stacked area, 100% stacked
area, 3-D area, 3-D stacked area, and 3-D 100% stacked area.

• Stock: Traditionally used to display the high, low, and closing price of
stock, this type of chart is used in financial analysis and by investors.
However, you can use them for any scenario if you want to display the
range of a value (or the range of its predicted value) and its exact value.
Choose from high-low-close, open-high-low-close, volume-high-low-
close, and volume-open-high-low-close stock chart options.


• Surface: Use a surface chart to represent data across a 3-D landscape. This additional plane makes them ideal for large data sets, those with more than two variables, or those with categories within a single variable. However, surface charts can be difficult to read, so make sure your audience is familiar with them. You can choose from 3-D surface, wireframe 3-D surface, contour, and wireframe contour.

• Radar: When you want to display data from multiple variables in relation to each other, use a radar chart. All variables begin from the central point. The key with radar charts is that you are comparing all individual variables in relation to each other; they are often used for comparing strengths and weaknesses of different products or employees. There are three radar chart types: radar, radar with markers, and filled radar.

How to Chart Data in Excel:
To generate a chart or graph in Excel, you must first provide the program
with the data you want to display. Follow the steps below to learn how to
chart data in Excel 2016.

Step 1: Enter Data into a Worksheet


1. Open Excel and select New Workbook.
2. Enter the data you want to use to create a graph or chart. In this example, we’re comparing the profit of five different products from 2013 to 2017. Be sure to include labels for your columns and rows. Doing so enables you to translate the data into a chart or graph with clear axis labels. The sample data used in this example is shown in the table below.

Step 2: Select Range to Create Chart or Graph from Workbook Data


1. Highlight the cells that contain the data you want to use in your graph by
clicking and dragging your mouse across the cells.
2. Your cell range will now be highlighted in gray and you can select a
chart type.

How to Make a Chart in Excel


After you input your data and select the cell range, you’re ready to choose the
chart type. In this example, we’ll create a clustered column chart from the
data we used in the previous section.

Step 1: Select Chart Type


Once your data is highlighted in the Workbook, click the Insert tab on the top
banner. About halfway across the toolbar is a section with several chart
options. Excel provides Recommended Charts based on popularity, but you
can click any of the dropdown menus to select a different template.

Step 2: Create Your Chart


1. From the Insert tab, click the column chart icon and select Clustered Column.
2. Excel will automatically create a clustered column chart from your selected data. The chart will appear in the center of your workbook.
3. To name your chart, double click the Chart Title text in the chart and type a title. We’ll call this chart “Product Profit 2013 - 2017.”

We’ll use this chart for the rest of the walkthrough.
COLUMN CHART TEMPLATE

PRODUCT 2013 2014 2015 2016 2017


Product A $18,580 $49,225 $16,326 $10,017 $26,134
Product B $78,970 $82,262 $48,640 $48,640 $48,640
Product C $24,236 $131,390 $79,022 $71,009 $81,474
Product D $16,730 $19,730 $12,109 $11,355 $17,686
Product E $35,358 $42,685 $20,893 $16,065 $21,388

There are two tabs on the toolbar that you will use to make adjustments to
your chart: Chart Design and Format. Excel automatically applies design,
layout, and format presets to charts and graphs, but you can add
customisation by exploring the tabs. Next, we’ll walk you through all the
available adjustments in Chart Design.

Step 3: Add Chart Elements


Adding chart elements to your chart or graph will enhance it by clarifying
data or providing additional context. You can select a chart element by
clicking on the Add Chart Element dropdown menu in the top left-hand
corner (beneath the Home tab).

To Display or Hide Axes:
1. Select Axes. Excel will automatically pull the column and row headers
from your selected cell range to display both horizontal and vertical axes
on your chart (Under Axes, there is a check mark next to Primary
Horizontal and Primary Vertical.)

2. Uncheck these options to remove the display axis on your chart. In this
example, clicking Primary Horizontal will remove the year labels on the
horizontal axis of your chart.

3. Click More Axis Options… from the Axes dropdown menu to open a
window with additional formatting and text options such as adding tick
marks, labels, or numbers, or to change text color and size.


To Add Axis Titles:

1. Click Add Chart Element and click Axis Titles from the dropdown menu.
Excel will not automatically add axis titles to your chart; therefore,
both Primary Horizontal and Primary Vertical will be unchecked.

2. To create axis titles, click Primary Horizontal or Primary Vertical and a text box will appear on the chart. We clicked both in this example. Type your axis titles. In this example, we added the titles “Year” (horizontal) and “Profit” (vertical).


To Remove or Move Chart Title:


• Click Add Chart Element and click Chart Title. You will see four
options: None, Above Chart, Centered Overlay, and More Title Options.

• Click None to remove chart title.


• Click Above Chart to place the title above the chart. If you create a chart
title, Excel will automatically place it above the chart.
• Click Centered Overlay to place the title within the gridlines of the chart.
Be careful with this option: you don’t want the title to cover any of your
data or clutter your graph (as in the example below).

To Add Data Labels:
1. Click Add Chart Element and click Data Labels. There are six options for data labels: None (default), Center, Inside End, Inside Base, Outside End, and More Data Label Options.

2. The four placement options will add specific labels to each data point
measured in your chart. Click the option you want. This customisation
can be helpful if you have a small amount of precise data, or if you have
a lot of extra space in your chart. For a clustered column chart, however,
adding data labels will likely look too cluttered. For example, here is
what selecting Center data labels looks like:

To Add a Data Table:


1. Click Add Chart Element and click Data Table. There are three pre-
formatted options along with an extended menu that can be found by
clicking More Data Table Options:

• None is the default setting, where the data table is not duplicated within
the chart.
• With Legend Keys displays the data table beneath the chart to show the
data range. The color-coded legend will also be included.

• No Legend Keys also displays the data table beneath the chart, but
without the legend.

Note: If you choose to include a data table, you’ll probably want to make your chart larger to accommodate the table. Simply click the corner of your chart and use drag-and-drop to resize your chart.

To Add Error Bars:


1. Click Add Chart Element and click Error Bars. In addition to More Error Bars Options, there are four options: None (default), Standard Error, 5% (Percentage), and Standard Deviation. Adding error bars provides a visual representation of the potential error in the shown data, based on different standard equations for isolating error.

2. For example, when we click Standard Error from the options we get a
chart that looks like the image below.
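For context on what the Standard Error option represents: the standard error of a set of values is commonly computed as the sample standard deviation divided by the square root of the number of observations. The sketch below shows that calculation in Python, using Product A's yearly profits from the sample table; the exact pooling Excel applies across a whole series may differ.

```python
import math
import statistics

# Product A's yearly profits from the sample column chart data
values = [18580, 49225, 16326, 10017, 26134]

std_dev = statistics.stdev(values)           # sample standard deviation
std_err = std_dev / math.sqrt(len(values))   # standard error of the mean

print(f"Standard deviation: {std_dev:,.2f}")
print(f"Standard error:     {std_err:,.2f}")
```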

To Add Gridlines:
1. Click Add Chart Element and click Gridlines. In addition to More Grid
Line Options, there are four options: Primary Major Horizontal, Primary
Major Vertical, Primary Minor Horizontal, and Primary Minor Vertical.
For a column chart, Excel will add Primary Major Horizontal gridlines
by default.

2. You can select as many different gridlines as you want by clicking the
options. For example, here is what our chart looks like when we click all
four gridline options.

To Add a Legend:
1. Click Add Chart Element and click Legend. In addition to More Legend
Options, there are five options for legend placement: None, Right, Top,
Left, and Bottom.

2. Legend placement will depend on the style and format of your chart. Check the option that looks best on your chart. Here is our chart when we click the Right legend placement.

To Add Lines: Lines are not available for clustered column charts. However,
in other chart types where you only compare two variables, you can add lines
(e.g. target, average, reference, etc.) to your chart by checking the appropriate
option.

To Add a Trendline:
1. Click Add Chart Element and click Trendline. In addition to More
Trendline Options, there are five options: None (default), Linear,
Exponential, Linear Forecast, and Moving Average. Check the
appropriate option for your data set. In this example, we will
click Linear.

2. Because we are comparing five different products over time, Excel
creates a trendline for each individual product. To create a linear
trendline for Product A, click Product A and click the blue OK button.

3. The chart will now display a dotted trendline to represent the linear
progression of Product A. Note that Excel has also added Linear
(Product A) to the legend.

4. To display the trendline equation on your chart, double click the trendline. A Format Trendline window will open on the right side of your screen. Click the box next to Display equation on chart at the bottom of the window. The equation now appears on your chart.


Note: You can create separate trendlines for as many variables in your chart
as you like. For example, here is our chart with trendlines for Product A and
Product C.
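The equation Excel displays for a linear trendline is an ordinary least-squares fit of the form y = mx + b. If you want to cross-check the slope and intercept outside Excel, here is a small sketch using NumPy (assumed to be installed) with Product A's yearly profits:

```python
import numpy as np

# Product A profit by year, from the sample column chart data
years = np.array([2013, 2014, 2015, 2016, 2017])
profit = np.array([18580, 49225, 16326, 10017, 26134])

# A degree-1 polynomial fit is the linear (least-squares) trendline
slope, intercept = np.polyfit(years, profit, 1)

print(f"Trendline: y = {slope:.2f}x + {intercept:.2f}")
```

Note that Excel may parameterise x differently (for example, using 1 to 5 for the five categories instead of the year numbers), so compare the fitted line rather than the raw coefficients.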

To Add Up/Down Bars: Up/Down Bars are not available for a column chart,
but you can use them in a line chart to show increases and decreases among
data points.
Step 4: Adjust Quick Layout
1. The second dropdown menu on the toolbar is Quick Layout, which
allows you to quickly change the layout of elements in your chart (titles,
legend, clusters etc.).

2. There are 11 quick layout options. Hover your cursor over the different
options for an explanation and click the one you want to apply.

Step 5: Change Colors


The next dropdown menu in the toolbar is Change Colors. Click the icon and
choose the color palette that fits your needs (these needs could be aesthetic,
or to match your brand’s colors and style).


Step 6: Change Style


For clustered column charts, there are 14 chart styles available. Excel will default to Style 1, but you can select any of the other styles to change the chart appearance. Use the arrow on the right of the image bar to view other options.

Step 7: Switch Row/Column


1. Click the Switch Row/Column on the toolbar to flip the axes. Note: It is
not always intuitive to flip axes for every chart, for example, if you have
more than two variables.

In this example, switching the row and column swaps the product and year
(profit remains on the y-axis). The chart is now clustered by product (not
year), and the color-coded legend refers to the year (not product). To avoid
confusion here, click on the legend and change the titles from Series to Years.


Step 8: Select Data


1. Click the Select Data icon on the toolbar to change the range of your
data.

2. A window will open. Type the cell range you want and click
the OK button. The chart will automatically update to reflect this new
data range.

Step 9: Change Chart Type
1. Click the Change Chart Type dropdown menu.

2. Here you can change your chart type to any of the nine chart categories
that Excel offers. Of course, make sure that your data is appropriate for
the chart type you choose.

3. You can also save your chart as a template by clicking Save as Template…
4. A dialogue box will open where you can name your template. Excel will
automatically create a folder for your templates for easy organisation.
Click the blue Save button.

Step 10: Move Chart


1. Click the Move Chart icon on the far right of the toolbar.

2. A dialogue box appears where you can choose where to place your chart.
You can either create a new sheet with this chart (New sheet) or place
this chart as an object in another sheet (Object in). Click the blue OK button.

Step 11: Change Formatting


1. The Format tab allows you to change formatting of all elements and text
in the chart, including colors, size, shape, fill, and alignment, and the
ability to insert shapes. Click the Format tab and use the shortcuts
available to create a chart that reflects your organisation’s brand (colors,
images, etc.).

2. Click the dropdown menu on the top left side of the toolbar and click the
chart element you are editing.

Step 12: Delete a Chart


To delete a chart, simply click on it and press the Delete key on your keyboard.

By using these basic techniques, you can improve the design and formatting
of your spreadsheets and make them easier to read and understand.

6.4 PIVOT TABLES AND DATA ANALYSIS


A pivot table is a data summarisation tool used in spreadsheet programs like
Microsoft Excel or Google Sheets. It allows users to reorganise and
summarise large amounts of data in a flexible and dynamic way, making it
easier to understand and analyse. Pivot tables work by allowing users to
group and aggregate data based on specific criteria, such as dates, categories,
or numerical values. The resulting table displays a summary of the data, with the ability to drill down into the details as needed.

Data analysis, on the other hand, is the process of examining and interpreting
data to identify patterns, trends, and insights. It is used to gain a deeper
understanding of data and to make informed decisions based on the results.
Pivot tables are a powerful tool for data analysis, as they allow users to
quickly and easily summarise and manipulate large amounts of data. By
using pivot tables in combination with other data analysis techniques, such as
charts, graphs, and statistical analysis, users can gain valuable insights into
their data and make informed decisions based on the results.

Consider the following table of sales data. From this data, you might have to
summarise total sales region wise, month wise, or salesperson wise. The easy
way to handle these tasks is to create a PivotTable that you can dynamically
modify to summarise the results the way you want.
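The steps that follow use Excel's PivotTable feature. As a point of comparison, the same kind of region-wise or month-wise summary can be produced with the pandas library; the column names and records in this sketch (Region, Salesperson, Month, Order Amount) are assumptions chosen to match the sales table described above.

```python
import pandas as pd

# A few illustrative rows of the sales data described above
sales = pd.DataFrame({
    "Region":       ["East", "East", "West", "West", "South"],
    "Salesperson":  ["Asha", "Ravi", "Ravi", "Meera", "Asha"],
    "Month":        ["Jan", "Jan", "Feb", "Feb", "Mar"],
    "Order Amount": [12000, 8500, 15200, 9100, 11300],
})

# Region-wise total sales, with months as columns (like the Excel PivotTable)
pivot = sales.pivot_table(
    values="Order Amount",
    index="Region",
    columns="Month",
    aggfunc="sum",
    fill_value=0,
)
print(pivot)
```

Changing the index to "Salesperson" would give the salesperson-wise summary mentioned above, just as dragging a different field into the Rows area would in Excel.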

Creating PivotTable
To create PivotTables, ensure the first row has headers.
• Click the table.
• Click the INSERT tab on the Ribbon.
• Click PivotTable in the Tables group. The PivotTable dialog box
appears.


As you can see in the dialog box, you can use either a Table or Range from
the current workbook or use an external data source.
• In the Table / Range Box, type the table name.
• Click New Worksheet to tell Excel where to keep the PivotTable.
• Click OK.

A Blank PivotTable and a PivotTable fields list appear.


Recommended PivotTables
In case you are new to PivotTables or you do not know which fields to select
from the data, you can use the Recommended PivotTables that Excel
provides.
• Click the data table.
• Click the INSERT tab.
• Click on Recommended PivotTables in the Tables group. The
Recommended PivotTables dialog box appears.

In the Recommended PivotTables dialog box, the possible customised PivotTables that suit your data are displayed.
• Click each of the PivotTable options to see the preview on the right side.
• Click the PivotTable Sum of Order Amount by Salesperson and month.


Click OK. The selected PivotTable appears on a new worksheet. You can observe the PivotTable fields that were selected in the PivotTable Fields list.

PivotTable Fields
The headers in your data table will appear as the fields in the PivotTable.

You can select / deselect them to instantly change your PivotTable to display
only the information you want and in a way that you want. For example, if
you want to display the account information instead of order amount
information, deselect Order Amount and select Account.


PivotTable Areas
You can even change the Layout of your PivotTable instantly. You can use
the PivotTable Areas to accomplish this.

In PivotTable areas, you can choose −


• What fields to display as rows
• What fields to display as columns
• How to summarise your data
• Filters for any of the fields
• When to update your PivotTable Layout
o You can update it instantly as you drag the fields across areas, or
o You can defer the update and get it updated only when you click on
UPDATE

An instant update helps you to play around with the different Layouts and
pick the one that suits your report requirement.
You can just drag the fields across these areas and observe the PivotTable
layout as you do it.


Nesting in the PivotTable


If you have more than one field in any of the areas, then nesting happens in
the order you place the fields in that area. You can change the order by
dragging the fields and observe how nesting changes. In the above layout
options, you can observe that
• Months are in columns.
• Region and Salesperson are in rows, in that order, i.e. Salesperson values are nested under Region values.
• Summarising is by Sum of Order Amount.
• No filters are chosen.

The resulting PivotTable is as follows −

In the PivotTable Areas, in rows, click region and drag it below salesperson
such that it looks as follows −


The nesting order changes and the resulting PivotTable is as follows −

Note − You can clearly observe that the layout with the nesting order – Region and then Salesperson – yields a better and more compact report than the one with the nesting order – Salesperson and then Region. In case a Salesperson covers more than one Region and you need to summarise the sales by Salesperson, then the second layout would have been a better option.

Filters
You can assign a Filter to one of the fields so that you can dynamically
change the PivotTable based on the values of that field.

Drag Region from Rows to Filters in the PivotTable Areas.


The filter with the label Region appears above the PivotTable. (In case you do not have empty rows above your PivotTable, the PivotTable gets pushed down to make space for the Filter.)

You can see that −


• Salesperson values appear in rows.
• Month values appear in columns.
• Region Filter appears on the top with default selected as ALL.
• Summarising value is Sum of Order Amount
o Sum of Order Amount Salesperson-wise appears in the column Grand
Total
o Sum of Order Amount Month-wise appears in the row Grand Total
Click the arrow in the box to the right of the filter region. A drop-down list
with the values of the field region appears.


• Check the option Select Multiple Items. Check boxes appear for all the
values.
• Select South and West and deselect the other values and click OK.

The data pertaining to South and West Regions only will be summarised as
shown in the screen shot given below −

You can see that next to the Filter Region, Multiple Items is displayed,
indicating that you have selected more than one item. However, how many
items and / or which items are selected is not known from the report that is
displayed. In such a case, using Slicers is a better option for filtering.

Slicers
You can use Slicers to have better clarity on which items the data was filtered.
• Click ANALYSE under PIVOTTABLE TOOLS on the Ribbon.
• Click Insert Slicer in the Filter group. The Insert Slicers box appears. It
contains all the fields from your data.
• Select the fields Region and month. Click OK.

Slicers for each of the selected fields appear with all the values selected by
default. Slicer Tools appear on the Ribbon to work on the Slicer settings,
look and feel.

• Select South and West in the Slicer for Region.


• Select February and March in the Slicer for month.
• Keep Ctrl key pressed while selecting multiple values in a Slicer.
Selected items in the Slicers are highlighted. PivotTable with summarised
values for the selected items will be displayed.

Summarising Values by other Calculations
In the examples so far, you have seen summarising values by Sum. However,
you can use other calculations also if necessary.
In the PivotTable Fields List
• Select the Field Account.
• Unselect the Field Order Amount.

• Drag the field Account to the Summarising Values area. By default, Sum of Account will be displayed.
• Click the arrow on the right side of the box.
• In the drop-down that appears, click Value Field Settings.

The Value Field Settings box appears. Several types of calculations appear as
a list under Summarise value field by −
• Select Count in the list.
• The Custom Name automatically changes to Count of Account. Click
OK.


The PivotTable summarises the Account values by Count.
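In the pandas comparison shown earlier, the equivalent of switching Value Field Settings from Sum to Count is simply a different aggregation function. A minimal, self-contained sketch with made-up data:

```python
import pandas as pd

# Same hypothetical sales layout as the earlier sketch, trimmed to a few rows
sales = pd.DataFrame({
    "Region":       ["East", "East", "West"],
    "Salesperson":  ["Asha", "Ravi", "Ravi"],
    "Account":      ["A101", "A102", "A103"],
    "Order Amount": [12000, 8500, 15200],
})

# aggfunc="count" plays the role of Excel's "Count of Account"
counts = sales.pivot_table(values="Account", index="Region", aggfunc="count")
print(counts)
```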

PivotTable Tools
Follow the steps given below to learn to use the PivotTable Tools.
• Select the PivotTable.
The following PivotTable Tools appear on the Ribbon −
• ANALYSE
• DESIGN

Some of the ANALYZE Ribbon commands are −
• Set PivotTable Options
• Value Field Settings for the selected Field
• Expand Field
• Collapse Field
• Insert Slicer
• Insert Timeline
• Refresh Data
• Change Data Source
• Move PivotTable
• Solve Order (If there are more calculations)
• PivotChart
Some of the DESIGN Ribbon commands are −
• PivotTable Layout
o Options for Sub Totals
o Options for Grand Totals
o Report Layout Forms
o Options for Blank Rows
• PivotTable Style Options
• PivotTable Styles

6.5 SUMMARY
Spreadsheet analysis refers to the process of using electronic spreadsheets,
such as Microsoft Excel or Google Sheets, to organise, manipulate, and
analyse data. It involves creating formulas and functions to perform
calculations and automate tasks, formatting data to make it more readable,
and using charts and graphs to visualise and communicate insights.
Spreadsheet analysis can be used for a wide range of applications, from
budgeting and financial analysis to inventory management and project
tracking. By leveraging the power of spreadsheets, analysts can save time,
reduce errors, and gain valuable insights from their data.
Spreadsheet analysis can be used in a variety of contexts, such as finance and
accounting, sales and marketing, human resources, project management, and
more. Some common use cases include budgeting, forecasting, financial
statement analysis, inventory management, and data analysis. To get the most
out of spreadsheet analysis, it's important to follow best practices such as
organising data in a logical and consistent manner, using descriptive labels
and formulas, keeping formulas simple and transparent, and testing and
validating calculations to ensure accuracy.

6.6 SELF-ASSESSMENT EXERCISES
Caselet:
You have been tasked with creating a budget spreadsheet for your household
expenses. Your goal is to create a spreadsheet that can track your income and
expenses, calculate your monthly savings, and provide a summary of your
spending by category.

Questions:
• Create a new Excel spreadsheet and label the first row with the column headings “Month”, “Income”, and “Expenses”.
• Under the "Income" column, list your sources of income for the month
(such as salary, freelance work, or rental income).
• Under the "Expenses" column, list your expenses for the month,
including categories such as housing, food, transportation, and
entertainment.
• Use Excel's SUM function to calculate the total income and expenses for
the month.
• Create a formula to calculate your monthly savings by subtracting your
total expenses from your total income.
• Use conditional formatting to highlight any cells that are over budget or
below a certain threshold (such as a minimum savings amount).
• Create a pie chart or bar chart to visualise your spending by category.
• Use data validation to create a drop-down list of categories for your
expenses.
• Save the spreadsheet and update it each month to track your progress and
make adjustments as needed.

6.7 FURTHER READINGS


1. "Excel 2019 Bible" by Michael Alexander and Richard Kusleika.
2. "Excel Formulas and Functions For Dummies" by Ken Bluttman and
Peter G. Aitken
3. "Power Pivot and Power BI: The Excel User's Guide to DAX, Power
Query, Power BI & Power Pivot in Excel 2010-2016" by Rob Collie and
Avi Singh.
4. Online tutorials –
• Excel Easy (https://www.excel-easy.com/)
• Exceljet (https://exceljet.net/)
• Excel Campus (https://www.excelcampus.com/).


BLOCK 3
RELATIONAL DATABASE
MANAGEMENT SYSTEM (RDBMS)

UNIT 7 ORGANIZING DATA

Objectives
After studying this unit, you will be able to:
• Define types of data
• Describe the processing of data
• Demonstrate and visualize data using graphs, and
• Interpret data for decision making

Structure
7.0 Introduction
7.1 Types of Data
7.1.1 Quantitative Data
7.1.2 Qualitative Data
7.1.3 Nominal Data
7.1.4 Ordinal Data
7.1.5 Interval Data
7.1.6 Discrete Data
7.1.7 Continuous Data
7.2 Data Processing
7.2.1 Coding of Data
7.2.2 Data Presentation for Clearer Reference
7.3 Summary
7.4 Keywords
7.5 Exercises

7.0 INTRODUCTION
In this unit we deal with data and its types. Data organization is the way to
arrange raw data in an understandable order. Graphical representation,
classification and arrangement of data are part of organizing data. Data
organization helps in reading data, and we can work easily on organized data.
Organized data helps determine the cause of problems in the organization. Data is
knowledge in today’s world; good data helps in making informed decisions in the
organization. Data makes what is happening visible, and organized data increases
efficiency.
Organizing data is essential for making informed decisions in an organization.
By arranging and classifying data in a logical and understandable way, it becomes
easier to analyze and interpret. Graphical representation of data also helps in
understanding complex patterns and trends, and can be an effective way to
communicate findings to others. Additionally, having organized data can increase
efficiency by reducing the time and effort needed to find and extract relevant
information.
The aim of this unit is to teach you about the basics of data analysis and visualization, including understanding the nature of data, processing raw data
for presentation in graphical form, and classifying data to make informed
decisions in an organization. Understanding the nature of data is important in
order to properly interpret and analyze it. This includes understanding the
different types of data (e.g. numerical, categorical, ordinal etc.) and the
various measures used to describe and summarize it (e.g. mean, median,
mode, standard deviation etc.).
Processing raw data involves transforming it into a format that is more easily
analyzed and visualized, such as using spreadsheet software to organize and
prepare summary statistics. Graphical presentation of data is an effective way
to visually communicate patterns and relationships within the data, using
charts such as histograms, box and whisker plots, funnel charts to name a
few.

Classifying data involves grouping it into categories or classes based on relevant criteria, in order to better understand and analyze it. Overall, understanding the basics of data analysis and visualization is crucial for making informed decisions in an organization, as it allows you to properly interpret and communicate data to support business goals and objectives.

7.1 TYPES OF DATA


Data collection requires the identification and processing of correct data. Accurate, processed data is used in the research process. For accurate decision making in an organization, a well-defined plan for data processing is needed.
Data collection is a crucial step in the research process and involves the
identification and processing of relevant data. Once data is collected, it needs
to be processed and analyzed to derive meaningful insights. The accuracy of
data is important for reliable research results and informed decision-making
in an organization.
Data processing involves several steps, including identification, coding,
comparison, and preparation of charts, among others. Proper data processing
ensures that data is organized, structured, and analyzed in a meaningful way.
Data can come in various forms and formats, such as numerical, text, audio,
and visual data. The use of appropriate formats improves data management
and facilitates data reuse. Additionally, the use of standard formats and data
models can enable data interoperability, allowing for seamless integration of
data from different sources.

Data processing involves identification, coding, comparison, and the preparation of charts. As data is available in various forms and formats, choosing the right format improves data management and the reuse of data.

7.1.1 Quantitative Data


Quantitative data can be expressed as a number or can be quantified, it can be
measured by numerical variables. Height in feet and age in years are the
examples of quantitative data. Quantitative data is used when a researcher
needs to quantify a problem and answer questions like “what” and “how many”. For example, how many customers bought a certain item while shopping, or how much horsepower a car has.

Quantitative data is anything that can be counted in definite units and numbers. Quantitative data is made up of numerical values and has numerical properties, and can easily undergo numerical operations like addition and subtraction. The nature of quantitative data means that its validity can be verified and evaluated using statistical techniques.

Quantitative data can be measured and expressed numerically, and it is used to answer questions related to quantities and amounts. It can be analyzed and manipulated using statistical techniques, which allows for the verification and evaluation of its validity. Examples of quantitative data include counts, measurements, and ratings, used to answer questions such as “How many customers visited the store today?” or “What is the average age of employees in the company?”

One of the advantages of quantitative data is that it can be analyzed and manipulated using mathematical techniques, such as statistical analysis or mathematical modeling. This allows for the verification and evaluation of the validity of the data, and can help identify patterns, trends, or relationships between variables.
Quantitative data can be collected through various methods, including
surveys, experiments, or direct observations. Once collected, it can be
organized and summarized using descriptive statistics, such as measures of
central tendency (e.g. mean, median, mode) or measures of variability (e.g.
standard deviation, range).
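As a quick illustration of those summary measures, the snippet below computes them for a small made-up set of ages using Python's built-in statistics module:

```python
import statistics

ages = [23, 27, 27, 31, 35, 40]   # illustrative quantitative data

print("Mean:              ", statistics.mean(ages))
print("Median:            ", statistics.median(ages))
print("Mode:              ", statistics.mode(ages))
print("Standard deviation:", round(statistics.stdev(ages), 2))
print("Range:             ", max(ages) - min(ages))
```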

Overall, quantitative data is an important type of data in many fields, including science, economics, and business, and is used to support evidence-based decision making and problem-solving.

7.1.2 Qualitative Data


Qualitative data can’t be expressed as a number and can’t be measured.
Qualitative data consist of words, pictures, and symbols but not numbers.
Qualitative data is also called categorical data because the information can be
sorted by category. Your favorite holiday destination (such as Shimla or Mumbai) and ethnicity (such as American Indian or Asian) are examples of qualitative data.

It is easy to understand the difference between qualitative and quantitative data. Qualitative data does not include numbers in its definition of traits, whereas quantitative data is all about numbers. For example, “the cake is orange, blue, and black in color” and “females have brown, black, blonde, and red hair” describe qualitative traits.
Qualitative data is important in determining the particular traits or
characteristics. It allows the statistician or the researchers to form parameters
through which larger data sets can be observed. Qualitative data provides the
means by which observers can enumerate the world around them.
7.1.3 Nominal Data
Nominal data is used just for labeling variables, without any type of
quantitative value. The name ‘nominal’ comes from the Latin word “nomen”
which means ‘name’. Nominal data just names a thing without applying any order; in fact, nominal data could simply be called “labels.” Gender (Female, Male), hair color (Blonde, Brown, Brunette, Red, etc.), and marital status (Married, Single, Widowed) are examples of nominal data. Eye color is a nominal variable having a few categories (Blue, Green, Brown) and there is no way to order these categories from highest to lowest.
Nominal data simply provides labels or categories for the variables being
measured. For example, eye color (blue, green, brown) is a nominal variable
because there is no inherent order or value associated with each category.

7.1.4 Ordinal Data


Ordinal data shows where a number is in order. This is the crucial difference
from nominal types of data. Ordinal data is data which is placed into some
kind of order by their position on a scale. Ordinal data may indicate
superiority. However, you cannot do major arithmetic with ordinal numbers
because they only show sequence.
Examples are the first, second and third position in a competition; letter grades A, B, C, etc.; a customer rating the sales experience on a scale of 1-10; and economic status: low, medium and high. Ordinal variables are considered as “in between” qualitative and quantitative variables. In other words, ordinal data is qualitative data for which the values are ordered.

7.1.5 Interval Data


Interval data is a type of quantitative data that has numerical values that are
spaced at uniform intervals. This means that the difference between two
adjacent values on the scale is equal, and each value represents an equal
interval of the underlying attribute being measured.
Interval data is commonly used to measure things like temperature, time, or
calendar dates, where there is a fixed interval between any two values on the
scale. For example, a temperature scale that goes from 0 to 100 degrees
Celsius is an interval scale, where the difference between 0 and 10 degrees is
the same as the difference between 50 and 60 degrees.
Unlike nominal or ordinal data, interval data can be added, subtracted, and
averaged, and can be analyzed using a wide range of statistical techniques,
including measures of central tendency (mean, median, mode) and measures
of dispersion (standard deviation, variance). However, it's important to note
that interval data does not have a true zero point, meaning that ratios between
values cannot be calculated (e.g. 20 degrees Celsius is not twice as hot as 10
degrees Celsius).

Overall, interval data is an important type of quantitative data in many fields, and is used to support a wide range of analyses and decision-making processes.
7.1.6 Discrete Data
Discrete data is a count that involves only integers. The discrete values
cannot be subdivided into parts. For example, the number of children in a
class is discrete data. You can count whole individuals. You can’t count 1.5
kids. To put in other words, discrete data can take only certain values. The
data variables cannot be divided into smaller parts. It has a limited number of
possible values e.g. days of the month.
Examples of discrete data are the number of students in a class, the number of
workers in a company, the number of home runs in a baseball game, the
number of test questions you answered correctly.

7.1.7 Continuous Data


Continuous data is information that could be meaningfully divided into finer
levels. It can be measured on a scale or continuum and can have almost any
numeric value. For example, you can measure your height at very precise
scales — meters, centimeters, millimeters etc.

You can record continuous data at so many different measurements – width, temperature, time, etc. This is where the key difference from discrete types of data lies. The continuous variables can take any value between two numbers. For example, between 50 and 72 inches, there are literally millions of possible heights: 52.04762 inches, 69.948376 inches etc.
A good rule of thumb for deciding whether data is continuous or discrete is that if the point of measurement can be reduced in half and still make sense, the data is continuous.
Examples of continuous data are the amount of time required to complete a
project, the height of children, the square footage of a two-bedroom house,
the speed of cars.

7.2 DATA PROCESSING


Once the researcher/data analyst has defined the types of data to be collected, the focus shifts to processing the data for decision making. In this section we discuss the different phases of data processing. For data analysis and processing, the researcher must follow a precise blueprint made well in advance. Data processing refers to the filtering and cleaning of raw data, and it makes the data suitable for the organization’s subsequent decision-making processes.

In data processing, the researcher/data analyst performs editing, coding, tabulation and classification, and cleans the data. Data processing makes editing of data simple and points the researcher in the right direction for data entry. Whenever the researcher finds missing entries in the collected data, filling those entries may not be feasible, and inconsistencies have to be dealt with. Hence data processing becomes very important.

7.2.1 Coding of Data
Coding of data refers to the process of transforming collected data or
observations to a set of meaningful, cohesive categories. It is a process of
summarizing and re-presenting data in order to provide a systematic account
of the recorded or observed phenomenon. Coding is the analytic task of
assigning codes to non-numeric data. The data that is obtained from surveys,
experiments or secondary sources are in raw form. This data needs to be
refined and organized to evaluate and draw conclusions.

Data coding is the process of driving codes from the observed data. In
research the data is either obtained from observations, interviews or from
questionnaires. The purpose of data coding is to bring out the essence and
meaning of the data that respondents have provided. The data coder extracts
preliminary codes from the observed data, the preliminary codes are further
filtered and refined to obtain more accurate precise and concise codes.

Later, in the evaluation of data, the researcher/data analyst assigns values, percentages or other numerical quantities to these codes to draw inferences. It should be kept in mind that the purpose of data coding is not just to eliminate excessive data but also to summarize it meaningfully. Sometimes the interviewer or the observer writes down some codes as he/she observes the behavior of the respondent. Such codes are really valuable in data analysis because they cannot be derived from the written responses that the respondents provide.

Table 7.1: Coding Example

Variable Information Options Code


Gender Male 1
Female 2
Age <25 1
26-35 2
36-45 3
>45 4
Qualification HSS 1
Diploma 2
Undergraduate 3
Post Graduate 4
Income (Monthly) <25000 1
25000 – 50000 2
50000 – 100000 3
>100000 4
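A coding scheme like Table 7.1 is often applied programmatically before analysis. The sketch below shows one possible way to do it with pandas; the column names and the example records are illustrative assumptions, and the age bin edges only approximate the ranges in Table 7.1.

```python
import pandas as pd

# Raw survey responses (illustrative)
df = pd.DataFrame({
    "Gender": ["Male", "Female", "Female"],
    "Age":    [22, 38, 51],
})

# Apply codes in the spirit of Table 7.1
df["Gender_code"] = df["Gender"].map({"Male": 1, "Female": 2})
df["Age_code"] = pd.cut(df["Age"],
                        bins=[0, 25, 35, 45, 150],   # approximate Table 7.1 ranges
                        labels=[1, 2, 3, 4])
print(df)
```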

After the coding of data, the next step involves visualization of the data
which is a complex process.

7.2.2 Data Presentation for Clearer Reference
Data without a definite presentation will be burdensome. Presentation of data helps the researcher make the study meaningful. In this sub-section we study the types of data presentation.

Broadly, there are three methods of data presentation. Data presentation is an important aspect of research as it allows researchers to convey their findings effectively.

Textual Presentation: This involves presenting data in the form of written words, sentences, or paragraphs. This method is commonly used when the amount of data is small or when the researcher wants to highlight specific details or findings. However, textual presentation can be difficult to interpret for some readers, and it may not be the best method for presenting large amounts of data.

Tabular Presentation: This method involves organizing data in a table, with each row representing a case or observation, and each column representing a variable. Tabular presentation is useful when dealing with large amounts of data, as it allows the researcher to compare and contrast different variables easily. Additionally, tabular presentation can be easily analyzed using statistical software.
Graphical Presentation: This method involves presenting data in the form of
graphs or charts. Graphical presentation is useful when the researcher wants
to show patterns, trends, or relationships between variables. Graphical
presentation can make it easier for readers to understand and interpret the
data, especially when dealing with complex data sets. Common types of
graphs include bar graphs, line graphs, scatter plots, pie charts etc.
Textual presentation can be useful for presenting detailed information or for
describing complex findings in a comprehensive manner. However, the main
disadvantage of textual presentation is that it can be time-consuming and
burdensome for readers to extract the key information from the text. Even
with abstracts, summaries, and conclusions, readers may still need to read
through the entire text to fully understand the research findings. Additionally,
textual presentation may not be suitable for presenting large amounts of data
or for highlighting relationships between variables. In such cases, tabular or
graphical presentation may be more appropriate.

To avoid the complexities involved in the textual way of data presentation, people use tables and charts to present data. In this method, data is presented in rows and columns - just like the scorecard of a cricket match showing who made how many runs. Each row and column has an attribute (name, year, sex, age, and the like).

Graphical presentation has been divided into further categories:

• Column Chart
A column chart is a type of chart that is commonly used to display data that is
arranged in columns or rows on a worksheet. In a column chart, categories
are typically displayed along the horizontal (category) axis, while values are displayed along the vertical (value) axis. Each column in the chart represents
a different category, and the height of each column represents the value
associated with that category.

Column charts are useful for displaying data that can be divided into discrete
categories, such as sales by month, or the number of students in different
grade levels. They are also effective at showing changes in data over time,
such as changes in revenue from year to year. Additionally, column charts
can be easily customized to display different colors, labels, and formatting
options, making them a flexible and versatile tool for data visualization.

Fig 7.1: Column chart

• Histogram
In a histogram, the class intervals are marked on the X-axis and the frequencies (for example, the number of students) are marked on the Y-axis. The data is presented in the form of bars. A histogram is a graphical representation of a
frequency distribution. It consists of a series of vertical bars, or bins, that
represent the frequency of occurrence of a range of values. The width of each
bin is determined by the range of values being analyzed, and the height of
each bin represents the frequency of occurrence of values within that range.

Histograms are commonly used to analyze the shape of a distribution, identify outliers or unusual values, and determine whether the data is skewed or normally distributed. By changing the bin width, you can adjust the level of detail in the histogram and potentially reveal patterns or trends that may not be immediately apparent at a coarser resolution.

Overall, histograms are a useful tool for exploring and visualizing data, and
can provide valuable insights into the underlying patterns and trends within a
distribution.
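A minimal sketch of plotting a histogram with matplotlib (assumed to be installed), using made-up marks data; the bin count is an arbitrary choice that you would tune to your own data:

```python
import matplotlib.pyplot as plt

# Illustrative marks scored by 20 students
marks = [12, 15, 18, 22, 25, 27, 28, 31, 33, 35,
         38, 41, 44, 47, 52, 55, 58, 61, 67, 72]

plt.hist(marks, bins=6, edgecolor="black")  # 6 bins of equal width
plt.xlabel("Marks")
plt.ylabel("Number of students")
plt.title("Histogram of student marks")
plt.show()
```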


Fig 7.2: Histogram

• Frequency Polygon
When you join the midpoints of the upper sides of the rectangles in a histogram, you get a frequency polygon.

Fig 7.3: Frequency Polygon

• Line Chart
A line chart is another type of chart commonly used to display data that is
arranged in columns or rows on a worksheet. In a line chart, the horizontal
axis represents evenly spaced categories, such as time periods, and the
vertical axis represents values. Each value is connected by a line, creating a
continuous visual representation of the data over time. Line charts are useful for displaying trends in data over time or for comparing multiple sets of data. They are effective at showing changes in data over time,
as well as identifying patterns or cycles in the data. Line charts can also be
used to compare multiple data sets, with each set represented by a different
line. Additionally, line charts can be easily customized with colors, labels,
and other formatting options to enhance their visual impact.

Fig 7.4: Line chart

• Pie Chart
The pie chart is a type of chart that is commonly used to display data that is
arranged in one column or row on a worksheet. In a pie chart, the data points
are represented as slices of a circle, with each slice representing a proportion
of the whole. The size of each slice is proportional to the value of the data
point it represents, and the total size of the pie chart represents the sum of all
the values in the data series.
Pie charts are useful for displaying data that can be divided into categories or
parts, such as market share, budget allocations, or survey responses. They are
effective at showing the relative sizes of different categories and can be
easily customized with colours, labels, and other formatting options to
enhance their visual impact. However, pie charts can be difficult to read
when there are too many categories or when the data points are very similar
in size, making it hard to distinguish between them. In these cases, other
types of charts, such as bar charts or stacked column charts, may be more
suitable.

Fig 7.5: Pie Chart


Pie and 3-D pie: Pie charts show the contribution of each value to a total in a
2-D or 3-D format. You can pull out slices of a pie chart manually to
emphasize the slices.

Pie of pie and bar of pie: Pie of pie or bar of pie charts show pie charts with
smaller values pulled out into a secondary pie or stacked bar chart, which
makes them easier to distinguish.

• Doughnut Chart
A doughnut chart is similar to a pie chart in that it also shows the relationship
of parts to a whole, but it can contain more than one data series. In a
doughnut chart, data points are represented as slices of a doughnut-shaped
circle, with each slice representing a proportion of the whole. The size of
each slice is proportional to the value of the data point it represents, and the
total size of the doughnut chart represents the sum of all the values in the data
series.
Doughnut charts are useful for displaying data that can be divided into
categories or parts, and can be effective at showing the relative sizes of
different categories or parts. They are also useful for comparing multiple data
series, with each series represented by a different doughnut slice. Doughnut
charts can be easily customized with colors, labels, and other formatting
options to enhance their visual impact. However, like pie charts, doughnut
charts can be difficult to read when there are too many categories or when the
data points are very similar in size.

• Bar Chart
A bar chart is a type of chart commonly used to display data that is arranged
in columns or rows on a worksheet. In a bar chart, categories are displayed
along the horizontal (category) axis, while values are displayed along the
vertical (value) axis. Each bar in the chart represents a different category, and
the length of each bar represents the value associated with that category.

Bar charts are useful for illustrating comparisons among individual items,
such as sales by product or revenue by department. They are also effective at
showing changes in data over time, such as changes in market share from
year to year. Additionally, bar charts can be easily customized with colors,
labels, and other formatting options to enhance their visual impact.

Fig 7.6: Bar Chart


• Area Chart
An area chart is a type of chart commonly used to display data that is
arranged in columns or rows on a worksheet. In an area chart, the data is
plotted as a series of points, which are then connected by a line. The area
between the line and the horizontal axis is then filled with color, creating a
shaded region that shows the trend in the data.

Area charts are useful for plotting change over time, and for drawing
attention to the total value across a trend. By showing the sum of the plotted
values, an area chart also shows the relationship of parts to a whole.
Additionally, area charts can be easily customized with colors, labels, and
other formatting options to enhance their visual impact. However, like other
charts, they can be difficult to read when there are too many data points, or
when the data points are very similar in size.

Fig 7.7: Area Chart

• Bubble Chart
Bubble charts are useful for displaying data that has three dimensions, such
as sales revenue, profit, and market share for different products. They can be
effective at showing patterns or relationships among different sets of data
points, and they can be customized with colors, labels, and other formatting
options to enhance their visual impact. However, like other charts, bubble
charts can be difficult to read when there are too many data points, or when
the data points are very similar in size.

Fig 7.8: Bubble Chart


• Radar Chart
A radar chart, also known as a spider chart or a web chart, is a type of chart
that is used to compare the aggregate values of several data series. In a radar
chart, each data series is plotted as a line or a shape that connects a set of data
points arranged in a circular pattern around a central point.

Radar charts are useful for displaying data that has multiple variables, and for
showing the relative strengths or weaknesses of different data series. They
can be effective at highlighting patterns or relationships among the data, and
they can be customized with colors, labels, and other formatting options to
enhance their visual impact.

However, radar charts can be difficult to read when there are too many
variables, or when the data points are very similar in value. In addition,
interpreting the data in a radar chart can be challenging for people who are
not familiar with this type of chart, so it is important to use them
appropriately and provide clear explanations of the data being presented.

Fig 7.9: Radar Chart

• Box and Whisker Plot


A box and whisker plot, also known as a box plot, is a graphical
representation of a dataset that shows the distribution of the data into
quartiles, highlighting the median (middle value), and outliers. The box in the
plot represents the interquartile range (IQR), which is the range between the
first quartile (Q1) and the third quartile (Q3). The median is represented by a
line inside the box.

The "whiskers" in the plot extend vertically from the top and bottom of the
box, indicating the range of values that lie within 1.5 times the IQR above the
third quartile and below the first quartile. Any point outside those whiskers is
considered an outlier. Box and whisker plots are useful when comparing
multiple datasets, as they allow you to quickly compare the medians, ranges,
and variability of the data. They can also help identify potential outliers in the
data. However, they are less useful for visualizing the shape of the
distribution compared to other chart types, such as histograms or density
plots.

Overall, box and whisker plots are a valuable tool for data analysis and visualization, particularly when comparing multiple datasets.

Fig 7.10: Box and Whisker Plot
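To make the quartile and whisker logic concrete, here is a small Python sketch that computes Q1, Q3, the IQR and the 1.5 times IQR whisker limits for an illustrative data set (a plotting library such as matplotlib can draw the box plot itself with plt.boxplot):

```python
import statistics

# Illustrative data set
data = [7, 9, 12, 13, 14, 15, 16, 18, 21, 35]

# statistics.quantiles requires Python 3.8+; Q2 is the median
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

lower_whisker_limit = q1 - 1.5 * iqr
upper_whisker_limit = q3 + 1.5 * iqr

print(f"Q1={q1}, median={q2}, Q3={q3}, IQR={iqr}")
print(f"Values outside [{lower_whisker_limit}, {upper_whisker_limit}] are outliers")
```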

• Funnel Chart
Funnel charts are a type of chart used to represent values across multiple
stages in a process, such as a sales or marketing funnel. The bars in a funnel
chart are arranged in decreasing order, with the first bar being the largest and
the subsequent bars becoming progressively smaller. This creates a funnel-
like shape, with the largest section at the top and the smallest section at the
bottom.

However, funnel charts do not necessarily have to decrease gradually. The bars can decrease in size in a stepped or irregular pattern, depending on the specific data being represented. Funnel charts are useful for identifying areas of a process where there may be drop-offs or bottlenecks, as well as for highlighting which stages are the most important or impactful.
Overall, funnel charts are a valuable tool for visualizing data across multiple
stages in a process, and can help identify areas where improvements or
changes may be needed.

Fig 7.11: Funnel Chart

Check Your Progress:
1) What is Bar Graph? Create a Bar Graph for the following data

Category of OTT Series Number of People Preferred


Comedy 6
Action 4
Romance 2
Drama 1
2) List the various types of graphs you can use for presentation of data?
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
3) Students got each grade in the recent test: Draw pie chart

A B C D

4 12 10 2

7.3 SUMMARY
In this unit, we discussed the nature of quantitative and qualitative data, the
various methods of representing the quantified data graphically. The main
points are as follows:
1) Data collected by the researcher are raw in nature; they require cleaning,
classification and editing before they can support the decision-making process.
2) Data are classified into categories such as Quantitative, Qualitative,
Nominal, Ordinal, Interval, Discrete and Continuous data.
3) Data is represented broadly in three ways: Textual, Tabular and
Graphical.
4) Data coding helps the researcher to prepare tables and graphs.
5) Qualitative data consist of detailed descriptions of situations, events,
people, interactions, and observed behaviors. These data are also
available in the form of direct quotations from people about their
experiences, attitudes, beliefs and thoughts.

7.4 KEYWORDS
Quantitative Data: Quantitative data is anything that can be counted in
definite units and numbers. Quantitative data is made up of numerical values
and has numerical properties, and can easily undergo math operations like addition and subtraction.
Qualitative Data: Qualitative data can’t be expressed as a number and can’t
be measured. Qualitative data consist of words, pictures, and symbols, not
numbers. Qualitative data is also called categorical data because the
information can be sorted by category, not by number.

Nominal Data: Nominal data is used just for labeling variables, without any
type of quantitative value.

Ordinal Data: Ordinal data shows where a number is in order. This is the
crucial difference from nominal types of data.

Continuous Data: Continuous data is information that could be
meaningfully divided into finer levels. It can be measured on a scale or
continuum and can have almost any numeric value.

Data Coding: Coding of data refers to the process of transforming collected
information or observations to a set of meaningful, cohesive categories.
Graphical Presentation: This method involves presenting data in the form
of graphs or charts. Graphical presentation is useful when the researcher
wants to show patterns, trends, or relationships between variables.
Pie Chart: It is useful for displaying data that can be divided into categories
or parts, such as market share, budget allocations, or survey responses. They
are effective at showing the relative sizes of different categories, and can be
easily customized with colors, labels, and other formatting options to enhance
their visual impact.

Funnel Charts: These are a type of chart used to represent values across
multiple stages in a process, such as a sales or marketing funnel. The bars in a
funnel chart are arranged in decreasing order, with the first bar being the
largest and the subsequent bars becoming progressively smaller.

7.5 EXERCISE
1) Explain the types of data. How does quantitative data differ from qualitative
data?
2) Create a coding table which contains Gender, Age, Qualification, and
Income
3) Prepare a bar chart for the following data on mathematics skill levels:

   Basic Skill    Beginner Skill    Intermediate Skill    Advanced Skill
   8              6                 10                    4

4) Create a Pie Chart for Gender ratio

Male Female
55 45
5) The frequency polygon of a frequency distribution is shown below.
   Answer the following about the distribution from the frequency polygon.

   i) What is the frequency of the class interval whose class mark is 15?
   ii) What is the class interval whose class mark is 45?

UNIT 8 STRUCTURED QUERY LANGUAGE (SQL)

Objectives
After studying this unit, you will be able to:
• Explain relational database language.
• Create, modify, delete, and update the database using SQL.
• Understand Database management through queries and subqueries.
• Know how to control database access.

Structure
8.0 Introduction
8.0.1 Background
8.1 Data Definition Language (DDL)
8.2 Interactive Data Manipulation Language (DML)
8.3 View Definition
8.4 Transaction Control
8.5 Summary
8.6 Keywords
8.7 Self-Assessment Exercises
8.8 Further Readings

8.0 INTRODUCTION
SQL (Structured Query Language) is a widely used query language for
relational database management systems (RDBMS). It allows users to
manipulate and retrieve data from databases using various operations such as
select, insert, update, and delete. SQL provides a user-friendly interface for
querying databases and retrieving data in a structured and organized way.

SQL is based on relational algebra and calculus constructs, which make it
easy to learn and use for beginners. It provides a powerful set of features for
manipulating and querying data, such as aggregation functions, grouping,
sorting, and filtering. SQL can also be used for creating and modifying
database structures such as tables, indexes, and constraints.

However, SQL has some limitations. It is not as
powerful as a universal Turing machine and cannot perform all types of
computations that a general-purpose programming language can. SQL also
lacks some features such as user input/output and network communication,
which are typically provided by host programming languages such as C,
C++, or Java.

SQL (Structured Query Language) is a programming language that is widely
used for managing and manipulating data in relational databases. It allows
users to interact with databases to perform a wide range of tasks, including
querying and retrieving data, adding, modifying and deleting data, and
managing database structures and relationships.

SQL is an essential tool for working with relational databases, which are
organized into tables with rows and columns. It is used by database
administrators, analysts, and developers to manage and analyze data in a wide
range of industries, including finance, healthcare, retail, and more.

One of the main advantages of SQL is its simplicity and user-friendliness.


The language is relatively easy to learn and understand, and its syntax is
straightforward and consistent across different database platforms. This
makes it a popular choice for both beginners and experienced users alike.

Another advantage of SQL is its scalability and flexibility. SQL-based
databases can handle large volumes of data and are capable of handling
complex queries and transactions. SQL also allows for the integration of
different data sources and platforms, making it a valuable tool for managing
data across multiple systems and applications. Overall, SQL is an essential
tool for managing data effectively in relational databases, and its popularity
and versatility have made it a staple of modern data management and
analysis.

8.0.1 Background
IBM developed the original version of SQL as part of the System R project in
the early 1970s. The language was initially called Sequel, but its name was
later changed to SQL. Since then, SQL has become the standard language for
managing relational databases, and it is widely supported by various database
systems and products.
In 1986, ANSI and ISO published the first SQL standard, called SQL-86.
This standard established a common set of syntax and semantics for SQL,
ensuring that SQL implementations from different vendors could interoperate
with each other. IBM also published its own corporate SQL standard, the
SAA-SQL, in 1987.
In 1989, ANSI published an extended standard for SQL, called SQL-89,
which introduced additional features such as outer joins and null values. The
next version of the standard was SQL-92, which introduced more features
such as support for referential integrity constraints and triggers.

The next major revision, SQL:1999, introduced features such as object-relational
extensions. Newer versions of the standard have since been released, including
SQL:2003 (which added support for XML data), SQL:2006, SQL:2008, SQL:2011, and
SQL:2016.
The SQL standardization process is a critical aspect of the language's
development and evolution. It helps to ensure that SQL remains a stable and
reliable language for managing relational databases, and that SQL
implementations from different vendors are interoperable and can work
together seamlessly. The SQL standardization process is overseen by the
International Organization for Standardization (ISO) and the American
National Standards Institute (ANSI). These organizations establish and
maintain a set of standards for SQL that specify the syntax, semantics, and
functionality of the language.

SQL standardization is a collaborative effort involving database vendors,
developers, and other stakeholders in the industry. The process involves
proposing and reviewing changes to the SQL standard, and ensuring that
these changes are implemented consistently across different database
platforms.

One of the main benefits of SQL standardization is that it helps to promote
consistency and compatibility across different database platforms. This
makes it easier for developers and database administrators to work with
different systems and applications, and reduces the risk of compatibility
issues and errors. Overall, the SQL standardization process is an essential
aspect of the language's development and evolution, and helps to ensure its
continued relevance and usefulness in managing and analyzing data in
relational databases.

8.1 DATA DEFINITION LANGUAGE (DDL)


The SQL Data Definition Language (DDL) provides a set of commands for
defining and managing the schema objects in a relational database. The
schema objects include tables, indexes, views, sequences, and other database
objects. The DDL commands allow the database administrator (DBA) to
create, alter, and delete these objects as needed. Here are some common DDL
commands used in SQL:
CREATE: This command is used to create new schema objects, such as
tables, views, indexes, and sequences.

ALTER: This command is used to modify the structure of existing schema
objects, such as adding or dropping columns from a table, or modifying the
definition of an index.

DROP: This command is used to delete an existing schema object, such as a
table or view.

TRUNCATE: This command is used to delete all the rows from a table while
keeping its structure intact.

RENAME: This command is used to rename an existing schema object, such
as a table or view.
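As an illustration, the following sketch applies these commands to a purely
hypothetical table named project (the table, its columns, and the new name
project_master are illustrative only; RENAME syntax in particular varies between
database products, some of which use ALTER TABLE ... RENAME TO instead):

-- Create a new table
CREATE TABLE project (
    project_id   INT PRIMARY KEY,
    project_name VARCHAR(50) NOT NULL,
    budget       NUMERIC(12, 2)
);

-- Alter the table: add a column, then drop it again
ALTER TABLE project ADD start_date DATE;
ALTER TABLE project DROP COLUMN start_date;

-- Remove all rows while keeping the table structure
TRUNCATE TABLE project;

-- Rename the table (product-specific syntax)
RENAME TABLE project TO project_master;

-- Delete the table and its definition entirely
DROP TABLE project_master;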

In addition to these basic DDL commands, there are other commands that are
used for managing the security and integrity of the database, such as granting
and revoking privileges, setting constraints, and defining triggers.
The DDL commands in SQL are essential for managing the schema objects
in a relational database, and are a critical tool for database administrators and
developers in maintaining the integrity and security of the data.

When a DDL command is executed, any changes made to the database
schema are automatically committed and become permanent. This means that
DDL commands are typically used with caution, as they can have a
significant impact on the structure and content of the database.

The SQL language provides three main categories of commands: Data Definition
Language (DDL), Data Manipulation Language (DML), and Data Control
Language (DCL).

As mentioned earlier, DDL commands are used for defining and
managing the schema objects in a relational database. In contrast, DML
commands are used for manipulating the data stored in the database.

DML commands include:

SELECT: This command is used to retrieve data from one or more tables in
the database.

INSERT: This command is used to add new rows of data to a table in the
database.

UPDATE: This command is used to modify existing rows of data in a table in
the database.

DELETE: This command is used to remove rows of data from a table in the
database.
DCL commands are used for controlling access to the database objects. These
commands include:
GRANT: This command is used to give specific privileges to a user or group
of users on a database object.

REVOKE: This command is used to remove specific privileges from a user
or group of users on a database object.

Together, the DDL, DML, and DCL commands provide a comprehensive set
of tools for managing and administering relational databases. Database
administrators and developers use these commands to create and modify
database objects, manipulate data stored in the database, and control access to
the database objects to maintain the integrity and security of the data.
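A minimal sketch of these commands, assuming a hypothetical customer table
with columns customer_id, customer_name and customer_city, and a hypothetical
user account named report_user:

-- DML: retrieve, add, modify and remove rows
SELECT customer_name, customer_city
FROM customer
WHERE customer_city = 'Delhi';

INSERT INTO customer (customer_id, customer_name, customer_city)
VALUES (101, 'Asha Rao', 'Mumbai');

UPDATE customer
SET customer_city = 'Pune'
WHERE customer_id = 101;

DELETE FROM customer
WHERE customer_id = 101;

-- DCL: grant and later revoke read access on the table
GRANT SELECT ON customer TO report_user;
REVOKE SELECT ON customer FROM report_user;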

The SQL DDL allows specification of not only a set of relations, but also
information about each relation, including

The Schema for each Relation: the schema for each relation in a relational
database defines the structure of the data that is stored in the table. It specifies
the names of the columns or attributes, the data types of each column, and
any constraints or rules that govern the data in the table.

For example, consider a simple relational database that stores information
about employees. The schema for the "employees" table might include the
following columns:
Employee ID: a unique identifier for each employee (data type: integer)
First Name: the employee's first name (data type: string)
Last Name: the employee's last name (data type: string)
Date of Birth: the employee's date of birth (data type: date)
Department ID: the ID of the department the employee works in (data type: integer)

The schema for this table defines the structure of the data that can be stored
in the table. It specifies that the table has five columns, and it defines the data
types of each column. It also specifies that the Employee ID column must
contain unique values, and that the Department ID column must contain
integer values that correspond to the ID of a department in the "departments"
table.

The schema for each relation is an important component of the database
design process. It provides a blueprint for the database that guides the
development of the database structure and the creation of tables and columns.
It also helps ensure that the data in the database is accurate, consistent, and
well-organized.
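Translated into SQL, the schema described above might look like the following
sketch (the table and column names mirror the description; the exact data types
are indicative only):

CREATE TABLE departments (
    department_id   INT PRIMARY KEY,
    department_name VARCHAR(30)
);

CREATE TABLE employees (
    employee_id   INT PRIMARY KEY,      -- unique identifier for each employee
    first_name    VARCHAR(25),
    last_name     VARCHAR(25),
    date_of_birth DATE,
    department_id INT REFERENCES departments (department_id)  -- must match a department
);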
The Domain of Values Associated with each Attribute: The domain of
values associated with each attribute in a relational database defines the set of
allowable values that can be stored in the attribute. For example, consider the
"employees" table in a relational database. The "Date of Birth" attribute
might have a domain of values that includes all valid dates, such as
"01/01/1950", "06/15/1985", "12/31/2000", and so on. The "Department ID"
attribute might have a domain of values that includes all positive integers,
such as 1, 2, 3, and so on.
The domain of values for each attribute is an important consideration in
database design, as it helps ensure that the data stored in the database is
accurate and consistent. By limiting the range of allowable values for an
attribute, the domain helps prevent data entry errors and ensures that the data
is of a consistent type and format.
In addition to specifying the domain of values for each attribute, the database
designer may also define constraints or rules that further restrict the allowable
values for an attribute. For example, a constraint might specify that the
"Employee ID" attribute must be unique for each employee, or that the
"Salary" attribute must be a positive number. These constraints help ensure
the integrity and accuracy of the data stored in the database.
The Integrity Constraints: Integrity constraints are rules that are used to
ensure the accuracy, consistency, and validity of the data stored in a
relational database. These constraints help maintain the integrity of the data
by preventing the insertion of invalid or inconsistent data into the database.
There are several types of integrity constraints:
Primary Key Constraint: A primary key is a unique identifier for each row
in a table. It ensures that each row in a table is unique and can be identified
by a single value.
Foreign Key Constraint: A foreign key is a reference to a primary key in
another table. It ensures that the data in one table is consistent with the data
in another table.
Unique Constraint: A unique constraint ensures that each value in a column
is unique.

Check Constraint: A check constraint is used to restrict the values that can
be inserted into a column. It ensures that only valid data is inserted into the
column.

Not Null Constraint: A not null constraint ensures that a column cannot
have a null value.

By enforcing these constraints, a database can maintain the integrity of the
data and prevent inconsistencies or errors. For example, a primary key
constraint can prevent duplicate rows from being inserted into a table, while a
foreign key constraint can ensure that a record in one table refers to a valid
record in another table.
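The following hedged sketch shows how these five constraint types might be
declared on a pair of hypothetical tables (all names and the CHECK condition are
illustrative):

CREATE TABLE supplier (
    supplier_id   INT PRIMARY KEY,              -- primary key constraint
    supplier_code VARCHAR(10) UNIQUE,           -- unique constraint
    supplier_name VARCHAR(40) NOT NULL          -- not null constraint
);

CREATE TABLE purchase_order (
    order_no    INT PRIMARY KEY,
    supplier_id INT NOT NULL,
    order_value NUMERIC(10, 2),
    CONSTRAINT ck_order_value CHECK (order_value >= 0),     -- check constraint
    CONSTRAINT fk_order_supplier FOREIGN KEY (supplier_id)
        REFERENCES supplier (supplier_id)                    -- foreign key constraint
);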

The Set of Indices to be Maintained for each Relation: Indexes are data
structures that are used to speed up the retrieval of data from a database. They
work by creating a copy of a subset of the data in a table and organizing it in
a way that makes it easier to search for specific values.

The set of indices to be maintained for each relation will depend on the
queries that are commonly run against the table. For example, if a table is
frequently queried using a certain column, it may be beneficial to create an
index on that column to speed up the search. Some common types of indexes
include:

Primary Key Index: This is an index that is created on the primary key
column of a table. It is used to enforce the primary key constraint and to
speed up searches that use the primary key.

Unique Index: This is an index that is created on a column that has a unique
constraint. It is used to enforce the unique constraint and to speed up searches
that use the unique column.

Clustered Index: This is an index that organizes the table data physically
based on the values in the indexed column. This can speed up queries that
access data in the order defined by the index.
Non-clustered Index: This is an index that creates a separate data structure
to organize the index data. It is used to speed up queries that search for
specific values in a non-indexed column.
The decision on which indexes to create for a particular table should be made
carefully, as creating too many indexes can slow down data modification
operations such as inserts, updates, and deletes.
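For instance, assuming the hypothetical employees table sketched earlier (and an
illustrative email column), ordinary and unique indexes could be created as
follows; clustered indexes are usually declared with product-specific syntax,
such as CREATE CLUSTERED INDEX in SQL Server:

-- Non-clustered index to speed up frequent searches on department_id
CREATE INDEX idx_emp_dept ON employees (department_id);

-- Unique index that also enforces uniqueness of the indexed column
CREATE UNIQUE INDEX idx_emp_email ON employees (email);

-- Remove an index that is no longer useful (syntax varies slightly by product)
DROP INDEX idx_emp_dept;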
The Security and Authorization Information for each Relation: Security
and authorization information for each relation is an important consideration
in database design. It involves determining who has access to the data in each
relation and what type of access they have.

Some common security and authorization measures that can be implemented
in a relational database include:
User Authentication: This involves verifying the identity of users who are
attempting to access the database. Users are typically required to provide a
username and password to gain access.

Access Control: This involves specifying which users or groups of users
have access to each relation in the database. Access control can be set at the
database level or at the level of individual relations.

Data Encryption: This involves encrypting sensitive data in the database to
protect it from unauthorized access. Encryption can be used to protect data at
rest (stored in the database) and data in transit (being transmitted over a
network).

Auditing: This involves logging access to the database and monitoring user
activity to detect any suspicious behavior or security breaches.

Backup and Recovery: This involves creating regular backups of the
database to protect against data loss due to hardware failure, natural disasters,
or other unexpected events.

Implementing these security and authorization measures requires careful
planning and consideration of the specific requirements of the database and
the organization using it. It is important to strike a balance between security
and usability to ensure that authorized users can access the data they need
while keeping sensitive information protected from unauthorized access.

The Physical Storage Structure of Each Relation on Disk: The physical
storage structure of each relation on disk is an important consideration in
database design. It involves deciding how data is stored on the physical
storage media, such as hard disks or solid-state drives. The physical storage
structure determines how data is organized, accessed, and maintained.

There are several factors that can influence the choice of physical storage
structure for a relation, including:

Access Patterns: The way data is accessed by queries can affect the choice
of physical storage structure. For example, if a relation is frequently accessed
using range queries, it may be more efficient to store the data in a sorted
order.

Storage Medium: The type of storage medium being used can affect the
choice of physical storage structure. For example, solid-state drives have
faster random-access times than hard disks, so they may be better suited for
storing data in a hash-based index.

Size of Relation: The size of the relation can also influence the choice of
physical storage structure. For small relations, a simple linear file may be
sufficient, whereas for larger relations, a more complex structure such as a B-
tree or hash table may be required.

Security Requirements: Security requirements may also impact the choice
of physical storage structure. For example, if sensitive data needs to be
protected, encryption may be used, which can affect the choice of storage
structure.
Some common physical storage structures for relations include:

Heap file Organization: This is the simplest storage structure, where data is
stored in an unordered list. It is useful for small relations or for append-only
workloads.

Sorted file Organization: In this structure, data is stored in a sorted order
according to a particular attribute. It is useful for range queries and joins.

Hash-based File Organization: In this structure, data is organized using a
hash function, which allows for efficient lookup of individual records. It is
useful for equality-based queries.

B-tree File Organization: This structure organizes data in a tree-like
structure, allowing for efficient range queries and updates.
The choice of physical storage structure can have a significant impact on the
performance and efficiency of a database system. It is important to carefully
consider the requirements of the database and the characteristics of the data
being stored when making this decision.
The SQL standard supports a variety of built-in domain/data types,
including:
• Char(n): A fixed-length character string with user-specified length n.
The full form, character, can be used instead.
• Varchar(n): A variable-length character string with user-specified
maximum length n. The full form, character varying, is equivalent.
• Int: An integer (a finite subset of the integers that is machine dependent).
The full form, integer, is equivalent.
• Smallint: A small integer (a machine-dependent subset of the integer
domain type).
• Numeric (p, d): A fixed-point number with user-specified precision. The
number consists of p digits (plus a sign), and d of the p digits are to the
right of the decimal point. Thus, numeric (3,1) allows 44.5 to be stored
exactly, but neither 444.5 nor 0.32 can be stored exactly in a field of this
type.
• Real, Double Precision: Floating-point and double-precision floating-
point numbers with machine-dependent precision.
• Float(n): A floating-point number, with precision of at least n digits.
• Date: A calendar date containing a (four-digit) year, month, and day of
the month.

Let us discuss these commands in more detail:

CREATE TABLE Command


Syntax:
CREATE TABLE <table name> (
Column_name1 data type (column width) [constraints],
Column_name2 data type (column width) [constraints],
Column_name3 data type (column width) [constraints],
………………………………………..
);

Where table name assigns the name of the table, column name defines the
name of the field, data type specifies the data type for the field and column
width allocates a specified size to the field.

Guidelines for Creation of Table:


• Table name should start with an alphabet.
• In table name, blank spaces and single quotes are not allowed.
• Reserve words of that DBMS cannot be used as table name.
• Proper data types and size should be specified.
• Unique column name should be specified.

Column Constraints: NOT NULL, UNIQUE, PRIMARY KEY, CHECK,
DEFAULT, REFERENCES
On Delete Cascade: Using this option whenever a parent row is deleted in a
referenced table then all the corresponding child rows are deleted from the
referencing table. This constraint is a form of referential integrity constraint.
Primary Key (Aj1,Aj2, . . .,Ajm): The primary key specification says that
attributes Aj1,Aj2, . . .,Ajm form the primary key for the relation. The primary
key attributes are required to be non-null and unique; that is, no tuple can
have a null value for a primary key attribute, and no two tuples in the relation
can be equal on all the primary-key attributes.

Example 1:
create table account
(account-number char(10),
branch-name char(15),
balance integer,
primary key (account-number),
check (balance >= 0))

Example 2:
create table branch
(branch-name char(15),
branch-city char(30),
assets integer,
primary key (branch-name),
check (assets >= 0))
CREATE TABLE Worker (
WORKER_ID INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
FIRST_NAME CHAR(25),
LAST_NAME CHAR(25),
SALARY INT(15),
JOINING_DATE DATETIME,
DEPARTMENT CHAR(25)
);

INSERT INTO Worker
(WORKER_ID, FIRST_NAME, LAST_NAME, SALARY, JOINING_DATE, DEPARTMENT) VALUES
(001, 'Monika', 'Arora', 100000, '14-02-20 09.00.00', 'HR'),
(002, 'Niharika', 'Verma', 80000, '14-06-11 09.00.00', 'Admin'),
(003, 'Vishal', 'Singhal', 300000, '14-02-20 09.00.00', 'HR'),
(004, 'Amitabh', 'Singh', 500000, '14-02-20 09.00.00', 'Admin'),
(005, 'Vivek', 'Bhati', 500000, '14-06-11 09.00.00', 'Admin'),
(006, 'Vipul', 'Diwan', 200000, '14-06-11 09.00.00', 'Account'),
(007, 'Satish', 'Kumar', 75000, '14-01-20 09.00.00', 'Account'),
(008, 'Geetika', 'Chauhan', 90000, '14-04-11 09.00.00', 'Admin');

To remove a relation from an SQL database, we use the drop table
command. The drop table command deletes all information about the dropped
relation from the database. The command drop table r is a more drastic action
than delete from r. The latter retains relation r, but deletes all tuples in r. The
former deletes not only all tuples of r, but also the schema for r. After r is
dropped, no tuples can be inserted into r unless it is re-created with the create
table command.
We use the alter table command to add attributes to an existing relation. All
tuples in the relation are assigned null as the value for the new attribute. The
form of the alter table command is
alter table r add A D
where r is the name of an existing relation, A is the name of the attribute to be
added, and D is the domain of the added attribute. We can drop attributes
from a relation by the command
alter table r drop A
where r is the name of an existing relation, and A is the name of an attribute
of the relation. Many database systems do not support dropping of attributes,
although they will allow an entire table to be dropped.
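For instance, on the Worker table created earlier, these commands might be
written as follows (the EMAIL column is purely illustrative):

-- Add a new attribute; existing rows get null for it
ALTER TABLE Worker ADD EMAIL CHAR(50);

-- Drop the attribute again; many systems require the COLUMN keyword shown
-- here, while the generic form above omits it
ALTER TABLE Worker DROP COLUMN EMAIL;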

8.2 INTERACTIVE DATA MANIPULATION LANGUAGE (DML)
A data-manipulation language (DML) is a language that enables users to
access or manipulate data as organized by the appropriate data model. There
are basically two types:

• Procedural DMLs require a user to specify what data are needed and
how to get those data.
• Declarative DMLs (also referred to as nonprocedural DMLs) require a
user to specify what data are needed without specifying how to get those
data.
Data manipulation is
• The retrieval of information stored in the database
• The insertion of new information into the database
• The deletion of information from the database
• The modification of information stored in the database

Declarative DMLs are usually easier to learn and use than are procedural
DMLs. However, since a user does not have to specify how to get the data,
the database system has to figure out an efficient means of accessing data.
The DML component of the SQL language is nonprocedural.
A query is a statement requesting the retrieval of information. The portion of
a DML that involves information retrieval is called a query language.
Although technically incorrect, it is common practice to use the terms query
language and data manipulation language synonymously.

This query in the SQL language finds the name of the customer whose
customer-id is 192-83-7465:
select customer.customer-name
from customer
where customer.customer-id = '192-83-7465'
The query specifies that those rows from the table customer where the
customer-id is 192-83-7465 must be retrieved, and the customer-name
attribute of these rows must be displayed.
Queries may involve information from more than one table. For instance, the
following query finds the balance of all accounts owned by the customer with
customer-id 192-83-7465.
select account.balance
from depositor, account
where depositor.customer-id = '192-83-7465' and
depositor.account-number = account.account-number
Order by clause
• It is used in the last portion of select statement
• By using this, rows can be sorted
• By default, it takes ascending order
• DESC: is used for sorting in descending order
• Sorting by column which is not in select list is possible
• Sorting by column Alias
Example:
SELECT EMPNO, ENAME, SAL*12 "ANNUAL"
FROM EMP
ORDER BY ANNUAL;

Check Your Progress:


1) What is SQL? What are the advantages of SQL?
2) Prepare a table - Bonus
   WORKER_REF_ID   BONUS_DATE            BONUS_AMOUNT
   1               2016-02-20 00:00:00   5000
   2               2016-06-11 00:00:00   3000
   3               2016-02-20 00:00:00   4000
   1               2016-02-20 00:00:00   4500
   2               2016-06-11 00:00:00   3500

3) Write an SQL query to fetch "FIRST_NAME" from the Worker table using
the alias name <WORKER_NAME>.

8.3 VIEW DEFINITION


A view is like a window through which data from tables can be viewed or
changed. The table on which a view is based is called the base table. The view is
stored as a SELECT statement in the data dictionary. When the user wants to
retrieve data using a view, the following steps are followed:
1) The view definition is retrieved from the data dictionary. For example, the
view definitions in Oracle are retrieved from the data dictionary view
USER_VIEWS.
2) Checks access privileges for the view base table.
3) Converts the view query into an equivalent operation on the underlying
base table
Advantages:
• It restricts data access.
• It makes complex queries look easy.
• It allows data independence.
• It presents different views of the same data.
Type of Views: Simple views and Complex Views
   Feature                    Simple Views    Complex Views
   Number of tables           One             One or more
   Contain functions          No              Yes
   Contain groups of data     No              Yes
   Data manipulation          Is allowed      Not always
Example: Create a view named employee salary having minimum, maximum
and average salary for each department.

CREATE VIEW EMPSAL (NAME, MINSAL, MAXSAL, AVGSAL) AS
SELECT D.DNAME, MIN(E.SAL),MAX(E.SAL),AVG(E.SAL)
FROM EMP E, DEPT D
WHERE E.DEPTNO=D.DEPTNO
GROUP BY D.DNAME;
To see the result of the command above you can give the following
command:
SELECT * FROM EMPSAL;
You may get some sample output like:
NAME           MINSAL    MAXSAL    AVGSAL
-------------- --------- --------- -------------
ACCOUNTING     1300      5000      2916.6667
RESEARCH       800       3000      2175
SALES          950       2850      1566.6667
To see the structure of the view so created, you can give the following
command:
DESCRIBE EMPSAL;
Name            Null?        Type
--------------- ------------ ---------------------
NAME                         VARCHAR2(14)
MINSAL                       NUMBER
MAXSAL                       NUMBER
AVGSAL                       NUMBER
Creating Views with Check Option: This option restricts those updates of
data values that cause records to go off the view. The following example
explains this in more detail:
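A hedged illustration, using the EMP table referred to above (the view name,
the department number and the employee number shown are illustrative):

CREATE VIEW EMPSALES AS
    SELECT EMPNO, ENAME, SAL, DEPTNO
    FROM EMP
    WHERE DEPTNO = 30
    WITH CHECK OPTION;

-- The update below would move the row out of the view (its DEPTNO would no
-- longer be 30), so the WITH CHECK OPTION clause causes it to be rejected:
UPDATE EMPSALES SET DEPTNO = 10 WHERE EMPNO = 7499;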

Sequences:
• automatically generate unique numbers
• are sharable
• are typically used to create a primary key value
• replace application code
• speed up the efficiency of accessing sequence values when they are cached
in memory.
Example: Create a sequence named SEQSS that starts at 105, has a step of 1
and can take maximum value as 2000.
CREATE SEQUENCE SEQSS
START WITH 105
INCREMENT BY 1
MAXVALUE 2000;
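Once created, the sequence can supply values; a minimal Oracle-style sketch
(the department_master table is hypothetical, and other database systems use
functions such as NEXT VALUE FOR instead of NEXTVAL):

-- Use the sequence to generate a primary key value while inserting a row
INSERT INTO department_master (dept_id, dept_name)
VALUES (SEQSS.NEXTVAL, 'LOGISTICS');

-- Inspect the value most recently generated in this session
SELECT SEQSS.CURRVAL FROM DUAL;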

8.4 TRANSACTION CONTROL


SQL provides commands for managing transactions in a relational database.
A transaction is a sequence of SQL statements that are treated as a single unit
of work. Transactions can include both read and write operations on the
database.

SQL specifies that a transaction begins implicitly when an SQL statement is
executed. The transaction remains active until it is explicitly ended with a
COMMIT or ROLLBACK statement. The COMMIT statement makes the
updates performed by the transaction become permanent in the database.
After the transaction is committed, a new transaction is automatically started.
The syntax for the COMMIT statement is:

COMMIT [WORK]

The optional keyword WORK is included for compatibility with some older
versions of SQL, but it is not necessary in modern SQL implementations.

When a COMMIT statement is executed, all changes made by the transaction
are permanently saved to the database, and the transaction is ended. If the
transaction was successful, the changes are made permanent. If the
transaction was not successful, the changes are rolled back and the database
is restored to its previous state.
In addition to COMMIT, SQL also provides the ROLLBACK statement for
ending a transaction and undoing any changes made during the transaction.

Rollback Work: Causes the current transaction to be rolled back; that is, it
undoes all the updates performed by the SQL statements in the transaction.
Thus, the database state is restored to what it was before the first statement of
the transaction was executed. The keyword work is optional in both the
statements. Transaction rollback is useful if some error condition is detected
during execution of a transaction. Commit is similar, in a sense, to saving
changes to a document that is being edited, while rollback is similar to
quitting the edit session without saving changes. Once a transaction has
executed commit work, its effects can no longer be undone by rollback
work.

The database system guarantees that in the event of some failure, such as an
error in one of the SQL statements, a power outage, or a system crash, a
transaction’s effects will be rolled back if it has not yet executed commit
work. In the case of power outage or other system crash, the rollback occurs
when the system restarts. For instance, consider a banking application, where
we need to transfer money from one bank account to another in the same
bank.

To do so, we need to update two account balances, subtracting the amount
transferred from one, and adding it to the other. If the system crashes after
subtracting the amount from the first account, but before adding it to the
second account, the bank balances would be inconsistent. A similar problem
would occur, if the second account is credited before subtracting the amount
from the first account, and the system crashes just after crediting the amount.
As another example, consider our running example of a university
application. We assume that the attribute tot cred of each tuple in the student
relation is kept up-to-date by modifying it whenever the student successfully
completes a course.

To do so, whenever the takes relation is updated to record successful
completion of a course by a student (by assigning an appropriate grade) the
corresponding student tuple must also be updated. If the application
performing these two updates crashes after one update is performed, but
before the second one is performed, the data in the database would be
inconsistent. By either committing the actions of a transaction after all its
steps are completed, or rolling back all its actions in case the transaction
could not complete all its actions successfully, the database provides an
abstraction of a transaction as being atomic, that is, indivisible. Either all the
effects of the transaction are reflected in the database, or none are (after
rollback).
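The money-transfer scenario above can be expressed as a single transaction; a
minimal sketch, assuming an account table with balance and account-number
columns as in the earlier examples (the account numbers and amount are
illustrative, and the column name is written here as account_number for
portability):

-- Transfer 1000 from account A-101 to account A-215 as one unit of work
UPDATE account SET balance = balance - 1000 WHERE account_number = 'A-101';
UPDATE account SET balance = balance + 1000 WHERE account_number = 'A-215';
COMMIT WORK;

-- Had an error been detected before the commit, the partial update could have
-- been undone instead:
-- ROLLBACK WORK;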

8.5 SUMMARY
Structured Query Language (SQL) is a DML derived from relational
calculus. However, an SQL statement can be translated into equivalent
relational algebraic steps.

The basic construct of an SQL retrieval statement involves a
SELECT-FROM-WHERE block with which object fields and a predicate for
selection criteria are specified. An SQL statement may select one or more
records at a time. Nested SELECT-FROM-WHERE blocks may be coded
within an SQL expression.
Embedding SQL statements in a host programming language for batch
processing is described. The I/O operation of a conventional host
programming language is record-oriented, while the execution of an SQL
statement may retrieve an entire table. To accommodate such differences in
the I/O processing mode, SYSTEM R, DB2 and SQL/DS all provide a cursor
mechanism for application programs to process the table retrieved by an SQL
statement one record at a time.

8.6 KEYWORDS
Data Definition Language: A language used to specify the database schema, as
distinct from a data manipulation language, which is used to express database
queries and updates.

Data-manipulation Language (DML): A language that enables users to
access or manipulate data as organized by the appropriate data model. There
are basically two types:
• Procedural DMLs require a user to specify what data are needed and how
to get those data.
• Declarative DMLs (also referred to as nonprocedural DMLs) require a
user to specify what data are needed without specifying how to get those
data.

Schema Definition: The DBA creates the original database schema by
executing a set of data definition statements in the DDL.

SQL includes a variety of language constructs for queries on the database. All
the relational-algebra operations, including the extended relational-algebra
operations, can be expressed by SQL. SQL also allows ordering of query
results by sorting on specified attributes.

View relations can be defined as relations containing the result of queries.


Views are useful for hiding unneeded information, and for collecting together
information from more than one relation into a single view.

8.7 SELF-ASSESSMENT EXERCISES


1) What is a primary key?
2) What is a View?
3) Consider the following database
Branch (branch name, branch city, assets)
Customer (customer name, customer street, customer city)
Loan (loan number, branch name, amount)
Borrower (customer name, loan number)
Account (account number, branch name, balance)
Depositor (customer name, account number)
Give an expression in the relational algebra for each of the following
queries.
a) Find the names of all branches located in “Delhi”.
b) Find the names of all borrowers who have a loan in branch
“Saketnagar”
c) What are the appropriate primary keys?
d) Given your choice of primary keys, identify appropriate foreign
keys.
4) Discuss the relative merits of procedural and nonprocedural languages.
5) Consider the insurance database as follows
Person (driver id, name, address)
Car (license, model, year)
Accident (report number, date, location)
Owns (driver id, license)
Participated (report number, license, driver id, damage amount).
183
Relational Database Construct the following SQL queries for this relational database.
Management System
(Rdbms) a) Find the total number of people who owned cars that were involved in
accidents in 2009.
b) Add a new accident to the database; assume any values for required
attributes.
c) Delete the Mazda belonging to “John Smith”.

8.8 FURTHER READINGS


1. Atre, S., Database Structural Techniques for Design, Performance &
Management, John Wiley & Sons, 1980.
2. Date, C.J., A Guide to DB2, Addison-Wesley, 1984.
3. Date, C.J., An Introduction to Database Systems, Addison-Wesley, 1981.
4. Hawryszkiewycz, I.T., Database Analysis and Design, SRA, 1984.
5. Kroenke, D.M., Database Processing: Fundamentals, Design,
Implementation, 2nd Edition, SRA, 1983.

UNIT 9 DBMS IMPLEMENTATION AND FUTURE TRENDS

Objectives
After studying this unit, you will be able to:
• Explain relational database language.
• Understand web interfaces to databases.
• Understand how to manage a database using a cloud-based solution.
• Know how to control database access.

Structure
9.0 Introduction
9.1 Web Interfaces to Database
9.2 Specialty Database
9.3 Automation and Database Management
9.4 Augmented Database
9.5 In-memory Database
9.6 Graph Database
9.7 Open-Source Database
9.8 Databases as Service
9.9 Data Mining
9.10 Summary
9.11 Keywords
9.12 Self-Assessment Exercises
9.13 Further Readings

9.0 INTRODUCTION
With the advent of cloud computing, the market for Database Management
systems has shifted towards cloud-based solutions. Cloud-based DBMS
solutions offer many advantages over on-premises solutions, such as
scalability, flexibility, and reduced infrastructure costs. Cloud-based
solutions also offer the ability to easily integrate with other cloud-based
services, such as analytics and machine learning tools, which can help
organizations gain insights from their data more quickly and efficiently.

Another advantage of cloud-based DBMS solutions is that they often come
with built-in security features, such as encryption and multi-factor
authentication, which can help to protect sensitive data. This is especially
important in today’s world, where data breaches and cyberattacks are
becoming increasingly common.
However, despite the advantages of cloud-based DBMS solutions, there are
still some organizations that prefer to use on-premises solutions, due to
concerns about data security, compliance, and control. In response to this,
many DBMS vendors are now offering hybrid solutions, which combine on-
premises and cloud-based solutions to provide organizations with the best of
both worlds.
Overall, the shift towards cloud-based DBMS solutions is likely to continue,
as more organizations embrace cloud computing and the benefits it offers.
However, it is important for organizations to carefully evaluate their options
and choose a solution that best meets their specific needs and requirements.

Legacy systems refer to older computer systems, software applications, or
hardware components that are no longer considered up-to-date or supported
by the vendor. These systems are often critical to the day-to-day operations of
an organization and may be difficult to replace or upgrade due to various
reasons, such as budget constraints, technical limitations, or customizations
that have been made to the system over time.

Legacy systems can pose several challenges for organizations, such as
security vulnerabilities, difficulty integrating with newer technologies, and
the cost of maintaining outdated hardware and software. Despite these
challenges, many organizations continue to use legacy systems due to their
business-critical nature, cost-effectiveness, or simply because they work well
and there is no pressing need to change them.
To address the challenges posed by legacy systems, organizations can take a
variety of approaches, such as modernization, migration, or retirement.
Modernization involves updating the software or hardware components of a
legacy system to improve its functionality and maintainability, while
migration involves moving data and applications from a legacy system to a
newer one. Retirement involves decommissioning a legacy system and
transitioning to a new one altogether. Each approach has its own advantages
and disadvantages, and organizations must carefully consider their options
before making a decision.

Legacy systems are an important consideration when discussing database
management systems because many organizations still rely on them. These
systems are often built on older technology that may not be compatible with
newer systems or cloud-based DBMSs. Migrating data from legacy systems
to newer DBMSs can be a complex and time-consuming process, but it may
be necessary to take advantage of new features and improve efficiency. As
such, it's important for organizations to weigh the benefits and drawbacks of
upgrading or migrating their database systems. Additionally, with the rise of
electronic commerce, databases are becoming increasingly important in
managing and analyzing customer data and transactions. Standards are
crucial in ensuring interoperability between different systems and
applications, enabling organizations to exchange data and perform useful
tasks.

DBMSs are changing, however. They are expanding, taking on more
responsibilities, and providing smarter answers. As new goals and problems
present themselves, the desire to find new ways to use Database Management
systems prompts unique solutions. Many of these innovations are available
only in cloud-based DBMSs. As Database Management systems develop new
features and new options, it makes sense to reexamine the organization’s
current system and consider all new options.

Practically all use of databases occurs from within application programs.


Correspondingly, almost all user interaction with databases is indirect, via
application programs. Not surprisingly, therefore, database systems have long
supported tools such as form and GUI builders, which help in rapid
development of applications that interface with users. In recent years, the
Web has become the most widely used user interface to databases.
Standards are very important for application development, especially in the
age of the internet, since applications need to communicate with each other to
perform useful tasks. A variety of standards have been proposed that affect
database application development. Electronic commerce is becoming an
integral part of how we purchase goods and services and databases play an
important role in that domain. Legacy systems are systems based on older-
generation technology. They are often at the core of organizations, and run
mission-critical applications.

9.1 WEB INTERFACES TO DATABASE


The Web has become important as a front end to databases for several
reasons: Web browsers provide a universal front end to information supplied
by back ends located anywhere in the world. The front end can run on any
computer system, and there is no need for a user to download any special-
purpose software to access information. Further, today, almost everyone who
can afford it has access to the Web. With the growth of information services
and electronic commerce on the Web, databases used for information
services, decision support, and transaction processing must be linked with the
Web.

Web interfaces to databases allow users to interact with and manipulate data
stored in a database through a web browser. These interfaces can be custom-
built or implemented using off-the-shelf software, and they provide a user-
friendly way for non-technical users to access and use data. Some common
types of web interfaces to databases include:

Web Forms: Web forms allow users to input data into a database through a
web browser. They can be used for tasks such as data entry, surveys, or
online orders.

Reporting Interfaces: Reporting interfaces allow users to query and retrieve
data from a database, and display the results in a user-friendly format such as
a chart, graph, or table.

Dashboard Interfaces: Dashboard interfaces provide an overview of data in
a database, often using visualizations such as charts and graphs to display key
metrics and trends.
Mobile Interfaces: Mobile interfaces provide access to a database through a
mobile device such as a smartphone or tablet. They can be designed to be
responsive and mobile-friendly, making it easy for users to access data on-
the-go.

Web interfaces to databases can be integrated with various web technologies,
such as HTML, CSS, JavaScript, and AJAX. They can also be integrated
with server-side technologies such as PHP, Python, Ruby, and Java to
provide dynamic functionality and database connectivity.
Web interfaces to databases are essential for many organizations, as they
allow users to access and manipulate data without the need for specialized
software or technical expertise.
Web provides a way to integrate multimedia content and other resources into
database applications. For example, a database application for a museum
could include multimedia content such as images, videos, and audio
recordings to enhance the user experience.

Another important aspect of the Web is its ability to provide real-time access
to data. This means that users can access and interact with data in real-time,
rather than waiting for data to be processed and returned to them. This is
particularly important for applications such as financial transactions and real-
time monitoring systems.
Finally, the Web provides a way to easily distribute database applications to a
large number of users. With the use of cloud-based technologies,
organizations can deploy their database applications to a large number of
users without having to worry about managing hardware and software
infrastructure. This makes it easier for organizations to provide access to their
database applications to users located anywhere in the world.

Whenever relevant data in the database are updated, the generated documents
will automatically become up-to-date. The generated document can also be
tailored to the user on the basis of user information stored in the database.
Web interfaces provide attractive benefits even for database applications that
are used only within a single organization. Browsers today can fetch programs
along with HTML documents, and run the programs on the browser, in safe
mode—that is, without damaging data on the user’s computer. Programs can
be written in client-side scripting languages, such as JavaScript, or can be
“applets” written in the Java language.

A Web server is a program running on the server machine, which accepts
requests from a Web browser and sends back results in the form of HTML
documents. The browser and Web server communicate by a protocol called
the Hyper Text Transfer Protocol (HTTP). HTTP provides powerful features,
beyond the simple transfer of documents. The most important feature is the
ability to execute programs, with arguments supplied by the user, and deliver
the results back as an HTML document. As a result, a Web server can easily
act as an intermediary to provide access to a variety of information services.

A new service can be created by creating and installing an application
program that provides the service. The common gateway interface (CGI)
standard defines how the Web server communicates with application
programs. The application program typically communicates with a database
server, through ODBC, JDBC, or other protocols, in order to get or store
data.
CGI, or Common Gateway Interface, is a standard for interfacing external
applications with web servers to generate dynamic web content. CGI scripts
are used to enable web servers to execute programs or scripts, which can
generate dynamic content, such as web pages or other multimedia content.
CGI scripts can be written in many different programming languages,
including Perl, Python, and PHP.

ODBC stands for Open Database Connectivity. It is a standard interface for
accessing database management systems (DBMS) developed by Microsoft.
ODBC provides a common language for communication between
applications and DBMS, allowing different applications to access data stored
in various DBMS using a common set of functions. This enables developers
to write database applications that can work with different DBMS without
having to rewrite the application code each time. ODBC drivers are available
for most popular DBMS, such as Oracle, SQL Server, MySQL, and
PostgreSQL, making it a widely used interface in the industry.

ODBC, or Open Database Connectivity, is a standard programming interface
for accessing and manipulating data stored in a database. ODBC provides a
common API for database access, which allows software applications to
access data from different database management systems (DBMSs) using the
same programming interface.
JDBC stands for Java Database Connectivity. It is a standard Java API for
accessing and manipulating relational databases. JDBC allows Java programs
to interact with a wide range of DBMS, such as Oracle, SQL Server, MySQL,
and PostgreSQL, without having to use DBMS-specific APIs. JDBC provides
a set of interfaces and classes that enable Java programs to perform common
database operations, such as connecting to a database, executing SQL queries
and updates, retrieving results, and managing transactions. JDBC drivers are
available for most popular DBMS, making it a widely used interface in the
industry.
JDBC, or Java Database Connectivity, is a Java-based standard API for
accessing and manipulating data stored in a database. JDBC provides a
platform-independent interface for accessing databases from Java
applications, and supports a wide range of database management systems,
including Oracle, MySQL, and Microsoft SQL Server.

9.2 SPECIALTY DATABASE


The term object-oriented database is used to describe a database system that
supports direct access to data from object-oriented programming languages,
without requiring a relational query language as the database interface. The
object-relational data model provides a smooth migration path from relational
databases, which is attractive to relational database vendors. As a result,
starting with SQL:1999, the SQL standard includes a number of object-
oriented features in its type system, while continuing to use the relational
model as the underlying model.

Specialty databases are databases that are designed to handle specific types of
data or specialized applications. Some examples of specialty databases
include:
1. Geographic Information System (GIS) databases: These databases are
designed to store, retrieve, and analyze spatial data, such as maps and
satellite images.
2. Time-series databases: These databases are optimized for storing and
querying time-series data, such as stock prices, weather data, and sensor
data.
3. Graph databases: These databases are designed to store and query graph
structures, such as social networks and recommendation engines.
4. Document-oriented databases: These databases are optimized for storing
and querying semi-structured and unstructured data, such as text
documents and multimedia content.
5. In-memory databases: These databases store data in memory instead of
on disk, allowing for faster access and processing of data.
6. Real-time databases: These databases are designed to handle high-speed,
low-latency applications, such as financial trading systems and gaming
platforms.
Each type of specialty database has its own unique set of features and
capabilities, making it well-suited for specific types of applications and use
cases.

Big data refers to the large, complex datasets that cannot be easily managed
or processed by traditional database management systems. Big data solutions
typically involve distributed computing technologies that allow for parallel
processing of data across many servers, as well as specialized tools and
algorithms for data analysis and machine learning.

Big data refers to extremely large and complex data sets that cannot be
effectively processed and analyzed using traditional data processing tools and
techniques. These data sets are typically too large to be managed by a single
computer or traditional database management system, and they may also be
unstructured or semi-structured, meaning that they do not fit neatly into
traditional database structures.

Big data is often characterized by the "3Vs": volume, velocity, and variety.
Volume refers to the sheer size of the data, velocity refers to the speed at
which data is generated and needs to be processed, and variety refers to the
different forms and structures of data.

To handle big data, specialized tools and technologies are needed, such as
distributed computing systems like Hadoop and Spark, NoSQL databases,
and data warehouses. These technologies allow for the processing and
analysis of large, complex data sets to extract valuable insights and
knowledge.

Another trend in database management is the use of NoSQL databases.


NoSQL databases are non-relational and are designed to handle unstructured
or semi-structured data, such as social media data or sensor data from the
Internet of Things (IoT). NoSQL databases use different data models than
traditional relational databases, such as key-value, document-oriented, or
graph databases.
NoSQL (Not Only SQL) databases are a class of databases that are designed
to handle large and complex data sets, which may be unstructured or semi-
structured, and do not fit well into the traditional tabular format of SQL
databases. NoSQL databases are designed to be highly scalable, flexible, and
efficient, with the ability to handle high volumes of data and complex
queries.

Unlike traditional SQL databases, NoSQL databases do not use the relational
model, and instead use a variety of data models, such as document-oriented,
graph-based, key-value, and column-family. Each data model is optimized
for specific types of data and use cases, allowing NoSQL databases to be
highly specialized for different applications.
Some examples of popular NoSQL databases include MongoDB, Cassandra,
Couchbase, and Amazon DynamoDB. These databases are often used in big
data applications, real-time data processing, content management, and other
data-intensive applications where scalability, flexibility, and performance are
critical.
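To make the document model concrete, the short Python sketch below stores and queries records with differing fields in MongoDB using the pymongo driver. The connection string, the "retail" database and the "orders" collection are illustrative assumptions, and a running MongoDB server is presumed.

# A minimal document-store sketch (assumes a local MongoDB server and the
# pymongo driver; the database and collection names are only illustrative).
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
orders = client["retail"]["orders"]

# Documents need no fixed schema: each order may carry different fields.
orders.insert_one({"order_id": 101, "customer": "Asha",
                   "items": ["pen", "notebook"], "total": 150.0})
orders.insert_one({"order_id": 102, "customer": "Ravi",
                   "total": 90.0, "gift_wrap": True})

# Query by field value, much like a WHERE clause, but over flexible documents.
for doc in orders.find({"total": {"$gt": 100}}):
    print(doc["order_id"], doc["customer"])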
Finally, database management systems are increasingly being integrated with
other technologies, such as cloud computing and artificial intelligence (AI).
Cloud-based database solutions allow for scalable and flexible data storage
and processing, while AI tools can be used to automate tasks such as data
cleaning, classification, and analysis. Artificial intelligence (AI) refers to the
development of computer systems that can perform tasks that typically
require human intelligence, such as visual perception, speech recognition,
decision-making, and natural language processing.
AI systems use techniques such as machine learning, deep learning, and
neural networks to analyze data, recognize patterns, and make decisions
based on that data. These systems can be used to automate and optimize a
wide range of processes, from customer service to medical diagnosis, and
they have the potential to revolutionize the way we live and work.
XML (Extensible Markup Language) was initially designed as a way to add
structured information to text documents. However, it has since become
widely used as a format for exchanging data between different applications
and systems. One of the key advantages of XML is its flexibility in
representing data with nested structure, which makes it useful for storing and
exchanging nontraditional data formats.

XML (Extensible Markup Language) is a markup language used to store and
transport data. It uses tags to define elements and attributes to provide
additional information about those elements. Unlike HTML, which is used to
define the structure and presentation of web pages, XML is used to define
data and its structure. XML is designed to be both human-readable and
machine-readable, making it a popular choice for exchanging data between
different systems. It is widely used in web services, where it provides a
standard format for exchanging data between different applications and
platforms. XML documents can be validated against a schema, which defines
the rules and constraints for the structure and content of the document. This
helps to ensure the accuracy and consistency of the data.
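As a small illustration, the following Python sketch parses a tiny XML fragment with the standard library's ElementTree module; the element and attribute names are invented purely for the example.

# Parsing a small XML fragment with Python's built-in ElementTree module.
import xml.etree.ElementTree as ET

xml_data = """
<orders>
  <order id="101"><customer>Asha</customer><total>150.00</total></order>
  <order id="102"><customer>Ravi</customer><total>90.00</total></order>
</orders>
"""

root = ET.fromstring(xml_data)
for order in root.findall("order"):
    # Attributes describe an element; nested child elements hold its data.
    print(order.get("id"), order.find("customer").text, order.find("total").text)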

9.3 AUTOMATION AND DATABASE MANAGEMENT
Automation and Database Management Systems (DBMS) are two
interrelated concepts in modern information technology. DBMS is a software
system that allows users to define, create, maintain and control access to a
database. A database is a collection of data that is organized in a manner that
allows for efficient retrieval, storage and management. DBMS makes it easier
for organizations to store and retrieve data, thereby reducing data
redundancy, inconsistency and improving data quality.

Automation, on the other hand, refers to the use of technology to perform
tasks without human intervention. In the context of DBMS, automation can
refer to automating tasks such as data entry, data backup, data recovery, and
database monitoring. Automation can be used to enhance the functionality of
a DBMS by reducing the time and effort required for manual tasks. For
instance, database administrators can use automation tools to perform regular
database backups, monitor database performance, and troubleshoot issues
automatically. Automation can also help in streamlining database operations
and reducing human error. The combination of automation and DBMS can
help organizations improve their efficiency, reduce errors, and improve the
accuracy of their data.

9.4 AUGMENTED DATABASE


Augmented DBMS is a term used to describe the integration of artificial
intelligence and machine learning technologies into traditional database
management systems. Augmented DBMS combines the capabilities of
DBMS with the intelligent processing capabilities of AI and machine
learning to create a more efficient and effective system for managing data.
One of the key benefits of augmented DBMS is its ability to automate tasks
that were previously done manually, such as optimizing query performance,
detecting anomalies in data, and predicting potential issues with the database.
By using AI and machine learning algorithms, augmented DBMS can
identify patterns in data and make predictions about future behavior. This can
help database administrators to proactively address issues and improve
overall system performance.
Another benefit of augmented DBMS is that it can help organizations to
better utilize their data. By analyzing data in real-time, augmented DBMS
can identify trends and insights that may have been missed with traditional
database management systems. This can help organizations to make better
decisions and improve their operations.

Augmented DBMS has the potential to revolutionize the way that
organizations manage their data. By integrating AI and machine learning into
traditional database management systems, organizations can improve
efficiency, reduce errors, and gain insights that were previously impossible to
obtain.

Integrating AI and machine learning into traditional database management
systems can provide significant benefits for organizations. By automating
routine tasks and analyzing large volumes of data, organizations can improve
efficiency, reduce errors, and gain insights that were previously impossible to
obtain.

Some ways in which AI and machine learning can be integrated into
traditional database management systems include:
1. Automated data entry: AI and machine learning algorithms can be used
to automatically enter data into a database, reducing the risk of errors and
improving efficiency.
2. Predictive analytics: Machine learning algorithms can be used to analyze
large volumes of data and make predictions about future trends and
outcomes.
3. Fraud detection: AI and machine learning algorithms can be used to
identify patterns and anomalies in data that may indicate fraud or other
types of malicious activity.
4. Natural language processing: Natural language processing techniques
can be used to analyze unstructured data, such as customer feedback or
social media posts, and extract valuable insights.
5. Recommendation systems: Machine learning algorithms can be used to
analyze customer behavior and make personalized recommendations for
products or services.

Integrating AI and machine learning into traditional database management
systems can provide organizations with powerful tools for data analysis,
automation, and decision-making. However, it is important to carefully
evaluate the potential risks and benefits of these technologies, and to ensure
that data privacy and security are protected.
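As a deliberately simple illustration of the anomaly-detection idea mentioned above, the Python sketch below flags unusually slow queries with a basic z-score test. The latency values and the 2.5-standard-deviation cut-off are assumptions for the example; a real augmented DBMS would rely on far more sophisticated models.

# Flag unusually slow queries with a basic z-score test (illustrative only).
import statistics

latencies_ms = [12, 14, 11, 13, 15, 12, 14, 95, 13, 12]  # hypothetical query times

mean = statistics.mean(latencies_ms)
spread = statistics.stdev(latencies_ms)

for i, value in enumerate(latencies_ms):
    z = (value - mean) / spread
    if abs(z) > 2.5:  # assumed cut-off for "anomalous"
        print(f"query {i} looks anomalous: {value} ms (z = {z:.2f})")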

9.5 IN-MEMORY DATABASE


In-memory databases (IMDBs) are a type of database management system
that stores data in computer memory instead of on a hard drive or other
storage device. In contrast to traditional disk-based databases, which must
read and write data from disk, IMDBs can access and process data much
more quickly since data is already in memory.
IMDBs are designed to handle large amounts of data and provide high-speed
access to this data. They are often used in real-time applications, such as
financial trading, gaming, and e-commerce, where even small delays in data
processing can have a significant impact.
One of the key benefits of IMDBs is their speed. Since data is stored in
memory, IMDBs can access and process data much more quickly than disk-
based databases. This can be especially beneficial for applications that
require real-time data processing and analysis.
Another benefit of IMDBs is their ability to handle large amounts of data.
IMDBs can support terabytes of data, making them ideal for applications that
require large amounts of data to be stored and accessed quickly.
However, IMDBs can also have some limitations. Since data is stored in
memory, the amount of data that can be stored is limited by the amount of
available memory. Additionally, IMDBs can be more expensive than disk-
based databases due to the cost of memory.
IMDBs can provide significant benefits for applications that require high-
speed data processing and analysis. While they may not be suitable for all
applications, IMDBs can provide a valuable tool for organizations that
require fast and efficient data management.
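The idea can be tried out with Python's built-in sqlite3 module, which can hold a relational database entirely in RAM; the table and sample rows below are purely illustrative.

# An in-memory relational store: the database lives in RAM and disappears
# when the connection is closed.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (symbol TEXT, price REAL, qty INTEGER)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [("INFY", 1450.5, 10), ("TCS", 3321.0, 5), ("INFY", 1452.0, 8)],
)

# The aggregation below runs against data held in memory, with no disk I/O.
for row in conn.execute("SELECT symbol, SUM(qty) FROM trades GROUP BY symbol"):
    print(row)

conn.close()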

9.6 GRAPH DATABASE


A graph database is a type of NoSQL database that uses graph theory to store,
query, and manage data. Graph databases represent data as nodes (vertices)
and edges, which connect the nodes. The nodes represent entities or objects,
while the edges represent relationships between these entities or objects.
In a graph database, nodes can have attributes (properties) and be connected
to other nodes through one or more edges. The edges can also have attributes
(properties) that describe the relationships between the nodes. Graph
databases can be used to model complex relationships between entities, such
as social networks, supply chain management, and recommendation systems.
One of the key benefits of graph databases is their ability to provide fast and
efficient queries for complex relationship-based data. Graph databases use
graph traversal algorithms to navigate the graph and retrieve data. This can
provide much faster and more efficient queries than traditional relational
databases, which may require complex join operations to retrieve the same
data.
Graph databases can also provide flexibility in data modeling, as they can
easily handle data with varying degrees of complexity and depth. This makes
graph databases well-suited for applications that require complex and flexible
data modeling, such as recommendation systems and fraud detection.
Graph databases can provide significant benefits for applications that require
complex relationship-based data modeling and fast and efficient querying.
While they may not be suitable for all applications, graph databases can
provide a valuable tool for organizations that require flexible and efficient
data management.
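The node-and-edge model can be sketched in a few lines of Python using the networkx library as a stand-in; production graph databases such as Neo4j expose the same ideas through their own query languages. The people and relationships below are invented for the example.

# Nodes, edges and traversal in a tiny social graph.
import networkx as nx

g = nx.Graph()
g.add_edge("Asha", "Ravi", relation="follows")
g.add_edge("Ravi", "Meena", relation="follows")
g.add_edge("Meena", "Karan", relation="follows")

# A traversal answers "how is Asha connected to Karan?" without any joins.
print(nx.shortest_path(g, "Asha", "Karan"))   # ['Asha', 'Ravi', 'Meena', 'Karan']
print(list(g.neighbors("Ravi")))              # direct relationships of one node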
9.7 OPEN-SOURCE DATABASE

An open-source database is a database management system that is freely
available to use, modify, and distribute under an open-source license. Open-
source databases provide users with the ability to view, modify, and enhance
the source code of the database, making them more flexible and customizable
than proprietary databases.

One of the key benefits of open-source databases is their cost-effectiveness.
Since they are freely available, open-source databases can significantly
reduce the cost of database management for organizations. Additionally,
open-source databases often have a large community of developers who
contribute to the development of the database, resulting in frequent updates
and bug fixes.

Open-source databases also provide a high degree of flexibility and
customization. Users can modify the source code of the database to meet their
specific needs, or they can choose from a wide range of plug-ins and
extensions developed by the community. This makes open-source databases
well-suited for organizations that require customized database solutions.

However, open-source databases may also have some limitations. They may
not provide the same level of support and documentation as proprietary
databases, and may require more technical expertise to set up and maintain.
Additionally, not all open-source databases have the same level of
functionality and scalability as proprietary databases, so users should
carefully evaluate their needs before selecting an open-source database.
Open-source databases can provide a cost-effective and flexible option for
organizations that require database management solutions. While they may
not be suitable for all applications, open-source databases can provide a
valuable tool for organizations that require customizable and affordable data
management.

9.8 DATABASE AS SERVICE


Database as a Service (DBaaS) is a cloud computing service model in which
a third-party provider offers a database management system over the internet.
DBaaS providers typically manage the installation, configuration, and
maintenance of the database, allowing customers to focus on their data and
applications instead of the underlying infrastructure.

One of the key benefits of DBaaS is its flexibility and scalability. Customers
can easily scale their database resources up or down to meet changing
demand without having to worry about hardware or infrastructure costs.
Additionally, DBaaS providers typically offer a range of database options,
such as relational, NoSQL, and graph databases, allowing customers to select
the database type that best meets their needs.

DBaaS also offers simplified database management, since the provider
manages the installation, configuration, and maintenance of the database.
This can reduce the burden on in-house IT staff, allowing them to focus on
other tasks. DBaaS providers also typically offer automated backups, disaster
recovery, and high availability options, ensuring that customer data is secure
and available.

However, DBaaS may also have some limitations. Since the database is
hosted on the cloud, customers may experience latency or network
performance issues. Additionally, customers may have limited control over
the database configuration and maintenance, which may be a concern for
organizations with complex or specialized database requirements.
DBaaS can provide significant benefits for organizations that require flexible
and scalable database management solutions without the burden of
infrastructure maintenance. While it may not be suitable for all applications,
DBaaS can provide a valuable tool for organizations that require simplified
and cost-effective data management.

9.9 DATA MINING


Data mining is the process of discovering patterns, relationships, and insights
from large sets of structured and unstructured data. It involves extracting
knowledge and useful information from large data sets using statistical and
machine learning algorithms, as well as data visualization and other
techniques.
The goal of data mining is to extract valuable insights and knowledge from
the data, which can be used to make informed business decisions, improve
processes, and gain a competitive advantage. Data mining techniques can be
applied to a wide range of data sources, including transactional databases,
customer data, social media data, and web data.
Data mining techniques include:
1. Classification: this involves grouping data into predefined categories
based on specific criteria.
2. Clustering: this involves grouping data into similar categories based on
similarity of features.
3. Regression analysis: this involves predicting a continuous variable based
on the relationship between other variables.
4. Association rule mining: this involves discovering relationships between
items in a dataset, such as products frequently purchased together.
5. Anomaly detection: this involves identifying unusual or unexpected
patterns in the data.
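As one concrete illustration, the Python sketch below applies clustering (technique 2 above) to a small, made-up set of customer records using the scikit-learn library; the spend and visit figures are assumptions.

# Clustering customers by annual spend and visit frequency (illustrative data).
from sklearn.cluster import KMeans

customers = [
    [200, 2], [220, 3], [250, 2],       # low spend, infrequent visitors
    [900, 15], [950, 18], [1000, 20],   # high spend, frequent visitors
]

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print(model.labels_)  # cluster assignment for each customer, e.g. [0 0 0 1 1 1]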

Data mining can be used in a variety of fields, such as healthcare, finance,
marketing, and social media analysis. However, data mining raises concerns
about privacy and data protection, and organizations must take steps to
protect sensitive information and ensure compliance with data privacy
regulations.

9.10 SUMMARY

We have pointed out in this unit that there is resistance to new DBMS tools.
In spite of this apparent resistance, many more organizations are moving
towards the use of DBMS. It is also quite clear that organizations will have
to adopt DBMS, especially those based on the relational approach, in order to
maintain their competitive position in the emerging marketplaces.

The availability of many more products, as well as the appearance of many
more features in these products, places a greater responsibility on the
managers responsible for implementation to adopt an approach that will
successfully meet the information needs of the organization. It has been
emphasized in this unit that, apart from identifying the technically right
approach and providing the system on which it could run, other administrative
factors must also be borne in mind so that the new systems actually fit well
into the organization. This unit also briefly refers to the trends in products
and the emerging standards, because while putting up a DBMS, it must be
realized that it will have to cope with the information needs of the
organization not only as they stand today, but also for some time into the
future. It has usually been observed that once computerization is successfully
established in an organisation, the needs of that organisation for data
management grow exponentially in volume. It is, therefore, of paramount
importance that management takes a vision which is somewhat larger than just
the needs of the moment, keeping in mind future trends and emerging standards.

9.11 KEYWORDS
Cloud-based DBMS: Database management systems hosted on cloud infrastructure.
They often come with built-in security features, such as encryption and
multi-factor authentication, which can help to protect sensitive data.
Legacy Systems: Refer to older computer systems, software applications, or
hardware components that are no longer considered up-to-date or supported
by the vendor.
Web Interfaces to Databases: allow users to interact with and manipulate
data stored in a database through a web browser. These interfaces can be
custom-built or implemented using off-the-shelf software, and they provide a
user-friendly way for non-technical users to access and use data. Some
common types of web interfaces to databases include Webforms, Reporting
interfaces, Dashboard interfaces, Mobile interfaces etc.

Mobile Interfaces: Mobile interfaces provide access to a database through a
mobile device such as a smartphone or tablet. They can be designed to be
responsive and mobile-friendly, making it easy for users to access data on-
the-go.

Web Server: A Web server is a program running on the server machine,
which accepts requests from a Web browser and sends back results in the
form of HTML documents. The browser and Web server communicate by a 197
D
Relational Database protocol called the Hyper Text Transfer Protocol (HTTP).
Management System
(Rdbms)
Speciality Databases: Specialty databases are databases that are designed to
handle specific types of data or specialized applications.

NoSQL: NoSQL (Not Only SQL) databases are a class of databases that are
designed to handle large and complex data sets, which may be unstructured
or semi-structured, and do not fit well into the traditional tabular format of
SQL databases.

9.12 SELF ASSESSMENT EXERCISES


1. Web interface in databases gaining popularity, discuss with the help of a
suitable real-life example.
2. Comment on the significance of speciality databases. Describe the features
of the ProQuest database.
3. List key benefits and limitations of open-source databases. Suggest some
popular applications of open-source databases.
4. Data Mining is going to be the future of database technology, justify the
statement.
5. Highlight the differences between NoSQL and SQL in light of their
applications and implementations.

9.13 FURTHER READINGS


1. Atre, S., Data Base: Structured Techniques for Design, Performance and
   Management, John Wiley & Sons, 1980.
2. Date, C.J., An Introduction to Database Systems, Addison-Wesley, 1981.
3. Kroenke, D.M., Database Processing: Fundamentals, Design, Implementation,
   2nd Edition, SRA, 1983.
4. Everest, G.C., Database Management: Objectives, System Functions and
   Administration, McGraw-Hill, 1986.
5. Fleming, C.C. and von Halle, B., Handbook of Relational Database Design,
   Addison-Wesley, 1989.

BLOCK 4
EMERGING TECHNOLOGIES FOR
BUSINESS

UNIT 10 CLOUD COMPUTING

Objectives
After studying this unit, you will be able to:
• Appreciate cloud computing, and classify services of cloud computing.
• Understand cloud computing architecture.
• Comprehend the platforms for the development of cloud applications and
list the applications of cloud services.
• Summarise the features and associated risks of different cloud
deployment and service models.
• Appreciate the emergence of the cloud as the next-generation computing
paradigm.

Structure
10.0 Introduction to Cloud Computing
10.0.1 History and Evolution of Cloud Computing
10.0.2 Types of Cloud
10.0.3 Cloud Components
10.0.4 Cloud Computing Infrastructure
10.0.5 Pros and Cons of Cloud Computing
10.1 Cloud Computing Architecture
10.2 Cloud Deployment Models
10.3 Service Management in Cloud Computing
10.4 Data Management in Cloud Computing
10.5 Resource Management and Security in Cloud
10.5.1 Inter Cloud Resource Management
10.5.2 Resource Provisioning and Resource Provisioning Methods
10.5.3 Global Exchange of Cloud Resources
10.5.4 Cloud Security Challenges
10.5.5 Security Governance – Virtual Machine Security – IAM – Security Standards
10.6 Cloud Technologies and Advancements
10.6.1 Hadoop
10.6.2 VirtualBox
10.6.3 Google App Engine
10.6.4 OpenStack
10.6.5 Federation in the Cloud – Four Levels of Federation
10.7 Case Studies
10.8 Summary
10.9 Self-Assessment Exercises
10.10 Keywords
10.11 Further Readings

10.0 INTRODUCTION TO CLOUD COMPUTING
Cloud computing refers to the delivery of computing services such as storage,
processing power, and software applications over the internet. Instead of
having to install and manage software on their own computers, users can
access these services on-demand from a remote server or data centre.

Cloud computing offers many benefits such as flexibility, scalability, and
cost savings. It allows organizations to quickly scale up or down their
computing resources as needed, without having to invest in expensive
hardware and software. Additionally, it enables collaboration and remote
work, as users can access their applications and data from anywhere with an
internet connection.
Cloud computing also presents challenges such as data security, vendor lock-
in, and the need for reliable internet connectivity. However, these challenges
can be addressed through careful planning, implementation, and management
of cloud-based services.

10.0.1 History and Evolution of Cloud Computing


The roots of cloud computing can be traced back to the 1960s, with the
development of time-sharing systems that allowed multiple users to access a
single resource (computer) simultaneously. This model evolved in the 1990s
with the introduction of application service providers (ASPs), which provided
access to business applications over the internet. However, these early
versions of cloud computing were limited in their capabilities and
accessibility.

The modern era of cloud computing began in the mid-2000s, with the launch
of Amazon Web Services (AWS) in 2006. AWS provided a platform for
developers to build and deploy web applications without the need for
physical infrastructure. This marked the beginning of the Infrastructure as a
Service (IaaS) model, which has since become a dominant part of the cloud
computing landscape.
In 2008, Google launched its cloud-based application suite, Google Apps,
which provided businesses with email, word processing, and spreadsheet
software accessible through a web browser. This marked the beginning of the
Software as a Service (SaaS) model, which has since become a popular
choice for businesses looking to reduce their software licensing and
maintenance costs.
In 2010, the Platform as a Service (PaaS) model was introduced with the
launch of Heroku, a cloud-based platform for deploying and managing web
applications. PaaS provided developers with a platform for building and
deploying applications without having to worry about the underlying
infrastructure.
Since then, cloud computing has continued to evolve and expand, with new
technologies and services being introduced on a regular basis. Today, cloud
computing has become an integral part of many businesses' IT strategies,
enabling them to improve their agility, reduce costs, and focus on innovation.
10.0.2 Types of Cloud
Cloud computing can be categorized into four main types based on their
deployment models and ownership of infrastructure. These types are:

Public Cloud: A public cloud is a type of cloud computing in which the
infrastructure is owned and operated by a third-party service provider. The
resources, such as servers and storage, are shared among multiple
organizations, and users can access these resources over the internet.
Examples of public cloud providers include AWS, Microsoft Azure, and
Google Cloud Platform.
Private Cloud: A private cloud is a type of cloud computing in which the
infrastructure is owned and operated by a single organization. The resources
are not shared with other organizations, and the cloud can be hosted either
on-premises or in a third-party data centre. Private clouds provide
organizations with greater control over their infrastructure and data, and can
be customized to meet specific business needs.
Hybrid Cloud: A hybrid cloud is a combination of public and private cloud
models, allowing organizations to leverage the benefits of both.
Organizations can use the public cloud for non-sensitive data and
applications, while keeping sensitive data and applications in a private cloud.
Hybrid clouds provide organizations with greater flexibility and scalability,
while also providing greater control over sensitive data.

Multi-Cloud: A multi-cloud strategy involves using multiple cloud providers
for different applications or workloads. This approach can provide
organizations with greater flexibility and redundancy, as well as the ability to
choose the best cloud provider for each specific need. However, managing
multiple cloud providers can also present challenges in terms of integration
and security.
Each type of cloud has its own advantages and disadvantages, and
organizations should carefully evaluate their needs and objectives before
choosing a cloud deployment model.

10.0.3 Cloud Components


Cloud computing typically consists of several components that work together
to deliver computing resources and services. These components can be
categorized into three main layers: the Infrastructure layer, the Platform
layer, and the Software layer.
Infrastructure Layer: The Infrastructure layer provides the basic building
blocks of cloud computing, including servers, storage, networking, and
virtualization technology. These resources can be used to build and deploy
applications and services. Infrastructure as a Service (IaaS) providers like
Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform
(GCP) offer these resources as a service, allowing organizations to provision
and manage their own computing infrastructure.

Platform Layer: The Platform layer provides a platform for building and
deploying applications, without the need to manage the underlying
infrastructure. Platform as a Service (PaaS) providers like Heroku, Google
App Engine, and Microsoft Azure App Service offer a platform for
developers to build, deploy, and manage their applications, with built-in
scalability and flexibility.
Software Layer: The Software layer provides software applications and
services that can be accessed over the internet, without the need to install and
manage them on local devices. Software as a Service (SaaS) providers like
Salesforce, Microsoft Office 365, and Google Workspace offer software
applications that are hosted and managed by the provider, allowing users to
access them from anywhere with an internet connection.

In addition to these layers, cloud computing also includes management and
security components. Cloud Management Platforms (CMPs) provide tools for
managing and monitoring cloud resources, while Security as a Service
(SECaaS) providers offer security services to protect data and applications in
the cloud.

Overall, cloud computing components work together to provide organizations
with the flexibility, scalability, and cost savings they need to innovate and
grow their business.

10.0.4 Cloud Computing Infrastructure


Cloud computing infrastructure refers to the physical and virtual components
that make up a cloud computing environment. These components are used to
deliver cloud computing services, including computing resources, storage,
and networking capabilities. The infrastructure layer of cloud computing
typically includes the following components:
Servers: Cloud providers use large numbers of servers to provide computing
resources to their customers. These servers are typically housed in data
centres, and are managed and maintained by the cloud provider.

Storage: Cloud providers offer various types of storage options to their
customers, including object storage, block storage, and file storage. These
storage options can be accessed over the internet and can be used to store
data and applications.
Networking: Cloud providers offer networking capabilities that enable
customers to connect to their cloud resources over the internet. This includes
virtual private networks (VPNs), load balancers, and firewalls.

Virtualization: Cloud providers use virtualization technology to create
virtual machines (VMs) that can be used by customers to run their
applications. Virtualization enables customers to use computing resources
more efficiently, by allowing multiple VMs to run on a single physical
server.

Management Tools: Cloud providers offer management tools that enable
customers to manage their cloud resources. These tools include dashboards,
APIs, and command-line interfaces (CLIs) that allow customers to provision
and manage their cloud resources.

Cloud infrastructure can be deployed in different configurations, including
public cloud, private cloud, and hybrid cloud. Each configuration has its own
advantages and disadvantages, depending on the organization's needs and
requirements. Overall, cloud computing infrastructure provides organizations
with the flexibility and scalability they need to innovate and grow their
business, while also reducing costs and improving efficiency.

10.0.5 Pros and Cons of Cloud Computing


Cloud computing refers to the delivery of computing resources, including
software, servers, storage, databases, networking, and more, over the internet.
It has become increasingly popular in recent years due to its many benefits,
including cost savings, scalability, and flexibility. However, there are also
some potential drawbacks to consider when deciding whether to use cloud
computing.

Pros of Cloud Computing


• Cost Savings: Cloud computing eliminates the need for businesses to
invest in expensive hardware and software, which can save significant
costs.
• Scalability: Cloud computing allows businesses to easily scale their
resources up or down depending on their needs, which can help them
quickly adapt to changing market conditions.
• Flexibility: Cloud computing allows employees to access data and
applications from anywhere with an internet connection, making it easier
for them to work remotely.
• Reliability: Cloud providers often offer high levels of uptime and
reliable service, which can help businesses avoid costly downtime.
• Security: Cloud providers often have robust security measures in place
to protect data, which can be especially important for businesses that
deal with sensitive information.
Cons of Cloud Computing
• Dependence on Internet: Cloud computing requires a reliable internet
connection, and any disruptions to this connection can result in
downtime for the business.
• Data Security Concerns: While cloud providers often have strong
security measures in place, businesses may still be concerned about the
security of their data when it is stored on third-party servers.
• Limited Control: Businesses may have limited control over the
hardware and software used by the cloud provider, which can make it
difficult to customize the environment to their needs.
• Compliance Issues: Certain industries may be subject to strict
compliance regulations, and using cloud computing may not be
compliant with these regulations.
• Cost over Time: While cloud computing can save money in the short
term, over time, costs can add up as businesses pay for ongoing
subscriptions and usage fees.
Overall, cloud computing can offer many benefits to businesses, but it is
important to carefully consider the potential drawbacks and evaluate whether
it is the right solution for your organization.

10.1 CLOUD COMPUTING ARCHITECTURE


Cloud computing architecture typically consists of three layers: the
infrastructure layer, the platform layer, and the application layer. These layers
are often referred to as Infrastructure as a Service (IaaS), Platform as a
Service (PaaS), and Software as a Service (SaaS), respectively. In addition to
these layers, there are also various cloud computing models that define how
cloud services are delivered to users.
• Infrastructure as a Service (IaaS): At the IaaS layer, cloud providers
offer virtualized computing resources such as servers, storage, and
networking infrastructure. These resources can be accessed and managed
by users through a web-based interface or API. Examples of IaaS
providers include Amazon Web Services (AWS), Microsoft Azure, and
Google Cloud Platform.
• Platform as a Service (PaaS): The PaaS layer provides a platform for
developers to build, deploy, and manage applications in the cloud
without having to worry about the underlying infrastructure. PaaS
providers offer pre-configured development frameworks, middleware,
and tools that enable developers to quickly create and deploy
applications. Examples of PaaS providers include Heroku, Google App
Engine, and IBM Bluemix.
• Software as a Service (SaaS): At the SaaS layer, cloud providers offer
fully functional software applications that can be accessed and used by
end-users over the internet. SaaS applications are often subscription-
based and offer a range of features and functionalities. Examples of SaaS
applications include Microsoft Office 365, Salesforce, and Dropbox.
• Anything as a Service (XaaS): "Anything as a Service" (XaaS) is a
term used to describe the growing trend of delivering services over the
Internet or through cloud computing. The concept of XaaS encompasses
a wide range of services, from traditional software as a service (SaaS) to
more specialized services such as platform as a service (PaaS),
infrastructure as a service (IaaS), security as a service (SECaaS),
database as a service (DBaaS), and many others. The XaaS model allows
companies to leverage the benefits of cloud computing, such as
scalability, flexibility, and cost-effectiveness, without having to manage
and maintain their own IT infrastructure. This approach also allows
service providers to offer more tailored and customizable solutions to
their customers, as well as providing new revenue streams and business
opportunities. Overall, XaaS is a trend that is expected to continue to
grow as more and more companies move towards cloud-based solutions
and services.
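At the IaaS layer, resources are normally provisioned through the provider's API or SDK. The Python sketch below shows roughly what launching a virtual machine could look like with the AWS SDK (boto3); the image ID, key pair and region are placeholders, and valid AWS credentials are assumed to be configured.

# Provisioning a virtual machine at the IaaS layer via the AWS SDK (boto3).
import boto3

ec2 = boto3.client("ec2", region_name="ap-south-1")   # placeholder region

response = ec2.run_instances(
    ImageId="ami-xxxxxxxx",     # placeholder machine image
    InstanceType="t2.micro",
    MinCount=1,
    MaxCount=1,
    KeyName="my-key-pair",      # placeholder key pair name
)
print(response["Instances"][0]["InstanceId"])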
10.2 CLOUD DEPLOYMENT MODELS
There are four primary cloud deployment models:

• Public Cloud: A public cloud is a cloud computing environment that is
owned and operated by a third-party cloud service provider. The provider
makes computing resources, such as virtual machines, applications, and
storage, available to the general public over the internet.
• Private Cloud: A private cloud is a cloud computing environment that is
used exclusively by a single organization. It may be managed by the
organization itself or by a third-party service provider, but it is dedicated
to that organization's use.
• Hybrid Cloud: A hybrid cloud is a combination of public and private
cloud infrastructure. It allows an organization to use both public and
private cloud services, depending on their specific needs. For example,
an organization may use a public cloud for certain tasks, such as email,
while using a private cloud for more sensitive data and applications.
• Community Cloud: It is a type of cloud deployment model that is
designed for the use of a specific community or group of organizations
that have shared interests or requirements. In a community cloud, the
infrastructure, services, and resources are shared by multiple
organizations that belong to the same community. The community cloud
is managed either by a single organization or by a third-party provider,
and it can be hosted either on-premises or off-premises. The community
members typically share the cost of the infrastructure, and they have
more control over the configuration of the cloud environment than they
would have in a public cloud. Community clouds are often used by
organizations in industries with specific regulatory requirements or by
geographically dispersed organizations that need to collaborate on
projects. For example, healthcare organizations may use a community
cloud to share patient information securely, or universities may use a
community cloud to collaborate on research projects. The benefits of a
community cloud include increased security, cost savings, and improved
collaboration and communication among the community members.
However, community clouds also require careful planning, coordination,
and governance to ensure that the needs of all members are met and that
the shared resources are used effectively.

Each deployment model has its own set of benefits and challenges, and the
right model for an organization will depend on its specific needs and
requirements.

10.3 SERVICE MANAGEMENT IN CLOUD COMPUTING
Service management in cloud computing refers to the set of practices and
tools used to manage and monitor the delivery of cloud-based services to
end-users. It involves a range of activities such as monitoring service
availability and performance, ensuring security and compliance, managing
user access, and handling incidents and service requests. Cloud service
management can be divided into several key areas:

• Service Level Management: This involves setting and monitoring
service level agreements (SLAs) to ensure that services are delivered
according to the agreed-upon performance and availability standards.
• Incident Management: This involves identifying, tracking, and
resolving incidents that occur during the delivery of cloud-based
services.
• Change Management: This involves managing changes to the cloud
infrastructure, applications, and services to ensure that they are made in a
controlled and coordinated manner.
• Configuration Management: This involves maintaining a record of the
configuration items that make up the cloud infrastructure and services,
and managing changes to those items.
• Security Management: This involves ensuring that the cloud
infrastructure and services are secure and compliant with relevant
regulations and standards.
• Capacity Management: This involves ensuring that there is sufficient
capacity in the cloud infrastructure to meet the demand for services.
Effective cloud service management requires the use of specialized tools and
technologies that enable organizations to monitor and manage cloud services
and infrastructure in real-time. Some popular cloud service management tools
include AWS CloudWatch, Microsoft Azure Monitor, and Google Cloud
Monitoring.

10.4 DATA MANAGEMENT IN CLOUD COMPUTING
Data management in cloud computing refers to the process of storing,
processing, and managing data in a cloud environment. It involves a range of
activities, including data migration, backup and recovery, data security, and
data governance. Here are some key aspects of data management in cloud
computing:

• Data Migration: This involves moving data from on-premises storage to
a cloud environment. This can be a complex process that requires careful
planning and coordination to ensure that data is migrated efficiently and
without disruption to the business.

• Data Backup and Recovery: Cloud-based data backup and recovery
services are typically used to ensure that data is protected against
accidental or malicious data loss. These services provide automated
backups and point-in-time recovery options to help organizations quickly
restore their data in case of an outage or disaster.

• Data Security: Cloud providers typically offer a range of security
services to help protect data in the cloud. These include encryption,
identity and access management, and threat detection and response.
• Data Governance: This involves managing the policies and procedures
that govern the use of data in the cloud environment. This includes
managing access controls, defining data retention policies, and ensuring
compliance with regulatory requirements.
• Data Analytics: Cloud computing provides a powerful platform for
processing and analysing large amounts of data. Cloud-based data
analytics services can help organizations gain insights into their data,
identify trends, and make data-driven decisions.
Effective data management in the cloud requires a combination of
technology, process, and people. Cloud providers offer a range of tools and
services to help organizations manage their data in the cloud, and
organizations need to develop policies and procedures that align with their
business objectives and regulatory requirements.
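As one concrete example of cloud-based backup, the Python sketch below copies a local database dump to object storage using the AWS SDK (boto3); the bucket name and file paths are placeholders, and configured credentials are assumed.

# Backing up a local database dump to cloud object storage (Amazon S3).
import boto3

s3 = boto3.client("s3")
s3.upload_file(
    Filename="backups/sales_2023-05-01.dump",   # local backup file (placeholder)
    Bucket="example-org-db-backups",            # placeholder bucket name
    Key="daily/sales_2023-05-01.dump",          # object key inside the bucket
)
print("backup uploaded")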

10.5 RESOURCE MANAGEMENT AND SECURITY IN CLOUD
Resource management and security are two critical aspects of cloud
computing that ensure that cloud resources are utilized efficiently and
securely. Effective resource management helps ensure that cloud-based
applications and services are highly available and perform well, while
security measures help protect sensitive data and prevent unauthorized
access. Here are some key aspects of resource management and security in
cloud computing:

• Resource Management: This involves managing the allocation of
resources, such as virtual machines, storage, and network bandwidth, to
ensure that applications and services have the resources they need to
operate effectively. Cloud providers offer a range of tools and services to
help organizations manage their resources efficiently, including auto-
scaling, load balancing, and monitoring tools.
• Security: Cloud providers offer a range of security services to help
protect cloud resources from unauthorized access, data loss, and other
security threats. These services include identity and access management,
network security, data encryption, and threat detection and response.
• Compliance: Cloud providers also help organizations meet regulatory
and compliance requirements by providing compliance certifications,
auditing and reporting tools, and other compliance-related services.
• Disaster Recovery: Disaster recovery is an essential component of cloud
security that involves preparing for and recovering from potential
disasters such as natural disasters or cyber-attacks. Cloud providers offer
a range of disaster recovery services, including data backup and
recovery, failover, and replication.
• Monitoring and Reporting: Cloud providers offer tools and services to
help organizations monitor and report on their cloud resources'
performance and security. This includes monitoring for security threats,
network traffic, and resource utilization, as well as providing reports on
compliance and security posture.
Effective resource management and security in the cloud require a
combination of technology, process, and people. Cloud providers offer a
range of tools and services to help organizations manage their resources and
security effectively, and organizations need to develop policies and
procedures that align with their business objectives and regulatory
requirements.

10.5.1 Inter Cloud Resource Management


Inter-cloud resource management refers to the process of managing resources
across multiple cloud environments, such as public, private, and hybrid
clouds. It involves managing resources such as compute, storage, and
network across different cloud providers and ensuring that they are utilized
effectively. Here are some key aspects of inter-cloud resource management:
• Resource Allocation: Inter-cloud resource management involves
allocating resources to applications and services across multiple cloud
environments. This can be challenging because different cloud providers
may have different resource allocation models and pricing structures.
• Load Balancing: Load balancing is a critical aspect of inter-cloud
resource management that involves distributing workloads across
multiple cloud environments to ensure that resources are utilized
effectively. Load balancing helps avoid over-provisioning of resources in
a single cloud environment and ensures that workloads are balanced
across multiple environments.
• Interoperability: Inter-cloud resource management requires
interoperability between different cloud providers to enable seamless
management of resources across multiple clouds. Cloud providers should
provide APIs that allow organizations to manage their resources
programmatically.
• Security: Inter-cloud resource management requires robust security
measures to protect data and resources across multiple cloud
environments. Security measures should include data encryption, identity
and access management, and threat detection and response.
• Cost Management: Inter-cloud resource management requires effective
cost management to ensure that resources are utilized efficiently across
multiple cloud environments. This includes monitoring resource
utilization and implementing cost optimization strategies such as using
reserved instances and spot instances.
Effective inter-cloud resource management requires a combination of
technology, process, and people. Cloud providers offer a range of tools and
services to help organizations manage their resources across multiple clouds
effectively, and organizations need to develop policies and procedures that
align with their business objectives and regulatory requirements.

10.5.2 Resource Provisioning and Resource Provisioning Methods
Resource provisioning in cloud computing refers to the process of allocating
computing resources, such as virtual machines, storage, and network
bandwidth, to support cloud-based applications and services. Resource
provisioning can be done using several methods, depending on the needs of
the application or service. Here are some common resource provisioning
methods:

• Manual Provisioning: In this method, administrators manually allocate
resources to applications and services based on their requirements. This
approach is suitable for small-scale deployments or applications with
predictable resource requirements. However, it can be time-consuming
and error-prone.
• Automated Provisioning: This method involves automating the process
of resource allocation using tools such as configuration management and
orchestration tools. Automation enables faster and more reliable
provisioning of resources and allows administrators to respond to
changes in demand quickly.
• On-Demand Provisioning: In this method, resources are allocated on-
demand as needed by the application or service. This approach is suitable
for applications with unpredictable resource requirements or those that
experience sudden spikes in demand.
• Dynamic Provisioning: This method involves dynamically adjusting
resource allocation based on changes in demand. For example, additional
virtual machines may be provisioned during peak demand periods and
de-provisioned during low demand periods.
• Capacity Provisioning: Capacity provisioning involves allocating
resources to applications or services based on expected demand. This
approach involves predicting future demand based on historical data and
allocating resources accordingly.

Effective resource provisioning requires careful planning and management.
Cloud providers offer a range of tools and services to help organizations
manage their resources effectively, including auto-scaling, load balancing,
and monitoring tools. Organizations need to develop policies and procedures
that align with their business objectives and regulatory requirements to
ensure that resources are allocated effectively and efficiently.
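Dynamic provisioning ultimately reduces to a scaling rule. The deliberately simplified Python sketch below decides whether to add or release an instance based on average CPU utilisation; the thresholds and instance limits are assumed values, not any provider's defaults.

# A simplified dynamic-provisioning rule: scale out when average CPU is high,
# scale in when it is low.
def desired_instance_count(current, avg_cpu_percent, min_n=2, max_n=10):
    if avg_cpu_percent > 75 and current < max_n:
        return current + 1      # provision one more instance
    if avg_cpu_percent < 25 and current > min_n:
        return current - 1      # release one instance
    return current              # demand is within the target band

print(desired_instance_count(current=4, avg_cpu_percent=82))  # -> 5
print(desired_instance_count(current=4, avg_cpu_percent=18))  # -> 3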

10.5.3 Global Exchange of Cloud Resources


The global exchange of cloud resources refers to the ability to share cloud
resources, such as computing power, storage, and network bandwidth, across
different geographic locations and between different cloud providers. This
can help organizations better utilize their resources and access resources that
are not available locally. Here are some key aspects of the global exchange of
cloud resources:

• Interoperability: Interoperability is a critical aspect of global exchange
of cloud resources. Cloud providers should provide standard APIs and
protocols that enable seamless sharing of resources across different cloud
environments.

• Federation: Federation is another approach to the global exchange of
cloud resources. In a federated cloud environment, multiple cloud
providers collaborate to offer a unified cloud environment to customers.
This approach enables customers to access resources from different
cloud providers through a single interface.
• Resource Management: Effective resource management is essential for
the global exchange of cloud resources. Cloud providers should offer
tools and services that enable customers to manage their resources
effectively across different cloud environments.
• Security: Security is another critical aspect of the global exchange of
cloud resources. Cloud providers should offer robust security measures
to protect customer data and resources when sharing resources across
different cloud environments.
• Billing and Metering: Billing and metering are essential aspects of the
global exchange of cloud resources. Cloud providers should offer
transparent billing and metering models that enable customers to track
their resource usage and costs across different cloud environments.
Effective global exchange of cloud resources requires collaboration between
different cloud providers and effective management of resources and
security. Cloud providers should offer standard APIs and protocols that
enable seamless sharing of resources across different cloud environments and
provide tools and services to enable effective resource management and
security. Organizations need to develop policies and procedures that align
with their business objectives and regulatory requirements to ensure that
resources are shared effectively and efficiently.

10.5.4 Cloud Security Challenges


Cloud security challenges refer to the potential risks and threats associated
with storing data and running applications in a cloud environment. Some of
the key cloud security challenges are:
• Data Breaches: One of the most significant security challenges in the
cloud is the risk of data breaches. Hackers or malicious insiders can
exploit vulnerabilities in the cloud infrastructure to gain unauthorized
access to sensitive data.
• Insider Threats: Insider threats are a significant concern in the cloud
environment. Malicious insiders, such as employees or contractors, may
misuse their privileges to access and steal sensitive data.
• Cloud Service Provider Security: Cloud service providers are
responsible for the security of the cloud infrastructure, but organizations
still need to ensure that their data is adequately protected. Organizations
need to verify that cloud service providers have appropriate security
measures in place.
• Regulatory Compliance: Organizations need to comply with various
regulations and standards to protect sensitive data, such as HIPAA, PCI
DSS, and GDPR. Cloud service providers must also comply with these
regulations and standards to ensure that customer data is adequately
protected.
• Data Loss: Data loss can occur due to various reasons, including
accidental deletion, hardware failure, or natural disasters. Organizations
need to have backup and recovery mechanisms in place to ensure that
critical data can be recovered in case of data loss.
• Lack of Control: In a cloud environment, organizations may not have
complete control over their data and applications. This lack of control
can create security risks and make it difficult to manage the security of
the cloud environment.

Effective cloud security requires a combination of technology, process, and
people. Organizations need to have appropriate security measures in place,
such as access controls, encryption, and threat detection and response. They
also need to develop policies and procedures that align with their business
objectives and regulatory requirements to ensure that their data is adequately
protected. Finally, organizations need to ensure that their employees are
adequately trained in cloud security best practices.

10.5.5 Security Governance – Virtual Machine Security – IAM – Security Standards
Security governance, virtual machine security, IAM (Identity and Access
Management), and security standards are essential components of cloud
security.

• Security Governance: Security governance refers to the framework of
policies, processes, and controls that an organization uses to manage and
protect its IT assets. In the context of cloud security, security governance
is critical for ensuring that cloud services are used in a secure and
compliant manner. Effective security governance involves the
development of policies and procedures, risk management, compliance
management, and regular security assessments.
• Virtual Machine Security: Virtual machine security is essential for
protecting the virtual infrastructure that runs in the cloud. Virtual
machines are vulnerable to a range of threats, including malware,
unauthorized access, and data theft. Organizations need to ensure that
virtual machines are adequately protected with security controls such as
firewalls, encryption, and vulnerability management.
• IAM (Identity and Access Management): IAM is a critical component
of cloud security that provides a framework for managing user identities
and access to cloud resources. IAM includes authentication,
authorization, and access control mechanisms that ensure that only
authorized users can access cloud resources. Organizations need to
ensure that IAM policies are implemented and enforced correctly to
prevent unauthorized access to sensitive data.
• Security Standards: Security standards are a set of guidelines and best
practices that help organizations to ensure that their cloud environment is
secure and compliant. Examples of security standards include ISO
27001, PCI DSS, and NIST. Compliance with these standards helps
organizations to demonstrate that they have implemented appropriate
security measures to protect their data and meet regulatory requirements.
Effective cloud security requires a combination of technology, process, and
people. Organizations need to implement appropriate security controls,
develop policies and procedures, and train their employees to ensure that
cloud services are used securely and in compliance with applicable
regulations and standards. By adopting a proactive approach to cloud
security, organizations can mitigate the risks associated with cloud
computing and protect their sensitive data from unauthorized access and data
breaches.
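To make the IAM discussion above more concrete, the short sketch below expresses a least-privilege access policy as a Python dictionary in the JSON style used by AWS IAM; the bucket name and the specific actions granted are illustrative assumptions rather than a recommendation for any particular provider.

    import json

    # A minimal, hypothetical access policy in the AWS IAM JSON style:
    # it grants read-only access to a single storage bucket and nothing
    # else (least privilege). All names are illustrative only.
    read_only_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowReadOnlyReports",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    "arn:aws:s3:::example-reports-bucket",
                    "arn:aws:s3:::example-reports-bucket/*",
                ],
            }
        ],
    }

    print(json.dumps(read_only_policy, indent=2))

In practice such policies are attached to users, groups, or roles through the provider's IAM service, and reviewed regularly as part of security governance.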

10.6 CLOUD TECHNOLOGIES AND ADVANCEMENTS
Cloud technologies and advancements have evolved rapidly in recent years,
and they continue to transform the way businesses operate. Some of the key
advancements in cloud technologies are:

• Serverless Computing: Serverless computing allows developers to
build and run applications without the need to manage servers. In
serverless computing, cloud providers manage the infrastructure and
automatically scale resources based on application demand.
• Artificial Intelligence and Machine Learning: Cloud providers offer
AI and machine learning services that enable organizations to build
intelligent applications that can automate processes, analyse data, and
provide insights.
• Multi-Cloud: Multi-cloud is the practice of using multiple cloud
providers to meet different business needs. Multi-cloud enables
organizations to avoid vendor lock-in, reduce costs, and improve
resilience.
• Edge Computing: Edge computing brings computation and data storage
closer to the end-users, reducing latency and improving performance.
Edge computing is particularly useful for applications that require real-
time processing, such as IoT devices.
• Hybrid Cloud: Hybrid cloud is a combination of private and public
cloud environments. Hybrid cloud enables organizations to leverage the
benefits of public cloud, such as scalability and cost-effectiveness, while
maintaining control over their sensitive data in private cloud
environments.
• Containers: Containers enable developers to build and deploy
applications quickly and reliably. Containers provide a lightweight
alternative to virtual machines and enable applications to run consistently
across different environments.
• Cloud-Native Architecture: Cloud-native architecture is an approach to
building applications that leverage cloud technologies, such as
containers, microservices, and API gateways. Cloud-native architecture
enables organizations to build scalable, resilient, and agile applications
that can be deployed in any cloud environment.
These advancements in cloud technologies offer significant benefits to
organizations, such as increased agility, scalability, and cost-effectiveness.
As cloud technologies continue to evolve, organizations must keep up with
these advancements to stay competitive and realize the full potential of the
cloud.

10.6.1 Hadoop
Hadoop is an open-source framework that is used for distributed storage and
processing of large data sets. The framework is designed to be scalable, fault-
tolerant, and cost-effective, making it an ideal solution for big data
processing. Hadoop consists of two main components:

Hadoop Distributed File System (HDFS): HDFS is a distributed file system
that provides reliable and scalable storage of large data sets. It is designed to
run on commodity hardware and provides a fault-tolerant mechanism that
ensures data is not lost in the event of a hardware failure.
MapReduce: MapReduce is a programming model that is used for
processing large data sets in a distributed environment. It works by breaking
down large data sets into smaller chunks and processing them in parallel
across a cluster of computers. MapReduce provides a fault-tolerant
mechanism that ensures that processing can continue even if some nodes in
the cluster fail.
Hadoop also includes a range of additional tools and technologies that are
designed to work with HDFS and MapReduce, such as:
Hadoop YARN: YARN is a resource manager that enables multiple data
processing engines to share a common cluster. It provides a framework for
managing resources and scheduling tasks across a cluster of computers.

Hadoop Common: Hadoop Common includes common utilities that are
used by the other Hadoop components. It provides a set of libraries and
utilities that enable developers to build applications that run on Hadoop.

Hadoop Ecosystem: The Hadoop ecosystem includes a range of tools and
technologies that are built on top of Hadoop, such as Apache Spark, Apache
Hive, and Apache Pig. These tools provide additional functionality for
processing and analysing large data sets.

Hadoop has become a popular solution for big data processing due to its
scalability, fault-tolerance, and cost-effectiveness. Many organizations,
including large enterprises, use Hadoop to process and analyse large data
sets, such as log files, sensor data, and social media data.

10.6.2 VirtualBox


VirtualBox is a free and open-source virtualization software that allows users
to run multiple operating systems on a single computer. It was developed by
Oracle Corporation and was initially released in 2007. With VirtualBox,
users can create virtual machines (VMs) that simulate an entire computer
system, including its hardware and operating system. This allows users to run
multiple operating systems on a single computer simultaneously, which can
be useful for software testing, running legacy applications, or experimenting
with new operating systems without affecting the host system.

VirtualBox supports a wide range of operating systems, including Windows,
Linux, macOS, and Solaris. It also supports a variety of virtual hardware
components, including CPUs, memory, disks, and network interfaces, which
can be configured to match the requirements of the guest operating system.

Overall, VirtualBox is a powerful and flexible tool for virtualization, and its
open-source nature makes it a popular choice among developers and
enthusiasts alike.

10.6.3 Google App Engine


Google App Engine (GAE) is a cloud computing platform developed by
Google that allows developers to build and host web applications on Google's
infrastructure. It was first introduced in 2008 and has since become a popular
choice for developing and deploying web applications. GAE supports a
variety of programming languages, including Java, Python, Go, Node.js,
PHP, and Ruby. It provides a scalable and managed platform for developers,
meaning that Google takes care of the underlying infrastructure, such as
servers, networking, and storage, allowing developers to focus on building
their applications.

GAE provides several key features for building web applications, such as
automatic scaling, load balancing, and a NoSQL datastore. It also offers
integration with other Google Cloud Platform services, such as Google Cloud
Storage and Google Cloud SQL. In addition, GAE offers a flexible pricing
model, with both free and paid tiers depending on usage and resources
required. The free tier allows developers to test and deploy their applications
at no cost, while the paid tier offers additional features and resources for
larger-scale applications.

Overall, Google App Engine is a powerful platform for building and
deploying web applications, offering scalability, reliability, and flexibility for
developers.
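As a rough illustration of how little application code a GAE deployment can require, the sketch below shows a minimal Python web handler of the kind used in App Engine's standard environment; the framework choice (Flask) and the runtime named in the comment are assumptions that should be checked against Google's current documentation.

    # main.py -- a minimal web application that could be deployed to Google
    # App Engine's standard environment (an accompanying app.yaml declaring,
    # for example, "runtime: python39" is assumed).
    from flask import Flask

    app = Flask(__name__)

    @app.route("/")
    def hello():
        # App Engine routes incoming HTTP requests to this handler and
        # scales the number of instances automatically.
        return "Hello from Google App Engine!"

    if __name__ == "__main__":
        # Local testing only; in production App Engine serves the app
        # behind its own managed front end.
        app.run(host="127.0.0.1", port=8080, debug=True)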

10.6.4 OpenStack


OpenStack is a free and open-source software platform for building and
managing cloud computing infrastructure. It was first released in 2010 and is
developed by a community of contributors from around the world, including
major technology companies like IBM, Red Hat, and Intel. OpenStack
provides a set of modular services that can be used to create a private or
public cloud infrastructure. These services include compute (Nova),
networking (Neutron), storage (Cinder), identity (Keystone), and dashboard
(Horizon), among others. The modular nature of OpenStack allows users to
select and configure only the services they need, making it highly
customizable and scalable.
OpenStack can be used to build public, private, or hybrid cloud
environments, and is compatible with a variety of hardware and software
configurations. It can be deployed on-premises, on dedicated hardware, or on
public cloud platforms like AWS and Google Cloud Platform. One of the key
benefits of OpenStack is its open-source nature, which allows users to freely
access and modify the source code. This enables a community of developers
to collaborate on new features and bug fixes, leading to rapid innovation and
improvements.
Overall, OpenStack is a powerful platform for building and managing cloud
infrastructure, offering flexibility, scalability, and a strong community of
contributors.
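As a brief illustration, the sketch below uses the openstacksdk Python library to connect to an OpenStack cloud and list compute instances and images; the cloud name refers to a hypothetical entry in a local clouds.yaml configuration, and the calls shown should be verified against the SDK documentation for the release in use.

    # A sketch using the openstacksdk library (assumed installed with
    # "pip install openstacksdk"). "my-private-cloud" is a hypothetical
    # entry in clouds.yaml that holds the authentication details.
    import openstack

    conn = openstack.connect(cloud="my-private-cloud")

    # List compute instances managed by the Nova service.
    for server in conn.compute.servers():
        print(server.name, server.status)

    # List images available for booting new instances.
    for image in conn.image.images():
        print(image.name)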

10.6.5 Federation in the Cloud – Four Levels of Federation


Federation in the cloud refers to the ability to connect and manage multiple
cloud environments as a single, cohesive unit. This is particularly important
for organizations that use multiple cloud providers or have a hybrid cloud
environment that includes both on-premises and cloud resources. There are
four levels of federation in the cloud:

• Identity Federation: This level of federation allows users to
authenticate and access resources across multiple cloud environments
using a single set of credentials. This can be achieved using standard
identity protocols like SAML or OAuth.
• Resource Federation: This level of federation allows resources, such as
virtual machines or storage, to be shared across multiple cloud
environments. This requires a common management platform that can
orchestrate the deployment and management of resources across multiple
clouds.
• Service Federation: This level of federation allows cloud services to be
accessed and consumed across multiple cloud environments. This
requires a common service catalogue that can identify and provision
services across multiple clouds.
• Policy Federation: This level of federation allows policies and
governance to be applied consistently across multiple cloud
environments. This requires a common policy engine that can enforce
policies across multiple clouds.

Overall, federation in the cloud enables organizations to take advantage of
the benefits of multiple cloud environments while maintaining control and
consistency across their infrastructure. By leveraging the four levels of
federation, organizations can achieve greater flexibility, scalability, and
agility in their cloud operations.

10.7 CASE STUDIES


Here are some examples of case studies in cloud computing:

• Netflix: Netflix, the popular streaming service, is a prime example of a
company that has fully embraced cloud computing. Netflix has built its
entire technology infrastructure on Amazon Web Services (AWS) and
uses the cloud to store, manage, and stream its vast library of video
content. Netflix has leveraged the scalability and reliability of AWS to
support its rapid growth and to deliver high-quality streaming content to
millions of users around the world.

• GE: General Electric (GE) is another example of a company that has
successfully adopted cloud computing. GE uses Microsoft Azure to
support its Predix platform, which is a cloud-based platform for
managing and analyzing data from industrial sensors and equipment. GE
has leveraged the scalability and agility of the cloud to enable real-time
monitoring and predictive maintenance of its industrial equipment,
improving efficiency and reducing downtime.

• Capital One: Capital One, the financial services company, has adopted
cloud computing to improve its agility and innovation. Capital One has
used AWS to build a cloud-based platform for developing and deploying
new financial products and services. The platform has enabled Capital
One to quickly experiment with new ideas and bring them to market
faster, while also reducing costs and improving scalability.

• Unilever: Unilever, the consumer goods company, has adopted a hybrid
cloud strategy that combines on-premises infrastructure with cloud-based
services. Unilever uses AWS and Microsoft Azure to support its various
business units, while also maintaining control over its core IT
infrastructure. The hybrid cloud strategy has enabled Unilever to balance
agility and control, while also reducing costs and improving scalability.
• Airbnb: Airbnb, the popular vacation rental platform, uses cloud
computing to support its vast and complex technology infrastructure.
Airbnb has built its technology stack on AWS, using a wide range of
cloud-based services to manage its website, mobile app, and data
analytics. Airbnb has leveraged the scalability and agility of the cloud to
support its rapid growth and to deliver a seamless and user-friendly
experience to its millions of users around the world.

These are just a few examples of how cloud computing has enabled
companies to improve their agility, innovation, scalability, and cost-
effectiveness.

10.8 SUMMARY
In summary, cloud computing is a technology that allows organizations to
store, manage, and access data and applications over the internet, rather than
on local servers or personal computers. Cloud computing offers several
benefits, including scalability, cost-effectiveness, flexibility, and reliability.
Cloud computing can be classified into several models, including public
cloud, private cloud, hybrid cloud, and multi-cloud. Cloud computing has
also enabled the development of new technologies, such as serverless
computing, edge computing, and Kubernetes, which offer new ways of
managing and deploying applications in the cloud. Cloud computing has
become increasingly popular in recent years, with many companies, from
small startups to large enterprises, adopting cloud-based solutions to improve
their operations and competitiveness. Overall, cloud computing is a
transformative technology that is changing the way we work, communicate,
and do business in the digital age.

10.9 SELF – ASSESSMENT EXERCISES


1. What is cloud computing?
2. What are the benefits of cloud computing?
3. What are the service models of cloud computing?
4. What is the difference between public and private clouds?
5. What is a hybrid cloud?
6. What is serverless computing?
7. What is edge computing?
8. What are the security concerns related to cloud computing?
9. What are the emerging trends in cloud computing?
10. What are the advantages of using a multi-cloud strategy?

10.10 KEYWORDS
1. Cloud computing: A technology that allows users to store, manage, and
access data and applications over the internet, rather than on local servers
or personal computers.
2. Public cloud: A cloud computing model in which cloud services are
provided by third-party providers, accessible to anyone on the internet.
3. Private cloud: A cloud computing model in which cloud services are
provided exclusively for a single organization, either on-premises or by a
third-party provider.
4. Hybrid cloud: A cloud computing model that combines public and
private cloud services, allowing organizations to take advantage of the
strengths of each model.
5. Multi-cloud: A cloud computing model that involves using multiple
cloud services from different providers to achieve greater flexibility,
resilience, and cost-effectiveness.
6. Infrastructure as a Service (IaaS): A cloud computing service model
in which providers offer virtualized computing resources, such as
servers, storage, and networking, over the internet.
7. Platform as a Service (PaaS): A cloud computing service model in
which providers offer a platform for developing, deploying, and
managing applications over the internet.
8. Software as a Service (SaaS): A cloud computing service model in
which providers offer software applications over the internet, accessible
through web browsers or APIs.
9. Serverless computing: A cloud computing model in which cloud
providers manage the infrastructure and automatically allocate resources
based on the demands of the application, allowing developers to focus on
writing and deploying code.
10. Edge computing: A paradigm shift in cloud computing that enables data
processing to be done closer to the source of data, instead of sending all
data to the cloud.

10.11 FURTHER READINGS


1. Armbrust, M., & Stoica, I. (2017). Serverless computing: Economic and
architectural impact. Proceedings of the VLDB Endowment, 10(12),
1734-1737.
2. Armbrust, M., Fox, A., Griffith, R., Joseph, A. D., Katz, R., Konwinski,
A., ... & Zaharia, M. (2010). A view of cloud computing.
Communications of the ACM, 53(4), 50-58.
3. Buyya, R., Yeo, C. S., Venugopal, S., Broberg, J., & Brandic, I. (2009).
Cloud computing and emerging IT platforms: Vision, hype, and reality
for delivering computing as the 5th utility. Future Generation Computer
Systems, 25(6), 599-616.
4. Cloud Computing (Principles and Paradigms), Edited by Rajkumar
Buyya, James Broberg, Andrzej Goscinski, John Wiley & Sons, Inc.
2011.
5. Cloud Computing: A Practical Approach - Anthony T. Velte, Toby J. Velte,
Robert Elsenpeter, Tata McGraw-Hill, New Delhi, 2010.
6. Cloud computing for dummies- Judith Hurwitz, Robin Bloor, Marcia
Kaufman, Fern Halper, Wiley Publishing, Inc, 2010.
7. Cloud Computing: Web-Based Applications That Change the Way You
Work and Collaborate Online - Michael Miller - Que 2008.
8. Dan C. Marinescu, Cloud Computing Theory and Practice, Morgan
Kaufmann, Elsevier 2013.
9. Hausman, K. K., Cook, S. L., & Sampaio, T. (2013). Cloud Essentials:
CompTIA Authorized Courseware for Exam CLO-001. John Wiley &
Sons.
10. Hurwitz, J. S., & Kirsch, D. (2020). Cloud computing for dummies. John
Wiley & Sons.
11. Jansen, W. A., & Grance, T. (2011). Guidelines on security and privacy
in public cloud computing. National Institute of Standards and
Technology, 57(7), 1-84.
12. Kai Hwang, Geoffrey C. Fox and Jack J. Dongarra, “Distributed and
cloud computing from Parallel Processing to the Internet of Things”,
Morgan Kaufmann, Elsevier, 2012.
13. Kshetri, N. (2014). Privacy and security issues in cloud computing: The
role of institutions and institutional evolution. Telecommunications
Policy, 38(9), 634-643.
14. Marston, S., Li, Z., Bandyopadhyay, S., Zhang, J., & Ghalsasi, A.
(2011). Cloud computing—The business perspective. Decision Support
Systems, 51(1), 176-189.
15. Mell, P. (2018). Edge computing. National Institute of Standards and
Technology. Special Publication, 500-325.
16. Mell, P., & Grance, T. (2011). The NIST definition of cloud computing.
National Institute of Standards and Technology, 53(6), 50.
17. Rajkumar Buyya, Christian Vecchiola, and Thamarai Selvi, Mastering
Cloud Computing, McGraw Hill Education.
18. Rittinghouse, J. W., & Ransome, J. F. (2016). Cloud computing:
Implementation, management, and security. CRC Press.
19. Srinivasan, A. (2014). Cloud Computing: A practical approach for
learning and implementation. Pearson Education India.
20. Thomas, E., Zaigham, M., & Ricardo, P. (2013). Cloud Computing
Concepts, Technology & Architecture.
21. Tim Mather, Subra Kumaraswamy, and Shahed Latif, “Cloud Security
and Privacy: An Enterprise Perspective on Risks and Compliance”,
O’Reilly, 2009.
22. Vaquero, L. M., Rodero-Merino, L., & Caceres, J. (2008). A break in the
clouds: towards a cloud definition. ACM SIGCOMM Computer
Communication Review, 39(1), 50-55.

UNIT 11 BIG DATA

Objectives
After studying this unit, you will be able to:
• Understanding the concept of Big Data, its characteristics, and the
challenges associated with it.
• Familiarizing with the Hadoop ecosystem and its components.
• Understanding the basics of MapReduce.
• Learning the utility of Pig, a high-level platform for creating MapReduce
programs, to process and analyse data.
• Understanding the basics of machine learning algorithms for Big Data
analytics.

Structure
11.0 Introduction to Big Data
11.0.1 Data Storage and Analysis
11.0.2 Characteristics of Big Data
11.0.3 Big Data Classification
11.0.4 Big Data Handling Techniques
11.0.5 Types of Big Data Analytics
11.0.6 Typical Analytical Architecture
11.0.7 Challenges in Big Data Analytics
11.0.8 Case Studies: Big Data in Marketing and Sales, Healthcare, Medicine, and
Advertising
11.1 Hadoop Framework & Ecosystem
11.1.1 Requirement of Hadoop Framework
11.1.2 Map Reduce Framework
11.1.3 Hadoop Yarn and Hadoop Execution Model
11.1.4 Introduction to Hadoop Ecosystem Technologies
11.1.5 Databases: HBase, Hive
11.1.6 Scripting language: Pig, Streaming: Flink, Storm
11.2 Spark Framework
11.3 Machine Learning Algorithms for Big Data Analytics
11.4 Recent Trends in Big Data Analytics
11.5 Summary
11.6 Self–Assessment Exercises
11.7 Keywords
11.8 Further Readings

11.0 INTRODUCTION TO BIG DATA
Big data refers to the vast amount of structured and unstructured data that is
generated and collected by individuals, organizations, and machines every
day. Data is too large and complex to be processed by traditional data
processing applications, which often have limitations in terms of their
capacity to store, process, and analyse large datasets.

To process and analyse big data, specialized technologies and tools such as
Hadoop, Spark, and NoSQL databases have been developed. These tools
allow organizations to store, process, and analyse large volumes of data
quickly and efficiently. The insights derived from big data can be used to
make informed decisions, identify trends and patterns, improve customer
experiences, and enhance operational efficiency.

11.0.1 Data Storage and Analysis


Data storage and analysis are two critical components of big data processing.
As mentioned earlier, big data is too large and complex to be processed by
traditional data processing applications, and therefore specialized
technologies and tools have been developed to store and analyse big data.
One popular technology for storing big data is Hadoop, which is an open-
source software framework that allows for distributed storage and processing
of large datasets across clusters of computers. Hadoop is designed to handle
both structured and unstructured data, and it uses a distributed file system
called Hadoop Distributed File System (HDFS) to store data across multiple
machines.
Once data is stored, it can be analysed using specialized tools such as Spark,
which is an open-source data processing engine that allows for fast and
efficient processing of large datasets. Spark uses an in-memory processing
model to speed up computations, and it can handle both batch and streaming
data processing. Another popular tool for big data analysis is NoSQL
databases, which are non-relational databases that can handle large volumes
of structured and unstructured data with high scalability and availability.

In addition to these specialized technologies and tools, data scientists and
analysts can use programming languages such as Python and R to perform
data analysis and create visualizations to help interpret and communicate the
results of the analysis. Machine learning algorithms and artificial intelligence
techniques can also be used to derive insights from big data and make
predictions and recommendations.
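For example, an analyst working in Python might use the pandas library for an initial look at a dataset before deeper analysis; the file name and column names below are purely illustrative.

    # A small illustrative analysis step in Python using pandas.
    # "transactions.csv" and its columns (region, amount) are hypothetical.
    import pandas as pd

    df = pd.read_csv("transactions.csv")

    # Basic profiling: shape, data types, and missing values per column.
    print(df.shape)
    print(df.dtypes)
    print(df.isna().sum())

    # A simple aggregation: total, average, and count of transactions per region.
    summary = df.groupby("region")["amount"].agg(["sum", "mean", "count"])
    print(summary)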

11.0.2 Characteristics of Big Data


Big data can be characterized by several distinct features as presented in
figure 1, often referred to as "4 Vs", which are volume, velocity, variety, and
veracity.

• Volume: Refers to the sheer amount of data generated by businesses,
individuals, and machines every day. With the increase in the use of IoT
devices and social media, data volumes continue to grow at an
unprecedented rate.
• Velocity: Refers to the speed at which data is generated, processed, and
analysed. Big data requires rapid processing of data, including both
streaming and batch processing.
• Variety: Refers to the diverse types of data generated by businesses,
individuals, and machines, including structured, semi-structured, and
unstructured data. Big data requires the ability to manage and analyse
various data types.
• Veracity: Refers to the accuracy and reliability of the data, which may
vary due to data quality issues, noise, and other factors. Big data requires
careful consideration of data quality and data cleansing techniques to
ensure accurate analysis.

Figure 1: Characteristics of Big Data

In addition to these four Vs, big data can also be characterized by several
other features, including:

• Variability: Refers to the inconsistency in the data, which may result
from changes in the data sources, data formats, and data quality.
• Complexity: Refers to the difficulty in understanding and processing
large and diverse datasets, including handling unstructured data, dealing
with data integration challenges, and identifying patterns and trends.
• Accessibility: Refers to the ability to access and share data across
different platforms and systems while ensuring data privacy, security,
and compliance with regulatory requirements.
Understanding the characteristics of big data is essential for businesses and
organizations to effectively collect, manage, and analyse data to gain insights
and make informed decisions.

11.0.3 Big Data Classification
Big data can be classified based on several different criteria, such as the
source, the structure, the application, and the analytics approach. Here are
some common classifications of big data:

• Structured, semi-structured, and unstructured: This classification is
based on the structure of the data. Structured data is well-organized and
easy to process, like data in a database. Semi-structured data, such as
XML or JSON, has a defined structure but may also contain unstructured
data. Unstructured data, such as social media posts or images, does not
have a specific structure and is difficult to process.
• Internal and external: This classification is based on the source of the
data. Internal data is generated within an organization, such as sales data
or customer data. External data comes from sources outside the
organization, such as social media data, weather data, or financial data.
• Batch and real-time: This classification is based on the velocity of the
data. Batch data processing involves analysing data in large batches,
often overnight or at set intervals. Real-time data processing involves
analysing data as it is generated, like processing stock market data in
real-time.
• Descriptive, diagnostic, predictive, and prescriptive: This
classification is based on the analytics approach. Descriptive analytics
involves summarizing historical data to understand what has happened.
Diagnostic analytics involves identifying the causes of a particular event
or pattern. Predictive analytics involves forecasting future events or
patterns based on historical data. Prescriptive analytics involves
recommending actions based on insights from predictive analytics.

By understanding the different classifications of big data, businesses and
organizations can better plan and implement their big data strategies to
extract insights and drive value from their data.

11.0.4 Big Data Handling Techniques


There are several techniques and tools available to handle big data
effectively. Here are some of the most common big data handling techniques:

• Distributed File Systems: Distributed file systems such as Hadoop
Distributed File System (HDFS) and Apache Cassandra enable
distributed storage and processing of big data across a cluster of
computers.
• In-memory Data Processing: In-memory data processing systems such
as Apache Spark and Apache Flink allow for faster processing of big
data by storing the data in memory rather than on disk.
• NoSQL Databases: NoSQL databases such as MongoDB and Cassandra
are designed to handle unstructured and semi-structured data and provide
high scalability and availability.

• MapReduce: MapReduce is a programming model that is used to
process large datasets in parallel across a cluster of computers.
• Data Compression: Data compression techniques such as gzip and
bzip2 can be used to reduce the size of data, making it easier to transfer
and store.
• Data Partitioning: Data partitioning involves dividing a large dataset
into smaller subsets to enable distributed processing.
• Cloud Computing: Cloud computing platforms such as Amazon Web
Services (AWS) and Microsoft Azure provide scalable and cost-effective
solutions for storing and processing big data.
• Machine learning: Machine learning techniques can be used to analyse
big data and identify patterns and insights that can help organizations
make informed decisions.

By using these techniques, businesses and organizations can handle big data
more effectively, extract insights, and derive value from their data.
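Two of the handling techniques listed above, data compression and data partitioning, can be illustrated with nothing more than Python's standard library; the file names below are hypothetical.

    # Illustrating two handling techniques with the Python standard library:
    # gzip compression and simple data partitioning. File names are hypothetical.
    import gzip
    import shutil

    # 1) Compress a large text file to reduce storage and transfer size.
    with open("web_server.log", "rb") as src, gzip.open("web_server.log.gz", "wb") as dst:
        shutil.copyfileobj(src, dst)

    # 2) Partition a large collection of records into smaller chunks so that
    #    each chunk can be processed independently (e.g., on separate workers).
    def partition(records, chunk_size):
        for start in range(0, len(records), chunk_size):
            yield records[start:start + chunk_size]

    records = list(range(1_000_000))          # stand-in for a large dataset
    for i, chunk in enumerate(partition(records, 250_000)):
        print(f"partition {i}: {len(chunk)} records")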

11.0.5 Types of Big Data Analytics


Big data analytics is the process of examining large and complex datasets to
uncover hidden patterns, unknown correlations, and other useful information
that can help organizations make informed decisions. Big data analytics
involves the use of advanced technologies and tools to process, store, and
analyse large volumes of structured, semi-structured, and unstructured data.
There are several types of big data analytics techniques as presented in table
1 below:

Table 1: Types of Big Data Analytics

Types of Analytics | Answers the Question | Description | Level of Advancement
Descriptive | What is happening? | Uses data aggregation & mining techniques to provide insight into the past | Low
Diagnostic | Why is it happening? | Discovers the root cause of the problem. It has the ability to isolate all confounding information | Medium
Predictive | What’s likely to happen? | Historical patterns are used to predict specific outcomes | High
Prescriptive | What do I need to do? | Applies advanced analytical techniques (optimization & simulation algorithms) to advise on possible outcomes & make specific recommendations | Very High

• Descriptive Analytics: This technique involves summarizing historical
data to understand what has happened in the past.
• Diagnostic Analytics: This technique involves analysing data to
determine the causes of a particular event or pattern.
• Predictive Analytics: This technique involves using statistical models
and machine learning algorithms to forecast future events or patterns
based on historical data.
• Prescriptive Analytics: This technique involves recommending actions
based on insights from predictive analytics.

To effectively implement big data analytics, organizations need to have a
clear understanding of their data sources, objectives, and analytical tools.
They also need to have the necessary infrastructure and skilled personnel to
manage and analyse large and complex datasets.

11.0.6 Typical Analytical Architecture


A typical analytical architecture for big data includes several layers, each of
which serves a specific purpose in the data analytics process. Here are the
typical layers of a big data analytical architecture:

• Data Sources: This layer includes all the sources of data, both internal
and external, that an organization collects and stores. These may include
data from customer transactions, social media, web logs, sensors, and
other sources.
• Data Ingestion and Storage: This layer is responsible for ingesting data
from various sources, processing it, and storing it in a format that can be
easily accessed and analysed. This layer may include technologies such
as Hadoop Distributed File System (HDFS) and NoSQL databases.
• Data Processing and Preparation: This layer is responsible for
cleaning, transforming, and preparing data for analysis. This may include
tasks such as data integration, data cleaning, data normalization, and data
aggregation.

• Analytics Engines: This layer includes the technologies and tools used
for analysing and processing data. This may include machine learning
algorithms, statistical analysis tools, and visualization tools.

• Data Presentation and Visualization: This layer includes the tools used
to present data in a meaningful way, such as dashboards, reports, and
visualizations. This layer is critical for making data accessible and
understandable to non-technical stakeholders.

• Data Governance and Security: This layer is responsible for ensuring
that data is managed in a secure and compliant manner. This may include
access controls, data quality monitoring, and data privacy regulations.

By implementing a big data analytical architecture, organizations can
streamline the data analytics process, extract valuable insights, and make
informed decisions that drive business growth and success.
11.0.7 Challenges in Big Data Analytics
Big data analytics is a complex and challenging process, and there are several
challenges that organizations face when trying to extract insights and value
from their data. Here are some of the key challenges in big data analytics:

• Data Complexity and Variety: Big data comes in many different forms,
including structured, semi-structured, and unstructured data, which can
be challenging to process and analyse.
• Data Quality: Big data is often incomplete, inconsistent, or inaccurate,
which can lead to erroneous insights and conclusions.
• Data Security and Privacy: Big data often contains sensitive and
confidential information, which must be protected from unauthorized
access and breaches.
• Scalability: As data volumes grow, the analytical architecture must be
able to scale to handle the increased load, which can be challenging and
costly.
• Talent Shortage: There is a shortage of skilled data scientists and
analysts who are able to process and analyse big data effectively.
• Integration: Big data analytics requires integration with multiple
systems and technologies, which can be challenging to implement and
maintain.
• Data Governance: Big data requires careful management and
governance to ensure compliance with regulations and policies.
• Interpreting Results: Big data analytics often produces large and
complex datasets, which can be challenging to interpret and translate into
actionable insights.

Addressing these challenges requires a combination of technology, processes,
and people. Organizations need to invest in robust analytical architectures,
data quality processes, security and privacy protocols, and talent development
to unlock the full potential of big data analytics.

11.0.8 Case Studies: Big Data in Marketing and Sales, Healthcare, Medicine, and Advertising
Presented here are some examples of how big data is being used in marketing
and sales, healthcare, medicine, and advertising:

• Marketing and Sales: Big data is being used in marketing and sales to
understand customer behaviour and preferences, personalize marketing
messages, and optimize pricing and promotions. For example, Amazon
uses big data to personalize recommendations for individual customers
based on their browsing and purchase history. Walmart uses big data to
optimize pricing and inventory management in its stores. Coca-Cola uses
big data to optimize its vending machine placement, prices, and
promotions based on local weather conditions, events, and consumer
behaviour.
• Healthcare: Big data is being used in healthcare to improve patient
outcomes, reduce costs, and enable personalized medicine. For example,
IBM's Watson Health is using big data to develop personalized cancer
treatments based on a patient's genetic profile and medical history.
Hospitals and healthcare providers are using big data to predict patient
readmission rates, identify patients at risk of developing chronic
conditions, and optimize resource allocation.

• Medicine: Big data is being used in medicine to accelerate drug
discovery and development, identify new treatments and therapies, and
improve clinical trial design. For example, Pfizer is using big data
analytics to mine existing drug data and identify new therapeutic targets.
Novartis is using big data to improve the efficiency of its clinical trials
and accelerate the development of new drugs.

• Advertising: Big data is being used in advertising to target and
personalize ads based on individual consumer preferences and behaviour.
For example, Google and Facebook use big data to target ads to specific
audiences based on their browsing and search history. Netflix uses big
data to recommend movies and TV shows to individual users based on
their viewing history and preferences.
These are just a few examples of how big data is being used in various
industries and sectors. Big data has the potential to transform the way
organizations operate and make decisions, leading to improved efficiency,
productivity, and innovation.

11.1 HADOOP FRAMEWORK & ECOSYSTEM


Hadoop is an open-source framework that is used for storing and processing
large volumes of data across distributed systems. It was originally developed
by Doug Cutting and Mike Cafarella in 2006 and is now maintained by the
Apache Software Foundation. The Hadoop ecosystem consists of several
components, including:
• Hadoop Distributed File System (HDFS): HDFS is a distributed file
system that stores data across multiple nodes in a cluster. It is designed
to handle large files and is fault-tolerant, ensuring that data is always
available even in the event of a hardware or software failure.

• MapReduce: MapReduce is a programming model and software
framework for processing large data sets across distributed systems. It is
used to parallelize data processing tasks and distribute them across
multiple nodes in a Hadoop cluster.

• YARN: YARN (Yet Another Resource Negotiator) is a resource
management system that is used to manage resources in a Hadoop
cluster. It enables the sharing of resources across multiple applications,
making it easier to run multiple applications simultaneously on a Hadoop
cluster.

• HBase: HBase is a NoSQL database that is used to store large volumes
of structured data. It is built on top of Hadoop and provides real-time
access to data stored in HDFS.

• Pig: Pig is a high-level programming language that is used to process
large datasets in Hadoop. It is designed to simplify the programming of
MapReduce jobs and provides a user-friendly interface for data
processing tasks.

• Hive: Hive is a data warehouse system that is built on top of Hadoop. It
provides a SQL-like interface for querying large datasets stored in
HDFS.

• Spark: Spark is a fast and powerful data processing engine that is
designed to handle both batch and real-time processing. It is built on top
of Hadoop and provides a unified platform for data processing, machine
learning, and graph processing.

As depicted in figure 2, the Hadoop ecosystem provides a comprehensive and
powerful platform for storing and processing large volumes of data across
distributed systems. It is widely used in industries such as finance, healthcare,
retail, and telecommunications for data processing, analysis, and machine
learning.

Figure 2: Overview of architecture of data warehouse Hadoop ecosystem components


Source: https://www.researchgate.net/figure/Overview-of-architecture-of-data-warehouse-Hadoop-ecosystem-components_fig3_346482337

11.1.1 Requirement of Hadoop Framework


The Hadoop framework was developed to address the challenges of
processing and analysing large volumes of data that traditional data
processing systems were not designed to handle. Here are some key
requirements that led to the development of Hadoop:

• Scalability: Traditional data processing systems were not designed to
handle the massive volumes of data being generated today. Hadoop was
designed to be scalable, allowing organizations to store and process large
volumes of data across distributed systems.
• Fault-tolerance: As data volumes grow, the probability of hardware or
software failures also increases. Hadoop was designed to be fault-
tolerant, ensuring that data is always available even in the event of a
hardware or software failure.

• Cost-effectiveness: Traditional data processing systems can be
expensive to scale, both in terms of hardware and software. Hadoop was
designed to be cost-effective, using commodity hardware and open-
source software to reduce costs.
• Flexibility: Traditional data processing systems were designed for
specific types of data and processing tasks. Hadoop was designed to be
flexible, allowing organizations to store and process a wide variety of
data types and perform a range of processing tasks.

• Processing Speed: Traditional data processing systems were not
designed to handle real-time data processing and analysis. Hadoop was
designed to be fast, using parallel processing and distributed computing
to speed up data processing and analysis.
Overall, the Hadoop framework was developed to meet the growing demand
for processing and analysing large volumes of data in a cost-effective,
flexible, and scalable manner. Its architecture and ecosystem have enabled
organizations to develop new applications and use cases for big data
processing and analysis.

11.1.2 Map Reduce Framework


MapReduce is a programming model and software framework that is used to
process and analyse large datasets in parallel across a distributed computing
cluster. The MapReduce framework consists of two main components: Map
and Reduce.
1. Map: The Map component is responsible for processing the input data
and producing a set of key-value pairs as output. Each Map task
processes a portion of the input data and generates intermediate key-
value pairs, which are then passed on to the Reduce tasks.

2. Reduce: The Reduce component takes the intermediate key-value pairs
generated by the Map tasks and combines them to produce a final set of
key-value pairs as output. Each Reduce task processes a subset of the
intermediate data generated by the Map tasks, which are grouped by key.
Key features of the MapReduce framework include scalability, fault
tolerance, data locality, and ease of use. Some popular implementations of
the MapReduce framework include Apache Hadoop, Apache Spark, and
Apache Flink. These frameworks provide a range of tools and libraries that
can be used to build complex data processing workflows using the
MapReduce programming model.
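As a minimal sketch of the Map, shuffle, and Reduce phases described above, the plain-Python example below counts words in a few lines of text within a single process; a real MapReduce framework distributes exactly this logic across the nodes of a cluster.

    # A single-process sketch of the MapReduce model: map -> shuffle/group -> reduce.
    # Real frameworks (Hadoop, Spark) run the same logic in parallel across nodes.
    from collections import defaultdict

    lines = ["big data needs big tools", "hadoop processes big data"]

    # Map phase: emit (word, 1) pairs for every word in every input line.
    mapped = [(word, 1) for line in lines for word in line.split()]

    # Shuffle phase: group all intermediate values by key (the word).
    grouped = defaultdict(list)
    for word, count in mapped:
        grouped[word].append(count)

    # Reduce phase: combine the values for each key into the final result.
    word_counts = {word: sum(counts) for word, counts in grouped.items()}
    print(word_counts)   # e.g. {'big': 3, 'data': 2, 'needs': 1, ...}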

11.1.3 Hadoop Yarn and Hadoop Execution Model
Hadoop YARN (Yet Another Resource Negotiator) is a resource
management layer that sits between the Hadoop Distributed File System
(HDFS) and the processing engines, such as MapReduce, Spark, and Tez. It
provides a central platform for managing cluster resources, allocating
resources to different applications, and scheduling jobs across a cluster.

The Hadoop execution model involves the following components:

• Client: The client submits a job to the YARN Resource Manager (RM),
which schedules it across the cluster.

• Resource Manager: The Resource Manager is responsible for managing
the resources in the cluster and scheduling jobs. It allocates resources to
each job based on the application requirements and the available
resources in the cluster.

• Node Manager: The Node Manager runs on each node in the cluster and
is responsible for managing the resources on that node, such as CPU,
memory, and disk space. It reports the available resources back to the
Resource Manager, which uses this information to allocate resources to
different applications.

• Application Master: The Application Master is responsible for
managing the lifecycle of a specific application, such as MapReduce or
Spark. It negotiates with the Resource Manager for resources and
monitors the progress of the application.

• Container: A container is a virtualized environment in which an
application runs. It provides an isolated environment for the application
to run in, with its own allocated resources, such as CPU and memory.

The Hadoop execution model is designed to be highly scalable and fault-
tolerant. It allows multiple applications to run concurrently on the same
cluster, with each application running in its own container. If a node fails, the
Resource Manager can redistribute the workload to other nodes in the cluster,
ensuring that the application continues to run without interruption. Largely,
the Hadoop execution model and YARN are essential components of the
Hadoop ecosystem, providing a powerful and flexible platform for processing
and analysing large datasets.

11.1.4 Introduction to Hadoop Ecosystem Technologies


The Hadoop ecosystem is a collection of open-source software tools and
frameworks that are built on top of the Hadoop Distributed File System
(HDFS) and the Hadoop MapReduce programming model. The Hadoop
ecosystem includes a wide range of tools and technologies for data
processing, storage, management, and analysis. Here are some of the most
popular Hadoop ecosystem technologies:

• Apache Spark: Apache Spark is an open-source big data processing
framework that is built on top of Hadoop. It provides a faster and more
flexible alternative to MapReduce, with support for real-time data
processing, machine learning, and graph processing.
• Apache Hive: Apache Hive is a data warehouse system for querying and
analysing large datasets stored in Hadoop. It provides a SQL-like
interface for querying data and supports a range of data formats,
including structured and semi-structured data.
• Apache Pig: Apache Pig is a high-level data processing language that is
used to simplify the development of MapReduce jobs. It provides a
simple, easy-to-use syntax for writing data processing pipelines.
• Apache HBase: Apache HBase is a NoSQL database that is built on top
of Hadoop. It provides real-time random read and write access to large
datasets and supports low-latency queries.
• Apache ZooKeeper: Apache ZooKeeper is a centralized service for
maintaining configuration information, naming, providing distributed
synchronization, and group services.
• Apache Oozie: Apache Oozie is a workflow scheduling system that is
used to manage Hadoop jobs. It provides a way to specify dependencies
between jobs and ensures that jobs are executed in the correct order.
• Apache Flume: Apache Flume is a distributed system for collecting,
aggregating, and moving large amounts of log data from various sources
to Hadoop.
• Apache Sqoop: Apache Sqoop is a tool for transferring data between
Hadoop and structured data stores, such as relational databases.

These are just a few examples of the many tools and technologies that are
available in the Hadoop ecosystem. Each of these technologies is designed to
address specific challenges and use cases in big data processing and
analytics. By leveraging the Hadoop ecosystem, organizations can build
powerful, scalable, and cost-effective data processing and analytics solutions.

11.1.5 Databases: HBase, Hive


HBase and Hive are two popular databases in the Hadoop ecosystem that are
used for storing and processing large-scale datasets.

HBase is a NoSQL database that is designed for storing and managing large
volumes of unstructured and semi-structured data in Hadoop. It provides real-
time random read and write access to large datasets, making it ideal for use
cases that require low-latency queries and high-throughput data processing.
HBase is modelled after Google's Bigtable database and is built on top of
Hadoop Distributed File System (HDFS). HBase uses a column-oriented data
model, which allows for efficient storage and retrieval of data, and provides a
powerful API for data manipulation.
Hive, on the other hand, is a data warehouse system for querying and
analysing large datasets stored in Hadoop. It provides a SQL-like interface
for querying data and supports a range of data formats, including structured
and semi-structured data. Hive is modelled after the SQL language, making it
easy for users with SQL experience to work with large-scale datasets in
Hadoop. Hive uses a metadata-driven approach to data management, which
allows for easy integration with other tools in the Hadoop ecosystem. Hive
provides a powerful SQL-like language called HiveQL for querying data and
supports advanced features such as user-defined functions, subqueries, and
joins.

Both HBase and Hive are powerful tools in the Hadoop ecosystem, and they
are often used together to provide a complete data management and analysis
solution. HBase is typically used for real-time data processing and low-
latency queries, while Hive is used for complex analytical queries and ad-hoc
data analysis.
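To give a flavour of the SQL-like interface that Hive provides, the sketch below issues a HiveQL-style query from Python through PySpark's Hive support; a configured Hive metastore and a table named page_views are assumed purely for illustration.

    # A sketch of querying a Hive table from Python via PySpark. A Hive
    # metastore and a hypothetical "page_views" table are assumed.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("hive-query-example")
             .enableHiveSupport()
             .getOrCreate())

    # HiveQL-style aggregation: daily page views per country.
    result = spark.sql("""
        SELECT country, to_date(view_time) AS view_date, COUNT(*) AS views
        FROM page_views
        GROUP BY country, to_date(view_time)
        ORDER BY views DESC
    """)
    result.show(10)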

11.1.6 Scripting language: Pig, Streaming: Flink, Storm


Pig, Flink, and Storm are popular scripting languages and streaming
frameworks used in the Hadoop ecosystem for big data processing and
analytics.

Pig is a high-level data processing language that is used to simplify the
development of MapReduce jobs. It provides a simple, easy-to-use syntax for
writing data processing pipelines. Pig supports a wide range of data formats
and provides a powerful set of operators for data manipulation. Pig programs
are compiled into MapReduce jobs, which are then executed on a Hadoop
cluster. Pig is often used for ETL (Extract, Transform, Load) tasks and data
preparation.
Flink is a real-time streaming framework that is designed for high-throughput
and low-latency data processing. Flink provides a distributed processing
engine that can process data in real-time as it arrives. It supports a wide range
of data sources and provides a powerful set of operators for data
manipulation. Flink supports a variety of programming languages, including
Java, Scala, and Python. Flink is often used for real-time analytics, machine
learning, and complex event processing.

Storm is another real-time streaming framework that is used for processing
large-scale data streams. Storm provides a distributed processing engine that
can process data in real-time and can scale to handle large volumes of data.
Storm supports a wide range of data sources and provides a powerful set of
operators for data manipulation. Storm is often used for real-time analytics,
machine learning, and event processing.

Both Flink and Storm support stream processing, whereas Pig supports batch
processing. Stream processing is useful in scenarios where data is generated
continuously and needs to be processed in real-time, such as sensor data or
social media feeds. Batch processing is useful in scenarios where large
volumes of data need to be processed in a non-real-time manner, such as ETL
jobs or data warehousing.

11.2 SPARK FRAMEWORK


Spark is a distributed computing framework that is used for big data
processing and analytics. It was developed at the University of California,
Berkeley and is now maintained by the Apache Software Foundation. Spark
provides an in-memory data processing engine that can handle large-scale
data processing and analytics tasks. It supports a wide range of data sources
and provides a powerful set of operators for data manipulation. Spark is
designed to work with Hadoop Distributed File System (HDFS) and other
data sources, such as Apache Cassandra and Amazon S3.
Spark supports a variety of programming languages, including Java, Scala,
Python, and R. It provides a simple, easy-to-use API for data processing and
analytics tasks, and supports a wide range of applications, including real-time
stream processing, machine learning, and graph processing. Spark includes
several components, including:
• Spark Core: This is the fundamental computing engine of Spark and
provides the distributed task scheduling, memory management, and fault
tolerance features.
• Spark SQL: This is a module for structured data processing that allows
users to query structured data using SQL-like syntax.
• Spark Streaming: This is a module for processing real-time data
streams using Spark.
• Spark MLlib: This is a machine learning library for Spark that provides
a wide range of machine learning algorithms for tasks such as
classification, regression, clustering, and collaborative filtering.
• GraphX: This is a module for processing graph data using Spark.
Spark is known for its high-speed processing and scalability, and it has
become a popular choice for big data processing and analytics tasks. It is
often used in conjunction with Hadoop and other big data technologies to
provide a complete big data processing and analytics solution.
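The short PySpark sketch below gives a feel for the DataFrame API described above; the records are created in-line, so nothing is assumed beyond a local Spark installation, and the column names are illustrative.

    # A minimal local PySpark sketch using the DataFrame API.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("spark-sketch").master("local[*]").getOrCreate()

    # In-line sample data standing in for a large distributed dataset.
    sales = spark.createDataFrame(
        [("north", "laptop", 1200.0), ("south", "phone", 650.0),
         ("north", "phone", 700.0), ("south", "laptop", 1150.0)],
        ["region", "product", "amount"],
    )

    # Transformations are declared lazily and optimized by Spark before
    # being executed across the cluster (here, local threads).
    totals = (sales.groupBy("region")
                   .agg(F.sum("amount").alias("total_sales"),
                        F.avg("amount").alias("avg_sale"))
                   .orderBy(F.desc("total_sales")))

    totals.show()
    spark.stop()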

11.3 MACHINE LEARNING ALGORITHMS FOR BIG DATA ANALYTICS
There are several machine learning algorithms that are commonly used for
big data analytics tasks. Here are some of the most popular ones:

1. Linear Regression: A popular algorithm used to model the relationship
between dependent and independent variables.
2. Logistic Regression: Used to model the probability of a binary outcome.
3. Decision Trees: Used to model decision-making processes and classify
data based on a series of rules.
4. Random Forest: An ensemble learning technique that combines
multiple decision trees to improve accuracy and reduce overfitting.
5. Naive Bayes: A probabilistic algorithm used for classification tasks
based on the Bayes theorem.
6. K-Nearest Neighbours (KNN): A non-parametric algorithm used for
classification and regression tasks that is based on the idea of finding the
k closest data points to a given input.
7. Support Vector Machines (SVM): A popular algorithm used for
classification and regression tasks that involves finding the optimal
hyperplane that separates data points in a high-dimensional space.
8. Neural Networks: A family of algorithms used for various tasks, such
as classification, regression, and clustering, that mimic the structure and
function of the human brain.
9. Gradient Boosting: An ensemble learning technique that combines
multiple weak models to create a strong model.
10. Principal Component Analysis (PCA): A dimensionality reduction
technique that reduces the number of features in a dataset by finding the
most important features.

When dealing with big data, it is important to choose algorithms that are
scalable and can handle large amounts of data. Some of these algorithms,
such as KNN and SVM, can be memory-intensive and may not be suitable
for large datasets. In such cases, distributed computing frameworks like
Apache Spark can be used to handle the processing of big data.
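As a small, self-contained illustration of one algorithm from the list above, the sketch below trains a random forest classifier on synthetic data using scikit-learn; on genuinely large datasets the same modelling step would typically be run through a distributed library such as Spark MLlib.

    # Training and evaluating a random forest classifier on synthetic data
    # with scikit-learn; a stand-in for the kind of model that would be
    # scaled out with a distributed library on truly large datasets.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    # Synthetic dataset: 5,000 samples, 20 features, binary target.
    X, y = make_classification(n_samples=5000, n_features=20, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)

    predictions = model.predict(X_test)
    print("Test accuracy:", accuracy_score(y_test, predictions))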

11.4 RECENT TRENDS IN BIG DATA ANALYTICS


Big data analytics is an evolving field, and there are several recent trends that
are shaping the future of this domain. Here are some of the key trends in big
data analytics:

1. Real-time Analytics: Real-time data processing and analysis are
becoming increasingly important as businesses seek to make more
informed decisions based on up-to-date information.

2. Edge Computing: Edge computing involves processing data closer to
the source, rather than sending it to a centralized server or cloud. This
trend is gaining traction in industries such as healthcare and
manufacturing, where real-time insights are critical.

3. Cloud-based Analytics: Cloud-based analytics platforms are becoming
increasingly popular, as they offer flexibility, scalability, and cost-
effectiveness. Cloud platforms such as AWS, Azure, and Google Cloud
Platform offer a range of big data tools and services.
4. Artificial Intelligence and Machine Learning: Machine learning and
AI are becoming increasingly important in big data analytics, as they can
help automate data processing and analysis and provide more accurate
insights.

5. Data Privacy and Security: With the increasing amount of data being
collected and analysed, data privacy and security are becoming major
concerns. Businesses must ensure that they are compliant with data
protection regulations and that they are taking steps to protect sensitive
data.
6. Data Democratization: Data democratization involves making data
accessible to all stakeholders in an organization, enabling them to make
data-driven decisions. This trend is gaining traction as businesses seek to
break down data silos and improve collaboration and communication
across teams.

7. Natural Language Processing (NLP): NLP is a field of AI that
involves analysing and interpreting human language. NLP is becoming
increasingly important in big data analytics, as it can help businesses
extract insights from unstructured data sources such as social media and
customer feedback.
These trends are shaping the future of big data analytics and will continue to
influence the development of new tools and technologies in this field.

11.5 SUMMARY
Big data refers to the large volume of structured and unstructured data that
inundates businesses on a daily basis. Big data analytics is the process of
collecting, processing, and analysing this data to gain insights and make
informed business decisions. The key characteristics of big data are
commonly summarized by the "4 Vs": volume, velocity, variety, and veracity. To
handle big data, businesses require specialized tools and technologies, such
as the Hadoop ecosystem, which includes HDFS, MapReduce, and YARN, as
well as other technologies like Spark, HBase, and Hive. In addition to
handling the technical challenges of big data, businesses must also address
data privacy and security concerns, and ensure compliance with regulations
such as GDPR and CCPA.

Some of the key trends in big data analytics include real-time analytics, edge
computing, cloud-based analytics, artificial intelligence and machine
learning, data privacy and security, data democratization, and natural
language processing. Overall, big data analytics has the potential to
provide businesses with valuable insights that can improve their operations,
customer experiences, and bottom lines.

11.6 SELF – ASSESSMENT EXERCISES


1. What is big data?
2. What are the characteristics of big data?
3. How is big data analysed?
4. What are some applications of big data?
5. What are some challenges associated with big data?
6. What are some recent trends in big data analytics?
7. What are some popular big data platforms and technologies?
8. How is big data used in marketing?
9. What is the future of big data?

11.7 KEYWORDS
A glossary of commonly used terms in big data includes:
1. Big data: Refers to large volumes of structured and unstructured data that inundate businesses on a daily basis.
2. Business intelligence: The use of data analysis tools and technologies to
gain insights into business performance and make informed decisions.
3. Cloud computing: The delivery of computing services, including
storage, processing, and analytics, over the internet.
4. Data cleaning: The process of identifying and correcting errors and
inconsistencies in data.
5. Data governance: The management of data assets, including policies,
procedures, and standards for data quality and security.
6. Data integration: The process of combining data from multiple sources
into a single, unified view.
7. Data lake: A centralized repository for storing large volumes of
structured and unstructured data in its native format.
8. Data mining: The process of extracting useful information from large
volumes of data.
9. Data pipeline: The process of moving data from its source to a
destination for storage, processing, or analysis.
10. Data privacy: The protection of sensitive and personal data from
unauthorized access or disclosure.
11. Data quality: The measure of the accuracy, completeness, and
consistency of data.
12. Data visualization: The process of creating visual representations of
data to aid in understanding and analysis.
13. Data warehousing: The process of collecting and storing data from
multiple sources to create a centralized repository for analysis.
14. Hadoop: A popular open-source big data framework used for storing
and processing large volumes of data.
15. Machine learning: A subset of AI that involves building algorithms and
models that can learn and make predictions based on data.
16. MapReduce: A programming model used to process large volumes of
data in parallel on a distributed system.
17. NoSQL: A non-relational database management system designed for
handling large volumes of unstructured data.
18. Predictive Analytics: The use of statistical models and machine learning
algorithms to make predictions about future events based on historical
data.
19. Spark: An open-source big data processing framework that allows for
fast, in-memory processing of large datasets.
20. Streaming: The process of analysing and processing real-time data as it
is generated.

11.8 FURTHER READINGS
1. Provost, F., & Fawcett, T. (2013). Data science for business: What you
need to know about data mining and data-analytic thinking. O'Reilly
Media.
2. Zaharia, M., & Chambers, B. (2018). Spark: The definitive guide.
O'Reilly Media.
3. Leskovec, J., Rajaraman, A., & Ullman, J. D. (2014). Mining of massive
datasets. Cambridge University Press.
4. Marz, N., & Warren, J. (2015). Big data: Principles and best practices of
scalable real-time data systems. Manning Publications.
5. Apache Hadoop: https://hadoop.apache.org/
6. Apache Spark: https://spark.apache.org/
7. Big Data University: https://bigdatauniversity.com/
8. Hortonworks: https://hortonworks.com/
9. Big Data Analytics News: https://www.bigdataanalyticsnews.com/
10. Data Science Central: https://www.datasciencecentral.com/

UNIT 12 ENTERPRISE RESOURCE PLANNING

Objectives
After studying this unit, you will be able to:
• Develop a thorough understanding of ERP concepts and principles.
• Understand the components and architecture of ERP systems, as well as their benefits and limitations.
• Learn how to implement and configure an ERP system.
• Understand how to manage ERP projects.
• Understand how to optimize business processes using ERP systems.

Structure
12.1 Introduction to Enterprise Resource Planning (ERP)
12.1.1 What is Enterprise Resource Planning (ERP)?
12.1.2 Evolution of Enterprise Resource Planning
12.1.3 Fundamental Technology of Enterprise Resource Planning
12.1.4 Benefits of Enterprise Resource Planning
12.2 ERP Solutions and Functional Modules
12.2.1 Business Process Reengineering
12.2.2 Supply Chain Management
12.2.3 Online Analytical Processing (OLAP)
12.2.4 Customer Relationship Management (CRM)
12.2.5 Data Warehousing
12.2.6 Data Mining
12.2.7 Management Information System (MIS)
12.2.8 Executive Support System (ESS)
12.2.9 Decision Support System (DSS)
12.3 Implementation of ERP
12.3.1 Implementation Methodologies and Approaches
12.3.2 ERP Life-Cycle
12.3.3 SDLC-ERP Implementation Cost and Time
12.3.4 ERP Project Management, Training
12.3.5 ERP Implementation Stakeholder’s Roles and Responsibilities
12.4 Overview of ERP in Some of the Key Functional Areas
12.5 Summary
12.6 Self–Assessment Exercises
12.7 Keywords
12.8 Further Readings

12.1 INTRODUCTION TO ENTERPRISE RESOURCE PLANNING (ERP)


12.1.1 What is Enterprise Resource Planning (ERP)?
The rapid development of information and communication technologies
(ICT), which is powered by microelectronics, computer hardware, and
software systems, has made an impact on all aspects of computing
applications across companies. Increasing volume of cross-functional data
flow is required for decision-making, efficient and timely product part
procurement, inventory management, accounting, human resource
management, and the distribution of goods and services as the corporate
environment becomes more complicated. Effective information systems are
necessary for business management in order to improve logistics and promote
competitiveness through cost savings. The ability to give the necessary
information at the right time results in huge rewards for organisations in a
highly competitive global market with complex business operations,
according to both large and small-to-medium-sized firms (SME). Large,
complex corporate companies are the primary target market for enterprise
resource planning (ERP) systems, new software solutions that initially
entered the market in the late 1980s and early 1990s. These intricate,
expensive, powerful, proprietary systems are pre-made solutions that need to
be customized and put into place by professionals in accordance with the
requirements of the organisation.

Enterprise Resource Planning (ERP) is a type of software system that


integrates various business functions and processes, such as finance,
accounting, human resources, inventory and supply chain management,
customer relationship management, and manufacturing, into a single, unified
system. The goal of an ERP system is to streamline business operations,
improve efficiency, and facilitate better decision-making by providing real-
time data and analytics. ERP systems typically use a centralized database,
which allows data to be shared across departments and functions.

ERP systems typically include a set of integrated modules that can be


customized to meet the specific needs of an organization. These modules
include finance and accounting, human resources management, production
and inventory management, procurement and supply chain management,
customer relationship management, and business intelligence and analytics.
ERP systems can be deployed on-premises or in the cloud, and can be
customized and configured to meet the unique requirements of an
organization. They can be used by businesses of all sizes, from small start-
ups to large multinational corporations.

12.1.2 Evolution of Enterprise Resource Planning


Enterprise Resource Planning (ERP) systems have evolved over several
decades, from basic inventory management systems to sophisticated,
integrated software platforms that manage all aspects of an organization's
operations. Here is a brief overview of the evolution of ERP:
• 1960s-1970s: Material Requirements Planning (MRP) systems were developed to help manufacturers manage their inventory and production
planning. These systems were based on mathematical algorithms and
could calculate the materials needed for production based on customer
demand.
• 1980s: MRP systems evolved into Manufacturing Resource Planning
(MRP II) systems, which added more functionality, such as capacity
planning, scheduling, and financial management. MRP II systems were
integrated with other business functions, such as finance and accounting.
• 1990s: ERP systems emerged, combining the functionality of MRP II
with other business functions, such as human resources, procurement,
and customer relationship management. These systems were designed to
be more flexible, scalable, and customizable than previous systems.
• 2000s: ERP systems continued to evolve, with a focus on web-based
architecture and mobile accessibility. Cloud-based ERP systems became
more popular, allowing for easier deployment and lower costs.
• 2010s-2020s: ERP systems continued to evolve, with a focus on
advanced analytics, artificial intelligence, machine learning, and
automation. ERP vendors also started to offer industry-specific solutions,
tailored to the unique needs of different industries.
Overall, the evolution of ERP systems has been driven by the need for
organizations to manage complex business processes and operations more
efficiently and effectively. As technology continues to advance, ERP systems
will likely continue to evolve to meet the changing needs of businesses.

Source: https://www.omniaccounts.co.za/history-of-enterprise-resource-planning/

12.1.3 Fundamental Technology of Enterprise Resource


Planning
The fundamental technology of Enterprise Resource Planning (ERP) systems
is a combination of software, hardware, and networking technologies that
allow an organization to integrate and manage all its business processes in a
single, integrated system. Here are some of the key technologies that underpin ERP:

• Software: ERP systems are software applications that typically run on a


server and can be accessed by users via a web-based interface. The
software includes a set of integrated modules that support various
business functions, such as finance, human resources, supply chain
management, and customer relationship management.

• Databases: ERP systems rely on databases to store and manage data.


Databases provide a centralized repository of data that can be accessed
by all modules of the ERP system. The database architecture of an ERP
system is typically designed to be highly scalable, so that it can handle
large volumes of data and support multiple users.

• Networking: ERP systems require a network infrastructure to support


communication between different modules and users. The network
infrastructure may include local area networks (LANs), wide area
networks (WANs), and internet connectivity.
• Hardware: ERP systems require server hardware to run the software
and store the databases. The server hardware may include a combination
of high-performance processors, memory, and storage devices.

• Integration: ERP systems must be able to integrate with other software


applications used by the organization, such as customer relationship
management systems, supply chain management systems, and
manufacturing systems. Integration is typically achieved through the use
of application programming interfaces (APIs) and other integration
technologies.

Overall, the fundamental technology of ERP systems is designed to support


the integration and management of all aspects of an organization's operations,
from finance and accounting to supply chain management and customer
relationship management. By leveraging these technologies, organizations
can improve efficiency, reduce costs, and make better-informed business
decisions.
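To make the integration point above more concrete, here is a minimal, purely illustrative sketch of how an external application might push a sales order into an ERP system over a REST API. The endpoint URL, the token, and the payload fields are hypothetical placeholders, not any vendor's actual interface; real ERP products each define their own APIs and authentication schemes.

    # Illustrative only: pushing a sales order into an ERP over a REST API.
    # URL, token, and payload fields are hypothetical placeholders.
    import requests

    ERP_API_URL = "https://erp.example.com/api/v1/sales-orders"  # hypothetical endpoint
    API_TOKEN = "replace-with-real-token"                        # hypothetical credential

    order = {
        "customer_id": "C-1001",
        "items": [{"sku": "SKU-42", "quantity": 3, "unit_price": 199.0}],
        "currency": "INR",
    }

    response = requests.post(
        ERP_API_URL,
        json=order,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        timeout=10,
    )
    response.raise_for_status()
    print("ERP accepted order:", response.json())

The same pattern, in reverse, is how an ERP system typically exposes its own data to CRM, supply chain, or manufacturing applications.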

12.1.4 Benefits of Enterprise Resource Planning


Enterprise Resource Planning (ERP) systems offer numerous benefits to
organizations. Here are some of the key benefits:

• Improved Efficiency: ERP systems streamline business processes by


automating routine tasks and providing real-time information. This leads
to improved efficiency, as employees can spend more time on value-
added tasks.
• Enhanced Visibility: ERP systems provide real-time data on all aspects
of an organization's operations. This helps managers make informed
decisions and respond quickly to changes in the business environment.
• Better Decision Making: ERP systems provide a comprehensive view
of an organization's operations, enabling managers to make better-
informed decisions. This can lead to improved profitability, reduced costs, and increased competitiveness.
• Improved Customer Service: ERP systems provide a 360-degree view
of customer interactions, enabling organizations to respond quickly and
effectively to customer needs. This leads to improved customer
satisfaction and retention.
• Standardized Processes: ERP systems standardize business processes
across an organization, ensuring consistency and reducing errors. This
leads to improved quality and increased efficiency.
• Improved Inventory Management: ERP systems provide real-time
visibility into inventory levels, enabling organizations to optimize
inventory levels and reduce carrying costs.
• Improved Financial Management: ERP systems provide real-time
financial data, enabling organizations to manage cash flow and reduce
financial risks.
Overall, ERP systems offer numerous benefits to organizations, including
improved efficiency, enhanced visibility, better decision making, improved
customer service, standardized processes, improved inventory management,
and improved financial management. By leveraging these benefits,
organizations can improve profitability, reduce costs, and increase
competitiveness.

12.2 ERP SOLUTIONS AND FUNCTIONAL MODULES
ERP solutions are typically comprised of various functional modules that are
designed to manage specific business processes within an organization. Here
are some of the common functional modules found in ERP systems:
• Financial Management: This module includes functionalities for
managing financial processes such as accounts payable and receivable,
general ledger, fixed assets, cash management, and financial reporting.
• Human Resource Management: This module includes functionalities
for managing employee data, payroll, benefits, performance
management, and training and development.
• Supply Chain Management: This module includes functionalities for
managing procurement, inventory, warehouse management, and order
fulfilment.
• Production Planning and Control: This module includes
functionalities for managing production planning, scheduling, and
tracking of manufacturing processes.
• Sales and Marketing: This module includes functionalities for
managing customer data, sales order management, pricing, and
marketing campaigns.
• Customer Relationship Management: This module includes
functionalities for managing customer interactions, such as customer
service requests, sales leads, and marketing campaigns.
• Project Management: This module includes functionalities for
managing project planning, scheduling, and tracking of project tasks and
resources.
• Quality Management: This module includes functionalities for
managing quality control processes, such as inspection, testing, and
corrective action.
• Business Intelligence: This module provides analytical tools and
dashboards to help users make informed decisions based on real-time
data.

Each functional module of an ERP system is designed to work together


seamlessly, enabling organizations to integrate business processes across
departments and improve operational efficiency.

12.2.1 Business Process Reengineering


Business Process Reengineering (BPR) is a management approach that
involves the radical redesign of business processes to achieve significant
improvements in performance, efficiency, and effectiveness. The goal of
BPR is to fundamentally rethink how work is done within an organization
and to identify opportunities for process improvement that can lead to
significant cost savings, improved quality, and increased competitiveness.
BPR involves the following steps:

• Identify Processes for Reengineering: Organizations identify processes


that are critical to their operations and in need of improvement.
• Analyse Existing Processes: Organizations analyse existing processes to
identify inefficiencies, redundancies, and areas for improvement.
• Design New Processes: Organizations redesign business processes to be
more efficient, effective, and customer-centric. This typically involves
rethinking and challenging existing assumptions about how work is
done.
• Implement New Processes: Organizations implement new processes
and support systems, such as technology solutions, to enable the new
processes.
• Monitor and Refine: Organizations monitor the new processes to ensure
they are delivering the desired results and make refinements as
necessary.

The benefits of BPR can be significant. BPR can help organizations improve
operational efficiency, reduce costs, improve quality, enhance customer
satisfaction, and increase competitiveness. However, BPR is also a complex
and resource-intensive process that requires significant investment in time,
expertise, and technology. As such, organizations must carefully evaluate the
potential benefits and costs of BPR before embarking on a BPR initiative.

12.2.2 Supply Chain Management


Supply Chain Management (SCM) refers to the management of the flow of
goods and services from the point of origin to the point of consumption. SCM encompasses all activities involved in the production and delivery of goods and services, from procurement of raw materials to delivery to the end customer. The primary goal of SCM is to optimize the flow of goods and
services, reducing costs and increasing efficiency, while maintaining high
levels of quality and customer satisfaction. SCM involves a range of
activities, including:
• Procurement: Procurement involves sourcing materials, components,
and services required for production.
• Production: Production involves the actual manufacturing of goods,
including assembling components, packaging, and shipping.
• Transportation and Logistics: Transportation and logistics involve the
movement of goods and services from one location to another.
• Warehousing and Inventory Management: Warehousing and
inventory management involve the storage and management of inventory
and finished goods.
• Distribution and Customer Service: Distribution and customer service
involve the delivery of goods to the end customer and managing
customer relationships.

Effective SCM requires collaboration and coordination among various


stakeholders, including suppliers, manufacturers, distributors, and customers.
SCM software, such as ERP and CRM systems, can help organizations
manage their supply chains more efficiently by providing real-time visibility
into inventory, order status, and production schedules. SCM can also help
organizations reduce costs, increase efficiency, and improve customer
satisfaction by optimizing processes, improving quality, and reducing lead
times.

12.2.3 Online Analytical Processing (OLAP)


Online Analytical Processing (OLAP) is a computer-based approach to data
analysis that enables users to analyse large amounts of complex data from
multiple perspectives in real-time. OLAP systems are designed to provide
fast, flexible, and interactive access to data to support complex analytical
queries and decision-making processes.

OLAP is typically used in business intelligence applications and data


warehousing, where large amounts of data are stored in a central repository
for analysis. OLAP systems are optimized for complex queries and are
designed to enable users to quickly and easily navigate through large data
sets to identify trends, patterns, and insights. OLAP systems use a
multidimensional data model to organize and present data. This model is
based on the concept of a cube, where each dimension of the cube represents
a different aspect of the data, such as time, geography, or product category.
The cube allows users to slice and dice the data along different dimensions to
gain different perspectives on the data. OLAP systems also use a range of
analytical functions and techniques to help users analyse and interpret data.
These functions include aggregation, drill-down, pivot, and roll-up. OLAP
systems are typically used by business analysts, data scientists, and other decision-makers to support a range of activities, including financial analysis,
market analysis, and product analysis.

Overall, OLAP provides a powerful and flexible approach to data analysis


that can help organizations gain insights into their data and make more
informed business decisions.
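The multidimensional idea behind OLAP can be illustrated on a very small scale with ordinary Python tooling. The sketch below assumes pandas is installed and uses made-up sales records; it mimics a roll-up along the year and region dimensions and a slice on a single year. Real OLAP engines perform the same operations, but on far larger cubes and with dedicated storage and indexing.

    # Tiny illustration of OLAP-style roll-up and slicing with pandas.
    # The sales records below are made-up sample data.
    import pandas as pd

    sales = pd.DataFrame({
        "year":    [2022, 2022, 2023, 2023, 2023],
        "region":  ["North", "South", "North", "South", "North"],
        "product": ["A", "A", "B", "B", "A"],
        "revenue": [100, 150, 120, 180, 130],
    })

    # Roll-up: aggregate revenue along the year and region dimensions.
    cube = pd.pivot_table(sales, values="revenue", index="year",
                          columns="region", aggfunc="sum", margins=True)
    print(cube)

    # Slice: fix one dimension (year = 2023) and summarise what remains.
    print(sales[sales["year"] == 2023].groupby("product")["revenue"].sum())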

12.2.4 Customer Relationship Management (CRM)


Customer Relationship Management (CRM) refers to the strategies,
processes, and technologies that businesses use to manage and analyse their
interactions with customers and potential customers. The goal of CRM is to
improve customer satisfaction, loyalty, and retention by providing a more
personalized and effective customer experience.
CRM systems typically include a range of features and functions that enable
businesses to manage customer data, track customer interactions, and
automate various customer-related processes. These systems may include
features such as:
• Contact Management: Contact management allows businesses to store and manage customer information, such as names, addresses, and contact details.
• Sales Automation: Sales automation tools enable businesses to
automate various aspects of the sales process, such as lead generation,
lead qualification, and opportunity management.
• Marketing Automation: Marketing automation tools enable businesses
to automate various marketing activities, such as email campaigns, social
media marketing, and lead nurturing.
• Customer Service and Support: CRM systems often include tools for
managing customer service and support, such as ticketing systems,
knowledge bases, and chatbots.
• Analytics and Reporting: CRM systems provide a range of analytics
and reporting tools that enable businesses to analyse customer data and
gain insights into customer behaviour, preferences, and needs.
Effective CRM can provide a range of benefits to businesses, including
increased customer satisfaction, loyalty, and retention, improved sales and
marketing effectiveness, and better customer insights and analytics. CRM
systems can be used in a wide range of industries and businesses of all sizes,
from small start-ups to large multinational corporations.

12.2.5 Data Warehousing


Data warehousing is a process of collecting, managing, and organizing large
volumes of data from different sources in a central repository for analysis and
reporting purposes. The purpose of data warehousing is to provide a single,
integrated view of an organization's data that can be used to support business
decision-making processes. Data warehouses typically use a relational
database management system (RDBMS) to store and manage the data. The
data is organized into tables, with each table containing specific data fields or
attributes. The data is also structured in a way that enables easy querying and
reporting.

Data warehousing involves a range of processes, including data extraction, transformation, and loading (ETL), data modelling, data integration, and data
quality management. ETL processes are used to extract data from different
sources, transform the data into a consistent format, and load the data into the
data warehouse. Data modelling is used to create a conceptual and logical
representation of the data, while data integration is used to combine data from
different sources into a single, integrated view. Data quality management is
used to ensure that the data is accurate, consistent, and reliable. Data
warehouses are typically used in business intelligence applications to support
reporting and analysis activities. The data in the warehouse can be used to
generate a wide range of reports and analysis, including financial reports,
sales reports, customer reports, and market analysis. Data warehouses can
also be used to support data mining and predictive analytics, which involve
using statistical algorithms to analyse the data and identify trends, patterns,
and insights.

Overall, data warehousing provides a powerful approach to managing and


analysing large volumes of data from different sources. By providing a
single, integrated view of an organization's data, data warehousing can
support better decision-making processes and enable organizations to gain
insights into their business operations and performance.
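As a rough illustration of the ETL idea described above, the sketch below extracts rows from a hypothetical "orders.csv" file, applies a simple transformation, and loads the result into an in-memory SQLite table standing in for a warehouse fact table. The file name and column names are assumptions; production pipelines would use dedicated ETL tooling, but the three stages are the same.

    # Minimal ETL sketch: extract from a CSV, transform, load into SQLite.
    # "orders.csv" and its columns (order_id, customer, amount) are hypothetical.
    import csv
    import sqlite3

    def extract(path):
        with open(path, newline="") as f:
            return list(csv.DictReader(f))

    def transform(rows):
        # Normalise field formats and drop rows with missing amounts.
        cleaned = []
        for row in rows:
            if row.get("amount"):
                cleaned.append((row["order_id"],
                                row["customer"].strip().title(),
                                float(row["amount"])))
        return cleaned

    def load(records, conn):
        conn.execute("CREATE TABLE IF NOT EXISTS fact_orders "
                     "(order_id TEXT, customer TEXT, amount REAL)")
        conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", records)
        conn.commit()

    conn = sqlite3.connect(":memory:")          # stand-in for the warehouse
    load(transform(extract("orders.csv")), conn)
    print(conn.execute("SELECT COUNT(*), SUM(amount) FROM fact_orders").fetchone())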

12.2.6 Data Mining


Data mining is the process of discovering patterns, trends, and insights from
large datasets using statistical and machine learning algorithms. The goal of
data mining is to extract useful and meaningful information from data, which
can then be used to support business decision-making processes.
Data mining involves a range of techniques and approaches, including
clustering, classification, association rule mining, and anomaly detection.
These techniques are used to identify patterns and relationships in the data,
such as customer buying behaviours, market trends, and performance metrics.
The data used in data mining can come from a range of sources, including
transactional databases, data warehouses, and data lakes. The data is typically
pre-processed and transformed to make it suitable for analysis, such as
removing missing values, normalizing the data, and reducing the
dimensionality of the data.
Data mining has a wide range of applications in different industries and
business domains. For example, in finance, data mining can be used to detect
fraudulent transactions, while in healthcare, it can be used to predict patient
outcomes and identify risk factors for diseases. In marketing, data mining can
be used to identify customer segments and personalize marketing campaigns,
while in manufacturing, it can be used to optimize production processes and
reduce costs.
Overall, data mining provides a powerful approach to analysing and gaining
insights from large datasets. By using statistical and machine learning
techniques to identify patterns and trends in the data, data mining can help
organizations to improve their decision-making processes and gain a
competitive advantage in their markets.
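One of the techniques mentioned above, anomaly detection, can be sketched in a few lines. The example below assumes scikit-learn and NumPy are installed and uses synthetic payment amounts; it flags unusually large transactions in the spirit of the fraud-detection use case, and is an illustration rather than a production fraud model.

    # Sketch of anomaly detection on transaction amounts (synthetic data).
    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(0)
    normal = rng.normal(loc=500, scale=50, size=(200, 1))    # typical payments
    outliers = np.array([[5000.0], [7500.0]])                # unusually large ones
    amounts = np.vstack([normal, outliers])

    model = IsolationForest(contamination=0.01, random_state=0).fit(amounts)
    labels = model.predict(amounts)          # -1 marks suspected anomalies
    print("Flagged amounts:", amounts[labels == -1].ravel())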
12.2.7 Management Information System (MIS)

Management Information System (MIS) is a computer-based system that


provides managers with the tools and information they need to manage and
control their organizations effectively. MIS is designed to collect, process,
store, and disseminate information to support decision-making, coordination,
and control within an organization.

MIS typically includes a combination of hardware, software, and people, as


well as data, procedures, and policies. The hardware components of MIS may
include computers, servers, network devices, and storage devices, while the
software components may include operating systems, database management
systems, and applications. The people who use MIS may include managers,
employees, customers, and suppliers, who use the system to access and
analyse data to support their decision-making processes. MIS supports a
range of business functions, including accounting, finance, marketing, sales,
production, and logistics. For example, in finance, MIS can be used to
generate financial reports, manage budgets, and track expenses. In marketing,
MIS can be used to collect and analyse customer data, create marketing
campaigns, and measure the effectiveness of marketing efforts.

MIS can provide a range of benefits to organizations, including improved


decision-making, increased productivity, better communication and
collaboration, and reduced costs. MIS can also help organizations to stay
competitive in their markets by providing them with the information they
need to make informed decisions and respond quickly to changes in their
business environment.
Overall, MIS plays a critical role in helping organizations to manage and
control their operations effectively. By providing managers with the tools and
information they need to make informed decisions, MIS can help
organizations to achieve their strategic objectives and gain a competitive
advantage in their markets.

12.2.8 Executive Support System (ESS)


Executive Support System (ESS) is a computer-based information system
designed to provide top-level executives with the information they need to
make strategic decisions. ESS is typically used by senior executives and
board members who require access to real-time, high-level data and analysis
to support their decision-making processes.

ESS provides executives with a range of tools and capabilities, such as data
visualization, trend analysis, and scenario planning. ESS is designed to
provide executives with a comprehensive view of their organization's
performance, including financial data, market trends, and competitor
analysis. ESS may also incorporate data from external sources, such as
industry reports, economic data, and news feeds. The key features of ESS
include:

• User-friendly Interface: ESS typically has a user-friendly interface that


allows executives to access data quickly and easily.
• Real-time Data: ESS provides executives with real-time data and analysis, allowing them to make timely decisions.
• Data Visualization: ESS provides executives with visual representations
of data, such as graphs and charts, making it easier to interpret complex
data.
• Scenario Planning: ESS allows executives to model different scenarios
and analyze the potential impact of different decisions.
• Drill-down Capabilities: ESS allows executives to drill down into data
to explore underlying trends and relationships.

ESS is a critical tool for top-level executives who need to make strategic
decisions. By providing executives with real-time data and analysis, ESS
helps them to stay informed and make informed decisions that can have a
significant impact on their organization's performance.

12.2.9 Decision Support System (DSS)


A decision support system (DSS) is a computer-based information system
that is designed to support decision-making activities within an organization.
It provides interactive tools and techniques to help users analyse data,
identify alternatives, and make decisions.
The primary goal of a DSS is to assist decision-makers in making informed,
timely, and effective decisions by providing them with relevant information
and analysis tools. DSSs are typically used in complex decision-making
scenarios where there is a high level of uncertainty or ambiguity, and where
traditional decision-making methods may not be sufficient. A DSS typically
includes the following components:
• Data Management: This involves collecting, storing, and retrieving data
from various sources, such as databases, spreadsheets, and other
information systems.
• Model Management: This involves creating and managing
mathematical or statistical models that can be used to analyse data and
support decision-making.
• User Interface: This provides a user-friendly interface for accessing and
manipulating data and models, and for visualizing and presenting results.
• Decision Support Tools: These are interactive tools and techniques that
help users analyse data and identify alternatives, such as data mining,
simulation, optimization, and visualization.
Overall, DSSs are powerful tools that can help organizations make better
decisions, improve efficiency and effectiveness, and gain a competitive
advantage in the marketplace.

12.3 IMPLEMENTATION OF ERP


12.3.1 Implementation Methodologies and Approaches
ERP implementation methodologies and approaches are structured processes
used to plan, design, develop, and deploy an ERP system within an organization. Here are some of the most common ERP implementation methodologies and approaches:

• Waterfall Methodology: The Waterfall methodology is a linear, sequential approach that involves a series of phases, including planning, design, development, testing, and deployment. Each phase must be completed before moving on to the next phase.

• Agile Methodology: The Agile methodology is a more flexible, iterative


approach that involves breaking down the project into smaller, more
manageable chunks called sprints. Each sprint focuses on a specific set
of requirements and is followed by a review and feedback process.

• Hybrid Methodology: The Hybrid methodology combines elements of


both the Waterfall and Agile methodologies. It involves a phased
approach that includes planning, design, development, and testing, but
also allows for flexibility and adaptability as requirements change.
• Business Process Reengineering (BPR) Approach: The BPR approach
involves a fundamental rethinking and redesign of business processes to
achieve significant improvements in performance and efficiency. It
involves analysing existing processes, identifying areas for improvement,
and redesigning processes to meet new business requirements.

• Big Bang Approach: The Big Bang approach involves implementing


the entire ERP system at once, rather than phasing it in gradually. This
approach is faster but can be riskier and more complex.

• Pilot Approach: The Pilot approach involves implementing the ERP


system in a single department or location first, to test and refine the
system before rolling it out to other departments or locations.

Each approach has its advantages and disadvantages, and the choice of
methodology depends on various factors, such as the size and complexity of
the organization, the scope of the ERP system, and the specific requirements
and needs of the organization. The successful implementation of an ERP
system requires careful planning, communication, and stakeholder
engagement.

12.3.2 ERP Life-Cycle


ERP life-cycle refers to the various stages that an ERP system goes through,
from its conceptualization to its retirement. Here are the different stages of
the ERP life-cycle:

• Planning: The planning stage involves identifying the organization's


needs and requirements, defining the scope of the ERP system, and
selecting the appropriate ERP solution. This stage also involves
developing a project plan, setting timelines, and determining the project
budget.
• Analysis: The analysis stage involves assessing the organization's
existing processes, identifying gaps, and defining the required
modifications to align with the ERP system. This stage also involves defining the data structures and business rules that will be used by the system.
• Design: The design stage involves creating a detailed blueprint for the
ERP system. This stage involves defining the technical architecture, data
migration plan, and system configuration. The design stage also involves
developing prototypes and conducting testing to ensure that the system
meets the requirements.
• Development: The development stage involves building and
customizing the ERP system, based on the design specifications. This
stage also involves developing interfaces to integrate the ERP system
with other systems and applications.
• Testing: The testing stage involves verifying that the ERP system meets
the requirements and functions as expected. This stage includes testing
the system's functionality, performance, and security.
• Deployment: The deployment stage involves installing the ERP system
and transitioning to the new system. This stage also involves training the
users, migrating data, and validating that the system works correctly.
• Maintenance: The maintenance stage involves ongoing support,
maintenance, and updates to the ERP system. This stage includes
monitoring the system's performance, addressing issues, and making
enhancements.
• Retirement: The retirement stage involves retiring the ERP system
when it is no longer needed or is replaced by a newer system. This stage
also includes data archiving and disposal.

Each stage of the ERP life-cycle is critical for a successful ERP


implementation, and careful planning, execution, and monitoring are essential
to ensure that the ERP system meets the organization's needs and
requirements.

12.3.3 SDLC-ERP Implementation Cost and Time


SDLC (Software Development Life Cycle) is a process used for developing
and implementing software systems. It consists of several stages such as
planning, analysis, design, development, testing, and maintenance. ERP
implementation is a complex project that requires following the SDLC
approach to ensure its success. The cost and time required for ERP
implementation can vary significantly, depending on several factors such as
the size and complexity of the organization, the scope of the ERP system, the
chosen implementation methodology, the level of customization required,
and the extent of data migration and integration with other systems.

The cost of ERP implementation typically includes software licensing fees,


hardware and infrastructure costs, implementation consulting fees,
customization costs, data migration costs, and training costs. The total cost of
implementation can range from tens of thousands of dollars to millions of
dollars, depending on the size and complexity of the organization and the
chosen ERP solution. The time required for ERP implementation can also
vary widely. Smaller organizations with less complex ERP requirements may be able to complete the implementation in a few months, while larger
organizations with more complex requirements may take a year or more to
complete the implementation. The chosen implementation methodology and
the extent of customization required can also impact the time required for
implementation.

To ensure the success of an ERP implementation project, it is essential to


plan carefully, choose the appropriate implementation methodology, engage
stakeholders, manage risks, and monitor progress closely. A well-planned
and executed ERP implementation can help organizations achieve significant
benefits, such as improved efficiency, increased productivity, and better
decision-making capabilities.

12.3.4 ERP Project Management, Training


ERP project management is a critical aspect of any ERP implementation
project. It involves coordinating and managing all aspects of the project,
including planning, design, development, testing, and deployment. Effective
ERP project management requires careful planning, communication, risk
management, and stakeholder engagement. Here are some key steps involved
in ERP project management:
• Define the scope of the project and establish goals and objectives.
• Develop a detailed project plan with timelines, milestones, and resources
required.
• Assemble a project team with the necessary skills and expertise.
• Assign roles and responsibilities to team members.
• Monitor progress and performance against the project plan and adjust as
necessary.
• Communicate regularly with stakeholders to keep them informed of
progress and any issues.
• Manage risks and issues that arise during the project.
• Ensure that project deliverables meet quality standards.

Training is another critical aspect of ERP implementation. ERP training helps


users understand how to use the system and make the most of its features and
functionality. Effective ERP training requires developing a comprehensive
training plan, identifying training needs, and delivering training in a variety
of formats, such as classroom training, online training, and on-the-job
training. Here are some key steps involved in ERP training:

• Develop a comprehensive training plan that includes training objectives,


training materials, and training delivery methods.
• Identify the training needs of different user groups, such as
administrators, managers, and end-users.
• Develop training materials, such as user manuals, training videos, and
interactive tutorials.
• Deliver training in a variety of formats, such as classroom training,
online training, and on-the-job training.
• Evaluate the effectiveness of the training program and make adjustments
as necessary.
Effective ERP project management and training are critical for the success of
ERP implementation. By following best practices in project management and
training, organizations can ensure that their ERP implementation projects are
completed on time, within budget, and deliver the expected benefits.

12.3.5 ERP Implementation Stakeholder’s Roles and


Responsibilities
ERP implementation is a complex process that requires the active
involvement of different stakeholders in the organization. Each stakeholder
has a unique role and responsibility in ensuring the success of the
implementation project. Here is a breakdown of the roles and responsibilities
of the key stakeholders in ERP implementation:

Vendors: ERP vendors are responsible for providing the software,


implementation services, and technical support. Their responsibilities
include:
• Providing a comprehensive understanding of the software's features and
capabilities
• Offering training and support services
• Ensuring that the software is installed and configured correctly
• Providing customization and integration services as needed

Consultants: ERP consultants are responsible for advising the organization


on how to implement the software effectively. Their responsibilities include:
• Conducting a business analysis to determine the organization's
requirements and identify potential problems
• Providing guidance on best practices for ERP implementation
• Assisting in the development of an implementation plan
• Facilitating communication between different stakeholders

Top Management: Top management is responsible for providing leadership


and direction for the implementation project. Their responsibilities include:

• Ensuring that the implementation aligns with the organization's strategic


objectives
• Providing resources and support for the implementation project
• Approving the implementation plan and budget
• Making key decisions related to the implementation project

End-users: End-users are responsible for using the software effectively to


achieve their job functions. Their responsibilities include:

• Participating in training programs and workshops to learn how to use the software effectively
• Providing feedback on the usability of the software and suggesting
improvements
• Ensuring that data is entered correctly and consistently
• Reporting any issues or problems with the software to the
implementation team

Overall, ERP implementation is a team effort that requires the active


involvement and collaboration of all stakeholders. By understanding their
roles and responsibilities, stakeholders can work together effectively to
ensure the success of the implementation project.

12.4 OVERVIEW OF ERP IN SOME OF THE KEY FUNCTIONAL AREAS
Presented below is a brief overview of ERP modules in some of the key
functional areas:
• Finance: The Finance module is designed to manage financial
transactions, including general ledger, accounts payable, accounts
receivable, cash management, and financial reporting.
• Human Resource Management (HRM): The HRM module manages
employee-related processes, such as payroll, benefits administration,
recruitment, training, performance management, and compliance.
• Sales & Distribution: The Sales & Distribution module manages sales-
related processes, such as order processing, pricing, delivery, and
invoicing.
• Production Planning: The Production Planning module manages the
manufacturing process, including planning, scheduling, and monitoring
production operations.
• Material Management: The Material Management module manages the
procurement process, including purchasing, inventory management, and
logistics.
• Inventory Control System: The Inventory Control System module
manages inventory levels, including tracking inventory, controlling stock
levels, and managing inventory movements.
• Quality Management: The Quality Management module manages the
quality control process, including quality planning, quality control, and
quality assurance.
• Marketing: The Marketing module manages marketing-related
processes, such as lead management, campaign management, and
customer segmentation.
• Customization: The Customization module allows businesses to tailor
the ERP system to their specific needs and requirements.
Each of these modules is designed to work seamlessly with the other
modules in the ERP system, providing businesses with a comprehensive
view of their operations and enabling them to make better-informed
decisions.
12.5 SUMMARY
To sum up, an Enterprise Resource Planning (ERP) system is a software
application that helps organizations manage their business processes,
operations, and resources in a centralized manner. ERP systems typically
have several modules that cover different business functions, such as finance,
human resources, sales and distribution, production planning, material
management, inventory control, quality management, and marketing.
ERP implementation is a complex and challenging process that requires
careful planning, project management, and stakeholder engagement.
Different ERP implementation methodologies and approaches can be used,
such as the Waterfall model, the Agile model, or a hybrid approach. Effective
ERP project management involves defining the scope of the project,
developing a detailed project plan, assembling a project team, monitoring
progress, managing risks and issues, and ensuring that project deliverables
meet quality standards. ERP training is also critical to ensure that users
understand how to use the system effectively and make the most of its
features and functionality. ERP implementation stakeholders have different
roles and responsibilities, including ERP vendors, consultants, top
management, and end-users. By understanding their roles and
responsibilities, stakeholders can work together effectively to ensure the
success of the implementation project.
Overall, ERP systems can help organizations improve their efficiency, reduce
costs, increase productivity, and enhance customer satisfaction. However,
successful ERP implementation requires careful planning, effective project
management, and stakeholder engagement.

12.6 SELF – ASSESSMENT EXERCISES


1. What is ERP, and how does it work?
2. Why do businesses need ERP systems?
3. What are the key benefits of implementing an ERP system?
4. What are the typical features of an ERP system?
5. How do you select the right ERP system for your business?
6. How do you implement an ERP system, and what are the key
challenges?
7. How do you train employees to use an ERP system?

12.7 KEYWORDS
1. Enterprise Resource Planning (ERP) - A software system that helps
organizations manage their business processes, operations, and resources
in a centralized manner.
2. Modules - Functional components within an ERP system that cover
different business functions, such as finance, human resources, sales and
distribution, production planning, material management, inventory
control, quality management, and marketing.
3. Implementation - The process of installing and configuring an ERP system to meet the specific needs of an organization.
4. Project Management - The practice of planning, executing, and
monitoring a project to achieve specific goals and objectives, such as the
implementation of an ERP system.
5. Stakeholders - Individuals or groups who have an interest or a role in
the implementation of an ERP system, such as ERP vendors, consultants,
top management, and end-users.
6. Customization - The process of modifying an ERP system to meet the
unique needs of an organization.
7. Data Migration - The process of transferring data from legacy systems
to the new ERP system.
8. User Acceptance Testing (UAT) - The process of testing an ERP
system to ensure that it meets the requirements and expectations of end-
users.
9. Training - The process of educating end-users on how to use the ERP
system effectively.

12.8 FURTHER READINGS


1. Enterprise Resource Planning – Alexis Leon – Second Edition – TMH.
2. ERP in practice – Vaman – TMH.
3. Daniel E.O’Leary, Enterprise Resource Planning Systems, Cambridge
University Press, 2002.
4. Ellen Monk, Bret Wagner, Concepts in Enterprise resource planning,
Cengage learning, Third edition, 2009.
5. Manufacturing Resource Planning (MRP II) with Introduction to ERP;
SCM; and CRM by Khalid Sheikh, Publisher: McGraw-Hill
6. The Impact of Enterprise Systems on Corporate Performance: A study of
ERP, SCM, and CRM System Implementations [An article from: Journal
of Operations Management] by K.B. Hendricks; V.R. Singhal; and J.K.
Stratman, Publisher: Elsevier
7. ERP and Supply Chain Management by Christian N. Madu, Publisher:
CHI
8. Implementing SAP ERP Sales & Distribution by Glynn C. Williams,
Publisher McGraw-Hill.
9. R. Addo-Tenkorang and P. Helo, “Enterprise Resource Planning (ERP):
A Review Literature Report”, Proceedings of the World Congress on
Engineering and Computer Science 2011 Vol II, WCECS 2011, October
19-21, 2011, San Francisco, USA.

UNIT 13 APPLICATIONS OF IOT, AI AND VR

Objectives
After studying this unit, you will be able to:
• Understand the architecture of the Internet of Things and illustrate real-time IoT applications that help build a smart world.
• Understand the history of AI, explore its evolution, and comprehend what led to the impact AI has on society today.
• Gain an introduction to virtual reality (VR) and explore the different technologies, concepts, and development environments that can be used.

Structure
13.1 Introduction
13.2 Internet of Things (IoT)
13.2.1 Evolution of IoT
13.2.2 IoT Ecosystem Concepts
13.2.3 Components of an IoT Ecosystem
13.2.4 IoT Layered Architectures with Security Attacks
13.2.5 IoT Applications and Services
13.2.6 Unlocking the Massive Potential of an IoT Ecosystem for a Business
13.2.7 Key Challenges of IoT Implementation and Future Directions
13.2.8 Business Case: India (Rajkot)
13.3 Artificial Intelligence (AI) for Business
13.3.1 Historical Overview of Artificial Intelligence
13.3.2 Why is Artificial Intelligence Important?
13.3.3 Artificial Intelligence - Components and Approaches
13.3.4 What Types of Artificial Intelligence exist?
13.3.5 How do Artificial Intelligence, Machine Learning, and Deep Learning
Relate?
13.3.6 Artificial Intelligence at Work Today
13.3.7 Ethics of Artificial Intelligence
13.3.8 Future of Artificial Intelligence – Endless Opportunities and Growth
13.4 Virtual Reality (VR)
13.4.1 Introduction to Virtual Reality
13.4.2 Evolution of Virtual Reality
13.4.3 Basic Components of VR Technology
13.4.4 Applications of Virtual Reality
13.4.5 Advantages and Disadvantages of Virtual Reality
13.4.6 The Future of Virtual Reality
13.5 Summary
13.6 Self–Assessment Exercises
13.7 Keywords
13.8 Further Readings

13.1 INTRODUCTION

13.2 INTERNET OF THINGS (IOT)


13.2.1 Evolution of IoT
IoT was first conceptualised as pervasive computing or the embedded
internet in the early 1970s. Scientists saw the potential for mobile agility,
location and energy-aware applications, as well as integrated information
systems. With the creation of ARPANET, the first connected network of the modern "Internet", the development of IoT began and proceeded along a timeline of significant turning points, outlined below:

Figure 1: Evolution of Internet of Things

Year – Significant Contributions

1982 – The soda vending machine in Carnegie Mellon University's computer science department was located far from the graduate students' room, so a graduate student, with the help of two other students and a research engineer, wrote code that allowed anyone on the university's ARPANET to check the machine's inventory and the temperature of its soda bottles.

1989 – Tim Berners-Lee, an English computer scientist, developed the structure of the World Wide Web, laying the basis for the Internet.

1990 – John Romkey of MIT created a toaster that could be switched on or off online. This toaster is regarded as the first IoT gadget.

1993 – The Trojan Room Coffee Pot was created by Quentin Stafford-Fraser and Paul Jardetzky in their computer lab at the University of Cambridge. The coffee level could be checked by viewing an image of the pot's interior, uploaded to the building's server three times per minute.

1999 – Kevin Ashton coined the phrase "Internet of Things" (IoT) at MIT, referring to connecting RFIDs in a supply chain to the internet.

2003-2004 – IoT terminology started to appear frequently in publications such as The Guardian and Scientific American.

2005 – The United Nations' International Telecommunication Union discussed the implications of IoT in a report.

2008 – The first IoT conference, organized in Zurich, brought together academics and industry professionals to share information. The same year, the US National Intelligence Council named IoT one of six disruptive civil technologies.

2010 – Industry leaders came to view IoT as an emerging technology that would grow into a billion-dollar market.

2011 – A white paper published by the Cisco Internet Business Solutions Group (CIBSG) argued that IoT effectively began in 2008-2009, when connected objects first outnumbered internet users.

2012 and later – Devices that control individual things in our homes began to appear, working in conjunction with our computers and phones to share data and interact.

2020 – Industry estimates put the number of interconnected devices at around 50 billion, with the goal of improving the overall quality of life. Today, much of what we use connects to the internet.

13.2.2 IoT Ecosystem Concepts


In the modern world, electronic devices, smart devices, autonomous vehicles,
smart buildings, and so on are all around us. These actual objects are outfitted
with software that, in accordance with their intended uses and designs, offer
particular features and services. Strong communication networks enable these
physical things to communicate across geographic borders. The Internet of
Things (IoT) concept facilitates communication between millions to billions
of physically inter-connected devices over an IoT network by utilising
modern computing technologies (such as Edge computing, Fog computing
and Roof computing). IoT physical devices that interact with one another are
outfitted with sensors, device-specific embedded software, and network-
supporting components. The internet serves as a communication channel for
many physically separated entities with individual identification numbers.
Connected gadgets gather data, exchange it over the Internet or local networks, and generate analytics that help people better understand their environment.

Internet of Things = Physical Devices + Controllers, Sensors and Actuators + Internet Connectivity
The concept underlying IoT can also be illustrated as in Figure 2: the globalisation of IoT technology across the A’s (anything, anyone, any service, any path, any place, any time, etc.) and the C’s (collections, convergence, connectivity, computing, etc.) has outpaced its earlier capabilities.

Figure 2: Concept Underlying IoT Technology

13.2.3 Components of an IoT Ecosystem


IoT devices can be connected to a variety of organisations, including
enterprises, governments, and consumers. Components of an IoT ecosystem
include:

1. Sensors – These electronic devices perceive the physical and mechanical


environment, convert it into a signal suitable for processing (optical,
electrical, or mechanical), and create useful data that is sent to the
Internet via network technologies. Sensors can be used in everyday
physical objects, public infrastructure, transportation systems, and
machinery found in commercial buildings and industries. Biological,
biometric, environmental, visual, auditory, or any combination of these
types of sensing are all possible. Sensors may be physically hardwired,
built into the product, or communicate via a short-haul communication
protocol. Examples of sensors include temperature sensors, light sensors,
moisture sensors, GPS receivers, vehicle on-board diagnostics etc.

2. Actuators – The actuator is the part of the Internet of Things that acts. Sensors collect and compile data in order to deliver signals or commands to the actuator, which then reacts to the command or signal and "acts", causing something to occur as a result. As an illustration, sensors are employed to track changes in temperature in offices; when a change is detected, they send a signal to the actuators, which then automatically adjust the airflow. Examples of actuator tasks include locking/unlocking doors, switching lights or other electrical appliances on and off, alerting users to threats through alarms or notifications, and controlling the temperature of a home (via a thermostat).
3. Long-haul Communication – Data gathered at the device level is retransmitted through short-haul communication to a cloud-based service for additional processing. IoT systems usually need long-distance connectivity (long-haul communication) to a cloud-based application so that stakeholders can consume the information. These long-haul protocols have different goals than short-haul ones, particularly in the areas of security, footprint, and dependability. Depending on the use case, there are many long-haul options for IoT systems, including cellular and satellite, WiFi and wired Ethernet, as well as sub-gigahertz radio. Like short-haul communication, long-haul communication uses a variety of networking protocols, including TCP (Transmission Control Protocol) and UDP (User Datagram Protocol) at the transport layer and HTTP (Hypertext Transfer Protocol) and CoAP (Constrained Application Protocol) at the application layer (a minimal device-to-cloud sketch appears after this list).
4. IoT Cloud – Data flows to the cloud for processing and storage after
passing via the IoT protocols and gateway. In order to quickly decide
what action should be taken in response to the data gathered and signals
received, data is then leveraged to deliver real-time analytics.

5. IoT Analytics and Data Management – The enormous amount of digital data (also known as big data) produced by billions of inter-connected sensor devices is often stored in the digital realm using cloud computing services over the Internet. From these massive data streams,
advanced analytics helps produce actionable insights and real-time
solutions that can be used for effective decision making. To assist IoT
deployments in practical applications, a number of large data processing
tools, effective databases, streaming analytics engines, and platforms
have been created. Deep learning, crowd analytics, anomaly detection
engines, tracking algorithms, and pattern recognition and detection
methods are a few common analytical tools. Additionally, models of
artificial intelligence have been created for people to engage with IoT
technology. Other methods that can be taken into consideration for IoT
interactions with people include augmented reality and virtual reality.
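The components above work together as a simple sense–decide–publish loop: a sensor reading is taken, a local rule may drive an actuator, and the reading travels over a long-haul protocol to the IoT cloud for analytics. The Python sketch below illustrates that flow in miniature under stated assumptions: the cloud endpoint URL and the read_temperature()/set_fan() helpers are hypothetical placeholders rather than any vendor's API, and the HTTP POST is only one example of the application-layer protocols mentioned in point 3.

import json
import random
import urllib.request

CLOUD_ENDPOINT = "https://example-iot-cloud.invalid/readings"  # hypothetical URL
FAN_ON_THRESHOLD_C = 28.0

def read_temperature() -> float:
    # Stand-in for a real temperature sensor driver (returns a simulated value).
    return round(random.uniform(20.0, 35.0), 1)

def set_fan(on: bool) -> None:
    # Stand-in for an actuator driver, e.g. a relay that switches a fan.
    print("Fan switched", "ON" if on else "OFF")

def publish(reading: dict) -> None:
    # Long-haul step: send one JSON reading to the cloud service over HTTP POST.
    request = urllib.request.Request(
        CLOUD_ENDPOINT,
        data=json.dumps(reading).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        print("Cloud responded with HTTP", response.status)

if __name__ == "__main__":
    temperature = read_temperature()            # 1. sense
    set_fan(temperature > FAN_ON_THRESHOLD_C)   # 2. actuate on a local rule
    publish({"device_id": "room-42", "temperature_c": temperature})  # 3. publish

In a real deployment the same pattern would typically run on a constrained device, use MQTT or CoAP instead of plain HTTP, and authenticate to the cloud service.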

IoT Components

Sensors –
• Mobile phone-based sensors: Because of the embedded sensors included in smartphones, which are becoming more and more popular, researchers are interested in building smart IoT solutions around them.
• Medical sensors: Used to measure and keep track of the body's different medical parameters. Smart watches, wristbands, monitoring patches, and smart fabrics are examples of such wearable technology.
• Neural sensors: Used to improve mental health and train the brain to concentrate, pay attention to detail, handle stress, and manage emotions.
• Environmental and chemical sensors: Employed to detect the presence of gases and other airborne particulates. Chemical sensors are used to assess food and agricultural items in supply chain applications, monitor food quality in smart kitchens, and track pollution levels in smart cities.
• Radio frequency identification (RFID): Employed in supply chain management, access control, identity authentication, object tracking, and many other applications to draw conclusions and trigger further action.

Actuators –
• Hydraulic actuators facilitate mechanical motion using fluid or hydraulic power.
• Pneumatic actuators use the pressure of compressed air.
• Electrical actuators use electrical energy.

Things – Physical/virtual objects.

Communication Technologies – IEEE 802.15.4, low-power WiFi, 6LoWPAN, RFID, NFC, Sigfox, LoRaWAN, and other proprietary protocols for wireless networks.

Middleware – Oracle Fusion Middleware, OpenIoT, MiddleWhere, and Hydra.

Applications of IoT – Home Automation, Smart Cities, Social Life and Entertainment, Health and Fitness, Smart Environment and Agriculture, Supply Chain and Logistics, and Energy Conservation (a toy analytics sketch follows this table).
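The IoT analytics and data management component described earlier often begins with simple streaming rules before heavier models (deep learning, pattern recognition) are applied. The short Python sketch below is a toy illustration of the anomaly-detection idea only: it flags any reading that deviates strongly from the batch average using a z-score rule. The temperature values and the threshold are invented for demonstration and are not drawn from any real deployment.

from statistics import mean, stdev

# Invented temperature readings (degrees Celsius); one value is deliberately odd.
readings = [21.8, 22.1, 21.9, 22.0, 22.3, 21.7, 35.4, 22.2]
Z_THRESHOLD = 2.0   # how many standard deviations count as "anomalous"

avg = mean(readings)
spread = stdev(readings)

for value in readings:
    z_score = (value - avg) / spread
    if abs(z_score) > Z_THRESHOLD:
        print(f"Anomaly detected: {value} deg C (z-score {z_score:.1f})")

Production systems would normally compute the baseline over a rolling window and combine such rules with learned models, but the underlying idea of comparing each reading against an expected range is the same.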

13.2.4 IoT Layered Architectures with Security Attacks


There isn't a single, universal IoT design that researchers and people
throughout the world agree upon. Researchers have proposed a wide variety
of architectures. Some academics claim that IoT design has three layers,
while others advocate for a four-layer architecture. Five-layer design has also
been suggested as a solution to the security and privacy issue that the Internet
of Things faces. It is believed that a recently proposed architecture can
satisfy the IoT's security and privacy needs.

Three-Layer Architecture
It is an extremely straightforward design that complies with the core principles of the Internet of Things. In the early stages of IoT development, three layers were advised: perception, network, and application.


Figure 3: Three-layer Architecture

Perception Layer – It is also known as a sensor layer. This layer includes


wireless gadgets, sensors, and radio frequency identification (RFID) tags that
are used to gather and transmit unprocessed data, like temperature and
moisture levels, to the following layer. Based on the requirements of the
applications, the sensors are chosen. Common security threats of perception
layer are:

• Eavesdropping: An illegal real-time attack in which an attacker


intercepts a victim's private, insecure conversations.

• Node Capture: Attacker takes total control of a gateway node or a key


node through node capture. Any information, including communications
between sender and receiver, may be leaked.

• Fake Node and Malicious Data: To prevent the transmission of genuine information, the attacker adds a node to the system and injects false information.

• Timing Attack: Allows an attacker to identify security flaws and obtain sensitive data held by a system by keeping track of how quickly the system responds to various inputs, queries, or cryptographic operations.

Network Layer – Between the application layer and the perception layer is
the transmission layer/network layer that moves and sends the data gathered
from actual objects via sensors. The transmission medium might be either
wireless or wired technology. Additionally, it takes on responsibility for
linking networks, network devices, and intelligent objects. It has significant
security weaknesses that compromise the authenticity and integrity of the
data being exchanged over the network. Typical security concerns that impact the network layer include the following:
• Denial of Service (DoS) Attack: An attack against legitimate users of devices or other network resources. It prevents users from using the targeted devices or network resources by flooding them with repetitive requests.

• Man-in-the-Middle (MiTM) Attack: Occurs when the attacker deceives the sender and receiver by altering messages in order to take control of the communication. The attacker records and manipulates information in real time, which poses a severe danger to Internet security.
• Storage Attack: Attackers may target users' data kept on storage devices
or in the cloud, by changing user information to inaccurate specifics.

• Exploit Attack: The attacker makes use of security flaws in software, hardware, or operating systems, and frequently attempts to take control of the system and steal information stored on a network.

Application Layer – This layer hosts all IoT-deployed applications. It is responsible for providing services to the applications based on the data collected by sensors. At the application layer there are a number of difficulties, security being a vital one. When IoT is used to build a smart home in particular, it presents various risks and weaknesses from both inside and outside. The devices used in smart homes, such as those based on ZigBee, have inadequate processing capability and limited storage, which is one of the major obstacles to establishing comprehensive security in an IoT-based smart home.
Common vulnerabilities in application layer security include:

• Cross-Site Scripting: An injection attack in which the attacker unlawfully inserts a client-side script into a trusted website and can fully alter the application's contents to suit their needs.
• Malicious Code Attack: Contains software code that is meant to harm
the system and have unintended consequences.
• Inability to Handle Mass Data: Because of the large number of devices and the massive amount of data transmitted between users, the layer may be unable to process data according to requirements, resulting in network disturbance and data loss.

Four-Layer Architecture
The three-layer architecture was unable to meet all IoT requirements as the field continued to develop, so researchers suggested a four-layer architecture. It contains the same three layers as the prior architecture, with an additional fourth layer known as the support layer. The three layers have the same functionality as in the three-layer architecture discussed above.


Figure 3: Four-Layer Architecture

Support Layer - Due to flaws in three-layer architecture, a new layer is


proposed. Data from a perception layer is transmitted to the support layer in a
four-layered IoT architecture system. It confirms that information is sent by
authorised users and is secure from assaults. Using an authentication
technique, the users and the data can both be confirmed. It's implemented
using pre-shared passwords, keys, and secrets. Using wired and wireless
transmission methods, the support layer must additionally communicate with
the network layer. This layer may be affected by assaults such as DoS
attacks, malicious insider attacks, unauthorised access, etc. Risks and
challenges with the support layer include:

• Denial of Service (DoS) Attack: The DoS attack on the support layer is similar to the one on the network layer. An attacker sends a large amount of data to overwhelm the network; the intensive use of system resources exhausts the IoT system and the user can no longer access it.

• Malicious Insider Attack: Targets the IoT environment and tries to gain access to users' personal information. It is carried out by an authorised user
to gain access to data that belongs to another user. Numerous defences
are required to eliminate the threat because it is such a distinctive and
sophisticated attack.

Five-Layer Architecture
In the four-layer architecture, there were certain additional security and
storage concerns. To make the IoT secure, researchers suggested a five-layer
architecture. Similar to earlier architectures, it has three layers: the perception
layer, the transport layer, and the application layer. Two further layers are proposed, namely the processing layer and the business layer. The recently
proposed architecture has the capacity to safeguard IoT applications. The
following describes how these layers function and how security attacks may
affect them:
Processing Layer – The processing layer is also known as the middleware
layer. Data transmitted from the transport layer is collected and processed by
it. It is in charge of eliminating unwanted, useless data and extracting
important information. It also addresses the IoT big data dilemma. Numerous risks may hamper IoT performance by affecting the processing layer; typical attacks include:
• Exhaustion: An attacker employs exhaustion to hinder IoT structure
processing. It happens as a result of attacks, such as a DoS attack, in
which the attacker floods the target with requests in an effort to disable
the network for users. It can be the result of previous attacks intended to
deplete system resources like the battery and memory. IoT is distributed
in nature, thus there aren't many risks associated with it. It is
significantly simpler to put protective measures in place against it.

• Malware: An attack on the confidentiality of users' personal information. It describes the use of worms, Trojan horses, spyware, adware, and viruses to interfere with the system, and it appears in the form of scripts, executable code, and other content designed to steal confidential information.
Business Layer - The business layer manages the entire system and refers to
an application's intended behaviour. Its duties include managing and
controlling IoT applications, business models, and revenue streams. This
layer is also in charge of managing user privacy. It can also decide how
information can be produced, saved, and modified. By ignoring the business
logic, a vulnerability at this layer enables attackers to abuse an application.
The majority of security issues are flaws in an application that arise from a
faulty or absent security control. The following are typical issues with
business layer security:
• Business Logic Attack: Exploits a programming error in the business logic that regulates and oversees the information flow between a user and an application's supporting database. Common weaknesses at the business layer include incorrect programming, flawed password-recovery validation, weak input validation, and weak encryption methods.

• Zero-Day Attack: This term describes a security flaw or issue with an


application that is unknown to the vendor. Without the user's knowledge
or agreement, the attacker uses this security flaw to seize control.

Figure 4 depicts the hierarchy of all proposed layer architectures for the
Internet of Things (IoT), with three, four, and five layers, respectively.


Figure 4: Layered Architecture of IoT (three, four and five layers)

13.2.5 IoT Applications and Services


IoT devices are booming, as they improve life across business, society, and for individuals who use the Internet to automate their tasks. Reports have projected around 50 billion Internet-connected devices in use, more than three times the number of IoT devices in 2012. Many applications use IoT technology; a few include:

• Healthcare Domain - IoT sensors are being utilised to continuously


record and monitor health situations (person's heart rate, blood pressure,
blood oxygen level, and body temperature etc.) and send out alerts if any
unusual symptoms are discovered. An Electronic Health Record (EHR),
which is a record of all a person's medical information, can be made
using IoT technologies. College students can monitor their level of stress
using stress recognition apps that are put on their smartphones. To track
a person's level of fitness, fitness trackers are a popular category of
wearable technology. A Bluetooth-enabled toothbrush keeps track of
how often you brush. It benefits people who are elderly or have
disabilities. Family members are immediately notified in the event of an
emergency.

• Smart Homes - People use a range of electrical items at home. To


provide customers with automated and intelligent services, the Internet
of Things makes use of a range of sensors. They help with schedule
maintenance and automation of daily tasks. They contribute to energy
conservation by automatically turning off lights and electronic
equipment. A smart refrigerator with an LCD can prompt you to purchase groceries and products whose expiration dates are approaching. Smart home systems are also intended to notify owners when they detect anomalies, and researchers have already begun working on leveraging AI and machine learning methods to accomplish this.

• Animal Tracking: To quickly track down an animal, GPS sensors are


incorporated into its body. It is also employed to keep an eye on the
animal's diet. Sensors are implanted in the ears of cattle, allowing
farmers to monitor cows’ health and track their movements. On average,
each cow generates about 200 MB of information per year.
• Smart Transport – Smart transport applications can regularly manage
city traffic using sensors and advanced information processing
technologies. Intelligent parking management, smart traffic lights, and
accident prevention by effectively rerouting traffic and recognising
drunk drivers are among the primary objectives of intelligent
transportation systems.
• Smart Water Systems – Given the pervasive water scarcity that prevails
in the majority of the world, it is imperative to manage our water
resources effectively. Smart water metering systems are used in
conjunction with weather satellite data to evaluate water intake and
outflow rates and search for probable leaks. They can also help with
flood forecasts.

• Smart Environment and Agriculture – The current state of the planet's


climate and declining air quality is a big problem. To improve crop
productivity and quality, environmental factors such as temperature and
soil information are regularly measured and analysed. Manna's proposed IoT application tracks roadside sources of air pollution that emit high amounts of pollutants.

• Smart Supply Chain and Logistics - IoT aims to make business and
information system procedures in the real world simpler. Using sensor
technologies like RFID and NFC, it is simple to trace the commodities in
the supply chain from the point of manufacture to the final distribution
points. Real-time data is captured, processed, and tracked. The
effectiveness of supply chain systems will ultimately be improved by
doing this.

• Energy Conservation - Modern power generation, transmission,


distribution, and consumption systems are made possible by smart grid
technologies. Smart grids allow for the two-way flow of power and add
intelligence at every stage (back from the consumer to the supplier).
Power usage patterns at both regular and peak load hours are read by smart meters and analysed. Power generation is then adjusted based on consumption trends, and users can in turn modify their consumption to cut expenditure.
• Using GPS Tracking to Safeguard Wildlife – IoT offers cutting-edge solutions for wildlife protection. Real-time tracking of equipment, people, and animals is done through sensors. Communication networks reach wild regions, control centres collect the data streams, and timely management actions are carried out with the aid of machine learning (ML) and artificial intelligence (AI). With an emphasis on keystone species that are crucial to preserving the diversity and functionality of ecological communities, Smart Parks is a social enterprise with offices in the Netherlands and the UK that offers digital solutions for wildlife protection.
• In addition, there are many other IoT applications, including smart retail, smart buildings, smart agriculture, and smart infrastructure management (including roads, bridges, and railroad tracks).

13.2.6 Unlocking the Massive Potential of an IoT Ecosystem for a Business
One essential element of the IoT ecosystem, which is a complex system, is
the interconnection of many different systems. The information presented
above might not make it clear how an IoT ecosystem could benefit businesses
and add value. Among the advantages of an IoT ecosystem for a firm are:

1. Offers the possibility of new revenue streams - Opportunities to


provide new services that weren't previously required may arise, creating
the possibility of new revenue streams for many firms.
2. Encourages efficiency - An IoT ecosystem gives organisations the
ability to automate and streamline procedures, boosting their productivity
and that of their staff in the process. Businesses are able to complete
tasks faster, with fewer resources and more automated processes, which
also lowers the risk of human error.
3. Better business insights and customer experience - The IoT ecosystem
functions swiftly, making real-time data, analytics, and insights
accessible. These insights can be used to improve operations, product,
and service offerings, which in turn affects the customer experience.

13.2.7 Key Challenges of IoT Implementation and Future Directions
Despite having significant economic advantages, the IoT also has a number
of significant challenges.

• Inadequate Management - Because of poor management, IoT-based


applications are at a disadvantage. Developers prioritise using sensors to
retrieve pertinent data from objects, but are not concerned about the
method adopted to obtain the data. As a result, attackers access user
information and use it for their own needs. So, developers must adjust
their objectives and focus on the information gathering process.

• Naming and Identity Management - Each device needs a unique


identity in order to communicate within the network. It is therefore
necessary to develop a method to dynamically assign each object in the
network with a distinct identification. When IoT first began, IPv4 was
used to determine each network's distinct identification. As the number
of IoT-based devices has increased, IPv6 is now utilised to assign device identification.

• Trust Management and Policy - Trust is a crucial and intricate idea. It


requires a variety of other qualities as well, such as scalability,
reliability, strength, and availability, in addition to security. Its scope is
wider than security. IoT applications receive private information from
the users. In order to maintain privacy, users' personal data must be
secure and off-limits to third parties. Although researchers have presented a variety of strategies in their research papers to promote privacy and trust, these methods have not yet been able to give IoT applications adequate privacy and confidence. These remain significant IoT concerns that need to be resolved.
• Big Data - Internet of Things is currently made up of billions of
connected devices. Information generated by these gadgets is enormous.
IoT's challenge is the transmission and processing of large data. As a
result, there is a need for a method that can address the large data
problem.

• Information security is a difficult task with the Internet of Things. In


order to complete their tasks, users send private information. The user's
confidential information is accessible to a wide range of attackers.
Therefore, there should be systems in place to protect user information
so that hackers cannot access it.
• Secure storage has also grown to be a problem in the Internet of Things. Information is extracted from objects using sensors and transmitted to storage units, but existing countermeasures alone cannot guarantee the security of the storage devices. Therefore, there needs to be a system to shield the data from outside surveillance or intruders.

• Authentication and Authorization - Authenticating users can be done


in a variety of methods. The conventional method is to utilise a login and
password; alternative methods include access cards, voice recognition,
fingerprints, and retina scans. The access control can also be defined to
accomplish authorization. It is a security method that can be used to
manage and regulate who or what can access a system's resources. The
network has grown complicated since there are so many things in it. As a
result, the huge network cannot be secured using conventional
authentication and authorisation methods. Even though research has tried
to address the authentication and authorisation problems, some problems
still persist.

• Secure Network - The network layer is subject to numerous attacks,


including man-in-the-middle and denial of service (DoS) attacks. A
denial-of-service (DoS) attack is a security occurrence that occurs when
an attacker takes action to bar authorised users from accessing systems,
devices, or other network resources. A cyberattack known as a "man-in-
the-middle" occurs when an attacker sneakily intercepts and relays
messages between two parties who believe they are speaking directly to
one another. As a result, a network layer should have some security-related procedures.
13.2.8 Business Case: India (Rajkot)
(RSCDL (smartcityrajkot.in))

The Indian government has created a nationwide digitization strategy called


Digital India. Broadband Highways, Universal Access to Mobile
Connectivity, Public Internet Access Program, e-Governance, Electronic
Delivery of Services, Information for All, Electronics Manufacturing, IT for
Jobs, and Early Harvest Programs are the main priority areas that the strategy
focuses on. Center of Excellence for IoT in India (CoE IoT), a government-
run IoT incubator, "aim[s] to develop a Startup Ecosystem for the Nation." Its
goal is to utilize the startup community to develop cutting-edge apps and
acquire domain expertise in the IoT sector with the help of the business sector
and academia. The financing and equipment from businesses like Cisco, TI,
and Qualcomm, as well as matching mentoring, lab space, and research
assistance from organizations like the Indian Institute of Technology, all
contribute to the support for IoT development. The incubator, which focuses on robotics, agritech, healthtech, smart city, and Industry 4.0 products, has space for up to 40 start-ups and 10 emerging businesses. The incubator only started operating in July 2016, so there are not yet any concrete success stories to report.

A draft IoT policy has been prepared by the Indian government (107). The strategy consists of five vertical pillars (Demonstration Centres, Capacity Building and Incubation, R&D and Innovation, Incentives and Engagements, and Human Resource Development) in addition to two horizontal supports (Standards, and Governance Structure).
According to NASSCOM, the trade group for the IT sector in India, the IoT
market there is expected to reach $15 billion by 2020, with around 120
companies presently providing solutions. Smart cities, industrial IoT, and health care are identified as three of the most important growth opportunities involving government-business partnerships.

Rajkot, in the Gujarat state, was chosen by India's Ministry of Urban


Development (MoUD) as one of the 100 cities taking part in its smart cities
initiative. Rajkot has adopted smart initiatives in a number of sectors,
including water supply, solid waste management, and e-governance. With an
emphasis on infrastructure and ICT developments in the city and at important
sites, the city's vision under its smart city proposal includes a number of pan-
city and area-based development initiatives. The development of public
safety and surveillance, traffic control, the calibre of public services, and
real-time service tracking are among the goals of the proposed ICT
initiatives. Many of these projects have IoT components. A 200-kilometer
fiber-optic network, CCTV network, environmental and other related IoT
sensors, and numerous Wi-Fi access points are among the private sector
options that the city's municipal corporation is now examining. The city
wants to provide businesses access to the infrastructure so they may use it
and benefit from it. In collaboration with the business sector, a number of
IoT-based projects on water conservation, solar-powered heating, and climate
change-related initiatives are being developed.

Check your Progress 1 (Answer in about 200 words)


o Space is given below for your answer.
o Check your answer with the one given at the end of this unit.
1. What are the components of an IoT Ecosystem?
…………………………………………………………………………….
…………………………………………………………………………….
…………………………………………………………………………….
…………………………………………………………………………….
2. Explain five-layer architecture of an IoT system.
…………………………………………………………………………….
…………………………………………………………………………….
…………………………………………………………………………….
…………………………………………………………………………….
3. What is the potential of an IoT ecosystem for a business?

…………………………………………………………………………….

…………………………………………………………………………….

…………………………………………………………………………….
…………………………………………………………………………….

13.3 ARTIFICIAL INTELLIGENCE (AI) FOR BUSINESS
13.3.1 Historical Overview of Artificial Intelligence
High aspirations for significant increases in productivity and efficiency were
held when artificial intelligence first came into being. Nevertheless, despite
investing enormously in technology, project after project came to a standstill,
mostly because of problems with business strategies, technological challenges, and cultural obstacles that made it impossible to fully harness the power of AI.

Over the past few years, AI has grown into a powerful technology that
enables machines to think and act like humans. It has also caught the eye of
IT companies all around the world and is regarded as the third significant
technical innovation after the creation of mobile and cloud platforms. Even
some people use the term "fourth industrial revolution" to describe it. Over
the past ten years, businesses have migrated in increasing numbers to internet
platforms and cloud service providers. As a result of these advancements,
computers are now able to process far more data, and a large amount of fresh
data has also been produced that these systems can now analyse.

Computer science's field of artificial intelligence (AI) attempts to make


computers into intelligent machines. With the use of sophisticated
algorithms, artificial intelligence (AI) and associated technologies like
machine learning (ML) are able to replicate human behaviour, offer answers
to challenging issues, and produce stimulations.

Some believe that the fourth industrial revolution, which will differ from the
past three in some ways, is about to start. The notion of what it means to be a
human has been debated throughout history, beginning with the development
of steam and water power and continuing through the industrial revolution,
the computer era, and the current day. Smarter technology in our offices and
factories, as well as networked equipment that will interact, view the entire
process, and make autonomous decisions, are just a few of the ways the
Industrial Revolution will benefit organisations. One of the key benefits of
the fourth industrial revolution is the ability to increase income levels and
enhance living standards for the majority of people worldwide. As the quality
of life for the world's population is improved by humans, robots, and smart
devices, our businesses and organisations are becoming "smarter" and more
productive.

One of the earliest articles on artificial neural networks is the 1943 study by
Warren McCulloch and Walter Pitts, which formalised how fundamental
logical operations from propositional logic may be computed in a
connectionist setting. Later, in 1950, Alan Turing raises the topic of whether
computers are capable of thought and offers the "imitation game" as a
method of evaluating the deductive reasoning and thought processes of
computing apparatuses. During a workshop for the Dartmouth Summer
Research Project on Artificial Intelligence in 1956, John McCarthy coined
the term artificial intelligence (AI). The first chatbot, ELIZA, was created in 1964 by Joseph Weizenbaum in the MIT Artificial Intelligence Laboratory. ELIZA was designed as an artificially intelligent, rule-based psychiatrist that could answer queries from users.

In 1964, Evans showed that a computer could solve the kind of geometry analogy problems that frequently show up in IQ tests. With extensive research favouring the field, DARPA (the American Defense Advanced Research Projects Agency) began funding several AI-related initiatives from the middle of the 1960s onward, mainly at MIT. Science fiction fully embraced the AI paradise with the release of well-known films such as Stanley Kubrick's "2001: A Space Odyssey" (with a screenplay by Arthur C. Clarke) in 1968 and Jean-Luc Godard's 1965 French film "Alphaville". The inaugural International Joint Conference on Artificial Intelligence (IJCAI) was held at Stanford University in 1969. Stanford AI research, aiming at a "computer capable of fascinating perceptual-motor action" (Feldman et al.), produced Nilsson's mobile automaton work and Green's QA3 software programme to handle real-world problems for a basic robot. Inductive logic programming and statistical relational learning, two approaches later utilised in Watson, IBM's question-answering computer system, were developed using Prolog, a well-known programming language created by Alain Colmerauer and Robert Kowalski. The Lighthill Report evaluated 25 years of AI results and gave the state of AI research a critical review. Early in the 1980s, the results of AI research attracted outside attention again, this time mostly because expert systems were becoming more and more popular in business. In 1982, Hopfield showed that the "Hopfield network", a recurrent form of artificial neural network, could learn more effectively than simpler A.N.N.s. In 1986, Rumelhart et al. showed the potential of "backpropagation" in (neural) machine learning models. Criticism of the use of expert systems in real-world applications led to decreasing funding for AI research (from the late 1980s to the early 2000s). In 1994, the TD-Gammon programme showed the promise of reinforcement learning by creating a backgammon program that could teach itself. The biggest victory came in 1997, when IBM's Deep Blue defeated Garry Kasparov in a match series conducted in accordance with tournament rules.

The Roomba robot vacuum was introduced by iRobot in 2002. The AI-
based personal assistants Siri, Google Assistant, Alexa, Cortana, and Bixby
from Apple, Google, Amazon, Microsoft, and Samsung are each better at
understanding natural language and capable of carrying out a larger range of
tasks. These technologies have improved over the past 10 years, despite the
fact that they initially did not work very effectively. Most of the focus on this
topic has been on Deep Learning since 2000. The Artificial Neural Network
(ANN) concept, a system created to mimic how brain cells work, is the
foundation of deep learning. Some of the most significant recent
achievements are based on Generative Adversarial Networks (GANs). After
defeating Lee Sedol in four of five games in 2016, Google's AlphaGo
defeated Chinese Go expert Ke Jie in a series of games in 2017. In a 1v1
exhibition match, OpenAI's Dota 2 bot triumphed against Dendi, a Ukrainian
professional player, in 2017. In a game with little information and practically
infinite possible future possibilities, this triumph was a noteworthy
demonstration of power. Later in 2019, OG, the current Dota world
champions, were defeated in back-to-back 5v5 games by a new version of the
same bot called OpenAI Five. In 2019, DeepMind's AlphaStar bot earned the highest rating possible in StarCraft II.

Research on neuro-symbolic AI and hybrid models, which integrate learning


and reasoning in a systematic fashion, is becoming more and more popular in
academia and business. Sepp Hochreiter (2022), a forerunner in deep learning and proponent of the Long Short-Term Memory (LSTM), one of the most popular deep learning models, also offers his thoughts on a more all-encompassing AI, which "is a sophisticated and adaptive system that can complete any cognitive task successfully thanks to its sensory perception, prior knowledge, and acquired skill." The most promising route to such a broad AI, according to Hochreiter, is "a neuro-symbolic AI, which is a bilateral AI that includes approaches from symbolic and sub-symbolic AI." Graph neural networks (GNNs) can be essential for the development of neuro-symbolic AI technologies; GNNs are the most widely used models for neural-symbolic computing, claim Lamb et al. (2020). It is clear that the field of
artificial intelligence (AI) has grown in significance over the past few
decades, even though specialists and researchers still have much to
accomplish and demonstrate. The importance and growth of AI ethics, laws,
and regulations as well as the yearly reports of the global impact of AI
generated by a number of leading research organisations are all evidence of
the impact of AI.

13.3.2 Why is Artificial Intelligence Important?


• AI differs from hardware-driven, robotic automation in that it automates
repetitive learning and discovery through data. AI executes regular, high-
volume, computerised jobs reliably & without becoming tired rather than
automating manual operations.

• AI enhances the intelligence of currently available goods - In most cases,


AI is not marketed as a standalone application. Instead, existing products
are enhanced with AI capabilities (Siri - added feature of Apple
products). Large volumes of data are paired with automation,
conversational platforms, bots, and smart robots to improve various
technologies used at home and at work.
• AI adapts through algorithms for progressive learning - AI recognises
patterns and regularities in data and learns a skill (classifier or predictor)
so that the models change in response to new data.
• AI analyses more and deeper data - Deep learning models, which use
neural networks with many hidden layers, require a large amount of data
to be trained. They grow more precise the more information we offer
them.

• Deep neural networks enable AI to reach astounding accuracy - Deep


learning is used in interactions with Alexa, Google Search, Google
Photos, and other services. They become more precise the more we use
them. With the same accuracy as highly skilled radiologists, deep
learning AI approaches for image classification and object recognition
are employed in the medical profession to detect cancer on MRIs.

• When algorithms are self-learning, the answers are in the data and AI is applied to extract them; this maximises the value of the data. Even when everyone is using the same methods, the best data can provide a competitive edge: the best data prevails.

13.3.3 Artificial Intelligence - Components and Approaches


Rapid, iterative processing of large amounts of data is made possible by AI's
clever algorithms. A machine learning algorithm automatically picks up on
patterns and features in data. The following are some of the main artificial
intelligence (AI) subfields:
• Machine learning automates the creation of analytical models. Without explicit programming, it uses methods from neural networks, statistics, operations research, and physics to find hidden patterns in data.

• A neural network is a type of machine learning technology that consists


of interconnected, information-processing units (much like neurons) that
talk to one another and respond to outside stimuli. The process comprises
numerous repetitions of the data in order to find linkages and derive
meaning from undefined facts.
• Deep learning uses powerful neural networks with numerous layers of
processing units, improved training techniques, and computing power to
find intricate patterns in enormous amounts of data. Examples of typical
uses include image and voice recognition.

• Making interactions with machines look natural and human is the goal of
cognitive technology, a subfield of artificial intelligence. The ultimate
objective of artificial intelligence (AI) and cognitive computing is for a
machine to emulate human processes by being able to comprehend
words and sights and then respond logically.
• Computer vision makes use of deep learning and pattern recognition to
determine what's in a picture or video. Real-time photos and videos can
be taken by machines that can perceive, analyse, and interpret visuals, as
well as their surroundings.
• Natural language processing (NLP) is the process by which computers analyse, comprehend, and produce human language, including speech. The next phase of NLP, known as natural language interaction, enables users to communicate with computers in everyday language to complete tasks (a short numerical sketch of the neural-network idea follows this list).
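To make the neural-network idea above concrete, the short Python sketch below (assuming the NumPy library is available) computes a single forward pass through one hidden layer: inputs are weighted and summed, passed through an activation function, and turned into an output between 0 and 1. The weights, biases, and input values are made-up numbers chosen purely for illustration; in practice they would be learned from data.

import numpy as np

def sigmoid(x):
    # Squash a value into the range (0, 1); a common activation function.
    return 1.0 / (1.0 + np.exp(-x))

# Three input features for one example (made-up values).
inputs = np.array([0.5, 0.1, 0.9])

# Illustrative weights and biases; real networks learn these from training data.
hidden_weights = np.array([[0.2, -0.4, 0.7],
                           [0.6,  0.1, -0.3]])   # 2 hidden units x 3 inputs
hidden_bias = np.array([0.0, 0.1])
output_weights = np.array([1.5, -2.0])           # 1 output unit x 2 hidden units
output_bias = 0.05

# Forward pass: each layer combines its inputs, then applies the activation.
hidden = sigmoid(hidden_weights @ inputs + hidden_bias)
output = sigmoid(output_weights @ hidden + output_bias)

print("Hidden activations:", hidden)
print("Network output:", output)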

13.3.4 What Types of Artificial Intelligence Exist?


Super AI, general (broad) AI, and narrow AI are the three main kinds of AI discussed in the literature on artificial intelligence. These three types differ most in their capabilities and flexibility.

• Artificial Narrow Intelligence - Is developed to deal with a very specific


set of problems. For instance, Google's Waymo is a self-driving
automobile, Amazon Alexa can respond to straightforward voice
commands, and Roombas can clean your floors. Each time, the agent is
performing a limited number of tasks while operating inside a
constrained knowledge base. Artificial Narrow Intelligence is most
commonly abbreviated ANI/ Narrow AI/ Applied AI/ Weak AI. This
type of AI already exists, is rapidly developing, and is fundamentally
changing our reality.

• Artificial General Intelligence (AGI) - General-purpose intelligence that


can handle a variety of problems in the future. In literature, television,
and film, sentient robots like R2-D2 from Star Wars, Data from Star
Trek, and HAL 9000 are frequently referenced. In each scenario, the
agent is capable of dynamically completing a far greater range of tasks across diverse knowledge domains. The acronyms General AI/AGI/Strong AI/Full AI are commonly used to describe artificial general intelligence. This type of AI is still some way off; it does not exist yet. However, it is a common objective and the general direction of many AI research programmes.

• Super Artificial Intelligence - Future-focused AI that is more intelligent


than human (or natural) intellect. These are frequently the supervillains
in sci-fi movies. Think about Skynet from The Terminator, Ultron from
The Avengers, and The Architect from The Matrix as examples. Every
time, the agent is able to perform superhuman feats that are impossible
for one human to accomplish. It is anticipated that shortly after we
develop artificial general intelligence, artificial super intelligence will
emerge. This is because generic AI is unrestricted by the biological
elements that have an impact on physical brains and can recursively
improve. If super AI ever existed, it would surely vastly surpass any
forms of human intelligence that exist today on Earth (combined).

13.3.5 How do Artificial Intelligence, Machine Learning, and Deep Learning Relate?
Artificial intelligence, deep learning, and machine learning are integrating
technology into every aspect of our daily life. Businesses rely on learning
algorithms more and more to streamline processes. You encounter them on social media or when interacting directly with devices. These technologies
usually involve artificial intelligence, machine learning, deep learning, and
neural networks. Unfortunately, even though they all have a purpose, they are
commonly used interchangeably, which leads to misunderstandings about the
nuances of each.

Artificial intelligence is the study of how to develop intelligent machines and


robots that have the capacity for critical and creative thought, a capacity long regarded as distinctively human. Artificial intelligence (AI) has a
subset called machine learning that enables computers to develop naturally
based on their experiences without the need for explicit programming. There
are numerous ML algorithms available for issue solving (e.g., neural
networks). Deep learning is a subset of machine learning that uses neural
networks and a framework inspired by the human nervous system to analyse
a wide range of factors. This collection of algorithms takes design cues from
the way the human brain is organised. The techniques employ complex,
multi-layered neural networks that abstract non-linear input-data changes
over time. To put it another way, machine learning is the branch of artificial intelligence that deals with such learning; neural networks, a branch of machine learning, provide the structure for deep learning algorithms. The difference lies in depth: a deep learning model is a neural network with more than three layers of nodes.


Machine learning in Detail


Machine learning makes use of statistical algorithms to build predictive
models based on prior knowledge and discoveries. Applications that use
machine learning process a lot of data and build a robust database by learning
from successes and failures. Machine learning can be divided into three areas.
1. Supervised Learning - This method learns under supervision from a training set of input variables (x1, x2, …) and an output variable (y), estimating a mapping y = f(x). The goal of the model is to learn an association between the inputs and outputs that can be adjusted using training data and then applied to new data with some degree of accuracy. Gradient boosting, decision trees, random forests, neural networks, support vector machines, linear and logistic regression, and naive Bayes are a few examples of supervised learning methods. Supervised learning problems fall into classification and regression. In a classification problem the output variable is a category, for instance sick or not sick, or red, blue, or green. In a regression problem the output variable is a real value, such as dollars or kilograms. Speech and image recognition, forecasting, and expert systems for specialised business domains are a few applications of supervised learning models (a minimal worked example appears after this list).

2. Unsupervised Learning - This type of learning occurs when there are only input variables, such as X, and no matching output variables. The algorithms are left to recognise and describe the interesting structure in the data, for example putting clients into groups based on the things they order, how often they buy them, how recently they visited, and so on. The observations are divided and organised into several categories using a variety of clustering techniques, including K-means, hierarchical clustering, and mixture models. Dimensionality-reduction algorithms, most of which are unsupervised, such as PCA, ICA, or autoencoders, are used to discover the best way to represent the data.

3. Reinforcement Learning - This method learns to act so as to maximise reward in a given situation. It is used by various programmes and computer systems to determine the optimal course of action in a particular circumstance. The agent decides which action will maximise its utility given its current situation and the state of the environment, or it may explore new options. The environment changes as a result of the agent's actions, and the agent receives rewards in return. This loop is repeated many times to improve the behaviour of the AI.
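As promised above, here is a minimal sketch of supervised classification in Python, assuming the scikit-learn library is installed. The tiny data set of (temperature, heart-rate) inputs and sick/not-sick labels is invented purely for illustration and is not a clinical example.

# Toy supervised learning: learn y = f(x) from labelled data, then predict
# the label of a new, unseen input. Data values below are invented.
from sklearn.linear_model import LogisticRegression

# Input variables: x1 = body temperature (deg C), x2 = heart rate (bpm).
X_train = [[36.6, 70], [39.5, 110], [37.0, 75], [40.1, 120], [36.8, 68], [38.9, 105]]
# Output variable: 0 = not sick, 1 = sick (a classification problem).
y_train = [0, 1, 0, 1, 0, 1]

model = LogisticRegression()
model.fit(X_train, y_train)   # learn the association between inputs and outputs

new_patient = [[38.2, 98]]    # unseen data point
print("Predicted class:", model.predict(new_patient)[0])
print("Class probabilities:", model.predict_proba(new_patient)[0])

The same few lines change only slightly for a regression problem (swap in a regressor and real-valued targets), which is why the supervised workflow of fit-then-predict generalises across the methods listed above.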

Deep Learning in Detail


Artificial neural networks, which were modelled after biological neural networks, can be quite useful for solving complex, multi-layered computing problems. Deep
learning has made a good name for itself in a variety of well-known academic
fields, such as facial and speech recognition, natural language processing,
machine translation, and others. Following are the top five deep learning
architectures that are both popular and widely used:

1. Convolutional Neural Networks, or CNNs for short, are the preferred neural networks for a variety of computer vision applications, such as image recognition (a small illustrative sketch follows this list).

2. Recurrent Neural Networks (RNNs) have become quite popular in domains where the order of information matters, i.e., the same operation is applied to each element of a sequence and depends on previous computations. They therefore have a wide variety of real-world applications such as speech synthesis, machine translation, and natural language processing.
3. Backpropagation is a technique used by autoencoders in an unsupervised
environment. Autoencoders' main responsibility is, in essence, to
recognise and define what constitutes usual, normal data before spotting
outliers or irregularities. One of the common uses for autoencoders is
anomaly detection, such as recognising fraud in bank financial
transactions.
4. The core concept of Generative Adversarial Networks (GANs) is the simultaneous training of two deep learning models. The generator model seeks to generate new instances or examples, while the discriminator model tries to distinguish instances that come from the training data from those that do not; the two networks compete in a very real sense. GANs are among the more recent advances in deep learning and have numerous, substantial applications in computer vision, most notably in image generation.

5. ResNets - Since they rose to prominence in 2015, ResNets, also referred to as Deep Residual Networks, have been enthusiastically adopted by many data scientists and AI researchers. As noted above, CNNs are very useful for image classification and visual recognition problems, but as these tasks become more complicated, training the neural network becomes harder because more deep layers are required to compute and improve the model's accuracy. This problem motivated the residual learning idea that gave rise to the ResNet architecture. A ResNet is made up of multiple residual modules, each of which represents a layer, and each layer consists of a collection of functions applied to its input. In principle there is no limit to the number of layers a ResNet can have; one that Microsoft researchers built for an image classification problem had 152 layers.
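To make the CNN idea above concrete, the sketch below is a toy illustration only, assuming the TensorFlow/Keras library is installed; the 28x28 grayscale input size and the ten output classes are assumptions for demonstration, not a reference to any specific dataset. It defines a tiny image classifier; training it would additionally require a labelled image dataset.

# Minimal CNN definition: convolutions learn visual features, pooling shrinks
# the feature maps, and dense layers perform the final classification.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),           # e.g. 28x28 grayscale images
    tf.keras.layers.Conv2D(16, kernel_size=3, activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=2),           # downsample feature maps
    tf.keras.layers.Conv2D(32, kernel_size=3, activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),     # 10 illustrative classes
])

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()   # prints the layer-by-layer structure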

13.3.6 Artificial Intelligence at Work Today


1. AI in Marketing and Sales - Starting from brand/ product promotions,
pre-sales, to lead generation, lead management, and lead tracking,
everything can be streamlined using AI-tech powered software
applications. According to Harvard Business Review, companies using
AI for sales can increase their leads by more than 50%, reduce call time
by 60-70%, & have cost reductions of 40-60%.
a. WordStream is a platform for managing advertising. AI is used to
analyse advertisements and make cost-effective optimization
suggestions. Additionally, it enables quick adjustments to
campaigns.
b. InsideSales.com prioritises leads using AI technologies. By
gathering past data on a client, social media postings, and the history
of customer interactions between the salesperson and that client, an
AI programme may score the leads according to their likelihood of
closing successfully.
c. GumGum scans photographs, information, and event videos on the
website using AI-powered computer vision technology.
d. LeadCrunch - employs an AI-powered prediction model to identify
the B2B audience. Then, it will use insights from the studied report
to assist the company place ads that customers pay the most
attention to. A growing AI model will eventually link customers
who are a good fit for your services and products.
e. Salesforce Einstein, which intelligently automates some monotonous
operations, leverages AI.

2. AI in Telecom Industry – To serve the customers better, large telecom


companies are increasingly turning to artificial intelligence (AI).
a. Vodafone aims to roll out its ML chatbot TOBi to 5 more markets after
launching it in 11 popular markets. Automation has been used in 66% of
consumer contacts with the business.
b. By implementing AI and machine learning technology, AT&T has
transformed the user experience. The reports indicate that AT&T's travel
distance has decreased by 7% while its productivity has grown by 5%.
AT&T is looking into the potential of machine learning and artificial
intelligence in order to give its consumers a useful and effective 5G
network experience.
c. Telefónica's AI-powered platform Aura enables the company to
develop a new customer relationship model using cognitive services and
personal data. Corporate users can radically reinvent client contact, data
transparency, 24/7 assistance, technical support, and contextualised,
personalised customer support services with the aid of this platform.
d. Tinka is a chatbot created by Deutsche Telekom that works like a search engine. Thanks to continuous improvement of its search results, the company can offer customers in Austria round-the-clock service. The Tinka chatbot handles about 80% of enquiries; a real customer care agent is contacted for the remaining 20%.
e. Globe Telecom integrates Cloudera with ML to improve customer experience, optimise its products, and follow industry standards. AI and predictive analytics offer business insights that support the company's prudent decision-making and help it create marketing campaigns with a defined aim.

In the telecom industry, AI thus enhances network reliability, improves user experience, and enables predictive maintenance. In addition, AI-based solutions provide relevant business insights, allowing operators to deliver a better user experience and scale their operations, thereby influencing the overall revenue of the business.

3. AI in HR and Recruiting
• Talent Acquisition - Before utilising AI to make hiring decisions,
recruiters first analyse the labour market, find competences, match
talents, check for bias in job descriptions, and rank candidates. HR
recruiters utilise chatbots to schedule appointments and address
frequently asked questions. Applications powered by artificial intelligence (AI) can assess and comprehend the responses of candidates and predict their performance and level of suitability for available positions and other potential opportunities.
• Modern Voice of the Employee (VoE) analytics solutions use a variety
of natural language processing (NLP) and textual analysis techniques to
analyse sentiment and extract insights from text-based responses. HR
leaders are interested in finding, analysing, and reporting the emotions
and attitudes of employees displayed across a number of communication
platforms in order to prevent wasteful, expensive, and unpleasant
turnover. This is especially helpful when there is a significant shift, such as a large reorganisation, new management, or a new strategy.
• Virtual HR assistants - The deployment of virtual HR assistants to support online HR is still in its infancy. However, every HR operation envisioned is anticipated to have a unique front end (answering employee queries, delivering insights on talent metrics, or conducting process workflow steps).
• Paradox - The conundrum of the AI assistant is on managing candidates.
• Voice recognition is done by an AI-powered robot called VCV during
applicant recruiting and interviewing.
• Glider: AI-powered autopilot. Utilised while hiring new workers
• Based on career trajectories and performance, SAP Success Factors,
Cornerstone, and Talentsoft provide courses to applicants.
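To make the NLP idea behind VoE analytics concrete, the sketch below scores short survey comments against a tiny hand-made word lexicon. It is only a toy illustration under assumed inputs (the word lists and sample comments are invented); commercial VoE tools rely on far richer language models.

    # Toy lexicon-based sentiment scorer for short employee-survey comments.
    # The word lists and sample comments are invented for illustration only.

    POSITIVE = {"great", "supportive", "clear", "flexible", "fair"}
    NEGATIVE = {"overworked", "confusing", "unfair", "stressful", "ignored"}

    def sentiment_score(comment: str) -> float:
        """Return a score in [-1, 1]; negative values suggest dissatisfaction."""
        words = [w.strip(".,!?").lower() for w in comment.split()]
        pos = sum(w in POSITIVE for w in words)
        neg = sum(w in NEGATIVE for w in words)
        hits = pos + neg
        return 0.0 if hits == 0 else (pos - neg) / hits

    comments = [
        "Management has been supportive and goals are clear",
        "I feel overworked and ignored since the reorganisation",
    ]
    for c in comments:
        print(f"{sentiment_score(c):+.2f}  {c}")

Aggregating such scores by team or over time is what lets HR leaders spot the shifts in attitude described above.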

4. AI in Education Sector - With smart devices and computers making educational resources accessible to everybody, technology has transformed the world of learning while automating administrative work, enabling staff to spend much more time focusing on their students.
• Post-secondary students can receive individualised online education
through Carnegie Learning's "Mika" programme, especially those who
need remedial courses as freshmen in college. Students now have access
to more practical and personalised learning techniques.
• IntelliMetric, a state-of-the-art AI application, can analyse student
assessments and provide writers with rapid feedback. Additionally, AI
gives teachers the ability to design individualised lesson plans that
motivate pupils to focus on their studies.
• University of Southern California (USC) has created intelligent virtual
environments and programmes that make learning enjoyable for students
with the use of artificial intelligence, animation, and 3D gaming.
• The Carnegie Mellon University of Engineering used the iTalk2Learn system to test how well a learner model affected student learning. It contains information about the students' arithmetic ability, emotional state, cognitive demands, feedback, and responses.
• Captivating Virtual Instruction for Training (CVIT) is a strategy that combines live online classroom techniques with the best AI technology, such as augmented reality, intelligent tutors, virtual facilitators, other coaching methods, and remote learning programmes.
5. AI in Manufacturing – It enables collaboration between employees and
robots to complete jobs rather than replacing people. Machines will be
able to perform an increasing number of repetitive activities as they
become smarter. As more AI systems are used in production, efficiency,
accuracy, and quality control will all increase.
• Predictive maintenance uses sensor data to forecast possible failures, preventing downtime and allowing maintenance to be scheduled (a minimal sketch follows this list).
• Industrial robotics automates repetitive processes, using AI for picking, placing, welding, and assembly.
• AI-powered quality control systems use computer vision to identify
production output variations and sound an alarm.
• Inventory management employs AI to control costs by forecasting demand and maintaining optimal stock levels.
• Process optimization enables businesses to evaluate performance, costs,
and other aspects of their operations in order to make informed decisions
and streamline their operations.
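As a minimal sketch of the predictive-maintenance point above, the snippet below flags a machine when the average of its most recent vibration readings drifts well above an early baseline. The readings, window size, and threshold are invented assumptions; real systems learn such rules from historical failure data rather than hard-coding them.

    # Threshold-based predictive maintenance on a stream of sensor readings.
    # Vibration values, window size, and the 1.25 limit are made-up assumptions.

    from statistics import mean

    def needs_maintenance(readings, window=5, limit=1.25):
        """Flag the machine when the mean of the last `window` readings exceeds
        `limit` times the mean of the first `window` (healthy) readings."""
        if len(readings) < 2 * window:
            return False
        baseline = mean(readings[:window])
        recent = mean(readings[-window:])
        return recent > limit * baseline

    vibration_mm_s = [2.1, 2.0, 2.2, 2.1, 2.3, 2.2, 2.9, 3.1, 3.3, 3.4]
    print(needs_maintenance(vibration_mm_s))  # True -> schedule an inspection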
6. Other Applications
• In order to determine whether we are paying attention to the road and to
warn us when we are growing fatigued, smart cars employ facial
recognition algorithms.
• AI algorithms are used by smartphones to maintain call quality and
improve the quality of our photos.
• Smart toilets are coming; they can analyse stool samples using computer vision to help diagnose digestive problems.
• FintechOS, on Microsoft Azure, creates unique platforms in collaboration with businesses and banks to digitise the consumer experience. By utilising AI, businesses hope to increase the usability of their services for non-technical clients. "We replaced a three-day paper-based process with a 15-minute digital client journey," according to Remco Jorna, CTO at FintechOS.
• WildTrack, using SAS Viya, is a non-invasive wildlife monitoring group with a focus on the environment. They monitor and safeguard the African continent's endangered species, replicating native animal-tracking abilities using AI on a large scale: footprints are regularly recognised, sorted, and categorised.
• 20th Century Studios, within Google Cloud AI - A nearly 90-year-old American film company aimed to modernise its audience discovery and film marketing strategies by utilising big data analytics and artificial intelligence. They created AI techniques to examine their non-sequel films and identify the right market for marketing. An AI technology does extensive parallel multi-tests, sampling hundreds of distinct audiences, and produces a complex report that helps identify, in unprecedented detail, who the movie's core fans are as well as the persuadable audiences and no-shows.
• Epiq, with AI enhanced on Amazon (AWS), offers legal & business services to enterprises, law firms, financial institutions, & governmental organisations around the world. Its most well-known service is the computerised transcription of courtroom proceedings. Epiq produced a transcription solution in a matter of weeks that is 5% more accurate than previous transcription engines used by third parties.
• Spotify, an AI-powered music streaming service, plays songs based on your past playlists and personal preferences.
• Hemingway app: AI-based tool that uses Natural Language Processing to
improve written material
• CLARKE, an AI-based bot, participates in conference calls, takes notes, and sends them straight to your mailbox.
• The Google Smart Reply feature uses machine learning to analyse emails
and recommend quick responses you might wish to make.
• AMY is an AI-powered virtual assistant that aids in meeting scheduling.
• Captain is a smart personal assistant powered by AI that keeps a list of the things you want to buy.
• Netflix is an AI-powered application that streams your favourite movies and TV shows.
• The Amazon anticipatory shipping project uses artificial intelligence to predict the goods you'll likely order and ship them.
• Lunit is an AI-based programme that detects breast and lung cancer tissue.
• Corti is an AI-enabled system that deploys cognitive technologies to promptly and accurately identify cardiac arrests occurring outside a hospital.
• ROMU - An AI-powered autonomous floating garbage truck is being
developed to collect significant amounts of plastic pollution from the
waterways and protect marine life.
• Google Maps (Maps) - Examines the velocity of traffic movement at any
given moment using anonymized location data from smartphones. By
providing the quickest routes to and from work, maps shorten commute
times.
• Intelligent personal assistants are capable of maintaining and organising information. It entails managing emails, calendar events, files, and to-do lists, among other things. Google Assistant, Nina, Viv, Jibo, Google Now, Hey Athena, Cortana, Mycroft, and Braina are some of the top intelligent personal assistants. Siri, SILVIA, Bixby, Lucida, Cubic, Dragon Go, Hound, Aido, Ubi Kit, Blackberry Assistant, Maluuba, and Vlingo are a few examples of virtual assistants.
• With the help of behavioural algorithms, the Google-acquired Nest
thermostat predicts and adjusts the temperature in your home or office
based on your needs.
• The amazing capacity of Amazon Alexa to understand voice from
anywhere in the room makes it useful for a variety of tasks, like
shopping, making appointments, setting alarms, and searching the web
for information. Additionally, it serves as a conduit for people with
limited mobility and powers smart homes.
• Pseudo-intelligent digital personal assistant Siri is capable of anticipating
and comprehending natural-language inquiries and requests. It provides
directions, finds information, adds events to calendars, facilitates
message sending, among other things.
• Google Assistant - Talks to you in two-way conversations, remembers details from previous interactions to provide the most relevant results, integrates with smart home devices, and does a lot more.
• Cortana, Microsoft's intelligent personal assistant, offers cross-platform functionality and personal digital support with reminders, including photo- or location-based reminders.

13.3.7 Ethics of Artificial Intelligence


AI algorithms are showing up more and more in modern culture, which raises a number of ethical considerations, including the following.

1. Robust, scalable, and transparent - Imagine a scenario in the near future where a bank uses a machine learning algorithm to recommend personal loan applications for approval. A rejected applicant who files a lawsuit against the bank contends that the algorithm discriminates against applicants based on gender. Despite the bank's denial of the charge, statistics show that the acceptance rate for loans to women business owners has been steadily falling, and that when presented with ten applicants who appear equally qualified, the algorithm accepts only the men while rejecting the women business owners. It will therefore be more important than ever to develop AI algorithms that are robust and scalable as well as transparent to examination. Transparency is not the only desirable property, however: it is also crucial that AI algorithms that take over social roles are predictable to the people they govern. Contracts, for instance, can be written knowing how they will be carried out.
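To make "transparent to examination" concrete, the sketch below computes approval rates per group from a handful of invented loan decisions, the kind of simple audit (sometimes called a demographic-parity check) a regulator could run. The records and group labels are assumptions for illustration; a rate gap is a signal to investigate further, not proof of discrimination on its own.

    # Illustrative audit of loan-approval decisions by group (invented records).

    from collections import defaultdict

    decisions = [  # (group, approved?)
        ("male", True), ("male", True), ("male", False), ("male", True),
        ("female", False), ("female", True), ("female", False), ("female", False),
    ]

    counts = defaultdict(lambda: [0, 0])  # group -> [approved, total]
    for group, approved in decisions:
        counts[group][1] += 1
        if approved:
            counts[group][0] += 1

    rates = {g: approved / total for g, (approved, total) in counts.items()}
    for g, r in rates.items():
        print(f"{g}: approval rate {r:.0%}")
    print(f"gap: {abs(rates['male'] - rates['female']):.0%}")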

2. Robust against manipulation - A machine vision system that scans airline luggage for prohibited items needs to be robust against human adversaries who are actively looking for exploitable algorithmic vulnerabilities; for example, placing a particular shape next to an object could make the object unrecognisable to the system.
3. Responsibility, transparency, auditability, incorruptibility, predictability, and a propensity not to cause helpless screams of frustration from innocent victims are all criteria that apply to humans performing social functions; they must all be taken into account in an algorithm that aims to replace human evaluation of social functions.

4. Current AI algorithms are characterised by human-equivalent or greater performance, but with purposely cultivated competence only in a single, confined domain. Deep Blue won the chess world championship, but it is unable to perform any other tasks, much less operate a vehicle or make any new discoveries. Despite the fact that human intelligence is not entirely universal, we do outperform machines in some cognitive activities.

The possibility of AIs possessing superhuman intelligence and strength forces us to face the extraordinarily difficult challenge of developing an algorithm that produces super-ethical conduct. These problems may seem far-reaching, but it seems inevitable that we will face them, and they contain plenty of ideas for current research directions.

13.3.8 Future of Artificial Intelligence – Endless Opportunities and Growth
According to Gartner's Top Strategic Technology Developments for 2022,
hyper-automation, privacy-enhancing computation, cloud-native platforms,
cybersecurity mesh, and AI engineering trends will accelerate digital business
and innovation over the next three to five years. CIOs and CTOs are boosting
their technology spending, according to Bain & Company's 3rd Annual
Global Technology Report 2022. A technical report from the IEEE Computer
Society, titled IEEE CS 2022, highlighted 23 technologies by 2022,
including, among others, machine learning and intelligent systems, cloud
computing, big data and analytics, the internet of things, medical robotics,
and computer vision and pattern recognition. According to McKinsey, more than 50 billion devices will be connected to the Industrial Internet of Things (IIoT) by 2025, and around 79.4 zettabytes of data will be produced annually by robots, automation, 3D printing, and other technologies.

The artificial intelligence (AI) market was worth approximately $59.67 billion in 2021, and it is anticipated to reach $422.37 billion in 2028, growing at a CAGR of 39.4%. The International Data Corporation (IDC) projects that India's AI market will increase from $3.1 billion in 2020 to $7.8 billion in 2025, a CAGR of 20.2%. According to statistics from Forbes, the number of start-ups specialising in artificial intelligence has increased 14-fold since 2000, and investment in such start-ups has grown substantially as well. According to a different Statista survey, 87% of organisations use AI mostly to improve email marketing and forecast sales. By 2030, there will be a 31.4% increase in the number of occupations related to data science and mathematical science, many of which will be AI-based. The market for machine learning employment is anticipated to reach $31 billion in value by 2024 after expanding at a 40% annual rate for the preceding six years.
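The CAGR figures quoted above follow the standard formula CAGR = (ending value / starting value)^(1/years) - 1. The sketch below reproduces the calculation using the market sizes from the text; the number of years assumed for each window is my own assumption, which is why the global figure comes out somewhat below the published 39.4%.

    # Compound annual growth rate: CAGR = (end / start) ** (1 / years) - 1.
    # Market sizes are taken from the text; the year counts are assumptions.

    def cagr(start: float, end: float, years: float) -> float:
        return (end / start) ** (1 / years) - 1

    print(f"India  2020-2025: {cagr(3.1, 7.8, 5):.1%}")       # ~20.3%, matching the cited 20.2%
    print(f"Global 2021-2028: {cagr(59.67, 422.37, 7):.1%}")  # ~32% over an assumed 7-year window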
Contrary to popular belief, the development of artificial intelligence is
occurring more quickly than expected. By the year 2031, experts expect AI
will be able to read lips and identify diseases. By 2049–2053, AI might be
able to comprehend and think like humans. In 2025, artificial intelligence
(AI) is expected to produce music and works of art. Human brains could be
computerised by 2025. A number of cognitive tasks that people currently
carry out as well as a number of tasks that individuals are unable to
accomplish are anticipated to be completed by AI. AI will be able to learn
from people more accurately than our existing techniques. If AI acquires this
degree of independence and autonomy, it will be able to teach itself skills like
driving and music composition without the aid of humans. It will possess
intrinsic intelligence rather than merely intellect. The application of AI and
related technology has significantly transformed numerous areas of the
industry by increasing their efficacy and efficiency. Over 40% of all
organisations will incorporate some sort of AI into their "mainstream
technology" this year, despite the field of artificial intelligence slowing pace
in 2021, according to academics. The year 2022 will bring new developments
in artificial intelligence and a more stable economy, according to top tech
experts. Robots will eventually replace 60,000 factory workers, according to
China's Foxconn Technology Group, which supplies Apple and Samsung. At
the Ford factory in Cologne, Germany, robots work side by side with humans
on the factory floor.
AI requires the creation of an AI ecosystem for long-term, sustainable growth. To harness the potential of AI, three actions should be considered: building more public-private partnerships, continuing to promote the free and open sharing of AI knowledge and resources, and promoting an increased understanding of AI.

Check your Progress 2 (Answer in about 200 words)


o Space is given below for your answer.
o Check your answer with the one given at the end of this unit.
4. What are three main kinds of Artificial Intelligence?
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
5. List machine learning models.
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
6. Discuss any five deep learning architectures?
……………………………………………………………………………
……………………………………………………………………………
……………………………………………………………………………
13.4 VIRTUAL REALITY (VR)
13.4.1 Introduction to Virtual Reality
Virtual reality (VR), a fast developing cutting-edge human-computer
interface, simulates a realistic environment and offers an alternative to reality
by incorporating the majority of the senses, including sight, hearing, and
touch. The main goal of virtual reality is to immerse the user in a virtual
environment that makes them feel as though they are "there." To accomplish
this, it is necessary to connect the "virtual environment" with the human
sensory and motor systems.

According to Steven M. LaValle, virtual reality (VR) is the process of "inducing specific behaviour in an organism through the use of artificial sensory stimulation, while the organism has little to no awareness of the interference." The definition has the following four essential elements:
1. Targeted behaviour: The organism is engaging in a pre-planned
"experience." Flying, walking, exploring, watching a movie, and
interacting with other species are a few examples.
2. Organism: This may be a fruit fly, cockroach, fish, rodent, or even a
monkey (scientists have used VR technology on all of these!). It could
also be you or another person.
3. Artificial sensory stimulation: One or more of an organism's senses are
co-opted, at least in part, by engineering, and their natural inputs are
replaced or increased by artificial stimulation.
4. Awareness: During the experience, the organism appears unaware of the interference, leading it to be "fooled" into believing it is physically present in a virtual environment. This unawareness creates a sense of presence in a modified or alternative reality, which is accepted as natural.
A computer creates sensory impressions in a virtual environment system and
transmits them to the human senses. The kind and calibre of these sensations
dictate the degree of immersion and presence in VR. Information displayed to
the user's senses should ideally be high-resolution, high-quality, and
consistent across all displays [Slat94]. Additionally, the environment itself
ought to respond to the user's activities genuinely. The reality, however,
differs greatly from this ideal situation. In many applications, just one or a
small number of senses are stimulated, and the information is frequently of
poor quality and out of sync. According to the degree of immersion they
provide to the user, we can categorise the VR systems as follows:

• Non-Immersive (Desktop VR) systems – Desktop VR is a simpler kind of immersive VR that may be used in a variety of contexts without the need for specialised hardware. Systems that observe a virtual environment through one or more computer screens are frequently known as Window on World (WoW) systems. The user is only somewhat immersed, despite being able to engage with the environment. The view of the world is displayed on a standard monitor and is frequently monoscopic, and there is no support for additional sensory outputs. Desktop virtual reality is beginning to gain popularity in modern education because the technology makes real-time viewing and interaction possible within a virtual world that closely resembles the real one.

• Semi-Immersive (Fish Tank VR) systems – An improved version of Desktop VR. Due to the motion parallax effect, these systems support head tracking, which enhances the sensation of "being there." While they still employ a traditional monitor (especially when viewing in stereoscopic mode with LCD shutter glasses), they typically do not enable other sensory output.

• Immersive systems - The pinnacle of VR technology. An HMD supports a stereoscopic view of the scene that depends on the user's position and orientation, enabling the user to be fully immersed in the computer-generated environment. These systems may be enhanced with audio, haptic, and other sensory interfaces.

Types of Immersion
The term "immersion" refers to the degree to which realistic physical stimuli,
such as light patterns and sound waves, are delivered to the senses of sight,
hearing, and touch in order to produce potent delusions of reality. Ernest
Adams claims that there are three primary types of immersion:
• Tactical immersion - Tactical immersion is felt when carrying out
skilled tactile activities. As they perfect the actions that lead to victory,
players experience being "in the zone."
• Strategic immersion – It is more cerebral and including a mental
challenge. Chess players are immersed in strategy as they select the best
answer from a wide range of options.
• Narrative immersion - Similar to what one feels while reading a book or watching a movie, narrative immersion happens when gamers become immersed in a tale.
Staffan Björk and Jussi Holopainen categorise immersion into related groups, referred to correspondingly as sensory-motoric immersion, cognitive immersion, and emotional immersion. They also include three more categories in addition to these:
• Spatial immersion - A player experiences spatial immersion when they
perceive the virtual environment to be perceptually convincing. A virtual
world seems and feels "real" to the player as if they are actually "there."
• Psychological immersion: This happens when a person mistakenly
believes that they are in a game instead of real life.
• Sensory immersion - As the player merges with the picture medium, they experience a oneness of time and space that impacts perception and consciousness.

13.4.2 Evolution of Virtual Reality
Ivan Sutherland first proposed the concept of the enormously popular and on-trend technology of the current decade in 1965. Below is a brief overview of virtual reality research conducted over the past three decades:
• Sensorama: Morton Heilig built a multi-sensory simulator between 1960
and 1962. Binaural sound, fragrance, wind, and vibration experiences
were added to a colour, stereo, pre-recorded movie, which was not
interactive.

• The Ultimate Display - In 1965, Ivan Sutherland proposed the idea of an artificial world constructed with interactive graphics, force-feedback, sound, smell, and taste as the ultimate virtual reality solution.
• "The Sword of Damocles" - the first virtual reality system that was actually built, as opposed to just being a concept. Ivan Sutherland created the first Head Mounted Display (HMD) with accurate head tracking; it supported a stereo view that was updated correctly in accordance with the user's head position and orientation.
• GROPE - a force-feedback system prototype developed in 1971 at the University of North Carolina (UNC).
• VIDEOPLACE - Myron Krueger developed this artificial reality in 1975. In this system, users' silhouettes captured by cameras were displayed in a shared space on a 2D screen, allowing them to communicate with one another.
• Visually Coupled Airborne Systems Simulator (VCASS) - Thomas
Furness created the sophisticated flight simulator in 1982. The fighter
pilot wore a head-mounted display (HMD) that added graphics
describing targeting or the best flight route to the view outside the
window.

• Virtual Visual Environment Display (VIVED) - created in 1984 at NASA Ames, it used commercially available technology and a monochrome stereoscopic HMD.
• VPL - the first commercially available VR devices, the DataGlove (1985) and the Eyephone HMD (1988), were made by the VPL corporation.
• BOOM - made commercially available in 1989 by Fake Space Labs, it is a tiny box with two CRT monitors inside, visible through the eye holes. The user can grab the box, hold it up to their eyes, and navigate the virtual environment while a mechanical arm measures the position and orientation of the box.
• UNC Walkthrough Project - at the University of North Carolina, an architectural walkthrough application was created in the second half of the 1980s. To enhance the performance of this system, a number of VR devices, including HMDs, optical trackers, and the Pixel-Plane graphics engine, were developed.
• Virtual Wind Tunnel - a NASA Ames application created in the early 1990s that made it possible to observe and study flow fields using BOOM and DataGlove.
• CAVE Automatic Virtual Environment (CAVE) - a virtual reality and scientific visualisation system that debuted in 1992. It projects stereoscopic visuals on the room's walls rather than utilising an HMD (the user must wear LCD shutter glasses). Compared to HMD-based systems, this method ensures greater image quality and resolution as well as a larger field of view.
• Augmented Reality (AR): This technology "presents a virtual
environment that complements, rather than replaces, the real world."
This is accomplished by using a transparent HMD to superimpose actual
items in three dimensions on virtual ones. Earlier, this technology was
employed to enhance the fighter pilot's field of view with additional
flight data (VCASS). In the early 1990s, augmented reality became the
subject of numerous research initiatives because of its enormous
potential—the improvement of human eyesight.

13.4.3 Basic Components of VR Technology


The essential components of immersive systems are shown in the figure
below. A head-mounted display, a tracker, and a manipulation tool are
provided to the user (e.g., three-dimensional mouse, data glove etc.). As the
user engages in activities like walking and rotating their head, the input
devices provide the computer with data describing their action. The computer
processes data in real-time and provides the user with necessary feedback via
output displays. In general, interactivity is the responsibility of input devices,
immersion is the responsibility of output devices, and effective regulation and
synchronization of the entire environment is the responsibility of software.

• Input devices - When used in conjunction, these tools make it easy, natural, and nearly imperceptible to influence the user's environment. Unfortunately, current technology has not advanced enough to achieve naturalness in all situations, so it is still necessary to introduce interaction metaphors, which can be difficult for a less experienced user. 3D input devices include 3D mice and bats, gloves, and dexterous manipulators; desktop input devices include the SpaceBall, CyberMan, 2D input devices, etc.
• Output devices - Output devices, which might be tactile, aural, or visual displays, are in charge of presenting the virtual environment and its phenomena to the user. They are less advanced than the input devices: the current state of technology hinders an ideal stimulation of the human senses because current VR output devices are heavy, subpar, and low-resolution. The output devices most often used in VR include 3D glasses, surround displays, Binocular Omni-Oriented Monitors (BOOM), Head Mounted (Coupled) Displays (HMD), etc.
• Software - In addition to input and output devices, supporting software
that is responsible for managing I/O devices, analysing incoming data,
and generating pertinent feedback is also essential. Software is required
to manage the substantially more complicated and time-sensitive VR
devices. Both the processing of input data and the rapid delivery of
system responses to the output displays are necessary to sustain the
immersive experience. VR software includes development kits,
visualization software, content management, game engines, social
platforms, and training simulators.
• Human Factors - Because virtual environments are meant to resemble
the real world, it is important to know how to "fool the user's senses"
when creating them. This challenge is not simple and a decent enough
solution has not yet been found because we need to give the user a good
sense of immersion while also making the solution feasible.
• Tracker – Objects in three dimensions have six degrees of freedom (DOF): x, y, and z offsets for position, and yaw, pitch, and roll angles for orientation. This data, or a portion of it, must be supported by each tracker. There are two different types of trackers: those that provide absolute data (total position/orientation values) and those that provide relative data (i.e. a change from the last state). A minimal sketch of such a 6-DOF sample appears after this list.
a. Magnetic Trackers – Most often used tracking devices in
immersive applications. They typically consist of a static part
(emitter, sometimes called a source), a number of movable parts
(receivers, sometimes called sensors), and a control station unit.
b. Acoustic trackers - Ultrasonic waves (above 20 kHz) are used by
acoustic trackers to determine an object's position and orientation in
space. Multiple emitters (usually three) and multiple receivers
(generally three) with specified geometry are used to acquire a set of
distances in order to calculate position and orientation because
sound can only be utilised to determine the relative distance between
two places.
c. Optical trackers - There are many different kinds and configurations
of optical trackers. Generally we can divide them into three
categories: beacon trackers, pattern recognition, and laser ranging.
d. Mechanical trackers - A mechanical linkage of a few rigid arms
with joints between them is used to measure position and orientation
of a free point in relation to the base.
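The sketch below shows one way a 6-DOF tracker sample might be represented, and how a relative reading can be derived from two absolute ones. The class name and the component-wise subtraction are simplifying assumptions; real systems compose orientations with quaternions or rotation matrices rather than subtracting angles.

    # A 6-DOF pose: position offsets (x, y, z) plus orientation (yaw, pitch, roll).
    # Subtracting angles component-wise is a simplification used only to show
    # the difference between absolute and relative tracker data.

    from dataclasses import dataclass

    @dataclass
    class Pose6DOF:
        x: float
        y: float
        z: float        # position offsets, metres
        yaw: float
        pitch: float
        roll: float     # orientation angles, degrees

    def relative(prev: Pose6DOF, curr: Pose6DOF) -> Pose6DOF:
        """Change a relative tracker would report between two absolute samples."""
        return Pose6DOF(curr.x - prev.x, curr.y - prev.y, curr.z - prev.z,
                        curr.yaw - prev.yaw, curr.pitch - prev.pitch,
                        curr.roll - prev.roll)

    head_t0 = Pose6DOF(0.00, 1.60, 0.00, 0.0, 0.0, 0.0)
    head_t1 = Pose6DOF(0.05, 1.62, -0.10, 15.0, -5.0, 0.0)
    print(relative(head_t0, head_t1))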

13.4.4 Applications of Virtual Reality

Undoubtedly, in recent years, a lot of individuals have become interested in virtual reality. Without having to understand how the complex user interface
functions, the user can observe and interact with the virtual environment in
the same manner that we do in the actual world. As a result, several
applications such as data visualisation systems, architectural walkthroughs,
and flying simulators were developed quickly. The current generation of VR
systems was brought about by advances in display, sensing, and computing
technology from the smartphone industry.
1. Video games – You may have a lot of fun trying out new virtual reality
experiences or playing well-known normal games in a completely
different way if you have one of the finest VR headsets or best
inexpensive VR headsets with you.
a. Cities VR - Ultimate virtual reality city-building and administration
simulator, offers a degree of presence, immersion, and interactive
gameplay that is unmatched. It makes it possible to manage all areas of
city development, from utility planning to construction, and to give your
population access to healthcare, education, and other services. From a
wide vantage point, you may watch your city's skyline develop, or you
can observe it from the streets as it comes to life.
b. Hitman 3 –The stealth video game Hitman 3 was created and released in
2021 by IO Interactive. In HITMAN 3, Agent 47 makes a triumphant
return as a cunning professional vying for the most significant contracts
of his whole career. The tragic conclusion to the World of Assassination
trilogy invites you to go on a personal journey of despair and hope.
Players once again take control of assassin Agent 47, who travels to different locations to carry out contract assassinations.
c. A thrilling VR survival adventure, Jurassic World Aftermath is filled
with suspense. You crash-land on Isla Nublar after Jurassic World is
destroyed, and when your attempt to get important information goes
horribly wrong, you end up stranded in a research facility. You'll need to
investigate the area and work out puzzles while avoiding the three
vicious Velociraptors who follow you everywhere in order to survive.
d. STRIDE is a virtual reality parkour action game. Fight foes beneath the canopy of a confined city as you flow across balconies and between rooftops in your attempt to save it.
e. Thumper is rhythmic violence, featuring traditional rhythmic motion, breakneck speed, and ferocious physicality. You are a space insect. Battle a crazed giant head from the future in the horrible void. With easy, secure controls, race down the never-ending course and smash through harsh obstacles. Hurry along, learn new techniques, accelerate to dizzying speeds, and make it through dramatic boss fights. Every crushing blow is accompanied by intense original music.
2. Immersive cinema – Hollywood productions keep getting more and
more realistic. A French multiplex firm opened the first Immersive
Cinema Experience (ICE) theatres. The audio-visual experience in these immersive movie theatres is sharp and clear thanks to the utilisation of cutting-edge technology. In order to expand the display field and give the
viewer a sense of immersion in the movie's environment, they also use
peripheral screen panels.
3. Telepresence – The first step in creating the illusion that we are
somewhere else is to take a panoramic picture of the distant scene.
Simple VR programmes that directly query the Street View server allow
the user to feel as though he is standing in any of these sites, and
switching between nearby locations is also made easy. It's even more
fascinating for viewers to access live panoramic video interfaces
of sporting events and concerts. With today's VR technology, one can
virtually travel to distant locations and participate in most of the ways
that were previously only feasible in person.
4. Virtual societies - Telepresence gives us the impression that we are in a
different location in the real world, but VR also enables us to create
entire civilizations that resemble the real one while yet being synthetic
worlds with avatars linked to actual people. Through avatars, people
interact in a fantasy world; these experiences were initially created for
screen viewing but are now accessible through VR. People may gather in
these areas with other people for a variety of reasons, such as shared
hobbies, academic objectives, or just to get away from everyday life.
5. Education – Beyond engagement, VR allows students to explore,
experience, and become immersed in virtual environments. VR provides
the opportunity to see geometric relationships in challenging concepts or
difficult-to-interpret data in engineering, mathematics, and the sciences.
Additionally, as skills learned in a realistic virtual environment may
readily transfer to the actual world, VR is well suited for practical
training. If the real environment is expensive to provide or poses health
dangers, the incentive is very high. Flight simulation is one of the earliest
and most popular types of VR training. The use of traditional instruction
mediums and textbooks is often ineffective for students with special
needs. With the introduction of VR, students have become more
responsive and engaged. At Charlton Park Academy in London, teachers
use immersive technology to address their students’ unique needs better.
6. Virtual prototyping – It enables designers to reside in a virtual world
that houses their prototype. Customers can interact with it and make
changes to it with ease in a virtual setting. A wide range of industries,
including real estate, architecture, and the design of automobiles,
furniture, clothing, and medical equipment, have major use for virtual
reality (VR) prototyping.
7. Health care – Although safety and health are challenging problems,
technology can also help us live healthier lives. The practise of
"distributed medicine," which educates people to do routine medical
treatments in remote locations around the globe, is becoming more
popular. Doctors can analyse the patient's body and more effectively
arrange a medical procedure by using virtual reality to immerse in 3D
organ models made from medical scan data. Future VR technologies may
also lengthen life by enabling seniors to engage in enjoyable physical
therapy, travel virtually, and fight loneliness by interacting with loved
ones via an interface that lets them feel present and participate in distant
activities.
8. Travel: Hotels are able to show you around their establishment so you
will know what to expect. VR has potential for luxury travel (e.g.,
honeymoons or luxury resorts). Instead of watching an online video or
perusing 2D images, the user would experience the location from their
point of view.
9. Real estate: To mimic living within a new complex, developers can go
beyond 3D models. VR could be used in both residential and commercial
settings. Additionally, co-working spaces can use virtual reality to place
a potential renter inside the facility before they sign up.
10. Military: VR is already a useful tool in simulations of conflict and other
similar situations. It can take the place of costly and occasionally risky
real-world exercises. All military branches and the defence sector find it
appealing due to the flexibility of the scenarios.
11. Dining: By adjusting taste, smell, vision, sound, and touch, Project
Nourished simulates dining. People view virtual reality as a fine dining
experience. A revolving utensil, an aroma diffuser, a system that
simulates chewing sounds, a VR headset, and tasteless, 3D-printed food
are all used in the process. The effort seeks to maximise the usefulness
and therapeutic potential of foods, medicines, and beverages while
utilising fewer natural resources.
12. Manufacturing: Using virtual reality, designers and engineers may
quickly experiment with the design and construction of cars before
ordering costly prototypes. The technology is used by companies like
Jaguar and BMW for early design and technical assessments. Because
fewer prototypes are developed for each vehicle line thanks to virtual
reality, the automotive sector saves millions of dollars.
13. Journalism: Immersive journalism enables viewers to experience events
or circumstances as they are portrayed in news articles and
documentaries. In order to communicate anything from wildfires to
tornadoes to flooding, The Weather Channel uses mixed reality.
14. Law enforcement: Virtual reality training has benefited law
enforcement training since the introduction of VR eyewear. Realistic
incident training helps officers get ready for commonplace
circumstances.
15. Marketing and advertising: Using virtual reality in marketing enables
businesses to connect experience and action. Since consumers are drawn
to VR experiences, like those offered by Toms Shoes and The North
Face, the relationship between consumers and businesses is altered.
16. Museums: Visitors can access sites that were previously inaccessible
with a mobile phone, projector, headset, or online browser. Visitors can
explore many animal species and their connections using a permanent
VR display at the National Museum of Natural History in Paris. The
display mimics actual interactions with or observations of animals in their natural settings.
17. Religion: You can even use an app to interact with God. VR and The Virtual Reality Church enable individuals to engage in meaningful worship wherever they are. Throughout the pandemic shutdowns, VR Church experienced tremendous growth.

13.4.5 Advantages and Disadvantages of Virtual Reality


Advantages of Virtual Reality

• Virtual reality has been used widely to treat post-traumatic stress disorder (PTSD) and phobias (including a fear of heights, flying, and spiders).
• VR has been demonstrated to be successful in an academic setting.
• Many commercial organisations are now able to provide patients with
this form of therapy.
• The computer-based simulations offered a number of benefits over the
live training, despite the fact that it was discovered that employing
standardised patients for such training was more realistic.
• Their goal was to expose more people to realistic emergency scenarios to
enhance performance and decision-making while lowering psychological
stress.
Disadvantages of Virtual Reality
• Some psychologists are concerned that a user's total absorption in virtual
surroundings may have a detrimental psychological effect.
• Virtual environment (VE) systems that subject a user to violent situations, especially when the user takes the role of the aggressor, may cause desensitisation, potentially creating a generation of sociopaths.
• Engaging virtual settings might prove more addictive.
• Crimes like sex crimes and murder have proved challenging to describe
in the virtual world.
• Since studies have shown that people can have genuine physical and
emotional reactions to stimuli in a virtual world, a victim of a virtual
attack might actually suffer from emotional trauma.

13.4.6 The Future of Virtual Reality


Virtual reality, like all new technologies, must have both its technological
and societal futures taken into account. New research directions and their
possible use for scientific goals are technological features. The impact of new
inventions on people, both personally and collectively as a society, is one of
the social elements. Even if the majority of today's VR applications are of
poor quality and don't match reality, they are nonetheless incredibly effective
and persuasive. Virtual reality has a lot of potential, but much work needs to
be done to make it easier and more natural to interact with virtual
environments. Although it does not have to exactly replicate reality, for
training purposes the simulation should closely resemble actual operational
conditions. Human considerations must be taken into account regardless of
the programme or its intended use; otherwise, the system won't be intuitive or
comfortable enough. Mechanisms that make it simple for people to switch
from virtual reality to reality and vice versa are needed. Lots of study must be
done and new technologies must be created in order to better meet these
needs than current systems do.

Check your Progress 3 (Answer in about 200 words)


o Space is given below for your answer.
o Check your answer with the one given at the end of this unit.
7. What are the categories of VR systems?
…………………………………………………………………………….
…………………………………………………………………………….
…………………………………………………………………………….
8. List the basic components of VR technology.
…………………………………………………………………………….
…………………………………………………………………………….
…………………………………………………………………………….
9. List couple of VR advantages.
…………………………………………………………………………….
…………………………………………………………………………….
…………………………………………………………………………….

13.5 SUMMARY
The Internet of Things (IoT), artificial intelligence (AI), and virtual reality
(VR) are three cutting-edge technological advancements that are examined in
this unit. Also highlighted is how these technologies might affect businesses
in the future.

The networking of physical items including machines, structures, electrical devices, sensors, actuators, etc. that may communicate with one another
(sensor1 to sensor2, sensor2 to sensor3, etc.) or with other objects outside of
their immediate area is referred to as the "Internet of Things" (sensor to
vehicles, vehicles to humans, etc.).

A machine is given the ability to mimic the cognitive functions of a human (or any other being capable of cognitive thought) in the area of computer
science known as artificial intelligence. It can then make decisions based on
its prior experiences or in response to an action that it was completely
unaware of up until that point. It is given a task and always strives to improve
its performance based on past deeds in order to accomplish the goal as
successfully as possible. A neural network that mimics the brain and a
learning mechanism allow an artificial intelligence (AI) system to learn by
analysing its environment, adapting to it, and coming to justifiable conclusions. You can never truly know what an AI computer is capable of
until it accomplishes that.

Users engage with a virtual reality (VR) environment by moving their bodies
while a computer provides them with audio and visual stimuli. A type of
technology known as virtual reality (VR) focuses on generating images in
three dimensions and producing a view that takes up much of the graphical
user interface. It's like actually creating the atmosphere we've always wanted.

13.6 SELF-ASSESSMENT EXERCISES


1. What are the components of an IoT Ecosystem?
2. Explain five-layer architecture of an IoT system.
3. What is the potential of an IoT ecosystem for a business?
4. What are three main kinds of Artificial Intelligence?
5. List machine learning models.
6. Discuss any five deep learning architectures?
7. What are the categories of VR systems?
8. List the basic components of VR technology.
9. List couple of VR advantages.

13.7 KEYWORDS
1. Accuracy - This statistic indicates how successful your AI model is at
predicting outcomes. The number of correct predictions is divided by the
total number of predictions made.

2. Actuator - Actuators transform electrical signals (energy, usually


transported by air, electric current, or liquid) into different forms of
energy such as motion or pressure. This is the opposite of what sensors
do, which is to capture physical characteristics and transform them into
electrical signals.

3. Analytics - the information resulting from the systematic analysis of both


events occurring within the artificial reality and of the device being used
to create the artificial reality.

4. Architecture - The fundamental organization of a system embodied in its


components, their relationships to each other and to the environment, and
the principles guiding its design and evolution.

5. Artificial General Intelligence (AGI) - Is a computational system that


may execute any intellectual function that a human can. Also known as
“Strong AI.” At the moment, AGI is just a concept.

6. Artificial Neural Network - Artificial intelligence and machine learning


are based on the human brain’s neural network designs, particularly the
brain. It is one of the most searched Artificial Intelligence terms.
7. Artificial Narrow Intelligence (or Weak AI) - A computer program that
simulates aspects of human intelligence but focuses on a single, specific
function. Narrow AI is also known as focused artificial intelligence (AI)
in distinction to AGI.

8. Augmented reality (AR) – In AR the visible natural world is overlaid


with a layer of digital content.

9. Backpropagation – It is a method of teaching neural networks based on a


known, desired output for certain sample circumstances.

10. Cellular Network - A radio network distributed over land through cells
where each cell includes a fixed-location transceiver known as a base
station. These cells together provide radio coverage over larger
geographical areas. User equipment (UE), such as mobile phones, is
therefore able to communicate even if the equipment is moving across
cells during transmission.
11. Chatbot - It is a software application that imitates human-to-human
conversation through text or voice commands.

12. Computer Vision - A multidisciplinary scientific discipline that


investigates how computers can be programmed to understand digital
images or movies at a high level. It focuses on automating activities that
the human visual system can perform.

13. Convolutional Neural Networks (CNN) - Are deep artificial neural


networks used to classify pictures (e.g., identify what they see), group
them by similarity (photo search), and recognize objects in scenes.

14. Data glove - An interactive device – often resembling a glove worn on


the hand – which connects to a computer system and facilitates fine-
motion control within virtual reality.
15. Deep learning – It is an artificial intelligence technique that mimics the
human brain by learning from how data is structured rather than a pre-
programmed algorithm. It is one of the most searched Artificial
intelligence terms.
16. Deep Neural Network - The input and output layers of an artificial neural
network (ANN) are separated by multiple intermediate layers. It uses
sophisticated mathematical modeling to process data in complicated
ways.

17. Ecosystem IoT - Refers to the multi-layers that go from devices on the
edge to the middleware. The data is transported to a place that has
applications that can do the processing and analytics.

18. Eye tracking - The ability for a head mounted display (HMD) to read the
position of the experiencer’s eyes versus their head.
19. Facial recognition – It is a computer program that can recognize or
authenticate a person.
20. Field of view (FOV) - The view that is visible to the experiencer while rotating their head from a fixed body position.

21. Generative Adversarial Network (GAN) - A machine learning approach


in which two neural networks compete to create new data with the same
statistics as the training set.

22. Head mounted display (HMD) - A set of goggles or a helmet with tiny
monitors in front of each eye to generate images seen by the wearer as
three-dimensional.

23. Immersion - A psychological sense of being in a virtual environment.

24. IoT - A development of the Internet in which everyday objects have


network connectivity, allowing them to send and receive data. A state in
which physical objects (things) having embedded technology to sense
and communicate, being connected via an identifier such as a micro-
chip/SIM. This will serve the communication among those things,
closing the gap between the real and the virtual world and creating
smarter processes and structures that can support us without needing our
attention. It can be compared with the digital connection on the internet.

25. Low Power Wireless Network - Low power wireless network or


6LoWPAN concept originated from the idea that ‘the Internet Protocol
could and should be applied even to the smallest devices, and that low-
power devices with limited processing capabilities should be able to
participate in the IoT’. The 6LoWPAN group has defined encapsulation
and header compression mechanisms that allow IPv6 packets to be sent
and received over IEEE 802.15.4 networks. IPv4 and IPv6 are the work-
horses for data delivery for local-area networks, metropolitan-area
networks, and wide-area networks such as the Internet. Likewise, IEEE
802.15.4 devices provide sensing communication-ability in the wireless
domain. The inherent natures of the two networks are, however,
different.

26. Machine Learning - Although still an early-stage area of AI, machine learning is a field that is moving forward quickly. It refers to the ability of computers to learn without being explicitly programmed: computers "learn" via patterns they detect and adapt their behavior as a result. It is one of the most searched Artificial Intelligence terms.

27. Natural language processing (NLP) - Is the capacity of computers to


comprehend or extract meaning from natural human languages. NLP
generally entails computer interpretation of text or speech recognition.

28. Neural Network - A neural network is a computer system that functions


as the brain of a human. Although researchers are still attempting to
construct a computer model of the human brain, current neural networks
can already accomplish many things regarding speech, vision, and board
game strategy. It is one of the most searched Artificial Intelligence terms.

29. Peripheral - A device that helps enhance a virtual reality experience by


enabling greater immersion within the virtual world.
30. Sensor - A device that determines certain physical or chemical characteristics and transforms them into an electrical signal to make them digitally processable. Sensors form the backbone of the IoT, helping to bridge the gap between digital and physical.

31. Smart Cities - A concept that tries to create a more intelligent city
infrastructure by using modern information and communication
technologies. Smart cities are about a more flexible adaptation to certain
circumstances, more efficient use of resources, improved quality of life,
fluent transportation and more. This will be achieved through networking
and integrated information exchange between humans and things.

32. Supervised Machine Learning – It is a type of machine learning in which


output data train the machine to produce the correct algorithms, such as a
teacher guiding a student. It’s more prevalent than unsupervised learning.

33. Training Data - The term “training data” refers to all of the data used
throughout training a machine learning algorithm and the particular
dataset utilized to train rather than test.
34. Unsupervised learning – It is a form of machine learning technique that
concludes datasets with unannotated data.

35. Validation Data - This data is structured similarly to training data, with
input and labels, and it’s used to evaluate a recently trained model
against new data and assess performance, with a particular emphasis on
detecting overfitting.

36. Virtual reality (VR) - Places the experiencer in another location entirely.
Whether that location has been generated by a computer or captured by
video, it entirely occludes the experiencer’s natural surroundings.

37. Wireless Communication Technology - The transfer of information over


a distance without the use of enhanced electrical conductors or ‘wires’.
The distances involved may be short (a few meters as in television
remote control) or long (thousands or millions of kilometers for radio
communications). When the context is clear, the term is often shortened
to ‘wireless’. Wireless communication is generally considered to be a
branch of telecommunications.

13.8 FURTHER READINGS


1. “Internet of Things: The New Government to Business Platform - A
Review of Opportunities, Practices, and Challenges” (2017), The World
Bank Group.
2. “Artificial Intelligence in 2022: Endless Opportunities and Growth”
(2022), Indiaai.Gov.In.
3. Joe Biron and Jonathan Follett (March 2016), “Foundational Elements of
an IoT Solution: The Edge, The Cloud, and Application Development”,
O’Reilly Media Inc., USA.
4. John Vince (2004), “Introduction to Virtual Reality”, Springer-Verlag
London.
5. Muhammad Burhan, Rana Asif Rehman, Bilal Khan, Byung-Seo Kim
(2018), “IoT Elements, Layered Architectures and Security Issues: A
Comprehensive Survey”, Sensors.
6. Nick Bostrom, Eliezer Yudkowsky, “The Ethics of Artificial
Intelligence”, Cambridge Handbook of Artificial Intelligence,
Cambridge University Press, New York.
7. Rafael B. Audibert, Henrique Lemos, Pedro Avelar, Anderson R. Tavares, and Luís C. Lamb (2022), "On the Evolution of A.I. and Machine Learning: Towards Measuring and Understanding Impact, Influence, and Leadership at Premier A.I. Conferences", arXiv:2205.13131v1 [cs.AI], 26 May 2022, Cornell University.
8. Ryan Betts (June 2016), “Architecting for The Internet of Things”,
O’Reilly Media, Inc., USA.
9. Sharmistha Mandal (2013), “Brief Introduction of Virtual Reality & its
Challenges”, International Journal Of Scientific & Engineering
Research, Volume 4, Issue 4, April-2013.
10. Steven M Lavalle (2019), “Virtual Reality”, Cambridge University Press.
11. “The Future of IoT: Adoption of AI and 5G”, KORE Whitepaper.
12. Zheng, JM. Chan, KW. And Gibson, I (1998), “Virtual Reality”, IEEE
Potentials: The Magazine for Engineering Students, Volume 17, pg. 20-
23.

UNIT 14 BLOCKCHAIN TECHNOLOGY

Objectives
After studying this unit on blockchain technology, you should be able to:
• Understand the fundamental concepts of blockchain technology,
including decentralized architecture, consensus mechanisms, and
cryptographic algorithms.
• Analyse real-world use cases of blockchain, such as cryptocurrencies,
supply chain management, digital identity, and smart contracts.
• Evaluate the security and privacy implications of blockchain technology,
including vulnerabilities, attacks, and privacy-preserving techniques.
• Explore the economic and social impact of blockchain technology.
• Apply critical thinking and problem-solving skills to identify
opportunities and challenges of blockchain technology.

Structure
14.0 Introduction to Block Chain Technology
14.1 Cryptography and Consensus Mechanism
14.1.1 Cryptographic Algorithms and their role in blockchain technology
14.1.2 Byzantine fault tolerance and consensus mechanisms
14.1.3 Proof of Work, Proof of Stake, and other consensus mechanisms
14.2 Blockchain Architecture and Platforms
14.2.1 Types of blockchain networks: public, private, and Hybrid
14.2.2 Introduction to Blockchain platforms: Ethereum, Hyperledger, and EOS
14.2.3 Smart Contracts and their applications
14.2.4 Gas, Ether, and Other blockchain-specific concepts
14.3 Security and Privacy on the Blockchain
14.3.1 Threats to blockchain security and Privacy
14.3.2 Prevention and mitigation of attacks: double spending, 51% attacks, and
others
14.3.3 Privacy-preserving techniques: ring signatures, zk-SNARKs, and others
14.3.4 Case Studies of blockchain attacks and their impact
14.4 Use Cases of Blockchain Technology
14.5 Social and Economic Impact of Blockchain Technology
14.6 Future of Blockchain Technology
14.7 Summary
14.8 Self-Assessment Exercises
14.9 Keywords
14.10 Further Readings

14.0 INTRODUCTION TO BLOCK CHAIN TECHNOLOGY
Blockchain technology is a distributed and decentralized digital ledger that
stores data in a secure, transparent, and immutable way. It was originally
designed to enable the creation and exchange of digital currencies, such as
Bitcoin, but has since evolved into a versatile platform for a wide range of
applications and industries.

At its core, a blockchain is a network of computers (nodes) that collaborate to


verify and validate transactions or data blocks. Each block contains a set of
transactions, a timestamp, and a unique digital signature called a hash, which
links it to the previous block in the chain. Once a block is added to the chain,
it cannot be altered or deleted without consensus from the network, making
the blockchain tamper-proof and resistant to fraud, censorship, and hacking.
Generally, blockchain technology has the potential to transform many
industries by enabling secure and transparent transactions, reducing
intermediation and costs, and promoting innovation and collaboration.
However, it also faces challenges and limitations, such as scalability,
interoperability, and regulatory compliance, which need to be addressed to
fully realize its potential.
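To make the hash-linking idea above concrete, the short Python sketch below (standard library only) builds a toy chain in which each block records its transactions, a timestamp, and the hash of the previous block. The field names and transactions are illustrative assumptions, not the format of any real blockchain:

import hashlib, json, time

def block_hash(block):
    # Hash the block's contents deterministically (sorted keys) with SHA-256.
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def new_block(transactions, previous_hash):
    # Each block stores its transactions, a timestamp and the previous block's hash.
    return {"transactions": transactions,
            "timestamp": time.time(),
            "previous_hash": previous_hash}

chain = [new_block(["genesis"], previous_hash="0" * 64)]
chain.append(new_block(["A pays B 5"], previous_hash=block_hash(chain[-1])))
chain.append(new_block(["B pays C 2"], previous_hash=block_hash(chain[-1])))

def is_valid(chain):
    # Every block must point to the hash of the block before it.
    return all(chain[i]["previous_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

print(is_valid(chain))                       # True
chain[1]["transactions"] = ["A pays B 500"]  # tamper with an earlier block
print(is_valid(chain))                       # False - the links no longer match

Because altering any earlier block changes its hash, every later link breaks, which is what makes the ledger tamper-evident in practice.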

Definition of blockchain technology


Blockchain technology is a distributed and decentralized digital ledger that
stores data in a secure, transparent, and immutable way. It uses cryptographic
algorithms to ensure the integrity and authenticity of data, as well as
consensus mechanisms to validate transactions and maintain the ledger. Each
block in the chain contains a set of transactions, a timestamp, and a unique
digital signature called a hash, which links it to the previous block. Once a
block is added to the chain, it cannot be altered or deleted without consensus
from the network, making the blockchain tamper-proof and resistant to fraud,
censorship, and hacking. Blockchain technology was originally designed to
enable the creation and exchange of digital currencies, such as Bitcoin, but
has since evolved into a versatile platform for a wide range of applications
and industries.

History of block chain technology and its evolution


Blockchain technology has its roots in the early 1990s, when researchers and
cryptographers began exploring the concept of a digital ledger that could
store and verify transactions in a secure and decentralized way. However, the
first practical implementation of blockchain technology came in 2009, with
the launch of Bitcoin, the first decentralized digital currency. The creator of
Bitcoin, who used the pseudonym Satoshi Nakamoto, proposed a new way of
verifying and validating transactions on a peer-to-peer network, using
cryptographic algorithms and consensus mechanisms.

The success of Bitcoin led to the development of other cryptocurrencies and


blockchain-based applications, such as Ethereum, which introduced the
concept of smart contracts, and Ripple, which focused on cross-border
payments. As the popularity and value of cryptocurrencies grew, so did the
interest in blockchain technology, which was seen as a disruptive innovation
with the potential to transform many industries, from finance and supply
chain management to healthcare and government.

In recent years, blockchain technology has undergone significant evolution


and expansion, with new protocols, platforms, and use cases emerging. For
example, the rise of decentralized finance (DeFi) has enabled users to access
financial services, such as lending, borrowing, and trading, without relying
on traditional intermediaries. The emergence of non-fungible tokens (NFTs)
has created new opportunities for artists, musicians, and creators to monetize
their digital content and assets.
At the same time, the development of blockchain technology has faced
challenges and criticisms, such as scalability, interoperability, energy
consumption, and regulatory compliance. To address these issues, researchers
and developers are exploring new approaches and solutions, such as layer-2
scaling solutions, interoperability protocols, energy-efficient consensus
mechanisms, and regulatory frameworks.

Overall, the history of blockchain technology is still unfolding, and its future
evolution and impact are likely to be shaped by technological innovation,
market dynamics, and societal values.

Basic components of blockchain technology


The basic components of blockchain technology are:

• Distributed Ledger: A digital ledger that records all transactions in a


secure and tamper-proof manner. It is distributed among all nodes in the
network and is maintained by consensus.
• Blocks: The ledger is divided into blocks, which are linked together in a
chain using cryptographic hash functions. Each block contains a set of
transactions, a timestamp, and a unique hash that identifies the block and
links it to the previous block.
• Nodes: A node is a computer or device that participates in the
blockchain network by validating transactions and maintaining a copy of
the ledger. Nodes communicate with each other to reach consensus on
the state of the ledger.
• Consensus Mechanism: A protocol that ensures that all nodes in the
network agree on the state of the ledger. It involves a set of rules and
incentives that encourage nodes to follow the same rules and validate
transactions in a fair and transparent manner.
• Cryptography: A set of mathematical algorithms that ensure the
security and integrity of the blockchain. Cryptography is used to secure
transactions, verify identities, and protect the privacy of users.
• Smart Contracts: Self-executing contracts that automate the execution
of transactions based on pre-defined conditions. Smart contracts are
stored on the blockchain and are executed by the network nodes, without
the need for intermediaries.
These components work together to create a decentralized and secure system
that can be used for a wide range of applications, from digital currencies and
payments to supply chain management and identity verification.

Comparison of centralized and decentralized systems


Centralized systems and decentralized systems are two different approaches
to organizing and managing data and resources. Here's a comparison of some
of their key characteristics:

Centralized Systems
 Controlled by a single entity, such as a company, organization or
government.
 Data and resources are stored and managed in a central location.
 Central authority makes decisions and enforces rules.
 Users rely on intermediaries to access and manage data and resources.
 More vulnerable to hacking and data breaches.
Decentralized Systems
 Controlled by a distributed network of nodes.
 Data and resources are distributed among nodes and managed through
consensus.
 No central authority; decisions are made through consensus.
 Users have direct control over their data and resources.
 More resilient to hacking and data breaches.

Centralized systems are commonly used in traditional organizations, where a


central authority controls and manages data and resources. They offer greater
control and security, but can also be less efficient and less resilient to failures.
Decentralized systems, on the other hand, are more common in blockchain-
based applications, where data and resources are distributed among a network
of nodes. They offer greater transparency, resilience, and autonomy, but may
also be more complex and harder to govern. The choice between these two
approaches depends on the specific needs and goals of the organization or
application.

14.1 CRYPTOGRAPHY AND CONSENSUS


MECHANISM
Cryptography and consensus mechanisms are two key components of
blockchain technology that ensure the security, integrity, and validity of
transactions on the blockchain. Cryptography involves the use of
mathematical algorithms to secure data and communications. In the context
of blockchain, cryptography is used to ensure that transactions are secure,
private, and tamper-proof. Cryptography is used in several ways in
blockchain, including:

• Public key cryptography: a technique that uses a pair of keys (public
and private) to encrypt and decrypt data, and verify digital signatures.

• Hash functions: a mathematical function that takes an input (such as a


block of data) and produces a fixed-size output (hash).

Consensus mechanisms are protocols that ensure that all nodes in the network
agree on the state of the blockchain. Consensus mechanisms are necessary
because the blockchain is a distributed ledger that is maintained by a network
of nodes, and there is no central authority to validate transactions. The most
common consensus mechanisms used in blockchain include Proof of Work
(PoW), Proof of Stake (PoS), and Delegated Proof of Stake (DPoS).

In summary, cryptography and consensus mechanisms are critical


components of blockchain technology that ensure the security, integrity, and
validity of transactions on the blockchain. Cryptography is used to secure
data and communications, while consensus mechanisms are used to ensure
that all nodes in the network agree on the state of the blockchain.
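As a small illustration of the public key cryptography described above, the sketch below signs a transaction-like message with a private key and verifies it with the matching public key. It assumes the third-party Python package "cryptography" is installed; the message and curve choice are only for illustration:

# pip install cryptography
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.exceptions import InvalidSignature

private_key = ec.generate_private_key(ec.SECP256K1())   # curve used by Bitcoin and Ethereum
public_key = private_key.public_key()

message = b"A pays B 5 coins"
signature = private_key.sign(message, ec.ECDSA(hashes.SHA256()))  # sender signs

try:
    # Anyone holding the public key can check the signature.
    public_key.verify(signature, message, ec.ECDSA(hashes.SHA256()))
    print("signature valid")
except InvalidSignature:
    print("signature invalid")

try:
    # A tampered message fails verification.
    public_key.verify(signature, b"A pays B 500 coins", ec.ECDSA(hashes.SHA256()))
except InvalidSignature:
    print("tampered message rejected")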

14.1.1 Cryptographic Algorithms and their role in blockchain


Technology
Cryptographic algorithms play a crucial role in blockchain technology by
providing security, privacy, and integrity of data and transactions. Here are
some of the cryptographic algorithms used in blockchain technology and
their role:
• Hash Functions: Hash functions are mathematical algorithms that
convert input data of any size into a fixed-size output (called a hash) that
represents the input. The hash function is deterministic, which means the
same input will always produce the same hash. In blockchain
technology, hash functions are used to create digital signatures of
transactions and to link blocks together in a chain.

• Public Key Cryptography: Public Key Cryptography (also known as


asymmetric cryptography) is a cryptographic system that uses a pair of
keys - a public key and a private key. The public key is used to encrypt
messages and verify digital signatures, while the private key is used to
decrypt messages and create digital signatures. In blockchain technology,
public key cryptography is used to verify the identity of users and to
secure transactions.

• Digital Signatures: A digital signature is a mathematical scheme that


verifies the authenticity and integrity of digital messages or documents.
In blockchain technology, digital signatures are used to ensure that only
the owner of a private key can access and transfer digital assets.

• Symmetric Encryption: Symmetric encryption is a cryptographic


system that uses the same key for both encryption and decryption. In
blockchain technology, symmetric encryption is used to secure the
communication between nodes in the network.

• Merkle Trees: A Merkle tree is a tree-like structure in which each leaf
node is a hash of a data block, and each non-leaf node is a hash of the
concatenation of its child nodes. In blockchain technology, Merkle trees
are used to efficiently verify the integrity of transactions and blocks.

In summary, cryptographic algorithms play a vital role in ensuring the


security, privacy, and integrity of data and transactions in blockchain
technology. Hash functions, public key cryptography, digital signatures,
symmetric encryption, and Merkle trees are some of the cryptographic
algorithms used in blockchain technology to provide these guarantees.
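The Merkle tree described above can be sketched in a few lines of Python using only hashlib. This is a simplified version (single SHA-256, duplicating the last hash when a level has an odd number of nodes); production implementations such as Bitcoin's differ in detail:

import hashlib

def sha256(data):
    return hashlib.sha256(data).digest()

def merkle_root(transactions):
    level = [sha256(tx) for tx in transactions]      # leaves: hashes of transactions
    while len(level) > 1:
        if len(level) % 2 == 1:                      # duplicate last hash on odd levels
            level.append(level[-1])
        # Each parent is the hash of the concatenation of its two children.
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()

print(merkle_root([b"A pays B 5", b"B pays C 2", b"C pays D 1"]))
# Changing any single transaction changes the root, so a block header that
# commits to the root implicitly commits to every transaction beneath it.
print(merkle_root([b"A pays B 50", b"B pays C 2", b"C pays D 1"]))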

14.1.2 Byzantine fault tolerance and consensus mechanisms


Byzantine fault tolerance and consensus mechanisms are both important
concepts in blockchain technology that ensure the security and reliability of
the network.

Byzantine Fault Tolerance (BFT) is a property of a distributed system that


can tolerate the failure of nodes or components that exhibit arbitrary or
malicious behaviour (known as Byzantine faults). In blockchain technology,
Byzantine fault tolerance refers to the ability of the network to continue
operating correctly and reaching consensus even when some nodes in the
network fail or behave maliciously. Byzantine fault tolerance is achieved
through the use of consensus mechanisms that ensure all nodes in the
network agree on the same state of the blockchain.
Consensus Mechanisms are algorithms used in blockchain technology to
ensure that all nodes in the network agree on the same state of the blockchain.
Consensus mechanisms are necessary because blockchain is a decentralized
system, and there is no central authority to validate transactions or resolve
conflicts. There are several different consensus mechanisms used in
blockchain technology, including Proof of Work (PoW), Proof of Stake
(PoS), Delegated Proof of Stake (DPoS), and Byzantine Fault Tolerance
(BFT).
Proof of Work is the most well-known consensus mechanism, used in the
Bitcoin network, which requires miners to solve complex mathematical
problems in order to validate transactions and earn rewards. Proof of Stake,
on the other hand, requires validators to hold a certain amount of
cryptocurrency to participate in the validation process. Delegated Proof of
Stake is a variation of PoS where validators are elected by token holders to
validate transactions. Byzantine Fault Tolerance is a consensus mechanism
that ensures that all nodes in the network agree on the same state of the
blockchain even when some nodes fail or behave maliciously.

In summary, Byzantine fault tolerance and consensus mechanisms are both


important concepts in blockchain technology that ensure the security and
reliability of the network. Byzantine fault tolerance refers to the ability of the
network to continue operating correctly and reaching consensus even when
some nodes in the network fail or behave maliciously. Consensus
mechanisms are algorithms used to ensure that all nodes in the network agree
on the same state of the blockchain, and there are several different consensus
mechanisms used in blockchain technology, including Proof of Work, Proof
of Stake, Delegated Proof of Stake, and Byzantine Fault Tolerance.

14.1.3 Proof of Work, Proof of Stake, and other consensus


mechanisms
Proof of Work (PoW) and Proof of Stake (PoS) are two of the most well-
known consensus mechanisms used in blockchain technology, but there are
other consensus mechanisms as well. Let's take a closer look at each of them:

• Proof of Work (PoW): is the original consensus mechanism used in the


Bitcoin network. It requires miners to solve complex mathematical
problems in order to validate transactions and earn rewards. Miners
compete to solve these problems by using their computing power to find
the correct solution. The first miner to find the solution to the problem
broadcasts it to the network, and other miners verify the solution. Once
the solution is verified, the miner who found the solution earns a reward
and the transaction is added to the blockchain. PoW is energy-intensive
and requires a lot of computational power, which can make it expensive
and slow.

• Proof of Stake (PoS): is an alternative consensus mechanism that


requires validators to hold a certain amount of cryptocurrency to
participate in the validation process. Validators are chosen at random to
validate transactions, and they put their cryptocurrency at stake to ensure
they act honestly. If they validate transactions correctly, they earn
rewards. If they act maliciously, they lose their stake. PoS is less energy-
intensive than PoW and can be faster and more efficient.

• Delegated Proof of Stake (DPoS): is a variation of PoS where


validators are elected by token holders to validate transactions. Token
holders vote for validators, and the validators with the most votes are
chosen to validate transactions. Validators put their cryptocurrency at
stake to ensure they act honestly. DPoS is faster and more efficient than
PoW and PoS, but it is more centralized because token holders have
more influence over the validation process.

• Proof of Authority (PoA): is a consensus mechanism where validators


are chosen based on their reputation and authority rather than their
computational power or cryptocurrency holdings. Validators are
typically known entities or organizations that have a reputation to
uphold. PoA is fast and efficient, but it is more centralized than other
consensus mechanisms because validators are chosen by a central
authority.

In summary, there are several different consensus mechanisms used in


blockchain technology, including Proof of Work, Proof of Stake, Delegated
Proof of Stake, and Proof of Authority. Each consensus mechanism has its
own advantages and disadvantages, and the choice of consensus mechanism
depends on the specific needs of the blockchain network.
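The "complex mathematical problem" in Proof of Work is essentially a brute-force search for a nonce that gives the block's hash a required number of leading zeros. The Python sketch below uses an artificially easy difficulty and is only meant to show the idea, not a real miner:

import hashlib

def mine(block_data, difficulty=4):
    # Find a nonce whose SHA-256 hash starts with `difficulty` hex zeros.
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}|{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce, digest
        nonce += 1

nonce, digest = mine("block 42: A pays B 5")
print(nonce, digest)

# Verification is cheap: a single hash confirms the work was actually done.
assert hashlib.sha256(f"block 42: A pays B 5|{nonce}".encode()).hexdigest() == digest

Raising the difficulty by one hex digit multiplies the expected number of attempts by sixteen, which is how real networks tune how hard blocks are to find.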

14.2 BLOCKCHAIN ARCHITECTURE AND PLATFORMS
Blockchain architecture refers to the way in which the various components of
a blockchain network are organized and work together as depicted in figure 1.
There are several different blockchain architectures and platforms, each with
its own unique features and capabilities.

Figure 1: Blockchain architecture

Source: https://www.researchgate.net/figure/Blockchain-architecture_fig1_343137671

Here are some of the most commonly used blockchain architectures and
platforms:

 Bitcoin Blockchain Architecture: The Bitcoin blockchain is the first


and most well-known blockchain architecture. It is a public blockchain,
which means that anyone can participate in the network and view the
blockchain. The Bitcoin blockchain uses the Proof of Work consensus
mechanism and is primarily used for sending and receiving payments.
 Ethereum Blockchain Architecture: The Ethereum blockchain is a
public blockchain that allows developers to build decentralized
applications (dapps) on top of it. Ethereum originally used the Proof of
Work consensus mechanism but moved to Proof of Stake in September 2022
(the "Merge"). Ethereum's blockchain architecture
includes a programming language called Solidity, which is used to write
smart contracts.
 Hyperledger Fabric Architecture: Hyperledger Fabric is a private
blockchain architecture that is designed for enterprise use cases. It is
permissioned, which means that participants in the network must be
authorized to join. Hyperledger Fabric uses a modular architecture that
allows for greater flexibility and customization. It supports multiple
consensus mechanisms, including PBFT (Practical Byzantine Fault
Tolerance) and Raft.

 Corda Architecture: Corda is another private blockchain architecture


that is designed for enterprise use cases. It is also permissioned and
supports multiple consensus mechanisms. Corda is unique in that it does
not use a traditional blockchain architecture. Instead, it uses a distributed
ledger technology that allows for greater privacy and scalability.

 Stellar Architecture: Stellar is a public blockchain architecture that is


designed for cross-border payments and asset transfers. It uses the Stellar
Consensus Protocol (SCP) consensus mechanism, which is a variation of
the Federated Byzantine Agreement (FBA) consensus mechanism.
Stellar is known for its fast transaction speeds and low transaction fees.

 Ripple Architecture: Ripple is another public blockchain architecture


that is designed for cross-border payments and asset transfers. It uses a
consensus mechanism called the Ripple Protocol Consensus Algorithm
(RPCA), which is designed to be fast and efficient. Ripple is known for
its high transaction throughput and low transaction fees.
In summary, there are several different blockchain architectures and
platforms, each with its own unique features and capabilities. The choice of
blockchain architecture and platform depends on the specific needs of the use
case and the desired characteristics of the network, such as speed, scalability,
and privacy.

14.2.1 Types of blockchain networks: public, private, hybrid, and consortium
There are four types of blockchain networks: public, private, hybrid and
consortium as depicted in figure 2.

Figure 2: Types of blockchain networks

Source: https://www.researchgate.net/figure/Types-of-blockchain-networks_fig4_366925431
1) Public Blockchain Networks: are open to anyone who wants to
participate in the network. Anyone can read, write, and validate
transactions on the network. The most well-known example of a public
blockchain network is Bitcoin. Public blockchain networks are
decentralized and do not have a central authority controlling the network.
The security of the network is ensured through consensus mechanisms
like Proof of Work or Proof of Stake.

2) Private Blockchain Networks: are restricted to a specific group of


participants who have been granted permission to access the network.
These participants are typically known and trusted entities, such as
companies, organizations, or government agencies. Private blockchain
networks are centralized and have a designated authority controlling the
network. Private blockchain networks can be more efficient and faster
than public networks because they have fewer nodes and transaction
verifiers.

3) Hybrid Blockchain Networks: combine features of both public and


private blockchain networks. Hybrid blockchain networks allow for
public participation in some aspects of the network while restricting
access to other aspects. Hybrid blockchain networks can be used in cases
where some data needs to be public while other data needs to be kept
private. For example, a hybrid blockchain network could be used for a
supply chain management system, where the public can view the status
of shipments, while the private data about the parties involved in the
shipment is kept confidential.
4) Consortium Blockchain Networks: are governed by a group of
organizations, rather than a single entity.
In a consortium blockchain, the participating organizations maintain joint
control over the network and its operations.

In summary, blockchain networks can be classified as public, private, hybrid,


or consortium. The choice of blockchain network type depends on the
specific use case and the desired level of security, privacy, and
decentralization.

14.2.2 Introduction to blockchain platforms: Ethereum,


Hyperledger, and EOS
Blockchain platforms are software frameworks that provide the infrastructure
and tools to build decentralized applications (dApps) on top of blockchain
technology. Three of the most popular blockchain platforms are Ethereum,
Hyperledger, and EOS.
1) Ethereum: is a public blockchain platform that allows developers to
build dApps and execute smart contracts on its blockchain. Ethereum
uses a programming language called Solidity to write smart contracts.
Ethereum is known for its flexibility, as it allows developers to create
custom tokens and implement various consensus mechanisms. Ethereum
has also transitioned from a Proof of Work consensus mechanism to a
Proof of Stake consensus mechanism (completed in September 2022), which
has sharply reduced its energy consumption.

2) Hyperledger: is a private blockchain platform that is designed for


enterprise use cases. It provides a modular architecture that allows for
flexibility and customization. Hyperledger supports multiple consensus
mechanisms, including Practical Byzantine Fault Tolerance (PBFT) and
Raft. Hyperledger Fabric is the most widely used Hyperledger platform,
which is used to build decentralized applications for supply chain
management, healthcare, and finance.
3) EOS: is a public blockchain platform that is designed for decentralized
application development. EOS uses a delegated proof of stake consensus
mechanism, which allows for faster transaction processing times
compared to other blockchain platforms. EOS is also known for its
scalability, as it can process thousands of transactions per second. EOS is
used to build decentralized applications in various fields, including
finance, social media, and gaming.

In summary, Ethereum, Hyperledger, and EOS are three popular blockchain


platforms that provide infrastructure and tools for decentralized application
development. Ethereum is a public blockchain platform known for its
flexibility, while Hyperledger is a private blockchain platform designed for
enterprise use cases. EOS is a public blockchain platform known for its
scalability and fast transaction processing times.

14.2.3 Smart Contracts and their applications


A smart contract is a self-executing digital contract that contains the terms
and conditions of an agreement between two parties. Smart contracts are
written in code and are stored on a blockchain network, where they can be
executed automatically when certain conditions are met. Smart contracts
eliminate the need for intermediaries in the contract execution process, and
can help reduce costs and increase efficiency in various industries. Smart
contracts have many applications, some of which include:

 Finance: Smart contracts can be used to automate financial transactions,


such as payments, loans, and insurance claims. This can help reduce the
risk of fraud and increase the speed and accuracy of financial
transactions.
 Supply Chain Management: Smart contracts can be used to track the
movement of goods and services in a supply chain, ensuring that they are
delivered to the right place at the right time. This can help reduce the risk
of errors and increase transparency and accountability in the supply
chain.
 Real Estate: Smart contracts can be used to automate real estate
transactions, such as property sales and rentals. This can help reduce the
need for intermediaries in the real estate industry, and increase the speed
and accuracy of transactions.
 Healthcare: Smart contracts can be used to automate healthcare
transactions, such as insurance claims and medical records management.
This can help reduce the risk of errors and increase efficiency in the
healthcare industry.
 Voting: Smart contracts can be used to automate voting processes,
ensuring that they are transparent and tamper-proof. This can help
increase trust and confidence in the voting process, and reduce the risk of
fraud.

In summary, smart contracts are self-executing digital contracts that can


automate various processes in different industries. Smart contracts have many
applications, including finance, supply chain management, real estate,
healthcare, and voting. Smart contracts can help reduce the need for
intermediaries in contract execution, increase efficiency, and reduce the risk
of fraud.
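Real smart contracts are written in blockchain languages such as Solidity and executed by the network itself, but the self-executing idea can be sketched in ordinary Python. The toy escrow below releases payment only when a pre-defined condition is met; the names and rules are purely illustrative:

class ToyEscrow:
    # Conceptual sketch of a self-executing agreement (not Solidity).
    def __init__(self, buyer, seller, amount):
        self.buyer, self.seller, self.amount = buyer, seller, amount
        self.delivered = False
        self.released = False

    def confirm_delivery(self, who):
        # Pre-defined condition: only the buyer can confirm delivery.
        if who == self.buyer:
            self.delivered = True

    def settle(self):
        # Executes automatically according to the coded terms, with no intermediary.
        if self.delivered and not self.released:
            self.released = True
            return f"release {self.amount} to {self.seller}"
        return "funds stay in escrow"

deal = ToyEscrow(buyer="Asha", seller="Ravi", amount=10)
print(deal.settle())            # funds stay in escrow
deal.confirm_delivery("Asha")
print(deal.settle())            # release 10 to Ravi

On a real blockchain the same logic would live at a contract address, and the network nodes, rather than a trusted server, would run it and record the result.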

14.2.4 Gas, Ether, and other blockchain-specific concepts


Gas and Ether are two important concepts in blockchain technology,
specifically on the Ethereum blockchain.

 Gas: is a unit of measurement that determines the computational effort


required to execute a transaction or a smart contract on the Ethereum
blockchain. The more complex the transaction or the smart contract, the
more gas is required. The gas cost is paid by the sender of the transaction
in Ether, which serves as the currency on the Ethereum blockchain.
 Ether: is the cryptocurrency used on the Ethereum blockchain. It is used
as the currency to pay for gas fees, and it can also be traded on
cryptocurrency exchanges. Ether is used to pay for transaction fees,
smart contract execution, and other services on the Ethereum blockchain.
Other blockchain-specific concepts include:

 Nodes: are computers that are connected to the blockchain network and
participate in the process of verifying transactions and maintaining the
blockchain ledger. There are different types of nodes, including full
nodes and light nodes, which differ in the amount of data they store and
the level of participation in the network.

 Forks: occur when a blockchain splits into two separate chains due to a
disagreement among the network participants. There are two types of
forks: soft forks and hard forks. Soft forks are temporary changes to the
blockchain protocol that are backward-compatible with older versions,
while hard forks are permanent changes that are not compatible with
older versions.
 Tokens: are digital assets that are created and managed on a blockchain
network. Tokens can represent anything of value, such as
cryptocurrency, assets, or utility tokens. Tokens are often created using
smart contracts on the Ethereum blockchain.

In summary, gas and Ether are important concepts in the Ethereum


blockchain, while nodes, forks, and tokens are other blockchain-specific
concepts that are used in various blockchain networks.
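As a worked example of how gas translates into a fee paid in Ether (the gas price below is an assumed, illustrative figure, not a live network value): a simple Ether transfer consumes 21,000 gas, and 1 ether equals one billion gwei, so:

GWEI_PER_ETHER = 10**9        # 1 ether = 1,000,000,000 gwei

gas_used = 21_000             # typical gas for a simple Ether transfer
gas_price_gwei = 20           # assumed gas price, for illustration only

fee_gwei = gas_used * gas_price_gwei
fee_ether = fee_gwei / GWEI_PER_ETHER
print(f"fee = {fee_gwei:,} gwei = {fee_ether} ETH")   # fee = 420,000 gwei = 0.00042 ETH

A more complex smart contract call simply consumes more gas, so the fee scales with the computational work the network has to perform.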
14.3 SECURITY AND PRIVACY ON THE
BLOCKCHAIN
Security and privacy are two critical aspects of the blockchain technology.
While the blockchain is known for its robust security, it is equally important
to ensure the privacy of the participants involved.

Let's first discuss security. The blockchain technology provides security in


several ways. Firstly, it uses cryptography to secure transactions and prevent
unauthorized access. Each block in the blockchain is encrypted and linked to
the previous block, creating a secure and tamper-proof ledger. Secondly, the
blockchain is decentralized, which means that there is no single point of
failure, making it resilient to cyber-attacks. Lastly, consensus algorithms such
as Proof of Work and Proof of Stake ensure that only valid transactions are
added to the blockchain.
Now, let's talk about privacy. While the blockchain is transparent, it is not
always desirable to disclose sensitive information about the participants or
the transactions. Therefore, several techniques have been developed to ensure
privacy on the blockchain. One of the most popular techniques is the use of
cryptographic techniques such as zero-knowledge proofs and ring signatures.
These techniques enable participants to prove the validity of transactions
without revealing any information about themselves or the transaction.
Another technique is the use of private or permissioned blockchains. These
blockchains restrict the access to the network and ensure that only authorized
participants can access the blockchain. This approach is particularly useful in
applications such as supply chain management or identity verification, where
only authorized parties need access to the information.

In summary, security and privacy are critical aspects of the blockchain


technology. While the blockchain is inherently secure, additional measures
are necessary to ensure the privacy of the participants and transactions. The
blockchain community is continuously working to improve the security and
privacy features of the blockchain, and new techniques and solutions are
being developed regularly.

14.3.1 Threats to Blockchain Security and Privacy


Although the blockchain is known for its robust security, there are still some
threats that can compromise the security and privacy of the participants. Here
are some of the most common threats:
 51% attacks: In a blockchain network, the consensus algorithm ensures
that the majority of the nodes agree on the validity of a transaction.
However, if a single entity controls more than 50% of the nodes, they
can manipulate the transactions and compromise the integrity of the
blockchain.
 Sybil attacks: In a Sybil attack, an attacker creates multiple fake
identities or nodes on the network to gain control of the consensus
algorithm. This attack can be used to carry out other attacks, such as a
51% attack.
 Malware attacks: Malware can infect a node on the network and steal
private keys or sensitive information, compromising the security of the
blockchain.

 Smart contract vulnerabilities: Smart contracts are self-executing


contracts that are stored on the blockchain. If there is a vulnerability in a
smart contract, it can be exploited to compromise the security of the
blockchain.

 Privacy breaches: Although the blockchain is transparent, certain


information such as transaction amounts or wallet addresses can be used
to identify the participants. This can compromise the privacy of the
participants and make them vulnerable to targeted attacks.
 Insider attacks: An insider attack occurs when someone with authorized
access to the blockchain uses their access to steal sensitive information
or manipulate the transactions.
In summary, although the blockchain is secure, it is not immune to attacks. It
is important to stay vigilant and take steps to mitigate these threats to ensure
the security and privacy of the blockchain. This can include implementing
multi-factor authentication, regularly updating software and hardware, and
using encryption and other security measures to protect sensitive information.

14.3.2 Prevention and mitigation of attacks: double spending,


51% attacks, and others
Preventing and mitigating attacks on the blockchain is a continuous process,
and it requires a combination of technical measures and best practices. Here
are some prevention and mitigation measures for some common attacks:

 Double spending attacks: Double spending attacks occur when a user


spends the same cryptocurrency twice. To prevent this attack, most
blockchain networks use a consensus algorithm such as Proof of Work or
Proof of Stake. These algorithms ensure that a majority of the nodes on
the network agree on the validity of a transaction before it is added to the
blockchain. In addition, some blockchains use mechanisms such as
transaction fees or transaction locks to prevent double-spending attacks.

 51% attacks: In a 51% attack, an attacker gains control of the majority


of the nodes on the network and can manipulate the transactions. To
prevent this attack, some blockchain networks use consensus algorithms
that are resistant to 51% attacks, such as Delegated Proof of Stake
(DPoS) or Byzantine Fault Tolerance (BFT). In addition, some
blockchains use mechanisms such as sharding or multi-party
computation to distribute control over the network and make it more
difficult for an attacker to gain control.

 Sybil attacks: Sybil attacks can be prevented by implementing identity


verification mechanisms or by using reputation systems to identify
trustworthy nodes. Some blockchain networks also use Proof of Work or
Proof of Stake algorithms to prevent Sybil attacks.
 Malware attacks: To prevent malware attacks, it is important to use
security software and to regularly update software and hardware. In
addition, it is recommended to use cold storage wallets or hardware
wallets to store cryptocurrencies, as these are less vulnerable to malware
attacks than software wallets.

 Smart contract vulnerabilities: Smart contract vulnerabilities can be


mitigated by conducting thorough security audits and code reviews
before deploying a smart contract. In addition, it is important to use
programming best practices, such as input validation and error handling,
to prevent vulnerabilities.

 Privacy breaches: To prevent privacy breaches, it is important to use


encryption and other privacy-enhancing technologies such as zero-
knowledge proofs and ring signatures. In addition, it is important to
follow best practices such as not reusing addresses and using privacy
coins for transactions that require anonymity.

In summary, preventing and mitigating attacks on the blockchain requires a


combination of technical measures and best practices. It is important to stay
vigilant and keep up-to-date with the latest security and privacy techniques to
ensure the security and privacy of the blockchain.

14.3.3 Privacy-preserving techniques: ring signatures, zk-


SNARKs, and others
Privacy-preserving techniques are an important aspect of blockchain
technology, especially for cryptocurrencies that prioritize anonymity and
confidentiality. Here are some commonly used privacy-preserving techniques
in blockchain:
 Ring signatures: Ring signatures allow a user to sign a transaction
without revealing their identity. In a ring signature scheme, a user can
choose a set of public keys, including their own, and use them to create a
signature that is valid for a given message. The verifier can verify the
signature, but cannot determine which key was used to create the
signature. This technique is used in privacy-focused cryptocurrencies
such as Monero.

 zk-SNARKs: Zero-knowledge Succinct Non-Interactive Argument of


Knowledge (zk-SNARKs) is a technique that allows a user to prove the
validity of a transaction without revealing any information about the
transaction. In a zk-SNARK scheme, a user can create a proof that a
transaction is valid without revealing any information about the inputs or
outputs of the transaction. This technique is used in privacy-focused
cryptocurrencies such as Zcash.

 Stealth addresses: A stealth address is a one-time use address that is


generated for each transaction. When a user receives a payment, the
payment is sent to the stealth address, which is linked to the user's actual
address through a cryptographic mechanism. This technique is used in
privacy-focused cryptocurrencies such as Monero.
 CoinJoin: CoinJoin is a technique that allows multiple users to combine
their transactions into a single transaction. This makes it difficult to
determine which user owns which output in the transaction, thus
preserving privacy. This technique is used in privacy-focused Bitcoin
wallets such as Wasabi Wallet.

 Homomorphic encryption: Homomorphic encryption is a technique


that allows a user to perform calculations on encrypted data without
decrypting it. This technique can be used to preserve privacy in
blockchain transactions by encrypting the transaction inputs and outputs
and performing the necessary calculations on the encrypted data.

In summary, privacy-preserving techniques such as ring signatures, zk-


SNARKs, and stealth addresses are important for preserving anonymity and
confidentiality in blockchain transactions. These techniques are used in
privacy-focused cryptocurrencies and can help protect users' privacy and
security.
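Ring signatures and zk-SNARKs are too involved to sketch in a few lines, but a simple hash commitment (a different and much weaker technique) conveys part of the intuition of committing to data without revealing it. The example below uses only the Python standard library and illustrative values:

import hashlib, secrets

# Commit to a secret value without revealing it: publish only the hash.
secret = b"my vote: candidate 7"
nonce = secrets.token_bytes(16)                 # random blinding factor
commitment = hashlib.sha256(nonce + secret).hexdigest()
print("published commitment:", commitment)

# Later, revealing (nonce, secret) lets anyone check the commitment,
# but the commitment alone reveals nothing practical about the secret.
assert hashlib.sha256(nonce + secret).hexdigest() == commitment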

14.3.4 Case Studies of Blockchain Attacks and their Impact


There have been several notable attacks on blockchain networks and
cryptocurrency platforms that have had a significant impact on the industry.
Here are some case studies of blockchain attacks and their impact:

 The DAO Attack: The DAO was a decentralized autonomous


organization built on the Ethereum blockchain that aimed to facilitate
decentralized decision-making and funding for projects. In 2016, a
hacker exploited a vulnerability in the smart contract that allowed them
to drain $50 million worth of Ether from The DAO. This attack led to a
contentious hard fork in the Ethereum network, resulting in the creation
of Ethereum Classic. The incident highlighted the importance of smart
contract security and the potential risks associated with decentralized
autonomous organizations.
 Mt. Gox Hack: Mt. Gox was a cryptocurrency exchange that was
hacked in 2014, resulting in the loss of 850,000 Bitcoins worth over
$460 million at the time. The hack led to the bankruptcy of the exchange
and significant losses for its users. The incident highlighted the
importance of exchange security and the risks associated with leaving
cryptocurrency holdings on exchanges.
 51% Attacks on Verge and Ethereum Classic: In 2018, the Verge
cryptocurrency was hit by a series of 51% attacks that allowed an
attacker to manipulate transactions and double-spend coins. The attacks
led to significant losses for users and raised questions about the security
of smaller cryptocurrencies. In early 2019, Ethereum Classic was also
hit by a 51% attack in which roughly $1.1 million worth of ETC was double-spent.
 Parity Wallet Hack: In 2017, a vulnerability in the Parity Wallet smart
contract library led to the loss of $30 million worth of Ether. The
vulnerability allowed an attacker to take control of the Parity Wallet's
multi-signature feature, which is used to secure large amounts of
cryptocurrency. The incident highlighted the importance of secure smart
contract development and the potential risks associated with using multi-
signature wallets.

In conclusion, these case studies demonstrate the potential risks associated


with blockchain technology and the importance of maintaining security and
implementing best practices. The incidents also highlight the need for
continued development and improvement in blockchain security to protect
the integrity and trust of the industry.

14.4 USE CASES OF BLOCKCHAIN


TECHNOLOGY
Blockchain technology has several potential use cases across various
industries, and its applications extend far beyond just cryptocurrency. Here
are some examples of how blockchain technology can be used:

 Supply chain management: Blockchain technology can be used to track


and verify the movement of goods and products along a supply chain.
This can help increase transparency, reduce fraud, and improve
efficiency in the supply chain.
 Digital identity: Blockchain technology can be used to create secure and
decentralized digital identities, which can be used to authenticate and
verify individuals' identities. This can help reduce identity theft, fraud,
and other forms of online security threats.
 Decentralized finance: Blockchain technology can be used to create
decentralized financial systems that operate independently of traditional
financial institutions. This can provide greater financial access and
inclusion, reduce transaction costs, and increase transparency in financial
transactions.
 Real estate: Blockchain technology can be used to create a secure and
decentralized record of property ownership and transfer, making real
estate transactions more efficient, transparent, and secure.

 Voting: Blockchain technology can be used to create secure and


transparent voting systems that prevent fraud, ensure accuracy, and
protect the integrity of the voting process.

 Intellectual property: Blockchain technology can be used to create a


secure and transparent record of intellectual property ownership, making
it easier to register, track, and transfer intellectual property rights.

 Healthcare: Blockchain technology can be used to create a secure and


decentralized record of patient medical records, ensuring privacy and
security, while also providing accessibility to doctors, hospitals and other
healthcare providers.

 Energy: Blockchain technology can be used to create a decentralized


energy grid, which can help reduce costs, increase efficiency, and
improve access to renewable energy sources.
In summary, blockchain technology has a wide range of potential use cases
across various industries, including supply chain management, digital
identity, decentralized finance, real estate, voting, intellectual property,
healthcare, and energy. These applications have the potential to increase
efficiency, transparency, and security in various sectors, making blockchain
an important technology for the future.

14.5 SOCIAL AND ECONOMIC IMPACT OF


BLOCKCHAIN TECHNOLOGY
Blockchain technology has the potential to have a significant social and
economic impact in various ways. Here are some examples:

1) Decentralization: Blockchain technology allows for decentralized


systems that operate independently of central authorities. This can
potentially reduce the concentration of power and control in the hands of
a few, leading to more distributed decision-making and governance.
2) Increased transparency: Blockchain technology can increase
transparency in various industries, including finance, supply chain, and
voting. By providing a tamper-proof record of transactions, blockchain
can help reduce fraud, corruption, and other unethical practices.
3) Improved efficiency: Blockchain technology can streamline processes
and reduce costs by eliminating intermediaries and automating tasks that
traditionally require human intervention. This can result in increased
productivity and reduced costs for businesses and individuals.
4) Financial inclusion: Decentralized finance (DeFi) on the blockchain can
potentially provide greater financial access and inclusion for people who
are underbanked or unbanked. By eliminating the need for
intermediaries, blockchain-based financial systems can provide financial
services to people who are excluded from traditional banking systems.
5) New business models: Blockchain technology can enable new business
models, such as decentralized marketplaces, tokenization of assets, and
peer-to-peer lending. These new models have the potential to disrupt
traditional industries and create new opportunities for innovation and
growth.
6) Privacy and security: Blockchain technology can potentially provide
better privacy and security for individuals by providing secure and
decentralized systems for data storage and transfer. This can help protect
sensitive information and prevent data breaches.
7) Environmental impact: Blockchain technology can be used to create
decentralized energy systems that use renewable energy sources,
reducing the carbon footprint of traditional energy systems. Additionally,
blockchain-based systems can reduce paper-based record-keeping,
leading to a reduction in deforestation and other environmental impacts.

In summary, the social and economic impact of blockchain technology is


significant and varied, ranging from increased transparency and efficiency to
financial inclusion and new business models. The potential of blockchain
technology to disrupt and transform traditional industries is enormous, and it
will be exciting to see how this technology continues to develop and evolve
in the future.

14.6 FUTURE OF BLOCKCHAIN TECHNOLOGY


The future of blockchain technology is promising, as it continues to evolve
and mature. Here are some potential developments and trends that may shape
the future of blockchain:

 Scalability: One of the biggest challenges facing blockchain technology


is scalability. As more people and businesses adopt blockchain, the
technology needs to be able to handle increased demand. Solutions such
as sharding, sidechains, and off-chain solutions are being developed to
address this issue.

 Interoperability: As blockchain networks continue to grow, there is a


need for greater interoperability between different blockchains.
Interoperability protocols such as Polkadot and Cosmos are being
developed to enable cross-chain communication and collaboration.
 Sustainability: The energy consumption of blockchain networks,
particularly Proof of Work (PoW) based networks such as Bitcoin, has
been a concern. The development of more energy-efficient consensus
algorithms, such as Proof of Stake (PoS), may help address this issue.
 Tokenization: Tokenization of assets, such as real estate, art, and
intellectual property, is expected to increase, creating new opportunities
for investment and asset management. This trend may lead to the
creation of new asset classes and investment opportunities.
 DeFi: Decentralized finance (DeFi) is an area of blockchain that is
rapidly growing, offering a range of financial services such as lending,
borrowing, and trading, without intermediaries. The growth of DeFi may
lead to the creation of new financial products and services, as well as
greater financial inclusion.

 Regulation: As blockchain technology becomes more mainstream, there


is likely to be increased regulation from governments and financial
authorities. Regulation can provide clarity and legitimacy to the industry,
but it may also stifle innovation and limit the potential of blockchain
technology.
In summary, the future of blockchain technology is likely to involve greater
scalability, interoperability, sustainability, tokenization, DeFi, and regulation.
As blockchain continues to evolve and mature, it has the potential to
transform various industries, creating new opportunities for innovation and
growth.

14.7 SUMMARY
In summary, blockchain technology is a decentralized and secure ledger that
allows for secure and transparent transactions. It has the potential to
transform various industries, including finance, supply chain, and healthcare,
by providing increased transparency, efficiency, and security.

However, there are also challenges associated with blockchain, including


scalability, interoperability, and energy consumption. Various solutions are
being developed to address these challenges, including sharding, sidechains,
off-chain solutions, Proof of Stake consensus algorithms, and interoperability
protocols.

Blockchain technology has the potential to have a significant social and


economic impact by promoting decentralization, increasing transparency,
improving efficiency, promoting financial inclusion, enabling new business
models, providing privacy and security, and reducing environmental impact.
The future of blockchain technology is likely to involve greater scalability,
interoperability, sustainability, tokenization, DeFi, and regulation.

14.8 SELF-ASSESSMENT EXERCISES


1) What is blockchain technology and how does it work?
2) What is the difference between public and private blockchains?
3) How is data secured on a blockchain?
4) What is a consensus algorithm and why is it important in blockchain?
5) How can blockchain technology be used in supply chain management?
6) What are the advantages and disadvantages of blockchain technology?
7) What is a smart contract and how does it work on a blockchain?
8) How can blockchain be used in finance?
9) What is a token and how is it used in blockchain?

14.9 KEYWORDS
1) Blockchain: A decentralized and secure digital ledger that records
transactions in a series of blocks that are cryptographically linked to each
other.
2) Cryptography: The practice of secure communication in the presence of
third parties.
3) Consensus algorithm: A protocol used to verify transactions and ensure
that the network agrees on the current state of the ledger.
4) Decentralization: The process of distributing power away from a central
authority, making the network more secure and resistant to attack.
5) Hash: A unique code that identifies a block in a blockchain.
6) Smart contract: A self-executing contract that is written in code and
stored on a blockchain. It contains the terms of an agreement between
parties and is automatically executed when certain conditions are met.
7) Token: A digital asset that is created and managed on a blockchain. It
can represent a variety of assets, such as currencies, commodities, or
even real estate.
8) Transaction: The transfer of data on a blockchain, typically involving the
exchange of cryptocurrency or other digital assets.
9) Mining: The process of validating transactions and adding new blocks to
the blockchain.
10) Public key cryptography: A cryptographic system that uses two keys, a
public key and a private key, to encrypt and decrypt data.
11) Private key: A secret key that is used to encrypt and decrypt data in a
public key cryptography system.
12) Public key: A key that is made publicly available and used to encrypt
data in a public key cryptography system.
13) Permissioned blockchain: A private blockchain that only allows
authorized users to participate.
14) Permissionless blockchain: A public blockchain that is open to anyone
and allows anyone to participate in the network.
15) Fork: A change to the blockchain protocol that can result in the creation
of a new cryptocurrency.

14.10 FURTHER READINGS


1) Antonopoulos, A. M. (2014). Mastering Bitcoin: Unlocking digital
cryptocurrencies. O'Reilly Media.
2) Buterin, V. (2014). A next-generation smart contract and decentralized
application platform. Ethereum White Paper, 1-36.
3) Casey, M. J., & Vigna, P. (2018). The truth machine: The blockchain and
the future of everything. St. Martin's Press.
4) Nakamoto, S. (2008). Bitcoin: A Peer-to-Peer Electronic Cash System.
Self-published.
5) Narayanan, A., Bonneau, J., Felten, E., Miller, A., & Goldfeder, S.
(2016). Bitcoin and Cryptocurrency Technologies: A Comprehensive
Introduction. Princeton University Press.
6) Swan, M. (2015). Blockchain: Blueprint for a new economy. O'Reilly
Media.
7) Tapscott, D., & Tapscott, A. (2016). Blockchain revolution: How the
technology behind bitcoin is changing money, business, and the world.
Penguin Random House.
8) Wood, G. (2014). Ethereum: A secure decentralised generalised
transaction ledger. Ethereum Project Yellow Paper, 151.
9) Zohar, A. (2015). Bitcoin: Under the hood. Communications of the
ACM, 58(9), 104-113.
