MIS IGNOU
Management Information Systems
School of Management Studies
BLOCK 1 OVERVIEW OF MANAGEMENT INFORMATION SYSTEM
BLOCK 2 BUSINESS INTELLIGENCE & DECISION MAKING
BLOCK 3 RELATIONAL DATABASE MANAGEMENT SYSTEM
BLOCK 4 EMERGING TECHNOLOGIES FOR BUSINESS
COURSE DESIGN AND PREPARATION TEAM
Prof. K. Ravi Sankar
Director, School of Management Studies, IGNOU, New Delhi

Prof. Sourbhi Chaturvedi
Faculty of Management Studies, Ganpat University, Mehsana, Gujarat

Prof. Deepak Jaroliya*
Prestige Institute of Management and Research, Indore

Dr. P. Mary Jeyanthi*
Associate Professor - Business Analytics, Jaipuria Institute of Management, Jaipur

Dr. Shaheen*
Associate Professor - IT & Analytics, Institute of Public Enterprise, Hyderabad

Prof. Anurag Saxena
SOMS, IGNOU, New Delhi
Acknowledgement: The persons marked with (*) were the original contributors, and the profiles are as
they were on the initial print date.
PRINT PRODUCTION
Mr. Tilak Raj
Assistant Registrar
MPDD, IGNOU, New Delhi
May 2023
© Indira Gandhi National Open University, 2023
ISBN:
All rights reserved. No part of this work may be reproduced in any form, by mimeograph or any other
means, without permission in writing from the Indira Gandhi National Open University.
Further information on the Indira Gandhi National Open University courses may be obtained from the
University’s Office at Maidan Garhi, New Delhi – 110 068
Printed and published on behalf of the Indira Gandhi National Open University, New Delhi, by the
Registrar, MPDD, IGNOU.
Laser typeset by Tessa Media & Computers, C-206, A.F.E-II, Jamia Nagar, New Delhi -110025
Contents
BLOCK 1 OVERVIEW OF MANAGEMENT INFORMATION SYSTEM
The contents of this course are practical, relevant, and current. All the topics
discussed in this course are simple and intuitive. This course will help
learners improve their knowledge of, and skills in, management information
systems.
UNIT 1 INFORMATION SYSTEM
Objectives
After studying this unit, you will be able to:
• Define what an information system is by identifying its major
components.
• Understand the information subsystems which could be defined within a
typical organization.
• Differentiate between various types and levels of information systems.
Structure
1.1 Introduction
1.2 Defining Information System
1.3 Types of Information
1.4 Dimensions of Information System
1.5 Operating Elements of Information Systems
1.6 Types of Information Systems
1.7 The Components of Information Systems
1.8 Major Processing Functions in Information Systems
1.9 How to Apply Information Systems in Business?
1.10 Facts of Information Systems
1.11 Summary
1.12 Self-Assessment Exercises
1.13 Further Readings
1.1 INTRODUCTION
Information systems (IS) are critical to the operation of modern
organizations. They are interconnected networks of hardware, software, data,
people, and procedures designed to collect, process, store, and disseminate
information to aid in decision-making, coordination, and control. The rise of
digital technologies, as well as the increased use of computers and the
internet, has altered how organizations operate and interact with their
stakeholders. In a rapidly changing business environment, information
systems have become critical tools for organizations of all sizes and types to
remain competitive, efficient, and effective. They assist organizations in
achieving their objectives by enhancing internal operations, facilitating
communication and collaboration, and assisting in strategic decision-making.
Information systems study is multidisciplinary, combining elements of
computer science, management, and information technology.
In today's business environment, information systems are critical because they
allow organizations to collect, store, and process data to make informed
decisions.
These systems can be used to improve internal and external communication
and collaboration, as well as gain insights into customer behavior and market
trends. Furthermore, by providing real-time data and analysis, they can help
businesses become more agile, responsive to market changes, and
competitive. Information systems are critical for businesses to operate
effectively and efficiently in today's fast-paced and data-driven environment.
The combination of hardware, software, data, people, and procedures that
organizations use to collect, process, store, and disseminate information is
referred to as an information system. These systems aid in decision-making,
coordination, and control, and they assist organizations in achieving their
objectives. Information systems range from simple manual systems to complex
computer-based systems that automate many business processes.
Activity A
Write down examples of an information system that you know in real-time or
in your real life.
…………………………………………………………………………………
…………………………………………………………………………………
…………………………………………………………………………………
…………………………………………………………………………………
…………………………………………………………………………………
…………………………………………………………………………………
…………………………………………………………………………………
Internal information and external information are the two broad categories of
information. The illustration below depicts the scope of internal and external
information in the context of business organizations.
Organizational Dimension:
Information systems are embedded in organizations: an organization's standard
operating procedures and culture will be reflected in its information systems.
This dimension covers functional specialities, business processes, culture,
and political interest groups, as well as the people, policies, and procedures
that govern how an organization's information system is used and managed. It
also concerns how the information system fits into the organizational
structure and how it supports the organization's goals and objectives. A sales
management system, for example, is part of the organizational dimension
because it helps to improve sales performance.
Management Dimension:
Managers perceive business challenges in the environment. Information systems
provide managers with the tools and information they need to allocate,
coordinate, and monitor work, make decisions, create new products and
services, and make long-term strategic decisions. This dimension also covers
the policies, procedures, and rules that govern the use of the information
system, such as passwords, backup procedures, and data security policies.
Technology Dimension:
Managers use technology to carry out their duties. This dimension comprises
computer hardware and software, data management technology, and
networking/telecommunications technology; it is one of many tools managers
use to deal with change. It includes the hardware, software, data, and network
components that make up an information system's technical infrastructure. A
server, a personal computer, and database software, for example, all belong to
the technology dimension.
Strategic Dimension:
This entails aligning information systems with an organization's overall goals
and strategies. This includes decision-making processes as well as the impact
of information systems on the competitiveness and success of the
organization.
User Dimension:
This refers to the information system's end users and how they interact with
it. An e-commerce website, for example, is part of the user dimension
because it allows customers to purchase goods and services.
Expert Systems:
Expert systems incorporate expertise to assist managers in diagnosing and
solving problems. These systems are based on principles from artificial
intelligence research. An expert system is a knowledge-based information
system: it acts as an expert consultant to users by applying its knowledge of
a specific area. An expert system's components are a knowledge base and
software modules that perform inferences on that knowledge and communicate
answers to user questions.
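To make the idea concrete, here is a minimal sketch in Python (not drawn from
any particular expert-system product): the knowledge base is a list of
hypothetical if-then rules, and a small forward-chaining loop plays the role
of the inference module that answers user questions.

    # A toy rule-based expert system: a knowledge base of if-then rules
    # plus a forward-chaining inference loop. The troubleshooting domain
    # and all rules are hypothetical, for illustration only.
    RULES = [
        ({"printer_offline"}, "check_power_cable"),
        ({"check_power_cable", "cable_ok"}, "check_network"),
        ({"check_network", "network_down"}, "restart_router"),
    ]

    def infer(facts):
        """Apply rules until no new conclusions can be drawn."""
        facts = set(facts)
        changed = True
        while changed:
            changed = False
            for conditions, conclusion in RULES:
                if conditions <= facts and conclusion not in facts:
                    facts.add(conclusion)   # fire the rule
                    changed = True
        return facts

    print(infer({"printer_offline", "cable_ok"}))
    # -> also contains 'check_power_cable' and 'check_network'

Real expert systems add an explanation facility and a far richer knowledge
representation, but the pattern of a separate knowledge base driven by an
inference engine is the same.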
1.9 HOW TO APPLY INFORMATION SYSTEMS IN BUSINESS?
Here are some of the business activities that require the intervention of an
information system.
Customer Relationship Management (CRM):
Customer Relationship Management (CRM) is a strategy that organizations
use to manage their interactions with customers and potential customers. The
goal of CRM is to create and maintain strong, lasting relationships with
customers by understanding their needs and behaviors and by delivering the
products, services, and experiences that they value. CRM is typically
achieved through the use of software and technology. CRM systems can
collect and store data about customers, including demographic information,
purchase history, and interaction history with the organization. This
information can be used to inform business decisions, such as which products
to develop or which customers to target with marketing campaigns.
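As a simple illustration, the sketch below (in Python, with hypothetical field
names and customers) shows the kind of records a CRM system might store and
how they could be queried to pick a target segment for a marketing campaign.

    from dataclasses import dataclass, field

    # Hypothetical CRM records: demographics, purchase history, and
    # interaction history, as described above.
    @dataclass
    class Customer:
        name: str
        age: int
        purchases: list = field(default_factory=list)   # purchase amounts
        interactions: int = 0                           # contacts with the firm

    customers = [
        Customer("Asha", 34, purchases=[120.0, 80.0], interactions=5),
        Customer("Ravi", 52, purchases=[15.0], interactions=1),
    ]

    # Use the stored data to choose a campaign segment:
    # repeat buyers who interact frequently with the organization.
    targets = [c.name for c in customers
               if len(c.purchases) >= 2 and c.interactions >= 3]
    print(targets)  # ['Asha']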
Activity B
What is the role of information systems in business and society?
…………………………………………………………………………………
…………………………………………………………………………………
…………………………………………………………………………………
…………………………………………………………………………………
…………………………………………………………………………………
…………………………………………………………………………………
…………………………………………………………………………………
…………………………………………………………………………………
1.11 SUMMARY
In this unit, you have been introduced to information systems. First, we have
reviewed several definitions, focusing on the components of information
systems: technology, people, and process. Next, we have studied how the
business use of information systems has evolved over the years, from the use
of large mainframe computers for number crunching, through the
introduction of the PC and networks, all the way to the era of mobile
computing. Software and technology innovations allowed businesses to
integrate technology more deeply during each phase.
We are now at a point where every company uses information systems and asks:
does it bring a competitive advantage? In the end, that is really what this
course is about: what every businessperson should understand about what an
information system is and how it can be used to bring competitive advantage.
1.12 SELF-ASSESSMENT EXERCISES
2. The Walmart case study introduced you to how that company used
information systems to become the world's leading retailer. Walmart has
continued to innovate and is still looked to as a leader in the use of
technology. Do some original research and write a one-page report
detailing a new technology that Walmart has recently implemented or is
pioneering.
1.13 FURTHER READINGS
1. "Does IT Matter?" by Nicholas Carr.
2. "Information Systems: A Manager's Guide to Harnessing Technology"
by John Gallaugher.
3. Wikipedia entry on "Information Systems," as displayed on August 19,
2012. Wikipedia: The Free Encyclopedia. San Francisco: Wikimedia
Foundation.
4. Information Systems Today: Managing in the Digital World, fourth
edition. Prentice-Hall, 2010.
5. Management Information Systems, twelfth edition. Prentice-Hall, 2012.
UNIT 2 INTRODUCTION TO MANAGEMENT INFORMATION SYSTEM
Objectives
After studying this unit, you will be able to:
• Understand the organizational and strategic view of MIS.
• Understand the different components of MIS in depth.
• Understand the various terms involved in MIS.
Structure
2.1 Introduction to Management Information System (MIS)
2.2 Organizational & Strategic View of MIS
2.3 Information Systems and Technology
2.4 Database Management Systems
2.5 Data Analytics
2.6 Network and Telecommunication
2.7 Enterprise Resource Planning (ERP) Systems
2.8 Electronic Commerce (e-commerce)
2.9 Cybersecurity & Data Privacy
2.10 Business Intelligence
2.11 Project Management
2.12 System Development Life Cycle
2.13 IT Strategy and Management
2.14 Ethics and Legal Issues in Information Systems
2.15 Summary
2.16 Self-Assessment Exercises
2.17 Further Readings
Information systems are created for job positions rather than individuals.
Regardless of who holds a position, information systems are designed around
the job responsibilities the holder is expected to perform and the information
needs of that position in the organisational hierarchy. Information systems
are designed for
different levels of management; they are intended to meet the information
needs of top, middle, and junior management decision-makers. The
information systems are intended to provide information to managers in
various functional areas. Managers in marketing, finance, production,
personnel, materials, logistics, and other areas receive the information.
Information systems should be integrated using databases. Integrating
information systems eliminates redundancy in data storage, processing, and
report generation. To reduce the likelihood of data integrity discrepancies,
single-point data entry and upkeep of master data files should be ensured.
Computers and other electronic devices help to facilitate information
systems.
Disadvantages of MIS:
A Management Information System (MIS) is a significant tool that helps
organisations manage their data and information effectively, but like any
system it has its disadvantages.
The company can make data-driven decisions that can lead to increased sales,
customer loyalty, and competitive advantage by integrating technology and
data analysis into its overall business strategy. In this case, the MIS function
is viewed as a critical component of the company's overall growth and
success strategy.
Management Information Systems (MIS) relate to organisations and strategy in
the following ways:
• Assists decision-making: MIS provides decision-makers with the
pertinent information they require to make informed decisions.
• Aligns with organisational strategy: MIS is designed and implemented
to support the overall strategy and goals of the organisation.
• Increases operational efficiency: MIS automates many business
processes, which reduces errors and increases efficiency.
• Promotes communication and collaboration: MIS promotes
communication and collaboration among departments and employees,
thereby improving organisational coordination and alignment.
• Improves competitiveness: MIS assists organisations in remaining
competitive by providing timely, accurate information and allowing them
to respond quickly to changing market conditions.
MIS is critical in assisting organisations in achieving their objectives by
providing the information and tools required for effective decision-making
and operational performance.
As managers, the majority of you will work for companies that heavily rely
on information systems and make significant investments in information
technology. You will undoubtedly want to learn how to invest this money
wisely. Your company can outperform competitors if you make wise
decisions. You will waste valuable capital if you make poor decisions. This
book is intended to assist you in making informed decisions about
information technology and information systems. Information technology
(IT) encompasses all of the hardware and software that a company requires to
meet its business objectives. This includes not only computer machines,
storage devices, and handheld mobile devices, but also software such as the
Windows or Linux operating systems, the Microsoft Office desktop
productivity suite, and the many thousands of computer programmes found in
the average large firm.
The IS&T viewpoint is concerned with how information systems can be used
to support an organisation's day-to-day operations, automate manual
processes, and provide real-time access to information. This viewpoint
regards information systems as a collection of tools that can be used to
improve efficiency, cut costs, and boost productivity. Implementing an
enterprise resource planning (ERP) system, developing a customer
relationship management (CRM) system, and deploying a cloud-based data
storage and management system are all examples of IS&T in the MIS
context. These technologies can be used to support organisational operations,
manage data, and provide information for decision-making.
2.4 DATABASE MANAGEMENT SYSTEMS
Database Management Systems are concerned with the design, development,
and administration of databases, including data models and SQL. A database
is a collection of data that has been organised to efficiently serve many
applications by centralising the data and controlling redundant data. Instead
of storing data in separate files for each application, data is stored in such a
way that it appears to users to be stored in only one location. A single
database can support multiple applications. Instead of storing employee data
in separate information systems and files for personnel, payroll, and benefits,
a company could create a single common human resources database. A
database management system (DBMS) is software that allows an organisation
to centralise data, manage it efficiently, and provide application programmes
with access to the stored data. The database management system (DBMS)
serves as a bridge between application programmes and physical data files.
When an application programme requests a data item, such as gross pay, the
DBMS locates it in the database and returns it to the application.
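The sketch below illustrates this mediating role using Python's built-in
sqlite3 module as a stand-in for a full DBMS; the payroll table and its
columns are assumptions for illustration. The application asks for gross pay
by name, and the DBMS locates it in the stored data and returns it.

    import sqlite3

    # The DBMS (here SQLite) sits between the application programme and
    # the physical data files. Table and column names are hypothetical.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE payroll (emp_id INTEGER, name TEXT, gross_pay REAL)")
    conn.execute("INSERT INTO payroll VALUES (1, 'A. Kumar', 54000.0)")

    # The application requests a data item, such as gross pay, by name;
    # the DBMS finds it and returns it to the application.
    row = conn.execute(
        "SELECT gross_pay FROM payroll WHERE emp_id = ?", (1,)
    ).fetchone()
    print(row[0])  # 54000.0

Note that the application never specifies where or how the row is physically
stored; that separation of logical and physical views is exactly what the
next paragraph describes.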
Using traditional data files, the programmer would need to specify the size
and format of each data element used in the programme, as well as tell the
computer where to find them. By separating the logical and physical views of
the data, the DBMS relieves the programmer or end user of the task of
understanding where and how the data are stored. The logical view depicts
data as it would be perceived by end users or business specialists, whereas
the physical view depicts data organisation and structure on physical storage
media.
The use of relational databases, such as Microsoft SQL Server and Oracle,
for storing and retrieving data, as well as NoSQL databases, such as
MongoDB and Cassandra, for storing and retrieving large amounts of
unstructured data, are examples of DBMS in the MIS context. The DBMS
chosen will be determined by the organisation's specific needs, such as the
size of the data set, the complexity of the data relationships, and the system's
performance requirements.
Both cybersecurity and data privacy are critical aspects of MIS as they help
protect an organisation's information assets, maintain the trust of customers
and stakeholders, and comply with legal requirements.
2.10 BUSINESS INTELLIGENCE & DECISION MAKING
This topic focuses on using data and analytics to make informed business
decisions, including data warehousing, business intelligence tools, and
dashboards. Business Intelligence (BI) in the context of Management
Information Systems (MIS) refers to the use of data, analytical tools, and
technology to support informed decision-making in an organisation. The goal
of BI is to turn data into actionable information that can inform and support
strategic, tactical, and operational decision-making. This can include
activities such as data collection and warehousing, data analysis, reporting,
and visualisation.
• Ethics: Refers to moral principles and values that guide behaviour in the
use of information and technology. This includes issues such as privacy,
accuracy, security and intellectual property.
• Privacy: Protecting the confidentiality and personal information of
individuals.
• Accuracy: Ensuring that the information stored and processed by IS is
accurate, up-to-date and free from errors.
• Security: Protecting the confidentiality, integrity and availability of
information and technology systems.
• Intellectual Property: Protecting the rights of creators and owners of
information, such as copyrights and patents.
Legal issues in IS in MIS include:
• Compliance with laws and regulations: Organisations must comply with
laws and regulations related to information and technology, such as data
protection laws and privacy regulations.
• Liability: Organisations can be held responsible for the misuse of
information or technology systems.
• Electronic contracts: The legality and enforceability of electronic
contracts and agreements.
The goal of addressing ethics and legal issues in IS in MIS is to ensure the
responsible and legal use of information and technology and to protect the
rights and interests of individuals and organisations.
2.15 SUMMARY
Management Information System (MIS) is a systematic approach to studying
the information needs and decision-making at all levels of an organization. It
helps managers make informed decisions through reports, dashboards, and
visualizations.
An MIS can have a wide range of functions, from providing basic reports to
conducting complex data analysis and decision-making support. It involves
the use of hardware like computers and servers, and software like databases
and reporting tools. Implementing an MIS can bring numerous benefits to an
organization, including enhanced decision-making, increased efficiency, and
better coordination among different departments. However, the effectiveness
of an MIS largely depends on the quality and precision of the data used, as
well as the organization's capability to utilize the information provided.
In conclusion, an MIS is essential for organizations of all sizes and
industries. By providing timely and accurate information to managers, it can
help organizations improve their operations, achieve their goals, and stay
competitive in today's fast-paced business environment.
UNIT 3 SYSTEM DEVELOPMENT LIFE CYCLE (SDLC)
Objectives
After studying this unit, you will be able to:
• Understand the importance of the SDLC in MIS
• Learn in detail about the phases of the SDLC
• Gain in-depth knowledge of the methodologies of the System Development
Life Cycle
Structure
3.1 Introduction of System Development Life Cycle (SDLC)
3.2 Phases of SDLC
3.3 Methodologies of System Development Life Cycle
3.4 Benefits of System Development Life Cycle
3.5 Possible Drawbacks of SDLC
3.6 Summary
3.7 Self-Assessment Exercises
3.8 Further Readings
• Planning: The organization identifies the need for a new system and
defines its objectives and scope during this stage. A feasibility study is
carried out to determine whether or not the project is feasible and the
resources required.
• Analysis: During this stage, the organization collects and analyses the
system requirements. Gathering requirements from stakeholders and
developing a detailed system specification are part of this stage.
• Design: The system design is created during this stage, which includes
the software and hardware architecture, database design, user interfaces,
and system security.
• Development: This stage entails the actual coding and development of
the system based on the previous stage's design. Developers design,
develop, debug, and test the system.
• Integration & Testing: The system is tested during this stage to ensure
that it meets the requirements and functions as expected. To validate the
system, various types of testing are performed, including unit testing,
integration testing, and acceptance testing.
• Implementation: The system is installed and deployed in a live
environment for end-users to use at this stage. The system is deployed in
a production environment and used by customers and end users.
• Maintenance: After the system has been deployed, this stage entails
providing support for it. The system may require maintenance and bug
fixes, as well as the addition of new features based on customer
feedback.
Waterfall Model
The Waterfall Model was the first to be introduced as a Process Model as
shown in Figure 3.2.
Iterative Model
The Iterative model begins with a simple implementation of a small set of
software requirements and iteratively improves the evolving versions until
the entire system is implemented and ready for deployment. An iterative life
cycle model does not attempt to begin with a complete set of requirements.
Instead, development begins with specifying and implementing only a
portion of the software, which is then reviewed to identify additional
requirements. This process is then repeated, resulting in a new version of the
software at the end of each model iteration.
Spiral Model
• Identification:
This phase begins with gathering the baseline spiral's business
requirements. This phase is used to identify system requirements,
subsystem requirements, and unit requirements in subsequent spirals as
the product matures. This phase also includes continuous communication
between the customer and the system analyst to understand the system
requirements. The product is deployed in the identified market at the end
of the spiral.
• Design:
The Design phase begins with conceptual design in the baseline spiral
and progresses to architectural design, logical module design, physical
product design, and final design in subsequent spirals.
• Construct or Build:
At each spiral, the Construct phase refers to the production of the actual
software product. In the baseline spiral, when the product is still being
thought about and the design is being developed, a POC (Proof of
Concept) is created to solicit customer feedback. Then, in subsequent
spirals with greater clarity on requirements and design details, a working
model of the software known as a build with a version number is
produced. These prototypes are sent to the customer for review.
V-Model
The 'V-Model' is a modern version of the traditional software development
model. The letter 'V' represents verification and validation and is an extension
40 of the Waterfall model. The crux of the V model is the connection between
System Development
each phase of testing and that of development. The phases of testing are Life Cycle (SDLC)
categorized as the "Validation Phase" and that development as the
"Verification Phase". As a result, for each stage of development, a
corresponding test activity is planned ahead of time.
Verification Phases:
Validation Phases:
• Unit Testing Phase: Unit tests are designed to validate single modules
and identify and eliminate bugs. A unit test is simply running a piece of
code to see if it provides the desired functionality (see the sketch after
this list).
• Integration Testing: Integration testing is the process of combining
pieces of code to ensure that they perform as a single entity.
• System Testing: When the entire system is ready, the application is run
on the target environment in which it must operate, and a conclusion is
drawn to determine whether the system is capable of performing
efficiently with the shortest response time.
• User Acceptance Testing: The user acceptance test plan is created
during the requirement analysis phase because when the software is
ready to be delivered, it is tested against a set of tests that must be passed
to certify that the product met its goal.
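As referenced in the Unit Testing bullet above, the following minimal sketch,
using Python's built-in unittest module, shows what a unit test looks like in
practice; the gross_pay function under test is hypothetical.

    import unittest

    def gross_pay(hours, rate):
        """Hypothetical module under test: 1.5x overtime above 40 hours."""
        base = min(hours, 40) * rate
        overtime = max(hours - 40, 0) * rate * 1.5
        return base + overtime

    class TestGrossPay(unittest.TestCase):
        def test_regular_hours(self):
            self.assertEqual(gross_pay(40, 10), 400)

        def test_overtime(self):
            self.assertEqual(gross_pay(45, 10), 475)  # 400 + 5 * 15

    if __name__ == "__main__":
        unittest.main()

Each test runs one small piece of code and checks its output, which is exactly
the per-module validation the V-Model plans alongside the coding stage.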
This model is used for rapid and ongoing release cycles, to implement minor
but significant changes between releases. This implies more tests and
iterations and is mostly applicable to removing minor issues from larger,
more complex projects. As you can see, different SDLC methodologies are
used depending on the specifics of each project, its requirements, the client's
core vision, and other factors. Knowing the specific characteristics of each
SDLC model can assist in selecting the best one to deliver a high-quality,
effective product.
Agile is defined as quick or adaptable. The term "Agile process model" refers
to an iterative software development approach. Agile methods divide tasks
into smaller iterations or parts and do not involve long-term planning
directly. The project scope and requirements are established at the start of the
development process. The number of iterations, duration, and scope of each
iteration are all clearly defined in advance.
In the Agile process model, each iteration is considered a short time "frame,"
typically lasting one to four weeks. The division of the entire project into
smaller parts aids in reducing project risk and overall project delivery time
requirements. Before a working product is demonstrated to the client, each
iteration involves a team going through the entire software development life
cycle, including planning, requirements analysis, design, coding, and testing.
3.6 SUMMARY
To sum up, the system development life cycle is a complex project
management model that encompasses the system creation from its initial idea
to its finalized deployment and maintenance. The SDLC includes 7 different
stages: planning, analysis, design, development, testing, implementation, and
maintenance – all these are particularly important for delivering a high-
quality, cost-effective product in a short time frame. Learning the basics
of the SDLC, its major methodologies, benefits, and possible drawbacks can
help you set up an ergonomic system development process that delivers the
best outcome.
The software development life cycle can be, and is, adapted by software
development teams based on the philosophy, methodology, and framework they
use when developing a specific software product, or by organizations.
The SDLC is a project management tool that should be tailored to the needs
of the project, the team working on it, and other key stakeholders involved in
the process. The names of the phases, their order, and whether they are
distinct or merged into one another change. However, every software
development project has a life cycle, and you should now understand its role
in project management and as a tool for improving outcomes.
In this case, the other stakeholders are: the accounting department, for
details of customer accounts; customers who use the taxi company's services;
other despatchers who work for the taxi company; and the Public Carriage
Office, which is responsible for setting the tariff for taxis.
As per the case scenario, the owner wants to improve the performance of the
company and provide quality service through the System Development Life
Cycle. Answer the questions below to ensure performance and quality.
1. Define the project scope and objectives: Write down the specific goals,
deliverables, and timeline for your project.
7. Monitor and maintain the system: Regularly monitor the system for
performance and take action to resolve any issues that arise.
BLOCK 2
BUSINESS INTELLIGENCE AND DECISION MAKING
UNIT 4 INTRODUCTION TO BUSINESS INTELLIGENCE
Objectives
After studying this unit, you will be able to:
• Understand the depth of knowledge of Business Intelligence (BI) and
relative terminologies.
• Recognize the usage of various Business Intelligence tools and
techniques to collect, analyze, and interpret data from different sources.
• Gain insights into business operations, customer behaviour, market
trends, and other key areas of interest.
• Evaluate how BI provides decision-makers with the information they need
to optimize business performance, reduce costs, increase revenue, and
achieve strategic objectives.
Structure
4.1 Introduction to Business Intelligence
4.2 Data Warehousing
4.2.1 Data Modeling and Schema Design
4.3 Data Mining and Analytics
4.4 Data Governance and Security
4.5 Business Intelligence Applications
4.6 Summary
4.7 Self-Assessment Exercises
4.8 Further Readings
After the data is collected, it is converted into a format that can be analyzed
with BI tools such as data visualization software, dashboards, and reporting
tools. These tools enable businesses to analyze large datasets, identify trends,
and gain a better understanding of their operations. One of the most
significant advantages of business intelligence is that it allows organizations
to make data-driven decisions. Businesses can identify areas for improvement
in their operations and make changes as a result of real-time data analysis.
For example, if a company's sales are declining, it can use BI tools to identify
the root cause of the problem and take corrective action. Another advantage
of BI is that it can assist organizations in optimizing their operations.
Businesses, for example, can identify areas where they can improve customer
satisfaction and increase sales by analyzing customer behaviour. They can
also use business intelligence to track inventory levels and optimize supply
chain operations to save money.
Assume a retail company wants to increase sales in its physical stores. The
company can use business intelligence to analyze customer behaviour and
identify areas where operations can be improved. First, the company can
collect data from various sources, such as point-of-sale systems, customer
loyalty programs, and social media platforms, using data mining techniques.
This information may include customer demographics, purchasing habits, and
product preferences. The company can then use data visualization tools to
build dashboards and reports that highlight key performance metrics like
sales per store, sales per employee, and customer satisfaction ratings. These
reports can assist the company in identifying customer behaviour trends and
patterns, as well as tracking the effectiveness of marketing campaigns and
promotions.
The company can make data-driven decisions to improve its operations based
on the insights gained from data analysis. For example, if data shows that a
specific product sells well in one store but not in others, the company can
stock more of that product in the underperforming stores. They can also use
the data to identify peak shopping hours and adjust staffing levels
accordingly to ensure that customers are served as soon as possible. By
analyzing customer behavior with business intelligence, the company can
optimize operations and increase sales in its physical stores. They can also
use the data analysis insights to improve their online sales and marketing
efforts, resulting in higher revenue and profits.
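As a rough illustration of this kind of analysis, the Python sketch below
(using pandas, with made-up transactions and column names) computes sales per
store and flags stores where a product underperforms relative to the average.

    import pandas as pd

    # Hypothetical point-of-sale extract: one row per transaction.
    sales = pd.DataFrame({
        "store":   ["A", "A", "B", "B", "C"],
        "product": ["P1", "P2", "P1", "P1", "P1"],
        "amount":  [120.0, 80.0, 40.0, 35.0, 10.0],
    })

    # A key dashboard metric described above: sales per store.
    per_store = sales.groupby("store")["amount"].sum()
    print(per_store)

    # Flag underperforming stores for a product that sells well elsewhere.
    p1 = sales[sales["product"] == "P1"].groupby("store")["amount"].sum()
    print(p1[p1 < p1.mean()])  # stores selling P1 below the average

In a real BI deployment the same aggregation would run against the data
warehouse and feed a dashboard rather than a print statement, but the logic
of the insight is identical.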
Purpose of BI:
Business intelligence (BI) serves several functions, including the following:
The data sources are the first component of a BI system. This includes all
data-generating and data-capture systems and applications, such as
transactional databases, customer relationship management (CRM)
systems, and social media platforms. Data sources are the raw materials
from which a BI system generates insights. Internal databases, external
sources such as social media and market research, and data warehouses
are examples of these sources. To provide a comprehensive view of the
organization's performance, a BI system must be able to access and
integrate data from all of these sources.
In the diagram above, data is collected from various sources, and the ETL
process converts unstructured data into structured and meaningful
information. This leads to the creation of a data warehouse, reporting, and
analysis to make better decisions.
Following the definition of the data model, the next step is to design the
schema that will be used to implement the data model in a specific database
management system. Selecting appropriate data types for each attribute,
defining tables and their columns, and establishing relationships between
tables using primary and foreign keys are all part of this process. Data Model
Schemas are commonly used to visually represent the architecture of a
database and serve as the foundation for an organization's Data Management
practice.
Choosing the right Data Model Schema can help to eliminate bottlenecks and
anomalies during software project execution. An incorrect Schema Design,
on the other hand, can cause several errors in an application and make
refactoring expensive. For example, if you didn't realize early on that your
application would require multiple table JOINs, your service may eventually
stall once you reach a certain number of users and volume of data.
The Data Model Schema design begins with a high level of abstraction and
progresses to become more concrete and specific, as with any design process.
Based on their level of abstraction, data models are generally classified into
three types. The process will start with a Conceptual Model, then a Logical
Model, and finally a Physical Model. Such data models provide a conceptual
framework within which a database user can specify the requirements,
structure, and set of attributes for configuring a Database Schema. A Data
Model also offers users a high-level design implementation that dictates what
can be included in the schema.
The following are some popular data model schemas:
• Hierarchical Schema
• Relational Schema
• Network Schema
• Object-Oriented Schema
• Entity-Relationship Schema
Hierarchical Schema:
A hierarchical schema is a type of database schema that organizes data in a
tree-like structure with a single root, with each node having one parent and
potentially multiple children. A tree schema or a parent-child schema is
another name for this type of schema. Data is organized top-down in a
hierarchical schema, with the parent node at the top of the tree representing
the most general information and child nodes below it representing more
specific information.
A hierarchical schema for a company, for example, might have "Company"
as the root node, with child nodes for "Departments," "Employees," and
"Projects." One of the primary benefits of a hierarchical schema is that it is
simple to understand and apply, making it ideal for small to medium-sized
databases with simple data relationships. However, when dealing with
complex relationships or when changes to the schema structure are required,
it can be limiting. Furthermore, because some data may need to be repeated at
multiple levels of the hierarchy, this type of schema can result in data
redundancy.
This data model arranges data using a tree-like structure, with the root node
at the top. When there are multiple nodes at the top level, root segments
exist. Nodes are linked together by branches, and each node has one parent,
which may have multiple children, giving a one-to-many relationship between
different types of data. The information is saved as records that are linked
together.
The "Company" node is the root of the hierarchy in this schema, with three
child nodes representing the company's departments. Each department node
has a manager node as its first child, followed by one or more employee
nodes. This structure enables efficient querying of data related to specific
departments or employees, as well as easy navigation of the company's
organizational structure.
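A minimal Python sketch of such a hierarchy, with hypothetical department and
employee names, shows how a query becomes a simple walk from the root through
parent-child links.

    # The company hierarchy described above, as a simple nested structure.
    # Department and employee names are hypothetical.
    company = {
        "name": "Company",
        "children": [
            {"name": "Sales", "manager": "Meera",
             "employees": ["Arun", "Divya"]},
            {"name": "IT", "manager": "Karan",
             "employees": ["Nisha"]},
        ],
    }

    # Each child has exactly one parent, so finding a department's
    # employees is a direct walk from the root node.
    def employees_of(dept_name):
        for dept in company["children"]:
            if dept["name"] == dept_name:
                return dept["employees"]
        return []

    print(employees_of("Sales"))  # ['Arun', 'Divya']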
Advantages:
• Easy to understand and implement: A hierarchical schema is a simple
and intuitive way to organize data. It is simple to comprehend and
implement.
• Efficient querying: Because the hierarchical schema is organized in a
tree-like structure, querying data becomes more efficient; the hierarchy
can be navigated simply by following the links between parent and child
nodes.
• Data Integrity: A hierarchical schema ensures that data is always
consistent and that data integrity is maintained, because each child node
can have only one parent node, preventing data duplication and
inconsistency.
• Improved Security: A hierarchical schema improves security because
access to nodes can be easily controlled by setting permissions at the
appropriate levels.
Challenges:
• Limited flexibility: Hierarchical schema has limited flexibility because
it can only represent data in a tree-like structure. This makes representing
complex data relationships difficult.
• Data redundancy: Because data may need to be duplicated at multiple
levels in the hierarchy, hierarchical schema can lead to data redundancy.
• Difficult to scale: Hierarchical schema can be difficult to scale because
adding new levels or nodes requires significant restructuring of the
schema.
• Inefficient updates: Updating data in a hierarchical schema can be
inefficient because changes to a parent node may necessitate updates to
all of its child nodes.
Relational schemas are important because they standardize the way data is
organized and accessed in a database. They make data management easier
and ensure data integrity by imposing rules and constraints on the data. They
also allow for efficient data querying and reporting, making it easier for
applications to retrieve the information they require.
For example, consider an "Employee" table and a "Department" table.
Foreign Key:
The field "department_id" in the "Employee" table is a foreign key that refers
to the field "department_id" in the "Department" table. This indicates that
each employee is assigned to a specific department. By joining the
"Employee" and "Department" tables on the "department_id" field, we can
answer questions like "What is the name of the department that employee
John Smith belongs to?" This is just a simple example; in practice, a
relational schema could be much more complex, with many more tables and
relationships.
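The sketch below recreates this example with Python's built-in sqlite3 module;
the exact columns beyond department_id are assumptions for illustration.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE Department (
        department_id INTEGER PRIMARY KEY,
        name TEXT
    );
    CREATE TABLE Employee (
        employee_id INTEGER PRIMARY KEY,
        name TEXT,
        department_id INTEGER REFERENCES Department(department_id)
    );
    INSERT INTO Department VALUES (1, 'Marketing');
    INSERT INTO Employee VALUES (10, 'John Smith', 1);
    """)

    # Join on the department_id foreign key to answer:
    # which department does John Smith belong to?
    row = conn.execute("""
        SELECT d.name FROM Employee e
        JOIN Department d ON e.department_id = d.department_id
        WHERE e.name = 'John Smith'
    """).fetchone()
    print(row[0])  # Marketing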
Advantages:
• Standardization: Relational schema provides a standardized method of
organizing and accessing data in a database. This facilitates the
understanding of the data structure by developers and users, as well as
the access and manipulation of the data by applications.
Challenges:
Network Schema:
The network schema is a type of database schema that organizes data
similarly to the hierarchical schema, but with a more flexible and complex
structure. Data is organized as a graph in a network schema, with nodes
representing entities and edges representing relationships between them. In
contrast to the hierarchical schema, which allows only one parent for each
child, nodes in a network schema can have multiple parents, allowing for
more complex relationships between entities.
Employee record:
Set Fields:
Manager: Pointer to the manager's employee record.
Employee: Pointer to the employee records that belong to the department.
Project record:
Set Fields:
Manager: Pointer to the manager's employee record
Employee: Pointer to the employee records working on the project
We can use this schema to answer questions like "What are the names of the
employees working on project X?" by following the pointers from the project
record to the employee records, and "Who is the supervisor of employee Y?"
by following the pointer from the employee record to the supervisor's
employee record.
The network schema's flexibility in representing complex relationships
between entities is one of its advantages. Entities with multiple parents can
have more complex and flexible relationships than in the hierarchical schema.
Furthermore, the network schema supports many-to-many relationships,
which allow entities to have multiple relationships with other entities.
The network schema's complexity, on the other hand, can make it difficult to
understand and manage. It is more difficult to program and may be less
efficient than the hierarchical schema. Furthermore, the use of pointers or
links between records can make navigating and querying the data more
difficult.
Object-Oriented Schema:
Advantages:
• Encapsulation: One of the primary characteristics of object-oriented
schema is encapsulation. It allows data to be hidden from other parts of
the program, limiting access to only defined interfaces. This reduces
complexity, improves modularity, and boosts security.
Challenges:
Entity-Relationship Schema:
An entity-relationship (ER) schema is a diagrammatic representation of a
database's entities, attributes, and relationships between them. It is a high-
level conceptual model of a database's structure. The ER schema is
commonly used to design relational databases and to communicate database
designs to developers and stakeholders. An entity in an ER schema is a real-
world object, concept, or event with its own identity and the ability to be
uniquely identified. Attributes describe the characteristics of entities and are
used to define their properties. Relationships describe the connection between
entities.
Entities, attributes, and relationships are the three main components of an ER
schema.
Entities: A rectangle represents an entity, and its name is written inside the
rectangle. An entity can be a person, a place, a thing, an event, or a concept.
Attributes are represented by an oval shape and are linked to their respective
entities by a line. An attribute defines an entity's properties and provides
additional information about it.
The entities and their attributes are:
• Book: book_id, title, genre, publish_year, publisher_id
• Author: author_id, name, nationality
• Publisher: publisher_id, name, address, phone
• Borrower: borrower_id, name, email
• Borrowed Book: book_id, borrower_id, borrow_date, return_date
• Borrowing Log: borrow_id, book_id, borrower_id, borrow_date, return_date
The lines connecting the entities represent their relationships. The "Book"
entity, for example, has a "publisher_id" attribute that links it to the
"Publisher" entity. The "Borrowed Book" entity has attributes "book_id" and
"borrower_id" that link it to the "Book" and "Borrower" entities, respectively.
Finally, the "Borrowing Log" entity has attributes that describe how
borrowers borrow and return books, and it is linked to both the "Book" and
"Borrower" entities.
Now that the data is in the data warehouse, the e-commerce company can
analyze the data and gain insights into its business operations using tools
such as SQL queries, data visualization software, and machine learning
algorithms. This can assist the company in making data-driven decisions to
improve customer satisfaction, boost sales, and optimize its supply chain.
ETL is a Data Warehousing process that stands for Extract, Transform, and
Load. An ETL tool extracts data from various data source systems, transforms
it in the staging area, and then loads it into the Data Warehouse system.
Extraction:
Extraction is the first step in the ETL process. In this step, data from
various source systems, in formats such as relational databases, NoSQL, XML,
and flat files, is extracted into the staging area. Because the extracted
data comes in various formats and can be corrupted, it is critical to store
it first in the staging area rather than loading it directly into the data
warehouse, where damaged data would be much more difficult to roll back.
This makes extraction one of the most crucial steps in the ETL process.
Transformation:
Transformation is the second step in the ETL process. In this step, the
extracted data is subjected to a set of rules or functions to be converted into a
single standard format. It could include the following processes/tasks:
• Filtering is the process of loading only specific attributes into a data
warehouse.
• Cleaning entails replacing NULL values with default values, mapping
the U.S.A, United States, and America into the USA, and so on.
• Joining is the process of combining multiple attributes into one.
• Splitting is the process of dividing a single attribute into multiple
attributes.
• Sorting is the process of organizing tuples based on some attribute.
(generally key-attribute).
Loading:
Loading is the third and final step in the ETL process. The transformed data
is finally loaded into the data warehouse in this step. The data is sometimes
updated very frequently by loading it into the data warehouse, and other
times it is done at longer but regular intervals. The rate and duration of
loading are solely determined by the requirements and differ from system to
system.
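The toy Python sketch below walks through one ETL pass under these
assumptions: a small in-memory list stands in for the source systems and
staging area, the transformation applies the cleaning and sorting tasks
described above, and SQLite stands in for the data warehouse.

    import sqlite3

    # Extract: raw records from a hypothetical source system (staging area).
    raw = [
        {"id": 2, "name": "Asha", "country": "U.S.A"},
        {"id": 1, "name": "Ravi", "country": "United States"},
        {"id": 3, "name": None,  "country": "America"},
    ]

    # Transform: map country variants to a single standard format, replace
    # NULL values with a default, and sort on the key attribute.
    COUNTRY_MAP = {"U.S.A": "USA", "United States": "USA", "America": "USA"}
    clean = sorted(
        ({"id": r["id"],
          "name": r["name"] or "UNKNOWN",
          "country": COUNTRY_MAP.get(r["country"], r["country"])}
         for r in raw),
        key=lambda r: r["id"],
    )

    # Load: write the transformed rows into the warehouse table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, country TEXT)")
    conn.executemany("INSERT INTO customers VALUES (:id, :name, :country)", clean)
    print(conn.execute("SELECT * FROM customers").fetchall())

Production ETL tools add scheduling, error handling, and incremental loads,
but the extract-transform-load sequence is the same as in this sketch.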
4.3 INTRODUCTION TO DATA MINING AND ANALYTICS
Data mining and analytics are two related fields that involve extracting
insights and knowledge from data using computer algorithms and statistical
techniques. While the two terms are frequently used interchangeably, there
are some distinctions between them. Data mining is the process of identifying
patterns and relationships in large datasets by using algorithms. The goal of
data mining is to extract previously unknown insights and knowledge from
data that can then be used to make better decisions and predictions.
Data mining techniques can be used for a variety of purposes, including fraud
detection, market segmentation, and customer churn prediction. Analytics, on
the other hand, entails analyzing and interpreting data using statistical and
mathematical techniques. Analytics can be used to spot trends, forecast future
outcomes, and test hypotheses. In business, analytics is frequently used to
inform decision-making, such as optimizing pricing strategies or improving
supply chain efficiency.
Data mining and analytics are inextricably linked because both involve
working with data and extracting insights from it. Many techniques, such as
clustering, classification, and regression, are also shared. Data mining, on the
other hand, focuses on identifying patterns and relationships, whereas
analytics focuses on analyzing and interpreting data to make informed
decisions. Data mining and analytics are both critical tools for businesses and
organizations looking to maximize the value of their data. Businesses can
gain valuable insights into customer behavior, market trends, and operational
efficiency by using these techniques, which can help them stay ahead of the
competition and make data-driven decisions.
As an example, suppose a business wants to improve customer retention by
identifying customers who are likely to cancel their subscriptions. They have
a large dataset with data on customer demographics, behaviour, and
transaction history. To use data mining techniques, the company could group
customers based on their behaviour and transaction history using clustering
algorithms. Customers who have made large purchases in the past are more
likely to renew their subscriptions, whereas customers who have recently
decreased their purchase activity are more likely to cancel.
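A minimal sketch of the clustering step, assuming the scikit-learn library is
available and using made-up behavioural features, might look like this:

    from sklearn.cluster import KMeans
    import numpy as np

    # Hypothetical customer features: [total spend, purchases in last 90 days]
    X = np.array([
        [5000, 12], [4200, 10], [4800, 11],   # heavy, active buyers
        [300, 1],   [150, 0],   [250, 1],     # low, declining activity
    ])

    # Group customers by behaviour; the clusters can then be compared
    # against observed cancellation rates to flag likely churners.
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(kmeans.labels_)           # e.g. [0 0 0 1 1 1]
    print(kmeans.cluster_centers_)

In practice the feature set would include many more behavioural and
transactional variables, and the cluster labels would feed a retention
campaign rather than a print statement.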
4.4 DATA GOVERNANCE AND SECURITY
Data governance and security are two essential elements of any organization's
data management strategy. Data governance refers to the processes and
policies that ensure data is managed and used effectively and efficiently,
whereas data security refers to the safeguards put in place to prevent
unauthorized access, disclosure, modification, or destruction of data. The
creation of policies, procedures, and standards for managing data throughout
its lifecycle is referred to as data governance. It includes data quality,
privacy, security, and compliance management, as well as overall data asset
management. Data governance aims to ensure that data is managed
effectively and efficiently and that it is used to support business goals.
• Data mining tools: Extract information from large data sets to identify
patterns, correlations, and trends. This data can be used to inform
business decisions and improve operations.
4.6 SUMMARY
This unit gives an overview of Business Intelligence (BI), which is the
process of gathering, analyzing, and transforming raw data into useful
information that businesses can use to make informed decisions. BI employs
a wide range of tools, technologies, and strategies to access and analyze data
from a variety of sources, including databases, spreadsheets, and data
repositories. This unit discusses the advantages of business intelligence, such
as its ability to provide insights into business operations, identify areas for
improvement, and enable data-driven decision-making, which can increase
revenue and profitability. Dashboards, reports, and data visualizations are
also highlighted as tools to assist decision-makers in interpreting complex
data and identifying patterns and trends.
The unit also discusses some common BI tools and technologies, such as data
warehouses, ETL (Extract, Transform, Load) tools, analytics software, and
data visualization platforms. It also discusses the significance of data quality,
data governance, and data security in business intelligence. Overall, this unit
provides a thorough overview of Business Intelligence and its significance in
modern business operations. It focuses on the key concepts, strategies, and
technologies involved in business intelligence and explains how they can be
used to gain a competitive advantage.
UNIT 5 INFORMATION AND DECISION MAKING
Objectives
After studying this unit, you will be able to:
• Understand how to manage information and make effective decisions in
real-time.
• Understand the decision-making process and decision-making models.
• Develop skills to assess the quality and relevance of information,
including identifying biases and evaluating sources of information.
Structure
5.1 Introduction to Information & Decision Making
5.2 The Decision-Making Process
5.3 Information Sources and Systems
5.4 Decision-Making Models and Tools
5.5 Summary
5.6 Self-Assessment Exercises
5.7 Further Readings
The Information & Decision-Making process can be applied in a variety of
contexts, ranging from personal decision-making to business strategy
development. It is a valuable skill set for both individuals and organizations
because it allows them to make informed decisions that can have a significant
impact on their success.
Individuals and organizations can now make better use of data and make
more informed decisions thanks to advances in technology. It has improved
data processing speed and accuracy, and it has made it easier for decision-
makers to collaborate and share information. As technology advances, we can
expect it to play an increasingly important role in information and decision-
making, assisting in driving innovation and improving outcomes in a variety
of fields.
Challenges:
• Information Overload: With so much available information, it can be
difficult to separate the relevant information from the noise.
Opportunities:
• Big Data: The availability of big data allows decision-makers to make
more informed decisions.
After identifying the problem, the next step is to gather information about it.
This is a critical step in the decision-making process because it provides the
necessary information and insights to make informed and effective decisions.
Gathering relevant data, analyzing it to identify patterns and trends, and using
that information to guide decision-making are all part of this step. In most
cases, several key components are involved in the process of gathering and
analyzing information:
• Choose the best option: Select the alternative that best meets the
established criteria and is most likely to achieve the desired outcomes
based on the assessment. This may entail consulting with stakeholders or
conducting additional research to confirm the decision.
• Evaluate the risks: Consider the risks associated with each option. What
are the possible negative outcomes of each option, and how likely are
they to occur?
• Analyze trade-offs: Think about any trade-offs you might need to make
between different options. One option, for example, may be less
expensive but take longer to implement, whereas another may be more
expensive but faster.
A key feature of a DSS is that it allows users to interact with the system in
real-time, allowing them to ask questions, change inputs, and see the results
of their decisions right away. This interactivity assists users in better
understanding the consequences of their decisions and making more informed
decisions. Depending on the specific problem being addressed, DSS can be
designed to use a variety of decision-making techniques, such as
optimization, simulation, and artificial intelligence. They may also include
data visualization tools, such as charts and graphs, to assist users in
comprehending the information presented.
• User interface: The user interface makes it simple to navigate the system.
The primary goal of the user interface of the decision support system is
to make it simple for the user to manipulate the data stored on it. The
interface can be used by businesses to assess the effectiveness of DSS
transactions for end users. Simple windows, complex menu-driven
interfaces, and command-line interfaces are all examples of DSS
interfaces.
Expert systems:
These are computer programs that use artificial intelligence to simulate a
human expert's decision-making abilities. Based on their knowledge and
experience, they can make recommendations and offer advice. Expert
systems are computer-based decision-making tools that solve complex
problems by applying knowledge and rules. These systems are intended to
mimic the decision-making abilities of a human expert in a particular domain.
An expert system captures and encodes a human expert's knowledge and
expertise in a computer program. This knowledge is typically represented by
rules and if-then statements, which the computer can use to reason and make
decisions.
• User interface: The expert system's user interface allows the user to
interact with it and provide input data.
Because they provide a consistent and reliable way to make decisions based
on expert knowledge, expert systems can be an effective tool in decision-
making models. They are especially useful when there is a large amount of
data to be analyzed and the decision-making process necessitates a thorough
understanding of a particular domain.
The AHP model functions by first identifying the decision criteria and then
evaluating the relative importance of each criterion. This is accomplished by
using pair-wise comparisons to compare the criteria to one another. In
pairwise comparisons, each criterion is compared to every other criterion, and
a score is assigned to indicate how much more important one criterion is than
the other. These scores are then used to rank the criteria in order of
importance. The AHP model is used to evaluate alternatives based on how
well they meet each criterion after the criteria have been prioritized. The
alternatives are ranked based on how well they perform on each criterion, and
the overall ranking is determined by a weighted sum of the rankings.
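To make the weighted-sum step concrete, here is a small worked illustration
with hypothetical numbers. Suppose the pairwise comparisons yield criterion
weights of 0.5 (cost), 0.3 (reliability), and 0.2 (environmental impact), and an
alternative scores 0.6, 0.8, and 0.9 on those criteria respectively. Its overall
score is then:
(0.5 × 0.6) + (0.3 × 0.8) + (0.2 × 0.9) = 0.30 + 0.24 + 0.18 = 0.72
The alternative with the highest overall score is ranked first.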
A key feature of the AHP model is that it incorporates both quantitative and
qualitative data into the decision-making process. This is especially useful
when decisions must be made based on several factors, such as financial
considerations, technical requirements, and stakeholder preferences. A real-
world application of AHP is the evaluation of potential renewable energy
sources. AHP could be used by a team to compare the criteria for various
renewable energy sources such as wind, solar, and hydropower. They would
then assess the relative importance of each criterion and the performance of
each energy source on those criteria. They could make a decision about
which renewable energy source to pursue based on the results.
Tableau: Users can use this data visualization software to create interactive
dashboards and charts to analyze data and gain insights into complex
business problems.
IBM SPSS Statistics: Researchers and data scientists use this statistical
analysis software to analyze data and make predictions using a variety of
statistical techniques.
Gurobi: Organizations use this optimization software to optimize complex
operations such as supply chain management, logistics, and scheduling.
FICO Decision Management Suite: This software suite is intended to assist
businesses in automating and optimizing decision-making processes
throughout the organization, utilizing data and analytics to make informed
decisions.
Palantir: To make sense of large and complex datasets, government agencies
and organizations in industries such as finance, healthcare, and energy use
this data integration and analytics platform.
SAP Business Objects: This business intelligence software helps
organizations make informed decisions by providing tools for data
visualization, reporting, and analytics.
These are only a few of the many decision-making tools and software options
available today. The key is to select the appropriate tool for the decision-
making problem at hand and to use it effectively to gain insights and make
sound decisions.
5.5 SUMMARY
The unit on Information and Decision Making discusses the various types of
information, decision-making models, and decision-making tools. It
emphasizes the significance of information in decision-making, emphasizing
the need for relevant and trustworthy data. The rational decision-making
model, bounded rationality model, intuitive decision-making model, and
political decision-making model are all discussed in this unit. Each model
takes a different approach to decision-making, and the model chosen is
determined by the decision context.
The unit also discusses decision-making tools like Expert Systems, Analytic
Hierarchy Process (AHP), and Multi-Criteria Decision Analysis (MCDA).
These decision-making tools provide a structured approach, allowing
decision-makers to evaluate multiple criteria and make informed choices. In
the end, the unit emphasizes the importance of effective information
management and decision-making for organizations to make sound, informed
decisions. It provides a decision-making framework, emphasizing the
importance of data, models, and tools in the decision-making process.
Questions:
1. What type of information do you need to make this decision?
2. Which decision-making model would you use to decide whether to
launch the drug or not?
3. What are the potential risks associated with launching the drug?
4. How can MCDA aid in making the decision to launch the drug?
5. How can expert systems be used to support the decision-making process
in this scenario?
UNIT 6 SPREADSHEET ANALYSIS
Objectives
After studying this unit, you will be able to:
• Learn how to navigate, input data, use formulas and functions, format
data, and use built-in features of spreadsheet software like Excel.
• Analyse and organise data and understand the essential functions and
formulas.
• Learn how to create charts, graphs, and other visualisations to help
communicate insights and trends in the data.
Structure
6.1 Introduction to Spreadsheet Analysis
6.2 Basic Functions and Formulas
6.3 Spreadsheet Design and Formatting
6.4 Pivot Tables and Data Analysis
6.5 Summary
6.6 Self-Assessment Exercises
6.7 Further Readings
• Formatting: Formatting tools such as font, colour, and cell borders can
be used to make data more readable and visually appealing. Formatting
can also be used to highlight important data or to differentiate between
different types of data.
• Functions and formulas: Functions and formulas are used to perform
calculations on data in the spreadsheet. Common functions include SUM,
AVERAGE, and COUNT, while formulas use operators such as +, -, *,
and / to perform more complex calculations.
• Pivot tables: Pivot tables are a powerful tool used to summarise and
analyse large amounts of data. Pivot tables allow users to group, filter,
and analyse data in various ways, making it easier to identify trends and
patterns.
• Data validation: Data validation is used to ensure that data entered into
the spreadsheet meets certain criteria, such as a specific format or range
of values. This helps prevent errors and ensures that the data is accurate.
• Charts and graphs: Charts and graphs are used to visualise data and
make it easier to interpret. Common types of charts include bar charts,
line charts, and pie charts.
• Conditional formatting: Conditional formatting allows users to format
cells based on specific criteria. For example, cells can be formatted to
turn red if the value is below a certain threshold, or to turn green if the
value is above a certain threshold.
Spreadsheet analysis is a powerful tool for organising and analysing data, and
is widely used in many fields. With the right tools and techniques, users can
gain valuable insights into their data and make informed decisions based on
the results.
These tools are just a few examples of the many spreadsheet analysis tools
available. By using these tools and others, users can gain valuable insights
into their data and make more informed decisions based on the results.
Spreadsheet Analysis Use Cases:
Spreadsheet analysis can be used in a wide variety of contexts and industries.
Here are some examples of use cases:
• Budgeting and forecasting in finance and accounting
• Financial statement analysis
• Inventory management and project tracking
• Sales tracking and marketing analysis
• Human resources planning and data analysis
These are just a few examples of the many ways that spreadsheet analysis can
be used to analyse and interpret data in various industries and contexts.
Excel Functions: A formula is a mathematical expression that computes the
value of a cell. Functions are predefined formulas that are already in Excel.
Functions carry out specific calculations in a specific order based on the
values specified as arguments or parameters. For example, =SUM(A1:A10).
This function adds up all the values in cells A1 through A10.
In more recent versions of Excel, a horizontal menu allows you to find and
insert Excel formulas into specific cells of your spreadsheet. On the
Formulas tab, you can find all available Excel functions in the Function
Library.
The more you use Excel formulas, the easier it will be to remember and
perform them manually. Excel has over 400 functions, and the number is
increasing from version to version. Formulas can be inserted into Excel
using the following methods:
1. Simple insertion of the formula - Typing a formula in the cell:
Typing a formula into a cell or the formula bar is the simplest way to insert
basic Excel formulas. Typically, the process begins with typing an equal sign
followed by the name of an Excel function. Excel is quite intelligent in that it
displays a pop-up function hint when you begin typing the name of the
function.
2. Using the Insert Function option on the Formulas Tab:
If you want complete control over your function insertion, use the Excel
Insert Function dialogue box. To do so, go to the Formulas tab and select the
first menu, Insert Function. All the functions will be available in the dialogue
box.
3. Choosing a Formula from One of the Formula Groups in the Formula Tab:
This option is for those who want to quickly dive into their favourite
functions. Navigate to the Formulas tab and select your preferred group to
access this menu. Click to reveal a sub-menu containing a list of functions.
You can then choose your preference. If your preferred group isn’t on the tab,
click the More Functions option — it’s most likely hidden there.
4. Use Recently Used Tabs for Quick Insertion:
If retyping your most recent formula becomes tedious, use the Recently Used
menu. It’s on the Formulas tab, the third menu option after AutoSum.
2. SUBTRACTION:
To use the subtraction formula in Excel, enter the cells you want to subtract
in the format =SUM(A3, -B3). This will subtract a cell from the SUM
formula by appending a negative sign before the cell being subtracted.
For example, if A3 was 300 and B3 was 225, =SUM(A3, -B3) would perform
300 + (-225), returning a value of 75.
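Equivalently, you can use the subtraction operator directly: with the same
values, =A3-B3 would also return 75.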
3. MULTIPLICATION:
In Excel, enter the cells to be multiplied in the format =A3*B3 to perform the
multiplication formula. An asterisk is used in this formula to multiply cell A3
by cell B3.
For example, if A3 was 300 and B3 was 225, =A3*B3 would return a value
of 67500.
4. DIVISION:
To use the division formula in Excel, enter the dividing cells in the format
=A3/B3. This formula divides cell A3 by cell B3 with a forward slash, “/.”
For example, if A3 was 300 and B3 was 225, =A3/B3 would return a decimal
value of 1.333333333.
6. IF formula:
In Excel, the IF formula is denoted as =IF(logical test, value if true, value if
false). This lets you enter a text value into a cell “if” something else in your
spreadsheet is true or false.
For example, you may need to know which values in column A are greater
than three. Using the =IF formula, you can quickly have Excel auto-populate
a "yes" for each cell with a value greater than 3 and a "no" for each cell with
a value of 3 or less.
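As a minimal sketch (the exact cell layout is assumed for illustration), the
formula for row 2 would be =IF(A2>3, "yes", "no"); copying it down column
B applies the same test to every row.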
7. PERCENTAGE:
To use the percentage formula in Excel, enter the cells you want to calculate
the percentage for in the format =A1/B1. To convert the decimal value to a
percentage, select the cell, click the Home tab, and then select “Percentage”
from the numbers dropdown. There is no specific Excel “formula” for
percentages, but Excel makes it simple to convert the value of any cell into a
percentage so you don’t have to calculate and reenter the numbers yourself.
The basic setting for converting a cell's value to a percentage is found on the
Home tab of Excel. Select this tab, highlight the cell(s) you want to convert
to a percentage, and then open the Number Format dropdown menu (this
menu button might say "General" at first). Then, from the list of options that
appears, choose "Percentage." This will convert the value of each highlighted
cell into a percentage.
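For a concrete (hypothetical) illustration: if A1 contains 50 and B1 contains
200, =A1/B1 returns 0.25, which displays as 25% once the cell is formatted
as a percentage.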
8. CONCATENATE:
CONCATENATE is a useful formula that combines values from multiple
cells into the same cell. For example, =CONCATENATE(A3,B3) will
combine Red and Apple to produce RedApple.
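To insert a separator between the combined values, pass it as an additional
argument. With the same example cells, =CONCATENATE(A3," ",B3)
produces "Red Apple"; the equivalent & operator form, =A3&" "&B3, gives
the same result.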
9. DATE:
The Excel DATE formula is denoted =DATE(year, month, day). This formula
will return a date corresponding to the values entered in the parentheses,
including values referred to from other cells. For example, if A2 was 2019,
B2 was 8, and C2 was 15, =DATE(A2,B2,C2) would return 15-08-2019.
10. TRIM:
The TRIM formula in Excel is denoted =TRIM(text). This formula will
remove any spaces that have been entered before and after the text in the cell.
For example, if A2 includes the name " Virat Kohli" with unwanted spaces
before the first name, =TRIM(A2) would return "Virat Kohli" without the
extra spaces in a new cell.
11. LEN:
LEN is the function to use when you want to know the number of characters
in a specific cell. =LEN(text) is the formula for this. Please keep in mind that
the LEN function in Excel counts all characters, including spaces. For
example, =LEN(A2) returns the total number of characters in cell A2, spaces
included.
Apply cell borders:
1. Select the cell or range of cells to which you want to add a border.
2. On the Home tab, in the Font group, click the arrow next to the Borders
button, and then click the border style that you want.
The Borders button displays the most recently used border style. You can
click the Borders button (not the arrow) to apply that style.
Change text color and alignment:
1. Select the cell or range of cells that contain (or will contain) the text that
you want to format. You can also select one or more portions of the text
within a cell and apply different text colors to those sections.
2. To change the color of text in the selected cells, on the Home tab, in
the Font group, click the arrow next to Font Color, and then
under Theme Colors or Standard Colors, click the color that you want to
use.
To apply a color other than the available theme colors and standard
colors, click More Colors, and then define the color that you want to use
on the Standard tab or Custom tab of the Colors dialog box.
3. To change the alignment of the text in the selected cells, on
the Home tab, in the Alignment group, click the alignment option that
you want.
For example, to change the horizontal alignment of cell contents, click Align
Text Left, Center, or Align Text Right.
Add or change the background color of cells:
1. Select the cells that you want to format, then click Home > arrow next
to Fill Color.
2. Under Theme Colors or Standard Colors, pick the color you want.
To use a custom color, click More Colors, and then in the Colors dialog box
select the color you want. To apply the most recently selected color, you can
just click Fill Color. You'll also find up to 10 most recently selected custom
colors under Recent Colors.
To apply a pattern instead, open the Format Cells dialog box (on the Home
tab, click the dialog box launcher in the Font group), and then:
3. On the Fill tab, under Background Color, pick the color you want.
4. To use a pattern with two colors, pick a color in the Pattern Color box,
and then pick a pattern in the Pattern Style box.
To use a pattern with special effects, click Fill Effects, and then pick the
options you want.
Remove cell colors, patterns, or fill effects:
To remove any background colors, patterns, or fill effects from cells, just
select the cells. Then click Home > arrow next to Fill Color, and then
pick No Fill.
If print options are set to Black and white or Draft quality — either on
purpose, or because the workbook has large or complex worksheets and
charts that caused draft mode to be turned on automatically — cells won't
print in color. Here's how you can fix that:
1. Click Page Layout > Page Setup dialog box launcher.
2. On the Sheet tab, under Print, uncheck the Black and white and Draft
quality check boxes.
Apply a colour scale with conditional formatting:
1. Select the range of cells you want to format.
2. On the Home tab, click Conditional Formatting in the Styles group.
3. Point to Colour Scales. The color on the top of each icon will apply to
the highest values.
4. Click on the "Green - Yellow - Red Colour Scale" icon.
Now, the Speed value cells will have a colored background highlighting:
Dark green is used for the highest values, and dark red for the lowest values.
Charizard has the highest Speed value (100) and Squirtle has the lowest
Speed value (43). All the cells in the range gradually change color from
green, yellow, orange, then red.
Aligning Columns and Rows: To align columns or rows in Excel, select the
cells you want to align, then click the "Align Text" button in the "Home" tab
of the ribbon. From there, you can choose to left, center, or right align text
and data.
If you’d like to realign text in a cell to enhance the visual presentation of your
data, here’s how you can do it:
1. Select the cells that have the text you want aligned.
2. On the Home tab choose one of the following alignment options:
3. To vertically align text, pick Top Align, Middle Align, or Bottom Align.
4. To horizontally align text, pick Align Text Left, Center, or Align Text
Right.
5. When you have a long line of text, part of the text might not be visible.
To fix this without changing the column width, click Wrap Text.
6. To center text spanning several columns or rows, click Merge & Center.
Creating charts and graphs: To create a chart or graph in Excel, select the
data you want to chart, then
click the "Insert" tab on the ribbon. From there, you can choose the type of
chart or graph you want to create, such as a bar chart, line graph, or pie chart.
You can then use the formatting tools in the "Chart Tools" tab to customise
the appearance of your chart or graph.
• Bar Charts: The main difference between bar charts and column charts
is that the bars are horizontal instead of vertical. You can often use bar
charts interchangeably with column charts, although some prefer column
charts when working with negative values because it is easier to visualise
negatives vertically, on a y-axis.
• Line Charts: A line chart is most useful for showing trends over time,
rather than static data points. The lines connect each data point so that
you can see how the value(s) increased or decreased over a period of
time. The seven line chart options are line, stacked line, 100% stacked
line, line with markers, stacked line with markers, 100% stacked line
with markers, and 3-D line.
• Scatter Charts: Similar to line graphs, because they are useful for
showing change in variables over time, scatter charts are used
specifically to show how one variable affects another. (This is called
correlation.) Note that bubble charts, a popular chart type, are categorised
under scatter. There are seven scatter chart options: scatter, scatter with
smooth lines and markers, scatter with smooth lines, scatter with straight
lines and markers, scatter with straight lines, bubble, and 3-D bubble.
There are also four minor categories. These charts are more use case-specific:
• Area: Like line charts, area charts show changes in values over time.
However, because the area beneath each line is solid, area charts are
useful to call attention to the differences in change among multiple
variables. There are six area charts: area, stacked area, 100% stacked
area, 3-D area, 3-D stacked area, and 3-D 100% stacked area.
• Stock: Traditionally used to display the high, low, and closing price of
stock, this type of chart is used in financial analysis and by investors.
However, you can use them for any scenario if you want to display the
range of a value (or the range of its predicted value) and its exact value.
Choose from high-low-close, open-high-low-close, volume-high-low-
close, and volume-open-high-low-close stock chart options.
How to Chart Data in Excel:
To generate a chart or graph in Excel, you must first provide the program
with the data you want to display. Follow the steps below to learn how to
chart data in Excel 2016.
We’ll use this chart for the rest of the walkthrough.
There are two tabs on the toolbar that you will use to make adjustments to
your chart: Chart Design and Format. Excel automatically applies design,
layout, and format presets to charts and graphs, but you can add
customisation by exploring the tabs. Next, we’ll walk you through all the
available adjustments in Chart Design.
To Display or Hide Axes:
1. Select Axes. Excel will automatically pull the column and row headers
from your selected cell range to display both horizontal and vertical axes
on your chart (Under Axes, there is a check mark next to Primary
Horizontal and Primary Vertical.)
2. Uncheck these options to remove the display axis on your chart. In this
example, clicking Primary Horizontal will remove the year labels on the
horizontal axis of your chart.
3. Click More Axis Options… from the Axes dropdown menu to open a
window with additional formatting and text options such as adding tick
marks, labels, or numbers, or to change text color and size.
To Add Axis Titles:
1. Click Add Chart Element and click Axis Titles from the dropdown menu.
Excel will not automatically add axis titles to your chart; therefore,
both Primary Horizontal and Primary Vertical will be unchecked.
To Add Data Labels:
1. Click Add Chart Element and click Data Labels. There are six options
for data labels: None (default), Center, Inside End, Inside Base, Outside
End, and More Data Label Options.
2. The four placement options will add specific labels to each data point
measured in your chart. Click the option you want. This customisation
can be helpful if you have a small amount of precise data, or if you have
a lot of extra space in your chart. For a clustered column chart, however,
adding data labels will likely look too cluttered. For example, here is
what selecting Center data labels looks like:
To Add a Data Table:
1. Click Add Chart Element and click Data Table. There are three options:
• None is the default setting, where the data table is not duplicated within
the chart.
• With Legend Keys displays the data table beneath the chart to show the
data range. The color-coded legend will also be included.
• No Legend Keys also displays the data table beneath the chart, but
without the legend.
Note: If you choose to include a data table, you’ll probably want to make
your chart larger to accommodate the table. Simply click the corner of your
chart and use drag-and-drop to resize your chart.
To Add Error Bars:
1. Click Add Chart Element and click Error Bars. In addition to More Error
Bars Options, the options include None (default), Standard Error,
Percentage, and Standard Deviation.
2. For example, when we click Standard Error from the options, we get a
chart that looks like the image below.
To Add Gridlines:
1. Click Add Chart Element and click Gridlines. In addition to More Grid
Line Options, there are four options: Primary Major Horizontal, Primary
Major Vertical, Primary Minor Horizontal, and Primary Minor Vertical.
For a column chart, Excel will add Primary Major Horizontal gridlines
by default.
2. You can select as many different gridlines as you want by clicking the
options. For example, here is what our chart looks like when we click all
four gridline options.
To Add a Legend:
1. Click Add Chart Element and click Legend. In addition to More Legend
Options, there are five options for legend placement: None, Right, Top,
Left, and Bottom.
2. Legend placement will depend on the style and format of your chart.
Check the option that looks best on your chart. Here is our chart when
we click the Right legend placement.
To Add Lines: Lines are not available for clustered column charts. However,
in other chart types where you only compare two variables, you can add lines
(e.g. target, average, reference, etc.) to your chart by checking the appropriate
option.
To Add a Trendline:
1. Click Add Chart Element and click Trendline. In addition to More
Trendline Options, there are five options: None (default), Linear,
Exponential, Linear Forecast, and Moving Average. Check the
appropriate option for your data set. In this example, we will
click Linear.
2. Because we are comparing five different products over time, Excel
creates a trendline for each individual product. To create a linear
trendline for Product A, click Product A and click the blue OK button.
3. The chart will now display a dotted trendline to represent the linear
progression of Product A. Note that Excel has also added Linear
(Product A) to the legend.
Note: You can create separate trendlines for as many variables in your chart
as you like. For example, here is our chart with trendlines for Product A and
Product C.
To Add Up/Down Bars: Up/Down Bars are not available for a column chart,
but you can use them in a line chart to show increases and decreases among
data points.
Step 4: Adjust Quick Layout
1. The second dropdown menu on the toolbar is Quick Layout, which
allows you to quickly change the layout of elements in your chart (titles,
legend, clusters etc.).
2. There are 11 quick layout options. Hover your cursor over the different
options for an explanation and click the one you want to apply.
In this example, switching the row and column swaps the product and year
(profit remains on the y-axis). The chart is now clustered by product (not
year), and the color-coded legend refers to the year (not product). To avoid
confusion here, click on the legend and change the titles from Series to Years.
2. A window will open. Type the cell range you want and click
the OK button. The chart will automatically update to reflect this new
data range.
Step 9: Change Chart Type
1. Click the Change Chart Type dropdown menu.
2. Here you can change your chart type to any of the nine chart categories
that Excel offers. Of course, make sure that your data is appropriate for
the chart type you choose.
2. A dialogue box appears where you can choose where to place your chart.
You can either create a new sheet with this chart (New sheet) or place
this chart as an object in another sheet (Object in). Click the blue OK
button.
2. Click the dropdown menu on the top left side of the toolbar and click the
chart element you are editing.
By using these basic techniques, you can improve the design and formatting
of your spreadsheets and make them easier to read and understand.
Data analysis, on the other hand, is the process of examining and interpreting
data to identify patterns, trends, and insights. It is used to gain a deeper
understanding of data and to make informed decisions based on the results.
Pivot tables are a powerful tool for data analysis, as they allow users to
quickly and easily summarise and manipulate large amounts of data. By
using pivot tables in combination with other data analysis techniques, such as
charts, graphs, and statistical analysis, users can gain valuable insights into
their data and make informed decisions based on the results.
Consider the following table of sales data. From this data, you might have to
summarise total sales region wise, month wise, or salesperson wise. The easy
way to handle these tasks is to create a PivotTable that you can dynamically
modify to summarise the results the way you want.
Creating PivotTable
To create PivotTables, ensure the first row has headers.
• Click the table.
• Click the INSERT tab on the Ribbon.
• Click PivotTable in the Tables group. The PivotTable dialog box
appears.
As you can see in the dialog box, you can use either a Table or Range from
the current workbook or use an external data source.
• In the Table / Range Box, type the table name.
• Click New Worksheet to tell Excel where to keep the PivotTable.
• Click OK.
Recommended PivotTables
In case you are new to PivotTables or you do not know which fields to select
from the data, you can use the Recommended PivotTables that Excel
provides.
• Click the data table.
• Click the INSERT tab.
• Click on Recommended PivotTables in the Tables group. The
Recommended PivotTables dialog box appears.
Click OK. The selected PivotTable appears on a new worksheet. You can
observe the PivotTable fields that were selected in the PivotTable fields list.
PivotTable Fields
The headers in your data table will appear as the fields in the PivotTable.
You can select / deselect them to instantly change your PivotTable to display
only the information you want and in a way that you want. For example, if
you want to display the account information instead of order amount
information, deselect Order Amount and select Account.
PivotTable Areas
You can even change the Layout of your PivotTable instantly. You can use
the PivotTable Areas to accomplish this.
An instant update helps you to play around with the different Layouts and
pick the one that suits your report requirement.
You can just drag the fields across these areas and observe the PivotTable
layout as you do it.
In the PivotTable Areas, in Rows, click Region and drag it below Salesperson
so that it looks as follows −
Note − You can clearly observe that the layout with the nesting order –
Region and then Salesperson yields a better and more compact report than
the one with the nesting order – Salesperson and then Region. In case
Salesperson represents more than one area and you need to summarise the
sales by Salesperson, then the second layout would be a better option.
Filters
You can assign a Filter to one of the fields so that you can dynamically
change the PivotTable based on the values of that field.
The filter with the label Region appears above the PivotTable (in case you
do not have empty rows above your PivotTable, the PivotTable gets pushed
down to make space for the Filter).
• Check the option Select Multiple Items. Check boxes appear for all the
values.
• Select South and West and deselect the other values and click OK.
The data pertaining to South and West Regions only will be summarised as
shown in the screen shot given below −
You can see that next to the Filter Region, Multiple Items is displayed,
indicating that you have selected more than one item. However, how many
items and / or which items are selected is not known from the report that is
displayed. In such a case, using Slicers is a better option for filtering.
Slicers
You can use Slicers to have better clarity on which items the data was
filtered.
• Click ANALYSE under PIVOTTABLE TOOLS on the Ribbon.
• Click Insert Slicer in the Filter group. The Insert Slicers box appears. It
contains all the fields from your data.
• Select the fields Region and month. Click OK.
Slicers for each of the selected fields appear with all the values selected by
default. Slicer Tools appear on the Ribbon to work on the Slicer settings,
look and feel.
Summarising Values by other Calculations
In the examples so far, you have seen summarising values by Sum. However,
you can use other calculations also if necessary.
In the PivotTable Fields List:
• Select the Field Account.
• Unselect the Field Order Amount.
• In the Values area, click the arrow next to the Account field and choose
Value Field Settings.
The Value Field Settings box appears. Several types of calculations appear as
a list under Summarise value field by −
• Select Count in the list.
• The Custom Name automatically changes to Count of Account. Click
OK.
PivotTable Tools
Follow the steps given below to learn to use the PivotTable Tools.
• Select the PivotTable.
The following PivotTable Tools appear on the Ribbon −
• ANALYSE
• DESIGN
Some of the ANALYSE Ribbon commands are −
• Set PivotTable Options
• Value Field Settings for the selected Field
• Expand Field
• Collapse Field
• Insert Slicer
• Insert Timeline
• Refresh Data
• Change Data Source
• Move PivotTable
• Solve Order (If there are more calculations)
• PivotChart
Some of the DESIGN Ribbon commands are −
• PivotTable Layout
o Options for Sub Totals
o Options for Grand Totals
o Report Layout Forms
o Options for Blank Rows
• PivotTable Style Options
• PivotTable Styles
6.5 SUMMARY
Spreadsheet analysis refers to the process of using electronic spreadsheets,
such as Microsoft Excel or Google Sheets, to organise, manipulate, and
analyse data. It involves creating formulas and functions to perform
calculations and automate tasks, formatting data to make it more readable,
and using charts and graphs to visualise and communicate insights.
Spreadsheet analysis can be used for a wide range of applications, from
budgeting and financial analysis to inventory management and project
tracking. By leveraging the power of spreadsheets, analysts can save time,
reduce errors, and gain valuable insights from their data.
Spreadsheet analysis can be used in a variety of contexts, such as finance and
accounting, sales and marketing, human resources, project management, and
more. Some common use cases include budgeting, forecasting, financial
statement analysis, inventory management, and data analysis. To get the most
out of spreadsheet analysis, it's important to follow best practices such as
organising data in a logical and consistent manner, using descriptive labels
and formulas, keeping formulas simple and transparent, and testing and
validating calculations to ensure accuracy.
6.6 SELF-ASSESSMENT EXERCISES
Caselet:
You have been tasked with creating a budget spreadsheet for your household
expenses. Your goal is to create a spreadsheet that can track your income and
expenses, calculate your monthly savings, and provide a summary of your
spending by category.
Questions:
• Create a new Excel spreadsheet and, in the first row, label the columns
"Month", "Income", and "Expenses" respectively.
• Under the "Income" column, list your sources of income for the month
(such as salary, freelance work, or rental income).
• Under the "Expenses" column, list your expenses for the month,
including categories such as housing, food, transportation, and
entertainment.
• Use Excel's SUM function to calculate the total income and expenses for
the month.
• Create a formula to calculate your monthly savings by subtracting your
total expenses from your total income.
• Use conditional formatting to highlight any cells that are over budget or
below a certain threshold (such as a minimum savings amount).
• Create a pie chart or bar chart to visualise your spending by category.
• Use data validation to create a drop-down list of categories for your
expenses.
• Save the spreadsheet and update it each month to track your progress and
make adjustments as needed.
BLOCK 3
RELATIONAL DATABASE
MANAGEMENT SYSTEM (RDBMS)
UNIT 7 ORGANIZING DATA
Objectives
After studying this unit, you will be able to:
• Define types of data
• Describe the processing of data
• Demonstrate and visualize data using graphs, and
• Interpret data for decision making
Structure
7.0 Introduction
7.1 Types of Data
7.1.1 Quantitative Data
7.1.2 Qualitative Data
7.1.3 Nominal Data
7.1.4 Ordinal Data
7.1.5 Interval Data
7.1.6 Discrete Data
7.1.7 Continuous Data
7.2 Data Processing
7.2.1 Coding of Data
7.2.2 Data Presentation for Clearer Reference
7.3 Summary
7.4 Keywords
7.5 Exercises
7.0 INTRODUCTION
In this unit we deal with data and its types. Data organization is the way to
arrange raw data in an understandable order. Graphical representation,
classification, and arrangement of data are all part of organizing data. Data
organization makes data easier to read and work with. Organized data helps
determine the causes of problems in the organization. Data is knowledge in
today’s world: good data helps in making informed decisions in the
organization. Data makes what is happening visible, and organized data
increases efficiency.
Organizing data is essential for making informed decisions in an
organization. By arranging and classifying data in a logical and
understandable way, it becomes easier to analyze and interpret.
representation of data also helps in understanding complex patterns and
trends, and can be an effective way to communicate findings to others.
Additionally, having organized data can increase efficiency by reducing the
time and effort needed to find and extract relevant information.
The aim of this unit is to teach you about the basics of data analysis and
visualization, including understanding the nature of data, processing raw data
for presentation in graphical form, and classifying data to make informed
decisions in an organization. Understanding the nature of data is important in
order to properly interpret and analyze it. This includes understanding the
different types of data (e.g. numerical, categorical, ordinal etc.) and the
various measures used to describe and summarize it (e.g. mean, median,
mode, standard deviation etc.).
Processing raw data involves transforming it into a format that is more easily
analyzed and visualized, such as using spreadsheet software to organize and
prepare summary statistics. Graphical presentation of data is an effective way
to visually communicate patterns and relationships within the data, using
charts such as histograms, box and whisker plots, and funnel charts, to name
a few.
7.2.1 Coding of Data
Coding of data refers to the process of transforming collected data or
observations to a set of meaningful, cohesive categories. It is a process of
summarizing and re-presenting data in order to provide a systematic account
of the recorded or observed phenomenon. Coding is the analytic task of
assigning codes to non-numeric data. The data that is obtained from surveys,
experiments or secondary sources are in raw form. This data needs to be
refined and organized to evaluate and draw conclusions.
Data coding is the process of deriving codes from the observed data. In
research, the data is obtained from observations, interviews, or
questionnaires. The purpose of data coding is to bring out the essence and
meaning of the data that respondents have provided. The data coder extracts
preliminary codes from the observed data; the preliminary codes are further
filtered and refined to obtain more accurate, precise, and concise codes.
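As a small illustration (the variables and code values here are hypothetical),
a coding scheme for survey responses might look like this:
Gender: 1 = Male, 2 = Female, 3 = Other
Qualification: 1 = High school, 2 = Graduate, 3 = Postgraduate
Income (monthly): 1 = below 25,000; 2 = 25,000 to 50,000; 3 = above 50,000
A respondent’s answers can then be recorded as a compact row of codes,
such as (2, 3, 1), which is far easier to tabulate and graph than the original
free-text responses.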
After the coding of data, the next step involves visualization of the data
which is a complex process.
7.2.2 Data Presentation for Clearer Reference
Data without a definite presentation is burdensome. Presentation of data
helps the researcher make the study meaningful. In this context, we study
types of presentation of data in this sub-section.
• Column Chart
A column chart is a type of chart that is commonly used to display data that is
arranged in columns or rows on a worksheet. In a column chart, categories
are typically displayed along the horizontal (category) axis, while values are
displayed along the vertical (value) axis. Each column in the chart represents
a different category, and the height of each column represents the value
associated with that category.
Column charts are useful for displaying data that can be divided into discrete
categories, such as sales by month, or the number of students in different
grade levels. They are also effective at showing changes in data over time,
such as changes in revenue from year to year. Additionally, column charts
can be easily customized to display different colors, labels, and formatting
options, making them a flexible and versatile tool for data visualization.
• Histogram
A histogram is a graphical representation of a frequency distribution. It
consists of a series of vertical bars, or bins, that represent the frequency of
occurrence of a range of values. The width of each bin is determined by the
range of values being analyzed, and the height of each bin represents the
frequency of occurrence of values within that range. For example, the
grouped frequency of pens could be written on the X-axis and the number of
students marked on the Y-axis, with the data presented in the form of bars.
Overall, histograms are a useful tool for exploring and visualizing data, and
can provide valuable insights into the underlying patterns and trends within a
distribution.
• Frequency Polygon
When you join the midpoints of the upper sides of the rectangles in a
histogram, you get a frequency polygon.
• Line Chart
A line chart is another type of chart commonly used to display data that is
arranged in columns or rows on a worksheet. In a line chart, the horizontal
axis represents evenly spaced categories, such as time periods, and the
vertical axis represents values. Each value is connected by a line, creating a
continuous visual representation of the data over time.
Line charts are useful for displaying trends in data over time or for comparing
multiple sets of data. They are effective at showing changes in data over time,
as well as identifying patterns or cycles in the data. Line charts can also be
used to compare multiple data sets, with each set represented by a different
line. Additionally, line charts can be easily customized with colors, labels,
and other formatting options to enhance their visual impact.
• Pie Chart
The pie chart is a type of chart that is commonly used to display data that is
arranged in one column or row on a worksheet. In a pie chart, the data points
are represented as slices of a circle, with each slice representing a proportion
of the whole. The size of each slice is proportional to the value of the data
point it represents, and the total size of the pie chart represents the sum of all
the values in the data series.
Pie charts are useful for displaying data that can be divided into categories or
parts, such as market share, budget allocations, or survey responses. They are
effective at showing the relative sizes of different categories and can be
easily customized with colours, labels, and other formatting options to
enhance their visual impact. However, pie charts can be difficult to read
when there are too many categories or when the data points are very similar
in size, making it hard to distinguish between them. In these cases, other
types of charts, such as bar charts or stacked column charts, may be more
suitable.
Pie of pie and bar of pie: Pie of pie or bar of pie charts show pie charts with
smaller values pulled out into a secondary pie or stacked bar chart, which
makes them easier to distinguish.
• Doughnut Chart
A doughnut chart is similar to a pie chart in that it also shows the relationship
of parts to a whole, but it can contain more than one data series. In a
doughnut chart, data points are represented as slices of a doughnut-shaped
circle, with each slice representing a proportion of the whole. The size of
each slice is proportional to the value of the data point it represents, and the
total size of the doughnut chart represents the sum of all the values in the data
series.
Doughnut charts are useful for displaying data that can be divided into
categories or parts, and can be effective at showing the relative sizes of
different categories or parts. They are also useful for comparing multiple data
series, with each series represented by a different doughnut slice. Doughnut
charts can be easily customized with colors, labels, and other formatting
options to enhance their visual impact. However, like pie charts, doughnut
charts can be difficult to read when there are too many categories or when the
data points are very similar in size.
• Bar Chart
A bar chart is a type of chart commonly used to display data that is arranged
in columns or rows on a worksheet. In a bar chart, categories are displayed
along the horizontal (category) axis, while values are displayed along the
vertical (value) axis. Each bar in the chart represents a different category, and
the length of each bar represents the value associated with that category.
Bar charts are useful for illustrating comparisons among individual items,
such as sales by product or revenue by department. They are also effective at
showing changes in data over time, such as changes in market share from
year to year. Additionally, bar charts can be easily customized with colors,
labels, and other formatting options to enhance their visual impact.
• Area Chart
Area charts are useful for plotting change over time, and for drawing
attention to the total value across a trend. By showing the sum of the plotted
values, an area chart also shows the relationship of parts to a whole.
Additionally, area charts can be easily customized with colors, labels, and
other formatting options to enhance their visual impact. However, like other
charts, they can be difficult to read when there are too many data points, or
when the data points are very similar in size.
• Bubble Chart
Bubble charts are useful for displaying data that has three dimensions, such
as sales revenue, profit, and market share for different products. They can be
effective at showing patterns or relationships among different sets of data
points, and they can be customized with colors, labels, and other formatting
options to enhance their visual impact. However, like other charts, bubble
charts can be difficult to read when there are too many data points, or when
the data points are very similar in size.
• Radar Chart
Radar charts are useful for displaying data that has multiple variables, and for
showing the relative strengths or weaknesses of different data series. They
can be effective at highlighting patterns or relationships among the data, and
they can be customized with colors, labels, and other formatting options to
enhance their visual impact.
However, radar charts can be difficult to read when there are too many
variables, or when the data points are very similar in value. In addition,
interpreting the data in a radar chart can be challenging for people who are
not familiar with this type of chart, so it is important to use them
appropriately and provide clear explanations of the data being presented.
The "whiskers" in the plot extend vertically from the top and bottom of the
box, indicating the range of values that lie within 1.5 times the IQR above the
third quartile and below the first quartile. Any point outside those whiskers is
considered an outlier. Box and whisker plots are useful when comparing
multiple datasets, as they allow you to quickly compare the medians, ranges,
and variability of the data. They can also help identify potential outliers in the
data. However, they are less useful for visualizing the shape of the
distribution compared to other chart types, such as histograms or density
plots.
Overall, box and whisker plots are a valuable tool for data analysis and
visualization, particularly when comparing multiple datasets.
• Funnel Chart
Funnel charts are a type of chart used to represent values across multiple
stages in a process, such as a sales or marketing funnel. The bars in a funnel
chart are arranged in decreasing order, with the first bar being the largest and
the subsequent bars becoming progressively smaller. This creates a funnel-
like shape, with the largest section at the top and the smallest section at the
bottom.
Check Your Progress:
1) What is a bar graph? Create a bar graph for the following data:
   A    B    C    D
   4    12   10   2
7.3 SUMMARY
In this unit, we discussed the nature of quantitative and qualitative data and
the various methods of representing the quantified data graphically. The main
points are as follows:
1) Data collected by the researcher are raw in nature; they require cleaning,
classification, and editing before use in the decision-making process.
2) Data are classified into seven categories, viz. Quantitative, Qualitative,
Nominal, Ordinal, Interval, Discrete, and Continuous Data.
3) Data is represented broadly in three forms: Textual, Tabular, and
Graphical.
4) Data coding helps the researcher to prepare tables and graphs.
5) Qualitative data consist of detailed descriptions of situations, events,
people, interactions, and observed behaviors. These data are also
available in the form of direct quotations from people about their
experiences, attitudes, beliefs, and thoughts.
7.4 KEYWORDS
Quantitative Data: Quantitative data is anything that can be counted in
definite units and numbers. Quantitative data is made up of numerical values
and has numerical properties, and can easily undergo math operations like
addition and subtraction.
Qualitative Data: Qualitative data can’t be expressed as a number and can’t
be measured. Qualitative data consist of words, pictures, and symbols, not
numbers. Qualitative data is also called categorical data because the
information can be sorted by category, not by number.
Nominal Data: Nominal data is used just for labeling variables, without any
type of quantitative value.
Ordinal Data: Ordinal data shows where a number is in order. This is the
crucial difference from nominal types of data.
Funnel Charts: These are a type of chart used to represent values across
multiple stages in a process, such as a sales or marketing funnel. The bars in a
funnel chart are arranged in decreasing order, with the first bar being the
largest and the subsequent bars becoming progressively smaller.
7.5 EXERCISES
1) Explain the types of data. How does quantitative data differ from
qualitative data?
2) Create a coding table which contains Gender, Age, Qualification, and
Income
3) Prepare a bar chart for the following data to check your skill:
   8    6    10    4
4) Prepare a suitable chart for the following data:
   Male    Female
   55      45
5) The frequency polygon of a frequency distribution is shown below.
i) What is the frequency of the class interval whose class mark is 15?
ii) What is the class interval whose class mark is 45?
UNIT 8 STRUCTURED QUERY LANGUAGE (SQL)
Objectives
After studying this unit, you will be able to:
• Explain relational database language.
• Create, modify, delete, and update the database using SQL.
• Understand Database management through queries and subqueries.
• Know how to control database access.
Structure
8.0 Introduction
8.0.1 Background
8.1 Data Definition Language (DDL)
8.2 Interactive Data Manipulation Language (DML)
8.3 View Definition
8.4 Transaction Control
8.5 Summary
8.6 Keywords
8.7 Self-Assessment Exercises
8.8 Further Readings
8.0 INTRODUCTION
SQL (Structured Query Language) is a widely used query language for
relational database management systems (RDBMS). It allows users to
manipulate and retrieve data from databases using various operations such as
select, insert, update, and delete. SQL provides a user-friendly interface for
querying databases and retrieving data in a structured and organized way.
SQL (Structured Query Language) is a programming language that is widely
used for managing and manipulating data in relational databases. It allows
users to interact with databases to perform a wide range of tasks, including
querying and retrieving data, adding, modifying and deleting data, and
managing database structures and relationships.
SQL is an essential tool for working with relational databases, which are
organized into tables with rows and columns. It is used by database
administrators, analysts, and developers to manage and analyze data in a wide
range of industries, including finance, healthcare, retail, and more.
8.0.1 Background
IBM developed the original version of SQL as part of the System R project in
the early 1970s. The language was initially called Sequel, but its name was
later changed to SQL. Since then, SQL has become the standard language for
managing relational databases, and it is widely supported by various database
systems and products.
In 1986, ANSI and ISO published the first SQL standard, called SQL-86.
This standard established a common set of syntax and semantics for SQL,
ensuring that SQL implementations from different vendors could interoperate
with each other. IBM also published its own corporate SQL standard, the
SAA-SQL, in 1987.
In 1989, ANSI published an extended standard for SQL, called SQL-89,
which introduced additional features such as outer joins and null values. The
next version of the standard was SQL-92, which introduced more features
such as support for referential integrity constraints and triggers.
A later major revision of the SQL standard was SQL:1999, which introduced
new features such as object-relational extensions. Since then, new versions
of the standard have been released, including SQL:2003 (which added
support for XML data), SQL:2006, SQL:2008, SQL:2011, and SQL:2016.
The SQL standardization process is a critical aspect of the language's
development and evolution. It helps to ensure that SQL remains a stable and
reliable language for managing relational databases, and that SQL
implementations from different vendors are interoperable and can work
together seamlessly. The SQL standardization process is overseen by the
International Organization for Standardization (ISO) and the American
National Standards Institute (ANSI). These organizations establish and
maintain a set of standards for SQL that specify the syntax, semantics, and
functionality of the language.
In addition to the basic DDL commands (such as CREATE, ALTER, and
DROP), there are other commands that are used for managing the security
and integrity of the database, such as granting and revoking privileges,
setting constraints, and defining triggers.
The DDL commands in SQL are essential for managing the schema objects
in a relational database, and are a critical tool for database administrators and
developers in maintaining the integrity and security of the data.
As mentioned earlier, DDL commands are used for defining and managing
the schema objects in a relational database. In contrast, DML commands are
used for manipulating the data stored in the database.
The main DML commands include:
SELECT: This command is used to retrieve data from one or more tables in
the database.
INSERT: This command is used to add new rows of data to a table in the
database.
UPDATE: This command is used to modify existing rows of data in a table
in the database.
DELETE: This command is used to remove rows of data from a table in the
database.
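As a brief sketch of how each command is written (the table and column
names here are hypothetical, not taken from this unit), typical statements
look like this:

SELECT name, salary FROM employees WHERE department_id = 10;

INSERT INTO employees (id, name, salary, department_id)
VALUES (101, 'Asha Verma', 55000, 10);

UPDATE employees SET salary = 60000 WHERE id = 101;

DELETE FROM employees WHERE id = 101;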
DCL commands are used for controlling access to the database objects. These
commands include:
GRANT: This command is used to give specific privileges to a user or group
of users on a database object.
REVOKE: This command is used to withdraw previously granted privileges
from a user or group of users.
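For example (the user name and table are hypothetical), access could be
granted and later withdrawn as follows:

GRANT SELECT, UPDATE ON employees TO analyst;
REVOKE UPDATE ON employees FROM analyst;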
Together, the DDL, DML, and DCL commands provide a comprehensive set
of tools for managing and administering relational databases. Database
administrators and developers use these commands to create and modify
database objects, manipulate data stored in the database, and control access to
the database objects to maintain the integrity and security of the data.
The SQL DDL allows specification of not only a set of relations, but also
information about each relation, including
The Schema for each Relation: the schema for each relation in a relational
database defines the structure of the data that is stored in the table. It specifies
the names of the columns or attributes, the data types of each column, and
any constraints or rules that govern the data in the table.
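The example table that the next paragraph describes appears to have been omitted from this copy; a plausible reconstruction consistent with that description (all names and data types here are assumptions) is:
CREATE TABLE employees (
EmployeeID INT NOT NULL UNIQUE,
FirstName VARCHAR(50) NOT NULL,
LastName VARCHAR(50),
Salary DECIMAL(10,2) CHECK (Salary >= 0),
DepartmentID INT REFERENCES departments(ID)
);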
The schema for this table defines the structure of the data that can be stored
in the table. It specifies that the table has five columns, and it defines the data
types of each column. It also specifies that the Employee ID column must
contain unique values, and that the Department ID column must contain
integer values that correspond to the ID of a department in the "departments"
table.
Check Constraint: A check constraint is used to restrict the values that can
be inserted into a column. It ensures that only valid data is inserted into the
column.
Not Null Constraint: A not null constraint ensures that a column cannot
have a null value.
The Set of Indices to be Maintained for each Relation: Indexes are data
structures that are used to speed up the retrieval of data from a database. They
work by creating a copy of a subset of the data in a table and organizing it in
a way that makes it easier to search for specific values.
The set of indices to be maintained for each relation will depend on the
queries that are commonly run against the table. For example, if a table is
frequently queried using a certain column, it may be beneficial to create an
index on that column to speed up the search. Some common types of indexes
include:
Primary Key Index: This is an index that is created on the primary key
column of a table. It is used to enforce the primary key constraint and to
speed up searches that use the primary key.
Unique Index: This is an index that is created on a column that has a unique
constraint. It is used to enforce the unique constraint and to speed up searches
that use the unique column.
Clustered Index: This is an index that organizes the table data physically
based on the values in the indexed column. This can speed up queries that
access data in the order defined by the index.
Non-clustered Index: This is an index that creates a separate data structure
to organize the index data. It is used to speed up queries that search for
specific values in a non-indexed column.
The decision on which indexes to create for a particular table should be made
carefully, as creating too many indexes can slow down data modification
operations such as inserts, updates, and deletes.
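As a hedged illustration, indexes of the kinds described above might be created as follows (the index names, tables, and columns are assumptions):
CREATE INDEX idx_worker_department ON Worker (DEPARTMENT);
CREATE UNIQUE INDEX idx_employee_id ON employees (EmployeeID);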
The Security and Authorization Information for each Relation: Security
and authorization information for each relation is an important consideration
in database design. It involves determining who has access to the data in each
relation and what type of access they have.
Auditing: This involves logging access to the database and monitoring user
activity to detect any suspicious behavior or security breaches.
There are several factors that can influence the choice of physical storage
structure for a relation, including:
Access Patterns: The way data is accessed by queries can affect the choice
of physical storage structure. For example, if a relation is frequently accessed
using range queries, it may be more efficient to store the data in a sorted
order.
Storage Medium: The type of storage medium being used can affect the
choice of physical storage structure. For example, solid-state drives have
faster random-access times than hard disks, so they may be better suited for
storing data in a hash-based index.
Size of Relation: The size of the relation can also influence the choice of
physical storage structure. For small relations, a simple linear file may be
sufficient, whereas for larger relations, a more complex structure such as a B-
tree or hash table may be required.
Heap File Organization: This is the simplest storage structure, where data is
stored in an unordered list. It is useful for small relations or for append-only
workloads.
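The general form of the CREATE TABLE command, to which the next paragraph refers, appears to have been dropped in this copy; it is along the following lines:
CREATE TABLE <table name>
(<column name> <data type> (<column width>), ...);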
Where table name assigns the name of the table, column name defines the
name of the field, data type specifies the data type for the field and column
width allocates a specified size to the field.
Example 1:
create table account
(account-number char(10),
branch-name char(15),
balance integer,
primary key (account-number),
check (balance >= 0))

Example 2:
create table branch
(branch-name char(15),
branch-city char(30),
assets integer,
primary key (branch-name),
check (assets >= 0))
CREATE TABLE Worker (
WORKER_ID INT NOT NULL PRIMARY KEY AUTO_INCREMENT,
FIRST_NAME CHAR(25),
LAST_NAME CHAR(25),
SALARY INT(15),
JOINING_DATE DATETIME,
DEPARTMENT CHAR(25)
);
INSERT INTO Worker
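The INSERT statement is truncated in the source; a minimal completion consistent with the Worker table defined above (the values are illustrative assumptions) might be:
INSERT INTO Worker
(WORKER_ID, FIRST_NAME, LAST_NAME, SALARY, JOINING_DATE, DEPARTMENT)
VALUES (1, 'Monika', 'Arora', 100000, '2020-02-14 09:00:00', 'HR');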
• Procedural DMLs require a user to specify what data are needed and
how to get those data.
• Declarative DMLs (also referred to as nonprocedural DMLs) require a
user to specify what data are needed without specifying how to get those
data.
Data manipulation is:
• The retrieval of information stored in the database
• The insertion of new information into the database
• The deletion of information from the database
• The modification of information stored in the database
Declarative DMLs are usually easier to learn and use than are procedural
DMLs. However, since a user does not have to specify how to get the data,
the database system has to figure out an efficient means of accessing data.
The DML component of the SQL language is nonprocedural.
A query is a statement requesting the retrieval of information. The portion of
a DML that involves information retrieval is called a query language.
Although technically incorrect, it is common practice to use the terms query
language and data manipulation language synonymously.
This query in the SQL language finds the name of the customer whose customer-id is 192-83-7465:
select customer.customer-name
from customer
where customer.customer-id = '192-83-7465'
The query specifies that those rows from the table customer where the
customer-id is 192-83-7465 must be retrieved, and the customer-name
attribute of these rows must be displayed.
Queries may involve information from more than one table. For instance, the
following query finds the balance of all accounts owned by the customer with
customer-id 192-83-7465.
select account.balance
from depositor, account
where depositor.customer-id = '192-83-7465' and
depositor.account-number = account.account-number
Order by clause
• It is used in the last portion of the select statement.
• Rows can be sorted using this clause.
• By default, it sorts in ascending order.
• DESC is used for sorting in descending order.
• Sorting by a column which is not in the select list is possible.
• Sorting by a column alias is also possible.
Example:
SELECT EMPNO, ENAME, SAL*12 "ANNUAL"
FROM EMP
ORDER BY ANNUAL;
Q.3 Write an SQL query to fetch “FIRST_NAME” from Worker table using
the alias name as <WORKER_NAME>.
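A straightforward answer, using the Worker table defined earlier in this unit, would be along these lines:
SELECT FIRST_NAME AS WORKER_NAME FROM Worker;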
Sequences:
• automatically generate unique numbers
• are sharable
• are typically used to create a primary key value
• replace application code
• improve the efficiency of accessing sequence values when cached in memory.
Example: Create a sequence named SEQSS that starts at 105, has a step of 1, and can take a maximum value of 2000.
CREATE SEQUENCE SEQSS
START WITH 105
INCREMENT BY 1
MAXVALUE 2000;
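Values are then drawn from the sequence with NEXTVAL (Oracle-style syntax is assumed here for illustration):
INSERT INTO Worker (WORKER_ID, FIRST_NAME)
VALUES (SEQSS.NEXTVAL, 'Ravi');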
Commit Work: Causes the current transaction to be committed; that is, it makes the updates performed by the transaction permanent in the database. The optional keyword WORK is included for compatibility with some older versions of SQL, but it is not necessary in modern SQL implementations.
Rollback Work: Causes the current transaction to be rolled back; that is, it
undoes all the updates performed by the SQL statements in the transaction.
Thus, the database state is restored to what it was before the first statement of
the transaction was executed. The keyword work is optional in both the
statements. Transaction rollback is useful if some error condition is detected
during execution of a transaction. Commit is similar, in a sense, to saving
changes to a document that is being edited, while rollback is similar to
quitting the edit session without saving changes. Once a transaction has
executed commit work, its effects can no longer be undone by rollback
work.
The database system guarantees that in the event of some failure, such as an
error in one of the SQL statements, a power outage, or a system crash, a
transaction’s effects will be rolled back if it has not yet executed commit
work. In the case of power outage or other system crash, the rollback occurs
when the system restarts. For instance, consider a banking application, where
we need to transfer money from one bank account to another in the same
bank.
To do so, we need to update two account balances, subtracting the amount transferred from one, and adding it to the other. If the system crashes after
subtracting the amount from the first account, but before adding it to the
second account, the bank balances would be inconsistent. A similar problem
would occur, if the second account is credited before subtracting the amount
from the first account, and the system crashes just after crediting the amount.
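A minimal sketch of such a transfer as a transaction, assuming an account table with columns account_number and balance and illustrative account numbers, might be:
UPDATE account SET balance = balance - 500
WHERE account_number = 'A-101';
UPDATE account SET balance = balance + 500
WHERE account_number = 'A-215';
COMMIT WORK;
-- if an error is detected before the commit, issue ROLLBACK WORK instead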
As another example, consider a university application. We assume that the attribute tot_cred of each tuple in the student relation is kept up to date by modifying it whenever the student successfully completes a course.
8.5 SUMMARY
Structured Query Language (SQL) is a declarative language with roots in relational calculus. Nevertheless, an SQL statement can be translated into an equivalent sequence of relational-algebra operations.
8.6 KEYWORDS
Data Definition Language: Language constructs used to specify the database schema.
Data Manipulation Language: Language constructs used to express database queries and updates.
SQL includes a variety of language constructs for queries on the database. All
the relational-algebra operations, including the extended relational-algebra
operations, can be expressed by SQL. SQL also allows ordering of query
results by sorting on specified attributes.
UNIT 9 DBMS IMPLEMENTATION AND FUTURE TRENDS
Objectives
After studying this unit, you will be able to:
• Explain relational database language.
• Understand web interfaces to databases.
• Understand how to manage a database using a cloud-based solution.
• Know how to control database access.
Structure
9.0 Introduction
9.1 Web Interfaces to Database
9.2 Specialty Database
9.3 Automation and Database
9.4 Augmented Database
9.5 In-memory Database
9.6 Graph Database
9.7 Open-Source Database
9.8 Databases as Service
9.9 Data Mining
9.10 Summary
9.11 Keywords
9.12 Self-Assessment Exercises
9.13 Further Readings
9.0 INTRODUCTION
With the advent of cloud computing, the market for Database Management
systems has shifted towards cloud-based solutions. Cloud-based DBMS
solutions offer many advantages over on-premises solutions, such as
scalability, flexibility, and reduced infrastructure costs. Cloud-based
solutions also offer the ability to easily integrate with other cloud-based
services, such as analytics and machine learning tools, which can help
organizations gain insights from their data more quickly and efficiently.
Web interfaces to databases allow users to interact with and manipulate data
stored in a database through a web browser. These interfaces can be custom-
built or implemented using off-the-shelf software, and they provide a user-
friendly way for non-technical users to access and use data. Some common
types of web interfaces to databases include:
Web Forms: Web forms allow users to input data into a database through a
web browser. They can be used for tasks such as data entry, surveys, or
online orders.
Another important aspect of the Web is its ability to provide real-time access
to data. This means that users can access and interact with data in real-time,
rather than waiting for data to be processed and returned to them. This is
particularly important for applications such as financial transactions and real-
time monitoring systems.
Finally, the Web provides a way to easily distribute database applications to a
large number of users. With the use of cloud-based technologies,
organizations can deploy their database applications to a large number of
users without having to worry about managing hardware and software
infrastructure. This makes it easier for organizations to provide access to their
database applications to users located anywhere in the world.
Whenever relevant data in the database are updated, the generated documents
will automatically become up-to-date. The generated document can also be
tailored to the user on the basis of user information stored in the database.
Web interfaces provide attractive benefits even for database applications that are used only within a single organization. Browsers today can fetch programs along with HTML documents, and run the programs on the browser, in safe
be written in client-side scripting languages, such as JavaScript, or can be
“applets” written in the Java language.
A new service can be created by creating and installing an application program that provides the service.
standard defines how the Web server communicates with application
programs. The application program typically communicates with a database
server, through ODBC, JDBC, or other protocols, in order to get or store
data.
CGI, or Common Gateway Interface, is a standard for interfacing external
applications with web servers to generate dynamic web content. CGI scripts
are used to enable web servers to execute programs or scripts, which can
generate dynamic content, such as web pages or other multimedia content.
CGI scripts can be written in many different programming languages,
including Perl, Python, and PHP.
Specialty databases are databases that are designed to handle specific types of
data or specialized applications. Some examples of specialty databases
include:
1. Geographic Information System (GIS) databases: These databases are
designed to store, retrieve, and analyze spatial data, such as maps and
satellite images.
2. Time-series databases: These databases are optimized for storing and
querying time-series data, such as stock prices, weather data, and sensor
data.
3. Graph databases: These databases are designed to store and query graph
structures, such as social networks and recommendation engines.
4. Document-oriented databases: These databases are optimized for storing
and querying semi-structured and unstructured data, such as text
documents and multimedia content.
5. In-memory databases: These databases store data in memory instead of
on disk, allowing for faster access and processing of data.
6. Real-time databases: These databases are designed to handle high-speed,
low-latency applications, such as financial trading systems and gaming
platforms.
Each type of specialty database has its own unique set of features and
capabilities, making it well-suited for specific types of applications and use
cases.
Big data refers to extremely large and complex data sets that cannot be effectively managed, processed, or analysed using traditional data processing tools and database management systems. Such data sets are typically too large to be handled by a single computer, and they may also be unstructured or semi-structured, meaning that they do not fit neatly into traditional database structures. Big data solutions typically involve distributed computing technologies that allow for parallel processing of data across many servers, as well as specialized tools and algorithms for data analysis and machine learning.
Big data is often characterized by the "3Vs": volume, velocity, and variety.
Volume refers to the sheer size of the data, velocity refers to the speed at
which data is generated and needs to be processed, and variety refers to the
different forms and structures of data.
To handle big data, specialized tools and technologies are needed, such as
distributed computing systems like Hadoop and Spark, NoSQL databases,
and data warehouses. These technologies allow for the processing and
analysis of large, complex data sets to extract valuable insights and
knowledge.
Unlike traditional SQL databases, NoSQL databases do not use the relational
model, and instead use a variety of data models, such as document-oriented,
graph-based, key-value, and column-family. Each data model is optimized
for specific types of data and use cases, allowing NoSQL databases to be
highly specialized for different applications.
Some examples of popular NoSQL databases include MongoDB, Cassandra,
Couchbase, and Amazon DynamoDB. These databases are often used in big
data applications, real-time data processing, content management, and other
data-intensive applications where scalability, flexibility, and performance are
critical.
Finally, database management systems are increasingly being integrated with
other technologies, such as cloud computing and artificial intelligence (AI).
Cloud-based database solutions allow for scalable and flexible data storage
and processing, while AI tools can be used to automate tasks such as data
cleaning, classification, and analysis. Artificial intelligence (AI) refers to the
development of computer systems that can perform tasks that typically
require human intelligence, such as visual perception, speech recognition,
decision-making, and natural language processing.
AI systems use techniques such as machine learning, deep learning, and
neural networks to analyze data, recognize patterns, and make decisions
based on that data. These systems can be used to automate and optimize a
wide range of processes, from customer service to medical diagnosis, and
they have the potential to revolutionize the way we live and work.
XML (Extensible Markup Language) was initially designed as a way to add
structured information to text documents. However, it has since become
widely used as a format for exchanging data between different applications
and systems. One of the key advantages of XML is its flexibility in
representing data with nested structure, which makes it useful for storing and
exchanging nontraditional data formats.
XML is a markup language used to store and transport data. It uses tags to define elements and attributes to provide
additional information about those elements. Unlike HTML, which is used to
define the structure and presentation of web pages, XML is used to define
data and its structure. XML is designed to be both human-readable and
machine-readable, making it a popular choice for exchanging data between
different systems. It is widely used in web services, where it provides a
standard format for exchanging data between different applications and
platforms. XML documents can be validated against a schema, which defines
the rules and constraints for the structure and content of the document. This
helps to ensure the accuracy and consistency of the data.
However, open-source databases may also have some limitations. They may
not provide the same level of support and documentation as proprietary
databases, and may require more technical expertise to set up and maintain.
Additionally, not all open-source databases have the same level of
functionality and scalability as proprietary databases, so users should
carefully evaluate their needs before selecting an open-source database.
Open-source databases can provide a cost-effective and flexible option for
organizations that require database management solutions. While they may
not be suitable for all applications, open-source databases can provide a
valuable tool for organizations that require customizable and affordable data
management.
One of the key benefits of DBaaS is its flexibility and scalability. Customers
can easily scale their database resources up or down to meet changing
demand without having to worry about hardware or infrastructure costs.
Additionally, DBaaS providers typically offer a range of database options,
such as relational, NoSQL, and graph databases, allowing customers to select
the database type that best meets their needs.
However, DBaaS may also have some limitations. Since the database is
hosted on the cloud, customers may experience latency or network
performance issues. Additionally, customers may have limited control over
the database configuration and maintenance, which may be a concern for
organizations with complex or specialized database requirements.
DBaaS can provide significant benefits for organizations that require flexible
and scalable database management solutions without the burden of
infrastructure maintenance. While it may not be suitable for all applications,
DBaaS can provide a valuable tool for organizations that require simplified
and cost-effective data management.
9.10 SUMMARY
We have pointed out in this unit that there is resistance to new DBMS tools. In spite of this apparent resistance, many more organizations are moving towards the use of DBMS. It is also quite clear that organizations will have to adopt DBMS, especially those based on the relational approach, in order to maintain their competitive position in emerging marketplaces.
9.11 KEYWORDS
Cloud-based DBMS: These often come with built-in security features, such as
encryption and multi-factor authentication, which can help to protect
sensitive data.
Legacy Systems: Refer to older computer systems, software applications, or
hardware components that are no longer considered up-to-date or supported
by the vendor.
Web Interfaces to Databases: allow users to interact with and manipulate
data stored in a database through a web browser. These interfaces can be
custom-built or implemented using off-the-shelf software, and they provide a
user-friendly way for non-technical users to access and use data. Some
common types of web interfaces to databases include Webforms, Reporting
interfaces, Dashboard interfaces, Mobile interfaces etc.
NoSQL: NoSQL (Not Only SQL) databases are a class of databases that are
designed to handle large and complex data sets, which may be unstructured
or semi-structured, and do not fit well into the traditional tabular format of
SQL databases.
BLOCK 4
EMERGING TECHNOLOGIES FOR BUSINESS
UNIT 10 CLOUD COMPUTING
Objectives
After studying this unit, you will be able to:
• Appreciate cloud computing, and classify services of cloud computing.
• Understand cloud computing architecture.
• Comprehend the platforms for the development of cloud applications and
list the applications of cloud services.
• Summarise the features and associated risks of different cloud
deployment and service models.
• Appreciate the emergence of the cloud as the next-generation computing
paradigm.
Structure
10.0 Introduction to Cloud Computing
10.0.1 History and Evolution of Cloud Computing
10.0.2 Types of Cloud
10.0.3 Cloud Components
10.0.4 Cloud Computing Infrastructure
10.0.5 Pros and Cons of Cloud Computing
10.1 Cloud Computing Architecture
10.2 Cloud Deployment Models
10.3 Service Management in Cloud Computing
10.4 Data Management in Cloud Computing
10.5 Resource Management and Security in Cloud
10.5.1 Inter Cloud Resource Management
10.5.2 Resource Provisioning and Resource Provisioning Methods
10.5.3 Global Exchange of Cloud Resources
10.5.4 Cloud Security Challenges
10.5.5 Security Governance – Virtual Machine Security – IAM – Security Standards
10.6 Cloud Technologies and Advancements
10.6.1 Hadoop
10.6.2 Virtual Box
10.6.3 Google App Engine
10.6.4 Open Stack
10.6.5 Federation in the Cloud – Four Levels of Federation
10.7 Case Studies
10.8 Summary
10.9 Self-Assessment Exercises
10.10 Keywords
10.11 Further Readings
10.0 INTRODUCTION TO CLOUD COMPUTING
Cloud computing refers to the delivery of computing services such as storage,
processing power, and software applications over the internet. Instead of
having to install and manage software on their own computers, users can
access these services on-demand from a remote server or data centre.
The modern era of cloud computing began in the mid-2000s, with the launch
of Amazon Web Services (AWS) in 2006. AWS provided a platform for
developers to build and deploy web applications without the need for
physical infrastructure. This marked the beginning of the Infrastructure as a
Service (IaaS) model, which has since become a dominant part of the cloud
computing landscape.
In 2008, Google launched its cloud-based application suite, Google Apps,
which provided businesses with email, word processing, and spreadsheet
software accessible through a web browser. This marked the beginning of the
Software as a Service (SaaS) model, which has since become a popular
choice for businesses looking to reduce their software licensing and
maintenance costs.
In 2010, the Platform as a Service (PaaS) model was introduced with the
launch of Heroku, a cloud-based platform for deploying and managing web
applications. PaaS provided developers with a platform for building and
deploying applications without having to worry about the underlying
infrastructure.
Since then, cloud computing has continued to evolve and expand, with new
technologies and services being introduced on a regular basis. Today, cloud
computing has become an integral part of many businesses' IT strategies,
enabling them to improve their agility, reduce costs, and focus on innovation.
10.0.2 Types of Cloud
Cloud computing can be categorized into four main types based on their
deployment models and ownership of infrastructure. These types are:
These are the public cloud, the private cloud, the hybrid cloud, and the multi-cloud.
Platform Layer: The Platform layer provides a platform for building and deploying applications, without the need to manage the underlying
infrastructure. Platform as a Service (PaaS) providers like Heroku, Google
App Engine, and Microsoft Azure App Service offer a platform for
developers to build, deploy, and manage their applications, with built-in
scalability and flexibility.
Software Layer: The Software layer provides software applications and
services that can be accessed over the internet, without the need to install and
manage them on local devices. Software as a Service (SaaS) providers like
Salesforce, Microsoft Office 365, and Google Workspace offer software
applications that are hosted and managed by the provider, allowing users to
access them from anywhere with an internet connection.
Each deployment model has its own set of benefits and challenges, and the
right model for an organization will depend on its specific needs and
requirements.
• Federation: Federation is another approach to the global exchange of cloud resources. In a federated cloud environment, multiple cloud
providers collaborate to offer a unified cloud environment to customers.
This approach enables customers to access resources from different
cloud providers through a single interface.
• Resource Management: Effective resource management is essential for
the global exchange of cloud resources. Cloud providers should offer
tools and services that enable customers to manage their resources
effectively across different cloud environments.
• Security: Security is another critical aspect of the global exchange of
cloud resources. Cloud providers should offer robust security measures
to protect customer data and resources when sharing resources across
different cloud environments.
• Billing and Metering: Billing and metering are essential aspects of the
global exchange of cloud resources. Cloud providers should offer
transparent billing and metering models that enable customers to track
their resource usage and costs across different cloud environments.
Effective global exchange of cloud resources requires collaboration between
different cloud providers and effective management of resources and
security. Cloud providers should offer standard APIs and protocols that
enable seamless sharing of resources across different cloud environments and
provide tools and services to enable effective resource management and
security. Organizations need to develop policies and procedures that align
with their business objectives and regulatory requirements to ensure that
resources are shared effectively and efficiently.
10.6.1 Hadoop
Hadoop is an open-source framework that is used for distributed storage and
processing of large data sets. The framework is designed to be scalable, fault-
tolerant, and cost-effective, making it an ideal solution for big data
processing. Hadoop consists of two main components: the Hadoop Distributed File System (HDFS), which provides distributed storage, and the MapReduce engine, which provides distributed processing.
Hadoop has become a popular solution for big data processing due to its
scalability, fault-tolerance, and cost-effectiveness. Many organizations,
including large enterprises, use Hadoop to process and analyse large data
sets, such as log files, sensor data, and social media data.
Overall, VirtualBox is a powerful and flexible tool for virtualization, and its
open-source nature makes it a popular choice among developers and
enthusiasts alike.
GAE provides several key features for building web applications, such as
automatic scaling, load balancing, and a NoSQL datastore. It also offers
integration with other Google Cloud Platform services, such as Google Cloud
Storage and Google Cloud SQL. In addition, GAE offers a flexible pricing
model, with both free and paid tiers depending on usage and resources
required. The free tier allows developers to test and deploy their applications
at no cost, while the paid tier offers additional features and resources for
larger-scale applications.
• Capital One: Capital One, the financial services company, has adopted
cloud computing to improve its agility and innovation. Capital One has
used AWS to build a cloud-based platform for developing and deploying
new financial products and services. The platform has enabled Capital
One to quickly experiment with new ideas and bring them to market
faster, while also reducing costs and improving scalability.
These are just a few examples of how cloud computing has enabled
companies to improve their agility, innovation, scalability, and cost-
effectiveness.
10.8 SUMMARY
In summary, cloud computing is a technology that allows organizations to
store, manage, and access data and applications over the internet, rather than
on local servers or personal computers. Cloud computing offers several
benefits, including scalability, cost-effectiveness, flexibility, and reliability.
Cloud computing can be classified into several models, including public
cloud, private cloud, hybrid cloud, and multi-cloud. Cloud computing has
also enabled the development of new technologies, such as serverless
computing, edge computing, and Kubernetes, which offer new ways of
managing and deploying applications in the cloud. Cloud computing has
become increasingly popular in recent years, with many companies, from
small startups to large enterprises, adopting cloud-based solutions to improve
their operations and competitiveness. Overall, cloud computing is a
transformative technology that is changing the way we work, communicate,
and do business in the digital age.
10.10 KEYWORDS
1. Cloud computing: A technology that allows users to store, manage, and
access data and applications over the internet, rather than on local servers
or personal computers.
2. Public cloud: A cloud computing model in which cloud services are
provided by third-party providers, accessible to anyone on the internet.
3. Private cloud: A cloud computing model in which cloud services are
provided exclusively for a single organization, either on-premises or by a
third-party provider.
4. Hybrid cloud: A cloud computing model that combines public and
private cloud services, allowing organizations to take advantage of the
strengths of each model.
5. Multi-cloud: A cloud computing model that involves using multiple
cloud services from different providers to achieve greater flexibility,
resilience, and cost-effectiveness.
6. Infrastructure as a Service (IaaS): A cloud computing service model
in which providers offer virtualized computing resources, such as
servers, storage, and networking, over the internet.
7. Platform as a Service (PaaS): A cloud computing service model in
which providers offer a platform for developing, deploying, and
managing applications over the internet.
8. Software as a Service (SaaS): A cloud computing service model in
which providers offer software applications over the internet, accessible
through web browsers or APIs.
9. Serverless computing: A cloud computing model in which cloud
providers manage the infrastructure and automatically allocate resources
based on the demands of the application, allowing developers to focus on writing and deploying code.
10. Edge computing: A paradigm shift in cloud computing that enables data
processing to be done closer to the source of data, instead of sending all
data to the cloud.
UNIT 11 BIG DATA
Objectives
After studying this unit, you will be able to:
• Understand the concept of Big Data, its characteristics, and the challenges associated with it.
• Familiarize yourself with the Hadoop ecosystem and its components.
• Understand the basics of MapReduce.
• Learn the utility of Pig, a high-level platform for creating MapReduce programs, to process and analyse data.
• Understand the basics of machine learning algorithms for Big Data analytics.
Structure
11.0 Introduction to Big Data
11.0.1 Data Storage and Analysis
11.0.2 Characteristics of Big Data
11.0.3 Big Data Classification
11.0.4 Big Data Handling Techniques
11.0.5 Types of Big Data Analytics
11.0.6 Typical Analytical Architecture
11.0.7 Challenges in Big Data Analytics
11.0.8 Case Studies: Big Data in Marketing and Sales, Healthcare, Medicine, and Advertising
11.1 Hadoop Framework & Ecosystem
11.1.1 Requirement of Hadoop Framework
11.1.2 Map Reduce Framework
11.1.3 Hadoop Yarn and Hadoop Execution Model
11.1.4 Introduction to Hadoop Ecosystem Technologies
11.1.5 Databases: HBase, Hive
11.1.6 Scripting language: Pig, Streaming: Flink, Storm
11.2 Spark Framework
11.3 Machine Learning Algorithms for Big Data Analytics
11.4 Recent Trends in Big Data Analytics
11.5 Summary
11.6 Self–Assessment Exercises
11.7 Keywords
11.8 Further Readings
11.0 INTRODUCTION TO BIG DATA
Big data refers to the vast amount of structured and unstructured data that is
generated and collected by individuals, organizations, and machines every
day. Such data is too large and complex to be processed by traditional data
processing applications, which often have limitations in terms of their
capacity to store, process, and analyse large datasets.
To process and analyse big data, specialized technologies and tools such as
Hadoop, Spark, and NoSQL databases have been developed. These tools
allow organizations to store, process, and analyse large volumes of data
quickly and efficiently. The insights derived from big data can be used to
make informed decisions, identify trends and patterns, improve customer
experiences, and enhance operational efficiency.
In addition to these 3 Vs/ 4Vs, big data can also be characterized by several
other features, including:
11.0.3 Big Data Classification
Big data can be classified based on several different criteria, such as the
source, the structure, the application, and the analytics approach. Here are
some common classifications of big data:
11.0.4 Big Data Handling Techniques
Several techniques are commonly used to handle big data:
• MapReduce: MapReduce is a programming model that is used to process large datasets in parallel across a cluster of computers.
• Data Compression: Data compression techniques such as gzip and
bzip2 can be used to reduce the size of data, making it easier to transfer
and store.
• Data Partitioning: Data partitioning involves dividing a large dataset
into smaller subsets to enable distributed processing.
• Cloud Computing: Cloud computing platforms such as Amazon Web
Services (AWS) and Microsoft Azure provide scalable and cost-effective
solutions for storing and processing big data.
• Machine learning: Machine learning techniques can be used to analyse
big data and identify patterns and insights that can help organizations
make informed decisions.
By using these techniques, businesses and organizations can handle big data
more effectively, extract insights, and derive value from their data.
11.0.5 Types of Big Data Analytics
The main types of big data analytics are:
• Descriptive Analytics: This technique involves summarizing historical data to understand what has happened in the past (see the example after this list).
• Diagnostic Analytics: This technique involves analysing data to
determine the causes of a particular event or pattern.
• Predictive Analytics: This technique involves using statistical models
and machine learning algorithms to forecast future events or patterns
based on historical data.
• Prescriptive Analytics: This technique involves recommending actions
based on insights from predictive analytics.
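For instance, a simple descriptive-analytics query in SQL might summarize historical sales by region (the sales table and its columns are assumed for illustration):
SELECT region, SUM(amount) AS total_sales
FROM sales
GROUP BY region;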
11.0.6 Typical Analytical Architecture
A typical analytical architecture is organised into the following layers:
• Data Sources: This layer includes all the sources of data, both internal
and external, that an organization collects and stores. These may include
data from customer transactions, social media, web logs, sensors, and
other sources.
• Data Ingestion and Storage: This layer is responsible for ingesting data
from various sources, processing it, and storing it in a format that can be
easily accessed and analysed. This layer may include technologies such
as Hadoop Distributed File System (HDFS) and NoSQL databases.
• Data Processing and Preparation: This layer is responsible for
cleaning, transforming, and preparing data for analysis. This may include
tasks such as data integration, data cleaning, data normalization, and data
aggregation.
• Analytics Engines: This layer includes the technologies and tools used
for analysing and processing data. This may include machine learning
algorithms, statistical analysis tools, and visualization tools.
• Data Presentation and Visualization: This layer includes the tools used
to present data in a meaningful way, such as dashboards, reports, and
visualizations. This layer is critical for making data accessible and
understandable to non-technical stakeholders.
11.0.7 Challenges in Big Data Analytics
Key challenges in big data analytics include:
• Data Complexity and Variety: Big data comes in many different forms,
including structured, semi-structured, and unstructured data, which can
be challenging to process and analyse.
• Data Quality: Big data is often incomplete, inconsistent, or inaccurate,
which can lead to erroneous insights and conclusions.
• Data Security and Privacy: Big data often contains sensitive and
confidential information, which must be protected from unauthorized
access and breaches.
• Scalability: As data volumes grow, the analytical architecture must be
able to scale to handle the increased load, which can be challenging and
costly.
• Talent Shortage: There is a shortage of skilled data scientists and
analysts who are able to process and analyse big data effectively.
• Integration: Big data analytics requires integration with multiple
systems and technologies, which can be challenging to implement and
maintain.
• Data Governance: Big data requires careful management and
governance to ensure compliance with regulations and policies.
• Interpreting Results: Big data analytics often produces large and
complex datasets, which can be challenging to interpret and translate into
actionable insights.
11.0.8 Case Studies
• Marketing and Sales: Big data is being used in marketing and sales to
understand customer behaviour and preferences, personalize marketing
messages, and optimize pricing and promotions. For example, Amazon
uses big data to personalize recommendations for individual customers
based on their browsing and purchase history. Walmart uses big data to
optimize pricing and inventory management in its stores. Coca-Cola uses
big data to optimize its vending machine placement, prices, and
promotions based on local weather conditions, events, and consumer
behaviour.
• Healthcare: Big data is being used in healthcare to improve patient
outcomes, reduce costs, and enable personalized medicine. For example,
IBM's Watson Health is using big data to develop personalized cancer
treatments based on a patient's genetic profile and medical history.
Hospitals and healthcare providers are using big data to predict patient
readmission rates, identify patients at risk of developing chronic
conditions, and optimize resource allocation.
11.1.3 Hadoop YARN and Hadoop Execution Model
Hadoop YARN (Yet Another Resource Negotiator) is a resource
management layer that sits between the Hadoop Distributed File System
(HDFS) and the processing engines, such as MapReduce, Spark, and Tez. It
provides a central platform for managing cluster resources, allocating
resources to different applications, and scheduling jobs across a cluster.
• Client: The client submits a job to the YARN Resource Manager (RM),
which schedules it across the cluster.
• Node Manager: The Node Manager runs on each node in the cluster and
is responsible for managing the resources on that node, such as CPU,
memory, and disk space. It reports the available resources back to the
Resource Manager, which uses this information to allocate resources to
different applications.
These are just a few examples of the many tools and technologies that are
available in the Hadoop ecosystem. Each of these technologies is designed to
address specific challenges and use cases in big data processing and
analytics. By leveraging the Hadoop ecosystem, organizations can build
powerful, scalable, and cost-effective data processing and analytics solutions.
HBase is a NoSQL database that is designed for storing and managing large
volumes of unstructured and semi-structured data in Hadoop. It provides real-
time random read and write access to large datasets, making it ideal for use
cases that require low-latency queries and high-throughput data processing.
HBase is modelled after Google's Bigtable database and is built on top of
Hadoop Distributed File System (HDFS). HBase uses a column-oriented data
model, which allows for efficient storage and retrieval of data, and provides a
powerful API for data manipulation.
Hive, on the other hand, is a data warehouse system for querying and
analysing large datasets stored in Hadoop. It provides a SQL-like interface
for querying data and supports a range of data formats, including structured
and semi-structured data. Hive is modelled after the SQL language, making it
easy for users with SQL experience to work with large-scale datasets in Hadoop. Hive uses a metadata-driven approach to data management, which allows for easy integration with other tools in the Hadoop ecosystem. Hive
provides a powerful SQL-like language called HiveQL for querying data and
supports advanced features such as user-defined functions, subqueries, and
joins.
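As a brief hedged sketch, a HiveQL query over an assumed web_logs table reads much like ordinary SQL:
SELECT page, COUNT(*) AS hits
FROM web_logs
GROUP BY page
ORDER BY hits DESC
LIMIT 10;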
Both HBase and Hive are powerful tools in the Hadoop ecosystem, and they
are often used together to provide a complete data management and analysis
solution. HBase is typically used for real-time data processing and low-
latency queries, while Hive is used for complex analytical queries and ad-hoc
data analysis.
Both Flink and Storm support stream processing, whereas Pig supports batch
processing. Stream processing is useful in scenarios where data is generated
continuously and needs to be processed in real-time, such as sensor data or
social media feeds. Batch processing is useful in scenarios where large
volumes of data need to be processed in a non-real-time manner, such as ETL
jobs or data warehousing.
When dealing with big data, it is important to choose algorithms that are
scalable and can handle large amounts of data. Some of these algorithms,
such as KNN and SVM, can be memory-intensive and may not be suitable
for large datasets. In such cases, distributed computing frameworks like
Apache Spark can be used to handle the processing of big data.
5. Data Privacy and Security: With the increasing amount of data being
collected and analysed, data privacy and security are becoming major
concerns. Businesses must ensure that they are compliant with data
protection regulations and that they are taking steps to protect sensitive
data.
6. Data Democratization: Data democratization involves making data
accessible to all stakeholders in an organization, enabling them to make
data-driven decisions. This trend is gaining traction as businesses seek to
break down data silos and improve collaboration and communication
across teams.
11.5 SUMMARY
Big data refers to the large volume of structured and unstructured data that
inundates businesses on a daily basis. Big data analytics is the process of
collecting, processing, and analysing this data to gain insights and make
informed business decisions. The key characteristics of big data are
commonly summarized by the "3Vs": volume, velocity, and variety. To
handle big data, businesses require specialized tools and technologies, such
as the Hadoop ecosystem, which includes HDFS, MapReduce, and YARN, as
well as other technologies like Spark, HBase, and Hive. In addition to
handling the technical challenges of big data, businesses must also address
data privacy and security concerns, and ensure compliance with regulations
such as GDPR and CCPA.
Some of the key trends in big data analytics include real-time analytics, edge
computing, cloud-based analytics, artificial intelligence and machine
learning, data privacy and security, data democratization, and natural
language processing. Commonly, big data analytics has the potential to
provide businesses with valuable insights that can improve their operations,
customer experiences, and bottom lines.
11.7 KEYWORDS
A glossary of commonly used terms in big data includes:
1. Big data: Refers to large volumes of structured and unstructured data that inundate businesses on a daily basis.
2. Business intelligence: The use of data analysis tools and technologies to
gain insights into business performance and make informed decisions.
3. Cloud computing: The delivery of computing services, including
storage, processing, and analytics, over the internet.
4. Data cleaning: The process of identifying and correcting errors and
inconsistencies in data.
5. Data governance: The management of data assets, including policies,
procedures, and standards for data quality and security.
6. Data integration: The process of combining data from multiple sources
into a single, unified view.
7. Data lake: A centralized repository for storing large volumes of
structured and unstructured data in its native format.
8. Data mining: The process of extracting useful information from large
volumes of data.
9. Data pipeline: The process of moving data from its source to a
destination for storage, processing, or analysis.
10. Data privacy: The protection of sensitive and personal data from
unauthorized access or disclosure.
11. Data quality: The measure of the accuracy, completeness, and
consistency of data.
12. Data visualization: The process of creating visual representations of
data to aid in understanding and analysis.
13. Data warehousing: The process of collecting and storing data from
multiple sources to create a centralized repository for analysis.
14. Hadoop: A popular open-source big data framework used for storing
and processing large volumes of data.
15. Machine learning: A subset of AI that involves building algorithms and
models that can learn and make predictions based on data.
16. MapReduce: A programming model used to process large volumes of
data in parallel on a distributed system.
17. NoSQL: A non-relational database management system designed for
handling large volumes of unstructured data.
18. Predictive Analytics: The use of statistical models and machine learning
algorithms to make predictions about future events based on historical
data.
19. Spark: An open-source big data processing framework that allows for
fast, in-memory processing of large datasets.
20. Streaming: The process of analysing and processing real-time data as it
is generated.
11.8 FURTHER READINGS
1. Provost, F., & Fawcett, T. (2013). Data science for business: What you
need to know about data mining and data-analytic thinking. O'Reilly
Media.
2. Zaharia, M., & Chambers, B. (2018). Spark: The definitive guide.
O'Reilly Media.
3. Leskovec, J., Rajaraman, A., & Ullman, J. D. (2014). Mining of massive
datasets. Cambridge University Press.
4. Marz, N., & Warren, J. (2015). Big data: Principles and best practices of
scalable real-time data systems. Manning Publications.
5. Apache Hadoop: https://hadoop.apache.org/
6. Apache Spark: https://spark.apache.org/
7. Big Data University: https://bigdatauniversity.com/
8. Hortonworks: https://hortonworks.com/
9. Big Data Analytics News: https://www.bigdataanalyticsnews.com/
10. Data Science Central: https://www.datasciencecentral.com/
UNIT 12 ENTERPRISE RESOURCE PLANNING
Objectives
After studying this unit, you will be able to:
• Develop a thorough understanding of ERP concepts and principles.
• Understand the components and architecture of ERP systems, as well as their benefits and limitations.
• Learn how to implement and configure an ERP system.
• Understand how to manage ERP projects.
• Understand how to optimize business processes using ERP systems.
Structure
12.1 Introduction to Enterprise Resource Planning (ERP)
12.1.1 What is Enterprise Resource Planning (ERP)?
12.1.2 Evolution of Enterprise Resource Planning
12.1.3 Fundamental Technology of Enterprise Resource Planning
12.1.4 Benefits of Enterprise Resource Planning
12.2 ERP Solutions and Functional Modules
12.2.1 Business Process Reengineering
12.2.2 Supply Chain Management
12.2.3 Online Analytical Processing (OLAP)
12.2.4 Customer Relationship Management (CRM)
12.2.5 Data Warehousing
12.2.6 Data Mining
12.2.7 Management Information System (MIS)
12.2.8 Executive Support System (ESS)
12.2.9 Decision support system (DSS)
12.3 Implementation of ERP
12.3.1 Implementation Methodologies and Approaches
12.3.2 ERP Life-Cycle
12.3.3 SDLC-ERP Implementation Cost and Time
12.3.4 ERP Project Management, Training
12.3.5 ERP Implementation Stakeholder’s Roles and Responsibilities
12.4 Overview of ERP in Some of the Key Functional Areas
12.5 Summary
12.6 Self–Assessment Exercises
12.7 Keywords
12.8 Further Readings
12.1 INTRODUCTION TO ENTERPRISE RESOURCE PLANNING
(Figure: History of enterprise resource planning. Source: https://www.omniaccounts.co.za/history-of-enterprise-resource-planning/)
The benefits of BPR can be significant. BPR can help organizations improve
operational efficiency, reduce costs, improve quality, enhance customer
satisfaction, and increase competitiveness. However, BPR is also a complex
and resource-intensive process that requires significant investment in time,
expertise, and technology. As such, organizations must carefully evaluate the
potential benefits and costs of BPR before embarking on a BPR initiative.
ESS provides executives with a range of tools and capabilities, such as data
visualization, trend analysis, and scenario planning. ESS is designed to
provide executives with a comprehensive view of their organization's
performance, including financial data, market trends, and competitor
analysis. ESS may also incorporate data from external sources, such as
industry reports, economic data, and news feeds. The key features of ESS
include:
ESS is a critical tool for top-level executives who need to make strategic
decisions. By providing executives with real-time data and analysis, ESS
helps them to stay informed and make informed decisions that can have a
significant impact on their organization's performance.
Each approach has its advantages and disadvantages, and the choice of
methodology depends on various factors, such as the size and complexity of
the organization, the scope of the ERP system, and the specific requirements
and needs of the organization. The successful implementation of an ERP
system requires careful planning, communication, and stakeholder
engagement.
12.7 KEYWORDS
1. Enterprise Resource Planning (ERP) - A software system that helps
organizations manage their business processes, operations, and resources
in a centralized manner.
2. Modules - Functional components within an ERP system that cover
different business functions, such as finance, human resources, sales and
distribution, production planning, material management, inventory
control, quality management, and marketing.
3. Implementation - The process of installing and configuring an ERP system to meet the specific needs of an organization.
4. Project Management - The practice of planning, executing, and
monitoring a project to achieve specific goals and objectives, such as the
implementation of an ERP system.
5. Stakeholders - Individuals or groups who have an interest or a role in
the implementation of an ERP system, such as ERP vendors, consultants,
top management, and end-users.
6. Customization - The process of modifying an ERP system to meet the
unique needs of an organization.
7. Data Migration - The process of transferring data from legacy systems
to the new ERP system.
8. User Acceptance Testing (UAT) - The process of testing an ERP
system to ensure that it meets the requirements and expectations of end-
users.
9. Training - The process of educating end-users on how to use the ERP
system effectively.
UNIT 13 APPLICATIONS OF IOT, AI AND VR
Objectives
After studying this unit, you will be able to:
• Understand the architecture of the Internet of Things and illustrate the
real-time IoT applications to make the smart world.
• Understand the history of AI, explore its evolution, and comprehend what led to the AI impacts we see in society today.
• Understand virtual reality (VR) and explore the different technologies, concepts, and development environments that can be used.
Structure
13.1 Introduction
13.2 Internet of Things (IoT)
13.2.1 Evolution of IoT
13.2.2 IoT Ecosystem Concepts
13.2.3 Components of an IoT Ecosystem
13.2.4 IoT Layered Architectures with Security Attacks
13.2.5 IoT Applications and Services
13.2.6 Unlocking the Massive Potential of an IoT Ecosystem for a Business
13.2.7 Key Challenges of IoT Implementation and Future Directions
13.2.8 Business Case: India (Rajkot)
13.3 Artificial Intelligence (AI) for Business
13.3.1 Historical Overview of Artificial Intelligence
13.3.2 Why is Artificial Intelligence Important?
13.3.3 Artificial Intelligence - Components and Approaches
13.3.4 What Types of Artificial Intelligence exist?
13.3.5 How do Artificial Intelligence, Machine Learning, and Deep Learning
Relate?
13.3.6 Artificial Intelligence at Work Today
13.3.7 Ethics of Artificial Intelligence
13.3.8 Future of Artificial Intelligence – Endless Opportunities and Growth
13.4 Virtual Reality (VR)
13.4.1 Introduction to Virtual Reality
13.4.2 Evolution of Virtual Reality
13.4.3 Basic Components of VR Technology
13.4.4 Applications of Virtual Reality
13.4.5 Advantages and Disadvantages of Virtual Reality
13.4.6 The Future of Virtual Reality
13.5 Summary
13.6 Self–Assessment Exercises
13.7 Keywords
13.8 Further Readings
13.1 INTRODUCTION
Internet of Things = Physical Devices + Controllers, Sensors and Actuators + Internet Connectivity
The concept underlying the IoT can alternatively be illustrated, as in Figure 2: the globalisation of IoT technology, characterised by the A's (anything, anyone, any service, any path, any place, any time, etc.) and the C's (collections, convergence, connectivity, computing, etc.), has grown faster than anticipated.
IoT Components
Sensors
• Mobile phone-based sensors - The embedded sensors included in smartphones, which are becoming more and more popular, have led academics to express an interest in developing smart IoT solutions.
• Medical sensors - Used to measure and keep track of the body's different medical parameters. Smart watches, wristbands, monitoring patches, and smart fabrics are examples of wearable technology.
• Neural sensors - Used to improve mental health and train the brain to concentrate, pay attention to detail, handle stress, and manage emotions.
• Environmental and chemical sensors - Employed to detect the presence of gases and other airborne particulates. Chemical sensors are used to assess food and agricultural items in supply chain applications, monitor food quality in smart kitchens, and track pollution levels in smart cities.
• Radio frequency identification (RFID) - Supply chain management, access control, identity authentication, and object tracking are just a few of the applications that employ RFID to draw conclusions and take further action.
Actuators
• Hydraulic actuators facilitate mechanical motion using fluid or hydraulic power.
• Pneumatic actuators use the pressure of compressed air.
• Electrical actuators use electrical energy.
Things - Physical/virtual objects.
Communication Technologies - IEEE 802.15.4, low power WiFi, 6LoWPAN, RFID, NFC, Sigfox, LoRaWAN, and other proprietary protocols for wireless networks.
Middleware - Oracle's Fusion Middleware, OpenIoT, MiddleWhere, and Hydra.
Applications of IoT - Home Automation, Smart Cities, Social Life and Entertainment, Health and Fitness, Smart Environment and Agriculture, Supply Chain and Logistics, and Energy Conservation.
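To make the sensor-to-application flow in the table above concrete, the following minimal sketch shows an IoT device publishing a temperature reading over MQTT, a lightweight messaging protocol commonly used alongside the technologies listed above. It assumes the open-source paho-mqtt client library (1.x API) and a hypothetical broker address and topic; a real deployment would add authentication and encryption.

# A minimal sketch: an IoT "thing" publishing a sensor reading over MQTT.
# Assumes the paho-mqtt 1.x library (pip install paho-mqtt) and a
# hypothetical broker at broker.example.com; not a production setup.
import json
import random
import time

import paho.mqtt.client as mqtt

client = mqtt.Client()                      # create an MQTT client
client.connect("broker.example.com", 1883)  # connect to the (hypothetical) broker

reading = {
    "device_id": "sensor-42",               # hypothetical device identifier
    "temperature_c": round(random.uniform(18.0, 32.0), 1),  # simulated value
    "timestamp": time.time(),
}

# Publish the JSON payload to a topic the application layer subscribes to.
client.publish("factory/line1/temperature", json.dumps(reading))
client.disconnect()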
Three-Layer Architecture
It is an extremely straightforward design that complies with the core principles of the Internet of Things. Three layers (perception, network, and application) were advised in the early stages of IoT development.
Network Layer – Between the application layer and the perception layer sits the transmission layer/network layer, which moves and sends the data gathered from actual objects via sensors. The transmission medium may be either wireless or wired technology. Additionally, it takes on responsibility for linking networks, network devices, and intelligent objects. It has significant security weaknesses that compromise the authenticity and integrity of the data being exchanged over the network. Typical security concerns that impact the network layer include the following:
• Denial of Service (DoS) Attack: A DoS attack targets legitimate users of devices or other network resources. It prevents users from using the targeted devices or network resources by flooding them with repetitive requests.
Four-Layer Architecture
The three-layer architecture was unable to meet all IoT requirements, owing to ongoing development in the field. As a result, researchers suggested a four-layer architecture. It contains the same three layers as the prior architecture, with an additional fourth layer known as the support layer. The three layers have the same functionality as in the three-layer architecture discussed above.
Five-Layer Architecture
The four-layer architecture left certain additional security and storage concerns unaddressed. To make the IoT secure, researchers suggested a five-layer architecture. Like the earlier architectures, it has the perception layer, the transport layer, and the application layer; two further layers are proposed, namely the processing layer and the business layer. The newly proposed architecture has the capacity to safeguard IoT applications. The following describes how these layers function and how security attacks may affect them:
Processing Layer - The processing layer is also known as the middleware layer. Data transmitted from the transport layer is collected and processed by it. It is in charge of eliminating unwanted, useless data and extracting important information. It also addresses the IoT big data dilemma; numerous risks may hamper IoT performance by affecting the processing layer. Typical attacks include:
• Exhaustion: An attacker employs exhaustion to hinder processing in the IoT structure. It happens as a result of attacks, such as a DoS attack, in which the attacker floods the target with requests in an effort to disable the network for users. It can also be the result of earlier attacks intended to deplete system resources such as the battery and memory. Because IoT is distributed in nature, the risks associated with exhaustion are limited, and it is significantly simpler to put protective measures in place against it.
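One common protective measure against the request flooding described above is rate limiting at the processing layer. The sketch below shows a simple token-bucket limiter in Python; the capacity and refill rate are illustrative values only, not a recommendation for any particular system.

# A minimal token-bucket rate limiter: each device gets a bucket of
# "tokens"; a request is served only if a token is available, so a
# flooding device is throttled instead of exhausting the system.
import time

class TokenBucket:
    def __init__(self, capacity=10, refill_per_sec=2.0):
        self.capacity = capacity          # max burst size (illustrative)
        self.tokens = float(capacity)
        self.refill_per_sec = refill_per_sec
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, up to capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True                   # request served
        return False                      # request dropped / throttled

bucket = TokenBucket()
served = sum(bucket.allow() for _ in range(100))  # simulate a burst of 100 requests
print(f"served {served} of 100 burst requests")   # most are throttled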
Figure 4 depicts the hierarchy of all proposed layer architectures for the
Internet of Things (IoT), with three, four, and five layers, respectively.
• Smart Supply Chain and Logistics - IoT aims to make business and information system procedures in the real world simpler. Using sensor technologies like RFID and NFC, it is simple to trace commodities in the supply chain from the point of manufacture to the final distribution points. Real-time data is captured, processed, and tracked, which ultimately improves the effectiveness of supply chain systems.
A draft IoT policy has been prepared by the Indian government (107). The strategy consists of five vertical pillars (Demonstration Centres, Capacity Building and Incubation, R&D and Innovation, Incentives and Engagements, and Human Resource Development) and two horizontal supports (Standards, and Governance Structure).
According to NASSCOM, the trade group for the IT sector in India, the Indian IoT market is expected to reach USD 15 billion by 2020, with around 120 companies currently providing solutions. Smart cities, industrial IoT, and health care are identified as three of the most important growth opportunities involving government-business partnerships.
Over the past few years, AI has grown into a powerful technology that enables machines to think and act like humans. It has caught the eye of IT companies all around the world and is regarded as the third significant technical innovation, after the creation of mobile and cloud platforms. Some even use the term "fourth industrial revolution" to describe it. Over the past ten years, businesses have migrated in increasing numbers to internet platforms and cloud service providers. As a result of these advancements, computers are now able to process far more data, and a large amount of fresh data has also been produced that these systems can analyse.
Some believe that the fourth industrial revolution, which will differ from the past three in some ways, is about to start. The notion of what it means to be human has been debated throughout history, from the development of steam and water power through the industrial revolution and the computer era to the present day. Smarter technology in our offices and factories, and networked equipment that can interact, view the entire process, and make autonomous decisions, are just a few of the ways the fourth industrial revolution will benefit organisations. One of its key benefits is the ability to increase income levels and enhance living standards for the majority of people worldwide. As humans, robots, and smart devices improve the quality of life of the world's population, our businesses and organisations are becoming "smarter" and more productive.
One of the earliest articles on artificial neural networks is the 1943 study by Warren McCulloch and Walter Pitts, which formalised how fundamental logical operations from propositional logic may be computed in a connectionist setting. Later, in 1950, Alan Turing raised the question of whether computers are capable of thought and offered the "imitation game" as a method of evaluating the reasoning and thought processes of computing apparatuses. During a workshop for the Dartmouth Summer Research Project on Artificial Intelligence in 1956, John McCarthy coined the term artificial intelligence (AI). The first chatbot, ELIZA, was created in 1964 by Joseph Weizenbaum in the MIT Artificial Intelligence Laboratory. ELIZA was designed as a rule-based simulation of a psychotherapist that could respond to queries from users.
The Roomba robot vacuum was introduced by iRobot in 2002. The AI-based personal assistants Siri, Google Assistant, Alexa, Cortana, and Bixby, from Apple, Google, Amazon, Microsoft, and Samsung respectively, have become steadily better at understanding natural language and capable of carrying out a larger range of tasks. These technologies have improved over the past ten years, despite the fact that they initially did not work very effectively. Much of the focus in this field has been on deep learning since 2000. The Artificial Neural Network (ANN) concept, a system created to mimic how brain cells work, is the foundation of deep learning. Some of the most significant recent achievements are based on Generative Adversarial Networks (GANs). After defeating Lee Sedol in four of five games in 2016, Google's AlphaGo defeated Chinese Go expert Ke Jie in a series of games in 2017. In a 1v1 exhibition match in 2017, OpenAI's Dota 2 bot triumphed against Dendi, a Ukrainian professional player. In a game with limited information and practically infinite future possibilities, this triumph was a noteworthy demonstration of power. Later, in 2019, OG, the then-reigning Dota world champions, were defeated in back-to-back 5v5 games by a new version of the same bot called OpenAI Five. Also in 2019, DeepMind's AlphaStar bot earned the highest rating possible in StarCraft II.
• When algorithms are self-learning, the answers are in the data; AI is applied to extract them. This maximises the value of the data. Even when everyone is using the same methods, the best data can give you a competitive edge: the best data prevails.
• Making interactions with machines seem natural and human is the goal of cognitive technology, a subfield of artificial intelligence. The ultimate objective of artificial intelligence (AI) and cognitive computing is for a machine to emulate human processes by being able to comprehend words and images and then respond logically.
• Computer vision makes use of deep learning and pattern recognition to determine what is in a picture or video. Machines that can perceive, analyse, and interpret visuals and their surroundings can capture real-time photos and videos.
• Natural language processing (NLP) is the process by which computers analyse, comprehend, and produce human language, including speech. The next phase of NLP, known as natural language interaction, enables users to communicate with computers in everyday language to complete tasks.
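As a toy flavour of the NLP idea in the last bullet, the sketch below tokenises a sentence and counts word frequencies using only the Python standard library. Real NLP systems rely on trained language models; this is purely an illustration of a machine mechanically analysing text.

# A toy flavour of natural language processing: normalise a text,
# split it into word tokens, and count the most frequent words.
# Real NLP systems use trained language models; this is only a sketch.
import re
from collections import Counter

text = ("Natural language processing lets computers analyse, "
        "comprehend, and produce human language.")

tokens = re.findall(r"[a-z]+", text.lower())   # crude tokenisation
counts = Counter(tokens)

print(counts.most_common(3))   # e.g. [('language', 2), ('natural', 1), ...]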
Types of Immersion
The term "immersion" refers to the degree to which realistic physical stimuli, such as light patterns and sound waves, are delivered to the senses of sight, hearing, and touch in order to produce a potent illusion of reality. Ernest Adams claims that there are three primary types of immersion:
• Tactical immersion - Tactical immersion is felt when carrying out
skilled tactile activities. As they perfect the actions that lead to victory,
players experience being "in the zone."
• Strategic immersion – It is more cerebral, involving a mental challenge. Chess players are immersed in strategy as they select the best move from a wide range of options.
• Narrative immersion - Similar to what one feels while reading a book or watching a movie, narrative immersion happens when gamers become immersed in a story.
Staffan Björk and Jussi Holopainen categorise immersion into related groups, referred to correspondingly as sensory-motoric immersion, cognitive immersion, and emotional immersion. They also include three more categories in addition to these:
• Spatial immersion - A player experiences spatial immersion when they
perceive the virtual environment to be perceptually convincing. A virtual
world seems and feels "real" to the player as if they are actually "there."
• Psychological immersion - This happens when a person mistakenly believes that they are in a game instead of real life.
• Sensory immersion - As the player merges with the picture medium, they experience a oneness of time and space that impacts perception and consciousness.
• "The Sword of Damocles"— first virtual reality system, that was really
built as opposed to just being a concept. The first Head Mounted Display
(HMD), with accurate head tracking, is created by Ivan Sutherland. In
accordance with the user's head position and orientation, it supported a
stereo view that was updated accurately.
13.4.4 Applications of Virtual Reality
13.5 SUMMARY
The Internet of Things (IoT), artificial intelligence (AI), and virtual reality
(VR) are three cutting-edge technological advancements that are examined in
this unit. Also highlighted is how these technologies might affect businesses
in the future.
Users engage with a virtual reality (VR) environment by moving their bodies
while a computer provides them with audio and visual stimuli. A type of
technology known as virtual reality (VR) focuses on generating images in
three dimensions and producing a view that takes up much of the graphical
user interface. It's like actually creating the atmosphere we've always wanted.
13.7 KEYWORDS
1. Accuracy - This statistic indicates how successful an AI model is at predicting outcomes: the number of correct predictions divided by the total number of predictions made (see the sketch after this list).
10. Cellular Network - A radio network distributed over land through cells
where each cell includes a fixed-location transceiver known as a base
station. These cells together provide radio coverage over larger
geographical areas. User equipment (UE), such as mobile phones, is
therefore able to communicate even if the equipment is moving across
cells during transmission.
11. Chatbot - It is a software application that imitates human-to-human
conversation through text or voice commands.
17. Ecosystem IoT - Refers to the multiple layers that run from devices at the edge to the middleware. The data is transported to a place that has applications that can do the processing and analytics.
18. Eye tracking - The ability for a head mounted display (HMD) to read the
position of the experiencer’s eyes versus their head.
19. Facial recognition – It is a computer program that can recognize or
authenticate a person.
20. Field of view (FOV) - The view that is visible to the experiencer while rotating their head from a fixed body position.
22. Head mounted display (HMD) - A set of goggles or a helmet with tiny
monitors in front of each eye to generate images seen by the wearer as
three-dimensional.
26. Machine Learning - The ability of computers to learn without being explicitly programmed. Computers "learn" via patterns they detect and adapt their behaviour as a result. Although the field is still in its early days, it is moving forward quickly, and it is one of the most searched artificial intelligence terms.
31. Smart Cities - A concept that tries to create a more intelligent city
infrastructure by using modern information and communication
technologies. Smart cities are about a more flexible adaptation to certain
circumstances, more efficient use of resources, improved quality of life,
fluent transportation and more. This will be achieved through networking
and integrated information exchange between humans and things.
33. Training Data - The data used to train a machine learning algorithm: the particular dataset utilised to train rather than to test.
34. Unsupervised learning – A form of machine learning technique that draws conclusions from datasets containing unannotated data.
35. Validation Data - This data is structured similarly to training data, with
input and labels, and it’s used to evaluate a recently trained model
against new data and assess performance, with a particular emphasis on
detecting overfitting.
36. Virtual reality (VR) - Places the experiencer in another location entirely.
Whether that location has been generated by a computer or captured by
video, it entirely occludes the experiencer’s natural surroundings.
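The sketch referred to under the Accuracy keyword ties together training data, validation data, and accuracy. The dataset and the threshold "model" are invented purely for illustration; the point is only how accuracy is computed on held-out data.

# Accuracy = correct predictions / total predictions, measured here on
# held-out validation data. The "model" is a trivial threshold rule and
# the numbers are made up purely for illustration.

# (input, label) pairs: label is 1 if the input is above some cutoff.
training_data   = [(1, 0), (2, 0), (3, 0), (6, 1), (7, 1), (8, 1)]
validation_data = [(2, 0), (4, 1), (5, 1), (9, 1)]

def predict(x, threshold=4.5):   # "model": threshold read off the training data
    return 1 if x > threshold else 0

correct = sum(predict(x) == y for x, y in validation_data)
accuracy = correct / len(validation_data)
print(f"validation accuracy: {accuracy:.2f}")   # 3 of 4 correct -> 0.75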
UNIT 14 BLOCKCHAIN TECHNOLOGY
Objectives
After studying this unit, you will be able to:
• Understand the fundamental concepts of blockchain technology,
including decentralized architecture, consensus mechanisms, and
cryptographic algorithms.
• Analyse real-world use cases of blockchain, such as cryptocurrencies,
supply chain management, digital identity, and smart contracts.
• Evaluate the security and privacy implications of blockchain technology,
including vulnerabilities, attacks, and privacy-preserving techniques.
• Explore the economic and social impact of blockchain technology.
• Apply critical thinking and problem-solving skills to identify
opportunities and challenges of blockchain technology.
Structure
14.0 Introduction to Blockchain Technology
14.1 Cryptography and Consensus Mechanism
14.1.1 Cryptographic Algorithms and their role in blockchain technology
14.1.2 Byzantine fault tolerance and consensus mechanisms
14.1.3 Proof of Work, Proof of Stake, and other consensus mechanisms
14.2 Blockchain Architecture and Platforms
14.2.1 Types of blockchain networks: public, private, and hybrid
14.2.2 Introduction to Blockchain platforms: Ethereum, Hyperledger, and EOS
14.2.3 Smart Contracts and their applications
14.2.4 Gas, Ether, and Other blockchain-specific concepts
14.3 Security and Privacy on the Blockchain
14.3.1 Threats to blockchain security and Privacy
14.3.2 Prevention and mitigation of attacks: double spending, 51% attacks, and
others
14.3.3 Privacy-preserving techniques: ring signatures, zk-SNARKs, and others
14.3.4 Case Studies of blockchain attacks and their impact
14.4 Use Cases of Blockchain Technology
14.5 Social and Economic Impact of Blockchain Technology
14.6 Future of Blockchain Technology
14.7 Summary
14.8 Self–Assessment Exercises
14.9 Keywords
14.10 Further Readings
14.0 INTRODUCTION TO BLOCKCHAIN TECHNOLOGY
Blockchain technology is a distributed and decentralized digital ledger that
stores data in a secure, transparent, and immutable way. It was originally
designed to enable the creation and exchange of digital currencies, such as
Bitcoin, but has since evolved into a versatile platform for a wide range of
applications and industries.
Overall, the history of blockchain technology is still unfolding, and its future
evolution and impact are likely to be shaped by technological innovation,
market dynamics, and societal values.
Centralized Systems
• Controlled by a single entity, such as a company, organization, or government.
• Data and resources are stored and managed in a central location.
• A central authority makes decisions and enforces rules.
• Users rely on intermediaries to access and manage data and resources.
• More vulnerable to hacking and data breaches.
Decentralized Systems
• Controlled by a distributed network of nodes.
• Data and resources are distributed among nodes and managed through consensus.
• No central authority; decisions are made through consensus.
• Users have direct control over their data and resources.
• More resilient to hacking and data breaches.
• Public key cryptography: a technique that uses a pair of keys (public and private) to encrypt and decrypt data and to verify digital signatures.
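For intuition about how a public/private key pair works, here is a toy walk-through using the classic textbook RSA example with tiny primes. Real systems use keys hundreds of digits long and audited cryptographic libraries, so this is illustrative only.

# Toy RSA: encrypt with the public key (e, n), decrypt with the
# private key (d, n). These tiny primes offer no real security.
p, q = 61, 53
n = p * q                  # 3233, the shared modulus
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent, coprime with phi
d = pow(e, -1, phi)        # private exponent 2753 (modular inverse, Python 3.8+)

message = 65                        # a message encoded as a number < n
ciphertext = pow(message, e, n)     # anyone can encrypt with the PUBLIC key
recovered = pow(ciphertext, d, n)   # only the PRIVATE key holder can decrypt

print(ciphertext, recovered)        # 2790 65 -> the round trip succeeds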
Consensus mechanisms are protocols that ensure that all nodes in the network
agree on the state of the blockchain. Consensus mechanisms are necessary
because the blockchain is a distributed ledger that is maintained by a network
of nodes, and there is no central authority to validate transactions. The most
common consensus mechanisms used in blockchain include Proof of Work
(PoW), Proof of Stake (PoS), and Delegated Proof of Stake (DPoS).
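As a minimal sketch of the Proof of Work idea mentioned above: a miner repeatedly varies a nonce until the hash of the block data meets a difficulty target. The four-leading-zeros target here is an illustrative difficulty, far easier than any real network's.

# Proof of Work in miniature: find a nonce so that the SHA-256 hash of
# (block data + nonce) starts with a required number of zeros.
import hashlib

def mine(block_data: str, difficulty: int = 4) -> int:
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce          # this nonce "proves" the work was done
        nonce += 1

nonce = mine("alice pays bob 5")
print("found nonce:", nonce)      # anyone can verify it with a single hash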
• Merkle Trees: A Merkle tree is a tree-like structure in which each leaf node is a hash of a data block, and each non-leaf node is a hash of the concatenation of its child nodes. In blockchain technology, Merkle trees are used to efficiently verify the integrity of transactions and blocks.
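A minimal sketch of computing a Merkle root over a list of transactions follows, pairing and hashing upward until one root remains. Duplicating the last node when a level has an odd count is one common convention (used by Bitcoin); implementations differ.

# Build a Merkle root: hash each transaction, then repeatedly hash
# concatenated pairs until a single root hash remains.
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(transactions):
    level = [sha256(tx.encode()) for tx in transactions]   # leaf hashes
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])              # duplicate last node if odd
        level = [sha256(level[i] + level[i + 1])  # hash each adjacent pair
                 for i in range(0, len(level), 2)]
    return level[0].hex()

print(merkle_root(["tx1", "tx2", "tx3", "tx4"]))
# Changing any single transaction changes the root, exposing tampering.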
14.2 BLOCKCHAIN ARCHITECTURE AND PLATFORMS
Blockchain architecture refers to the way in which the various components of a blockchain network are organized and work together, as depicted in Figure 1.
There are several different blockchain architectures and platforms, each with
its own unique features and capabilities.
Source: https://www.researchgate.net/figure/Blockchain-architecture_fig1_343137671
Here are some of the most commonly used blockchain architectures and
platforms:
Source: https://www.researchgate.net/figure/Types-of-blockchain-networks_fig4_366925431
1) Public Blockchain Networks: These are open to anyone who wants to participate in the network. Anyone can read, write, and validate transactions on the network. The most well-known example of a public blockchain network is Bitcoin. Public blockchain networks are decentralized and do not have a central authority controlling the network. The security of the network is ensured through consensus mechanisms like Proof of Work or Proof of Stake.
Nodes: Computers that are connected to the blockchain network and participate in the process of verifying transactions and maintaining the blockchain ledger. There are different types of nodes, including full nodes and light nodes, which differ in the amount of data they store and their level of participation in the network.
Forks: A fork occurs when a blockchain splits into two separate chains due to a disagreement among the network participants. There are two types of forks: soft forks and hard forks. Soft forks are temporary changes to the blockchain protocol that are backward-compatible with older versions, while hard forks are permanent changes that are not compatible with older versions.
Tokens: Digital assets that are created and managed on a blockchain network. Tokens can represent anything of value, such as cryptocurrency, assets, or utility tokens. Tokens are often created using smart contracts on the Ethereum blockchain.
14.7 SUMMARY
In summary, blockchain technology is a decentralized, secure digital ledger that allows for transparent transactions. It has the potential to transform various industries, including finance, supply chain, and healthcare, by providing increased transparency, efficiency, and security.
14.9 KEYWORDS
1) Blockchain: A decentralized and secure digital ledger that records transactions in a series of blocks that are cryptographically linked to each other (a minimal sketch follows this list).
2) Cryptography: The practice of secure communication in the presence of
third parties.
3) Consensus algorithm: A protocol used to verify transactions and ensure
that the network agrees on the current state of the ledger.
4) Decentralization: The process of distributing power away from a central
authority, making the network more secure and resistant to attack.
5) Hash: A fixed-length code, computed from a block's contents, that uniquely identifies the block in a blockchain.
6) Smart contract: A self-executing contract that is written in code and
stored on a blockchain. It contains the terms of an agreement between
parties and is automatically executed when certain conditions are met.
7) Token: A digital asset that is created and managed on a blockchain. It can represent a variety of assets, such as currencies, commodities, or even real estate.
8) Transaction: The transfer of data on a blockchain, typically involving the
exchange of cryptocurrency or other digital assets.
9) Mining: The process of validating transactions and adding new blocks to
the blockchain.
10) Public key cryptography: A cryptographic system that uses two keys, a
public key and a private key, to encrypt and decrypt data.
11) Private key: A secret key that is used to encrypt and decrypt data in a
public key cryptography system.
12) Public key: A key that is made publicly available and used to encrypt
data in a public key cryptography system.
13) Permissioned blockchain: A private blockchain that only allows
authorized users to participate.
14) Permissionless blockchain: A public blockchain that is open to anyone
and allows anyone to participate in the network.
15) Fork: A change to the blockchain protocol that can result in the creation
of a new cryptocurrency.
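The minimal sketch referred to under the Blockchain keyword ties several of the terms above together: each block stores the hash of the previous block, so tampering with history is detectable. It omits networking, consensus, and signatures, and is a conceptual illustration only.

# A minimal hash-linked chain: each block stores the hash of the
# previous block, so altering any earlier block breaks every later link.
import hashlib
import json

def block_hash(block):
    # Hash the block's contents deterministically.
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

chain = [{"index": 0, "data": "genesis", "prev_hash": "0" * 64}]
for i, data in enumerate(["tx: A->B 5", "tx: B->C 2"], start=1):
    chain.append({"index": i, "data": data,
                  "prev_hash": block_hash(chain[-1])})   # link to previous block

def chain_valid(chain):
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

print("chain valid:", chain_valid(chain))   # True

chain[1]["data"] = "tx: A->B 500"           # tamper with history...
print("chain valid:", chain_valid(chain))   # False: tampering detected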