Chapter-4.2

Chapter 4:

Data Management
DR. SARAH NAIEM
SOURCE BOOK: INTRODUCTION TO INFORMATION SYSTEMS, 1ST EDITION [EARLY RELEASE]
BY: PROF. MANAL ABDEL-KADER ABDEL-FATTAH
Big Data
We are amassing an ever-expanding reservoir of data and information from a wide array of sources,
including company documents, emails, web pages, credit card transactions, phone messages, stock
trades, memos, address books, and radiology scans.
Further data arrives through blogs, YouTube, and many other channels, including all the data
organizations collect through events.
This huge amount of data requires a great deal of work from both organizations and individuals, who
face the overwhelming task of processing a vast and ever-accelerating volume of data.
According to IDC, a technology research firm, the world generates exabytes of data annually (with an
exabyte equal to one million terabytes).
As discussed earlier, this enormous amount of data is referred to as "Big Data," with "Big Data" being
capitalized to distinguish it from traditional data in large quantities.
Big Data
At its essence, Big Data revolves around making predictions. These predictions do not arise from
teaching computers to emulate human thinking; rather, they result from the application of
mathematical principles to massive datasets, allowing us to infer probabilities.
The effectiveness of Big Data systems derives from their access to vast datasets upon which to base
predictions.
Furthermore, these systems are designed to enhance their performance over time by identifying the
most valuable signals and patterns as additional data is fed into them.
Consider the following examples:
• Estimating the probability that an email message is spam.
• Assessing the likelihood that the typed letters "teh" should be corrected to "the."
• Determining the probability that a jaywalker's trajectory and velocity suggest they will safely
cross the street, indicating that a self-driving car only needs to make a minor adjustment in speed.
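The spam example above can be made concrete with a minimal Python sketch of inferring a probability by counting. The four labeled messages and the function name spam_probability are hypothetical, chosen only for illustration; real systems learn from vastly larger datasets and use more careful statistics.

```python
# Hypothetical labeled messages: (text, is_spam). Real systems learn from millions.
MESSAGES = [
    ("win a free prize now", True),
    ("free offer just for you", True),
    ("meeting moved to monday", False),
    ("free lunch at the office", False),
]


def spam_probability(word, messages):
    """Estimate P(spam | message contains word) by simple counting."""
    labels = [is_spam for text, is_spam in messages if word in text.split()]
    if not labels:
        return None  # no evidence for this word yet
    return sum(labels) / len(labels)


print(spam_probability("free", MESSAGES))     # 2 of the 3 messages with "free" are spam
print(spam_probability("meeting", MESSAGES))  # 0 of the 1 message with "meeting" is spam
```

Feeding in more labeled messages changes the counts and therefore the estimate, which is the sense in which such systems improve as additional data arrives.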
Defining Big Data
Firstly, according to Gartner, Big Data is characterized by its diversity, high volume, and rapid velocity.
It comprises information assets that demand novel processing methods to facilitate improved
decision-making, the discovery of insights, and the optimization of processes.
Secondly, the Big Data Institute (TBDI; www.the-bigdatainstitute.com) defines Big Data as expansive
datasets that possess the following attributes:
• Diverse in nature,
• Comprising structured, unstructured, and semi-structured data,
• Generated at a high velocity with an unpredictable pattern,
• Not neatly fitting into conventional, structured, relational databases
• Requiring sophisticated information systems for effective capture, processing, transformation, and
analysis within a reasonable timeframe.
Defining Big Data
Big Data typically encompasses the following categories, though it's important to note that this list is
not exhaustive and may expand with the emergence of new data sources:
Traditional enterprise data: Examples include customer information from customer relationship
management systems, transactional enterprise resource planning data, web store transactions,
operational data, and general ledger data.
Machine-generated/sensor data: This category includes data from sources like smart meters,
manufacturing sensors, sensors integrated into smartphones, automobiles, airplane engines,
industrial machines, equipment logs, and trading systems.
Social data: Social data comprises information like customer feedback comments, microblogging site
content (e.g., Twitter), and content from social media platforms such as Facebook, YouTube, and
LinkedIn.
Images from various devices: These images are captured by countless devices worldwide, ranging
from digital cameras and camera phones to medical scanners and security cameras.
Big Data Examples
The Sloan Digital Sky Survey in New Mexico, which commenced in 2000, amassed more data
within its initial weeks than had been collected in the entire history of astronomy. By 2013, its
archive contained hundreds of terabytes of data.
Facebook users upload more than 10 million new photos every hour, and they click "like" or
leave comments nearly 3 billion times each day.
Google's YouTube service, with over 800 million monthly users, saw users uploading more than
an hour of video every second.
On Twitter, the number of messages grew at a staggering rate of 200 percent each year,
surpassing 450 million tweets per day by mid-2013. Twitter has since been renamed "X",
following its acquisition by Elon Musk.
Netflix and Spotify use big data to recommend movies, shows, or songs based on user history
and preferences.
Characteristics of Big Data
Big Data possesses three distinct characteristics: volume, velocity, and variety, setting it apart
from conventional data.
Volume
Definition: Volume refers to the vast amounts of data generated every second across the globe.
This data comes from diverse sources such as social media, IoT devices, sensors, business
transactions, and digital communications.
Example: Social media platforms like Facebook generate petabytes of data daily, including posts,
images, and videos. Similarly, companies like Walmart collect terabytes of sales data from
thousands of stores.
Challenge: Storing, managing, and processing such massive datasets require scalable storage
solutions and distributed computing frameworks, such as Hadoop or cloud platforms.
Characteristics of Big Data
Velocity
Definition: Velocity describes the speed at which data is generated, collected, and processed. It
highlights the need for real-time or near-real-time data handling to make timely decisions.
Example: Streaming platforms like Netflix and stock trading systems generate data in
milliseconds, requiring instant processing to deliver personalized recommendations or execute
trades.
Challenge: Traditional systems cannot keep up with the high rate of data inflow. Technologies
like Apache Kafka or Spark Streaming are often used to handle this continuous data flow.
Variety
Definition: Variety refers to the diverse formats and types of data that big data encompasses.
This includes structured data (e.g., databases), semi-structured data (e.g., XML, JSON), and
unstructured data (e.g., images, videos, social media posts).
Example: A single organization might deal with customer reviews (text), sales data (structured),
and website analytics (semi-structured). Combining these to extract insights is challenging.
Challenge: Integrating and analyzing such diverse datasets requires flexible data models and
advanced tools capable of handling heterogeneous data.
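To illustrate the variety challenge, the following Python sketch merges three small inputs: structured sales data (CSV), semi-structured website analytics (JSON), and unstructured review text. The sample data, the field names, and the function name combine are all assumptions made for illustration; this is not a prescribed integration method.

```python
import csv
import io
import json

# Hypothetical sample inputs in three formats.
SALES_CSV = "product,units\nwidget,3\ngadget,5\n"  # structured
ANALYTICS_JSON = '[{"product": "widget", "views": 40}, {"product": "gadget"}]'  # semi-structured
REVIEWS = ["The widget works well", "widget arrived late"]  # unstructured text


def combine(sales_csv, analytics_json, reviews):
    """Merge the three sources into one record per product."""
    records = {}
    for row in csv.DictReader(io.StringIO(sales_csv)):
        records[row["product"]] = {"units": int(row["units"]), "views": None, "mentions": 0}
    for item in json.loads(analytics_json):
        # Semi-structured records may omit fields, so read them defensively.
        if item["product"] in records:
            records[item["product"]]["views"] = item.get("views")
    for text in reviews:
        # Unstructured text has no fields at all; fall back to keyword matching.
        for product in records:
            if product in text.lower():
                records[product]["mentions"] += 1
    return records


print(combine(SALES_CSV, ANALYTICS_JSON, REVIEWS))
```

Each format needs its own handling before the data can sit in one record, which is why flexible data models matter when sources are heterogeneous.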
Big Data Sources
Big data sources can be broadly categorized into two main types: internal data sources and
external data sources. An example of big data sources is depicted in Figure 4.9.
Big Data Sources
Internal Data Sources:
Internal data refers to data that is generated, owned, and controlled by a company or
organization. This data is typically generated through the company's day-to-day operations,
transactions, and interactions with customers, suppliers, and other stakeholders.
Examples of internal data sources include customer transaction records, sales data, employee
records, production data, financial statements, and any other data that is collected and stored
by the company as part of its business activities.
Companies have full control over internal data and can use it for various purposes, such as
business analytics, decision-making, and improving internal processes.
Big Data Sources
External Data Sources:
External Data refers to data that is not generated, owned, or controlled by the company. This data is
typically sourced from outside the organization and can come from a wide range of external providers
and public sources.
Examples of external data sources include:
• Publicly available data: Information from government agencies, public databases, social media, news feeds,
and other publicly accessible sources.
• Third-party data providers: Companies that specialize in collecting and aggregating data, such as market
research firms, data brokers, and data syndication services.
• Internet of Things (IoT) devices: Data generated by sensors and devices connected to the internet, such as
weather sensors, GPS devices, and smart appliances.
• Social media and web data: User-generated content, online reviews, social media posts, and website analytics.
Companies do not have direct control over external data, but they can acquire and use it to gain
insights, enhance decision-making, and supplement their internal data.
Big Data Sources
Both internal and external data sources are valuable for organizations in the era of big data, as
they provide a wealth of information that can be analyzed to uncover insights, trends, and
opportunities.
Effective data management and integration strategies are often needed to harness the full
potential of both types of data sources for business intelligence and strategic decision-making.
Database Management Systems
Ensuring the creation and implementation of the appropriate database system is vital
to guarantee that the database adequately supports both business operations and
objectives.
As previously discussed, a DBMS comprises a collection of software programs serving as
an intermediary between a database and application programs or users.
DBMSs are utilized for managing diverse types of data for various purposes.
Database management systems span a spectrum from small, affordable software
packages to sophisticated systems with price tags in the hundreds of thousands of
dollars.
The following sections will explore a few popular alternatives.
Flat File
A flat file is a simple type of database that does not link records together. It typically stores and
manipulates a single table or file and does not follow any of the database models discussed earlier,
such as the relational model.
Many programs like spreadsheets and word processors include basic flat file features. These tools can
organize tables, perform simple calculations, and make comparisons.
◦ For example, Microsoft OneNote allows users to jot down notes, ideas, and thoughts in a flat-file format.
Notes can be placed anywhere on a page or within sections and tabs.
◦ Once created, these notes can be retrieved, copied, or shared with other programs like word processors or
spreadsheets.
◦ A company like ResMed, which develops respiratory products, uses OneNote to collect ideas for new products
and track their progress, helping them save costs and improve efficiency.
Similarly, Evernote is a free online tool for storing notes and other information. With the large storage
capacity of modern hard drives, databases that handle unstructured data like these are becoming
more popular.
Single and Multiple User Database
Single-User Databases: These databases are made for one person to use on their personal
computer. Examples include Microsoft Access, FileMaker Pro, and Microsoft InfoPath. InfoPath,
part of the Microsoft Office suite, helps users gather and organize information with built-in
forms for tasks like entering expenses or tracking timesheets.
Multiuser Databases: Organizations of all sizes often need multiuser database management
systems (DBMSs) to share information across networks. These powerful systems, though more
expensive, allow many users, sometimes hundreds, to access the same database at the same time.
Popular providers of such systems include Oracle, Microsoft, Sybase, and IBM.
While single-user databases like Microsoft Access can be modified for multiuser use, they
usually have limits on how many people can access them at once.
All DBMSs have similar features: they provide a user interface, store and retrieve data, let users
update and manipulate the data, and generate reports. Multiuser DBMSs can handle complex
tasks and connect users worldwide. For example, the Linde Group, a global industrial gas
company, uses its DBMS to support 50,000 employees in 100 countries, all accessing a central
database in Munich, Germany.
Schema
A schema is like a blueprint for the database. When setting up a large database, the DBMS needs
information about the structure of the data, its relationships, and how different users will access
it.
This information, called a schema, defines the tables and components for each user. In large
systems like Oracle, schemas are essential and can either be part of the database or saved
separately.
The DBMS uses the schema to understand how to retrieve the requested data in connection
with other data.
Schematic diagrams show the logical and physical design of the database that is about to be
built.
Creating and Modifying the Database
Creating or updating a database involves using a schema, which is entered
into the DBMS by database staff.
This process uses a Data Definition Language (DDL), which is a set of
instructions for defining and describing the data and its relationships in the
database.
Using DDL, the database creator can specify the structure of the data and how
it is connected in the schema.
Essentially, DDL describes how the data is organized and how it can be
accessed logically within the database.
Figure 4.11 illustrates a simplified example of a DDL CREATE TABLE
statement, which creates a table named EMP_TEST.
• Note the column specifications, data type, and precision. Depending on
the particular DBMS in use, different terms and commands may be
employed.
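As a rough illustration of what a DDL statement looks like, the sketch below issues a CREATE TABLE statement for a table named EMP_TEST through Python's built-in sqlite3 module. The column names, data types, and sizes are assumptions made up for this example; they are not taken from Figure 4.11, and SQLite is used only because it needs no installation.

```python
import sqlite3

# An in-memory SQLite database stands in for a full DBMS.
conn = sqlite3.connect(":memory:")

# Hypothetical columns: the names, types, and sizes are illustrative only.
conn.execute(
    """
    CREATE TABLE EMP_TEST (
        EMP_ID    INTEGER PRIMARY KEY,
        LAST_NAME VARCHAR(30) NOT NULL,
        HIRE_DATE DATE,
        SALARY    NUMERIC(8, 2)
    )
    """
)

# Ask the DBMS to describe the structure it has just recorded.
columns = [row[1] for row in conn.execute("PRAGMA table_info(EMP_TEST)")]
print(columns)
```

SQLite accepts declarations such as VARCHAR(30) and NUMERIC(8, 2) but does not strictly enforce the stated length or precision, which is one instance of DBMSs differing in how they treat the same DDL.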
Creating and Modifying the Database
Another important step in setting up a database is creating a data dictionary. This is a detailed guide that describes all
the data in the database.
A data dictionary may encompass a description of data flows, record organization, and data-processing requirements and
management.
Having a data dictionary is essential to keep a database organized and free of unnecessary duplication, thus ensuring the
accuracy and reliability of the stored information. It also streamlines any required modifications to the database.
Moreover, computer and system programmers find data dictionaries useful as they provide a comprehensive overview of
the data elements within a database, which helps them create the code necessary for accessing the data.
A data dictionary includes information such as:
• The name of each data item.
• Any alternative names or aliases for the item.
• The allowed range of values.
• The type of data (e.g., text or numbers).
• The storage space required.
• Details about who is responsible for updating the data and who can access it.
• A list of reports that use the data.
Creating and Modifying the Database
For instance, a data dictionary entry for the part number of an inventory item could include details
such as:
• The name of the individual who made the data dictionary entry (D. Bordwell)
• The date when the entry was created (August 4, 2010)
• The name of the person who approved the entry (J. Edwards)
• The approval date (October 13, 2010)
• The version number (3.1)
• The number of pages utilized for the entry (1)
• The name of the data item (PARTNO)
• Other names that might be used for the item (e.g., PTNO)
• The range of acceptable values (part numbers ranging from 100 to 5,000)
• The data type (numeric)
• The storage requirement (four positions needed for the part number)
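One way to see how such an entry can be put to work is to hold it as a small record and check candidate values against it. The Python sketch below is an assumption-laden illustration: the class name DataDictionaryEntry, its field names, and the accepts method are hypothetical, and only the PARTNO details come from the example above.

```python
from dataclasses import dataclass


@dataclass
class DataDictionaryEntry:
    """Hypothetical in-memory form of one data dictionary entry."""

    name: str
    aliases: list
    min_value: int
    max_value: int
    data_type: str
    storage_positions: int

    def accepts(self, value):
        """Check a value against the type, range, and storage rules of the entry."""
        return (
            isinstance(value, int)
            and self.min_value <= value <= self.max_value
            and len(str(value)) <= self.storage_positions
        )


# Values taken from the PARTNO example: range 100 to 5,000, numeric, four positions.
PARTNO = DataDictionaryEntry("PARTNO", ["PTNO"], 100, 5000, "numeric", 4)

print(PARTNO.accepts(2500))  # inside the allowed range
print(PARTNO.accepts(99))    # below the allowed range
```

Keeping the rules in one place means every program that writes a part number can apply the same check, which supports the accuracy and consistency goals described for the data dictionary.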
Storing and Retrieving Data
A Database Management System (DBMS) acts as a
bridge between application programs and the database.
When an application needs data, it requests it from the
DBMS.
For example, in a car dealership, pricing software might
need the cost of a six-cylinder engine option instead of
the standard four-cylinder engine to calculate the total
car price.
The software sends this request to the DBMS, which
then determines the logical access path. The DBMS
works with system programs to locate the data on a
storage device, like a disk drive.
Storing and Retrieving Data
It follows a physical access path to the exact location of the data, retrieves the price for the six-
cylinder engine, and sends it back to the application.
This same sequence of actions applies when a user seeks information from the database.
Initially, the user makes a data request to the DBMS.
For instance, a user might issue a command such as "LIST ALL OPTIONS FOR WHICH PRICE IS
GREATER THAN 200 DOLLARS."
This request represents the Logical Access Path (LAP).
Subsequently, the DBMS might access the section of the disk containing option prices to retrieve
the required information for the user; this is the Physical Access Path (PAP).
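The request for options priced above 200 dollars can be sketched as a query issued through Python's built-in sqlite3 module. The table name CAR_OPTION, its columns, and the sample prices are assumptions for illustration. The point is that the program states only what it wants (the logical request), while the DBMS decides how to find the rows in storage.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table of dealership options; names and prices are invented.
conn.execute("CREATE TABLE CAR_OPTION (NAME TEXT, PRICE REAL)")
conn.executemany(
    "INSERT INTO CAR_OPTION VALUES (?, ?)",
    [("six-cylinder engine", 1500.0), ("floor mats", 120.0), ("sunroof", 900.0)],
)

# The logical request: which options cost more than 200 dollars?
# Nothing here says where or how the rows are stored; the DBMS resolves that.
rows = conn.execute(
    "SELECT NAME, PRICE FROM CAR_OPTION WHERE PRICE > ? ORDER BY NAME", (200,)
).fetchall()
print(rows)
```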
Storing and Retrieving Data
Difficulties can arise when two or more individuals or programs attempt to access the same
record in the database simultaneously.
For instance, an inventory management program might try to reduce the inventory level for a
product by ten units due to recent customer shipments, while simultaneously, a procurement
program might attempt to increase the inventory level for the same product by 200 units
because of a fresh inventory receipt.
In the absence of proper database control mechanisms, one of these inventory updates could be
incorrect, leading to an inaccurate product inventory level.
To prevent such issues, concurrency control methods can be implemented. One possible
approach is to restrict all other application programs from accessing a record if that record is
being updated or in use by another program.
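The locking approach can be sketched in Python with two threads standing in for the inventory and procurement programs. The starting level of 500 units and the names inventory, record_lock, and adjust are hypothetical; a real DBMS applies its own locking internally rather than relying on application code like this.

```python
import threading

inventory = {"level": 500}      # hypothetical starting stock for one product
record_lock = threading.Lock()  # stands in for a DBMS lock on that record


def adjust(delta):
    # Only one program at a time may read, change, and write back the record.
    with record_lock:
        current = inventory["level"]
        inventory["level"] = current + delta


shipment = threading.Thread(target=adjust, args=(-10,))  # customer shipments
receipt = threading.Thread(target=adjust, args=(200,))   # fresh inventory receipt

shipment.start()
receipt.start()
shipment.join()
receipt.join()

print(inventory["level"])  # 690, whichever update runs first
```

Without the lock, both programs could read 500 before either writes, and one update would overwrite the other, leaving 490 or 700 instead of the correct 690.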
Manipulating Data and Creating Reports
After a Database Management System (DBMS) is successfully installed, it becomes a powerful tool for
employees, managers, and consumers to access essential information and generate reports. This
functionality is especially critical in meeting regulatory and operational needs.
A DBMS simplifies this process by organizing the data in a structured way, enabling quick and accurate
retrieval of information.
With its ability to generate customized reports, the system can pull together specific details, which
not only helps the company comply with the law but also improves efficiency and data accuracy.
◦ For example, under the Food Allergen Labeling and Consumer Protection Act, which took effect in 2006,
food manufacturers are required to provide reports on their ingredients, formulations, and preparation
methods for public transparency. A DBMS helps companies easily generate and share these reports to meet
the legal requirements.
Furthermore, managers can analyze trends in ingredient usage, employees can ensure compliance with
production standards, and consumers can make informed choices based on the disclosed information.
Manipulating Data and Creating Reports
Some database systems use a visual method called Query-by-Example (QBE) to create database
queries or requests.
This approach simplifies the process for users by providing a graphical interface similar to the
user-friendly environment of operating systems like Windows.
Instead of writing complex query code, users can perform tasks by interacting with the interface,
such as opening windows, selecting data, and clicking on the desired features.
For example, QBE allows users to build queries by filling out forms or templates that represent
the data they want to retrieve.
This intuitive process eliminates the need for technical knowledge of query languages like SQL,
making it accessible even to non-technical users.
By simply selecting fields, specifying conditions, or clicking on options, users can extract, update,
or manipulate data effortlessly.
Manipulating Data and Creating Reports
Alternatively, database commands can be incorporated into
programming languages.
For instance, C++ commands can be employed in straightforward
programs designed to access or manipulate specific data elements
within the database. An example of a DBMS query is as follows:
SELECT * FROM EMPLOYEE WHERE JOB_CLASSIFICATION = 'C2';
The asterisk (*) signifies that the query should return all columns
from the EMPLOYEE table.
Generally, the commands used for database manipulation are part
of the data manipulation language (DML) that is provided alongside
the DBMS.
This specific language empowers managers and other database
users to access, modify, and query the data contained within the
database, thereby generating reports. It is important to note that
application programs interact with schemas and the DBMS before
directly accessing physically stored data on storage devices like disks.
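The sketch below shows a database command embedded in a program. It uses Python with the built-in sqlite3 module instead of C++, purely for brevity. The query is the one quoted above; the EMPLOYEE columns and the three sample rows are assumptions added so that the query has something to return.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical EMPLOYEE table with two columns and invented rows.
conn.execute("CREATE TABLE EMPLOYEE (EMP_NAME TEXT, JOB_CLASSIFICATION TEXT)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?)",
    [("Avery", "C2"), ("Blake", "B1"), ("Casey", "C2")],
)

# The DML query from the text, passed to the DBMS as a string by the host program.
result = conn.execute(
    "SELECT * FROM EMPLOYEE WHERE JOB_CLASSIFICATION = 'C2'"
).fetchall()

# Because of the asterisk, each returned row carries every column of the table.
for row in sorted(result):
    print(row)
```

The host program never touches the stored data directly; it hands the command to the DBMS and receives rows back, which matches the flow described above.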
Manipulating Data and Creating Reports
In the 1970s, D. D. Chamberlin and colleagues at the IBM
Research Laboratory in San Jose, California, devised a
standardized data manipulation language known as Structured
Query Language (SQL, pronounced as "sequel").
There has been a growing interest in integrating SQL into
relational databases on both mainframe and personal
computer systems.
SQL offers numerous built-in functions, such as AVG (average),
MAX (maximum value), MIN (minimum value), and others. SQL
enables programmers to master a single powerful query
language that can be applied across a spectrum of systems,
ranging from personal computers to large mainframes.
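The built-in functions named above can be tried with a short Python sketch using the sqlite3 module. The EMPLOYEE table, its SALARY column, and the three salary figures are hypothetical values chosen so that the results are easy to check by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical salary data for three employees.
conn.execute("CREATE TABLE EMPLOYEE (EMP_NAME TEXT, SALARY REAL)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?)",
    [("Avery", 40000.0), ("Blake", 50000.0), ("Casey", 60000.0)],
)

# One query computes all three aggregates over the whole table.
avg_salary, max_salary, min_salary = conn.execute(
    "SELECT AVG(SALARY), MAX(SALARY), MIN(SALARY) FROM EMPLOYEE"
).fetchone()

print(avg_salary, max_salary, min_salary)  # 50000.0 60000.0 40000.0
```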
Manipulating Data and Creating Reports
SQL (Structured Query Language) is highly valued by programmers and database users because
it can be integrated into many programming languages, such as C++ and COBOL. Its
standardized, simple methods for retrieving, storing, and manipulating data make it a user-
friendly and popular database query language.
Once a database is set up and populated, it can generate various outputs, such as reports or
documents, displayed on screens or printed.
Output-control features allow users to select specific records, perform calculations, and
format reports with tools like headings, making reports highly customizable and
useful for decision-making.
A DBMS (Database Management System) can generate a wide range of reports, including
summary reports that provide insights into company operations, like accounting reports
showing current and overdue accounts.
Businesses also use status reports to track the progress of orders and other operations, helping
to guide routine decisions.
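A summary report of current and overdue accounts can be sketched as follows, again in Python with sqlite3. The ACCOUNT table, its columns, the three sample customers, and the report layout are assumptions for illustration; commercial DBMSs typically provide dedicated report writers for this task.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical accounts receivable data.
conn.execute("CREATE TABLE ACCOUNT (CUSTOMER TEXT, BALANCE REAL, STATUS TEXT)")
conn.executemany(
    "INSERT INTO ACCOUNT VALUES (?, ?, ?)",
    [("Acme", 1200.0, "current"), ("Birch", 300.0, "overdue"), ("Cedar", 450.0, "overdue")],
)

# Select and calculate: count the accounts and total the balances per status.
summary = conn.execute(
    "SELECT STATUS, COUNT(*), SUM(BALANCE) FROM ACCOUNT GROUP BY STATUS ORDER BY STATUS"
).fetchall()

# Format: add a heading line and align the columns.
lines = [f"{'STATUS':<10}{'ACCOUNTS':<10}{'TOTAL':>10}"]
for status, count, total in summary:
    lines.append(f"{status:<10}{count:<10}{total:>10.2f}")
report = "\n".join(lines)
print(report)
```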
Manipulating Data and Creating Reports
Databases play a crucial role in supporting executives and decision-makers by
offering valuable insights for informed decision-making.
For example, Intellifit's database helps online shoppers make better clothing
purchase decisions. The process begins with scanning a customer's body at one
of the company's locations, capturing around 200,000 measurements to create a
3-D body model.
The database then compares the customer's actual measurements with the size
charts of online clothing retailers to ensure the best fit.
This process helps shoppers avoid the uncertainty of online shopping by
improving size accuracy, thus reducing the likelihood of returns and increasing
customer satisfaction.
Popular Database management systems
There are many DBMSs available for individual users, such as Microsoft Access and FileMaker
Pro, as mentioned previously in this chapter.
However, professional-grade systems designed for programmers run on larger computers like
midrange systems, mainframes, and supercomputers.
Major players in this sector include IBM, Oracle, and Microsoft, with the industry generating
billions in revenue annually.
Although Microsoft is a leader in desktop software, its presence in the larger DBMS market is
comparatively small.
A newer type of DBMS, called Database as a Service (DaaS) or Database 2.0, is becoming
increasingly popular. Similar to Software as a Service (SaaS), which is one of the services
provided by cloud computing, DaaS hosts databases on service provider servers, allowing clients
to access them over the Internet.
Popular Database management systems
The service provider usually handles database management and maintenance. DaaS is part of
the broader cloud computing model, which uses large clusters of computers to deliver high-
performance applications and manage information systems externally.
Many companies are adopting the DaaS model, including major organizations like Google,
Microsoft, Intuit, and TrackVia.
For example, companies such as JetBlue Airways, XM Radio, and Bank of America use
QuickBase from Intuit to manage their databases externally.
JetBlue, for instance, relies on DaaS for efficient organization and management of IT projects.
With DaaS, team members can access and manage databases from any location with an internet
connection, simplifying monitoring and project management without the need for on-site
infrastructure, updates, or security handling.
