CHAPTER 6
FOUNDATIONS OF BUSINESS INTELLIGENCE:
DATABASES AND INFORMATION MANAGEMENT
Information is becoming as important a business resource as money, material, and people. The fact that a
company compiles millions of pieces of data doesn’t mean it can produce information that its employees,
suppliers, and customers can use. Businesses are realizing the competitive advantage they can gain by
compiling useful information, not just data.
An effective information system provides users with accurate, timely, and relevant information. Accurate
information is free of errors. Information is timely when it is available to decision makers when it is
needed. Information is relevant when it is useful and appropriate for the types of work and decisions that
require it. Many businesses don’t have timely, accurate or relevant information because data in their
information systems have been poorly organized and maintained. That’s why data management is so
essential.
A computer system organizes data in a hierarchy that starts with the bit, the smallest unit of data a computer
can handle, which represents either a 0 or a 1. Bits can be grouped to form a byte to represent one
character, number, or symbol. Bytes can be grouped to form a field, and related fields can be grouped to
form a record. Related records can be collected to form a file, and related files can be organized into a
database.
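The hierarchy can be sketched in a few lines of Python. This is only an illustration: the field names and sample values are made up, and a real DBMS stores these structures very differently.

```python
# Illustrative sketch of the data hierarchy; all names and values are made up.

# Bit -> byte: eight bits form one byte, which can encode one character.
bits = "01001010"
byte_value = int(bits, 2)      # the bit pattern read as a number
character = chr(byte_value)    # the character that number encodes

# Bytes -> field: a group of characters forms a field, such as a last name.
last_name_field = "Jones"

# Fields -> record: related fields describe one entity.
record = {"first_name": "John", "last_name": last_name_field, "city": "Austin"}

# Records -> file: related records of the same type.
customer_file = [
    record,
    {"first_name": "Mary", "last_name": "Smith", "city": "Dallas"},
]

# Files -> database: related files organized together.
database = {"customers": customer_file, "orders": []}

print(character, len(customer_file), sorted(database))
```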
An entity is basically the person, place, thing, or event on which you store and maintain information.
Each characteristic or quality describing an entity is called an attribute.
In the table below, each column describes a characteristic (attribute) of John Jones, who is the entity.
In most organizations, systems tended to grow independently without a company-wide plan. Accounting,
finance, manufacturing, human resources, and sales and marketing all developed their own systems and
data files. Each application required its own files and its own computer program to operate. In the
company as a whole, this process led to multiple master files created, maintained, and operated by
separate departments. Over the years, the organization is saddled with hundreds of programs and
applications that are very difficult to maintain and manage. The resulting problems are data redundancy
and inconsistency, program-data dependence, inflexibility, poor data security, and an inability to share
data among applications.
Program-Data Dependence:
It refers to the coupling of data stored in files and the specific programs required to update and maintain
those files such that changes in programs require changes to the data.
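A minimal sketch of this coupling, assuming a hypothetical fixed-width file in which the name occupies the first 10 characters and the balance the next 8. The function name and layout are invented for illustration.

```python
# Hypothetical fixed-width layout: name in columns 0-9, balance in columns 10-17.
def parse_record(line):
    """Read one record; the column positions are hard-coded in the program."""
    return {"name": line[0:10].strip(), "balance": float(line[10:18])}

# A record written in the layout the program expects.
old_line = "Jones".ljust(10) + "00125.50"
right = parse_record(old_line)

# The same data after the name field is widened to 15 characters.
# The unchanged program still runs, but it reads the wrong columns.
new_line = "Jones".ljust(15) + "00125.50"
wrong = parse_record(new_line)

print(right["balance"], wrong["balance"])
```

The program and the file layout have to change together, which is the dependence described above.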
Lack of Flexibility:
A traditional file system can deliver routine scheduled reports after extensive programming efforts, but it
cannot deliver ad hoc reports or respond to unanticipated information requirements.
Poor Security:
Because there is little control or management of data, access to information may be out of control.
Management may have no way of knowing who is accessing or even making changes to the
organization’s data.
The key to establishing an effective, efficient database is to involve the entire organization as much as
possible, even if everyone will not immediately be connected to it or use it.
A database is a collection of data organized to serve many applications by centralizing data and
controlling redundant data.
A Database Management System (DBMS) is software that permits an organization to centralize data,
manage them efficiently, and provide access to the stored data by application programs. The DBMS acts
as an interface between application programs and the physical data files.
- A DBMS reduces data redundancy and inconsistency by minimizing isolated files in which the same
data are repeated.
- A DBMS makes the physical database available for different logical views.
MIS400 – Business Information Systems Prof.: Micheline TABET
N.B.: A logical view presents data as they would be perceived by end users. A physical view shows
how data are organized and structured on physical media.
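As a rough illustration, SQL views can play the role of logical views over one physical table. The sketch below uses Python's built-in sqlite3 module; the employee table, its columns, and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One physical table holds all the employee data (hypothetical schema).
conn.execute(
    "CREATE TABLE employee ("
    "emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL, ssn TEXT)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?, ?)",
    [
        (1, "John Jones", "Sales", 52000.0, "000-00-0001"),
        (2, "Mary Smith", "Finance", 61000.0, "000-00-0002"),
    ],
)

# Logical view for a staff directory: no salary, no SSN.
conn.execute("CREATE VIEW directory AS SELECT name, dept FROM employee")

# Logical view for payroll: only what payroll needs.
conn.execute("CREATE VIEW payroll AS SELECT emp_id, name, salary FROM employee")

directory_rows = conn.execute("SELECT * FROM directory ORDER BY name").fetchall()
payroll_rows = conn.execute("SELECT * FROM payroll ORDER BY emp_id").fetchall()
print(directory_rows)
print(payroll_rows)
```

Both views read from the same stored data, so neither group of users keeps its own copy.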
Relational DBMS
The most popular type of DBMS today is the relational DBMS (e.g., Microsoft Access). A relational
database represents data as two-dimensional tables called relations. Tables may be referred to as files.
Each table contains data on an entity and its attributes. The information about a single entity is called a
row. Rows are commonly referred to as records, or in very technical terms, as tuples. Data in each table
are broken down into fields. A field, or column, contains a single attribute for an entity. Each record
requires a key field, or unique identifier for the record. In a relational database, each table contains a
primary key, a unique identifier of the table’s records. To make sure the tables relate to each other, the
primary key from one table is stored in a related table as a foreign key.
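The link between a primary key and a foreign key can be sketched with Python's built-in sqlite3 module. The supplier and part tables and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE supplier (
        supplier_id   INTEGER PRIMARY KEY,   -- primary key
        supplier_name TEXT
    );
    CREATE TABLE part (
        part_id     INTEGER PRIMARY KEY,     -- primary key
        part_name   TEXT,
        supplier_id INTEGER REFERENCES supplier(supplier_id)  -- foreign key
    );
    INSERT INTO supplier VALUES (10, 'Acme Supply'), (20, 'Bolt Brothers');
    INSERT INTO part VALUES (1, 'Door latch', 10), (2, 'Hinge', 20), (3, 'Door seal', 10);
    """
)

# The foreign key lets the two tables be joined back together.
rows = conn.execute(
    "SELECT part.part_name, supplier.supplier_name "
    "FROM part JOIN supplier ON part.supplier_id = supplier.supplier_id "
    "ORDER BY part.part_id"
).fetchall()
print(rows)
```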
Object-Oriented DBMS
Many applications require databases that can store and retrieve not only structured numbers and
characters but also drawings, images, photographs, voice, and full-motion video. OODBMS can handle
multimedia applications. It stores data and procedures as objects that can be automatically retrieved and
shared. It is relatively slow compared to relational DBMS.
Cloud computing service companies provide a way to manage the company’s data through Internet access
using a Web browser. But these services typically have less functionality than the relational database.
Pricing for cloud-based database services is based on:
Usage – small databases cost less than larger ones
Volume of data stored
Number of input-output requests
Amount of data written to the database
Amount of data read from the database
There are three important capabilities of DBMS that traditional file environments lack – data definition,
data dictionary, and a data manipulation language.
Data definition capability: specifies the structure of the content of the database. It would be used to
create tables and to define the characteristics of the fields in each table.
Data dictionary: an automated or manual file that stores definitions of data elements and their
characteristics. Data dictionaries for large corporate databases may capture additional information, such
as usage; ownership (who in the organization is responsible for maintaining the data); authorization;
security; and the individuals, business functions, programs, and reports that use each data element.
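The first two capabilities can be sketched together with Python's built-in sqlite3 module: a CREATE TABLE statement defines the structure, and SQLite's PRAGMA table_info reports it back, which is a very small stand-in for a data dictionary. The customer table is hypothetical, and a corporate data dictionary would record far more (ownership, authorization, usage).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: create a table and define the characteristics of its fields.
conn.execute(
    "CREATE TABLE customer ("
    "customer_id INTEGER PRIMARY KEY, "
    "name TEXT NOT NULL, "
    "credit_limit REAL DEFAULT 0)"
)

# Data dictionary (minimal): ask the DBMS to describe each data element.
# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
data_dictionary = [
    {
        "field": name,
        "type": col_type,
        "required": bool(notnull),
        "primary_key": bool(pk),
    }
    for _cid, name, col_type, notnull, _default, pk in conn.execute(
        "PRAGMA table_info(customer)"
    )
]

for entry in data_dictionary:
    print(entry)
```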
Data manipulation language: This is the third important capability of a DBMS. It’s a formal language
used to add, change, delete, and retrieve the data in the database and make sure they are formulated into useful
information. The goal of this language should be to make it easy for users to build their own queries and
reports. Data manipulation languages are getting easier to use and more prevalent. SQL (Structured
Query Language) is the most prominent language and is now embedded in desktop applications such as
Microsoft Access.
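The four operations map onto the SQL statements INSERT, UPDATE, DELETE, and SELECT. The sketch below runs them through Python's built-in sqlite3 module against a hypothetical product table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT, price REAL)"
)

# Add
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [(1, "Bottled water", 1.25), (2, "White socks", 6.00), (3, "Umbrella", 12.50)],
)

# Change
conn.execute("UPDATE product SET price = 1.50 WHERE product_id = 1")

# Delete
conn.execute("DELETE FROM product WHERE product_id = 3")

# Retrieve: an ad hoc query for products under 5.00
cheap = conn.execute(
    "SELECT name, price FROM product WHERE price < 5 ORDER BY name"
).fetchall()
remaining = conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]
print(cheap, remaining)
```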
Designing Databases
To create a database, you must understand the relationships among the data, the type of data that will be
maintained in the database, how the data will be used, and how the organization will need to change to
manage data. The database requires both a conceptual design and a physical design. The conceptual, or
logical, design of a database is an abstract model of the database from a business perspective, whereas the
physical design shows how the database is actually arranged on direct-access storage devices.
You want to avoid redundancy between tables and not allow a relationship to contain repeating data
groups. You do not want to have two tables storing a customer’s name. That makes it more difficult to
keep data properly organized and updated. What would happen if you changed the customer’s name in
one table and forgot to change it in the second table? Minimizing redundancy and increasing the stability
and flexibility of databases is called normalization. In other words, normalization is the process of
creating small, stable, flexible, and adaptive data structures from complex groups of data.
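A small sketch of the idea, using plain Python data structures and made-up order data: the customer name is repeated in every row of the flat structure, and after splitting it is stored once.

```python
# Unnormalized: the customer's name is repeated on every order (made-up data).
flat_orders = [
    {"order_id": 1, "customer_id": 7, "customer_name": "John Jones", "item": "Hinge"},
    {"order_id": 2, "customer_id": 7, "customer_name": "John Jones", "item": "Door latch"},
    {"order_id": 3, "customer_id": 9, "customer_name": "Mary Smith", "item": "Door seal"},
]

# Normalized: one customer table and one order table linked by customer_id.
customers = {}
orders = []
for row in flat_orders:
    customers[row["customer_id"]] = {"customer_name": row["customer_name"]}
    orders.append(
        {"order_id": row["order_id"], "customer_id": row["customer_id"], "item": row["item"]}
    )

# A name change is now made in exactly one place...
customers[7]["customer_name"] = "John A. Jones"

# ...and every order sees it, because orders look the name up by customer_id.
names_on_orders = [customers[o["customer_id"]]["customer_name"] for o in orders]
print(names_on_orders)
```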
Whichever relationship type you use, you need to make sure the relationship remains consistent by
enforcing referential integrity. That is, if a record in one table points to a record in another table, the
record it points to must actually exist: every foreign key value must match a primary key value in the
related table.
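A DBMS can enforce referential integrity itself. The sketch below uses Python's built-in sqlite3 module with hypothetical customer and orders tables; note that SQLite checks foreign keys only after PRAGMA foreign_keys = ON is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
    );
    INSERT INTO customer VALUES (7, 'John Jones');
    INSERT INTO orders VALUES (1, 7);   -- valid: customer 7 exists
    """
)

# An order that points to a customer who does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (2, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(rejected, order_count)
```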
Database designers document their data model with an entity-relationship diagram like the one below.
This diagram illustrates the relationship between the different entities.
Businesses use their databases to keep track of basic transactions, such as paying suppliers, processing
orders, keeping track of customers, and paying employees. But they also need databases to provide
information that will help the company run the business more efficiently, and help managers and
employees make better decisions. If a company wants to know which product is the most popular or who
its most profitable customer is, the answer lies in the data.
In a large company, with large databases or large systems for separate functions, such as manufacturing,
sales, and accounting, special capabilities and tools are required for analyzing vast quantities of data and
for accessing data from multiple systems. These tools include data warehousing, data mining and tools for
accessing internal databases through the web.
Data Warehouses
As organizations want and need more information about their company, their products, and their
customers, the concept of data warehousing has become very popular. Remember those islands of
information we keep talking about? Unfortunately, too many of them have proliferated over the years and
now companies are trying to rein them in by using data warehousing.
A data warehouse is a database that stores current and historical data of potential interest to decision
makers throughout the company. The data warehouse consolidates and standardizes information from
different operational databases so that the information can be used across the enterprise for management
analysis and decision making. The data warehouse makes the data available for anyone to access as
needed, but the data cannot be altered. The data warehouse system also provides query, analysis, and reporting
tools.
As the figure shows, the data come from a variety of sources, both internal and external to the
organization. They are then stored together in a data warehouse from which they can be accessed and
analyzed to fit the user’s needs.
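The consolidate-and-standardize step can be sketched as follows. The two source systems, their field names, and the records are all hypothetical; real warehouses are loaded with dedicated tools rather than a few functions.

```python
# Two hypothetical operational sources that describe sales in different formats.
sales_system = [{"cust": "JONES, JOHN", "amount_usd": "125.50", "date": "2023-01-15"}]
web_store = [{"customer_name": "Mary Smith", "total_cents": 4999, "ordered_on": "2024-03-02"}]

def from_sales(row):
    """Standardize a row from the sales system into the warehouse format."""
    last, first = [part.strip() for part in row["cust"].split(",")]
    return {
        "customer": f"{first.title()} {last.title()}",
        "amount": float(row["amount_usd"]),
        "date": row["date"],
        "source": "sales",
    }

def from_web(row):
    """Standardize a row from the web store into the warehouse format."""
    return {
        "customer": row["customer_name"],
        "amount": row["total_cents"] / 100,
        "date": row["ordered_on"],
        "source": "web",
    }

# Consolidate current and historical rows in one place. A tuple is used as a
# loose stand-in for "readable but not changeable".
warehouse = tuple(
    sorted(
        [from_sales(r) for r in sales_system] + [from_web(r) for r in web_store],
        key=lambda r: r["date"],
    )
)

total = round(sum(r["amount"] for r in warehouse), 2)
print(len(warehouse), total)
```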
Companies often build enterprise-wide data warehouses, where a central data warehouse serves the entire
organization, or they create smaller, decentralized warehouses called data marts. A data mart is a subset
of a data warehouse in which a summarized or highly focused portion of the organization’s data is placed
in a separate database for a specific population of users. A data mart typically focuses on a single subject
area or line of business.
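A data mart can be pictured as a filtered, summarized copy of warehouse data. In this sketch the warehouse rows, the lines of business, and the function name build_data_mart are invented for illustration.

```python
from collections import defaultdict

# Hypothetical warehouse rows covering several lines of business.
warehouse_rows = [
    {"line": "apparel", "region": "East", "year": 2023, "sales": 1200.0},
    {"line": "apparel", "region": "West", "year": 2023, "sales": 800.0},
    {"line": "hardware", "region": "East", "year": 2023, "sales": 500.0},
    {"line": "apparel", "region": "East", "year": 2024, "sales": 1500.0},
]

def build_data_mart(rows, line):
    """Keep one line of business and summarize its sales by region and year."""
    summary = defaultdict(float)
    for row in rows:
        if row["line"] == line:
            summary[(row["region"], row["year"])] += row["sales"]
    return dict(summary)

apparel_mart = build_data_mart(warehouse_rows, "apparel")
print(apparel_mart)
```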
Using data warehouses and data marts correctly can give management a tremendous amount of
information that can be used to trim costs, reduce inventory, put products in the right stores at the right
time, attract new customers, or keep old customers happy.
Tools for Business Intelligence: Multidimensional Data Analysis and Data Mining
Businesses collect millions of pieces of data. Using the right tools, a business can use its data to develop
effective competitive strategies that we discussed in previous chapters. Rather than guessing about which
products or services are your best sellers, business intelligence provides concrete methods of analyzing
exactly what customers want and how best to supply them.
As technology improves, so does our ability to manipulate information maintained in databases. Have
you ever played with a Rubik’s Cube, one of those little multicolored puzzle boxes you can twist around
and around to come up with various color combinations? That’s a close analogy to how multidimensional
data analysis or online analytical processing (OLAP) works. In theory, it’s easy to change data around
to fit your needs.
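The "twisting" can be sketched as summarizing the same sales facts along different dimensions. The facts and the helper roll_up are made up for the example; OLAP products do this at far larger scale with precomputed cubes.

```python
from collections import defaultdict

# Each fact: (product, region, month, units sold). Made-up data.
facts = [
    ("bolts", "East", "Jan", 100),
    ("bolts", "West", "Jan", 80),
    ("nuts", "East", "Jan", 50),
    ("bolts", "East", "Feb", 120),
    ("nuts", "West", "Feb", 70),
]
DIMENSIONS = ("product", "region", "month")

def roll_up(rows, *dims):
    """Total the units sold for every combination of the chosen dimensions."""
    positions = [DIMENSIONS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in positions)
        totals[key] += row[3]
    return dict(totals)

# The same data viewed from two different angles.
by_product = roll_up(facts, "product")
by_region_month = roll_up(facts, "region", "month")
print(by_product)
print(by_region_month)
```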
Data Mining
Data mining is a tool that allows users to find hidden patterns, insights, and relationships in data. For
instance, data mining can tell you that on a hot summer day in the middle of Texas, more bottled water is
sold in convenience stores than in grocery stores. That’s information managers can use to make sure
more stock is targeted to convenience stores. Data mining could also reveal that when customers
purchase white socks, they also purchase bottled water 62 percent of the time. We seriously doubt there
is any meaningful connection between the two purchases. The point is that you need to beware of using data mining
as a sole source of decision making and make sure your requests are as focused as possible.
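A figure like "62 percent of the time" is what an association measure reports. The sketch below computes one such measure, often called confidence, over a handful of made-up shopping baskets. It is a generic illustration and not the method of any particular data mining product.

```python
# Made-up shopping baskets; each set is one customer purchase.
baskets = [
    {"white socks", "bottled water"},
    {"white socks", "bottled water", "gum"},
    {"white socks"},
    {"bottled water"},
    {"gum"},
]

def confidence(all_baskets, if_item, then_item):
    """Share of baskets containing if_item that also contain then_item."""
    with_if = [b for b in all_baskets if if_item in b]
    if not with_if:
        return 0.0
    return sum(then_item in b for b in with_if) / len(with_if)

socks_to_water = confidence(baskets, "white socks", "bottled water")
print(f"{socks_to_water:.0%} of sock purchases also included bottled water")
```

The number says only that the items appeared together, not that one purchase explains the other, which is why the result still needs a manager's judgment.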
These are the five types of information managers can obtain from data mining:
Many companies collect lots of data about their business and customers. The most difficult part has been
to turn that data into useful information. Organizations are using predictive analytics to create new
opportunities for connecting with their customers by extracting information more easily and more
precisely from their data warehouses. Firms are using better data mining techniques to target customers
and suppliers with just the right information at the right time.
For instance, based on past purchases, Chadwick’s clothing retailer determines that a customer is more
likely to purchase casual clothing than formal wear at certain times of the year. Based on its predictive
analysis, the retailer then tailors its sales offers to meet that expected behavior.
Much of the data created that might be useful to businesses is stored not in databases but in text-based
documents. Word files, emails, call center transcripts and service reports contain valuable data that
managers can use to assess operations and help make better decisions about the organization.
Unfortunately, there has not been an easy way to mine those documents until recently. Text mining tools
help scrub text files to find data or to discern patterns and relationships.
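A very simple form of text mining is to count which terms recur across documents. The transcripts and the stop-word list below are invented for the example; commercial text mining tools go well beyond word counts.

```python
import re
from collections import Counter

# Made-up call center transcripts.
transcripts = [
    "Customer reports the door latch is broken and wants a refund.",
    "Caller asked about delivery delay; latch arrived broken.",
    "Happy with service, no issues.",
]

# Common words to ignore (a tiny, hypothetical stop-word list).
STOP_WORDS = {"the", "is", "and", "a", "about", "with", "no", "wants"}

def term_counts(docs):
    """Count how many documents mention each term."""
    counts = Counter()
    for doc in docs:
        words = set(re.findall(r"[a-z]+", doc.lower()))
        counts.update(words - STOP_WORDS)
    return counts

counts = term_counts(transcripts)
recurring = sorted(term for term, n in counts.items() if n >= 2)
print(recurring)
```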
Because so much business is taking place over the Web, businesses are trying to mine data from it also.
There are three categories of Web mining processes:
Web content mining: Extract knowledge from the content of Web pages – text, images, audio, and
video
Web structure mining: Data related to the structure of a Web site – links between documents
Web usage mining: User interaction data recorded by Web servers – user behavior on a Web site
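Web usage mining, the third category, can be sketched by summarizing server log entries. The log format here is a simplified, hypothetical one (visitor address, request method, page); real Web server logs carry more fields.

```python
from collections import Counter

# Simplified, made-up server log lines: visitor address, method, page.
log_lines = [
    "10.0.0.1 GET /home",
    "10.0.0.1 GET /products",
    "10.0.0.2 GET /home",
    "10.0.0.2 GET /checkout",
    "10.0.0.3 GET /home",
]

# How often was each page requested?
page_hits = Counter(line.split()[2] for line in log_lines)

# How many distinct visitors were there?
visitors = {line.split()[0] for line in log_lines}

most_viewed = page_hits.most_common(1)[0]
print(most_viewed, len(visitors))
```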
Web browsers are far easier to use than most of the query languages associated with the other programs
on mainframe computer systems. Companies realize how easy it is to provide employees, customers, and
suppliers with Web-based access to databases rather than creating proprietary systems. It’s also proving
cheaper to create “front-end” browser applications that can more easily link information from
disparate systems than to try to combine all the systems on the “back-end”. That is, you link internal
databases to the Web through software programs that provide a connection to the database without major
reconfigurations. A database server, which is a special dedicated computer, maintains the DBMS. A
software program, called an application server, processes the transactions and offers data access. A user
making an inquiry through the Web server can connect to the organization’s database and receive
information in the form of a Web page.
Figure 6-14 shows how servers provide the interface between the database and the Web.
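The flow from request to Web page can be sketched in miniature. Here an in-memory SQLite database stands in for the database server and a plain function stands in for the application server; the function name, table, and page layout are assumptions, and no real Web server is started.

```python
import sqlite3
from html import escape

# Stand-in for the database server and its DBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT, price REAL)"
)
conn.execute("INSERT INTO product VALUES (1, 'Bottled water', 1.5)")

def application_server(product_id):
    """Stand-in for the application server: query the database, build a page."""
    row = conn.execute(
        "SELECT name, price FROM product WHERE product_id = ?", (product_id,)
    ).fetchone()
    if row is None:
        return "<html><body><p>Product not found</p></body></html>"
    name, price = row
    return (
        f"<html><body><h1>{escape(name)}</h1>"
        f"<p>Price: {price:.2f}</p></body></html>"
    )

# What the user's browser would receive for a request about product 1.
page = application_server(1)
print(page)
```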