
Mod 3 Information Integration - Instructor

Information Integration in Business Intelligence

Uploaded by Devika
Copyright © All Rights Reserved

Business Intelligence

Module 3: Data Integration


Structured & Unstructured Information
Last Week

• Organizational Memory
• Knowledge Repositories
• Data Warehouses vs. Data Marts
• Data Warehouse Process: Raw Data → ETL → Data Cleaning → Reporting
The Four BI Capabilities

Presentation
• OLAP
• Visualization
• Digital Dashboards
• Scorecards
• Business Performance Management

Creation of Insights
• Data Mining
• Business Analytics
• Real-time Decision Support

Information Integration
• Environmental Scanning
• Text Mining
• Web Mining
• Radio Frequency Identification Devices

Organizational Memory
• Data Warehousing
• Enterprise Resource Planning
• Knowledge Repositories
• Digital Content Management Systems
• Document Management Systems

© Sabherwal & Becerra-Fernandez


Structured Data

Structured Data: traditional data warehouse solutions have been based on consolidating the structured data (numerical and categorical) recorded in operational systems, where business information sets are identified by predefined structures (referred to as ‘schema on write’) and where a rules-driven template ensures data integrity.
The business analysis focus is on data models, data dictionaries and business
rules to define information requirements and capabilities.
• Resides in a database
• Requires metadata
• Designed for easy access

Source: BABOK v3.


Data Model
• Organizes data elements and standardizes how data
elements relate to one another.
• Aid communication between the stakeholders and business analysts (BAs) who define requirements and the technical people who design the response to those requirements.
• Data models determine the structure of the data.
• Data models are often in graphical form.
Example of a Data Model
Consider a simple data model for a library system:

Entities:
Book: BookID, Title, Author, ISBN
Member: MemberID, Name, MembershipDate
Loan: LoanID, BookID, MemberID, LoanDate, ReturnDate

Relationships:
A Member can borrow multiple Books (one-to-many).
A Book can be loaned to multiple Members over time (many-to-many, implemented via the
Loan entity).

Source: Connolly, T., & Begg, C. (2015). Database Systems: A Practical Approach to Design, Implementation, and Management (6th ed.). Pearson.
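The library model above can be expressed as a relational schema. The minimal sketch below uses Python's built-in sqlite3 module; the column types, constraints and sample rows are illustrative assumptions, not part of the original example. It shows how the Loan entity links Books and Members so that one book can be borrowed by many members over time.

```python
import sqlite3

# In-memory database; table and column names follow the library example.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce Loan -> Book/Member references
conn.executescript("""
CREATE TABLE Book (
    BookID  INTEGER PRIMARY KEY,
    Title   TEXT NOT NULL,
    Author  TEXT NOT NULL,
    ISBN    TEXT UNIQUE
);
CREATE TABLE Member (
    MemberID       INTEGER PRIMARY KEY,
    Name           TEXT NOT NULL,
    MembershipDate TEXT NOT NULL  -- ISO date string, e.g. '2024-01-15'
);
-- Loan resolves the many-to-many Book/Member relationship:
-- each row links one Book to one Member for one borrowing event.
CREATE TABLE Loan (
    LoanID     INTEGER PRIMARY KEY,
    BookID     INTEGER NOT NULL REFERENCES Book(BookID),
    MemberID   INTEGER NOT NULL REFERENCES Member(MemberID),
    LoanDate   TEXT NOT NULL,
    ReturnDate TEXT              -- NULL while the book is still out
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO Book VALUES (?, ?, ?, ?)", [
    (1, "Book A", "Author A", "ISBN-A"),
    (2, "Book B", "Author B", "ISBN-B"),
])
conn.executemany("INSERT INTO Member VALUES (?, ?, ?)", [
    (10, "Member X", "2024-01-15"),
    (11, "Member Y", "2024-02-01"),
])
conn.executemany("INSERT INTO Loan VALUES (?, ?, ?, ?, ?)", [
    (100, 1, 10, "2024-03-01", "2024-03-10"),
    (101, 2, 10, "2024-03-05", None),
    (102, 1, 11, "2024-03-12", None),
])

# Which members have borrowed Book 1 over time?
rows = conn.execute("""
    SELECT m.Name FROM Loan l JOIN Member m ON m.MemberID = l.MemberID
    WHERE l.BookID = 1 ORDER BY l.LoanDate
""").fetchall()
print(rows)  # [('Member X',), ('Member Y',)]
```

Putting LoanDate and ReturnDate on Loan, rather than on Book or Member, is what lets the same book appear in many loans without duplicating book or member details.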
Business Rules
• Guide everyday decision-making within businesses by outlining the relationships between objects, for example, customer names and their corresponding orders.
• Translate an organization’s business activities into business logic.
• Are typically expressed using formal logic qualifiers, such as "IF-THEN", "IF-ELSE", "ONLY IF" and "WHEN".
• For example, banks and real estate businesses apply business rules in application processes for housing loans or rental properties: an organization may reject an applicant if their credit score is under a specific threshold.
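The credit-score example is a single IF-THEN rule. A minimal sketch in Python, assuming a hypothetical threshold of 650 (real lenders set their own cut-offs and usually combine many rules):

```python
# Hypothetical cut-off, for illustration only.
MIN_CREDIT_SCORE = 650

def screen_application(credit_score: int) -> str:
    """Apply one IF-THEN business rule to a loan or rental application."""
    # IF the credit score is under the threshold THEN reject;
    # ELSE pass the application on for the remaining checks.
    if credit_score < MIN_CREDIT_SCORE:
        return "reject"
    return "refer for further assessment"

print(screen_application(580))  # reject
print(screen_application(720))  # refer for further assessment
```

Keeping the threshold in a named constant mirrors the idea that the rule belongs to the business: the value can change without rewriting the logic.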
Data Dictionary
A data dictionary is used to standardize a definition of a data element and enable a
common interpretation of data elements. A data dictionary contains definitions of
each data element and indicates how those elements combine into composite
data elements. Data dictionaries are used to standardize usage and meanings of data
elements between solutions and between stakeholders.
Primitive data elements (required elements)

Element | Meaning | Data Element 1 | Data Element 2 | Data Element 3
Name | Name referenced by data elements | First Name | Middle Name | Last Name
Alias | Alternate name referenced by stakeholders | Given Name | Middle Name | Surname
Values/Meanings | Enumerated list or description of data element | Minimum 2 characters | Can be omitted | Minimum 2 characters
Description | | First Name | Middle Name | Family Name

Composite definition:
Customer Name = First Name + Middle Name + Family Name

Source: BABOK v3.
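The composite definition and the Values/Meanings row can be read as executable rules. A minimal Python sketch, assuming the parts are joined with single spaces (the dictionary itself does not specify a separator):

```python
from typing import Optional

def customer_name(first_name: str, family_name: str,
                  middle_name: Optional[str] = None) -> str:
    """Compose Customer Name = First Name + Middle Name + Family Name."""
    first, family = first_name.strip(), family_name.strip()
    # Values/Meanings: First Name and Family Name need at least 2 characters.
    if len(first) < 2 or len(family) < 2:
        raise ValueError("First Name and Family Name need at least 2 characters")
    # Values/Meanings: Middle Name can be omitted, so skip it when blank.
    middle = (middle_name or "").strip()
    return " ".join(part for part in (first, middle, family) if part)

print(customer_name("Ada", "Lovelace", "King"))  # Ada King Lovelace
print(customer_name("Jo", "Li"))                 # Jo Li
```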


Data Dictionary Example 2

Field Name | Data Type | Description | Constraints/Notes
CustomerID | Integer | Unique identifier for a customer | Primary key; automatically generated
FirstName | Text | Customer's first name | Max length 50 characters
LastName | Text | Customer's last name | Max length 50 characters
Email | Text | Customer's email address | Max length 100 characters; must be valid email format
PhoneNumber | Text | Customer's phone number | Max length 20 characters; optional
DateOfBirth | Date | Customer's date of birth | Optional; format: YYYY-MM-DD
AccountCreation | DateTime | Timestamp of when the customer account was created | Automatically generated; format: YYYY-MM-DD HH:MM:SS
IsActive | Boolean | Status of the customer's account | Default is true
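A data dictionary like this one can drive automated validation. The sketch below encodes the constraints as Python checks. The simple email pattern is an assumption standing in for "valid email format", and treating FirstName, LastName and Email as required is also an assumption (the dictionary only marks PhoneNumber and DateOfBirth as optional). CustomerID and AccountCreation are skipped because they are automatically generated.

```python
import re
from datetime import date

# Assumed stand-in for "must be valid email format"; real rules are more involved.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def is_iso_date(value) -> bool:
    """True if value is a real calendar date written exactly as YYYY-MM-DD."""
    if not isinstance(value, str) or not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        date.fromisoformat(value)
    except ValueError:
        return False  # e.g. 2023-02-30
    return True

def validate_customer(record: dict) -> list:
    """Return a list of rule violations; an empty list means the record passes."""
    errors = []
    for field in ("FirstName", "LastName"):
        value = record.get(field)
        if not isinstance(value, str) or not value.strip():
            errors.append(f"{field} is required")
        elif len(value) > 50:
            errors.append(f"{field} exceeds 50 characters")
    email = record.get("Email")
    if not isinstance(email, str) or len(email) > 100 or not EMAIL_PATTERN.fullmatch(email):
        errors.append("Email is missing, too long, or not a valid format")
    phone = record.get("PhoneNumber")  # optional; length checked on its text form
    if phone is not None and len(str(phone)) > 20:
        errors.append("PhoneNumber exceeds 20 characters")
    dob = record.get("DateOfBirth")  # optional
    if dob is not None and not is_iso_date(dob):
        errors.append("DateOfBirth must be a valid YYYY-MM-DD date")
    if not isinstance(record.get("IsActive", True), bool):  # default is true
        errors.append("IsActive must be true or false")
    return errors

print(validate_customer({"FirstName": "Ada", "LastName": "Lovelace",
                         "Email": "ada@example.com", "DateOfBirth": "1990-12-10"}))  # []
```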
Data Dictionary (page 3)

Strengths
• Provides all stakeholders with a shared understanding of the format and
content of relevant information.
• A single repository of corporate metadata promotes the consistent use of data throughout the organization.

Limitations
• Requires regular maintenance.
• All maintenance must be consistent.
• Care must be taken to consider the metadata required by multiple
scenarios.

Source: BABOK v3.


Fix this Data Dictionary!

Data Dictionary of driver information: Drivers in NSW (New South Wales)

Field Name | Data Type | Data Format | Field Size | Description | Example
License Id | Integer | NNNNNN | 6 | Unique # id for all drivers | 12345
Surname | Text | | 20 | Surname of driver | Jones
First Name | Text | | 20 | First Name of Driver | Arnold
Address | Text | | 50 | First Name of Driver | 11 Rocky St Como 2233
Phone No. | Text | | 10 | License holders contact number | 0400111222
Date of Birth | Date/Time | DD/MM/YYYY | 10 | Drivers Date of Birth | 08/05/1956
Fix the Data Dictionary 2

Field Name | Data Type | Data Format | Field Size | Description | Example
License Id | Integer | NNNNNN | 6 | Unique # id for all drivers | 12345
Surname | Text | | 20 | Surname of driver | Jones
First Name | Text | | 20 | First Name of Driver | Arnold
Address | Text | | 50 | First Name of Driver | 11 Rocky St Como 2233
Phone No. | Text | NNNNNNNNNN | 10 | License holders contact number | 0400111222
Date of Birth | Date/Time | DD/MM/YYYY | 10 | Drivers Date of Birth | 08/05/1956
Unstructured Data

Unstructured Data: business intelligence solutions can include semi-structured or unstructured data, which includes text, images, audio, and video. This data frequently comes from external sources. For this type of data, the structure and relationships are not predefined, and no specific organization rules have been applied to ensure data integrity. Information sets are derived from the raw data (referred to as ‘schema on read’).

The business analysis focus is on definitions and data matching algorithms to define information requirements and capabilities.

• No data model
• Metadata is not required

Source: BABOK v3.


Text – Unstructured Data
• Text is becoming increasingly important as a source of data: social media, online reviews, internal documents, etc.
• Text has structure, but it is a language (syntax) structure, and in common practice that structure is prone to errors such as typos, slang and broken grammar.
• Working with text data requires transforming text into components that can be categorized, for example parts of speech (noun, verb, adjective) and emotive content (positive, negative, neutral).
• Sentiment analysis is a natural language processing (NLP) technique that determines whether a piece of content is positive, negative, or neutral.
• Further reading: Best NLP 2023 - Compare NLP Software | TEC (technologyevaluation.com)

Source: psu.edu
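To make "transforming text into components" concrete, here is a deliberately simple lexicon-based sentiment classifier in Python. The word lists are hypothetical and hand-picked; production NLP tools use far larger lexicons or trained models and handle negation ("not good"), which this sketch ignores.

```python
import re

# Hypothetical toy lexicon, for illustration only.
POSITIVE = {"great", "excellent", "love", "fast", "helpful", "good"}
NEGATIVE = {"poor", "slow", "broken", "hate", "bad", "rude"}

def sentiment(text: str) -> str:
    """Label text positive, negative or neutral by counting lexicon hits."""
    # Step 1: break the text into lower-case word tokens (the components).
    tokens = re.findall(r"[a-z']+", text.lower())
    # Step 2: net emotive score = positive hits minus negative hits.
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    # Step 3: map the score to a category.
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("Great service and fast delivery!"))       # positive
print(sentiment("The app is slow and support was rude."))  # negative
print(sentiment("The parcel arrived on Tuesday."))         # neutral
```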
Schema on Write/Schema on Read
In the context of data management and databases, a schema is a structural
definition or description of the entire database. It outlines the organization, the
structure of the data, and the relationships between the various parts of the data
in a given database or system. Essentially, a schema acts as a blueprint for how
data is stored, organized, and manipulated, allowing for efficient access and
management of the data.

On Write
Structure and definitions are determined in advance, when data is written (loaded), before it is accessed
On Read
Structure and definitions are determined as data is being accessed
Schema on Write
• This is the ‘database standard’.
• The structure of the database is defined in advance.
• As data comes in, it is sorted and checked based on rules. Data that does not meet
the requirements is discarded.
• This is encompassed by the ETL process discussed in the previous module.

Advantages of schema on write:


• Because the structure is well-defined and data is categorized using metadata,
queries are very fast and accurate.

Disadvantages of schema on write:


• Because the data is structured so carefully, queries must also be exact.
• It can be difficult to change metadata requirements once set up.
• In practice, raw data that is transformed or discarded during loading is irretrievably ‘lost’.
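Schema on write can be sketched as "validate and convert at load time". In the Python example below the target schema and the order rows are hypothetical; rows that fail a type conversion or lack a column are counted and dropped, which is exactly why the raw data is lost.

```python
from datetime import date

# Hypothetical target schema, fixed before any data arrives:
# each column maps to the converter that enforces its type.
SCHEMA = {"order_id": int, "amount": float, "order_date": date.fromisoformat}

def load_on_write(raw_rows):
    """Validate and convert rows at load time; rows that break the schema are discarded."""
    table, rejected = [], 0
    for raw in raw_rows:
        try:
            table.append({col: convert(raw[col]) for col, convert in SCHEMA.items()})
        except (KeyError, TypeError, ValueError):
            rejected += 1  # the raw row is not kept anywhere
    return table, rejected

rows, rejected = load_on_write([
    {"order_id": "1", "amount": "19.99", "order_date": "2024-05-01"},
    {"order_id": "2", "amount": "n/a", "order_date": "2024-05-02"},  # invalid amount
    {"order_id": "3", "amount": "5.00"},                             # missing order_date
])
print(len(rows), rejected)  # 1 2

# Every stored row already matches the schema, so queries can be simple and exact.
print(sum(r["amount"] for r in rows))  # 19.99
```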
Schema on Read
• Commonly associated with a ‘data lake’.
• Data is stored in its original format.
• Structure is defined when the data is needed, according to the user’s purpose.

Advantages of schema on read:


• There is a great deal of flexibility in how the data will be used, and in querying
the data. This includes combining data from different sources.
• No data is discarded, since there is no upfront ETL process.

Disadvantages of schema on read:


• When accessed, data may be inaccurate, incomplete, or invalid.
• Queries can be very slow. If a query doesn’t produce the expected results, the problem could lie with the data or with the query as written.
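Schema on read keeps the raw records and applies structure only when a query runs. In the Python sketch below the JSON records, field names and the "accept either naming convention" logic are hypothetical; note that the raw list is never modified and records that cannot be interpreted are reported rather than deleted.

```python
import json

# Raw records kept exactly as they arrived (e.g., in a data lake); no upfront ETL.
RAW = [
    '{"order_id": 1, "amount": "19.99", "channel": "web"}',
    '{"order_id": 2, "amount": "n/a", "channel": "store"}',
    '{"id": 3, "total": 5.0, "channel": "web"}',  # another source, other field names
]

def read_orders(raw_lines, channel=None):
    """Apply structure at query time; uninterpretable records are reported, not lost."""
    results, problems = [], []
    for line in raw_lines:
        rec = json.loads(line)
        # The "schema" lives in the query: accept either naming convention.
        order_id = rec.get("order_id", rec.get("id"))
        try:
            amount = float(rec.get("amount", rec.get("total")))
        except (TypeError, ValueError):
            problems.append(order_id)  # data problem, surfaced at read time
            continue
        if channel is None or rec.get("channel") == channel:
            results.append({"order_id": order_id, "amount": amount})
    return results, problems

web_orders, problems = read_orders(RAW, channel="web")
print(web_orders)  # [{'order_id': 1, 'amount': 19.99}, {'order_id': 3, 'amount': 5.0}]
print(problems)    # [2]
```

The flexibility (combining two sources with different field names) and the cost (every query must cope with bad or inconsistent data) are both visible in `read_orders`.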
Business Intelligence: Data Integration & Environmental Scanning
The Four BI Capabilities revisited

Presentation
• OLAP
• Visualization
• Digital Dashboards
• Scorecards
• Business Performance Management

Creation of Insights
• Data Mining
• Business Analytics
• Real-time Decision Support

Information Integration
• Environmental Scanning
• Text Mining
• Web Mining
• Radio Frequency Identification Devices

Organizational Memory
• Data Warehousing
• Enterprise Resource Planning
• Knowledge Repositories
• Digital Content Management Systems
• Document Management Systems

© Sabherwal & Becerra-Fernandez


Information Integration
• We need to integrate data from transactional systems (ERP) with unstructured data and with information from other sources (web).
• Organizations will be at different levels in their data warehouse evolution: some may have state-of-the-art enterprise systems to capture structured data, while others may rely on Excel spreadsheets, for example.
• Organizations may rely primarily on internal data, or they may heavily augment this data with external data (such as market research data).
• Organizations need to develop a business model that supports the return on investment from purchasing and integrating external data.
• The level of environmental scanning is dictated by the organization's perceived uncertainty and by the resources it spends on information search.
Data Integration Example
Environmental Scanning

Quadrant model (Daft and Weick, 1984), with two axes: Organizational Intrusiveness (passive to active) and Environmental Analyzability (unanalyzable to analyzable).

• Undirected Viewing (passive, unanalyzable): satisfied with limited, casual information.
• Conditioned Viewing (passive, analyzable): uses few internal resources but many external industry resources.
• Enacting (active, unanalyzable): gathers information by experimentation and practice.
• Searching (active, analyzable): systematically analyzes data to produce market forecasts and trend analysis.

Environmental scanning is how a company views and leverages external information to support tactical and strategic decisions.

Source: Business Intelligence, Wiley 2011.
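The quadrant model is a lookup on two dimensions. The Python sketch below reduces each axis to a yes/no flag, which is a simplification (in practice analyzability and intrusiveness are matters of degree); the function name is hypothetical.

```python
# The Daft and Weick quadrants as a lookup table.
# Key: (environment seen as analyzable?, organization actively intrusive?)
SCANNING_MODES = {
    (False, False): "Undirected Viewing",
    (True, False): "Conditioned Viewing",
    (False, True): "Enacting",
    (True, True): "Searching",
}

def scanning_mode(analyzable: bool, active: bool) -> str:
    """Return the environmental scanning mode for one quadrant."""
    return SCANNING_MODES[(analyzable, active)]

print(scanning_mode(analyzable=True, active=True))    # Searching
print(scanning_mode(analyzable=False, active=False))  # Undirected Viewing
```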


Environmental Scanning Activity

• Go online and find some examples of external research that could be purchased or found for free for the auto industry.
• Which companies are prominent?
• Are you surprised by the amount of free data
available?
• How could this impact your work as a business
analyst in positive or negative ways?
• Is this source trustworthy? Why or why not?
Environmental Scanning 2

Axes: Organizational Intrusiveness (passive to active) and Environmental Analyzability (unanalyzable to analyzable).

• Enacting (active, unanalyzable): gather information by experimentation and practice.
Example: Creating Google Alerts; assessing specific market/industry data; testing different data files and formats; short-term information acquisition effort.

• Searching (active, analyzable): systematically analyze data to produce market forecasts and trend analysis.
Example: Seeking trustworthy, consistent sources which may be integrated into the organization’s Business Intelligence process for the purposes of influencing decision making; data subscriptions; paid market information.

• Undirected Viewing (passive, unanalyzable): satisfied with limited, casual information.
Example: Exploratory browsing on the internet; casual business conversation; unfocused, further action unlikely.

• Conditioned Viewing (passive, analyzable): uses few internal resources but many external industry resources.
Example: Focused surfing on LinkedIn for company updates; casually reading industry magazines and the business section of newspapers for specific topics of interest.
Business Intelligence: The Business Analysis Perspective
The Business Analysis Perspective

How does a BA help an organization to integrate its information?

By providing four key outcomes that guide the project and serve as a blueprint for the new data warehouse.

Source: BABOK v3.


Four Business Analysis Outcomes

Source logical data model and data dictionary


• Standard definition of the required data as held in each source system.
• Provides a data definition of each element and the business rules applied to it: business
description, type, format, length, legal values and any inter-dependencies.

Source data quality assessment


• Evaluates the completeness, validity, and reliability of data from the source systems.
• Identifies where further verification and enhancement of source data is required to ensure
consistent business definitions and rules apply across the enterprise-wide data asset.
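Two of the three quality dimensions above, completeness and validity, can be scored field by field. The Python sketch below uses hypothetical source rows and rules (the 4-digit postcode rule and the simple email pattern are assumptions); reliability is not measured here, since it usually needs comparison across sources or over time.

```python
import re

# Hypothetical extract from a source system.
SOURCE_ROWS = [
    {"customer_id": "C001", "email": "a@example.com", "postcode": "2233"},
    {"customer_id": "C002", "email": "",              "postcode": "22X3"},
    {"customer_id": "C003", "email": "c@example",     "postcode": None},
    {"customer_id": "C004", "email": "d@example.com", "postcode": "2000"},
]
# Assumed validity rules per field.
VALIDITY_RULES = {
    "email": lambda v: bool(re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v)),
    "postcode": lambda v: bool(re.fullmatch(r"\d{4}", v)),
}

def assess(rows, rules):
    """Per field: completeness = share of rows with a value;
    validity = share of present values that pass the field's rule."""
    report = {}
    for field, rule in rules.items():
        present = [r[field] for r in rows if r.get(field) not in (None, "")]
        valid = [v for v in present if rule(v)]
        report[field] = {
            "completeness": len(present) / len(rows),
            "validity": len(valid) / len(present) if present else 0.0,
        }
    return report

for field, scores in assess(SOURCE_ROWS, VALIDITY_RULES).items():
    print(field, scores)
# email {'completeness': 0.75, 'validity': 0.6666666666666666}
# postcode {'completeness': 0.75, 'validity': 0.6666666666666666}
```

Fields with low scores are the ones flagged for "further verification and enhancement" before they feed the enterprise-wide data asset.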

Target logical data model and data dictionary


• Presents an integrated, normalized view of the data structures required to support the business
domain.
• The data dictionary provides the standardized enterprise-wide definition of data elements and
integrity rules.

Transformation rules
• Map source and target data elements to specify requirements for the decoding/encoding of
values and for data correction (error values) and enrichment (missing values) in the
transformation process.
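The three kinds of transformation rule (decoding/encoding, correction of error values, enrichment of missing values) can be shown on a single record. In the Python sketch below, the source column names (`cust_nm`, `status_cd`), code tables and default country are all hypothetical.

```python
# Hypothetical source-to-target mapping rules.
STATUS_CODES = {"A": "Active", "I": "Inactive", "S": "Suspended"}  # decode source codes
STATE_FIXES = {"N.S.W.": "NSW", "New South Wales": "NSW"}          # correct known error values
DEFAULT_COUNTRY = "Australia"                                      # enrich missing values

def transform(source: dict) -> dict:
    """Map one source record onto the target data dictionary."""
    state = source.get("state")
    return {
        "CustomerName": source["cust_nm"].strip().title(),
        # Decoding: unknown codes are flagged rather than guessed.
        "Status": STATUS_CODES.get(source.get("status_cd"), "Unknown"),
        # Correction: replace known variants with the standard value.
        "State": STATE_FIXES.get(state, state),
        # Enrichment: fill a missing value with an agreed default.
        "Country": source.get("country") or DEFAULT_COUNTRY,
    }

print(transform({"cust_nm": "  arnold JONES ", "status_cd": "A", "state": "N.S.W."}))
# {'CustomerName': 'Arnold Jones', 'Status': 'Active', 'State': 'NSW', 'Country': 'Australia'}
```

Keeping each rule as a small table makes it easy for the BA and stakeholders to review the mapping without reading the surrounding code.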

Source: BABOK v3.


Four Business Analysis Outcomes 2
Source logical data model and data dictionary
Source data quality assessment
[Diagram: source systems (POS, CRM, ERP, Web) → Extract, Transform and Load (common format) → Data Cleaning → Reporting]
Target logical data model and data dictionary
Transformation rules
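The end-to-end flow (source systems, extract and transform into a common format, data cleaning, reporting) can be sketched in a few Python functions. Everything here is hypothetical: the POS and web extracts, their field names, and the cleaning rules (dropping exact duplicate rows, which assumes the web feed sometimes repeats a row, and dropping non-positive quantities).

```python
from collections import defaultdict

# Hypothetical extracts from two source systems with different layouts.
POS_ROWS = [{"sku": "A1", "qty": 2, "price": 10.0},
            {"sku": "B2", "qty": 1, "price": 25.0}]
WEB_ROWS = [{"product": "A1", "quantity": "3", "unit_price": "10.00"},
            {"product": "A1", "quantity": "3", "unit_price": "10.00"},  # duplicated feed row
            {"product": "C3", "quantity": "-1", "unit_price": "5.00"}]  # invalid quantity

def extract_transform():
    """Extract from each source and transform into one common format."""
    common = [{"sku": r["sku"], "qty": r["qty"], "price": r["price"], "channel": "POS"}
              for r in POS_ROWS]
    common += [{"sku": r["product"], "qty": int(r["quantity"]),
                "price": float(r["unit_price"]), "channel": "Web"} for r in WEB_ROWS]
    return common

def clean(rows):
    """Drop exact duplicates and rows that fail a basic validity rule."""
    seen, cleaned = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key in seen or r["qty"] <= 0:
            continue
        seen.add(key)
        cleaned.append(r)
    return cleaned

def report(rows):
    """Revenue by channel: the kind of summary a reporting layer would show."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["channel"]] += r["qty"] * r["price"]
    return dict(totals)

print(report(clean(extract_transform())))  # {'POS': 45.0, 'Web': 30.0}
```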
Environmental Scanning Assignment
• Assignment Introduction and Learning Goals
• The objective of this assignment is to assess your ability to gather relevant information, analyze industry trends, and make
informed recommendations based on your findings. For this assignment, you will conduct an environmental scanning analysis of a
selected industry. Environmental scanning is a crucial aspect of business analysis that involves identifying and analyzing external
factors that can impact an organization's performance and decision-making.
• Assignment Value
• This assignment is worth 10% of your final grade.
• Assignment Instructions
• Provide a brief overview using these environmental scanning steps as a guide:
• 1. Choose an industry of your interest such as technology, healthcare, retail, energy, etc.
• 2. Identify and briefly describe key external factors that are currently impacting the chosen industry. These factors may include
economic, social, technological, legal, and environmental influences.
• 3. Research and analyze the competitive landscape within the chosen industry. Identify major competitors, their strengths,
weaknesses, and any recent strategic moves they have made. Sources could include news articles, social media, etc.
• 4. Explore recent and ongoing changes in the market that are affecting the industry. Identify emerging trends, consumer preferences, technological advancements, and any new opportunities that organizations within the industry can capitalize on. This insight can be found in consumer reviews (Amazon, Reddit threads), social media (e.g., TikTok, Instagram), news articles, etc.
• 5. Based on your analysis, provide actionable recommendations for businesses operating within the selected industry. Suggest
strategies that can help organizations navigate challenges and take advantage of opportunities.
• Due Date
• SEPTEMBER 27th
• How to Submit your Assignment
• Submit a 2-3 page overview of your findings in either PDF or Word format.
Please include your name and student number on your assignment.
• Please DO NOT use AI to create this assignment. AI includes ChatGPT and Grammarly PRO. You can use AI for RESEARCH; however, you and your team member need to write this report in your OWN WORDS!
• I would like to see a BIBLIOGRAPHY of all references used in the report.
• I have TURNITIN and AI sensing software.
THE END

• Next Week: Business Intelligence Insight Generation
