Business Intelligence
Portfolio



David N. Maeda
dave.maeda@gmail.com
919-606-5772
In the Beginning …
• “Put all your eggs in one basket, and …
  watch the basket.”
                                     Mark Twain


• “Data is only valuable if it can be accessed in
  a timely fashion.”
                                     An IMS/DC Axiom
Table of Contents
• An Introduction
• A Problem Sampler
   – Diagnostician at Play
   – A Little Dirty Data
   – A SQL Query
• SSIS and ETL Options
   – SSIS and Data Management
• BIDS, SSAS, and MDX
   – New Tools, Growing Arsenal
• At Your Service …
David Maeda: An Introduction
• Completing an intense 10-week course on Microsoft
  Business Intelligence technologies, i.e. SQL Server,
  T-SQL, SSIS, SSAS, SSRS, and Visual Studio
  interfaces.
• Broad IT background, including expertise in
  database and transaction management systems.
• Experience includes leadership and project
  management positions.
• An accomplished diagnostician and software
  engineer.
Diagnostician At Play
• Earlier this year, I got a good deal on a nice fly reel intended
  for 9- and 10-weight lines. While using the reel for striped
  bass on the Roanoke River several weeks later, I noticed that
  the drag did not tighten down enough to be effectively
  useful.
• An exchange of emails with the US distributor got me a new
  one-way clutch bearing, but it did not fix the issue.
• Examining the parts diagram for the reel, I decided to add a
  7-cent wave lock washer to the drag assembly. Tested the reel on
  the Roanoke. Problem resolved.
• Notified the distributor. After an evaluation, the fix was
  adopted by the manufacturer several days later.
A Little Dirty Data Problem
• In dealing with a national organization, membership
  information was found to have the following issues:
   –   30% to 60% of the email addresses were bad
   –   10% of the regular mail addresses were bad
   –   Inconsistent data formats in downloaded CSV files
   –   Multiple entries per member
• The Problem: How to work around the “questionable” data
  and maintain effective membership communications with
  the following criteria:
   – Minimize expenses
   – Require, on average, less than 4 hours per week to manage
A Little Dirty Data Problem
• The Solution:
   o Design a database that allows downloads to update existing
     data without affecting “local” data.
   o The Members table is what gets downloaded.
   o The MemberExtension table is the repository for “local” data.
   o Manage both tables via a web-based user interface (UI).
   o The UI is implemented with PHP and JavaScript.
   o Automate as much as possible.
A Little Dirty Data Problem
• Implementation:
   – A Nasty Surprise: the CSV data as downloaded would not import cleanly
     into MySQL, because MySQL LOAD DATA INFILE processing required
     certain characters to be escaped.
       • A short Java program was written to transform the downloaded CSV file
         into the necessary format prior to importing it into MySQL.
   – Any downloaded data is considered “questionable”.
       • MySQL LOAD DATA INFILE processing overlays existing records.
       • Restrict downloaded updates to only affect the Members table.
   – The Members and MemberExtension tables are synchronized as part
     of the update process invoked from the UI.
       • Every Members entry has a corresponding MemberExtension entry.
       • A new MemberExtension entry is created if necessary and initialized with
         the date and email information, if present.
       • Existing MemberExtension entries are not touched.
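The synchronization rules above can be sketched in Java. This is a minimal, hypothetical illustration, not the original PHP/MySQL implementation: in-memory maps stand in for the two tables, and the record fields (`memberId`, `joinDate`, `email`, `notes`) are assumptions. Records require Java 16 or later.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of the Members/MemberExtension synchronization rule (hypothetical types). */
public class MemberSync {

    // A downloaded ("questionable") Members row; field names are assumptions.
    record Member(String memberId, String joinDate, String email) {}

    // A "local" MemberExtension row; field names are assumptions.
    record MemberExtension(String memberId, String date, String email, String notes) {}

    /**
     * Ensures every Members entry has a MemberExtension entry.
     * Missing entries are created and seeded with the date and email (null if absent);
     * existing entries are never modified. Returns the number of entries created.
     */
    static int synchronize(Map<String, Member> members,
                           Map<String, MemberExtension> extensions) {
        int created = 0;
        for (Member m : members.values()) {
            if (!extensions.containsKey(m.memberId())) {
                extensions.put(m.memberId(),
                        new MemberExtension(m.memberId(), m.joinDate(), m.email(), ""));
                created++;
            }
        }
        return created;
    }

    public static void main(String[] args) {
        Map<String, Member> members = new LinkedHashMap<>();
        members.put("1001", new Member("1001", "2008-05-01", "a@example.org"));
        members.put("1002", new Member("1002", null, null));

        Map<String, MemberExtension> ext = new LinkedHashMap<>();
        ext.put("1001", new MemberExtension("1001", "2007-01-15", "local@example.org", "keep"));

        System.out.println(synchronize(members, ext));   // 1
        System.out.println(ext.get("1001").email());     // local@example.org
    }
}
```

Because the rule only ever inserts missing entries, running it again after an unchanged download creates nothing, which is what keeps the “local” data safe from a fresh download.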
A Little Dirty Data Problem
o A Utilitarian UI
    • Apache
    • HTML Frames
    • AJAX
    • PHP
A Little Dirty Data Problem
• In Summary:
   – We were able to circumvent most of the dirty data issues by isolating
     the “questionable” data.
   – The MySQL RDBMS supports ad hoc SQL queries, should it become
     necessary to alter tables or make similar changes.
   – Expenses were minimized by:
       • Using freely available components, i.e. Java, Apache 2.2, PHP 5, MySQL
         5.2, and JavaScript.
       • Using volunteer labor to write the ETL code.
   – A download and update sequence takes less than 10 minutes.
   – A typical request to update the email distribution takes less than 5
     minutes.
   – Managing the database and generating the necessary distribution
     lists via the provided UI typically takes less than 4 hours per week.
A SQL Query
• On a recent phone interview, I was asked:
    – How would you construct an SQL query to find the second-highest sales
      total?
• My answer was:
    – Use a pair of nested queries. The inner query would ascertain the top 2
      distinct totals. The outer query would return the lower of the two totals.
• In T-SQL, this looks something like the following (other SQL dialects may
  differ somewhat):
        SELECT TOP 1 orderid, (unitprice * quantity) AS totalsale
        FROM [order details]
        WHERE (unitprice * quantity) IN
          (
            -- Inner query: the two highest distinct line totals
            SELECT TOP 2 (unitprice * quantity) AS ordertotal
            FROM [order details]
            GROUP BY (unitprice * quantity)
            ORDER BY ordertotal DESC
          )
        -- Outer query: the lower of those two totals
        ORDER BY totalsale ASC;
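The same nested-query logic can be expressed procedurally. The Java sketch below is illustrative only; the `OrderLine` record loosely mirrors the `orderid`, `unitprice`, and `quantity` columns, and its names are assumptions. A `TreeSet` plays the role of the inner query's GROUP BY (distinct totals) plus TOP 2, and taking its smallest element plays the role of the outer TOP 1 ... ASC.

```java
import java.util.List;
import java.util.Optional;
import java.util.TreeSet;

public class SecondHighest {

    // Hypothetical stand-in for a row of [order details].
    record OrderLine(int orderId, double unitPrice, int quantity) {
        double total() { return unitPrice * quantity; }
    }

    /** Returns the second-highest distinct line total, or empty if there is none. */
    static Optional<Double> secondHighestTotal(List<OrderLine> lines) {
        // Inner query: keep only the two largest distinct totals.
        TreeSet<Double> topTwo = new TreeSet<>();
        for (OrderLine line : lines) {
            topTwo.add(line.total());      // a set ignores duplicate totals, like GROUP BY
            if (topTwo.size() > 2) {
                topTwo.pollFirst();        // discard the smallest
            }
        }
        // Outer query: the lower of the two.
        return topTwo.size() == 2 ? Optional.of(topTwo.first()) : Optional.empty();
    }

    public static void main(String[] args) {
        List<OrderLine> lines = List.of(
                new OrderLine(1, 14.0, 12),   // 168.0
                new OrderLine(2, 42.5, 40),   // 1700.0
                new OrderLine(3, 7.75, 10),   // 77.5
                new OrderLine(4, 42.5, 40));  // ties the highest total
        System.out.println(secondHighestTotal(lines).orElse(null)); // 168.0
    }
}
```

Two behavioral differences are worth noting. The sketch returns only the total, while the T-SQL version also returns an `orderid`, and when several rows share the second-highest total, `TOP 1` returns one of them without a defined preference. With fewer than two distinct totals, the T-SQL query returns the highest total, whereas the sketch returns an empty `Optional`.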
ETL Options and SSIS
o Not all CSV files are created equal. Neither are the ETL tools used
  to prepare and load them into a database. Compare:
o A more traditional approach (as used for the Dirty Data problem),
  shown in the Java excerpt below.
o An approach utilizing Microsoft’s SSIS facility.
o SSIS has Data Management applications beyond ETL.

        package appCSV;

        import java.io.*;
        import java.util.StringTokenizer;

        /**
         * @author Dave Maeda
         *
         * Class to convert csv field form
         *
         * Invoke as: java appCSV.Convert
         *
         * Where: filename is the name of
         *       ext is the file extension.
         *
         * Output: A file named <filename>.
         * Note: ext will default to "csv" if
         */
        public class Convert
        {
          private static void usage()
          {
            System.out.println("\n");
            System.out.println(" >> Usage:
          …
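The `Convert` excerpt on this slide is truncated, so its actual conversion logic is not shown. As a hedged sketch of what such a converter might do, the class below turns comma-separated text with double-quoted fields into the tab-separated, backslash-escaped layout that MySQL's `LOAD DATA INFILE` reads with its default `FIELDS` and `LINES` options. The class and method names are hypothetical, the input is assumed to follow RFC 4180 quoting, and a small character-by-character parser is used instead of `StringTokenizer`, since `StringTokenizer` cannot tell a delimiter comma from a comma inside a quoted field. Requires Java 14 or later (arrow-style `switch`).

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: CSV text -> MySQL LOAD DATA default format (tab-separated, backslash-escaped). */
public class CsvToLoadData {

    /** Parses CSV text into records; handles quoted fields, "" escapes, and embedded commas/newlines. */
    static List<List<String>> parse(String csv) {
        List<List<String>> records = new ArrayList<>();
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < csv.length(); i++) {
            char c = csv.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < csv.length() && csv.charAt(i + 1) == '"') {
                        field.append('"');     // "" inside quotes is a literal quote
                        i++;
                    } else {
                        inQuotes = false;      // closing quote
                    }
                } else {
                    field.append(c);           // commas and newlines kept verbatim
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);
            } else if (c == '\n') {
                fields.add(field.toString());
                field.setLength(0);
                records.add(fields);
                fields = new ArrayList<>();
            } else if (c != '\r') {            // drop the CR of CRLF line endings
                field.append(c);
            }
        }
        if (field.length() > 0 || !fields.isEmpty()) { // last line without a trailing newline
            fields.add(field.toString());
            records.add(fields);
        }
        return records;
    }

    /** Escapes one value for LOAD DATA's default ESCAPED BY '\\' setting. */
    static String escape(String value) {
        StringBuilder out = new StringBuilder();
        for (char c : value.toCharArray()) {
            switch (c) {
                case '\\' -> out.append("\\\\");
                case '\t' -> out.append("\\t");
                case '\n' -> out.append("\\n");
                case '\r' -> out.append("\\r");
                case '\0' -> out.append("\\0");
                default -> out.append(c);
            }
        }
        return out.toString();
    }

    /** Converts a whole CSV document to tab-separated, newline-terminated LOAD DATA input. */
    static String convert(String csv) {
        StringBuilder out = new StringBuilder();
        for (List<String> record : parse(csv)) {
            List<String> escaped = new ArrayList<>();
            for (String f : record) {
                escaped.add(escape(f));
            }
            out.append(String.join("\t", escaped)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String csv = "1001,\"Smith, Jo\",\"say \"\"hi\"\"\"\r\n1002,C:\\dir,x\n";
        System.out.print(convert(csv));
    }
}
```

The output could then be loaded with a statement along the lines of `LOAD DATA INFILE 'members.txt' REPLACE INTO TABLE Members;` (file name hypothetical). The `REPLACE` keyword makes incoming rows overwrite existing rows that share a unique key, in line with the overlay behavior described for the Dirty Data problem.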
Data Management 101: DID
• Three basic principles:
  – Disclosure
     • Viewing of data
        – Who’s viewing your data, and are they authorized to do so?
  – Integrity
     • Accuracy and currency of data
        – Data is only meaningful if it is accurate and up to date.
  – Durability
     • Data loss prevention
        – More data is lost to accidents than to malicious actions.
BIDS, SSAS, and MDX
o Business Intelligence Development Studio (BIDS)
    • Ships as part of MS SQL Server

o SQL Server Analysis Services (SSAS)
    • OLAP store and engine
    • Builds multi-dimensional cubes

o Multi-Dimensional eXpressions (MDX)
    • Used to retrieve cube data
    • Used in SSAS Calculations and KPIs
SSRS

o Web Enabled
   • Report Management
   • Distribution

o Charts
    • Conditional Fonts
    • Calculated Members
    • Multiple Charting Options
    • Custom Colors

o Tables
    • Multiple Formatting Options
    • Data
    • Calculated Members
    • Conditional Fonts
MOSS, PPS, Dashboards, and KPIs

o MOSS
   • SharePoint Server

o PPS
    • PerformancePoint Server

o Dashboard
    • Scorecard

o KPIs
    • Parameters
    • Values
    • Goals and Status
    • Trends (not shown)
Excel Services
o Excel Local Client
    • Parameters
    • Pivot Table
    • Associated Chart



o Excel Services
    • MOSS
    • PPS Dashboard
    • PPS Report
        – Parameters
        – Chart
New Tools, Growing Arsenal
• Latest additions: BIDS, SSIS, SSAS, SSRS, and MDX
• Arsenal already includes:
   – OS platforms: z/OS, Windows, Unix (AIX and Sun), and
     Linux (Red Hat and SUSE)
   – Databases: IMS, DB2, Oracle, MySQL, and SQL Server
   – Languages: Assembler (IBM and Intel), C/C++, Java,
     JavaScript, PHP, Smalltalk, SQL, and REXX.
   – Core competencies: Leadership, process improvement,
     team facilitation, interpersonal communications, client
     relations, and project management.
At Your Service …
• David Maeda
  – Software Engineer
    • Business Intelligence Analyst
    • Diagnostician/Programmer
  – Hard-Working and Persevering
    • Personal Integrity and High Standards
  – Team Leader and Team Player
    • “Your prime directive as a leader is to position your
      team for success.”
The End
