Introduction To Big Data Analytics
Pankaj Sahay
Course Structure
• 1 ½ credits – 7-8 lectures x 2 hrs each
• Assignment / Test – Internal
• Presentations by Students
• Final Exam
Course Contents
• Motivation
• Introduction to Big Data
• Mining Big Data & the Platform
• Toolkits used for Big Data Analytics
• Overview of Hadoop
• Overview of NoSQL Technologies
• Review of Key ML concepts
• Enterprise Data Science
• Some Applications of Big Data Analytics – Introduction
Ref: Practical Big Data Analytics – Nataraj Dasgupta, 2018, Packt Publishing
Big Data: Understanding How Data Powers Big Business – Bill Schmarzo, 2013, Wiley
Motivation
Brief History and Timelines:
• 1980s – POS scanner data
Changed the balance of power between CPG Manufacturers & Retailers
e.g. P&G, Unilever, Frito Lay, Kraft v/s Walmart, Tesco
• Detailed Data – Product Sales, Customer Loyalty Data
Retailers got new insights about Product Sales, Customer Buying
Patterns, market Trends not available earlier
• Predictive Analytics
Methods to obtain insights and address business problems / actions
• Selection of the appropriate h/w & s/w stack – e.g. different for
streaming data, or internal data
Steps for Building the Case
• WHO
• WHAT
• BUY IN (STAKEHOLDERS)
• EARLY WINS, EFFORT to REWARD Ratio
• LEVERAGE Early Wins
Who needs Big Data Mining
• Business groups – most significant impact from solution
• Any groups already working with large datasets, important to
business, direct impact on revenue
• Optimise their processes – impact on daily work processes, impact on
final outcome
Determine the Use Cases
• Units identified in the previous step
• If they already have a platform, prioritise among the various
use cases – requires familiarity with the work being done in the BU
• Hierarchical structure – Management with oversight of Unit, Staff
who are hands on with the analysis – both must collaborate
• Management – business requirement, which use case will give the
most benefit
• Staff / Practitioners – Challenges at the operational level
• Consolidate both operational & Managerial aspects – what is the
optimal outcome
Stakeholders’ Buy-in
• Decision makers, Budget owners
• Prior to starting work, establish their consensus
• Multiple buy-ins for redundancy – a pool of support from primary and
secondary sources, both for funding and for extending early wins into a
larger project
• Baseline – Value from a certain Use Case – leverage on success
Early Wins & Effort to Reward Ratio
• After identification of Appropriate Use Cases
• Pick the one with a good effort-to-reward ratio
• Small use case – short time to implement – small budget – specific business
critical function – Early WIN – increase credibility of solution
• Say E/R Ratio = ( Time + Cost + No of Resources + Criticality of use case ) /
Business Value
• Effort – time & work reqd to implement use case, procurement, man hrs,
etc
• Barrier to entry – open source tool – less barrier to entry v/s proprietary –
procurement, risk analysis, approval
• Multiple Units – resources already engaged in other projects
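The E/R ratio above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not say how time, cost, resources and criticality are brought onto a common scale, so the sketch assumes each factor (and the business value) is scored by hand on the same 1-10 scale. The names `UseCase`, `effort_to_reward` and `rank_use_cases`, and the two example use cases, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    """A candidate use case; every score is ASSUMED to be on a shared 1-10 scale."""
    name: str
    time: float            # time needed to implement
    cost: float            # budget needed
    resources: float       # number of people / resources needed
    criticality: float     # criticality of the use case (in the numerator, as on the slide)
    business_value: float  # expected business value


def effort_to_reward(uc: UseCase) -> float:
    """E/R = (time + cost + resources + criticality) / business value."""
    if uc.business_value <= 0:
        raise ValueError("business_value must be positive")
    effort = uc.time + uc.cost + uc.resources + uc.criticality
    return effort / uc.business_value


def rank_use_cases(use_cases):
    """Return use cases ordered from lowest (best) to highest E/R ratio."""
    return sorted(use_cases, key=effort_to_reward)


# Hypothetical example data, not taken from the course material.
candidates = [
    UseCase("Enterprise data lake", time=9, cost=8, resources=7,
            criticality=6, business_value=10),
    UseCase("Churn dashboard", time=2, cost=2, resources=1,
            criticality=3, business_value=8),
]

ranked = rank_use_cases(candidates)
for uc in ranked:
    print(f"{uc.name}: E/R = {effort_to_reward(uc):.2f}")
```

With these made-up scores the small dashboard gets the lower ratio and is ranked first, which matches the idea of a small, quick use case as an early win.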
Leverage Early Wins
• Paves the way for a bigger strategy & implementation across the organisation
• First crucial step in showing the value to the stakeholders & decision
makers
• Gets past sceptics or those who are not aware
Implementation Life Cycle
• Multiple Steps
• Trial & Error
• Perseverance
• Multiple Stakeholders – Collaborative Effort – best results
Stakeholders of the Solution
• Depends on the Use Case & Domain
• Business Sponsor – Individual / BU gives support & funding for Project
Most likely the beneficiary of the solution or max impact on Unit
• Implementation group – the person / team who will implement the solution hands-on
Most often the IT or Analytics Unit
• IT Procurement – Vetting – technology, cost, organisational relevance &
viability, compliance with internal / external policies, other aspects –
licensing, costs, upgrades, etc
• Legal – Terms & Conditions – permissions of use, restrictions
Proprietary – requires more vendor specific agreements & time for approval
Implementing the Solution
• Result of Collaboration & culmination of all activity
• Small size project – 3-6 months to implement
• Big project – months to years – add capabilities incrementally during
implementation & deployment
Technical Elements of the Big Data Platform
• Selection of the h/w stack
• Selection of the s/w & BI platform
• On premises
• Cloud based
Selection of the h/w stack
• Depends on type of solution chosen
• Location of h/w
• Type of data – structured, unstructured, semi-structured
• Size of data – GB, TB, PB
• Update frequency of data
Models of h/w architecture
Multinode Architecture