REAL TIME PROJECT:
Click Stream Data Analytics Report Project
ClickStream Data
ClickStream data is generated by any activity a user performs in a web application.
What does user activity on a website look like? Suppose I log in to Amazon: I may navigate through
some pages, spend time on certain pages and click on certain things. All these activities, including
arriving at a particular page or application, clicking, navigating from one page to another and spending
time there, make up a set of data that the web application logs. This data is known as ClickStream Data.
It has high business value, especially for e-commerce applications and for anyone who wants to
understand their users’ behavior.
More formally, ClickStream can be defined as data about the links that a user clicked, including the
point in time at which each of them was clicked. E-commerce businesses mine and analyse ClickStream
data on their own websites. Most e-commerce applications have a built-in system that mines all
this information.
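As a rough illustration of the definition above, the sketch below parses a few click records and estimates the time spent on each page as the gap between a user's consecutive clicks. The log format (`user_id,timestamp,page`), the sample rows and the function names are assumptions made for this example; the project's actual text file may look different.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical log format (an assumption): user_id,ISO timestamp,page
SAMPLE_LOG = """\
u1,2024-01-01T10:00:00,/home
u1,2024-01-01T10:00:30,/search
u1,2024-01-01T10:02:00,/product/42
u2,2024-01-01T10:01:00,/home
u2,2024-01-01T10:01:45,/cart
"""


def parse_clicks(text):
    """Turn raw log lines into (user, timestamp, page) tuples."""
    clicks = []
    for line in text.strip().splitlines():
        user, ts, page = line.split(",")
        clicks.append((user, datetime.fromisoformat(ts), page))
    return clicks


def time_on_page(clicks):
    """Seconds spent on each page, measured as the gap to the user's next click.

    The last page of each user has no following click, so it is skipped.
    """
    by_user = defaultdict(list)
    for user, ts, page in clicks:
        by_user[user].append((ts, page))

    totals = defaultdict(float)
    for events in by_user.values():
        events.sort()  # order each user's clicks by time
        for (ts, page), (next_ts, _) in zip(events, events[1:]):
            totals[page] += (next_ts - ts).total_seconds()
    return dict(totals)


clicks = parse_clicks(SAMPLE_LOG)
durations = time_on_page(clicks)
print(durations)
```

Measuring time on a page as the gap to the next click is only one possible convention; it cannot say anything about the final page of a visit.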
ClickStream Analytics
Analysing ClickStream data adds a lot of value to a business and can help it attract more
customers or visitors. Based on the navigation paths that people take, it helps the business understand
whether the application works as intended and whether users have a good or bad experience. The business
can also predict which page a user is most likely to visit next and use that for ad targeting. With this,
it can understand the needs of users and come up with better recommendations. Several other analyses
are possible with ClickStream data.
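One simple way to picture next-page prediction is to count how often one page follows another and pick the most frequent successor. The sketch below does exactly that; the sessions, page names and function names are hypothetical, and real recommendation systems typically use far richer models.

```python
from collections import Counter, defaultdict

# Hypothetical sessions: each list is the ordered pages one visitor viewed.
SESSIONS = [
    ["/home", "/search", "/product", "/cart"],
    ["/home", "/search", "/product"],
    ["/home", "/offers", "/product", "/cart"],
    ["/search", "/product", "/cart"],
]


def build_transitions(sessions):
    """Count how often each page is followed directly by each other page."""
    counts = defaultdict(Counter)
    for pages in sessions:
        for current, nxt in zip(pages, pages[1:]):
            counts[current][nxt] += 1
    return counts


def predict_next(counts, page):
    """Return the most frequent successor of `page`, or None if it was never left."""
    if page not in counts:
        return None
    return counts[page].most_common(1)[0][0]


transitions = build_transitions(SESSIONS)
print(predict_next(transitions, "/home"))
```

Pages that only ever appear at the end of a session, such as `/cart` here, have no recorded successor, so the function returns `None` for them.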
Project Scope
In this project, candidates are given sample click stream data, taken from a web
application, in a text file along with problem statements.
➢ User information in a MySQL database.
➢ Click stream data in a text file generated from the web application.
Each candidate has to come up with a high-level system architecture design based on the Hadoop
ecosystem components covered during the course. Each candidate presents the high-level system
architecture along with the chosen components, and the pros and cons are discussed with all the other
candidates. Finally, the best system design approach is chosen for implementation.
Candidates are given instructions to create an Oozie workflow with the Hadoop ecosystem components
finalized in the discussion. Candidates have to submit the project for the given problem statement, and
it will be validated by the trainer individually before course completion.
Ecosystem components involved in the click stream analytics project
➢ HDFS
➢ Sqoop
➢ Pig
➢ Hive
➢ Oozie
Big Data Hadoop Course Content
Chapter 1: Introduction to Big Data and Hadoop
➢ Overview of Hadoop Ecosystem
➢ Role of Hadoop in Big Data – Overview of other Big Data Systems
➢ Who is using Hadoop
➢ Hadoop integrations into Existing Software Products
➢ Current Scenario in Hadoop Ecosystem
➢ Installation
➢ Configuration
➢ Use Cases of Hadoop (Healthcare, Retail, Telecom)
Chapter 2: HDFS
➢ Concepts
➢ Architecture
➢ Data Flow (File Read, File Write)
➢ Fault Tolerance
➢ Shell Commands
➢ Data Flow Archives
➢ Coherency - Data Integrity
➢ Role of Secondary Name Node
Chapter 3: MapReduce
➢ Theory
➢ Data Flow (Map – Shuffle – Reduce)
➢ MapRed vs MapReduce APIs
➢ Programming [Mapper, Reducer, Combiner, Partitioner]
➢ Writables
➢ Input Format
➢ Output format
➢ Streaming API using Python
➢ Inherent Failure Handling using Speculative Execution
➢ Magic of Shuffle Phase
➢ File Formats
➢ Sequence Files
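The Map – Shuffle – Reduce data flow listed in this chapter can be pictured with a few lines of plain Python. This is only a single-process imitation of the three phases, counting page hits; it does not use any Hadoop API, and the input lines and function names are made up for the illustration.

```python
from collections import defaultdict

# Hypothetical input records: "<user> <page>" per line.
LINES = ["u1 /home", "u2 /home", "u1 /cart", "u3 /home", "u2 /cart", "u3 /search"]


def map_phase(line):
    """Map: emit a (page, 1) pair for every input record."""
    _user, page = line.split()
    yield page, 1


def shuffle_phase(pairs):
    """Shuffle: group all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single result."""
    return key, sum(values)


mapped = [pair for line in LINES for pair in map_phase(line)]
grouped = shuffle_phase(mapped)
page_hits = dict(reduce_phase(key, values) for key, values in grouped.items())
print(page_hits)
```

In a real cluster the map and reduce functions run on many nodes and the framework performs the shuffle over the network; the grouping step here only stands in for that.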
Chapter 4: HBase
➢ Introduction to NoSQL
➢ CAP Theorem
➢ Classification of NoSQL
➢ HBase and RDBMS
➢ HBase and HDFS
➢ Architecture (Read Path, Write Path, Compactions, Splits)
➢ Installation
➢ Configuration
➢ Role of Zookeeper
➢ HBase Shell
➢ Introduction to Filters
➢ Row Key Design
➢ What’s New in HBase
➢ Hands-On
Chapter 5: Hive
➢ Architecture
➢ Installation
➢ Configuration
➢ Hive vs RDBMS
➢ Tables
➢ DDL
➢ DML
➢ UDF
➢ Partitioning
➢ Bucketing
➢ Hive functions
➢ Date functions
➢ String functions
➢ Cast function
➢ Meta Store
➢ Joins
➢ Real-time HQL will be shared along with the database migration project
Chapter 6: Pig
➢ Architecture
➢ Installation
➢ Hive vs Pig
➢ Pig Latin Syntax
➢ Data Types
➢ Functions (Eval, Load/Store, String, Date Time)
➢ Joins
➢ UDFs - Performance
➢ Troubleshooting
➢ Commonly Used Functions
Chapter 7: Sqoop
➢ Architecture, Installation, Commands (Import, Hive Import, Eval, HBase Import, Import All
Tables, Export)
➢ Connectors to Existing DBs and DW
Practicals
➢ Use Sqoop to import real-time weblogs from the application into the DB, then try to export the same to
MySQL
Chapter 8: Kafka
➢ Kafka introduction
➢ Data streaming Introduction
➢ Producer-consumer-topics
➢ Brokers
➢ Partitions
➢ Unix Streaming via Kafka
Practicals
Kafka
➢ Set up a producer and subscribers, and publish messages on a topic from producer to subscriber
Chapter 9: Oozie
➢ Architecture
➢ Installation
➢ Workflow
➢ Coordinator
➢ Action (MapReduce, Hive, Pig, Sqoop)
➢ Introduction to Bundle
➢ Mail Notifications
Chapter 10: Hadoop 2.0 and Spark
➢ Limitations in Hadoop
➢ HDFS Federation
➢ High Availability in HDFS
➢ HDFS Snapshots
➢ Other Improvements in HDFS2
➢ Introduction to YARN aka MR2
➢ Limitations in MR1
➢ Architecture of YARN
➢ MapReduce Job Flow in YARN
➢ Introduction to Stinger Initiative and Tez
➢ Backward Compatibility for Hadoop 1.X
➢ Spark Fundamentals
➢ RDD - Sample Scala Program - Spark Streaming
Practicals
➢ Difference between Spark 1.x and Spark 2.x
➢ Word count program in PySpark
Chapter 11: Big Data Use cases
➢ Hadoop
➢ HDFS architecture and usage
➢ MapReduce Architecture and real time exercises
➢ Hadoop Ecosystem
➢ Sqoop - MySQL DB Migration
➢ Hive - Deep dive
➢ Pig - weblog parsing and ETL
➢ Oozie - Workflow scheduling
➢ Flume - weblogs ingestion
➢ NoSQL
➢ HBase
➢ Apache Kafka
➢ Pentaho ETL tool integration & working with Hadoop ecosystem
➢ Apache Spark
➢ Introduction and working with RDD
➢ Multi-node Setup Guidance
➢ Hadoop latest version pros & cons discussion
➢ Ends with an introduction to Data Science
Chapter 12: Real Time Project
➢ Getting the application's web logs
➢ Getting user information from MySQL via Sqoop
➢ Getting extracted data from Pig script
➢ Creating Hive SQL Table for querying
➢ Creating Reports from Hive QL
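The project steps above can be pictured end to end with a small in-memory sketch. Plain Python stands in for each tool here: a dictionary plays the user table that Sqoop would import from MySQL, a filter function plays the Pig extraction step, and a join-and-count function plays the Hive report query. The `city` column, the `user_id|page` log format and all sample rows are assumptions for illustration, not part of the course material.

```python
from collections import Counter

# Stand-in for the user table imported from MySQL (hypothetical rows and column).
USERS = {"u1": "Chennai", "u2": "Bangalore", "u3": "Chennai"}

# Stand-in for raw web log lines (hypothetical format: user_id|page).
RAW_LOG = ["u1|/home", "u2|/home", "bad line", "u3|/cart", "u1|/cart", "u9|/home"]


def extract(lines):
    """Pig-style step: keep only well-formed (user_id, page) records."""
    for line in lines:
        parts = line.split("|")
        if len(parts) == 2:
            yield parts[0], parts[1]


def report(records, users):
    """Hive-style step: join clicks to users, then count clicks per city."""
    per_city = Counter()
    for user_id, _page in records:
        city = users.get(user_id)
        if city is not None:  # inner join: drop clicks from unknown users
            per_city[city] += 1
    return dict(per_city)


clicks_per_city = report(extract(RAW_LOG), USERS)
print(clicks_per_city)
```

In the actual project each stage would run as its own action inside the Oozie workflow, with HDFS holding the intermediate data between stages.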