
PPT 1.1.3

The document outlines the syllabus for a Big Data Analytics course at Chandigarh University, covering key topics such as the definition and characteristics of Big Data, its architecture, and the Hadoop ecosystem. It emphasizes the importance of data veracity and value, detailing challenges and methods to ensure data quality and derive actionable insights. Additionally, it provides reference materials and web sources for further learning.


Computer Science & Engineering

CHANDIGARH UNIVERSITY, MOHALI

Big Data Analytics

21CSH-471

By: Urvashi
Assistant Professor (Chandigarh University)
Contents to be covered in UNIT 1
UNIT-1 Overview of Computing Paradigm (Contact Hours: 15)

Chapter 1: Understanding Big Data and the 5 V’s
Introduction to Big Data – Definition and Characteristics; The 5 V’s of Big Data – Volume: Data at scale, Velocity: Real-time data processing, Variety: Structured, semi-structured, unstructured data, Veracity: Uncertainty and trustworthiness in data, Value: Transforming data into insights; Challenges and Opportunities in Big Data; Big Data Use Cases in Real-World Applications

Chapter 2: Big Data Architecture
Fundamentals of Big Data Architecture: Data ingestion, storage, processing and visualization layers; Streaming Data in Big Data: Tools such as Spark, Apache Kafka and Flink; Real-World Big Data Architecture: Lambda and Kappa Architectures, Hybrid Architecture for batch and real-time processing

Chapter 3: The Hadoop Ecosystem
Introduction to the Hadoop Ecosystem; HDFS (Hadoop Distributed File System): Architecture and Functionality; MapReduce Programming Model: Workflow and Applications; YARN (Yet Another Resource Negotiator): Resource Management; Tools in the Ecosystem: Pig, HBase, Flume, and Oozie; Data Processing with Hadoop: ETL, Analytics and Reporting.
Course Outcomes

CO1 Understand the Fundamentals of Big Data
CO2 Master Big Data Architecture and Tools
CO3 Explore the Hadoop Ecosystem and Data Processing Models
CO4 Develop Data Science Skills and Tools
CO5 Implement Real-Time Data Analytics and Visualization

4. Veracity in Big Data

Definition:
Veracity in big data pertains to the uncertainty, inconsistency, and
inaccuracies inherent in data sources. It reflects the degree of
confidence or trust that users can place in the data they work with.
High-veracity data is clean, consistent, and reliable, while low-veracity
data may suffer from errors, biases, or ambiguities.
Sources of Veracity Challenges
Several factors contribute to challenges in maintaining data veracity, including:
• Data Inconsistencies: Inconsistent formats, duplicate records, or outdated
information.
• Data Noise: Inclusion of irrelevant or meaningless data that hinders analysis.
• Human Errors: Mistakes during data entry or manipulation.
• Biases: Systemic errors introduced by data collection processes or inherent
biases in algorithms.
• Unverified Sources: Use of unreliable data sources or lack of validation
mechanisms.
Importance of Veracity
Ensuring veracity is crucial because decisions based on low-quality data can
lead to:
• Faulty analytics and inaccurate insights.
• Reduced stakeholder trust in big data systems.
• Financial losses and reputational damage.
• Ethical and compliance risks.
Methods to Address Veracity Challenges
• Data Cleaning and Preprocessing
o Removing duplicates and inconsistencies.
o Standardizing formats and correcting errors.
• Data Validation
o Using automated tools to cross-check data accuracy.
o Validating data from third-party sources before integration.
• Metadata Management
o Maintaining detailed metadata that records the context, origin, and change
history of the data.
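The cleaning and validation steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a production pipeline: the record fields (id, name, signup), the sample values, and the two accepted date formats are all assumptions made up for the example.

```python
from datetime import datetime

# Hypothetical raw records; field names and values are invented for illustration.
RAW_RECORDS = [
    {"id": "1", "name": " Alice ", "signup": "2024-01-05"},
    {"id": "2", "name": "BOB", "signup": "05/02/2024"},
    {"id": "1", "name": "alice", "signup": "2024-01-05"},  # duplicate of id 1
    {"id": "3", "name": "Carol", "signup": "not a date"},  # fails validation
]

# Date formats we assume the sources use (day/month/year for the slash form).
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def standardize_date(text):
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean(records):
    """Standardize formats, drop duplicates by id, and set aside invalid rows."""
    seen_ids = set()
    valid, rejected = [], []
    for rec in records:
        row = {
            "id": rec["id"].strip(),
            "name": rec["name"].strip().title(),
            "signup": standardize_date(rec["signup"]),
        }
        if row["id"] in seen_ids:
            continue  # duplicate: keep the first occurrence only
        seen_ids.add(row["id"])
        if row["signup"] is None:
            rejected.append(row)  # validation failed: date could not be parsed
        else:
            valid.append(row)
    return valid, rejected


valid_rows, rejected_rows = clean(RAW_RECORDS)
print(valid_rows)
print(rejected_rows)
```

Rejected rows are kept in a separate list rather than silently dropped, so they can be reviewed or corrected later.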
Methods to Address Veracity Challenges
• Advanced Analytics
o Leveraging machine learning models to detect anomalies and
inconsistencies in datasets.
• Transparent Governance
o Implementing policies for data collection, handling, and quality assurance.
o Periodically auditing data for reliability.
• Crowdsourcing Veracity
o Engaging users to identify and correct errors in large datasets, especially for
open data projects.
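As a small illustration of anomaly detection, the sketch below flags outliers with the modified z-score (based on the median and the median absolute deviation). Note that this is a simple statistical stand-in, not a machine learning model; the sensor readings are invented, and the 3.5 cutoff is a commonly used rule of thumb rather than a fixed standard.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        return []  # no spread around the median: nothing can be scored
    # 0.6745 rescales the MAD so scores are comparable to ordinary z-scores
    # when the data is roughly normally distributed.
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


# Hypothetical temperature readings with one suspicious value.
readings = [20.1, 19.8, 20.3, 20.0, 19.9, 85.0, 20.2]
print(mad_outliers(readings))
```

The median-based score is used here instead of the mean and standard deviation because a single extreme value distorts the mean, which can hide the very anomaly being searched for.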
5. Value in Big Data
Definition:
Big Data has emerged as a cornerstone of modern
decision-making and innovation across industries. By
analyzing massive datasets, organizations can uncover
insights that drive growth, improve efficiency, and
enhance customer experiences. This section outlines the
dimensions of value derived from Big Data, its
applications, and key considerations for its effective use.
Value Creation from Big Data
The core value of Big Data lies in its potential to transform raw information into actionable insights.
This value can be categorized into several areas:
a. Operational Efficiency
• Streamlining processes and reducing waste.
• Predictive maintenance in manufacturing using sensor data.

b. Enhanced Customer Insights
• Personalizing marketing campaigns.
• Predicting customer behavior and improving retention (e.g., Netflix, Amazon).

c. Competitive Advantage
• Identifying emerging trends ahead of competitors.
• Innovating new products and services based on data insights.
Challenges in Extracting Value
While Big Data holds immense promise, several challenges must be addressed to
maximize its value:
• Data Quality: Ensuring data accuracy, completeness, and consistency.
• Data Privacy and Security: Protecting sensitive information against breaches.
• Skill Gaps: Developing expertise in data analytics and related technologies.
• Infrastructure Costs: Investing in storage, processing, and tools for Big Data.
• Ethical Concerns: Preventing biases and ensuring fairness in data usage.
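To make the data quality challenge concrete, one simple measure is completeness: the share of records that have a usable value for each field. The sketch below is only an illustration; the customer records and field names are hypothetical, and treating None and the empty string as "missing" is an assumption that real datasets may need to extend.

```python
def completeness(records, fields):
    """Return, for each field, the fraction of records with a non-empty value."""
    total = len(records)
    report = {}
    for field in fields:
        # A value counts as filled unless it is absent, None, or an empty string.
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        report[field] = filled / total if total else 0.0
    return report


# Hypothetical customer records with some gaps.
customers = [
    {"id": 1, "email": "a@example.com", "city": "Mohali"},
    {"id": 2, "email": "", "city": "Delhi"},
    {"id": 3, "email": "c@example.com", "city": None},
    {"id": 4, "email": "d@example.com"},
]

report = completeness(customers, ["id", "email", "city"])
print(report)  # {'id': 1.0, 'email': 0.75, 'city': 0.5}
```

A low score for a field signals that analyses depending on it may be unreliable until the gaps are filled or accounted for.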
Technologies Driving Big Data Value
The following technologies facilitate the collection, storage, analysis, and
visualization of Big Data:
• Artificial Intelligence (AI) and Machine Learning (ML): Automating data analysis
to uncover patterns and predictions.
• Cloud Computing: Scalable storage and processing capabilities.
• Internet of Things (IoT): Generating real-time data from connected devices.
• Blockchain: Ensuring data integrity and traceability.
• Data Visualization Tools: Simplifying complex datasets into actionable insights.
Reference Books
TEXT BOOKS
1. Mohammed Guller, “Big Data Analytics with Spark”, Apress, 2015.
2. Tom Mitchell, “Machine Learning”, McGraw Hill, 3rd Edition, 1997.
3. Michael Minelli, Michele Chambers, Ambiga Dhiraj, “Big Data, Big Analytics: Emerging Business
Intelligence and Analytic Trends for Today’s Businesses”, 1st Edition, Wiley
CIO Series, 2013.
4. Arvind Sathi, “Big Data Analytics: Disruptive Technologies for Changing the Game”, 1st
Edition, IBM Corporation, 2012.

REFERENCE BOOKS
5. Chris Eaton, Dirk deRoos et al., “Understanding Big Data”, McGraw Hill, 2012.
6. Vignesh Prajapati, “Big Data Analytics with R and Hadoop”, Packt Publishing, 2013.
7. Jay Liebowitz, “Big Data and Business Analytics”, CRC Press, 2013.
For more insight
Web sources
1. https://www.alliant.edu/blog/4-top-online-resources-data-analytics?utm_source=chatgpt.com
2. https://www.coursera.org/articles/big-data-technologies?utm_source=chatgpt.com
3. https://careerfoundry.com/en/blog/data-analytics/where-to-find-free-datasets/?utm_source=chatgpt.com
THANK YOU

For queries
Email: [email protected]
