
Big Data Analytics

(Course Code: AI732PE)

Dr. A. Pramod Kumar
Associate Professor,
Dept. of AIML.,
CMR Engineering College (Autonomous),
Email: [email protected]
Contact:9000159660
COURSE OUTCOMES
Upon successful completion of this course, student will be able to:

❑ CO1: Outline the basic big data concepts


❑ CO2: Simulate and apply various big data technologies like Hadoop, Map
Reduce, Spark, Impala, Pig and Hive.
❑ CO3: Categorize and summarize the Big Data and its importance in Business
domains.
❑ CO4: Differentiate various learning approaches in machine learning to process
data, and to interpret the concepts of ML algorithms and test cases
❑ CO5: Develop the numerous features of data for visualization in association with
Tableau, QlikView and D3.
Course Assessment

Course Title: BIG DATA ANALYTICS              Course Type: Integrated
Course Code: AI732PE        Credits: 3        Class: IV Year, I Semester

Course Structure:
TLP        Credits    Contact Hours    Work Load
Theory        3             4               4
Practice      0             0               0
Tutorial      -             -               -
Total         3             4               4

Total Number of Classes per Semester: Theory 42, Practical 0
Assessment Weightage: CIE 30%, SEE 70%

Course Lead:
Course Instructors: Theory - Dr. A. Pramod Kumar; Practice - Dr. A. Pramod Kumar
COURSE SYLLABUS
UNIT I: Data Management Maintain Healthy, Safe & Secure Working Environment:
Data Management: Design Data Architecture and manage the data for analysis, understand various
sources of Data like Sensors/signal/GPS etc. Data Management, Data Quality (noise, outliers, missing
values, duplicate data) and Data Preprocessing. Export all the data onto Cloud ex. AWS/Rackspace etc.
Maintain Healthy, Safe & Secure Working Environment: Introduction, workplace safety, Report
Accidents & Emergencies, Protect health & safety as you work, course conclusion, and assessment.

UNIT- II: Big Data Tools & Provide Data/Information in Standard Formats :
Big Data Tools: Introduction to Big Data tools like Hadoop, Spark, Impala etc., Data ETL process,
Identify gaps in the data and follow-up for decision making.
Provide Data/Information in Standard Formats: Introduction, Knowledge Management, and
Standardized reporting & compliances, Decision Models, course conclusion. Assessment

UNIT- III: Big Data Analytics :


Big Data Analytics: Run descriptives to understand the nature of the available data, collate all the
data sources to suffice the business requirement, run descriptive statistics for all the variables and
observe the data ranges, outlier detection and elimination.

UNIT- IV: Machine Learning Algorithms:


Machine Learning Algorithms: Hypothesis testing and determining the multiple analytical
methodologies, Train Model on 2/3 sample data using various Statistical/Machine learning
algorithms, Test model on 1/3 sample for prediction etc.
Unit V: Data Visualization:
Data Visualization: Prepare the data for Visualization, Use tools like Tableau, QlikView and D3, Draw
insights out of Visualization tool. Product Implementation.
TEXT BOOKS
❑ Michael Minelli, Michelle Chambers and Ambiga Dhiraj, “Big Data, Big Analytics: Emerging
Business Intelligence and Analytic Trends for Today's Businesses”, Wiley, 2013.
❑ Arvind Sathi, “Big Data Analytics: Disruptive Technologies for Changing the Game”, 1st Edition,
IBM Corporation, 2012.
❑ Davy Cielen, Arno D. B. Meysman, and Mohamed Ali, “Introducing Data Science - Big Data,
Machine Learning, and More, Using Python Tools”, Dreamtech Press, 2016.
❑ EMC Education Services, “Data Science & Big Data Analytics: Discovering, Analyzing, Visualizing
and Presenting Data”, Wiley Publishers, 2015.
REFERENCE BOOKS:
❑ Tan, Steinbach and Kumar, “Introduction to Data Mining”, Addison Wesley, 2006.
❑ Cay Horstmann, “Big Java”, 4th Edition, John Wiley & Sons, Inc.
❑ M. Zaki and W. Meira, “Data Mining and Analysis: Fundamental Concepts and Algorithms”,
Cambridge University Press, 2014.
NPTEL/SWAYAM/MOOCS:
❑ https://onlinecourses.nptel.ac.in/noc23_cs112/preview
E books
❑ https://bmsce.ac.in/Content/IS/Big_Data_Analytics_-_Unit_1.pdf
❑ https://mrcet.com/downloads/digital_notes/IT/(R17A0528)%20BIG%20DATA%20ANALYTICS.pdf
E-RESOURCES:
1. http://freevideolectures.com/Course/3613/Big-Data-and-Hadoop/18
2. http://www.comp.nus.edu.sg/~ooibc/mapreduce-survey.pdf
UNIT-III

Big Data Analytics

Run descriptives to understand the nature of the available data
Nowadays, Big Data and Data Science have become high-profile keywords. They are
extensively researched, and this means the data has to be processed and studied with
scrutiny. One of the techniques used to analyse this data is descriptive analysis.
Big data analytics is the process of examining large data sets containing a variety of data
types -- i.e., big data -- to uncover hidden patterns, unknown correlations, market trends,
customer preferences and other useful business information.
The analytical findings can lead to more effective marketing, new revenue opportunities,
better customer service, improved operational efficiency, competitive advantages over
rival organizations and other business benefits.
The primary goal of big data analytics is to help companies make more informed business
decisions by enabling data scientists, predictive modelers and other analytics professionals
to analyze large volumes of transaction data, as well as other forms of data that may be
untapped by conventional business intelligence (BI) programs.
Relational and transactional databases based on the SQL language have clearly dominated the
market for data storage and data manipulation over the past 20 years.
In moving from relational databases to Big Data, practitioners had to face five major weaknesses
of relational databases: the scaling of processing, the scaling of data, redundancy, velocity, and
the variety and complexity of data.
Descriptive analytics is a branch of data analytics that involves summarizing and
interpreting historical data to understand patterns
As a relational database, it provides a set of functionalities to access data across several
entities (tables) through complex queries. It also provides referential integrity to ensure the
constant validity of the links between entities.
Such mechanisms are extremely costly and complex to implement in a distributed
architecture, considering that it is necessary to ensure that all data that are linked together
are hosted on the same node. Moreover, it implies the definition of a static data model
or schema, which is not well suited to the velocity of web data.

As a transactional database, they must respect the ACID constraints, i.e. the Atomicity of
updates, the Consistency of the database, and the Isolation and Durability of transactions.
These constraints are perfectly applicable in a centralized architecture, but much more
complex to ensure in a distributed architecture.
NoSQL and Big Data (the CAP properties):
Coherence (consistency): all the nodes of the system have to see
exactly the same data at the same time.
Availability: the system must stay up and running
even if one of its nodes fails.
Partition tolerance: each sub-network (partition) must be able to
operate autonomously.
In descriptive analysis, we describe our data with the help of
various representative methods such as
charts, graphs, tables, Excel files, etc.
Most of the time it is performed on small data sets, and this analysis helps us a lot
to predict some future trends based on the current findings. Some measures that
are used to describe a data set are measures of central tendency and measures of
variability or dispersion.
Types of Descriptive Statistics
1. Measures of central tendency
2. Measures of variability
3. Measures of frequency distribution

Measures of Central Tendency


It represents the whole set of data by a single value. It gives us the location of the
central points. There are three main measures of central tendency
Mean
It is the sum of observations divided by the total number of observations. It is also
defined as the average, i.e., the sum divided by the count.

Mean (x̄) = (x1 + x2 + … + xn) / n = Σx / n, where x = observations and n = number of terms.


Mode
It is the value that has the highest frequency in the given data set. The data set
may have no mode if the frequency of all data points is the same. we can have
more than one mode if we encounter two or more data points having the same
frequency.
Median
It is the middle value of the data set. It splits the data into two halves. If the
number of elements in the data set is odd then the center element is the median
and if it is even then the median would be the average of two central elements.
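
To make these three measures concrete, here is a minimal sketch using Python's standard statistics module; the numbers are made up purely for illustration.

# Mean, median and mode of a small made-up data set, using the standard library.
import statistics

data = [12, 15, 11, 15, 18, 21, 15, 11, 19]

mean_value = statistics.mean(data)      # sum of observations divided by their count
median_value = statistics.median(data)  # middle value of the sorted data
mode_value = statistics.mode(data)      # value with the highest frequency

print("Mean:", mean_value)
print("Median:", median_value)
print("Mode:", mode_value)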
Measure of Variability
Measures of variability are also termed measures of dispersion, as they help to gain
insights about the dispersion or the spread of the observations at hand.
Range
The range describes the difference between the largest and smallest data point in
our data set. The bigger the range, the more the spread of data and vice versa.
Range = Largest data value – smallest data value
Variance
It is defined as an average squared deviation from the mean. It is calculated by
finding the difference between every data point and the average which is also
known as the mean, squaring them, adding all of them, and then dividing by the
number of data points present in our data set.

Variance (σ²) = Σ(x − μ)² / N, where x = observation under consideration, N = number of terms, and μ = mean.
Standard Deviation
It is defined as the square root of the variance. It is calculated by finding the
mean, subtracting each observation from the mean, squaring the results, averaging
them, and taking the square root.

Standard deviation (σ) = √( Σ(x − μ)² / N ), where x = observation under consideration, N = number of terms, and μ = mean.
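
The measures of variability can be computed the same way; a minimal Python sketch on made-up numbers, using the population formulas given above:

# Range, population variance and standard deviation of a small made-up sample.
import statistics

data = [12, 15, 11, 15, 18, 21, 15, 11, 19]

data_range = max(data) - min(data)      # largest data value - smallest data value
variance = statistics.pvariance(data)   # average squared deviation from the mean (divides by N)
std_dev = statistics.pstdev(data)       # square root of the variance

print("Range:", data_range)
print("Variance:", variance)
print("Standard deviation:", std_dev)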
Measures of Frequency Distribution
Measures of frequency distribution help us gain valuable insights into the
distribution and characteristics of the dataset. Common measures include count
(frequency), relative frequency, and cumulative frequency.
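
A short Python sketch of these frequency measures on a made-up categorical sample, using only the standard library:

# Count, relative and cumulative frequency of a made-up categorical sample.
from collections import Counter

observations = ["red", "blue", "red", "green", "blue", "red", "blue", "blue"]

counts = Counter(observations)          # count frequency of each category
total = sum(counts.values())

cumulative = 0
for category, count in counts.most_common():
    relative = count / total            # relative frequency
    cumulative += count                 # cumulative frequency
    print(category, count, round(relative, 2), cumulative)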

Collate all the data sources to suffice the business requirement


Key-value store Concept
This technology can address a large volume of data due to the simplicity of its
data model. Each object is identified by a unique key, and access to the data is
only possible through this key.
The structure of the object is free. This model only provides the four basic
operations to Create, Read, Update and Delete (CRUD) an object from its key.
These databases typically expose an HTTP REST API as a façade so that they can
interoperate with any language.
This simple approach has the benefit of providing exceptional performance in read
and write access, and a large scalability of data.
However, it provides only limited querying facilities, since data can only be
retrieved by key, and not by content.
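
As a rough illustration of this contract (a sketch only, not the API of any particular key-value product), the class below imitates the key-only CRUD model with an in-memory Python dictionary:

# In-memory sketch of the key-value model: every object is reachable only
# through its unique key, via the four CRUD operations.
class KeyValueStore:
    def __init__(self):
        self._data = {}

    def create(self, key, value):
        if key in self._data:
            raise KeyError(f"key already exists: {key}")
        self._data[key] = value

    def read(self, key):
        return self._data[key]          # access is by key only, never by content

    def update(self, key, value):
        if key not in self._data:
            raise KeyError(f"unknown key: {key}")
        self._data[key] = value

    def delete(self, key):
        del self._data[key]

store = KeyValueStore()
store.create("user:42", {"name": "Alice", "city": "Hyderabad"})
print(store.read("user:42"))
store.update("user:42", {"name": "Alice", "city": "Chennai"})
store.delete("user:42")

Because access goes through the key alone, any content-based query would require scanning every value, which is exactly the querying limitation noted above.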

Column-based databases Concept

Column-based databases store data in grids, in which the column is the basic
entity that represents a data field.
Columns can be grouped together through the concept of column families. Rows of
the grid correspond to records and are identified by a unique key, as in the
key-value model.
Some providers also include in their model the concept of version as a third
dimension of the grid.
The organization of the database in grids can appear similar to the tables of
relational databases.
While the columns of a relational table are static and present for each record, this
is not the case in column-oriented databases, so it is possible to dynamically
add a column to a table with no cost in terms of storage space.

These databases are designed to store up to several million columns, which can
be fields of an entity or one-to-many relationships.
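
A rough Python sketch of the grid idea (illustrative only, not the API of any real column store): a row key maps to column families, each holding a dynamic set of columns:

# Sketch of the column-family grid: row key -> column family -> column -> value.
# Columns are dynamic, so different rows may carry different columns at no storage cost.
rows = {
    "user:1": {
        "profile": {"name": "Alice", "city": "Hyderabad"},
        "activity": {"last_login": "2024-01-10"},
    },
    "user:2": {
        "profile": {"name": "Bob"},                        # fewer columns than user:1
        "activity": {"last_login": "2024-01-12", "logins": 37},
    },
}

# Adding a new column to one row does not affect any other row.
rows["user:2"]["profile"]["city"] = "Chennai"
print(rows["user:2"]["profile"])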
Document-based databases Concept
Document-based databases are similar to key-value stores, except that the value
associated with the key can be a structured and complex object rather than a
simple type.
These complex objects are generally structured in XML or JSON.
This approach allows queries to be run on the content of the
documents, and not only through the key of the record.
The simplicity and flexibility of this data model make it particularly applicable
to Content Management Systems (CMS).
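
The following sketch (made-up documents, no specific product implied) shows the key difference from a plain key-value store: queries can filter on document content, not only on the key:

# Sketch of the document model: values are structured, JSON-like documents,
# so queries can filter on content rather than only on the key.
documents = {
    "article:1": {"title": "Big Data Basics", "tags": ["hadoop", "intro"], "views": 120},
    "article:2": {"title": "Spark Streaming", "tags": ["spark"], "views": 340},
    "article:3": {"title": "Hive for Analysts", "tags": ["hive", "sql"], "views": 95},
}

def find(predicate):
    """Return the documents whose content satisfies the given predicate."""
    return {key: doc for key, doc in documents.items() if predicate(doc)}

popular = find(lambda doc: doc["views"] > 100)      # content-based query
print(popular)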
Graph databases Concept
The graph paradigm is a data model in which entities are nodes and associations
between entities are arcs or relationships. Both nodes and relationships are
characterized by a set of properties.
This category of databases is typically designed to address the complexity of
data more than its volume;
they are applied in cartography, social networks, and more generally in network
modelling.
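
A minimal Python sketch of the graph model, with properties on both nodes and relationships and a simple traversal helper (names and properties are invented for illustration):

# Sketch of the graph model: nodes and relationships, both carrying properties.
nodes = {
    "alice": {"label": "Person", "age": 30},
    "bob":   {"label": "Person", "age": 34},
    "acme":  {"label": "Company"},
}

# Each relationship is (source node, relation type, target node, properties).
relationships = [
    ("alice", "FRIEND_OF", "bob", {"since": 2015}),
    ("alice", "WORKS_AT", "acme", {"role": "analyst"}),
    ("bob", "WORKS_AT", "acme", {"role": "engineer"}),
]

def neighbours(node, relation):
    """Traverse the outgoing relationships of a given type from a node."""
    return [target for source, rel, target, _ in relationships
            if source == node and rel == relation]

print(neighbours("alice", "WORKS_AT"))   # ['acme']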
MapReduce
MapReduce is a programming model or pattern within the Hadoop framework that
is used to access big data stored in the Hadoop File System (HDFS). It is a core
component, integral to the functioning of the Hadoop framework.
MapReduce is a processing technique and a program model for distributed
computing based on Java. MapReduce is responsible for processing the data files.

▪ Map
➢ Iterate over a large number of records
➢ Extract data of interest
➢ Shuffle and sort intermediate results
▪ Reduce
➢ Aggregate intermediate results
➢ Generate final output
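
To tie the two phases together, here is a small, self-contained word-count sketch written in plain Python. It runs on a single machine and only illustrates the map, shuffle/sort and reduce steps of the programming model; it is not Hadoop code.

# Word count expressed in MapReduce style: map -> shuffle/sort -> reduce.
from collections import defaultdict

records = ["big data needs big tools", "map reduce maps and reduces data"]

# Map: iterate over the records and emit intermediate (key, value) pairs.
intermediate = []
for record in records:
    for word in record.split():
        intermediate.append((word, 1))

# Shuffle and sort: group all intermediate values by key.
groups = defaultdict(list)
for key, value in sorted(intermediate):
    groups[key].append(value)

# Reduce: aggregate each group into the final output.
word_counts = {key: sum(values) for key, values in groups.items()}
print(word_counts)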
Components of MapReduce
1. PayLoad: The applications implement Map and Reduce functions and form the core of
the job
2. MR Unit: Unit test framework for MapReduce
3. Mapper: Mapper maps the input key/value pairs to the set of intermediate key/value
pairs
4. Name Node: Node that manages the HDFS is known as the Name Node
5. Data Node: Node where the data is present before processing takes place
6. Master Node: Node where the JobTracker runs and which accepts job requests from the
clients
7. Slave Node: Node where the Map and Reduce program runs
8. Job Tracker: Schedules jobs and tracks the assigned jobs to the task tracker
9. Task Tracker: Tracks the task and updates the status to the job tracker
10. Job: A program that is an execution of a Mapper and Reducer across a dataset
11. Task: An execution of Mapper and Reducer on a piece of data
12. Task Attempt: A particular instance of an attempt to execute a task on a Slave Node
Abstraction of MapReduce
1. Hive - Query engine (uses SQL-like queries)
2. Pig - dataflow scripting platform, developed at Yahoo
3. Sqoop - SQL + Hadoop, used to transfer data between Hadoop and relational databases
4. Oozie - workflow scheduler, developed at Yahoo, used for automating/scheduling jobs
MapReduce Job workflow
Hadoop API
Hadoop MapReduce is a software framework for easily writing applications
which process vast amounts of data in parallel on large clusters (thousands of
nodes) of commodity hardware in a reliable, fault-tolerant manner.
The term MapReduce actually refers to the following two different tasks that
Hadoop programs perform:
1. The Map Task: This is the first task, which takes input data and converts it into
a set of data, where individual elements are broken down into tuples (key/value
pairs).
2. The Reduce Task: This task takes the output from a map task as input and
combines those data tuples into a smaller set of tuples. The reduce task is always
performed after the map task.

The framework takes care of scheduling tasks, monitoring them and re-executes
the failed tasks.
The MapReduce framework consists of a single master Job Tracker and one slave
Task Tracker per cluster-node.
The master is responsible for resource management, tracking resource
consumption/availability and scheduling the jobs component tasks on the slaves,
monitoring them and re-executing the failed tasks.
The slave TaskTrackers execute the tasks as directed by the master and provide
task-status information to the master periodically.
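
As a hedged illustration of how such Map and Reduce tasks are often written in practice, the two scripts below follow the Hadoop Streaming convention of reading lines from standard input and emitting tab-separated key/value pairs on standard output. The file names mapper.py and reducer.py are only example names, and Python being available on the cluster nodes is an assumption.

# mapper.py -- example Map task in Hadoop Streaming style:
# read raw text lines from stdin and emit "word<TAB>1" pairs on stdout.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print(f"{word}\t1")

# reducer.py -- example Reduce task in Hadoop Streaming style:
# input arrives sorted by key, so counts for each word can be accumulated in sequence.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.strip().split("\t")
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print(f"{current_word}\t{current_count}")
        current_word, current_count = word, int(count)

if current_word is not None:
    print(f"{current_word}\t{current_count}")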
Run descriptive statistics for all the variables and observe the data ranges
Called the “simplest class of analytics”, descriptive analytics allows you to
condense big data into smaller, more useful bits of information
It has been estimated that more than 80% of business analytics (e.g. social
analytics) are descriptive.
Some social data could include the number of posts, fans, followers, page views,
check-ins, pins, etc. It would appear to be an endless list if we tried to list them all.
Outlier detection and elimination.
An outlier is a data point that significantly deviates from the rest of the data. It
can be either much higher or much lower than the other data points, and its
presence can have a significant impact on the results of machine learning
algorithms. They can be caused by measurement or execution errors. There are
two main types of outliers
Global outliers: these are isolated data points that are far away from the main
body of the data. They are often easy to identify and remove.
Contextual outliers: these are data points that are unusual in a specific context
but may not be outliers in a different context. They are often more difficult to
identify and may require additional information or domain knowledge to
determine their significance
• Data that do not conform to the normal and expected patterns are outliers.
• Outlier detection has a wide range of applications in various domains, including finance,
security, and intrusion detection in cyber security.
• The criteria for what constitutes an outlier depend on the problem domain.
• It typically involves large amounts of data, which may be unstructured.
Outlier Detection Methods
Outlier detection plays a crucial role in ensuring the quality and accuracy of
machine learning models. By identifying and removing or handling outliers
effectively, we can prevent them from biasing the model and reducing its
performance.
Statistical Methods:
Z-Score: This method measures how many standard deviations a data point lies from
the mean and flags points whose z-score exceeds a certain threshold (typically 3 or -3)
as outliers.
Interquartile Range (IQR): IQR identifies outliers as data points falling outside
the range defined by Q1-k*(Q3-Q1) and Q3+k*(Q3-Q1), where Q1 and Q3 are
the first and third quartiles, and k is a factor (typically 1.5).
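
A compact sketch of both statistical rules on made-up numbers, assuming NumPy is available:

# Flag outliers with the Z-score rule (|z| > 3) and the IQR rule (k = 1.5).
import numpy as np

data = np.array([10, 11, 12, 11, 10, 12, 13, 11, 12, 10, 11, 12, 13, 11, 120])  # 120 is the injected outlier

# Z-score rule: standardize and keep points whose |z| exceeds the threshold.
z_scores = (data - data.mean()) / data.std()
z_outliers = data[np.abs(z_scores) > 3]

# IQR rule: keep points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
iqr_outliers = data[(data < lower) | (data > upper)]

print("Z-score outliers:", z_outliers)
print("IQR outliers:", iqr_outliers)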
Distance-Based Methods:
K-Nearest Neighbors (KNN): KNN identifies outliers as data points whose K
nearest neighbors are far away from them.
Local Outlier Factor (LOF): This method calculates the local density of data
points and identifies outliers as those with significantly lower density compared
to their neighbors.
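
Assuming scikit-learn is available (an assumption, not something the notes prescribe), LOF can be applied in a few lines; fit_predict labels outliers as -1:

# Local Outlier Factor: points with much lower local density than their
# neighbours are labelled -1 (outlier); the rest are labelled 1 (inlier).
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(100, 2))    # dense, made-up cluster
extreme = np.array([[8.0, 8.0], [9.0, -7.0]])             # two isolated points
X = np.vstack([normal, extreme])

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)                               # -1 = outlier, 1 = inlier
print("Detected outliers:", X[labels == -1])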
Clustering-Based Methods:
Density-Based Spatial Clustering of Applications with Noise
(DBSCAN): DBSCAN clusters data points based on their density and
identifies outliers as points not belonging to any cluster.
Hierarchical clustering: It involves building a hierarchy of clusters by iteratively
merging or splitting clusters based on their similarity.
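
A short sketch of the DBSCAN approach, again assuming scikit-learn; the noise label -1 is read here as the outlier set:

# DBSCAN: points that do not fall inside any dense cluster get the noise label -1,
# which is treated here as the outlier set.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)
cluster_a = rng.normal(loc=(0.0, 0.0), scale=0.3, size=(50, 2))
cluster_b = rng.normal(loc=(5.0, 5.0), scale=0.3, size=(50, 2))
stray = np.array([[2.5, 2.5], [10.0, -3.0]])              # far from both clusters
X = np.vstack([cluster_a, cluster_b, stray])

labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)
print("Outliers (noise points):", X[labels == -1])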
Other Methods:
Isolation Forest: Isolation forest randomly isolates data points by splitting
features and identifies outliers as those isolated quickly and easily.
One-class Support Vector Machines (OCSVM): A one-class SVM learns a
boundary around the normal data and identifies outliers as points falling outside
that boundary.
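
A minimal Isolation Forest sketch, also assuming scikit-learn; as with LOF, fit_predict marks outliers with -1:

# Isolation Forest: points that are isolated by very few random splits
# are given the anomaly label -1.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(2)
X = rng.normal(loc=0.0, scale=1.0, size=(200, 2))         # made-up "normal" data
X = np.vstack([X, [[6.0, 6.0], [-7.0, 5.0]]])             # two injected anomalies

iso = IsolationForest(contamination=0.02, random_state=0)
labels = iso.fit_predict(X)                               # -1 = outlier, 1 = inlier
print("Detected outliers:", X[labels == -1])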
Outlier Removal
This involves identifying and removing outliers from the dataset before training
the model.
Thresholding: Outliers are identified as data points exceeding a certain threshold
(e.g., Z-score > 3).
Distance-based methods: Outliers are identified based on their distance from
their nearest neighbors.
Clustering: Outliers are identified as points not belonging to any cluster or
belonging to very small clusters.
