FDSNotes
Types of Data
• Structured Data:
• Definition: Data that is organized in a specific format, often in rows and columns,
making it easily searchable in databases.
• Examples: Excel sheets, SQL databases.
• Semi-structured Data:
• Definition: Data that doesn’t have a fixed format but includes tags or markers to
separate elements.
• Examples: XML, JSON files.
• Unstructured Data:
• Definition: Data that lacks a specific format or structure, making it more challenging
to process and analyze.
• Examples: Text documents, images, videos, emails.
• Problems with Unstructured Data:
• Storage Issues: Requires more space and advanced storage solutions.
• Processing Complexity: Difficult to process and analyze due to its lack of
structure.
• Interpretation Challenges: Requires advanced techniques like natural
language processing (NLP) or image recognition.
Data Sources
• Open Data: Publicly available data that can be freely used and shared. Examples include
government datasets, public health data, and environmental data.
• Social Media Data: Data generated from social media platforms, such as posts, likes,
shares, and comments. Useful for sentiment analysis and trend prediction.
• Multimodal Data: Data that combines multiple types of information, such as text, images,
and audio. Examples include video files with subtitles or annotated images.
• Standard Datasets: Widely used datasets in Data Science for benchmarking algorithms and
models. Examples include the Iris dataset, MNIST dataset, and ImageNet.
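Loading a standard dataset is usually a one-liner. A minimal sketch, assuming scikit-learn is installed, of loading the Iris dataset mentioned above:

from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target    # 150 samples, 4 numeric features, 3 classes
print(iris.feature_names)        # 'sepal length (cm)', 'sepal width (cm)', ...
print(X.shape, y.shape)          # (150, 4) (150,)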
Data Formats
• Integers and Floats:
• Integers: Whole numbers used for counting or indexing.
• Floats: Numbers with decimal points, used for representing continuous data.
• Text Data:
• Plain Text: Simple text data stored without any formatting (e.g., .txt files).
• Text Files:
• CSV Files: Comma-separated values, often used for storing tabular data.
• JSON Files: JavaScript Object Notation, used for storing and exchanging data.
• XML Files: Extensible Markup Language, used for encoding documents in a format
that is both human-readable and machine-readable.
• HTML Files: Hypertext Markup Language, used for creating web pages.
• Dense Numerical Arrays: Arrays containing numerical data, typically used in scientific
computing and data analysis (e.g., NumPy arrays).
• Compressed or Archived Data:
• Tar Files: Archive files that bundle multiple files and directories (tar itself does not
compress; it is often combined with GZip as .tar.gz).
• GZip Files: Compressed files that reduce storage space and transfer time.
• Zip Files: Archive files that can contain multiple files in a compressed format.
• Image Files:
• Rasterized Images: Images made up of pixels (e.g., JPEG, PNG).
• Vectorized Images: Images made up of paths and curves, scalable without losing
quality (e.g., SVG files).
• Compressed Images: Images compressed to reduce file size, either lossily (e.g.,
JPEG) or losslessly (e.g., PNG).
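Most of the text-based formats above can be read in a few lines. A minimal sketch using pandas and the standard library; the file names are hypothetical:

import json
import pandas as pd

df_csv = pd.read_csv("data.csv")      # tabular, comma-separated values
df_json = pd.read_json("data.json")   # JSON records into a DataFrame

with open("settings.json") as f:      # plain standard-library JSON parsing
    settings = json.load(f)

# XML documents and HTML tables can also be parsed (both need lxml installed):
# df_xml = pd.read_xml("data.xml")
# tables = pd.read_html("page.html")  # returns one DataFrame per <table>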
Concept of Outlier
• Definition: An outlier is a data point that significantly differs from other observations in a
dataset.
• Types of Outliers:
• Univariate Outliers: Outliers that occur in a single variable.
• Multivariate Outliers: Outliers that occur in a combination of variables, not
apparent when looking at individual variables.
• Contextual Outliers: Outliers that are only considered abnormal in a specific
context (e.g., temperature readings that are normal in summer but outliers in winter).
• Outlier Detection Methods:
• Z-Score Method: Calculates how many standard deviations a data point lies from the
mean, z = (x - μ) / σ. Data points with a Z-score beyond a certain threshold (e.g., ±3)
are considered outliers.
• IQR Method: Outliers are identified as data points that fall below Q1 - 1.5 × IQR or
above Q3 + 1.5 × IQR, where IQR = Q3 - Q1 (see the sketch below).
• Machine Learning Methods: Techniques like clustering, isolation forests, and one-
class SVMs can be used to detect outliers in more complex datasets.
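A minimal sketch of the Z-score and IQR methods with NumPy; the synthetic data and the ±3 threshold are illustrative:

import numpy as np

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(50, 5, 200), [120.0]])  # 120 is an injected outlier

# Z-score method: flag points more than 3 standard deviations from the mean.
z = (data - data.mean()) / data.std()
z_outliers = data[np.abs(z) > 3]

# IQR method: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
iqr_outliers = data[(data < q1 - 1.5 * iqr) | (data > q3 + 1.5 * iqr)]

print(z_outliers)    # flags 120.0
print(iqr_outliers)  # flags 120.0, and possibly a few mild tail values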
3. Data Preprocessing
Data Objects and Attribute Types
• What is an Attribute?
• Definition: An attribute (or feature) is a property or characteristic of an object or
data point. In a dataset, attributes are the columns that describe different aspects of
the data objects (rows).
• Types of Attributes:
• Nominal Attributes:
• Definition: Categorical attributes with no inherent order or ranking among
the values.
• Examples: Colors (red, blue, green), gender (male, female).
• Binary Attributes:
• Definition: Attributes that have two possible states or values.
• Types:
• Symmetric Binary: Both outcomes are equally important (e.g., heads/tails,
where neither state is privileged).
• Asymmetric Binary: One outcome is more significant than the other
(e.g., success/failure, where success is more critical).
• Ordinal Attributes:
• Definition: Categorical attributes with a meaningful order or ranking
between values.
• Examples: Education levels (high school, bachelor's, master's), customer
satisfaction ratings (poor, fair, good, excellent).
• Numeric Attributes:
• Definition: Attributes that are quantifiable and expressible in numbers.
• Types:
• Discrete Attributes: Attributes that take on a countable number of
distinct values.
• Examples: Number of students in a class, number of cars in a
parking lot.
• Continuous Attributes: Attributes that can take on any value within a
range.
• Examples: Temperature, height, weight.
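A brief sketch of how these attribute types map onto pandas; the column names and values are illustrative:

import pandas as pd

df = pd.DataFrame({
    "color": ["red", "blue", "green"],                       # nominal
    "passed": [1, 0, 1],                                     # binary
    "education": ["high school", "bachelor's", "master's"],  # ordinal
    "num_cars": [2, 1, 3],                                   # discrete numeric
    "height_cm": [172.5, 160.1, 181.3],                      # continuous numeric
})

# Mark the ordinal column so comparisons respect the ranking.
df["education"] = pd.Categorical(
    df["education"],
    categories=["high school", "bachelor's", "master's"],
    ordered=True,
)
print(df["education"] >= "bachelor's")   # ordered comparison now works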
Data Quality: Why Preprocess the Data?
• Importance of Data Preprocessing:
• Accuracy: Ensures the accuracy and reliability of the analysis by addressing issues
such as missing data, noise, and inconsistencies.
• Efficiency: Reduces the complexity of data, making it easier to process and analyze.
• Consistency: Aligns data from different sources or formats, ensuring that it is
coherent and uniform.
• Improves Model Performance: Clean and well-preprocessed data lead to better
model performance and more accurate predictions.
Cleaning Data
• Definition: Data cleaning is the process of identifying and correcting (or removing) errors
and inconsistencies in data to improve its quality.
• Common Data Cleaning Issues:
• Missing Values: Data points where information is absent.
• Handling Methods: Imputation (filling in missing values), deletion, or using
algorithms that can handle missing data.
• Noisy Data: Data that contains errors, inconsistencies, or irrelevant information.
• Types of Noisy Data:
• Duplicate Entries: Multiple records for the same entity.
• Multiple Entries for a Single Entity: Different entries representing
the same entity with slight variations.
• Missing Entries: Partial data missing for certain records.
• NULLs: Missing values represented as NULL.
• Huge Outliers: Data points that are significantly different from other
observations.
• Out-of-Date Data: Data that is no longer accurate or relevant.
• Artificial Entries: Data that is not genuine or was created for testing
purposes.
• Irregular Spacings: Inconsistent spacing within text data.
• Formatting Issues: Different formatting styles used across tables or
columns.
• Extra Whitespace: Unnecessary spaces that can cause parsing issues.
• Irregular Capitalization: Inconsistent use of uppercase and
lowercase letters.
• Inconsistent Delimiters: Different delimiters used to separate data
fields.
• Irregular NULL Format: Inconsistent representation of missing
data.
• Invalid Characters: Characters that do not belong in the dataset.
• Incompatible Datetimes: Different date and time formats that need
standardization.
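A minimal pandas sketch addressing a few of these issues (extra whitespace, irregular capitalization, irregular NULL format, missing values, duplicates); the toy data is illustrative:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["  Alice ", "bob", "Bob", None],
    "city": ["NYC", "nyc", "N/A", "Boston"],
    "age":  [29, 35, 35, np.nan],
})

df["name"] = df["name"].str.strip().str.title()             # whitespace, capitalization
df["city"] = df["city"].str.upper().replace("N/A", np.nan)  # irregular NULL format
df["age"] = df["age"].fillna(df["age"].median())            # impute missing values
df = df.drop_duplicates(subset=["name", "age"])             # duplicate entries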
Data Transformation
• Definition: Data transformation involves converting data into a suitable format or structure
for analysis.
• Common Data Transformation Techniques:
• Rescaling: Adjusting the range of data values to a specific scale, often to bring all
variables into the same range.
• Example: Rescaling data to a range of 0 to 1.
• Normalizing: Scaling values onto a common scale, such as unit norm per sample or the
range 0 to 1; the term is also used loosely for Z-score scaling.
• Example: Z-score normalization (the same transform as standardizing, below).
• Binarizing: Converting numerical data into binary form (e.g., 0 or 1).
• Example: Converting a continuous attribute into a binary attribute based on a
threshold.
• Standardizing: Rescaling data to have a mean of 0 and a standard deviation of 1 (the
Z-score transform); this centers and scales the data but does not change the shape of its
distribution.
• Example: Standardizing features so that attributes measured on different scales
become comparable.
• Label Encoding: Converting categorical attributes into numerical form by assigning
a unique integer to each category.
• One-Hot Encoding: Converting categorical attributes into binary vectors where each
category is represented by a binary variable (0 or 1).
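A minimal scikit-learn sketch of these transformations; the toy arrays are illustrative, and sparse_output requires scikit-learn 1.2 or newer:

import numpy as np
from sklearn.preprocessing import (
    Binarizer, LabelEncoder, MinMaxScaler, OneHotEncoder, StandardScaler,
)

X = np.array([[1.0], [5.0], [10.0]])

rescaled = MinMaxScaler().fit_transform(X)             # values mapped into [0, 1]
standardized = StandardScaler().fit_transform(X)       # mean 0, standard deviation 1
binarized = Binarizer(threshold=5.0).fit_transform(X)  # 1 if value > 5.0, else 0

colors = np.array([["red"], ["blue"], ["red"]])
labels = LabelEncoder().fit_transform(colors.ravel())  # blue -> 0, red -> 1
onehot = OneHotEncoder(sparse_output=False).fit_transform(colors)  # one column per category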
Data Reduction
• Definition: Data reduction involves reducing the volume of data while maintaining its
integrity and meaning, making it easier to analyze.
• Techniques:
• Dimensionality Reduction: Reducing the number of attributes or features while
retaining essential information (e.g., PCA, LDA).
• Numerosity Reduction: Reducing the number of data points or records through
techniques like clustering, sampling, or aggregation.
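A minimal sketch of dimensionality reduction with PCA from scikit-learn; the random data is illustrative:

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))           # 100 samples, 10 features

pca = PCA(n_components=2)                # keep the 2 strongest components
X_reduced = pca.fit_transform(X)         # shape (100, 2)
print(pca.explained_variance_ratio_)     # fraction of variance each component retains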
Data Discretization
• Definition: Data discretization involves converting continuous data into discrete intervals or
categories.
• Importance: Useful for transforming continuous attributes into categorical attributes, which
can simplify analysis and improve model performance.
• Methods:
• Binning: Dividing data into intervals, or "bins," and assigning a categorical label to
each bin.
• Histogram Analysis: Using histograms to define intervals based on data distribution.
• Cluster Analysis: Grouping similar data points and assigning them to discrete
categories.
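A minimal pandas sketch of binning; the age values, bin edges, and labels are illustrative:

import pandas as pd

ages = pd.Series([5, 17, 25, 42, 67, 80])

# Equal-width style binning with explicit edges and categorical labels.
age_groups = pd.cut(ages, bins=[0, 18, 40, 65, 100],
                    labels=["child", "young adult", "adult", "senior"])

# Equal-frequency (quantile) binning: each bin gets roughly the same count.
quartiles = pd.qcut(ages, q=4, labels=["Q1", "Q2", "Q3", "Q4"])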
4. Data Visualization
Introduction to Exploratory Data Analysis (EDA)
• Definition: EDA is an approach to analyzing data sets to summarize their main
characteristics, often using visual methods.
• Purpose of EDA:
• Identifying Patterns: Detecting trends, correlations, and relationships in data.
• Spotting Anomalies: Finding outliers or irregularities in the data.
• Checking Assumptions: Verifying the validity of assumptions made about the data.
• Guiding Further Analysis: Informing the choice of statistical models or algorithms
to apply.
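A minimal first-pass EDA sketch with pandas and Matplotlib; the file name is hypothetical, and numeric_only requires pandas 1.5 or newer:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("data.csv")           # hypothetical dataset

print(df.describe())                   # summary statistics per numeric column
print(df.isna().sum())                 # missing values per column
print(df.corr(numeric_only=True))      # pairwise correlations

df.hist(figsize=(10, 8))               # distribution of each numeric column
plt.show()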