
01-Core Data Concepts

This document discusses core concepts of data, including:
● Data can be observations and information that is structured or unstructured
● Examples of data include measurements of penguin attributes like ID, flipper length, body mass, and sex
● Data can be continuous, taking any value, or discrete, only taking certain specified values
● Whether data is considered continuous or discrete depends on the context and how values are measured or categorized
● It's important to distinguish between nominal data, which has no inherent order, and ordinal data, which has a clear ranking or hierarchy but discrete steps

Uploaded by

Chathura Dilsh
Copyright
© All Rights Reserved

Core Data Concepts


Measures of Data

Fundamentally, at the core of using data science to solve real-world problems and find solutions to business challenges, we use data.
It’s important to understand the core concepts of data before going on to use it with a variety of methods.

In this section of the course we’ll focus on some core topics for understanding data.
As we begin to learn about probability, statistics, and visualizations regarding data, we should take some time to understand what we mean by the term “data”.

What is data?
Collected observations and information about something, which can be structured or unstructured.
Let’s think about a few examples of data…

Data Examples:
Information about penguins. Notice the units of measurement and the structure, and that not all data needs to be numeric!

Penguin ID   Flipper Length (mm)   Body Mass (g)   Sex
0            181.0                 3750.0          Male
1            186.0                 3800.0          Female
2            195.0                 3250.0          Male
3            193.0                 3450.0          Female

This is actually a real dataset! The data was collected
and made available by Dr. Kristen Gorman and the Palmer
Station, Antarctica LTER, a member of the Long Term
Ecological Research Network.
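
To make the idea of structure concrete, here is a minimal sketch of the four penguin rows held as Python records. The field names (`penguin_id`, `flipper_length_mm`, and so on) and both helper functions are made up for this illustration; they are not the dataset's official column names.

```python
# The four example penguin rows, one dict per penguin.
# Field names are illustrative assumptions, not official column names.
penguins = [
    {"penguin_id": 0, "flipper_length_mm": 181.0, "body_mass_g": 3750.0, "sex": "Male"},
    {"penguin_id": 1, "flipper_length_mm": 186.0, "body_mass_g": 3800.0, "sex": "Female"},
    {"penguin_id": 2, "flipper_length_mm": 195.0, "body_mass_g": 3250.0, "sex": "Male"},
    {"penguin_id": 3, "flipper_length_mm": 193.0, "body_mass_g": 3450.0, "sex": "Female"},
]


def column(records, name):
    """Return every value stored under `name`, in row order."""
    return [row[name] for row in records]


def is_numeric_column(records, name):
    """True when every value in the column is an int or float (bools excluded)."""
    return all(
        isinstance(value, (int, float)) and not isinstance(value, bool)
        for value in column(records, name)
    )


# Report which columns hold numbers and which hold something else.
for name in penguins[0]:
    kind = "numeric" if is_numeric_column(penguins, name) else "non-numeric"
    print(f"{name}: {kind}")
```

Every row has the same fields, and each measurement carries its unit in the field name, which is what makes the table easy to work with.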

Types of Data:
So far we’ve seen that data doesn’t need to be numerical (e.g. color of cars).
We can distinguish between different types of data, such as continuous vs. discrete data, as well as structured vs. unstructured data.


Data Vocabulary:
Let’s explore some core concepts and the vocabulary used
to describe them:
● Continuous vs. Discrete (Categorical)
● Structured vs. Unstructured
● Nominal vs. Ordinal
● Population vs. Sample

Continuous vs. Discrete:


Discrete Data:
Can only take certain values; there are no values “in between”.
● Car models: Toyota, Tesla, Ferrari
● Playing card values: A, 2, 3, ..., J, Q, K
Note that discrete data can be numeric. For example, the only possible values when rolling a single die are 1, 2, 3, 4, 5, and 6.

Continuous Data:
Can take any value; there are “infinitely” many values in between any two values, if you are able to measure precisely enough.
The height of people is a continuous value (e.g. 172 cm tall or 173 cm tall). Notice how someone could be in between, at 172.5 cm tall, or 172.54 cm tall, or 172.542 cm tall.
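
One way to see the difference is to ask whether the value halfway between two valid values is itself valid. The sketch below is only an illustration: `DIE_FACES` and `midpoint` are names invented here, and `Fraction` is used so the arithmetic stays exact.

```python
from fractions import Fraction

DIE_FACES = {1, 2, 3, 4, 5, 6}  # every value a single die can show


def midpoint(a, b):
    """Exact value halfway between a and b."""
    return (Fraction(a) + Fraction(b)) / 2


# Discrete: halfway between the faces 2 and 3 is 5/2, which is not a face.
print(midpoint(2, 3) in DIE_FACES)

# Continuous: between two heights there is always another possible height,
# so we can keep halving the gap between 172 cm and 173 cm.
low, high = Fraction(172), Fraction(173)
for _ in range(3):
    high = midpoint(low, high)
    print(float(high))
```

The loop could run forever without leaving the range, which is the practical meaning of "infinitely many values in between".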

Remember that while continuous data is numeric (160 kg), discrete data can be numeric (a dice roll of 2) or a string (“Blue”).
Keep in mind that sometimes the context and framing of a dataset will decide whether you should think of data as continuous or discrete.
For example, is color data continuous or discrete? You may want to immediately say discrete. But what if the context is physics and the visible spectrum of light, measured in wavelengths? Then does the term “color” even make sense?
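
The same quantity can be treated either way. As a rough sketch, the code below takes a continuous wavelength and bins it into a discrete color name. The band edges and the 380 to 750 nm visible range are approximate, commonly quoted figures that vary between sources, so treat them as assumptions; `color_name` is a helper invented for this example.

```python
import bisect

# Approximate band edges in nanometres (an assumption; sources differ).
BAND_EDGES_NM = [450, 495, 570, 590, 620]
BAND_NAMES = ["violet", "blue", "green", "yellow", "orange", "red"]
VISIBLE_RANGE_NM = (380, 750)  # also approximate


def color_name(wavelength_nm):
    """Map a continuous wavelength to one discrete color label."""
    low, high = VISIBLE_RANGE_NM
    if not low <= wavelength_nm <= high:
        raise ValueError(f"{wavelength_nm} nm is outside the assumed visible range")
    # bisect_right counts how many band edges are <= the wavelength,
    # which is exactly the index of the band it falls into.
    return BAND_NAMES[bisect.bisect_right(BAND_EDGES_NM, wavelength_nm)]


for nm in (532.0, 532.1, 650.0):
    print(nm, color_name(nm))
```

Two different wavelengths (532.0 and 532.1) end up with the same label, so the continuous measurement keeps information that the discrete label throws away.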

Do not confuse numeric and ordered discrete data with continuous data!
Consider an airline with passenger classes: 1st, 2nd, and 3rd class. This is numeric data with a clear order, but it’s still discrete! There is no 1.57 passenger class.
To help distinguish this type of data, we need to consider nominal vs. ordinal data.

Nominal vs. Ordinal:

Nominal data is classified without a natural order or rank.
For example, discrete categories of animals: dogs, cats, lizards, horses, etc.
A good test for nominal data is whether it can be meaningfully sorted. Nominal data cannot be sorted in any meaningful way (alphabetical order only reflects the labels, not the things they name).

Ordinal data can be sorted (it has an order to it). Our previous example of passenger classes is an ordinal discrete data set: we understand that 2nd class is in between 1st and 3rd class.
Ordinal data doesn’t necessarily need to be numeric. Weather terms such as “hot”, “mild”, and “cold” can be said to be ordinal.
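
In code, the order of ordinal labels usually has to be written down explicitly, because the labels themselves don't carry it. Here is a minimal sketch; `TEMPERATURE_RANK` and `sort_ordinal` are names made up for this example.

```python
# Ordinal: the ranking is part of the data, so we state it explicitly.
TEMPERATURE_RANK = {"cold": 0, "mild": 1, "hot": 2}


def sort_ordinal(values, rank):
    """Sort labels by their position in the given ranking."""
    return sorted(values, key=rank.__getitem__)


readings = ["hot", "cold", "mild", "cold"]
by_rank = sort_ordinal(readings, TEMPERATURE_RANK)
alphabetical = sorted(readings)
print(by_rank)       # cold, cold, mild, hot
print(alphabetical)  # cold, cold, hot, mild ("hot" lands before "mild")

# Nominal: Python will still sort the strings, but the result is only
# alphabetical and says nothing about the animals themselves.
animals = ["lizards", "dogs", "horses", "cats"]
print(sorted(animals))
```

The plain alphabetical sort puts "hot" before "mild", which is a good reminder that being sortable as text is not the same as being ordinal.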

When thinking about continuous vs. discrete and nominal vs. ordinal, try to keep in mind the context of the problem you are trying to solve.
It may not be necessary to apply labels such as ordinal if they aren’t useful to the challenge at hand.

Structured vs. Unstructured:

We also need to understand that not all data is formatted nicely in a table or spreadsheet, and in some cases we don’t even want it in a structured format!
Structured data is highly specific and is stored in a predefined format.
For example: Excel spreadsheets, JSON files, XML files, or SQL databases follow a predefined format.
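
As a small illustration of what a predefined format buys you, the sketch below parses a JSON record and looks up a field by name, then tries the same thing on a free-text sentence. Both strings and the `try_parse` helper are invented for this example.

```python
import json

# The same facts, once as structured JSON and once as free text.
structured = '{"penguin_id": 0, "flipper_length_mm": 181.0, "sex": "Male"}'
free_text = "Penguin 0 was a male with flippers about 181 mm long."


def try_parse(text):
    """Return the parsed JSON value, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


record = try_parse(structured)
print(record["flipper_length_mm"])  # fields are addressable by name

# The sentence holds the same information but has no fields to look up.
print(try_parse(free_text))
```

Getting the flipper length out of the sentence would need some kind of text processing, while the structured record only needs a key.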

Unstructured data is not in a particular format.
For example, audio, video, or text data doesn’t need to follow any particular predefined structured format.
Be careful not to confuse computer-encoded file formats with “formatted data”! Just because text is in a PDF format doesn’t make it structured data.

Typically, unstructured data is harder to work with, but in certain fields it’s actually necessary to achieve results!
Many state-of-the-art deep learning techniques use unstructured data to learn patterns and generate new objects, for example DALL·E 2 from OpenAI.

Population vs. Sample:

Finally, we also need to consider the scope of our data collection.
Is this data a full representation of everything available and known? Or is it a sample of everything?
A population consists of every member of a group. What counts as the group depends on the context of the situation.
For example, a list of all the student names in a school contains data on the entire population.

Don’t confuse the term “population” in this context with the population of an entire country. In the context of data science we use “population” to describe every member of whatever group we are studying.

Often, however, it is not possible to record data on an entire population.
In this case we rely on a sample from the population, which is a subset of the members of the group.
For example, an optional school survey that only some students fill out would be a sample of a population.
We should always try to have samples that are representative of the population we’re trying to understand.
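
One common way to aim for a representative sample is to pick members at random rather than relying on whoever volunteers. The sketch below assumes a hypothetical school of 1,000 students identified by the numbers 0 to 999; the sample size of 50 and the seed are arbitrary choices for the illustration.

```python
import random

# Hypothetical population: one ID per student in a school of 1,000.
population = list(range(1000))

rng = random.Random(42)  # fixed seed only so the sketch is repeatable

# Simple random sample without replacement: every student is equally
# likely to be chosen, and nobody is chosen twice.
sample = rng.sample(population, k=50)

print(len(sample), len(set(sample)))
```

Random selection doesn't guarantee a representative sample, but it avoids the self-selection that an optional survey brings with it.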

Later on we’ll discover that choosing a sample size is a well-studied problem.
For example: How many students should we survey for a school of 1,000 students to get a representative sample?
In case you're curious about the answer to that question, it actually depends a bit on our assumptions about the overall population and the task at hand. You can find out more now at: wikipedia.org/wiki/Sample_size_determination
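
To give a feel for how those assumptions enter, here is a sketch of one widely used approach for estimating a proportion: Cochran's formula with a finite population correction. The defaults below (95% confidence via z = 1.96, a 5% margin of error, and the most cautious guess p = 0.5) are assumptions chosen for the illustration, not values from this course, and `sample_size` is a name invented here.

```python
import math


def sample_size(population_size, margin_of_error=0.05, z=1.96, p=0.5):
    """Cochran's sample size for a proportion, corrected for a finite population."""
    # Size needed if the population were effectively unlimited.
    n0 = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    # Shrink it for a finite population of the given size.
    n = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n)


print(sample_size(1000))
```

Under these particular assumptions the answer for a school of 1,000 students comes out at 278. Asking for a tighter margin of error or higher confidence pushes the number up.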

We’ve seen that data comes in different forms and that we need to be cognizant of the context surrounding the data and, more importantly, of what we are using the data for.
The ability to measure certain features of data is crucial to understanding data sets, especially numeric ones.

Let’s continue by exploring two key concepts of data measurement:
● Measurements of Central Tendency
○ Mean, Median, and Mode
● Measurements of Dispersion
○ Variance and Standard Deviation
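
As a quick preview, these measurements can be sketched with Python's built-in `statistics` module, using the flipper lengths and sexes from the four example penguins. The sketch assumes the four rows are the whole population, so it uses `pvariance` and `pstdev`; for a sample, `statistics.variance` and `statistics.stdev` would be the ones to reach for.

```python
import statistics

# Values taken from the four example penguin rows.
flipper_lengths_mm = [181.0, 186.0, 195.0, 193.0]
sexes = ["Male", "Female", "Male", "Female"]

# Central tendency
mean = statistics.mean(flipper_lengths_mm)
median = statistics.median(flipper_lengths_mm)
modes = statistics.multimode(sexes)  # works for non-numeric data too

# Dispersion (population versions, treating the 4 rows as everything)
variance = statistics.pvariance(flipper_lengths_mm)
std_dev = statistics.pstdev(flipper_lengths_mm)

print(mean, median, modes)
print(variance, std_dev)
```

Here the mean is 188.75 mm, the median is 189.5 mm, and both sexes tie for the mode because each appears twice.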
