AI-Data Science

1. KNN is a simple machine learning algorithm that can be used for classification and regression problems.
2. It works by finding the K closest training examples in the feature space and predicting the class based on a majority vote of those neighbors.
3. The main features of KNN are that it relies on the closest training examples to make predictions, assumes similar things exist in close proximity, and uses a majority vote of the nearest neighbors to predict the class.

Uploaded by

Ashmita Paul

DIYA ACADEMY OF LEARNING

Affiliated to CBSE Board, New Delhi


Affiliation No. 830420

Artificial Intelligence
Code-417
Data Science
Grade 10

Bringing Education & Values Together


Applications of Data Science
Data Collection
Sources of Data
While accessing data from any data source, the following points should be
kept in mind:

• Only data that is available for public use should be collected.
• Personal datasets should only be used with the consent of the owner.
• One should never breach someone’s privacy to collect data.
• Data should only be taken from reliable sources, as data collected
from random sources can be wrong or unusable.
• Reliable sources of data ensure the authenticity of the data, which helps in
proper training of the AI model.
Data Visualisation
• Collected data may contain errors. Let us first take a look at the types of
issues we can face with data:
1. Erroneous Data: Data can be erroneous in two ways:
Incorrect values
Invalid or Null values
2. Missing Data: In some datasets, some cells are empty because their values
are missing.
3. Outliers: Data points that fall outside the expected range of a variable are
called outliers.
To understand outliers better, consider the marks of students in a class. Suppose
a student was absent for the exams and therefore got 0 marks. If these marks are
included, the average of the whole class goes down. To prevent this, the average
is calculated over the remaining marks, from highest to lowest, and this
particular result is kept separate. The class average then reflects the students
who actually took the exam.
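The effect of the absent student's 0 can be checked with a few lines of Python. This is a minimal sketch: the marks are made-up numbers used only for illustration, and it assumes that a mark of 0 always means "absent" (in real data, absence would need to be recorded separately).

```python
from statistics import mean

# Hypothetical marks for a class of six; the 0 belongs to the absent student.
marks = [78, 85, 62, 90, 71, 0]

# Average with the outlier included.
mean_with_outlier = mean(marks)

# Average with the absent student's result kept separate.
# Assumption for this sketch: a mark of 0 always means "absent".
present_marks = [m for m in marks if m != 0]
mean_without_outlier = mean(present_marks)

print("With the 0 included:", round(mean_with_outlier, 1))
print("With the 0 kept separate:", round(mean_without_outlier, 1))
```

The second average is noticeably higher, which is why the outlier is handled separately.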
Pandas
• Pandas is a Python library used for working with data sets.
• It has functions for analyzing, cleaning, exploring, and manipulating data.
Why Use Pandas?
• Analyze big data and make conclusions based on statistical theories.
• Make data readable and relevant.
What Can Pandas Do?
• Pandas gives you answers about the data, such as:
• Is there a correlation between two or more columns?
• What is the average value?
• What is the maximum value?
• What is the minimum value?
• Pandas can also delete rows that are not relevant or that contain wrong
values, such as empty or NULL values. This is called cleaning the data.
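The questions above can be tried on a small table. The following is a minimal sketch, assuming pandas is installed; the column names and values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical data set: hours studied and marks for five students.
# None marks a missing value (pandas stores it as NaN).
df = pd.DataFrame({
    "hours": [2, 4, 6, 8, 10],
    "marks": [35, 50, None, 80, 95],
})

# Cleaning: drop every row that contains an empty (NaN) cell.
clean = df.dropna()

# Questions pandas can answer about the cleaned data.
average_marks = clean["marks"].mean()
highest_marks = clean["marks"].max()
lowest_marks = clean["marks"].min()
correlation = clean["hours"].corr(clean["marks"])

print(clean)
print("Average:", average_marks)
print("Max:", highest_marks, "Min:", lowest_marks)
print("Correlation between hours and marks:", round(correlation, 3))
```

Here `dropna` removes the one row with a missing mark, so the statistics are computed on the four remaining rows.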
Matplotlib
• Matplotlib is a Python 2D plotting library that we can use to produce
high-quality data visualizations.
• It has a module named pyplot, which makes plotting easy by
providing features to control line styles, font properties, axis
formatting, etc.
• It supports a wide variety of graphs and plots, including
histograms, bar charts, scatter plots and error charts.
Data Visualization
Pandas + NumPy + Matplotlib = Data Visualization

Different ways of data visualization (creating graphs) -


1. Read a CSV file and create a 2D graph for the selected columns
• We use pandas to read a CSV file, store it in a variable, and use the selected
columns to create the chart.
2. Create two arrays for the x axis and y axis and create a 2D graph
• We use the NumPy library to create the numbers to be plotted and the
pyplot module in Matplotlib to draw the actual
chart.
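Both ways can be sketched in one short script. This is an illustration under stated assumptions, not a fixed recipe: the CSV content, column names and output file names are hypothetical, an in-memory string stands in for a CSV file on disk, and the "Agg" backend is chosen so the charts are saved as image files instead of opening a window.

```python
import io

import matplotlib
matplotlib.use("Agg")  # draw to image files instead of opening a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Way 1: read a CSV and plot two selected columns.
# An in-memory string stands in for a file so the sketch is self-contained;
# with a real file you would pass its path to read_csv instead.
csv_text = "day,temperature\n1,30\n2,32\n3,31\n4,35\n"
data = pd.read_csv(io.StringIO(csv_text))

fig1, ax1 = plt.subplots()
ax1.plot(data["day"], data["temperature"], marker="o")  # line plot
ax1.set_xlabel("day")
ax1.set_ylabel("temperature")
fig1.savefig("from_csv.png")

# Way 2: build the x and y arrays with NumPy and plot them.
x = np.arange(0, 6)  # 0, 1, 2, 3, 4, 5
y = x ** 2           # 0, 1, 4, 9, 16, 25

fig2, ax2 = plt.subplots()
ax2.scatter(x, y)  # scatter plot
ax2.set_xlabel("x")
ax2.set_ylabel("x squared")
fig2.savefig("from_arrays.png")
```

In an interactive session, the `matplotlib.use("Agg")` line can be removed and `plt.show()` used in place of `savefig` to display the charts on screen.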
Scatter Plot
Scatter plots show many points plotted in the Cartesian plane.
Each point represents the values of two variables.
One variable is plotted on the horizontal axis and the other on the vertical
axis.
Line Plot
A line chart (also called a line graph or
curve chart) is a type of chart that
displays information as a series of
data points, called 'markers',
connected by straight line
segments.
Bar Graph
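A bar graph is one of the most commonly used graphical methods. It represents
each category with a rectangular bar whose length is proportional to its value.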
Histogram
A histogram is a graphical representation of
data points organized into user-specified
ranges.
Similar in appearance to a bar graph, the
histogram condenses a data series into an
easily interpreted visual by taking many data
points and grouping them into logical ranges
or bins.
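The grouping into bins can be sketched without any plotting library. The marks and the bin width of 20 below are hypothetical values picked for illustration; the script counts how many marks fall into each range and prints a simple text histogram.

```python
from collections import Counter

# Hypothetical marks out of 100 for twelve students.
marks = [12, 47, 55, 58, 61, 64, 67, 72, 75, 78, 83, 95]

bin_width = 20  # user-specified range size: 0-19, 20-39, 40-59, ...

# Map each mark to the lower edge of its bin, then count marks per bin.
counts = Counter((m // bin_width) * bin_width for m in marks)

# Print a text histogram, one row per bin.
for start in range(0, 100, bin_width):
    label = f"{start:2d}-{start + bin_width - 1:2d}"
    print(label, "#" * counts[start])
```

Twelve separate data points are condensed into five rows, one per bin, which is the same idea a plotted histogram shows with bars.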
Box Plots
When the data needs to be shown according to its percentiles across the
range, box plots come in handy.
Box plots, also known as box-and-whisker plots, conveniently display
the distribution of data across its range by dividing it into four
quarters at the quartiles.
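The values a box plot is drawn from can be computed directly. This is a minimal sketch using Python's standard `statistics` module; the marks are hypothetical, and the "inclusive" method is one of several common ways to compute quartiles, so other tools may give slightly different numbers.

```python
from statistics import quantiles

# Hypothetical marks, used only for illustration.
marks = [35, 42, 50, 55, 61, 64, 70, 78, 85, 92, 98]

# n=4 returns the three cut points Q1, Q2 (the median) and Q3,
# which divide the sorted data into four quarters.
q1, median, q3 = quantiles(marks, n=4, method="inclusive")

print("Minimum:", min(marks))
print("Q1:", q1)
print("Median:", median)
print("Q3:", q3)
print("Maximum:", max(marks))
```

The box is drawn from Q1 to Q3 with a line at the median, and the whiskers extend towards the minimum and maximum.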
Data Sciences: Classification Model
• Personality Prediction
• K-Nearest Neighbour model

1. What is KNN?
2. How does it work?
3. What are the features of KNN?
Personality Prediction
Step 1
Here is a map. Take a good look at it. In this map, each arrow
represents a quality. The qualities mentioned are:
Think for a minute and understand
which of these qualities you have in
you. Now, take a chit and write your
name on it. Place this chit at a point
in this map which best describes
you. It can be placed anywhere on
the graph. Be honest about yourself
and put it on the graph.
Step 2:
Take the quiz
https://tinyurl.com/discanimal
K-Nearest Neighbour Model (KNN)
• KNN is a simple, easy-to-implement supervised machine learning algorithm.
• It can be used to solve both classification and regression problems.
• The KNN algorithm assumes that similar things exist in close
proximity. In other words, similar things are near each other. As the
saying goes, “Birds of a feather flock together”.
Features of KNN
• The KNN prediction model relies on the surrounding points, or
neighbors, to determine the class or group of an unknown point.
• It uses the properties of the majority of the nearest points to decide
how to classify unknown points.
• It is based on the concept that similar data points should be close to each
other.
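The idea of classifying a point by a majority vote of its nearest neighbors can be sketched in plain Python. This is an illustration, not a production implementation: the training points, labels and the function name `knn_predict` are hypothetical, and straight-line (Euclidean) distance is assumed as the measure of closeness.

```python
from collections import Counter
from math import dist

# Hypothetical training data: (height_cm, weight_kg) -> label.
training_points = [
    ((150, 45), "small"),
    ((155, 50), "small"),
    ((160, 52), "small"),
    ((175, 75), "large"),
    ((180, 80), "large"),
    ((185, 85), "large"),
]

def knn_predict(query, points, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Sort training points by Euclidean distance to the query point.
    by_distance = sorted(points, key=lambda item: dist(query, item[0]))
    # Keep the labels of the k closest neighbors.
    nearest_labels = [label for _, label in by_distance[:k]]
    # Majority vote: the most common label wins.
    return Counter(nearest_labels).most_common(1)[0][0]

print(knn_predict((158, 51), training_points))
print(knn_predict((178, 79), training_points))
```

The value of K controls how many neighbors vote. An odd K is a common choice when there are two classes, because it avoids tied votes.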
Questions to Practice
1. What are the applications of Data Science?
2. Define: Erroneous data, Outliers, Histogram.
3. What is KNN? How does it work?
4. What are the features of KNN?
5. Explain personality prediction.
6. What is the purpose of Pandas?
7. Why do we use matplotlib in Python?
