Chapter-14 Data Science

Data science note

Uploaded by

Munny Narang

Chapter-14

DATA SCIENCE

Data:
Collecting data allows us to capture a record of past events so that we can use data analysis to
find recurring patterns.

Data Science:
Data science is the study of data to extract meaningful insights for
business. It is an approach that combines principles and practices from the fields of
mathematics, statistics, artificial intelligence, and computer engineering to analyze large
amounts of data. Data Science is used to give companies the right direction to
achieve their goals.

Applications of Data Science


● Internet Search
Many search engines besides Google, such as Yahoo, Bing and AOL, make
use of data science algorithms to deliver the best results for our search query in a
fraction of a second. Google manages to process about 20 petabytes of data every day with
the help of Data Science.
● Digital Advertisements (Targeted Advertisements)
Almost all digital marketing, such as email marketing, pay per click advertising, banner
advertising, digital billboards in airports, etc., is based on Data Science algorithms.
This is why digital ads are more successful than traditional
advertisements.

● Website Recommendations
Companies like Amazon, Flipkart, eBay, Twitter,
LinkedIn, Netflix etc. help their customers find relevant products from the billions of products
available with them, based on the user's past experiences, search preferences and
interests. All this is possible using Data Science as a tool.
● Fraud and Risk Detection
The banking sector and finance companies use Data Science to analyze customer data such as
customer profiles, past expenditures, and other essential variables to understand the
probabilities of risk, default and failure, if any. This has helped them to recover
from bad debts and losses every year, and to sell their banking products based on the
customer's purchasing power.
● Medicine
Data Science is used in Medical Image Analysis, such as detecting tumors, artery
stenosis, organ delineation, etc. Data Science is also helpful in disease related
research in genetics and genomics, in understanding advanced levels of treatment
and in studying the response of a patient to drugs. Thus, the use of Data Science has
done wonders in predicting genetic risks in advance and has provided better individual care.
● Gaming
Games from companies like EA Sports, Zynga, Sony etc. are designed using machine learning
algorithms that help the games to either improve or upgrade themselves based on the
player's previous moves, and accordingly move the game to the next higher level.

● Virtual Reality
A Virtual Reality headset uses computing knowledge, algorithms and data to provide
you with the best viewing experience. Pokémon GO, an augmented reality game, gained its
popularity due to a similar implementation of Data Science.

Revisiting AI Project Cycle


Problem Scoping
The problem statement template leads us towards the goal of our project, which
can now be stated as:

THE PROBLEM STATEMENT TEMPLATE FOR THE ABOVE PROBLEM:

“To be able to predict the quantity of food dishes to be prepared for everyday
consumption in restaurant buffets.”

Data Acquisition
After finalising the goal of our project, let us now look at the various data
features which affect the problem in some way or the other. Since any AI-based project
requires data for testing and training, we need to understand what kind of data is to be
collected to work towards the goal. In our scenario, various factors that would affect the
quantity of food to be prepared for the next day's consumption in buffets would be:

System map for our problem statement

Data Exploration:
We extract the required information from the curated dataset and clean it up in such a
way that there exist no errors or missing elements in it.
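
The clean-up step described above could be sketched as follows. This is only an illustration under stated assumptions: the record layout (dish name, quantity prepared, quantity left unconsumed), the sample values and the helper name is_clean are all hypothetical and are not taken from the actual project dataset.

```python
# Hypothetical raw records: (dish name, quantity prepared, quantity unconsumed).
# None marks a missing entry; impossible values are treated as recording errors.
raw_records = [
    ("Dal", 20.0, 3.0),
    ("Rice", 25.0, None),     # missing value
    ("Paneer", 15.0, 2.5),
    ("Roti", -5.0, 1.0),      # negative quantity prepared: an error
    ("Salad", 10.0, 12.0),    # more left over than was prepared: an error
]

def is_clean(record):
    """Return True when a record has no missing or inconsistent values."""
    name, prepared, unconsumed = record
    if name is None or prepared is None or unconsumed is None:
        return False
    # The unconsumed quantity must lie between 0 and the quantity prepared.
    return 0 <= unconsumed <= prepared

clean_records = [record for record in raw_records if is_clean(record)]
print(clean_records)  # [('Dal', 20.0, 3.0), ('Paneer', 15.0, 2.5)]
```

Here faulty rows are simply dropped. Depending on the project, missing values might instead be filled in, for example with an average.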

Evaluation
Once the model has been trained on the training dataset of 20 days, it is time to
see whether the model is working properly or not. Let us see how the model works and how
it is tested.
Step 1: The trained model is fed data regarding the name of the dish and the quantity
produced for the same.
Step 2: It is then fed data regarding the quantity of food left unconsumed for the same
dish on previous occasions.
Step 3: The model then works upon the entries according to the training it got at the
modelling stage.
Step 4: The model predicts the quantity of food to be prepared for the next day.
Step 5: The prediction is compared to the testing dataset value. From the testing
dataset, ideally, we can say that the quantity of food to be produced for the next day’s
consumption should be the total quantity minus the unconsumed quantity.
Step 6: The model is tested on the 10 testing datasets kept aside while training.
Step 7: The prediction values for the testing dataset are compared to the actual values.
Step 8: If the prediction values are the same as or close to the actual values, the model
is said to be accurate.
Otherwise, either the model selection is changed or the model is trained on more data
for better accuracy. Once the model is able to achieve optimum efficiency, it is ready to
be deployed in the restaurant for real-time usage.
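
Steps 5 to 8 can be illustrated with a short sketch. It assumes that "almost similar" means "within 10% of the actual value"; that tolerance, the sample figures and the function names quantity_needed and is_close are hypothetical choices made only for this illustration.

```python
def quantity_needed(total_prepared, unconsumed):
    """Step 5 rule: ideal quantity = total quantity minus unconsumed quantity."""
    return total_prepared - unconsumed

def is_close(predicted, actual, tolerance=0.10):
    """True when the prediction is within `tolerance` (a fraction) of the actual value."""
    return abs(predicted - actual) <= tolerance * actual

# Hypothetical testing data: (total prepared, unconsumed, model prediction), in kg.
testing_data = [
    (20.0, 4.0, 16.5),
    (30.0, 6.0, 23.0),
    (15.0, 3.0, 15.0),
]

results = []
for total, unconsumed, predicted in testing_data:
    actual = quantity_needed(total, unconsumed)   # value the model should have predicted
    results.append(is_close(predicted, actual))   # Step 7: compare prediction with actual

accuracy = sum(results) / len(results)            # fraction of acceptable predictions
print(results)    # [True, True, False]
print(accuracy)
```

A low fraction of acceptable predictions would suggest changing the model or training it on more data, as described above.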

Data Collection:
Data collection is an exercise which does not require even a tiny bit of technological
knowledge. But when it comes to analysing the data, it becomes a tedious process for
humans, as it is all about numbers and alphanumeric data. That is where Data
Science comes into the picture. It not only gives us a clearer idea of the dataset,
but also adds value to it by providing deeper and clearer analyses of it. And as AI
gets incorporated in the process, predictions and suggestions by the machine become
possible on the same data.

Python for Data Science

Data Science uses a combination of Python and mathematical concepts like
Statistics, Probability, etc. Python is a suitable, simple and easy language
in which to write the code, and it can handle the highly complex mathematical processing
required to develop applications using AI.

A file created in Python and saved with the extension .py is called a module. A collection
of related modules saved under the same directory and a common name is called a package. There
are various packages for different purposes available for free to be used in Python. Some
of the open-source packages available in Python are:

● NumPy: Numerical Array Data Handling Package. It is used for data analysis
and calculations on numerical data sets.
● OpenCV: Image Processing Package. It is used for manipulating and processing
images, for example cropping, editing etc.
● Matplotlib: Data Visualisation Package. It is used to produce high quality
graphical representations of numerical data.
● NLTK (Natural Language Toolkit): Natural Language Processing Package. It
helps in tasks related to text data.
● Pandas: Data with 2 or more dimensions is handled using Pandas. The
source is usually data in tabular form, from spreadsheets or database
software.

NumPy
NumPy, which stands for Numerical Python, is the fundamental package for
mathematical and logical operations on arrays in Python. It is a commonly used
package when it comes to working with numbers. NumPy offers a wide range of
arithmetic operations, which makes working with numbers easier.
NumPy also works with arrays, which are homogeneous collections of
data. An array is a set of multiple values of the same datatype. The values
can be numbers, characters, booleans, etc., but only one datatype can be stored
in a given array. In NumPy, the arrays used are known as ND-arrays (N-Dimensional
Arrays), as NumPy comes with a feature for creating n-dimensional arrays in Python. An
array can easily be compared to a list. Let us take a look at how they are different:
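
One way to see the difference is the small sketch below. The variable names are chosen only for illustration; the behaviour shown is standard for Python lists and NumPy arrays.

```python
import numpy as np

marks_list = [1, 2, 3]
marks_array = np.array([1, 2, 3])

# "+" joins two lists end to end, but adds two arrays element by element.
list_sum = marks_list + marks_list      # [1, 2, 3, 1, 2, 3]
array_sum = marks_array + marks_array   # array([2, 4, 6])

# "*" with a number repeats a list, but multiplies every element of an array.
list_times_two = marks_list * 2         # [1, 2, 3, 1, 2, 3]
array_times_two = marks_array * 2       # array([2, 4, 6])

# A list may mix datatypes; an array converts all values to one datatype.
mixed_list = [1, "two", 3.0]
mixed_array = np.array([1, 2.5, 3])     # all values become floats
print(mixed_array.dtype)                # float64
```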

INDEXING IN ARRAY:
Indexing is an operation that pulls out a select set of values from an
array. The index of a value in an array is that value's location within
the array. There is a difference between the value and where the
value is stored in an array.
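
As a brief illustration, the sketch below pulls values out of a one dimensional and a two dimensional array. The sample arrays are made up for this example. Note that index positions in NumPy start from 0.

```python
import numpy as np

marks = np.array([45, 67, 89, 72, 50])

first = marks[0]       # value stored at index 0 -> 45
last = marks[-1]       # negative indices count from the end -> 50
middle = marks[1:4]    # slice: indices 1, 2 and 3 -> [67 89 72]

# In a two dimensional array, a position is given as (row, column).
table = np.array([[1, 2, 3],
                  [4, 5, 6]])
value = table[1, 2]    # row 1, column 2 -> 6

print(first, last, middle, value)
```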
NumPy Questions:
Q1. Create a one dimensional array.
import numpy as np
rollno=np.array([1,2,3,4,5])
print(rollno)

Q2. Create a sequential one dimensional array whose values are the multiples of 10
from 10 to 100.

import numpy as np
a=np.arange(10,101,10)
print(a)

Q3. Create a one dimensional array with 4 random values.

import numpy as np
a=np.random.random(4)
print(a)
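
The values printed for Q3 change on every run. If repeatable values are needed, for example to compare answers in class, one option is a seeded random generator. This is a sketch using NumPy's np.random.default_rng. The seed value 42 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# Two generators created with the same seed produce the same sequence of values.
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

values_a = rng_a.random(4)   # 4 random floats, each at least 0.0 and below 1.0
values_b = rng_b.random(4)

print(values_a)
print(np.array_equal(values_a, values_b))  # True
```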

Q4. Create a two dimensional array of 3 rows and 4 columns with all values as
ones.

import numpy as np
a=np.ones((3,4))
print(a)

Q5. Create a two dimensional array of 3 rows and 3 columns with all zeroes.

import numpy as np
a=np.zeros((3,3))
print(a)

Q6. Create a two dimensional array of 3 rows and 3 columns with all values as 7.

import numpy as np
a=np.full((3,3),7)
print(a)

Q7. Create two separate two dimensional (3x3) arrays and calculate their sum and
element-wise product. Print the results.

import numpy as np
a=np.array([[1,2,3],[1,4,6],[1,4,2]])
b=np.array([[2,5,7],[5,3,1],[1,5,3]])
c=a+b
d=a*b
print(c)
print(d)
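
Note that the * operator in Q7 multiplies the arrays element by element. If the matrix product in the linear algebra sense is wanted instead, NumPy provides the @ operator. The sketch below reuses the arrays from Q7 to show both; the result names are chosen only for this example.

```python
import numpy as np

a = np.array([[1, 2, 3], [1, 4, 6], [1, 4, 2]])
b = np.array([[2, 5, 7], [5, 3, 1], [1, 5, 3]])

elementwise = a * b   # each element of a times the element of b at the same position
matrix = a @ b        # rows of a combined with columns of b (matrix multiplication)

print(elementwise)    # [[ 2 10 21] [ 5 12  6] [ 1 20  6]]
print(matrix)         # [[15 26 18] [28 47 29] [24 27 17]]
```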

Q8. Create two separate one dimensional arrays and calculate their sum and product. Print
the results.

import numpy as np
a=np.array([1,2,4])
b=np.array([2,5,7])
c=a+b
d=a*b
print(c)
print(d)

Q9. Create a 3x3 matrix, then replace the value at (2,2) with 7 and the value at (1,2) with 10.

import numpy as np
a=np.array([[3,2,1],[2,3,5],[2,3,1]])
a[2,2]=7
a[1,2]=10
print(a)

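Q10. Create a one dimensional array and replace the value at index 0 with 4 and the value
at index 2 with 8.

import numpy as np
a=np.array([1,2,5,7,8])
# A one dimensional array takes a single index; a[0,0] would raise an IndexError.
a[0]=4
a[2]=8
print(a)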
