Experiment 7
Experiment name: Creating a Data Frame and Matrix-like Operations on a Data Frame.
Merging two Data Frames and Applying functions to Data Frames
Experimental set-up/Equipment/Apparatus/Tools: -
1. Computer System
2. Google Colab / Python installed on system with an editor (like PyCharm, Jupyter)
DataFrames in Python come with the Pandas library and are defined as two-dimensional labeled data structures with columns of potentially different types.
In general, we could say that a Pandas DataFrame consists of three main components: the data, the index, and the columns.
The two core Pandas data structures are:
1. a Pandas DataFrame
2. a Pandas Series: a one-dimensional labeled array capable of holding any data type, with axis labels or index. An example of a Series object is one column from a DataFrame.
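As a quick illustrative sketch of these three components (using assumed example values that also appear later in this experiment), the snippet below prints the data, the index, and the columns of a small DataFrame:
import pandas as pd
# a small illustrative DataFrame (assumed example data)
df = pd.DataFrame({"calories": [420, 380, 390], "duration": [50, 40, 45]})
print(df.values)    # the data
print(df.index)     # the index (row labels)
print(df.columns)   # the column labels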
Experimental Procedure-
1. Start Google Colab / Python installed on system with an editor (like PyCharm, Jupyter)
2. Type a Python program using input, output and calculations
3. Save the program
4. Execute it.
import pandas as pd
# check the installed Pandas version
print(pd.__version__)
Series
import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a)
print(myvar)
With the index argument, you can name your own labels:
import pandas as pd
a = [1, 7, 2]
# assign custom labels to the Series values
myvar = pd.Series(a, index=["x", "y", "z"])
print(myvar)
print(myvar["y"])
import pandas as pd
# create a Series from a key/value object; the keys become the labels
calories = {"day1": 420, "day2": 380, "day3": 390}
myvar = pd.Series(calories)
print(myvar)
Create a Series using only data from "day1" and "day2":
import pandas as pd
calories = {"day1": 420, "day2": 380, "day3": 390}
# only the items with the specified index labels are included
myvar = pd.Series(calories, index=["day1", "day2"])
print(myvar)
print(myvar["day1"])
DataFrames
import pandas as pd
data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}
# load the dictionary into a DataFrame
df = pd.DataFrame(data)
print(df)
Locate Row
As you can see from the result above, the DataFrame is like a table with rows
and columns.
import pandas as pd
data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}
df = pd.DataFrame(data)
# use loc to return row 0 as a Series
print(df.loc[0])
Use the named index in the loc attribute to return the specified row(s).
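The manual does not show the corresponding code; a minimal sketch, reusing the example data above with an assumed named index, would be:
import pandas as pd
data = {
  "calories": [420, 380, 390],
  "duration": [50, 40, 45]
}
# name the rows with the index argument
df = pd.DataFrame(data, index=["day1", "day2", "day3"])
# refer to the named index in loc to return the specified row
print(df.loc["day2"])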
Creating a DataFrame from two Series:
# illustrative sample data (values assumed for this example)
author = ["Ankit", "Riya", "Sam"]
article = [210, 211, 114]
auth_series = pd.Series(author)
article_series = pd.Series(article)
frame = {"Author": auth_series, "Article": article_series}
df1 = pd.DataFrame(frame)
print(df1)
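The experiment title also covers merging two DataFrames and applying functions to a DataFrame. Since the manual does not include code for these steps, the following is a minimal sketch using the standard pandas merge and apply calls on assumed example frames:
import pandas as pd
# two small assumed DataFrames that share the "id" column
left = pd.DataFrame({"id": [1, 2, 3], "calories": [420, 380, 390]})
right = pd.DataFrame({"id": [1, 2, 3], "duration": [50, 40, 45]})
# merge the two DataFrames on the common "id" column
merged = pd.merge(left, right, on="id")
print(merged)
# apply a function to each column: here, the range (max - min)
print(merged[["calories", "duration"]].apply(lambda col: col.max() - col.min()))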
Precaution and sources of error:
The devices, whether computers or any other networking equipment, should be handled with due care and maintained carefully.
Results
Conclusions
Through this experiment we learnt to perform various operations on a data frame.
Experiment 8
Experimental set-up/Equipment/Apparatus/Tools: -
1. Computer System
2. Google Colab / Python installed on system with an editor (like PyCharm, Jupyter)
Python offers multiple great graphing libraries that come packed with lots of different
features. No matter if you want to create interactive, live or highly customized plots, Python has an excellent library for you.
Experimental Procedure-
1. Start Google Colab / Python installed on system with an editor (like PyCharm, Jupyter)
2. Type a Python program using input, output and calculations
3. Save the program
4. Execute it.
import pandas as pd
# read the Iris dataset and assign column names
iris = pd.read_csv('iris.csv', names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class'])
print(iris.head())
Scatter Plot
import matplotlib.pyplot as plt
# create figure and axis
fig, ax = plt.subplots()
# scatter the sepal_length against the sepal_width
ax.scatter(iris['sepal_length'], iris['sepal_width'])
# set a title and labels
ax.set_title('Iris Dataset')
ax.set_xlabel('sepal_length')
ax.set_ylabel('sepal_width')
Line Chart
columns = iris.columns.drop(['class'])
# create x data
x_data = range(0, iris.shape[0])
# create figure and axis
fig, ax = plt.subplots()
# plot each column
for column in columns:
    ax.plot(x_data, iris[column], label=column)
# set title and legend
ax.set_title('Iris Dataset')
ax.legend()
Bar Chart
fig, ax = plt.subplots()
# wine_reviews is assumed to be a DataFrame of wine reviews loaded earlier,
# e.g. from a wine reviews CSV via pd.read_csv
# count the occurrence of each score
data = wine_reviews['points'].value_counts()
# get x and y data
points = data.index
frequency = data.values
# create bar chart
ax.bar(points, frequency)
# set title and labels
ax.set_title('Wine Review Scores')
ax.set_xlabel('Points')
ax.set_ylabel('Frequency')
Results
Conclusions
Through this experiment we learnt various charts for visualization, aesthetics and plotting in layers.
Experiment 9
Experiment name: Creating Histograms and Density Charts
Experimental set-up/Equipment/Apparatus/Tools: -
1. Computer System
2. Google Colab / Python installed on system with an editor (like PyCharm, Jupyter)
Histograms
A great way to get started exploring a single variable is with the histogram. A histogram
divides the variable into bins, counts the data points in each bin, and shows the bins on the x-
axis and the counts on the y-axis. In our case, the bins will be an interval of time representing
the delay of the flights and the count will be the number of flights falling into that interval.
The binwidth is the most important parameter for a histogram and we should always try out a
few different values of binwidth to select the best one for our data.
To make a basic histogram in Python, we can use either matplotlib or seaborn. The code
below shows function calls in both libraries that create equivalent figures. For the plot calls,
we specify the binwidth by the number of bins. For this plot, I will use bins that are 5 minutes
in length, which means that the number of bins will be the range of the data (from -60 to 120
minutes) divided by the binwidth, 5 minutes (bins = int(180/5) = 36).
Density Plots
A density plot is a smoothed, continuous version of a histogram estimated from the data. The
most common form of estimation is known as kernel density estimation. In this method, a
continuous curve (the kernel) is drawn at every individual data point and all of these curves
are then added together to make a single smooth density estimation. The kernel most often
used is a Gaussian (which produces a Gaussian bell curve at each data point).
Experimental Procedure-
1. Start Google Colab / Python installed on system with an editor (like PyCharm, Jupyter)
2. Type a Python program using input, output and calculations
3. Save the program
4. Execute it.
Histograms
import matplotlib.pyplot as plt
import seaborn as sns
# flights is assumed to be a DataFrame of flight records with an 'arr_delay'
# column (arrival delay in minutes), loaded earlier from a CSV file
# matplotlib histogram
plt.hist(flights['arr_delay'], color='blue', edgecolor='black',
         bins=int(180/5))
# seaborn histogram
# note: distplot is deprecated in newer seaborn versions (histplot/displot replace it)
sns.distplot(flights['arr_delay'], hist=True, kde=False,
             bins=int(180/5), color='blue',
             hist_kws={'edgecolor': 'black'})
# Add labels
plt.title('Histogram of Arrival Delays')
plt.xlabel('Delay (min)')
plt.ylabel('Flights')
Density Plots
# Density Plot and Histogram of all arrival delays
sns.distplot(flights['arr_delay'], hist=True, kde=True,
bins=int(180/5), color = 'darkblue',
hist_kws={'edgecolor':'black'},
kde_kws={'linewidth': 4})
Results
Conclusions
Through this experiment we learnt to make histograms and density charts.
Experiment 10
Experimental set-up/Equipment/Apparatus/Tools: -
1. Computer System
2. Google Colab / Python installed on system with an editor (like PyCharm, Jupyter)
To predict the relationship between two variables, we’ll use a simple linear regression
model. In a simple linear regression model, we’ll predict the outcome of a variable known as
the dependent variable using only one independent variable.
Building a linear regression model
To build a linear regression model in python, we’ll follow five steps:
1. Reading and understanding the data
2. Visualizing the data
3. Performing simple linear regression
4. Residual analysis
5. Predictions on the test set
Performing Simple Linear Regression
Equation of simple linear regression: y = c + mX
In our case:
y = c + m * TV
Experimental Procedure-
1. Start Google Colab / Python installed on system with an editor (like PyCharm, Jupyter)
2. Type a Python program using input, output and calculations
3. Save the program
4. Execute it.
# Suppress warnings
import warnings
warnings.filterwarnings('ignore')
# Import the numpy and pandas packages
import numpy as np
import pandas as pd
# Read the given CSV file, and view some sample records
advertising = pd.read_csv("Company_data.csv")
advertising
import matplotlib.pyplot as plt
import seaborn as sns
# visualize the data (the plotting call that would precede this, e.g. a scatter
# of TV against Sales, is omitted in the manual)
plt.show()
# use TV spend as the single feature and Sales as the target
X = advertising['TV']
y = advertising['Sales']
from sklearn.model_selection import train_test_split
# split the data into 70% training and 30% test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7,
                                                    test_size=0.3, random_state=100)
X_train
y_train
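The manual's code stops after the train/test split. As a hedged sketch of the remaining steps (fitting y = c + m * TV and predicting on the test set), one common approach uses scikit-learn's LinearRegression:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
# scikit-learn expects a 2-D feature array, so reshape the single TV column
lr = LinearRegression()
lr.fit(X_train.values.reshape(-1, 1), y_train)
# c (intercept) and m (slope) of y = c + m * TV
print(lr.intercept_, lr.coef_[0])
# predictions on the test set and a simple goodness-of-fit measure
y_pred = lr.predict(X_test.values.reshape(-1, 1))
print(r2_score(y_test, y_pred))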
Results
Conclusions
Through this experiment we learnt to build and execute a linear regression model.
Experiment 11
Experiment name: Building Multiple Linear Regression, Lasso and Ridge Regression models
Objectives: To learn how to build Multiple Linear Regression, Lasso and Ridge Regression models
Prerequisites: knowledge of Python
Key terms: Linear Regression, Lasso, Ridge Regression
Experimental set-up/Equipment/Apparatus/Tools: -
1. Computer System
2. Google Colab / Python installed on system with an editor (like PyCharm, Jupyter)
Ridge and Lasso regression are powerful techniques generally used for creating parsimonious models in the presence of a 'large' number of features. Here 'large' can typically mean either of two things:
1. Large enough to enhance the tendency of a model to overfit (as few as 10 variables might cause overfitting)
2. Large enough to cause computational challenges. With modern systems, this situation might arise in the case of millions or billions of features
Though Ridge and Lasso might appear to work towards a common goal, the inherent properties
and practical use cases differ substantially. If you’ve heard of them before, you must know that
they work by penalizing the magnitude of coefficients of features along with minimizing the error
between predicted and actual observations. These are called ‘regularization’ techniques. The key
difference is in how they assign penalty to the coefficients:
1. Ridge Regression:
   o Performs L2 regularization, i.e. adds a penalty equivalent to the square of the magnitude of coefficients
   o Minimization objective = LS Obj + α * (sum of squares of coefficients)
2. Lasso Regression:
   o Performs L1 regularization, i.e. adds a penalty equivalent to the absolute value of the magnitude of coefficients
   o Minimization objective = LS Obj + α * (sum of absolute values of coefficients)
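Written out more explicitly (with y_i the observed values, \hat{y}_i the predictions, \beta_j the feature coefficients, and \alpha the regularization strength), the two objectives above correspond to:
\min_{\beta} \sum_{i} (y_i - \hat{y}_i)^2 + \alpha \sum_{j} \beta_j^2 \quad \text{(Ridge, L2 penalty)}
\min_{\beta} \sum_{i} (y_i - \hat{y}_i)^2 + \alpha \sum_{j} \lvert \beta_j \rvert \quad \text{(Lasso, L1 penalty)}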
Experimental Procedure-
1. Start Google Colab / Python installed on system with an editor (like PyCharm, Jupyter)
2. Type a Python program using input, output and calculations
3. Save the program
4. Execute it.
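Multiple Linear Regression:
The manual does not show code for the multiple linear regression step; the following is a minimal sketch assuming the same Company_data.csv advertising data used in Experiment 10, with TV, Radio and Newspaper as features (the latter two column names are assumed):
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# assumed dataset and column names, following Experiment 10
advertising = pd.read_csv("Company_data.csv")
X = advertising[['TV', 'Radio', 'Newspaper']]   # multiple independent variables
y = advertising['Sales']
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.7,
                                                    test_size=0.3, random_state=100)
# fit the multiple linear regression model and inspect its coefficients
mlr = LinearRegression()
mlr.fit(X_train, y_train)
print(mlr.intercept_, mlr.coef_)
print(mlr.score(X_test, y_test))   # R^2 on the test set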
Ridge Regression:
from sklearn.linear_model import Ridge

def ridge_regression(data, predictors, alpha, models_to_plot={}):
    # Fit the model
    # (the original passes normalize=True, an argument removed in scikit-learn 1.2;
    # scale the features beforehand if normalization is needed)
    ridgereg = Ridge(alpha=alpha)
    ridgereg.fit(data[predictors], data['y'])
    y_pred = ridgereg.predict(data[predictors])
    # return the predictions (the function body is truncated in the manual)
    return y_pred
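Lasso Regression:
The manual does not include the corresponding Lasso code; a minimal sketch mirroring the ridge function above (assuming the same data layout with a 'y' target column) is:
from sklearn.linear_model import Lasso

def lasso_regression(data, predictors, alpha, models_to_plot={}):
    # Fit the model with an L1 penalty; max_iter raised to help convergence
    lassoreg = Lasso(alpha=alpha, max_iter=100000)
    lassoreg.fit(data[predictors], data['y'])
    y_pred = lassoreg.predict(data[predictors])
    return y_pred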
Results
Conclusions
Through this experiment we learnt to build Multiple Linear Regression, Lasso and Ridge Regression models.