CS3361 Data Science Lab Manual (II CYS)

The document discusses downloading and installing NumPy, SciPy, Jupyter, Statsmodels, and Pandas packages using pip in the command prompt. It then provides code snippets to create and work with NumPy arrays and Pandas dataframes, read data from text files and the web, perform descriptive analytics on the Iris dataset, use the Pima Indians diabetes dataset for univariate and bivariate analysis and multiple regression, apply various plotting functions to UCI datasets, and visualize geographic data with Basemap.

Uploaded by

rajananandh72138

Register no:411621104029

EXNO:
DATE:
DOWNLOAD INSTALL AND EXPLORE THE FEATURES OF NUMPY, SCIPY,
JUPYTER, STATSMODELS AND PANDAS PACKAGES

AIM:
To download, install and explore the features of Numpy, Scipy, Jupyter, Statsmodels
and pandas packages.
ALGORITHM:
Step 1: Open the Command Prompt.
Step 2: Type pip install numpy.
Step 3: The NumPy package is installed.
Step 4: Type pip install scipy; the SciPy package is installed.
Step 5: Type pip install jupyter; the Jupyter package is installed.
Step 6: Type pip install statsmodels; the Statsmodels package is installed.
Step 7: Type pip install pandas; the pandas package is installed.

INSTALLATION PROCESS:
Numpy Installation: pip install numpy

Scipy Installation: pip install scipy

Jupyter Installation: pip install jupyter



Statsmodels installation: pip install statsmodels

Pandas installation: pip install pandas
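After the pip commands finish, the installation can be checked from Python. The sketch below is only an illustration: it assumes the import names match the pip names listed above, and the helper name check_packages is hypothetical.

```python
import importlib.util

# Import names assumed to match the pip package names used above.
PACKAGES = ["numpy", "scipy", "jupyter", "statsmodels", "pandas"]


def check_packages(names):
    """Return a dict mapping each import name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(PACKAGES).items():
        print(f"{name}: {'installed' if found else 'missing'}")
```

A package reported as missing can be installed again with the matching pip install command.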

RESULT:
Thus, the installation commands were executed successfully.

EXNO:
DATE:

WORKING WITH NUMPY ARRAYS

AIM:
To write Python code that implements the concept of NumPy arrays.

ALGORITHM:
Step 1: Make sure NumPy is installed.
Step 2: Import numpy as np.
Step 3: Create an array.
Step 4: variable_name = np.array([...])

PROGRAM & OUTPUT:
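The program listing is not reproduced in this copy. A minimal sketch of the steps above, using made-up values, might look like this:

```python
import numpy as np

# Create a one-dimensional array from a Python list.
a = np.array([1, 2, 3, 4, 5])

# Create a two-dimensional array (2 rows, 3 columns).
b = np.array([[1, 2, 3], [4, 5, 6]])

print(a.shape, a.dtype)  # shape and element type of the 1-D array
print(b.shape, b.ndim)   # shape and number of dimensions of the 2-D array

# Element-wise arithmetic, indexing and reshaping.
doubled = a * 2             # multiply every element by 2
first_row = b[0]            # first row of the 2-D array
reshaped = b.reshape(3, 2)  # same six values arranged as 3 rows, 2 columns

print(doubled)
print(first_row)
print(reshaped)
```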

RESULT:
Thus, the program for working with NumPy arrays was completed successfully.

EXNO:
DATE:

WORKING WITH PANDAS DATA FRAMES

AIM:
To write Python code that implements the concept of pandas DataFrames.

ALGORITHM:
Step 1: Import pandas.
Step 2: Create a DataFrame from a list.
Step 3: Create a DataFrame from a dict of ndarrays/lists.
Step 4: Delete rows and columns.
A pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous
tabular data structure with labelled axes (rows and columns). In other words, the
data is aligned in a tabular fashion in rows and columns.
A pandas DataFrame consists of three principal components: the data, the rows and the columns.
PROGRAM & OUTPUT:
Creating a data frame using List:
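The original listing is missing here. As a sketch, with hypothetical names as sample data, a DataFrame can be built from a plain list like this:

```python
import pandas as pd

# Hypothetical sample data: a plain Python list of strings.
names = ["Asha", "Bala", "Chitra", "Deepak"]

# Each list element becomes one row; the single column is named "Name".
df_list = pd.DataFrame(names, columns=["Name"])
print(df_list)
```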

Creating Data frame from dict of ndarray/lists:
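The original listing is missing here as well. One possible version, again with hypothetical values, passes a dict whose values are a list and an ndarray of the same length:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: each dict key becomes a column name.
data = {
    "Name": ["Asha", "Bala", "Chitra", "Deepak"],  # a Python list
    "Age": np.array([20, 21, 19, 22]),             # a NumPy ndarray
}

df_dict = pd.DataFrame(data)
print(df_dict)
```

All values in the dict must have the same length, otherwise pandas raises an error.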



Dealing with Rows and Columns:
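A short sketch of selecting and deleting rows and columns follows. The column names and values are hypothetical, and drop() returns a new DataFrame rather than changing the original.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Name": ["Asha", "Bala", "Chitra", "Deepak"],
    "Age": [20, 21, 19, 22],
    "City": ["Chennai", "Madurai", "Salem", "Trichy"],
})

ages = df["Age"]                          # select one column
first = df.loc[0]                         # select one row by its index label
no_city = df.drop(columns=["City"])       # delete a column
no_first_row = df.drop(index=[0])         # delete a row

print(ages)
print(first)
print(no_city)
print(no_first_row)
```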

RESULT:
Thus, the working with pandas Data Frame was completed successfully.

EXNO:
DATE:

READING DATA FROM TEXT FILES, EXCEL AND THE WEB AND
EXPLORING VARIOUS COMMANDS DOING DESCRIPTIVE
ANALYTICS ON THE IRIS DATA SET

AIM:
To read data from text files, Excel and the web, and to explore various commands for
doing descriptive analytics on the Iris data set.
READING DATA FROM TEXT FILE:
ALGORITHM:
Step 1: Open Notepad and type some text.
Step 2: Save the text file to the Desktop or any other folder.
Step 3: Open PyCharm and type the code.
Step 4: Run the program.
Step 5: The output is displayed.

PROGRAM:
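The original listing is not reproduced in this copy. The sketch below shows one way to read a text file; the file name sample.txt and its contents are hypothetical, and the file is written to a temporary folder first so that the example is self-contained.

```python
import tempfile
from pathlib import Path


def read_text_file(path):
    """Open a text file and return its whole contents as one string."""
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()


# Create a small sample file (stands in for the file typed in Notepad).
with tempfile.TemporaryDirectory() as folder:
    sample_path = Path(folder) / "sample.txt"
    sample_path.write_text("Welcome to the data science lab.\n", encoding="utf-8")
    contents = read_text_file(sample_path)

print(contents)
```

For the other sources named in the aim, pandas provides pd.read_excel for Excel files, and pd.read_csv also accepts a URL in place of a local path.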

OUTPUT:

IRIS DATA SET:


ALGORITHM:
Step 1: Download the Iris dataset from the Kaggle website and save it in Documents or any
other folder you want.

Link: https://www.kaggle.com/code/bharath25/descriptive-statistics-and-machine-learningiris/data
Step 2: Open PyCharm, install the required packages and type the following commands.
Step 3: The output is displayed.

iris.head(10)

iris.shape

iris.info()

iris.describe()

iris.isnull().sum()

iris.value_counts("Species")
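The commands above assume that a DataFrame named iris has already been loaded. The sketch below runs them end to end on a few illustrative rows laid out like the Kaggle Iris.csv file; the rows and the column names other than Species are assumptions. With the real file, the StringIO object would be replaced by the path to the downloaded CSV.

```python
import io

import pandas as pd

# A handful of illustrative rows (assumed layout of the Kaggle Iris.csv file).
csv_text = """Id,SepalLengthCm,SepalWidthCm,PetalLengthCm,PetalWidthCm,Species
1,5.1,3.5,1.4,0.2,Iris-setosa
2,4.9,3.0,1.4,0.2,Iris-setosa
3,7.0,3.2,4.7,1.4,Iris-versicolor
4,6.4,3.2,4.5,1.5,Iris-versicolor
5,6.3,3.3,6.0,2.5,Iris-virginica
6,5.8,2.7,5.1,1.9,Iris-virginica
"""

iris = pd.read_csv(io.StringIO(csv_text))

print(iris.head(10))         # first rows of the table
print(iris.shape)            # (number of rows, number of columns)
iris.info()                  # column types and non-null counts
print(iris.describe())       # count, mean, std, min, quartiles, max
print(iris.isnull().sum())   # missing values per column

# Number of rows per species.
species_counts = iris["Species"].value_counts()
print(species_counts)
```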

RESULT:
Thus, the program was executed successfully.

EXNO:
DATE:

USE THE DIABETES DATA SET FROM UCI AND PIMA INDIANS
DIABETES DATA SET

AIM:
To use the diabetes data set from UCI and the Pima Indians diabetes data set to perform the
following:
a) Implement Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard
Deviation, Skewness and Kurtosis from UCI dataset.
b) Bivariate analysis: Linear and Logistic Regression Modeling.
c) Multiple Regression Analysis.

ALGORITHM:
Step 1: Download the Pima Indians Diabetes dataset.
Link: https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database?resource=download
Step 2: Install the required packages.
Step 3: Open PyCharm and type the following commands.
Step 4: The output is displayed.

PROGRAM:
5 a) Univariate analysis:
Frequency, Mean, Median, Mode, Variance, Standard Deviation, Skewness and
Kurtosis.
print(df.shape)
df.info()

print(df.mean())

print(df.median())

print(df.mode())

print(df.std())

print(df.var())

print(df.skew())

print(df.kurtosis())

df.describe()
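The commands above assume that df already holds the diabetes data. The sketch below applies the same measures, plus the frequency count, to one column. The values are hypothetical and only the column names follow the Pima Indians file.

```python
import pandas as pd

# Hypothetical values; the real df would come from pd.read_csv on the downloaded file.
df = pd.DataFrame({
    "Glucose": [85, 89, 89, 116, 137, 148, 183],
    "BMI": [26.6, 28.1, 23.3, 25.6, 43.1, 33.6, 23.3],
})

glucose = df["Glucose"]

# Frequency: how many times each distinct value occurs.
frequency = glucose.value_counts()

# The remaining univariate measures collected in one dict.
summary = {
    "mean": glucose.mean(),
    "median": glucose.median(),
    "mode": glucose.mode().iloc[0],   # mode() returns a Series; take the first mode
    "variance": glucose.var(),        # sample variance (ddof=1)
    "std": glucose.std(),             # sample standard deviation (ddof=1)
    "skewness": glucose.skew(),
    "kurtosis": glucose.kurtosis(),
}

print(frequency)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```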

5 b) Bivariate Analysis: Linear and Logistic Regression Modelling.
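The original listing is not reproduced in this copy. As a stand-in, the sketch below fits a simple linear regression with np.polyfit on hypothetical (x, y) pairs. The original program may well have used a library such as statsmodels or scikit-learn instead.

```python
import numpy as np

# Hypothetical predictor/response pairs (for example, two numeric columns of the dataset).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

# Fit y = slope * x + intercept by least squares (a degree-1 polynomial).
slope, intercept = np.polyfit(x, y, deg=1)

# Evaluate the fit with the coefficient of determination R^2.
predicted = slope * x + intercept
ss_res = np.sum((y - predicted) ** 2)   # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)    # total sum of squares
r_squared = 1 - ss_res / ss_tot

print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r_squared:.4f}")
```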



LOGISTIC REGRESSION:
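The original listing is missing here too. The sketch below fits a one-variable logistic regression by plain gradient descent in NumPy. It illustrates the technique only and is not the original program: the data, the function names and the learning rate are assumptions, and in practice a library routine would normally be used.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(x, y, learning_rate=0.1, steps=5000):
    """Fit P(y=1) = sigmoid(w * x + b) by gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        error = sigmoid(w * x + b) - y            # predicted probability minus label
        w -= learning_rate * np.mean(error * x)   # gradient of the mean log-loss w.r.t. w
        b -= learning_rate * np.mean(error)       # gradient of the mean log-loss w.r.t. b
    return w, b


# Hypothetical standardized predictor and binary outcome (0 = no, 1 = yes).
x = np.array([-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0])
y = np.array([0, 0, 0, 1, 0, 1, 1, 1])

w, b = fit_logistic(x, y)

# Classify as 1 when the predicted probability is at least 0.5.
predicted = (sigmoid(w * x + b) >= 0.5).astype(int)
accuracy = np.mean(predicted == y)

print(f"w={w:.3f}, b={b:.3f}, accuracy={accuracy:.2f}")
```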

5 c) MULTIPLE REGRESSION ANALYSIS.


ALGORITHM:
Step 1: Import Libraries.
Step 2: Import dataset.
Step 3: Define x and y.
Step 4: Train the model on the training set.
Step 5: Predict the test set results.
Step 6: Evaluate the model.
Step 7: Plot the results.
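Steps 1 to 6 above can be sketched as follows. The data are simulated rather than taken from the diabetes file, the split is a plain 80/20 slice, and the fit uses np.linalg.lstsq; all of these are assumptions made for illustration, and the original program may have used scikit-learn instead.

```python
import numpy as np

# Steps 1-3: simulated data with two predictors; true model y = 1 + 2*x1 - 3*x2 + noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

# Split into a training set (first 80 rows) and a test set (last 20 rows).
X_train, X_test = X[:80], X[80:]
y_train, y_test = y[:80], y[80:]


def add_intercept(features):
    """Prepend a column of ones so the first coefficient is the intercept."""
    return np.column_stack([np.ones(len(features)), features])


# Step 4: train the model by ordinary least squares.
coefficients, *_ = np.linalg.lstsq(add_intercept(X_train), y_train, rcond=None)

# Step 5: predict the test set results.
y_pred = add_intercept(X_test) @ coefficients

# Step 6: evaluate the model on the test set.
rmse = np.sqrt(np.mean((y_test - y_pred) ** 2))
r_squared = 1 - np.sum((y_test - y_pred) ** 2) / np.sum((y_test - y_test.mean()) ** 2)

print("intercept and coefficients:", coefficients)
print(f"RMSE={rmse:.3f}, R^2={r_squared:.4f}")
```

For Step 7, a scatter plot of y_test against y_pred is one common way to show the results.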

RESULT:
Thus, the program was executed successfully.

EXNO:
DATE:

APPLY AND EXPLORE VARIOUS PLOTTING FUNCTIONS ON UCI
DATA SETS

AIM:
To apply and explore various plotting functions on UCI data sets.
a) Normal Curves.
b) Density and Contour Plots.
c) Correlation and Scatter Plots.
d) Histograms.
e) Three Dimensional Plotting.

ALGORITHM:
Step 1: Download the Heart dataset from Kaggle.
Link: https://www.kaggle.com/datasets/zhaoyingzhu/heartcsv
Step 2: Save that in downloads or any other Folder and install packages.
Step 3: Apply these following commands on the dataset.
Step 4: The Output will display.

PROGRAM:

BOX PLOT:

a) Normal Curve:
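The original listing is not reproduced in this copy. The sketch below draws a normal curve over a histogram of hypothetical age values that stand in for a numeric column of the Heart dataset. The Agg backend and the output file name are assumptions, chosen only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window instead
import matplotlib.pyplot as plt
import numpy as np


def normal_pdf(x, mean, std):
    """Probability density of the normal distribution N(mean, std^2) at x."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))


# Hypothetical values standing in for a numeric column such as age.
values = np.array([29, 35, 41, 44, 48, 52, 54, 57, 58, 61, 63, 67, 70], dtype=float)
mean, std = values.mean(), values.std(ddof=1)

# Evaluate the fitted normal curve on a grid covering mean +/- 4 standard deviations.
grid = np.linspace(mean - 4 * std, mean + 4 * std, 200)
density = normal_pdf(grid, mean, std)

fig, ax = plt.subplots(figsize=(8, 5))
ax.hist(values, bins=6, density=True, alpha=0.4, label="data")
ax.plot(grid, density, label="fitted normal curve")
ax.set_xlabel("Age")
ax.set_ylabel("Density")
ax.legend()
fig.savefig("normal_curve.png")  # hypothetical output file name
```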

b) Density Plots:

c) Correlation and Scatter plots:



Correlation plot

Scatter plot
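The original listings for these two plots are missing. One possible version computes the correlation matrix with pandas and draws both plots with Matplotlib. The values and the column names (Age, RestBP, Chol, MaxHR) are assumptions, as are the Agg backend and the output file name.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to open a window instead
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows; the real data would come from pd.read_csv on the downloaded file.
heart = pd.DataFrame({
    "Age": [40, 45, 50, 55, 60, 65],
    "RestBP": [120, 130, 125, 140, 135, 150],
    "Chol": [200, 230, 210, 250, 240, 260],
    "MaxHR": [180, 172, 165, 160, 150, 142],
})

# Pairwise Pearson correlation between all numeric columns.
corr = heart.corr()

fig, (ax_corr, ax_scatter) = plt.subplots(1, 2, figsize=(12, 5))

# Correlation plot: the matrix drawn as a colour-coded grid.
image = ax_corr.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax_corr.set_xticks(range(len(corr.columns)))
ax_corr.set_xticklabels(corr.columns, rotation=45)
ax_corr.set_yticks(range(len(corr.columns)))
ax_corr.set_yticklabels(corr.columns)
fig.colorbar(image, ax=ax_corr)

# Scatter plot: one point per row for a chosen pair of columns.
ax_scatter.scatter(heart["Age"], heart["MaxHR"])
ax_scatter.set_xlabel("Age")
ax_scatter.set_ylabel("MaxHR")

fig.tight_layout()
fig.savefig("correlation_and_scatter.png")  # hypothetical output file name
```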

d) Histogram:

e) Three Dimensional Plotting:



RESULT:
Thus, the program was executed successfully.

EXNO:
DATE:

VISUALIZING GEOGRAPHIC DATA WITH BASEMAP

AIM:
To visualize geographic data with Basemap.

ALGORITHM:
Step 1: Install Basemap. If it is downloaded as a zip file, extract it first.
Step 2: Import the packages.
Step 3: Save the file in Downloads or any other folder.
Step 4: Apply these following commands.
Step 5: The Output will display.

PROGRAM & OUTPUT:


%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap

plt.figure(figsize=(8, 8))
m = Basemap(projection='ortho', resolution=None, lat_0=50, lon_0=-100)
m.bluemarble(scale=0.5);

fig = plt.figure(figsize=(8, 8))
m = Basemap(projection='lcc', resolution=None,
            width=8E6, height=8E6,
            lat_0=45, lon_0=-100,)
m.etopo(scale=0.5, alpha=0.5)

# Map (long, lat) to (x, y) for plotting
x, y = m(-122.3, 47.6)
plt.plot(x, y, 'ok', markersize=5)
plt.text(x, y, ' Seattle', fontsize=12);

RESULT:

Thus, the program was executed successfully.



EXP NO:
DATE:

ARITHMETIC OPERATIONS BETWEEN TWO PANDAS SERIES


AIM
To write a Python program to perform arithmetic operations between two pandas Series.
ALGORITHM
STEP 1: Start
STEP 2: Import pandas package
STEP 3: Initialise ds1 and ds2
STEP 4: For addition, calculate ds1+ds2
STEP 5: For subtraction, calculate ds1-ds2
STEP 6: For multiplication, calculate ds1*ds2
STEP 7: For division, calculate ds1/ds2
STEP 8: Print the desired results
STEP 9: Stop

PROGRAM
import pandas as pd
ds1 = pd.Series([2, 4, 6, 8, 10])
ds2 = pd.Series([1, 3, 5, 7, 9])
print("Add two series")
ds = ds1 + ds2
print(ds)
print("Subtract two series")
ds = ds1 - ds2
print(ds)
print("Multiply two series")
ds = ds1 * ds2
print(ds)
print("Divide two series")
ds = ds1 / ds2
print(ds)

OUTPUT

RESULT
Thus, the program to perform arithmetic operations between two pandas Series has
been executed successfully.

EXPNO:
DATE:
SCATTER PLOTS IN PYTHON USING POKEMON DATASET
AIM
To create scatter plots in Python using the Matplotlib and Seaborn libraries with the
Pokémon dataset.
ALGORITHM:
Step 1: Download the Pokémon dataset from Kaggle.
Link: https://www.kaggle.com/datasets/rounakbanik/pokemon
Step 2: Save the file in Downloads or any other folder and install the packages.
Step 3: Apply the following commands to the dataset.
Step 4: The output is displayed.
PROGRAM:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

data = pd.read_csv("../input/pokemon.csv")
data.shape
data.head()
g1 = data.loc[data.generation==1,:]
# dataframe.plot.scatter() method
g1.plot.scatter('attack', 'defense'); # The ';' is to avoid showing a message before showing the plot
# plt.scatter() function
plt.scatter('attack', 'defense', data=g1);
g1.plot.scatter('attack', 'defense', s = 40, c = 'orange', marker = 's', figsize=(8,5.5));
plt.figure(figsize=(10,7)) # Specify size of the chart
plt.scatter('attack', 'defense', data=data[data.is_legendary==1], marker = 'x', c = 'magenta')
plt.scatter('attack', 'defense', data=data[data.is_legendary==0], marker = 'o', c = 'blue')
plt.legend(('Yes', 'No'), title='Is legendary?')
plt.show()
plt.figure(figsize=(10,7))
sns.scatterplot(x = 'attack', y = 'defense', s = 70, hue ='is_legendary', data=data); # hue represents color
plt.figure(figsize=(10,7))
sns.scatterplot(x = 'attack', y = 'defense', s = 50, hue = 'is_legendary', style ='is_legendary', data=data); # style represents marker
plt.figure(figsize=(11,7))
sns.scatterplot(x = 'attack', y = 'defense', s = 50, hue = 'type1', data=data)
plt.legend(bbox_to_anchor=(1.02, 1)) # move legend to outside of the chart
plt.title('Defense vs Attack for All Pokemons', fontsize=16)
plt.xlabel('Attack', fontsize=12)
plt.ylabel('Defense', fontsize=12)
plt.show()
water = data[data.type1 == 'water']
water.plot.scatter('height_m', 'weight_kg', figsize=(10,6))
plt.grid(True) # add gridlines
plt.show()
water.plot.scatter('height_m', 'weight_kg', figsize=(10,6))
plt.grid(True)
for index, row in water.nlargest(5, 'height_m').iterrows():
    plt.annotate(row['name'], # text to show
                 xy = (row['height_m'], row['weight_kg']), # the point to annotate
                 xytext = (row['height_m']+0.2, row['weight_kg']), # where to show the text
                 fontsize=12)
plt.xlim(0, ) # x-axis has minimum 0
plt.ylim(0, ) # y-axis has minimum 0
plt.show()

OUTPUT:

RESULT:

Thus, the above program was executed successfully.
