
Amity Institute of Information Technology

Amity University, Uttar Pradesh, Noida


Session: 2022-2025 (Odd Semester)

[CSIT366]
FUNDAMENTALS OF DATA SCIENCE AND
ANALYTICS
LAB FILE

Submitted to: Dr. Rashmi Vashisth
Submitted by: Harsh Singh Parmar
Program: BCA 5th Semester
Enrollment No.: A10046622002
INDEX
Sno.  Name of the Experiment                                                              Date       Signature
1.    To implement Bar Plot, Histogram and Line Chart.                                    11-07-24
2.    To implement frequency distribution and trend chart using a data set.               18-07-24
3.    Write a program to create a matrix of random numbers and convert it to a vector.    25-07-24
4.    Write a program to perform different text analysis operations using NLTK.           01-08-24
5.    Write a program to generate a random word using (i) HTTP Request (ii) A Text File   08-08-24
6.    To implement Image Morphing in Python.                                              06-09-24
7.    To implement and verify EDA techniques.                                             19-09-24
8.    To implement KNN algorithm on iris dataset.                                         26-09-24
9.    To perform sentiment analysis on text data using a pretrained model.                03-10-24
10.   To implement Principal Component Analysis on a dataset.                             10-10-24
Date-11-07-24

Experiment-1
Aim: To implement Bar Plot, Histogram and Line Chart.

Software used: PyCharm

Theory
Bar Plot: A bar plot is a visual representation of data using rectangular bars,
where the length of each bar corresponds to the value it represents. It's effective
for comparing categories or showing changes over time.
Histogram: A histogram is a visual representation of data distribution where
continuous data is grouped into intervals and displayed as bars. The height of
each bar corresponds to the frequency of values within that interval, providing
insights into data shape and spread.
Line Chart: A line chart is a visual representation of data points connected by
straight lines, often used to show trends and changes over time. It's ideal for
displaying continuous data and highlighting patterns of increase, decrease, or
stability.

Code:
import matplotlib.pyplot as plt
import numpy as np

# Line chart of ten random values
x = np.arange(10)
y = np.random.rand(10)

plt.figure(figsize=(8, 6))
plt.plot(x, y, marker='x')
plt.title('Line Chart Example')
plt.xlabel('X Axis')
plt.ylabel('Y Axis')

# Histogram of 1,000 samples drawn from a standard normal distribution
data = np.random.randn(1000)
plt.figure(figsize=(8, 6))
plt.hist(data, bins=30, edgecolor='black')
plt.title('Histogram Example')
plt.xlabel('Value')
plt.ylabel('Frequency')

# Bar plot of four categories
categories = ['A', 'B', 'C', 'D']
values = [10, 20, 15, 25]

plt.figure(figsize=(8, 6))
plt.bar(categories, values)
plt.title('Bar Plot Example')
plt.xlabel('Category')
plt.ylabel('Value')

plt.show()

OUTPUT:
Date-18-07-24

Experiment-2
Aim: To implement frequency distribution and trend chart using a data set.

Software used: PyCharm

Theory: Frequency distribution and trend charts are complementary tools for
data analysis. A frequency distribution organizes data into intervals, showing
how often values occur within each range. This helps identify patterns and
central tendencies. A trend chart, on the other hand, visualizes data points
over time, revealing upward, downward, or stable patterns. By combining these
methods, analysts can gain insights into data behavior, make predictions, and
inform decision-making.
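
As a minimal sketch of the interval-based grouping described above (reusing the camera dataset and the 'Max resolution' column from the program below), pandas can bin a continuous column and count the values falling in each interval:

import pandas as pd

df = pd.read_csv('camera_dataset.csv')

# Group a continuous column into five equal-width intervals and count values per bin
bins = pd.cut(df['Max resolution'], bins=5)
frequency_table = bins.value_counts().sort_index()
print(frequency_table)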

Code:
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv('camera_dataset.csv')

plt.figure(figsize=(10, 6))
df.groupby('Release date')['Max resolution'].max().plot(kind='line', marker='o')
plt.xlabel("Year")
plt.ylabel("Maximum Resolution")
plt.title("Trend Chart: Maximum Resolution over Years")
plt.show()

plt.figure(figsize=(10, 6))
effective_pixels = df['Effective pixels'].value_counts()
plt.bar(effective_pixels.index, effective_pixels.values, edgecolor="black")
plt.xlabel("Effective Pixels")
plt.ylabel("Frequency")
plt.title("Bar Chart: Frequency of Effective Pixels")
plt.show()
OUTPUT
Date-25-07-24

Experiment-3
Aim: Write a program to create a matrix of random numbers and convert it
to a vector.
Software used: PyCharm
Theory
A matrix is a rectangular array of numbers arranged in rows and columns. In
Python, it's often represented as a nested list. The NumPy library provides
efficient matrix operations.
A vector is a one-dimensional array of numbers, essentially a special case of a
matrix with a single row or column. In Python, it can be represented as a list or a
NumPy array, and NumPy's flatten() or ravel() methods convert a matrix into a vector.
Vectors are used for various calculations and linear algebra operations.
Both matrices and vectors are fundamental data structures in fields like linear
algebra, machine learning, and data science.

Code:
import numpy as np

def create_linear_transformation_matrix(n, k):
    """
    Create an n x k matrix of random numbers, which can also be viewed as a
    linear function mapping k-dimensional vectors to n-dimensional vectors.

    Args:
        n (int): The number of dimensions in the output vector.
        k (int): The number of dimensions in the input vector.

    Returns:
        np.ndarray: An n x k matrix of random coefficients.
    """
    matrix = np.random.rand(n, k)
    return matrix

n = 3  # number of dimensions in the output vector
k = 4  # number of dimensions in the input vector

# Create the matrix of random numbers
matrix = create_linear_transformation_matrix(n, k)
print("Matrix:")
print(matrix)

# Convert the matrix to a one-dimensional vector, as stated in the aim
vector = matrix.flatten()
print("Vector:")
print(vector)

OUTPUT
Date-01-08-24

Experiment-4
Aim: Write a program to perform different text analysis operations using
NLTK.

Software used: PyCharm


Theory: NLTK is a powerful Python library for text analysis. It offers a suite of
tools for tasks like tokenization (breaking text into words), stop word removal
(filtering out common words), stemming (reducing words to their root), and
part-of-speech tagging (identifying word types). These operations are
foundational for more complex analyses such as sentiment analysis, named
entity recognition (extracting names, organizations, locations), and text
classification. NLTK provides a rich environment for experimenting with and
developing natural language processing applications.
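The program below focuses on part-of-speech tagging; the sketch that follows illustrates the other operations mentioned above (tokenization, stop-word removal and stemming), assuming the punkt and stopwords resources can be downloaded:

import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize

nltk.download('punkt')
nltk.download('stopwords')

text = "NLTK provides simple building blocks for analysing natural language text."

# Tokenization: break the text into individual words
tokens = word_tokenize(text)

# Stop word removal: filter out common words such as 'for' and 'the'
stop_words = set(stopwords.words('english'))
filtered = [w for w in tokens if w.lower() not in stop_words]

# Stemming: reduce each remaining word to its root form
stemmer = PorterStemmer()
stems = [stemmer.stem(w) for w in filtered]
print(stems)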
Code:
import nltk
from nltk import pos_tag, word_tokenize

nltk.download('averaged_perceptron_tagger')
nltk.download('punkt')

def pos_tagging(text):
    tokens = word_tokenize(text)
    tagged_tokens = pos_tag(tokens)
    return tagged_tokens

# Example usage:
text = "This is a sample sentence. Another sentence. And the last one."
tagged_tokens = pos_tagging(text)

for token, tag in tagged_tokens:
    print(f"{token}: {tag}")
OUTPUT
Date-08-08-24

Experiment-5.1
Aim: Write a program to generate a random word using
(i) HTTP Request
(ii) A Text File

Software used: PyCharm


Theory: Python offers several methods for generating random words. One
common approach is to utilize the random module to select words from a pre-
existing list. Alternatively, you can construct words randomly by combining
characters from specified character sets. For more complex word generation,
consider employing libraries like NLTK or wonderwords, which provide
functionalities for generating realistic-sounding words or even complete
sentences based on linguistic patterns. The specific method chosen depends on
the desired outcome, whether it's simple random word selection or the
creation of more sophisticated text structures.
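Before the two required approaches, here is a minimal sketch of the character-combination idea mentioned above, using only the standard library:

import random
import string

# Build a pseudo-word of 4 to 8 characters by sampling lowercase letters at random
length = random.randint(4, 8)
random_word = ''.join(random.choice(string.ascii_lowercase) for _ in range(length))
print("Random word:", random_word)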
Code:
import requests
import random

def get_random_word():
    response = requests.get("https://raw.githubusercontent.com/dwyl/english-words/master/words_alpha.txt")
    if response.status_code == 200:
        words = response.text.splitlines()
        random_word = random.choice(words)
        return random_word
    else:
        return None

random_word = get_random_word()
if random_word:
    print("Random word:", random_word)
else:
    print("Failed to get random word")

OUTPUT
Date-08-08-24

Experiment-5.2
Aim: Write a program to generate a random word using
(i) HTTP Request
(ii) A Text File

Software used: PyCharm


Code:
import random

def get_random_word(filename):
    with open(filename, 'r') as f:
        words = [line.strip() for line in f.readlines()]
    random_word = random.choice(words)
    return random_word

filename = 'words.txt'
random_word = get_random_word(filename)
print("Random word:", random_word)

OUTPUT
Date-06-09-24

Experiment-6
Aim: To implement Image Morphing in Python
Software Used: PyCharm
Theory: Image morphing in Python involves smoothly transforming one image
into another by manipulating pixels, shapes, or features. This can be achieved
using libraries like OpenCV, which allows for image warping and blending
techniques. By defining corresponding points between two images, algorithms
like Delaunay triangulation or thin plate splines can interpolate the intermediate
images. The result is a seamless transition from one image to another, often used
in applications like animation, facial recognition, or artistic effects. Libraries such
as NumPy are also helpful in managing pixel-level operations efficiently.
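The implementation below uses Pillow's alpha blending (a simple cross-dissolve). A single blended frame could equally be produced with OpenCV's addWeighted, as in this sketch, which assumes two same-sized local files img1.jpg and img2.jpg (hypothetical filenames):

import cv2

img1 = cv2.imread('img1.jpg')
img2 = cv2.imread('img2.jpg')

# Weighted sum of the two images; alpha controls how far the morph has progressed
alpha = 0.5
blended = cv2.addWeighted(img1, 1 - alpha, img2, alpha, 0)
cv2.imwrite('blended_frame.jpg', blended)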
Code:
from PIL import Image, ImageChops
import requests
from io import BytesIO
import os
import matplotlib.pyplot as plt

image1_url = "https://cdn.pixabay.com/photo/2014/02/27/16/10/flowers-276014_1280.jpg"
image2_url = "https://cdn.pixabay.com/photo/2015/04/23/22/00/tree-736885_1280.jpg"

response1 = requests.get(image1_url)
response2 = requests.get(image2_url)

if response1.status_code == 200 and response2.status_code == 200:
    image1 = Image.open(BytesIO(response1.content))
    image2 = Image.open(BytesIO(response2.content))

    width, height = min(image1.size, image2.size)
    image1 = image1.resize((width, height))
    image2 = image2.resize((width, height))

    num_frames = 30

    output_folder = 'morphing_frames/'
    os.makedirs(output_folder, exist_ok=True)

    for i in range(num_frames + 1):
        alpha = i / num_frames
        blended_image = ImageChops.blend(image1, image2, alpha)
        frame_filename = f'{output_folder}frame_{i:03d}.jpg'
        blended_image.save(frame_filename)

        # Display a specific frame
        if i == 15:
            frame_to_display = Image.open(frame_filename)
            plt.imshow(frame_to_display)
            plt.axis('off')
            plt.show()

    print(f'{num_frames + 1} frames created in {output_folder}')

else:
    print("Failed to download images from the provided URLs.")

OUTPUT
Date-19-09-24

Experiment-7
Aim: To implement and verify EDA techniques
Software Used: PyCharm
Theory: Exploratory Data Analysis (EDA) techniques in Python are crucial for
understanding the underlying patterns, trends, and relationships within a
dataset. Common EDA techniques include descriptive statistics (like mean,
median, and mode), data visualization, and distribution analysis. Libraries such
as Pandas and NumPy allow for easy data manipulation and summary statistics,
while Matplotlib and Seaborn are used for plotting histograms, boxplots, scatter
plots, and correlation heatmaps. Outlier detection, handling missing data, and
normalization or transformation techniques are often applied to clean and better
understand the dataset before applying predictive models.
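The plots below cover the visualization side of EDA; the summary statistics and missing-data checks mentioned above can be sketched as follows, assuming the same titanic.csv file:

import pandas as pd

df = pd.read_csv('titanic.csv')

# Descriptive statistics (count, mean, std, quartiles) for the numeric columns
print(df.describe())

# Number of missing values per column, checked before cleaning or imputation
print(df.isnull().sum())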
Code:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

file_path = 'titanic.csv'
df = pd.read_csv(file_path)

# Set the size for all plots
plt.figure(figsize=(12, 8))

# 1. Countplot of Survived
plt.subplot(2, 3, 1)
sns.countplot(x='Survived', data=df)
plt.title('Countplot of Survived')

# 2. Boxplot of Fare grouped by Pclass
plt.subplot(2, 3, 2)
sns.boxplot(x='Pclass', y='Fare', data=df)
plt.title('Boxplot of Fare by Pclass')

# 3. Scatter plot of Age vs Fare
plt.subplot(2, 3, 3)
sns.scatterplot(x='Age', y='Fare', data=df)
plt.title('Scatter plot of Age vs Fare')
# 4. Heatmap of correlations
plt.subplot(2, 3, 4)
corr_matrix = df[['Survived', 'Pclass', 'Age', 'SibSp', 'Parch', 'Fare']].corr()
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', linewidths=0.5)
plt.title('Heatmap of Correlations')

# 5. Bubble chart: Age vs Fare with Pclass as color and Fare as size
plt.subplot(2, 3, 5)
bubble_sizes = df['Fare'] / 2 # Scale down bubble size for better visualization
sns.scatterplot(x='Age', y='Fare', size=bubble_sizes, hue='Pclass', data=df, palette='viridis', alpha=0.6)
plt.title('Bubble Chart: Age vs Fare')

plt.tight_layout()
plt.show()
Date-26-09-24

Experiment-8
Aim: To implement KNN algorithm on iris dataset
Software Used: PyCharm
Theory: The KNN algorithm operates on the principle of "similarity is proximity."
Given a new data point, KNN finds the K closest data points (neighbours) from
the training set. The class or value of the majority of these neighbours is then
assigned to the new data point. In classification, this means predicting the
category, while in regression, it involves predicting a numerical value. KNN is
often used for tasks like image recognition, recommendation systems, and
anomaly detection.
The Iris dataset consists of 150 samples, 50 from each of three species of Iris (Iris setosa,
Iris virginica and Iris versicolor). Four features were measured from each sample:
the length and the width of the sepals and petals, in centimetres.
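The value of K is a free parameter; one common way to choose it, sketched below with scikit-learn cross-validation on the same feature and label columns used in the program that follows, is to compare accuracy across several candidate values:

import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

iris = pd.read_csv("iris.csv")
X = iris[['sepal.length', 'sepal.width', 'petal.length', 'petal.width']]
y = iris['variety']

# Compare mean 5-fold cross-validation accuracy for several values of K
for k in [1, 3, 5, 7, 9]:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5)
    print(f"K={k}: mean accuracy = {scores.mean():.3f}")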
Code:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import pandas as pd

file = "iris.csv"
iris = pd.read_csv(file)

X = iris[['sepal.length', 'sepal.width', 'petal.length', 'petal.width']]
y = iris['variety']

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Standardize features by removing the mean and scaling to unit variance
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Make predictions using the testing set
y_pred = knn.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
print("Classification Report:")
print(classification_report(y_test, y_pred))
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))

OUTPUT
Date-03-10-24

Experiment-9
Aim: To perform sentiment analysis on text data using a pretrained model.
Software Used: PyCharm
Theory: Sentiment analysis on text data involves automatically determining the
emotional tone of a given piece of text, typically classified as positive, negative, or
neutral. A pretrained model, like those based on deep learning architectures
such as Recurrent Neural Networks (RNNs) or Transformers, can be effectively
used for this task. These models are trained on large datasets of labeled text,
allowing them to learn complex patterns and nuances in language that are
indicative of sentiment. By feeding a new piece of text into a pretrained model,
it can predict the sentiment with a certain degree of accuracy, providing valuable
insights for applications like social media monitoring, customer feedback
analysis, movie reviews and market research.
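The code below relies on the pipeline's default checkpoint; for reproducibility the model can be pinned explicitly, as in this sketch (the DistilBERT SST-2 checkpoint named here is an assumption, chosen because it is a commonly used sentiment model on the Hugging Face hub):

from transformers import pipeline

# Pin a specific pretrained checkpoint instead of relying on the pipeline default
sentiment_pipeline = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print(sentiment_pipeline(["I love this movie"]))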
Code:
import torch
from transformers import pipeline

# Build a sentiment-analysis pipeline (uses the library's default pretrained model)
sentiment_pipeline = pipeline("sentiment-analysis")

data = [
    "I love this movie",
    "This movie sucks!",
    "This movie is damn good!",
    "The movie wasn't up to the mark",
    "The movie could have been better!",
    "That movie was awesome",
    "I found it very touching",
    "It feels like movie of the year",
]

result = sentiment_pipeline(data)
print(result)

OUTPUT
Date-10-10-24

Experiment-10
Aim: To Implement Principal Component Analysis on a dataset
Software Used: PyCharm
Theory: PCA (Principal Component Analysis) is a dimensionality reduction
technique widely used in machine learning. It transforms a large dataset of
interrelated variables into a smaller set of uncorrelated variables called principal
components. These components capture the most variance in the data, allowing
for efficient data representation and analysis. PCA is commonly used for tasks
like feature engineering, visualization, and noise reduction, making it a valuable
tool in various machine learning applications.
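Before reducing the Iris data to one or two dimensions below, the number of components worth keeping can be judged from the cumulative explained variance; a minimal sketch:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Standardize the Iris features, fit PCA with all components, and report cumulative variance
X = StandardScaler().fit_transform(load_iris().data)
pca = PCA()
pca.fit(X)
print(np.cumsum(pca.explained_variance_ratio_))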
Code:
For Reduction to 1-Dimension
import pandas as pd
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Standardize the data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

pca = PCA(n_components=1)
X_pca_1d = pca.fit_transform(X_scaled)

plt.figure(figsize=(8, 4))
plt.scatter(X_pca_1d, np.zeros_like(X_pca_1d), c=y, cmap='viridis', edgecolor='k', s=100)
plt.xlabel('First Principal Component')
plt.title('PCA of Iris Dataset (1 Dimension)')
plt.yticks([])
plt.show()
explained_variance = pca.explained_variance_ratio_
print(f'Explained variance by the first principal component: {explained_variance[0]}')

OUTPUT
For Reduction to 2-Dimensions
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

iris = load_iris()
X = iris.data
y = iris.target

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

plt.figure(figsize=(8, 6))
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap='viridis', edgecolor='k', s=100)
plt.xlabel('First Principal Component')
plt.ylabel('Second Principal Component')
plt.title('PCA of Iris Dataset')
plt.colorbar(label='Iris Species')
plt.show()

explained_variance = pca.explained_variance_ratio_
print(f'Explained variance by each principal component: {explained_variance}')

OUTPUT
