Practical_1

This document provides a comprehensive guide on setting up a data analysis environment using NumPy and Pandas. It covers creating and manipulating arrays, DataFrames, and performing statistical operations, along with data cleaning and visualization techniques. Additionally, it includes practical examples for real-world data analysis and integration of NumPy with Pandas.

Uploaded by

2203031050417
1. Setting Up the Environment

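First, ensure the necessary libraries are installed. Matplotlib is included because the visualization section uses it; the statistics module ships with Python and needs no installation:

pip install numpy pandas scipy matplotlib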

2. Importing Libraries

Start by importing the required libraries:

import numpy as np
import pandas as pd
from statistics import mean, median, stdev

3. NumPy Operations

3.1 Creating and Manipulating Arrays

# Create a 1D array
array_1d = np.array([10, 20, 30, 40, 50])

# Create a 2D array
array_2d = np.array([[1, 2], [3, 4], [5, 6]])

# Perform mathematical operations
array_sum = array_1d + 5 # Add 5 to each element
array_mean = np.mean(array_1d) # Compute the mean
array_std = np.std(array_1d) # Compute the population standard deviation (ddof=0)
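
The 2D array created above is not used further in this section. As a minimal sketch of what can be done with it, the snippet below inspects its shape and aggregates along each axis. The variable names shape, col_means, and row_sums are illustrative and not part of the original.

```python
import numpy as np

# Same 2D array as in section 3.1: 3 rows, 2 columns
array_2d = np.array([[1, 2], [3, 4], [5, 6]])

shape = array_2d.shape             # tuple of (rows, columns)
col_means = array_2d.mean(axis=0)  # axis=0 aggregates down each column
row_sums = array_2d.sum(axis=1)    # axis=1 aggregates across each row

print(shape)
print(col_means)
print(row_sums)
```

Here axis=0 collapses the rows and leaves one value per column, while axis=1 collapses the columns and leaves one value per row.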

3.2 Useful NumPy Functions

# Generate a range of numbers
range_array = np.arange(1, 10, 2)

# Generate random numbers
random_array = np.random.rand(3, 3)

# Reshape arrays
reshaped = random_array.reshape(1, 9)
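
If reproducible random numbers are needed, one option is NumPy's Generator interface with a fixed seed. The sketch below assumes an arbitrary seed of 42 and also shows reshape with -1, which lets NumPy infer one dimension from the total number of elements. It is an alternative to np.random.rand, not a replacement the original prescribes.

```python
import numpy as np

# Seeded generator: the same seed yields the same numbers on every run.
# The seed value 42 is an arbitrary choice for illustration.
rng = np.random.default_rng(seed=42)
random_array = rng.random((3, 3))  # uniform floats in [0, 1)

# -1 asks NumPy to infer that dimension from the total element count
flat = random_array.reshape(-1)    # 1D, 9 elements
row = random_array.reshape(1, -1)  # 2D, 1 row by 9 columns

# np.arange excludes the stop value, so this produces 1, 3, 5, 7, 9
range_array = np.arange(1, 10, 2)

print(flat.shape, row.shape, range_array)
```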

4. Pandas Operations

4.1 Creating DataFrames and Series

# Create a Series
series = pd.Series([10, 20, 30, 40, 50], name="Scores")

# Create a DataFrame
data = {
"Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
"Age": [25, 30, 35, 40, 45],
"Score": [85, 90, 78, 88, 76],
}
df = pd.DataFrame(data)

4.2 Inspecting Data

# View first few rows
print(df.head())

# Summary of the data
print(df.describe())

4.3 Filtering and Sorting

# Filter rows where Score > 80
filtered_df = df[df["Score"] > 80]

# Sort by Age
sorted_df = df.sort_values(by="Age", ascending=False)
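
Filters can also combine several conditions, and sorts can use more than one key. The following sketch rebuilds the DataFrame from section 4.1 so it runs on its own. The age threshold of 30 and the names older_high and ranked are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eva"],
    "Age": [25, 30, 35, 40, 45],
    "Score": [85, 90, 78, 88, 76],
})

# Wrap each condition in parentheses and combine with & (and), | (or), ~ (not)
older_high = df[(df["Score"] > 80) & (df["Age"] >= 30)]

# Sort by Score descending, breaking ties by Name ascending
ranked = df.sort_values(by=["Score", "Name"], ascending=[False, True])

print(older_high)
print(ranked)
```

Python's own and/or keywords do not work element-wise on a Series, which is why the bitwise operators are used here.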

4.4 Data Manipulation


# Add a new column
df["Bonus"] = df["Score"] * 0.1

# Modify existing data
df.loc[df["Name"] == "Alice", "Score"] = 95

# Drop a column
df.drop(columns=["Bonus"], inplace=True)

4.5 Handling Missing Values

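# Simulate missing data (cast to float first so the column can hold NaN)
df["Score"] = df["Score"].astype(float)
df.loc[2, "Score"] = np.nan

# Fill missing values with the column mean
# (assign the result back; calling fillna(inplace=True) on a selected column is unreliable in recent pandas)
df["Score"] = df["Score"].fillna(df["Score"].mean())

# Drop any rows that still contain missing data
df.dropna(inplace=True)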
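
Before choosing between filling and dropping, it usually helps to count the missing values first. The sketch below uses a small made-up DataFrame (not the one from section 4.1) to compare the two strategies side by side without modifying the original frame.

```python
import numpy as np
import pandas as pd

# Small illustrative frame with one missing score (hypothetical data)
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Score": [85.0, np.nan, 78.0],
})

# Count missing values per column
missing_per_column = df.isna().sum()

# Strategy 1: fill with the column mean (returns a new DataFrame)
filled = df.assign(Score=df["Score"].fillna(df["Score"].mean()))

# Strategy 2: drop rows where Score is missing (returns a new DataFrame)
dropped = df.dropna(subset=["Score"])

print(missing_per_column)
print(filled)
print(dropped)
```

Filling keeps every row at the cost of inventing a value, while dropping keeps only observed data at the cost of losing rows.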

5. Statistics Library Operations

Python's built-in statistics module is useful for basic statistical operations on plain Python lists:

# Calculate mean, median, and sample standard deviation
scores = df["Score"].tolist()
print("Mean:", mean(scores))
print("Median:", median(scores))
print("Standard Deviation:", stdev(scores))
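
One detail worth checking is that statistics.stdev and np.std do not return the same number by default: stdev divides by n - 1 (sample), while np.std divides by n (population) unless ddof=1 is passed. A short sketch using the original scores from section 4.1:

```python
from statistics import pstdev, stdev

import numpy as np

scores = [85, 90, 78, 88, 76]

sample_sd = stdev(scores)       # divides by n - 1
population_sd = pstdev(scores)  # divides by n

numpy_default = np.std(scores)         # ddof=0, matches pstdev
numpy_sample = np.std(scores, ddof=1)  # ddof=1, matches stdev

print(sample_sd, numpy_sample)
print(population_sd, numpy_default)
```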

6. Integrating NumPy with Pandas

You can use NumPy functions within Pandas:


# Calculate z-score (np.std defaults to the population standard deviation)
df["Z-Score"] = (df["Score"] - np.mean(df["Score"])) / np.std(df["Score"])
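
Other NumPy functions can be applied to DataFrame columns in the same way. The sketch below is one possible illustration: it checks that the z-scores have mean 0 and standard deviation 1, then uses np.where and np.log on a column. The pass mark of 80 and the column names Result and LogScore are assumptions made for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Score": [85, 90, 78, 88, 76]})

# Z-score using the population standard deviation (ddof=0)
df["Z-Score"] = (df["Score"] - df["Score"].mean()) / df["Score"].std(ddof=0)

# Element-wise conditional: label each row (threshold of 80 is hypothetical)
df["Result"] = np.where(df["Score"] >= 80, "Pass", "Fail")

# NumPy universal functions apply element-wise to a column
df["LogScore"] = np.log(df["Score"])

print(df)
print(df["Z-Score"].mean(), df["Z-Score"].std(ddof=0))
```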

7. Real-World Data Analysis Example

7.1 Loading Data

# Read a CSV file
data = pd.read_csv("data.csv")

# Display basic information (info() prints directly and returns None)
data.info()
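
The file data.csv is not supplied with this document. To try the loading step without it, one option is to read CSV text from an in-memory buffer. The contents below are made up for illustration and assume the Category and Value columns used in section 7.3.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for data.csv
csv_text = """Category,Value
A,10
B,20
A,
B,40
"""

# read_csv accepts a file-like object as well as a path
data = pd.read_csv(io.StringIO(csv_text))

data.info()                # column names, non-null counts, dtypes
print(data.isna().sum())   # missing values per column
```

The empty field in the third data row is read as NaN, so the Value column is parsed as floating point rather than integer.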

7.2 Cleaning Data

# Drop duplicates
data.drop_duplicates(inplace=True)

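# Handle missing values in numeric columns
# (numeric_only=True skips text columns, which would otherwise raise an error)
data.fillna(data.mean(numeric_only=True), inplace=True)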

7.3 Aggregation

# Group by a column and calculate mean
grouped = data.groupby("Category")["Value"].mean()
print(grouped)
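
When more than one statistic per group is needed, agg can compute several at once. This is a minimal sketch on made-up data with the same Category and Value column names. The name summary is illustrative.

```python
import pandas as pd

# Hypothetical data with the column names used above
data = pd.DataFrame({
    "Category": ["A", "B", "A", "B"],
    "Value": [10, 20, 30, 40],
})

# One row per category, one column per aggregate
summary = data.groupby("Category")["Value"].agg(["mean", "count", "max"])
print(summary)
```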

7.4 Exporting Data

# Save the cleaned data
data.to_csv("cleaned_data.csv", index=False)

8. Visualization

Pandas integrates with Matplotlib for basic visualization:

import matplotlib.pyplot as plt

# Plot a histogram
df["Score"].hist()
plt.title("Score Distribution")
plt.xlabel("Score")
plt.ylabel("Frequency")
plt.show()

# Plot a bar chart
df.plot(x="Name", y="Score", kind="bar", title="Scores by Name")
plt.show()