UNIT-4 (1)
1. How to reshape arrays in NumPy. What happens when you use the reshape() function with
a -1 parameter?
In NumPy, the reshape() function is used to change the shape (or dimensions) of an array without
changing its data.
Syntax:
numpy.reshape(a, newshape)
a.reshape(newshape)
Example:
import numpy as np
a = np.array([1, 2, 3, 4, 5, 6])
b = a.reshape(2, 3)  # 2 rows, 3 columns
print(b)
Output:
[[1 2 3]
[4 5 6]]
A -1 in reshape() is a placeholder that tells NumPy to infer that dimension from the array's total
size and the other dimensions given. Only one dimension may be -1, and the total size must divide
evenly; otherwise NumPy raises a ValueError.
📘 Example:
a = np.array([1, 2, 3, 4, 5, 6])
b = a.reshape(-1, 2)  # 2 columns given; NumPy infers 3 rows (6 / 2)
print(b)
Output:
[[1 2]
[3 4]
[5 6]]
Inclusive?    Excludes the end for slicing    Includes the end for slicing
3. Explain the process of installing Python libraries using different methods. Compare pip,
conda, and manual installation, highlighting their advantages and limitations.
Python supports multiple ways to install libraries. The three main methods are:
Method Description
🔹 3. Manual Installation
Downloading the source code or a .whl (wheel) or .tar.gz file and installing manually.
Steps:
# Download package
pip install /path/to/package.whl
4. Compare and contrast the primary data structures in Pandas: Series and DataFrame
Feature           Series                 DataFrame
Dimensionality    1D                     2D
Index type        Single axis (index)    Two axes (rows and columns)
Example:
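import pandas as pd
# Column names are illustrative; only the marks appear in the output
df = pd.DataFrame({'Math': [85, 90], 'Science': [78, 88], 'English': [92, 80]},
                  index=['Student1', 'Student2'])
df['Average'] = df.mean(axis=1)  # row-wise mean across the three subjects
print(df)
Output:
          Math  Science  English  Average
Student1    85       78       92     85.0
Student2    90       88       80     86.0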
5. Explain the process of manipulating array shapes in NumPy. Discuss transpose operations,
reshaping, stacking, and splitting arrays with appropriate examples.
NumPy provides powerful tools to change the shape or structure of arrays for mathematical and data
operations.
🔹 1. Reshaping Arrays
Purpose: Change the shape (dimensions) of an array without changing the data.
import numpy as np
a = np.arange(6) # [0, 1, 2, 3, 4, 5]
b = a.reshape(2, 3)  # 2 rows, 3 columns
print(b)
Output:
[[0 1 2]
[3 4 5]]
🔹 2. Transposing Arrays
Purpose: Flip rows and columns (useful in linear algebra and image processing).
a = np.array([[1, 2], [3, 4]])
print(a.T) # or np.transpose(a)
Output:
[[1 3]
[2 4]]
🔹 3. Stacking Arrays
Vertical Stack (vstack) – Stack arrays row-wise (like adding more rows):
b = np.array([[5, 6]])
print(np.vstack((a, b)))
Horizontal Stack (hstack) – Stack arrays column-wise (like adding more columns):
c = np.array([[7], [8]])
print(np.hstack((a, c)))
🔹 4. Splitting Arrays
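a = np.arange(12).reshape(2, 6)  # illustrative 2x6 array
print(np.hsplit(a, 3))  # three 2x2 blocks, split column-wise
print(np.vsplit(a, 2))  # two 1x6 blocks, split row-wise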
6. Explain the process of importing data from CSV files into Pandas DataFrames. Discuss
various parameters that can be used to handle different CSV formats, missing values, and
data types.
CSV (Comma-Separated Values) is a common format for datasets. Pandas makes it easy to import
them.
Basic import:
import pandas as pd
df = pd.read_csv("students.csv")
print(df.head())
Key parameters:
Parameter Purpose
Examples:
df = pd.read_csv("data.csv", sep=";")
df = pd.read_csv("data.csv", index_col="StudentID")
4. Read specific columns and skip rows:
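A minimal sketch of item 4, combined with the missing-value and data-type handling the question asks about. The CSV text, the column names, and the "missing" marker are made up for illustration; io.StringIO stands in for a real file path.

```python
import io
import pandas as pd

# Hypothetical CSV: one junk line on top, ';' as separator, "missing" marks a gap
csv_text = """exported by school system
StudentID;Name;Marks;City
1;Asha;85;Pune
2;Ravi;missing;Delhi
3;Meena;92;Chennai
"""

df = pd.read_csv(
    io.StringIO(csv_text),                   # a path such as "data.csv" works the same way
    sep=";",                                 # non-comma delimiter
    skiprows=1,                              # skip the junk line before the header
    usecols=["StudentID", "Name", "Marks"],  # read only these columns
    na_values=["missing"],                   # treat this marker as NaN
    dtype={"StudentID": "int32"},            # force a column's data type
)
print(df)
print(df.dtypes)
```

Because the Marks column contains a missing value, Pandas stores it as float64 rather than as an integer type.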
Array slicing in NumPy works similarly to Python list slicing, but it's more powerful because it
supports multi-dimensional arrays.
🔹 Syntax:
array[start:stop:step]
✅ 1D Example:
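import numpy as np
a = np.array([10, 20, 30, 40, 50])  # illustrative values
print(a[1:4])  # [20 30 40]: indices 1 to 3, the stop index is excluded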
2D Example:
b = np.array([
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]
])
print(b[:2, :2])
Output:
[[1 2]
[4 5]]
8. Write two methods to import data from a CSV file into a Pandas DataFrame
import pandas as pd
df = pd.read_csv("data.csv")
print(df.head())
This is useful when you want more control while reading the file manually.
import csv
import pandas as pd
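The second method is cut off after its imports. One way it could look, as a sketch that assumes the file has a header row: read the rows with the standard csv module, then hand them to pd.DataFrame. The sample text and column names are hypothetical.

```python
import csv
import io
import pandas as pd

# Hypothetical file contents; with a real file use open("data.csv", newline="")
csv_text = "Name,Marks\nAsha,85\nRavi,90\n"

with io.StringIO(csv_text) as f:
    reader = csv.reader(f)
    header = next(reader)   # first row holds the column names
    rows = list(reader)     # remaining rows, each a list of strings

df = pd.DataFrame(rows, columns=header)
df["Marks"] = df["Marks"].astype(int)  # csv.reader yields strings, so convert explicitly
print(df.head())
```

Unlike pd.read_csv(), this route does no type inference, which is why the numeric column is converted by hand.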
9. Identify the Python library commonly used for solving differential equations numerically
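SciPy is the library commonly used for this; its scipy.integrate module provides solve_ivp() for
initial value problems.
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import solve_ivp

# dy/dt = -2y with y(0) = 1
def dydt(t, y):
    return -2*y

y0 = [1]
t_span = (0, 5)
solution = solve_ivp(dydt, t_span, y0, t_eval=np.linspace(0, 5, 100))
plt.plot(solution.t, solution.y[0])
plt.xlabel("Time")
plt.ylabel("y(t)")
plt.grid()
plt.show()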
Function: plot()
🔹 Basic Example:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 8, 7]
plt.plot(x, y)  # draw a line through the (x, y) points
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.grid(True)
plt.show()
Function     Purpose
hist()       Histograms
imshow()     Display images or 2D arrays
subplot()    Multiple plot layout
11. Describe the various ways to create NumPy arrays from Python lists, ranges, and using
built-in functions. Explain the significance of the dtype parameter when creating arrays.
NumPy arrays can be created in multiple ways, primarily from Python sequences, iterables, and built-
in NumPy functions. Here’s an explanation of each method, along with the role of the dtype
parameter:
A. Creating Arrays from Python Lists
You can convert Python lists (or nested lists) into NumPy arrays using numpy.array().
Example:
import numpy as np
list1 = [1, 2, 3, 4]
arr1 = np.array(list1)
print(arr1)
Output:
[1 2 3 4]
list2 = [[1, 2], [3, 4]]  # nested list gives a 2D array
arr2 = np.array(list2)
print(arr2)
# Output:
[[1 2]
[3 4]]
B. Creating Arrays from Ranges
You can use Python’s range() function in combination with np.array(), or use NumPy’s own arange()
function which is more flexible.
arr3 = np.array(range(0, 10, 2))
print(arr3) # Output: [0 2 4 6 8]
arr4 = np.arange(0, 10, 2)
print(arr4) # Output: [0 2 4 6 8]
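The question also asks about built-in creation functions. A short sketch of some commonly used ones follows; the shapes and fill values are chosen only for illustration.

```python
import numpy as np

zeros = np.zeros((2, 3))       # 2x3 array of 0.0
ones = np.ones(4)              # 1D array of four 1.0 values
filled = np.full((2, 2), 7)    # 2x2 array where every element is 7
identity = np.eye(3)           # 3x3 identity matrix
points = np.linspace(0, 1, 5)  # 5 evenly spaced values from 0 to 1, both ends included

print(zeros.shape)  # (2, 3)
print(points)       # [0.   0.25 0.5  0.75 1.  ]
```

Note that np.linspace() includes the stop value by default, whereas np.arange() excludes it.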
The dtype parameter defines the data type of the elements in the array. This is important for:
Memory efficiency: Specifying float32 instead of the default float64 can save memory.
Precision control: Choose int32, float64, or complex128 depending on the required precision.
Example:
Output:
Supported dtypes include: int32, int64, float32, float64, bool, complex, str, object, etc.
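A small sketch of both points about dtype: the memory saving from float32, and the loss of information when converting to a narrower type. The values are illustrative.

```python
import numpy as np

floats32 = np.array([1, 2, 3], dtype=np.float32)
floats64 = np.array([1, 2, 3], dtype=np.float64)

print(floats32.dtype, floats32.nbytes)  # float32 12 (3 elements x 4 bytes)
print(floats64.dtype, floats64.nbytes)  # float64 24 (3 elements x 8 bytes)

# Converting floats to an integer dtype drops the fractional part
truncated = np.array([1.7, 2.2, 3.9]).astype(np.int32)
print(truncated)  # [1 2 3]
```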
12. Explain the attributes and properties of Pandas Series with examples.
A Pandas Series is a one-dimensional labeled array capable of holding any data type (integers,
strings, floats, Python objects, etc.).
Creating a Series
import pandas as pd
s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])
Attribute    Description                           Example
s.values     Returns the underlying NumPy array    array([10, 20, 30, 40])
s.index      Returns the index labels              Index(['a', 'b', 'c', 'd'], dtype='object')
s.name       Name of the Series (optional)         Can be set via s.name = 'my_series'
Examples:
print(s.dtype) # int64
print(s.shape) # (4,)
Accessing Elements
By label: s['b'] → 20
By position: s.iloc[1] → 20 (plain s[1] is deprecated when the index holds labels)
Vectorized Operations
Summary Statistics
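The last two headings have no examples here. A minimal sketch of both, assuming the same four-element Series described in the attribute table:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

# Vectorized operations apply to every element without an explicit loop
doubled = s * 2          # 20, 40, 60, 80
shifted = s + 5          # 15, 25, 35, 45
above_20 = s[s > 20]     # keeps 'c' and 'd'

# Summary statistics
total = s.sum()          # 100
average = s.mean()       # 25.0
largest = s.max()        # 40
print(s.describe())      # count, mean, std, min, quartiles, max
```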
13. Explain how NumPy and Pandas can be used together in a data analysis workflow.
NumPy and Pandas are two core libraries in Python's data analysis stack. They complement each
other in various ways. NumPy provides fast, efficient numerical computations, while Pandas builds on
NumPy by offering powerful, user-friendly data structures like Series and DataFrames.
import pandas as pd
df = pd.read_csv('data.csv')
Use Pandas for handling missing values, renaming columns, converting types, etc.
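import numpy as np
values = df['column1'].to_numpy()
normalized = (values - values.min()) / (values.max() - values.min())  # min-max scaling (assumed method)
df['normalized'] = normalized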
df['log_sales'] = np.log(df['sales'] + 1)
mean = np.mean(df['sales'])
summary = df.describe()
Libraries like Matplotlib and Seaborn can plot Pandas Series/DataFrames directly.
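The fragments above can be tied together in one small sketch. The DataFrame below is hypothetical data standing in for data.csv, and min-max scaling is an assumed choice of normalization.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for pd.read_csv('data.csv')
df = pd.DataFrame({"sales": [100.0, 250.0, np.nan, 400.0]})

# Pandas: clean the data (fill the missing value with the column mean)
df["sales"] = df["sales"].fillna(df["sales"].mean())

# NumPy: fast numerical work on the underlying array
values = df["sales"].to_numpy()
normalized = (values - values.min()) / (values.max() - values.min())

# Back to Pandas: store results as new labeled columns
df["normalized"] = normalized
df["log_sales"] = np.log1p(df["sales"])  # same as np.log(df["sales"] + 1)
print(df.describe())
```

NumPy functions such as np.log1p() accept a Series directly and return a Series, so explicit conversion with to_numpy() is only needed when a plain array is required.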
14. Demonstrate mathematical operations and statistical functions that can be performed on
Series objects. How do NaN values affect these operations?
Pandas Series supports vectorized operations and built-in statistical functions, making it ideal for
performing computations on data columns.
import pandas as pd
Function Description
s.var() Variance
print(s.describe())
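NaN (Not a Number) marks a missing value. Most statistical methods (sum(), mean(), std(), and so on)
skip NaN by default (skipna=True), while element-wise arithmetic propagates it: any result computed
from a NaN is itself NaN.
s = pd.Series([10, 20, None, 40])  # illustrative values; None is stored as NaN
print(s.sum()) # Output: 70.0 (the NaN is skipped)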
Function Use
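A short sketch of how NaN affects the common operations and of the usual ways to handle it. The Series values are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, 20, np.nan, 40])

print(s.sum())               # 70.0, NaN is skipped
print(s.mean())              # about 23.33, averaged over the 3 valid values
print(s.count())             # 3, counts only non-NaN entries
print(s.sum(skipna=False))   # nan, NaN is no longer skipped
print((s + 1).isna().sum())  # 1, arithmetic keeps the NaN in place

filled = s.fillna(0)         # replace NaN with a chosen value
dropped = s.dropna()         # or remove the NaN entries
```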
15. How to concatenate NumPy arrays both horizontally and vertically. What happens when
the arrays have different shapes?
Concatenation in NumPy refers to joining multiple arrays along an axis. NumPy provides several
functions for concatenation, such as np.concatenate(), np.vstack(), and np.hstack().
Method 1: np.concatenate()
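import numpy as np
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
# Horizontal concatenation
result = np.concatenate((a, b), axis=1)
print(result)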
Output:
[[1 2 5 6]
[3 4 7 8]]
Method 2: np.hstack()
Method 1: np.concatenate()
result = np.concatenate((a, b), axis=0)  # vertical: a = [[1, 2], [3, 4]], b = [[5, 6], [7, 8]]
print(result)
Output:
[[1 2]
[3 4]
[5 6]
[7 8]]
Method 2: np.vstack()
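Neither "Method 2" heading comes with code. A minimal sketch, assuming the same two 2x2 arrays as in the np.concatenate() examples:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

h = np.hstack((a, b))  # same as np.concatenate((a, b), axis=1) for 2D arrays
v = np.vstack((a, b))  # same as np.concatenate((a, b), axis=0)

print(h)  # 2 rows, 4 columns
print(v)  # 4 rows, 2 columns
```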
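Arrays must match in every dimension except the one being concatenated along; otherwise NumPy raises
a ValueError. The exact wording of the message varies by NumPy version.
Error (horizontal, axis=1, with different numbers of rows):
ValueError: all the input array dimensions except for the concatenation axis must match exactly
Error (vertical, axis=0, with different numbers of columns):
ValueError: all the input array dimensions except for the concatenation axis must match exactly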
To resolve mismatches:
a = a.reshape((3, 1))  # e.g. turn a 3-element array into a 3x1 column
result = np.hstack((a, b))  # assumes b has 3 rows
print(result)
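A sketch of the mismatch and its fix as a complete script. The arrays are hypothetical: a starts as a 1x3 row, b is a 3x2 block, so their row counts differ until a is reshaped.

```python
import numpy as np

a = np.array([[1, 2, 3]])               # shape (1, 3)
b = np.array([[4, 5], [6, 7], [8, 9]])  # shape (3, 2)

failed = False
try:
    np.concatenate((a, b), axis=1)      # 1 row vs 3 rows: not allowed
except ValueError as err:
    failed = True
    print("ValueError:", err)

a_col = a.reshape((3, 1))               # now 3 rows, 1 column
result = np.concatenate((a_col, b), axis=1)
print(result)                           # shape (3, 3)
```

Reshaping only works when the element count allows it; otherwise the arrays have to be padded or trimmed before joining.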
A Pandas DataFrame is a 2-dimensional labeled data structure with columns of potentially different
types — like an Excel spreadsheet or SQL table.
import pandas as pd
data = {
df = pd.DataFrame(data)
data = {
df = pd.DataFrame(data)
data = [
df = pd.DataFrame(data)
import numpy as np
CSV: pd.read_csv('file.csv')
Excel: pd.read_excel('file.xlsx')
df.dtypes    Returns the data types of each column    Name: object, Age: int64, ...
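The construction examples in this answer are truncated. As a sketch, three common ways to build a DataFrame are shown below. The column names Name and Age follow the dtypes row; the values and the x/y column names are made up.

```python
import numpy as np
import pandas as pd

# From a dictionary of lists: keys become column names
df_from_dict = pd.DataFrame({"Name": ["Asha", "Ravi"], "Age": [21, 23]})

# From a list of dictionaries: each dictionary becomes one row
df_from_records = pd.DataFrame([
    {"Name": "Asha", "Age": 21},
    {"Name": "Ravi", "Age": 23},
])

# From a NumPy array: column names must be supplied
df_from_array = pd.DataFrame(np.arange(6).reshape(3, 2), columns=["x", "y"])

print(df_from_dict.shape)             # (2, 2)
print(df_from_dict.columns.tolist())  # ['Name', 'Age']
```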
17. Discuss indexing, reindexing, and aligning Pandas Series objects, with examples
A. Indexing in Series
Series objects are like dictionaries; they map labels (indices) to data (values).
1. Positional Indexing
print(s.iloc[0]) # 10
2. Label-based Indexing
print(s['b']) # 20
3. Slicing
print(s[1:]) # Uses position
4. Boolean Indexing
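The four indexing styles can be sketched on one Series. The values are assumed to match the outputs quoted above (10 at position 0, 20 at label 'b').

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

first = s.iloc[0]             # positional: 10
by_label = s.loc['b']         # label-based: 20
pos_slice = s.iloc[1:3]       # positions 1 and 2; the end position is excluded
label_slice = s.loc['b':'c']  # labels 'b' and 'c'; the end label is included
big = s[s > 20]               # boolean mask keeps 'c' and 'd'

print(pos_slice)
print(big)
```

Using .iloc and .loc explicitly avoids ambiguity between positions and labels, which matters most when the index itself contains integers.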
B. Reindexing Series
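Reindexing conforms a Series to a new set of labels: labels that are not in the original get NaN, and
original labels that are not requested are dropped.
Example:
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])  # value for 'c' assumed
s2 = s.reindex(['a', 'b', 'd'])
print(s2)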
Output:
a 10.0
b 20.0
d NaN
✅ C. Aligning Series
Alignment happens automatically during arithmetic operations between Series with different
indexes.
Example:
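s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])
print(s1 + s2)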
Output:
a NaN
b 12.0
c 23.0
d NaN
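To treat a label missing from one Series as 0 instead of producing NaN, use add() with fill_value:
print(s1.add(s2, fill_value=0))
Output: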
a 1.0
b 12.0
c 23.0
d 30.0
18. Compare the performance of NumPy's statistical operations with equivalent operations in
pure Python. Include examples of calculating mean, median, standard deviation,
correlation, and other statistical measures.
NumPy offers fast, vectorized operations using compiled C code, while pure Python uses interpreted
loops, which are slower and more verbose. Let’s compare both in terms of performance, simplicity,
and readability.
✅ A. Setup
import numpy as np
import statistics
import time
data = list(range(1_000_000))  # illustrative dataset (size assumed)
np_data = np.array(data)
✅ B. Mean Calculation
Pure Python:
start = time.time()
mean_py = sum(data) / len(data)
end = time.time()
NumPy:
start = time.time()
mean_np = np.mean(np_data)
end = time.time()
C. Median Calculation
Pure Python:
median_py = statistics.median(data)
NumPy:
median_np = np.median(np_data)
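🔸 NumPy's median uses a partition-based selection (similar to quickselect) instead of fully sorting the data, which is typically much faster than the sort-based statistics.median() on large inputs.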
D. Standard Deviation
Pure Python:
NumPy:
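Both standard deviation snippets are missing. A minimal sketch on a small illustrative dataset, which also shows a common pitfall: the two libraries use different defaults.

```python
import statistics
import numpy as np

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values with mean 5
np_data = np.array(data)

# Population standard deviation (divide by n)
std_py = statistics.pstdev(data)     # 2.0
std_np = np.std(np_data)             # 2.0, np.std defaults to ddof=0

# Sample standard deviation (divide by n - 1)
sample_py = statistics.stdev(data)
sample_np = np.std(np_data, ddof=1)

print(std_py, std_np)
print(sample_py, sample_np)
```

statistics.stdev() computes the sample value while np.std() computes the population value by default, so pass ddof=1 to NumPy when comparing the two.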
E. Correlation Coefficient
Pure Python:
# Manually compute the Pearson correlation coefficient
mean_x = sum(x)/len(x)
mean_y = sum(y)/len(y)
NumPy:
💡 NumPy is typically much faster on large arrays and far more concise; np.corrcoef() returns the whole correlation matrix in one call.
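The manual computation is cut off after the two means and the NumPy version is missing. A sketch of the full Pearson calculation next to np.corrcoef(), using made-up x and y values:

```python
import math
import numpy as np

x = [1, 2, 3, 4, 5]  # illustrative data
y = [2, 4, 5, 4, 5]

# Pure Python: r = sum((x - mean_x)(y - mean_y)) / sqrt(sum((x - mean_x)^2) * sum((y - mean_y)^2))
mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)
cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
var_x = sum((a - mean_x) ** 2 for a in x)
var_y = sum((b - mean_y) ** 2 for b in y)
r_py = cov / math.sqrt(var_x * var_y)

# NumPy: corrcoef returns a 2x2 matrix; the off-diagonal entry is r
r_np = np.corrcoef(x, y)[0, 1]

print(r_py, r_np)  # both about 0.7746
```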
F. Performance Summary
Operation    Pure Python       NumPy
Median       Slower (sorts)    Fast (Quickselect)
G. Conclusion
Use NumPy for any statistical work on large or even moderately sized datasets.
19. Explain how to select, filter, and manipulate data using different indexing methods (loc,
iloc, at, iat) used in DataFrame creation in Pandas.
df = pd.DataFrame({
Access a single value using row and column labels (faster than loc[]).
✅ E. Comparison Summary
loc     Labels       ✅ Yes    Slow      Named rows/columns
iloc    Positions    ✅ Yes    Medium    Index-based access
at      Labels       ❌ No     ✅✅      Fast access to one value (label)
iat     Positions    ❌ No     ✅✅      Fast access to one value (pos)
✅ G. Updating Data
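A compact sketch of selecting, filtering, and updating with the four accessors. The DataFrame, its column names, and its row labels are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Asha", "Ravi", "Meena"], "Marks": [85, 90, 78]},
    index=["s1", "s2", "s3"],
)

# Selecting
row = df.loc["s2"]                       # one row by label
cell = df.iloc[0, 1]                     # row 0, column 1 by position: 85
high = df.loc[df["Marks"] > 80, "Name"]  # filter rows, then pick a column

# Updating
df.at["s3", "Marks"] = 88                # single value by label
df.iat[0, 1] = 95                        # single value by position
df.loc[df["Marks"] < 90, "Marks"] += 1   # conditional update of several rows

print(df)
```

at and iat accept only a single row/column pair, so loc or iloc is still needed for slices, lists of labels, and boolean masks.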