Data Analysis and Visualization-Theory -R22A
22CSCxx
DATA ANALYSIS AND VISUALIZATION
Instruction 3 Hours per week
Duration of SEE 3 Hours
SEE 60 Marks
CIE 40 Marks
Credits 3
Course Outcomes: Upon completion of this course, students will be able to:
1. Create, manipulate, and analyze numerical data using NumPy arrays and associated functions.
2. Perform various preprocessing operations on datasets using Pandas Series and DataFrame objects.
3. Combine and manipulate complex datasets using a variety of Pandas techniques, including
concatenation, merging, grouping, aggregation, and time series analysis.
4. Apply inferential statistics to analyze data and draw valid conclusions about populations based on hypothesis
testing, confidence intervals, and correlation analysis.
5. Create and interpret different types of data visualizations using Matplotlib and Seaborn.
UNIT - I
Introduction to NumPy: Data types in Python - Fixed-type arrays, creating arrays, array indexing, array slicing,
reshaping arrays, array concatenation and splitting, Universal Functions, Aggregations, Broadcasting rules,
Comparisons, Boolean Arrays, Masks, Fancy Indexing, Fast Sorting using np.sort and np.argsort, partial sorting,
Creating Structured Arrays, Compound types and Record Arrays.
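Illustrative sketch for Unit I: a minimal example of broadcasting, Boolean masks, and np.argsort. The score values and variable names are made up for illustration and are not part of the prescribed material.

```python
import numpy as np

# Hypothetical sample data: 3 students x 2 tests
scores = np.array([[70, 85],
                   [55, 90],
                   [80, 60]])

# Broadcasting: the (2,) vector of column means is stretched across the (3, 2) array
col_means = scores.mean(axis=0)
centered = scores - col_means

# Boolean mask: keep rows where the first test score is at least 70
mask = scores[:, 0] >= 70
high_first = scores[mask]

# argsort: indices that would order students by total score, ascending
totals = scores.sum(axis=1)
order = np.argsort(totals)
```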
UNIT - II
Introduction to Pandas: Series Object, DataFrame Object, Data Indexing and Selecting for Series and
DataFrames, Universal Functions for Index Preservation, Index Alignment and Operations between Series and
DataFrames, Handling missing data, operating on Null values, Hierarchical Indexing.
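Illustrative sketch for Unit II: index alignment between two Series and basic handling of the resulting null values. The labels and values are hypothetical.

```python
import pandas as pd

a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0], index=["y", "z"])

# Arithmetic aligns on index labels; "x" has no partner in b, so the result is NaN there
total = a + b

# Count and fill the missing values
n_missing = int(total.isnull().sum())
total_filled = total.fillna(0.0)

# Alternative: treat the absent operand as 0 during the addition itself
added = a.add(b, fill_value=0)
```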
UNIT - III
Combining Datasets: Concat, Append, Merge and Joins, Aggregation and Grouping, Pivot Tables, Vectorized
String Operations, High-Performance functions - query() and eval().
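Illustrative sketch for Unit III: concatenating, merging, grouping, and querying small DataFrames. The table contents and column names are hypothetical. The sketch uses pd.concat for row-wise combination, since DataFrame.append was removed in pandas 2.0.

```python
import pandas as pd

students_a = pd.DataFrame({"id": [1, 2], "dept": ["CSE", "CSE"]})
students_b = pd.DataFrame({"id": [3], "dept": ["ECE"]})
marks = pd.DataFrame({"id": [1, 2, 3], "marks": [60, 80, 70]})

# Concatenate rows from two tables into one
students = pd.concat([students_a, students_b], ignore_index=True)

# Merge (inner join) on the shared key column
merged = pd.merge(students, marks, on="id")

# Group by department and aggregate
dept_mean = merged.groupby("dept")["marks"].mean()

# query(): filter rows with a string expression
above_65 = merged.query("marks > 65")
```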
UNIT - IV
Time Series: Date and Time Data Types and Tools, Time Series Basics, Date Ranges, Frequencies, and
Shifting, Time Zone Handling, Time Zone Localization and Conversion, Operations with Time Zone-Aware
Timestamp Objects, Operations Between Different Time Zones, Periods and Period Arithmetic, Resampling
and Frequency Conversion, Moving Window Functions.
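Illustrative sketch for Unit IV: a daily series that is shifted, resampled to a two-day frequency, smoothed with a moving window, and localized to a time zone. The dates and values are hypothetical.

```python
import pandas as pd

# Six consecutive daily observations
idx = pd.date_range("2024-01-01", periods=6, freq="D")
ts = pd.Series([1, 2, 3, 4, 5, 6], index=idx)

# Shifting: move values forward by one period (first value becomes NaN)
shifted = ts.shift(1)

# Resampling: aggregate into two-day bins
two_day = ts.resample("2D").sum()

# Moving window: 3-day rolling mean (first two entries are NaN)
rolling_mean = ts.rolling(window=3).mean()

# Time zone localization, then conversion
ts_utc = ts.tz_localize("UTC")
ts_kolkata = ts_utc.tz_convert("Asia/Kolkata")
```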
UNIT - V
Visualization with Matplotlib: Simple Line plots, Scatter plots, Visualizing errors, Density and Contour plots,
Histograms, Binnings, Multiple subplots, Three-dimensional plotting with Matplotlib, Geographic data with
Basemap, Visualization with Seaborn.
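Illustrative sketch for Unit V: a line plot and a histogram placed side by side as subplots. The data are synthetic, and the Agg backend is selected only so that the sketch runs without a display; remove that line for interactive use.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 50)
y = np.sin(x)

# Two subplots in one row
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

# Simple line plot with axis labels and a legend
ax1.plot(x, y, label="sin(x)")
ax1.set_xlabel("x")
ax1.set_ylabel("y")
ax1.legend()

# Histogram of the same values with 5 bins
counts, edges, _ = ax2.hist(y, bins=5)
ax2.set_title("Histogram of sin(x)")
```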
Text Books:
1. Jake VanderPlas, “Python Data Science Handbook”, O’Reilly Media, 2016.
2. Wes McKinney, “Python for Data Analysis: Data Wrangling with pandas, NumPy, and Jupyter”, 3rd
Edition, 2022.
Suggested Reading:
1. Samir Madhavan, “Mastering Python for Data Science”, Packt Publishing, 2015.
Online Resources:
1. https://numpy.org/doc/stable/user/index.html
2. https://pandas.pydata.org/
3. https://matplotlib.org/
4. https://seaborn.pydata.org/tutorial.html
5. https://www.coursera.org/learn/data-analysis-with-python