FIN_FODS
INSTITUTION
VISION AND MISSION
An ISO 9001:2015 Certified
Institution,
Approved by
AICTE, New Delhi
& Affiliated to
Anna University,Chennai
MISSION
Academic Excellence
Industry Readiness
Industry Collaboration
Quality Accreditation
Innovation Ecosystem
DEPARTMENT
VISION AND MISSION
VISION
“Empower minds through technology, transforming possibilities
into realities” embodies our commitment to leveraging the power of technology
to unlock the potential of individuals and communities. We believe in empowering
minds by providing access to cutting-edge tools, knowledge, and resources that
enable individuals to innovate, create, and achieve their goals.
MISSION
Unleashing potential: Our educational approach is designed to uncover
and unleash this potential, providing students with the tools, resources,
and opportunities needed to excel and succeed.
Inspiring innovation: By fostering a culture of innovation, we empower
students to become agents of change and develop groundbreaking
solutions to real-world problems.
Driving excellence in education: Through rigorous academic standards,
personalized support, and a focus on continuous improvement, we strive
to equip our students with the knowledge, skills, and values needed to
excel in an ever-changing world.
PROGRAM EDUCATIONAL
OBJECTIVES (PEOs),
PROGRAM SPECIFIC
OUTCOMES (PSOs) AND
PROGRAM OUTCOMES (POs)
CLASS
TIMETABLE
INDIVIDUAL
TIMETABLE
COURSE
OUTCOMES, COURSE
OBJECTIVES AND
SYLLABUS
COURSE OBJECTIVES:
To understand the data science fundamentals and process.
To learn to describe the data for the data science process.
To learn to describe the relationship between data.
To utilize the Python libraries for Data Wrangling.
To present and interpret data using visualization libraries in Python.
SYLLABUS:
UNIT I INTRODUCTION 9
Data Science: Benefits and uses – facets of data – Data Science Process:
Overview – Defining research goals – Retrieving data – Data preparation –
Exploratory Data analysis – build the model – presenting findings and building
applications – Data Mining – Data Warehousing – Basic Statistical descriptions
of Data
UNIT II DESCRIBING DATA 9
Types of Data – Types of Variables – Describing Data with Tables and Graphs –
Describing Data with Averages – Describing Variability – Normal Distributions
and Standard (z) Scores
UNIT III DESCRIBING RELATIONSHIPS 9
Correlation – Scatter plots – correlation coefficient for quantitative data –
computational formula for correlation coefficient – Regression – regression line
– least squares regression line – Standard error of estimate – interpretation of r²
– multiple regression equations – regression towards the mean
UNIT IV PYTHON LIBRARIES FOR DATA WRANGLING 9
Basics of NumPy arrays – aggregations – computations on arrays – comparisons,
masks, boolean logic – fancy indexing – structured arrays – Data manipulation
with Pandas – data indexing and selection – operating on data – missing data –
Hierarchical indexing – combining datasets – aggregation and grouping – pivot
tables
UNIT V DATA VISUALIZATION 9
Importing Matplotlib – Line plots – Scatter plots – visualizing errors – density
and contour plots – Histograms – legends – colors – subplots – text and
annotation – customization – three dimensional plotting – Geographic Data with
Basemap – Visualization with Seaborn
COURSE OUTCOMES
At the end of this course, the students will be able to:
CO1: Define the data science process
CO2: Understand different types of data description for data science process
CO3: Gain knowledge on relationships between data
CO4: Use the Python Libraries for Data Wrangling
CO5: Apply visualization Libraries in Python to interpret and explore data
TOTAL: 45 PERIODS
TEXT BOOKS:
1. David Cielen, Arno D. B. Meysman, and Mohamed Ali,
“Introducing Data Science”, Manning Publications, 2016. (Unit I)
2. Robert S. Witte and John S. Witte, “Statistics”, Eleventh Edition,
Wiley Publications, 2017. (Units II and III)
3. Jake VanderPlas, “Python Data Science Handbook”, O’Reilly, 2016.
(Units IV and V)
REFERENCES:
1. Allen B. Downey, “Think Stats: Exploratory Data Analysis in Python”,
Green Tea Press, 2014.
CO – PO MAPPING
STUDENT
NAME LIST
Reg No Name
513523104047 NADEESH A P
513523104048 NASREEN BANU M
513523104049 NAVINKUMARAN P
513523104050 NAVYA D
513523104051 NIRANJAN G
513523104052 NIWIN RAJ J
513523104053 POOMA R
513523104054 POORANI A
513523104055 PRATHAP T
513523104056 PRATHIPA K
513523104057 PREETHA ZEN S
513523104058 PRIYADHARSHINI A
513523104059 PRIYADHARSHINI R
513523104060 PRIYANKA S
513523104061 RAGHUL RAJ S
513523104062 RAKSHITHA S
513523104063 RASIGAPRIYA K
513523104064 ROOPAN K
513523104065 RUTHIRAN C
513523104067 SAKTHIVEL K
513523104068 SALEEM AHMED A
513523104069 SANGEETHA P
513523104070 SANJEEVAN SURYA KUMAR
513523104071 SATHISH KUMAR D
513523104072 SATHYA BAMA S P
513523104073 SHALINI V
513523104074 SHARANYA M
513523104075 SHREE DEVI V G
513523104076 SIVA KUMAR T
513523104077 SONIYA S
513523104078 SUBASHINI T
513523104079 SUGAVANAN A
513523104080 SUJAN U
513523104081 SULTHAN BABU E A
513523104082 TAMILSELVAN P
513523104083 THARUN KUMAR M
513523104084 THULASI DEVI S
513523104085 VARUN MOHAN M D
513523104086 VENKATESAN S
513523104087 VIDHYA J
513523104088 VIGNESHWARAN B
513523104089 YUVASHRI E
513523104090 YUVASHRI P
513523104301 DEVISRI S
DEPARTMENT
ACADEMIC CALENDAR
LESSON PLAN
FIRST SERIES
TIMETABLE
QUESTION PAPER
ANSWER KEY
PART-A
9. What is a z-score?
A z-score indicates how many standard deviations a data point is away from the
mean of a dataset. It is used to determine if a value is unusual in a given
distribution.
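As a rough illustration of the definition above, the sketch below computes a z-score with Python's standard `statistics` module. The `z_score` helper and the sample scores are assumptions made only for this example, and the population standard deviation is used; a sample standard deviation would give a slightly different value.

```python
import statistics


def z_score(value, data):
    """Return how many population standard deviations `value` lies from the mean of `data`."""
    mean = statistics.fmean(data)
    std_dev = statistics.pstdev(data)  # population standard deviation
    return (value - mean) / std_dev


# Hypothetical exam scores, used only for illustration
scores = [50, 60, 70, 80, 90]
z = z_score(90, scores)
print(round(z, 3))  # 1.414
```

A value whose z-score is far from 0 (a cut-off of about 2 or 3 in absolute value is a common convention) is usually the kind of value described as unusual.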
PART-B
The Data Science process is a systematic approach that involves several key
stages, each crucial for extracting meaningful insights from data. These stages
include:
1. Defining Research Goals:
The first stage is to clearly define the problem or the business question.
Clear objectives help guide the entire analysis and ensure that the right
questions are being addressed.
2. Retrieving Data:
In this stage, relevant data is collected from various sources, which may
include databases, APIs, spreadsheets, or external datasets. This data can
be structured (e.g., tables) or unstructured (e.g., text, images).
3. Data Preparation:
This stage involves cleaning the data to ensure that it is accurate,
consistent, and ready for analysis. It includes handling missing values,
correcting errors, transforming variables, and filtering irrelevant data.
4. Exploratory Data Analysis (EDA):
EDA involves the initial analysis of the data to summarize its main
characteristics. This stage typically uses visualizations and basic
statistical methods to understand the distribution, identify patterns, and
detect outliers or anomalies.
5. Building the Model:
After understanding the data, appropriate machine learning or statistical
models are selected. These models are then trained using the data to
predict outcomes or discover patterns. Common algorithms include
regression, classification, and clustering.
6. Presenting Findings and Building Applications:
Once the models are built, the findings are presented to stakeholders
using visualizations, reports, and dashboards. If necessary, the models are
deployed into applications or integrated into decision-making processes
to improve business operations.
Without proper data preparation, the accuracy and reliability of the models
would be compromised, leading to suboptimal results and misguided decisions.
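The data preparation stage described above can be sketched with Pandas. This is a minimal example under stated assumptions: the `raw` table, its column names, and the choice of the median for filling missing ages are all hypothetical.

```python
import pandas as pd

# Hypothetical raw records: one duplicate row, missing ages, inconsistent city labels
raw = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 4],
    "age": [34.0, None, None, 29.0, 41.0],
    "city": ["Chennai", "chennai ", "chennai ", "Vellore", "Chennai"],
})

clean = (
    raw.drop_duplicates()  # remove exact duplicate rows
       .assign(city=lambda d: d["city"].str.strip().str.title())  # normalise labels
       .reset_index(drop=True)
)

# Fill the remaining missing age with the median of the observed ages
clean["age"] = clean["age"].fillna(clean["age"].median())
print(clean)
```

Whether to fill, drop, or flag missing values depends on the dataset; the median is used here only because it is less sensitive to outliers than the mean.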
13. What are the basic statistical descriptions of data? Explain the key
measures.
These statistical descriptions help summarize the data, providing insights into
its central tendency, variability, and overall distribution, making it easier to
understand and model.
Data Mining plays a crucial role in Data Science by applying algorithms and
techniques to explore and analyze large datasets for hidden patterns,
correlations, and trends. The role of Data Mining can be described as:
1. Pattern Recognition:
Data mining helps identify patterns or relationships that are not
immediately obvious. For example, it can identify customer behavior
trends in e-commerce or detect fraud patterns in financial transactions.
2. Predictive Modeling:
Using historical data, data mining can build predictive models. These
models forecast future events based on existing data, such as predicting
customer churn or sales trends.
3. Classification and Clustering:
By applying data mining techniques, Data Science can extract valuable insights
from large datasets, improving the decision-making process and driving
business growth.
15. Explain the different types of data and their importance in data
analysis.
Understanding the type of data helps Data Scientists choose the correct
statistical techniques and ensures accurate and meaningful analysis. Each type
has specific tools and methods suited for its analysis, guiding the choice of
algorithms and models.
PART-C
Averages represent the central point around which data points are distributed.
The primary measures of central tendency are:
Mean: The mean is the arithmetic average of a dataset. It is calculated by
summing all data points and dividing by the total number of points. The
mean provides an overall idea of the dataset's central value but can be
affected by extreme values (outliers).
$$\mu = \frac{\sum x_i}{n}$$
where $x_i$ represents each data point and $n$ is the total number of
data points.
Median: The median is the middle value of a dataset when arranged in
ascending or descending order. The median is less sensitive to outliers
than the mean and is particularly useful in skewed distributions.
Mode: The mode represents the most frequently occurring value in the
dataset. A dataset may have one mode (unimodal), multiple modes
(multimodal), or no mode at all if all values are unique.
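The three measures of central tendency above can be compared on a small dataset. The sketch below uses Python's standard `statistics` module; the `marks` list is made up, with one extreme value included to show how the mean reacts to an outlier while the median does not.

```python
import statistics

# Hypothetical marks with one extreme value (95)
marks = [10, 12, 12, 13, 15, 95]

mean = statistics.fmean(marks)       # pulled upward by the outlier
median = statistics.median(marks)    # average of the two middle values
modes = statistics.multimode(marks)  # list of all most frequent values

print(round(mean, 2), median, modes)  # 26.17 12.5 [12]
```

Here the mean (about 26.17) is larger than five of the six values, whereas the median (12.5) and the mode (12) stay close to the bulk of the data.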
2. Variability (Measures of Spread)
While averages provide information about the central location of the data,
variability describes the extent to which the data points spread out from the
center. The key measures of variability include:
Range: The range is the difference between the highest and lowest values
in the dataset.
$$\text{Range} = \text{Maximum} - \text{Minimum}$$
Although easy to calculate, the range is highly sensitive to outliers and
may not be a reliable measure of spread in skewed datasets.
MARK
STATEMENT
&
RESULT ANALYSIS
MARK STATEMENT
Reg No Name Marks
513523104047 NADEESH A P 71
513523104048 NASREEN BANU M 0
513523104049 NAVINKUMARAN P 13
513523104050 NAVYA D 70
513523104051 NIRANJAN G 30
513523104052 NIWIN RAJ J 0
513523104053 POOMA R 75
513523104054 POORANI A 79
513523104055 PRATHAP T 83
513523104056 PRATHIPA K 92
513523104057 PREETHA ZEN S 76
513523104058 PRIYADHARSHINI A 71
513523104059 PRIYADHARSHINI R 81
513523104060 PRIYANKA S 68
513523104061 RAGHUL RAJ S 0
513523104062 RAKSHITHA S 62
513523104063 RASIGAPRIYA K 93
513523104064 ROOPAN K 0
513523104065 RUTHIRAN C 90
513523104066 SAARIYA KOWNEN K 94
513523104067 SAKTHIVEL K 83
513523104068 SALEEM AHMED A 90
513523104069 SANGEETHA P 62
513523104070 SANJEEVAN SURYA KUMAR 60
513523104071 SATHISH KUMAR D 13
513523104072 SATHYA BAMA S P 69
513523104073 SHALINI V 53
513523104074 SHARANYA M 86
513523104075 SHREE DEVI V G 66
513523104076 SIVA KUMAR T 38
513523104077 SONIYA S 85
513523104078 SUBASHINI T 80
513523104079 SUGAVANAN A 50
513523104080 SUJAN U 64
513523104081 SULTHAN BABU E A 0
513523104082 TAMILSELVAN P 58
513523104083 THARUN KUMAR M 59
513523104084 THULASI DEVI S 70
513523104085 VARUN MOHAN M D 41
513523104086 VENKATESAN S 2
513523104087 VIDHYA J 68
513523104088 VIGNESHWARAN B 30
513523104089 YUVASHRI E 72
513523104090 YUVASHRI P 64
513523104301 DEVISRI S 77
513523104302 DHANA SHREE S 71
513523104303 GOKUL M 71
RESULT ANALYSIS
NO. OF STUDENTS 47
PASS % 83%
FACULTY SIGN
HOD
SECOND SERIES
TIMETABLE
QUESTION PAPER
ANSWER KEY
PART-A
1. What is Correlation?
Correlation is a statistical measure that describes the strength and direction of
the relationship between two variables.
2. What is a Scatter Plot?
A scatter plot is a graphical representation of two variables, showing the
relationship between them using points on a Cartesian plane.
3. What is the correlation coefficient?
The correlation coefficient is a numerical measure that indicates the degree of
linear relationship between two variables, ranging from -1 to +1.
4. What is the computational formula for the correlation coefficient?
The formula for the correlation coefficient $r$ is:
$$r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n \sum x^2 - (\sum x)^2][n \sum y^2 - (\sum y)^2]}}$$
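The computational formula can be translated almost term by term into code. The following is a minimal sketch in plain Python; the `pearson_r` helper and the `hours`/`marks` data are assumptions introduced for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient via the computational (raw-sums) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical data: hours studied and marks obtained
hours = [1, 2, 3, 4, 5]
marks = [2, 4, 5, 4, 5]
r = pearson_r(hours, marks)
print(round(r, 4))  # 0.7746
```

For this data the sums are n = 5, Σx = 15, Σy = 20, Σxy = 66, Σx² = 55, and Σy² = 86, so r = 30 / √1500 ≈ 0.7746, a fairly strong positive linear relationship.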
5. What is Regression in statistics?
Regression is a statistical technique used to model the relationship between a
dependent variable and one or more independent variables.
6. What is the least squares regression line?
The least squares regression line minimizes the sum of the squared differences
between the observed values and the predicted values from the regression line.
7. What is the standard error of estimate?
The standard error of estimate is a measure of the accuracy of predictions made
by the regression line, representing the typical distance between the observed
values and the regression line.
8. What does $r^2$ represent in regression analysis?
The $r^2$ (coefficient of determination) represents the proportion of the
variance in the dependent variable that is predictable from the independent
variable(s).
9. What is multiple regression?
Multiple regression is a statistical method used to model the relationship
between a dependent variable and two or more independent variables.
10. What is regression towards the mean?
Regression towards the mean refers to the phenomenon where extreme values
in a dataset tend to move closer to the average in subsequent measurements.
PART-B
1. Explain the concept of correlation and the role of the correlation coefficient in
analyzing data.
The correlation coefficient is a numerical value that quantifies the degree and
direction of the relationship between two variables. The most common correlation
coefficient is the Pearson correlation coefficient, denoted by $r$. It can take values
between -1 and +1:
$r = 1$: Perfect positive correlation (as one variable increases, the other
increases in exact proportion).
$r = -1$: Perfect negative correlation (as one variable increases, the
other decreases in exact proportion).
$r = 0$: No linear correlation (no predictable linear relationship).
Between 0 and ±1: A value closer to 1 or -1 indicates a stronger linear
relationship between the two variables.
The role of the correlation coefficient in data analysis is crucial for identifying
relationships between variables, which supports prediction, modeling, and hypothesis
testing. A high positive or negative correlation can point to relationships worth
investigating, but correlation alone does not establish causation. A correlation near 0
suggests little or no linear relationship.
2. Describe how regression analysis works and the importance of the least
squares regression line in predicting outcomes.
where:
The least squares regression line is the line that minimizes the sum of the squared
differences (errors) between the observed data points and the predicted values from
the regression model. Because the errors are squared, large deviations are penalized
more heavily than small ones, and the resulting line is the best-fitting straight line
in the least squares sense.
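To make the least squares idea concrete, the sketch below fits a line to a small made-up dataset using the usual closed-form slope and intercept. The helper names, the data, and the use of n - 2 in the standard error of estimate (one common convention) are assumptions for this example.

```python
import math


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the line minimising the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_errors(xs, ys, slope, intercept):
    """Sum of squared differences between observed and predicted values."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: hours studied and marks obtained
hours = [1, 2, 3, 4, 5]
marks = [2, 4, 5, 4, 5]

slope, intercept = least_squares_line(hours, marks)
sse = sum_squared_errors(hours, marks, slope, intercept)
std_error = math.sqrt(sse / (len(hours) - 2))  # standard error of estimate

print(round(slope, 2), round(intercept, 2))  # 0.6 2.2
```

Any other line gives a larger sum of squared errors on the same data; for instance, keeping the intercept at 2.2 but changing the slope to 0.7 raises it from 2.4 to 2.95.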
The importance of the least squares regression line lies in its ability to predict
outcomes based on the data. By understanding the relationship between variables,
businesses, researchers, and analysts can forecast future trends, optimize operations,
and make data-driven decisions.
3. Discuss the concept of multiple regression and how it differs from simple linear
regression.
Multiple regression is an extension of simple linear regression where more than one
independent variable is used to predict the dependent variable. It allows for a more
complex model that can capture the influence of multiple factors simultaneously. The
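equation for multiple regression is:
$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$$
where $\hat{y}$ is the predicted value of the dependent variable, $b_0$ is the
intercept, $x_1, \dots, x_k$ are the independent variables, and $b_1, \dots, b_k$
are their regression coefficients.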
A key advantage of multiple regression is that it accounts for the influence of multiple
variables simultaneously, making the predictions more accurate when several factors
influence the outcome. For example, predicting a house price might involve variables
like size, location, and number of rooms, all of which are captured in a multiple
regression model.
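The house price example can be sketched with NumPy's least squares solver. Everything in this block is an assumption for illustration: the data are invented, the price is constructed from known coefficients so the fit can be checked, and the names `b0`, `b1`, `b2` simply denote the intercept and the two slopes.

```python
import numpy as np

# Hypothetical houses: size in square feet and number of rooms
size = np.array([800, 1000, 1200, 1500, 1800], dtype=float)
rooms = np.array([2, 2, 3, 3, 4], dtype=float)

# Price built from known coefficients (intercept 10, 0.05 per sq ft, 4 per room)
price = 10 + 0.05 * size + 4 * rooms

# Design matrix: a column of ones for the intercept, then one column per predictor
X = np.column_stack([np.ones_like(size), size, rooms])
coeffs, *_ = np.linalg.lstsq(X, price, rcond=None)
b0, b1, b2 = coeffs

print(np.round(coeffs, 4))
```

Because the price was generated without noise, the solver recovers the coefficients used to build it; with real data the fitted values would only approximate the underlying relationship.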
Significance: $r^2$ is important for assessing how well the regression model
performs. A high $r^2$ indicates that the model does a good job of predicting
the dependent variable, while a low $r^2$ suggests that the model may need
improvement.
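One common way to compute the coefficient of determination is as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below follows that definition; the `r_squared` helper, the observed values, and the predictions (taken from a hypothetical fitted line y = 2.2 + 0.6x at x = 1 to 5) are assumptions for illustration.

```python
def r_squared(observed, predicted):
    """Proportion of variance in `observed` explained by `predicted`."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


observed = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]  # from the hypothetical line y = 2.2 + 0.6x

r2 = r_squared(observed, predicted)
print(round(r2, 2))  # 0.6
```

Here 60% of the variance in the observed values is explained by the line. Predicting every value perfectly gives 1, and always predicting the mean gives 0.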
5. Describe the process of using Python libraries such as NumPy and Pandas for
data wrangling and manipulation.
Data wrangling refers to the process of cleaning, transforming, and preparing raw
data for analysis. Python libraries like NumPy and Pandas are essential tools for this
task:
1. NumPy:
NumPy is a powerful library for numerical computing. It provides support for:
o Arrays: Multi-dimensional arrays that allow efficient storage and
manipulation of large datasets.
o Aggregations: Functions for calculating sums, means, standard
deviations, and other statistics across arrays.
o Computations: Mathematical operations like addition, multiplication,
matrix operations, and more are performed efficiently using NumPy
arrays.
o Boolean Logic: NumPy supports logical operations on arrays, allowing
for element-wise comparisons, filtering, and masking of data.
2. Pandas:
Pandas is a library built for data manipulation and analysis. It provides two
primary data structures:
These libraries provide flexible, powerful tools for data manipulation, enabling
efficient data wrangling, which is essential for cleaning, preparing, and
transforming data into a usable format for analysis.
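The NumPy features listed above, together with the two core Pandas structures (taken here to be `Series` and `DataFrame`), can be sketched as follows. The values and the column names are hypothetical.

```python
import numpy as np
import pandas as pd

values = np.array([12, 7, 25, 3, 18])

# Aggregations over the whole array
total = values.sum()
mean = values.mean()

# Vectorised computation: applied to every element without an explicit loop
doubled = values * 2

# Boolean logic: an element-wise comparison produces a mask used for filtering
mask = values > 10
large = values[mask]

# Pandas Series: a labelled one-dimensional array
s = pd.Series(values, index=list("abcde"))

# Pandas DataFrame: a table of labelled columns
df = pd.DataFrame({"item": list("abcde"), "qty": values})
df["in_stock"] = df["qty"] > 5  # derived column from a comparison

print(total, mean, large)
print(df)
```

The same masking idea carries over to Pandas: `df[df["qty"] > 5]` would keep only the rows where the condition holds.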
PART-C
Effective data visualization helps both analysts and non-technical stakeholders quickly
comprehend relationships in the data. In this process, Python libraries such as
Matplotlib, Seaborn, and other visualization tools like Plotly and Bokeh play a
critical role in creating various kinds of plots and interactive visualizations.
Matplotlib is one of the most popular Python libraries for creating static, animated,
and interactive visualizations. It provides fine-grained control over plot elements and
is highly customizable. It is often considered the backbone of most Python-based
plotting libraries.
Line Plots: Matplotlib is widely used for creating line plots, useful for
visualizing trends over time or continuous variables. For example, plotting
stock prices over several months.
import matplotlib.pyplot as plt
# Illustrative data; replace with your own values
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
plt.plot(x, y)
plt.title("Line Plot Example")
plt.xlabel("X-axis Label")
plt.ylabel("Y-axis Label")
plt.show()
Bar Charts: Bar charts are often used to compare quantities across different
categories. In Matplotlib, bar charts are straightforward to create, helping to
compare data distributions across categories or groups.
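# Illustrative data; assumes matplotlib.pyplot is already imported as plt
categories = ["A", "B", "C"]
values = [5, 8, 3]
plt.bar(categories, values)
plt.title("Bar Chart Example")
plt.show()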
Regression Plots: Seaborn provides the ability to easily create regression plots
that fit a line (or curve) to the data, helping visualize linear relationships
between variables.
import seaborn as sns  # df is assumed to be a pandas DataFrame with columns "x_var" and "y_var"
sns.regplot(x="x_var", y="y_var", data=df)
Faceting: Seaborn allows for easy faceting, which means creating multiple
plots based on a categorical variable. This is useful for analyzing data subsets
across different categories.
sns.FacetGrid(df, col="category").map(plt.scatter, "x", "y")
While Matplotlib and Seaborn are used mainly for static plots, Plotly and Bokeh are excellent
tools for creating interactive visualizations that allow users to explore the data
dynamically.
Example:
import plotly.express as px
fig = px.scatter(df, x="x_var", y="y_var", color="category")
fig.show()
For interactive work, Plotly and Bokeh are useful tools to consider. These libraries
allow for a more engaging experience when working
with complex datasets or when presenting findings to stakeholders.
For visualizing geospatial data, Python provides libraries like Basemap (an
extension of Matplotlib) and GeoPandas.
Basemap: This library allows for the creation of maps and geospatial
visualizations. It is commonly used for plotting geographic data on various
types of projections (e.g., Mercator, Lambert).
While creating effective visualizations, it’s important to follow best practices for
making charts clear and interpretable:
Titles and Labels: Every plot should have a clear title and labeled axes.
Legends: Legends should be used to differentiate data series or categories.
Color Schemes: Use appropriate color schemes to enhance readability and not
mislead interpretation.
Gridlines: Gridlines or background lines can help in reading the values more
accurately.
Annotations: Use text annotations to highlight important data points or trends.
Overall, a good plot is one that tells a clear story and helps the audience
understand the underlying patterns in the data.
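The practices listed above can be combined in one small Matplotlib figure. This is only a sketch: the sales figures are invented, and the non-interactive `Agg` backend and the in-memory buffer are choices made so the script runs without a display or a file on disk.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt

# Hypothetical monthly sales for two products
months = [1, 2, 3, 4, 5]
sales_a = [10, 12, 15, 14, 18]
sales_b = [8, 9, 11, 13, 12]

fig, ax = plt.subplots()
ax.plot(months, sales_a, marker="o", label="Product A")
ax.plot(months, sales_b, marker="s", label="Product B")

ax.set_title("Monthly Sales (hypothetical data)")  # clear title
ax.set_xlabel("Month")                              # labelled axes
ax.set_ylabel("Units sold")
ax.legend()                                         # distinguishes the two series
ax.grid(True, alpha=0.3)                            # light gridlines for reading values
ax.annotate("Peak", xy=(5, 18), xytext=(3.5, 17.5),
            arrowprops={"arrowstyle": "->"})        # highlights one data point

# Render the figure to PNG bytes held in memory
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

In an interactive session, `plt.show()` or `fig.savefig("sales.png")` would be the more usual final step.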
Conclusion:
In summary, data visualization is a crucial step in the data analysis pipeline, enabling
analysts to communicate insights more effectively. Libraries such as Matplotlib
provide the foundational tools for static plots, while Seaborn enhances the aesthetics
and ease of use for statistical plots. For interactive visualizations, Plotly and Bokeh
are excellent options, and Basemap and GeoPandas serve the specialized needs of
geospatial data. By using these tools effectively and following best practices in
visualization, analysts can create impactful and insightful visualizations that facilitate
data understanding and decision-making.
MARK
STATEMENT
&
RESULT ANALYSIS
MARK STATEMENT
IAT 2
Reg No Name Marks
513523104047 NADEESH A P 63
513523104048 NASREEN BANU M 18
513523104049 NAVINKUMARAN P 14
513523104050 NAVYA D 0
513523104051 NIRANJAN G 14
513523104052 NIWIN RAJ J 41
513523104053 POOMA R 65
513523104054 POORANI A 72
513523104055 PRATHAP T 70
513523104056 PRATHIPA K 0
513523104057 PREETHA ZEN S 79
513523104058 PRIYADHARSHINI A 80
513523104059 PRIYADHARSHINI R 76
513523104060 PRIYANKA S 0
513523104061 RAGHUL RAJ S 10
513523104062 RAKSHITHA S 63
513523104063 RASIGAPRIYA K 94
513523104064 ROOPAN K 3
513523104065 RUTHIRAN C 62
513523104066 SAARIYA KOWNEN K 94
513523104067 SAKTHIVEL K 58
513523104068 SALEEM AHMED A 92
513523104069 SANGEETHA P 66
513523104070 SANJEEVAN SURYA KUMAR 0
513523104071 SATHISH KUMAR D 15
513523104072 SATHYA BAMA S P 84
513523104073 SHALINI V 50
513523104074 SHARANYA M 90
513523104075 SHREE DEVI V G 77
513523104076 SIVA KUMAR T 67
513523104077 SONIYA S 95
513523104078 SUBASHINI T 75
513523104079 SUGAVANAN A 66
513523104080 SUJAN U 73
513523104081 SULTHAN BABU E A 52
513523104082 TAMILSELVAN P 65
513523104083 THARUN KUMAR M 68
513523104084 THULASI DEVI S 70
513523104085 VARUN MOHAN M D 39
513523104086 VENKATESAN S 15
513523104087 VIDHYA J 81
513523104088 VIGNESHWARAN B 34
513523104089 YUVASHRI E 82
513523104090 YUVASHRI P 81
513523104301 DEVISRI S 62
513523104302 DHANA SHREE S 52
513523104303 GOKUL M 63
RESULT ANALYSIS
NO. OF STUDENTS 47
PASS % 79%
FACULTY SIGN
HOD
PART-B
1. Discuss the various facets of data in Data Science and their significance in
the analysis process.
2. Explain the process of Data Science, including data retrieval, preparation,
and exploratory data analysis.
3. Describe the different types of data and variables used in data science, and
explain how data can be described using averages and measures of
variability.
4. Explain the concept of correlation and regression, and discuss how they are
used to describe relationships between variables in data analysis.
5. Discuss how data manipulation is performed using Pandas, including
indexing, selection, handling missing data, and combining datasets.
PART-C
1. Explain the complete Data Science process, from defining research goals to
building applications, including steps like data retrieval, preparation,
exploratory data analysis, and model building.
ANSWER KEY
PART-A
Exploratory Data Analysis (EDA) is a critical step in the data analysis process
where data scientists analyze and summarize the main characteristics of a
dataset, often with visual methods. The goal of EDA is to:
The correlation coefficient measures the strength and direction of the linear
relationship between two variables. It is a value between -1 and 1:
An $r^2$ value of 1 means that the model explains all the variance in
the data.
An $r^2$ value of 0 means that the model does not explain any of the
variance.
A line plot (or line graph) is a type of chart used to visualize data points
connected by straight lines. It is typically used to show trends over time (time
series data) or the relationship between two continuous variables. Line plots are
useful for:
PART-B
1. Discuss the various facets of data in Data Science and their significance
in the analysis process.
In Data Science, data is central to all analyses, and it can be classified into
several facets based on its characteristics. Understanding these facets is crucial
as they determine how we process and interpret the data. The various facets of
data include:
Types of Data:
o Structured Data: This is data that is organized in a fixed schema,
typically in tables (rows and columns), such as databases. It can be
easily analyzed using traditional data processing tools.
o Unstructured Data: This is data that does not have a predefined
structure, like text documents, images, videos, and social media
posts. Unstructured data often requires more complex methods
such as Natural Language Processing (NLP) or deep learning
models.
o Semi-Structured Data: This data has some organizational
structure but does not fit neatly into a traditional database, such as
JSON or XML files.
o Continuous Data: Data that can take any value within a given
range (e.g., height, weight).
o Discrete Data: Data that can only take specific values, typically
integers (e.g., the number of people in a room).
Significance in Analysis:
The first step in the data science process is to clearly define the problem
or question to be addressed. Research goals should be specific,
measurable, and aligned with business objectives. For example, if the
goal is to predict customer churn, the analysis should focus on
understanding the factors that influence churn.
2. Retrieving Data:
3. Data Preparation:
Significance:
3. Describe the different types of data and variables used in data science,
and explain how data can be described using averages and measures of
variability.
3. Binary Data: A special case of categorical data that has only two
possible outcomes (e.g., True/False, Yes/No).
2. Measures of Variability:
o Range: The difference between the maximum and minimum
values in a dataset. It gives a basic sense of the spread but is
sensitive to outliers.
o Variance: A measure of how spread out the values are around the
mean. It is calculated as the average of squared differences from
the mean.
o Standard Deviation: The square root of the variance, giving a
measure of spread in the same units as the data. A larger standard
deviation indicates more variability.
o Interquartile Range (IQR): The range between the 1st and 3rd
quartiles (Q1 and Q3), representing the middle 50% of the data. It
is useful for identifying outliers.
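The four measures of variability above can be computed with NumPy. This is a minimal sketch on made-up data: the variance is the population variance (the average of squared differences from the mean, as described above), the quartiles use NumPy's default linear interpolation, and the 1.5 × IQR fence for outliers is a common rule of thumb rather than a fixed standard.

```python
import numpy as np

# Hypothetical dataset with one large value
data = np.array([4, 8, 15, 16, 23, 42])

data_range = data.max() - data.min()  # difference between largest and smallest value
variance = data.var()                 # population variance (ddof=0)
std_dev = data.std()                  # square root of the variance

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1                         # spread of the middle 50% of the data

# Values beyond 1.5 * IQR from the quartiles are flagged as potential outliers
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = data[(data < lower) | (data > upper)]

print(data_range, round(variance, 2), round(std_dev, 2), iqr, outliers)
```

For this data the range is 38 and the IQR is 11.5, and only the value 42 falls outside the fences.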
Significance:
4. Explain the concept of correlation and regression, and discuss how they
are used to describe relationships between variables in data analysis.
Correlation:
Regression:
PART-C
1. Indexing in Pandas
Default Index: By default, rows are labeled with integer indices starting
from 0, which is the most common way of identifying rows. Each column
can be accessed by its name, which can be thought of as the label for that
column.
Custom Index: Pandas allows the use of custom indices for rows (e.g.,
using a column as an index). Custom indices make the dataset more
intuitive, especially when dealing with time-series data or datasets where
rows have meaningful identifiers.
Label-based Indexing: Pandas provides .loc[], which allows you to
select rows and columns using labels. This is particularly useful when the
dataset has non-integer indices (e.g., strings or dates).
Position-based Indexing: Using .iloc[], you can access data by its
integer position. This is similar to traditional arrays, where data is
accessed by its index position.
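The four indexing styles above can be sketched as follows. The DataFrame and its column names (`student`, `marks`) are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical data; rows get the default integer index 0, 1, 2.
df = pd.DataFrame({"student": ["A", "B", "C"], "marks": [78, 62, 95]})

marks_column = df["marks"]               # access a column by its name
first_mark = df.loc[0, "marks"]          # default index: the label 0 is the first row

# Custom index: use the "student" column as the row labels.
by_student = df.set_index("student")

label_value = by_student.loc["B", "marks"]   # label-based: row "B", column "marks"
position_value = by_student.iloc[2, 0]       # position-based: third row, first column

print(first_mark, label_value, position_value)
```

Note that `.loc` slices include both end labels, while `.iloc` slices exclude the end position, as with ordinary Python sequences.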
Significance:
2. Selection in Pandas
Significance:
Significance:
Often in data analysis, data is spread across multiple sources or tables that need
to be combined to form a comprehensive dataset. Pandas provides a range of
tools for merging, joining, and concatenating datasets.
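A minimal sketch of the three operations, assuming two small hypothetical tables that share a `reg_no` key (all names and values are invented for the example):

```python
import pandas as pd

# Hypothetical tables sharing the key column "reg_no".
students = pd.DataFrame({"reg_no": [1, 2, 3], "name": ["A", "B", "C"]})
marks = pd.DataFrame({"reg_no": [1, 2, 4], "marks": [78, 62, 55]})

# Merging: combine columns by matching key values.
inner = students.merge(marks, on="reg_no", how="inner")  # only keys present in both
left = students.merge(marks, on="reg_no", how="left")    # every student, NaN if no marks

# Joining: the same idea, but matching on the index instead of a column.
joined = students.set_index("reg_no").join(marks.set_index("reg_no"), how="left")

# Concatenating: stack tables that have the same columns.
more_students = pd.DataFrame({"reg_no": [5], "name": ["E"]})
stacked = pd.concat([students, more_students], ignore_index=True)

print(inner, left, joined, stacked, sep="\n\n")
```

The `how` argument decides which keys survive: `inner` keeps only matches, while `left` keeps every row of the left table and fills the gaps with NaN.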
Significance:
MARK
STATEMENT
&
RESULT ANALYSIS
MARK STATEMENT
MODELEXAM
Reg No Name Marks
513523104047 NADEESH A P 78
513523104048 NASREEN BANU M AB
513523104049 NAVINKUMARAN P 22
513523104050 NAVYA D 79
513523104051 NIRANJAN G 08
513523104052 NIWIN RAJ J 18
513523104053 POOMA R 79
513523104054 POORANI A AB
513523104055 PRATHAP T 62
513523104056 PRATHIPA K 95
513523104057 PREETHA ZEN S 85
513523104058 PRIYADHARSHINI A 66
513523104059 PRIYADHARSHINI R 68
513523104060 PRIYANKA S AB
513523104061 RAGHUL RAJ S AB
513523104062 RAKSHITHA S AB
513523104063 RASIGAPRIYA K 90
513523104064 ROOPAN K AB
513523104065 RUTHIRAN C 80
513523104066 SAARIYA KOWNEN K 93
513523104067 SAKTHIVEL K 66
513523104068 SALEEM AHMED A 94
513523104069 SANGEETHA P AB
513523104070 SANJEEVAN SURYA KUMAR 24
513523104071 SATHISH KUMAR D AB
513523104072 SATHYA BAMA S P 73
513523104073 SHALINI V 61
513523104074 SHARANYA M 87
513523104075 SHREE DEVI V G AB
513523104076 SIVA KUMAR T 66
513523104077 SONIYA S 93
513523104078 SUBASHINI T 72
513523104079 SUGAVANAN A 11
513523104080 SUJAN U 09
513523104081 SULTHAN BABU E A 56
513523104082 TAMILSELVAN P 62
513523104083 THARUN KUMAR M 55
513523104084 THULASI DEVI S AB
513523104085 VARUN MOHAN M D 42
513523104086 VENKATESAN S 03
513523104087 VIDHYA J 83
513523104088 VIGNESHWARAN B 15
513523104089 YUVASHRI E 69
513523104090 YUVASHRI P 73
513523104301 DEVISRI S AB
513523104302 DHANA SHREE S AB
513523104303 GOKUL M 75
RESULT ANALYSIS
NO. OF STUDENTS 47
PASS % 74%
FACULTY SIGN
HOD
SLOW
LEARNERS
IDENTIFICATION
SLOW LEARNERS
Reg No Name
513523104064 ROOPAN K
513523104086 VENKATESAN S
513523104049 NAVINKUMARAN P
513523104051 NIRANJAN G
513523104088 VIGNESHWARAN B
ATTENDANCE
FOR REMEDIAL
CLASS
Attendance
Reg No Name
MARK STATEMENT
SHOWING PROGRESSION
OF SLOW LEARNERS
Marks
Reg No Name
ASSIGNMENT
TOPICS
ASSIGNMENT TOPICS
SAMPLE
ASSIGNMENT
ASSIGNMENT
MARKS
STATEMENT
513523104087 VIDHYA J 9 9 9 9
513523104088 VIGNESHWARAN B 8 8 9 8
513523104089 YUVASHRI E 8 8 8 8
513523104090 YUVASHRI P 9 9 9 9
513523104301 DEVISRI S 9 8 8 8
513523104302 DHANA SHREE S 9 9 9 9
513523104303 GOKUL M 9 8 8 8
LOG NOTEBOOK
CO ATTAINMENT
LECTURE NOTES
ASSESSMENT EXAMINATION
QUESTION PAPERS WITH
SCHEME AND SAMPLE
HAND OUTS
TIMETABLE FOR
REMEDIAL CLASS