PythonDASE_2025 Version1 (1)

The document discusses the integration of data analytics and machine learning into software engineering, highlighting their benefits such as improved decision-making, optimized efficiency, and enhanced user experience. It also covers essential Python libraries for data analytics, including NumPy, Pandas, and Matplotlib, along with practical examples of data visualization techniques. Additionally, it introduces social network analysis using the NetworkX library, emphasizing its applications in understanding relationships and structures within data.

DATA ANALYTICS WITH PYTHON

HTTPS://MORIOH.COM/P/0BC57432AB32

Mark Twain said that the secret of getting ahead is getting started.
Programming can seem daunting for beginners, but the best way to
get started is to dive right in and start writing code.
DATA ANALYTICS IN SE

• The integration of data analytics into software engineering practices is paving the way for the future of this field.
• Data analytics refers to the process of examining large sets of data to uncover patterns, correlations, and insights that can be used to make data-driven decisions. Incorporating data analytics into software engineering processes provides numerous benefits.
• Improved decision-making: By analyzing data, software engineers can make more informed decisions based on factual evidence
rather than relying on intuition alone. This leads to better software development strategies and more accurate predictions of user
needs.
• Optimized efficiency: Data analytics helps identify bottlenecks and inefficiencies in the software development lifecycle, allowing
engineers to optimize processes, reduce errors, and deliver high-quality software in a shorter time frame.
• Enhanced user experience: By analyzing user data, software engineers can gain insights into user behavior, preferences, and pain
points. This information can be used to improve user interfaces, personalize experiences, and create tailored solutions.
• Early detection of issues: Data analytics enables software engineers to proactively identify and address potential issues before they
escalate. By monitoring and analyzing software performance metrics, engineers can spot anomalies and implement preventive
measures.
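The "early detection" point above can be sketched in a few lines of NumPy. The response-time figures and the two-standard-deviation threshold below are illustrative assumptions, not a production monitoring rule.

```python
import numpy as np

# Hypothetical response-time metrics in milliseconds; the 900 ms spike
# plays the role of a performance anomaly.
response_times = np.array([120, 115, 130, 125, 118, 122, 900, 119, 121, 127])

# Flag anything more than two standard deviations above the mean.
mean = response_times.mean()
std = response_times.std()
anomalies = response_times[response_times > mean + 2 * std]
```

A real pipeline would typically use a rolling baseline or a robust statistic such as the median absolute deviation, since a single large outlier inflates both the mean and the standard deviation.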
DA AND ML

• One of the key drivers behind the future of software engineering is machine learning. Machine
learning algorithms can analyze vast amounts of data, recognize patterns, and make accurate
predictions or decisions based on this information. Incorporating machine learning into software
engineering processes brings numerous benefits.
• Automated testing: Machine learning algorithms can be trained to analyze code and
automatically detect bugs, reducing the need for manual testing and increasing efficiency.
• Intelligent debugging: By leveraging machine learning, software engineers can build models that identify recurring error patterns and suggest potential fixes, making the debugging process faster and more effective.
• Code optimization: Machine learning algorithms can analyze code repositories and identify
patterns of efficient code, helping software engineers optimize their codebase for improved
performance.
• Personalized software: Machine learning can analyze user data and preferences to create
personalized software experiences, tailored to individual needs.
DA AND SOFTWARE DEV

• What is Data Analytics in Software Development?


• Data analytics in software development involves the collection, analysis, and interpretation of data generated
during the development process. This data can come from various sources, such as user feedback, error logs,
performance metrics, and customer usage patterns. By leveraging data analytics, developers can identify trends,
detect potential issues, and optimize their software development processes.
• Leverage User Feedback: Collect and analyze feedback from users to identify common issues and areas for
improvement.
• Monitor Error Logs: Continuously monitor error logs to detect and resolve software defects, ensuring better
software quality.
• Utilize Performance Metrics: Track performance metrics, such as response times and throughput, to identify
bottlenecks and optimize software performance.
• Explore Customer Usage Patterns: Analyze customer usage patterns to understand how users interact with
the software and gain insights for user-centric design and development.
• Implement Continuous Integration and Delivery: Automate software testing and deployment processes
to ensure quicker feedback loops and faster software delivery.
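As a minimal sketch of the "monitor error logs" idea above, here is how pandas can summarize which modules generate the most defects. The log entries, module names, and error labels are made up for illustration.

```python
import pandas as pd

# Hypothetical parsed error-log entries; in practice these would be
# extracted from real log files.
log = pd.DataFrame({
    "module": ["auth", "payments", "auth", "search", "auth", "payments"],
    "error":  ["Timeout", "NullRef", "Timeout", "Timeout", "BadToken", "NullRef"],
})

# Count errors per module to see where defects cluster.
errors_by_module = log["module"].value_counts()
worst_module = errors_by_module.idxmax()
```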
PYTHON LIBRARIES AND PACKAGES FOR DATA SCIENTISTS
(THE 5 MOST IMPORTANT ONES)

• Numpy
• NumPy is a Python package; it stands for 'Numerical Python'. It is a library consisting of multidimensional array objects and a collection of routines for processing arrays.
• Pandas - Python's popular data analysis library, pandas, provides several different options for visualizing your data with .plot(). Even if you're at the beginning of your pandas journey, you'll soon be creating basic plots that will yield valuable insights into your data.
• Pandas arranges data into a 2-D table similar to a database table; however, the Pandas library calls it a DataFrame (df as a common abbreviation - to plot, df.plot()). It was created for data analysis, data cleaning, data handling and data discovery. This is one of the most common libraries used for data analytics and data visualisation, as well as for machine learning algorithms; the open-source community intended pandas to be the most powerful library for data analytics.
• import pandas as pd
• import pandas as pd
• Matplotlib
• This is another popular data visualisation library. Data visualization helps you to better understand your data, discover things that you wouldn't spot in raw format, and communicate your findings more effectively to others. Matplotlib is the most famous and commonly used plotting library in Python.
• import matplotlib.pyplot as plt
USING MATPLOTLIB

• A Quick plot
• from matplotlib import pyplot as plt
• x=[1,2,3,4]
• y=[1,4,9,16]
• plt.plot(x,y)
• plt.show() # needed in VS Code / script mode; optional in notebooks
DOING THE SAME THING WITH PANDAS

• import pandas as pd
• import matplotlib.pyplot as plt
• x=[1,2,3,4]
• y=[1,4,9,16]
• df=pd.DataFrame(y, index=x)
• df.plot()
• plt.show() # needed in VS Code / script mode
So what happened?

df.plot() draws each column against the DataFrame's index, which pandas uses as the x-axis. Passing x as the index therefore reproduces the Matplotlib plot of y against x; if no index is supplied, pandas falls back to an automatic 0, 1, 2, … index.
OR – YOU CAN GIVE NAMES TO THE
DATA SERIES
• import matplotlib.pyplot as plt
• import pandas as pd
• x1=[1,2,3,4]
• y1=[1,4,9,16]
• df=pd.DataFrame({'x_series':x1,'y_series':y1})
• df.plot(kind='line', x='x_series', y='y_series', color='red')
• plt.show()
MATPLOTLIB

• A graph or chart is simply a visual representation of numeric data. Matplotlib makes a large number of graph and chart types available to you.
SIMPLE LINES WITH MATPLOTLIB

• import matplotlib.pyplot as plt


• x1=[1,2,3,4]
• y1=[1,4,9,16]
• x2 = [1,2,3,4]
• y2 = [4,7,10,13]
• # plotting the line 2 points
• plt.plot(x1, y1, label = "line 1")
• plt.plot(x2, y2, label = "line 2")
• plt.legend()
• plt.show()
ON GOOGLE COLABS

• https://colab.research.google.com/notebooks/welcome.ipynb#scrollTo=JKYK1Qh-b66n
PLOTTING OPTIONS

• import matplotlib.pyplot as plt
• # x axis values
• x = [1,2,3,4,5,6]
• # corresponding y axis values
• y = [2,4,1,5,2,6]
• # setting x and y axis range
• plt.ylim(1,8)
• plt.xlim(1,8)
• # plotting the points
• plt.plot(x, y, color='green', linestyle='dashed', linewidth = 3,
• marker='o', markerfacecolor='blue', markersize=12)
• plt.show()
BAR PLOT

• x = [1, 2, 3, 4, 5]
• y = [10, 24, 36, 40, 5]
• plt.xlabel('Entries')
• plt.ylabel('Values')
• plt.bar(x, y, width = 0.8, color = ['red', 'green', 'blue']) # the three colors repeat across the five bars
PIE PLOT

The colors parameter lets you choose custom colors for each pie wedge. You use the labels parameter to identify each wedge. In many cases, you need to make one wedge stand out from the others, so you add the explode parameter with a list of explode values. A value of 0 keeps the wedge in place; any other value moves the wedge out from the center of the pie.

• import matplotlib.pyplot as plt
• values = [5, 8, 9, 10, 4, 7]
• colors = ['b', 'g', 'r', 'c', 'm', 'y']
• labels = ['A', 'B', 'C', 'D', 'E', 'F']
• explode = (0, 0.2, 0, 0, 0, 0)
• plt.pie(values, colors=colors, labels=labels,
• explode=explode, autopct='%1.1f%%',
• counterclock=False, shadow=True)
• plt.title('Values')
• plt.show()
HISTOGRAMS

• Histograms categorize data by breaking it into bins, where each bin contains a subset of the data range.
• A histogram then displays the number of items in each bin so that you can see the distribution of data and the progression of data from bin to bin.
• In most cases, you see a curve of some type, such as a bell curve.
• The problem with doing a demo for histograms is the lack of data, so you need to generate random data; the example that follows shows how to create a histogram with randomized data.
NUMPY

• NumPy offers the random module to work with random numbers
• Generate a random integer from 0 up to (but excluding) 100:
• import numpy as np
• x = np.random.randint(100)
• print(x)
• In NumPy we work with arrays; you can use the same randint method with a size argument to make random arrays
• Generate a 1-D array containing 5 random integers from 0 up to 100:
• import numpy as np
• x = np.random.randint(100, size=5)
• print(x)
RANDN

• The np.random.randn() is a NumPy library method that returns a sample (or samples) from the "standard normal" distribution.
• randn assumes 0 as the central point
• All data generated have a greater probability of being close to the central point (0)
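The claims above can be checked empirically. This sketch uses NumPy's newer Generator API (np.random.default_rng) rather than the legacy randn shown on the slides; both draw from the standard normal distribution.

```python
import numpy as np

rng = np.random.default_rng(0)          # seeded so the run is reproducible
sample = rng.standard_normal(100_000)

sample_mean = sample.mean()             # close to the central point, 0
within_one_sd = (np.abs(sample) < 1).mean()  # roughly 68% of draws
```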
A HISTOGRAM WITH STANDARD DATA

• import numpy as np
• import matplotlib.pyplot as plt
• x = 20*np.random.randn(10000)
• plt.hist(x, 25, range=(-60, 60), histtype='stepfilled',
• align='mid', color='g', label='Test Data')
• plt.legend()
• plt.title('Step Filled Histogram')
• plt.show()
DATA MINING - NETWORK
RELATIONSHIPS
• A Network or Graph is a special representation of entities which have relationships among
themselves.
• It is made up of a collection of two generic objects: (1) node, which represents an entity, and (2) edge, which represents the connection between any two nodes. In a complex network, we also have attributes or features associated with each node and edge.
• For example, a person represented as a node may have attributes like age, gender, salary, etc. Similarly, an edge between two persons which represents a 'friend' connection may have attributes like friends_since, last_meeting, etc.
• Because of this complex nature, it becomes imperative that we present a network intuitively, such
that it showcases as much information as possible.
• To do so we first need to get acquainted with the different available tools, and that’s the topic of
this article i.e. to go through the different options which help us visualize a network
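The node and edge attributes described above map directly onto NetworkX keyword arguments; the people and values here are invented for illustration.

```python
import networkx as nx

G = nx.Graph()
# Node attributes describe the entity itself.
G.add_node("Asha", age=29, gender="F")
G.add_node("Ben", age=34, gender="M")
# Edge attributes describe the relationship between the two nodes.
G.add_edge("Asha", "Ben", relation="friend", friends_since=2015)
```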
A GRAPH WITH NODES

• import matplotlib.pyplot as plt


• import networkx as nx
• G=nx.Graph()
• G.add_node("Sanjay")
• G.add_node("Deepak")
• G.add_node("Mpho")
• G.add_node("Sue")
• nx.draw(G, with_labels=True)
• plt.show()
NODES AND EDGES

• Nodes represent the individuals in a network, while edges constitute the relationships between the individuals.
NODES AND EDGES

• Networks are described by two sets of items, which form a "network".
• Nodes
• Edges
• In mathematical terms, this is a graph.
• Edges can be added separately or as a list

import matplotlib.pyplot as plt
import networkx as nx
G=nx.Graph()
G.add_node("Sanjay")
G.add_node("Deepak")
G.add_node("Mpho")
G.add_node("Sue")
G.add_edge("Sanjay", "Deepak")
G.add_edge("Sanjay", "Mpho")
G.add_edge("Deepak", "Sue")
G.add_edge("Mpho", "Sue")
G.add_edge("Sanjay", "Sue")
G.add_edge("Deepak", "Mpho")
nx.draw(G, with_labels=True, node_color="red", node_size=2000)
plt.show()
SOCIAL NETWORK ANALYSIS

• Link analysis refers to the process of analyzing the links (or relationships) between
any kind of entity. This can include analyzing links between web pages, emails, financial
transactions, or any other data type where relationships between entities are
relevant.
• Social network analysis is a specific kind of link analysis that focuses exclusively on
people and groups and their relationships with each other.
• One aspect that makes SNA such a powerful tool is the ability to visualize these
relationships in a graph, using nodes to represent individuals and edges to represent
the connections between them.
• Visualizing individuals and relationships allows us to more easily intuit the dynamics of
social influence, the formation of social groups, and the flow of information between
groups and individuals.
NETWORKX

• NetworkX is a Python library for the creation, manipulation, and study of complex networks.
• It can handle networks with millions of nodes and edges, and
provides functions for generating random networks, calculating
network metrics, and visualizing network structures.
• It also has a wide range of algorithms for community
detection, link prediction, and network visualization.
• Although NetworkX has extensive capabilities, Python users will find it user-friendly and intuitive to use.
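As a small illustration of the network-metric functions mentioned above, this sketch rebuilds the four-person friendship graph from the earlier slides (where every pair is connected) and computes a few standard measures.

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([("Sanjay", "Deepak"), ("Sanjay", "Mpho"), ("Sanjay", "Sue"),
                  ("Deepak", "Sue"), ("Deepak", "Mpho"), ("Mpho", "Sue")])

density = nx.density(G)                         # fraction of possible edges present
components = nx.number_connected_components(G)  # 1 means everyone is reachable
avg_path = nx.average_shortest_path_length(G)   # average hops between pairs
```

Because every pair is directly connected, the density and average path length both come out to 1.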
NETWORK ANALYSIS USING
NETWORKX

• NetworkX has its own drawing module which provides multiple options for plotting.
• Below we can find the visualization for some of the
draw modules in the package.
• Using any of them is fairly easy, as all you need to do
is call the module and pass the G graph variable and
the package does the rest.
A GRAPH OF NODES & EDGES

• Consider that this graph represents the places in a city that people generally
visit, and the path that was followed by a visitor of that city. Let us consider V
as the places and E as the path to travel from one place to another.

V = {v1, v2, v3, v4, v5}

E = {(v1,v2), (v2,v5), (v5, v5), (v4,v5), (v4,v4)}


The edge (u,v) is the same as the edge (v,u) - they are unordered pairs.

Concretely, graphs are mathematical structures used to study pairwise relationships between objects and entities. Graph theory is a branch of Discrete Mathematics and has found multiple applications in Computer Science, Chemistry, Linguistics, Operations Research, Sociology etc.
A NETWORK OF ASSOCIATIONS

• import pandas as pd
• import matplotlib.pyplot as plt
• import networkx as nx
• df = pd.DataFrame([
•     ("Dave", "Ntando"), ("Peter", "Shalulile"), ("John", "Jenny"),
•     ("Mohamad", "Jabulani"), ("Dave", "John"), ("Peter", "Sameera"),
•     ("Sameera", "Albert"), ("Peter", "John"), ("Peter", "Jabulani")
• ], columns=['from', 'to'])
• G = nx.from_pandas_edgelist(df, 'from', 'to')
• nx.draw(G, with_labels=True, node_color="red", node_size=2000)
• plt.show()
SOCIAL NETWORK ANALYSIS (SNA).

• The study of social structures using graph theory is called social network analysis (SNA).
USING AN EXTERNAL FILE FOR SNA

• import os
• os.chdir("c:\\SE_2025\\Python\\Data")
• import pandas as pd
• import matplotlib.pyplot as plt
• import networkx as nx
• df = pd.read_csv("us_president.csv")
• df.head()
• G = nx.from_pandas_edgelist(df, 'From', 'To')
• nx.draw(G, with_labels=True, node_color='lightblue')
• plt.show()
SNA ANALYSIS

• Analysis
• A node's degree is simply a count of how many social connections (i.e., edges) it has. A node with 10 social connections has a degree of 10.
• Degree
• nx.degree(G)
• Most Influential
• nx.degree_centrality(G) (note: NetworkX normalizes degree centrality by dividing each degree by the maximum possible degree, n - 1, so values fall between 0 and 1)
• …if you wanted a sorted list:

most_influential = nx.degree_centrality(G)
for w in sorted(most_influential, key=most_influential.get, reverse=True):
    print(w, most_influential[w])
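To see the normalization concretely, consider a small star network (one hypothetical hub with four contacts): nx.degree reports raw connection counts, while nx.degree_centrality divides each count by n - 1.

```python
import networkx as nx

G = nx.star_graph(4)   # node 0 is the hub, connected to nodes 1-4

raw_degree = dict(nx.degree(G))       # raw connection counts
centrality = nx.degree_centrality(G)  # each degree divided by (n - 1) = 4
```

The hub has a raw degree of 4 but a degree centrality of 1.0; each leaf has degree 1 and centrality 0.25.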
WE ALL NEED IMPORTANT
CONNECTIONS
• Most important connection
• Eigenvector centrality measures a node's importance based on the importance of its neighbors in a network. A node connected to many influential nodes will have a higher eigenvector centrality score than a node connected to many less influential nodes, even if the number of connections is the same. In essence, it's not just about the quantity of connections, but also the quality or influence of those connections.
• most_influential = nx.eigenvector_centrality(G)
• If you wanted a sorted version:

most_influential = nx.eigenvector_centrality(G)
for w in sorted(most_influential, key=most_influential.get, reverse=True):
    print(w, most_influential[w])
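A toy graph makes the "quality over quantity" point measurable: X and Y below each have exactly two connections, but X's neighbours sit in a tightly knit core, so X scores higher. The node names are arbitrary.

```python
import networkx as nx

G = nx.Graph()
# A tightly knit core: A, B and C all know each other.
G.add_edges_from([("A", "B"), ("A", "C"), ("B", "C")])
# X connects to two core members; Y connects to one core member
# and one otherwise-isolated node P. Both have degree 2.
G.add_edges_from([("X", "A"), ("X", "B"), ("Y", "C"), ("Y", "P")])

ec = nx.eigenvector_centrality(G)   # X outranks Y despite equal degree
```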
SHORTEST PATH

• Shortest path analysis can reveal the overall connectivity of a network and identify hubs or bridges that connect different parts of the network.
• Imagine a social network where individuals are nodes and friendships are edges. The shortest path between two friends might be a direct friendship (one hop) or a friendship through a mutual acquaintance (two hops).
• E.g. nx.shortest_path(G, "G Bush", "M Trump")
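Since the us_president.csv file isn't reproduced here, this sketch uses an invented friendship network to show the same calls.

```python
import networkx as nx

G = nx.Graph()
G.add_edges_from([("Ann", "Bob"), ("Bob", "Cara"), ("Cara", "Dan"),
                  ("Ann", "Eve"), ("Eve", "Dan")])

path = nx.shortest_path(G, "Ann", "Dan")         # via the mutual friend Eve
hops = nx.shortest_path_length(G, "Ann", "Dan")  # number of edges traversed
```

Ann and Dan are connected in two hops through Eve, even though a longer route through Bob and Cara also exists.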
PLAYING WITH SCIKIT-LEARN

• Scikit-learn is the package for machine learning and data science experimentation
favored by most data scientists. It contains a wide range of well-established learning
algorithms, error functions, and testing procedures.
• Data Science
• Classification problem: Guessing that a new observation is from a certain group
• Regression problem: Guessing the value of a new observation
• It works with the method fit(X, y), where X is the two-dimensional array of predictors (the set of observations to learn from) and y is the target outcome (another, one-dimensional array).
• The next slide shows an example of simple linear regression - the data is read from the csv file named Salary (found in Learn); the data file contains information on employees' years of experience and their salary amounts
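The fit(X, y) convention can be shown on invented data standing in for Salary.csv: X must be two-dimensional (one row per observation), y one-dimensional.

```python
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4], [5]]   # predictors: one row per observation
y = [30, 40, 50, 60, 70]        # target: lies exactly on y = 20 + 10x

model = LinearRegression().fit(X, y)
predicted = model.predict([[6]])[0]   # the fitted line gives 80
```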
GETTING AN INITIAL VIEW OF THE
DATA

• import os
• import numpy as np
• import pandas as pd
• import matplotlib.pyplot as plt
• os.chdir("c:\\SE_2025\\Python\\Data")
• df = pd.read_csv('Salary.csv')
• df.head(10)
THE SCATTER PLOT OF SALARY DATA

• import os
• import numpy as np
• import pandas as pd
• import matplotlib.pyplot as plt
• os.chdir("c:\\SE_2025\\Python\\Data")
• df = pd.read_csv('Salary.csv')
• df.head(10)
• x=df["YearsExperience"]
• y=df["Salary"]
• plt.scatter(x, y, s=[100], marker='*', c='m')
• plt.xlabel("Years of Experience")
• plt.ylabel("Salary")
• plt.show()
WITH THE
PREDICTION
FEATURE
• from sklearn import linear_model
• import numpy as np
• import pandas as pd
• import matplotlib.pyplot as plt
• regr = linear_model.LinearRegression()
• df = pd.read_csv('Salary.csv')
• x=df[["YearsExperience"]] # a list of predictors is expected here - we only have 1 this time
• y=df["Salary"]
• regr.fit(x,y)
• years=int(input("Enter number of years of experience: "))
• PredSalary = regr.predict([[years]]) # e.g. years = 20 (Years of Experience)
• PredSalary=round(PredSalary[0],2) # predict returns an array, so take the first element
• print("For an Employee with ",years," years of experience the predicted salary is ",PredSalary)
• plt.scatter(x, y, s=[100], marker='*', c='m') # plot the graph with original data
• plt.plot(x,regr.predict(x)) # add on the regression line
• plt.show()
THE NEXT FEW SECTIONS

• Here the concepts of Machine Learning (ML), Artificial Intelligence (AI), Big Data (BD) and Data Science will be presented in an overview manner
DATA MINING

• Data mining is described as "statistics at scale and speed"
• Data mining stands at the confluence of the fields of statistics and machine learning (also known as artificial intelligence).
• A variety of techniques for exploring data and building models have
been around for a long time in the world of statistics:
• linear regression, logistic regression, discriminant analysis, and principal
components analysis, for example.
• But the core tenets of classical statistics—computing is difficult and
data are scarce—do not apply in data mining applications where both
data and computing power are plentiful
DM & BD

• Data mining and Big Data go hand in hand. Big Data is a relative term—data today are
big by reference to the past, and to the methods and devices available to deal with
them.
• The challenge Big Data presents is often characterized by the four Vs: volume, velocity, variety, and veracity.
• Volume refers to the amount of data.
• Velocity refers to the flow rate—the speed at which it is being generated and changed.
• Variety refers to the different types of data being generated (currency, dates, numbers, text,
etc.).
• Veracity refers to the fact that data is being generated by organic distributed processes
(e.g., millions of people signing up for services or free downloads) and not subject to the
controls or quality checks that apply to data collected for a study.
DATA SCIENCE

• The ubiquity, size, value, and importance of Big Data has given rise to
a new profession: the data scientist.
• Data science is a mix of skills in the areas of statistics, machine
learning, math, programming, business, and IT.
• The term itself is a reference to a rare individual who combines deep
skills in all the constituent areas.
• In their book Analyzing the Analyzers (Harris et al., 2013), the
authors describe the skill sets of most data scientists as resembling a
“T”—deep in one area (the vertical bar of the T), and shallower in
other areas (the top of the T).
STATS & MACHINE LEARNING

• A major difference between the fields of statistics and machine learning is the focus in statistics on inference from a sample to the population regarding an "average effect":
• for example, "a R10 price increase will reduce average demand by 2 boxes."
• In contrast, the focus in machine learning is on predicting individual
records—“the predicted demand for person i given a R10 price increase is
1 box, while for person j it is 3 boxes.”
• The emphasis that classical statistics places on inference (determining
whether a pattern or interesting result might have happened by chance in
our sample) is absent from data mining.
ML

• We use machine learning to refer to algorithms that learn directly from data, especially local patterns, often in layered or iterative fashion.
• In contrast, we use statistical models to refer to methods that apply
global structure to the data.
• A simple example is a linear regression model (statistical) vs. a k-
nearest-neighbors algorithm (machine learning).
• A given record would be treated by linear regression in accord with an
overall linear equation that applies to all the records.
• In k-nearest neighbours, that record would be classified in accord with the
values of a small number of nearby records.
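The contrast above can be made concrete on a tiny invented dataset: linear regression answers from a single global line fitted to all records, while k-nearest neighbours answers from whichever records happen to be nearby.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

X = np.array([[1], [2], [3], [10]])
y = np.array([1.0, 2.0, 3.0, 10.0])   # records lie exactly on y = x

lin = LinearRegression().fit(X, y)                  # one global line
knn = KNeighborsRegressor(n_neighbors=2).fit(X, y)  # local: 2 nearest records

lin_pred = lin.predict([[9]])[0]   # the global line gives 9
knn_pred = knn.predict([[9]])[0]   # mean of the 2 nearest targets, 10 and 3
```

For the query point 9, the global model interpolates along the line, while k-NN averages the targets of the two closest records (10 and 3), giving 6.5.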
• Big data analytics is the process in which we collect and analyze large volumes of data sets (called Big Data), which helps in discovering useful hidden patterns and other information, such as customer choices and market trends, that is beneficial for organizations wanting to remain informed and to make customer-oriented business decisions.
• Machine learning is a subset of AI (Artificial Intelligence) which helps computers and machines predict future actions without the intervention of human beings. So it could be said that, with the help of machine learning, software applications can learn how to improve their accuracy in predicting outcomes.
• The normal procedure of big data analytics is to gather and transform the particular data into extracted information; after that, that gathered data is used by machine learning in order to predict better results.
BDA & ML

• Big data provides a massive amount of data: volume & variety
• Machine learning is a subset of AI - it only works well when it can control/manage its data; if there is too much data, ML algorithms will struggle to provide an accurate output
• The data has to be reduced: this means it needs to be summarized, extrapolated and represented by statistical measures of aggregation (e.g. sum, average, median, mode, standard deviation, regression line, correlation coefficients, …)
• In this way masses of data are reduced to single values through Big Data Analytics (BDA)
• At this point, machine learning gets involved and leverages the BDA values to provide additional insights into the data and to make predictions from the data that it has. This is referred to as supervised learning, where training data is used to make future predictions, as is the case for linear regression.
• In unsupervised learning, the big data is used as a starting point (a blank slate, with no rules or prior patterns provided); the machine is responsible for identifying patterns and associations. Clustering is an important concept when it comes to unsupervised learning: it mainly deals with finding a structure or pattern in a collection of uncategorized data.
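The clustering idea above can be sketched with scikit-learn's KMeans: two obvious groups of unlabelled points are recovered without any training labels being supplied. The points are invented for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Uncategorised data with two visually obvious groups.
points = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1],
                   [8.0, 8.0], [8.1, 7.9], [7.9, 8.2]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
labels = kmeans.labels_   # cluster assignment discovered by the algorithm
```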
