Lecture 2 (Defining Data Analytics)

Defining Data Analytics

Uploaded by

Bilal Rauf

Data Analytics in Software Engineering

(MSE669)

Dr. Assad Abbas

Associate Professor
Department of Computer Science
COMSATS University Islamabad, Islamabad Campus
Outline
• Introduction
• Overview of data analytics and software engineering
• Data analytics vs. other concepts
• Importance in Software Engineering

June 23, 2024 2


Defining Data Analytics
• Data analytics is the process of collecting, processing, analyzing, and communicating data to support decision making and problem solving.
• It can be applied in many domains, such as business, education, and health.
• It draws on four broad types of analysis: descriptive (what happened), diagnostic (why it happened), predictive (what is likely to happen), and prescriptive (what to do about it).
• It is carried out with tools and languages such as R, Python, SQL, and Excel.



Data Analytics vs. Other Concepts
• Data analytics is often confused with related concepts such as data mining, data science, and machine learning.
• These are not the same; each has a different purpose and scope.
• Data mining: a specialized process within analytics for discovering patterns, trends, and associations in large and complex data sets.
• Data science: a broad, interdisciplinary field that combines data analytics, data mining, machine learning, statistics, mathematics, computer science, and domain knowledge.
• Machine learning: a subfield of artificial intelligence that focuses on building systems and models that learn from data and improve their performance without being explicitly programmed.
Data Analytics Lifecycle



Software Engineering
• Software engineering is an engineering-based approach to software development. A software engineer applies the engineering design process to design, develop, test, maintain, and evaluate computer software.



Importance in Software Engineering
• Data analytics matters in software engineering because it helps engineers improve the quality, performance, and usability of software products and services.
• It can help software engineers:
  ◦ understand the needs, preferences, and behaviors of software users and stakeholders;
  ◦ optimize the software development process and lifecycle;
  ◦ enhance software functionality, reliability, security, and efficiency;
  ◦ develop new software features, products, and services based on data insights.



Integrating Data Analytics in SDLC
• The software development life cycle (SDLC) is a framework that defines the stages and activities involved in developing a software product or system.
• The SDLC typically consists of phases such as planning, analysis, design, implementation, testing, deployment, and maintenance.
• Integrating data analytics into the SDLC can improve the quality and efficiency of both the development process and its outcome.
• Data analytics can provide insights and feedback at each phase of the SDLC, from planning through maintenance.



Software Engineering Lifecycle



Integrating Data Analytics in SDLC
• Planning
  ◦ Data analytics can help estimate the resources, time, and budget needed for the project.
• Analysis
  ◦ Helps understand user needs, preferences, and behaviors.
  ◦ Helps analyze the existing systems, processes, and data sources that are relevant to the project.
  ◦ Examples: market research, user experience analysis, and feedback analysis.
• Design
  ◦ Helps design the software architecture, components, and interfaces that meet the functional and non-functional requirements.
  ◦ Helps design the data models, schemas, and structures that support data processing and analysis.
  ◦ Helps shape user interfaces and system design models according to market demand.



Integrating Data Analytics in SDLC
• Implementation
  ◦ Code quality analysis (code smell detection), code review optimization, code contribution analysis, bug prediction, and code documentation analysis.
• Testing
  ◦ Analytics helps test the software's functionality, performance, and usability.
  ◦ Examples: test case prioritization, test automation optimization, defect prediction, regression testing analysis, user feedback analysis, test coverage analysis, and continuous improvement.
• Deployment
  ◦ Analytics helps deploy the software product or system to the target environment, platform, or users.
  ◦ Examples: release planning, choosing the most suitable deployment times, performance monitoring, CI/CD, rollback decision support, infrastructure scaling, deployment optimization, and post-deployment metrics such as error rate and response time.
• Maintenance
  ◦ Analytics helps monitor and measure software performance, usability, and user satisfaction.
  ◦ Analytics also helps maintain and update the software product or system, along with its data sets, models, and dashboards, as needs and feedback change.
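Test case prioritization, listed under Testing above, can be sketched in a few lines. The example below orders tests by how many failures they have historically revealed per second of runtime. The `CaseHistory` type, the scoring rule, and the sample numbers are all assumptions made for illustration, not a standard algorithm.

```python
from dataclasses import dataclass


@dataclass
class CaseHistory:
    name: str
    runs: int          # how many times the test has been executed
    failures: int      # how many of those runs failed
    duration_s: float  # average runtime in seconds


def priority(case: CaseHistory) -> float:
    """Historical failure rate per second of runtime (higher runs first)."""
    failure_rate = case.failures / case.runs if case.runs else 0.0
    return failure_rate / max(case.duration_s, 1e-9)  # guard against zero


# Hypothetical execution history (illustrative numbers only).
history = [
    CaseHistory("test_checkout", runs=150, failures=30, duration_s=10.0),
    CaseHistory("test_export", runs=50, failures=0, duration_s=5.0),
    CaseHistory("test_login", runs=200, failures=20, duration_s=2.0),
    CaseHistory("test_search", runs=300, failures=6, duration_s=0.5),
]

ordered = sorted(history, key=priority, reverse=True)
for case in ordered:
    print(f"{case.name}: {priority(case):.3f}")
```

Running the fast, failure-prone tests first tends to surface regressions earlier in a build; real prioritization schemes usually also weigh code changes and coverage.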



Illustration



Challenges in Data Analytics for Software Engineering
• Data quality issues
  ◦ The data may be incomplete, inconsistent, inaccurate, or outdated.
  ◦ This undermines the validity and reliability of the analysis results and can lead to wrong decisions or actions.
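A minimal sketch of how such problems might be flagged before analysis, assuming a small issue-tracker export. The record layout, the set of valid statuses, and the one-year staleness threshold are hypothetical choices for this example.

```python
from datetime import date

# Hypothetical issue-tracker export (illustrative records only).
issues = [
    {"id": 1, "status": "open", "severity": "high", "updated": date(2024, 6, 1)},
    {"id": 2, "status": "Closed", "severity": None, "updated": date(2024, 5, 20)},
    {"id": 3, "status": "open", "severity": "low", "updated": date(2022, 1, 15)},
    {"id": 4, "status": "reopnd", "severity": "medium", "updated": date(2024, 6, 10)},
]

VALID_STATUSES = {"open", "closed", "reopened"}
REFERENCE_DATE = date(2024, 6, 23)  # fixed "today" so the result is reproducible
MAX_AGE_DAYS = 365                  # assumed staleness threshold


def quality_problems(record: dict) -> list[str]:
    """Return the data quality problems found in one record."""
    problems = []
    if any(value is None for value in record.values()):
        problems.append("incomplete")
    # Letter case is normalized first, so "Closed" counts as valid.
    if str(record["status"]).lower() not in VALID_STATUSES:
        problems.append("inconsistent status")
    updated = record["updated"]
    if updated is not None and (REFERENCE_DATE - updated).days > MAX_AGE_DAYS:
        problems.append("outdated")
    return problems


report = {record["id"]: quality_problems(record) for record in issues}
print(report)
```

Inaccurate values are harder to detect automatically, since they usually require a second source of truth to compare against.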



Challenges in Data Analytics for Software Engineering
• Privacy concerns
  ◦ The data may contain sensitive or personal information that must be protected from unauthorized access or disclosure.
  ◦ This requires proper data governance and security measures to ensure compliance with ethical and legal standards.
• Interpretability challenges
  ◦ The analysis results may be complex, ambiguous, or difficult to understand or explain, which can hinder their communication to, and adoption by, stakeholders and users.
  ◦ This requires proper visualization and documentation techniques to make the results clear and actionable.
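One common security measure for the privacy concern above is to pseudonymize user identifiers before the data reaches analysts. The sketch below uses a keyed hash (HMAC-SHA256) from the Python standard library. The key, the event records, and the 16-character truncation are assumptions for illustration, and pseudonymization alone does not guarantee legal compliance.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would come from a secrets manager.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(user_id: str) -> str:
    """Replace an identifier with a stable keyed hash."""
    digest = hmac.new(SECRET_KEY, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # shortened for readability


# Hypothetical usage events (illustrative records only).
events = [
    {"user": "alice@example.com", "action": "login"},
    {"user": "bob@example.com", "action": "crash"},
    {"user": "alice@example.com", "action": "export"},
]

safe_events = [
    {"user": pseudonymize(event["user"]), "action": event["action"]}
    for event in events
]
print(safe_events)
```

Because the same input always maps to the same token, per-user analysis (for example, counting crashes per user) still works on the pseudonymized data.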



Applications in Software Engineering
• Bug prediction
  ◦ Identifies and prioritizes the software modules or components that are most likely to contain defects or failures.
  ◦ This helps allocate testing resources and effort more effectively and reduces the cost and time of debugging.
• Code optimization
  ◦ Optimizes the code for better performance, readability, maintainability, or security.
  ◦ Involves analyzing code complexity, structure, style, or dependencies and suggesting or applying improvements or refactorings.
• Project management
  ◦ Monitors and controls the software development process and lifecycle.
  ◦ Involves analyzing project progress, performance, risks, or issues and providing feedback or recommendations to the project team or stakeholders.
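Analyzing code complexity, mentioned under code optimization above, can be approximated with Python's built-in `ast` module. The sketch below counts decision points per function as a rough stand-in for cyclomatic complexity and ranks functions as refactoring or testing candidates. The sample source and the choice of node types are assumptions for the example, not a complete metric.

```python
import ast

# Hypothetical source code to analyze (illustrative only).
SOURCE = '''
def simple(a, b):
    return a + b

def branching(items, limit):
    total = 0
    for item in items:
        if item < 0:
            continue
        elif item > limit and item % 2 == 0:
            total += limit
        else:
            total += item
    while total > 100:
        total -= 10
    return total
'''

# Node types treated as decision points in this rough approximation.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.BoolOp)


def complexity(func: ast.FunctionDef) -> int:
    """Rough cyclomatic complexity: 1 + number of decision points."""
    return 1 + sum(isinstance(node, BRANCH_NODES) for node in ast.walk(func))


tree = ast.parse(SOURCE)
scores = {
    node.name: complexity(node)
    for node in ast.walk(tree)
    if isinstance(node, ast.FunctionDef)
}
ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
print(ranked)
```

Functions at the top of the ranking are plausible places to focus review or testing effort; a bug prediction model would typically combine such metrics with change history.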
Case Study: Bug Prediction
• Microsoft Research developed a machine learning model that predicts the number of bugs in a software project from code metrics and historical data.
• The model can also identify the most bug-prone files or functions in the code base.
• It was applied to several Microsoft products, such as Windows, Office, and Azure, where it achieved high accuracy and proved useful.
• The model helped developers and testers focus on the most critical parts of the code and reduce bug-fixing time and effort.
Source: https://medium.com/data-science-at-microsoft/common-data-engineering-challenges-and-their-solution-dd51872812ac
Case Study: Code Optimization
• This case study covers work done at Facebook on code optimization.
• Zoncolan is a static analysis tool that analyzes the code of Facebook's web applications and detects potential performance or security issues.
• The tool can also suggest or apply code optimizations to improve code quality and efficiency.
• It was applied to millions of lines of code across Facebook's web applications and found thousands of issues and opportunities for improvement.
• It helped developers and engineers enhance the performance and security of their web applications and reduce technical debt.



Software Practice for Big Data Analytics
• Big data analytics is the process of analyzing large and complex data sets using advanced tools and techniques, such as cloud computing, distributed systems, parallel processing, and machine learning.
• It can help businesses gain insights and value from their data that would otherwise be difficult or impossible to obtain.
• However, big data analytics also poses new challenges and requirements for software engineering, such as scalability, performance, reliability, and security.
• Software engineers therefore need to adopt new practices and methodologies to design and develop big data analytics platforms and applications.



Software Practice for Big Data Analytics
• Some best practices in software engineering for big data analytics are:
  ◦ Adopting an agile, iterative approach that allows for rapid prototyping, testing, and feedback.
  ◦ Using a software stack of tools and frameworks that support scalable data processing and analysis, such as Apache Spark (including PySpark and Spark MLlib), Apache Kafka, MongoDB, Elasticsearch, and scikit-learn.
  ◦ Applying software engineering principles and standards, such as modularity, reusability, documentation, and testing.
  ◦ Employing cloud services and resources, such as storage, compute, and networking, to enable elasticity, availability, and cost-efficiency.
  ◦ Implementing data governance and security measures, such as data quality controls, privacy protection, encryption, and authentication, to ensure compliance and trustworthiness.



Future Trends
• Data analytics and software engineering are both constantly evolving. Some future trends are:
• AI-driven development
• Advanced analytics
  ◦ Data analytics will become more advanced and sophisticated, using techniques such as natural language processing, computer vision, and deep learning.
  ◦ Data analytics will also become more accessible and democratized, allowing more users to perform data analysis without coding or technical skills.
• Data mesh
  ◦ Data mesh is a paradigm for data architecture that treats data as a decentralized, distributed asset.
  ◦ Data is owned, managed, and governed by the domain or business unit that produces it, rather than by a centralized data team or platform.
  ◦ It also makes data easily discoverable, accessible, and interoperable across the organization.



Software Data Repositories
• PROMISE Software Engineering Repository
• Software Engineering – Datasets
• Top Skills for US-Based Software Engineers Dataset
• A Versatile Dataset of Agile Open Source Software Projects
