IDS Unit - 5
Uploaded by Vrindapareek

Unit – 5

Q. 1 What is a Framework in Data Science?


A framework in data science is a set of software tools that helps you execute data
science techniques on your business data to get the insights that drive your
decisions.
Data science is the art of turning data into actions, and the overall framework
consists of the following 7 high-level steps:

Ask > Acquire > Assimilate > Analyze > Answer > Advise > Act

1. Asking Questions : Data science begins by asking questions, the answers to
which are then used to advise and act.
2. Acquiring and Assimilating Data : Once a series of questions has been asked,
a data scientist will try to acquire the required data and assimilate it into a
usable form.
3. Analyzing Data : Once the data has been collected and cleaned, we are ready
to start the analysis or conduct data mining.
4. Answering Questions with Data : Once we are able to build a model with the
desired performance, we can answer the questions with data using that model.
5. Advise and Act : Another very important part of data science is the advise
stage. After understanding the data, a data scientist has to provide
actionable advice.
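The seven steps above can be sketched as a simple pipeline of functions. This is only an illustrative sketch: the question, the sample records and the function names are invented for the example.

```python
# A minimal sketch of the Ask > Acquire > Assimilate > Analyze > Answer >
# Advise > Act framework. All data here is hard-coded for illustration.

def ask():
    # Step 1: frame the business question
    return "Which product line drives the most revenue?"

def acquire():
    # Step 2: pull raw records (here, an invented sample of sales)
    return [("A", 120), ("B", 300), ("A", 80), ("B", 150)]

def assimilate(raw):
    # Step 3: reshape the raw records into a usable form (totals per line)
    totals = {}
    for product, revenue in raw:
        totals[product] = totals.get(product, 0) + revenue
    return totals

def analyze(totals):
    # Step 4: analysis, here just finding the top product line
    return max(totals, key=totals.get)

def answer(question, best):
    # Step 5: answer the original question with data
    return f"{question} -> {best}"

def advise_and_act(best):
    # Steps 6 and 7: turn the finding into actionable advice
    return f"Increase marketing focus on product line {best}"

question = ask()
totals = assimilate(acquire())
best = analyze(totals)
print(answer(question, best))
print(advise_and_act(best))
```

In a real project each step would of course be far richer, but the shape of the flow is the same: each stage consumes the previous stage's output.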

Q. 2 What are the various components of a Data Science Project?

1. Purpose : Just as in the classic approach to project management, a goal or
purpose should always be formulated.
2. People : Various types of people with different skill sets play an important
role within a data science project. In order to work successfully with data,
developers, testers, data scientists and domain experts are essential.
3. Processes : There are two main types of processes within data science
projects: organizational and technical processes.
4. Platforms : Besides the above-mentioned factors, fundamental and strategic
questions about which platforms you will use for your analytics and products
are also critical for successfully managing a data science project.
5. Programmability : Finally, you want to think about which tools and
programming languages you want to use.

Q. 3 What is Process Evaluation?


1. Process evaluations can help decision-makers understand if their program
is running as expected.
2. They help to examine whether planned program operations and
achievements have taken place in accordance with an organization’s
expectations.
3. We would deploy a process evaluation to investigate whether aspects of a
program are working as planned.
4. It helps clients to identify which areas need close monitoring.
5. It also helps to anticipate whether a program will run smoothly when
scaled up.
6. They help teams identify where decisions or actions need to be updated
to achieve their analysis goals.

Q. 4 What methodology is used in a Data Science Project?


1. Business Understanding : Before solving any problem in the business domain,
the problem needs to be understood properly. Business understanding forms a
concrete base, which in turn leads to easy resolution of queries.

2. Analytic Understanding : Based on the business understanding, one should
decide the analytical approach to follow. The approaches can be of 4 types:
the descriptive, diagnostic, predictive and prescriptive approaches.

3. Data Requirements : The chosen analytical method indicates the necessary
data content, formats and sources to be gathered.

4. Data Collection : Collected data can arrive in any random format, so
according to the approach chosen and the output to be obtained, the collected
data should be validated.

5. Data Understanding : Data understanding answers the question "Is the data
collected representative of the problem to be solved?". Descriptive statistics
are applied over the data to assess its content and quality.

6. Data Preparation : Here, noise removal is done; if we do not need specific
data, we should not carry it into further processing. This process includes
transformation, normalization, etc.

7. Modelling : Modelling decides whether the data prepared for processing is
appropriate or requires further refinement.

8. Evaluation : Model evaluation is done during model development. It checks
the quality of the model and whether it meets the business requirements.

9. Deployment : The deployment phase checks how well the model withstands the
external environment and whether it performs better than alternatives.

10. Feedback : Feedback serves the necessary purpose of refining the model and
assessing its performance and impact.
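Two of the steps above, data understanding via descriptive statistics and data preparation via noise removal and normalization, can be illustrated with a short sketch using only Python's standard library. The sample readings are invented for the example.

```python
# A hedged sketch of "Data Understanding" and "Data Preparation":
# descriptive statistics to assess content and quality, then cleaning
# and min-max normalization. The values below are invented sample data.
import statistics

raw = [12.0, 15.5, 14.2, None, 13.8, 15.1]  # None marks a missing reading

# Data preparation: noise removal (drop the missing value)
clean = [x for x in raw if x is not None]

# Data understanding: descriptive statistics over the cleaned data
mean = statistics.mean(clean)
stdev = statistics.stdev(clean)
print(f"mean={mean:.2f}, stdev={stdev:.2f}")

# Data preparation: min-max normalization into the range [0, 1]
lo, hi = min(clean), max(clean)
normalized = [(x - lo) / (hi - lo) for x in clean]
print(normalized)
```

In practice a library such as pandas or scikit-learn would be used for these steps, but the underlying operations are the same.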
Q. 5 Case Study of Industry Use of Data Science.

1. Data Science Case Study – Spotify


Music plays an important role in the lives of people of almost all age
groups. We frequently listen to our favorite songs in our daily routine such
as while traveling, in leisure time, etc. to release our stress and relax.
Today, there are many music playing applications in the market.

You all might have heard the name “Spotify” at least once and most
probably, you might have even used it. So you must have observed that as
soon as we start using it on a regular basis, it starts giving us personalized
music recommendations and options to create customized playlists. This is
what people like about it.
But how does Spotify do all this? The answer is “data”.

At the core of these personalized services lies a large amount of user data
that Spotify, and in fact most music-playing applications, are using.

Spotify is using this data for optimizing their algorithms, improving user
music experience, providing targeted ads, and making some good business
strategies.

Spotify's main goal is to provide every user an experience so good that they
continue listening for hours. To achieve this, it uses many advanced data
science and machine learning techniques to extract insights from user data and
match the music taste of each individual customer.
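As a toy illustration of the idea (not Spotify's actual algorithm), a recommender can suggest tracks by finding the user with the most similar listening history and proposing what that user listens to. All user and track names here are hypothetical.

```python
# Toy collaborative-filtering sketch: recommend tracks from the most
# similar user's listening history. Users and tracks are invented.

listens = {
    "alice": {"song_a", "song_b", "song_c"},
    "bob":   {"song_b", "song_c", "song_d"},
    "carol": {"song_x", "song_y"},
}

def recommend(user):
    mine = listens[user]

    def jaccard(other):
        # Similarity of two users: shared listens over combined listens
        theirs = listens[other]
        return len(mine & theirs) / len(mine | theirs)

    # Find the most similar other user...
    best = max((u for u in listens if u != user), key=jaccard)
    # ...and recommend their tracks that this user has not heard yet
    return sorted(listens[best] - mine)

print(recommend("alice"))
```

Real systems work at a vastly larger scale and combine many signals (audio features, playlists, context), but similarity over listening histories is one of the core intuitions.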

2. Data Science Case Study – LinkedIn


LinkedIn is one of the most successful social media platforms, connecting
professionals across the globe. It also uses customer data to provide better
services and a customized user experience.

LinkedIn stores a large amount of user data, including details such as contact
information, previous history, interests, activities on different social
networking sites, etc., in its data warehouse in order to stay aware of trends
and patterns.
Using the insights gained from this user data, LinkedIn connects individual
users with their friends and with people related to their areas of interest.
These insights also inform decisions regarding the business.

According to the different trends, LinkedIn provides various articles and
other services that might match user interests. LinkedIn also enables users
to promote their business to the right people by making use of targeting.

Also, while using customers' data, LinkedIn makes sure that the data is
secure and that no scraping of data takes place from its site.

Q. 6 Different Data Science Project Methods

Method 1 : Scrum
This approach is the most widely used process framework for agile
development processes. Scrum emphasizes daily communication and
flexible reassessment of plans that are executed in a short period of time.

Method 2 : Kanban
This approach features managing/improving products with a focus on
continuous delivery without overloading the development team.

Method 3 : BEAM
BEAM (Business Event Analysis and Modelling) is an approach to agile
dimensional modelling, with the goal of aligning requirements analysis with
business processes rather than reports.

Q. 7 Challenges in Data Science Project Management

1. Data Quality : Assessing the quality of data is a crucial and fundamental
task in a data-driven project. Approaches to data quality can be chosen based
on certain requirements, such as user-centered and other organizational
frameworks.

2. Data Integration : In general, the method of combining data from various
sources and storing it together to get a unified view is known as data
integration. An organization with inconsistent data is likely to face data
integration issues.
3. Dirty Data : Data which contains inaccurate information is said to be dirty
data. Removing all dirty data from a dataset is virtually impossible, so
depending on the severity of the errors, strategies to work with dirty data
need to be implemented.

4. Data Uncertainty : Reasons for data uncertainty can range from measurement
errors to processing errors. Known and unknown errors, as well as
uncertainties, should be expected when using real-world data.

5. Data Transformation : Although the whole of the data can be transformed
into a usable form, some things can still go wrong in an ETL project, such as
an increase in data velocity or the time cost of fixing broken data
connections.
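One common strategy for working with dirty data is to validate each record against simple rules and quarantine the failures, rather than trying to remove every error from the dataset. A minimal sketch, with invented field names and validation rules:

```python
# Validate records and separate clean rows from dirty ones.
# The records, field names and rules below are illustrative only.

records = [
    {"name": "Ana", "age": "34"},
    {"name": "",    "age": "29"},   # dirty: missing name
    {"name": "Raj", "age": "abc"},  # dirty: non-numeric age
    {"name": "Mei", "age": "27"},
]

def is_valid(rec):
    # Rule 1: name must be non-empty; Rule 2: age must be numeric
    return bool(rec["name"]) and rec["age"].isdigit()

clean = [r for r in records if is_valid(r)]
dirty = [r for r in records if not is_valid(r)]
print(len(clean), len(dirty))  # prints "2 2"
```

Quarantined rows can then be repaired, re-collected or discarded depending on the severity of the errors, as the section above suggests.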
