T Assignment

The document provides details on the 6 phases of the CRISP-DM methodology for data mining projects: 1) Business Understanding to define objectives, 2) Data Understanding to collect and examine data, 3) Data Preparation to prepare data for modeling, 4) Modeling to select techniques and build models, 5) Evaluation to assess results and determine next steps, and 6) Deployment to implement results. Each phase consists of multiple tasks to comprehensively guide the data mining process from start to finish.

Uploaded by

ANURAG RAI
Copyright
© All Rights Reserved

1. Explain data mining applications.

2. Explain working of modeler.


3. Design all the phases of CRISP-DM

1. Data Mining Applications

Here is the list of areas where data mining is widely used −

Financial Data Analysis

Retail Industry

Telecommunication Industry

Biological Data Analysis

Other Scientific Applications

Intrusion Detection

Financial Data Analysis

Design and construction of data warehouses for multidimensional data analysis and data mining.

Loan payment prediction and customer credit policy analysis.

Classification and clustering of customers for targeted marketing.

Detection of money laundering and other financial crimes.

Retail Industry

Data mining has great application in the retail industry because the industry collects large amounts
of data on sales, customer purchasing history, goods transportation, consumption and services. It
is natural that the quantity of data collected will continue to expand rapidly because of the
increasing ease, availability and popularity of the web.

Data mining in the retail industry helps identify customer buying patterns and trends, which leads
to improved quality of customer service and better customer retention and satisfaction. Here is
a list of examples of data mining in the retail industry −

Design and Construction of data warehouses based on the benefits of data mining.

Multidimensional analysis of sales, customers, products, time and region.

Analysis of effectiveness of sales campaigns.

Customer Retention

Product recommendation and cross-referencing of items.
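As a rough illustration of the cross-referencing of items mentioned above, the sketch below counts how often pairs of items are bought together and reports each pair's support (the fraction of baskets containing both items). The transactions and the function name `pair_support` are hypothetical, and real retail systems typically use dedicated association-rule algorithms rather than this brute-force count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is one customer's basket.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk"},
]

def pair_support(baskets):
    """Return the fraction of baskets that contain each item pair."""
    counts = Counter()
    for basket in baskets:
        # Sort so each pair is counted under one canonical ordering.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n / len(baskets) for pair, n in counts.items()}

support = pair_support(transactions)
```

Pairs with high support are candidates for recommending one item when the other is purchased.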

Telecommunication Industry

Today the telecommunication industry is one of the most rapidly emerging industries, providing
various services such as fax, pager, cellular phone, internet messenger, images, e-mail, web data
transmission, etc. Due to the development of new computer and communication technologies,
the telecommunication industry is rapidly expanding. This is why data mining has
become very important for understanding the business.

Data mining in the telecommunication industry helps identify telecommunication patterns,
catch fraudulent activities, make better use of resources, and improve quality of service. Here is
a list of examples for which data mining improves telecommunication services −

Multidimensional Analysis of Telecommunication data.

Fraudulent pattern analysis.

Identification of unusual patterns

Multidimensional association and sequential patterns analysis.

Mobile Telecommunication services.

Use of visualization tools in telecommunication data analysis.

Biological Data Analysis

In recent times, we have seen tremendous growth in fields of biology such as genomics,
proteomics, functional genomics and biomedical research. Biological data mining is a very
important part of bioinformatics. Following are the aspects in which data mining contributes to
biological data analysis −

Semantic integration of heterogeneous, distributed genomic and proteomic databases.

Alignment, indexing, similarity search and comparative analysis of multiple nucleotide sequences.

Discovery of structural patterns and analysis of genetic networks and protein pathways.

Association and path analysis.

Visualization tools in genetic data analysis.

2. Explain working of modeler.

IBM SPSS Modeler is an extensive predictive analytics platform that is designed to bring
predictive intelligence to decisions made by individuals, groups, systems and the enterprise. By
providing a range of advanced algorithms and techniques that include text analytics, entity
analytics, decision management and optimization, SPSS Modeler can help you consistently make
the right decisions—from the desktop or within operational systems.

Access a variety of data sources such as data warehouses, databases, Hadoop distributions or
flat files to find hidden patterns in the data.

Deliver predictive, resource-aware and strategically aligned decisions to people and systems at
the point of impact almost instantly.

Put analytics in the hands of whoever will benefit from it, regardless of their statistical or
analytical background.

Solve your business problems with a single platform that is designed to handle simple
descriptive analysis all the way to the most complex optimization problems.

Analyse vast amounts of data in less time while fully using your existing IT investments with in-
database performance and minimized data movement.

Take advantage of an open platform that can be deployed in most environments and
integrated with other IBM solutions to bridge the gap between analytics and action.

3. Design all the phases of CRISP-DM

I. Business Understanding

Any good project starts with a deep understanding of the customer’s needs. Data mining
projects are no exception and CRISP-DM recognizes this. 

The Business Understanding phase focuses on understanding the objectives and requirements of
the project. Aside from the third task, the three other tasks in this phase are foundational
project management activities that are universal to most projects:

1. Determine business objectives: You should first “thoroughly understand, from a business
perspective, what the customer really wants to accomplish.”

2. Assess situation: Determine resource availability and project requirements, assess risks and
contingencies, and conduct a cost-benefit analysis.

3. Determine data mining goals: In addition to defining the business objectives, you should
also define what success looks like from a technical data mining perspective.

4. Produce project plan: Select technologies and tools and define detailed plans for each
project phase.

II. Data Understanding

Next is the Data Understanding phase. Adding to the foundation of Business Understanding, it
drives the focus to identify, collect, and analyse the data sets that can help you accomplish the
project goals. This phase also has four tasks:

1. Collect initial data: Acquire the necessary data and (if necessary) load it into your analysis
tool.

2. Describe data: Examine the data and document its surface properties like data format,
number of records, or field identities.

3. Explore data: Dig deeper into the data. Query it, visualize it, and identify relationships
among the data.

4. Verify data quality: How clean/dirty is the data? Document any quality issues.
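The "Describe data" and "Verify data quality" tasks above can be sketched with a few lines of standard-library Python. The sample CSV text, the field names, and the helper `describe` are all hypothetical; this is only one minimal way to record surface properties (record count, field names) and a simple quality issue (missing values per field).

```python
import csv
import io

# Hypothetical sample standing in for a real source file.
RAW = """customer_id,age,city
1,34,Pune
2,,Delhi
3,29,
"""

def describe(csv_text):
    """Summarize record count, field names, and missing values per field."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    fields = list(reader.fieldnames or [])
    # A value counts as missing when it is absent or blank.
    missing = {
        f: sum(1 for r in rows if not (r[f] or "").strip()) for f in fields
    }
    return {"records": len(rows), "fields": fields, "missing": missing}

summary = describe(RAW)
```

Here `summary["missing"]` gives a per-field count that could go straight into the data quality documentation.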

III. Data Preparation

A common rule of thumb is that 80% of the project is data preparation.

This phase, which is often referred to as “data munging”, prepares the final data set(s) for
modeling. It has five tasks:

1. Select data: Determine which data sets will be used and document reasons for
inclusion/exclusion.

2. Clean data: Often this is the lengthiest task. Without it, you’ll likely fall victim to garbage-in,
garbage-out. A common practice during this task is to correct, impute, or remove erroneous
values.

3. Construct data: Derive new attributes that will be helpful. For example, derive someone’s
body mass index from height and weight fields.

4. Integrate data: Create new data sets by combining data from multiple sources.

5. Format data: Re-format data as necessary. For example, you might convert string values that
store numbers to numeric values so that you can perform mathematical operations.
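Three of the tasks above (format, clean, construct) can be illustrated together. The following is a minimal sketch under stated assumptions: the records and field names (`height_m`, `weight_kg`, `bmi`) are hypothetical, and median imputation is just one of several reasonable ways to handle a missing value.

```python
from statistics import median

# Hypothetical raw records: numbers arrive as strings, one weight is missing.
raw = [
    {"height_m": "1.80", "weight_kg": "81"},
    {"height_m": "1.60", "weight_kg": ""},
    {"height_m": "1.70", "weight_kg": "60"},
]

def prepare(records):
    # Format data: convert numeric strings to floats; blanks become None.
    rows = [{k: float(v) if v else None for k, v in r.items()} for r in records]
    # Clean data: impute missing weights with the median of the known weights.
    known = [r["weight_kg"] for r in rows if r["weight_kg"] is not None]
    fill = median(known)
    for r in rows:
        if r["weight_kg"] is None:
            r["weight_kg"] = fill
        # Construct data: derive body mass index = weight / height squared.
        r["bmi"] = r["weight_kg"] / r["height_m"] ** 2
    return rows

prepared = prepare(raw)
```

After this step every record has numeric fields and a derived `bmi` attribute, so the data set is ready to be handed to a modeling tool.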

IV. Modeling

What is widely regarded as data science’s most exciting work is also often the shortest phase of
the project.

Here you’ll likely build and assess various models based on several different modeling
techniques. This phase has four tasks:

1. Select modeling techniques: Determine which algorithms to try (e.g. regression, neural net).

2. Generate test design: Depending on your modeling approach, you might need to split the data
into training, test, and validation sets.

3. Build model: As glamorous as this might sound, this might just be executing a few lines of
code like “reg = LinearRegression().fit(X, y)”.

4. Assess model: Generally, multiple models are competing against each other, and the data
scientist needs to interpret the model results based on domain knowledge, the pre-defined
success criteria, and the test design.
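The test design, build, and assess tasks above can be sketched end to end. To stay dependency-free, this example swaps the scikit-learn `LinearRegression` call quoted above for a hand-written single-feature least squares fit; the helper names (`split`, `fit_line`, `mean_squared_error`) and the data are hypothetical, and only a train/test split is shown, with no separate validation set.

```python
import random

def split(rows, test_fraction=0.25, seed=0):
    """Generate test design: shuffle, then hold out a fraction for testing."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]

def fit_line(points):
    """Build model: ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_squared_error(model, points):
    """Assess model: average squared prediction error on held-out points."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in points) / len(points)

# Hypothetical data lying exactly on y = 2x + 1.
data = [(float(x), 2.0 * x + 1.0) for x in range(8)]
train, test = split(data)
model = fit_line(train)
test_error = mean_squared_error(model, test)
```

In a real project, several candidate models would be fitted on the same training set and compared on the held-out error alongside the pre-defined success criteria.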

V. Evaluation

Whereas the Assess Model task of the Modeling phase focuses on technical model assessment,
the Evaluation phase looks more broadly at which model best meets the business needs and what to
do next. This phase has three tasks:

1. Evaluate results: Do the models meet the business success criteria? Which one(s) should we
approve for the business?

2. Review process: Review the work accomplished. Was anything overlooked? Were all steps
properly executed? Summarize findings and correct anything if needed.

3. Determine next steps: Based on the previous two tasks, determine whether to proceed to
deployment, iterate further, or initiate new projects.

VI. Deployment

“Depending on the requirements, the deployment phase can be as simple as generating a report
or as complex as implementing a repeatable data mining process across the enterprise.”

A model is not particularly useful unless the customer can access its results. The complexity of
this phase varies widely. This final phase has four tasks:

1. Plan deployment: Develop and document a plan for deploying the model.

2. Plan monitoring and maintenance: Develop a thorough monitoring and maintenance plan
to avoid issues during the operational phase (or post-project phase) of a model.

3. Produce final report: The project team documents a summary of the project which might
include a final presentation of data mining results.

4. Review project: Conduct a project retrospective about what went well, what could have
been better, and how to improve in the future.
