T Assignment
Retail Industry
Telecommunication Industry
Intrusion Detection
Design and construction of data warehouses for multidimensional data analysis and data mining.
Retail Industry
Data mining has great application in the retail industry because the industry collects large amounts of data on sales, customer purchasing history, goods transportation, consumption and services. The quantity of data collected will naturally continue to expand rapidly because of the increasing ease, availability and popularity of the web.
Data mining in the retail industry helps identify customer buying patterns and trends, which leads to better customer service and improved customer retention and satisfaction. Here is a list of examples of data mining in the retail industry:
Design and construction of data warehouses based on the benefits of data mining.
Customer Retention
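Identifying customer buying patterns often starts with counting which products are bought together. The sketch below is a minimal, dependency-free illustration of that idea: it computes pair support (the share of baskets containing both items), the quantity used in frequent-itemset mining. The transactions and the `min_support` threshold are hypothetical and not taken from any real retail dataset.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set is one customer's basket.
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"milk", "butter", "bread"},
]

def frequent_pairs(baskets, min_support=0.4):
    """Return item pairs whose support (share of baskets containing both
    items) is at least min_support, sorted by support, highest first."""
    pair_counts = Counter()
    for basket in baskets:
        # sorted() gives each pair a canonical order, e.g. ("bread", "milk")
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1
    n = len(baskets)
    frequent = {pair: count / n for pair, count in pair_counts.items()
                if count / n >= min_support}
    return dict(sorted(frequent.items(), key=lambda kv: kv[1], reverse=True))

pairs = frequent_pairs(transactions)
print(pairs)
```

Pairs with high support (here bread and milk, bread and butter) are candidates for product recommendations and cross-referencing of items; full association-rule mining would add measures such as confidence on top of this count.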
Telecommunication Industry
Today the telecommunication industry is one of the most rapidly emerging industries, providing various services such as fax, pager, cellular phone, internet messenger, images, e-mail and web data transmission. Due to the development of new computer and communication technologies, the telecommunication industry is expanding rapidly. This is why data mining has become very important for helping to understand the business.
Biological Data Analysis
In recent times, we have seen tremendous growth in fields of biology such as genomics, proteomics, functional genomics and biomedical research. Biological data mining is a very important part of bioinformatics. The following are the aspects in which data mining contributes to biological data analysis:
Alignment, indexing, similarity search and comparative analysis of multiple nucleotide sequences.
Discovery of structural patterns and analysis of genetic networks and protein pathways.
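Indexing and similarity search over nucleotide sequences can be illustrated with short substrings called k-mers. The sketch below builds a k-mer index, uses it to shortlist candidate sequences, and ranks them by Jaccard similarity of their k-mer sets. This is a simple swapped-in technique for illustration, not sequence alignment; the sequences, their IDs and `k=3` are hypothetical.

```python
from collections import defaultdict

def kmers(seq, k=3):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def build_index(sequences, k=3):
    """Map each k-mer to the IDs of the sequences that contain it."""
    index = defaultdict(set)
    for seq_id, seq in sequences.items():
        for kmer in kmers(seq, k):
            index[kmer].add(seq_id)
    return index

def search(query, sequences, index, k=3):
    """Rank sequences sharing at least one k-mer with the query
    by Jaccard similarity of their k-mer sets."""
    q = kmers(query, k)
    # The index shortlists candidates instead of scanning every sequence.
    candidates = set()
    for kmer in q:
        candidates |= index.get(kmer, set())
    scores = {}
    for seq_id in candidates:
        s = kmers(sequences[seq_id], k)
        scores[seq_id] = len(q & s) / len(q | s)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical sequences for illustration only.
db = {"seqA": "ACGTACGTGA", "seqB": "TTGACCATGA", "seqC": "ACGTACGAAA"}
idx = build_index(db)
hits = search("ACGTACGT", db, idx)
print(hits)
```

Here `seqB` shares no 3-mer with the query, so the index excludes it without computing a similarity score.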
IBM SPSS Modeler is an extensive predictive analytics platform that is designed to bring
predictive intelligence to decisions made by individuals, groups, systems and the enterprise. By
providing a range of advanced algorithms and techniques that include text analytics, entity
analytics, decision management and optimization, SPSS Modeler can help you consistently make
the right decisions—from the desktop or within operational systems.
Put analytics in the hands of whoever will benefit from it, regardless of their statistical or
analytical background.
Solve your business problems with a single platform that is designed to handle simple
descriptive analysis all the way to the most complex optimization problems.
Analyse vast amounts of data in less time while fully using your existing IT investments with in-database performance and minimized data movement.
Take advantage of an open platform that can be deployed in most environments and integrated with other IBM solutions to bridge the gap between analytics and action.
I. Business Understanding
Any good project starts with a deep understanding of the customer’s needs. Data mining projects are no exception, and CRISP-DM (the Cross-Industry Standard Process for Data Mining) recognizes this.
3. Determine data mining goals: In addition to defining the business objectives, you should
also define what success looks like from a technical data mining perspective.
4. Produce project plan: Select technologies and tools and define detailed plans for each
project phase.
II. Data Understanding
1. Collect initial data: Acquire the necessary data and, if required, load it into your analysis tool.
2. Describe data: Examine the data and document its surface properties like data format,
number of records, or field identities.
3. Explore data: Dig deeper into the data. Query it, visualize it, and identify relationships
among the data.
4. Verify data quality: How clean/dirty is the data? Document any quality issues.
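The describe and verify-quality tasks above can be sketched in a few lines of plain Python. This minimal illustration assumes the collected data is a list of records (dictionaries) and treats `None` and empty strings as missing; the field names and values are hypothetical, and in practice a library such as pandas would usually do this work.

```python
# Hypothetical records, as they might look after the "collect initial data" task.
records = [
    {"id": 1, "age": 34, "region": "North", "income": 52000},
    {"id": 2, "age": None, "region": "South", "income": 61000},
    {"id": 3, "age": 29, "region": "", "income": None},
    {"id": 4, "age": 41, "region": "North", "income": 58000},
]

MISSING = (None, "")  # values treated as missing in this sketch

def describe(rows):
    """Describe data: record count, field names, and value types seen per field."""
    fields = sorted({key for row in rows for key in row})
    types = {
        f: sorted({type(row[f]).__name__ for row in rows if row.get(f) not in MISSING})
        for f in fields
    }
    return {"records": len(rows), "fields": fields, "types": types}

def missing_counts(rows):
    """Verify data quality: number of missing values per field."""
    fields = sorted({key for row in rows for key in row})
    return {f: sum(1 for row in rows if row.get(f) in MISSING) for f in fields}

print(describe(records))
print(missing_counts(records))
```

The missing-value counts are exactly the kind of quality issue this task asks you to document before moving on to data preparation.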
III. Data Preparation
This phase, which is often referred to as “data munging”, prepares the final data set(s) for modeling. It has five tasks:
1. Select data: Determine which data sets will be used and document reasons for
inclusion/exclusion.
2. Clean data: Often this is the lengthiest task. Without it, you’ll likely fall victim to garbage-in,
garbage-out. A common practice during this task is to correct, impute, or remove erroneous
values.
3. Construct data: Derive new attributes that will be helpful. For example, derive someone’s
body mass index from height and weight fields.
4. Integrate data: Create new data sets by combining data from multiple sources.
5. Format data: Re-format data as necessary. For example, you might convert string values that
store numbers to numeric values so that you can perform mathematical operations.
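The clean, construct, integrate and format tasks above can be illustrated with a small sketch in plain Python. The tables, the field names (`person_id`, `height_cm`, `weight_kg`) and the choice of median imputation are hypothetical; a real project would choose its cleaning rules based on the quality issues found during data understanding.

```python
from statistics import median

# Hypothetical source table: numbers arrive as strings and one weight is missing.
people = [
    {"person_id": "p1", "height_cm": "170", "weight_kg": "68"},
    {"person_id": "p2", "height_cm": "182", "weight_kg": ""},
    {"person_id": "p3", "height_cm": "160", "weight_kg": "55"},
]
visits = {"p1": 3, "p2": 1, "p3": 5}  # hypothetical second source, keyed by person_id

def to_float(value):
    """Format: convert a numeric string to float; empty values become None."""
    return float(value) if value not in ("", None) else None

def prepare(rows, visit_counts):
    # Format: build new dicts with numeric fields (the input rows are not modified).
    rows = [{**r, "height_cm": to_float(r["height_cm"]),
             "weight_kg": to_float(r["weight_kg"])} for r in rows]
    # Clean: impute missing weights with the median of the observed weights.
    fill = median(r["weight_kg"] for r in rows if r["weight_kg"] is not None)
    for r in rows:
        if r["weight_kg"] is None:
            r["weight_kg"] = fill
    # Construct: body mass index = weight (kg) / height (m) squared.
    for r in rows:
        r["bmi"] = round(r["weight_kg"] / (r["height_cm"] / 100) ** 2, 1)
    # Integrate: join the second source on person_id (0 if absent).
    for r in rows:
        r["visits"] = visit_counts.get(r["person_id"], 0)
    return rows

prepared = prepare(people, visits)
for row in prepared:
    print(row)
```

Formatting runs first here because the imputation and BMI steps need numeric values to work with.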
IV. Modeling
What is widely regarded as data science’s most exciting work is also often the shortest phase of
the project.
Here you’ll likely build and assess various models based on several different modeling
techniques. This phase has four tasks:
1. Select modeling techniques: Determine which algorithms to try (e.g. regression, neural net).
2. Generate test design: Depending on your modeling approach, you might need to split the data into training, test, and validation sets.
3. Build model: As glamorous as this might sound, it may just mean executing a few lines of code such as “reg = LinearRegression().fit(X, y)”.
4. Assess model: Generally, multiple models are competing against each other, and the data
scientist needs to interpret the model results based on domain knowledge, the pre-defined
success criteria, and the test design.
V. Evaluation
1. Evaluate results: Do the models meet the business success criteria? Which one(s) should we
approve for the business?
2. Review process: Review the work accomplished. Was anything overlooked? Were all steps
properly executed? Summarize findings and correct anything if needed.
3. Determine next steps: Based on the previous two tasks, determine whether to proceed to deployment, iterate further, or initiate new projects.
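Evaluating results against business success criteria becomes easier to audit when the criteria are written down as data and each candidate model is checked against them. The sketch below assumes hypothetical criteria (a minimum recall and a maximum cost per 1,000 predictions) and hypothetical metric values; it illustrates the idea and is not a prescribed CRISP-DM artifact.

```python
# Hypothetical success criteria agreed during Business Understanding:
# metric name -> (comparison, threshold)
criteria = {
    "recall": (">=", 0.70),
    "cost_per_1k_predictions": ("<=", 5.00),
}

# Hypothetical assessed metrics for each candidate model.
results = {
    "logistic_regression": {"recall": 0.72, "cost_per_1k_predictions": 1.20},
    "neural_net": {"recall": 0.81, "cost_per_1k_predictions": 7.50},
    "decision_tree": {"recall": 0.65, "cost_per_1k_predictions": 0.80},
}

CHECKS = {">=": lambda value, t: value >= t, "<=": lambda value, t: value <= t}

def failed_criteria(metrics):
    """Return the names of the criteria a model does not meet."""
    return [name for name, (op, threshold) in criteria.items()
            if not CHECKS[op](metrics[name], threshold)]

for model, metrics in results.items():
    failures = failed_criteria(metrics)
    print(model, "approve" if not failures else "reject: " + ", ".join(failures))

approved = [model for model, metrics in results.items() if not failed_criteria(metrics)]
```

Note that the model with the best technical score (the neural net's recall) is rejected on a business criterion, which is the distinction between assessing a model and evaluating results.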
VI. Deployment
“Depending on the requirements, the deployment phase can be as simple as generating a report
or as complex as implementing a repeatable data mining process across the enterprise.”
A model is not particularly useful unless the customer can access its results. The complexity of
this phase varies widely. This final phase has four tasks:
3. Produce final report: The project team documents a summary of the project which might
include a final presentation of data mining results.
4. Review project: Conduct a project retrospective about what went well, what could have
been better, and how to improve in the future.
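At its simplest, deployment means making a model's results available outside the analysis environment. The sketch below assumes a single-feature linear model with hypothetical parameter values and a hypothetical file name (`model.json`): it saves the parameters as JSON, reloads them in a separate scoring function, and builds a short plain-text summary report. Real deployments would add versioning, monitoring and maintenance on top of this.

```python
import json
import tempfile
from pathlib import Path

def save_model(params, path):
    """Persist model parameters as JSON so other systems can load them."""
    Path(path).write_text(json.dumps(params))

def load_and_score(path, xs):
    """Reload the saved parameters and score new inputs (y = slope * x + intercept)."""
    params = json.loads(Path(path).read_text())
    return [params["slope"] * x + params["intercept"] for x in xs]

def summary_report(params, metrics):
    """Produce a short plain-text report describing the deployed model."""
    lines = ["Model summary"]
    lines += [f"  {key}: {value}" for key, value in {**params, **metrics}.items()]
    return "\n".join(lines)

# Hypothetical parameters and metric, e.g. carried over from the modeling phase.
params = {"slope": 2.0, "intercept": 1.0}
metrics = {"test_mse": 0.35}

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.json"
    save_model(params, model_path)
    scores = load_and_score(model_path, [0.0, 10.0])

print(scores)  # [1.0, 21.0]
print(summary_report(params, metrics))
```

Separating saving from scoring mirrors the point above: the customer only benefits once the results can be accessed outside the modeling environment.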