Data Mining
Overview
Data Mining History & Background
Statistics
Artificial Intelligence
Machine Learning
How Data Mining Works
Steps in Data Mining
Data Mining Elements
Data, Information, and Knowledge
Advantages and Disadvantages of Data Mining
Advantages of Data Mining
Disadvantages of Data Mining
Sample Company that uses Data Mining
Conclusion
Overview
Data mining is the process of discovering knowledge from the data stored in data warehouses. It is a powerful technology that helps companies focus on the most important information in their data warehouses. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases. Data mining tools predict future trends and behaviors, allowing businesses to make practical, knowledge-driven decisions. These tools answer business questions that would otherwise be too time-consuming to resolve. The purpose is to identify valid, useful, and understandable patterns in data. Data mining involves a seven (7) step process:
Data Integration
Data Selection
Data Cleaning
Data Transformation
Data Mining
Pattern Evaluation
Knowledge Presentation
Companies have used powerful computers to sift through volumes of supermarket scanner data and analyze market research reports for years. However, continuous innovations in computer processing power, disk storage, and statistical software are dramatically increasing the accuracy of analysis while driving down the cost.
For example, one Midwest grocery chain used the data mining capacity of Oracle software to analyze local buying patterns. They discovered that when men bought diapers on Thursdays and Saturdays, they also tended to buy beer. Further analysis showed that these shoppers typically did their weekly grocery shopping on Saturdays. On Thursdays, however, they bought only a few items. The retailer concluded that they purchased the beer to have it available for the upcoming weekend. The grocery chain could use this newly discovered information in various ways to increase revenue. For example, it could move the beer display closer to the diaper display. It could also make sure beer and diapers were sold at full price on Thursdays.
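A pattern like the diapers-and-beer finding is usually quantified with association measures. The sketch below is a minimal illustration and not the grocery chain's actual method: the baskets are invented, and the function names `support`, `confidence`, and `lift` are our own labels for the standard measures.

```python
# Hypothetical shopping baskets, invented purely for illustration.
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimated share of baskets with the antecedent that also hold the consequent."""
    base = support(antecedent, baskets)
    if base == 0:
        return 0.0  # the antecedent never occurs, so the rule has no evidence
    return support(set(antecedent) | set(consequent), baskets) / base


def lift(antecedent, consequent, baskets):
    """Confidence relative to how common the consequent is overall."""
    base = support(consequent, baskets)
    if base == 0:
        return 0.0
    return confidence(antecedent, consequent, baskets) / base


print(support({"diapers", "beer"}, baskets))
print(confidence({"diapers"}, {"beer"}, baskets))
print(lift({"diapers"}, {"beer"}, baskets))
```

A lift above 1 suggests the two items appear together more often than chance alone would predict, which is the kind of signal that would justify moving the displays closer together.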
Data Mining History & Background
The history of data mining began about 30 to 40 years ago. It started as statistical analysis, promoted by two companies: SAS (Statistical Analysis System) and SPSS (an IBM company). Statistical techniques such as regression analysis, standard distribution, standard deviation, variance, cluster analysis, and confidence intervals are still important, but newer techniques add greatly to the power of the statistical routines. Methods such as fuzzy logic, heuristics, and neural networks arrived on the scene in the 1980s. These can be classified into two groups: artificial intelligence and machine learning. The first workshops on knowledge discovery in databases (KDD, another name for data mining) were held in the early 1990s. Data mining can therefore be said to have three roots: statistics, artificial intelligence, and machine learning.
Statistics
Statistics has long been at the core of data analysis and has made a significant contribution to the business intelligence sector. Nevertheless, statistics alone has not been able to produce all the outcomes expected under the complex business requirements of modern industries. The classical statistical model comprised concepts such as regression analysis, correlation analysis, standard distribution, variance, standard deviation, and cluster analysis. All of these techniques can be described as the study of data and their relationships in a static manner.
Artificial Intelligence
Artificial intelligence is another concept that has attracted research groups. Its appeal was that it uses heuristics, in contrast to statistics, and attempts to simulate the human thought process when solving statistical problems. Although this was an excellent theoretical concept, its requirement for high computational power made it impractical in the early 1980s, when it came into the limelight.
Machine Learning
Machine learning emerged as a union of the AI model and the classical statistical model. While AI was not a major commercial success, machine learning incorporated many of its concepts as computational power became cheap. Machine learning can be seen as an evolution of AI, since it combines the heuristic model of AI with advanced statistical analysis. It lets computers learn from the data they process and then applies what was learned through an advanced statistical model to achieve a goal. Data mining as it is defined today is believed to be about 10 to 15 years old. Below is a diagram that shows the roots of data mining.
[Diagram: the roots of data mining (Statistics, Artificial Intelligence, Machine Learning)]
now tailor your advertising efforts to suit their needs. By doing this, you will greatly increase your chances of earning a profit. Computer algorithms are frequently used in data mining programs; nevertheless, the factors that have led to the increasing popularity of data mining technologies are the increases in both processing power and storage. Graphical interfaces have also contributed to the rapid spread of data mining technology. These interfaces have made the programs easier to use, which has allowed them to be adopted by a larger segment of the population.
Artificial neural networks are a cutting-edge technology that is increasingly used in data mining applications. Unlike conventional rule-based programs, neural networks are nonlinear and are capable of learning from data. Neural networks are loosely modeled on the human brain and have powerful applications in data mining that have not been fully explored. Decision trees also play an important role in data mining programs. As the name implies, decision trees are structures made up of a number of different decisions. Each decision can be called a branch, and together the decisions define the rules for a given set of data. The next important element of data mining is rule induction, which extracts rules from data in the form of "if-then" statements. Another element is the genetic algorithm, which uses techniques based on mutation and natural selection. The last important element of data mining tools is the nearest neighbor method, which categorizes a record according to the records in the database that are most similar to it.
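The nearest neighbor idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the customer records, the two features (age and monthly spend), the labels, and the function name `knn_classify` are all hypothetical, and plain Euclidean distance is used without any feature scaling.

```python
import math
from collections import Counter


def knn_classify(query, records, k=3):
    """Label a query point by majority vote among its k most similar records.

    records is a list of (features, label) pairs; similarity is measured
    with Euclidean distance between the feature tuples.
    """
    nearest = sorted(records, key=lambda rec: math.dist(query, rec[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical customers: (age, monthly spend) with a spending-tier label.
customers = [
    ((25, 20), "low"),
    ((30, 35), "low"),
    ((28, 30), "low"),
    ((45, 80), "high"),
    ((50, 95), "high"),
]

print(knn_classify((48, 85), customers))  # two of the three nearest are "high"
print(knn_classify((27, 25), customers))  # all three nearest are "low"
```

In practice the features would normally be rescaled first, since a feature with a large numeric range would otherwise dominate the distance.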
There are a number of real-world applications of data mining programs. Generally, having information which is highly detailed will allow you to make predictions that are equally detailed. Using this detailed information to make predictions about the behavior of your customers can allow you to make large profits. Companies can use data mining tools to get answers to complex questions. For example, a credit card company that wants to increase its revenues could use data mining to find out if reducing the minimum payments would allow them to earn more interest. If the company has detailed information related to their customers, they should be able to make accurate predictions about how customers will react to policies.
sources.
2. Data Selection: We may not use all the data collected in the first step, so in this step we select only the data we consider useful for data mining.
3. Data Cleaning: The collected data are not clean and may contain errors, missing values, or noisy or inconsistent entries, so we apply different techniques to remove such anomalies.
4. Data Transformation: Even after cleaning, the data are not ready for mining; they must be transformed into forms appropriate for mining. Techniques used for this include smoothing, aggregation, and normalization.
5. Data Mining: We are now ready to apply data mining techniques to the data to discover interesting patterns. Clustering and association analysis are among the many techniques used.
6. Pattern Evaluation and Knowledge Presentation: This step involves visualization,
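Steps 2 to 4 above (selection, cleaning, and transformation) can be sketched as a small pipeline. This is only an illustration under assumptions: the records and field names are invented, cleaning is reduced to dropping records with missing values, and min-max normalization stands in for the transformation step.

```python
def select_fields(records, fields):
    """Data selection: keep only the fields considered useful for mining."""
    return [{f: rec.get(f) for f in fields} for rec in records]


def clean(records):
    """Data cleaning: drop any record that has a missing (None) value."""
    return [rec for rec in records if all(v is not None for v in rec.values())]


def min_max_normalize(values):
    """Data transformation: rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # all values equal, nothing to scale
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical integrated data; customer "B" has a missing spend value.
raw = [
    {"customer": "A", "spend": 120.0, "note": "call back"},
    {"customer": "B", "spend": None, "note": ""},
    {"customer": "C", "spend": 20.0, "note": ""},
    {"customer": "D", "spend": 70.0, "note": "vip"},
]

selected = select_fields(raw, ["customer", "spend"])
cleaned = clean(selected)
scaled = min_max_normalize([rec["spend"] for rec in cleaned])
print([rec["customer"] for rec in cleaned], scaled)
```

Dropping incomplete records is the simplest cleaning policy; filling in missing values is a common alternative when too much data would otherwise be lost.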
Data Mining Elements
Extract, transform, and load transaction data onto the data warehouse system.
Store and manage the data in a multidimensional database system.
Provide data access to business analysts and information technology professionals.
Analyze the data by application software.
Present the data in a useful format, such as a graph or table.
Data, Information, and Knowledge
Data
Operational or transactional data, such as sales, cost, inventory, payroll, and accounting.
Nonoperational data, such as industry sales, forecast data, and macroeconomic data.
Metadata: data about the data itself, such as logical database design or data dictionary definitions.
Information
The patterns, associations, or relationships among all this data can provide information. For example, analysis of retail point-of-sale transaction data can yield information on which products are selling and when.
Knowledge
Information can be converted into knowledge about historical patterns and future trends. For example, summary information on retail supermarket sales can be analyzed in light of promotional efforts to provide knowledge of consumer buying behavior. Thus, a manufacturer or retailer could determine which items are most susceptible to promotional efforts.
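The move from raw data to information can be illustrated with a small aggregation over point-of-sale records. The transactions, product names, and helper functions below are hypothetical and are meant only to show how "which products are selling and when" falls out of summarizing transactional data.

```python
from collections import defaultdict

# Hypothetical point-of-sale records: (day, product, units sold).
transactions = [
    ("Mon", "bread", 2),
    ("Mon", "milk", 1),
    ("Sat", "bread", 5),
    ("Sat", "beer", 4),
    ("Sat", "milk", 2),
    ("Mon", "bread", 1),
]


def units_by_product_and_day(transactions):
    """Summarize raw transactions into total units per (product, day)."""
    totals = defaultdict(int)
    for day, product, units in transactions:
        totals[(product, day)] += units
    return dict(totals)


def best_day(totals, product):
    """Return the day on which the given product sold the most units."""
    per_day = {day: units for (p, day), units in totals.items() if p == product}
    return max(per_day, key=per_day.get)


totals = units_by_product_and_day(transactions)
print(totals[("bread", "Mon")], totals[("bread", "Sat")])
print(best_day(totals, "bread"))
```

Comparing such summaries before and after a promotion is one simple way the information could then be turned into knowledge about buying behavior.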
companies. It helps retail companies offer discounts on particular products that will attract customers.
Useful and accurate trends in the purchasing behavior of an organization's customers are obtained.
The data gathered can be sanitized for different departments within the organization.
Data mining provides financial institutions with information about loans and credit reporting. By building a model from the data of previous customers with common characteristics, an institution can estimate which loans are good or bad and their level of risk. It also helps banks detect fraudulent credit card transactions so that cardholders avoid losses.
Information is available for future reference and can be used to project market trends.
Manufacturers can detect faulty equipment and determine optimal control parameters by applying data mining to operational engineering data.
Management decisions can be made to provide greater benefit to the staff and the organization.
Data mining helps government agencies by digging through and analyzing records of financial transactions to build patterns that can detect money laundering or criminal activity.
enormously increased concerns about personal privacy, and people are afraid that their personal information will be collected and used in unethical ways that may cause harm. Businesses collect information about their customers in many ways in order to understand trends in their purchasing behavior. However, businesses sometimes close down or are acquired by others, and at that point the personal information they hold may be sold or leaked to a third party.
Security Issues: Businesses hold information about their employees and customers, including social security numbers, birthdays, payroll, and other personal details, and whether this information is properly stored is a great concern. There have been many cases in which hackers accessed and stole large amounts of customer data from big corporations such as Ford Motor Credit Company, Sony, and others. With so much personal and financial information available, credit card theft and identity theft are big problems.
Misuse of information/inaccurate information: Information collected through data mining for marketing or other ethical purposes can be misused. It may be exploited by unethical people or businesses to take advantage of vulnerable people or to discriminate against a group of people. Data mining techniques are not perfectly accurate; if inaccurate information is used for decision-making, it can cause serious consequences.
Digicel is a sample company that uses data mining. After 11 years of operation, Digicel Group Limited has over 12.8 million customers across its thirty-two markets in the Caribbean, Central America and the Pacific. The company is renowned for delivering best value, best service and best network.
Digicel is the lead sponsor of Caribbean, Central American and Pacific sports teams, including the Special Olympics teams throughout these regions. Digicel sponsors the West Indies cricket team and is also the title sponsor of the Digicel Caribbean Cup. In the Pacific, Digicel is the proud sponsor of several national rugby teams and also sponsors the Vanuatu cricket team.
Digicel also runs a host of community-based initiatives across its markets and has set up Digicel Foundations in Jamaica, Haiti and Papua New Guinea which focus on educational, cultural and social development programs.
It is extremely important for data to be captured and stored for all the markets in which Digicel operates, so that management decisions can be made on a daily basis. Three (3) software applications are used in data mining, alongside other applications: Sales Force, Customer Relation Manager (CRM), and E-care. These applications capture relevant data from customers and staff for various departments. Several reports are generated from the data acquired and from job cards, and decisions are made based on these reports.
Conclusion
In the short term, the results of data mining will be in profitable, if mundane, business-related areas. Micro-marketing campaigns will explore new niches. Advertising will target potential customers with new precision. In the medium term, data mining may be as common and easy to use as email. We may use these tools to find the best airfare to New York, root out the phone number of a long-lost classmate, or find the best prices on lawn mowers. In the long term, data mining can inform decisions about the sustainability, continuity, and profitability of an organization, and can be used to project revenue growth and investments. Imagine intelligent agents turned loose on medical research data or on sub-atomic particle data. Computers may reveal new treatments for disease or new insights into the nature of the universe.