Unit V: Introduction to Data Mining (Harsha Patel)
Data mining involves extracting useful patterns from large data sets to help businesses make informed decisions. It allows organizations to obtain knowledge from data, make improvements, and aid decision making in a cost-effective manner. However, data mining tools can be difficult to use and may not always provide precise results. Knowledge discovery is the overall process of discovering useful information from data, which includes steps like data cleaning, integration, selection, transformation, and mining followed by pattern evaluation and presentation of knowledge.
This document discusses data mining and related topics. It begins by defining data mining as the process of discovering patterns in large datasets using methods from machine learning, statistics, and database systems. The document then discusses data warehouses, how they work, and their role in data mining. It describes different data mining functionalities and tasks such as classification, prediction, and clustering. The document outlines some common data mining applications and issues related to methodology, performance, and diverse data types. Finally, it discusses some social implications of data mining involving privacy, profiling, and unauthorized use of data.
Data mining involves analyzing large amounts of data to discover patterns that can be used for purposes such as increasing sales, reducing costs, or detecting fraud. It allows companies to better understand customer behavior and develop more effective marketing strategies. Common data mining techniques used by retailers include loyalty programs to track purchasing patterns and target customers with personalized coupons. Data mining software uses techniques like classification, clustering, and prediction to analyze data from different perspectives and extract useful information and patterns.
We give a brief description of the three mining techniques, outline the differences and similarities between them, and conclude with the techniques they share.
This document outlines the learning objectives and resources for a course on data mining and analytics. The course aims to:
1) Familiarize students with key concepts in data mining like association rule mining and classification algorithms.
2) Teach students to apply techniques like association rule mining, classification, cluster analysis, and outlier analysis.
3) Help students understand the importance of applying data mining concepts across different domains.
The primary textbook listed is "Data Mining: Concepts and Techniques" by Jiawei Han and Micheline Kamber. Topics that will be covered include introduction to data mining, preprocessing, association rules, classification algorithms, cluster analysis, and applications.
- Data mining is the process of discovering interesting patterns and knowledge from large amounts of data. It involves steps like data cleaning, integration, selection, transformation, mining, pattern evaluation and knowledge presentation.
- There are various types of data that can be mined, including database data, data warehouses, transactional data, text data, web data, time-series data, images, audio, video and others. Common data mining techniques include characterization, discrimination, clustering, classification, regression, and outlier detection. The goal is to extract useful patterns from data for tasks like prediction and description.
The document discusses data warehousing, data mining, and business intelligence. It defines data warehousing as a solution for fast analysis of information that operational systems cannot provide, due to limitations like unavailable historical data and poor query performance. It describes the architecture of data warehousing and lists databases, data warehouses, and transactional data as sources for data mining. The data mining process involves data collection, feature extraction, cleaning, and analytical algorithms. Common techniques are discussed as well. Business intelligence is defined as converting corporate data through processing and analysis into useful information and knowledge to trigger profitable business decisions.
UNIT 5: Data Warehousing and Data Mining (Nandakumar P)
UNIT-V
Mining Object, Spatial, Multimedia, Text, and Web Data: Multidimensional Analysis and Descriptive Mining of Complex Data Objects – Spatial Data Mining – Multimedia Data Mining – Text Mining – Mining the World Wide Web.
Additional themes of data mining for M.Sc. CS (Thanveen)
Data mining involves using computational techniques from machine learning, statistics, and database systems to discover patterns in large data sets. There are several theoretical foundations of data mining including data reduction, data compression, pattern discovery, probability theory, and inductive databases. Statistical techniques like regression, generalized linear models, analysis of variance, and time series analysis are also used for statistical data mining. Visual data mining integrates data visualization techniques with data mining to discover implicit knowledge. Audio data mining uses audio signals to represent data mining patterns and results. Collaborative filtering is commonly used for product recommendations based on opinions of other customers. Privacy and security of personal data are important social concerns of data mining.
Data mining involves extracting hidden predictive information from large databases. It uses techniques like neural networks, decision trees, visualization, and link analysis. The data mining process involves exploration of the data, building and validating models, and deploying the results. Popular data mining software packages include R, which is open source and flexible, and SAS Enterprise Miner, which has an easy to use interface and supports a variety of techniques.
Data mining is the process of extracting hidden predictive information from large databases to help companies understand their data. It involves collecting, storing, accessing, and analyzing data to identify patterns and trends. Common data mining techniques include neural networks, decision trees, visualization, link analysis, and clustering. The overall process involves exploration of the data, building and validating predictive models, and deploying the results. Popular data mining software packages include R, RapidMiner, SAS Enterprise Miner, and SPSS Modeler due to their ease of use, flexibility, and variety of algorithms.
This document discusses different methods for analyzing qualitative and quantitative user research data. It describes various techniques for simple quantitative analysis including calculating percentages, averages, and identifying patterns through graphical representations. The document also outlines three major theoretical frameworks that can be applied to qualitative data analysis: grounded theory, distributed cognition, and activity theory. Presenting findings may involve graphical representations, rigorous notations, stories, or summaries highlighting key results and statistics.
The document introduces various applications of data mining including finance, retail, telecommunications, biology, science, engineering, and intrusion detection. It then provides more details on data mining applications in each of these domains. For financial data analysis, it discusses constructing data warehouses, loan prediction, customer segmentation for marketing, and detecting money laundering. For retail, it discusses data warehouses, sales analysis, customer retention, and product recommendations. For telecommunications, it discusses network analysis, fraud detection, and improving resource usage.
This document introduces data mining. It defines data mining as the process of extracting useful information from large databases. It discusses technologies used in data mining like statistics and machine learning. It also covers data mining models and tasks such as classification, regression, clustering, and forecasting. Finally, it provides an overview of the data mining process and examples of data mining tools.
Data mining refers to extracting knowledge from large amounts of data and involves techniques from machine learning, statistics, and databases. A typical data mining system includes a database, data mining engine, pattern evaluation module, and graphical user interface. The knowledge discovery in data (KDD) process involves data cleaning, integration, selection, transformation, mining, evaluation, and presentation to extract useful patterns from data. KDD is the overall process while data mining is one step, applying algorithms to extract patterns for analysis.
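The KDD stages listed above (cleaning, selection, transformation, mining, evaluation/presentation) can be sketched as a toy pipeline. All data values and the "pattern" being mined are invented for illustration; a real system would use proper ETL tooling and mining algorithms.

```python
from collections import defaultdict

# Raw records, including one with a missing value to be cleaned out
raw = [
    {"age": 25, "income": 30000},
    {"age": None, "income": 45000},   # missing value -> dropped in cleaning
    {"age": 40, "income": 52000},
]

# 1. Data cleaning: drop records with missing values
cleaned = [r for r in raw if all(v is not None for v in r.values())]

# 2. Selection/transformation: keep relevant fields, rescale income to thousands
selected = [{"age": r["age"], "income_k": r["income"] / 1000} for r in cleaned]

# 3. Mining: a trivial "pattern" -- average income by age band
bands = defaultdict(list)
for r in selected:
    bands["under30" if r["age"] < 30 else "30plus"].append(r["income_k"])
patterns = {band: sum(v) / len(v) for band, v in bands.items()}

# 4. Evaluation/presentation: report the discovered pattern
print(patterns)
```

The point is the separation of stages: data mining proper is only step 3, while KDD is the whole pipeline.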
This document provides an introduction to data mining techniques. It discusses how data mining emerged due to the problem of data explosion and the need to extract knowledge from large datasets. It describes data mining as an interdisciplinary field that involves methods from artificial intelligence, machine learning, statistics, and databases. It also summarizes some common data mining frameworks and processes like KDD, CRISP-DM and SEMMA.
2. Data Mining Functionalities
• Data mining functionalities are used to specify the kind of patterns to be
found in data mining tasks.
• In general, data mining tasks can be classified into two categories:
descriptive and predictive.
a) Descriptive mining tasks characterize the general properties of the data
in the database.
b) Predictive mining tasks perform inference on the current data in order to
make predictions.
• A data mining system should be able to mine multiple kinds of patterns to
accommodate different user expectations or applications.
• Data mining systems should be able to discover patterns at various
granularities (i.e., different levels of abstraction).
• Data mining systems should also allow users to specify hints to guide or
focus the search for interesting patterns.
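The descriptive/predictive split above can be illustrated with a minimal sketch. The data, store names, and naive forecasting rule below are all hypothetical, invented purely to contrast the two kinds of task: a descriptive task summarizes the data as it is, while a predictive task infers an unseen value from current data.

```python
from statistics import mean

# Hypothetical daily sales figures for two stores.
sales = {"store_a": [120, 135, 110, 140], "store_b": [95, 100, 90, 105]}

# Descriptive task: characterize general properties of the data,
# here the average sales per store.
averages = {store: mean(figures) for store, figures in sales.items()}

# Predictive task: perform inference on the current data, here a naive
# forecast that tomorrow's sales equal the store's historical average.
def forecast(store):
    return averages[store]
```

Real predictive models (regression, classifiers) replace the naive average with a fitted function, but the division of labour is the same.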
3. Common Data Mining Tasks
• Anomaly detection (Outlier/change/deviation detection) – The
identification of unusual data records that might be interesting, or
data errors that require further investigation.
• Association rule learning (Dependency modelling) – Searches for
relationships between variables. For example, a supermarket might
gather data on customer purchasing habits. Using association rule
learning, the supermarket can determine which products are
frequently bought together and use this information for marketing
purposes. This is sometimes referred to as market basket analysis.
• Clustering – is the task of discovering groups and structures in the data
that are in some way or another "similar", without using known
structures in the data.
• Classification – is the task of generalizing known structure to apply to
new data. For example, an e-mail program might attempt to
classify an e-mail as "legitimate" or as "spam".
• Regression – attempts to find a function which models the data with
the least error.
• Summarization – providing a more compact representation of the data
set, including visualization and report generation.
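The market basket analysis mentioned above can be sketched in a few lines. The transactions and the 40% support threshold are hypothetical; the sketch counts how often each pair of items co-occurs and keeps the pairs whose support (fraction of transactions containing both items) clears the threshold.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions (each a set of items bought together).
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "eggs"},
]

# Count how often each unordered pair of items co-occurs.
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support of a pair = fraction of transactions containing both items.
support = {p: c / len(transactions) for p, c in pair_counts.items()}

# Keep pairs that appear in at least 40% of transactions.
frequent = {p: s for p, s in support.items() if s >= 0.4}
```

Full association rule algorithms such as Apriori extend this idea to itemsets of any size and add a confidence measure, but the support counting shown here is the core step.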
4. Mining Methodology and User
Interaction Issues
• Mining different kinds of knowledge in databases − Different users may be interested
in different kinds of knowledge. Therefore, data mining needs to cover a broad range
of knowledge discovery tasks.
• Interactive mining of knowledge at multiple levels of abstraction − The data mining
process needs to be interactive so that users can focus the search for patterns,
providing and refining data mining requests based on the returned results.
• Incorporation of background knowledge − Background knowledge can be used to
guide the discovery process and to express the discovered patterns, not only in
concise terms but also at multiple levels of abstraction.
• Data mining query languages and ad hoc data mining − A data mining query language
that allows the user to describe ad hoc mining tasks should be integrated with a data
warehouse query language and optimized for efficient, flexible data mining.
• Presentation and visualization of data mining results − Once patterns are
discovered, they need to be expressed in high-level languages and visual
representations that are easily understandable.
• Handling noisy or incomplete data − Data cleaning methods are required to handle
noise and incomplete objects while mining data regularities. Without such cleaning
methods, the accuracy of the discovered patterns will be poor.
• Pattern evaluation − The patterns discovered may be uninteresting because they
represent common knowledge or lack novelty; interestingness measures are needed
to evaluate the discovered patterns.
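The data cleaning issue above can be made concrete with a minimal sketch of mean imputation, one common way to handle incomplete data before mining. The sensor readings are hypothetical, and `None` stands in for a missing measurement; each gap is filled with the mean of the observed values so that downstream mining sees a complete series.

```python
from statistics import mean

# Hypothetical sensor readings with missing values recorded as None.
readings = [12.0, None, 14.0, 13.0, None, 11.0]

# Mean imputation: replace each missing value with the mean of the
# observed values.
observed = [r for r in readings if r is not None]
fill = mean(observed)
cleaned = [fill if r is None else r for r in readings]
```

Mean imputation is only one strategy; depending on the data, median imputation, interpolation, or simply dropping incomplete records may be more appropriate.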