DM 5th unit ppt

Uploaded by

Sandhya Rani
Copyright
© All Rights Reserved

Mining Time Series Data:

• A time series database consists of sequences of values or events
obtained over repeated measurements of time.
Ex: a reading taken every 2 minutes
• The values are typically measured at equal time intervals.
Ex: hourly, daily, weekly
• A time series database is a kind of "sequence database": a sequence
database stores ordered events, with or without explicit timestamps.
• Time series databases are popular in many applications, such as
- Stock market analysis
- Observation of natural phenomena such as atmosphere, temperature,
wind and earthquakes
- Scientific and engineering experiments
- Medical treatments
Mining Time Series Data:

• Every organization generates a high volume of data every single day,
for example:
- Sales figures
- Revenue
- Traffic or operating costs
• Time series data mining can generate valuable information for
long-term business decisions.
• Time series are very frequently plotted as line charts.
• Time series "forecasting" is the use of a model to predict future
values based on previously observed values.
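As a rough illustration of forecasting from previously observed values, the sketch below predicts the next value as the mean of the last few observations (a simple moving average). The function name `moving_average_forecast`, the window size and the sales figures are assumptions made for this example; the slides do not prescribe a particular model.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0 or len(series) < window:
        raise ValueError("need at least `window` observations")
    recent = series[-window:]
    return sum(recent) / window


# Hypothetical daily sales figures (illustrative data, not from the slides).
daily_sales = [120, 130, 125, 140, 150, 145]
next_day = moving_average_forecast(daily_sales, window=3)
print(next_day)  # mean of 140, 150 and 145
```

A moving average is only one of many possible models; it smooths out noise but cannot follow a trend or a seasonal pattern on its own.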
Strengths:
• Many well-established algorithms are available.
• Fast computation is possible.
Weaknesses:
• Forecasting a time series can be a very hard task because of the
inherent uncertainty of the underlying system.
• In some cases the input information is correct but the prediction is
still wrong, as in weather reports.
• Sometimes the past of the time series is not enough to predict the
future.
• Outliers are hard to deal with efficiently.
• Multiple periodicities are hard to deal with efficiently.
Applications of Time series Analysis:
• Economic forecasting
• Sales forecasting
• Budgetary forecasting
• Stock market Analysis
• Yield Projections
• Process and Quality Control
• Inventory Studies
• Work Load Projections
• Utility Studies
• Census Analysis
Examples of Time Series:

• Sales and profits of a company's product in different years.
• National income measured over the most recent 10 years.
• Monthly bank deposits and bank clearings.
• Daily sales of milk and milk products of a dairy over a month.
• Share prices on a stock exchange on each day of a week.
Spatial Data Mining:
It is the process of discovering potentially useful patterns from large
spatial datasets.
Spatial Database:
It stores a large amount of space-related data, such as maps,
preprocessed remote sensing or medical imaging data, and VLSI chip
layouts.
Eg: GIS data, ISRO and NASA data, radar data, etc.
Properties of Spatial Data Mining:
• Exploring new models
• New objective functions
• New patterns
Applications of Spatial Data Mining:
• Used in geographical modelling
• Used in analysis
• Used to provide business intelligence
• Used for research purposes
Spatial Data Mining Tasks:
The basic tasks of spatial data mining are:
• Classification
• Association Rules
• Characteristic Rules
• Discriminant Rules
• Clustering
• Trend Detection
1. Classification:
Finds a set of rules which determine the class of an object
according to its attributes.
2. Association Rules:
Finds rules from the database. Association rules describe patterns
which occur often in the database.
3. Characteristic Rules:
Describe some part of the database.
eg: "A bridge is an object in the place where a road crosses a river."
4. Discriminant Rules:
Describe differences between two parts of the database.
eg: find differences between cities with high and low unemployment rates.
5. Clustering:
Groups objects so that the objects in one cluster are similar and
objects from different clusters are dissimilar.
6. Trend Detection:
• Finds trends in the database.
• A trend is a temporal pattern in some time series data.
• A spatial trend is defined as a pattern of change of a non-spatial
attribute in the neighborhood of a spatial object.
Spatial Data Mining Techniques:
• There is no unique way of classifying SDM techniques.
• Various kinds of patterns can be discovered from databases and can be
presented in different forms.
SDM techniques are as follows:
1. Clustering & outlier detection
a) Partitioning Method
b) Hierarchical Method
c) Density-Based Method
d) Grid-Based Method
2. Association & co-location
3. Classification
4. Trend Detection
1. Clustering and Outlier Detection:
• Spatial clustering is the process of grouping a set of spatial objects into
groups called clusters.
• Objects within a cluster show a high degree of similarity, whereas
the clusters are as dissimilar from each other as possible.
• Clustering is a very well-known technique in statistics, and clustering
algorithms have been adapted to deal with large geographical datasets.
Clustering algorithms can be separated into four general categories:
a) Partitioning Method
b) Hierarchical Method
c) Density-Based Method
d) Grid-Based Method
a) Partitioning Method:
• A partitioning algorithm organizes the objects into clusters such that the
total deviation of each object from its cluster center is minimized.
• At the beginning the objects are assigned to an initial set of clusters, and
they are then relocated iteratively to improve the partitioning.
• K-means is a commonly used, fundamental partitioning algorithm.
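The partitioning idea can be sketched with a small K-means loop: assign each object to its nearest center, recompute each center as the mean of its members, and repeat until nothing changes. The helper names, the hand-picked initial centers and the sample points below are assumptions made for illustration only.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centers):
    """Put every point into the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)),
                      key=lambda i: squared_distance(p, centers[i]))
        clusters[nearest].append(p)
    return clusters


def k_means(points, initial_centers, max_iterations=100):
    centers = list(initial_centers)
    for _ in range(max_iterations):
        clusters = assign(points, centers)
        new_centers = []
        for old_center, members in zip(centers, clusters):
            if members:
                # New center = coordinate-wise mean of the cluster members.
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centers.append(old_center)  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return centers, assign(points, centers)


# Hypothetical 2-D objects forming two obvious groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, clusters = k_means(points, initial_centers=[(1, 1), (8, 8)])
print(centers)
print(clusters)
```

The result depends on the initial centers, which is why practical implementations usually try several starting configurations.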
b) Hierarchical Method:
• A hierarchical method decomposes the dataset by splitting or merging
clusters until a stopping criterion is met.
• Some well-known hierarchical clustering algorithms are
"Balanced Iterative Reducing and Clustering using Hierarchies" (BIRCH) and
"Clustering Using REpresentatives" (CURE).
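The merging variant of the hierarchical approach can be sketched as follows: every object starts as its own cluster and the two closest clusters are merged repeatedly until a stopping criterion is met. This is a plain single-link agglomerative sketch, not BIRCH or CURE; the stopping criterion (a target number of clusters), the function names and the 1-D sample values are assumptions.

```python
def single_link_distance(a, b):
    """Distance between two clusters = distance of their closest members."""
    return min(abs(x - y) for x in a for y in b)


def agglomerative(values, target_clusters):
    clusters = [[v] for v in values]  # start: one cluster per object
    while len(clusters) > target_clusters:  # stopping criterion
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_link_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters


# Hypothetical 1-D measurements.
result = agglomerative([1, 2, 10, 11, 25], target_clusters=3)
print(result)
```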
c) Density-Based Method:
• The method regards clusters as dense regions of objects that are separated by
regions of low density.
• In contrast to partitioning methods, clusters of arbitrary shape can be
discovered.
• Density-based methods can be used to filter out noise and outliers.
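A minimal sketch in the style of DBSCAN, one well-known density-based algorithm, shows how dense regions become clusters while isolated points are marked as noise. The parameter names `eps` and `min_pts`, the label `-1` for noise and the sample points are assumptions chosen for this example.

```python
def region_query(points, idx, eps):
    """Indices of all points within distance eps of points[idx] (itself included)."""
    px, py = points[idx]
    return [j for j, (qx, qy) in enumerate(points)
            if (px - qx) ** 2 + (py - qy) ** 2 <= eps ** 2]


def dbscan(points, eps, min_pts):
    noise = -1
    labels = [None] * len(points)  # None = not visited yet
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = noise  # may later be relabelled as a border point
            continue
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == noise:
                labels[j] = cluster_id  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbors)
        cluster_id += 1
    return labels


# Two dense groups and one isolated point (illustrative data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

The isolated point never reaches the `min_pts` threshold, so it keeps the noise label, which is how the method filters outliers.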
d) Grid-Based Method:
• Grid-based clustering algorithms first quantize the clustering space into a finite
number of cells and then perform the required operations on the grid
structure.
• Cells that contain more than a certain number of points are treated as dense.
• The main advantage of the approach is its fast processing time, since the time is
independent of the number of data objects and depends only on the number of
cells.
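The quantization step can be sketched in a few lines: map each point to the cell that contains it, count the points per cell, and keep the cells whose count exceeds a threshold. The function name `dense_cells`, the cell size and the threshold are assumptions for this illustration.

```python
from collections import Counter


def dense_cells(points, cell_size, threshold):
    """Return the grid cells that contain more than `threshold` points."""
    counts = Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in points)
    return {cell for cell, n in counts.items() if n > threshold}


# Illustrative 2-D points; cells are 2 x 2 units.
points = [(0.5, 0.5), (1.2, 1.8), (1.9, 0.1),
          (5.5, 5.5),
          (9.0, 9.0), (9.5, 9.9), (8.1, 8.2)]
dense = dense_cells(points, cell_size=2, threshold=2)
print(dense)
```

After this single pass over the data, any further clustering work (for example joining adjacent dense cells) touches only the cells, not the individual objects.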
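2. Association & co-location:
• When performing clustering methods on the data, we can find only
characteristic rules, which describe spatial objects according to their
non-spatial attributes.
• Association and co-location mining looks for rules that relate spatial
objects to other spatial objects in their neighborhood.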
3. Classification:
• Every data object stored in a database is characterized by its attributes.
• Classification is a technique whose aim is to find rules that describe the
partition of the database into an explicitly given set of classes.
4. Trend Detection:
• A spatial trend is a regular change of one or more non-spatial attributes
when spatially moving away from a start object.
• Spatial trend detection is a technique for finding patterns of attribute
changes with respect to the neighborhood of some spatial object.
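One simple way to quantify such a trend, assumed here purely for illustration, is to regress the non-spatial attribute on the distance from the start object: a clearly negative or positive slope indicates a regular change. The function name, the least-squares fit and the land-price figures are assumptions, not a method given in the slides.

```python
import math


def spatial_trend_slope(start, neighbors):
    """Least-squares slope of an attribute against distance from `start`.

    `neighbors` is a list of ((x, y), attribute_value) pairs and must contain
    objects at two or more different distances.
    """
    distances = [math.dist(start, position) for position, _ in neighbors]
    values = [value for _, value in neighbors]
    n = len(neighbors)
    mean_d = sum(distances) / n
    mean_v = sum(values) / n
    covariance = sum((d - mean_d) * (v - mean_v)
                     for d, v in zip(distances, values))
    variance = sum((d - mean_d) ** 2 for d in distances)
    return covariance / variance


# Hypothetical land prices around a city center at (0, 0).
city_center = (0, 0)
observations = [((1, 0), 90), ((2, 0), 80), ((0, 3), 70), ((4, 0), 60)]
slope = spatial_trend_slope(city_center, observations)
print(slope)  # negative: the price falls when moving away from the center
```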
Challenges of Web Mining:
a) Complexity of Web pages:
• Web pages do not have a unifying structure.
• They are extremely complicated compared to traditional text
documents.
b) The Web is a dynamic data source:
The data on the internet is updated quickly, for ex: news,
climate, shopping, financial news, sports, etc.
c) Diversity of client networks:
• The client network on the web is expanding quickly.
• These clients have different interests, backgrounds and usage purposes.
d) Relevancy of data:
• A specific person is generally concerned with only a
small portion of the web, while the rest of the web
contains data that is not useful to that user and may lead to unwanted results.
e) The web is too broad:
• The size of the web is tremendous and rapidly increasing.
• It appears that the web is too huge for data warehousing and data
mining.
Fig: Web Mining taxonomy. Top level: Web Content Mining, Web Usage Mining,
Web Structure Mining. Second level: Clustering, Classification, Association.

Text Mining:
It is the process of "extracting required data" from
large collections of documents from various sources.
Ex: news, articles, research papers, books, digital libraries, electronic
publications, e-mail messages, electronic documents, web pages,
etc.
Goal: Finding patterns and trends across multiple documents.
• Text mining is the part of data mining which involves processing of text
from documents.
• The text is used to "gather high-quality information".
• Computational linguistics principles are used to evaluate text.
Text Mining:
• In text mining, data is stored in an unstructured format.
• It is used in fields like bioscience and consumer profile analysis.
• Text mining is basically an AI technology that involves processing the
data from various text documents.
Text Mining Process:
a) Text Transformation:
• A text transformation is a technique that is used to control the
capitalization of the text.
• Two ways of representing a document are
- bag of words
- vector space
b) Text Preprocessing:
It is used for extracting useful information and knowledge from unstructured text
data.
c) Feature Selection:
• The process of reducing the input for processing, or of finding the essential
information sources.
• Feature selection is also called "variable selection".
d) Evaluate:
Computational linguistics principles are used to evaluate text.
Applications:
- Online library catalogue systems.
- Online library document management systems.
- Web search engines.
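The two representations can be sketched together: a bag of words counts how often each term occurs in a document (after lower-casing), and a vector-space representation lays those counts out over a shared vocabulary. The function names and the two sample documents are assumptions made for this example.

```python
from collections import Counter


def bag_of_words(text):
    """Lower-case the text and count each whitespace-separated term."""
    return Counter(text.lower().split())


def to_vector(bag, vocabulary):
    """Term counts in vocabulary order; terms absent from the bag count as 0."""
    return [bag[term] for term in vocabulary]


# Illustrative documents.
docs = ["Data mining finds patterns",
        "Text mining finds patterns in text"]
bags = [bag_of_words(d) for d in docs]
vocabulary = sorted(set().union(*bags))
vectors = [to_vector(b, vocabulary) for b in bags]

print(vocabulary)
print(vectors)
```

Both representations discard word order; they differ only in whether the counts are kept per document or aligned to a common vocabulary.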
Basic Measures:
• Precision
• Recall
• F-Score
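1. Precision:
$$\text{Precision} = \frac{|\{\text{Relevant}\} \cap \{\text{Retrieved}\}|}{|\{\text{Retrieved}\}|}$$
2. Recall:
$$\text{Recall} = \frac{|\{\text{Relevant}\} \cap \{\text{Retrieved}\}|}{|\{\text{Relevant}\}|}$$
3. F-Score:
$$\text{F-Score} = \frac{\text{Recall} \times \text{Precision}}{(\text{Recall} + \text{Precision}) / 2}$$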
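The three measures can be computed directly from the sets of relevant and retrieved documents. The function name and the document identifiers below are assumptions for this sketch; the F-Score is written as 2PR / (P + R), which is algebraically the same as dividing Recall × Precision by the average of the two.

```python
def precision_recall_f(relevant, retrieved):
    """Return (precision, recall, f_score) for two collections of document ids."""
    relevant, retrieved = set(relevant), set(retrieved)
    hits = len(relevant & retrieved)  # |Relevant ∩ Retrieved|
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Hypothetical query result: 4 relevant documents, 3 retrieved, 2 in common.
relevant_docs = {"d1", "d2", "d3", "d4"}
retrieved_docs = {"d3", "d4", "d5"}
precision, recall, f_score = precision_recall_f(relevant_docs, retrieved_docs)
print(precision, recall, f_score)
```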
Text Mining:
Fig: Relationship between the set of relevant documents and the set of
retrieved documents. Within "All Documents", the sets "Relevant Documents"
and "Retrieved Documents" overlap in the region "Relevant & Retrieved".
Text Retrieval Methods:
1. Document Selection Method:
Boolean retrieval model (AND / OR / NOT)
2. Document Ranking Method: The goal is to approximate the
degree of relevance of a document with a score computed from
information such as the frequency of words in the document and in
the whole collection.
Tokenization:
Stop list: regularly used terms such as a, the, for, with
Word stem: words that share a common root, e.g. long, longer, longest
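The two retrieval styles can be contrasted on a toy collection: Boolean selection returns the documents that satisfy a condition, while ranking orders documents by a score built from word frequencies in the document and in the whole collection. TF-IDF is used here as one common scoring choice; it is an assumption of this sketch, as are the function names and the sample documents.

```python
import math
from collections import Counter

# Illustrative collection (document id -> text).
docs = {
    "d1": "stock market analysis of stock prices",
    "d2": "weather analysis and forecasting",
    "d3": "stock forecasting with stock data",
}
tokens = {doc_id: text.split() for doc_id, text in docs.items()}


def boolean_and(terms):
    """Document selection: ids of documents containing every query term."""
    return {doc_id for doc_id, words in tokens.items()
            if all(term in words for term in terms)}


def tf_idf_score(query_terms, doc_id):
    """Sum over query terms of (term frequency) * log(N / document frequency)."""
    counts = Counter(tokens[doc_id])
    score = 0.0
    for term in query_terms:
        df = sum(1 for words in tokens.values() if term in words)
        if df:
            score += counts[term] * math.log(len(tokens) / df)
    return score


def rank(query_terms):
    """Document ranking: ids with a positive score, best first."""
    scored = [(tf_idf_score(query_terms, doc_id), doc_id) for doc_id in tokens]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in ordered if score > 0]


selected = boolean_and(["stock", "analysis"])
ranking = rank(["stock", "forecasting"])
print(selected)
print(ranking)
```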
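A small sketch of these steps: split the text into tokens, drop the terms on the stop list, and reduce the remaining words to a stem. The suffix-stripping rule in `naive_stem` is a deliberately crude, hypothetical stand-in for a real stemmer (for instance it would wrongly shorten "river"); the stop list is the one from the slide.

```python
STOP_WORDS = {"a", "the", "for", "with"}


def naive_stem(word):
    """Strip a comparative or superlative suffix (illustrative rule only)."""
    for suffix in ("est", "er"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word


def tokenize(text):
    """Lower-case, strip punctuation, remove stop words, then stem."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    return [naive_stem(w) for w in words if w and w not in STOP_WORDS]


terms = tokenize("The longest road, with a longer bridge.")
print(terms)
```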
Multimedia Mining
• Multimedia data mining is used for extracting interesting
information from multimedia data sets.
• Multimedia mining is a subfield of data mining which is used to find
interesting information or implicit knowledge in multimedia
databases.
Multimedia data includes:
• Audio data
• Video data
• Image data
• Graphical data
• Speech data
• Text data
Categories of Multimedia Data Mining:
Fig: Multimedia Data Mining is divided into Static Media (Text Mining,
Image Mining) and Dynamic Media (Audio Mining, Video Mining).
• Multimedia data mining is classified into two categories:
static media and dynamic media.
• Static media contains text (digital libraries, SMS and MMS) and
images (photos and media images).
• Dynamic media contains audio (music and MP3 sounds) and video (such as
movies).
Applications of Multimedia Mining:
• Digital Library
• Traffic video sequences
• Media Analysis
• Customer Perception
• Media making and broadcasting
• Mobiles
• Digital cameras
• Internet, etc.
Multimedia Data Mining Processing:
• Data collection is the initial stage of the learning system. Pre-processing
extracts significant features from the raw data; it includes data cleaning,
transformation, normalization, feature extraction, etc.
• Learning can be direct if informative types can be recognized at the
pre-processing stage.
• The complete process depends heavily on the nature of the raw data and on the
problem domain.
• The product of pre-processing is the training set.
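Two of the pre-processing steps, data cleaning and normalization, can be sketched as follows. Dropping records with missing values and min-max scaling to the range 0 to 1 are assumptions chosen for this illustration, as are the function names and the raw records; the slides do not fix a particular cleaning or normalization method.

```python
def clean(records):
    """Data cleaning: drop records that contain a missing value (None)."""
    return [r for r in records if None not in r]


def min_max_normalize(column):
    """Normalization: rescale a list of numbers linearly to the range [0, 1]."""
    low, high = min(column), max(column)
    if high == low:
        return [0.0 for _ in column]
    return [(v - low) / (high - low) for v in column]


# Hypothetical raw feature records (two numeric features each).
raw_data = [(10, 200), (20, None), (30, 400), (50, 300)]

cleaned = clean(raw_data)
columns = [min_max_normalize(list(col)) for col in zip(*cleaned)]
training_set = list(zip(*columns))  # the product of pre-processing
print(training_set)
```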
Multimedia Data Mining Processing:
Fig: Processing pipeline. Data Collection (Raw Data), then Data Preprocessing
(Data Cleaning, Feature Extraction, Feature Selection), then Training Set,
then Machine Learning, then Model.
Architecture of Multimedia Data Mining:
• The architecture has several components:
1. Input
2. Multimedia content
3. Spatiotemporal segmentation
4. Feature extraction
5. Finding similar patterns
Architecture of Multimedia Data Mining:
Fig: Architecture. Input, then Multimedia Contents (Text, Image, Audio, Video),
then Spatiotemporal Segmentation, then Feature Extraction (Text, Image, Audio,
Video), then Finding Similar Patterns, then Evaluation of Results.

Multimedia Data Mining:
a) Similarity search in multimedia data.
b) Multidimensional analysis of multimedia data.
c) Classification and prediction analysis of multimedia data.
d) Mining associations in multimedia data.
e) Image analysis, pattern recognition, digital image content mining.
f) Mining associations in multimedia data covers "associations between image
contents and non-image contents".
g) Associations among image contents related to spatial relationships.
Multimedia Data Mining:
Tool: MultiMediaMiner, an extension of the DBMiner system.
Procedure: Feature extraction produces descriptors; only the descriptions of
an image are stored.
• Layout descriptor: the image is overlaid with a grid (8*8, 4*4, ...); for
an 8*8 grid, 64 cells are stored.
Feature: Image Excavator extracts images using image context information such as
HTML tags.
• Keywords are organized in a hierarchy for searching.
• Multimedia is a combination of text, graphics, sound, animation and video that is
delivered interactively to the user by electronic or digitally manipulated
means.
Spatial Data Mining:
• The process of discovering interesting and previously unknown but
potentially useful patterns from large spatial databases.
• A spatial database stores a large amount of space-related data, such as maps and
preprocessed remote sensing or medical imaging data.
• A spatial database is a database system that is optimized to store and
query basic spatial objects.
• Ex: a point can represent a house, a city or a moving car.
