DWH m2p2
1. Data Objects: Various names include record, point, vector, pattern, event, case,
sample, observation, or entity.
2. Attributes: Descriptive characteristics that define data objects, such as GPA, ID
number, or temperature.
5. Nominal Attributes:
o Represented by names or labels (e.g., eye color, gender).
o Operations: Equality and inequality (e.g., mode, contingency tables).
6. Ordinal Attributes:
o Values have a meaningful order (e.g., grades, rankings).
o Operations: Order comparison (e.g., median, rank correlation).
7. Interval Attributes:
o Meaningful differences between values but no true zero point (e.g., Celsius
temperature).
o Operations: Addition and subtraction (e.g., mean, standard deviation).
8. Ratio Attributes:
o Meaningful differences and ratios between values (e.g., age, length).
o Operations: Multiplication and division (e.g., geometric mean, percent
variation).
9. Discrete Attributes:
o Finite or countably infinite set of values (e.g., zip codes, counts).
o Can be categorical (e.g., binary attributes like true/false).
10. Continuous Attributes:
o Real-number values (e.g., temperature, height).
o Typically represented as floating-point variables for precision.
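The operations listed for the attribute types above determine which summary statistics are meaningful. A minimal sketch in Python (standard library only; the sample values are made up for illustration):

# Sketch: which summary statistics are meaningful for each attribute type.
import statistics

eye_color = ["brown", "blue", "brown", "green"]   # nominal
grades    = [1, 2, 2, 3, 5]                       # ordinal (rank-coded)
celsius   = [20.5, 22.0, 19.5, 21.0]              # interval
lengths   = [1.2, 2.4, 4.8]                       # ratio

print(statistics.mode(eye_color))          # mode: the only "average" valid for nominal data
print(statistics.median(grades))           # median: meaningful once values have an order
print(statistics.mean(celsius),            # mean and standard deviation require meaningful
      statistics.stdev(celsius))           # differences (interval scale)
print(statistics.geometric_mean(lengths))  # geometric mean requires a true zero (ratio scale)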
Record Data
Record data is structured as a collection of records (or data objects), where each record
consists of a fixed set of data fields (attributes). Common variations include transaction
(market-basket) data, the data matrix, and the sparse (document-term) matrix.
Graph-Based Data
Graphs are used to represent data where relationships between objects are crucial:
4. Data with Relationships Among Objects:
o Definition: Objects are nodes, relationships are edges with properties (e.g., direction, weight).
o Example: Web pages linked by hyperlinks; social networks with users as nodes and friendships as edges.
o Attributes: Nodes and edges can have additional properties.
o Representation: Enables network analysis and algorithms like PageRank for search engines (see the sketch after this list).
5. Data with Objects That Are Graphs:
o Definition: Objects themselves have internal structure represented as graphs.
o Example: Chemical compounds where atoms are nodes and bonds are edges.
o Attributes: Atomic properties and bond types.
o Representation: Used in chemical informatics for predicting compound properties.
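As referenced in item 4, a minimal sketch of graph-based data stored as an adjacency list, with a few simplified PageRank iterations (the link graph and damping factor are illustrative, not a production implementation):

# Sketch: a tiny link graph and a simplified PageRank computation.
links = {                      # node -> list of nodes it links to
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
}
d = 0.85                       # damping factor
rank = {page: 1 / len(links) for page in links}

for _ in range(20):            # fixed number of iterations for illustration
    new_rank = {}
    for page in links:
        incoming = sum(rank[p] / len(out) for p, out in links.items() if page in out)
        new_rank[page] = (1 - d) / len(links) + d * incoming
    rank = new_rank

print(rank)                    # higher rank ~ more "important" node in the network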
Ordered Data
6. Sequential Data:
o Definition: Extends record data with time-stamped records.
o Example: Retail transactions with timestamps; clickstream data on websites.
o Attributes: Includes time stamps or sequence positions.
o Representation: Analyzed for sequential patterns and temporal correlations.
7. Sequence Data:
o Definition: Similar to sequential data but without explicit timestamps, focusing on ordered sequences.
o Example: Genetic sequences represented by nucleotide sequences.
o Attributes: DNA or RNA bases (A, T, C, G).
o Representation: Used in bioinformatics for genome analysis and sequence alignment.
8. Time Series Data:
o Definition: Each record is a series of measurements taken over time.
o Example: Stock prices, temperature readings over months or years.
o Attributes: Time-indexed measurements.
o Representation: Analyzed for trends, seasonality, and temporal correlations (see the sketch after this list).
9. Spatial Data:
o Definition: Data with spatial attributes (positions or areas) and possibly other attributes.
o Example: Geospatial data like weather maps, satellite imagery.
o Attributes: Spatial coordinates and environmental variables (temperature, precipitation).
o Representation: Analyzed for spatial autocorrelation and geographical patterns.
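As referenced in item 8, a minimal sketch of smoothing a time series with a simple moving average so that the underlying trend stands out from the noise (the readings are synthetic):

# Sketch: a noisy upward trend smoothed with a 7-point moving average.
import random

readings = [20 + 0.1 * day + random.gauss(0, 1.5) for day in range(60)]

window = 7
smoothed = [
    sum(readings[i:i + window]) / window
    for i in range(len(readings) - window + 1)
]

print(readings[:5])   # raw, noisy values
print(smoothed[:5])   # smoothed values follow the trend more closely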
Data mining often deals with data collected for purposes other than mining itself, so
addressing data quality issues at the source is usually not feasible. Instead, data mining
focuses on detecting and correcting data quality problems and on using algorithms that can
tolerate poor data quality.
Data imperfections are common due to human error, limitations of measuring devices, or
flaws in the data collection process. Issues include missing values, duplicate data objects, and
inconsistent data. Data cleaning involves detecting and correcting these issues.
Measurement error refers to inaccuracies in recorded values compared to true values, while
data collection errors include omission or inappropriate inclusion of data objects. Errors can
be systematic or random.
Noise refers to random disturbances in data, while artifacts are deterministic distortions.
Techniques from signal or image processing are used to reduce noise, preserving underlying
patterns.
Precision is the closeness of repeated measurements to each other, bias is a systematic
deviation of the measurements from the true value, and accuracy is the closeness of
measurements to the true value. The number of significant digits reported should match the
precision of the data.
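A minimal sketch of how these three quantities could be computed from repeated measurements of a quantity whose true value is known (the measurements are made up):

# Sketch: precision, bias, and accuracy of repeated measurements.
import statistics

true_value = 1.000
measurements = [1.015, 1.012, 1.013, 1.014, 1.011]   # a scale that reads slightly high

precision = statistics.stdev(measurements)               # spread of repeated measurements
bias = statistics.mean(measurements) - true_value        # systematic deviation from the true value
mean_abs_error = statistics.mean(abs(m - true_value) for m in measurements)  # (in)accuracy

print(f"precision={precision:.4f}  bias={bias:+.4f}  mean abs error={mean_abs_error:.4f}")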
Outliers
Outliers are data objects or values significantly different from others in a dataset. They can be
of interest in anomaly detection tasks like fraud detection.
Missing Values
Some data objects may lack attribute values, impacting analysis. Strategies include
eliminating data with missing values, estimating missing values, or modifying analysis
methods to ignore missing values.
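A minimal sketch of two of these strategies, eliminating incomplete objects versus estimating (imputing) the missing value (the records are made up; None marks a missing value):

# Sketch: handling missing attribute values.
records = [
    {"age": 25, "income": 40000},
    {"age": None, "income": 52000},
    {"age": 31, "income": None},
]

# Strategy 1: eliminate objects with any missing value.
complete = [r for r in records if None not in r.values()]

# Strategy 2: estimate (impute) missing values, e.g. with the attribute mean.
ages = [r["age"] for r in records if r["age"] is not None]
mean_age = sum(ages) / len(ages)
imputed = [{**r, "age": r["age"] if r["age"] is not None else mean_age} for r in records]

print(complete)
print(imputed)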
Inconsistent Values
Inconsistencies arise when data values conflict with expected norms (e.g., negative height).
Detecting and correcting such issues often requires external validation or additional
redundant information.
Duplicate Data
Duplicates or almost duplicates of data objects may exist in datasets, requiring identification
and resolution to avoid inaccuracies in analysis results. Deduplication processes help manage
these issues effectively.
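A minimal sketch of one simple deduplication approach, keying records on a normalized identifying attribute (the records, and the choice of email as the key, are illustrative):

# Sketch: deduplication by a normalized key.
records = [
    {"email": "a@example.com", "name": "Ann"},
    {"email": "A@Example.com", "name": "Ann Smith"},   # same person, different formatting
    {"email": "b@example.com", "name": "Bob"},
]

deduped = {}
for r in records:
    key = r["email"].strip().lower()     # normalize before comparing
    deduped.setdefault(key, r)           # keep the first record seen for each key

print(list(deduped.values()))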
Conclusion
Understanding and addressing data quality issues are crucial for effective data mining and
analysis. Techniques for detecting, correcting, and managing data imperfections ensure
reliable results from data-driven applications.
Data quality is often defined by its suitability for the intended use, a perspective particularly
valuable in business and industry, as well as in statistics and experimental sciences where
data collection is tailored to specific hypotheses.
Timeliness
Data can lose relevance over time, especially when it reflects dynamic processes like
customer purchasing behavior or web browsing patterns. Outdated data leads to outdated
models and patterns.
Relevance
For effective modeling, data must include all necessary information. Omissions, such as
excluding driver age and gender from a model predicting accident rates, can severely impact
model accuracy unless indirect replacements exist.
Sampling Bias
Sampling bias arises when a sample doesn't accurately represent the full population, skewing
analysis results. For example, survey data may only reflect respondents' views, not the entire
population's views.
Data quality isn't just about accuracy and completeness; it's also about suitability for specific
applications. Timeliness, relevance, absence of bias, and comprehensive documentation are
crucial aspects that ensure data meets its intended purpose effectively.
2.3.2 Sampling
Sampling Approaches
Sampling is a widely used technique in both statistics and data mining for selecting a subset
of data objects to analyze. While statisticians and data miners have different motivations for
sampling, the goal remains consistent: to efficiently obtain insights from a subset of data that
represents the larger population.
Simple Random Sampling
Simple random sampling involves selecting data objects from a population where each object
has an equal probability of being chosen. There are two variations: sampling without
replacement, where each selected object is removed from the population, and sampling with
replacement, where selected objects stay in the population and may be chosen more than once.
In practical applications, the differences between these methods are minor when the sample
size is small relative to the population size. Sampling with replacement is often simpler to
implement and analyze statistically due to its consistent probabilities throughout the selection
process.
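A minimal sketch of both variations using the Python standard library (the population and sample size are illustrative):

# Sketch: simple random sampling without and with replacement.
import random

population = list(range(1000))
n = 10

without_replacement = random.sample(population, n)     # each object picked at most once
with_replacement = random.choices(population, k=n)     # objects may repeat

print(without_replacement)
print(with_replacement)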
Stratified Sampling
Stratified sampling is particularly useful when the population consists of distinct groups or
strata. Instead of sampling directly from the entire population, stratified sampling divides the
population into homogeneous subgroups called strata and samples proportionately from each
stratum. This approach ensures that each stratum is adequately represented in the sample,
which is crucial for maintaining the integrity of rare classes or groups within the data.
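A minimal sketch of proportionate stratified sampling, grouping objects by class label and drawing the same fraction from each stratum so that the rare class is still represented (the data and the sampling fraction are made up):

# Sketch: proportionate stratified sampling.
import random
from collections import defaultdict

data = [("fraud", i) for i in range(20)] + [("normal", i) for i in range(980)]
fraction = 0.05

strata = defaultdict(list)
for label, obj in data:
    strata[label].append((label, obj))

sample = []
for label, objects in strata.items():
    k = max(1, round(fraction * len(objects)))   # at least one object per stratum
    sample.extend(random.sample(objects, k))

print(len(sample), [s[0] for s in sample].count("fraud"))  # rare class still represented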
Progressive Sampling
Progressive sampling is an adaptive technique where the sample size increases gradually until
a representative sample is achieved. This method avoids the need to pre-determine an exact
sample size, which can be challenging in practice. The decision to stop sampling is often
based on achieving a desired level of representativeness or accuracy in the model being built.
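A minimal sketch of the idea, where evaluate() is a hypothetical stand-in for training and scoring a model on the current sample; the starting size, growth factor, and tolerance are illustrative:

# Sketch: progressive sampling that grows the sample until the score plateaus.
import random

def progressive_sample(population, evaluate, start=100, growth=2, tol=0.01):
    size = start
    prev_score = float("-inf")
    while True:
        size = min(size, len(population))
        sample = random.sample(population, size)
        score = evaluate(sample)
        # stop when the score stops improving or the whole population is used
        if score - prev_score < tol or size == len(population):
            return sample
        prev_score = score
        size *= growth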
Conclusion
Sampling plays a pivotal role in data mining by enabling efficient analysis of large datasets.
Whether through simple random sampling, stratified sampling, or adaptive progressive
sampling, the goal is to obtain a subset that accurately represents the characteristics of the
entire dataset. Each sampling method has its advantages and is chosen based on specific
objectives and constraints in data mining applications.
• Feature Extraction:
• Definition: Creating a new set of features from raw data to make it suitable for
classification algorithms.
• Example: Extracting features like edges or areas correlated with human faces from
photographs to classify images.
• Domain-specific: Techniques vary greatly across different domains (e.g., image
processing vs. financial data).
• Purpose: Transforming data to reveal hidden patterns (e.g., Fourier transform for time
series to expose frequency information).
• Example: Using Fourier transform to identify periodic patterns in time series data
despite noise.
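A minimal sketch of the Fourier-transform example above, assuming NumPy is available; the signal's period and noise level are made up:

# Sketch: exposing the dominant frequency of a noisy periodic signal with an FFT.
import numpy as np

n, period = 512, 32
t = np.arange(n)
signal = np.sin(2 * np.pi * t / period) + np.random.normal(0, 0.5, n)  # periodic + noise

spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(n, d=1.0)

dominant = freqs[np.argmax(spectrum[1:]) + 1]   # skip the zero-frequency (mean) bin
print(f"dominant period is about {1 / dominant:.1f} samples")  # close to 32 despite the noise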
• Feature Construction:
• Purpose: Creating new features that enhance the effectiveness of data mining
algorithms.
• Example: Constructing a density feature from mass and volume attributes of artifacts
to classify them by material type.
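A minimal sketch of constructing such a density feature (the artifact measurements are made up):

# Sketch: building a density feature from mass and volume.
artifacts = [
    {"mass_g": 540.0, "volume_cm3": 200.0},
    {"mass_g": 157.0, "volume_cm3": 20.0},
]

for a in artifacts:
    a["density_g_cm3"] = a["mass_g"] / a["volume_cm3"]   # new, more discriminative feature

print(artifacts)   # density separates materials (e.g. wood vs. metal) better than mass alone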
• Variable Transformation:
• Normalization or Standardization: Rescaling attribute values so they are comparable across
attributes, e.g., to zero mean and unit variance (z-score) or to a fixed range such as [0, 1].
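A minimal sketch of z-score standardization of a single attribute (the values are illustrative):

# Sketch: z-score standardization.
import statistics

incomes = [32000, 45000, 51000, 120000]
mu, sigma = statistics.mean(incomes), statistics.stdev(incomes)

standardized = [(x - mu) / sigma for x in incomes]   # roughly mean 0, standard deviation 1
print(standardized)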