Week 3
Data Preprocessing

Learning Objectives
Upon completion, you will be able to:

● Explain data pre-processing tasks.


● Illustrate methods to handle missing values and noisy data.
● Explain the importance of outlier removal and redundant data removal from datasets.
● List the methods for dimensionality reduction and numerosity reduction.
● Define data discretization and its methods.

● Explain data transformation and the importance of normalization.


● Demonstrate typical data pre-processing tasks in Python.

Data Quality and Data Format
Overview

Agenda
In this session, we will discuss:
● Concepts of Data Pre-processing:
○ Data Quality
○ Data Formats
○ Major Tasks in Data Pre-processing

Data Quality: Why Preprocess the Data?
Measures for data quality: A multidimensional view

● Accuracy: whether values are correct or incorrect, accurate or not.
● Completeness: not recorded, unavailable, missing values, important variables not included.
● Consistency: dangling references; some features are modified while others are not.
● Interpretability: how easily the data can be understood; codes as variable names or coded values, and nominal values can carry semantic ambiguity in the data.
● Timeliness: is the data updated in a timely manner?
● Believability: how much of the data is trusted, as perceived by the end user.
● Evaluate all of the above to assess the data’s fitness for the task.
Data Formats: Tidy Data
1. Each variable forms a column.

2. Each observation forms a row.

3. Each type of observational unit forms a table.


        Var 1   Var 2   …       …       Var n
Obs 1   2.3     34      Yes     123.45  0.3
Obs 2   3.6     23      No      567.34  0.7
Obs n   5.6     56      No      112.7   0.56

● Provides a standard way of structuring a dataset.


● Makes it easier to extract needed variables for analysis.

Data Formats: Wide Format vs Long Format
● “wide” format: consider variable “Math/English”

Name    Math    English
Anna    86      90
John    43      75
Cath    80      82

● “long” format: consider the variable “Subject”


Name    Subject    Grade
Anna    Math       86
Anna    English    90
John    Math       43
John    English    75
Cath    Math       80
Cath    English    82
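Below is a minimal pandas sketch of converting between the two formats; the DataFrame and column names follow the example tables above.

import pandas as pd

# Wide format: one row per student, one column per subject.
wide = pd.DataFrame({
    "Name": ["Anna", "John", "Cath"],
    "Math": [86, 43, 80],
    "English": [90, 75, 82],
})

# Wide -> long: each (student, subject) pair becomes its own row.
long = wide.melt(id_vars="Name", var_name="Subject", value_name="Grade")

# Long -> wide: pivot the Subject values back into columns.
wide_again = long.pivot(index="Name", columns="Subject", values="Grade").reset_index()

print(long)
print(wide_again)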
Data Pre-processing: Major Tasks
● Data cleaning
○ Handling missing values and noisy data, resolving inconsistencies, and identifying or removing outliers
● Data integration
○ Integration of multiple databases, data cubes, or files
● Data reduction

○ Dimensionality reduction (PCA)


○ Numerosity reduction
● Data transformation
○ Normalization
○ Data discretization
○ Concept hierarchy generation
Data Pre-processing: Major Tasks

Tasks                                     Methods
Missing values, Noisy data                Binning, Histogram analysis, Regression, Clustering, Classification
Outliers, Redundancy                      Box plots, Correlation/covariance
Dimensionality reduction                  PCA, Feature selection
Numerosity reduction                      Sampling, Data compression
Data discretization, Scale differences    Data Normalization, Concept hierarchy

Summary
In this session, we discussed:

● Data quality: format, accuracy, completeness, consistency, timeliness, believability, interpretability.


● Tidy data provides a standard way of structuring a dataset.
● Major pre-processing tasks: data cleaning, data integration, data reduction, and data transformation.

Tasks and Methods:
Missing Values and Noisy Data

Agenda
In this session, we will discuss:
● Different Tasks and Methods
○ Missing values
○ How to handle missing data?
○ Simple Linear Regression
○ Multiple Linear Regression

○ Noisy data

Missing Values
● Empty cells or cells filled with “NA”-like tokens.
● Semantics of missing data
○ An empty data cell could mean:
■ Value exists
● Value is available but not recorded due to human error, for example
○ Negative findings are left empty (e.g., negative for asymmetric binary variables)
● Value is not available (e.g., I don’t know my grandpa’s birthday)
■ Value does not exist:
● Absence of a value (I don’t have a middle name)
● Not applicable (I don’t have a tail)
○ Different semantics should be encoded as different values: NA (not applicable), Missing (applicable but not available), etc.
How to Handle Missing Data?
● Ignore the tuples with missing value
○ when the class label is missing (when doing classification)
○ not effective when the percentage of missing information varies greatly per attribute - resulting
in a large number of tuples not being included in analyses.
● Fill in the missing value manually: major feasibility issue

● Replace empty cells with “NA”, “Missing”, etc. For more, see https://support.datacite.org/docs/schema-values-unknown-information-v42

How to Handle Missing Data?
● Fill in automatically (imputation) with:
○ A global constant: e.g., NA. Not ideal but often done
○ The attribute mean/median/mode
○ The mean/median/mode for all data objects in the same class (smarter)
○ The most probable value: regression- or inference-based, such as Bayesian inference or a decision tree. Best, but is this problem-free?
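A small pandas sketch of these imputation options; the DataFrame and column names ("income", "segment") are invented for illustration.

import pandas as pd

df = pd.DataFrame({
    "income": [52000, None, 61000, None, 48000],
    "segment": ["A", "A", "B", "B", "A"],
})

# Global constant (not ideal, but often done)
df["income_const"] = df["income"].fillna(-1)

# Attribute mean / median
df["income_mean"] = df["income"].fillna(df["income"].mean())
df["income_median"] = df["income"].fillna(df["income"].median())

# Class-conditional mean: fill with the mean of objects in the same class/segment
df["income_by_class"] = df["income"].fillna(
    df.groupby("segment")["income"].transform("mean")
)

print(df)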

Simple Linear Regression
● A statistical method that summarizes and studies the relationships between two continuous
(quantitative) variables
○ Independent (predictor) variable: x = height
○ Dependent (response) variable: y = weight
● Goal: find the best straight line that fits the data
○ y = bx + a
● Method: find a and b that minimize the objective function, the sum of squared residuals Σi (yi − (b·xi + a))².

● How good is the fit? Coefficient of determination (R squared; 1 is the best fit), adjusted R squared.
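A short NumPy sketch of fitting y = bx + a by least squares and computing R²; the height/weight values are made up for illustration.

import numpy as np

height = np.array([50, 55, 60, 62, 65, 70])         # x (inches), illustrative values
weight = np.array([120, 135, 150, 158, 170, 185])   # y (lbs), illustrative values

# Least-squares fit of y = b*x + a
b, a = np.polyfit(height, weight, deg=1)

pred = b * height + a
residuals = weight - pred

# Coefficient of determination: R^2 = 1 - SS_res / SS_tot
r_squared = 1 - np.sum(residuals**2) / np.sum((weight - weight.mean())**2)

print(f"y = {b:.2f}x + {a:.2f}, R^2 = {r_squared:.3f}")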
Simple Linear Regression
[Scatter plot of weight (lbs) vs. height (inches) with the fitted line y = bx + a. The observed point (55, 100) lies below the line; ‘r’ marks its residual, the difference between the true value and the predicted value: r = 100 − 150 = −50.]
Multiple Linear Regression
● Multiple linear regression (more than one independent variable; X and β are vectors).
● Tips on choosing the best model:
○ http://blog.minitab.com/blog/adventures-in-statistics-2/how-to-choose-the-best-regression-model
● Use for:
○ missing values: use predicted values to replace missing values (see the sketch after this list).

○ data smoothing: use predicted values to replace original data.


○ data reduction: save only the function, parameters, and outliers (not the original data for the predicted dimensions).
○ outlier detection: identify (visualize) data that are far away from the predicted values.
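A hedged sketch of the first use above, replacing missing values with regression predictions; the DataFrame, the column names, and the choice of scikit-learn's LinearRegression are assumptions for illustration.

import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    "height": [55, 60, 62, 65, 70, 58],
    "age":    [20, 25, 30, 35, 40, 22],
    "weight": [100, 150, 160, None, 190, None],   # target with missing values
})

known = df[df["weight"].notna()]
unknown = df[df["weight"].isna()]

# Fit weight ~ height + age on the complete rows
model = LinearRegression().fit(known[["height", "age"]], known["weight"])

# Use predicted values to replace the missing ones
df.loc[df["weight"].isna(), "weight"] = model.predict(unknown[["height", "age"]])
print(df)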

Noisy Data: Noise
● Noise has two main sources:
○ Implicit inaccuracies caused by measuring devices
○ Random errors caused by human errors or other issues
● Noise can occur in attribute names and attribute values, including class labels


Noisy Data: How to Handle Noisy Data?
● Binning/Histogram analysis
○ First, sort data and partition it into (e.g., equal-frequency) bins.
○ Then smooth by bin means, smooth by bin median, or smooth by bin borders.
● Regression
○ Smooth by fitting the data to regression functions

● Clustering
○ Smooth data by cluster centres
○ Detect and remove outliers/errors

Noisy Data: How to Handle Noisy Data?
● Truncation
○ Truncate the least significant digits in a real number
● Combined computer and human inspection
○ Detect suspicious values and check by humans

Noisy Data: Smooth by Binning
● Divide sorted data into bins.
● Partitioning rules:
○ Equal-width: equal bin range
○ Equal-frequency (or equal-depth): equal # of
data points in the bins
● For data smoothing/discretization, replace data with the bin mean, median, etc., or the bin label.
● In effect, this also reduces the number of distinct data values (the cardinality of the variable).

Noisy Data: Equal-width binning
● Equal-width (interval) partitioning
○ Divides the range into N bins of equal intervals.
○ If A and B are the lowest and highest values of the attribute, the width of the intervals will be W = (B − A)/N.

○ In practice, the Freedman–Diaconis rule works well (among other rules): W = 2 × IQR × n^(−1/3), N = (B − A)/W

○ The most straightforward, but outliers may dominate the presentation.


○ Skewed data is not handled well.

Noisy Data: Equal-Depth Binning
● Equal-depth (count, frequency) partitioning
○ Divides the entire range into N bins of equal number of data points.
○ Good data scaling with varied bin width


Noisy Data: Example Equal-Depth Binning for Data Smoothing
● Sorted data for the price (in dollars): 4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34
○ Partition into equal-frequency (equi-depth) bins:
■ Bin 1: 4, 8, 9, 15
■ Bin 2: 21, 21, 24, 25
■ Bin 3: 26, 28, 29, 34
○ Smoothing by bin boundaries:
■ Bin 1: 4, 4, 4, 15
■ Bin 2: 21, 21, 25, 25
■ Bin 3: 26, 26, 26, 34
○ Smoothing by bin means:
■ Bin 1: 9, 9, 9, 9
■ Bin 2: 23, 23, 23, 23
■ Bin 3: 29, 29, 29, 29
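A short pandas sketch reproducing this example with equal-frequency binning (pd.qcut) and smoothing by bin means; note the exact bin means are 9, 22.75, and 29.25, which the slide rounds to 9, 23, and 29.

import pandas as pd

prices = pd.Series([4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34])

# Equal-frequency (equal-depth) partitioning into 3 bins
bins = pd.qcut(prices, q=3, labels=["Bin 1", "Bin 2", "Bin 3"])

# Smoothing by bin means: replace each value with the mean of its bin
smoothed = prices.groupby(bins).transform("mean")

print(pd.DataFrame({"price": prices, "bin": bins, "smoothed": smoothed}))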
Noisy Data: Clustering
● Partition continuous, discrete, or mixed datasets into clusters
based on similarity [distance].
○ There are many choices of distance functions, clustering
definitions, and clustering algorithms
● Can be used for smoothing noisy data, outlier detection, numerosity reduction, and data discretization.


○ Data smoothing/discretization: take cluster means,
median, etc.
○ Data reduction: store cluster representation only
○ Outlier detection: visualize data points far away
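A hedged sketch using scikit-learn's KMeans for smoothing by cluster centres and flagging far-away points; the synthetic data and the 3-standard-deviation distance threshold are illustrative assumptions.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two artificial clusters plus one far-away point
data = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(50, 2)),
    rng.normal(loc=[5, 5], scale=0.5, size=(50, 2)),
    [[12.0, -3.0]],   # likely outlier
])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)

# Data smoothing: replace each point by its cluster centre
smoothed = km.cluster_centers_[km.labels_]

# Outlier detection: flag points far from their cluster centre
dist = np.linalg.norm(data - smoothed, axis=1)
outliers = data[dist > dist.mean() + 3 * dist.std()]
print("flagged outliers:\n", outliers)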

Noisy Data: Clustering
● Can be very useful if the data is clustered, but it cannot be effective if the data is “splattered.”
● Can use hierarchical clustering and be stored in multi-dimensional index tree structures.
● A non-parametric method: no assumptions. Let the data tell the story.

Summary
In this session, we discussed,

● Empty cells or cells filled with “NA”-like tokens are referred to as missing data.
● Noisy Data can be implicit errors introduced by measurement tools, such as different types of
sensors, or random errors.
● There are different ways to handle missing data and noisy data, including various imputation methods and data smoothing methods.

Tasks and Methods:
Outliers and Data Redundancy

Agenda
In this session, we will discuss:
● Tasks and Methods
○ Outliers
○ Data Redundancy


Outliers: Outlier Detection
● Exploratory data analysis:
○ Data summary plots – boxplots
○ Histogram analysis
● Regression
○ Data that doesn’t fit the known distribution model are outliers.
● Clustering
○ Outliers form small and distant clusters or are not included in any cluster.
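A minimal sketch of the boxplot (1.5 × IQR) rule for flagging outliers; the data values are invented for illustration.

import pandas as pd

values = pd.Series([4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 120])   # 120 looks suspicious

q1, q3 = values.quantile(0.25), values.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = values[(values < lower) | (values > upper)]
print(outliers)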

Data Redundancy: Data Integration
● Data integration:
○ Data from multiple sources is combined into a coherent storage.
● Database schema integration
○ Challenging; requires carefully examining metadata that originates from various sources.

● Data redundancy, e.g., entity identification problem:


○ Identify real-world entities from a variety of data sources
● Detecting and resolving data value conflicts and scale differences.
○ Attribute values from different sources differ for the same real-world
item.
○ Possible reasons: different representations (e.g., date, GPA), different scales, e.g., metric vs. British units.
Data Redundancy: Handling Redundancy in Data Integration
● Redundant attributes may be detected by correlation analysis or covariance analysis.

● Redundant attributes should be removed

● Attributes that are correlated but not redundant should often be kept.

● Careful integration of data from various sources may aid in the reduction/avoidance of redundancies
and inconsistencies, as well as the improvement of mining speed and quality.

Data Redundancy: Correlation Analysis (Nominal Data)
                               Play chess [c1]   Not play chess [c2]   Sum (row)
Like science fiction [r1]      250 (90)          200 (360)             450
Not like science fiction [r2]  50 (210)          1000 (840)            1050
Sum (col.)                     300               1200                  1500 [n]

(Expected counts under independence are shown in parentheses.)

Data Redundancy: Chi-Square Calculation
                           Play chess   Not play chess   Sum (row)
Like science fiction       250 (90)     200 (360)        450
Not like science fiction   50 (210)     1000 (840)       1050
Sum (col.)                 300          1200             1500

● H0: A and B are not correlated. alpha = 0.001


● Χ² (chi-square) value calculation: Χ² = Σ (observed − expected)² / expected = (250 − 90)²/90 + (50 − 210)²/210 + (200 − 360)²/360 + (1000 − 840)²/840 ≈ 507.93

● Using the Χ2 table (next slide), we find the critical value=10.828 for the alpha and d.f.=1
● Χ2 > 10.828, reject H0, so A and B are correlated.
● Most tests will give you a p-value; if p-value < alpha, reject H0.
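A quick check of this example with SciPy's chi2_contingency; correction=False is passed so the 2×2 table is evaluated without Yates' continuity correction, matching the hand calculation.

import numpy as np
from scipy.stats import chi2_contingency

# Rows: like / not like science fiction; columns: play / not play chess
observed = np.array([[250, 200],
                     [50, 1000]])

chi2, p_value, dof, expected = chi2_contingency(observed, correction=False)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.3g}")
print("expected counts:\n", expected)
# chi2 is about 507.9, far above 10.828 (critical value at alpha = 0.001),
# so we reject H0: the two attributes are correlated.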

Data Redundancy: Critical Value Table
Critical values of the Chi-square distribution with d degrees of freedom. [Table not reproduced; for alpha = 0.001 and d.f. = 1, the critical value is 10.828.]

Data Redundancy: Correlation Analysis (Numeric Data)
● Correlation coefficient (also called Pearson’s product moment coefficient) [-1, 1]

r(A,B) = Σ(ai − Ā)(bi − B̄) / (n·σA·σB) = (Σ(ai·bi) − n·Ā·B̄) / (n·σA·σB)

○ where n is the number of tuples, Ā and B̄ are the respective means of A and B,
○ σA and σB are the respective standard deviations of A and B,
○ Σ(ai·bi) is the sum of the AB cross-product.

● If r(A,B) > 0, A and B are positively linearly correlated (A’s values increase as B’s do). The higher the value of r(A,B), the stronger the correlation.
● r(A,B) = 0: not linearly correlated; may still be associated in other ways.
● r(A,B) < 0: negatively linearly correlated.
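A small NumPy sketch computing r both from the formula above and with np.corrcoef; the data values are illustrative.

import numpy as np

A = np.array([2, 3, 5, 4, 6])
B = np.array([5, 8, 10, 11, 14])

# Pearson correlation via the formula above (population standard deviations, ddof=0)
r = ((A - A.mean()) * (B - B.mean())).sum() / (len(A) * A.std() * B.std())
print(r)                        # manual computation
print(np.corrcoef(A, B)[0, 1])  # same value from NumPy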

Data Redundancy: Visually Evaluating Correlation

[Scatter plots showing Pearson correlation coefficients ranging from −1 to 1.]

Data Redundancy: Covariance (Numeric Data)
● Covariance is similar to correlation:

Cov(A, B) = E[(A − Ā)(B − B̄)] = Σ(ai − Ā)(bi − B̄) / n = E(A·B) − Ā·B̄

Contrast with the correlation coefficient: r(A,B) = Cov(A, B) / (σA·σB)

○ where n is the number of tuples, Ā and B̄ are the respective means or expected values (E) of A and B, and σA and σB are the respective standard deviations of A and B.

Data Redundancy: Covariance (Numeric Data)
● Positive covariance: Cov(A, B) > 0, indicating A and B both tend to be larger than their expected values.

● Negative covariance: Cov(A, B) < 0, indicating the two variables change in different directions: one is larger and the other is smaller than its expected value.

● Independence: Cov(A, B) = 0, but the reverse is not true:

○ Some random variable pairings may have a covariance of zero but they are not independent. A
covariance of 0 implies independence only under certain additional conditions (for example,
the data have multivariate normal distributions).

Data Redundancy: Co-Variance: An Example

● Suppose two stocks A and B have the following values in one week: (2,5), (3, 8), (5, 10), (4,
11), (6, 14).
● Question: Do the prices of A and B rise or fall together?


● E(A) = (2 + 3 + 5 + 4 + 6)/5 = 20/5 = 4
● E(B) = (5 + 8 + 10 + 11 + 14)/5 = 48/5 = 9.6
● Cov(A,B) = E(A·B) − E(A)·E(B) = (2×5 + 3×8 + 5×10 + 4×11 + 6×14)/5 − 4 × 9.6 = 42.4 − 38.4 = 4
● Thus, A and B rise together since Cov(A, B) > 0.
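A quick NumPy check of this calculation; bias=True is passed to np.cov so it uses the population formula (division by n), as in the example above.

import numpy as np

A = np.array([2, 3, 5, 4, 6])
B = np.array([5, 8, 10, 11, 14])

cov_manual = np.mean(A * B) - A.mean() * B.mean()   # E(AB) - E(A)E(B)
cov_numpy = np.cov(A, B, bias=True)[0, 1]           # population covariance

print(cov_manual, cov_numpy)   # both 4.0 -> A and B tend to rise together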

Summary
In this session, we discussed,

● Outliers can be detected with boxplots, histogram analysis, regression, and clustering.


● Data redundancy occurs mostly because of data integration, and redundant attributes may be
detected by correlation or covariance analysis.
● Redundant attributes should be removed.

● Correlated attributes are often useful in mining tasks.

Tasks and Methods:
Dimensionality Reduction and Numerosity Reduction

Agenda
In this session, we will discuss:
● Tasks and Methods
○ Dimensionality reduction
○ Curse of Dimensionality and data sparseness
○ PCA – Principal Component Analysis
○ Numerosity reduction and random sampling methods

Data Reduction Strategies
● Data reduction: obtain a reduced representation of the data set that is much smaller in volume yet produces the same (or almost the same) analytical results.

● Why data reduction? — A database/data warehouse may store terabytes of data. Complex data
analysis may take a long time on the complete data set.

Data Reduction Strategies
● Data reduction strategies
○ Dimensionality reduction, e.g., removing or merging attributes
■ Principal Components Analysis (PCA).
■ Feature subset selection, feature creation
○ Numerosity reduction (reduce data volume, use smaller forms of data representation)
■ Regression

■ Histograms/binning, clustering, sampling


■ Data cube aggregation
○ Data compression

Dimensionality Reduction: Curse of Dimensionality
● Curse of dimensionality
○ When dimensionality of features in the dataset increases, data becomes increasingly sparse in
feature space.
○ Density and distance between points, which are important for grouping and outlier analysis,
become less relevant.
○ The number of possible subspace combinations will expand exponentially.
● Dimensionality reduction

○ Avoid the curse of dimensionality by reducing features


○ Dimensionality reduction helps eliminate irrelevant features and reduce noise.
○ Reduces the time and space required in data mining.
○ Makes the data easier to visualize.
● Dimensionality reduction techniques
○ Principal Component Analysis
○ Supervised techniques
○ Nonlinear techniques (e.g., feature selection)
Curse of Dimensionality: sparseness

● A single feature does not result in a perfect separation of our training data.
● Adding a second feature still does not result in a linearly separable classification problem.
● Adding a third feature results in a linearly separable classification problem in our training data.
Sparseness: More Training Data Needed

Sparseness -> Everything is Equal-Distanced


● With increased dimensionality, the hypersphere occupies only a very small portion of the search space; all training examples are essentially located in the corners.
● When dim -> infinity, all training examples are at the same distance from all other examples.
Principal Component Analysis (PCA): Numeric Data
● Finds the projection that captures the largest amount of variation in the data.
● The original data can be projected onto a much smaller space, which reduces dimensionality while keeping variability. We find the eigenvectors (“characteristic” vectors) of the covariance matrix, and these eigenvectors define the new space.

Principal Component Analysis (PCA)
• First three PCs capture 75% of original
variance based on loadings.
• Component values are weighted sum of
the original dimensions.
• Comp1 = 0.361*Sepal.Length +
0.867*Petal.Length + 0.358*Petal.Width
• Subsequent analysis will use the reduced representation/dimensions.
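A hedged sketch of PCA with scikit-learn on the Iris data; standardizing first is a common choice, and the exact loadings and variance shares depend on that preprocessing, so they may not match the slide's numbers exactly.

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_iris().data                        # 150 x 4 numeric matrix
X_std = StandardScaler().fit_transform(X)   # PCA is scale-sensitive

pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X_std)        # 150 x 3 reduced representation

print(pca.explained_variance_ratio_)   # variance captured by each component
print(pca.components_[0])               # loadings: weights of the original features in PC1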

Numerosity Reduction
● Reduce the size of data volume by choosing alternative smaller forms of data representation.
● Parametric methods (Example: regression)
○ Assume the data fits some model, estimate the model parameters, store only the parameters, and discard the data (except possible outliers).
● Non-parametric methods

○ Do not assume parameterized probability distributions.


○ Major families: histograms/binning, clustering, sampling, …

Sampling
● Obtaining a small sample “s” to represent the whole data set “N”.

● Also used in sampling training and test examples.

● Allow mining algorithms to run at a complexity that is possibly sub-linear to data size.

● Key principle: choose a representative subset of the data.



○ In skewed datasets, simple random sampling may perform poorly.

○ Develop adaptive sampling methods, e.g., stratified sampling.
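A small pandas sketch contrasting simple random sampling with stratified sampling on a skewed class column; the column name and sampling fraction are assumptions for illustration.

import pandas as pd

df = pd.DataFrame({
    "value": range(100),
    "label": ["rare"] * 10 + ["common"] * 90,   # skewed class distribution
})

# Simple random sampling: the rare class may be under- or over-represented
srs = df.sample(frac=0.2, random_state=0)

# Stratified sampling: sample the same fraction within each class
stratified = df.groupby("label", group_keys=False).sample(frac=0.2, random_state=0)

print(srs["label"].value_counts())
print(stratified["label"].value_counts())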

Summary
In this session, we discussed,

● Data reduction obtains a reduced representation of the data set that is much smaller in volume yet produces the same (or almost the same) analytical results.
● Data reduction can be done by:
○ Dimensionality reduction - It is the process of removing unimportant attributes.

○ Numerosity reduction - It reduces data volume; uses smaller forms of data representation.
○ Data compression
● Sampling is about obtaining a small sample s to represent the whole data set N.

Tasks and Methods:
Data Transformation

Agenda
In this session, we will discuss:
● Tasks and Methods
○ Data transformation: Normalization
○ Data discretization methods
○ Concept Hierarchy generation

Data Transformation
● Data are transformed or consolidated into forms appropriate for mining.
● Methods
○ Smoothing: Remove noise from data
○ Attribute / feature construction
■ New attributes constructed from the given ones

○ Aggregation: Data cube construction, summarization


○ Normalization: Scaled to fall within a smaller, specified range for more meaningful comparison
■ min-max normalization
■ z-score normalization
■ normalization by decimal scaling
○ Discretization: Concept hierarchy climbing
Normalization
● Min-max normalization: to [new_minA, new_maxA]

  v′ = ((v − minA) / (maxA − minA)) × (new_maxA − new_minA) + new_minA

○ Ex. Let income range from $12,000 to $98,000 be normalized to [0.0, 1.0]. Then $73,600 is mapped to ((73,600 − 12,000) / (98,000 − 12,000)) × (1.0 − 0) + 0 = 0.716.

● Z-score normalization (μ: mean, σ: standard deviation):

  v′ = (v − μA) / σA

○ Ex. Let μ = 54,000, σ = 16,000. Then (73,600 − 54,000) / 16,000 = 1.225.
Normalization
● Normalization by decimal scaling:

  v′ = v / 10^j, where j is the smallest integer such that max(|v′|) ≤ 1

○ Ex. (50, 20) -> (0.5, 0.2) with j = 2
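A plain NumPy sketch of the three normalizations, reusing the income example from above.

import numpy as np

income = np.array([12000.0, 54000.0, 73600.0, 98000.0])

# Min-max normalization to [0, 1]
min_max = (income - income.min()) / (income.max() - income.min())

# Z-score normalization (here using the slide's mu = 54,000 and sigma = 16,000)
z_score = (income - 54000) / 16000

# Decimal scaling: divide by 10^j so that max(|v'|) <= 1
j = int(np.ceil(np.log10(np.abs(income).max())))
decimal_scaled = income / (10 ** j)

print(min_max)         # 73,600 -> 0.716
print(z_score)         # 73,600 -> 1.225
print(decimal_scaled)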

Data Discretization
Discretization: Divide the range of a continuous attribute into intervals.

● Actual data values are replaced with interval labels.


● Reduce attribute cardinality
● Handles outliers and skewed data
● Supervised vs. unsupervised

● Prepare data for further analysis, e.g., classification.

Data Discretization Methods
Typical methods:

All the methods mentioned below can be applied recursively.

● Histogram and Binning analysis

○ Top-down split

○ Unsupervised

● Clustering analysis (unsupervised, top-down split, or bottom-up merge)

● Classification analysis, e.g., decision-tree (supervised, top-down split)

● Correlation (e.g., χ2) analysis, e.g., ChiMerge (supervised, bottom-up merge)

Discretization by Correlation Analysis
Correlation analysis (e.g., Chi-merge: χ2-based discretization)

● Exploit the correlation between intervals and class labels.

● "Interval – Class” contingency tables

● If two adjacent intervals have low χ² values (less correlated with the class labels), merge them to form a larger interval (keeping them separate does not offer more information on how to classify objects).

● Merge performed recursively until a predefined stopping condition is met.

Chi-Merge Discretization Example
ChiMerge Discretization

● A statistical approach to data discretization.
● Discretize the data based on class labels, using the Chi-square approach.
● F: attribute, K: class label

Sample   F    K
1        1    1
2        3    2
3        7    1
4        8    1
5        9    1
6        11   2
7        23   2
8        37   1
9        39   2
10       45   1
11       46   1
12       59   1
Chi-Merge Discretization Example
ChiMerge Discretization Example

● Sort and arrange the attribute you want to group (Example: attribute F).
● Begin by having each unique value of the attribute in its own interval.

Sample   F    K   Interval
1        1    1   {0, 2}
2        3    2   {2, 5}
3        7    1   {5, 7.5}
4        8    1   {7.5, 8.5}
5        9    1   {8.5, 10}
6        11   2   {10, 17}
7        23   2   {17, 30}
8        37   1   {30, 38}
9        39   2   {38, 42}
10       45   1   {42, 45.5}
11       46   1   {45.5, 52}
12       59   1   {52, 60}
Chi-Merge Discretization Example
ChiMerge Discretization Example

● Calculate the Chi-square test on every pair of adjacent intervals.
● Interval/class contingency tables, e.g., for samples 2 & 3 and for samples 3 & 4:

Sample   K=1   K=2   Total
2        0     1     1
3        1     0     1
Total    1     1     2

Sample   K=1   K=2   Total
3        1     0     1
4        1     0     1
Total    2     0     2
Chi-Merge Discretization Example
Sample   K=1   K=2   Total
2        0     1     1
3        1     0     1
Total    1     1     2

E11 = (1/2) × 1 = 0.5      E12 = (1/2) × 1 = 0.5
E21 = (1/2) × 1 = 0.5      E22 = (1/2) × 1 = 0.5
X² = (0 − 0.5)²/0.5 + (1 − 0.5)²/0.5 + (1 − 0.5)²/0.5 + (0 − 0.5)²/0.5 = 2

Sample   K=1   K=2   Total
3        1     0     1
4        1     0     1
Total    2     0     2

E11 = (2/2) × 1 = 1        E12 = (0/2) × 1 = 0
E21 = (2/2) × 1 = 1        E22 = (0/2) × 1 = 0
X² = (1 − 1)²/1 + (1 − 1)²/1 = 0   (terms with an expected count of 0 are taken as 0)

At significance level 0.1 with df = 1, the Chi-square critical value is 2.7024. Adjacent intervals with X² below this value are not correlated with the class label and can be merged.
Chi-Merge Discretization Example

● Calculate the Chi-square values for all pairs of adjacent intervals.
● Merge the intervals with the smallest Chi-square values.

Sample   F    K   Interval      Chi² (with next interval)
1        1    1   {0, 2}        2
2        3    2   {2, 5}        2
3        7    1   {5, 7.5}      0
4        8    1   {7.5, 8.5}    0
5        9    1   {8.5, 10}     2
6        11   2   {10, 17}      0
7        23   2   {17, 30}      2
8        37   1   {30, 38}      2
9        39   2   {38, 42}      2
10       45   1   {42, 45.5}    0
11       46   1   {45.5, 52}    0
12       59   1   {52, 60}      -
Chi-Merge Discretization Example
● Repeat: keep merging the adjacent intervals with the smallest X² until all X² > 2.7024.

Interval    Samples (F)     K           Chi² (with next interval)
{0, 2}      1               1           2
{2, 5}      3               2           4
{5, 10}     7, 8, 9         1, 1, 1     5
{10, 30}    11, 23          2, 2        3
{30, 38}    37              1           2
{38, 42}    39              2           4
{42, 60}    45, 46, 59      1, 1, 1     -
Chi-Merge Discretization Example
Interval    Samples (F)          K                Chi² (with next interval)
{0, 10}     1, 3, 7, 8, 9        1, 2, 1, 1, 1    2.72
{10, 42}    11, 23, 37, 39       2, 2, 1, 2       3.93
{42, 60}    45, 46, 59           1, 1, 1          -

● End: there are no more adjacent intervals with X² < 2.7024.
● The resulting intervals are correlated with the class labels.
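A hedged Python sketch of the ChiMerge loop on this example; scipy's chi2_contingency computes the interval-vs-class X², zero-count class columns are dropped before the test (those X² values are taken as 0, as above), and this is a simplified illustration rather than a full ChiMerge implementation.

import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

F = [1, 3, 7, 8, 9, 11, 23, 37, 39, 45, 46, 59]
K = [1, 2, 1, 1, 1, 2, 2, 1, 2, 1, 1, 1]
df = pd.DataFrame({"F": F, "K": K}).sort_values("F")

# Start with one interval per distinct value
intervals = [[v] for v in df["F"]]
threshold = 2.7024   # chi-square critical value, alpha = 0.1, df = 1

def chi2_of_pair(left, right):
    # X2 of the 2 x classes contingency table for two adjacent intervals
    classes = sorted(df["K"].unique())
    table = np.array([
        [(df["F"].isin(ivl) & (df["K"] == c)).sum() for c in classes]
        for ivl in (left, right)
    ])
    table = table[:, table.sum(axis=0) > 0]   # drop all-zero class columns
    if table.shape[1] < 2:
        return 0.0                            # identical class distribution
    return chi2_contingency(table, correction=False)[0]

while len(intervals) > 1:
    chi2_values = [chi2_of_pair(intervals[i], intervals[i + 1])
                   for i in range(len(intervals) - 1)]
    i = int(np.argmin(chi2_values))
    if chi2_values[i] >= threshold:
        break                                  # all adjacent pairs are correlated enough
    intervals[i] = intervals[i] + intervals.pop(i + 1)   # merge the least-correlated pair

# Prints the surviving value groups, corresponding to the intervals {0,10}, {10,42}, {42,60}
print([(min(ivl), max(ivl)) for ivl in intervals])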
Concept Hierarchy Generation
● Concept hierarchy organises concepts (attribute values) hierarchically and is typically associated with
each dimension in a data warehouse.
● In data warehouses, concept hierarchies enable drilling and rolling to see data at various
granularities.
● Concept hierarchy generation

○ Specified by domain experts, taxonomies/thesaurus/ ontologies


○ Generated from data sets (for some simple, specific cases)
■ Discretization for numerical or ordinal data
■ Frequency counts for categorical data (limited cases)
○ Concept hierarchy learning
■ Natural language processing and ML approaches.
Concept Hierarchy Generation for Nominal Data
● Specification of a partial/total ordering of attributes explicitly at the schema level by users or
experts.
○ street < city < state < country
● Specification of a hierarchy for a set of values by explicit data grouping.
○ {Urbana, Champaign, Chicago} < Illinois

● Specification of only a partial set of attributes.


○ E.g. only street < city, not others
● Automatic generation of hierarchies (or attribute levels) by the analysis of the number of distinct
values.
○ E.g. for a set of attributes: {street, city, state, country}

Automatic Concept Hierarchy Generation
● Some hierarchies can be built automatically based on a study of the number of distinct values for
each attribute in the data collection.
○ The attribute with the most distinct values is at the bottom of the hierarchy.
○ Exceptions exist, for example: weekday, month, quarter, year.

country               15 distinct values
province_or_state     365 distinct values
city                  3,567 distinct values
street                674,339 distinct values
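A tiny pandas sketch of this heuristic, ordering attributes by their number of distinct values; the example DataFrame is an assumption for illustration.

import pandas as pd

geo = pd.DataFrame({
    "country": ["US", "US", "US", "US", "CA"],
    "state":   ["AZ", "AZ", "CA", "CA", "ON"],
    "city":    ["Tucson", "Phoenix", "Fresno", "Fresno", "Toronto"],
    "street":  ["1st St", "2nd Ave", "3rd Blvd", "4th Rd", "5th Ln"],
})

# Fewer distinct values -> higher in the generated concept hierarchy
hierarchy = geo.nunique().sort_values()
print(hierarchy)   # country < state < city < street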


Summary
In this session, we discussed:

● Normalization – The data is scaled to fall within a smaller, specified range for more meaningful
comparison.
● Discretization divides the range of a continuous attribute into intervals.
● Chi-Merge Discretization example

● Concept hierarchy organizes concepts (i.e., attribute values) hierarchically and is usually associated
with each dimension in a data warehouse.
● Concept hierarchy generation for nominal data

Learning Outcomes
You should now be able to:

● Apply data pre-processing tasks and methods to prepare data for a data mining task.
● Summarize the importance of outlier removal and redundant data removal from data sets.
● Explain the methods for dimensionality reduction and numerosity reduction.
● Implement data transformation strategies, such as normalization, discretization, and concept hierarchy generation.
● Perform typical data pre-processing tasks in Python.

Thank you!
