1.4 Probability Statistics II
Data Handling Process
Exploratory Data Analysis (EDA)
● An approach/philosophy for data analysis that employs a variety
of techniques (mostly graphical) to
1. maximize insight into a data set;
2. uncover underlying structure;
3. extract important variables;
4. detect outliers and anomalies;
5. test underlying assumptions;
6. develop parsimonious models; and
7. determine optimal factor settings.
● The EDA approach is precisely that: an approach. It is not a fixed set of
techniques, but an attitude/philosophy about how a data analysis should be
carried out.
● EDA is not identical to statistical graphics although the
two terms are used almost interchangeably.
● Most EDA techniques are graphical in nature with a few
quantitative techniques.
What Do EDA Tools Consist Of?
● EDA consists of various techniques for:
1. Plotting the raw data (such as data traces, histograms,
bihistograms, probability plots, lag plots, block plots, and
Youden plots).
2. Plotting simple statistics such as mean plots, standard
deviation plots, box plots, and main effects plots of the raw
data (a brief matplotlib sketch follows this list).
3. Positioning such plots so as to maximize our natural
pattern-recognition abilities, such as using multiple plots per
page.
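As a small illustration of items 1 and 2 above, here is a minimal matplotlib sketch; the batch data are simulated purely for illustration and are not taken from any data set in these slides.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
batches = [rng.normal(loc=mu, scale=1.0, size=50) for mu in (10, 12, 11)]  # three simulated batches

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(np.concatenate(batches), bins=20)   # plot of the raw data: histogram
ax1.set_title("Histogram of the raw data")
ax2.boxplot(batches)                         # plot of simple statistics: one box plot per batch
ax2.set_title("Box plots by batch")
plt.tight_layout()
plt.show()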
Data Analysis Approaches
● Three popular data analysis approaches are:
1. Classical
2. Exploratory (EDA)
3. Bayesian
● These three approaches are similar in that they all start with a general
science/engineering problem and all yield science/engineering conclusions.
● The difference is the sequence and focus of the intermediate steps.
● For Classical analysis, the sequence is
● Problem => Data => Model => Analysis => Conclusions
● For EDA, the sequence is
● Problem => Data => Analysis => Model => Conclusions
● For Bayesian, the sequence is
● Problem => Data => Model => Prior Distribution => Analysis => Conclusions
Differences
● For classical analysis, data collection is followed by the imposition
of a model (normality, linearity, etc.) and the analysis, estimation,
and testing that follows are focused on parameters of that model.
● For EDA, the data collection is not followed by a model imposition;
rather it is followed immediately by analysis with a goal of inferring
what model would be appropriate.
● For a Bayesian analysis, the analyst attempts to incorporate
scientific/engineering knowledge/expertise into the analysis by
imposing a data independent distribution on the parameters of the
selected model; the analysis thus consists of formally combining
both the prior distribution on the parameters and the collected data
to jointly make inferences and/or test assumptions about the model
parameters.
Techniques
● Classical techniques are generally quantitative in nature.
● They include ANOVA, t tests, chi-squared tests, and F tests.
● EDA techniques are generally graphical.
● They include scatter plots, character plots, box plots,
histograms, bihistograms, probability plots, residual plots,
and mean plots.
Conditional Probability
Multiplication of Probabilities
Independent Events
Bayes’ Theorem
• The conditional probabilities commonly provide the probability of an
event (such as failure) given a condition (such as high or low
contamination).
• But after a random experiment generates an outcome, we are naturally
interested in the probability that a condition was present (high
contamination) given an outcome (a semiconductor failure).
• Thomas Bayes addressed this essential question in the 1700s and
developed the fundamental result known as Bayes’ theorem.
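A small numerical sketch of Bayes' theorem for the contamination/failure setting above; all probabilities are made up purely for illustration.
# Hypothetical inputs: P(failure | contamination level) and P(high contamination)
p_fail_given_high = 0.10
p_fail_given_low = 0.005
p_high = 0.20

# Total probability of a failure
p_fail = p_fail_given_high * p_high + p_fail_given_low * (1 - p_high)

# Bayes' theorem: probability that contamination was high, given that a failure occurred
p_high_given_fail = p_fail_given_high * p_high / p_fail
print(p_high_given_fail)   # about 0.833 with these made-up numbers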
Bayes' Theorem for Multiple Events
PMF of Discrete RV
• The probability distribution of a random variable X is a description of
the probabilities associated with the possible values of X.
• For a discrete random variable, the distribution is often specified by
just a list of the possible values along with the probability of each.
• In some cases, it is convenient to express the probability in terms of
a formula.
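A short sketch of both ways of specifying a PMF (an explicit list of values with probabilities, or a formula); the numbers and the binomial example are chosen only for illustration.
# PMF given as an explicit list of values and probabilities
values = [0, 1, 2, 3]
probs = [0.1, 0.3, 0.4, 0.2]            # non-negative and summing to 1
pmf = dict(zip(values, probs))          # f(x) = P(X = x)
print(pmf[2])                           # P(X = 2) = 0.4

# PMF given by a formula, e.g. a binomial random variable with n = 3, p = 0.5
from math import comb

def binom_pmf(x, n=3, p=0.5):
    return comb(n, x) * p**x * (1 - p)**(n - x)

print([binom_pmf(x) for x in range(4)])  # [0.125, 0.375, 0.375, 0.125]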
CDF of Discrete RV
Counter in Dictionary
empty_dict = {}                      # a dict maps keys to values
grades = {"Joel": 80, "Tim": 95}
joels_grade = grades["Joel"]

def mean(x):
    return sum(x) / len(x)

# The same function with type hints:
from typing import List

def mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)

mean(num_friends)                    # num_friends is defined in a later snippet
Classification: classify a set of points (figure showing points of class A omitted)
Clustering
Regression
Dimensionality Reduction
● In machine learning, whether the algorithm is classification or
regression, data are used as inputs and fed to the learner for
decision-making.
● Ideally, there should be no need for feature extraction or selection as a
separate process; the classifier (or regressor) should be able to use
whichever features are necessary and discard the irrelevant ones.
● In most learning algorithms, the complexity is based on the number
of input dimensions, as well as on the size of the data sample, and
for reduced memory and computation, we are interested in
reducing the dimensionality of the problem.
● Dimension reduction also reduces the complexity of the
learning algorithm during testing.
Dimensionality Reduction
● Also, if an input is not informative, we can save the cost of
extracting it.
● Simple models are more robust on small datasets.
● Simple models have less variance; that is, they depend less on the
particulars of specific samples, including outliers, noise, etc.
● If data can be represented with fewer features, we gain a
better idea of the process underlying the data, and this
allows knowledge extraction.
● If data can be described by fewer dimensions without loss of
information, it can be plotted and analyzed visually for structure
and outliers.
Dimensionality Reduction
● In situations where the data have a huge number of features,
it is often necessary to reduce their dimension or to find a
lower-dimensional representation that preserves some of their properties.
● Therefore, dimensionality reduction (or manifold learning):
1. Speeds up subsequent operations on the data.
2. Enables better visualization of data for tentative analysis by
mapping the input data into two- or three-dimensional spaces.
3. Extracts features to produce a smaller and more efficient,
informative, or valuable set of features (see the PCA sketch after this list).
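A minimal sketch of the PCA route to dimensionality reduction with scikit-learn, referenced in item 3 above; the 10-feature data set is randomly generated for illustration.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))           # 300 samples, 10 features (simulated)

pca = PCA(n_components=2)                # keep the 2 directions of largest variance
X_2d = pca.fit_transform(X)

print(X_2d.shape)                        # (300, 2) -- ready for plotting and visual analysis
print(pca.explained_variance_ratio_)     # fraction of variance retained by each component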
Classical Example
● If the goal of the analysis is to compute summary
statistics plus determine the best linear fit for Y as a
function of X, the results might be given as:
N = 11
Mean of X = 9.0
Mean of Y = 7.5
Intercept = 3
Slope = 0.5
Residual standard deviation = 1.237
Correlation = 0.816
● The above quantitative analysis, although valuable,
gives us only limited insight into the data.
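As a sketch of how such results could be computed, the snippet below uses scipy.stats.linregress; the x and y arrays are the well-known Anscombe first data set, used here only as a stand-in because it reproduces the quoted statistics (the slide's actual data may differ).
import numpy as np
from scipy import stats

x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
y = np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
              7.24, 4.26, 10.84, 4.82, 5.68])

print("N =", len(x))
print("Mean of X =", x.mean(), " Mean of Y =", round(y.mean(), 1))

fit = stats.linregress(x, y)                  # least-squares line Y = intercept + slope * X
print("Intercept =", round(fit.intercept, 3), " Slope =", round(fit.slope, 3))
print("Correlation =", round(fit.rvalue, 3))

residuals = y - (fit.intercept + fit.slope * x)
print("Residual SD =", round(float(np.sqrt(residuals.var(ddof=2))), 3))  # ddof=2: two fitted parameters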
Simple Scatter Plot Gives Us Insights
1) The data set "behaves like" a linear curve
with some scatter;
2) There is no justification for a more
complicated model (e.g., quadratic);
3) There are no outliers;
4) The vertical spread of the data appears to
be of equal height irrespective of the
X-value; this indicates that the data are
equally-precise throughout and so a
"regular" (that is, equi-weighted) fit is
appropriate.
Obtain the Summary Statistics Again, and Also Plot the Data
● In the lab, draw scatter plots for the previous data set and for the data below.
from matplotlib import pyplot as plt
years = [1950, 1960, 1970, 1980, 1990, 2000, 2010]
gdp = [300.2, 543.3, 1075.9, 2862.5, 5979.6, 10289.7, 14958.3]
# create a line chart, years on x-axis, gdp on y-axis
plt.plot(years, gdp, color='green', marker='o',
linestyle='solid')
# add a title
plt.title("Nominal GDP")
# add a label to the y-axis
plt.ylabel("Billions of $")
plt.show()
from collections import Counter
#A Counter is a dict subclass for counting hashable objects.
import matplotlib.pyplot as plt
num_friends = [100, 49, 41, 40, 25, 24, 55, 5, 10, 14, 18, 17, 20]
friend_counts = Counter(num_friends)
xs = range(101)                      # largest value is 100
ys = [friend_counts[x] for x in xs]  # number of people with x friends
plt.bar(xs, ys)
plt.axis([0, 101, 0, 25])
plt.title("Histogram of Friend Counts")
plt.xlabel("# of friends")
plt.ylabel("# of people")
plt.show()
num_points = len(num_friends)
largest_value = max(num_friends)
smallest_value = min(num_friends)
sorted_values = sorted(num_friends)
smallest_value = sorted_values[0]    # second smallest is sorted_values[1], and so on
friends = [ 70, 65, 72, 63, 71, 64, 60, 64, 67]
minutes = [175, 170, 205, 120, 220, 130, 105, 145, 190]
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
plt.scatter(friends, minutes)
# label each point
for label, friend_count, minute_count in zip(labels, friends, minutes):
    plt.annotate(label,
                 xy=(friend_count, minute_count),  # put the label with its point
                 xytext=(5, -5),                   # but slightly offset
                 textcoords='offset points')
plt.title("Daily Minutes vs. Number of Friends")
plt.xlabel("# of friends")
plt.ylabel("daily minutes spent on the site")
plt.show()
Continuous Random Variable
• A continuous random variable is a random variable with an interval
(either finite or infinite) of real numbers for its range.
• The model provides for any precision in length measurements.
• Because the number of possible values of X is uncountably infinite, X
has a distinctly different distribution from the discrete random
variables.
• A probability density function or PDF f(x) can be used to describe the
probability distribution of a continuous random variable X.
• If an interval is likely to contain a value for X, its probability is large
and it corresponds to large values for f(x).
• The probability that X is between a and b is determined as the integral
of f(x) from a to b.
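A minimal sketch of P(a < X < b) as the integral of a pdf, using a hypothetical exponential density and SciPy's numerical integration.
import numpy as np
from scipy import integrate

lam = 2.0                                  # hypothetical rate parameter
f = lambda x: lam * np.exp(-lam * x)       # pdf of an exponential random variable (x >= 0)

a, b = 0.5, 1.5
prob, _ = integrate.quad(f, a, b)          # P(a < X < b) = integral of f from a to b
print(prob)                                # analytically exp(-1) - exp(-3), about 0.318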
PDF of Continuous Random Variable
Mean of Continuous Random Variable
Normal Distribution
•The most widely used model for a continuous measurement is a
normal random variable, i.e., the normal distribution.
•Whenever a random experiment is replicated, the random
variable that equals the average (or total) result over the
replicates tends to have a normal distribution as the number
of replicates becomes large.
•De Moivre presented this fundamental result, known as the
central limit theorem, in 1733.
•Unfortunately, his work was lost for some time, and Gauss
independently developed a normal distribution nearly 100
years later.
Example
•Assume that the deviation (or error) in the length of a
machined part is the sum of a large number of infinitesimal
effects, such as temperature and humidity drifts, vibrations,
cutting angle variations, cutting tool wear, bearing wear,
rotational speed variations, mounting and fixture variations,
variations in numerous raw material characteristics, and
variation in levels of contamination.
•If the component errors are independent and equally likely to
be positive or negative, the total error can be shown to have
an approximate normal distribution.
● Random variables with different means and variances can be modeled
by normal probability density functions with appropriate choices of the
center and width of the curve.
● The value of E(X) = μ determines the center of the probability density
function, and the value of V(X) = σ² determines the width.
Normal probability density functions for selected values of the parameters μ and σ²
Probability that X > 13 for a normal random variable with μ = 10 and σ² = 4
Standardizing a Normal Random Variable
• Creating a new random variable Z = (X − μ)/σ by this transformation is
referred to as standardizing.
• The random variable Z represents the distance of X from its
mean in terms of standard deviations.
• It is the key step to calculating a probability for an arbitrary
normal random variable.
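A short sketch of standardizing, tied to the earlier slide about P(X > 13) for μ = 10 and σ² = 4, using scipy.stats.norm.
from scipy import stats

mu, sigma = 10.0, 2.0                         # sigma = sqrt(4)
x = 13.0
z = (x - mu) / sigma                          # Z = (X - mu) / sigma: distance from the mean in SD units
print(z)                                      # 1.5

print(1 - stats.norm.cdf(z))                  # P(Z > 1.5) via the standard normal
print(stats.norm.sf(x, loc=mu, scale=sigma))  # P(X > 13) directly; both are about 0.0668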
Central Limit Theorem
•The simplest form of the central limit theorem states that the
sum of n independently distributed random variables will
tend to be normally distributed as n becomes large.
•It is a necessary and sufficient condition that none of the
variances of the individual random variables are large in
comparison to their sum.
•There are more general forms of the central theorem that
allow infinite variances and correlated random variables,
and there is a multivariate version of the theorem.
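A simulation sketch of the central limit theorem: sums of n independent uniform random variables look approximately normal; the choices n = 30 and U(0, 1) are arbitrary.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 30
sums = rng.uniform(0, 1, size=(10_000, n)).sum(axis=1)   # 10,000 replicates of a sum of n uniforms

plt.hist(sums, bins=50, density=True)
plt.title("Sums of 30 U(0, 1) variables: approximately normal")
plt.show()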
Expectation
Variance
Covariance
● The covariance between two RVs X and Y measures the
degree to which X and Y are (linearly) related.
● Covariance is defined as Cov[X, Y] = E[(X − E[X])(Y − E[Y])] = E[XY] − E[X]E[Y].
Joint Probability Distributions and the
Sign of Covariance Between X and Y
Gaussian (Normal) Distribution Revisited
Inverse Variation of Normal Distribution
Why Normal Distribution
● First, many distributions we wish to model are truly close to
being normal distributions.
● The central limit theorem shows that the sum of many
independent random variables is approximately normally
distributed.
● This means that in practice many complicated systems can be
modeled successfully as normally distributed noise.
Why Normal Distribution
● The normal distribution is unique in that it maximizes entropy
(or uncertainty) among all distributions with the same mean and
variance. This means that, for any given variance, the normal
distribution is the most "spread out", least informative choice,
i.e., it encodes no prior knowledge beyond the given mean and variance.
● When comparing it to other distributions with the same
variance (such as a uniform distribution, exponential
distribution, etc.), the normal distribution is often considered
the "most typical" or "standard" distribution because of this
maximization of entropy.
PDF of Bivariate Gaussian Distribution
Correlation
● Covariances can be between negative and positive infinity.
● Sometimes it is more convenient to work with a normalized
measure, with a finite lower and upper bound.
● The (Pearson) correlation coefficient between X and Y is
defined as ρ = corr[X, Y] = Cov[X, Y] / (σX σY), i.e., the covariance
divided by the product of the standard deviations, where −1 ≤ ρ ≤ 1.
Why Correlation?
● One can also show that corr[X, Y] = 1 if and only if Y = aX + b
for some parameters a and b, i.e., if there is a linear
relationship between X and Y.
● The regression coefficient is given by a = Cov[X, Y] / V[X].
● The correlation reflects the noisiness and direction of a linear
relationship, but not the slope of that relationship, nor many
aspects of nonlinear relationships.
Different sets of (x, y) points, with the correlation coefficient of x and y for each set
Correlation Matrix
● In the case of a vector x of d related random variables, the correlation
matrix is the matrix whose (i, j) entry is corr[Xi, Xj]; its diagonal entries are all 1.
Uncorrelated does not imply independent
● If X and Y are independent, meaning p(X, Y) = p(X)p(Y),
then Cov[X, Y] = 0, and hence corr[X, Y] = 0.
● So independent implies uncorrelated.
● However, the converse is not true: uncorrelated does not
imply independent.
● For example, let X ∼ U(−1, 1) and Y = X².
● Clearly Y is dependent on X (in fact, Y is uniquely
determined by X), yet corr[X, Y] = 0.
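A quick numerical check of this example (X uniform on (−1, 1), Y = X²) using NumPy's sample covariance and correlation.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100_000)
y = x**2                                 # Y is completely determined by X

print(np.cov(x, y)[0, 1])                # sample covariance: close to 0
print(np.corrcoef(x, y)[0, 1])           # sample correlation: close to 0 (exactly 0 in theory)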
Correlation does not imply causation
● It is well known that “correlation does not imply causation”.
● The classic example is the strong correlation, over time, between ice cream
sales and murder rates. Indeed, it is sometimes claimed that “eating ice cream
causes murder”. This is just a spurious correlation, due to a hidden common
cause, namely the weather: hot weather increases ice cream sales, for obvious
reasons, and it also tends to increase violent crime.
Simpson’s Paradox
● Simpson's paradox says that a statistical trend or relationship that appears in
several different groups of data can disappear or reverse sign when these
groups are combined.
● This results in counterintuitive behavior if we misinterpret claims of
statistical dependence in a causal way.
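A tiny worked example with made-up counts: treatment A has the higher success rate inside each group, yet the lower success rate once the groups are combined.
groups = {
    "group 1": {"A": (8, 10),  "B": (70, 90)},   # (successes, trials), hypothetical counts
    "group 2": {"A": (30, 90), "B": (3, 10)},
}

totals = {"A": [0, 0], "B": [0, 0]}
for name, g in groups.items():
    for t, (s, n) in g.items():
        totals[t][0] += s
        totals[t][1] += n
        print(f"{name}, treatment {t}: {s}/{n} = {s/n:.1%}")

for t, (s, n) in totals.items():
    print(f"combined, treatment {t}: {s}/{n} = {s/n:.1%}")   # the ranking of A and B reverses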
Plot of the cdf and pdf for the standard normal, N(0, 1)
import numpy as np
import scipy.stats as stats
from matplotlib import pyplot as plt
# sklearn.datasets.samples_generator was removed; make_blobs now lives in sklearn.datasets
from sklearn.datasets import make_blobs

# Create a dataset of 300 points grouped around 4 centers
X, y = make_blobs(n_samples=300, centers=4, cluster_std=0.60, random_state=0)
plt.scatter(X[:, 0], X[:, 1])
plt.show()

# Descriptive statistics of the second feature
# (SciPy's old sp.mean / sp.std / sp.median aliases are deprecated; use NumPy instead)
X_mean = np.mean(X[:, 1])
print('Mean =', X_mean)
X_SD = np.std(X[:, 1])
print('SD =', X_SD)
X_median = np.median(X[:, 1])
print('Median =', X_median)
X_skewness = stats.skew(X[:, 1])
print('Skewness =', X_skewness)
X_kurtosis = stats.kurtosis(X[:, 1])
print('Kurtosis =', X_kurtosis)
Steps during EDA
∙ Descriptive statistics, charts, plots, and visualizations can be
utilized to look at the various data attributes and find relations
and correlations.
∙ Once data is collected, you need to make sure it is in a
usable format.
∙ Some algorithms require features in a specific format, while others
can handle target variables and features of various types
(strings, integers, etc.).
∙ Data preprocessing, cleaning, wrangling, and initial exploratory
data analysis are then carried out.
To-Do Tasks in EDA
1. Explore, describe, and visualize data attributes.
2. Choose data and attribute subsets, which seem the most crucial
for the problem.
3. Carry out broad assessments to find relationships and
associations and to test hypotheses.
4. Note missing data points, if any.
(Data quality analysis is the final step in the data understanding stage
in which the quality of data is analyzed in the datasets and potential
shortcomings, errors, and issues are determined.)
Data Quality Analysis
∙ The data can be checked to determine if any pattern is obvious or if
a few data points are massively different from the rest of the data.
∙ Plotting data in different dimensions might help.
∙ The focus of data quality analysis includes the following:
Missing values
Inconsistent values
Wrong information due to data errors (manual/automated)
Wrong metadata information
Next Steps
1. Data preparation for the model
2. Data integration – merging different datasets together
(attributes)
3. Data wrangling
Data Wrangling
● The process of data wrangling includes data processing,
normalization, cleaning, and formatting.
● Data in its raw form can hardly be used directly by machine learning
techniques to build models.
Major Tasks in Data Wrangling
∙ Managing missing values (remove rows, impute missing values)
∙ Managing data inconsistencies (delete rows, attributes, fix
inconsistencies)
∙ Correcting inappropriate metadata and annotations
∙ Managing unclear attribute values
∙ Arranging and formatting data into necessary formats (CSV,
JSON, relational)
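A minimal data-wrangling sketch with pandas covering a few of the tasks above (dropping rows, imputing missing values, writing CSV); the DataFrame columns and values are hypothetical.
import numpy as np
import pandas as pd

# Hypothetical raw data with missing values
df = pd.DataFrame({
    "age":    [25, np.nan, 41, 37, np.nan],
    "income": [40000, 52000, None, 61000, 48000],
    "city":   ["east", "west", "west", None, "east"],
})

df_drop = df.dropna()                                            # option 1: remove rows with missing values

df_fill = df.copy()                                              # option 2: impute missing values
df_fill["age"] = df_fill["age"].fillna(df_fill["age"].median())
df_fill["income"] = df_fill["income"].fillna(df_fill["income"].mean())
df_fill["city"] = df_fill["city"].fillna("unknown")

df_fill.to_csv("clean_data.csv", index=False)                    # arrange into a required format (CSV)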
Next: Feature scaling and feature extraction
● In this stage important features or attributes are extracted from the
raw data or new features are created from existing features.
● Data features frequently should be scaled or normalized to avoid
producing biases with machine learning algorithms.
● Moreover, it is often necessary to choose a subset of all existing
features based on feature quality and importance.
● In situations where the data have a huge number of features, it is
often necessary to reduce their dimension or to find a
lower-dimensional representation that preserves some of their properties.
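A short feature-scaling sketch with scikit-learn's StandardScaler and MinMaxScaler; the feature matrix is hypothetical.
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Hypothetical feature matrix: the two columns live on very different scales
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])

X_std = StandardScaler().fit_transform(X)   # zero mean, unit variance per feature
X_01  = MinMaxScaler().fit_transform(X)     # rescaled to the [0, 1] range per feature
print(X_std)
print(X_01)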
Types of Features
● Consider two features, one describing a person’s age and the other their house
number.
● Both features map into the integers, but the way we utilize those features can be
rather different.
● House numbers, unlike ages, do not lie on a meaningful linear scale: arithmetic
on them makes little sense. So both features are numbers, but they are used quite differently.
● Statistical Features:
● Numerous statistical features can be extracted from each subsample data point,
as they are the main distinguishing values to describe the distribution of the data.
● These features are the minimum, maximum, mean, median, mode, standard
deviation, variance, first quartile, third quartile, and interquartile range (IQR) of
the data vector (a short extraction sketch follows below).
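A sketch of extracting the statistical features listed above from a hypothetical data vector with NumPy and SciPy.
import numpy as np
from scipy import stats

x = np.array([4.0, 7.0, 1.0, 9.0, 3.0, 8.0, 5.0, 7.0, 2.0, 10.0])   # hypothetical data vector

q1, q3 = np.percentile(x, [25, 75])
features = {
    "min": x.min(), "max": x.max(),
    "mean": x.mean(), "median": np.median(x),
    "mode": stats.mode(x, keepdims=False).mode,    # SciPy >= 1.9 syntax
    "std": x.std(ddof=1), "var": x.var(ddof=1),
    "Q1": q1, "Q3": q3, "IQR": q3 - q1,
}
print(features)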
Statistics or Aggregates
● The varieties of calculations on features are generally stated as
statistics or aggregates.
● Three main types are shape statistics, statistics of dispersion, and
statistics of central tendency.
● Each of these can be represented either as a tangible property of a
given sample (sample statistics) or a hypothetical property of an
unknown population.
● The statistical values—namely, mean, standard deviation,
skewness, and kurtosis—are generally utilized to reduce the
dimension of data.
● The first and second-order statistics are critical in data analysis.
● On the other hand, second-order statistics are not enough for many time
series data.
● Hence, higher-order statistics should also be used for a better description
of the data.
● While the first- and second-order statistics correspond to the mean and
variance, the higher-order statistics correspond to higher-order moments.
● Higher-order statistics (HOS) denote the cumulants with orders of three
and higher-order computed numbers, which are linear combinations of
lower-order moments and lower-order cumulants.
Structured Features
● We create an instance vector from the features.
● Defining an instance with its vector of feature values is called an abstraction,
which is the result of filtering out redundant information.
● Features that work on structured instance spaces are called structured
features.
● These can be built either prior to learning a model or simultaneously with it.
● A significant characteristic of structured features is that they involve local
variables, which denote objects other than the instance itself.
● Nevertheless, it is possible to employ other forms of aggregation over local
variables.
● For example, in propositionalisation the features are translated from first-order
logic to propositional logic without local variables.
● The main challenge here is how to deal with the combinatorial explosion of the
number of potential features.
Feature Transformations
● The objective is to improve the effectiveness of a feature by
eliminating, changing, or adding information.
● The best-known feature transformations are those that turn a
feature of one kind into the next kind down the ordering
quantitative → ordinal → categorical → Boolean.
● Transformations also change the scale of quantitative features or
add a scale (or order) to ordinal, categorical, and Boolean
features.
● The simplest feature transformations are entirely deductive in
the sense that they achieve a well-defined result.
Binarization
● Binarization transforms a categorical feature into a set of Boolean
features, one for each value of the categorical feature.
● This loses information since the values of a single categorical
feature are mutually exclusive but are sometimes required if a
model cannot handle more than two feature values.
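A minimal binarization sketch using pandas get_dummies on a hypothetical categorical feature; scikit-learn's OneHotEncoder would serve the same purpose.
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})   # hypothetical categorical feature
binary = pd.get_dummies(df["color"], prefix="color")
print(binary)
# One Boolean column per category value (color_blue, color_green, color_red);
# exactly one of them is true in each row, which is why the mutual exclusivity
# of the original categorical values is no longer expressed in a single column.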
NumPy Boolean Indexing
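Since the slide's own example is not reproduced here, a minimal sketch of NumPy Boolean indexing:
import numpy as np

x = np.array([3, -1, 7, 0, -5, 12])
mask = x > 0            # element-wise comparison gives a Boolean array
print(mask)             # [ True False  True False False  True]
print(x[mask])          # [ 3  7 12] -- keeps only elements where the mask is True

x[x < 0] = 0            # Boolean indexing also works for assignment
print(x)                # [ 3  0  7  0  0 12]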