Supervised and Unsupervised Pattern Recognition and their Performance
Luciano da Fontoura Costa
Abstract
Pattern recognition, be it supervised or not, has attracted growing attention because of its several important applications. One issue of particular importance concerns the validation of the quality of the results, e.g. in terms of correct classifications and stability, which is often estimated by performing cross-validation methods. A model-based approach is adopted, in which the data categories are understood statistically in terms of respective random variables, associated with the features, as well as the associated probability density functions. This allows both the supervised and unsupervised pattern recognition cases to be addressed in a principled manner, while the important issues of bias, undersampling, underlearning, and overfitting are all addressed and revisited. Several important and even surprising results are reported, including the interpretation of overfitting as not being necessarily unwanted; the characterization of the phenomenon of underlearning, in which several unstable working decision boundaries can be obtained as a consequence of biased sampling and/or undersampling; and the approach to unsupervised learning as involving two related but not necessarily identical issues, namely choosing how to interrelate the clusters and deciding whether a group can be considered a cluster. To complement this development, we briefly consider the application of the coincidence similarity index to some of the covered problems, and present the possibility of using the important problem of image segmentation as a laboratory for better understanding and developing pattern recognition concepts and methods.
Each of these three main stages is characterized by substantial challenges. At the acquisition level, one problem of particular relevance concerns which measurements are to be adopted for characterizing the entities. At the pre-processing stage, approaches have to be chosen that are able to improve the data quality (e.g. remove noise), as well as means to properly normalize the measurements. Major issues related to the third stage include the choice of recognition methods to be applied. Then, at the validation stage, metrics and approaches have to be defined and adopted in order to validate the recognition results.
While all the main issues involved in the above discussed pattern recognition stages have received substantial attention from the respective literature, the problem of characterizing the performance of the obtained recognition framework and results, so as to validate the adopted approach, remains an important issue worth continuing attention.
The validation of a pattern recognition approach depends on whether it is supervised or unsupervised, and both cases are considered in the present work. In the former case, validation has typically been performed by using cross-validation approaches (e.g. [5, 1]). In their simplest implementations, this type of validation involves separating the available data with identified categories into a training and a testing set. The supervised recognition approach is then optimized for the identification of the training set, and its performance is quantified from the results obtained by its application to the test set. There are several variations of this basic principle aimed at improving the validation comprehensiveness and/or accuracy, such as considering several training and test sets, dividing the groups in non-equal proportions, including additional sets and validation stages, etc.

  Aspect                                  Main characteristics
  Classification regions                  Densities specifying the categories within the feature space.
  Uniform / non-uniform regions           The type of the density.
  Sampling                                Used to represent the category regions; can be sparse or biased.
  Wrong samples                           Some samples can have wrong categories.
  Statistical fluctuations                Deviations from the respective densities caused by sampling.
  Dimensionality                          Determined by the number of required features.
  Decision boundaries                     Defined by the recognition methodology.
  Confidence                              How accurate the regions, anchor samples, boundaries and features are.
  Adopted features (a kind of sampling)   The set of features adopted for representing the entities.
  Supervised / unsupervised               The type of pattern recognition method.
  Cross-validation                        Performed to quantify the performance of the recognition.
  Clustered or not                        Categories can be clustered or not.

Table 1: A glossary of the many important aspects influencing pattern recognition.
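The simplest holdout scheme described above can be sketched as follows; the nearest-centroid classifier, the two uniform square category regions, and the 70/30 split ratio are illustrative assumptions rather than the specific methods of this work:

```python
import random

def holdout_split(data, labels, train_frac=0.7, seed=0):
    """Separate labelled data into training and test sets (holdout validation)."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    cut = int(train_frac * len(idx))
    train, test = idx[:cut], idx[cut:]
    return ([data[i] for i in train], [labels[i] for i in train],
            [data[i] for i in test], [labels[i] for i in test])

def nearest_centroid(train_x, train_y):
    """Train a minimal classifier: one centroid per category."""
    cents = {}
    for c in set(train_y):
        pts = [x for x, y in zip(train_x, train_y) if y == c]
        cents[c] = tuple(sum(p[d] for p in pts) / len(pts) for d in range(len(pts[0])))
    # Assign a new point to the category of the closest centroid.
    return lambda x: min(cents, key=lambda c: sum((x[d] - cents[c][d]) ** 2
                                                  for d in range(len(x))))

# Two well-separated uniform square regions (illustrative data).
rng = random.Random(1)
data = [(rng.random(), rng.random()) for _ in range(100)] + \
       [(rng.random() + 2, rng.random()) for _ in range(100)]
labels = ['A'] * 100 + ['B'] * 100

tr_x, tr_y, te_x, te_y = holdout_split(data, labels)
clf = nearest_centroid(tr_x, tr_y)
# Performance is quantified on the test set only, never on the training set.
accuracy = sum(clf(x) == y for x, y in zip(te_x, te_y)) / len(te_y)
```

Because the two regions are compact and well separated, this particular example yields perfect test accuracy; the variations mentioned above (several splits, unequal proportions, additional validation sets) all build on this same principle.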
The results from cross-validations on supervised pattern recognition approaches indicate how many correct and incorrect classifications were obtained respectively to each of the involved categories. Ideally, there should be no incorrect classifications, with all the test data elements being correctly identified. When a pattern recognition approach passes a strict validation, we have an indication that it may work properly in identifying the categories of new data.
The failure of an approach in the respective cross-validation indicates that there could be problems virtually anywhere in the framework shown in Figure 1. Table 1 summarizes the main aspects that typically play an important role in defining the performance of a pattern recognition approach.
The relatively large number of aspects of different natures involved in the performance of pattern recognition (already hinted at by the validation task encompassing all stages in Figure 1), allied to the fact that these aspects tend to influence one another, provides a cogent indication of the complexity of the validation problem. In this work, we develop a model-based approach to studying the several performance limitations involved in supervised and unsupervised pattern recognition while trying to consider, in an integrated manner, all the aspects identified in Table 1. Special attention is placed on the statistical modeling of the categories in respective feature spaces, which paves the way to identifying several important concepts and potential issues in pattern recognition, including the potentially dramatic effect of the increase of the feature space dimensionality on the recognition results, especially from the perspective of the curse of dimensionality. The issue of undersampling therefore receives special attention along this work, from which we
characterize the phenomenon of underlearning, namely the obtention of several provisionally working but unstable recognition configurations that do not withstand systematic cross-validations. The phenomenon commonly known as overfitting also receives special attention, and it is argued that it does not intrinsically constitute a shortcoming, but can actually be an asset in a pattern recognition approach.
As an example of the important interrelationships between aspects involved in pattern recognition, we have that, of the two most often sought properties, namely selectivity and generalization, the former is generally obtained at the expense of the latter. Usually, a balance needs to be achieved between these two requirements, which should take into account the nature of the data and the questions of interest.
Supervised and unsupervised pattern recognition are both approached in the present work, with the former being addressed first. The case of supervised recognition is developed while considering the above outlined concepts and aspects, with emphasis on undersampling, underlearning, and overfitting, helping us to identify and approach some of the main reasons that can undermine supervised pattern recognition, with several important and potentially surprising results. Unsupervised recognition is then treated with emphasis on two key issues: (a) the quantification of the separation between groups of data elements; and (b) the criterion adopted for deciding on the existence of clusters.
To complement our development, we also propose the consideration of image segmentation, an important and challenging issue in itself, as a laboratory for better understanding, developing, and evaluating pattern recognition approaches and systems.
Though the focus is kept on presenting the several concepts and methods in a relatively accessible manner, an enhanced understanding of this work will be helped by some previous experience in pattern recognition and/or related areas, particularly multivariate statistics (e.g. [7]), stochastic geometry (e.g. [10]), and set/multiset theory (e.g. [11, 12, 13, 14, 15]). In order to emphasize the main concepts and results along the development of the present work, several snippets have been respectively included.
It should also be kept in mind that the presented concepts and methods are still subject to further complementation and validation, so that they should be treated as preliminary. In addition, the application of any pattern recognition approach to real-world data as approached here should be understood mostly as a means for providing insights on the data and group interrelationships to be further investigated and validated, not as providing a basis for absolute decisions regarding the separation or existence of clusters.

2 Categories, Statistical Modeling and Sampling

Groups (ensembles) of entities can be modeled in terms of their respective measurements, which are considered as random variables. In these cases, the respective joint probability density function (or field), or density for brief, provides all available statistical information about the properties of those variables, and therefore about the entities as far as their features are concerned.
Densities can be understood as mappings from each point (an entity represented in terms of its feature vector) in a given support (a region of the feature space) into respective non-negative values. In addition, the integral of the density over the support needs to be identical to one. In a given pattern recognition problem, it is also important to identify from the outset the bounds of the respective feature space, which we will henceforth refer to as the respective universe Ω. This set can be determined from the minimum and maximum values of each involved feature. Observe that each feature defines an associated axis in the respective feature space where the entities are to be represented.
There are two main types of probability density functions: (i) uniform; and (ii) non-uniform. The first case is characterized by constant values assigned to all points in the whole support. Non-uniform densities have varying values assigned to those points. An example of a non-uniform density is the normal distribution. Figure 2(a) illustrates a uniform density defined on a disk in the R2 space.
From the perspective of this article, uniform densities can be treated in a simplified manner, as corresponding to all the points in their respective support. As such, uniform densities provide a particularly effective means for approaching several of the intricacies of pattern recognition and its performance characterization. For instance, in Figure 2(a), it is enough to represent the region associated with the support of a uniform density instead of a three-dimensional representation where the constant density would also be shown.
It is not always the case that the density and its support are available. Indeed, oftentimes we only have samples from a given density, obtained from inaccessible density formulae. This is illustrated in Figure 2 in terms of three possible samplings of the density in (a): sparser (b); denser (c); and biased (d).
The amount and quality of samples is of critical importance in pattern recognition. Even if all samples are correct, in practice they will always be available in limited numbers, implying that the original density will never be fully recovered.
Needless to say, biases can have critical impact on the
recognition results.
Sampling is also required for approximating non-
uniform densities, as illustrated in Figure 4, where a vary-
ing density on a disk support (a) is sampled by a limited
number of samples (b). Observe that the density of the
samples tends to reflect the respective original density at
each of the points in the support.
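These sampling regimes can be illustrated with a minimal sketch that draws sparse, dense, and biased samples from a uniform density on a disk; the bias model (only one half of the support is ever observed) is a hypothetical choice for illustration:

```python
import math, random

def sample_uniform_disk(n, radius=1.0, seed=0):
    """Draw n points uniformly from a disk; the sqrt on the radius avoids center bias."""
    rng = random.Random(seed)
    pts = []
    for _ in range(n):
        r = radius * math.sqrt(rng.random())
        t = 2 * math.pi * rng.random()
        pts.append((r * math.cos(t), r * math.sin(t)))
    return pts

def sample_biased_disk(n, radius=1.0, seed=0):
    """Biased sampling: only the right half of the support is ever observed."""
    return [p for p in sample_uniform_disk(4 * n, radius, seed) if p[0] > 0][:n]

sparse = sample_uniform_disk(20)     # few samples: the support is poorly covered
dense = sample_uniform_disk(2000)    # many samples: the support is well approximated
biased = sample_biased_disk(200)

# A biased sample misrepresents the support: its mean x drifts away from ~0,
# even though the underlying density is centered on the origin.
mean_x_dense = sum(p[0] for p in dense) / len(dense)
mean_x_biased = sum(p[0] for p in biased) / len(biased)
```

Even the dense sample only approximates the original density; the biased sample, in addition, systematically misplaces its estimated support, which is the situation most harmful to subsequent recognition.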
presented in Section 2.
Basically, supervised pattern recognition involves two stages: (i) training; and (ii) application to the classification of new data. It is the former stage that makes this type of recognition supervised. Let us illustrate this basic principle in supervised recognition with the help of the example in Figure 5, which involves two uniform category regions.
In this particular case, the optimal decision boundary resulted aligned with the original boundaries themselves, which is characteristic of adjacent category regions. Provided there are no errors in new data elements, perfect performance will characterize subsequent classifications.
Another, more frequent, supervised classification situation, involving non-adjacent regions, is presented in Figure 6.
Some additional examples of relatively compact, well-separated category regions are illustrated in Figure 7.
Of particular importance is the fact that compact, well-separated regions make the classification much easier, while also reducing the chances of underlearning as implied by undersampling.
Figure 8 presents another supervised recognition example, involving non-uniform regions in a one-dimensional space.
Provided the densities are fully known, Bayesian decision theory indicates the means for identifying the optimal decision boundaries respectively to minimizing the number of misclassifications, whose probability corresponds to the areas where one density overlaps the other. More specifically, let M categories c = 1, 2, . . . , M be represented by respective conditional densities p(~x | c), and let the mass probability of each category be P(c). Then, the Bayesian classification criterion consists of applying:

    C(~x) = arg max_{c=1,...,M} {P(c) p(~x | c)}    (1)

In the case of Figure 8, this criterion yields the optimal border as corresponding to the intersection between the two densities, i.e. x = b.
When the region densities cannot be accurately determined, other approaches need to be applied, and that is precisely where the recognition problems start, because of the respective loss of information. There are two main situations yielding inaccurate densities: we do not know them, having only approximations or hypotheses; or only respective samples are available, possibly in limited numbers. At least the following two approaches can be attempted in the latter case: (a) estimate the densities from the samples; and (b) use the samples directly for the recognition.
In both cases, when only a limited number of samples is available, there will always be the possibility of undersampling, which can have critical impacts on the classification.
However, before addressing undersampling in a more systematic way, it is interesting to discuss the frequently considered problem of overfitting. Basically, this phenomenon consists of the obtained decision boundaries being 'too' closely adapted to the samples, as illustrated in Figure 9(a).
Here, we have two adjacent category regions that have been fully separated at the expense of a relatively intricate decision boundary. Observe that all points have been correctly classified in this case. Now, suppose a new data element becomes available and is mapped into the previous space as shown in Figure 9(b). Given that this new element resulted within the blue region, a misclassification will be implied. However, the system can be retrained so that the new boundary region shown in (c) is obtained, again ensuring correct classifications throughout, but at the cost of an even more intricate decision boundary, therefore enhancing the overfitting. Interestingly, it is possible to show that, in any supervised recognition problem, decision boundaries can be found that will yield full adherence to the involved categories, therefore implying no classification errors.
Now, an important point concerns the fact that, provided there are no errors in the supplied categories of all the samples, the phenomenon of overfitting is not intrinsically unwanted, but actually necessary to properly represent the categories in the feature space. Actually, in case the configuration in Figure 9(c) corresponds to all possible samples constituting the respective category, the decision boundary in that figure actually corresponds to an optimal solution. In conclusion, generally speaking, overfitting does not constitute a shortcoming of the approach, but actually a respective asset. In summary, we may conclude that:

2 - The overfitting respectively to the correct classification of the whole set of correctly labeled samples is not necessarily unwanted, but actually welcome, irrespectively of the level of intricacy or tight adherence implemented by the respective decision boundaries.

Figure 9 also illustrates some possible results of applying cross-validation to the situation in (a). Figure 9(d) presents the same case, but after removal of some of its points. In this specific case, the retraining under these circumstances will yield a decision boundary similar to
Figure 5: The basic principle underlying supervised pattern recognition, respective to two adjacent uniform regions in a two-dimensional feature space: (a) objects belonging to two categories, defined by their respective regions delimited by the blue and green contours, are to be recognized; (b) samples of the two categories are taken and used to train a respective classifier, which yields the decision boundary shown in orange; and (c) new data can now be classified depending on which region they fall into.
Figure 6: The basic principle underlying supervised pattern recognition, respective to two non-adjacent uniform regions in a two-dimensional feature space: (a) objects belonging to two categories, defined by their respective regions delimited by the blue and green contours, are to be recognized; (b) samples of the two categories are taken and used to train a respective classifier, which yields the decision boundary shown in orange; and (c) new data can now be classified depending on which region they fall into. Remarkably, a wide range of possible decision boundaries, instead of the single one obtained in the example in Fig. 5, are now possible. This represents neither underlearning nor overfitting.
Figure 7: Additional examples of relatively compact, well-separated category regions. Many decision boundaries can be found in these cases that ensure fully correct classifications.
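The criterion in Eq. (1) can be sketched numerically. The two univariate normal densities and equal priors below are illustrative assumptions; with them, the optimal border falls where the two densities intersect, as discussed for Figure 8:

```python
import math

def normal_pdf(x, mu, sigma):
    """Univariate normal density p(x | c) with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Two hypothetical categories with equal mass probabilities P(c) (illustrative).
priors = {1: 0.5, 2: 0.5}
params = {1: (0.0, 1.0), 2: (4.0, 1.0)}  # (mean, standard deviation) per category

def bayes_classify(x):
    """Eq. (1): assign x to the category c maximizing P(c) * p(x | c)."""
    return max(priors, key=lambda c: priors[c] * normal_pdf(x, *params[c]))

# With equal priors and equal variances, the optimal border lies where the two
# densities intersect: the midpoint x = 2 between the two means.
assert bayes_classify(1.9) == 1 and bayes_classify(2.1) == 2
```

Changing the priors or variances shifts the intersection, and with it the optimal decision boundary, which is exactly what the criterion accounts for.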
the previous one, yielding no classification errors. Figure 9(e) presents another example in which several points were removed from (a), but now a new decision boundary is obtained. In case the removed points are now tested in this new region, misclassifications will take place (f). It is important to identify what can be learnt from these cross-validation experiments as applied to the specific overfitted situation in (a). The key aspect here is that the decision boundary in (b) is in fact not accurate, hence the implied misclassification errors. In other words, the failing of this approach under cross-validation only indicates that the boundary obtained with fewer points is less precise. To any extent, adjacent category regions with intricate interrelationships will tend to be identified as overfitted,
4 - Underlearning, characterized by many provision-
ally working decision boundaries that are not similar to
the correct one, happens when the sampling does not
properly represent the regions.
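The instability described in the snippet above can be illustrated with a minimal sketch: two uniform one-dimensional category regions are repeatedly undersampled, and a provisional threshold boundary is fitted each time. The midpoint rule and the region placements are illustrative assumptions:

```python
import random

def fit_threshold(xs_a, xs_b):
    """A provisional 1D decision boundary: midpoint between the closest extremes."""
    return (max(xs_a) + min(xs_b)) / 2.0

def boundary_spread(n, trials=200, seed=0):
    """Spread of the fitted boundaries across re-sampled training sets of size n."""
    rng = random.Random(seed)
    boundaries = []
    for _ in range(trials):
        a = [rng.random() for _ in range(n)]       # category A: uniform on [0, 1]
        b = [2 + rng.random() for _ in range(n)]   # category B: uniform on [2, 3]
        boundaries.append(fit_threshold(a, b))
    return max(boundaries) - min(boundaries)

# Undersampled case: each provisional boundary works on its own training set,
# yet the boundaries wander widely across resamplings (underlearning).
spread_small = boundary_spread(3)
# Well-sampled case: the boundaries stabilize near the true midpoint 1.5.
spread_large = boundary_spread(300)
```

Every small-sample boundary classifies its own training data perfectly, which is precisely what makes underlearning insidious: only systematic cross-validation across resamplings exposes the instability.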
Figure 9: Illustration of the phenomenon of overfitting and its relationship to cross-validations.
between the nodes are proportional to the respectively obtained average distances. Another reference network is obtained by randomizing groups with the same number of elements and dimensionality. These two networks can then be compared qualitatively and/or quantitatively. In case the two networks are similar, it is very likely that undersampling may be taking place, which can be further investigated by cross-validation. Given that the minimum distance between the samples in each pair of clusters is too strict and relatively unstable (the change of a single sample can strongly impact the result), it is also interesting to consider the distances between the centers of mass of the real and random groups.
Let us illustrate this method respectively to a dataset containing 3 types of handwritten characters ('c', 'e', and 'o') [16], each being represented by 50 samples. Each data element is characterized in terms of four respective features, which are henceforth taken in their standardized version. The average distance obtained from the randomly assigned pairs of groups with the same dimensionality and number of samples was ⟨dr⟩ = 0.580765942. Figure 12 shows the principal component (e.g. [17]) projection of the handwritten characters dataset (a), as well as the distance networks respectively to the real data (b) and the randomly assigned simulation (c).
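A minimal sketch of this comparison between real and randomized groups follows; the three Gaussian groups in a four-dimensional feature space are synthetic stand-ins for the handwritten characters data, not the actual dataset:

```python
import random

def centroid(pts):
    """Center of mass of a group of feature vectors."""
    return tuple(sum(p[i] for p in pts) / len(pts) for i in range(len(pts[0])))

def dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def mean_pairwise_centroid_distance(groups):
    """Average distance between the centers of mass of every pair of groups."""
    cs = [centroid(g) for g in groups]
    pairs = [(i, j) for i in range(len(cs)) for j in range(i + 1, len(cs))]
    return sum(dist(cs[i], cs[j]) for i, j in pairs) / len(pairs)

rng = random.Random(0)
# Three hypothetical, well-separated groups of 50 samples in a 4-feature space.
groups = [[tuple(rng.gauss(m, 0.3) for _ in range(4)) for _ in range(50)]
          for m in (0.0, 2.0, 4.0)]

# Reference: the same points randomly reassigned to groups of the same sizes.
pool = [p for g in groups for p in g]
rng.shuffle(pool)
randomized = [pool[i * 50:(i + 1) * 50] for i in range(3)]

d_real = mean_pairwise_centroid_distance(groups)
d_rand = mean_pairwise_centroid_distance(randomized)
```

When the real inter-group distances clearly exceed the randomized reference, as here, the separation is unlikely to be a sampling artifact; distances comparable to the reference would instead suggest undersampling.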
Figure 14: Dendrograms obtained from the handwritten characters dataset by using single-linkage, complete-linkage, average-linkage and Ward agglomerative clustering. Observe the completely distinct cluster interrelationships suggested by each of these distinct approaches. Which is the most adequate for the handwritten characters dataset?
The issue (b) above, namely deciding on the existence of one or more clusters, is directly related to the interrelationship, especially the separation, between the candidate groups, and can be approached in those terms. For instance, it is possible (e.g. [22]) to consider the length of the branches leading to a branch, multiplied by the number of samples in that branch, as an indication of how much that possible cluster stands out among the others.
While the above mentioned type of approach is interesting and often leads to suitable results, there is an important issue that is not so often realized or discussed, and it has to do with the fact that the scaling of the y-axis variable, henceforth referred to as y, has a somewhat arbitrary nature. For instance, in the case of the average-linkage method, instead of taking the respective average Euclidean distances between the groups as y, it would also be possible to consider any monotonic transformation of y, for instance by taking it to the 5-th power, to the 0.2 power, or applying a sharp sigmoid, as depicted in Figure 15.
It is particularly interesting to compare these transformed dendrograms with those in Figure 14. Though they are completely distinct as far as the relative positions of the vertical axes where the mergings occur, the merging sequence is completely identical in all average-linkage cases presented above. At the same time, the illustrated type of transformation constitutes a particularly useful resource for zooming in and out of the several scales along the y-axis. For instance, in case we are especially interested in studying the cluster relationships at the finer merging scale, we could resort to a transformation similar to that obtained by the sharp sigmoid, and so on. Another relevant observation about the dendrograms obtained for the handwritten characters dataset consists in the fact that none of them, original or transformed, provided a pronounced indication, as far as the relative lengths and widths of their branches are concerned, about the original separation between the three main types of characters in this specific dataset.
The above example illustrates the difficulty in using the lengths of the branches as a criterion for deciding whether the involved clusters should be separated or not. Actually, there is an alternative approach that does not
Figure 15: Monotonic transformations of the average-linkage dendrogram obtained for the handwritten characters, taking the average distances y to the 5-th power (a), to the 0.2 power (b), or through a sharp sigmoid (c). Completely distinct dendrograms can therefore be obtained, emphasizing respectively the large, medium, and small scales of the clustering structure of the dataset. Importantly, the sequence of mergings in each of these transformations is completely preserved, while only the y-axis is 'elastically' modified.
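The 'elastic' rescaling of the y-axis can be verified directly: any strictly increasing transformation changes the merge heights but never their order. The merge height values below are illustrative, and the transforms mirror those of Figure 15:

```python
import math

# Illustrative merge heights of an average-linkage dendrogram, in merging order.
heights = [0.3, 0.8, 1.4, 2.9, 5.0]

transforms = {
    "power_5":   lambda y: y ** 5,
    "power_0.2": lambda y: y ** 0.2,
    "sigmoid":   lambda y: 1 / (1 + math.exp(-10 * (y - 1.5))),  # sharp sigmoid
}

for name, t in transforms.items():
    ty = [t(y) for y in heights]
    # Monotonic transforms preserve the merging sequence exactly...
    assert ty == sorted(ty)

# ...while 'elastically' reshaping the relative gaps between merging scales.
rel_gap_small = (heights[1] - heights[0]) / (heights[-1] - heights[0])
sig = transforms["sigmoid"]
rel_gap_sig = (sig(heights[1]) - sig(heights[0])) / (sig(heights[-1]) - sig(heights[0]))
```

This is exactly why branch lengths alone are a fragile criterion for deciding on cluster existence: the relative gaps can be stretched or compressed at will without altering the clustering structure itself.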
depend on the length of the branches. It consists of using other criteria for that purpose, in particular one of the several approaches to quantifying the separation between clusters, including those based on scatter matrices (e.g. [4]) or even network modularity (e.g. [16]).
Now that we have considered some of the most basic aspects of unsupervised clustering, we can attempt to approach the issue of their respective performance. There is a particularly direct manner to do this: by using a set of classified samples that are then treated as if they were not, being therefore classified in an unsupervised manner. The original categories can then be taken into account in order to evaluate the recognition results in terms of misclassifications, as well as all the aspects discussed respectively to supervised pattern recognition. Actually, the effects of most of those aspects are precisely the same whether we are dealing with supervised or unsupervised learning. For instance, the presence of biased samples in relatively high dimensions, and/or undersampling, is likely to induce underlearning, implying clusters to be found where there are none.
One important difference respectively to unsupervised recognition is that cross-validation, e.g. by k-folding, cannot generally be performed in the same way, as there is no training stage involved in that case. Those methods need to be adapted, for instance by identifying clusters respective to a portion of the reference samples, and then comparing them with the results obtained by the same unsupervised approach respectively to other sets of samples. However, it should be observed that this is not a complete test, because the procedure may induce similarly biased results in all cases. More comprehensive validation approaches need to consider previously labelled sets of samples, so that the obtained clusters can be confronted with the original ones. As a summary of our brief discussion of unwanted effects on the performance of unsupervised recognition methods and their respective validation, it can be said that:

10 - Overfitting, undersampling, and underlearning phenomena equally affect the supervised and unsupervised cases.

5 Similarity-Based Pattern Recognition

In this section we consider some possibilities of using the real-valued coincidence similarity index [23, 21, 16, 24] as the basis for comparing and interrelating groups of samples in both supervised and unsupervised pattern recognition.
Basically, the coincidence similarity corresponds to the product between the real-valued Jaccard and interiority indices, which are based on multiset theory (e.g. [11, 12, 13, 14, 15]). It is primarily aimed at performing similarity comparisons between two patterns, e.g. as represented by respective feature vectors. In the present work we limit our attention to the parameterless version of the coincidence similarity index which, in this particular case, yields results comprised in the interval [−1, 1]. The higher the coincidence similarity value, the more similar two patterns can be said to be. So far, the coincidence index has been successfully applied to several applications, including template matching [21] and the translation of datasets into respective complex networks [16].
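Based on the description above, a sketch of the coincidence similarity for non-negative feature vectors follows, taken as the product of the real-valued Jaccard and interiority indices; the signed, parameterless version mapping into [−1, 1] involves additional sign handling that is not reproduced here:

```python
def jaccard(x, y):
    """Real-valued (multiset) Jaccard index for non-negative feature vectors."""
    num = sum(min(a, b) for a, b in zip(x, y))
    den = sum(max(a, b) for a, b in zip(x, y))
    return num / den if den else 1.0

def interiority(x, y):
    """How much the 'smaller' pattern lies inside the 'larger' one."""
    num = sum(min(a, b) for a, b in zip(x, y))
    den = min(sum(x), sum(y))
    return num / den if den else 1.0

def coincidence(x, y):
    """Coincidence similarity: product of the Jaccard and interiority indices."""
    return jaccard(x, y) * interiority(x, y)

a = [1.0, 2.0, 3.0]
identical = coincidence(a, a)             # identical patterns -> 1.0
different = coincidence(a, [3.0, 0.5, 0.1])  # distinct patterns -> smaller value
```

Because the coincidence index demands both high overlap (Jaccard) and mutual containment (interiority), it is substantially stricter than either index alone, which underlies its enhanced selectivity in the comparisons discussed next.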
First, we present in Figure 17 the average ± standard deviation of the coincidence values between random groups for 20, 25, . . . , 50 points in feature spaces of dimension D, with all features varying from 0 to 1.

Figure 16: The average ± standard deviations of the coincidence values between two groups of 20, 25, . . . , 50 points in feature spaces of dimension D, with all features varying from 0 to 1. Comparatively to the respective Euclidean distance counterpart in Fig. 11, it can be said that the obtained coincidence values increase more steeply along the smaller dimensions, becoming relatively stable for the larger dimensions.

was obtained for the real data, two of which are higher (in absolute value) than the random reference, while the remaining distance is similar. This suggests that, at least for this specific example, the coincidence similarity can lead to less intense underlearning as a consequence of biased sampling or undersampling.

Figure 18: Networks of average similarity for the real data (a) and the respective randomly assigned simulation (b) respectively to the handwritten characters database. Comparatively to the Euclidean distance based results shown in Fig. 12, two of the real pairwise distances resulted larger than the random counterparts, while the other distance resulted very similar. This suggests a better underlearning resilience of the coincidence similarity representation of the data elements, at least for the case of the specific data in this example. Observe that the magnitude of a negative coincidence similarity quantifies the dissimilarity between the respective groups.
not only for better understanding supervised and unsu-
pervised pattern recognition, but also for developing and
comparing respective concepts and methods. That is be-
lieved to be so as a consequence of the following aspects:
(i) the original data to be classified (pixels) can be im-
mediately inferred from the images; (ii) in the case of
supervised recognition, the choice of prototypes can be
easily performed, e.g. by clicking on specific points of the
image; (iii) images in general have great complexity and
intricacy, providing a comprehensive resource for testing
methods; (iv) the effects of the pattern recognition can
be immediately perceived in terms of the highlighted seg-
mented regions, especially the identification of possible
underlearning caused by biased sampled and undersam-
pling.
Figure 19: Dendrogram obtained for the handwritten characters dataset through single-linkage of coincidence similarities between the clusters. A well-balanced distribution of mergings is obtained at all scales, while the intrinsic subdivision into the three respective types of handwritten characters is moderately emphasized. Remarkably, virtually none of the intense chaining characterizing the Euclidean distance-based counterpart can be observed.

It should be observed that the coincidence similarity index can also be adapted to the other clustering approaches, substituting the Euclidean distance or correlations whenever necessary.

6 Image Segmentation as a Laboratory

Image analysis and computer vision constitute important branches of artificial intelligence (e.g. [25, 5, 4, 6]) as a consequence of their impressive potential for automating and enhancing activities typically performed by humans, including prospection, surveillance, quality control, and astronomy, to name but a few possibilities.

One of the first steps along the image analysis pipeline, the task of image segmentation (e.g. [25, 4]), is as critical as it is challenging. Basically, given an image, to segment it typically means identifying its portions of special relevance, as being possibly related to specific objects in the image, or to portions of these objects. This seemingly simple endeavor is complicated by several effects, including noise, shadows, reflections, occlusions, and transparency, among several other unwanted interferences. The importance and challenge of image segmentation are directly reflected in the remarkably large number of related studies, based on the most varied areas and concepts.

As a consequence of some special characteristics, we argue here that the problem of image segmentation can provide a particularly interesting and effective laboratory for better understanding and developing pattern recognition concepts and methods.

Figure 20 illustrates the above possibilities with respect to the supervised segmentation of a color image of a landscape (a), including natural and human-made objects and structures, while also incorporating varying levels of luminosity, shadows, and diverse types of backgrounds and textures. More specifically, the segmentation results obtained by using the Pearson correlation coefficient and the coincidence similarity are shown in (b) and (c), respectively, relative to five prototype points marked with red crosses. In the former case, the segmentation generalized too much in detriment of selectivity, which implied several structures and textures being merged. The results obtained by the coincidence similarity were substantially more adherent to the structures from which the prototypes were taken, with only a moderate loss of generalization. In addition, given that a relatively high number of features was involved, namely 25 RGB pixel values sorted by intensity within each color channel, the good adherence to the respective objects can be taken as an indication that underlearning is not taking place.

7 Concluding Remarks

Pattern recognition has progressed all the way from early promising approaches toward becoming one of the central current research subjects. This has been motivated by its many important applications to virtually every scientific and technological area and aspect. Yet, the question of evaluating the performance of pattern recognition, while also identifying the main causes that can undermine it and devising possibilities of improvement, remains an ever important subject.

It should be kept in mind that all results in the present work are preliminary and still being complemented and evaluated. In addition, the application of pattern recognition as approached here should be taken as a resource for
Figure 20: Original image (a) and respective segmentations obtained by using (b) the Pearson correlation coefficient between the selected features of the prototypes and those of all pixels in the image, and (c) the coincidence similarity. The prototypes, marked by red crosses, refer to the sandstone wall (2 samples) and the farther away bridge (3 samples). In (b), the obtained regions, delimited by respective red contours, can be observed not to adhere selectively to any of the types of structures in this image; in this case, generalization prevailed strongly in detriment of selectivity to the types of structures. The results obtained by the coincidence similarity (c) are characterized by a precise adherence to the types of structures in the image, while maintaining an excellent generalization ability. These enhanced results are a direct consequence of important specific properties of the coincidence similarity operation, including its high selectivity/sensitivity, while being substantially robust to localized feature perturbations [24].
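As a rough indication of how the prototype-based comparison of the figure can be implemented, the sketch below segments an image by comparing the sorted-neighborhood features of each pixel with prototype feature vectors. This is an illustrative sketch only, with hypothetical names; the coincidence index is taken as the product of the Jaccard and interiority indices for non-negative features [21, 23]:

```python
import numpy as np

def coincidence(x, y):
    # Coincidence similarity: Jaccard index times interiority index,
    # for non-negative feature vectors.
    mins = np.minimum(x, y).sum()
    maxs = np.maximum(x, y).sum()
    if mins == 0:
        return 0.0
    return (mins / maxs) * (mins / min(x.sum(), y.sum()))

def features(img, i, j, r=2):
    # Feature vector of pixel (i, j): the (2r+1) x (2r+1) neighborhood,
    # sorted by intensity within each color channel (r=2 yields the
    # 25 values per channel mentioned in the text).
    patch = img[i - r:i + r + 1, j - r:j + r + 1]
    return np.sort(patch.reshape(-1, patch.shape[-1]), axis=0).ravel()

def segment(img, prototypes, similarity, threshold, r=2):
    # Mark every pixel whose features are similar enough to at least
    # one of the prototype feature vectors.
    h, w = img.shape[:2]
    mask = np.zeros((h, w), dtype=bool)
    for i in range(r, h - r):
        for j in range(r, w - r):
            f = features(img, i, j, r)
            mask[i, j] = any(similarity(f, p) >= threshold
                             for p in prototypes)
    return mask
```

A prototype is then simply the feature vector of a user-selected pixel, e.g. `features(img, i0, j0)`; a Pearson-based similarity can be substituted for `coincidence` in order to reproduce the kind of comparison shown between panels (b) and (c).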
gathering insights about the analyzed problem from the point of view of the interrelationships between its components, which can lead to better understanding, not as an absolute or definitive result. Indeed, the application and interpretation of pattern recognition should closely take into account the nature of the data, the questions to be addressed, as well as the limitations of the features, the classification methods, and all other involved aspects.

This work addressed the issue of identifying the aspects that influence the performance of supervised and unsupervised pattern recognition from the perspective of statistical modeling of the original categories. Several related factors were addressed, with special attention given to the phenomena of biased sampling, undersampling, underlearning caused by the former, as well as overfitting. Several important effects were identified and discussed with the help of some real-world data examples. Snippets have also been included in order to emphasize 10 main points discussed and addressed here, providing a concluding summary.

The developed concepts and methods as reported in the present work pave the way to several future developments. While the range of possibilities is particularly ample, some of the most promising prospects are briefly presented in the following. Even though we considered several possible aspects influencing the performance of supervised and unsupervised recognition, it would be interesting to approach the issue of feature normalization in greater depth, as this aspect can also strongly influence the recognition results. Several possibilities have also been established with respect to the phenomenon of underlearning, which has been argued to play a critically important role not only in the case of highly dimensional feature spaces, but even for moderate dimensions. In particular, it would be interesting to derive more complete tables of the artifact distances, not only in terms of additional numbers of samples and dimensions, but also with respect to other normalizing intervals. Regarding the identification of clustering as involving two related tasks that can perhaps be performed more effectively separately, it would be interesting to evaluate, in a more systematic and comparative fashion, how this would perform with respect to several types of synthetic and real-world datasets, including diverse types of noise and interferences. Another promising research line consists in considering further multiset-based similarity indices, and especially the coincidence approach, with respect to several other types of data and possible applications to supervised and unsupervised pattern recognition. Among many other related developments, it would be interesting to consider the parametric version of the
coincidence index, which allows for enhanced versatility in its applications.

[6] E. R. Davies. Machine Vision. Morgan Kaufmann, Amsterdam, 2005.

[7] R. A. Johnson and D. W. Wichern. Applied Multivariate Analysis. Prentice Hall, 2002.

[8] N. Mukhopadhyay. Probability and Statistical Inference. CRC Press, New York, 2000.

[9] S. Haykin. Neural Networks and Learning Machines. McGraw-Hill Education, 9th edition, 2013.

[10] D. Stoyan, W. S. Kendall, J. Mecke, and L. Ruschendorf. Stochastic Geometry and its Applications. Wiley, Chichester, 1995.

[11] W. D. Blizard. Multiset theory. Notre Dame Journal of Formal Logic, 30:36–66, 1989.

[12] P. M. Mahalakshmi and P. Thangavelu. Properties of multisets. International Journal of Innovative Technology and Exploring Engineering, 8:1–4, 2019.

[20] M. K. Vijaymeena and K. Kavitha. A survey on similarity measures in text mining. Machine Learning and Applications, 3(1):19–28, 2016.

[21] L. da F. Costa. On similarity. Physica A: Statistical Mechanics and its Applications, 127456, 2022. https://www.sciencedirect.com/science/article/pii/S037843712200334X

[22] E. K. Tokuda, C. H. Comin, and L. da F. Costa. Revisiting agglomerative clustering. Physica A, 585:126433, 2022. https://www.sciencedirect.com/science/article/pii/S0378437121007068

[23] L. da F. Costa. Further generalizations of the Jaccard index. https://www.researchgate.net/publication/355381945_Further_Generalizations_of_the_Jaccard_Index, 2021. [Online; accessed 21-Aug-2021].