Foundations and Methods in Combinatorial and Statistical Data Analysis and Clustering, 1st Edition, Israël César Lerman
Advanced Information and Knowledge Processing
Foundations and Methods in Combinatorial and Statistical Data Analysis and Clustering
Advanced Information and Knowledge Processing
Series editors
Lakhmi C. Jain
Bournemouth University, Poole, UK, and
University of South Australia, Adelaide, Australia
Xindong Wu
University of Vermont
Information systems and intelligent knowledge processing are playing an increasing
role in business, science and technology. Recently, advanced information systems
have evolved to facilitate the co-evolution of human and information networks
within communities. These advanced information systems use various paradigms
including artificial intelligence, knowledge management, and neural science as well
as conventional information processing paradigms. The aim of this series is to
publish books on new designs and applications of advanced information and
knowledge processing paradigms in areas including but not limited to aviation,
business, security, education, engineering, health, management, and science. Books
in the series should have a strong focus on information processing—preferably
combined with, or extended by, new results from adjacent sciences. Proposals for
research monographs, reference books, coherently integrated multi-author edited
books, and handbooks will be considered for the series and each proposal will be
reviewed by the Series Editors, with additional reviews from the editorial board and
independent reviewers where appropriate. Titles published within the Advanced
Information and Knowledge Processing series are included in Thomson Reuters’
Book Citation Index.
Israël César Lerman
Department of Data Knowledge
and Management
University of Rennes 1, IRISA
Rennes, Ille-et-Vilaine
France
The author(s) has/have asserted their right(s) to be identified as the author(s) of this work in accordance
with the Copyright, Design and Patents Act 1988.
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, express or implied, with respect to the material contained herein or
for any errors or omissions that may have been made.
Preface
given. The methodological principles are very new in the data mining field. All
types of data structures are clearly represented and can be handled in a precise way:
qualitative data of any sort, quantitative data and contingency data. The methods
invented have been validated by many important, large-scale applications. Their
theoretical foundations are clearly and strongly established from three points of view:
logical, combinatorial and statistical. In this way, the respective rationales of the
distinct methods are clearly set up.
As expressed above, the special structure we are interested in for a reduced
representation of the data is that obtained by clustering methods. A non-hierarchical
clustering algorithm on a finite set E, endowed with a similarity index, produces a
partition on E, whereas a hierarchical clustering algorithm on E produces an
ordered partition chain on E. This book is dominated by hierarchical clustering.
However, methods of non-hierarchical clustering are also considered (see below).
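The difference between the two outputs can be illustrated with a minimal sketch of ascendant agglomerative clustering. It assumes a single-linkage merging rule, which is only an illustrative choice and not the criterion developed in this book; the function name `agglomerative_chain` and the toy similarity are hypothetical. Starting from the partition into singletons, each step merges the two most similar classes, so the result is an ordered chain of partitions rather than a single partition.

```python
from itertools import combinations

def agglomerative_chain(elements, sim):
    """Return the chain of partitions obtained by repeatedly merging the two
    most similar classes (single linkage: an illustrative assumption)."""
    partition = [frozenset([e]) for e in elements]
    chain = [list(partition)]
    while len(partition) > 1:
        # Similarity between two classes = largest similarity between members.
        a, b = max(
            combinations(partition, 2),
            key=lambda pair: max(sim(x, y) for x in pair[0] for y in pair[1]),
        )
        partition = [c for c in partition if c not in (a, b)] + [a | b]
        chain.append(list(partition))
    return chain

# Toy data: points on a line, similarity = negative distance.
points = [0.0, 1.0, 5.0, 6.0]
chain = agglomerative_chain(points, lambda x, y: -abs(x - y))
for level, p in enumerate(chain):
    print(level, sorted(sorted(c) for c in p))
```

Each level of `chain` is a partition of the point set, and each partition is finer than the next one. Any single level could serve as the output of a non-hierarchical method.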
In Chap. 1 we study some formal and combinatorial aspects of the sought
mathematical structure: partition or ordered chain of partitions. More particularly,
two sides are developed. The first is enumerative and consists of counting chains in
the partition lattice or counting specific subsets in the partition set. In order to relate
the partition type and the cardinality of the equivalence relation graph associated
with it, we are led to address the organized set of the partitions of an integer. The second
important side concerns the mathematical representation of a partition and, more
generally and importantly, an ordered chain of partitions on a finite set E. Thereby,
the relationships between the latter structure and numerical (resp., ordinal)
ultrametric spaces are established. In fact, all the algorithmic development of a given
clustering method is dependent on the representation adopted. We end Chap. 1 by
showing the transition between the formalization of symmetrical hierarchical
clustering and that of directed hierarchical clustering, where junctions between
clusters are directed according to a total (also called “linear”) order on E.
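To give a concrete idea of the enumerative side, the following sketch counts two of the objects mentioned above for a set of n elements: all partitions (the Bell number) and the maximal chains going from the finest to the coarsest partition. The function names are hypothetical, and the sketch relies on standard counting facts rather than on the derivations of Chap. 1.

```python
from math import comb, factorial

def bell_number(n):
    """Number of partitions of an n-element set, via the Bell triangle."""
    row = [1]
    for _ in range(n):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]

def maximal_chains(n):
    """Number of maximal chains from the finest to the coarsest partition:
    with k classes left, there are C(k, 2) ways to choose the two to merge."""
    count = 1
    for k in range(2, n + 1):
        count *= comb(k, 2)
    return count

n = 7
print(bell_number(n))     # partitions of a 7-element set
print(maximal_chains(n))  # equals n! (n-1)! / 2^(n-1)
assert maximal_chains(n) == factorial(n) * factorial(n - 1) // 2 ** (n - 1)
```

The product form makes the agglomerative reading explicit: a maximal chain is exactly one possible run of binary merges, from n singleton classes down to a single class.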
Our method is focused on ascendant agglomerative hierarchical clustering
(AAHC). However, non-hierarchical clustering plays an important role in the
compression of data representation. This methodology addresses the problem of
clustering an object set and not that of an attribute set. Its philosophy is different
from that of hierarchical clustering. In these conditions, we describe in Chap. 2 two
fundamental and essentially different methods of non-hierarchical clustering. These
reflect two important families of non-hierarchical clustering algorithms: the
“central” partitions of S. Régnier and the “dynamic clustering” of E.
Diday. The latter is derived from a generalization of the “allocating and centring”
k-means algorithm, defined by D.J. Hall and G.H. Ball (see references of the
chapter concerned). This method is discussed in this chapter. On the other hand,
new theoretical and software developments are mentioned.
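The “allocating and centring” scheme can be sketched as follows. This is a minimal illustration under stated assumptions (squared Euclidean distance, initial centres supplied by the caller, at least one iteration), not the dynamic clustering method itself. The function name `k_means` and the toy data are hypothetical.

```python
def k_means(points, centres, iterations=10):
    """Alternate the allocation and centring steps until the centres are stable.
    Assumes iterations >= 1 and points given as tuples of equal length."""
    for _ in range(iterations):
        # Allocation: attach each point to its nearest centre.
        clusters = [[] for _ in centres]
        for p in points:
            nearest = min(
                range(len(centres)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centres[j])),
            )
            clusters[nearest].append(p)
        # Centring: replace each centre by the mean of its cluster
        # (an empty cluster keeps its previous centre).
        new_centres = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centre
            for cluster, centre in zip(clusters, centres)
        ]
        if new_centres == centres:
            break
        centres = new_centres
    return centres, clusters

points = [(0, 0), (0, 1), (5, 5), (6, 5)]
centres, clusters = k_means(points, [(0, 0), (6, 5)])
print(centres)
print(clusters)
```

The result depends on the initial centres, which is one reason the choice of starting configuration matters in this family of algorithms.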
For the mathematical data representation the descriptive attributes are interpreted
in terms of relations on the object set. Thereby, categorical attributes of any sort are
represented faithfully. In these conditions, numerical attributes are defined as valued
relations, whereas classical approaches propose a converse reasoning by
assigning, more or less arbitrarily, numerical values to categories.
In Chap. 3 we describe the set theoretic and relational representation of the data
description. All types of data can be taken into account. Two description levels are
considered: objects and categories. For each of the levels, object description and
category description, two attribute types are considered depending on the arity
of the representative relation on the object set, unary or binary. Notice that the arity
of the representative relation associated with a given attribute can be greater than
two; this case is also considered in our development. Thus, in this framework, we
define several structured attributes encountered in the observation of real data.
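As a small illustration of the relational point of view, the sketch below represents a Boolean attribute by a unary relation (a subset of the object set) and a nominal categorical attribute by a binary relation (the pairs of objects sharing a category). The object names and attribute values are hypothetical, and this is only one simple reading of the representation detailed in Chap. 3.

```python
from itertools import product

objects = ["o1", "o2", "o3", "o4"]

# Hypothetical observations: one Boolean and one nominal attribute.
is_winged = {"o1": True, "o2": False, "o3": True, "o4": False}
colour = {"o1": "red", "o2": "blue", "o3": "red", "o4": "green"}

# Unary relation: the subset of objects where the Boolean attribute is TRUE.
unary = {o for o in objects if is_winged[o]}

# Binary relation: the ordered pairs of objects sharing the same category.
binary = {(x, y) for x, y in product(objects, repeat=2) if colour[x] == colour[y]}

print(sorted(unary))
print(sorted(binary))
```

No numerical code is assigned to the categories: only the relation "same category" on the object set is retained.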
The fundamental concept of resemblance between data units (attributes, objects
or categories) is studied in Chaps. 4–7. It is based on a deep development of a
similarity notion between combinatorial structures. Invariance properties of a
statistical nature are set up. These lead to a constructive and unified theory of the
resemblance notion. Classical association coefficients such as the Goodman and
Kruskal, Kendall and Yule coefficients are clearly situated in the framework of this
theory. Two options are considered for normalization of the association coefficients
between descriptive attributes: standard deviation and maximum. A probability
scale, associated with the first normalization, is built in order to compare association
coefficients between attributes or similarity indices between objects (resp.,
categories). This scale is obtained by associating independent random data with the
observed one, the random model respecting the general characteristics of the data
observed. This comparison technique is a part of the likelihood linkage analysis
(LLA) clustering method where an observed value of a numerical similarity index is
situated with respect to how unlikely it is to be so large. Well-known non-parametric
statistical theorems are needed for the application of this approach to the attribute
comparison. New theorems are established. Based on the same principle, an index of
implication between Boolean attributes is set up. Also, we show how partial
association coefficients between structured categorical attributes are built.
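The idea of a probability scale can be sketched for two Boolean attributes. The sketch makes simplifying assumptions that are not taken from this preface: the raw index is the number of objects possessing both attributes, it is centred and reduced with a Poisson-type random model (mean and variance both equal to n(a) n(b) / n), and the standardized value is mapped to a probability by the standard normal distribution function. All function names are hypothetical.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def probabilistic_index(a, b, n):
    """a, b: sets of objects where each Boolean attribute is TRUE; n: number of objects.
    Assumption of this sketch: Poisson-type model, variance equal to the mean.
    Requires both sets to be non-empty."""
    raw = len(a & b)                         # observed co-occurrences
    expected = len(a) * len(b) / n           # mean under the no-relation hypothesis
    standardized = (raw - expected) / sqrt(expected)
    return normal_cdf(standardized)

n = 100
a = set(range(0, 40))    # attribute a is TRUE on objects 0..39
b = set(range(20, 60))   # attribute b is TRUE on objects 20..59
p = probabilistic_index(a, b, n)
print(round(p, 4))
```

A value close to 1 means that the observed number of co-occurrences would rarely be reached under the random model, which is the sense in which the index measures the unlikelihood of a large value.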
Comparing described objects is not equivalent to comparing descriptive
attributes. We show in Chap. 7 how the LLA approach enables similarity indices
between objects, described by heterogeneous attributes of different types, to be
built. We also show how comparing categories is a specific task.
The fascinating concept of “natural” cluster of objects cannot be defined
mathematically. Its realization in real cases is expected as a result derived from
application of clustering algorithms. Such a cluster is interpreted intuitively.
However, it is important to define it as accurately as possible. This definition is
necessarily a statistical one. Nevertheless, statistical formalization of a “natural”
cluster is very difficult. In Chap. 8 we address this concept. Statistical tools are
established for understanding the meaning of such a cluster. For this purpose, initial
description is examined for all types of data. Thus, the analysis of a “natural”
cluster is essentially analytical. Another way consists of crossing the target
cluster, associated with a “natural” cluster, with known and discriminant clusters that
are logically disjoint from it but statistically linked. A “natural” cluster is a part of a
“natural” clustering. Generally, this statistical structure underlies real data. However, it is
important to test this hypothesis for the data treated. In these conditions,
“classifiability” testing hypotheses are proposed and studied.
they were around preparing theses and subsequent articles. I especially thank them.
The theses defended at the University of Rennes 1 can be consulted at the link
address: Sadoc.abes.fr/Recherche avancée.
Collaborators
Acknowledgements
Philippe Louarn (INRIA-Rennes) has defined the general LaTeX structure with
respect to which I have composed this book. He helped me many times and his help
was always valuable. I am very grateful to him.
I cannot conclude these acknowledgments without special thanks to my
son-in-law Benjamin Enriquez (Professor of Mathematics at the University Louis
Pasteur of Strasbourg). I regularly used to inform him about the progress of my
writing. His encouragement and advice have always been very beneficial.
Contents
\[ \forall (x, y) \in O \times O, \quad x \, P \, y \iff x \text{ and } y \text{ are in the same class of the partition } P \]
Clearly, there is a bijective correspondence between $\mathcal{P}(O)$ and the set, designated
by $Eq(O)$, of equivalence relations on $O$. To simplify notations, we will denote below
by $\mathcal{P}$ the set introduced with the notation $\mathcal{P}(O)$.
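The graph of a binary relation $P$ on $O$ is the subset of $O \times O$ defined by
\[ Gr(P) = \{ (x, y) \in O \times O \mid x \, P \, y \} \]
[Figure: square grid over $O \times O$, with rows and columns labelled $o_a, o_b, o_c, o_d, o_e, o_f, o_g$.]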
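The correspondence between a partition and the graph of its equivalence relation can be checked on a small case. The sketch below is only an illustration; the function names are hypothetical and the seven-element object set simply reuses the labels of the grid above.

```python
def graph_of_partition(partition):
    """Set of ordered pairs (x, y) such that x and y lie in the same class."""
    return {(x, y) for block in partition for x in block for y in block}

def partition_of_graph(graph, elements):
    """Recover the classes as the sets of elements related to each element."""
    return {frozenset(y for y in elements if (x, y) in graph) for x in elements}

elements = ["oa", "ob", "oc", "od", "oe", "of", "og"]
P = [{"oa", "ob", "oc", "od"}, {"oe", "of", "og"}]

graph = graph_of_partition(P)
print(len(graph))  # one pair per cell of the two diagonal squares of the grid
recovered = partition_of_graph(graph, elements)
print(sorted(sorted(c) for c in recovered))
```

Going from the partition to the graph and back returns the original classes, which is the bijection between partitions and equivalence relations in concrete form.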
which means
\[ \forall (x, y) \in O \times O, \quad x \, P \, y \Rightarrow x \, P' \, y \]
is finer than
\[ P' = \{ \{o_a, o_b, o_c, o_d\}, \{o_e, o_f, o_g\} \}. \]
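The implication above says that the graph of the finer partition is included in the graph of the coarser one. Equivalently, every class of the finer partition is contained in a class of the coarser one, which is what the following sketch tests. The function name `is_finer` is hypothetical, and the finer partition `p` used here is chosen only for illustration.

```python
def is_finer(p, q):
    """True when every class of p is contained in some class of q."""
    return all(any(block <= other for other in q) for block in p)

p = [{"oa", "ob"}, {"oc", "od"}, {"oe", "of", "og"}]   # illustrative finer partition
q = [{"oa", "ob", "oc", "od"}, {"oe", "of", "og"}]     # the coarser partition above

print(is_finer(p, q))
print(is_finer(q, p))
```

The relation is reflexive, so a partition is finer than itself in this large sense. A strict comparison additionally requires the two partitions to differ.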
This order relation endows $\mathcal{P}$ with a lattice structure; that is to say, to every pair
$\{P, P'\}$ of elements of $\mathcal{P}$ there corresponds in $\mathcal{P}$ a greatest lower bound $P \wedge P'$
and a least upper bound $P \vee P'$.
$P \wedge P'$ can be defined by the graph of the associated equivalence relation
\[ Gr(P \wedge P') = Gr(P) \cap Gr(P') \]
where $Gr(P)$ (resp., $Gr(P')$) is the graph of the equivalence relation associated with
$P$ (resp., $P'$).
$P \vee P'$ can also be defined from its graph. $Gr(P \vee P')$ is the graph of the
transitive closure of the binary relation “$P$ or $P'$”. In more explicit words, for any
$(x, y) \in O \times O$, $x \, (P \vee P') \, y$ if and only if there exists a sequence $(z_0, z_1, \ldots, z_l)$,
where $z_0 = x$, $z_l = y$ and such that $z_i \, P \, z_{i+1}$ or $z_i \, P' \, z_{i+1}$, for $i = 0, 1, \ldots, l - 1$.
Example
Relative to above, consider $P = \{\{o_a, o_b, o_c, o_d\}, \{o_e, o_f\}, \{o_g\}\}$ and
$P' = \{\{o_a, o_b\}, \{o_c, o_d\}, \{o_e, o_f, o_g\}\}$. Then
\[ P \wedge P' = \{\{o_a, o_b\}, \{o_c, o_d\}, \{o_e, o_f\}, \{o_g\}\}, \quad P \vee P' = \{\{o_a, o_b, o_c, o_d\}, \{o_e, o_f, o_g\}\} \]
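The two bounds of this example can be recomputed with a short sketch. It works on the classes directly rather than on the graphs: the greatest lower bound is made of the non-empty intersections of classes, and the least upper bound is obtained by merging classes that overlap, which plays the role of the transitive closure. The function names `meet` and `join` are hypothetical.

```python
def meet(p, q):
    """Greatest lower bound: the non-empty pairwise intersections of classes."""
    return {a & b for a in p for b in q if a & b}

def join(p, q):
    """Least upper bound: merge the classes of p that are linked by a class of q."""
    blocks = [set(b) for b in p]
    for b in q:
        touching = [c for c in blocks if c & b]
        merged = set(b).union(*touching)
        blocks = [c for c in blocks if not (c & b)] + [merged]
    return {frozenset(c) for c in blocks}

P1 = {frozenset({"oa", "ob", "oc", "od"}), frozenset({"oe", "of"}), frozenset({"og"})}
P2 = {frozenset({"oa", "ob"}), frozenset({"oc", "od"}), frozenset({"oe", "of", "og"})}

print(sorted(sorted(c) for c in meet(P1, P2)))
print(sorted(sorted(c) for c in join(P1, P2)))
```

Here `P1` and `P2` stand for the two partitions of the example. The class {oe, of, og} of the second partition is what links {oe, of} and {og} in the join.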
Clearly, the lattice $\mathcal{P}$ depends only on the cardinality $n$ of $O$. The smallest
element of $\mathcal{P}$ is the finest partition, that for which each class is a “singleton” class,
including exactly one element of $O$. The biggest element of $\mathcal{P}$ is defined by the least
fine partition of $O$, comprising a single class which includes $O$ in its totality. The
finest and least fine partitions are considered as “trivial” partitions. They are called
partition of singletons and singleton partition, and will be denoted below by $P_s$ and
$P_t$, respectively.
(a) A partition $P'$ covers a partition $P$ if and only if
1. $P < P'$;
2. $\{Q \mid Q \in \mathcal{P},\ P < Q < P'\} = \,]P, P'[\, = \emptyset$.
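The covering relation can be tested without enumerating the open interval. The sketch below relies on a standard fact about the partition lattice, stated here as an assumption of the sketch rather than as a result quoted from the text: one partition covers another exactly when it is obtained from it by merging two classes, that is, when it is coarser and has exactly one class fewer. The function names are hypothetical.

```python
def is_finer(p, q):
    """True when every class of p is contained in some class of q."""
    return all(any(block <= other for other in q) for block in p)

def covers(q, p):
    """True when q covers p: p is finer than q and q has exactly one class fewer,
    so q results from merging two classes of p (assumed characterization)."""
    return is_finer(p, q) and len(q) == len(p) - 1

fine = [{"oa", "ob"}, {"oc", "od"}, {"oe", "of"}, {"og"}]
middle = [{"oa", "ob"}, {"oc", "od"}, {"oe", "of", "og"}]
coarse = [{"oa", "ob", "oc", "od"}, {"oe", "of", "og"}]

print(covers(middle, fine))   # one merge apart
print(covers(coarse, fine))   # two merges apart, so not a cover
```

In an ascendant agglomerative algorithm with binary merges, each partition of the chain covers the previous one.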