Solved Example
BIRCH Algorithm
Balanced Iterative Reducing And Clustering Using Hierarchies
Dr. Kailash Shaw & Dr. Sashikala Mishra
Symbiosis International University.
Introduction
BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) is an unsupervised data-mining algorithm used to
perform hierarchical clustering over particularly large data sets.
• The BIRCH algorithm takes as input a set of N data points, represented as real-valued vectors, and a desired number
of clusters K. It operates in four phases, the second of which is optional: that phase rebuilds a smaller CF tree while removing outliers and grouping crowded subclusters into larger ones.
• Phase 1: Load data into memory
Scan the database and load the data into memory by building a CF tree. If
memory is exhausted, rebuild the tree from the leaf nodes.
• Phase 2: Condense data
Shrink the data set by building a smaller CF tree.
Remove more outliers.
Condensing is optional.
• Phase 3: Global clustering
Use an existing clustering algorithm (e.g., k-means or hierarchical clustering) on the CF entries.
• Phase 4: Cluster refining
Refining is optional.
It fixes the problem with CF trees where data points with the same value may be assigned to different leaf entries.
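As a quick end-to-end illustration of these phases, here is a minimal sketch using scikit-learn's Birch estimator (assuming scikit-learn is available; its threshold and branching conventions differ in detail from the hand computation below, so the exact assignments may vary):

from sklearn.cluster import Birch
import numpy as np

# The ten points used in the worked example that follows.
X = np.array([[3, 4], [2, 6], [4, 5], [4, 7], [3, 8],
              [6, 2], [7, 2], [7, 4], [8, 4], [7, 9]])

# threshold plays the role of T; branching_factor caps the CF entries
# per node (the "Max Branch" of the example). n_clusters=None skips the
# global clustering of Phase 3 and returns the raw CF-tree subclusters.
model = Birch(threshold=1.5, branching_factor=2, n_clusters=None)
labels = model.fit_predict(X)
print(labels)                     # leaf/subcluster label per point
print(model.subcluster_centers_)  # centroids LS/N of the CF entries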
Example
We have the following data:
x1 = (3,4), x2 = (2,6), x3 = (4,5), x4 = (4,7), x5 = (3,8), x6 = (6,2), x7 = (7,2), x8 = (7,4), x9 = (8,4), x10 = (7,9)
Cluster the above data using the BIRCH algorithm, with threshold T = 1.5 (a point may join a cluster only if the resulting radius stays below T) and Max Branch = 2 (at most two CF entries per node).
For each data point we evaluate the radius and the cluster feature (CF).
-> Consider data point x1 = (3,4):
As it is alone in the feature map:
1. Radius = 0.
2. Cluster feature CF1 = <N, LS, SS>
N = 1, as there is one data point under consideration.
LS = linear sum of the points under consideration = (3,4)
SS = square sum of the points under consideration = (3², 4²) = (9,16)
3. Now construct the leaf with data point x1 and branch entry CF1.
CF1 <1, (3,4), (9,16)> -> Leaf: x1 = (3,4)
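The <N, LS, SS> bookkeeping above is easy to mechanize. The following is a minimal Python sketch (the name ClusterFeature and its methods are ours, not from the slides) that stores a CF and evaluates the per-dimension radius R = √((SS − LS²/N)/N) used in every step below:

from dataclasses import dataclass
import math

@dataclass
class ClusterFeature:
    n: int     # N: number of points summarized
    ls: tuple  # LS: per-dimension linear sum
    ss: tuple  # SS: per-dimension square sum

    @classmethod
    def from_point(cls, p):
        # A singleton CF: N = 1, LS = p, SS = p squared componentwise.
        return cls(1, tuple(p), tuple(v * v for v in p))

    def radius_if_added(self, p):
        # Per-dimension radius after tentatively absorbing p:
        # R = sqrt((SS - LS^2/N) / N), using the updated N, LS, SS.
        n = self.n + 1
        ls = [a + b for a, b in zip(self.ls, p)]
        ss = [a + b * b for a, b in zip(self.ss, p)]
        return tuple(math.sqrt((s - l * l / n) / n)
                     for s, l in zip(ss, ls))

    def add(self, p):
        # Commit the absorption: update N, LS and SS in place.
        self.n += 1
        self.ls = tuple(a + b for a, b in zip(self.ls, p))
        self.ss = tuple(a + b * b for a, b in zip(self.ss, p))

cf1 = ClusterFeature.from_point((3, 4))  # CF1 <1, (3,4), (9,16)>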
Example (contd.): same data, T = 1.5, Max Branch = 2.
-> Consider data point x2 = (2,6) against CF1:
1. Linear sum LS = (3,4) + (2,6) = (5,10)
2. Square sum SS = (3² + 2², 4² + 6²) = (13, 52)
Now evaluate the radius with N = 2:
R = √((SS − LS²/N)/N) = √(((13,52) − (5,10)²/2)/2) = √(((13,52) − (12.5,50))/2) = √((0.5,2)/2) = √((0.25,1)) = (0.5, 1)
As (0.5, 1) < (T, T), x2 clusters with the leaf containing x1.
2. Cluster feature CF1 <N, LS, SS> = <2, (5,10), (13,52)>
N = 2, as there are now two data points under CF1.
CF1 <2, (5,10), (13,52)> -> Leaf: x1 = (3,4), x2 = (2,6)
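This step can be checked numerically with the ClusterFeature sketch above:

cf1 = ClusterFeature.from_point((3, 4))
print(cf1.radius_if_added((2, 6)))  # (0.5, 1.0): both components < T = 1.5
cf1.add((2, 6))
print(cf1)  # ClusterFeature(n=2, ls=(5, 10), ss=(13, 52))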
Example (contd.): same data, T = 1.5, Max Branch = 2.
-> Consider data point x3 = (4,5) against CF1:
1. Linear sum LS = (4,5) + (5,10) = (9,15)
2. Square sum SS = (4² + 13, 5² + 52) = (29, 77)
Now evaluate the radius with N = 3:
R = √((SS − LS²/N)/N) = √(((29,77) − (9,15)²/3)/3) = √((2,2)/3) = (0.82, 0.82)
As (0.82, 0.82) < (T, T), x3 clusters with the leaf (x1, x2).
2. Cluster feature CF1 <N, LS, SS> = <3, (9,15), (29,77)>
N = 3, as there are now three data points under CF1.
CF1 <3, (9,15), (29,77)> -> Leaf: x1 = (3,4), x2 = (2,6), x3 = (4,5)
Example (contd.): same data, T = 1.5, Max Branch = 2.
-> Consider data point x4 = (4,7) against CF1:
1. Linear sum LS = (4,7) + (9,15) = (13,22)
2. Square sum SS = (4² + 29, 7² + 77) = (45, 126)
Now evaluate the radius with N = 4:
R = √((SS − LS²/N)/N) = √(((45,126) − (13,22)²/4)/4) = (0.83, 1.12)
As (0.83, 1.12) < (T, T), x4 clusters with the leaf (x1, x2, x3).
2. Cluster feature CF1 <N, LS, SS> = <4, (13,22), (45,126)>
N = 4, as there are now four data points under CF1.
CF1 <4, (13,22), (45,126)> -> Leaf: x1 = (3,4), x2 = (2,6), x3 = (4,5), x4 = (4,7)
Example (contd.): same data, T = 1.5, Max Branch = 2.
-> Consider data point x5 = (3,8) against CF1:
1. Linear sum LS = (3,8) + (13,22) = (16,30)
2. Square sum SS = (3² + 45, 8² + 126) = (54, 190)
Now evaluate the radius with N = 5:
R = √((SS − LS²/N)/N) = √(((54,190) − (16,30)²/5)/5) = (0.75, 1.41)
As (0.75, 1.41) < (T, T), x5 clusters with the leaf (x1, x2, x3, x4).
2. Cluster feature CF1 <N, LS, SS> = <5, (16,30), (54,190)>
N = 5, as there are now five data points under CF1.
CF1 <5, (16,30), (54,190)> -> Leaf: x1 = (3,4), x2 = (2,6), x3 = (4,5), x4 = (4,7), x5 = (3,8)
Example (contd.): same data, T = 1.5, Max Branch = 2.
-> Consider data point x6 = (6,2) against CF1:
1. Linear sum LS = (6,2) + (16,30) = (22,32)
2. Square sum SS = (6² + 54, 2² + 190) = (90, 194)
Now evaluate the radius with N = 6:
R = √((SS − LS²/N)/N) = √(((90,194) − (22,32)²/6)/6) = (1.24, 1.97)
As (1.24, 1.97) < (T, T) is false (1.97 ≥ T), x6 will not join CF1.
CF1 remains as in the previous step, and a new CF2 with leaf x6 is created.
2. Cluster feature CF2 <N, LS, SS> = <1, (6,2), (36,4)>
N = 1, as there is one data point under CF2.
LS = (6,2)
SS = (6², 2²) = (36,4)
CF1 <5, (16,30), (54,190)> -> Leaf: x1 = (3,4), x2 = (2,6), x3 = (4,5), x4 = (4,7), x5 = (3,8)
CF2 <1, (6,2), (36,4)> -> Leaf: x6 = (6,2)
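The rule applied to x6 (absorb into a CF only if every radius component stays below T, otherwise open a new CF) is the heart of Phase 1. A small sketch, reusing ClusterFeature from above (try_absorb is our own name):

T = 1.5  # threshold from the problem statement

def try_absorb(cf, p):
    # Absorb p only if every radius component after absorption is < T;
    # otherwise report failure so the caller can open a new CF entry.
    if all(r < T for r in cf.radius_if_added(p)):
        cf.add(p)
        return True
    return False

# For x6 = (6, 2) the tentative radius is (1.24, 1.97); 1.97 >= T, so
# try_absorb fails and ClusterFeature.from_point((6, 2)) starts CF2.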
Example (contd.): same data, T = 1.5, Max Branch = 2.
-> Consider data point x7 = (7,2). As there are two branch entries, CF1 and CF2, we first determine which centroid x7 is nearer to; the radius is then evaluated against that CF.
Centroid of CF1 = LS/N = (16,30)/5 = (3.2, 6), as there are N = 5 data points.
Centroid of CF2 = LS/N = (6,2)/1 = (6, 2), as there is N = 1 data point.
x7 is closer to (6,2) than to (3.2,6), so the radius is computed against CF2.
1. Linear sum LS = (7,2) + (6,2) = (13,4)
2. Square sum SS = (7² + 36, 2² + 4) = (85, 8)
Now evaluate the radius with N = 2:
R = √((SS − LS²/N)/N) = √(((85,8) − (13,4)²/2)/2) = (0.5, 0)
As (0.5, 0) < (T, T), x7 joins CF2.
2. Cluster feature CF2 <N, LS, SS> = <2, (13,4), (85,8)>
N = 2, as there are now two data points under CF2.
CF1 <5, (16,30), (54,190)> -> Leaf: x1 = (3,4), x2 = (2,6), x3 = (4,5), x4 = (4,7), x5 = (3,8)
CF2 <2, (13,4), (85,8)> -> Leaf: x6 = (6,2), x7 = (7,2)
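Once there is more than one CF entry, the candidate is chosen by centroid distance before the radius test, exactly as done for x7. A small sketch (centroid and nearest_cf are our own names):

def centroid(cf):
    # The centroid of a CF entry is LS / N.
    return tuple(v / cf.n for v in cf.ls)

def nearest_cf(cfs, p):
    # Pick the CF whose centroid is closest to p; squared Euclidean
    # distance suffices for the comparison. The radius test is then
    # applied only to the winner.
    return min(cfs, key=lambda cf: sum((c - x) ** 2
                                       for c, x in zip(centroid(cf), p)))

# For x7 = (7, 2): centroids are (3.2, 6) for CF1 and (6, 2) for CF2,
# so CF2 is selected and x7 then passes its radius test there.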
Example (contd.): same data, T = 1.5, Max Branch = 2.
-> Consider data point x8 = (7,4). As there are two branch entries, CF1 and CF2, we first determine which centroid x8 is nearer to; the radius is then evaluated against that CF.
Centroid of CF1 = LS/N = (16,30)/5 = (3.2, 6), as there are N = 5 data points.
Centroid of CF2 = LS/N = (13,4)/2 = (6.5, 2), as there are N = 2 data points.
x8 is closer to (6.5,2) than to (3.2,6), so the radius is computed against CF2.
1. Linear sum LS = (7,4) + (13,4) = (20,8)
2. Square sum SS = (7² + 85, 4² + 8) = (134, 24)
Now evaluate the radius with N = 3:
R = √((SS − LS²/N)/N) = √(((134,24) − (20,8)²/3)/3) = (0.47, 0.94)
As (0.47, 0.94) < (T, T), x8 joins CF2.
2. Cluster feature CF2 <N, LS, SS> = <3, (20,8), (134,24)>
N = 3, as there are now three data points under CF2.
CF1 <5, (16,30), (54,190)> -> Leaf: x1 = (3,4), x2 = (2,6), x3 = (4,5), x4 = (4,7), x5 = (3,8)
CF2 <3, (20,8), (134,24)> -> Leaf: x6 = (6,2), x7 = (7,2), x8 = (7,4)
Example (contd.): same data, T = 1.5, Max Branch = 2.
-> Consider data point x9 = (8,4). As there are two branch entries, CF1 and CF2, we first determine which centroid x9 is nearer to; the radius is then evaluated against that CF.
Centroid of CF1 = LS/N = (16,30)/5 = (3.2, 6), as there are N = 5 data points.
Centroid of CF2 = LS/N = (20,8)/3 = (6.67, 2.67), as there are N = 3 data points.
x9 is closer to (6.67,2.67) than to (3.2,6), so the radius is computed against CF2.
1. Linear sum LS = (8,4) + (20,8) = (28,12)
2. Square sum SS = (8² + 134, 4² + 24) = (198, 40)
Now evaluate the radius with N = 4:
R = √((SS − LS²/N)/N) = √(((198,40) − (28,12)²/4)/4) = (0.70, 1)
As (0.70, 1) < (T, T), x9 joins CF2.
2. Cluster feature CF2 <N, LS, SS> = <4, (28,12), (198,40)>
N = 4, as there are now four data points under CF2.
CF1 <5, (16,30), (54,190)> -> Leaf: x1 = (3,4), x2 = (2,6), x3 = (4,5), x4 = (4,7), x5 = (3,8)
CF2 <4, (28,12), (198,40)> -> Leaf: x6 = (6,2), x7 = (7,2), x8 = (7,4), x9 = (8,4)
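One more property is needed before the final point: CF entries are additive, so a parent entry summarizing two children is obtained by componentwise addition. A one-function sketch (merge is our own name):

def merge(a, b):
    # CF additivity: <N, LS, SS> entries combine by componentwise sums.
    return ClusterFeature(a.n + b.n,
                          tuple(x + y for x, y in zip(a.ls, b.ls)),
                          tuple(x + y for x, y in zip(a.ss, b.ss)))

# merge(CF1, CF2) with the values above gives <9, (44,42), (252,230)>,
# which is exactly the CF12 entry that appears after the split below.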
Example (contd.): same data, T = 1.5, Max Branch = 2.
-> Consider data point x10 = (7,9). As there are two branch entries, CF1 and CF2, we first determine which centroid x10 is nearer to; the radius is then evaluated against that CF.
Centroid of CF1 = LS/N = (16,30)/5 = (3.2, 6), as there are N = 5 data points.
Centroid of CF2 = LS/N = (28,12)/4 = (7, 3), as there are N = 4 data points.
x10 is closer to (3.2,6) than to (7,3), so the radius is computed against CF1.
1. Linear sum LS = (7,9) + (16,30) = (23,39)
2. Square sum SS = (7² + 54, 9² + 190) = (103, 271)
Now evaluate the radius with N = 6:
R = √((SS − LS²/N)/N) = √(((103,271) − (23,39)²/6)/6) = (1.57, 1.70)
As (1.57, 1.70) < (T, T) is false, x10 becomes a new leaf with a new cluster feature, CF3. But a branch may hold only two CF entries (Max Branch = 2), so the branch must split.
2. Cluster feature CF3 <N, LS, SS> = <1, (7,9), (49,81)>
After the split, the root holds two entries: CF12 <9, (44,42), (252,230)> (the componentwise sum of CF1 and CF2) and CF3 <1, (7,9), (49,81)>.
Under CF12:
CF1 <5, (16,30), (54,190)> -> Leaf: x1 = (3,4), x2 = (2,6), x3 = (4,5), x4 = (4,7), x5 = (3,8)
CF2 <4, (28,12), (198,40)> -> Leaf: x6 = (6,2), x7 = (7,2), x8 = (7,4), x9 = (8,4)
Under CF3:
Leaf: x10 = (7,9)
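Putting the helpers together, a compact driver (our own sketch, reusing ClusterFeature, try_absorb, and nearest_cf from above) reproduces the hand-derived leaves {x1..x5}, {x6..x9}, and {x10}:

points = [(3, 4), (2, 6), (4, 5), (4, 7), (3, 8),
          (6, 2), (7, 2), (7, 4), (8, 4), (7, 9)]
cfs, members = [], []  # CF entries and, for display only, their points
for p in points:
    if cfs:
        cf = nearest_cf(cfs, p)       # choose the candidate by centroid
        if try_absorb(cf, p):         # radius test against T = 1.5
            members[cfs.index(cf)].append(p)
            continue
    cfs.append(ClusterFeature.from_point(p))  # open a new CF / leaf
    members.append([p])
for cf, pts in zip(cfs, members):
    print(f"CF<{cf.n}, {cf.ls}, {cf.ss}> -> {pts}")

# Output: <5,(16,30),(54,190)>, <4,(28,12),(198,40)>, <1,(7,9),(49,81)>,
# matching CF1, CF2 and CF3. This flat loop omits the branch-split
# bookkeeping; with Max Branch = 2 the third CF forces the split above.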
Thank You