Analysing Online Social Network
  Data with Biclustering and
          Triclustering
Alexander Semenov1        Dmitry Ignatov1
 Dmitry Gnatyshak1       Jonas Poelmans2,1

   1NRU   Higher School of Economics, Russia
              2KU Leuven, Belgium
Motivation I
• There is a large amount of network data that can
  be represented as bipartite and tripartite graphs
• Standard techniques such as maximal biclique
  search produce a huge number of patterns (in the
  worst case, exponential in the input size)
• Therefore we need a relaxation of this notion
  and good interestingness measures for biclique
  communities
Motivation II
• Applied lattice theory provides us with the notion of a formal concept, which
  corresponds to a maximal biclique
• L. C. Freeman, D. R. White. Using Galois Lattices to Represent Network
  Data. Sociological Methodology 23, 1993.
• Social Networks 18(3), 1996
    – L. C. Freeman, Cliques, Galois Lattices, and the Structure of Human Social
      Groups.
    – V. Duquenne, Lattice analysis and the representation of handicap
      associations.
    – D. R. White. Statistical entailments and the Galois lattice.
• J. W. Mohr, V. Duquenne. The duality of culture and practice: Poverty relief in
  New York City, 1888-1917. Theory and Society, 1997
• Camille Roth et al., Towards Concise Representation for Taxonomies of
  Epistemic Communities, CLA 4th Intl Conf on Concept Lattices and their
  Applications, 2006
• And many other papers on applying FCA to social network analysis
Motivation III
• A concept-based bicluster (Ignatov et al., 2010) is a
  scalable approximation of a formal concept
  (biclique)
   – Fewer patterns to analyze
   – Less computation time (polynomial vs. exponential)
   – Manual tuning of the bicluster (community) density
     threshold
   – Tolerance to missing (object, attribute) pairs
• For analyzing three-way network data such as
  folksonomies we proposed triclustering
  (Ignatov et al., 2011)
Formal Concept Analysis
           [Wille, 1982, Ganter & Wille, 1999]

Definition 1. A formal context is a triple (G, M, I), where G is a
 set of (formal) objects, M is a set of (formal) attributes, and
 $I \subseteq G \times M$ is the incidence relation: $(g, m) \in I$ means that object
 $g \in G$ possesses attribute $m \in M$.

Example
         Car     House   Laptop  Bicycle
Kate     x                       x
Mike     x               x
Alex             x       x
David            x       x       x
Formal Concept Analysis
 Definition 2. Derivation operators (defining a Galois connection), for $A \subseteq G$ and $B \subseteq M$:
 $A^I := \{ m \in M \mid gIm \text{ for all } g \in A \}$ is the set of attributes common to all
      objects in A
 $B^I := \{ g \in G \mid gIm \text{ for all } m \in B \}$ is the set of objects that have all
      attributes from B

Example

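         Car     House   Laptop  Bicycle
Kate     x                       x
Mike     x               x
Alex             x       x
David            x       x       x

$\{\mathrm{Kate}, \mathrm{Mike}\}^I = \{\mathrm{Car}\}$
$\{\mathrm{Laptop}\}^I = \{\mathrm{Mike}, \mathrm{Alex}, \mathrm{David}\}$
$\{\mathrm{Car}, \mathrm{House}\}^I = \emptyset_G$ (the empty set of objects)
$\emptyset_G^I = M$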
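The two derivation operators can be sketched in a few lines of Python. This is a minimal illustration on the toy context from the example; the names `CONTEXT`, `ATTRIBUTES`, `common_attributes` and `common_objects` are chosen here for illustration and do not come from the slides.

```python
# Toy context from the example: each object maps to the attributes it possesses.
CONTEXT = {
    "Kate": {"Car", "Bicycle"},
    "Mike": {"Car", "Laptop"},
    "Alex": {"House", "Laptop"},
    "David": {"House", "Laptop", "Bicycle"},
}
ATTRIBUTES = {"Car", "House", "Laptop", "Bicycle"}  # the set M


def common_attributes(objects):
    """A^I: the attributes shared by every object in `objects`."""
    result = set(ATTRIBUTES)  # the empty set of objects maps to all of M
    for g in objects:
        result &= CONTEXT[g]
    return result


def common_objects(attributes):
    """B^I: the objects that possess every attribute in `attributes`."""
    wanted = set(attributes)
    return {g for g, attrs in CONTEXT.items() if wanted <= attrs}


print(sorted(common_attributes({"Kate", "Mike"})))  # ['Car']
print(sorted(common_objects({"Laptop"})))           # ['Alex', 'David', 'Mike']
print(sorted(common_objects({"Car", "House"})))     # []
```

Storing the context as a mapping from objects to attribute sets keeps both operators down to plain set operations.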
Formal Concept Analysis
Definition 3. (A, B) is a formal concept of (G, M, I) iff
                    $A \subseteq G$, $B \subseteq M$, $A^I = B$, and $B^I = A$.
A is the extent and B is the intent of the concept (A, B).
$\mathfrak{B}(G, M, I)$ is the set of all concepts of the context (G, M, I).


Example

         Car     House   Laptop  Bicycle
Kate     x                       x
Mike     x               x
Alex             x       x
David            x       x       x

•   The pair ({Kate, Mike}, {Car}) is a formal concept
•   ({Alex, David}, {Laptop}) is not a formal concept, because
    $\{\mathrm{Laptop}\}^I \neq \{\mathrm{Alex}, \mathrm{David}\}$
•   ({Alex, David}, {House, Laptop}) is a formal concept
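For a context this small, all formal concepts can be listed by brute force: close every subset of objects and collect the resulting (extent, intent) pairs. The sketch below is a naive enumeration for illustration only (exponential in the number of objects), not the algorithm used by the authors; all function names are assumptions.

```python
from itertools import combinations

# Toy context from the example: each object maps to the attributes it possesses.
CONTEXT = {
    "Kate": {"Car", "Bicycle"},
    "Mike": {"Car", "Laptop"},
    "Alex": {"House", "Laptop"},
    "David": {"House", "Laptop", "Bicycle"},
}
ATTRIBUTES = {"Car", "House", "Laptop", "Bicycle"}


def intent_of(objects):
    """A^I: attributes common to all objects in `objects`."""
    result = set(ATTRIBUTES)
    for g in objects:
        result &= CONTEXT[g]
    return result


def extent_of(attributes):
    """B^I: objects having all attributes in `attributes`."""
    return {g for g, attrs in CONTEXT.items() if set(attributes) <= attrs}


def formal_concepts():
    """Return every pair (A, B) with A^I = B and B^I = A."""
    concepts = set()
    names = sorted(CONTEXT)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            b = intent_of(subset)   # close the subset of objects ...
            a = extent_of(b)        # ... to obtain a genuine extent
            concepts.add((frozenset(a), frozenset(b)))
    return concepts


for extent, intent in sorted(formal_concepts(), key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent), sorted(intent))
```

Each subset A is mapped to (A^II, A^I), which is always a formal concept, and every concept arises this way from its own extent.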
FCA and Graphs
        a   b    c   d
Kate    x            x
Mike    x        x
Alex        x    x
David       x    x   x



   Formal context        Bipartite graph
   Formal concept        Biclique
 (maximal rectangle)
Formal Concept Analysis
 Definition 4. A formal concept (A, B) is said to be more general than (C, D), that
    is $(A, B) \geq (C, D)$, iff $A \supseteq C$ (equivalently $D \supseteq B$).
 The set of all concepts of the context (G, M, I) ordered by the relation $\leq$ forms a
   complete lattice $\mathfrak{B}(G, M, I)$ called the concept lattice (Galois lattice).


Example

         Car     House   Laptop  Bicycle
Kate     x                       x
Mike     x               x
Alex             x       x
David            x       x       x

({Alex, David, Mike}, {Laptop}) is more general than the concept
({Alex, David}, {House, Laptop})
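The generality order can be checked on either component of a concept. The short sketch below compares the two concepts from the example by extent and by intent; the function names are illustrative assumptions.

```python
def is_more_general(concept, other):
    """(A, B) >= (C, D) checked on extents: A contains C."""
    return concept[0] >= other[0]


def is_more_general_by_intent(concept, other):
    """The same order checked on intents: D contains B."""
    return other[1] >= concept[1]


# The two concepts from the example, written as (extent, intent) pairs.
upper = (frozenset({"Alex", "David", "Mike"}), frozenset({"Laptop"}))
lower = (frozenset({"Alex", "David"}), frozenset({"House", "Laptop"}))

print(is_more_general(upper, lower))            # True
print(is_more_general_by_intent(upper, lower))  # True
print(is_more_general(lower, upper))            # False
```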
Concept Lattice Diagram




        a   b   c   d
Kate    x           x
Mike    x       x
Alex        x   x
David       x   x   x

a - Car, b - House, c - Laptop, d - Bicycle
Biclustering


Geometrical interpretation
Biclustering Example
                           Car   House    Laptop Bicycle
                   Kate    x                       x
                   Mike    x              x
                   Alex          x        x
                   David         x        x        x



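Since the pair (David, House) belongs to I, it generates the bicluster
$(\mathrm{House}^I, \mathrm{David}^I) = (\{\mathrm{Alex}, \mathrm{David}\}, \{\mathrm{House}, \mathrm{Laptop}, \mathrm{Bicycle}\})$
with density $\rho(\mathrm{House}^I, \mathrm{David}^I) = 5/6$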
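The example can be reproduced with a short sketch. It assumes, as read off from the example itself, that the bicluster generated by a pair (g, m) in I is (m^I, g^I) and that its density is the share of its cells that belong to I; the function names are illustrative.

```python
from fractions import Fraction

# Toy context from the example: each object maps to the attributes it possesses.
CONTEXT = {
    "Kate": {"Car", "Bicycle"},
    "Mike": {"Car", "Laptop"},
    "Alex": {"House", "Laptop"},
    "David": {"House", "Laptop", "Bicycle"},
}


def bicluster(g, m):
    """Bicluster (m^I, g^I) generated by a pair (g, m) that belongs to I."""
    if m not in CONTEXT[g]:
        raise ValueError(f"({g}, {m}) is not in the context")
    extent = {h for h, attrs in CONTEXT.items() if m in attrs}  # m^I
    intent = set(CONTEXT[g])                                    # g^I
    return extent, intent


def density(extent, intent):
    """Share of cells of extent x intent that are present in I."""
    filled = sum(1 for g in extent for m in intent if m in CONTEXT[g])
    return Fraction(filled, len(extent) * len(intent))


extent, intent = bicluster("David", "House")
print(sorted(extent), sorted(intent))  # ['Alex', 'David'] ['Bicycle', 'House', 'Laptop']
print(density(extent, intent))         # 5/6
```

The only empty cell of this bicluster is (Alex, Bicycle), which is why the density is 5/6 rather than 1.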
Biclustering properties
• The number of all biclusters for a context (G, M, I) is
  not greater than |I|, versus up to $2^{\min\{|G|,|M|\}}$ formal
  concepts. Usually $|I| \ll 2^{\min\{|G|,|M|\}}$, especially
  for sparse contexts.
• Dense biclusters ($\rho(\text{bicluster}) \geq \rho_{\min}$)
  are probably a good representation of communities,
  because all users in the extent of every
  dense bicluster have almost all interests from
  its intent.
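Both properties can be illustrated on the toy context used earlier: generating one bicluster per pair of I gives at most |I| distinct biclusters, and a density threshold then keeps only the dense ones. This is a sketch under the assumption that a bicluster is (m^I, g^I) with density equal to the share of filled cells; `min_density` stands for the threshold and is a name chosen here.

```python
from fractions import Fraction

# Toy context from the earlier example: object -> attributes.
CONTEXT = {
    "Kate": {"Car", "Bicycle"},
    "Mike": {"Car", "Laptop"},
    "Alex": {"House", "Laptop"},
    "David": {"House", "Laptop", "Bicycle"},
}


def density(extent, intent):
    """Share of cells of extent x intent that are present in I."""
    filled = sum(1 for g in extent for m in intent if m in CONTEXT[g])
    return Fraction(filled, len(extent) * len(intent))


def all_biclusters():
    """One bicluster (m^I, g^I) per pair (g, m) in I, duplicates merged."""
    found = set()
    for g, attrs in CONTEXT.items():
        for m in attrs:
            extent = frozenset(h for h, a in CONTEXT.items() if m in a)
            found.add((extent, frozenset(attrs)))
    return found


def dense_biclusters(min_density):
    """Biclusters whose density reaches the threshold."""
    return {b for b in all_biclusters() if density(*b) >= min_density}


pairs_in_context = sum(len(attrs) for attrs in CONTEXT.values())  # |I|
print(len(all_biclusters()) <= pairs_in_context)  # True
for extent, intent in dense_biclusters(Fraction(4, 5)):
    print(sorted(extent), sorted(intent), density(extent, intent))
```

With the threshold set to 1, only biclusters that are full rectangles survive.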
Triadic FCA and Folksonomies
 Definition 1. A triadic formal context is a quadruple (G, M, B, Y), where G is a
   set of (formal) objects, M is a set of (formal) attributes, B is a set of conditions,
   and $Y \subseteq G \times M \times B$ is the incidence relation: $(g, m, b) \in Y$ means that object $g \in G$
   possesses attribute $m \in M$ under condition $b \in B$.


Example. A folksonomy as a triadic context (U, T, R, Y), where
U is a set of users
T is a set of tags
R is a set of resources
Concept forming operators in triadic case




To define triclusters we propose box operators
Triclustering
              [Ignatov et al., 2011]




One dense tricluster vs. $3^3 = 27$ formal triconcepts
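The count of 27 can be reproduced by brute force. The slide's picture is not available in the text, so the sketch below assumes the context is a 3 x 3 x 3 cube with the three main-diagonal triples removed, and it treats a triconcept as a maximal box X x Y x Z contained in the incidence relation (empty components allowed). All names are illustrative.

```python
from itertools import combinations, product

ELEMENTS = (0, 1, 2)
# Assumed context: the full 3x3x3 cube minus the main diagonal (i, i, i).
TRIPLES = {t for t in product(ELEMENTS, repeat=3) if len(set(t)) > 1}


def subsets(items):
    """All subsets of `items` as frozensets."""
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_box(x, y, z):
    """True if every triple of x * y * z is in the incidence relation."""
    return all(t in TRIPLES for t in product(x, y, z))


def is_maximal(x, y, z):
    """True if no single element can be added to any component."""
    for e in ELEMENTS:
        if e not in x and is_box(x | {e}, y, z):
            return False
        if e not in y and is_box(x, y | {e}, z):
            return False
        if e not in z and is_box(x, y, z | {e}):
            return False
    return True


triconcepts = [box for box in product(subsets(ELEMENTS), repeat=3)
               if is_box(*box) and is_maximal(*box)]

print(len(triconcepts))        # 27
print(len(TRIPLES), "of", 27)  # 24 of 27 cells are filled
```

Under this assumption the whole cube, taken as a single tricluster, has density 24/27, so one dense pattern summarizes what exact triadic analysis splits into 27 triconcepts.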
Pseudo Triclustering for Social
          Networks
Algorithm
Data
The pseudo-triclustering algorithm was tested on data from
VKontakte, a Russian social networking site. Students of two major
technical universities and two universities for humanities and sociology were
considered:


                               Bauman   MIPT    RSUH    RSSU
        # users                18542    4786    10266   12281
        # interests             8118    2593    5892    3733
        # groups               153985   46312   95619   102046
Biclustering results
                   Bauman                              MIPT                                RSUH                                RSSU
            UI                UG                UI                UG                UI                UG                UI                UG
         Time        #     Time        #     Time        #     Time        #     Time        #     Time        #     Time        #     Time        #
0.0      9188     8863  1874458   248077      863     2492   109012    46873     3958     5293   519772   116882     2588     4014   693658   145086
0.1      8882     8331  1296056   173786      827     2401    91187    38226     3763     4925   419145    93219     2450     3785   527135   110964
0.2      8497     6960   966000   120075      780     2015    74498    28391     3656     4003   330371    68709     2369     3220   402159    79802
0.3      8006     5513   788008    85227      761     1600    63888    21152     3361     3123   275394    50650     2284     2612   332523    58321
0.4      7700     4308   676733    59179      705     1270    56365    15306     3252     2399   232154    35434     2184     2037   281164    40657
0.5      7536     3777   654047    53877      668     1091    54868    13828     3189     2087   224808    32578     2179     1782   270605    37244
0.6      7324     2718   522110    18586      670      775    44850     5279     3075     1367   174657    10877     2159     1264   211897    12908
0.7      7250     2409   511711    15577      743      676    43854     4399     3007     1224   171554     9171     2084     1109   208632    10957
0.8      7217     2326   508368    14855      663      654    43526     4215     3032     1188   170984     8742     2121     1081   209084    10503
0.9      7246     2314   507983    14691      669      647    43216     4157     2985     1180   174781     8649     2096     1072   206902    10422
1.0      7236     2309   511466    14654      669      647    43434     4148     3057     1177   173240     8635     2086     1068   207198    10408
Pseudo triclustering results
           Bauman                MIPT                RSUH                RSSU
      Time, ms     Count  Time, ms     Count  Time, ms     Count  Time, ms     Count
0.0    3353426    230161     77562     24852    256801     35275    183595     55338
0.1      76758     10928     35137      5969     62736      5679     18725      5582
0.2      80647      8539     31231      4908     58695      5089     16466      3641
0.3      77956      6107     27859      3770     53789      3865     17448      2772
0.4      60929        31      2060        12      9890        14     13585        12
0.5      66709        24      2327        10      9353        14     12776        10
0.6      57803        22      2147         8     11352        14     12268        10
0.7      68361        18      2333         8     10778        12     13819         4
0.8      70948        18      2256         8      9489        12     13725         4
0.9      65527        18      1942         8     10769        12     11705         4
1.0      65991        18      1971         8     10763        12     13263         4
Pseudo triclustering results
[Figure: four plots, one each for Bauman, MIPT, RSUH and RSSU. Horizontal axes run from 0 to 1 in steps of 0.1; vertical axes are logarithmic, from 1 to 1000000 for Bauman and from 1 to 100000 for the other three.]
Examples. Biclusters
• $\rho$ = 83.33%   Gen. pair: {3609, home}
G: {3609, 4566} M: {family, work, home}
• $\rho$ = 83.33%   Gen. pair: {30568, orthodox church}
G: {25092, 30568} M: {music, monastery, orthodox church}
• $\rho$ = 100%     Gen. pair: {4220, beauty}
G: {1269, 4220, 5337, 20787} M: {love, beauty}
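The reported percentages can be checked by hand, assuming the density of a bicluster (A, B) is the share of cells of A x B that belong to the incidence relation I. Here A stands for the listed set G and B for the listed set M of each bicluster.

```latex
\rho(A, B) = \frac{|I \cap (A \times B)|}{|A| \cdot |B|}, \qquad
\rho_1 = \rho_2 = \frac{5}{2 \cdot 3} = \frac{5}{6} \approx 83.33\%, \qquad
\rho_3 = \frac{8}{4 \cdot 2} = 1 = 100\%
```

Under this assumption, each of the first two biclusters is missing exactly one (user, interest) pair, while the third is a full rectangle.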
Examples. Tricluster
Conclusion
• The pseudo-triclustering method can be used
  for tagging groups by interests in social
  networking sites and for finding tricommunities.
  E.g., if we have found a dense pseudo-tricluster
  (Users, Groups, Interests), we can mark Groups with
  user interests from Interests.
• It also makes sense to use biclusters and triclusters
  for making recommendations. Missing pairs and
  triples seem to be good candidates for
  recommending potentially interesting
  users, groups and interests.
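The recommendation idea for biclusters can be sketched as follows: inside a dense bicluster, every (object, attribute) cell that is absent from the context becomes a candidate recommendation. The example reuses the toy context and the bicluster ({Alex, David}, {House, Laptop, Bicycle}) from the earlier slides; `missing_pairs` is a hypothetical helper name, not part of the authors' software.

```python
# Toy context from the earlier example: object -> attributes.
CONTEXT = {
    "Kate": {"Car", "Bicycle"},
    "Mike": {"Car", "Laptop"},
    "Alex": {"House", "Laptop"},
    "David": {"House", "Laptop", "Bicycle"},
}


def missing_pairs(extent, intent, context):
    """Cells of extent x intent that are absent from the context."""
    return sorted((g, m) for g in extent for m in intent
                  if m not in context[g])


# Bicluster generated by the pair (David, House), density 5/6.
extent = {"Alex", "David"}
intent = {"House", "Laptop", "Bicycle"}

print(missing_pairs(extent, intent, CONTEXT))  # [('Alex', 'Bicycle')]
```

Here the single missing pair suggests recommending Bicycle to Alex, since the other member of the community already has it.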
Conclusion
• The approach needs some improvement and fine-tuning
  to increase its scalability and the quality of
  communities
   – Strategies for approximate density calculation
   – Choosing good thresholds for n-cluster density and
     community similarity
   – More sophisticated quality measures, like recall and
     precision in Information Retrieval
• It needs comparison with other approaches like iceberg
  lattices (Stumme), stable concepts (Kuznetsov),
  fault-tolerant concepts (Boulicaut) and different n-clustering
  techniques from bioinformatics (Zaki, Mirkin, etc.)
• The current version also requires expert feedback on the
  analysis and interpretation of the output data
Questions?
