Stochastic block model: Difference between revisions

Content deleted Content added
OAbot (talk | contribs)
m Open access bot: url-access=subscription updated in citation with #oabot.
 
(25 intermediate revisions by 14 users not shown)
Line 1:
{{Short description|Concept in network science}}
The '''stochastic block model''' is a [[generative model]] for random [[Graph (discrete mathematics)|graphs]]. This model tends to produce graphs containing ''communities'', subsets characterized by being connected with one another with particular edge densities. For example, edges may be more common within communities than between communities. Its mathematical formulation has been firstly introduced in 1983 in the field of social network by Holland et al.<ref name="hol" /> The stochastic block model is important in [[statistics]], [[machine learning]], and [[network science]], where it serves as a useful benchmark for the task of recovering [[community structure]] in graph data.
{{Network science}}
 
The '''stochastic [[Blockmodel|block model]]''' is a [[generative model]] for random [[Graph (discrete mathematics)|graphs]]. This model tends to produce graphs containing ''communities'', subsets of nodes characterized by being connected with one another with particular edge densities. For example, edges may be more common within communities than between communities. Its mathematical formulation haswas been firstlyfirst introduced in 1983 in the field of social network analysis by [[Paul W. Holland]] et al.<ref name="hol" /> The stochastic block model is important in [[statistics]], [[machine learning]], and [[network science]], where it serves as a useful benchmark for the task of recovering [[community structure]] in graph data.
 
== Definition ==
Line 15 ⟶ 18:
The ''planted partition model'' is the special case that the values of the probability matrix <math>P</math> are a constant <math>p</math> on the diagonal and another constant <math>q</math> off the diagonal. Thus two vertices within the same community share an edge with probability <math>p</math>, while two vertices in different communities share an edge with probability <math>q</math>. Sometimes it is this restricted model that is called the stochastic block model. The case where <math>p > q</math> is called an ''assortative'' model, while the case <math>p < q</math> is called ''disassortative''.
 
Returning to the general stochastic block model, a model is called ''strongly assortative'' if <math>P_{ii} > P_{jk}</math> whenever <math>j \neq k</math>: all diagonal entries dominate all off-diagonal entries. A model is called ''weakly assortative'' if <math>P_{ii} > P_{ij}</math> whenever <math>i \neq j</math>: each diagonal entry is only required to dominate the rest of its own row and column.<ref name="al14" /> ''Disassortative'' forms of this terminology exist, by reversing all inequalities. AlgorithmicFor some algorithms, recovery ismight oftenbe easier againstfor block models with assortative or disassortative conditions of this form.<ref name="al14" />
 
== Typical statistical tasks ==
Line 49 ⟶ 52:
 
== Topic models ==
Stochastic Blockblock Modelmodel have been recognised to be a [[topic model]] on bipartite networks.<ref name="gerlachnetwork">{{cite journal
| author1 = Martin Gerlach |author2=Tiago Peixoto |author3=Eduardo Altmann
| title = A network approach to topic models
Line 55 ⟶ 58:
| journal=Science Advances
| date = 2018 |doi = 10.1126/sciadv.aaq1360 | volume=4 | issue=7
|pmid=30035215 |pmc=6051742 |arxiv=1708.01677 |bibcode=2018SciA....4.1360G }}</ref> In a network of documents and words, Stochastic Blockblock Modelmodel can identify topics: group of words with a similar meaning.
 
== Extensions to signed graphs ==
Signed graphs allow for both favorable and adverse relationships and serve as a common model choice for various data analysis applications, e.g., correlation clustering. The stochastic block model can be trivially extended to signed graphs by assigning both positive and negative edge weights or equivalently using a difference of adjacency matrices of two stochastic block models.
<ref>{{cite book
| author1=Alyson Fox |author2=Geoffrey Sanders |author3=Andrew Knyazev
|title=2018 IEEE High Performance extreme Computing Conference (HPEC) |chapter=Investigation of Spectral Clustering for Signed Graph Matrix Representations | date = 2018 |pages=1–7 |doi = 10.1109/HPEC.2018.8547575|osti=1476177 |isbn=978-1-5386-5989-2 |s2cid=54443034 }}</ref>
 
== DARPA/MIT/AWS Graph Challenge: streaming stochastic block partition ==
GraphChallenge<ref>[http://graphchallenge.mit.edu] {{Webarchive|url=https://web.archive.org/web/20230204160402/http://graphchallenge.mit.edu/|date=2023-02-04}} DARPA/MIT/AWS Graph Challenge</ref> encourages community approaches to developing new solutions for analyzing graphs and sparse data derived from social media, sensor feeds, and scientific data to enable relationships between events to be discovered as they unfold in the field. Streaming stochastic block partition is one of the challenges since 2017.
<ref>[http://graphchallenge.mit.edu/champions] {{Webarchive|url=https://web.archive.org/web/20230204160403/http://graphchallenge.mit.edu/champions|date=2023-02-04}} DARPA/MIT/AWS Graph Challenge Champions</ref> [[Spectral clustering]] has demonstrated outstanding performance compared to the original and even improved<ref>{{cite book
| author1 = A. J. Uppal |author2 = J. Choi |author3 = T. B. Rolinger |author4 = H. Howie Huang
|title = 2021 IEEE High Performance Extreme Computing Conference (HPEC) |chapter = Faster Stochastic Block Partition Using Aggressive Initial Merging, Compressed Representation, and Parallelism Control | date = 2021 |pages = 1–7 |doi = 10.1109/HPEC49654.2021.9622836|isbn = 978-1-6654-2369-4 |s2cid = 244780210 }}</ref>
base algorithm, matching its quality of clusters while being multiple orders of magnitude faster.<ref>{{cite book
| author1 = David Zhuzhunashvili |author2 = Andrew Knyazev
|title = 2017 IEEE High Performance Extreme Computing Conference (HPEC)
|chapter = Preconditioned spectral clustering for stochastic block partition streaming graph challenge (Preliminary version at arXiv.)
| date = 2017 |pages = 1–6
|doi = 10.1109/HPEC.2017.8091045|arxiv = 1708.07481
|isbn = 978-1-5386-3472-1
|s2cid = 19781504
}}</ref>
<ref>{{cite book
| author1 = Lisa Durbeck |author2 = Peter Athanas
|title = 2020 IEEE High Performance Extreme Computing Conference (HPEC)
|chapter = Incremental Streaming Graph Partitioning
| date = 2020 |pages = 1–8
|doi = 10.1109/HPEC43674.2020.9286181|isbn = 978-1-7281-9219-2
|s2cid = 229376193
}}</ref>
 
==See also==
* [[blockmodeling]]
* {{annotated link|Girvan–Newman algorithm}}
* {{annotated link|Lancichinetti–Fortunato–Radicchi benchmark}} for generating benchmark networks with communities
Line 70 ⟶ 103:
title = Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications |
arxiv = 1109.3041 |
date = September 2011 | doi=10.1103/PhysRevE.84.066106 | volume=84 | journal=Physical Review E| issue = 6 |bibcode = 2011PhRvE..84f6106D | pmid=22304154 | page=066106| s2cid = 15788070 }}</ref>
<ref name="abh14">{{cite arXiv| last1 = Abbe| first1 = Emmanuel| last2 = Bandeira| first2 = Afonso S.| last3 = Hall| first3 = Georgina| title = Exact Recovery in the Stochastic Block Model|eprint= 1405.3267| date = May 2014 | class = cs.SI}}</ref>
<ref name="as15a">{{cite arXiv| last1 = Abbe| first1 = Emmanuel| last2 = Sandon| first2 = Colin| title = Community detection in general stochastic block models: fundamental limits and efficient recovery algorithms|eprint= 1503.00609| date = March 2015 | class = math.PR}}</ref>
<ref name="as15b">{{cite arXiv| last1 = Abbe| first1 = Emmanuel| last2 = Sandon| first2 = Colin| title = Recovering communities in the general stochastic block model without knowing the parameters|eprint= 1506.03729| date = June 2015 | class = math.PR}}</ref>
<ref name="lr15">{{Cite journal| doi = 10.1214/14-AOS1274| issn = 0090-5364| volume = 43| issue = 1| pages = 215–237| last1 = Lei| first1 = Jing| last2 = Rinaldo| first2 = Alessandro| title = Consistency of spectral clustering in stochastic block models| journal = The Annals of Statistics| date = February 2015 | arxiv = 1312.2050| s2cid = 88519551}}</ref>
<ref name="gbm">{{Cite journal| last1 = Galhotra| first1 = Sainyam| last2 = Mazumdar| first2 = Arya| last3 = Pal| first3 = Soumyabrata|
last4 = Saha| first4 = Barna|author4-link=Barna Saha| title = The Geometric Block Model| journal = AAAI| date = February 2018 | volume = 32| doi = 10.1609/aaai.v32i1.11905| arxiv = 1709.05510| s2cid = 19152144}}</ref>
<ref name="krzakala-pnas">{{cite journal|
last1 = Krzakala| first1 = Florent|
Line 85 ⟶ 118:
last6 = Lenka| first6 = Lenka|
last7 = Zhang| first7 = Pan|
title = Spectral redemption in clustering sparse networks | journal = Proceedings of the National Academy of Sciences| date = October 2013 | doi = 10.1073/pnas.1312486110 |arxiv = 1306.5550|bibcode = 2013PNAS..11020935K | volume=110 | issue = 52| pages=20935–20940| pmid = 24277835| pmc = 3876200| doi-access = free}}</ref>
<ref name="mns12">{{cite arXiv| last1 = Mossel| first1 = Elchanan| last2 = Neeman| first2 = Joe| last3 = Sly| first3 = Allan| title = Stochastic Block Models and Reconstruction|eprint= 1202.1499| date = February 2012 | class = math.PR}}</ref>
<ref name="mns13">{{cite journal| last1 = Mossel| first1 = Elchanan| last2 = Neeman| first2 = Joe| last3 = Sly| first3 = Allan| title = Belief Propagation, Robust Reconstruction, and Optimal Recovery of Block Models| journal = The Annals of Applied Probability|arxiv= 1309.1380| date = September 2013 | volume = 26| issue = 4| pages = 2211–2256| doi = 10.1214/15-AAP1145| bibcode = 2013arXiv1309.1380M| s2cid = 184446}}</ref>
<ref name="mas13">{{cite arXiv| last = Massoulie| first = Laurent| title = Community detection thresholds and the weak Ramanujan property|eprint= 1311.3085| date = November 2013 | class = cs.SI}}</ref>
<ref name="al14">{{cite arXiv| last1 = Amini| first1 = Arash A.| last2 = Levina| first2 = Elizaveta|author2-link= Elizaveta Levina| title = On semidefinite relaxations for the block model|eprint= 1406.5647| date = June 2014 | class = cs.LG}}</ref>
Line 97 ⟶ 130:
title = Mixed membership stochastic blockmodels| journal = Journal of Machine Learning Research |arxiv = 0705.4485| date = May 2007 | volume = 9| pages = 1981–2014| pmid = 21701698| pmc = 3119541| bibcode = 2007arXiv0705.4485A}}</ref>
<ref name="fat19">{{cite arXiv| last = Fathi| first = Reza| title = Efficient Distributed Community Detection in the Stochastic Block Model|eprint= 1904.07494| date = April 2019 | class = cs.DC}}</ref>
<ref name="hol">{{cite journal |title=Stochastic blockmodels: First steps |journal=[[Social Networks]] |year=1983 |lastlast1=Holland |firstfirst1=Paul W |last2=Laskey |first2=Kathryn Blackmond |last3=Leinhardt |first3=Samuel |volume=5 |issue=2 |pages=109-137109–137 |issn=0378-8733 |doi=10.1016/0378-8733(83)90021-7 |s2cid=34098453 |url=https://doi.org/10.1016/0378-8733(83)90021-7 |accessdate=2021-06-16 |archive-date=2023-02-04 |archive-url=https://web.archive.org/web/20230204160405/https://www.sciencedirect.com/science/article/abs/pii/0378873383900217?via%3Dihub |url-status=live |url-access=subscription }}</ref>
<ref name="ker">{{cite journal |title=Stochastic blockmodels and community structure in networks |journal=Physical Review E |year=2011 |lastlast1=Karrer |firstfirst1=Brian |last2=Newman |first2=Mark E J |volume=83 |issue=1 |page=016107 |doi=10.1103/PhysRevE.83.016107 |pmid=21405744 |url=https://link.aps.org/doi/10.1103/PhysRevE.83.016107 |accessdate=2021-06-16 |arxiv=1008.3926 |bibcode=2011PhRvE..83a6107K |s2cid=9068097 |archive-date=2023-02-04 |archive-url=https://web.archive.org/web/20230204160406/https://journals.aps.org/pre/abstract/10.1103/PhysRevE.83.016107 |url-status=live }}</ref>
<ref name="pei">{{cite journal |title=Hierarchical block structures and high-resolution model selection in large networks |journal=Physical Review X |year=2014 |last=Peixoto |first=Tiago |volume=4 |issue=1 |page=011047 |doi=10.1103/PhysRevX.4.011047 |url=https://journals.aps.org/prx/abstract/10.1103/PhysRevX.4.011047 |accessdate=2021-06-16 |arxiv=1310.4377 |bibcode=2014PhRvX...4a1047P |s2cid=5841379 |archive-date=2021-06-24 |archive-url=https://web.archive.org/web/20210624195430/https://journals.aps.org/prx/abstract/10.1103/PhysRevX.4.011047 |url-status=live }}</ref>
}}
 
[[Category:Machine learning]]
[[Category:Random graphs]]
[[Category:Networks]]
[[Category:Machine learningBlockmodeling]]