Content deleted Content added
CoralisTree (talk | contribs) Added an example problem in the definition of the model |
m Open access bot: url-access=subscription updated in citation with #oabot. |
||
(46 intermediate revisions by 28 users not shown) | |||
Line 1:
{{Short description|Concept in network science}}
The '''stochastic block model''' is a [[generative model]] for random [[Graph (discrete mathematics)|graphs]]. This model tends to produce graphs containing ''communities'', subsets characterized by being connected with one another with particular edge densities. For example, edges may be more common within communities than between communities. The stochastic block model is important in [[statistics]], [[machine learning]], and [[network science]], where it serves as a useful benchmark for the task of recovering [[community structure]] in graph data.▼
{{Network science}}
▲The '''stochastic [[Blockmodel|block model]]''' is a [[generative model]] for random [[Graph (discrete mathematics)|graphs]]. This model tends to produce graphs containing ''communities'', subsets of nodes characterized by being connected with one another with particular edge densities. For example, edges may be more common within communities than between communities. Its mathematical formulation was first introduced in 1983 in the field of social network analysis by [[Paul W. Holland]] et al.<ref name="hol" /> The stochastic block model is important in [[statistics]], [[machine learning]], and [[network science]], where it serves as a useful benchmark for the task of recovering [[community structure]] in graph data.
== Definition ==
Line 7 ⟶ 10:
* a symmetric <math>r \times r</math> matrix <math>P</math> of edge probabilities.
The edge set is then sampled at random as follows: any two vertices <math>u \in C_i</math> and <math>v \in C_j</math> are connected by an edge with probability <math>P_{ij}</math>. An example problem is: given a graph with <math>n</math> vertices, where the edges are sampled as described, recover the groups <math>C_1,\ldots,C_r</math>.
== Special cases ==
[[File:Assortative Case of the SBM.png|thumb|149x149px|An example of the assortative case for the stochastic block model.]]
If the probability matrix is a constant, in the sense that <math>P_{ij} = p</math> for all <math>i,j</math>, then the result is the [[Erdős–Rényi model]] <math>G(n,p)</math>. This case is degenerate—the partition into communities becomes irrelevant—but it illustrates a close relationship to the Erdős–Rényi model.
The ''planted partition model'' is the special case that the values of the probability matrix <math>P</math> are a constant <math>p</math> on the diagonal and another constant <math>q</math> off the diagonal. Thus two vertices within the same community share an edge with probability <math>p</math>, while two vertices in different communities share an edge with probability <math>q</math>. Sometimes it is this restricted model that is called the stochastic block model. The case where <math>p > q</math> is called an ''assortative'' model, while the case <math>p < q</math> is called ''
Returning to the general stochastic block model, a model is called ''strongly assortative'' if <math>P_{ii} > P_{jk}</math> whenever <math>j \neq k</math>: all diagonal entries dominate all off-diagonal entries. A model is called ''weakly assortative'' if <math>P_{ii} > P_{ij}</math> whenever <math>i \neq j</math>: each diagonal entry is only required to dominate the rest of its own row and column.<ref name="al14" /> ''
== Typical statistical tasks ==
Line 29 ⟶ 33:
== Statistical lower bounds and threshold behavior ==
Stochastic block models exhibit a sharp
For partial recovery, the appropriate scaling is to take <math>P_{ij} = \tilde P_{ij} / n</math> for fixed <math>\tilde P</math>, resulting in graphs of constant average degree. In the case of two equal-sized communities, in the assortative planted partition model with probability matrix
Line 42 ⟶ 46:
In principle, exact recovery can be solved in its feasible range using [[maximum likelihood]], but this amounts to solving a constrained or [[Regularization (mathematics)|regularized]] cut problem such as minimum bisection that is typically [[NP-complete]]. Hence, no known efficient algorithms will correctly compute the maximum-likelihood estimate in the worst case.
However, a wide variety of algorithms perform well in the average case, and many high-probability performance guarantees have been proven for algorithms in both the partial and exact recovery settings. Successful algorithms include [[spectral clustering]] of the vertices,<ref name="krzakala-pnas"/><ref name="mas13" /><ref name="as15a" /><ref name="lr15" /> [[semidefinite programming]],<ref name="al14" /><ref name="abh14" />
== Variants ==
Several variants of the model exist. One minor tweak allocates vertices to communities randomly, according to a [[categorical distribution]], rather than in a fixed partition.<ref name="as15a" /> More significant variants include the degree-corrected stochastic block model,<ref name="ker" /> the hierarchical stochastic block model,<ref name="pei"/> the geometric block model,<ref name="gbm" /> censored block model and the mixed-membership block model.<ref name="ar1"/>
== Topic models ==
Stochastic block model have been recognised to be a [[topic model]] on bipartite networks.<ref name="gerlachnetwork">{{cite journal
| author1 = Martin Gerlach |author2=Tiago Peixoto |author3=Eduardo Altmann
| title = A network approach to topic models
| page=eaaq1360
| journal=Science Advances
| date = 2018 |doi = 10.1126/sciadv.aaq1360 | volume=4 | issue=7
|pmid=30035215 |pmc=6051742 |arxiv=1708.01677 |bibcode=2018SciA....4.1360G }}</ref> In a network of documents and words, Stochastic block model can identify topics: group of words with a similar meaning.
== Extensions to signed graphs ==
Signed graphs allow for both favorable and adverse relationships and serve as a common model choice for various data analysis applications, e.g., correlation clustering. The stochastic block model can be trivially extended to signed graphs by assigning both positive and negative edge weights or equivalently using a difference of adjacency matrices of two stochastic block models.
<ref>{{cite book
| author1=Alyson Fox |author2=Geoffrey Sanders |author3=Andrew Knyazev
|title=2018 IEEE High Performance extreme Computing Conference (HPEC) |chapter=Investigation of Spectral Clustering for Signed Graph Matrix Representations | date = 2018 |pages=1–7 |doi = 10.1109/HPEC.2018.8547575|osti=1476177 |isbn=978-1-5386-5989-2 |s2cid=54443034 }}</ref>
== DARPA/MIT/AWS Graph Challenge: streaming stochastic block partition ==
GraphChallenge<ref>[http://graphchallenge.mit.edu] {{Webarchive|url=https://web.archive.org/web/20230204160402/http://graphchallenge.mit.edu/|date=2023-02-04}} DARPA/MIT/AWS Graph Challenge</ref> encourages community approaches to developing new solutions for analyzing graphs and sparse data derived from social media, sensor feeds, and scientific data to enable relationships between events to be discovered as they unfold in the field. Streaming stochastic block partition is one of the challenges since 2017.
<ref>[http://graphchallenge.mit.edu/champions] {{Webarchive|url=https://web.archive.org/web/20230204160403/http://graphchallenge.mit.edu/champions|date=2023-02-04}} DARPA/MIT/AWS Graph Challenge Champions</ref> [[Spectral clustering]] has demonstrated outstanding performance compared to the original and even improved<ref>{{cite book
| author1 = A. J. Uppal |author2 = J. Choi |author3 = T. B. Rolinger |author4 = H. Howie Huang
|title = 2021 IEEE High Performance Extreme Computing Conference (HPEC) |chapter = Faster Stochastic Block Partition Using Aggressive Initial Merging, Compressed Representation, and Parallelism Control | date = 2021 |pages = 1–7 |doi = 10.1109/HPEC49654.2021.9622836|isbn = 978-1-6654-2369-4 |s2cid = 244780210 }}</ref>
base algorithm, matching its quality of clusters while being multiple orders of magnitude faster.<ref>{{cite book
| author1 = David Zhuzhunashvili |author2 = Andrew Knyazev
|title = 2017 IEEE High Performance Extreme Computing Conference (HPEC)
|chapter = Preconditioned spectral clustering for stochastic block partition streaming graph challenge (Preliminary version at arXiv.)
| date = 2017 |pages = 1–6
|doi = 10.1109/HPEC.2017.8091045|arxiv = 1708.07481
|isbn = 978-1-5386-3472-1
|s2cid = 19781504
}}</ref>
<ref>{{cite book
| author1 = Lisa Durbeck |author2 = Peter Athanas
|title = 2020 IEEE High Performance Extreme Computing Conference (HPEC)
|chapter = Incremental Streaming Graph Partitioning
| date = 2020 |pages = 1–8
|doi = 10.1109/HPEC43674.2020.9286181|isbn = 978-1-7281-9219-2
|s2cid = 229376193
}}</ref>
==See also==
* [[blockmodeling]]
* {{annotated link|Girvan–Newman algorithm}}
* {{annotated link|Lancichinetti–Fortunato–Radicchi benchmark}} for generating benchmark networks with communities
==References==
Line 53 ⟶ 100:
last2 = Krzakala| first2 = Florent |
last3 = Moore| first3 = Cristopher |
last4 =
title = Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications |
arxiv = 1109.3041 |
date = September 2011 | doi=10.1103/PhysRevE.84.066106 | volume=84 | journal=Physical Review E| issue = 6 |bibcode = 2011PhRvE..84f6106D | pmid=22304154 | page=066106| s2cid = 15788070 }}</ref>
<ref name="abh14">{{cite arXiv| last1 = Abbe| first1 = Emmanuel| last2 = Bandeira| first2 = Afonso S.| last3 = Hall| first3 = Georgina| title = Exact Recovery in the Stochastic Block Model|
<ref name="as15a">{{cite arXiv| last1 = Abbe| first1 = Emmanuel| last2 = Sandon| first2 = Colin| title = Community detection in general stochastic block models: fundamental limits and efficient recovery algorithms|
<ref name="as15b">{{cite arXiv| last1 = Abbe| first1 = Emmanuel| last2 = Sandon| first2 = Colin| title = Recovering communities in the general stochastic block model without knowing the parameters|
<ref name="lr15">{{Cite journal| doi = 10.1214/14-AOS1274| issn = 0090-5364| volume = 43| issue = 1| pages = 215–237| last1 = Lei| first1 = Jing| last2 = Rinaldo| first2 = Alessandro| title = Consistency of spectral clustering in stochastic block models| journal = The Annals of Statistics| date = February 2015 | arxiv = 1312.2050| s2cid = 88519551}}</ref>
<ref name="gbm">{{Cite journal| last1 = Galhotra| first1 = Sainyam| last2 = Mazumdar| first2 = Arya| last3 = Pal| first3 = Soumyabrata|
last4 = Saha| first4 = Barna|author4-link=Barna Saha| title = The Geometric Block Model| journal = AAAI| date = February 2018 | volume = 32| doi = 10.1609/aaai.v32i1.11905| arxiv = 1709.05510| s2cid = 19152144}}</ref>
<ref name="krzakala-pnas">{{cite journal|
last1 = Krzakala| first1 = Florent|
Line 69 ⟶ 118:
last6 = Lenka| first6 = Lenka|
last7 = Zhang| first7 = Pan|
title = Spectral redemption in clustering sparse networks | journal = Proceedings of the National Academy of Sciences| date = October 2013 | doi = 10.1073/pnas.1312486110 |arxiv = 1306.5550|bibcode = 2013PNAS..11020935K | volume=110 | issue = 52| pages=20935–20940| pmid = 24277835| pmc = 3876200| doi-access = free}}</ref>
<ref name="mns12">{{cite arXiv| last1 = Mossel| first1 = Elchanan| last2 = Neeman| first2 = Joe| last3 = Sly| first3 = Allan| title = Stochastic Block Models and Reconstruction|
<ref name="mns13">{{cite
<ref name="mas13">{{cite arXiv| last = Massoulie| first = Laurent| title = Community detection thresholds and the weak Ramanujan property|
<ref name="al14">{{cite arXiv| last1 = Amini| first1 = Arash A.| last2 = Levina| first2 = Elizaveta|author2-link= Elizaveta Levina| title = On semidefinite relaxations for the block model|
<ref name="ar1">{{cite
last1 = Airoldi| first1 = Edoardo|authorlink1=Edoardo Airoldi|
last2 = Blei| first2 = David|
last3 = Feinberg| first3 = Stephen|
last4 = Xing| first4 = Eric|
title = Mixed membership stochastic blockmodels| journal = Journal of Machine Learning Research |arxiv = 0705.4485| date = May 2007 | volume = 9| pages = 1981–2014| pmid = 21701698| pmc = 3119541| bibcode = 2007arXiv0705.4485A}}</ref>
<ref name="fat19">{{cite arXiv| last = Fathi| first = Reza| title = Efficient Distributed Community Detection in the Stochastic Block Model|
<ref name="hol">{{cite journal |title=Stochastic blockmodels: First steps |journal=[[Social Networks]] |year=1983 |last1=Holland |first1=Paul W |last2=Laskey |first2=Kathryn Blackmond |last3=Leinhardt |first3=Samuel |volume=5 |issue=2 |pages=109–137 |issn=0378-8733 |doi=10.1016/0378-8733(83)90021-7 |s2cid=34098453 |url=https://doi.org/10.1016/0378-8733(83)90021-7 |accessdate=2021-06-16 |archive-date=2023-02-04 |archive-url=https://web.archive.org/web/20230204160405/https://www.sciencedirect.com/science/article/abs/pii/0378873383900217?via%3Dihub |url-status=live |url-access=subscription }}</ref>
<ref name="ker">{{cite journal |title=Stochastic blockmodels and community structure in networks |journal=Physical Review E |year=2011 |last1=Karrer |first1=Brian |last2=Newman |first2=Mark E J |volume=83 |issue=1 |page=016107 |doi=10.1103/PhysRevE.83.016107 |pmid=21405744 |url=https://link.aps.org/doi/10.1103/PhysRevE.83.016107 |accessdate=2021-06-16 |arxiv=1008.3926 |bibcode=2011PhRvE..83a6107K |s2cid=9068097 |archive-date=2023-02-04 |archive-url=https://web.archive.org/web/20230204160406/https://journals.aps.org/pre/abstract/10.1103/PhysRevE.83.016107 |url-status=live }}</ref>
<ref name="pei">{{cite journal |title=Hierarchical block structures and high-resolution model selection in large networks |journal=Physical Review X |year=2014 |last=Peixoto |first=Tiago |volume=4 |issue=1 |page=011047 |doi=10.1103/PhysRevX.4.011047 |url=https://journals.aps.org/prx/abstract/10.1103/PhysRevX.4.011047 |accessdate=2021-06-16 |arxiv=1310.4377 |bibcode=2014PhRvX...4a1047P |s2cid=5841379 |archive-date=2021-06-24 |archive-url=https://web.archive.org/web/20210624195430/https://journals.aps.org/prx/abstract/10.1103/PhysRevX.4.011047 |url-status=live }}</ref>
}}
[[Category:Machine learning]]▼
[[Category:Random graphs]]
[[Category:Networks]]
|