`fastRG`

quickly samples a broad class of network models known as generalized random dot product graphs (GRDPGs). In particular, for matrices *X*, *S* and *Y*, `fastRG`

samples a matrix *A* with expectation *X SY*

`fastRG`

has two primary use cases:

- Sampling enormous sparse graphs that cannot feasibly be sampled with existing samplers, and
- validating new methods for random dot product graphs (and variants).

`fastRG`

makes the latent parameters of random dot product graphs readily available to users, such that simulation studies for community detection, subspace recovery, etc, are straightforward.

`fastRG`

is not yet on CRAN. You can install the development version with:

There are two stages to sampling from generalized random dot product graphs. First, we sample the latent factors *X* and *Y*. Then we sample *A* conditional on those latent factors. `fastRG`

mimics this two-stage sample structure. For example, to sample from a stochastic blockmodel, we first create the latent factors.

```
library(fastRG)
#> Loading required package: Matrix
set.seed(27)
sbm <- sbm(n = 1000, k = 5, expected_density = 0.01)
#> Generating random mixing matrix `B` with independent Uniform(0, 1) entries. This distribution may change in the future. Explicitly set `B` for reproducible results.
```

You can specify the latent factors and the mixing matrix *B* yourself, but there are also defaults to enable fast prototyping. Here *B* was randomly generated with `Uniform[0, 1]`

entries and nodes were assigned randomly to communities with equal probability of falling in all communities. Printing the result object gives us some additional information:

```
sbm
#> Undirected Stochastic Blockmodel
#> --------------------------------
#>
#> Nodes (n): 1000 (arranged by block)
#> Blocks (k): 5
#>
#> Traditional SBM parameterization:
#>
#> Block memberships (z): 1000 [factor]
#> Block probabilities (pi): 5 [numeric]
#> Edge distribution: poisson
#>
#> Factor model parameterization:
#>
#> X: 1000 x 5 [dgCMatrix]
#> S: 5 x 5 [dgeMatrix]
#>
#> Expected edges: 10000
#> Expected degree: 10
#> Expected density: 0.01
```

Now, conditional on this latent representation, we can sample graphs. `fastRG`

supports several different output types, each of which is specified by the suffix to `sample_*()`

functions. For example, we can obtain an edgelist in a `tibble`

with:

```
sample_edgelist(sbm)
#> # A tibble: 9,986 x 2
#> from to
#> <int> <int>
#> 1 1 184
#> 2 111 4
#> 3 86 12
#> 4 43 194
#> 5 61 37
#> 6 22 16
#> 7 4 10
#> 8 30 107
#> 9 119 209
#> 10 41 91
#> # … with 9,976 more rows
```

but we can just as easily obtain the graph as a sparse matrix

```
A <- sample_sparse(sbm)
A[1:10, 1:10]
#> 10 x 10 sparse Matrix of class "dsCMatrix"
#>
#> [1,] . . . 1 . . . . . .
#> [2,] . . 1 . . . . . . .
#> [3,] . 1 . . . . . . . .
#> [4,] 1 . . . . . . . . .
#> [5,] . . . . . . . . . .
#> [6,] . . . . . . . . . .
#> [7,] . . . . . . . . . .
#> [8,] . . . . . . . . . .
#> [9,] . . . . . . . . . .
#> [10,] . . . . . . . . . .
```

or an igraph object

```
sample_igraph(sbm)
#> IGRAPH b16ccb2 U--- 1000 10072 --
#> + edges from b16ccb2:
#> [1] 50-- 54 90--202 94--165 115--210 171--189 7-- 38 86--184 91--215
#> [9] 3-- 92 35--111 9--159 3--158 66--131 11-- 28 164--214 48--163
#> [17] 116--141 8--189 10--170 102--193 14--207 99--209 36-- 89 72--213
#> [25] 62--126 17--136 14--145 15-- 15 41--211 161--174 23--215 5--132
#> [33] 121--190 61-- 84 29-- 95 28--182 52--109 22-- 56 24--166 14--109
#> [41] 2-- 37 93--125 53--153 27-- 62 122--211 52-- 81 34--180 12-- 93
#> [49] 120--196 50--190 7--175 107--186 137--155 92--173 195--213 2--163
#> [57] 159--208 98--215 53--205 75--124 164--167 27-- 44 82-- 96 177--188
#> [65] 55--167 38--134 170--192 5-- 6 14--145 166--209 40--183 66--138
#> + ... omitted several edges
```

Note that every time we call `sample_*()`

we draw a new sample.

```
A <- sample_sparse(sbm)
B <- sample_sparse(sbm)
all(A == B) # random realizations from the SBM don't match!
#> [1] FALSE
```

If you would like to obtain the singular value decomposition of the population adjacency matrix conditional on latent factors, that is straightforward:

Note that eigendecompositions and SVDS (for directed graphs) use `RSpectra`

and do not require explicitly forming large dense population adjacency matrices; the population decompositions should be efficient in both time and space for even large graphs.

There are several essential tools to modify graph sampling that you should know about. First there are options that affect the latent factor sampling:

`expected_degree`

: Set the expected average degree of the graph by scaling sampling probabilities. We*strongly, strongly*recommend that you always set this option. If you do not, it is easy accidentally sample from large and dense graphs.`expected_density`

: Set the expected density of the graph by scaling sampling probabilities. You cannot specify both`expected_degree`

and`expected_density`

at the same time.

In the second stage of graph sampling, the options are:

`poisson_edges`

: Either`TRUE`

or`FALSE`

depending on whether you would like a Bernoulli graph or a Poisson multi-graph. Scaling via`expected_degree`

assumes a Poisson multi-graph, with some limited exceptions.`allow_self_edges`

: Whether nodes should be allowed to connect to themselves. Either`TRUE`

or`FALSE`

.

Sampling blockmodels with very small numbers of nodes (or blockmodels with the number of blocks `k`

on the same order as `n`

) results in a degeneracy that can cause issues.

`igraph`

allows users to sample SBMs (in 𝒪(*m* + *n* + *k*^{2}) time) and random dot product graphs (in 𝒪(*n*^{2}*k*) time). You can find the original research code associated with `fastRG`

here. There is also a Python translation of the original code in Python here. Both of these implementations are bare bones.