
Circulated Neighbors Random Walk (CNRW)

Updated 23 January 2026
  • CNRW is a higher-order Markov chain Monte Carlo algorithm that enhances uniform node sampling by tracking visited neighbors per entry edge.
  • It systematically circulates through all available neighbors before repeating transitions, reducing variance and shortening burn-in time.
  • Empirical results show a 30–50% reduction in sampling steps needed to achieve target estimator accuracy on large social networks.

Circulated Neighbors Random Walk (CNRW) is a higher-order Markov chain Monte Carlo sampling algorithm for graphs, introduced to accelerate uniform node sampling over large online social networks while accessing only local neighborhood information through restricted web or API interfaces. CNRW modifies the simple random walk (SRW) paradigm by deterministically circulating over neighbors of each node—conditioned on the entry edge—before repeating neighbors, thereby improving the mixing rate and reducing variance without altering the stationary distribution. Extensively evaluated on large-scale real-world and synthetic networks, CNRW achieves a 30–50% reduction in burn-in and sampling cost for fixed estimation accuracy relative to SRW (Zhou et al., 2015).

1. CNRW Algorithmic Design and Intuition

Traditional SRW generates a node sequence $X_0, X_1, X_2, \ldots$ on a graph $G=(V,E)$, choosing each next node uniformly at random from the neighbors $N(v)$ of the current node $v$, disregarding prior visit history. In contrast, CNRW extends this to a higher-order chain by tracking, for each directed edge $(u \to v)$, which neighbors of $v$ have already been selected when traversing from $u$ to $v$. A traversal from $(u \to v)$ to a neighbor $w$ samples uniformly from $N(v) \setminus b(u,v)$, where $b(u,v)$ records the already-used neighbors; once all neighbors are exhausted, $b(u,v)$ resets to $\emptyset$ and selection proceeds afresh. This discourages repeated transitions along the same localized paths (which can trap the walk in subgraphs), fostering better exploration and faster mixing.

The intuition is that CNRW avoids local trapping by forcing the walk to exploit all outgoing possibilities from vv conditioned on how it was entered, before any repetition, thus systematically enhancing coverage of the network's topology.
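As an illustration, the circulation rule can be sketched in a few lines of Python. The adjacency-dict representation and the function name are illustrative choices, not from the paper:

```python
import random
from collections import defaultdict

def cnrw_walk(adj, start, steps, rng=None):
    """Sketch of a CNRW walk on an undirected graph given as an
    adjacency dict {node: [neighbors]}. For each entry edge (u -> v),
    the walk cycles through v's neighbors without replacement,
    resetting once all of them have been used."""
    rng = rng or random.Random()
    used = defaultdict(set)      # (u, v) -> neighbors of v already taken
    path = [start]
    u, v = None, start           # the first step has no entry edge
    for _ in range(steps):
        candidates = [w for w in adj[v] if w not in used[(u, v)]]
        if not candidates:       # cycle exhausted: reset and start afresh
            used[(u, v)].clear()
            candidates = list(adj[v])
        w = rng.choice(candidates)
        used[(u, v)].add(w)
        path.append(w)
        u, v = v, w
    return path
```

Note that the bookkeeping is keyed on the *entry edge* $(u, v)$, not on $v$ alone: the same node circulates independently for each way it can be entered, which is what makes the chain higher-order.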

2. Formal Specification and Transition Probabilities

Let the time index be $t$, with $X_t$ denoting the walk location and $H_t$ the path history. For each traversal of $(u \to v)$, $b_t(u,v)$ records the set of neighbors of $v$ that have already been chosen immediately after arriving via $(u \to v)$. The formal transition rule is:

$$P\bigl(X_{t+1}=w \mid X_t=v,\ H_t\bigr) = \begin{cases} \dfrac{1}{|N(v)\setminus b_t(u,v)|} & \text{if } w \in N(v)\setminus b_t(u,v), \\ 0 & \text{otherwise}. \end{cases}$$

If $b_t(u,v) = N(v)$, then $b_t(u,v)$ resets to $\emptyset$ and

$$P\bigl(X_{t+1}=w \mid X_t=v,\ H_t\bigr) = \frac{1}{k_v}, \quad \forall w \in N(v),$$

where $k_v = |N(v)|$ is the degree of $v$.

The algorithm maintains a hash map $b(u,v)$ for each distinct traversed edge $(u,v)$ to store the already-used neighbors. Each step requires $O(1)$ expected time using dynamic hashing. The storage requirement grows as $O(T)$ over $T$ steps, corresponding to the distinct traversed edges and their blocked-neighbor lists.

3. Stationary Distribution and Variance Properties

CNRW preserves the stationary distribution of SRW:

$$\pi(v) = \frac{k_v}{2|E|},$$

where $\pi(v)$ is the stationary probability of visiting node $v$ and $|E|$ is the total number of edges. The proof constructs the infinite walk trace as a concatenation of path-blocks (subpaths initiated by traversing $(u \to v)$), showing that under CNRW these path-blocks cycle through all possible neighbor-rooted block types in a without-replacement fashion, with internal structure unchanged relative to SRW. The ergodic occupation time is thus invariant.
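The degree-proportional stationary distribution can be checked numerically. The following self-contained loop (toy graph and variable names are illustrative) runs a long seeded CNRW walk on a 4-node graph and compares empirical visit frequencies against $\pi(v) = k_v / 2|E|$:

```python
import random
from collections import defaultdict, Counter

# Toy graph: degrees 3, 2, 2, 1, so 2|E| = 8 and pi = (3/8, 2/8, 2/8, 1/8).
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
two_E = sum(len(ns) for ns in adj.values())
pi = {v: len(adj[v]) / two_E for v in adj}

rng = random.Random(0)
used = defaultdict(set)      # (entry edge) -> already-used neighbors
u, v = None, 0
visits = Counter()
steps = 200_000
for _ in range(steps):
    cand = [w for w in adj[v] if w not in used[(u, v)]]
    if not cand:             # circulation exhausted: reset
        used[(u, v)].clear()
        cand = list(adj[v])
    w = rng.choice(cand)
    used[(u, v)].add(w)
    visits[w] += 1
    u, v = v, w

# Empirical occupation frequencies should approach pi.
emp = {x: visits[x] / steps for x in adj}
```

With a budget this large the empirical frequencies land within a couple of percent of $\pi$, consistent with the invariance claim above.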

A key property is variance reduction: block-stratification (cycling through all neighbor options before repeating) guarantees that CNRW's asymptotic variance for ergodic averages is no higher than SRW's. This result extends Neal’s (2004) findings on variance minimization under deterministic stratification (Zhou et al., 2015).

4. Burn-in, Mixing, and Efficiency Gains

Although no closed-form mixing time bound is derived, theoretical and empirical evidence demonstrates significantly accelerated mixing and reduced burn-in:

  • On structured bottlenecked graphs (e.g., barbell graphs comprising two cliques), the probability of crossing sparse bridges under CNRW is enhanced by a factor of $\frac{1}{n-1}\ln(n)$, yielding dramatically faster escapes from bottlenecks compared to SRW.
  • Asymptotic variance analyses confirm reduced estimator variance for attribute aggregates due to stratified coverage.
  • Empirical results show that CNRW and its extension GNRW require 30–50% fewer steps than SRW/NB-SRW to reach a target estimator accuracy or a comparable level of bias.

Practical experiments on real-world social network datasets (Google+, Yelp, Facebook, YouTube) further show that estimator relative error and measures of sampling bias (KL-divergence, $\ell_2$ distance to the stationary distribution) fall more rapidly for CNRW than for traditional schemes, across sampling budgets from 100 to 1,000 steps.
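Because $\pi$ is degree-proportional, walk samples must be reweighted before computing uniform-node aggregates. A sketch of the standard reweighted (harmonic-mean) average-degree estimator on top of a CNRW trace follows; the estimator is the conventional one for degree-biased walks, not something specific to the paper, and the function name is mine:

```python
import random
from collections import defaultdict

def cnrw_avg_degree(adj, start, steps, seed=0):
    """Estimate the average degree from a CNRW trace. Samples are
    degree-biased under the stationary distribution, so the usual
    inverse-degree (harmonic-mean) reweighting is applied."""
    rng = random.Random(seed)
    used = defaultdict(set)
    u, v = None, start
    inv_deg_sum, n = 0.0, 0
    for _ in range(steps):
        cand = [w for w in adj[v] if w not in used[(u, v)]]
        if not cand:
            used[(u, v)].clear()
            cand = list(adj[v])
        w = rng.choice(cand)
        used[(u, v)].add(w)
        u, v = v, w
        inv_deg_sum += 1.0 / len(adj[v])   # weight sample by 1/degree
        n += 1
    return n / inv_deg_sum                  # harmonic-mean estimator
```

On the toy graph `{0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}` (true average degree $2.0$), a long seeded run converges close to the exact value.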

5. Query and Space Complexity

CNRW maintains efficient per-step computational complexity, with $O(1)$ expected-time random neighbor selection implemented via hash maps tracking used neighbors for each traversed edge. Total storage overhead is $O(T)$ for a $T$-step run, scaling linearly with the number of distinct edge traversals. Query complexity (the number of unique neighborhood retrievals) matches the number of walk steps, with redundant neighborhood requests served by a local cache. The overall convergence is such that, for any target estimator variance or bias, CNRW never requires more steps, and typically fewer, than SRW.

Implementation over real-world APIs necessitates local cache management of node neighborhoods and modest additional memory for blocked-neighbor tracking per directed edge. This overhead remains practical for sampling budgets up to approximately $10^4$ steps.

6. Practical Considerations and Extensions

CNRW and its generalization, Groupby Neighbors Random Walk (GNRW), function as drop-in replacements for SRW, requiring no global access to network topology. GNRW further partitions neighbors by attribute (such as node degree or review count) and cycles within strata, yielding further improvements for attribute-specific estimations—an effect analogous to stratified sampling. The grouping function in GNRW is application-dependent; the selection and potential automation of such grouping remains an open area for research.
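The grouped circulation of GNRW can be sketched as follows. This is one plausible reading rather than the paper's exact pseudocode: the group is drawn with probability proportional to its size (so the marginal over neighbors stays uniform across a full cycle), and circulation without replacement happens inside each (entry-edge, group) stratum. The attribute map and function name are illustrative:

```python
import random
from collections import defaultdict

def gnrw_step(adj, attr, state, u, v, rng):
    """One GNRW-style transition (sketch). Neighbors of v are
    partitioned by attribute; a group is drawn proportionally to its
    size, then selection circulates without replacement within the
    (entry edge, group) stratum kept in `state`."""
    groups = defaultdict(list)
    for w in adj[v]:
        groups[attr[w]].append(w)
    # Size-proportional group draw keeps the neighbor marginal uniform.
    g = rng.choices(list(groups), weights=[len(groups[k]) for k in groups])[0]
    key = (u, v, g)
    cand = [w for w in groups[g] if w not in state[key]]
    if not cand:                 # this stratum's cycle is exhausted
        state[key].clear()
        cand = groups[g]
    w = rng.choice(cand)
    state[key].add(w)
    return w
```

Stratifying by an attribute correlated with the estimand (e.g., degree or review count) is what yields the stratified-sampling-style variance gains described above.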

Directed graphs may be handled by symmetrization of the adjacency structure or by tracking in- and out-neighbors separately for each edge. Memory scalability limits applicability in extremely large-diameter graphs unless walk budgets remain moderate.

A principal open question is the derivation of explicit mixing-time bounds for CNRW relative to core graph metrics such as conductance. Notably, empirical results suggest substantial improvement, but formal characterization is pending.

7. Experimental Results on Network Datasets

Extensive experiments document the efficiency and accuracy of CNRW and GNRW:

  • For Google+ (240K nodes), Yelp (120K nodes), and other major social network snapshots:
    • Estimator relative error for average degree drops below $0.06$ after approximately $450$ steps for CNRW/GNRW, compared to about $800$ steps for SRW/NB-SRW.
    • KL-divergence and $\ell_2$ distance to the stationary distribution remain 30–50% lower for CNRW/GNRW across sample budgets of 100–1,000 steps.
  • On synthetic clustered and barbell graphs, CNRW/GNRW reduce statistical bias by a factor of $1.5$–$2$ compared to SRW.
  • GNRW yields further error reduction for aggregates over grouped attributes, demonstrating the benefits of stratification.

These results confirm the practical advantage of incorporating history-aware transition mechanisms such as those in CNRW for network sampling in applications constrained by API access or local-neighborhood visibility (Zhou et al., 2015).

References

  • Zhou, Z., Zhang, N., and Das, G. (2015). Leveraging History for Faster Sampling of Online Social Networks. Proceedings of the VLDB Endowment.
