Parallel Voting Decision Tree (PV-Tree)
- PV-Tree is a distributed algorithm that uses a two-tier voting protocol to efficiently select split attributes with reduced communication overhead.
- It operates in three phases—local voting, global voting, and final split selection—to build decision trees and ensemble variants on horizontally partitioned data.
- Empirical studies demonstrate significant speedups and lower transfer volumes compared to traditional data- and attribute-parallel methods, with strong theoretical guarantees.
Parallel Voting Decision Tree (PV-Tree) is a communication-efficient distributed algorithm for learning decision trees and their ensemble variants, such as Gradient Boosted Decision Trees (GBDT) and Random Forest, in large-scale settings. The core innovation of PV-Tree lies in its two-tiered voting protocol for attribute selection, which enables substantial reduction in communication overhead while preserving the statistical fidelity of split selection. PV-Tree is designed for horizontally partitioned data distributed across multiple compute nodes, with formal theoretical guarantees and empirical validation on industry-scale datasets (Meng et al., 2016).
1. Problem Setting and Objective
PV-Tree operates on a training dataset D = {(x_i, y_i)}_{i=1}^{n} with d attributes, partitioned horizontally across M machines, such that machine m holds a local subset D_m with ∪_{m=1}^{M} D_m = D. For each split at a tree node, the goal is to identify the attribute a and split point maximizing a node-informativeness criterion, such as information gain (for classification) or variance reduction (for regression):
- Information Gain (IG): IG(a) = H(Y) − H(Y | a)
- Variance Gain (VG): VG(a) = Var(Y) − Var(Y | a)
with H(Y | a) and Var(Y | a) denoting the conditional entropy and conditional variance of the target Y after splitting on attribute a.
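As a concrete illustration of the classification criterion, a minimal Python computation of information gain for a binary split (the helper names `entropy` and `information_gain` are ours, not from the paper):

```python
import math

def entropy(labels):
    """Shannon entropy H(Y) of a list of class labels."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def information_gain(labels, left, right):
    """IG = H(Y) minus the size-weighted entropy of the two children."""
    n = len(labels)
    return (
        entropy(labels)
        - (len(left) / n) * entropy(left)
        - (len(right) / n) * entropy(right)
    )
```

A perfectly separating split of `[0, 0, 1, 1]` into `[0, 0]` and `[1, 1]` yields the maximum gain of 1 bit.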
2. Algorithmic Framework
The PV-Tree algorithm finds the best split at each node via three main phases: local voting, global voting, and final split selection:
- Local Voting: Each machine m computes the local gain for every attribute on its own data and selects its top-k locally best attributes, using histograms with a fixed number of bins b for continuous features.
- Global Voting: All machines exchange their local top-k attribute sets (e.g., using MPI_AllGather). A global vote count is computed for each attribute, and the 2k most-voted attributes are selected as the global candidate set.
- Final Split Selection: Each machine sends its full-grained histograms for the 2k candidate attributes to a master (or merges them via an all-reduce operation). The master aggregates the global histograms and scans them for the optimal attribute and split point.
These three phases are repeated at every tree node until the stopping criterion for tree growth is met.
3. Communication Complexity and Scalability
The communication efficiency of PV-Tree is achieved by restricting the exchange of full-grained histograms to a subset of attributes:
- Local Voting: Each machine sends k attribute indices (O(k) words).
- Global Voting and Aggregation: Each machine sends 2k histograms of b bins each (O(k·b) words).
- Total: O(k·b) words per split iteration, independent of the total number of attributes d and the number of training instances n.
Comparative communication costs for one split iteration (per node):
| Method | Communication per split | Dependent on d? | Dependent on n? |
|---|---|---|---|
| PV-Tree | O(k·b) | No | No |
| Data-parallel | O(d·b) | Yes | No |
| Attribute-parallel | O(n) | No | Yes |
This configuration supports scalable and efficient distributed training, with empirical evidence of significant reductions in communication volume (e.g., roughly 10 MB for PV-Tree versus 424 MB for data-parallel training in one reported large-scale configuration).
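The cost model above can be compared numerically with a small Python function; the parameter values below (d, n, k, b) are illustrative assumptions of ours, not the paper's exact configuration:

```python
def comm_words(method, d, n, k, b):
    """Approximate words exchanged per machine per split,
    following the cost model in the text (illustrative only)."""
    costs = {
        "pv_tree": k + 2 * k * b,   # k vote indices + 2k candidate histograms of b bins
        "data_parallel": d * b,     # full histograms for all d attributes
        "attribute_parallel": n,    # example-indexed repartition results
    }
    return costs[method]
```

For an assumed scale of d = 1,200 attributes, n = 11M instances, k = 20, and b = 255 bins, PV-Tree exchanges about 10K words per split versus about 306K for data-parallel and 11M for attribute-parallel, matching the qualitative ordering in the table.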
4. Theoretical Analysis and Accuracy Guarantees
PV-Tree provides formal probabilistic guarantees that its two-stage voting protocol will, with high probability, select the statistically optimal split attribute:
Suppose each machine, trained on its local sample of size n/M, includes the truly best attribute (under the true information-gain ranking) in its local top-k with probability p = p(n/M). By a binomial majority-vote argument, the probability that PV-Tree selects the best attribute satisfies a tail bound of the form

P(success) ≥ Σ_{j=⌈M/2⌉}^{M} C(M, j) · p^j · (1 − p)^{M−j},

where p(n/M) → 1 as the per-machine sample size grows. Hence, as n → ∞, this probability approaches 1 for fixed M, k, and d.
This result arises from a combination of concentration bounds comparing empirical and true information gain, and a combinatorial Majoritarian (binomial) argument for the global voting phase (Meng et al., 2016).
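The binomial tail can be made concrete with a short Python function; this is a simplified majority-vote stand-in of our own, not the paper's exact bound, with p the probability that a single machine's local top-k contains the truly best attribute:

```python
import math

def vote_success_lower_bound(M, p):
    """P(a strict majority of M machines vote for the best attribute):
    a simplified binomial-tail stand-in for the paper's guarantee."""
    return sum(
        math.comb(M, j) * p**j * (1 - p) ** (M - j)
        for j in range(M // 2 + 1, M + 1)
    )
```

As p approaches 1 (i.e., with larger per-machine samples), the bound approaches 1; for example, it already exceeds 0.999 for M = 32 machines at p = 0.99.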
5. Empirical Performance and Trade-offs
Extensive experiments demonstrate the superior performance of PV-Tree in terms of wall-clock convergence and communication efficiency across industrial-scale tasks, specifically for GBDT learning:
Example summary (training on Gradient Boosted Decision Trees):
| Task | # Training | # Test | # Attributes | Machines | Sequential | Data-parallel | Attr.-parallel | PV-Tree |
|---|---|---|---|---|---|---|---|---|
| LTR | 11M | 1M | 1200 | 8 | 28,690 s | 32,260 s | 14,660 s | 5,825 s |
| CTR | 235M | 31M | 800 | 32 | 154,112 s | 9,209 s | 26,928 s | 5,349 s |
PV-Tree achieves roughly a 4.9× speedup over sequential training on the LTR data using eight machines and roughly a 28.8× speedup on CTR using 32 machines. Communication cost analyses indicate an order-of-magnitude lower transfer volume relative to alternatives at equivalent accuracy.
A recognized trade-off is that increasing the number of machines M reduces per-machine data while increasing parallelism, so the optimal M depends on the total dataset size n. An excessively small voting size k may degrade accuracy, but a modest k suffices when n is large. The design ensures that statistical correctness is not compromised as communication is reduced.
6. Comparison to Other Parallel Decision Tree Frameworks
PV-Tree contrasts with both data-parallel and attribute-parallel (vertical) approaches:
- Data-parallel approaches exchange histogram data for all attributes, incurring O(d·b) communication cost per split.
- Attribute-parallel methods require global reshuffling of example-indexed values for selected attributes, entailing O(n) communication per split.
- Vertical Hoeffding Tree (VHT) (Kourtellis et al., 2016) achieves parallelization by distributing features across workers with a Model Aggregator architecture but is tailored for streaming scenarios and uses a Hoeffding bound-based split protocol.
The two-stage voting of PV-Tree leverages horizontal partitioning and candidate restriction, thus scaling efficiently to scenarios with high attribute dimensionality d, large sample size n, and many compute nodes.
7. Application Domains and Limitations
PV-Tree is directly applicable to GBDT and Random Forest learning on large-scale tabular data. By sharply lowering communication costs, it enables distributed model induction in environments with limited network bandwidth or where high feature dimensionality would otherwise bottleneck attribute selection. A plausible implication is enhanced applicability on real-world tasks in advertising (CTR), ranking (LTR), and other domains with massive datapoints and features.
Known limitations include reduced per-machine sample size as the number of machines M increases, which may affect statistical power if not offset by a larger dataset n or voting size k. Careful tuning of k is required to balance accuracy and communication efficiency. These characteristics position PV-Tree as a method with broad practical scalability, strong theoretical underpinnings, and empirical validation as a state-of-the-art approach to scalable parallel decision tree induction (Meng et al., 2016).