Data-Driven Networks Approach

Updated 28 December 2025

Data-driven network approaches use operational data and machine learning (ML) to infer latent structures, forecast dynamic conditions, and overcome limitations of traditional model-based methods.
They employ techniques such as unsupervised clustering, graph neural networks, and reinforcement learning to automate resource allocation and optimize control policies.
This approach improves network resilience and performance in complex environments, supporting applications from 5G core networks to large-scale cyber-physical systems.

A data-driven networks approach refers to the systematic use of operational measurements and ML techniques to analyze, optimize, and control complex networked systems—ranging from communication infrastructures (such as 5G core networks) to distributed cyber-physical systems and large-scale dynamical networks—without reliance on a priori analytic or simulation models. This paradigm stands in contrast to classical model-based methods by directly leveraging observed traces, logs, and performance counters to infer latent structures, forecast network states, and enact adaptive policies. Across domains, the data-driven approach enables robust network management under highly dynamic, uncertain, or nonlinear conditions, and generalizes to both architectural design (e.g., topologies, resource allocation) and operational control (e.g., traffic engineering, consensus, anomaly detection) (Manias et al., 2022, Hope et al., 2021, Samari et al., 2024, Caro-Ruiz et al., 2017).

1. Principles: Data-Driven vs. Model-Based Methodologies

Traditional model-based approaches require the formulation of accurate closed-form models describing all relevant network phenomena, such as queuing, channel propagation, or protocol behaviors. These models are often brittle under non-stationarity and may fail to capture the full range of real-world operating conditions, especially in mobile, large-scale, or highly heterogeneous networks.

In contrast, the data-driven approach dispenses with explicit modeling:

- Direct use of measurements: Operational logs, packet and event traces, and temporal performance indicators are ingested and form the empirical basis for analysis and optimization.
- Statistical and ML inference: Modern learning techniques (e.g., unsupervised clustering, deep learning, graph neural networks, reinforcement learning) are employed to extract patterns, detect anomalies, and anticipate future conditions.
- Robustness and adaptivity: Data-driven algorithms adapt to sudden traffic surges, topology changes, and device heterogeneity, accommodating effects that are impractical to model comprehensively.

This paradigm enables the discovery of latent network structure (e.g., clustering of function-to-function traffic in a 5G core), learning control policies in environments where node behaviors are unknown, and proactive optimizations grounded in observed, rather than hypothesized, dynamics (Manias et al., 2022, Samari et al., 2024).

2. Architectures and Systems Implementing Data-Driven Networks

Network Data Analytics Function (NWDAF) in 5G Core Networks

The 3GPP-specified NWDAF is a logical network function aggregating events, KPIs, and traffic data from across core network functions (NFs). Key components include:
- NWDAF analytics engine: Hosts clustering/forecasting algorithms.
- Data repository: Retains event logs and measurements (e.g., MongoDB).
- Kafka pipeline: Ingests real-time packet-level data for subsequent ML analysis.
- Interface Nnwdaf: Exposes standardized APIs (e.g., Nnwdaf_AnalyticsInfo) for consuming analytics results.
Operational packet capture is implemented via hypervisor-based port mirroring, so that the same traffic is available both in real time for online analytics and as stored history for retrospective analytics.

Graph-Driven Policy Architectures

In domains such as data-driven traffic engineering, Graph Neural Networks (GNNs) provide a natural substrate for learning control policies that respect network topology and can generalize across topological changes (Hope et al., 2021).
- Nodes and edges are embedded with features derived from recent traffic states.
- Message-passing (i.e., GNN layers) encodes multi-hop dependencies and enables policy transfer to new or perturbed graphs.
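
The multi-hop effect of message passing can be illustrated with a deliberately simplified sketch. It is not the architecture of Hope et al. (2021): a real GNN layer applies learned weights and nonlinearities, whereas the update below is a fixed average chosen for readability. The graph, the scalar node states, and the function name `message_passing` are all assumptions made for this example.

```python
def message_passing(adjacency, states, rounds):
    """Run `rounds` synchronous updates; each node mixes its own state
    with the mean state of its neighbours (fixed 50/50 weights)."""
    for _ in range(rounds):
        states = {
            node: 0.5 * states[node]
            + 0.5 * sum(states[u] for u in neighbours) / len(neighbours)
            for node, neighbours in adjacency.items()
        }
    return states


# Hypothetical path graph 0 - 1 - 2 - 3 with a traffic signal only at node 0.
adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
initial = {0: 1.0, 1: 0.0, 2: 0.0, 3: 0.0}

after_two = message_passing(adjacency, initial, rounds=2)
after_three = message_passing(adjacency, initial, rounds=3)
```

In this toy graph, node 3 sits three hops from node 0, so its state is still zero after two rounds and becomes non-zero only after the third. Each additional round extends the neighbourhood a node can "see" by one hop.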

Symbolic and Divide-and-Conquer Compositional Architectures

For large-scale or infinite networks with unknown agents and topologies, data-driven compositional methods build symbolic discrete-domain models of each subsystem using data from local trajectories, sidestepping monolithic modeling and enabling tractable synthesis and safety guarantees (Samari et al., 2024, Zaker et al., 2025).

3. Data Acquisition, Feature Engineering, and Preprocessing

Measurement Collection

- Continuous packet and event capture: e.g., over 170,000 packets in 138 minutes, filtered to only NF-to-NF interactions (Manias et al., 2022).
- Real-time mirroring for live inference and retrospective querying.

Feature Engineering

For each source-destination pair $(\mathrm{NF}_i, \mathrm{NF}_j)$, the following features are extracted:
- $n$: Total number of packets exchanged.
- $L_1, L_2, \ldots, L_n$: Individual packet lengths.
- $\bar L$: Average packet length; $L_{\max}$: Maximum packet length; $\sigma_L$: Standard deviation of packet length.
Features are min-max scaled or standardized (zero mean, unit variance) for ML suitability.
Similar pipelines extend to joint feature construction in multi-input systems, e.g., traffic history matrices in GNN-based routing (Hope et al., 2021).
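
The per-pair feature extraction and scaling described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the pipeline of Manias et al. (2022). The packet records, the `CORE_NFS` set, and the function names are hypothetical, and the population standard deviation is an assumed choice.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical packet records: (source, destination, packet length in bytes).
packets = [
    ("AMF", "SMF", 120),
    ("AMF", "SMF", 180),
    ("SMF", "UPF", 90),
    ("AMF", "AUSF", 300),
    ("UE", "AMF", 64),  # not NF-to-NF, so the filter below drops it
]
CORE_NFS = {"AMF", "SMF", "UPF", "AUSF"}  # assumed set of core NFs


def pair_features(records, nfs):
    """Return {(src, dst): [n, mean length, max length, std of length]}."""
    lengths = defaultdict(list)
    for src, dst, length in records:
        if src in nfs and dst in nfs:  # keep NF-to-NF interactions only
            lengths[(src, dst)].append(length)
    return {
        pair: [len(ls), mean(ls), max(ls), pstdev(ls)]
        for pair, ls in lengths.items()
    }


def min_max_scale(rows):
    """Scale each feature column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


features = pair_features(packets, CORE_NFS)
scaled = min_max_scale(list(features.values()))
```

Scaling matters here because the packet count and the byte-valued length features live on very different ranges; without it, distance-based methods would be dominated by whichever feature has the largest magnitude.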

4. Machine Learning and Inference Techniques

Unsupervised Structure Discovery

k-means clustering groups NF-to-NF feature vectors by minimizing the within-cluster sum of squared Euclidean distances to each cluster centroid. The number of clusters $k$ is tuned using silhouette scores together with interpretability, enabling the identification of both high-traffic ("heavy") and idle ("null") NF pairs (Manias et al., 2022).
Heatmaps and cluster analysis extract operational insights (e.g., asymmetric traffic profiles due to protocol payloads, concentration of registration traffic).
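
A compact, dependency-free sketch of this procedure is given below: Lloyd's k-means followed by silhouette-based selection of $k$. It is illustrative only. The two-dimensional points stand in for scaled NF-pair feature vectors and are invented for the example, and seeding with the first $k$ points is a simplification chosen for determinism (production code would typically use a library implementation with better initialization).

```python
import math


def kmeans(points, k, max_iters=100):
    """Lloyd's algorithm, seeded deterministically with the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iters):
        # Assignment step: index of the nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments are stable, so we are done
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return labels, centroids


def silhouette(points, labels):
    """Mean silhouette coefficient; needs >= 2 clusters, singletons score 0."""
    scores = []
    for i, p in enumerate(points):
        def mean_dist_to(cluster):
            ds = [math.dist(p, q) for j, q in enumerate(points)
                  if labels[j] == cluster and j != i]
            return sum(ds) / len(ds) if ds else None

        a = mean_dist_to(labels[i])  # cohesion within the point's own cluster
        if a is None:
            scores.append(0.0)
            continue
        # Separation: mean distance to the closest other cluster.
        b = min(mean_dist_to(c) for c in set(labels) if c != labels[i])
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical scaled feature vectors: three "heavy" and three "null" pairs.
points = [(0.90, 0.80), (0.05, 0.10), (1.00, 0.90),
          (0.00, 0.00), (0.92, 0.85), (0.10, 0.05)]
scores = {k: silhouette(points, kmeans(points, k)[0]) for k in (2, 3)}
best_k = max(scores, key=scores.get)
labels, centroids = kmeans(points, best_k)
```

On this toy data the silhouette score favours two clusters, matching the heavy/null split. On real traces the score is only one input, and the text above notes that interpretability also guides the choice of $k$.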

Policy Learning and Control

In routing, GNN architectures parameterize per-edge (or per-node) policies that are optimized via deep reinforcement learning (PPO), with rewards that compare the resulting congestion to the optimum obtained by linear programming (Hope et al., 2021).
Iterative message-passing architectures require at least three rounds of message passing for effective encoding of route state and generalization.
Divide-and-conquer strategies build local symbolic models, aggregate them with scenario-based data-driven functions (e.g., alternating sub-bisimulation), and synthesize compositional controllers with statistical guarantees, eliminating the need for small-gain computation or explicit knowledge of interconnection topology (Samari et al., 2024).
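
As an illustration of a reward that compares achieved congestion with a linear-programming optimum, one common form in learned routing is a utilization ratio. The expression below is an assumed form given for exposition, not necessarily the exact reward used by Hope et al. (2021). The symbols $D_t$ (demand matrix at step $t$), $U_{\max}^{\pi}$ (maximum link utilization under the learned policy $\pi$), and $U_{\max}^{\mathrm{LP}}$ (the minimum achievable maximum link utilization, computed by linear programming) are introduced here.

$$
r_t = -\frac{U_{\max}^{\pi}(D_t)}{U_{\max}^{\mathrm{LP}}(D_t)}, \qquad U_{\max}^{\mathrm{LP}}(D_t) \le U_{\max}^{\pi}(D_t)
$$

Under this assumed form the reward is at most $-1$, and it equals $-1$ exactly when the learned routing matches the linear-programming optimum for that demand matrix.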

Proactive and Autonomous Optimization

- Automated mapping of heavy-cluster NF pairs to co-location or bandwidth-priority measures.
- Predictive scaling of highly variable NFs avoids future bottlenecks.
- Dynamic migration decisions for under-utilized NF instances to optimize host resource utilization.
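
One hedged way to picture the mapping from analytics output to actions is a small rule table. The sketch below is hypothetical throughout: the cluster label "medium", the variability threshold, the per-pair granularity, and the action strings are assumptions for illustration, not rules taken from the cited work.

```python
def recommend_action(cluster, length_std, std_threshold=0.5):
    """Map a cluster label and scaled packet-length variability to an action.

    `std_threshold` is an assumed tuning knob, not a value from the source.
    """
    if cluster == "heavy":
        return "co-locate or prioritize bandwidth"
    if length_std > std_threshold:
        return "schedule predictive scaling"
    if cluster == "null":
        return "consider migration or consolidation"
    return "no action"


# Hypothetical clustering output: pair -> (cluster label, scaled length std).
pair_summary = {
    ("AMF", "SMF"): ("heavy", 0.2),
    ("SMF", "UPF"): ("medium", 0.8),
    ("AUSF", "UDM"): ("null", 0.0),
    ("NRF", "AMF"): ("medium", 0.1),
}
plan = {
    pair: recommend_action(cluster, std)
    for pair, (cluster, std) in pair_summary.items()
}
```

Keeping the rules in one small function makes the policy easy to audit. A deployed system would more plausibly derive thresholds from forecasts rather than fixed constants.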

5. Applications and Case Studies

Application Domain	Data-Driven Approach Elements	Key Outcome
5G Core (NWDAF) (Manias et al., 2022)	Packet capture, feature extraction, unsupervised clustering, Kafka-ML stack	Automated resource balancing
Data-Driven Routing (Hope et al., 2021)	GNN policy, RL optimization, transfer to unseen topologies	Near-optimal congestion minimization
Symbolic Control in Unknown Networks (Samari et al., 2024)	Local scenario-based symbolic models, compositional bisimulation	Formal correctness, scalability
Event-driven Consensus (Renganathan et al., 2022)	Probabilistic trust estimation, history-driven weighting	Consensus, attack robustness

This approach accommodates evolving traffic, topology, and device heterogeneity found in contemporary and future networks, offering both adaptive intelligence and formal performance/robustness properties.

6. Generalization, Limitations, and Future Perspectives

Data-driven network analysis and control generalize naturally across technological domains. For example, the NWDAF ingestion/analytics pipeline can be extended to radio-access/C-RAN domains, containerized edge environments, and IoT protocols by modifying event schemas and feature sets (Manias et al., 2022). Techniques such as anomaly detection, drift analysis, and advanced time-series forecasting (RNNs/transformers) are identified as future enhancements.

Limitations:

- Ground truth and feature availability constrain the granularity and domain of some analyses.
- ML methods require robust, representative operational data for accuracy.
- Feature selection and scaling critically impact clustering and policy learning results.
- Some pipelines mention more advanced techniques without evaluating them empirically (e.g., DBSCAN for nonconvex clusters).

Future Directions:

- Deep integration with edge/cloud-native telemetry.
- Online anomaly and drift detection for enhanced resilience.
- Cross-technology deployment by reuse of event ingestion and analytics stacks in new domains.
- Time-series based proactive slice reconfiguration and self-optimization.
- Hierarchical and explainable learning models to improve scalability and interpretability in massive-scale networks (Manias et al., 2022, Samari et al., 2024).

7. Significance and Impact

Data-driven network approaches provide a scalable, flexible alternative to brittle model-based methods for analyzing, optimizing, and controlling both communication and generalized dynamical networks. By accommodating operational complexity and enabling automation, these approaches underpin autonomy in modern network management, as seen in 5G core systems, and extend toward future edge, RAN, and cross-domain deployments (Manias et al., 2022, Hope et al., 2021, Samari et al., 2024). Because the approach applies to both structure discovery and policy learning, it is likely to remain central to ongoing network systems research.

References (6)

An NWDAF Approach to 5G Core Network Signaling Traffic: Analysis and Characterization (2022)

GDDR: GNN-based Data-Driven Routing (2021)

Data-Driven Control of Large-Scale Networks with Formal Guarantees: A Small-Gain Free Approach (2024)

Self-Organization in Networks: A Data-Driven Koopman Approach (2017)

Data-Driven Safety Certificates of Infinite Networks with Unknown Models and Interconnection Topologies (2025)

History Data Driven Distributed Consensus in Networks (2022)
