MDG: MLaaS Dataset Generator
- MDG is a configurable framework that simulates MLaaS operations, enabling systematic benchmarking, workflow optimization, and integration with IoT systems.
- It employs six tightly integrated stages to generate reproducible datasets across diverse ML models and tasks, using both IID and non-IID data splits.
- MDG supports detailed performance metrics and QoS indicators, leading to up to 62% improvement in service selection accuracy and enhanced workflow composition.
A Machine Learning as a Service Dataset Generator (MDG) is a configurable framework for generating rich, reproducible datasets that systematically capture the behavior, performance, and composability of MLaaS service instances across real-world conditions. MDG is designed to simulate realistic MLaaS operations—spanning training, evaluation, and service composition—enabling rigorous benchmarking and downstream research on service selection, workflow optimization, and IoT system composition (Kanneganti et al., 18 Jan 2026).
1. Architectural Overview
MDG comprises six tightly integrated stages, structured to maximize reproducibility and coverage of MLaaS service diversity:
- Input-Data Generation: Supports interactive (Wizard), controlled (Generate), and randomized (Autogen) entry points for dataset/model/hyperparameter selection, facilitating both guided and large-scale, automated simulation.
- Dataset & Model Configuration: Implements normalization and partitioning, including IID and non-IID data splits (Dirichlet, shard, quantity-skew) that reflect the federated and skewed deployment scenarios encountered in IoT contexts.
- Individual MLaaS Simulation: Trains an array of models (CNN, RNN, MLP, MobileNetV2, Random Forest, Logistic Regression, K-means) using federated and centralized protocols, logging metrics at run, round, and client granularity with systematic SQLite persistence.
- Composability Indicator Computation: Quantifies functional and cross-service compatibility via metrics such as Data Utility Measurement (DUM), Model Utility Measurement (MUM), Scalability Measurement (SM), Historical Quality Score (HQS), and Service Reliability Score (SRS).
- Service Composition Executor: Executes parametric aggregations (e.g., weighted parameter averaging for neural models), ensemble-based approaches for non-parametric models, and maintains fidelity to real-world workflow aggregation patterns.
- Dataset Export & Storage: Outputs comprehensive instance and composition datasets in CSV, JSON, and SQLite formats, facilitating broad integration with evaluation pipelines.
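As an illustration of the non-IID partitioning stage, the following is a minimal sketch of a Dirichlet label-skew split using only the Python standard library. The function name `dirichlet_partition`, its signature, and the per-class splitting scheme are assumptions made for illustration; the source does not describe MDG's own implementation.

```python
import random
from collections import defaultdict


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with Dirichlet(alpha) label skew.

    Hypothetical helper: smaller alpha gives more skewed client label mixes.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    clients = [[] for _ in range(num_clients)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # A Dirichlet draw is a vector of Gamma(alpha, 1) samples normalized to sum to 1.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(gammas)
        shares = [g / total for g in gammas]
        # Turn the shares into cumulative cut points over this class's samples.
        start, cumulative = 0, 0.0
        for client_id, share in enumerate(shares):
            cumulative += share
            if client_id == num_clients - 1:
                end = len(indices)  # last client takes whatever remains
            else:
                end = round(cumulative * len(indices))
            clients[client_id].extend(indices[start:end])
            start = end
    return clients


labels = [i % 3 for i in range(300)]
parts = dirichlet_partition(labels, num_clients=5, alpha=0.5)
```

Every index is assigned to exactly one client, so the partition stays a true split of the dataset regardless of how skewed the draw is.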
2. Simulation of Diverse Model Families
MDG supports training and evaluation of six major model classes across multiple canonical datasets (MNIST, Fashion-MNIST, Digits, CIFAR-10, Iris, Wine, California Housing):
- All models are instantiated with exhaustive, grid-sampled hyperparameters (learning rate, batch size, and number of rounds, among others).
- Data preprocessing includes feature scaling (none, standard, min–max) and automated train/test splitting; classification, regression, and clustering tasks are all supported.
- Each federated simulation logs per-round metrics and final predictions, maintaining exhaustive traceability of each simulated MLaaS instance.
- Evaluation metrics:
  - Classification: Accuracy, Precision, Recall, F1-score (stored by run, round, client).
  - Regression: RMSE.
  - Clustering: silhouette score, inertia, ARI, NMI.
- All service metrics are persistently tracked for detailed post hoc analysis.
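The classification metrics and their storage by run, round, and client can be sketched as follows. This is an assumption-laden example: macro averaging, the `round_metrics` table, and its column names are hypothetical choices, not MDG's documented schema.

```python
import sqlite3


def classification_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(classes)
    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


def log_metrics(conn, run_id, round_id, client_id, metrics):
    """Persist one row per metric, keyed by run, round, and client (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS round_metrics ("
        "run_id TEXT, round_id INTEGER, client_id INTEGER, metric TEXT, value REAL)"
    )
    conn.executemany(
        "INSERT INTO round_metrics VALUES (?, ?, ?, ?, ?)",
        [(run_id, round_id, client_id, name, value) for name, value in metrics.items()],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
metrics = classification_metrics([0, 0, 1, 1], [0, 1, 1, 1])
log_metrics(conn, run_id="run-1", round_id=1, client_id=0, metrics=metrics)
```

Storing one row per metric keeps the table shape stable when regression or clustering metrics are logged alongside classification ones.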
3. Functional and Quality-of-Service Attributes
Every MLaaS service instance generated by MDG is annotated with comprehensive functional descriptors and QoS records, including:
- Supported algorithms and task types (classification/regression/clustering).
- Input/output formats and schema metadata (JSON feature vectors, tensor shapes, label types).
- Hyperparameter sets (learning rate, batch size, etc.).
- Data distribution strategies (IID/non-IID, Dirichlet, shards, quantity-skew).
- Endpoint details (API schema, authentication).
- Measured QoS attributes under realistic IoT network perturbations:
  - Response time, throughput, reliability, and availability, aggregated by round/client and summarized per instance.
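One way to picture such an annotated instance is a small record type that holds the functional descriptors and collapses per-round/client QoS observations into a per-instance summary. All class and field names here are hypothetical, and treating reliability and availability as simple success and reachability rates is an assumption; the source does not give their definitions.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class QoSRecord:
    """One QoS observation for a given round and client (hypothetical fields)."""
    round_id: int
    client_id: int
    response_time_ms: float
    throughput_rps: float
    succeeded: bool   # the request returned a valid result
    reachable: bool   # the endpoint answered at all


@dataclass
class ServiceInstance:
    """Functional descriptors plus a raw QoS log for one simulated service."""
    service_id: str
    algorithm: str
    task_type: str          # "classification", "regression", or "clustering"
    hyperparameters: dict
    data_split: str         # e.g. "iid", "dirichlet", "shard", "quantity-skew"
    qos_log: list = field(default_factory=list)

    def qos_summary(self):
        """Collapse per-round/client records into one per-instance summary."""
        if not self.qos_log:
            raise ValueError("no QoS records logged")
        return {
            "response_time_ms": mean(r.response_time_ms for r in self.qos_log),
            "throughput_rps": mean(r.throughput_rps for r in self.qos_log),
            "reliability": mean(1.0 if r.succeeded else 0.0 for r in self.qos_log),
            "availability": mean(1.0 if r.reachable else 0.0 for r in self.qos_log),
        }


svc = ServiceInstance("svc-001", "CNN", "classification",
                      {"learning_rate": 0.01, "batch_size": 32}, "dirichlet")
svc.qos_log.append(QoSRecord(1, 0, 40.0, 10.0, True, True))
svc.qos_log.append(QoSRecord(1, 1, 60.0, 20.0, False, True))
summary = svc.qos_summary()
```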
4. Composition-Specific Indicators and Optimization
MDG provides systematic computation of cross-service compatibility and optimal workflow selection:
- Composability indicators:
  - DUM: quantifies distribution compatibility.
  - MUM: balances performance attributes.
  - SM: scalability ratio.
  - HQS: moving average of workflow scores.
  - SRS: long-term reliability.
- Composition optimization:
  - Objective: maximize total MUM subject to latency and cost constraints.
Both parametric aggregations (for neural models) and non-parametric ensemble strategies are supported, reflecting the diversity of real-world MLaaS workflows.
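The two aggregation styles can be sketched in a few lines. This is a minimal illustration under stated assumptions: parameters are flat lists of floats with identical shapes, weights are normalized to sum to one, and the ensemble is a plain majority vote; the function names are hypothetical and the source does not specify MDG's weighting scheme.

```python
from collections import Counter


def weighted_parameter_average(params, weights):
    """Average equally-shaped flat parameter vectors using normalized weights."""
    if not params or len(params) != len(weights):
        raise ValueError("need one weight per parameter vector")
    dim = len(params[0])
    if any(len(p) != dim for p in params):
        raise ValueError("all parameter vectors must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [
        sum(w * p[i] for p, w in zip(params, weights)) / total
        for i in range(dim)
    ]


def majority_vote(predictions):
    """Combine per-service prediction lists into one label per sample.

    Ties go to the label seen first among the services for that sample.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


theta = weighted_parameter_average([[1.0, 2.0], [3.0, 4.0]], weights=[1, 3])
labels = majority_vote([[0, 1, 1], [0, 0, 1], [1, 0, 1]])
```

Weighting by, for example, each service's training-set size recovers the familiar federated-averaging pattern, while the vote applies to models such as Random Forest whose parameters cannot be averaged.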
5. Benchmark Dataset Composition and Statistical Properties
The current MDG release encompasses:
| Attribute | Value/Range | Notes |
|---|---|---|
| Number of instances | 10,432 | Exhaustive across models/datasets |
| Datasets | MNIST, CIFAR-10, Iris, etc. | Seven standard tasks |
| Models | CNN, RNN, MLP, RF, etc. | Six major families |
| Task breakdown | ~4k classification, etc. | Classification, regression, clustering |
| Data splits | 50% IID, 50% non-IID | Dirichlet concentration in [0.1, 1] |
| Hyperparameter ranges | Grid-sampled | As in Section 2 |
| Accuracy (classif.) | Aggregated over runs | |
| Response time (ms) | LogNormal | IoT emulated conditions |
| Reliability | -distributed 0.90 | Across service instances |
MDG thus provides fine-grained records suitable for benchmarking selection and composition strategies under diverse operational scenarios.
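To make the log-normal response-time row concrete, the sketch below draws synthetic response times with the standard library. The parameters `mu=4.0` and `sigma=0.5` are purely hypothetical placeholders chosen for illustration; the source does not report the fitted values.

```python
import math
import random


def sample_response_times_ms(n, mu=4.0, sigma=0.5, seed=0):
    """Draw n response times (ms) from LogNormal(mu, sigma).

    mu and sigma are hypothetical; they are not taken from the MDG release.
    """
    rng = random.Random(seed)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


samples = sample_response_times_ms(1000)
# The median of LogNormal(mu, sigma) is exp(mu), a handy sanity check.
theoretical_median = math.exp(4.0)
empirical_median = sorted(samples)[len(samples) // 2]
```

A log-normal draw is always positive and right-skewed, which matches the occasional long delays expected under emulated IoT network conditions.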
6. Integrated Service Composition Mechanism
MDG incorporates a native mechanism for automated workflow selection:
```
Input:  Service registry S, constraints (R_max, C_max, accuracy_min)
Output: Best workflow W*
1. candidates ← filter S by constraints
2. indicators ← compute {DUM, MUM, SM, HQS, SRS} for all candidates
3. best_score ← -∞; W* ← ∅
4. for each subset W ⊆ candidates of size K:
       θ_W ← ∑_{s∈W} w_s · θ_s                 # parametric aggregation
       ŷ_W ← majority_vote({ŷ_s | s∈W})        # ensemble for non-parametric
       acc_W ← evaluate(...)
       latency_W ← ∑_{s∈W} r(s)
       score_W ← α·acc_W - β·latency_W
       if score_W > best_score:
           best_score ← score_W; W* ← W
5. return W*
```
Optimization prioritizes accuracy under latency and cost constraints, as is typical in high-stakes IoT and MLaaS scenarios. This logic enables direct evaluation and improvement of algorithmic selection and composition strategies using the MDG-generated datasets.
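The selection loop can be approximated in Python as an exhaustive search over fixed-size subsets. This is a sketch, not MDG's implementation: the dictionary keys (`accuracy`, `latency_ms`, `cost`), the default `alpha` and `beta`, and the stand-in `evaluate` function (mean member accuracy) are all assumptions, and the composability indicators are omitted because the scoring rule described in the text does not use them.

```python
from itertools import combinations


def select_workflow(services, k, r_max, c_max, accuracy_min,
                    alpha=1.0, beta=0.001, evaluate=None):
    """Return the best size-k workflow and its score, or (None, -inf) if none fit."""
    if evaluate is None:
        # Stand-in for evaluating the composed model: mean accuracy of the members.
        def evaluate(workflow):
            return sum(s["accuracy"] for s in workflow) / len(workflow)

    # Step 1: keep only services that satisfy every per-service constraint.
    candidates = [
        s for s in services
        if s["latency_ms"] <= r_max
        and s["cost"] <= c_max
        and s["accuracy"] >= accuracy_min
    ]

    best_score, best_workflow = float("-inf"), None
    # Step 4: score each subset as alpha * accuracy - beta * summed latency.
    for workflow in combinations(candidates, k):
        acc = evaluate(workflow)
        latency = sum(s["latency_ms"] for s in workflow)
        score = alpha * acc - beta * latency
        if score > best_score:
            best_score, best_workflow = score, workflow
    return best_workflow, best_score


registry = [
    {"id": "a", "accuracy": 0.92, "latency_ms": 40, "cost": 1},
    {"id": "b", "accuracy": 0.90, "latency_ms": 20, "cost": 1},
    {"id": "c", "accuracy": 0.95, "latency_ms": 200, "cost": 1},
    {"id": "d", "accuracy": 0.70, "latency_ms": 10, "cost": 1},
    {"id": "e", "accuracy": 0.88, "latency_ms": 30, "cost": 2},
]
best, score = select_workflow(registry, k=2, r_max=100, c_max=3, accuracy_min=0.8)
```

Exhaustive enumeration grows combinatorially with the registry size, so a practical selector would likely prune candidates or use a heuristic search once the filtered set is large.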
7. Experimental Results and Practical Impact
- In controlled comparisons, MDG-driven selection and composition approaches yield 12–62% higher selection accuracy (relative) and a 0.10 absolute gain in composition score (about 17% relative) versus traditional QWS-based baselines.
- The benchmark datasets and composition mechanisms support robust research on MLaaS service matching, workflow structuring, and cross-service reliability in realistic IoT settings.
- Empirical results (rule-based: 0.92 vs. 0.82, skyline-based: 0.81 vs. 0.50, composition score: 0.68 vs. 0.58) substantiate the utility of MDG for systematic and reproducible MLaaS research (Kanneganti et al., 18 Jan 2026).
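The reported pairs can be turned into absolute and relative gains with simple arithmetic. The helper name `relative_gain` is illustrative; the scores are the ones quoted above.

```python
def relative_gain(new, baseline):
    """Relative improvement of `new` over `baseline`, as a fraction."""
    return (new - baseline) / baseline


# (MDG-based score, QWS-based baseline) as reported in the text.
results = {
    "rule-based selection": (0.92, 0.82),
    "skyline-based selection": (0.81, 0.50),
    "composition score": (0.68, 0.58),
}

for name, (mdg, qws) in results.items():
    print(f"{name}: +{mdg - qws:.2f} absolute, {relative_gain(mdg, qws):.1%} relative")
```

The two selection gains work out to roughly 12% and 62% in relative terms, while the composition score rises by 0.10 in absolute terms, or about 17% relative to the baseline.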
MDG establishes a formal, extensible foundation for data-driven advancements in MLaaS benchmarking, selection, and service workflow composition, especially within heterogeneous and resource-variable environments typified by IoT deployments.