DAGE in ML: Domain Adaptation, 3D Geometry, & Query

Updated 4 July 2026

DAGE is a polysemous acronym in machine learning, representing distinct frameworks in domain adaptation, 3D geometry estimation, and DAG query answering.
Its implementations range from benchmark competitions using synthetic data for character recognition to dual-stream transformer architectures balancing global consistency and fine detail in 3D reconstruction.
In knowledge-graph reasoning, DAGE extends query embeddings with relational combinators to faithfully represent complex DAG queries, highlighting a trend towards structurally factorized models.

DAGE is a polysemous acronym in recent machine learning literature rather than the name of a single unified framework. In domain adaptation literature, DAGE denotes Domain Adaptation and GEneralization, most concretely in the ICPR 2024 DAGECC competition on character classification under domain shift (Marino et al., 2024). In 3D vision, DAGE denotes Dual-stream Architecture for Efficient and Fine-grained Geometry Estimation, a feed-forward dual-stream transformer for recovering view-consistent geometry and camera poses from uncalibrated multi-view or video RGB inputs (Ngo et al., 4 Mar 2026). In knowledge-graph reasoning, DAGE denotes DAG Query Answering via Relational Combinator with Logical Constraints, a plug-and-play extension that enables query embedding methods to answer DAG queries formulated in $\mathcal{ALCOIR}$ (He et al., 2024). Because these usages are technically unrelated, precise interpretation depends entirely on subfield context.

1. Acronym scope and disambiguation

The acronym appears in three distinct research settings.

Usage	Expansion	Research setting
DAGE	Domain Adaptation and GEneralization	Character classification under domain shift
DAGE	Dual-stream Architecture for Efficient and Fine-grained Geometry Estimation	Multi-view/video 3D geometry and pose estimation
DAGE	DAG Query Answering via Relational Combinator with Logical Constraints	Knowledge-graph complex query answering

In the transfer-learning setting, DAGE is tied to a benchmark and competition infrastructure centered on industrial OCR-like recognition under domain shift. In the visual-geometry setting, it is an architectural proposal that separates global coherence from high-resolution detail. In the knowledge-graph setting, it is a reasoning extension that adds a relation-level conjunction operator so that DAG queries can be embedded recursively.

Two additional arXiv works are adjacent but not identical usages. SeaDAG and LayerDAG both concern directed acyclic graph generation rather than any acronym expanded as DAGE. Their inclusion is nevertheless relevant because they occupy the same DAG-centered methodological neighborhood as the knowledge-graph DAGE, and therefore contribute to terminological ambiguity in citation and search workflows.

2. DAGE as Domain Adaptation and GEneralization

Within the ICPR 2024 competition paper, DAGE refers to Domain Adaptation and GEneralization, and DAGECC expands to Domain Adaptation and GEneralization for Character Classification. The competition targets character recognition under domain shift, with the core problem defined as building classifiers that remain accurate when the test distribution differs from the training distribution. The industrial motivation comes from serial numbers and engraved characters on manufactured avionic parts, where robust automation is valuable for traceability, maintenance, and inspection (Marino et al., 2024).

The benchmark suite is Safran-MNIST, described as a real-world, lightweight, high-quality dataset suitable for fast prototyping and validation. Characters were extracted from images of Safran avionic parts acquired from aircraft engines returned from flights, with engraved serial/reference numbers made using laser engraving, pencil marking, and micro-percussion engraving. The resulting variability includes illumination changes, different orientations, varying writing styles, surface textures, image noise, and wear. The lightweight character of the benchmark is attributed to relatively low-resolution images, compact models, and lower computational cost than large-scale vision benchmarks.

The competition contains two datasets and two transfer settings. Safran-MNIST-D, used for Task 1 on domain generalization, consists of RGB images of size $128 \times 192$, 10 digit classes $0$–$9$, no training set, 421 validation images, and 1684 test images. Safran-MNIST-DLS, used for Task 2 on unsupervised domain adaptation, consists of gray-scale images with variable size from $18 \times 30$ to $86 \times 79$, 32 classes, 9314 unlabeled training images, 862 validation images, and 3448 test images. The 32 classes comprise digits $0$–$9$, letters $A, B, C, D, E, F, G, H, J, K, L, M, N, P, R, S, T, U, W, Y$, and the symbols "/" and ".".

Task 1 evaluates Domain Generalization (DG): participants had to train a model for an unseen target domain without access to target-domain data during training. Public datasets and synthetic data derived from public sources were allowed, including MNIST, MNIST-M, SVHN, HASYv2, DIGITS, and EMNIST, while proprietary data were not allowed. Task 2 evaluates Unsupervised Domain Adaptation (UDA): participants were given unlabeled target-domain data from Safran-MNIST-DLS and needed to adapt from a source dataset of their choice; public datasets and synthetic data generated using image processing or generative AI were permitted, provided the generative model was trained on public data. Data from one task could not be used in the other, and publicly available data could be used for pretraining.

The evaluation metric is the macro-averaged F1-score, selected because the data are class-imbalanced: $F_1^{Macro}=\frac{\sum_{k=1}^K F_1^k}{K}$ with class-wise F1 defined as

$F_1^k = \frac{2\, P^k R^k}{P^k + R^k}$

where $P^k$ and $R^k$ are the precision and recall for class $k$.
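As a minimal sketch of how the macro-averaged F1 can be computed, the snippet below works from per-class counts of true positives, false positives, and false negatives. It uses the count form $2TP/(2TP+FP+FN)$, which is algebraically equal to the harmonic mean of precision and recall. The function name `macro_f1` and the convention of scoring an empty class as 0 are assumptions made for illustration, not details taken from the competition's scoring code.

```python
from collections import Counter

def macro_f1(y_true, y_pred, classes):
    """Macro-averaged F1: unweighted mean of the per-class F1 scores."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # class p was predicted, but wrongly
            fn[t] += 1  # class t was the truth, but was missed
    scores = []
    for k in classes:
        denom = 2 * tp[k] + fp[k] + fn[k]
        # Assumed convention: a class with no true or predicted samples scores 0.
        scores.append(2 * tp[k] / denom if denom else 0.0)
    return sum(scores) / len(classes)

y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1]
score = macro_f1(y_true, y_pred, classes=[0, 1])  # (6/7 + 4/5) / 2
```

Because every class contributes equally to the mean, a rare class with poor recall lowers the score as much as a frequent one, which is why the metric suits class-imbalanced data.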

The competition used Codabench with a development phase evaluated on unseen validation data and up to 6 submissions per day, followed by a final phase in which each team submitted its best model once for evaluation on unseen test data.

Participation figures were modest but nontrivial: 28 teams registered, 6 teams submitted results, and recorded submissions totaled 125 for Task 1 and 181 for Task 2. Ranked by macro F1, the top three in Task 1 were Team Deng, Fraunhofer IIS DEAL, and JasonMendoza2008. In Task 2, the top three were Team Deng, Deep Unsupervised Trouble, and Raul.

The winning and runner-up entries establish the dominant empirical pattern of the benchmark. Team Deng used a ResNet50 pretrained on ImageNet and fine-tuned it on custom synthetic data plus MNIST, SVHN, and MNIST-M; class imbalance was handled with WeightedRandomSampler. Fraunhofer IIS DEAL used GoogLeNet pretrained on ImageNet, fine-tuned on MNIST, Chars74K, HASYv2, SVHN, USPS, SYN NUMBERS, and MNIST-M, and generated additional synthetic digits with a pretrained stable diffusion model. In Task 2, Team Deng again used ResNet50 over 32 classes and relied on EMNIST plus custom-generated synthetic data; notably, the team did not use the unlabeled Safran-MNIST-DLS target data even though it was available for UDA. The paper’s overall takeaways are that pretrained CNNs were very effective, synthetic data generation was a dominant strategy, domain realism in generated data mattered substantially, class imbalance handling helped, and direct use of unlabeled target data was surprisingly underexploited.
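The class-imbalance handling mentioned above can be illustrated with a small standard-library sketch. PyTorch's `WeightedRandomSampler` draws sample indices with replacement in proportion to per-sample weights. The weighting scheme used by the team is not stated here, so the inverse-class-frequency weights below are an assumption, and the helper names `inverse_frequency_weights` and `sample_indices` are hypothetical.

```python
import random
from collections import Counter

def inverse_frequency_weights(labels):
    """Per-sample weight equal to 1 / (number of samples in that class)."""
    counts = Counter(labels)
    return [1.0 / counts[y] for y in labels]

def sample_indices(labels, num_samples, seed=0):
    """Draw indices with replacement, in proportion to the weights.

    This mimics the behaviour of a weighted random sampler using only
    the standard library (assumed inverse-frequency weighting).
    """
    rng = random.Random(seed)
    weights = inverse_frequency_weights(labels)
    return rng.choices(range(len(labels)), weights=weights, k=num_samples)

# Toy label list: class "0" is nine times more frequent than class "7".
labels = ["0"] * 90 + ["7"] * 10
drawn = sample_indices(labels, num_samples=10000)
share_of_rare = sum(1 for i in drawn if labels[i] == "7") / len(drawn)
```

With these weights each class carries the same total sampling mass, so the rare class appears in roughly half of the drawn samples instead of a tenth.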

3. DAGE as Dual-stream Architecture for Efficient and Fine-grained Geometry Estimation

In visual geometry, DAGE denotes Dual-stream Architecture for Efficient and Fine-grained Geometry Estimation, a feed-forward model for recovering view-consistent 3D geometry and camera poses from uncalibrated multi-view or video RGB inputs. The input is an uncalibrated set of RGB images, and the model predicts per-frame pointmaps, camera poses, and a single scene-wise metric scale. Each pointmap stores 3D coordinates for each pixel in the local camera frame (Ngo et al., 4 Mar 2026).

The central architectural idea is to disentangle global coherence from fine detail. A low-resolution stream processes aggressively downsampled frames, with the long side capped at about 252 px, and uses a global transformer that repeatedly alternates frame attention and global attention. Its role is to build a permutation-equivariant, scene-level representation and regress camera poses and scene scale efficiently. A high-resolution stream processes each frame independently at native resolution using a pretrained MoGe2-style ViT backbone, preserving thin structures, sharp occlusion boundaries, small distant objects, and fine surface discontinuities. This division addresses the tension between global multi-view consistency and high-resolution sharpness.

Fusion is handled by a lightweight cross-attention adapter rather than direct concatenation or interpolation. For each frame, HR tokens query LR tokens through cross-attention, followed by HR self-attention refinement and an MLP residual update. The adapter stacks five such blocks. The adapter is inserted after the HR ViT encoder so that the pretrained single-image representation remains intact initially, and its final projection is zero-initialized to avoid destabilizing the frozen HR backbone. Positional encoding is modified correspondingly: interpolated RoPE is used for HR self-attention to stabilize very large resolutions, while cross-attention "snaps" each HR token to the nearest LR grid cell and reuses the LR positional encoding for that cell.
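The "snapping" of HR tokens to LR grid cells can be sketched as a simple index mapping. The snippet below is an illustration under stated assumptions, not the paper's implementation: it assumes both token grids cover the same image, compares token centers in normalized coordinates, and uses the hypothetical function name `snap_to_lr_grid`.

```python
from collections import Counter

def snap_to_lr_grid(hr_shape, lr_shape):
    """Map each HR token (row, col) to the nearest LR grid cell (row, col).

    Assumption: both grids tile the same image uniformly, so positions
    can be compared in normalized [0, 1] image coordinates.
    """
    hr_h, hr_w = hr_shape
    lr_h, lr_w = lr_shape
    mapping = {}
    for r in range(hr_h):
        for c in range(hr_w):
            # Center of the HR token in normalized image coordinates.
            y = (r + 0.5) / hr_h
            x = (c + 0.5) / hr_w
            # On a uniform grid, the cell containing a point is also the
            # cell whose center is nearest to that point.
            lr_r = min(int(y * lr_h), lr_h - 1)
            lr_c = min(int(x * lr_w), lr_w - 1)
            mapping[(r, c)] = (lr_r, lr_c)
    return mapping

# Toy grids: a 4x6 HR token grid snapped onto a 2x3 LR token grid.
mapping = snap_to_lr_grid((4, 6), (2, 3))
tokens_per_cell = Counter(mapping.values())  # each LR cell receives 4 HR tokens
```

Each HR token would then reuse the positional encoding of its assigned LR cell when attending to LR tokens, so the two streams share one coordinate frame in cross-attention.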

The LR stream is initialized from a Pi3 checkpoint and supervised by feature distillation from a higher-resolution Pi3 teacher, because training the LR stream from scratch harms pose quality. The distillation loss is defined in terms of the cosine similarity between LR-stream features and teacher features. The overall training objective combines pointmap, camera, scale, normal, gradient, and distillation losses. A notable design choice is that uncertainty weighting is not used on pointmap errors because it suppresses hard structures and reduces sharpness. Camera supervision is imposed on pairwise relative poses rather than absolute poses, and the model is trained in two stages: first on longer, lower-resolution clips and then on shorter, higher-resolution clips.
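A cosine-similarity distillation term can be sketched as follows. The exact form of the paper's loss is not reproduced here, so the `1 - cosine similarity` form averaged over tokens is an assumption, as are the function names and the premise that student and teacher tokens are already spatially aligned.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cosine_distillation_loss(student_tokens, teacher_tokens):
    """Mean of (1 - cosine similarity) over aligned token features.

    ASSUMED form: the loss is 0 when every student feature points in the
    same direction as its teacher feature, regardless of magnitude.
    """
    losses = [
        1.0 - cosine_similarity(s, t)
        for s, t in zip(student_tokens, teacher_tokens)
    ]
    return sum(losses) / len(losses)

# Toy features: the student matches the teacher up to scale.
student = [[1.0, 0.0], [0.0, 2.0]]
teacher = [[3.0, 0.0], [0.0, 1.0]]
loss = cosine_distillation_loss(student, teacher)
```

A direction-only objective of this kind leaves feature magnitudes unconstrained, which is one common reason for preferring cosine similarity over an L2 penalty in feature distillation.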

A defining systems property of the architecture is that it decouples resolution from clip length. Since global multi-view reasoning is confined to the LR stream, whose token count stays fixed, the HR stream can scale to high resolution independently. The paper reports that DAGE is about 2× faster at 540p and about 28× faster at 2K than prior global-attention baselines, supports inputs up to 2K, and remains practical on sequences of up to 1000 frames. Experimentally, it is evaluated on GMU Kitchens, Monkaa, Sintel, ScanNet, KITTI, UrbanSyn, Unreal4K, and Diode for video pointmap estimation; on Monkaa, Sintel, UrbanSyn, and Unreal4K for boundary sharpness; on 7-Scenes and NRGBD for multi-view reconstruction; and on Sintel, TUM-Dynamics, and ScanNet for pose estimation. The reported findings are that DAGE achieves the best average rank on relative point error and inlier ratio, is best or near-best on boundary F1 and pseudo depth boundary error among temporally consistent video methods, matches or closely approaches the best methods in multi-view reconstruction, and is competitive in pose estimation even though the LR stream runs at only 252 px long side. The stated limitations are degraded performance under very low overlap or rapid non-rigid motion, HR memory intensity at extreme resolutions, and the absence of explicit dynamic motion modeling.
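The decoupling of resolution from clip length can be made concrete with rough token arithmetic. The sketch below is a back-of-the-envelope illustration only: the patch size of 14, the frame sizes, and the function names are assumptions, and it counts query-key pairs in a single attention layer while ignoring the adapter's cross-attention.

```python
def tokens_per_frame(height, width, patch=14):
    """Number of ViT tokens for one frame (patch size is an assumption)."""
    return (height // patch) * (width // patch)

def attention_pairs(num_frames, height, width, patch=14, global_attention=True):
    """Count query-key pairs in one attention layer.

    Global attention mixes tokens from all frames, so its cost grows with
    the square of (frames x tokens). Per-frame attention keeps frames
    separate, so its cost grows only linearly with the number of frames.
    """
    t = tokens_per_frame(height, width, patch)
    if global_attention:
        return (num_frames * t) ** 2
    return num_frames * t ** 2

frames = 100
# Hypothetical LR frame with a 252 px long side: 10 x 18 = 180 tokens.
lr_global = attention_pairs(frames, 140, 252, global_attention=True)
# Hypothetical high-resolution frame, processed independently per frame.
hr_per_frame = attention_pairs(frames, 1152, 2048, global_attention=False)
# The same high-resolution frames under global attention, for comparison.
hr_global = attention_pairs(frames, 1152, 2048, global_attention=True)
```

In this simplified count, running global attention at high resolution costs `frames` times more than per-frame attention at the same resolution, while the LR global cost does not change when the HR resolution increases.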

4. DAGE as DAG query answering in knowledge graphs

In knowledge-graph reasoning, DAGE denotes DAG Query Answering via Relational Combinator with Logical Constraints. The method is introduced as a plug-and-play extension for complex query embedding models such as Query2Box, BetaE, and ConE, whose standard formulations can answer tree-form queries but fail on queries whose computation graph contains a node with multiple paths leading to the same target. DAGE addresses this by adding a trainable relational combinator that merges multiple relations between the same pair of nodes into a single relation embedding, so that the overall DAG query can again be embedded recursively (He et al., 2024).

The paper formalizes DAG queries as a subset of $\mathcal{ALCOIR}$ concept descriptions. A query is tree-form if it does not include role conjunction ($\sqcap$) in role descriptions. The distinction is important because tree-form relaxations lose equality constraints on shared intermediate witnesses. The paper's example is a query in which the same variable must satisfy two relational conditions at once; in $\mathcal{ALCOIR}$, this is expressed with an explicit role conjunction.
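To make the difference between a DAG query and its tree-form relaxation concrete, the toy example below evaluates both symbolically over a tiny knowledge graph. All entities and relations are hypothetical and are not the paper's example. Relations are sets of pairs, so role conjunction is plain set intersection, and `exists` follows the usual description-logic semantics of an existential restriction.

```python
def exists(role, concept, domain):
    """Semantics of an existential restriction: every x in the domain
    that has some role-successor y belonging to the concept."""
    return {x for x in domain if any((x, y) in role for y in concept)}

# Hypothetical toy knowledge graph; all names are made up for illustration.
domain = {"alice", "bob", "carol", "dave", "acme"}
friend_of = {("alice", "bob"), ("carol", "bob")}
colleague_of = {("alice", "bob"), ("carol", "dave")}
works_at = {("bob", "acme"), ("dave", "acme")}

# People who work at "acme".
acme_staff = exists(works_at, {"acme"}, domain)  # {"bob", "dave"}

# DAG query: ONE witness must be both a friend and a colleague
# (role conjunction = intersection of the two relations).
dag_answers = exists(friend_of & colleague_of, acme_staff, domain)

# Tree-form relaxation: each branch may pick a DIFFERENT witness.
tree_answers = (
    exists(friend_of, acme_staff, domain)
    & exists(colleague_of, acme_staff, domain)
)
# dag_answers == {"alice"}; tree_answers == {"alice", "carol"}
```

Here "carol" satisfies the relaxed query through two different witnesses ("bob" as friend, "dave" as colleague) but fails the DAG query, which is exactly the lost equality constraint on the shared intermediate witness.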

The core operator is a trainable relational combinator that maps the embeddings of several relations to a single relation embedding, and the embedding of a role conjunction is defined as the combinator applied to the embeddings of its conjuncts. The instantiation is a DeepSets-style permutation-invariant operator. Because the operator is commutative, it is suited to representing relation intersection. Operationally, DAGE replaces duplicated branches with a single path labeled by the conjunction of their relations.
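A permutation-invariant combinator in the DeepSets style can be sketched with the generic pattern "transform each element, sum-pool, transform the pooled vector". The snippet below is only an illustration of that pattern: the fixed `phi` and `rho` functions stand in for learned networks, and neither they nor the name `combine_relations` are taken from the paper.

```python
import math

def phi(vec):
    """Per-relation transform (stand-in for a learned network)."""
    return [math.tanh(x) for x in vec]

def rho(vec):
    """Post-pooling transform (stand-in for a learned network)."""
    return [2.0 * x for x in vec]

def combine_relations(relation_embeddings):
    """Sum-pool the transformed relation embeddings, then transform the
    pooled vector. Sum pooling makes the result independent of the order
    in which the relations are listed."""
    dim = len(relation_embeddings[0])
    pooled = [0.0] * dim
    for emb in relation_embeddings:
        for i, value in enumerate(phi(emb)):
            pooled[i] += value
    return rho(pooled)

# Hypothetical 3-dimensional embeddings for two relations.
r1 = [0.5, -1.0, 0.25]
r2 = [1.5, 0.0, -0.75]
merged_a = combine_relations([r1, r2])
merged_b = combine_relations([r2, r1])  # same result, different order
```

Order independence matters because a conjunction of relations has no natural ordering, so the embedding of "r1 and r2" should equal the embedding of "r2 and r1".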

The method is also regularized with logical tautologies on role conjunction: commutativity, monotonicity, and restricted conjunction preserving. These induce soft regularizers, including a monotonicity loss and a restricted-conjunction-preserving loss, which are combined with the query-embedding loss in the final objective.

The result is not a new end-to-end architecture from scratch, but a model-agnostic extension layered on top of established query embedding systems.

Because standard QA benchmarks primarily contain tree-form queries, the paper introduces new DAG-query benchmarks: NELL-DAG, FB15k-237-DAG, and FB15k-DAG. It defines six DAG query structures and additionally uses two further structures for enforcing role-conjunction tautologies. Each benchmark has easy and hard test splits, where hard queries are chosen so that overlap with the relaxed tree-form answer set is below a fixed threshold. Reported benchmark sizes are 10,000 train / 1,000 valid / 1,000 easy test / 1,500 hard test for NELL-DAG, 50,000 / 1,000 / 5,000 / 4,700 for FB15k-237-DAG, and 80,000 / 8,000 / 8,000 / 7,500 for FB15k-DAG.

Empirically, DAGE substantially improves MRR on DAG queries across Query2Box, BetaE, and ConE, with especially strong gains on hard splits. On NELL-DAG easy, average MRR rises from 21.23 to 42.41 for Query2Box, from 18.32 to 41.33 for BetaE, and from 26.54 to 40.19 for ConE. The paper further reports that DAGE mostly preserves tree-form performance on NELL-QA, FB15k-237-QA, and FB15k-QA; that logical constraints help, with monotonicity usually helping more than restricted conjunction preserving and both together often best; and that DAGE-enhanced methods generally outperform CQD and BiQE on the proposed DAG benchmarks.

5. Adjacent DAG-centered models often confusable with DAGE

Two contemporaneous DAG-generation papers are methodologically relevant when DAGE is interpreted in a broader DAG-centric context, although neither uses DAGE as its own expansion.

SeaDAG is a semi-autoregressive diffusion model for conditional DAG generation. Its central mechanism is to simulate layer-wise autoregressive generation by assigning different denoising speeds to different DAG layers while retaining a complete graph structure at every diffusion step. This is implemented through a layer-dependent local timestep schedule, with separate bottom-up and top-down schedules, and the training objective augments the standard graph denoising loss with an explicit condition loss.

The method is evaluated on AIG generation from truth tables and molecule generation from quantum properties. Reported results include Validity 92.38 and Accuracy 89.25 for AIGs, and for molecules the best or near-best MAE on most properties, including HOMO (30.91) and gap (89.70), with Validity 100.0 (Zhou et al., 2024).

LayerDAG is a layerwise autoregressive diffusion model that decomposes a DAG into a unique sequence of bipartite layers. It models the graph distribution autoregressively, as a product over layers of the conditional distribution of each new layer given the layers generated so far, and further decomposes each step into next-layer size prediction, node-attribute generation, and bipartite-edge generation. Autoregression handles inter-layer directional dependencies, while discrete diffusion handles intra-layer logical dependencies. The model uses BiMPNN encoders, a transformer denoiser for node attributes, and an MLP-based edge predictor, together with sinusoidal positional encodings and a layer-dependent denoising budget.
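The idea of decomposing a DAG into a unique sequence of layers can be sketched with a longest-path layering. The snippet assumes, for illustration, that a node's layer index is its longest-path distance from the source nodes; the function name `dag_layers` and the toy graph are hypothetical. It relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter

def dag_layers(predecessors):
    """Group nodes by longest-path depth from the source nodes.

    `predecessors` maps each node to the set of its direct predecessors.
    Source nodes form layer 0, and every other node sits one layer above
    its deepest predecessor (assumed layering rule).
    """
    depth = {}
    # static_order() yields every node after all of its predecessors.
    for node in TopologicalSorter(predecessors).static_order():
        preds = predecessors.get(node, ())
        depth[node] = 1 + max((depth[p] for p in preds), default=-1)
    num_layers = max(depth.values()) + 1 if depth else 0
    layers = [[] for _ in range(num_layers)]
    for node, d in depth.items():
        layers[d].append(node)
    return [sorted(layer) for layer in layers]

# Toy DAG: a -> b, a -> c, b -> d, c -> d, a -> e, d -> e
toy = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}, "e": {"a", "d"}}
layers = dag_layers(toy)  # [["a"], ["b", "c"], ["d"], ["e"]]
```

Under this rule the layering is unique for a given DAG, every node outside layer 0 has at least one predecessor in the layer directly below it, and all edges into a new layer come from earlier layers, which is what allows layers to be generated one at a time.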

It is evaluated on synthetic latent preferential DAGs and on real-world computation flow graphs from TPU Tile, HLS, and NA-Edge, with graphs up to about 400 nodes. In the strictest synthetic setting, the paper compares LayerDAG's Validity against D-VAE, GraphRNN, GraphPNAS, and OneShotDAG. On TPU Tile, HLS, and NA-Edge, the paper reports Pearson / MAE / graph-statistic results including 0.65 / 1.2 / 1.3 / 0.1, 0.85 / 1.1 / 1.4, and 0.990 / 0.9 / 1.3 / 0.4, respectively (Li et al., 2024).

These works clarify an important boundary. The knowledge-graph DAGE operates on DAG-structured queries over symbolic relations, while SeaDAG and LayerDAG generate DAG-structured objects. The overlap is therefore structural rather than task-level.

6. Conceptual relations, boundaries, and historical context

Although the three principal DAGE usages are technically disjoint, they exhibit a common architectural pattern: each paper introduces an explicit mechanism for handling structure that simpler baselines neglect. In DAGECC, the salient structure is domain shift in industrial character recognition, addressed empirically through source diversity, synthetic data generation, and imbalance handling. In the geometry-estimation DAGE, the salient structure is the separation between global cross-view consistency and high-frequency per-frame detail. In the knowledge-graph DAGE, the salient structure is role conjunction in DAG queries, which tree-form embeddings cannot represent faithfully.

The same contrast also appears in the related DAG-generation works. SeaDAG modifies diffusion so that denoising speed depends on graph layer while retaining global graph visibility. LayerDAG separates directional dependencies across layers from logical dependencies within a layer. This suggests a broader methodological tendency: recent graph- and geometry-centric models increasingly avoid monolithic processing in favor of structurally factorized designs.

Within domain adaptation specifically, an earlier but differently named framework is DGA-DA: Discriminative and Geometry Aware Unsupervised Domain Adaptation. It is not itself called DAGE, but it is relevant to the transfer-learning sense of the acronym because it combines JDA-style alignment, a repulsive force for class discrimination, and geometry-aware pseudo-label inference through label smoothness consistency and geometric structure consistency. The method is motivated by a Ben-David-type target-error bound and is reported to outperform 22 prior methods over 36 image classification domain-adaptation tasks through 6 benchmarks (Luo et al., 2017). A plausible implication is that the DAGECC use of “Domain Adaptation and GEneralization” sits within a longer line of work that treats transfer not merely as marginal alignment, but as a problem involving class separation, manifold structure, and robustness to target-domain uncertainty.

For researchers, the main practical consequence of the acronym collision is bibliographic rather than conceptual. "DAGE" may denote a benchmark and competition in industrial domain adaptation, a transformer architecture for high-resolution multi-view geometry, or a logical extension for DAG query answering in knowledge graphs. Accurate interpretation therefore requires immediate inspection of expansion, venue context, and surrounding technical vocabulary such as Safran-MNIST, pointmaps, or $\mathcal{ALCOIR}$.