Deformable Transposed Convolution
- DTC is an upsampling operator that dynamically predicts per-position offsets and modulation, enabling adaptive, detail-preserving feature reconstruction.
- It uses a transposed convolution to predict offsets, combined with grid sampling, allowing the network to focus on semantically important regions.
- Empirical results show improved segmentation metrics (e.g., DICE, mIoU) with minimal parameter overhead, making it effective in both 2D and 3D applications.
Deformable Transposed Convolution (DTC) is a class of upsampling operators that generalize traditional transposed convolution by introducing dynamic, learnable sampling of feature locations. DTC modules combine a spatially adaptive offset field with optionally learned interpolation (or modulation) kernels to produce high-resolution feature maps that better preserve structural detail, attenuate artifacts, and can be implemented as drop-in replacements for standard transposed convolution in both 2D and 3D settings. Unlike fixed upsampling approaches, DTC explicitly regresses spatial sampling positions conditioned on the local context, allowing the network to attend to semantically salient or structurally challenging regions during upsampling (Sun et al., 25 Jan 2026, Blumberg et al., 2022).
1. Motivation and Limitations of Conventional Upsampling
Conventional upsampling operators such as transposed convolution (deconvolution) and (bi-/tri-)linear interpolation are based on fixed spatial sampling locations. In transposed convolution, zeros are interleaved in the low-resolution map, and a kernel is applied at predetermined spatial locations. Linear interpolation computes output pixels based on fixed weighted averages of input neighbors. These approaches are agnostic to structural cues off the regular grid and are susceptible to blurring, checkerboard artifacts, and detail loss—especially in medical image segmentation and generative imaging contexts (Sun et al., 25 Jan 2026).
The paradigm of deformable convolution (DCN) demonstrated the expressiveness gained by making spatial sampling positions dynamic—adapting offsets per feature location. Deformable Transposed Convolution adopts this principle for upsampling: the network learns per-position offsets and, in some variants, modulation weights or interpolation kernels, thereby enabling the upsampling operator to target informative or structurally critical input regions (Blumberg et al., 2022).
2. Mathematical Formulation and Implementation
Let $X \in \mathbb{R}^{C \times H \times W}$ (2D) or $X \in \mathbb{R}^{C \times D \times H \times W}$ (3D) denote a low-resolution feature map. The goal is to obtain an upsampled output $Y$ whose spatial dimensions are enlarged by a scale factor $s$.
DTC/DSTC Forward Pass Overview
A typical DTC block decomposes the upsampling process into the following steps:
- Offset and Modulation Prediction:
A transposed convolution applied to $X$ predicts a dense offset field $\Delta p$ and corresponding modulation weights $m$ at the target resolution, with one offset per spatial axis ($d = 2$ or $3$, the number of spatial axes). A $\tanh$ activation clamps offsets to a bounded range, and a sigmoid restricts modulation weights to $(0, 1)$.
- Receptive Field Control:
Sampling positions are displaced from the regular output grid $p$ by the scaled, modulated offset, $\tilde{p} = p + \gamma \, m \odot \Delta p$, where $\gamma$ is a scalar controlling the effective receptive field of the deformation.
- Feature Extraction and Deformable Sampling: A lightweight convolution (2D or 3D, matching the data dimensionality) generates the feature map to be sampled. Feature values are interpolated at the deformed positions $\tilde{p}$ using grid sampling (bilinear for 2D, trilinear for 3D).
- Residual Fusion: A baseline upsampling result $Y_{\mathrm{base}}$ (e.g., from transposed convolution or linear interpolation) is added to the deformably sampled features for stability and global structure preservation: $Y = Y_{\mathrm{deform}} + Y_{\mathrm{base}}$.
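The following is a minimal PyTorch sketch of the 2D forward pass described in the steps above, assuming tanh-bounded offsets, sigmoid modulation, bilinear grid sampling, and residual fusion with a plain transposed convolution. The module name `DeformableTransposedConv2d`, the $1 \times 1$ feature projection, and the `gamma` default are illustrative choices, not the authors' reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DeformableTransposedConv2d(nn.Module):
    """Illustrative DTC-style upsampler (hypothetical implementation)."""

    def __init__(self, in_ch, out_ch, scale=2, gamma=0.1):
        super().__init__()
        self.scale = scale
        self.gamma = gamma  # receptive-field scaling (assumed small constant)
        # Transposed convolution predicting 2 offset channels (dx, dy) and
        # 1 modulation channel at the upsampled resolution.
        self.offset_pred = nn.ConvTranspose2d(in_ch, 3, kernel_size=scale, stride=scale)
        # Feature projection sampled by the deformed grid (1x1 is an assumption).
        self.feat = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        # Baseline upsampling branch used for residual fusion.
        self.base = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=scale, stride=scale)

    def forward(self, x):
        b, _, h, w = x.shape
        H, W = h * self.scale, w * self.scale

        pred = self.offset_pred(x)             # (B, 3, H, W)
        offset = torch.tanh(pred[:, :2])       # offsets bounded to [-1, 1]
        mod = torch.sigmoid(pred[:, 2:3])      # modulation weights in (0, 1)

        # Regular output grid in normalized [-1, 1] coordinates (x, y order).
        ys = torch.linspace(-1, 1, H, device=x.device)
        xs = torch.linspace(-1, 1, W, device=x.device)
        gy, gx = torch.meshgrid(ys, xs, indexing="ij")
        grid = torch.stack((gx, gy), dim=-1).expand(b, H, W, 2)

        # Shift the grid by the scaled, modulated offsets.
        delta = (self.gamma * mod * offset).permute(0, 2, 3, 1)  # (B, H, W, 2)
        deformed_grid = grid + delta

        # Deformable bilinear sampling of the projected low-resolution features.
        sampled = F.grid_sample(self.feat(x), deformed_grid,
                                mode="bilinear", align_corners=True)

        # Residual fusion with the baseline transposed-convolution output.
        return sampled + self.base(x)


# Quick shape check: behaves as a stride-2 upsampler.
x = torch.randn(1, 64, 32, 32)
print(DeformableTransposedConv2d(64, 32)(x).shape)  # torch.Size([1, 32, 64, 64])
```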
DSTC (Deformably-Scaled Transposed Convolution) Specifics
DSTC additionally introduces a learnable anti-aliasing interpolation kernel at each output location, weighted over a Gaussian mixture or similar function, and can use a compact parameterization with global shift and dilation per input location (Blumberg et al., 2022).
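As one possible reading of this parameterization, the sketch below places a small grid of Gaussian-weighted taps around each output position, displaced by a per-location global shift and scaled by a per-location dilation. The function name, tap-grid construction, fixed Gaussian weighting, and normalization are assumptions made for illustration, not the paper's exact kernel.

```python
import torch
import torch.nn.functional as F


def gaussian_weighted_upsample(feats, shift, dilation, k=3, sigma=0.5):
    """feats: (B, C, h, w) low-res features; shift: (B, 2, H, W) normalized
    per-location global shift; dilation: (B, 1, H, W) positive per-location
    scale of the tap grid. Returns a (B, C, H, W) anti-aliased upsampling."""
    B, C, h, w = feats.shape
    H, W = shift.shape[-2:]

    # Base output grid in normalized [-1, 1] coordinates (x, y order).
    ys = torch.linspace(-1, 1, H, device=feats.device)
    xs = torch.linspace(-1, 1, W, device=feats.device)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    base = torch.stack((gx, gy), dim=-1).unsqueeze(0)          # (1, H, W, 2)

    # k x k tap offsets in input-pixel units, centered on each output position.
    taps = torch.arange(k, device=feats.device, dtype=torch.float32) - (k - 1) / 2
    ty, tx = torch.meshgrid(taps, taps, indexing="ij")
    tap_off = torch.stack((tx, ty), dim=-1).view(-1, 2)        # (k*k, 2)

    out, norm = 0.0, 0.0
    # Width of one input pixel in normalized coordinates (isotropic approximation).
    pixel = 2.0 / max(w - 1, 1)
    for off in tap_off:
        # Fixed Gaussian weight: taps far from the center contribute less.
        wgt = torch.exp(-(off ** 2).sum() / (2 * sigma ** 2))
        # Sampling coordinate: base grid + global shift + dilated tap offset.
        step = off.view(1, 1, 1, 2) * dilation.permute(0, 2, 3, 1) * pixel
        grid = base + shift.permute(0, 2, 3, 1) + step
        out = out + wgt * F.grid_sample(feats, grid, mode="bilinear",
                                        align_corners=True)
        norm = norm + wgt
    return out / norm
```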
3. Integration into Neural Architectures
DTC and DSTC are modular and can be inserted into any upsampling position of an encoder-decoder architecture (e.g., U-Net, UNETR, nnUNet). The input and output channels for DTC are inherited from the layer it replaces, typically adding only the parameters of a feature convolution and a small offset-prediction transposed convolution. For a 6-stage 2D U-Net, the parameter increase is approximately 1.3 million (from 66M to 67.3M), and the computational overhead is on the order of a few GFLOPs, amounting to roughly 2% additional parameters and at most about 5% additional FLOPs overall (Sun et al., 25 Jan 2026).
In DSTC, non-parametrized versions learn separate interpolation kernels and offsets for each location and kernel index; parametrized variants share kernels and use a global shift and spatial dilation per site, greatly reducing parameter count but achieving near-identical empirical performance (Blumberg et al., 2022).
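As a concrete illustration of the drop-in claim, the snippet below swaps a decoder's stride-2 transposed convolution for the hypothetical `DeformableTransposedConv2d` sketched in Section 2 (assumed to be in scope) and reports the added parameter count; the channel sizes are arbitrary.

```python
import torch.nn as nn

# Original decoder upsampler in a U-Net-style stage.
up = nn.ConvTranspose2d(256, 128, kernel_size=2, stride=2)

# DTC replacement: same input/output channels, same output resolution.
up_dtc = DeformableTransposedConv2d(256, 128, scale=2)

added = (sum(p.numel() for p in up_dtc.parameters())
         - sum(p.numel() for p in up.parameters()))
print(f"extra parameters introduced by DTC at this stage: {added}")
```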
4. Empirical Results and Comparative Analysis
2D Medical Segmentation
On ISIC18 and BUSI, DTC consistently improved segmentation performance across multiple architectures: substituting DTC raised DICE for U-Net with bilinear upsampling and likewise for U-Net with convolutional upsampling. The method also improved SegMamba and SwinUNETR V2 decoders, producing sharper boundaries and reducing hair artifacts and noise (Sun et al., 25 Jan 2026).
3D Medical Segmentation
On BTCV-15, replacing the default upsampling with DTC improved DICE for:
- nnUNet
- UNETR
- nnMamba
Notably, performance on small organs and thin structures improved, as shown by visualizations and metric gains (Sun et al., 25 Jan 2026).
General Vision Tasks (DSTC)
DSTC improves instance and semantic segmentation and generative modeling:
- Mask R-CNN box AP improves over the transposed convolution (TC) baseline of $38.3$
- Mask AP likewise improves
- HRNet-W48 VOC-12 mIoU improves over the TC baseline of $76.17$
- DCGAN FID on CelebA (scaled) decreases with DSTC
DSTC outperforms standard TC in 2D/3D segmentation and MR image enhancement, with competitive performance achieved by the parametrized version at much lower parameter cost (Blumberg et al., 2022).
5. Implementation, Training, and Ablation Considerations
Key implementation practices include:
- Use of the AdamW optimizer (with tuned learning rate and weight decay) for DTC segmentation training.
- Offsets constrained via $\tanh$ and modulation weights via a sigmoid to ensure stable gradients through the grid-sampling operation (Sun et al., 25 Jan 2026).
- The receptive field scaling parameter $\gamma$ is typically set to a small constant, with optimal values depending on task and architecture.
- For DSTC, the number of Gaussian mixture components for anti-aliasing, kernel size, and offset parameterization are hyperparameters, with most ablations indicating improved accuracy and minimal computational penalty at moderate values (Blumberg et al., 2022).
Table: Hyperparameter Ablations (DSTC)
| Parameter | Best Setting | Empirical Impact |
|---|---|---|
| # Gaussians | A moderate number | Best box/mask AP; adding more components gives no significant gain |
| Kernel size | Moderate | Maximum performance; diminishing returns at larger sizes |
| Offset parameterization (compact, few channels) | Yes | Same AP as the full version with $1/10$ of the parameters |
Unbounded or unconstrained offsets/weights lead to divergence or poor segmentation; both branches (offsets and modulation) are necessary for robust performance. Tuning $\gamma$ is critical: an excessively large receptive field can reduce edge precision, while a small $\gamma$ limits adaptability.
6. Advantages, Limitations, and Applications
Advantages of DTC/DSTC include:
- Dynamic, data-driven localization for upsampling, enhancing boundary fidelity and reducing artifacts compared to fixed-grid methods.
- Modularity: Single-line integration into a wide range of decoders and architectures in both 2D and 3D contexts.
- Minimal computational and parameter overhead compared to fixed transposed convolution.
Limitations:
- Learned offsets may be unstable in homogeneous regions, potentially introducing noise.
- Requires careful tuning of the receptive field scaling parameter for optimal delineation of structure.
- Single deformable head per upsample (no multi-head extension as in some later DCN variants).
Potential applications beyond segmentation include super-resolution, detection and localization heads, and generative decoders such as VAEs and GANs (Sun et al., 25 Jan 2026).
7. Related Methods and Future Directions
DTC and DSTC are extensions of the deformable convolutional paradigm but apply adaptivity specifically to the upsampling step. The approach contrasts with fixed upsampling and other adaptive upsampling strategies, such as DySample and FADE, outperforming these baselines on standard benchmarks. DSTC introduces additional flexibility with learned anti-aliasing kernels and compact parameterization, providing a broader framework for deformable upsampling (Blumberg et al., 2022).
Potential future directions include multi-head offset prediction, improved regularization for stability in homogeneous regions, and broader adoption in non-segmentation generative and regression architectures. The modularity and minimal overhead of DTC-type operators suggest further application in any setting requiring learnable, detail-preserving upsampling.