Reconfigurable Width Multiplier

Updated 14 February 2026

A width multiplier is a reconfigurable digital architecture that adjusts the operand bit-width of multiplication to optimize power, area, and delay.
It integrates non-volatile memristors with CMOS ripple-carry arrays, selectively activating arithmetic cells to reduce switching power and latency.
The design substantially improves energy efficiency and performance, making it well suited to DSP, machine learning, and video processing.

In digital systems, a width multiplier refers to hardware mechanisms and circuit architectures that dynamically adjust the effective operand bit-width of multiplication. The proposed memristor–CMOS reconfigurable multiplier achieves flexible bit-width multiplication by combining nanoscale, non-volatile memristors for bit-mask gating with a conventional CMOS ripple-carry full-adder array. Writing horizontal and vertical control vectors to the memristor elements tailors the width of the active subarray at run time, and with it the active area, power, and delay. This benefits applications whose computational precision requirements vary (Baek, 2019).

1. Reconfigurable Array Architecture and Bit-Width Selection

The core of the width-adjustable multiplier is an $N\times N$ ripple-carry array of 1-bit reconfigurable full-adder primitives. Each cell has inputs A, B, carry-in (Ci), and sum-in (Si) and is controlled by two digital enable signals, CTRLH (horizontal) and CTRLV (vertical). The cell's logic gates (XOR and 3-input AND), together with the enable signals, gate the partial product $A \cdot B$ into the adder only when CTRLH $\wedge$ CTRLV $=1$; otherwise the cell propagates only the carry and sum values. Disabled cells perform no multiplication: they draw negligible switching power and only relay the ripple-carry chain.
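A behavioural sketch of one cell makes the gating concrete. It assumes the enable condition is CTRLH AND CTRLV (matching the $h_j \wedge v_i$ gating of Section 2 and the rule of selecting a sub-multiplier with $M$ ones in each vector), and that a disabled cell acts as a full adder whose partial product is forced to 0. The function name `rfa_cell` is hypothetical.

```python
from typing import Tuple


def rfa_cell(a: int, b: int, si: int, ci: int, ctrlh: int, ctrlv: int) -> Tuple[int, int]:
    """Behavioural model of one 1-bit reconfigurable full-adder cell.

    Assumption: the cell is enabled when ctrlh AND ctrlv == 1. A disabled
    cell forces its partial product to 0 but still adds the incoming sum
    and carry, so the ripple-carry chain is relayed.
    Returns (sum_out, carry_out).
    """
    enable = ctrlh & ctrlv
    pp = a & b & enable                             # 3-input AND: gated partial product
    sum_out = pp ^ si ^ ci                          # full-adder sum (XOR)
    carry_out = (pp & si) | (pp & ci) | (si & ci)   # full-adder carry (majority)
    return sum_out, carry_out


# An enabled cell adds A*B; a disabled cell only relays sum-in and carry-in.
print(rfa_cell(1, 1, 0, 0, ctrlh=1, ctrlv=1))  # (1, 0)
print(rfa_cell(1, 1, 1, 1, ctrlh=1, ctrlv=0))  # (0, 1)
```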

The global bit vectors CTRLH $[N-1:0]$ and CTRLV $[N-1:0]$ are broadcast across rows and columns, respectively. To select an active $M\times M$ sub-multiplier at the top left, CTRLH and CTRLV are each set to $M$ ones padded with zeros. Other settings of these vectors synthesize other subarrays, including disjoint sub-multiplier blocks, subject to one constraint: because each cell is enabled by one row bit and one column bit, the active cells always form the Cartesian product of the enabled rows and the enabled columns. Non-participating cells remain inactive and consume negligible switching power.
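The selection rule can be sketched as follows. The helper names `top_left_select` and `active_cells` are hypothetical, and the code assumes a cell is active when both its row bit and its column bit are 1. The second example shows a consequence of broadcasting one bit per row and per column: enabling rows and columns {0, 1, 4, 5} yields four 2×2 blocks, so two diagonal blocks cannot be enabled without also enabling the two off-diagonal ones.

```python
from typing import List


def top_left_select(n: int, m: int) -> List[int]:
    """n-bit control vector with m ones followed by zeros.

    Index 0 is treated as the first (top/left) row or column; this
    indexing is an assumption of the sketch.
    """
    if not 0 <= m <= n:
        raise ValueError("require 0 <= m <= n")
    return [1] * m + [0] * (n - m)


def active_cells(ctrlv: List[int], ctrlh: List[int]) -> List[List[int]]:
    """Cell (i, j) is active iff ctrlv[i] AND ctrlh[j] (a rank-1 mask)."""
    return [[v & h for h in ctrlh] for v in ctrlv]


# 4x4 sub-multiplier inside an 8x8 array: 16 active cells.
sel = top_left_select(8, 4)
print(sum(map(sum, active_cells(sel, sel))))  # 16

# Non-contiguous selection: the mask is a Cartesian product, so the
# off-diagonal blocks (rows 0-1 x cols 4-5, and vice versa) are active too.
split = [1, 1, 0, 0, 1, 1, 0, 0]
mask = active_cells(split, split)
print(mask[0][4], mask[4][0])  # 1 1
```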

2. Mathematical Modeling of Gating and Width Control

Memristor-based logic implements the programmable gating. Each 1-bit cell realizes threshold logic with memristive elements, exploiting their two programmable resistance states, $R_{on}$ and $R_{off}$. The threshold gates (NAND, NOR) are built from series or parallel memristor combinations that feed CMOS inverters. The fundamental memristor equations are:

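$v(t) = R(w)\, i(t)$
$\dfrac{dw}{dt} = \dfrac{\mu_v R_{on}}{D}\, i(t)$
$R(w) = R_{on}\,\dfrac{w}{D} + R_{off}\left(1 - \dfrac{w}{D}\right)$
where $w \in [0, D]$ is the width of the doped (low-resistance) region, $D$ is the film thickness, and $\mu_v$ is the average dopant mobility.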
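To see how the three equations interact, the sketch below integrates them with forward Euler using the parameter values listed in Section 3. The function names, the hard clamp on $w$ (a simplification standing in for the window functions often used in memristor models), and the constant-bias test waveform are assumptions for illustration, not part of the source's SPICE model.

```python
from typing import Callable, Tuple

# Default parameters follow Section 3 (HP linear ion-drift model).
D = 10e-9        # film thickness (m)
R_ON = 100.0     # ohms
R_OFF = 16e3     # ohms
MU_V = 1e-14     # dopant mobility (m^2 V^-1 s^-1)


def resistance(w: float) -> float:
    """R(w) = R_on * w/D + R_off * (1 - w/D)."""
    return R_ON * w / D + R_OFF * (1.0 - w / D)


def simulate(v_of_t: Callable[[float], float], t_end: float, dt: float,
             w0: float) -> Tuple[float, float]:
    """Forward-Euler integration of dw/dt = mu_v * R_on / D * i(t).

    The state w is hard-clamped to [0, D] (an assumed simplification).
    Returns (final w, final resistance).
    """
    w, t = w0, 0.0
    while t < t_end:
        i = v_of_t(t) / resistance(w)          # Ohm's law: v = R(w) * i
        w += MU_V * R_ON / D * i * dt          # linear ion drift
        w = min(max(w, 0.0), D)                # keep w within the film
        t += dt
    return w, resistance(w)


# A positive bias drives w up and R toward R_on; a negative bias does the reverse.
w_pos, r_pos = simulate(lambda t: 1.0, t_end=1e-3, dt=1e-6, w0=0.5 * D)
w_neg, r_neg = simulate(lambda t: -1.0, t_end=1e-3, dt=1e-6, w0=0.5 * D)
print(r_pos < resistance(0.5 * D) < r_neg)  # True
```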

For active partial-product computation in the $N\times N$ array, let $A=[A_{N-1},\dots,A_0]$ and $B=[B_{N-1},\dots,B_0]$. The gating bits $h_j = \text{CTRLH}[j]$ and $v_i = \text{CTRLV}[i]$ produce the element-wise activation,

$P_{ij} = A_i \cdot B_j \cdot (h_j \wedge v_i)$

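In matrix form, treating $A$, $B$, $v$, and $h$ as column vectors, $P = (A B^{\mathsf T}) \odot (v h^{\mathsf T})$, where $\odot$ denotes element-wise AND. The product is the weighted sum $\sum_{i,j} P_{ij}\,2^{i+j}$, so this formulation gives explicit control over which partial products contribute to it, directly enabling variable precision and parallelism.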
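Summing the masked partial products with their binary weights gives a compact identity, $\sum_{i,j} A_i B_j v_i h_j\, 2^{i+j} = \left(\sum_i A_i v_i 2^i\right)\left(\sum_j B_j h_j 2^j\right)$: the array multiplies $A$ with its disabled bits cleared by $B$ with its disabled bits cleared. The sketch below checks this bit by bit; `masked_multiply` is a hypothetical name, and bit index 0 is assumed to be the least significant bit.

```python
from typing import List


def masked_multiply(a: int, b: int, ctrlv: List[int], ctrlh: List[int]) -> int:
    """Sum of P_ij * 2**(i+j) with P_ij = A_i & B_j & (v_i & h_j)."""
    total = 0
    for i, v_i in enumerate(ctrlv):
        a_i = (a >> i) & 1
        for j, h_j in enumerate(ctrlh):
            b_j = (b >> j) & 1
            p_ij = a_i & b_j & (v_i & h_j)     # gated partial product
            total += p_ij << (i + j)           # binary weight 2^(i+j)
    return total


full = [1] * 8
low4 = [1] * 4 + [0] * 4
print(masked_multiply(200, 123, full, full))    # 24600, the full 8x8 product
print(masked_multiply(0xAB, 0xCD, low4, low4))  # 143 = 0xB * 0xD (4x4 mode)
```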

3. Device-Level Implementation and Memristor Model Parameters

The SPICE model uses an HP-type ionic-drift memristor (Strukov et al.) embedded in a $180$ nm CMOS process. Key parameters are $R_{on} \approx 100\,\Omega$, $R_{off} \approx 16\,\text{k}\Omega$, film thickness $D \approx 10\,\text{nm}$, and $\mu_v \approx 10^{-14}\,\text{m}^2\,\text{V}^{-1}\,\text{s}^{-1}$. The ratio $R_{off}/R_{on}$ ($\approx 160$ here) is critical for noise margin: a higher ratio separates the enabled and disabled states more sharply. Reconfiguration speed is bounded by ionic drift, with $t_{prog} \sim D^2/(\mu_v V_{prog})$. The compact device size (sub-20 nm lateral dimensions, $\sim 0.01\,\mu\text{m}^2$ per device) allows area reductions compared with pure CMOS gates.
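The programming-time scaling can be evaluated directly by plugging the listed parameters into $t_{prog} \sim D^2/(\mu_v V_{prog})$. This is an order-of-magnitude estimate only. At $V_{prog} = 1$ V it gives about $10^{-2}$ s, far longer than the $\sim 10$ ns pulse width quoted in Section 5, so the linear-drift scaling alone does not account for that figure. The function name `t_prog_estimate` is hypothetical.

```python
D = 10e-9                  # film thickness (m), Section 3
MU_V = 1e-14               # dopant mobility (m^2 V^-1 s^-1), Section 3
R_ON, R_OFF = 100.0, 16e3  # ohms, Section 3


def t_prog_estimate(v_prog: float, d: float = D, mu_v: float = MU_V) -> float:
    """Order-of-magnitude ionic-drift programming time, D^2 / (mu_v * V_prog)."""
    return d ** 2 / (mu_v * v_prog)


print(f"{t_prog_estimate(1.0):.1e} s")  # 1.0e-02 s
print(R_OFF / R_ON)                      # 160.0
```

Because the estimate is inversely proportional to $V_{prog}$, doubling the programming voltage halves it under this model.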

4. Quantitative Performance and System Benchmarks

Quantitative assessment in a $180$ nm process gives the following metrics for an $8\times8$ multiplier; values in parentheses are normalized to the baseline CMOS ripple-carry array (RCA):

Design	Delay (ns)	Power (mW)	Area (kμm²)
CMOS RCA (baseline)	11.8 (1.00)	10.9 (1.00)	61.7 (1.00)
Mem-CMOS RCA	11.9 (1.01)	11.2 (1.03)	65.1 (1.06)
Twin-precision	17.3 (1.47)	28.8 (2.64)	51.2 (0.83)
Scalable	12.5 (1.06)	20.3 (1.86)	61.2 (0.99)
Proposed reconfigurable	10.9 (0.92)	11.2 (1.03)	24.5 (0.40)
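The parenthesized ratios can be reproduced from the absolute values. The sketch also derives the power-delay product (mW × ns = pJ), a common energy figure of merit that the table does not report and that is added here only as a derived quantity: about 122 pJ for the proposed design against about 129 pJ for the baseline. The names `TABLE`, `normalized`, and `pdp_pj` are hypothetical.

```python
# Absolute values from the table above: (delay ns, power mW, area k um^2).
TABLE = {
    "CMOS RCA (baseline)": (11.8, 10.9, 61.7),
    "Mem-CMOS RCA": (11.9, 11.2, 65.1),
    "Twin-precision": (17.3, 28.8, 51.2),
    "Scalable": (12.5, 20.3, 61.2),
    "Proposed reconfigurable": (10.9, 11.2, 24.5),
}
BASELINE = "CMOS RCA (baseline)"


def normalized(name: str) -> tuple:
    """Each metric divided by the baseline's, rounded to two decimals."""
    return tuple(round(x / b, 2) for x, b in zip(TABLE[name], TABLE[BASELINE]))


def pdp_pj(name: str) -> float:
    """Power-delay product in pJ (mW * ns)."""
    delay_ns, power_mw, _ = TABLE[name]
    return delay_ns * power_mw


for name in TABLE:
    print(f"{name:26s} {normalized(name)} PDP={pdp_pj(name):.1f} pJ")
```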

For the DSP kernels below, evaluated at $100$ MHz, the reconfigurable multiplier yields roughly $35$–$50$% power savings and $30$–$40$% latency reduction in narrower-width modes. Even at full $8$-bit width, it is $8$% faster (delay ratio 0.92) and $60$% smaller in silicon area (area ratio 0.40) than the baseline CMOS ripple-carry array.

5. Reconfiguration Control and Programming Mechanisms

Width selection is realized by two $\gamma$-bit registers (where $\gamma$ is the maximal supported width, e.g., $8$) that store CTRLH $[\gamma-1:0]$ and CTRLV $[\gamma-1:0]$. An on-chip microcontroller or finite-state machine (FSM) typically updates them on a mode-change event. Programming pulses (typically $V_{prog} \approx \pm 1$ V, $t_{prog} \sim 10$ ns) set each memristor in the threshold logic to either the $R_{on}$ (enabled) or $R_{off}$ (disabled) state. Because the resistance state is non-volatile, the configuration persists until reprogrammed, without static power and without the area of additional latches or multiplexers.
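One way a controller could apply a mode change is to pulse only the bits whose state differs, since unchanged memristors keep their resistance. The sketch is hypothetical: it treats each control bit as a single programmable element (in the array, a bit fans out to a whole row or column of cells) and assumes $+V_{prog}$ drives a device toward $R_{on}$ and $-V_{prog}$ toward $R_{off}$. That polarity matches the sign of the drift equation in Section 2 but is not confirmed by the source.

```python
from typing import List, Tuple


def pulses_for_mode_change(old: List[int], new: List[int],
                           v_prog: float = 1.0) -> List[Tuple[int, float]]:
    """Return (bit index, pulse voltage) for each control bit that changes.

    Hypothetical convention: +v_prog sets R_on (enabled), -v_prog sets
    R_off (disabled). Unchanged bits need no pulse because the memristor
    state is non-volatile.
    """
    if len(old) != len(new):
        raise ValueError("control vectors must have the same width")
    return [(k, v_prog if bit else -v_prog)
            for k, (prev, bit) in enumerate(zip(old, new)) if prev != bit]


GAMMA = 8  # maximal supported width (the e.g. value from Section 5)
ctrlh_8bit = [1] * GAMMA
ctrlh_4bit = [1] * 4 + [0] * (GAMMA - 4)
print(pulses_for_mode_change(ctrlh_8bit, ctrlh_4bit))
# [(4, -1.0), (5, -1.0), (6, -1.0), (7, -1.0)]
```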

6. Application Scenarios and System-Level Implications

Four-tap FIR filter: with four $8\times8$ multipliers and three $16$-bit adders, reducing the multiplier width to $4\times4$ in low-precision phases cuts power by $35$% and latency by $30$%.
Four-point FFT: with six complex multiplications mapped onto real multipliers, exploiting sub-8-bit parallelism reduces FFT latency by $34$% and power by $49$%.
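The FIR scenario can be mimicked behaviourally. The sketch is hypothetical: it models unsigned samples and coefficients, represents a reduced-width mode by keeping only the low `width` bits of each operand (the top-left $M\times M$ selection of Section 1), and assumes the low-precision phases supply data that already fit in 4 bits. Under that assumption the $4\times4$ mode returns the same outputs while enabling only 16 of the 64 cells; timing and power are not modelled.

```python
from typing import List


def fir_4tap(x: List[int], h: List[int], width: int = 8) -> List[int]:
    """y[n] = sum_{k=0}^{3} h[k] * x[n-k], each product on a width-limited multiplier.

    Hypothetical model: a width-w mode multiplies only the low w bits of
    each unsigned operand. Accumulation uses Python ints (no 16-bit wrap).
    """
    if len(h) != 4:
        raise ValueError("expected 4 taps")
    mask = (1 << width) - 1
    y = []
    for n in range(len(x)):
        acc = 0
        for k in range(4):
            if n - k >= 0:
                acc += (h[k] & mask) * (x[n - k] & mask)
        y.append(acc)
    return y


x = [1, 2, 3, 4, 5]
h = [1, 1, 1, 1]
print(fir_4tap(x, h, width=8))  # [1, 3, 6, 10, 14]
print(fir_4tap(x, h, width=4))  # same result: operands fit in 4 bits
```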

Dynamic, width-adaptive multipliers match datapath precision to algorithmic requirements, reducing switched capacitance and power. The architecture supports mixed-precision pipelines in machine learning, video, and radar, allowing reduced bit-depth for intermediate computations without architectural overhead. Because the memristors are non-volatile, the design needs no area-intensive latches or multiplexing logic to hold its configuration, a structural advantage over conventional reconfigurable-width multipliers (Baek, 2019).

References

Baek (2019). Reconfigurable multiplier architecture based on memristor-CMOS with higher flexibility.
