Reconfigurable Width Multiplier

Updated 14 February 2026
  • A width multiplier is a reconfigurable digital architecture that adjusts the operand bit-width of multiplication to optimize power, area, and delay.
  • It integrates non-volatile memristors with CMOS ripple-carry arrays, selectively activating arithmetic cells to lower switching power and latency.
  • The design substantially improves energy efficiency and performance, making it well suited to DSP, machine learning, and video processing.

A width multiplier in the context of digital systems refers to hardware mechanisms and circuit architectures that enable dynamic adjustment of the effective operand bit-width for multiplication operations. The proposed memristor–CMOS reconfigurable multiplier achieves flexible bit-width multiplication by integrating nanoscale, non-volatile memristors for bit-mask gating with a conventional CMOS ripple-carry full-adder array. This approach allows real-time tailoring of the active subarray width—hence area, power, and delay—by writing horizontal and vertical control vectors to the memristor elements, providing significant benefits for applications requiring varying computational precision (Baek, 2019).

1. Reconfigurable Array Architecture and Bit-Width Selection

The core of the width-adjustable multiplier is an $N \times N$ ripple-carry array of 1-bit reconfigurable full-adder primitives. Each cell has inputs A, B, carry-in (Ci), and sum-in (Si) and is controlled by two digital enable signals, CTRLH (horizontal) and CTRLV (vertical). Its logic gates (XOR and 3-input AND), combined with the enable signals, gate the partial product $A \cdot B$ into the adder only when $\text{CTRLH} \wedge \text{CTRLV} = 1$; otherwise the cell only propagates carry and sum values. Disabled cells perform no arithmetic: they draw negligible switching power and merely relay the ripple-carry chain.

Global bit vectors $\text{CTRLH}[N-1:0]$ and $\text{CTRLV}[N-1:0]$ are broadcast across rows and columns, respectively. To select an active $M \times M$ sub-multiplier at the top left, CTRLH and CTRLV are each set to $M$ ones padded with zeros; other settings of these bit vectors select other subarrays, including disjoint sub-multiplier blocks. Non-participating cells remain inactive and draw negligible switching power.
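The enable pattern described above can be modeled in a few lines. This is a minimal sketch, assuming that cell (i, j) sits in row i and column j with index 0 at the top-left corner, and that a cell is active exactly when CTRLV[i] and CTRLH[j] are both 1; the function names are hypothetical and not taken from the source design.

```python
# Minimal sketch of the CTRLH/CTRLV enable pattern for an N x N array.
# Assumptions (not from the source): cell (i, j) is in row i, column j,
# index 0 is the top-left corner, and a cell is active when CTRLV[i] AND CTRLH[j].


def control_vector(n: int, m: int) -> list[int]:
    """Return an n-bit control vector: m leading ones padded with zeros."""
    if not 0 <= m <= n:
        raise ValueError("active width m must satisfy 0 <= m <= n")
    return [1] * m + [0] * (n - m)


def enable_matrix(ctrl_h: list[int], ctrl_v: list[int]) -> list[list[int]]:
    """Row i, column j is enabled only when CTRLV[i] and CTRLH[j] are both 1."""
    return [[v & h for h in ctrl_h] for v in ctrl_v]


def active_cells(enable: list[list[int]]) -> int:
    """Count the cells that take part in the multiplication."""
    return sum(sum(row) for row in enable)


if __name__ == "__main__":
    n, m = 8, 4
    en = enable_matrix(control_vector(n, m), control_vector(n, m))
    for row in en:
        print("".join("#" if bit else "." for bit in row))
    print("active cells:", active_cells(en), "of", n * n)
```

For N = 8 and M = 4 this marks a 4×4 block, 16 of the 64 cells, as active; every other cell only relays carry and sum.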

2. Mathematical Modeling of Gating and Width Control

Memristor-based logic implements the programmable gating. Each 1-bit cell realizes threshold logic with memristive elements, exploiting their two programmable resistance states ($R_{on}$, $R_{off}$). The threshold gates (NAND, NOR) are realized as series or parallel memristor combinations feeding CMOS inverters. The fundamental memristor equations (linear ionic drift model) are:

  • $v(t) = R(w)\, i(t)$
  • $\dot{w} = \frac{\mu_v R_{on}}{D}\, i(t)$
  • $R(w) = R_{on}\,\frac{w}{D} + R_{off}\left(1 - \frac{w}{D}\right)$
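The three equations can be integrated numerically to see how the state variable moves the device between $R_{off}$ and $R_{on}$. This is a minimal forward-Euler sketch using the HP parameter values listed in Section 3; the constant drive current, the step count, and the hard clamp of $w$ to $[0, D]$ are illustrative assumptions rather than part of the source model.

```python
# Forward-Euler sketch of the linear ionic drift model:
#   v = R(w) * i,  dw/dt = mu_v * R_on / D * i,  R(w) = R_on * w/D + R_off * (1 - w/D)
# Parameters follow Section 3; the constant drive current, the step count and the
# hard clamp of w to [0, D] are illustrative assumptions.

R_ON = 100.0    # ohm
R_OFF = 16e3    # ohm
D = 10e-9       # film thickness, m
MU_V = 1e-14    # dopant mobility, m^2 / (V s)


def resistance(w: float) -> float:
    """Memristance for doped-region width w (0 <= w <= D)."""
    x = w / D
    return R_ON * x + R_OFF * (1.0 - x)


def simulate(i_drive: float, w0: float, t_end: float, steps: int) -> list[tuple[float, float, float, float]]:
    """Integrate w under a constant current; return (t, w, R, v) samples."""
    dt = t_end / steps
    w = w0
    samples = [(0.0, w, resistance(w), resistance(w) * i_drive)]
    for k in range(1, steps + 1):
        w += MU_V * R_ON / D * i_drive * dt          # state equation
        w = min(max(w, 0.0), D)                       # assumed hard boundary
        r = resistance(w)
        samples.append((k * dt, w, r, r * i_drive))   # v = R(w) * i
    return samples


if __name__ == "__main__":
    trace = simulate(i_drive=1e-4, w0=0.0, t_end=0.5, steps=1000)
    t, w, r, v = trace[-1]
    print(f"t = {t:.3f} s, w/D = {w / D:.2f}, R = {r:.0f} ohm, v = {v:.3f} V")
```

Starting from $R_{off}$, a 100 µA drive moves $w$ halfway across the film in 0.5 s, so $R$ falls to the midpoint between $R_{off}$ and $R_{on}$.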

For the active partial-product computation, let $A = [A_{M-1}, \dots, A_0]$ and $B = [B_{N-1}, \dots, B_0]$. The gating vectors $h_j = \text{CTRLH}[j]$ and $v_i = \text{CTRLV}[i]$ produce the element-wise activation

  • $P_{ij} = A_i \cdot B_j \cdot (h_j \wedge v_i)$

In matrix form, $P = (A^{T} B) \odot (v\, h^{T})$, where $\odot$ denotes element-wise AND. The product is then $\sum_{i,j} P_{ij}\, 2^{i+j}$, so this formulation gives explicit control over which partial products contribute to the sum, directly enabling variable precision and parallelism.
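The gated partial-product matrix and its weighted sum can be checked at the bit level. This sketch assumes bit 0 is the least-significant bit, that $P_{ij}$ carries weight $2^{i+j}$, and that the "top-left" active block corresponds to the low-order bits; the helper names are hypothetical.

```python
# Bit-level sketch of P_ij = A_i * B_j * (h_j AND v_i) and the final weighted sum.
# Assumptions: bit 0 is the least-significant bit, and P_ij carries weight 2**(i + j).


def bits(x: int, width: int) -> list[int]:
    """LSB-first list of the low `width` bits of a non-negative integer."""
    return [(x >> k) & 1 for k in range(width)]


def gated_partial_products(a: list[int], b: list[int], h: list[int], v: list[int]) -> list[list[int]]:
    """P[i][j] = a[i] & b[j] & (h[j] & v[i]); a and v have M entries, b and h have N."""
    return [[a[i] & b[j] & (h[j] & v[i]) for j in range(len(b))] for i in range(len(a))]


def reduce_product(p: list[list[int]]) -> int:
    """Add the surviving partial products with their binary weights 2**(i + j)."""
    return sum(p[i][j] << (i + j) for i in range(len(p)) for j in range(len(p[i])))


if __name__ == "__main__":
    a, b, n = 0xB7, 0x5D, 8
    full = [1] * n
    low4 = [1] * 4 + [0] * (n - 4)
    print(reduce_product(gated_partial_products(bits(a, n), bits(b, n), full, full)))  # 17019 == a * b
    print(reduce_product(gated_partial_products(bits(a, n), bits(b, n), low4, low4)))  # 91 == (a % 16) * (b % 16)
```

With all enables set, the sum reproduces the full 8-bit product; with only the low four enables set, it reduces to a 4×4 product of the low nibbles.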

3. Device-Level Implementation and Memristor Model Parameters

The SPICE model uses an HP-type ionic drift memristor (Strukov et al.) embedded in a 180 nm CMOS process. Key parameters include $R_{on} \approx 100\,\Omega$, $R_{off} \approx 16\,\text{k}\Omega$, film thickness $D \approx 10\,\text{nm}$, and $\mu_v \approx 10^{-14}\,\text{m}^2/(\text{V\,s})$. The ratio $R_{off}/R_{on}$ is critical for noise margin: higher ratios sharpen the distinction between enabled and disabled states. Reconfiguration speed is bounded by ionic drift, $t_{prog} \sim D^2/(\mu_v V_{prog})$. The compact device size (sub-20 nm lateral dimensions, $\sim 0.01\,\mu\text{m}^2$ per device) enables area reductions compared with pure CMOS gates.
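Plugging the listed parameters into the two figures of merit above gives a quick order-of-magnitude check. This is a worked-arithmetic sketch only; taking $V_{prog} = 1$ V from the programming-pulse amplitude in Section 5 is an assumption, and the function names are hypothetical.

```python
# Order-of-magnitude figures from the Section 3 parameters.
# Assumption: V_PROG = 1 V, borrowed from the programming pulses quoted in Section 5.

R_ON = 100.0     # ohm
R_OFF = 16e3     # ohm
D = 10e-9        # m
MU_V = 1e-14     # m^2 / (V s)
V_PROG = 1.0     # V


def on_off_ratio(r_on: float, r_off: float) -> float:
    """R_off / R_on: larger values separate enabled and disabled states more clearly."""
    return r_off / r_on


def drift_time_estimate(d: float, mu_v: float, v_prog: float) -> float:
    """Linear-drift scaling t_prog ~ D**2 / (mu_v * V_prog), in seconds."""
    return d ** 2 / (mu_v * v_prog)


if __name__ == "__main__":
    print(f"R_off/R_on = {on_off_ratio(R_ON, R_OFF):.0f}")
    print(f"t_prog ~ {drift_time_estimate(D, MU_V, V_PROG):.1e} s")
```

With these values the on/off ratio is 160, and the drift scaling evaluates to about $10^{-2}$ s at 1 V; the estimate is inversely proportional to both $\mu_v$ and $V_{prog}$, so it is only as accurate as those device constants.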

4. Quantitative Performance and System Benchmarks

Quantitative assessment in a 180 nm process gives the following metrics for an 8×8 multiplier (values in parentheses are normalized to the baseline CMOS ripple-carry array):

Design                   Delay (ns)    Power (mW)    Area (10³ μm²)
CMOS RCA (baseline)      11.8 (1.00)   10.9 (1.00)   61.7 (1.00)
Mem-CMOS RCA             11.9 (1.01)   11.2 (1.03)   65.1 (1.06)
Twin-precision           17.3 (1.47)   28.8 (2.64)   51.2 (0.83)
Scalable                 12.5 (1.06)   20.3 (1.86)   61.2 (0.99)
Proposed reconfigurable  10.9 (0.92)   11.2 (1.03)   24.5 (0.40)
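The normalized columns follow directly from the raw figures. This sketch recomputes them and also derives a power-delay product (PDP = delay × power), a standard energy metric that the table itself does not list; the PDP values are derived here, not reported by the source.

```python
# Recompute the normalized table entries from the raw 8x8 figures, plus a derived
# power-delay product (PDP = delay * power) that the table itself does not list.

TABLE = {  # design: (delay ns, power mW, area 10^3 um^2)
    "CMOS RCA (baseline)": (11.8, 10.9, 61.7),
    "Mem-CMOS RCA": (11.9, 11.2, 65.1),
    "Twin-precision": (17.3, 28.8, 51.2),
    "Scalable": (12.5, 20.3, 61.2),
    "Proposed reconfigurable": (10.9, 11.2, 24.5),
}
BASELINE = "CMOS RCA (baseline)"


def normalized(table: dict[str, tuple[float, float, float]], baseline: str) -> dict[str, tuple[float, ...]]:
    """Divide every metric by the baseline's value, rounded to two decimals."""
    base = table[baseline]
    return {name: tuple(round(x / b, 2) for x, b in zip(vals, base)) for name, vals in table.items()}


def power_delay_product(table: dict[str, tuple[float, float, float]]) -> dict[str, float]:
    """Power-delay product in pJ (ns * mW = pJ)."""
    return {name: delay * power for name, (delay, power, _area) in table.items()}


if __name__ == "__main__":
    norm = normalized(TABLE, BASELINE)
    pdp = power_delay_product(TABLE)
    for name in TABLE:
        print(f"{name:25s} {norm[name]}  PDP = {pdp[name]:.1f} pJ")
```

The recomputed ratios agree with the parenthesized values. On the derived PDP, the proposed design is about 5% below the baseline at full width (122.1 pJ versus 128.6 pJ).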

For specific DSP kernels at 100 MHz, the reconfigurable multiplier yields power savings of 35–50% and latency reductions of 30–40% in narrower-width modes. Even at the full 8-bit width, it is 8% faster and occupies 60% less silicon area than the baseline CMOS ripple-carry array (0.92× delay, 0.40× area), at a 3% increase in power.

5. Reconfiguration Control and Programming Mechanisms

Width selection is realized by two $\gamma$-bit registers (where $\gamma$ is the maximal supported width, e.g., 8) that store $\text{CTRLH}[\gamma-1:0]$ and $\text{CTRLV}[\gamma-1:0]$. These registers are updated on a mode-change event, typically by an on-chip microcontroller or FSM. Programming pulses (typically $V_{prog} \approx \pm 1$ V, $t_{prog} \sim 10$ ns) set each memristor in the threshold logic to either the $R_{on}$ ("enabled") or $R_{off}$ ("disabled") state. Because the cell resistance is non-volatile, the configuration persists until reprogrammed, without additional static power or the area of latches and multiplexers.
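A mode change can be thought of as a diff between the currently programmed control bits and the target width. This is a hypothetical controller sketch, not the source's FSM: it assumes each control bit maps to one programming target, that a $+V_{prog}$ pulse sets it to $R_{on}$ and a $-V_{prog}$ pulse to $R_{off}$, and that bits already in the right state are skipped. The pulse amplitude and width follow the typical values quoted above; `Pulse` and `plan_pulses` are illustrative names.

```python
# Hypothetical mode-change routine for the width-selection registers.
# Assumptions (not from the source): each control bit maps to one programming target,
# a +V_PROG pulse sets it to R_on ("enabled"), -V_PROG sets it to R_off ("disabled"),
# and bits that already hold the right state are not pulsed.
from dataclasses import dataclass

GAMMA = 8        # maximal supported width
V_PROG = 1.0     # pulse amplitude, V
T_PROG_NS = 10   # pulse width, ns


@dataclass(frozen=True)
class Pulse:
    register: str    # "CTRLH" or "CTRLV"
    bit: int
    volts: float
    width_ns: int


def width_to_vector(m: int, gamma: int = GAMMA) -> list[int]:
    """m ones padded with zeros, matching the top-left M x M selection."""
    if not 1 <= m <= gamma:
        raise ValueError(f"width must be in 1..{gamma}")
    return [1 if k < m else 0 for k in range(gamma)]


def plan_pulses(current: dict[str, list[int]], m: int) -> tuple[dict[str, list[int]], list[Pulse]]:
    """Return the new register contents and the pulses needed to reach them."""
    target = width_to_vector(m)
    pulses = []
    for reg in ("CTRLH", "CTRLV"):
        for bit, (old, new) in enumerate(zip(current[reg], target)):
            if old != new:
                pulses.append(Pulse(reg, bit, V_PROG if new else -V_PROG, T_PROG_NS))
    return {"CTRLH": list(target), "CTRLV": list(target)}, pulses


if __name__ == "__main__":
    state = {"CTRLH": width_to_vector(8), "CTRLV": width_to_vector(8)}
    state, pulses = plan_pulses(state, 4)   # switch from 8x8 to 4x4 mode
    for p in pulses:
        print(p)
```

Switching from 8×8 to 4×4 mode writes eight negative pulses (bits 4 to 7 of each register); requesting the same mode again produces no pulses, which mirrors the non-volatile behavior described above.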

6. Application Scenarios and System-Level Implications

  • Four-tap FIR filter: with four 8×8 multipliers and three 16-bit adders, reducing the multiplier width to 4×4 in low-precision phases cuts power by 35% and latency by 30%.
  • Four-point FFT: with six complex multiplications (split across real multipliers), exploiting sub-8-bit parallelism reduces FFT latency by 34% and power by 49%.
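The FIR scenario can be illustrated with a behavioral model of a width-gated multiplier. This sketch assumes unsigned integer operands, that the active block covers the low `width` bits of each operand, and that in the low-precision phase every sample and coefficient fits in 4 bits; the coefficients and samples are arbitrary illustrative values, and the model says nothing about the quoted power or latency figures.

```python
# Illustrative 4-tap FIR filter built on a width-gated multiplier model.
# Assumptions (not from the source): unsigned integer operands, the active block
# covers the low `width` bits of each operand, and in the low-precision phase all
# samples and coefficients fit in 4 bits.


def gated_multiply(a: int, b: int, width: int) -> int:
    """Sum only the partial products a_i * b_j with i, j < width (a width x width block)."""
    total = 0
    for i in range(width):
        for j in range(width):
            total += (((a >> i) & 1) & ((b >> j) & 1)) << (i + j)
    return total


def fir(samples: list[int], coeffs: list[int], width: int) -> list[int]:
    """y[n] = sum_k c[k] * x[n - k] for every n with a full tap window."""
    taps = len(coeffs)
    return [
        sum(gated_multiply(samples[n - k], c, width) for k, c in enumerate(coeffs))
        for n in range(taps - 1, len(samples))
    ]


if __name__ == "__main__":
    coeffs = [3, 7, 7, 3]
    samples = [0, 5, 9, 15, 2, 11, 4]
    full = fir(samples, coeffs, width=8)   # 8x8 mode: 64 cells per multiplier
    low = fir(samples, coeffs, width=4)    # 4x4 mode: 16 cells per multiplier
    print(full, low, full == low)
```

When the data fit in 4 bits, the 4×4 mode returns the same outputs as the 8×8 mode while each multiplier enables only 16 of its 64 cells.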

Dynamic, width-adaptive multipliers match datapath precision to algorithmic requirements, reducing switched capacitance and power. The architecture supports mixed-precision pipelines for machine learning, video, and radar applications, allowing reduced bit depth for intermediate computations without architectural overhead. Because the memristors are non-volatile, no area-intensive latches or multiplexing logic are needed, a structural advantage over conventional width-configurable designs (Baek, 2019).

References (1)
