Reconfigurable Width Multiplier
- A width multiplier is a reconfigurable digital architecture that adjusts the operand bit-width of multiplication operations to optimize power, area, and delay.
- It integrates non-volatile memristors with CMOS ripple-carry arrays, enabling selective activation of arithmetic cells to lower switching power and latency.
- In narrower-width modes the design reports $35$–$50$ % power savings and $30$–$40$ % latency reductions on DSP kernels, making it well suited to DSP, machine learning, and video processing.
In the context of digital systems, a width multiplier refers to hardware mechanisms and circuit architectures that enable dynamic adjustment of the effective operand bit-width for multiplication operations. The proposed memristor–CMOS reconfigurable multiplier achieves flexible bit-width multiplication by integrating nanoscale, non-volatile memristors for bit-mask gating with a conventional CMOS ripple-carry full-adder array. Writing horizontal and vertical control vectors to the memristor elements tailors the active subarray width, and with it the effective area, power, and delay, at run time. This benefits applications that require varying computational precision (Baek, 2019).
1. Reconfigurable Array Architecture and Bit-Width Selection
The core of the width-adjustable multiplier is an N×N ripple-carry array of 1-bit reconfigurable full-adder primitives. Each cell has inputs A, B, carry-in (Ci), and sum-in (Si), and is controlled by two digital enable signals: CTRLH (horizontal) and CTRLV (vertical). The XOR and 3-input AND gates, together with the enable signals, gate the partial product into the adder only when $\mathrm{CTRLH} \land \mathrm{CTRLV} = 1$; otherwise the cell only propagates the incoming carry and sum values. Disabled cells therefore perform no multiplication work, draw negligible switching power, and merely relay the ripple-carry chain.
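As a bit-level illustration of the cell behaviour described above, the following minimal Python model treats the enables as gating only the partial product $A \land B$, so a disabled cell still adds its incoming sum and carry and thus relays the ripple chain. This reading of the gate structure and the function name `rfa_cell` are assumptions; the model ignores timing and memristor electrical behaviour.

```python
from typing import Tuple


def rfa_cell(a: int, b: int, si: int, ci: int, ctrlh: int, ctrlv: int) -> Tuple[int, int]:
    """Bit-level model of one reconfigurable full-adder cell (hypothetical).

    The AND gating forms the partial product A & B & (CTRLH & CTRLV); a full
    adder then adds it to sum-in and carry-in. Returns (sum_out, carry_out).
    """
    enable = ctrlh & ctrlv                          # cell participates only if both enables are 1
    pp = a & b & enable                             # gated partial product, forced to 0 when disabled
    sum_out = pp ^ si ^ ci                          # XOR-based sum bit
    carry_out = (pp & si) | (pp & ci) | (si & ci)   # majority function gives the carry
    return sum_out, carry_out


if __name__ == "__main__":
    # Enabled cell: partial product 1 plus sum-in 1 -> sum 0, carry 1
    print(rfa_cell(1, 1, 1, 0, 1, 1))   # (0, 1)
    # Disabled cell: partial product suppressed, only sum-in is relayed
    print(rfa_cell(1, 1, 1, 0, 1, 0))   # (1, 0)
```

In this model a disabled cell behaves as a plain adder of Si and Ci, which is one way to keep the ripple chain intact without any extra bypass multiplexer.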
Global bit vectors CTRLH and CTRLV are broadcast across rows and columns, respectively. To select an active $k \times k$ sub-multiplier in the top-left corner, CTRLH and CTRLV are each set to $k$ ones padded with $N-k$ zeros. Other patterns, including disjoint sub-multiplier blocks, can be synthesized by setting these bit vectors appropriately, within the constraint that a cell is enabled only where a selected row meets a selected column. Non-participating cells remain inactive, so only the selected subarray consumes switching power.
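A minimal sketch of how the broadcast vectors select cells, assuming index 0 is the first row or column and that a cell is enabled exactly when its row bit in CTRLH and its column bit in CTRLV are both 1. The helper names `top_left_masks` and `active_cells` are hypothetical.

```python
from typing import List, Set, Tuple


def top_left_masks(n: int, k: int) -> Tuple[List[int], List[int]]:
    """Return (ctrlh, ctrlv) selecting a k x k sub-multiplier in an n x n array."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    mask = [1] * k + [0] * (n - k)   # k ones padded with n - k zeros
    return mask, list(mask)


def active_cells(ctrlh: List[int], ctrlv: List[int]) -> Set[Tuple[int, int]]:
    """Cells (row, col) enabled by the broadcast vectors: CTRLH[row] AND CTRLV[col]."""
    return {(i, j)
            for i, h in enumerate(ctrlh)
            for j, v in enumerate(ctrlv)
            if h and v}


if __name__ == "__main__":
    h, v = top_left_masks(8, 4)
    print(h)                          # [1, 1, 1, 1, 0, 0, 0, 0]
    print(len(active_cells(h, v)))    # 16 of 64 cells enabled
    # Two row groups and two column groups: every intersection is enabled,
    # giving four 2x2 blocks rather than only the two diagonal ones.
    h2 = [1, 1, 0, 0, 0, 0, 1, 1]
    print(len(active_cells(h2, h2)))  # 16
```

Under this model the enabled set is always the cross product of the selected rows and columns, so disjoint blocks that share rows or columns with other blocks cannot be isolated by the broadcast vectors alone.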
2. Mathematical Modeling of Gating and Width Control
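Memristor-based logic provides the programmable gating. Each 1-bit cell implements threshold logic with memristive elements, exploiting their two programmable resistance states ($R_{\text{on}}$, $R_{\text{off}}$). The threshold gates (NAND, NOR) are realized as series or parallel memristor combinations feeding CMOS inverters. For the HP-type linear ion-drift device used here (Section 3), the fundamental memristor equations are:

$$v(t) = M(w)\,i(t), \qquad M(w) = R_{\text{on}}\,\frac{w}{D} + R_{\text{off}}\left(1 - \frac{w}{D}\right), \qquad \frac{dw}{dt} = \mu_v\,\frac{R_{\text{on}}}{D}\,i(t),$$

where $w \in [0, D]$ is the width of the doped region, $D$ is the film thickness, and $\mu_v$ is the dopant mobility.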
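The linear ion-drift equations $M(w) = R_{\text{on}} w/D + R_{\text{off}}(1 - w/D)$ and $dw/dt = \mu_v R_{\text{on}} i(t)/D$ can be integrated numerically to see how a programming pulse moves a device between $R_{\text{off}}$ and $R_{\text{on}}$. The sketch below uses forward Euler on the normalized state $x = w/D$, so $dx/dt = \mu_v R_{\text{on}} i(t)/D^2$. All parameter defaults are hypothetical placeholders rather than the values of the cited SPICE model, and the state is simply clamped at the boundaries instead of using a window function.

```python
from dataclasses import dataclass


@dataclass
class HPMemristor:
    """HP linear ion-drift memristor integrated with forward Euler.

    Parameter defaults are hypothetical. x = w / D is the normalized doped-region width.
    """
    r_on: float = 100.0    # ohms (hypothetical)
    r_off: float = 16e3    # ohms (hypothetical)
    d: float = 10e-9       # film thickness D in metres (hypothetical)
    mu_v: float = 1e-14    # dopant mobility in m^2/(V*s) (hypothetical)
    x: float = 0.5         # initial normalized state in [0, 1]

    def resistance(self) -> float:
        # M(w) = R_on * w/D + R_off * (1 - w/D)
        return self.r_on * self.x + self.r_off * (1.0 - self.x)

    def apply_voltage(self, v: float, duration: float, steps: int = 10_000) -> None:
        """Apply a constant-voltage pulse and update the state x."""
        dt = duration / steps
        k = self.mu_v * self.r_on / self.d ** 2   # dx/dt = k * i(t)
        for _ in range(steps):
            i = v / self.resistance()              # Ohm's law: v = M(w) * i
            # Hard clamp to the physical range 0 <= w <= D
            self.x = min(1.0, max(0.0, self.x + k * i * dt))


if __name__ == "__main__":
    m = HPMemristor()
    print(f"initial: {m.resistance():.0f} ohm")            # 8050 ohm
    m.apply_voltage(1.0, 0.1)    # positive pulse drives the device toward R_on
    print(f"after +1 V pulse: {m.resistance():.0f} ohm")
    m.apply_voltage(-1.0, 0.1)   # negative pulse drives it back toward R_off
    print(f"after -1 V pulse: {m.resistance():.0f} ohm")
```

A SPICE-level model would normally add a window function to soften the boundary behaviour; the hard clamp here only keeps the sketch short.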
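For active partial-product computation, let $\mathbf{a} = (a_0, \dots, a_{N-1})$ and $\mathbf{b} = (b_0, \dots, b_{N-1})$ be the operand bits. The gating vectors $\mathbf{h}$ (CTRLH) and $\mathbf{v}$ (CTRLV) produce the element-wise activation

$$p_{ij} = a_i \land b_j \land h_i \land v_j, \qquad 0 \le i, j < N.$$

In matrix form, $P = (\mathbf{a}\mathbf{b}^{\top}) \odot (\mathbf{h}\mathbf{v}^{\top})$, with $\odot$ denoting element-wise AND, and the array accumulates $\sum_{i,j} p_{ij}\,2^{i+j}$. This formulation allows explicit control over which partial products contribute to the sum, directly enabling variable precision and parallelism.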
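The activation rule $p_{ij} = a_i \land b_j \land h_i \land v_j$ and the weighted sum $\sum_{i,j} p_{ij} 2^{i+j}$ can be checked in software. This sketch evaluates the double sum directly, assuming bit 0 is the least significant bit of each operand (so a top-left mask of $k$ ones keeps the low $k$ bits); `to_bits` and `masked_product` are hypothetical names.

```python
from typing import List, Sequence


def to_bits(x: int, n: int) -> List[int]:
    """LSB-first bit vector of an unsigned n-bit integer."""
    return [(x >> i) & 1 for i in range(n)]


def masked_product(a: int, b: int, h: Sequence[int], v: Sequence[int]) -> int:
    """Evaluate sum_{i,j} p_ij * 2^(i+j) with p_ij = a_i & b_j & h_i & v_j."""
    n = len(h)
    if len(v) != n:
        raise ValueError("h and v must have the same length N")
    a_bits, b_bits = to_bits(a, n), to_bits(b, n)
    total = 0
    for i in range(n):
        for j in range(n):
            p_ij = a_bits[i] & b_bits[j] & h[i] & v[j]   # gated partial product
            total += p_ij << (i + j)                     # weight 2^(i+j)
    return total


if __name__ == "__main__":
    full = [1] * 8
    print(masked_product(200, 150, full, full))   # 30000, the ordinary 8x8 product
    k4 = [1] * 4 + [0] * 4
    print(masked_product(200, 150, k4, k4))       # (200 % 16) * (150 % 16) = 8 * 6 = 48
```

With that bit ordering, a $k \times k$ top-left mask returns $(a \bmod 2^k)(b \bmod 2^k)$, because the double sum factors into the product of the two truncated operands.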
3. Device-Level Implementation and Memristor Model Parameters
The deployed SPICE model uses an HP-type ionic drift memristor (Strukov et al.), embedded in a $180$ nm CMOS process. Key parameters include the on and off resistances $R_{\text{on}}$ and $R_{\text{off}}$, the film thickness $D$, and the dopant mobility $\mu_v$. The ratio $R_{\text{off}}/R_{\text{on}}$ is critical for noise margin: higher ratios give a sharper distinction between enabled and disabled states. Reconfiguration speed is bounded by the ionic drift rate $dw/dt$. The compact size of each device (sub-20 nm lateral dimensions) enables area reductions compared with pure CMOS gates.
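To make the noise-margin argument concrete, consider a hypothetical read-out in which the memristor and a load resistor $R_L$ form a voltage divider. Choosing $R_L = \sqrt{R_{\text{on}} R_{\text{off}}}$ maximizes the output swing between the two states, and the swing then equals $V_{DD}(\sqrt{r}-1)/(\sqrt{r}+1)$ with $r = R_{\text{off}}/R_{\text{on}}$, which grows monotonically with the ratio. The divider topology and the function name `divider_margin` are illustrative assumptions, not the threshold-gate circuit of the design.

```python
import math


def divider_margin(r_on: float, r_off: float, v_dd: float = 1.0) -> float:
    """Output swing between R_on and R_off states of a hypothetical divider read-out.

    The memristor sits between V_dd and the output node; a load resistor
    R_L = sqrt(R_on * R_off), the swing-maximizing value, ties the node to ground.
    """
    r_load = math.sqrt(r_on * r_off)
    v_on = v_dd * r_load / (r_load + r_on)     # node voltage in the low-resistance state
    v_off = v_dd * r_load / (r_load + r_off)   # node voltage in the high-resistance state
    return v_on - v_off


if __name__ == "__main__":
    for ratio in (2, 10, 100, 1000):
        print(ratio, round(divider_margin(100.0, 100.0 * ratio), 3))
```

A ratio of $100$ gives a swing of $9/11 \approx 0.82\,V_{DD}$, while a ratio of $1$ gives no swing at all, matching the qualitative claim that larger ratios separate the two states more cleanly.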
4. Quantitative Performance and System Benchmarks
Quantitative assessment in a $180$ nm process yields the following metrics for the full-width ($8$-bit) multiplier (values in parentheses are normalized to the baseline CMOS ripple-carry array):
| Design | Delay (ns) | Power (mW) | Area (kμm²) |
|---|---|---|---|
| CMOS RCA (baseline) | 11.8 (1.00) | 10.9 (1.00) | 61.7 (1.00) |
| Mem-CMOS RCA | 11.9 (1.01) | 11.2 (1.03) | 65.1 (1.06) |
| Twin-precision | 17.3 (1.47) | 28.8 (2.64) | 51.2 (0.83) |
| Scalable | 12.5 (1.06) | 20.3 (1.86) | 61.2 (0.99) |
| Proposed reconfigurable | 10.9 (0.92) | 11.2 (1.03) | 24.5 (0.40) |
For specific DSP kernels at $100$ MHz, the reconfigurable multiplier yields power savings of $35$–$50$ % and latency reductions of $30$–$40$ % in narrower-width modes. Even at full $8$-bit width, the design is about $8$ % faster than the baseline CMOS ripple-carry array (10.9 ns vs. 11.8 ns) and occupies roughly $40$ % of its silicon area.
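The normalized figures and the full-width comparisons can be recomputed from the absolute values in the table. This short script only re-derives ratios from the published numbers; the names `TABLE` and `normalize` are hypothetical.

```python
from typing import Dict, Tuple

Metrics = Tuple[float, float, float]  # (delay in ns, power in mW, area in k um^2)

# Absolute values copied from the table above.
TABLE: Dict[str, Metrics] = {
    "CMOS RCA (baseline)": (11.8, 10.9, 61.7),
    "Mem-CMOS RCA": (11.9, 11.2, 65.1),
    "Twin-precision": (17.3, 28.8, 51.2),
    "Scalable": (12.5, 20.3, 61.2),
    "Proposed reconfigurable": (10.9, 11.2, 24.5),
}
BASELINE = "CMOS RCA (baseline)"


def normalize(table: Dict[str, Metrics], baseline: str = BASELINE) -> Dict[str, Tuple[float, ...]]:
    """Divide each metric by the baseline's and round to two decimals, as in the table."""
    base = table[baseline]
    return {name: tuple(round(x / b, 2) for x, b in zip(vals, base))
            for name, vals in table.items()}


if __name__ == "__main__":
    for name, (d, p, a) in normalize(TABLE).items():
        print(f"{name:24s} delay {d:.2f}  power {p:.2f}  area {a:.2f}")
    d_new, _, a_new = TABLE["Proposed reconfigurable"]
    d_base, _, a_base = TABLE[BASELINE]
    print(f"delay reduction at full width: {100 * (1 - d_new / d_base):.1f} %")  # 7.6 %
    print(f"area relative to baseline:     {100 * a_new / a_base:.1f} %")        # 39.7 %
```

Every recomputed ratio matches the parenthesized values in the table, and the two printed percentages are the source of the "about 8 % faster" and "roughly 40 % of the area" figures.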
5. Reconfiguration Control and Programming Mechanisms
Width selection is realized by two $N$-bit registers (where $N$ is the maximum supported width, e.g., $8$) that store CTRLH and CTRLV. These are updated on a mode-change event, typically by an on-chip microcontroller or FSM. Programming voltage pulses then set each memristor in the threshold logic to either the $R_{\text{on}}$ (“enabled”) or $R_{\text{off}}$ (“disabled”) state. Because the resistance state is non-volatile, the configuration persists until reprogrammed, without the static power or area of additional latches or multiplexers.
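A minimal sketch of the mode-change flow, assuming one memristor per control bit and a pulse only for bits whose value changes (since the stored state is non-volatile, unchanged bits need no rewrite). The class name `WidthController`, the SET/RESET labels, and the all-disabled power-up state are hypothetical; pulse amplitudes and widths are not modelled.

```python
from typing import List, Tuple

Pulse = Tuple[str, int, str]  # (register name, bit index, "SET" or "RESET")


class WidthController:
    """Hypothetical controller for the CTRLH/CTRLV registers of an n x n array.

    Assumes one memristor per control bit: "SET" programs it to R_on (enabled)
    and "RESET" programs it to R_off (disabled).
    """

    def __init__(self, n: int) -> None:
        self.n = n
        self.ctrlh = [0] * n   # assumed power-up state: every cell disabled
        self.ctrlv = [0] * n

    def set_width(self, k: int) -> List[Pulse]:
        """Select a top-left k x k sub-multiplier and return the pulses to issue."""
        if not 0 <= k <= self.n:
            raise ValueError("k must satisfy 0 <= k <= n")
        target = [1] * k + [0] * (self.n - k)
        pulses: List[Pulse] = []
        for name, reg in (("CTRLH", self.ctrlh), ("CTRLV", self.ctrlv)):
            for idx, (old, new) in enumerate(zip(reg, target)):
                if old != new:  # non-volatile: unchanged bits need no pulse
                    pulses.append((name, idx, "SET" if new else "RESET"))
                    reg[idx] = new
        return pulses


if __name__ == "__main__":
    ctrl = WidthController(8)
    print(len(ctrl.set_width(8)))   # 16 SET pulses on the first configuration
    print(ctrl.set_width(4))        # 8 RESET pulses: bits 4..7 of each register
    print(ctrl.set_width(4))        # []: same mode, nothing to reprogram
```

Skipping unchanged bits is a design choice of this sketch rather than something the text specifies; it simply exploits the non-volatility to shorten reconfiguration.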
6. Application Scenarios and System-Level Implications
- Four-tap FIR filter: built from four multipliers and three $16$-bit adders; reducing the multiplier width during low-precision phases cuts power by $35$ % and latency by $30$ %.
- Four-point FFT: six complex multiplications are split over real multipliers; exploiting sub-8-bit parallelism reduces FFT latency by $34$ % and power by $49$ %.
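The FIR case can be illustrated functionally: when samples and coefficients fit in $k$ bits, a $k \times k$ sub-multiplier yields exactly the same products while enabling only $k^2$ of the $N^2$ cells per multiplier. The sketch below assumes LSB-first bit ordering and a top-left mask, uses hypothetical names (`masked_mul`, `fir4`), and models neither the reported power and latency savings nor the $16$-bit adder width.

```python
from typing import List, Sequence


def masked_mul(a: int, b: int, k: int, n: int = 8) -> int:
    """Product computed by the top-left k x k subarray of an n x n array.

    Hypothetical model: bit 0 is the LSB, and cell (i, j) contributes
    (a_i & b_j) * 2^(i+j) only when both i < k and j < k.
    """
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    total = 0
    for i in range(k):          # rows outside the mask are disabled
        for j in range(k):      # columns outside the mask are disabled
            total += (((a >> i) & 1) & ((b >> j) & 1)) << (i + j)
    return total


def fir4(x: Sequence[int], coeffs: Sequence[int], k: int) -> List[int]:
    """Four-tap FIR y[t] = sum_m c[m] * x[t - m] with zero initial state."""
    if len(coeffs) != 4:
        raise ValueError("expected four coefficients")
    y: List[int] = []
    for t in range(len(x)):
        acc = 0
        for m, c in enumerate(coeffs):
            if t - m >= 0:
                acc += masked_mul(c, x[t - m], k)
        y.append(acc)
    return y


if __name__ == "__main__":
    samples = [3, 15, 7, 0, 9]       # assumed already quantized to 4 bits
    taps = [1, 2, 2, 1]
    print(fir4(samples, taps, k=4))  # [3, 21, 43, 47, 38]
    print(fir4(samples, taps, k=8))  # identical, because the data fits in 4 bits
```

With 4-bit data this filter enables $4 \cdot 4^2 = 64$ cells per output sample instead of $4 \cdot 8^2 = 256$; a sample that exceeds 4 bits (e.g. 17) would be silently truncated, so the width must follow the data's actual precision.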
Dynamic, width-adaptive multipliers match datapath precision to algorithmic requirements, minimizing switched capacitance and power. The architecture supports mixed-precision pipelines for machine learning, video, and radar, allowing intermediate computations to use reduced bit-depth without additional architectural overhead. Because the memristors are non-volatile, no area-intensive latches or multiplexing logic are needed to hold the configuration, a structural advantage over conventional variable-width multiplier designs (Baek, 2019).