
Corona: System Implications of Emerging Nanophotonic Technology (2307.06294v1)

Published 12 Jul 2023 in cs.AR, cs.ET, and cs.NI

Abstract: We expect that many-core microprocessors will push performance per chip from the 10 gigaflop to the 10 teraflop range in the coming decade. To support this increased performance, memory and inter-core bandwidths will also have to scale by orders of magnitude. Pin limitations, the energy cost of electrical signaling, and the non-scalability of chip-length global wires are significant bandwidth impediments. Recent developments in silicon nanophotonic technology have the potential to meet these off- and on- stack bandwidth requirements at acceptable power levels. Corona is a 3D many-core architecture that uses nanophotonic communication for both inter-core communication and off-stack communication to memory or I/O devices. Its peak floating-point performance is 10 teraflops. Dense wavelength division multiplexed optically connected memory modules provide 10 terabyte per second memory bandwidth. A photonic crossbar fully interconnects its 256 low-power multithreaded cores at 20 terabyte per second bandwidth. We have simulated a 1024 thread Corona system running synthetic benchmarks and scaled versions of the SPLASH-2 benchmark suite. We believe that in comparison with an electrically-connected many-core alternative that uses the same on-stack interconnect power, Corona can provide 2 to 6 times more performance on many memory-intensive workloads, while simultaneously reducing power.

Citations (704)

Summary

  • The paper introduces the Corona architecture, a 3D many-core NUMA system that uses nanophotonic communication to target 10 teraflops of peak floating-point performance, 10 TB/s of memory bandwidth, and 20 TB/s of inter-core bandwidth.
  • Based on simulation, it argues that optically connected memory and a DWDM photonic crossbar can deliver 2 to 6 times the performance of an electrically connected alternative on many memory-intensive workloads while also reducing power.
  • The simulations indicate that the nanophotonic interconnect maintains near-uniform latency and high bandwidth across all 256 cores, addressing the bandwidth bottleneck of many-core designs.

Corona: System Implications of Emerging Nanophotonic Technology

The paper "Corona: System Implications of Emerging Nanophotonic Technology" explores the architectural design and performance implications of integrating nanophotonic technology into many-core processors. Its central argument is that scaling core counts requires a corresponding increase in memory and inter-core bandwidth, which electrical interconnects struggle to provide because of pin limitations, the energy cost of electrical signaling, and the poor scaling of chip-length global wires.

Key Architectural Components

1. System Design:

Corona is presented as a 3D many-core NUMA system that uses nanophotonic communication both between cores and off-stack to memory or I/O devices. The system targets 10 teraflops of peak floating-point performance, with inter-core and memory bandwidths of 20 TB/s and 10 TB/s, respectively. The architecture comprises 256 multithreaded cores organized into 64 clusters, fully interconnected by a dense wavelength division multiplexed (DWDM) photonic crossbar.

2. Core and Memory:

Each cluster consists of four in-order multithreaded cores with private L1 caches and a shared L2 cache. An integral component is the optically connected memory (OCM) employing a pair of fiber links for high bandwidth communication between the processor die stack and the external memory modules. This OCM reduces latency and power by directly connecting memory through photonic interconnects.
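The headline figures above imply a few per-core and per-cluster ratios that are easy to check by hand. The short Python sketch below derives them; the thread count comes from the abstract's 1024-thread simulation, and the even split of bandwidth and threads across clusters and cores is a simplifying assumption of this sketch, not a claim made by the paper.

```python
# Back-of-the-envelope ratios derived from the headline figures quoted above.
# Assumption: bandwidth and threads divide evenly across clusters and cores.
PEAK_FLOPS = 10e12    # 10 teraflops peak floating-point performance
CROSSBAR_BW = 20e12   # 20 TB/s inter-core (crossbar) bandwidth, in bytes/s
MEMORY_BW = 10e12     # 10 TB/s memory bandwidth, in bytes/s
CORES = 256
CLUSTERS = 64
THREADS = 1024        # simulated thread count from the abstract


def corona_ratios():
    """Return derived per-core and per-cluster figures as a dict."""
    return {
        "cores_per_cluster": CORES // CLUSTERS,
        "threads_per_core": THREADS // CORES,
        "gflops_per_core": PEAK_FLOPS / CORES / 1e9,
        "memory_bytes_per_flop": MEMORY_BW / PEAK_FLOPS,
        "crossbar_bytes_per_flop": CROSSBAR_BW / PEAK_FLOPS,
        "crossbar_gb_per_s_per_cluster": CROSSBAR_BW / CLUSTERS / 1e9,
        "memory_gb_per_s_per_cluster": MEMORY_BW / CLUSTERS / 1e9,
    }


if __name__ == "__main__":
    for name, value in corona_ratios().items():
        print(f"{name}: {value:g}")
```

The four cores per cluster agree with the cluster description above, and the ratio of one byte of memory bandwidth per peak flop shows how aggressively the design provisions bandwidth relative to compute.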

Photonic Technology

The paper highlights several advances in silicon nanophotonics that make it a viable alternative to electrical interconnects:

  • Waveguides: Silicon and silicon oxide waveguides offer low loss and the potential for significant bandwidth density improvements.
  • Resonators and Modulators: Resonator rings are employed for modulation, injection, and detection of data. These allow efficient wavelength-selective operations crucial for DWDM systems.
  • Light Sources: Mode-locked lasers are proposed to provide multiple wavelengths in the DWDM system, ensuring sufficient bandwidth for the projected data rates.
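To see how DWDM multiplies bandwidth, it can help to model a channel as a bundle of waveguides, each carrying several independently modulated wavelengths. The sketch below is illustrative only: the class name `DwdmChannel` and the parameter values (64 wavelengths per waveguide, 4 waveguides per channel, 10 Gb/s per wavelength, 64 channels) are assumptions chosen so that the total lands near the 20 TB/s crossbar figure quoted in this summary; they are not stated in the summary itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DwdmChannel:
    """Hypothetical DWDM channel: several waveguides, many wavelengths each."""
    wavelengths_per_waveguide: int
    waveguides: int
    gbit_per_s_per_wavelength: float

    def bandwidth_bytes_per_s(self) -> float:
        # Each wavelength is an independent serial link; DWDM sums them.
        lanes = self.wavelengths_per_waveguide * self.waveguides
        bits_per_s = lanes * self.gbit_per_s_per_wavelength * 1e9
        return bits_per_s / 8


def aggregate_bandwidth(channel: DwdmChannel, channels: int) -> float:
    """Total bandwidth in bytes/s when `channels` identical channels run in parallel."""
    return channel.bandwidth_bytes_per_s() * channels


if __name__ == "__main__":
    # Assumed, illustrative parameters (not taken from the summary).
    channel = DwdmChannel(wavelengths_per_waveguide=64,
                          waveguides=4,
                          gbit_per_s_per_wavelength=10.0)
    total = aggregate_bandwidth(channel, channels=64)
    print(f"per channel: {channel.bandwidth_bytes_per_s() / 1e9:g} GB/s")
    print(f"aggregate:   {total / 1e12:g} TB/s")
```

Under these assumed values, a modest per-wavelength signaling rate still yields tens of terabytes per second in aggregate, because bandwidth scales with the product of wavelengths, waveguides, and channels.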

Performance Simulation

The paper evaluates Corona by simulating a 1024-thread system running synthetic benchmarks and scaled versions of the SPLASH-2 benchmark suite. Key findings are:

  • Memory Bandwidth: Systems using optically connected memory (OCM) deliver 2 to 6 times the performance of electrically connected memory (ECM) systems on many memory-intensive workloads, when both use the same on-stack interconnect power.
  • Interconnect Power: Although the dense waveguide and resonator arrays consume power of their own, the photonic interconnect reduces overall power relative to an electrical interconnect because it needs fewer global electrical wires and buffers.
  • Scalability: The nanophotonic crossbar provides near-uniform latency and high bandwidth across all 256 cores, addressing both the latency and bandwidth bottlenecks of traditional electrical interconnects.
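One way to build intuition for why extra memory bandwidth helps only memory-intensive workloads is a simple roofline-style bound, where attainable performance is the smaller of peak compute and bandwidth times arithmetic intensity. This is a generic model swapped in for illustration, not the paper's simulation methodology. The 10 teraflop peak and 10 TB/s OCM bandwidth come from the summary; the ECM bandwidth of 2 TB/s and the sample intensities are hypothetical values picked for the example.

```python
def attainable_flops(peak_flops: float,
                     mem_bw_bytes_per_s: float,
                     flops_per_byte: float) -> float:
    """Roofline bound: limited by compute or by memory traffic, whichever is lower."""
    return min(peak_flops, mem_bw_bytes_per_s * flops_per_byte)


def ocm_speedup(flops_per_byte: float,
                peak_flops: float = 10e12,
                ocm_bw: float = 10e12,
                ecm_bw: float = 2e12) -> float:
    """Speedup of OCM over ECM at a given arithmetic intensity.

    ecm_bw is a hypothetical placeholder, not a figure from the summary.
    """
    ocm = attainable_flops(peak_flops, ocm_bw, flops_per_byte)
    ecm = attainable_flops(peak_flops, ecm_bw, flops_per_byte)
    return ocm / ecm


if __name__ == "__main__":
    for intensity in (0.1, 1.0, 10.0):  # flops per byte, illustrative
        print(f"intensity {intensity:>4} flop/B: speedup {ocm_speedup(intensity):.2f}x")
```

With these assumed numbers, workloads with low arithmetic intensity see a speedup close to the bandwidth ratio, while compute-bound workloads see none, which is consistent with the summary's point that the gains appear on memory-intensive workloads.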

Implications and Future Directions

Practical Implications

  • Power Efficiency: The substantial reduction in interconnect power is pivotal for the scalability of many-core architectures, particularly in data-centric and high-performance contexts.
  • High Bandwidth Communication: The approach can accommodate growing bandwidth demands without requiring more pins or incurring the energy cost of electrical signaling.
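The link between bandwidth and interconnect power can be made concrete with a one-line relation: power equals bits per second times energy per bit. The sketch below applies it to the 20 TB/s crossbar figure from the summary. The energy-per-bit values are hypothetical placeholders used only to show how the result scales; the summary does not give measured energies for either optical or electrical signaling.

```python
def interconnect_power_watts(bandwidth_bytes_per_s: float,
                             energy_pj_per_bit: float) -> float:
    """Power needed to sustain a bandwidth at a given signaling energy per bit."""
    bits_per_s = bandwidth_bytes_per_s * 8
    return bits_per_s * energy_pj_per_bit * 1e-12  # pJ -> J


if __name__ == "__main__":
    crossbar_bw = 20e12  # 20 TB/s, from the summary
    # Hypothetical energies per bit, for illustration only.
    for label, pj_per_bit in (("1 pJ/bit", 1.0), ("0.1 pJ/bit", 0.1)):
        watts = interconnect_power_watts(crossbar_bw, pj_per_bit)
        print(f"{label}: {watts:.1f} W")
```

Because power grows linearly with both bandwidth and energy per bit, any technology that lowers the energy per bit by a given factor lowers the power needed to sustain a fixed bandwidth by the same factor.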

Theoretical Implications

  • System Integration: By exploring the integration of multiple optical components within a CMOS-compatible process, the paper underscores advancements necessary for the broader adoption of optical interconnects.
  • Architectural Shifts: The architecture illustrates a potential shift in chip design philosophy, where photonics can address fundamental limitations faced by electrical interconnects in scaling many-core systems.

Future Developments

  • Further research is warranted to explore the impacts of variability and yield in fabricating large-scale photonic components.
  • The development of more sophisticated control electronics and integration techniques will be necessary to optimize the performance and efficiency of these systems.
  • Expanding this architecture to incorporate emerging non-volatile memory technologies and heterogeneous computing elements could provide a path for achieving extreme-scale computing with balanced power and performance metrics.

In summary, the paper presents a comprehensive analysis of how nanophotonic technology can significantly enhance the capabilities of many-core processors, addressing critical challenges of bandwidth, latency, and power consumption. The proposed Corona architecture exemplifies the potential of nanophotonics to meet future computational demands effectively.