Embedded Neural Networks on Pixel Processor Arrays
This paper explores the implementation of Convolutional Neural Networks (CNNs) directly on Pixel Processor Arrays (PPAs), specifically using the SCAMP5 vision system. Processing neural networks at the sensor level promises substantial efficiency and performance gains for devices with restricted computational resources. The authors present a novel approach to embedding neural networks onto PPAs, a pioneering effort to combine CNN inference with real-time image acquisition directly within the sensor hardware.
Architecture and Methodology
PPAs are characterized by a two-dimensional array of general-purpose processing elements that integrate light capture, data storage, and processing at the pixel level. The paper focuses on ternary-weight CNNs to make efficient use of the constrained resources available on PPAs. The SCAMP5 system used in the work consists of a 256x256 array of pixel-processors, each with a small number of analog and digital storage registers.
The CNN implementation is tailored to the SCAMP5 architecture by organizing the array into 4x4 blocks of processing elements. This trades image resolution for increased local memory per "pixel," allowing 16-bit values to be stored and manipulated with the basic operations the hardware provides: addition, subtraction, and bit-shifting. The convolutions themselves use ternary weights (-1, 0, and +1), which simplifies computation and minimizes memory requirements, since every kernel application reduces to additions and subtractions of shifted image copies. The implementation is further adapted to the PPA hardware through efficient digital-to-analog conversions and a checkerboard storage scheme that supports operations such as max-pooling and reduction in bit precision.
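To make the appeal of ternary weights concrete, the sketch below (plain Python/NumPy, not the authors' SCAMP5 kernel code) shows how a convolution whose weights are restricted to {-1, 0, +1} reduces to additions and subtractions of shifted copies of the image, the same kind of primitive operations the pixel-processors provide. Array sizes, padding, and the example kernel are illustrative assumptions.

    import numpy as np

    def ternary_conv2d(image, weights):
        """Cross-correlate `image` with a ternary kernel (entries in {-1, 0, +1})
        using only additions and subtractions of shifted image copies,
        mirroring the add/subtract/shift primitives available on a PPA.
        Illustrative sketch: single channel, 'same' zero padding."""
        kh, kw = weights.shape
        pad_h, pad_w = kh // 2, kw // 2
        h, w = image.shape
        padded = np.pad(image, ((pad_h, pad_h), (pad_w, pad_w)))
        out = np.zeros((h, w), dtype=np.int32)
        for i in range(kh):
            for j in range(kw):
                wgt = weights[i, j]
                if wgt == 0:
                    continue                          # zero weights cost nothing
                shifted = padded[i:i + h, j:j + w]    # image shifted by (i, j)
                out = out + shifted if wgt > 0 else out - shifted
        return out

    # Example: a 3x3 ternary kernel resembling a vertical-edge filter
    img = np.random.randint(0, 256, size=(64, 64)).astype(np.int32)
    kernel = np.array([[1, 0, -1],
                       [1, 0, -1],
                       [1, 0, -1]])
    feature_map = ternary_conv2d(img, kernel)

Because zero weights are simply skipped and nonzero weights never require multiplication, the cost of a kernel application scales with the number of nonzero entries rather than with multiply throughput.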
Experimentation and Results
The paper presents experimental validation on two tasks: MNIST digit classification and a simple car-tracking application. Networks are first trained with real-valued weights on conventional computing hardware and then converted to ternary weights for deployment on the SCAMP5. Performance evaluations show that the approach maintains competitive classification accuracy while running in real time, achieving frame rates between 135 and 250 frames per second depending on the network configuration and the threshold used for ternarization.
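For concreteness, the sketch below shows one common threshold-based ternarization scheme consistent with the description above: weights whose magnitude falls below a threshold are zeroed, and the rest are mapped to +1 or -1. Deriving the threshold from the mean absolute weight, and the particular delta_scale value, are assumptions for illustration rather than the authors' exact rule.

    import numpy as np

    def ternarize(weights, delta_scale=0.7):
        """Map real-valued weights to {-1, 0, +1} with a per-layer threshold.
        The threshold is taken as a fraction of the mean absolute weight
        (delta_scale is an illustrative hyperparameter); raising it yields
        sparser kernels at some cost in approximation fidelity."""
        delta = delta_scale * np.mean(np.abs(weights))
        ternary = np.zeros_like(weights, dtype=np.int8)
        ternary[weights > delta] = 1
        ternary[weights < -delta] = -1
        return ternary

    # Example: ternarize one trained 3x3 kernel bank with 16 output channels
    w = np.random.randn(3, 3, 16).astype(np.float32)
    w_ternary = ternarize(w)

Varying the threshold in this way trades sparsity (and hence on-sensor work per frame) against fidelity to the trained real-valued network, which is consistent with the reported dependence of frame rate on the ternarization threshold.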
The MNIST models achieve approximately 95% accuracy, showing that such low-bit-depth networks remain viable for visual recognition. The car-tracking task further illustrates the system's ability to localize a target from edge-detected input, pointing toward the use of such embedded systems in robotic applications.
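The paper's tracking network itself is not reproduced here; as a stand-in, the sketch below locates the densest edge region with an integral-image window sum, simply to make the input (an edge map) and output (a 2-D position) of such a localization task concrete. The edge detector, function names, and window size are illustrative assumptions, not the authors' method.

    import numpy as np

    def edge_map(image):
        """Crude gradient-magnitude edge detector, a stand-in for whatever
        edge extraction precedes the network in the paper's pipeline."""
        img = image.astype(np.int32)
        gx = np.abs(np.diff(img, axis=1, prepend=0))
        gy = np.abs(np.diff(img, axis=0, prepend=0))
        return gx + gy

    def localize(edges, win=16):
        """Return the (row, col) centre of the win x win window with the most
        edge energy, a cheap proxy for the peak of a learned response map."""
        ii = np.pad(edges, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
        sums = ii[win:, win:] - ii[:-win, win:] - ii[win:, :-win] + ii[:-win, :-win]
        r, c = np.unravel_index(np.argmax(sums), sums.shape)
        return r + win // 2, c + win // 2

    # Example: locate the densest edge region in a synthetic frame
    frame = np.zeros((64, 64), dtype=np.uint8)
    frame[20:30, 40:50] = 255      # a bright block standing in for the target
    row, col = localize(edge_map(frame))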
Implications for Future Development
This work contributes to the ongoing exploration of efficient hardware implementations of deep learning models and demonstrates the potential of PPAs to support sophisticated neural network computation in sensor-constrained environments. The successful demonstration on SCAMP5 points to near-term prospects for embedding deeper and more complex CNNs directly in visual processing hardware.
There is room for further advances in PPA technology, particularly larger arrays, tighter integration of digital logic, and lower power consumption in more advanced silicon processes. Such improvements could broaden the application range to autonomous vehicles, mobile robotics, and wearable computing devices. From a theoretical standpoint, this research opens the door to new architectures that prioritize local computation, data reduction, and low energy consumption.
In conclusion, this paper provides an insightful step toward integrating neural network capabilities directly into sensor hardware. It offers a foundation for future innovations in embedded deep learning, emphasizing PPAs as a promising technological frontier in real-time, resource-constrained computing.