Embedded Neural Networks on Pixel Processor Arrays
This paper explores the implementation of Convolutional Neural Networks (CNNs) directly on Pixel Processor Arrays (PPAs), specifically using the SCAMP5 vision system. Processing neural networks at the sensor level promises substantial efficiency and performance gains for devices with restricted computational resources. The authors present a novel approach to embedding neural networks onto PPAs, a pioneering effort to combine CNN inference with real-time image acquisition directly within the sensor hardware.
Architecture and Methodology
PPAs are characterized by a two-dimensional array of general-purpose processing elements that integrate light capture, data storage, and processing at the pixel level. The paper focuses on ternary-weight CNNs to make efficient use of the constrained resources available on PPAs. The SCAMP5 system used in the work consists of a 256x256 array of pixel-processors, each with a small number of analog and digital storage registers.
The CNN implementation is tailored to the SCAMP5 architecture by organizing the array into 4x4 blocks of processing elements. This trades image resolution for increased local memory per "pixel," allowing 16-bit values to be stored and manipulated with the basic operations the hardware provides: addition, subtraction, and bit-shifting. The convolutions themselves use ternary weights (-1, 0, and +1), which simplifies computation and minimizes memory requirements, since every kernel application reduces to additions and subtractions of shifted image copies. The implementation is further adapted to the PPA hardware through efficient digital-to-analog conversions and a checkerboard storage scheme that supports operations such as max-pooling and reduction in bit precision.
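To make the appeal of ternary weights concrete, the sketch below (plain Python/NumPy, not the authors' SCAMP5 kernel code) shows how a convolution whose weights are restricted to {-1, 0, +1} reduces to additions and subtractions of shifted copies of the image, the same kind of primitive operations the pixel-processors provide. Array sizes, padding, and the example kernel are illustrative assumptions.

    import numpy as np

    def ternary_conv2d(image, weights):
        """Cross-correlate `image` with a ternary kernel (entries in {-1, 0, +1})
        using only additions and subtractions of shifted image copies,
        mirroring the add/subtract/shift primitives available on a PPA.
        Illustrative sketch: single channel, 'same' zero padding."""
        kh, kw = weights.shape
        pad_h, pad_w = kh // 2, kw // 2
        h, w = image.shape
        padded = np.pad(image, ((pad_h, pad_h), (pad_w, pad_w)))
        out = np.zeros((h, w), dtype=np.int32)
        for i in range(kh):
            for j in range(kw):
                wgt = weights[i, j]
                if wgt == 0:
                    continue                          # zero weights cost nothing
                shifted = padded[i:i + h, j:j + w]    # image shifted by (i, j)
                out = out + shifted if wgt > 0 else out - shifted
        return out

    # Example: a 3x3 ternary kernel resembling a vertical-edge filter
    img = np.random.randint(0, 256, size=(64, 64)).astype(np.int32)
    kernel = np.array([[1, 0, -1],
                       [1, 0, -1],
                       [1, 0, -1]])
    feature_map = ternary_conv2d(img, kernel)

Because zero weights are simply skipped and nonzero weights never require multiplication, the cost of a kernel application scales with the number of nonzero entries rather than with multiply throughput.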
Experimentation and Results
The paper presents experimental validation on two tasks: MNIST digit classification and a simple car-tracking application. Networks are first trained with real-valued weights on conventional computing hardware and then converted to ternary weights for deployment on the SCAMP5. Performance evaluations show that the approach maintains competitive classification accuracy while running in real time, achieving frame rates between 135 and 250 frames per second depending on the network configuration and the threshold used for ternarization.
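For concreteness, the sketch below shows one common threshold-based ternarization scheme consistent with the description above: weights whose magnitude falls below a threshold are zeroed, and the rest are mapped to +1 or -1. Deriving the threshold from the mean absolute weight, and the particular delta_scale value, are assumptions for illustration rather than the authors' exact rule.

    import numpy as np

    def ternarize(weights, delta_scale=0.7):
        """Map real-valued weights to {-1, 0, +1} with a per-layer threshold.
        The threshold is taken as a fraction of the mean absolute weight
        (delta_scale is an illustrative hyperparameter); raising it yields
        sparser kernels at some cost in approximation fidelity."""
        delta = delta_scale * np.mean(np.abs(weights))
        ternary = np.zeros_like(weights, dtype=np.int8)
        ternary[weights > delta] = 1
        ternary[weights < -delta] = -1
        return ternary

    # Example: ternarize one trained 3x3 kernel bank with 16 output channels
    w = np.random.randn(3, 3, 16).astype(np.float32)
    w_ternary = ternarize(w)

Varying the threshold in this way trades sparsity (and hence on-sensor work per frame) against fidelity to the trained real-valued network, which is consistent with the reported dependence of frame rate on the ternarization threshold.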
The MNIST models achieve approximately 95% accuracy, showing that such low-bit-depth networks remain viable for visual recognition. The car-tracking task further illustrates the system's ability to localize a target from edge-detected input, pointing toward the use of such embedded systems in robotic applications.
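The paper's tracking network itself is not reproduced here; as a stand-in, the sketch below locates the densest edge region with an integral-image window sum, simply to make the input (an edge map) and output (a 2-D position) of such a localization task concrete. The edge detector, function names, and window size are illustrative assumptions, not the authors' method.

    import numpy as np

    def edge_map(image):
        """Crude gradient-magnitude edge detector, a stand-in for whatever
        edge extraction precedes the network in the paper's pipeline."""
        img = image.astype(np.int32)
        gx = np.abs(np.diff(img, axis=1, prepend=0))
        gy = np.abs(np.diff(img, axis=0, prepend=0))
        return gx + gy

    def localize(edges, win=16):
        """Return the (row, col) centre of the win x win window with the most
        edge energy, a cheap proxy for the peak of a learned response map."""
        ii = np.pad(edges, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
        sums = ii[win:, win:] - ii[:-win, win:] - ii[win:, :-win] + ii[:-win, :-win]
        r, c = np.unravel_index(np.argmax(sums), sums.shape)
        return r + win // 2, c + win // 2

    # Example: locate the densest edge region in a synthetic frame
    frame = np.zeros((64, 64), dtype=np.uint8)
    frame[20:30, 40:50] = 255      # a bright block standing in for the target
    row, col = localize(edge_map(frame))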
Implications for Future Development
This work contributes to the ongoing exploration of efficient hardware implementations of deep learning models and demonstrates the potential of PPAs to support sophisticated neural network computation in sensor-constrained environments. The successful demonstration on SCAMP5 points to near-term prospects for embedding deeper and more complex CNNs directly in visual processing hardware.
There is room for further advances in PPA technology, particularly larger arrays, tighter integration of digital logic, and lower power consumption in more advanced silicon processes. Such improvements could broaden the application range to autonomous vehicles, mobile robotics, and wearable computing devices. From a theoretical standpoint, this research opens the door to new architectures that prioritize local computation, data reduction, and low energy consumption.
In conclusion, this paper provides an insightful step toward integrating neural network capabilities directly into sensor hardware. It offers a foundation for future innovations in embedded deep learning, emphasizing PPAs as a promising technological frontier in real-time, resource-constrained computing.