Abstract: The ability to predict motion in real time is fundamental to many maneuvering activities in animals, particularly those critical for survival, such as attack and escape responses. Given its significance, it is no surprise that motion prediction (MP) in animals begins in the retina. Autonomous systems that rely on computer vision could likewise benefit from real-time MP, which argues for integrating MP directly at the camera pixel level. To that end, we present a retina-inspired neuromorphic framework that performs real-time, energy-efficient MP directly within camera pixels. Our hardware-algorithm framework, implemented in GlobalFoundries 22 nm FDSOI technology, integrates the key retinal MP compute blocks: a biphasic filter, a spike adder, a nonlinear circuit, and a 2D array for multi-directional motion prediction. In addition, integrating the sensor die and the MP compute die through 3D Cu-Cu hybrid bonding makes the design more compact by reducing area and simplifying routing. Validated on real-world object stimuli, the model delivers efficient, low-latency MP for decision-making scenarios that rely on predictive visual computation, while consuming only 18.56 pJ/MP in our mixed-signal hardware implementation.
Abstract: Near-tissue computing requires sensor-level processing of high-resolution images, which is essential for real-time biomedical diagnostics and surgical guidance. To address this need, we introduce a novel Capacitive Transimpedance Amplifier-based In-Pixel Computing (CTIA-IPC) architecture. Our design leverages CTIA pixels, which are widely used in biomedical imaging for their excellent linearity, low noise, and robust operation under low-light conditions. We augment CTIA pixels with IPC to enable precise deep learning computations, including multi-channel, multi-bit convolution operations, with batch normalization (BN) and Rectified Linear Unit (ReLU) functionality integrated in the peripheral analog-to-digital converters (ADCs). This design improves the linearity of Multiply and Accumulate (MAC) operations while enhancing computational efficiency. By leveraging 3D integration to embed pixel circuitry and weight storage, CTIA-IPC maintains a pixel density comparable to standard CTIA designs. Moreover, our algorithm-circuit co-design approach enables efficient real-time diagnostics and AI-driven medical analysis. Evaluated on the EndoVis tissue dataset (1280x1024), CTIA-IPC achieves an approximately 12x reduction in data bandwidth while yielding segmentation IoUs of 75.91% (parts) and 28.58% (instrument), a minimal accuracy reduction (1.3% to 2.5%) compared to baseline methods. With 1.98 GOPS throughput and 3.39 GOPS/W efficiency, our CTIA-IPC architecture offers a promising computational framework tailored specifically for biomedical near-tissue computing.
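As a purely illustrative software analogue of the convolution, BN, and ReLU chain named in the CTIA-IPC abstract, the sketch below computes a MAC per output position and then applies BN folded into a per-channel scale and shift, followed by ReLU. The function names, the folding step, the stride, and all numeric values are assumptions made for illustration; they are not taken from the paper and do not model the analog pixel circuit or the ADC.

```python
import math


def fold_batch_norm(gamma, beta, mean, var, eps=1e-5):
    """Fold BN parameters into one scale and one shift (hypothetical helper).

    BN computes gamma * (x - mean) / sqrt(var + eps) + beta, which equals
    scale * x + shift with the values returned here.
    """
    scale = gamma / math.sqrt(var + eps)
    shift = beta - mean * scale
    return scale, shift


def conv_bn_relu(image, kernel, scale, shift, stride=1):
    """Single-channel valid convolution, then folded BN and ReLU.

    image and kernel are lists of rows. Each output value is
    max(0, scale * mac + shift), where mac is the multiply-accumulate
    of the kernel with the image patch at that position.
    """
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(0, len(image) - kh + 1, stride):
        row = []
        for c in range(0, len(image[0]) - kw + 1, stride):
            mac = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            row.append(max(0.0, scale * mac + shift))
        out.append(row)
    return out


# Toy 4x4 "frame" and 2x2 kernel; values are arbitrary.
image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
kernel = [[1, 1], [1, 1]]
features = conv_bn_relu(image, kernel, scale=1.0, shift=-20.0, stride=2)
```

In this toy case a stride of 2 turns 16 input pixels into 4 output values, which hints at how emitting features instead of raw pixels can lower the amount of data leaving the sensor; the abstract's 12x figure depends on design details not given here.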