Abstract: In this work, we optimally solve the problem of multiplierless design of second-order Infinite Impulse Response (IIR) filters with a minimum number of adders. Given a frequency specification, we design a stable direct-form filter with hardware-aware fixed-point coefficients that yields a minimal number of adders when all multiplications are replaced by bit shifts and additions. Coefficient design, quantization, and implementation, typically conducted independently, are gathered into one global optimization problem, modeled through integer linear programming and efficiently solved using generic solvers. We guarantee the frequency-domain specifications and stability, which, together with the optimal number of adders, significantly simplifies design-space exploration for filter designers. The optimal filters are implemented within the FloPoCo IP core generator and synthesized for Field Programmable Gate Arrays. With respect to state-of-the-art three-step filter design methods, our one-step design approach achieves, on average, a 42% reduction in the number of lookup tables and a 21% improvement in delay.
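For illustration only (this is not the paper's ILP formulation), the shift-and-add idea behind multiplierless filtering can be sketched on a single hypothetical coefficient: a constant such as 23 decomposes as 16 + 8 - 1, so the product costs two adders and two bit shifts instead of a hardware multiplier.

```cpp
// Illustrative sketch, assuming a hypothetical coefficient value of 23:
// the constant multiplication y = 23*x is realized with shifts and adds.
#include <cstdint>
#include <cstdio>

// 23*x = 16*x + 8*x - x : two bit shifts and two adders, no multiplier.
static int32_t times23(int32_t x) {
    return (x << 4) + (x << 3) - x;
}

int main() {
    for (int32_t x : {1, -7, 100}) {
        std::printf("23 * %d = %d\n", x, times23(x));
    }
    return 0;
}
```

The global optimization described in the abstract searches jointly over such decompositions and over the fixed-point coefficient values themselves, rather than quantizing first and optimizing the adder count afterwards.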
Abstract: Deep Neural Networks (DNNs) are a performance-hungry application. Floating-Point (FP) and custom floating-point-like arithmetic satisfy this hunger. While there is a need for speed, inference in DNNs does not seem to have much need for precision: many papers experimentally observe that DNNs can successfully run at almost ridiculously low precision. The aim of this paper is two-fold. First, we shed some theoretical light on why a DNN's FP accuracy stays high at low FP precision: we observe that the loss of relative accuracy in the convolutional steps is recovered by the activation layers, which are extremely well-conditioned, and we give an interpretation of the link between precision and accuracy in DNNs. Second, we present a software framework for semi-automatic FP error analysis of the inference phase of deep learning. Compatible with common Tensorflow/Keras models, it leverages the frugally-deep Python/C++ library to transform a neural network into C++ code in order to analyze the network's need for precision. This rigorous analysis is based on Interval and Affine Arithmetic to compute absolute and relative error bounds for a DNN. We demonstrate our tool on several examples.
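As a purely illustrative sketch of the interval-based reasoning (the actual framework operates on full Tensorflow/Keras models converted to C++ via frugally-deep), the toy C++ code below propagates intervals through a hypothetical "weighted sum then ReLU" step; the neuron, weights, and input uncertainties are assumptions made for the example.

```cpp
// Minimal sketch of interval propagation through one toy neuron,
// assuming hypothetical weights and input uncertainty intervals.
#include <algorithm>
#include <cstdio>

struct Interval { double lo, hi; };

Interval add(Interval a, Interval b) { return {a.lo + b.lo, a.hi + b.hi}; }

// Multiplication by a known constant weight c.
Interval mul(Interval a, double c) {
    return c >= 0.0 ? Interval{c * a.lo, c * a.hi}
                    : Interval{c * a.hi, c * a.lo};
}

// ReLU is monotone, so it maps interval endpoints directly; clamping at
// zero never widens the enclosure, reflecting its good conditioning.
Interval relu(Interval a) {
    return {std::max(0.0, a.lo), std::max(0.0, a.hi)};
}

int main() {
    // Hypothetical neuron y = relu(0.5*x1 - 0.25*x2), where each input
    // carries a small uncertainty captured as an interval.
    Interval x1{0.99, 1.01}, x2{1.99, 2.01};
    Interval y = relu(add(mul(x1, 0.5), mul(x2, -0.25)));
    std::printf("y in [%g, %g]\n", y.lo, y.hi);
    return 0;
}
```

The framework described in the abstract automates this kind of propagation over entire networks and refines it with affine forms, so that correlated errors introduced by the convolutional layers are not over-approximated.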