Abstract:The so-called independent low-rank matrix analysis (ILRMA) has demonstrated a great potential for dealing with the problem of determined blind source separation (BSS) for audio and speech signals. This method assumes that the spectra from different frequency bands are independent and the spectral coefficients in any frequency band are Gaussian distributed. The Itakura-Saito divergence is then employed to estimate the source model related parameters. In reality, however, the spectral coefficients from different frequency bands may be dependent, which is not considered in the existing ILRMA algorithm. This paper presents an improved version of ILRMA, which considers the dependency between the spectral coefficients from different frequency bands. The Sinkhorn divergence is then exploited to optimize the source model parameters. As a result of using the cross-band information, the BSS performance is improved. But the number of parameters to be estimated also increases significantly, and so is the computational complexity. To reduce the algorithm complexity, we apply the Kronecker product to decompose the modeling matrix into the product of a number of matrices of much smaller dimensionality. An efficient algorithm is then developed to implement the Sinkhorn divergence based BSS algorithm and the complexity is reduced by an order of magnitude.
Abstract:In this work, we consider the problem of regularization in minimum mean-squared error (MMSE) linear filters. Exploiting the relationship with statistical machine learning methods, the regularization parameter is found from the observed signals in a simple and automatic manner. The proposed approach is illustrated through system identification examples, where the automatic regularization yields near-optimal results.
Abstract:The image model method has been widely used to simulate room impulse responses and the endeavor to adapt this method to different applications has also piqued great interest over the last few decades. This paper attempts to extend the image model method and develops an anchor-point-image-model (APIM) approach as a solution for simulating impulse responses by including both the source radiation and sensor directivity patterns. To determine the orientations of all the virtual sources, anchor points are introduced to real sources, which subsequently lead to the determination of the orientations of the virtual sources. An algorithm is developed to generate room impulse responses with APIM by taking into account the directional pattern functions, factional time delays, as well as the computational complexity. The developed model and algorithms can be used in various acoustic problems to simulate room acoustics and improve and evaluate processing algorithms.
Abstract:In this work we define and analyze the bilinear models which replace the conventional linear operation used in many building blocks of machine learning (ML). The main idea is to devise the ML algorithms which are adapted to the objects they treat. In the case of monochromatic images, we show that the bilinear operation exploits better the structure of the image than the conventional linear operation which ignores the spatial relationship between the pixels. This translates into significantly smaller number of parameters required to yield the same performance. We show numerical examples of classification in the MNIST data set.