Abstract: We explore the problem of sampling graph signals in scenarios where the graph structure is not predefined and must be inferred from data. In this scenario, existing approaches rely on a two-step process, where a graph is learned first, followed by sampling. More generally, graph learning and graph signal sampling have been studied as two independent problems in the literature. This work provides a foundational step towards jointly optimizing the graph structure and sampling set. Our main contribution, Vertex Importance Sampling (VIS), is to show that the sampling set can be effectively determined from the vertex importance (node weights) obtained from graph learning. We further propose Vertex Importance Sampling with Repulsion (VISR), a greedy algorithm in which spatially separated "important" nodes are selected to ensure better reconstruction. Empirical results on simulated data show that sampling using VIS and VISR leads to competitive reconstruction performance and lower complexity than the conventional two-step approach of graph learning followed by graph sampling.
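To make the VISR idea concrete, here is a minimal sketch (our illustration, not the authors' exact procedure) of greedy importance-based selection with a hop-distance repulsion constraint; the adjacency matrix `A`, node-weight vector `w`, and repulsion radius `r` are assumed inputs.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def visr_select(A, w, k, r=2):
    """Greedy weight-based sampling with spatial repulsion (illustrative sketch).

    A : (n, n) adjacency matrix of the learned graph
    w : (n,)  vertex importance (node weights) from graph learning
    k : number of samples to select
    r : minimum hop distance enforced between selected vertices
    """
    hops = shortest_path(A, unweighted=True)          # pairwise hop distances
    order = np.argsort(-w)                            # vertices by decreasing importance
    selected = []
    for v in order:
        if all(hops[v, s] >= r for s in selected):    # repulsion constraint
            selected.append(v)
        if len(selected) == k:
            break
    # fall back to plain importance order if repulsion leaves too few nodes
    for v in order:
        if len(selected) == k:
            break
        if v not in selected:
            selected.append(v)
    return np.array(selected)
```

With `r=0` this reduces to plain VIS-style top-k selection by node weight; increasing `r` spreads the samples across the graph.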
Abstract: Modern compression systems use linear transformations in their encoding and decoding processes, with transforms providing compact signal representations. While multiple data-dependent transforms for image/video coding can adapt to diverse statistical characteristics, assembling large datasets to learn each transform is challenging. Also, the resulting transforms typically lack fast implementations, leading to significant computational costs. Thus, despite many papers proposing new transform families, the most recent compression standards predominantly use traditional separable sinusoidal transforms. This paper proposes integrating a new family of Symmetry-based Graph Fourier Transforms (SBGFTs) of variable sizes into a coding framework, focusing on the extension from our previously introduced 8×8 SBGFTs to the general case of N×N grids. SBGFTs are non-separable transforms that achieve sparse signal representation while maintaining low computational complexity thanks to their symmetry properties. Their design is based on our proposed algorithm, which generates symmetric graphs on the grid by adding specific symmetrical connections between nodes and does not require any data-dependent adaptation. Furthermore, for video intra-frame coding, we exploit the correlations between optimal graphs and prediction modes to reduce the cardinality of the transform sets, thus proposing a low-complexity framework. Experiments show that SBGFTs outperform the primary transforms integrated in the explicit Multiple Transform Selection (MTS) used in the latest VVC intra-coding, providing a bit rate saving of 6.23%, with only a marginal increase in average complexity. A MATLAB implementation of the proposed algorithm is available online at [1].
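The construction underlying an SBGFT can be sketched as follows (this is our illustration of the general recipe, not the paper's generation algorithm): build a graph on the N×N grid, add symmetric connections, and take the Laplacian eigenvectors as the non-separable transform basis.

```python
import numpy as np
import networkx as nx

def grid_gft_basis(N, extra_edges=()):
    """Illustrative sketch: GFT of an N-by-N grid graph with added edges.

    extra_edges : iterable of ((i, j), (k, l)) grid-node pairs; to respect the
    symmetry idea, pass each connection together with its mirrored counterpart.
    """
    G = nx.grid_2d_graph(N, N)
    G.add_edges_from(extra_edges)
    L = nx.laplacian_matrix(G, nodelist=sorted(G.nodes())).toarray().astype(float)
    eigvals, U = np.linalg.eigh(L)     # columns of U: the GFT basis vectors
    return eigvals, U

# Example: 4x4 grid with one diagonal connection and its point-symmetric mirror
vals, U = grid_gft_basis(4, extra_edges=[((0, 0), (1, 1)), ((3, 3), (2, 2))])
coeffs = U.T @ np.random.randn(16)     # analysis: project a signal onto the basis
```

The symmetry of the added connections is what the paper exploits to obtain fast implementations; the sketch above only shows the graph-to-transform mapping itself.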
Abstract: We present a novel method to correct flying pixels within data captured by Time-of-Flight (ToF) sensors. Flying pixel (FP) artifacts occur when signals from foreground and background objects reach the same sensor pixel, leading to a confident yet incorrect depth estimate that floats in space between the two objects. Commercial RGB-D cameras have a complementary setup consisting of ToF sensors to capture depth in addition to RGB cameras. We propose a novel method to correct FPs by leveraging the aligned RGB and depth images in such RGB-D cameras to estimate the true depth values of FPs. Our method defines a 3D neighborhood around each point, representing a "field of view" that mirrors the acquisition process of ToF cameras. We propose a two-step iterative correction algorithm in which the FPs are first identified. Then, we estimate the true depth value of FPs by solving a least-squares optimization problem. Experimental results show that our proposed algorithm estimates the depth value of FPs as accurately as other algorithms in the literature.
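A minimal sketch of the two-step detect-then-correct pattern is given below; the gradient threshold, Gaussian color weights, and window size are placeholders we introduce for illustration, not the paper's 3D "field of view" formulation or its least-squares objective.

```python
import numpy as np

def correct_flying_pixels(depth, rgb, grad_thresh=0.1, sigma_c=10.0, win=2):
    """Illustrative two-step FP correction sketch (not the paper's exact model).

    Step 1: flag pixels whose local depth gradient is large (candidate FPs).
    Step 2: re-estimate each FP depth by a color-weighted least-squares fit of a
            constant over its neighborhood, i.e. a weighted mean of reliable depths.
    """
    gy, gx = np.gradient(depth)
    fp_mask = np.hypot(gx, gy) > grad_thresh          # step 1: detect FPs
    out = depth.copy()
    H, W = depth.shape
    for y, x in zip(*np.nonzero(fp_mask)):
        y0, y1 = max(0, y - win), min(H, y + win + 1)
        x0, x1 = max(0, x - win), min(W, x + win + 1)
        nb_d = depth[y0:y1, x0:x1].ravel()
        nb_c = rgb[y0:y1, x0:x1].reshape(-1, 3).astype(float)
        # color-similarity weights; keep reliable (non-FP) neighbors only
        w = np.exp(-np.sum((nb_c - rgb[y, x]) ** 2, axis=1) / (2 * sigma_c**2))
        w *= ~fp_mask[y0:y1, x0:x1].ravel()
        if w.sum() > 0:
            out[y, x] = np.sum(w * nb_d) / w.sum()    # step 2: weighted LS fit
    return out
```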
Abstract: 3D point clouds (PCs) are commonly used to represent 3D scenes. They can have millions of points, making subsequent downstream tasks such as compression and streaming computationally expensive. PC sampling (selecting a subset of points) can be used to reduce complexity. Existing PC sampling algorithms focus on preserving geometry features and often do not scale to handle large PCs. In this work, we develop scalable graph-based sampling algorithms for PC color attributes, assuming the full geometry is available. Our sampling algorithms are optimized for a signal reconstruction method that minimizes the graph Laplacian quadratic form. We first develop a global sampling algorithm that can be applied to PCs with millions of points by exploiting sparsity and sampling-rate-adaptive parameter selection. Further, we propose a block-based sampling strategy in which each block is sampled independently. We show that sampling the corresponding sub-graphs with optimally chosen self-loop weights (node weights) produces a sampling set that approximates the results of global sampling while reducing complexity by an order of magnitude. Our empirical results on two large PC datasets show that our algorithms outperform existing fast PC subsampling techniques (uniform and geometry feature preserving random sampling) by 2 dB. Our algorithm is up to 50 times faster than existing graph signal sampling algorithms while providing better reconstruction accuracy. Finally, we illustrate the efficacy of PC attribute sampling within a compression scenario, showing that pre-compression sampling of PC attributes can lower the bit rate by 11% while having minimal effect on reconstruction quality.
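The reconstruction method the sampling is optimized for is well defined and compact enough to sketch: minimizing the Laplacian quadratic form with the sampled attribute values fixed reduces to one sparse linear solve (the sketch below is our own minimal version, assuming a precomputed sparse Laplacian `L`).

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import spsolve

def reconstruct_lqf(L, S, x_S, n):
    """Reconstruct a graph signal by minimizing x^T L x subject to x[S] = x_S.

    Zeroing the gradient on the unsampled set U yields the sparse system
        L[U, U] x_U = -L[U, S] x_S,
    which stays tractable for large point clouds because L is sparse.
    """
    L = csr_matrix(L)
    U = np.setdiff1d(np.arange(n), S)
    x = np.zeros(n)
    x[S] = x_S
    x[U] = spsolve(L[U][:, U], -L[U][:, S] @ x_S)
    return x
```

In the block-based strategy, the same solve is applied per block, with self-loop (node-weight) terms added to each sub-graph's Laplacian to account for the removed inter-block edges.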
Abstract: Choosing an appropriate frequency definition and norm is critical in graph signal sampling and reconstruction. Most previous works define frequencies based on the spectral properties of the graph and use the same frequency definition and $\ell_2$-norm for optimization across all sampling sets. Our previous work demonstrated that using a sampling-set-adaptive norm and frequency definition can address challenges in classical bandlimited approximation, particularly with model mismatches and irregularly distributed data. In this work, we propose a method for selecting sampling sets tailored to sampling-set-adaptive GFT-based interpolation. When the graph models the inverse covariance of the data, we show that this adaptive GFT localizes the bandlimited model mismatch error to high frequencies, and the spectral folding property allows us to track this error during reconstruction. Based on this, we propose a sampling set selection algorithm that minimizes the worst-case bandlimited model mismatch error. As an application, we consider partitioning the sensors in a sensor network that samples a continuous spatial process. Our experiments show that sampling and reconstruction using the sampling-set-adaptive GFT significantly outperform methods that use fixed GFTs and bandwidth-based criteria.
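One concrete instance consistent with this setup (a standard Gaussian Markov random field fact, stated here as background rather than the paper's full construction): when the graph Laplacian models the inverse covariance, the minimum-MSE interpolator of the unsampled values from the samples is the quadratic-form minimizer.

```latex
% Assuming x ~ N(0, L^{-1}), i.e., the generalized graph Laplacian L models
% the inverse covariance, the minimum-MSE interpolation of the unsampled
% values x_U from the samples x_S is
\hat{x}_U = -L_{UU}^{-1} L_{US}\, x_S,
% which is exactly the solution of
\hat{x} = \operatorname*{arg\,min}_{x \,:\, x_S \ \mathrm{fixed}} \; x^{\top} L x .
```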
Abstract: This paper develops fast graph Fourier transform (GFT) algorithms with O(n log n) runtime complexity for rank-one updates of the path graph. We first show that several commonly used audio and video coding transforms belong to this class of GFTs, which we denote by DCT+. Next, starting from an arbitrary generalized graph Laplacian and using rank-one perturbation theory, we provide a factorization for the GFT after perturbation. This factorization is our central result and reveals a progressive structure: we first apply the unperturbed Laplacian's GFT and then multiply the result by a Cauchy matrix. By specializing this decomposition to path graphs and exploiting the properties of Cauchy matrices, we show that Fast DCT+ algorithms exist. We also demonstrate that progressivity can speed up computations in applications involving multiple transforms related by rank-one perturbations (e.g., video coding) when combined with pruning strategies. Our results can be extended to other graphs and rank-k perturbations. Runtime analyses show that Fast DCT+ provides computational gains over the naive method for graph sizes larger than 64, with runtime approximately equal to that of 8 DCTs.
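The starting point, that classical coding transforms are GFTs of the path graph, can be checked directly: the short script below (our illustration, not the paper's code) verifies that the orthonormal DCT-II basis diagonalizes the path graph Laplacian.

```python
import numpy as np
from scipy.fft import dct

N = 8
# Combinatorial Laplacian of the unweighted path graph on N nodes
L = 2 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
L[0, 0] = L[-1, -1] = 1

_, U = np.linalg.eigh(L)                      # columns: GFT basis, low to high frequency
D = dct(np.eye(N), axis=0, norm='ortho')      # rows: orthonormal DCT-II basis

# Each Laplacian eigenvector equals a DCT-II basis vector up to sign
assert all(min(np.linalg.norm(U[:, k] - D[k]), np.linalg.norm(U[:, k] + D[k])) < 1e-8
           for k in range(N))
```

A DCT+ member then corresponds to perturbing this Laplacian by a rank-one term, and the paper's factorization relates the perturbed GFT to the DCT via a Cauchy matrix.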
Abstract: We introduce a novel uncertainty principle for generalized graph signals that extends classical time-frequency and graph uncertainty principles into a unified framework. By defining joint vertex-time and spectral-frequency spreads, we quantify signal localization across these domains, revealing a trade-off between them. This framework allows us to identify a class of signals with maximal energy concentration in both domains, forming the fundamental atoms for a new joint vertex-time dictionary. This dictionary enhances signal reconstruction under practical constraints, such as incomplete or intermittent data, commonly encountered in sensor and social networks. Numerical experiments on real-world datasets demonstrate the effectiveness of the proposed approach, showing improved reconstruction accuracy and noise robustness compared to existing methods.
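The abstract does not state the spread definitions; as background only, one standard pair from the graph uncertainty literature (in the style of Agaskar and Lu) is shown below, and the paper extends this kind of trade-off to joint vertex-time and spectral-frequency domains.

```latex
% Background only, not the paper's joint vertex-time spreads: graph spread
% around a center vertex u_0 (d = shortest-path distance), and spectral
% spread measured through the graph Laplacian L.
\Delta_g^2(x) = \frac{1}{\|x\|_2^2} \sum_{v} d(v, u_0)^2\, x(v)^2,
\qquad
\Delta_s^2(x) = \frac{x^{\top} L x}{\|x\|_2^2}.
% An uncertainty principle bounds the feasible pairs (\Delta_g^2, \Delta_s^2):
% no signal can be arbitrarily concentrated in both domains at once.
```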
Abstract: With the increasing number of images and videos consumed by computer vision algorithms, compression methods are evolving to consider both perceptual quality and performance in downstream tasks. Traditional codecs can tackle this problem by performing rate-distortion optimization (RDO) to minimize the distance at the output of a feature extractor. However, neural network non-linearities can make the rate-distortion landscape irregular, leading to reconstructions with poor visual quality even for high bit rates. Moreover, RDO decisions are made block-wise, while the feature extractor requires the whole image to exploit global information. In this paper, we address these limitations in three steps. First, we apply Taylor's expansion to the feature extractor, recasting the metric as an input-dependent squared error involving the Jacobian matrix of the neural network. Second, we make a localization assumption to compute the metric block-wise. Finally, we use randomized dimensionality reduction techniques to approximate the Jacobian. The resulting expression is monotonic with the rate and can be evaluated in the transform domain. Simulations with AVC show that our approach provides bit-rate savings while preserving accuracy in downstream tasks with lower complexity than using the feature distance directly.
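The first and third steps can be sketched together: by Taylor expansion, the feature distance behaves like $\|J(x)\,\delta\|^2$ for a small reconstruction error $\delta$, and a random projection keeps the estimate cheap. The toy below is our illustration (the stand-in extractor `f`, finite-difference JVP, and sketch size are all our assumptions, not the paper's implementation).

```python
import numpy as np

def feature_mse_proxy(f, x, delta, n_sketch=8, eps=1e-4, rng=None):
    """Approximate ||J(x) delta||^2, with J the Jacobian of feature extractor f,
    via a finite-difference Jacobian-vector product and a random projection."""
    rng = np.random.default_rng() if rng is None else rng
    # JVP by finite differences: J(x) @ delta ~ (f(x + eps*delta) - f(x)) / eps
    jvp = (f(x + eps * delta) - f(x)) / eps
    # Randomized dimensionality reduction; E[||P v||^2] = ||v||^2 with this scaling
    P = rng.standard_normal((n_sketch, jvp.size)) / np.sqrt(n_sketch)
    return np.sum((P @ jvp) ** 2)

# Toy usage with a stand-in "feature extractor"
W = np.random.default_rng(0).standard_normal((16, 32))
f = lambda x: np.tanh(x @ W)
x = np.ones(16)
delta = 0.01 * np.random.default_rng(1).standard_normal(16)   # block coding error
print(feature_mse_proxy(f, x, delta))
```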
Abstract: As language models become more general-purpose, increased attention needs to be paid to detecting out-of-distribution (OOD) instances, i.e., those not belonging to any of the distributions seen during training. Existing methods for detecting OOD data are computationally complex and storage-intensive. We propose a novel soft clustering approach for OOD detection based on non-negative kernel regression. Our approach greatly reduces computational and space complexities (up to an 11x improvement in inference time and an 87% reduction in storage requirements) and outperforms existing approaches by up to 4 AUROC points on four different benchmarks. We also introduce an entropy-constrained version of our algorithm, which leads to further reductions in storage requirements (up to 97% lower than comparable approaches) while retaining competitive performance. These results highlight the potential of soft clustering for detecting tail-end phenomena in extreme-scale data settings.
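The flavor of a non-negative-regression OOD score can be sketched as follows (our simplified illustration, not the paper's method: we reconstruct a query embedding from its nearest prototypes under a non-negativity constraint and use the residual as the score).

```python
import numpy as np
from scipy.optimize import nnls

def ood_score(x, prototypes, k=10):
    """Reconstruct query embedding x as a non-negative combination of its k
    nearest prototypes; a large residual suggests x is out-of-distribution."""
    d = np.linalg.norm(prototypes - x, axis=1)
    nbrs = prototypes[np.argsort(d)[:k]]          # k nearest prototypes
    theta, residual = nnls(nbrs.T, x)             # min ||x - nbrs^T theta||, theta >= 0
    return residual                               # higher => more OOD

# Toy usage: prototypes from in-distribution data; in- vs out-of-distribution query
rng = np.random.default_rng(0)
protos = rng.standard_normal((100, 8))
print(ood_score(protos[0] + 0.01, protos))        # small score (in-distribution)
print(ood_score(10 + rng.standard_normal(8), protos))   # large score (OOD)
```

Storing only a small set of cluster prototypes, rather than the full training set, is what drives the storage savings the abstract reports.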
Abstract: Point clouds are a general format for representing realistic 3D objects in diverse 3D applications. Since point clouds have large data sizes, developing efficient point cloud compression methods is crucial. However, excessive compression leads to various distortions, which degrade the point cloud quality perceived by end users. Thus, establishing reliable point cloud quality assessment (PCQA) methods is essential as a benchmark for developing efficient compression methods. This paper presents an accurate full-reference point cloud quality assessment (FR-PCQA) method called full-reference quality assessment using support vector regression (FRSVR) for various types of degradation such as compression distortion, Gaussian noise, and down-sampling. The proposed method achieves accurate PCQA by integrating five FR-based metrics covering various types of errors (e.g., geometric distortion, color distortion, and point count) using support vector regression (SVR). Moreover, the proposed method achieves a superior trade-off between accuracy and calculation speed because it requires only the computation of these five simple metrics followed by SVR, which performs fast prediction. Experimental results with three types of open datasets show that the proposed method is more accurate than conventional FR-PCQA methods. In addition, the proposed method is faster than state-of-the-art methods that utilize complicated features such as curvature and multi-scale features. Thus, the proposed method provides excellent performance in terms of PCQA accuracy and processing speed. Our method is available at https://github.com/STAC-USC/FRSVR-PCQA.
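The fusion step itself is a standard regression, sketched below with synthetic data (the five feature columns stand in for the paper's FR metrics, e.g. geometric error, color error, and point-count measures; values and targets here are made up for illustration).

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.random((60, 5))                       # 5 FR metric values per point cloud
y = 5 - 3 * X[:, 0] - X[:, 1] + 0.1 * rng.standard_normal(60)   # synthetic MOS

# Fit SVR mapping the metric vector to a subjective quality score
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.1))
model.fit(X[:48], y[:48])                     # train on scored point clouds
print(model.predict(X[48:]))                  # fast quality prediction for new clouds
```

Because inference reduces to five metric computations plus one SVR evaluation, the method stays fast relative to approaches built on curvature or multi-scale features.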