Abstract:Omnidirectional image super-resolution (ODISR) aims to upscale low-resolution (LR) omnidirectional images (ODIs) to high-resolution (HR), addressing the growing demand for detailed visual content across a $180^{\circ}\times360^{\circ}$ viewport. Existing methods are limited by simple degradation assumptions (e.g., bicubic downsampling), which fail to capture the complex, unknown real-world degradation processes. Recent diffusion-based approaches suffer from slow inference due to their hundreds of sampling steps and frequent pixel-latent space conversions. To tackle these challenges, in this paper, we propose RealOSR, a novel diffusion-based approach for real-world ODISR (Real-ODISR) with single-step diffusion denoising. To sufficiently exploit the input information, RealOSR introduces a lightweight domain alignment module, which facilitates the efficient injection of LR ODI into the single-step latent denoising. Additionally, to better utilize the rich semantic and multi-scale feature modeling ability of denoising UNet, we develop a latent unfolding module that simulates the gradient descent process directly in latent space. Experimental results demonstrate that RealOSR outperforms previous methods in both ODI recovery quality and efficiency. Compared to the recent state-of-the-art diffusion-based ODISR method, OmniSSR, RealOSR achieves significant improvements in visual quality and over \textbf{200$\times$} inference acceleration. Our code and models will be released.
Abstract:Hybrid spectral CT integrates energy integrating detectors (EID) and photon counting detectors (PCD) into a single system, combining the large field-of-view advantage of EID with the high energy and spatial resolution of PCD. This represents a new research direction in spectral CT imaging. However, the different imaging principles and inconsistent geometric paths of the two detectors make it difficult to reconstruct images using data from hybrid detectors. In addition, the quality reconstructed images considering spectrum is affected by the accuracy of spectral estimation and the scattered photons. In this work, Firstly, we propose a general hybrid spectral reconstruction method that takes into account both the spectral CT imaging principles of the two different detectors and the influence of scattered photons in the forward process modelling. Furthermore, we also apply volume fraction constraints to the results reconstructed from the two detector data. By alternately solving the spectral estimation and the spectral image reconstruction by the ADMM method, the estimated spectra and the reconstructed images reinforce each other, thus improving the accuracy of the spectral estimation and the quality of the reconstructed images. The proposed method is the first to achieve hybrid spectral CT reconstruction for both detectors, allowing simultaneous recovery of spectrum and image reconstruction from hybrid spectral data containing scattering. In addition, the method is also applicable to spectral CT imaging using a single type of detector. We validated the effectiveness of the proposed method through numerical experiments and successfully performed the first hybrid spectral CT reconstruction experiment on our self-developed hybrid spectral CT system.
Abstract:Understanding the wheel-terrain interaction is of great importance to improve the maneuverability and traversability of the rovers. A well-developed sensing device carried by the rover would greatly facilitate the complex risk-reducing operations on sandy terrains. In this paper, an instrumented wheel-on-limb (WOL) system of planetary rovers for wheel-terrain interaction characterization is presented. Assuming the function of a passive suspension of the wheel, the WOL system allows itself to follow the terrain contour, and keep the wheel remain lowered onto the ground during rover motion including climbing and descending, as well as deploy and place the wheel on the ground before a drive commanding. The system concept, functional requirements, and pre-design work, as well as the system integration are presented.
Abstract:Making language models bigger does not inherently make them better at following a user's intent. For example, large language models can generate outputs that are untruthful, toxic, or simply not helpful to the user. In other words, these models are not aligned with their users. In this paper, we show an avenue for aligning language models with user intent on a wide range of tasks by fine-tuning with human feedback. Starting with a set of labeler-written prompts and prompts submitted through the OpenAI API, we collect a dataset of labeler demonstrations of the desired model behavior, which we use to fine-tune GPT-3 using supervised learning. We then collect a dataset of rankings of model outputs, which we use to further fine-tune this supervised model using reinforcement learning from human feedback. We call the resulting models InstructGPT. In human evaluations on our prompt distribution, outputs from the 1.3B parameter InstructGPT model are preferred to outputs from the 175B GPT-3, despite having 100x fewer parameters. Moreover, InstructGPT models show improvements in truthfulness and reductions in toxic output generation while having minimal performance regressions on public NLP datasets. Even though InstructGPT still makes simple mistakes, our results show that fine-tuning with human feedback is a promising direction for aligning language models with human intent.
Abstract:We fine-tune GPT-3 to answer long-form questions using a text-based web-browsing environment, which allows the model to search and navigate the web. By setting up the task so that it can be performed by humans, we are able to train models on the task using imitation learning, and then optimize answer quality with human feedback. To make human evaluation of factual accuracy easier, models must collect references while browsing in support of their answers. We train and evaluate our models on ELI5, a dataset of questions asked by Reddit users. Our best model is obtained by fine-tuning GPT-3 using behavior cloning, and then performing rejection sampling against a reward model trained to predict human preferences. This model's answers are preferred by humans 56% of the time to those of our human demonstrators, and 69% of the time to the highest-voted answer from Reddit.
Abstract:In this paper, we focus on the unsupervised multi-view feature selection which tries to handle high dimensional data in the field of multi-view learning. Although some graph-based methods have achieved satisfactory performance, they ignore the underlying data structure across different views. Besides, their pre-defined laplacian graphs are sensitive to the noises in the original data space, and fail to get the optimal neighbor assignment. To address the above problems, we propose a novel unsupervised multi-view feature selection model based on graph learning, and the contributions are threefold: (1) during the feature selection procedure, the consensus similarity graph shared by different views is learned. Therefore, the proposed model can reveal the data relationship from the feature subset. (2) a reasonable rank constraint is added to optimize the similarity matrix to obtain more accurate information; (3) an auto-weighted framework is presented to assign view weights adaptively, and an effective alternative iterative algorithm is proposed to optimize the problem. Experiments on various datasets demonstrate the superiority of the proposed method compared with the state-of-the-art methods.
Abstract:The Teacher-Student (T-S) framework is widely utilized in the classification tasks, through which the performance of one neural network (the student) can be improved by transferring knowledge from another trained neural network (the teacher). Since the transferring knowledge is related to the network capacities and structures between the teacher and the student, how to define efficient knowledge remains an open question. To address this issue, we design a novel transferring knowledge, the Self-Attention based Inter-Class Correlation (ICC) map in the output layer, and propose our T-S framework, Inter-Class Correlation Transfer (ICCT).