Tomo Miyazaki

Controlling Rate, Distortion, and Realism: Towards a Single Comprehensive Neural Image Compression Model

May 27, 2024

Shoma Iwai, Tomo Miyazaki, Shinichiro Omachi

Abstract: In recent years, neural network-driven image compression (NIC) has gained significant attention. Some works adopt deep generative models such as GANs and diffusion models to enhance perceptual quality (realism). A critical obstacle of these generative NIC methods is that each model is optimized for a single bit rate. Consequently, multiple models are required to compress images to different bit rates, which is impractical for real-world applications. To tackle this issue, we propose a variable-rate generative NIC model. Specifically, we explore several discriminator designs tailored for the variable-rate approach and introduce a novel adversarial loss. Moreover, by incorporating the newly proposed multi-realism technique, our method allows the users to adjust the bit rate, distortion, and realism with a single model, achieving ultra-controllability. Unlike existing variable-rate generative NIC models, our method matches or surpasses the performance of state-of-the-art single-rate generative NIC models while covering a wide range of bit rates using just one model. Code will be available at https://github.com/iwa-shi/CRDR

* WACV2024 Oral. Code is at https://github.com/iwa-shi/CRDR

IRSRMamba: Infrared Image Super-Resolution via Mamba-based Wavelet Transform Feature Modulation Model

May 16, 2024

Yongsong Huang, Tomo Miyazaki, Xiaofeng Liu, Shinichiro Omachi

Abstract: Infrared (IR) image super-resolution faces challenges from homogeneous background pixel distributions and sparse target regions, requiring models that effectively handle long-range dependencies and capture detailed local-global information. Recent advances in models based on Mamba (a selective structured state space model) have shown significant potential in visual tasks, suggesting their applicability to IR enhancement. In this work, we introduce IRSRMamba: Infrared Image Super-Resolution via Mamba-based Wavelet Transform Feature Modulation Model, a novel Mamba-based model designed specifically for IR image super-resolution. This model enhances the restoration of context-sparse target details through its advanced dependency modeling capabilities. Additionally, a new wavelet transform feature modulation block improves multi-scale receptive field representation, capturing both global and local information efficiently. Comprehensive evaluations confirm that IRSRMamba outperforms existing models on multiple benchmarks. This research advances IR super-resolution and demonstrates the potential of Mamba-based models in IR image processing. Code is available at https://github.com/yongsongH/IRSRMamba.

* This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Learn From Orientation Prior for Radiograph Super-Resolution: Orientation Operator Transformer

Dec 27, 2023

Yongsong Huang, Tomo Miyazaki, Xiaofeng Liu, Kaiyuan Jiang, Zhengmi Tang, Shinichiro Omachi

Abstract: Background and objective: High-resolution radiographic images play a pivotal role in the early diagnosis and treatment of skeletal muscle-related diseases. Introducing single-image super-resolution (SISR) models into the radiology image field is a promising way to enhance image quality. However, the conventional image pipeline, which can learn a mixed mapping between SR and denoising from the color space and inter-pixel patterns, poses a particular challenge for radiographic images with limited pattern features. To address this issue, this paper introduces a novel approach: Orientation Operator Transformer - $O^{2}$former. Methods: We incorporate an orientation operator in the encoder to enhance sensitivity to denoising mapping and to integrate orientation prior. Furthermore, we propose a multi-scale feature fusion strategy to amalgamate features captured by different receptive fields with the directional prior, thereby providing a more effective latent representation for the decoder. Based on these innovative components, we propose a transformer-based SISR model, i.e., $O^{2}$former, specifically designed for radiographic images. Results: The experimental results demonstrate that our method achieves the best or second-best performance in the objective metrics compared with the competitors at the $\times 4$ upsampling factor. Qualitatively, more objective details are observed to be recovered. Conclusions: In this study, we propose a novel framework called $O^{2}$former for radiological image super-resolution tasks, which improves the reconstruction model's performance by introducing an orientation operator and multi-scale feature fusion strategy. Our approach is a promising step toward further advancing the radiographic image enhancement field.

* Accepted by Computer Methods and Programs in Biomedicine

Target-oriented Domain Adaptation for Infrared Image Super-Resolution

Nov 15, 2023

Yongsong Huang, Tomo Miyazaki, Xiaofeng Liu, Yafei Dong, Shinichiro Omachi

Abstract: Recent efforts have explored leveraging visible light images to enrich texture details in infrared (IR) super-resolution. However, this direct adaptation approach often becomes a double-edged sword, as it improves texture at the cost of introducing noise and blurring artifacts. To address these challenges, we propose the Target-oriented Domain Adaptation SRGAN (DASRGAN), an innovative framework specifically engineered for robust IR super-resolution model adaptation. DASRGAN operates on the synergy of two key components: 1) Texture-Oriented Adaptation (TOA) to refine texture details meticulously, and 2) Noise-Oriented Adaptation (NOA), dedicated to minimizing noise transfer. Specifically, TOA uniquely integrates a specialized discriminator, incorporating a prior extraction branch, and employs a Sobel-guided adversarial loss to align texture distributions effectively. Concurrently, NOA utilizes a noise adversarial loss to distinctly separate the generative and Gaussian noise pattern distributions during adversarial training. Our extensive experiments confirm DASRGAN's superiority. Comparative analyses against leading methods across multiple benchmarks and upsampling factors reveal that DASRGAN sets new state-of-the-art performance standards. Code is available at https://github.com/yongsongH/DASRGAN.

* 11 pages, 9 figures
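
The abstract above mentions a Sobel-guided adversarial loss. As a rough illustration of the Sobel operator itself (not of the DASRGAN loss, whose formulation is not given here), the following sketch computes a gradient-magnitude edge map in plain Python. The function name `sobel_magnitude`, the list-of-lists image format, and the choice to leave border pixels at zero are assumptions made for this example.

```python
import math
from typing import List

Image = List[List[float]]

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img: Image) -> Image:
    """Return the Sobel gradient magnitude of a grayscale image.

    Border pixels are left at 0.0 (an assumption of this sketch).
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    p = img[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * p
                    gy += SOBEL_Y[dy][dx] * p
            out[y][x] = math.hypot(gx, gy)
    return out


# A 4x4 image with a vertical step edge between columns 1 and 2.
step = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
edges = sobel_magnitude(step)
flat = sobel_magnitude([[0.5] * 4 for _ in range(4)])
```

An edge map like this highlights texture boundaries; how such a map enters the discriminator or the loss is specific to the paper and is not reproduced here.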

Deep Image Compression Using Scene Text Quality Assessment

May 19, 2023

Shohei Uchigasaki, Tomo Miyazaki, Shinichiro Omachi

Abstract: Image compression is a fundamental technology for Internet communication engineering. However, a high compression rate with general methods may degrade images, resulting in unreadable texts. In this paper, we propose an image compression method for maintaining text quality. We developed a scene text image quality assessment model to assess text quality in compressed images. The assessment model iteratively searches for the best-compressed image holding high-quality text. Objective and subjective results showed that the proposed method was superior to existing methods. Furthermore, the proposed assessment model outperformed other deep-learning regression models.

* Accepted by Pattern Recognition, 2023

Infrared Image Super-Resolution: Systematic Review, and Future Trends

Dec 22, 2022

Yongsong Huang, Tomo Miyazaki, Xiaofeng Liu, Shinichiro Omachi

Abstract: Image Super-Resolution (SR) is essential for a wide range of computer vision and image processing tasks. Investigating infrared (IR) image (or thermal image) super-resolution is a continuing concern within the development of deep learning. This survey aims to provide a comprehensive perspective of IR image super-resolution, including its applications, hardware imaging system dilemmas, and taxonomy of image processing methodologies. In addition, the datasets and evaluation metrics in IR image super-resolution tasks are also discussed. Furthermore, the deficiencies in current technologies and possible promising directions for the community to explore are highlighted. To cope with the rapid development in this field, we intend to regularly update the relevant excellent work at https://github.com/yongsongH/Infrared_Image_SR_Survey

* Submitted to IEEE TIP

A Scene-Text Synthesis Engine Achieved Through Learning from Decomposed Real-World Data

Sep 06, 2022

Zhengmi Tang, Tomo Miyazaki, Shinichiro Omachi

Abstract: Scene-text image synthesis techniques aimed at naturally composing text instances on background scene images are very appealing for training deep neural networks because they can provide accurate and comprehensive annotation information. Prior studies have explored generating synthetic text images on two-dimensional and three-dimensional surfaces based on rules derived from real-world observations. Some of these studies have proposed generating scene-text images from learning; however, owing to the absence of a suitable training dataset, unsupervised frameworks have been explored to learn from existing real-world data, which may not result in robust performance. To ease this dilemma and facilitate research on learning-based scene text synthesis, we propose DecompST, a real-world dataset prepared using public benchmarks, with three types of annotations: quadrilateral-level BBoxes, stroke-level text masks, and text-erased images. Using the DecompST dataset, we propose an image synthesis engine that includes a text location proposal network (TLPNet) and a text appearance adaptation network (TAANet). TLPNet first predicts the suitable regions for text embedding. TAANet then adaptively changes the geometry and color of the text instance according to the context of the background. Our comprehensive experiments verified the effectiveness of the proposed method for generating pretraining data for scene text detectors.

Stroke-Based Scene Text Erasing Using Synthetic Data

Apr 23, 2021

Zhengmi Tang, Tomo Miyazaki, Yoshihiro Sugaya, Shinichiro Omachi

Abstract: Scene text erasing, which replaces text regions with reasonable content in natural images, has drawn attention in the computer vision community in recent years. There are two potential subtasks in scene text erasing: text detection and image inpainting. Each subtask requires considerable data to achieve better performance; however, the lack of a large-scale real-world scene-text removal dataset prevents existing methods from working at full strength. To avoid the limitation of the lack of pairwise real-world data, we enhance and make full use of the synthetic text and consequently train our model only on the dataset generated by the improved synthetic text engine. Our proposed network contains a stroke mask prediction module and background inpainting module that can extract the text stroke as a relatively small hole from the text image patch to maintain more background content for better inpainting results. This model can partially erase text instances in a scene image with a bounding box provided or work with an existing scene text detector for automatic scene text erasing. The experimental results of qualitative evaluation and quantitative evaluation on the SCUT-Syn, ICDAR2013, and SCUT-EnsText datasets demonstrate that our method significantly outperforms existing state-of-the-art methods even when trained on real-world data.

Fidelity-Controllable Extreme Image Compression with Generative Adversarial Networks

Aug 24, 2020

Shoma Iwai, Tomo Miyazaki, Yoshihiro Sugaya, Shinichiro Omachi

Abstract: We propose a GAN-based image compression method working at extremely low bitrates below 0.1 bpp. Most existing learned image compression methods suffer from blur at extremely low bitrates. Although GANs can help to reconstruct sharp images, there are two drawbacks. First, GANs make training unstable. Second, the reconstructions often contain unpleasing noise or artifacts. To address both drawbacks, our method adopts two-stage training and network interpolation. The two-stage training is effective in stabilizing the training. Moreover, the network interpolation utilizes the models in both stages and reduces undesirable noise and artifacts, while maintaining important edges. Hence, we can control the trade-off between perceptual quality and fidelity without re-training models. The experimental results show that our model can reconstruct high-quality images. Furthermore, our user study confirms that our reconstructions are preferable to those of a state-of-the-art GAN-based image compression model. The code will be available.

* 8 pages, 11 figures
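
The network interpolation described in the abstract above can be pictured as blending the parameters of the two training stages. The sketch below assumes a simple linear blend, theta = (1 - alpha) * theta_stage1 + alpha * theta_stage2, which is a common form of network interpolation; the abstract does not state the exact rule, and the names `interpolate_params`, `stage1`, `stage2`, and `alpha` are hypothetical.

```python
from typing import Dict, List

# Hypothetical parameter store: layer name -> flat list of weights.
Params = Dict[str, List[float]]


def interpolate_params(stage1: Params, stage2: Params, alpha: float) -> Params:
    """Linearly blend two parameter sets that share names and shapes.

    alpha = 0.0 returns the stage-1 weights and alpha = 1.0 the stage-2
    weights (assumed here to be the adversarially trained model).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if stage1.keys() != stage2.keys():
        raise ValueError("parameter names differ between the two models")
    blended: Params = {}
    for name in stage1:
        a, b = stage1[name], stage2[name]
        if len(a) != len(b):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        blended[name] = [(1.0 - alpha) * x + alpha * y for x, y in zip(a, b)]
    return blended


# Toy weights standing in for the two trained models.
fidelity_model = {"w": [0.0, 2.0]}
gan_model = {"w": [4.0, 6.0]}
halfway = interpolate_params(fidelity_model, gan_model, 0.5)
```

Because only `alpha` changes at inference time, a blend of this kind lets one trade fidelity against perceptual sharpness without re-training, which matches the control the abstract describes.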

Multiple Visual-Semantic Embedding for Video Retrieval from Query Sentence

Apr 16, 2020

Huy Manh Nguyen, Tomo Miyazaki, Yoshihiro Sugaya, Shinichiro Omachi

Abstract: Visual-semantic embedding aims to learn a joint embedding space where related video and sentence instances are located close to each other. Most existing methods put instances in a single embedding space. However, they struggle to embed instances due to the difficulty of matching visual dynamics in videos to textual features in sentences. A single space is not enough to accommodate various videos and sentences. In this paper, we propose a novel framework that maps instances into multiple individual embedding spaces so that we can capture multiple relationships between instances, leading to compelling video retrieval. We propose to produce a final similarity between instances by fusing similarities measured in each embedding space using a weighted sum strategy. We determine the weights according to the query sentence. Therefore, we can flexibly emphasize an embedding space. We conducted sentence-to-video retrieval experiments on a benchmark dataset. The proposed method achieved superior performance, and the results are competitive with state-of-the-art methods. These experimental results demonstrated the effectiveness of the proposed multiple embedding approach compared to existing methods.

* 8 pages, 5 figures
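
The weighted-sum fusion described in the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions: cosine similarity is used in each embedding space, and the sentence-dependent weights are obtained by a softmax over scores (`weight_logits`) that a sentence encoder would supply. Neither choice is confirmed by the abstract, and all function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax so the weights sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fused_similarity(
    video_embs: Sequence[Sequence[float]],
    sentence_embs: Sequence[Sequence[float]],
    weight_logits: Sequence[float],
) -> float:
    """Weighted sum of per-space similarities.

    video_embs[k] and sentence_embs[k] are the embeddings of one video and
    one sentence in the k-th space; weight_logits[k] is the (assumed)
    sentence-conditioned score for that space.
    """
    weights = softmax(weight_logits)
    sims = [cosine(v, s) for v, s in zip(video_embs, sentence_embs)]
    return sum(w * s for w, s in zip(weights, sims))


# Two embedding spaces: a perfect match in the first, orthogonal in the second.
score = fused_similarity(
    video_embs=[[1.0, 0.0], [0.0, 1.0]],
    sentence_embs=[[1.0, 0.0], [1.0, 0.0]],
    weight_logits=[0.0, 0.0],
)
```

With equal logits both spaces receive weight 0.5; raising the logit of one space would emphasize its similarity in the final ranking score.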
