Changkyu Choi

DocVXQA: Context-Aware Visual Explanations for Document Question Answering

May 12, 2025

Mohamed Ali Souibgui, Changkyu Choi, Andrey Barsky, Kangsoo Jung, Ernest Valveny, Dimosthenis Karatzas

Abstract: We propose DocVXQA, a novel framework for visually self-explainable document question answering. The framework is designed not only to produce accurate answers to questions but also to learn visual heatmaps that highlight contextually critical regions, thereby offering interpretable justifications for the model's decisions. To integrate explanations into the learning process, we quantitatively formulate explainability principles as explicit learning objectives. Unlike conventional methods that emphasize only the regions pertinent to the answer, our framework delivers explanations that are contextually sufficient while remaining representation-efficient. This fosters user trust while achieving a balance between predictive performance and interpretability in DocVQA applications. Extensive experiments, including human evaluation, provide strong evidence supporting the effectiveness of our method. The code is available at https://github.com/dali92002/DocVXQA.

Addressing Label Shift in Distributed Learning via Entropy Regularization

Feb 04, 2025

Zhiyuan Wu, Changkyu Choi, Xiangcheng Cao, Volkan Cevher, Ali Ramezani-Kebrya

Abstract: We address the challenge of minimizing true risk in multi-node distributed learning. These systems are frequently exposed to both inter-node and intra-node label shifts, which present a critical obstacle to effectively optimizing model performance while ensuring that data remains confined to each node. To tackle this, we propose the Versatile Robust Label Shift (VRLS) method, which enhances the maximum likelihood estimation of the test-to-train label density ratio. VRLS incorporates Shannon entropy-based regularization and adjusts the density ratio during training to better handle label shifts at test time. In multi-node learning environments, VRLS further learns and adapts density ratios across nodes, effectively mitigating label shifts and improving overall model performance. Experiments on MNIST, Fashion MNIST, and CIFAR-10 demonstrate the effectiveness of VRLS, which outperforms baselines by up to 20% in imbalanced settings. These results highlight the significant improvements VRLS offers in addressing label shifts, and our theoretical analysis further supports them by establishing high-probability bounds on estimation errors.

* Accepted at the International Conference on Learning Representations (ICLR 2025)
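
VRLS builds on maximum likelihood estimation of the test-to-train label density ratio. For reference, the sketch below implements the classic EM procedure for that estimate (Saerens et al., 2002): it reweights a classifier's posteriors on unlabeled test data until the implied test label prior stops changing, then divides by the training prior. It is a minimal sketch only; it omits VRLS's Shannon-entropy regularizer and its multi-node extension, and the function name `em_label_shift` is hypothetical.

```python
from typing import List


def em_label_shift(
    probs: List[List[float]],
    train_prior: List[float],
    iters: int = 100,
    tol: float = 1e-8,
) -> List[float]:
    """Estimate density ratios w[k] = q(y=k) / p(y=k) with classic EM (no regularizer).

    probs: classifier posteriors p(y|x) for each unlabeled test sample.
    train_prior: label marginal p(y) observed on the training data.
    """
    k = len(train_prior)
    q = list(train_prior)  # start from the training prior
    for _ in range(iters):
        new_q = [0.0] * k
        for row in probs:
            # E-step: adjust each posterior by the current prior ratio, then renormalize.
            weighted = [row[j] * q[j] / train_prior[j] for j in range(k)]
            z = sum(weighted)
            for j in range(k):
                new_q[j] += weighted[j] / z
        # M-step: the updated test prior is the mean adjusted posterior.
        new_q = [v / len(probs) for v in new_q]
        converged = max(abs(a - b) for a, b in zip(new_q, q)) < tol
        q = new_q
        if converged:
            break
    return [q[j] / train_prior[j] for j in range(k)]
```

The returned ratios can then reweight a training loss, for example by multiplying each sample's loss by the ratio for its label.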

Data-free mixed-precision quantization using novel sensitivity metric

Mar 18, 2021

Donghyun Lee, Minkyoung Cho, Seungwon Lee, Joonho Song, Changkyu Choi

Abstract: Post-training quantization is a representative technique for compressing neural networks, making them smaller and more efficient for deployment on edge devices. However, because the user dataset is often inaccessible, it is difficult in practice to ensure the quality of the quantized neural network. In addition, existing approaches may use a single uniform bit-width across the network, resulting in significant accuracy degradation at extremely low bit-widths. To use multiple bit-widths, a sensitivity metric plays a key role in balancing accuracy and compression. In this paper, we propose a novel sensitivity metric that considers both the effect of quantization error on task loss and the interaction with other layers. Moreover, we develop labeled data generation methods that do not depend on any specific operation of the neural network. Our experiments show that the proposed metric better represents quantization sensitivity, and that the generated data are better suited to mixed-precision quantization.

* Submission to ICIP2021
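
A sensitivity metric is what lets a mixed-precision scheme decide which layers can tolerate fewer bits. The sketch below shows one generic way per-layer sensitivity scores could drive bit allocation: a greedy loop that starts every layer at 8 bits and repeatedly lowers the layer whose sensitivity grows least per bit of memory saved, until a total weight-memory budget is met. The scores, the bit-width choices, the budget, and the function name `allocate_bits` are all hypothetical; this is not the paper's metric or its allocation procedure.

```python
from typing import Dict, List, Sequence


def allocate_bits(
    sensitivity: List[Dict[int, float]],
    num_params: List[int],
    budget_bits: int,
    choices: Sequence[int] = (8, 6, 4, 2),
) -> List[int]:
    """Greedily lower per-layer bit-widths until the weight memory fits the budget.

    sensitivity[l][b] is a hypothetical score for how much the task loss degrades
    when layer l is quantized to b bits; choices runs from highest to lowest.
    """
    bits = [choices[0]] * len(num_params)

    def total_bits() -> int:
        return sum(b * n for b, n in zip(bits, num_params))

    while total_bits() > budget_bits:
        best_layer, best_cost = None, float("inf")
        for layer, b in enumerate(bits):
            idx = list(choices).index(b)
            if idx + 1 == len(choices):
                continue  # this layer is already at the lowest bit-width
            nb = choices[idx + 1]
            # Extra sensitivity paid per bit of memory saved by this step.
            saved = (b - nb) * num_params[layer]
            cost = (sensitivity[layer][nb] - sensitivity[layer][b]) / saved
            if cost < best_cost:
                best_layer, best_cost = layer, cost
        if best_layer is None:
            raise ValueError("budget cannot be met even at the lowest bit-width")
        idx = list(choices).index(bits[best_layer])
        bits[best_layer] = choices[idx + 1]
    return bits
```

Normalizing by the bits saved lets large, insensitive layers give up precision first, while sensitive layers keep their higher bit-widths.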

A Generalized and Robust Method Towards Practical Gaze Estimation on Smart Phone

Oct 16, 2019

Tianchu Guo, Yongchao Liu, Hui Zhang, Xiabing Liu, Youngjun Kwak, Byung In Yoo, Jae-Joon Han, Changkyu Choi

Abstract: Gaze estimation on ordinary smartphones, i.e., estimating where on the phone screen the user is looking, can be used in various applications. However, the widely used appearance-based CNN methods still face two issues that hinder practical adoption. First, because datasets are limited, gaze estimation is very likely to suffer from over-fitting, leading to poor accuracy at run time. Second, current methods are usually not robust: their predictions exhibit notable jitter even when the user is fixating on a point, which greatly degrades the user experience. For the first issue, we propose a new tolerant and talented (TAT) training scheme, which is an iterative random knowledge distillation framework enhanced with cosine similarity pruning and aligned orthogonal initialization. The knowledge distillation is a tolerant teaching process that provides diverse and informative supervision. The enhanced pruning and initialization form a talented learning process that prompts the network to escape from local minima and be reborn from a better starting point. For the second issue, we define a new metric to measure the robustness of gaze estimators, and propose an adversarial-training-based Disturbance with Ordinal loss (DwO) method to improve it. The experimental results show that our TAT method achieves state-of-the-art performance on the GazeCapture dataset, and that our DwO method improves robustness while keeping comparable accuracy.

* Accepted by ICCV 2019 Workshop. Fixes an error in Figure 1 of the camera-ready file
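
The abstract frames robustness as the absence of jitter while the user fixates on one point, but it does not spell out the metric. As a hypothetical illustration only, one simple jitter measure is the mean Euclidean distance of the gaze points predicted during a single fixation from their centroid; a steadier estimator scores lower. The function name `fixation_jitter` and this definition are assumptions, not the paper's metric.

```python
import math
from typing import List, Tuple


def fixation_jitter(preds: List[Tuple[float, float]]) -> float:
    """Hypothetical jitter score: mean distance of fixation predictions from their centroid.

    preds: predicted (x, y) screen positions recorded while the user fixates
    on a single target, e.g. in centimeters.
    """
    cx = sum(p[0] for p in preds) / len(preds)
    cy = sum(p[1] for p in preds) / len(preds)
    return sum(math.hypot(x - cx, y - cy) for x, y in preds) / len(preds)
```

Because it ignores the true target location, a score like this measures stability separately from accuracy, which matches the abstract's aim of improving robustness while keeping comparable accuracy.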

Deep generative-contrastive networks for facial expression recognition

Oct 25, 2018

Youngsung Kim, ByungIn Yoo, Youngjun Kwak, Changkyu Choi, Junmo Kim

Abstract: Because the expressive depth of an emotional face differs across individuals and expressions, recognizing an expression from a single facial image captured at one moment is difficult. A relative expression of a query face compared to a reference face might alleviate this difficulty. In this paper, we propose to use a contrastive representation that embeds a distinctive expressive factor for a discriminative purpose. The contrastive representation is calculated at the embedding layer of deep networks by comparing a given (query) image with the reference image. We use a generative reference image that is estimated from the given image. Consequently, we deploy deep neural networks that combine a generative model, a contrastive model, and a discriminative model and are trained end to end. In our proposed networks, we attempt to disentangle a facial expressive factor in two steps: learning a generator network and learning a contrastive encoder network. We conducted extensive experiments on publicly available facial expression databases (CK+, MMI, Oulu-CASIA, and in-the-wild databases) that have been widely adopted in recent literature. The proposed method outperforms known state-of-the-art methods in terms of recognition accuracy.

Joint Training of Low-Precision Neural Network with Quantization Interval Parameters

Aug 20, 2018

Sangil Jung, Changyong Son, Seohyung Lee, Jinwoo Son, Youngjun Kwak, Jae-Joon Han, Changkyu Choi

Abstract: Optimizing low-precision neural networks is an important technique for deploying deep convolutional neural network models on mobile devices. To implement convolutional layers with simple bit-wise operations, both activations and weight parameters need to be quantized to a low bit-precision. In this paper, we propose a novel optimization method for low-precision neural networks that trains both the activation quantization parameters and the quantized model weights. We parameterize the quantization intervals of the weights and the activations and train these parameters together with the full-precision weights by directly minimizing the training loss rather than the quantization error. Thanks to the joint optimization of quantization parameters and model weights, we obtain a highly accurate low-precision network for a given target bitwidth. We demonstrate the effectiveness of our method on two benchmarks: CIFAR-10 and ImageNet.

* 11 pages, 5 figures, submitted to NIPS 2018
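
To make parameterized quantization intervals concrete, here is a minimal forward-pass sketch, assuming a weight interval with center `c` and half-width `d` (the two parameters that would be trained alongside the weights): magnitudes below `c - d` are pruned to zero, magnitudes above `c + d` are clipped to one, and the rest are mapped linearly onto [0, 1] and rounded to a uniform grid, keeping the sign. The function name `interval_quantize` and the linear mapping are assumptions rather than the paper's exact formulation, and training `c` and `d` through the task loss would also require gradients (for example, a straight-through estimator for the rounding), which this sketch omits.

```python
import math
from typing import List


def interval_quantize(weights: List[float], c: float, d: float, bits: int) -> List[float]:
    """Quantize weights using a hypothetical interval [c - d, c + d] on |w| (requires d > 0)."""
    levels = 2 ** (bits - 1) - 1  # one bit for the sign, the rest for the magnitude
    out = []
    for w in weights:
        a = abs(w)
        if a < c - d:
            t = 0.0  # below the interval: pruned
        elif a > c + d:
            t = 1.0  # above the interval: clipped
        else:
            t = (a - (c - d)) / (2 * d)  # linear map of [c - d, c + d] onto [0, 1]
        # Uniform discretization; note Python's round() sends exact ties to the even level.
        q = round(t * levels) / levels
        out.append(math.copysign(q, w))
    return out
```

With `bits=3`, magnitudes snap to {0, 1/3, 2/3, 1}, so moving `c` and `d` changes which weights are pruned, clipped, or resolved finely.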
