Han Cui

Direct Value Optimization: Improving Chain-of-Thought Reasoning in LLMs with Refined Values

Feb 19, 2025

Hongbo Zhang, Han Cui, Guangsheng Bao, Linyi Yang, Jun Wang, Yue Zhang

Abstract: We introduce Direct Value Optimization (DVO), an innovative reinforcement learning framework for enhancing large language models in complex reasoning tasks. Unlike traditional methods relying on preference labels, DVO utilizes value signals at individual reasoning steps, optimizing models via a mean squared error loss. The key benefit of DVO lies in its fine-grained supervision, circumventing the need for labor-intensive human annotations. Target values within DVO are estimated using either Monte Carlo Tree Search or an outcome value model. Our empirical analysis on both mathematical and commonsense reasoning tasks shows that DVO consistently outperforms existing offline preference optimization techniques, even with fewer training steps. These findings underscore the importance of value signals in advancing reasoning capabilities and highlight DVO as a superior methodology under scenarios lacking explicit human preference information.

* preprint
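
The step-level objective described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names Monte Carlo Tree Search or an outcome value model as the source of target values, whereas this example swaps in plain Monte Carlo averaging of rollout outcomes, and it takes the model's per-step value estimates as given numbers. The function names and the toy data are assumptions made for illustration.

```python
def monte_carlo_step_values(rollout_outcomes):
    """Estimate one target value per reasoning step.

    rollout_outcomes[i] holds the outcomes (1.0 = correct final answer,
    0.0 = incorrect) of rollouts continued from step i. Plain averaging
    is an assumed simplification, not the paper's tree search.
    """
    return [sum(outcomes) / len(outcomes) for outcomes in rollout_outcomes]


def step_value_mse(predicted_values, target_values):
    """Mean squared error between per-step value estimates and targets."""
    if len(predicted_values) != len(target_values):
        raise ValueError("one predicted value is needed per target value")
    squared_errors = [(p - t) ** 2 for p, t in zip(predicted_values, target_values)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical three-step chain of thought with four rollouts per step.
rollout_outcomes = [
    [1.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
target_values = monte_carlo_step_values(rollout_outcomes)
predicted_values = [0.5, 0.75, 0.25]  # made-up model estimates
loss = step_value_mse(predicted_values, target_values)
print(target_values, loss)
```

Because every step carries its own target, the loss penalises a weak intermediate step directly, without a preference label comparing two complete responses.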

Reference-Free Image Quality Metric for Degradation and Reconstruction Artifacts

May 01, 2024

Han Cui, Alfredo De Goyeneche, Efrat Shimron, Boyuan Ma, Michael Lustig

Abstract: Image Quality Assessment (IQA) is essential in various Computer Vision tasks such as image deblurring and super-resolution. However, most IQA methods require reference images, which are not always available. While there are some reference-free IQA metrics, they have limitations in simulating human perception and discerning subtle image quality variations. We hypothesize that the JPEG quality factor is representative of image quality, and that a well-trained neural network can learn to accurately evaluate image quality without requiring a clean reference, as it can recognize image degradation artifacts based on prior knowledge. Thus, we developed a reference-free quality evaluation network, dubbed "Quality Factor (QF) Predictor". Our QF Predictor is a lightweight, fully convolutional network comprising seven layers. The model is trained in a self-supervised manner: it receives a JPEG-compressed image patch with a random QF as input and is trained to accurately predict the corresponding QF. We demonstrate the versatility of the model by applying it to various tasks. First, our QF Predictor can generalize to measure the severity of various image artifacts, such as Gaussian blur and Gaussian noise. Second, we show that the QF Predictor can be trained to predict the undersampling rate of images reconstructed from Magnetic Resonance Imaging (MRI) data.
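
The self-supervised setup in the abstract, where the randomly chosen QF doubles as the training label, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: `toy_compress` is a stand-in that only quantises pixel values and is not a JPEG encoder, the QF range of 1 to 100 is assumed, and all names are hypothetical.

```python
import random


def make_training_pair(patch, compress, rng, qf_range=(1, 100)):
    """Sample a random quality factor, degrade the clean patch with it,
    and return (degraded_patch, qf). The sampled QF is the regression label,
    so no human annotation is needed."""
    qf = rng.randint(*qf_range)
    return compress(patch, qf), qf


def toy_compress(patch, qf):
    """Stand-in for a JPEG encoder (assumption): lower qf means coarser
    quantisation of pixel values. Real training would call a JPEG codec."""
    step = max(1, (101 - qf) // 4)
    return [[(px // step) * step for px in row] for row in patch]


rng = random.Random(0)
# Hypothetical 8x8 grayscale patch with values in 0..255.
clean_patch = [[(x * 7 + y * 13) % 256 for x in range(8)] for y in range(8)]
pairs = [make_training_pair(clean_patch, toy_compress, rng) for _ in range(4)]
for degraded, qf in pairs:
    print(qf, degraded[0][:4])
```

Passing the codec in as `compress` keeps the pair-generation logic independent of the image library used in practice.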

Exploring Hybrid Question Answering via Program-based Prompting

Feb 16, 2024

Qi Shi, Han Cui, Haofeng Wang, Qingfu Zhu, Wanxiang Che, Ting Liu

Abstract: Question answering over heterogeneous data requires reasoning over diverse sources of data, which is challenging due to the large scale of information and organic coupling of heterogeneous data. Various approaches have been proposed to address these challenges. One approach involves training specialized retrievers to select relevant information, thereby reducing the input length. Another approach is to transform diverse modalities of data into a single modality, reducing task difficulty and enabling more straightforward processing. In this paper, we propose HProPro, a novel program-based prompting framework for the hybrid question answering task. HProPro follows the code generation and execution paradigm. In addition, HProPro integrates various functions to tackle the hybrid reasoning scenario. Specifically, HProPro contains function declaration and function implementation to perform hybrid information-seeking over data from various sources and modalities, which enables reasoning over such data without training specialized retrievers or performing modal transformations. Experimental results on two typical hybrid question answering benchmarks, HybridQA and MultiModalQA, demonstrate the effectiveness of HProPro: it surpasses all baseline systems and achieves the best performance in the few-shot settings on both datasets.

MiliPoint: A Point Cloud Dataset for mmWave Radar

Sep 23, 2023

Han Cui, Shu Zhong, Jiacheng Wu, Zichao Shen, Naim Dahnoun, Yiren Zhao

Abstract: Millimetre-wave (mmWave) radar has emerged as an attractive and cost-effective alternative for human activity sensing compared to traditional camera-based systems. mmWave radars are also non-intrusive, providing better protection for user privacy. However, as a Radio Frequency (RF) based technology, mmWave radars rely on capturing reflected signals from objects, making them more prone to noise compared to cameras. This raises an intriguing question for the deep learning community: Can we develop more effective point set-based deep learning methods for such attractive sensors? To answer this question, our work, termed MiliPoint, provides a large-scale, open dataset for the community to explore how mmWave radars can be utilised for human activity recognition. Moreover, MiliPoint stands out as it is larger in size than existing datasets, has more diverse human actions represented, and encompasses all three key tasks in human activity recognition. We have also evaluated a range of point-based deep neural networks, such as DGCNN, PointNet++ and PointTransformer, on MiliPoint, which can serve as baselines for further development.

Explanation Graph Generation via Generative Pre-training over Synthetic Graphs

Jun 01, 2023

Han Cui, Shangzhan Li, Yu Zhang, Qi Shi

Abstract: The generation of explanation graphs is a significant task that aims to produce explanation graphs in response to user input, revealing the internal reasoning process. This task is challenging due to the significant discrepancy between unstructured user queries and structured explanation graphs. Current research commonly fine-tunes a text-based pre-trained language model on a small downstream dataset that is annotated with labeled graphs. However, due to the limited scale of available datasets, this approach may prove to be insufficient in bridging the gap between natural language text and structured graphs. In this paper, to alleviate the above limitations, we propose a novel pre-trained framework EG3P (for Explanation Graph Generation via Generative Pre-training over synthetic graphs) for the explanation graph generation task. Specifically, we first propose a text-to-graph generative task to pre-train the model with the goal of bridging the text-graph gap. Additionally, we propose an automatic corpus synthesis strategy for synthesizing a large-scale, high-quality corpus, reducing the reliance on costly manual annotation methods. Experimental results on ExplaGraphs show the effectiveness of EG3P: our model surpasses all baseline systems by remarkable margins. Besides, further analysis demonstrates that EG3P is able to generate better explanation graphs on actual reasoning tasks such as CommonsenseQA and OpenbookQA.

* Accepted by ACL23-Findings

Millimetre-wave Radar for Low-Cost 3D Imaging: A Performance Study

Jan 31, 2023

Han Cui, Jiacheng Wu, Naim Dahnoun

Abstract: Millimetre-wave (mmWave) radars can generate 3D point clouds to represent objects in the scene. However, the accuracy and density of the generated point cloud can be lower than those of a laser sensor. Although researchers have used mmWave radars for various applications, there are few quantitative evaluations of the quality of the point cloud generated by the radar and there is a lack of a standard on how this quality can be assessed. This work aims to fill the gap in the literature. A radar simulator is built to evaluate the most common data processing chains of 3D point cloud construction and to examine the capability of the mmWave radar as a 3D imaging sensor under various factors. It will be shown that the radar detection can be noisy and have an imbalanced distribution. To address the problem, a novel super-resolution point cloud construction (SRPC) algorithm is proposed to improve the spatial resolution of the point cloud and is shown to be able to produce a more natural point cloud and reduce outliers.

* 14 pages, 16 figures

Deep Transfer Learning for WiFi Localization

Mar 08, 2021

Peizheng Li, Han Cui, Aftab Khan, Usman Raza, Robert Piechocki, Angela Doufexi, Tim Farnham

Abstract: This paper studies a WiFi indoor localisation technique based on using a deep learning model and its transfer strategies. We take CSI packets collected via the WiFi standard channel sounding as the training dataset and verify the CNN model on the subsets collected in three experimental environments. We achieve a localisation accuracy of 46.55 cm in an ideal $(6.5m \times 2.5m)$ office with no obstacles, 58.30 cm in an office with obstacles, and 102.8 cm in a sports hall $(40m \times 35m)$. Then, we evaluate the transfer ability of the proposed model to different environments. The experimental results show that, for a trained localisation model, feature extraction layers can be directly transferred to other models and only the fully connected layers need to be retrained to achieve the same baseline accuracy as non-transferred base models. This can save 60% of the training parameters and reduce the training time by more than half. Finally, an ablation study of the training dataset shows that, in both office and sports hall scenarios, after reusing the feature extraction layers of the base model, only 55% of the training data is required to obtain accuracy similar to that of the base models.

* 5 pages, 5 figures, has been accepted for lecture presentation at the 2021 IEEE Radar Conference (IEEE RadarConf 2021)
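
The transfer strategy described in the abstract, reusing the feature extraction layers and retraining only the fully connected layers, can be sketched with a framework-free model description. This is a sketch under assumptions: the `Layer` type, the layer names, and the parameter counts are invented for illustration and are not the paper's architecture.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    kind: str          # "feature" (convolutional) or "fc" (fully connected)
    num_params: int
    trainable: bool = True


def transfer(base_layers):
    """Copy a trained base model for a new environment: feature extraction
    layers are reused frozen, fully connected layers stay trainable."""
    return [
        Layer(l.name, l.kind, l.num_params, trainable=(l.kind == "fc"))
        for l in base_layers
    ]


def trainable_fraction(layers):
    """Share of parameters that still has to be trained."""
    total = sum(l.num_params for l in layers)
    return sum(l.num_params for l in layers if l.trainable) / total


# Made-up layer sizes, chosen only to make the arithmetic easy to follow.
base_model = [
    Layer("conv1", "feature", 3000),
    Layer("conv2", "feature", 3000),
    Layer("fc1", "fc", 3500),
    Layer("fc2", "fc", 500),
]
target_model = transfer(base_model)
print(trainable_fraction(target_model))
```

With these invented sizes, 4,000 of 10,000 parameters remain trainable, which shows how freezing the feature layers translates into a parameter saving of the kind the abstract reports.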

Wireless Localisation in WiFi using Novel Deep Architectures

Oct 16, 2020

Peizheng Li, Han Cui, Aftab Khan, Usman Raza, Robert Piechocki, Angela Doufexi, Tim Farnham

Abstract: This paper studies the indoor localisation of WiFi devices based on a commodity chipset and standard channel sounding. First, we present a novel shallow neural network (SNN) in which features are extracted from the channel state information (CSI) corresponding to WiFi subcarriers received on different antennas and used to train the model. The single-layer architecture of this localisation neural network makes it lightweight and easy to deploy on devices with stringent constraints on computational resources. We further investigate the use of deep learning models for localisation and design novel architectures for convolutional neural network (CNN) and long short-term memory (LSTM). We extensively evaluate these localisation algorithms for continuous tracking in indoor environments. Experimental results prove that even an SNN model, after a careful handcrafted feature extraction, can achieve accurate localisation. Meanwhile, using a well-organised architecture, the neural network models can be trained directly with raw data from the CSI and localisation features can be automatically extracted to achieve accurate position estimates. We also found that the performance of neural network-based methods is directly affected by the number of anchor access points (APs) regardless of their structure. With three APs, all neural network models proposed in this paper can obtain localisation accuracy of around 0.5 metres. In addition, the proposed deep NN architecture reduces the data pre-processing time by 6.5 hours compared with a shallow NN using the data collected in our testbed. In the deployment phase, the inference time is also significantly reduced to 0.1 ms per sample. We also demonstrate the generalisation capability of the proposed method by evaluating models using different target movement characteristics to the ones in which they were trained.

* Accepted for presentation at the 25th International Conference on Pattern Recognition (ICPR), IEEE, 2020

Blackbox Attacks on Reinforcement Learning Agents Using Approximated Temporal Information

Sep 06, 2019

Yiren Zhao, Ilia Shumailov, Han Cui, Xitong Gao, Robert Mullins, Ross Anderson

Abstract: Recent research on reinforcement learning has shown that trained agents are vulnerable to maliciously crafted adversarial samples. In this work, we show how adversarial samples against RL agents can be generalised from White-box and Grey-box attacks to a strong Black-box case, namely where the attacker has no knowledge of the agents and their training methods. First, we use sequence-to-sequence models to predict a single action or a sequence of future actions that a trained agent will make. Our approximation model, based on time-series information from the agent, successfully predicts agents' future actions with consistently above 80% accuracy on a wide range of games and training methods. Second, we find that although such adversarial samples are transferable, they do not outperform random Gaussian noise as a means of reducing the game scores of trained RL agents. This highlights a serious methodological deficiency in previous work on such agents; random jamming should have been taken as the baseline for evaluation. Third, we do find a novel use for adversarial samples in this context: they can be used to trigger a trained agent to misbehave after a specific delay. This appears to be a genuinely new type of attack; it potentially enables an attacker to use devices controlled by RL agents as time bombs.
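
The random-jamming baseline that the abstract argues should accompany any adversarial evaluation can be sketched as a small observation perturbation. This is a minimal illustration, not the authors' code: the function name, the flat-list observation format, and the assumption that observations are normalised to the range 0 to 1 are all hypothetical.

```python
import random


def gaussian_jam(observation, sigma, rng, low=0.0, high=1.0):
    """Add zero-mean Gaussian noise with standard deviation sigma to every
    element of an observation, then clip back into the valid range."""
    return [min(high, max(low, x + rng.gauss(0.0, sigma))) for x in observation]


rng = random.Random(42)
# Hypothetical flattened observation, assumed normalised to [0, 1].
observation = [0.0, 0.25, 0.5, 0.75, 1.0]
jammed = gaussian_jam(observation, sigma=0.1, rng=rng)
print(jammed)
```

Running an agent on `jammed` observations at a noise level matched to the adversarial perturbation budget gives the score drop that a crafted attack has to beat to be considered effective.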
