Abstract:Speech is a fundamental aspect of human life, crucial not only for communication but also for cognitive, social, and academic development. Children with speech disorders (SD) face significant challenges that, if unaddressed, can result in lasting negative impacts. Traditionally, speech and language assessments (SLA) have been conducted by skilled speech-language pathologists (SLPs), but there is a growing need for efficient and scalable SLA methods powered by artificial intelligence. This position paper presents a survey of existing techniques suitable for automating SLA pipelines, with an emphasis on adapting automatic speech recognition (ASR) models for children's speech, an overview of current SLAs and their automated counterparts to demonstrate the feasibility of AI-enhanced SLA pipelines, and a discussion of practical considerations, including accessibility and privacy concerns, associated with the deployment of AI-powered SLAs.
Abstract:Large language models (LLMs) have attracted significant attention for their remarkable abilities in various natural language processing tasks, but they suffer from hallucinations that degrade performance. One promising way to improve LLM performance is to ask the model to revise its answer after generation, a technique known as self-correction. Of the two types of self-correction, intrinsic self-correction is considered a promising direction because it does not rely on external knowledge. However, recent works cast doubt on LLMs' ability to conduct intrinsic self-correction. In this paper, we present a novel perspective on the intrinsic self-correction capabilities of LLMs through theoretical analyses and empirical experiments. In addition, we identify two critical factors for successful self-correction: zero temperature and fair prompts. Leveraging these factors, we demonstrate that intrinsic self-correction ability is exhibited across multiple existing LLMs. Our findings offer insights into the fundamental theories underlying the self-correction behavior of LLMs and underscore the importance of unbiased prompts and zero temperature settings in harnessing their full potential.
Abstract:Neural Architecture Search (NAS) has become the de facto tool in industry for automating the design of deep neural networks for various applications, especially those driven by mobile and edge devices with limited computing resources. The emerging large language models (LLMs), due to their prowess, have also been incorporated into NAS recently and show some promising results. This paper conducts further exploration in this direction by considering three important design metrics simultaneously, i.e., model accuracy, fairness, and hardware deployment efficiency. We propose a novel LLM-based NAS framework, FL-NAS, and show experimentally that FL-NAS can indeed find high-performing DNNs, beating state-of-the-art DNN models by orders of magnitude across almost all design considerations.
Abstract:Offline reinforcement learning aims to find the optimal policy from a pre-collected dataset without active exploration. This problem faces major challenges, such as a limited amount of data and distribution shift. Existing studies employ the principle of pessimism in the face of uncertainty and penalize rewards for less visited state-action pairs. In this paper, we directly model the uncertainty in the transition kernel using an uncertainty set, and then employ distributionally robust optimization, which optimizes the worst-case performance over the uncertainty set. We first design a Hoeffding-style uncertainty set, which guarantees that the true transition kernel lies in the uncertainty set with high probability. We theoretically prove that it achieves an $\epsilon$-accuracy with a sample complexity of $\mathcal{O}\left((1-\gamma)^{-4}\epsilon^{-2}SC^{\pi^*} \right)$, where $\gamma$ is the discount factor, $C^{\pi^*}$ is the single-policy concentrability for any comparator policy $\pi^*$, and $S$ is the number of states. We further design a Bernstein-style uncertainty set, which does not necessarily guarantee that the true transition kernel lies in the uncertainty set. We show an improved and near-optimal sample complexity of $\mathcal{O}\left((1-\gamma)^{-3}\epsilon^{-2}\left(SC^{\pi^*}+(\mu_{\min})^{-1}\right) \right)$, where $\mu_{\min}$ denotes the minimal non-zero entry of the behavior distribution. In addition, the computational complexity of our algorithms is the same as that of LCB-based methods in the literature. Our results demonstrate that distributionally robust optimization methods can also efficiently solve offline reinforcement learning.
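The worst-case optimization over an uncertainty set described in the abstract above can be illustrated with a minimal sketch. It assumes an L1 ball around the empirical transition distribution whose radius shrinks with the visit count; the radius formula, the function names (`l1_radius`, `worst_case_expectation`, `robust_backup`), and the parameters are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def l1_radius(n_visits, num_states, delta):
    """Assumed radius from a standard L1 concentration bound for an
    empirical distribution (not necessarily the paper's exact set)."""
    if n_visits == 0:
        return 2.0  # no data: the ball covers the whole simplex
    bound = np.sqrt(2.0 * (num_states * np.log(2.0) + np.log(1.0 / delta)) / n_visits)
    return min(2.0, bound)

def worst_case_expectation(p_hat, v, radius):
    """Minimize p @ v over distributions p with ||p - p_hat||_1 <= radius."""
    p = np.asarray(p_hat, dtype=float).copy()
    v = np.asarray(v, dtype=float)
    lo = int(np.argmin(v))
    # At most radius/2 of probability mass can be moved onto the worst state.
    budget = min(radius / 2.0, 1.0 - p[lo])
    p[lo] += budget
    # Remove the same amount of mass from the highest-value states first.
    for i in np.argsort(v)[::-1]:
        if budget <= 0.0:
            break
        if i == lo:
            continue
        take = min(p[i], budget)
        p[i] -= take
        budget -= take
    return float(p @ v)

def robust_backup(reward, gamma, p_hat, v, radius):
    """One pessimistic Bellman backup for a single state-action pair."""
    return reward + gamma * worst_case_expectation(p_hat, v, radius)
```

Under these assumptions, rarely visited state-action pairs receive a larger radius and therefore a more pessimistic backup, which plays the role that an explicit reward penalty plays in penalty-based approaches.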
Abstract:Tactile sensing or fabric hand plays a critical role in an individual's decision to buy a certain fabric from the range of available fabrics for a desired application. Therefore, textile and clothing manufacturers have long been in search of an objective method for assessing fabric hand, which can then be used to engineer fabrics with a desired hand. Recognizing textures and materials in real-world images has played an important role in object recognition and scene understanding. In this paper, we explore how to computationally characterize apparent or latent properties (e.g., surface smoothness) of materials, i.e., computational material surface characterization, which moves a step further beyond material recognition. We formulate the problem as a very fine-grained texture classification problem, and study how deep learning-based texture representation techniques can help tackle the task. We introduce a new, large-scale challenging microscopic material surface dataset (CoMMonS), geared towards an automated fabric quality assessment mechanism in an intelligent manufacturing system. We then conduct a comprehensive evaluation of state-of-the-art deep learning-based methods for texture classification using CoMMonS. Additionally, we propose a multi-level texture encoding and representation network (MuLTER), which simultaneously leverages low- and high-level features to maintain both texture details and spatial information in the texture representation. Our results show that, in comparison with the state-of-the-art deep texture descriptors, MuLTER yields higher accuracy not only on our CoMMonS dataset for material characterization, but also on established datasets such as MINC-2500 and GTOS-mobile for material recognition.
Abstract:In this paper, we present an efficient and distinctive local descriptor, namely block intensity and gradient difference (BIGD). In an image patch, we randomly sample multi-scale block pairs and utilize the intensity and gradient differences of pairwise blocks to construct the local BIGD descriptor. The random sampling strategy and the multi-scale framework help BIGD descriptors capture the distinctive patterns of patches at different orientations and spatial granularity levels. We use vectors of locally aggregated descriptors (VLAD) or improved Fisher vector (IFV) to encode local BIGD descriptors into a full image descriptor, which is then fed into a linear support vector machine (SVM) classifier for texture classification. We compare the proposed descriptor with typical and state-of-the-art ones by evaluating their classification performance on five public texture data sets: Brodatz, CUReT, KTH-TIPS, KTH-TIPS-2a, and KTH-TIPS-2b. Experimental results show that the proposed BIGD descriptor has stronger discriminative power, yielding 0.12% to 6.43% higher classification accuracy than the state-of-the-art texture descriptor, dense microblock difference (DMD).
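The block-pair sampling described in the BIGD abstract above can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: the use of block means, the gradient-magnitude map from `np.gradient`, the scale values, and all function names are assumptions made for the example.

```python
import numpy as np

def sample_block_pairs(patch_size, scales, pairs_per_scale, rng):
    """Randomly sample pairs of top-left corners for square blocks at each scale."""
    pairs = []
    for s in scales:
        # Valid corners keep the s x s block fully inside the patch.
        hi = patch_size - s + 1
        corners = rng.integers(0, hi, size=(pairs_per_scale, 2, 2))
        pairs.extend((s, tuple(a), tuple(b)) for a, b in corners)
    return pairs

def block_mean(arr, corner, s):
    """Mean value of the s x s block whose top-left corner is `corner`."""
    r, c = corner
    return arr[r:r + s, c:c + s].mean()

def bigd_like_descriptor(patch, pairs):
    """Concatenate intensity and gradient differences for every block pair."""
    patch = np.asarray(patch, dtype=np.float64)
    gy, gx = np.gradient(patch)   # derivatives along rows and columns
    grad = np.hypot(gx, gy)       # gradient magnitude map (assumed choice)
    feats = []
    for s, a, b in pairs:
        feats.append(block_mean(patch, a, s) - block_mean(patch, b, s))
        feats.append(block_mean(grad, a, s) - block_mean(grad, b, s))
    return np.asarray(feats)

# The same sampled pairs would be reused for every patch so that the
# descriptor entries are comparable across patches.
rng = np.random.default_rng(0)
pairs = sample_block_pairs(patch_size=16, scales=(2, 4), pairs_per_scale=5, rng=rng)
descriptor = bigd_like_descriptor(rng.random((16, 16)), pairs)
```

In this sketch each block pair contributes two entries, so the descriptor length is twice the number of sampled pairs; the VLAD or IFV encoding and the SVM classifier mentioned in the abstract are not shown.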
Abstract:Image retrieval is an important problem in the area of multimedia processing. This paper presents two new curvelet-based algorithms for texture retrieval which are suitable for use in constrained-memory devices. The developed algorithms are tested on three publicly available texture datasets: CUReT, Mondial-Marmi, and STex-fabric. Our experiments confirm the effectiveness of the proposed system. Furthermore, we introduce a weighted version of the retrieval algorithm, which is shown to achieve promising results in the classification of seismic activities.
Abstract:In this paper, we propose a multi-level texture encoding and representation network (MuLTER) for texture-related applications. Based on a multi-level pooling architecture, the MuLTER network simultaneously leverages low- and high-level features to maintain both texture details and spatial information. Such a pooling architecture involves few extra parameters and keeps feature dimensions fixed regardless of changes in image size. In comparison with state-of-the-art texture descriptors, the MuLTER network yields higher recognition accuracy on typical texture datasets such as MINC-2500 and GTOS-mobile with a discriminative and compact representation. In addition, we analyze the impact of combining features from different levels, which supports our claim that the fusion of multi-level features efficiently enhances recognition performance. Our source code will be published on GitHub (https://github.com/olivesgatech).
Abstract:Previous VoIP steganalysis methods face great challenges in detecting speech signals at low embedding rates, and they generally struggle to perform real-time detection, which limits their usefulness for maintaining cyberspace security. To address these two challenges, in this paper we combine the sliding window detection algorithm with a convolutional neural network and propose a real-time VoIP steganalysis method based on multi-channel convolutional sliding windows. In order to analyze the correlations between frames and different neighborhood frames in a VoIP signal, we define multi-channel sliding detection windows. Within each sliding window, we design two feature extraction channels, each containing multiple convolution layers with multiple convolution kernels per layer, to extract correlation features of the input signal. Then, based on these extracted features, we use a forward fully connected network for feature fusion. Finally, by analyzing the statistical distribution of these features, the discriminator determines whether the input speech signal contains covert information. We designed several experiments to test the proposed model's detection ability under various conditions, including different embedding rates and different speech lengths. Experimental results showed that the proposed model outperforms all the previous methods and achieves state-of-the-art performance, especially at low embedding rates. In addition, we tested the detection efficiency of the proposed model, and the results showed that it can achieve almost real-time detection of VoIP speech signals.
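The sliding detection windows mentioned in the abstract above can be illustrated with a minimal sketch that cuts a sequence of per-frame feature vectors into overlapping windows. The window length, stride, array layout, and the function name `sliding_windows` are assumptions for illustration; the convolutional channels and the discriminator are not shown.

```python
import numpy as np

def sliding_windows(frames, window, stride):
    """Split a (num_frames, num_features) array into overlapping windows.

    Returns an array of shape (num_windows, window, num_features).
    """
    frames = np.asarray(frames)
    if len(frames) < window:
        raise ValueError("signal is shorter than one detection window")
    starts = range(0, len(frames) - window + 1, stride)
    return np.stack([frames[s:s + window] for s in starts])

# Example with hypothetical sizes: 10 frames, 3 features per frame.
frames = np.arange(30).reshape(10, 3)
windows = sliding_windows(frames, window=4, stride=2)
```

Each window can then be scored independently as frames arrive, which is what makes a sliding-window design a natural fit for near real-time detection.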
Abstract:In this paper, we explore how to computationally characterize subsurface geological structures present in seismic volumes using texture attributes. For this purpose, we conduct a comparative study of typical texture attributes from the image processing literature. We focus on spatial attributes in this study and examine them in a new application for seismic interpretation, i.e., seismic volume labeling. For this application, a data volume is automatically segmented into various structures, each assigned its corresponding label. If the labels are assigned with reasonable accuracy, such volume labeling will help initiate an interpretation process in a more effective manner. Our investigation demonstrates the feasibility of accomplishing this task using texture attributes. Through the study, we also identify advantages and disadvantages associated with each attribute.