Abstract:In this study, we address the challenge of depression detection from speech, focusing on the potential of non-semantic features (NSFs) to capture subtle markers of depression. While prior research has leveraged various features for this task, NSFs-extracted from pre-trained models (PTMs) designed for non-semantic tasks such as paralinguistic speech processing (TRILLsson), speaker recognition (x-vector), and emotion recognition (emoHuBERT)-have shown significant promise. However, the potential of combining these diverse features has not been fully explored. In this work, we demonstrate that the amalgamation of NSFs results in complementary behavior, leading to enhanced depression detection performance. Furthermore, to our end, we introduce a simple novel framework, FuSeR, designed to effectively combine these features. Our results show that FuSeR outperforms models utilizing individual NSFs as well as baseline fusion techniques and obtains state-of-the-art (SOTA) performance in E-DAIC benchmark with RMSE of 5.51 and MAE of 4.48, establishing it as a robust approach for depression detection.
Abstract:The use of Social media to share content is on a constant rise. One of the capsize effect of information sharing on Social media includes the spread of sensitive information on the public domain. With the digital gadget market becoming highly competitive and ever-evolving, the trend of an increasing number of sensitive posts leaking information on devices in social media is observed. Many web-blogs on digital gadget market have mushroomed recently, making the problem of information leak all pervasive. Credible leaks on specifics of an upcoming device can cause a lot of financial damage to the respective organization. Hence, it is crucial to assess the credibility of the platforms that continuously post about a smartphone or digital gadget leaks. In this work, we analyze the headlines of leak web-blog posts and their corresponding official press-release. We first collect 54, 495 leak and press-release headlines for different smartphones. We train our custom NER model to capture the evolving smartphone names with an accuracy of 82.14% on manually annotated results. We further propose a credibility score metric for the web-blog, based on the number of falsified and authentic smartphone leak posts.
Abstract:Inspired by recent advances in Kolmogorov-Arnold Networks (KANs), we introduce a novel approach to latent factor conditional asset pricing models. While previous machine learning applications in asset pricing have predominantly used Multilayer Perceptrons with ReLU activation functions to model latent factor exposures, our method introduces a KAN-based autoencoder which surpasses MLP models in both accuracy and interpretability. Our model offers enhanced flexibility in approximating exposures as nonlinear functions of asset characteristics, while simultaneously providing users with an intuitive framework for interpreting latent factors. Empirical backtesting demonstrates our model's superior ability to explain cross-sectional risk exposures. Moreover, long-short portfolios constructed using our model's predictions achieve higher Sharpe ratios, highlighting its practical value in investment management.
Abstract:In this work, we focus on the detection of depression through speech analysis. Previous research has widely explored features extracted from pre-trained models (PTMs) primarily trained for paralinguistic tasks. Although these features have led to sufficient advances in speech-based depression detection, their performance declines in real-world settings. To address this, in this paper, we introduce ComFeAT, an application that employs a CNN model trained on a combination of features extracted from PTMs, a.k.a. neural features and spectral features to enhance depression detection. Spectral features are robust to domain variations, but, they are not as good as neural features in performance, suprisingly, combining them shows complementary behavior and improves over both neural and spectral features individually. The proposed method also improves over previous state-of-the-art (SOTA) works on E-DAIC benchmark.
Abstract:Nonuniform motion constraints are ubiquitous in robotic applications. Geofencing control is one such paradigm where the motion of a robot must be constrained within a predefined boundary. This paper addresses the problem of stabilizing a unicycle robot around a desired circular orbit while confining its motion within a nonconcentric external circular boundary. Our solution approach relies on the concept of the so-called Mobius transformation that, under certain practical conditions, maps two nonconcentric circles to a pair of concentric circles, and hence, results in uniform spatial motion constraints. The choice of such a Mobius transformation is governed by the roots of a quadratic equation in the post-design analysis that decides how the regions enclosed by the two circles are mapped onto the two planes. We show that the problem can be formulated either as a trajectory-constraining problem or an obstacle-avoidance problem in the transformed plane, depending on these roots. Exploiting the idea of the barrier Lyapunov function, we propose a unique control law that solves both these contrasting problems in the transformed plane and renders a solution to the original problem in the actual plane. By relating parameters of two planes under Mobius transformation and its inverse map, we further establish a connection between the control laws in two planes and determine the control law to be applied in the actual plane. Simulation and experimental results are provided to illustrate the key theoretical developments.
Abstract:The research delves into the capabilities of a transformer-based neural network for Ethereum cryptocurrency price forecasting. The experiment runs around the hypothesis that cryptocurrency prices are strongly correlated with other cryptocurrencies and the sentiments around the cryptocurrency. The model employs a transformer architecture for several setups from single-feature scenarios to complex configurations incorporating volume, sentiment, and correlated cryptocurrency prices. Despite a smaller dataset and less complex architecture, the transformer model surpasses ANN and MLP counterparts on some parameters. The conclusion presents a hypothesis on the illusion of causality in cryptocurrency price movements driven by sentiments.
Abstract:The study presents a deep learning framework aimed at synthesizing 3D MRI volumes from three-dimensional ultrasound images of the brain utilizing the Pix2Pix GAN model. The process involves inputting a 3D volume of ultrasounds into a UNET generator and patch discriminator, generating a corresponding 3D volume of MRI. Model performance was evaluated using losses on the discriminator and generator applied to a dataset of 3D ultrasound and MRI images. The results indicate that the synthesized MRI images exhibit some similarity to the expected outcomes. Despite challenges related to dataset size, computational resources, and technical complexities, the method successfully generated MRI volume with a satisfactory similarity score meant to serve as a baseline for further research. It underscores the potential of deep learning-based volume synthesis techniques for ultrasound to MRI conversion, showcasing their viability for medical applications. Further refinement and exploration are warranted for enhanced clinical relevance.
Abstract:This review paper delves into the present state of medical imaging, with a specific focus on the use of deep learning techniques for brain image synthesis. The need for medical image synthesis to improve diagnostic accuracy and decrease invasiveness in medical procedures is emphasized, along with the role of deep learning in enabling these advancements. The paper examines various methods and techniques for brain image synthesis, including 2D to 3D constructions, MRI synthesis, and the use of transformers. It also addresses limitations and challenges faced in these methods, such as obtaining well-curated training data and addressing brain ultrasound issues. The review concludes by exploring the future potential of this field and the opportunities for further advancements in medical imaging using deep learning techniques. The significance of transformers and their potential to revolutionize the medical imaging field is highlighted. Additionally, the paper discusses the potential solutions to the shortcomings and limitations faced in this field. The review provides researchers with an updated reference on the present state of the field and aims to inspire further research and bridge the gap between the present state of medical imaging and the future possibilities offered by deep learning techniques.
Abstract:Differential Dynamic Programming (DDP) is a popular technique used to generate motion for dynamic-legged robots in the recent past. However, in most cases, only the first-order partial derivatives of the underlying dynamics are used, resulting in the iLQR approach. Neglecting the second-order terms often slows down the convergence rate compared to full DDP. Multi-Shooting is another popular technique to improve robustness, especially if the dynamics are highly non-linear. In this work, we consider Multi-Shooting DDP for trajectory optimization of a bounding gait for a simplified quadruped model. As the main contribution, we develop Second-Order analytical partial derivatives of the rigid-body contact dynamics, extending our previous results for fixed/floating base models with multi-DoF joints. Finally, we show the benefits of a novel Quasi-Newton method for approximating second-order derivatives of the dynamics, leading to order-of-magnitude speedups in the convergence compared to the full DDP method.
Abstract:Model-based control for robots has increasingly been dependent on optimization-based methods like Differential Dynamic Programming and iterative LQR (iLQR). These methods can form the basis of Model-Predictive Control (MPC), which is commonly used for controlling legged robots. Computing the partial derivatives of the dynamics is often the most expensive part of these algorithms, regardless of whether analytical methods, Finite Difference, Automatic Differentiation (AD), or Chain-Rule accumulation is used. Since the second-order derivatives of dynamics result in tensor computations, they are often ignored, leading to the use of iLQR, instead of the full second-order DDP method. In this paper, we present analytical methods to compute the second-order derivatives of inverse and forward dynamics for open-chain rigid-body systems with multi-DoF joints and fixed/floating bases. An extensive comparison of accuracy and run-time performance with AD and other methods is provided, including the consideration of code-generation techniques in C/C++ to speed up the computations. For the 36 DoF ATLAS humanoid, the second-order Inverse, and the Forward dynamics derivatives take approx 200 mu s, and approx 2.1 ms respectively, resulting in a 3x speedup over the AD approach.