Yue Yin

Dynamic Learning and Productivity for Data Analysts: A Bayesian Hidden Markov Model Perspective

Mar 26, 2025

Yue Yin

Abstract:Data analysts are essential in organizations, transforming raw data into insights that drive decision-making and strategy. This study explores how analysts' productivity evolves on a collaborative platform, focusing on two key learning activities: writing queries and viewing peer queries. While traditional research often assumes static models, where performance improves steadily with cumulative learning, such models fail to capture the dynamic nature of real-world learning. To address this, we propose a Hidden Markov Model (HMM) that tracks how analysts transition between distinct learning states based on their participation in these activities. Using an industry dataset with 2,001 analysts and 79,797 queries, this study identifies three learning states: novice, intermediate, and advanced. Productivity increases as analysts advance to higher states, reflecting the cumulative benefits of learning. Writing queries benefits analysts across all states, with the largest gains observed for novices. Viewing peer queries supports novices but may hinder analysts in higher states due to cognitive overload or inefficiencies. Transitions between states are also uneven, with progression from intermediate to advanced being particularly challenging. This study advances understanding of the dynamic learning behavior of knowledge workers and offers practical implications for designing systems, optimizing training, enabling personalized learning, and fostering effective knowledge sharing.

* 29 pages; a shorter 11-page version has been accepted by HCI International (HCII) 2025
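
To make the hidden-state idea in the abstract above concrete, here is a minimal sketch of forward filtering in a three-state hidden Markov model. Every number in it (initial distribution, transition matrix, emission matrix, and the three discrete productivity levels) is a made-up placeholder, not an estimate from the paper. The sketch also omits the part of the paper's model in which transitions depend on writing and viewing queries.

```python
STATES = ["novice", "intermediate", "advanced"]

# Hypothetical parameters, chosen only for illustration.
INITIAL = [0.80, 0.15, 0.05]
TRANSITION = [  # TRANSITION[i][j] = P(next state j | current state i)
    [0.85, 0.14, 0.01],
    [0.05, 0.90, 0.05],
    [0.01, 0.04, 0.95],
]
EMISSION = [  # EMISSION[s][o] = P(productivity level o | state s); 0=low, 1=medium, 2=high
    [0.70, 0.25, 0.05],
    [0.20, 0.60, 0.20],
    [0.05, 0.25, 0.70],
]


def normalize(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights)
    return [w / total for w in weights]


def forward_filter(observations):
    """Return P(state_t | observations up to t) for every time step t."""
    n = len(STATES)
    belief = normalize([INITIAL[s] * EMISSION[s][observations[0]] for s in range(n)])
    history = [belief]
    for obs in observations[1:]:
        # Predict: push the current belief through the transition matrix.
        predicted = [sum(belief[i] * TRANSITION[i][j] for i in range(n)) for j in range(n)]
        # Update: reweight by how well each state explains the new observation.
        belief = normalize([predicted[j] * EMISSION[j][obs] for j in range(n)])
        history.append(belief)
    return history


def most_likely_state(belief):
    """Name of the state with the highest filtered probability."""
    return STATES[max(range(len(STATES)), key=lambda s: belief[s])]
```

Calling `forward_filter` on a sequence of observed productivity levels yields one belief vector per time step, and `most_likely_state` turns any of them into a state label.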


EL-MLFFs: Ensemble Learning of Machine Leaning Force Fields

Mar 26, 2024

Bangchen Yin, Yue Yin, Yuda W. Tang, Hai Xiao

Abstract:Machine learning force fields (MLFFs) have emerged as a promising approach to bridge the accuracy of quantum mechanical methods and the efficiency of classical force fields. However, the abundance of MLFF models and the challenge of accurately predicting atomic forces pose significant obstacles in their practical application. In this paper, we propose a novel ensemble learning framework, EL-MLFFs, which leverages the stacking method to integrate predictions from diverse MLFFs and enhance force prediction accuracy. By constructing a graph representation of molecular structures and employing a graph neural network (GNN) as the meta-model, EL-MLFFs effectively captures atomic interactions and refines force predictions. We evaluate our approach on two distinct datasets: methane molecules and methanol adsorbed on a Cu(100) surface. The results demonstrate that EL-MLFFs significantly improves force prediction accuracy compared to individual MLFFs, with the ensemble of all eight models yielding the best performance. Moreover, our ablation study highlights the crucial roles of the residual network and graph attention layers in the model's architecture. The EL-MLFFs framework offers a promising solution to the challenges of model selection and force prediction accuracy in MLFFs, paving the way for more reliable and efficient molecular simulations.

* 12 pages, 3 figures
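
As a rough illustration of the stacking idea in the abstract above, the sketch below combines the predictions of two base models with a linear meta-model fitted by least squares. This is a deliberate simplification: the paper uses a graph neural network as the meta-model and eight base MLFFs, whereas the two-model linear combiner, the function names, and any data passed in are assumptions made here for brevity.

```python
import math


def fit_stacking_weights(preds_a, preds_b, targets):
    """Least-squares weights (w_a, w_b) so that w_a*a + w_b*b approximates targets."""
    saa = sum(a * a for a in preds_a)
    sbb = sum(b * b for b in preds_b)
    sab = sum(a * b for a, b in zip(preds_a, preds_b))
    sat = sum(a * t for a, t in zip(preds_a, targets))
    sbt = sum(b * t for b, t in zip(preds_b, targets))
    # Solve the 2x2 normal equations with Cramer's rule.
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("base predictions are collinear; weights are not unique")
    w_a = (sat * sbb - sbt * sab) / det
    w_b = (sbt * saa - sat * sab) / det
    return w_a, w_b


def stack(preds_a, preds_b, weights):
    """Apply the fitted meta-model to two lists of base predictions."""
    w_a, w_b = weights
    return [w_a * a + w_b * b for a, b in zip(preds_a, preds_b)]


def rmse(predictions, targets):
    """Root-mean-square error between predictions and reference values."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets))
```

In practice the weights would be fitted on held-out data rather than on the data the base models were trained on, so that the meta-model does not simply reward overfitted base models.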


Component Segmentation of Engineering Drawings Using Graph Convolutional Networks

Dec 01, 2022

Wentai Zhang, Joe Joseph, Yue Yin, Liuyue Xie, Tomotake Furuhata, Soji Yamakawa, Kenji Shimada, Levent Burak Kara

Abstract:We present a data-driven framework to automate the vectorization and machine interpretation of 2D engineering part drawings. In industrial settings, most manufacturing engineers still rely on manual reads to identify the topological and manufacturing requirements from drawings submitted by designers. The interpretation process is laborious and time-consuming, which severely inhibits the efficiency of part quotation and manufacturing tasks. While recent advances in image-based computer vision methods have demonstrated great potential in interpreting natural images through semantic segmentation approaches, the application of such methods in parsing engineering technical drawings into semantically accurate components remains a significant challenge. The severe pixel sparsity in engineering drawings also restricts the effective featurization of image-based data-driven methods. To overcome these challenges, we propose a deep-learning-based framework that predicts the semantic type of each vectorized component. Taking a raster image as input, we vectorize all components through thinning, stroke tracing, and cubic Bézier fitting. Then a graph of such components is generated based on the connectivity between the components. Finally, a graph convolutional neural network is trained on this graph data to identify the semantic type of each component. We test our framework in the context of semantic segmentation of text, dimension, and contour components in engineering drawings. Results show that our method yields the best performance compared to recent image-based and graph-based segmentation methods.

* Preprint submitted to Computers in Industry
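
The graph step described in the abstract above can be sketched as follows. The code builds an adjacency matrix from component connectivity and applies one graph-convolution propagation step with the common symmetric normalization. The connectivity rule (endpoints within a tolerance), the function names, and the per-component features are assumptions for illustration; the abstract does not specify them, and the learned weights and nonlinearity of a real layer are left out.

```python
import math


def build_adjacency(components, tol=1.0):
    """Link two components when any pair of their endpoints lies within tol.

    Each component is a list of (x, y) endpoints (a hypothetical representation).
    """
    n = len(components)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            touching = any(
                math.dist(p, q) <= tol for p in components[i] for q in components[j]
            )
            if touching:
                adj[i][j] = adj[j][i] = 1.0
    return adj


def gcn_propagate(adj, features):
    """Compute D^-1/2 (A + I) D^-1/2 X: neighbor averaging with self-loops."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in a_hat]
    dim = len(features[0])
    out = []
    for i in range(n):
        row = [0.0] * dim
        for j in range(n):
            if a_hat[i][j]:
                coef = a_hat[i][j] / math.sqrt(degree[i] * degree[j])
                for k in range(dim):
                    row[k] += coef * features[j][k]
        out.append(row)
    return out
```

A trained network would multiply the propagated features by a weight matrix, apply a nonlinearity, and finish with a per-node classifier over the semantic types (text, dimension, contour).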


RecipeSnap -- a lightweight image-to-recipe model

May 04, 2022

Jianfa Chen, Yue Yin, Yifan Xu

Abstract:In this paper, we address the problem of automatically recognizing photographed dishes and generating the corresponding recipes. Current image-to-recipe models are computationally expensive and require powerful GPUs for model training and implementation. This high computational cost prevents existing models from being deployed on portable devices such as smartphones. To solve this issue, we introduce a lightweight image-to-recipe prediction model, RecipeSnap, that reduces memory cost and computational cost by more than 90% while still achieving 2.0 MedR, which is in line with the state-of-the-art model. A pre-trained recipe encoder was used to compute recipe embeddings. Recipes from the Recipe1M dataset and the corresponding recipe embeddings are collected as a recipe library, which is used for image encoder training and, later, for image queries. We use MobileNet-V2 as the image encoder backbone, which makes our model suitable for portable devices. This model can be further developed into a smartphone application with little effort. A comparison of the performance of this lightweight model with that of heavier models is presented in this paper. Code, data, and models are publicly accessible on GitHub.

* 7 pages, 3 figures
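
The retrieval setup and the MedR (median rank) metric mentioned in the abstract above can be sketched as below. The sketch assumes that image and recipe embeddings live in a shared space, that similarity is cosine similarity, and that image `i` matches recipe `i` in the library; these conventions and all function names are illustrative assumptions rather than details taken from the paper.

```python
import math
from statistics import median


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_of_true_recipe(image_emb, recipe_library, true_index):
    """1-based rank of the matching recipe when the library is sorted by similarity."""
    true_score = cosine(image_emb, recipe_library[true_index])
    better = sum(
        1
        for i, recipe_emb in enumerate(recipe_library)
        if i != true_index and cosine(image_emb, recipe_emb) > true_score
    )
    return better + 1


def median_rank(image_embs, recipe_library):
    """MedR over all queries, assuming image i corresponds to recipe i."""
    ranks = [
        rank_of_true_recipe(emb, recipe_library, i) for i, emb in enumerate(image_embs)
    ]
    return median(ranks)
```

Lower is better: a median rank of 1 means that for at least half of the query images the matching recipe is the top result.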


SerialTrack: ScalE and Rotation Invariant Augmented Lagrangian Particle Tracking

Mar 23, 2022

Jin Yang, Yue Yin, Alexander K. Landauer, Selda Buyuktozturk, Jing Zhang, Luke Summey, Alexander McGhee, Matt K. Fu, John O. Dabiri, Christian Franck

Abstract:We present a new particle tracking algorithm to accurately resolve large deformation and rotational motion fields, which takes advantage of both local and global particle tracking algorithms. We call this method the ScalE and Rotation Invariant Augmented Lagrangian Particle Tracking (SerialTrack). This method builds an iterative scale and rotation invariant topology-based feature for each particle within a multi-scale tracking algorithm. The global kinematic compatibility condition is applied as a global augmented Lagrangian constraint to enhance the tracking accuracy. An open source software package implementing this numerical approach to track both 2D and 3D, incremental and cumulative deformation fields is provided.


A Model-Agnostic Causal Learning Framework for Recommendation using Search Data

Feb 10, 2022

Zihua Si, Xueran Han, Xiao Zhang, Jun Xu, Yue Yin, Yang Song, Ji-Rong Wen

Abstract:Machine-learning-based recommender systems (RSs) have become an effective means to help people automatically discover their interests. Existing models often represent the rich information for recommendation, such as items, users, and contexts, as embedding vectors and leverage them to predict users' feedback. From the viewpoint of causal analysis, the associations between these embedding vectors and users' feedback are a mixture of the causal part that describes why an item is preferred by a user, and the non-causal part that merely reflects the statistical dependencies between users and items, for example, the exposure mechanism, public opinions, display position, etc. However, existing RSs mostly ignore the striking differences between the causal parts and non-causal parts when using these embedding vectors. In this paper, we propose a model-agnostic framework named IV4Rec that can effectively decompose the embedding vectors into these two parts, hence enhancing recommendation results. Specifically, we jointly consider users' behaviors in search scenarios and recommendation scenarios. Adopting the concepts in causal analysis, we embed users' search behaviors as instrumental variables (IVs), to help decompose original embedding vectors in recommendation, i.e., treatments. IV4Rec then combines the two parts through deep neural networks and uses the combined results for recommendation. IV4Rec is model-agnostic and can be applied to a number of existing RSs such as DIN and NRHUB. Experimental results on both public and proprietary industrial datasets demonstrate that IV4Rec consistently enhances RSs and outperforms a framework that jointly considers search and recommendation.

* 9 pages, 7 figures, accepted by The Web Conference 2022
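
One standard way to split a treatment into an instrument-explained part and a remainder is a first-stage least-squares regression, as in two-stage least squares. The sketch below shows that generic technique on scalars. The abstract above does not spell out IV4Rec's decomposition procedure, so treating it as an ordinary regression of the treatment on the instrument is an assumption made here, as are the function name and the scalar (rather than embedding-vector) setting.

```python
def decompose_with_iv(treatment, instrument):
    """Split each treatment value into fitted (instrument-explained) and residual parts.

    Fits treatment ~ intercept + slope * instrument by ordinary least squares.
    """
    n = len(treatment)
    mean_t = sum(treatment) / n
    mean_z = sum(instrument) / n
    var_z = sum((z - mean_z) ** 2 for z in instrument)
    if var_z == 0:
        raise ValueError("instrument has no variation")
    cov_zt = sum((z - mean_z) * (t - mean_t) for z, t in zip(instrument, treatment))
    slope = cov_zt / var_z
    intercept = mean_t - slope * mean_z
    fitted = [intercept + slope * z for z in instrument]
    residual = [t - f for t, f in zip(treatment, fitted)]
    return fitted, residual
```

By construction the two parts add back up to the original treatment, and the residual is uncorrelated with the instrument, which is what lets a downstream model weight the two parts differently.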


Duluth at SemEval-2020 Task 7: Using Surprise as a Key to Unlock Humorous Headlines

Sep 06, 2020

Shuning Jin, Yue Yin, XianE Tang, Ted Pedersen

Abstract:We use pretrained transformer-based language models in SemEval-2020 Task 7: Assessing the Funniness of Edited News Headlines. Inspired by the incongruity theory of humor, we use a contrastive approach to capture the surprise in the edited headlines. In the official evaluation, our system gets 0.531 RMSE in Subtask 1, 11th among 49 submissions. In Subtask 2, our system gets 0.632 accuracy, 9th among 32 submissions.

* To appear in the Proceedings of the 14th International Workshop on Semantic Evaluation (SemEval 2020), December 12-13, 2020, Barcelona


Cross-language Citation Recommendation via Hierarchical Representation Learning on Heterogeneous Graph

Dec 31, 2018

Zhuoren Jiang, Yue Yin, Liangcai Gao, Yao Lu, Xiaozhong Liu

Abstract:While the volume of scholarly publications has increased at a frenetic pace, accessing and consuming useful candidate papers in very large digital libraries is becoming an essential and challenging task for scholars. Unfortunately, because of the language barrier, some scientists (especially junior ones or graduate students who have not mastered other languages) cannot efficiently locate publications hosted in a foreign-language repository. In this study, we propose a novel solution, cross-language citation recommendation via Hierarchical Representation Learning on Heterogeneous Graph (HRLHG), to address this new problem. HRLHG can learn a representation function by mapping the publications, from multilingual repositories, to a low-dimensional joint embedding space from various kinds of vertexes and relations on a heterogeneous graph. By leveraging both global (task-specific) and local (task-independent) information as well as a novel supervised hierarchical random walk algorithm, the proposed method can optimize the publication representations by maximizing the likelihood of locating the important cross-language neighborhoods on the graph. Experimental results show that the proposed method can not only outperform state-of-the-art baseline models, but also improve the interpretability of the representation model for the cross-language citation recommendation task.

* The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR 2018), 635--644


Explainable Recommendation via Multi-Task Learning in Opinionated Text Data

Jun 10, 2018

Nan Wang, Hongning Wang, Yiling Jia, Yue Yin

Abstract:Explaining automatically generated recommendations allows users to make more informed and accurate decisions about which results to utilize, and therefore improves their satisfaction. In this work, we develop a multi-task learning solution for explainable recommendation. Two companion learning tasks, user preference modeling for recommendation and opinionated content modeling for explanation, are integrated via a joint tensor factorization. As a result, the algorithm predicts not only a user's preference over a list of items, i.e., recommendation, but also how the user would appreciate a particular item at the feature level, i.e., opinionated textual explanation. Extensive experiments on two large collections of Amazon and Yelp reviews confirmed the effectiveness of our solution in both recommendation and explanation tasks, compared with several existing recommendation algorithms. Our extensive user study clearly demonstrates the practical value of the explainable recommendations generated by our algorithm.

* 10 pages, SIGIR 2018
