Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zijie Wang

Construction of an Organ Shape Atlas Using a Hierarchical Mesh Variational Autoencoder

Jun 18, 2025

Zijie Wang, Ryuichi Umehara, Mitsuhiro Nakamura, Megumi Nakao

Abstract:An organ shape atlas, which represents the shape and position of the organs and skeleton of a living body using a small number of parameters, is expected to have a wide range of clinical applications, including intraoperative guidance and radiotherapy. Because the shape and position of soft organs vary greatly among patients, it is difficult for linear models to reconstruct shapes that have large local variations. Because it is difficult for conventional nonlinear models to control and interpret the organ shapes obtained, deep learning has been attracting attention in three-dimensional shape representation. In this study, we propose an organ shape atlas based on a mesh variational autoencoder (MeshVAE) with hierarchical latent variables. To represent the complex shapes of biological organs and nonlinear shape differences between individuals, the proposed method maintains the performance of organ shape reconstruction by hierarchizing latent variables and enables shape representation using lower-dimensional latent variables. Additionally, templates that define vertex correspondence between different resolutions enable hierarchical representation in mesh data and control the global and local features of the organ shape. We trained the model using liver and stomach organ meshes obtained from 124 cases and confirmed that the model reconstructed the position and shape with an average distance between vertices of 1.5 mm and mean distance of 0.7 mm for the liver shape, and an average distance between vertices of 1.4 mm and mean distance of 0.8 mm for the stomach shape on test data from 19 of cases. The proposed method continuously represented interpolated shapes, and by changing latent variables at different hierarchical levels, the proposed method hierarchically separated shape features compared with PCA.

Via

Access Paper or Ask Questions

Fundamental MMSE-Rate Performance Limits of Integrated Sensing and Communication Systems

Jan 02, 2025

Zijie Wang, Xudong Wang

Figure 1 for Fundamental MMSE-Rate Performance Limits of Integrated Sensing and Communication Systems

Figure 2 for Fundamental MMSE-Rate Performance Limits of Integrated Sensing and Communication Systems

Figure 3 for Fundamental MMSE-Rate Performance Limits of Integrated Sensing and Communication Systems

Figure 4 for Fundamental MMSE-Rate Performance Limits of Integrated Sensing and Communication Systems

Abstract:Integrated sensing and communication (ISAC) demonstrates promise for 6G networks; yet its performance limits, which require addressing functional Pareto stochastic optimizations, remain underexplored. Existing works either overlook the randomness of ISAC signals or approximate ISAC limits from sensing and communication (SAC) optimum-achieving strategies, leading to loose bounds. In this paper, ISAC limits are investigated by considering a random ISAC signal designated to simultaneously estimate the sensing channel and convey information over the communication channel, adopting the modified minimum-mean-square-error (MMSE), a metric defined in accordance with the randomness of ISAC signals, and the Shannon rate as respective SAC metrics. First, conditions for optimal channel input and output distributions on the MMSE-Rate limit are derived employing variational approaches, leading to high-dimensional convolutional equations. Second, leveraging variational conditions, a Blahut-Arimoto-type algorithm is proposed to numerically determine optimal distributions and SAC performance, with its convergence to the limit proven. Third, closed-form SAC-optimal waveforms are derived, characterized by power allocation according to channel statistics/realization and waveform selection; existing methods to establish looser ISAC bounds are rectified. Finally, a compound signaling strategy is introduced for coincided SAC channels, which employs sequential SAC-optimal waveforms for channel estimation and data transmission, showcasing significant rate improvements over non-coherent "capacity". This study systematically investigates ISAC performance limits from joint estimation- and information-theoretic perspectives, highlighting key SAC tradeoffs and potential ISAC design benefits. The methodology readily extends to various metrics, such as estimation rate and the Cramer-Rao Bound.

* 35 pages, 9 figures

Via

Access Paper or Ask Questions

Interpreting Indirect Answers to Yes-No Questions in Multiple Languages

Oct 20, 2023

Zijie Wang, Md Mosharaf Hossain, Shivam Mathur, Terry Cruz Melo, Kadir Bulut Ozler, Keun Hee Park, Jacob Quintero, MohammadHossein Rezaei, Shreya Nupur Shakya, Md Nayem Uddin(+1 more)

Abstract:Yes-no questions expect a yes or no for an answer, but people often skip polar keywords. Instead, they answer with long explanations that must be interpreted. In this paper, we focus on this challenging problem and release new benchmarks in eight languages. We present a distant supervision approach to collect training data. We also demonstrate that direct answers (i.e., with polar keywords) are useful to train models to interpret indirect answers (i.e., without polar keywords). Experimental results demonstrate that monolingual fine-tuning is beneficial if training data can be obtained via distant supervision for the language of interest (5 languages). Additionally, we show that cross-lingual fine-tuning is always beneficial (8 languages).

* Accepted to EMNLP 2023 Findings

Via

Access Paper or Ask Questions

ByteCover3: Accurate Cover Song Identification on Short Queries

Mar 21, 2023

Xingjian Du, Zijie Wang, Xia Liang, Huidong Liang, Bilei Zhu, Zejun Ma

Abstract:Deep learning based methods have become a paradigm for cover song identification (CSI) in recent years, where the ByteCover systems have achieved state-of-the-art results on all the mainstream datasets of CSI. However, with the burgeon of short videos, many real-world applications require matching short music excerpts to full-length music tracks in the database, which is still under-explored and waiting for an industrial-level solution. In this paper, we upgrade the previous ByteCover systems to ByteCover3 that utilizes local features to further improve the identification performance of short music queries. ByteCover3 is designed with a local alignment loss (LAL) module and a two-stage feature retrieval pipeline, allowing the system to perform CSI in a more precise and efficient way. We evaluated ByteCover3 on multiple datasets with different benchmark settings, where ByteCover3 beat all the compared methods including its previous versions.

* Accepeted by ICASSP 2023

Via

Access Paper or Ask Questions

Look Before You Leap: Improving Text-based Person Retrieval by Learning A Consistent Cross-modal Common Manifold

Sep 13, 2022

Zijie Wang, Aichun Zhu, Jingyi Xue, Xili Wan, Chao Liu, Tian Wang, Yifeng Li

Figure 1 for Look Before You Leap: Improving Text-based Person Retrieval by Learning A Consistent Cross-modal Common Manifold

Figure 2 for Look Before You Leap: Improving Text-based Person Retrieval by Learning A Consistent Cross-modal Common Manifold

Figure 3 for Look Before You Leap: Improving Text-based Person Retrieval by Learning A Consistent Cross-modal Common Manifold

Figure 4 for Look Before You Leap: Improving Text-based Person Retrieval by Learning A Consistent Cross-modal Common Manifold

Abstract:The core problem of text-based person retrieval is how to bridge the heterogeneous gap between multi-modal data. Many previous approaches contrive to learning a latent common manifold mapping paradigm following a \textbf{cross-modal distribution consensus prediction (CDCP)} manner. When mapping features from distribution of one certain modality into the common manifold, feature distribution of the opposite modality is completely invisible. That is to say, how to achieve a cross-modal distribution consensus so as to embed and align the multi-modal features in a constructed cross-modal common manifold all depends on the experience of the model itself, instead of the actual situation. With such methods, it is inevitable that the multi-modal data can not be well aligned in the common manifold, which finally leads to a sub-optimal retrieval performance. To overcome this \textbf{CDCP dilemma}, we propose a novel algorithm termed LBUL to learn a Consistent Cross-modal Common Manifold (C$^{3}$M) for text-based person retrieval. The core idea of our method, just as a Chinese saying goes, is to `\textit{san si er hou xing}', namely, to \textbf{Look Before yoU Leap (LBUL)}. The common manifold mapping mechanism of LBUL contains a looking step and a leaping step. Compared to CDCP-based methods, LBUL considers distribution characteristics of both the visual and textual modalities before embedding data from one certain modality into C$^{3}$M to achieve a more solid cross-modal distribution consensus, and hence achieve a superior retrieval accuracy. We evaluate our proposed method on two text-based person retrieval datasets CUHK-PEDES and RSTPReid. Experimental results demonstrate that the proposed LBUL outperforms previous methods and achieves the state-of-the-art performance.

* Accepted on ACM MM '22. arXiv admin note: text overlap with arXiv:2209.05773

Via

Access Paper or Ask Questions

CAIBC: Capturing All-round Information Beyond Color for Text-based Person Retrieval

Sep 13, 2022

Zijie Wang, Aichun Zhu, Jingyi Xue, Xili Wan, Chao Liu, Tian Wang, Yifeng Li

Figure 1 for CAIBC: Capturing All-round Information Beyond Color for Text-based Person Retrieval

Figure 2 for CAIBC: Capturing All-round Information Beyond Color for Text-based Person Retrieval

Figure 3 for CAIBC: Capturing All-round Information Beyond Color for Text-based Person Retrieval

Figure 4 for CAIBC: Capturing All-round Information Beyond Color for Text-based Person Retrieval

Abstract:Given a natural language description, text-based person retrieval aims to identify images of a target person from a large-scale person image database. Existing methods generally face a \textbf{color over-reliance problem}, which means that the models rely heavily on color information when matching cross-modal data. Indeed, color information is an important decision-making accordance for retrieval, but the over-reliance on color would distract the model from other key clues (e.g. texture information, structural information, etc.), and thereby lead to a sub-optimal retrieval performance. To solve this problem, in this paper, we propose to \textbf{C}apture \textbf{A}ll-round \textbf{I}nformation \textbf{B}eyond \textbf{C}olor (\textbf{CAIBC}) via a jointly optimized multi-branch architecture for text-based person retrieval. CAIBC contains three branches including an RGB branch, a grayscale (GRS) branch and a color (CLR) branch. Besides, with the aim of making full use of all-round information in a balanced and effective way, a mutual learning mechanism is employed to enable the three branches which attend to varied aspects of information to communicate with and learn from each other. Extensive experimental analysis is carried out to evaluate our proposed CAIBC method on the CUHK-PEDES and RSTPReid datasets in both \textbf{supervised} and \textbf{weakly supervised} text-based person retrieval settings, which demonstrates that CAIBC significantly outperforms existing methods and achieves the state-of-the-art performance on all the three tasks.

* Accepted on ACM MM '22

Via

Access Paper or Ask Questions

Benchmark of DNN Model Search at Deployment Time

Jun 01, 2022

Lixi Zhou, Arindam Jain, Zijie Wang, Amitabh Das, Yingzhen Yang, Jia Zou

Figure 1 for Benchmark of DNN Model Search at Deployment Time

Figure 2 for Benchmark of DNN Model Search at Deployment Time

Figure 3 for Benchmark of DNN Model Search at Deployment Time

Figure 4 for Benchmark of DNN Model Search at Deployment Time

Abstract:Deep learning has become the most popular direction in machine learning and artificial intelligence. However, the preparation of training data, as well as model training, are often time-consuming and become the bottleneck of the end-to-end machine learning lifecycle. Reusing models for inferring a dataset can avoid the costs of retraining. However, when there are multiple candidate models, it is challenging to discover the right model for reuse. Although there exist a number of model sharing platforms such as ModelDB, TensorFlow Hub, PyTorch Hub, and DLHub, most of these systems require model uploaders to manually specify the details of each model and model downloaders to screen keyword search results for selecting a model. We are lacking a highly productive model search tool that selects models for deployment without the need for any manual inspection and/or labeled data from the target domain. This paper proposes multiple model search strategies including various similarity-based approaches and non-similarity-based approaches. We design, implement, and evaluate these approaches on multiple model inference scenarios, including activity recognition, image recognition, text classification, natural language processing, and entity matching. The experimental evaluation showed that our proposed asymmetric similarity-based measurement, adaptivity, outperformed symmetric similarity-based measurements and non-similarity-based measurements in most of the workloads.

* arXiv admin note: text overlap with arXiv:2010.09474

Via

Access Paper or Ask Questions

DSSL: Deep Surroundings-person Separation Learning for Text-based Person Retrieval

Sep 12, 2021

Aichun Zhu, Zijie Wang, Yifeng Li, Xili Wan, Jing Jin, Tian Wang, Fangqiang Hu, Gang Hua

Figure 1 for DSSL: Deep Surroundings-person Separation Learning for Text-based Person Retrieval

Figure 2 for DSSL: Deep Surroundings-person Separation Learning for Text-based Person Retrieval

Figure 3 for DSSL: Deep Surroundings-person Separation Learning for Text-based Person Retrieval

Figure 4 for DSSL: Deep Surroundings-person Separation Learning for Text-based Person Retrieval

Abstract:Many previous methods on text-based person retrieval tasks are devoted to learning a latent common space mapping, with the purpose of extracting modality-invariant features from both visual and textual modality. Nevertheless, due to the complexity of high-dimensional data, the unconstrained mapping paradigms are not able to properly catch discriminative clues about the corresponding person while drop the misaligned information. Intuitively, the information contained in visual data can be divided into person information (PI) and surroundings information (SI), which are mutually exclusive from each other. To this end, we propose a novel Deep Surroundings-person Separation Learning (DSSL) model in this paper to effectively extract and match person information, and hence achieve a superior retrieval accuracy. A surroundings-person separation and fusion mechanism plays the key role to realize an accurate and effective surroundings-person separation under a mutually exclusion constraint. In order to adequately utilize multi-modal and multi-granular information for a higher retrieval accuracy, five diverse alignment paradigms are adopted. Extensive experiments are carried out to evaluate the proposed DSSL on CUHK-PEDES, which is currently the only accessible dataset for text-base person retrieval task. DSSL achieves the state-of-the-art performance on CUHK-PEDES. To properly evaluate our proposed DSSL in the real scenarios, a Real Scenarios Text-based Person Reidentification (RSTPReid) dataset is constructed to benefit future research on text-based person retrieval, which will be publicly available.

* Accepted by ACM MM'21

Via

Access Paper or Ask Questions

Survive the Schema Changes: Integration of Unmanaged Data Using Deep Learning

Oct 15, 2020

Zijie Wang, Lixi Zhou, Amitabh Das, Valay Dave, Zhanpeng Jin, Jia Zou

Figure 1 for Survive the Schema Changes: Integration of Unmanaged Data Using Deep Learning

Figure 2 for Survive the Schema Changes: Integration of Unmanaged Data Using Deep Learning

Figure 3 for Survive the Schema Changes: Integration of Unmanaged Data Using Deep Learning

Figure 4 for Survive the Schema Changes: Integration of Unmanaged Data Using Deep Learning

Abstract:Data is the king in the age of AI. However data integration is often a laborious task that is hard to automate. Schema change is one significant obstacle to the automation of the end-to-end data integration process. Although there exist mechanisms such as query discovery and schema modification language to handle the problem, these approaches can only work with the assumption that the schema is maintained by a database. However, we observe diversified schema changes in heterogeneous data and open data, most of which has no schema defined. In this work, we propose to use deep learning to automatically deal with schema changes through a super cell representation and automatic injection of perturbations to the training data to make the model robust to schema changes. Our experimental results demonstrate that our proposed approach is effective for two real-world data integration scenarios: coronavirus data integration, and machine log integration.

* In submission

Via

Access Paper or Ask Questions

It's the Best Only When It Fits You Most: Finding Related Models for Serving Based on Dynamic Locality Sensitive Hashing

Oct 13, 2020

Lixi Zhou, Zijie Wang, Amitabh Das, Jia Zou

Figure 1 for It's the Best Only When It Fits You Most: Finding Related Models for Serving Based on Dynamic Locality Sensitive Hashing

Figure 2 for It's the Best Only When It Fits You Most: Finding Related Models for Serving Based on Dynamic Locality Sensitive Hashing

Figure 3 for It's the Best Only When It Fits You Most: Finding Related Models for Serving Based on Dynamic Locality Sensitive Hashing

Figure 4 for It's the Best Only When It Fits You Most: Finding Related Models for Serving Based on Dynamic Locality Sensitive Hashing

Abstract:In recent, deep learning has become the most popular direction in machine learning and artificial intelligence. However, preparation of training data is often a bottleneck in the lifecycle of deploying a deep learning model for production or research. Reusing models for inferencing a dataset can greatly save the human costs required for training data creation. Although there exist a number of model sharing platform such as TensorFlow Hub, PyTorch Hub, DLHub, most of these systems require model uploaders to manually specify the details of each model and model downloaders to screen keyword search results for selecting a model. They are in lack of an automatic model searching tool. This paper proposes an end-to-end process of searching related models for serving based on the similarity of the target dataset and the training datasets of the available models. While there exist many similarity measurements, we study how to efficiently apply these metrics without pair-wise comparison and compare the effectiveness of these metrics. We find that our proposed adaptivity measurement which is based on Jensen-Shannon (JS) divergence, is an effective measurement, and its computation can be significantly accelerated by using the technique of locality sensitive hashing.

Via

Access Paper or Ask Questions