Abstract:Interior design is a complex and creative discipline involving aesthetics, functionality, ergonomics, and materials science. Effective solutions must meet diverse requirements and typically produce multiple deliverables, such as renderings and design drawings from various perspectives. Consequently, interior design processes are often inefficient and demand significant creativity. With advances in machine learning, generative models have emerged as a promising means of improving efficiency by creating designs from text descriptions or sketches. However, few generative works focus on interior design, leading to substantial discrepancies between outputs and practical needs, such as mismatched sizes, incorrect spatial scope, and a lack of control over generation quality. To address these challenges, we propose DiffDesign, a controllable diffusion model with meta priors for efficient interior design generation. Specifically, we utilize the generative priors of a 2D diffusion model pre-trained on a large image dataset as our rendering backbone. We further guide the denoising process by disentangling cross-attention control over design attributes, such as appearance, pose, and size, and introduce an optimal transport-based alignment module to enforce view consistency. We also construct an interior design-specific dataset, DesignHelper, consisting of over 400 solutions spanning more than 15 spatial types and 15 design styles, which is used to fine-tune DiffDesign. Extensive experiments on various benchmark datasets demonstrate the effectiveness and robustness of DiffDesign.
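To make the attribute-disentangled guidance concrete, the sketch below shows one way a denoising step could apply separate cross-attention over per-attribute text embeddings. This is a minimal illustration under our own assumptions (hypothetical projection matrices, embedding shapes, and guidance weights), not DiffDesign's actual implementation.

```python
# Minimal sketch: disentangled cross-attention over separate attribute embeddings.
# All names, shapes, and weights here are hypothetical.
import torch
import torch.nn.functional as F

def disentangled_cross_attention(latent, attr_embeds, attr_weights, w_q, w_k, w_v):
    """latent: (B, N, D) image tokens; attr_embeds: dict of (B, M, D) text tokens."""
    q = latent @ w_q                                   # project image tokens to queries
    out = torch.zeros_like(latent)
    for name, emb in attr_embeds.items():
        k, v = emb @ w_k, emb @ w_v                    # per-attribute keys and values
        attn = F.softmax(q @ k.transpose(1, 2) / q.shape[-1] ** 0.5, dim=-1)
        out = out + attr_weights[name] * (attn @ v)    # weighted per-attribute control
    return out

B, N, M, D = 1, 64, 8, 32
w_q, w_k, w_v = (torch.randn(D, D) * D ** -0.5 for _ in range(3))
latent = torch.randn(B, N, D)                          # noisy image tokens at one step
attrs = {a: torch.randn(B, M, D) for a in ("appearance", "pose", "size")}
weights = {"appearance": 1.0, "pose": 0.5, "size": 0.5}  # guidance strength per attribute
ctrl = disentangled_cross_attention(latent, attrs, weights, w_q, w_k, w_v)
```

Separating the attention per attribute is what allows each design property to be scaled or swapped independently during denoising.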
Abstract:End-to-end vision-based autonomous driving has achieved impressive success, but safety remains a major concern. In low-dimensional settings, the safe control problem has been addressed using safety filters, e.g., those based on control barrier functions. Designing safety filters for vision-based controllers in the high-dimensional setting of autonomous driving could similarly alleviate the safety problem, but is significantly more challenging. In this paper, we address this challenge by using frozen pre-trained vision representation models as perception backbones for vision-based safety filters, inspired by these models' success as backbones of robotic control policies. We empirically evaluate the offline performance of four common pre-trained vision models in this context. Because the dynamics over the representation spaces are unknown, we evaluate three existing methods for training safety filters for black-box dynamics. We use the DeepAccident dataset, which consists of action-annotated videos from multiple cameras on vehicles in CARLA simulating real accident scenarios. Our results show that the filters resulting from our approach are competitive with filters that are given the ground-truth state of the ego vehicle and its environment.
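As a minimal illustration of this setup, the sketch below freezes a standard vision backbone and trains only a small head as a learned safety filter over its representation. The specific backbone (a torchvision ResNet-18) and the MLP head are our assumptions for exposition, not the four models or three filter-training methods evaluated in the paper.

```python
# Sketch, assuming a frozen ResNet-18 as the perception backbone and a small
# MLP as a learned safety filter over its features; illustrative only.
import torch
import torch.nn as nn
from torchvision.models import resnet18

backbone = resnet18(weights=None)       # in practice: pre-trained weights, kept frozen
backbone.fc = nn.Identity()             # expose the 512-d representation
for p in backbone.parameters():
    p.requires_grad = False

safety_head = nn.Sequential(            # h(z) > 0 read as "predicted safe"
    nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 1)
)

images = torch.randn(4, 3, 224, 224)    # batch of camera frames
with torch.no_grad():
    z = backbone(images)                # representation; dynamics over z are unknown
h = safety_head(z)                      # filter value per frame, trained offline
```

Only `safety_head` would be trained, which is what makes the frozen-backbone approach cheap relative to end-to-end training.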
Abstract:In this paper, we propose a movable antenna (MA)-enhanced scheme for a wireless-powered mobile edge computing (WP-MEC) system, in which a hybrid access point (HAP) equipped with multiple MAs first emits wireless energy to charge wireless devices (WDs) and then receives the tasks offloaded by the WDs for edge computing. The MAs deployed at the HAP enhance the spatial degrees of freedom (DoFs) by flexibly adjusting their positions within an available region, thereby improving the efficiency of both downlink wireless power transfer (WPT) and uplink task offloading. To balance performance enhancement against implementation complexity, we further propose three MA positioning configurations, i.e., dynamic, semi-dynamic, and static MA positioning. In addition, the non-linear power conversion of the energy harvesting (EH) circuits at the WDs and the finite computing capability of the edge server are taken into account. Our objective is to maximize the sum computational rate (SCR) by jointly optimizing the time allocation, the positions of the MAs, the energy beamforming matrix, the receive combining vectors, and the offloading strategies of the WDs. To solve the resulting non-convex problems, we propose efficient alternating optimization (AO) frameworks. Moreover, we propose a hybrid algorithm of particle swarm optimization with variable local search (PSO-VLS) to solve the sub-problem of MA positioning. Numerical results validate the superiority of MAs over fixed-position antennas (FPAs) in enhancing the SCR performance of WP-MEC systems.
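To illustrate the flavor of PSO-VLS on the MA positioning sub-problem, here is a hedged sketch with a stand-in objective; the true objective would be the SCR of the corresponding AO sub-problem, and all hyperparameters, the dimensionality, and the positioning region are invented for this example.

```python
# Hypothetical sketch of PSO with variable local search (PSO-VLS); the
# objective is a placeholder for the paper's SCR expression.
import numpy as np

rng = np.random.default_rng(0)

def objective(pos):                     # stand-in for the SCR sub-problem objective
    return -np.sum((pos - 0.3) ** 2)

def pso_vls(n_particles=20, n_dims=4, iters=100, region=(0.0, 1.0)):
    lo, hi = region
    x = rng.uniform(lo, hi, (n_particles, n_dims))      # candidate MA positions
    v = np.zeros_like(x)
    pbest, pval = x.copy(), np.array([objective(p) for p in x])
    gbest = pbest[pval.argmax()].copy()
    for t in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)                      # keep MAs in the region
        # variable local search: perturbation radius shrinks over iterations
        radius = 0.1 * (1 - t / iters)
        cand = np.clip(gbest + rng.uniform(-radius, radius, n_dims), lo, hi)
        if objective(cand) > objective(gbest):
            gbest = cand
        for i, p in enumerate(x):                       # update personal bests
            f = objective(p)
            if f > pval[i]:
                pbest[i], pval[i] = p.copy(), f
        if pval.max() > objective(gbest):               # update global best
            gbest = pbest[pval.argmax()].copy()
    return gbest, objective(gbest)

best_pos, best_val = pso_vls()
```

The shrinking local-search radius is the "variable" component: broad exploration early on, fine refinement of the incumbent positions later.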
Abstract:Micro-expressions (MEs) are involuntary movements that reveal people's hidden feelings, and they have attracted considerable interest for their objectivity in emotion detection. However, despite its wide applicability, micro-expression recognition (MER) remains challenging in practice for three reasons: (i) at the data level, a lack of data and imbalanced classes; (ii) at the feature level, the subtle, rapidly changing, and complex features of MEs; and (iii) at the decision-making level, the impact of individual differences. To address these issues, we propose a dual-branch meta-auxiliary learning method, called LightmanNet, for fast and robust micro-expression recognition. Specifically, LightmanNet learns general MER knowledge from limited data through a dual-branch bi-level optimization process. (i) In the first level, it obtains task-specific MER knowledge by learning in two branches: the first branch learns MER features via primary MER tasks, while the other guides the model to obtain discriminative features via auxiliary tasks, i.e., image alignment between micro-expressions and macro-expressions, which resemble each other in both spatial and temporal behavioral patterns. The two branches jointly constrain the model to learn meaningful task-specific MER knowledge while avoiding noise or superficial connections between MEs and emotions that may damage generalization. (ii) In the second level, LightmanNet further refines the learned task-specific knowledge, improving model generalization and efficiency. Extensive experiments on various benchmark datasets demonstrate the superior robustness and efficiency of LightmanNet.
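The bi-level structure described above can be sketched as follows, with toy stand-in losses for the primary MER branch and the auxiliary micro/macro-expression alignment branch; the parameterization, learning rates, and loss weight are invented for illustration and do not reflect LightmanNet's architecture.

```python
# Schematic sketch of a dual-branch bi-level update with toy losses.
import torch

theta = torch.randn(8, requires_grad=True)           # shared parameters

def primary_loss(p):   return (p ** 2).sum()         # stand-in MER loss
def auxiliary_loss(p): return ((p - 1) ** 2).sum()   # stand-in alignment loss

inner_lr, outer_lr, lam = 0.1, 0.01, 0.5

# level 1: task-specific adaptation jointly constrained by both branches
task_loss = primary_loss(theta) + lam * auxiliary_loss(theta)
grad = torch.autograd.grad(task_loss, theta, create_graph=True)[0]
theta_task = theta - inner_lr * grad                 # adapted (task-specific) parameters

# level 2: refine the shared knowledge through the adapted parameters
meta_loss = primary_loss(theta_task)
theta.grad = torch.autograd.grad(meta_loss, theta)[0]
with torch.no_grad():
    theta -= outer_lr * theta.grad                   # meta-update of shared parameters
```

The key point is that the auxiliary alignment term only shapes the inner adaptation, while the outer step optimizes how well the adapted model performs on the primary task.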
Abstract:Artificial intelligence technology has already had a profound impact on fields such as the economy, industry, and education, yet its impact is still limited. Meta-learning, also known as "learning to learn", offers a path toward general artificial intelligence that could break through the current AI bottleneck. However, meta-learning research started relatively late, and there are fewer projects than in fields such as CV and NLP. Each deployment requires considerable experience to configure the environment and to debug or even rewrite code, and the existing frameworks are isolated from one another. Moreover, few platforms currently focus exclusively on meta-learning or provide learning materials for novices, so the entry threshold is relatively high. To address these problems, we propose Awesome-META+, a meta-learning framework integration and learning platform that provides a complete and reliable environment for applying and learning meta-learning. The project aims to promote the development of meta-learning and the growth of its community, with functions including, but not limited to: 1) complete and reliable meta-learning frameworks that adapt to multi-field tasks such as object detection, image classification, and reinforcement learning; 2) a convenient and simple model deployment scheme that offers accessible meta-learning transfer and usage methods to lower the threshold of meta-learning and improve efficiency; 3) comprehensive research materials for learning; and 4) objective and credible performance analysis and insights.
Abstract:The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practice or about the bottlenecks the community faces in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive for participation (70%), while prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for it, and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based, and of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once; this was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
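As a toy illustration of the patch-based strategy most respondents used for oversized samples, the snippet below tiles a large 2D array into fixed-size training patches; the array size, patch size, and stride are made up.

```python
# Toy illustration of patch-based training data preparation for samples
# too large to process at once; sizes here are arbitrary.
import numpy as np

volume = np.random.rand(512, 512)          # an image too large for one forward pass
patch, stride = 128, 128
patches = [
    volume[i:i + patch, j:j + patch]
    for i in range(0, volume.shape[0] - patch + 1, stride)
    for j in range(0, volume.shape[1] - patch + 1, stride)
]                                           # 16 patches of 128x128 for training
```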
Abstract:Context-aware decision support in the operating room can foster surgical safety and efficiency by leveraging real-time feedback from surgical workflow analysis. Most existing works recognize surgical activities at a coarse-grained level, such as phases, steps, or events, leaving out the fine-grained interaction details of the surgical activity; yet these are needed for more helpful AI assistance in the operating room. Recognizing surgical actions as triplets of <instrument, verb, target> combinations delivers comprehensive details about the activities taking place in surgical videos. This paper presents CholecTriplet2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos. The challenge granted private access to the large-scale CholecT50 dataset, which is annotated with action triplet information. In this paper, we present the challenge setup and an assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge. A total of 4 baseline methods from the challenge organizers and 19 new deep learning algorithms from competing teams are presented, recognizing surgical action triplets directly from surgical videos with mean average precision (mAP) ranging from 4.2% to 38.1%. This study also analyzes the significance of the obtained results, performs a thorough methodological comparison of the approaches and an in-depth result analysis, and proposes a novel ensemble method for enhanced recognition. Our analysis shows that surgical workflow analysis is not yet solved and highlights promising directions for future research on fine-grained surgical activity recognition, which is of utmost importance for the development of AI in surgery.
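For readers unfamiliar with the evaluation, here is a hedged sketch of score-level ensembling and per-class mAP over triplet predictions, in the spirit of the ensemble analysis described above. The scores and labels are random placeholders, the two "models" are hypothetical, and this generic sklearn-based mAP is not the challenge's official evaluation code (the 100-class count follows CholecT50's public description).

```python
# Illustrative sketch: score averaging across two models and per-class mAP.
import numpy as np
from sklearn.metrics import average_precision_score

n_frames, n_triplets = 50, 100                  # CholecT50 defines 100 triplet classes
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, (n_frames, n_triplets))   # multi-label ground truth (toy)
model_a = rng.random((n_frames, n_triplets))          # scores from one method
model_b = rng.random((n_frames, n_triplets))          # scores from another
ensemble = (model_a + model_b) / 2                    # simple late-fusion averaging

# mean average precision over triplet classes (skip classes with no positives)
aps = [average_precision_score(labels[:, c], ensemble[:, c])
       for c in range(n_triplets) if labels[:, c].any()]
mAP = float(np.mean(aps))
```

Averaging calibrated scores from heterogeneous models is the simplest form of the ensembling idea; the paper's novel ensemble method is more elaborate.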
Abstract:Since the World Health Organization declared the COVID-19 pandemic in March 2020, curbing the spread of the virus has become an international priority, and the pandemic has greatly affected people's lifestyles. In this article, we observe and analyze the impact of the pandemic on people's lives through changes in smartphone application usage. First, by observing the daily usage change trends of all users during the pandemic, we can understand and analyze how restrictive measures and policies affected people's lives; this can also help governments and health departments adopt more appropriate restrictive measures in future pandemics. Second, we define usage-change features and, by clustering users, identify 9 distinct usage-change patterns during the pandemic, revealing the diversity of daily usage changes. This helps to understand and analyze, in greater detail, the different impacts of the pandemic and restrictive measures on different types of people. Finally, using prediction models, we identify the main factors associated with each usage-change type based on user preferences and demographic information. This helps to predict changes in smartphone activity during future pandemics or when other restrictive measures are implemented, and may serve as a new indicator for judging and managing the risks of such measures or events.
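An illustrative sketch of the clustering step is shown below: users are grouped by usage-change features to surface distinct change patterns. The synthetic feature matrix, its dimensionality, and the choice of k-means are our assumptions; only the number of patterns (nine) comes from the abstract.

```python
# Sketch: clustering users by usage-change features into nine patterns.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
usage_change = rng.normal(size=(1000, 14))   # hypothetical change features per user
kmeans = KMeans(n_clusters=9, n_init=10, random_state=0).fit(usage_change)
pattern_of_user = kmeans.labels_             # each user assigned one of nine patterns
```

The cluster labels can then serve as the target variable for the prediction models that relate usage-change type to user preferences and demographics.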