Abstract:Intrinsic Image Decomposition (IID) separates an image into albedo and shading components. It is a core step in many real-world applications, such as relighting and material editing. Existing IID models achieve good results, but often use a large number of parameters. This makes them costly to combine with other models in real-world settings. To address this problem, we propose a flow matching-based solution. For this, we design a novel architecture, FlowIID, based on latent flow matching. FlowIID combines a VAE-guided latent space with a flow matching module, enabling a stable decomposition of albedo and shading. FlowIID is not only parameter-efficient, but also produces results in a single inference step. Despite its compact design, FlowIID delivers competitive and superior results compared to existing models across various benchmarks. This makes it well-suited for deployment in resource-constrained and real-time vision applications.
Abstract:Point cloud segmentation and classification are some of the primary tasks in 3D computer vision with applications ranging from augmented reality to robotics. However, processing point clouds using deep learning-based algorithms is quite challenging due to the irregular point formats. Voxelization or 3D grid-based representation are different ways of applying deep neural networks to this problem. In this paper, we propose PointResNet, a residual block-based approach. Our model directly processes the 3D points, using a deep neural network for the segmentation and classification tasks. The main components of the architecture are: 1) residual blocks and 2) multi-layered perceptron (MLP). We show that it preserves profound features and structural information, which are useful for segmentation and classification tasks. The experimental evaluations demonstrate that the proposed model produces the best results for segmentation and comparable results for classification in comparison to the conventional baselines.