Yusuke Uchida

2.5D U-Net with Depth Reduction for 3D CryoET Object Identification

Feb 19, 2025

Yusuke Uchida, Takaaki Fukui

Abstract: Cryo-electron tomography (cryoET) is a crucial technique for unveiling the structure of protein complexes. Automatically analyzing tomograms captured by cryoET is an essential step toward understanding cellular structures. In this paper, we introduce the 4th place solution from the CZII - CryoET Object Identification competition, which was organized to advance the development of automated tomogram analysis techniques. Our solution adopted a heatmap-based keypoint detection approach, utilizing an ensemble of two different types of 2.5D U-Net models with depth reduction. Despite its highly unified and simple architecture, our method achieved 4th place, demonstrating its effectiveness.
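
As a rough illustration of how a heatmap-based keypoint detector turns a predicted volume into object locations, the sketch below picks voxels that exceed a threshold and are not dominated by any of their 26 neighbors. It is a generic technique and not the authors' pipeline; the function name `heatmap_peaks`, the nested-list volume layout, and the threshold value are assumptions made for this example.

```python
from itertools import product


def heatmap_peaks(heatmap, threshold=0.5):
    """Return (z, y, x, score) for local maxima of a 3D heatmap.

    `heatmap` is a nested list indexed as heatmap[z][y][x] (an assumed layout).
    A voxel is kept if its score is >= threshold and no 26-connected
    neighbor has a strictly larger score. Flat plateaus yield several peaks.
    """
    depth, height, width = len(heatmap), len(heatmap[0]), len(heatmap[0][0])
    offsets = [o for o in product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]
    peaks = []
    for z, y, x in product(range(depth), range(height), range(width)):
        score = heatmap[z][y][x]
        if score < threshold:
            continue
        is_peak = True
        for dz, dy, dx in offsets:
            nz, ny, nx = z + dz, y + dy, x + dx
            inside = 0 <= nz < depth and 0 <= ny < height and 0 <= nx < width
            if inside and heatmap[nz][ny][nx] > score:
                is_peak = False
                break
        if is_peak:
            peaks.append((z, y, x, score))
    return peaks
```

In an ensemble setting, one plausible design is to average the heatmaps of the individual models before calling such a function, so that peak extraction runs only once.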

CLRerNet: Improving Confidence of Lane Detection with LaneIoU

May 15, 2023

Hiroto Honda, Yusuke Uchida

Abstract: Lane marker detection is a crucial component of autonomous driving and driver assistance systems. Modern deep lane detection methods with row-based lane representation exhibit excellent performance on lane detection benchmarks. Through preliminary oracle experiments, we first disentangle the lane representation components to determine the direction of our approach. We show that correct lane positions are already among the predictions of an existing row-based detector, and that confidence scores that accurately represent intersection-over-union (IoU) with ground truths are the most beneficial. Based on this finding, we propose LaneIoU, which better correlates with the metric by taking the local lane angles into consideration. We develop a novel detector coined CLRerNet featuring LaneIoU for the target assignment cost and loss functions, aiming at improved quality of confidence scores. Through careful and fair benchmarking, including cross validation, we demonstrate that CLRerNet outperforms the state of the art by a large margin, achieving an F1 score of 81.43% compared with 80.47% for the existing method on CULane, and 86.47% compared with 86.10% on CurveLanes.
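
To make the idea of an angle-aware, row-based IoU more concrete, here is a simplified sketch. It is not the paper's exact LaneIoU formulation: the function names, the finite-difference angle estimate, the default lane width, and the choice to clamp non-overlapping rows to zero intersection are all assumptions for illustration. Each lane is given as a list of x-coordinates sampled at the same equally spaced image rows.

```python
import math


def row_half_widths(xs, dy, width):
    """Horizontal half-width of a lane at each row, scaled by the local angle.

    A lane of constant thickness `width` covers a wider horizontal span
    where it is tilted. The tilt is estimated with finite differences.
    Requires at least two rows.
    """
    n = len(xs)
    half = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        dx = (xs[hi] - xs[lo]) / (hi - lo)  # horizontal change per row
        half.append(0.5 * width * math.sqrt(dx * dx + dy * dy) / dy)
    return half


def angle_aware_iou(xs_a, xs_b, dy=1.0, width=2.0):
    """Row-wise IoU of two lanes: summed overlap divided by summed union."""
    half_a = row_half_widths(xs_a, dy, width)
    half_b = row_half_widths(xs_b, dy, width)
    inter = union = 0.0
    for xa, xb, wa, wb in zip(xs_a, xs_b, half_a, half_b):
        overlap = min(xa + wa, xb + wb) - max(xa - wa, xb - wb)
        overlap = max(overlap, 0.0)  # assumption: no negative overlap
        inter += overlap
        union += 2.0 * wa + 2.0 * wb - overlap
    return inter / union
```

Compared with a fixed horizontal width, the angle-dependent scaling keeps the effective lane thickness constant for tilted or curved lanes, which is the intuition the abstract points to.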

End-to-End Monocular Vanishing Point Detection Exploiting Lane Annotations

Aug 31, 2021

Hiroto Honda, Motoki Kimura, Takumi Karasawa, Yusuke Uchida

Abstract: Vanishing points (VPs) play a vital role in various computer vision tasks, especially for recognizing 3D scenes from an image. In the real-world scenario of automobile applications, it is costly to manually obtain the external camera parameters when the camera is attached to the vehicle or the attachment is accidentally perturbed. In this paper, we introduce a simple but effective end-to-end vanishing point detection method. By automatically calculating the intersection of the extrapolated lane marker annotations, we obtain geometrically consistent VP labels and mitigate human annotation errors caused by manual VP labeling. With the calculated VP labels, we train an end-to-end VP Detector via heatmap estimation. The VP Detector realizes higher accuracy than the methods utilizing manual annotation or lane detection, paving the way for accurate online camera calibration.
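
One way to compute a VP label from lane annotations can be sketched as follows: fit a straight line to each annotated lane marker, then find the point that minimizes the summed squared perpendicular distance to all fitted lines. This is a minimal sketch under the assumption that lanes are roughly straight and not horizontal in the image; the function names and the least-squares formulation are illustrative and are not taken from the paper.

```python
def fit_line(points):
    """Least-squares fit of x = a * y + b to (x, y) lane points.

    Parameterizing x by y suits lane markers, which are rarely horizontal.
    """
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    syy = sum((p[1] - mean_y) ** 2 for p in points)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points)
    a = sxy / syy
    return a, mean_x - a * mean_y


def vanishing_point(lanes):
    """Least-squares intersection of the lines fitted to each lane.

    Each line is written as n . p = b with normal n = (1, -a). The normal
    equations of the perpendicular-distance problem form a 2x2 system.
    """
    s11 = s12 = s22 = t1 = t2 = 0.0
    for points in lanes:
        a, b = fit_line(points)
        norm2 = 1.0 + a * a
        nx, ny = 1.0, -a
        s11 += nx * nx / norm2
        s12 += nx * ny / norm2
        s22 += ny * ny / norm2
        t1 += nx * b / norm2
        t2 += ny * b / norm2
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("lanes are parallel in the image; no unique intersection")
    return (s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det
```

With more than two lanes, the least-squares solution averages out small annotation noise instead of depending on a single pair of lines.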

Prediction of Lane Number Using Results From Lane Detection

Dec 04, 2020

Panumate Chetprayoon, Fumihiko Takahashi, Yusuke Uchida

Abstract: The lane number that the vehicle is traveling in is a key factor in intelligent vehicle fields. Many lane detection algorithms have been proposed, and if lanes could be detected perfectly, the lane number could be calculated directly from the lane detection results. In practice, however, lane detection algorithms sometimes underperform. Therefore, we propose a new approach for predicting the lane number, where we combine the drive recorder image with the lane detection results to predict the lane number. Experiments on our own dataset confirmed that our approach delivered outstanding results without significantly increasing computational cost.

* GCCE 2020

Leveraging Temporal Joint Depths for Improving 3D Human Pose Estimation in Video

Nov 04, 2020

Naoki Kato, Hiroto Honda, Yusuke Uchida

Abstract: Approaches that predict 3D poses from 2D poses estimated in each frame of a video have proven effective for 3D human pose estimation. However, 2D poses without appearance information of persons are highly ambiguous with respect to the joint depths. In this paper, we propose to estimate a 3D pose in each frame of a video and refine it considering temporal information. The proposed approach reduces the ambiguity of the joint depths and improves the 3D pose estimation accuracy.

Improving Multi-Person Pose Estimation using Label Correction

Nov 08, 2018

Naoki Kato, Tianqi Li, Kohei Nishino, Yusuke Uchida

Abstract: Significant attention has recently been paid to multi-person pose estimation methods, as there has been rapid progress in the field owing to convolutional neural networks. In particular, a recent method that exploits part confidence maps and Part Affinity Fields (PAFs) has achieved accurate real-time prediction of multi-person keypoints. However, human annotated labels are sometimes inappropriate for learning models. For example, if there is a limb that extends outside an image, a keypoint for the limb may not have annotations because it exists outside of the image, and thus the labels for the limb cannot be generated. If a model is trained with data including such missing labels, the output of the model for the location, even though it is correct, is penalized as a false positive, which is likely to cause negative effects on the performance of the model. In this paper, we point out the existence of some patterns of inappropriate labels, and propose a novel method for correcting such labels with a teacher model trained on such incomplete data. Experiments on the COCO dataset show that training with the corrected labels improves the performance of the model and also speeds up training.

Full-body High-resolution Anime Generation with Progressive Structure-conditional Generative Adversarial Networks

Sep 06, 2018

Koichi Hamada, Kentaro Tachibana, Tianqi Li, Hiroto Honda, Yusuke Uchida

Abstract: We propose Progressive Structure-conditional Generative Adversarial Networks (PSGAN), a new framework that can generate full-body and high-resolution character images based on structural information. Recent progress in generative adversarial networks with progressive training has made it possible to generate high-resolution images. However, existing approaches have limitations in achieving both high image quality and structural consistency at the same time. Our method tackles the limitations by progressively increasing the resolution of both generated images and structural conditions during training. In this paper, we empirically demonstrate the effectiveness of this method by showing the comparison with existing approaches and video generation results of diverse anime characters at 1024x1024 based on target pose sequences. We also create a novel dataset containing full-body 1024x1024 high-resolution images and exact 2D pose keypoints using Unity 3D Avatar models.

* Accepted to ECCV 2018 Workshop: Computer Vision for Fashion, Art and Design. Project page is at https://dena.com/intl/anime-generation

Digital Watermarking for Deep Neural Networks

Feb 06, 2018

Yuki Nagai, Yusuke Uchida, Shigeyuki Sakazawa, Shin'ichi Satoh

Abstract: Although deep neural networks have made tremendous progress in the area of multimedia representation, training neural models requires a large amount of data and time. It is well-known that utilizing trained models as initial weights often achieves lower training error than neural networks that are not pre-trained. A fine-tuning step helps to reduce both the computational cost and improve performance. Therefore, sharing trained models has been very important for the rapid progress of research and development. In addition, trained models could be important assets for the owner(s) who trained them, hence we regard trained models as intellectual property. In this paper, we propose a digital watermarking technology for ownership authorization of deep neural networks. First, we formulate a new problem: embedding watermarks into deep neural networks. We also define requirements, embedding situations, and attack types on watermarking in deep neural networks. Second, we propose a general framework for embedding a watermark in model parameters, using a parameter regularizer. Our approach does not impair the performance of networks into which a watermark is placed because the watermark is embedded while training the host network. Finally, we perform comprehensive experiments to reveal the potential of watermarking deep neural networks as the basis of this new research effort. We show that our framework can embed a watermark during the training of a deep neural network from scratch, and during fine-tuning and distilling, without impairing its performance. The embedded watermark does not disappear even after fine-tuning or parameter pruning; the watermark remains complete even after 65% of parameters are pruned.

* This is a pre-print of an article published in International Journal of Multimedia Information Retrieval. The final authenticated version is available online at: https://doi.org/10.1007/s13735-018-0147-1 . arXiv admin note: substantial text overlap with arXiv:1701.04082

Embedding Watermarks into Deep Neural Networks

Apr 20, 2017

Yusuke Uchida, Yuki Nagai, Shigeyuki Sakazawa, Shin'ichi Satoh

Abstract: Deep neural networks have recently achieved significant progress. Sharing trained models of these deep neural networks is very important for the rapid progress of research and development of deep neural network systems. At the same time, it is necessary to protect the rights of shared trained models. To this end, we propose to use a digital watermarking technology to protect intellectual property or detect intellectual property infringement of trained models. First, we formulate a new problem: embedding watermarks into deep neural networks. We also define requirements, embedding situations, and attack types for watermarking deep neural networks. Second, we propose a general framework to embed a watermark into model parameters using a parameter regularizer. Our approach does not hurt the performance of networks into which a watermark is embedded. Finally, we perform comprehensive experiments to reveal the potential of watermarking deep neural networks as a basis of this new problem. We show that our framework can embed a watermark in the situations of training a network from scratch, fine-tuning, and distilling without hurting the performance of a deep neural network. The embedded watermark does not disappear even after fine-tuning or parameter pruning; the watermark remains complete even after 65% of parameters are pruned. The implementation of this research is available at: https://github.com/yu4u/dnn-watermark

* ICMR '17 Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval Pages 269-277
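
The idea of embedding a watermark through a parameter regularizer can be illustrated with a toy example. The sketch below treats a flat weight vector as the host, uses a random secret matrix as the key, and minimizes only a binary cross-entropy regularizer between the target bits and the sigmoid of the projected weights. In actual training such a term would be added to the task loss; here the task loss is omitted for brevity. The function names, the learning rate, and the step count are assumptions made for this example and are not taken from the linked repository.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def embed_watermark(weights, secret, bits, lr=0.01, steps=500):
    """Gradient descent on the embedding regularizer only (toy setting).

    Regularizer: sum_j BCE(bits[j], sigmoid(secret[j] . weights)).
    Its gradient with respect to the weights is sum_j (y_j - b_j) * secret[j].
    """
    w = list(weights)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for row, bit in zip(secret, bits):
            y = sigmoid(sum(x * wi for x, wi in zip(row, w)))
            for i, x in enumerate(row):
                grad[i] += (y - bit) * x
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def extract_watermark(weights, secret):
    """Recover the bits by thresholding the projected weights at zero."""
    return [
        1 if sum(x * wi for x, wi in zip(row, weights)) >= 0 else 0
        for row in secret
    ]
```

A usage pattern would be to draw `secret` once with a seeded `random.Random`, keep it private, call `embed_watermark` during training, and later call `extract_watermark` on a suspect model's weights to compare the recovered bits with the original ones.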

Adaptive Substring Extraction and Modified Local NBNN Scoring for Binary Feature-based Local Mobile Visual Search without False Positives

Oct 20, 2016

Yusuke Uchida, Shigeyuki Sakazawa, Shin'ichi Satoh

Abstract: In this paper, we propose a stand-alone mobile visual search system based on binary features and the bag-of-visual words framework. The contribution of this study is three-fold: (1) We propose an adaptive substring extraction method that adaptively extracts informative bits from the original binary vector and stores them in the inverted index. These substrings are used to refine visual word-based matching. (2) A modified local NBNN scoring method is proposed in the context of image retrieval, which considers the density of binary features in scoring each feature matching. (3) In order to suppress false positives, we introduce a convexity check step that imposes a convexity constraint on the configuration of a transformed reference image. The proposed system improves retrieval accuracy by 11% compared with a conventional method without increasing the database size. Furthermore, our system with the convexity check does not lead to false positive results.
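
A convexity check of the kind described in contribution (3) could look like the following sketch: project the four corners of the reference image with the estimated transformation and accept the match only if the resulting quadrilateral is convex. The assumption that the transformation is a 3x3 homography given as nested lists, as well as the function names, are illustrative choices and not details confirmed by the abstract.

```python
def project(h, point):
    """Apply a 3x3 homography `h` (nested lists) to a 2D point.

    Raises ZeroDivisionError if the point maps to infinity.
    """
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    px = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    py = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return px, py


def is_convex(polygon):
    """True if consecutive edges always turn in the same direction."""
    n = len(polygon)
    sign = 0
    for i in range(n):
        ax, ay = polygon[i]
        bx, by = polygon[(i + 1) % n]
        cx, cy = polygon[(i + 2) % n]
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        if cross == 0:
            return False  # degenerate: collinear or repeated corners
        current = 1 if cross > 0 else -1
        if sign == 0:
            sign = current
        elif current != sign:
            return False
    return True


def passes_convexity_check(h, width, height):
    """Project the reference image corners and test the quadrilateral."""
    corners = [(0, 0), (width, 0), (width, height), (0, height)]
    return is_convex([project(h, c) for c in corners])
```

A geometrically implausible transformation, such as one that folds the image over itself, produces a self-intersecting or concave quadrilateral and is rejected, which is how such a check can suppress false positives.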
