Updated on 2023/12/06

OHTANI Kento
 
Organization
Graduate School of Informatics, Department of Intelligent Systems   Designated assistant professor
Title
Designated assistant professor

Degree 1

  1. Ph.D. (Information Science) ( 2018.3   Nagoya University ) 

 

Papers 17

  1. L-DIG: A GAN-Based Method for LiDAR Point Cloud Processing under Snow Driving Conditions.

    Zhang Y, Ding M, Yang H, Niu Y, Feng Y, Ohtani K, Takeda K

    Sensors (Basel, Switzerland)   Vol. 23 ( 21 )   2023.10

    Language:English   Publisher:Sensors (Basel, Switzerland)  

    LiDAR point clouds are significantly impacted by snow in driving scenarios, introducing scattered noise points and phantom objects, thereby compromising the perception capabilities of autonomous driving systems. Current effective methods for removing snow from point clouds largely rely on outlier filters, which mechanically eliminate isolated points. This research proposes a novel translation model for LiDAR point clouds, the 'L-DIG' (LiDAR depth images GAN), built upon refined generative adversarial networks (GANs). This model not only has the capacity to reduce snow noise from point clouds, but it also can artificially synthesize snow points onto clear data. The model is trained using depth image representations of point clouds derived from unpaired datasets, complemented by customized loss functions for depth images to ensure scale and structure consistencies. To amplify the efficacy of snow capture, particularly in the region surrounding the ego vehicle, we have developed a pixel-attention discriminator that operates without downsampling convolutional layers. Concurrently, the other discriminator equipped with two-step downsampling convolutional layers has been engineered to effectively handle snow clusters. This dual-discriminator approach ensures robust and comprehensive performance in tackling diverse snow conditions. The proposed model displays a superior ability to capture snow and object features within LiDAR point clouds. A 3D clustering algorithm is employed to adaptively evaluate different levels of snow conditions, including scattered snowfall and snow swirls. Experimental findings demonstrate an evident de-snowing effect, and the ability to synthesize snow effects.

    DOI: 10.3390/s23218660

    Scopus

    PubMed
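
    The dual-discriminator design described above is easy to picture in code. Below is a minimal PyTorch sketch of the idea, with layer widths and the 64x1024 depth-image shape assumed for illustration; it is not the authors' released implementation.

```python
import torch
import torch.nn as nn

class PixelAttentionDiscriminator(nn.Module):
    """Per-pixel real/fake scores with no downsampling, so fine snow
    points near the ego vehicle are judged at full resolution."""
    def __init__(self, in_ch=1, width=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, stride=1, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(width, width, 3, stride=1, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(width, 1, 1),  # one score per pixel
        )
    def forward(self, depth_img):
        return self.net(depth_img)

class DownsamplingDiscriminator(nn.Module):
    """Two stride-2 convolutions widen the receptive field so larger
    snow clusters are judged as coherent regions."""
    def __init__(self, in_ch=1, width=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, width, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(width, 2 * width, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(2 * width, 1, 3, padding=1),
        )
    def forward(self, depth_img):
        return self.net(depth_img)

# A LiDAR sweep rendered as a 64x1024 depth image (assumed shape).
x = torch.randn(1, 1, 64, 1024)
print(PixelAttentionDiscriminator()(x).shape)  # torch.Size([1, 1, 64, 1024])
print(DownsamplingDiscriminator()(x).shape)    # torch.Size([1, 1, 16, 256])
```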

  2. Learning to Predict Navigational Patterns From Partial Observations

    Karlsson, R; Carballo, A; Lepe-Salazar, F; Fujii, K; Ohtani, K; Takeda, K

    IEEE ROBOTICS AND AUTOMATION LETTERS   Vol. 8 ( 9 ) page: 5592 - 5599   2023.9

    Publisher:IEEE Robotics and Automation Letters  

    Human beings cooperatively navigate rule-constrained environments by adhering to mutually known navigational patterns, which may be represented as directional pathways or road lanes. Inferring these navigational patterns from incompletely observed environments is required for intelligent mobile robots operating in unmapped locations. However, algorithmically defining these navigational patterns is nontrivial. This letter presents the first self-supervised learning (SSL) method for learning to infer navigational patterns in real-world environments from partial observations only. We explain how geometric data augmentation, predictive world modeling, and an information-theoretic regularizer enable our model to predict an unbiased local directional soft lane probability (DSLP) field in the limit of infinite data. We demonstrate how to infer global navigational patterns by fitting a maximum likelihood graph to the DSLP field. Experiments show that our SSL model outperforms two SOTA supervised lane graph prediction models on the nuScenes dataset. We propose our SSL method as a scalable and interpretable continual learning paradigm for navigation by perception.

    DOI: 10.1109/LRA.2023.3291924

    Web of Science

    Scopus
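
    As a rough illustration of what a directional soft lane probability (DSLP) field is, the sketch below implements one plausible output head: a per-cell lane probability plus a per-cell unit direction vector. The feature dimensions are assumed, and this is only one reading of the representation, not the paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DSLPHead(nn.Module):
    """One way to realize a 'directional soft lane probability' field:
    a soft lane probability and a unit direction vector per cell."""
    def __init__(self, feat_ch=128):
        super().__init__()
        self.prob = nn.Conv2d(feat_ch, 1, 1)  # lane probability logit
        self.dir = nn.Conv2d(feat_ch, 2, 1)   # (dx, dy) direction field

    def forward(self, feats):
        p = torch.sigmoid(self.prob(feats))      # [B,1,H,W] in (0,1)
        d = F.normalize(self.dir(feats), dim=1)  # unit vectors [B,2,H,W]
        return p, d

feats = torch.randn(1, 128, 64, 64)  # assumed backbone features
p, d = DSLPHead()(feats)
print(p.shape, d.shape)  # [1,1,64,64] [1,2,64,64]
```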

  3. Synthesizing Realistic Snow Effects in Driving Images Using GANs and Real Data with Semantic Guidance

    Yang, HT; Ding, M; Carballo, A; Zhang, YX; Ohtani, K; Niu, YJ; Ge, MN; Feng, Y; Takeda, K

    2023 IEEE INTELLIGENT VEHICLES SYMPOSIUM, IV   Vol. 2023-June   2023

    Publisher:IEEE Intelligent Vehicles Symposium, Proceedings  

    Intelligent vehicle perception algorithms often have difficulty accurately analyzing and interpreting images in adverse weather conditions. Snow is a corner case that not only reduces visibility and contrast but also affects the stability of the road environment. While it is possible to train deep learning models on real-world driving datasets in snow weather, obtaining such data can be challenging. Synthesizing snow effects on existing driving datasets is a viable alternative. In this work, we propose a method based on Cycle Consistent Generative Adversarial Networks (CycleGANs) that utilizes additional semantic information to generate snow effects. We apply deep supervision by using intermediate outputs from the last two convolutional layers in the generator as multi-scale supervision signals for training. We collect a small set of driving image data captured under heavy snow as the translation source. We compare the generated images with those produced by various network architectures and evaluate the results qualitatively and quantitatively on the Cityscapes and EuroCity Persons datasets. Experiment results indicate that our model can synthesize realistic snow effects in driving images.

    DOI: 10.1109/IV55152.2023.10186565

    Web of Science

    Scopus
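
    The deep-supervision idea above (supervising intermediate generator outputs at multiple scales) can be sketched as follows. The projection heads, channel counts, and L1 objective are illustrative assumptions, not the paper's exact losses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical 1x1 heads projecting the last two generator feature
# maps (assumed 128 and 64 channels) into RGB images for supervision.
heads = nn.ModuleList([nn.Conv2d(128, 3, 1), nn.Conv2d(64, 3, 1)])

def deep_supervision_loss(final_out, mid_feats, target):
    """Compare the final output and two intermediate projections
    against the target at matching resolutions (L1 for brevity)."""
    loss = F.l1_loss(final_out, target)
    for feats, head in zip(mid_feats, heads):
        side = head(feats)
        tgt = F.interpolate(target, size=side.shape[-2:],
                            mode="bilinear", align_corners=False)
        loss = loss + F.l1_loss(side, tgt)
    return loss

final = torch.randn(1, 3, 256, 256)
mids = [torch.randn(1, 128, 64, 64), torch.randn(1, 64, 128, 128)]
target = torch.randn(1, 3, 256, 256)
print(deep_supervision_loss(final, mids, target).item())
```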

  4. Predictive World Models from Real-World Partial Observations

    Karlsson R., Carballo A., Fujii K., Ohtani K., Takeda K.

    Proceedings - 2023 IEEE International Conference on Mobility, Operations, Services and Technologies, MOST 2023     page: 152 - 166   2023

    Publisher:Proceedings - 2023 IEEE International Conference on Mobility, Operations, Services and Technologies, MOST 2023  

    Cognitive scientists believe adaptable intelligent agents like humans perform reasoning through learned causal mental simulations of agents and environments. The problem of learning such simulations is called predictive world modeling. Recently, reinforcement learning (RL) agents leveraging world models have achieved SOTA performance in game environments. However, understanding how to apply the world modeling approach in complex real-world environments relevant to mobile robots remains an open question. In this paper, we present a framework for learning a probabilistic predictive world model for real-world road environments. We implement the model using a hierarchical VAE (HVAE) capable of predicting a diverse set of fully observed plausible worlds from accumulated sensor observations. While prior HVAE methods require complete states as ground truth for learning, we present a novel sequential training method to allow HVAEs to learn to predict complete states from partially observed states only. We experimentally demonstrate accurate spatial structure prediction of deterministic regions achieving 96.21 IoU, and close the gap to perfect prediction by 62 % for stochastic regions using the best prediction. By extending HVAEs to cases where complete ground truth states do not exist, we facilitate continual learning of spatial prediction as a step towards realizing explainable and comprehensive predictive world models for real-world mobile robotics applications. Code is available at https://github.com/robin-karlsson0/predictive-world-models.

    DOI: 10.1109/MOST57249.2023.00024

    Scopus
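
    A central point above is learning from partially observed states only. A minimal sketch of one way to do that is to restrict the reconstruction loss to observed cells, as below; the occupancy-grid framing and the loss choice are assumptions for illustration, not the paper's exact training objective.

```python
import torch
import torch.nn.functional as F

def masked_recon_loss(pred, partial_obs, obs_mask):
    """When no complete ground-truth state exists, supervise the
    predicted world only where cells were actually observed."""
    diff = F.binary_cross_entropy(pred, partial_obs, reduction="none")
    return (diff * obs_mask).sum() / obs_mask.sum().clamp(min=1)

# Predicted occupancy probabilities, accumulated observations, and a
# mask marking which cells were observed (all shapes assumed).
pred = torch.rand(1, 1, 128, 128).clamp(1e-6, 1 - 1e-6)
obs = torch.randint(0, 2, (1, 1, 128, 128)).float()
mask = torch.randint(0, 2, (1, 1, 128, 128)).float()  # 1 = observed
print(masked_recon_loss(pred, obs, mask).item())
```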

  5. Efficient Training Method for Point Cloud-Based Object Detection Models by Combining Environmental Transitions and Active Learning

    Yamamoto T., Ohtani K., Hayashi T., Carballo A., Takeda K.

    Lecture Notes in Networks and Systems   Vol. 642 LNNS   page: 292 - 303   2023

    Publisher:Lecture Notes in Networks and Systems  

    The perception systems used in automated driving need to function accurately and reliably in a variety of traffic environments. These systems generally perform object detection to identify the positions and attributes of potential obstacles. Among the methods which have been proposed, object detection using three-dimensional (3D) point cloud data obtained using LiDAR has attracted much attention. However, when creating a detection model, annotation must be performed on a huge amount of data. Furthermore, the accuracy of 3D object detection models depends on the data domains used for training, such as geographic or traffic environments, so models must be trained for each domain, which requires large amounts of training data per domain. Therefore, the objective of this study is to develop a 3D object detector for new domains, even when trained with relatively small amounts of annotated data from those domains. We propose using a model trained on a large amount of labeled data as a pre-trained model, combined with transfer learning on a limited amount of highly effective training data selected from the target domain by active learning. Experimental evaluations show that 3D object detection models created using the proposed method perform well at a new location. We also confirm that active learning is particularly effective when only limited training data is available.

    DOI: 10.1007/978-3-031-26889-2_26

    Scopus
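
    The active learning step above selects a small, highly effective subset of target-domain frames for annotation. The sketch below shows one common selection criterion (binary entropy of per-frame detection confidence); the criterion and the numbers are illustrative assumptions, not the paper's method.

```python
import numpy as np

def select_for_annotation(scores, budget):
    """Pick the frames whose detections the current model is least
    sure about (binary entropy of mean detection confidence)."""
    scores = np.asarray(scores, dtype=float)
    eps = 1e-9
    entropy = -(scores * np.log(scores + eps)
                + (1 - scores) * np.log(1 - scores + eps))
    return np.argsort(-entropy)[:budget]  # most uncertain first

# Mean detection confidence of the pre-trained model on 10 unlabeled
# target-domain frames (illustrative numbers).
conf = [0.97, 0.51, 0.88, 0.45, 0.99, 0.62, 0.30, 0.92, 0.55, 0.80]
print(select_for_annotation(conf, budget=3))  # frames nearest 0.5
```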

  6. Auditory and visual warning information generation of the risk object in driving scenes based on weakly supervised learning

    Niu, YJ; Ding, M; Zhang, YX; Ohtani, KT; Takeda, K

    2022 IEEE INTELLIGENT VEHICLES SYMPOSIUM (IV)   Vol. 2022-June   page: 1572 - 1577   2022

    Language:English   Publisher:IEEE Intelligent Vehicles Symposium, Proceedings  

    In this research, a two-stage risk object warning method is proposed to generate auditory and visual warning information simultaneously from the driving scene. The auditory warning module (AWM) is designed as a classification task, combining rough location and type information into warning sentences and treating each sentence as one class. The visual warning module (VWM) is designed as a weakly supervised method to avoid labor-intensive bounding box annotation of risk objects. To confirm the effectiveness of the proposed method, we also create a linguistic risk notification (LRN) dataset by describing each driving scenario as several different sentences. The average accuracy of the auditory warning is 96.4% for generating the warning sentences. The average accuracy of the weakly supervised visual warning algorithm is 81.3% for localizing the risk vehicle without any supervisory information.

    DOI: 10.1109/IV51971.2022.9827382

    Web of Science

    Scopus
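
    The AWM above treats each warning sentence, formed from a rough location and an object type, as one class. A minimal sketch of that formulation, with made-up label sets and an assumed 512-d scene feature:

```python
import torch
import torch.nn as nn

# Illustrative label sets; each (location, type) sentence is one class.
locations = ["front", "front-left", "front-right"]
types = ["pedestrian", "vehicle", "cyclist"]
classes = [f"A {t} is approaching from the {loc}."
           for loc in locations for t in types]

# A linear head over assumed 512-d scene features; one logit per sentence.
head = nn.Linear(512, len(classes))
feats = torch.randn(1, 512)
warning = classes[head(feats).argmax(dim=1).item()]
print(len(classes))  # 9 sentence classes
print(warning)
```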

  7. Methods of Gently Notifying Pedestrians of Approaching Objects when Listening to Music

    Sakashita, Y; Ishiguro, Y; Ohtani, K; Nishino, T; Takeda, K

    ADJUNCT PROCEEDINGS OF THE 35TH ACM SYMPOSIUM ON USER INTERFACE SOFTWARE & TECHNOLOGY, UIST 2022     2022

    Publisher:UIST 2022 Adjunct - Adjunct Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology  

    Many people now listen to music with earphones while walking, and are less likely to notice approaching people, cars, etc. Many methods of detecting approaching objects and notifying pedestrians have been proposed, but few have focused on low urgency situations or music listeners, and many notification methods are unpleasant. Therefore, in this work, we propose methods of gently notifying pedestrians listening to music of approaching objects using environmental sound. We conducted experiments in a virtual environment to assess directional perception accuracy and comfort. Our results show the proposed method allows participants to detect the direction of approaching objects as accurately as explicit notification methods, with less discomfort.

    DOI: 10.1145/3526114.3558728

    Web of Science

    Scopus

  8. ViCE: Improving Dense Representation Learning by Superpixelization and Contrasting Cluster Assignment

    Karlsson R., Hayashi T., Fujii K., Carballo A., Ohtani K., Takeda K.

    BMVC 2022 - 33rd British Machine Vision Conference Proceedings     2022

    Publisher:BMVC 2022 - 33rd British Machine Vision Conference Proceedings  

    Recent self-supervised models have demonstrated equal or better performance than supervised methods, opening the way for AI systems to learn visual representations from practically unlimited data. However, these methods are typically classification-based and thus ineffective for learning high-resolution feature maps that preserve precise spatial information. This work introduces superpixels to improve self-supervised learning of dense semantically rich visual concept embeddings. Decomposing images into a small set of visually coherent regions reduces the computational complexity by O(1000) while preserving detail. We experimentally show that contrasting over regions improves the effectiveness of contrastive learning methods, extends their applicability to high-resolution images, and improves overclustering performance, and that superpixels are better than grids and regional masking improves performance. The expressiveness of our dense embeddings is demonstrated by improving the SOTA unsupervised semantic segmentation benchmark on Cityscapes, and for convolutional models on COCO. Code is available at https://github.com/robin-karlsson0/vice.

    Scopus
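
    The core operation above is contrasting region embeddings rather than pixels. The sketch below shows the superpixel pooling step that produces those region embeddings; feature sizes are assumed, and the superpixel ids would come from an off-the-shelf algorithm such as SLIC.

```python
import torch

def superpixel_pool(feats, segments, n_segments):
    """Average dense features inside each superpixel, reducing HxW
    feature vectors to a few hundred region embeddings."""
    c, h, w = feats.shape
    flat = feats.reshape(c, h * w)    # [C, HW]
    ids = segments.reshape(h * w)     # [HW] superpixel ids
    out = torch.zeros(n_segments, c)
    out.index_add_(0, ids, flat.t())  # sum features per region
    counts = torch.bincount(ids, minlength=n_segments).clamp(min=1)
    return out / counts.unsqueeze(1)  # [n_segments, C]

feats = torch.randn(64, 32, 32)             # assumed dense embedding map
segments = torch.randint(0, 100, (32, 32))  # e.g. from skimage SLIC
print(superpixel_pool(feats, segments, 100).shape)  # torch.Size([100, 64])
```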

  9. FollowSelect: Path-based Menu Interaction for Intuitive Navigation Reviewed

    Yusuke Sakai, Yoshio Ishiguro, Kento Ohtani, Takanori Nishino, Kazuya Takeda

      Vol. 62 ( 10 ) page: 1669 - 1680   2021.10

    Language:Japanese   Publishing type:Research paper (scientific journal)  

    DOI: 10.20729/00213195

  10. Manipulation of Speed Perception While in Motion Using Auditory Stimuli Reviewed

    Yuta Kanayama, Yoshio Ishiguro, Takanori Nishino, Kento Ohtani, Kazuya Takeda

    2021 18th International Conference on Ubiquitous Robots (UR)     2021.7

    Language:English   Publishing type:Research paper (international conference proceedings)  

  11. A Proposal for Adding Singing Expression via Natural Body Motion in an Electrolarynx-Based Singing System Reviewed

    Shumpei Okawa, Yoshio Ishiguro, Kento Ohtani, Takanori Nishino, Kazuhiro Kobayashi, Tomoki Toda, Kazuya Takeda

    Proceedings of IPSJ Interaction 2021     page: 261 - 266   2021.3

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

  12. Driving Behavior Aware Caption Generation for Egocentric Driving Videos Using In-Vehicle Sensors

    Zhang, HK; Takeda, K; Sasano, R; Adachi, Y; Ohtani, K

    2021 IEEE INTELLIGENT VEHICLES SYMPOSIUM WORKSHOPS (IV WORKSHOPS)     page: 287 - 292   2021

    Language:English   Publisher:IEEE Intelligent Vehicles Symposium, Proceedings  

    Video captioning aims to generate textual descriptions according to the video contents. Risk assessment of autonomous driving vehicles has become essential for insurance companies in order to provide adequate insurance coverage, in particular for the emerging MaaS business. Insurers need to assess the risk of autonomous driving business plans with a fixed route by analyzing a large amount of driving data, including videos recorded by dash cameras and sensor signals. To make the process more efficient, generating captions for driving videos can provide insurers with concise information for understanding the video contents quickly. A natural problem with driving video captioning is that, due to the absence of ego vehicles in these egocentric videos, descriptions of latent driving behaviors are difficult to ground in specific visual cues. To address this issue, we focus on generating driving video captions with accurate behavior descriptions, and propose to incorporate in-vehicle sensors, which encapsulate the driving behavior information, to assist caption generation. We evaluate our method on the Japanese driving video captioning dataset called City Traffic, where the results demonstrate the effectiveness of in-vehicle sensors in improving the overall performance of generated captions, especially in generating more accurate descriptions of driving behaviors.

    DOI: 10.1109/IVWorkshops54471.2021.9669259

    Web of Science

    Scopus
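
    The sketch below illustrates the fusion idea above: concatenating per-timestep in-vehicle sensor features with video features before encoding for the caption decoder. The dimensions and the GRU encoder are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SensorFusionEncoder(nn.Module):
    """Concatenate per-timestep video features with in-vehicle sensor
    features (e.g. speed, steering) before a recurrent encoder; a
    sketch of the fusion idea, not the paper's exact model."""
    def __init__(self, vid_dim=512, sens_dim=8, hid=256):
        super().__init__()
        self.rnn = nn.GRU(vid_dim + sens_dim, hid, batch_first=True)

    def forward(self, vid_feats, sensor_feats):
        fused = torch.cat([vid_feats, sensor_feats], dim=-1)  # [B,T,v+s]
        _, h = self.rnn(fused)
        return h.squeeze(0)  # [B,hid] context for a caption decoder

vid = torch.randn(2, 20, 512)  # 20 frames of CNN features (assumed)
sens = torch.randn(2, 20, 8)   # speed, accel, steering... (assumed)
print(SensorFusionEncoder()(vid, sens).shape)  # torch.Size([2, 256])
```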

  13. End-to-End Learning-based Driving System with Branches by Emphasizing Target Direction Reviewed

    Seiya Shunya, Ohtani Kento, Carballo Alexander, Takeuchi Eijiro, Takeda Kazuya

    Transactions of Society of Automotive Engineers of Japan   Vol. 52 ( 6 ) page: 1368 - 1374   2021

    Language:Japanese   Publisher:Society of Automotive Engineers of Japan  

    End-to-end driving refers to deep learning methods for generating control signals directly from external sensors. Previous methods use a direction vector towards the target to select and turn at intersections. However, the vector has a smaller dimension than the image, and thus it is ignored during training. In this study, we propose a learning method to emphasize that vector by using L2 regularization, which enables a robot to follow trajectories with branches. We validate the system's performance by conducting experiments using several driving scenarios. Our approach allowed an autonomous robot to successfully follow trajectories, including unknown outdoor trajectories.

    DOI: 10.11351/jsaeronbun.52.1368

    CiNii Research
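
    The abstract above says the low-dimensional direction vector is emphasized using L2 regularization. One plausible reading, sketched below, is to L2-penalize only the image branch so training cannot ignore the direction input; this is an assumption about the formulation, not the paper's exact loss.

```python
import torch
import torch.nn as nn

# Hypothetical two-branch control network: image features and the
# low-dimensional direction vector feed a shared control head.
image_branch = nn.Linear(512, 64)
direction_branch = nn.Linear(2, 64)
control_head = nn.Linear(128, 2)  # e.g. steering and throttle

def loss_fn(img_feat, direction, target, lam=1e-3):
    z = torch.cat([image_branch(img_feat),
                   direction_branch(direction)], dim=-1)
    pred = control_head(z)
    mse = nn.functional.mse_loss(pred, target)
    # Penalize only the image-branch weights, emphasizing the
    # direction pathway (assumed reading of the L2 regularization).
    l2_image = image_branch.weight.pow(2).sum()
    return mse + lam * l2_image

print(loss_fn(torch.randn(4, 512), torch.randn(4, 2),
              torch.randn(4, 2)).item())
```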

  14. Improving target selection accuracy for vehicle touch screens

    Ito K., Nishino T., Ohtani K., Takeda K., Ishiguro Y.

    Adjunct Proceedings - 11th International ACM Conference on Automotive User Interfaces and Interactive Vehicular Applications, AutomotiveUI 2019     page: 176 - 180   2019.9

    Language:English   Publisher:Adjunct Proceedings - 11th International ACM Conference on Automotive User Interfaces and Interactive Vehicular Applications, AutomotiveUI 2019  

    When operating a touch screen in a car, the touch point can shift due to vibration, resulting in selection errors. Using larger targets is a possible solution, but this significantly limits the amount of content that can be displayed on the touch screen. Therefore, we propose a method for in-vehicle touch screen target selection that can be used with a variety of sensors to increase selection accuracy. In this method, the vibration features are learned by a Variational Auto-Encoder based model and used to estimate the touch point distribution. Our experimental results demonstrate that the proposed method allows users to achieve higher target selection accuracy than conventional methods.

    DOI: 10.1145/3349263.3351327

    Scopus
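
    Once a vibration-dependent touch point distribution is estimated, target selection can score each on-screen target by the likelihood of the observed touch. A minimal sketch under a Gaussian assumption, with the covariance standing in for the model-estimated distribution:

```python
import numpy as np

def most_likely_target(touch, targets, cov):
    """Score each on-screen target by the Gaussian log-likelihood of
    the observed touch under a vibration-dependent covariance
    (assumed here to come from the learned vibration model)."""
    inv = np.linalg.inv(cov)
    scores = []
    for center in targets:
        d = touch - center
        scores.append(-0.5 * d @ inv @ d)  # log-likelihood up to a constant
    return int(np.argmax(scores))

targets = np.array([[50.0, 50.0], [150.0, 50.0], [250.0, 50.0]])  # px
cov = np.array([[120.0, 30.0], [30.0, 80.0]])  # wider spread when shaking
print(most_likely_target(np.array([140.0, 60.0]), targets, cov))  # -> 1
```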

  15. Music Source Enhancement Using a Convolutional Denoising Autoencoder and Log-frequency Scale Spectral Features

    OHTANI Kento, NIWA Kenta, NISHINO Takanori, TAKEDA Kazuya

      Vol. J101-D ( 3 ) page: 615 - 627   2018.3

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    We propose a music source enhancement technique which uses a convolutional denoising autoencoder (CDAE) and the log-frequency scale amplitude spectral features of musical instrument signals. The structure of the CDAE includes amplitude spectral characteristics of the sounds created by musical instruments in its network in order to estimate the amplitude spectra of the target signals. Evaluation results show that the proposed network achieves better signal-to-interference ratios (SIRs) than conventional network/input feature structures. We also propose a complementary CDAE approach, which estimates target and noise amplitude spectra simultaneously and combines them. By using complementary CDAE, SIRs of the estimated music signals are further improved.

    DOI: 10.14923/transinfj.2017pdp0021

    CiNii Research
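
    The complementary CDAE above estimates target and accompaniment spectra simultaneously and combines them. A Wiener-like ratio-mask combination, sketched below, is one natural reading of that step; it is an assumption, not the paper's exact formula.

```python
import numpy as np

def complementary_combine(target_est, noise_est, mixture_mag, eps=1e-8):
    """Combine separately estimated target and accompaniment amplitude
    spectra into a ratio mask applied to the mixture spectrogram."""
    mask = target_est / (target_est + noise_est + eps)
    return mask * mixture_mag

freq_bins, frames = 513, 100
target = np.abs(np.random.randn(freq_bins, frames))  # CDAE target estimate
noise = np.abs(np.random.randn(freq_bins, frames))   # CDAE accompaniment estimate
mix = target + noise
print(complementary_combine(target, noise, mix).shape)  # (513, 100)
```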

  16. A Single-Dimensional Interface for Arranging Multiple Audio Sources in Three-Dimensional Space

    OHTANI Kento, NIWA Kenta, TAKEDA Kazuya

    IEICE Transactions on Information and Systems   Vol. E100D ( 10 ) page: 2635 - 2643   2017.10

    Language:English   Publisher:The Institute of Electronics, Information and Communication Engineers  

    A single-dimensional interface which enables users to obtain diverse localizations of audio sources is proposed. In many conventional interfaces for arranging audio sources, there are multiple arrangement parameters, some of which allow users to control positions of audio sources. However, it is difficult for users who are unfamiliar with these systems to optimize the arrangement parameters since the number of possible settings is huge. We propose a simple, single-dimensional interface for adjusting arrangement parameters, allowing users to sample several diverse audio source arrangements and easily find their preferred auditory localizations. To select subsets of arrangement parameters from all of the possible choices, auditory-localization space vectors (ASVs) are defined to represent the auditory localization of each arrangement parameter. By selecting subsets of ASVs which are approximately orthogonal, we can choose arrangement parameters which will produce diverse auditory localizations. Experimental evaluations were conducted using music composed of three audio sources. Subjective evaluations confirmed that novice users can obtain diverse localizations using the proposed interface.

    DOI: 10.1587/transinf.2017EDP7028

    Web of Science

    Scopus

    CiNii Research
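
    Selecting subsets of approximately orthogonal ASVs can be done greedily, as the sketch below illustrates: repeatedly add the candidate least similar to everything already chosen. The greedy strategy and dimensions are illustrative assumptions, not the paper's exact selection algorithm.

```python
import numpy as np

def pick_diverse_arrangements(asvs, k):
    """Greedily select k auditory-localization space vectors (ASVs)
    that are approximately mutually orthogonal, so the sampled
    arrangements sound as different from each other as possible."""
    asvs = asvs / np.linalg.norm(asvs, axis=1, keepdims=True)
    chosen = [0]  # start from an arbitrary ASV
    while len(chosen) < k:
        # max cosine similarity of every candidate to the chosen set
        sims = np.abs(asvs @ asvs[chosen].T).max(axis=1)
        sims[chosen] = np.inf  # never re-pick a chosen vector
        chosen.append(int(np.argmin(sims)))
    return chosen

rng = np.random.default_rng(0)
asvs = rng.normal(size=(50, 16))  # 50 candidates, 16-d ASVs (assumed)
print(pick_diverse_arrangements(asvs, k=5))
```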

  17. Music staging AI

    Niwa K., Ohtani K., Takeda K.

    ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings   Vol. 2017-March   page: 6588 - 6589   2017

    Language:English   Publisher:ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings  

    Through smartphones, users can download and listen to music anytime and anywhere. As a concept for a future audio player, we propose a framework of music staging artificial intelligence (AI). In this framework, audio object signals, e.g. vocal, guitar, bass, drums and keyboards, are assumed to be extracted from stereo music signals. To visualize the music as if a live performance were being conducted, playing motion sequences are estimated from the separated signals. After adjusting the spatial arrangement of the audio objects to each user's preference, audio/visual rendering is conducted. We constructed two types of demonstration systems for music staging AI. In the smartphone-based implementation, each user can change the spatial arrangement by dragging slider bars. Since information on each user's preferred spatial arrangement can be sent from the smartphone to a server, this would make it possible to predict and recommend spatial arrangements that users prefer. In another implementation, a head mounted display (HMD) was used to dive into a virtual live music performance. Each user can walk or teleport anywhere, and the audio changes according to the user's view.

    DOI: 10.1109/ICASSP.2017.8005294

    Scopus

KAKENHI (Grants-in-Aid for Scientific Research) 3

  1. Development of explainable tactical evaluation technology that can be simulated from video in group behaviors

    Grant number:23H03282  2023.4 - 2026.3

    Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research (B)

    Authorship:Coinvestigator(s) 

  2. Cross-disciplinary research on the prediction and control of real-world interactions based on evidence and causality

    Grant number:21H04892  2021.4 - 2024.3

    Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research (A)

    Authorship:Coinvestigator(s) 

  3. An Acoustic Space Reproduction System That Diversely Varies the Auditory Impression of Music Based on Control of the Spatial Arrangement of Instruments

    Grant number:16J11472  2016.4 - 2018.3

    Grants-in-Aid for Scientific Research  Grant-in-Aid for JSPS Fellows

    OHTANI Kento

    Authorship:Principal investigator 

    Grant amount:¥1,300,000 ( Direct Cost: ¥1,300,000 )