Updated on 2023/12/06


Graduate School of Informatics Department of Intelligent Systems 1 Designated assistant professor

Degree 1

  1. Ph.D. in Information Science ( 2018.3   Nagoya University ) 


Papers 18

  1. L-DIG: A GAN-Based Method for LiDAR Point Cloud Processing under Snow Driving Conditions.

    Zhang Y, Ding M, Yang H, Niu Y, Feng Y, Ohtani K, Takeda K

    Sensors (Basel, Switzerland)   Vol. 23 ( 21 )   2023.10

    Language:English   Publisher:Sensors (Basel, Switzerland)  

    LiDAR point clouds are significantly impacted by snow in driving scenarios, introducing scattered noise points and phantom objects, thereby compromising the perception capabilities of autonomous driving systems. Current effective methods for removing snow from point clouds largely rely on outlier filters, which mechanically eliminate isolated points. This research proposes a novel translation model for LiDAR point clouds, the 'L-DIG' (LiDAR depth images GAN), built upon refined generative adversarial networks (GANs). This model not only has the capacity to reduce snow noise from point clouds, but it also can artificially synthesize snow points onto clear data. The model is trained using depth image representations of point clouds derived from unpaired datasets, complemented by customized loss functions for depth images to ensure scale and structure consistencies. To amplify the efficacy of snow capture, particularly in the region surrounding the ego vehicle, we have developed a pixel-attention discriminator that operates without downsampling convolutional layers. Concurrently, the other discriminator equipped with two-step downsampling convolutional layers has been engineered to effectively handle snow clusters. This dual-discriminator approach ensures robust and comprehensive performance in tackling diverse snow conditions. The proposed model displays a superior ability to capture snow and object features within LiDAR point clouds. A 3D clustering algorithm is employed to adaptively evaluate different levels of snow conditions, including scattered snowfall and snow swirls. Experimental findings demonstrate an evident de-snowing effect, and the ability to synthesize snow effects.

    DOI: 10.3390/s23218660



  2. Learning to Predict Navigational Patterns From Partial Observations

    Karlsson, R; Carballo, A; Lepe-Salazar, F; Fujii, K; Ohtani, K; Takeda, K

    IEEE ROBOTICS AND AUTOMATION LETTERS   Vol. 8 ( 9 ) page: 5592 - 5599   2023.9

    Publisher:IEEE Robotics and Automation Letters  

    Human beings cooperatively navigate rule-constrained environments by adhering to mutually known navigational patterns, which may be represented as directional pathways or road lanes. Inferring these navigational patterns from incompletely observed environments is required for intelligent mobile robots operating in unmapped locations. However, algorithmically defining these navigational patterns is nontrivial. This letter presents the first self-supervised learning (SSL) method for learning to infer navigational patterns in real-world environments from partial observations only. We explain how geometric data augmentation, predictive world modeling, and an information-theoretic regularizer enable our model to predict an unbiased local directional soft lane probability (DSLP) field in the limit of infinite data. We demonstrate how to infer global navigational patterns by fitting a maximum likelihood graph to the DSLP field. Experiments show that our SSL model outperforms two SOTA supervised lane graph prediction models on the nuScenes dataset. We propose our SSL method as a scalable and interpretable continual learning paradigm for navigation by perception.

    DOI: 10.1109/LRA.2023.3291924


  3. Synthesizing Realistic Snow Effects in Driving Images Using GANs and Real Data with Semantic Guidance

    Yang, HT; Ding, M; Carballo, A; Zhang, YX; Ohtani, K; Niu, YJ; Ge, MN; Feng, Y; Takeda, K

    2023 IEEE INTELLIGENT VEHICLES SYMPOSIUM, IV   Vol. 2023-June   2023

    Publisher:IEEE Intelligent Vehicles Symposium, Proceedings  

    Intelligent vehicle perception algorithms often have difficulty accurately analyzing and interpreting images in adverse weather conditions. Snow is a corner case that not only reduces visibility and contrast but also affects the stability of the road environment. While it is possible to train deep learning models on real-world driving datasets in snowy weather, obtaining such data can be challenging. Synthesizing snow effects on existing driving datasets is a viable alternative. In this work, we propose a method based on Cycle Consistent Generative Adversarial Networks (CycleGANs) that utilizes additional semantic information to generate snow effects. We apply deep supervision by using intermediate outputs from the last two convolutional layers in the generator as multi-scale supervision signals for training. We collect a small set of driving image data captured under heavy snow as the translation source. We compare the generated images with those produced by various network architectures and evaluate the results qualitatively and quantitatively on the Cityscapes and EuroCity Persons datasets. Experimental results indicate that our model can synthesize realistic snow effects in driving images.

    DOI: 10.1109/IV55152.2023.10186565


  4. Predictive World Models from Real-World Partial Observations

    Karlsson R., Carballo A., Fujii K., Ohtani K., Takeda K.

    Proceedings - 2023 IEEE International Conference on Mobility, Operations, Services and Technologies, MOST 2023     page: 152 - 166   2023

    Publisher:Proceedings - 2023 IEEE International Conference on Mobility, Operations, Services and Technologies, MOST 2023  

    Cognitive scientists believe adaptable intelligent agents like humans perform reasoning through learned causal mental simulations of agents and environments. The problem of learning such simulations is called predictive world modeling. Recently, reinforcement learning (RL) agents leveraging world models have achieved SOTA performance in game environments. However, understanding how to apply the world modeling approach in complex real-world environments relevant to mobile robots remains an open question. In this paper, we present a framework for learning a probabilistic predictive world model for real-world road environments. We implement the model using a hierarchical VAE (HVAE) capable of predicting a diverse set of fully observed plausible worlds from accumulated sensor observations. While prior HVAE methods require complete states as ground truth for learning, we present a novel sequential training method to allow HVAEs to learn to predict complete states from partially observed states only. We experimentally demonstrate accurate spatial structure prediction of deterministic regions achieving 96.21 IoU, and close the gap to perfect prediction by 62 % for stochastic regions using the best prediction. By extending HVAEs to cases where complete ground truth states do not exist, we facilitate continual learning of spatial prediction as a step towards realizing explainable and comprehensive predictive world models for real-world mobile robotics applications. Code is available at https://github.com/robin-karlsson0/predictive-world-models.

    DOI: 10.1109/MOST57249.2023.00024


  5. Efficient Training Method for Point Cloud-Based Object Detection Models by Combining Environmental Transitions and Active Learning

    Yamamoto T., Ohtani K., Hayashi T., Carballo A., Takeda K.

    Lecture Notes in Networks and Systems   Vol. 642 LNNS   page: 292 - 303   2023

    Publisher:Lecture Notes in Networks and Systems  

    The perceptive systems used in automated driving need to function accurately and reliably in a variety of traffic environments. These systems generally perform object detection to identify the positions and attributes of potential obstacles. Among the methods which have been proposed, object detection using three-dimensional (3D) point cloud data obtained using LiDAR has attracted much attention. However, when attempting to create a detection model, annotation must be performed on a huge amount of data. Furthermore, the accuracy of 3D object detection models is dependent on the data domains used for training, such as geographic or traffic environments, so it is necessary to train models for each domain, which requires large amounts of training data for each domain. Therefore, the objective of this study is to develop a 3D object detector for new domains, even when trained with relatively small amounts of annotated data from new domains. We propose using a model that has been trained with a large amount of labeled data as the pre-trained model, and simultaneously using transfer learning with a limited amount of highly effective training data, selected from the target domain by active learning. Experimental evaluations show that 3D object detection models created using the proposed method perform well at a new location. We also confirm that active learning is particularly effective when only limited training data is available.

    DOI: 10.1007/978-3-031-26889-2_26


  6. Auditory and visual warning information generation of the risk object in driving scenes based on weakly supervised learning

    Niu, YJ; Ding, M; Zhang, YX; Ohtani, KT; Takeda, K

    2022 IEEE INTELLIGENT VEHICLES SYMPOSIUM (IV)   Vol. 2022-June   page: 1572 - 1577   2022

    Language:English   Publisher:IEEE Intelligent Vehicles Symposium, Proceedings  

    In this research, a two-stage risk object warning method is proposed to generate the auditory and visual warning information simultaneously from the driving scene. The auditory warning module (AWM) is designed as a classification task by combining the rough location and type information as warning sentences and treating each sentence as one class. The visual warning module (VWM) is designed as a weakly supervised method to save the labor-intensive bounding box marking of risk objects. To confirm the effectiveness of the proposed method, we also create a linguistic risk notification (LRN) dataset by describing the driving scenario as several different sentences. The average accuracy of auditory warning is 96.4% for generating the warning sentences. The average accuracy of the weakly supervised visual warning algorithm is 81.3% for getting the risk vehicle localization without any supervisory information.

    DOI: 10.1109/IV51971.2022.9827382


  7. Methods of Gently Notifying Pedestrians of Approaching Objects when Listening to Music

    Sakashita, Y; Ishiguro, Y; Ohtani, K; Nishino, T; Takeda, K


    Publisher:UIST 2022 Adjunct - Adjunct Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology  

    Many people now listen to music with earphones while walking, and are less likely to notice approaching people, cars, etc. Many methods of detecting approaching objects and notifying pedestrians have been proposed, but few have focused on low urgency situations or music listeners, and many notification methods are unpleasant. Therefore, in this work, we propose methods of gently notifying pedestrians listening to music of approaching objects using environmental sound. We conducted experiments in a virtual environment to assess directional perception accuracy and comfort. Our results show the proposed method allows participants to detect the direction of approaching objects as accurately as explicit notification methods, with less discomfort.

    DOI: 10.1145/3526114.3558728


  8. ViCE: Improving Dense Representation Learning by Superpixelization and Contrasting Cluster Assignment

    Karlsson R., Hayashi T., Fujii K., Carballo A., Ohtani K., Takeda K.

    BMVC 2022 - 33rd British Machine Vision Conference Proceedings     2022

    Publisher:BMVC 2022 - 33rd British Machine Vision Conference Proceedings  

    Recent self-supervised models have demonstrated equal or better performance than supervised methods, opening the way for AI systems to learn visual representations from practically unlimited data. However, these methods are typically classification-based and thus ineffective for learning high-resolution feature maps that preserve precise spatial information. This work introduces superpixels to improve self-supervised learning of dense semantically rich visual concept embeddings. Decomposing images into a small set of visually coherent regions reduces the computational complexity by O(1000) while preserving detail. We experimentally show that contrasting over regions improves the effectiveness of contrastive learning methods, extends their applicability to high-resolution images, and improves overclustering performance; that superpixels are better than grids; and that regional masking improves performance. The expressiveness of our dense embeddings is demonstrated by improving the SOTA unsupervised semantic segmentation benchmark on Cityscapes, and for convolutional models on COCO. Code is available at https://github.com/robin-karlsson0/vice.


  9. FollowSelect: Path-based Menu Interaction for Intuitive Navigation Reviewed

    Yusuke Sakai, Yoshio Ishiguro, Kento Ohtani, Takanori Nishino, Kazuya Takeda

      Vol. 62 ( 10 ) page: 1669 - 1680   2021.10

    Language:Japanese   Publishing type:Research paper (scientific journal)  

    DOI: 10.20729/00213195

  10. Manipulation of Speed Perception While in Motion Using Auditory Stimuli Reviewed

    Yuta Kanayama, Yoshio Ishiguro, Takanori Nishino, Kento Ohtani, Kazuya Takeda

    2021 18th International Conference on Ubiquitous Robots (UR)     2021.7

    Language:English   Publishing type:Research paper (international conference proceedings)  

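  11. Proposal for Adding Singing Expression Using Natural Body Movements in a Singing System with an Electrolarynx Reviewed

    Shunpei Okawa, Yoshio Ishiguro, Kento Ohtani, Takanori Nishino, Kazuhiro Kobayashi, Tomoki Toda, Kazuya Takeda

    Proceedings of IPSJ Interaction 2021     page: 261 - 266   2021.3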

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

  12. Driving Behavior Aware Caption Generation for Egocentric Driving Videos Using In-Vehicle Sensors

    Zhang, HK; Takeda, K; Sasano, R; Adachi, Y; Ohtani, K


    Language:English   Publisher:IEEE Intelligent Vehicles Symposium, Proceedings  

    Video captioning aims to generate textual descriptions according to the video contents. The risk assessment of autonomous driving vehicles has become essential for insurance companies to provide adequate insurance coverage, in particular for the emerging MaaS business. The insurers need to assess the risk of autonomous driving business plans with a fixed route by analyzing a large amount of driving data, including videos recorded by dash cameras and sensor signals. To make the process more efficient, generating captions for driving videos can provide insurers with concise information to understand the video contents quickly. A natural problem with driving video captioning is that, because the ego vehicle is absent from these egocentric videos, descriptions of latent driving behaviors are difficult to ground in specific visual cues. To address this issue, we focus on generating driving video captions with accurate behavior descriptions, and propose to incorporate in-vehicle sensors which encapsulate the driving behavior information to assist the caption generation. We evaluate our method on the Japanese driving video captioning dataset called City Traffic, where the results demonstrate the effectiveness of in-vehicle sensors in improving the overall performance of generated captions, especially in generating more accurate descriptions of the driving behaviors.

    DOI: 10.1109/IVWorkshops54471.2021.9669259


  13. End-to-End Learning-based Driving System with Branches by Emphasizing Target Direction Reviewed

    Seiya Shunya, Ohtani Kento, Carballo Alexander, Takeuchi Eijiro, Takeda Kazuya

    Transactions of Society of Automotive Engineers of Japan   Vol. 52 ( 6 ) page: 1368 - 1374   2021

    Language:Japanese   Publisher:Society of Automotive Engineers of Japan  

    End-to-end driving refers to deep learning methods for generating control signals directly from external sensors. Previous methods use a direction vector towards the target to select and turn at intersections. However, the vector has a smaller dimension than the image, and thus it is ignored during training. In this study, we propose a learning method to emphasize that vector by using L2 regularization, which enables a robot to follow trajectories with branches. We validate the system's performance by conducting experiments using several driving scenarios. Our approach allowed an autonomous robot to successfully follow trajectories, including unknown outdoor trajectories.

    DOI: 10.11351/jsaeronbun.52.1368

  14. Improving target selection accuracy for vehicle touch screens

    Ito K., Nishino T., Ohtani K., Takeda K., Ishiguro Y.

    Adjunct Proceedings - 11th International ACM Conference on Automotive User Interfaces and Interactive Vehicular Applications, AutomotiveUI 2019     page: 176 - 180   2019.9

    Language:English   Publisher:Adjunct Proceedings - 11th International ACM Conference on Automotive User Interfaces and Interactive Vehicular Applications, AutomotiveUI 2019  

    When operating the touch screen in a car, the touch point can shift due to vibration, resulting in selection errors. Using larger targets is a possible solution, but this significantly limits the amount of content that can be displayed on the touch screen. Therefore, we propose a method for in-vehicle touch screen target selection that can be used with a variety of sensors to increase selection accuracy. In this method, the vibration feature is learned by a Variational Auto-Encoder-based model and is used for estimating the touch point distribution. Our experimental results demonstrate that the proposed method allows users to achieve higher target selection accuracy than conventional methods.

    DOI: 10.1145/3349263.3351327


  15. Music Source Enhancement Using a Convolutional Denoising Autoencoder and Log-frequency Scale Spectral Features

    OHTANI Kento, NIWA Kenta, NISHINO Takanori, TAKEDA Kazuya

      Vol. J101-D ( 3 ) page: 615 - 627   2018.3

    Language:Japanese   Publisher:The Institute of Electronics, Information and Communication Engineers  

    We propose a music source enhancement technique which uses a convolutional denoising autoencoder (CDAE) and the log-frequency scale amplitude spectral features of musical instrument signals. The structure of the CDAE includes amplitude spectral characteristics of the sounds created by musical instruments in its network in order to estimate the amplitude spectra of the target signals. Evaluation results show that the proposed network achieves better signal-to-interference ratios (SIRs) than conventional network/input feature structures. We also propose a complementary CDAE approach, which estimates target and noise amplitude spectra simultaneously and combines them. By using complementary CDAE, SIRs of the estimated music signals are further improved.

    DOI: 10.14923/transinfj.2017pdp0021

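  16. Music Source Enhancement Using a Convolutional Denoising Autoencoder and Log-frequency Scale Spectral Features Reviewed

    Kento Ohtani

    IEICE Transactions on Information and Systems (Japanese Edition)   Vol. J101-D   page: 615 - 627   2018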

  17. A Single-Dimensional Interface for Arranging Multiple Audio Sources in Three-Dimensional Space

    OHTANI Kento, NIWA Kenta, TAKEDA Kazuya

    IEICE Transactions on Information and Systems   Vol. E100D ( 10 ) page: 2635 - 2643   2017.10

    Language:English   Publisher:The Institute of Electronics, Information and Communication Engineers  

    A single-dimensional interface which enables users to obtain diverse localizations of audio sources is proposed. In many conventional interfaces for arranging audio sources, there are multiple arrangement parameters, some of which allow users to control positions of audio sources. However, it is difficult for users who are unfamiliar with these systems to optimize the arrangement parameters since the number of possible settings is huge. We propose a simple, single-dimensional interface for adjusting arrangement parameters, allowing users to sample several diverse audio source arrangements and easily find their preferred auditory localizations. To select subsets of arrangement parameters from all of the possible choices, auditory-localization space vectors (ASVs) are defined to represent the auditory localization of each arrangement parameter. By selecting subsets of ASVs which are approximately orthogonal, we can choose arrangement parameters which will produce diverse auditory localizations. Experimental evaluations were conducted using music composed of three audio sources. Subjective evaluations confirmed that novice users can obtain diverse localizations using the proposed interface.

    DOI: 10.1587/transinf.2017EDP7028


  18. Music staging AI

    Niwa K., Ohtani K., Takeda K.

    ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings   Vol. 2017-March   page: 6588 - 6589   2017

    Language:English   Publisher:ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings  

    With smartphones, users can download and listen to music anytime and anywhere. As a concept for a future audio player, we propose a framework for music staging artificial intelligence (AI). In this framework, audio object signals, e.g. vocals, guitar, bass, drums and keyboards, are assumed to be extracted from stereo music signals. To visualize music as if a live performance were being conducted virtually, the playing motion sequence is estimated from the separated signals. After the spatial arrangement of audio objects is adjusted to each user's preference, audio/visual rendering is conducted. We constructed two types of demonstration systems for music staging AI. In the smartphone-based implementation, each user can change the spatial arrangement by dragging a slider bar. Since information on each user's preferred spatial arrangement can be sent from each smartphone to a server, it would become possible to predict and recommend the user's preferred spatial arrangement. In the other implementation, a head-mounted display (HMD) is used to immerse the user in a virtual live music performance. Each user can walk or teleport anywhere, and the audio changes according to the user's view.

    DOI: 10.1109/ICASSP.2017.8005294


KAKENHI (Grants-in-Aid for Scientific Research) 3

  1. Development of explainable tactical evaluation technology that can be simulated from video in group behaviors

    Grant number:23H03282  2023.4 - 2026.3

    Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research (B)


  2. Cross-disciplinary research on the prediction and control of real-world interactions based on evidence and causality

    Grant number:21H04892  2021.4 - 2024.3

    Grants-in-Aid for Scientific Research  Grant-in-Aid for Scientific Research (A)


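  3. A spatial audio reproduction system that diversely varies the auditory impression of music based on spatial arrangement control of musical instruments

    Grant number:16J11472  2016.4 - 2018.3

    Grants-in-Aid for Scientific Research  Grant-in-Aid for JSPS Fellows

    Kento Ohtani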

    Authorship:Principal investigator 

    Grant amount:¥1300000 ( Direct Cost: ¥1300000 )