Papers - TODA Tomoki
-
Multi-modal video summarization based on two-stage fusion of audio, visual, and recognized text information Reviewed
Z. Yang, J. He, T. Toda
Proc. APSIPA ASC page: 6 pages 2024.12
-
Multi-speaker text-to-speech training with speaker anonymized data Reviewed International coauthorship
W.-C. Huang, Y.-C. Wu, T. Toda
IEEE Signal Processing Letters Vol. 31 page: 2995 - 2999 2024.10
-
2DP-2MRC: 2-dimensional pointer-based machine reading comprehension method for multimodal moment retrieval Reviewed
J. He, T. Toda
Proc. INTERSPEECH page: 5073 - 5077 2024.9
-
CtrSVDD: a benchmark dataset and baseline analysis for controlled singing voice deepfake detection Reviewed International coauthorship
Y. Zang, J. Shi, Y. Zhang, R. Yamamoto, J. Han, Y. Tang, S. Xu, W. Zhao, J. Guo, T. Toda, Z. Duan
Proc. INTERSPEECH page: 4783 - 4787 2024.9
-
Exploring the robustness of text-to-speech synthesis based on diffusion probabilistic models to heavily noisy transcriptions Reviewed
J. Feng, Y. Yasuda, T. Toda
Proc. INTERSPEECH page: 4408 - 4412 2024.9
-
QHM-GAN: neural vocoder based on quasi-harmonic modeling Reviewed
S. Chen, T. Toda
Proc. INTERSPEECH page: 3889 - 3893 2024.9
-
Multimodal fusion of music theory-inspired and self-supervised representations for improved emotion recognition Reviewed International coauthorship
X. Shi, X. Li, T. Toda
Proc. INTERSPEECH page: 3724 - 3728 2024.9
-
Quantifying the effect of speech pathology on automatic and human speaker verification Reviewed International coauthorship
B. Halpern, T. Tienkamp, W.-C. Huang, L.P. Violeta, T. Rebernik, S. de Visscher, M.J.H. Witjes, M. Wieling, D. Abur, T. Toda
Proc. INTERSPEECH page: 3015 - 3019 2024.9
-
Embedding learning for preference-based speech quality assessment Reviewed
C.-H. Hu, Y. Yasuda, T. Toda
Proc. INTERSPEECH page: 2685 - 2689 2024.9
-
Challenge of singing voice synthesis using only text-to-speech corpus with FIRNet source-filter neural vocoder Reviewed
T. Okamoto, Y. Ohtani, S. Shimizu, T. Toda, H. Kawai
Proc. INTERSPEECH page: 1870 - 1874 2024.9
-
Discriminative neighborhood smoothing for generative anomalous sound detection Reviewed
T. Fujimura, K. Imoto, T. Toda
Proc. EUSIPCO page: 156 - 160 2024.8
-
Unsupervised training of neural network-based virtual microphone estimator Reviewed
J. Wang, T. Toda
Proc. EUSIPCO page: 256 - 260 2024.8
-
Robust sequence-to-sequence voice conversion for electrolaryngeal speech enhancement in noisy and reverberant conditions Reviewed
D. Ma, Y. Choi, F. Li, C. Xie, K. Kobayashi, T. Toda
Proc. IEEE EMBC page: 4 pages 2024.7
-
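Limitations of MOS evaluation of speech and new possibilities for large-scale comparative evaluation (in Japanese) Invited Reviewed
Y. Yasuda, T. Toda
Journal of the Acoustical Society of Japan Vol. 80 ( 7 ) page: 393 - 400 2024.7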
-
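Objective evaluation of synthetic speech and the VoiceMOS Challenge (in Japanese) Invited Reviewed International coauthorship
E. Cooper, W.-C. Huang, Y. Tsao, H.-M. Wang, T. Toda, J. Yamagishi
Journal of the Acoustical Society of Japan Vol. 80 ( 7 ) page: 381 - 392 2024.7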
-
A review on subjective and objective evaluation of synthetic speech Invited Reviewed International coauthorship
E. Cooper, W.-C. Huang, Y. Tsao, H.-M. Wang, T. Toda, J. Yamagishi
Acoustical Science and Technology Vol. 45 ( 4 ) page: 161 - 183 2024.7
-
Mandarin speech reconstruction from tongue motion ultrasound images based on generative adversarial networks Reviewed International coauthorship
F. Li, F. Shen, D. Ma, S. Zhang, J. Zhou, L. Wang, F. Fan, T. Liu, X. Chen, T. Toda, H. Niu
Proc. IEEE EMBC page: 4 pages 2024.7
-
Unequally spaced sound field interpolation for rotation-robust beamforming Reviewed
S. Luan, Y. Wakabayashi, T. Toda
IEEE/ACM Transactions on Audio, Speech and Language Processing Vol. 32 page: 3185 - 3199 2024.6
-
Pretraining and adaptation techniques for electrolaryngeal speech recognition Reviewed
L.P. Violeta, D. Ma, W.-C. Huang, T. Toda
IEEE/ACM Transactions on Audio, Speech and Language Processing Vol. 32 page: 2777 - 2789 2024.5
-
Audio difference learning for audio captioning Reviewed
T. Komatsu, Y. Fujita, K. Takeda, T. Toda
Proc. IEEE ICASSP page: 1456 - 1460 2024.4