Updated on 2026/02/19

HUANG Wen Chin
 
Organization
Graduate School of Informatics, Department of Intelligent Systems, Assistant Professor
Graduate School
Graduate School of Informatics
Undergraduate School
School of Informatics Department of Computer Science
Title
Assistant Professor
Profile
Received a B.S. from National Taiwan University, Taiwan, in 2018, an M.S. from Nagoya University in 2021, and a Ph.D. from the same university in 2024. Served as a research assistant at the Institute of Information Science, Academia Sinica, Taiwan, from 2017 to 2019. Currently an Assistant Professor at the Graduate School of Informatics, Nagoya University. Co-organizer of the Voice Conversion Challenge 2020 and the VoiceMOS Challenge 2022. Researches applications of deep learning to speech processing, with a focus on voice conversion and speech quality assessment. Recipient of the ISCSLP 2018 Best Student Paper Award and the APSIPA ASC 2021 Best Paper Award.

Degree 3

  1. Doctor of Informatics ( 2024.3   Nagoya University ) 

  2. Master of Informatics ( 2021.3   Nagoya University ) 

  3. Bachelor of Science ( 2018.6   National Taiwan University ) 

Research Interests 8

  1. speech quality assessment

  2. voice conversion

  3. speech information processing

  4. speech processing

  5. speech synthesis

  6. voice conversion

  7. automatic speech evaluation

  8. deep learning

Research Areas 2

  1. Informatics / Perceptual information processing

  2. Informatics / Perceptual information processing  / speech information processing

Research History 3

  1. Nagoya University   Graduate School of Informatics   Assistant Professor

    2024.4

  2. Google DeepMind   Student researcher

    2023.4 - 2024.3

    Country:Japan

  3. Japan Society for the Promotion of Science

    2021.4 - 2024.3

    Country:Japan

Education 1

  1. Nagoya University   Graduate School of Informatics   Department of Intelligent Systems

    2021.4 - 2024.3

    Country: Japan

Professional Memberships 4

  1. The Institute of Electronics, Information and Communication Engineers (IEICE)

    2025.4

  2. Acoustical Society of Japan

    2024.4

  3. IEEE

    2020

  4. ISCA

    2019

Committee Memberships 5

  1. VoiceMOS Challenge   Organizing Committee Member  

    2024   

  2. Singing Voice Conversion Challenge   Organizing Committee Member  

    2023   

  3. VoiceMOS Challenge   Organizing Committee Member  

    2023   

  4. VoiceMOS Challenge   Organizing Committee Member  

    2022   

  5. Voice Conversion Challenge   Organizing Committee Member  

    2020   

 

Papers 27

  1. Severity-controllable pathological text-to-speech synthesis for clinical applications Open Access

    Bence Mark Halpern, Wen-Chin Huang, Lester Phillip Violeta, Tomoki Toda

    IEEE Transactions on Neural Systems and Rehabilitation Engineering   Vol. 34   page: 573 - 582   2026

    Publishing type:Research paper (scientific journal)  

    DOI: 10.1109/TNSRE.2026.3651761

    Open Access

    Web of Science

    Scopus

    PubMed

  2. VERSA-v2: a modular and scalable toolkit for speech and audio evaluation with expanded metrics, visualization, and LLM integration Reviewed

    Jiatong Shi, Bo-Hao Su, Shikhar Bharadwaj, Yiwen Zhao, Shih-Heng Wang, Jionghao Han, Haoran Wang, Wei Wang, Wenhao Feng, Yuxun Tang, Siddhant Arora, Jinchuan Tian, William Chen, Hye-jin Shim, Wangyou Zhang, Wen-Chin Huang, Shinji Watanabe

    IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)     2025.12

    Language:English   Publishing type:Research paper (international conference proceedings)  

  3. HighRateMOS: sampling-rate aware modeling for speech quality assessment Reviewed

    Wenze Ren, Yi-Cheng Lin, Wen-Chin Huang, Ryandhimas E. Zezario, Szu-Wei Fu, Sung-Feng Huang, Erica Cooper, Haibin Wu, Hung-Yu Wei, Hsin-Min Wang, Hung-yi Lee, Yu Tsao

    IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)     2025.12

    Language:English   Publishing type:Research paper (international conference proceedings)  

  4. The AudioMOS Challenge 2025 Reviewed

    Wen-Chin Huang, Hui Wang, Cheng Liu, Yi-Chiao Wu, Andros Tjandra, Wei-Ning Hsu, Erica Cooper, Yong Qin, Tomoki Toda

    IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)     2025.12

    Authorship:Lead author, Corresponding author   Language:English   Publishing type:Research paper (international conference proceedings)  

  5. Hierarchical Symbolic Music Generation with Variational Autoencoder-Based Bar-Wise Feature Sequences Reviewed

    Keito Sawada, Wen-Chin Huang, Tomoki Toda

    2025 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)     page: 299 - 304   2025.10

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/apsipaasc65261.2025.11249414

  6. Advancing Speech Quality Assessment Through Scientific Challenges and Open-Source Activities Invited Reviewed

    Wen-Chin Huang

    2025 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)     page: 2552 - 2557   2025.10

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/apsipaasc65261.2025.11249197

  7. An Evaluation of Supervised Virtual Microphone Estimators in Reverberant Sound Fields Reviewed

    Kimihiro Hattori, Wen-Chin Huang, Kazuya Takeda, Tomoki Toda

    2025 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)     page: 125 - 130   2025.10

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/apsipaasc65261.2025.11249347

  8. Designing a Music Difficulty Measure for Controllable Automatic Piano Rearrangement Reviewed

    Hikari Miyaji, Keito Sawada, Wen-Chin Huang, Tomoki Toda

    2025 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)     page: 246 - 251   2025.10

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/apsipaasc65261.2025.11249163

  9. Disfluency Disentanglement Enhancement in Spoken-Text-Style Transfer for Spontaneous Speech Synthesis Reviewed

    Yuuto Nakata, Daiki Yoshioka, Wen-Chin Huang, Tomoki Toda

    2025 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)     page: 1098 - 1103   2025.10

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/apsipaasc65261.2025.11248977

  10. Estimating Speaker's Seating Position from Monaural Speech in a Simulated Vehicle Interior Sound Field Reviewed

    Masataka Kaneko, Wen-Chin Huang, Tomoki Toda

    2025 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)     page: 625 - 629   2025.10

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/apsipaasc65261.2025.11249106

  11. Adjusting Bias in Anomaly Scores via Variance Minimization for Domain-Generalized Discriminative Anomalous Sound Detection Reviewed

    Masaaki Matsumoto, Takuya Fujimura, Wen-Chin Huang, Tomoki Toda

    Proceedings of the 10th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2025)     page: 25 - 29   2025.10

    Language:English   Publishing type:Research paper (international conference proceedings)  

  12. Resolving Domain Mismatches in Electrolaryngeal Speech Enhancement With Linguistic Intermediates

    Lester Phillip Violeta, Wen-Chin Huang, Ding Ma, Ryuichi Yamamoto, Kazuhiro Kobayashi, Tomoki Toda

    IEEE Journal of Selected Topics in Signal Processing   Vol. 19 ( 5 ) page: 827 - 839   2025.7

    Publishing type:Research paper (scientific journal)  

    DOI: 10.1109/JSTSP.2025.3584195

    Web of Science

    Scopus

  13. Investigating Factors Related to the Naturalness of Synthesized Unison Singing

    Nishizawa K., Yamamoto R., Huang W.C., Toda T.

    ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings     2025

    Publisher:ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings  

    Singing voice synthesis (SVS) technology has progressed rapidly in recent years. However, vocal ensemble synthesis has not yet been widely explored. In this work, we focus on unison singing, in which several singers sing the same melody together. Our goal is to understand what acoustic properties affect the naturalness of synthesized unison singing. We utilize NNSVS, an SVS toolkit that allows us to manipulate individual acoustic features, including timing, f0, and spectrum features, in a fully data-driven manner to investigate their effect on unison singing synthesis. Through listening tests, it was shown that the fluctuation in timing and f0 is an important factor in synthesizing natural unison singing. Furthermore, we discovered the potential to generate unison singing using an SVS model trained only with a single-singer dataset.

    DOI: 10.1109/ICASSP49660.2025.10889744

    Scopus

  14. Serenade: A Singing Style Conversion Framework Based On Audio Infilling

    Violeta L.P., Huang W.C., Toda T.

    European Signal Processing Conference     page: 411 - 415   2025

    Publisher:European Signal Processing Conference  

    We propose Serenade, a novel framework for the singing style conversion (SSC) task. Although singer identity conversion has made great strides in the previous years, converting the singing style of a singer has been an unexplored research area. We find three main challenges in SSC: modeling the target style, disentangling source style, and retaining the source melody. To model the target singing style, we use an audio infilling task by predicting a masked segment of the target mel-spectrogram with a flow-matching model using the complement of the masked target mel-spectrogram along with disentangled acoustic features. On the other hand, to disentangle the source singing style, we use a cyclic training approach, where we use synthetic converted samples as source inputs and reconstruct the original source mel-spectrogram as a target. Finally, to retain the source melody better, we investigate a post-processing module using a source-filter-based vocoder and resynthesize the converted waveforms using the original F0 patterns. Our results showed that the Serenade framework can handle generalized SSC tasks with the best overall similarity score, especially in modeling breathy and mixed singing styles. We also found that resynthesizing with the original F0 patterns alleviated out-of-tune singing and improved naturalness, but found a slight tradeoff in similarity due to not changing the F0 patterns into the target style.

    DOI: 10.23919/EUSIPCO63237.2025.11226227

    Scopus

  15. Serenade: A Singing Style Conversion Framework Based on Audio Infilling.

    Lester Phillip Violeta, Wen-Chin Huang, Tomoki Toda

    EUSIPCO     page: 411 - 415   2025

    Publishing type:Research paper (international conference proceedings)  

    Other Link: https://dblp.uni-trier.de/rec/conf/eusipco/2025

  16. VAE-SiFiGAN: Source-Filter HiFi-GAN Based on Variational Autoencoder Representations with Enhanced Pitch Controllability

    Ogita K., Yoneyama R., Huang W.C., Toda T.

    European Signal Processing Conference     page: 531 - 535   2025

    Publisher:European Signal Processing Conference  

    Source-filter HiFi-GAN (SiFi-GAN) is a neural vocoder offering fast, high-quality voice synthesis with fundamental frequency (F0) controllability. However, SiFi-GAN takes hand-crafted acoustic features from traditional signal processing as input, causing some limitations, such as sound quality degradation in F0 extrapolation. This paper proposes VAE-SiFiGAN, which learns latent representations from Mel-spectrograms via a variational autoencoder (VAE). The latent representations learned through the probabilistic framework enable SiFi-GAN to better model the stochastic components in speech signals, achieving sound quality improvements in F0 modification. Furthermore, to address the insufficient F0 controllability caused by the entanglement of Mel-spectrograms and F0 information, we propose to guide the latent representation learning process with hand-crafted features less affected by F0 and used only during training. Experimental results show that VAE-SiFiGAN achieves superior F0 controllability compared to SiFi-GAN.

    DOI: 10.23919/EUSIPCO63237.2025.11226579

    Scopus

  17. VAE-SiFiGAN: Source-Filter HiFi-GAN Based on Variational Autoencoder Representations with Enhanced Pitch Controllability.

    Kenichi Ogita, Reo Yoneyama, Wen-Chin Huang, Tomoki Toda

    EUSIPCO     page: 531 - 535   2025

    Publishing type:Research paper (international conference proceedings)  

    Other Link: https://dblp.uni-trier.de/rec/conf/eusipco/2025

  18. Music Similarity Representation Learning Focusing on Individual Instruments with Source Separation and Human Preference Invited Reviewed Open Access

    Takehiro Imamura, Yuka Hashizume, Wen-Chin Huang, Tomoki Toda

    APSIPA Transactions on Signal and Information Processing   Vol. 14 ( 4 )   2025

    Language:English   Publishing type:Research paper (scientific journal)   Publisher:Emerald  

    DOI: 10.1561/116.20250016

    Open Access

    Web of Science

    Scopus

  19. Investigating Factors Related to the Naturalness of Synthesized Unison Singing

    Nishizawa, K; Yamamoto, R; Huang, WC; Toda, T

    2025 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)     2025

  20. A review on subjective and objective evaluation of synthetic speech

    Cooper Erica, Huang Wen-Chin, Tsao Yu, Wang Hsin-Min, Toda Tomoki, Yamagishi Junichi

    Acoustical Science and Technology   Vol. 45 ( 4 ) page: 161 - 183   2024.7

    Language:English   Publishing type:Research paper (scientific journal)   Publisher:ACOUSTICAL SOCIETY OF JAPAN  

    Evaluating synthetic speech generated by machines is a complicated process, as it involves judging along multiple dimensions including naturalness, intelligibility, and whether the intended purpose is fulfilled. While subjective listening tests conducted with human participants have been the gold standard for synthetic speech evaluation, its costly process design has also motivated the development of automated objective evaluation protocols. In this review, we first provide a historical view of listening test methodologies, from early in-lab comprehension tests to recent large-scale crowdsourcing mean opinion score (MOS) tests. We then recap the development of automatic measures, ranging from signal-based metrics to model-based approaches that utilize deep neural networks or even the latest self-supervised learning techniques. We also describe the VoiceMOS Challenge series, a scientific event we founded that aims to promote the development of data-driven synthetic speech evaluation. Finally, we provide insights into unsolved issues in this field as well as future prospects. This review is expected to serve as an entry point for early academic researchers to enrich their knowledge in this field, as well as speech synthesis practitioners to catch up on the latest developments.

    DOI: 10.1250/ast.e24.12

    Web of Science

    Scopus

    CiNii Research

  21. Objective assessment of synthetic speech and the VoiceMOS Challenge

    Cooper Erica, Huang Wen-Chin, Tsao Yu, Wang Hsin-Min, Toda Tomoki, Yamagishi Junichi

    THE JOURNAL OF THE ACOUSTICAL SOCIETY OF JAPAN   Vol. 80 ( 7 ) page: 381 - 392   2024.7

    Language:Japanese   Publisher:Acoustical Society of Japan  

    DOI: 10.20697/jasj.80.7_381

    CiNii Research

  22. Pretraining and Adaptation Techniques for Electrolaryngeal Speech Recognition.

    Lester Phillip Violeta, Ding Ma, Wen-Chin Huang, Tomoki Toda

    IEEE ACM Trans. Audio Speech Lang. Process.   Vol. 32   page: 2777 - 2789   2024

    Publishing type:Research paper (scientific journal)  

    DOI: 10.1109/TASLP.2024.3402557

    Web of Science

    Scopus

  23. A Large-Scale Evaluation of Speech Foundation Models.

    Shu-Wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li 0001, Abdelrahman Mohamed, Shinji Watanabe 0001, Hung-yi Lee

    IEEE ACM Trans. Audio Speech Lang. Process.   Vol. 32   page: 2884 - 2899   2024

    Publishing type:Research paper (scientific journal)  

    DOI: 10.1109/TASLP.2024.3389631

    Web of Science

    Scopus

  24. Electrolaryngeal Speech Intelligibility Enhancement through Robust Linguistic Encoders. Open Access

    Lester Phillip Violeta, Wen-Chin Huang, Ding Ma, Ryuichi Yamamoto, Kazuhiro Kobayashi, Tomoki Toda

    ICASSP     page: 10961 - 10965   2024

    Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/ICASSP48485.2024.10447197

    Web of Science

    Scopus

    Other Link: https://dblp.uni-trier.de/db/conf/icassp/icassp2024.html#VioletaHMYKT24

  25. AAS-VC: On the Generalization Ability of Automatic Alignment Search based Non-autoregressive Sequence-to-sequence Voice Conversion.

    HUANG Wen-Chin, Kazuhiro Kobayashi, Tomoki Toda

    Proceedings of the Acoustical Society of Japan Meeting (CD-ROM)   Vol. 2024   2024

  26. THE VOICEMOS CHALLENGE 2024: BEYOND SPEECH QUALITY PREDICTION

    Huang, WC; Fu, SW; Cooper, E; Zezario, RE; Toda, T; Wang, HM; Yamagishi, J; Tsao, Y

    2024 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT     page: 803 - 810   2024

    Publisher:Proceedings of 2024 IEEE Spoken Language Technology Workshop Slt 2024  

    We present the third edition of the VoiceMOS Challenge, a scientific initiative designed to advance research into automatic prediction of human speech ratings. There were three tracks. The first track was on predicting the quality of 'zoomed-in' high-quality samples from speech synthesis systems. The second track was to predict ratings of samples from singing voice synthesis and voice conversion with a large variety of systems, listeners, and languages. The third track was semi-supervised quality prediction for noisy, clean, and enhanced speech, where a very small amount of labeled training data was provided. Among the eight teams from both academia and industry, we found that many were able to outperform the baseline systems. Successful techniques included retrieval-based methods and the use of non-self-supervised representations like spectrograms and pitch histograms. These results showed that the challenge has advanced the field of subjective speech rating prediction.

    DOI: 10.1109/SLT61566.2024.10832295

    Web of Science

    Scopus

  27. Multi-Speaker Text-to-Speech Training With Speaker Anonymized Data

    Wen-Chin Huang, Yi-Chiao Wu, Tomoki Toda

    IEEE Signal Processing Letters   Vol. 31   page: 2995 - 2999   2024

    Publishing type:Research paper (scientific journal)  

    DOI: 10.1109/LSP.2024.3482701

    Web of Science

    Scopus

MISC 1

  1. AAS-VC: On the Generalization Ability of Automatic Alignment Search based Non-autoregressive Sequence-to-sequence Voice Conversion.

    HUANG Wen-Chin, Kazuhiro Kobayashi, Tomoki Toda

    Proceedings of the Acoustical Society of Japan Meeting (CD-ROM)   Vol. 2024   2024

Presentations 5

  1. Challenges in self-supervised speech representation-based voice conversion Invited

    Wen-Chin Huang

    ASA-ASJ Joint Meeting  2025.12.3 

    Event date: 2025.12

    Language:English   Presentation type:Oral presentation (invited, special)  

  2. Automatic quality assessment for speech and beyond International conference

    Wen-Chin Huang, Erica Cooper, Jiatong Shi

    INTERSPEECH  2025.8.17 

    Event date: 2025.8

    Language:English   Presentation type:Public lecture, seminar, tutorial, course, or other speech  

  3. Fundamentals, Prospectives and Challenges in Deep-learning based Voice Conversion Invited

    HUANG Wen-Chin

    Research Center for Information Technology Innovation (CITI), Academia Sinica  2024.8.14 

    Presentation type:Public lecture, seminar, tutorial, course, or other speech  

  4. Progress and Future Perspectives on Deep-learning based Voice Conversion Invited

    HUANG Wen-Chin

    2024.10.22 

    Language:Japanese   Presentation type:Oral presentation (invited, special)  

  5. Automatic quality assessment for speech and beyond Invited

    Wen-Chin Huang

    Conversational AI Reading Group, Mila/Concordia University  2025.5.15 

    Language:English   Presentation type:Public lecture, seminar, tutorial, course, or other speech  

Research Project for Joint Research, Competitive Funding, etc. 4

  1. Universal, Explainable and Extensible Automatic Evaluation of Synthesized Speech

    Grant number:25K00143  2025.2 - 2029.3

    Grants-in-Aid for Scientific Research  Scientific Research (B)

    Erica Cooper, Wen-Chin Huang

    Grant amount: ¥18,720,000 ( Direct Cost: ¥14,400,000, Indirect Cost: ¥4,320,000 )

  2. Audiobox Responsible Generation Grant

    2024.11

    Unrestricted Research Gift

    Authorship:Principal investigator 

  3. Google Research Grant

    2024.9

    Unrestricted gift

    Authorship:Principal investigator 

  4. Augmented speech communication using multi-modal signals with real-time, low-latency voice conversion

    Grant number:22KJ1519  2023.3 - 2024.3

    Grants-in-Aid for Scientific Research  Grant-in-Aid for JSPS Fellows

    Grant amount: ¥2,200,000 ( Direct Cost: ¥2,200,000 )

KAKENHI (Grants-in-Aid for Scientific Research) 1

  1. Augmented speech communication using multi-modal signals with real-time, low-latency voice conversion

    Grant number:21J20920  2021.4 - 2024.3

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for JSPS Fellows

 

Teaching Experience (On-campus) 2

  1. Exercises in Probability and Statistics

    2024

  2. Exercises in Programming 2

    2024

Teaching Experience (Off-campus) 4

  1. Data processing tools

    2025.7 (Nagoya University)

  2. Computer Science Experiments

    2025.4 (Nagoya University)

  3. Programming 2

    2024.11 (Nagoya University)

  4. Exercises in Probability and Statistics

    2024.10 (Nagoya University, School of Informatics)