Updated on 2026/02/19

HUANG Wen Chin
 
Organization
Graduate School of Informatics, Department of Intelligent Systems, Assistant Professor
Graduate School
Graduate School of Informatics
Undergraduate School
School of Informatics Department of Computer Science
Title
Assistant Professor
Profile
Received a B.S. from National Taiwan University, Taiwan, in 2018, an M.S. from Nagoya University in 2021, and a Ph.D. from the same university in 2024. Served as a research assistant at the Institute of Information Science, Academia Sinica, Taiwan, from 2017 to 2019. Currently an Assistant Professor at the Graduate School of Informatics, Nagoya University. Co-organizer of the Voice Conversion Challenge 2020 and the VoiceMOS Challenge 2022. Researches applications of deep learning to speech processing, with a focus on voice conversion and speech quality assessment. Recipient of the ISCSLP 2018 Best Student Paper Award and the APSIPA ASC 2021 Best Paper Award.

Degree 3

  1. Doctor of Informatics ( 2024.3   Nagoya University ) 

  2. Master of Informatics ( 2021.3   Nagoya University ) 

  3. Bachelor of Science ( 2018.6   National Taiwan University ) 

Research Interests 8

  1. speech quality assessment

  2. voice conversion

  3. speech information processing

  4. speech processing

  5. speech synthesis

  6. voice conversion

  7. automatic speech evaluation

  8. deep learning

Research Areas 2

  1. Informatics / Perceptual information processing

  2. Informatics / Perceptual information processing  / speech information processing

Research History 3

  1. Nagoya University   Graduate School of Informatics   Assistant Professor

    2024.4

  2. Google DeepMind   Student researcher

    2023.4 - 2024.3

    Country:Japan

  3. Japan Society for the Promotion of Science

    2021.4 - 2024.3

    Country:Japan

Education 1

  1. Nagoya University   Graduate School of Informatics   Department of Intelligent Systems

    2021.4 - 2024.3

    Country: Japan

Professional Memberships 4

  1. The Institute of Electronics, Information and Communication Engineers (IEICE)

    2025.4

  2. Acoustical Society of Japan

    2024.4

  3. IEEE

    2020

  4. ISCA

    2019

Committee Memberships 5

  1. VoiceMOS Challenge   Organizing Committee Member  

    2024   

  2. Singing Voice Conversion Challenge   Organizing Committee Member  

    2023   

  3. VoiceMOS Challenge   Organizing Committee Member  

    2023   

  4. VoiceMOS Challenge   Organizing Committee Member  

    2022   

  5. Voice Conversion Challenge   Organizing Committee Member  

    2020   

 

Papers 27

  1. Severity-controllable pathological text-to-speech synthesis for clinical applications Open Access

    Bence Mark Halpern, Wen-Chin Huang, Lester Phillip Violeta, Tomoki Toda

    IEEE Transactions on Neural Systems and Rehabilitation Engineering   Vol. 34   page: 573 - 582   2026

    Publishing type:Research paper (scientific journal)  

    DOI: 10.1109/TNSRE.2026.3651761

    Open Access

    Web of Science

    Scopus

    PubMed

  2. VERSA-v2: a modular and scalable toolkit for speech and audio evaluation with expanded metrics, visualization, and LLM integration Reviewed

    Jiatong Shi, Bo-Hao Su, Shikhar Bharadwaj, Yiwen Zhao, Shih-Heng Wang, Jionghao Han, Haoran Wang, Wei Wang, Wenhao Feng, Yuxun Tang, Siddhant Arora, Jinchuan Tian, William Chen, Hye-jin Shim, Wangyou Zhang, Wen-Chin Huang, Shinji Watanabe

    IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)     2025.12

    Language:English   Publishing type:Research paper (international conference proceedings)  

  3. HighRateMOS: sampling-rate aware modeling for speech quality assessment Reviewed

    Wenze Ren, Yi-Cheng Lin, Wen-Chin Huang, Ryandhimas E. Zezario, Szu-Wei Fu, Sung-Feng Huang, Erica Cooper, Haibin Wu, Hung-Yu Wei, Hsin-Min Wang, Hung-yi Lee, Yu Tsao

    IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)     2025.12

    Language:English   Publishing type:Research paper (international conference proceedings)  

  4. The AudioMOS Challenge 2025 Reviewed

    Wen-Chin Huang, Hui Wang, Cheng Liu, Yi-Chiao Wu, Andros Tjandra, Wei-Ning Hsu, Erica Cooper, Yong Qin, Tomoki Toda

    IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)     2025.12

    Authorship:Lead author, Corresponding author   Language:English   Publishing type:Research paper (international conference proceedings)  

  5. Hierarchical Symbolic Music Generation with Variational Autoencoder-Based Bar-Wise Feature Sequences Reviewed

    Keito Sawada, Wen-Chin Huang, Tomoki Toda

    2025 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)     page: 299 - 304   2025.10

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/apsipaasc65261.2025.11249414

  6. Advancing Speech Quality Assessment Through Scientific Challenges and Open-Source Activities Invited Reviewed

    Wen-Chin Huang

    2025 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)     page: 2552 - 2557   2025.10

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/apsipaasc65261.2025.11249197

  7. An Evaluation of Supervised Virtual Microphone Estimators in Reverberant Sound Fields Reviewed

    Kimihiro Hattori, Wen-Chin Huang, Kazuya Takeda, Tomoki Toda

    2025 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)     page: 125 - 130   2025.10

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/apsipaasc65261.2025.11249347

  8. Designing a Music Difficulty Measure for Controllable Automatic Piano Rearrangement Reviewed

    Hikari Miyaji, Keito Sawada, Wen-Chin Huang, Tomoki Toda

    2025 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)     page: 246 - 251   2025.10

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/apsipaasc65261.2025.11249163

  9. Disfluency Disentanglement Enhancement in Spoken-Text-Style Transfer for Spontaneous Speech Synthesis Reviewed

    Yuuto Nakata, Daiki Yoshioka, Wen-Chin Huang, Tomoki Toda

    2025 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)     page: 1098 - 1103   2025.10

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/apsipaasc65261.2025.11248977

  10. Estimating Speaker's Seating Position from Monaural Speech in a Simulated Vehicle Interior Sound Field Reviewed

    Masataka Kaneko, Wen-Chin Huang, Tomoki Toda

    2025 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)     page: 625 - 629   2025.10

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    DOI: 10.1109/apsipaasc65261.2025.11249106

  11. Adjusting Bias in Anomaly Scores via Variance Minimization for Domain-Generalized Discriminative Anomalous Sound Detection Reviewed

    Masaaki Matsumoto, Takuya Fujimura, Wen-Chin Huang, Tomoki Toda

    Proceedings of the 10th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2025)     page: 25 - 29   2025.10

    Language:English   Publishing type:Research paper (international conference proceedings)  

  12. Resolving Domain Mismatches in Electrolaryngeal Speech Enhancement With Linguistic Intermediates

    Lester Phillip Violeta, Wen-Chin Huang, Ding Ma, Ryuichi Yamamoto, Kazuhiro Kobayashi, Tomoki Toda

    IEEE Journal of Selected Topics in Signal Processing   Vol. 19 ( 5 ) page: 827 - 839   2025.7

    Publishing type:Research paper (scientific journal)  

    DOI: 10.1109/JSTSP.2025.3584195

    Web of Science

    Scopus

  13. Investigating Factors Related to the Naturalness of Synthesized Unison Singing

    Nishizawa K., Yamamoto R., Huang W.C., Toda T.

    ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings     2025

    Publisher:ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings  

    Singing voice synthesis (SVS) technology has progressed rapidly in recent years. However, vocal ensemble synthesis has not yet been widely explored. In this work, we focus on unison singing, in which several singers sing the same melody together. Our goal is to understand what acoustic properties affect the naturalness of synthesized unison singing. We utilize NNSVS, an SVS toolkit that allows us to manipulate individual acoustic features, including timing, f0, and spectrum features, in a fully data-driven manner to investigate their effect on unison singing synthesis. Through listening tests, it was shown that the fluctuation in timing and f0 is an important factor in synthesizing natural unison singing. Furthermore, we discovered the potential to generate unison singing using an SVS model trained only with a single-singer dataset.

    DOI: 10.1109/ICASSP49660.2025.10889744

    Scopus

  14. Serenade: A Singing Style Conversion Framework Based On Audio Infilling

    Violeta L.P., Huang W.C., Toda T.

    European Signal Processing Conference     page: 411 - 415   2025

    Publisher:European Signal Processing Conference  

    We propose Serenade, a novel framework for the singing style conversion (SSC) task. Although singer identity conversion has made great strides in the previous years, converting the singing style of a singer has been an unexplored research area. We find three main challenges in SSC: modeling the target style, disentangling source style, and retaining the source melody. To model the target singing style, we use an audio infilling task by predicting a masked segment of the target mel-spectrogram with a flow-matching model using the complement of the masked target mel-spectrogram along with disentangled acoustic features. On the other hand, to disentangle the source singing style, we use a cyclic training approach, where we use synthetic converted samples as source inputs and reconstruct the original source mel-spectrogram as a target. Finally, to retain the source melody better, we investigate a post-processing module using a source-filter-based vocoder and resynthesize the converted waveforms using the original F0 patterns. Our results showed that the Serenade framework can handle generalized SSC tasks with the best overall similarity score, especially in modeling breathy and mixed singing styles. We also found that resynthesizing with the original F0 patterns alleviated out-of-tune singing and improved naturalness, but found a slight tradeoff in similarity due to not changing the F0 patterns into the target style.

    DOI: 10.23919/EUSIPCO63237.2025.11226227

    Scopus

  15. Serenade: A Singing Style Conversion Framework Based on Audio Infilling.

    Lester Phillip Violeta, Wen-Chin Huang, Tomoki Toda

    EUSIPCO     page: 411 - 415   2025

    Publishing type:Research paper (international conference proceedings)  

    Other Link: https://dblp.uni-trier.de/rec/conf/eusipco/2025

  16. VAE-SiFiGAN: Source-Filter HiFi-GAN Based on Variational Autoencoder Representations with Enhanced Pitch Controllability

    Ogita K., Yoneyama R., Huang W.C., Toda T.

    European Signal Processing Conference     page: 531 - 535   2025

    Publisher:European Signal Processing Conference  

    Source-filter HiFi-GAN (SiFi-GAN) is a neural vocoder offering fast, high-quality voice synthesis with fundamental frequency (F0) controllability. However, SiFi-GAN takes hand-crafted acoustic features from traditional signal processing as input, causing some limitations, such as sound quality degradation in F0 extrapolation. This paper proposes VAE-SiFiGAN, which learns latent representations from Mel-spectrograms via a variational autoencoder (VAE). The latent representations learned through the probabilistic framework enable SiFi-GAN to better model the stochastic components in speech signals, achieving sound quality improvements in F0 modification. Furthermore, to address the insufficient F0 controllability caused by the entanglement of Mel-spectrograms and F0 information, we propose to guide the latent representation learning process with hand-crafted features less affected by F0 and used only during training. Experimental results show that VAE-SiFiGAN achieves superior F0 controllability compared to SiFi-GAN.

    DOI: 10.23919/EUSIPCO63237.2025.11226579

    Scopus

  17. VAE-SiFiGAN: Source-Filter HiFi-GAN Based on Variational Autoencoder Representations with Enhanced Pitch Controllability.

    Kenichi Ogita, Reo Yoneyama, Wen-Chin Huang, Tomoki Toda

    EUSIPCO     page: 531 - 535   2025

    Publishing type:Research paper (international conference proceedings)  

    Other Link: https://dblp.uni-trier.de/rec/conf/eusipco/2025

  18. Music Similarity Representation Learning Focusing on Individual Instruments with Source Separation and Human Preference Invited Reviewed Open Access

    Takehiro Imamura, Yuka Hashizume, Wen-Chin Huang, Tomoki Toda

    APSIPA Transactions on Signal and Information Processing   Vol. 14 ( 4 )   2025

    Language:English   Publishing type:Research paper (scientific journal)   Publisher:Emerald  

    DOI: 10.1561/116.20250016

    Open Access

    Web of Science

    Scopus

  19. Investigating Factors Related to the Naturalness of Synthesized Unison Singing

    Nishizawa, K; Yamamoto, R; Huang, WC; Toda, T

    2025 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)     2025

  20. A review on subjective and objective evaluation of synthetic speech

    Cooper Erica, Huang Wen-Chin, Tsao Yu, Wang Hsin-Min, Toda Tomoki, Yamagishi Junichi

    Acoustical Science and Technology   Vol. 45 ( 4 ) page: 161 - 183   2024.7

    Language:English   Publishing type:Research paper (scientific journal)   Publisher:ACOUSTICAL SOCIETY OF JAPAN  

    Evaluating synthetic speech generated by machines is a complicated process, as it involves judging along multiple dimensions including naturalness, intelligibility, and whether the intended purpose is fulfilled. While subjective listening tests conducted with human participants have been the gold standard for synthetic speech evaluation, its costly process design has also motivated the development of automated objective evaluation protocols. In this review, we first provide a historical view of listening test methodologies, from early in-lab comprehension tests to recent large-scale crowdsourcing mean opinion score (MOS) tests. We then recap the development of automatic measures, ranging from signal-based metrics to model-based approaches that utilize deep neural networks or even the latest self-supervised learning techniques. We also describe the VoiceMOS Challenge series, a scientific event we founded that aims to promote the development of data-driven synthetic speech evaluation. Finally, we provide insights into unsolved issues in this field as well as future prospects. This review is expected to serve as an entry point for early academic researchers to enrich their knowledge in this field, as well as speech synthesis practitioners to catch up on the latest developments.

    DOI: 10.1250/ast.e24.12

    Web of Science

    Scopus

    CiNii Research

  21. Objective assessment of synthetic speech and the VoiceMOS Challenge

    Cooper Erica, Huang Wen-Chin, Tsao Yu, Wang Hsin-Min, Toda Tomoki, Yamagishi Junichi

    THE JOURNAL OF THE ACOUSTICAL SOCIETY OF JAPAN   Vol. 80 ( 7 ) page: 381 - 392   2024.7

    Language:Japanese   Publisher:Acoustical Society of Japan  

    DOI: 10.20697/jasj.80.7_381

    CiNii Research

  22. Pretraining and Adaptation Techniques for Electrolaryngeal Speech Recognition.

    Lester Phillip Violeta, Ding Ma, Wen-Chin Huang, Tomoki Toda

    IEEE ACM Trans. Audio Speech Lang. Process.   Vol. 32   page: 2777 - 2789   2024

    Publishing type:Research paper (scientific journal)  

    DOI: 10.1109/TASLP.2024.3402557

    Web of Science

    Scopus

  23. A Large-Scale Evaluation of Speech Foundation Models.

    Shu-Wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li 0001, Abdelrahman Mohamed, Shinji Watanabe 0001, Hung-yi Lee

    IEEE ACM Trans. Audio Speech Lang. Process.   Vol. 32   page: 2884 - 2899   2024

    Publishing type:Research paper (scientific journal)  

    DOI: 10.1109/TASLP.2024.3389631

    Web of Science

    Scopus

  24. Electrolaryngeal Speech Intelligibility Enhancement through Robust Linguistic Encoders. Open Access

    Lester Phillip Violeta, Wen-Chin Huang, Ding Ma, Ryuichi Yamamoto, Kazuhiro Kobayashi, Tomoki Toda

    ICASSP     page: 10961 - 10965   2024

    Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1109/ICASSP48485.2024.10447197

    Web of Science

    Scopus

    Other Link: https://dblp.uni-trier.de/db/conf/icassp/icassp2024.html#VioletaHMYKT24

  25. AAS-VC: On the Generalization Ability of Automatic Alignment Search based Non-autoregressive Sequence-to-sequence Voice Conversion.

    HUANG Wen-Chin, Kazuhiro Kobayashi, Tomoki Toda

    Proceedings of the Acoustical Society of Japan Meeting (CD-ROM)   Vol. 2024   2024

  26. THE VOICEMOS CHALLENGE 2024: BEYOND SPEECH QUALITY PREDICTION

    Huang, WC; Fu, SW; Cooper, E; Zezario, RE; Toda, T; Wang, HM; Yamagishi, J; Tsao, Y

    2024 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT     page: 803 - 810   2024

    Publisher:Proceedings of 2024 IEEE Spoken Language Technology Workshop Slt 2024  

    We present the third edition of the VoiceMOS Challenge, a scientific initiative designed to advance research into automatic prediction of human speech ratings. There were three tracks. The first track was on predicting the quality of 'zoomed-in' high-quality samples from speech synthesis systems. The second track was to predict ratings of samples from singing voice synthesis and voice conversion with a large variety of systems, listeners, and languages. The third track was semi-supervised quality prediction for noisy, clean, and enhanced speech, where a very small amount of labeled training data was provided. Among the eight teams from both academia and industry, we found that many were able to outperform the baseline systems. Successful techniques included retrieval-based methods and the use of non-self-supervised representations like spectrograms and pitch histograms. These results showed that the challenge has advanced the field of subjective speech rating prediction.

    DOI: 10.1109/SLT61566.2024.10832295

    Web of Science

    Scopus

  27. Multi-Speaker Text-to-Speech Training With Speaker Anonymized Data

    Wen-Chin Huang, Yi-Chiao Wu, Tomoki Toda

    IEEE Signal Processing Letters   Vol. 31   page: 2995 - 2999   2024

    Publishing type:Research paper (scientific journal)  

    DOI: 10.1109/LSP.2024.3482701

    Web of Science

    Scopus

MISC 1

  1. AAS-VC: On the Generalization Ability of Automatic Alignment Search based Non-autoregressive Sequence-to-sequence Voice Conversion.

    HUANG Wen-Chin, Kazuhiro Kobayashi, Tomoki Toda

    Proceedings of the Acoustical Society of Japan Meeting (CD-ROM)   Vol. 2024   2024

Presentations 5

  1. Challenges in self-supervised speech representation-based voice conversion Invited

    Wen-Chin Huang

    ASA-ASJ Joint Meeting  2025.12.3 

    Event date: 2025.12

    Language:English   Presentation type:Oral presentation (invited, special)  

  2. Automatic quality assessment for speech and beyond International conference

    Wen-Chin Huang, Erica Cooper, Jiatong Shi

    INTERSPEECH  2025.8.17 

    Event date: 2025.8

    Language:English   Presentation type:Public lecture, seminar, tutorial, course, or other speech  

  3. Fundamentals, Prospectives and Challenges in Deep-learning based Voice Conversion Invited

    HUANG Wen-Chin

    Research Center for Information Technology Innovation (CITI), Academia Sinica  2024.8.14 

    Presentation type:Public lecture, seminar, tutorial, course, or other speech  

  4. Progress and Future Perspectives on Deep-learning based Voice Conversion Invited

    HUANG Wen-Chin

    2024.10.22 

    Language:Japanese   Presentation type:Oral presentation (invited, special)  

  5. Automatic quality assessment for speech and beyond Invited

    Wen-Chin Huang

    Conversational AI Reading Group, Mila/Concordia University  2025.5.15 

    Language:English   Presentation type:Public lecture, seminar, tutorial, course, or other speech  

Research Project for Joint Research, Competitive Funding, etc. 4

  1. Universal, Explainable and Extensible Automatic Evaluation of Synthesized Speech

    Grant number:25K00143  2025.2 - 2029.3

    Grants-in-Aid for Scientific Research  Scientific Research (B)

    Erica Cooper, Wen-Chin Huang

    Grant amount: ¥18,720,000 ( Direct Cost: ¥14,400,000, Indirect Cost: ¥4,320,000 )

  2. Audiobox Responsible Generation Grant

    2024.11

    Unrestricted Research Gift

    Authorship:Principal investigator 

  3. Google Research Grant

    2024.9

    Unrestricted gift

    Authorship:Principal investigator 

  4. Augmented speech communication using multi-modal signals with real-time, low-latency voice conversion

    Grant number:22KJ1519  2023.3 - 2024.3

    Grants-in-Aid for Scientific Research  Grant-in-Aid for JSPS Fellows

    Grant amount: ¥2,200,000 ( Direct Cost: ¥2,200,000 )

KAKENHI (Grants-in-Aid for Scientific Research) 1

  1. Augmented speech communication using multi-modal signals with real-time, low-latency voice conversion

    Grant number:21J20920  2021.4 - 2024.3

    Japan Society for the Promotion of Science  Grants-in-Aid for Scientific Research  Grant-in-Aid for JSPS Fellows

 

Teaching Experience (On-campus) 2

  1. Exercises in Probability and Statistics

    2024

  2. Exercises in Programming 2

    2024

Teaching Experience (Off-campus) 4

  1. Data processing tools

    2025.7 (Nagoya University)

  2. Computer Science Experiments

    2025.4 (Nagoya University)

  3. Programming 2

    2024.11 (Nagoya University)

  4. Exercises in Probability and Statistics

    2024.10 (Nagoya University, School of Informatics)