A. 華語語音技術 Speech & Language Technologies for Mandarin Chinese (MC)
1. | Voice Dictation of Mandarin Chinese Lin-shan Lee IEEE Signal Processing Magazine Vol. 14, No. 4, Jul 1997, pp. 63-101 ( 本文為該期專刊中總共兩篇主題論文之一,是一篇完整探討華語語音辨識相關問題及技術之回顧性論文(review paper)。本論文讓我國國旗和中國大陸國旗並排在刊物封面上發行至全球圖書館展示及讀者手中,且我國國旗比中國大陸國旗略大,顯然因為海峽兩岸均說華語,而本文作者在台灣。This is a review paper presenting the problems and approaches in speech recognition for Mandarin Chinese. This is one of the two feature articles of the issue. The national flags of Taiwan (ROC) and China (PRC) were both shown on the cover of the issue sent to readers and libraries worldwide, with that of Taiwan (ROC) slightly larger, obviously because people in both Taiwan and China speak Chinese, but the author of the paper is in Taiwan. ) See more... | IJ-55 |
2. | The Synthesis Rules in A Chinese Text-to-speech System Lin-shan Lee, Chiu-yu Tseng, Ming Ouh-young IEEE Transactions on Acoustics, Speech and Signal Processing Vol. ASSP-37, No. 9, Sep 1989, pp. 1309-1320 ( 世上第一篇期刊論文詳細描述全球第一套華語語音合成技術(「文句翻語音」,Text-to-Speech)是如何研究出來並如何操作。The earliest journal paper in the world presenting all details regarding how the very first set of Chinese text-to-speech synthesis technologies were invented and how they worked. ) See more... | IJ-24 |
3. | An Efficient Natural Language Processing System Specially Designed for the Chinese Language Lin-shan Lee, Lee-feng Chien, Long-ji Lin, James Huang, K. J. Chen Computational Linguistics Vol. 17, No. 4, Dec 1991, pp. 347-374 ( 世上第一篇期刊論文詳細描述全球第一套電腦自動分析華文自然語言之句型結構之技術是如何研究出來並如何操作。The earliest journal paper in the world presenting all details regarding how the very first set of technologies for computer analysis of Chinese natural language sentence structures were invented and how they worked. ) See more... | IJ-37 |
4. | Golden Mandarin (I) - A Real-time Mandarin Speech Dictation Machine for Chinese Language with Very Large Vocabulary Lin-shan Lee, Chiu-yu Tseng, Hung-yan Gu, Fu-hua Liu, C. H. Chang, Y. H. Lin, Yumin Lee, S. L. Tu, S. H. Hsieh, C. H. Chen IEEE Transactions on Speech and Audio Processing Vol. 1, No. 2, Apr 1993, pp. 158-179 ( 詳細描述全球第一台極大詞彙華語語音辨識系統「金聲一號」(Golden Mandarin (I))是如何研究出來並如何操作之期刊論文。The journal paper presenting all details regarding how the very first Chinese speech recognition system with very large vocabulary in the world, Golden Mandarin (I), was invented and how it worked. ) See more... | IJ-41 |
5. | Structural Features of Chinese Language – Why Chinese Spoken Language Processing is Special and Where We Are Lin-shan Lee 1998 International Symposium on Chinese Spoken Language Processing, Singapore Dec 1998, pp. 1-15 keynote speech ( 早年有相當長時間,海峽兩岸學者是不相來往的。為了讓海峽兩岸及全球各地的華語語音研究者可以交流切磋,在各方促成下乃有了「華語語音國際研討會(International Symposium on Chinese Spoken Language Processing, ISCSLP)」,議定由新加坡、中國大陸、台灣、香港四地輪流主辦,兩年一次,1998年第一屆大會在新加坡,兩岸學者在新加坡融冰。第一屆大會的開幕主題演講(Opening Keynote)由首先完成領域內幾乎所有最重要的核心關鍵技術的李教授主講,海峽兩岸及港、新、世界各地所有相關領域的研究團隊之領袖人物及學者、研究者齊聚聆聽,李教授乃自動成為領域內公認的最重要領袖人物。本文就是他當時撰寫的演講內容,主軸是如何考慮華語語言特性,盤點已有技術,並展望未來網路世界的發展趨勢。For many years, researchers in Mainland China and Taiwan had no contact with each other. The International Symposium on Chinese Spoken Language Processing (ISCSLP) was established so that researchers in this field across the Taiwan Strait and around the world could exchange ideas; it is held every two years, rotating among Singapore, Mainland China, Taiwan and Hong Kong. The first ISCSLP was held in Singapore in 1998 as an “ice-breaking” event. As the researcher who had first developed almost all of the most important core technologies in the area, Professor Lee was invited to give the opening keynote. Leaders of almost all research groups working in related areas in Mainland China, Taiwan, Hong Kong, Singapore and elsewhere in the world attended the keynote. Professor Lee was thus naturally recognized as the most important leader in this area. This paper was written for that keynote; it focuses on how to handle the structural features of the Chinese language, reviews the existing technologies, and looks ahead to developments in the network environment. ) | IC-128 |
6. | Complete Recognition of Continuous Mandarin Speech for Chinese Language with Very Large Vocabulary Using Very Limited Training Data Hsin-Min Wang, Tai-Hsuan Ho, Rung-Chiung Yang, Jia-Lin Shen, Bo-Ren Bai, Jenn-Chau Hong, Wei-Peng Chen, Tong-Lo Yu, Lin-shan Lee IEEE Trans. Speech and Audio Processing Vol. 5, No. 2, Mar 1997, pp. 195-200 ( 詳細描述全球第一台可以辨識極大詞彙的連續長句的華語語音辨識系統「金聲三號工作站版」(Golden Mandarin (III) Work Station Version)是如何研究出來並如何操作之期刊論文。The journal paper presenting how the first continuous Chinese speech recognition system for long utterances and very large vocabulary, Golden Mandarin (III) Work Station Version, was invented and how it worked. ) See more... | IJ-53 |
7. | Special Speech Recognition Approaches for the Highly Confusing Mandarin Syllables Based on Hidden Markov Models Lin-shan Lee, Chiu-yu Tseng, Fu-hua Liu, C. H. Chang, Hung-yan Gu, S. H. Hsieh, C. H. Chen Computer Speech and Language Vol. 5, No. 2, Apr 1991, pp. 181-201 ( 詳細討論全球第一台極大詞彙華語語音辨識系統「金聲一號(Golden Mandarin (I))」中,為辨識華語共408個高度混淆的單音所使用的408個「隱藏式馬可夫模型(Hidden Markov Models)」是如何訓練出來的以及所有實驗結果的期刊論文。The journal paper explaining how the 408 Hidden Markov Models for recognizing the 408 highly confusing Mandarin syllables were trained with experimental results for the first Chinese speech recognition system in the world, Golden Mandarin (I). ) See more... | IJ-32 |
8. | Markov Modeling of Mandarin Chinese for Decoding the Phonetic Sequence into Chinese Characters Hung-yan Gu, Chiu-yu Tseng, Lin-shan Lee Computer Speech and Language Vol. 5, No. 4, Oct 1991, pp. 363-377 ( 詳細討論全球第一台極大詞彙華語語音辨識系統「金聲一號(Golden Mandarin (I))」中,為每一個辨識出來的單音,決定使用者所想要輸入的字,究竟是同音字中哪一個,所使用的選字模型的研究,考量及實驗結果的期刊論文。The journal paper describing how the character selection model, which was used for predicting the desired character out of the many homonym characters for the recognized syllable, was developed and considered with experimental results for the first Chinese speech recognition system in the world, Golden Mandarin (I). ) See more... | IJ-35 |
9. | Isolated Mandarin Syllable Recognition Based Upon the Segmental Probability Models (SPM) Ren-Yuan Lyu, I-Chung Hong, Jia-Lin Shen, Ming-Yu Lee, Lin-shan Lee IEEE Trans. Speech and Audio Processing Vol. 6, No. 3, May 1998, pp. 293-299 | IJ-57 |
10. | Automatic Selection of Phonetically Distributed Sentence Sets for Speaker Adaptation with Application to Large Vocabulary Mandarin Speech Recognition Jia-lin Shen, Hsin-min Wang, Ren-yuan Lyu, Lin-shan Lee Computer Speech and Language Vol. 13, No. 1, Jan. 1999, pp. 79-97 | IJ-58 |
11. | A Chinese Text-to-speech System Based on A Syllable Concatenation Model Ming Ouh-young, Chiu-yu Tseng, Lin-shan Lee 1986 International Conference on Acoustic, Speech and Signal Processing, IEEE, Tokyo, Japan Apr 1986, pp. 2439-2442 | IC-24 |
12. | A Chinese Natural Language Processing System Based Upon the Theory of Empty Categories Long-ji Lin, K. J. Chen, James Huang, Lin-shan Lee Fifth National Conference on Artificial Intelligence, AAAI, Philadelphia, PA, U.S.A. Aug 1986, pp. 1059-1062 | IC-26 |
13. | A Mandarin Dictation Machine Based Upon Chinese Natural Language Analysis Lin-shan Lee, Chiu-yu Tseng, K. J. Chen, James Huang The 10th International Joint Conference on Artificial Intelligence, AAAI, Milano, Italy Aug 1987, pp. 619-621 | IC-37 |
14. | A Real-time Mandarin Dictation Machine for Chinese Language with Unlimited Texts and Very Large Vocabulary Lin-shan Lee, Chiu-yu Tseng, Hung-yan Gu, Keh-jiann Chen, Fu-hua Liu, C. H. Chang, S. H. Hsieh, C. H. Chen International Conference on Acoustics, Speech and Signal Processing, IEEE, Albuquerque, NM, U.S.A. Apr 1990, pp. 65-68 | IC-50 |
15. | Golden Mandarin (III)-A User-Adaptive Prosodic-Segment-Based Mandarin Dictation Machine for Chinese Language with Very Large Vocabulary Ren-yuan Lyu, Lee-Feng Chien, Shiao-Hong Hwang, Hung-Yun Hsieh, Rung-Chiuan Yang, Bo-Ren Bai, Jia-Chi Weng, Yen-Ju Yang, Shi-Wei Lin, Keh-Jiann Chen, Chiu-Yu Tseng, Lin-shan Lee International Conference on Acoustics, Speech and Signal Processing, Detroit, U.S.A. May 1995, pp. 57-60 | IC-85 |
16. | Complete Recognition Of Continuous Mandarin Speech for Chinese Language with Very Large Vocabulary But Limited Training Data Hsin-min Wang, Jia-lin Shen, Yen-Ju Yang, Chiu-Yu Tseng, Lin-shan Lee International Conference on Acoustics, Speech and Signal Processing, Detroit, U.S.A. May 1995, pp. 61-64 | IC-86 |
B. 語音資訊搜尋 Speech Information Retrieval (SR)
17. | Spoken Content Retrieval - Beyond Cascading Speech Recognition with Text Retrieval Lin-shan Lee, James Glass, Hung-yi Lee, Chun-an Chan IEEE/ACM Transactions on Audio, Speech and Language Processing Vol. 23, No. 9, Sep 2015, pp. 1389-1420 ( 這是該期刊的Overview paper。該期刊自2010年才開始有這類論文,必須由跨團隊的作者群針對領域內之一重大問題合作完成以免偏頗,可以比一般論文長很多,每年至多只能有4篇;本篇論文由李教授邀請MIT學者共同撰寫,主題是語音搜尋之新天地。This is an overview paper, a category this journal introduced in 2010. An overview paper must address an important problem in the area and have co-authors from different groups to give a balanced view; it can be much longer than a regular paper, and at most 4 such papers are published each year. This paper gives an overview of the new concepts and directions of speech information retrieval, and was co-authored with a scientist at MIT at Professor Lee’s invitation. ) | IJ-89 |
18. | Spoken Document Understanding and Organization Lin-shan Lee, Berlin Chen IEEE Signal Processing Magazine Vol. 22, No. 5, Sep 2005, pp. 42-60. Special Issue on Speech Technology in Human-machine Communication ( 本期刊為全面探討語音技術之專刊共包括9篇論文,每隔幾年才會有一本這樣的專刊。本文為回顧性論文(review paper),討論為語音資訊建立知識體系,以協助使用者找到想要的資訊的全新思維,並描述台大的實驗系統,其功能為當時全球所僅見。This is one of 9 papers selected by the special issue on speech technology. Usually there is only one such special issue in a few years. This is a review paper presenting the new framework for constructing the semantic structures out of speech information to help users find desired information, including a prototype system developed at National Taiwan University with functionalities first seen in the world at the time of publication. ) See more... | IJ-68 |
19. | Voice-based Information Retrieval - how far are we from the text-based information retrieval? Lin-shan Lee IEEE Workshop on Automatic Speech Recognition and Understanding, Merano, Italy Dec 2009, pp. 26-43 invited paper ( 2009年李教授應邀在國際會議(ASRU)發表主題演講(Keynote Speech)時所撰寫的全文,相當完整描述了當時李教授心目中的「語音版的Google」的藍圖,可見「語音版的Google」的遼闊視野及大架構在當時已相當成型,雖然還有不少精緻技術尚未問世,也還沒有「語音版的Google」的名稱。全文以「語音版的資訊搜尋」和「文字版的資訊搜尋(也就是Google)」的比較為主軸,聚焦搜尋的精確性及方便性兩大問題,並展望未來發展空間。This paper was written for the keynote speech given by Professor Lee at the ASRU workshop in 2009. It overviews the concepts and detailed technologies of voice-based information retrieval by comparing it with its very successful text-based counterpart (Google), focusing on the key issues of retrieval accuracy and efficiency. The broad view and clear framework of his vision of “a spoken version of Google” were already present in this talk, although some of the advanced technologies and the name “a spoken version of Google” had not yet appeared. ) | IC-235 |
20. | Multi-layered Summarization of Spoken Document Archive by Information Extraction and Semantic Structuring Lin-shan Lee, Sheng-Yi Kong, Yi-Cheng Pan, Yi-Sheng Fu, Yu-Tsun Huang International Conference on Spoken Language Processing, Pittsburgh, U.S.A. Sep 2006 ( 這是國際語音學會(International Speech Communication Association, ISCA)2006年在美國匹茲堡召開年度旗艦大會(Interspeech 2006)時的一個特別專題小組(Special Session),討論的主題是語音資訊摘要(Speech Information Summarization),全球共錄取六篇論文,本篇為其中之一,主題是語音資訊之瞭解及重組(Spoken Document Understanding and Organization)相關之技術及效能評估。李教授口頭發表此論文時,並同時在現場展示在台大所研究完成的電視新聞瀏覽器當時剛完成之最新版本,其諸多功能為當時全球所有語言所僅見。Full paper presenting the technologies and performance evaluation of “spoken document understanding and organization”, presented in the Special Session on Speech Information Summarization at Interspeech 2006 as one of 6 papers accepted worldwide. The presentation included a live demonstration of a system developed at National Taiwan University, with functionalities not seen elsewhere at the time. ) | IC-204 |
21. | Voice Access of Global Information for Broadband Wireless: Technologies of Today and Challenges of Tomorrow Lin-shan Lee, Yumin Lee Proceedings of the IEEE Vol. 89, No. 1, Jan. 2001, pp. 41-57 (invited paper) ( Proceedings of the IEEE為IEEE的「總會級」的頂級期刊,不屬任一專業學會,不分專業,只刊登最頂級的論文。本文為其中一本寬頻無線通訊專刊(Special Issue on Broadband Wireless Communications)中的一篇邀請論文(invited paper)。 Invited paper in a Special Issue on Broadband Wireless Communications of the Proceedings of the IEEE, the top IEEE-level journal, which does not belong to any single society. ) | IJ-60 |
22. | Spoken Knowledge Organization by Semantic Structuring and a Prototype Course Lecture System for Personalized Learning Hung-yi Lee, Sz-Rung Shiang, Ching-Feng Yeh, Yun-Nung Chen, Yu Huang, Sheng-Yi Kong, Lin-shan Lee IEEE/ACM Transactions on Audio, Speech, and Language Processing Vol. 22, No. 5, May 2014, pp. 883-898 ( 本期刊論文提出諸多新技術,可為線上課程之錄音錄影建立知識體系及重組數位內容,方便學習者可以搜尋、選擇並找到想學習的相關知識,並以在2009年ICASSP所發表的,在當時獨步全球的,「台大虛擬教師(NTU Virtual Instructor)」為案例說明。This journal paper proposed new technologies for semantically structuring and organizing recorded online courses using speech technologies, so that learners can conveniently find and learn selected knowledge, with a course lecture system developed at National Taiwan University called “NTU Virtual Instructor” as an example. ) | IJ-84 |
23. | Improved Semantic Retrieval of Spoken Content by Document/Query Expansion with Random Walk over Acoustic Similarity Graphs Hung-yi Lee, Lin-shan Lee IEEE/ACM Transactions on Audio, Speech, and Language Processing Vol. 22, No. 1, Jan 2014, pp. 80-94 ( 本期刊論文提出可以做到根據語音中的語意做搜尋(而非比對字面)的關鍵技術,這是超越過去傳統的「先辨識,再搜尋」做法的一大方向。This journal paper proposed to retrieve speech information based on semantics (rather than literal matching), which is a key direction for going beyond the conventional concept of “recognition first, followed by retrieval.” ) | IJ-83 |
24. | Semantic Analysis and Organization of Spoken Documents Based on Parameters Derived from Latent Topics Sheng-Yi Kong, Lin-shan Lee IEEE Transactions on Audio, Speech and Language Processing Vol. 19, No. 7, Sep 2011, pp. 1875-1889 ( 本期刊論文提出兩個基於主題分析所求出的關鍵性參數,可以廣泛應用在語音資訊的語意分析中的各種任務中,包括抽關鍵詞、建立摘要或標題、建構知識體系等;實驗做在華語的電視新聞上,但其應用並不限於華語。This journal paper proposed two key parameters based on topic analysis, which are useful in different tasks of semantic analysis of speech, including key term extraction, title and summary generation and semantic structuring; results not limited to Chinese although tested with data in Chinese. ) See more... | IJ-76 |
25. | Integrating Recognition and Retrieval with Relevance Feedback for Spoken Term Detection Hung-yi Lee, Chia-Ping Chen, Lin-shan Lee IEEE Transactions on Audio, Speech and Language Processing Vol. 20, No. 7, Sep 2012, pp. 2095-2110 ( 本期刊論文提出整合「辨識」及「搜尋」兩項任務以求取全域最佳化,而可以做到更佳的語音資訊搜尋的全新概念;這是超越過去傳統的「先辨識,再搜尋」的一大新方向。This journal paper proposed a completely new concept for speech information retrieval: integrating recognition and retrieval and optimizing them jointly as a whole rather than directly cascading them. ) See more... | IJ-79 |
26. | Model-based Unsupervised Spoken Term Detection with Spoken Queries Chun-an Chan, Lin-shan Lee IEEE Transactions on Audio, Speech, and Language Processing Vol. 21, No. 7, Jul 2013, pp. 1330-1342 ( 本期刊論文提出超越過去傳統的「先辨識,再搜尋」的想法的語音資訊搜尋的一大新方向:直接由語料庫中學出基本音的模型,並用這些模型直接比對使用者輸入的聲音指令和語料庫中的聲音,根本不用做辨識而直接比對聲音。This journal paper proposed a completely new concept for unsupervised speech information retrieval without recognition: the user’s spoken queries are matched directly against the sound signals in the dataset, based on models automatically learned from the dataset. No recognition is needed at all, so everything is achieved in an unsupervised way. ) | IJ-81 |
27. | Enhanced Spoken Term Detection Using Support Vector Machines and Weighted Pseudo Examples Hung-yi Lee, Lin-shan Lee IEEE Transactions on Audio, Speech and Language Processing Vol. 21, No. 6, Jun 2013, pp. 1272-1284 ( 本期刊論文提出超越過去傳統的「先辨識,再搜尋」的語音資訊搜尋的一大新方向:充分使用其他找到的語句,並用這些語句建立模型以提昇搜尋的精確度。This journal paper proposed new approaches for spoken term detection by training a support vector machine model using other utterances found in the first retrieval results to be used to improve the retrieval accuracy. This is a completely new direction beyond the conventional concept of “recognition first, followed by retrieval.” ) See more... | IJ-80 |
28. | Interactive Spoken Document Retrieval with Suggested Key Terms Ranked by a Markov Decision Process Yi-Cheng Pan, Hung-yi Lee, Lin-shan Lee IEEE Transactions on Audio, Speech and Language Processing Vol. 20, No. 2, Feb 2012, pp. 632-645 ( 本期刊論文提出「語音版的Google」中的一個關鍵性觀念:使用者輸入指令後,機器用找到的語音資訊建構出「局部性知識體系(Local Semantic Structure)」,在這裡是關鍵詞建成的樹狀結構(或階層式關鍵詞圖),而使用者可以據以用對話方式和機器在這棵樹上互動,找到所需資訊。This journal paper proposed a key concept for “a spoken version of Google”: a key term hierarchy is constructed from the speech information retrieved for a user query, and the user can then interact with the machine over this hierarchy via dialogues to find the desired information. ) | IJ-77 |
29. | Unsupervised Iterative Deep Learning of Speech Features and Acoustic Tokens with Applications to Spoken Term Detection Cheng-Tao Chung, Cheng-Yu Tsai, Chia-Hsiang Liu, Lin-shan Lee IEEE/ACM Transactions on Audio, Speech and Language Processing Vol. 25, No. 10, Oct 2017, pp. 1914-1928 | IJ-91 |
30. | Unsupervised Discovery of Structured Acoustic Tokens with Applications to Spoken Term Detection Cheng-Tao Chung, Lin-shan Lee IEEE/ACM Transactions on Audio, Speech and Language Processing Vol. 26, No. 2, Feb 2018, pp. 394-405 | IJ-92 |
31. | Performance Analysis for Lattice-Based Speech Indexing Approaches Using Words and Subword Units Yi-Cheng Pan, Lin-shan Lee IEEE Transactions on Audio, Speech and Language Processing Vol. 18, No. 6, Aug 2010, pp. 1562-1574 ( 本期刊論文為各種「詞圖(Word Graph)」上的搜尋演算法做了最早、最完整的分析,包括探討其搜尋正確性及所需計算資源間之折衝,並提出用「次詞圖(Subword graph)」以克服「詞典外詞彙(out-of-vocabulary )」的難題。This journal paper offered the first complete analysis on a whole set of speech indexing approaches over word graphs considering the tradeoffs between retrieval accuracy and computation requirements, and proposed to use subword graphs to handle the out-of-vocabulary word problems. ) See more... | IJ-75 |
32. | Discriminating Capabilities of Syllable-based Features and Approaches of Utilizing Them for Voice Retrieval of Speech Information in Mandarin Chinese Berlin Chen, Hsin-Min Wang, Lin-shan Lee IEEE Transactions on Speech and Audio Processing Vol. 10, No. 5, Jul 2002, pp. 303-314 ( 本文為最早的詳盡完整的期刊論文,指出華語語音資訊搜尋時,基於單音所發展出來的搜尋特徵是極為有效的。This paper offered the earliest analysis verifying the syllable-based indexing features are very useful for Chinese speech information retrieval. ) See more... | IJ-64 |
33. | SpeechBERT: An Audio-and-text Jointly Learned Language Model for End-to-end Spoken Question Answering Yung-Sung Chuang, Chi-Liang Liu, Hung-yi Lee, Lin-shan Lee Interspeech, virtual conference due to Covid-19 Oct. 2020, pp. 4168-4172 | IC-323 |
34. | Towards Machine Comprehension of Spoken Content: Initial TOEFL Listening Comprehension Test by Machine Bo-Hsiang Tseng, Sheng-Syun Shen, Hung-yi Lee, Lin-shan Lee Interspeech, San Francisco, U.S.A. Sept 2016, pp. 2731-2735 | IC-302 |
35. | Structuring Lectures in Massive Open Online Courses (MOOCs) for Efficient Learning by Linking Similar Sections and Predicting Prerequisites Sheng-syun Shen, Hung-yi Lee, Shang-wen Li, Victor Zue, Lin-shan Lee Interspeech, Dresden, Germany Sept 2015, pp. 1363-1367 | IC-295 |
36. | Semantic Retrieval of Personal Photos using Matrix Factorization and Two-layer Random Walk Fusing Sparse Speech Annotations with Visual Features Yuan-ming Liou, Yi-sheng Fu, Hung-yi Lee, Lin-shan Lee Interspeech, Singapore Sept 2014, pp. 1762-1766 | IC-291 |
37. | Learning on Demand-Course Lecture Distillation by Information Extraction and Semantic Structuring for Spoken Documents Sheng-Yi Kong, Miao-Ru Wu, Che-Kuang Lin, Yi-Sheng Fu, Lin-shan Lee International Conference on Acoustics, Speech and Signal Processing, Taipei, Taiwan, Republic of China Apr 2009, pp. 4709-4712 | IC-232 |
38. | Unsupervised Discovery of Linguistic Structure Including Two-level Acoustic Patterns Using Three Cascaded Stages of Iterative Optimization Cheng-Tao Chung, Chun-an Chan, Lin-shan Lee International Conference on Acoustics, Speech and Signal Processing, Vancouver, Canada May 2013, pp. 8081-8085 | IC-278 |
39. | Unsupervised Spoken Term Detection with Spoken Queries by Multi-Level Acoustic Patterns with Varying Model Granularity Cheng-Tao Chung, Chun-an Chan, Lin-shan Lee IEEE International Conference on Acoustics, Speech and Signal Processing, Florence, Italy May 2014, pp. 7864-7868 | IC-287 |
40. | Towards Unsupervised Semantic Retrieval of Spoken Content with Query Expansion Based on Automatically Discovered Acoustic Patterns Yun-Chiao Li, Hung-yi Lee, Cheng-Tao Chung, Chun-an Chan, Lin-shan Lee IEEE Automatic Speech Recognition and Understanding Workshop, Olomouc, Czech Dec 2013, pp. 198-203 | IC-285 |
41. | Enhancing Query Expansion for Semantic Retrieval of Spoken Content with Automatically Discovered Acoustic Patterns Hung-yi Lee, Yun-Chiao Li, Cheng-Tao Chung, Lin-shan Lee International Conference on Acoustics, Speech and Signal Processing, Vancouver, Canada May 2013, pp. 8297-8301 | IC-274 |
42. | Supervised Spoken Document Summarization Jointly Considering Utterance Importance and Redundancy by Structured Support Vector Machine Hung-yi Lee, Yu-yu Chou, Yow-Bang Wang, Lin-shan Lee Interspeech, Portland, U.S.A. Sep 2012, pp. 2342-2345 | IC-266 |
43. | Automatic Key Term Extraction From Spoken Course Lectures Using Branching Entropy and Prosodic/Semantic Features Yun-Nung Chen, Yu Huang, Sheng-Yi Kong, Lin-shan Lee IEEE Workshop on Spoken Language Technology, Berkeley, California, U.S.A. Dec 2010, pp. 265-270 | IC-246 |
44. | Interactive Spoken Content Retrieval by Extended Query Model and Continuous State Space Markov Decision Process Tsung-Hsien Wen, Hung-yi Lee, Pei-Hao Su, Lin-shan Lee International Conference on Acoustics, Speech and Signal Processing, Vancouver, Canada May 2013, pp. 8510-8514 | IC-276 |
45. | A Multi-Modal Dialogue System for Information Navigation and Retrieval across Spoken Document Archives with Topic Hierarchies Yi-Cheng Pan, Chien-Chih Wang, Ya-Chao Hsieh, Te-Hsuan Lee, Yen-shin Lee, Yi-Sheng Fu, Yu-Tsun Huang, Lin-shan Lee Proceedings of IEEE Automatic Speech Recognition and Understanding Workshop, San Juan Nov 2005 | IC-194 |
C. 多元語音技術 Blooming Speech Technologies (BT)
46. | Towards Structured Deep Neural Network for Automatic Speech Recognition Yi-Hsiu Liao, Hung-yi Lee, Lin-shan Lee IEEE Automatic Speech Recognition and Understanding Workshop, Scottsdale, Arizona, U.S.A. Dec 2015, pp. 137-144 | IC-297 |
47. | An Initial Attempt for Phoneme Recognition Using Structured Support Vector Machine (SVM) Hao Tang, Chao-Hong Meng, Lin-shan Lee International Conference on Acoustics, Speech and Signal Processing, Dallas, Texas, U.S.A. Mar 2010, pp. 4926-4929 | IC-237 |
48. | Pronunciation Modeling with Reduced Confusion for Mandarin Chinese Using A Three-stage Framework Ming-Yi Tsai, Fu-Chiang Chou, Lin-shan Lee IEEE Transactions on Audio, Speech and Language Processing Vol. 15, No. 2, Feb 2007, pp. 661-675 ( 在流利自發的語音(Spontaneous Speech)中,人不見得會照詞典所列的音標發音,造成辨識困難。本論文提出一整套技術,可藉助一套語料庫建立發音變異(Pronunciation Variation)的模型來提昇辨識效果。This paper proposed a new framework to model pronunciation variation in spontaneous Mandarin speech for better recognition performance. ) See more... | IJ-71 |
49. | Improved Features and Models for Detecting Edit Disfluencies in Transcribing Spontaneous Mandarin Speech Che-Kuang Lin, Lin-shan Lee IEEE Transactions on Audio, Speech and Language Processing Vol. 17, No. 7, Sep 2009, pp. 1263-1278 ( 人類日常語言中自動有非常多不流利(Disfluency)現象,如中斷、插入「啊」、「嗯」等,猶疑或修正等,造成辨識困難。本文是第一篇完整探討如何處理華語中這些問題而能更正確地辨識語音的期刊論文。This paper was the first journal paper proposing a whole set of features and models for handling the very difficult disfluency problem in spontaneous Chinese speech. ) See more... | IJ-74 |
50. | Higher Order Cepstral Moment Normalization for Improved Robust Speech Recognition Chang-wen Hsu, Lin-shan Lee IEEE Transactions on Audio, Speech and Language Processing Vol. 17, No. 2, Feb 2009, pp. 205-220 ( 這是一篇極富創意並有特色的期刊論文,討論背景雜訊干擾下的語音辨識,實驗是做在國際通用的西方語言的語料庫上。由於過去通常是對信號特徵做一階、二階的正規化(Normalization)效果不錯,本文提出可以做到任意階的演算法,並證明越高階效果越好。This paper proposed a new family of higher order cepstral moment normalization techniques for robust speech recognition in noisy environments, and showed that better performance can be achieved with higher order normalization. ) | IJ-73 |
51. | Optimization of Temporal Filters for Constructing Robust Features in Speech Recognition Jeih-Weih Hung, Lin-shan Lee IEEE Transactions on Speech and Audio Processing Vol. 14, No. 3, May 2006, pp. 808-832 | IJ-69 |
52. | Histogram-Based Quantization for Robust and/or Distributed Speech Recognition Chia-Yu Wan, Lin-shan Lee IEEE Transactions on Audio, Speech and Language Processing Vol. 16, No. 4, May 2008, pp. 859-873 | IJ-72 |
53. | Segmental Eigenvoice with Delicate Eigenspace for Improved Speaker Adaptation Yu Tsao, Shang-Ming Lee, Lin-shan Lee IEEE Transactions on Speech and Audio Processing Vol. 13, No. 3, May 2005, pp. 399-411 | IJ-67 |
54. | An Improved Framework for Recognizing Highly Imbalanced Bilingual Code-Switched Lectures with Cross-Language Acoustic Modeling and Frame-Level Language Identification Ching-Feng Yeh, Lin-shan Lee IEEE/ACM Transactions on Audio, Speech and Language Processing Vol. 23, No. 7, Jul 2015, pp. 1144-1159 ( 雙語夾雜的語音辨識有相當時間不曾被人注意過,琳山老師是首先發表論文指出此一問題並提出解決方案的學者,後來才有世界各地的國際團隊跟進。本論文是一篇完整描述中英雙語夾雜的問題及做法的期刊論文,以琳山老師自己上課錄音為素材,其中有15%是英文。This paper proposed a completely new framework for recognizing Mandarin-English bilingual code-switched speech, using Professor Lee’s own lecture recordings, 15% of which are in English. Professor Lee was the first in the world to point out this problem and propose solutions; other teams worldwide then followed. ) | IJ-88 |
55. | A Set of Corpus-based Text-to-speech Synthesis Technologies for Mandarin Chinese Fu-Chiang Chou, Chiu-Yu Tseng, Lin-shan Lee IEEE Transactions on Speech and Audio Processing Vol. 10, No. 7, Oct 2002, pp. 481-494 | IJ-65 |
56. | Computer-aided Analysis and Design for Spoken Dialogue Systems Based on Quantitative Simulations Bor-shen Lin, Lin-shan Lee IEEE Transactions on Speech and Audio Processing Vol. 9, No. 5, Jul 2001, pp. 534-548 | IJ-61 |
57. | A Recursive Dialogue Game for Personalized Computer-Aided Pronunciation Training Pei-hao Su, Chuan-hsun Wu, Lin-shan Lee IEEE/ACM Transactions on Audio, Speech and Language Processing Vol. 23, No. 1, Jan 2015, pp. 127-141 ( 本論文提出極具創意的想法:讓非母語的學習者學習語言的對話遊戲;學習者可以和機器在玩對話遊戲之中,學會母語以外的語言。This paper proposed a very novel concept: computer-assisted language learning using dialogue games, so that the learner can learn a language by playing a dialogue game. ) | IJ-86 |
58. | Supervised Detection and Unsupervised Discovery of Pronunciation Error Patterns for Computer-Assisted Language Learning Yow-Bang Wang, Lin-shan Lee IEEE/ACM Transactions on Audio, Speech and Language Processing Vol. 23, No. 3, Mar 2015, pp. 564-579 ( 本論文提出極具創意的想法:讓機器由非母語的語言學習者的錄音中,直接自動學出學習者發音的「錯誤類型(Error Patterns)」,也就是學習者易於說錯的音;有了這些「錯誤類型」,機器也可自動偵錯,抓到學習者說錯的地方。This paper proposed a very novel concept: the machine automatically discovers the pronunciation error patterns of non-native learners from a dataset, and then detects these error patterns for computer-assisted language learning. ) | IJ-87 |
59. | A Perceptually Constrained GSVD-based Approach for Enhancing Speech Corrupted by Colored Noise Gwo-Hwa Ju, Lin-shan Lee IEEE Transactions on Audio, Speech and Language Processing Vol. 15, No. 1, Jan 2007, pp. 119-134 | IJ-70 |
60. | Pronunciation Variation Analysis Based on Acoustic and Phonemic Distance Measures with Application Examples on Mandarin Chinese Ming-Yi Tsai, Lin-shan Lee IEEE 8th Automatic Speech Recognition and Understanding Workshop, St. Thomas, US Virgin Islands, U.S.A. Dec 2003, pp. 117-122 | IC-177 |
61. | Improved Spontaneous Mandarin Speech Recognition by Disfluency Interruption Point (IP) Detection Using Prosodic Features Che-Kuang Lin, Lin-shan Lee European Conference on Speech Communication and Technology, Lisbon Sep 2005, pp. 1621-1624 | IC-191 |
62. | Bilingual Acoustic Modeling with State Mapping and Three-Stage Adaptation for Transcribing Unbalanced Code-Mixed Lectures Ching-Feng Yeh, Liang-Che Sun, Chao-Yu Huang, Lin-shan Lee International Conference on Acoustics, Speech and Signal Processing, Prague, Czech Republic May 2011, pp. 5020-5023 | IC-248 |
63. | Personalized Speech Recognizer with Keyword-based Personalized Lexicon and Language Model using Word Vector Representations Ching-Feng Yeh, Yuan-ming Liou, Hung-yi Lee, Lin-shan Lee Interspeech, Dresden, Germany Sept 2015, pp. 3521-3525 | IC-296 |
64. | Feature Analysis for Emotion Recognition from Mandarin Speech Considering the Special Characteristics of Chinese Language Yi-Hao Kao, Lin-shan Lee International Conference on Spoken Language Processing, Pittsburgh, U.S.A. Sep 2006 | IC-205 |
65. | A Syllable-Based Chinese Spoken Dialogue System for Telephone Directory Services Primarily Trained with A Corpus Yen-Ju Yang, Lin-shan Lee 1998 International Conference on Spoken Language Processing, Sydney, Australia Nov 1998 | IC-123 |
D. 深度學習 Deep Learning (DL)
66. | Completely Unsupervised Phoneme Recognition by Adversarially Learning Mapping Relationships from Audio Embeddings Da-Rong Liu, Kuan-Yu Chen, Hung-yi Lee, Lin-shan Lee Interspeech, Hyderabad, India Sept. 2018, pp. 3748-3752 | IC-313 |
67. | Completely Unsupervised Phoneme Recognition by a Generative Adversarial Network Harmonized with Iteratively Refined Hidden Markov Models Kuan-Yu Chen, Che-Ping Tsai, Da-Rong Liu, Hung-yi Lee, Lin-shan Lee Interspeech, Graz, Austria Sept. 2019, pp. 1856-1860 | IC-319 |
68. | Towards Unsupervised Speech Recognition and Synthesis with Quantized Speech Representation Learning Alexander H. Liu, Tao Tu, Hung-yi Lee, Lin-shan Lee IEEE International Conference on Acoustics, Speech and Signal Processing, virtual conference due to Covid-19 May. 2020, pp. 7259-7263 | IC-321 |
69. | Adversarial Training of End-to-end Speech Recognition Using a Criticizing Language Model Alexander H. Liu, Hung-yi Lee, Lin-shan Lee IEEE International Conference on Acoustics, Speech and Signal Processing, Brighton, UK May 2019, pp. 6176-6180 | IC-316 |
70. | Sequence-to-Sequence Automatic Speech Recognition with Word Embedding Regularization and Fused Decoding Alexander H. Liu, Tzu-Wei Sung, Shun-Po Chuang, Hung-yi Lee, Lin-shan Lee IEEE International Conference on Acoustics, Speech and Signal Processing, virtual conference due to Covid-19 May. 2020, pp. 7879-7883 | IC-322 |
71. | Towards Lifelong Learning of End-to-end ASR Heng-Jui Chang, Hung-yi Lee, Lin-shan Lee accepted by Interspeech, virtual conference due to Covid-19 Aug. - Sept. 2021 | IC-327 |
72. | AudioWord2Vec: Unsupervised Learning of Audio Segment Representations using Sequence-to-sequence Autoencoder Yu-An Chung, Chao-Chung Wu, Chia-Hao Shen, Hung-yi Lee, Lin-shan Lee Interspeech, San Francisco, U.S.A. Sept 2016, pp. 765-769 | IC-300 |
73. | Phonetic-and-Semantic Embedding of Spoken Words with Applications in Spoken Content Retrieval Yi-Chen Chen, Sung-Feng Huang, Chia-Hao Shen, Hung-yi Lee, Lin-shan Lee IEEE Workshop on Spoken Language Technology (SLT), Athens, Greece Dec. 2018, pp. 941-948 | IC-315 |
74. | Multi-target Voice Conversion without Parallel Data by Adversarially Learning Disentangled Audio Representations Ju-chieh Chou, Cheng-chieh Yeh, Hung-yi Lee, Lin-shan Lee Interspeech, Hyderabad, India Sept. 2018, pp. 501-505 | IC-312 |
75. | Rhythm-Flexible Voice Conversion without Parallel Data Using Cycle-Gan over Phoneme Posteriorgram Sequences Cheng-chieh Yeh, Po-chun Hsu, Ju-chieh Chou, Hung-yi Lee, Lin-shan Lee IEEE Workshop on Spoken Language Technology (SLT), Athens, Greece Dec. 2018, pp. 274-281 | IC-314 |
76. | FragmentVC: Any-to-Any Voice Conversion by End-to-End Extracting and Fusing Fine-Grained Voice Fragments With Attention Yist Y. Lin, Chung-Ming Chien, Jheng-Hao Lin, Hung-yi Lee, Lin-shan Lee IEEE International Conference on Acoustics, Speech and Signal Processing, virtual conference due to Covid-19 Jun. 2021, pp. 5939-5943 | IC-326 |
77. | Defending Your Voice: Adversarial Attack on Voice Conversion Chien-yu Huang, Yist Y. Lin, Hung-yi Lee, Lin-shan Lee IEEE Workshop on Spoken Language Technology (SLT), virtual conference due to Covid-19 Jan. 2021, pp. 552-559 | IC-324 |
E. 早期電信研究 Early Contributions in Communications (EC)
78. | An Exact Performance Analysis of the Clipped Diversity Combining Receiver for FH/MFSK Systems Against A Band Multitone Jammer Jinn-Ja Chang, Lin-shan Lee IEEE Transactions on Communications Vol. 42, No. 2/3/4, Feb/Mar/Apr 1994, pp. 700-710 ( 早年在散譜通信(Spread Spectrum)領域之研究作品舉例Typical example for work on spread spectrum ) | IJ-46 |
79. | Multi-h Phase Coded Modulations with Asymmetric Modulation Indices Hong-Kuang Hwang, Lin-shan Lee, Sin-Horng Chen IEEE Journal on Selected Areas in Communications Vol. SAC-7, No. 9, Dec 1989 Special Issue on Bandwidth and Power Efficient Coded Modulations, pp. 1450-1461 ( 早年在調變(Modulation)技術領域之研究作品舉例Typical example for work on modulation techniques ) | IJ-25 |
80. | Minimum Likelihood - A New Concept for Symbol Synchronization Jung-Hui Chiu, Lin-shan Lee IEEE Transactions on Communications Vol. COM-35, No. 5, May 1987, pp. 545-549 ( 早年在同步(Synchronization)問題上之研究作品舉例Typical example for work on Synchronization ) | IJ-18 |
81. | A General Theory for Asynchronous Speech Encryption Techniques Lin-shan Lee, Ger-chih Chou IEEE Journal on Selected Areas in Communications Vol. SAC-4, No. 2, Mar. 1986 Special Issue on Military Communications, pp. 280-287 ( 早年在通信安全性(Communication security)問題上之研究作品舉例Typical example for work on Communication security ) | IJ-17 |
F. 工程教育 Engineering Education (EE)
82. | "Taiwan : Meeting the New Challenges" in "Special Issue on Engineering Education - A Global View" Lin-shan Lee IEEE Communications Magazine Vol. 30, No. 11, Nov 1992, pp. 18-26 ( 這本刊物在當時是IEEE相關領域刊物中少數訂閱數最高者之一(當時沒有citation統計),這一期是「工程教育專刊——全球觀點(Special Issue on Engineering Education- A Global View)」,共邀請全球9篇論文;其中2篇來自亞洲(另一篇是日本的),其他均來自歐美,包括一篇巴西。當年李教授已自台大資工系主任卸任而擔任中研院資訊所所長,並擔任IEEE Communications Society 的亞太區主席,被認為是台灣工程教育界的代表性人物,因而受邀撰寫其中一篇;本文也被列為全本刊物中第一篇,甚受矚目。This magazine had one of the highest readerships among IEEE publications in related areas in those years (citation statistics were not available then). This paper was one of the 9 papers invited from around the world for this special issue on engineering education; two came from Asia (the other from Japan), and all the others came from Europe and the Americas, including one from Brazil. In 1992 Professor Lee was serving as the director of the Institute of Information Science, Academia Sinica, after his term as head of the Dept of Computer Science of National Taiwan University, and as the Asia-Pacific regional chair of the IEEE Communications Society; he was regarded as a representative figure of engineering education in Taiwan and was therefore invited to write one of the papers. This paper appeared as the first paper in the issue and was thus highly visible. ) | IJ-39 |