Making machines understand the world: How far are we from reaching the human dream?

CHEN Xiaoliang

doi:10.3981/j.issn.1000-7857.2018.03.004

Science & Technology Review, 2018, Vol. 36, Issue 3: 36-40


Science & Technology Panorama


Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190

CHEN Xiaoliang, associate research fellow; research interest: audio-video fusion; e-mail: cxl@mail.ioa.ac.cn

Received date: 2017-11-22

Revised date: 2018-02-03

Online published: 2018-03-01


English title: Making the machine understand the human world


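Abstract

Language interaction is the foundation of human cognitive development and lifelong learning, and it opened the door to human wisdom. In the era of artificial intelligence, language interaction will also be an important tool for humans and machines to express ideas, exchange knowledge, and communicate with each other. This requires machines to understand human language in complex scenarios and to adapt to the far-field voice interaction habits that humans have formed over thousands of years of evolution, so that machines can truly understand the human world. This article aims to provide a reference for machines to develop human-like intelligence.

Keywords: microphone array; speech recognition; natural language understanding; far-field speech interaction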

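Cite this article

CHEN Xiaoliang. Making machines understand the world: How far are we from reaching the human dream?[J]. Science & Technology Review, 2018, 36(3): 36-40. DOI: 10.3981/j.issn.1000-7857.2018.03.004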


