Research Paper

AlphaGo's breakthrough and challenges of wargaming

  • HU Xiaofeng,
  • HE Xiaoyuan,
  • TAO Jiuyang
  • 1. Department of Information Operation & Command Training, National Defense University, Beijing 100091, China;
    2. College of Command Information Systems, Army Engineering University, Nanjing 210007, China

HU Xiaofeng, professor; research interests: war simulation, military operations research, and military information systems engineering; e-mail: xfhu@vip.sina.com

Received date: 2016-09-06

Revised date: 2017-06-18

Online published: 2017-11-16

Funding

Joint Fund Project of the Major Research Plan for Military-Civilian Dual-Use Technology (U1435218); National Natural Science Foundation of China Projects (61174156, 61273189, 61174035, 61374179, 61403400, 61403401)

Cite this article

HU Xiaofeng, HE Xiaoyuan, TAO Jiuyang. AlphaGo's breakthrough and challenges of wargaming[J]. Science & Technology Review, 2017, 35(21): 49-60. DOI: 10.3981/j.issn.1000-7857.2017.21.006

Abstract

This paper summarizes the principles, methodological innovations, technological breakthroughs, and epistemological significance of AlphaGo. Then the bottlenecks facing wargaming are analyzed, and intelligent situation awareness is identified as the key link that urgently needs a breakthrough. Next, an approach to realizing intelligent awareness of the operational situation is proposed. Finally, the new opportunities that man-machine intelligence brings to wargaming are discussed.
