专题:大数据战略

开源软件中的大数据管理技术

  • 江天 ,
  • 乔嘉林 ,
  • 黄向东 ,
  • 王建民
展开
  • 清华大学软件学院, 大数据研究中心;大数据系统软件国家工程实验室, 北京 100084
江天,博士研究生,研究方向为大数据系统,电子信箱:jiangtia18@mails.tsinghua.edu.cn;黄向东,助理研究员,研究方向为分布式系统及时序数据库,电子信箱:huangxdong@tsinghua.edu.cn

收稿日期: 2019-11-08

  修回日期: 2020-01-31

  网络出版日期: 2020-04-01

基金资助

国家重点研发计划项目(2016YFB0501504);国家自然科学基金项目(U1509213,61802224)

Big data technologies in open source software: A survey

  • JIANG Tian ,
  • QIAO Jialin ,
  • HUANG Xiangdong ,
  • WANG Jianmin
Expand
  • School of Software, Research Center for Big Data, Tsinghua University;National Engineering Laboratory for Big Data Software, Beijing 100084, China

Received date: 2019-11-08

  Revised date: 2020-01-31

  Online published: 2020-04-01

摘要

随着谷歌文件系统和宽表结构为代表的技术打破依赖关系数据库管理海量数据的限制,以Apache Hadoop为代表的开源大数据管理系统软件新技术与系统不断涌现,并快速成熟应用。针对Apache开源社区中面向在线事务处理和在线分析处理场景的大数据管理软件,介绍了大数据管理中的数据存储、数据分区、副本机制、分布式协议等,并比较分析了分布式文件系统、键值库、时序数据库等典型分布式数据管理系统的优缺点。

本文引用格式

江天 , 乔嘉林 , 黄向东 , 王建民 . 开源软件中的大数据管理技术[J]. 科技导报, 2020 , 38(3) : 103 -114 . DOI: 10.3981/j.issn.1000-7857.2020.03.007

Abstract

The Google's GFS and Big Table have broken the limitations of the technology of having to use the relational databases to manage the big data in the past decade. A number of open source big data management systems, such as the Apache Hadoop, carry the technology further by developing more matured technologies and applications. This paper reviews the big data management systems in addressing the usage scenarios of the OLTP and the OLAP based on the Apache software, and the state of art of the data storage engine, the data partition, the data replication, the distributed system protocol, together with a comparison of the pros and the cons of the current distributed file system, the key value store, and the time series database.

参考文献

[1] Ghemawat S, Gobioff H, Leung S-T. The Google file system[C]//Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles. New York:ACM, 2003:29-43.
[2] Chang F, Dean J, Ghemawat S, et al. Bigtable:A distributed storage system for structured data[J]. ACM Transactions on Computer Systems, 2008, 26(2):1-26.
[3] Wu C, Huang Y F, Lee J. Comparisons between MongoDB and MS-SQL databases on the TWC website[J]. Journal of Software Engineering and Applications, 2015, 4:35-41.
[4] Apache Projects List[EB/OL].[2019-11-17]. https://projects.apache.org/projects.html?category.
[5] Kofler M. InnoDB Tables and Transactions[M]//The Definitive Guide to MySQL. Apress, Berkeley, CA, 2004:239-259.
[6] Sears R, Ramakrishnan R. bLSM:A general purpose log structured merge tree[C]//Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. New York:ACM, 2012:217-28.
[7] O'Neil P, Cheng E, Gawlick D, et al. The log-structured merge-tree (LSM-tree)[J]. Acta Informatica, 33(4):351-385.
[8] Shetty P, Spillane R, Malpani R, et al. Building workloadindependent storage with VT-trees[C]//Usenix Conference on File & Storage Technologies. New York:ACM, 2013:13-17.
[9] Zhang Z, Yue Y, He B, et al. Pipelined compaction for the LSM-tree[C]//2014 IEEE 28th International Parallel and Distributed Processing Symposium. Piscataway N J:IEEE, 2014:777-786.
[10] Pan F, Yue Y, Xiong J. dCompaction:Delayed compaction for the LSM-tree[J]. International Journal of Parallel Programming, 2017, 45(6):1310-1325.
[11] Abadi D. The design and implementation of modern column-oriented database systems[J]. Foundations and Trends in Databases, 2012, 5(3):197-280.
[12] Melnik S, Gubarev A, Long J J, et al. Dremel:Interactive analysis of web-scale datasets[C]//Proceedings of the 36th International Conference on Very Large Data Bases. Piscataway N J:IEEE, 2010:330-339.
[13] Bian H, Yan Y, Tao W, et al. Wide table layout optimization based on column ordering and duplication[C]//Proceedings of the 2017 ACM International Conference on Management of Data. New York:ACM, 2017:299-314.
[14] Schlosser R, Kossmann J, Boissier M. Efficient scalable multi-attribute index selection using recursive strategies[C]//2019 IEEE 35th International Conference on Data Engineering. Piscataway N J:IEEE, 2019:209-220.
[15] Felber P, Kropf P, Schiller E, et al. Survey on load balancing in peer-to-peer distributed hash tables[J]. IEEE Communications Surveys & Tutorials, 2013, 16(1):473-492.
[16] Lakshman A, Malik P. Cassandra:a decentralized structured storage system[J]. ACM SIGOPS Operating Systems Review, 2010, 44(2):35-40.
[17] Black B. Cassandra Troubleshooting[Z]. Cassandra Summit 2010, 2010.
[18] Paiva J, Ruivo P, Romano P, et al. Auto placer:Scalable self-tuning data placement in distributed key-value stores[J]. ACM Transactions on Autonomous and Adaptive Systems (TAAS), 2015, 9(4):19.
[19] Virtual nodes/balance[EB/OL].[2019-11-17]. http://wiki.apache.org/cassandra/VirtualNodes/Balance.
[20] Dabek F, Kaashoek M F, Karger D, et al. Wide-area cooperative storage with CFS[C]//ACM SIGOPS Operating Systems Review. New York:ACM, 2001, 35(5):202-215.
[21] Stoica I, Morris R, Karger D, et al. Chord:A scalable peer-to-peer lookup service for internet applications[J]. ACM SIGCOMM Computer Communication Review, 2001, 31(4):149-160.
[22] Chen Z, Yang S, Tan S, et al. Hybrid range consistent hash partitioning strategy-A new data partition strategy for NoSQL Database[C]//201312th IEEE International Conference on Trust, Security and Privacy in Computing and Communications. Piscataway N J:IEEE, 2013:1161-1169.
[23] Kuhlenkamp J, Klems M, Röss O. Benchmarking scalability and elasticity of distributed database systems[J]. Proceedings of the VLDB Endowment, 2014, 7(12):1219-1230.
[24] Wang X, Loguinov D. Load-balancing performance of consistent hashing:asymptotic analysis of random node join[J]. IEEE/ACM Transactions on Networking, 2007, 15(4):892-905.
[25] Hsiao H C, Chung H Y, Shen H, et al. Load rebalancing for distributed file systems in clouds[J]. IEEE Transactions on Parallel and Distributed Systems, 2012, 24(5):951-962.
[26] Parker D S, Popek G J, Rudisin G, et al. Detection of mutual inconsistency in distributed systems[J]. IEEE Transactions on Software Engineering, 1983(3):240-247.
[27] Lu H, Veeraraghavan K, Ajoux P, et al. Existential consistency:Measuring and understanding consistency at Facebook[C]//Proceedings of the 25th Symposium on Operating Systems Principles. New York:ACM, 2015:295-310.
[28] Agrawal D, El Abbadi A, Singh A K. Consistency and orderability:Semantics-based correctness criteria for databases[J]. ACM Transactions on Database Systems, 1993, 18(3):460-486.
[29] Thomas R H. A majority consensus approach to concurrency control for multiple copy databases[J]. ACM Transactions on Database Systems, 1979, 4(2):180-209.
[30] Wada H, Fekete A, Zhao L, et al. Data consistency properties and the trade-offs in commercial cloud storage:the consumers' perspective[C]. 5th The biennial Conference on Innovative Data Systems Research, Asilomar, CA, January 9-12, 2011.
[31] Bailis P, Venkataraman S, Franklin M J, et al. Probabilistically bounded staleness for practical partial quorums[J]. Proceedings of the VLDB Endowment, 2012, 5(8):776-787.
[32] Vogels W. Eventually consistent[J]. Communications of the ACM, 2009, 52(1):40-44.
[33] Bailis P, Ghodsi A, Hellerstein J M, et al. Bolt-on causal consistency[C]//Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. New York:ACM, 2013:761-772.
[34] Terry D B, Demers A J, Petersen K, et al. Session guarantees for weakly consistent replicated data[C]//Proceedings of the Third International Conference on Parallel and Distributed Information Systems, 1994. Piscataway N J:IEEE, 1994:140-149.
[35] Terry D B, Demers A J, Petersen K, et al. Session guarantees for weakly consistent replicated data[C]//Proceedings of 3rd International Conference on Parallel and Distributed Information Systems. Piscataway N J:IEEE, 1994:140-149.
[36] Nayate A, Dahlin M, Iyengar A. Transparent information dissemination[C]//ACM/IFIP/USENIX International Conference on Distributed Systems Platforms and Open Distributed Processing. Berlin:Springer, 2004:212-231.
[37] Bourne S. A conversation with Bruce Lindsay[J]. Queue, 2004, 2(8):22-33.
[38] Urgaonkar B, Ninan A G, Raunak M S, et al. Maintaining mutual consistency for cached web objects[C]//Proceedings 21st International Conference on Distributed Computing Systems. Piscataway N J:IEEE, 2001:371-380.
[39] Theel O, Raynal M. Static and dynamic adaptation of transactional consistency[C]//Proceedings of the Thirtieth Hawaii International Conference on System Sciences. Piscataway N J:IEEE, 1997, 1:533-542.
[40] Perrin M, Petrolia M, Mostefaoui A, et al. Consistent shared data types:Beyond memory[D]. Nantes:Université de Nantes, 2014.
[41] Bailis P, Ghodsi A, Hellerstein J M, et al. Bolt-on causal consistency[C]//Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. New York:ACM, 2013:761-772.
[42] Brewer E A. Lessons from giant-scale services[J]. IEEE Internet Computing, 2001, 5(4):46-55.
[43] Abadi D. Consistency tradeoffs in modern distributed database system design:CAP is only part of the story[J]. Computer, 2012, 45(2):37-42.
[44] Davidson S B, Garcia-Molina H, Skeen D. Consistency in a partitioned network:A survey[J]. ACM Computing Surveys (CSUR), 1985, 17(3):341-370.
[45] Tanenbaum A S, Van Steen M. Distributed systems:principles and paradigms[M]. Upper Saddle River:PrenticeHall Inc., 2007.
[46] Bermbach D, Kuhlenkamp J. Consistency in distributed storage systems[C]//International Conference on NetworkedSystems. Berlin:Springer, 2013:175-189.
[47] HDFS architecture guide[EB/OL].[2019-10-07]. https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html#Replication+Pipelining.
[48] Dai W, Ibrahim I, Bassiouni M. A new replica placement policy for hadoop distributed file system[C]//2016 IEEE 2nd International Conference on Big Data Security on Cloud (Big Data Security), IEEE International Conference on High Performance and Smart Computing (HPSC), IEEE International Summit (Confluence). Piscataway N J:IEEE, 2014:36-39.
[49] Ye X, Huang M, Zhu D, et al. A novel blocks placement strategy for Hadoop[C]//2012 IEEE/ACIS 11th International Conference on Computer and Information Science. Piscataway N J:IEEE, 2012:3-7.
[50] Patel N M, Patel N M, Hasan M I, et al. Improving HDFS write performance using efficient replica placement[C]//5th International Conference-Confluence The Next Generation Information Technology Conference on Intelligent Data and Security (IDS). Piscataway N J:IEEE, 2016:262-267.
[51] Higai A, Takefusa A, Nakada H, et al. A study of effective replica reconstruction schemes at node deletion for HDFS[C]//201414th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing. Piscataway N J:IEEE, 2014:512-521.
[52] Qu K, Meng L, Yang Y. A dynamic replica strategy based on Markov model for hadoop distributed file system (HDFS)[C]//20164th International Conference on Cloud Computing and Intelligence Systems (CCIS). Piscataway N J:IEEE, 2016:337-342.
[53] 舒继武, 罗象宏, 存储系统中的纠删码研究综述[J]. 计算机研究与发展, 2012, 49(1):1-11.
[54] Haddock W, Curry M L, Bangalore P V, et al. GPU erasure coding for campaign storage[M]//Kunkel J M, Yokota R, Taufer M, et al. High Performance Computing. Cham:Springer International Publishing, 2017:145-159.
[55] Chen H, Fu S. Parallel erasure coding:Exploring task parallelism in erasure coding for enhanced bandwidth and energy efficiency[C]//2016 IEEE International Conference on Networking, Architecture and Storage (NAS). Piscataway N J:IEEE, 2016:1-4.
[56] Lamport L. Paxos made simple[J]. ACM Sigact News, 2001, 32(4):18-25.
[57] Ongaro D, Ousterhout J. Ousterhout. In Search of an Understandable Consensus Algorithm[C]//Proceedings of the 2014 USENIX conference on USENIX Annual Technical Conference. Berkeley:USENIX, 2014:305-320.
[58] Lamport L. Fast paxos[J]. Distributed Computing, 2006, 19(2):79-103.
[59] Lamport L, Massa M. Cheap paxos[C]//International Conference on Dependable Systems and Networks, 2004. Piscataway N J:IEEE, 2004:307-314.
[60] Gafni E, Lamport L. Disk paxos[J]. Distributed Computing, 2003, 16(1):1-20.
[61] Copeland C, Zhong H. Tangaroa:A byzantine fault tolerant raft[J/OL].[2019-09-30]. http://www.scs.stanford.edu/14au-cs244b/labs/projects/copeland_zhong.pdf.
[62] Arora V, Mittal T, Agrawal D, et al. Leader or majority:Why have one when you can have both? improving read scalability in raft-like consensus protocols[C]//9th USENIX Workshop on Hot Topics in Cloud Computing (HotCloud 17). Berkeley:USENIX, 2017:46-51.
[63] Apache Accumulo[EB/OL].[2019-11-17]. https://accumulo.apache.org.
[64] LevelDB[EB/OL].[2019-11-17]. https://github.com/google/leveldb.
[65] RocksDB[EB/OL].[2019-11-17]. https://rocksdb.org.
[66] Wang L, Ding G, Zhao Y, et al. Optimization of leveldb by separating key and value[C]//201718th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT). Piscataway N J:IEEE, 2017:421-428.
[67] Mei F, Cao Q, Jiang H, et al. LSM-tree managed storage for large-scale key-value store[J]. IEEE Transactions on Parallel and Distributed Systems, 2018, 30(2):400-414.
[68] 饶毓琳. 基于LSM-Tree的持久化缓存机制的优化研究[D]. 武汉:华中科技大学, 2016.
[69] TimescaleDB[EB/OL].[2019-11-17]. https://www.timescale.com.
[70] OpenTSDB[EB/OL].[2019-11-17]. http://opentsdb.net.
[71] KairosDB[EB/OL].[2019-11-17]. http://kairosdb.github. io.
[72] InfluxDB[EB/OL].[2019-11-17]. https://www.influxdata.com/products/influxdb-overview.
[73] Apache IoTDB[EB/OL].[2019-11-17]. https://iotdb.apache.org.
[74] Rhea S, Wang E, Wong E, et al. Littletable:A time-series database and its uses[C]//Proceedings of the 2017 ACM International Conference on Management of Data. New York:ACM, 2017:125-138.
[75] 徐化岩, 初彦龙. 基于influxDB的工业时序数据库引擎设计[J]. 计算机应用与软件, 2019, 36:9.
[76] Balis B, Bubak M, Harezlak D, et al. Towards an operational database for real-time environmental monitoring and early warning systems[J]. Procedia Computer Science, 2017, 108:2250-2259.
[77] Przymus P, Kaczmarski K. Dynamic compression strategy for time series database using GPU[M]//New Trends in Databases and Information Systems. Berlin:Springer, 2014:235-244.
[78] Llusa S A, Vila-Marta S, Escobet C T. Formalism for a multiresolution time series database model[J]. Information Systems, 2016, 56:19-35.
[79] Sevcech J, Bielikova M. Symbolic time series representation for stream data processing[C]//2015 IEEE Trustcom/BigDataSE/ISPA. Piscataway N J:IEEE, 2015, 2:217-222.
文章导航

/