[1] Li J Z, Liu X M. An important aspect of big data: Data usability[J]. Journal of Computer Research and Development, 2013, 50(6): 1147-1162.
[2] Eckerson W W. Data quality and the bottom line: Achieving business success through a commitment to high quality data[R]. Renton, WA: The Data Warehousing Institute, 2000: 12-20.
[3] Institute of Medicine. To err is human: Building a safer health system[M]. Washington: The National Academies Press, 1999.
[4] Bohannon P, Fan W F, Flaster M, et al. A cost-based model and effective heuristic for repairing constraints by value modification[C]. ACM SIGMOD International Conference on Management of Data, Baltimore, Maryland, June 14-16, 2005.
[5] English L. Plain English on data quality: Information quality management: The next frontier[J]. DM Review Magazine, 2000.
[6] Ben W, Schulz S. Credit card statistics, industry facts, debt statistics[EB/OL]. 2010-03-19, [2014-09-25]. http://www.creditcards.com.
[7] Gartner. Gartner says more than 50 percent of data warehouse projects will have limited acceptance or will be failures through 2007[EB/OL]. 2005-02-24, [2014-09-25]. http://www.gartner.com/newsroom/id/492112.
[8] Elmagarmid A K, Ipeirotis P G, Verykios V S. Duplicate record detection: A survey[J]. IEEE Transactions on Knowledge and Data Engineering, 2007, 19(1): 1-16.
[9] Christen P. A survey of indexing techniques for scalable record linkage and deduplication[J]. IEEE Transactions on Knowledge and Data Engineering, 2012, 24(9): 1537-1555.
[10] Rahm E, Do H H. Data cleaning: Problems and current approaches[J]. IEEE Data Engineering Bulletin, 2000, 23(4): 3-13.
[11] Fan W F, Geerts F, Jia X B, et al. Conditional functional dependencies for capturing data inconsistencies[J]. ACM Transactions on Database Systems, 2008, 33(2): 1-48.
[12] Bravo L, Fan W F, Ma S. Extending dependencies with conditions[C]. The 33rd International Conference on Very Large Data Bases, University of Vienna, Austria, September 23-27, 2007.
[13] Fan W F, Geerts F, Wijsen J. Determining the currency of data[C]. The 30th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS), Athens, Greece, June 12-16, 2011.
[14] Cao Y, Fan W F, Yu W Y. Determining the relative accuracy of attributes[C]. 2013 International Conference on Management of Data, New York, USA, June 23-28, 2013.
[15] Chiang F, Miller R J. Discovering data quality rules[J]. The Proceedings of the VLDB Endowment, 2008, 1(1): 1166-1177.
[16] Fan W F, Geerts F, Li J Z, et al. Discovering conditional functional dependencies[J]. IEEE Transactions on Knowledge and Data Engineering, 2011, 23(5): 683-698.
[17] Chu X, Ilyas I F, Papotti P. Discovering denial constraints[J]. The Proceedings of the VLDB Endowment, 2013, 6(13): 1498-1509.
[18] Bauckmann J, Abedjan Z, Leser U, et al. Discovering conditional inclusion dependencies[C]. The 21st ACM International Conference on Information and Knowledge Management, Maui, Hawaii, October 29-November 2, 2012.
[19] Loshin D. Master data management[M]. San Francisco: Morgan Kaufmann, 2008.
[20] Fan W F, Geerts F. Relative information completeness[J]. ACM Transactions on Database Systems, 2010, 35(4): 27-35.
[21] Bohannon P, Fan W, Flaster M, et al. A cost-based model and effective heuristic for repairing constraints by value modification[C]. ACM SIGMOD International Conference on Management of Data, Baltimore, Maryland, USA, June 14-16, 2005.
[22] Cong G, Fan W, Geerts F, et al. Improving data quality: Consistency and accuracy[C]. The 33rd International Conference on Very Large Data Bases, University of Vienna, Austria, September 23-27, 2007.
[23] Arenas M, Bertossi L E, Chomicki J, et al. Scalar aggregation in inconsistent databases[J]. Theoretical Computer Science, 2003, 296(3): 405-434.
[24] Geerts F, Mecca G, Papotti P, et al. The LLUNATIC data-cleaning framework[J]. The Proceedings of the VLDB Endowment, 2013, 6(9): 625-636.
[25] Fan W F, Geerts F, Tang N, et al. Inferring data currency and consistency for conflict resolution[C]. 29th IEEE International Conference on Data Engineering, Brisbane, April 8-12, 2013.
[26] Galland A, Abiteboul S, Marian A, et al. Corroborating information from disagreeing views[C]. The third ACM International Conference on Web Search and Data Mining, New York, USA, February 3-6, 2010.
[27] Dong X L, Berti-Equille L, Srivastava D. Integrating conflicting data: The role of source dependence[J]. The Proceedings of the VLDB Endowment-PVLDB, 2009, 2(1): 550-561.
[28] Dong X L, Berti-Equille L, Srivastava D. Truth discovery and copying detection in a dynamic world[J]. The Proceedings of the VLDB Endowment-PVLDB, 2009, 2(1): 562-573.
[29] Zhao B, Rubinstein B I P, Gemmell J, et al. A Bayesian approach to discovering truth from conflicting sources for data integration[J]. The Proceedings of the VLDB Endowment, 2012, 5(6): 550-561.
[30] Lakshminarayan K, Harp S A, Goldman R, et al. Imputation of missing data using machine learning techniques[C]. The Second International Conference on Knowledge Discovery and Data Mining, Portland, Oregon, August 2-4, 1996.
[31] Mayfield C, Neville J, Prabhakar S. ERACER: A database approach for statistical inference and data cleaning[C]. ACM SIGMOD International Conference on Management of Data, Indianapolis, Indiana, USA, June 6-10, 2010.
[32] Setiawan N A, Venkatachalam P, Hani A F M. Missing attribute value prediction based on artificial neural network and rough set theory[J]. Biomedical Engineering and Informatics, 2008, 1: 306-310.
[33] Hua M, Pei J. Cleaning disguised missing data: A heuristic approach[C]. The 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, California, USA, August 12-15, 2007.
[34] Lin X M, Wang W. Set and string similarity queries: A survey[J]. Chinese Journal of Computers, 2011, 34(10): 1853-1862.
[35] Bertossi L. Database repairing and consistent query answering[M]. California: Morgan & Claypool, 2011.
[36] Bry F. Query answering in information systems with integrity constraints[M]//Integrity and Internal Control in Information Systems. New York: Springer, 1997: 113-130.
[37] Arenas M, Bertossi L, Chomicki J. Consistent query answers in inconsistent databases[C]. Eighteenth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, Philadelphia, Pennsylvania, May 31-June 2, 1999.
[38] Kolaitis P G, Pema E, Tan W C. Efficient querying of inconsistent databases with binary integer programming[J]. The Proceedings of the VLDB Endowment, 2013, 6(6): 397-408.
[39] Barceló P, Bertossi L. Logic programs for querying inconsistent databases[M]//Practical Aspects of Declarative Languages. New York: Springer, 2003: 208-222.
[40] Fuxman A, Fazli E, Miller R J. Conquer: Efficient management of inconsistent databases[C]. ACM SIGMOD International Conference on Management of Data, Baltimore, Maryland, USA, June 14-16, 2005.
[41] Fuxman A, Miller R J. First-order query rewriting for inconsistent databases[J]. Journal of Computer and System Sciences, 2007, 73(4): 610-635.
[42] Wijsen J. Consistent query answering under primary keys: A characterization of tractable queries[C]. The 12th International Conference on Database Theory, St Petersburg, Russia, March 23-25, 2009.
[43] Greco S, Pijcke F, Wijsen J, et al. Certain query answering in partially consistent databases[J]. Proceedings of the VLDB Endowment, 2014, 7(5): 32-65.
[44] Maslowski D, Wijsen J. Counting database repairs that satisfy conjunctive queries with self-joins[C]. The 17th International Conference on Database Theory, Athens, Greece, March 24-28, 2014.
[45] Maslowski D, Wijsen J. On counting database repairs[C]. The 4th International Workshop on Logic in Databases, San Miniato, March 25, 2011.
[46] Khalefa M E, Mokbel M F, Levandoski J J. Skyline query processing for incomplete data[C]. 2008 IEEE 24th International Conference on Data Engineering (ICDE 08), Cancun, April 7-12, 2008.
[47] Alwan A A, Ibrahim H, Udzir N I, et al. Skyline queries over incomplete multidimensional database[C]. The 3rd International Conference on Computing and Informatics, Bandung, June 8-9, 2011.
[48] Bharuka R, Kumar P S. Finding skylines for incomplete data[C]//Proceedings of the Twenty-Fourth Australasian Database Conference. Gold Coast, Queensland: Australian Computer Society, 2013, 137: 109-117.
[49] Miao X, Gao Y, Chen L, et al. On efficient k-skyband query processing over incomplete data[M]//Database Systems for Advanced Applications. Berlin Heidelberg: Springer, 2013: 424-439.
[50] Gao Y, Miao X, Cui H, et al. Processing k-skyband, constrained skyline, and group-by skyline queries on incomplete data[J]. Expert Systems with Applications, 2014, 41(10): 4959-4974.
[51] Hadjali A, Pivert O, Prade H. Possibilistic contextual skylines with incomplete preferences[C]//Proceeding of 2010 International Conference of Soft Computing and Pattern Recognition. New York, USA: Institute of Electrical and Electronics Engineers, 2010: 57-62.
[52] Arefin M S, Morimoto Y. Skyline sets queries from databases with missing values[C]//Proceeding of 22nd International Conference on Computer Theory and Applications. Chengdu: Institute of Electrical and Electronics Engineers, 2012: 24-29.
[53] Markus E, Patrick R, Florian W, et al. Handling of NULL values in preference database queries[C]. 20th European Conference on Artificial Intelligence, Montpellier, France, August 27-31, 2012.
[54] Kolb L, Thor A, Rahm E, et al. Efficient deduplication with hadoop[J]. The Proceedings of the VLDB Endowment, 2012, 5(12): 1878-1881.
[55] Kolb L, Thor A, Rahm E. Load balancing for MapReduce-based entity resolution[C]. 2012 IEEE 28th International Conference on Data Engineering, Washington D C, April 1-5, 2012.
[56] Kolb L, Thor A, Rahm E. Block-based load balancing for entity resolution with MapReduce[C]. The 20th ACM International Conference on Information and Knowledge Management, Glasgow, United Kingdom, October 24-28, 2011.
[57] Huo R, Wang H Z, Zhu R, et al. Entity identification in big data based on MapReduce[J]. EIBM, 2013, 50(S2): 20-35.
[58] Jin L, Wang H Z, Huang S B, et al. Missing value imputation in big data based on Map-Reduce[J]. Journal of Computer Research and Development, 2013, 50(S1): 312-321.
[59] Vernica R, Carey M J, Li C. Efficient parallel set-similarity joins using mapreduce[C]. ACM SIGMOD International Conference on Management of Data, Indianapolis, Indiana, USA, June 6-10, 2010.
[60] Metwally A, Faloutsos C. V-smart-join: A scalable mapreduce framework for all-pair similarity joins of multisets and vectors[J]. The Proceedings of the VLDB Endowment, 2012: 213-300.
[61] Afrati F N, Sarma A D, Menestrina D, et al. Fuzzy joins using mapreduce[C]. 2012 IEEE 28th International Conference on Data Engineering, Washington D C, April 1-5, 2012.
[62] Okcan A, Riedewald M. Processing theta-joins using mapreduce[C]. ACM SIGMOD International Conference on Management of Data, Athens, Greece, June 12-16, 2011.
[63] Deng D, Li G L, Hao S, et al. MassJoin: A mapreduce-based method for scalable string similarity joins[C]. 2014 IEEE 30th International Conference on Data Engineering, Moscow, Russia, March 31-April 4, 2014.
[64] Sarma A D, He Y Y, Chaudhuri S. ClusterJoin: A similarity joins framework using MapReduce[J]. The Proceedings of the VLDB Endowment, 2014, 7(12): 1059-1070.
[65] Wang H Z, Li M D, Bu Y Y, et al. A big data cleaning parfait[C]. The 23rd ACM International Conference on Information and Knowledge Management, Shanghai, Nov 3-7, 2014: 10-23.
[66] Bornhövd C, Lin T, Haller S, et al. Integrating automatic data acquisition with business processes: Experiences with SAP's auto-ID infrastructure[J]. The Proceedings of the VLDB Endowment, 2004, 30: 1182-1188.
[67] Rao J, Doraiswamy S, Thakkar H, et al. A deferred cleansing method for rfid data analytics[C]. The 32nd International Conference on Very Large Data Bases, Seoul, Korea, September 12-15, 2006.
[68] Jeffery S, Garofalakis M, Franklin M. Adaptive cleaning for rfid data streams[C]. The 32nd International Conference on Very Large Data Bases, Seoul, Korea, September 12-15, 2006.
[69] Tran T, Sutton C, Cocci R, et al. Probabilistic inference over rfid streams in mobile environments[C]. The 25th International Conference on Data Engineering, March 29-April 2, 2009.
[70] Chen H, Ku W, Wang H, et al. Leveraging spatio-temporal redundancy for rfid data cleansing[C]. ACM SIGMOD International Conference on Management of Data, Indianapolis, Indiana, USA, June 6-10, 2010.
[71] Zhao Z, Ng W. A model-based approach for RFID data stream cleansing[C]. The 21st ACM International Conference on Information and Knowledge Management, Maui, Hawaii, October 29-November 2, 2012.
[72] Zhu X Q, Zhang P, Wu X D, et al. Cleansing noisy data streams[C]. The IEEE International Conference on Data Mining, Cancún, México, December 15-19, 2008.
[73] Fan W F, Li J Z, Ma S, et al. Interaction between record matching and data repairing[C]. ACM SIGMOD International Conference on Management of Data, Athens, Greece, June 12-16, 2011.
[74] Fan W F, Geerts F, Tang N, et al. Inferring data currency and consistency for conflict resolution[C]. The 29th IEEE International Conference on Data Engineering, Brisbane, April 8-12, 2013.
[75] Ebaid A, Elmagarmid A K, Ilyas I F, et al. NADEEF: A generalized data cleaning system[J]. The Proceedings of the VLDB Endowment, 2013, 6(12): 1218-1221.
[76] Demartini G, Difallah D E, Cudre-Mauroux P. ZenCrowd: Leveraging probabilistic reasoning and crowdsourcing techniques for large-scale entity linking[C]. The 21st World Wide Web Conference, Lyon, France, April 16-20, 2012.
[77] Wang J, Kraska T, Franklin M J, et al. CrowdER: Crowdsourcing entity resolution[J]. The Proceedings of the VLDB Endowment, 2012, 5(11): 1483-1494.
[78] Wang J N, Li G L, Kraska T, et al. Leveraging transitive relations for crowdsourced joins[C]. International Conference on Management of Data, New York, USA, June 22-27, 2013.
[79] Ye C, Wang H Z. Capture missing values based on crowdsourcing[J]. Lecture Notes in Computer Science, 2014, 8491: 783-792.
[80] Ye C, Wang H Z, Gao H, et al. Truth discovery based on crowdsourcing[J]. Lecture Notes in Computer Science, 2014, 8485: 453-458.
[81] Tong Y X, Cao C C, Zhang C J, et al. CrowdCleaner: Data cleaning for multi-version data on the web via crowdsourcing[C]. 2014 IEEE 30th International Conference on Data Engineering, Moscow, Russia, March 31-April 4, 2014.
[82] Lofi C, El Maarry K, Balke W T. Skyline queries over incomplete data-error models for focused crowd-sourcing[M]//Conceptual Modeling. Berlin: Springer, 2013: 298-312.
[83] Lofi C, El Maarry K, Balke W T. Skyline queries in crowd-enabled databases[C]. The 16th International Conference on Extending Database Technology, Genoa, Italy, March 18-22, 2013.
[84] Li Z X, Sharaf M A, Sitbon L, et al. A web-based approach to data imputation[J]. World Wide Web, 2014, 17(5): 873-897.
[85] Chen Y C, Li J Z, Luo J Z. ITCI: An information theory based classification algorithm for incomplete data[J]. Lecture Notes in Computer Science, 2014, 8485: 167-179.