Although massive data pose many challenges and invalidate earlier designs, they also provide great opportunities. Most importantly, instead of making decisions based on small data sets or calibration, decisions can now be based on the data itself. Various Big Data applications have emerged, such as social networking, enterprise data management, scientific applications, mobile computing, scalable and elastic data management, and scalable data analytics.
Meanwhile, many distributed data processing frameworks and systems have been proposed to deal with the Big Data problem. MapReduce is the most successful distributed computing platform; its fundamental idea is to simplify parallel processing, and it has been widely applied. MapReduce systems are good at complex analytics and extract-transform-load tasks at large scale, but they also suffer from reduced functionality. Many other distributed data processing systems go beyond the MapReduce framework. These systems have been designed to address problems not handled well by MapReduce, e.g., Dremel for interactive analysis, GraphLab for graph analysis, Storm for stream processing, and Spark for in-memory computing.
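To make the MapReduce idea concrete, the following is a minimal single-process sketch of the programming model in Python, using word counting as the task. It is an illustration only, not code from the article or from any MapReduce implementation: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and a real framework would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group values by key, as a framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values emitted for one key."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce sequentially (hypothetical driver)."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data big systems", "data systems"]
    print(word_count(docs))
```

The user writes only the map and reduce functions; partitioning, grouping, and scheduling are left to the framework, which is the sense in which the model simplifies parallel processing.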
Big Data presents both challenges and opportunities in designing new systems for managing and processing massive data. Potential research topics in this field span all phases of the data management pipeline, including data acquisition, data integration, data modelling, query processing, and data analysis. Big Data also brings great challenges and opportunities to other computer science disciplines, such as system architecture, storage systems, system software, and software engineering.
More information is available in the article titled "Big Data: the driver for innovation in databases", which appears in the National Science Review, Volume 1, Issue 1, pp. 27-30, doi:10.1093/nsr/nwt020.
The abstract is available at http://nsr.oxfordjournals.org/content/1/1/27.extract