GOALS AND CHALLENGES OF ANALYZING BIG DATA

Big Data bring new opportunities to modern society and new challenges to data scientists. On the one hand, Big Data hold great promise for discovering subtle population patterns and heterogeneities that cannot be detected with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottlenecks, noise accumulation, spurious correlation, incidental endogeneity and measurement errors. These challenges are distinctive and require new computational and statistical paradigms. This paper gives an overview of the salient features of Big Data and of how these features drive paradigm change in statistical and computational methods as well as in computing architectures. We also provide various new perspectives on Big Data analysis and computation. In particular, we emphasize the viability of the sparsest solution in a high-confidence set and point out that the exogeneity assumptions in most statistical methods for Big Data cannot be validated because of incidental endogeneity. Relying on such unvalidated assumptions can lead to wrong statistical inferences and, consequently, wrong scientific conclusions.

Keywords: Big Data, noise accumulation, spurious correlation, incidental endogeneity, data storage, scalability

INTRODUCTION

Big Data promise new levels of scientific discovery and economic value. What is new about Big Data, and how do they differ from traditional small- or medium-scale data? This paper overviews the opportunities and challenges brought by Big Data, with emphasis on the distinctive features of Big Data and on the statistical and computational methods, as well as the computing architectures, needed to deal with them.

We are entering the era of Big Data, a term that refers to the explosion of available information. This Big Data movement is driven by the fact that massive amounts of very high-dimensional or unstructured data are continuously produced and stored at much lower cost than before. For example, in genomics we have seen a dramatic drop in the price of whole genome sequencing. The same is true in other areas such as social media analysis, biomedical imaging, high-frequency finance, analysis of surveillance videos and retail sales. The existing trend of data being produced and stored more massively and cheaply is likely to continue or even accelerate in the future. This trend will have a deep impact on science, engineering and business. For example, scientific advances are becoming more and more data-driven, and researchers will increasingly think of themselves as consumers of data. The massive amounts of high-dimensional data bring both opportunities and new challenges to data analysis. Valid statistical analysis for Big Data is becoming increasingly important.
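Spurious correlation, one of the challenges named above, can be illustrated with a small simulation. The sketch below is not taken from the paper: the function names, the sample size of 60 and the list of dimensions are assumptions chosen for illustration. It draws a response and a growing number of predictors that are all independent noise, then reports the largest absolute sample correlation found, which tends to grow with the number of predictors even though no real relationship exists.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def max_spurious_correlation(n, dims, seed=0):
    """For each d in dims, return the largest |correlation| between a noise
    response and the first d independent noise predictors (hypothetical
    helper, for illustration only)."""
    rng = random.Random(seed)
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    running_max = 0.0
    generated = 0
    results = {}
    for d in sorted(dims):
        # Predictors are nested: the first d columns are reused for larger d,
        # so the running maximum can only stay the same or increase.
        while generated < d:
            x = [rng.gauss(0.0, 1.0) for _ in range(n)]
            running_max = max(running_max, abs(pearson(x, y)))
            generated += 1
        results[d] = running_max
    return results


if __name__ == "__main__":
    for d, r in max_spurious_correlation(n=60, dims=[10, 100, 1000, 5000]).items():
        print(f"d = {d:>5}: max |corr| = {r:.3f}")
```

Because every predictor is generated independently of the response, any large correlation reported here is spurious by construction; with the sample size held fixed, adding more predictors only increases the chance of finding one.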