Big Data describes the coming together of new technologies and new ways of doing science. The storage capacities and processing power of modern supercomputers and distributed computer networks are reaching the exascale threshold, just as statistical methods and data mining techniques make it sensible to capture all the data produced by experiments, even if we don’t quite know what we’re searching for when we do so.
Big Data is about the petabytes of results from particle physics, systems biology and Earth simulation science – how we deal with that volume of data and how we use it. But it’s also about the variety of data being produced. Life sciences, social sciences and cognitive sciences produce data of many different types, including images as well as text-based data, so categorising and storing it all becomes a challenge. And in medicine, as data becomes obtainable at an ever-faster rate, there is an opportunity to mesh data from different sources – from physiological feedback and genetic screening – to determine the course of intervention particular to individual patients.
Open data is about a change in attitude towards data sharing among academic researchers and publishers, brought about partly in response to the challenges and opportunities of an online world. Researchers and academic publishers are seeing the value in being open about the data they collect. Where once both would have had a vested interest in controlling access to data – the researcher to ensure publication in a prestigious journal with no prior ‘leaks’ of their valuable results; the publisher to ensure no one can copy their manuscripts without permission – now it is almost universally recognised that publishers can still make money and academics can get good jobs when the publications (and data) are made freely available. In fact, the model of open access publishing was just one move towards open data – open data itself has required an even bigger leap of faith.
It’s through a combination of the big data and open data concepts that scientists, social scientists and decision makers in government and industry think we can tackle some of the Grand Challenges of the 21st Century: climate change, population growth and resource scarcity, pandemics, and energy security.