Put simply, data is information, whether from measurements of the world around us or from within us: everything from our genetic codes to the languages we speak. Data, the product of experiment in the physical and social sciences, has existed as a concept for hundreds of years, but it’s only thanks to developments in computing power, storage size and transfer capacity that we’re now able to process and understand very large sets of data, or ‘datasets’ – such as those behind work on the human genome and how it interacts with biomolecules in our cells, or vast star atlases. Being able to handle large datasets is important, because the more information we have about our genes, the world’s climate, and the properties of matter, the more ready we will be to tackle the Grand Challenges of the 21st century, such as climate change and global disease.
Data can be quantitative, capturing number-based measurements, or qualitative, capturing everything else. In a computer, qualitative information, such as transcripts of interviews, audio or video recordings, and images, is stored in digitised formats. Only 20 years ago most audio, video and photographic equipment captured analogue representations of the subject matter under study, but nowadays most devices sold are digital. Tremendous effort is also being put into digitising historical documents to make them readily available to researchers, and technologies such as optical character recognition, file compression algorithms, and ever-smarter search capabilities mean that sorting through and making sense of diverse qualitative data is becoming a reality, paving the way for eHumanities.
Data is diverse – and it is our key to understanding the Universe.