Storing data

The relative importance of storage in the development of computer systems has shifted over the decades. Storage was once a bottleneck in both computer system and software design, because of the relatively high cost per unit of both temporary memory (RAM) and permanent storage (hard disks, floppy disks, flash memory). It is now inexpensive and abundantly available. The key question now is not whether we should store data (in fact, there are good reasons why scientists should keep all their data; see Processing Data) but where we should store it.

An increasingly popular solution is to store data ‘in the cloud’, which means it is off-site but accessible over the internet. This can be both more convenient for users wanting to reach their data wherever they are (providing they have internet access, of course!) and more cost-effective, thanks to the economies of scale achieved by the storage provider. However, two questions remain open: whether data stored in the cloud is secure from hackers, and whether it is subject to any clauses that could challenge an individual’s right to own the data and do with it as they please.

Cloud services are provided by private companies (such as box.com, dropbox.com or Amazon S3) and, for publicly funded research, by scientific or ‘academic’ clouds that offer storage to cloud-based applications (examples include Venus-C, Globus, Stratuslab and KC Class). While services offered by private companies may be perfectly secure and reliable for the purposes of academic research, researchers need to be aware of any clauses in a private provider’s terms of service that could infringe on their rights to that data. Some academic institutions have questioned the user licence agreements of these services and advocate the research-focused clouds instead.

Whether data is stored ‘locally’ (on a computer’s internal hard drive, a memory stick, or a business or home server) or ‘in the cloud’, it has to reside in one or more physical locations. This could expose the data to cybercrime that attacks the infrastructure itself.

The best solutions distribute and duplicate the data around the globe, so that the loss of any single site does not mean the loss of the data, just as a user might back up important data to a USB stick or external hard drive.
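The idea of duplicating data across several locations can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: ordinary directories stand in for geographically separate storage sites, and the function names `sha256_of` and `replicate` are hypothetical, not part of any cloud provider's service. Each copy is checked against a checksum of the original, so a corrupted replica is detected rather than silently trusted.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(source: Path, replica_dirs: list[Path]) -> list[Path]:
    """Copy `source` into every replica directory and verify each copy.

    Assumption: each directory represents a separate storage site.
    """
    expected = sha256_of(source)
    copies = []
    for directory in replica_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        target = directory / source.name
        shutil.copy2(source, target)  # copies contents and file metadata
        if sha256_of(target) != expected:
            raise IOError(f"Replica at {target} does not match the original")
        copies.append(target)
    return copies
```

Calling `replicate(Path("results.csv"), [Path("site_a"), Path("site_b")])` would leave one verified copy in each directory (the file and directory names here are placeholders). Real distributed storage also has to handle network transfer, access control and keeping replicas in sync, none of which this sketch attempts.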