
Data with history

The need to record information in physical form stretches back before writing: it grew alongside human evolution. From prehistoric picture-writing, used thousands of years ago to record tax collections in the eastern Mediterranean, to the Domesday Book, which recorded settlements, populations and resources in England after the Norman Conquest, recording the numbers involved in early economies allowed those economies to be harnessed and directed, usually to the benefit of those in power.

Numbers aren’t the only kind of data, but the laws of numbers are the basis for empirical science, without which physics, chemistry and biology could not have developed. Roger Bacon, a Franciscan friar in England, described in the 13th century how recording numerical observations of natural phenomena could help us develop the sciences. Over the next few hundred years, many of the great breakthroughs in science and engineering were made by individuals making just such careful observations of the world, among them Leonardo, Copernicus, Galileo, Kepler, Hooke and Newton. Many of their written records still exist and, because of their authors’ fame, are priceless artefacts. But they are perhaps even more valuable for the scientific principle they enshrine: reproducibility.

Reproducibility means that if you were to perform an experiment exactly as it was recorded in those books, you should (with a bit of luck!) get results fairly close to the numbers written down hundreds of years ago.

Decimal to Binary
The very first digital computers were electromechanical, which made them slower than their later valve- or transistor-driven counterparts, and also much larger. This was partly because electromechanical relays (essentially standard-sized switches) are far larger than the transistors in modern circuitry, and partly because each decimal digit needed a bank of ten relays, one for each of the values 0 to 9.

This wasn’t a very sensible solution. A switch naturally has two states, so electrical systems are much better suited to the binary counting system, which uses just two digits, 0 and 1, for ‘off’ and ‘on’ respectively. Binary is also far more economical with switches. For example, a large number such as 5008354867 needs only 33 switches in binary (100101010100001010110111000110011), whereas a decimal design would need 100 (10 banks of 10, each bank holding the value of a single digit).

Grouping the decimal switches into banks of ten also hurts reliability: a single faulty switch in a bank is enough for the whole bank to need replacing.
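The switch counts for 5008354867 can be checked with a few lines of Python. This is a minimal sketch assuming the two idealised designs described above: one two-state switch per binary digit, and a bank of ten relays for every decimal digit. The function names are hypothetical, chosen for illustration.

```python
def binary_switches(n: int) -> int:
    """Switches needed to hold non-negative n in binary: one per bit."""
    return max(1, n.bit_length())


def decimal_relays(n: int) -> int:
    """Relays needed when every decimal digit has its own bank of ten."""
    return 10 * len(str(n))


n = 5008354867
print(format(n, "b"))      # 100101010100001010110111000110011
print(binary_switches(n))  # 33
print(decimal_relays(n))   # 100
```

The gap widens as numbers grow: each extra decimal digit costs ten more relays in the decimal design, but only about 3.3 more switches in binary.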

Konrad Zuse’s Z series of computers were possibly the first to use binary arithmetic.

Binary to hexadecimal
Early electronic computers, which used valves, were often built to a pattern known as the von Neumann architecture. Von Neumann described a computer as an arithmetic unit and a control unit (together, what we now call the central processing unit, or CPU), connected to input and output devices and to a single memory from which programs were read and to which results were written. The working memory was effectively the same as the storage memory, and programs and data sat side by side in it.
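To make the idea of a single shared memory concrete, here is a toy stored-program machine in Python, loosely in the spirit of the von Neumann design: instructions and data live in the same list, and a control loop fetches, decodes and executes one instruction at a time. The instruction set (LOAD, ADD, STORE, HALT) and the memory layout are hypothetical, invented for illustration; real machines of the era differed considerably.

```python
def run(memory: list) -> list:
    """Toy fetch-decode-execute loop over one shared memory.

    The instruction set (LOAD, ADD, STORE, HALT) is hypothetical.
    """
    acc = 0  # accumulator: the arithmetic unit's working register
    pc = 0   # program counter: the control unit's pointer to the next instruction
    while True:
        opcode, operand = memory[pc]  # fetch the instruction stored at address pc
        pc += 1
        if opcode == "LOAD":          # copy a data cell into the accumulator
            acc = memory[operand]
        elif opcode == "ADD":         # add a data cell to the accumulator
            acc += memory[operand]
        elif opcode == "STORE":       # write the accumulator back into memory
            memory[operand] = acc
        elif opcode == "HALT":        # stop and hand back the final memory state
            return memory
        else:
            raise ValueError(f"unknown opcode {opcode!r} at address {pc - 1}")


# Addresses 0-3 hold the program; addresses 4-6 hold its data, in the same memory.
memory = [
    ("LOAD", 4),
    ("ADD", 5),
    ("STORE", 6),
    ("HALT", 0),
    2,  # address 4: first operand
    3,  # address 5: second operand
    0,  # address 6: the result will be written here
]

print(run(memory)[6])  # 5
```

Because the program is just more cells in the same list, a STORE aimed at addresses 0 to 3 would overwrite the program itself; keeping instructions and data in separate memories, as the Harvard design does, rules that out.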

In the Harvard architecture, named after the Harvard Mark I, programs and data are instead kept in separate memories; a modified version of this design underlies many processors in use today. Modern computers also separate fast working memory (RAM) from long-term storage such as hard disks, flash memory and optical media like recordable DVDs.

Whatever the architecture, the data itself is stored as binary, but reading and writing it as long strings of 0s and 1s is tedious and error-prone, so hexadecimal became the usual shorthand. Hexadecimal is just another counting system, with sixteen digits rather than the ten (0 to 9) we use every day. The digits 0 to 9 are the same, and the values 10 to 15 are written with the letters A to F, i.e. 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F. Converting between binary (used for working memory and processing) and hexadecimal is simple because sixteen is a power of two (2⁴): each hexadecimal digit stands for exactly four binary digits, so an eight-bit byte can always be written as two hexadecimal digits. Hexadecimal notation came into wide use with the transistor-based computers developed from the 1960s onwards.
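A short Python sketch of the binary-to-hexadecimal conversion, reading a binary string four bits at a time. The helper names binary_to_hex and hex_to_binary are hypothetical, chosen for illustration; Python’s built-in int() and format() do the per-digit work. As an example, the 33-bit binary form of 5008354867 from the decimal-to-binary discussion collapses to nine hexadecimal digits.

```python
def binary_to_hex(bits: str) -> str:
    """Convert a binary string to hexadecimal by reading it four bits at a time."""
    width = (len(bits) + 3) // 4 * 4  # round the length up to a multiple of four
    bits = bits.zfill(width)          # pad with leading zeros to fill the first group
    digits = "0123456789ABCDEF"
    return "".join(digits[int(bits[i:i + 4], 2)] for i in range(0, len(bits), 4))


def hex_to_binary(hex_str: str) -> str:
    """Expand each hexadecimal digit into exactly four bits."""
    return "".join(format(int(h, 16), "04b") for h in hex_str)


print(binary_to_hex("11111111"))                           # FF
print(binary_to_hex("100101010100001010110111000110011"))  # 12A856E33
print(hex_to_binary("FF"))                                 # 11111111
```

No arithmetic on the whole number is needed: each group of four bits maps to one hexadecimal digit independently, which is exactly why the two systems convert so easily.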

Media Formats
The history of computing has borne witness to a startling number of different media for storing both data and programs. Early computers used reels of punched tape or stacks of punched cards. Floppy disks of various designs dominated the market for removable media for over twenty years. At the dawn of the optical storage age, whose last remnants we are now seeing in the dying DVD-RW format, laser discs were used to store data, including for the BBC’s Domesday Project. Within only a few years the technology had moved on, and the data those discs contained was almost lost before a preservation effort began in the early 2000s. In the 1990s we had Zip disks and DAT (Digital Audio Tape) cartridges, and after that memory sticks, which are being produced in ever-greater capacities. Over the last thirty years, the capacity of a typical desktop hard disk has increased 20,000-fold, while the price per megabyte has fallen by a factor of 10 million.