Stuart Ball, JNCC/NBN
The rapid development of technology over the last twenty years has had a significant impact on environmental informatics, both in the volume of electronic data accessible and in the technologies available for exploring and processing it. This presentation focuses on the experience gained from two initiatives that aim to provide easy access to biological records. The first, the National Biodiversity Network (http://data.nbn.org.uk), currently provides access to almost 60 million records across the UK; the second, the Global Biodiversity Information Facility (GBIF, http://data.gbif.org), provides an equivalent service at a global scale, hosting over 200 million records. The experience from these initiatives, and analysis of the data delivered through them, has given insights into which technological advances are most relevant to systems of this kind. In essence, while emerging technologies have a role to play, simple solutions often work best in terms of both uptake and scalability.

GBIF attempted to create a highly distributed network providing direct access to the data, but this suffered from the unreliability of individual nodes and from bottlenecks in data transfer. In addition, while data formats such as XML work well for relatively small volumes of data, they do not scale well to larger volumes, for example when a user wants to build a cache of all relevant data within a domain of interest.

Analytical techniques suited to the sorts of data now available are becoming more accessible, but remain computationally intensive. Over the next five years there is likely to be increased demand to run these techniques repeatedly against all of the available data. Given that current systems already suffer from bottlenecks around indexing the relatively large volumes of data held, development is now focusing on the potential use of compute clusters to generate these sorts of outputs more rapidly.
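The scaling problem with XML can be illustrated with a minimal sketch. The field names below are hypothetical (loosely modelled on species occurrence records of the kind NBN and GBIF serve, not their actual schemas); the point is only that XML repeats its tag names in every record, so a bulk download grows several times faster than a delimited equivalent:

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical occurrence records; real NBN/GBIF records carry many more fields.
records = [
    {"species": "Bombus terrestris", "grid_ref": "TL1234", "date": "2007-06-01"},
    {"species": "Episyrphus balteatus", "grid_ref": "SK5678", "date": "2007-06-02"},
] * 1000  # simulate a modest bulk download

def to_xml(recs):
    # Serialise every record as an element with one child per field.
    root = ET.Element("occurrences")
    for rec in recs:
        occ = ET.SubElement(root, "occurrence")
        for field, value in rec.items():
            ET.SubElement(occ, field).text = value
    return ET.tostring(root, encoding="unicode")

def to_csv(recs):
    # Field names appear once in the header; rows carry only the values.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["species", "grid_ref", "date"])
    writer.writeheader()
    writer.writerows(recs)
    return buf.getvalue()

xml_size = len(to_xml(records))
csv_size = len(to_csv(records))
# Tag overhead is paid per record in XML, so the gap widens as record
# counts rise into the millions.
print(f"XML: {xml_size} bytes, CSV: {csv_size} bytes, "
      f"ratio: {xml_size / csv_size:.1f}x")
```

The per-record tag overhead is negligible for a few hundred records but dominates once a user caches millions, which is one reason simple delimited dumps have proved easier to scale than per-record XML exchange.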