Access to distributed data repositories

Distributed data access consists of a set of components that provide the wide range of data-related services required by the SIMDAT application scenarios:

transparently access data that resides in remote file-based repositories or databases across the Grid
efficiently transmit large amounts of data between different nodes on the Grid, whether for transferring files or for synchronizing data repositories
effectively manage the storage, replication and synchronization of data at local and remote Grid nodes and support indirect access to data based on (meta)data catalogues
handle the semantic mediation between different data models and replications, using dynamic techniques based on ontologies

The Data infrastructure is based on using and extending existing third-party components wherever possible. SIMDAT focuses on hardening the software components, ensuring that they interoperate with each other and integrate with the GRIA-based SIMDAT Grid infrastructure, and extending functionality or improving performance where the application scenarios require it.

Figure: OGSA-DAI Service

The central elements of the Data infrastructure are based on the OGSA-DAI package. OGSA-DAI provides the framework for accessing file repositories and databases through a Web Service interface regardless of their location.

Automatic distribution, replication and synchronization of data are performed by the IGOR-FS distributed filesystem. IGOR-FS partitions files (and directories) into blocks, each of which is uniquely identified by its hash value. Blocks are looked up by hash value, and chains of blocks are likewise assembled by referencing hash values. In a Grid, a network of IGOR daemons provides access to file blocks: the daemons can uniquely identify and verify each block regardless of its location (since blocks are indexed by their content, not their location), and they manage adaptive, local caches of blocks. Synchronization of changes is fully automatic. IGOR-FS is designed for the case of one or a few writers and many readers, and changes to a file are propagated automatically, since they amount to creating a new sequence of blocks rather than modifying existing blocks. Because existing blocks are never modified, earlier versions of a file remain addressable, so the scheme also provides version control functionality.
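The content-addressing idea described above can be illustrated with a minimal sketch. It is not the IGOR-FS implementation: the class BlockStore, the functions write_file and read_file, the use of SHA-256 and the tiny block size are all assumptions chosen for illustration, since the hash function and block size actually used by IGOR-FS are not stated here.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical, unrealistically small block size for illustration


class BlockStore:
    """Toy content-addressed store: each block is keyed by the hash of its content."""

    def __init__(self):
        self.blocks = {}

    def put(self, data):
        # SHA-256 is an assumption; the real hash function is not specified.
        key = hashlib.sha256(data).hexdigest()
        self.blocks[key] = data  # identical content always maps to the same key
        return key

    def get(self, key):
        data = self.blocks[key]
        # A block can be verified wherever it came from: recompute and compare.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("block content does not match its hash")
        return data


def write_file(store, content):
    """Split content into blocks and return the chain (list) of block hashes."""
    return [
        store.put(content[i:i + BLOCK_SIZE])
        for i in range(0, len(content), BLOCK_SIZE)
    ]


def read_file(store, chain):
    """Reassemble a file by following its chain of block hashes."""
    return b"".join(store.get(key) for key in chain)


store = BlockStore()
v1 = write_file(store, b"AAAABBBBCCCC")  # first version of a file
v2 = write_file(store, b"AAAAXXXXCCCC")  # updated version: only the middle block differs

print(read_file(store, v1))  # the old version is still readable
print(read_file(store, v2))
print(len(store.blocks))     # unchanged blocks are stored only once
```

In this sketch an update produces a new chain of hashes while the unchanged blocks are shared between versions, which is why only the modified block would need to be transferred and why the earlier version stays available.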

Role in SIMDAT

OGSA-DAI is used by SIMDAT as a common interface to local data repositories (for instance, abstracting the large variety of archive systems used by weather centers in the Meteorology application area), and as the standard interface for accessing and manipulating data across a Grid (in the Automotive and Aerospace application scenarios).

IGOR-FS is used in the Pharmaceutical scenario to distribute large gene and protein databases among partners. It is well suited to this task, since only the blocks actually used by an application are transferred, and since changes and updates are managed transparently.

The results of this work can be found in the Grid Solution Portfolio under Data.