A nemesis of data sharing has always been the diversity of terms, symbols, and structures used in the data: the vocabulary of the data itself. There are three basic approaches to this problem:

  • At one end of the spectrum, every data source is different: each has its own terms and structure. If you want a different vocabulary or structure, you have to convert the data, but building these converters is expensive and time-consuming, and the approach does not lend itself to federation or a global solution.
  • At the other end of the spectrum is the universal data model: every term and structure is controlled and normalized. The problem is that it is nearly impossible to reach agreement on these terms and structures, and the debates are never-ending.
  • A more modern approach is to use "ontologies", in which the meaning of each term is precisely defined and can be mapped by technology. As exciting as this is, it has proved very difficult to define the semantics of everything, and the technologies for matching terms are still at an early stage. Getting agreement on meaning is just as hard as getting agreement on terms.

We propose a middle ground: shared concept "hubs". A hub is simply a trusted source of common information, one that defines a vocabulary of terms and concepts. The concepts in a hub are shared among the data sets that want to leverage it. Publishers can choose to trust one or more of these hubs and relate their own terms (where they differ) to the hub's terms. It is therefore up to each publisher of information to accept a hub, and a publisher may accept more than one. The vocabulary within a hub is controlled by the hub's publisher, who "owns" that hub; we expect any one hub to cover a domain in which its publisher has some credibility. By "grounding" data in these hubs and allowing for multiple hubs, we will produce a "marketplace" of trusted interoperability points. The most trusted hubs will grow and become established, and will thus link the data grounded in them. Shared concept hubs become the metadata of the data cloud.
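The grounding idea can be illustrated with a minimal sketch, assuming a hub is just a named set of concepts and that a publisher grounds each local term in exactly one hub concept. `Hub`, `Publisher`, `ground` and `translate` are hypothetical names chosen for illustration, not part of GAIN or any existing library. Two publishers that use different local terms can exchange records because both relate their terms to the same hub concepts.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

HubRef = Tuple[str, str]  # (hub name, concept name)


@dataclass
class Hub:
    """Hypothetical shared concept hub: a named, trusted vocabulary of concepts."""
    name: str
    concepts: Set[str] = field(default_factory=set)


@dataclass
class Publisher:
    """Hypothetical publisher that grounds its local terms in hubs it chooses to trust."""
    name: str
    grounding: Dict[str, HubRef] = field(default_factory=dict)  # local term -> hub concept

    def ground(self, local_term: str, hub: Hub, concept: str) -> None:
        # A term can only be grounded in a concept the hub actually defines.
        if concept not in hub.concepts:
            raise KeyError(f"{concept!r} is not defined in hub {hub.name!r}")
        self.grounding[local_term] = (hub.name, concept)


def translate(record: Dict[str, object], source: Publisher, target: Publisher) -> Dict[str, object]:
    """Rename a record's fields from source's vocabulary to target's via shared hub concepts.

    Fields whose terms are not grounded in a hub concept the target also uses are dropped.
    """
    # Invert the target's grounding: hub concept -> target's local term.
    to_target = {ref: term for term, ref in target.grounding.items()}
    result: Dict[str, object] = {}
    for term, value in record.items():
        ref = source.grounding.get(term)
        if ref is not None and ref in to_target:
            result[to_target[ref]] = value
    return result


# Usage: two publishers with different local terms, both grounded in one hub.
geo = Hub("geo-hub", {"City", "PostalCode"})
agency_a = Publisher("agency-a")
agency_a.ground("town", geo, "City")
agency_a.ground("zip", geo, "PostalCode")
agency_b = Publisher("agency-b")
agency_b.ground("municipality", geo, "City")
agency_b.ground("postcode", geo, "PostalCode")

print(translate({"town": "Springfield", "zip": "12345", "notes": "n/a"}, agency_a, agency_b))
# {'municipality': 'Springfield', 'postcode': '12345'}
```

In this sketch each publisher maintains only one mapping, from its own terms to the hub, rather than one converter per pair of data sources; the ungrounded field `notes` is simply dropped.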

Since hubs are just data, they can also be grounded: hubs can be grounded in other hubs, forming a network of trusted vocabularies. Hubs can also use ontologies to define their concepts, so that as semantic technologies become more practical we will have increasingly automatic ways to ground our data and our hubs.
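Grounding hubs in other hubs turns the vocabularies into a network. The following is a minimal sketch, assuming each hub concept is grounded in at most one concept of a parent hub (a real network could well be many-to-many). Two concepts from different hubs can then be related if their grounding chains meet. The functions `grounding_chain` and `common_ground`, and all hub and concept names, are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Concept = Tuple[str, str]  # (hub name, concept name)


def grounding_chain(start: Concept, links: Dict[Concept, Concept]) -> List[Concept]:
    """Follow grounding links from a concept up through the hubs it is grounded in."""
    chain = [start]
    seen = {start}
    node = start
    while node in links:
        node = links[node]
        # Guard against hubs that (directly or indirectly) ground each other.
        if node in seen:
            raise ValueError(f"grounding cycle detected at {node}")
        chain.append(node)
        seen.add(node)
    return chain


def common_ground(a: Concept, b: Concept, links: Dict[Concept, Concept]) -> Optional[Concept]:
    """Return the concept nearest to a, on a's chain, that b is also grounded in, or None."""
    b_chain = set(grounding_chain(b, links))
    for node in grounding_chain(a, links):
        if node in b_chain:
            return node
    return None


# Usage: two hubs grounded, at different depths, in one world-level hub.
links = {
    ("regional-geo", "Town"): ("national-geo", "Locality"),
    ("national-geo", "Locality"): ("world-geo", "PopulatedPlace"),
    ("other-geo", "Settlement"): ("world-geo", "PopulatedPlace"),
}
print(common_ground(("regional-geo", "Town"), ("other-geo", "Settlement"), links))
# ('world-geo', 'PopulatedPlace')
```

Under these assumptions, data grounded in `regional-geo` and data grounded in `other-geo` become linked through the more widely trusted `world-geo` hub, even though neither publisher chose the other's hub.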

It is the job of the data cloud's platform, GAIN, to provide trusted hubs and to support "grounding" published data sets in those hubs.
The data in the cloud itself would have minimal structure; it would be the job of the platform to convert information grounded in the hubs into the vocabulary and structure that make sense to a user.

The above illustrates how a shared concept hub is used to ground vocabulary concepts.