Data citations


This is where I hope we will post discussion and findings on how we should handle data citations.

Some ideas to throw out there are:

How should we cite specific pieces or subsets of data?  Ideally a citation could point to an atomic piece of data.

For example, a specific row in a CSV file

or a specific table in an Excel spreadsheet (I think cells are easier)

or even a dataset on the cloud.
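
To make the CSV case concrete, here is a rough Python sketch of what resolving a citation to one row or one cell might look like. The sample data and the helper names cite_row and cite_cell are made up for illustration. The choice to count data rows from 1 and skip the header is an assumption, not a settled convention.

```python
import csv
import io

# Hypothetical sample data standing in for a published CSV file.
SAMPLE_CSV = "station,year,temp_c\nA1,2008,11.4\nB2,2008,9.7\nC3,2009,10.2\n"


def cite_row(csv_text, row_number):
    """Return the data row at 1-based position row_number (header not counted)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for index, row in enumerate(reader, start=1):
        if index == row_number:
            return dict(row)
    raise IndexError(f"row {row_number} not found")


def cite_cell(csv_text, row_number, column):
    """Return a single cell, the smallest addressable unit in this sketch."""
    return cite_row(csv_text, row_number)[column]


print(cite_row(SAMPLE_CSV, 2))
print(cite_cell(SAMPLE_CSV, 2, "temp_c"))
```

Note that a positional address like this only stays valid while the file does not change, which is one reason the cited object would also need a stable identifier.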

 

People I am trying to recruit are:

David Strauss - large scale systems architect

Brian Fitzpatrick - Data Liberation Front at Google

Brian Aker - Architect for MySQL and Drizzle

Peeps from Microsoft Research

Still needed: an EC2 and S3 expert

Some Random Ideas

  1. Referencing the whole object.  In order to do this, we need a UID for the object.  This is addressed in the text document citability problem.
  2. Referencing a part of the object relative to the whole.  e.g. in a table-like structure, this could be row/column addressing.
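
The two ideas above could be combined into one citation string: a UID for the whole object, plus an optional row/column part relative to it. The Python sketch below is only one possible shape. Using a SHA-256 content hash as the UID is an assumption, and the "uid#row=R,col=C" syntax, the DataCitation class, and the helper names are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


def object_uid(content: bytes) -> str:
    """Derive a UID from the object's bytes (assumption: SHA-256 content hash)."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


@dataclass(frozen=True)
class DataCitation:
    uid: str                      # identifies the whole object
    row: Optional[int] = None     # optional 1-based row within the object
    column: Optional[int] = None  # optional 1-based column within the object

    def __str__(self) -> str:
        parts = []
        if self.row is not None:
            parts.append(f"row={self.row}")
        if self.column is not None:
            parts.append(f"col={self.column}")
        # No row or column means the citation refers to the whole object.
        return self.uid if not parts else self.uid + "#" + ",".join(parts)


def parse_citation(text: str) -> DataCitation:
    """Parse the hypothetical 'uid#row=R,col=C' form back into a DataCitation."""
    uid, _, fragment = text.partition("#")
    fields = dict(item.split("=", 1) for item in fragment.split(",") if item)
    row = int(fields["row"]) if "row" in fields else None
    column = int(fields["col"]) if "col" in fields else None
    return DataCitation(uid, row, column)


whole = DataCitation(object_uid(b"a,b\n1,2\n"))
cell = DataCitation(whole.uid, row=1, column=2)
print(whole)
print(cell)
```

A content hash has the nice property that the UID changes whenever the data changes, so a row/column address can never silently point at different data. The cost is that every revision of the object gets a new UID.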