Data citations

Page history last edited by Kurt Bollacker 14 years, 1 month ago

This is where I hope we will post discussion and findings on how we should handle data citations.


Some ideas to throw out there are:

How should we handle citing specific pieces or subsets of data?  We would like to point to atomic pieces of data if possible.

For example, a specific row in a CSV file

or a specific table in an Excel spreadsheet (I think cells are easier)

or even a dataset in the cloud.
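As a rough illustration of pointing at an atomic piece of data, here is a minimal Python sketch that resolves a row (or a single cell) in a CSV file. The function name `resolve_cell`, the 0-based row numbering, and the sample data are all assumptions made for the example, not an agreed convention.

```python
import csv
import io


def resolve_cell(csv_text, row, column=None):
    """Return the cited data row as a dict, or one cell if `column` is given.

    `row` is a 0-based index into the data rows; the header line is not counted.
    (This numbering is an assumption of the sketch.)
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    for index, record in enumerate(reader):
        if index == row:
            return record if column is None else record[column]
    raise IndexError(f"no data row {row}")


# Made-up sample data, for illustration only.
SAMPLE = "station,year,temp_c\nA,2009,11.2\nB,2009,9.8\n"

cell = resolve_cell(SAMPLE, 1, "temp_c")   # one cell of the second data row
whole_row = resolve_cell(SAMPLE, 0)        # the entire first data row
```

Positional row numbers only stay valid if the file never changes, so a real scheme would probably need to cite a fixed version of the file as well.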


People I am trying to recruit are:

David Strauss - large scale systems architect

Brian Fitzpatrick - Data Liberation League at Google

Brian Aker - Architect for MySQL and Drizzle

People from Microsoft Research

An EC2 and S3 expert (still needed)



Some Random Ideas



  • Consider the more general problem of citing structured objects that are not text documents (text documents are a special case important enough to warrant their own model).  This brings us to at least two parts of the problem:


  1. Referencing the whole object.  In order to do this, we need a UID for the object.  This is addressed in the text document citability problem.
  2. Referencing a part of the object relative to the whole.  For example, in a table-like structure, this could be row/column addressing.
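The two parts above could be sketched as a single citation value: a UID for the whole object plus an optional row/column address relative to it. Everything here is hypothetical, including the `DataCitation` class, the `uid#row=N;col=NAME` string form, and the example UID; the sketch also assumes UIDs never contain `#`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataCitation:
    uid: str                       # part 1: identifies the whole object
    row: Optional[int] = None      # part 2: address relative to the whole
    column: Optional[str] = None

    def to_string(self) -> str:
        """Serialize as 'uid' or 'uid#row=N;col=NAME' (an invented format)."""
        parts = []
        if self.row is not None:
            parts.append(f"row={self.row}")
        if self.column is not None:
            parts.append(f"col={self.column}")
        return self.uid if not parts else f"{self.uid}#{';'.join(parts)}"


def parse_citation(text: str) -> DataCitation:
    """Inverse of DataCitation.to_string()."""
    uid, _, fragment = text.partition("#")
    row = None
    column = None
    if fragment:
        for item in fragment.split(";"):
            key, _, value = item.partition("=")
            if key == "row":
                row = int(value)
            elif key == "col":
                column = value
            else:
                raise ValueError(f"unknown part selector: {key!r}")
    return DataCitation(uid, row, column)


# Hypothetical UID, for illustration only.
citation = DataCitation("dataset:example-123", row=4, column="temp_c")
text = citation.to_string()
```

Keeping the part address separate from the UID means the whole-object problem can be solved once (as for text documents) and the part address can vary by object type.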


  • A structured object may contain other structured objects that actually appear in multiple places in multiple other objects (e.g. representations like database denormalizations can cause this).  We have to decide whether sub-objects are objects in their own right or should only be cited relative to the containing object.  A figure in an XLS file is a good test case.


  • A dataset can be thought of as a large structured object, allowing us to reduce the problem back to the containment issue.  I don't believe it matters if the dataset is "in the cloud".
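To make the containment question concrete, the sketch below models a workbook as nested Python containers in which the same figure appears in two sheets. It contrasts the two options discussed above: citing the figure relative to its container (a path from the root) and citing it as an object in its own right (its own UID in a registry). The names `resolve_path`, `workbook`, `registry`, and the UID `fig:example-1` are invented for illustration.

```python
def resolve_path(obj, path):
    """Walk `path` (a sequence of keys or indexes) down from the root object."""
    current = obj
    for step in path:
        current = current[step]
    return current


# One figure object that is contained in two different sheets.
figure = {"title": "Figure 1", "series": [1, 2, 3]}
workbook = {
    "sheets": {
        "Summary": {"figures": [figure]},
        "Appendix": {"figures": [figure]},
    }
}

# Option A: cite relative to the containing object. Two different paths
# lead to the same sub-object, so the citation is not unique.
via_summary = resolve_path(workbook, ["sheets", "Summary", "figures", 0])
via_appendix = resolve_path(workbook, ["sheets", "Appendix", "figures", 0])

# Option B: treat the sub-object as an object of its own, with its own UID.
registry = {"fig:example-1": figure}
via_uid = registry["fig:example-1"]
```

Under this view a whole dataset is just a larger root object, and where it is stored does not change how the path or UID is resolved.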



