
Data citations

Last edited by Kurt Bollacker 14 years, 3 months ago

This page is where I hope we will post discussion and findings on how we should handle data citations.

 

Some ideas to throw out there:

How should we handle citing specific pieces or subsets of data?  Ideally we would be able to point to atomic pieces of data.

For example, a specific row in a CSV file,

or a specific table in an Excel spreadsheet (I think cells are easier),

or even a dataset in the cloud.
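As a rough illustration of pointing at one atomic piece of data, here is a minimal sketch of a citation for a single CSV row. The function name `cite_csv_row`, the citation fields, and the sample data are all made up for this example; hashing the whole file is just one assumed way to pin down which version of the file the row number refers to.

```python
import csv
import hashlib
import io


def cite_csv_row(csv_text: str, row_index: int) -> dict:
    """Build a hypothetical citation for one data row of a CSV document."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    row = data[row_index]
    return {
        # Hash of the whole file, so a reader can check they hold the same version.
        "source_sha256": hashlib.sha256(csv_text.encode("utf-8")).hexdigest(),
        # Zero-based position, counted after the header line.
        "row": row_index,
        # Copy of the cited values, keyed by column name.
        "values": dict(zip(header, row)),
    }


# Made-up sample data, for illustration only.
SAMPLE = "city,population\nOslo,700000\nBergen,290000\n"
citation = cite_csv_row(SAMPLE, 1)
print(citation["values"])
```

A row number alone is fragile, since it breaks as soon as rows are inserted or reordered, which is why the sketch also records a hash of the file it was counted against.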

 

People I am trying to recruit are:

David Strauss - large scale systems architect

Brian Fitzpatrick - Data Liberation Front at Google

Brian Aker - Architect for MySQL and Drizzle

Peeps from Microsoft Research

Need an EC2 and S3 expert

 

 

Some Random Ideas

 

 

  • Consider the more general problem of citing structured objects that are not text documents (text documents are a special case important enough to warrant their own model).  This brings us to at least two parts of the problem:

 

  1. Referencing the whole object.  In order to do this, we need a UID for the object.  This is addressed in the text document citability problem.
  2. Referencing a part of the object relative to the whole.  For example, in a table-like structure, this could be row/column addressing.
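The two parts above could be modeled as a citation with a whole-object identifier plus an optional locator inside that object. This is only a sketch: the `Citation` class, the `#` separator, the `uid:` prefix, and the `row=...;col=...` locator syntax are assumptions made for illustration, not an agreed format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Citation:
    object_uid: str                 # part 1: identifies the whole object
    locator: Optional[str] = None   # part 2: address relative to the whole

    def __str__(self) -> str:
        if self.locator is None:
            return self.object_uid
        return f"{self.object_uid}#{self.locator}"


def parse_citation(text: str) -> Citation:
    """Split a citation string at the first '#' into UID and locator."""
    uid, sep, locator = text.partition("#")
    return Citation(uid, locator if sep else None)


# Hypothetical identifiers, for illustration only.
whole = Citation("uid:example-table")
cell = Citation("uid:example-table", "row=3;col=2")
print(whole, cell)
```

Keeping the locator opaque to the UID scheme means each object type (table, spreadsheet, tree) could define its own addressing without changing how whole objects are identified.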

 

  • A structured object may contain other structured objects that actually appear in multiple places in multiple other objects (e.g. representations like database denormalizations can cause this).  We have to decide whether sub-objects are objects in their own right or should only be cited relative to the containing object.  A figure in an XLS file is a good test case.
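To make the choice concrete, here is a small sketch contrasting the two options for a figure that appears in two different workbooks. Everything here is hypothetical: the workbooks are plain dictionaries standing in for real files, and deriving a standalone identifier from a content hash is just one assumed way to treat a sub-object as an object of its own.

```python
import hashlib


def content_uid(payload: bytes) -> str:
    """Standalone identifier derived only from the sub-object's bytes."""
    return "sha256:" + hashlib.sha256(payload).hexdigest()


def relative_citations(container: dict) -> list:
    """Cite each part relative to its container: container UID plus path."""
    return [f"{container['uid']}#{path}" for path in container["parts"]]


def standalone_citations(container: dict) -> list:
    """Cite each part as an object of its own, ignoring where it lives."""
    return [content_uid(data) for data in container["parts"].values()]


# The same figure bytes embedded in two made-up workbooks.
figure = b"<bytes of an embedded figure>"
workbook_a = {"uid": "uid:workbook-a", "parts": {"sheet1/figure1": figure}}
workbook_b = {"uid": "uid:workbook-b", "parts": {"charts/figure7": figure}}

print(relative_citations(workbook_a), relative_citations(workbook_b))
print(standalone_citations(workbook_a) == standalone_citations(workbook_b))
```

Under relative citation the two copies look unrelated, while under standalone citation they collapse into one object but lose the context of which workbook was actually consulted.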

 

  • A dataset can be thought of as a large structured object, allowing us to reduce the problem back to the containment issue.  I don't believe it matters whether the dataset is "in the cloud".
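If a dataset is treated as one large structured object, citing a piece of it reduces to following a path through nested containers. The sketch below assumes the dataset is already loaded as nested Python dictionaries and lists; the `resolve` helper and the sample data are made up for illustration, and where the bytes are stored plays no role.

```python
def resolve(obj, path):
    """Follow a sequence of keys or indexes down through nested containers."""
    for step in path:
        obj = obj[step]
    return obj


# Made-up dataset: a container of tables, each a list of row dictionaries.
dataset = {
    "tables": {
        "cities": [
            {"name": "Oslo"},
            {"name": "Bergen"},
        ]
    }
}

row = resolve(dataset, ["tables", "cities", 1])
cell = resolve(dataset, ["tables", "cities", 1, "name"])
print(row, cell)
```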

 

 

 
