Data citations

Page history last edited by Kurt Bollacker 10 years, 5 months ago

This is where I hope we will post discussions and findings on how we should handle data citations.

 

Some ideas to throw out there are:

How should we handle citing specific pieces or subsets of data? We would like to point to atomic pieces of data if possible.

For example, a specific row in a CSV file, a specific table in an Excel spreadsheet (I think cells are easier), or even a dataset in the cloud.
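For the CSV case there is already a convention to borrow: RFC 7111 defines URI fragment identifiers for `text/csv`, such as `#row=5`, `#col=2` and `#cell=4,1`. A citation could then be the file's URL plus a fragment, e.g. `https://example.org/readings.csv#row=3` (a hypothetical URL). The sketch below resolves a simplified subset of that syntax against CSV text; the function name `select` is hypothetical, and ranges, the `*` marker and `;`-separated selections from the RFC are deliberately left out. Spreadsheet cells are not covered by RFC 7111.

```python
import csv
import io
from typing import List, Union


def _position(value: str) -> int:
    # Fragment positions are 1-based; convert to a 0-based Python index.
    n = int(value)
    if n < 1:
        raise ValueError(f"positions start at 1, got {n}")
    return n - 1


def select(csv_text: str, fragment: str) -> Union[List[str], str]:
    """Resolve a simplified RFC 7111-style fragment against CSV text.

    Handles only single selections: row=N, col=N, cell=R,C.
    In this sketch every line counts as a row, including a header line.
    Ranges (row=5-7), '*' and ';'-separated lists are not supported.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    kind, _, value = fragment.lstrip("#").partition("=")
    if kind == "row":
        return rows[_position(value)]
    if kind == "col":
        c = _position(value)
        return [r[c] for r in rows]
    if kind == "cell":
        r, c = (_position(v) for v in value.split(","))
        return rows[r][c]
    raise ValueError(f"unsupported fragment: {fragment!r}")


data = "site,temp\nA,21.5\nB,19.0\n"
print(select(data, "#row=3"))     # ['B', '19.0']
print(select(data, "#cell=2,2"))  # 21.5
print(select(data, "#col=1"))     # ['site', 'A', 'B']
```

A row or cell fragment is only stable if the file it points into never changes, so in practice it would need to be combined with an identifier for one specific version of the file.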

 

People I am trying to recruit are:

David Strauss - large scale systems architect

Brian Fitzpatrick - Data Liberation Front at Google

Brian Aker - Architect for MySQL and Drizzle

People from Microsoft Research

We also need an EC2 and S3 expert

 

 

Some Random Ideas

 

 

  • Consider the more general problem of citing structured objects that are not text documents (text documents are a special case important enough to warrant their own model). This brings us to at least two parts of the problem:

 

  1. Referencing the whole object. To do this, we need a UID for the object. This part is already addressed by the text-document citability problem.
  2. Referencing a part of the object relative to the whole. For example, in a table-like structure this could be row/column addressing.

 

  • Structured objects may contain other structured objects that actually appear in multiple places within multiple other objects (representations such as database denormalizations can cause this). We have to decide whether sub-objects are objects in their own right or should only be cited relative to the containing object. A figure in an XLS file is a good test case.

 

  • Datasets can be thought of as large structured objects, which lets us reduce the problem back to the containment issue. I don't believe it matters whether the dataset is "in the cloud".
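The two-part citation (a UID for the whole object, plus an address relative to it), datasets as large containers, and the shared sub-object test case can all be sketched together. This is a minimal illustration under assumptions not made on this page: the UID is a SHA-256 hash of a canonical JSON encoding (a hypothetical scheme), nested dicts and lists stand in for files, tables, rows and cells, and the names `content_uid`, `Citation`, `cite` and `resolve` are all hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Tuple


def content_uid(obj: Any) -> str:
    # Hypothetical UID scheme: SHA-256 over a canonical JSON encoding.
    # Any change to the object yields a new UID, so a citation pins a version.
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Citation:
    whole_uid: str              # part 1: UID of the whole object
    path: Tuple[Any, ...] = ()  # part 2: address of a part relative to the whole


def cite(root: Any, *path: Any) -> Citation:
    return Citation(content_uid(root), tuple(path))


def resolve(root: Any, citation: Citation) -> Any:
    # Refuse to resolve against a different (or modified) object.
    if content_uid(root) != citation.whole_uid:
        raise ValueError("citation does not refer to this object")
    node = root
    for step in citation.path:  # dict keys or 0-based list indices
        node = node[step]
    return node


# A dataset treated as one large structured object containing files,
# which in turn contain tables, rows and cells. The same table appears
# in two different containers (the denormalization case).
shared_table = [["site", "temp"], ["A", 21.5], ["B", 19.0]]
dataset_1 = {"readings.csv": shared_table, "notes.txt": "raw export"}
dataset_2 = {"archive": {"2009": shared_table}}

cell = cite(dataset_1, "readings.csv", 2, 1)
print(resolve(dataset_1, cell))  # 19.0

rel_1 = cite(dataset_1, "readings.csv")
rel_2 = cite(dataset_2, "archive", "2009")
print(rel_1 == rel_2)  # False: relative citations depend on the container
print(content_uid(resolve(dataset_1, rel_1)) ==
      content_uid(resolve(dataset_2, rel_2)))  # True: standalone UIDs match
```

Citing relative to the container keeps the context in which the table was used, while giving the sub-object its own UID makes every copy cite identically; the sketch supports both, which is one way to defer that decision. Content hashing ties a citation to the exact version cited, at the cost that a corrected dataset gets a new UID; a registry-assigned identifier such as a DOI is the usual alternative. The JSON canonicalization here is simplified and is not a standardized scheme.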

 

 

 
