This is where I hope we will post discussion and findings on how we should handle data citations.

Some ideas to throw out there are:

How should we handle citing specific pieces or subsets of data? Ideally we would be able to point to atomic pieces of data.

For example, a specific row in a CSV file

or a specific table in an Excel spreadsheet (I think cells are easier)

or even a dataset on the cloud.
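As a rough illustration of the CSV case above, here is a minimal sketch of what resolving a "row N of this file" citation could look like. The function name `resolve_row`, the 1-based row numbering that skips the header, and the sample data are all assumptions made up for this example, not an agreed scheme.

```python
import csv
import io


def resolve_row(csv_text: str, row_number: int) -> dict:
    """Return the data row at 1-based row_number (header not counted)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for index, row in enumerate(reader, start=1):
        if index == row_number:
            return dict(row)
    raise IndexError(f"row {row_number} not found")


# Made-up sample data, only here to exercise the function.
SAMPLE = "station,year,temp_c\nA1,2008,11.2\nB7,2008,9.8\n"

cited = resolve_row(SAMPLE, 2)
print(cited)
```

One thing this makes visible: a plain row number only stays valid as long as the file is never reordered or edited, so a real scheme would probably need versioning or a stable row key.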

People I am trying to recruit are:

David Strauss - large scale systems architect

Brian Fitzpatrick - Data Liberation Front at Google

Brian Aker - Architect for MySQL and Drizzle

Peeps from Microsoft Research

Need an EC2 and S3 expert

Some Random Ideas

Consider the more general problem of citing structured objects that are not text documents (text documents are a special case important enough to warrant their own model). This brings us to at least two parts of the problem:

Referencing the whole object. In order to do this, we need a UID for the object. This is addressed in the text document citability problem.
Referencing a part of the object relative to the whole. For example, in a table-like structure, this could be row/column addressing.
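
The two parts above could be sketched as a single citation record: a UID for the whole object plus an optional row/column address relative to it. The class name `DataCitation`, the `#row=...;col=...` string syntax, and the UID value are all invented for this sketch; nothing here is a settled format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataCitation:
    object_uid: str               # identifies the whole object
    row: Optional[int] = None     # 1-based row; None means "all rows"
    column: Optional[str] = None  # column name; None means "all columns"

    def to_string(self) -> str:
        """Serialize as UID, optionally followed by '#' and the part address."""
        parts = []
        if self.row is not None:
            parts.append(f"row={self.row}")
        if self.column is not None:
            parts.append(f"col={self.column}")
        return self.object_uid + ("#" + ";".join(parts) if parts else "")

    @classmethod
    def parse(cls, text: str) -> "DataCitation":
        """Inverse of to_string for the hypothetical syntax above."""
        uid, _, fragment = text.partition("#")
        fields = dict(item.split("=", 1) for item in fragment.split(";") if item)
        row = int(fields["row"]) if "row" in fields else None
        return cls(uid, row, fields.get("col"))


whole = DataCitation("example-uid-123")  # placeholder UID
cell = DataCitation("example-uid-123", row=4, column="temp_c")
print(whole.to_string())
print(cell.to_string())
```

Leaving both `row` and `column` empty cites the whole object, so the same record covers both cases. This toy syntax breaks if a column name contains `;` or `=`, which a real scheme would have to escape.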

Structured objects may contain other structured objects that actually appear in multiple places in multiple other objects (e.g. representations like database denormalizations can cause this). We have to decide whether sub-objects are objects in their own right or should only be cited relative to the containing object. A figure in an XLS file is a good test case.

A dataset can be thought of as a large structured object, allowing us to reduce the problem back to the containment issue. I don't believe it matters if the dataset is "in the cloud".
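
To make the containment idea concrete, here is a small sketch that treats a dataset as nested containers and a citation as a path from the whole object down to the cited part. The `resolve` helper, the path-as-list representation, and the sample dataset are assumptions for illustration only.

```python
from typing import Any, Sequence, Union

Step = Union[str, int]  # a dict key or a list index


def resolve(dataset: Any, path: Sequence[Step]) -> Any:
    """Walk from the containing object down to the cited part."""
    node = dataset
    for step in path:
        node = node[step]  # works for both dict keys and list indexes
    return node


# Made-up dataset: a container of tables, each a list of row records.
dataset = {
    "tables": {
        "readings": [
            {"station": "A1", "temp_c": 11.2},
            {"station": "B7", "temp_c": 9.8},
        ]
    }
}

print(resolve(dataset, ["tables", "readings", 1, "temp_c"]))
```

Nothing in the path depends on where the dataset is stored, which fits the point that "in the cloud" should not change the citation model. An empty path simply cites the whole dataset.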

Data citations