Page history last edited by Silona Bonewald 10 years, 4 months ago

Some notes on creating citable live datasets.


One reason we are so interested in citable live data is that we believe it creates a level of accountability in government data that doesn't currently exist. It also makes it easier for other people to use and reference that data in their own work.


Steps for making data citable:

1) What data is there? (identifiable)

2) Where is it? (discoverable)

3) What does it mean?

4) How has it changed over time? (delta)

5) Who has touched it? (nice but not necessary, beyond the publisher)


Ideally the data should be discoverable, and we should be able either to pull it on a regular (daily?) basis or to receive notifications of changes, so that we can archive it accurately.
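One way to implement the pull-based half of this is to compare the publisher's HTTP validators (ETag or Last-Modified), falling back to a content hash when neither is sent. This is a minimal sketch, not part of any existing archiver; the record layout and function name are illustrative:

```python
import hashlib

def should_archive(prev_record, etag=None, last_modified=None, body=None):
    """Decide whether the source appears changed since the last archive pass.

    prev_record is a dict like {"etag": ..., "last_modified": ..., "sha256": ...}
    saved from the previous pass; any of the new values may be None if the
    publisher did not supply them.
    """
    if etag is not None and prev_record.get("etag") is not None:
        return etag != prev_record["etag"]
    if last_modified is not None and prev_record.get("last_modified") is not None:
        return last_modified != prev_record["last_modified"]
    if body is not None and prev_record.get("sha256") is not None:
        return hashlib.sha256(body).hexdigest() != prev_record["sha256"]
    # No basis for comparison: archive to be safe.
    return True
```

On each scheduled pull, the archiver would call this with whatever validators the response carried, and only store a new snapshot when it returns True.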


A basic Microformat for a queried Dataset:


Time of Query

Answer and Hash of Answer (dataset result)

URI (location of Source Data - publisher)

Source Data, Time of Source Data, Hash of Source
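The fields above could be serialized as a small record. A minimal sketch in Python, assuming SHA-256 for the hashes; the field names and example values are illustrative, not part of any agreed spec:

```python
import hashlib
import json
from datetime import datetime, timezone

def make_citation_record(query, answer_bytes, source_uri,
                         source_bytes, source_time):
    """Build one citable record for a query against a published dataset.

    The two hashes let a later reader verify that the stored answer and the
    source data are exactly the ones that were cited.
    """
    return {
        "query": query,
        "query_time": datetime.now(timezone.utc).isoformat(),
        "answer_hash": hashlib.sha256(answer_bytes).hexdigest(),
        "source_uri": source_uri,    # location of the source data (publisher)
        "source_time": source_time,  # publisher's timestamp for the data
        "source_hash": hashlib.sha256(source_bytes).hexdigest(),
    }

record = make_citation_record(
    "SELECT total FROM spending WHERE year = 2009",
    b"...query result bytes...",
    "https://example.gov/datasets/spending.csv",
    b"...full source dataset bytes...",
    "2009-07-01T00:00:00Z",
)
print(json.dumps(record, indent=2))
```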


We could also simply capture the entire dataset itself, hash it, and store it; storage costs are minimal (approximately $450 on S3) even for a 10 GB dataset that changes 1% daily.
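A back-of-the-envelope sketch of the storage volume behind that scenario, assuming only the changed 1% is stored each day on top of the 10 GB base (the dollar figure depends on S3 pricing at the time and is not recomputed here):

```python
# Delta storage for a 10 GB dataset that changes 1% daily, kept for a year.
base_gb = 10.0
daily_delta_gb = base_gb * 0.01           # 1% of the dataset changes per day
year_total_gb = base_gb + daily_delta_gb * 365

print(round(year_total_gb, 1))            # roughly 46.5 GB after one year
```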

Depending on traffic, we could also capture each query's result set and store it (especially if diverse queries are rare).

With new queries we can check via the hash whether the exact same query results were returned before, and if so return the same pointer/reference.
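That dedup step amounts to a content-addressed store: the result's hash is the reference, so identical results are stored once. A minimal sketch, assuming SHA-256 and an in-memory dict in place of real storage:

```python
import hashlib

class ResultStore:
    """Content-addressed store for query results (illustrative, in-memory)."""

    def __init__(self):
        self._by_hash = {}

    def put(self, result_bytes):
        """Store a query result; return a stable reference (its hash).

        If the exact same result was stored before, the existing entry is
        reused and the same reference comes back.
        """
        ref = hashlib.sha256(result_bytes).hexdigest()
        self._by_hash.setdefault(ref, result_bytes)
        return ref

    def get(self, ref):
        return self._by_hash[ref]

store = ResultStore()
ref1 = store.put(b"county,total\nTravis,100\n")
ref2 = store.put(b"county,total\nTravis,100\n")  # same result, same reference
assert ref1 == ref2
```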


