


Page history last edited by Silona Bonewald 12 years, 1 month ago

Some notes on creating Citable live datasets.


One reason we are so interested in citable live data is that we believe it creates a level of accountability in government data that does not currently exist. It also makes it easier for other people to use and reference that data in their own work.


Steps for making data citable:

1) What data is there? (identifiable)

2) Where is it? (discoverable)

3) What does it mean?

4) How has it changed over time? (delta)

5) Who has touched it? (nice but not necessary, other than the publisher)


Ideally the data should be discoverable, and we should be able either to pull it on a regular (daily?) basis or to receive notifications of changes, so that we can archive it accurately.
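For the pull-based case, change detection can be as simple as comparing a content hash of today's pull against the hash stored from the last archive run. A minimal sketch (the function names are illustrative, not part of any existing system):

```python
import hashlib


def dataset_hash(data: bytes) -> str:
    """Content hash used to detect whether a pulled dataset has changed."""
    return hashlib.sha256(data).hexdigest()


def has_changed(data: bytes, last_hash: str) -> bool:
    """Compare today's pull against the hash stored from the previous archive run."""
    return dataset_hash(data) != last_hash
```

A daily job would pull the source, call `has_changed`, and archive a new snapshot only when the hash differs.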


A basic microformat for a queried dataset:


Time of Query

Answer and Hash of Answer (dataset result)

URI (location of Source Data - publisher)

Source Data, Time of Source Data, Hash of Source
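The fields above could be serialized as a simple JSON record. A minimal sketch, assuming field names of our own choosing (nothing here is a fixed standard):

```python
import hashlib
import json
from datetime import datetime, timezone


def make_citation_record(query_result: bytes, source_data: bytes,
                         source_uri: str, source_time: str) -> dict:
    """Build a citation record with the fields listed above; names are illustrative."""
    return {
        "query_time": datetime.now(timezone.utc).isoformat(),
        "answer_hash": hashlib.sha256(query_result).hexdigest(),
        "source_uri": source_uri,
        "source_time": source_time,
        "source_hash": hashlib.sha256(source_data).hexdigest(),
    }


record = make_citation_record(b"result rows", b"raw dataset bytes",
                              "https://example.gov/dataset", "2010-01-01T00:00:00Z")
print(json.dumps(record, indent=2))
```

Anyone citing the query can later verify it by re-hashing the archived answer and source and comparing against `answer_hash` and `source_hash`.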


We could also simply capture the entire dataset itself, hash it, and store it; storage costs are minimal (approximately $450 on S3) even for a 10 GB dataset that changes 1% daily.

Depending on traffic, we could also capture each query's result set and store it as well (especially if diverse queries are rare).

For new queries, we can check the hash to see whether the exact same query result was produced before and, if so, return the same pointer/reference.
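This hash-based deduplication can be sketched as a store keyed by content hash, so identical results share one entry and one stable reference (the class and method names here are illustrative):

```python
import hashlib


class QueryResultStore:
    """Store query results keyed by content hash; identical results share one entry."""

    def __init__(self):
        self._by_hash = {}

    def store(self, result: bytes) -> str:
        """Return a stable reference for the result, reusing any existing entry."""
        digest = hashlib.sha256(result).hexdigest()
        # setdefault only writes the first time this exact result is seen
        self._by_hash.setdefault(digest, result)
        return digest

    def fetch(self, reference: str) -> bytes:
        """Look up a previously stored result by its reference."""
        return self._by_hash[reference]
```

Repeating a query that yields byte-identical results returns the same reference, so nothing is stored twice.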


