Discussions and Questions

Page history last edited by Ian Bicking 14 years, 7 months ago

Want to discuss something?  You can't really discuss it on the mailing list (which is announce-only), but you can discuss it here.

Add a new link off this page, or just add a new section to the bottom.  Inline discussions I guess are more threadable and trackable?  I'm not sure.  Anyone can edit, but you do have to register first.


From Ian, July 27 2009:


OK, so after a quick look at Thomas and thinking about what I take to be the goals of Citability, here are some thoughts and questions:


1. Is Thomas (http://thomas.loc.gov/) pretty much all the data we're looking to make citable, at least federally and as a first step?


Silona> Actually, the plan was for ANY public govt docs to be citable: any level, any country, any city... I actually would love to kill WestLaw and their copyrighted citation process!


Ian> sure, I figured federal legislation is what we can all be more interested in to start with; once we have examples of that, applying it locally will be a lot easier.  As I mention, in this case getting the data from Thomas isn't the hard part, it's figuring out what to do with that data.


2. Are we really just trying to make a better Thomas?


3. Thomas includes some static resources (bills) and some more timely information (bill status).  Does this timely information fit into what gets cited?  Does it involve creating ongoing timelines?


4. Does the XML version of legislation contain everything we want to display?  There's a bunch of metadata in there besides just the bill. (For whatever reason, the XML has somewhat nicer URLs than the HTML.) Not everything has XML; maybe a bill is only translated at a certain point in its passage?


5. Do we want to translate the XML into HTML + microformats or something?  There's stuff like this:


<action-date date="20090106">January 6, 2009</action-date>
<action-desc><sponsor name-id="J000032">Ms. Jackson-Lee of
Texas</sponsor> introduced the following bill; which was referred to the
<committee-name committee-id="HAS00">Committee on Armed


There's handy information in there, but no particularly good HTML equivalent.  We could link these things up, e.g.:


  <a href="/name/J000032" class="fn">Ms. Jackson-Lee of Texas</a>


That is, we translate each of these IDs into a URL.  We could try to maintain a list of backlinks as a starting point for these pages, but the URL by itself is already a useful identifier.
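A minimal sketch of that ID-to-URL translation, assuming Python and the standard library XML parser. Only the "/name/<id>" pattern comes from the example above; the "/committee/" prefix and the helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical URL prefixes. Only "/name/<id>" appears in the discussion;
# "/committee/" is an assumption made for this sketch.
URL_RULES = {
    "sponsor": ("name-id", "/name/"),
    "committee-name": ("committee-id", "/committee/"),
}


def link_for(element):
    """Return an HTML anchor for a sponsor or committee element, else None."""
    rule = URL_RULES.get(element.tag)
    if rule is None:
        return None
    attr, prefix = rule
    ident = element.get(attr)
    if ident is None:
        return None
    # Collapse the line breaks and runs of spaces found in the source XML.
    text = " ".join("".join(element.itertext()).split())
    return '<a href="%s%s" class="fn">%s</a>' % (
        prefix, escape(ident, quote=True), escape(text))


sample = (
    '<action-desc><sponsor name-id="J000032">Ms. Jackson-Lee of\n'
    'Texas</sponsor> introduced the following bill</action-desc>'
)
root = ET.fromstring(sample)
links = [link_for(el) for el in root.iter() if link_for(el) is not None]
print(links[0])
```

The lookup table keeps the mapping from element type to URL space in one place, so adding another ID type is a one-line change.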


The XML itself has lots of ids on sections and paragraphs.  These aren't present in the HTML, which is unfortunate, because I am guessing they may(?) be stable, and ids are linkable (just not easy to discover).
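One way to make those ids discoverable would be to list them as fragment links. The sketch below assumes Python; the element names, the id values, and the base URL are all invented for illustration and are not taken from the real Thomas XML.

```python
import xml.etree.ElementTree as ET


def fragment_links(xml_text, base_url):
    """List (tag, url) pairs for every element that carries an id attribute."""
    root = ET.fromstring(xml_text)
    return [
        (el.tag, "%s#%s" % (base_url, el.get("id")))
        for el in root.iter()
        if el.get("id")
    ]


# Invented sample document; real bills use their own element names and ids.
sample = (
    '<bill>'
    '<section id="S1"><paragraph id="S1P1">Text.</paragraph></section>'
    '<section>No id here.</section>'
    '</bill>'
)
found = fragment_links(sample, "/bill/House/111/HR65/")
for tag, url in found:
    print(tag, url)
```

Whether such links stay valid across versions depends on the ids actually being stable, which is still only a guess.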


Silona> That all sounds awesome and like a good idea. It does go beyond the goal of citability, but PLEASE don't let that stop you!  We love Microformats here and making things more readable.



6. Another added feature over Thomas, I guess, is versioning?  That is, regularly polling the Thomas site to see updates, and keeping a record of all updates.  Do we actually need index pages for all bills then?  I think Silona was talking about a URL structure like:


  /bill/House/111/HR65/ -> the latest bill

  /bill/House/111/HR65/20090301 -> the bill as it was on March 1st 2009.


But I wonder if /bill/House/111/HR65 should actually be an index page of all versions?


Silona> Versioning will be handled by the tools we create to sit on top of this standard, but we are focusing on citations here, so degradable URLs for referenceability are primary!
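A rough sketch of how a degradable URL of that shape might resolve, assuming Python. The rule used here (a dated URL resolves to the newest snapshot on or before that date, an undated URL to the newest snapshot overall) is one possible reading of the examples, not something settled in this discussion, and the snapshot dates are made up.

```python
from datetime import date, datetime


def parse_bill_url(path):
    """Split /bill/<chamber>/<congress>/<bill>[/<YYYYMMDD>] into its parts."""
    parts = [p for p in path.split("/") if p]
    if len(parts) not in (4, 5) or parts[0] != "bill":
        raise ValueError("not a bill URL: %r" % path)
    chamber, congress, bill = parts[1], int(parts[2]), parts[3]
    as_of = None
    if len(parts) == 5:
        as_of = datetime.strptime(parts[4], "%Y%m%d").date()
    return chamber, congress, bill, as_of


def resolve_version(snapshot_dates, as_of=None):
    """Pick the newest snapshot on or before as_of, or the newest overall."""
    candidates = [d for d in sorted(snapshot_dates)
                  if as_of is None or d <= as_of]
    return candidates[-1] if candidates else None


# Made-up snapshot dates, for illustration only.
snapshots = [date(2009, 1, 6), date(2009, 2, 20), date(2009, 4, 2)]

chamber, congress, bill, as_of = parse_bill_url("/bill/House/111/HR65/20090301")
print(resolve_version(snapshots, as_of))  # 2009-02-20
print(resolve_version(snapshots))         # 2009-04-02
```

Under this reading, dropping the date from a dated URL still lands on the same bill, which is the "degradable" property. An index of all versions could be served from the same undated path instead if that turns out to be more useful.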


7. Would the HTML be the canonical version of all pages?  That is, will the HTML be a parseable and documented format?  (More documented than just "HTML", but with specific classes with specific meanings)


Silona> Yes. I also want to do hashes for the archiving servers so that citations can be easily verified.  David Strauss wants to build the archive server with a distributed versioning system like GitHub or Bazaar so that we create a verification trail.  I think that is a great idea.


Ian> Given the text, an attempt at stable ids (it's a fuzzy problem, so at best it is an attempt) and a feed to allow people to easily track incremental updates, I think the versioning can be implemented on top.  I don't think hashes are important, as you can actually compare the text itself.  No paragraph is all that big, and a hash is almost a compression system in this case ~Ian
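To make the two positions concrete, here is a small sketch, assuming Python, that finds changed paragraphs both by comparing the text directly and by comparing SHA-256 digests. The paragraph ids and texts are invented for illustration; how ids get assigned is exactly the fuzzy part mentioned above.

```python
import difflib
import hashlib


def paragraph_digests(paragraphs):
    """Map each paragraph id to the SHA-256 hex digest of its text."""
    return {pid: hashlib.sha256(text.encode("utf-8")).hexdigest()
            for pid, text in paragraphs.items()}


def changed_paragraphs(old, new):
    """Return ids whose value differs, or that were added or removed."""
    return sorted(pid for pid in set(old) | set(new)
                  if old.get(pid) != new.get(pid))


# Invented paragraph ids and texts.
old = {"p1": "Section 1. Short title.",
       "p2": "The Secretary shall report annually."}
new = {"p1": "Section 1. Short title.",
       "p2": "The Secretary shall report quarterly."}

# Comparing the text itself and comparing digests flag the same paragraph.
by_text = changed_paragraphs(old, new)
by_hash = changed_paragraphs(paragraph_digests(old), paragraph_digests(new))
print(by_text, by_hash)

# Only the text comparison can also show what actually changed.
for line in difflib.unified_diff([old["p2"]], [new["p2"]], lineterm=""):
    print(line)
```

Digests are handy when a third party only wants to confirm a citation without fetching the full text, while the direct comparison is what a versioning tool would build on.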


If this is all just Thomas data but with better linkability, then the scraping seems pretty simple, but there are a lot of questions about what a truly canonical source of legislative information should be.  Or, if this is something else, then probably all my questions are off base ;)


Silona> This is the reason I would like the govt to do it, so the URL can be ARCHIVE.HOUSE.GOV. govtrack.us and whitehouse.wikia.com and others already do permalinks, but those are not "official" sources.  This is a mechanism to create official, clonable sources with verifiable hashes.


Ian> To be canonical, I think you really have to be complete.  So if archive.house.gov contains the bill, it needs to have at least all the information available through Thomas.  That includes some funky metadata.


Ian> It occurs to me generally that it would be very doable right now to construct an example of "the bill page we want".  That is, take one bill and hand-massage it into what exactly would show up on archive.house.gov.  Once that is done, and there's a sort of rough consensus on the scope and goals, implementations should be easy.  Or at least, much much easier -- the choice is one of how to do it, not what to do.

Comments (10)

judell said

at 10:59 am on Aug 7, 2009

> It occurs to me generally that it would be very doable right now to construct an example of "the bill page we want".

Hey Ian, small world, I only just realized -- talking with Silona just now for my podcast -- that you're here. Coincidentally (or not) I made the same suggestion in our conversation. A hand-built prototype will be a really valuable piece -- not only to drive implementation, but also to illustrate the concept for everyone.

Jack Twilley said

at 12:26 pm on Aug 14, 2009

I second the call for a hand-built prototype. A good example can help people understand the concepts far more quickly and easily than a PowerPoint presentation or a simple description.

Silona Bonewald said

at 2:28 pm on Aug 14, 2009

Cool then - want to start writing it? I think maybe we should host on github?

Jack Twilley said

at 9:00 pm on Aug 16, 2009

Do you have a target jurisdiction in mind? That would be an important first step.

Silona Bonewald said

at 3:39 pm on Aug 18, 2009

Well, nysenate was looking like the best bet, since they are working in Drupal and are redoing a lot.

Silona Bonewald said

at 3:40 pm on Aug 18, 2009

Just need money to pay for the prototype, or someone to program it, because it's beyond my expertise in the server arena.

gwachob@... said

at 8:51 pm on Oct 20, 2009

Has there been any thought given to adopting the Bluebook citation formats and simply transforming them into URIs, mostly by just removing whitespace?

Yes, the Bluebook itself isn't free, but the main citation formats are pretty easy to list out (maybe even with their cooperation?)

gwachob@... said

at 8:52 pm on Oct 20, 2009

LII coming through again: http://www.law.cornell.edu/citation/

gwachob@... said

at 8:56 pm on Oct 20, 2009

Also see efforts to create open citation formats http://www.aallnet.org/committee/citation/ucg/index.html and http://www.abanet.org/citation/

I think they are actually quite close. The point here is that if there is a target format, and we can simply create URLs by squeezing the locator part of those citations, then it's a win, because they can be human-unpacked for the purpose of "plain old book" research....
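A minimal sketch of the squeezing step, assuming Python. The "/cite/" prefix and the function name are made up for illustration, and no particular citation standard is implemented here: the function only drops whitespace and percent-encodes whatever is not URL-safe.

```python
from urllib.parse import quote


def citation_to_path(citation, prefix="/cite/"):
    """Squeeze whitespace out of a citation and make it safe for a URL path."""
    squeezed = "".join(citation.split())
    # Letters, digits and "." pass through; characters such as the section
    # sign are percent-encoded as UTF-8 bytes.
    return prefix + quote(squeezed, safe=".")


print(citation_to_path("347 U.S. 483"))      # /cite/347U.S.483
print(citation_to_path("42 U.S.C. § 1983"))  # /cite/42U.S.C.%C2%A71983
```

The first result stays readable enough to unpack by eye; the second shows that symbols like the section sign would need a convention of their own to keep the URL readable.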

Silona Bonewald said

at 7:49 pm on Oct 24, 2009

The idea of the legal notation is derived directly from the preexisting form. So I think borrowing from these formats again for URL construction and internal linkage is a great idea. I think you should add it as another standard for a specific case and add in a page! Thanks Gabe!
