Citable Documents Specification Feedback

Page history last edited by (account deleted) 14 years ago

This page is for providing feedback on the Citable Documents Specification.

Please group feedback by respective section in the specification, then use a simple list under that section. Please date-stamp and sign your feedback.

4.0 CitableDocumentLocator Syntax

2010-04-08 from Mark. Consider removing much of the semantic detail from the identifier (or at least don't require it). For instance, a CitableDocumentLocator could take any of these forms:

<resolver-host>/<identifier> : for linking to the latest version of the document.

<resolver-host>/<identifier>#fragment : for linking to a specific section.

<resolver-host>/<identifier>?version=<versionid> : for linking to a specific version.

Where <resolver-host> is the service that's keeping track of the document versions. It could even create the identifiers.
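The three forms above can be sketched as a small parser. This is only an illustration of Mark's proposal, not part of the specification: the `Locator` type, the `parse_locator` function, and the `resolver.example.org` host are hypothetical, and the sketch assumes the locator is written as a full URL with a scheme.

```python
from typing import NamedTuple, Optional
from urllib.parse import parse_qs, urlsplit


class Locator(NamedTuple):
    resolver_host: str
    identifier: str
    version: Optional[str]   # None means "latest version"
    fragment: Optional[str]  # None means "whole document"


def parse_locator(url: str) -> Locator:
    """Split <resolver-host>/<identifier>[?version=<versionid>][#fragment]."""
    parts = urlsplit(url)
    identifier = parts.path.lstrip("/")
    if not parts.netloc or not identifier:
        raise ValueError(f"not a resolver-host/identifier locator: {url!r}")
    # parse_qs maps each query key to a list of values; take the first one.
    version = parse_qs(parts.query).get("version", [None])[0]
    return Locator(parts.netloc, identifier, version, parts.fragment or None)


latest = parse_locator("http://resolver.example.org/doc123")
pinned = parse_locator("http://resolver.example.org/doc123?version=7#sec-2")
```

Here `latest.version` is `None`, which a resolver could treat as "serve the newest version", while `pinned` carries both a version id and a section fragment.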

A great deal of work has been done on persistent identifiers that can help here. Look at crossref.org and their use of the Handle System (http://www.cnri.reston.va.us/).

Giving a document a permanent name and using that name whenever you want to cite it is very useful. You don't need to remember what date a particular version was released in order to remember the URL.

Having the extra layer of indirection is also good because you can delegate the management of the URI space and resolution services (that's what the Handle System mentioned above does).

2009-219 from Tantek. Regarding <datetimestamp>, a few points:

Please use YYYYMMDDTHHMMSS instead of YYYYMMDDHHMMSS in order to at least create a valid ISO8601 datetime (with "T" separator between the date and time). This is a minor point of syntax, so it is better to err on the side of conforming to an existing standard.
Allow coarser granularity for <datetimestamp>. Most systems will not need seconds (or even minutes) granularity, so also allow the following:

YYYYMMDDTHHMM
YYYYMMDDTHH
YYYYMMDD
YYYYMM
YYYY

Allow hyphenation of dates. From experience with readability in microformats, it may be desirable to use hyphens and thus also permit:

YYYY-MM
YYYY-MM-DD
YYYY-DDD (ordinal ISO8601 date)

Permit the <datetimestamp> to be a suffix rather than a path component, i.e. allow both <documentname>-<datetimestamp> and <documentname>/<datetimestamp>. The reasons for this are (1) it may be easier for some systems to simply put all versions of a document in the same folder, and (2) there are pre-existing conventions for doing so (e.g. W3C Pubrules, which uses "shortname-YYYYMMDD"; see details/links on the PastWork page).
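The <datetimestamp> shapes listed above can be recognized with three regular expressions. This is a minimal sketch: the `classify` function and its labels are hypothetical, it checks only the shape of the string (not whether the month or day is in range), and it covers only the forms Tantek lists, so a hyphenated date combined with a time is not accepted.

```python
import re
from typing import Optional

# Unhyphenated forms, from YYYY up to YYYYMMDDTHHMMSS; the "T" is mandatory
# as soon as a time part is present.
BASIC = re.compile(r"^\d{4}(\d{2}(\d{2}(T\d{2}(\d{2}(\d{2})?)?)?)?)?$")
# Hyphenated calendar dates: YYYY-MM and YYYY-MM-DD.
HYPHENATED = re.compile(r"^\d{4}-\d{2}(-\d{2})?$")
# Ordinal date: YYYY-DDD.
ORDINAL = re.compile(r"^\d{4}-\d{3}$")


def classify(stamp: str) -> Optional[str]:
    """Return which of the suggested forms a stamp matches, or None."""
    if ORDINAL.match(stamp):
        return "ordinal"
    if HYPHENATED.match(stamp):
        return "hyphenated"
    if BASIC.match(stamp):
        return "basic"
    return None
```

Under these patterns the old YYYYMMDDHHMMSS form (no "T") is rejected, while every coarser truncation in the lists above is accepted.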

2009-08-08 from Kevin D. Keck, to amplify and extend Tantek's points:

I agree about allowing omission of time (providing a simple date), and conforming to ISO8601.
I would recommend using specifically the ISO8601 profiles specified in XML Schema (and also used in RDF, by reference): xsd:dateTime and xsd:date. Both require hyphens within the date, and xsd:dateTime also requires colons within the time part. These profiles use more characters, but are much more human-readable. They do not support the ordinal date (YYYY-DDD) format, and I feel this is a virtue as it also is not very human readable.
The standard really ought to specify whether timestamps are to be understood as local (according to the time zone of the host machine) or as UTC. If they are to be understood as UTC (as are timestamps in, e.g., HTTP headers), then the time should be required to end with a 'Z', for compatibility with XML and RDF.
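As an illustration of the xsd:date and xsd:dateTime suggestion, the sketch below parses a stamp into a Python value and treats the trailing 'Z' as mandatory on timestamps. The `parse_stamp` name is hypothetical, and the patterns cover only a subset of what XML Schema permits (no fractional seconds and no numeric time zone offsets).

```python
import re
from datetime import date, datetime, timezone
from typing import Union

XSD_DATETIME_UTC = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z$")
XSD_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def parse_stamp(stamp: str) -> Union[date, datetime]:
    """Parse a hyphenated date, or a UTC dateTime that ends in 'Z'."""
    if XSD_DATETIME_UTC.match(stamp):
        naive = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
        # The 'Z' suffix means UTC, so attach that zone explicitly.
        return naive.replace(tzinfo=timezone.utc)
    if XSD_DATE.match(stamp):
        return date.fromisoformat(stamp)
    raise ValueError(f"unsupported datetimestamp: {stamp!r}")
```

A timestamp without the 'Z' raises `ValueError` here, which is one way to make the local-versus-UTC question impossible to leave ambiguous.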

We also need to discuss additional formats:

PDF

CSV and Excel spreadsheets

Websites

RSS feed

Video

Comments (4)

Thanks for the comments guys! I will revise as follows:
1. I will allow omission of time
2. I will change the format to require dashes to separate fields in the date and colons in the time
3. I will permit either dash or slash to separate documentname and datetimestamp
4. I will use a T to separate date from time. QUESTION to commenters - should "T" be separated from informational fields by dashes on both sides?
5. I will support coarser time
6. I will NOT support ordinal date; the rationale here is that if the same document is annotated two ways (once in one archive and once in another) it will not be easy to tell that the documents are the same, as their datetimestamps will be lexically different.
7. The specification already says that datetimestamps are UTC; I will revise to require the field to end with Z to indicate this.
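A rough sketch of how the revised rules in items 1 to 7 might be checked, under assumptions that the list leaves open: the "T" is not surrounded by dashes (the question in item 4 is unresolved), the trailing "Z" is required only when a time part is present, and "coarser time" means hours-only or hours and minutes are allowed. The names `STAMP`, `LOCATOR_TAIL`, and `split_name_and_stamp` are hypothetical.

```python
import re
from datetime import date, datetime
from typing import Optional, Tuple

# Assumed grammar: YYYY[-MM[-DD[Thh[:mm[:ss]]Z]]]
STAMP = r"\d{4}(?:-\d{2}(?:-\d{2}(?:T\d{2}(?::\d{2}(?::\d{2})?)?Z)?)?)?"
# Item 3: either a dash or a slash separates documentname and datetimestamp.
LOCATOR_TAIL = re.compile(rf"^(?P<name>.+?)(?P<sep>[-/])(?P<stamp>{STAMP})$")


def split_name_and_stamp(tail: str) -> Optional[Tuple[str, str]]:
    """Split '<documentname>-<stamp>' or '<documentname>/<stamp>'."""
    m = LOCATOR_TAIL.match(tail)
    return (m.group("name"), m.group("stamp")) if m else None


# Item 6: the same day written as an ordinal date and as a calendar date is
# lexically different, even though both name the same day.
same_day = datetime.strptime("2009-219", "%Y-%j").date() == date(2009, 8, 7)
```

Because the date itself now contains dashes, a dash separator is ambiguous when a document name ends in digits (for example a name ending in "-2008" followed by a year-only stamp), which may be an argument for preferring the slash form.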

In my experience, the hard part is not creating URLs for documents that already have citation formats (though that is valuable); the hard part is creating citation formats for content whose citations include page numbers.

In general, citations (more broadly speaking than URIs) should have the following properties:
* vendor (i.e. publisher)-neutral
* medium-neutral (I shouldn't have to have a specific publisher's book to find the content, and I should also be able to find the content in an old-fashioned library if I am offline)
* public domain (nobody should own the citations; this has been a huge issue, believe it or not, in legal publishing)

A URI (URL) scheme seems like it could address many of these, but in print, how would this scheme be used? Would things like Bluebook citations be replaced by URLs? Would I then have URLs sprinkled all over govt documents (in print)? (That is not out of the question, btw; it's just that it's a culture shock.)

I guess my point is that while we are happy to see stuff moving online with law.gov, let's not forget that there is a huuuuuge world of dead-tree users out there and there are valid reasons to support their use cases as well.

I think if something was first distributed as a book, it may be easier to create a URL with something similar to a book annotation and hide the internal references. But UGH, what a difficult site that would be to read! I think most of these are posted as PDFs, so maybe that part should be included on the PDF page?

But for a website, I see no reason to make it more complex than pagename, datetimestamp, and paragraph.

Tantek: I believe I've now addressed your comments.

Gabe: Page numbers are formatting artifacts rather than structural elements of a document. My objective here has been to create a format which allows a document's structural sections to be referenced directly, without recourse to metadata such as page numbers (which vary according to page size, etc.). This format is intended to be medium-neutral in the sense that the display medium doesn't modify the document's underlying structure, so references of the type we define should be preserved even when the display medium changes. Documents which are natively Citable in compliance with this specification don't require index metadata, and so don't create a requirement or opportunity for proprietary third-party metadata indexes like those currently in use in legal publishing. Citations in print should be to chapter and verse (in other words, "Section 1, Subsection a"), and the chapter and verse headings should be printed as part of the document itself. There could be an automatic translation from this format to a canonical Citable Document Subdivision Locator format for particular reference structures, but it's not clear that what's suitable for dead-tree reference translates automatically to electronic reference or vice versa.