ArchiveServers

Page history last edited by Mike Gifford 14 years ago

 

Citability Archive Server

 

The archive server will run on a backbone of Bazaar, a distributed, file-centric version control tool with an excellent Python API and good cross-platform support. Using Bazaar provides many facilities to quickly develop an effective and transparent system:

 

  • As a file-based version control tool, it inherently handles archiving changing documents over time.
  • As a distributed tool, it allows anonymous replica creation ("branching") with efficient updates.
  • Digital signing (and verification) of revisions is built in, which makes the system tamper-evident and repairable when multiple, mutually distrusting parties replicate the data. Forged revisions would fail the signature check. If the source revision history were tampered with, the "pull" process on a replica would refuse to destroy its existing data (unless forced).
  • Bazaar efficiently and correctly records document moves and deletions.
  • As a file-based version control tool, it has good support for versioned binary data, like images.
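
The replication behavior described in the list above can be sketched as the sequence of bzr commands a replica might run. This is only an illustration: the function names, the example URL, and the directory are hypothetical, and whether `bzr verify-signatures` is available depends on the installed Bazaar version and a working GnuPG setup.

```python
import subprocess


def replica_commands(source_url, replica_dir, first_time):
    """Return (argv, cwd) pairs a replica might run, in order.

    Hypothetical helper for illustration; not part of Bazaar itself.
    """
    if first_time:
        # "bzr branch" creates the replica from the published branch.
        commands = [(["bzr", "branch", source_url, replica_dir], None)]
    else:
        # A plain "bzr pull" (without --overwrite) stops if the source
        # history has diverged from what the replica already holds.
        commands = [(["bzr", "pull", source_url], replica_dir)]
    # Assumption: signatures are checked after every update.
    commands.append((["bzr", "verify-signatures"], replica_dir))
    return commands


def run_all(commands):
    """Run each command and raise if any of them fails."""
    for argv, cwd in commands:
        subprocess.run(argv, cwd=cwd, check=True)
```

Calling `run_all(replica_commands("bzr://archive.example.org/site", "site-replica", first_time=True))` would create a replica, assuming bzr is installed and the placeholder URL is replaced with a real branch.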

 

It is also a scalable design:

 

  • Large sites can either shard archives into multiple branches (making cloning somewhat harder) or use multi-level stacked branches.
  • Bazaar uses a highly efficient "group compress" approach to storing revisions. Minor document changes take minimal space.
  • A system like Gearman can queue archive updates and collapse identical queued update requests.
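
The idea of collapsing identical queued update requests can be illustrated with a small in-memory stand-in. This is not Gearman's API; the `CollapsingQueue` class and its method names are hypothetical and only show the intended behavior.

```python
from collections import OrderedDict


class CollapsingQueue:
    """FIFO queue that keeps at most one pending entry per URL."""

    def __init__(self):
        # OrderedDict preserves insertion order and gives O(1) membership checks.
        self._pending = OrderedDict()

    def enqueue(self, url):
        """Queue a URL; return False if an identical request is already pending."""
        if url in self._pending:
            return False
        self._pending[url] = True
        return True

    def dequeue(self):
        """Return the oldest pending URL, or None if the queue is empty."""
        if not self._pending:
            return None
        url, _ = self._pending.popitem(last=False)
        return url

    def __len__(self):
        return len(self._pending)
```

Once a URL has been dequeued for processing, a later request for the same URL is accepted again, so updates that arrive after archiving has started are not lost.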

 

An archive server will support at least three public interfaces:

 

  • A Bazaar branch (via bzr://) for anonymous replication.
  • A wrapper for Bazaar that can display the content for a Citability-spec URL, which includes a timestamp and a path. David Strauss wrote a prototype of this in about 10 minutes. It just:
    • Runs "bzr cat --revision=date:[timestamp] [path]" (or its Python API equivalent) for the requested timestamp/path combination
    • Adds appropriate anchor tags to allow citing a specific section, paragraph, or equivalent
  • A ping URL that accepts the URL to archive along with the known hash of its raw content. This lets the Citability server queue archival operations while quickly weeding out most redundant requests. As mentioned above, archival requests may queue through a system like Gearman to absorb periods of heavy traffic. When there's a new revision to archive:
    • The content is downloaded and hashed.
    • The hash is cached so repeated requests to archive the same content get ignored.
    • The content is written to the Bazaar branch and committed.
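
The ping-handling steps above can be sketched as follows. Several details are assumptions made for illustration: the page does not name a hash algorithm (SHA-256 is used here), and `ArchivePinger`, `fetch`, and `commit` are hypothetical names. The `commit` callable stands in for writing to the Bazaar branch and committing.

```python
import hashlib


class ArchivePinger:
    """Skips redundant archive requests by caching the last hash per URL."""

    def __init__(self, fetch, commit):
        self._fetch = fetch    # callable: url -> bytes
        self._commit = commit  # callable: (url, content) -> None
        self._seen = {}        # url -> hash of the last archived content

    def handle_ping(self, url, known_hash):
        # Cheap check first: the caller's hash matches what is already archived.
        if self._seen.get(url) == known_hash:
            return "ignored"
        # Download and hash the content ourselves rather than trusting the caller.
        content = self._fetch(url)
        digest = hashlib.sha256(content).hexdigest()
        if self._seen.get(url) == digest:
            return "ignored"
        # Cache the hash, then write and commit the new revision.
        self._seen[url] = digest
        self._commit(url, content)
        return "archived"
```

In a real deployment the cache would likely need to be shared across workers and persisted; a plain dictionary is used here only to keep the sketch self-contained.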

 

Description: Server-hosted ability to archive documents, process (or create) anchor tags, and show documents using the URL specification.
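
A minimal sketch of the display wrapper, assuming plain-text content split into paragraphs on blank lines. It is not the prototype mentioned above; the function names and the `p1`, `p2`, ... anchor scheme are hypothetical. The only Bazaar call is the `bzr cat --revision=date:...` command already described on this page.

```python
import html
import subprocess


def bzr_cat_command(timestamp, path):
    """Build the bzr command that prints a file as of the given date."""
    return ["bzr", "cat", "--revision=date:" + timestamp, path]


def add_paragraph_anchors(text):
    """Wrap each blank-line-separated paragraph in an anchored <p> element."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    return "\n".join(
        '<p id="p{0}"><a href="#p{0}">#</a> {1}</p>'.format(i, html.escape(p))
        for i, p in enumerate(paragraphs, 1)
    )


def show(branch_dir, timestamp, path):
    """Fetch the archived file from the branch and return anchored HTML."""
    result = subprocess.run(
        bzr_cat_command(timestamp, path),
        cwd=branch_dir, check=True, capture_output=True,
    )
    # Assumption: archived documents are UTF-8 text.
    return add_paragraph_anchors(result.stdout.decode("utf-8"))
```

Archived HTML documents would need a real parser to place anchors on existing sections; the paragraph split here is only the simplest case.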

 

Project Lead: David Strauss

 

Project Team

Name | Role(s) | E-mail | Webpage | IM contact info | Comments
David Strauss | Spec wrangler | david | http://fourkitchens.com/bios/davidstrauss | |

 

Links:

Where is it hosted? Launchpad: https://launchpad.net/citability

Test environment? None yet.

 

Documents:

Listed on Launchpad:

https://blueprints.launchpad.net/citability

 

Binaries:

Hosted on Launchpad:

https://launchpad.net/citability/+download

 
