The KIM Technology Watch Report: http://metadaten-twr.org


ARK (Archival Resource Key): a Persistent Identifier Solution

Author: John A. Kunze, University of California, jak@ucop.edu

Abstract
An Archival Resource Key (ARK) is a URL that enables persistent, long-term digital reference to information objects of any kind. The California Digital Library (CDL) uses the “noid” software to generate ARKs. An ARK consists of a sequence of characters (globally unique and immutable) that follows the label “ark:” and includes the naming organization, identified by a NAAN. The protocol and hostname of a URL may optionally precede it.

An ARK [1] is a URL created to support persistent, long-term access to information objects. ARKs can identify objects of any type: digital documents, databases, images, software, and websites, as well as physical objects (books, bones, statues, etc.) and intangible objects (chemicals, diseases, vocabulary terms, performances).

ARKs and other persistent identifiers are necessary and useful because both the protocols used to access objects (such as http and ftp) and the sites that host the objects are subject to change. An ARK contains parts that are impervious to such changes and parts that are flexible enough to support changing user service expectations around a stable object core.

An ARK is represented by a sequence of characters that contains the label “ark:”, optionally preceded by the protocol name (“http://”) and hostname that begin every URL. That first part of the URL, the “Name Mapping Authority” (NMA), is mutable and replaceable, as neither the web server itself nor the current web protocols are expected to last longer than the identified objects. It is possible to use an NMA hostname that is longer-lived than that of your own organization (as has been done effectively with hosts such as http://n2t.info/ and http://doi.org/).
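Because the NMA is replaceable, moving an ARK to a different host amounts to swapping whatever precedes the “ark:” label. The following is a minimal sketch of that idea; the function name `rehost` is hypothetical and is not part of noid or the ARK specification.

```python
def rehost(ark: str, new_nma: str) -> str:
    """Return the same ARK served from a different Name Mapping Authority.

    Everything from the "ark:" label onward is left untouched; only the
    replaceable prefix (protocol and hostname) is swapped.
    """
    start = ark.find("ark:")
    if start == -1:
        raise ValueError("no 'ark:' label found in %r" % ark)
    # Normalize the new NMA so exactly one slash precedes the label.
    return new_nma.rstrip("/") + "/" + ark[start:]


if __name__ == "__main__":
    print(rehost("http://example.org/ark:/13030/654xz321", "http://n2t.info/"))
```

The sketch assumes the string “ark:” first occurs at the label itself, which holds for the hostnames used in this article.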

The immutable, globally unique identifier follows the “ark:” label. This includes a “Name Assigning Authority Number” (NAAN) identifying the naming organization, followed by the name that it assigns to the object. Here is a diagrammed example:

                  http://example.org/ark:/13030/654xz321/s3/f8.05v.tiff
                  \________________/ \__/ \___/\_______/\_____________/
                    (replaceable)      |    |      |       Qualifier
                       |          ARK-Label |      |    (NMA-supported)
                       |                    |      |
         Name Mapping Authority (NMA)       |     Name (NAA-assigned)
                                            |
                           Name Assigning Authority Number (NAAN)
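The parts labeled in the diagram can be pulled apart with a short regular expression. This is an illustrative sketch, not the parsing logic of noid or any resolver: the names `ArkParts` and `parse_ark` are hypothetical, and the rule that the Name ends at the first “/” or “.” is an assumption made so that the diagrammed example splits as drawn.

```python
import re
from typing import NamedTuple


class ArkParts(NamedTuple):
    nma: str        # replaceable protocol + hostname; empty if absent
    naan: str       # Name Assigning Authority Number
    name: str       # name assigned by the NAA
    qualifier: str  # NMA-supported remainder, including its leading separator


_ARK_RE = re.compile(
    r"^(?P<nma>.*?)"        # optional NMA, matched lazily up to the label
    r"ark:/?"               # the ARK label
    r"(?P<naan>[^/]+)/"     # NAAN
    r"(?P<name>[^/.?]+)"    # Name (assumed to end at '/', '.' or '?')
    r"(?P<qualifier>.*)$"   # whatever follows the Name
)


def parse_ark(ark: str) -> ArkParts:
    match = _ARK_RE.match(ark)
    if match is None:
        raise ValueError("not an ARK: %r" % ark)
    return ArkParts(**match.groupdict())


if __name__ == "__main__":
    print(parse_ark("http://example.org/ark:/13030/654xz321/s3/f8.05v.tiff"))
```

Applied to the diagrammed example, the sketch separates the replaceable NMA from the NAAN 13030, the Name 654xz321, and the Qualifier that follows.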

The NAAN used above, 13030, represents the California Digital Library (CDL). A sampling of institutions registered for ARK assignment includes:

    12025    US National Library of Medicine
    13030    California Digital Library
    13960    Internet Archive
    27927    Portico/Ithaka Electronic-Archiving Initiative
    12148    National Library of France
    78319    Google
    64269    Digital Curation Centre

To generate ARKs, the California Digital Library (CDL) uses the open-source “noid” (nice opaque identifiers, rhymes with “employed”) software [2]. The noid software can also serve as an institution’s “identifier resolver”. Please send email to support-cdl-l@ucop.edu if you are interested in generating ARKs.

An ARK provides extra services above and beyond those of an ordinary URL. Instead of connecting to one thing, an ARK should connect to three things:

  • the object itself,
  • a brief metadata record if you append a single question mark to the ARK, and
  • a maintenance commitment from the current server when you append two question marks.
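The three connections listed above differ only in how many question marks follow the ARK. A minimal sketch of building the three request URLs is given here; the function name `ark_inflections` and the dictionary keys are hypothetical labels chosen for illustration.

```python
def ark_inflections(ark_url: str) -> dict:
    """Build the three URLs an ARK is expected to answer.

    Any question marks already at the end of the input are dropped first,
    so the function can be given an ARK in any of its three forms.
    """
    base = ark_url.rstrip("?")
    return {
        "object": base,             # the object itself
        "metadata": base + "?",     # brief metadata record
        "commitment": base + "??",  # maintenance commitment statement
    }


if __name__ == "__main__":
    for kind, url in ark_inflections(
        "http://ark.cdlib.org/ark:/13030/tf5p30086k"
    ).items():
        print(kind, url)
```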

In a web browser, for example, if you enter

    http://ark.cdlib.org/ark:/13030/tf5p30086k?

it returns a brief machine- and eye-readable metadata record, such as:

    erc:
    who:   (:unav) unavailable
    what:  Truckee River, below Truckee Station, looking towards Eastern
            Summit. -- Photographer's number: 222 -- Photographer's series:
            Central Pacific Railroad, California.
    when:  (:unav) unavailable
    where: http://ark.cdlib.org/ark:/13030/tf5p30086k
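Since the record is meant to be machine-readable, a few lines of code are enough to turn it into a dictionary. The sketch below handles only what is visible in the sample above: an “erc:” header line, “label: value” lines, and indented continuation lines. The function name `parse_erc` is hypothetical, and the sketch assumes labels start in the first column; it is not a full parser for the record format.

```python
from typing import Dict, Optional


def parse_erc(text: str) -> Dict[str, str]:
    """Parse a brief 'erc:' record into a label -> value dictionary."""
    record: Dict[str, str] = {}
    current: Optional[str] = None
    for raw in text.splitlines():
        if not raw.strip():
            continue
        if raw[0].isspace():
            # Indented line: continuation of the previous element's value.
            if current is not None:
                record[current] += " " + raw.strip()
            continue
        label, _, value = raw.partition(":")
        label = label.strip()
        if label == "erc" and not value.strip():
            # Header line that opens the record; it carries no value.
            current = None
            continue
        current = label
        record[current] = value.strip()
    return record


SAMPLE = """erc:
who:   (:unav) unavailable
what:  Truckee River, below Truckee Station, looking towards Eastern
        Summit. -- Photographer's number: 222 -- Photographer's series:
        Central Pacific Railroad, California.
when:  (:unav) unavailable
where: http://ark.cdlib.org/ark:/13030/tf5p30086k
"""

if __name__ == "__main__":
    for label, value in parse_erc(SAMPLE).items():
        print(label, "=>", value)
```

Splitting each line on the first colon only keeps values such as the “where” URL intact, and using indentation to detect continuation lines keeps colons inside a wrapped value from being mistaken for new labels.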

It is a side-benefit of ARKs that an object’s metadata doesn’t need an identifier different from that for the object, which cuts in half the number of identifiers that need to be generated and managed.

References

  1. The complete ARK specification: http://www.cdlib.org/inside/diglib/ark/arkspec.pdf
  2. The noid software documentation: http://www.cdlib.org/inside/diglib/ark/noid.pdf


