USBN v1.0
Formal Specification

Authoritative reference for implementers. Normative statements use RFC 2119 terminology. For design rationale and historical notes, see the accompanying paper.

Overview

This document specifies the Universal Standard Book Number (USBN) v1.0, a deterministic, 13-character alphanumeric identifier for books derived from bibliographic metadata. USBNs require no central registry: any party with access to a book's title page can compute the identifier independently.

Identifier format

A USBN is a 13-character string matching the regular expression:

U[0-9A-HJKMNP-TV-Z]{12}

That is: the literal letter U followed by exactly 12 characters from the Crockford Base32 alphabet. A WSBN has the same data format but begins with W and is computed without the year.

USBNs and WSBNs are case-insensitive. A conformant parser MUST uppercase its input before validation or lookup. The canonical display form is uppercase, but uazja136wfyxf and UAZJA136WFYXF MUST resolve to the same identifier.
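The parsing rule above can be sketched in a few lines of Python. This is an illustrative sketch, not part of the specification: the function name parse_identifier is hypothetical, and the pattern is the one given above, widened on the assumption that a WSBN differs only in its W prefix.

```python
import re

# Pattern from the specification, widened to accept the W prefix of a WSBN
# (assumption: a WSBN body uses the same 12-character alphabet).
_IDENTIFIER_RE = re.compile(r"[UW][0-9A-HJKMNP-TV-Z]{12}")


def parse_identifier(text: str) -> str:
    """Return the canonical uppercase form of a USBN or WSBN.

    Raises ValueError if the input is not well formed.
    """
    candidate = text.upper()  # uppercase first, then validate
    if _IDENTIFIER_RE.fullmatch(candidate) is None:
        raise ValueError(f"not a valid USBN or WSBN: {text!r}")
    return candidate
```

With this sketch, parse_identifier("uazja136wfyxf") and parse_identifier("UAZJA136WFYXF") return the same string, while an input containing I, L, O, or U after the prefix is rejected.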

Input specification

A USBN is computed from three fields:

Field    Type     Source
TITLE    string   Title as printed on the title page
AUTHOR   string   Author as printed on the title page
YEAR     integer  Four-digit publication year

For a WSBN the YEAR field is omitted.

Canonical input string

The canonical input string for a USBN is formed as:

S = NORMALIZE(TITLE || " " || AUTHOR || " " || STR(YEAR))

For a WSBN:

S = NORMALIZE(TITLE || " " || AUTHOR)

Normalisation pipeline

An implementation MUST apply the following steps in order:

  1. Concatenation with a single ASCII space between fields.
  2. NFKD decomposition (Unicode normalisation form KD).
  3. Combining-character stripping: remove all characters in Unicode general category M (Mark).
  4. Uppercase using locale-independent Unicode case mapping.
  5. Whitespace collapse: replace each run of Unicode whitespace with a single ASCII space.
  6. Trim leading and trailing whitespace.
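The six steps can be sketched in Python with the standard unicodedata module. The function name canonical_input is hypothetical. Two assumptions are made here: STR(YEAR) is taken to be the plain decimal rendering of the integer, and "Unicode whitespace" is approximated by the characters Python's str.split() treats as whitespace, which may differ from the Unicode White_Space property for a few control characters.

```python
import unicodedata
from typing import Optional


def canonical_input(title: str, author: str, year: Optional[int] = None) -> str:
    """Build the canonical input string S; omit `year` for a WSBN."""
    fields = [title, author]
    if year is not None:
        fields.append(str(year))  # assumption: STR(YEAR) is plain decimal

    s = " ".join(fields)                      # 1. concatenate with single spaces
    s = unicodedata.normalize("NFKD", s)      # 2. NFKD decomposition
    s = "".join(                              # 3. strip general category M (Mn, Mc, Me)
        ch for ch in s if not unicodedata.category(ch).startswith("M")
    )
    s = s.upper()                             # 4. locale-independent uppercase
    s = " ".join(s.split())                   # 5 and 6. collapse whitespace runs, trim
    return s
```

For the Einstein entry in the test vectors, this sketch yields UBER DIE RELATIVITATSTHEORIE ALBERT EINSTEIN 1916: the umlauts decompose in step 2 and their combining diaereses are removed in step 3.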

Alphabet

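The alphabet is Crockford Base32: 32 single-case characters, the digits 0-9 plus 22 letters. Four letters are excluded: I, L, and O, which are easily confused with the digits 1 and 0 in handwriting and print, and U, which Crockford's design omits to avoid accidental obscenities: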

0 1 2 3 4 5 6 7 8 9 A B C D E F G H J K M N P Q R S T V W X Y Z

Hash function

BLAKE2s (RFC 7693) is used with an 8-byte (64-bit) digest, no key, no salt, no personalisation. The hash input is the UTF-8 encoding of the canonical input string.
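In Python, the standard hashlib module exposes BLAKE2s with a configurable output length. The sketch below assumes that "8-byte digest" means BLAKE2s parameterised with an 8-byte output length, rather than a 32-byte digest truncated to 8 bytes; the two give different results, so implementers should confirm the choice against the canonical test vectors. The function name usbn_digest is hypothetical.

```python
import hashlib


def usbn_digest(canonical: str) -> bytes:
    """Hash a canonical input string to 8 bytes with unkeyed BLAKE2s."""
    data = canonical.encode("utf-8")
    # digest_size=8 sets the BLAKE2s output length parameter (assumption:
    # this, not truncation of a longer digest, is what the spec intends).
    # key, salt and person are left at their empty defaults.
    return hashlib.blake2s(data, digest_size=8).digest()
```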

Encoding procedure

Given the 8-byte BLAKE2s digest H:

  1. Compute n64 = BE(H) (64-bit unsigned integer).
  2. Compute n = n64 >> 4 (the top 60 bits).
  3. Encode n in Crockford Base32, most-significant digit first.
  4. Left-pad with 0 to exactly 12 characters.
  5. Prepend U for a USBN or W for a WSBN.

Twelve Crockford Base32 characters encode exactly 32¹² = 2⁶⁰ values. Taking 60 bits of BLAKE2s output gives a lossless, padding-free encoding.
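The encoding steps can be sketched as follows. The function name encode_identifier is hypothetical. Emitting exactly twelve 5-bit digits makes the left-padding of step 4 implicit.

```python
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # Crockford Base32


def encode_identifier(digest: bytes, prefix: str = "U") -> str:
    """Encode an 8-byte digest as a 13-character USBN ("U") or WSBN ("W")."""
    if len(digest) != 8:
        raise ValueError("digest must be exactly 8 bytes")
    if prefix not in ("U", "W"):
        raise ValueError("prefix must be 'U' or 'W'")

    n = int.from_bytes(digest, "big") >> 4  # steps 1 and 2: keep the top 60 bits

    digits = []
    for _ in range(12):                     # steps 3 and 4: twelve 5-bit digits
        digits.append(ALPHABET[n & 0x1F])   # least-significant digit first
        n >>= 5
    return prefix + "".join(reversed(digits))  # step 5: prepend the prefix


print(encode_identifier(bytes(8)))             # U000000000000
print(encode_identifier(b"\xff" * 8, "W"))     # WZZZZZZZZZZZZ
```

The two printed values are the extremes of the 60-bit range: an all-zero digest maps to twelve 0 characters and an all-ones digest to twelve Z characters.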

Collision analysis

A 60-bit hash space contains 2⁶⁰ ≈ 1.15 × 10¹⁸ possible values. By the birthday bound, the probability of at least one collision reaches 50% at approximately 1.26 billion entries (about 1.18 × 2³⁰).

Corpus                    Size     P(≥1 collision)
Typical library           1 M      0.00004 %
Large union catalog       10 M     0.004 %
Pre-ISBN corpus           60 M     0.156 %
Estimated global corpus   150 M    0.97 %
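The figures above are consistent with the standard birthday approximation P ≈ 1 − exp(−n(n−1) / 2N), with N = 2⁶⁰. The following sketch recomputes them; the function names are hypothetical, and the approximation assumes hash outputs are uniformly distributed.

```python
import math

SPACE = 2 ** 60  # number of distinct 60-bit values


def collision_probability(n: int, space: int = SPACE) -> float:
    """Approximate P(at least one collision) among n uniformly random values."""
    # expm1 keeps precision when the exponent is very close to zero.
    return -math.expm1(-n * (n - 1) / (2 * space))


def entries_for_probability(p: float, space: int = SPACE) -> float:
    """Approximate corpus size at which the collision probability reaches p."""
    return math.sqrt(2 * space * math.log(1 / (1 - p)))


for size in (1_000_000, 10_000_000, 60_000_000, 150_000_000):
    print(f"{size:>11,d}  {100 * collision_probability(size):.5f} %")
print(f"50 % point: {entries_for_probability(0.5):.3e} entries")
```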

Canonical test vectors

Any conformant implementation MUST produce these identifiers for these inputs:

Title / Author / Year                                               USBN           WSBN
The Outline of History / H. G. Wells / 1949                         UAZJA136WFYXF  WC17225YANQAM
The Outline of History / H. G. Wells / 1961                         UQHJ8P28DXHRC  WC17225YANQAM
George Washington A Biography / Douglas Southall Freeman / 1949     UVKK6DS3YWESM  WGVKGH0WKR66C
College Calculus with Analytic Geometry / Murray H. Protter / 1964  UGM4Y9KZVGYH7  WDYNK8KP7FHSG
Über die Relativitätstheorie / Albert Einstein / 1916               URAYHF9EDXKGQ  W718QV0NXA405
The Elements of Style / William Strunk Jr. and E. B. White / 1959   U4TMJP8GE1DSF  W4NPQT7637D53
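An implementer might wire these vectors into a self-check along the following lines. This is a sketch only: check_vectors and its compute argument are hypothetical names, and compute stands for whatever function an implementation provides to map (title, author, year) to a (USBN, WSBN) pair.

```python
from typing import Callable, List, Tuple

# (title, author, year, expected USBN, expected WSBN), copied from the table above.
VECTORS = [
    ("The Outline of History", "H. G. Wells", 1949, "UAZJA136WFYXF", "WC17225YANQAM"),
    ("The Outline of History", "H. G. Wells", 1961, "UQHJ8P28DXHRC", "WC17225YANQAM"),
    ("George Washington A Biography", "Douglas Southall Freeman", 1949,
     "UVKK6DS3YWESM", "WGVKGH0WKR66C"),
    ("College Calculus with Analytic Geometry", "Murray H. Protter", 1964,
     "UGM4Y9KZVGYH7", "WDYNK8KP7FHSG"),
    ("Über die Relativitätstheorie", "Albert Einstein", 1916,
     "URAYHF9EDXKGQ", "W718QV0NXA405"),
    ("The Elements of Style", "William Strunk Jr. and E. B. White", 1959,
     "U4TMJP8GE1DSF", "W4NPQT7637D53"),
]


def check_vectors(compute: Callable[[str, str, int], Tuple[str, str]]) -> List[str]:
    """Run `compute` on every vector and return one message per mismatch."""
    failures = []
    for title, author, year, usbn, wsbn in VECTORS:
        got = compute(title, author, year)
        if got != (usbn, wsbn):
            failures.append(f"{title} ({year}): expected {(usbn, wsbn)}, got {got}")
    return failures
```

An empty list from check_vectors means the implementation under test reproduced all six rows.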

URI scheme

USBNs and WSBNs MAY be represented as URNs:

urn:usbn:UAZJA136WFYXF
urn:wsbn:WC17225YANQAM

Or as HTTPS URIs via the planned reference resolver:

https://openusbn.org/UAZJA136WFYXF
https://openusbn.org/WC17225YANQAM
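Choosing the URN namespace from the identifier prefix could look like the sketch below. The function name to_urn is hypothetical, and the check here is deliberately minimal (prefix and length only); full validation against the identifier format is assumed to happen elsewhere.

```python
def to_urn(identifier: str) -> str:
    """Render a USBN or WSBN as a URN, choosing the namespace from its prefix."""
    ident = identifier.upper()  # canonical display form is uppercase
    namespaces = {"U": "usbn", "W": "wsbn"}
    if len(ident) != 13 or ident[0] not in namespaces:
        raise ValueError(f"not a USBN or WSBN: {identifier!r}")
    return f"urn:{namespaces[ident[0]]}:{ident}"


print(to_urn("uazja136wfyxf"))  # urn:usbn:UAZJA136WFYXF
print(to_urn("WC17225YANQAM"))  # urn:wsbn:WC17225YANQAM
```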

Security considerations

USBN has no security requirements. The hash function is used for compactness and determinism, not for authentication or confidentiality. An attacker who can construct metadata producing a chosen USBN gains no advantage, as USBNs carry no authority or access rights.