USBN v1.0
Authoritative reference for implementers. Normative statements use RFC 2119 terminology. For design rationale and historical notes, see the accompanying paper.
Overview
This document specifies the Universal Standard Book Number (USBN) v1.0, a deterministic, 13-character alphanumeric identifier for books, derived from bibliographic metadata. USBNs require no central registry: any party with access to a book's title page can compute one independently.
Identifier format
A USBN is a 13-character string matching the regular expression:
U[0-9A-HJKMNP-TV-Z]{12}
That is: the literal letter U followed by exactly 12
characters from the Crockford Base32 alphabet. A WSBN has the
same data format but begins with W and is computed
without the year.
USBNs and WSBNs are case-insensitive. A
conformant parser MUST uppercase its input before validation or
lookup. The canonical display form is uppercase, but
uazja136wfyxf and UAZJA136WFYXF MUST
resolve to the same identifier.
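The parsing rule above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `parse_identifier` is hypothetical, and widening the published pattern to accept the `W` prefix is an assumption based on the statement that a WSBN shares the same format.

```python
import re

# Pattern from this section, widened (an assumption) to accept the W prefix
# so that one helper covers both USBNs and WSBNs.
_IDENTIFIER_RE = re.compile(r"[UW][0-9A-HJKMNP-TV-Z]{12}")


def parse_identifier(text: str) -> str:
    """Return the canonical uppercase form of a USBN/WSBN, or raise ValueError."""
    candidate = text.upper()  # uppercase first, then validate
    if not _IDENTIFIER_RE.fullmatch(candidate):
        raise ValueError(f"not a valid USBN or WSBN: {text!r}")
    return candidate
```

With this sketch, `parse_identifier("uazja136wfyxf")` and `parse_identifier("UAZJA136WFYXF")` return the same string, so either spelling can serve as a lookup key.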
Input specification
A USBN is computed from three fields:
| Field | Type | Source |
|---|---|---|
| TITLE | string | Title as printed on the title page |
| AUTHOR | string | Author as printed on the title page |
| YEAR | integer | Four-digit publication year |
For a WSBN the YEAR field is omitted.
Canonical input string
The canonical input string for a USBN is formed as:
S = NORMALIZE(TITLE || " " || AUTHOR || " " || STR(YEAR))
For a WSBN:
S = NORMALIZE(TITLE || " " || AUTHOR)
Normalisation pipeline
An implementation MUST apply the following steps in order:
- Concatenation with a single ASCII space between fields.
- NFKD decomposition (Unicode normalisation form KD).
- Combining-character stripping: remove all characters in Unicode general category M (Mark).
- Uppercase using locale-independent Unicode case mapping.
- Whitespace collapse: replace each run of Unicode whitespace with a single ASCII space.
- Trim leading and trailing whitespace.
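The six steps above map closely onto Python's standard library. The sketch below is one possible reading, with two assumptions worth flagging: the function name `normalize` and its signature are hypothetical, and Python's definition of whitespace in `str.split()` is taken to be an acceptable stand-in for "Unicode whitespace" (the two differ on a few control characters).

```python
import unicodedata
from typing import Optional


def normalize(title: str, author: str, year: Optional[int] = None) -> str:
    """Build the canonical input string; omit `year` for a WSBN."""
    fields = [title, author] if year is None else [title, author, str(year)]
    s = " ".join(fields)                      # 1. concatenate with single spaces
    s = unicodedata.normalize("NFKD", s)      # 2. NFKD decomposition
    s = "".join(                              # 3. drop general category M (Mn, Mc, Me)
        ch for ch in s if not unicodedata.category(ch).startswith("M")
    )
    s = s.upper()                             # 4. locale-independent uppercase
    # 5 and 6. str.split() with no argument splits on whitespace runs and
    # discards leading/trailing whitespace, so joining collapses and trims.
    return " ".join(s.split())
```

For example, `normalize("Über die Relativitätstheorie", "Albert Einstein", 1916)` yields `UBER DIE RELATIVITATSTHEORIE ALBERT EINSTEIN 1916`: the umlauts decompose into a base letter plus a combining diaeresis, and the combining mark is then stripped.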
Alphabet
The alphabet is Crockford Base32: 32 characters, single case, excluding the four letters I, L, O, and U. I, L, and O are easily confused with the digits 1 and 0 in handwriting and print; Crockford's encoding excludes U to avoid accidental obscenities.
0 1 2 3 4 5 6 7 8 9 A B C D E F G H J K M N P Q R S T V W X Y Z
Hash function
BLAKE2s (RFC 7693) is used with an 8-byte (64-bit) digest, no key, no salt, no personalisation. The hash input is the UTF-8 encoding of the canonical input string.
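In Python, `hashlib.blake2s` accepts the digest length as a parameter, which gives a short sketch of this step. The helper name `usbn_digest` is hypothetical. The sketch assumes "8-byte digest" means BLAKE2s parameterised with an output length of 8, which is not the same as truncating the default 32-byte digest, because the output length is mixed into the hash state.

```python
import hashlib


def usbn_digest(canonical: str) -> bytes:
    """Hash the canonical input string to an 8-byte BLAKE2s digest."""
    data = canonical.encode("utf-8")
    # No key, salt, or personalisation: hashlib's defaults are all empty.
    return hashlib.blake2s(data, digest_size=8).digest()
```

An implementation that computes a 32-byte BLAKE2s digest and keeps the first 8 bytes will produce different identifiers from this sketch.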
Encoding procedure
Given the 8-byte BLAKE2s digest H:
- Compute n64 = BE(H) (64-bit unsigned integer).
- Compute n = n64 >> 4 (the top 60 bits).
- Encode n in Crockford Base32, most-significant digit first.
- Left-pad with 0 to exactly 12 characters.
- Prepend U for a USBN or W for a WSBN.
Twelve Crockford Base32 characters encode exactly
32¹² = 2⁶⁰ values, so the 60 retained bits of BLAKE2s output
fill the 12 characters with no unused bits. The left-padding
step only restores leading zero digits.
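The encoding steps above can be sketched as a single function. The names `ALPHABET` and `encode_identifier` are hypothetical. Emitting exactly 12 five-bit digits makes the left-padding step implicit.

```python
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # Crockford Base32, 32 symbols


def encode_identifier(digest: bytes, prefix: str = "U") -> str:
    """Encode an 8-byte digest as a 13-character USBN ('U') or WSBN ('W')."""
    if len(digest) != 8:
        raise ValueError("digest must be exactly 8 bytes")
    if prefix not in ("U", "W"):
        raise ValueError("prefix must be 'U' or 'W'")
    n64 = int.from_bytes(digest, "big")  # BE(H) as a 64-bit unsigned integer
    n = n64 >> 4                         # keep the top 60 bits
    digits = []
    for _ in range(12):                  # always 12 digits, so zeros pad on the left
        digits.append(ALPHABET[n & 31])  # least-significant 5 bits first
        n >>= 5
    return prefix + "".join(reversed(digits))
```

An all-zero digest encodes to `U000000000000` and an all-ones digest to `UZZZZZZZZZZZZ`, the two ends of the 60-bit range.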
Collision analysis
A 60-bit hash space contains 2⁶⁰ ≈ 1.15 × 10¹⁸
possible values. 50% birthday collision probability occurs at
approximately 1.26 billion entries.
| Corpus | Size | P(≥1 collision) |
|---|---|---|
| Typical library | 1 M | 0.00004 % |
| Large union catalog | 10 M | 0.004 % |
| Pre-ISBN corpus | 60 M | 0.156 % |
| Estimated global corpus | 150 M | 0.97 % |
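The figures above are consistent with the standard birthday approximation P ≈ 1 − exp(−n(n−1) / 2N) with N = 2⁶⁰. The following sketch recomputes them; the function names are hypothetical, and the approximation is an assumption about how the table was derived.

```python
import math

SPACE = 2 ** 60  # number of distinct 60-bit values


def collision_probability(n: int, space: int = SPACE) -> float:
    """Approximate probability of at least one collision among n random values."""
    # expm1 keeps precision when the exponent is very close to zero.
    return -math.expm1(-n * (n - 1) / (2 * space))


def entries_for_probability(p: float, space: int = SPACE) -> float:
    """Approximate corpus size at which the collision probability reaches p."""
    return math.sqrt(2 * space * math.log(1 / (1 - p)))


if __name__ == "__main__":
    for label, size in [("Typical library", 10**6), ("Large union catalog", 10**7),
                        ("Pre-ISBN corpus", 6 * 10**7), ("Estimated global corpus", 15 * 10**7)]:
        print(f"{label:26s} {collision_probability(size) * 100:.5f} %")
    print(f"50% point: {entries_for_probability(0.5):.3e} entries")
```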
Canonical test vectors
Any conformant implementation MUST produce these identifiers for these inputs:
| Title / Author / Year | USBN | WSBN |
|---|---|---|
| The Outline of History / H. G. Wells / 1949 | UAZJA136WFYXF | WC17225YANQAM |
| The Outline of History / H. G. Wells / 1961 | UQHJ8P28DXHRC | WC17225YANQAM |
| George Washington A Biography / Douglas Southall Freeman / 1949 | UVKK6DS3YWESM | WGVKGH0WKR66C |
| College Calculus with Analytic Geometry / Murray H. Protter / 1964 | UGM4Y9KZVGYH7 | WDYNK8KP7FHSG |
| Über die Relativitätstheorie / Albert Einstein / 1916 | URAYHF9EDXKGQ | W718QV0NXA405 |
| The Elements of Style / William Strunk Jr. and E. B. White / 1959 | U4TMJP8GE1DSF | W4NPQT7637D53 |
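A conformance check against this table might look like the sketch below, which chains the normalisation, hashing, and encoding steps and reports any rows that differ. The names `compute_identifier` and `mismatches` are hypothetical, and the sketch assumes the title and author strings are entered exactly as printed in the table, punctuation included. It has not been verified against the reference implementation, so a non-empty result may indicate a difference in interpretation rather than an error in the vectors.

```python
import hashlib
import unicodedata
from typing import List, Optional, Tuple

ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def compute_identifier(title: str, author: str, year: Optional[int] = None) -> str:
    """Return a USBN when `year` is given, otherwise a WSBN."""
    fields = [title, author] if year is None else [title, author, str(year)]
    s = unicodedata.normalize("NFKD", " ".join(fields))
    s = "".join(ch for ch in s if not unicodedata.category(ch).startswith("M"))
    s = " ".join(s.upper().split())
    digest = hashlib.blake2s(s.encode("utf-8"), digest_size=8).digest()
    n = int.from_bytes(digest, "big") >> 4  # top 60 bits
    # Twelve 5-bit digits, most significant first.
    body = "".join(ALPHABET[(n >> shift) & 31] for shift in range(55, -1, -5))
    return ("W" if year is None else "U") + body


# (title, author, year, expected USBN, expected WSBN), copied from the table.
VECTORS = [
    ("The Outline of History", "H. G. Wells", 1949, "UAZJA136WFYXF", "WC17225YANQAM"),
    ("The Outline of History", "H. G. Wells", 1961, "UQHJ8P28DXHRC", "WC17225YANQAM"),
    ("George Washington A Biography", "Douglas Southall Freeman", 1949,
     "UVKK6DS3YWESM", "WGVKGH0WKR66C"),
    ("College Calculus with Analytic Geometry", "Murray H. Protter", 1964,
     "UGM4Y9KZVGYH7", "WDYNK8KP7FHSG"),
    ("Über die Relativitätstheorie", "Albert Einstein", 1916,
     "URAYHF9EDXKGQ", "W718QV0NXA405"),
    ("The Elements of Style", "William Strunk Jr. and E. B. White", 1959,
     "U4TMJP8GE1DSF", "W4NPQT7637D53"),
]


def mismatches() -> List[Tuple[str, int, str, str]]:
    """Return (title, year, computed USBN, computed WSBN) for rows that differ."""
    bad = []
    for title, author, year, usbn, wsbn in VECTORS:
        got = (compute_identifier(title, author, year), compute_identifier(title, author))
        if got != (usbn, wsbn):
            bad.append((title, year, got[0], got[1]))
    return bad
```

Because the WSBN omits the year, the two editions of The Outline of History share one WSBN while their USBNs differ, which is the behaviour the first two rows of the table illustrate.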
URI scheme
USBNs and WSBNs MAY be represented as URNs:
urn:usbn:UAZJA136WFYXF
urn:wsbn:WC17225YANQAM
Or as HTTP URIs via the reference resolver (planned):
https://openusbn.org/UAZJA136WFYXF
https://openusbn.org/WC17225YANQAM
Security considerations
USBN has no security requirements. The hash function is used for compactness and determinism, not for authentication or confidentiality. An attacker who can construct metadata producing a chosen USBN gains no advantage, as USBNs carry no authority or access rights.