Learn — what is a USBN?

Nearly every book published commercially since 1970 has an ISBN; books published before 1970 do not. That sentence contains the whole motivation for USBN and, if you care about books, it's the kind of sentence that bothers you a little more every time you read it.

The International Standard Book Number was standardised as ISO 2108 in 1970 to solve a real problem: publishers, booksellers, and libraries needed a universal way to refer to books. Before then, every participant in the book trade had its own naming conventions (publisher imprints, Library of Congress catalogue cards, ad hoc bookseller codes), and none of them talked to each other. ISBN fixed this by assigning, through a registry, a unique number to every new edition: ten digits originally, thirteen since 2007. The scheme worked. For the last fifty-odd years, "look it up by ISBN" has been the default for anything touching the book trade.

But ISBN has two structural limitations that matter more than people realise. First, it is assigned, not derived — a publisher has to register with a national agency, pay for a block of identifiers, and stamp one on each new book. Self-publishers, institutional presses, zines, pamphlets, and anything off the commercial rail all get skipped. Second, and more importantly, ISBN did not exist before 1970. The estimated sixty million editions published before that date — the whole body of printed human culture up to the moon landing — have no ISBN. None. They never will.

The registry trap

There are three partial solutions that tried to plug this gap. OCLC Control Numbers (OCNs) are sequential integers assigned by the OCLC union catalog; over a billion of them have been issued. Library of Congress Control Numbers (LCCNs) have existed since the 1890s, but only for Library of Congress holdings. Open Library Identifiers (OLIDs), from the Internet Archive, cover a growing subset of books in the Open Library project. All three are useful. All three share the same structural limitation: a book must first be catalogued by a member institution before it has an identifier.

A used-book dealer in Detroit holding a 1923 volume that no institution has previously catalogued has, under these systems, no standard identifier to point to at all. She can create an internal SKU, she can describe it in prose, she can scan the title page — but there is no universal number she can write down that another dealer across the country would recognise, without either of them asking a central authority for permission.

Compute it from the book itself

The premise of USBN is blunt: if the goal is a universal identifier, and the problem with existing schemes is the registry, remove the registry. Compute the identifier directly from the book's title page using a fixed algorithm, so that anyone who can read the title page can produce the same identifier without asking anyone else's permission.

This technique is borrowed, deliberately, from cryptocurrency. Bitcoin addresses are not assigned by a central bank; they are computed — deterministically — from public keys. IPFS content identifiers are not assigned by a registry; they are computed from file contents. The trick is the same in both cases: a well-chosen hash function gives you compact, collision-resistant identifiers that any participant can produce independently and that everyone can agree on without coordination. USBN applies this trick to structured bibliographic metadata — specifically, the title, author, and publication year as printed on the book's title page.

The algorithm, step by step

The algorithm has three steps. Each is deliberately simple.

I. Normalise the input

The three fields — title, author, year — are concatenated with a single space between each. The result is put through a fixed normalisation pipeline: Unicode NFKD decomposition, then removal of combining marks, then uppercase conversion, then whitespace collapsing, then trimming. The point of this pipeline is not elegance; it is reproducibility. A cataloger in Vienna writing "Über die Relativitätstheorie / Albert Einstein / 1916" must produce a byte-identical string to a cataloger in Tokyo transcribing the same title page, regardless of locale settings, keyboard layout, or typewriter convention. After normalisation, both produce UBER DIE RELATIVITATSTHEORIE ALBERT EINSTEIN 1916.
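The pipeline above can be sketched in a few lines of Python. This is an illustrative reading of the prose, not the reference implementation: the function name `normalise` and the choice to pass the year as a string are assumptions, and the formal specification remains authoritative for edge cases (for example, characters whose uppercase form differs between languages or libraries).

```python
import unicodedata


def normalise(title: str, author: str, year: str) -> str:
    """Join the three fields and apply the steps in the order listed above."""
    text = " ".join([title, author, year])       # single space between fields
    text = unicodedata.normalize("NFKD", text)   # 1. compatibility decomposition
    text = "".join(c for c in text if not unicodedata.combining(c))  # 2. drop combining marks
    text = text.upper()                          # 3. uppercase
    return " ".join(text.split())                # 4-5. collapse whitespace, then trim


print(normalise("Über die Relativitätstheorie", "Albert Einstein", "1916"))
# UBER DIE RELATIVITATSTHEORIE ALBERT EINSTEIN 1916
```

Because every step is a pure function of the input characters, the result does not depend on locale settings or on how the transcriber typed the accents.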

II. Hash with BLAKE2s

The normalised string is hashed using BLAKE2s, a modern cryptographic hash function chosen for its speed, configurable output length, and strong collision resistance. We use an 8-byte (64-bit) digest and keep the top 60 bits, discarding the remaining 4. Why 60? Because the next step, Base32 encoding, packs exactly 5 bits per character, so 12 characters of Base32 hold exactly 60 bits, with no padding and no partly used character.
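A minimal sketch of this step, using the BLAKE2s implementation in Python's standard `hashlib`. Three details are assumptions here rather than statements from the text: the string is encoded as UTF-8 before hashing, the digest is read as a big-endian integer, and "top 60 bits" means the most significant 60. The function name `fingerprint60` is hypothetical; the formal specification should be consulted for the definitive choices.

```python
import hashlib


def fingerprint60(normalised: str) -> int:
    """Hash a normalised string with BLAKE2s and keep 60 of the 64 digest bits."""
    data = normalised.encode("utf-8")                     # assumption: UTF-8 bytes
    digest = hashlib.blake2s(data, digest_size=8).digest()  # 8-byte (64-bit) digest
    value = int.from_bytes(digest, "big")                 # assumption: big-endian
    return value >> 4                                     # drop the low 4 bits


fp = fingerprint60("UBER DIE RELATIVITATSTHEORIE ALBERT EINSTEIN 1916")
print(fp.bit_length() <= 60)  # True
```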

III. Encode in Crockford Base32

The 60-bit fingerprint is written out in Crockford Base32 (the alphabet Douglas Crockford designed for human-readable machine data), left-padded with zeros to exactly twelve characters, and prefixed with the literal letter U to mark it as a USBN. The Crockford alphabet omits four letters: I, L, and O, which are easily confused with the digits 1 and 0, and U, which Crockford dropped to avoid accidental obscenities. It also treats upper and lower case as equivalent, which is why USBNs are case-insensitive: a handwritten uazja136wfyxf, an OCR'd UAZJA136WFYXF, and a typed Uazja136Wfyxf all resolve to the same identifier. You can copy a USBN off a napkin.
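The encoding step can be sketched as follows. The alphabet is Crockford's published 32-symbol set; the function names `encode_usbn` and `canonical` are hypothetical, and `canonical` only uppercases and validates (it does not attempt Crockford's optional remapping of look-alike letters).

```python
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # 32 symbols, no I, L, O, U


def encode_usbn(fingerprint: int) -> str:
    """Write a 60-bit integer as 'U' plus twelve Crockford Base32 characters."""
    if not 0 <= fingerprint < 1 << 60:
        raise ValueError("fingerprint must fit in 60 bits")
    # Twelve 5-bit groups, most significant first; small values get leading zeros.
    body = "".join(CROCKFORD[(fingerprint >> shift) & 0x1F]
                   for shift in range(55, -1, -5))
    return "U" + body


def canonical(text: str) -> str:
    """Fold any casing of a USBN to its uppercase form, rejecting malformed input."""
    s = text.strip().upper()
    if len(s) != 13 or s[0] != "U" or any(c not in CROCKFORD for c in s[1:]):
        raise ValueError(f"not a well-formed USBN: {text!r}")
    return s


print(encode_usbn(0))              # U000000000000
print(canonical("uazja136wfyxf"))  # UAZJA136WFYXF
```

Since U is not part of the alphabet itself, the prefix can never be mistaken for an encoded digit.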

Works, editions, and the WSBN

One subtlety that the earlier draft of USBN got wrong, and v1.0 gets right, is the distinction between a work and an edition. H. G. Wells wrote The Outline of History once, but it was reprinted in 1949, again in 1961, and many times after. Each printing is a distinct edition; all of them share a single underlying work. A cataloger sometimes needs to refer to a particular printing ("the one with the 1949 dust jacket") and sometimes needs to group all printings ("anything called Outline of History by H. G. Wells").

USBN v1.0 provides both. The USBN is computed from title + author + year and identifies a specific edition. The WSBN (Work Standard Book Number) is computed from title + author alone, with no year, and identifies the underlying work. Every printing of the same book produces the same WSBN, while each printing has a distinct USBN. The two are siblings: same algorithm, same alphabet, same length. Only the input fields and the prefix differ (U for edition, W for work).
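To make the sibling relationship concrete, here is a compact end-to-end sketch that derives both identifiers from one shared helper. It is not the reference implementation: the names `usbn`, `wsbn`, and `_identifier` are hypothetical, and the UTF-8 encoding and big-endian bit order are assumptions, so the exact strings it produces may differ from those of the official converter.

```python
import hashlib
import unicodedata

ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # Crockford Base32


def _identifier(prefix: str, *fields: str) -> str:
    # Normalise: NFKD, strip combining marks, uppercase, collapse and trim spaces.
    text = unicodedata.normalize("NFKD", " ".join(fields))
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = " ".join(text.upper().split())
    # Hash: 8-byte BLAKE2s digest, keep the most significant 60 bits (assumed order).
    digest = hashlib.blake2s(text.encode("utf-8"), digest_size=8).digest()
    bits = int.from_bytes(digest, "big") >> 4
    # Encode: twelve 5-bit groups, most significant first.
    return prefix + "".join(ALPHABET[(bits >> s) & 31] for s in range(55, -1, -5))


def usbn(title: str, author: str, year: int) -> str:
    """Edition identifier: title + author + year."""
    return _identifier("U", title, author, str(year))


def wsbn(title: str, author: str) -> str:
    """Work identifier: title + author only."""
    return _identifier("W", title, author)


work = wsbn("The Outline of History", "H. G. Wells")
ed_1949 = usbn("The Outline of History", "H. G. Wells", 1949)
ed_1961 = usbn("The Outline of History", "H. G. Wells", 1961)
print(work, ed_1949, ed_1961)
```

Both printings map to the single `work` value, while `ed_1949` and `ed_1961` differ because the year is part of their input.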

Why thirteen characters?

The answer is partly aesthetic and partly practical. A USBN is thirteen characters long because an ISBN-13 is thirteen characters long. Any library system that already renders an ISBN-13 in a 13-character MARC field or UI column can render a USBN in the same slot, with zero layout changes. Any database schema that reserves a CHAR(13) for ISBN can reuse it for USBN. And any human already comfortable reading a 13-digit ISBN off a barcode is already comfortable reading a 13-character USBN off a catalogue card.

The letter U at the front is not a character you have to remember separately — it distinguishes USBNs from ISBNs at a glance. An ISBN-13 begins with 978 or 979; a USBN begins with U. No context is needed to tell them apart.
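As an illustration of how a system sharing one 13-character column might tell the two apart, the sketch below classifies a value by its leading characters. The function name `kind` is hypothetical, hyphens are stripped as a convenience, and the ISBN-13 check digit is deliberately not verified.

```python
import re

ISBN13 = re.compile(r"97[89]\d{10}")             # 978 or 979, then ten more digits
USBN = re.compile(r"U[0-9A-HJKMNP-TV-Z]{12}")    # U, then twelve Crockford characters


def kind(identifier: str) -> str:
    """Classify a 13-character identifier by its prefix and character set."""
    s = identifier.strip().replace("-", "").upper()
    if ISBN13.fullmatch(s):
        return "ISBN-13"
    if USBN.fullmatch(s):
        return "USBN"
    return "unknown"


print(kind("978-0-306-40615-7"))  # ISBN-13
print(kind("uazja136wfyxf"))      # USBN
```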

Will it collide?

The honest answer is: essentially never, and we have the numbers. A 60-bit hash gives 2⁶⁰ ≈ 1.15 × 10¹⁸ distinct values. The birthday paradox says you'd expect your first collision at around √(π/2 · 2⁶⁰) ≈ 1.35 × 10⁹ entries, with an even chance of at least one collision reached slightly earlier, at √(2 ln 2 · 2⁶⁰) ≈ 1.26 × 10⁹. Either figure is more than eight times the estimated global corpus of all editions ever published.

Concretely: for a typical library of a million books, the probability of a collision is about 4 in ten million (4.3 × 10⁻⁷). For the estimated pre-ISBN corpus of sixty million editions, it is about 0.16%. For the full global corpus of 150 million editions, it is under 1%. USBN v1.0 is designed so that a union catalog holding every book ever written would experience at most a handful of hash collisions in total, and those would be obvious the moment someone tried to use both identifiers.
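Readers who want to check the collision figures for their own collection size can use the standard birthday approximation, P ≈ 1 − exp(−n(n−1) / 2N), where n is the number of editions and N = 2⁶⁰. The sketch below assumes hash outputs are uniformly distributed; the function names are hypothetical.

```python
import math

SPACE = 2 ** 60  # number of distinct 60-bit fingerprints


def collision_probability(n: int, space: int = SPACE) -> float:
    """Probability that at least two of n uniformly random values coincide."""
    # expm1 keeps precision when the exponent is very close to zero.
    return -math.expm1(-n * (n - 1) / (2 * space))


def expected_first_collision(space: int = SPACE) -> float:
    """Expected number of entries before the first collision, sqrt(pi/2 * N)."""
    return math.sqrt(math.pi / 2 * space)


for n in (1_000_000, 60_000_000, 150_000_000):
    print(f"{n:>11,} editions: P(collision) = {collision_probability(n):.3g}")
print(f"expected first collision near {expected_first_collision():.3g} entries")
```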

What USBN is not

USBN is not a database. It is not a registry. It does not contain any information about a book — it is a reference you attach to the book's own metadata, the way an ISBN is. Looking up "what book is UAZJA136WFYXF?" requires a resolver somewhere that stores the (USBN → metadata) mapping. Building that resolver is the next phase of this project; openusbn.org will host a reference resolver with a browsable catalogue and a simple submission protocol. For now, USBN v1.0 is the name, and any bibliographic database — WorldCat, Open Library, LibraryThing, your own private SQLite file — can use it as a key.
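For the private-SQLite case, a resolver can be as small as one table keyed on the identifier. The sketch below is purely illustrative: the table name, column names, and helper functions are hypothetical, the sample row uses placeholder values rather than a real catalogue entry, and nothing here reflects the protocol of the planned openusbn.org resolver.

```python
import sqlite3


def open_catalogue(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a tiny local catalogue keyed by USBN."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS editions (
               usbn   CHAR(13) PRIMARY KEY,
               title  TEXT NOT NULL,
               author TEXT NOT NULL,
               year   TEXT NOT NULL
           )"""
    )
    return conn


def register(conn: sqlite3.Connection, usbn: str, title: str, author: str, year) -> None:
    """Store the metadata for an identifier; an existing row is left untouched."""
    conn.execute(
        "INSERT OR IGNORE INTO editions VALUES (?, ?, ?, ?)",
        (usbn.upper(), title, author, str(year)),
    )
    conn.commit()


def resolve(conn: sqlite3.Connection, usbn: str):
    """Return (title, author, year) for an identifier, or None if unknown."""
    row = conn.execute(
        "SELECT title, author, year FROM editions WHERE usbn = ?", (usbn.upper(),)
    ).fetchone()
    return row


conn = open_catalogue()
register(conn, "U000000000000", "Placeholder Title", "Placeholder Author", 1923)
print(resolve(conn, "u000000000000"))
```

Uppercasing on both write and read keeps lookups case-insensitive, matching how the identifiers themselves behave.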

Start using it

The easiest way to try USBN is the converter on the home page. Type a title, an author, and a year; the identifier appears instantly. For programmatic use, the reference implementations in Python and JavaScript are small enough to read in five minutes and copy into your own project. The formal specification is the authoritative description for anyone building an interoperable implementation, and the accompanying paper (PDF, 11 pages) documents the design history, the collision analysis, and the defects in the earlier draft that motivated the v1.0 revision.

USBN is released under the MIT License. The paper describing the design and its history will be submitted to the Code4Lib Journal. Maintained by Euler's Identity LLC.