Working code
Pure stdlib Python and minimal-dependency JavaScript. Ninety-eight lines of Python; a hundred of JavaScript. Read them in five minutes, paste them into your project in five more.
Python
Standard library only: hashlib, unicodedata, re. No pip dependencies; drop it into any project. The snippet below is the core; the complete annotated file lives at code/usbn.py in the reference repository.
```python
import hashlib, re, unicodedata

ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # Crockford Base32
BASE, DIGEST_BYTES, HASH_BITS, ENCODED_LEN = 32, 8, 60, 12

def normalize_fields(*fields):
    raw = " ".join(str(f) for f in fields)
    nfkd = unicodedata.normalize("NFKD", raw)
    stripped = "".join(c for c in nfkd if not unicodedata.combining(c))
    return re.sub(r"\s+", " ", stripped.upper()).strip()

def _base32_encode(n):
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, r = divmod(n, BASE)
        out.append(ALPHABET[r])
    return "".join(reversed(out))

def _hash_to_identifier(canonical, prefix):
    digest = hashlib.blake2s(canonical.encode("utf-8"),
                             digest_size=DIGEST_BYTES).digest()
    n = int.from_bytes(digest, "big") >> 4  # top 60 bits
    encoded = _base32_encode(n).rjust(ENCODED_LEN, ALPHABET[0])
    return prefix + encoded[-ENCODED_LEN:]

def generate_usbn(title, author, year):
    return _hash_to_identifier(normalize_fields(title, author, year), "U")

def generate_wsbn(title, author):
    return _hash_to_identifier(normalize_fields(title, author), "W")
```
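Normalization is where most interoperability bugs hide, so it is worth seeing what `normalize_fields` actually does to messy metadata. A quick standalone check (the function is repeated from the core above so the snippet runs on its own; the example titles are illustrative):

```python
import re, unicodedata

def normalize_fields(*fields):
    # Join, decompose (NFKD), drop combining marks, upper-case, collapse whitespace.
    raw = " ".join(str(f) for f in fields)
    nfkd = unicodedata.normalize("NFKD", raw)
    stripped = "".join(c for c in nfkd if not unicodedata.combining(c))
    return re.sub(r"\s+", " ", stripped.upper()).strip()

# Accents, case, and stray whitespace all normalize away:
print(normalize_fields("Les  Misérables", "victor hugo"))  # → LES MISERABLES VICTOR HUGO
```

Because hashing happens after this step, catalog records that differ only in diacritics, casing, or spacing map to the same identifier.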
JavaScript
Browser or Node, single npm dependency (blakejs, ≈4 kB). The same computation as the Python version; both are tested against the shared test vectors.
```javascript
import { blake2s } from 'blakejs';

const ALPHABET = '0123456789ABCDEFGHJKMNPQRSTVWXYZ';
const BASE = BigInt(ALPHABET.length);

function normalizeFields(...fields) {
  const raw = fields.map(String).join(' ');
  const nfkd = raw.normalize('NFKD');
  const stripped = nfkd.replace(/\p{M}/gu, '');
  return stripped.toUpperCase().replace(/\s+/g, ' ').trim();
}

function _hashToIdentifier(canonical, prefix) {
  const bytes = new TextEncoder().encode(canonical);
  const digest = blake2s(bytes, null, 8); // 8-byte (64-bit) digest
  let n = 0n;
  for (const b of digest) n = (n << 8n) | BigInt(b);
  n = n >> 4n; // top 60 bits
  let out = '';
  while (n > 0n) {
    const r = Number(n % BASE);
    out = ALPHABET[r] + out;
    n = n / BASE;
  }
  return prefix + out.padStart(12, '0').slice(-12);
}

export const generateUSBN = (t, a, y) =>
  _hashToIdentifier(normalizeFields(t, a, y), 'U');
export const generateWSBN = (t, a) =>
  _hashToIdentifier(normalizeFields(t, a), 'W');
```
Command-line usage
If you have the Python reference installed, you can generate USBNs from the shell:
```shell
$ python3 -c "from usbn import generate_usbn; print(generate_usbn('The Outline of History', 'H. G. Wells', 1949))"
UAZJA136WFYXF
```
Integrating USBN into your catalog
The recommended integration pattern is:
- Store the USBN alongside each existing identifier (OCLC, LCCN, OLID).
- Store the WSBN as a separate column — this becomes your work-level join key.
- Compute USBNs lazily, at the time a record is first saved, from the title-page metadata you already have.
- Do not rename or remove USBN columns across updates. A USBN is stable for the life of the spec; only the underlying resolver moves.
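A minimal sketch of this pattern using SQLite. The table and column names here are illustrative, not part of the spec; the generator functions are passed in so the sketch stands alone:

```python
import sqlite3

# Hypothetical schema: USBN/WSBN stored alongside the identifiers you already have.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE editions (
        oclc   TEXT,
        lccn   TEXT,
        olid   TEXT,
        usbn   TEXT UNIQUE,  -- edition-level identifier; never renamed or removed
        wsbn   TEXT,         -- work-level join key
        title  TEXT,
        author TEXT,
        year   INTEGER
    )
""")
conn.execute("CREATE INDEX editions_wsbn ON editions (wsbn)")

def save_record(conn, record, generate_usbn, generate_wsbn):
    """Compute identifiers lazily, at first save, from title-page metadata."""
    usbn = generate_usbn(record["title"], record["author"], record["year"])
    wsbn = generate_wsbn(record["title"], record["author"])
    conn.execute(
        "INSERT OR IGNORE INTO editions (usbn, wsbn, title, author, year) "
        "VALUES (?, ?, ?, ?, ?)",
        (usbn, wsbn, record["title"], record["author"], record["year"]))
    return usbn, wsbn
```

Grouping all editions of a work is then a single query against the `wsbn` column, with no crosswalk table to maintain.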
Verifying your implementation
The test vectors file contains canonical inputs and their expected USBN / WSBN outputs. Any implementation that produces these values is conformant. If it doesn't, the bug is yours. The test vectors are versioned with the spec; future versions of USBN will ship new vectors.
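Beyond the official vectors, one cheap structural sanity check is that two editions of the same work (same title and author, different years) must yield distinct USBNs that share a single WSBN. A standalone sketch, repeating the core functions from above so it runs on its own:

```python
import hashlib, re, unicodedata

ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # Crockford Base32
BASE, DIGEST_BYTES, ENCODED_LEN = 32, 8, 12

def normalize_fields(*fields):
    raw = " ".join(str(f) for f in fields)
    nfkd = unicodedata.normalize("NFKD", raw)
    stripped = "".join(c for c in nfkd if not unicodedata.combining(c))
    return re.sub(r"\s+", " ", stripped.upper()).strip()

def _base32_encode(n):
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, r = divmod(n, BASE)
        out.append(ALPHABET[r])
    return "".join(reversed(out))

def _hash_to_identifier(canonical, prefix):
    digest = hashlib.blake2s(canonical.encode("utf-8"),
                             digest_size=DIGEST_BYTES).digest()
    n = int.from_bytes(digest, "big") >> 4  # top 60 bits
    return prefix + _base32_encode(n).rjust(ENCODED_LEN, ALPHABET[0])[-ENCODED_LEN:]

def generate_usbn(title, author, year):
    return _hash_to_identifier(normalize_fields(title, author, year), "U")

def generate_wsbn(title, author):
    return _hash_to_identifier(normalize_fields(title, author), "W")

# Two editions of one work: distinct USBNs, one shared 13-character WSBN.
u1920 = generate_usbn("The Outline of History", "H. G. Wells", 1920)
u1949 = generate_usbn("The Outline of History", "H. G. Wells", 1949)
wsbn  = generate_wsbn("The Outline of History", "H. G. Wells")
assert u1920 != u1949 and u1920.startswith("U") and wsbn.startswith("W")
```

If a check like this fails, look at the normalization step first; byte-for-byte agreement on the canonical string is a precondition for agreement on the hash.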