Working code
Pure stdlib Python and minimal-dependency JavaScript. Ninety-eight lines of Python; a hundred of JavaScript. Read them in five minutes, paste them into your project in five more.
Python
Standard library only: hashlib, unicodedata, re. No pip dependencies; drop it into any project. The snippet below is the core; the complete annotated file lives at code/usbn.py in the reference repository.
```python
import hashlib, re, unicodedata

ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # Crockford Base32
BASE, DIGEST_BYTES, HASH_BITS, ENCODED_LEN = 32, 8, 60, 12

def normalize_fields(*fields):
    raw = " ".join(str(f) for f in fields)
    nfkd = unicodedata.normalize("NFKD", raw)
    stripped = "".join(c for c in nfkd if not unicodedata.combining(c))
    return re.sub(r"\s+", " ", stripped.upper()).strip()

def _base32_encode(n):
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, r = divmod(n, BASE)
        out.append(ALPHABET[r])
    return "".join(reversed(out))

def _hash_to_identifier(canonical, prefix):
    digest = hashlib.blake2s(canonical.encode("utf-8"),
                             digest_size=DIGEST_BYTES).digest()
    n = int.from_bytes(digest, "big") >> 4  # top 60 bits
    encoded = _base32_encode(n).rjust(ENCODED_LEN, ALPHABET[0])
    return prefix + encoded[-ENCODED_LEN:]

def generate_usbn(title, author, year):
    return _hash_to_identifier(normalize_fields(title, author, year), "U")

def generate_wsbn(title, author):
    return _hash_to_identifier(normalize_fields(title, author), "W")
```
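Normalization is where most interoperability bugs hide, so it is worth seeing what `normalize_fields` actually does to messy metadata. A quick standalone check (the function is repeated from the core above so the snippet runs on its own; the example titles are illustrative):

```python
import re, unicodedata

def normalize_fields(*fields):
    # Join, decompose (NFKD), drop combining marks, upper-case, collapse whitespace.
    raw = " ".join(str(f) for f in fields)
    nfkd = unicodedata.normalize("NFKD", raw)
    stripped = "".join(c for c in nfkd if not unicodedata.combining(c))
    return re.sub(r"\s+", " ", stripped.upper()).strip()

# Accents, case, and stray whitespace all normalize away:
print(normalize_fields("Les  Misérables", "victor hugo"))  # → LES MISERABLES VICTOR HUGO
```

Because hashing happens after this step, catalog records that differ only in diacritics, casing, or spacing map to the same identifier.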
JavaScript
Browser or Node, single npm dependency (blakejs, ≈4 kB). The same computation as the Python version; both are tested against the shared test vectors.
```javascript
import { blake2s } from 'blakejs';

const ALPHABET = '0123456789ABCDEFGHJKMNPQRSTVWXYZ';
const BASE = BigInt(ALPHABET.length);

function normalizeFields(...fields) {
  const raw = fields.map(String).join(' ');
  const nfkd = raw.normalize('NFKD');
  const stripped = nfkd.replace(/\p{M}/gu, '');
  return stripped.toUpperCase().replace(/\s+/g, ' ').trim();
}

function _hashToIdentifier(canonical, prefix) {
  const bytes = new TextEncoder().encode(canonical);
  const digest = blake2s(bytes, null, 8); // 8-byte (64-bit) digest
  let n = 0n;
  for (const b of digest) n = (n << 8n) | BigInt(b);
  n = n >> 4n; // top 60 bits
  let out = '';
  while (n > 0n) {
    const r = Number(n % BASE);
    out = ALPHABET[r] + out;
    n = n / BASE;
  }
  return prefix + out.padStart(12, '0').slice(-12);
}

export const generateUSBN = (t, a, y) =>
  _hashToIdentifier(normalizeFields(t, a, y), 'U');
export const generateWSBN = (t, a) =>
  _hashToIdentifier(normalizeFields(t, a), 'W');
```
Command-line usage
If you have the Python reference installed, you can generate USBNs from the shell:
```shell
$ python3 -c "from usbn import generate_usbn; print(generate_usbn('The Outline of History', 'H. G. Wells', 1949))"
UAZJA136WFYXF
```
Integrating USBN into your catalog
The recommended integration pattern is:
- Store the USBN alongside each existing identifier (OCLC, LCCN, OLID).
- Store the WSBN as a separate column — this becomes your work-level join key.
- Compute USBNs lazily, at the time a record is first saved, from the title-page metadata you already have.
- Do not rename or remove USBN columns across updates. A USBN is stable for the life of the spec; only the underlying resolver moves.
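A minimal sketch of this pattern using SQLite. The table and column names here are illustrative, not part of the spec; the generator functions are passed in so the sketch stands alone:

```python
import sqlite3

# Hypothetical schema: USBN/WSBN stored alongside the identifiers you already have.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE editions (
        oclc   TEXT,
        lccn   TEXT,
        olid   TEXT,
        usbn   TEXT UNIQUE,  -- edition-level identifier; never renamed or removed
        wsbn   TEXT,         -- work-level join key
        title  TEXT,
        author TEXT,
        year   INTEGER
    )
""")
conn.execute("CREATE INDEX editions_wsbn ON editions (wsbn)")

def save_record(conn, record, generate_usbn, generate_wsbn):
    """Compute identifiers lazily, at first save, from title-page metadata."""
    usbn = generate_usbn(record["title"], record["author"], record["year"])
    wsbn = generate_wsbn(record["title"], record["author"])
    conn.execute(
        "INSERT OR IGNORE INTO editions (usbn, wsbn, title, author, year) "
        "VALUES (?, ?, ?, ?, ?)",
        (usbn, wsbn, record["title"], record["author"], record["year"]))
    return usbn, wsbn
```

Grouping all editions of a work is then a single query against the `wsbn` column, with no crosswalk table to maintain.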
Verifying your implementation
The test vectors file contains canonical inputs and their expected USBN / WSBN outputs. Any implementation that produces these values is conformant. If it doesn't, the bug is yours. The test vectors are versioned with the spec; future versions of USBN will ship new vectors.
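Beyond the official vectors, one cheap structural sanity check is that two editions of the same work (same title and author, different years) must yield distinct USBNs that share a single WSBN. A standalone sketch, repeating the core functions from above so it runs on its own:

```python
import hashlib, re, unicodedata

ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # Crockford Base32
BASE, DIGEST_BYTES, ENCODED_LEN = 32, 8, 12

def normalize_fields(*fields):
    raw = " ".join(str(f) for f in fields)
    nfkd = unicodedata.normalize("NFKD", raw)
    stripped = "".join(c for c in nfkd if not unicodedata.combining(c))
    return re.sub(r"\s+", " ", stripped.upper()).strip()

def _base32_encode(n):
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, r = divmod(n, BASE)
        out.append(ALPHABET[r])
    return "".join(reversed(out))

def _hash_to_identifier(canonical, prefix):
    digest = hashlib.blake2s(canonical.encode("utf-8"),
                             digest_size=DIGEST_BYTES).digest()
    n = int.from_bytes(digest, "big") >> 4  # top 60 bits
    return prefix + _base32_encode(n).rjust(ENCODED_LEN, ALPHABET[0])[-ENCODED_LEN:]

def generate_usbn(title, author, year):
    return _hash_to_identifier(normalize_fields(title, author, year), "U")

def generate_wsbn(title, author):
    return _hash_to_identifier(normalize_fields(title, author), "W")

# Two editions of one work: distinct USBNs, one shared 13-character WSBN.
u1920 = generate_usbn("The Outline of History", "H. G. Wells", 1920)
u1949 = generate_usbn("The Outline of History", "H. G. Wells", 1949)
wsbn  = generate_wsbn("The Outline of History", "H. G. Wells")
assert u1920 != u1949 and u1920.startswith("U") and wsbn.startswith("W")
```

If a check like this fails, look at the normalization step first; byte-for-byte agreement on the canonical string is a precondition for agreement on the hash.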