CLiki: B-Tries

B-Tries

A prototypical implementation of the data structure described in the paper "B-tries for disk-based string management" (PDF). The authors present B-tries as a possible alternative to B-trees, which have structural similarities, and provided inspiration for B-tries. In this B-trie implementation, both keys and values are (unsigned-byte 8) vectors.

Glossary

block n. an 8 KB block in a B-trie file.

block header n. four bytes: a type signature followed by a key range.

block pointer n. a 32-bit block number, stored in little-endian format.

bucket n. a block that contains keys and values packed together as alternating cells. The keys in a bucket are suffixes of keys that have been partially consumed by the trie structure. Buckets are leaf blocks, having no children, and their parents are trie nodes. If the bucket is pure, then there is only one pointer to it from the parent node. Otherwise, it's a hybrid bucket, and there are multiple (adjacent) pointers from the trie node to the bucket.

cell n. the unit of allocation in a block. (No relation to Cells.) The cells are allocated from the end of the block, while the cell pointers are arrayed at the front of the block, just after the header. See the functions allocate-cell and get-cell. For an overview of how this kind of cellular structure works, see the article Slotted Pages: A database page layout.

cell pointer n. a 32-bit pointer within a block to a cell stored in that same block. A cell pointer comprises (from low to high bits) a 6-bit type tag, 13-bit length, and 13-bit offset. The first cell pointer, numbered zero, points to free space. There may be more than one free cell, in practice, but that would imply fragmentation. Refer to the functions get-cell-pointer and get-datum.

datum n. a key or value. This terminology is borrowed from Berkeley DB.

engulf v.t. gather adjacent trie pointers to be covered by a single hybrid bucket.

exhausted key n. a key that is entirely consumed by the traversal of trie nodes.

exhaust pointer n. a cell pointer to the value corresponding to an exhausted key.

free cell n. a cell whose pointer is tagged with the type code +free-cell+.

hoist v.t. move a key that became exhausted by the introduction of a new trie node.

hybrid bucket n. a non-pure bucket, which has a key range greater than one byte.

hybridize v.t. transform a pure bucket into a hybrid bucket covering the full key range (0–255) under a new trie node. This is done because a pure bucket cannot be split horizontally, so there must be a new trie node parenting the pure bucket. After this vertical split is completed, the bucket is split horizontally into a pair of (directly adjacent) buckets that are pointed to by the parent trie node. See the function hybridize-bucket.

key distribution n. a vector containing the counts of occurrences for each leading key byte in a bucket.

key range n. the range of leading key bytes covered by a bucket. Either 0–255 or some narrower range.

null pointer n. a cell pointer having zero length and zero offset.

pure bucket n. a bucket with a key range covering one—and only one—byte.

purify v.t. adjust a trie node so that there is only one pointer to a pure bucket.

split factor n. the target ratio when a bucket is split into right and left buckets.

split point n. the byte determining which bucket (left or right) a key will be in.

sync v.t. write a block to disk.

trie cell n. the cell containing trie pointers (256 of them, to be exact).

trie pointer n. a block pointer from a trie node to a child block.

The code

Author: David Mullen
License: Apache 2

;;; Every block is 8 KB, with a fixed-size four-byte header. ;;; Block pointers and cell pointers are 32-bit little-endian. (defconstant +block-size+ 8192) (defconstant +pointer-size+ 4) (defconstant +header-size+ 4) ;;; Each block leads with a 16-bit signature. (defconstant +trie-tag+ 20052 "ASCII TN.") (defconstant +bucket-tag+ 16980 "ASCII TB.") ;;; Cells have 6-bit type tags. (defconstant +free-cell+ #o00) (defconstant +trie-cell+ #o01) (defconstant +data-cell+ #o02) (defconstant +free-pointer-offset+ +header-size+ "Cell number zero.") (defconstant +max-cell-count+ (1- (/ +block-size+ +pointer-size+)) "Hard limit on cell count in blocks.") (defconstant +split-factor+ 3/4 "Right/left bucket fill factor.") (defconstant +cache-size+ 128 "Number of blocks in memory.") (deftype octet () "Datum element-type." '(unsigned-byte 8)) (defstruct bt (stream nil) (blocks nil)) (defmethod print-object ((bt bt) stream) (cond ((eq (bt-stream bt) nil) (call-next-method)) ((let* ((bt-name (namestring (bt-stream bt))) (length (file-length (bt-stream bt))) (block-count (/ length +block-size+))) (print-unreadable-object (bt stream :type t :identity t) (format stream "~S (~D block~:P)" bt-name block-count)))))) (defun %open-bt (pathname) (open pathname :direction :io :if-does-not-exist :create :if-exists :overwrite :element-type 'octet)) (defun open-bt (pathname) (let ((bt (make-bt :stream (%open-bt pathname)))) (when (zerop (file-length (bt-stream bt))) (allocate-block bt +trie-tag+)) (values bt (bt-stream bt)))) (defun close-bt (bt) (setf (bt-blocks bt) nil) (unless (eq (bt-stream bt) nil) (with-accessors ((stream bt-stream)) bt (prog1 (close stream) (setq stream nil))))) (defmacro %prune-cache (cache) (let ((tail-pos (1- +cache-size+))) `(when (> (list-length ,cache) +cache-size+) (rplacd (nthcdr ,tail-pos ,cache) nil)))) (defun get-block (bt number) (let ((entry (assoc number (bt-blocks bt))) entry-tail) (cond (entry (with-accessors ((cache bt-blocks)) bt (unless (eq entry (car cache)) (loop for tail on cache when (eq (cadr tail) entry) do (setq entry-tail (cdr tail)) (setf (cdr tail) (cddr tail))) ;; Move to front of cache. (rplacd entry-tail cache) (setq cache entry-tail) (%prune-cache cache)) (cdr entry))) ;; Not in cache, read from file. (t (file-position (bt-stream bt) (* +block-size+ number)) (let ((block (make-array +block-size+ :element-type 'octet))) (assert (= (read-sequence block (bt-stream bt)) +block-size+)) (progn (push (cons number block) (bt-blocks bt)) block)))))) (defun get-block-number (bt block) (or (car (rassoc block (bt-blocks bt))) (error "No block found: ~S." block))) (defun sync-block (bt block &optional number) (let ((number (or number (get-block-number bt block)))) (file-position (bt-stream bt) (* +block-size+ number)) (values (write-sequence block (bt-stream bt)) number))) (defun make-cell-pointer (type length offset) (check-type type (unsigned-byte 6) "a cell type") (check-type length (unsigned-byte 13) "a length") (check-type offset (unsigned-byte 13) "an offset") (logior (ash offset 19) (ash length 6) type)) (defun read-u32 (block offset) (logior (aref block offset) (ash (aref block (+ offset 1)) 8) (ash (aref block (+ offset 2)) 16) (ash (aref block (+ offset 3)) 24))) (defun (setf read-u32) (new-value block offset) (loop for byte-position fixnum from 0 below 32 by 8 for octet = (ldb (byte 8 byte-position) new-value) do (setf (aref block offset) octet) (incf offset) finally (return new-value))) (defun free-space (block) "Return free space and free offset as multiple values." (let ((free-pointer (read-u32 block +free-pointer-offset+))) (assert (= (ldb (byte 6 0) free-pointer) +free-cell+)) (values (ldb (byte 13 6) free-pointer) (ldb (byte 13 19) free-pointer)))) (defsetf free-space (block) (free-space free-offset) `(multiple-value-prog1 (values ,free-space ,free-offset) (setf (read-u32 ,block +free-pointer-offset+) (make-cell-pointer +free-cell+ ,free-space ,free-offset)))) (define-symbol-macro +free-space+ (symbol-macrolet ((total-block-overhead (+ +header-size+ +pointer-size+))) (values (- +block-size+ total-block-overhead) total-block-overhead))) (defun block-type (block) (let ((lo-byte (aref block 0)) (hi-byte (aref block 1))) (logior (ash hi-byte 8) lo-byte))) (defun (setf block-type) (type-signature block) (check-type type-signature (unsigned-byte 16)) (setf (aref block 0) (ldb (byte 8 0) type-signature)) (setf (aref block 1) (ldb (byte 8 8) type-signature))) (defun make-trie-cell () (make-array (* 256 +pointer-size+) :element-type 'octet :initial-element 0)) (defun allocate-block (bt type-signature) (let* ((block (make-array +block-size+ :element-type 'octet :initial-element 0)) (stream (bt-stream bt)) (file-length (file-length stream)) (number (/ file-length +block-size+)) (position (* +block-size+ number))) (check-type number (unsigned-byte 32)) (setf (block-type block) type-signature) (setf (free-space block) +free-space+) (when (eql type-signature +trie-tag+) ;; Reserve space for both exhaust pointers and child pointers. (loop for key-index of-type (unsigned-byte 8) from 0 to 255 do (allocate-cell block +data-cell+ nil (1+ key-index)) finally (allocate-cell block +trie-cell+ (make-trie-cell)))) (file-position stream position) (write-sequence block stream) (values (get-block bt number) number))) (defun allocate-cell (block type-code &optional vector cell-number) (multiple-value-bind (free-space free-offset) (free-space block) (assert (or (not cell-number) (< cell-number +max-cell-count+))) (let* ((cell-number (or cell-number (/ (- free-offset +header-size+) +pointer-size+))) (pointer-offset (+ +header-size+ (* +pointer-size+ (the fixnum cell-number)))) (pointer-space-excursion (- (+ pointer-offset +pointer-size+) free-offset)) (total-space-requirement (+ (length vector) (max 0 pointer-space-excursion)))) (assert (>= free-space total-space-requirement)) (let* ((datum-end (+ free-offset free-space)) (datum-start (- datum-end (length vector)))) (declare (type (unsigned-byte 16) datum-start)) (declare (type (unsigned-byte 16) datum-end)) (decf free-space total-space-requirement) (let ((cell-pointer (make-cell-pointer type-code (- datum-end datum-start) (if vector datum-start 0)))) (setf (read-u32 block pointer-offset) cell-pointer) (when vector (setf (subseq block datum-start datum-end) vector)) (let ((free-offset (max free-offset (+ pointer-offset +pointer-size+)))) (multiple-value-prog1 (values cell-pointer cell-number free-space) (setf (free-space block) (values free-space free-offset))))))))) (defun trie-node-lookup (block key key-start) (let* ((exhaust-pointer-p (eql (1+ key-start) (length key))) (fn (if exhaust-pointer-p #'exhaust-pointer #'trie-pointer)) (key-index (aref key key-start))) (check-type key-index octet) (values (funcall fn block key-index) key-index exhaust-pointer-p))) (defun null-pointer-p (cell-pointer) "Null check ignores cell type bits." (zerop (ldb (byte 26 6) cell-pointer))) (defun get-datum (block cell-pointer) (when (null-pointer-p cell-pointer) (return-from get-datum nil)) (make-array (ldb (byte 13 6) cell-pointer) :element-type 'octet :displaced-index-offset (ldb (byte 13 19) cell-pointer) :displaced-to block)) (defun get-cell-pointer (block cell-number) (assert (< 0 cell-number (cell-count block))) (let ((pointer-index (* +pointer-size+ cell-number))) (read-u32 block (+ +header-size+ pointer-index)))) (defun find-cell (block type-code &optional (errorp t)) (loop for cell-number from 1 below (cell-count block) for cell-pointer = (get-cell-pointer block cell-number) for cell-type of-type fixnum = (ldb (byte 6 0) cell-pointer) when (eql cell-type type-code) return (get-datum block cell-pointer) finally (when errorp (error "Cell type not found: ~D." type-code)))) (defun cell-count (block) (let ((free-offset (nth-value 1 (free-space block)))) (/ (- free-offset +header-size+) +pointer-size+))) (defun get-cell (block cell-number) "Read the data stored in the cell having CELL-NUMBER." (get-datum block (get-cell-pointer block cell-number))) (defun key-range (bucket) "Get key range (values: low, high)." (let ((lo-key-byte (aref bucket 2)) (hi-key-byte (aref bucket 3))) (assert (<= lo-key-byte hi-key-byte)) (values lo-key-byte hi-key-byte))) (defsetf key-range (bucket) (lo-key-byte hi-key-byte) `(set-key-range ,bucket ,lo-key-byte ,hi-key-byte)) (defun set-key-range (bucket lo-key-byte hi-key-byte) (multiple-value-prog1 (values lo-key-byte hi-key-byte) (assert (<= 0 lo-key-byte hi-key-byte 255)) (let ((was-hybrid-p (hybrid-bucket-p bucket)) (now-pure-p (eql (setf (aref bucket 2) lo-key-byte) (setf (aref bucket 3) hi-key-byte)))) ;; Going hybrid->pure consumes keys. (when (and was-hybrid-p now-pure-p) (loop with pure-bucket-key-index of-type octet = lo-key-byte with cell-count of-type fixnum = (cell-count bucket) for cell-number of-type fixnum from 1 below cell-count by 2 for offset fixnum = (+ +header-size+ (* +pointer-size+ cell-number)) for pointer of-type (unsigned-byte 32) = (read-u32 bucket offset) do (symbol-macrolet ((pointer-offset (ldb (byte 13 19) pointer))) (assert (eql (aref bucket pointer-offset) pure-bucket-key-index)) (progn (incf pointer-offset) (decf (ldb (byte 13 6) pointer))) (setf (read-u32 bucket offset) pointer))))))) (defun find-in-bucket (bucket key &optional (key-start 0)) (when (pure-bucket-p bucket) (setq key-start (1+ key-start))) (loop with cell-count of-type fixnum = (cell-count bucket) for cell-number fixnum from 1 below cell-count by 2 do (let ((key-in-bucket (get-cell bucket cell-number))) (unless (mismatch key-in-bucket key :start2 key-start) (return (get-cell bucket (the fixnum (1+ cell-number)))))))) (defun trie-node-p (block) (eql (block-type block) +trie-tag+)) (defun pure-bucket-p (block) (symbol-macrolet ((key-range (key-range block))) (when (eql (block-type block) +bucket-tag+) (multiple-value-call #'= key-range)))) (defun hybrid-bucket-p (block) (symbol-macrolet ((key-range (key-range block))) (when (eql (block-type block) +bucket-tag+) (multiple-value-call #'< key-range)))) (defun cell-space (&rest vectors) (loop for vector in vectors sum (length vector) sum +pointer-size+)) (defun bucket-key-distribution (bucket) (loop with counts = (make-array 256 :initial-element 0) with cell-count of-type fixnum = (cell-count bucket) for cell-number fixnum from 1 below cell-count by 2 for key-in-bucket = (get-cell bucket cell-number) do (incf (svref counts (aref key-in-bucket 0))) finally (return counts))) (defun find-split-point (bucket) (multiple-value-bind (lo hi) (key-range bucket) (loop with best-split-point of-type octet = hi with key-distribution = (bucket-key-distribution bucket) with distincts fixnum = (count-if #'plusp key-distribution) with best-split-margin of-type fixnum = most-positive-fixnum initially (when (eql distincts 1) (return key-distribution)) for split-point of-type octet from lo to hi for left-count = (reduce #'+ key-distribution :end split-point) for right-count = (reduce #'+ key-distribution :start split-point) for ratio = (when (plusp left-count) (/ right-count left-count)) for split-margin = (when ratio (abs (- ratio +split-factor+))) when (and split-margin (> right-count 0) (< split-margin best-split-margin)) do (setq best-split-margin split-margin best-split-point split-point) finally (return best-split-point)))) (defun move-bucket-cells (old-bucket new-bucket split-point) (assert (< 0 split-point 256) (split-point) "Bad SPLIT-POINT.") (loop with cell-count fixnum = (cell-count old-bucket) for cell-number fixnum from 1 below cell-count by 2 for key = (copy-seq (get-cell old-bucket cell-number)) for value = (copy-seq (get-cell old-bucket (1+ cell-number))) if (< (aref key 0) split-point) append (list key value) into retained-cells else do (allocate-cell new-bucket +data-cell+ key) (allocate-cell new-bucket +data-cell+ value) ;; After moving the cells that need to be moved into ;; the new bucket, reset free space in the old bucket. ;; Then pack together all the RETAINED-CELLS in the ;; old bucket, and adjust key ranges on both buckets. finally (setf (free-space old-bucket) +free-space+) (dolist (datum retained-cells) (allocate-cell old-bucket +data-cell+ datum)) (multiple-value-bind (old-lo old-hi) (key-range old-bucket) (setf (key-range new-bucket) (values split-point old-hi)) (setf (key-range old-bucket) (values old-lo (1- split-point))) (return split-point)))) (defun purify-bucket (bt bucket trie-node key-distribution) (let ((key-leader (position-if #'plusp key-distribution))) (loop for bucket-number = (get-block-number bt bucket) for i of-type octet from 0 to 255 ;; We have to undo the work that was done by ENGULF-POINTERS. do (symbol-macrolet ((pointer (trie-pointer trie-node i))) (when (and (/= i key-leader) (eql pointer bucket-number)) (setq pointer 0))) ;; Now there is only one pointer to this bucket. finally (set-key-range bucket key-leader key-leader) (sync-block bt bucket) (sync-block bt trie-node) (return (split-pure-bucket bt bucket trie-node))))) (defun split-hybrid-bucket (bt bucket trie-node) (let ((bucket-split-point (find-split-point bucket))) (cond ((typep bucket-split-point 'simple-vector) (purify-bucket bt bucket trie-node bucket-split-point)) ((multiple-value-bind (new-bucket new-bucket-number) (allocate-block bt +bucket-tag+) (move-bucket-cells bucket new-bucket bucket-split-point) (multiple-value-bind (new-lo new-hi) (key-range new-bucket) (loop for i of-type octet from new-lo to new-hi do (setf (trie-pointer trie-node i) new-bucket-number) finally (sync-block bt bucket) (sync-block bt new-bucket)))))))) (defun delete-bucket-cell (bucket cell-number) (multiple-value-bind (free-space free-offset) (free-space bucket) (let ((cell-offset (+ +header-size+ (* +pointer-size+ cell-number)))) (replace bucket bucket :start1 cell-offset :end1 (- free-offset +pointer-size+) :start2 (+ cell-offset +pointer-size+) :end2 free-offset) ;; Free the space that was occupied by the pointer. ;; The actual data may remain until there's a split. (setf (free-space bucket) (values (+ free-space +pointer-size+) (- free-offset +pointer-size+)))))) (defun trie-pointer (trie-node key-index) (let ((trie-vector-cell (find-cell trie-node +trie-cell+))) (read-u32 trie-vector-cell (* key-index +pointer-size+)))) (defun (setf trie-pointer) (trie-pointer trie-node key-index) (let ((trie-vector-cell (find-cell trie-node +trie-cell+))) (setf (read-u32 trie-vector-cell (* key-index +pointer-size+)) trie-pointer))) (defun exhaust-pointer (trie-node key-index) ;; Cell number is one more than key byte. (let ((pointer-index (* +pointer-size+ (1+ key-index)))) (read-u32 trie-node (+ +header-size+ pointer-index)))) (defun (setf exhaust-pointer) (exhaust-pointer trie-node key-index) (let ((offset (+ +header-size+ (* +pointer-size+ (1+ key-index))))) (setf (read-u32 trie-node offset) exhaust-pointer))) (defun hoist-exhaustible-keys (bucket trie-node) "Transfer one-byte keys from BUCKET to the parent TRIE-NODE." (loop with cell-count of-type fixnum = (cell-count bucket) with cell-number of-type fixnum = 1 while (< cell-number cell-count) do (let* ((key-in-bucket (get-cell bucket cell-number)) (exhaustiblep (= (length key-in-bucket) 1)) (leading-key-byte (aref key-in-bucket 0)) (exhaust-number (1+ leading-key-byte))) ;; If it's a multi-byte key then skip the pair. (cond ((not exhaustiblep) (incf cell-number 2)) ((let ((value (get-cell bucket (the fixnum (1+ cell-number))))) (allocate-cell trie-node +data-cell+ value exhaust-number) ;; After deleting the key, the ;; value has the same CELL-NUMBER. (delete-bucket-cell bucket cell-number) (delete-bucket-cell bucket cell-number) (decf cell-count 2))))))) (defun hybridize-bucket (bt bucket) "BUCKET will cover the full range under a new TRIE-NODE." (loop with trie-node = (allocate-block bt +trie-tag+) with bucket-number = (get-block-number bt bucket) with pure-bucket-key-index = (key-range bucket) for i of-type octet from 0 to 255 do (setf (trie-pointer trie-node i) bucket-number) finally (setf (key-range bucket) (values 0 255)) (hoist-exhaustible-keys bucket trie-node) (sync-block bt bucket bucket-number) (return (values (sync-block bt trie-node) pure-bucket-key-index)))) (defun split-pure-bucket (bt bucket parent-node) (multiple-value-bind (child-node key-index) (hybridize-bucket bt bucket) (progn (split-hybrid-bucket bt bucket child-node) (sync-block bt child-node)) (setf (trie-pointer parent-node key-index) (get-block-number bt child-node)))) (defun engulf-pointers (trie-node block-pointer key-index) (setf (trie-pointer trie-node key-index) block-pointer) (let ((lo-key-index key-index) (hi-key-index key-index)) (declare (type octet lo-key-index hi-key-index)) ;; Set adjacent null pointers (to the right). (loop while (< hi-key-index 255) while (zerop (trie-pointer trie-node (1+ hi-key-index))) do (setf (trie-pointer trie-node (incf hi-key-index)) block-pointer)) ;; Same thing, but to the left. (loop while (> lo-key-index 0) finally (return (values lo-key-index hi-key-index)) while (zerop (trie-pointer trie-node (1- lo-key-index))) do (setf (trie-pointer trie-node (decf lo-key-index)) block-pointer)))) (defun make-bucket (bt parent-node key-index) (let* ((bucket (allocate-block bt +bucket-tag+)) (number (get-block-number bt bucket))) (multiple-value-bind (lo-key-index hi-key-index) (engulf-pointers parent-node number key-index) (set-key-range bucket lo-key-index hi-key-index) (sync-block bt parent-node) (sync-block bt bucket)))) (defmacro %add-to-bucket (key value split-function-name) (let ((total-required-space `(cell-space ,key ,value))) `(cond ((> ,total-required-space (free-space next-block)) ;; Split the bucket and retry the insertion. (,split-function-name bt next-block block) ;; Update trie node. (sync-block bt block)) ;; There's space for KEY and VALUE. (t (allocate-cell next-block +data-cell+ ,key) (allocate-cell next-block +data-cell+ ,value) (sync-block bt next-block) (return next-block))))) (defun add-to-trie (bt block key value &optional (key-start 0)) (loop (multiple-value-bind (pointer key-index exhaust-pointer-p) (trie-node-lookup block key key-start) (let (next-block) (cond (exhaust-pointer-p (allocate-cell block +data-cell+ value (1+ key-index)) (return (sync-block bt block))) ((zerop pointer) (make-bucket bt block key-index)) ((pure-bucket-p (setq next-block (get-block bt pointer))) (let ((partially-consumed-key (subseq key (1+ key-start)))) (%add-to-bucket partially-consumed-key value split-pure-bucket))) ((hybrid-bucket-p next-block) (let ((partially-consumed-key (subseq key key-start))) (%add-to-bucket partially-consumed-key value split-hybrid-bucket))) ((trie-node-p next-block) (setq block next-block) (incf key-start)) ((error "Failed to acquire bucket: ~S, ~S in ~S." key value bt))))))) (defun fetch (key bt) (let ((root (get-block bt 0))) (find-in-trie bt root key))) (defun find-in-trie (bt block key &optional (key-start 0)) (loop (multiple-value-bind (pointer key-index exhaust-pointer-p) (trie-node-lookup block key key-start) (declare (ignore key-index)) (let (next-block) (cond ((zerop pointer) (return nil)) (exhaust-pointer-p (return (get-datum block pointer))) ((trie-node-p (setq next-block (get-block bt pointer))) (setq block next-block key-start (1+ key-start))) ((return (find-in-bucket next-block key key-start)))))))) (defun (setf fetch) (value key bt) (symbol-macrolet ((root (get-block bt 0))) (add-to-trie bt root key value) value))

Quick demo

A B-trie file, words.bt, has been created from a Linux words file, which contains 45,402 words (in S-expression format: see words.sxp). The B-trie maps each lower- and mixed-case word to its uppercase equivalent (by string-upcase). It's a silly test, but an easy one to run, since we can verify that each key/value pair is correctly stored in the B-trie.

? (open-bt "words.bt") #<BT "words.bt" (267 blocks) #x21005BB92D> ? (defvar *words* *) *WORDS*

This B-trie is 267 blocks—or 2.08 MB, about half of which is free space. To consult the dictionary we'll need a pair of functions to convert between octet vectors and ASCII strings:

(defun ascii-octets (string) (loop initially (unless string (return nil)) with length fixnum = (length string) with vector = (make-array length :element-type 'octet) for i fixnum below length for char of-type character = (char string i) for char-code of-type (integer 0 (#.char-code-limit)) = (char-code char) unless (typep char-code 'octet) do (error "~S isn't 8-bit ASCII." char) do (setf (aref vector i) char-code) finally (return vector))) (defun octets-ascii (vector) (loop initially (unless vector (return nil)) with length fixnum = (length vector) with string = (make-string length) for i fixnum from 0 below length for char-code = (aref vector i) for char = (code-char char-code) do (setf (char string i) char) finally (return string)))

Then we can see that silverware is correctly mapped to SILVERWARE:

? (fetch (ascii-octets "silverware") *words*) #(83 73 76 86 69 82 87 65 82 69) ? (octets-ascii *) "SILVERWARE"

Ideas for expansion

Compare with the venerable cdb, as well as extendible hashing, and VivaceGraph's linear hashing scheme (linear-hash.lisp).
Implement overflow blocks for large values. Also consider logical, variable-sized blocks instead of physical, fixed-size blocks.
How to implement transactional B-tries: write-ahead logging, shadow paging, or copy-on-write? No shortage of options here.

persistence