A PDF engine built for the case file

Whether an attorney is building an appeal or a judge is deciding one, the work is the same: read the record. And the law requires reading all of it. A Board decision must rest on “the entire record … and upon consideration of all evidence and material of record” (38 U.S.C. § 7104(a)), and that record can run to thousands of scanned, often decades-old pages. A few seconds of friction on a single page is nothing by itself. Multiplied across every page, all day, by hundreds of reviewers, it becomes hours of lost time and decisions that wait.

Meeting that bar (every page, faithfully, fast, and at scale) is not what a general-purpose PDF library is built for. Such a library does a little of everything a PDF allows: render, edit, fill and compute forms, run embedded scripts, sign, and save. Legal review needs just one of those, viewing, done exceptionally well: showing the filed document faithfully, even when the file would defeat an ordinary viewer.

So over the past five years we took an open-source engine and refactored it into a read-only engine for legal review. We removed everything the work doesn't touch and rebuilt what it does touch around the realities of a case file: thousands of pages, mostly scanned, much of it decades old and damaged, read for hours at a time.

A reviewer feels the result in every case: the record opens fast and stays fast, so the hours go to reviewing, analyzing, and deciding, not to waiting on a file to load. That is what the engineering translates to.

We made it read-only, on purpose

The first and largest change was a subtraction. A general-purpose PDF library carries code to edit documents, fill and compute form fields, and write changes back to a file. A legal-review engine should do none of that, so we removed it — more than 160 modules, handlers, and code paths in all.

This isn’t mainly about a leaner engine, though it is leaner. It’s about the fact that a document under review is evidence. The software that displays it has no need, and shouldn’t have the ability, to change it. A read-only engine cannot alter the record; what’s left does one job, rendering the filed document faithfully, with far less that can go wrong. (It also starts a little faster, with less to load, but that’s the smallest of the reasons.)

It’s fast, even on the biggest files

A case file is the hardest thing you can hand a PDF reader — thousands of scanned pages, often gigabytes, re-saved across decades. That’s exactly where an ordinary reader bogs down: slow to open, slow to jump deep into, and apt to grind to a halt a few hours into a session. Ours doesn’t. A jump deep into a 16,000-page record lands in seconds, and hour eight is as quick as hour one.

Getting there meant rebuilding the two parts that buckle on a file this size: how the engine loads a document, and how it holds memory over a long session. The first lets a giant record open without the wait; the second keeps it responsive from the first hour to the eighth.
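How the engine manages memory isn't spelled out here. One common technique for keeping memory flat across a long session, offered only as an illustrative assumption and not as the engine's actual design, is to render pages lazily when they are requested and keep a byte-bounded, least-recently-used (LRU) cache of rendered pages. `PageCache`, `render_page`, and the byte budget below are hypothetical names chosen for this sketch.

```python
from collections import OrderedDict
from typing import Callable


class PageCache:
    """Byte-bounded LRU cache of rendered pages (hypothetical sketch).

    Pages are rendered only when requested, and the least recently used
    pages are evicted once the total cached size exceeds the budget, so
    memory stays bounded no matter how many pages a session visits.
    """

    def __init__(self, render_page: Callable[[int], bytes], max_bytes: int) -> None:
        self._render = render_page   # hypothetical page renderer
        self._max_bytes = max_bytes  # memory budget for cached pages
        self._pages: "OrderedDict[int, bytes]" = OrderedDict()
        self._used = 0

    def get(self, page_no: int) -> bytes:
        if page_no in self._pages:
            # Cache hit: mark the page as most recently used.
            self._pages.move_to_end(page_no)
            return self._pages[page_no]
        # Cache miss: render lazily, store, then enforce the budget.
        data = self._render(page_no)
        self._pages[page_no] = data
        self._used += len(data)
        self._evict()
        return data

    def _evict(self) -> None:
        # Drop least recently used pages until back under budget,
        # but always keep the page that was just requested.
        while self._used > self._max_bytes and len(self._pages) > 1:
            _, old = self._pages.popitem(last=False)
            self._used -= len(old)

    @property
    def used_bytes(self) -> int:
        return self._used

    def __len__(self) -> int:
        return len(self._pages)


if __name__ == "__main__":
    def fake_render(page_no: int) -> bytes:
        return bytes(1_000)  # pretend each rendered page costs 1 KB

    cache = PageCache(fake_render, max_bytes=3_000)
    for page in range(1, 16_001):  # walk a 16,000-page record
        cache.get(page)
    print(len(cache), cache.used_bytes)  # 3 3000
```

The demo visits all 16,000 pages yet never holds more than three rendered pages at once. Bounding by bytes rather than by page count is the deliberate choice here, since scanned pages vary widely in size; a production engine likely has other structures (fonts, images, parsed objects) to bound as well, which this sketch ignores.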

We made search find what’s actually on the page

The one fact that decides a case may sit in a single filled-in box on a form: an examiner’s opinion, a diagnosis date, a signature. An ordinary reader reads the form’s printed text and skips the filled-in value, so a search for it comes back empty even though it’s right there on screen. We pull form-field values into the search index too.
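As a rough illustration only, the sketch below models a page as its printed text plus a list of filled-in fields and shows the difference between indexing the printed text alone and also indexing field values. In a real PDF, a field's value lives in its AcroForm field dictionary (the `/V` entry), separate from the page's content stream, which is why text extraction that reads only the content stream misses it; parsing those dictionaries is out of scope here. `Page`, `FormField`, `build_index`, and `search` are hypothetical names, not the engine's API, and `search` is a simple word-level AND match rather than phrase search.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class FormField:
    name: str   # the field's internal name
    value: str  # the filled-in value a reviewer sees on screen


@dataclass
class Page:
    number: int
    printed_text: str  # text drawn in the page's own content
    fields: list[FormField] = field(default_factory=list)


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def build_index(pages: list[Page], include_fields: bool = True) -> dict[str, set[int]]:
    """Map each lowercase word to the set of page numbers containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for page in pages:
        text = page.printed_text
        if include_fields:
            # The step a text-only extractor skips: index filled-in values too.
            text += " " + " ".join(f.value for f in page.fields)
        for word in tokenize(text):
            index[word].add(page.number)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Return pages containing every word of the query (AND search)."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for w in words[1:]:
        result &= index.get(w, set())
    return result


if __name__ == "__main__":
    page = Page(7, "Diagnosis date: ____  Examiner opinion: ____",
                [FormField("dx_date", "2004-03-15"),
                 FormField("opinion", "at least as likely as not")])
    print(search(build_index([page], include_fields=False), "2004-03-15"))  # set()
    print(search(build_index([page]), "2004-03-15"))                        # {7}
```

The only change between the two calls is whether field values go into the index; the printed labels are found either way, but the date a reviewer actually sees on screen is found only in the second case.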

Scanned text breaks in a different way: it’s often drawn on the page a fragment at a time, so a stock text extractor turns “Department of Veterans Affairs” into “D epartment of V eterans A ffairs,” and a clean search matches none of it. We extract text the way the page is actually drawn — reconstructing the real word breaks — so what you search is what you see.
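A common way to reconstruct word breaks is from geometry: compare the gap between one drawn fragment's right edge and the next fragment's left edge against the font size, and insert a space only when the gap is wide enough to be a real break. The sketch below assumes that heuristic; it is not necessarily the method the engine uses. `Fragment`, `naive_join`, `join_by_geometry`, and the `gap_ratio` threshold of 0.25 are hypothetical, and the sketch handles only a single horizontal line (no rotation, kerning adjustments, or line wrapping).

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    text: str         # characters drawn by one text-showing operation
    x: float          # left edge on the page, in points
    width: float      # advance width of the drawn text, in points
    font_size: float  # in points


def naive_join(frags: list[Fragment]) -> str:
    # Treats every fragment boundary as a word break, in drawing order.
    return " ".join(f.text for f in frags)


def join_by_geometry(frags: list[Fragment], gap_ratio: float = 0.25) -> str:
    """Insert a space only where the visible gap looks like a real word break.

    A gap wider than gap_ratio * font_size between one fragment's right edge
    and the next fragment's left edge becomes a space; anything smaller is
    treated as part of the same word.
    """
    if not frags:
        return ""
    ordered = sorted(frags, key=lambda f: f.x)  # assumes a single line of text
    out = [ordered[0].text]
    for prev, cur in zip(ordered, ordered[1:]):
        gap = cur.x - (prev.x + prev.width)
        if gap > gap_ratio * max(prev.font_size, cur.font_size):
            out.append(" ")
        out.append(cur.text)
    return "".join(out)


if __name__ == "__main__":
    # "Department of Veterans Affairs" drawn a fragment at a time, 10 pt font.
    line = [
        Fragment("D", 0, 8, 10), Fragment("epartment", 8, 50, 10),
        Fragment("of", 61, 10, 10), Fragment("V", 74, 7, 10),
        Fragment("eterans", 81, 38, 10), Fragment("A", 122, 7, 10),
        Fragment("ffairs", 129, 28, 10),
    ]
    print(naive_join(line))        # D epartment of V eterans A ffairs
    print(join_by_geometry(line))  # Department of Veterans Affairs
```

In the sample, fragments inside a word touch (a gap of 0 points) while real word breaks leave a 3-point gap, above the 2.5-point threshold for a 10-point font. Tying the threshold to font size rather than a fixed distance keeps the rule usable across headings and small print.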

Why this is the hard part

None of these changes came from a roadmap. Each came from a real file that broke something — the 3-gigabyte master, the pages whose fonts were unreadable, the form whose answer search couldn’t find. And because the engine ships to every deployment, a fix for one operation’s worst file benefits all of them.

Writing custom PDF code is ordinary; most document tools are built on one engine or another. Refactoring one down to read-only legal review, optimizing it for that exact work, and proving it in production across more than 20,000 real case files and across different ways of getting a document to the screen is not. The files that stop other software are ordinary here.

Proven on a real caseload. Built for any operation.

And this is only the beginning. What runs today is fast; what’s coming is faster still. More soon.