From Chaos to Clarity: Building a Name-Matching System That Processes 1.5 Million Records in 10 Minutes
How lean product thinking and creative integration overcame a national data bottleneck
Every product manager eventually faces this moment: You're handed an "impossible" problem with legacy constraints, no budget for modern tools, and stakeholders who need results yesterday. Mine came in the form of a deceptively simple question: "Can you help us identify military trainees across our federal casework systems?"
Simple question. Crushing reality.
The investigators were facing a daily tsunami: every morning brought new training rosters and new case assignments. The daily 2-4 hour manual matching process was so overwhelming that field offices had started making an impossible choice: skip the matching entirely and hope nothing critical fell through the cracks. When your daily workload requires 4 hours of manual work before you can even start your real job, something breaks. Usually, it's the process. Sometimes, it's the people.
The Perfect Storm of Complexity
Before writing a single line of code, I embedded myself with the field office. What I discovered was a masterclass in system incompatibility:
The Daily Workflow From Hell:
Morning: Receive training rosters from multiple military installations
Problem 1: Rosters only contained names and DoD ID numbers - no SSNs
Problem 2: The legacy federal mainframe predated DoD IDs - it only knew SSNs
Problem 3: No common identifier meant matching could only happen by name
Problem 4: Military names (nicknames, variations, dropped middle names, transcription errors) rarely matched perfectly
Result: 2-4 hours of manual searching, guessing, and hoping
By noon, investigators were either behind on their real work or had skipped the matching entirely. Both options meant risk.
The human cost was staggering: brilliant investigators were burning out on data entry while critical investigations sat untouched. They rotated the responsibility and sometimes gave up entirely; the backlog had become effectively infinite.
Understanding the Real Problem
This wasn't just a data problem. It was an identifier architecture failure spanning two massive bureaucracies that would never align their systems. The DoD wasn't going to add SSNs to training rosters (privacy concerns led DoD to begin phasing out SSNs in 2008). The federal mainframe wasn't getting upgraded to understand DoD IDs (millions in costs, years of work).
Most consultants would have recommended a multi-million dollar integration project. A new middleware layer. Years of requirements gathering. But investigators needed help tomorrow, not in three years.
Building SENTRY: Embracing the Chaos
I named the solution SENTRY - partly for morale (clear identity creates ownership), partly as a promise (a sentry never stops watching).
The Technical Approach: Making Names Work When Numbers Won't
Challenge 1: The Identifier Gap
With no common ID between systems, names became our only bridge. But matching "Bob Smith" to "Robert Smith Jr." to "SMITH, ROBERT NMI" isn't trivial at scale. I implemented a multi-layer approach:
Levenshtein distance for fuzzy matching
Military name pattern recognition (last-name-first, rank prefixes, suffix handling)
Confidence scoring based on name uniqueness and additional context
Parallel matching strategies that could catch edge cases
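A minimal sketch of how those layers can fit together. The nickname table, rank and suffix lists, and score weights below are illustrative placeholders, not SENTRY's actual rules:

```python
import re

# Illustrative lookup tables -- a real system would carry far more entries.
NICKNAMES = {"bob": "robert", "bill": "william", "jim": "james", "liz": "elizabeth"}
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}
RANKS = {"pvt", "pfc", "spc", "cpl", "sgt", "ssg", "lt", "cpt"}
NOISE = SUFFIXES | RANKS | {"nmi"}  # "NMI" = no middle initial

def normalize(name: str) -> tuple[str, str]:
    """Reduce a roster or mainframe name to a (last, first) pair."""
    name = re.sub(r"[^a-z, ]", " ", name.lower())
    if "," in name:                              # "SMITH, ROBERT NMI"
        last, _, rest = name.partition(",")
        first_tokens = [t for t in rest.split() if t not in NOISE]
    else:                                        # "Robert Smith Jr."
        tokens = [t for t in name.split() if t not in NOISE]
        if not tokens:
            return "", ""
        last, first_tokens = tokens[-1], tokens[:-1]
    first = first_tokens[0] if first_tokens else ""
    return last.strip(), NICKNAMES.get(first, first)  # "bob" -> "robert"

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def similarity(a: str, b: str) -> float:
    return 1.0 if a == b else 1 - levenshtein(a, b) / max(len(a), len(b), 1)

def match_score(roster_name: str, mainframe_name: str) -> float:
    """Weighted name score; last names count more than first names."""
    (l1, f1), (l2, f2) = normalize(roster_name), normalize(mainframe_name)
    return 0.6 * similarity(l1, l2) + 0.4 * similarity(f1, f2)
```

With this, "Bob Smith" and "SMITH, ROBERT NMI" normalize to the same pair and score a perfect match despite sharing no common formatting.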
Challenge 2: The Mainframe Wall
The IBM mainframe was designed for human operators in the 1980s. No API. No exports. Just green screens and function keys. Using VBA and EHLLAPI (IBM's screen-scraping API for terminal emulators), I built an automation layer that could:
Navigate screens like a human operator
Submit queries at machine speed
Parse responses from fixed-position text screens
Handle mainframe timeouts and session management
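EHLLAPI itself is consumed from VBA or C, but the fixed-position parsing step translates to any language. Here is a hedged Python sketch of the idea: the 24x80 geometry matches classic 3270 screens, while the field coordinates are invented for illustration:

```python
# A 3270 presentation space arrives as one flat character buffer.
ROWS, COLS = 24, 80

# (row, col, length), 1-indexed the way green-screen layouts are documented.
# These coordinates are illustrative, not from a real mainframe screen.
FIELDS = {
    "name":   (3, 10, 26),
    "ssn":    (5, 10, 11),
    "status": (7, 10, 8),
}

def parse_screen(buffer: str) -> dict[str, str]:
    """Extract named fields from a ROWS x COLS presentation-space snapshot."""
    if len(buffer) != ROWS * COLS:
        raise ValueError("unexpected screen size")
    record = {}
    for field, (row, col, length) in FIELDS.items():
        start = (row - 1) * COLS + (col - 1)
        record[field] = buffer[start : start + length].strip()
    return record
```

Every automated query followed the same rhythm: send keystrokes, wait for the keyboard to unlock, snapshot the presentation space, and run it through a parser like this.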
Challenge 3: Context as a Validator
Names alone weren't enough. I integrated:
Training graduation timelines (when someone should appear in the system, when they would be available for interviews)
Geographic installation data (where they should be and where the case should be reassigned to)
Historical patterns (how long between training begin/end and case assignment)
This context turned probable matches into confident, actionable identifications.
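That promotion/demotion logic can be sketched as a small scoring function. The windows and weight adjustments here are invented for illustration; the real calibration came from the historical patterns above:

```python
from datetime import date

def contextual_confidence(name_score: float,
                          graduation: date,
                          case_opened: date,
                          roster_installation: str,
                          case_installation: str) -> float:
    """Promote or demote a raw name score using timeline and geography."""
    score = name_score
    lag_days = (case_opened - graduation).days
    if 0 <= lag_days <= 30:            # case opened inside the expected window
        score += 0.15
    elif lag_days < 0:                 # case predates graduation: demote
        score -= 0.25
    if roster_installation == case_installation:
        score += 0.10                  # geography agrees
    return max(0.0, min(1.0, score))   # clamp to [0, 1]
```

The same name score lands in very different buckets depending on whether the dates and installations line up, which is exactly how a strong-but-not-perfect name match becomes actionable.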
The Product Philosophy: Start Where They Are
Great product design starts with empathy for what already exists, not what we wish existed. What made SENTRY successful wasn't the algorithm; it was accepting reality:
Work with what exists: Don't wait for perfect data
Make failure graceful: Show confidence scores, not just matches
Build trust incrementally: Start with high-confidence matches, expand from success
Design for exhaustion: When users are already overwhelmed, every click matters
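The "build trust incrementally" principle can be made concrete as a triage rule over the confidence score: only the highest tier is surfaced as a match, and everything ambiguous goes to a short human queue. The thresholds here are invented for illustration:

```python
def triage(score: float) -> str:
    """Route a scored candidate pair into an action bucket."""
    if score >= 0.90:
        return "auto-match"   # presented to investigators as a confirmed hit
    if score >= 0.70:
        return "review"       # a short queue beats hours of blind searching
    return "discard"          # never shown; no added clicks for tired users
```

Start with the auto-match tier alone; as users verify its accuracy, widen the thresholds.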
The Compound Effect of Daily Automation
The metrics tell the tactical story:
Processing time: 2-4 hours daily -> 10 minutes daily
Coverage: Sporadic (when time allowed) -> Consistent (every single day)
Scale: Thousands of manual searches -> 1.5 million+ automated comparisons
Discovery: Found subjects that had been missed for months
But the strategic impact was transformational:
Investigative capacity returned: 15-20 hours per week per office reclaimed
Backlogs eliminated: Offices could finally catch up on cases that had been languishing
Morale restored: Investigators were investigating, not scanning multiple spreadsheets
Compliance achieved: Daily processing meant nothing fell through the cracks
One office told me: "For the first time in two years, we're actually caught up."
Lessons for Modern Product Leadership
Building SENTRY taught me principles I apply to every AI/ML product today:
1. The worst integration is the one between systems that will never talk
Don't wait for organizational alignment. Build bridges where they are, not where they should be.
2. Daily friction compounds exponentially
A weekly 4-hour task is annoying. A daily 4-hour task is unsustainable. Identify where frequency multiplies pain.
3. Imperfect automation beats perfect planning
SENTRY wasn't elegant. But version 1.0 shipped in less than 40 hours, not years, and started delivering value on day one.
4. Trust is earned in production, not PowerPoint
We started with a subset of high-confidence matches. Success bred permission to expand.
5. The identifier problem is everywhere
From customer data platforms to AI training sets, the challenge of matching entities without common keys is universal. Solve it once, apply it everywhere.
From SENTRY to Strategic AI
SENTRY was my baptism in intelligent automation. It proved that with pragmatic product thinking, you can deliver transformative outcomes without transformative budgets. Today, when I architect RAG systems or design multi-agent workflows, I still ask the SENTRY questions:
What manual process is breaking people?
What systems will never naturally integrate?
How can we deliver value tomorrow, not next year?
What's the minimum automation that creates maximum relief?
The tools have evolved: vector embeddings, LLMs, semantic search. But the core challenge remains: How do we build systems that give humans their time back?
A Call for Pragmatic Innovation
Somewhere right now, an analyst is manually copying names between systems that don't talk. A nurse is re-entering data that exists somewhere else. A field office is choosing between bad options because the good option takes too long.
These aren't just technology problems. They're product problems. And they're solvable. Not with million-dollar platforms, but with creative integration, fast iteration, and a willingness to embrace imperfection. The most strategic thing you can do isn't always building the newest tech. Sometimes it's making the old tech dance together just well enough that people can get back to work.
That's where real transformation lives: in the 2-4 hours you give back, every single day.