We don't just claim to be better. We prove it.
The first open, reproducible benchmark for sanctions screening APIs. 122 test cases across 15 categories. 900K+ real entities. Transparent results.
Why these numbers matter
In sanctions screening, missing a sanctioned person (false negative) can result in OFAC fines up to $330,000 per violation. Flagging innocent people (false positive) wastes your compliance team's time and blocks legitimate customers.
Verifex achieves 97.6% recall — meaning it catches virtually every sanctioned entity across 900K+ entries including 851K PEP records. With an 84.7% precision rate enhanced by AI verification, it minimizes the noise that overwhelms compliance teams.
Head-to-head comparison
Same dataset. Same test cases. Different results.
| Metric | Verifex | Industry Baseline |
|---|---|---|
| F1 Score | 90.7% | 66.7% |
| Recall | 97.6% | 57.4% |
| Precision | 84.7% | 79.6% |
| False Negatives | 2 | 29 |
| Compliance Risk Score | 0.64 | 4.72 |
| Avg Response (cached) | 182ms | ~500ms |
| Entities Screened Against | 900K+ | ~50K |
Industry Baseline represents a leading API-first sanctions screening provider tested on the same dataset. Lower Compliance Risk Score is better — it weights missed sanctions 10x more than false alarms, reflecting real regulatory risk.
Accuracy by category
Tested across 15 categories of name matching challenges.
"Vladimir Putin" → finds Putin across 6 sanctions lists
"Vladmir Putin" (typo), "Kim Jung Un" → still finds the match
"Qassem Suleimani" → matches "Qasem Soleimani" across romanization variants
"Wladimir Putin" (German), "Poutine Vladimir" (French) → matches
"Kaddafi Muammar" → matches "Muammar Gaddafi" by sound
"Putin, Vladimir", "Jong Un Kim" → matches regardless of name order
"Bank Melli Iran", "IRGC", "Hezbollah/Hizballah" → matches organization names
"Emmanuel Macron", "Xi Jinping", "Narendra Modi" → matches 851K+ PEP records
"Alisher Usmanov" → matched across OFAC, EU, and UK simultaneously
"Putin", "Soleimani", "Kadyrov" → surname-only matches work
"Putin Street Cafe", "Samsung Electronics" → correctly identified as NOT matches
With 851K PEP records in scope, common names are inevitably flagged for review; AI verification cuts that noise by roughly 50%.
The matching pipeline
A four-stage engine that combines traditional string-matching algorithms with AI verification.
Normalization
Strip diacritics, normalize Unicode, transliterate Arabic/Cyrillic/Greek. "Müller" → "Muller", "Владимир" → "Vladimir"
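A minimal sketch of this stage in Python, using only the standard library's `unicodedata`. The tiny Cyrillic map below is an illustrative assumption, not the production transliteration table (real systems use full tables such as ICU's):

```python
import unicodedata

# Illustrative Cyrillic-to-Latin map -- just enough letters for the
# "Владимир" example. A real transliteration table is far larger.
CYRILLIC = {"В": "V", "л": "l", "а": "a", "д": "d", "и": "i", "м": "m", "р": "r"}

def normalize(name: str) -> str:
    # Transliterate known Cyrillic letters first.
    name = "".join(CYRILLIC.get(ch, ch) for ch in name)
    # NFKD decomposition splits "ü" into "u" + combining diaeresis;
    # dropping the combining marks strips the diacritics.
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```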
Multi-Algorithm Matching
Soft TF-IDF + Jaro-Winkler + Monge-Elkan + Double Metaphone. Each algorithm catches different types of name variations.
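To make the combination concrete, here is a self-contained Jaro-Winkler implementation plus a Monge-Elkan wrapper over it. Monge-Elkan works token by token, which is what lets word-order changes like "Putin, Vladimir" still score highly. Soft TF-IDF and Double Metaphone are omitted for brevity, and the exact ensemble weighting Verifex uses is not shown here:

```python
def jaro(s1: str, s2: str) -> float:
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if not len1 or not len2:
        return 0.0
    # Characters match if equal and within half the longer string's length.
    window = max(len1, len2) // 2 - 1
    m1, m2 = [False] * len1, [False] * len2
    matches = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not m2[j] and s2[j] == c:
                m1[i] = m2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count transpositions among matched characters.
    t, k = 0, 0
    for i in range(len1):
        if m1[i]:
            while not m2[k]:
                k += 1
            if s1[i] != s2[k]:
                t += 1
            k += 1
    t //= 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3

def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    # Boost scores for strings sharing a common prefix (up to 4 chars).
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == 4:
            break
        prefix += 1
    return j + prefix * p * (1 - j)

def monge_elkan(name1: str, name2: str, sim=jaro_winkler) -> float:
    # For each token of name1, take its best match in name2, then average.
    t1, t2 = name1.split(), name2.split()
    return sum(max(sim(a, b) for b in t2) for a in t1) / len(t1)
```

Because Monge-Elkan pairs each token with its best counterpart, `monge_elkan("putin vladimir", "vladimir putin")` scores a perfect 1.0 despite the reversed order.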
IDF Weighting
Common names ("Mohammed", "Trading") get lower weight. Rare names ("Soleimani", "Kadyrov") get higher weight.
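The idea can be sketched as plain log-IDF computed over the screening list's own name tokens. The four-entry list below is a made-up toy corpus, not real sanctions data:

```python
import math

def idf_weights(entity_names: list[str]) -> dict[str, float]:
    # Document frequency: in how many list entries does each token appear?
    df: dict[str, int] = {}
    for name in entity_names:
        for token in set(name.lower().split()):
            df[token] = df.get(token, 0) + 1
    n = len(entity_names)
    # Rare tokens (low document frequency) get high weight;
    # ubiquitous tokens like "mohammed" or "trading" get low weight.
    return {token: math.log(n / count) for token, count in df.items()}

# Toy screening list (illustrative only):
weights = idf_weights([
    "mohammed trading co",
    "mohammed ali",
    "qasem soleimani",
    "mohammed hassan",
])
```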
AI Verification
Ambiguous matches verified by AI language model. Reduces false positives by up to 50% on hard cases.
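One common way to gate an expensive LLM call is a score band: clear hits and clear misses are decided algorithmically, and only the ambiguous middle is sent for verification. The band boundaries below are illustrative assumptions, not Verifex's actual thresholds:

```python
def needs_ai_review(score: float, lower: float = 0.75, upper: float = 0.92) -> bool:
    # Below `lower`: confident non-match. At or above `upper`: confident match.
    # Only the ambiguous band in between goes to the LLM verifier.
    return lower <= score < upper

def screen(score: float) -> str:
    if score >= 0.92:
        return "match"
    if needs_ai_review(score):
        return "ai_review"  # hypothetical LLM verification step
    return "no_match"
```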
Methodology
Our benchmark suite contains 122 test cases across 15 categories, run against 900K+ real entities including 851K PEP records:
- 85 true positive cases — real sanctioned entities and PEPs from OFAC, UN, EU, UK, Australia, Canada, Switzerland, and Wikidata PEP database, tested with spelling variations, transliterations, phonetic variants, word order changes, and partial names.
- 37 true negative cases — fictional names, generic businesses, substring traps, and adversarial inputs that should NOT trigger a match.
Every test case has a predetermined expected result. We measure:
- Recall (sensitivity) — what percentage of sanctioned entities did we find?
- Precision — of the entities we flagged, how many were actually sanctioned?
- F1 Score — the harmonic mean of precision and recall.
The Compliance Risk Score is our custom metric: (False Negative Rate × 10) + (False Positive Rate × 1). It weights missed sanctions 10x higher than false alarms, reflecting the asymmetric risk in compliance: missing a match means regulatory penalties, while a false positive means a brief manual review.
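The reported numbers check out from the benchmark's own counts: 85 positive cases with 2 misses gives 97.6% recall, and roughly 15 false positives among the 37 negative cases (inferred here from the reported 84.7% precision with 83 true positives; the exact count is an assumption) yields a risk score of 0.64:

```python
def compliance_risk_score(fn: int, fp: int, positives: int, negatives: int) -> float:
    # Missed sanctions (false negatives) carry 10x the weight of false alarms.
    return (fn / positives) * 10 + (fp / negatives) * 1

# Benchmark counts: 85 positive cases, 2 missed; 37 negative cases,
# ~15 false positives (inferred from 84.7% precision with 83 true positives).
score = compliance_risk_score(fn=2, fp=15, positives=85, negatives=37)

recall = 83 / 85            # 0.976 -> the reported 97.6%
precision = 83 / (83 + 15)  # 0.847 -> the reported 84.7%
f1 = 2 * precision * recall / (precision + recall)  # 0.907 -> 90.7%
```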