Open Benchmark

We don't just claim to be better. We prove it.

The first open, reproducible benchmark for sanctions screening APIs. 122 test cases across 15 categories. 900K+ real entities. Transparent results.

90.7% F1 Score
97.6% Recall
2 Missed Entities

Why these numbers matter

In sanctions screening, missing a sanctioned person (false negative) can result in OFAC fines up to $330,000 per violation. Flagging innocent people (false positive) wastes your compliance team's time and blocks legitimate customers.

Verifex achieves 97.6% recall: it found 83 of the 85 true positive test cases while screening against 900K+ entries, including 851K PEP records. Its 84.7% precision, enhanced by AI verification, minimizes the noise that overwhelms compliance teams.

Head-to-head comparison

Same dataset. Same test cases. Different results.

| Metric | Verifex | Industry Baseline |
| --- | --- | --- |
| F1 Score | 90.7% | 66.7% |
| Recall | 97.6% | 57.4% |
| Precision | 84.7% | 79.6% |
| False Negatives | 2 | 29 |
| Compliance Risk Score | 0.64 | 4.72 |
| Avg Response (cached) | 182ms | ~500ms |
| Entities Screened Against | 900K+ | ~50K |

Industry Baseline represents a leading API-first sanctions screening provider tested on the same dataset. A lower Compliance Risk Score is better: the score weights missed sanctions 10x more heavily than false alarms, reflecting real regulatory risk.

Accuracy by category

Tested across 15 categories of name matching challenges.

Exact Match
100%

"Vladimir Putin" → finds Putin across 6 sanctions lists

Spelling Variations
100%

"Vladmir Putin" (typo), "Kim Jung Un" → still finds the match

Arabic Transliteration
100%

"Qassem Suleimani" → matches "Qasem Soleimani" across romanization variants

Cyrillic Transliteration
100%

"Wladimir Putin" (German), "Poutine Vladimir" (French) → matches

Phonetic Matching
100%

"Kaddafi Muammar" → matches "Muammar Gaddafi" by sound

Word Order
100%

"Putin, Vladimir", "Jong Un Kim" → matches regardless of name order

Entity Names
92%

"Bank Melli Iran", "IRGC", "Hezbollah/Hizballah" → matches organization names

PEP Screening
92%

"Emmanuel Macron", "Xi Jinping", "Narendra Modi" → matched against 851K+ PEP records

Multi-Source
100%

"Alisher Usmanov" → matched across OFAC, EU, and UK simultaneously

Partial Names
100%

"Putin", "Soleimani", "Kadyrov" → surname-only matches work

Substring Traps
100%

"Putin Street Cafe", "Samsung Electronics" → correctly identified as NOT matches

Common Names
AI-Enhanced

With 851K PEPs, common names are flagged for review. AI reduces noise by up to 50%.

The matching pipeline

A 4-stage engine that combines traditional algorithms with AI verification.

1. Normalization

Strip diacritics, normalize Unicode, transliterate Arabic/Cyrillic/Greek. "Müller" → "Muller", "Владимир" → "Vladimir"
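The normalization step can be sketched in a few lines of Python. This is a minimal illustration, not the Verifex implementation: the function names are hypothetical, and the Cyrillic table is a deliberately tiny, hand-written mapping that covers only the letters needed for the examples above. A production system would need a full transliteration scheme for each script.

```python
import unicodedata

# Hypothetical, partial Cyrillic-to-Latin table (assumption: just enough
# letters for the examples in the text, not a complete scheme).
CYRILLIC_TO_LATIN = {
    "в": "v", "л": "l", "а": "a", "д": "d", "и": "i", "м": "m",
    "р": "r", "п": "p", "у": "u", "т": "t", "н": "n",
}


def transliterate_cyrillic(text: str) -> str:
    """Replace known Cyrillic letters with Latin ones, preserving case."""
    out = []
    for ch in text:
        latin = CYRILLIC_TO_LATIN.get(ch.lower())
        if latin is None:
            out.append(ch)  # leave unknown characters untouched
        elif ch.isupper():
            out.append(latin.capitalize())
        else:
            out.append(latin)
    return "".join(out)


def strip_diacritics(text: str) -> str:
    """Decompose characters (NFKD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_name(text: str) -> str:
    """Transliterate first, then strip diacritics from the result."""
    return strip_diacritics(transliterate_cyrillic(text))


print(normalize_name("Müller"))
print(normalize_name("Владимир"))
```

With this sketch, "Müller" becomes "Muller" and "Владимир" becomes "Vladimir", matching the two examples given for this stage.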

2. Multi-Algorithm Matching

Soft TF-IDF + Jaro-Winkler + Monge-Elkan + Double Metaphone. Each algorithm catches different types of name variations.
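To make two of these algorithms concrete, here is a textbook Jaro-Winkler similarity with a Monge-Elkan wrapper that uses it as the inner token similarity. It is a sketch under stated assumptions: the function names are illustrative, the 0.1 prefix scale and 4-character prefix cap are the conventional Jaro-Winkler defaults, and nothing here shows how Verifex actually combines or weights the four algorithms.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1.0 - base)


def monge_elkan(query: str, candidate: str) -> float:
    """Average, over query tokens, of the best match among candidate tokens."""
    q_tokens = query.lower().split()
    c_tokens = candidate.lower().split()
    if not q_tokens or not c_tokens:
        return 0.0
    best = [max(jaro_winkler(q, c) for c in c_tokens) for q in q_tokens]
    return sum(best) / len(best)
```

Jaro-Winkler tolerates typos such as "vladmir" for "vladimir", while Monge-Elkan is insensitive to token order, so "putin vladimir" scores 1.0 against "vladimir putin". Note that Monge-Elkan is asymmetric: swapping query and candidate can change the score.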

3. IDF Weighting

Common names ("Mohammed", "Trading") get lower weight. Rare names ("Soleimani", "Kadyrov") get higher weight.
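The effect of IDF weighting can be illustrated with a toy example. Everything here is an assumption for illustration: the six-name watchlist is made up, the smoothed IDF formula is one common variant, and the overlap score uses exact token matches only, so it is a simplification rather than the Soft TF-IDF named in stage 2.

```python
import math
from collections import Counter


def build_idf(names):
    """Smoothed inverse document frequency for every token in a name list."""
    doc_freq = Counter()
    for name in names:
        doc_freq.update(set(name.lower().split()))
    total = len(names)
    return {tok: math.log((1 + total) / (1 + df)) + 1.0
            for tok, df in doc_freq.items()}


def weighted_overlap(query, candidate, idf):
    """Share of the query's IDF weight carried by tokens the candidate also has."""
    default = max(idf.values())  # tokens never seen before count as rare
    q_tokens = set(query.lower().split())
    c_tokens = set(candidate.lower().split())
    total = sum(idf.get(t, default) for t in q_tokens)
    shared = sum(idf.get(t, default) for t in q_tokens & c_tokens)
    return shared / total if total else 0.0


# Hypothetical toy watchlist, not real benchmark data.
WATCHLIST = [
    "Mohammed Ali Trading",
    "Mohammed Hassan",
    "Mohammed Trading Company",
    "Gulf Trading",
    "Qasem Soleimani",
    "Ramzan Kadyrov",
]

idf = build_idf(WATCHLIST)
rare_hit = weighted_overlap("Mohammed Soleimani", "Qasem Soleimani", idf)
common_hit = weighted_overlap("Mohammed Soleimani", "Mohammed Hassan", idf)
print(rare_hit > common_hit)
```

In this toy list, "mohammed" appears in three names and "soleimani" in one, so sharing "Soleimani" contributes more to the score than sharing "Mohammed".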

4. AI Verification

Ambiguous matches are verified by an AI language model. This reduces false positives by up to 50% on hard cases.
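One plausible way to route only ambiguous matches to a verifier is a two-threshold triage, sketched below. This is a guess at the general shape, not a description of Verifex internals: the function name, the 0.95 and 0.70 cut-offs, and the verifier callback signature are all hypothetical, and the verifier is a stand-in for whatever model call a real system would make.

```python
from typing import Callable


def screen_candidate(
    query: str,
    candidate: str,
    score: float,
    verifier: Callable[[str, str], bool],
    accept_at: float = 0.95,     # hypothetical threshold
    reject_below: float = 0.70,  # hypothetical threshold
) -> str:
    """Decide clear cases by score; ask the verifier only about the middle band."""
    if score >= accept_at:
        return "match"
    if score < reject_below:
        return "no_match"
    return "match" if verifier(query, candidate) else "no_match"


def always_reject(query: str, candidate: str) -> bool:
    """Stub verifier used for illustration; a real one would call a model."""
    return False


print(screen_candidate("Putin Street Cafe", "Vladimir Putin", 0.80, always_reject))
```

The design point is cost: clear matches and clear non-matches never reach the slower verification step, so it runs only where the traditional score is inconclusive.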

Methodology

Our benchmark suite contains 122 test cases across 15 categories, run against 900K+ real entities including 851K PEP records:

  • 85 true positive cases — real sanctioned entities and PEPs from OFAC, UN, EU, UK, Australia, Canada, Switzerland, and the Wikidata PEP database, tested with spelling variations, transliterations, phonetic variants, word order changes, and partial names.
  • 37 true negative cases — fictional names, generic businesses, substring traps, and adversarial inputs that should NOT trigger a match.

Every test case has a predetermined expected result. We measure:

  • Recall (sensitivity) — what percentage of sanctioned entities did we find?
  • Precision — of the entities we flagged, how many were actually sanctioned?
  • F1 Score — the harmonic mean of precision and recall.
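As a worked check of these three definitions, the sketch below recomputes the headline figures from confusion-matrix counts. The counts are partly inferred: 85 true positive cases with 2 missed gives 83 true positives, while the false-positive count of 15 is an assumption back-solved from the reported 84.7% precision, not a published number.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Return (precision, recall, F1) from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# tp and fn follow from the stated 85 positives and 2 misses;
# fp=15 is inferred from the reported precision (assumption).
precision, recall, f1 = precision_recall_f1(tp=83, fp=15, fn=2)
print(f"precision={precision:.1%} recall={recall:.1%} f1={f1:.1%}")
```

Under these counts the function reproduces the reported 84.7% precision, 97.6% recall, and 90.7% F1.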

The Compliance Risk Score is our custom metric: (False Negative Rate × 10) + (False Positive Rate × 1). It weights missed sanctions 10x higher than false alarms, reflecting the asymmetric risk in compliance: missing a match means regulatory penalties, while a false positive means a brief manual review.
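The formula can be checked against the reported Verifex score with a short calculation. The sketch assumes the rates are plain fractions, with false negative rate = FN / (FN + TP) and false positive rate = FP / (FP + TN). The counts are partly inferred: 2 misses out of 85 positives is stated, but 15 false positives out of 37 negatives is an assumption derived from the reported precision.

```python
def compliance_risk_score(
    tp: int, fp: int, fn: int, tn: int,
    fn_weight: float = 10.0, fp_weight: float = 1.0,
) -> float:
    """Weighted sum of false negative rate and false positive rate."""
    fn_rate = fn / (fn + tp) if fn + tp else 0.0
    fp_rate = fp / (fp + tn) if fp + tn else 0.0
    return fn_weight * fn_rate + fp_weight * fp_rate


# fp=15 and tn=22 are inferred values (assumption), chosen to be
# consistent with 37 true negative cases and 84.7% precision.
score = compliance_risk_score(tp=83, fp=15, fn=2, tn=22)
print(round(score, 2))
```

With these counts the score comes to about 0.64, which agrees with the value in the comparison table. The missed-sanctions term contributes roughly 0.24 and the false-alarm term roughly 0.41.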

See for yourself.

Run our benchmark against your current provider. The dataset is open source.