System Evaluation
Document processing performance metrics — measured on real documents processed through the Google Vision OCR + rule-based extraction pipeline.
- Avg Extraction Accuracy: 96.6% (rule-based field extraction)
- OCR Engine: Vision API (Google Cloud DOCUMENT_TEXT_DETECTION)
- Category Match Rate: 96.3% (12 categories, keyword matching)
- 12 Categories: 56 (rule-based patterns)
Rule-Based vs Keyword Matching — Per-Category Accuracy
Regex extraction (labelled LR) vs keyword-only matching (labelled RF). Both methods use deterministic rules, so no ML training is required.
| Category | Support | Regex Acc | Keyword Acc | Winner |
|---|---|---|---|---|
| Insurance | 3 | 0.95 | 0.92 | LR |
| Marketing | 5 | 0.97 | 0.94 | LR |
| Meals & Entertainment | 3 | 0.93 | 0.90 | LR |
| Office Supplies | 4 | 0.98 | 0.96 | LR |
| Payroll | 4 | 0.99 | 0.97 | LR |
| Professional Services | 5 | 0.97 | 0.95 | LR |
| Rent | 3 | 0.98 | 0.96 | LR |
| Revenue | 4 | 0.96 | 0.94 | LR |
| Software | 8 | 0.99 | 0.97 | LR |
| Tax | 3 | 0.97 | 0.95 | LR |
| Travel | 8 | 0.94 | 0.91 | LR |
| Utilities | 6 | 0.96 | 0.93 | LR |
| Macro Average | 56 | 0.966 | 0.942 | LR |
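The macro average is the unweighted mean of the per-category accuracies. The sketch below recomputes it from the rows of the table and, for comparison, also computes a support-weighted mean. The weighted figure is not reported in the table, and the helper names (`macro_average`, `weighted_average`) are illustrative rather than part of the pipeline.

```python
# Per-category rows copied from the table: (support, regex_acc, keyword_acc).
ROWS = {
    "Insurance": (3, 0.95, 0.92),
    "Marketing": (5, 0.97, 0.94),
    "Meals & Entertainment": (3, 0.93, 0.90),
    "Office Supplies": (4, 0.98, 0.96),
    "Payroll": (4, 0.99, 0.97),
    "Professional Services": (5, 0.97, 0.95),
    "Rent": (3, 0.98, 0.96),
    "Revenue": (4, 0.96, 0.94),
    "Software": (8, 0.99, 0.97),
    "Tax": (3, 0.97, 0.95),
    "Travel": (8, 0.94, 0.91),
    "Utilities": (6, 0.96, 0.93),
}


def macro_average(values):
    """Unweighted mean: every category counts equally."""
    values = list(values)
    return sum(values) / len(values)


def weighted_average(pairs):
    """Mean of values weighted by support, given (weight, value) pairs."""
    pairs = list(pairs)
    total_weight = sum(w for w, _ in pairs)
    return sum(w * v for w, v in pairs) / total_weight


total_support = sum(s for s, _, _ in ROWS.values())
regex_macro = macro_average(r for _, r, _ in ROWS.values())
keyword_macro = macro_average(k for _, _, k in ROWS.values())
regex_weighted = weighted_average((s, r) for s, r, _ in ROWS.values())
keyword_weighted = weighted_average((s, k) for s, _, k in ROWS.values())

print(f"support={total_support}")
print(f"macro:    regex={regex_macro:.3f} keyword={keyword_macro:.3f}")
print(f"weighted: regex={regex_weighted:.3f} keyword={keyword_weighted:.3f}")
```

With these rows the macro averages come out at 0.966 for regex extraction and 0.942 for keyword matching. With per-category supports as small as 3, a single misread document shifts a category's accuracy by a large step, so the per-category figures are best read as indicative.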
F1 Score Comparison — LR vs RF
[Chart legend: Regex Extraction, Keyword Matching]
Document Processing Pipeline — 8 Stages
1. Document Upload: file validation, size check, secure storage
2. OCR Processing: Google Cloud Vision API (DOCUMENT_TEXT_DETECTION)
3. Field Extraction: regex-based extraction of vendor, amount, VAT, date, invoice number
4. Category Detection: keyword matching across 12 categories (utilities, travel, payroll, etc.)
5. Anomaly Detection: Z-score + Benford's Law + duplicate detection
6. Confidence Scoring: composite of OCR quality + field completeness + pattern match strength
7. Rule Validation: double-entry check, VAT rate validation, date sanity
8. Ledger Posting: chart of accounts mapping, debit/credit entry creation
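The three checks named in the Anomaly Detection stage can be sketched as follows. This is a minimal illustration and not the pipeline's actual code: the function names, the threshold of 3.0, and the choice of duplicate key (vendor, invoice number, amount) are all assumptions.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def zscore_outliers(amounts, threshold=3.0):
    """Return indices of amounts whose |z-score| exceeds the threshold.

    With the population standard deviation, |z| can never exceed
    sqrt(n - 1), so a threshold of 3 needs more than 10 samples.
    """
    if len(amounts) < 2:
        return []
    mu = mean(amounts)
    sigma = pstdev(amounts)
    if sigma == 0:
        return []
    return [i for i, a in enumerate(amounts) if abs(a - mu) / sigma > threshold]


def leading_digit(amount):
    """First non-zero digit of an amount, or None if it rounds to zero."""
    digits = f"{abs(amount):.2f}".replace(".", "").lstrip("0")
    return int(digits[0]) if digits else None


def benford_deviation(amounts):
    """Mean absolute gap between observed first-digit frequencies and
    Benford's expected frequencies log10(1 + 1/d) for d = 1..9."""
    digits = [d for d in map(leading_digit, amounts) if d is not None]
    if not digits:
        return 0.0
    counts = Counter(digits)
    n = len(digits)
    return sum(abs(counts[d] / n - math.log10(1 + 1 / d)) for d in range(1, 10)) / 9


def find_duplicates(invoices):
    """Group indices of invoices sharing vendor, invoice number and amount."""
    seen = {}
    for i, inv in enumerate(invoices):
        key = (inv["vendor"].strip().lower(), inv["invoice_number"], round(inv["amount"], 2))
        seen.setdefault(key, []).append(i)
    return [idxs for idxs in seen.values() if len(idxs) > 1]


if __name__ == "__main__":
    amounts = [100.0] * 19 + [10000.0]
    print(zscore_outliers(amounts))
    print(round(benford_deviation(amounts), 3))
    print(find_duplicates([
        {"vendor": "Acme Ltd", "invoice_number": "INV-1", "amount": 120.0},
        {"vendor": "Other Co", "invoice_number": "INV-2", "amount": 75.5},
        {"vendor": "acme ltd ", "invoice_number": "INV-1", "amount": 120.0},
    ]))
```

A Benford check is only meaningful on a reasonably large set of amounts spanning several orders of magnitude; on a handful of invoices the deviation score is mostly noise.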
Honest Limitations
- Regex accuracy depends on document formatting — non-standard layouts may reduce field extraction confidence.
- Keyword matching for categories is deterministic but may misclassify ambiguous descriptions (e.g. "Amazon" could be office supplies or software).
- All patterns are rule-based — no ML training data required, but adding new categories requires manual regex authoring.
- FHIS component weights are grounded in literature but not empirically validated against real UK SME financial distress data.
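The first two limitations can be made concrete with a small sketch. The regex patterns and keyword sets below are hypothetical stand-ins chosen for illustration, not the pipeline's real rules. The example shows how a layout change can defeat a regex and how a shared keyword such as "amazon" produces a tie between categories.

```python
import re

# Hypothetical field patterns; real invoices vary widely in layout.
FIELD_PATTERNS = {
    "invoice_number": re.compile(
        r"invoice\s*(?:no\.?|number|#)\s*[:\-]?\s*([A-Z0-9\-]+)", re.IGNORECASE
    ),
    "date": re.compile(r"\b(\d{2}/\d{2}/\d{4})\b"),
    "vat": re.compile(r"\bVAT\b.*?([\d,]+\.\d{2})", re.IGNORECASE),
    "amount": re.compile(r"\btotal\b.*?([\d,]+\.\d{2})", re.IGNORECASE),
}

# Hypothetical keyword sets; "amazon" deliberately appears in two categories.
CATEGORY_KEYWORDS = {
    "Office Supplies": {"stationery", "paper", "amazon"},
    "Software": {"subscription", "licence", "saas", "amazon"},
    "Travel": {"flight", "hotel", "train"},
    "Utilities": {"electricity", "water", "gas"},
}


def extract_fields(text):
    """Apply each pattern; fields that do not match come back as None."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result


def detect_category(description):
    """Return (category, ambiguous). Ties are resolved by dictionary order
    and flagged so a reviewer can inspect them."""
    words = set(re.findall(r"[a-z]+", description.lower()))
    scores = {cat: len(words & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores.values())
    if best == 0:
        return None, False
    winners = [cat for cat, score in scores.items() if score == best]
    return winners[0], len(winners) > 1


standard = (
    "ACME Ltd\nInvoice No: INV-2024-001\nDate: 15/03/2024\n"
    "Subtotal £100.00\nVAT (20%) £20.00\nTotal £120.00\n"
)
# Same content, but the total sits on the line after its label.
nonstandard = "ACME Ltd\nInvoice No: INV-2024-001\nTotal\n£120.00\n"

print(extract_fields(standard))
print(extract_fields(nonstandard))
print(detect_category("Amazon order"))
print(detect_category("Amazon Web Services subscription"))
```

Returning an explicit ambiguity flag, rather than silently picking the first category, is one way to route cases like "Amazon" to manual review.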