
# 10 Rare Signals: The Questions the Archive Barely Answers

Ten of the 24 Questions each appear in fewer than 1% of the corpus. These rare signals are disproportionately valuable: every document tagged with one is a needle in a 1.3-million-document haystack. The five rarest are V10 (Redaction Patterns: 811 docs, 0.06%), V13 (Physical Evidence: 4,040 docs, 0.31%), P23 (Location Patterns: 4,589 docs, 0.35%), V15 (Financial Verification: 5,492 docs, 0.42%), and D5 (Procedural Violations: 7,103 docs, 0.54%).

## Rare Question Codes (< 1% of Corpus)

| Code | Name | Docs | % | Significance |
|------|------|------|---|-------------|
| V10 | Redaction Patterns | 811 | 0.06% | CRITICAL |
| V13 | Physical Evidence | 4,040 | 0.31% | HIGH |
| P23 | Location Patterns | 4,589 | 0.35% | HIGH |
| V15 | Financial Verification | 5,492 | 0.42% | MEDIUM |
| D5 | Procedural Violations | 7,103 | 0.54% | MEDIUM |
| P20 | Employee Networks | 7,161 | 0.55% | MEDIUM |
| P21 | Financial Networks | 8,425 | 0.65% | MEDIUM |
| V9 | Document Integrity | 10,548 | 0.81% | MEDIUM |
| V12 | Witness Consistency | 12,430 | 0.95% | MEDIUM |
| V14 | Victim Testimony | 12,610 | 0.97% | MEDIUM |
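
The percentages in the table can be reproduced from the document counts and the corpus size of 1,306,136 documents stated in this report. The following is a minimal Python sketch, not the `statistical_v1` generator's actual code. The names `doc_counts`, `corpus_share`, and `rare_codes` are hypothetical, and only the ten rare codes are included because counts for the other questions are not given here.

```python
CORPUS_SIZE = 1_306_136      # total documents, as stated in this report
RARE_THRESHOLD_PCT = 1.0     # a question is "rare" below 1% of the corpus

# Document counts per question code, copied from the table above.
doc_counts = {
    "V10": 811,
    "V13": 4_040,
    "P23": 4_589,
    "V15": 5_492,
    "D5": 7_103,
    "P20": 7_161,
    "P21": 8_425,
    "V9": 10_548,
    "V12": 12_430,
    "V14": 12_610,
}


def corpus_share(count, total=CORPUS_SIZE):
    """Return `count` as a percentage of the corpus."""
    return 100.0 * count / total


def rare_codes(counts, threshold_pct=RARE_THRESHOLD_PCT):
    """Return (code, count, share) tuples below the threshold, rarest first."""
    rows = [(code, n, corpus_share(n)) for code, n in counts.items()]
    return sorted((r for r in rows if r[2] < threshold_pct), key=lambda r: r[2])


if __name__ == "__main__":
    for code, n, share in rare_codes(doc_counts):
        print(f"{code:<4} {n:>7,} {share:.2f}%")
```

Sorting by share rather than by raw count keeps the ordering correct if the same function is later applied to a corpus of a different size.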

## Why Rare Signals Matter

In a corpus of 1,306,136 documents, a question that appears in fewer than 1% of documents has one of three explanations:
1. **Genuinely rare**: the archive doesn't contain much evidence on this angle
2. **Under-detected**: the keyword classifier misses relevant documents
3. **Suppressed**: these types of evidence were systematically removed

Each possibility is investigatively significant. The third in particular would mean that these rare signals mark the topics where suppression was most effective.

**Generator**: `statistical_v1` v1.0.0
**Date**: 2026-02-24