Sources
- SEC filings (10-Q, 10-K, 8-K) - primary source for every public US-listed SaaS company that discloses NRR. Pulled via SEC EDGAR's structured filing feed.
- Earnings press releases - companies' investor relations pages, scraped for the most recent disclosure each quarter.
- Founder submissions - private B2B SaaS founders submit their NRR via the free calculator. Work-email-only, anonymized by default.
Extraction pipeline
- Discovery - for each tracked ticker, the scraper enumerates every recent 10-Q, 10-K, and 8-K filing, plus the company's IR press release archive.
- Per-company regex - hand-tuned per-company extractors handle each disclosure format (DBNRR, NDR, NRR, dollar net retention, etc.).
- LLM fallback - when regex returns nothing or low confidence, Vertex AI Gemini (gemini-2.5-flash) parses the disclosure with a strict JSON schema. Outputs are cross-validated against regex when both succeed.
- Cross-source agreement - when the same value appears in both regex and LLM output, or in two different filings (e.g. press release + 10-Q), the confidence score is raised.
- Human review - every scraper run opens a PR. A human reviewer confirms the value against the source URL before merging.
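As a rough illustration of the regex-then-LLM order described above, here is a minimal Python sketch. The pattern is a single generic one invented for this example (the real extractors are hand-tuned per company), and `llm_fallback` is a hypothetical callable standing in for the Gemini call. The low-confidence path, cross-validation, and PR creation are left out.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical generic pattern, for illustration only.
# It looks for a retention label followed, within the same sentence, by a percentage.
NRR_PATTERN = re.compile(
    r"\b(?:dollar-based net retention rate|dollar net retention|net dollar retention"
    r"|net revenue retention|DBNRR|NDR|NRR)\b"
    r"[^.%]{0,80}?(\d{2,3}(?:\.\d+)?)\s*%",
    re.IGNORECASE,
)


def extract_with_regex(text: str) -> Optional[float]:
    """Return the first NRR percentage found in the text, or None."""
    match = NRR_PATTERN.search(text)
    return float(match.group(1)) if match else None


def extract(
    text: str, llm_fallback: Callable[[str], Optional[float]]
) -> Tuple[Optional[float], str]:
    """Try regex first; call the fallback only when regex returns nothing."""
    value = extract_with_regex(text)
    if value is not None:
        return value, "regex"
    value = llm_fallback(text)
    return (value, "llm") if value is not None else (None, "none")
```

For example, `extract_with_regex("Our dollar-based net retention rate was 118% as of January 31, 2024.")` returns `118.0`.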
The 7 verification gates
An auto-extracted disclosure is marked verified only if all seven of these gates pass:
- NRR within 50%–250% range
- Qualifier is "exact" (not "above", "below", or "approximately")
- Period is fully determined (fiscal year + fiscal quarter, or fiscal year for full-year disclosures)
- Confidence ≥ 0.85
- Multi-source agreement: ≥2 candidates within 1pp, OR regex + LLM agree within 1pp
- YoY change ≤ 25pp vs prior verified disclosure
- QoQ change ≤ 12pp vs prior verified disclosure
If any gate fails, the entry is flagged pending-manual-verify and excluded from public benchmarks until a human signs off.
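The seven gates can be read as one checklist. The Python sketch below is an illustration, not the production code: the `Candidate` fields are hypothetical names, changes are compared by absolute value, and a missing prior verified disclosure is assumed to pass the YoY and QoQ gates (the rules above do not say how that case is handled).

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional

PENDING = "pending-manual-verify"


@dataclass
class Candidate:
    """Hypothetical shape of one auto-extracted disclosure."""
    nrr_pct: float                  # e.g. 118.0 for 118%
    qualifier: str                  # "exact", "above", "below", "approximately"
    fiscal_year: Optional[int]
    fiscal_quarter: Optional[int]   # None for full-year disclosures
    is_full_year: bool
    confidence: float
    source_values: List[float]      # NRR values found across filings and releases
    regex_value: Optional[float]
    llm_value: Optional[float]
    yoy_change_pp: Optional[float]  # None when there is no prior verified disclosure
    qoq_change_pp: Optional[float]


def _multi_source_agreement(c: Candidate, tol_pp: float = 1.0) -> bool:
    # Either two candidates agree within 1pp, or regex and LLM agree within 1pp.
    pair_agrees = any(
        abs(a - b) <= tol_pp for a, b in combinations(c.source_values, 2)
    )
    extractors_agree = (
        c.regex_value is not None
        and c.llm_value is not None
        and abs(c.regex_value - c.llm_value) <= tol_pp
    )
    return pair_agrees or extractors_agree


def failed_gates(c: Candidate) -> List[str]:
    """Return the names of the gates that did not pass, in gate order."""
    period_ok = c.fiscal_year is not None and (
        c.is_full_year or c.fiscal_quarter is not None
    )
    gates = {
        "range": 50.0 <= c.nrr_pct <= 250.0,
        "qualifier": c.qualifier == "exact",
        "period": period_ok,
        "confidence": c.confidence >= 0.85,
        "agreement": _multi_source_agreement(c),
        # Assumption: no prior verified disclosure means the gate passes.
        "yoy": c.yoy_change_pp is None or abs(c.yoy_change_pp) <= 25.0,
        "qoq": c.qoq_change_pp is None or abs(c.qoq_change_pp) <= 12.0,
    }
    return [name for name, passed in gates.items() if not passed]


def status_for(c: Candidate) -> str:
    return "verified" if not failed_gates(c) else PENDING
```

Returning the list of failed gate names, rather than a bare boolean, gives the human reviewer a reason for each pending entry.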
Cell publication rules
A cell page goes live only when:
- ≥2 distinct companies have verified disclosures in that cell
- ≥2 verified disclosures total
- No single company contributes >50% of the data
This avoids "single-company medians" that would mislead viewers.
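A small Python sketch of these publication rules follows. It assumes that "data" is measured by counting verified disclosures, and the function name and input shape (one company identifier per verified disclosure) are hypothetical.

```python
from collections import Counter
from typing import List


def cell_is_publishable(company_ids: List[str]) -> bool:
    """company_ids holds one entry per verified disclosure in the cell."""
    total = len(company_ids)
    if total < 2:                       # at least 2 verified disclosures
        return False
    counts = Counter(company_ids)
    if len(counts) < 2:                 # at least 2 distinct companies
        return False
    # No single company may contribute more than 50% of the disclosures.
    return 2 * max(counts.values()) <= total
```

Under this reading, a cell with two disclosures from one company and one from another stays unpublished, because the first company contributes two thirds of the data.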
Conflict handling
If a previously-verified value is contradicted by a new scrape:
- The old record is demoted to pending-manual-verify
- The new record is also written as pending-manual-verify
- The cell page is regenerated without either value
- A human resolves the conflict and re-verifies
Human-verified entries are never overwritten by the scraper.
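The conflict rules might be sketched in Python as below. The `Record` type, the `human_verified` flag, and the test for a contradiction (any difference in value for the same period) are assumptions made for illustration. What happens to a scraped value that conflicts with a human-verified entry is not stated above, so this sketch simply leaves the existing record alone.

```python
from dataclasses import dataclass, replace
from typing import List, Optional

VERIFIED = "verified"
PENDING = "pending-manual-verify"


@dataclass(frozen=True)
class Record:
    company: str
    period: str
    nrr_pct: float
    status: str
    human_verified: bool = False  # hypothetical flag, set after manual sign-off


def reconcile(existing: Optional[Record], scraped: Record) -> List[Record]:
    """Return the records to store after a new scrape for the same period."""
    if existing is None:
        return [scraped]
    if existing.human_verified:
        # Human-verified entries are never overwritten by the scraper.
        return [existing]
    if existing.status == VERIFIED and existing.nrr_pct != scraped.nrr_pct:
        # Contradiction: demote the old record and hold the new one for review.
        return [replace(existing, status=PENDING), replace(scraped, status=PENDING)]
    return [existing]


def publishable(records: List[Record]) -> List[Record]:
    """Only verified records feed the regenerated cell page."""
    return [r for r in records if r.status == VERIFIED]
```

After a contradiction, `publishable` returns neither record, which matches the rule that the cell page is regenerated without either value.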
Update cadence
The scraper runs daily at 06:00 UTC. New disclosures appear in PRs within 24 hours of company filings. The historical backfill covers all SEC filings since 2020.
Current state
- 35 public companies tracked
- 343 verified disclosures
- 18 disclosures pending verification
- 7 live cell pages
Found an error?
Every disclosure links to its source URL. If the published value differs from the company's actual filing, please email us with the URL and the correct value. We re-verify within 24 hours.
Citing this data
The dataset is free to cite. Please attribute as: "NRR data from cust.co/nrr-benchmark, sourced from SEC filings." Per-cell and per-company JSON is available via the cells API and companies API.