About & methodology
How Capitol Releases works
Capitol Releases archives official press output from all 100 U.S. senators and 435 U.S. representatives. The goal is a searchable public record with enough provenance that a reporter can cite it and a developer can audit it.
What we collect
We collect original content from official .gov member websites: press releases, statements, op-eds, blog posts, floor statements, letters and photo releases.
The collection window starts Jan. 1, 2025. When a seat changes hands, the archive follows only the current officeholder, starting from the day that person took office.
| Type | Definition |
|---|---|
| press_release (Press release) | The default class for original announcements from a member's news, media or press section. |
| statement (Statement) | A public statement posted by the office, usually without a separate legislative action attached. |
| op_ed (Op-ed) | Signed commentary or opinion writing republished on the official site. |
| blog (Blog post) | Original posts from member blog, diary, newsletter or similar site sections. |
| floor_statement (Floor statement) | Floor remarks when a member's office publishes them on its own press page. |
| letter (Letter) | Published letters to agencies, officials, colleagues or constituents. |
| photo_release (Photo release) | Photo-only or media-advisory items. Stored, but excluded from default public feeds. |
| presidential_action (Presidential action) | White House actions stored in the same schema for federal executive coverage. |
| other (Other) | Original official content that does not fit a more specific class. Reviewed during cleanup. |
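
For developers auditing the archive, the content types above could be modeled as a closed enumeration. This is only a sketch: the `ItemType` class, the `EXCLUDED_FROM_DEFAULT_FEED` set and the `in_default_feed` helper are hypothetical names, not part of the Capitol Releases codebase. The only exclusion encoded is the one the table states (photo_release).

```python
from enum import Enum


class ItemType(str, Enum):
    """Content types listed in the table (values match the stored keys)."""
    PRESS_RELEASE = "press_release"
    STATEMENT = "statement"
    OP_ED = "op_ed"
    BLOG = "blog"
    FLOOR_STATEMENT = "floor_statement"
    LETTER = "letter"
    PHOTO_RELEASE = "photo_release"
    PRESIDENTIAL_ACTION = "presidential_action"
    OTHER = "other"


# Assumption: only photo_release is hidden by default, because that is the
# only exclusion the table mentions.
EXCLUDED_FROM_DEFAULT_FEED = {ItemType.PHOTO_RELEASE}


def in_default_feed(item_type: str) -> bool:
    """Return True if an item of this type would appear in a default feed.

    Raises ValueError for a type key that is not in the table.
    """
    return ItemType(item_type) not in EXCLUDED_FROM_DEFAULT_FEED
```

Using an enum means an unknown type key fails loudly instead of silently landing in a feed.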
What we don't
We do not collect third-party clippings, "In the News" mentions, campaign content, campaign websites, interviews or outside media hits.
We do not backfill predecessor coverage when a seat changes hands. We also do not collect voting records, bill tracking or campaign finance records. Those records already exist elsewhere, including Congress.gov and the FEC.
How dates work
Every record can carry two date fields beyond the timestamp itself: date_source and date_confidence. They record where the date came from and how much the parser trusts it.
Most dates come from metadata, listing text or page-level date elements. About 1% of records have null dates, mostly ColdFusion sites where the date is embedded in body text rather than exposed as metadata.
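One way to picture how date_source and date_confidence get filled is a fallback chain over the three origins named above. The sketch below is an illustration under assumptions: the source labels, the confidence numbers, the priority order and the `resolve_date` function are all hypothetical, and real pages would need far more lenient date parsing than ISO strings.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DatedResult:
    published: Optional[date]
    date_source: Optional[str]
    date_confidence: Optional[float]


# Hypothetical labels, scores and ordering; the real values are not documented here.
CANDIDATE_ORDER = [
    ("metadata", 0.95),
    ("listing_text", 0.80),
    ("page_element", 0.70),
]


def resolve_date(candidates: dict) -> DatedResult:
    """Pick the first candidate that parses, recording where it came from."""
    for source, confidence in CANDIDATE_ORDER:
        raw = candidates.get(source)
        if not raw:
            continue
        try:
            parsed = date.fromisoformat(raw)  # sketch only accepts YYYY-MM-DD
        except ValueError:
            continue
        return DatedResult(parsed, source, confidence)
    # No usable date: the record keeps a null date, as with the ~1% noted above.
    return DatedResult(None, None, None)
```

A record whose date is only present in body text would fall through every branch and come back with all three fields set to None.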
Provenance
Every record stores source_url, scrape_run and scraped_at. The source URL is the office's page. The scrape run ties the row back to a collector pass. The scrape timestamp says when Capitol Releases saw it.
Records are never hard-deleted. If a source URL stops resolving on repeated checks, the row stays in the archive and gets a deleted_at tombstone.
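The tombstone rule can be sketched as follows. The field names source_url, scrape_run, scraped_at and deleted_at come from this page; the `failed_checks` counter, the threshold of three failures and the `record_check` function are assumptions made for illustration, since the page only says "repeated checks".

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Assumption: how many consecutive failures count as "repeated checks".
FAILURES_BEFORE_TOMBSTONE = 3


@dataclass
class Record:
    source_url: str
    scrape_run: str
    scraped_at: datetime
    deleted_at: Optional[datetime] = None
    failed_checks: int = 0  # hypothetical bookkeeping field


def record_check(record: Record, resolved: bool,
                 now: Optional[datetime] = None) -> Record:
    """Update a record after one URL check. The row itself is never removed."""
    now = now or datetime.now(timezone.utc)
    if resolved:
        record.failed_checks = 0
        return record
    record.failed_checks += 1
    if record.failed_checks >= FAILURES_BEFORE_TOMBSTONE and record.deleted_at is None:
        record.deleted_at = now  # tombstone: mark as gone, keep the data
    return record
```

Whether a later successful check should clear the tombstone is a policy choice this sketch leaves open.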
Update cadence
GitHub Actions runs collection four times a day: 13:00, 17:00, 21:00 and 01:00 UTC. The same schedule refreshes WordPress JSON silos used for op-eds, newsletters, blogs and related official sections.
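To reason about freshness, it can help to compute when the next pass is due. The helper below is a small sketch based only on the four UTC hours listed above; `next_run` and `RUN_HOURS_UTC` are hypothetical names, and scheduled workflows may start somewhat later than the nominal time.

```python
from datetime import datetime, timedelta, timezone

# The four daily collection hours from the schedule above, in ascending order (UTC).
RUN_HOURS_UTC = (1, 13, 17, 21)


def next_run(now: datetime) -> datetime:
    """Return the next nominal collection time strictly after `now`.

    `now` should be timezone-aware; it is converted to UTC first.
    """
    now = now.astimezone(timezone.utc)
    for hour in RUN_HOURS_UTC:
        candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
        if candidate > now:
            return candidate
    # Past 21:00 UTC: the next pass is 01:00 UTC on the following day.
    tomorrow = now + timedelta(days=1)
    return tomorrow.replace(hour=RUN_HOURS_UTC[0], minute=0, second=0, microsecond=0)
```

For example, a release posted at 14:30 UTC would nominally be picked up by the 17:00 UTC pass.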
A health check runs before every collection pass. It verifies that configured source pages respond, selectors still find items and dates remain parseable.
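The three conditions in the health check can be expressed as one small function. This is a sketch under assumptions, not the project's actual check: `HealthResult`, `check_source` and `parse_iso` are hypothetical names, a 2xx status stands in for "responds", and the list of date strings stands in for whatever the configured selectors extract.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List, Optional


@dataclass
class HealthResult:
    responds: bool
    items_found: bool
    dates_parseable: bool

    @property
    def healthy(self) -> bool:
        return self.responds and self.items_found and self.dates_parseable


def parse_iso(raw: str) -> Optional[date]:
    """Toy parser for the sketch: accepts only YYYY-MM-DD."""
    try:
        return date.fromisoformat(raw)
    except ValueError:
        return None


def check_source(status_code: int, item_dates: List[str],
                 parse_date: Callable[[str], Optional[date]]) -> HealthResult:
    """Evaluate one source page from its HTTP status and extracted item dates."""
    responds = 200 <= status_code < 300
    items_found = responds and len(item_dates) > 0
    dates_parseable = items_found and all(
        parse_date(raw) is not None for raw in item_dates
    )
    return HealthResult(responds, items_found, dates_parseable)
```

Keeping the three flags separate makes it clear whether a failure is an outage, a changed page layout or a changed date format.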
Coverage status
The live coverage diagnostic is expected at docs/coverage-diagnostic-2026-05-03.json. Until that lands, this page points to the current House trouble list.
House coverage trouble sites, May 3, 2026
| Metric | Status | Note |
|---|---|---|
| U.S. senators (clean) | 97 / 100 | 3 documented gaps; rest publishing on schedule. |
| U.S. House (configured) | 437 / 435 | Every member has a source row. The 437 active rows include two non-voting delegate seats (DC, PR, etc.). |
| House — reaches Jan. 2025 | 418 / 437 | 95.7% have ≥10 records reaching back to early 2025. |
| House — bulletproof accounted for | 435 / 437 | 99.5% — clean rows plus 17 members tagged as documented gaps (Phase 2 scrapers, scraper bugs, low-volume offices). |
| House — open trouble list | 2 | Members with shallow archives where the cause is still under investigation. |
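
The House figures in the table can be cross-checked with a few lines of arithmetic. The sketch assumes that "clean rows" in the accounted-for line means the 418 rows that reach January 2025; the variable names are illustrative only.

```python
def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as shown in the table."""
    return round(100 * part / whole, 1)


house_rows = 437        # configured House source rows
reaches_2025 = 418      # rows with >=10 records reaching back to early 2025
documented_gaps = 17    # members tagged as documented gaps

# Assumption: accounted-for rows = rows reaching Jan. 2025 + documented gaps.
accounted_for = reaches_2025 + documented_gaps
open_trouble = house_rows - accounted_for

reach_pct = pct(reaches_2025, house_rows)        # 95.7
accounted_pct = pct(accounted_for, house_rows)   # 99.5
```

Under that assumption the numbers reconcile: 418 + 17 = 435 accounted for, leaving 2 rows on the open trouble list.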
Known low-volume offices
Some offices publish rarely or not at all. Those rows will be marked in the seed files once the expected_low_volume and expected_zero fields land.
| Name | Chamber | District/state | Status | Reason | Last verified |
|---|---|---|---|---|---|
| Adelita S. Grijalva | House | AZ-7 | Expected low volume | Sworn in 2025-11 after winning the September 2025 special election to replace her father Raúl Grijalva (died March 2025). Limited archive expected: 64 records since first day. | 2026-05-03 |
| Alan Armstrong | Senate | OK | Expected zero | Sworn in 2026-03-24 to fill ND seat vacated by Hoeven retirement; office is still in setup phase, no press releases published yet (verified 2026-05-03). | 2026-05-03 |
| Ashley Moody | Senate | FL | Expected low volume | Appointed March 5, 2025 to fill FL Senate seat vacated when Marco Rubio became Secretary of State. Listing reaches first publishing date; pre-appointment coverage doesn't apply. | 2026-05-03 |
| Christian D. Menefee | House | TX-18 | Expected low volume | Sworn in February 2, 2026 after winning special election for Sylvester Turner's former Houston seat (TX-18). Limited archive expected — recent appointment. | 2026-05-03 |
| Clay Fuller | House | GA-14 | Expected low volume | Sworn in April 14, 2026 after winning special election to replace Marjorie Taylor Greene (GA-14). Limited archive expected — recent appointment. | 2026-05-03 |
| Guy Reschenthaler | House | PA-14 | Expected low volume | Chief Deputy Whip in 119th Congress. Whip-team work focuses on internal vote counting and member outreach rather than public press operations, which explains the relatively light press output (40 records starting March 2025). Verified by visual scan of his media listing 2026-05-03 — this is his real output, not a scraper gap. | 2026-05-03 |
| Jim Jordan | House | OH-4 | Expected zero | Found 13 items on listing page. | 2026-05-03 |
| Jon Husted | Senate | OH | Expected low volume | Appointed February 18, 2025 to fill OH Senate seat vacated when JD Vance became Vice President. Listing reaches first publishing date; pre-appointment coverage doesn't apply. | 2026-05-03 |
| Sheila Cherfilus-McCormick | House | FL-20 | Expected low volume | Resigned April 21, 2026 while facing House Ethics action and federal charges. Coverage stops at resignation date. | 2026-05-03 |
Schema history
The schema was renamed in May 2026 as the project moved from a Senate-only archive to Congress-wide coverage. The old senators table became officials, and press_releases became official_site_items. Compatibility views remain during the transition.
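A compatibility view lets old queries keep working against renamed tables. The sketch below uses an in-memory SQLite database purely so it can run anywhere; the project's actual database engine, column names and view definitions are not documented on this page, so every column and the chamber filter are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- New table names after the May 2026 rename (columns are hypothetical).
    CREATE TABLE officials (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        chamber TEXT NOT NULL
    );
    CREATE TABLE official_site_items (
        id INTEGER PRIMARY KEY,
        official_id INTEGER REFERENCES officials(id),
        title TEXT
    );

    -- Compatibility views expose the new tables under the old names.
    -- Assumption: the old senators table only ever held Senate rows.
    CREATE VIEW senators AS
        SELECT * FROM officials WHERE chamber = 'senate';
    CREATE VIEW press_releases AS
        SELECT * FROM official_site_items;
    """
)

conn.execute("INSERT INTO officials (name, chamber) VALUES (?, ?)", ("Jon Husted", "senate"))
conn.execute("INSERT INTO officials (name, chamber) VALUES (?, ?)", ("Jim Jordan", "house"))
conn.commit()

# A legacy query written against the old table name still works.
legacy_rows = conn.execute("SELECT name FROM senators").fetchall()
```

Code written before the rename can read through the views while new code targets officials and official_site_items directly.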