Methodology
How Capitol Releases works
Capitol Releases archives official press output from the 535 voting seats in the U.S. Congress, with 437 House member rows configured for launch. The goal is a searchable public record with enough provenance that a reporter can cite it and a developer can audit it.
What we collect
We collect original content from official .gov member websites: press releases, statements, op-eds, blog posts, floor statements, letters and photo releases.
The collection window starts Jan. 1, 2025. For seat changes, the archive follows the current officeholder only from the day that person took office.
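As a rough illustration of the window rule, the sketch below treats the window as opening on the later of Jan. 1, 2025 and the day the current officeholder took office. The function name and the `took_office` parameter are hypothetical and are not taken from the Capitol Releases codebase.

```python
from datetime import date

# Start of the collection window stated in the text.
COLLECTION_START = date(2025, 1, 1)


def in_collection_window(published: date, took_office: date) -> bool:
    """Return True if a dated item falls inside the window for a member.

    `took_office` is a hypothetical per-member field. The window opens on
    the later of the global start date and the officeholder's first day.
    """
    window_opens = max(COLLECTION_START, took_office)
    return published >= window_opens
```

Under this sketch, a member sworn in on 2026-03-24 would have no items dated before that day, even though the global window opens in 2025.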
| Type | Definition |
|---|---|
| press_release (Press release) | The default class for original announcements from a member's news, media or press section. |
| statement (Statement) | A public statement posted by the office, usually without a separate legislative action attached. |
| op_ed (Op-ed) | Signed commentary or opinion writing republished on the official site. |
| blog (Blog post) | Original posts from member blog, diary, newsletter or similar site sections. |
| floor_statement (Floor statement) | Floor remarks when a member's office publishes them on its own press page. |
| letter (Letter) | Published letters to agencies, officials, colleagues or constituents. |
| photo_release (Photo release) | Photo-only or media-advisory items. Stored, but excluded from default public feeds. |
| presidential_action (Presidential action) | White House actions stored in the same schema for federal executive coverage. |
| other (Other) | Original official content that does not fit a more specific class. Reviewed during cleanup. |
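The type slugs in the table could be modeled as an enumeration. This is a minimal sketch: the slug values come from the table, while the class name, the helper and the assumption that photo_release is the only type hidden from default feeds are illustrative.

```python
from enum import Enum


class ContentType(str, Enum):
    """Content type slugs as listed in the table above."""

    PRESS_RELEASE = "press_release"
    STATEMENT = "statement"
    OP_ED = "op_ed"
    BLOG = "blog"
    FLOOR_STATEMENT = "floor_statement"
    LETTER = "letter"
    PHOTO_RELEASE = "photo_release"
    PRESIDENTIAL_ACTION = "presidential_action"
    OTHER = "other"


# Assumption: the table names only photo_release as excluded by default.
EXCLUDED_FROM_DEFAULT_FEED = {ContentType.PHOTO_RELEASE}


def in_default_feed(slug: str) -> bool:
    """Return True if items with this slug appear in default public feeds.

    Raises ValueError for a slug that is not in the table.
    """
    return ContentType(slug) not in EXCLUDED_FROM_DEFAULT_FEED
```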
What we don't
We do not collect third-party clippings, "In the News" mentions, campaign content, campaign websites, interviews or outside media hits.
We do not backfill predecessor coverage when a seat changes hands. We also do not collect voting records, bill tracking or campaign finance records. Those records already exist elsewhere, including Congress.gov and the FEC.
How dates work
Every record can carry two date fields beyond the timestamp itself: date_source and date_confidence. They record where the date came from and how much the parser trusts it.
Most dates come from metadata, listing text or page-level date elements. About 1% of records have null dates, mostly ColdFusion sites where the date is embedded in body text rather than exposed as metadata.
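One way to picture how date_source and date_confidence travel with a date is sketched below. The source labels, the priority order and the numeric confidence values are all assumptions made for illustration; the text names only the two fields and the three kinds of origin.

```python
from dataclasses import dataclass
from datetime import date
from typing import Mapping, Optional

# Hypothetical priority order and confidence scores, highest trust first.
SOURCE_PRIORITY = (
    ("metadata", 1.0),
    ("listing_text", 0.8),
    ("page_element", 0.6),
)


@dataclass
class ParsedDate:
    value: Optional[date]
    date_source: Optional[str]
    date_confidence: Optional[float]


def pick_date(candidates: Mapping[str, Optional[date]]) -> ParsedDate:
    """Pick the most trusted available date and record where it came from."""
    for source, confidence in SOURCE_PRIORITY:
        found = candidates.get(source)
        if found is not None:
            return ParsedDate(found, source, confidence)
    # No usable date: the record keeps a null date, as about 1% do.
    return ParsedDate(None, None, None)
```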
Provenance
Every record stores source_url, scrape_run and scraped_at. The source URL is the office's page. The scrape run ties the row back to a collector pass. The scrape timestamp says when Capitol Releases saw it.
Records are never hard-deleted. If a source URL stops resolving on repeated checks, the row stays in the archive and gets a deleted_at tombstone.
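The tombstone behavior can be sketched with an in-memory SQLite table. The column names source_url, scrape_run, scraped_at and deleted_at come from the text; the table name `items`, the function names and the choice of SQLite are assumptions, and the repeated-check logic that decides when a URL counts as gone is left out.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional


def make_archive() -> sqlite3.Connection:
    """Create a throwaway table with the provenance columns named above."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE items (
            source_url TEXT PRIMARY KEY,
            scrape_run TEXT NOT NULL,
            scraped_at TEXT NOT NULL,
            deleted_at TEXT
        )
        """
    )
    return conn


def mark_missing(
    conn: sqlite3.Connection, source_url: str, seen_at: Optional[datetime] = None
) -> None:
    """Set a deleted_at tombstone instead of deleting the row."""
    stamp = (seen_at or datetime.now(timezone.utc)).isoformat()
    conn.execute(
        # Keep the first tombstone if the row was already marked.
        "UPDATE items SET deleted_at = ? WHERE source_url = ? AND deleted_at IS NULL",
        (stamp, source_url),
    )
    conn.commit()
```

Readers of such a table would filter on `deleted_at IS NULL` for live items, while the archive itself keeps every row.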
Update cadence
GitHub Actions runs collection four times a day: 13:00, 17:00, 21:00 and 01:00 UTC. The same schedule refreshes WordPress JSON silos used for op-eds, newsletters, blogs and related official sections.
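To make the cadence concrete, the sketch below computes the next scheduled pass from the four UTC hours listed above. The function name is hypothetical, and the cron expression in the comment is an assumption about how such a schedule is commonly written, not a quote from the project's workflow file.

```python
from datetime import datetime, timedelta, timezone

# Collection hours in UTC, sorted. A cron expression for this schedule
# would plausibly be "0 1,13,17,21 * * *" (an assumption, not the source).
RUN_HOURS_UTC = (1, 13, 17, 21)


def next_run(now: datetime) -> datetime:
    """Return the next scheduled collection time strictly after `now`.

    `now` must be timezone-aware; it is converted to UTC first.
    """
    now = now.astimezone(timezone.utc)
    for hour in RUN_HOURS_UTC:
        candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
        if candidate > now:
            return candidate
    # Past the last run of the day: roll over to the first run tomorrow.
    tomorrow = now + timedelta(days=1)
    return tomorrow.replace(hour=RUN_HOURS_UTC[0], minute=0, second=0, microsecond=0)
```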
A health check runs before every collection pass. It verifies that configured source pages respond, selectors still find items and dates remain parseable.
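The three checks could be expressed as a pure function over the result of probing one source page. Everything here is a sketch: the `SourceProbe` shape, the problem messages and the single ISO date format are assumptions, and a real check would need to handle the date formats each site actually uses.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class SourceProbe:
    """Hypothetical result of fetching one configured source page."""

    status_code: int
    # Raw date strings for the items the configured selector matched.
    item_dates: List[str] = field(default_factory=list)


def health_problems(probe: SourceProbe) -> List[str]:
    """Return a list of problems; an empty list means the source looks healthy."""
    problems: List[str] = []
    if probe.status_code != 200:
        problems.append(f"page responded with status {probe.status_code}")
    if not probe.item_dates:
        problems.append("selector found no items")
    for raw in probe.item_dates:
        try:
            # Assumption: dates are ISO formatted (YYYY-MM-DD).
            datetime.strptime(raw, "%Y-%m-%d")
        except ValueError:
            problems.append(f"unparseable date: {raw!r}")
    return problems
```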
Coverage status
The live coverage diagnostic is expected at docs/coverage-diagnostic-2026-05-03.json. Until that lands, this page points to the current House trouble list.
House coverage trouble sites, May 3, 2026
| Metric | Status | Note |
|---|---|---|
| U.S. senators | 100 / 100 | 90 clean, 10 documented gaps |
| House members configured | 437 | Every configured member has a source row |
| House members reaching Jan. 2025 | 323 | 74% of configured House rows |
| House trouble list | 39 | Zero, null-date, selector, pagination or low-volume cases |
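The percentage in the table can be reproduced from the counts shown. The variable and function names below are illustrative only.

```python
# Counts taken from the coverage table above.
HOUSE_CONFIGURED = 437
HOUSE_REACHING_JAN_2025 = 323
SENATE_CLEAN = 90
SENATE_DOCUMENTED_GAPS = 10


def percent(part: int, whole: int) -> int:
    """Share of `whole` covered by `part`, rounded to a whole percent."""
    return round(100 * part / whole)


house_share = percent(HOUSE_REACHING_JAN_2025, HOUSE_CONFIGURED)  # 73.9 rounds to 74
senate_total = SENATE_CLEAN + SENATE_DOCUMENTED_GAPS  # 100
```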
Known low-volume offices
Some offices publish rarely or not at all. Those rows will be marked in the seed files once the expected_low_volume and expected_zero fields land.
| Name | Chamber | District/state | Status | Reason | Last verified |
|---|---|---|---|---|---|
| Alan Armstrong | Senate | OK | Expected zero | Sworn in 2026-03-24 to fill ND seat vacated by Hoeven retirement; office is still in setup phase, no press releases published yet (verified 2026-05-03). | 2026-04-15 |
Schema history
The schema was renamed in May 2026 as the project moved from a Senate-only archive to Congress-wide coverage. The old senators table became officials, and press_releases became official_site_items. Compatibility views remain during the transition.
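A compatibility view lets queries written against the old names keep working after the rename. The sketch below uses SQLite so it runs on its own. The four table and view names come from the text, but the columns, the `chamber` filter and the choice of database are assumptions about a schema the text does not spell out.

```python
import sqlite3


def build_schema() -> sqlite3.Connection:
    """Create the renamed tables plus views under the old names."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- New tables (column names are hypothetical).
        CREATE TABLE officials (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            chamber TEXT NOT NULL
        );
        CREATE TABLE official_site_items (
            id INTEGER PRIMARY KEY,
            official_id INTEGER NOT NULL REFERENCES officials(id),
            title TEXT NOT NULL
        );

        -- Old names kept as views. Assumption: they expose only Senate
        -- rows, matching what the Senate-only archive used to contain.
        CREATE VIEW senators AS
            SELECT id, name FROM officials WHERE chamber = 'senate';
        CREATE VIEW press_releases AS
            SELECT i.id, i.official_id AS senator_id, i.title
            FROM official_site_items AS i
            JOIN officials AS o ON o.id = i.official_id
            WHERE o.chamber = 'senate';
        """
    )
    return conn
```

A legacy query such as `SELECT title FROM press_releases` would then return the same kind of rows it did before the rename, while new code reads official_site_items directly.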