Methodology

How Capitol Releases works

Capitol Releases archives official press output from the 535 voting seats in the U.S. Congress, with 437 House member rows configured for launch. The goal is a searchable public record with enough provenance that a reporter can cite it and a developer can audit it.

What we collect

We collect original content from official .gov member websites: press releases, statements, op-eds, blog posts, floor statements, letters and photo releases.

The collection window starts Jan. 1, 2025. For seat changes, the archive follows the current officeholder only from the day that person took office.
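The window rule can be read as a single comparison: an item is in scope only if it was published on or after both Jan. 1, 2025 and the day the current officeholder took office. The sketch below is illustrative, not the project's collector code; the function name and arguments are hypothetical.

```python
from datetime import date

# Stated start of the collection window.
COLLECTION_START = date(2025, 1, 1)


def in_collection_window(published: date, took_office: date) -> bool:
    """Return True if an item falls inside the archive window for this seat.

    Hypothetical helper: the later of the two dates is the effective floor,
    so predecessor-era items are never in scope.
    """
    return published >= max(COLLECTION_START, took_office)
```

For a member who took office before 2025, the floor is Jan. 1, 2025. For a member sworn in later, the floor is the swearing-in date.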

| Type | Label | Definition |
| --- | --- | --- |
| press_release | Press release | The default class for original announcements from a member's news, media or press section. |
| statement | Statement | A public statement posted by the office, usually without a separate legislative action attached. |
| op_ed | Op-ed | Signed commentary or opinion writing republished on the official site. |
| blog | Blog post | Original posts from member blog, diary, newsletter or similar site sections. |
| floor_statement | Floor statement | Floor remarks when a member's office publishes them on its own press page. |
| letter | Letter | Published letters to agencies, officials, colleagues or constituents. |
| photo_release | Photo release | Photo-only or media-advisory items. Stored, but excluded from default public feeds. |
| presidential_action | Presidential action | White House actions stored in the same schema for federal executive coverage. |
| other | Other | Original official content that does not fit a more specific class. Reviewed during cleanup. |
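A developer consuming the archive might model the type values as an enumeration. This is a minimal sketch: the string values come from the table above, but the class name, the helper and the assumption that photo_release is the only type excluded from default feeds are ours, not confirmed project code.

```python
from enum import Enum


class ContentType(str, Enum):
    """Content type values as listed in the type table."""
    PRESS_RELEASE = "press_release"
    STATEMENT = "statement"
    OP_ED = "op_ed"
    BLOG = "blog"
    FLOOR_STATEMENT = "floor_statement"
    LETTER = "letter"
    PHOTO_RELEASE = "photo_release"
    PRESIDENTIAL_ACTION = "presidential_action"
    OTHER = "other"


# Assumption: only photo_release is hidden from default public feeds.
EXCLUDED_FROM_DEFAULT_FEED = {ContentType.PHOTO_RELEASE}


def in_default_feed(content_type: str) -> bool:
    """Hypothetical helper; raises ValueError for an unknown type string."""
    return ContentType(content_type) not in EXCLUDED_FROM_DEFAULT_FEED
```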

What we don't

We do not collect third-party clippings, "In the News" mentions, campaign content, campaign websites, interviews or outside media hits.

We do not backfill predecessor coverage when a seat changes hands. We also do not collect voting records, bill tracking or campaign finance records. Those records already exist elsewhere, including Congress.gov and the FEC.

How dates work

Every record can carry two date fields beyond the timestamp itself: date_source and date_confidence. They record where the date came from and how much the parser trusts it.

Most dates come from metadata, listing text or page-level date elements. About 1% of records have null dates, mostly ColdFusion sites where the date is embedded in body text rather than exposed as metadata.
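One way to picture how a date and its two companion fields travel together is sketched below. The source labels, the confidence scores, the single ISO date format and the fallback order are all assumptions made for illustration; the actual parser may use different values and many more formats.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import Dict, Optional


@dataclass
class ResolvedDate:
    published_at: Optional[date]
    date_source: Optional[str]        # where the date came from
    date_confidence: Optional[float]  # how much the parser trusts it


# Hypothetical labels and scores, tried in this order.
CANDIDATE_ORDER = [("metadata", 1.0), ("listing_text", 0.8), ("page_element", 0.6)]


def resolve_date(candidates: Dict[str, str]) -> ResolvedDate:
    """Return the first parseable candidate, or an all-null result."""
    for source, confidence in CANDIDATE_ORDER:
        raw = candidates.get(source)
        if not raw:
            continue
        try:
            parsed = datetime.strptime(raw.strip(), "%Y-%m-%d").date()
        except ValueError:
            continue  # unparseable here; fall through to the next source
        return ResolvedDate(parsed, source, confidence)
    # No usable candidate: the record keeps a null date.
    return ResolvedDate(None, None, None)
```

A page whose only date sits in free body text would produce no candidates in this sketch, which corresponds to the null-date case described above.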

Provenance

Every record stores source_url, scrape_run and scraped_at. The source URL is the office's page. The scrape run ties the row back to a collector pass. The scrape timestamp says when Capitol Releases saw it.

Records are never hard-deleted. If a source URL stops resolving on repeated checks, the row stays in the archive and gets a deleted_at tombstone.
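The tombstone behavior can be sketched as a soft delete. The field names source_url, scrape_run, scraped_at and deleted_at come from the text above; the failure counter, the threshold of three checks and the function name are hypothetical, since this page does not say how many repeated checks are required.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical threshold; the page only says "repeated checks".
FAILURES_BEFORE_TOMBSTONE = 3


@dataclass
class ArchiveRow:
    source_url: str
    scrape_run: str
    scraped_at: datetime
    deleted_at: Optional[datetime] = None
    failed_checks: int = 0  # hypothetical bookkeeping field


def record_check(row: ArchiveRow, resolved: bool, now: datetime) -> ArchiveRow:
    """Update a row after one availability check; the row is never removed."""
    if resolved:
        row.failed_checks = 0
        return row
    row.failed_checks += 1
    if row.failed_checks >= FAILURES_BEFORE_TOMBSTONE and row.deleted_at is None:
        row.deleted_at = now  # tombstone: the row stays in the archive
    return row
```

This sketch leaves deleted_at in place if the URL later resolves again. Whether the real pipeline clears the tombstone in that case is not stated here.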

Update cadence

GitHub Actions runs collection four times a day: 13:00, 17:00, 21:00 and 01:00 UTC. The same schedule refreshes WordPress JSON silos used for op-eds, newsletters, blogs and related official sections.
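In cron terms, that schedule would correspond to an expression such as `0 1,13,17,21 * * *`, though the project's actual workflow file is not shown here. For anyone who wants to know when the archive will next refresh, the sketch below computes the next run from the four stated hours; the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# The four stated collection hours, in UTC.
RUN_HOURS_UTC = (1, 13, 17, 21)


def next_run(now: datetime) -> datetime:
    """Return the first scheduled run strictly after `now`.

    `now` should be timezone-aware; it is converted to UTC first.
    """
    now = now.astimezone(timezone.utc)
    for day_offset in (0, 1):
        day = (now + timedelta(days=day_offset)).date()
        for hour in RUN_HOURS_UTC:
            candidate = datetime(day.year, day.month, day.day, hour, tzinfo=timezone.utc)
            if candidate > now:
                return candidate
    raise AssertionError("unreachable: a run always exists within two days")
```

GitHub Actions scheduled workflows can start late under load, so the computed time is a lower bound rather than a guarantee.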

A health check runs before every collection pass. It verifies that configured source pages respond, selectors still find items and dates remain parseable.
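The three checks can be sketched as one function per source. Everything named here is hypothetical: the real health check's interfaces are not documented on this page, so the sketch takes the fetcher and the selector logic as injected callables and assumes one raw date string per listed item in a single ISO format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List, Optional


@dataclass
class HealthResult:
    source_url: str
    responds: bool
    items_found: int
    dates_parseable: bool

    @property
    def healthy(self) -> bool:
        return self.responds and self.items_found > 0 and self.dates_parseable


def check_source(
    source_url: str,
    fetch: Callable[[str], Optional[str]],       # returns HTML, or None on failure
    extract_dates: Callable[[str], List[str]],   # one raw date string per item found
    date_format: str = "%Y-%m-%d",               # assumed format for the sketch
) -> HealthResult:
    html = fetch(source_url)
    if html is None:
        return HealthResult(source_url, False, 0, False)
    raw_dates = extract_dates(html)
    parseable = bool(raw_dates)
    for raw in raw_dates:
        try:
            datetime.strptime(raw, date_format)
        except ValueError:
            parseable = False
            break
    return HealthResult(source_url, True, len(raw_dates), parseable)
```

Injecting `fetch` and `extract_dates` keeps the sketch runnable without network access and lets each failure mode be exercised separately.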

Coverage status

The live coverage diagnostic is expected at docs/coverage-diagnostic-2026-05-03.json. Until that lands, this page points to the current House trouble list.

House coverage trouble sites, May 3, 2026

| Metric | Status | Note |
| --- | --- | --- |
| U.S. senators | 100 / 100 | 90 clean, 10 documented gaps |
| House members configured | 437 | Every configured member has a source row |
| House members reaching Jan. 2025 | 323 | 74% of configured House rows |
| House trouble list | 39 | Zero, null-date, selector, pagination or low-volume cases |

Known low-volume offices

Some offices publish rarely or not at all. Those rows will be marked in the seed files once the expected_low_volume and expected_zero fields land.

| Name | Chamber | District/state | Status | Reason | Last verified |
| --- | --- | --- | --- | --- | --- |
| Alan Armstrong | Senate | OK | Expected zero | Sworn in 2026-03-24 to fill ND seat vacated by Hoeven retirement; office is still in setup phase, no press releases published yet (verified 2026-05-03). | 2026-04-15 |

Schema history

The schema was renamed in May 2026 as the project moved from a Senate-only archive to Congress-wide coverage. The old senators table became officials, and press_releases became official_site_items. Compatibility views remain during the transition.
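A compatibility view lets queries written against the old table names keep working after a rename. The sketch below shows the idea with an in-memory SQLite database. Only the four table and view names come from the text above; the columns, the Senate-only filter and the choice of SQLite are assumptions made for illustration.

```python
import sqlite3


def build_compat_demo() -> sqlite3.Connection:
    """Create the renamed tables plus views under the old names (illustrative)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- New tables; column names are hypothetical.
        CREATE TABLE officials (
            id INTEGER PRIMARY KEY, name TEXT, chamber TEXT
        );
        CREATE TABLE official_site_items (
            id INTEGER PRIMARY KEY,
            official_id INTEGER REFERENCES officials(id),
            title TEXT
        );

        -- Old names kept as views. Assumption: they expose Senate rows only.
        CREATE VIEW senators AS
            SELECT id, name FROM officials WHERE chamber = 'senate';
        CREATE VIEW press_releases AS
            SELECT i.id, i.official_id AS senator_id, i.title
            FROM official_site_items AS i
            JOIN officials AS o ON o.id = i.official_id
            WHERE o.chamber = 'senate';

        INSERT INTO officials VALUES
            (1, 'Example Senator', 'senate'),
            (2, 'Example Representative', 'house');
        INSERT INTO official_site_items VALUES
            (1, 1, 'Senate item'),
            (2, 2, 'House item');
        """
    )
    return conn
```

A legacy query such as `SELECT name FROM senators` then runs unchanged against the view while new code reads from officials directly.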