Data Sources
PolicySoup ingests legislative data from official and authoritative public sources. This page describes each upstream source, what it provides, and how we use it.
Legislative data
Congress.gov API
The primary source for bill metadata, actions, committee referrals, cosponsor lists, official CRS summaries, bill text availability, subject classifications, nominations, and related-bill cross-references. Maintained by the Library of Congress and updated continuously as Congress acts.
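As a rough illustration of what a request to this source looks like, the sketch below builds a URL for a single bill record. It assumes the API's public v3 path layout (`/bill/{congress}/{type}/{number}`) and its `api_key` and `format` query parameters. The function name `bill_url` and the key value are hypothetical, and this is not a description of how PolicySoup's ingestion is actually implemented.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.congress.gov/v3"


def bill_url(congress, bill_type, number, api_key, sub_resource=""):
    """Build a Congress.gov API URL for one bill (hypothetical helper).

    sub_resource is optional, e.g. "actions" or "cosponsors".
    """
    path = f"{API_ROOT}/bill/{congress}/{bill_type.lower()}/{number}"
    if sub_resource:
        path += f"/{sub_resource}"
    query = urlencode({"api_key": api_key, "format": "json"})
    return f"{path}?{query}"


# "DEMO_KEY" is a placeholder, not a working credential.
print(bill_url(118, "HR", 1, "DEMO_KEY"))
print(bill_url(118, "HR", 1, "DEMO_KEY", "cosponsors"))
```

Fetching the resulting URL requires a real API key and network access, so the sketch stops at URL construction.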
House Office of the Clerk
Roll call vote XML feeds for House floor votes, member data, and committee membership lists. Used to ingest individual legislator vote positions and to verify committee assignments.
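To make the vote-position step concrete, here is a minimal sketch of reading individual positions out of a roll call document. The element and attribute names (`recorded-vote`, `legislator`, `name-id`, `vote`) are modeled on the Clerk's published format but should be checked against a live file. The sample document and its member IDs are made up for illustration, and `parse_house_votes` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Trimmed, fabricated sample; real files carry more metadata and attributes.
SAMPLE_HOUSE_XML = """
<rollcall-vote>
  <vote-metadata>
    <congress>118</congress>
    <rollcall-num>1</rollcall-num>
  </vote-metadata>
  <vote-data>
    <recorded-vote>
      <legislator name-id="X000001" party="D" state="OH">Doe</legislator>
      <vote>Yea</vote>
    </recorded-vote>
    <recorded-vote>
      <legislator name-id="X000002" party="R" state="TX">Roe</legislator>
      <vote>Nay</vote>
    </recorded-vote>
  </vote-data>
</rollcall-vote>
"""


def parse_house_votes(xml_text):
    """Return a dict mapping each legislator's name-id to their vote."""
    root = ET.fromstring(xml_text.strip())
    positions = {}
    for recorded in root.iter("recorded-vote"):
        legislator = recorded.find("legislator")
        vote = recorded.find("vote")
        if legislator is None or vote is None:
            continue  # skip malformed entries rather than guess
        positions[legislator.get("name-id")] = (vote.text or "").strip()
    return positions


print(parse_house_votes(SAMPLE_HOUSE_XML))
```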
United States Senate
Roll call vote records for Senate floor votes, including individual senator positions. Senate vote data is parsed from the official XML feeds published after each roll call.
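The Senate feed can be handled in a similar way. The sketch below tallies positions from a roll call document, assuming each senator appears as a `member` element with `lis_member_id` and `vote_cast` children. Those names follow the Senate's published XML as we understand it and should be verified against a live file. The sample data is fabricated and `tally_senate_votes` is a hypothetical helper.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Trimmed, fabricated sample with made-up member IDs.
SAMPLE_SENATE_XML = """
<roll_call_vote>
  <congress>118</congress>
  <session>1</session>
  <vote_number>1</vote_number>
  <members>
    <member>
      <lis_member_id>S001</lis_member_id>
      <vote_cast>Yea</vote_cast>
    </member>
    <member>
      <lis_member_id>S002</lis_member_id>
      <vote_cast>Yea</vote_cast>
    </member>
    <member>
      <lis_member_id>S003</lis_member_id>
      <vote_cast>Nay</vote_cast>
    </member>
  </members>
</roll_call_vote>
"""


def tally_senate_votes(xml_text):
    """Count how many senators cast each position (Yea, Nay, ...)."""
    root = ET.fromstring(xml_text.strip())
    return Counter(
        (member.findtext("vote_cast") or "").strip()
        for member in root.iter("member")
    )


tally = tally_senate_votes(SAMPLE_SENATE_XML)
print(tally["Yea"], tally["Nay"])
```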
unitedstates/congress-legislators
github.com/unitedstates/congress-legislators
A community-maintained, authoritative dataset of current and historical members of Congress. Provides biographical details, party affiliation, state and district assignments, social media handles, and official website URLs. We extend this dataset with Bluesky handle discovery for legislators active on the AT Protocol.
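For a sense of how a record from this dataset might be read, the sketch below picks a legislator's most recent term and summarizes their current assignment. It assumes the dataset's layout of an `id` block, a `name` block, and a `terms` list with `type`, `start`, `state`, `district`, and `party` fields. The sample record is invented, and `summarize_legislator` is a hypothetical helper rather than part of the dataset's tooling.

```python
# Fabricated record shaped like one dataset entry (after loading YAML/JSON).
SAMPLE_LEGISLATOR = {
    "id": {"bioguide": "X000000"},
    "name": {"first": "Jane", "last": "Doe", "official_full": "Jane Doe"},
    "terms": [
        {"type": "rep", "start": "2019-01-03", "end": "2021-01-03",
         "state": "OH", "district": 3, "party": "Democrat"},
        {"type": "rep", "start": "2021-01-03", "end": "2023-01-03",
         "state": "OH", "district": 3, "party": "Democrat"},
    ],
}


def summarize_legislator(legislator):
    """Summarize the most recent term of one legislator record."""
    # ISO-8601 date strings sort chronologically as plain strings.
    term = max(legislator["terms"], key=lambda t: t["start"])
    name = legislator["name"]
    return {
        "bioguide": legislator["id"]["bioguide"],
        "name": name.get("official_full") or f"{name['first']} {name['last']}",
        "chamber": "Senate" if term["type"] == "sen" else "House",
        "party": term.get("party"),
        "state": term["state"],
        "district": term.get("district"),  # absent for senators
        "term_start": term["start"],
    }


print(summarize_legislator(SAMPLE_LEGISLATOR))
```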
News coverage
PolicySoup ingests news articles from a curated set of RSS feeds to surface relevant coverage alongside legislation. Current sources include:
- BBC News – US & Canada politics
- NPR – Politics feed
- PBS NewsHour – Politics
- Politico – Congress feed
- Axios – General news feed
- The Hill – News feed
Articles are matched to relevant bills using keyword and metadata heuristics. News coverage is supplementary — it does not affect vote data, bill metadata, or AI analysis.
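One simple form a keyword heuristic could take is token overlap between an article and a bill title. The sketch below parses RSS items and links an article to a bill when they share at least two meaningful words. The stopword list, the threshold, the sample feed, and the bill identifier are all illustrative assumptions; this does not describe PolicySoup's actual matching rules.

```python
import re
import xml.etree.ElementTree as ET

# Fabricated feed and bill title, for illustration only.
SAMPLE_RSS = """
<rss version="2.0"><channel><title>Example feed</title>
<item><title>House passes farm subsidy overhaul</title>
<link>https://example.com/a</link>
<description>Lawmakers approved changes to farm subsidy programs.</description></item>
<item><title>Local team wins championship</title>
<link>https://example.com/b</link>
<description>Sports news.</description></item>
</channel></rss>
"""
BILLS = {"hr-0000": "Farm Subsidy Modernization Act"}

STOPWORDS = {"the", "and", "for", "act", "bill", "with", "from"}


def tokens(text):
    """Lowercase word set, minus stopwords and very short words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if len(w) > 2 and w not in STOPWORDS}


def parse_rss_items(xml_text):
    """Extract title, link, and description from each RSS <item>."""
    root = ET.fromstring(xml_text.strip())
    return [
        {field: item.findtext(field, default="")
         for field in ("title", "link", "description")}
        for item in root.iter("item")
    ]


def match_articles(items, bills, min_overlap=2):
    """Return (link, bill_id, shared_words) for each plausible match."""
    matches = []
    for item in items:
        article_words = tokens(item["title"] + " " + item["description"])
        for bill_id, bill_title in bills.items():
            shared = article_words & tokens(bill_title)
            if len(shared) >= min_overlap:
                matches.append((item["link"], bill_id, sorted(shared)))
    return matches


print(match_articles(parse_rss_items(SAMPLE_RSS), BILLS))
```

A threshold this low will produce both false positives and misses, which is in keeping with the heuristic nature of the matching.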
Social discussion
PolicySoup surfaces public posts on the ATmosphere (the network of apps built on the AT Protocol, such as Bluesky) that reference legislation. Matched posts are displayed alongside bill pages as public discussion.
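One way a post could be recognized as referencing legislation is by spotting a bill citation in its text. The sketch below is an assumption about how such detection might work, not a description of PolicySoup's matcher. It covers only House and Senate bills ("H.R." and "S."), leaves out resolutions, and uses the hypothetical helper name `extract_bill_refs`.

```python
import re

# Case-sensitive on purpose: a lowercase "s" before a number (as in
# "It's 5 o'clock") should not be read as a Senate bill.
BILL_REF = re.compile(r"\b(?P<type>H\.?\s?R\.?|S\.?)\s?(?P<number>\d{1,5})\b")


def extract_bill_refs(text):
    """Return unique (bill_type, number) pairs in order of appearance."""
    refs = []
    for match in BILL_REF.finditer(text):
        # Normalize "H.R.", "H. R.", and "HR" to "hr"; "S." and "S" to "s".
        bill_type = re.sub(r"[.\s]", "", match.group("type")).lower()
        ref = (bill_type, int(match.group("number")))
        if ref not in refs:
            refs.append(ref)
    return refs


print(extract_bill_refs("Thoughts on H.R. 1234 and S. 567? Also HR 1234 again."))
```

A citation alone does not say which Congress a bill belongs to, so a real matcher would need additional context, such as the post date, to resolve the reference.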
AI analysis
PolicySoup generates plain-language analysis from bill text and related metadata using large language models (LLMs). Analysis may include:
- Plain-language summaries – What the bill does, in plain English.
- Stakeholder impacts – How the bill could affect taxpayers, specific industries, social groups, government agencies, and foreign policy.
- Passage outlook – An assessment of how likely the bill is to advance through each chamber, based on its sponsors, cosponsors, committee status, and political environment.
- Ideological perspectives – How the bill is likely to be viewed from liberal, centrist, and conservative vantage points, including potential benefits and concerns from each.
- Legislative design notes – An assessment of the bill's drafting quality, scope alignment, and technical craftsmanship.
These outputs are interpretive aids, not official descriptions. They are generated automatically and have not been reviewed or endorsed by any government body, lawmaker, or policy expert.
Known limitations
- Data lag. Congressional metadata — particularly bill actions, committee assignments, and vote records — can lag behind real-world events by hours or days depending on when official sources publish updates.
- AI accuracy. AI-generated summaries and analysis can miss important context, overstate certainty, or reflect gaps in the source material. Bills with sparse official text or limited public record are harder to analyse accurately.
- Passage predictions. Passage likelihood scores are model estimates based on the signals observable at a point in time. They are not guarantees and should not be treated as outcomes.
- Bill type variation. Symbolic resolutions, commemorative bills, and procedural measures require different interpretation than substantive legislation. Our analysis attempts to flag these cases, but context always matters.
- Historical coverage. Data completeness is strongest for recent congresses (118th and 119th). Older congresses may have incomplete vote records, missing bill text, or limited analysis.
- News matching. Article-to-bill matching is automated and imperfect. Some articles may be linked to the wrong bill, and many relevant articles may not be matched at all.
For questions about our data or methodology, contact hello@policysoup.com.