Methodology
Data sources
- Utility registry — EPA Safe Drinking Water Information System (SDWIS), which lists all ~50,000 active community water systems in the US. EPA SDWIS ↗
- Rate schedules — collected directly from each utility's published documents: PDFs, web pages, and official fee schedules.
- Median household income — US Census Bureau American Community Survey (ACS) 2022 5-year estimates, variable B19013_001E, at the census tract and zip code tabulation area (ZCTA) level. ACS 5-year ↗
- Service area boundaries — EPA Water System Service Areas dataset, where available. EPA Service Areas ↗
- Census tract boundaries — Census Bureau TIGER/Line Cartographic Boundary Files, 2022 vintage. Census Boundaries ↗
Data collection and extraction
Rate data is sourced directly from official utility publications. We use Google Gemini Flash to read rate schedule documents — PDFs and web pages — and extract structured rate information into a consistent schema adapted from the Open Water Rate Specification (OWRS) ↗. PDFs are sent to the model in full as native file uploads; HTML pages are sent as text. This handles complex table layouts, multi-column formats, and scanned documents without manual preprocessing.
A classifier model screens each document before extraction to reject wrong document types (sewer-only schedules, connection fee sheets, payment portals, etc.). Bill calculation, tier ordering, and unit conversion are all deterministic rule-based code, not model inference. The source document, model name, and extraction date are recorded for every entry and shown in the calculation detail on each utility's page.
Benchmark calculation
All rates are standardized to 6,000 gallons per month for a residential customer with a 5/8" or 3/4" meter. This approximates a typical US household: about 50 gallons per person per day for a 4-person household over a 30-day month (50 × 4 × 30 = 6,000). This consumption level is used as a reference in EPA affordability analysis. EPA Water Affordability Needs Assessment ↗
The monthly cost includes:
- Fixed base / service charge
- Volumetric charges across all applicable tiers
It excludes taxes, sewer fees, stormwater charges, and other surcharges, which vary widely and are often billed separately.
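As a rough illustration of the benchmark described above, the sketch below computes a bill from a fixed charge plus tiered volumetric charges. The function names, the tier representation (start, end, price), and the example schedule are hypothetical and are not taken from our production code; the conversion factors are standard volume conversions.

```python
# Gallons in one billing unit (standard volume conversions).
GALLONS_PER_UNIT = {
    "gal": 1.0,
    "kgal": 1000.0,
    "ccf": 748.052,  # 100 cubic feet
    "hcf": 748.052,  # same quantity as ccf
    "cf": 7.48052,
    "m3": 264.172,
}


def volumetric_charge(usage, tiers):
    """Charge for `usage` (in the schedule's own unit) across ordered tiers.

    Each tier is (start, end, price_per_unit); the last tier has end=None.
    """
    total = 0.0
    for start, end, price in tiers:
        if usage <= start:
            break  # usage never reaches this tier
        upper = usage if end is None else min(usage, end)
        total += (upper - start) * price
    return total


def benchmark_bill(fixed_charge, tiers, unit, gallons=6000.0):
    """Fixed charge plus volumetric charges at the benchmark volume."""
    usage = gallons / GALLONS_PER_UNIT[unit]
    return fixed_charge + volumetric_charge(usage, tiers)


# Hypothetical schedule: $12 base charge, three tiers priced per kgal.
example_tiers = [(0, 2, 3.00), (2, 5, 4.00), (5, None, 6.00)]
example_bill = benchmark_bill(12.00, example_tiers, "kgal")
# 12 + (2 x 3) + (3 x 4) + (1 x 6) = 36.00
```

Converting the 6,000-gallon benchmark into the schedule's unit, rather than converting every tier into gallons, keeps the extracted tier boundaries untouched.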
Seasonal rates
Some utilities charge different volumetric rates by season. Our approach:
- Benchmark — we report the peak season rate as the headline figure, consistent with established affordability analysis methodology. Teodoro (2018), Journal AWWA ↗
- Rationale — a household that cannot afford the peak bill faces genuine access risk regardless of what they pay in other months.
- Full breakdown — all seasonal rates are shown in the calculation detail on each utility's page.
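A minimal sketch of the headline selection, assuming the peak season is simply the season with the highest computed benchmark bill. That reading of "peak" is an assumption made for illustration, and the function name and season labels are hypothetical.

```python
def headline_seasonal_bill(bills_by_season):
    """Return (season, bill) for the season with the highest benchmark bill.

    `bills_by_season` maps a season label to its computed monthly bill.
    """
    season = max(bills_by_season, key=bills_by_season.get)
    return season, bills_by_season[season]


# Hypothetical utility with a summer surcharge.
headline = headline_seasonal_bill({"winter": 38.50, "summer": 52.00})
```

The full dictionary is kept alongside the headline figure so that every seasonal rate can still be displayed.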
Zone-based rates
Some utilities charge different rates by geographic pressure zone or service district. Our approach:
- Zone selection — we use Zone 1 (or the lowest-numbered, lowest-elevation zone), which represents the standard rate for the typical customer.
- In-district — inside-city vs. outside-city distinctions are not treated as zones; we always use the in-district residential rate.
- Multi-zone notation — the zone name is recorded for utilities with multiple zones.
Billing frequency
Some utilities invoice bimonthly or quarterly rather than monthly. We normalize all figures to a monthly equivalent:
- Volume is scaled to the billing period (e.g. ×2 for bimonthly, ×3 for quarterly).
- The full-period bill is computed against the scaled volume and tier structure.
- The result is divided back to a per-month figure.
This ensures direct comparability across utilities regardless of how they invoice customers.
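The three normalization steps can be sketched as follows. This is an illustrative example, not our production code: the function names and the sample quarterly schedule are hypothetical, and usage is expressed in the schedule's own unit.

```python
PERIOD_MONTHS = {"monthly": 1, "bimonthly": 2, "quarterly": 3}


def period_bill(fixed_charge, tiers, usage):
    """Bill for one billing period; tiers are (start, end, price), last end=None."""
    total = fixed_charge
    for start, end, price in tiers:
        if usage <= start:
            break
        upper = usage if end is None else min(usage, end)
        total += (upper - start) * price
    return total


def monthly_equivalent(fixed_charge, tiers, frequency, monthly_usage):
    """Scale volume to the billing period, bill it, then divide back to one month."""
    months = PERIOD_MONTHS[frequency]
    full_period = period_bill(fixed_charge, tiers, monthly_usage * months)
    return full_period / months


# Hypothetical quarterly schedule in kgal: $30 per quarter,
# $3/kgal up to 15 kgal, $5/kgal above that.
quarterly_tiers = [(0, 15, 3.00), (15, None, 5.00)]
per_month = monthly_equivalent(30.00, quarterly_tiers, "quarterly", 6.0)
# 18 kgal per quarter: 30 + (15 x 3) + (3 x 5) = 90, or 30 per month
```

Scaling the volume before applying the tiers matters because quarterly tier boundaries are defined per quarter; applying them to a single month of usage would understate the bill.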
Water burden
Water burden is calculated as annual water cost as a percentage of median household income:
Water burden (%) = (monthly benchmark bill × 12) ÷ median household income × 100
The EPA's affordability guidance applies a 4% threshold to combined water and sewer costs. Since whatwatercosts.org reports water-only rates, we use 2% as the equivalent benchmark for water service alone. Utilities where water costs exceed 2% of median household income are flagged as a potential affordability concern. EPA Financial Capability Assessment Guidance (1997) ↗
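A short sketch of the burden calculation and the 2% flag. The function and constant names are hypothetical, and the dollar figures are made-up inputs for illustration only.

```python
WATER_ONLY_THRESHOLD_PCT = 2.0  # water-only benchmark described above


def water_burden_pct(monthly_bill, median_household_income):
    """Annual water cost as a percentage of median household income."""
    return monthly_bill * 12 / median_household_income * 100


def is_affordability_concern(monthly_bill, median_household_income):
    """True when the burden exceeds the 2% water-only threshold."""
    burden = water_burden_pct(monthly_bill, median_household_income)
    return burden > WATER_ONLY_THRESHOLD_PCT


# $75/month against a $40,000 median income: 900 / 40,000 = 2.25%
high = water_burden_pct(75.00, 40_000)
# $36/month against a $60,000 median income: 432 / 60,000 = 0.72%
low = water_burden_pct(36.00, 60_000)
```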
Median household income is matched to each utility in two stages:
- Primary — tract-weighted: for utilities with a mapped service area polygon (from the EPA Water System Service Areas dataset), we intersect the polygon with 2022 Census tract boundaries and compute a population-weighted average of median household income across all overlapping tracts. Source: ACS 2022 5-year estimates, B19013_001E + B01003_001E.
- Fallback — zip code: for utilities without a service area polygon, we take the utility's zip code from SDWIS and match it to the corresponding zip code tabulation area (ZCTA). Source: ACS 2022 5-year estimates, B19013_001E.
Even with tract-level matching, service area boundaries are an approximation. A utility's true customer base may differ from its mapped footprint.
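The tract-weighted step can be sketched like this, assuming the spatial intersection has already produced a list of overlapping tracts. The function name and sample values are hypothetical. Skipping tracts with a missing or negative median is also an assumption; it guards against placeholder values that can appear where an estimate is unavailable.

```python
def population_weighted_income(tracts):
    """Population-weighted average of tract median household incomes.

    `tracts` is an iterable of (median_income, population) pairs, i.e.
    B19013_001E and B01003_001E for each overlapping tract.
    Returns None when no tract has usable data.
    """
    weighted_sum = 0.0
    total_population = 0
    for income, population in tracts:
        # Assumption: skip tracts with missing or placeholder income values.
        if income is None or income < 0 or population <= 0:
            continue
        weighted_sum += income * population
        total_population += population
    if total_population == 0:
        return None
    return weighted_sum / total_population


# Two hypothetical tracts: 3,000 people at $50,000 and 1,000 at $80,000.
mhi = population_weighted_income([(50_000, 3_000), (80_000, 1_000)])
# (150,000,000 + 80,000,000) / 4,000 = 57,500
```

Note that a weighted average of medians is an approximation of the service area's true median income, not the median itself.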
Data quality
Every extraction passes a two-layer validation protocol before publication. Status labels describe the outcome. They do not mean the utility has approved the number — all figures are standardized estimates for comparison, not official bills.
Hard checks
These must all pass or the extraction is rejected outright, regardless of other results.
- Known unit — the rate unit must be one of: gal, kgal, ccf, hcf, cf, m3. An unknown unit means we cannot compute a bill.
- Tier boundary integrity — tier boundaries must be gapless and contiguous: each tier's start equals the previous tier's end, the first tier starts at 0, and the last tier has no upper bound.
- Bill sanity ($10–$300) — the computed 6,000-gallon benchmark bill must be within a plausible range for US residential water. Bills outside this range almost always indicate unit confusion, a billing-period error, or a commercial rate being applied.
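The three hard checks can be sketched as a single gate. This is an illustration under stated assumptions: the function names and tier representation (start, end, price, with end=None for the open-ended last tier) are hypothetical, and treating the $10 and $300 limits as inclusive is a guess.

```python
KNOWN_UNITS = {"gal", "kgal", "ccf", "hcf", "cf", "m3"}


def tiers_are_contiguous(tiers):
    """First tier starts at 0, each start equals the previous end,
    and only the last tier is open-ended (end=None)."""
    if not tiers or tiers[0][0] != 0:
        return False
    for (_, prev_end, _), (start, _, _) in zip(tiers, tiers[1:]):
        if prev_end is None or start != prev_end:
            return False
    return tiers[-1][1] is None


def passes_hard_checks(unit, tiers, bill, low=10.0, high=300.0):
    """All three hard checks must pass; any failure rejects the extraction."""
    return (
        unit in KNOWN_UNITS
        and tiers_are_contiguous(tiers)
        and low <= bill <= high  # assumption: bounds are inclusive
    )


good_tiers = [(0, 2, 3.00), (2, None, 4.00)]
gapped_tiers = [(0, 2, 3.00), (3, None, 4.00)]  # gap between 2 and 3
```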
Scored checks
Each check is pass, fail, or not applicable. An extraction must pass at least 80% of applicable checks to be published. Results are shown in the "How we calculated this" detail on each utility's page.
- Monotonic pricing — tier prices should be non-decreasing (conservation pricing). Declining block rates fail this check but are not rejected on that basis alone.
- Price range plausible — per-unit prices must fall within known bounds for the stated unit (e.g. $0.75–$25/ccf). Values outside these ranges almost always indicate a unit mismatch.
- Fixed charge range — the base service charge must be between $0 and $150/month. Higher values usually mean a quarterly or bimonthly charge was mistakenly treated as monthly.
- No zero tier prices — free water is almost never correct for a residential rate schedule.
- Tier count (1–8) — more than 8 tiers is highly unusual and suggests a parsing error.
- Tier boundary scale — tier boundaries must be consistent with the stated unit. A boundary of 6,000 with unit=kgal implies 6 million gallons, a common sign of gallon/kgal confusion.
- Residential customer class — the extracted tariff should be for residential customers.
- Effective date parseable — if an effective date is present, it must be a valid date and not more than one year in the future.
- Plausible for state — the benchmark bill must fall within the Tukey fence (Q1 − 1.5×IQR to Q3 + 1.5×IQR) of other utilities in the same state. Only applied when at least 10 state peers have valid rates.
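Two pieces of the scored checklist lend themselves to a short sketch: the Tukey fence used for the state plausibility check, and the 80% pass rule with "not applicable" results excluded. The function names are hypothetical, and the quartile method (Python's "inclusive" interpolation) is an assumption; other quartile conventions give slightly different fences.

```python
import statistics


def tukey_fence(values, k=1.5):
    """Return (Q1 - k*IQR, Q3 + k*IQR) for a list of peer bills."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def plausible_for_state(bill, peer_bills, min_peers=10):
    """True/False inside or outside the fence; None (not applicable)
    when the state has fewer than `min_peers` valid peers."""
    if len(peer_bills) < min_peers:
        return None
    low, high = tukey_fence(peer_bills)
    return low <= bill <= high


def passes_scored_checks(results, threshold=0.8):
    """`results` holds True (pass), False (fail), or None (not applicable)."""
    applicable = [r for r in results if r is not None]
    if not applicable:
        return True  # assumption: nothing applicable means nothing failed
    return sum(applicable) / len(applicable) >= threshold


# Hypothetical peer bills for one state.
peers = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
```

With these peers, a $70 bill falls inside the fence while a $150 bill falls outside it, and the check is skipped entirely if only five peers are available.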
Status labels
- flagged
- Failed a hard check or scored below 80% on the scored checklist. Not shown in comparisons, rankings, or averages.
- model estimate
- Passed all hard checks and ≥80% of scored checks. Extracted by a single model pass. The extraction and bill calculation are internally consistent but have not been independently verified.
- model verified
- A second model independently extracted the same rate schedule and reached the same result, increasing confidence in the figures.
- human verified
- A person manually checked the estimate against the source document and confirmed or corrected it.
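One way to read the four labels is as a simple precedence rule, sketched below. The function and argument names are hypothetical, and the ordering (human verification outranks model verification, and a failed validation always yields "flagged") is an assumption made for illustration.

```python
def status_label(hard_ok, scored_ok, second_model_agrees=False, human_checked=False):
    """Map validation outcomes to one of the four status labels."""
    if not (hard_ok and scored_ok):
        return "flagged"
    if human_checked:
        return "human verified"
    if second_model_agrees:
        return "model verified"
    return "model estimate"


label = status_label(hard_ok=True, scored_ok=True, second_model_agrees=True)
```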
Limitations
Rate schedules change. We do our best to keep data current but cannot guarantee real-time accuracy. Seasonal rates, drought surcharges, and budget-based tiers may not be fully captured. Low-income assistance programs are not reflected in the benchmark figure. Service area boundaries are approximations; a utility's true customer base may not align with its mapped footprint.
If you find an error, please use the feedback form on the utility's page or contact us directly.
How to cite
If you use whatwatercosts.org data in research or reporting, please cite it as:
Raw data and machine-readable descriptions are available at /llms.txt.
Open source
We're committed to transparency. Full open source code is coming soon. In the meantime, data corrections and feedback are always welcome via the form on each utility's page or by contacting us directly.