Sources, Methodology, & Analytical Choices
The story of location-driven salary variation demands clear assumptions.
Focus
This slide lays out the evidence base behind the story's focus. The sources and methodology are included so that the central claim, that salary differences are mainly tied to COLA and population density, can be evaluated transparently.
Primary Data Source
Bureau of Labor Statistics (BLS) Occupational Employment and Wage Statistics (OEWS):
May 2024 state-level wage estimates for standard occupation codes aligned with Data Practitioner roles (Data Scientist, Data Engineer, Data Analyst, Business Analyst, Data Architect).
Geographic & Economic Data
Population Density: U.S. Census Bureau state-level population density (2024 estimates).
Cost of Living (COLA Index): BLS Average Energy Prices, plus state-level housing and general cost-of-living proxies, indexed so that the 2024 U.S. average = 100.
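As a rough illustration of what "U.S. average = 100" means in practice, the sketch below rescales raw cost figures against a national baseline. The function name `to_index` and all numbers are hypothetical placeholders, not values from the project's data.

```python
def to_index(values, baseline):
    """Rescale raw values so that `baseline` maps to an index of 100."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return {name: round(value / baseline * 100, 1) for name, value in values.items()}


# Placeholder cost figures (illustrative only, not real state data).
raw_costs = {"State A": 1500.0, "State B": 900.0}
us_average = 1200.0  # hypothetical national baseline

cola_index = to_index(raw_costs, us_average)
print(cola_index)  # {'State A': 125.0, 'State B': 75.0}
```

An index above 100 marks a location as costlier than the national baseline, and one below 100 as cheaper.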
Role-to-Occupation Mapping
Broad role descriptors map to BLS standard occupational categories:
- Data Scientist → 15-2051 (Data Scientists)
- Data Engineer → 15-1252 (Software Developers)
- Data Analyst → 13-1111 (Management Analysts)
- Business Analyst → 13-1111 (Management Analysts)
- Data Architect → 15-1243 (Database Architects)
This is an analytical approximation, not an exhaustive labor market census. Real practitioners span multiple roles and titles.
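One consequence of this mapping is easier to see in code: two roles point at the same occupation code, so any figure keyed on the code alone cannot tell them apart. The sketch below copies the codes listed above; the helper name `roles_by_soc` is illustrative and not taken from the project.

```python
# Role-to-SOC lookup, copied from the mapping listed above.
ROLE_TO_SOC = {
    "Data Scientist": "15-2051",
    "Data Engineer": "15-1252",
    "Data Analyst": "13-1111",
    "Business Analyst": "13-1111",
    "Data Architect": "15-1243",
}


def roles_by_soc(mapping):
    """Invert the mapping to show which roles share an occupation code."""
    inverted = {}
    for role, soc in mapping.items():
        inverted.setdefault(soc, []).append(role)
    return inverted


# Keep only the codes that more than one role maps to.
shared = {soc: roles for soc, roles in roles_by_soc(ROLE_TO_SOC).items() if len(roles) > 1}
print(shared)  # {'13-1111': ['Data Analyst', 'Business Analyst']}
```

Because Data Analyst and Business Analyst both resolve to 13-1111, their wage estimates under this mapping come from the same underlying BLS series.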
Location Categories & "Remote"
Remote work: Treated as its own category because remote workers typically negotiate against a national labor market rather than local cost of living. This decouples their pay from the geographic anchors (density, COLA) used for the other categories.
Regional grouping: Northeast, Southeast, Midwest, Southwest, and West are aggregated into location types (Urban High-COLA, Suburban Medium-COLA, etc.) to reveal patterns. This is a simplification: conditions vary widely within each region.
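A minimal sketch of how location types of this kind might be assigned, assuming each area has a population density and a COLA index. The cutoffs, the "Rural" and "Low" tiers, and the function name `location_type` are all hypothetical; the project's actual grouping rules are not described here.

```python
def location_type(density, cola_index, remote=False,
                  density_cuts=(100.0, 400.0), cola_cuts=(95.0, 110.0)):
    """Label an area by density tier and COLA tier; remote is its own category.

    The default cutoffs are illustrative assumptions, not the project's values.
    """
    if remote:
        # Remote roles are not tied to a local density or COLA value.
        return "Remote"

    if density >= density_cuts[1]:
        density_tier = "Urban"
    elif density >= density_cuts[0]:
        density_tier = "Suburban"
    else:
        density_tier = "Rural"

    if cola_index >= cola_cuts[1]:
        cola_tier = "High"
    elif cola_index >= cola_cuts[0]:
        cola_tier = "Medium"
    else:
        cola_tier = "Low"

    return f"{density_tier} {cola_tier}-COLA"


print(location_type(500.0, 120.0))              # Urban High-COLA
print(location_type(200.0, 100.0))              # Suburban Medium-COLA
print(location_type(500.0, 120.0, remote=True)) # Remote
```

Hard cutoffs like these are exactly where within-region variation gets flattened: two areas just either side of a threshold receive different labels despite being nearly identical.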
What We Don't Measure
- Real purchasing power: COLA provides a proxy, but individual costs vary (e.g., some remote workers earn high salaries while living in low-COLA areas).
- Experience & education: BLS averages mask variation within roles across seniority levels.
- Underemployment & underreporting: Gig workers, contract roles, and informal arrangements are not captured.
- Benefits, equity, and total compensation: The analysis is salary-only.
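To make the purchasing-power caveat concrete, the sketch below shows the kind of coarse adjustment a COLA index allows: dividing a nominal salary by the index (with 100 as the U.S. average). The function name and the figures are hypothetical, and this is the proxy itself, not a measure of any individual's real costs.

```python
def cola_adjusted(salary, cola_index):
    """Express a nominal salary in U.S.-average dollars (index 100 = U.S. average)."""
    if cola_index <= 0:
        raise ValueError("cola_index must be positive")
    return salary * 100 / cola_index


# Same nominal salary, two hypothetical locations (illustrative numbers only).
high_cola = cola_adjusted(120_000, 125)  # on-site worker in a high-COLA area
low_cola = cola_adjusted(120_000, 80)    # remote worker living in a low-COLA area

print(high_cola)  # 96000.0
print(low_cola)   # 150000.0
```

The gap between the two results is the effect the first bullet describes: identical pay can imply very different purchasing power, and a single index per location still hides household-level differences.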
Data Pipeline
The notebook data_practitioner_salary_analysis.ipynb fetches BLS OEWS data via API, aggregates by role and state, enriches with COLA and density metadata, and exports to JSON. The Next.js app then reads these static JSON files and renders interactive visualizations. This pipeline is designed to support the story focus by separating role effects from the geographic effects of COLA and population density.
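The aggregate, enrich, and export steps can be sketched with the standard library alone. This is an illustration under stated assumptions, not the notebook's code: the field names (`annual_wage`, `mean_wage`, `cola_index`, `density`), the helper `build_records`, and every value are hypothetical, and the API fetch is replaced by in-memory placeholder rows.

```python
import json
from collections import defaultdict
from statistics import mean

# Placeholder rows standing in for fetched wage records (not real data).
wage_rows = [
    {"role": "Data Engineer", "state": "AA", "annual_wage": 100000},
    {"role": "Data Engineer", "state": "AA", "annual_wage": 110000},
    {"role": "Data Analyst", "state": "BB", "annual_wage": 80000},
]

# Placeholder per-state metadata used for enrichment (not real data).
state_meta = {
    "AA": {"cola_index": 120.0, "density": 500.0},
    "BB": {"cola_index": 90.0, "density": 50.0},
}


def build_records(rows, meta):
    """Aggregate wages by (role, state), then attach COLA and density metadata."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[(row["role"], row["state"])].append(row["annual_wage"])

    records = []
    for (role, state), wages in sorted(grouped.items()):
        records.append({
            "role": role,
            "state": state,
            "mean_wage": mean(wages),
            "count": len(wages),       # number of rows behind the average
            **meta.get(state, {}),     # enrichment; empty if state is unknown
        })
    return records


records = build_records(wage_rows, state_meta)
payload = json.dumps(records, indent=2)  # static JSON text a front end could read
print(payload)
```

Keeping role, state, COLA, and density as separate fields on each record is what lets a front end group or filter on one dimension while holding the others fixed.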
Project Download
Download the complete Next.js project archive, including the static data files used in this story.
Download Project Zip
Limitations & Next Steps
- COLA is a coarse proxy for living costs; housing costs dominate but vary drastically within regions.
- Sample sizes vary by occupation and region (see "count" in JSON), affecting reliability.
- Remote salary classification is approximate; many "remote" roles may be hybrid or constrained to certain regions.
- Future work: compare salaries to career trajectory, retention, and actual cost-of-living burden by location.
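One way a reader might act on the sample-size caveat is to flag records whose "count" falls below a chosen threshold before comparing them. The sketch below assumes each JSON record carries a numeric `count` field; the threshold of 30, the `low_sample` key, and the example records are hypothetical.

```python
def flag_low_count(records, min_count=30):
    """Return copies of the records with a boolean `low_sample` flag added.

    `min_count` is an illustrative cutoff, not a value used by the project.
    Records without a `count` field are treated as low-sample.
    """
    return [{**record, "low_sample": record.get("count", 0) < min_count}
            for record in records]


# Placeholder records (illustrative only).
example = [
    {"role": "Data Architect", "state": "AA", "count": 12},
    {"role": "Data Engineer", "state": "AA", "count": 340},
]

flagged = flag_low_count(example)
print([r["low_sample"] for r in flagged])  # [True, False]
```

Flagging rather than dropping keeps thin cells visible, so a chart can de-emphasize them instead of silently omitting them.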