California Public Expenditure Analysis¶
Exploratory Data Analysis | 2013-2023
This notebook analyzes city-level public expenditure data across California's four regions: Northern California, Southern California, Central California, and Sierra Nevada.
Data source: California State Controller's Office
Files used: City_Expenditures_Per_Capita.csv, city_to_region.csv
1. Load & Merge Data¶
import pandas as pd
expenditure = pd.read_csv('City_Expenditures_Per_Capita.csv')
regions = pd.read_csv('city_to_region.csv')
df = expenditure.merge(regions, on='Entity Name', how='inner')
df = df[(df['Fiscal Year'] >= 2013) & (df['Fiscal Year'] <= 2023)]
df = df.dropna(subset=['Expenditures Per Capita'])
print(f'Records: {len(df):,}')
print(f'Cities: {df["Entity Name"].nunique()}')
print(f'Years: {df["Fiscal Year"].min()} - {df["Fiscal Year"].max()}')
table_style = [
{'selector': 'th', 'props': [('font-weight', 'bold'), ('text-align', 'left'), ('padding', '6px 12px'), ('background', '#f0f4f8'), ('color', '#212529')]},
{'selector': 'td', 'props': [('padding', '6px 12px'), ('color', '#212529'), ('background', '#ffffff')]},
{'selector': '', 'props': [('border-collapse', 'collapse')]}
]
df.head().style.set_table_styles(table_style)
Records: 5,147 Cities: 468 Years: 2013 - 2023
| Entity Name | Fiscal Year | Total Expenditures | Estimated Population | Expenditures Per Capita | Region | |
|---|---|---|---|---|---|---|
| 0 | Adelanto | 2023 | 23547968 | 36656.000000 | 642.000000 | Southern California |
| 1 | Adelanto | 2022 | 21497555 | 36357.000000 | 591.000000 | Southern California |
| 2 | Adelanto | 2021 | 20580944 | 36569.000000 | 563.000000 | Southern California |
| 3 | Adelanto | 2020 | 26376314 | 35663.000000 | 740.000000 | Southern California |
| 4 | Adelanto | 2019 | 30886709 | 35136.000000 | 879.000000 | Southern California |
2. Regional Overview¶
Total expenditure and 2023 population by region.
totals = df.groupby('Region')['Total Expenditures'].sum().sort_values(ascending=False)
pop_2023 = df[df['Fiscal Year'] == 2023].groupby('Region')['Estimated Population'].sum()
overview = pd.DataFrame({
'Total Expenditure ($B)': (totals / 1e9).round(1).apply(lambda x: f'{x:.1f}'),
'Share (%)': (totals / totals.sum() * 100).round(1).apply(lambda x: f'{x:.1f}'),
'Population 2023 (M)': (pop_2023 / 1e6).round(2).apply(lambda x: f'{x:.2f}')
}).reset_index()
overview.style.set_table_styles([
{'selector': 'th', 'props': [('font-weight', 'bold'), ('text-align', 'left'), ('padding', '6px 12px'), ('background', '#f0f4f8'), ('color', '#212529')]},
{'selector': 'td', 'props': [('padding', '6px 12px'), ('color', '#212529'), ('background', '#ffffff')]},
{'selector': '', 'props': [('border-collapse', 'collapse')]}
]).hide(axis='index')
| Region | Total Expenditure ($B) | Share (%) | Population 2023 (M) |
|---|---|---|---|
| Central California | 59.5 | 6.6 | 3.42 |
| Northern California | 340.7 | 37.5 | 9.42 |
| Sierra Nevada | 3.3 | 0.4 | 0.12 |
| Southern California | 505.2 | 55.6 | 18.94 |
3. Mean vs Median Per Capita¶
When mean is much higher than median, outliers are skewing the average.
stats = df.groupby('Region')['Expenditures Per Capita'].agg(
Mean='mean',
Median='median',
Std='std'
).round(0).astype(int)
stats['Mean/Median Ratio'] = (stats['Mean'] / stats['Median']).round(1)
stats['Outlier Warning'] = stats['Mean/Median Ratio'].apply(
lambda x: 'Yes - investigate' if x > 3 else 'No'
)
stats.reset_index().style.set_table_styles([
{'selector': 'th', 'props': [('font-weight', 'bold'), ('text-align', 'left'), ('padding', '6px 12px'), ('background', '#f0f4f8'), ('color', '#212529')]},
{'selector': 'td', 'props': [('padding', '6px 12px'), ('color', '#212529'), ('background', '#ffffff')]},
{'selector': '', 'props': [('border-collapse', 'collapse')]}
]).hide(axis='index')
| Region | Mean | Median | Std | Mean/Median Ratio | Outlier Warning |
|---|---|---|---|---|---|
| Central California | 1806 | 1288 | 2162 | 1.400000 | No |
| Northern California | 2561 | 1827 | 5409 | 1.400000 | No |
| Sierra Nevada | 2508 | 2198 | 1921 | 1.100000 | No |
| Southern California | 11762 | 1343 | 118710 | 8.800000 | Yes - investigate |
4. Identify Outliers¶
Top 10 highest per-capita cities across all years. These can distort regional averages.
top10 = (
df[['Entity Name', 'Fiscal Year', 'Expenditures Per Capita', 'Estimated Population', 'Region']]
.sort_values('Expenditures Per Capita', ascending=False)
.head(10)
.reset_index(drop=True)
)
top10['Expenditures Per Capita'] = top10['Expenditures Per Capita'].apply(lambda x: f'${x:,.0f}')
top10['Estimated Population'] = top10['Estimated Population'].apply(lambda x: f'{x:,.0f}')
top10.style.set_table_styles([
{'selector': 'th', 'props': [('font-weight', 'bold'), ('text-align', 'left'), ('padding', '6px 12px'), ('background', '#f0f4f8'), ('color', '#212529')]},
{'selector': 'td', 'props': [('padding', '6px 12px'), ('color', '#212529'), ('background', '#ffffff')]},
{'selector': '', 'props': [('border-collapse', 'collapse')]}
])
| Entity Name | Fiscal Year | Expenditures Per Capita | Estimated Population | Region | |
|---|---|---|---|---|---|
| 0 | Vernon | 2015 | $2,376,905 | 123 | Southern California |
| 1 | Vernon | 2014 | $2,366,253 | 122 | Southern California |
| 2 | Vernon | 2013 | $2,203,322 | 121 | Southern California |
| 3 | Vernon | 2016 | $1,674,096 | 210 | Southern California |
| 4 | Vernon | 2023 | $1,373,375 | 205 | Southern California |
| 5 | Vernon | 2022 | $1,330,643 | 208 | Southern California |
| 6 | Vernon | 2021 | $1,299,582 | 210 | Southern California |
| 7 | Vernon | 2018 | $1,275,596 | 209 | Southern California |
| 8 | Vernon | 2017 | $1,254,691 | 209 | Southern California |
| 9 | Vernon | 2019 | $848,814 | 301 | Southern California |
5. Per Capita Growth by Region (2013 to 2023)¶
Using median to avoid outlier distortion.
y2013 = df[df['Fiscal Year'] == 2013].groupby('Region')['Expenditures Per Capita'].median()
y2023 = df[df['Fiscal Year'] == 2023].groupby('Region')['Expenditures Per Capita'].median()
growth = pd.DataFrame({'2013 Median': y2013, '2023 Median': y2023})
growth['Change (%)'] = ((growth['2023 Median'] / growth['2013 Median'] - 1) * 100).round(0).astype(int)
growth['2013 Median'] = growth['2013 Median'].apply(lambda x: f'${x:,.0f}')
growth['2023 Median'] = growth['2023 Median'].apply(lambda x: f'${x:,.0f}')
growth['Change (%)'] = growth['Change (%)'].apply(lambda x: f'+{x}%' if x > 0 else f'{x}%')
growth.reset_index().style.set_table_styles([
{'selector': 'th', 'props': [('font-weight', 'bold'), ('text-align', 'left'), ('padding', '6px 12px'), ('background', '#f0f4f8'), ('color', '#212529')]},
{'selector': 'td', 'props': [('padding', '6px 12px'), ('color', '#212529'), ('background', '#ffffff')]},
{'selector': '', 'props': [('border-collapse', 'collapse')]}
]).hide(axis='index')
| Region | 2013 Median | 2023 Median | Change (%) |
|---|---|---|---|
| Central California | $1,066 | $1,802 | +69% |
| Northern California | $1,412 | $2,387 | +69% |
| Sierra Nevada | $2,004 | $2,795 | +39% |
| Southern California | $1,061 | $1,706 | +61% |
6. Top 5 & Bottom 5 Cities per Region¶
Excluding Vernon and Industry (industrial enclaves with ~100 residents) as they are not representative.
from IPython.display import display, HTML
OUTLIERS = ['Vernon', 'Industry']
filtered = df[~df['Entity Name'].isin(OUTLIERS)]
city_avg = (
filtered.groupby(['Entity Name', 'Region'])['Expenditures Per Capita']
.mean()
.reset_index()
.rename(columns={'Expenditures Per Capita': 'Avg Per Capita'})
)
table_style = [
{'selector': 'th', 'props': [('font-weight', 'bold'), ('text-align', 'left'), ('padding', '6px 12px'), ('background', '#f0f4f8'), ('color', '#212529')]},
{'selector': 'td', 'props': [('padding', '6px 12px'), ('color', '#212529'), ('background', '#ffffff')]},
{'selector': '', 'props': [('border-collapse', 'collapse')]}
]
for region in sorted(city_avg['Region'].unique()):
rc = city_avg[city_avg['Region'] == region].sort_values('Avg Per Capita', ascending=False)
top5 = rc.head(5)[['Entity Name', 'Avg Per Capita']].copy().reset_index(drop=True)
bot5 = rc.tail(5)[['Entity Name', 'Avg Per Capita']].copy().reset_index(drop=True)
top5['Avg Per Capita'] = top5['Avg Per Capita'].apply(lambda x: f'${x:,.0f}')
bot5['Avg Per Capita'] = bot5['Avg Per Capita'].apply(lambda x: f'${x:,.0f}')
top5_html = top5.style.set_table_styles(table_style).hide(axis='index').to_html()
bot5_html = bot5.style.set_table_styles(table_style).hide(axis='index').to_html()
display(HTML(f'''
<h4 style="color:#1a1a2e;margin-top:28px;margin-bottom:12px">{region}</h4>
<div style="display:flex;gap:40px;align-items:flex-start">
<div>
<p style="color:#555;margin:0 0 6px;font-size:13px">Top 5</p>
{top5_html}
</div>
<div>
<p style="color:#555;margin:0 0 6px;font-size:13px">Bottom 5</p>
{bot5_html}
</div>
</div>
'''))
Central California
Top 5
| Entity Name | Avg Per Capita |
|---|---|
| Sand City | $18,405 |
| Carmel-By-The-Sea | $5,891 |
| Monterey | $4,960 |
| Pismo Beach | $4,777 |
| Santa Barbara | $3,885 |
Bottom 5
| Entity Name | Avg Per Capita |
|---|---|
| Waterford | $824 |
| Mendota | $798 |
| Maricopa | $794 |
| Wasco | $666 |
| Arvin | $635 |
Northern California
Top 5
| Entity Name | Avg Per Capita |
|---|---|
| Mountain View | $45,645 |
| San Francisco | $13,452 |
| Colma | $12,956 |
| Palo Alto | $8,454 |
| Trinidad | $7,908 |
Bottom 5
| Entity Name | Avg Per Capita |
|---|---|
| Oakley | $770 |
| Riverbank | $735 |
| Ione | $706 |
| Clayton | $671 |
| Citrus Heights | $660 |
Sierra Nevada
Top 5
| Entity Name | Avg Per Capita |
|---|---|
| Mammoth Lakes | $4,920 |
| Amador | $4,124 |
| South Lake Tahoe | $3,129 |
| Nevada City | $2,836 |
| Truckee | $2,745 |
Bottom 5
| Entity Name | Avg Per Capita |
|---|---|
| Loyalton | $1,883 |
| Sonora | $1,729 |
| Portola | $1,721 |
| Auburn | $1,693 |
| Susanville | $1,165 |
Southern California
Top 5
| Entity Name | Avg Per Capita |
|---|---|
| Irwindale | $19,945 |
| Beverly Hills | $11,989 |
| Indian Wells | $8,972 |
| Big Bear Lake | $8,621 |
| Santa Monica | $7,480 |
Bottom 5
| Entity Name | Avg Per Capita |
|---|---|
| Cudahy | $498 |
| Maywood | $475 |
| Laguna Woods | $448 |
| Eastvale | $436 |
| Rancho Santa Margarita | $426 |
7. Key Takeaways¶
Southern California's average is misleading. Its mean per-capita (11,762) is nearly 9x its median (1,343) due to Vernon, an industrial city of ~100 residents spending over 2M per capita. Median is the right metric here.
All regions grew significantly. Median per-capita spending rose 39-69% across all regions from 2013 to 2023, with a visible uptick in 2020-2021 consistent with pandemic-era emergency spending.
City size predicts per-capita spend more than region does. The top spenders in every region are small municipalities, resort towns (Mammoth Lakes), tech hubs (Mountain View), or coastal cities (Carmel-By-The-Sea), with high fixed costs relative to their populations.
Within-region variation is extreme. In Northern California, Mountain View (45,645) spends 69x more per capita than Citrus Heights (660). Regional averages alone are not sufficient for meaningful comparison.