California Public Expenditure Analysis Back to Dashboard

California Public Expenditure Analysis¶

Exploratory Data Analysis | 2013-2023

This notebook analyzes city-level public expenditure data across California's four regions: Northern California, Southern California, Central California, and Sierra Nevada.

Data source: California State Controller's Office
Files used: City_Expenditures_Per_Capita.csv, city_to_region.csv

1. Load & Merge Data¶

In [1]:
import pandas as pd

expenditure = pd.read_csv('City_Expenditures_Per_Capita.csv')
regions = pd.read_csv('city_to_region.csv')

df = expenditure.merge(regions, on='Entity Name', how='inner')
df = df[(df['Fiscal Year'] >= 2013) & (df['Fiscal Year'] <= 2023)]
df = df.dropna(subset=['Expenditures Per Capita'])

print(f'Records: {len(df):,}')
print(f'Cities:  {df["Entity Name"].nunique()}')
print(f'Years:   {df["Fiscal Year"].min()} - {df["Fiscal Year"].max()}')

table_style = [
    {'selector': 'th', 'props': [('font-weight', 'bold'), ('text-align', 'left'), ('padding', '6px 12px'), ('background', '#f0f4f8'), ('color', '#212529')]},
    {'selector': 'td', 'props': [('padding', '6px 12px'), ('color', '#212529'), ('background', '#ffffff')]},
    {'selector': '', 'props': [('border-collapse', 'collapse')]}
]

df.head().style.set_table_styles(table_style)
Records: 5,147
Cities:  468
Years:   2013 - 2023
Out[1]:
  Entity Name Fiscal Year Total Expenditures Estimated Population Expenditures Per Capita Region
0 Adelanto 2023 23547968 36656.000000 642.000000 Southern California
1 Adelanto 2022 21497555 36357.000000 591.000000 Southern California
2 Adelanto 2021 20580944 36569.000000 563.000000 Southern California
3 Adelanto 2020 26376314 35663.000000 740.000000 Southern California
4 Adelanto 2019 30886709 35136.000000 879.000000 Southern California

2. Regional Overview¶

Total expenditure and 2023 population by region.

In [2]:
totals = df.groupby('Region')['Total Expenditures'].sum().sort_values(ascending=False)
pop_2023 = df[df['Fiscal Year'] == 2023].groupby('Region')['Estimated Population'].sum()

overview = pd.DataFrame({
    'Total Expenditure ($B)': (totals / 1e9).round(1).apply(lambda x: f'{x:.1f}'),
    'Share (%)': (totals / totals.sum() * 100).round(1).apply(lambda x: f'{x:.1f}'),
    'Population 2023 (M)': (pop_2023 / 1e6).round(2).apply(lambda x: f'{x:.2f}')
}).reset_index()

overview.style.set_table_styles([
    {'selector': 'th', 'props': [('font-weight', 'bold'), ('text-align', 'left'), ('padding', '6px 12px'), ('background', '#f0f4f8'), ('color', '#212529')]},
    {'selector': 'td', 'props': [('padding', '6px 12px'), ('color', '#212529'), ('background', '#ffffff')]},
    {'selector': '', 'props': [('border-collapse', 'collapse')]}
]).hide(axis='index')
Out[2]:
Region Total Expenditure ($B) Share (%) Population 2023 (M)
Central California 59.5 6.6 3.42
Northern California 340.7 37.5 9.42
Sierra Nevada 3.3 0.4 0.12
Southern California 505.2 55.6 18.94

3. Mean vs Median Per Capita¶

When mean is much higher than median, outliers are skewing the average.

In [3]:
stats = df.groupby('Region')['Expenditures Per Capita'].agg(
    Mean='mean',
    Median='median',
    Std='std'
).round(0).astype(int)

stats['Mean/Median Ratio'] = (stats['Mean'] / stats['Median']).round(1)
stats['Outlier Warning'] = stats['Mean/Median Ratio'].apply(
    lambda x: 'Yes - investigate' if x > 3 else 'No'
)
stats.reset_index().style.set_table_styles([
    {'selector': 'th', 'props': [('font-weight', 'bold'), ('text-align', 'left'), ('padding', '6px 12px'), ('background', '#f0f4f8'), ('color', '#212529')]},
    {'selector': 'td', 'props': [('padding', '6px 12px'), ('color', '#212529'), ('background', '#ffffff')]},
    {'selector': '', 'props': [('border-collapse', 'collapse')]}
]).hide(axis='index')
Out[3]:
Region Mean Median Std Mean/Median Ratio Outlier Warning
Central California 1806 1288 2162 1.400000 No
Northern California 2561 1827 5409 1.400000 No
Sierra Nevada 2508 2198 1921 1.100000 No
Southern California 11762 1343 118710 8.800000 Yes - investigate

4. Identify Outliers¶

Top 10 highest per-capita cities across all years. These can distort regional averages.

In [4]:
top10 = (
    df[['Entity Name', 'Fiscal Year', 'Expenditures Per Capita', 'Estimated Population', 'Region']]
    .sort_values('Expenditures Per Capita', ascending=False)
    .head(10)
    .reset_index(drop=True)
)
top10['Expenditures Per Capita'] = top10['Expenditures Per Capita'].apply(lambda x: f'${x:,.0f}')
top10['Estimated Population'] = top10['Estimated Population'].apply(lambda x: f'{x:,.0f}')
top10.style.set_table_styles([
    {'selector': 'th', 'props': [('font-weight', 'bold'), ('text-align', 'left'), ('padding', '6px 12px'), ('background', '#f0f4f8'), ('color', '#212529')]},
    {'selector': 'td', 'props': [('padding', '6px 12px'), ('color', '#212529'), ('background', '#ffffff')]},
    {'selector': '', 'props': [('border-collapse', 'collapse')]}
])
Out[4]:
  Entity Name Fiscal Year Expenditures Per Capita Estimated Population Region
0 Vernon 2015 $2,376,905 123 Southern California
1 Vernon 2014 $2,366,253 122 Southern California
2 Vernon 2013 $2,203,322 121 Southern California
3 Vernon 2016 $1,674,096 210 Southern California
4 Vernon 2023 $1,373,375 205 Southern California
5 Vernon 2022 $1,330,643 208 Southern California
6 Vernon 2021 $1,299,582 210 Southern California
7 Vernon 2018 $1,275,596 209 Southern California
8 Vernon 2017 $1,254,691 209 Southern California
9 Vernon 2019 $848,814 301 Southern California

5. Per Capita Growth by Region (2013 to 2023)¶

Using median to avoid outlier distortion.

In [5]:
y2013 = df[df['Fiscal Year'] == 2013].groupby('Region')['Expenditures Per Capita'].median()
y2023 = df[df['Fiscal Year'] == 2023].groupby('Region')['Expenditures Per Capita'].median()

growth = pd.DataFrame({'2013 Median': y2013, '2023 Median': y2023})
growth['Change (%)'] = ((growth['2023 Median'] / growth['2013 Median'] - 1) * 100).round(0).astype(int)
growth['2013 Median'] = growth['2013 Median'].apply(lambda x: f'${x:,.0f}')
growth['2023 Median'] = growth['2023 Median'].apply(lambda x: f'${x:,.0f}')
growth['Change (%)'] = growth['Change (%)'].apply(lambda x: f'+{x}%' if x > 0 else f'{x}%')
growth.reset_index().style.set_table_styles([
    {'selector': 'th', 'props': [('font-weight', 'bold'), ('text-align', 'left'), ('padding', '6px 12px'), ('background', '#f0f4f8'), ('color', '#212529')]},
    {'selector': 'td', 'props': [('padding', '6px 12px'), ('color', '#212529'), ('background', '#ffffff')]},
    {'selector': '', 'props': [('border-collapse', 'collapse')]}
]).hide(axis='index')
Out[5]:
Region 2013 Median 2023 Median Change (%)
Central California $1,066 $1,802 +69%
Northern California $1,412 $2,387 +69%
Sierra Nevada $2,004 $2,795 +39%
Southern California $1,061 $1,706 +61%

6. Top 5 & Bottom 5 Cities per Region¶

Excluding Vernon and Industry (industrial enclaves with ~100 residents) as they are not representative.

In [6]:
from IPython.display import display, HTML

OUTLIERS = ['Vernon', 'Industry']
filtered = df[~df['Entity Name'].isin(OUTLIERS)]

city_avg = (
    filtered.groupby(['Entity Name', 'Region'])['Expenditures Per Capita']
    .mean()
    .reset_index()
    .rename(columns={'Expenditures Per Capita': 'Avg Per Capita'})
)

table_style = [
    {'selector': 'th', 'props': [('font-weight', 'bold'), ('text-align', 'left'), ('padding', '6px 12px'), ('background', '#f0f4f8'), ('color', '#212529')]},
    {'selector': 'td', 'props': [('padding', '6px 12px'), ('color', '#212529'), ('background', '#ffffff')]},
    {'selector': '', 'props': [('border-collapse', 'collapse')]}
]

for region in sorted(city_avg['Region'].unique()):
    rc = city_avg[city_avg['Region'] == region].sort_values('Avg Per Capita', ascending=False)
    top5 = rc.head(5)[['Entity Name', 'Avg Per Capita']].copy().reset_index(drop=True)
    bot5 = rc.tail(5)[['Entity Name', 'Avg Per Capita']].copy().reset_index(drop=True)
    top5['Avg Per Capita'] = top5['Avg Per Capita'].apply(lambda x: f'${x:,.0f}')
    bot5['Avg Per Capita'] = bot5['Avg Per Capita'].apply(lambda x: f'${x:,.0f}')

    top5_html = top5.style.set_table_styles(table_style).hide(axis='index').to_html()
    bot5_html = bot5.style.set_table_styles(table_style).hide(axis='index').to_html()

    display(HTML(f'''
        <h4 style="color:#1a1a2e;margin-top:28px;margin-bottom:12px">{region}</h4>
        <div style="display:flex;gap:40px;align-items:flex-start">
            <div>
                <p style="color:#555;margin:0 0 6px;font-size:13px">Top 5</p>
                {top5_html}
            </div>
            <div>
                <p style="color:#555;margin:0 0 6px;font-size:13px">Bottom 5</p>
                {bot5_html}
            </div>
        </div>
    '''))

Central California

Top 5

Entity Name Avg Per Capita
Sand City $18,405
Carmel-By-The-Sea $5,891
Monterey $4,960
Pismo Beach $4,777
Santa Barbara $3,885

Bottom 5

Entity Name Avg Per Capita
Waterford $824
Mendota $798
Maricopa $794
Wasco $666
Arvin $635

Northern California

Top 5

Entity Name Avg Per Capita
Mountain View $45,645
San Francisco $13,452
Colma $12,956
Palo Alto $8,454
Trinidad $7,908

Bottom 5

Entity Name Avg Per Capita
Oakley $770
Riverbank $735
Ione $706
Clayton $671
Citrus Heights $660

Sierra Nevada

Top 5

Entity Name Avg Per Capita
Mammoth Lakes $4,920
Amador $4,124
South Lake Tahoe $3,129
Nevada City $2,836
Truckee $2,745

Bottom 5

Entity Name Avg Per Capita
Loyalton $1,883
Sonora $1,729
Portola $1,721
Auburn $1,693
Susanville $1,165

Southern California

Top 5

Entity Name Avg Per Capita
Irwindale $19,945
Beverly Hills $11,989
Indian Wells $8,972
Big Bear Lake $8,621
Santa Monica $7,480

Bottom 5

Entity Name Avg Per Capita
Cudahy $498
Maywood $475
Laguna Woods $448
Eastvale $436
Rancho Santa Margarita $426

7. Key Takeaways¶

  1. Southern California's average is misleading. Its mean per-capita (11,762) is nearly 9x its median (1,343) due to Vernon, an industrial city of ~100 residents spending over 2M per capita. Median is the right metric here.

  2. All regions grew significantly. Median per-capita spending rose 39-69% across all regions from 2013 to 2023, with a visible uptick in 2020-2021 consistent with pandemic-era emergency spending.

  3. City size predicts per-capita spend more than region does. The top spenders in every region are small municipalities, resort towns (Mammoth Lakes), tech hubs (Mountain View), or coastal cities (Carmel-By-The-Sea), with high fixed costs relative to their populations.

  4. Within-region variation is extreme. In Northern California, Mountain View (45,645) spends 69x more per capita than Citrus Heights (660). Regional averages alone are not sufficient for meaningful comparison.