Final Project: A deeper investigation of climate change in Bozeman, MT¶
This notebook will investigate seasonal trends in precipitation.
In [1]:
# Import libraries (all that will be needed for this project)
import earthpy
import hvplot.pandas
import os
import pathlib
import pandas as pd
import numpy as np
import holoviews as hv
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression
print('Done!')
c:\Users\raini\miniconda3\envs\earth-analytics-python\Lib\site-packages\earthpy\__init__.py:7: UserWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html. The pkg_resources package is slated for removal as early as 2025-11-30. Refrain from using this package or pin to Setuptools<81. from pkg_resources import resource_string
Done!
Step 1: Set up a project directory.¶
In [2]:
# Set the project directory
clim_dir = os.path.join(
# Home Directory
pathlib.Path.home(),
'Documents',
'Graduate_School',
'EDA_Certificate',
'Final-Project_Bozeman-Climate',
'Data'
)
# Make the directory if needed
os.makedirs(clim_dir, exist_ok = True)
In [3]:
# Check that the filepath looks right.
clim_dir
Out[3]:
'C:\\Users\\raini\\Documents\\Graduate_School\\EDA_Certificate\\Final-Project_Bozeman-Climate\\Data'
In [4]:
# Access Observed Temperature data from the Bozeman Weather station
bzn_precip_url = ('https://www.ncei.noaa.gov/access/services/da'
'ta/v1?dataset=daily-summaries&dataTypes=PRCP&stations=USC00241044&'
'startDate=1892-04-08&endDate=2025-02-28&units=standard')
# We'll grab data starting on 4/8/1892 (the beginning of the record) and
# ending on 02/28/2025 (the last winter on the record)
bzn_precip_url
Out[4]:
'https://www.ncei.noaa.gov/access/services/data/v1?dataset=daily-summaries&dataTypes=PRCP&stations=USC00241044&startDate=1892-04-08&endDate=2025-02-28&units=standard'
In [5]:
# Download the climate data
# Set the filepath in our data directory
prcp_path = os.path.join(clim_dir, 'bozeman_precip_1892-2025.csv')
prcp_path
# Use an if/else statement in case the data have already been saved
if os.path.exists(prcp_path):
init_bzn_p_df = pd.read_csv(prcp_path)
print('The file was already saved.')
else:
# I'm storing the Tobs data in bzn_t_df
init_bzn_p_df = pd.read_csv(
bzn_precip_url,
index_col='DATE',
parse_dates=True,
na_values=['NaN']
)
# Save the climate data to file
init_bzn_p_df.to_csv(prcp_path)
print("The data were downloaded.")
# # Check that the download worked
# bzn_t_df.head()
The file was already saved.
In [6]:
# Check to make sure things look ok
init_bzn_p_df.tail()
Out[6]:
| DATE | STATION | PRCP | |
|---|---|---|---|
| 47627 | 2025-02-24 | USC00241044 | 0.00 |
| 47628 | 2025-02-25 | USC00241044 | 0.08 |
| 47629 | 2025-02-26 | USC00241044 | 0.00 |
| 47630 | 2025-02-27 | USC00241044 | 0.00 |
| 47631 | 2025-02-28 | USC00241044 | 0.00 |
Step 3: Clean the temperature data¶
We'll first divide the DF into a summer df and a winter DF. From previous experience, the data have multiple years with significant stretches of missing data. We'll need to remove any year with at least one month of NaN values in the season of interest.
In [7]:
# Check how many years are in the df
# ensure Date column is in datetime
init_bzn_p_df.DATE = pd.to_datetime(init_bzn_p_df.DATE)
# Check the number of years
num_years = init_bzn_p_df.DATE.dt.year.nunique()
num_years # 133 years
Out[7]:
133
In [8]:
# Check for NaNs
init_bzn_p_df['Year'] = init_bzn_p_df['DATE'].dt.year
init_bzn_p_df['Month'] = init_bzn_p_df['DATE'].dt.month
na_counts = init_bzn_p_df.PRCP.isna().groupby(init_bzn_p_df['Year']).sum()
na_counts.name = 'NaNs'
na_counts.to_frame()#.rename(columns={'PRCP': "NaNs"}, inplace=True)
bad_years = na_counts[na_counts > 28].to_frame()
bad_years.reset_index()
# Count how many years were removed
num_rem = len(bad_years) # 40 removed
num_rem
Out[8]:
2
In [9]:
# Create a cleaned version of the TOBS DF that's removed any year with >28 NaN values
clean_bzn_p_df = (
init_bzn_p_df
.groupby('Year')
.filter(lambda g: g['PRCP'].isna().sum() <= 28)
)
# Check how many years were kept
num_kept = clean_bzn_p_df.Year.nunique()
num_kept # 93 kept
# Check to make sure things add up and the filter removed the correct number of years
if num_kept + num_rem == num_years:
bzn_p_df = clean_bzn_p_df
print("Things add up.")
else:
print("Something is wrong")
Things add up.
In [10]:
# Double check things look right
bzn_p_df.head()
# Only two missing years here!
# Check the bad years - 1931 is not in the list.
bad_years.tail()
Out[10]:
| NaNs | |
|---|---|
| Year | |
| 1897 | 31 |
| 1900 | 30 |
Step 4: Divide the data into summer and winter groups and calculate annual means¶
For this step, we'll group the data into a summer dataframe (June, July, August) and a winter dataframe (December, January, February).
In [11]:
# Set month ranges
summer = [6, 7, 8]
winter = [12, 1, 2]
In [12]:
# Select data that fall into summer months
bzn_jja_p_df = (
bzn_p_df.groupby(bzn_p_df['DATE'].dt.month)
.filter(lambda g: g.name in summer)
)
# Cite ChatGPT, explain what's going on here
bzn_jja_p_df
Out[12]:
| DATE | STATION | PRCP | Year | Month | |
|---|---|---|---|---|---|
| 54 | 1892-06-01 | USC00241044 | 0.14 | 1892 | 6 |
| 55 | 1892-06-02 | USC00241044 | 0.00 | 1892 | 6 |
| 56 | 1892-06-03 | USC00241044 | 0.02 | 1892 | 6 |
| 57 | 1892-06-04 | USC00241044 | 0.00 | 1892 | 6 |
| 58 | 1892-06-05 | USC00241044 | 0.00 | 1892 | 6 |
| ... | ... | ... | ... | ... | ... |
| 47446 | 2024-08-27 | USC00241044 | 0.00 | 2024 | 8 |
| 47447 | 2024-08-28 | USC00241044 | 0.08 | 2024 | 8 |
| 47448 | 2024-08-29 | USC00241044 | 0.00 | 2024 | 8 |
| 47449 | 2024-08-30 | USC00241044 | 0.00 | 2024 | 8 |
| 47450 | 2024-08-31 | USC00241044 | 0.00 | 2024 | 8 |
11772 rows × 5 columns
In [13]:
# Get annual summer total temperatures
jja_mean_p_df = bzn_jja_p_df.drop(columns=['DATE', 'STATION', 'Month']).groupby('Year').sum()
# Take a look
jja_mean_p_df
Out[13]:
| PRCP | |
|---|---|
| Year | |
| 1892 | 5.82 |
| 1894 | 7.42 |
| 1895 | 4.81 |
| 1896 | 4.58 |
| 1898 | 4.58 |
| ... | ... |
| 2020 | 5.08 |
| 2021 | 5.47 |
| 2022 | 5.18 |
| 2023 | 6.97 |
| 2024 | 5.90 |
129 rows × 1 columns
In [14]:
# Select data that fall into winter months
bzn_djf_p_df = (
bzn_p_df.groupby(bzn_p_df['DATE'].dt.month)
.filter(lambda g: g.name in winter)
)
bzn_djf_p_df
# This will need to be reshaped so that we get December of 1931 and Jan/Feb of 1932 (ChatGPT helped me figure this out)
bzn_djf_p_df['winter_year'] = bzn_djf_p_df['Year']
# Shift December into the following year
bzn_djf_p_df.loc[bzn_djf_p_df['Month'] == 12, 'winter_year'] += 1
# Now compute total precip per winter season
djf_mean_p_df = bzn_djf_p_df.drop(columns=['DATE', 'STATION', 'Month', 'Year']).groupby('winter_year').sum()
# Take a look
djf_mean_p_df
Out[14]:
| PRCP | |
|---|---|
| winter_year | |
| 1893 | 1.72 |
| 1895 | 2.20 |
| 1896 | 2.70 |
| 1897 | 0.40 |
| 1898 | 1.09 |
| ... | ... |
| 2021 | 3.28 |
| 2022 | 3.27 |
| 2023 | 3.99 |
| 2024 | 2.98 |
| 2025 | 3.52 |
130 rows × 1 columns
Step 5: Plot the total precipiation ammounts¶
In [15]:
# Join the data frames
means_p_df = jja_mean_p_df
means_p_df['DJF PRCP'] = djf_mean_p_df['PRCP']
means_p_df.rename(columns={'PRCP': 'JJA Total Precip', 'DJF PRCP': 'DJF Total Precip'}, inplace=True)
means_p_df.dropna(inplace=True)
means_p_df
Out[15]:
| JJA Total Precip | DJF Total Precip | |
|---|---|---|
| Year | ||
| 1895 | 4.81 | 2.20 |
| 1896 | 4.58 | 2.70 |
| 1898 | 4.58 | 1.09 |
| 1901 | 2.22 | 1.35 |
| 1902 | 4.60 | 2.53 |
| ... | ... | ... |
| 2020 | 5.08 | 2.60 |
| 2021 | 5.47 | 3.28 |
| 2022 | 5.18 | 3.27 |
| 2023 | 6.97 | 3.99 |
| 2024 | 5.90 | 2.98 |
127 rows × 2 columns
In [ ]:
# Plot the mean annual temperatures
seas_p_mean = means_p_df.hvplot(
title='Winter and Summer Total Precipitation in Bozeman, MT',
xlabel='Year',
ylabel='Precipitation (inches)'
)
# Save plot
save(seas_p_mean, 'seas_precip_means.html')
# Show plot
seas_p_mean
Out[Â ]:
In [17]:
# Non-interactive plot
seas_p_mean_noint = means_p_df.plot(
title='Winter and Summer Total Precipitation in Bozeman, MT',
xlabel='Year',
ylabel='Precipitation (inches)'
)
# Save the plot
plt.savefig('seas_precip_means.png')
# Show the plot
seas_p_mean_noint
Out[17]:
<Axes: title={'center': 'Winter and Summer Total Precipitation in Bozeman, MT'}, xlabel='Year', ylabel='Precipitation (inches)'>
Step 6: Compute linear regression (OLS) on each timeseries¶
In [18]:
jja_mean_p_df
Out[18]:
| JJA Total Precip | DJF Total Precip | |
|---|---|---|
| Year | ||
| 1895 | 4.81 | 2.20 |
| 1896 | 4.58 | 2.70 |
| 1898 | 4.58 | 1.09 |
| 1901 | 2.22 | 1.35 |
| 1902 | 4.60 | 2.53 |
| ... | ... | ... |
| 2020 | 5.08 | 2.60 |
| 2021 | 5.47 | 3.28 |
| 2022 | 5.18 | 3.27 |
| 2023 | 6.97 | 3.99 |
| 2024 | 5.90 | 2.98 |
127 rows × 2 columns
In [19]:
# Linear regression for JJA
jja_lr_p_df = jja_mean_p_df.reset_index()
x = jja_lr_p_df[['Year']].values
y_f = jja_lr_p_df['JJA Total Precip'].values
## Create and fit the linear regression model
model = LinearRegression()
model.fit(x, y_f)
## Get the slope and intercept
slope = model.coef_[0]
intercept = model.intercept_
## Print the results
print(f"Slope: {slope}") # 0.009
print(f"Intercept: {intercept}") # -12.442
Slope: 0.009115740813774195 Intercept: -12.442322649775683
In [20]:
# Linear regression for DJF
# Reset the index
djf_lr_p_df = djf_mean_p_df.reset_index()
# Get X and Y values
x = djf_lr_p_df[['winter_year']].values
y_f = djf_lr_p_df['PRCP'].values
# Create and fit the linear regression model
model = LinearRegression()
model.fit(x, y_f)
# Get the slope and intercept
slope = model.coef_[0]
intercept = model.intercept_
# Print the results
print(f"Slope: {slope}") # 0.005
print(f"Intercept: {intercept}") # -6.439
Slope: 0.00457137538829407 Intercept: -6.439333101602284
Step 7: Plot linear regressions¶
In [ ]:
# Plot annual total precipitation with a trend line
jja_p_linreg = sns.regplot(
x= jja_lr_p_df['Year'],
y= jja_lr_p_df[['JJA Total Precip']],
data= jja_lr_p_df,
# type = 'line'
ci= None,
)
# Set title and axis labels
jja_p_linreg.set_title('OLS Linear Regression of Total\n' \
'Summer Precipitation in Bozeman, MT')
jja_p_linreg.set_xlabel('Year')
jja_p_linreg.set_ylabel('Precipitation (in)')
# Save and show
plt.savefig('jja_precip_linreg.png')
plt.show()
# Winter plot
djf_p_linreg = sns.regplot(
x= djf_lr_p_df['winter_year'],
y= djf_lr_p_df[['PRCP']],
data= djf_lr_p_df,
# type = 'line'
ci= None
)
# Set labels
djf_p_linreg.set_title('OLS Linear Regression of Total\n' \
'Winter Precipitation in Bozeman, MT')
djf_p_linreg.set_xlabel('Year')
djf_p_linreg.set_ylabel('Temperature (°F)')
# Save and show plot
plt.savefig('djf_precip_linreg.png')
plt.show()