Using Optimization to Calibrate HydroModels.jl Parameters
This tutorial shows how to use Julia's Optimization.jl ecosystem to calibrate the parameters of HydroModels.jl hydrological models. Through parameter optimization you can make simulated output fit observed data more closely, improving the model's predictive capability.
Table of Contents
- Overview
- Required Packages
- Data Preparation
- Model Initialization
- Preparing Model Inputs and Run Configuration
- Objective Function Definition
- Optimization Progress Tracking
- Parameter Boundary Setup
- Optimization Problem Configuration
- Executing Optimization
- Result Analysis and Visualization
- Choosing Different Optimization Algorithms
- Advanced Applications
- Summary
Overview
Parameter optimization is a core step in hydrological model calibration. In HydroModels.jl, we leverage Julia's Optimization ecosystem to implement efficient parameter calibration. This tutorial will demonstrate how to set up and execute the optimization process to find parameter sets that best fit observed data.
Required Packages
First, we need to import the necessary packages:
using CSV, DataFrames, Dates, ComponentArrays
using HydroModels, HydroModelTools, HydroModelLibrary
using DataInterpolations
using OptimizationEvolutionary # Evolutionary.jl algorithms (provides CMAES)
using OptimizationGCMAES # Gradient-based CMA-ES (GCMAESOpt)
using OptimizationBBO # BlackBoxOptim.jl wrapper (adaptive differential evolution, etc.)
using Optimization # Core optimization framework
using ModelingToolkit
using Statistics # mean, std used in the objective function
using Distributions
using ProgressMeter
using Plots
using JLD2
Data Preparation
The optimization process requires model input data (precipitation, temperature, and day length, from which potential evapotranspiration is computed) and observed data (streamflow):
# Load data
file_path = "data/exphydro/01013500.csv"
data = CSV.File(file_path)
df = DataFrame(data)
# Select time period
ts = collect(1:10000)
# Extract necessary variables
lday_vec = df[ts, "dayl(day)"] # Day length
prcp_vec = df[ts, "prcp(mm/day)"] # Precipitation
temp_vec = df[ts, "tmean(C)"] # Temperature
flow_vec = df[ts, "flow(mm)"] # Observed streamflow
# Calculate potential evapotranspiration (PET)
pet_vec = @. 29.8 * lday_vec * 24 * 0.611 * exp((17.3 * temp_vec) / (temp_vec + 237.3)) / (temp_vec + 273.2)
# Set warm-up period and maximum iterations
warm_up = 365 # Model warm-up period (days)
max_iter = 1000 # Maximum optimization iterations
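Before moving on, it can help to sanity-check the extracted series. An optional quick look, using only the DataFrames functionality already loaded:
# Summarize the forcing and observation columns used below (optional check)
println(describe(df[ts, ["prcp(mm/day)", "tmean(C)", "flow(mm)"]]))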
Model Initialization
Next, we initialize the hydrological model and set up initial parameters and states:
# Select model (e.g., HBV model)
model_nm = "hbv"
model = HydroModelLibrary.load_model(Symbol(model_nm))
# Get parameter boundaries
param_bounds = getbounds.(get_params(model))
# Randomly generate initial parameter values (within parameter boundaries)
random_param_values = map(param_bounds) do param_bound
rand(Uniform(param_bound[1], param_bound[2]))
end
# Create parameter vector
init_params = ComponentVector(params=NamedTuple{Tuple(get_param_names(model))}(random_param_values))
ps_axes = getaxes(init_params)
# Initialize model states (all state variables set to 0)
init_states = NamedTuple{Tuple(get_state_names(model))}(zeros(length(get_state_names(model)))) |> ComponentVector
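If you want to confirm what the loaded model expects before wiring up its inputs, the same accessor functions used throughout this tutorial can be logged directly (an optional check):
# Inspect the model interface: required inputs, parameters, and states
@info "Model interface" inputs = HydroModels.get_input_names(model) params = get_param_names(model) states = get_state_names(model)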
Preparing Model Inputs and Run Configuration
Before optimization, we need to prepare model input data and run configuration:
# Prepare input data
input = (P=prcp_vec, Ep=pet_vec, T=temp_vec)
input_matrix = Matrix(reduce(hcat, collect(input[HydroModels.get_input_names(model)]))') # variables × time
# Set model run configuration
config = (solver=HydroModelTools.DiscreteSolver(), interp=LinearInterpolation)
run_kwargs = (config=config, initstates=init_states)
Objective Function Definition
The core of optimization is defining an objective function that scores how well a candidate parameter set reproduces the observations. Here we minimize a KGE-style (Kling-Gupta Efficiency) distance, where 0 indicates a perfect fit and lower values are better:
# Calculate mean for R² computation
y_mean = mean(flow_vec)
# Define the ratio of residual to total sum of squares (equivalent to 1 - NSE, computed about y_mean)
r2_func(y, y_hat) = sum((y .- y_hat) .^ 2) / sum((y .- y_mean) .^ 2)
# Define a KGE-style distance (0 is a perfect fit; lower is better)
kge_func(y, y_hat) = sqrt((r2_func(y, y_hat))^2 + (std(y_hat) / std(y) - 1)^2 + (mean(y_hat) / mean(y) - 1)^2)
# Define optimization objective function
function obj_func(p, _)
return kge_func(
flow_vec[warm_up:end], # Observed values after warm-up period
model(input_matrix, ComponentVector(p, ps_axes); run_kwargs...)[end, warm_up:end] # Model predictions
)
end
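Before handing obj_func to the optimizer, it is worth evaluating it once at the random initial parameters to confirm that the whole pipeline (interpolation, solver, metric) runs end to end; a minimal check:
# Evaluate the objective once at the initial parameter vector (sanity check)
init_loss = obj_func(Vector(init_params), nothing)
@info "Objective at initial parameters" init_loss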
Optimization Progress Tracking
To monitor the optimization process, we set up a progress bar and callback function:
# Create progress bar
progress = Progress(max_iter, desc="Optimization")
# Create recorder to save results from each iteration
recorder = []
# Define callback function
callback_func!(state, l) = begin
push!(recorder, (iter=state.iter, loss=l, time=now(), params=state.u))
next!(progress) # Update progress bar
false # Return false to continue optimization
end
Parameter Boundary Setup
To ensure the optimization algorithm searches within a reasonable parameter space, we need to set upper and lower bounds for parameters:
# Extract parameter lower bounds
lb_list = first.(param_bounds) .|> eltype(input_matrix)
# Extract parameter upper bounds
ub_list = last.(param_bounds) .|> eltype(input_matrix)
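It is also worth verifying that the randomly drawn initial parameters actually fall inside these bounds before building the optimization problem; an optional assertion such as the following catches mismatched bound lists early:
# Ensure the initial guess lies within the search box (optional)
@assert all(lb_list .<= Vector(init_params)) && all(Vector(init_params) .<= ub_list) "Initial parameters fall outside the specified bounds"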
Optimization Problem Configuration
Configure the optimization problem using the Optimization.jl framework:
# Create the optimization function object; the AD backend (AutoForwardDiff) is only used by
# gradient-based solvers such as GCMAESOpt, while the derivative-free algorithms below ignore it
optf = Optimization.OptimizationFunction(obj_func, Optimization.AutoForwardDiff())
# Create optimization problem
optprob = Optimization.OptimizationProblem(optf, Vector(init_params), lb=lb_list, ub=ub_list)
Executing Optimization
Select an optimization algorithm and execute the optimization process:
# Solve optimization problem
sol = Optimization.solve(
optprob,
# CMAES(), # Covariance Matrix Adaptation Evolution Strategy
BBO_adaptive_de_rand_1_bin_radiuslimited(), # Adaptive Differential Evolution algorithm
# GCMAESOpt(), # Gradient-based CMA-ES
maxiters=max_iter, # Maximum iteration count
callback=callback_func! # Callback function
)
# Save optimization process records
mkpath("cache") # make sure the output directory exists
recorder_df = DataFrame(recorder)
CSV.write("cache/recorder_$(model_nm).csv", recorder_df)
Result Analysis and Visualization
After optimization is complete, we can run the model with the optimal parameters and analyze the results:
# Convert optimization results to ComponentVector format
params = ComponentVector(sol.u, ps_axes)
# Run model with optimal parameters
output = model(input_matrix, params; run_kwargs...)
# Extract model outputs
model_output_names = vcat(get_state_names(model), get_output_names(model)) |> Tuple
output_df = DataFrame(NamedTuple{model_output_names}(eachslice(output, dims=1)))
# Calculate the final objective value (excluding the warm-up period, consistent with obj_func)
@info "Final KGE-style distance" loss = kge_func(flow_vec[warm_up:end], output_df[warm_up:end, :q_routed])
# Visualize results
plot(output_df[!, :q_routed], label="Simulated Flow")
plot!(flow_vec, label="Observed Flow")
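Since JLD2 is already loaded, you may also want to persist the calibrated parameter set for later reuse; one possible way (the file name below is just an example):
# Save the calibrated parameters for later reuse (optional)
jldsave("cache/opt_params_$(model_nm).jld2"; params=params)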
Choosing Different Optimization Algorithms
Because the calibration is formulated as a standard Optimization.jl problem, you can swap in any compatible algorithm and choose the one best suited to your problem:
Adaptive Differential Evolution (BBO_adaptive_de_rand_1_bin_radiuslimited, from OptimizationBBO): suitable for most hydrological parameter calibration problems, with good global search capabilities.
sol = Optimization.solve(optprob, BBO_adaptive_de_rand_1_bin_radiuslimited(), maxiters=max_iter)
Covariance Matrix Adaptation Evolution Strategy (CMAES): Works well for complex nonlinear problems.
sol = Optimization.solve(optprob, CMAES(), maxiters=max_iter)
Gradient-based CMA-ES (GCMAESOpt, from OptimizationGCMAES): a CMA-ES variant that can additionally exploit gradient information, here provided through the AutoForwardDiff backend.
sol = Optimization.solve(optprob, GCMAESOpt(), maxiters=max_iter)
Advanced Applications
Multi-Objective Optimization
When several criteria need to be balanced at once, a practical approach is to scalarize them into a single weighted objective function:
function multi_obj_func(p, _)
model_output = model(input_matrix, ComponentVector(p, ps_axes); run_kwargs...)[end, warm_up:end]
# Calculate multiple objectives
kge_value = kge_func(flow_vec[warm_up:end], model_output)
bias_value = abs(mean(model_output) / mean(flow_vec[warm_up:end]) - 1)
# Combine multiple objectives
return 0.7 * kge_value + 0.3 * bias_value
end
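This scalarized objective plugs into the same workflow as before; a sketch reusing the bounds and initial values defined above (optf_multi, optprob_multi, and sol_multi are just illustrative names):
# Build and solve an optimization problem around the composite objective
optf_multi = Optimization.OptimizationFunction(multi_obj_func, Optimization.AutoForwardDiff())
optprob_multi = Optimization.OptimizationProblem(optf_multi, Vector(init_params), lb=lb_list, ub=ub_list)
sol_multi = Optimization.solve(optprob_multi, BBO_adaptive_de_rand_1_bin_radiuslimited(), maxiters=max_iter)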
Parameter Sensitivity Analysis
After optimization, you can conduct parameter sensitivity analysis to understand which parameters have the greatest impact on model performance:
function sensitivity_analysis(model, optimal_params, param_names)
results = []
for (i, param_name) in enumerate(param_names)
# Create parameter perturbation sequence
param_values = range(lb_list[i], ub_list[i], length=10)
# Evaluate model performance for each perturbation value
for val in param_values
# Copy optimal parameters and modify current parameter
test_params = copy(optimal_params)
test_params[i] = val
# Run model and evaluate performance
model_output = model(input_matrix, ComponentVector(test_params, ps_axes); run_kwargs...)[end, warm_up:end]
kge_val = kge_func(flow_vec[warm_up:end], model_output)
push!(results, (param=param_name, value=val, kge=kge_val))
end
end
return DataFrame(results)
end
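One possible way to call this helper with the calibrated parameter vector (note that it relies on the globals defined earlier, such as input_matrix, run_kwargs, and the bound lists):
# Run the sensitivity sweep around the optimized parameters and inspect the first rows
sens_df = sensitivity_analysis(model, collect(sol.u), get_param_names(model))
first(sens_df, 5)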
Ensemble of Multiple Optimization Algorithms
You can try multiple optimization algorithms and select the best result:
function ensemble_optimization(optprob, algorithms, max_iter)
best_sol = nothing
best_loss = Inf
for alg in algorithms
sol = Optimization.solve(optprob, alg, maxiters=max_iter)
loss = sol.objective # objective value reached by this algorithm
if loss < best_loss
best_loss = loss
best_sol = sol
end
end
return best_sol
end
# Usage example
algorithms = [BBO_adaptive_de_rand_1_bin_radiuslimited(), CMAES(), GCMAESOpt()]
best_sol = ensemble_optimization(optprob, algorithms, max_iter)
Summary
This tutorial demonstrates how to use Julia's Optimization ecosystem to calibrate hydrological models in HydroModels.jl. By defining appropriate objective functions, selecting suitable optimization algorithms, and setting reasonable parameter boundaries, you can effectively calibrate hydrological models to better fit observed data.
Optimization is an iterative process that may require trying different objective functions, optimization algorithms, and initial parameter values to achieve the best results. Using the methods in this tutorial, you can flexibly customize the optimization process to meet specific hydrological modeling requirements.