Optimization
SymbolicOptimization uses NSGA-II (Non-dominated Sorting Genetic Algorithm II) for multi-objective optimization. The core API is optimize — the DSL and Model API are built on top of it.
General Optimization
using SymbolicOptimization
grammar = Grammar(
    binary_operators = [+, -, *, /],
    unary_operators = [sin, cos],
    variables = [:x, :y],
    constant_range = (-5.0, 5.0),
)
objectives = [
    custom_objective(:my_metric, (tree, data) -> compute_score(tree, data)),
    complexity_objective(),
]
data = Dict(:training_set => ..., :validation_set => ...)
config = NSGAIIConfig(population_size=100, max_generations=50)
result = optimize(grammar, objectives, data; config=config)
for ind in get_pareto_front(result)
    println("Objectives: $(ind.objectives)")
    println("Expression: $(node_to_string(ind.tree))")
end
Curve Fitting
For the common case of fitting $y \approx f(x)$:
x = collect(-2.0:0.1:2.0)
y = @. x^2 + 2x + 1
result = curve_fitting(x, y;
    config = NSGAIIConfig(population_size=100, max_generations=50)
)
best = get_best(result, 1)
println(node_to_string(best.tree))
Symbolic Regression
A convenience wrapper that sets up grammar and objectives automatically:
result = symbolic_regression(X, y; config=config)
Built-in Objectives
mse_objective() # Mean squared error
mae_objective() # Mean absolute error
complexity_objective() # Number of nodes
depth_objective() # Tree depth
custom_objective(name, f)  # f(tree, data) -> Float64
Configuration
NSGAIIConfig controls the optimization process:
config = NSGAIIConfig(
    population_size = 100,
    max_generations = 50,
    # ... additional parameters
)
Results
NSGAIIResult holds the optimization output:
result = optimize(grammar, objectives, data; config=config)
get_best(result, obj_index)  # Best individual for the given objective
get_pareto_front(result)     # All Pareto-optimal individuals

Each Individual has:
tree — the expression tree (AbstractNode)
objectives — vector of objective values
Optimization Reference
SymbolicOptimization.optimize — Function
optimize(grammar::Grammar, objectives::Vector{ObjectiveFunction}, data::Dict;
    config::NSGAIIConfig=NSGAIIConfig(), rng=Random.GLOBAL_RNG) -> NSGAIIResult

Run NSGA-II multi-objective optimization to find expression trees.
Arguments
grammar: The grammar defining valid expressions
objectives: Vector of objective functions to optimize
data: Dictionary containing training data and metadata
Keyword Arguments
config: NSGA-II configuration
rng: Random number generator
initial_population: Optional initial population
Returns
An NSGAIIResult containing the Pareto front and optimization history.
Example
grammar = Grammar(
    binary_operators = [+, -, *, /],
    unary_operators = [sin, cos],
    variables = [:x],
)
objectives = [mse_objective(), complexity_objective()]
data = Dict(
    :X => reshape(collect(-2.0:0.1:2.0), :, 1),
    :y => [x^2 + x for x in -2.0:0.1:2.0],
    :var_names => [:x],
)
result = optimize(grammar, objectives, data;
    config=NSGAIIConfig(population_size=50, max_generations=20))
# Get the Pareto front
front = get_pareto_front(result)
# Get best for first objective (MSE)
best = get_best(result, 1)

SymbolicOptimization.curve_fitting — Function
curve_fitting(X::AbstractMatrix, y::AbstractVector;
    grammar=nothing, config=NSGAIIConfig(), rng=Random.GLOBAL_RNG) -> NSGAIIResult

Convenience function for standard curve fitting (symbolic regression). Optimizes MSE and complexity as objectives.
Note: This is just one application of symbolic optimization. For custom objectives (aggregator discovery, scoring rules, etc.), use the optimize function directly.
Arguments
X: Input features matrix (rows = samples, columns = features)
y: Target vector
Keyword Arguments
grammar: Grammar to use (default: arithmetic + trig)
var_names: Variable names (default: [:x1, :x2, ...] based on columns)
config: NSGA-II configuration
rng: Random number generator
curve_fitting(x::AbstractVector, y::AbstractVector; kwargs...) -> NSGAIIResult

Convenience method for 1D curve fitting.
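For multivariate input, the matrix method can be used as sketched below; the data shapes follow the argument list above (rows = samples, columns = features), while the target function and sizes are illustrative:

```julia
# Two-feature sketch: recover y = x1 * x2 + sin(x1). Data shapes and the
# target function are example choices, not package requirements.
X = randn(200, 2)                      # rows = samples, columns = features
y = X[:, 1] .* X[:, 2] .+ sin.(X[:, 1])

result = curve_fitting(X, y;
    config = NSGAIIConfig(population_size = 100, max_generations = 50))

best = get_best(result, 1)             # best by the first objective (MSE)
println(node_to_string(best.tree))
```

With the default grammar, the discovered expression refers to the columns by the default variable names (:x1, :x2).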
SymbolicOptimization.symbolic_regression — Function
symbolic_regression(X, y; kwargs...) -> NSGAIIResult

Convenience wrapper that sets up the grammar and the objectives (MSE and complexity) automatically. Accepts the same arguments and keyword arguments as curve_fitting above.
SymbolicOptimization.NSGAIIConfig — Type
NSGAIIConfig

Configuration for the NSGA-II algorithm.
Fields
population_size::Int: Number of individuals in the population
max_generations::Int: Maximum number of generations
tournament_size::Int: Tournament selection size
crossover_prob::Float64: Probability of crossover vs mutation
mutation_prob::Float64: Probability of mutation (when not doing crossover)
elite_fraction::Float64: Fraction of population to preserve as elite
max_depth::Int: Maximum tree depth
min_depth::Int: Minimum tree depth for generation
max_nodes::Int: Maximum number of nodes per tree (0 = unlimited)
parsimony_tolerance::Float64: If two solutions' primary objectives differ by less than this fraction, prefer the simpler one (0 = disabled)
simplify_prob::Float64: Probability of simplifying offspring
verbose::Bool: Print progress information
early_stop_generations::Int: Stop if no improvement for this many generations (0 = disabled)
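A configuration touching several of these fields might look like the following; the values are illustrative choices, not package defaults:

```julia
# Illustrative NSGAIIConfig; every value here is an example, not a default.
config = NSGAIIConfig(
    population_size = 200,        # larger population -> broader Pareto coverage
    max_generations = 100,
    tournament_size = 4,
    crossover_prob = 0.8,         # 80% crossover, otherwise mutation
    max_depth = 8,                # cap tree depth to limit bloat
    max_nodes = 40,               # hard cap on tree size
    parsimony_tolerance = 0.01,   # prefer simpler trees within 1% fitness
    early_stop_generations = 15,  # stop after 15 stagnant generations
    verbose = true,
)
```

Parsimony and size caps (max_depth, max_nodes, parsimony_tolerance) are the main levers against expression bloat; the rest trade search breadth against runtime.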
SymbolicOptimization.NSGAIIResult — Type
NSGAIIResult

Results from an NSGA-II optimization run.
Fields
pareto_front::Vector{Individual}: Non-dominated individuals from final population
population::Vector{Individual}: Final population
generations::Int: Number of generations run
history::Vector{Dict}: Per-generation statistics
best_per_objective::Vector{Individual}: Best individual for each objective
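A sketch of inspecting these fields directly, assuming result comes from an optimize call as in the example above (the keys stored in each history Dict are package-specific and not shown here):

```julia
# Inspect an NSGAIIResult using the documented fields.
println("Ran $(result.generations) generations")
println("Pareto front size: $(length(result.pareto_front))")

# best_per_objective lines up with the objectives vector passed to optimize,
# so index i is the champion for objective i.
for (i, ind) in enumerate(result.best_per_objective)
    println("Objective $i best value: $(ind.objectives[i])")
end
```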
SymbolicOptimization.Individual — Type
Individual

An individual in the evolutionary population, wrapping an expression tree with its fitness values and NSGA-II ranking information.
Fields
tree::AbstractNode: The expression tree
objectives::Vector{Float64}: Fitness values for each objective
rank::Int: Pareto rank (1 = first front, 2 = second front, etc.)
crowding_distance::Float64: Crowding distance for diversity
data::Dict{Symbol,Any}: Optional metadata storage
SymbolicOptimization.ObjectiveFunction — Type
ObjectiveFunction

Defines an objective to optimize.
Fields
name::Symbol: Name of the objective
func::Function: Function (tree, data) -> Float64
minimize::Bool: If true, lower is better; if false, higher is better
weight::Float64: Optional weight for weighted-sum methods
SymbolicOptimization.mse_objective — Function
mse_objective(name=:mse) -> ObjectiveFunction

Create a Mean Squared Error objective. Expects data to contain :X (input matrix) and :y (target vector).
SymbolicOptimization.mae_objective — Function
mae_objective(name=:mae) -> ObjectiveFunction

Create a Mean Absolute Error objective.
SymbolicOptimization.complexity_objective — Function
complexity_objective(name=:complexity) -> ObjectiveFunction

Create a tree complexity objective (number of nodes).
SymbolicOptimization.depth_objective — Function
depth_objective(name=:depth) -> ObjectiveFunction

Create a tree depth objective.
SymbolicOptimization.custom_objective — Function
custom_objective(name, func; minimize=true, weight=1.0) -> ObjectiveFunction

Create a custom objective function.
Arguments
name: Name of the objective
func: Function with signature (tree::AbstractNode, data::Dict) -> Float64
minimize: If true, lower values are better
weight: Weight for weighted-sum methods
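As a sketch, a held-out-error objective could be defined as below; the :X_val/:y_val data keys and the predict helper are illustrative assumptions, not part of the package API:

```julia
# Hypothetical custom objective: MSE on a held-out validation set.
# The data keys :X_val/:y_val and the `predict` helper are assumptions.
val_objective = custom_objective(
    :val_mse,
    (tree, data) -> begin
        preds = predict(tree, data[:X_val])            # assumed tree evaluator
        sum(abs2, preds .- data[:y_val]) / length(data[:y_val])
    end;
    minimize = true,
)

# For a score where higher is better, flip the flag:
score_objective = custom_objective(:score,
    (tree, data) -> my_score(tree, data); minimize = false)
```

The function receives the same data Dict that was passed to optimize, so any extra keys you store there are available inside the objective.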
SymbolicOptimization.get_best — Function
get_best(result::NSGAIIResult, objective_index::Int=1) -> Individual

Get the best individual for a specific objective.
SymbolicOptimization.get_pareto_front — Function
get_pareto_front(result::NSGAIIResult) -> Vector{Individual}

Get all non-dominated individuals.
SymbolicOptimization.dominates — Function
dominates(a::Individual, b::Individual, minimize::Vector{Bool}) -> Bool

Check if individual a dominates individual b. a dominates b if a is at least as good in all objectives and strictly better in at least one.
dominates(a::Vector{Float64}, b::Vector{Float64}, minimize::Vector{Bool}) -> Bool

Check if objective vector a dominates objective vector b.
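For intuition, with two minimized objectives the vector method behaves as follows (a sketch using the documented vector signature; the numeric values are arbitrary):

```julia
minimize = [true, true]    # both objectives are minimized

a = [1.0, 5.0]
b = [2.0, 5.0]
c = [2.0, 4.0]

dominates(a, b, minimize)  # a equals b on obj 2 and is strictly better on obj 1
dominates(a, c, minimize)  # a is better on obj 1 but worse on obj 2
dominates(c, a, minimize)  # a and c are mutually non-dominated
```

Only the first call returns true; the last two pairs illustrate why a Pareto front can contain many incomparable trade-offs.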
SymbolicOptimization.nondominated_sort! — Function
nondominated_sort!(population::Vector{Individual}, objectives::Vector{ObjectiveFunction})

Perform non-dominated sorting on the population, assigning ranks to each individual. Rank 1 = Pareto front (non-dominated), Rank 2 = dominated only by rank 1, etc.
Modifies individuals in place by setting their rank field. Returns a vector of fronts (each front is a vector of indices).
SymbolicOptimization.compute_crowding_distance! — Function
compute_crowding_distance!(population::Vector{Individual}, front_indices::Vector{Int})

Compute crowding distance for individuals in a front. Individuals at the boundaries get infinite distance. Others get distance based on the hypervolume of the cuboid formed by their neighbors.
Modifies individuals in place by setting their crowding_distance field.
SymbolicOptimization.tournament_select — Function
tournament_select(population::Vector{Individual}, tournament_size::Int;
    rng=Random.GLOBAL_RNG, parsimony_tolerance=0.0, complexity_idx=3) -> Individual

Select an individual using tournament selection with crowded comparison. If parsimony_tolerance > 0, applies parsimony pressure to prefer simpler solutions when primary objectives are within the tolerance.
SymbolicOptimization.environmental_select! — Function
environmental_select!(combined::Vector{Individual}, target_size::Int,
    objectives::Vector{ObjectiveFunction}) -> Vector{Individual}

Select the best individuals to survive to the next generation. Uses non-dominated sorting and crowding distance.