Optimization

SymbolicOptimization uses NSGA-II (Non-dominated Sorting Genetic Algorithm II) for multi-objective optimization. The core API is optimize — the DSL and Model API are built on top of it.

General Optimization

using SymbolicOptimization

grammar = Grammar(
    binary_operators = [+, -, *, /],
    unary_operators = [sin, cos],
    variables = [:x, :y],
    constant_range = (-5.0, 5.0),
)

objectives = [
    custom_objective(:my_metric, (tree, data) -> compute_score(tree, data)),
    complexity_objective(),
]

data = Dict(:training_set => ..., :validation_set => ...)

config = NSGAIIConfig(population_size=100, max_generations=50)
result = optimize(grammar, objectives, data; config=config)

for ind in get_pareto_front(result)
    println("Objectives: $(ind.objectives)")
    println("Expression: $(node_to_string(ind.tree))")
end

Curve Fitting

For the common case of fitting $y \approx f(x)$:

x = collect(-2.0:0.1:2.0)
y = @. x^2 + 2x + 1

result = curve_fitting(x, y;
    config = NSGAIIConfig(population_size=100, max_generations=50)
)

best = get_best(result, 1)
println(node_to_string(best.tree))

Symbolic Regression

A convenience wrapper that sets up grammar and objectives automatically:

result = symbolic_regression(X, y; config=config)

Built-in Objectives

mse_objective()           # Mean squared error
mae_objective()           # Mean absolute error
complexity_objective()    # Number of nodes
depth_objective()         # Tree depth
custom_objective(name, f) # f(tree, data) -> Float64
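As a rough illustration of what the error-style objectives compute (a sketch, not the package's implementation), an MSE score over predictions and targets is:

```julia
# Sketch of an MSE-style score. `ŷ` stands for the predictions obtained by
# evaluating a candidate tree on the inputs; the package computes this
# internally from data[:X] and data[:y].
mse_score(ŷ, y) = sum(abs2, ŷ .- y) / length(y)

mse_score([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])  # 0.0 for a perfect fit
```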

Configuration

NSGAIIConfig controls the optimization process:

config = NSGAIIConfig(
    population_size = 100,
    max_generations = 50,
    # ... additional parameters
)
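A fuller configuration touching the other documented fields might look like this (the values here are illustrative choices, not the package defaults):

```julia
config = NSGAIIConfig(
    population_size = 200,
    max_generations = 100,
    tournament_size = 3,          # tournament selection size
    crossover_prob = 0.9,         # probability of crossover vs mutation
    max_depth = 8,                # cap on tree depth
    max_nodes = 30,               # cap on nodes per tree (0 = unlimited)
    parsimony_tolerance = 0.01,   # prefer the simpler of near-tied solutions
    early_stop_generations = 10,  # stop after 10 generations without improvement
    verbose = true,
)
```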

Results

NSGAIIResult holds the optimization output:

result = optimize(grammar, objectives, data; config=config)

get_best(result, obj_index)   # Best individual for the obj_index-th objective
get_pareto_front(result)      # All Pareto-optimal individuals

Each Individual has:

  • tree — the expression tree (AbstractNode)
  • objectives — vector of objective values
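Since each Individual carries its objectives vector, a balanced trade-off ("knee") solution can be picked from the Pareto front in a few lines. The helper below is a hypothetical sketch operating on plain objective vectors, i.e. on [ind.objectives for ind in get_pareto_front(result)]:

```julia
# Hedged sketch: normalize each objective to [0, 1] across the front and
# return the point with the smallest normalized sum (all objectives
# assumed minimized).
function knee_point(front::Vector{Vector{Float64}})
    lo = [minimum(o[i] for o in front) for i in eachindex(front[1])]
    hi = [maximum(o[i] for o in front) for i in eachindex(front[1])]
    score(o) = sum((o[i] - lo[i]) / max(hi[i] - lo[i], eps()) for i in eachindex(o))
    front[argmin(score.(front))]
end
```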

Optimization Reference

SymbolicOptimization.optimize (Function)
optimize(grammar::Grammar, objectives::Vector{ObjectiveFunction}, data::Dict;
         config::NSGAIIConfig=NSGAIIConfig(), rng=Random.GLOBAL_RNG) -> NSGAIIResult

Run NSGA-II multi-objective optimization to find expression trees.

Arguments

  • grammar: The grammar defining valid expressions
  • objectives: Vector of objective functions to optimize
  • data: Dictionary containing training data and metadata

Keyword Arguments

  • config: NSGA-II configuration
  • rng: Random number generator
  • initial_population: Optional initial population

Returns

An NSGAIIResult containing the Pareto front and optimization history.

Example

grammar = Grammar(
    binary_operators = [+, -, *, /],
    unary_operators = [sin, cos],
    variables = [:x],
)

objectives = [mse_objective(), complexity_objective()]

data = Dict(
    :X => reshape(collect(-2.0:0.1:2.0), :, 1),
    :y => [x^2 + x for x in -2.0:0.1:2.0],
    :var_names => [:x],
)

result = optimize(grammar, objectives, data; 
    config=NSGAIIConfig(population_size=50, max_generations=20))

# Get the Pareto front
front = get_pareto_front(result)

# Get best for first objective (MSE)
best = get_best(result, 1)
SymbolicOptimization.curve_fitting (Function)
curve_fitting(X::AbstractMatrix, y::AbstractVector;
              grammar=nothing, config=NSGAIIConfig(), rng=Random.GLOBAL_RNG) -> NSGAIIResult

Convenience function for standard curve fitting (symbolic regression). Optimizes MSE and complexity as objectives.

Note: This is just one application of symbolic optimization. For custom objectives (aggregator discovery, scoring rules, etc.), use the optimize function directly.

Arguments

  • X: Input features matrix (rows = samples, columns = features)
  • y: Target vector

Keyword Arguments

  • grammar: Grammar to use (default: arithmetic + trig)
  • var_names: Variable names (default: [:x1, :x2, ...] based on columns)
  • config: NSGA-II configuration
  • rng: Random number generator
curve_fitting(x::AbstractVector, y::AbstractVector; kwargs...) -> NSGAIIResult

Convenience method for 1D curve fitting.

SymbolicOptimization.symbolic_regression (Function)
symbolic_regression(X, y; kwargs...) -> NSGAIIResult

Convenience wrapper that sets up the default grammar and objectives automatically and runs optimize; see curve_fitting above for the shared keyword arguments.
SymbolicOptimization.NSGAIIConfig (Type)
NSGAIIConfig

Configuration for the NSGA-II algorithm.

Fields

  • population_size::Int: Number of individuals in the population
  • max_generations::Int: Maximum number of generations
  • tournament_size::Int: Tournament selection size
  • crossover_prob::Float64: Probability of crossover vs mutation
  • mutation_prob::Float64: Probability of mutation (when not doing crossover)
  • elite_fraction::Float64: Fraction of population to preserve as elite
  • max_depth::Int: Maximum tree depth
  • min_depth::Int: Minimum tree depth for generation
  • max_nodes::Int: Maximum number of nodes per tree (0 = unlimited)
  • parsimony_tolerance::Float64: If two solutions' primary objectives differ by less than this fraction, prefer the simpler one (0 = disabled)
  • simplify_prob::Float64: Probability of simplifying offspring
  • verbose::Bool: Print progress information
  • early_stop_generations::Int: Stop if no improvement for this many generations (0 = disabled)
SymbolicOptimization.NSGAIIResult (Type)
NSGAIIResult

Results from an NSGA-II optimization run.

Fields

  • pareto_front::Vector{Individual}: Non-dominated individuals from final population
  • population::Vector{Individual}: Final population
  • generations::Int: Number of generations run
  • history::Vector{Dict}: Per-generation statistics
  • best_per_objective::Vector{Individual}: Best individual for each objective
SymbolicOptimization.Individual (Type)
Individual

An individual in the evolutionary population, wrapping an expression tree with its fitness values and NSGA-II ranking information.

Fields

  • tree::AbstractNode: The expression tree
  • objectives::Vector{Float64}: Fitness values for each objective
  • rank::Int: Pareto rank (1 = first front, 2 = second front, etc.)
  • crowding_distance::Float64: Crowding distance for diversity
  • data::Dict{Symbol,Any}: Optional metadata storage
SymbolicOptimization.ObjectiveFunction (Type)
ObjectiveFunction

Defines an objective to optimize.

Fields

  • name::Symbol: Name of the objective
  • func::Function: Function (tree, data) -> Float64
  • minimize::Bool: If true, lower is better; if false, higher is better
  • weight::Float64: Optional weight for weighted-sum methods
SymbolicOptimization.mse_objective (Function)
mse_objective(name=:mse) -> ObjectiveFunction

Create a Mean Squared Error objective. Expects data to contain :X (input matrix) and :y (target vector).

SymbolicOptimization.custom_objective (Function)
custom_objective(name, func; minimize=true, weight=1.0) -> ObjectiveFunction

Create a custom objective function.

Arguments

  • name: Name of the objective
  • func: Function with signature (tree::AbstractNode, data::Dict) -> Float64
  • minimize: If true, lower values are better
  • weight: Weight for weighted-sum methods
SymbolicOptimization.dominates (Function)
dominates(a::Individual, b::Individual, minimize::Vector{Bool}) -> Bool

Check if individual a dominates individual b. a dominates b if a is at least as good in all objectives and strictly better in at least one.

dominates(a::Vector{Float64}, b::Vector{Float64}, minimize::Vector{Bool}) -> Bool

Check if objective vector a dominates objective vector b.

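The dominance rule can be written out directly. The function below is a reference sketch of that rule (named pareto_dominates here to avoid clashing with the package's dominates):

```julia
# Sketch of Pareto dominance as described above: `a` dominates `b` if it is
# at least as good in every objective and strictly better in at least one.
# `minimize[i]` says whether objective i is minimized.
function pareto_dominates(a::Vector{Float64}, b::Vector{Float64}, minimize::Vector{Bool})
    better(x, y, min) = min ? x < y : x > y
    worse(x, y, min)  = min ? x > y : x < y
    any(worse(a[i], b[i], minimize[i]) for i in eachindex(a)) && return false
    return any(better(a[i], b[i], minimize[i]) for i in eachindex(a))
end
```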
SymbolicOptimization.nondominated_sort! (Function)
nondominated_sort!(population::Vector{Individual}, objectives::Vector{ObjectiveFunction})

Perform non-dominated sorting on the population, assigning ranks to each individual. Rank 1 = Pareto front (non-dominated), Rank 2 = dominated only by rank 1, etc.

Modifies individuals in place by setting their rank field. Returns a vector of fronts (each front is a vector of indices).

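For intuition, the ranking can be sketched as a simple O(n²) pass over plain objective vectors (all objectives minimized here for brevity; this is illustrative, not the package's implementation):

```julia
# O(n²) sketch of non-dominated sorting over objective vectors, all
# minimized. Returns a vector of fronts, each a vector of indices:
# front 1 is non-dominated, front 2 is dominated only by front 1, etc.
function nds(points::Vector{Vector{Float64}})
    dom(a, b) = all(a .<= b) && any(a .< b)
    remaining = collect(1:length(points))
    fronts = Vector{Vector{Int}}()
    while !isempty(remaining)
        front = [i for i in remaining if !any(dom(points[j], points[i]) for j in remaining)]
        push!(fronts, front)
        remaining = setdiff(remaining, front)
    end
    fronts
end
```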
SymbolicOptimization.compute_crowding_distance! (Function)
compute_crowding_distance!(population::Vector{Individual}, front_indices::Vector{Int})

Compute crowding distance for individuals in a front. Individuals at the boundaries get infinite distance. Others get a distance based on the side lengths of the cuboid formed by their nearest neighbors in objective space.

Modifies individuals in place by setting their crowding_distance field.

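The same computation can be sketched for a single front of objective vectors (illustrative only): boundary points get Inf, interior points accumulate the normalized gap between their neighbors along each objective.

```julia
# Sketch of crowding distance for one front of objective vectors.
function crowding(front::Vector{Vector{Float64}})
    n, m = length(front), length(front[1])
    d = zeros(n)
    for k in 1:m
        order = sortperm(front; by = p -> p[k])   # sort along objective k
        d[order[1]] = d[order[end]] = Inf         # boundary points
        span = front[order[end]][k] - front[order[1]][k]
        span == 0 && continue
        for i in 2:n-1
            d[order[i]] += (front[order[i+1]][k] - front[order[i-1]][k]) / span
        end
    end
    d
end
```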
SymbolicOptimization.tournament_select (Function)
tournament_select(population::Vector{Individual}, tournament_size::Int; 
                  rng=Random.GLOBAL_RNG, parsimony_tolerance=0.0, complexity_idx=3) -> Individual

Select an individual using tournament selection with crowded comparison. If parsimony_tolerance > 0, applies parsimony pressure to prefer simpler solutions when primary objectives are within the tolerance.

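The crowded comparison underlying this selection can be sketched as a tiny predicate over (rank, crowding_distance) pairs: prefer the lower Pareto rank, and on a rank tie prefer the larger crowding distance.

```julia
# Sketch of the crowded comparison: `a` and `b` are (rank, crowding) pairs.
crowded_less(a, b) = a[1] < b[1] || (a[1] == b[1] && a[2] > b[2])
```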
SymbolicOptimization.environmental_select! (Function)
environmental_select!(combined::Vector{Individual}, target_size::Int, 
                      objectives::Vector{ObjectiveFunction}) -> Vector{Individual}

Select the best individuals to survive to the next generation. Uses non-dominated sorting and crowding distance.

source