Gradient Boosted Trees

We use the following notation to describe the gradient-boosted trees formulation:

\[\begin{split}\begin{align*} \hat{\mu} &:= \text{Mean prediction of tree ensemble}\\ T &:= \text{Set of trees in ensemble}\\ L_t &:= \text{Set of leaves in tree $t$}\\ z_{t,l} &:= \text{Binary variable indicating if leaf $l$ in tree $t$ is active}\\ \text{Left}_{t,s} &:= \text{Set of leaf variables left of split $s$ in tree $t$}\\ \text{Right}_{t,s} &:= \text{Set of leaf variables right of split $s$ in tree $t$}\\ y_{i(s),j(s)} &:= \text{Binary variable indicating if split $s$ is active}\\ i(s) &:= \text{Feature of split $s$}\\ j(s) &:= \text{Position of split $s$ among the sorted splits of feature $i(s)$}\\ V_t &:= \text{Set of splits in tree $t$}\\ n &:= \text{Number of input features}\\ m_i &:= \text{Number of splits for feature $i$}\\ x_i &:= \text{Input feature $i$}\\ v_{i,j} &:= \text{Value of split $j$ on feature $i$, sorted increasing; $v_{i,0}$ and $v_{i,m_i+1}$ are the lower and upper bounds of $x_i$}\\ F_{t,l} &:= \text{Weight of leaf $l$ in tree $t$}\\ \end{align*}\end{split}\]
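The sets $\text{Left}_{t,s}$ and $\text{Right}_{t,s}$, together with the sorted split values of each feature, can be read off the trees with a depth-first traversal. The sketch below is a minimal illustration under stated assumptions, not OMLT's internal code: the `Leaf`/`Split` classes and the `collect` and `sorted_split_values` helpers are hypothetical, leaves are numbered in depth-first, left-to-right order, and the left branch is assumed to be taken when the feature value is at or below the split value.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple, Union


@dataclass
class Leaf:
    weight: float  # F_{t,l}


@dataclass
class Split:
    feature: int      # i(s)
    threshold: float  # split value, one of the v_{i(s), j}
    left: "Node"      # assumed: taken when x[feature] <= threshold
    right: "Node"     # assumed: taken when x[feature] > threshold


Node = Union[Leaf, Split]
# Per split: (feature, threshold, Left_{t,s} leaf indices, Right_{t,s} leaf indices)
SplitInfo = Tuple[int, float, List[int], List[int]]


def collect(tree: Node) -> Tuple[List[float], List[SplitInfo]]:
    """Return the leaf weights of one tree and the Left/Right leaf sets of each split."""
    leaves: List[float] = []
    splits: List[SplitInfo] = []

    def visit(node: Node) -> List[int]:
        # Returns the indices of every leaf below `node`.
        if isinstance(node, Leaf):
            leaves.append(node.weight)
            return [len(leaves) - 1]
        slot = len(splits)  # reserve a slot so splits are listed in pre-order
        splits.append((node.feature, node.threshold, [], []))
        left = visit(node.left)
        right = visit(node.right)
        splits[slot] = (node.feature, node.threshold, left, right)
        return left + right

    visit(tree)
    return leaves, splits


def sorted_split_values(trees: List[Node]) -> Dict[int, List[float]]:
    """Distinct split values per feature across the ensemble, sorted increasing."""
    values: Dict[int, Set[float]] = {}
    for tree in trees:
        for feature, threshold, _, _ in collect(tree)[1]:
            values.setdefault(feature, set()).add(threshold)
    return {i: sorted(v) for i, v in values.items()}
```

With this layout, $j(s)$ is the 1-based position of a split's threshold in `sorted_split_values(trees)[i(s)]`, and $m_i$ is the length of that list.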
class omlt.gbt.gbt_formulation.GBTBigMFormulation(gbt_model)

Bases: _PyomoFormulation

This class is the entry point for building gradient-boosted trees formulations.

This class iterates over all trees in the ensemble and generates constraints that enforce the splitting rules, following the formulations in the references below.

References

  • Misic, V. “Optimization of tree ensembles.” Operations Research 68.5 (2020): 1605-1624.

  • Mistry, M., et al. “Mixed-integer convex nonlinear optimization with gradient-boosted trees embedded.” INFORMS Journal on Computing (2020).

Parameters:

gbt_model (GradientBoostedTreeModel) – the tree ensemble definition

property input_indexes

The indexes of the formulation inputs.

property output_indexes

The indexes of the formulation output.

omlt.gbt.gbt_formulation.add_formulation_to_block(block, model_definition, input_vars, output_vars)

Adds the gradient-boosted trees formulation to the given Pyomo block.

\[\begin{split}\begin{align*} \hat{\mu} &= \sum\limits_{t \in T} \sum\limits_{l \in {L_t}} F_{t,l} z_{t,l}, && \\ \sum\limits_{l \in L_t} z_{t,l} &= 1, && \forall t \in T, \\ \sum\limits_{l \in \text{Left}_{t,s}} z_{t,l} &\leq y_{i(s),j(s)}, && \forall t \in T, \forall s \in V_t, \\ \sum\limits_{l \in \text{Right}_{t,s}} z_{t,l} &\leq 1 - y_{i(s),j(s)}, && \forall t \in T, \forall s \in V_t, \\ y_{i,j} &\leq y_{i,j+1}, && \forall i \in \left [ n \right ], \forall j \in \left [ m_i - 1 \right ], \\ x_{i} &\geq v_{i,0} + \sum\limits_{j=1}^{m_i} \left (v_{i,j} - v_{i,j-1} \right ) \left ( 1 - y_{i,j} \right ), && \forall i \in \left [ n \right ], \\ x_{i} &\leq v_{i,m_i+1} + \sum\limits_{j=1}^{m_i} \left (v_{i,j} - v_{i,j+1} \right ) y_{i,j}, && \forall i \in \left [ n \right ]. \\ \end{align*}\end{split}\]
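One way to see how these constraints fit together is to build the assignment that a concrete input induces and check every constraint group directly. The sketch below does that in plain Python for a toy ensemble; it is an illustration, not how OMLT constructs the Pyomo model. The tuple tree encoding, the helper names (`leaves_and_splits`, `active_leaf`, `check_formulation`) and the "left when `x[i] <= threshold`" convention are assumptions. It sets $y_{i,j} = 1$ exactly when $x_i \le v_{i,j}$, takes $v_{i,0}$ and $v_{i,m_i+1}$ from the supplied input bounds, marks the leaf each tree reaches as active, asserts each constraint above, and returns $\hat{\mu}$.

```python
from typing import Dict, Set

# Hypothetical encoding: ("leaf", weight) or ("split", feature, threshold, left, right),
# where the left branch is taken when x[feature] <= threshold.


def leaves_and_splits(tree):
    """Leaf weights F_{t,l} and, per split, (i(s), threshold, Left_{t,s}, Right_{t,s})."""
    weights, splits = [], []

    def visit(node):
        if node[0] == "leaf":
            weights.append(node[1])
            return [len(weights) - 1]
        _, feature, threshold, left, right = node
        left_leaves, right_leaves = visit(left), visit(right)
        splits.append((feature, threshold, left_leaves, right_leaves))
        return left_leaves + right_leaves

    visit(tree)
    return weights, splits


def active_leaf(tree, x):
    """Index of the leaf reached by x, using the same left-to-right leaf numbering."""
    counter = 0

    def visit(node, on_path):
        nonlocal counter
        if node[0] == "leaf":
            counter += 1
            return counter - 1 if on_path else None
        _, feature, threshold, left, right = node
        go_left = x[feature] <= threshold
        found = visit(left, on_path and go_left)
        other = visit(right, on_path and not go_left)
        return found if found is not None else other

    return visit(tree, True)


def check_formulation(trees, x, lower, upper, tol=1e-9):
    """Build (y, z) for input x, assert each constraint group, and return mu_hat."""
    parsed = [leaves_and_splits(t) for t in trees]
    values: Dict[int, Set[float]] = {}
    for _, splits in parsed:
        for i, threshold, _, _ in splits:
            values.setdefault(i, set()).add(threshold)
    # v[i] = [v_{i,0}, v_{i,1}, ..., v_{i,m_i}, v_{i,m_i+1}]; bounds assumed outside all splits
    v = {i: [lower[i]] + sorted(s) + [upper[i]] for i, s in values.items()}
    y = {(i, j): int(x[i] <= v[i][j]) for i in v for j in range(1, len(v[i]) - 1)}

    mu_hat = 0.0
    for tree, (weights, splits) in zip(trees, parsed):
        leaf = active_leaf(tree, x)
        z = [int(l == leaf) for l in range(len(weights))]
        mu_hat += sum(F * z_l for F, z_l in zip(weights, z))
        assert sum(z) == 1  # exactly one active leaf per tree
        for i, threshold, left, right in splits:
            y_s = y[i, v[i].index(threshold, 1)]  # y_{i(s), j(s)}
            assert sum(z[l] for l in left) <= y_s
            assert sum(z[l] for l in right) <= 1 - y_s

    for i, vi in v.items():
        m = len(vi) - 2  # m_i
        assert all(y[i, j] <= y[i, j + 1] for j in range(1, m))
        lo = vi[0] + sum((vi[j] - vi[j - 1]) * (1 - y[i, j]) for j in range(1, m + 1))
        hi = vi[m + 1] + sum((vi[j] - vi[j + 1]) * y[i, j] for j in range(1, m + 1))
        assert lo - tol <= x[i] <= hi + tol
    return mu_hat
```

For example, with `trees = [("split", 0, 0.5, ("leaf", 1.0), ("split", 1, 2.0, ("leaf", 2.0), ("leaf", 3.0))), ("split", 0, 0.25, ("leaf", -1.0), ("leaf", 0.5))]`, bounds `[0.0, 1.0]` and `[0.0, 10.0]`, and `x = [0.4, 5.0]`, every assertion passes and the returned prediction is `1.0 + 0.5 = 1.5`. The last two checks show why the $x_i$ constraints work: once the monotone $y_{i,\cdot}$ vector switches from 0 to 1 at position $k$, the sums telescope to $v_{i,k-1} \le x_i \le v_{i,k}$.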

References

  • Misic, V. “Optimization of tree ensembles.” Operations Research 68.5 (2020): 1605-1624.

  • Mistry, M., et al. “Mixed-integer convex nonlinear optimization with gradient-boosted trees embedded.” INFORMS Journal on Computing (2020).

Parameters:
  • block (Block) – the Pyomo block

  • model_definition (GradientBoostedTreeModel) – the tree ensemble definition

  • input_vars (Var) – the input variables of the Pyomo block

  • output_vars (Var) – the output variables of the Pyomo block

class omlt.gbt.model.GradientBoostedTreeModel(onnx_model, scaling_object=None, scaled_input_bounds=None)

Bases: object

property n_inputs

Returns the number of input variables.

property n_outputs

Returns the number of output variables.

property onnx_model

Returns the underlying ONNX model of the tree ensemble.

property scaled_input_bounds

Returns a list of tuples containing the lower and upper bounds of the scaled tree ensemble inputs.

property scaling_object

Returns the scaling object, an instance that implements the ScalingInterface.
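Because the formulation works on scaled inputs, bounds known in original units have to be mapped into scaled space before they can serve as `scaled_input_bounds`. The sketch below is a hedged illustration that assumes an affine per-input scaling $s = (x - \text{offset}) / \text{factor}$; the `scale_input_bounds` helper is hypothetical and not part of OMLT's API. A negative factor reverses the interval, which is why the code takes the `min` and `max` of the two mapped endpoints.

```python
from typing import List, Sequence, Tuple


def scale_input_bounds(
    bounds: Sequence[Tuple[float, float]],
    offset: Sequence[float],
    factor: Sequence[float],
) -> List[Tuple[float, float]]:
    """Map unscaled (lower, upper) bounds to scaled ones under s = (x - offset) / factor."""
    scaled = []
    for (lb, ub), off, fac in zip(bounds, offset, factor):
        if fac == 0:
            raise ValueError("scaling factor must be nonzero")
        a, b = (lb - off) / fac, (ub - off) / fac
        # A negative factor flips the ordering of the endpoints.
        scaled.append((min(a, b), max(a, b)))
    return scaled
```

For instance, bounds `[(0.0, 10.0), (-1.0, 1.0)]` with offsets `[5.0, 0.0]` and factors `[5.0, -2.0]` map to `[(-1.0, 1.0), (-0.5, 0.5)]`.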