Polynomial Regression Trees

class equadratures.polytree.PolyTree(splitting_criterion='model_aware', max_depth=5, min_samples_leaf=None, order=1, basis='total-order', search='exhaustive', samples=50, verbose=False, poly_method='least-squares', poly_solver_args={}, all_data=False, split_dims=None, k=0.05, distribution='uniform')

Definition of a polynomial tree object.

  • splitting_criterion (str, optional) – The type of splitting_criterion to use in the fit function. Options include model_aware which fits polynomials for each candidate split, model_agnostic which uses a standard deviation based model-agnostic split criterion [1], and loss_gradient which uses a gradient based splitting criterion similar to that in [2].

  • max_depth (int, optional) – The maximum depth which the tree will grow to.

  • min_samples_leaf (int, optional) – The minimum number of samples per leaf node.

  • order (int, optional) – The order of the generated orthogonal polynomials.

  • basis (str, optional) – The type of index set used for the basis. Options include: univariate, total-order, tensor-grid, sparse-grid and hyperbolic-basis.

  • search (str, optional) – The method of search to be used. Options are grid or exhaustive.

  • samples (int, optional) – The interval between splits if grid search is chosen.

  • verbose (bool, optional) – For debugging.

  • all_data (bool, optional) – Store data at all nodes in PolyTree (instead of only leaf nodes).

  • split_dims (list, optional) – List of dimensions along which to make splits.

  • k (float, optional) – The smoothing parameter. Ranges from 0.0 to 1.0, with 0.0 giving no smoothing and 1.0 giving maximum smoothing.

  • distribution (str, optional) – The type of input parameter distributions. Either uniform or data.
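
To make the model_agnostic option more concrete, the following is a minimal sketch of a standard-deviation-reduction split search in the spirit of [1]. It is not the library's implementation: the helper names sd_reduction and best_split_1d are hypothetical, the search is an exhaustive scan over midpoints in a single dimension, and the toy data are made up for illustration.

```python
from statistics import pstdev


def sd_reduction(y_parent, y_left, y_right):
    """Standard deviation of the parent minus the size-weighted
    standard deviations of the two child subsets."""
    n = len(y_parent)
    weighted = (len(y_left) / n) * pstdev(y_left) + (len(y_right) / n) * pstdev(y_right)
    return pstdev(y_parent) - weighted


def best_split_1d(x, y, min_samples_leaf=1):
    """Exhaustive search: try the midpoint between every pair of
    consecutive sorted x values and keep the one with the largest reduction."""
    pairs = sorted(zip(x, y))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    best_threshold, best_gain = None, 0.0
    for i in range(min_samples_leaf, len(xs) - min_samples_leaf + 1):
        if xs[i] == xs[i - 1]:
            continue  # equal values cannot be separated by a threshold
        gain = sd_reduction(ys, ys[:i], ys[i:])
        if gain > best_gain:
            best_threshold, best_gain = 0.5 * (xs[i - 1] + xs[i]), gain
    return best_threshold, best_gain


# Hypothetical toy data with a jump between x = 0.3 and x = 0.7.
x = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
y = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
threshold, gain = best_split_1d(x, y)
```

A model_aware criterion would instead fit a polynomial on each side of every candidate split and compare the resulting losses, which is more expensive per candidate.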


>>> import numpy as np
>>> from equadratures import polytree
>>> tree = polytree.PolyTree()
>>> X = np.loadtxt('inputs.txt')
>>> X_test = np.loadtxt('inputs_test.txt')
>>> y = np.loadtxt('outputs.txt')
>>> tree.fit(X, y)
>>> y_test = tree.predict(X_test)


  1. Wang, Y., Witten, I. H., (1997) Inducing Model Trees for Continuous Classes. In Proc. of the 9th European Conf. on Machine Learning Poster Papers. 128-137.

  2. Broelemann, K., Kasneci, G., (2019) A Gradient-Based Split Criterion for Highly Accurate and Transparent Model Trees. In Int. Joint Conf. on Artificial Intelligence (IJCAI). 2030-2037.

  3. Chan, T. F., Golub, G. H., LeVeque, R. J., (1983) Algorithms for computing the sample variance: Analysis and recommendations. The American Statistician. 37(3): 242–247.


Returns the leaf node index for each observation in the data.


X (numpy.ndarray) – Array with shape (number_of_observations, dimensions) at which the tree fit must be evaluated.


A numpy.ndarray of shape (number_of_observations,1) corresponding to the node indices for each observation in X.

Return type

numpy.ndarray

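As an illustration of what returning a leaf node index per observation involves, here is a minimal sketch that routes each observation down a small hand-built tree. The nested-dict layout and the keys index, dim, threshold and children are assumptions made for this example and are not taken from PolyTree's internal representation; the sketch also returns a flat list rather than an array of shape (number_of_observations, 1).

```python
def leaf_index(node, x):
    """Follow the split decisions from the root until a leaf is reached."""
    while "children" in node:
        left, right = node["children"]
        # Go left when the observation is at or below the split threshold.
        node = left if x[node["dim"]] <= node["threshold"] else right
    return node["index"]


# Hypothetical tree: the root splits on dimension 0 at 0.5,
# and its right child splits on dimension 1 at 0.0.
tree = {
    "index": 0, "dim": 0, "threshold": 0.5,
    "children": (
        {"index": 1},
        {"index": 2, "dim": 1, "threshold": 0.0,
         "children": ({"index": 3}, {"index": 4})},
    ),
}

X = [[0.2, 1.0], [0.9, -1.0], [0.9, 0.5]]
leaf_indices = [leaf_index(tree, x) for x in X]
```
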
fit(X, y)

Fits the PolyTree to the provided data.

get_graphviz(X=None, feature_names=None, file_name=None)

Generates a graphviz visualisation of the PolyTree.

  • X (numpy.ndarray, optional) – An ndarray with shape (dimensions) containing an input vector for a given sample, to highlight in the tree.

  • feature_names (list, optional) – A list of the names of the features used in the training data.

  • file_name (str, optional) – Filename to write graphviz data to. If None (default), the graph is rendered in-place; if 'source', the raw graphviz string is returned.


Returns the node indices for all leaf nodes.


Contains the node indices of all leaf nodes.

Return type



Computes the mean and variance of the polynomial tree model.


Tuple (mean,variance) containing two floats; the approximated mean and variance from the fitted PolyTree.

Return type

tuple

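One generic way to combine per-leaf moments of a piecewise model into a single mean and variance is the law of total variance. The sketch below shows only that identity; it is an assumption that a tree's moments are assembled this way, and the function name mixture_mean_variance, the leaf weights and the leaf moments are all hypothetical.

```python
def mixture_mean_variance(weights, means, variances):
    """Combine per-leaf means and variances using the law of total variance.

    weights are non-negative leaf weights (for example the probability mass
    of the inputs falling in each leaf); they are normalised to sum to one.
    """
    total = sum(weights)
    probs = [w / total for w in weights]
    mean = sum(p * m for p, m in zip(probs, means))
    # Average variance inside the leaves.
    within = sum(p * v for p, v in zip(probs, variances))
    # Variance of the leaf means around the overall mean.
    between = sum(p * (m - mean) ** 2 for p, m in zip(probs, means))
    return mean, within + between


# Two equally weighted hypothetical leaves.
mean, variance = mixture_mean_variance([0.5, 0.5], [1.0, 3.0], [0.5, 0.5])
```
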

Returns the node corresponding to a given node number.


inode (int) – The node number.


Dictionary containing the data for the requested node.

Return type

dict


Returns the tree paths for the leaf nodes in the tree.


X (numpy.ndarray, optional) – Array with shape (number_of_observations, dimensions) to apply the tree to. If given, paths will only be returned for leaves which contain observations.


Dictionary containing a dict for each leaf node. Indexed by the node indices for the leaf nodes.

Return type

dict


Returns all of the polynomials fitted at each node in the tree.


A list of Poly objects.

Return type

list


Returns all of the data splits made.


A list of splits made in the format of a nested list: [[split, dimension], …]

Return type

list

plot_decision_surface(ij, ax=None, X=None, y=None, max_depth=None, label=True, color='data', colorbar=True, show=True, kwargs={})

Plots the decision boundaries of the PolyTree over a 2D surface. See plot_decision_surface() for full description.


Evaluates the polynomial tree approximation of the data.


X (numpy.ndarray) – An ndarray with shape (number_of_observations, dimensions) at which the tree fit must be evaluated.


Array with shape (1, number_of_observations) corresponding to the polynomial approximations of the tree.

Return type

numpy.ndarray

prune(X, y, tol=0.0, percent=False)

Prunes the tree that you have fitted.

  • X (numpy.ndarray) – Training input data

  • y (numpy.ndarray) – Training output data

  • tol (float, optional) – Pruning tolerance. Nodes are pruned if they improve the loss by less than this tolerance.

  • percent (bool, optional) – If true, tol is taken as a percentage of the parent node’s error. Otherwise, tol is taken to be an absolute value.
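
The interaction between tol and percent can be sketched as a simple decision rule. This is an illustrative reading of the parameter descriptions above and not the library's code: the function name should_prune and the loss values are hypothetical, and it is an assumption that a percentage tolerance is given on a 0 to 100 scale.

```python
def should_prune(parent_loss, subtree_loss, tol=0.0, percent=False):
    """Return True if the subtree improves on its parent by less than tol.

    With percent=True, tol is read as a percentage of the parent's loss
    (assumed 0 to 100 scale); otherwise it is an absolute loss value.
    """
    improvement = parent_loss - subtree_loss
    threshold = tol / 100.0 * parent_loss if percent else tol
    return improvement < threshold


# Hypothetical losses: the subtree lowers the loss from 10.0 to 9.7.
prune_absolute = should_prune(10.0, 9.7, tol=0.5)               # 0.3 < 0.5
prune_two_pct = should_prune(10.0, 9.7, tol=2.0, percent=True)   # 0.3 < 0.2 is False
prune_five_pct = should_prune(10.0, 9.7, tol=5.0, percent=True)  # 0.3 < 0.5
```

With the default tol=0.0 this rule only removes a subtree that performs worse than its parent on the supplied data.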