Polynomial Regression Trees

class equadratures.polytree.PolyTree(splitting_criterion='model_aware', max_depth=5, min_samples_leaf=None, order=1, basis='total-order', search='exhaustive', samples=50, verbose=False, poly_method='least-squares', poly_solver_args={}, all_data=False, split_dims=None, k=0.05, distribution='uniform')[source]

Definition of a polynomial tree object.

Parameters
  • splitting_criterion (str, optional) – The type of splitting criterion to use in the fit function. Options are model_aware, which fits polynomials for each candidate split; model_agnostic, which uses a standard-deviation-based, model-agnostic split criterion [1]; and loss_gradient, which uses a gradient-based splitting criterion similar to that in [2].

  • max_depth (int, optional) – The maximum depth which the tree will grow to.

  • min_samples_leaf (int, optional) – The minimum number of samples per leaf node.

  • order (int, optional) – The order of the generated orthogonal polynomials.

  • basis (str, optional) – The type of index set used for the basis. Options include: univariate, total-order, tensor-grid, sparse-grid and hyperbolic-basis.

  • search (str, optional) – The method of search to be used. Options are grid or exhaustive.

  • samples (int, optional) – The interval between splits if grid search is chosen.

  • verbose (bool, optional) – For debugging.

  • all_data (bool, optional) – Store data at all nodes in PolyTree (instead of only leaf nodes).

  • split_dims (list, optional) – List of dimensions along which to make splits.

  • k (float, optional) – The smoothing parameter. Range from 0.0 to 1.0, with 0 giving no smoothing, and 1 giving maximum smoothing.

  • distribution (str, optional) – The type of input parameter distributions. Either uniform or data.
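
For intuition about the model_agnostic option, the standard-deviation-based criterion of [1] scores a candidate split by how much it reduces the spread of the responses. The following is a minimal sketch of that idea in plain Python for a single input dimension. It is not the equadratures implementation, and the names sd_reduction and best_split are illustrative only.

```python
from statistics import pstdev

def sd_reduction(y_left, y_right):
    """Standard deviation reduction for one candidate split.

    Parent spread minus the size-weighted spread of the two children.
    """
    y_all = list(y_left) + list(y_right)
    n = len(y_all)
    weighted = (len(y_left) / n) * pstdev(y_left) \
             + (len(y_right) / n) * pstdev(y_right)
    return pstdev(y_all) - weighted

def best_split(x, y):
    """Score a threshold midway between each pair of distinct sorted x values.

    Returns (threshold, score) for the split with the largest reduction.
    """
    pairs = sorted(zip(x, y))
    best = (None, float("-inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical x values
        threshold = 0.5 * (pairs[i - 1][0] + pairs[i][0])
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        score = sd_reduction(left, right)
        if score > best[1]:
            best = (threshold, score)
    return best
```

A model-aware criterion would instead fit a polynomial on each side of every candidate split and compare the resulting losses, which costs more per candidate but accounts for the model placed in each leaf.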

Example

>>> import numpy as np
>>> from equadratures import polytree
>>> tree = polytree.PolyTree()
>>> X = np.loadtxt('inputs.txt')
>>> X_test = np.loadtxt('inputs_test.txt')
>>> y = np.loadtxt('outputs.txt')
>>> tree.fit(X, y)
>>> y_test = tree.predict(X_test)

References

  1. Wang, Y., Witten, I. H., (1997) Inducing Model Trees for Continuous Classes. In Proc. of the 9th European Conf. on Machine Learning Poster Papers. 128-137.

  2. Broelemann, K., Kasneci, G., (2019) A Gradient-Based Split Criterion for Highly Accurate and Transparent Model Trees. In Int. Joint Conf. on Artificial Intelligence (IJCAI). 2030-2037.

  3. Chan, T. F., Golub, G. H., LeVeque, R. J., (1983) Algorithms for computing the sample variance: Analysis and recommendations. The American Statistician. 37(3): 242–247.

apply(X)[source]

Returns the leaf node index for each observation in the data.

Parameters

X (numpy.ndarray) – Array with shape (number_of_observations, dimensions) at which the tree fit must be evaluated.

Returns

A numpy.ndarray of shape (number_of_observations,1) corresponding to the node indices for each observation in X.

Return type

numpy.ndarray
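
Conceptually, finding the leaf for an observation means following the splits from the root until a node with no children is reached. The sketch below illustrates this on a hand-built tree. The node layout (keys index, dim, threshold, children) and the convention that values less than or equal to the threshold go left are assumptions made for this example, and may differ from the dictionaries returned by get_node.

```python
def leaf_index(node, x):
    """Follow splits from the root until a leaf is reached; return its index."""
    while node["children"] is not None:
        left, right = node["children"]
        # Assumed convention: values <= threshold are routed to the left child.
        node = left if x[node["dim"]] <= node["threshold"] else right
    return node["index"]

def apply_tree(root, X):
    """Return the leaf index for every observation (row) in X."""
    return [leaf_index(root, x) for x in X]

# Hypothetical tree: split on dimension 0 at 0.0, then on dimension 1 at 0.5.
leaf1 = {"index": 1, "children": None}
leaf3 = {"index": 3, "children": None}
leaf4 = {"index": 4, "children": None}
node2 = {"index": 2, "dim": 1, "threshold": 0.5, "children": (leaf3, leaf4)}
root = {"index": 0, "dim": 0, "threshold": 0.0, "children": (leaf1, node2)}

leaves = apply_tree(root, [[-1.0, 0.2], [1.0, 0.2], [1.0, 0.9]])
```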

fit(X, y)[source]

Fits the PolyTree to the provided data.

Parameters
  • X (numpy.ndarray) – Training input data.

  • y (numpy.ndarray) – Training output data.

get_graphviz(X=None, feature_names=None, file_name=None)[source]

Generates a graphviz visualisation of the PolyTree.

Parameters
  • X (numpy.ndarray, optional) – An ndarray with shape (dimensions) containing an input vector for a given sample, to highlight in the tree.

  • feature_names (list, optional) – A list of the names of the features used in the training data.

  • file_name (str, optional) – File name to write the graphviz data to. If None (default), the graph is rendered in place; if 'source', the raw graphviz string is returned.

get_leaves()[source]

Returns the node indices for all leaf nodes.

Returns

Contains the node indices of all leaf nodes.

Return type

list

get_mean_and_variance()[source]

Computes the mean and variance of the polynomial tree model.

Returns

Tuple (mean,variance) containing two floats; the approximated mean and variance from the fitted PolyTree.

Return type

tuple

get_node(inode)[source]

Returns the node corresponding to a given node number.

Parameters

inode (int) – The node number.

Returns

Dictionary containing the data for the requested node.

Return type

dict

get_paths(X=None)[source]

Returns the tree paths for the leaf nodes in the tree.

Parameters

X (numpy.ndarray, optional) – Array with shape (number_of_observations, dimensions) to apply the tree to. If given, paths will only be returned for leaves which contain observations.

Returns

Dictionary containing a dict for each leaf node. Indexed by the node indices for the leaf nodes.

Return type

dict

get_polys()[source]

Returns all of the polynomials fitted at each node in the tree.

Returns

A list of Poly objects.

Return type

list

get_splits()[source]

Returns all of the data splits made.

Returns

A list of splits made in the format of a nested list: [[split, dimension], …]

Return type

list
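
Because each entry in the returned list pairs a split value with the dimension it applies to, it is often convenient to regroup the splits by dimension. The sketch below does this for a hand-written list in the documented [[split, dimension], …] format; the values in example_splits are made up for illustration, and thresholds_by_dimension is a hypothetical helper rather than part of the library.

```python
from collections import defaultdict

def thresholds_by_dimension(splits):
    """Group split values by dimension, sorted within each dimension."""
    grouped = defaultdict(list)
    for split, dim in splits:
        grouped[dim].append(split)
    return {dim: sorted(values) for dim, values in grouped.items()}

# Made-up splits in the [[split, dimension], ...] format.
example_splits = [[0.5, 0], [-0.2, 1], [0.1, 0]]
by_dim = thresholds_by_dimension(example_splits)
```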

plot_decision_surface(ij, ax=None, X=None, y=None, max_depth=None, label=True, color='data', colorbar=True, show=True, kwargs={})[source]

Plots the decision boundaries of the PolyTree over a 2D surface. See plot_decision_surface() for full description.

predict(X)[source]

Evaluates the polynomial tree approximation of the data.

Parameters

X (numpy.ndarray) – An ndarray with shape (number_of_observations, dimensions) at which the tree fit must be evaluated.

Returns

Array with shape (1, number_of_observations) corresponding to the polynomial approximations of the tree.

Return type

numpy.ndarray

prune(X, y, tol=0.0, percent=False)[source]

Prunes the fitted tree.

Parameters
  • X (numpy.ndarray) – Training input data

  • y (numpy.ndarray) – Training output data

  • tol (float, optional) – Pruning tolerance. A split is pruned if it improves the loss by less than this tolerance. Whether the value is read as an absolute amount or as a percentage is set by percent.

  • percent (bool, optional) – If true, tol is taken as a percentage of the parent node’s error. Otherwise, tol is taken to be an absolute value.
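
The interaction between tol and percent can be sketched as a single decision rule. This is an illustration of the description above rather than the library's code: should_prune is a hypothetical helper, the errors are assumed to be losses evaluated on the data passed to prune, and with percent=True the tolerance is assumed to be given in percentage points (tol=5.0 meaning 5% of the parent node's error).

```python
def should_prune(parent_error, children_error, tol=0.0, percent=False):
    """Decide whether a split improves the loss by less than the tolerance."""
    improvement = parent_error - children_error
    # Assumed units: with percent=True, tol is in percentage points.
    threshold = tol / 100.0 * parent_error if percent else tol
    return improvement < threshold
```

Under these assumptions, the default tol=0.0 removes only splits that make the loss worse, while larger tolerances prune more aggressively.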