Polynomial Regression Trees

class equadratures.polytree.PolyTree(splitting_criterion='model_aware', max_depth=5, min_samples_leaf=None, order=1, basis='total-order', search='exhaustive', samples=50, verbose=False, poly_method='least-squares', poly_solver_args={}, all_data=False, split_dims=None, k=0.05, distribution='uniform')

Definition of a polynomial tree object.

Parameters

splitting_criterion (str, optional) – The type of splitting_criterion to use in the fit function. Options include model_aware which fits polynomials for each candidate split, model_agnostic which uses a standard deviation based model-agnostic split criterion [1], and loss_gradient which uses a gradient based splitting criterion similar to that in [2].
max_depth (int, optional) – The maximum depth which the tree will grow to.
min_samples_leaf (int, optional) – The minimum number of samples per leaf node.
order (int, optional) – The order of the generated orthogonal polynomials.
basis (str, optional) – The type of index set used for the basis. Options include: univariate, total-order, tensor-grid, sparse-grid and hyperbolic-basis.
search (str, optional) – The method of search to be used. Options are grid or exhaustive.
samples (int, optional) – The interval between splits if grid search is chosen.
verbose (bool, optional) – If True, prints debugging information.
all_data (bool, optional) – Store data at all nodes in PolyTree (instead of only leaf nodes).
split_dims (list, optional) – List of dimensions along which to make splits.
k (float, optional) – The smoothing parameter. Range from 0.0 to 1.0, with 0 giving no smoothing, and 1 giving maximum smoothing.
distribution (str, optional) – The type of input parameter distributions. Either uniform or data.
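
To make the splitting_criterion, search, samples and min_samples_leaf options more concrete, here is a minimal pure-Python sketch of a standard-deviation-reduction split search along one dimension, in the spirit of the model-agnostic criterion of [1]. It is not the library's implementation. The function names are hypothetical, and reading samples as "keep every samples-th candidate" in grid search is an assumption.

```python
import statistics


def sd_reduction(y_parent, y_left, y_right):
    """Standard deviation reduction achieved by one candidate split."""
    n = len(y_parent)
    weighted_children = (len(y_left) * statistics.pstdev(y_left)
                         + len(y_right) * statistics.pstdev(y_right)) / n
    return statistics.pstdev(y_parent) - weighted_children


def candidate_thresholds(x, search="exhaustive", samples=50):
    """Midpoints between sorted distinct values.

    'exhaustive' keeps all of them; 'grid' keeps every samples-th one
    (an assumed reading of the samples parameter).
    """
    values = sorted(set(x))
    midpoints = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return midpoints if search == "exhaustive" else midpoints[::samples]


def best_split(x, y, search="exhaustive", samples=50, min_samples_leaf=1):
    """Return (threshold, reduction) for the best split, or None if no split is allowed."""
    best = None
    for t in candidate_thresholds(x, search, samples):
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        # Reject splits that would leave too few samples in a leaf.
        if min(len(left), len(right)) < min_samples_leaf:
            continue
        gain = sd_reduction(y, left, right)
        if best is None or gain > best[1]:
            best = (t, gain)
    return best


x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 0, 10, 10, 10]
threshold, gain = best_split(x, y)
```

In this toy data the outputs jump between x = 3 and x = 4, so the search picks the midpoint between those two values. A model_aware criterion would instead fit a polynomial on each side of every candidate and compare the resulting losses, which is more expensive per candidate.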

Example

>>> import numpy as np
>>> from equadratures import polytree
>>> tree = polytree.PolyTree()
>>> X = np.loadtxt('inputs.txt')
>>> X_test = np.loadtxt('inputs_test.txt')
>>> y = np.loadtxt('outputs.txt')
>>> tree.fit(X, y)
>>> y_test = tree.predict(X_test)

References

[1] Wang, Y., Witten, I. H., (1997) Inducing Model Trees for Continuous Classes. In Proc. of the 9th European Conf. on Machine Learning Poster Papers. 128-137.
[2] Broelemann, K., Kasneci, G., (2019) A Gradient-Based Split Criterion for Highly Accurate and Transparent Model Trees. In Int. Joint Conf. on Artificial Intelligence (IJCAI). 2030-2037.
[3] Chan, T. F., Golub, G. H., LeVeque, R. J., (1983) Algorithms for computing the sample variance: Analysis and recommendations. The American Statistician. 37(3): 242–247.

apply(X)

Returns the leaf node index for each observation in the data.

Parameters: X (numpy.ndarray) – Array with shape (number_of_observations, dimensions) at which the tree fit must be evaluated.
Returns: A numpy.ndarray of shape (number_of_observations,1) corresponding to the node indices for each observation in X.
Return type: numpy.ndarray
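
As an illustration of what "leaf node index for each observation" means, the following sketch routes observations through a small hand-built tree. The node layout (keys dim, threshold, left, right) is hypothetical and chosen only for this example. It is not the dictionary format returned by get_node().

```python
# Hypothetical node layout for illustration only; not the library's internal format.
toy_tree = {
    0: {"dim": 0, "threshold": 0.5, "left": 1, "right": 2},
    1: {},  # leaf
    2: {"dim": 1, "threshold": 0.0, "left": 3, "right": 4},
    3: {},  # leaf
    4: {},  # leaf
}


def apply_tree(tree, X):
    """Return the index of the leaf that each row of X falls into."""
    indices = []
    for row in X:
        node = 0  # start at the root
        while "left" in tree[node]:  # descend until a leaf is reached
            split = tree[node]
            go_left = row[split["dim"]] <= split["threshold"]
            node = split["left"] if go_left else split["right"]
        indices.append(node)
    return indices


X_demo = [[0.2, 1.0], [0.9, -1.0], [0.9, 2.0]]
leaf_ids = apply_tree(toy_tree, X_demo)
```

Here the first row stays left of the root split and lands in node 1, while the other two rows go right and are separated by the second split into nodes 3 and 4.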

fit(X, y)

Fits the PolyTree to the provided data.

Parameters

X (numpy.ndarray) – Training input data
y (numpy.ndarray) – Training output data

get_graphviz(X=None, feature_names=None, file_name=None)

Generates a graphviz visualisation of the PolyTree.

Parameters

X (numpy.ndarray, optional) – An ndarray with shape (dimensions) containing an input vector for a given sample, to highlight in the tree.
feature_names (list, optional) – A list of the names of the features used in the training data.
file_name (str, optional) – Filename to write graphviz data to. If None (default), the graph is rendered in-place. If 'source', the raw graphviz string is returned.

get_leaves()

Returns the node indices for all leaf nodes.

Returns: List containing the node indices of all leaf nodes.
Return type: list

get_mean_and_variance()

Computes the mean and variance of the polynomial tree model.

Returns: Tuple (mean,variance) containing two floats; the approximated mean and variance from the fitted PolyTree.
Return type: tuple
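
One way to obtain a global mean and variance from per-leaf statistics is the pairwise update formula analysed by Chan et al. [3]. The sketch below shows that formula on plain lists. Whether get_mean_and_variance() works this way internally is an assumption, and the helper names are hypothetical.

```python
def summarise(values):
    """Return (count, mean, M2), where M2 is the sum of squared deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values)
    return n, mean, m2


def combine(stats_a, stats_b):
    """Merge two (count, mean, M2) summaries without revisiting the raw data."""
    n_a, mean_a, m2_a = stats_a
    n_b, mean_b, m2_b = stats_b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return n, mean, m2


leaf_a = summarise([1.0, 2.0, 3.0])
leaf_b = summarise([4.0, 5.0, 6.0, 7.0])
n_total, mean_total, m2_total = combine(leaf_a, leaf_b)
variance_total = m2_total / n_total  # population variance of all seven values
```

The combined result matches what a single pass over all seven values would give, which is the property that makes the update useful when statistics are stored per node.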

get_node(inode)

Returns the node corresponding to a given node number.

Parameters: inode (int) – The node number.
Returns: Dictionary containing the data for the requested node.
Return type: dict

get_paths(X=None)

Returns the tree paths for the leaf nodes in the tree.

Parameters: X (numpy.ndarray, optional) – Array with shape (number_of_observations, dimensions) to apply the tree to. If given, paths will only be returned for leaves which contain observations.
Returns: Dictionary containing a dict for each leaf node. Indexed by the node indices for the leaf nodes.
Return type: dict

get_polys()

Returns all of the polynomials fitted at each node in the tree.

Returns: A list of Poly objects.
Return type: list

get_splits()

Returns all of the data splits made.

Returns: A list of splits made in the format of a nested list: [[split, dimension], …]
Return type: list
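
A nested list in the [[split, dimension], …] format is easy to regroup by dimension. The values below are made up for illustration and do not come from a fitted tree.

```python
from collections import defaultdict

# Made-up example in the documented [[split, dimension], ...] format.
splits = [[0.5, 0], [0.0, 1], [0.75, 0]]

# Collect the split locations made along each input dimension.
splits_by_dim = defaultdict(list)
for split_value, dimension in splits:
    splits_by_dim[dimension].append(split_value)

splits_by_dim = dict(splits_by_dim)
```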

plot_decision_surface(ij, ax=None, X=None, y=None, max_depth=None, label=True, color='data', colorbar=True, show=True, kwargs={})

Plots the decision boundaries of the PolyTree over a 2D surface. See plot_decision_surface() for full description.

predict(X)

Evaluates the polynomial tree approximation of the data.

Parameters: X (numpy.ndarray) – An ndarray with shape (number_of_observations, dimensions) at which the tree fit must be evaluated.
Returns: Array with shape (1, number_of_observations) corresponding to the polynomial approximations of the tree.
Return type: numpy.ndarray
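
Conceptually, a prediction from a model tree is the leaf polynomial evaluated at each observation. The sketch below shows this for a one-dimensional tree with a single split and a first-order polynomial in each leaf. The function and its arguments are hypothetical and only illustrate the idea.

```python
def predict_1d(x_values, threshold, left_coeffs, right_coeffs):
    """Evaluate a one-split model tree with a first-order polynomial in each leaf."""
    predictions = []
    for x in x_values:
        # Pick the leaf the observation falls into, then evaluate c0 + c1 * x there.
        c0, c1 = left_coeffs if x <= threshold else right_coeffs
        predictions.append(c0 + c1 * x)
    return predictions


y_hat = predict_1d([0.0, 0.25, 1.0], threshold=0.5,
                   left_coeffs=(1.0, 2.0), right_coeffs=(0.0, -1.0))
```

The result is piecewise: the first two points use the left leaf's line and the last point uses the right leaf's line, so the fit can be discontinuous at the split unless smoothing is applied.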

prune(X, y, tol=0.0, percent=False)

Prunes the fitted tree.

Parameters

X (numpy.ndarray) – Training input data
y (numpy.ndarray) – Training output data
tol (float, optional) – Pruning tolerance. Nodes are pruned if they improve the loss by less than this tolerance.
percent (bool, optional) – If True, tol is taken as a percentage of the parent node’s error. Otherwise, tol is taken to be an absolute value.
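
The interaction between tol and percent can be sketched as a single decision rule. This is an illustration under stated assumptions rather than the library's code: the function name is hypothetical, and treating a percentage tol as a value on a 0 to 100 scale is an assumption.

```python
def should_prune(parent_loss, children_loss, tol=0.0, percent=False):
    """Decide whether a split improves the loss by less than the tolerance."""
    improvement = parent_loss - children_loss
    # Assumption: with percent=True, tol is on a 0-100 scale relative to the parent's error.
    threshold = tol / 100.0 * parent_loss if percent else tol
    return improvement < threshold


absolute_case = should_prune(10.0, 9.5, tol=1.0)                # improvement 0.5 vs threshold 1.0
percent_case = should_prune(10.0, 9.5, tol=1.0, percent=True)   # improvement 0.5 vs threshold 0.1
```

With the same numbers, the split is pruned under the absolute tolerance but kept under the percentage tolerance, which shows why the choice of percent matters when the loss scale is unknown in advance.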