Polynomial Regression Trees
- class equadratures.polytree.PolyTree(splitting_criterion='model_aware', max_depth=5, min_samples_leaf=None, order=1, basis='total-order', search='exhaustive', samples=50, verbose=False, poly_method='least-squares', poly_solver_args={}, all_data=False, split_dims=None, k=0.05, distribution='uniform')
Definition of a polynomial tree object.
- Parameters
splitting_criterion (str, optional) – The type of splitting criterion to use in the fit function. Options include 'model_aware', which fits polynomials for each candidate split; 'model_agnostic', which uses a standard-deviation-based, model-agnostic split criterion [1]; and 'loss_gradient', which uses a gradient-based splitting criterion similar to that in [2].
max_depth (int, optional) – The maximum depth to which the tree will grow.
min_samples_leaf (int, optional) – The minimum number of samples per leaf node.
order (int, optional) – The order of the generated orthogonal polynomials.
basis (str, optional) – The type of index set used for the basis. Options include 'univariate', 'total-order', 'tensor-grid', 'sparse-grid' and 'hyperbolic-basis'.
search (str, optional) – The search method to use. Options are 'grid' or 'exhaustive'.
samples (int, optional) – The interval between splits if 'grid' search is chosen.
verbose (bool, optional) – For debugging.
all_data (bool, optional) – Store data at all nodes in PolyTree (instead of only at leaf nodes).
split_dims (list, optional) – List of dimensions along which to make splits.
k (float, optional) – The smoothing parameter. Ranges from 0.0 to 1.0, with 0 giving no smoothing and 1 giving maximum smoothing.
distribution (str, optional) – The type of input parameter distributions. Either 'uniform' or 'data'.
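The 'model_agnostic' criterion of [1] scores a candidate split by its standard deviation reduction (SDR): the standard deviation of the node's outputs minus the size-weighted standard deviations of the two child subsets. The pure-Python sketch below pairs that score with an exhaustive search (every observed value is a candidate threshold) and a grid search (evenly spaced candidates). It is an illustration under stated assumptions, not PolyTree's implementation; the function names and the n_grid parameter are hypothetical, and n_grid is not meant as a reinterpretation of the samples argument.

```python
from statistics import pstdev


def sdr(y, left_mask):
    """Standard deviation reduction from splitting y into left/right subsets."""
    left = [v for v, m in zip(y, left_mask) if m]
    right = [v for v, m in zip(y, left_mask) if not m]
    if not left or not right:
        return float("-inf")  # a valid split must leave data on both sides
    n = len(y)
    return (pstdev(y)
            - (len(left) / n) * pstdev(left)
            - (len(right) / n) * pstdev(right))


def best_split(x, y, candidates):
    """Return (score, threshold) of the best split x <= threshold."""
    best = (float("-inf"), None)
    for t in candidates:
        score = sdr(y, [xi <= t for xi in x])
        if score > best[0]:
            best = (score, t)
    return best


def exhaustive_candidates(x):
    # Every distinct observed value is tried as a threshold.
    return sorted(set(x))


def grid_candidates(x, n_grid):
    # Hypothetical: n_grid evenly spaced thresholds strictly inside [min(x), max(x)].
    lo, hi = min(x), max(x)
    return [lo + (hi - lo) * i / (n_grid + 1) for i in range(1, n_grid + 1)]
```

For x = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9] and y = [1, 1, 1, 5, 5, 5], the exhaustive search returns the threshold 0.3 with an SDR of 2.0, because both children are then constant. The grid search evaluates fewer candidates, trading some resolution for speed.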
Example
>>> import numpy as np
>>> from equadratures import polytree
>>> tree = polytree.PolyTree()
>>> X = np.loadtxt('inputs.txt')
>>> X_test = np.loadtxt('inputs_test.txt')
>>> y = np.loadtxt('outputs.txt')
>>> tree.fit(X, y)
>>> y_test = tree.predict(X_test)
References
[1] Wang, Y., Witten, I. H. (1997). Inducing Model Trees for Continuous Classes. In Proc. of the 9th European Conf. on Machine Learning Poster Papers, 128–137.
[2] Broelemann, K., Kasneci, G. (2019). A Gradient-Based Split Criterion for Highly Accurate and Transparent Model Trees. In Int. Joint Conf. on Artificial Intelligence (IJCAI), 2030–2037.
[3] Chan, T. F., Golub, G. H., LeVeque, R. J. (1983). Algorithms for computing the sample variance: Analysis and recommendations. The American Statistician, 37(3), 242–247.
- apply(X)
Returns the leaf node index for each observation in the data.
- Parameters
X (numpy.ndarray) – Array with shape (number_of_observations, dimensions) at which the tree fit must be evaluated.
- Returns
A numpy.ndarray of shape (number_of_observations,1) corresponding to the node indices for each observation in X.
- Return type
numpy.ndarray
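To illustrate what apply computes, the sketch below routes each observation down a toy binary tree until it reaches a leaf and records that leaf's index. The nested-dict node layout (index, dim, threshold, left, right) and the rule "x[dim] <= threshold goes left" are assumptions for illustration; the sketch also returns a flat list rather than the (number_of_observations, 1) array that apply returns.

```python
def apply_tree(node, X):
    """Return the leaf node index reached by each row of X (a list of lists)."""
    out = []
    for row in X:
        n = node
        while "dim" in n:  # internal nodes carry a split dimension
            n = n["left"] if row[n["dim"]] <= n["threshold"] else n["right"]
        out.append(n["index"])
    return out


# Hypothetical depth-2 tree on two inputs; node indices assigned in preorder.
tree = {
    "index": 0, "dim": 0, "threshold": 0.5,
    "left": {"index": 1},
    "right": {
        "index": 2, "dim": 1, "threshold": 0.0,
        "left": {"index": 3},
        "right": {"index": 4},
    },
}
```

For example, apply_tree(tree, [[0.2, 9.0], [0.9, -1.0], [0.9, 1.0]]) returns [1, 3, 4].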
- fit(X, y)
Fits the PolyTree to the provided data.
- Parameters
X (numpy.ndarray) – Training input data
y (numpy.ndarray) – Training output data
- get_graphviz(X=None, feature_names=None, file_name=None)
Generates a graphviz visualisation of the PolyTree.
- Parameters
X (numpy.ndarray, optional) – An ndarray with shape (dimensions) containing an input vector for a given sample, to highlight in the tree.
feature_names (list, optional) – A list of the names of the features used in the training data.
file_name (str, optional) – Filename to write graphviz data to. If None (default), the tree is rendered in place; if 'source', the raw graphviz string is returned.
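As a rough picture of what a graphviz visualisation of a tree contains, the sketch below builds DOT source by hand for a toy tree, optionally highlighting the path taken by one input vector (mirroring the X argument). It does not use the graphviz package, and the node layout, labels and red highlighting are assumptions rather than the actual output format of get_graphviz.

```python
def to_dot(node, feature_names=None, x=None):
    """Return Graphviz DOT source for a nested-dict tree.

    If x is given, the nodes on x's root-to-leaf path are coloured red.
    """
    lines = ["digraph PolyTree {"]

    def visit(n, on_path):
        if "dim" in n:
            name = feature_names[n["dim"]] if feature_names else f"x{n['dim']}"
            label = f"{name} <= {n['threshold']}"
        else:
            label = f"leaf {n['index']}"
        style = ", color=red" if on_path else ""
        lines.append(f'  {n["index"]} [label="{label}"{style}];')
        if "dim" in n:
            goes_left = x is not None and x[n["dim"]] <= n["threshold"]
            for child, is_left in ((n["left"], True), (n["right"], False)):
                lines.append(f'  {n["index"]} -> {child["index"]};')
                # A child stays on the path only if x actually branches into it.
                visit(child, on_path and goes_left == is_left)

    visit(node, x is not None)
    lines.append("}")
    return "\n".join(lines)


# Hypothetical depth-2 tree on two inputs.
tree = {
    "index": 0, "dim": 0, "threshold": 0.5,
    "left": {"index": 1},
    "right": {
        "index": 2, "dim": 1, "threshold": 0.0,
        "left": {"index": 3},
        "right": {"index": 4},
    },
}
```

The returned string can be saved to a .dot file and rendered with the Graphviz dot command-line tool.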
- get_leaves()
Returns the node indices for all leaf nodes.
- Returns
Contains the node indices of all leaf nodes.
- Return type
- get_mean_and_variance()
Computes the mean and variance of the polynomial tree model.
- Returns
A tuple (mean, variance) containing two floats: the approximate mean and variance of the fitted PolyTree.
- Return type
tuple
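One way to obtain the mean and variance of a piecewise polynomial model is the law of total variance over its leaves: the mean is the probability-weighted sum of leaf means, and the variance is the weighted sum of leaf second moments minus the squared mean. The sketch below assumes a single input uniformly distributed on [0, 1] and a linear polynomial a + b·x in each leaf; both are simplifications for illustration, and this is not necessarily how PolyTree computes these quantities.

```python
def leaf_moments(l, u, a, b):
    """Mean and variance of a + b*x for x uniform on [l, u]."""
    return a + b * (l + u) / 2.0, b * b * (u - l) ** 2 / 12.0


def tree_mean_and_variance(leaves):
    """Combine per-leaf moments with the law of total variance.

    leaves: list of (l, u, a, b); the intervals are assumed to partition [0, 1],
    so each leaf's probability under a uniform input is its width u - l.
    """
    mean = 0.0
    second_moment = 0.0  # accumulates E[f(x)^2]
    for l, u, a, b in leaves:
        p = u - l
        mu, var = leaf_moments(l, u, a, b)
        mean += p * mu
        second_moment += p * (var + mu * mu)
    return mean, second_moment - mean * mean
```

For f(x) = 2x on [0, 0.5] and f(x) = 1 on (0.5, 1], tree_mean_and_variance([(0.0, 0.5, 0.0, 2.0), (0.5, 1.0, 1.0, 0.0)]) gives a mean of 0.75 and a variance of 5/48 (about 0.104), matching direct integration.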
- get_paths(X=None)
Returns the tree paths for the leaf nodes in the tree.
- Parameters
X (numpy.ndarray, optional) – Array with shape (number_of_observations, dimensions) to apply the tree to. If given, paths will only be returned for leaves which contain observations.
- Returns
A dictionary containing a dict for each leaf node, keyed by the leaf node indices.
- Return type
dict
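The idea behind get_paths can be sketched as a depth-first walk that records the sequence of split conditions leading to each leaf, optionally keeping only the leaves that some row of X reaches. The toy node layout and the inner {"path": [...]} format are hypothetical; the contents of the per-leaf dicts that get_paths actually returns are not specified here.

```python
def leaf_paths(node, X=None):
    """Map each leaf index to {"path": [(dim, threshold, "<=" or ">"), ...]}.

    If X is given, only leaves reached by at least one row of X are returned.
    """
    reached = None
    if X is not None:
        reached = set()
        for row in X:
            n = node
            while "dim" in n:
                n = n["left"] if row[n["dim"]] <= n["threshold"] else n["right"]
            reached.add(n["index"])

    paths = {}

    def walk(n, conds):
        if "dim" not in n:
            if reached is None or n["index"] in reached:
                paths[n["index"]] = {"path": conds}
            return
        walk(n["left"], conds + [(n["dim"], n["threshold"], "<=")])
        walk(n["right"], conds + [(n["dim"], n["threshold"], ">")])

    walk(node, [])
    return paths


# Hypothetical depth-2 tree on two inputs.
tree = {
    "index": 0, "dim": 0, "threshold": 0.5,
    "left": {"index": 1},
    "right": {
        "index": 2, "dim": 1, "threshold": 0.0,
        "left": {"index": 3},
        "right": {"index": 4},
    },
}
```

Here leaf 3 is reached by x0 > 0.5 and then x1 <= 0.0, so its path is [(0, 0.5, ">"), (1, 0.0, "<=")].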
- get_polys()
Returns all of the polynomials fitted at each node in the tree.
- Returns
A list of Poly objects.
- Return type
list
- get_splits()
Returns all of the data splits made.
- Returns
A list of splits made in the format of a nested list: [[split, dimension], …]
- Return type
list
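To make the [[split, dimension], …] format concrete, the sketch below collects one [threshold, dimension] pair per internal node of a toy tree in preorder. Reading "split" as the threshold value, and the preorder ordering, are assumptions; the real method may order or represent the splits differently.

```python
def tree_splits(node):
    """Collect [threshold, dimension] for every internal node, in preorder."""
    if "dim" not in node:
        return []
    return ([[node["threshold"], node["dim"]]]
            + tree_splits(node["left"])
            + tree_splits(node["right"]))


# Hypothetical tree: split x0 at 0.5, then split the right half on x1 at 0.0.
tree = {
    "dim": 0, "threshold": 0.5,
    "left": {},
    "right": {"dim": 1, "threshold": 0.0, "left": {}, "right": {}},
}
```

For this tree, tree_splits(tree) returns [[0.5, 0], [0.0, 1]].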
- plot_decision_surface(ij, ax=None, X=None, y=None, max_depth=None, label=True, color='data', colorbar=True, show=True, kwargs={})
Plots the decision boundaries of the PolyTree over a 2D surface. See plot_decision_surface() for a full description.
- predict(X)
Evaluates the polynomial tree approximation of the data.
- Parameters
X (numpy.ndarray) – An ndarray with shape (number_of_observations, dimensions) at which the tree fit must be evaluated.
- Returns
Array with shape (1, number_of_observations) corresponding to the polynomial approximations of the tree.
- Return type
numpy.ndarray
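Conceptually, a polynomial tree prediction routes each observation to its leaf and evaluates that leaf's polynomial. The sketch below assumes order-1 (linear) leaf polynomials stored as hypothetical coefficient lists [intercept, slope_0, slope_1, …]; PolyTree itself stores fitted Poly objects, and this sketch returns a flat list rather than a (1, number_of_observations) array.

```python
def predict_tree(node, X):
    """Route each row of X to its leaf and evaluate that leaf's linear model."""
    out = []
    for row in X:
        n = node
        while "dim" in n:
            n = n["left"] if row[n["dim"]] <= n["threshold"] else n["right"]
        c0, *c = n["coeffs"]  # intercept, then one slope per input dimension
        out.append(c0 + sum(cj * xj for cj, xj in zip(c, row)))
    return out


# Hypothetical two-leaf tree on inputs (x0, x1).
tree = {
    "dim": 0, "threshold": 0.5,
    "left": {"coeffs": [0.0, 2.0, 0.0]},   # y = 2*x0      where x0 <= 0.5
    "right": {"coeffs": [1.0, 0.0, 3.0]},  # y = 1 + 3*x1  where x0 > 0.5
}
```

For example, predict_tree(tree, [[0.25, 9.0], [0.75, 2.0]]) returns [0.5, 7.0]: the first row uses the left leaf's model and the second the right leaf's.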
- prune(X, y, tol=0.0, percent=False)
Prunes the tree that you have fitted.
- Parameters
X (numpy.ndarray) – Training input data
y (numpy.ndarray) – Training output data
tol (float, optional) – Pruning tolerance. Nodes are pruned if they improve the loss by less than this tolerance.
percent (bool, optional) – If True, tol is taken as a percentage of the parent node's error. Otherwise, tol is taken to be an absolute value.
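The tolerance rule can be sketched as a bottom-up pass: after pruning both children, a split is collapsed back to its parent's own model when the parent's loss minus the total loss of the leaves below it is less than the threshold, which is either tol itself or tol percent of the parent's loss. This sketch assumes each node already carries a hypothetical "loss" field holding an additive training error (for example, a sum of squared residuals); the real prune method recomputes errors from X and y instead.

```python
def subtree_loss(n):
    """Sum of the losses of the leaves at or below node n."""
    if "left" not in n:
        return n["loss"]
    return subtree_loss(n["left"]) + subtree_loss(n["right"])


def prune_tree(n, tol=0.0, percent=False):
    """Bottom-up: collapse splits whose loss improvement is below the threshold.

    Each node's "loss" is assumed to be the additive training error of that
    node's own polynomial. Note that the tree is modified in place.
    """
    if "left" not in n:
        return n
    n["left"] = prune_tree(n["left"], tol, percent)
    n["right"] = prune_tree(n["right"], tol, percent)
    threshold = tol / 100.0 * n["loss"] if percent else tol
    if n["loss"] - subtree_loss(n) < threshold:
        return {"loss": n["loss"]}  # keep only this node's own model
    return n
```

With the default tol=0.0, a split is collapsed only if it makes the loss worse. For a root with loss 10.0 whose right child (loss 5.0) splits into two leaves of loss 2.4 each, an absolute tol of 0.5 removes only that right split (it gains just 0.2), whereas percent=True with tol=50.0 collapses the whole tree to the root.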