Active subspaces
Active subspaces represent a powerful set of ideas for output-based, data-driven dimension reduction. In a nutshell, the idea is to identify a low-dimensional subspace of the inputs along which the data (or the data-generating function) can be well approximated. While this bears a notional similarity to principal component analysis (PCA), the two are very different: PCA is driven solely by the distribution of the inputs, whereas active subspaces explicitly use the gradient (or an approximation thereof) of the output with respect to the inputs.
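To make the contrast with PCA concrete, here is a minimal NumPy sketch (independent of the dataset below) on a toy ridge function $f(x) = (w^{T}x)^2$, whose output varies only along one direction $w$. The dimensions, seed, and analytic gradient are illustrative assumptions. Because the inputs are isotropic, the input covariance offers PCA no preferred direction, yet the average outer product of the gradients recovers $w$:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 2000
w = np.zeros(d)
w[0] = 1.0                                   # true active direction
X = rng.standard_normal((n, d))              # isotropic inputs
y = (X @ w) ** 2                             # ridge function f(x) = (w.x)^2

# PCA sees only the inputs: their covariance is close to the identity,
# so no direction stands out
input_cov = np.cov(X, rowvar=False)

# Active-subspace estimate: average outer products of the gradient;
# grad f(x) = 2 (w.x) w is known analytically for this toy problem
grads = 2.0 * (X @ w)[:, None] * w[None, :]  # one gradient sample per row
C = grads.T @ grads / n
eigvals, eigvecs = np.linalg.eigh(C)
w_hat = eigvecs[:, -1]                       # dominant eigenvector

print(abs(w_hat @ w))                        # close to 1: direction recovered
```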
import plotly.express as px
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import equadratures as eq
data = eq.datasets.load_eq_dataset('3Dfan_blades',verbose=False) # Download the data
X = data['X_a']
y = data['y1_a']
Downloading the 3Dfan_blades dataset from github...
We now feed this data to equadratures' Subspaces class. Under the hood, the code fits a polynomial model to the data (via least squares) and then uses the polynomial's gradients to compute the active subspace via the approximation
$$ C = \int \nabla_{x} p(x) \, \nabla_{x} p(x)^{T} \, \rho(x) \, dx $$
whereupon eigenvalues $\Lambda$ and eigenvectors $W$ can be computed via an eigendecomposition
$$ C = W \Lambda W^{T} $$
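In practice the integral is approximated by a Monte Carlo average over gradient samples $g_{i} = \nabla_{x} p(x_{i})$, and $W$, $\Lambda$ follow from a symmetric eigendecomposition. A minimal NumPy sketch of this step (the gradient samples here are random placeholders, not gradients of a fitted polynomial):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 500, 7
G = rng.standard_normal((n, d))          # stand-in gradient samples, one row per x_i

C = G.T @ G / n                          # C ~ (1/n) sum of grad p(x_i) grad p(x_i)^T
eigvals, W = np.linalg.eigh(C)           # C = W Lambda W^T (eigh returns ascending order)
eigvals, W = eigvals[::-1], W[:, ::-1]   # reorder so the largest eigenvalue comes first
```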
# Active subspace computation
sub = eq.Subspaces(sample_points=X, sample_outputs=y,
                   method='active-subspaces')
W = sub.get_subspace()
We define the active subspace by the eigenvectors of $C$ associated with its largest eigenvalues; the remaining directions, along which the output barely varies, are discarded.
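How many eigenvectors to keep is typically decided by looking for a large gap in the (log) eigenvalue spectrum, which is exactly what the eigenvalue plot below helps reveal. A small helper sketch of that heuristic (the spectrum here is illustrative, not from the dataset):

```python
import numpy as np

def subspace_dimension(eigvals):
    """Pick the dimension k at the largest gap in the log-eigenvalue spectrum."""
    log_ev = np.log10(np.abs(eigvals))
    gaps = log_ev[:-1] - log_ev[1:]      # drop between consecutive eigenvalues
    return int(np.argmax(gaps)) + 1      # keep everything before the biggest gap

# Illustrative spectrum: two dominant eigenvalues, then a sharp drop
spectrum = np.array([5.2, 1.8, 3e-4, 1e-4, 4e-5])
print(subspace_dimension(spectrum))      # → 2
```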
# Plotting
df = pd.DataFrame({'Eigenvalues (log)': np.log10(np.abs(sub.get_eigenvalues())),
'Parameter #': np.arange(X.shape[1])})
df2 = pd.DataFrame({'Active-1': X @ W[:,0],
'Active-2': X @ W[:,1],
'Efficiency': y})
fig = px.scatter(df, x="Parameter #", y="Eigenvalues (log)", color="Eigenvalues (log)")
fig.update_layout(margin=dict(l=0, r=0, t=0, b=0))
fig.show()
fig = px.scatter_3d(df2, x='Active-1', y='Active-2', z='Efficiency',
color='Efficiency')
fig.update_layout(margin=dict(l=0, r=0, t=0, b=0))
fig.show()