Utility Classes

Modified Gram Matrix \(\mathbf{\tilde{K}}\)

skmatter.utils._pcovr_utils.pcovr_kernel(mixing, X, Y, **kernel_params)

Creates the PCovR modified kernel (Gram) matrix:

\[\mathbf{\tilde{K}} = \alpha \mathbf{K} + (1 - \alpha) \mathbf{Y}\mathbf{Y}^T\]

The default kernel is the linear kernel, such that:

\[\mathbf{\tilde{K}} = \alpha \mathbf{X} \mathbf{X}^T + (1 - \alpha) \mathbf{Y}\mathbf{Y}^T\]

Parameters:

mixing (float) – mixing parameter, as described in PCovR as \({\alpha}\)
X (ndarray of shape (n x m)) – Data matrix \(\mathbf{X}\)
Y (ndarray of shape (n x p)) – Array to include in biased selection when mixing < 1
kernel_params (dict, optional) – dictionary of arguments to pass to pairwise_kernels. If none are specified, the kernel is assumed to be linear
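For the linear-kernel case, the formula above can be sketched directly in NumPy. This is only an illustration of the equation, not the library's implementation; the helper name `modified_gram_linear` and the random test data are assumptions made for the example.

```python
import numpy as np


def modified_gram_linear(mixing, X, Y):
    """Hypothetical helper: alpha * X X^T + (1 - alpha) * Y Y^T."""
    K = X @ X.T        # linear kernel on the data, shape (n, n)
    K_y = Y @ Y.T      # linear kernel on the targets, shape (n, n)
    return mixing * K + (1.0 - mixing) * K_y


# Made-up data: n = 6 samples, m = 3 features, p = 2 properties
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
Y = rng.normal(size=(6, 2))

K_tilde = modified_gram_linear(0.5, X, Y)
```

With `mixing=1` the result reduces to the plain Gram matrix of `X`, and with `mixing=0` it depends on `Y` alone.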

Modified Covariance Matrix \(\mathbf{\tilde{C}}\)

skmatter.utils._pcovr_utils.pcovr_covariance(mixing, X, Y, rcond=1e-12, return_isqrt=False, rank=None, random_state=0, iterated_power='auto')

Creates the PCovR modified covariance:

\[\mathbf{\tilde{C}} = \alpha \mathbf{X}^T \mathbf{X} + (1 - \alpha) \left(\left(\mathbf{X}^T \mathbf{X}\right)^{-\frac{1}{2}} \mathbf{X}^T \mathbf{\hat{Y}}\mathbf{\hat{Y}}^T \mathbf{X} \left(\mathbf{X}^T \mathbf{X}\right)^{-\frac{1}{2}}\right)\]

where \(\mathbf{\hat{Y}}\) are the properties obtained by linear regression.

Parameters:

mixing (float) – mixing parameter, as described in PCovR as \({\alpha}\)
X (ndarray of shape (n x m)) – Data matrix \(\mathbf{X}\)
Y (ndarray of shape (n x p)) – Array to include in biased selection when mixing < 1
rcond (float, default=1E-12) – threshold below which eigenvalues will be considered 0
return_isqrt (bool, default=False) – whether to return the calculated inverse square root of the covariance. Used when the inverse square root is needed and the pcovr_covariance has already been calculated
rank (int, default=min(X.shape)) – number of eigenpairs to estimate the inverse square root with
random_state (int, default=0) – random seed to use for randomized SVD
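The modified covariance can be sketched in NumPy as follows. This is a minimal illustration of the equation under stated assumptions: it uses a full eigendecomposition rather than a randomized SVD, treats `rcond` as an absolute eigenvalue cutoff, and obtains \(\mathbf{\hat{Y}}\) by ordinary least squares. The helper name `modified_covariance` is hypothetical.

```python
import numpy as np


def modified_covariance(mixing, X, Y, rcond=1e-12):
    """Hypothetical helper illustrating the modified covariance formula."""
    C = X.T @ X                                   # (m, m) covariance

    # Inverse square root of C from its eigendecomposition,
    # discarding eigenvalues at or below the cutoff (assumed absolute)
    w, V = np.linalg.eigh(C)
    keep = w > rcond
    C_isqrt = (V[:, keep] / np.sqrt(w[keep])) @ V[:, keep].T

    # Y_hat: properties obtained by linear regression of Y on X
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    Y_hat = X @ W

    # C^{-1/2} X^T Y_hat, whose outer product gives the second term
    C_y = C_isqrt @ X.T @ Y_hat
    return mixing * C + (1.0 - mixing) * (C_y @ C_y.T)


# Made-up data: n = 6 samples, m = 3 features, p = 2 properties
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
Y = rng.normal(size=(6, 2))

C_tilde = modified_covariance(0.5, X, Y)
```

The result has shape (m x m), in contrast to the (n x n) modified Gram matrix.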

Orthogonalizers for CUR

When computing non-iterative CUR, it is necessary to orthogonalize the input matrices after each selection. For this, we supply a feature orthogonalizer and a sample orthogonalizer, for feature and sample selection respectively.

skmatter.utils._orthogonalizers.X_orthogonalizer(x1, c=None, x2=None, tol=1e-12, copy=False)

Orthogonalizes a feature matrix by the given columns. Can be used to orthogonalize by samples by calling X = X_orthogonalizer(X.T, row_index).T. After orthogonalization, each column of X will contain only what is orthogonal to X[:, c] or x2.

Parameters:

x1 (matrix of shape (n x m)) – feature matrix to orthogonalize
c (int, less than m, default=None) – index of the column to orthogonalize by
x2 (matrix of shape (n x a), default=x1[:, c]) – a separate set of columns to orthogonalize with respect to. Note: the orthogonalizer will work column-by-column in column-index order.
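The idea of orthogonalizing by a single column can be sketched as a standard projection. This is an illustration under assumptions, not the library's code: the helper `orthogonalize_by_column` is hypothetical, it handles only the single-index case (no `x2`), and its use of `tol` as a cutoff on the squared column norm is a choice made for the example.

```python
import numpy as np


def orthogonalize_by_column(X, c, tol=1e-12):
    """Hypothetical helper: remove from every column of X its
    projection onto column c, returning a new array."""
    x_c = X[:, c]
    norm_sq = x_c @ x_c
    if norm_sq < tol:
        # Column is numerically zero, so there is nothing to project out
        return X.copy()
    # X - x_c (x_c^T X) / (x_c^T x_c)
    return X - np.outer(x_c, x_c @ X) / norm_sq


# Made-up data: n = 6 samples, m = 4 features
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))

X_orth = orthogonalize_by_column(X, c=1)
```

After the call, column 1 of `X_orth` is numerically zero and every other column is orthogonal to the original `X[:, 1]`.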

skmatter.utils._orthogonalizers.Y_feature_orthogonalizer(y, X, tol=1e-12, copy=True)

Orthogonalizes a property matrix given the selected features in \(\mathbf{X}\):

\[\mathbf{Y} \leftarrow \mathbf{Y} - \mathbf{X} \left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T \mathbf{Y}\]

Parameters:

y (ndarray of shape (n_samples x n_properties)) – property matrix
X (ndarray of shape (n_samples x n_features)) – feature matrix
tol (float) – cutoff for small eigenvalues to send to np.linalg.pinv
copy (bool) – whether to return a copy of y or edit in-place, default=True
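The update above can be sketched in NumPy, with the inverse replaced by a pseudo-inverse as the `tol` description suggests. The helper name `orthogonalize_targets` is hypothetical, and always returning a new array (rather than honoring a `copy` flag) is a simplification made for the example.

```python
import numpy as np


def orthogonalize_targets(Y, X, tol=1e-12):
    """Hypothetical helper: Y - X (X^T X)^+ X^T Y."""
    XtX_pinv = np.linalg.pinv(X.T @ X, rcond=tol)
    return Y - X @ (XtX_pinv @ (X.T @ Y))


# Made-up data: 6 samples, 3 selected features, 2 properties
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
Y = rng.normal(size=(6, 2))

Y_orth = orthogonalize_targets(Y, X)
```

What remains in `Y_orth` is the part of `Y` that a linear model on the selected features cannot explain, so `X.T @ Y_orth` is numerically zero.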

skmatter.utils._orthogonalizers.Y_sample_orthogonalizer(y, X, y_ref, X_ref, tol=1e-12, copy=True)

Orthogonalizes a matrix of targets \({\mathbf{Y}}\) given a reference feature matrix \({\mathbf{X}_r}\) and reference target matrix \({\mathbf{Y}_r}\):

\[\mathbf{Y} \leftarrow \mathbf{Y} - \mathbf{X} \left(\mathbf{X}_{\mathbf{r}}^T \mathbf{X}_{\mathbf{r}}\right)^{-1}\mathbf{X}_{\mathbf{r}}^T \mathbf{Y}_{\mathbf{r}}\]

Parameters:

y (ndarray of shape (n_samples x n_properties)) – property matrix
X (ndarray of shape (n_samples x n_features)) – feature matrix
y_ref (ndarray of shape (n_ref x n_properties)) – reference property matrix
X_ref (ndarray of shape (n_ref x n_features)) – reference feature matrix
tol (float) – cutoff for small eigenvalues to send to np.linalg.pinv
copy (bool) – whether to return a copy of y or edit in-place, default=True
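The reference-based update can be sketched the same way: regression weights are fitted on the reference set and the resulting prediction is subtracted from the targets. The helper name `orthogonalize_targets_by_reference` is hypothetical, and the sketch always returns a new array.

```python
import numpy as np


def orthogonalize_targets_by_reference(Y, X, Y_ref, X_ref, tol=1e-12):
    """Hypothetical helper: Y - X (X_r^T X_r)^+ X_r^T Y_r."""
    # Regression weights fitted on the reference samples only
    W_ref = np.linalg.pinv(X_ref.T @ X_ref, rcond=tol) @ (X_ref.T @ Y_ref)
    # Subtract the reference model's prediction from the targets
    return Y - X @ W_ref


# Made-up data: 4 reference samples, 8 samples to update
rng = np.random.default_rng(0)
X_ref = rng.normal(size=(4, 3))
Y_ref = rng.normal(size=(4, 2))
X = rng.normal(size=(8, 3))
Y = rng.normal(size=(8, 2))

Y_orth = orthogonalize_targets_by_reference(Y, X, Y_ref, X_ref)
```

When the reference set is passed as both the data and the reference, this reduces to the feature-based residual formula, with `X_ref` playing the role of \(\mathbf{X}\).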
