Utility Classes

Modified Gram Matrix \(\mathbf{\tilde{K}}\)

skmatter.utils._pcovr_utils.pcovr_kernel(mixing, X, Y, **kernel_params)

Computes the PCovR modified kernel (Gram) matrix:

\[\mathbf{\tilde{K}} = \alpha \mathbf{K} + (1 - \alpha) \mathbf{Y}\mathbf{Y}^T\]

The default kernel is the linear kernel, such that:

\[\mathbf{\tilde{K}} = \alpha \mathbf{X} \mathbf{X}^T + (1 - \alpha) \mathbf{Y}\mathbf{Y}^T\]
Parameters:
  • mixing (float) – mixing parameter, as described in PCovR as \({\alpha}\)

  • X (ndarray of shape (n x m)) – Data matrix \(\mathbf{X}\)

  • Y (ndarray of shape (n x p)) – Property matrix \(\mathbf{Y}\), which contributes to the result when mixing < 1

  • kernel_params (dict, optional) – dictionary of arguments to pass to pairwise_kernels; if none are specified, the kernel is assumed to be linear
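The linear-kernel case of the formula above can be sketched in plain NumPy. This is an illustration of the equation only, not the library's implementation: the helper name modified_gram_linear and the random data are assumptions made for the example, and no kernel_params are handled.

```python
import numpy as np


def modified_gram_linear(mixing, X, Y):
    """Hypothetical helper: modified Gram matrix for the linear kernel."""
    K = X @ X.T  # linear kernel, shape (n, n)
    return mixing * K + (1.0 - mixing) * (Y @ Y.T)


# Illustrative random data: n = 6 samples, m = 4 features, p = 2 properties
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
Y = rng.normal(size=(6, 2))

K_tilde = modified_gram_linear(0.5, X, Y)
print(K_tilde.shape)
```

With mixing = 1 the result reduces to the plain Gram matrix of X, and with mixing = 0 only the property term remains.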

Modified Covariance Matrix \(\mathbf{\tilde{C}}\)

skmatter.utils._pcovr_utils.pcovr_covariance(mixing, X, Y, rcond=1e-12, return_isqrt=False, rank=None, random_state=0, iterated_power='auto')

Creates the PCovR modified covariance

\[\mathbf{\tilde{C}} = \alpha \mathbf{X}^T \mathbf{X} + (1 - \alpha) \left(\left(\mathbf{X}^T \mathbf{X}\right)^{-\frac{1}{2}} \mathbf{X}^T \mathbf{\hat{Y}}\mathbf{\hat{Y}}^T \mathbf{X} \left(\mathbf{X}^T \mathbf{X}\right)^{-\frac{1}{2}}\right)\]

where \(\mathbf{\hat{Y}}\) are the properties obtained by linear regression.

Parameters:
  • mixing (float) – mixing parameter, as described in PCovR as \({\alpha}\)

  • X (ndarray of shape (n x m)) – Data matrix \(\mathbf{X}\)

  • Y (ndarray of shape (n x p)) – Property matrix \(\mathbf{Y}\), which contributes to the result when mixing < 1

  • rcond (float, default=1e-12) – threshold below which eigenvalues will be considered 0

  • return_isqrt (bool, default=False) – Whether to also return the calculated inverse square root of the covariance. Useful when the inverse square root is needed elsewhere and would otherwise have to be recomputed.

  • rank (int, default=min(X.shape)) – number of eigenpairs to estimate the inverse square root with

  • random_state (int, default=0) – random seed to use for the randomized SVD
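The modified covariance formula can be traced step by step with a small NumPy sketch. It is a simplified illustration under stated assumptions, not the library's code: the helper name modified_covariance is hypothetical, a full eigendecomposition is used instead of a rank-limited or randomized one, and rcond is applied as an absolute cutoff on the eigenvalues.

```python
import numpy as np


def modified_covariance(mixing, X, Y, rcond=1e-12):
    """Hypothetical helper: dense evaluation of the modified covariance."""
    C = X.T @ X  # covariance term, shape (m, m)

    # Inverse square root of C from its eigendecomposition,
    # dropping eigenvalues at or below rcond (assumed absolute cutoff).
    evals, evecs = np.linalg.eigh(C)
    keep = evals > rcond
    C_isqrt = (evecs[:, keep] / np.sqrt(evals[keep])) @ evecs[:, keep].T

    # Y_hat: properties approximated by linear regression on X
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    Y_hat = X @ W

    A = C_isqrt @ X.T @ Y_hat  # shape (m, p)
    return mixing * C + (1.0 - mixing) * (A @ A.T)


# Illustrative random data: n = 20 samples, m = 4 features, p = 2 properties
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
Y = rng.normal(size=(20, 2))

C_tilde = modified_covariance(0.5, X, Y)
print(C_tilde.shape)
```

Writing the second term as A Aᵀ keeps the result symmetric by construction; with mixing = 1 the function returns XᵀX unchanged.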

Orthogonalizers for CUR

When computing non-iterative CUR, it is necessary to orthogonalize the input matrices after each selection. For this, we have supplied a feature and a sample orthogonalizer for feature and sample selection.

skmatter.utils._orthogonalizers.X_orthogonalizer(x1, c=None, x2=None, tol=1e-12, copy=False)

Orthogonalizes a feature matrix by the given columns. Can be used to orthogonalize by samples by calling X = X_orthogonalizer(X.T, row_index).T. After orthogonalization, each column of X will contain only what is orthogonal to X[:, c] or x2.

Parameters:
  • x1 (matrix of shape (n x m)) – feature matrix to orthogonalize

  • c (int, less than m, default=None) – index of the column to orthogonalize by

  • x2 (matrix of shape (n x a), default=x1[:, c]) – a separate set of columns to orthogonalize with respect to. Note: the orthogonalizer will work column-by-column in column-index order.
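The single-column case (orthogonalizing by X[:, c]) amounts to subtracting from every column its projection onto the chosen column. The NumPy sketch below illustrates that idea only; the helper name orthogonalize_by_column is hypothetical, and the x2, tol and copy options of the real function are not modelled.

```python
import numpy as np


def orthogonalize_by_column(X, c):
    """Hypothetical helper: remove the component along X[:, c] from every column."""
    x_c = X[:, c]
    norm_sq = x_c @ x_c
    if norm_sq == 0.0:
        # Nothing to project out of the other columns
        return X.copy()
    # Projection of each column onto x_c, subtracted in one step
    return X - np.outer(x_c, x_c @ X) / norm_sq


# Illustrative random data: n = 8 samples, m = 5 features
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))

X_orth = orthogonalize_by_column(X, c=2)
# Every column is now orthogonal to the original column 2
print(np.allclose(X_orth.T @ X[:, 2], 0.0))
```

The selected column itself becomes zero after this step, since all of it lies along the direction being removed.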

skmatter.utils._orthogonalizers.Y_feature_orthogonalizer(y, X, tol=1e-12, copy=True)

Orthogonalizes a property matrix given the selected features in \(\mathbf{X}\)

\[\mathbf{Y} \leftarrow \mathbf{Y} - \mathbf{X} \left(\mathbf{X}^T\mathbf{X}\right)^{-1}\mathbf{X}^T \mathbf{Y}\]
Parameters:
  • y (ndarray of shape (n_samples x n_properties)) – property matrix

  • X (ndarray of shape (n_samples x n_features)) – feature matrix

  • tol (float) – cutoff for small eigenvalues to send to np.linalg.pinv

  • copy (bool) – whether to return a copy of y or edit in-place, default=True
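The update rule above can be written directly in NumPy. This is a sketch of the equation rather than the library's implementation: the helper name orthogonalize_targets_by_features is hypothetical, it always returns a new array, and tol is passed to np.linalg.pinv as its rcond cutoff.

```python
import numpy as np


def orthogonalize_targets_by_features(Y, X, tol=1e-12):
    """Hypothetical helper: remove from Y what X can predict linearly."""
    # Pseudo-inverse of X^T X, with small singular values cut off at tol
    XtX_inv = np.linalg.pinv(X.T @ X, rcond=tol)
    return Y - X @ XtX_inv @ X.T @ Y


# Illustrative random data: 10 samples, 3 features, 2 properties
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
Y = rng.normal(size=(10, 2))

Y_orth = orthogonalize_targets_by_features(Y, X)
# The remaining targets have no overlap with the feature columns
print(np.allclose(X.T @ Y_orth, 0.0))
```

What remains in Y_orth is the regression residual, which is why it is orthogonal to every column of X.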

skmatter.utils._orthogonalizers.Y_sample_orthogonalizer(y, X, y_ref, X_ref, tol=1e-12, copy=True)

Orthogonalizes a matrix of targets \({\mathbf{Y}}\) given a reference feature matrix \({\mathbf{X}_r}\) and reference target matrix \({\mathbf{Y}_r}\):

\[\mathbf{Y} \leftarrow \mathbf{Y} - \mathbf{X} \left(\mathbf{X}_{\mathbf{r}}^T \mathbf{X}_{\mathbf{r}}\right)^{-1}\mathbf{X}_{\mathbf{r}}^T \mathbf{Y}_{\mathbf{r}}\]
Parameters:
  • y (ndarray of shape (n_samples x n_properties)) – property matrix

  • X (ndarray of shape (n_samples x n_features)) – feature matrix

  • y_ref (ndarray of shape (n_ref x n_properties)) – reference property matrix

  • X_ref (ndarray of shape (n_ref x n_features)) – reference feature matrix

  • tol (float) – cutoff for small eigenvalues to send to np.linalg.pinv

  • copy (bool) – whether to return a copy of y or edit in-place, default=True
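In the same spirit, the sample-wise update can be sketched in NumPy: regression weights are fitted on the reference rows and the resulting prediction is subtracted from all targets. The helper name orthogonalize_targets_by_samples and the choice of the first five rows as the reference set are assumptions for illustration, and the copy option is not modelled.

```python
import numpy as np


def orthogonalize_targets_by_samples(Y, X, Y_ref, X_ref, tol=1e-12):
    """Hypothetical helper: subtract the prediction of a fit on reference samples."""
    # Regression weights fitted on the reference samples only
    W_ref = np.linalg.pinv(X_ref.T @ X_ref, rcond=tol) @ X_ref.T @ Y_ref
    return Y - X @ W_ref


# Illustrative random data: 10 samples, 3 features, 2 properties
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
Y = rng.normal(size=(10, 2))

# Assumption: treat the first five samples as the reference set
X_ref, Y_ref = X[:5], Y[:5]

Y_orth = orthogonalize_targets_by_samples(Y, X, Y_ref, X_ref)
print(Y_orth.shape)
```

For the reference rows themselves, the result is the residual of the reference fit, so those rows carry no component along the reference features.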

Random Partitioning with Overlaps