Utilities

class mellon.util.GaussianProcessType(value)

Bases: Enum

Defines types of Gaussian Process (GP) computations for various estimators within the mellon library: mellon.model.DensityEstimator, mellon.model.FunctionEstimator, mellon.model.DimensionalityEstimator, mellon.model.TimeSensitiveDensityEstimator.

This enum can be passed to the mentioned estimators through the gp_type attribute. If a string naming one of these values is passed instead, the from_string() method is called to convert it to a GaussianProcessType.

Options are ‘full’, ‘full_nystroem’, ‘sparse_cholesky’, ‘sparse_nystroem’.

FULL = 'full'

FULL_NYSTROEM = 'full_nystroem'

SPARSE_CHOLESKY = 'sparse_cholesky'

SPARSE_NYSTROEM = 'sparse_nystroem'

static from_string(s: str, optional: bool = False)

Converts a string to a GaussianProcessType object or raises an error.

Parameters:

s (str) –
The type of Gaussian Process (GP). Options are:
- ’full’: Non-sparse GP
- ’full_nystroem’: Sparse GP with Nyström rank reduction
- ’sparse_cholesky’: Sparse GP using landmarks/inducing points
- ’sparse_nystroem’: Sparse GP along with an improved Nyström rank reduction
optional (bool, optional) – Specifies if the input is optional. Returns None if True and input is None.

Returns:

Corresponding Gaussian Process type.

Return type:

GaussianProcessType

Raises:

ValueError – If the input does not correspond to any known Gaussian Process type.
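The conversion described above can be sketched with the standard library alone. This is an illustrative stand-in, not mellon's code: the class name GPTypeSketch is hypothetical, and exact string matching (no case folding) is an assumption.

```python
from enum import Enum
from typing import Optional


class GPTypeSketch(Enum):
    """Stand-in enum for illustration only; not mellon's GaussianProcessType."""

    FULL = "full"
    FULL_NYSTROEM = "full_nystroem"
    SPARSE_CHOLESKY = "sparse_cholesky"
    SPARSE_NYSTROEM = "sparse_nystroem"

    @staticmethod
    def from_string(s: Optional[str], optional: bool = False):
        # None passes through only when the caller marked the input as optional.
        if s is None and optional:
            return None
        try:
            # Enum lookup by value; exact matching is an assumption of this sketch.
            return GPTypeSketch(s)
        except ValueError:
            options = ", ".join(repr(member.value) for member in GPTypeSketch)
            raise ValueError(
                f"Unknown GP type {s!r}; expected one of {options}."
            ) from None


gp_type = GPTypeSketch.from_string("sparse_cholesky")
```

Looking up the member by value keeps the list of valid strings in one place, so the error message cannot drift from the enum definition.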

mellon.util.add_variance(K, M=None, jitter=1e-06)

Computes \(K + MM^T\) where the diagonal of \(MM^T\) is at least jitter. Adding \(MM^T\) stabilizes \(K\) for the Cholesky decomposition when it is not already stable enough.

Parameters:

K (array_like, shape (n, n)) – A covariance matrix.
M (array_like, shape (n, p), optional) – Left factor of additional variance. Defaults to None, in which case only the jitter is added to the diagonal.
jitter (float, optional) – A small number to stabilize the covariance matrix. Defaults to 1e-6.

Returns:

combined_covariance – A combined covariance matrix that is more stably positive definite.

Return type:

array_like, shape (n, n)

Notes

If M is None, the function will add the jitter to the diagonal of K to make it more stable. Otherwise, it will add \(MM^T\) and correct the diagonal based on the jitter parameter.
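A minimal NumPy sketch of the behavior described in these notes follows. The function name add_variance_sketch is hypothetical. How the diagonal is "corrected" is an assumption here: each diagonal entry of \(MM^T\) below the jitter is raised to the jitter.

```python
import numpy as np


def add_variance_sketch(K, M=None, jitter=1e-6):
    """Illustrative K + M M^T with a jitter floor on the added diagonal."""
    K = np.asarray(K, dtype=float)
    if M is None:
        # No extra variance supplied: add the jitter to the diagonal of K.
        return K + jitter * np.eye(K.shape[0])
    M = np.asarray(M, dtype=float)
    noise = M @ M.T
    # Assumed correction: lift diagonal entries of M M^T that are below jitter.
    shortfall = np.maximum(jitter - np.diag(noise), 0.0)
    return K + noise + np.diag(shortfall)


K = np.ones((3, 3))  # rank one, hence singular before stabilization
L = np.linalg.cholesky(add_variance_sketch(K))
```

With the jitter in place, the smallest eigenvalue of the rank-one example is about 1e-6 instead of 0, which is what lets the Cholesky factorization go through.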

mellon.util.batched_vmap(func, x, *args, batch_size=100)

Apply function in batches to save memory.
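The batching idea can be sketched in plain NumPy. This is not the library's implementation: the name batched_apply_sketch is hypothetical, and a Python loop stands in for the vectorized map that the function name suggests.

```python
import numpy as np


def batched_apply_sketch(func, x, *args, batch_size=100):
    """Apply ``func`` to each row of ``x``, one slice of rows at a time."""
    x = np.asarray(x)
    chunks = []
    for start in range(0, x.shape[0], batch_size):
        batch = x[start:start + batch_size]
        # Only the intermediates of the current batch are held in memory.
        chunks.append(np.stack([func(row, *args) for row in batch]))
    return np.concatenate(chunks, axis=0)


x = np.arange(10).reshape(5, 2)
scaled_sums = batched_apply_sketch(lambda row, s: row.sum() * s, x, 3, batch_size=2)
```

A smaller batch_size lowers peak memory at the cost of more iterations; the result is the same for any batch size.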

mellon.util.deserialize(serializable_x)

Convert the serializable input back into the original format.

Parameters:

serializable_x (variable) – The input variable that is in a serializable format.

Returns:

x – The input variable converted back into its original format.

Return type:

variable

mellon.util.distance(x, y)

Computes the distances between each point in x and y.

Parameters:

x (array-like) – A set of points.
y (array-like) – A set of points.

Returns:

distances – The distance between each point in x and y.

Return type:

array-like
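As an illustration, pairwise distances between two point sets can be computed with broadcasting. The name pairwise_distance_sketch is hypothetical, and the Euclidean metric is an assumption taken from the description of distance_grad rather than from this entry.

```python
import numpy as np


def pairwise_distance_sketch(x, y):
    """Euclidean distance from every row of ``x`` to every row of ``y``."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Broadcasting (n, 1, d) against (1, m, d) gives all n * m difference vectors.
    diff = x[:, None, :] - y[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))  # shape (n, m)


d = pairwise_distance_sketch([[0.0, 0.0], [3.0, 4.0]], [[0.0, 0.0]])
```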

mellon.util.distance_grad(x, eps=1e-12)

Produces a function that computes the Euclidean distance from a fixed set of points x to any other set of points y, and the gradient of this distance with respect to y.

Parameters:

x (array-like) – A fixed set of points. The array should have a shape of (n, d), where n is the number of points and d is the dimensionality of each point.
eps (float, optional) – A small epsilon value added to the distance to avoid division by zero when computing the gradient. Default is 1e-12.

Returns:

A function that, when called with an array-like y of shape (m, d), returns a tuple containing:
- distance (ndarray) – An array of shape (n, m) representing the Euclidean distances from each point in x to each point in y.
- gradient (ndarray) – An array of shape (n, m, d) representing the gradient of the distance with respect to each point in y.

Return type:

function

Examples

>>> import numpy as np
>>> from mellon.util import distance_grad
>>> x = np.array([[0, 0], [1, 1]])
>>> dist_grad_func = distance_grad(x)
>>> y = np.array([[1, 0], [0, 1]])
>>> distance, gradient = dist_grad_func(y)
>>> distance.shape
(2, 2)
>>> gradient.shape
(2, 2, 2)

Notes

The gradient is computed using the formula for the derivative of the Euclidean distance with respect to the points y. The epsilon value is added to the computed squared distances before taking the square root to ensure numerical stability.
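The formula mentioned in these notes can be written out in NumPy. This is a sketch under the stated description, not the library's code, and the name distance_grad_sketch is hypothetical. For a pair of points the gradient of \(\sqrt{\|y - x\|^2 + \epsilon}\) with respect to \(y\) is \((y - x)\) divided by that distance.

```python
import numpy as np


def distance_grad_sketch(x, eps=1e-12):
    """Return a function giving distances to ``x`` and their gradient in ``y``."""
    x = np.asarray(x, dtype=float)

    def dist_and_grad(y):
        y = np.asarray(y, dtype=float)
        diff = y[None, :, :] - x[:, None, :]            # shape (n, m, d)
        # eps is added to the squared distance before the square root.
        dist = np.sqrt((diff ** 2).sum(axis=-1) + eps)  # shape (n, m)
        grad = diff / dist[..., None]                   # shape (n, m, d)
        return dist, grad

    return dist_and_grad


dist, grad = distance_grad_sketch([[0.0, 0.0]])([[3.0, 4.0]])
```

Because eps sits under the square root, the division stays finite even where a point in y coincides with a point in x.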

mellon.util.ensure_2d(X)

Ensures that the input JAX array, X, is at least 2-dimensional.

Parameters:

X (jnp.array) – The input JAX array to be made 2-dimensional.

Returns:

The input array transformed to a 2-dimensional array.

Return type:

jnp.array

If X is 1-dimensional, it is reshaped to a 2-dimensional array, where each element of X becomes a row in the 2-dimensional array.
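The reshaping rule can be illustrated with NumPy in place of JAX (an assumption made so the example has no JAX dependency). The name ensure_2d_sketch is hypothetical.

```python
import numpy as np


def ensure_2d_sketch(X):
    """Turn a 1-D array of n elements into an (n, 1) array; leave 2-D input alone."""
    X = np.asarray(X)
    # Each element of a 1-D input becomes its own row.
    return X[:, None] if X.ndim == 1 else X


column = ensure_2d_sketch([1.0, 2.0, 3.0])
```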

mellon.util.expand_to_inactive(values, target_shape, active_dims)

Expand the values to the full target shape with zeros in inactive dimensions, specifically targeting the last dimension based on active_dims.

Parameters:

values (jax.numpy.ndarray) – Gradient values corresponding to the active dimensions.
target_shape (tuple) – The shape of the full gradient array to be constructed.
active_dims (slice) – Specifies the active dimensions as a slice object.

Returns:

An array with the specified target shape, where values are set in active dimensions of the last dimension, and zeros are placed elsewhere.

Return type:

jax.numpy.ndarray
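A NumPy sketch of this zero-padding step is given below. The name expand_to_inactive_sketch is hypothetical, and NumPy's in-place assignment stands in for whatever JAX update the library uses.

```python
import numpy as np


def expand_to_inactive_sketch(values, target_shape, active_dims):
    """Place ``values`` into the active slice of the last axis, zeros elsewhere."""
    values = np.asarray(values)
    full = np.zeros(target_shape, dtype=values.dtype)
    # The ellipsis keeps all leading axes and indexes only the last one.
    full[..., active_dims] = values
    return full


grad_full = expand_to_inactive_sketch(np.ones((2, 2)), (2, 4), slice(1, 3))
```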

mellon.util.local_dimensionality(x, k=30, x_query=None, neighbor_idx=None)

Compute an estimate of the local fractal dimension of a data set using nearest neighbors.

Parameters:

x (array-like of shape (n_samples, n_features)) – The input samples.
k (int, optional) – The number of neighbors to consider, defaults to 30.
x_query (array-like of shape (n_queries, n_features), optional) – The points at which to compute the local fractal dimension. If None, x itself is used. Defaults to None.
neighbor_idx (array-like of shape (n_queries, k), optional) – The indices of the neighbors for each query point. If None, these are computed using a nearest neighbor search. Defaults to None.

Returns:

The estimated local fractal dimension at each query point.

Return type:

array-like of shape (n_queries,)

This function computes the local fractal dimension of a dataset at query points. It uses nearest neighbors and fits a line in log-log space to estimate the fractal dimension.
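The log-log fit can be sketched as follows. This is one plausible reading of the description, not the library's code: the name local_dimensionality_sketch is hypothetical, a brute-force neighbor search replaces the real one, and regressing log distance on log neighbor rank (then inverting the slope) is an assumption. The reasoning is that the number of neighbors within radius \(r\) grows like \(r^d\), so \(\log r\) has slope \(1/d\) against the log of the neighbor rank.

```python
import numpy as np


def local_dimensionality_sketch(x, k=30):
    """Estimate the local dimension at each point of ``x`` from k neighbors."""
    x = np.asarray(x, dtype=float)
    # Brute-force squared distances; adequate for a few thousand points.
    sq = (x ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)
    # Sort each row and drop column 0, which is the point itself.
    nn_dist = np.sqrt(np.sort(d2, axis=1)[:, 1:k + 1])   # shape (n, k)
    log_rank = np.log(np.arange(1, k + 1))                # shape (k,)
    log_dist = np.log(nn_dist)
    # Least-squares slope of log distance against log rank, one per point.
    rank_centered = log_rank - log_rank.mean()
    dist_centered = log_dist - log_dist.mean(axis=1, keepdims=True)
    slope = dist_centered @ rank_centered / (rank_centered @ rank_centered)
    # Neighbor count ~ r**d, so the slope estimates 1/d.
    return 1.0 / slope


# Points on a flat 2-D sheet embedded in 3-D space.
rng = np.random.default_rng(0)
sheet = np.zeros((2000, 3))
sheet[:, :2] = rng.uniform(size=(2000, 2))
dims = local_dimensionality_sketch(sheet)
```

For the embedded sheet the estimates should cluster near 2 rather than 3, although this simple fit is noisy per point and is biased near the edges of the sample.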

mellon.util.make_multi_time_argument(func)

Decorator to modify a method to optionally take a multi-time argument.

This decorator modifies the method it wraps to take an optional multi_time keyword argument. If multi_time is provided, the decorated method will be called once for each value in multi_time with that value passed as the time argument to the method.

The original method’s signature and docstring are preserved.

Parameters:

func (callable) – The method to be modified. This method must take a time keyword argument.

Returns:

The modified method.

Return type:

callable

Examples

import numpy as np
from mellon.util import make_multi_time_argument

class MyClass:
    @make_multi_time_argument
    def method(self, x, time=None):
        return x + time

my_object = MyClass()
print(my_object.method(1, multi_time=np.array([1, 2, 3])))
# Output: array([2, 3, 4])
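To make the mechanism concrete, here is a minimal sketch of a decorator with the behavior described above, written without mellon or JAX. The name multi_time_sketch is hypothetical, and the Python loop and the stacking of results along the first axis are assumptions, not the library's implementation.

```python
import functools

import numpy as np


def multi_time_sketch(func):
    """Let a method with a ``time`` keyword also accept ``multi_time``."""

    @functools.wraps(func)  # preserves the original name and docstring
    def wrapper(self, *args, multi_time=None, **kwargs):
        if multi_time is None:
            # No multi_time given: behave exactly like the wrapped method.
            return func(self, *args, **kwargs)
        # Call the method once per time value and stack the results.
        results = [
            func(self, *args, time=t, **kwargs) for t in np.atleast_1d(multi_time)
        ]
        return np.stack(results)

    return wrapper


class Example:
    @multi_time_sketch
    def method(self, x, time=None):
        return x + time


per_time = Example().method(1, multi_time=np.array([1, 2, 3]))
```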

mellon.util.make_serializable(x)

Convert the input into a serializable format.

Parameters:

x (variable) – The input variable that can be array-like, slice, scalar or dict.

Returns:

serializable_x – The input variable converted into a serializable format.

Return type:

variable

mellon.util.mle(nn_distances, d)

Maximum likelihood estimate of the log density under the nearest-neighbor distribution, given the observed nearest-neighbor distances \(\text{nn\_distances}\) in \(d\) dimensions: \(\text{mle} = \log\Gamma(d/2 + 1) - (d/2) \cdot \log(\pi) - d \cdot \log(\text{nn\_distances})\)

Parameters:

nn_distances (array-like) – The observed nearest neighbor distances.
d (int) – The local dimensionality of the data.

Returns:

\(mle\) – The maximum likelihood estimate at each point.

Return type:

array-like
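The formula can be evaluated directly with the standard library and NumPy. The name mle_sketch is hypothetical. Since the volume of the unit ball in \(d\) dimensions is \(\pi^{d/2} / \Gamma(d/2 + 1)\), the expression equals minus the log volume of a ball whose radius is the nearest-neighbor distance.

```python
from math import lgamma, log, pi

import numpy as np


def mle_sketch(nn_distances, d):
    """Evaluate log Gamma(d/2 + 1) - (d/2) log(pi) - d log(nn_distances)."""
    nn_distances = np.asarray(nn_distances, dtype=float)
    # lgamma is the log of the gamma function; the first two terms are
    # minus the log volume of the unit d-ball.
    return lgamma(d / 2 + 1) - (d / 2) * log(pi) - d * np.log(nn_distances)


log_density = mle_sketch([0.5, 1.0, 2.0], d=2)
```

Smaller nearest-neighbor distances give larger log-density estimates, and the dependence steepens as d grows.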

mellon.util.object_str(obj: object, dim_names: List[str] | None = None) → str

Generate a concise string representation of metadata for array-like objects.

Parameters:

obj (object) – Object for which to generate metadata string.
dim_names (list of str, optional) – Names for dimensions, used for array-like objects.

Returns:

Metadata string.

Return type:

str

Examples

>>> object_str(np.array([[1, 2], [3, 4]]), dim_names=['row', 'col'])
'<array 2 row x 2 col, dtype=int64>'

>>> object_str(np.array([1, 2, 3]), dim_names=['element'])
'<array 3 element, dtype=int64>'

>>> object_str("hello")
'hello'

mellon.util.select_active_dims(x, active_dims)

Select the active dimensions from the input.

Parameters:

x (array-like) – Input array.
active_dims (array-like, slice or scalar) – The indices of the active dimensions. It could be a scalar, a list, a numpy array, or a slice object.

Returns:

x – Array with selected dimensions.

Return type:

array-like
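The selection can be sketched with NumPy indexing. The name select_active_dims_sketch is hypothetical, and two choices are assumptions: selection happens along the last axis, and a scalar index keeps that axis rather than dropping it.

```python
import numpy as np


def select_active_dims_sketch(x, active_dims):
    """Keep only the listed feature columns (last axis) of ``x``."""
    x = np.asarray(x)
    if not isinstance(active_dims, slice):
        # Promote scalars and lists to a 1-D index array so the axis is kept.
        active_dims = np.atleast_1d(active_dims)
    return x[..., active_dims]


x = np.arange(12).reshape(3, 4)
by_slice = select_active_dims_sketch(x, slice(0, 2))
by_scalar = select_active_dims_sketch(x, 1)
by_list = select_active_dims_sketch(x, [0, 3])
```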

mellon.util.set_jax_config(enable_x64=True, platform_name='cpu')

Sets up the JAX configuration with the specified settings.

Parameters:

enable_x64 (bool, optional) – Whether to enable 64-bit (double precision) computations in JAX. Defaults to True.
platform_name (str, optional) – The platform name to use in JAX (‘cpu’, ‘gpu’, or ‘tpu’). Defaults to ‘cpu’.

mellon.util.set_verbosity(verbose: bool)

Adjusts the verbosity of mellon logging.

Parameters:

verbose (bool) – If True, sets the logging level to INFO to show detailed logs. If False, sets it to WARNING to show only important messages.

Notes

This function provides a simplified interface for controlling logging verbosity, making it more accessible to users who are not familiar with the logging module’s levels. You can access the mellon logger through mellon.logger to perform logging operations throughout the mellon module.

Example

To enable detailed logging, showing more information:

>>> set_verbosity(True)

To reduce the log output to only include warnings and errors:

>>> set_verbosity(False)
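The mapping from the boolean flag to logging levels can be illustrated with the standard logging module. This sketch does not touch mellon's logger: the logger name "mellon_sketch" and the function name set_verbosity_sketch are hypothetical.

```python
import logging

# Hypothetical logger used only for this illustration.
logger = logging.getLogger("mellon_sketch")


def set_verbosity_sketch(verbose: bool) -> None:
    """Map a boolean flag onto the INFO and WARNING logging levels."""
    logger.setLevel(logging.INFO if verbose else logging.WARNING)


set_verbosity_sketch(False)
info_shown_when_quiet = logger.isEnabledFor(logging.INFO)
set_verbosity_sketch(True)
info_shown_when_verbose = logger.isEnabledFor(logging.INFO)
```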

mellon.util.stabilize(A, jitter=1e-06)

Add a small jitter to the diagonal for numerical stability.

Parameters:

A (array-like) – A square matrix.
jitter (float) – The amount to add to the diagonal. Defaults to 1e-6.

Returns:

\(A'\) – The matrix \(A\) with a small jitter added to the diagonal.

Return type:

array-like

mellon.util.test_rank(input, tol=0.5, threshold=None)

Inspects the approximate rank of the transformation matrix L. The input can be the matrix itself or an object containing L as an attribute. A high rank indicates a potentially insufficient latent representation, suggesting a need for a more complex transformation matrix. Also allows logging based on a rank fraction threshold.

Parameters:

input (array-like or mellon estimator object) – The matrix L or an object containing it as an attribute.
tol (float, optional) – The rank calculation tolerance, defaults to 0.5.
threshold (float, optional) – If provided, logs a message based on the rank fraction.

Returns:

The approximate rank of the matrix.

Return type:

int

Raises:

ValueError – If the input matrix is not 2D.