Model

class mellon.model.DensityEstimator(cov_func_curry=<class 'mellon.cov.Matern52'>, n_landmarks=None, rank=None, gp_type=None, d_method='embedding', jitter=1e-06, optimizer='L-BFGS-B', n_iter=100, init_learn_rate=0.1, landmarks=None, nn_distances=None, d=None, mu=None, ls=None, ls_factor=1, cov_func=None, Lp=None, L=None, initial_value=None, predictor_with_uncertainty=False, jit=False, check_rank=None)

Bases: BaseEstimator

A class for non-parametric density estimation. It performs Bayesian inference with a Gaussian process prior and Nearest Neighbor likelihood. All intermediate computations are cached as instance variables, which allows viewing intermediate results and saving computation time by passing precomputed values as arguments to a new model.

Parameters:

cov_func_curry (function or type) – The generator of the Gaussian process covariance function. This must be a curry that takes one length scale argument and returns a covariance function of the form k(x, y) \(\rightarrow\) float. Defaults to Matern52.
n_landmarks (int) – The number of landmark/inducing points. Only used if a sparse GP is indicated through gp_type. If 0 or equal to the number of training points, inducing points will not be computed or used. Defaults to 5000.
rank (int or float) – The rank of the approximate covariance matrix for the Nyström rank reduction. If rank is an int, an \(n \times\) rank matrix \(L\) is computed such that \(L L^\top \approx K\), where K is the exact \(n \times n\) covariance matrix. If rank is a float 0.0 \(\le\) rank \(\le\) 1.0, the rank/size of \(L\) is selected such that the included eigenvalues of the covariance between landmark points account for the specified percentage of the sum of eigenvalues. It is ignored if gp_type does not indicate a Nyström rank reduction. Defaults to 0.99.
gp_type (str or GaussianProcessType) –
The type of sparsification used for the Gaussian process:
- ’full’ Non-sparse Gaussian process
- ’full_nystroem’ Sparse GP with Nyström rank reduction without landmarks,
  which lowers the computational complexity.
- ’sparse_cholesky’ Sparse GP using landmarks/inducing points,
  typically employed to enable scalable GP models.
- ’sparse_nystroem’ Sparse GP using landmarks or inducing points,
  along with an improved Nyström rank reduction method.
The value can be either a string matching one of the above options or an instance of the mellon.util.GaussianProcessType Enum. If a partial match is found with the Enum, a warning will be logged, and the closest match will be used. Defaults to ‘sparse_cholesky’.
d_method (str) –
The method to compute the intrinsic dimensionality of the data. Implemented options are
- ’embedding’: uses the embedding dimension x.shape[1]
- ’fractal’: uses the average fractal dimension (experimental)
Defaults to ‘embedding’.
jitter (float) – A small amount added to the diagonal of the covariance matrix to keep its eigenvalues bounded away from 0, ensuring numerical stability. Defaults to 1e-6.
optimizer (str) – The optimizer for the maximum a posteriori or posterior density estimation. Options are ‘L-BFGS-B’, stochastic optimizer ‘adam’, or automatic differentiation variational inference ‘advi’. Defaults to ‘L-BFGS-B’.
n_iter (int) – The number of optimization iterations. Defaults to 100.
init_learn_rate (float) – The initial learning rate. Defaults to 0.1.
landmarks (array-like or None) – The points used to quantize the data for the approximate covariance. If None, landmarks are set as k-means centroids with k=n_landmarks. This is ignored if n_landmarks is greater than or equal to the number of training points. Defaults to None.
nn_distances (array-like or None) – The nearest neighbor distances at each data point. If None, the nearest neighbor distances are computed automatically, using a KDTree if the dimensionality of the data is less than 20, or a BallTree otherwise. Defaults to None.
d (int, array-like or None) – The intrinsic dimensionality of the data, i.e., the dimensionality of the embedded manifold. If None, d is set to the size of axis 1 of the training data points. Defaults to None.
mu (float or None) – The mean \(\mu\) of the Gaussian process. If None, sets \(\mu\) to the 1st percentile of \(\text{mle}(\text{nn_distances}, d) - 10\), where \(\text{mle} = \log(\text{gamma}(d/2 + 1)) - (d/2) \cdot \log(\pi) - d \cdot \log(\text{nn_distances})\). Defaults to None.
ls (float or None) – The length scale of the Gaussian process covariance function. If None, ls is set to the geometric mean of the nearest neighbor distances times a constant. If cov_func is supplied explicitly, ls has no effect. Defaults to None.
ls_factor (float, optional) – A scaling factor applied to the length scale when it’s automatically selected. It is used to manually adjust the automatically chosen length scale for finer control over the model’s sensitivity to variations in the data.
cov_func (mellon.Covariance or None) – The Gaussian process covariance function as an instance of mellon.Covariance. If None, the covariance function cov_func is automatically generated as cov_func_curry(ls). Defaults to None.
Lp (array-like or None) – A matrix such that \(L_p L_p^\top = \Sigma_p\), where \(\Sigma_p\) is the covariance matrix of the inducing points (all cells in non-sparse GP). Not used when Nyström rank reduction is employed. Defaults to None.
L (array-like or None) – A matrix such that \(L L^\top \approx K\), where \(K\) is the covariance matrix. If None, L is computed automatically. Defaults to None.
initial_value (array-like or None) – The initial guess for optimization. If None, the value \(z\) that minimizes \(\|Lz + \mu - \text{mle}\| + \|z\|\) is found, where \(\text{mle} = \log(\text{gamma}(d/2 + 1)) - (d/2) \cdot \log(\pi) - d \cdot \log(\text{nn_distances})\) and \(d\) is the intrinsic dimensionality of the data. Defaults to None.
predictor_with_uncertainty (bool) –
If set to True, computes the predictor instance .predict with its predictive uncertainty. The uncertainty comes from two sources:
1. .predict.mean_covariance:
  Uncertainty arising from the posterior distribution of the Bayesian inference. This component quantifies uncertainties inherent in the model’s parameters and structure. Available only if .pre_transformation_std is defined (e.g., using optimizer=”advi”), which reflects the standard deviation of the latent variables before transformation.
2. .predict.covariance:
  Uncertainty for out-of-bag states originating from the compressed function representation in the Gaussian Process. Specifically, this uncertainty corresponds to locations that are not inducing points of the Gaussian Process and represents the covariance of the conditional normal distribution.
jit (bool) – Use jax just-in-time compilation for loss and its gradient during optimization. Defaults to False.
check_rank (bool) – Whether to check if the landmarks allow sufficient complexity by checking the approximate rank of the covariance matrix. This only applies to the non-Nyström gp_types. If set to None, the rank check is only performed if n_landmarks >= n_samples/10. Defaults to None.
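The default for mu described above combines nearest neighbor distances, the \(\text{mle}\) formula, and a 1st percentile. The following is a minimal sketch of that arithmetic using only the Python standard library. The helper names (nn_distances, mle_log_density, percentile, default_mu) are hypothetical, the brute-force neighbor search is a stand-in for the KDTree/BallTree search mentioned above, and the linear-interpolation percentile is an assumption about the convention; it is not taken from the mellon source.

```python
import math


def nn_distances(points):
    """Brute-force distance from each point to its nearest other point."""
    return [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]


def mle_log_density(nn_dist, d):
    """log(gamma(d/2 + 1)) - (d/2) * log(pi) - d * log(nn_dist), per point."""
    const = math.lgamma(d / 2 + 1) - (d / 2) * math.log(math.pi)
    return [const - d * math.log(r) for r in nn_dist]


def percentile(values, q):
    """q-th percentile with linear interpolation between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def default_mu(nn_dist, d):
    """1st percentile of the per-point mle, shifted down by 10."""
    return percentile(mle_log_density(nn_dist, d), 1) - 10


# Four points in a 2-dimensional embedding (so d = 2 under d_method='embedding').
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0)]
dists = nn_distances(points)
mle = mle_log_density(dists, d=2)
mu = default_mu(dists, d=2)
```

The mle term is the log of the inverse volume of a d-dimensional ball whose radius is the nearest neighbor distance, so isolated points (large distances) receive low values, and the resulting mu sits well below the bulk of the per-point estimates.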

fit(x=None, build_predict=True)

Trains the model from end to end. This includes preparing the model for inference, running the inference, and post-processing the inference results.

Parameters:

x (array-like of shape (n_samples, n_features), default=None) – The training instances where n_samples is the number of samples and n_features is the number of features.
build_predict (bool, default=True) – Whether to build the prediction function after training.

Returns:

self – This method returns self for chaining.

Return type:

object

fit_predict(x=None, build_predict=False)

Trains the model and predicts the log density at the training points.

Parameters:

x (array-like of shape (n_samples, n_features), default=None) – The training instances where n_samples is the number of samples and n_features is the number of features.
build_predict (bool, default=False) – Whether to build the prediction function after training.

Raises:

ValueError – If the input x is not consistent with the training data used before.

Returns:

The log density at each training point in x.

Return type:

array-like

property predict

A property that returns an instance of the mellon.Predictor class. This predictor can be used to predict the log density for new data points by calling the instance like a function.

The predictor instance also supports serialization features, which allow for saving and loading the predictor’s state. For more details, refer to the mellon.Predictor documentation.

Returns: A predictor instance that computes the log density at each new data point.
Return type: mellon.Predictor

Example

>>> log_density = model.predict(Xnew)

prepare_inference(x)

Set all attributes in preparation for optimization, but do not perform Bayesian inference. It is not necessary to call this function before calling fit.

Parameters: x (array-like) – The training instances used to estimate the density function.
Returns: loss_func, initial_value – The Bayesian loss function and the initial guess for optimization.
Return type: function, array-like

process_inference(pre_transformation=None, build_predict=True)

Use the optimized parameters to compute the log density at the training points. If build_predict, also build the prediction function.

Parameters:

pre_transformation (array-like) – The optimized parameters. If None, uses the stored pre_transformation attribute.
build_predict (bool) – Whether or not to build the prediction function. Defaults to True.

Returns:

log_density_x – The log density at the training points.

Return type:

array-like

run_inference(loss_func=None, initial_value=None, optimizer=None)

Perform Bayesian inference, optimizing the pre_transformation parameters. If you would like to run your own inference procedure, use the loss_func and initial_value attributes and set pre_transformation to the optimized parameters.

Parameters:

loss_func (function) – The Bayesian loss function. If None, uses the stored loss_func attribute.
initial_value (array-like) – The initial guess for optimization. If None, uses the stored initial_value attribute.

Returns:

pre_transformation - The optimized parameters.

Return type:

array-like
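The custom inference route mentioned in the description of run_inference can be sketched as a small helper. It relies only on the documented prepare_inference and process_inference methods. The helper name run_custom_inference and the minimize callable are hypothetical, and the assumption that loss_func can be passed directly to a user-supplied optimizer is not confirmed by this page.

```python
def run_custom_inference(model, x, minimize):
    """Run inference with a user-supplied optimizer instead of run_inference.

    model: any estimator exposing prepare_inference and process_inference
        as documented above.
    minimize: hypothetical callable (loss_func, initial_value) -> optimized
        parameters; swap in whichever optimizer you prefer.
    """
    # Stage 1: set up the model and obtain the loss and a starting point.
    loss_func, initial_value = model.prepare_inference(x)
    # Stage 2: replace the built-in optimization with the supplied one.
    pre_transformation = minimize(loss_func, initial_value)
    # Stage 3: turn the optimized parameters into values at the training points.
    return model.process_inference(pre_transformation=pre_transformation)
```

Passing the optimized parameters through the pre_transformation argument of process_inference keeps the helper independent of how the estimator stores its attributes.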

set_x(x)

Sets the training instances (x) for the model and validates that they are formatted correctly.

Parameters: x (array-like of shape (n_samples, n_features)) – The training instances where n_samples is the number of samples and n_features is the number of features. Each sample is an array of features representing a point in the feature space.
Returns: The validated training instances.
Return type: array-like of shape (n_samples, n_features)
Raises: ValueError – If the input x is not valid. For instance, x may not be valid if it is not a numerical array-like object, or if its shape does not match the required shape (n_samples, n_features).

class mellon.model.DimensionalityEstimator(cov_func_curry=<class 'mellon.cov.Matern52'>, n_landmarks=None, rank=None, gp_type=None, jitter=1e-06, optimizer='L-BFGS-B', n_iter=100, init_learn_rate=0.1, landmarks=None, k=10, distances=None, d=None, mu_dim=0, mu_dens=None, ls=None, ls_factor=1, cov_func=None, Lp=None, L=None, initial_value=None, predictor_with_uncertainty=False, jit=False, check_rank=None)

Bases: BaseEstimator

This class provides a non-parametric method for estimating local dimensionality and density. It uses Bayesian inference with a Gaussian process prior and a normal distribution for local scaling rates. The class caches all intermediate computations as instance variables, enabling users to view intermediate results and save computational time by passing precomputed values to a new model instance.

Parameters:

cov_func_curry (function or type, optional (default=Matern52)) – A generator for the Gaussian process covariance function. It should be a curry function taking one length scale argument and returning a covariance function of the form k(x, y) \(\rightarrow\) float.
n_landmarks (int, optional (default=5000)) – The number of landmark/inducing points. Only used if a sparse GP is indicated through gp_type. If 0 or equal to the number of training points, inducing points will not be computed or used.
rank (int or float, optional (default=0.99)) – The rank of the approximate covariance matrix for the Nyström rank reduction. If rank is an int, an \(n \times\) rank matrix \(L\) is computed such that \(L L^\top \approx K\), where K is the exact \(n \times n\) covariance matrix. If rank is a float 0.0 \(\le\) rank \(\le\) 1.0, the rank/size of \(L\) is selected such that the included eigenvalues of the covariance between landmark points account for the specified percentage of the sum of eigenvalues. It is ignored if gp_type does not indicate a Nyström rank reduction.
gp_type (str or GaussianProcessType, optional (default='sparse_cholesky')) –
The type of sparsification used for the Gaussian process:
- ’full’ Non-sparse Gaussian process
- ’full_nystroem’ Sparse GP with Nyström rank reduction without landmarks,
  which lowers the computational complexity.
- ’sparse_cholesky’ Sparse GP using landmarks/inducing points,
  typically employed to enable scalable GP models.
- ’sparse_nystroem’ Sparse GP using landmarks or inducing points,
  along with an improved Nyström rank reduction method.
The value can be either a string matching one of the above options or an instance of the mellon.util.GaussianProcessType Enum. If a partial match is found with the Enum, a warning will be logged, and the closest match will be used.
jitter (float, optional (default=1e-6)) – A small amount added to the diagonal of the covariance matrix to ensure numerical stability by keeping eigenvalues away from 0.
optimizer (str, optional (default='L-BFGS-B')) – The optimizer to use for maximum a posteriori density estimation. It can be either ‘L-BFGS-B’, ‘adam’, or ‘advi’.
n_iter (int, optional (default=100)) – The number of iterations for optimization.
init_learn_rate (float, optional (default=0.1)) – The initial learning rate for the optimizer.
landmarks (array-like or None, optional) – Points used to quantize the data for approximate covariance. If None, landmarks are set as k-means centroids with k=n_landmarks. If the number of landmarks is greater than or equal to the number of training points, this parameter is ignored.
k (int, optional (default=10)) – The number of nearest neighbor distances to consider.
distances (array-like or None, optional) – The k nearest neighbor distances at each data point. If None, these distances are computed automatically using KDTree (if data dimensionality is < 20) or BallTree otherwise.
d (array-like, optional) – The estimated local intrinsic dimensionality of the data. This is only used to initialize the density estimation. If None, an empirical estimate is used.
mu_dim (float or None, optional (default=0)) – The mean of the Gaussian process for log intrinsic dimensionality \(\mu_D\).
mu_dens (float or None, optional) – The mean of the Gaussian process for log density \(\mu_\rho\). If None, mu_dens is set to the 1st percentile of \(\text{mle}(\text{nn_distances}, d) - 10\).
ls (float or None, optional) – The length scale for the Gaussian process covariance function. If None (default), the length scale is automatically selected based on a heuristic link between the nearest neighbor distances and the optimal length scale.
ls_factor (float, optional) – A scaling factor applied to the length scale when it’s automatically selected. It is used to manually adjust the automatically chosen length scale for finer control over the model’s sensitivity to variations in the data.
cov_func (mellon.Covariance or None) – The Gaussian process covariance function as an instance of mellon.Covariance. If None, the covariance function cov_func is automatically generated as cov_func_curry(ls). Defaults to None.
Lp (array-like or None) – A matrix such that \(L_p L_p^\top = \Sigma_p\), where \(\Sigma_p\) is the covariance matrix of the inducing points (all cells in non-sparse GP). Not used when Nyström rank reduction is employed. Defaults to None.
L (array-like or None, optional) – A matrix such that \(L L^\top \approx K\), where \(K\) is the covariance matrix. If None, L is computed automatically.
initial_value (array-like or None, optional) – The initial guess for optimization. If None, a \(z\) is found that minimizes \(\|Lz + \mu_\cdot - \text{mle}\| + \|z\|\), where \(\text{mle}\) is the maximum likelihood estimate for density initialization and the neighborhood-based local intrinsic dimensionality for dimensionality initialization.
predictor_with_uncertainty (bool) –
If set to True, computes the predictor instances .predict and .predict_density with its predictive uncertainty. The uncertainty comes from two sources:
1. .predict.mean_covariance:
  Uncertainty arising from the posterior distribution of the Bayesian inference. This component quantifies uncertainties inherent in the model’s parameters and structure. Available only if .pre_transformation_std is defined (e.g., using optimizer=”advi”), which reflects the standard deviation of the latent variables before transformation.
2. .predict.covariance:
  Uncertainty for out-of-bag states originating from the compressed function representation in the Gaussian Process. Specifically, this uncertainty corresponds to locations that are not inducing points of the Gaussian Process and represents the covariance of the conditional normal distribution.
jit (bool, optional (default=False)) – If True, use JAX’s just-in-time compilation for loss and its gradient during optimization.
check_rank (bool) – Whether to check if the landmarks allow sufficient complexity by checking the approximate rank of the covariance matrix. This only applies to the non-Nyström gp_types. If set to None, the rank check is only performed if n_landmarks >= n_samples/10. Defaults to None.
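The two interpretations of the rank parameter above (an int as an explicit rank, a float as a fraction of the eigenvalue sum) can be illustrated with a short sketch. The function name select_rank is hypothetical, and the choice of >= at the threshold is an assumption; the actual tie-breaking in mellon is not stated on this page.

```python
def select_rank(eigenvalues, rank):
    """Resolve `rank` to the number of leading eigenvalues to keep.

    int   -> use that many eigenvalues (capped at the number available).
    float -> smallest count whose eigenvalues reach `rank` times the total.
    """
    vals = sorted(eigenvalues, reverse=True)
    if isinstance(rank, int):
        return min(rank, len(vals))
    total = sum(vals)
    running = 0.0
    for count, value in enumerate(vals, start=1):
        running += value
        if running >= rank * total:  # assumed threshold convention
            return count
    return len(vals)


# Eigenvalues of a hypothetical landmark covariance matrix (they sum to 10).
eigs = [6.0, 3.0, 0.9, 0.1]
kept_for_95_percent = select_rank(eigs, 0.95)
kept_for_explicit_rank = select_rank(eigs, 2)
```

With a rapidly decaying spectrum, a fraction close to 1 typically keeps far fewer eigenvalues than there are landmarks, which is what makes the Nyström rank reduction cheaper than working with the full matrix.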

fit(x=None, build_predict=True)

Trains the model from start to finish. This includes preparing for inference, running inference, and processing the inference results.

Parameters:

x (array-like of shape (n_samples, n_features), default=None) – The training instances where n_samples is the number of samples and n_features is the number of features.
build_predict (bool, default=True) – Whether or not to construct the prediction function after training.

Returns:

A fitted instance of this estimator.

Return type:

self

fit_predict(x=None, build_predict=False)

Trains the model using the provided training data, and then makes predictions on the trained data points. This function performs Bayesian inference to compute the local dimensionality and returns the computed local dimensionality at each training point.

Parameters:

x (array-like of shape (n_samples, n_features), default=None) – The training instances to estimate the local dimensionality function, where n_samples is the number of samples and n_features is the number of features. Each sample is an array of features representing a point in the feature space.
build_predict (bool, default=False) – Whether or not to build the prediction function after training.

Returns:

The local dimensionality at each training point in x.

Return type:

array-like of shape (n_samples,)

Raises:

ValueError – If the argument x does not match self.x which was already set in a previous operation.

property predict

A property that returns an instance of the mellon.base_predictor.ExpPredictor class. This predictor can be used to predict the dimensionality for new data points by calling the instance like a function.

The predictor instance also supports serialization features, which allow for saving and loading the predictor’s state. For more details, refer to the mellon.base_predictor.ExpPredictor documentation.

Returns: A predictor instance that computes the dimensionality at each new data point.
Return type: mellon.base_predictor.ExpPredictor

Example

>>> local_dim = model.predict(Xnew)

property predict_density

A property that returns an instance of the mellon.Predictor class. This predictor can be used to predict the log density for new data points by calling the instance like a function.

The predictor instance also supports serialization features, which allow for saving and loading the predictor’s state. For more details, refer to the mellon.Predictor documentation.

Returns: A predictor instance that computes the log density at each new data point.
Return type: mellon.Predictor

Example

>>> log_density = model.predict_density(Xnew)

prepare_inference(x)

Set all attributes in preparation for optimization, but do not perform Bayesian inference. It is not necessary to call this function before calling fit.

Parameters: x (array-like) – The training instances used to estimate the density function.
Returns: loss_func, initial_value – The Bayesian loss function and the initial guess for optimization.
Return type: function, array-like

process_inference(pre_transformation=None, build_predict=True)

Use the optimized parameters to compute the local dimensionality at the training points. If build_predict, also build the prediction function.

Parameters:

pre_transformation (array-like) – The optimized parameters. If None, uses the stored pre_transformation attribute.
build_predict (bool) – Whether or not to build the prediction function. Defaults to True.

Returns:

local_dim_x – The local dimensionality at the training points.

Return type:

array-like

run_inference(loss_func=None, initial_value=None, optimizer=None)

Parameters:

loss_func (function) – The Bayesian loss function. If None, uses the stored loss_func attribute.
initial_value (array-like) – The initial guess for optimization. If None, uses the stored initial_value attribute.

Returns:

pre_transformation - The optimized parameters.

Return type:

array-like

set_x(x)

Sets the training instances (x) for the model and validates that they are formatted correctly.

Parameters: x (array-like of shape (n_samples, n_features)) – The training instances where n_samples is the number of samples and n_features is the number of features. Each sample is an array of features representing a point in the feature space.
Returns: The validated training instances.
Return type: array-like of shape (n_samples, n_features)
Raises: ValueError – If the input x is not valid. For instance, x may not be valid if it is not a numerical array-like object, or if its shape does not match the required shape (n_samples, n_features).

class mellon.model.FunctionEstimator(cov_func_curry=<class 'mellon.cov.Matern52'>, n_landmarks=None, gp_type=None, jitter=1e-06, optimizer='L-BFGS-B', n_iter=100, init_learn_rate=0.1, landmarks=None, nn_distances=None, mu=0, ls=None, ls_factor=1, cov_func=None, sigma=0, y_is_mean=False, predictor_with_uncertainty=False, jit=True)

Bases: BaseEstimator

This class implements a function estimator that uses a conditional normal distribution to smooth a function and extend it to all cell states using the Mellon abstractions.

Parameters:

cov_func_curry (function or type) – A curry that takes one length scale argument and returns a covariance function of the form k(x, y) \(\rightarrow\) float. Defaults to Matern52.
n_landmarks (int) – The number of landmark/inducing points. Only used if a sparse GP is indicated through gp_type. If 0 or equal to the number of training points, inducing points will not be computed or used. Defaults to 5000.
gp_type (str or GaussianProcessType) –
The type of sparsification used for the Gaussian process:
- ’full’ Non-sparse Gaussian process
- ’sparse_cholesky’ Sparse GP using landmarks/inducing points,
  typically employed to enable scalable GP models.
The value can be either a string matching one of the above options or an instance of the mellon.util.GaussianProcessType Enum. If a partial match is found with the Enum, a warning will be logged, and the closest match will be used. Defaults to ‘sparse_cholesky’.
jitter (float, optional) – A small amount added to the diagonal of the covariance matrix to ensure numerical stability. Defaults to 1e-6.
landmarks (array-like or None, optional) – Points used to quantize the data for the approximate covariance. If None, landmarks are set as k-means centroids with k=n_landmarks. This is ignored if n_landmarks is greater than or equal to the number of training points. Defaults to None.
nn_distances (array-like or None, optional) – The nearest neighbor distances at each data point. If None, computes the nearest neighbor distances automatically, with a KDTree if the dimensionality of the data is less than 20, or a BallTree otherwise. Defaults to None.
mu (float) – The mean of the Gaussian process \(\mu\). Defaults to 0.
ls (float or None, optional) – The length scale for the Gaussian process covariance function. If None (default), the length scale is automatically selected based on a heuristic link between the nearest neighbor distances and the optimal length scale.
ls_factor (float, optional) – A scaling factor applied to the length scale when it’s automatically selected. It is used to manually adjust the automatically chosen length scale for finer control over the model’s sensitivity to variations in the data.
cov_func (mellon.Covariance or None) – The Gaussian process covariance function as an instance of mellon.Covariance. If None, the covariance function cov_func is automatically generated as cov_func_curry(ls). Defaults to None.
sigma (float, optional) – The standard deviation of the white noise. Defaults to 0.
y_is_mean (bool) – Whether to consider y the GP mean or a noisy measurement subject to sigma or y_cov_factor. Has no effect if L is passed. Defaults to False.
predictor_with_uncertainty (bool) –
If set to True, computes the predictor instance .predict with its predictive uncertainty. The uncertainty comes from two sources:
1. .predict.mean_covariance:
  Uncertainty arising from the input noise sigma.
2. .predict.covariance:
  Uncertainty for out-of-bag states originating from the compressed function representation in the Gaussian Process. Specifically, this uncertainty corresponds to locations that are not inducing points of the Gaussian Process and represents the covariance of the conditional normal distribution.
jit (bool, optional) – Use JAX just-in-time compilation for the loss function and its gradient during optimization. Defaults to True.
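How mu, sigma, and jitter from the parameters above enter a conditional normal distribution can be illustrated with a small full (non-sparse) Gaussian process conditional mean on one-dimensional inputs. This is a sketch under stated assumptions, not mellon's implementation: the helper names are hypothetical, the Matérn 5/2 kernel is written out by hand, and the noise variance sigma squared is added to the diagonal as in standard GP regression.

```python
import math


def matern52(x, y, ls):
    """Matern 5/2 covariance between two scalars with length scale ls."""
    s = math.sqrt(5.0) * abs(x - y) / ls
    return (1.0 + s + s * s / 3.0) * math.exp(-s)


def cholesky(a):
    """Lower-triangular factor L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - acc)
            else:
                low[i][j] = (a[i][j] - acc) / low[j][j]
    return low


def cholesky_solve(low, b):
    """Solve (L L^T) w = b by forward then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (z[i] - sum(low[k][i] * w[k] for k in range(i + 1, n))) / low[i][i]
    return w


def conditional_mean(x_train, y_train, x_new, ls=1.0, mu=0.0, sigma=0.0, jitter=1e-6):
    """Posterior mean mu + K(x_new, x) (K(x, x) + (sigma^2 + jitter) I)^-1 (y - mu)."""
    cov = [
        [matern52(a, b, ls) + (sigma ** 2 + jitter if i == j else 0.0)
         for j, b in enumerate(x_train)]
        for i, a in enumerate(x_train)
    ]
    weights = cholesky_solve(cholesky(cov), [y - mu for y in y_train])
    return [
        mu + sum(matern52(xn, xt, ls) * w for xt, w in zip(x_train, weights))
        for xn in x_new
    ]


x_train = [0.0, 1.0, 2.0]
y_train = [1.0, 2.0, 0.5]
smoothed = conditional_mean(x_train, y_train, [0.5, 1.5], sigma=0.1)
```

With sigma=0 the conditional mean interpolates the training values up to the jitter, while a larger sigma pulls the result toward mu; far from all training points the prediction falls back to mu.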

compute_conditional(x=None, y=None)

Compute and return the conditional mean function.

Parameters:

x (array-like) – The training instances to estimate density function.
y (array-like) – The training function values on cell states.

Returns:

condition_mean_function - The conditional mean function.

Return type:

array-like

fit(x=None, y=None)

Trains the model using the provided training data and function values. This includes preparing the model for inference and computing the conditional distribution for the given data.

Parameters:

x (array-like of shape (n_samples, n_features), default=None) – The training instances where n_samples is the number of samples and n_features is the number of features. Each sample is an array of features representing a point in the feature space.
y (array-like of shape (n_samples, n_output_features), default=None) – The function values of the training instances. n_samples is the number of samples and n_output_features is the number of function values at each sample.

Raises:

ValueError – If the number of samples in x and y doesn’t match.

Returns:

self – This method returns self for chaining.

Return type:

object

fit_predict(x=None, y=None, Xnew=None)

Trains the model using the provided training data and function values, then makes predictions for new data points. The function computes the conditional mean and returns the smoothed function values at the points Xnew for each column of values in y.

Parameters:

x (array-like of shape (n_samples, n_features), default=None) – The training instances where n_samples is the number of samples and n_features is the number of features. Each sample is an array of features representing a point in the feature space.
y (array-like of shape (n_samples, n_output_features), default=None) – The function values of the training instances. n_samples is the number of samples and n_output_features is the number of function values at each sample.
Xnew (array-like of shape (n_predict_samples, n_features), default=None) – The new data points to make predictions on where n_predict_samples is the number of samples to predict and n_features is the number of features. If not provided, the predictions will be made on the training instances x.

Returns:

The conditional mean function value at each new data point in Xnew. The number of predicted function values at each sample will match the number of output features in y.

Return type:

array-like of shape (n_predict_samples, n_output_features)

Raises:

ValueError – If the number of samples in x and y don’t match, or if the number of features in x and Xnew don’t match.

property predict

A property that returns an instance of the mellon.Predictor class. This predictor can be used to predict the function values for new data points by calling the instance like a function.

The predictor instance also supports serialization features, allowing for saving and loading the predictor’s state. For more details, refer to the mellon.Predictor documentation.

Returns: A predictor instance that computes the conditional mean function value at each new data point.
Return type: mellon.Predictor

Example

>>> y_pred = model.predict(Xnew)

prepare_inference(x)

Set all attributes in preparation. It is not necessary to call this function before calling fit.

Parameters:

x (array-like) – The cell states.
y (array-like) – The function values on the cell states.

Returns:

loss_func, initial_value - The Bayesian loss function and initial guess for optimization.

Return type:

function, array-like

set_x(x)

Sets the training instances (x) for the model and validates that they are formatted correctly.

Parameters: x (array-like of shape (n_samples, n_features)) – The training instances where n_samples is the number of samples and n_features is the number of features. Each sample is an array of features representing a point in the feature space.
Returns: The validated training instances.
Return type: array-like of shape (n_samples, n_features)
Raises: ValueError – If the input x is not valid. For instance, x may not be valid if it is not a numerical array-like object, or if its shape does not match the required shape (n_samples, n_features).

class mellon.model.TimeSensitiveDensityEstimator(cov_func_curry=<class 'mellon.cov.Matern52'>, n_landmarks=None, rank=None, gp_type=None, d_method='embedding', jitter=1e-06, optimizer='L-BFGS-B', n_iter=100, init_learn_rate=0.1, landmarks=None, nn_distances=None, normalize_per_time_point=False, d=None, mu=None, ls=None, ls_time=None, ls_factor=1, ls_time_factor=1, density_estimator_kwargs={}, cov_func=None, Lp=None, L=None, initial_value=None, predictor_with_uncertainty=False, _save_intermediate_ls_times=False, jit=False, check_rank=None)

Bases: BaseEstimator

A class for non-parametric density estimation with time sensitivity. It performs Bayesian inference with a Gaussian process prior and Nearest Neighbor likelihood. All intermediate computations are cached as instance variables, which allows viewing intermediate results and saving computation time by passing precomputed values as arguments to a new model.

Parameters:

cov_func_curry (function or type) – The generator of the Gaussian process covariance function. This must be a curry that takes one length scale argument and returns a covariance function of the form k(x, y) \(\rightarrow\) float. Defaults to Matern52.
n_landmarks (int) – The number of landmark/inducing points. Only used if a sparse GP is indicated through gp_type. If 0 or equal to the number of training points, inducing points will not be computed or used. Defaults to 5000.
rank (int or float) – The rank of the approximate covariance matrix for the Nyström rank reduction. If rank is an int, an \(n \times\) rank matrix \(L\) is computed such that \(L L^\top \approx K\), where K is the exact \(n \times n\) covariance matrix. If rank is a float 0.0 \(\le\) rank \(\le\) 1.0, the rank/size of \(L\) is selected such that the included eigenvalues of the covariance between landmark points account for the specified percentage of the sum of eigenvalues. It is ignored if gp_type does not indicate a Nyström rank reduction. Defaults to 0.99.
gp_type (str or GaussianProcessType) –
The type of sparsification used for the Gaussian Process:
- ’full’ Non-sparse Gaussian Process
- ’full_nystroem’ Sparse GP with Nyström rank reduction without landmarks,
  which lowers the computational complexity.
- ’sparse_cholesky’ Sparse GP using landmarks/inducing points,
  typically employed to enable scalable GP models.
- ’sparse_nystroem’ Sparse GP using landmarks or inducing points,
  along with an improved Nyström rank reduction method.
The value can be either a string matching one of the above options or an instance of the mellon.util.GaussianProcessType Enum. If a partial match is found with the Enum, a warning will be logged, and the closest match will be used. Defaults to ‘sparse_cholesky’.
d_method (str) –
The method to compute the intrinsic dimensionality of the data. Implemented options are
- ’embedding’: uses the embedding dimension x.shape[1]
- ’fractal’: uses the average fractal dimension (experimental)
Defaults to ‘embedding’.
jitter (float) – A small amount added to the diagonal of the covariance matrix to bound its eigenvalues away from 0, ensuring numerical stability. Defaults to 1e-6.
optimizer (str) – The optimizer for the maximum a posteriori or posterior density estimation. Options are ‘L-BFGS-B’, stochastic optimizer ‘adam’, or automatic differentiation variational inference ‘advi’. Defaults to ‘L-BFGS-B’.
n_iter (int) – The number of optimization iterations. Defaults to 100.
init_learn_rate (float) – The initial learning rate. Defaults to 0.1.
landmarks (array-like or None) – The points used to quantize the data for the approximate covariance. If None, landmarks are set as k-means centroids with k=n_landmarks. This is ignored if n_landmarks is greater than or equal to the number of training points. Defaults to None.
nn_distances (array-like or None) – The nearest neighbor distances at each data point within each time point. If None, the nearest neighbor distances are computed automatically, using a KDTree if the dimensionality of the data is less than 20, or a BallTree otherwise. Defaults to None.
normalize_per_time_point (bool, list, array-like, or dict, optional) –
Controls the normalization for varying cell counts across time points to adjust for sampling bias by modifying the nearest neighbor distances before inference.
- If True, normalizes to simulate a constant total cell count divided by the number of time points.
- If False, the raw cell counts per time point are reflected in the density estimation.
- If a list or array-like, assumes total cell counts for time points, ordered from earliest to latest.
- If a dict, maps each time point to its total cell count. Must cover all unique time points.
Note: Relative cell counts are sufficient for comparison within a dataset; exact numbers are not required.

Note: Ignored if nn_distances is provided; distances are used as-is and this parameter has no effect.

Default is False.

d (int, array-like or None) – The intrinsic dimensionality of the data, i.e., the dimensionality of the embedded manifold. If None, d is set to the size of axis 1 of the training data points. Defaults to None.

mu (float or None) – The mean \(\mu\) of the Gaussian process. If None, sets \(\mu\) to the 1st percentile of \(\text{mle}(\text{nn_distances}, d) - 10\), where \(\text{mle} = \log(\text{gamma}(d/2 + 1)) - (d/2) \cdot \log(\pi) - d \cdot \log(\text{nn_distances})\). Defaults to None.
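
The default for mu described above can be illustrated with a short NumPy sketch. It is not the library's implementation; the helper names mle_log_density and default_mu are hypothetical and only follow the formula stated in the text. Because subtracting a constant shifts every percentile by the same amount, taking the 1st percentile before or after subtracting 10 gives the same value.

```python
import math
import numpy as np

def mle_log_density(nn_distances, d):
    """Hypothetical helper: per-point log-density estimate from the formula above."""
    nn_distances = np.asarray(nn_distances, dtype=float)
    # log(gamma(d/2 + 1)) - (d/2) * log(pi) - d * log(nn_distances)
    return (
        math.lgamma(d / 2 + 1)
        - (d / 2) * math.log(math.pi)
        - d * np.log(nn_distances)
    )

def default_mu(nn_distances, d):
    """Hypothetical helper: 1st percentile of mle - 10."""
    return float(np.percentile(mle_log_density(nn_distances, d) - 10, 1))

# Toy nearest-neighbor distances in a 2-dimensional space (illustrative values).
nn_distances = np.array([0.5, 1.0, 2.0])
mle = mle_log_density(nn_distances, d=2)
mu = default_mu(nn_distances, d=2)
```

Larger nearest-neighbor distances give lower log-density estimates, and the resulting mu sits near the bottom of the shifted estimates.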

ls (float or None, optional) – The length scale for the Gaussian process covariance function. If None (default), the length scale is automatically selected based on a heuristic link between the nearest neighbor distances and the optimal length scale.

ls_time (float or None) – The length scale of the Gaussian process covariance function for the time dimension. If None, ls_time is set to the length scale whose induced covariance between the time points (using cov_func_curry) best mimics the Pearson correlation observed between the densities of the individual time points. If cov_func is supplied explicitly, ls_time has no effect. Defaults to None.

ls_factor (float, optional) – A scaling factor applied to the length scale when it’s automatically selected. It is used to manually adjust the automatically chosen length scale. Defaults to 1.

ls_time_factor (float, optional) – A scaling factor applied to the time length scale (ls_time) when it’s automatically selected. This allows for manual adjustment of the automatically determined time length scale. Defaults to 1.

density_estimator_kwargs (dict, optional) – A dictionary of keyword arguments to be passed for timepoint-specific density estimation during the automatic selection of the time length scale ls_time. This parameter allows custom configuration for the density estimation process. Note that this parameter has no effect if ls_time is specified explicitly. Default is an empty dictionary ({}).

cov_func (mellon.Covariance or None) – The Gaussian process covariance function as an instance of mellon.Covariance. If None, the covariance function cov_func is automatically generated as cov_func_curry(ls, active_dims=slice(None, -1)) * cov_func_curry(ls_time, active_dims=-1). Defaults to None.
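
The product structure of the default covariance can be sketched in plain NumPy: one kernel acts on all columns except the last (the state features) and a second kernel acts on the last column (time). The functions matern52 and time_sensitive_cov below are hypothetical stand-ins written for illustration, not mellon classes; only the Matérn 5/2 formula and the column split are taken as given.

```python
import numpy as np

def matern52(ls):
    """Hypothetical curry: takes a length scale, returns k(x, y) -> covariance matrix."""
    def k(x, y):
        # Pairwise Euclidean distances, scaled by the length scale.
        diff = x[:, None, :] - y[None, :, :]
        r = np.sqrt(np.sum(diff**2, axis=-1)) / ls
        s = np.sqrt(5.0) * r
        # Matern 5/2: (1 + sqrt(5) r + 5 r^2 / 3) * exp(-sqrt(5) r)
        return (1.0 + s + s**2 / 3.0) * np.exp(-s)
    return k

def time_sensitive_cov(ls, ls_time):
    """Product of a state kernel and a time kernel (illustrative only)."""
    k_state = matern52(ls)
    k_time = matern52(ls_time)
    def k(x, y):
        # Columns [:-1] are state features; column -1 is the time point.
        return k_state(x[:, :-1], y[:, :-1]) * k_time(x[:, -1:], y[:, -1:])
    return k

# Three points: two share a time point, two share a state.
x = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 2.0]])
K = time_sensitive_cov(ls=1.0, ls_time=1.0)(x, x)
```

With a product kernel, two cells are strongly correlated only if they are close both in state space and in time.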

Lp (array-like or None) – A matrix such that \(L_p L_p^\top = \Sigma_p\), where \(\Sigma_p\) is the covariance matrix of the inducing points (all cells in non-sparse GP). Not used when Nyström rank reduction is employed. Defaults to None.

L (array-like or None) – A matrix such that \(L L^\top \approx K\), where \(K\) is the covariance matrix. If None, L is computed automatically. Defaults to None.
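
As a rough illustration of a factor satisfying \(L L^\top \approx K\), the sketch below adds a small jitter to the diagonal and takes a Cholesky factor. This corresponds to the simplest non-sparse case only; the function name factor_covariance and the squared-exponential toy kernel are assumptions made for the example, and the library may compute L differently depending on gp_type.

```python
import numpy as np

def factor_covariance(K, jitter=1e-6):
    """Hypothetical helper: lower-triangular L with L @ L.T = K + jitter * I."""
    # The jitter keeps the eigenvalues bounded away from 0 so the
    # Cholesky factorization is numerically stable.
    K_stable = K + jitter * np.eye(K.shape[0])
    return np.linalg.cholesky(K_stable)

# Toy covariance from a squared-exponential kernel (illustration only).
x = np.linspace(0.0, 1.0, 5)[:, None]
K = np.exp(-0.5 * ((x - x.T) / 0.3) ** 2)
L = factor_covariance(K)
```

The product L @ L.T differs from K only by the jitter on the diagonal.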

initial_value (array-like or None) – The initial guess for optimization. If None, the value \(z\) that minimizes \(\|Lz + \mu - \text{mle}\| + \|z\|\) is found, where \(\text{mle} = \log(\text{gamma}(d/2 + 1)) - (d/2) \cdot \log(\pi) - d \cdot \log(\text{nn_distances})\) and \(d\) is the intrinsic dimensionality of the data. Defaults to None.
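
One way to picture this default is as a regularized least-squares fit of the latent vector \(z\) to the nearest-neighbor estimates. The sketch below swaps in squared norms, which turns the problem into ordinary ridge regression with a closed-form solution; that substitution is an assumption made for illustration and is not claimed to match the library's computation. The name ridge_initial_value and the random toy inputs are hypothetical.

```python
import numpy as np

def ridge_initial_value(L, mu, mle):
    """Hypothetical helper: argmin_z ||L z + mu - mle||^2 + ||z||^2."""
    n, r = L.shape
    # Stack the data-fit rows on top of the penalty rows and solve
    # a single ordinary least-squares problem.
    A = np.vstack([L, np.eye(r)])
    b = np.concatenate([mle - mu, np.zeros(r)])
    z, *_ = np.linalg.lstsq(A, b, rcond=None)
    return z

# Random toy inputs (illustrative values only).
rng = np.random.default_rng(0)
L = rng.normal(size=(6, 3))
mle = rng.normal(size=6)
mu = -10.0
z0 = ridge_initial_value(L, mu, mle)
```

The penalty term keeps the initial latent vector small, so the optimizer starts from a point that matches the data without overfitting it.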

predictor_with_uncertainty (bool) – If set to True, computes the predictor instance .predict with its predictive uncertainty. The uncertainty comes from two sources:

.predict.mean_covariance:
Uncertainty arising from the posterior distribution of the Bayesian inference. This component quantifies uncertainties inherent in the model’s parameters and structure. Available only if .pre_transformation_std is defined (e.g., using optimizer='advi'), which reflects the standard deviation of the latent variables before transformation.
.predict.covariance:
Uncertainty for out-of-bag states originating from the compressed function representation in the Gaussian Process. Specifically, this uncertainty corresponds to locations that are not inducing points of the Gaussian Process and represents the covariance of the conditional normal distribution.

_save_intermediate_ls_times (bool) – Determines whether the intermediate results obtained during the computation of ls_time are retained for debugging. When set to True, the results will be stored in self.densities, self.predictors, and self.numeric_stages. Defaults to False.

jit (bool) – Use jax just-in-time compilation for loss and its gradient during optimization. Defaults to False.

check_rank (bool) – Whether to check if landmarks allow sufficient complexity by checking the approximate rank of the covariance matrix. This only applies to the non-Nyström gp_types. If set to None, the rank check is only performed if n_landmarks >= n_samples/10. Defaults to None.
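
The fractional form of the rank parameter described above selects how many eigenvalues to keep so that they account for a given share of the eigenvalue sum. A minimal sketch of that rule follows; select_rank is a hypothetical helper and the library's own selection logic may differ in details such as tie handling.

```python
import numpy as np

def select_rank(eigenvalues, rank=0.99):
    """Hypothetical helper: number of leading eigenvalues to keep."""
    ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    if isinstance(rank, int):
        # An integer rank is used directly, capped at the matrix size.
        return min(rank, ev.size)
    # Cumulative share of the eigenvalue sum covered by the leading eigenvalues.
    frac = np.cumsum(ev) / np.sum(ev)
    # First position where the share reaches the requested fraction.
    n_keep = int(np.searchsorted(frac, rank)) + 1
    return min(n_keep, ev.size)

eigenvalues = [5.0, 3.0, 1.0, 1.0]
rank_85 = select_rank(eigenvalues, 0.85)  # cumulative shares: 0.5, 0.8, 0.9, 1.0
rank_int = select_rank(eigenvalues, 2)
```

A fraction close to 1.0 keeps nearly all eigenvalues, while a smaller fraction yields a more aggressive rank reduction.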

fit(x=None, times=None, build_predict=True)

Fit the model from end to end.

Parameters:

x (array-like, optional) – The training instances used to estimate the density function. If ‘x’ is not provided and ‘self.x’ is also None, a ValueError is raised.
times (array-like, optional) – An array encoding the time points associated with each cell/row in ‘x’. Shape must be either (n_samples,) or (n_samples, 1).
build_predict (bool, optional) – Whether or not to build the prediction function. Defaults to True.

Returns:

self – A fitted instance of this estimator.

Return type:

object

Raises:

ValueError – If both ‘x’ and ‘self.x’ are None or if ‘x’ is provided and not equal to ‘self.x’.

fit_predict(x=None, times=None, build_predict=False)

Perform Bayesian inference and return the log density at training points.

Parameters: x (array-like) – The training instances used to estimate the density function.
Returns: log_density_x - The log density at each training point in x.

property predict

An instance of mellon.base_predictor.PredictorTime that predicts the log density at each point in x and time point time.

The instance contains a __call__ method which can be used to predict the log density. It also supports serialization, which allows saving and loading the predictor state. Refer to the mellon.base_predictor.PredictorTime documentation for more details.

Returns: A predictor instance that computes the log density at each new data point.
Return type: mellon.base_predictor.PredictorTime

Example

>>> log_density = model.predict(Xnew)

prepare_inference(x, times=None)

Prepares for optimization without performing Bayesian inference. This method sets all attributes required for optimization. It is not required to call this method manually before calling fit.

Parameters:

x (array-like) – The training instances for which the density function will be estimated. If ‘times’ is None, the last column of ‘x’ is interpreted as the times. Shape must be (n_samples, n_features).
times (array-like, optional) – An array encoding the time points associated with each cell/row in ‘x’. If provided, it overrides the last column of ‘x’ as the times. Shape must be either (n_samples,) or (n_samples, 1).

Returns:

loss_func (function) – The Bayesian loss function that will be minimized during optimization.
initial_value (array-like) – The initial guess for the optimization process.
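
The handling of x and times described above can be sketched as follows. The helper attach_times is hypothetical, and it assumes that when times is passed separately, x holds only the state features and the times are appended as the last column; the text does not spell out this detail, so treat it as one possible reading.

```python
import numpy as np

def attach_times(x, times=None):
    """Hypothetical helper: return an array whose last column holds the time points."""
    x = np.asarray(x, dtype=float)
    if times is None:
        # The last column of x is already interpreted as the time.
        return x
    times = np.asarray(times, dtype=float)
    # Accept shape (n_samples, 1) by flattening it to (n_samples,).
    if times.ndim == 2 and times.shape[1] == 1:
        times = times[:, 0]
    if times.ndim != 1 or times.shape[0] != x.shape[0]:
        raise ValueError("times must have shape (n_samples,) or (n_samples, 1)")
    # Assumption: x contains state features only, so times become a new column.
    return np.column_stack([x, times])

states = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
with_flat = attach_times(states, [0.0, 1.0, 2.0])
with_column = attach_times(states, [[0.0], [1.0], [2.0]])
```

Both accepted shapes of times lead to the same combined array, with the time point in the last column.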

process_inference(pre_transformation=None, build_predict=True)

Use the optimized parameters to compute the log density at the training points. If build_predict is True, also build the prediction function.

Parameters:

pre_transformation (array-like) – The optimized parameters. If None, uses the stored pre_transformation attribute.
build_predict (bool) – Whether or not to build the prediction function. Defaults to True.

Returns:

log_density_x - The log density at the training points.

Return type:

array-like

run_inference(loss_func=None, initial_value=None, optimizer=None)

Parameters:

loss_func (function) – The Bayesian loss function. If None, uses the stored loss_func attribute.
initial_value (array-like) – The initial guess for optimization. If None, uses the stored initial_value attribute.

Returns:

pre_transformation - The optimized parameters.

Return type:

array-like

set_x(x)

Sets the training instances (x) for the model and validates that they are formatted correctly.

Parameters: x (array-like of shape (n_samples, n_features)) – The training instances where n_samples is the number of samples and n_features is the number of features. Each sample is an array of features representing a point in the feature space.
Returns: The validated training instances.
Return type: array-like of shape (n_samples, n_features)
Raises: ValueError – If the input x is not valid. For instance, x may not be valid if it is not a numerical array-like object, or if its shape does not match the required shape (n_samples, n_features).