leaspy.models.obs_models¶

Attributes¶

`OBSERVATION_MODELS`
`ObservationModelFactoryInput`

Classes¶

`ObservationModel`	Base class for valid observation models that may be used in probabilistic models (stateless).
`BernoulliObservationModel`	Observation model for binary outcomes using a Bernoulli distribution.
`ObservationModelNames`	Enumeration defining the possible names for observation models.
`FullGaussianObservationModel`	Specialized GaussianObservationModel when all data share the same observation model, with default naming.
`GaussianObservationModel`	Specialized ObservationModel for noisy observations with Gaussian residuals assumption.
`AbstractWeibullRightCensoredObservationModel`	Base class for valid observation models that may be used in probabilistic models (stateless).
`WeibullRightCensoredObservationModel`	Base class for valid observation models that may be used in probabilistic models (stateless).
`WeibullRightCensoredWithSourcesObservationModel`	Base class for valid observation models that may be used in probabilistic models (stateless).

Functions¶

observation_model_factory(model, **kwargs)

Factory for observation models.

Package Contents¶

class ObservationModel[source]¶

Base class for valid observation models that may be used in probabilistic models (stateless).

In particular, it provides data & linked variables regarding observations and their attachment to the model (the negative log-likelihood - nll - to be minimized).

Parameters:

name (str): The name of the observed variable (used to name the data variable and the attachment term related to this observation).
getter (function Dataset -> WeightedTensor): The way to retrieve the observed values from the Dataset (as a WeightedTensor), e.g. all values, a subset of values (only the x, y, z features), one-hot encoded features, …
dist (SymbolicDistribution): The symbolic distribution, parametrized by model variables, for the observed values (used to compute the attachment).
extra_vars (None (default) or Mapping[VarName, VariableInterface]): New variables needed to fully define the symbolic distribution or the sufficient statistics (e.g. “noise_std” and “y_L2_per_ft” for a Gaussian model).

name: VariableName¶

getter: Callable[[Dataset], WeightedTensor]¶

dist: SymbolicDistribution¶

extra_vars: Mapping[VariableName, VariableInterface] | None = None¶

get_nll_attach_var_name(named_attach_vars=True)[source]¶

Return the name of the negative log-likelihood attachment variable.

Parameters: named_attach_vars (bool)
Return type: str

get_variables_specs(named_attach_vars=True)[source]¶

Automatic specifications of variables for this observation model.

Parameters:

named_attach_vars (bool, optional)

Returns:

dict[VariableName, VariableInterface]

A dictionary mapping variable names to their corresponding specifications, with:

- the primary DataVariable
- any extra_vars defined by the model
- the nll attachment variables:
  - nll_attach_var_ind: a LinkedVariable representing the individual-level negative log-likelihood contributions
  - nll_attach_var: a LinkedVariable that sums the individual contributions

Parameters:

named_attach_vars (bool)

Return type:

dict[VariableName, VariableInterface]

Notes

The distribution object self.dist should provide a get_func_nll(name) method that returns a callable for computing the nll.
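
The relationship between the two attachment variables can be illustrated with a minimal, library-free sketch. The function names and the toy per-observation nll below are hypothetical and do not reproduce leaspy's LinkedVariable machinery; the sketch only shows that the population-level term is the sum of the individual-level terms.

```python
from typing import Callable, List, Sequence


def nll_attach_ind(
    observations: Sequence[Sequence[float]],
    nll_point: Callable[[float], float],
) -> List[float]:
    """One negative log-likelihood total per individual (one row each)."""
    return [sum(nll_point(v) for v in row) for row in observations]


def nll_attach(ind_terms: Sequence[float]) -> float:
    """Population-level term: the sum of the individual contributions."""
    return sum(ind_terms)


def toy_nll(v: float) -> float:
    """Hypothetical per-observation nll: half the squared value."""
    return 0.5 * v * v


# Three individuals with 2, 1 and 1 observations respectively.
per_individual = nll_attach_ind([[1.0, 2.0], [0.0], [3.0]], toy_nll)
total = nll_attach(per_individual)
```

Keeping the individual-level terms separate is what allows per-individual quantities to be inspected or updated without recomputing the full sum.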

serialized()[source]¶

Returns a JSON-exportable representation of the instance, excluding its name.

Returns:

Any: A representation of the instance, currently based on repr(self.dist), that is intended to be JSON-serializable.

Return type:

Any

to_dict()[source]¶

To be implemented…

Return type: dict

to_string()[source]¶

Returns a string representation of the parameter for saving

Returns:

str: A string representation of the parameter, as stored in self.string_for_json.

Return type:

str

class BernoulliObservationModel(**extra_vars)[source]¶

Bases: leaspy.models.obs_models._base.ObservationModel

Observation model for binary outcomes using a Bernoulli distribution.

This model expects binary-valued observations and uses a Bernoulli distribution to define the likelihood. It assumes the response variable is named “y”.

Parameters:

**extra_vars (VariableInterface): Optional extra variables required by the model. These are passed to the parent ObservationModel class and can be used for conditioning the likelihood.

Attributes:

string_for_json (str): A static string identifier used for serialization.

Parameters:

extra_vars (VariableInterface)

string_for_json = 'bernoulli'¶

static y_getter(dataset)[source]¶

Extracts and validates the observation values and associated mask from a dataset.

Parameters:

dataset (Dataset): A dataset object containing values and mask attributes.

Returns:

WeightedTensor: A tensor containing the observed binary values along with a boolean mask indicating which entries are valid.

Raises:

ValueError: If either dataset.values or dataset.mask is None, indicating that the dataset is improperly initialized.

Parameters:

dataset (Dataset)

Return type:

WeightedTensor
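
As an illustration of how a validated values/mask pair feeds a Bernoulli likelihood, here is a minimal sketch in plain Python. The helpers toy_y_getter and masked_bernoulli_nll are hypothetical stand-ins operating on lists; they are not the leaspy implementation, which works on Dataset and WeightedTensor objects.

```python
import math
from typing import List, Optional, Sequence, Tuple


def toy_y_getter(
    values: Optional[Sequence[int]],
    mask: Optional[Sequence[bool]],
) -> Tuple[List[int], List[bool]]:
    """Return (values, mask), refusing inputs that were never initialized."""
    if values is None or mask is None:
        raise ValueError("Both values and mask must be set.")
    return list(values), list(mask)


def masked_bernoulli_nll(
    values: Sequence[int],
    mask: Sequence[bool],
    probs: Sequence[float],
) -> float:
    """Sum of -log p(y) over the entries whose mask is True."""
    total = 0.0
    for y, keep, p in zip(values, mask, probs):
        if keep:
            total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


# The third observation is masked out, so its probability is ignored.
y, m = toy_y_getter([1, 0, 1], [True, True, False])
nll = masked_bernoulli_nll(y, m, [0.5, 0.5, 0.9])
```

The mask acts as a 0/1 weight: masked entries contribute nothing to the likelihood, whatever value is stored for them.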

OBSERVATION_MODELS: Dict[ObservationModelNames, Type[leaspy.models.obs_models._base.ObservationModel]]¶

ObservationModelFactoryInput¶

class ObservationModelNames(*args, **kwds)[source]¶

Bases: enum.Enum

Enumeration defining the possible names for observation models.

GAUSSIAN_DIAGONAL = 'gaussian-diagonal'¶

GAUSSIAN_SCALAR = 'gaussian-scalar'¶

BERNOULLI = 'bernoulli'¶

WEIBULL_RIGHT_CENSORED = 'weibull-right-censored'¶

WEIBULL_RIGHT_CENSORED_WITH_SOURCES = 'weibull-right-censored-with-sources'¶

classmethod from_string(model_name)[source]¶

Parameters: model_name (str)
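
The lookup performed by a from_string-style constructor can be sketched with the standard enum module. ToyObservationModelNames is a hypothetical stand-in that copies the member values listed above; the actual from_string may normalize its input or raise a different exception type.

```python
from enum import Enum


class ToyObservationModelNames(Enum):
    """Stand-in enum mirroring the documented values (not the leaspy class)."""

    GAUSSIAN_DIAGONAL = "gaussian-diagonal"
    GAUSSIAN_SCALAR = "gaussian-scalar"
    BERNOULLI = "bernoulli"
    WEIBULL_RIGHT_CENSORED = "weibull-right-censored"
    WEIBULL_RIGHT_CENSORED_WITH_SOURCES = "weibull-right-censored-with-sources"

    @classmethod
    def from_string(cls, model_name: str) -> "ToyObservationModelNames":
        """Look a member up by value, listing the valid names on failure."""
        try:
            return cls(model_name)
        except ValueError:
            valid = ", ".join(member.value for member in cls)
            raise ValueError(
                f"Unknown observation model {model_name!r}; expected one of: {valid}"
            ) from None


selected = ToyObservationModelNames.from_string("gaussian-scalar")
```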

observation_model_factory(model, **kwargs)[source]¶

Factory for observation models.

Parameters:

model (str or ObservationModel or dict[str, …])

- If an instance of a subclass of ObservationModel, returns the instance.
- If a string, returns a new instance of the appropriate class (with the optional parameters kwargs).
- If a dictionary, it must contain the ‘name’ key and other initialization parameters.

**kwargs

Optional parameters for initializing the requested observation model when a string.

Returns:

ObservationModel: The desired observation model.

Raises:

LeaspyModelInputError: If model is not supported.

Parameters:

model (ObservationModelFactoryInput)

Return type:

leaspy.models.obs_models._base.ObservationModel
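
The three accepted input forms can be illustrated with a small dispatch sketch. Everything below (the Toy classes, TOY_MODELS and toy_factory) is hypothetical and self-contained; ValueError stands in for LeaspyModelInputError, and merging dictionary entries with kwargs is an assumption rather than documented behavior.

```python
from typing import Any, Dict, Type, Union


class ToyObservationModel:
    """Hypothetical stand-in for an observation model base class."""

    def __init__(self, **params: Any) -> None:
        self.params = params


class ToyBernoulli(ToyObservationModel):
    pass


class ToyGaussianScalar(ToyObservationModel):
    pass


# Hypothetical registry, playing the role of OBSERVATION_MODELS.
TOY_MODELS: Dict[str, Type[ToyObservationModel]] = {
    "bernoulli": ToyBernoulli,
    "gaussian-scalar": ToyGaussianScalar,
}


def toy_factory(
    model: Union[str, ToyObservationModel, Dict[str, Any]], **kwargs: Any
) -> ToyObservationModel:
    # 1. An existing instance is returned unchanged.
    if isinstance(model, ToyObservationModel):
        return model
    # 2. A dictionary must carry a 'name' key; other entries are init parameters.
    if isinstance(model, dict):
        if "name" not in model:
            raise ValueError("Dictionary input must contain a 'name' key.")
        params = {k: v for k, v in model.items() if k != "name"}
        return toy_factory(model["name"], **{**params, **kwargs})
    # 3. A string selects the class; kwargs initialize it.
    if isinstance(model, str):
        if model not in TOY_MODELS:
            raise ValueError(f"Unsupported observation model: {model!r}")
        return TOY_MODELS[model](**kwargs)
    raise ValueError(f"Unsupported input type: {type(model).__name__}")


from_name = toy_factory("gaussian-scalar", dimension=1)
from_dict = toy_factory({"name": "bernoulli", "threshold": 0.5})
```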

class FullGaussianObservationModel(noise_std, **extra_vars)[source]¶

Bases: GaussianObservationModel

Specialized GaussianObservationModel when all data share the same observation model, with default naming.

The default naming is:

‘y’ for observations
‘model’ for model predictions
‘noise_std’ for scale of residuals

We also provide a convenient default factory for the most common case, in which noise_std is directly a ModelParameter (it could also be a PopulationLatentVariable with positive support). Whether the scale of residuals is scalar or diagonal depends on the dimension argument of that factory method.

Parameters:

noise_std (VariableInterface)
extra_vars (VariableInterface)

tol_noise_variance = 1e-05¶

static y_getter(dataset)[source]¶

Extracts the observation values and mask from a dataset.

Parameters:

dataset (Dataset): A dataset object containing ‘values’ and ‘mask’ attributes.

Returns:

WeightedTensor: A tensor containing the observed values and a boolean mask used as weights for likelihood and loss computations.

Raises:

AssertionError: If either dataset.values or dataset.mask is None.

Parameters:

dataset (Dataset)

Return type:

WeightedTensor

classmethod noise_std_suff_stats()[source]¶

Dictionary of sufficient statistics needed for noise_std (when directly a model parameter).

Returns:

dict[VariableName, LinkedVariable]

A dictionary containing the sufficient statistics:

“y_x_model”: Product of the observed values (“y”) and the model predictions (“model”).
“model_x_model”: Squared values of the model predictions (“model”).

Return type:

dict[VariableName, LinkedVariable]

classmethod scalar_noise_std_update(*, state, y_x_model, model_x_model)[source]¶

Update rule for scalar noise_std (when directly a model parameter), from state & sufficient statistics.

Computes a common noise_std for all the features

Parameters:

state (State): A state dictionary containing precomputed values.
y_x_model (WeightedTensor[float]): The weighted inner product between the observations and the model predictions.
model_x_model (WeightedTensor[float]): The weighted inner product of the model predictions with themselves.

Returns:

torch.Tensor: The updated scalar value of the noise_std.

Parameters:

state (State)
y_x_model (WeightedTensor[float])
model_x_model (WeightedTensor[float])

Return type:

Tensor
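
A noise standard deviation can be recovered from these two sufficient statistics because the sum of squared residuals expands as Σ(y − model)² = Σy² − 2·Σ(y·model) + Σmodel². The sketch below works on plain floats; it assumes that Σy² and the number of observations are among the precomputed values available elsewhere (the parameter names y_l2 and n_obs are hypothetical), and it is not the leaspy update rule itself.

```python
import math


def scalar_noise_std(
    y_l2: float, y_x_model: float, model_x_model: float, n_obs: int
) -> float:
    """Root mean squared residual rebuilt from sufficient statistics.

    y_l2          : sum of y**2 (assumed to be precomputed)
    y_x_model     : sum of y * model
    model_x_model : sum of model**2
    n_obs         : number of observations (assumed to be precomputed)
    """
    sq_residuals = y_l2 - 2.0 * y_x_model + model_x_model
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(sq_residuals, 0.0) / n_obs)


y = [1.0, 2.0, 4.0]
model = [1.0, 1.0, 2.0]

noise_std = scalar_noise_std(
    y_l2=sum(a * a for a in y),
    y_x_model=sum(a * b for a, b in zip(y, model)),
    model_x_model=sum(b * b for b in model),
    n_obs=len(y),
)

# Direct computation from the residuals, for comparison.
direct = math.sqrt(sum((a - b) ** 2 for a, b in zip(y, model)) / len(y))
```

Working from sufficient statistics means that only the two model-dependent sums need refreshing when the predictions change.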

classmethod diagonal_noise_std_update(*, state, y_x_model, model_x_model)[source]¶

Update rule for feature-wise noise_std (when directly a model parameter), from state & sufficient statistics.

Computes one noise_std per feature.

Parameters:

state (State): A state dictionary containing precomputed values.
y_x_model (WeightedTensor[float]): The weighted inner product between the observations and the model predictions.
model_x_model (WeightedTensor[float]): The weighted inner product of the model predictions with themselves.

Returns:

torch.Tensor: The updated value of the noise_std for each feature.

Parameters:

state (State)
y_x_model (WeightedTensor[float])
model_x_model (WeightedTensor[float])

Return type:

Tensor

classmethod noise_std_specs(dimension)[source]¶

Default specifications of noise_std variable when directly modelled as a parameter (no latent population variable).

Parameters:

dimension (int): The dimension of the noise_std.
- If dimension == 1, a scalar noise_std is assumed.
- If dimension > 1, feature-wise independent noise_std values are assumed (diagonal noise).

Returns:

ModelParameter: The specification of the noise_std, including:
- shape: tuple defining the parameter shape.
- suff_stats: collected sufficient statistics needed for updates.
- update_rule: method to update the parameter based on statistics.

Parameters:

dimension (int)

Return type:

ModelParameter

classmethod with_noise_std_as_model_parameter(dimension)[source]¶

Default instance of FullGaussianObservationModel with noise_std (scalar or diagonal depending on dimension) being a ModelParameter.

Parameters:

dimension (int): The dimension of the noise_std.
- If dimension == 1, a scalar noise_std is assumed.
- If dimension > 1, feature-wise independent noise_std values are assumed (diagonal noise).

Returns:

FullGaussianObservationModel: A configured instance with noise_std as a ModelParameter, along with the necessary sufficient statistics for inference.

Raises:

ValueError: If dimension is not a positive integer.

Parameters:

dimension (int)

classmethod compute_rmse(*, y, model)[source]¶

Computes the Root Mean Square Error (RMSE) between predictions and observations.

Parameters:

y (WeightedTensor[float]): The observed target values with associated weights.
model (WeightedTensor[float]): The model predictions with the same shape and weighting scheme as y.

Returns:

torch.Tensor: A scalar tensor representing the RMSE between model and y.

Parameters:

y (WeightedTensor[float])
model (WeightedTensor[float])

Return type:

Tensor

classmethod compute_rmse_per_ft(*, y, model)[source]¶

Computes the Root Mean Square Error (RMSE) between predictions and observations separately for each feature.

Parameters:

y (WeightedTensor[float]): The observed target values with associated weights.
model (WeightedTensor[float]): The model predictions with the same shape and weighting scheme as y.

Returns:

torch.Tensor: A tensor containing the RMSE between model and y for each feature.

Parameters:

y (WeightedTensor[float])
model (WeightedTensor[float])

Return type:

Tensor
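
The difference between the global and the per-feature RMSE can be sketched on nested lists with a boolean mask. The functions toy_rmse and toy_rmse_per_ft are hypothetical illustrations in plain Python, assuming rows are observations and columns are features; they do not use WeightedTensor and assume at least one unmasked entry per feature.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]
Mask = Sequence[Sequence[bool]]


def toy_rmse(y: Matrix, model: Matrix, mask: Mask) -> float:
    """Single RMSE over every unmasked entry."""
    sq_sum, count = 0.0, 0
    for y_row, m_row, k_row in zip(y, model, mask):
        for y_val, m_val, keep in zip(y_row, m_row, k_row):
            if keep:
                sq_sum += (y_val - m_val) ** 2
                count += 1
    return math.sqrt(sq_sum / count)


def toy_rmse_per_ft(y: Matrix, model: Matrix, mask: Mask) -> List[float]:
    """One RMSE per feature (column), each using only its unmasked entries."""
    n_features = len(y[0])
    result = []
    for j in range(n_features):
        sq_sum, count = 0.0, 0
        for i in range(len(y)):
            if mask[i][j]:
                sq_sum += (y[i][j] - model[i][j]) ** 2
                count += 1
        result.append(math.sqrt(sq_sum / count))
    return result


y = [[1.0, 2.0], [3.0, 0.0]]
model = [[1.0, 0.0], [0.0, 5.0]]
mask = [[True, True], [True, False]]  # the last entry is missing

overall = toy_rmse(y, model, mask)
per_feature = toy_rmse_per_ft(y, model, mask)
```

Here the masked entry (second row, second feature) is excluded from both results, so the second feature's RMSE rests on a single observation.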

to_string()[source]¶

Returns a string representation of the parameter for saving.

Return type: str

class GaussianObservationModel(name, getter, loc, scale, **extra_vars)[source]¶

Bases: leaspy.models.obs_models._base.ObservationModel

Specialized ObservationModel for noisy observations with Gaussian residuals assumption.

Parameters:

name (str): The name of the observed variable (used to name the data variable and the attachment term related to this observation).
getter (function Dataset -> WeightedTensor): The way to retrieve the observed values from the Dataset (as a WeightedTensor), e.g. all values, a subset of values (only the x, y, z features), one-hot encoded features, …
loc (str): The name of the variable representing the mean (location) of the Gaussian.
scale (str): The name of the variable representing the standard deviation (scale) of the Gaussian (noise_std).
**extra_vars (VariableInterface): Additional variables required by the model.

Parameters:

name (VariableName)
getter (Callable[[Dataset], WeightedTensor])
loc (VariableName)
scale (VariableName)
extra_vars (VariableInterface)

Notes

The model uses leaspy.variables.distributions.Normal internally for computing the log-likelihood and related operations.
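
For reference, the negative log-likelihood of a single observation under a Gaussian with mean loc and standard deviation scale is 0.5·((y − loc)/scale)² + log(scale) + 0.5·log(2π). The scalar sketch below is a hypothetical helper written with the standard math module; it illustrates the formula only and does not call leaspy's Normal distribution.

```python
import math


def gaussian_nll(y: float, loc: float, scale: float) -> float:
    """Negative log-density of y under Normal(loc, scale)."""
    if scale <= 0.0:
        raise ValueError("scale must be strictly positive.")
    z = (y - loc) / scale
    return 0.5 * z * z + math.log(scale) + 0.5 * math.log(2.0 * math.pi)


def gaussian_pdf(y: float, loc: float, scale: float) -> float:
    """Normal density, used here only to cross-check the nll."""
    z = (y - loc) / scale
    return math.exp(-0.5 * z * z) / (scale * math.sqrt(2.0 * math.pi))


nll_at_mean = gaussian_nll(0.0, loc=0.0, scale=1.0)
nll_off_mean = gaussian_nll(1.5, loc=0.5, scale=2.0)
```

The log(scale) term is what keeps the fitted noise level from growing without bound: a larger scale shrinks the squared-residual term but is penalized by the logarithm.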

class AbstractWeibullRightCensoredObservationModel[source]¶

Bases: leaspy.models.obs_models._base.ObservationModel

Base class for valid observation models that may be used in probabilistic models (stateless).

In particular, it provides data & linked variables regarding observations and their attachment to the model (the negative log-likelihood - nll - to be minimized).

Parameters:

name (str): The name of the observed variable (used to name the data variable and the attachment term related to this observation).
getter (function Dataset -> WeightedTensor): The way to retrieve the observed values from the Dataset (as a WeightedTensor), e.g. all values, a subset of values (only the x, y, z features), one-hot encoded features, …
dist (SymbolicDistribution): The symbolic distribution, parametrized by model variables, for the observed values (used to compute the attachment).
extra_vars (None (default) or Mapping[VarName, VariableInterface]): New variables needed to fully define the symbolic distribution or the sufficient statistics (e.g. “noise_std” and “y_L2_per_ft” for a Gaussian model).

static getter(dataset)[source]¶

Parameters: dataset (Dataset)
Return type: WeightedTensor

get_variables_specs(named_attach_vars=True)[source]¶

Automatic specifications of variables for this observation model.

Parameters: named_attach_vars (bool)
Return type: dict[VariableName, VariableInterface]

class WeibullRightCensoredObservationModel(nu, rho, xi, tau, **extra_vars)[source]¶

Bases: AbstractWeibullRightCensoredObservationModel

Base class for valid observation models that may be used in probabilistic models (stateless).

In particular, it provides data & linked variables regarding observations and their attachment to the model (the negative log-likelihood - nll - to be minimized).

Parameters:

name (str): The name of the observed variable (used to name the data variable and the attachment term related to this observation).
getter (function Dataset -> WeightedTensor): The way to retrieve the observed values from the Dataset (as a WeightedTensor), e.g. all values, a subset of values (only the x, y, z features), one-hot encoded features, …
dist (SymbolicDistribution): The symbolic distribution, parametrized by model variables, for the observed values (used to compute the attachment).
extra_vars (None (default) or Mapping[VarName, VariableInterface]): New variables needed to fully define the symbolic distribution or the sufficient statistics (e.g. “noise_std” and “y_L2_per_ft” for a Gaussian model).

Parameters:

nu (VariableName)
rho (VariableName)
xi (VariableName)
tau (VariableName)
extra_vars (VariableInterface)

string_for_json = 'weibull-right-censored'¶

classmethod default_init(**kwargs)[source]¶
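
As background on right censoring, the sketch below gives the negative log-likelihood of one observation under a standard Weibull distribution: an observed event contributes −log f(t), while a censored time contributes only −log S(t), with S(t) = exp(−(t/scale)^shape). The scale and shape arguments are generic textbook parameters chosen for illustration; how nu, rho, xi and tau map onto them is not stated in this page, so no such mapping is assumed here.

```python
import math


def weibull_right_censored_nll(
    time: float, event: bool, scale: float, shape: float
) -> float:
    """nll of one (time, event) pair under Weibull(scale, shape).

    event=True  : the event was observed at `time`    -> -log f(time)
    event=False : follow-up stopped at `time` (censor) -> -log S(time)
    """
    if time <= 0.0 or scale <= 0.0 or shape <= 0.0:
        raise ValueError("time, scale and shape must be strictly positive.")
    # Cumulative hazard H(t) = (t / scale) ** shape, and -log S(t) = H(t).
    nll = (time / scale) ** shape
    if event:
        # log h(t) = log(shape / scale) + (shape - 1) * log(t / scale);
        # since f = h * S, an observed event adds -log h(t).
        nll -= math.log(shape / scale) + (shape - 1.0) * math.log(time / scale)
    return nll


# With shape == 1 the Weibull reduces to an exponential with rate 1 / scale.
censored = weibull_right_censored_nll(2.0, event=False, scale=2.0, shape=1.0)
observed = weibull_right_censored_nll(2.0, event=True, scale=2.0, shape=1.0)
```

A censored observation is always at least as likely as an observed event at the same time when the hazard is below 1, which is why the event indicator must be carried alongside the time.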

class WeibullRightCensoredWithSourcesObservationModel(nu, rho, xi, tau, survival_shifts, **extra_vars)[source]¶

Bases: AbstractWeibullRightCensoredObservationModel

Base class for valid observation models that may be used in probabilistic models (stateless).

In particular, it provides data & linked variables regarding observations and their attachment to the model (the negative log-likelihood - nll - to be minimized).

Parameters:

name (str): The name of the observed variable (used to name the data variable and the attachment term related to this observation).
getter (function Dataset -> WeightedTensor): The way to retrieve the observed values from the Dataset (as a WeightedTensor), e.g. all values, a subset of values (only the x, y, z features), one-hot encoded features, …
dist (SymbolicDistribution): The symbolic distribution, parametrized by model variables, for the observed values (used to compute the attachment).
extra_vars (None (default) or Mapping[VarName, VariableInterface]): New variables needed to fully define the symbolic distribution or the sufficient statistics (e.g. “noise_std” and “y_L2_per_ft” for a Gaussian model).

Parameters:

nu (VariableName)
rho (VariableName)
xi (VariableName)
tau (VariableName)
survival_shifts (VariableName)
extra_vars (VariableInterface)

string_for_json = 'weibull-right-censored-with-sources'¶

classmethod default_init(**kwargs)[source]¶