leaspy.models.obs_models ======================== .. py:module:: leaspy.models.obs_models Attributes ---------- .. autoapisummary:: leaspy.models.obs_models.OBSERVATION_MODELS leaspy.models.obs_models.ObservationModelFactoryInput Classes ------- .. autoapisummary:: leaspy.models.obs_models.ObservationModel leaspy.models.obs_models.BernoulliObservationModel leaspy.models.obs_models.ObservationModelNames leaspy.models.obs_models.FullGaussianObservationModel leaspy.models.obs_models.GaussianObservationModel leaspy.models.obs_models.AbstractWeibullRightCensoredObservationModel leaspy.models.obs_models.WeibullRightCensoredObservationModel leaspy.models.obs_models.WeibullRightCensoredWithSourcesObservationModel Functions --------- .. autoapisummary:: leaspy.models.obs_models.observation_model_factory Package Contents ---------------- .. py:class:: ObservationModel Base class for valid observation models that may be used in probabilistic models (stateless). In particular, it provides data & linked variables regarding observations and their attachment to the model (the negative log-likelihood - nll - to be minimized). :Parameters: **name** : :obj:`str` The name of observed variable (to name the data variable & attachment term related to this observation). **getter** : function :class:`.Dataset` -> :class:`.WeightedTensor` The way to retrieve the observed values from the :class:`.Dataset` (as a :class:`.WeightedTensor`): e.g. all values, subset of values - only x, y, z features, one-hot encoded features, ... **dist** : :class:`.SymbolicDistribution` The symbolic distribution, parametrized by model variables, for observed values (so to compute attachment). **extra_vars** : None (default) or Mapping[VarName, :class:`.VariableInterface`] Some new variables that are needed to fully define the symbolic distribution or the sufficient statistics. (e.g. "noise_std", and "y_L2_per_ft" for instance for a Gaussian model) .. !! processed by numpydoc !! .. 
py:attribute:: name :type: leaspy.variables.specs.VariableName .. py:attribute:: getter :type: Callable[[leaspy.io.data.dataset.Dataset], leaspy.utils.weighted_tensor.WeightedTensor] .. py:attribute:: dist :type: leaspy.variables.distributions.SymbolicDistribution .. py:attribute:: extra_vars :type: Optional[Mapping[leaspy.variables.specs.VariableName, leaspy.variables.specs.VariableInterface]] :value: None .. py:method:: get_nll_attach_var_name(named_attach_vars = True) Return the name of the negative log-likelihood attachment variable. .. !! processed by numpydoc !! .. py:method:: get_variables_specs(named_attach_vars = True) Automatic specifications of variables for this observation model. :Parameters: **named_attach_vars** : :obj:`bool`, optional .. :Returns: :obj:`dict` [ :class:`~leaspy.variables.specs.VariableName`, :class:`~leaspy.variables.specs.VariableInterface`] A dictionary mapping each variable name to its corresponding specification, with - the primary DataVariable - any `extra_vars` defined by the model - nll attachment variables: - nll_attach_var_ind: a :class:`~leaspy.variables.specs.LinkedVariable` representing the individual-level negative log-likelihood contributions - nll_attach_var: a :class:`~leaspy.variables.specs.LinkedVariable` that sums the individual contributions .. rubric:: Notes The distribution object `self.dist` should provide a `get_func_nll(name)` method that returns a callable for computing the nll. .. !! processed by numpydoc !! .. py:method:: serialized() Returns a JSON-exportable representation of the instance, excluding its name. :Returns: Any A representation of the instance, currently based on `repr(self.dist)`, that is intended to be JSON-serializable. .. !! processed by numpydoc !! .. py:method:: to_dict() To be implemented... .. !! processed by numpydoc !! .. 
py:method:: to_string() Returns a string representation of the parameter for saving. :Returns: :obj:`str` A string representation of the parameter, as stored in `self.string_for_json`. .. !! processed by numpydoc !! .. py:class:: BernoulliObservationModel(**extra_vars) Bases: :py:obj:`leaspy.models.obs_models._base.ObservationModel` Observation model for binary outcomes using a Bernoulli distribution. This model expects binary-valued observations and uses a Bernoulli distribution to define the likelihood. It assumes the response variable is named `"y"`. :Parameters: **\*\*extra_vars** : VariableInterface Optional extra variables required by the model. These are passed to the parent `ObservationModel` class and can be used for conditioning the likelihood. :Attributes: **string_for_json** : :obj:`str` A static string identifier used for serialization. .. !! processed by numpydoc !! .. py:attribute:: string_for_json :value: 'bernoulli' .. py:method:: y_getter(dataset) :staticmethod: Extracts and validates the observation values and associated mask from a dataset. :Parameters: **dataset** : :class:`.Dataset` A dataset object containing `values` and `mask` attributes. :Returns: :class:`.WeightedTensor` A tensor containing the observed binary values along with a boolean mask indicating which entries are valid. :Raises: ValueError If either `dataset.values` or `dataset.mask` is `None`, indicating that the dataset is improperly initialized. .. !! processed by numpydoc !! .. py:data:: OBSERVATION_MODELS :type: Dict[ObservationModelNames, Type[leaspy.models.obs_models._base.ObservationModel]] .. py:data:: ObservationModelFactoryInput .. py:class:: ObservationModelNames(*args, **kwds) Bases: :py:obj:`enum.Enum` Enumeration defining the possible names for observation models. .. !! processed by numpydoc !! .. py:attribute:: GAUSSIAN_DIAGONAL :value: 'gaussian-diagonal' .. py:attribute:: GAUSSIAN_SCALAR :value: 'gaussian-scalar' .. py:attribute:: BERNOULLI :value: 'bernoulli' .. 
py:attribute:: WEIBULL_RIGHT_CENSORED :value: 'weibull-right-censored' .. py:attribute:: WEIBULL_RIGHT_CENSORED_WITH_SOURCES :value: 'weibull-right-censored-with-sources' .. py:method:: from_string(model_name) :classmethod: .. py:function:: observation_model_factory(model, **kwargs) Factory for observation models. :Parameters: **model** : :obj:`str` or :class:`.ObservationModel` or :obj:`dict` [ :obj:`str`, ...] - If an instance of a subclass of :class:`.ObservationModel`, returns the instance. - If a string, then returns a new instance of the appropriate class (with optional parameters `kwargs`). - If a dictionary, it must contain the 'name' key and other initialization parameters. **\*\*kwargs** Optional parameters for initializing the requested observation model when `model` is a string. :Returns: :class:`.ObservationModel` The desired observation model. :Raises: :exc:`.LeaspyModelInputError` If `model` is not supported. .. !! processed by numpydoc !! .. py:class:: FullGaussianObservationModel(noise_std, **extra_vars) Bases: :py:obj:`GaussianObservationModel` Specialized `GaussianObservationModel` when all data share the same observation model, with default naming. The default naming is: - 'y' for observations - 'model' for model predictions - 'noise_std' for scale of residuals We also provide a convenient factory, `with_noise_std_as_model_parameter`, for the most common case, which corresponds to `noise_std` directly being a `ModelParameter` (it could also be a `PopulationLatentVariable` with positive support). Whether the scale of residuals is scalar or diagonal depends on the `dimension` argument of this method. .. !! processed by numpydoc !! .. py:attribute:: tol_noise_variance :value: 1e-05 .. py:method:: y_getter(dataset) :staticmethod: Extracts the observation values and mask from a dataset. 
:Parameters: **dataset** : :class:`.Dataset` A dataset object containing `values` and `mask` attributes. :Returns: :class:`.WeightedTensor` A tensor containing the observed values and a boolean mask used as weights for likelihood and loss computations. :Raises: AssertionError If either `dataset.values` or `dataset.mask` is `None`. .. !! processed by numpydoc !! .. py:method:: noise_std_suff_stats() :classmethod: Dictionary of sufficient statistics needed for `noise_std` (when directly a model parameter). :Returns: :obj:`dict` [ :class:`~leaspy.variables.specs.VariableName`, :class:`~leaspy.variables.specs.LinkedVariable`] A dictionary containing the sufficient statistics: - `"y_x_model"`: Product of the observed values (`"y"`) and the model predictions (`"model"`). - `"model_x_model"`: Squared values of the model predictions (`"model"`). .. !! processed by numpydoc !! .. py:method:: scalar_noise_std_update(*, state, y_x_model, model_x_model) :classmethod: Update rule for scalar `noise_std` (when directly a model parameter), from state & sufficient statistics. Computes a common `noise_std` for all the features. :Parameters: **state** : :class:`State` A state dictionary containing precomputed values. **y_x_model** : :class:`.WeightedTensor`[:obj:`float`] The weighted inner product between the observations and the model predictions. **model_x_model** : :class:`.WeightedTensor`[:obj:`float`] The weighted inner product of the model predictions with themselves. :Returns: :class:`torch.Tensor` The updated scalar value of the `noise_std`. .. !! processed by numpydoc !! .. py:method:: diagonal_noise_std_update(*, state, y_x_model, model_x_model) :classmethod: Update rule for feature-wise `noise_std` (when directly a model parameter), from state & sufficient statistics. Computes one `noise_std` per feature.
:Parameters: **state** : :class:`State` A state dictionary containing precomputed values. **y_x_model** : :class:`.WeightedTensor`[:obj:`float`] The weighted inner product between the observations and the model predictions. **model_x_model** : :class:`.WeightedTensor`[:obj:`float`] The weighted inner product of the model predictions with themselves. :Returns: :class:`torch.Tensor` The updated value of the `noise_std` for each feature. .. !! processed by numpydoc !! .. py:method:: noise_std_specs(dimension) :classmethod: Default specifications of `noise_std` variable when directly modelled as a parameter (no latent population variable). :Parameters: **dimension** : :obj:`int` The dimension of the `noise_std`. - If `dimension == 1`, a scalar `noise_std` is assumed. - If `dimension > 1`, feature-wise independent `noise_std` values are assumed (diagonal noise). :Returns: ModelParameter The specification of the `noise_std`, including: - `shape`: tuple defining the parameter shape. - `suff_stats`: collected sufficient statistics needed for updates. - `update_rule`: method to update the parameter based on statistics. .. !! processed by numpydoc !! .. py:method:: with_noise_std_as_model_parameter(dimension) :classmethod: Default instance of `FullGaussianObservationModel` with `noise_std` (scalar or diagonal depending on `dimension`) being a `ModelParameter`. :Parameters: **dimension** : :obj:`int` The dimension of the `noise_std`. - If `dimension == 1`, a scalar `noise_std` is assumed. - If `dimension > 1`, feature-wise independent `noise_std` values are assumed (diagonal noise). :Returns: FullGaussianObservationModel A configured instance with `noise_std` as a `ModelParameter`, along with the necessary sufficient statistics for inference. :Raises: ValueError If `dimension` is not a positive integer. .. !! processed by numpydoc !! .. 
py:method:: compute_rmse(*, y, model) :classmethod: Computes the Root Mean Square Error (RMSE) between predictions and observations. :Parameters: **y** : :class:`.WeightedTensor`[:obj:`float`] The observed target values with associated weights. **model** : :class:`.WeightedTensor`[:obj:`float`] The model predictions with the same shape and weighting scheme as `y`. :Returns: :class:`torch.Tensor` A scalar tensor representing the RMSE between `model` and `y`. .. !! processed by numpydoc !! .. py:method:: compute_rmse_per_ft(*, y, model) :classmethod: Computes the Root Mean Square Error (RMSE) between predictions and observations separately for each feature. :Parameters: **y** : :class:`.WeightedTensor`[:obj:`float`] The observed target values with associated weights. **model** : :class:`.WeightedTensor`[:obj:`float`] The model predictions with the same shape and weighting scheme as `y`. :Returns: :class:`torch.Tensor` A tensor containing the RMSE between `model` and `y` for each feature. .. !! processed by numpydoc !! .. py:method:: to_string() Returns a string representation of the parameter for saving. .. !! processed by numpydoc !! .. py:class:: GaussianObservationModel(name, getter, loc, scale, **extra_vars) Bases: :py:obj:`leaspy.models.obs_models._base.ObservationModel` Specialized `ObservationModel` for noisy observations with Gaussian residuals assumption. :Parameters: **name** : :obj:`str` The name of observed variable (to name the data variable & attachment term related to this observation). **getter** : function :class:`.Dataset` -> :class:`.WeightedTensor` The way to retrieve the observed values from the :class:`.Dataset` (as a :class:`.WeightedTensor`): e.g. all values, subset of values - only x, y, z features, one-hot encoded features, ... 
**loc** : :obj:`str` The name of the variable representing the mean (location) of the Gaussian **scale** : :obj:`str` The name of the variable representing the standard deviation (scale) of the Gaussian (`noise_std`) **\*\*extra_vars** : VariableInterface Additional variables required by the model .. rubric:: Notes - The model uses `leaspy.variables.distributions.Normal` internally for computing the log-likelihood and related operations. .. !! processed by numpydoc !! .. py:class:: AbstractWeibullRightCensoredObservationModel Bases: :py:obj:`leaspy.models.obs_models._base.ObservationModel` Base class for valid observation models that may be used in probabilistic models (stateless). In particular, it provides data & linked variables regarding observations and their attachment to the model (the negative log-likelihood - nll - to be minimized). :Parameters: **name** : :obj:`str` The name of observed variable (to name the data variable & attachment term related to this observation). **getter** : function :class:`.Dataset` -> :class:`.WeightedTensor` The way to retrieve the observed values from the :class:`.Dataset` (as a :class:`.WeightedTensor`): e.g. all values, subset of values - only x, y, z features, one-hot encoded features, ... **dist** : :class:`.SymbolicDistribution` The symbolic distribution, parametrized by model variables, for observed values (so to compute attachment). **extra_vars** : None (default) or Mapping[VarName, :class:`.VariableInterface`] Some new variables that are needed to fully define the symbolic distribution or the sufficient statistics. (e.g. "noise_std", and "y_L2_per_ft" for instance for a Gaussian model) .. !! processed by numpydoc !! .. py:method:: getter(dataset) :staticmethod: .. py:method:: get_variables_specs(named_attach_vars = True) Automatic specifications of variables for this observation model. .. !! processed by numpydoc !! .. 
py:class:: WeibullRightCensoredObservationModel(nu, rho, xi, tau, **extra_vars) Bases: :py:obj:`AbstractWeibullRightCensoredObservationModel` Base class for valid observation models that may be used in probabilistic models (stateless). In particular, it provides data & linked variables regarding observations and their attachment to the model (the negative log-likelihood - nll - to be minimized). :Parameters: **name** : :obj:`str` The name of observed variable (to name the data variable & attachment term related to this observation). **getter** : function :class:`.Dataset` -> :class:`.WeightedTensor` The way to retrieve the observed values from the :class:`.Dataset` (as a :class:`.WeightedTensor`): e.g. all values, subset of values - only x, y, z features, one-hot encoded features, ... **dist** : :class:`.SymbolicDistribution` The symbolic distribution, parametrized by model variables, for observed values (so to compute attachment). **extra_vars** : None (default) or Mapping[VarName, :class:`.VariableInterface`] Some new variables that are needed to fully define the symbolic distribution or the sufficient statistics. (e.g. "noise_std", and "y_L2_per_ft" for instance for a Gaussian model) .. !! processed by numpydoc !! .. py:attribute:: string_for_json :value: 'weibull-right-censored' .. py:method:: default_init(**kwargs) :classmethod: .. py:class:: WeibullRightCensoredWithSourcesObservationModel(nu, rho, xi, tau, survival_shifts, **extra_vars) Bases: :py:obj:`AbstractWeibullRightCensoredObservationModel` Base class for valid observation models that may be used in probabilistic models (stateless). In particular, it provides data & linked variables regarding observations and their attachment to the model (the negative log-likelihood - nll - to be minimized). :Parameters: **name** : :obj:`str` The name of observed variable (to name the data variable & attachment term related to this observation). 
**getter** : function :class:`.Dataset` -> :class:`.WeightedTensor` The way to retrieve the observed values from the :class:`.Dataset` (as a :class:`.WeightedTensor`): e.g. all values, subset of values - only x, y, z features, one-hot encoded features, ... **dist** : :class:`.SymbolicDistribution` The symbolic distribution, parametrized by model variables, for observed values (so to compute attachment). **extra_vars** : None (default) or Mapping[VarName, :class:`.VariableInterface`] Some new variables that are needed to fully define the symbolic distribution or the sufficient statistics. (e.g. "noise_std", and "y_L2_per_ft" for instance for a Gaussian model) .. !! processed by numpydoc !! .. py:attribute:: string_for_json :value: 'weibull-right-censored-with-sources' .. py:method:: default_init(**kwargs) :classmethod:
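The dispatch rules documented for ``observation_model_factory`` (an instance is returned unchanged, a string selects a class by name, a dictionary must carry a ``'name'`` key) can be illustrated with a minimal, self-contained sketch. It does not import leaspy: every ``Toy*`` name below is a hypothetical stand-in, ``ValueError`` replaces ``LeaspyModelInputError``, and merging the remaining dictionary entries with ``**kwargs`` is an assumption of this sketch rather than confirmed library behaviour.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, Mapping, Type


class ToyModelNames(Enum):
    """Hypothetical stand-in for ObservationModelNames."""
    GAUSSIAN_SCALAR = "gaussian-scalar"
    BERNOULLI = "bernoulli"


@dataclass
class ToyObservationModel:
    """Hypothetical stand-in for ObservationModel; it only stores its options."""
    options: Dict[str, Any] = field(default_factory=dict)


class ToyGaussian(ToyObservationModel):
    pass


class ToyBernoulli(ToyObservationModel):
    pass


# Registry playing the role of OBSERVATION_MODELS (name -> class).
TOY_MODELS: Dict[ToyModelNames, Type[ToyObservationModel]] = {
    ToyModelNames.GAUSSIAN_SCALAR: ToyGaussian,
    ToyModelNames.BERNOULLI: ToyBernoulli,
}


def toy_factory(model, **kwargs) -> ToyObservationModel:
    """Resolve an instance, a name, or a dict with a 'name' key to a model."""
    # Case 1: already an instance, return it unchanged.
    if isinstance(model, ToyObservationModel):
        return model
    # Case 2: a dictionary, which must contain the 'name' key.
    if isinstance(model, Mapping):
        if "name" not in model:
            raise ValueError("A dictionary input must contain the 'name' key.")
        # Assumption of this sketch: other entries act as init parameters.
        extra = {k: v for k, v in model.items() if k != "name"}
        kwargs = {**extra, **kwargs}
        model = model["name"]
    # Case 3: a string, looked up in the registry by enum value.
    if isinstance(model, str):
        try:
            key = ToyModelNames(model)
        except ValueError:
            raise ValueError(f"Unsupported observation model: {model!r}") from None
        return TOY_MODELS[key](options=dict(kwargs))
    raise ValueError(f"Unsupported input type: {type(model).__name__}")
```

With these stand-ins, ``toy_factory("bernoulli")`` builds a new ``ToyBernoulli``, passing that instance back to ``toy_factory`` returns the same object, and an unknown name raises an error.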