Internal API Reference#

This page documents methods internal to the package. They are subject to considerable change between releases, and no backwards compatibility is promised for them.

The package consists of a single general class for estimators, which is modelled after scikit-learn’s estimator base class.

Classes#

CoreEstimator#

class scikit_stan.modelcore.CoreEstimator[source]#

Abstract class for all estimator-type models in this package.

get_params(deep=True)[source]#

Get parameters for this estimator.

Parameters:

deep (bool, default True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params (dict) – Parameter names mapped to their values.

set_params(**params)[source]#

Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Estimator parameters.

Returns:

self (estimator instance) – Estimator instance.
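The <component>__<parameter> convention can be illustrated with a minimal sketch. The classes SketchEstimator, Inner, and Outer are hypothetical stand-ins written for this example; they are not CoreEstimator and may differ from the package's actual implementation.

```python
import inspect


class SketchEstimator:
    """Hypothetical stand-in for an estimator base class (not CoreEstimator)."""

    def get_params(self, deep=True):
        # Parameter names are read from the constructor signature.
        names = [
            name
            for name in inspect.signature(type(self).__init__).parameters
            if name != "self"
        ]
        params = {}
        for name in names:
            value = getattr(self, name)
            params[name] = value
            # With deep=True, also expose the parameters of nested estimators.
            if deep and hasattr(value, "get_params"):
                for sub_name, sub_value in value.get_params().items():
                    params[f"{name}__{sub_name}"] = sub_value
        return params

    def set_params(self, **params):
        valid = self.get_params(deep=False)
        for key, value in params.items():
            name, delim, sub_key = key.partition("__")
            if name not in valid:
                raise ValueError(
                    f"Invalid parameter {name!r} for {type(self).__name__}"
                )
            if delim:
                # <component>__<parameter>: forward to the nested object.
                getattr(self, name).set_params(**{sub_key: value})
            else:
                setattr(self, name, value)
        return self


class Inner(SketchEstimator):
    def __init__(self, alpha=1.0):
        self.alpha = alpha


class Outer(SketchEstimator):
    def __init__(self, inner=None, scale=2.0):
        self.inner = inner
        self.scale = scale


model = Outer(inner=Inner())
# Update a top-level parameter and a parameter of the nested component.
model.set_params(scale=3.0, inner__alpha=0.5)
```

Because set_params returns the estimator itself, calls can be chained in the usual scikit-learn style.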

Validation Methods#

Validating Family-Link Choice#

scikit_stan.utils.validation.validate_family(family, link)[source]#

Validate family and link combination choice.

Parameters:

family (str) – Name of chosen family. Only the following families are supported: “gaussian”, “binomial”, “gamma”, “poisson”, “inverse-gaussian”.
link (Optional[str]) – Name of chosen link function. Only the following combinations are supported, following the R families package:
- “gaussian”:
  - “identity” - Identity link function
  - “log” - Log link function
  - “inverse” - Inverse link function
- “gamma”:
  - “identity” - Identity link function
  - “log” - Log link function
  - “inverse” - Inverse link function
- “inverse-gaussian”:
  - “identity” - Identity link function
  - “log” - Log link function
  - “inverse” - Inverse link function
  - “inverse-square” - Inverse square link function
- “poisson”:
  - “identity” - Identity link function
  - “log” - Log link function
  - “sqrt” - Square root link function
- “binomial”:
  - “log” - Log link function
  - “logit” - Logit link function
  - “probit” - Probit link function
  - “cloglog” - Complementary log-log link function
  - “cauchit” - Cauchit link function
If an invalid combination of family and link is passed, a ValueError is raised.

Raises:

ValueError – Passed family is not supported.
ValueError – Passed link is not supported or is not valid for the chosen family.
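The documented behaviour can be sketched as a lookup against a table of allowed links. This is a rough illustration only: SUPPORTED_LINKS and check_family_link are hypothetical names, the table is transcribed from the list above, and the sketch requires an explicit link because the text does not say how a link of None is resolved.

```python
# Supported links per family, transcribed from the documented list.
SUPPORTED_LINKS = {
    "gaussian": ["identity", "log", "inverse"],
    "gamma": ["identity", "log", "inverse"],
    "inverse-gaussian": ["identity", "log", "inverse", "inverse-square"],
    "poisson": ["identity", "log", "sqrt"],
    "binomial": ["log", "logit", "probit", "cloglog", "cauchit"],
}


def check_family_link(family, link):
    """Hypothetical helper: raise ValueError on an unsupported choice."""
    if family not in SUPPORTED_LINKS:
        raise ValueError(f"Family {family!r} is not supported.")
    if link not in SUPPORTED_LINKS[family]:
        raise ValueError(f"Link {link!r} is not valid for family {family!r}.")


check_family_link("poisson", "sqrt")  # a supported combination, no error
```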

The package uses a consistent internal numbering scheme for families and links. Because Stan does not support strings, families, links, and priors are represented internally by integers.

Families are mapped as follows:

gaussian - 0
gamma - 1
inverse-gaussian - 2
poisson - 3
binomial - 4
negative-binomial - 5

Link functions are mapped as follows:

identity - 0
log - 1
inverse - 2
sqrt - 3
inverse-square - 4
logit - 5
probit - 6
cloglog - 7
cauchit - 8
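As a sketch, the two mappings can be written as plain dictionaries. The names FAMILY_CODES, LINK_CODES, and encode_family_link are hypothetical and chosen for this example; only the integer codes come from the tables above.

```python
# Integer codes copied from the documented tables.
FAMILY_CODES = {
    "gaussian": 0,
    "gamma": 1,
    "inverse-gaussian": 2,
    "poisson": 3,
    "binomial": 4,
    "negative-binomial": 5,
}

LINK_CODES = {
    "identity": 0,
    "log": 1,
    "inverse": 2,
    "sqrt": 3,
    "inverse-square": 4,
    "logit": 5,
    "probit": 6,
    "cloglog": 7,
    "cauchit": 8,
}


def encode_family_link(family, link):
    """Hypothetical helper: translate names into their integer codes."""
    return FAMILY_CODES[family], LINK_CODES[link]


codes = encode_family_link("poisson", "log")  # (3, 1)
```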

Validating Input Data#

scikit_stan.utils.validation.check_array(X, ensure_2d=True, allow_nd=False, allow_sparse=False, dtype=<class 'numpy.float64'>)[source]#

Input validation on an array, list, sparse matrix or similar. By default, the input is checked to be a non-empty 2D array containing only finite values.

Parameters:

X (array-like) – Array-like, list, sparse matrix, or similar of data to be checked.
ensure_2d (bool, optional) – Whether to ensure that the array is 2D.
allow_nd (bool, optional) – Whether to allow the array to be an n-dimensional array where n > 2.
dtype (type, optional) – Dtype of the data; regressions are only supported on float64 or int64 arrays.

Returns:

NDArray[Union[np.float64, np.int64]] – Verified set of data that can be used for regression.

Raises:

ValueError – Sparse, complex, or otherwise invalid data type passed for X.
ValueError – Invalid number of dimensions in data passed for X, or data that cannot be recast to satisfy the dimension requirements.
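The default checks described above (non-empty, 2D, finite, real-valued) can be approximated with NumPy as follows. The function check_array_sketch is a hypothetical re-implementation for illustration; it omits sparse-matrix handling and any recasting of dimensions, and it may differ from what the package does internally.

```python
import numpy as np


def check_array_sketch(X, ensure_2d=True, allow_nd=False, dtype=np.float64):
    """Hypothetical approximation of the documented checks."""
    arr = np.asarray(X)
    if np.iscomplexobj(arr):
        raise ValueError("Complex data is not supported.")
    try:
        arr = arr.astype(dtype)
    except (TypeError, ValueError) as exc:
        raise ValueError(f"Data cannot be converted to {dtype}.") from exc
    if arr.size == 0:
        raise ValueError("Empty input is not supported.")
    if ensure_2d and arr.ndim < 2:
        raise ValueError(f"Expected a 2D array, got {arr.ndim}D input.")
    if not allow_nd and arr.ndim > 2:
        raise ValueError(f"Found array with {arr.ndim} dimensions, expected <= 2.")
    if not np.isfinite(arr).all():
        raise ValueError("Input contains NaN or infinite values.")
    return arr


validated = check_array_sketch([[1, 2], [3, 4]])  # float64 array of shape (2, 2)
```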

scikit_stan.utils.validation.check_is_fitted(estimator, attributes=None, *, msg=None, all_or_any=<built-in function all>)[source]#

Perform is_fitted validation for estimator. Checks if the estimator is fitted by verifying the presence of fitted attributes (ending with a trailing underscore) and otherwise raises a NotFittedError with the given message. If an estimator does not set any attributes with a trailing underscore, it can define a __sklearn_is_fitted__ method returning a boolean to specify if the estimator is fitted or not.

Parameters:

estimator (estimator instance) – estimator instance for which the check is performed.
attributes (str, list or tuple of str, default None) – Attribute name(s) given as a string or a list/tuple of strings, e.g. ["coef_", "estimator_", ...] or "coef_". If None, the estimator is considered fitted if there exists an attribute that ends with an underscore and does not start with a double underscore.
msg (str, default None) – The default error message is “This %(name)s instance is not fitted yet. Call ‘fit’ with appropriate arguments before using this estimator.” In custom messages, if “%(name)s” is present in the message string, it is replaced by the estimator name, e.g. “Estimator, %(name)s, must be fitted before sparsifying”.
all_or_any (callable, {all, any}, default all) – Specify whether all or any of the given attributes must exist.

Returns:

None

Raises:

NotFittedError – If the attributes are not found.
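The fitted-attribute convention can be sketched in a few lines. Everything below is a self-contained illustration: NotFittedError is redefined locally as a stand-in, and check_is_fitted_sketch and Model are hypothetical names, so the details may differ from the package's own code.

```python
class NotFittedError(ValueError, AttributeError):
    """Local stand-in for the exception named above."""


def check_is_fitted_sketch(estimator, attributes=None, *, msg=None, all_or_any=all):
    """Hypothetical approximation of the documented check."""
    if msg is None:
        msg = (
            "This %(name)s instance is not fitted yet. Call 'fit' with "
            "appropriate arguments before using this estimator."
        )
    if attributes is not None:
        if isinstance(attributes, str):
            attributes = [attributes]
        fitted = all_or_any([hasattr(estimator, attr) for attr in attributes])
    elif hasattr(estimator, "__sklearn_is_fitted__"):
        # The estimator reports its own fitted state.
        fitted = estimator.__sklearn_is_fitted__()
    else:
        # Any attribute with a trailing underscore counts as "fitted".
        fitted = any(
            key.endswith("_") and not key.startswith("__")
            for key in vars(estimator)
        )
    if not fitted:
        raise NotFittedError(msg % {"name": type(estimator).__name__})


class Model:
    def fit(self):
        self.coef_ = [1.0]  # fitted attributes end with an underscore
        return self


model = Model()
try:
    check_is_fitted_sketch(model)
except NotFittedError as err:
    message = str(err)  # mentions "Model" via the %(name)s placeholder

check_is_fitted_sketch(model.fit())  # passes silently once coef_ is set
```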

scikit_stan.utils.validation.check_consistent_length(*arrays)[source]#

Check that all arrays have consistent first dimensions. Checks whether all objects in arrays have the same shape or length.

Parameters:

*arrays (list or tuple of input objects) – Objects that will be checked for consistent length.

scikit_stan.utils.validation._num_samples(x)[source]#

Return number of samples in array-like x.
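A minimal sketch of how these two helpers might fit together is shown below. The names num_samples_sketch and check_consistent_length_sketch are hypothetical, and the error raised on a mismatch is an assumption, since the text above does not state it.

```python
def num_samples_sketch(x):
    """Return the size of the first dimension of an array-like."""
    shape = getattr(x, "shape", None)
    if shape is not None and len(shape) > 0:
        return shape[0]
    return len(x)


def check_consistent_length_sketch(*arrays):
    """Raise ValueError (assumed) if the inputs differ in their first dimension."""
    lengths = {num_samples_sketch(a) for a in arrays}
    if len(lengths) > 1:
        raise ValueError(
            f"Found inputs with inconsistent numbers of samples: {sorted(lengths)}"
        )


# Three observations in both X and y, so this passes silently.
check_consistent_length_sketch([[1, 2], [3, 4], [5, 6]], [0, 1, 0])
```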

Validating Priors#

scikit_stan.utils.validation.validate_prior(prior_spec, coeff_type)[source]#

Perform validation on given prior dictionary for prior on either slope or intercept. This is only called when there is a prior to check.

Parameters:

prior_spec (Dict[str, Any]) – Proposed prior dictionary, can be either for slope or intercept.
coeff_type (str) – Specify whether the prior is for slope or intercept - should only be ‘slope’ or ‘intercept’.

Returns:

Dict[str, Any] – Validated dictionary of parameters for the given prior.

Raises:

ValueError – coeff_type is neither ‘slope’ nor ‘intercept’.
ValueError – Prior distribution is not specified.
ValueError – Not all parameters for prior set-up are specified.
ValueError – Prior sigma is negative.
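The four error conditions above can be sketched as follows. Note that the dictionary key names used here (prior_<coeff_type>_dist, prior_<coeff_type>_mu, prior_<coeff_type>_sigma) are assumptions made for illustration; this page does not specify the schema of prior_spec, and validate_prior_sketch is a hypothetical function.

```python
def validate_prior_sketch(prior_spec, coeff_type):
    """Illustrative only; the key names below are assumed, not documented."""
    if coeff_type not in ("slope", "intercept"):
        raise ValueError(
            f"coeff_type must be 'slope' or 'intercept', got {coeff_type!r}."
        )
    dist_key = f"prior_{coeff_type}_dist"    # assumed key name
    mu_key = f"prior_{coeff_type}_mu"        # assumed key name
    sigma_key = f"prior_{coeff_type}_sigma"  # assumed key name
    if dist_key not in prior_spec:
        raise ValueError("Prior distribution is not specified.")
    missing = [key for key in (mu_key, sigma_key) if key not in prior_spec]
    if missing:
        raise ValueError(f"Missing prior parameters: {missing}")
    if prior_spec[sigma_key] < 0:
        raise ValueError("Prior sigma is negative.")
    # Return a copy so the caller's dictionary is left untouched.
    return dict(prior_spec)


spec = {
    "prior_slope_dist": "normal",
    "prior_slope_mu": 0.0,
    "prior_slope_sigma": 1.0,
}
validated_prior = validate_prior_sketch(spec, "slope")
```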

scikit_stan.utils.validation.validate_aux_prior(aux_prior_spec)[source]#

Validates passed configuration for prior on auxiliary parameters. This does not perform parameter autoscaling.

This validation method returns the following fields in the dictionary:

“prior_aux_dist”: distribution of the prior on auxiliary parameters from this mapping:

PRIORS_AUX_MAP = {
    "exponential": 0,  # exponential distribution, requires only beta parameter
    "chi2": 1,  # chi-squared distribution, requires only nu parameter
    "gamma": 2,  # gamma distribution, requires only alpha and beta parameters
    "inv_gamma": 3,  # inverse gamma distribution, requires only alpha and beta parameters
}

“num_prior_aux_params”: number of parameters in the prior on auxiliary parameters,
determined by the number of parameters in the distribution.

“prior_aux_params”: list of parameters in the prior on auxiliary parameters;
this must be a list even if the distribution only has one parameter

Parameters:

aux_prior_spec (Dict[str, Any]) –

Dictionary containing configuration for prior on auxiliary parameters. Currently supported priors are: “exponential” and “chi2”, which are both parameterized by a single scalar.

Priors here with more parameters are a future feature. For single-parameter priors, this field is a dictionary with the following keys:

“prior_aux_dist”: distribution of the prior on this parameter

“prior_aux_param”: parameter of the prior on this parameter

For example, to specify a chi2 prior with nu=2.5, pass:

{"prior_aux_dist": "chi2", "prior_aux_param": 2.5}

Returns:

Dict[str, Any] – Dictionary containing validated configuration for prior on auxiliary parameters.

Raises:

ValueError – Prior’s distribution is not specified.
ValueError – Unsupported prior distribution for auxiliary parameter.
ValueError – Prior distribution parameters are not specified.
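The input and output shapes described for this method can be illustrated with a short sketch. The function validate_aux_prior_sketch and the tuple SINGLE_PARAM_PRIORS are hypothetical names; the sketch assumes that the returned “prior_aux_dist” field holds the integer code from PRIORS_AUX_MAP and restricts itself to the two single-parameter priors listed as currently supported.

```python
PRIORS_AUX_MAP = {"exponential": 0, "chi2": 1, "gamma": 2, "inv_gamma": 3}
SINGLE_PARAM_PRIORS = ("exponential", "chi2")  # documented as currently supported


def validate_aux_prior_sketch(aux_prior_spec):
    """Hypothetical approximation of the documented validation."""
    if "prior_aux_dist" not in aux_prior_spec:
        raise ValueError("Prior's distribution is not specified.")
    dist = aux_prior_spec["prior_aux_dist"]
    if dist not in SINGLE_PARAM_PRIORS:
        raise ValueError(f"Unsupported prior distribution {dist!r}.")
    if "prior_aux_param" not in aux_prior_spec:
        raise ValueError("Prior distribution parameters are not specified.")
    # The parameters are always returned as a list, even for one parameter.
    params = [aux_prior_spec["prior_aux_param"]]
    return {
        "prior_aux_dist": PRIORS_AUX_MAP[dist],  # assumed to be the integer code
        "num_prior_aux_params": len(params),
        "prior_aux_params": params,
    }


validated_aux = validate_aux_prior_sketch(
    {"prior_aux_dist": "chi2", "prior_aux_param": 2.5}
)
```

Under these assumptions, validated_aux is {"prior_aux_dist": 1, "num_prior_aux_params": 1, "prior_aux_params": [2.5]}.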