Internal API Reference#
This page documents methods internal to the package. They are subject to considerable change between releases, and no backwards compatibility is promised for them.
The package consists of a single general class for estimators, which is modelled after scikit-learn’s estimator class.
Classes#
CoreEstimator#
- class scikit_stan.modelcore.CoreEstimator[source]#
Abstract class for all estimator-type models in this package.
- set_params(**params)[source]#
Set the parameters of this estimator. The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
- Parameters:
**params (dict) – Estimator parameters.
- Returns:
self (estimator instance) – Estimator instance.
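The <component>__<parameter> convention can be illustrated with a small, self-contained sketch. The ToyEstimator class below is hypothetical and is not CoreEstimator; it only shows how a double-underscore key could be routed to a nested object, assuming the nested object exposes its own set_params.

```python
class ToyEstimator:
    """Hypothetical stand-in used only to illustrate nested parameter names."""

    def __init__(self, alpha=1.0, inner=None):
        self.alpha = alpha
        self.inner = inner

    def set_params(self, **params):
        for key, value in params.items():
            # Split "inner__alpha" into ("inner", "__", "alpha");
            # a plain key such as "alpha" yields an empty separator.
            name, sep, sub_key = key.partition("__")
            if not hasattr(self, name):
                raise ValueError(
                    f"Invalid parameter {name!r} for {type(self).__name__}"
                )
            if sep:
                # Delegate the remainder of the key to the nested object.
                getattr(self, name).set_params(**{sub_key: value})
            else:
                setattr(self, name, value)
        return self  # returning self allows call chaining


outer = ToyEstimator(alpha=1.0, inner=ToyEstimator(alpha=2.0))
outer.set_params(alpha=0.5, inner__alpha=3.0)
print(outer.alpha, outer.inner.alpha)
```

Returning self mirrors the "Returns" entry above and lets calls be chained.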
Validation Methods#
Validating Family-Link Choice#
- scikit_stan.utils.validation.validate_family(family, link)[source]#
Validate family and link combination choice.
- Parameters:
family (str) – Name of chosen family. Only the following families are supported: “gaussian”, “binomial”, “gamma”, “poisson”, “inverse-gaussian”.
link (Optional[str]) – Name of chosen link function. Only the following combinations are supported, following the R families package:
- “gaussian”:
“identity” - Identity link function,
“log” - Log link function,
“inverse” - Inverse link function
- “gamma”:
“identity” - Identity link function,
“log” - Log link function,
“inverse” - Inverse link function
- “inverse-gaussian”:
“identity” - Identity link function,
“log” - Log link function,
“inverse” - Inverse link function,
“inverse-square” - Inverse square link function
- “poisson”:
“identity” - Identity link function,
“log” - Log link function,
“sqrt” - Square root link function
- “binomial”:
“log” - Log link function,
“logit” - Logit link function,
“probit” - Probit link function,
“cloglog” - Complementary log-log link function,
“cauchit” - Cauchit link function
If an invalid combination of family and link is passed, a ValueError is raised.
- Raises:
ValueError – Passed family is not supported.
ValueError – Passed link is not supported or is not valid for the chosen family.
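As a rough illustration of the check described above, the following sketch encodes the supported combinations in a lookup table. The names SUPPORTED_LINKS and check_family_link are hypothetical and this is not the package's implementation; handling of link=None is not described here, so the sketch only accepts a string.

```python
# Allowed links per family, transcribed from the list above.
SUPPORTED_LINKS = {
    "gaussian": ("identity", "log", "inverse"),
    "gamma": ("identity", "log", "inverse"),
    "inverse-gaussian": ("identity", "log", "inverse", "inverse-square"),
    "poisson": ("identity", "log", "sqrt"),
    "binomial": ("log", "logit", "probit", "cloglog", "cauchit"),
}


def check_family_link(family: str, link: str) -> None:
    """Hypothetical re-implementation of the check; not the package's code."""
    if family not in SUPPORTED_LINKS:
        raise ValueError(f"Family {family!r} is not supported.")
    if link not in SUPPORTED_LINKS[family]:
        raise ValueError(f"Link {link!r} is not valid for family {family!r}.")


check_family_link("poisson", "sqrt")  # a supported pair passes silently

try:
    check_family_link("gaussian", "logit")  # logit is not listed for gaussian
except ValueError as err:
    print(err)
```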
Note that the package uses a consistent internal numbering scheme for families and links alike. Because Stan does not support strings, families, links, and priors are each represented internally by a number.
- Families are mapped as follows:
“gaussian”: 0,
“gamma”: 1,
“inverse-gaussian”: 2,
“poisson”: 3,
“binomial”: 4,
“negative-binomial”: 5,
- Link functions are mapped as follows:
identity - 0
log - 1
inverse - 2
sqrt - 3
inverse-square - 4
logit - 5
probit - 6
cloglog - 7
cauchit - 8
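The two mappings above can be written as plain dictionaries. This is only a sketch: the names FAMILY_CODES, LINK_CODES, and encode are hypothetical, and the package may store these tables differently.

```python
# Numeric codes transcribed from the lists above.
FAMILY_CODES = {
    "gaussian": 0,
    "gamma": 1,
    "inverse-gaussian": 2,
    "poisson": 3,
    "binomial": 4,
    "negative-binomial": 5,
}

LINK_CODES = {
    "identity": 0,
    "log": 1,
    "inverse": 2,
    "sqrt": 3,
    "inverse-square": 4,
    "logit": 5,
    "probit": 6,
    "cloglog": 7,
    "cauchit": 8,
}


def encode(family: str, link: str) -> tuple:
    """Translate string names into the integer pair that would be passed to Stan."""
    return FAMILY_CODES[family], LINK_CODES[link]


print(encode("poisson", "log"))  # (3, 1)
```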
Validating Input Data#
- scikit_stan.utils.validation.check_array(X, ensure_2d=True, allow_nd=False, allow_sparse=False, dtype=<class 'numpy.float64'>)[source]#
Input validation on an array, list, sparse matrix or similar. By default, the input is checked to be a non-empty 2D array containing only finite values.
- Parameters:
X (array-like) – Array-like, list, sparse matrix, or similar of data to be checked.
ensure_2d (bool, optional) – Whether to ensure that the array is 2D.
allow_nd (bool, optional) – Whether to allow the array to be an n-dimensional matrix where n > 2.
dtype (type, optional) – Dtype of the data; regressions are only supported on float64 or int64 arrays.
- Returns:
NDArray[Union[np.float64, np.int64]] – Verified set of data that can be used for regression.
- Raises:
ValueError – Sparse, complex, or otherwise invalid data type passed for X.
ValueError – Invalid number of dimensions in data passed for X, or data that cannot be recast to satisfy the dimension requirements.
- scikit_stan.utils.validation.check_is_fitted(estimator, attributes=None, *, msg=None, all_or_any=<built-in function all>)[source]#
Perform is_fitted validation for estimator. Checks if the estimator is fitted by verifying the presence of fitted attributes (ending with a trailing underscore) and otherwise raises a NotFittedError with the given message. If an estimator does not set any attributes with a trailing underscore, it can define a __sklearn_is_fitted__ method returning a boolean to specify if the estimator is fitted or not.
- Parameters:
estimator (estimator instance) – Estimator instance for which the check is performed.
attributes (str, list or tuple of str, default None) – Attribute name(s) given as a string or a list/tuple of strings, e.g. ["coef_", "estimator_", ...], "coef_". If None, estimator is considered fitted if there exists an attribute that ends with an underscore and does not start with a double underscore.
msg (str, default None) – The default error message is, “This %(name)s instance is not fitted yet. Call ‘fit’ with appropriate arguments before using this estimator.” For custom messages, if “%(name)s” is present in the message string, it is replaced by the estimator name, e.g. “Estimator, %(name)s, must be fitted before sparsifying”.
all_or_any (callable, {all, any}, default all) – Specify whether all or any of the given attributes must exist.
- Returns:
None
- Raises:
NotFittedError – If the attributes are not found.
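The trailing-underscore rule can be sketched in a few lines. Everything below is illustrative: the NotFittedError class is a local stand-in, check_fitted_sketch and Toy are hypothetical names, and the __sklearn_is_fitted__ hook and custom msg argument are left out for brevity.

```python
class NotFittedError(ValueError, AttributeError):
    """Local stand-in for the exception named above."""


def check_fitted_sketch(estimator, attributes=None, *, all_or_any=all):
    """Raise NotFittedError unless the estimator looks fitted."""
    if attributes is None:
        # Default rule: some instance attribute ends with "_" and
        # does not start with "__".
        fitted = any(
            name.endswith("_") and not name.startswith("__")
            for name in vars(estimator)
        )
    else:
        if isinstance(attributes, str):
            attributes = [attributes]
        # all_or_any is the built-in all or any, applied to per-attribute checks.
        fitted = all_or_any([hasattr(estimator, a) for a in attributes])
    if not fitted:
        raise NotFittedError(
            "This %(name)s instance is not fitted yet. Call 'fit' with "
            "appropriate arguments before using this estimator."
            % {"name": type(estimator).__name__}
        )


class Toy:
    """Hypothetical estimator that sets coef_ when fitted."""

    def fit(self):
        self.coef_ = [0.0]
        return self


check_fitted_sketch(Toy().fit())  # passes silently
try:
    check_fitted_sketch(Toy())
except NotFittedError as err:
    print(err)
```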
- scikit_stan.utils.validation.check_consistent_length(*arrays)[source]#
Check that all arrays have consistent first dimensions. Checks whether all objects in arrays have the same shape or length.
- Parameters:
*arrays (list or tuple of input objects) – Objects that will be checked for consistent length.
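A minimal sketch of such a length check follows. The function name same_length is hypothetical, and raising ValueError on a mismatch is an assumption, since the entry above does not state which exception is used.

```python
def same_length(*arrays):
    """Raise if the inputs do not all share the same first-dimension length."""
    # Collect the distinct lengths; more than one means a mismatch.
    lengths = {len(a) for a in arrays}
    if len(lengths) > 1:
        # Assumption: ValueError is used here; the text above does not say.
        raise ValueError(
            f"Found inputs with inconsistent lengths: {sorted(lengths)}"
        )


X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
y = [0, 1, 0]
same_length(X, y)  # both have 3 rows, so this passes silently

try:
    same_length(X, y[:2])
except ValueError as err:
    print(err)
```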
Validating Priors#
- scikit_stan.utils.validation.validate_prior(prior_spec, coeff_type)[source]#
Perform validation on given prior dictionary for prior on either slope or intercept. This is only called when there is a prior to check.
- Parameters:
prior_spec (Dict[str, Any]) – Proposed prior dictionary, can be either for slope or intercept.
coeff_type (str) – Specify whether the prior is for slope or intercept; should only be ‘slope’ or ‘intercept’.
- Returns:
Dict[str, Any] – Validated dictionary of parameters for the given prior.
- Raises:
ValueError – coeff_type is neither ‘slope’ nor ‘intercept’.
ValueError – Prior distribution is not specified.
ValueError – Not all parameters for prior set-up are specified.
ValueError – Prior sigma is negative.
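The four error conditions listed above could be checked along the following lines. This is a sketch under loud assumptions: the dictionary keys (prior_<type>_dist, prior_<type>_mu, prior_<type>_sigma) and the function name check_prior are hypothetical, because this page does not spell out the schema of prior_spec.

```python
from typing import Any, Dict


def check_prior(prior_spec: Dict[str, Any], coeff_type: str) -> Dict[str, Any]:
    """Hypothetical sketch; the key names are assumptions, not the package's schema."""
    if coeff_type not in ("slope", "intercept"):
        raise ValueError(
            f"coeff_type must be 'slope' or 'intercept', got {coeff_type!r}."
        )

    dist_key = f"prior_{coeff_type}_dist"    # assumed key name
    mu_key = f"prior_{coeff_type}_mu"        # assumed key name
    sigma_key = f"prior_{coeff_type}_sigma"  # assumed key name

    if dist_key not in prior_spec:
        raise ValueError(f"Prior distribution is not specified ({dist_key!r}).")

    missing = [k for k in (mu_key, sigma_key) if k not in prior_spec]
    if missing:
        raise ValueError(f"Missing prior parameters: {missing}")

    if prior_spec[sigma_key] < 0:
        raise ValueError("Prior sigma must be non-negative.")

    # Return a copy so the caller's dictionary is left untouched.
    return dict(prior_spec)


spec = {
    "prior_slope_dist": "normal",
    "prior_slope_mu": 0.0,
    "prior_slope_sigma": 1.0,
}
print(check_prior(spec, "slope"))
```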
- scikit_stan.utils.validation.validate_aux_prior(aux_prior_spec)[source]#
Validates passed configuration for prior on auxiliary parameters. This does not perform parameter autoscaling.
This validation method returns the following fields in the dictionary:
- “prior_aux_dist”: distribution of the prior on auxiliary parameters, from this mapping:
PRIORS_AUX_MAP = {
    "exponential": 0,  # exponential distribution, requires only beta parameter
    "chi2": 1,  # chi-squared distribution, requires only nu parameter
    "gamma": 2,  # gamma distribution, requires only alpha and beta parameters
    "inv_gamma": 3,  # inverse gamma distribution, requires only alpha and beta parameters
}
- “num_prior_aux_params”: number of parameters in the prior on auxiliary parameters, determined by the number of parameters in the distribution.
- “prior_aux_params”: list of parameters in the prior on auxiliary parameters; this must be a list even if the distribution only has one parameter.
- Parameters:
aux_prior_spec (Dict[str, Any]) – Dictionary containing configuration for prior on auxiliary parameters. Currently supported priors are “exponential” and “chi2”, which are both parameterized by a single scalar.
Priors here with more parameters are a future feature. For single-parameter priors, this field is a dictionary with the following keys:
“prior_aux_dist”: distribution of the prior on this parameter
“prior_aux_param”: parameter of the prior on this parameter
For example, to specify a chi2 prior with nu=2.5, pass:
{"prior_aux_dist": "chi2", "prior_aux_param": 2.5}
- Returns:
Dict[str, Any] – Dictionary containing validated configuration for prior on auxiliary parameters.
- Raises:
ValueError – Prior’s distribution is not specified.
ValueError – Unsupported prior distribution for auxiliary parameter.
ValueError – Prior distribution parameters are not specified.
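To make the input and output shapes concrete, here is a sketch of the transformation for the single-parameter case. The function name check_aux_prior is hypothetical, and returning the numeric code from PRIORS_AUX_MAP under “prior_aux_dist” is an assumption based on the field description above; the package's actual behavior may differ.

```python
from typing import Any, Dict

# Codes transcribed from the mapping above.
PRIORS_AUX_MAP = {"exponential": 0, "chi2": 1, "gamma": 2, "inv_gamma": 3}

# Only these single-parameter priors are described as currently supported.
SINGLE_PARAM_PRIORS = ("exponential", "chi2")


def check_aux_prior(aux_prior_spec: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical sketch of the validation; not the package's code."""
    if "prior_aux_dist" not in aux_prior_spec:
        raise ValueError("Prior's distribution is not specified.")

    dist = aux_prior_spec["prior_aux_dist"]
    if dist not in SINGLE_PARAM_PRIORS:
        raise ValueError(f"Unsupported auxiliary prior distribution: {dist!r}")

    if "prior_aux_param" not in aux_prior_spec:
        raise ValueError("Prior distribution parameters are not specified.")

    # The parameters are always returned as a list, even for a single value.
    params = [aux_prior_spec["prior_aux_param"]]
    return {
        "prior_aux_dist": PRIORS_AUX_MAP[dist],  # assumed: numeric code
        "num_prior_aux_params": len(params),
        "prior_aux_params": params,
    }


result = check_aux_prior({"prior_aux_dist": "chi2", "prior_aux_param": 2.5})
print(result)
# {'prior_aux_dist': 1, 'num_prior_aux_params': 1, 'prior_aux_params': [2.5]}
```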