Python Interface#


Installation#

From Source#

This assumes you have followed the Getting Started guide to install TinyStan’s prerequisites and downloaded a copy of the TinyStan source code.

To install the Python interface, you can either install it directly from GitHub with

pip install "git+https://github.com/WardBrian/tinystan.git#egg=tinystan&subdirectory=clients/python"

Or, since you have already downloaded the repository, you can run

pip install -e clients/python/

from the TinyStan folder.

To use the TinyStan source you’ve manually downloaded instead of a copy the package downloads for you, call set_tinystan_path() or set the $TINYSTAN environment variable.
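
A minimal sketch of the environment-variable route follows. The helper name configure_tinystan and the placeholder path are hypothetical, and setting the variable before importing tinystan is a cautious assumption, since this page does not say when the variable is read.

```python
import os
from pathlib import Path


def configure_tinystan(path):
    """Store an absolute path to a TinyStan checkout in $TINYSTAN."""
    resolved = str(Path(path).expanduser().resolve())
    os.environ["TINYSTAN"] = resolved
    return resolved


if __name__ == "__main__":
    # "~/src/tinystan" is a placeholder; use the top-level folder of your checkout.
    print(configure_tinystan("~/src/tinystan"))
    # Alternative, once the package is imported:
    #   import tinystan
    #   tinystan.set_tinystan_path(os.environ["TINYSTAN"])
```

Either route should point at the top-level folder of the repository, as noted for set_tinystan_path() in the API reference.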

Note that the Python package depends on Python 3.9+ and NumPy, and will install NumPy if it is not already installed.

Example Program#

An example program is provided alongside the Python interface code in example.py:

import tinystan

if __name__ == "__main__":
    model = tinystan.Model("./test_models/bernoulli/bernoulli.stan")
    data = "./test_models/bernoulli/bernoulli.data.json"

    pf = model.pathfinder(data)
    print(pf.parameters)
    print(pf["theta"].mean())
    print(pf["theta"].shape)

    fit = model.sample(data, num_samples=10000, num_chains=10, inits=pf)
    print(fit.parameters)
    print(fit["theta"].mean())
    print(fit["theta"].shape)

    data = {"N": 10, "y": [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]}
    o = model.optimize(data, jacobian=True, init=fit)
    print(o.parameters)
    print(o["theta"].mean())
    print(o["theta"].shape)
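
The example passes data first as a file path and later as a dictionary. The sketch below shows how the three accepted forms (dictionary, JSON string, path to a JSON file) can carry the same content. It uses only the standard library; the helper name data_forms and the temporary file are illustrative, not part of TinyStan.

```python
import json
import tempfile
from pathlib import Path

bernoulli = {"N": 10, "y": [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]}


def data_forms(data, directory):
    """Return the same data as a dict, a JSON string, and a path to a JSON file."""
    as_string = json.dumps(data)
    as_path = Path(directory) / "bernoulli.data.json"
    as_path.write_text(as_string)
    return data, as_string, as_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        as_dict, as_string, as_path = data_forms(bernoulli, tmp)
        # Any one of the three could be passed as `data` to the Model methods.
        print(as_string)
        print(json.loads(as_path.read_text()) == as_dict)
```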

API Reference#

Model object#

class tinystan.Model(model: str | PathLike, *, capture_stan_prints: bool = True, stanc_args: List[str] = [], make_args: List[str] = [], warn: bool = True)[source]#

Load a Stan model for inference, compiling it if necessary.

Parameters:
  • model (Union[str, PathLike]) – Path to the Stan model file or shared object.

  • stanc_args (List[str], optional) – A list of arguments to pass to stanc3 if the model is not compiled. For example, ["--O1"] will enable compiler optimization level 1.

  • make_args (List[str], optional) – A list of additional arguments to pass to GNU Make if the model is not compiled. For example, ["STAN_THREADS=True"] will enable threading for the compiled model. If the same flags are defined in make/local, the versions passed here will take precedence.

  • capture_stan_prints (bool, optional) –

    If True, capture all print statements and output from Stan and print them from Python. This may have a performance impact. If False, print statements from Stan will be sent to cout and will not be seen in Jupyter or capturable with contextlib.redirect_stdout().

    Note: If this is set for a model, any other models instantiated from the same shared library will also have the callback set, even if they were created before this model.

  • warn (bool, optional) – If False, the warning about re-loading the same shared object is suppressed.
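
As a sketch of how the constructor arguments above might be combined, the helpers below turn two simple switches into stanc_args and make_args, reusing the example flags from the parameter descriptions. The names build_options and load_model are hypothetical; tinystan is imported lazily so the first helper can be inspected without the package installed.

```python
def build_options(*, optimize=False, threads=False):
    """Collect keyword arguments for tinystan.Model from two simple switches."""
    return {
        "stanc_args": ["--O1"] if optimize else [],
        "make_args": ["STAN_THREADS=True"] if threads else [],
    }


def load_model(stan_file, **switches):
    """Compile (if needed) and load a model with the chosen options."""
    import tinystan  # imported lazily so build_options() works without the package

    return tinystan.Model(stan_file, **build_options(**switches))


if __name__ == "__main__":
    print(build_options(optimize=True, threads=True))
```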

api_version()[source]#

Return the version of the TinyStan API backing this model.

laplace_sample(mode: StanOutput | ndarray | str | PathLike | Mapping[str, Any], data: str | PathLike | Mapping[str, Any] = '', *, seed: int | None = None, num_draws: int = 1000, jacobian: bool = True, calculate_lp: bool = True, save_hessian: bool = False, refresh: int = 0, num_threads: int = -1)[source]#

Sample from the Laplace approximation of the posterior centered at the provided mode.

Parameters:
  • mode (Union[StanOutput, np.ndarray, StanData]) – The mode of the Laplace approximation. This can be a StanOutput object from optimize(), a numpy array, a path to a JSON file, a JSON string, or a dictionary.

  • data (str | dict, optional) – The data to use for the model. This can be a path to a JSON file, a JSON string, or a dictionary. By default "" (the empty string).

  • seed (Optional[int], optional) – The seed to use for the random number generator. If not provided, a random seed will be generated.

  • num_draws (int, optional) – Number of draws, by default 1000

  • jacobian (bool, optional) – Whether to apply the Jacobian change of variables to the log density. Note: This should match the value used when the mode was calculated. By default True.

  • calculate_lp (bool, optional) – Whether to calculate the log probability of the samples, by default True

  • save_hessian (bool, optional) – Whether to save the Hessian matrix calculated at the mode, by default False

  • refresh (int, optional) – Number of iterations between progress messages, by default 0 (suppress messages)

  • num_threads (int, optional) – Number of threads to use for log density evaluations, by default -1 (use all available)

Returns:

An object containing the samples and metadata from the algorithm.

Return type:

StanOutput

Raises:
  • ValueError – If any of the parameters are invalid or out of range.

  • RuntimeError – If there is an unrecoverable error during the algorithm.
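
The note on jacobian above says the value should match the one used to find the mode. One way to keep the two calls in step is to pass a single flag to both, as in this sketch. The helper name laplace_around_mode is hypothetical, and model is assumed to be a tinystan.Model instance.

```python
def laplace_around_mode(model, data, *, jacobian=True, num_draws=1000, seed=None):
    """Find a mode with optimize(), then sample the Laplace approximation there."""
    # The same `jacobian` value is forwarded to both calls so they cannot disagree.
    mode = model.optimize(data, jacobian=jacobian, seed=seed)
    return model.laplace_sample(
        mode, data, jacobian=jacobian, num_draws=num_draws, seed=seed
    )
```

Note that optimize() defaults to jacobian=False while laplace_sample() defaults to jacobian=True, so relying on both defaults would produce a mismatch.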

optimize(data: str | PathLike | Mapping[str, Any] = '', *, init: str | PathLike | Mapping[str, Any] | None = None, seed: int | None = None, id: int = 1, init_radius: float = 2.0, algorithm: OptimizationAlgorithm = OptimizationAlgorithm.LBFGS, jacobian: bool = False, num_iterations: int = 2000, max_history_size: int = 5, init_alpha: float = 0.001, tol_obj: float = 1e-12, tol_rel_obj: float = 10000.0, tol_grad: float = 1e-08, tol_rel_grad: float = 10000000.0, tol_param: float = 1e-08, refresh: int = 0, num_threads: int = -1)[source]#

Optimize the model parameters using the specified algorithm.

This will find either the maximum a posteriori (MAP) estimate or the maximum likelihood estimate (MLE) of the model parameters, depending on the value of the jacobian parameter. Additional parameters can be found in the Stan documentation at https://mc-stan.org/docs/reference-manual/optimization.html

Parameters:
  • data (str | dict, optional) – The data to use for the model. This can be a path to a JSON file, a JSON string, or a dictionary. By default "" (the empty string).

  • init (str | dict | None, optional) – Initial parameter values. This can be a path to a JSON file, a JSON string, or a dictionary. By default None.

  • seed (Optional[int], optional) – The seed to use for the random number generator. If not provided, a random seed will be generated.

  • id (int, optional) – ID used to offset the random number generator, by default 1

  • init_radius (float, optional) – Radius to initialize unspecified parameters within. The parameter values are drawn uniformly from the interval [-init_radius, init_radius] on the unconstrained scale. By default 2.0

  • algorithm (OptimizationAlgorithm, optional) – Which optimization algorithm to use. Some of the following arguments may be ignored depending on the algorithm. By default OptimizationAlgorithm.LBFGS

  • jacobian (bool, optional) – Whether to apply the Jacobian change of variables to the log density. If False, the algorithm will find the MLE. If True, the algorithm will find the MAP estimate. By default False

  • num_iterations (int, optional) – Maximum number of iterations to run the optimization, by default 2000

  • max_history_size (int, optional) – History size used to approximate the Hessian, by default 5

  • init_alpha (float, optional) – Initial step size, by default 0.001

  • tol_obj (float, optional) – Convergence tolerance for the objective function, by default 1e-12

  • tol_rel_obj (float, optional) – Relative convergence tolerance for the objective function, by default 1e4

  • tol_grad (float, optional) – Convergence tolerance for the gradient norm, by default 1e-8

  • tol_rel_grad (float, optional) – Relative convergence tolerance for the gradient norm, by default 1e7

  • tol_param (float, optional) – Convergence tolerance for the changes in parameters, by default 1e-8

  • refresh (int, optional) – Number of iterations between progress messages, by default 0 (suppress messages)

  • num_threads (int, optional) – Number of threads to use for log density evaluations, by default -1 (use all available)

Returns:

An object containing the optimized parameter values and metadata from the algorithm.

Return type:

StanOutput

Raises:
  • ValueError – If any of the parameters are invalid or out of range.

  • RuntimeError – If there is an unrecoverable error during the algorithm.
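
To illustrate what the jacobian flag changes, the sketch below works through a one-parameter case in pure Python, without calling TinyStan. It assumes a Bernoulli likelihood with a flat prior on theta and a logit transform to the unconstrained scale; these modelling details are assumptions for illustration and are not taken from this page. With k successes in n trials, the optimum without the Jacobian term is k/n, while adding the term log(theta) + log(1 - theta) moves it to (k + 1)/(n + 2).

```python
import math


def log_density(u, y, jacobian):
    """Bernoulli log density with a flat prior, at unconstrained u = logit(theta)."""
    theta = 1.0 / (1.0 + math.exp(-u))
    k, n = sum(y), len(y)
    lp = k * math.log(theta) + (n - k) * math.log1p(-theta)
    if jacobian:
        # log |d theta / d u| for the logit transform
        lp += math.log(theta) + math.log1p(-theta)
    return lp


def grid_mode(y, jacobian, lo=-5.0, hi=5.0, steps=10000):
    """Maximize over a grid of u values and report the result as theta."""
    grid = (lo + (hi - lo) * i / steps for i in range(steps + 1))
    best_u = max(grid, key=lambda u: log_density(u, y, jacobian))
    return 1.0 / (1.0 + math.exp(-best_u))


if __name__ == "__main__":
    y = [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]
    print(round(grid_mode(y, jacobian=False), 3))  # close to 2/10
    print(round(grid_mode(y, jacobian=True), 3))   # close to 3/12
```

A grid search stands in for L-BFGS here only to keep the sketch dependency-free.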

pathfinder(data: str | PathLike | Mapping[str, Any] = '', *, num_paths: int = 4, inits: str | PathLike | Mapping[str, Any] | None = None, seed: int | None = None, id: int = 1, init_radius: float = 2.0, num_draws: int = 1000, max_history_size: int = 5, init_alpha: float = 0.001, tol_obj: float = 1e-12, tol_rel_obj: float = 10000.0, tol_grad: float = 1e-08, tol_rel_grad: float = 10000000.0, tol_param: float = 1e-08, num_iterations: int = 1000, num_elbo_draws: int = 25, num_multi_draws: int = 1000, calculate_lp: bool = True, psis_resample: bool = True, refresh: int = 0, num_threads: int = -1)[source]#

Run the Pathfinder algorithm to approximate the posterior. See https://mc-stan.org/docs/reference-manual/pathfinder.html for more information on the algorithm.

Parameters:
  • data (str | dict, optional) – The data to use for the model. This can be a path to a JSON file, a JSON string, or a dictionary. By default "" (the empty string).

  • num_paths (int, optional) – The number of individual runs of the algorithm to run in parallel, by default 4

  • inits (str | dict | list[str | dict] | None, optional) – Initial parameter values. This can be a single path to a JSON file, a JSON string, a dictionary, or a list of length num_paths of those. By default None.

  • seed (Optional[int], optional) – The seed to use for the random number generator. If not provided, a random seed will be generated.

  • id (int, optional) – ID for the first path, by default 1

  • init_radius (float, optional) – Radius to initialize unspecified parameters within. The parameter values are drawn uniformly from the interval [-init_radius, init_radius] on the unconstrained scale. By default 2.0

  • num_draws (int, optional) – Number of approximate draws drawn from each of the num_paths Pathfinders, by default 1000

  • max_history_size (int, optional) – History size used by the internal L-BFGS algorithm to approximate the Hessian, by default 5

  • init_alpha (float, optional) – Initial step size for the internal L-BFGS algorithm, by default 0.001

  • tol_obj (float, optional) – Convergence tolerance for the objective function for the internal L-BFGS algorithm, by default 1e-12

  • tol_rel_obj (float, optional) – Relative convergence tolerance for the objective function for the internal L-BFGS algorithm, by default 1e4

  • tol_grad (float, optional) – Convergence tolerance for the gradient norm for the internal L-BFGS algorithm, by default 1e-8

  • tol_rel_grad (float, optional) – Relative convergence tolerance for the gradient norm for the internal L-BFGS algorithm, by default 1e7

  • tol_param (float, optional) – Convergence tolerance for the changes in parameters for the internal L-BFGS algorithm, by default 1e-8

  • num_iterations (int, optional) – Maximum number of iterations for the internal L-BFGS algorithm, by default 1000

  • num_elbo_draws (int, optional) – Number of Monte Carlo draws used to estimate the ELBO, by default 25

  • num_multi_draws (int, optional) – Number of draws returned by Multi-Pathfinder, by default 1000

  • calculate_lp (bool, optional) – Whether to calculate the log probability of the approximate draws. If False, this also implies psis_resample=False. By default True

  • psis_resample (bool, optional) – Whether to use Pareto smoothed importance sampling on the approximate draws. If False, all num_paths * num_draws approximate samples will be returned. By default True.

  • refresh (int, optional) – Number of iterations between progress messages, by default 0 (suppress messages)

  • num_threads (int, optional) – Number of threads to use for Pathfinder, by default -1 (use all available)

Returns:

An object containing the samples and metadata from the algorithm.

Return type:

StanOutput

Raises:
  • ValueError – If any of the parameters are invalid or out of range.

  • RuntimeError – If there is an unrecoverable error during the algorithm.
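
The interaction between calculate_lp, psis_resample, num_draws, and num_multi_draws determines how many draws come back. The helper below encodes only what the parameter descriptions above state; it is a reading of this page, not a call into TinyStan, and edge cases such as num_paths=1 are not confirmed here. The function name is hypothetical.

```python
def expected_pathfinder_draws(
    *,
    num_paths=4,
    num_draws=1000,
    num_multi_draws=1000,
    calculate_lp=True,
    psis_resample=True,
):
    """Number of draws implied by the parameter descriptions above."""
    # calculate_lp=False implies psis_resample=False.
    resample = calculate_lp and psis_resample
    if resample:
        return num_multi_draws
    # Without resampling, every approximate draw from every path is returned.
    return num_paths * num_draws


if __name__ == "__main__":
    print(expected_pathfinder_draws())
    print(expected_pathfinder_draws(psis_resample=False))
```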

sample(data: str | PathLike | Mapping[str, Any] = '', *, num_chains: int = 4, inits: str | PathLike | Mapping[str, Any] | List[str | PathLike | Mapping[str, Any]] | None = None, seed: int | None = None, id: int = 1, init_radius: float = 2.0, num_warmup: int = 1000, num_samples: int = 1000, metric: HMCMetric = HMCMetric.DIAGONAL, init_inv_metric: ndarray | None = None, save_metric: bool = False, adapt: bool = True, delta: float = 0.8, gamma: float = 0.05, kappa: float = 0.75, t0: float = 10, init_buffer: int = 75, term_buffer: int = 50, window: int = 25, save_warmup: bool = False, stepsize: float = 1.0, stepsize_jitter: float = 0.0, max_depth: int = 10, refresh: int = 0, num_threads: int = -1)[source]#

Run Stan’s No-U-Turn Sampler (NUTS) to sample from the posterior. An in-depth explanation of the parameters can be found in the Stan documentation at https://mc-stan.org/docs/reference-manual/mcmc.html

Parameters:
  • data (str | dict, optional) – The data to use for the model. This can be a path to a JSON file, a JSON string, or a dictionary. By default "" (the empty string).

  • num_chains (int, optional) – The number of chains to run, by default 4

  • inits (str | dict | list[str | dict] | None, optional) – Initial parameter values. This can be a single path to a JSON file, a JSON string, a dictionary, or a list of length num_chains of those. By default None.

  • seed (Optional[int], optional) – The seed to use for the random number generator. If not provided, a random seed will be generated.

  • id (int, optional) – Chain ID for the first chain, by default 1

  • init_radius (float, optional) – Radius to initialize unspecified parameters within. The parameter values are drawn uniformly from the interval [-init_radius, init_radius] on the unconstrained scale. By default 2.0

  • num_warmup (int, optional) – Number of warmup iterations to run, by default 1000

  • num_samples (int, optional) – Number of samples to draw after warmup, by default 1000

  • metric (HMCMetric, optional) – The type of mass matrix to use in the sampler. The options are UNIT, DENSE, and DIAGONAL. By default HMCMetric.DIAGONAL

  • init_inv_metric (Optional[np.ndarray], optional) – Initial value for the mass matrix used by the sampler. Valid shapes depend on the value of metric. Can have a leading dimension of num_chains to specify different initial metrics for each chain.

  • save_metric (bool, optional) – Whether to report the final mass matrix, by default False

  • adapt (bool, optional) – Whether the sampler should adapt the step size and metric, by default True

  • delta (float, optional) – Target average acceptance probability, by default 0.8

  • gamma (float, optional) – Adaptation regularization scale, by default 0.05

  • kappa (float, optional) – Adaptation relaxation exponent, by default 0.75

  • t0 (float, optional) – Adaptation iteration offset, by default 10

  • init_buffer (int, optional) – Number of warmup samples to use for initial step size adaptation, by default 75

  • term_buffer (int, optional) – Number of warmup samples to use for step size adaptation after the metric is adapted, by default 50

  • window (int, optional) – Initial number of iterations to use for metric adaptation, which is doubled each time the adaptation window is hit, by default 25

  • save_warmup (bool, optional) – Whether to save the warmup samples, by default False

  • stepsize (float, optional) – Initial step size for the sampler, by default 1.0

  • stepsize_jitter (float, optional) – Amount of random jitter to add to the step size, by default 0.0

  • max_depth (int, optional) – Maximum tree depth for the sampler, by default 10

  • refresh (int, optional) – Number of iterations between progress messages, by default 0 (suppress messages)

  • num_threads (int, optional) – Number of threads to use for sampling, by default -1 (use all available)

Returns:

An object containing the samples and metadata from the sampling run.

Return type:

StanOutput

Raises:
  • ValueError – If any of the parameters are invalid or out of range.

  • RuntimeError – If there is an unrecoverable error during sampling.
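
The init_buffer, term_buffer, and window parameters together partition the warmup iterations. The sketch below is a simplified model of that schedule, written from the descriptions above plus one assumption: a window is stretched to the end of the metric-adaptation phase when the next, doubled window would not fit. It does not reproduce Stan's exact bookkeeping or its handling of very short warmups, and the function name is hypothetical.

```python
def metric_windows(num_warmup=1000, init_buffer=75, term_buffer=50, window=25):
    """Return (start, stop) iteration pairs for the metric-adaptation windows."""
    end = num_warmup - term_buffer  # the last term_buffer iterations adapt step size only
    start, size = init_buffer, window
    windows = []
    while start < end:
        stop = start + size
        # Assumption: absorb the remainder if the next doubled window would overrun.
        if stop + 2 * size > end:
            stop = end
        windows.append((start, stop))
        start, size = stop, size * 2
    return windows


if __name__ == "__main__":
    for start, stop in metric_windows():
        print(start, stop, stop - start)
```

With the defaults, this gives window lengths that double from 25 until the final window absorbs the remaining iterations before the terminal buffer.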

stan_version()[source]#

Return the version of Stan backing this model.

class tinystan.HMCMetric[source]#

Choices for the structure of the mass matrix used in the HMC sampler.

DENSE#
DIAGONAL#
UNIT#

class tinystan.OptimizationAlgorithm[source]#

Choices for the optimization algorithm to use.

BFGS#
LBFGS#
NEWTON#

Inference outputs#

class tinystan.StanOutput[source]#

A holder for the output of a Stan run.

The data attribute contains the raw output from Stan.

If a specific parameter is needed, it can be extracted using the get() method, or by using the object as a dictionary.

Additional attributes may be available depending on the algorithm used, such as hessian or metric.

create_inits(*, chains: int = 4, seed: int | None = None) Dict[str, ndarray] | List[Dict[str, ndarray]][source]#

Create a dictionary of parameters suitable for initializing a new Stan run.

Parameters:
  • chains (int, optional) – Number of chains needed, by default 4

  • seed (Optional[int], optional) – The seed to use for the random number generator. If not provided, a random seed will be generated.

Returns:

A dictionary of parameters, or a list of dictionaries if chains > 1.

Return type:

Union[Dict[str, np.ndarray], List[Dict[str, np.ndarray]]]
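
As a sketch of how create_inits() might feed a later run, the helper below requests one set of initial values per chain from an earlier result and passes them to sample(). The helper name is hypothetical; model is assumed to be a tinystan.Model and previous a StanOutput.

```python
def sample_from_previous(model, previous, data, *, num_chains=4, seed=None):
    """Start a sampling run from initial values derived from an earlier result."""
    # Returns a single dict when num_chains == 1, otherwise a list of dicts;
    # sample() accepts either form for `inits`.
    inits = previous.create_inits(chains=num_chains, seed=seed)
    return model.sample(data, num_chains=num_chains, inits=inits, seed=seed)
```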

property data: ndarray#

The underlying draws from the Stan model.

get(key: str) ndarray[source]#

Extract a parameter from the Stan output. Synonym for obj[key].

Parameters:

key (str) – name of the parameter to extract

Returns:

The parameter values. Shape depends on the Stan type and algorithm used.

Return type:

np.ndarray

property parameters: List[str]#

The names of the parameters in the Stan model.
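
A short sketch of dictionary-style access combined with the parameters property: the helper below computes the mean of every parameter in an output object, mirroring the fit["theta"].mean() calls in the example program. The function name posterior_means is hypothetical.

```python
def posterior_means(output):
    """Map each parameter name to the mean of its draws."""
    # output[name] is equivalent to output.get(name) and returns an array of draws.
    return {name: float(output[name].mean()) for name in output.parameters}
```

The mean is taken over all array dimensions, so for non-scalar parameters this collapses every element into one number; index the array first if per-element summaries are needed.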

Compilation utilities#

tinystan.compile_model(stan_file: str | PathLike, *, stanc_args: List[str] = [], make_args: List[str] = []) Path[source]#

Run TinyStan’s Makefile on a .stan file, creating the .so used by the Model class.

This function requires the presence of the TinyStan source code. It will download the source code if it is not found. A manual location can be used instead by calling set_tinystan_path().

Parameters:
  • stan_file (Union[str, os.PathLike]) – A path to a Stan model file.

  • stanc_args (List[str], optional) – A list of arguments to pass to stanc3. For example, ["--O1"] will enable compiler optimization level 1.

  • make_args (List[str], optional) – A list of additional arguments to pass to Make. If the same flags are defined in make/local, the versions passed here will take precedence.

Returns:

The path to the compiled .so file.

Return type:

Path
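
Because Model accepts either a Stan file or a shared object, compilation and loading can be separated. The sketch below compiles first and then loads the resulting path; the helper name compile_then_load is hypothetical and the "--O1" flag is the example from the parameter description above.

```python
def compile_then_load(stan_file):
    """Compile a Stan file explicitly, then load the resulting shared object."""
    import tinystan  # imported lazily; requires the package and a working toolchain

    so_path = tinystan.compile_model(stan_file, stanc_args=["--O1"])
    return tinystan.Model(so_path)
```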

tinystan.set_tinystan_path(path: str) None[source]#

Set the path to TinyStan.

This is useful for development or if the automatic download is not desired. If the path is invalid, this function will raise an error.

This should point to the top-level folder of the repository.