python - Fitting a Binomial distribution with pymc raises ZeroProbability error for certain FillValues -
i'm not sure if found bug in pymc. seems fitting binomial missing data can produce zeroprobability
error depending on chosen fill_value masks missing data. maybe i'm using wrongly. tried following example current master branch github. i'm aware of bug concerning binomial distributions in pymc 2.3.4, seems different issue.
i fitted binomial distribution pymc , worked expected:
import scipy sp import pymc def make_model(observed_values): p = pymc.uniform('p', lower = 0.0, upper = 1.0, value = 0.1) values = pymc.binomial('values', n = 10* sp.ones_like(observed_values), p = p * sp.ones_like(observed_values),\ value = observed_values, observed = true, plot = false) values = pymc.binomial('values', n = 10, p = p,\ value = observed_values, observed = true, plot = false) return locals() sp.random.seed(0) observed_values = sp.random.binomial(n = 10.0, p = 0.1, size = 100) m1 = pymc.mcmc(make_model(observed_values)) m1.sample(iter=10000, burn=1000, thin=10) pymc.matplot.plot(m1) m1.summary()
output:
[-----------------100%-----------------] 10000 of 10000 complete in 0.7 sec plotting p p: mean sd mc error 95% hpd interval ------------------------------------------------------------------ 0.093 0.007 0.0 [ 0.081 0.107] posterior quantiles: 2.5 25 50 75 97.5 |---------------|===============|===============|---------------| 0.08 0.088 0.093 0.097 0.106
now, tried similar situation difference 1 observed value missing:
mask = sp.zeros_like(observed_values) mask[0] = true masked_values = sp.ma.masked_array(observed_values, mask = mask, fill_value = 999999) m2 = pymc.mcmc(make_model(masked_values)) m2.sample(iter=10000, burn=1000, thin=10) pymc.matplot.plot(m2) m2.summary()
unexpectedly, got zeroprobability
error:
--------------------------------------------------------------------------- zeroprobability traceback (most recent call last) <ipython-input-16-4f945f269628> in <module>() ----> 1 m2 = pymc.mcmc(make_model(masked_values)) 2 m2.sample(iter=10000, burn=1000, thin=10) 3 pymc.matplot.plot(m2) 4 m2.summary() <ipython-input-12-cb8707bb911f> in make_model(observed_values) 4 def make_model(observed_values): 5 p = pymc.uniform('p', lower = 0.0, upper = 1.0, value = 0.1) ----> 6 values = pymc.binomial('values', n = 10* sp.ones_like(observed_values), p = p * sp.ones_like(observed_values), value = observed_values, observed = true, plot = false) 7 values = pymc.binomial('values', n = 10, p = p, value = observed_values, observed = true, plot = false) 8 return locals() /home/fabian/anaconda/lib/python2.7/site-packages/pymc/distributions.pyc in __init__(self, *args, **kwds) 318 logp_partial_gradients=logp_partial_gradients, 319 dtype=dtype, --> 320 **arg_dict_out) 321 322 new_class.__name__ = name /home/fabian/anaconda/lib/python2.7/site-packages/pymc/pymcobjects.pyc in __init__(self, logp, doc, name, parents, random, trace, value, dtype, rseed, observed, cache_depth, plot, verbose, isdata, check_logp, logp_partial_gradients) 773 if check_logp: 774 # check initial value --> 775 if not isinstance(self.logp, float): 776 raise valueerror( 777 "stochastic " + /home/fabian/anaconda/lib/python2.7/site-packages/pymc/pymcobjects.pyc in get_logp(self) 930 (self._value, self._parents.value)) 931 else: --> 932 raise zeroprobability(self.errmsg) 933 934 return logp zeroprobability: stochastic values's value outside support, or forbids parents' current values.
however, if change fill value in masked array 1, fitting works again:
masked_values2 = sp.ma.masked_array(observed_values, mask = mask, fill_value = 1) m3 = pymc.mcmc(make_model(masked_values2)) m3.sample(iter=10000, burn=1000, thin=10) pymc.matplot.plot(m3) m3.summary()
output:
[-----------------100%-----------------] 10000 of 10000 complete in 2.1 sec plotting p p: mean sd mc error 95% hpd interval ------------------------------------------------------------------ 0.092 0.007 0.0 [ 0.079 0.105] posterior quantiles: 2.5 25 50 75 97.5 |---------------|===============|===============|---------------| 0.079 0.088 0.092 0.097 0.105 values: mean sd mc error 95% hpd interval ------------------------------------------------------------------ 1.15 0.886 0.029 [ 0. 3.] posterior quantiles: 2.5 25 50 75 97.5 |---------------|===============|===============|---------------| 0.0 1.0 1.0 2.0 3.0
is bug or there problem model? help!
i asked question pymc developers on github , there got answer fonnesbeck (https://github.com/pymc-devs/pymc/issues/47#issuecomment-129301002):
this because filled missing values 999999, outside support of variable. need give valid value, not 1 occurs in non-missing data."
to make sure missing value not in non-missing data following suggested:
you can give non-integer value in order avoid problem cite. example, if fill with, say, 1.5 should work.
(...) pymc calculates log-probability @ first iteration, , therefore values inserted missing values @ first iteration have valid. if give discrete values floating point value, should end getting truncated when converted integers, work."
Comments
Post a Comment