[ot-users] Nonsensical output from computeQuantile
regis lebrun
regis_anne.lebrun_dutfoy at yahoo.fr
Tue Sep 12 11:16:41 CEST 2017
Hi Phil,
Many thanks for your feedback. I see several points in your question:
+ the quantile function is defined over R and not only [0, 1], strange
+ quantile(0) should be equal to -inf and quantile(1) to inf for unbounded distributions
+ the quantile function should be symmetric wrt 1/2 for symmetric distributions
+ the quantile function should be nondecreasing
+ the quantile function should be continuous for continuous distributions
If we adopt a pure mathematical point of view, for an unbounded distribution such as the normal distribution, we would expect quantile(x)=NaN or an exception to be thrown for x outside of [0,1], quantile(0)=-inf and quantile(1)=inf. The point is that the quantile function is used by the iso-probabilistic transformations, which are used in several meta-modeling algorithms. If the quantile function is restricted to the range [0,1], for example by throwing an exception, then each time the meta-model is evaluated outside of the bounding box of the learning sample, an exception is thrown. For many applications, a saturation (ie a projection on the bounding box of the learning sample) of the input is much more adapted so the choice we made. It answers the two first points.
Concerning the lack of symmetry between the lower tail and the upper tail, it is due to the fact that floating point numbers are NOT symmetric wrt 1/2. You have much more positive floating point numbers in the neighborhood of 0 than in the neighborhood of 1, due to the denormalized numbers. The smallest positive floating point number is SpecFunc.MinScalar=2.225e-308, while the gap between 1 and the largest floating point number less than 1 is SpecFunc.ScalarPrecision=2.220e-16. In the case where things are done carefully, eg by using the relation F^{-1}(G(x))=\bar{F}^'-1}(\bar{G}(x)) for x large, where F,G are two CDFs, \bar{F}=1-F, \bar{G}=1-G, it allows to extend the range of the isoprobabilistic transformations from about [-8.12, 8.12] to [-37.5, 37.5] in the standard normal space.
-> it was an historical decision, today it looks weird.
Concerning the last two points, there is definitely a bug in OpenTURNS. For numerical reasons (mainly to be able to compute integrals using the Gauss-Kronrod method), all the distributions have a range, which is the bounding box of their mathematical support (ie the closure of the set where the PDF is positive for absolutely continuous distributions, the set of points with positive probability for discrete distributions). The range is stored as an Interval object: the values of the bounds are used by the numerical methods, and the flags tell the user if the distribution is actually bounded or not. If the distribution is bounded, the value of the bound is the exact value, otherwise it is an extreme quantile (epsilon and 1-epsilon with epsilon=1e-14 by default, see ResourceMap.GetAsScalar("Distribution-DefaultCDFEpsilon")).
At one point, I decided to use the range to provide a meaningful value for quantile(0), quantile(1) and more generally quantile(q) for q outside of (0,1). It works perfectly well for bounded distributions, but not for continuous distributions as eg quantile(1e-15) < quantile(0)! The key point is that the range stores both exact and approximate data, and only the exact one should be used in the implementation of the computeQuantile() method.
I can see two fixes:
1) Use the range information in the computation of quantiles only for finite bounds
2) Restrict the computeQuantile() method to (epsilon, 1.0-epsilon) and extend it using the numerical information in range.
I clearly prefer the first solution, but I have to check that nothing goes wrong in the algorithms that rely on computeQuantile() instead of getRange() to have a clue on the extent of the distributions.
Cheers
Régis
________________________________
De : Phil Fernandes <phil.fernandes at enbridge.com>
À : Julien Schueller | Phimeca <schueller at phimeca.com>; "users at openturns.org" <users at openturns.org>
Envoyé le : Mardi 5 septembre 2017 19h00
Objet : Re: [ot-users] Nonsensical output from computeQuantile
Hi Julien,
Thank you for your reply. However, shouldn’t the quantiles of continuous unbounded distributions at p=0 and p=1 be undefined and return NaN (or -/+inf if we consider the limits)? What is the purpose of real-numbered boundary values in these cases? If this is a coding convenience, then why does ot.Normal().computeQuantile() not return quantiles that are symmetric about 0 for ot.SpecFunc.MinScalar and 1-ot.SpecFunc.Precision?
Phil
From:users-bounces at openturns.org [mailto:users-bounces at openturns.org] On Behalf Of Julien Schueller | Phimeca
Sent: Monday, September 04, 2017 9:37 AM
To: users at openturns.org
Subject: [External] Re: [ot-users] Nonsensical output from computeQuantile
Maybe we don't want to throw, but I'm not sure.
We never throw, but there are some other cases where the quantile function is not really continuous at the bounds, because in DistributionImplementation::computeQuantile(L2063) we filter according to the values defined in the range attribute:
import openturns as ot
import traceback
factories = ot.DistributionFactory.GetUniVariateFactories()
for factory in factories:
dist = factory.build()
try:
q0 = dist.computeQuantile(0.0)
qm1 = dist.computeQuantile(-1.0)
if q0 != qm1:
print(dist.getName(), '<0', q0, qm1)
q1 = dist.computeQuantile(1.0)
q0p = dist.computeQuantile(ot.SpecFunc.MinScalar)
if q0 != q0p:
print(dist.getName(), '0+', q0, q0p)
except:
print(' exc for', dist.getName())
traceback.print_exc()
pass
try:
q1 = dist.computeQuantile(1.0)
q2 = dist.computeQuantile(2.0)
if q1 != q2:
print(dist.getName(), '>1', q1, q2)
q1m = dist.computeQuantile(1.0-ot.SpecFunc.ScalarEpsilon)
if q1m != q1:
print(dist.getName(),'1-', q1m, q1)
except:
print(' exc for', dist.getName())
traceback.print_exc()
pass
________________________________
De :users-bounces at openturns.org <users-bounces at openturns.org> de la part de Julien Schueller | Phimeca <schueller at phimeca.com>
Envoyé : lundi 4 septembre 2017 16:40:25
À : users at openturns.org
Objet : Re: [ot-users] Nonsensical output from computeQuantile
Hello Phil,
- Calling computeQuantile(x) with p outside of (0,1) does not make sense as x is a probability, and it should throw an exception, maybe we should fix that.
- Otherwise the boundary values at x=0 and x=1 seem perfectly fine, (and should not be -/+inf) regarding the continuity of the fonction:
In [32]: ot.Normal().computeQuantile(ot.SpecFunc.MinScalar)
Out[32]: class=Point name=Unnamed dimension=1 values=[-37.5016]
In [42]: ot.Normal().computeQuantile(1-ot.SpecFunc.Precision)
Out[42]: class=Point name=Unnamed dimension=1 values=[8.12589]
j
________________________________
De :users-bounces at openturns.org <users-bounces at openturns.org> de la part de Phil Fernandes <phil.fernandes at enbridge.com>
Envoyé : mercredi 23 août 2017 22:37:37
À : users at openturns.org
Objet : [ot-users] Nonsensical output from computeQuantile
Hello,
I’ve noticed that the computeQuantile function, at least for continuous unbounded distributions, yields a nonsensical numerical result with inputs <=0 and >=1. For example, ot.Normal().computeQuantile(x) when x = 0 the output is -37.5194. If I enter x < 0, the output is -7.65063. At the moment I am handling this by subclassing the distribution and overriding the computeQuantile method with an if-else statement that returns –np.inf or np.inf if x = 0 or 1, but this is cumbersome. Would it be possible to incorporate this fix into the source code?
Thank you,
Phil
_______________________________________________
OpenTURNS users mailing list
users at openturns.org
http://openturns.org/mailman/listinfo/users
More information about the users
mailing list