[ot-users] Sample transformation

roy roy at cerfacs.fr
Wed Oct 11 00:21:21 CEST 2017


Hi Régis,

Not sure about the leak as I only do python.
But using the tool I know, I was not able to free the memory(using some del and gc.collect()).

I saw the issue when constructing a model on a cluster (Quadrature with 121 points, degree 10 in 2d) and the batch manager killed the job
due to memory consumption. On my Mac the memory goes to 3-4 Go for this but on the cluster it explodes.

As always, thanks for the quick reply :)


Pamphile ROY
Chercheur doctorant en Quantification d’Incertitudes
CERFACS - Toulouse (31) - France
+33 (0) 5 61 19 31 57
+33 (0) 7 86 43 24 22



> Le 10 oct. 2017 à 23:13, regis lebrun <regis_anne.lebrun_dutfoy at yahoo.fr> a écrit :
> 
> Hi Pamphil,
> 
> Nice to know that the code *seems* to work ;-)
> 
> Are you sure that there is a memory leak? The algorithm creates potentially large objects, which are stored into the FunctionalChaosResult member of the algorithm. If there is a pending reference to this object, the memory will not be released. Maybe Denis, Julien or Sofiane have more insight on this point?
> 
> Cheers
> 
> Régis
> 
> 
> 
> ________________________________
> De : roy <roy at cerfacs.fr>
> À : regis lebrun <regis_anne.lebrun_dutfoy at yahoo.fr> 
> Cc : users <users at openturns.org>
> Envoyé le : Mardi 10 octobre 2017 6h35
> Objet : Re: [ot-users] Sample transformation
> 
> 
> 
> Hi Regis,
> 
> Thanks for this long and well detailed answer!
> The code you provided seems to work as expected.
> 
> However during my tests I noticed that the memory was not freed correctly.
> Once the class FunctionalChaosAlgorithm is called, there is a memory bump and even after calling del
> and gc.collect(), memory is still not freed (using memory_profiler for that). Might be a memory leak?
> 
> Kind regards,
> 
> Pamphile ROY
> Chercheur doctorant en Quantification d’Incertitudes
> CERFACS - Toulouse (31) - France
> +33 (0) 5 61 19 31 57
> +33 (0) 7 86 43 24 22
> 
> 
> 
> Le 7 oct. 2017 à 19:59, regis lebrun <regis_anne.lebrun_dutfoy at yahoo.fr> a écrit :
>> 
>> Hi Pamphil,
>> 
>> You were almost right: the AdaptiveStieltjesAlgorithm is very close to what you are looking for, but not exactly what you need. It is the algorithmic part of the factory of orthonormal polynomials, the class you have to use is StandardDistributionPolynomialFactory, ie a factory (=able to build something) and not an algorithm (=something able to compute something). You have all the details here:
>> 
>> http://openturns.github.io/openturns/master/user_manual/_generated/openturns.StandardDistributionPolynomialFactory.html
>> 
>> I agree on the fact that the difference is quite subtle, as it can be seen by comparing the API of the two classes. The distinction was made at a time were several algorithms were competing for the task (GramSchmidtAlgorithm, ChebychevAlgorithm) but in fact the AdaptiveStieltjesAlgorithm proved to be much more accurate and reliable than the other algorithms, and now it is the only orthonormalization algorithm available.
>> 
>> Another subtle trick is the following.
>> 
>> If you create a basis this way:
>> basis = ot.StandardDistributionPolynomialFactory(dist)
>> you will get the basis associated to the *standard representative* distribution in the parametric family to which dist belongs. It means the distribution with zero mean and unit variance, or with support equals to [-1, 1], or dist itself if no affine transformation is able to reduce the number of parameters of the distribution. 
>> It is managed automatically within the FunctionalChaosAlgorithm, but can be disturbing if you do things by hand.
>> 
>> If you create a basis this way:
>> basis = ot.StandardDistributionPolynomialFactory(ot.AdaptiveStieltjesAlgorithm(dist))
>> then the distribution is preserved, and you get the orthonormal polynomials corresponding to dist. Be aware of the fact that the algorithm may have hard time to build the polynomials if your distribution is far away from its standard representative, as it may involves the computation of recurrence coefficients with a much wider range of variation. The benefit is that the orthonormality measure is exactly your distribution, assuming that its copula is the independent one, so you don't have to introduce a marginal transformation between both measures.
>> 
>> Some additional remarks:
>> + it looks like dist has dimension>1, as you extract its marginal distributions later on. AdaptiveStieltjesAlgorithm and StandardDistributionPolynomialFactory only work with 1D distributions (it is not checked by the library, my shame). What you have to do is:
>> 
>> basis = ot.OrthogonalProductPolynomialFactory([ot.StandardDistributionPolynomialFactory(ot.AdaptiveStieltjesAlgorithm(dist.getMarginal(i))) for i in range(dist.getDimension())])
>> Quite a long line, I know...
>> It will build a multivariate polynomial basis orthonormal with respect to the product distribution (ie with independent copula) sharing the same 1D marginal distributions as dist.
>> 
>> 
>> After that, everything will work as expected and you will NOT have to build the transformation (if you build it it will coincide with the identity function). If you encounter performance issues (the polynomials of high degrees take ages to be built as in http://trac.openturns.org/ticket/885, or there is an overflow, or the numerical precision is bad) then use:
>> basis = ot.OrthogonalProductPolynomialFactory([ot.StandardDistributionPolynomialFactory(dist.getMarginal(i)) for i in range(dist.getDimension())])
>> and build the transformation the way you do it.
>> 
>> + if you use the FunctionalChaosAlgorithm class by providing an input sample and an output sample, you also have to provide the weights of the input sample EVEN IF the experiment given in the projection strategy would allow to recompute them. It is because the fact that you provide the input sample overwrite the weighted experiment of the projection stratey by a FixedExperiment doe.
>> 
>> I attached two complete examples: one using the exact marginal distributions and the other using the standard representatives.
>> 
>> Best regards
>> 
>> Régis
>> 
>> ________________________________
>> De : roy <roy at cerfacs.fr>
>> À : regis lebrun <regis_anne.lebrun_dutfoy at yahoo.fr> 
>> Cc : users <users at openturns.org>
>> Envoyé le : Vendredi 6 octobre 2017 14h22
>> Objet : Re: [ot-users] Sample transformation
>> 
>> 
>> 
>> Hi Regis,
>> 
>> Thank you for this detailed answer.
>> 
>> - I am using the latest release from conda (OT 1.9, python 3.6.2, latest numpy, etc.) ,
>> - For the sample, I need it to generate externally the output (cost code that cannot be integrated into OT as model),
>> - I have to convert ot.Sample into np.array because it is then used by other functions to create the simulations, etc.
>> 
>> If I understood correctly, I can create the projection strategy using this snippet:
>> 
>> basis = ot.AdaptiveStieltjesAlgorithm(dist)
>> measure = basis.getMeasure()
>> quad = ot.Indices(in_dim)
>> for i in range(in_dim):
>>  quad[i] = degree + 1
>> 
>> comp_dist = ot.GaussProductExperiment(measure, quad)
>> proj_strategy = ot.IntegrationStrategy(comp_dist)
>> 
>> inv_trans = ot.Function(ot.MarginalTransformationEvaluation([measure.getMarginal(i) for i in range(in_dim)], distributions))
>> sample = np.array(inv_trans(comp_dist.generate()))
>> 
>> 
>> It seems to work. Except that the basis does not work with ot.FixedStrategy(basis, dim_basis). I get a non implemented method error.
>> 
>> After I get the sample and the corresponding output, what is the way to go? Which arguments do I need to use for the
>> ot.FunctionalChaosAlgorithm? 
>> 
>> I am comparing the Q2 and on Ishigami and I was only able to get correct results using:
>> 
>> pc_algo = ot.FunctionalChaosAlgorithm(sample, output, dist, trunc_strategy)
>> 
>> But for least square strategy I had to use this:
>> 
>> pc_algo = ot.FunctionalChaosAlgorithm(sample, output)
>> 
>> 
>> Is it normal?
>> 
>> 
>> Pamphile ROY
>> Chercheur doctorant en Quantification d’Incertitudes
>> CERFACS - Toulouse (31) - France
>> +33 (0) 5 61 19 31 57
>> +33 (0) 7 86 43 24 22
>> 
>> 
>> 
>> Le 5 oct. 2017 à 15:40, regis lebrun <regis_anne.lebrun_dutfoy at yahoo.fr> a écrit :
>> 
>> 
>>> Hi Pamphile,
>>> 
>>> 
>>> 1) The problem:
>>> The problem you get is due to the fact that in your version of OpenTURNS (1.7 I suppose), the GaussProductExperiment class has a different way to handle the input distribution than the other WeightedExperiment classes: it generates the quadrature rule of the *standard representatives* of the marginal distributions instead of the marginal distributions. It does not change the rate of convergence of the PCE algorithm and allows to use specific algorithms for distributions with known orthonormal polynomials. It is not explained in the documentation and if you ask the doe for its distribution it will give you the initial distribution instead of the standardized one.
>>> 
>>> 2) The mathematical background:
>>> The generation of quadrature rules for arbitrary 1D distributions is a badly conditioned problem. Even if the quadrature rule is well-defined (existence of moments of any order, distribution characterized by these moments), the application that maps the recurrence coefficients of the orthogonal polynomials to their value can have a very large condition number. As a result, the adaptive integration used to compute the recurrence coefficients of order n, based on the values of the polynomials of degree n-1 and n-2, can lead to wrong values and all the process falls down.
>>> 
>>> 3) The current state of the software:
>>> Since version 1.8, OpenTURNS no more generates the quadrature rule of the standard representatives, but the quadrature rule of the actual marginal distributions. The AdaptiveStieltjesAlgorithm class, introduced in release 1.8, is much more robust than the previous orthonormalization algorithms and is able to handle even stiff problems. There are still difficult situations (distributions with discontinuous PDF inside of the range, fixed in OT 1.9, or really badly conditioned distributions, hopefully fixed when ticket#885 will be solved) but most usual situations are under control even with marginal degrees of order 20.
>>> 
>>> 4) The (probable) bug in your code and the way to solve it
>>> You must be aware of the fact that the distribution you put into your WeightedExperiment object will be superseded by the distribution corresponding to your OrthogonalBasisFactory inside of the FunctionalChaosAlgorithm. If you need to have the input sample before to run the functional chaos algorithm, then you have to build your transformation by hand. Assuming that you already defined your projection basis called 'myBasis', your marginal integration degrees 'myDegrees' and your marginal distributions 'myMarginals', you have to write (in OT 1.7):
>>> 
>>> # Here the explicit cast into a NumericalMathFunction is to be able to evaluate the transformation over a sample
>>> myTransformation = ot.NumericalMathFunction(ot.MarginalTransformationEvaluation([myBasis.getDistribution().getMarginal(i) for i in range(dimension), myMarginals))
>>> sample = myTransformation(ot.GaussProductExperiment(myBasis.getDistribution(), myDegrees).generate())
>>> 
>>> 
>>> You should avoid to cast OT objects into np objects as much as possible, and if you cannot avoid these casts you should do them only in the sections where they are needed. They can be expansive for large objects, and if the sample you get from generate() is used only as an argument of a NumericalMathFunction, then it will be converted back into a NumericalSample!
>>> 
>>> Best regards
>>> 
>>> Régis
>>> ________________________________
>>> De : roy <roy at cerfacs.fr>
>>> À : users <users at openturns.org> 
>>> Envoyé le : Jeudi 5 octobre 2017 11h13
>>> Objet : [ot-users] Sample transformation
>>> 
>>> 
>>> 
>>> Hi,
>>> 
>>> I am facing consistency concerns in the API regarding distributions and sampling.
>>> 
>>> The initial goal was to get the sampling for Polynomial Chaos as I must not use the model variable.
>>> So for least square strategy I do something like this:
>>> 
>>> proj_strategy = ot.LeastSquaresStrategy(montecarlo_design)
>>> sample = np.array(proj_strategy.getExperiment().generate())
>>> 
>>> sample is correct as the bounds of each feature lie in the corresponding ranges.
>>> 
>>> But now if I want to use IntegrationStrategy:
>>> 
>>> ot.IntegrationStrategy(ot.GaussProductExperiment(dists, list))
>>> sample = np.array(proj_strategy.getExperiment().generate())
>>> 
>>> sample’s outputs lie between [-1, 1] which does not corresponds to the distribution I have initially.
>>> 
>>> So I used the conversion class but it does not work well with GaussProductExperiment as it requires [0, 1] instead of [-1, 1].
>>> 
>>> Thus I use this hack:
>>> 
>>> # Convert from [-1, 1] -> input distributions
>>> marg_inv_transf = ot.MarginalTransformationEvaluation(distributions, 1)
>>> sample = (proj_strategy.getExperiment().generate() + 1) / 2.
>>> 
>>> 
>>> Is it normal that the distribution classes are not returning in the same intervals?
>>> 
>>> 
>>> Thanks for your support!
>>> 
>>> 
>>> Pamphile ROY
>>> Chercheur doctorant en Quantification d’Incertitudes
>>> CERFACS - Toulouse (31) - France
>>> +33 (0) 5 61 19 31 57
>>> +33 (0) 7 86 43 24 22
>>> 
>>> 
>>> _______________________________________________
>>> OpenTURNS users mailing list
>>> users at openturns.org
>>> http://openturns.org/mailman/listinfo/users
>>> <example.py><example_standard.py>
> 
> _______________________________________________
> OpenTURNS users mailing list
> users at openturns.org
> http://openturns.org/mailman/listinfo/users

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://openturns.org/pipermail/users/attachments/20171011/cd60b591/attachment.html>


More information about the users mailing list