[ot-users] Sample transformation

roy roy at cerfacs.fr
Tue Nov 28 23:01:50 CET 2017


Hi Regis,

On the 1.10, I get this error calling the FunctionalChaosAlgorithm:

  File "/Users/roy/Applications/miniconda3/envs/batman3/lib/python3.6/site-packages/openturns/metamodel.py", line 3849, in __init__
    this = _metamodel.new_FunctionalChaosAlgorithm(*args)
TypeError: InvalidArgumentException : Error: the GaussProductExperiment can only be used with distributions having an independent copula.

But this was working on 1.9. I do not understand the issue as the distribution is an ot.ComposedDistribution. I tried to explicitly add ot.IndependentCopula
without any change.

Thanks in advance,


Pamphile ROY
Chercheur doctorant en Quantification d’Incertitudes
CERFACS - Toulouse (31) - France
+33 (0) 5 61 19 31 57
+33 (0) 7 86 43 24 22



> Le 21 oct. 2017 à 21:22, roy <roy at cerfacs.fr> a écrit :
> 
> Hi Regis,
> 
> For information, I have trouble pickling the class ot.FixedStrategy.
> From the example script : 
> 
> adaptiveStrategy = ot.FixedStrategy(basis, enumerateFunction.getStrataCumulatedCardinal(deg))
> 
> If I try to pickle this :
> 
> import pickle
> path = './model.dat'
> with open(path, 'wb') as f:
>     pickler = pickle.Pickler(f)
>     pickler.dump(adaptiveStrategy)
> 
> This works but then the deserialization does not :
> 
> with open(path, 'rb') as f:
>     unpickler = pickle.Unpickler(f)
>     adaptiveStrategy = unpickler.load()
> 
> I get this error:
> 
> Traceback (most recent call last):
>   File "example.py", line 58, in <module>
>     adaptiveStrategy = unpickler.load()
>   File "/Users/roy/Applications/miniconda3/envs/batman3/lib/python3.6/site-packages/openturns/common.py", line 344, in Object___setstate__
>     self.__init__()
>   File "/Users/roy/Applications/miniconda3/envs/batman3/lib/python3.6/site-packages/openturns/metamodel.py", line 1908, in __init__
>     this = _metamodel.new_FixedStrategy(*args)
> NotImplementedError: Wrong number or type of arguments for overloaded function 'new_FixedStrategy'.
>   Possible C/C++ prototypes are:
>     OT::FixedStrategy::FixedStrategy(OT::OrthogonalBasis const &,OT::UnsignedInteger const)
>     OT::FixedStrategy::FixedStrategy(OT::FixedStrategy const &)
> 
> Thanks in advance.
> 
> 
> Pamphile ROY
> Chercheur doctorant en Quantification d’Incertitudes
> CERFACS - Toulouse (31) - France
> +33 (0) 5 61 19 31 57
> +33 (0) 7 86 43 24 22
> 
> 
> 
>> Le 12 oct. 2017 à 16:16, roy <roy at cerfacs.fr <mailto:roy at cerfacs.fr>> a écrit :
>> 
>> Hi Regis,
>> 
>> This is great thanks. It is now working as expected.
>> Maybe this can be clarified in the documentation.
>> 
>> 
>> Pamphile ROY
>> Chercheur doctorant en Quantification d’Incertitudes
>> CERFACS - Toulouse (31) - France
>> +33 (0) 5 61 19 31 57
>> +33 (0) 7 86 43 24 22
>> 
>> 
>> 
>>> Le 12 oct. 2017 à 00:11, regis lebrun <regis_anne.lebrun_dutfoy at yahoo.fr <mailto:regis_anne.lebrun_dutfoy at yahoo.fr>> a écrit :
>>> 
>>> I found it. In order to speed up the computation of the coefficients of the polynomial expansion, we developed a class named DesignProxy, which acts like a cache for the evaluation of the multivariate basis oven the input sample. Essentially, it contains a large matrix, with a default size given by ResourceMap.GetAsUnsignedInteger("DesignProxy-DefaultCacheSize") and equals to 16777216, it means 128Mo.
>>> 
>>> 
>>> So if you add ot.ResourceMap.SetAsUnsignedInteger("DesignProxy-DefaultCacheSize", smallSize) with smallSize adapted to your memory budget (eg. smallSize=0), then everything should be ok.
>>> 
>>> You can also run the algorithm on the whole output sample. The DesignProxy instance is built once and shared among the different marginals. You can see that the memory cost of the algorithm is essentially the same for an output sample of dimension 1 or 14. Concerning the computation time, a part of the computation is shared between the marginals so the total cost is not proportional to the output dimension, even if no parallelization is implemented here (but the linear algebra is already parallelized using threads).
>>> 
>>> Tell me if it solved your problem!
>>> 
>>> Régis
>>> 
>>> ________________________________
>>> De : roy <roy at cerfacs.fr <mailto:roy at cerfacs.fr>>
>>> À : regis lebrun <regis_anne.lebrun_dutfoy at yahoo.fr <mailto:regis_anne.lebrun_dutfoy at yahoo.fr>> 
>>> Envoyé le : Mercredi 11 octobre 2017 10h40
>>> Objet : Re: [ot-users] Sample transformation
>>> 
>>> 
>>> 
>>> I was able to make an extract.
>>> 
>>> I am fitting a case with functional output. So to parallelize the fitting I use a function that independently construct a model per feature.
>>> The memory consumption is coming from every call to run() with a bump of ~130 Mo each time. Maybe OT can handle itself the parallelization?
>>> I saw that it was working without needing the loop, so maybe I should do that instead.
>>> 
>>> But still, 130 Mo for a model is quite a lot.
>>> 
>>> 
>>> Cheers,
>>> 
>>> Pamphile ROY
>>> Chercheur doctorant en Quantification d’Incertitudes
>>> CERFACS - Toulouse (31) - France
>>> +33 (0) 5 61 19 31 57
>>> +33 (0) 7 86 43 24 22
>>> 
>>> 
>>> 
>>> Le 11 oct. 2017 à 08:44, regis lebrun <regis_anne.lebrun_dutfoy at yahoo.fr <mailto:regis_anne.lebrun_dutfoy at yahoo.fr>> a écrit :
>>>> 
>>>> ouch! 3-4 Go is crazy! Do you have any script to share in order to help us catching the bug? I use the FunctionalChaosAlgorithm class more than often and I never faced this kind of behavior. If there is a bug it should be a good thing to catch it asap: we enter the 1.10 release candidate phase, a good slot to fix this kind of bugs.
>>>> 
>>>> Cheers
>>>> 
>>>> Régis
>>>> 
>>>> 
>>>> 
>>>> ________________________________
>>>> De : roy <roy at cerfacs.fr <mailto:roy at cerfacs.fr>>
>>>> À : regis lebrun <regis_anne.lebrun_dutfoy at yahoo.fr <mailto:regis_anne.lebrun_dutfoy at yahoo.fr>> 
>>>> Cc : users <users at openturns.org <mailto:users at openturns.org>>
>>>> Envoyé le : Mercredi 11 octobre 2017 0h21
>>>> Objet : Re: [ot-users] Sample transformation
>>>> 
>>>> 
>>>> 
>>>> Hi Régis,
>>>> 
>>>> Not sure about the leak as I only do python.
>>>> But using the tool I know, I was not able to free the memory(using some del and gc.collect()).
>>>> 
>>>> I saw the issue when constructing a model on a cluster (Quadrature with 121 points, degree 10 in 2d) and the batch manager killed the job
>>>> due to memory consumption. On my Mac the memory goes to 3-4 Go for this but on the cluster it explodes.
>>>> 
>>>> As always, thanks for the quick reply :)
>>>> 
>>>> 
>>>> Pamphile ROY
>>>> Chercheur doctorant en Quantification d’Incertitudes
>>>> CERFACS - Toulouse (31) - France
>>>> +33 (0) 5 61 19 31 57
>>>> +33 (0) 7 86 43 24 22
>>>> 
>>>> 
>>>> 
>>>> Le 10 oct. 2017 à 23:13, regis lebrun <regis_anne.lebrun_dutfoy at yahoo.fr <mailto:regis_anne.lebrun_dutfoy at yahoo.fr>> a écrit :
>>>> 
>>>> 
>>>>> Hi Pamphil,
>>>>> 
>>>>> Nice to know that the code *seems* to work ;-)
>>>>> 
>>>>> Are you sure that there is a memory leak? The algorithm creates potentially large objects, which are stored into the FunctionalChaosResult member of the algorithm. If there is a pending reference to this object, the memory will not be released. Maybe Denis, Julien or Sofiane have more insight on this point?
>>>>> 
>>>>> Cheers
>>>>> 
>>>>> Régis
>>>>> 
>>>>> 
>>>>> 
>>>>> ________________________________
>>>>> De : roy <roy at cerfacs.fr <mailto:roy at cerfacs.fr>>
>>>>> À : regis lebrun <regis_anne.lebrun_dutfoy at yahoo.fr <mailto:regis_anne.lebrun_dutfoy at yahoo.fr>> 
>>>>> Cc : users <users at openturns.org <mailto:users at openturns.org>>
>>>>> Envoyé le : Mardi 10 octobre 2017 6h35
>>>>> Objet : Re: [ot-users] Sample transformation
>>>>> 
>>>>> 
>>>>> 
>>>>> Hi Regis,
>>>>> 
>>>>> Thanks for this long and well detailed answer!
>>>>> The code you provided seems to work as expected.
>>>>> 
>>>>> However during my tests I noticed that the memory was not freed correctly.
>>>>> Once the class FunctionalChaosAlgorithm is called, there is a memory bump and even after calling del
>>>>> and gc.collect(), memory is still not freed (using memory_profiler for that). Might be a memory leak?
>>>>> 
>>>>> Kind regards,
>>>>> 
>>>>> Pamphile ROY
>>>>> Chercheur doctorant en Quantification d’Incertitudes
>>>>> CERFACS - Toulouse (31) - France
>>>>> +33 (0) 5 61 19 31 57
>>>>> +33 (0) 7 86 43 24 22
>>>>> 
>>>>> 
>>>>> 
>>>>> Le 7 oct. 2017 à 19:59, regis lebrun <regis_anne.lebrun_dutfoy at yahoo.fr <mailto:regis_anne.lebrun_dutfoy at yahoo.fr>> a écrit :
>>>>> 
>>>>> 
>>>>> 
>>>>> Hi Pamphil,
>>>>>> 
>>>>>> You were almost right: the AdaptiveStieltjesAlgorithm is very close to what you are looking for, but not exactly what you need. It is the algorithmic part of the factory of orthonormal polynomials, the class you have to use is StandardDistributionPolynomialFactory, ie a factory (=able to build something) and not an algorithm (=something able to compute something). You have all the details here:
>>>>>> 
>>>>>> http://openturns.github.io/openturns/master/user_manual/_generated/openturns.StandardDistributionPolynomialFactory.html <http://openturns.github.io/openturns/master/user_manual/_generated/openturns.StandardDistributionPolynomialFactory.html>
>>>>>> 
>>>>>> I agree on the fact that the difference is quite subtle, as it can be seen by comparing the API of the two classes. The distinction was made at a time were several algorithms were competing for the task (GramSchmidtAlgorithm, ChebychevAlgorithm) but in fact the AdaptiveStieltjesAlgorithm proved to be much more accurate and reliable than the other algorithms, and now it is the only orthonormalization algorithm available.
>>>>>> 
>>>>>> Another subtle trick is the following.
>>>>>> 
>>>>>> If you create a basis this way:
>>>>>> basis = ot.StandardDistributionPolynomialFactory(dist)
>>>>>> you will get the basis associated to the *standard representative* distribution in the parametric family to which dist belongs. It means the distribution with zero mean and unit variance, or with support equals to [-1, 1], or dist itself if no affine transformation is able to reduce the number of parameters of the distribution. 
>>>>>> It is managed automatically within the FunctionalChaosAlgorithm, but can be disturbing if you do things by hand.
>>>>>> 
>>>>>> If you create a basis this way:
>>>>>> basis = ot.StandardDistributionPolynomialFactory(ot.AdaptiveStieltjesAlgorithm(dist))
>>>>>> then the distribution is preserved, and you get the orthonormal polynomials corresponding to dist. Be aware of the fact that the algorithm may have hard time to build the polynomials if your distribution is far away from its standard representative, as it may involves the computation of recurrence coefficients with a much wider range of variation. The benefit is that the orthonormality measure is exactly your distribution, assuming that its copula is the independent one, so you don't have to introduce a marginal transformation between both measures.
>>>>>> 
>>>>>> Some additional remarks:
>>>>>> + it looks like dist has dimension>1, as you extract its marginal distributions later on. AdaptiveStieltjesAlgorithm and StandardDistributionPolynomialFactory only work with 1D distributions (it is not checked by the library, my shame). What you have to do is:
>>>>>> 
>>>>>> basis = ot.OrthogonalProductPolynomialFactory([ot.StandardDistributionPolynomialFactory(ot.AdaptiveStieltjesAlgorithm(dist.getMarginal(i))) for i in range(dist.getDimension())])
>>>>>> Quite a long line, I know...
>>>>>> It will build a multivariate polynomial basis orthonormal with respect to the product distribution (ie with independent copula) sharing the same 1D marginal distributions as dist.
>>>>>> 
>>>>>> 
>>>>>> After that, everything will work as expected and you will NOT have to build the transformation (if you build it it will coincide with the identity function). If you encounter performance issues (the polynomials of high degrees take ages to be built as in http://trac.openturns.org/ticket/885 <http://trac.openturns.org/ticket/885>, or there is an overflow, or the numerical precision is bad) then use:
>>>>>> basis = ot.OrthogonalProductPolynomialFactory([ot.StandardDistributionPolynomialFactory(dist.getMarginal(i)) for i in range(dist.getDimension())])
>>>>>> and build the transformation the way you do it.
>>>>>> 
>>>>>> + if you use the FunctionalChaosAlgorithm class by providing an input sample and an output sample, you also have to provide the weights of the input sample EVEN IF the experiment given in the projection strategy would allow to recompute them. It is because the fact that you provide the input sample overwrite the weighted experiment of the projection stratey by a FixedExperiment doe.
>>>>>> 
>>>>>> I attached two complete examples: one using the exact marginal distributions and the other using the standard representatives.
>>>>>> 
>>>>>> Best regards
>>>>>> 
>>>>>> Régis
>>>>>> 
>>>>>> ________________________________
>>>>>> De : roy <roy at cerfacs.fr <mailto:roy at cerfacs.fr>>
>>>>>> À : regis lebrun <regis_anne.lebrun_dutfoy at yahoo.fr <mailto:regis_anne.lebrun_dutfoy at yahoo.fr>> 
>>>>>> Cc : users <users at openturns.org <mailto:users at openturns.org>>
>>>>>> Envoyé le : Vendredi 6 octobre 2017 14h22
>>>>>> Objet : Re: [ot-users] Sample transformation
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Hi Regis,
>>>>>> 
>>>>>> Thank you for this detailed answer.
>>>>>> 
>>>>>> - I am using the latest release from conda (OT 1.9, python 3.6.2, latest numpy, etc.) ,
>>>>>> - For the sample, I need it to generate externally the output (cost code that cannot be integrated into OT as model),
>>>>>> - I have to convert ot.Sample into np.array because it is then used by other functions to create the simulations, etc.
>>>>>> 
>>>>>> If I understood correctly, I can create the projection strategy using this snippet:
>>>>>> 
>>>>>> basis = ot.AdaptiveStieltjesAlgorithm(dist)
>>>>>> measure = basis.getMeasure()
>>>>>> quad = ot.Indices(in_dim)
>>>>>> for i in range(in_dim):
>>>>>> quad[i] = degree + 1
>>>>>> 
>>>>>> comp_dist = ot.GaussProductExperiment(measure, quad)
>>>>>> proj_strategy = ot.IntegrationStrategy(comp_dist)
>>>>>> 
>>>>>> inv_trans = ot.Function(ot.MarginalTransformationEvaluation([measure.getMarginal(i) for i in range(in_dim)], distributions))
>>>>>> sample = np.array(inv_trans(comp_dist.generate()))
>>>>>> 
>>>>>> 
>>>>>> It seems to work. Except that the basis does not work with ot.FixedStrategy(basis, dim_basis). I get a non implemented method error.
>>>>>> 
>>>>>> After I get the sample and the corresponding output, what is the way to go? Which arguments do I need to use for the
>>>>>> ot.FunctionalChaosAlgorithm? 
>>>>>> 
>>>>>> I am comparing the Q2 and on Ishigami and I was only able to get correct results using:
>>>>>> 
>>>>>> pc_algo = ot.FunctionalChaosAlgorithm(sample, output, dist, trunc_strategy)
>>>>>> 
>>>>>> But for least square strategy I had to use this:
>>>>>> 
>>>>>> pc_algo = ot.FunctionalChaosAlgorithm(sample, output)
>>>>>> 
>>>>>> 
>>>>>> Is it normal?
>>>>>> 
>>>>>> 
>>>>>> Pamphile ROY
>>>>>> Chercheur doctorant en Quantification d’Incertitudes
>>>>>> CERFACS - Toulouse (31) - France
>>>>>> +33 (0) 5 61 19 31 57
>>>>>> +33 (0) 7 86 43 24 22
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Le 5 oct. 2017 à 15:40, regis lebrun <regis_anne.lebrun_dutfoy at yahoo.fr <mailto:regis_anne.lebrun_dutfoy at yahoo.fr>> a écrit :
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Hi Pamphile,
>>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> 1) The problem:
>>>>>>> The problem you get is due to the fact that in your version of OpenTURNS (1.7 I suppose), the GaussProductExperiment class has a different way to handle the input distribution than the other WeightedExperiment classes: it generates the quadrature rule of the *standard representatives* of the marginal distributions instead of the marginal distributions. It does not change the rate of convergence of the PCE algorithm and allows to use specific algorithms for distributions with known orthonormal polynomials. It is not explained in the documentation and if you ask the doe for its distribution it will give you the initial distribution instead of the standardized one.
>>>>>>> 
>>>>>>> 2) The mathematical background:
>>>>>>> The generation of quadrature rules for arbitrary 1D distributions is a badly conditioned problem. Even if the quadrature rule is well-defined (existence of moments of any order, distribution characterized by these moments), the application that maps the recurrence coefficients of the orthogonal polynomials to their value can have a very large condition number. As a result, the adaptive integration used to compute the recurrence coefficients of order n, based on the values of the polynomials of degree n-1 and n-2, can lead to wrong values and all the process falls down.
>>>>>>> 
>>>>>>> 3) The current state of the software:
>>>>>>> Since version 1.8, OpenTURNS no more generates the quadrature rule of the standard representatives, but the quadrature rule of the actual marginal distributions. The AdaptiveStieltjesAlgorithm class, introduced in release 1.8, is much more robust than the previous orthonormalization algorithms and is able to handle even stiff problems. There are still difficult situations (distributions with discontinuous PDF inside of the range, fixed in OT 1.9, or really badly conditioned distributions, hopefully fixed when ticket#885 will be solved) but most usual situations are under control even with marginal degrees of order 20.
>>>>>>> 
>>>>>>> 4) The (probable) bug in your code and the way to solve it
>>>>>>> You must be aware of the fact that the distribution you put into your WeightedExperiment object will be superseded by the distribution corresponding to your OrthogonalBasisFactory inside of the FunctionalChaosAlgorithm. If you need to have the input sample before to run the functional chaos algorithm, then you have to build your transformation by hand. Assuming that you already defined your projection basis called 'myBasis', your marginal integration degrees 'myDegrees' and your marginal distributions 'myMarginals', you have to write (in OT 1.7):
>>>>>>> 
>>>>>>> # Here the explicit cast into a NumericalMathFunction is to be able to evaluate the transformation over a sample
>>>>>>> myTransformation = ot.NumericalMathFunction(ot.MarginalTransformationEvaluation([myBasis.getDistribution().getMarginal(i) for i in range(dimension), myMarginals))
>>>>>>> sample = myTransformation(ot.GaussProductExperiment(myBasis.getDistribution(), myDegrees).generate())
>>>>>>> 
>>>>>>> 
>>>>>>> You should avoid to cast OT objects into np objects as much as possible, and if you cannot avoid these casts you should do them only in the sections where they are needed. They can be expansive for large objects, and if the sample you get from generate() is used only as an argument of a NumericalMathFunction, then it will be converted back into a NumericalSample!
>>>>>>> 
>>>>>>> Best regards
>>>>>>> 
>>>>>>> Régis
>>>>>>> ________________________________
>>>>>>> De : roy <roy at cerfacs.fr <mailto:roy at cerfacs.fr>>
>>>>>>> À : users <users at openturns.org <mailto:users at openturns.org>> 
>>>>>>> Envoyé le : Jeudi 5 octobre 2017 11h13
>>>>>>> Objet : [ot-users] Sample transformation
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I am facing consistency concerns in the API regarding distributions and sampling.
>>>>>>> 
>>>>>>> The initial goal was to get the sampling for Polynomial Chaos as I must not use the model variable.
>>>>>>> So for least square strategy I do something like this:
>>>>>>> 
>>>>>>> proj_strategy = ot.LeastSquaresStrategy(montecarlo_design)
>>>>>>> sample = np.array(proj_strategy.getExperiment().generate())
>>>>>>> 
>>>>>>> sample is correct as the bounds of each feature lie in the corresponding ranges.
>>>>>>> 
>>>>>>> But now if I want to use IntegrationStrategy:
>>>>>>> 
>>>>>>> ot.IntegrationStrategy(ot.GaussProductExperiment(dists, list))
>>>>>>> sample = np.array(proj_strategy.getExperiment().generate())
>>>>>>> 
>>>>>>> sample’s outputs lie between [-1, 1] which does not corresponds to the distribution I have initially.
>>>>>>> 
>>>>>>> So I used the conversion class but it does not work well with GaussProductExperiment as it requires [0, 1] instead of [-1, 1].
>>>>>>> 
>>>>>>> Thus I use this hack:
>>>>>>> 
>>>>>>> # Convert from [-1, 1] -> input distributions
>>>>>>> marg_inv_transf = ot.MarginalTransformationEvaluation(distributions, 1)
>>>>>>> sample = (proj_strategy.getExperiment().generate() + 1) / 2.
>>>>>>> 
>>>>>>> 
>>>>>>> Is it normal that the distribution classes are not returning in the same intervals?
>>>>>>> 
>>>>>>> 
>>>>>>> Thanks for your support!
>>>>>>> 
>>>>>>> 
>>>>>>> Pamphile ROY
>>>>>>> Chercheur doctorant en Quantification d’Incertitudes
>>>>>>> CERFACS - Toulouse (31) - France
>>>>>>> +33 (0) 5 61 19 31 57
>>>>>>> +33 (0) 7 86 43 24 22
>>>>>>> 
>>>>>>> 
>>>>>>> _______________________________________________
>>>>>>> OpenTURNS users mailing list
>>>>>>> users at openturns.org <mailto:users at openturns.org>
>>>>>>> http://openturns.org/mailman/listinfo/users
>>>>>>> <example.py><example_standard.py>
>>>>>>> 
>>>>>>> _______________________________________________
>>>>> OpenTURNS users mailing list
>>>>> users at openturns.org <mailto:users at openturns.org>
>>>>> http://openturns.org/mailman/listinfo/users <http://openturns.org/mailman/listinfo/users>
>>>>> 
>>>>> 
>>>> _______________________________________________
>>>> OpenTURNS users mailing list
>>>> users at openturns.org <mailto:users at openturns.org>
>>>> http://openturns.org/mailman/listinfo/users <http://openturns.org/mailman/listinfo/users>
>>>> 
>> 
> 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://openturns.org/pipermail/users/attachments/20171128/6d73eb53/attachment.html>


More information about the users mailing list