[ot-users] Sample transformation

regis lebrun regis_anne.lebrun_dutfoy at yahoo.fr
Wed Nov 29 08:07:05 CET 2017


 Hi Pamphile,
Could you please provide us with a script that reproduces the error? It looks very strange, as this class is tested in more than 10 unit tests and is extensively used in industrial studies. BTW, ot.ComposedDistribution is by no means restricted to the independent copula, as your comment could suggest.
Please note that it is not the physical space distribution that is checked by the GaussProductExperiment class, but the distribution defining the functional basis. You should use either OrthogonalProductPolynomialFactory or OrthogonalProductFunctionFactory to build your multivariate basis from 1D orthogonal bases, to ensure that the resulting multivariate distribution has an independent copula.
Please give me feedback on this problem ASAP.
Cheers
Régis
    On Tuesday, 28 November 2017 at 23:01:53 UTC+1, roy <roy at cerfacs.fr> wrote:
 
 Hi Regis,
With version 1.10, I get this error when calling FunctionalChaosAlgorithm:
  File "/Users/roy/Applications/miniconda3/envs/batman3/lib/python3.6/site-packages/openturns/metamodel.py", line 3849, in __init__
    this = _metamodel.new_FunctionalChaosAlgorithm(*args)
TypeError: InvalidArgumentException : Error: the GaussProductExperiment can only be used with distributions having an independent copula.
But this was working with 1.9. I do not understand the issue, as the distribution is an ot.ComposedDistribution. I tried to explicitly add ot.IndependentCopula, without any change.
Thanks in advance,

Pamphile ROY
PhD researcher in Uncertainty Quantification
CERFACS - Toulouse (31) - France
+33 (0) 5 61 19 31 57
+33 (0) 7 86 43 24 22



On 21 Oct. 2017 at 21:22, roy <roy at cerfacs.fr> wrote:

Hi Regis,
For information, I have trouble pickling the class ot.FixedStrategy. From the example script:
adaptiveStrategy = ot.FixedStrategy(basis, enumerateFunction.getStrataCumulatedCardinal(deg))
If I try to pickle this:
import pickle
path = './model.dat'
with open(path, 'wb') as f:
    pickler = pickle.Pickler(f)
    pickler.dump(adaptiveStrategy)
This works, but then the deserialization does not:
with open(path, 'rb') as f:
    unpickler = pickle.Unpickler(f)
    adaptiveStrategy = unpickler.load()
I get this error:
Traceback (most recent call last):
  File "example.py", line 58, in <module>
    adaptiveStrategy = unpickler.load()
  File "/Users/roy/Applications/miniconda3/envs/batman3/lib/python3.6/site-packages/openturns/common.py", line 344, in Object___setstate__
    self.__init__()
  File "/Users/roy/Applications/miniconda3/envs/batman3/lib/python3.6/site-packages/openturns/metamodel.py", line 1908, in __init__
    this = _metamodel.new_FixedStrategy(*args)
NotImplementedError: Wrong number or type of arguments for overloaded function 'new_FixedStrategy'.
  Possible C/C++ prototypes are:
    OT::FixedStrategy::FixedStrategy(OT::OrthogonalBasis const &,OT::UnsignedInteger const)
    OT::FixedStrategy::FixedStrategy(OT::FixedStrategy const &)
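
The traceback suggests a mechanism: loading goes through Object___setstate__, which calls self.__init__() without arguments, while the listed FixedStrategy prototypes all take arguments. The pure-Python sketch below reproduces that pattern only. The Wrapped class is hypothetical and is not OpenTURNS code, and it fails with a TypeError where the wrapper reports a NotImplementedError.

```python
import pickle


class Wrapped:
    """Hypothetical stand-in for a wrapped class without a no-argument constructor."""

    def __init__(self, basis, size):
        self.basis = basis
        self.size = size

    def __getstate__(self):
        # Serialization only reads the attributes, so dumping works.
        return {"basis": self.basis, "size": self.size}

    def __setstate__(self, state):
        # Same pattern as in the traceback: rebuild through a bare __init__().
        self.__init__()  # fails: the two required arguments are missing
        self.__dict__.update(state)


payload = pickle.dumps(Wrapped("basis", 10))  # dumping succeeds

try:
    pickle.loads(payload)
    load_failed = False
except TypeError:
    load_failed = True  # loading fails inside __setstate__

print(load_failed)
```

In this sketch the failure is independent of the file handling: the dump succeeds and the load fails at the moment the object is rebuilt, as in the report above.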
Thanks in advance.

Pamphile ROY



On 12 Oct. 2017 at 16:16, roy <roy at cerfacs.fr> wrote:

Hi Regis,
This is great, thanks. It is now working as expected. Maybe this could be clarified in the documentation.

Pamphile ROY



On 12 Oct. 2017 at 00:11, regis lebrun <regis_anne.lebrun_dutfoy at yahoo.fr> wrote:
I found it. In order to speed up the computation of the coefficients of the polynomial expansion, we developed a class named DesignProxy, which acts as a cache for the evaluation of the multivariate basis over the input sample. Essentially, it contains a large matrix, with a default size given by ResourceMap.GetAsUnsignedInteger("DesignProxy-DefaultCacheSize") and equal to 16777216, which amounts to 128 MB.


So if you add ot.ResourceMap.SetAsUnsignedInteger("DesignProxy-DefaultCacheSize", smallSize) with smallSize adapted to your memory budget (e.g. smallSize=0), then everything should be OK.
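
As a quick sanity check of the memory figure quoted above, the arithmetic can be sketched as follows. The 8 bytes per entry is an assumption (double-precision values); the message itself only gives the entry count and the total.

```python
# Default number of cached entries, as quoted in the message above.
default_cache_size = 16777216

# Assumption: each entry is an 8-byte double-precision float.
bytes_per_entry = 8

cache_bytes = default_cache_size * bytes_per_entry
cache_mib = cache_bytes / 2**20

print(cache_bytes)  # 134217728
print(cache_mib)    # 128.0
```

Under that assumption the default cache alone is about 134 million bytes, which is consistent with the bump of roughly 130 MB per run() call reported earlier in the thread.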

You can also run the algorithm on the whole output sample. The DesignProxy instance is built once and shared among the different marginals. You can see that the memory cost of the algorithm is essentially the same for an output sample of dimension 1 or 14. Concerning the computation time, a part of the computation is shared between the marginals so the total cost is not proportional to the output dimension, even if no parallelization is implemented here (but the linear algebra is already parallelized using threads).

Tell me if it solved your problem!

Régis

________________________________
From: roy <roy at cerfacs.fr>
To: regis lebrun <regis_anne.lebrun_dutfoy at yahoo.fr>
Sent: Wednesday, 11 October 2017, 10:40
Subject: Re: [ot-users] Sample transformation



I was able to make an extract.

I am fitting a case with functional output. So to parallelize the fitting, I use a function that independently constructs one model per feature.
The memory consumption comes from every call to run(), with a bump of ~130 MB each time. Maybe OT can handle the parallelization itself?
I saw that it was working without needing the loop, so maybe I should do that instead.

But still, 130 MB for a model is quite a lot.


Cheers,

Pamphile ROY



On 11 Oct. 2017 at 08:44, regis lebrun <regis_anne.lebrun_dutfoy at yahoo.fr> wrote:


Ouch! 3-4 GB is crazy! Do you have any script to share, to help us catch the bug? I use the FunctionalChaosAlgorithm class very often and I have never faced this kind of behavior. If there is a bug, it would be good to catch it ASAP: we are entering the 1.10 release candidate phase, a good slot to fix this kind of bug.

Cheers

Régis



________________________________
From: roy <roy at cerfacs.fr>
To: regis lebrun <regis_anne.lebrun_dutfoy at yahoo.fr>
Cc: users <users at openturns.org>
Sent: Wednesday, 11 October 2017, 00:21
Subject: Re: [ot-users] Sample transformation



Hi Régis,

I am not sure about the leak, as I only work in Python.
But using the tools I know, I was not able to free the memory (using some del and gc.collect()).

I saw the issue when constructing a model on a cluster (quadrature with 121 points, degree 10 in 2D), and the batch manager killed the job
due to memory consumption. On my Mac the memory goes up to 3-4 GB for this, but on the cluster it explodes.

As always, thanks for the quick reply :)


Pamphile ROY



On 10 Oct. 2017 at 23:13, regis lebrun <regis_anne.lebrun_dutfoy at yahoo.fr> wrote:



Hi Pamphile,

Nice to know that the code *seems* to work ;-)

Are you sure that there is a memory leak? The algorithm creates potentially large objects, which are stored into the FunctionalChaosResult member of the algorithm. If there is a pending reference to this object, the memory will not be released. Maybe Denis, Julien or Sofiane have more insight on this point?
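
The effect of a pending reference can be sketched in pure Python. The Result class below is a hypothetical placeholder and not an OpenTURNS object. The sketch only covers Python-level references; memory held inside a C++ library is outside its scope.

```python
import gc
import weakref


class Result:
    """Hypothetical placeholder for a large result object."""

    def __init__(self):
        self.payload = [0.0] * 100_000


result = Result()
probe = weakref.ref(result)  # observes the object without keeping it alive
pending = result             # a second, "pending" reference

del result
gc.collect()
alive_with_pending_ref = probe() is not None  # still referenced by `pending`

del pending
gc.collect()
released_after_drop = probe() is None  # no reference left, object released

print(alive_with_pending_ref, released_after_drop)
```

Calling del on one name and then gc.collect() is therefore not sufficient as long as any other name, container, or attribute still refers to the object.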

Cheers

Régis



________________________________
From: roy <roy at cerfacs.fr>
To: regis lebrun <regis_anne.lebrun_dutfoy at yahoo.fr>
Cc: users <users at openturns.org>
Sent: Tuesday, 10 October 2017, 06:35
Subject: Re: [ot-users] Sample transformation



Hi Regis,

Thanks for this long and well detailed answer!
The code you provided seems to work as expected.

However during my tests I noticed that the memory was not freed correctly.
Once the class FunctionalChaosAlgorithm is called, there is a memory bump and even after calling del
and gc.collect(), memory is still not freed (using memory_profiler for that). Might be a memory leak?

Kind regards,

Pamphile ROY



On 7 Oct. 2017 at 19:59, regis lebrun <regis_anne.lebrun_dutfoy at yahoo.fr> wrote:



Hi Pamphile,


You were almost right: the AdaptiveStieltjesAlgorithm is very close to what you are looking for, but not exactly what you need. It is the algorithmic part of the factory of orthonormal polynomials. The class you have to use is StandardDistributionPolynomialFactory, i.e. a factory (= able to build something) and not an algorithm (= something able to compute something). You have all the details here:

http://openturns.github.io/openturns/master/user_manual/_generated/openturns.StandardDistributionPolynomialFactory.html

I agree that the difference is quite subtle, as can be seen by comparing the APIs of the two classes. The distinction was made at a time when several algorithms were competing for the task (GramSchmidtAlgorithm, ChebychevAlgorithm), but in fact the AdaptiveStieltjesAlgorithm proved to be much more accurate and reliable than the other algorithms, and it is now the only orthonormalization algorithm available.

Another subtle trick is the following.

If you create a basis this way:
basis = ot.StandardDistributionPolynomialFactory(dist)
you will get the basis associated with the *standard representative* distribution in the parametric family to which dist belongs. This means the distribution with zero mean and unit variance, or with support equal to [-1, 1], or dist itself if no affine transformation is able to reduce the number of parameters of the distribution.
This is managed automatically within the FunctionalChaosAlgorithm, but it can be confusing if you do things by hand.
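
To make the notion of standard representative concrete, here is a minimal pure-Python sketch of the affine maps involved for two common families. The function names are illustrative and are not part of the OpenTURNS API.

```python
def to_standard_uniform(x, a, b):
    """Map x from [a, b] to [-1, 1] with an affine transformation."""
    return (2.0 * x - (a + b)) / (b - a)


def from_standard_uniform(z, a, b):
    """Inverse map: from [-1, 1] back to [a, b]."""
    return 0.5 * (a + b) + 0.5 * (b - a) * z


def to_standard_normal(x, mu, sigma):
    """Map x from Normal(mu, sigma) to zero mean and unit variance."""
    return (x - mu) / sigma


# Uniform(2, 6): the bounds map to -1 and 1, the midpoint to 0.
print(to_standard_uniform(2.0, 2.0, 6.0), to_standard_uniform(6.0, 2.0, 6.0))
print(from_standard_uniform(0.0, 2.0, 6.0))
# Normal(3, 2): one standard deviation above the mean maps to 1.
print(to_standard_normal(5.0, 3.0, 2.0))
```

A basis built for the standard representative is orthonormal in the standardized variable, so nodes or samples have to go through such a map before they can be compared with the physical-space distribution.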

If you create a basis this way:
basis = ot.StandardDistributionPolynomialFactory(ot.AdaptiveStieltjesAlgorithm(dist))
then the distribution is preserved, and you get the orthonormal polynomials corresponding to dist. Be aware that the algorithm may have a hard time building the polynomials if your distribution is far from its standard representative, as this may involve the computation of recurrence coefficients with a much wider range of variation. The benefit is that the orthonormality measure is exactly your distribution, assuming that its copula is the independent one, so you don't have to introduce a marginal transformation between both measures.

Some additional remarks:
+ it looks like dist has dimension > 1, as you extract its marginal distributions later on. AdaptiveStieltjesAlgorithm and StandardDistributionPolynomialFactory only work with 1D distributions (this is not checked by the library, shame on me). What you have to do is:

basis = ot.OrthogonalProductPolynomialFactory([ot.StandardDistributionPolynomialFactory(ot.AdaptiveStieltjesAlgorithm(dist.getMarginal(i))) for i in range(dist.getDimension())])
Quite a long line, I know...
It will build a multivariate polynomial basis that is orthonormal with respect to the product distribution (i.e. with independent copula) sharing the same 1D marginal distributions as dist.


After that, everything will work as expected and you will NOT have to build the transformation (if you build it, it will coincide with the identity function). If you encounter performance issues (the polynomials of high degree take ages to be built, as in http://trac.openturns.org/ticket/885, or there is an overflow, or the numerical precision is bad), then use:
basis = ot.OrthogonalProductPolynomialFactory([ot.StandardDistributionPolynomialFactory(dist.getMarginal(i)) for i in range(dist.getDimension())])
and build the transformation the way you do it.

+ if you use the FunctionalChaosAlgorithm class by providing an input sample and an output sample, you also have to provide the weights of the input sample EVEN IF the experiment given in the projection strategy would allow them to be recomputed. This is because providing the input sample overwrites the weighted experiment of the projection strategy with a FixedExperiment DOE.
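
To illustrate why the weights must travel with the nodes, here is a small pure-Python sketch of a tensorized Gauss rule. It uses the classical 3-point Gauss-Legendre rule with weights normalized for the Uniform(-1, 1) measure; the integrand and the variable names are illustrative only and do not come from the attached examples.

```python
import itertools
import math

# 3-point Gauss-Legendre rule, weights normalized for the Uniform(-1, 1) measure.
nodes_1d = [-math.sqrt(0.6), 0.0, math.sqrt(0.6)]
weights_1d = [5.0 / 18.0, 8.0 / 18.0, 5.0 / 18.0]

dimension = 2

# Tensor product: every combination of 1D nodes, weight = product of 1D weights.
nodes = list(itertools.product(nodes_1d, repeat=dimension))
weights = [math.prod(w) for w in itertools.product(weights_1d, repeat=dimension)]


def f(x):
    """Illustrative integrand; its exact mean under Uniform(-1, 1)^2 is 1/9."""
    return x[0] ** 2 * x[1] ** 2


weighted = sum(w * f(x) for x, w in zip(nodes, weights))  # uses nodes AND weights
unweighted = sum(f(x) for x in nodes) / len(nodes)        # ignores the weights

print(len(nodes), weighted, unweighted)
```

The weighted sum recovers the exact value 1/9, while the plain average over the same 9 nodes gives 0.16. The nodes alone do not carry enough information to compute the projection integrals.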

I attached two complete examples: one using the exact marginal distributions and the other using the standard representatives.

Best regards

Régis

________________________________
From: roy <roy at cerfacs.fr>
To: regis lebrun <regis_anne.lebrun_dutfoy at yahoo.fr>
Cc: users <users at openturns.org>
Sent: Friday, 6 October 2017, 14:22
Subject: Re: [ot-users] Sample transformation



Hi Regis,

Thank you for this detailed answer.

- I am using the latest release from conda (OT 1.9, Python 3.6.2, latest numpy, etc.),
- For the sample, I need it to generate the output externally (costly code that cannot be integrated into OT as a model),
- I have to convert ot.Sample into np.array because it is then used by other functions to create the simulations, etc.

If I understood correctly, I can create the projection strategy using this snippet:

basis = ot.AdaptiveStieltjesAlgorithm(dist)
measure = basis.getMeasure()
quad = ot.Indices(in_dim)
for i in range(in_dim):
    quad[i] = degree + 1

comp_dist = ot.GaussProductExperiment(measure, quad)
proj_strategy = ot.IntegrationStrategy(comp_dist)

inv_trans = ot.Function(ot.MarginalTransformationEvaluation([measure.getMarginal(i) for i in range(in_dim)], distributions))
sample = np.array(inv_trans(comp_dist.generate()))


It seems to work, except that the basis does not work with ot.FixedStrategy(basis, dim_basis): I get a "method not implemented" error.

After I get the sample and the corresponding output, what is the way to go? Which arguments do I need to use for the
ot.FunctionalChaosAlgorithm? 

I am comparing the Q2 on Ishigami, and I was only able to get correct results using:

pc_algo = ot.FunctionalChaosAlgorithm(sample, output, dist, trunc_strategy)

But for the least squares strategy I had to use this:

pc_algo = ot.FunctionalChaosAlgorithm(sample, output)


Is it normal?


Pamphile ROY



On 5 Oct. 2017 at 15:40, regis lebrun <regis_anne.lebrun_dutfoy at yahoo.fr> wrote:



Hi Pamphile,




1) The problem:
The problem you get is due to the fact that in your version of OpenTURNS (1.7, I suppose), the GaussProductExperiment class handles the input distribution differently from the other WeightedExperiment classes: it generates the quadrature rule of the *standard representatives* of the marginal distributions instead of the marginal distributions themselves. This does not change the rate of convergence of the PCE algorithm, and it allows the use of specific algorithms for distributions with known orthonormal polynomials. It is not explained in the documentation, and if you ask the DOE for its distribution it will give you the initial distribution instead of the standardized one.

2) The mathematical background:
The generation of quadrature rules for arbitrary 1D distributions is a badly conditioned problem. Even if the quadrature rule is well-defined (existence of moments of any order, distribution characterized by these moments), the application that maps the recurrence coefficients of the orthogonal polynomials to their values can have a very large condition number. As a result, the adaptive integration used to compute the recurrence coefficients of order n, based on the values of the polynomials of degree n-1 and n-2, can lead to wrong values, and the whole process breaks down.
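
For readers unfamiliar with the recurrence mentioned here, the following pure-Python sketch evaluates monic Legendre polynomials with the three-term recurrence. The Legendre family is chosen because its coefficients are known in closed form. For an arbitrary distribution the coefficients have to be computed numerically from the lower-degree polynomials, which is the step described above as delicate. The function name is illustrative.

```python
def monic_legendre(n, x):
    """Evaluate the monic Legendre polynomial of degree n at x.

    Three-term recurrence: P_{k+1}(x) = x * P_k(x) - beta_k * P_{k-1}(x),
    with beta_k = k^2 / (4 k^2 - 1) for the Legendre family.
    """
    previous, current = 0.0, 1.0  # P_{-1} and P_0
    for k in range(n):
        beta = k * k / (4.0 * k * k - 1.0) if k > 0 else 0.0
        previous, current = current, x * current - beta * previous
    return current


# P_2(x) = x^2 - 1/3 and P_3(x) = x^3 - (3/5) x
print(monic_legendre(2, 0.5))
print(monic_legendre(3, 1.0))
```

Each new degree reuses the two previous ones, so an error in one recurrence coefficient propagates to every higher degree.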

3) The current state of the software:
Since version 1.8, OpenTURNS no longer generates the quadrature rule of the standard representatives, but the quadrature rule of the actual marginal distributions. The AdaptiveStieltjesAlgorithm class, introduced in release 1.8, is much more robust than the previous orthonormalization algorithms and is able to handle even stiff problems. There are still difficult situations (distributions with a discontinuous PDF inside the range, fixed in OT 1.9, or really badly conditioned distributions, hopefully fixed once ticket #885 is solved), but most usual situations are under control, even with marginal degrees of order 20.

4) The (probable) bug in your code and the way to solve it
You must be aware that the distribution you put into your WeightedExperiment object will be superseded by the distribution corresponding to your OrthogonalBasisFactory inside the FunctionalChaosAlgorithm. If you need to have the input sample before running the functional chaos algorithm, then you have to build your transformation by hand. Assuming that you have already defined your projection basis called 'myBasis', your marginal integration degrees 'myDegrees' and your marginal distributions 'myMarginals', you have to write (in OT 1.7):

# Here the explicit cast into a NumericalMathFunction is to be able to evaluate the transformation over a sample
myTransformation = ot.NumericalMathFunction(ot.MarginalTransformationEvaluation([myBasis.getDistribution().getMarginal(i) for i in range(dimension)], myMarginals))
sample = myTransformation(ot.GaussProductExperiment(myBasis.getDistribution(), myDegrees).generate())
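
As an illustration of what such a marginal transformation does component by component, here is a pure-Python sketch for one marginal. It assumes the usual construction, in which a point is mapped through the source CDF and then through the inverse CDF of the target. The source is taken as Uniform(-1, 1) and the Normal(3, 2) target is a hypothetical example, not a distribution from the thread.

```python
from statistics import NormalDist


def uniform_cdf(x, a=-1.0, b=1.0):
    """CDF of the Uniform(a, b) distribution, clipped to [0, 1]."""
    return min(1.0, max(0.0, (x - a) / (b - a)))


def marginal_transformation(x, target):
    """Map a point of Uniform(-1, 1) to the target marginal: F_target^{-1}(F_source(x))."""
    return target.inv_cdf(uniform_cdf(x))


target = NormalDist(mu=3.0, sigma=2.0)  # hypothetical physical-space marginal
standard_nodes = [-0.5, 0.0, 0.5]       # illustrative points in (-1, 1)
physical_nodes = [marginal_transformation(x, target) for x in standard_nodes]

print(physical_nodes)
```

The median of the source maps to the median of the target, and the ordering of the points is preserved. For a uniform target the map reduces to an affine rescaling of [-1, 1].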


You should avoid casting OT objects into np objects as much as possible, and if you cannot avoid these casts you should do them only in the sections where they are needed. They can be expensive for large objects, and if the sample you get from generate() is used only as an argument of a NumericalMathFunction, then it will be converted back into a NumericalSample!

Best regards

Régis
________________________________
From: roy <roy at cerfacs.fr>
To: users <users at openturns.org>
Sent: Thursday, 5 October 2017, 11:13
Subject: [ot-users] Sample transformation



Hi,

I am facing consistency concerns in the API regarding distributions and sampling.

The initial goal was to get the sampling for Polynomial Chaos as I must not use the model variable.
So for the least squares strategy I do something like this:

proj_strategy = ot.LeastSquaresStrategy(montecarlo_design)
sample = np.array(proj_strategy.getExperiment().generate())

sample is correct as the bounds of each feature lie in the corresponding ranges.

But now if I want to use IntegrationStrategy:

proj_strategy = ot.IntegrationStrategy(ot.GaussProductExperiment(dists, list))
sample = np.array(proj_strategy.getExperiment().generate())

The sample values lie in [-1, 1], which does not correspond to the distribution I have initially.

So I used the conversion class but it does not work well with GaussProductExperiment as it requires [0, 1] instead of [-1, 1].

Thus I use this hack:

# Convert from [-1, 1] -> input distributions
marg_inv_transf = ot.MarginalTransformationEvaluation(distributions, 1)
sample = (proj_strategy.getExperiment().generate() + 1) / 2.


Is it normal that the distribution classes do not return values in the same intervals?


Thanks for your support!


Pamphile ROY


_______________________________________________
OpenTURNS users mailing list
users at openturns.org
http://openturns.org/mailman/listinfo/users
<example.py><example_standard.py>
