通过与RcppEigensetRandom随机状态 - Pass random state to setRandom with RcppEigen

- 此内容更新于:2016-02-24
主题:

有办法通过随机状态特征的RcppEigen或我需要使用吗?这是一个例子:测试:注意第2列(即。runif)是可再生的,但第1列(即。setRandom)不是。

原文:

Is there a way to pass the random state to Eigen's setRandom with RcppEigen or do I need to use runif?

Here is an example:

// [[Rcpp::depends(RcppEigen)]]
#include <RcppEigen.h>
using namespace Rcpp;
using  Eigen::MatrixXd;
using  Eigen::VectorXd;

// [[Rcpp::export]]
NumericVector fx() {
  RNGScope   scope;
  MatrixXd   x(3,2);
  x=x.setRandom();
  x.col(1)=as<VectorXd>(runif(3,0,1));

return wrap(x);
}

Testing it:

set.seed(42); fx()
#           [,1]      [,2]
#[1,] -0.8105760 0.9148060
#[2,]  0.6498853 0.9370754
#[3,]  0.6221027 0.2861395

set.seed(42); fx()
#           [,1]      [,2]
#[1,] -0.9449154 0.9148060
#[2,]  0.8063267 0.9370754
#[3,] -0.0673205 0.2861395

Note how column 2 (i.e., runif) is reproducible, but column 1 (i.e., setRandom) is not.

解决方案:
RNG特征正交于R的随机数生成器。我们必须处理R和让你如你所知,你所做的与特征的RNG是分开的。特别是,您必须添加胶水注册特征的提高为用户提供RNGR.默认情况下,R是知道特征,这是讲得通的用户期望R的随机数生成器。你的问题似乎是一个香草编程特征的问题,你无法初始化特征提高。它与Rcpp无关。编辑:这是您的例子,纠正。原来特征使用系统提高,所以你需要的种子。它使用这个代码的修改版本:编辑2回应OP的评论:我不会期望的速度差异,将srand()调用由特征,所以你还是坚持将srand()一直有问题,并可能表现不同系统之间。快速演示脚本:生产和注意,RcppEigen慢甚至在一个简单的设置仅创建矢量。但我们在这里讨论的是微秒,可能不是我担心的东西无论如何在真实的应用程序中这将最有可能有其他瓶颈。
原文:

The RNG in Eigen is orthogonal to R's RNGs. We have RNGScope to deal with R and allow you to set.seed() as you know; what you do with Eigen's RNG is separate.

In particular, you would have to add the glue to register Eigen's RNG as a user-provided RNG for R. By default, R is unaware of Eigen, which makes sense as R users expect R's RNGs.

Your issue here seems to be a vanilla 'programming with Eigen' issue where you fail to initialize the Eigen RNG. It has nothing to do with Rcpp.

Edit: Here is your example, corrected. Turns out Eigen uses the system RNG, so you need srand() to seed it.

R> sourceCpp("/tmp/roland.cpp")
R> set.seed(42); fx(42)
          [,1]     [,2]
[1,] -0.933060 0.914806
[2,] -0.340072 0.937075
[3,]  0.381271 0.286140
R> set.seed(42); fx(42)
          [,1]     [,2]
[1,] -0.933060 0.914806
[2,] -0.340072 0.937075
[3,]  0.381271 0.286140
R>

It uses this modified version of your code:

// [[Rcpp::depends(RcppEigen)]]
#include <RcppEigen.h>
using namespace Rcpp;
using  Eigen::MatrixXd;
using  Eigen::VectorXd;

// [[Rcpp::export]]

  RNGScope   scope;
  MatrixXd   x(3,2);
  srand(seed);
  x=x.setRandom();
  x.col(1)=as<VectorXd>(runif(3,0,1));

  return wrap(x);
}

Edit 2 in response to OP's comments:

I would not expect much of a speed difference between Rcpp::runif() and the srand() calls made by Eigen, so you are still stuck with the fact that srand() has been having issues, and is possibly behaving differently between systems.

Quick demo script:

#include <RcppEigen.h> 

// [[Rcpp::depends(RcppEigen)]]

// [[Rcpp::export]]
Rcpp::NumericVector v1(int n) {
  return Rcpp::runif(n);
}

// [[Rcpp::export]]
Rcpp::NumericVector v2(int n) {
  Eigen::VectorXd   x(n);
  x = x.setRandom();
  return Rcpp::wrap(x);
}

/*** R
library(rbenchmark)
N <- 1e7
benchmark(v1(N), v2(N))
*/

which produces

R> sourceCpp("/tmp/roland.cpp")

R> library(rbenchmark)

R> N <- 1e7

R> benchmark(v1(N), v2(N))
   test replications elapsed relative user.self sys.self user.child sys.child
1 v1(N)          100  12.633    1.000    11.356    1.261          0         0
2 v2(N)          100  17.222    1.363    13.981    3.198          0         0
R> 

and note that RcppEigen is slower here even in a simpler setup of just creating vectors. But we are talking microseconds here and that is probably not something I'd worry about either way in a real application which will most likely have other bottlenecks.

楼主:是的,我也发现我可以使用,但这意味着我必须用一个参数的种子。我当然可以这样做,但是我必须处理与在R级,这似乎小于最优。

(原文:Yes, I also found I can use srand, but that means I have to use a parameter for the seed. I can certainly do that, but then I have to handle interfacing with set.seed at the R level, which seems less than optimal.)

网友:你看到我说什么是正交的了吗?这是两种不同的随机数生成器和其中一个不是很好。但简而言之,当你坚持使用两个不同的随机数生成器还需要种子两种不同的随机数生成器。没有免费的午餐,这一切。

(原文:Did you see what I said about being orthogonal ? These are two different RNGs and one of them is actually not that good. But in short, when you insist on using two different RNGs you also need to seed two different RNGs. No Free Lunch, and all that.)

楼主:我理解这一点。我还将基准选择是多快。如果太慢了,我可以让我通过一个整数。

(原文:I understand that. I'll also benchmark how fast the as<VectorXd>(runif(3,0,1)); alternative is. If it is too slow, I can use sample.int(2^31-1, 1) to get an integer that I pass to srand.)

网友:你还没有解释为什么你坚持。这是系统的,(至少在某些系统)不是很好。使用R的随机数生成器,直到你有一个很好的理由不去。你没有提供这样的原因。

(原文:You still have not explained why you insist on srand(). It is system-dependent, and (at least on some system) not very good. Use R's RNGs until you have a good reason not to. You have not provided such a reason.)

楼主:谢谢你的提示。要快得多(用于创建一个向量长度相同的)。我现在使用,但因为它是使用在一个循环中,我创建了n次随机数的数量我需要一个迭代和更新这个循环尾气预计算随机。问题是,n是很难优化。太小意味着调用runif太多,太大就意味着太多的内存需求,也会缓慢。最优取决于用户输入,所以我设置一个合理的默认,给用户的选项调整……

(原文:Thanks for the hints. setRandom is much faster than as<VectorXd>(runif()) (for creating a vector of the same length). I now use runif, but since it is used in a loop, I create n times the number of random numbers I need for one iteration and update this if the while loop exhausts the pre-calculated randoms. The problem is that n is hard to optimize. Too small means too many calls to runif, too large means too much memory demand and also gets slow. The optimum depends on user input, so I set a sensible default and give the user the option of adjusting it ...)