Following the guide I have written a function allowing to pow the NumericVector via RcppParallel package. The code is as follows:
#include <RcppArmadillo.h>
#include <RcppParallel.h>
using namespace Rcpp;
using namespace RcppArmadillo;
using namespace RcppParallel;
double sqr(double x)
{
return(x * x);
}
// Parallel pow of vectors
struct ParallelVectorPowStruct : public Worker
{
// source matrix
const RVector<double> input;
// powers matrix
const RVector<double> input_powers;
// destination matrix
RVector<double> output;
// type of power (0 - general, 1 - square, 2 - square root)
int pow_type;
// initialize with source and destination
ParallelVectorPowStruct(const NumericVector input, NumericVector output, NumericVector input_powers, int pow_type)
: input(input), output(output), input_powers(input_powers), pow_type(pow_type) {}
// take the powers of the range of elements requested
void operator()(std::size_t begin, std::size_t end) {
if (pow_type == 0)
{
// perform pow operation
std::transform(input.begin() + begin,
input.begin() + end,
input_powers.begin(),
output.begin() + begin,
::pow);
}
if (pow_type == 1)
{
std::transform(input.begin() + begin,
input.begin() + end,
output.begin() + begin,
sqr);
}
if (pow_type == 2)
{
// perform pow operation
std::transform(input.begin() + begin,
input.begin() + end,
output.begin() + begin,
::sqrt);
}
}
};
// [[Rcpp::export]]
NumericVector ParallelVectorPow(NumericVector x, double value) {
int pow_type = 0;
if (value == 0.5)
{
pow_type = 2;
}
if (value == 2)
{
pow_type = 1;
}
// allocate the output matrix
NumericVector output(x.size());
// allocate the powers matrix
NumericVector input_powers(x.size());
std::fill(input_powers.begin(), input_powers.end(), value);
// ParallelVectorPowStruct functor
ParallelVectorPowStruct parallelVectorPowStruct(x, output, input_powers, pow_type);
// call parallelFor to do the work
parallelFor(0, x.length(), parallelVectorPowStruct);
// return the output matrix
return output;
}
Note that I have added two additional distinct calculation options (when pow_type = 1 and when pow_type = 2) for the power of 0.5 (square root) and for the power of 2 (square). My motivation for making this stuff has been as follows.
First I have analyzed the code performance without introducing these two additional types of powers and have found that parallelized version works much faster (usually about 8 times on my 8-core 3800-mhz CPU)
For example the following code:
m <- runif(1000000, 1, 100)
library(rbenchmark)
res <- benchmark(m ^ 3,
ParallelVectorPow(m, 3),
order = "relative")
res[,1:4]
yields:
test replications elapsed relative
2 ParallelVectorPow(m, 3) 100 0.45 1.000
1 m^3 100 4.22 9.378
However the things are different when it comes to power of 0.5 and 2. Parallel version of code used to work much slower then not parallelized for these powers. I have solved the problem for the power of 0.5 using ::sqrt function following the approach mentioned in guide referenced above (see the code for pow_type == 2). However I can't figure out how to improve the code for the power of 2.
So consider the following code:
m <- runif(1000000, 1, 100)
library(rbenchmark)
res <- benchmark(m ^ 2,
ParallelVectorPow(m, 2),
order = "relative")
res[,1:4]
It yields:
test replications elapsed relative
1 m^2 100 0.14 1.000
2 ParallelVectorPow(m, 2) 100 0.36 2.571
Is it possible to speed it up via RcppParallel framework?
Will be very greatfull for help!
P.S.
-
I think that m ^ 2 and m ^ 0.5 make some special calls to some fast c-language functions while m ^ other_power makes some general alternative routine.
-
I also have tested these implementation against various RcppArmadillo based square approaches and have found that they are slower then both listed above.
Aucun commentaire:
Enregistrer un commentaire