c++ - Fastest (or most elegant) way of passing constant arguments to a CUDA kernel


Let's say I want a CUDA kernel that needs to do lots of stuff, but there are some parameters that are constant across all the kernels. These arguments are passed to the main program as input, so they cannot be defined with a #define.

The kernel will run multiple times (around 65k), and it needs those parameters (and some other inputs) to do its maths.

My question is: what's the fastest (or else, the most elegant) way of passing these constants to the kernels?

The constants are float* or int* arrays of 2 or 3 elements. There will be around 5~10 of these.


Toy example: 2 constants, const1 and const2

    __global__ void kerneltoyexample(int inputdata, ?????){
        float value = inputdata*const1[0] + const2[1]/const1[2];
    }

Is it better to use

    __global__ void kerneltoyexample(int inputdata, float* const1, float* const2){
        float value = inputdata*const1[0] + const2[1]/const1[2];
    }

or

    __global__ void kerneltoyexample(int inputdata, float const1x, float const1y, float const1z, float const2x, float const2y){
        float value = inputdata*const1x + const2y/const1z;
    }

Or maybe declare them in some global read-only memory and let the kernels read from there? If so, L1, L2, global? Which one?

Is there a better way that I don't know of?

I'm running on a Tesla K40.

Just pass them by value. The compiler will automagically put them in the optimal place to facilitate cached broadcast to all threads in each block: either shared memory on compute capability 1.x devices, or constant memory/constant cache on compute capability >= 2.0 devices.

For example, if you had a long list of arguments to pass to the kernel, a struct passed by value is a clean way to go:

    struct arglist {
        float magicfloat_1;
        float magicfloat_2;
        //......
        float magicfloat_19;
        int magicint1;
        //......
    };

    __global__ void kernel(...., const arglist args)
    {
        // you get the idea
    }

[standard disclaimer: written in browser, not real code, caveat emptor]
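To make the struct-by-value idea concrete, here is a minimal sketch built on the toy example from the question. The names ToyArgs, kernelToyExample, and runToyExample, the input/output arrays, and the constant values are all assumptions made for illustration; they are not from the original post.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical bundle of run-time constants (names are illustrative only).
struct ToyArgs {
    float const1[3];
    float const2[2];
};

// args is an ordinary by-value kernel argument; every thread reads the same copy.
__global__ void kernelToyExample(const int *input, float *output, int n, const ToyArgs args)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        output[i] = input[i] * args.const1[0] + args.const2[1] / args.const1[2];
    }
}

// Runs the kernel on a few small inputs and checks the results on the host.
bool runToyExample()
{
    const int n = 8;
    int hostIn[n];
    float hostOut[n];
    for (int i = 0; i < n; ++i) hostIn[i] = i;

    // Stand-ins for values that would be read from program input at run time.
    ToyArgs args = {{2.0f, 0.0f, 4.0f}, {0.0f, 8.0f}};

    int *devIn = nullptr;
    float *devOut = nullptr;
    if (cudaMalloc(&devIn, n * sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc(&devOut, n * sizeof(float)) != cudaSuccess) {
        cudaFree(devIn);
        return false;
    }
    cudaMemcpy(devIn, hostIn, n * sizeof(int), cudaMemcpyHostToDevice);

    // The struct is copied to the device as part of the launch itself.
    kernelToyExample<<<1, n>>>(devIn, devOut, n, args);
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);
    ok = ok && (cudaMemcpy(hostOut, devOut, n * sizeof(float),
                           cudaMemcpyDeviceToHost) == cudaSuccess);

    // With these constants each output is input * 2 + 8 / 4 = 2 * input + 2.
    for (int i = 0; ok && i < n; ++i) {
        ok = (hostOut[i] == 2.0f * i + 2.0f);
    }

    cudaFree(devIn);
    cudaFree(devOut);
    return ok;
}

int main()
{
    printf("%s\n", runToyExample() ? "results match" : "mismatch or CUDA error");
    return 0;
}
```

Because the arrays live inside the struct rather than behind pointers, no separate cudaMalloc or cudaMemcpy is needed for the constants. Kernel arguments do have a total size limit (4 KB on hardware of the K40's generation, as far as I know), which 5~10 arrays of 2 or 3 elements stay well under.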

If it turned out that one of the magicint values only ever took one of a small number of values that you know beforehand, then templating is an extremely powerful tool:

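    template<int magiconstant1>
    __global__ void kernel(....)
    {
        // magiconstant1 is a compile-time constant in each instantiation
        for(int i=0; i < magiconstant1; ++i) {
            // .....
        }
    }

    // explicit instantiations for the values known beforehand
    template __global__ void kernel<3>(....);
    template __global__ void kernel<4>(....);
    template __global__ void kernel<5>(....);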

The compiler is smart enough to recognise that magiconstant1 makes the loop trip count known at compile time, and it will automatically unroll the loop for you. Templating is a very powerful technique for building fast, flexible codebases, and you would be well advised to get accustomed to it if you haven't already done so.
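Since the value in question arrives as program input, something on the host has to map the run-time value onto one of the compiled instantiations. One possible way is a plain switch, sketched below. The kernel sumFirstTerms, the helpers launchSum and sumOnDevice, and the coefficient data are hypothetical names and values chosen for illustration, not part of the original answer.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// The loop bound is a template parameter, so it is a compile-time constant
// inside each instantiation and the compiler is free to unroll the loop.
template <int TripCount>
__global__ void sumFirstTerms(const float *coeffs, float *result)
{
    float acc = 0.0f;
    for (int i = 0; i < TripCount; ++i) {
        acc += coeffs[i];
    }
    *result = acc;
}

// Maps a run-time value onto one of the instantiations known beforehand.
// Returns false if the value has no matching instantiation.
bool launchSum(int tripCount, const float *devCoeffs, float *devResult)
{
    switch (tripCount) {
    case 3: sumFirstTerms<3><<<1, 1>>>(devCoeffs, devResult); return true;
    case 4: sumFirstTerms<4><<<1, 1>>>(devCoeffs, devResult); return true;
    case 5: sumFirstTerms<5><<<1, 1>>>(devCoeffs, devResult); return true;
    default: return false;
    }
}

// Returns the sum of the first tripCount coefficients,
// or -1.0f for an unsupported tripCount or a CUDA error.
float sumOnDevice(int tripCount)
{
    const float hostCoeffs[5] = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f};
    float hostResult = -1.0f;

    float *devCoeffs = nullptr;
    float *devResult = nullptr;
    if (cudaMalloc(&devCoeffs, sizeof(hostCoeffs)) != cudaSuccess) return -1.0f;
    if (cudaMalloc(&devResult, sizeof(float)) != cudaSuccess) {
        cudaFree(devCoeffs);
        return -1.0f;
    }
    cudaMemcpy(devCoeffs, hostCoeffs, sizeof(hostCoeffs), cudaMemcpyHostToDevice);

    // Only copy the result back if a kernel was launched and completed.
    if (launchSum(tripCount, devCoeffs, devResult) &&
        cudaDeviceSynchronize() == cudaSuccess) {
        cudaMemcpy(&hostResult, devResult, sizeof(float), cudaMemcpyDeviceToHost);
    }

    cudaFree(devCoeffs);
    cudaFree(devResult);
    return hostResult;
}

int main()
{
    // 1 + 2 + 3 + 4 = 10
    printf("sum of first 4 coefficients: %f\n", sumOnDevice(4));
    return 0;
}
```

Using the template in the switch instantiates each variant implicitly, so no explicit instantiation lines are needed here. The trade-off is that every supported value adds another compiled copy of the kernel, which is why this only pays off for a small, known set of values.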

