CUDA - How do I pass a shared memory pointer to a cuBLAS function?
I'm trying to run a cuBLAS function from within a kernel in the following way:
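    __device__ void dolinear(const float *w, const float *input, unsigned i,
                             float *out, unsigned o)
    {
        unsigned idx = blockIdx.x * blockDim.x + threadIdx.x;
        const float alpha = 1.0f;
        const float beta = 0.0f;

        // Only the first thread of the grid issues the cuBLAS call.
        if (idx == 0) {
            cublasHandle_t cnpHandle;
            cublasStatus_t status = cublasCreate(&cnpHandle);
            if (status == CUBLAS_STATUS_SUCCESS) {
                // out = alpha * W * input + beta * out, with W stored
                // column-major as an o x i matrix, so lda must be o (not 1).
                cublasSgemv(cnpHandle, CUBLAS_OP_N, o, i, &alpha, w, o,
                            input, 1, &beta, out, 1);
                cublasDestroy(cnpHandle);
            }
        }
        __syncthreads();
    }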
This function works if the input pointer is allocated using cudaMalloc.
My issue is that if the input pointer points to shared memory containing data generated within the kernel, I get the error: CUDA_EXCEPTION_14 - Warp Illegal Address.
Is it not possible to pass pointers to shared memory to a cuBLAS function that is called from a kernel?
What is the correct way to allocate memory here? (At the moment I'm doing a cudaMalloc and using that as my 'shared' memory, but it's making me feel a bit dirty.)
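You can't pass shared memory to a cuBLAS device API routine, because doing so violates the CUDA dynamic parallelism memory model on which device-side cuBLAS is based: shared memory is private to the block that owns it, and the child grids that device-side cuBLAS launches cannot access it. The best you can do is either use malloc() or new to allocate memory for the cuBLAS routine from the device runtime heap, or use a portion of a buffer allocated a priori with one of the host-side APIs (as you are presently doing).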
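A minimal sketch of both options, assuming each block generates n floats in dynamic shared memory. The helper generate and the kernels stage_to_prealloc and stage_to_heap are hypothetical names. The cuBLAS call appears only as a comment, because the device-side cuBLAS library (cublas_device) was removed in CUDA 10.0. What the sketch shows is the step that makes a pointer legal to hand to child work: copying the shared-memory data into global memory (a cudaMalloc'd slice or a device-heap block), then calling __syncthreads() so the launching thread, and therefore any child grid it launches, sees every thread's writes.

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for "data generated within the kernel".
__device__ float generate(unsigned idx) { return 0.5f * idx; }

// Option A: copy the block's shared data into this block's portion of a
// buffer the host allocated beforehand with cudaMalloc (n floats per block).
__global__ void stage_to_prealloc(float *scratch, unsigned n)
{
    extern __shared__ float sdata[];              // n floats, sized at launch
    for (unsigned k = threadIdx.x; k < n; k += blockDim.x)
        sdata[k] = generate(blockIdx.x * n + k);
    __syncthreads();

    float *slice = scratch + blockIdx.x * n;      // global memory
    for (unsigned k = threadIdx.x; k < n; k += blockDim.x)
        slice[k] = sdata[k];
    __syncthreads();  // all threads' writes now visible to thread 0

    // Pre-CUDA 10 device cuBLAS only (assumption): thread 0 would now pass
    // `slice`, never `sdata`, e.g. cublasSgemv(handle, ..., slice, 1, ...);
}

// Option B: allocate from the device runtime heap with malloc().
// `out` exists only so the host can inspect the staged data, because
// device-heap memory cannot be read with cudaMemcpy.
__global__ void stage_to_heap(float *out, unsigned n)
{
    extern __shared__ float sdata[];
    __shared__ float *heap_buf;                   // one heap pointer per block

    for (unsigned k = threadIdx.x; k < n; k += blockDim.x)
        sdata[k] = generate(blockIdx.x * n + k);
    if (threadIdx.x == 0)
        heap_buf = static_cast<float *>(malloc(n * sizeof(float)));
    __syncthreads();
    if (heap_buf == nullptr) return;              // same value for every thread

    for (unsigned k = threadIdx.x; k < n; k += blockDim.x)
        heap_buf[k] = sdata[k];
    __syncthreads();

    // Thread 0 would pass `heap_buf` to the cuBLAS routine here, and any
    // child work using it must finish before the free() below.

    for (unsigned k = threadIdx.x; k < n; k += blockDim.x)
        out[blockIdx.x * n + k] = heap_buf[k];
    __syncthreads();                              // all reads done before free
    if (threadIdx.x == 0)
        free(heap_buf);
}
```

Either kernel is launched with n * sizeof(float) bytes of dynamic shared memory, for example stage_to_prealloc<<<blocks, threads, n * sizeof(float)>>>(d_scratch, n), where d_scratch holds blocks * n floats from cudaMalloc. The preallocated buffer avoids per-launch heap allocation; the device heap defaults to 8 MB and can be enlarged with cudaDeviceSetLimit(cudaLimitMallocHeapSize, ...) before the launch.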