@Ahmed_Salih wrote:
Hello!
My question title is probably worded poorly, but here goes. I have made a “subfunction”, called `gpu_ParticlePackStep!`, which calculates properties for one particle. If I call this function directly in Julia like this:

```julia
gpu_ParticlePackStep!(pg,pg_tmp,u,u_tmp,idxs,3)
```

then it works, since something happens in the last row of the output (particle 3).
My problem is that when I want to use multiple threads, I write the code like this:

```julia
function gpu_PackStep!(pg,pg_tmp,u,u_tmp,idxs)
    index = threadIdx().x
    stride = blockDim().x
    for iter = index:stride:length(pg)
        @inbounds gpu_ParticlePackStep!(pg,pg_tmp,u,u_tmp,idxs[iter],iter)
    end
    return nothing
end
```
I call it using:
```julia
@cuda threads=3 gpu_PackStep!(pg,pg_tmp,u,u_tmp,idxs)
```
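For reference, the loop `for iter = index:stride:length(pg)` gives each thread of the block its own subset of particles. Below is a CPU-only sketch of that assignment; the helper `thread_indices` is hypothetical and simply treats `threadIdx().x` and `blockDim().x` as plain integers.

```julia
# CPU-only sketch of how `for iter = index:stride:length(pg)` in
# gpu_PackStep! splits the particles among the threads of one block.
# `thread_indices` is a hypothetical helper, not part of CUDAnative.
function thread_indices(threadidx::Int, blockdim::Int, n::Int)
    # Each thread starts at its own (1-based) thread index and
    # jumps ahead by the block size until it passes n.
    return collect(threadidx:blockdim:n)
end

# With @cuda threads=3 and N = 3 particles:
for t in 1:3
    println("thread $t -> ", thread_indices(t, 3, 3))
end
```

With `threads=3` and `N = 3`, each thread ends up handling exactly one particle, so every thread reads one entry `idxs[iter]`.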
But it produces the following error:
```
ERROR: GPU compilation of gpu_PackStep!(CuDeviceArray{Tuple{Float32,Float32,Float32},1,CUDAnative.AS.Global}, CuDeviceArray{Tuple{Float32,Float32,Float32},1,CUDAnative.AS.Global}, CuDeviceArray{Tuple{Float32,Float32,Float32},1,CUDAnative.AS.Global}, CuDeviceArray{Tuple{Float32,Float32,Float32},1,CUDAnative.AS.Global}, Array{Array{Int64,1},1}) failed
KernelError: passing and using non-bitstype argument

Argument 6 to your kernel function is of type Array{Array{Int64,1},1}.
That type is not isbits, and such arguments are only allowed when they are unused by the kernel.
```

The error message seems pretty clear, but I don’t understand it, especially why it lets me run the subfunction but not the main function. The full example code is below, and it works out of the box on Julia v1.4:
```julia
using CuArrays
using CUDAnative

# Random constants
const H = 0.04
const H1 = 1/H;
const AD = 348.15;
const FAC = 5/8;
const BETA = 4;
const ZETA = 0.060006;
const V = 0.0011109;
const DT = 0.016665;

# Generate random points / code
N = 3;
pg = CuArrays.fill(tuple(0.f0,0.f0,0.f0), N);
pg_tmp = deepcopy(pg)
u = CuArrays.fill(tuple(0.f0,0.f0,0.f0), N);
u_tmp = deepcopy(pg)

# Arbitrary ID's
idxs = [[3,2],[3,1],[2,1]]

# Calculate for one particle - NOTE idxs[iter]
function gpu_ParticlePackStep!(pg,pg_tmp,u,u_tmp,idxs,iter)
    Wgx = 0.f0;
    Wgz = 0.f0;
    filter!(x->x≠iter,idxs)
    @inbounds for i in idxs
        p_j = pg[iter] .- pg[i]
        RIJ = sqrt(sum(abs2,p_j.^2))
        RIJ1 = 1.f0 / RIJ
        q = RIJ*H1;
        qq3 = q*(q-2)^3;
        Wq = AD * FAC * qq3;
        x_ij = p_j[1];
        z_ij = p_j[3];
        Wgx += Wq * (x_ij * RIJ1) * H1;
        Wgz += Wq * (z_ij * RIJ1) * H;
    end
    u_i = u[iter];
    dux = (-BETA * Wgx * V - ZETA * u_i[1])*DT;
    duz = (-BETA * Wgz * V - ZETA * u_i[3])*DT;
    dx = dux*DT;
    dz = duz*DT;
    u_tmp[iter] = u_i .+ (dux, 0.0, duz)
    pg_tmp[iter] = pg[iter] .+ (dx, 0.0, dz)
    return nothing
end

# Do it for a lot of particles..
# Errors: isbit type
function gpu_PackStep!(pg,pg_tmp,u,u_tmp,idxs)
    index = threadIdx().x
    stride = blockDim().x
    for iter = index:stride:length(pg)
        @inbounds gpu_ParticlePackStep!(pg,pg_tmp,u,u_tmp,idxs[iter],iter)
    end
    return nothing
end
```
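The error refers to Julia's `isbits` property. As a sketch using only Base functions (no GPU needed), one can check which of the argument types qualify. When launching with `@cuda`, the `CuArray` arguments are converted to `CuDeviceArray`s (the error message lists them that way for the first four arguments), whereas a plain host `Array` such as `idxs` is passed as it is:

```julia
# Which of the kernel's argument types satisfy the isbits requirement?
# Only Base functions are used here; no GPU is required.
elem = Tuple{Float32,Float32,Float32}   # element type of pg, pg_tmp, u, u_tmp
idxs = [[3,2],[3,1],[2,1]]               # the host-side neighbour lists

println(isbitstype(elem))              # a tuple of plain Float32 values
println(isbitstype(typeof(idxs)))      # Array{Array{Int64,1},1}
println(isbitstype(typeof(idxs[1])))   # a host Vector{Int64}
println(isbits(idxs))                  # the value-level check on idxs itself
```

Only the tuple type is isbits here. Calling `gpu_ParticlePackStep!` directly, as in the single-particle example, runs it as an ordinary Julia function on the CPU, so the isbits restriction on kernel arguments does not come into play there.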
I know this is not the best way to do GPU programming, but I have to start somewhere, which is why I have written such a poor kernel. I hope someone can spot where I am going wrong and help me get this to work.
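One possible direction, sketched under the assumption that the neighbour lists do not change while the kernel runs, is to store `idxs` as one flat integer array plus an offset array (a CSR-style layout). Both are plain `Vector{Int}`s that could be uploaded with `CuArray(...)` and would then reach the kernel as `CuDeviceArray`s. The helper `flatten_neighbors` is hypothetical, and this CPU-only code only builds and inspects the layout:

```julia
# Hypothetical helper: flatten a vector of neighbour lists into two plain
# Int arrays (CSR-style), which avoids passing Array{Array{Int64,1},1}.
function flatten_neighbors(lists::Vector{Vector{Int}})
    offsets = Vector{Int}(undef, length(lists) + 1)
    offsets[1] = 1
    for (k, l) in enumerate(lists)
        # Start of list k+1 = start of list k + length of list k
        offsets[k + 1] = offsets[k] + length(l)
    end
    flat = reduce(vcat, lists; init = Int[])  # all lists, back to back
    return flat, offsets
end

idxs = [[3,2],[3,1],[2,1]]
flat, offsets = flatten_neighbors(idxs)

# Neighbours of particle `iter` live in flat[offsets[iter]:offsets[iter+1]-1]
for iter in eachindex(idxs)
    println(iter, " => ", flat[offsets[iter]:offsets[iter+1]-1])
end
```

Inside a kernel, the neighbours of particle `iter` could then be visited with a loop such as `for k in offsets[iter]:offsets[iter+1]-1`, skipping entries where `flat[k] == iter` rather than calling `filter!`, which mutates its argument.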
Kind regards