Channel: First steps - JuliaLang

Use GPU subfunction in a bigger function?


@Ahmed_Salih wrote:

Hello!

My question title is probably worded poorly, but here goes. I have made a “subfunction” called gpu_ParticlePackStep!, which calculates properties for one particle. If I call this function directly in Julia like this:

gpu_ParticlePackStep!(pg,pg_tmp,u,u_tmp,idxs,3)

Then it works, in the sense that the values in the last row change.

My problem is that when I want to use multiple threads, I write code like this:

function gpu_PackStep!(pg,pg_tmp,u,u_tmp,idxs)
    index = threadIdx().x
    stride = blockDim().x
    for iter = index:stride:length(pg)
        @inbounds gpu_ParticlePackStep!(pg,pg_tmp,u,u_tmp,idxs[iter],iter)
    end
    return nothing
end
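For reference, the loop above is a block-stride loop: each thread starts at its own `threadIdx().x` and advances by `blockDim().x`. The CPU-only sketch below lists which values of `iter` each thread would visit, assuming a single block and the 1-based thread indices used by CUDAnative. The helper `stride_assignment` is hypothetical and is not part of CUDAnative.

```julia
# Emulates, on the CPU, which values of `iter` each GPU thread visits in
#   for iter = index:stride:length(pg)
# where index = threadIdx().x (1-based) and stride = blockDim().x.
# `stride_assignment` is a hypothetical helper for illustration only.
function stride_assignment(nthreads::Integer, n::Integer)
    # Thread t handles t, t + nthreads, t + 2*nthreads, ... up to n
    return [collect(t:nthreads:n) for t in 1:nthreads]
end

stride_assignment(3, 3)  # [[1], [2], [3]]: one particle per thread
stride_assignment(2, 5)  # [[1, 3, 5], [2, 4]]
```

With `threads=3` and `N = 3`, every thread therefore handles exactly one particle, and each thread calls `gpu_ParticlePackStep!` with its own `iter`.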

I launch it with:

@cuda threads=3 gpu_PackStep!(pg,pg_tmp,u,u_tmp,idxs)

But it produces the following error:

ERROR: GPU compilation of gpu_PackStep!(CuDeviceArray{Tuple{Float32,Float32,Float32},1,CUDAnative.AS.Global}, CuDeviceArray{Tuple{Float32,Float32,Float32},1,CUDAnative.AS.Global}, CuDeviceArray{Tuple{Float32,Float32,Float32},1,CUDAnative.AS.Global}, CuDeviceArray{Tuple{Float32,Float32,Float32},1,CUDAnative.AS.Global}, Array{Array{Int64,1},1}) failed
KernelError: passing and using non-bitstype argument
Argument 6 to your kernel function is of type Array{Array{Int64,1},1}.
That type is not isbits, and such arguments are only allowed when they are unused by the kernel.
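The error hinges on the notion of `isbits`. As the signature in the message shows, `@cuda` converted the four `CuArray` arguments to `CuDeviceArray`s, but the fifth argument, the plain host `Array` `idxs`, was passed through unchanged. A quick check in plain Julia (no GPU needed) shows which of the involved types are isbits:

```julia
# isbits types are immutable plain data with no references to
# garbage-collected heap memory, so they can be copied to the GPU as-is.
isbitstype(Int)                             # true
isbitstype(Tuple{Float32,Float32,Float32})  # true: element type of pg, u
isbitstype(Vector{Int})                     # false: heap-allocated array
isbitstype(Vector{Vector{Int}})             # false: the type of idxs

typeof([[3, 2], [3, 1], [2, 1]])            # Vector{Vector{Int64}} on 64-bit
```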

The error message seems pretty clear, but I don’t understand it, especially why it lets me run the subfunction but not the main function. The full example code is below and works out of the box on Julia v1.4:

using CuArrays
using CUDAnative

# Random constants
const H = 0.04
const H1   = 1/H;
const AD = 348.15;
const FAC = 5/8;
const BETA = 4;
const ZETA = 0.060006;
const V    = 0.0011109;
const DT   = 0.016665;

# Generate random points / code
N = 3;
pg = CuArrays.fill(tuple(0.f0,0.f0,0.f0), N);
pg_tmp = deepcopy(pg)
u  = CuArrays.fill(tuple(0.f0,0.f0,0.f0), N);
u_tmp = deepcopy(u)   # copy of u (was deepcopy(pg); same values here, but u is meant)

# Arbitrary neighbour IDs, one list per particle (type Array{Array{Int64,1},1})
idxs  = [[3,2],[3,1],[2,1]]

# Calculate for one particle - NOTE idxs[iter]
function gpu_ParticlePackStep!(pg,pg_tmp,u,u_tmp,idxs,iter)
    Wgx = 0.f0;
    Wgz = 0.f0;

    # Remove the particle itself from its neighbour list (mutates idxs in place)
    filter!(x->x≠iter,idxs)
    @inbounds for i in idxs
        p_j = pg[iter] .- pg[i]
        RIJ  = sqrt(sum(abs2,p_j))   # Euclidean distance between particles iter and i
        RIJ1 = 1.f0 / RIJ
        q   = RIJ*H1;
        qq3 = q*(q-2)^3;
        Wq  = AD * FAC * qq3;

        x_ij = p_j[1];
        z_ij = p_j[3];

        Wgx += Wq * (x_ij * RIJ1) * H1;
        Wgz += Wq * (z_ij * RIJ1) * H;
    end

    # Update velocity and position of particle iter into the tmp arrays
    u_i = u[iter];
    dux = (-BETA * Wgx * V - ZETA * u_i[1])*DT;
    duz = (-BETA * Wgz * V - ZETA * u_i[3])*DT;
    dx  = dux*DT;
    dz  = duz*DT;
    u_tmp[iter]   =   u_i      .+ (dux, 0.0, duz)
    pg_tmp[iter]  =   pg[iter] .+ (dx,  0.0, dz)

    return nothing
end

# Do it for many particles
# Errors: idxs (Array{Array{Int64,1},1}) is not an isbits type
function gpu_PackStep!(pg,pg_tmp,u,u_tmp,idxs)
    index = threadIdx().x
    stride = blockDim().x
    for iter = index:stride:length(pg)
        @inbounds gpu_ParticlePackStep!(pg,pg_tmp,u,u_tmp,idxs[iter],iter)
    end
    return nothing
end
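One common way around a ragged `Vector{Vector{Int}}` argument is to flatten it into two plain `Int` vectors: all neighbour indices back to back, plus one offset per particle (a CSR-style layout). The sketch below is an assumption, not something taken from the post: `flatten_neighbours` and `neighbour_sum` are hypothetical names, and summing neighbour indices only stands in for the real per-pair work on `Wgx` and `Wgz`. It is written to run on the CPU; the GPU variant is untested.

```julia
# Hypothetical helper: store a ragged Vector{Vector{Int}} as two flat Int
# vectors. Flat Int vectors can be uploaded with CuArray(...) and then
# reach a kernel as isbits CuDeviceArrays.
function flatten_neighbours(idxs::Vector{Vector{Int}})
    flat = Int[]      # all neighbour indices, back to back
    offsets = Int[1]  # neighbours of particle p: flat[offsets[p]:offsets[p+1]-1]
    for v in idxs
        append!(flat, v)
        push!(offsets, length(flat) + 1)
    end
    return flat, offsets
end

# Skipping `j == iter` inside the loop replaces `filter!`, which resizes its
# argument (a host-runtime operation, not something to rely on in a kernel).
# Summing the indices is only a placeholder for the real per-pair physics.
function neighbour_sum(flat, offsets, iter)
    s = 0
    for k in offsets[iter]:(offsets[iter + 1] - 1)
        j = flat[k]
        j == iter && continue
        s += j
    end
    return s
end

flat, offsets = flatten_neighbours([[3, 2], [3, 1], [2, 1]])
# flat == [3, 2, 3, 1, 2, 1], offsets == [1, 3, 5, 7]
neighbour_sum(flat, offsets, 1)  # 5 (neighbours 3 and 2)
```

On the GPU side, the idea would be to pass `CuArray(flat)` and `CuArray(offsets)` to `gpu_PackStep!` in place of `idxs` and run the same `offsets[iter]:offsets[iter+1]-1` loop inside the kernel. That removes both the non-isbits argument and the in-kernel `filter!` call.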

I know this is not the best way to do GPU programming, but I have to start somewhere, which is why the kernel is so crude. I hope someone can spot where I am going wrong and how to get this to work.

Kind regards

Posts: 2

Participants: 1
