
ArrayFire: function with an OpenCL kernel called from the main function

  •  1
  • 4lrdyD  · Tech community  · 6 years ago

    The function is shown below (taken from http://arrayfire.org/docs/interop_opencl.htm )

    A single main function

    int main() {
        size_t length = 10;
        // Create ArrayFire array objects:
        af::array A = af::randu(length, f32);
        af::array B = af::constant(0, length, f32);
        // ... additional ArrayFire operations here
        // 2. Obtain the device, context, and queue used by ArrayFire
        static cl_context af_context = afcl::getContext();
        static cl_device_id af_device_id = afcl::getDeviceId();
        static cl_command_queue af_queue = afcl::getQueue();
        // 3. Obtain cl_mem references to af::array objects
        cl_mem * d_A = A.device<cl_mem>();
        cl_mem * d_B = B.device<cl_mem>();
        // 4. Load, build, and use your kernels.
        //    For the sake of readability, we have omitted error checking.
        int status = CL_SUCCESS;
        // A simple copy kernel, uses C++11 syntax for multi-line strings.
        const char * kernel_name = "copy_kernel";
        const char * source = R"(
            void __kernel
            copy_kernel(__global float * gA, __global float * gB)
            {
                int id = get_global_id(0);
                gB[id] = gA[id];
            }
        )";
        // Create the program, build the executable, and extract the entry point
        // for the kernel.
        cl_program program = clCreateProgramWithSource(af_context, 1, &source, NULL, &status);
        status = clBuildProgram(program, 1, &af_device_id, NULL, NULL, NULL);
        cl_kernel kernel = clCreateKernel(program, kernel_name, &status);
        // Set arguments and launch your kernels
        clSetKernelArg(kernel, 0, sizeof(cl_mem), d_A);
        clSetKernelArg(kernel, 1, sizeof(cl_mem), d_B);
        clEnqueueNDRangeKernel(af_queue, kernel, 1, NULL, &length, NULL, 0, NULL, NULL);
        // 5. Return control of af::array memory to ArrayFire
        A.unlock();
        B.unlock();
        // ... resume ArrayFire operations
        // Because the device pointers, d_A and d_B, were returned to ArrayFire's
        // control by the unlock function, there is no need to free them using
        // clReleaseMemObject()
        return 0;
    }
    

    Here af_print(B); matches A. But when I write the function separately, as follows:

    Separate function and main

    arraycopy function

    void arraycopy(af::array A, af::array B,size_t length) {
        // 2. Obtain the device, context, and queue used by ArrayFire   
        static cl_context af_context = afcl::getContext();
        static cl_device_id af_device_id = afcl::getDeviceId();
        static cl_command_queue af_queue = afcl::getQueue();
        // 3. Obtain cl_mem references to af::array objects
        cl_mem * d_A = A.device<cl_mem>();
        cl_mem * d_B = B.device<cl_mem>();
        // 4. Load, build, and use your kernels.
        //    For the sake of readability, we have omitted error checking.
        int status = CL_SUCCESS;
        // A simple copy kernel, uses C++11 syntax for multi-line strings.
        const char * kernel_name = "copy_kernel";
        const char * source = R"(
            void __kernel
            copy_kernel(__global float * gA, __global float * gB)
            {
                int id = get_global_id(0);
                gB[id] = gA[id];
            }
        )";
        // Create the program, build the executable, and extract the entry point
        // for the kernel.
        cl_program program = clCreateProgramWithSource(af_context, 1, &source, NULL, &status);
        status = clBuildProgram(program, 1, &af_device_id, NULL, NULL, NULL);
        cl_kernel kernel = clCreateKernel(program, kernel_name, &status);
        // Set arguments and launch your kernels
        clSetKernelArg(kernel, 0, sizeof(cl_mem), d_A);
        clSetKernelArg(kernel, 1, sizeof(cl_mem), d_B);
        clEnqueueNDRangeKernel(af_queue, kernel, 1, NULL, &length, NULL, 0, NULL, NULL);
        // 5. Return control of af::array memory to ArrayFire
        A.unlock();
        B.unlock();
        // ... resume ArrayFire operations
        // Because the device pointers, d_A and d_B, were returned to ArrayFire's
        // control by the unlock function, there is no need to free them using
        // clReleaseMemObject()
    }
    

    main function

    int main()
    {
        size_t length = 10;
        af::array A = af::randu(length, f32);
        af::array B = af::constant(0, length, f32);
        arraycopy(A, B, length);
        af_print(B);//does not match A
    }
    

    the final value of B is unchanged. Why does this happen, and what can I do to make it work? Thanks in advance.

    2 answers  |  as of 6 years ago
        1
  •  2
  •   yeputons    6 years ago

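    You pass the af::array objects into arraycopy by value, not by reference, so A and B in main stay unchanged no matter what you do inside arraycopy. You can pass B by reference: af::array &B in the parameter list. I would also suggest passing A by const reference ( const af::array &A ).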
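
To make the difference concrete, here is a minimal sketch that uses only the standard library. `Array` is a hypothetical stand-in for af::array (a plain std::vector<float>), and the two function names are made up for illustration; no ArrayFire or OpenCL call is involved. It shows only the parameter-passing rule, not ArrayFire's internals.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for af::array, used only to show parameter passing.
using Array = std::vector<float>;

// By value: B is a private copy, so the caller's array is never touched.
void arraycopy_by_value(Array A, Array B, std::size_t length) {
    assert(length <= A.size() && length <= B.size());
    for (std::size_t i = 0; i < length; ++i) B[i] = A[i];
}

// By reference: B aliases the caller's array, so the writes are visible.
void arraycopy_by_ref(const Array &A, Array &B, std::size_t length) {
    assert(length <= A.size() && length <= B.size());
    for (std::size_t i = 0; i < length; ++i) B[i] = A[i];
}
```

Calling arraycopy_by_value(A, B, n) leaves the caller's B as it was, while arraycopy_by_ref(A, B, n) fills it with the contents of A. Applied to the question, the equivalent change would be the signature suggested above, with B taken as af::array &B.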

        2
  •  1
  •   pradeep    6 years ago

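    The reason behind the behavior you are seeing is reference counting. It is not a bug, though; it is consistent with C++ language behavior.

    When af::array objects are created by assignment or an equivalent operation, they only copy the metadata and keep a shared pointer to the data.

    In the version of your code that uses a function, B is passed by value, so the B inside arraycopy is a metadata copy of the B from the main function and shares a pointer to the data of main's array B. At this point, if the user calls device to get the pointer, we assume it will be used to write to the memory behind that pointer. So, when device is called on an array object whose shared pointer has a reference count > 1, we make a copy of the original array (B from main) and return a pointer to that new memory. Therefore, if you call af_print(B) inside arraycopy, you will see the correct values. This is essentially copy-on-write: because B is passed by value, the modifications made to B inside arraycopy are not visible in main.

    Pradeep.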
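
The copy-on-write behavior described above can be modeled with a std::shared_ptr. This is a rough sketch under stated assumptions, not ArrayFire's actual implementation: `CowArray`, `fill_by_value` and `fill_by_ref` are hypothetical names, and the buffer lives in host memory rather than in a cl_mem object.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical model of a reference-counted array; not ArrayFire's real code.
class CowArray {
public:
    CowArray(std::size_t n, float value)
        : data_(std::make_shared<std::vector<float>>(n, value)) {}

    // The implicit copy constructor copies only the shared_ptr ("metadata").

    // Hand out a writable pointer. If the buffer is shared, clone it first
    // so that other handles to the old buffer are not affected.
    float *device() {
        if (data_.use_count() > 1) {
            data_ = std::make_shared<std::vector<float>>(*data_);
        }
        return data_->data();
    }

    float at(std::size_t i) const {
        assert(i < data_->size());
        return (*data_)[i];
    }
    std::size_t size() const { return data_->size(); }

private:
    std::shared_ptr<std::vector<float>> data_;
};

// B is a second handle to the caller's buffer, so device() clones it and
// the writes land in the clone.
void fill_by_value(CowArray B, float v) {
    float *p = B.device();
    for (std::size_t i = 0; i < B.size(); ++i) p[i] = v;
}

// B is the caller's own handle; if nothing else shares the buffer,
// device() does not clone and the writes are visible to the caller.
void fill_by_ref(CowArray &B, float v) {
    float *p = B.device();
    for (std::size_t i = 0; i < B.size(); ++i) p[i] = v;
}
```

In this model, fill_by_value leaves the caller's array untouched because the reference count is 2 at the moment device() is called, while fill_by_ref modifies it in place. That mirrors why passing B by reference to arraycopy makes the kernel's output visible in main.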
