代码之家  ›  专栏  ›  技术社区  ›  Dylan

将指向设备分配矩阵的指针从C返回到Fortran

  •  1
  • Dylan  · 技术社区  · 11 年前

    首先,我对Fortran/C/CUDA有点陌生。其次,我正在开发一个Fortran/C程序,该程序使用cuBLAS在GPU上执行矩阵向量乘法。在更新矩阵内容之前,我需要用一个矩阵乘以多个(最多1000个)向量。然而,我的当前版本必须在每次向GPU发送新向量时重新分配矩阵(这是难以置信的浪费和缓慢,因为矩阵没有改变)。

    我希望能够将矩阵与向量相乘,而不必为每个向量重新分配矩阵。我的一个想法是调用一个单独的C函数,将矩阵分配给GPU,返回一个指向Fortran主程序的指针,然后调用另一个执行矩阵向量乘法的C函数。

    使用ISO_C_BINDING,我向变量返回了一个指向浮点数的指针:

    type(C_PTR) :: ptr
    

    当我试图将其传递到矩阵向量C函数中时:

    在Fortran中

    call cudaFunction(ptr,vector, N)
    

    在C中

    extern "C" void cudaFunction_(float *mat, float *vector, int *N)
    

    所有程序都编译并运行,但cubrassgemv的执行无法执行。对为什么会发生这种情况有什么想法吗?我见过一些类似的帖子,但他们从未尝试将返回的指针返回到C,这就是我遇到的问题所在。

    提前感谢!

    1 回复  |  直到 10 年前
        1
  •  2
  •   Robert Crovella    11 年前

    我建议你不要重新发明轮子,而是使用 cublas fortran bindings 这是为此目的而提供的。

    “砰砰”的包装不是你想要的。在fortran中使用cublas调用时,它会根据需要执行隐式复制操作。

    您需要“非thunking”包装器,因此您可以显式控制正在进行的复制 Get/SetMatrix Get/SetVector 来回复制数据。

    有一个示例代码(示例B.2)显示了如何使用cublas文档中包含的非敲击包装器。

    即使您确实想重新发明轮子,包装器也会向您展示如何使必要的语法在C和Fortran之间移动。

    在标准的linux CUDA安装中,包装器位于 /usr/local/cuda/src 非敲击包装是 /usr/local/cuda/src/fortran.c

    下面是一个充分发挥作用的示例:

    立方英尺:

          program cublas_fortran_example
          implicit none
          integer i, j
    
    c     helper functions
          integer cublas_init
          integer cublas_shutdown
          integer cublas_alloc
          integer cublas_free
          integer cublas_set_vector
          integer cublas_get_vector
    c     selected blas functions
          double precision cublas_ddot
          external cublas_daxpy
          external cublas_dscal
          external cublas_dcopy
          double precision cublas_dnrm2
    c     cublas variables
          integer cublas_status
          real*8 x(30), y(30)
          double precision alpha, beta
          double precision nrm
          integer*8 d_x, d_y, d_alpha, d_beta, d_nrm
          integer*8 dsize1, dlength1, dlength2
          double precision dresult
    
    
    
          write(*,*) "testing cublas fortran example"
    
    c     initialize cublas library
    c     CUBLAS_STATUS_SUCCESS=0
          cublas_status = cublas_init()
          if (cublas_status /= 0) then
             write(*,*) "CUBLAS Library initialization failed"
             write(*,*) "cublas_status=",cublas_status
             stop
          endif
    c     initialize data
          do j=1,30
            x(j) = 1.0
            y(j) = 2.0
          enddo
          dsize1 = 8
          dlength1 = 30
          dlength2 = 1
          alpha = 2.0
          beta = 3.0
    c     allocate device storage
          cublas_status = cublas_alloc(dlength1, dsize1, d_x)
          if (cublas_status /= 0) then
             write(*,*) "CUBLAS device malloc failed"
             stop
          endif
          cublas_status = cublas_alloc(dlength1, dsize1, d_y)
          if (cublas_status /= 0) then
             write(*,*) "CUBLAS device malloc failed"
             stop
          endif
          cublas_status = cublas_alloc(dlength2, dsize1, d_alpha)
          if (cublas_status /= 0) then
             write(*,*) "CUBLAS device malloc failed"
             stop
          endif
          cublas_status = cublas_alloc(dlength2, dsize1, d_beta)
          if (cublas_status /= 0) then
             write(*,*) "CUBLAS device malloc failed"
             stop
          endif
          cublas_status = cublas_alloc(dlength2, dsize1, d_nrm)
          if (cublas_status /= 0) then
             write(*,*) "CUBLAS device malloc failed"
             stop
          endif
    
    c     copy data from host to device
    
          cublas_status = cublas_set_vector(dlength1, dsize1, x, dlength2,
         >     d_x, dlength2)
          if (cublas_status /= 0) then
             write(*,*) "CUBLAS copy to device failed"
             write(*,*) "cublas_status=",cublas_status
             stop
          endif
          cublas_status = cublas_set_vector(dlength1, dsize1, y, dlength2,
         >     d_y, dlength2)
          if (cublas_status /= 0) then
             write(*,*) "CUBLAS copy to device failed"
             write(*,*) "cublas_status=",cublas_status
             stop
          endif
    
          dresult = cublas_ddot(dlength1, d_x, dlength2, d_y, dlength2)
          write(*,*) "dot product result=",dresult
    
          dresult = cublas_dnrm2(dlength1, d_x, dlength2)
          write(*,*) "nrm2 of x result=",dresult
    
          dresult = cublas_dnrm2(dlength1, d_y, dlength2)
          write(*,*) "nrm2 of y result=",dresult
    
          call cublas_daxpy(dlength1, alpha, d_x, dlength2, d_y, dlength2)
          cublas_status = cublas_get_vector(dlength1, dsize1, d_y, dlength2,
         >     y, dlength2)
          if (cublas_status /= 0) then
             write(*,*) "CUBLAS copy to host failed"
             write(*,*) "cublas_status=",cublas_status
             stop
          endif
          write(*,*) "daxpy y(1)  =", y(1)
          write(*,*) "daxpy y(30) =", y(30)
    
    
          call cublas_dscal(dlength1, beta, d_x, dlength2)
          cublas_status = cublas_get_vector(dlength1, dsize1, d_x, dlength2,
         >     x, dlength2)
          if (cublas_status /= 0) then
             write(*,*) "CUBLAS copy to host failed"
             write(*,*) "cublas_status=",cublas_status
             stop
          endif
          write(*,*) "dscal x(1)  =", x(1)
          write(*,*) "dscal x(30) =", x(30)
    
    
          call cublas_dcopy(dlength1, d_x, dlength2, d_y, dlength2)
          cublas_status = cublas_get_vector(dlength1, dsize1, d_y, dlength2,
         >     y, dlength2)
          if (cublas_status /= 0) then
             write(*,*) "CUBLAS copy to host failed"
             write(*,*) "cublas_status=",cublas_status
             stop
          endif
          write(*,*) "dcopy y(1)  =", y(1)
          write(*,*) "dcopy y(30) =", y(30)
    
    c     deallocate GPU memory and exit
          cublas_status = cublas_free(d_x)
          cublas_status = cublas_free(d_y)
          cublas_status = cublas_free(d_alpha)
          cublas_status = cublas_free(d_beta)
          cublas_status = cublas_free(d_nrm)
          cublas_status = cublas_shutdown()
          stop
          end
    

    编译/运行:

    $ gfortran -c -o cublasf.o cublasf.f
    $ gcc -c -DCUBLAS_GFORTRAN -I/usr/local/cuda/include -I/usr/local/cuda/src -o fortran.o /usr/local/cuda/src/fortran.c
    $ gfortran -L/usr/local/cuda/lib64 -lcublas -o cublasf cublasf.o fortran.o
    $ ./cublasf
     testing cublas fortran example
     dot product result=   60.0000000000000
     nrm2 of x result=   5.47722557505166
     nrm2 of y result=   10.9544511501033
     daxpy y(1)  =   4.00000000000000
     daxpy y(30) =   4.00000000000000
     dscal x(1)  =   3.00000000000000
     dscal x(30) =   3.00000000000000
     dcopy y(1)  =   3.00000000000000
     dcopy y(30) =   3.00000000000000
    $ 
    

    CUDA 5.0,RHEL 5.5