
What is the simplest way to test for the presence of a CUDA-capable GPU from CMake?

  •  14
  • Christopher Bruns  ·  15 years ago

    We have some nightly build machines that have the CUDA libraries installed, but no CUDA-capable GPU. These machines are capable of building CUDA-enabled programs, but they cannot run those programs.

    In our automated nightly build process, our CMake scripts use the CMake command

    find_package(CUDA)

    to determine whether the CUDA software is installed. This sets the CMake variable CUDA_FOUND on platforms where the CUDA software is installed, and it works great. When CUDA_FOUND is set, it is fine to build CUDA-enabled programs, even on machines with no CUDA-capable GPU.

    But test programs that use CUDA naturally fail on the non-GPU CUDA machines, making our nightly dashboards look "dirty". So I want CMake to avoid running those tests on such machines, while still building the CUDA software on them.

    After getting a positive CUDA_FOUND result, I would like to test for the presence of an actual GPU, and then set a variable, say CUDA_GPU_FOUND, to reflect this.

    What is the simplest way to get CMake to test for the presence of a CUDA-capable GPU?

    This needs to work on three platforms: Windows with MSVC, Mac, and Linux. (That is why we use CMake in the first place.)

    EDIT: There are a couple of good suggestions in the answers for how to write a program that tests for the presence of a GPU. What is still missing is a way to get CMake to compile and run that program at configure time. I suspect that the TRY_RUN command in CMake is key here, but unfortunately that command is nearly undocumented and I cannot figure out how to make it work. This part of the problem might be the harder half. Perhaps I should have asked this as two separate questions…

    5 Answers  |  latest activity 8 years ago
        1
  •  18
  •   Christopher Bruns    15 years ago

    The answer to this question consists of two parts:

    1. A program to detect the presence of a CUDA-capable GPU.
    2. CMake code to compile, run, and interpret the result of that program at configure time.

    For part 1, the GPU sniffing program, I started with the answer provided by fabmilo because it is so compact. I quickly discovered that I needed many of the details found in Anycorn's answer to get it to work well. What I ended up with is the following C source file, which I named has_cuda_gpu.c:

    #include <stdio.h>
    #include <cuda_runtime.h>
    
    int main() {
        int deviceCount, device;
        int gpuDeviceCount = 0;
        struct cudaDeviceProp properties;
        cudaError_t cudaResultCode = cudaGetDeviceCount(&deviceCount);
        if (cudaResultCode != cudaSuccess) 
            deviceCount = 0;
        /* machines with no GPUs can still report one emulation device */
        for (device = 0; device < deviceCount; ++device) {
            cudaGetDeviceProperties(&properties, device);
            if (properties.major != 9999) /* 9999 means emulation only */
                ++gpuDeviceCount;
        }
        printf("%d GPU CUDA device(s) found\n", gpuDeviceCount);
    
        /* don't just return the number of gpus, because other runtime cuda
           errors can also yield non-zero return values */
        if (gpuDeviceCount > 0)
            return 0; /* success */
        else
            return 1; /* failure */
    }
    

    Notice that the return code is zero in the case where a CUDA-enabled GPU is found. This is because on one of my machines that has CUDA but no GPU, this program generates a runtime error with a non-zero exit code. So any non-zero exit code is interpreted as "CUDA does not work on this machine".

    You might ask why I don't just use CUDA emulation mode on the non-GPU machines. It is because emulation mode is buggy. I only want to debug my code and work around bugs in the CUDA GPU code; I don't have time to debug the emulator.

    The second part of the problem is the CMake code that uses this test program. After some struggle, I figured it out. The following block is part of a larger CMakeLists.txt file:

    find_package(CUDA)
    if(CUDA_FOUND)
        try_run(RUN_RESULT_VAR COMPILE_RESULT_VAR
            ${CMAKE_BINARY_DIR} 
            ${CMAKE_CURRENT_SOURCE_DIR}/has_cuda_gpu.c
            CMAKE_FLAGS 
                -DINCLUDE_DIRECTORIES:STRING=${CUDA_TOOLKIT_INCLUDE}
                -DLINK_LIBRARIES:STRING=${CUDA_CUDART_LIBRARY}
            COMPILE_OUTPUT_VARIABLE COMPILE_OUTPUT_VAR
            RUN_OUTPUT_VARIABLE RUN_OUTPUT_VAR)
        message("${RUN_OUTPUT_VAR}") # Display number of GPUs found
        # COMPILE_RESULT_VAR is TRUE when compile succeeds
        # RUN_RESULT_VAR is zero when a GPU is found
        if(COMPILE_RESULT_VAR AND NOT RUN_RESULT_VAR)
            set(CUDA_HAVE_GPU TRUE CACHE BOOL "Whether CUDA-capable GPU is present")
        else()
            set(CUDA_HAVE_GPU FALSE CACHE BOOL "Whether CUDA-capable GPU is present")
        endif()
    endif(CUDA_FOUND)
    

    This sets the CUDA_HAVE_GPU boolean variable in the CMake cache, which can subsequently be used to trigger conditional operations.
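
    For example (a sketch: cuda_smoke_test and cuda_smoke_test.cu are hypothetical names, not files from this project), the variable can gate test registration so that GPU-less nightly machines still build the CUDA code but never run the tests:

    # Build the CUDA test program wherever the CUDA toolkit is present...
    cuda_add_executable(cuda_smoke_test cuda_smoke_test.cu)
    # ...but register it with CTest only when a real GPU was detected.
    if(CUDA_HAVE_GPU)
        add_test(NAME cuda_smoke_test COMMAND cuda_smoke_test)
    endif()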

    It took me a long time to figure out that the include and link arguments need to go in the CMAKE_FLAGS section, and what the syntax should be. The try_run documentation is very light, but there is more in the try_compile documentation, which is a closely related command. I still needed to scour the web for examples of try_run and try_compile before I could get this to work.

    Another tricky but important detail is the try_run "bindir" argument. You should probably always set it to ${CMAKE_BINARY_DIR}. In particular, do not set it to ${CMAKE_CURRENT_BINARY_DIR} if you are in a subdirectory of your project. CMake expects to find the subdirectory CMakeFiles/CMakeTmp within bindir, and spews errors if that directory does not exist. Just use ${CMAKE_BINARY_DIR}, which is one location where those subdirectories seem to reside naturally.

        2
  •  8
  •   fabmilo    15 years ago

    Write a simple program like

    #include <cuda_runtime.h>  /* declares cudaGetDeviceCount for host compilers */
    
    int main() {
        int deviceCount;
        cudaError_t e = cudaGetDeviceCount(&deviceCount);
        return e == cudaSuccess ? deviceCount : -1;
    }
    

    and check the return value.
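
    If the check needs to happen at CMake configure time, a minimal try_run sketch (assuming the program above is saved as check_gpu.c, with CUDA_TOOLKIT_INCLUDE and CUDA_CUDART_LIBRARY coming from find_package(CUDA); the accepted answer above shows a fuller, tested version) is:

    # Configure-time sketch: compile and run the probe above, assuming it
    # is saved as check_gpu.c in the current source directory.
    try_run(GPU_RUN_RESULT GPU_COMPILE_RESULT
        ${CMAKE_BINARY_DIR}
        ${CMAKE_CURRENT_SOURCE_DIR}/check_gpu.c
        CMAKE_FLAGS
            -DINCLUDE_DIRECTORIES:STRING=${CUDA_TOOLKIT_INCLUDE}
            -DLINK_LIBRARIES:STRING=${CUDA_CUDART_LIBRARY})
    # The exit status is the program's return value: the device count,
    # or -1 (usually reported as 255) when cudaGetDeviceCount fails.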

        3
  •  4
  •   Randall Radmer    15 years ago

    I just wrote a pure Python script that does some of the things you seem to need (I took much of this from the pystream project). It is basically just a wrapper for some functions in the CUDA runtime library (it uses ctypes). Look at the main() function to see example usage. Also, note that I just wrote it, so it is likely to contain bugs. Use with caution.

    #!/usr/bin/env python
    
    import sys
    import platform
    import ctypes
    
    """
    cudart.py: used to access pars of the CUDA runtime library.
    Most of this code was lifted from the pystream project (it's BSD licensed):
    http://code.google.com/p/pystream
    
    Note that this is likely to only work with CUDA 2.3
    To extend to other versions, you may need to edit the DeviceProp Class
    """
    
    cudaSuccess = 0
    errorDict = {
        1: 'MissingConfigurationError',
        2: 'MemoryAllocationError',
        3: 'InitializationError',
        4: 'LaunchFailureError',
        5: 'PriorLaunchFailureError',
        6: 'LaunchTimeoutError',
        7: 'LaunchOutOfResourcesError',
        8: 'InvalidDeviceFunctionError',
        9: 'InvalidConfigurationError',
        10: 'InvalidDeviceError',
        11: 'InvalidValueError',
        12: 'InvalidPitchValueError',
        13: 'InvalidSymbolError',
        14: 'MapBufferObjectFailedError',
        15: 'UnmapBufferObjectFailedError',
        16: 'InvalidHostPointerError',
        17: 'InvalidDevicePointerError',
        18: 'InvalidTextureError',
        19: 'InvalidTextureBindingError',
        20: 'InvalidChannelDescriptorError',
        21: 'InvalidMemcpyDirectionError',
        22: 'AddressOfConstantError',
        23: 'TextureFetchFailedError',
        24: 'TextureNotBoundError',
        25: 'SynchronizationError',
        26: 'InvalidFilterSettingError',
        27: 'InvalidNormSettingError',
        28: 'MixedDeviceExecutionError',
        29: 'CudartUnloadingError',
        30: 'UnknownError',
        31: 'NotYetImplementedError',
        32: 'MemoryValueTooLargeError',
        33: 'InvalidResourceHandleError',
        34: 'NotReadyError',
        0x7f: 'StartupFailureError',
        10000: 'ApiFailureBaseError'}
    
    
    try:
        if platform.system() in ("Microsoft", "Windows"):
            _libcudart = ctypes.windll.LoadLibrary('cudart.dll')
        elif platform.system()=="Darwin":
            _libcudart = ctypes.cdll.LoadLibrary('libcudart.dylib')
        else:
            _libcudart = ctypes.cdll.LoadLibrary('libcudart.so')
        _libcudart_error = None
    except OSError, e:
        _libcudart_error = e
        _libcudart = None
    
    def _checkCudaStatus(status):
        if status != cudaSuccess:
            eClassString = errorDict[status]
            # The original pystream code looks up an exception class of this
            # name at the top level of the module; those classes are not
            # defined in this standalone script, so raise a RuntimeError.
            raise RuntimeError(eClassString)
    
    def _checkDeviceNumber(device):
        assert isinstance(device, int), "device number must be an int"
        assert device >= 0, "device number must be non-negative"
        assert device < 2**8-1, "device number must be < 255"
    
    
    # cudaDeviceProp
    class DeviceProp(ctypes.Structure):
        _fields_ = [
             ("name", 256*ctypes.c_char), #  < ASCII string identifying device
             ("totalGlobalMem", ctypes.c_size_t), #  < Global memory available on device in bytes
             ("sharedMemPerBlock", ctypes.c_size_t), #  < Shared memory available per block in bytes
             ("regsPerBlock", ctypes.c_int), #  < 32-bit registers available per block
             ("warpSize", ctypes.c_int), #  < Warp size in threads
             ("memPitch", ctypes.c_size_t), #  < Maximum pitch in bytes allowed by memory copies
             ("maxThreadsPerBlock", ctypes.c_int), #  < Maximum number of threads per block
             ("maxThreadsDim", 3*ctypes.c_int), #  < Maximum size of each dimension of a block
             ("maxGridSize", 3*ctypes.c_int), #  < Maximum size of each dimension of a grid
             ("clockRate", ctypes.c_int), #  < Clock frequency in kilohertz
             ("totalConstMem", ctypes.c_size_t), #  < Constant memory available on device in bytes
             ("major", ctypes.c_int), #  < Major compute capability
             ("minor", ctypes.c_int), #  < Minor compute capability
             ("textureAlignment", ctypes.c_size_t), #  < Alignment requirement for textures
             ("deviceOverlap", ctypes.c_int), #  < Device can concurrently copy memory and execute a kernel
             ("multiProcessorCount", ctypes.c_int), #  < Number of multiprocessors on device
             ("kernelExecTimeoutEnabled", ctypes.c_int), #  < Specified whether there is a run time limit on kernels
             ("integrated", ctypes.c_int), #  < Device is integrated as opposed to discrete
             ("canMapHostMemory", ctypes.c_int), #  < Device can map host memory with cudaHostAlloc/cudaHostGetDevicePointer
             ("computeMode", ctypes.c_int), #  < Compute mode (See ::cudaComputeMode)
             ("__cudaReserved", 36*ctypes.c_int),
    ]
    
        def __str__(self):
            return """NVidia GPU Specifications:
        Name: %s
        Total global mem: %i
        Shared mem per block: %i
        Registers per block: %i
        Warp size: %i
        Mem pitch: %i
        Max threads per block: %i
        Max threads dim: (%i, %i, %i)
        Max grid size: (%i, %i, %i)
        Total const mem: %i
        Compute capability: %i.%i
        Clock Rate (GHz): %f
        Texture alignment: %i
    """ % (self.name, self.totalGlobalMem, self.sharedMemPerBlock,
           self.regsPerBlock, self.warpSize, self.memPitch,
           self.maxThreadsPerBlock,
           self.maxThreadsDim[0], self.maxThreadsDim[1], self.maxThreadsDim[2],
           self.maxGridSize[0], self.maxGridSize[1], self.maxGridSize[2],
           self.totalConstMem, self.major, self.minor,
           float(self.clockRate)/1.0e6, self.textureAlignment)
    
    def cudaGetDeviceCount():
        if _libcudart is None: return  0
        deviceCount = ctypes.c_int()
        status = _libcudart.cudaGetDeviceCount(ctypes.byref(deviceCount))
        _checkCudaStatus(status)
        return deviceCount.value
    
    def getDeviceProperties(device):
        if _libcudart is None: return  None
        _checkDeviceNumber(device)
        props = DeviceProp()
        status = _libcudart.cudaGetDeviceProperties(ctypes.byref(props), device)
        _checkCudaStatus(status)
        return props
    
    def getDriverVersion():
        if _libcudart is None: return  None
        version = ctypes.c_int()
        _libcudart.cudaDriverGetVersion(ctypes.byref(version))
        v = "%d.%d" % (version.value//1000,
                       version.value%100)
        return v
    
    def getRuntimeVersion():
        if _libcudart is None: return  None
        version = ctypes.c_int()
        _libcudart.cudaRuntimeGetVersion(ctypes.byref(version))
        v = "%d.%d" % (version.value//1000,
                       version.value%100)
        return v
    
    def getGpuCount():
        count=0
        for ii in range(cudaGetDeviceCount()):
            props = getDeviceProperties(ii)
            if props.major!=9999: count+=1
        return count
    
    def getLoadError():
        return _libcudart_error
    
    
    version = getDriverVersion()
    if version is not None and not version.startswith('2.3'):
        sys.stdout.write("WARNING: Driver version %s may not work with %s\n" %
                         (version, sys.argv[0]))
    
    version = getRuntimeVersion()
    if version is not None and not version.startswith('2.3'):
        sys.stdout.write("WARNING: Runtime version %s may not work with %s\n" %
                         (version, sys.argv[0]))
    
    
    def main():
    
        sys.stdout.write("Driver version: %s\n" % getDriverVersion())
        sys.stdout.write("Runtime version: %s\n" % getRuntimeVersion())
    
        nn = cudaGetDeviceCount()
        sys.stdout.write("Device count: %s\n" % nn)
    
        for ii in range(nn):
            props = getDeviceProperties(ii)
            sys.stdout.write("\nDevice %d:\n" % ii)
            #sys.stdout.write("%s" % props)
            for f_name, f_type in props._fields_:
                attr = props.__getattribute__(f_name)
                sys.stdout.write( "  %s: %s\n" % (f_name, attr))
    
        gpuCount = getGpuCount()
        if gpuCount > 0:
            sys.stdout.write("\n")
        sys.stdout.write("GPU count: %d\n" % getGpuCount())
        e = getLoadError()
        if e is not None:
            sys.stdout.write("There was an error loading a library:\n%s\n\n" % e)
    
    if __name__=="__main__":
        main()
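
    If you want to drive this script from CMake at configure time, one possible sketch (untested; it assumes the script is saved as cudart.py next to the CMakeLists.txt and that main() is changed to exit non-zero when no GPU is found) is:

    # Run the Python GPU probe at configure time and inspect its exit code.
    # (Assumes cudart.py exits non-zero when getGpuCount() returns 0.)
    find_package(PythonInterp)
    if(PYTHONINTERP_FOUND)
        execute_process(
            COMMAND ${PYTHON_EXECUTABLE} ${CMAKE_CURRENT_SOURCE_DIR}/cudart.py
            RESULT_VARIABLE _gpu_probe_result    # process exit status
            OUTPUT_VARIABLE _gpu_probe_output)   # captured stdout
        message("${_gpu_probe_output}")
        if(_gpu_probe_result EQUAL 0)
            set(CUDA_HAVE_GPU TRUE CACHE BOOL "Whether CUDA-capable GPU is present")
        endif()
    endif()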
    
        4
  •  3
  •   Anycorn    15 years ago

    You can compile a small GPU query program if CUDA is found. Here is a simple one you can adapt to your needs:

    #include <stdlib.h>
    #include <stdio.h>
    #include <cuda.h>
    #include <cuda_runtime.h>
    
    int main(int argc, char** argv) {
        int ct, dev;
        cudaError_t code;
        struct cudaDeviceProp prop;
    
        cudaGetDeviceCount(&ct);
        code = cudaGetLastError();
        if (code) printf("%s\n", cudaGetErrorString(code));
    
        if (ct == 0) {
            printf("Cuda device not found.\n");
            exit(0);
        }
        printf("Found %i Cuda device(s).\n", ct);
    
        for (dev = 0; dev < ct; ++dev) {
            printf("Cuda device %i\n", dev);
    
            cudaGetDeviceProperties(&prop, dev);
            printf("\tname : %s\n", prop.name);
            /* size_t fields are cast so the %lu format is portable */
            printf("\ttotalGlobalMem: %lu\n", (unsigned long)prop.totalGlobalMem);
            printf("\tsharedMemPerBlock: %lu\n", (unsigned long)prop.sharedMemPerBlock);
            printf("\tregsPerBlock: %i\n", prop.regsPerBlock);
            printf("\twarpSize: %i\n", prop.warpSize);
            printf("\tmemPitch: %lu\n", (unsigned long)prop.memPitch);
            printf("\tmaxThreadsPerBlock: %i\n", prop.maxThreadsPerBlock);
            printf("\tmaxThreadsDim: %i, %i, %i\n", prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
            printf("\tmaxGridSize: %i, %i, %i\n", prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
            printf("\tclockRate: %i\n", prop.clockRate);
            printf("\ttotalConstMem: %lu\n", (unsigned long)prop.totalConstMem);
            printf("\tmajor: %i\n", prop.major);
            printf("\tminor: %i\n", prop.minor);
            printf("\ttextureAlignment: %lu\n", (unsigned long)prop.textureAlignment);
            printf("\tdeviceOverlap: %i\n", prop.deviceOverlap);
            printf("\tmultiProcessorCount: %i\n", prop.multiProcessorCount);
        }
        return 0;
    }
    
        5
  •  1
  •   mabraham Frahm    8 years ago

    A useful approach is to run programs that the CUDA installation provides, such as nvidia-smi, and see what they return. (Note: check_num_gpu_info in the snippet below is a helper defined elsewhere; see the full GROMACS example linked at the end.)

            find_program(_nvidia_smi "nvidia-smi")
            if (_nvidia_smi)
                set(DETECT_GPU_COUNT_NVIDIA_SMI 0)
                # execute nvidia-smi -L to get a short list of GPUs available
                exec_program(${_nvidia_smi} ARGS -L
                    OUTPUT_VARIABLE _nvidia_smi_out
                    RETURN_VALUE    _nvidia_smi_ret)
                # process the stdout of nvidia-smi
                if (_nvidia_smi_ret EQUAL 0)
                    # convert string with newlines to list of strings
                    string(REGEX REPLACE "\n" ";" _nvidia_smi_out "${_nvidia_smi_out}")
                    foreach(_line ${_nvidia_smi_out})
                        if (_line MATCHES "^GPU [0-9]+:")
                            math(EXPR DETECT_GPU_COUNT_NVIDIA_SMI "${DETECT_GPU_COUNT_NVIDIA_SMI}+1")
                            # the UUID is not very useful for the user, remove it
                            string(REGEX REPLACE " \\(UUID:.*\\)" "" _gpu_info "${_line}")
                            if (NOT _gpu_info STREQUAL "")
                                list(APPEND DETECT_GPU_INFO "${_gpu_info}")
                            endif()
                        endif()
                    endforeach()
    
                    check_num_gpu_info(${DETECT_GPU_COUNT_NVIDIA_SMI} DETECT_GPU_INFO)
                    set(DETECT_GPU_COUNT ${DETECT_GPU_COUNT_NVIDIA_SMI})
                endif()
            endif()
    

    One can also query linux /proc or run lspci. See the fully working example at https://github.com/gromacs/gromacs/blob/master/cmake/gmxDetectGpu.cmake
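
    For the /proc route, a minimal Linux-only sketch (it assumes the proprietary NVIDIA driver, which exposes one subdirectory per GPU under /proc/driver/nvidia/gpus):

    # Count GPU entries exposed by the NVIDIA kernel driver (Linux only).
    if(EXISTS "/proc/driver/nvidia/gpus")
        file(GLOB _nvidia_gpu_entries "/proc/driver/nvidia/gpus/*")
        list(LENGTH _nvidia_gpu_entries DETECT_GPU_COUNT_PROC)
        message(STATUS "GPUs found via /proc: ${DETECT_GPU_COUNT_PROC}")
    endif()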