
Fast algorithm for generating all combinations (n choose k) based on an initial input [duplicate]

combinatorics combinations matlab c++

proczell · 6 years ago

I am working on an applied set cover problem. As part of this study I want to generate all possible combinations. For example, n=5, k=3 yields

0 0 1
0 0 2 
0 0 3
etc..

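This is not an issue for smaller problems, but as n and k grow, for example n=250 and k=6, the number of combinations is 3.1920e+11. The combinations cannot all be stored in one matrix, so I need an algorithm that computes x combinations and then computes the next x combinations given the end point of the first matrix. Does anyone know an algorithm that does this quickly in C/C++/CUDA or Matlab?

Thanks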

2 answers

NuPagadi 6 years ago

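I think the biggest problem you will run into is not the computation but disk write speed or memory size. By the way, it looks like you miscalculated the number of combinations for n = 250 and k = 6. Did you use uint64_t? My number is 244 140 625 000 000 (that is, 250^6).

So for that number you would need ~1.4 petabytes (~1400 TB) of storage. That is your main problem. If you do have a disk that large, you are better off using memory mapping when writing. You could also consider writing with multiple threads: each thread writes to its own block of memory.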
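
The two counts in this thread measure different things, and a few lines of integer arithmetic make that visible. The sketch below is not part of the original answer: `IntPow`, `Binomial` and `StorageBytes` are hypothetical helper names, and the 6 bytes per record is an assumption (one uint8_t per element, which is consistent with the ~1.4 petabyte estimate).

```cpp
#include <cassert>
#include <cstdint>

// base^exp: number of ordered tuples of length exp when elements may repeat.
std::uint64_t IntPow(std::uint64_t base, unsigned exp)
{
    std::uint64_t result = 1;
    while (exp-- > 0)
        result *= base;  // no overflow check: fine for 250^6, not in general
    return result;
}

// n choose k: number of k-element subsets (no repeats, order ignored).
std::uint64_t Binomial(std::uint64_t n, std::uint64_t k)
{
    assert(k <= n);
    std::uint64_t result = 1;
    for (std::uint64_t i = 1; i <= k; ++i)
        result = result * (n - k + i) / i;  // exact: equals C(n-k+i, i) after each step
    return result;
}

// Bytes needed to store `count` records of `bytesPerRecord` bytes each.
std::uint64_t StorageBytes(std::uint64_t count, std::uint64_t bytesPerRecord)
{
    return count * bytesPerRecord;
}
```

With these helpers, IntPow(250, 6) is 244 140 625 000 000, the figure quoted above, while Binomial(250, 6) is 319 195 444 750, which matches the 3.1920e+11 in the question. At 6 bytes per record the former comes to about 1.46e15 bytes and the latter to about 1.9e12 bytes.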

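So I think you should look for another way of supplying the combinations that serves your actual goal.

A naive solution. Replace std::ofstream with a memory-mapped object.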

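#include <array>
#include <cstdint>
#include <fstream>
#include <iostream>

// Integer power usable in constant expressions (std::pow is not constexpr
// in standard C++ and works in floating point).
constexpr uint64_t IntPow(uint64_t base, unsigned exp)
{
    uint64_t result = 1;
    while (exp-- > 0)
        result *= base;
    return result;
}

int main()
{
    constexpr uint8_t N = 250;
    constexpr uint8_t K = 6;
    constexpr uint64_t CombinationsCount = IntPow(N, K);  // N^K tuples
    using TCombination = std::array<uint8_t, K>;

    std::cout << CombinationsCount << '\n';

    std::ofstream file("output.txt");
    TCombination c;
    for (uint64_t i = 0; i < CombinationsCount; ++i)
    {
        // Decode i as a K-digit number in base N.
        auto I = i;
        for (auto j = 0; j < K; ++j)
        {
            c[j] = static_cast<uint8_t>(I % N);
            I /= N;
            file << (int)c[j] << ' ';  // separator keeps the digits unambiguous
        }
        file << '\n';  // unlike std::endl, does not flush on every line
    }
}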
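
The naive version above writes through std::ofstream. A memory-mapped variant might look like the sketch below. It is not from the original answer and rests on several assumptions: a POSIX system (`open`, `ftruncate`, `mmap`, `munmap`), a raw binary layout of one byte per element instead of the text output above, and a hypothetical function name, `WriteTuplesMapped`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Writes `count` tuples, starting at tuple index `start`, as raw bytes
// (k bytes per tuple, one base-n digit per byte) into a memory-mapped file.
// Returns true on success. POSIX only.
bool WriteTuplesMapped(const char* path, std::uint64_t start,
                       std::uint64_t count, unsigned n, unsigned k)
{
    assert(n >= 1 && n <= 256 && k >= 1 && count >= 1);
    const std::size_t bytes = static_cast<std::size_t>(count) * k;

    const int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return false;
    // The file must have its final size before it is mapped.
    if (ftruncate(fd, static_cast<off_t>(bytes)) != 0)
    {
        close(fd);
        return false;
    }
    void* mem = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (mem == MAP_FAILED)
        return false;

    auto* out = static_cast<std::uint8_t*>(mem);
    for (std::uint64_t i = 0; i < count; ++i)
    {
        // Decode (start + i) as a k-digit number in base n.
        std::uint64_t index = start + i;
        for (unsigned j = 0; j < k; ++j)
        {
            out[i * k + j] = static_cast<std::uint8_t>(index % n);
            index /= n;
        }
    }
    return munmap(mem, bytes) == 0;
}
```

A call such as WriteTuplesMapped("chunk0.bin", 0, 1000000, 250, 6) would write the first million tuples. Mapping the output in chunks like this keeps each mapping small, but the total storage requirement stays the same.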

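If you want to use threads, just divide CombinationsCount by the number of cores and give each thread the task of writing from its own memory address (offset).

You asked for a function-like solution. You can pass different file names and use different threads. But you still need to use memory mapping.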

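#include <array>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>
#include <thread>
#include <vector>

// Integer power usable in constant expressions (std::pow is not constexpr
// in standard C++ and works in floating point).
constexpr uint64_t IntPow(uint64_t base, unsigned exp)
{
    uint64_t result = 1;
    while (exp-- > 0)
        result *= base;
    return result;
}

constexpr uint8_t N = 250;
constexpr uint8_t K = 6;
constexpr uint64_t CombinationsCount = IntPow(N, K);
using TCombination = std::array<uint8_t, K>;

// Writes the tuples with indices [start, start + size) to fileName.
// fileName is taken by value so that each thread owns its own copy;
// passing c_str() of a temporary would leave the thread a dangling pointer.
void Generate(uint64_t start, uint64_t size, std::string fileName)
{
    std::ofstream file(fileName);
    TCombination c;
    for (uint64_t i = start; i < start + size; ++i)
    {
        auto I = i;
        for (auto j = 0; j < K; ++j)
        {
            c[j] = static_cast<uint8_t>(I % N);
            I /= N;
            file << (int)c[j] << ' ';
        }
        file << '\n';
    }
}

int main()
{
    std::cout << CombinationsCount << '\n';

    // hardware_concurrency() may return 0 when the value is unknown.
    unsigned int threadsNum = std::thread::hardware_concurrency();
    if (threadsNum == 0)
        threadsNum = 1;

    const uint64_t chunk = CombinationsCount / threadsNum;
    std::vector<std::thread> workers;
    for (size_t i = 0; i < threadsNum; ++i)
    {
        // The last thread also takes the remainder of the division.
        const uint64_t size =
            (i + 1 == threadsNum) ? CombinationsCount - i * chunk : chunk;
        workers.emplace_back(Generate, i * chunk, size,
                             "output" + std::to_string(i));
    }

    for (auto& worker : workers)
        worker.join();
}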
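
The code above enumerates all N^K ordered tuples, repeats included. If only the n-choose-k subsets are wanted, which is what the question's count of 3.1920e+11 corresponds to, one resumable approach is a lexicographic successor step: keep the last combination produced and continue from it on the next call. This is a minimal sketch and not part of the original answer. `NextCombination` and `GenerateBatch` are hypothetical names, and combinations are assumed to be strictly increasing 0-based indices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Advances `c` (strictly increasing values in [0, n)) to its successor in
// lexicographic order. Returns false, leaving `c` unchanged, if `c` was
// already the last combination.
bool NextCombination(std::vector<int>& c, int n)
{
    const int k = static_cast<int>(c.size());
    assert(k <= n);
    // Find the rightmost position that can still be incremented.
    int i = k - 1;
    while (i >= 0 && c[i] == n - k + i)
        --i;
    if (i < 0)
        return false;
    ++c[i];
    // Reset everything to the right to the smallest valid values.
    for (int j = i + 1; j < k; ++j)
        c[j] = c[j - 1] + 1;
    return true;
}

// Appends up to `batchSize` combinations to `out`, starting at `state`
// (inclusive). While it returns true, `state` holds the first combination
// of the next batch; it returns false once the enumeration is exhausted.
bool GenerateBatch(std::vector<int>& state, int n, std::size_t batchSize,
                   std::vector<std::vector<int>>& out)
{
    for (std::size_t produced = 0; produced < batchSize; ++produced)
    {
        out.push_back(state);
        if (!NextCombination(state, n))
            return false;
    }
    return true;
}
```

Starting from state {0, 1, 2} with n = 5, repeated calls to GenerateBatch yield the 10 subsets in order, ending with {2, 3, 4}. The state vector is all that has to be kept between batches, so no more than one batch needs to be in memory at a time.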

einpoklum 6 years ago

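"I am working on an applied set cover problem. As part of this study I want to generate all possible combinations. ... Does anyone know an algorithm that does this quickly in C/C++/CUDA or Matlab?"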

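There is no "quick" way of generating all possible combinations. It is, by definition, very slow as n and k increase: n!/((n-k)! k!) is at least (n/k)^k, so for fixed k it grows like n^k. Using a GPU to speed up combination generation by a constant factor will therefore only let you increase n and/or k by a tiny bit.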
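
To put a number on how far a constant-factor speedup goes, one can ask for the largest n whose combination count still fits a given budget. The sketch below is only an illustration and not part of the original answer. `Binomial` and `LargestN` are hypothetical helper names, the search is a plain linear scan, and overflow is not checked, so it is only suitable for budgets of roughly this size.

```cpp
#include <cassert>
#include <cstdint>

// n choose k, computed so that every intermediate value is an exact integer.
std::uint64_t Binomial(std::uint64_t n, std::uint64_t k)
{
    assert(k <= n);
    std::uint64_t result = 1;
    for (std::uint64_t i = 1; i <= k; ++i)
        result = result * (n - k + i) / i;  // equals C(n-k+i, i) after each step
    return result;
}

// Largest n with Binomial(n, k) <= budget, found by linear search from n = k.
std::uint64_t LargestN(std::uint64_t budget, std::uint64_t k)
{
    assert(k >= 1 && budget >= 1);
    std::uint64_t n = k;  // Binomial(k, k) == 1 <= budget
    while (Binomial(n + 1, k) <= budget)
        ++n;
    return n;
}
```

LargestN(Binomial(250, 6), 6) returns 250 by construction. Doubling the budget leaves the result below 300, since Binomial(300, 6) is already about 9.6e11.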

Sorry if this sounds preachy, but you probably need to do something other than try to generate all combinations.