代码之家 › 专栏 › 技术社区 › Old Pro

从输出重构编码器的输入

constraint-programming combinatorics data-structures algorithm

Old Pro · 技术社区 · 6 年前

我想知道如何解密码 ArrayRecovery 挑战,但我甚至不知道该请教哪门知识。是组合学、最优化、计算机科学、集合论还是其他什么?

编辑: 咨询的知识分支是 constraint programming ,尤其是 constraint propagation . 你还需要一些 combinatorics 知道如果你千一次从范围[1。。 n个 ],有一个限制,任何数字都不能大于它之前的数字,这就是 (n+k-1)!/k!(n-1)! 可能的组合这和 带替换的组合 属于 n个拿走的东西千一次,它有数学符号 . 你可以读到为什么会这样 here .

彼得诺维格提供了一个很好的例子来说明如何用他的 Sudoku solver .

您可以通过上面的链接阅读ArrayRecovery问题的完整描述。简而言之 编码器 这需要一个 整数序列 在1到某个给定极限的范围内(就我们的目的来说是100),并且 对于输入序列的每个元素 输出 最近看到的整数,小于当前输入,如果没有,则为0。 .

input 1, 2, 3, 4 => output 0, 1, 2, 3
input 2, 4, 3 => output 0, 2, 2

完整的任务是,给定输出(和允许输入的范围),计算出有多少可能的输入生成了它。但在我开始计算之前,我甚至对如何建立方程都没有信心。这就是我所要求的帮助。(当然,如果有人解释,我们也欢迎完整的解决方案。)

我只是看看一些可能的结果,然后想知道。下面是一些示例编码器输出和输入 * 意思是任何有效的输入 > 4 表示任何大于4的有效输入。如果需要,输入被称为 A1, A2, A3, ... (基于1的索引)

编辑#2

我在这个挑战中遇到的部分问题是,我没有为输出手动生成完全正确的可能输入集。我相信下面的设置现在是正确的。如果你想看看我之前的错误,看看这个答案的编辑历史。

output #1: 0, 0, 0, 4 
possible inputs: [>= 4, A1 >= * >= 4, 4, > 4]                     

output #2: 0, 0, 0, 2, 3, 4          # A5 â´  See more in discussion below
possible inputs: [>= 2, A1 >= * >=2, 2, 3, 4, > 4]

output #3: 0, 0, 0, 4, 3, 1
possible inputs: none # [4, 3, 1, 1 >= * > 4, 4, > 1] but there is no number 1 >= * > 4

与第一个输入序列相比,第二个输入序列仅通过增加两个输出就受到了非常严格的约束。第三个序列被限制得不可能。

但在示例2中,A5上的约束集有点难以表达。当然,这是对所有输入的基本限制。但是任何输出>A4和O5之后的内容都必须出现在A4之后的输入中,因此A5必须是A5之后的一组数字中的一个元素,也就是>A4。因为只有1个这样的数字(A6==4),A5必须是它,但是如果后面有一个更长的数字串,它会变得更复杂。 (编者按:其实没有。)

随着输出集越来越长,我担心这些约束会变得越来越复杂,也越来越难以纠正。我想不出任何数据结构能够有效地表示这些数据,从而有效地计算可能的组合数。我也不太明白如何在算法上添加约束集。

以下是到目前为止我看到的任何给定的 _n个

一个 _n个 >否 _n个
一个 _n个 <=min(O中的一组其他可能数字 _1个到 _n-1号 >否 _n个 ). 如何定义大于O的可能数集 _n个 ?
1. 大于O的数字 _n个那是在最近发生的 _n个在输入中
一个 _n个 >=max(O中的其他可能数字集 _1个到 _n-1号 &升;O _n个 ). 如何定义小于O的可能数集 _n个 ?
1. 实际上这个集合是空的,因为 _n个根据定义,是最大的 可能的 前一个输入序列的编号。(这并不是说它是前一个输入序列中的最大数。)
2. 任何小于O的数 _n个由于“最近”规则,在输入中最后一次出现之前出现的值将不合格。没有比O小的数字 _n个可能发生在最近一次发生之后,因为“最近”规则和传递属性:如果 _我 &升;O _n个以及 _j型 <A _我然后A _j型 &升;O _n个
然后是集合论:
1. 一个 _n个必须是O集的未计数元素集的元素 _n+1个到O _米 ,其中m是最小的m>n,因此O _米 &升;O _n个 . 之后的任何输出 _米大于0 _米 (其中 _n个是)必须显示为 _米 .
2. 如果在输出中看到一个元素,但它没有出现在输入中与输出的其余部分一致的位置,则该元素是不确定的。显然,我需要一个比这个更好的定义,以便编码和算法来计算它。

似乎某种集合论和/或组合数学或线性代数可以帮助我们计算出可能的序列的数量,这些序列可以解释所有未计数的输出,并符合其他约束条件。 (编者按:事实上,事情从来没有那么复杂。)

2 回复 | 直到 6 年前

Old Pro 6 年前

下面的代码通过了Codibility的所有测试。手术增加了一个 main 函数在命令行上使用它。

这些限制并不像OP认为的那么复杂。尤其是,从来没有一种情况需要添加一个限制,即输入是输出中其他地方看到的一组特定整数的元素。每个输入位置都有明确的最小值和最大值。

该规则的唯一复杂之处在于,有时最大值是“前一个输入的值”,并且输入本身有一个范围。但即便如此,所有类似的值都是连续的并且具有相同的范围,因此可能性的数量可以用基本的组合数学来计算,并且这些输入作为一个组独立于其他输入(它们仅用于设置范围),所以这个组的可能性可以通过简单的乘法与其他输入位置的可能性相结合。

算法概述

该算法通过输出数组进行一次遍历,每次遍历后更新输入数组的可能数目跨度 ,这就是我所说的输出中的数字重复。(你可以说输出的最大子序列,其中每个元素都是相同的)。 0,1,1,2 我们有三个跨度: 0 , 1,1 和 2 . 当新跨距开始时,将计算上一个跨距的可能性数。

这一决定是基于以下几点意见:

为了跨度长度大于1,输入的最小值允许在第一个位置输入的值是什么在第二个位置。计算 span是简单的组合数学,但标准公式需要知道数字的范围和跨度的长度。
每次输出更改(以及新的范围存在),这会强烈限制上一个范围的值:
1. 当产出上升时,唯一可能的原因是先前的投入是新的、更高的产出的价值,而对应于新的、更高的产出的位置的投入更高。
2. 当输出下降时,会建立新的约束,但这些约束有点难以表达。算法存储楼梯 (见下文)以量化输出下降时施加的约束

这里的目的是限制跨度 . 一旦我们准确地做到了这一点,计算组合的数量就很简单了。

因为编码器会回溯到输出一个与输入相关的数字的两种方式,一种是更小的,另一种是更近的,我们知道我们可以舍弃更大的,更远的数字。在输出中出现一个小数字后,该位置之前的较大数字不会对后面的内容产生任何影响。

因此,为了在输出序列减少时限制这些输入范围,我们需要存储楼梯 -原始数组中位置的可能值越来越大的列表。E、 g代表 0,2,5,7,2,4 楼梯是这样搭起来的: 0个 , 0,2 , 0,2,5 , 0,2,5,7 , 0,2个 , 0,2,4 .

使用这些界限,我们可以确定第二个位置上的数字 2个 (在示例的最后一个位置旁边)必须位于 (2,5] ,因为 5 是下一个楼梯。如果输入大于5,则在该空间中会输出5而不是2。注意,如果编码数组中的最后一个数字不是 4 ,但是 6 我们会提前退出 0个 ,因为我们知道之前的数字不能大于 5个 .

复杂性是 O(n*lg(min(n,m))) .

功能

CombinationsWithReplacement -计数 number of combinations with replacements 大小 k 从 n 数字。E、 g.用于 (3, 2) 它很重要 3,3 , 3,2 , 3,1 , 2,2 , 2,1 , 1,1个 ,所以返回 6个 它和 choose(n - 1 + k, n - 1) .
nextBigger -在一个范围内查找下一个较大的元素。E、 g.用于 4个 在子数组中 1,2,3,4,5 它又回来了 5个 ,并在子数组中 1,3 它返回其参数 Max .
countSpan (lambda)-计算一个跨度有多少不同的组合 我们刚刚通过 可以有。考虑跨度 2,2个 对于 0,2,5,7,2,2,7 .
1. 什么时候? curr 到达最后位置, 货币 是 7 和 prev 是决赛 2个 的 2,2个 跨度。
2. 它计算最大和最小可能的值。 上一个 跨度。在这一点上,楼梯包括 2,5,7 然后最大可能值为 5个 ( 下一个迁移器 之后 2个 在 stair 2,5,7 ). 大于 5个 在这个范围内会有输出 5个 ,不是 2个 .
3. 它计算跨度的最小值(跨度中每个元素的最小值),即 上一个 在这一点上,(记住 货币 此时等于 7个 和 上一个 到 2个 ). 我们确信这将取代决赛 2个 输出,原始输入必须 7个 ,所以最小值是 7个 . (这是“产量上升”规则的结果。如果我们有 7,7,2 和 货币 会是 2个 则上一跨度的最小值 7,7 )会是 8 那就是 prev + 1 .
4. 它调整组合的数量。一段长度一有一系列 n个可能性(1+max min),有可能性,与千要么一或 L-1型 取决于跨度后面的内容。
  - 一段时间后 更大的 数字,比如 2,2,7 , k=L-1 因为 2,2个 跨度必须是 7个 (跨距后第一个数字的值)。
  - 一段时间后 较小的 数字,比如 七、七、二 , k=L 因为最后一个元素 7,7个 没有特殊限制。
5. 最后,它叫 组合开关更换 为了找出分支的数量(或可能性),计算新的 res 部分结果值(我们正在执行的模运算中的余数值),并返回新的 物件 价值和 max 以便进一步处理。
solution -迭代给定的编码器输出数组。在主循环中,在跨度中计算跨度长度,在跨度边界处更新 物件 通过打电话 countSpan公司 可能会更新楼梯 .
- 如果当前跨距包含的数字大于上一个跨距,则:
  1. 检查下一个号码的有效性。例如 0,2,5,2,7 是无效的输入,因为 7个 在倒数第二个位置,仅 3 ,或 4个 ,或 5个 .
  2. 它会更新楼梯。当我们只看到 0,2个 ,楼梯是 0,2个 ,但在下一个之后 5个 ,楼梯变成 0,2,5个 .
- 如果当前跨距包含的数字比前一个小,则:
  1. 它会更新楼梯。当我们只看到 0,2,5个 ,我们的楼梯 0,2,5个 ,但在我们看到 0,2,5,2 楼梯变成 0,2个 .
- 在主循环之后,它通过调用 countSpan公司 具有 -1 它触发了计算的“输出下降”分支。
normalizeMod , extendedEuclidInternal , extendedEuclid , invMod 这些辅助功能有助于处理模运算。

_{对于楼梯,我使用编码数组的存储,因为楼梯的数量永远不会超过当前位置。}

#include <algorithm>
#include <cassert>
#include <vector>
#include <tuple>

const int Modulus = 1'000'000'007;

int CombinationsWithReplacement(int n, int k);

template <class It>
auto nextBigger(It begin, It end, int value, int Max) {
    auto maxIt = std::upper_bound(begin, end, value);
    auto max = Max;
    if (maxIt != end) {
        max = *maxIt;
    }
    return max;
}

auto solution(std::vector<int> &B, const int Max) {
    auto res = 1;
    const auto size = (int)B.size();
    auto spanLength = 1;
    auto prev = 0;
    // Stairs is the list of numbers which could be smaller than number in the next position
    const auto stairsBegin = B.begin();
    // This includes first entry (zero) into stairs
    // We need to include 0 because we can meet another zero later in encoded array
    // and we need to be able to find in stairs
    auto stairsEnd = stairsBegin + 1;

    auto countSpan = [&](int curr) {
        const auto max = nextBigger(stairsBegin, stairsEnd, prev, Max);
        // At the moment when we switch from the current span to the next span
        // prev is the number from previous span and curr from current.
        // E.g. 1,1,7, when we move to the third position cur = 7 and prev = 1.
        // Observe that, in this case minimum value possible in place of any of 1's can be at least 2=1+1=prev+1.
        // But if we consider 7, then we have even more stringent condition for numbers in place of 1, it is 7
        const auto min = std::max(prev + 1, curr);
        const bool countLast = prev > curr;
        const auto branchesCount = CombinationsWithReplacement(max - min + 1, spanLength - (countLast ? 0 : 1));
        return std::make_pair(res * (long long)branchesCount % Modulus, max);
    };

    for (int i = 1; i < size; ++i) {
        const auto curr = B[i];
        if (curr == prev) {
            ++spanLength;
        }
        else {
            int max;
            std::tie(res, max) = countSpan(curr);
            if (prev < curr) {
                if (curr > max) {
                    // 0,1,5,1,7 - invalid because number in the fourth position lies in [2,5]
                    // and so in the fifth encoded position we can't something bigger than 5
                    return 0;
                }
                // It is time to possibly shrink stairs.
                // E.g if we had stairs 0,2,4,9,17 and current value is 5,
                // then we no more interested in 9 and 17, and we change stairs to 0,2,4,5.
                // That's because any number bigger than 9 or 17 also bigger than 5.
                const auto s = std::lower_bound(stairsBegin, stairsEnd, curr);
                stairsEnd = s;
                *stairsEnd++ = curr;
            }
            else {
                assert(curr < prev);
                auto it = std::lower_bound(stairsBegin, stairsEnd, curr);
                if (it == stairsEnd || *it != curr) {
                    // 0,5,1 is invalid sequence because original sequence lloks like this 5,>5,>1
                    // and there is no 1 in any of the two first positions, so
                    // it can't appear in the third position of the encoded array
                    return 0;
                }
            }
            spanLength = 1;
        }
        prev = curr;
    }
    res = countSpan(-1).first;
    return res;
}

template <class T> T normalizeMod(T a, T m) {
    if (a < 0) return a + m;
    return a;
}

template <class T> std::pair<T, std::pair<T, T>> extendedEuclidInternal(T a, T b) {
    T old_x = 1;
    T old_y = 0;
    T x = 0;
    T y = 1;
    while (true) {
        T q = a / b;
        T t = a - b * q;
        if (t == 0) {
            break;
        }
        a = b;
        b = t;
        t = x; x = old_x - x * q; old_x = t;
        t = y; y = old_y - y * q; old_y = t;
    }
    return std::make_pair(b, std::make_pair(x, y));
}

// Returns gcd and Bezout's coefficients
template <class T> std::pair<T, std::pair<T, T>> extendedEuclid(T a, T b) {
    if (a > b) {
        if (b == 0) return std::make_pair(a, std::make_pair(1, 0));
        return extendedEuclidInternal(a, b);
    }
    else {
        if (a == 0) return std::make_pair(b, std::make_pair(0, 1));
        auto p = extendedEuclidInternal(b, a);
        std::swap(p.second.first, p.second.second);
        return p;
    }
}

template <class T> T invMod(T a, T m) {
    auto p = extendedEuclid(a, m);
    assert(p.first == 1);
    return normalizeMod(p.second.first, m);
}


int CombinationsWithReplacement(int n, int k) {
    int res = 1;
    for (long long i = n; i < n + k; ++i) {
        res = res * i % Modulus;
    }
    int denom = 1;
    for (long long i = k; i > 0; --i) {
        denom = denom * i % Modulus;
    }
    res = res * (long long)invMod(denom, Modulus) % Modulus;
    return res;
}


//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
//
// Only the above is needed for the Codility challenge. Below is to run on the command line.
//
// Compile with: gcc -std=gnu++14 -lc++ -lstdc++ array_recovery.cpp
//
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

#include <string.h>

// Usage: 0 1 2,3, 4 M
// Last arg is M, the max value for an input.
// Remaining args are B (the output of the encoder) separated by commas and/or spaces
// Parentheses and brackets are ignored, so you can use the same input form as Codility's tests: ([1,2,3], M)
int main(int argc, char* argv[]) {
  int Max;
  std::vector<int> B;
  const char* delim = " ,[]()";

  if (argc < 2 ) {
    printf("Usage: %s M 0 1 2,3, 4... \n", argv[0]);
    return 1;
  }

  for (int i = 1; i < argc; i++) {
    char* parse;
    parse = strtok(argv[i], delim);
    while (parse != NULL)
    {
       B.push_back(atoi(parse));
       parse = strtok (NULL, delim);
    }
  }
  Max = B.back();
  B.pop_back();

  printf("%d\n", solution(B, Max));
  return 0;
}

//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
//
// Only the above is needed for the Codility challenge. Below is to run on the command line.
//
// Compile with: gcc -std=gnu++14 -lc++ -lstdc++ array_recovery.cpp
//
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

#include <string.h>

// Usage: M 0 1 2,3, 4
// first arg is M, the max value for an input.
// remaining args are B (the output of the encoder) separated by commas and/or spaces
int main(int argc, char* argv[]) {
  int Max;
  std::vector<int> B;
  const char* delim = " ,";

  if (argc < 3 ) {
    printf("Usage: %s M 0 1 2,3, 4... \n", argv[0]);
    return 1;
  }

  Max = atoi(argv[1]);
  for (int i = 2; i < argc; i++) {
    char* parse;
    parse = strtok(argv[i], delim);
    while (parse != NULL)
    {
       B.push_back(atoi(parse));
       parse = strtok (NULL, delim);
    }
  }


  printf("%d\n", solution(B, Max));
  return 0;
}

让我们看一个例子:

最大值=5
数组是
0 1 3 0 1 1 3
1
1 2..5
1 3 4..5
1 3 4..5 1
1 3 4..5 1 2..5
1 3 4..5 1 2..5 >=..2 (抱歉,写得太麻烦了)
1 3 4..5 1 3..5 >=..3 4..5
现在计数:
1 1 2 1 3 2 相当于 12 总共。

×××¢× ××¨×§× 6 年前

有个主意。构造输出的一个已知方法是使用堆栈。当元素较大或相等时,我们弹出它,然后输出较小元素,如果它存在,然后将较大元素推到堆栈上。现在,如果我们试图从输出反向执行该怎么办?

首先,我们将使用c–dility示例演示堆栈方法。

[2, 5, 3, 7, 9, 6]
2: output 0, stack [2]
5: output 2, stack [2,5]
3: pop 5, output, 2, stack [2,3]
7: output 3, stack [2,3,7]
... etc.

Final output: [0, 2, 2, 3, 7, 3]

现在让我们尝试重建!我们会用 stack 作为虚拟堆栈和重新组合的输入:

(Input: [2, 5, 3, 7, 9, 6])
Output: [0, 2, 2, 3, 7, 3]

* Something >3 that reached 3 in the stack
stack = [3, 3 < *]

* Something >7 that reached 7 in the stack
but both of those would've popped before 3
stack = [3, 7, 7 < x, 3 < * <= x]

* Something >3, 7 qualifies
stack = [3, 7, 7 < x, 3 < * <= x]

* Something >2, 3 qualifies
stack = [2, 3, 7, 7 < x, 3 < * <= x]

* Something >2 and >=3 since 3 reached 2
stack = [2, 2 < *, 3, 7, 7 < x, 3 < * <= x]

我们来举几个例子:

例1:

[0, 0, 0, 2, 3, 4]

* Something >4
stack = [4, 4 < *]

* Something >3, 4 qualifies
stack = [3, 4, 4 < *]

* Something >2, 3 qualifies
stack = [2, 3, 4, 4 < *]

* The rest is non-increasing with lowerbound 2
stack = [y >= x, x >= 2, 2, 3, 4, >4]

例2:

[0, 0, 0, 4]

* Something >4
stack [4, 4 < *]

* Non-increasing
stack = [z >= y, y >= 4, 4, 4 < *]

计算组合的数量是通过将所有部分的可能性相乘来实现的。截面可以是有界的单个单元格;也可以是一个或多个单元格的有界的、不递增的子数组。要计算后者,我们使用多选二项式, (n + k - 1) choose (k - 1) . 考虑到我们可以表达 3 单元格为:

(ub - cell_3) + (cell_3 - cell_2) + (cell_2 - cell_1) + (cell_1 - lb) = ub - lb

那么分配的方式有多少 ub - lb 进入之内 (x + 1) 细胞是

(n + k - 1) choose (k - 1)
or
(ub - lb + x) choose x

For example, the number of non-increasing sequences between
(3,4) in two cells is (4 - 3 + 2) choose 2 = 3: [3,3] [4,3] [4,4]

And the number of non-increasing sequences between
(3,4) in three cells is (4 - 3 + 3) choose 3 = 4: [3,3,3] [4,3,3] [4,4,3] [4,4,4]

(解释归因于 Brian M. Scott .)

粗略的JavaScript草图(代码不可靠;它只是用来说明编码。编码器将[下限、上限]或非递增序列列为[非递增、长度、下限、上限]:

function f(A, M){
  console.log(JSON.stringify(A), M);
  let i = A.length - 1;
  let last = A[i];
  let s = [[last,last]];
  if (A[i-1] == last){
    let d = 1;
    s.splice(1,0,['non_inc',d++,last,M]);
    while (i > 0 && A[i-1] == last){
      s.splice(1,0,['non_inc',d++,last,M]);
      i--
    }
  } else {
    s.push([last+1,M]);
    i--;
  }
  if (i == 0)
    s.splice(0,1);

  for (; i>0; i--){
    let x = A[i];

    if (x < s[0][0])
      s = [[x,x]].concat(s);

    if (x > s[0][0]){
      let [l, _l] = s[0];
      let [lb, ub] = s[1];
      s[0] = [x+1, M];
      s[1] = [lb, x];
      s = [[l,_l], [x,x]].concat(s);
    }

    if (x == s[0][0]){
      let [l,_l] = s[0];
      let [lb, ub] = s[1];
      let d = 1;
      s.splice(0,1);
      while (i > 0 && A[i-1] == x){
        s =
    [['non_inc', d++, lb, M]].concat(s);
        i--;
      }
      if (i > 0)
        s = [[l,_l]].concat(s);
    }
  }

  // dirty fix
  if (s[0][0] == 0)
    s.splice(0,1);

  return s; 
}

var a = [2, 5, 3, 7, 9, 6]
var b = [0, 2, 2, 3, 7, 3]
console.log(JSON.stringify(a));
console.log(JSON.stringify(f(b,10)));
b = [0,0,0,4]
console.log(JSON.stringify(f(b,10)));
b = [0,2,0,0,0,4]
console.log(JSON.stringify(f(b,10)));
b = [0,0,0,2,3,4]
console.log(JSON.stringify(f(b,10)));
b = [0,2,2]
console.log(JSON.stringify(f(b,4)));
b = [0,3,5,6]
console.log(JSON.stringify(f(b,10)));
b = [0,0,3,0]
console.log(JSON.stringify(f(b,10)));