代码之家 › 专栏 › 技术社区 › Chris Thomas

优化置换搜索循环(不能使用itertools),速度非常慢。有什么建议吗?

pypy itertools permutation optimization python

Chris Thomas · 技术社区 · 7 年前

这是一个游戏,你有12张牌,你选择你,直到你从同一组中选择3张。我试图找出选择每组的概率。我创作的剧本很管用,但速度非常慢。我的同事在R中创建了一个类似的脚本,没有函数,他的脚本花费的时间是我的脚本的1/100。我只是想找出原因。如果您有任何想法,我们将不胜感激。

from collections import Counter
import pandas as pd
from datetime import datetime

weight = pd.read_excel('V01Weights.xlsx')

Symb    Weight
Grand   170000
Grand   170000
Grand   105
Major   170000
Major   170000
Major   215
Minor   150000
Minor   150000
Minor   12000
Bonus   105000
Bonus   105000
Bonus   105000

Max Picks表示不同“卡”的总数。Total Picks表示用户选择的最大数量。这是因为在8次选择后,你保证每种类型有2次,所以在第9次选择时,你保证有3次匹配。

TotalPicks = 9
MaxPicks = 12

这应该被命名为PickedProbabilities。

Picks = {0:0,1:0,2:0,3:0}

def Time_It(function):
    start =datetime.now()

    x = function()

    finish = datetime.now()

    TotalTime = finish - start

    Minutes = int(TotalTime.seconds/60)
    Seconds = TotalTime.seconds % 60


    print('It took ' + str(Minutes) + ' minutes and ' + str(Seconds) + ' seconds')

    return(x)

给定x(按顺序选择),我找到概率。这些镐不需要更换

def Get_Prob(x,weight):
    prob = 1
    weights = weight.iloc[:,1]

    for index in x:
        num = weights[index]
        denom = sum(weights)

        prob *= num/denom
        weights.drop(index, inplace = True)
        # print(weights)

    return(prob)

这用于确定我的循环中是否存在重复项,因为这是不允许的

def Is_Allowed(x):
    return(len(x) == len(set(x)))

这决定了目前为止出现的所有牌中是否存在赢牌。

def Is_Win(x):
    global Picks

    WinTypes = [[0,1,2],[3,4,5],[6,7,8],[9,10,11]]

    IsWin = False

    for index,item in enumerate(WinTypes):
        # print(index)
        if set(item).issubset(set(x)):
            IsWin = True
            Picks[index] += Get_Prob(x,weight)
            # print(Picks[index])
            print(sum(Picks.values()))
            break

    return(IsWin)

def Cycle():

    for a in range(MaxPicks):
        x = [a]

        for b in range(MaxPicks):
            x = [a,b]

            if Is_Allowed(x):
                for c in range(MaxPicks):
                    x = [a,b,c]
                    if Is_Allowed(x):                    
                        if Is_Win(x):
                            # print(x)
                            continue
                        for d in range(MaxPicks):
                            x = [a,b,c,d]
                            if Is_Allowed(x):
                                if Is_Win(x):
                                    # print(x)
                                    continue
                            for e in range(MaxPicks):
                                x = [a,b,c,d,e]
                                if Is_Allowed(x):
                                    if Is_Win(x):
                                        continue
                                for f in range(MaxPicks):
                                    x = [a,b,c,d,e,f]
                                    if Is_Allowed(x):
                                        if Is_Win(x):
                                            continue
                                    for g in range(MaxPicks):
                                        x = [a,b,c,d,e,f,g]
                                        if Is_Allowed(x):
                                            if Is_Win(x):
                                                continue
                                        for h in range(MaxPicks):
                                            x = [a,b,c,d,e,f,g,h]
                                            if Is_Allowed(x):
                                                if Is_Win(x):
                                                    continue
                                            for i in range(MaxPicks):
                                                if Is_Allowed(x):
                                                    if Is_Win(x):
                                                        continue

调用主函数

x = Time_It(Cycle)
print(x)

将概率写入文本文件

with open('result.txt','w') as file:
    # file.write(pickle.dumps(x))
    for item in x:
        file.write(str(item) + ',' + str(x[item]) + '\n')

3 回复 | 直到 7 年前

Raymond Hettinger 7 年前

我的同事在R中创建了一个类似的脚本,没有函数,他的脚本花费的时间是我的脚本的1/100。

两个简单的优化:

Is_Allowed() 因为Python有很多函数调用开销(例如创建新的stackframe和参数元组)。

pypy 它非常擅长优化像这样的函数。

ead 7 年前

为了加快程序的速度,需要两种见解(我想你已经有了,只是为了完整性)

序列的概率 (card_1, card_2) 和 (card_2, card_1) 不相等,所以我们不能使用urn问题的结果,看起来我们需要尝试所有置换。
然而,考虑到目前为止我们挑选的一组牌,我们并不真正需要它们在哪里挑选的顺序信息——对于游戏的未来进程来说都是一样的。因此,使用动态规划并计算在游戏期间遍历每个子集的概率就足够了(因此我们需要检查 2^N 而不是 N! 国家)。

set i 接下来是:

 norm:=sum Wi for i in set 
 P(i|set)=Wi/norm if i not in set else 0.0

P(set) -游戏中出现一组拾取的牌的概率为:

set_without_i:=set/{i}
P(set)=sum P(set_without_i)*P(i|set_without_i) for i in set

但是,这只能用于 set_without_i 游戏尚未结束,即没有一组选择了3张牌。

这可以通过递归+记忆来实现,或者像我的版本那样,通过使用自底向上的动态规划来实现。它还使用整数的二进制表示来表示集合和(最重要的部分!)几乎立即返回结果 [('Grand', 0.0014104762718021384), ('Major', 0.0028878988709489244), ('Minor', 0.15321793072867956), ('Bonus', 0.84248369412856905)] :

#calculates probability to end the game with 3 cards of a type


N=12

#set representation int->list
def decode_set(encoded):
    decoded=[False]*N
    for i in xrange(N):
        if encoded&(1<<i):
            decoded[i]=True
    return decoded

weights = [170000, 170000, 105, 170000, 170000, 215, 150000, 150000, 12000, 105000, 105000, 105000]     
def get_probs(decoded_set):
    denom=float(sum((w for w,is_taken in zip(weights, decoded_set) if not is_taken)))
    return [w/denom if not is_taken else 0.0 for w,is_taken in zip(weights, decoded_set)]

def end_group(encoded_set):
    for i in xrange(4):
       whole_group =  7<<(3*i) #7=..000111, 56=00111000 and so on
       if (encoded_set & whole_group)==whole_group:
           return i
    return None


#MAIN: dynamic program:

MAX=(1<<N)#max possible set is 1<<N-1
probs=[0.0]*MAX

#we always start with the empty set:
probs[0]=1.0    
#building bottom-up
for current_set in xrange(MAX):
    if end_group(current_set) is None:  #game not ended yet!
       decoded_set=decode_set(current_set)
       trans_probs=get_probs(decoded_set)
       for i, is_set in enumerate(decoded_set):
           if not is_set:
              new_set=current_set | (1<<i) 
              probs[new_set]+=probs[current_set]*trans_probs[i]

#filtering wins:
group_probs=[0.0]*4
for current_set in xrange(MAX):
   group_won=end_group(current_set)
   if group_won is not None:
      group_probs[group_won]+=probs[current_set]


print zip(["Grand", "Major", "Minor", "Bonus"], group_probs)

对代码中使用的“技巧”的一些解释:

[a,b,c] ,因此我们可以表示集合 {b,c} 像 110 ,这意味着 a 0 b(1) 在场景中, c(1) 在集合中。然而 110 读取为整数 6 .

[a,b] 带配重 [2,1] .

0 作为整数,概率向量(给定集、其二进制表示和映射到概率的整数):

  probs=[{}=00=0->1.0,  01={a}=1->0.0, {b}=10=2->0.0, {a,b}=11=3->0.0]

current_set=0 ,有两种可能性66%可以持卡 一 b ,因此处理后的概率变为:

 probs=[{}=00=0->1.0,  01={a}=1->0.66, {b}=10=2->0.33, {a,b}=11=3->0.0]

current_set=1={a} 唯一的可能性是 b {a,b} 因此我们需要更新其( 3={a,b} )通过我们的公式,我们得到:

 probs=[{}=00=0->1.0,  01={a}=1->0.66, {b}=10=2->0.33, {a,b}=11=3->0.66]

在下一步中,我们处理 2 ,并且给定集合{b},唯一的可能性是选择卡片 ,所以集合的概率需要再次更新

probs=[{}=00=0->1.0,  01={a}=1->0.66, {b}=10=2->0.33, {a,b}=11=3->1.0]

我们可以到达 {a,b} 在我们比赛的某个时候 1.0 .

{a,b} 在我们处理这一组之前(这将是下一步)。

ead 7 年前

编辑: 我误解了原始问题,这里提出的解决方案针对以下问题:

我保持原样,因为在这么多年的概率论之后,我很高兴能解决这个问题,我就是不能删除它:)

提高性能有两种可能:加快代码速度(在开始之前,应该对代码进行分析,以便知道应该优化程序的哪一部分,否则时间会花在优化不重要的事情上)或改进算法。我提议做第二件事。

好的,这个问题似乎比第一个站点更复杂。让我们从一些观察开始。

如果 Pi i 在比赛中的某个地方被选中,然后我们寻找分数的预期值 E(Score)=P1*W1+P2*W2+...Pn*Wn 然而,如果我们看一组的牌,我们可以说,由于对称性,这个组的牌的概率是相同的,例如。 P1=P2=P3=:Pgrand 就你而言。因此,可以计算出我们的期望值:

E(Score)=3*Pgrand*(W1+W2+W3)/3+...+3*Pbonus*(W10+W11+W12)/3

我们打电话 averageWgrand:=(W1+W2+W3)/3 请注意 E(#grand)=3*Pgrand

E(Score)=E(#grand)*averageWgrand+...+E(#bonus)*averageWbonus

E(#grand)=E(#minor)=E(#major)=E(#grand)=:(E#group) 为了简单起见,在下面我们只考虑这种特殊情况(但概述的解决方案也可以扩展到一般情况)。这导致以下简化:

E(Score)=4*E(#group)(averageWgrand+...+averageWbonus)/4

averageW:=(averageWgrand+...+averageWbonus)/4 E(#cards)=4*E(#grand)

因此 E(Score)=E(#cards)*averageW ,因此我们的任务简化为在游戏结束时计算牌数的预期值:

 E(#cards)=P(1)*1+P(2)*2+...P(n)*n

P(i) 一、 P(1) , P(2) 和 P(k) , k>9 很容易看到-它们是 0

一、 挑选的卡片- P(一) :

一、 当且仅当满足以下条件时,才能赢得牌和胜利:

只有一组被选中了3张牌。我们称之为这个群体 full_group .
最后一张(第i张)牌来自 full_组 .

P(win) P(一) P(win, full=grand) full_group=grand ):

 P(win)=P(win, grand)+P(win, minor)+P(win, major)+P(win, bonus)
       =4*P(win, grand)

P(win, grand)

拣货后 i-1 卡片选取的大卡片数为2,即#grand=2,并且
拣货后卡片,对于每个组,拾取的卡片数量少于3张
我们在最后一轮中选了一张大牌。假设前两个约束成立,则该(条件)概率为 1/(n-i+1) (有 n-i+1 卡片左边,只有一张是“右边”)。

从urn问题中,我们知道

P(#grand=u, #minor=x, #major=y, #bonus=z) = binom(3,u)*binom(3,x)*binom(3,y)*binom(3,z)/binom(12, u+x+y+z)

binom(n,k)=n!/k!/(n-k)! 因此

P(win, grand) = 1/(n-i+1)*sum P(#grand=2, #minor=x, #major=y, #bonus=z)
               where x<=2, y<=2, z<=2 and 2+x+y+z=i-1

现在是代码:

import math

def binom(n,k):
  return math.factorial(n)//math.factorial(k)//math.factorial(n-k)

#expected number of cards:

n=12 #there are 12 cards
probs=[0]*n
for minor in xrange(3):
  for major in xrange(3):
     for bonus in xrange(3):
         i = 3 + minor +major +bonus 
         P_urn = binom(3,2)*binom(3,minor)*binom(3,major)*binom(3,bonus)/float(binom(n, n-i+1))
         P_right_last_card = 1.0/(n-i+1)
         probs[i]+=4*P_urn*P_right_last_card #factor 4 from symmetry


print "Expected number of cards:", sum((prob*card_cnt for card_cnt, prob in enumerate(probs)))

结果我 6.94285714286

显然,如果你想处理更一般的情况(更多的组,不同组中的数字卡),你必须扩展代码(递归,记忆) binom

但最关键的部分是:使用这种方法,你(几乎)不在乎选卡的顺序,因此你必须检查的状态数减少了一倍 (k-1)! 哪里 k k=9 因此,该方法按因子计算速度更快 40000