How do I construct a list comprehension with nested for loops and conditionals for pandas?

  •  1
  • ShanZhengYang  · Tech Community  · 6 years ago

    I am having trouble getting the following complex comprehension to work as expected. It is a double nested for loop with conditionals.

    import pandas as pd
    
    dict1 = {'stringA':['ABCDBAABDCBD','BBXB'], 'stringB':['ABDCXXXBDDDD', 'AAAB'], 'num':[42, 13]}
    
    df = pd.DataFrame(dict1)
    print(df)
            stringA       stringB  num
    0  ABCDBAABDCBD  ABDCXXXBDDDD   42
    1          BBXB          AAAB   13
    

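    This DataFrame has two columns, stringA and stringB, which contain strings made up of the characters A, B, C, D and X. By definition, the two strings in a row have the same length.

    Based on these two columns, I create one dictionary per row: the keys are the character positions in stringA, starting at 0, and the values are the corresponding positions in stringB, starting at the index given by num.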

    def create_translation(x):
        x['translated_dictionary'] = {i: i +x['num'] for i, e in enumerate(x['stringA'])}
        return x
    
    df2 = df.apply(create_translation, axis=1).groupby('stringA')['translated_dictionary']
    
    
    df2.head()
    0    {0: 42, 1: 43, 2: 44, 3: 45, 4: 46, 5: 47, 6: ...
    1                         {0: 13, 1: 14, 2: 15, 3: 16}
    Name: translated_dictionary, dtype: object
    
    print(df2.head()[0])
    {0: 42, 1: 43, 2: 44, 3: 45, 4: 46, 5: 47, 6: 48, 7: 49, 8: 50, 9: 51, 10: 52, 11: 53}
    
    print(df2.head()[1])
    {0: 13, 1: 14, 2: 15, 3: 16}
    

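    This is correct.

    However, these strings contain 'X' characters, and they need a special rule: if the character in stringA is X, no key-value pair should be created in the dictionary. If the character in stringB is X, the value should not be i + x['num'] but -500.

    I tried the following comprehension: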

    def try1(x):
        for count, element in enumerate(x['stringB']):
            x['translated_dictionary'] = {i: -500 if element == 'X' else  i + x['num'] for i, e in enumerate(x['stringA']) if e != 'X'}
        return x
    

    This gives the wrong answer.

    df3 = df.apply(try1, axis=1).groupby('stringA')['translated_dictionary']
    
    print(df3.head()[0]) ## this is wrong!
    {0: 42, 1: 43, 2: 44, 3: 45, 4: 46, 5: 47, 6: 48, 7: 49, 8: 50, 9: 51, 10: 52, 11: 53}
    
    print(df3.head()[1])   ## this is correct! There is no key for 2:15!
    {0: 13, 1: 14, 3: 16}
    

    The correct answer would be:

    print(df3.head()[0])
    {0: 42, 1: 43, 2: 44, 3: 45, 4:-500, 5:-500, 6:-500, 7: 49, 8: 50, 9: 51, 10: 52, 11: 53}
    
    print(df3.head()[1])
    {0: 13, 1: 14, 3: 16}
    
    2 answers  |  as of 6 years ago
        1
  •  1
  •   John Zwinck    6 years ago

    Here is a simple way to do it without any comprehensions (because they do not help make the code clearer):

    def create_translation(x):
        out = {}
        num = x['num']
        for i, (a, b) in enumerate(zip(x['stringA'], x['stringB'])):
            if a == 'X':
                pass
            elif b == 'X':
                out[i] = -500
            else:
                out[i] = num
            num += 1
        x['translated_dictionary'] = out
        return x
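
If a comprehension is still preferred, the attempt in the question can probably be repaired by pairing the characters with zip instead of nesting a second loop. In try1 the outer for loop rebuilds the whole dictionary on every pass, so only the last character of stringB ends up deciding the values. The following is a minimal sketch rather than part of the original answer: build_translation is a hypothetical helper name, and the data are copied from the question.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'stringA': ['ABCDBAABDCBD', 'BBXB'],
    'stringB': ['ABDCXXXBDDDD', 'AAAB'],
    'num': [42, 13],
})

def build_translation(string_a, string_b, num):
    """Hypothetical helper: one dict for one pair of equal-length strings."""
    return {
        # -500 when stringB has an X at this position, otherwise position + num
        i: -500 if b == 'X' else i + num
        for i, (a, b) in enumerate(zip(string_a, string_b))
        if a != 'X'  # positions where stringA has an X get no key at all
    }

# Build one dict per row by iterating over the three columns in lockstep.
translated = [
    build_translation(a, b, int(n))
    for a, b, n in zip(df['stringA'], df['stringB'], df['num'])
]
df['translated_dictionary'] = translated
```

Iterating over the zipped columns avoids having to mutate the row inside df.apply. Whether this reads better than the explicit loop above is a matter of taste.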
    
        2
  •  0
  •   BENY    6 years ago

    Re-create the dict in a post-processing step:

    n = df.stringA.str.len()
    newdf = pd.DataFrame({
        'num': df.num.repeat(n),
        'stringA': sum(list(map(list, df.stringA)), []),
        'stringB': sum(list(map(list, df.stringB)), []),
    })

    newdf = newdf.loc[newdf.stringA != 'X'].copy()  # remove rows where stringA is X
    newdf['value'] = newdf.groupby('num').cumcount() + newdf.num  # running count within each num group, offset by num
    newdf.loc[newdf.stringB == 'X', 'value'] = -500  # assign -500 where stringB is X
    [dict(zip(x.groupby('num').cumcount(), x['value'])) for _, x in newdf.groupby('num')]  # build one dict per num group
    Out[390]: 
    [{0: 13, 1: 14, 2: 15},
     {0: 42,
      1: 43,
      2: 44,
      3: 45,
      4: -500,
      5: -500,
      6: -500,
      7: 49,
      8: 50,
      9: 51,
      10: 52,
      11: 53}]
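
Note that the output above gives {0: 13, 1: 14, 2: 15} for the second string, whereas the question asks for {0: 13, 1: 14, 3: 16}. The difference appears to come from counting positions after the X rows have been removed. A possible variant, sketched here under the assumption that the original character positions should be kept, records the position before filtering. The column names row, pos, a and b are made up for illustration, and grouping by a row label instead of num is also an assumption (it keeps rows with equal num apart).

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'stringA': ['ABCDBAABDCBD', 'BBXB'],
    'stringB': ['ABDCXXXBDDDD', 'AAAB'],
    'num': [42, 13],
})

n = df['stringA'].str.len()

# One row per character; 'row' remembers which original row it came from.
long_df = pd.DataFrame({
    'row': df.index.to_series().repeat(n).to_numpy(),
    'num': df['num'].repeat(n).to_numpy(),
    'a': [c for s in df['stringA'] for c in s],
    'b': [c for s in df['stringB'] for c in s],
})

# Record the character position BEFORE dropping anything.
long_df['pos'] = long_df.groupby('row').cumcount()

long_df = long_df.loc[long_df['a'] != 'X'].copy()   # no key where stringA is X
long_df['value'] = long_df['pos'] + long_df['num']  # position offset by num
long_df.loc[long_df['b'] == 'X', 'value'] = -500    # -500 where stringB is X

# One dict per original row, in the original row order.
result = [
    dict(zip(g['pos'].tolist(), g['value'].tolist()))
    for _, g in long_df.groupby('row')
]
```

With this variant the second string keeps its gap at key 2, and the dicts come back in the same order as the rows of df rather than sorted by num.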