
Finding the first match in a Pandas DataFrame

  •  4
  • Jason S  · 7 years ago

    import pandas as pd
    
    westcoast = pd.DataFrame([['Washington','Olympia'],['Oregon','Salem'],
                              ['California','Sacramento']],
                            columns=['state','capital'])
    print(westcoast)
    
            state     capital
    0  Washington     Olympia
    1      Oregon       Salem
    2  California  Sacramento
    

    westcoast[westcoast.state=='Oregon'].capital
    
    1    Salem
    Name: capital, dtype: object
    

    But I want to get the string 'Salem':

    westcoast[westcoast.state=='Oregon'].capital.values[0]
    
    'Salem'
    

    Is there a better way than .values[0]?

    2 answers  |  as of 7 years ago
        1
  •  5
  •   Dan jezrael    7 years ago

    Yes, you can use Series.item if the lookup always returns exactly one element from the Series (it raises ValueError otherwise):

    westcoast.loc[westcoast.state=='Oregon', 'capital'].item()
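
    To illustrate when Series.item is safe, here is a minimal sketch of its failure mode. It assumes a reasonably recent pandas, where .item() raises ValueError unless the selection holds exactly one element. The helper name item_or_message is hypothetical and only serves the demonstration.

```python
import pandas as pd

westcoast = pd.DataFrame(
    [['Washington', 'Olympia'], ['Oregon', 'Salem'], ['California', 'Sacramento']],
    columns=['state', 'capital'])

def item_or_message(state):
    """Return the single matching capital, or a message if .item() fails."""
    selection = westcoast.loc[westcoast.state == state, 'capital']
    try:
        return selection.item()
    except ValueError:
        # Raised when the selection has zero or more than one element
        return 'not exactly one match ({} rows)'.format(len(selection))

single = item_or_message('Oregon')
missing = item_or_message('New York')
print(single)
print(missing)
```

    So .item() fits lookups that are guaranteed to be unique; anything else needs an explicit check or a default.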
    

    import numpy as np

    # Take the first match, or NaN when nothing matches
    s = westcoast.loc[westcoast.state=='Oregon', 'capital']
    s = np.nan if s.empty else s.iat[0]
    print(s)  # Salem

    s = westcoast.loc[westcoast.state=='New York', 'capital']
    s = np.nan if s.empty else s.iat[0]
    print(s)  # nan
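
    The empty check can also be folded into a single expression with Python's built-in next() and a default value. This is only a sketch of an alternative, not part of the original answer; the function name first_match and its parameters are made up for illustration.

```python
import numpy as np
import pandas as pd

westcoast = pd.DataFrame(
    [['Washington', 'Olympia'], ['Oregon', 'Salem'],
     ['California', 'Sacramento'], ['Oregon', 'Portland']],
    columns=['state', 'capital'])

def first_match(frame, state, default=np.nan):
    """Return the first capital whose state matches, else the default."""
    matches = frame.loc[frame['state'] == state, 'capital']
    # next() with a default replaces the explicit `matches.empty` check
    return next(iter(matches), default)

found = first_match(westcoast, 'Oregon')        # first of two Oregon rows
not_found = first_match(westcoast, 'New York')  # falls back to the default
print(found)
print(not_found)
```

    When several rows match, this returns the first one in row order and silently ignores the rest, which may or may not be what you want.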
    

    A more general solution covers all cases, since there are 3 possible outcomes (no match, one match, multiple matches):

    westcoast = pd.DataFrame([['Washington','Olympia'],['Oregon','Salem'],
                              ['California','Sacramento'],['Oregon','Portland']],
                            columns=['state','capital'])
    
    print (westcoast)
            state     capital
    0  Washington     Olympia
    1      Oregon       Salem
    2  California  Sacramento
    3      Oregon    Portland
    
    s = westcoast.loc[westcoast.state=='Oregon', 'capital']
    
    # no value returned
    if s.empty:
        s = 'no match'
    # exactly one value returned
    elif len(s) == 1:
        s = s.item()
    # multiple values returned: return a list of values
    else:
        s = s.tolist()
    
    print (s)
    ['Salem', 'Portland']
    

    You can also wrap this in a lookup function:

    import numpy as np

    def look_up(a):
        s = westcoast.loc[westcoast.state==a, 'capital']
        # no match
        if s.empty:
            return np.nan
        # exactly one match: return the scalar
        elif len(s) == 1:
            return s.item()
        # multiple matches: return a list
        else:
            return s.tolist()
    
    print (look_up('Oregon'))
    ['Salem', 'Portland']
    
    print (look_up('California'))
    Sacramento
    
    print (look_up('New York'))
    nan
    
        2
  •  1
  •   unutbu    7 years ago

    If you do this kind of lookup often, make state the index:

    state_capitals = westcoast.set_index('state')['capital']
    print(state_capitals['Oregon'])
    # Salem
    

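    With an index, each lookup is O(1) on average, whereas westcoast['state']=='Oregon' requires O(n) comparisons. Of course, building the index is also O(n), so you need to do many lookups for it to pay off.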
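
    As a rough sketch of the indexed approach in practice (the variable names below are illustrative, and the plain dict is an alternative suggested here rather than something from the answer): Series.get returns a default for a missing label instead of raising KeyError, and a label lookup returns a scalar only while the index is unique.

```python
import pandas as pd

westcoast = pd.DataFrame(
    [['Washington', 'Olympia'], ['Oregon', 'Salem'], ['California', 'Sacramento']],
    columns=['state', 'capital'])

# Build the index once (O(n)); later lookups are O(1) on average
state_capitals = westcoast.set_index('state')['capital']

# Series.get returns the default instead of raising KeyError
capital = state_capitals.get('Oregon', default=None)
missing = state_capitals.get('New York', default=None)

# A label lookup yields a scalar only if the index has no duplicates,
# so it is worth checking that assumption before relying on it
index_is_unique = state_capitals.index.is_unique

# A plain dict offers the same average O(1) lookup for simple cases
capital_by_state = dict(zip(westcoast['state'], westcoast['capital']))

print(capital, missing, index_is_unique, capital_by_state['California'])
```

    If a state can appear more than once, state_capitals['Oregon'] returns a Series of all matches rather than a single string, so the uniqueness check matters.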

    Also, once you have state_capitals, looking up a state's capital is a simple dict-like operation.