
Get elements of an array in one column based on the element values of a different array in another column

  • Khalil Al Hooti  · 6 years ago

    I have the following pandas DataFrame.

    import numpy as np
    import pandas as pd
    
    events = ['event1', 'event2', 'event3', 'event4', 'event5', 'event6']
    
    wells = [np.array([1, 2]), np.array([1, 3]), np.array([1]),
     np.array([4, 5, 6]), np.array([4, 5, 6]), np.array([7, 8])]
    
    traces_per_well = [np.array([24, 24]), np.array([24, 21]), np.array([18]),
     np.array([24, 24, 24]), np.array([24, 21, 24]), np.array([18, 21])]
    
    df = pd.DataFrame({"event_no": events, "well_array": wells,
      "trace_per_well": traces_per_well})
    
    df["total_traces"] = df['trace_per_well'].apply(np.sum)
    
    df['supposed_traces_no'] = df['well_array'].apply(lambda x: len(x)*24)
    
    df['pass'] = df['total_traces'] == df['supposed_traces_no']
    print(df)
    

      event_no well_array trace_per_well  total_traces  supposed_traces_no   pass
    0   event1     [1, 2]       [24, 24]            48                  48   True
    1   event2     [1, 3]       [24, 21]            45                  48  False
    2   event3        [1]           [18]            18                  24  False
    3   event4  [4, 5, 6]   [24, 24, 24]            72                  72   True
    4   event5  [4, 5, 6]   [24, 21, 24]            69                  72  False
    5   event6     [7, 8]       [18, 21]            39                  48  False
    

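    Whenever a value in trace_per_well is not equal to 24, I want to put that value in one column and the corresponding element of the well_array array in another column.

    The result should look like this.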

      event_no well_array trace_per_well  total_traces  supposed_traces_no   pass wrong_trace_in_well wrong_well
    0   event1     [1, 2]       [24, 24]            48                  48   True                 NaN        NaN
    1   event2     [1, 3]       [24, 21]            45                  48  False                  21          3
    2   event3        [1]           [18]            18                  24  False                  18          1
    3   event4  [4, 5, 6]   [24, 24, 24]            72                  72   True                 NaN        NaN
    4   event5  [4, 5, 6]   [24, 21, 24]            69                  72  False                  21          5
    5   event6     [7, 8]       [18, 21]            39                  48  False            (18, 21)     (7, 8)
    

    1 answer
  • cs95 abhishek58g  · 6 years ago

    I would use a list comprehension. Generate the result in a single pass over the data, then assign it to the respective columns.

    v = pd.Series(
            [list(zip(*((x, y) for x, y in zip(X, Y) if x != 24))) 
                for X, Y in zip(df['trace_per_well'], df['well_array'])])
    
    df['wrong_trace_in_well'] = v.str[0]
    df['wrong_well'] = v.str[-1]
    

    df[['wrong_trace_in_well', 'wrong_well']]
    
      wrong_trace_in_well wrong_well
    0                 NaN        NaN
    1               (21,)       (3,)
    2               (18,)       (1,)
    3                 NaN        NaN
    4               (21,)       (5,)
    5            (18, 21)     (7, 8)
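
    To make the one-pass comprehension easier to follow, here is a minimal sketch of what it does for a single row, in plain Python. The helper name split_mismatches and its expected parameter are hypothetical and not part of the answer; the value 24 comes from the question. Rows with no mismatch yield an empty result, which is why v.str[0] and v.str[-1] give NaN for them.

```python
def split_mismatches(traces, wells, expected=24):
    """Return (bad_traces, bad_wells) as tuples, or () if every trace matches.

    Hypothetical helper: mirrors the inner expression of the comprehension
    for one row of the DataFrame.
    """
    # Keep only the (trace, well) pairs whose trace count is off.
    pairs = [(t, w) for t, w in zip(traces, wells) if t != expected]
    # zip(*pairs) transposes [(t1, w1), (t2, w2)] into (t1, t2), (w1, w2).
    return tuple(zip(*pairs))


print(split_mismatches([24, 21, 24], [4, 5, 6]))  # ((21,), (5,))
print(split_mismatches([18, 21], [7, 8]))         # ((18, 21), (7, 8))
print(split_mismatches([24, 24], [1, 2]))         # ()
```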
    

    Alternatively, if you are fine with doing this in multiple passes:

    df['wrong_trace_in_well'] = [[x for x in X if x != 24] for X in df['trace_per_well']]
    df['wrong_well'] = [
        [y for x, y in zip(X, Y) if x != 24] 
            for X, Y in  zip(df['trace_per_well'], df['well_array'])]
    

    df[['wrong_trace_in_well', 'wrong_well']]
    
      wrong_trace_in_well wrong_well
    0                  []         []
    1                [21]        [3]
    2                [18]        [1]
    3                  []         []
    4                [21]        [5]
    5            [18, 21]     [7, 8]
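
    Neither output matches the layout requested in the question exactly, where a single mismatch appears as a bare value (21 rather than [21]) and a row without mismatches is NaN. One possible post-processing step is sketched below; unwrap is a hypothetical helper, not part of the answer, and it leaves the column with mixed types (scalars, tuples and NaN), which may or may not be desirable.

```python
import math


def unwrap(values):
    """Collapse a list of mismatches into the shape shown in the question.

    Hypothetical helper: empty -> NaN, one element -> that element,
    several elements -> a tuple.
    """
    if len(values) == 0:
        return math.nan
    if len(values) == 1:
        return values[0]
    return tuple(values)


print(unwrap([21]))      # 21
print(unwrap([18, 21]))  # (18, 21)
print(unwrap([]))        # nan
```

    Assuming the list-valued columns from the multi-pass version, it could be applied with df['wrong_well'].apply(unwrap), and likewise for wrong_trace_in_well.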