
How do I insert rows into a DataFrame based on column values and conditions?

  •  3
  • M-M  · Tech community  · 6 years ago

    import pandas as pd
    import numpy as np
    df = pd.DataFrame({'Car_ID': ['B332', 'B332', 'B332', 'C315', 'C315', 'C315', 'C315', 'C315', 'F310', 'F310'], \
                        'Date': ['2018-03-12', '2018-03-14', '2018-03-15', '2018-03-17', '2018-03-13', '2018-03-15', \
                                 '2018-03-18', '2018-03-21', '2018-03-10', '2018-03-13'], \
                        'Driver': ['Alex', 'Alex', 'Mick', 'Sara', 'Sara', 'Jean', 'Sara', 'Sara', 'Franck','Michel']})
    df
    
    Out:    
        Car_ID  Date        Driver
    0   B332    2018-03-12  Alex
    1   B332    2018-03-14  Alex
    2   B332    2018-03-15  Mick
    3   C315    2018-03-17  Sara
    4   C315    2018-03-13  Sara
    5   C315    2018-03-15  Jean
    6   C315    2018-03-18  Sara
    7   C315    2018-03-21  Sara
    8   F310    2018-03-10  Franck
    9   F310    2018-03-13  Michel
    

    I create a new column, Event, for each event in the DataFrame as follows:

    df["Event"] = np.where(df.Car_ID.str.contains('B', case=True, na=False), 'Rent_Car_B', \
                        np.where(df.Car_ID.str.contains('C', case=True, na=False), 'Rent_Car_C', \
                        np.where(df.Car_ID.str.contains('F', case=True, na=False), 'Rent_Car_F', df.Car_ID))) 
    df
    
    Out:
        Car_ID  Date        Driver  Event
    0   B332    2018-03-12  Alex    Rent_Car_B
    1   B332    2018-03-14  Alex    Rent_Car_B
    2   B332    2018-03-15  Mick    Rent_Car_B
    3   C315    2018-03-17  Sara    Rent_Car_C
    4   C315    2018-03-13  Sara    Rent_Car_C
    5   C315    2018-03-15  Jean    Rent_Car_C
    6   C315    2018-03-18  Sara    Rent_Car_C
    7   C315    2018-03-21  Sara    Rent_Car_C
    8   F310    2018-03-10  Franck  Rent_Car_F
    9   F310    2018-03-13  Michel  Rent_Car_F
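
    The nested np.where calls above can also be written as one np.select call, which keeps each condition next to its label. This is only a sketch on a four-row sample; the extra 'X999' row is made up to show the fallback to Car_ID.

```python
import numpy as np
import pandas as pd

# Small sample; 'X999' is a made-up ID that matches none of the conditions
df = pd.DataFrame({'Car_ID': ['B332', 'C315', 'F310', 'X999']})

# One condition per label, checked in order (the first match wins)
conditions = [
    df['Car_ID'].str.contains('B', na=False),
    df['Car_ID'].str.contains('C', na=False),
    df['Car_ID'].str.contains('F', na=False),
]
choices = ['Rent_Car_B', 'Rent_Car_C', 'Rent_Car_F']

# Rows that match nothing keep their Car_ID, as in the nested np.where version
df['Event'] = np.select(conditions, choices, default=df['Car_ID'])
print(df)
```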
    

    I want to add a new row, with the change described in the Event column, for each driver change, as follows:

    Out:
        Car_ID  Date        Driver  Event
    0   B332    2018-03-12  Alex    Rent_Car_B
    1   B332    2018-03-14  Alex    Rent_Car_B
    2   B332    2018-03-15  Mick    Rent_Car_B
    3   B332    2018-03-15          Alex to Mick
    4   C315    2018-03-17  Sara    Rent_Car_C
    5   C315    2018-03-13  Sara    Rent_Car_C
    6   C315    2018-03-15  Jean    Rent_Car_C
    7   C315    2018-03-15          Sara to Jean
    8   C315    2018-03-18  Sara    Rent_Car_C
    9   C315    2018-03-18          Jean to Sara
    10  C315    2018-03-21  Sara    Rent_Car_C
    11  F310    2018-03-10  Franck  Rent_Car_F
    12  F310    2018-03-13  Michel  Rent_Car_F
    13  F310    2018-03-13          Franck to Michel
    

    I would appreciate your suggestions!

    3 replies  |  as of 6 years ago
        1
  •  3
  •   Ben.T    6 years ago

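    Use shift to compare each row with the previous one, then select the rows where Driver changes but Car_ID stays the same: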

    df['Driver_shift'] = df['Driver'].shift()
    

    mask = (df['Driver'] != df['Driver_shift'])&(df['Car_ID'] == df['Car_ID'].shift())
    df_change = df[mask].copy()  # copy so the assignments below do not raise SettingWithCopyWarning
    

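    Now shift the index by 0.5 so the rows land in the right place after concatenating and sorting, and change the values of the two columns: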

    df_change = df_change.set_index(df_change.index+0.5)
    df_change.loc[:,'Event'] = df_change['Driver_shift'] + ' to ' + df_change['Driver']
    df_change['Driver'] = '' # to replace the value
    

    pd.concat([df,df_change]).sort_index().reset_index(drop=True).drop(columns='Driver_shift')
    

       Car_ID        Date  Driver             Event
    0    B332  2018-03-12    Alex        Rent_Car_B
    1    B332  2018-03-14    Alex        Rent_Car_B
    2    B332  2018-03-15    Mick        Rent_Car_B
    3    B332  2018-03-15              Alex to Mick
    4    C315  2018-03-17    Sara        Rent_Car_C
    5    C315  2018-03-13    Sara        Rent_Car_C
    6    C315  2018-03-15    Jean        Rent_Car_C
    7    C315  2018-03-15              Sara to Jean
    8    C315  2018-03-18    Sara        Rent_Car_C
    9    C315  2018-03-18              Jean to Sara
    10   C315  2018-03-21    Sara        Rent_Car_C
    11   F310  2018-03-10  Franck        Rent_Car_F
    12   F310  2018-03-13  Michel        Rent_Car_F
    13   F310  2018-03-13          Franck to Michel
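
    The steps above can be collected into one function. This is a sketch, not the code from the answer: the name insert_driver_changes is made up, groupby('Car_ID').shift() stands in for the explicit Car_ID comparison (the two agree only when each car's rows are contiguous, as in the sample), and the frame is assumed to have a default integer index. Event is built from the first letter of Car_ID just to keep the example short.

```python
import pandas as pd

df = pd.DataFrame({
    'Car_ID': ['B332', 'B332', 'B332', 'C315', 'C315', 'C315', 'C315', 'C315', 'F310', 'F310'],
    'Date': ['2018-03-12', '2018-03-14', '2018-03-15', '2018-03-17', '2018-03-13',
             '2018-03-15', '2018-03-18', '2018-03-21', '2018-03-10', '2018-03-13'],
    'Driver': ['Alex', 'Alex', 'Mick', 'Sara', 'Sara', 'Jean', 'Sara', 'Sara', 'Franck', 'Michel'],
})
# Shortcut for this sample only: every Car_ID starts with B, C or F
df['Event'] = 'Rent_Car_' + df['Car_ID'].str[0]


def insert_driver_changes(frame):
    """Hypothetical helper: add an 'X to Y' row after each driver change within a car."""
    # Previous driver of the same car; missing on each car's first row
    prev = frame.groupby('Car_ID')['Driver'].shift()
    mask = prev.notna() & (prev != frame['Driver'])

    change = frame[mask].copy()
    change['Event'] = prev[mask] + ' to ' + change['Driver']
    change['Driver'] = ''
    # A half-step index places each new row right after the row it describes
    change.index = change.index + 0.5

    return pd.concat([frame, change]).sort_index().reset_index(drop=True)


result = insert_driver_changes(df)
print(result)
```

    Because every added label (2.5, 5.5, ...) is unique, the sort order does not depend on how ties are broken.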
    

    Edit: to add a row before each driver and date:

    df1 = df.copy()
    df1.index = df1.index +0.5
    df2 = pd.concat([df.drop(columns='Event'),df1]).sort_index().reset_index(drop=True)
    df2['Event'] = df2['Event'].fillna(df2['Driver'])
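
    To see what this edit produces, here is the same logic run on a two-row frame. The sample data is cut down for illustration and is not part of the answer.

```python
import pandas as pd

# Two-row sample with the Event column already in place
df = pd.DataFrame({'Car_ID': ['B332', 'B332'],
                   'Date': ['2018-03-12', '2018-03-15'],
                   'Driver': ['Alex', 'Mick'],
                   'Event': ['Rent_Car_B', 'Rent_Car_B']})

df1 = df.copy()
df1.index = df1.index + 0.5          # the rows that keep Event sort after their twins

# The copy without Event comes first for each pair, so its Event is missing
df2 = pd.concat([df.drop(columns='Event'), df1]).sort_index().reset_index(drop=True)

# Fill the missing Event with the driver's name
df2['Event'] = df2['Event'].fillna(df2['Driver'])
print(df2)
```

    Each original row is now preceded by a row whose Event holds the driver's name.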
    

        2
  •  3
  •   Bharath M Shetty    6 years ago

    # Add the Driver columns by shifting grouped by the Event
    df['new'] =  df.groupby('Event').apply(lambda x : x['Driver'].shift(1) +'to'+ x['Driver']).values
    # Split them by 'to'
    df['new'] =df['new'].str.split('to').bfill()
    # Check if both of them are equal
    m = df['new'].str[0] != df['new'].str[1]
    # Based on the condition create a new dataframe
    new_df = df.loc[m].copy().iloc[:-1]
    # Convert the list to the format you desired 
    new_df['new'] = new_df['new'].str[0] + ' to ' + new_df['new'].str[1]
    # Concat new dataframe and old dataframe 
    mdf = pd.concat([df.drop(columns='new'), new_df.drop(columns=['Driver', 'Event'])
                     .rename(columns={'new': 'Event'})])
    
    
         Car_ID        Date  Driver             Event
    0   B332  2018-03-12    Alex        Rent_Car_B
    1   B332  2018-03-14    Alex        Rent_Car_B
    2   B332  2018-03-15    Mick        Rent_Car_B
    3   C315  2018-03-17    Sara        Rent_Car_C
    4   C315  2018-03-13    Sara        Rent_Car_C
    5   C315  2018-03-15    Jean        Rent_Car_C
    6   C315  2018-03-18    Sara        Rent_Car_C
    7   C315  2018-03-21    Sara        Rent_Car_C
    8   F310  2018-03-10  Franck        Rent_Car_F
    9   F310  2018-03-13  Michel        Rent_Car_F
    2   B332  2018-03-15     NaN      Alex to Mick
    5   C315  2018-03-15     NaN      Sara to Jean
    6   C315  2018-03-18     NaN      Jean to Sara
    8   F310  2018-03-10     NaN  Franck to Michel
    

    mdf = mdf.sort_index()
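
    One caveat: after the concat, each added row shares its index label with an original row, and sort_index defaults to quicksort, which is not guaranteed to keep tied rows in their original order. A minimal sketch on toy data (the frames base and extra are made up), assuming the original row should stay ahead of the added one:

```python
import pandas as pd

base = pd.DataFrame({'Car_ID': ['B332', 'B332', 'C315'],
                     'Event': ['Rent_Car_B', 'Rent_Car_B', 'Rent_Car_C']})
# Extra row that reuses index label 1, like the rows added from new_df
extra = pd.DataFrame({'Car_ID': ['B332'], 'Event': ['Alex to Mick']}, index=[1])

# mergesort is stable, so rows with equal labels keep their concat order
merged = pd.concat([base, extra]).sort_index(kind='mergesort')
print(merged)
```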
    
        3
  •  2
  •   ALollz    6 years ago

    After you have added the Event column to your DataFrame:

    import pandas as pd
    
    # Modify the index so we can later append to the correct rows
    df.index= df.index*2
    
    # Determine when switches occur
    mask = (df.Driver != df.Driver.shift(1)) & (df.Car_ID == df.Car_ID.shift(1))
    sw_from = df[mask.shift(-1, fill_value=False)].copy()
    sw_to = df[mask].copy()
    
    # Make the switching rows have the correct information
    sw_to['Event'] = sw_from.Driver.values + ' to ' + sw_to.Driver.values
    sw_to['Driver'] = ''
    
    # Modify the switching indices so they get added to the proper position
    sw_to.index = sw_to.index+1
    
    # Add them to df
    df = pd.concat([df, sw_to]).sort_index().reset_index(drop=True)  # DataFrame.append was removed in pandas 2.0
    

       Car_ID        Date  Driver             Event
    0    B332  2018-03-12    Alex        Rent_Car_B
    1    B332  2018-03-14    Alex        Rent_Car_B
    2    B332  2018-03-15    Mick        Rent_Car_B
    3    B332  2018-03-15              Alex to Mick
    4    C315  2018-03-17    Sara        Rent_Car_C
    5    C315  2018-03-13    Sara        Rent_Car_C
    6    C315  2018-03-15    Jean        Rent_Car_C
    7    C315  2018-03-15              Sara to Jean
    8    C315  2018-03-18    Sara        Rent_Car_C
    9    C315  2018-03-18              Jean to Sara
    10   C315  2018-03-21    Sara        Rent_Car_C
    11   F310  2018-03-10  Franck        Rent_Car_F
    12   F310  2018-03-13  Michel        Rent_Car_F
    13   F310  2018-03-13          Franck to Michel