
How to dynamically create new columns in a DataFrame in Python

dataframe pandas python

ashish · 2 years ago

import pandas as pd

df1 = pd.DataFrame(
{
    "empid" : [1,2,3,4,5,6],
    "empname" : ['a', 'b','c','d','e','f'],
    "empcity" : ['aa','bb','cc','dd','ee','ff']
})
df1

df2 = pd.DataFrame(
{
    "empid" : [1,2,3,4,5,6],
    "empname" : ['a', 'b','m','d','n','f'],
    "empcity" : ['aa','bb','cc','ddd','ee','fff']
})
df2

df_all = pd.concat([df1.set_index('empid'),df2.set_index('empid')],axis='columns',keys=['first','second'])
df_all

df_final = df_all.swaplevel(axis = 'columns')[df1.columns[1:]]
df_final

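Based on the df_final DataFrame, I need to create the following output. A comparison column has to be created dynamically for every pair of identically named columns, because I am trying to compare two DataFrames with more than 300 columns (both DataFrames have the same structure and the same column names).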


jezrael 2 years ago

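Use DataFrame.stack so that first can be compared with second for all columns at once, create the new column with DataFrame.assign, then reshape back with DataFrame.unstack, followed by DataFrame.swaplevel and DataFrame.reindex to restore the original column order: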

# original column ordering (every column except empid)
orig = df1.columns[1:].tolist()
print(orig)
# ['empname', 'empcity']

df_final = (df_all.stack()
                  .assign(comparison=lambda x: x['first'].eq(x['second']))
                  .unstack()
                  .swaplevel(axis='columns')
                  .reindex(orig, axis=1, level=0))
print(df_final)
      empname                   empcity                  
        first second comparison   first second comparison
empid                                                    
1           a      a       True      aa     aa       True
2           b      b       True      bb     bb       True
3           c      m      False      cc     cc       True
4           d      d       True      dd    ddd      False
5           e      n      False      ee     ee       True
6           f      f       True      ff    fff      False
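
To make the stack step easier to follow, here is a minimal sketch of the intermediate result. The three-row frames are stand-ins for the question's df1 and df2 (an assumption made to keep the example short), and the variable name stacked is only illustrative.

```python
import pandas as pd

# Small stand-ins for the question's df1 and df2 (same structure, fewer rows)
df1 = pd.DataFrame({"empid": [1, 2, 3],
                    "empname": ["a", "b", "c"],
                    "empcity": ["aa", "bb", "cc"]})
df2 = pd.DataFrame({"empid": [1, 2, 3],
                    "empname": ["a", "b", "m"],
                    "empcity": ["aa", "bbb", "cc"]})

df_all = pd.concat([df1.set_index("empid"), df2.set_index("empid")],
                   axis="columns", keys=["first", "second"])

# stack() moves the inner column level (empname / empcity) into the row index,
# leaving a single 'first' column and a single 'second' column.
stacked = df_all.stack()

# One vectorised comparison now covers every original column.
stacked = stacked.assign(comparison=stacked["first"].eq(stacked["second"]))
print(stacked)
```

Because the comparison runs on the long format, the number of original columns (2 or 300) does not change the code. Recent pandas versions may emit a FutureWarning about the stack implementation; the reshaped values should be the same either way.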

Akshay Sehgal 2 years ago

Compare the 2 DataFrames directly with `==`

You can use a plain == between the two DataFrames you need to compare. Let's start from the original 2 DataFrames, df1 and df2:

first = df1.set_index('empid')
second = df2.set_index('empid')
comparisons = first==second      #<---

output = pd.concat([first, second, comparisons], axis=1,keys=['first','second', 'comparisons'])

#Swapping level and reindexing, borrowed from Jezrael's excellent answer
output = output.swaplevel(axis=1).reindex(first.columns, axis=1, level=0)
print(output)

      empname                    empcity                   
        first second comparisons   first second comparisons
empid                                                      
1           a      a        True      aa     aa        True
2           b      b        True      bb     bb        True
3           c      m       False      cc     cc        True
4           d      d        True      dd    ddd       False
5           e      n       False      ee     ee        True
6           f      f        True      ff    fff       False
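
One caveat worth keeping in mind: == between DataFrames requires identical row and column labels (otherwise pandas raises a ValueError), and it treats two missing values as unequal. If the real 300-column data contains NaN, a NaN-aware mask might be closer to what is wanted. The sketch below uses made-up frames, and the names plain and nan_aware are hypothetical, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Made-up data: row 2 has a missing empname in both frames
first = pd.DataFrame({"empname": ["a", np.nan, "c"],
                      "empcity": ["aa", "bb", "cc"]}, index=[1, 2, 3])
second = pd.DataFrame({"empname": ["a", np.nan, "m"],
                       "empcity": ["aa", "bb", "cc"]}, index=[1, 2, 3])

# Plain element-wise comparison: NaN == NaN evaluates to False
plain = first == second

# Treat a cell as equal when the values match OR both sides are missing
nan_aware = plain | (first.isna() & second.isna())

print(plain)
print(nan_aware)
```

Either mask can then be passed to pd.concat in place of comparisons.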

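Alternative approach with pandas groupby

In addition to the excellent answer by jezrael, here is an alternative approach that uses pandas groupby.

Transpose to get the columns as the row index
Group by the level that contains empname and empcity (level=-1)
Apply the comparison between the two rows of each group
Transpose back to columns
Add MultiIndex columns as the product of 'comparison' and the original columns
Concatenate the two DataFrames (the original one and the one with the comparisons)
Use swaplevel and reindex to get the desired column order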

#create comparisons
comparisons = (df_all.T
                     .groupby(level=-1)
                     .apply(lambda x: x.iloc[0]==x.iloc[1])
                     .T)

#add multi index columns
comparisons.columns = pd.MultiIndex.from_product([['comparison'],comparisons.columns])

#concatenate with original data
df_final = pd.concat([df_all, comparisons], axis='columns')

#Swapping level and reindexing, borrowed from Jezrael's excellent answer
df_final = (df_final.swaplevel(axis = 'columns')
                    .reindex(df1.set_index('empid')
                                .columns, axis=1, level=0))
print(df_final)

      empname                   empcity                  
        first second comparison   first second comparison
empid                                                    
1           a      a       True      aa     aa       True
2           b      b       True      bb     bb       True
3           c      m      False      cc     cc       True
4           d      d       True      dd    ddd      False
5           e      n      False      ee     ee       True
6           f      f       True      ff    fff      False

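user7864386 2 years ago

(i) Use get_level_values to get the label values of level 0

(ii) Iterate over the result of (i) and, for each level=0 label, do an element-wise comparison between first and second using eq

(iii) Use sort_index to sort the columns so that each comparison column sits next to its first and second columns (note that this orders the top level alphabetically)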

for level_0 in df_final.columns.get_level_values(0).unique():
    df_final[(level_0, 'comparison')] = df_final[(level_0, 'first')].eq(df_final[(level_0,'second')])
df_final = df_final.sort_index(level=0, sort_remaining=False, axis=1)

Output:

      empcity                   empname                  
        first second comparison   first second comparison
empid                                                    
1          aa     aa       True       a      a       True
2          bb     bb       True       b      b       True
3          cc     cc       True       c      m      False
4          dd    ddd      False       d      d       True
5          ee     ee       True       e      n      False
6          ff    fff      False       f      f       True
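
In the output above, sort_index puts empcity before empname because the top level is sorted alphabetically. If the original column order matters, one possible variation is to replace the sort with the reindex(..., level=0) call used in the other answers. This is a sketch on shortened stand-in data, not the answerer's code; the .copy() is an addition to avoid assigning into a slice.

```python
import pandas as pd

# Shortened stand-ins for the question's df1 and df2
df1 = pd.DataFrame({"empid": [1, 2, 3],
                    "empname": ["a", "b", "c"],
                    "empcity": ["aa", "bb", "cc"]})
df2 = pd.DataFrame({"empid": [1, 2, 3],
                    "empname": ["a", "b", "m"],
                    "empcity": ["aa", "bbb", "cc"]})

df_all = pd.concat([df1.set_index("empid"), df2.set_index("empid")],
                   axis="columns", keys=["first", "second"])

orig = df1.columns[1:].tolist()  # ['empname', 'empcity']
df_final = df_all.swaplevel(axis="columns")[orig].copy()

# Add one comparison column per original column
for col in orig:
    df_final[(col, "comparison")] = df_final[(col, "first")].eq(df_final[(col, "second")])

# Group the columns by their top-level label, in the original order
df_final = df_final.reindex(orig, axis=1, level=0)
print(df_final)
```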
