
Append rows of one df to another df based on index values

pandas python

JamesHudson81 · 6 years ago

I have df1:

              col1    col2   col3  col4  col5
        A       3       4     1      2    1
        B       2       1     2      3    1
        C       2       3     4      2    1

On the other hand, I have df2:

              type    col1    col2   col3
        j      A       0.5     0.7    0.1
        k      B       0.2     0.3    0.9 
        l      A       0.5     0.3    0.2
        m      C       0.8     0.7    0.1
        n      A       0.3     0.3    0.2
        o      B       0.1     0.7    0.3

Where the column type in df2 matches the index of df1, I want the df2 row placed below that df1 row, like this:

             col1    col2   col3  col4  col5
    A          3       4     1      2    1
        j     0.5     0.7    0.1
        l     0.5     0.3    0.2
        n     0.3     0.3    0.2
    B          2       1     2      3    1
        k     0.2     0.3    0.9 
        o     0.1     0.7    0.3
    C          2       3     4      2    1
        m     0.8     0.7    0.1

Is there a premade function in pandas that I can use to append each row of df2 below its corresponding index in df1?
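To try the answers below, the two frames from the question can be rebuilt as follows. This is only a sketch: it assumes the row labels (A to C and j to o) are the DataFrame indexes and that type is an ordinary column of df2, which is how the printed tables read.

```python
import pandas as pd

# df1: one row per type, labelled A, B, C
df1 = pd.DataFrame(
    {
        "col1": [3, 2, 2],
        "col2": [4, 1, 3],
        "col3": [1, 2, 4],
        "col4": [2, 3, 2],
        "col5": [1, 1, 1],
    },
    index=list("ABC"),
)

# df2: detail rows labelled j..o, each pointing at a df1 label via 'type'
df2 = pd.DataFrame(
    {
        "type": list("ABACAB"),
        "col1": [0.5, 0.2, 0.5, 0.8, 0.3, 0.1],
        "col2": [0.7, 0.3, 0.3, 0.7, 0.3, 0.7],
        "col3": [0.1, 0.9, 0.2, 0.1, 0.2, 0.3],
    },
    index=list("jklmno"),
)
```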

3 answers | latest 6 years ago

jpp · 6 years ago

It looks like you need a MultiIndex here. You should not use NaN as an index label; use a placeholder such as 0 instead:

import pandas as pd

# set index as (type, current_index) for df2
df2 = df2.reset_index().set_index(['type', 'index']).sort_index()

# reassign index as (type, 0) for df1
df1.index = pd.MultiIndex.from_tuples([(i, 0) for i in df1.index])

# concatenate df1 and df2
res = pd.concat([df1, df2]).sort_index()

print(res)

     col1  col2  col3  col4  col5
A 0   3.0   4.0   1.0   2.0   1.0
  j   0.5   0.7   0.1   NaN   NaN
  l   0.5   0.3   0.2   NaN   NaN
  n   0.3   0.3   0.2   NaN   NaN
B 0   2.0   1.0   2.0   3.0   1.0
  k   0.2   0.3   0.9   NaN   NaN
  o   0.1   0.7   0.3   NaN   NaN
C 0   2.0   3.0   4.0   2.0   1.0
  m   0.8   0.7   0.1   NaN   NaN
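The same idea can be wrapped in a small function. The sketch below is not part of the answer: stack_under_parent, its parameter names, and the level name "row" are hypothetical, and it uses an empty string rather than 0 as the placeholder so that the second index level holds only strings.

```python
import pandas as pd


def stack_under_parent(parents, children, key="type", placeholder=""):
    """Hypothetical helper: place each child row under the parent named in `key`."""
    # Children: index becomes (value of `key`, original row label).
    kids = (
        children.set_index(key, append=True)
        .swaplevel(0, 1)
        .rename_axis([key, "row"])
    )
    # Parents: index becomes (original label, placeholder).
    tops = parents.copy()
    tops.index = pd.MultiIndex.from_arrays(
        [parents.index, [placeholder] * len(parents)], names=[key, "row"]
    )
    # concat fills columns that the children lack with NaN;
    # sort_index then interleaves each parent with its children.
    return pd.concat([tops, kids]).sort_index()


# Reduced sample data, just enough to show the interleaving
df1 = pd.DataFrame({"col1": [3, 2], "col4": [2, 3]}, index=["A", "B"])
df2 = pd.DataFrame(
    {"type": ["A", "B", "A"], "col1": [0.5, 0.2, 0.3]}, index=["j", "k", "n"]
)
res = stack_under_parent(df1, df2)
print(res)
```

Because the empty string sorts before any other string, each parent row lands at the top of its group without relying on how NaN labels are ordered.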

rafaelc · 6 years ago

Use pd.merge and sort_index, specifying na_position='first':

# outer merge keeps every row of both frames; df1 rows get NaN in 'index'
pd.merge(df2.reset_index(), 
         df1.reset_index().rename(columns={'index':'type'}),
         how='outer')\
.set_index(['type', 'index'])\
.sort_index(na_position='first')

                col1   col2   col3  col4   col5
type    index                   
A       NaN     3.0    4.0    1.0   2.0    1.0
        j       0.5    0.7    0.1   NaN    NaN
        l       0.5    0.3    0.2   NaN    NaN
        n       0.3    0.3    0.2   NaN    NaN
B       NaN     2.0    1.0    2.0   3.0    1.0
        k       0.2    0.3    0.9   NaN    NaN
        o       0.1    0.7    0.3   NaN    NaN
C       NaN     2.0    3.0    4.0   2.0    1.0
        m       0.8    0.7    0.1   NaN    NaN

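As @jpp pointed out, the documentation for sort_index says of na_position:

na_position : {'first', 'last'}, default 'last'. 'first' puts NaNs at the beginning, 'last' puts NaNs at the end. Not implemented for MultiIndex.

In practice, though, it does appear to work here.

However, if you are concerned that this behavior may be inconsistent, an alternative is to call sort_values first and then set the index. The sort_values documentation carries no such warning: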

# sort on the plain columns first, then promote them to the index
pd.merge(df2.reset_index(), 
         df1.reset_index().rename(columns={'index':'type'}), 
         how='outer')\
.sort_values(['type', 'index'], na_position='first')\
.set_index(['type', 'index'])
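A self-contained sketch of the "sort the columns first, then set the index" ordering is shown below. It is an illustration under assumptions, not the answer's code: it uses reduced sample data, and it swaps pd.merge for pd.concat to build the combined table, which avoids joining on columns of different dtypes.

```python
import pandas as pd

# Reduced sample data
df1 = pd.DataFrame({"col1": [3, 2], "col4": [2, 3]}, index=["A", "B"])
df2 = pd.DataFrame(
    {"type": ["A", "B", "A"], "col1": [0.5, 0.2, 0.3]}, index=["j", "k", "n"]
)

# One long table: df1 rows have no 'index' value, so it is NaN for them.
stacked = pd.concat(
    [
        df1.rename_axis("type").reset_index(),
        df2.rename_axis("index").reset_index(),
    ],
    ignore_index=True,
)

# Sort while 'type' and 'index' are still ordinary columns,
# putting the NaN (parent) rows first within each type.
out = (
    stacked.sort_values(["type", "index"], na_position="first")
    .set_index(["type", "index"])
)
labels = out.index.get_level_values("index")
print(out)
```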

piRSquared · 6 years ago

Along the same lines as @jpp:

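import numpy as np
import pandas as pd

d2 = df2.rename_axis('k').set_index('type', append=True).swaplevel(0, 1)
d1 = df1.set_index(np.zeros(len(df1), str), append=True).rename_axis(['type', 'k'])

# DataFrame.append was removed in pandas 2.0, so use pd.concat
pd.concat([d1, d2]).sort_index()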

        col1  col2  col3  col4  col5
type k                              
A        3.0   4.0   1.0   2.0   1.0
     j   0.5   0.7   0.1   NaN   NaN
     l   0.5   0.3   0.2   NaN   NaN
     n   0.3   0.3   0.2   NaN   NaN
B        2.0   1.0   2.0   3.0   1.0
     k   0.2   0.3   0.9   NaN   NaN
     o   0.1   0.7   0.3   NaN   NaN
C        2.0   3.0   4.0   2.0   1.0
     m   0.8   0.7   0.1   NaN   NaN
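The np.zeros(len(df1), str) call may look odd, so here is a brief sketch of what it produces, using a hypothetical one-column frame: with a string dtype NumPy fills the array with empty strings, and that array becomes the blank second index level.

```python
import numpy as np
import pandas as pd

# With a str dtype, np.zeros yields empty strings rather than 0
blank = np.zeros(3, str)

# Hypothetical small frame, only to show the resulting index
df = pd.DataFrame({"col1": [3, 2, 2]}, index=list("ABC"))
d = df.set_index(blank, append=True)
print(d.index.tolist())
```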

Alternative:

# pd.concat replaces DataFrame.append, which was removed in pandas 2.0
pd.concat([
    df1.rename_axis('type').assign(k='').set_index('k', append=True),
    df2.rename_axis('k').set_index('type', append=True).swaplevel(0, 1),
]).sort_index()

        col1  col2  col3  col4  col5
type k                              
A        3.0   4.0   1.0   2.0   1.0
     j   0.5   0.7   0.1   NaN   NaN
     l   0.5   0.3   0.2   NaN   NaN
     n   0.3   0.3   0.2   NaN   NaN
B        2.0   1.0   2.0   3.0   1.0
     k   0.2   0.3   0.9   NaN   NaN
     o   0.1   0.7   0.3   NaN   NaN
C        2.0   3.0   4.0   2.0   1.0
     m   0.8   0.7   0.1   NaN   NaN
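Once the rows are nested like this, the two index levels make it easy to pull pieces back out. The following is a minimal sketch on a hand-built frame of the same shape (levels named type and k, with an empty string marking the df1 rows); the variable names are illustrative only.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("A", ""), ("A", "j"), ("B", ""), ("B", "k")], names=["type", "k"]
)
res = pd.DataFrame(
    {"col1": [3.0, 0.5, 2.0, 0.2], "col4": [2.0, float("nan"), 3.0, float("nan")]},
    index=idx,
)

# Only the original df1 rows: cross-section where k is the blank placeholder
parents = res.xs("", level="k")

# One group, parent row followed by its children
group_a = res.loc["A"]

# Only the rows that came from df2
children = res[res.index.get_level_values("k") != ""]
```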