
How to do a custom pandas DataFrame merge?

  • Denis · 5 years ago

    Suppose I have:

    import pandas as pd

    data = [['tom', 10, 20], ['nick', 15, 30], ['juli', 14, 40]]
    df = pd.DataFrame(data, columns=['Name', 'Low-Age', 'High-Age'])
    print(df)
       Name  Low-Age  High-Age
    0   tom       10        20
    1  nick       15        30
    2  juli       14        40
    

    Then I have another table:

    data = [[10, 'school'], [30, 'college']] 
    edu = pd.DataFrame(data, columns = ['Age', 'Education']) 
    print(edu)
       Age Education
    0   10    school
    1   30   college
    

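    How can I get a table that matches edu['Age'] against either df['Low-Age'] or df['High-Age'] and, when they match, appends edu['Education'] to df? (Assume that either Low-Age or High-Age can match, but not both.)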

    So I want my output to be:

       Name  Low-Age  High-Age Education
    0   tom       10        20    school
    1  nick       15        30   college
    2  juli       14        40       NaN
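    Since the question asks for a merge, it may help to see that an ordinary merge can produce the same result once the two age columns are reshaped into a single Age column. This is an illustrative sketch, not one of the answers below: it assumes the df and edu frames from the question, uses melt to build the long table, and assumes at most one bound per row matches (drop_duplicates only guards against the case where both do). The names long, matches, and education are just illustrative.

```python
import pandas as pd

df = pd.DataFrame([['tom', 10, 20], ['nick', 15, 30], ['juli', 14, 40]],
                  columns=['Name', 'Low-Age', 'High-Age'])
edu = pd.DataFrame([[10, 'school'], [30, 'college']],
                   columns=['Age', 'Education'])

# Reshape the two age columns into one long 'Age' column; the 'index'
# column keeps each value's original row label in df.
long = (df[['Low-Age', 'High-Age']]
        .reset_index()
        .melt(id_vars='index', value_name='Age'))

# An ordinary inner merge on the single 'Age' key finds every match.
matches = long.merge(edu, on='Age', how='inner')

# Keep one match per original row, indexed by that row's label.
education = matches.drop_duplicates('index').set_index('index')['Education']

# Left join on the index: rows without a match get NaN.
result = df.join(education)
print(result)
```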
    
  • piRSquared · 5 years ago

    stack -> map

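    # Lookup table: age -> education
    edu_dict = dict(zip(edu.Age, edu.Education))

    # Stack both age columns into one Series, map each age to its education,
    # then keep the first non-null match per original row
    Education = df[['Low-Age', 'High-Age']].stack().map(edu_dict).groupby(level=0).first()
    df.assign(Education=Education)  # returns a new DataFrame; df is unchanged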
    
       Name  Low-Age  High-Age Education
    0   tom       10        20    school
    1  nick       15        30   college
    2  juli       14        40       NaN
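    To see what each step of the stack -> map chain produces, here is a sketch that rebuilds the question's frames and prints the intermediate Series. It assumes the same df and edu as in the question; stacked, mapped, and education are illustrative names.

```python
import pandas as pd

df = pd.DataFrame([['tom', 10, 20], ['nick', 15, 30], ['juli', 14, 40]],
                  columns=['Name', 'Low-Age', 'High-Age'])
edu = pd.DataFrame([[10, 'school'], [30, 'college']],
                   columns=['Age', 'Education'])
edu_dict = dict(zip(edu.Age, edu.Education))

# stack() turns the two age columns into one Series with a two-level index:
# (original row label, column name) -> age
stacked = df[['Low-Age', 'High-Age']].stack()

# map() looks each age up in the dict; ages missing from edu become NaN
mapped = stacked.map(edu_dict)
print(mapped)

# groupby(level=0) regroups by the original row label; first() returns the
# first non-null value in each group, or NaN if the whole group is null
education = mapped.groupby(level=0).first()
print(education)
```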
    
  • Vaishali · 5 years ago

    Use map together with combine_first:

    mapper = edu.set_index('Age')['Education']
    df['Education'] = df['Low-Age'].map(mapper).combine_first(df['High-Age'].map(mapper))
    
       Name  Low-Age  High-Age Education
    0   tom       10        20    school
    1  nick       15        30   college
    2  juli       14        40       NaN
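    combine_first keeps the caller's non-null values and fills only its nulls from the other Series at the same index label. A minimal sketch on hand-built Series (illustrative values standing in for the two mapped age columns) shows this, including which value would win if both bounds matched:

```python
import numpy as np
import pandas as pd

# Stand-ins for df['Low-Age'].map(...) and df['High-Age'].map(...)
low = pd.Series(['school', np.nan, np.nan])
high = pd.Series([np.nan, 'college', np.nan])

# Non-null values from `low` are kept; its NaNs are filled from `high`
combined = low.combine_first(high)
print(combined)

# If both sides had a value at the same label, the caller's value wins
both = pd.Series(['a']).combine_first(pd.Series(['b']))
print(both)
```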
    
  • ansev · 5 years ago

    Using Series.map + pd.concat:

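    edu2 = edu.set_index('Age')
    # Map each age column separately and stack the results end to end
    # (the combined Series repeats each of df's row labels twice)
    s = pd.concat([df['Low-Age'].map(edu2['Education']),
                   df['High-Age'].map(edu2['Education'])])
    # Keep only the successful matches, then realign to df's rows
    df['Education'] = s[s.notna()].reindex(index=df.index)
    print(df)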
    
       Name  Low-Age  High-Age Education
    0   tom       10        20    school
    1  nick       15        30   college
    2  juli       14        40       NaN
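    The reindex step relies on the question's "not both" assumption: if a row matched on both bounds, its label would survive s.notna() twice, and reindex raises a ValueError on an axis with duplicate labels. A sketch of one way to guard against that, assuming a hypothetical extra edu row (20 -> 'university') that makes tom match on both bounds, and choosing to keep the Low-Age match:

```python
import pandas as pd

df = pd.DataFrame([['tom', 10, 20], ['nick', 15, 30], ['juli', 14, 40]],
                  columns=['Name', 'Low-Age', 'High-Age'])
# Hypothetical data: the extra 20 -> 'university' row makes tom match on
# BOTH bounds, violating the question's "not both" assumption
edu = pd.DataFrame([[10, 'school'], [30, 'college'], [20, 'university']],
                   columns=['Age', 'Education'])

edu2 = edu.set_index('Age')
s = pd.concat([df['Low-Age'].map(edu2['Education']),
               df['High-Age'].map(edu2['Education'])])
s = s[s.notna()]

# Label 0 now appears twice, so reindex would raise. Keep only the first
# occurrence of each label (the Low-Age match, since it was concatenated first)
s = s[~s.index.duplicated(keep='first')]

df['Education'] = s.reindex(df.index)
print(df)
```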
    

    Or sum the two mapped columns instead of using pd.concat:

    edu2 = edu.set_index('Age')
    df['Education'] = (df['High-Age'].map(edu2['Education']).fillna('') +
                       df['Low-Age'].map(edu2['Education']).fillna(''))

    Or, equivalently, with apply:

    edu2 = edu.set_index('Age')
    df['Education'] = df[['High-Age', 'Low-Age']].apply(lambda x: x.map(edu2['Education']).fillna('')).sum(axis=1)
    

    print(df)
    
       Name  Low-Age  High-Age Education
    0   tom       10        20    school
    1  nick       15        30   college
    2  juli       14        40          
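    The sum variants leave an empty string rather than NaN for rows without a match (see juli above), and if both bounds matched they would concatenate the two labels, so they also depend on the "not both" assumption. Assuming you want NaN as in the question's expected output, a sketch that masks the empty strings afterwards (education is an illustrative name):

```python
import pandas as pd

df = pd.DataFrame([['tom', 10, 20], ['nick', 15, 30], ['juli', 14, 40]],
                  columns=['Name', 'Low-Age', 'High-Age'])
edu = pd.DataFrame([[10, 'school'], [30, 'college']],
                   columns=['Age', 'Education'])
edu2 = edu.set_index('Age')

# Sum variant from the answer: unmatched rows end up as ''
education = (df['High-Age'].map(edu2['Education']).fillna('') +
             df['Low-Age'].map(edu2['Education']).fillna(''))

# Turn the empty strings back into NaN so unmatched rows match the
# question's expected output
df['Education'] = education.mask(education == '')
print(df)
```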
    
  • Sai Kiran · 5 years ago

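    This approach uses apply() with a row-wise function. Note that apply(axis=1) calls a Python function once per row, so on large datasets it is usually slower than the vectorized map-based answers, not faster.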

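    # Lookup table: age -> education
    edu_dict = dict(zip(edu['Age'], edu['Education']))

    def match(row):
        # Return the education for whichever bound appears in edu, else None
        if row['Low-Age'] in edu_dict:
            return edu_dict[row['Low-Age']]
        return edu_dict.get(row['High-Age'])

    # Apply over df's rows so the result is indexed like df
    df['Education'] = df.apply(match, axis=1)
    print(df)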
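    A row-wise apply over df looks each row's ages up by value, so its result should not depend on the row order of edu. A minimal check under that assumption, using a deliberately reordered copy of edu (hypothetical data) and a hypothetical helper named lookup_education, compared against the vectorized map + combine_first result:

```python
import pandas as pd

df = pd.DataFrame([['tom', 10, 20], ['nick', 15, 30], ['juli', 14, 40]],
                  columns=['Name', 'Low-Age', 'High-Age'])
# Same data as edu in the question, but with the rows in reverse order
edu = pd.DataFrame([[30, 'college'], [10, 'school']],
                   columns=['Age', 'Education'])
edu_dict = dict(zip(edu['Age'], edu['Education']))

def lookup_education(row):
    # Hypothetical helper: prefer a Low-Age match, fall back to High-Age
    if row['Low-Age'] in edu_dict:
        return edu_dict[row['Low-Age']]
    return edu_dict.get(row['High-Age'])

# Row-wise apply over df, so the result is indexed like df
by_apply = df.apply(lookup_education, axis=1)

# Vectorized reference result for comparison
by_map = df['Low-Age'].map(edu_dict).combine_first(df['High-Age'].map(edu_dict))

print(by_apply.tolist())
print(by_map.tolist())
```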