How do I pivot a column containing strings in a DataFrame? [duplicate]

  •  6
  • brendxn  · 6 years ago

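    I'm trying to reshape a pandas DataFrame by turning the values of one column into separate columns (by pivoting or unstacking).

    I'm new to this, so I've most likely missed something obvious. I've searched extensively, but I haven't managed to apply any of the solutions I came across.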

    df
        Location    Month       Metric       Value
    0   Texas       January     Temperature  10
    1   New York    January     Temperature  20
    2   California  January     Temperature  30
    3   Alaska      January     Temperature  40
    4   Texas       January     Color        Red
    5   New York    January     Color        Blue
    6   California  January     Color        Green
    7   Alaska      January     Color        Yellow
    8   Texas       February    Temperature  15
    9   New York    February    Temperature  25
    10  California  February    Temperature  35
    11  Alaska      February    Temperature  NaN
    12  Texas       February    Color        NaN
    13  New York    February    Color        Purple
    14  California  February    Color        Orange
    15  Alaska      February    Color        Brown
    

    I'm trying to "pivot" the Metric values into columns. The end goal is a result like this:

    Location    Month     Temperature   Color
    Texas       January   10            Red
    New York    January   20            Blue
    California  January   30            Green
    Alaska      January   40            Yellow
    Texas       February  15    
    New York    February  25            Purple
    California  February  35            Orange
    Alaska      February                Brown
    

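    I've tried pivot, pivot_table, and unstack, but I'm sure I'm missing something. Much of the difficulty seems to come from mixing strings with numbers in the Value column, and from the missing values in the data.

    This is the closest I've gotten so far, but I don't want the extra rows for the Month column, which lead to even more blank values: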

    df.set_index(['Location','Month','Metric'], append=True, inplace=True)
    df.unstack()
    
        Value
        Metric              Color   Temperature
        Location    Month       
    0   Texas       January None    10
    1   New York    January None    20
    2   California  January None    30
    3   Alaska      January None    40
    4   Texas       January Red     None
    5   New York    January Blue    None
    6   California  January Green   None
    7   Alaska      January Yellow  None
    

    Any help would be greatly appreciated. This most likely has a simple solution.

    2 answers  |  as of 6 years ago
        1
  •  5
  •   Vivek Kalyanarangan    6 years ago

    Here is a pivot-based solution for what you need. The output is semantically what you asked for -

    Metric                Color Temperature
    Location   Month                       
    Alaska     February   Brown         NaN
               January   Yellow          40
    California February  Orange          35
               January    Green          30
    New York   February  Purple          25
               January     Blue          20
    Texas      February     NaN          15
               January      Red          10
    

    Code -

    # 'first' picks the single value in each cell; a sum is not meaningful for the string values
    df_p = df.pivot_table(index=['Location', 'Month'], columns=['Metric'], values='Value', aggfunc='first')
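
    For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the question's frame and pivots it. It assumes each (Location, Month, Metric) combination appears at most once, and it uses aggfunc='first', which simply picks the one value per cell and so works for mixed strings and numbers. The names `values`, `rows`, and `wide` are illustrative only and not from the original post.

```python
import numpy as np
import pandas as pd

# Rebuild the question's long-format frame (Value mixes numbers and strings).
locations = ["Texas", "New York", "California", "Alaska"]
values = {
    ("January", "Temperature"): [10, 20, 30, 40],
    ("January", "Color"): ["Red", "Blue", "Green", "Yellow"],
    ("February", "Temperature"): [15, 25, 35, np.nan],
    ("February", "Color"): [np.nan, "Purple", "Orange", "Brown"],
}
rows = [
    {"Location": loc, "Month": month, "Metric": metric, "Value": val}
    for (month, metric), vals in values.items()
    for loc, val in zip(locations, vals)
]
df = pd.DataFrame(rows)

# Assumption: one value per (Location, Month, Metric), so "first" just picks it.
wide = (
    df.pivot_table(index=["Location", "Month"], columns="Metric",
                   values="Value", aggfunc="first")
      .reset_index()               # turn Location/Month back into columns
      .rename_axis(None, axis=1)   # drop the leftover "Metric" columns label
)
print(wide)
```

    The reset_index and rename_axis calls only flatten the result into the layout shown in the question; they can be dropped if a MultiIndex is acceptable.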
    
        2
  •  4
  •   jezrael    6 years ago

    First, to get the correct sort order, convert the Month column to an ordered categorical. Then use set_index with unstack:

    # add the remaining months to this list as needed
    cats = ['January','February']
    df['Month'] = pd.Categorical(df['Month'], categories=cats, ordered=True)
    

    Alternatively, if the months already appear in calendar order in the original column (thanks, @asongtoruin):

    df['Month'] = pd.Categorical(df['Month'], categories=df['Month'].unique(), ordered=True)
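
    To see why the categorical conversion matters, here is a small sketch on a toy Series (the `months` data and the variable names are made up for illustration). Plain strings sort alphabetically, which puts February before January, while an ordered categorical sorts by the order given in `categories`.

```python
import pandas as pd

# Toy data, not from the question.
months = pd.Series(["February", "January", "March", "January"])

# Plain strings sort alphabetically, so February comes first.
alphabetical = sorted(months.unique())
print(alphabetical)

# An ordered categorical sorts by position in `categories` instead.
cats = ["January", "February", "March"]
ordered = pd.Categorical(months, categories=cats, ordered=True)
by_calendar = pd.Series(ordered).sort_values()
print(by_calendar.tolist())
```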
    

    df = (df.set_index(['Location','Month','Metric'])['Value']
           .unstack()
           .reset_index()
           .rename_axis(None, axis=1)
           .sort_values('Month'))
    print(df)
         Location     Month   Color Temperature
    0      Alaska   January  Yellow          40
    2  California   January   Green          30
    4    New York   January    Blue          20
    6       Texas   January     Red          10
    1      Alaska  February   Brown         NaN
    3  California  February  Orange          35
    5    New York  February  Purple          25
    7       Texas  February     NaN          15
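
    The attempt in the question differs from the code above mainly in passing append=True to set_index. A minimal sketch of the effect, on a cut-down frame with made-up variable names: append=True keeps the original row labels as an extra index level, so every row stays unique and unstack cannot merge the Temperature and Color rows for the same Location and Month.

```python
import pandas as pd

# Cut-down version of the question's data, for illustration only.
df = pd.DataFrame({
    "Location": ["Texas", "Texas", "Alaska", "Alaska"],
    "Month": ["January"] * 4,
    "Metric": ["Temperature", "Color", "Temperature", "Color"],
    "Value": [10, "Red", 40, "Yellow"],
})
keys = ["Location", "Month", "Metric"]

# append=True keeps the 0..3 row labels as an index level, so unstack
# still produces one output row per input row, half of each row empty.
with_append = df.set_index(keys, append=True)["Value"].unstack()

# Without append, rows sharing (Location, Month) collapse into one row.
without_append = df.set_index(keys)["Value"].unstack()

print(with_append.shape)
print(without_append.shape)
```

    Note that unstack raises an error if a (Location, Month, Metric) combination occurs more than once; in that case pivot_table with an explicit aggfunc is the usual fallback.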