
How to generate labeled bar charts with seaborn?

  Scott Grammilo · asked 3 years ago

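    I'm fairly new to Python. I'm playing with a dummy dataset to get some practice with data manipulation in Python. Here is the code that generates the dummy data: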

    import pandas as pd
    
    d = {
        'SeniorCitizen': [0,1,0,0,0,0,0,1,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0],
        'CollegeDegree': [0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1],
        'Married': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1],
        'FulltimeJob': [1,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,0,0,1,1,0,0,0,1],
        'DistancefromBranch': [7,9,14,20,21,12,22,25,9,9,9,12,13,14,16,25,27,4,14,14,20,19,15,23,2],
        'ReversedPayment': [0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,1,0]}
    CarWash = pd.DataFrame(data=d)
    
    categoricals = ['SeniorCitizen', 'CollegeDegree', 'Married', 'FulltimeJob', 'ReversedPayment']
    numerical = ['DistancefromBranch']
    CarWash[categoricals] = CarWash[categoricals].astype('category')
    

    I'm basically struggling with two things:

    #1. A stacked bar chart with absolute values (like the Excel example below)

    #2. A stacked bar chart with percentage values (like the Excel example below)

    Here are my target visualizations for #1 and #2, which I was trying to reproduce with countplot():

    1. [image: target chart for #1]  2. [image: target chart for #2]

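    For #1, instead of a stacked bar chart, countplot() only lets me make a clustered (grouped) bar plot, as shown below, and the annotation snippet is more of a workaround than elegant Python.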

    import matplotlib.pyplot as plt
    import numpy as np
    import seaborn as sns
    
    # Loop through each categorical column and view the target variable distribution (ReversedPayment) by value
    figure, axes = plt.subplots(2, 2, figsize=(10, 10))
    
    for i, ax in zip(categoricals[:-1], axes.flatten()):
        sns.countplot(x=i, hue='ReversedPayment', data=CarWash, ax=ax)
        for p in ax.patches:
            height = np.nan_to_num(p.get_height())  # height of each patch/bar (NaN for an empty bar becomes 0)
            adjust = np.nan_to_num(p.get_width()) / 2  # half the bar width, used to position the data label
            
            label_xy = (np.nan_to_num(p.get_x()) + adjust, height + adjust)  # x, y coordinates of the data label
            ax.annotate(f'{height:.0f}', label_xy, ha='center')  # final annotation, centered over the bar
    

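For #1, one possibly shorter route is to skip countplot() and let pandas do both steps: pd.crosstab builds the count table and DataFrame.plot.bar(stacked=True) draws it stacked. Axes.bar_label then labels each segment without manual coordinate arithmetic. This is a sketch, assuming matplotlib 3.4 or newer (for bar_label and BarContainer.datavalues). The data dictionary is copied from the question so the snippet runs on its own, and the Agg backend line is only there to allow headless runs.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook or IDE
import matplotlib.pyplot as plt
import pandas as pd

CarWash = pd.DataFrame({
    'SeniorCitizen': [0,1,0,0,0,0,0,1,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0],
    'CollegeDegree': [0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1],
    'Married': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1],
    'FulltimeJob': [1,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,0,0,1,1,0,0,0,1],
    'ReversedPayment': [0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,1,0],
})

fig, axes = plt.subplots(2, 2, figsize=(10, 10))
for col, ax in zip(['SeniorCitizen', 'CollegeDegree', 'Married', 'FulltimeJob'], axes.flatten()):
    # Rows: values of col; columns: ReversedPayment levels; cells: row counts
    counts = pd.crosstab(CarWash[col], CarWash['ReversedPayment'])
    counts.plot.bar(stacked=True, ax=ax, rot=0, title=col)
    # pandas adds one BarContainer per ReversedPayment level (one stacked layer each)
    for container in ax.containers:
        # Label each segment with its count and leave empty segments unlabeled
        labels = [f'{v:.0f}' if v > 0 else '' for v in container.datavalues]
        ax.bar_label(container, labels=labels, label_type='center')
plt.tight_layout()
```

If counts is replaced with a row-normalized table and the label format with a percentage format, the same plotting loop covers the percentage variant as well.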

    For #2, I tried creating a new DataFrame containing the % values, but that feels tedious and error-prone.
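
The percentage table for #2 does not need to be assembled by hand, because pd.crosstab accepts a normalize argument. The sketch below assumes the goal is a 100% stacked bar, where each category's bar sums to 1, which is normalize='index'. By contrast, normalize='columns' would make each ReversedPayment level sum to 1 across the categories. Each resulting table can be passed directly to .plot.bar(stacked=True).

```python
import pandas as pd

CarWash = pd.DataFrame({
    'SeniorCitizen': [0,1,0,0,0,0,0,1,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0],
    'CollegeDegree': [0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1],
    'Married': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1],
    'FulltimeJob': [1,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,0,0,1,1,0,0,0,1],
    'ReversedPayment': [0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,1,0],
})

target = 'ReversedPayment'
# One table per predictor; normalize='index' divides every row by its row total,
# so the shares within each category value add up to 1.0
pct_tables = {
    col: pd.crosstab(CarWash[col], CarWash[target], normalize='index')
    for col in ['SeniorCitizen', 'CollegeDegree', 'Married', 'FulltimeJob']
}
print(pct_tables['SeniorCitizen'])
```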

    I feel that options such as stacked=True, proportion=True, axis=1, annotate=True would have been very useful for countplot() to have.

    Is there any other library that does this out of the box, with less code? Any comments or suggestions are welcome.

  Answer by Crystal L · 3 years ago

    In this case, I think plotly.express might feel more intuitive to you.

    import plotly.express as px
    
    df_temp = (CarWash.groupby(['SeniorCitizen', 'ReversedPayment'], observed=False)['DistancefromBranch']
               .count()
               .reset_index()
               .rename({'DistancefromBranch': 'count'}, axis=1))
    
    fig = px.bar(df_temp, x="SeniorCitizen", y="count", color="ReversedPayment", title="SeniorCitizen", text='count')
    fig.update_traces(textposition='inside')
    fig.show()
    


    Basically, if you want more flexibility in adjusting the chart, it's hard to avoid writing a lot of code.

    I also tried using matplotlib and pandas to create a stacked bar chart of percentages. If you're interested, you can give it a try.

    import matplotlib.pyplot as plt
    import seaborn as sns
    
    sns.set()
    fig, ax = plt.subplots(nrows=2, ncols=2, figsize=[12, 8], dpi=100)
    
    # Convert the axes matrix to a 1-d array
    axes = ax.flatten()
    
    for i, col in enumerate(['SeniorCitizen', 'CollegeDegree', 'Married', 'FulltimeJob']):
        
        # Count ReversedPayment values per category (rows: values of col, columns: No/Yes)
        df_temp = (CarWash.groupby(col, observed=False)['ReversedPayment']
                   .value_counts()
                   .unstack(1).fillna(0)
                   .rename({0: 'No', 1: 'Yes'})
                   .rename({0: 'No', 1: 'Yes'}, axis=1))
        # Divide each column by its total, so each ReversedPayment level sums to 100% across categories
        df_temp = df_temp / df_temp.sum(axis=0)
        df_temp.plot.bar(stacked=True, ax=axes[i])
    
        axes[i].set_title(col, y=1.03, fontsize=16)
        
        # pandas draws the bars one column at a time, so read the values column by column (order='F')
        rects = axes[i].patches
        labels = df_temp.values.flatten(order='F')
    
        for rect, label in zip(rects, labels):
            if label == 0:
                continue
            axes[i].text(rect.get_x() + rect.get_width() / 2, rect.get_y() + rect.get_height() / 3, '{:.2%}'.format(label),
                         ha='center', va='bottom', color='white', fontsize=12)
    
        axes[i].legend(title='Reversed\nPayment', bbox_to_anchor=(1.05, 1), loc='upper left', title_fontsize=10, fontsize=10)
        axes[i].tick_params(rotation=0)
    
    plt.tight_layout()
    plt.show()
    
