How do I generate labeled bar charts with seaborn?

data-visualization seaborn matplotlib pandas python

Scott Grammilo · Tech Community · 3 years ago

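I'm fairly new to Python. I'm playing with a dummy dataset to get some practice with data manipulation in Python. Here is the code that generates the dummy data: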

import pandas as pd

d = {
    'SeniorCitizen': [0,1,0,0,0,0,0,1,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0],
    'CollegeDegree': [0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,1,1,1,1],
    'Married': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1],
    'FulltimeJob': [1,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,0,0,1,1,0,0,0,1],
    'DistancefromBranch': [7,9,14,20,21,12,22,25,9,9,9,12,13,14,16,25,27,4,14,14,20,19,15,23,2],
    'ReversedPayment': [0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,1,0]}
CarWash = pd.DataFrame(data=d)

# Treat the 0/1 flags as categorical; DistancefromBranch stays numeric
categoricals = ['SeniorCitizen','CollegeDegree','Married','FulltimeJob','ReversedPayment']
numerical = ['DistancefromBranch']
CarWash[categoricals] = CarWash[categoricals].astype('category')

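I'm basically struggling with a couple of things:

#1. A stacked bar chart with absolute values (like the Excel example below)

#2. A stacked bar chart with percentage values (like the Excel example below)

Below are the target visualizations for #1 and #2 that I would like to produce with countplot().

[Images: target charts 1 and 2]

For #1, instead of a stacked bar chart, the best I could get out of countplot() is the clustered bar plot produced by the code below, and the annotation snippet feels more like a workaround than elegant Python.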

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Loop over each categorical column and show the distribution of the target variable (ReversedPayment) by value
figure, axes = plt.subplots(2, 2, figsize=(10, 10))

for i, ax in zip(categoricals[:-1], axes.flatten()):
    sns.countplot(x=i, hue='ReversedPayment', data=CarWash, ax=ax)
    for p in ax.patches:
        height = np.nan_to_num(p.get_height())     # height of each patch/bar (NaN becomes 0)
        adjust = np.nan_to_num(p.get_width()) / 2  # offset used to position the data label

        # x, y coordinates where the data label goes
        label_xy = (np.nan_to_num(p.get_x()) + adjust, height + adjust)
        ax.annotate(f'{height:.0f}', label_xy)     # final annotation, shown as a whole number
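
For reference, a minimal sketch of target #1 (stacked bars with absolute counts) that avoids the manual coordinate arithmetic. It is not from the original post: it assumes matplotlib 3.4 or newer, where `Axes.bar_label` is available, and it rebuilds only two columns of the dummy data under the illustrative name `df` so that it runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Two columns of the dummy data, repeated so the block is self-contained
df = pd.DataFrame({
    "SeniorCitizen":   [0,1,0,0,0,0,0,1,0,1,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0],
    "ReversedPayment": [0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,1,0],
})

# Rows: SeniorCitizen values, columns: ReversedPayment values, cells: row counts
counts = pd.crosstab(df["SeniorCitizen"], df["ReversedPayment"])

fig, ax = plt.subplots()
counts.plot.bar(stacked=True, ax=ax, rot=0)

# One container per ReversedPayment value; label each segment with its count
for container in ax.containers:
    labels = [f"{v:.0f}" if v > 0 else "" for v in container.datavalues]
    ax.bar_label(container, labels=labels, label_type="center")

ax.set_ylabel("count")
fig.savefig("stacked_counts.png")  # hypothetical output file name
```

Empty segments are given an empty label so that a "0" is not drawn on the axis line.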

For #2, I tried creating a new data frame containing the % values, but that feels tedious and error-prone.
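
As a possible shortcut for the percentage table, `pd.crosstab` accepts `normalize="index"`, which divides every row by its own total. The sketch below is an illustration rather than the asker's code; `df` and `shares` are names introduced here, and only two columns of the dummy data are repeated.

```python
import pandas as pd

# Two columns of the dummy data, repeated so the block is self-contained
df = pd.DataFrame({
    "Married":         [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1],
    "ReversedPayment": [0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,1,0],
})

# normalize="index" turns counts into row-wise shares, so each row sums to 1
shares = pd.crosstab(df["Married"], df["ReversedPayment"], normalize="index")

# Show the shares as percentages with one decimal place
print((shares * 100).round(1))
```

The resulting frame can be passed straight to `shares.plot.bar(stacked=True)`, because every bar then sums to 100%.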

I feel that options such as stacked = True, proportion = True, axis = 1, annotate = True would have been very useful for countplot() to have.

Is there another library that can be used directly and is less code-intensive? Any comments or suggestions are welcome.

Crystal L · 3 years ago

In this case, I think plotly.express may be more intuitive for you.

import plotly.express as px

df_temp = CarWash.groupby(['SeniorCitizen', 'ReversedPayment'])['DistancefromBranch'].count().reset_index().rename({'DistancefromBranch':'count'}, axis=1)

fig = px.bar(df_temp, x="SeniorCitizen", y="count", color="ReversedPayment", title="SeniorCitizen", text='count')
fig.update_traces(textposition='inside')
fig.show()

Basically, if you want more flexibility in tuning the chart, it is hard to avoid writing a lot of code.

I also tried using matplotlib and pandas to create the stacked bar chart with percentages. If you are interested, you can give it a try.

import matplotlib.pyplot as plt
import seaborn as sns

sns.set()
fig, ax = plt.subplots(nrows=2, ncols=2, figsize=[12,8], dpi=100)

# Convert the axes matrix to a 1-d array
axes = ax.flatten()


for i, col in enumerate(['SeniorCitizen', 'CollegeDegree', 'Married', 'FulltimeJob']):

    # Count the rows for each combination of category value and ReversedPayment value
    df_temp = (CarWash.groupby(col)['ReversedPayment']
               .value_counts()
               .unstack(1).fillna(0)
               .rename({0:'No', 1:'Yes'})
               .rename({0:'No', 1:'Yes'}, axis=1))
    # Divide each row by its own total so that every stacked bar sums to 100%
    df_temp = df_temp.div(df_temp.sum(axis=1), axis=0)
    df_temp.plot.bar(stacked=True, ax=axes[i])

    axes[i].set_title(col, y=1.03, fontsize=16)

    rects = axes[i].patches
    # Patches are drawn column by column (all 'No' segments, then all 'Yes'),
    # so flatten the values in column-major order to keep labels aligned
    labels = df_temp.to_numpy().flatten(order='F')

    for rect, label in zip(rects, labels):
        if label == 0: continue  # skip empty segments
        axes[i].text(rect.get_x() + rect.get_width() / 2, rect.get_y() + rect.get_height() / 3, '{:.2%}'.format(label),
                ha='center', va='bottom', color='white', fontsize=12)

    axes[i].legend(title='Reversed\nPayment', bbox_to_anchor=(1.05, 1), loc='upper left', title_fontsize = 10, fontsize=10)
    axes[i].tick_params(labelrotation=0)

plt.tight_layout()
plt.show()