
Draw a Sankey diagram from a data frame

  •  1
  • user517696  ·  6 years ago

    I have a data frame:

    Vendor Name                 Category                    Count
    AKJ Education               Books                       846888
    AKJ Education               Computers & Tablets         1045
    Amazon                      Books                       1294423
    Amazon                      Computers & Tablets         42165
    Amazon                      Other                       415
    Flipkart                    Books                       1023
    

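    I am trying to draw a Sankey diagram from the data frame above, with Vendor Name as the source, Category as the target, and Count as the flow (link width). I tried to do this with Plotly, but without success. Can anyone help me build a Sankey diagram with Plotly?

    Thanks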

    2 answers  |  latest 5 years ago
        1
  •  2
  •   vestland    5 years ago

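    The answers to the post How to define the structure of a sankey diagram using a dataframe? show that forcing your Sankey data source into a single data frame can quickly lead to confusion. It is better to keep the nodes separate from the links, because the two are constructed differently.

    So your nodes data frame should look like this: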

    ID               Label    Color
    0        AKJ Education  #4994CE
    1               Amazon  #8A5988
    2             Flipkart  #449E9E
    3                Books  #7FC241
    4  Computers & tablets  #D3D3D3
    5                Other  #4994CE
    

    Your links data frame should look like this:

    Source  Target      Value      Link Color
    0       3          846888      rgba(127, 194, 65, 0.2)
    0       4            1045      rgba(127, 194, 65, 0.2)
    1       3         1294423      rgba(211, 211, 211, 0.5)
    1       4           42165      rgba(211, 211, 211, 0.5)
    1       5             415      rgba(211, 211, 211, 0.5)
    2       5               1      rgba(253, 227, 212, 1)
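
Tables of this shape do not have to be typed by hand. As a minimal sketch (not part of the original answer), the IDs can be derived from the question's data frame by numbering every distinct label and mapping the vendor and category columns to those numbers. The names `df`, `labels`, and `node_id` are assumptions made for illustration, and the color columns are left out for brevity.

```python
import pandas as pd

# The question's data in long format: one row per vendor/category pair.
df = pd.DataFrame({
    "Vendor Name": ["AKJ Education", "AKJ Education", "Amazon",
                    "Amazon", "Amazon", "Flipkart"],
    "Category": ["Books", "Computers & Tablets", "Books",
                 "Computers & Tablets", "Other", "Books"],
    "Count": [846888, 1045, 1294423, 42165, 415, 1023],
})

# Nodes: every distinct vendor first, then every distinct category,
# in order of first appearance. The list position becomes the node ID.
labels = list(df["Vendor Name"].unique()) + list(df["Category"].unique())
df_nodes = pd.DataFrame({"ID": range(len(labels)), "Label": labels})

# Links: replace each label with its node ID and keep the count as the value.
node_id = {label: i for i, label in enumerate(labels)}
df_links = pd.DataFrame({
    "Source": df["Vendor Name"].map(node_id),
    "Target": df["Category"].map(node_id),
    "Value": df["Count"],
})

print(df_nodes)
print(df_links)
```

This sketch assumes that no name occurs both as a vendor and as a category; if one did, the dictionary lookup would send both uses to the same node ID.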
    

    Now, if you use a setup similar to the Scottish referendum diagram on plot.ly, you will be able to build:

    [Image: Sankey diagram built from the nodes and links above]

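    Because of the huge differences between the numbers, this particular diagram looks a bit odd. To illustrate the structure, I have replaced all of your numbers with 1: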

    [Image: the same Sankey diagram with every link value set to 1]
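
To put a number on how lopsided the data is, each link's share of the total flow can be computed directly. This small sketch is an illustration added here, not part of the original answer. It uses the counts from the question, and the `counts` dictionary is a hypothetical stand-in for the data frame.

```python
# Counts from the question, keyed by (vendor, category).
counts = {
    ("AKJ Education", "Books"): 846888,
    ("AKJ Education", "Computers & Tablets"): 1045,
    ("Amazon", "Books"): 1294423,
    ("Amazon", "Computers & Tablets"): 42165,
    ("Amazon", "Other"): 415,
    ("Flipkart", "Books"): 1023,
}

total = sum(counts.values())

# Share of the total flow carried by each link, printed largest first.
shares = {pair: value / total for pair, value in counts.items()}
for (vendor, category), share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{vendor} -> {category}: {share:.4%}")
```

The two Books links from AKJ Education and Amazon carry roughly 98% of the total, which is why the remaining links are barely visible when the real counts are plotted.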

    Here is the complete code, ready to copy and paste into a Jupyter notebook:

    # imports (numpy and plotly.graph_objs were imported but never used, so they are dropped)
    import pandas as pd
    from plotly.offline import init_notebook_mode, iplot

    # Load plotly.js so that figures render inside the notebook
    init_notebook_mode(connected=True)
    
    # Nodes & links
    nodes = [['ID', 'Label', 'Color'],
            [0,'AKJ Education','#4994CE'],
            [1,'Amazon','#8A5988'],
            [2,'Flipkart','#449E9E'],
            [3,'Books','#7FC241'],
            [4,'Computers & tablets','#D3D3D3'],
            [5,'Other','#4994CE'],]
    
    # links with every value set to 1, for illustrative purposes
    links = [['Source','Target','Value','Link Color'],
    
            # AKJ
            [0,3,1,'rgba(127, 194, 65, 0.2)'],
            [0,4,1,'rgba(127, 194, 65, 0.2)'],
    
            # Amazon
            [1,3,1,'rgba(211, 211, 211, 0.5)'],
            [1,4,1,'rgba(211, 211, 211, 0.5)'],
            [1,5,1,'rgba(211, 211, 211, 0.5)'],
    
            # Flipkart
            [2,5,1,'rgba(253, 227, 212, 1)'],
            [2,3,1,'rgba(253, 227, 212, 1)'],]
    
    # links with the real values (uncomment to use them instead) ####
    #links = [
    #    ['Source','Target','Value','Link Color'],
    #    
    #    # AKJ
    #    [0,3,846888,'rgba(127, 194, 65, 0.2)'],
    #    [0,4,1045,'rgba(127, 194, 65, 0.2)'],
    #    
    #    # Amazon
    #    [1,3,1294423,'rgba(211, 211, 211, 0.5)'],
    #    [1,4,42165,'rgba(211, 211, 211, 0.5)'],
    #    [1,5,415,'rgba(211, 211, 211, 0.5)'],
    #    
    #    # Flipkart
    #    [2,5,1,'rgba(253, 227, 212, 1)'],]
    #################################################################
    
    
    # Retrieve headers and build dataframes
    nodes_headers = nodes.pop(0)
    links_headers = links.pop(0)
    df_nodes = pd.DataFrame(nodes, columns = nodes_headers)
    df_links = pd.DataFrame(links, columns = links_headers)
    
    # Sankey plot setup
    data_trace = dict(
        type='sankey',
        domain=dict(
            x=[0, 1],
            y=[0, 1]
        ),
        orientation="h",
        valueformat=".0f",
        node=dict(
            pad=10,
            # thickness=30,
            line=dict(
                color="black",
                width=0
            ),
            label=df_nodes['Label'].dropna(axis=0, how='any'),
            color=df_nodes['Color']
        ),
        link=dict(
            source=df_links['Source'].dropna(axis=0, how='any'),
            target=df_links['Target'].dropna(axis=0, how='any'),
            value=df_links['Value'].dropna(axis=0, how='any'),
            color=df_links['Link Color'].dropna(axis=0, how='any'),
        )
    )
    
    layout = dict(
        title="Draw Sankey Diagram from dataframes",
        height=772,
        font=dict(size=10),
    )
    
    fig = dict(data=[data_trace], layout=layout)
    iplot(fig, validate=False)
    
        2
  •  0
  •   Trupti    6 years ago

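    I have used the ggalluvial and alluvial libraries to draw Sankey diagrams. In particular, ggalluvial is the best one to use, since it offers a variety of options and has plenty of documentation.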