
pandas: how to group by brackets and unique column values?

pandas python

S Meaden · 7 years ago

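I came across an interesting bar chart. I found the underlying data and am trying to recreate how the data is grouped into range bins (I used pd.cut) per country.

This is the code I have tried so far. It raises errors; the erroneous lines are commented out.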

import pandas as pd

## csv file in zip http://ec.europa.eu/eurostat/cache/GISCO/geodatafiles/GEOSTAT-grid-POP-1K-2011-V2-0-1.zip

url="C:/Users/Simon/Downloads/GEOSTAT-grid-POP-1K-2011-V2-0-1/Version 2_0_1/GEOSTAT_grid_POP_1K_2011_V2_0_1.csv"
whole=pd.read_csv(url, low_memory=False)

populationDensity=whole[['TOT_P','CNTR_CODE']]


## trying to replicate graph here http://www.centreforcities.org/wp-content/uploads/2018/04/18-04-16-Square-kilometre-units-of-land-by-population.png
## which aggregates the records by brackets


# https://stackoverflow.com/questions/25010215/pandas-groupby-how-to-compute-counts-in-ranges#answer-25010952
ranges = [0,10000,15000,20000,25000,30000,35000,40000,45000,1000000]
bins=pd.cut(populationDensity['TOT_P'],ranges)



#print(bins)

## the following fails with error :
## AttributeError: Cannot access callable attribute 'groupby' of 'DataFrameGroupBy' objects, try using the 'apply' method
#print (populationDensity.groupby(['CNTR_CODE']).groupby(bins).count())

## the following fails with error :
## TypeError: 'Series' objects are mutable, thus they cannot be hashed
print (populationDensity.groupby(['CNTR_CODE'],pd.cut(populationDensity['TOT_P'],ranges)).count())

#relevant https://stackoverflow.com/questions/21441259/pandas-groupby-range-of-values#answer-21441621

I have only just started using pandas. I will try again tomorrow, in case anyone knows…

1 answer | 7 years ago

jezrael · 7 years ago

Change:

print (populationDensity.groupby(['CNTR_CODE'],pd.cut(populationDensity['TOT_P'],ranges)).count())

to

print (populationDensity.groupby(['CNTR_CODE', pd.cut(populationDensity['TOT_P'],ranges)]).count())
                                 ^                                                      ^

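because the by parameter of groupby expects multiple column names, a combination of column names and Series, or multiple Series to be passed in a single list:

by : mapping, function, label, or list of labels

Used to determine the groups for the groupby. If by is a function, it is called on each value of the object's index. If a dict or Series is passed, the Series or dict values will be used to determine the groups (the Series' values are first aligned; see the .align() method). If an ndarray is passed, the values are used as-is to determine the groups. A label or list of labels may be passed to group by the columns in self. Notice that a tuple is interpreted as a (single) key.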
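To make the point about passing a list concrete, here is a minimal sketch on a small made-up DataFrame. The rows below are hypothetical stand-ins for the GEOSTAT grid, not real data; only the column names and the ranges list come from the question. It assumes a pandas version that supports the observed argument of groupby.

```python
import pandas as pd

# Hypothetical sample rows standing in for the GEOSTAT CSV (one row per 1 km square).
df = pd.DataFrame({
    "CNTR_CODE": ["UK", "UK", "UK", "ES", "ES", "FR"],
    "TOT_P": [5000, 12000, 12500, 8000, 41000, 52000],
})

# Same bin edges as in the question.
ranges = [0, 10000, 15000, 20000, 25000, 30000, 35000, 40000, 45000, 1000000]
bins = pd.cut(df["TOT_P"], ranges)

# A column label and a Series go together in ONE list passed as `by`.
# observed=True keeps only the (country, bin) pairs that actually occur.
counts = df.groupby(["CNTR_CODE", bins], observed=True)["TOT_P"].count()

print(counts)
```

The result is a Series indexed by country and population bracket. Calling counts.unstack(level=0) should then give one column per country, which is the shape a grouped bar chart usually needs.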
