
Select only integers from a column with mixed data types in pandas

dataframe pandas python-3.x python

Code_Sipra · Tech Community · 6 years ago

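I have a DataFrame df as shown below. Column col2 contains empty strings, null values, integers, and even float values. I want to derive a new DataFrame new_df from df in which col2 contains only integer values.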

import pandas as pd
import numpy as np

col1 = ["a", "b", "c", "d", "e", "f", "g", "h"]
col2 = ["25.45", "", "200", np.nan, "N/A", "null", "35", "5,300"]

df = pd.DataFrame({"col1": col1, "col2": col2})

This is what df looks like:

  col1   col2
0    a  25.45
1    b       
2    c    200
3    d    NaN
4    e    N/A
5    f   null
6    g     35
7    h  5,300

Here is my expected output, new_df, in which col2 contains only integer values:

  col1   col2  
2    c    200
6    g     35

I have tried pd.to_numeric() and even isdigit(), but they expect a Series as input. Is there a simple way to get the desired output?

1 answer | as of 6 years ago

cs95 abhishek58g · 6 years ago

`str.isdigit`

Filter for digit-only strings and select them with boolean indexing:

df2 = df[df.col2.astype(str).str.isdigit()]    
print(df2)
  col1 col2
2    c  200
6    g   35

P.S. To convert "col2" to integers, use

df2['col2'] = df2['col2'].astype(int)
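
Putting the filter and the conversion together, here is a minimal end-to-end sketch. The `.copy()` call and the name `mask` are additions for illustration, not part of the answer above; copying the filtered frame is one way to avoid pandas warning about assigning into a slice of df.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "b", "c", "d", "e", "f", "g", "h"],
    "col2": ["25.45", "", "200", np.nan, "N/A", "null", "35", "5,300"],
})

# Cast to str first so the .str accessor can be applied to every row,
# including the NaN one; only all-digit strings come back True.
mask = df["col2"].astype(str).str.isdigit()

# .copy() (an addition here) makes new_df independent of df before we assign to it.
new_df = df[mask].copy()
new_df["col2"] = new_df["col2"].astype(int)

print(new_df)
```

Note that this keeps only unsigned digit strings: a value such as "-5" would be dropped, because the minus sign is not a digit.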

`str.contains`

You can also use str.contains, although it is slower because it uses regex.

df[df.col2.astype(str).str.contains(r'^\d+$')]

  col1 col2
2    c  200
6    g   35
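
The `^` and `$` anchors in the pattern matter. The sketch below compares the anchored pattern with an unanchored one on the same data; the names `loose` and `strict` are just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "b", "c", "d", "e", "f", "g", "h"],
    "col2": ["25.45", "", "200", np.nan, "N/A", "null", "35", "5,300"],
})

s = df.col2.astype(str)

# Unanchored: matches any string that contains at least one digit.
loose = s.str.contains(r"\d+")
# Anchored: matches only strings made up entirely of digits.
strict = s.str.contains(r"^\d+$")

print(df[loose].col1.tolist())   # ['a', 'c', 'g', 'h']
print(df[strict].col1.tolist())  # ['c', 'g']
```

Without the anchors, "25.45" and "5,300" slip through because they contain digits somewhere in the string.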

`pd.to_numeric`

The third solution is a bit hacky, but it uses pd.to_numeric. We need a pre-replacement step to filter out the floats.

v = df.col2.astype(str).str.replace('.', '|', regex=False)
df[pd.to_numeric(v, errors='coerce').notna()]

  col1 col2
2    c  200
6    g   35
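
To see why the replacement step is needed, this sketch runs pd.to_numeric with and without it on the same data. The variable names `without_replace` and `with_replace` are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "b", "c", "d", "e", "f", "g", "h"],
    "col2": ["25.45", "", "200", np.nan, "N/A", "null", "35", "5,300"],
})

s = df.col2.astype(str)

# Without the replacement, "25.45" parses as a valid number and is kept.
without_replace = pd.to_numeric(s, errors="coerce").notna()

# Swapping "." for "|" makes float-like strings unparseable, so they become NaN.
v = s.str.replace(".", "|", regex=False)
with_replace = pd.to_numeric(v, errors="coerce").notna()

print(df[without_replace].col1.tolist())  # ['a', 'c', 'g']
print(df[with_replace].col1.tolist())     # ['c', 'g']
```

Keep in mind that pd.to_numeric is more permissive than isdigit: it also parses signed values such as "-5", so this variant is not an exact drop-in for the first two if the column can contain them.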
