
How do I compare two CSV files with DataFrames and retrieve the cells that differ? Why do I get so many decimals in float cells? [duplicate]

floating-accuracy dataframe pandas python-3.x python

ChesuCR Ayansplatt · Tech Community · 6 years ago

I have a CSV file like this one (example.csv).

STRING_COL,INT_1,INT_2,FLOAT,INT_3
Hello,9,65151651,3234.54848,7832
This is a string,2,5484651,34.234,-999
Another,2,62189548,51.51658,-999
Test,2,2131514,5.2156,-999
Ham,9,6546548,2.15,-999
String,9,3216546,2.15468,-999

So I wrote code similar to this to compare the values cell by cell:

import pandas as pd

df = pd.read_csv(
    'example.csv', delimiter=',', comment='#', skip_blank_lines=True,
    verbose=False, engine='python', dtype=str
)
df = df.apply(lambda x: pd.to_numeric(x, errors='ignore', downcast='integer'))

df_2 = pd.read_csv(
    'example_2.csv', delimiter=',', comment='#', skip_blank_lines=True,  # file with small changes
    verbose=False, engine='python', dtype=str
)
df_2 = df_2.apply(lambda x: pd.to_numeric(x, errors='ignore', downcast='integer'))

for i in list(df.index):
    for column in list(df.columns):
        old = df.loc[i, column]
        new = df_2.loc[i, column]
        if old != new:
            print('DIFFERENT VALUE >> INDEX: {} | OLD: {} | NEW: {}'.format(i, old, new))

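If you run this example with this small CSV file, I am fairly sure it will work fine. With a huge CSV file, though, something strange happens. I don't understand why many values sometimes end up like this: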

1.6440000000000001  >> original value 1.644
7.7189999999999985  >> original value 7.7189

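If I compare them, they are reported as different, which is wrong because the values are the same. What is happening? Is there a way to fix this? Is there a better way to compare values with DataFrames?

Note: I may be doing something wrong in other parts of my original code, but I think I have included the most important and relevant parts.

Note 2: I am aware that the != operator does not work well with NaN values, so I use np.isnan to check for that kind of change.

Update: I don't need to compare and say "yes, it is equal" or "no, it is not equal". I need to retrieve the value of every cell that has changed.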

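1 answer | as of 6 years ago

ChesuCR Ayansplatt · 6 years ago

I ended up using np.isclose(). I have read the duplicate question I found, along with some other questions about the epsilon value: numpy.finfo(), epsilon

Epsilon: numbers that differ by less than machine epsilon are numerically the same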

    abs(a - b) < epsilon
    absolute(a - b) <= (atol + rtol * absolute(b))      # np.isclose() method
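To make the two tests above concrete, here is a minimal sketch that applies the np.isclose() formula by hand to the value pair from the question. The helper name is_close is hypothetical, and its default tolerances simply mirror the np.isclose() defaults (rtol=1e-05, atol=1e-08).

```python
import numpy as np


def is_close(a, b, rtol=1e-05, atol=1e-08):
    """Scalar version of the np.isclose() test (hypothetical helper)."""
    return abs(a - b) <= (atol + rtol * abs(b))


a = 1.6440000000000001  # value seen when loading the large file
b = 1.644               # value written in the original CSV

exact = a == b                    # plain equality, as in the question's loop
manual = is_close(a, b)           # hand-written tolerance test
library = bool(np.isclose(a, b))  # NumPy's own implementation
print(exact, manual, library)
```

The two literals are neighbouring float64 values, so plain equality reports a difference while both tolerance tests treat them as the same number.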

Machine epsilon depends on the float type: float16, float32 or float64.
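As a quick illustration of how much the epsilon changes with the float type, the following sketch prints the value reported by numpy.finfo() for each of the three types. The variable names are only for illustration.

```python
import numpy as np

# Machine epsilon: the gap between 1.0 and the next representable float.
for dtype in (np.float16, np.float32, np.float64):
    info = np.finfo(dtype)
    print(f"{dtype.__name__}: eps = {info.eps}")

eps16 = float(np.finfo(np.float16).eps)
eps32 = float(np.finfo(np.float32).eps)
eps64 = float(np.finfo(np.float64).eps)
```

A tolerance built from the float64 epsilon would be far too strict for a column stored as float32, so the epsilon should match the dtype of the data being compared.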

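import numpy as np

eps64 = np.finfo(np.float64).eps  # machine epsilon for float64
# np.isclose() only works on numeric data, so skip the string columns
for col in df.select_dtypes(include='number').columns:
    close = np.isclose(
        df[col],
        df_2[col],
        equal_nan=False,
        atol=0.0,
        rtol=eps64
    )
    # index labels of the rows where the two files differ in this column
    print(col, df.index[~close].tolist())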
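Since the question asks for the changed cells rather than a yes/no result, the boolean output of np.isclose() can be turned into a list of differences. The sketch below is one possible way to do that, not a definitive implementation. The function name changed_cells and the two tiny frames are made up for illustration, both frames are assumed to share the same index and columns, and NaN in both files is treated as "unchanged" (equal_nan=True), which differs from the loop above.

```python
import numpy as np
import pandas as pd


def changed_cells(old, new, rtol=np.finfo(np.float64).eps):
    """List the cells that differ between two identically labelled frames."""
    records = []
    for col in old.columns:
        a, b = old[col], new[col]
        if pd.api.types.is_numeric_dtype(a) and pd.api.types.is_numeric_dtype(b):
            # Numeric column: tolerate float noise; NaN in both counts as unchanged.
            same = np.isclose(a.to_numpy(dtype=float), b.to_numpy(dtype=float),
                              rtol=rtol, atol=0.0, equal_nan=True)
        else:
            # Other columns: exact match, with missing values in both as unchanged.
            same = ((a == b) | (a.isna() & b.isna())).to_numpy()
        for idx in old.index[~same]:
            records.append((idx, col, a.loc[idx], b.loc[idx]))
    return pd.DataFrame(records, columns=["index", "column", "old", "new"])


# Made-up frames standing in for example.csv and example_2.csv.
df = pd.DataFrame({"STRING_COL": ["Hello", "Ham"], "FLOAT": [1.644, 2.15]})
df_2 = pd.DataFrame({"STRING_COL": ["Hello", "Spam"],
                     "FLOAT": [1.6440000000000001, 2.5]})

diff = changed_cells(df, df_2)
print(diff)
```

Here the 1.644 / 1.6440000000000001 pair is not reported, while the changed string and the 2.15 to 2.5 change are both listed with their row label and column name.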

As for values such as 1.6440000000000001: what I do now is convert the values to float >> float(1.6440000000000001)
