
Querying an HDF5 datetime column

  •  1
  • João Abrantes  ·  6 years ago

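    I have an HDF5 file containing a table in which the column time is in datetime64[ns] format.

    I want to get all the rows that are older than thresh. How can I do that? This is what I have tried: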

    thresh = pd.datetime.strptime('2018-03-08 14:19:41','%Y-%m-%d %H:%M:%S').timestamp()
    hdf = pd.read_hdf(STORE, 'gh1', where = 'time>thresh' )
    

    I get the following error:

    Traceback (most recent call last):
    
      File "<ipython-input-80-fa444735d0a9>", line 1, in <module>
        runfile('/home/joao/github/control_panel/controlpanel/controlpanel/reading_test.py', wdir='/home/joao/github/control_panel/controlpanel/controlpanel')
    
      File "/home/joao/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 705, in runfile
        execfile(filename, namespace)
    
      File "/home/joao/anaconda3/lib/python3.6/site-packages/spyder/utils/site/sitecustomize.py", line 102, in execfile
        exec(compile(f.read(), filename, 'exec'), namespace)
    
      File "/home/joao/github/control_panel/controlpanel/controlpanel/reading_test.py", line 15, in <module>
        hdf = pd.read_hdf(STORE, 'gh1', where = 'time>thresh' )
    
      File "/home/joao/anaconda3/lib/python3.6/site-packages/pandas/io/pytables.py", line 370, in read_hdf
        return store.select(key, auto_close=auto_close, **kwargs)
    
      File "/home/joao/anaconda3/lib/python3.6/site-packages/pandas/io/pytables.py", line 717, in select
        return it.get_result()
    
      File "/home/joao/anaconda3/lib/python3.6/site-packages/pandas/io/pytables.py", line 1457, in get_result
        results = self.func(self.start, self.stop, where)
    
      File "/home/joao/anaconda3/lib/python3.6/site-packages/pandas/io/pytables.py", line 710, in func
        columns=columns, **kwargs)
    
      File "/home/joao/anaconda3/lib/python3.6/site-packages/pandas/io/pytables.py", line 4141, in read
        if not self.read_axes(where=where, **kwargs):
    
      File "/home/joao/anaconda3/lib/python3.6/site-packages/pandas/io/pytables.py", line 3340, in read_axes
        self.selection = Selection(self, where=where, **kwargs)
    
      File "/home/joao/anaconda3/lib/python3.6/site-packages/pandas/io/pytables.py", line 4706, in __init__
        self.condition, self.filter = self.terms.evaluate()
    
      File "/home/joao/anaconda3/lib/python3.6/site-packages/pandas/core/computation/pytables.py", line 556, in evaluate
        self.condition = self.terms.prune(ConditionBinOp)
    
      File "/home/joao/anaconda3/lib/python3.6/site-packages/pandas/core/computation/pytables.py", line 118, in prune
        res = pr(left.value, right.value)
    
      File "/home/joao/anaconda3/lib/python3.6/site-packages/pandas/core/computation/pytables.py", line 113, in pr
        encoding=self.encoding).evaluate()
    
      File "/home/joao/anaconda3/lib/python3.6/site-packages/pandas/core/computation/pytables.py", line 327, in evaluate
        values = [self.convert_value(v) for v in rhs]
    
      File "/home/joao/anaconda3/lib/python3.6/site-packages/pandas/core/computation/pytables.py", line 327, in <listcomp>
        values = [self.convert_value(v) for v in rhs]
    
      File "/home/joao/anaconda3/lib/python3.6/site-packages/pandas/core/computation/pytables.py", line 185, in convert_value
        v = pd.Timestamp(v)
    
      File "pandas/_libs/tslib.pyx", line 390, in pandas._libs.tslib.Timestamp.__new__
    
      File "pandas/_libs/tslib.pyx", line 1549, in pandas._libs.tslib.convert_to_tsobject
    
      File "pandas/_libs/tslib.pyx", line 1735, in pandas._libs.tslib.convert_str_to_tsobject
    
    ValueError: could not convert string to Timestamp
    
    1 answer  |  6 years ago
  •  1
  •   MaxU - stand with Ukraine  ·  6 years ago

    Demo:

    Create a sample DF (100,000 rows):

    In [9]: N = 10**5
    
    In [10]: dates = pd.date_range('1980-01-01', freq='99T', periods=N)
    
    In [11]: df = pd.DataFrame({'date':dates, 'val':np.random.rand(N)})
    
    In [12]: df
    Out[12]:
                         date       val
    0     1980-01-01 00:00:00  0.985215
    1     1980-01-01 01:39:00  0.452295
    2     1980-01-01 03:18:00  0.780096
    3     1980-01-01 04:57:00  0.004596
    4     1980-01-01 06:36:00  0.515051
    ...                   ...       ...
    99995 1998-10-27 15:45:00  0.509954
    99996 1998-10-27 17:24:00  0.046636
    99997 1998-10-27 19:03:00  0.026678
    99998 1998-10-27 20:42:00  0.660652
    99999 1998-10-27 22:21:00  0.839426
    
    [100000 rows x 2 columns]
    

    Write it to an HDF5 file (indexing the date column):

    In [13]: df.to_hdf('d:/temp/test.h5', 'test', format='t', data_columns=['date'])
    

    Read the HDF5 file conditionally, using the indexed column:

    In [14]: x = pd.read_hdf('d:/temp/test.h5', 'test', where="date > '1998-10-27 15:00:00'")
    
    In [15]: x
    Out[15]:
                         date       val
    99995 1998-10-27 15:45:00  0.509954
    99996 1998-10-27 17:24:00  0.046636
    99997 1998-10-27 19:03:00  0.026678
    99998 1998-10-27 20:42:00  0.660652
    99999 1998-10-27 22:21:00  0.839426
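
    The same kind of query can also take its threshold from a variable, which is closer to what the question attempts. The sketch below is only an illustration under some assumptions: it uses a smaller frame (N = 1000), a temporary file instead of d:/temp/test.h5, an arbitrary threshold date, and it needs PyTables installed. The main difference from the question's code is that thresh is a pd.Timestamp rather than the float returned by .timestamp().

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Same shape of data as in the demo, just smaller (N is an arbitrary choice).
N = 1000
dates = pd.date_range('1980-01-01', freq='99min', periods=N)
df = pd.DataFrame({'date': dates, 'val': np.random.rand(N)})

# The threshold is a pd.Timestamp, not a float of epoch seconds.
thresh = pd.Timestamp('1980-03-01 00:00:00')

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test.h5')  # hypothetical temporary location

    # format='table' plus data_columns makes 'date' queryable on disk.
    df.to_hdf(path, key='test', format='table', data_columns=['date'])

    # The where string refers to the local variable thresh by name.
    older = pd.read_hdf(path, key='test', where='date < thresh')
    newer = pd.read_hdf(path, key='test', where='date > thresh')

print(len(older), 'rows before thresh,', len(newer), 'rows after')
```

    Note that "older than thresh" corresponds to date < thresh, whereas the comparison in the question (time>thresh) selects the newer rows. The column also has to be written as a data column, as in In [13], for the where clause to be able to filter on it.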