
Setting with enlargement: how to add a one-row DataFrame to another DataFrame

  •  1
  • nico9T  · Tech Community  · 7 years ago

    I want to create an empty DataFrame and then append other single-row DataFrames to it, using "Setting With Enlargement" so that appending is efficient.

    import numpy as np
    import pandas as pd
    from datetime import datetime
    from pandas import DataFrame
    
    df = DataFrame(columns=["open","high","low","close","volume","open_interest"])
    
    row_one = DataFrame({"open":10,"high":11,"low":9,"close":10,"volume":100,"open_interest":np.nan}, index = [datetime(2017,1,1)])
    row_two = DataFrame({"open":9,"high":12,"low":8,"close":10.50,"volume":500,"open_interest":np.nan}, index = [datetime(2017,1,2)])
    

    I tried:

    df[row_one.index] = row_one.columns

    but it fails with:

    "DatetimeIndex(['2017-01-01'], dtype='datetime64[ns]', freq=None) not in index"

    The row is not added to the DataFrame. What am I doing wrong?

    2 replies  |  last reply 3 years ago
        1
  •  0
  •   jezrael    7 years ago

    You need loc for setting with enlargement. Select the index value with [0] to get a scalar label, and convert row_one to a Series with iloc:

    df.loc[row_one.index[0]] = row_one.iloc[0]
    print (df)
                open  high  low  close  volume  open_interest
    2017-01-01  10.0  11.0  9.0   10.0   100.0            NaN
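
    The same pattern extends to several single-row DataFrames. The sketch below assumes the frames from the question; append_row is a hypothetical helper name used only for illustration, not a pandas function.

```python
from datetime import datetime

import numpy as np
import pandas as pd

columns = ["open", "high", "low", "close", "volume", "open_interest"]
df = pd.DataFrame(columns=columns)

row_one = pd.DataFrame(
    {"open": 10, "high": 11, "low": 9, "close": 10,
     "volume": 100, "open_interest": np.nan},
    index=[datetime(2017, 1, 1)],
)
row_two = pd.DataFrame(
    {"open": 9, "high": 12, "low": 8, "close": 10.50,
     "volume": 500, "open_interest": np.nan},
    index=[datetime(2017, 1, 2)],
)


def append_row(target, single_row):
    """Hypothetical helper: add a one-row DataFrame to target via loc enlargement."""
    label = single_row.index[0]              # scalar index label, not an Index object
    target.loc[label] = single_row.iloc[0]   # the row as a Series, aligned on column names
    return target


for row in (row_one, row_two):
    df = append_row(df, row)

print(df)
```

    Each call to loc with a new label adds one row, so this is convenient for a handful of rows but does repeated work when there are many.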
    

    But it is better to use concat, especially when there are multiple DataFrames:

    df = pd.concat([row_one, row_two])
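
    When the single-row DataFrames arrive one at a time, one option is to keep them in a list and call concat once at the end. This is only a sketch: incoming_rows is a hypothetical stand-in for whatever produces the rows, and the values are made up for illustration.

```python
from datetime import datetime, timedelta

import numpy as np
import pandas as pd


def incoming_rows(n):
    """Hypothetical event source: yields n one-row DataFrames."""
    start = datetime(2017, 1, 1)
    for i in range(n):
        yield pd.DataFrame(
            {"open": 10.0 + i, "high": 11.0 + i, "low": 9.0 + i,
             "close": 10.0 + i, "volume": 100.0, "open_interest": np.nan},
            index=[start + timedelta(days=i)],
        )


frames = []
for row in incoming_rows(3):
    frames.append(row)    # appending to a list only stores a reference

df = pd.concat(frames)    # a single concatenation at the end
print(df)
```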
    
        2
  •  0
  •   Bill    3 years ago

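    "I get new data from events, so each time I only have a single-row DataFrame to add."

    Do you need the complete, updated DataFrame at every iteration?

    If not, collect the rows in plain Python lists and build the DataFrame once at the end: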

    new_row_data = {'open': 10.0,
     'high': 11.0,
     'low': 9.0,
     'close': 10.0,
     'volume': 100.0,
     'open_interest': np.nan}
    new_row_index = pd.Timestamp('2017-01-01 00:00:00')
    
    index = []
    records = []
    for _ in range(500):
        index.append(new_row_index)
        records.append(new_row_data)  # add new data here
    
    # Create dataframe at the end
    df = pd.DataFrame.from_records(records, index=index)
    

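    (The code above takes about 2.4 ms.)

    If you do need the updated DataFrame at every iteration, preallocate a buffer of empty rows and extend it in chunks as it fills up: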

    buffer_size = 100  # adjust to your needs
    data_columns = ["open","high","low","close","volume","open_interest"]
    all_columns = ['DateTime'] + data_columns  # Add column for datetimes
    df_empty = pd.DataFrame(None, index=range(buffer_size),
                            columns=all_columns)
    # Note: You might want to specify dtypes above rather than np.nans
    
    df = df_empty.copy()
    index = 0
    for _ in range(500):
        df.loc[index, 'DateTime'] = new_row_index
        df.loc[index, data_columns] = new_row_data  # add new data here
        # Updated dataframe if you need it:
        #print(df.loc[:index])
    
        index += 1
        while index >= len(df):
            df = pd.concat([df, df_empty.reindex(range(index, index + buffer_size))])
    
    # To remove the integer index use:
    df = df.loc[:index-1].set_index('DateTime', drop=True)
    

    (The code above takes about 540 ms.)

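    This is slower than building the DataFrame once at the end, but it avoids calling concat or append on every iteration. (DataFrame.append was removed in pandas 2.0.)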
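
    To check timings like these on your own machine, a rough comparison can be sketched as below. The function names are hypothetical, the row data is the same sample row repeated, and the numbers will vary with hardware and pandas version, so treat the output as indicative only.

```python
import time

import numpy as np
import pandas as pd

new_row_data = {"open": 10.0, "high": 11.0, "low": 9.0,
                "close": 10.0, "volume": 100.0, "open_interest": np.nan}
new_row_index = pd.Timestamp("2017-01-01 00:00:00")
n_rows = 500


def collect_then_build(n):
    """Collect rows in lists and build the DataFrame once at the end."""
    index, records = [], []
    for _ in range(n):
        index.append(new_row_index)
        records.append(new_row_data)
    return pd.DataFrame.from_records(records, index=index)


def concat_each_iteration(n):
    """Grow the DataFrame by one row per iteration with concat."""
    df = pd.DataFrame([new_row_data], index=[new_row_index])
    for _ in range(n - 1):
        row = pd.DataFrame([new_row_data], index=[new_row_index])
        df = pd.concat([df, row])
    return df


def time_it(func, n):
    """Return the result of func(n) and the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = func(n)
    return result, time.perf_counter() - start


df_fast, t_fast = time_it(collect_then_build, n_rows)
df_slow, t_slow = time_it(concat_each_iteration, n_rows)
print(f"collect then build:    {t_fast * 1e3:.1f} ms")
print(f"concat each iteration: {t_slow * 1e3:.1f} ms")
```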