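I am trying to import a pandas DataFrame into an H2OFrame while specifying the column types I want. The underlying problem is that I eventually need to call .rbind() on two datasets, but depending on the values in some columns, h2o coerces them to either real or int, and then .rbind() fails because the column types differ. I want to make sure I end up with two different datasets that have identical column types, so that the .rbind() succeeds.
A reproducible example follows: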
import pandas as pd
import h2o
my_df1 = pd.DataFrame({'a': [1, 1, 0, 0, 1],
                       'b': [1, 0, .5, .2, 0]})
my_df2 = pd.DataFrame({'a': [.5, .8, 0, 0, 1],
                       'b': [1, 0, .5, .2, 0]})
h2o.init()
my_h2o1 = h2o.H2OFrame(my_df1)
my_h2o2 = h2o.H2OFrame(my_df2)
my_h2o1.rbind(my_h2o2) ### This fails
### try to manually specify the column names and types
col_names = list(my_h2o1.types.keys())
col_types = list(my_h2o1.types.values())
my_h2o3 = h2o.H2OFrame(my_df2, column_names=col_names, column_types=col_types)
### compare as lists: in Python 3, two dict_values views never compare equal
list(my_h2o1.types.values()) == list(my_h2o3.types.values())
my_h2o1.rbind(my_h2o3) ### This still fails
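
One possible way to sidestep the mismatch, sketched here under the assumption that both datasets fit in memory as pandas DataFrames, is to concatenate them in pandas and upload the result once. This does not force an H2OFrame column type; it only avoids calling .rbind() on frames whose types were inferred separately. The name `combined` is introduced for illustration, and the H2OFrame call is left commented out because it needs a running H2O cluster.

```python
import pandas as pd

my_df1 = pd.DataFrame({'a': [1, 1, 0, 0, 1],
                       'b': [1, 0, .5, .2, 0]})
my_df2 = pd.DataFrame({'a': [.5, .8, 0, 0, 1],
                       'b': [1, 0, .5, .2, 0]})

# pandas upcasts the int64 column 'a' of my_df1 to float64 when it is
# stacked on top of the float64 column 'a' of my_df2.
combined = pd.concat([my_df1, my_df2], ignore_index=True)

# Upload the already-combined data in a single step (assumes h2o.init() was called):
# my_h2o = h2o.H2OFrame(combined)
```

Because the column types are then inferred once over all rows, there is no second frame that could be inferred differently.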