
How do I get dummy variables for the train and test sets?

  •  0
  • Swastik  · Tech Community  · 7 years ago

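    I want to create dummy variables for the train and test sets, and then keep only the features common to both. I am running the code below to create dummy variables in both datasets, but I get an error.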

    I typed the following into one cell of a Jupyter notebook:

    def get_features(train, test):
        trainval = list(train.columns.values) # list train features
        testval = list(test.columns.values) # list test features
        features = list(set(trainval) & set(testval)) # check which features are in common (removes the outcome column)
        features.remove('Id') # remove the non-useful id column
        return features
    
    def process_features(train,test):
        tables=[test,train]
        for table in tables:
            table['SoldDt']= table[['MoSold','YrSold']].apply(lambda x : '{}-{}'.format(x[0],x[1]), axis=1)
            table['YearBuilt']= pd.to_datetime(table.YearBuilt,format="%Y")
            table['YearRemodAdd']= pd.to_datetime(table.YearRemodAdd,format="%Y")
            table['SoldDt']= pd.to_datetime(table.SoldDt,format="%m-%Y")
            table.GarageYrBlt.fillna(1,inplace=True)
            table.GarageYrBlt=table.GarageYrBlt.apply(int)
            table.GarageYrBlt.replace(1,'NaT',inplace=True)
            table['GarageYrBlt']= pd.to_datetime(table.GarageYrBlt,format="%Y")
            del table['MoSold']
            del table['YrSold']
            table['MSSubClass']=table['MSSubClass'].apply(str)
            table['OverallQual']=table['OverallQual'].apply(str)
            table['OverallCond']=table['OverallCond'].apply(str)
            table.Alley.fillna("NotAvl",inplace=True)
            table.BsmtQual.fillna("NB",inplace=True)
            table.BsmtCond.fillna("NB",inplace=True)
            table.BsmtExposure.fillna("NB",inplace=True)
            table.BsmtFinType1.fillna("NB",inplace=True)
            table.BsmtFinType2.fillna("NB",inplace=True)
            table.FireplaceQu.fillna("NF",inplace=True)
            table.GarageType.fillna("NG",inplace=True)
            table.GarageFinish.fillna("NG",inplace=True)
            table.GarageQual.fillna("NG",inplace=True)
            table.GarageCond.fillna("NG",inplace=True)
            table.PoolQC.fillna("NP",inplace=True)
            table.Fence.fillna("NFe",inplace=True)
            table.MiscFeature.fillna("NotAvl",inplace=True)
            table.LotFrontage.fillna(0,inplace=True)
    
            table=table.dropna(inplace=True)
            table=pd.get_dummies(table)
    
        features = get_features(train,test)
        return train,test,features
    

    Then I call the function in another cell:

    train = pd.read_csv('/mnt/disk2/Data/HousePrices/train.csv')
    test = pd.read_csv('/mnt/disk2/Data/HousePrices/test.csv')
    train,test,features = process_features(train,test)
    

    I get the following error:

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-17-b2727d6cdc63> in <module>()
      1 train = pd.read_csv('/mnt/disk2/Data/HousePrices/train.csv')
      2 test = pd.read_csv('/mnt/disk2/Data/HousePrices/test.csv')
    ----> 3 train,test,features = process_features(train,test)
    
    <ipython-input-16-dc47e5e9f9b6> in process_features(train, test)
     40 
     41         table=table.dropna(inplace=True)
    ---> 42         table=pd.get_dummies(table)
     43 
     44     print ("Getting features...")
    
    /usr/local/lib/python3.5/dist-packages/pandas/core/reshape.py in     get_dummies(data, prefix, prefix_sep, dummy_na, columns, sparse, drop_first)
       1102     else:
       1103     result = _get_dummies_1d(data, prefix, prefix_sep, dummy_na,
    -> 1104                                  sparse=sparse, drop_first=drop_first)
       1105     return result
       1106 
    
    /usr/local/lib/python3.5/dist-packages/pandas/core/reshape.py in _get_dummies_1d(data, prefix, prefix_sep, dummy_na, sparse, drop_first)
       1123     # if all NaN
       1124     if not dummy_na and len(levels) == 0:
    -> 1125         return get_empty_Frame(data, sparse)
       1126 
       1127     codes = codes.copy()
    
    /usr/local/lib/python3.5/dist-packages/pandas/core/reshape.py in get_empty_Frame(data, sparse)
       1115             index = data.index
       1116         else:
    -> 1117             index = np.arange(len(data))
       1118         if not sparse:
       1119             return DataFrame(index=index)
    
    TypeError: object of type 'NoneType' has no len()
    
    1 answer  |  as of 7 years ago
        1
  •  2
  •   Brian Cain    7 years ago

    In this line:

    table=table.dropna(inplace=True)
    

    dropna returns None when it is called with inplace=True, as its documentation states:

    inplace : boolean, default False
        If True, do operation inplace and return None.
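
A quick way to see this behaviour is a minimal sketch on a toy frame. The frame and its column names `a` and `b` are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Hypothetical toy frame: only the first row has no missing values.
df = pd.DataFrame({"a": [1.0, None, 3.0], "b": ["x", "y", None]})

# With inplace=True the frame itself is modified and the call returns None.
result = df.dropna(inplace=True)
print(result)   # None
print(len(df))  # 1

# Without inplace, dropna returns a new frame that is safe to assign.
df2 = pd.DataFrame({"a": [1.0, None, 3.0], "b": ["x", "y", None]})
cleaned = df2.dropna()
print(len(cleaned))  # 1
```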
    

    But then you try to pass that None value to get_dummies(), which raises the TypeError.
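
One possible way to restructure the encoding step is sketched below. It assumes the cleaning has already been done, and the helper name `encode_common`, its `drop` parameter, and the toy frames are hypothetical. Note also that rebinding `table` inside the original loop does not change `train` or `test`, so the sketch returns the encoded frames explicitly instead.

```python
import pandas as pd

def encode_common(train, test, drop=("Id",)):
    """Hypothetical helper: dummy-encode both frames and list shared columns."""
    # dropna() without inplace returns a new frame, so passing it on is safe.
    train_enc = pd.get_dummies(train.dropna())
    test_enc = pd.get_dummies(test.dropna())
    # Columns present in both encoded frames, minus identifier columns.
    # Sorted only to give a stable order.
    common = set(train_enc.columns) & set(test_enc.columns)
    features = sorted(common - set(drop))
    return train_enc, test_enc, features

# Toy data loosely modelled on the question's columns (values are made up).
train = pd.DataFrame({"Id": [1, 2, 3],
                      "Alley": ["Grvl", "Pave", None],
                      "SalePrice": [100, 200, 300]})
test = pd.DataFrame({"Id": [4, 5], "Alley": ["Grvl", "Grvl"]})

train_enc, test_enc, features = encode_common(train, test)
print(features)  # ['Alley_Grvl']
```

Here `Alley_Pave` and `SalePrice` exist only in the encoded training frame, so they drop out of `features`, which matches the intent of get_features in the question.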