Chaining transformations in a scikit-learn Pipeline

  •  1
  • Daniel Zapata  · Tech Community  · 6 years ago

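    I am using a scikit-learn Pipeline to preprocess a dataset. The dataset has four variables: ['monetary', 'frequency1', 'frequency2', 'recency'], and I want to preprocess all of them except recency. For the preprocessing, I first want to take the log and then standardize. However, when I get the transformed data out of the pipeline, I end up with 7 columns (3 logged, 3 standardized, and recency). Is there a way to chain the transformations so that the log is taken first, standardization is then applied to the logged values, and the result is a dataset with only 4 features?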

    def create_pipeline(df):
        all_but_recency = ['monetary', 'frequency1','frequency2']
    
        # Preprocess
        preprocessor = ColumnTransformer(
            transformers=[
                ( 'log', FunctionTransformer(np.log), all_but_recency ),
                ( 'standardize', preprocessing.StandardScaler(), all_but_recency ) ],
            remainder='passthrough')
    
        # Pipeline
        estimators = [( 'preprocess', preprocessor )]
        pipe = Pipeline(steps=estimators)
    
        print(pipe.set_params().fit_transform(df).shape)
    

    Thanks in advance.

    1 reply  |  as of 6 years ago
        1
  •  0
  •   Venkatachalam    6 years ago

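    You have to apply the transformations sequentially. A single ColumnTransformer runs each of its transformers on the original columns and concatenates the results, which is why you get 7 columns. Put each transformation in its own ColumnTransformer and chain them in a Pipeline. Try this!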

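    import numpy as np
    from sklearn.compose import ColumnTransformer
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import FunctionTransformer, StandardScaler

    def create_pipeline(df):
        all_but_recency = ['monetary', 'frequency1', 'frequency2']

        # Step 1: take the log of the selected columns and pass 'recency' through.
        preprocessor1 = ColumnTransformer(
            [('log', FunctionTransformer(np.log), all_but_recency)],
            remainder='passthrough')

        # Step 2: by default step 1 returns a NumPy array with the logged columns
        # first, so select them by position here rather than by name.
        preprocessor2 = ColumnTransformer(
            [('standardize', StandardScaler(), [0, 1, 2])],
            remainder='passthrough')

        # Pipeline: the two steps run one after the other
        estimators = [('preprocess1', preprocessor1), ('standardize', preprocessor2)]
        pipe = Pipeline(steps=estimators)

        print(pipe.fit_transform(df).shape)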
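
To see why the original attempt returned 7 columns, here is a minimal sketch on hypothetical toy data. The values are made up for illustration; only the column names come from the question. It compares one ColumnTransformer holding both transformers against two ColumnTransformers chained in a Pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

# Hypothetical toy data using the four column names from the question.
df = pd.DataFrame({
    'monetary': [10.0, 200.0, 35.0, 80.0],
    'frequency1': [1.0, 4.0, 2.0, 8.0],
    'frequency2': [3.0, 9.0, 27.0, 1.0],
    'recency': [5.0, 30.0, 12.0, 7.0],
})
cols = ['monetary', 'frequency1', 'frequency2']

# One ColumnTransformer with two transformers on the same columns:
# each transformer sees the ORIGINAL columns and the outputs are concatenated.
parallel = ColumnTransformer(
    [('log', FunctionTransformer(np.log), cols),
     ('standardize', StandardScaler(), cols)],
    remainder='passthrough')
parallel_out = parallel.fit_transform(df)
print(parallel_out.shape)  # (4, 7): 3 logged + 3 standardized + recency

# Two ColumnTransformers chained in a Pipeline: the second step receives
# the output of the first, so the logged columns are the ones standardized.
chained = Pipeline([
    ('log', ColumnTransformer(
        [('log', FunctionTransformer(np.log), cols)],
        remainder='passthrough')),
    ('standardize', ColumnTransformer(
        [('standardize', StandardScaler(), [0, 1, 2])],
        remainder='passthrough')),
])
chained_out = chained.fit_transform(df)
print(chained_out.shape)  # (4, 4)
```

In the chained version the passthrough column ends up last, after the transformed columns, which is why the second step can address the logged columns as positions 0 to 2.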
    

    Working example

    import numpy as np
    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.datasets import load_iris
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import FunctionTransformer, StandardScaler

    iris = load_iris()
    df = pd.DataFrame(iris.data, columns=iris.feature_names)

    # Positional indices of the columns to transform; the last column is left as-is.
    all_but_one = [0, 1, 2]

    # Preprocess: log first, then standardize; each step passes the last column through.
    # `remainder` must be given by keyword in current scikit-learn versions.
    preprocessor1 = ColumnTransformer(
        [('log', FunctionTransformer(np.log), all_but_one)],
        remainder='passthrough')
    preprocessor2 = ColumnTransformer(
        [('standardize', StandardScaler(), all_but_one)],
        remainder='passthrough')

    # Pipeline
    estimators = [('preprocess1', preprocessor1), ('scaling', preprocessor2)]
    pipe = Pipeline(steps=estimators)

    pipe.fit_transform(df)
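
As an alternative that is not part of the answer above, the two transformations can also be chained inside a single ColumnTransformer by wrapping them in a nested Pipeline. The sketch below assumes the same iris data as the working example; the names `log_then_scale` and `log_scale` are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Select the first three columns by name; the fourth is passed through.
all_but_one = list(iris.feature_names[:3])

# Inner pipeline (hypothetical name): log first, then standardize the logged values.
log_then_scale = Pipeline([
    ('log', FunctionTransformer(np.log)),
    ('standardize', StandardScaler()),
])

# A single ColumnTransformer applies the inner pipeline to the selected
# columns only, so column names can be used without an intermediate array.
preprocessor = ColumnTransformer(
    [('log_scale', log_then_scale, all_but_one)],
    remainder='passthrough')

result = preprocessor.fit_transform(df)
print(result.shape)  # (150, 4)
```

With this layout the column selection happens only once, so there is no need to switch from names to positional indices between steps.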