
Chaining transformations in a scikit-learn Pipeline

pipeline scikit-learn python

Daniel Zapata · 6 years ago

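I am using a scikit-learn Pipeline to preprocess a dataset. The dataset has four variables: ['monetary', 'frequency1', 'frequency2', 'recency'], and I want to preprocess all of them except recency. For the preprocessing, I first want to take the log and then standardize. However, when I get the transformed data out of the pipeline, I get 7 columns (3 logged, 3 standardized, plus recency). Is there a way to chain the transformations so that the log is taken first, standardization is then applied to the logged values, and I end up with a dataset of only 4 features?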

def create_pipeline(df):
    all_but_recency = ['monetary', 'frequency1','frequency2']

    # Preprocess
    preprocessor = ColumnTransformer(
        transformers=[
            ( 'log', FunctionTransformer(np.log), all_but_recency ),
            ( 'standardize', preprocessing.StandardScaler(), all_but_recency ) ],
        remainder='passthrough')

    # Pipeline
    estimators = [( 'preprocess', preprocessor )]
    pipe = Pipeline(steps=estimators)

    print(pipe.set_params().fit_transform(df).shape)

Thanks in advance.

1 answer · 6 years ago

Venkatachalam · 6 years ago

You have to apply the FunctionTransformer and the scaler sequentially, as two separate pipeline steps. Try this!

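def create_pipeline(df):
    all_but_recency = ['monetary', 'frequency1', 'frequency2']

    # Step 1: take the log of the selected columns; recency passes through.
    preprocessor1 = ColumnTransformer(
        [('log', FunctionTransformer(np.log), all_but_recency)],
        remainder='passthrough')
    # Step 2: step 1 returns a NumPy array with the transformed columns first,
    # so the logged columns are selected by position here, not by name.
    logged = list(range(len(all_but_recency)))
    preprocessor2 = ColumnTransformer(
        [('standardize', preprocessing.StandardScaler(), logged)],
        remainder='passthrough')

    # Pipeline: the output of preprocess1 is the input of standardize
    estimators = [('preprocess1', preprocessor1), ('standardize', preprocessor2)]
    pipe = Pipeline(steps=estimators)

    print(pipe.fit_transform(df).shape)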
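
For context on where the 7 columns in the question come from: a single ColumnTransformer hands the original input to each of its transformers independently and concatenates the results side by side, so listing the log and the scaler in the same ColumnTransformer does not chain them. Here is a minimal sketch of that behaviour. The data is made up (random positive values) and only the column names come from the question.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import FunctionTransformer, StandardScaler

# Made-up positive values, used only to illustrate the output shape.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.uniform(1.0, 100.0, size=(10, 4)),
    columns=["monetary", "frequency1", "frequency2", "recency"],
)
cols = ["monetary", "frequency1", "frequency2"]

# Both transformers receive the ORIGINAL columns; their outputs are stacked.
side_by_side = ColumnTransformer(
    transformers=[
        ("log", FunctionTransformer(np.log), cols),
        ("standardize", StandardScaler(), cols),
    ],
    remainder="passthrough",
)

out = side_by_side.fit_transform(df)
print(out.shape)  # (10, 7): 3 logged + 3 standardized + recency
```

The standardized columns here are computed from the raw values, not from the logged ones, which is why the two steps need to be chained in a Pipeline instead.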

Working example

from sklearn.datasets import load_iris
import pandas as pd
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import FunctionTransformer
from sklearn.pipeline import Pipeline
from sklearn import preprocessing

iris = load_iris()
X, y = iris.data, iris.target
df = pd.DataFrame(X, columns=iris.feature_names)

# Positional indices of the columns to transform; the last column passes through.
all_but_one = [0, 1, 2]

# Preprocess (remainder is passed by keyword, as current scikit-learn releases require)
preprocessor1 = ColumnTransformer([('log', FunctionTransformer(np.log), all_but_one)], remainder='passthrough')
preprocessor2 = ColumnTransformer([('standardize', preprocessing.StandardScaler(), all_but_one)], remainder='passthrough')

# Pipeline
estimators = [('preprocess1', preprocessor1), ('scaling', preprocessor2)]
pipe = Pipeline(steps=estimators)

pipe.fit_transform(df)
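
An alternative that may be easier to read, offered as a sketch and not as part of the original answer: put the log and the scaler into a small inner Pipeline and hand that Pipeline to a single ColumnTransformer. The inner steps then run one after the other on the selected columns only, and the columns can still be selected by name. The data below is made up; only the column names are taken from the question, and the step name log_then_scale is an arbitrary label.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

# Made-up positive values, used only for illustration.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.uniform(1.0, 100.0, size=(10, 4)),
    columns=["monetary", "frequency1", "frequency2", "recency"],
)
cols = ["monetary", "frequency1", "frequency2"]

# Inner pipeline: log first, then standardize the logged values.
log_then_scale = Pipeline(
    steps=[
        ("log", FunctionTransformer(np.log)),
        ("standardize", StandardScaler()),
    ]
)

# One ColumnTransformer applies the inner pipeline to the chosen columns
# and passes recency through unchanged as the last output column.
preprocessor = ColumnTransformer(
    transformers=[("log_then_scale", log_then_scale, cols)],
    remainder="passthrough",
)

result = preprocessor.fit_transform(df)
print(result.shape)  # (10, 4)
```

Because the transformed columns are placed first and the passthrough column last, the output keeps the original column order only when recency is already the last column of the input.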
