
pad_sequences got multiple values for argument 'maxlen' (Keras)

  •  0
  • Joseph Oliver  · 6 years ago

    I am trying to use a Keras model for text classification inside a genetic algorithm, but pad_sequences raises an error that says:

    TypeError: pad_sequences() got multiple values for argument 'maxlen'
    

    The actual pad_sequences call is:

    data = self.pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH)
    

    It appears in the following method:

    def get_data(self):
        """Retrieve the dataset and process the data."""
    
        batch_size = 128
        VALIDATION_SPLIT = 0.2
        MAX_SEQUENCE_LENGTH = 1000
        MAX_NUM_WORDS = 20000
        csv = 'VocabCSV.csv'
        my_df = self.pd.read_csv(csv,index_col=0,encoding = 'latin-1')
        my_df.dropna(inplace=True)
        my_df.reset_index(drop=True,inplace=True)
        print(my_df.info())
    
        texts = my_df.Text # list of text samples
        labellist = my_df.Target # list of labels
        label_vals = [] # label values list
        labels_index = {} # dictionary mapping label name to numeric id
        labels = [] # list of label ids
    
    
        for label in labellist:
            if label not in label_vals:
                label_vals.append(label)
    
        for idx, text in enumerate(texts):
            for label in label_vals:
                if label == labellist[idx]:
                    label_id = label_vals.index(label)
            labels_index[text] = label_id
            labels.append(label_id)
    
        print("labels index {}".format(len(labels_index)))
        print("labels size: %s " % len(labels))
    
        print("found %s texts." % len(texts))
    
        # finally, vectorize the text samples into a 2D integer tensor
        tokenizer = self.Tokenizer(num_words=MAX_NUM_WORDS)
        tokenizer.fit_on_texts(texts)
        sequences = tokenizer.texts_to_sequences(texts)
    
        word_index = tokenizer.word_index
        print('Found %s unique tokens.' % len(word_index))
    
        data = self.pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH)
        print(self.np.asarray(labels).shape)
        labels = self.to_categorical(labels)
    
        print('Shape of data tensor:', data.shape)
        print('Shape of label tensor:', labels.shape)
    
        # split the data into a training set and a validation set
        indices = self.np.arange(data.shape[0])
        self.np.random.shuffle(indices)
        data = data[indices]
        labels = labels[indices]
        num_validation_samples = int(VALIDATION_SPLIT * data.shape[0])
    
        x_train = data[:-num_validation_samples]
        y_train = labels[:-num_validation_samples]
        x_test = data[-num_validation_samples:]
        y_test = labels[-num_validation_samples:]
    
        print(x_train.shape, y_train.shape)
        print(x_test.shape, y_test.shape)
    
        print(len(x_test))
        print(len(y_test))
    
        input_shape = MAX_SEQUENCE_LENGTH
    
        print(input_shape)
    
        nb_classes = len(label_vals)
    
        return (nb_classes, batch_size, input_shape, x_train, x_test, y_train, y_test, word_index)
    

    The error seems to occur when get_data is called by another function, but I cannot work out what is causing it.

    1 Answer  |  as of 6 years ago
  •  1
  •   Steven    6 years ago

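    The problem is the call self.pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH). pad_sequences is not a method of your class; it is a function from keras.preprocessing.sequence. The question does not show how it was attached to the class, but if it is stored as a class attribute and called through self, Python passes the instance as the first positional argument. Your sequences then land in the maxlen slot, and the maxlen keyword supplies a second value for the same parameter, which is exactly what the TypeError reports.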
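The binding behaviour can be reproduced without Keras. The following is a minimal sketch, not the asker's actual class: the pad_sequences function here is a hypothetical stand-in that only copies the first two parameter names of the Keras function, and the Loader class is an assumed layout in which the function is stored as a class attribute.

```python
def pad_sequences(sequences, maxlen=None):
    """Hypothetical stand-in: only mimics the first two parameter names
    of Keras' pad_sequences, it does no real padding."""
    return list(sequences)


class Loader:
    # Assumption: the function was attached at class level. A plain function
    # stored on a class becomes a method, so the instance is passed first.
    pad_sequences = pad_sequences

    def get_data(self, sequences):
        # Effectively pad_sequences(self, sequences, maxlen=2):
        # `self` fills `sequences`, `sequences` fills `maxlen`,
        # and the keyword then gives `maxlen` a second value.
        return self.pad_sequences(sequences, maxlen=2)


try:
    Loader().get_data([[1, 2, 3]])
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

The printed message should contain "got multiple values for argument 'maxlen'", matching the error in the question.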

    So, to make it work, import the module like this:

    from keras.preprocessing import sequence
    

    Then call it like this:

    sequences = sequence.pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH)
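If you would rather keep calling the helper through self, one alternative (not part of the answer above) is to wrap the function in staticmethod so that Python does not bind the instance. This is a sketch under assumptions: pad_sequences below is a hypothetical pure-Python stand-in that imitates the Keras defaults as I understand them (pad with 0 on the left, keep the last maxlen items), and Loader is an illustrative class name.

```python
def pad_sequences(sequences, maxlen):
    """Hypothetical stand-in for Keras' pad_sequences with default settings:
    left-pad with 0 and keep only the last `maxlen` items (maxlen > 0)."""
    return [[0] * (maxlen - len(seq)) + list(seq[-maxlen:]) for seq in sequences]


class Loader:
    # staticmethod stops Python from passing the instance as the first argument,
    # so self.pad_sequences(...) behaves like the plain function call.
    pad_sequences = staticmethod(pad_sequences)

    def get_data(self, sequences, max_sequence_length):
        return self.pad_sequences(sequences, maxlen=max_sequence_length)


padded = Loader().get_data([[1, 2], [1, 2, 3, 4, 5]], 4)
print(padded)  # [[0, 0, 1, 2], [2, 3, 4, 5]]
```

Assigning the function to an instance attribute inside __init__ also avoids the binding, since only functions found on the class are turned into methods. Calling the imported function directly, as in the answer, remains the simplest option.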