
Autoencoder for sound data in Keras

  •  3
  • Emanuela Liaci  · Tech Community  · 7 years ago

    I have 2D arrays of log-scaled mel-spectrograms of sound samples from 5 different classes.

    from keras.models import Sequential
    from keras.layers import Conv1D, MaxPooling1D, Flatten, Dense, BatchNormalization, Dropout
    from keras.optimizers import Adam
    import keras.initializers as initializers

    model = Sequential()
    model.add(Conv1D(80, 8, activation='relu', padding='same', input_shape=(60, 108)))
    model.add(MaxPooling1D(2, padding='same', strides=None))
    model.add(Flatten())
    initializer = initializers.TruncatedNormal()
    model.add(Dense(200, activation='relu', kernel_initializer=initializer, bias_initializer=initializer))
    model.add(BatchNormalization())
    model.add(Dropout(0.8))
    model.add(Dense(50, activation='relu', kernel_initializer=initializer, bias_initializer=initializer))
    model.add(Dropout(0.8))
    model.add(Dense(5, activation='softmax', kernel_initializer=initializer, bias_initializer=initializer))
    # lr is not a valid compile() argument; pass the learning rate to the optimizer
    model.compile(loss='categorical_crossentropy',
                  optimizer=Adam(lr=0.01),
                  metrics=['accuracy'])
    

    What kind of autoencoder can be applied to this type of data input? What model? Any suggestion or code example would be helpful. :)

    1 Answer  |  7 years ago
        1
  •  2
  •   Stepan Novikov    7 years ago

    Since I have no answer to my question about the nature of the data, I will assume that we have a set of 2D data with a shape like (NSamples, 60, 108). I also assume that the answer to my suggestion to use Convolution2D instead of Convolution1D is yes.

    Here is an example of a convolutional autoencoder model, of a model that can use the trained autoencoder's weights, and of how to transfer those weights into the final model:

    from keras.layers.core import Dense, Dropout, Flatten, Reshape
    from keras.layers import Conv2D, Deconv2D, MaxPooling2D, UpSampling2D, BatchNormalization
    from keras.callbacks import ModelCheckpoint
    from keras.optimizers import Adam
    import keras.models as models
    import keras.initializers as initializers
    from sklearn.model_selection import train_test_split
    
    ae = models.Sequential()
    # encoder (the question's Conv1D layer is replaced with Conv2D here)
    c = Conv2D(80, 3, activation='relu', padding='same',input_shape=(60, 108, 1))
    ae.add(c)
    ae.add(MaxPooling2D(pool_size=(2, 2), padding='same', strides=None))
    ae.add(Flatten())
    initializer=initializers.TruncatedNormal()
    d1 = Dense(200, activation='relu', kernel_initializer=initializer,bias_initializer=initializer)
    ae.add(d1)
    ae.add(BatchNormalization())
    ae.add(Dropout(0.8))
    d2 = Dense(50, activation='relu', kernel_initializer=initializer,bias_initializer=initializer)
    ae.add(d2)
    ae.add(Dropout(0.8))
    # decoder: mirror the encoder back to the input shape
    ae.add(Dense(d2.input_shape[1], activation='sigmoid'))
    ae.add(Dense(d1.input_shape[1], activation='sigmoid'))
    ae.add(Reshape((30, 54, 80)))
    ae.add(UpSampling2D((2,2)))
    ae.add(Deconv2D(filters=c.filters, kernel_size=c.kernel_size, strides=c.strides, activation=c.activation, padding=c.padding))
    ae.add(Deconv2D(filters=1, kernel_size=c.kernel_size, strides=c.strides, activation=c.activation, padding=c.padding))
    # lr is not a valid compile() argument; pass the learning rate to the optimizer
    ae.compile(loss='binary_crossentropy',
               optimizer=Adam(lr=0.001),
               metrics=['accuracy'])
    ae.summary()
    #now train your convolutional autoencoder to reconstruct your input data
    #reshape your data to (NSamples, 60, 108, 1)
    #Then train your autoencoder. it can be something like that:
    #X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=43)
    #pre_mcp = ModelCheckpoint("CAE.hdf5", monitor='val_acc', verbose=2, save_best_only=True, mode='max')
    #pre_history = ae.fit(X_train, X_train, epochs=100, validation_data=(X_val, X_val), batch_size=22, verbose=2, callbacks=[pre_mcp])
    
    #model
    model = models.Sequential()
    model.add(Conv2D(80, 3, activation='relu', padding='same',input_shape=(60, 108, 1)))
    model.add(MaxPooling2D(pool_size=(2, 2), padding='same',strides=None))
    model.add(Flatten())
    initializer=initializers.TruncatedNormal()
    model.add(Dense(200, activation='relu', kernel_initializer=initializer,bias_initializer=initializer))
    model.add(BatchNormalization())
    model.add(Dropout(0.8))
    model.add(Dense(50, activation='relu', kernel_initializer=initializer,bias_initializer=initializer))
    model.add(Dropout(0.8))
    model.add(Dense(5, activation='softmax', kernel_initializer=initializer,bias_initializer=initializer))
    # as above, the learning rate belongs to the optimizer, not compile()
    model.compile(loss='categorical_crossentropy',
                  optimizer=Adam(lr=0.001),
                  metrics=['accuracy'])
    # copy the trained encoder weights from the autoencoder into the classifier
    # (layer 0 = Conv2D, 3 = Dense(200), 4 = BatchNormalization, 6 = Dense(50))
    model.layers[0].set_weights(ae.layers[0].get_weights())
    model.layers[3].set_weights(ae.layers[3].get_weights())
    model.layers[4].set_weights(ae.layers[4].get_weights())
    model.layers[6].set_weights(ae.layers[6].get_weights())
    model.summary()
    #Now you can train your model with pre-trained weights from autoencoder
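
To make that last comment concrete, here is a minimal fine-tuning sketch, assuming `X_train`, `y_train`, `X_val`, and `y_val` come from the same `train_test_split` used for the autoencoder above:

    # fine-tune the classifier starting from the autoencoder's weights;
    # checkpoint the epoch with the best validation accuracy
    mcp = ModelCheckpoint("model.hdf5", monitor='val_acc', verbose=2, save_best_only=True, mode='max')
    history = model.fit(X_train, y_train, epochs=100,
                        validation_data=(X_val, y_val),
                        batch_size=22, verbose=2, callbacks=[mcp])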
    

    A model like this turned out to be useful for me with the MNIST dataset: initializing the model with the autoencoder's weights improved its accuracy compared with a model initialized with random weights.

    However, I would recommend using several convolution/deconvolution layers, probably 3 or more: in my experience, convolutional autoencoders with 3 or more convolutional layers are much more effective than those with a single one. With only one convolutional layer I sometimes cannot even see any accuracy improvement. A deeper stack could look like the sketch below.
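
This is only a sketch: the filter counts and kernel sizes are assumptions, not a tuned architecture, for the same (60, 108, 1) inputs:

    # a minimal sketch of a deeper (3 conv layer) autoencoder;
    # filter counts and kernel sizes are assumed, not tuned
    from keras.layers import Conv2D, MaxPooling2D, UpSampling2D
    import keras.models as models

    deep_ae = models.Sequential()
    # encoder: two downsampling stages plus a bottleneck convolution
    deep_ae.add(Conv2D(32, 3, activation='relu', padding='same', input_shape=(60, 108, 1)))
    deep_ae.add(MaxPooling2D((2, 2), padding='same'))    # -> (30, 54, 32)
    deep_ae.add(Conv2D(64, 3, activation='relu', padding='same'))
    deep_ae.add(MaxPooling2D((2, 2), padding='same'))    # -> (15, 27, 64)
    deep_ae.add(Conv2D(128, 3, activation='relu', padding='same'))
    # decoder: mirror the encoder with upsampling back to (60, 108, 1)
    deep_ae.add(Conv2D(64, 3, activation='relu', padding='same'))
    deep_ae.add(UpSampling2D((2, 2)))                    # -> (30, 54, 64)
    deep_ae.add(Conv2D(32, 3, activation='relu', padding='same'))
    deep_ae.add(UpSampling2D((2, 2)))                    # -> (60, 108, 32)
    deep_ae.add(Conv2D(1, 3, activation='sigmoid', padding='same'))
    deep_ae.compile(loss='binary_crossentropy', optimizer='adam')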

    Update:

    My assumption was that the data did not contain any significant features that an autoencoder, or even a CAE, could distinguish.

    However, my assumption about the 2D nature of the data seems to be confirmed by reaching almost 99.99% validation accuracy:

    [screenshot: training log showing validation accuracy near 99.99%]

    In addition, I recommend using an ensemble of networks. For example, you could train 10 networks on different validation splits and assign each item the class that gets the most votes, as in the sketch below.
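
A minimal sketch of that voting scheme, assuming a hypothetical list `member_models` of already-trained Keras models:

    # majority voting across an ensemble of trained classifiers;
    # `member_models` is a hypothetical list of trained Keras models
    import numpy as np

    def ensemble_predict(member_models, x):
        # each model casts one vote (its argmax class) per sample
        votes = np.array([m.predict(x).argmax(axis=1) for m in member_models])
        n_classes = member_models[0].output_shape[-1]
        # count the votes per class for every sample, then take the winner
        counts = np.apply_along_axis(
            lambda v: np.bincount(v, minlength=n_classes), 0, votes)
        return counts.argmax(axis=0)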

    Here is my code:

    from keras.layers.core import Dense, Dropout, Flatten
    from keras.layers import Conv2D, BatchNormalization
    from keras.callbacks import ModelCheckpoint
    from keras.optimizers import Adam
    from sklearn.model_selection import train_test_split
    import keras.models as models
    import keras.initializers as initializers
    import msgpack
    import numpy as np
    
    with open('SoundDataX.msg', "rb") as fx,open('SoundDataY.msg', "rb") as fy: 
        dataX=msgpack.load(fx) 
        dataY=msgpack.load(fy)
    
    num_samples = len(dataX)
    x = np.empty((num_samples, 60, 108, 1), dtype = np.float32)
    y = np.empty((num_samples, 4), dtype = np.float32)
    
    for i in range(0, num_samples):
        x[i] = np.asanyarray(dataX[i]).reshape(60, 108, 1)
        y[i] = np.asanyarray(dataY[i])
    
    X_train, X_val, y_train, y_val = train_test_split(x, y, test_size=0.2, random_state=43)
    
    #model
    model = models.Sequential()
    model.add(Conv2D(128, 3, activation='relu', padding='same', input_shape=(60, 108, 1)))
    model.add(Conv2D(128, 5, activation='relu', padding='same'))
    model.add(Conv2D(128, 7, activation='relu', padding='same'))
    model.add(Flatten())
    initializer=initializers.TruncatedNormal()
    model.add(Dense(200, activation='relu', kernel_initializer=initializer,bias_initializer=initializer))
    model.add(BatchNormalization())
    model.add(Dropout(0.8))
    model.add(Dense(50, activation='relu', kernel_initializer=initializer,bias_initializer=initializer))
    model.add(Dropout(0.8))
    model.add(Dense(4, activation='softmax', kernel_initializer=initializer,bias_initializer=initializer))
    model.compile(loss='categorical_crossentropy',
                  optimizer=Adam(lr=0.0001),
                  metrics=['accuracy'])
    model.summary()
    filepath="weights-{epoch:02d}-{val_acc:.7f}-{acc:.7f}.hdf5"
    mcp = ModelCheckpoint(filepath, monitor='val_acc', verbose=2, save_best_only=True, mode='max')
    history = model.fit(X_train, y_train, epochs=100, validation_data=(X_val, y_val), batch_size=64, verbose=2, callbacks=[mcp])
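
After training, the best checkpoint written by `ModelCheckpoint` can be restored for evaluation; the filename below is a placeholder for whichever epoch actually scored best:

    # restore the best checkpoint and evaluate on the validation split;
    # the weights filename is hypothetical -- use the one ModelCheckpoint wrote
    model.load_weights("weights-42-0.9999000-0.9999000.hdf5")
    loss, acc = model.evaluate(X_val, y_val, verbose=0)
    print("val loss %.4f, val acc %.4f" % (loss, acc))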