CNN output does not change much with the input

reinforcement-learning pytorch conv-neural-network deep-learning python

nsidn98 · 5 years ago

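State definition (2 inputs):

Input 1: a [1,1,90,128] image with a maximum pixel value of 45.

Input 2: a [1,1,45,80] image with a maximum pixel value of 45.

Expected output of the actor: [x,y], a 2-dimensional vector that depends on the state. Here x is expected to lie in [0,160] and y in [0,112].

I tried several different modifications of the inputs:

1: Feeding the two images as they are.

2: Feeding the two images normalized as (img/45), so that pixel values lie in [0,1].

3: Feeding the two images normalized as 2*((img/45)-0.5), so that pixel values lie in [-1,1].

4: Feeding the two images normalized as (img-mean)/std.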
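
The four input variants listed above can be sketched in NumPy as follows. This is only an illustration: the function names, the `eps` guard against a zero standard deviation, and the synthetic sparse image are assumptions, not part of the original code.

```python
import numpy as np

MAX_PIXEL = 45.0  # maximum pixel value stated in the question

def scale_unit(img):
    """Variant 2: map pixel values to [0, 1]."""
    return img / MAX_PIXEL

def scale_symmetric(img):
    """Variant 3: map pixel values to [-1, 1]."""
    return 2.0 * (img / MAX_PIXEL - 0.5)

def standardize(img, eps=1e-8):
    """Variant 4: zero mean, unit variance per image (eps is an added guard)."""
    return (img - img.mean()) / (img.std() + eps)

# Synthetic stand-in for a sparse state image (assumed, not the real data).
rng = np.random.default_rng(0)
img = np.zeros((90, 128), dtype=np.float32)
mask = rng.random(img.shape) < 0.05
img[mask] = rng.integers(1, 46, size=mask.sum())

unit = scale_unit(img)
symmetric = scale_symmetric(img)
standardized = standardize(img)
```

Variant 1 is simply `img` itself. Note that for a sparse image, variants 3 and 4 turn the zero background into a nonzero constant.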

Result: the output of the CNN stays almost the same.

The code for the actor definition is given below.

import numpy as np
import pandas as pd
from tqdm import tqdm
import time
import cv2
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
from torch.autograd import Variable
import torch.nn.functional as F


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class Actor(nn.Module):
    def __init__(self, action_dim, max_action):
        super(Actor,self).__init__()
        # state image [1,1,90,128]
        self.conv11 = nn.Conv2d(1,16,5)
        self.conv11_bn = nn.BatchNorm2d(16)
        self.conv12 = nn.Conv2d(16,16,5)
        self.conv12_bn = nn.BatchNorm2d(16)
        self.fc11 = nn.Linear(19*29*16,500)
        # dim image [1,1,45,80]
        self.conv21 = nn.Conv2d(1,16,5) 
        self.conv21_bn = nn.BatchNorm2d(16)
        self.conv22 = nn.Conv2d(16,16,5)
        self.conv2_bn = nn.BatchNorm2d(16)
        self.fc21 = nn.Linear(8*17*16,250)
        # common pool
        self.pool  = nn.MaxPool2d(2,2)
        # after concatenation
        self.fc2 = nn.Linear(750,100)
        self.fc3 = nn.Linear(100,10)
        self.fc4 = nn.Linear(10,action_dim)
        self.max_action = max_action

    def forward(self,x,y):
        # state image
        x = self.conv11_bn(self.pool(F.relu(self.conv11(x))))
        # each conv block is followed by its own batch norm layer
        x = self.conv12_bn(self.pool(F.relu(self.conv12(x))))
        x = x.view(-1,19*29*16)
        x = F.relu(self.fc11(x))
        # state dim
        y = self.conv21_bn(self.pool(F.relu(self.conv21(y))))
        y = self.conv2_bn(self.pool(F.relu(self.conv22(y))))
        y = y.view(-1,8*17*16)
        y = F.relu(self.fc21(y))
        # concatenate
        z = torch.cat((x,y),dim=1)
        z = F.relu(self.fc2(z))
        z = F.relu(self.fc3(z))
        z = self.max_action*torch.tanh(self.fc4(z))
        return z

# to read different sample states for testing
obs = []
for i in range(200):
    obs.append(np.load('eval_episodes/obs_'+str(i)+'.npy',allow_pickle=True))

obs = np.array(obs)

def tensor_from_numpy(state):
    # to add dimensions to tensor to make it [batch_size,channels,height,width] 
    state_img = state
    state_img = torch.from_numpy(state_img).float()
    state_img = state_img[np.newaxis, :]
    state_img = state_img[np.newaxis, :].to(device)
    return state_img


# keep the network and max_action on the same device as the inputs
actor = Actor(2,torch.FloatTensor([160,112]).to(device)).to(device)
for i in range(20):
    a = tensor_from_numpy(obs[i][0])
    b = tensor_from_numpy(obs[i][2])    
    print(actor(a,b))
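
The flattened sizes passed to fc11 and fc21 (19*29*16 and 8*17*16) can be checked with the usual output-size arithmetic for an unpadded 5x5 convolution followed by 2x2 max pooling. The sketch below is plain Python; the helper names are hypothetical.

```python
def conv_out(size, kernel=5, stride=1, padding=0):
    """Output length of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Output length of max pooling along one spatial dimension."""
    return (size - kernel) // stride + 1

def branch_shape(height, width, n_blocks=2):
    """Spatial shape after n_blocks of conv(5) followed by pool(2,2)."""
    for _ in range(n_blocks):
        height = pool_out(conv_out(height))
        width = pool_out(conv_out(width))
    return height, width

state_hw = branch_shape(90, 128)  # state image branch
dim_hw = branch_shape(45, 80)     # dim image branch
print(state_hw, dim_hw)  # (19, 29) (8, 17)
```

Both results match the sizes hard-coded in the listing, so the view calls are consistent with the stated input shapes.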

Output of the actor test loop above:

tensor([[28.8616,  3.0934]], grad_fn=<MulBackward0>)
tensor([[27.4125,  3.2864]], grad_fn=<MulBackward0>)
tensor([[28.2210,  2.6859]], grad_fn=<MulBackward0>)
tensor([[27.6312,  3.9528]], grad_fn=<MulBackward0>)
tensor([[25.9290,  4.2942]], grad_fn=<MulBackward0>)
tensor([[26.9652,  4.5730]], grad_fn=<MulBackward0>)
tensor([[27.1342,  2.9612]], grad_fn=<MulBackward0>)
tensor([[27.6494,  4.2218]], grad_fn=<MulBackward0>)
tensor([[27.3122,  1.9945]], grad_fn=<MulBackward0>)
tensor([[29.6915,  1.9938]], grad_fn=<MulBackward0>)
tensor([[28.2001,  2.5967]], grad_fn=<MulBackward0>)
tensor([[26.8502,  4.4917]], grad_fn=<MulBackward0>)
tensor([[28.6489,  3.2022]], grad_fn=<MulBackward0>)
tensor([[28.1455,  2.7610]], grad_fn=<MulBackward0>)
tensor([[27.2369,  3.4243]], grad_fn=<MulBackward0>)
tensor([[25.9513,  5.3057]], grad_fn=<MulBackward0>)
tensor([[28.1400,  3.3242]], grad_fn=<MulBackward0>)
tensor([[28.2049,  2.6622]], grad_fn=<MulBackward0>)
tensor([[26.7446,  2.5966]], grad_fn=<MulBackward0>)
tensor([[25.3867,  5.0346]], grad_fn=<MulBackward0>)
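
To put a number on how little the output moves, the spread of the 20 printed actions can be compared with the full action range. A small NumPy sketch using the values printed above; the variable names are illustrative.

```python
import numpy as np

# The 20 actions printed by the test loop.
outputs = np.array([
    [28.8616, 3.0934], [27.4125, 3.2864], [28.2210, 2.6859], [27.6312, 3.9528],
    [25.9290, 4.2942], [26.9652, 4.5730], [27.1342, 2.9612], [27.6494, 4.2218],
    [27.3122, 1.9945], [29.6915, 1.9938], [28.2001, 2.5967], [26.8502, 4.4917],
    [28.6489, 3.2022], [28.1455, 2.7610], [27.2369, 3.4243], [25.9513, 5.3057],
    [28.1400, 3.3242], [28.2049, 2.6622], [26.7446, 2.5966], [25.3867, 5.0346],
])
max_action = np.array([160.0, 112.0])

spread = outputs.max(axis=0) - outputs.min(axis=0)  # per-dimension range
relative_spread = spread / max_action               # fraction of action range
print(spread, relative_spread)
```

For these 20 states, both coordinates cover less than 3% of their expected range.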

The states (.npy files) can be found here. For different states, the actions should vary over [0-160, 0-112], but the outputs here vary only slightly.
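
One detail that may be worth checking against the expected ranges: the final layer in the listing computes max_action*tanh(...), and tanh lies in [-1,1], so the output spans [-160,160] and [-112,112] rather than [0,160] and [0,112]. Below is a minimal sketch of one possible remapping, assuming a shifted tanh is acceptable for the task. The helper name `squash_to_range` is hypothetical and not part of the original code.

```python
import torch

def squash_to_range(raw: torch.Tensor, max_action: torch.Tensor) -> torch.Tensor:
    """Map unbounded network outputs to [0, max_action] per dimension."""
    return 0.5 * max_action * (torch.tanh(raw) + 1.0)

max_action = torch.tensor([160.0, 112.0])
raw = torch.tensor([[-10.0, -10.0], [0.0, 0.0], [10.0, 10.0]])
scaled = squash_to_range(raw, max_action)
```

With this mapping a raw output of zero lands at the centre of the range, [80, 56], instead of at the edge.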

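Note: the input images are sparse to begin with (there are many zeros in the images).

Is there a problem with the state pixel values or with the network definition?

EDIT: I think the problem has to do with the normalization or sparsity of the inputs, because I tried the same network in TensorFlow and ran into the same problem.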


nsidn98 · 5 years ago

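The problem was inappropriate weight initialization. I used Gaussian initialization with a standard deviation twice the default value. This helped the network give different outputs for different inputs. However, after a few episodes of training the actor started giving the same values again, which was due to the critic network becoming saturated.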
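
The answer does not say which "default" is meant, so the following is only one possible reading. It measures the standard deviation of each layer's weights as PyTorch initializes them and redraws them from a Gaussian with twice that value. The helper name `gaussian_reinit` and the `scale` parameter are assumptions.

```python
import torch
import torch.nn as nn

def gaussian_reinit(module: nn.Module, scale: float = 2.0) -> None:
    """Redraw conv/linear weights from N(0, (scale * current_std)^2)."""
    for m in module.modules():
        if isinstance(m, (nn.Conv2d, nn.Linear)):
            with torch.no_grad():
                # std of the weights as left by PyTorch's default init
                current_std = m.weight.std().item()
                m.weight.normal_(mean=0.0, std=scale * current_std)

torch.manual_seed(0)
layer = nn.Linear(750, 100)  # same shape as fc2 in the question
std_before = layer.weight.std().item()
gaussian_reinit(layer)
std_after = layer.weight.std().item()
```

Applied to the network from the question, this would be called as `gaussian_reinit(actor)` right after constructing the actor and before any training.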