
For the same network, loss, and initialization, TensorFlow consistently achieves a smaller error than the PyTorch implementation

    Vlad · Tech Community · 5 years ago

    … 9% faster!). Both were optimized with plain SGD. Even when given exactly the same initialization values, TensorFlow still ends up with a smaller error. These are typical curves from the simulation: the red curve is the TensorFlow test error, the orange curve is the PyTorch test error. Both were initialized with exactly the same values.

    The full code is available here, on GitHub.
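
    For reference, here is a minimal sketch of how a shared initialization dictionary of this kind might be built (the `make_shared_init` helper and the TF-style kernel shapes are my assumptions, not code from the linked repo):

    import numpy as np

    def make_shared_init(seed=0):
        # Hypothetical helper: draw Xavier-uniform weights once in NumPy so
        # both frameworks can be fed identical values.
        rng = np.random.RandomState(seed)

        def xavier(shape, fan_in, fan_out):
            limit = np.sqrt(6.0 / (fan_in + fan_out))
            return rng.uniform(-limit, limit, size=shape).astype(np.float32)

        return {
            # TF conv kernel layout: [kH, kW, in, out]
            'conv1': xavier([5, 5, 3, 6], fan_in=5 * 5 * 3, fan_out=5 * 5 * 6),
            'conv2': xavier([3, 3, 6, 12], fan_in=3 * 3 * 6, fan_out=3 * 3 * 12),
            # TF dense kernel layout: [in, out]
            'logits': xavier([432, 10], fan_in=432, fan_out=10),
        }

    The same dict would then be passed as `init` to both models below.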

    TF network implementation:

    # TF 1.x API (tf.layers / tf.contrib); `dropout_rate` is assumed to be
    # defined at module scope, as in the full script.
    import tensorflow as tf

    def tf_model(graph, init=None):
        with graph.as_default():
            if init:
                conv1_init = init['conv1']
                conv2_init = init['conv2']
                logits_init = init['logits']
                conv1_init = tf.constant_initializer(conv1_init)
                conv2_init = tf.constant_initializer(conv2_init)
                logits_init = tf.constant_initializer(logits_init)
            else:
                conv1_init = tf.contrib.layers.xavier_initializer()
                conv2_init = tf.contrib.layers.xavier_initializer()
                logits_init = tf.contrib.layers.xavier_initializer()
    
            with tf.name_scope('Input'):
                x = tf.placeholder(tf.float32, shape=[None, 32, 32, 3], name='x')
                y = tf.placeholder(tf.int32, shape=[None], name='y')
                keep_prob = tf.placeholder_with_default(1.0 - dropout_rate, shape=())
            with tf.device('/device:GPU:0'):
                with tf.name_scope('conv1'):
                    conv1 = tf.layers.conv2d(x,
                                             filters=6,
                                             kernel_size=5,
                                             strides=1,
                                             padding='valid',
                                             kernel_initializer=conv1_init,
                                             bias_initializer=tf.initializers.zeros,
                                             activation=tf.nn.relu,
                                             name='conv1'
                                             )
    
    
                    max_pool1 = tf.nn.max_pool(value=conv1,
                                               ksize=(1, 2, 2, 1),
                                               strides=(1, 2, 2, 1),
                                               padding='SAME',
                                               name='max_pool1')
    
                    dropout1 = tf.nn.dropout(max_pool1, keep_prob=keep_prob)
    
                with tf.name_scope('conv2'):
                    conv2 = tf.layers.conv2d(dropout1,
                                             filters=12,
                                             kernel_size=3,
                                             strides=1,
                                             padding='valid',
                                             bias_initializer=tf.initializers.zeros,
                                             activation=tf.nn.relu,
                                             kernel_initializer=conv2_init,
                                             name='conv2')
    
                    max_pool2 = tf.nn.max_pool(value=conv2,
                                               ksize=(1, 2, 2, 1),
                                               strides=(1, 2, 2, 1),
                                               padding='VALID',
                                               name='max_pool2')
    
                    dropout2 = tf.nn.dropout(max_pool2, keep_prob=keep_prob)
    
                with tf.name_scope('logits'):
                    # Flatten dropout2 (not max_pool2) so that, as in the
                    # PyTorch model, dropout is applied after the second pool.
                    flatten = tf.layers.Flatten()(dropout2)
                    logits = tf.layers.dense(flatten,
                                             units=10,
                                             kernel_initializer=logits_init,
                                             bias_initializer=tf.initializers.zeros,
                                             name='logits')
    
        return x, y, keep_prob, logits
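
    For context, a plain-SGD training step for this graph might look like the sketch below; the loss/optimizer wiring is my assumption (the question only states that SGD was used):

    # Hedged sketch: cross-entropy loss plus plain SGD on the graph above
    graph = tf.Graph()
    x, y, keep_prob, logits = tf_model(graph)
    with graph.as_default():
        loss = tf.losses.sparse_softmax_cross_entropy(labels=y, logits=logits)
        train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
        init_op = tf.global_variables_initializer()

    with tf.Session(graph=graph) as sess:
        sess.run(init_op)
        # one step per batch (x_batch: [N, 32, 32, 3] NHWC, y_batch: [N]):
        # sess.run([train_op, loss], feed_dict={x: x_batch, y: y_batch})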
    

    PyTorch network implementation:

    import numpy as np
    import torch
    import torch.nn as nn

    class TorchModel(nn.Module):
        def __init__(self, dropout_rate=0.0, init=None):
            super(TorchModel, self).__init__()
    
            self.conv1 = nn.Sequential(
                nn.Conv2d(in_channels=3,
                          out_channels=6,
                          kernel_size=5,
                          padding=0,
                          bias=True),
                nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Dropout(p=dropout_rate))
            if init:
                conv1_init = init['conv1']
                # assumes init['conv1'] is already in PyTorch layout
                # [out, in, kH, kW]; a TF-layout [kH, kW, in, out] kernel
                # would need a permute first (see the note below the code)
                self.conv1[0].weight = nn.Parameter(torch.FloatTensor(conv1_init))
            else:
                torch.nn.init.xavier_uniform_(self.conv1[0].weight)
            torch.nn.init.zeros_(self.conv1[0].bias)
            self.conv2 = nn.Sequential(
                nn.Conv2d(in_channels=6,
                          out_channels=12,
                          kernel_size=3,
                          bias=True),
                nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Dropout(p=dropout_rate))
            if init:
                conv2_init = init['conv2']
                self.conv2[0].weight = nn.Parameter(torch.FloatTensor(conv2_init))
            else:
                torch.nn.init.xavier_uniform_(self.conv2[0].weight)
            torch.nn.init.zeros_(self.conv2[0].bias)
    
            self.logits = nn.Linear(432, 10)
            if init:
                logits_init = init['logits']
                # assumes init['logits'] can simply be reshaped to [10, 432];
                # a TF-layout [432, 10] kernel would need a transpose instead,
                # and the flatten order differs too (TF: NHWC, PyTorch: NCHW)
                logits_init = np.reshape(logits_init, [10, 432])
                self.logits.weight = nn.Parameter(torch.FloatTensor(logits_init))
            else:
                torch.nn.init.xavier_uniform_(self.logits.weight)
            torch.nn.init.zeros_(self.logits.bias)
    
        def forward(self, x):
            x = self.conv1(x)
            x = self.conv2(x)
            x = x.view(x.size(0), -1)
            x = self.logits(x)
            return x
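
    The matching PyTorch step, again only as a sketch (the optimizer/loss code below is my assumption, written to mirror the TF setup above):

    # Hedged sketch: plain SGD with cross-entropy, mirroring the TF version
    model = TorchModel(dropout_rate=0.0)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    criterion = nn.CrossEntropyLoss()

    def train_step(x_batch, y_batch):
        # x_batch: [N, 3, 32, 32] NCHW float tensor; y_batch: [N] int64 labels
        optimizer.zero_grad()
        loss = criterion(model(x_batch), y_batch)
        loss.backward()
        optimizer.step()
        return loss.item()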
    

    I should also add that the reason I am asking is that for the different optimization algorithms I am currently testing (considerably more complex, which is why I give a simple SGD example here), the situation is exactly reversed: PyTorch consistently beats TF on minimal error.

    I would be glad to hear your thoughts. Have you had the same experience? Am I missing something in my SGD setups in TensorFlow and PyTorch?
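
    One thing worth double-checking when feeding "exactly the same values" to both frameworks is the weight layout: TF stores conv kernels as [kH, kW, in, out] and dense kernels as [in, out], while PyTorch uses [out, in, kH, kW] and [out, in]. Here is a sketch of the conversions involved, assuming the init arrays are stored in TF layout (`tf_init_to_torch` is a hypothetical helper, not code from the question):

    def tf_init_to_torch(init):
        # conv kernels: TF [kH, kW, in, out] -> PyTorch [out, in, kH, kW]
        return {
            'conv1': np.transpose(init['conv1'], (3, 2, 0, 1)),
            'conv2': np.transpose(init['conv2'], (3, 2, 0, 1)),
            # dense kernel: TF [in, out] -> PyTorch [out, in]; this is a
            # transpose, not a reshape
            'logits': np.transpose(init['logits'], (1, 0)),
        }

    Strictly speaking, the 432 flattened inputs are also ordered differently (TF flattens NHWC activations, PyTorch flattens NCHW), so the input axis of the dense weight would need the matching permutation on top of the transpose; without it, the two models start from different functions even though the raw numbers are identical.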

    0 replies  |  active 5 years ago