Does TensorFlow "know" when not to put data on the GPU?

tensorboard gpu tensorflow optimization python

harveyslash · Tech Community · 7 years ago

I am trying to use TensorBoard with TensorFlow, and I set it up like this:

rand = tf.placeholder(dtype=tf.float32)    # this will be visualised in tensorboard later on 
tf.summary.image('random_noise_visualisation', rand,max_outputs=5)
merged_summary_op = tf.summary.merge_all() # to me this seems like a helper to 
                                           # merge all tensorboard related operations

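Then I evaluate my merged_summary_op and feed it a very large array, about 1 GB in size.

It does not seem to use any GPU memory beyond what was already in use.

I also tried evaluating my rand placeholder on its own, thinking that summary ops might have special handling that keeps data off the GPU. I did: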

random_value = np.random.randn(3000,224,224,1)
sess.run(rand,feed_dict={rand:random_value})

Again, no extra GPU usage.

However, when I do

sess.run(rand + 2 ,feed_dict={rand:random_value}) # forced to do some calculation

GPU memory usage goes up by about 1 GB.

For all of the experiments above, I created my session as:

sess = tf.InteractiveSession(graph=tf.Graph())

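My questions are:

Does TensorFlow know when not to bother sending a tensor to the GPU?
Does switching from an interactive session to a normal session affect this behaviour?
Is there any specific documentation for this?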

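1 answer | as of 7 years ago

Maxim · 7 years ago

Does TensorFlow know when not to bother sending a tensor to the GPU?

Yes.

In fact, in your first rand experiment TensorFlow figures out that it does not need to bother any device at all, because the requested fetch rand is already in feed_dict . This fairly simple optimization can be seen in session.py :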

self._final_fetches = [x for x in self._fetches if x not in feeds]

... and later on in the same file:

# We only want to really perform the run if fetches or targets are provided,
# or if the call is a partial run that specifies feeds.
if final_fetches or final_targets or (handle and feed_dict_tensor):
  results = self._do_run(handle, final_targets, final_fetches,
                         feed_dict_tensor, options, run_metadata)
else:
  results = []
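
As a rough illustration of that short-circuit, here is a minimal pure-Python sketch. It is not TensorFlow's code: plan_run and the string stand-ins for tensors are hypothetical names introduced only for this example, and the partial-run case (handle and feed_dict_tensor) is left out. It mirrors the filtering shown above: fetches that are already fed are dropped, and a real run happens only if something is left over.

```python
def plan_run(fetches, feeds, targets=()):
    """Toy model of the fetch filtering in session.py (names are hypothetical).

    Returns (final_fetches, will_run): the fetches that still need the graph,
    and whether a real run would be triggered at all.
    """
    # Fetches that are also fed can be answered straight from feed_dict.
    final_fetches = [x for x in fetches if x not in feeds]
    # A real run happens only if some fetch or target is left over.
    will_run = bool(final_fetches or targets)
    return final_fetches, will_run


feeds = {"rand": "<large array>"}

# Experiment 1: fetching the placeholder itself, which is already fed.
print(plan_run(["rand"], feeds))      # ([], False)

# Experiment 2: fetching a derived tensor, which is not in the feeds.
print(plan_run(["rand + 2"], feeds))  # (['rand + 2'], True)
```

In the first call nothing is left to compute, so no device is involved; in the second call the graph has to be evaluated.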

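The second experiment does not fall under this optimization, so the graph is really evaluated. TensorFlow pinned the placeholder to the available GPU, and hence the addition as well, which explains the GPU usage.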
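
A back-of-the-envelope check of the numbers, using the array shape from the question (3000 x 224 x 224 x 1). The sketch assumes the fed array is converted to the placeholder's float32 dtype and that the result of rand + 2 needs a second buffer of the same size. Both are assumptions about what ends up on the device, not measurements.

```python
shape = (3000, 224, 224, 1)

elements = 1
for dim in shape:
    elements *= dim

bytes_float64 = elements * 8  # np.random.randn returns float64 on the host
bytes_float32 = elements * 4  # the placeholder was declared as tf.float32

GIB = 1024 ** 3
print(elements)                           # 150528000
print(round(bytes_float64 / GIB, 2))      # 1.12
print(round(bytes_float32 / GIB, 2))      # 0.56
print(round(2 * bytes_float32 / GIB, 2))  # 1.12 (input plus output buffer, assumed)
```

Either way the order of magnitude is consistent with the roughly 1 GB increase reported in the question.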

This can be seen clearly if you run the session with log_device_placement=True :

with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
  random_value = np.random.randn(300,224,224,1)
  print(sess.run(rand + 2, feed_dict={rand: random_value}).shape)

As for the image summary op, it is indeed special: the ImageSummary op has no GPU implementation. Here is the source code ( core/kernels/summary_image_op.cc ):

REGISTER_KERNEL_BUILDER(Name("ImageSummary").Device(DEVICE_CPU),
                        SummaryImageOp);

Hence, if you try to place it on the GPU manually, session.run() will raise an error:

# THIS FAILS!
with tf.device('/gpu:0'):
  tf.summary.image('random_noise_visualisation', rand,max_outputs=5)
  merged_summary_op = tf.summary.merge_all() # to me this seems like a helper to
                                             # merge all tensorboard related operations

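This seems reasonable, since summary ops do not perform any complex computation and mostly deal with disk I/O.

ImageSummary is not the only CPU-only op; all summary ops are, for example. There is a related GitHub issue , but currently there is no better way to check whether a particular op is supported on the GPU than to check the source code.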

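In general, TensorFlow tries to use as much of the available resources as possible, so when GPU placement is possible and no other constraints apply, the engine tends to prefer the GPU over the CPU.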
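
The placement rules described in this answer can be sketched with a toy model. This is not TensorFlow's placer: KERNELS, place and their parameters are hypothetical names for illustration, and the "Add" entry is only assumed to have both CPU and GPU kernels. The soft_placement flag stands in for the real allow_soft_placement option of tf.ConfigProto, which lets TensorFlow fall back to a device that does have a kernel instead of raising an error.

```python
# Toy kernel registry: op name -> devices that have a registered kernel.
# ImageSummary is CPU-only, as in the REGISTER_KERNEL_BUILDER line quoted
# in the answer; "Add" is assumed to have both kernels for this example.
KERNELS = {
    "ImageSummary": {"CPU"},
    "Add": {"CPU", "GPU"},
}


def place(op, requested=None, soft_placement=False, gpu_available=True):
    """Pick a device for `op` in this toy model (not TensorFlow's placer)."""
    supported = KERNELS[op]
    if requested is not None:
        if requested in supported:
            return requested
        if soft_placement:
            # Fall back to a device that does have a kernel.
            return "CPU"
        raise ValueError(f"no {requested} kernel registered for {op}")
    # No explicit request: prefer the GPU when a kernel and a device exist.
    if gpu_available and "GPU" in supported:
        return "GPU"
    return "CPU"


print(place("Add"))                                       # GPU
print(place("ImageSummary"))                              # CPU
print(place("ImageSummary", "GPU", soft_placement=True))  # CPU
try:
    place("ImageSummary", "GPU")
except ValueError as err:
    print(err)  # no GPU kernel registered for ImageSummary
```

The last call corresponds to the failing tf.device('/gpu:0') snippet: an explicit GPU request for an op that only has a CPU kernel.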

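Does switching from an interactive session to a normal session affect this behaviour?

No. InteractiveSession does not affect the device placement logic. The only difference is that an InteractiveSession installs itself as the default session on creation, whereas a Session is the default only inside a with block.

Is there any specific documentation for this?

I may be wrong here, but probably not. For me, the best source of truth is the source code.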