
Python socket buffering

buffering sockets python

Bastien Léonard · Tech Community · 15 years ago

Suppose I want to read a line from a socket using the standard socket module:

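def read_line(s):
    ret = b''  # Python 3: recv() returns bytes, not str

    while True:
        c = s.recv(1)  # ask the socket for a single byte

        if c == b'\n' or c == b'':  # newline, or b'' once the peer closes
            break
        else:
            ret += c

    return ret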

What exactly happens in s.recv(1)? Does it issue a system call every time? I guess I should add some buffering anyway:

"For best match with hardware and network realities, the value of bufsize should be a relatively small power of 2, for example, 4096."

http://docs.python.org/library/socket.html#socket.socket.recv
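A minimal sketch of that buffering in Python 3, assuming one reader object per socket. LineReader and its read_line() method are hypothetical names, not part of the socket module. It calls recv(bufsize) only when the leftover buffer holds no complete line, so most lines cost no system call at all. It is deliberately not thread-safe.

```python
import socket


class LineReader:
    """Hypothetical buffered line reader over one socket (not thread-safe)."""

    def __init__(self, sock, bufsize=4096):
        self.sock = sock
        self.bufsize = bufsize  # a small power of 2, as the docs suggest
        self.buf = b""          # bytes received but not yet returned

    def read_line(self):
        """Return the next line without its newline, or None at EOF."""
        while True:
            i = self.buf.find(b"\n")
            if i >= 0:
                # A complete line is already buffered: no system call needed
                line, self.buf = self.buf[:i], self.buf[i + 1:]
                return line
            chunk = self.sock.recv(self.bufsize)
            if not chunk:
                # Peer closed: return a final partial line once, then None
                line, self.buf = self.buf, b""
                return line if line else None
            self.buf += chunk


a, b = socket.socketpair()
b.sendall(b"hello\nworld\npartial")
b.close()
reader = LineReader(a, bufsize=4)
while (line := reader.read_line()) is not None:
    print(line)
a.close()
```

Returning None at EOF, rather than an empty string, keeps an empty line distinguishable from a closed connection.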

But writing efficient, thread-safe buffering doesn't seem easy. What if I use file.readline()?

# does this work well, is it efficiently buffered?
s.makefile().readline()
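For reference, a small Python 3 sketch of the makefile() approach, using socket.socketpair() as a stand-in for a real connection (an assumption for illustration). Binary mode ("rb") gives a buffered reader whose readline() keeps the trailing b"\n" and returns b"" at EOF; read_all_lines() is a hypothetical helper name.

```python
import socket


def read_all_lines(sock):
    """Collect lines from sock through a buffered file object until EOF."""
    lines = []
    # "rb": binary buffered reader; text mode ("r") would decode to str
    with sock.makefile("rb") as f:
        while True:
            line = f.readline()  # includes the trailing b"\n" if present
            if not line:         # b"" means the peer closed the connection
                break
            lines.append(line)
    return lines


a, b = socket.socketpair()
b.sendall(b"first\nsecond\nno newline at end")
b.close()
print(read_all_lines(a))  # [b'first\n', b'second\n', b'no newline at end']
a.close()
```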

3 answers | last activity 10 years ago

Joe Koberg · 15 years ago

The recv() call is handled directly by calling the C library function.

It will block waiting for the socket to have data. In reality it just lets the recv() system call block.
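A short Python 3 illustration of that blocking behaviour, assuming a local socketpair() on Linux/Unix: recv() returns as soon as any data is available (possibly fewer bytes than requested) and otherwise waits inside the system call. settimeout() is a standard way to bound that wait.

```python
import socket

a, b = socket.socketpair()
b.sendall(b"hi")

# Data is already waiting: recv() returns at once with what is available,
# even though up to 4096 bytes were requested.
print(a.recv(4096))  # b'hi'

# Nothing waiting: recv() blocks in the system call. A timeout turns
# "block forever" into "block at most 0.2 s, then raise socket.timeout".
a.settimeout(0.2)
try:
    a.recv(4096)
except socket.timeout:
    print("recv() timed out: no data arrived")

a.close()
b.close()
```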

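file.readline() is an efficient buffered implementation. It is not thread-safe, because it presumes it is the only one reading the file (for example, by buffering upcoming input).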

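If you are using the file object, every time read() is called with a positive argument, the underlying code will recv() only the amount of data requested, unless it is already buffered.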

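It would be buffered if:

you had called readline(), which reads a full buffer, and
the end of the line came before the end of the buffer,

thus leaving data in the buffer. Otherwise the buffer is generally not overfilled.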
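The "data left in the buffer" case matters if you mix the file object with direct recv() calls. A small Python 3 demonstration, assuming a Unix socketpair() where both lines arrive in a single read: after one readline(), the second line sits in the file object's buffer, so the raw socket has nothing left to hand out. leftover_demo() is a hypothetical name.

```python
import socket


def leftover_demo():
    a, b = socket.socketpair()
    f = a.makefile("rb")
    try:
        b.sendall(b"one\ntwo\n")
        first = f.readline()   # one raw read pulls both lines into f's buffer
        a.setblocking(False)
        try:
            a.recv(4096)       # ask the socket directly...
            socket_had_data = True
        except BlockingIOError:
            socket_had_data = False  # ...but readline() already consumed it
        second = f.readline()  # served from f's buffer, no recv() needed
        return first, socket_had_data, second
    finally:
        f.close()
        a.close()
        b.close()


print(leftover_demo())  # (b'one\n', False, b'two\n')
```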

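The goal of the question is not clear. If you need to see whether data is available before reading, you can select() or set the socket to non-blocking mode with s.setblocking(False). A non-blocking recv() with no waiting data then raises an exception (socket.error with EAGAIN/EWOULDBLOCK in Python 2, BlockingIOError in Python 3) instead of blocking; an empty result still means the peer closed the connection.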
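Both options sketched in Python 3, assuming a local socketpair(); try_read() and try_read_nonblocking() are hypothetical helper names. select() reports readability without consuming anything, while the non-blocking variant signals "nothing yet" through BlockingIOError.

```python
import select
import socket


def try_read(sock, timeout=0.0):
    """Return available bytes, or None if nothing is readable within timeout.

    A readable socket returns from recv() at once; b"" there means EOF.
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None
    return sock.recv(4096)


def try_read_nonblocking(sock):
    """Same idea via a non-blocking socket: no data raises, it does not return b""."""
    was_blocking = sock.getblocking()
    sock.setblocking(False)
    try:
        return sock.recv(4096)
    except BlockingIOError:
        return None  # nothing waiting right now
    finally:
        sock.setblocking(was_blocking)


a, b = socket.socketpair()
print(try_read(a))              # None: nothing sent yet
b.sendall(b"ping")
print(try_read(a, 0.5))         # b'ping'
print(try_read_nonblocking(a))  # None: already consumed
a.close()
b.close()
```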

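Are you reading a file or socket from multiple threads? I would have a single worker read the socket and put the received items into a queue for the other threads to handle.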
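A sketch of that single-reader design in Python 3, assuming socketpair() stands in for the real connection and a made-up sentinel object (_EOF) marks end of stream. One thread owns the socket and its buffered file object; consumer threads only touch the queue.Queue, which does its own locking, so the non-thread-safe readline() is never shared.

```python
import queue
import socket
import threading

_EOF = object()  # hypothetical sentinel marking end of stream


def reader(sock, q):
    """Sole owner of the socket: read lines and hand them to other threads."""
    with sock.makefile("rb") as f:
        for line in f:  # buffered readline() under the hood
            q.put(line)
    q.put(_EOF)


def worker(q, results):
    """Consumer thread: never touches the socket, only the queue."""
    while True:
        item = q.get()
        if item is _EOF:
            q.put(_EOF)  # pass EOF on so any other workers stop too
            break
        results.append(item.rstrip(b"\n"))


a, b = socket.socketpair()
q = queue.Queue()
results = []
threads = [threading.Thread(target=reader, args=(a, q)),
           threading.Thread(target=worker, args=(q, results))]
for t in threads:
    t.start()
b.sendall(b"alpha\nbeta\ngamma\n")
b.close()
for t in threads:
    t.join()
a.close()
print(results)  # [b'alpha', b'beta', b'gamma']
```

With several workers, lines are still each delivered exactly once, but processing order across workers is no longer guaranteed.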

I'd suggest consulting the Python Socket Module source and the C Source that makes the system calls.

Mathieu Rodic · 10 years ago

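If you care about performance and control the socket completely (for example, you are not passing it into a library), then try implementing your own buffering in Python; Python's string.find, string.split and the like can be amazingly fast.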

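def linesplit(sock):
    # Python 3: recv() returns bytes, so split on b"\n" rather than "\n"
    buffer = sock.recv(4096)
    buffering = True
    while buffering:
        if b"\n" in buffer:
            # Hand out one complete line, keep the rest for later
            (line, buffer) = buffer.split(b"\n", 1)
            yield line + b"\n"
        else:
            # No complete line buffered yet: pull another chunk
            more = sock.recv(4096)
            if not more:
                buffering = False  # peer closed the connection
            else:
                buffer += more
    if buffer:
        yield buffer  # final line without a trailing newline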

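If you expect the payload to consist of lines that are not too long, this should run pretty fast and avoid jumping through too many layers of function calls unnecessarily. I'd be interested to know how it compares with file.readline() or with socket.recv(1).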
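One way to answer that comparison yourself is a small timing harness. This sketch assumes a local socketpair(), a sender thread, and a made-up workload (N_LINES lines of 61 bytes); it restates linesplit() for bytes so the block is self-contained. It prints whatever your machine measures, and loopback timings will not reflect real network latency.

```python
import socket
import threading
import time

LINE = b"x" * 60 + b"\n"  # hypothetical workload: 61-byte lines
N_LINES = 5000


def linesplit(sock):
    # Equivalent to the linesplit() generator above, for bytes
    buffer = sock.recv(4096)
    while True:
        if b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            yield line + b"\n"
        else:
            more = sock.recv(4096)
            if not more:
                break
            buffer += more
    if buffer:
        yield buffer


def count_recv1(sock):
    # One recv() system call per byte, as in the question's read_line()
    n = 0
    while True:
        c = sock.recv(1)
        if not c:
            return n
        if c == b"\n":
            n += 1


def count_makefile(sock):
    with sock.makefile("rb") as f:
        return sum(1 for _ in f)  # iterating the file calls readline()


def count_linesplit(sock):
    return sum(1 for _ in linesplit(sock))


def time_method(count_lines):
    reader, writer = socket.socketpair()

    def send():
        writer.sendall(LINE * N_LINES)
        writer.close()  # EOF for the reader

    t = threading.Thread(target=send)
    t.start()
    start = time.perf_counter()
    n = count_lines(reader)
    elapsed = time.perf_counter() - start
    t.join()
    reader.close()
    return n, elapsed


for fn in (count_recv1, count_makefile, count_linesplit):
    n, secs = time_method(fn)
    print(f"{fn.__name__}: {n} lines in {secs:.4f} s")
```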

alex · 12 years ago

def buffered_readlines(pull_next_chunk, buf_size=4096):
    """
    pull_next_chunk is a callable that accepts one positional argument,
    max_len (e.g. sock.recv or open(path).read), and returns a str or bytes
    chunk of at most max_len items, or an empty one when nothing is left
    to read. Lines are yielded without their trailing newline.

    >>> for line in buffered_readlines(sock.recv, 16384):
    ...     print(line)

    The following code won't read the whole file into memory before
    splitting it into lines, as the file's .readlines() method does.
    It also won't block until a FIFO file is closed.

    >>> for line in buffered_readlines(open('huge_file').read):
    ...     pass  # process it on a per-line basis
    """
    chunks = []  # pieces of the line currently being assembled
    while True:
        chunk = pull_next_chunk(buf_size)
        if not chunk:
            # EOF: flush the last line if it had no trailing newline
            if chunks:
                yield chunks[0][:0].join(chunks)
            break
        # b'\n' for sockets (bytes), '\n' for text files (str)
        nl = b'\n' if isinstance(chunk, bytes) else '\n'
        if nl not in chunk:
            chunks.append(chunk)
            continue
        parts = chunk.split(nl)
        # The first piece completes the line accumulated so far
        yield chunk[:0].join(chunks + [parts[0]])
        # Middle pieces are complete lines on their own
        for line in parts[1:-1]:
            yield line
        # The last piece (empty if the chunk ended in a newline) starts the next line
        chunks = [parts[-1]] if parts[-1] else []