How to read one line with urllib.request

urllib python-3.x python

Xantium · 7 years ago

I am trying to use the urllib.request module.

I have tried readline(), readlines() and read(), but I cannot get it to read just one line.

I just want to read line 581 of python.org.

import urllib.request

get_page = urllib.request.urlopen('https://www.python.org')
x = int('581')
get_ver = get_page.readline(x)

print("Current Versions Are: ", get_ver)

Current Versions Are:  b'<!doctype html>\n'

2 Answers

ShmulikA · 7 years ago

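The argument to readline() is a size limit in bytes, not a line number: readline(574) reads at most 574 bytes, not line 574.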
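The difference can be seen without touching the network. The following is a minimal sketch that uses io.BytesIO as a stand-in for the response object (the sample bytes are made up for illustration); it assumes the response behaves like any other binary stream, which is how readline() is defined for buffered I/O objects.

```python
import io

# Hypothetical stand-in for an HTTP response body: three short lines of bytes.
body = io.BytesIO(b"<!doctype html>\n<html>\n<head>\n")

# The argument caps how many bytes are read; reading still stops at the first newline.
first = body.readline(574)   # whole first line, because it is shorter than 574 bytes
partial = body.readline(3)   # only the first 3 bytes of the second line

print(first)
print(partial)
```

So a large limit simply returns the first line, which matches the output in the question.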

The following gets the n-th line (consider an HTTP range request if you need better performance):

import urllib.request

get_page = urllib.request.urlopen('https://www.python.org')

def get_nth_line(resp, n):
    # Read and discard the first n - 1 lines.
    i = 1
    while i < n:
        resp.readline()
        i += 1
    # The next readline() call returns line n (1-based).
    return resp.readline()

print(get_nth_line(get_page, 574))

Output:

b'<p>Latest: <a href="/downloads/release/python-362/">Python 3.6.2</a> - <a href="/downloads/release/python-2713/">Python 2.7.13</a></p>\n'
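The same skipping can probably be written more compactly with itertools.islice, since file-like objects yield one line per iteration. This is only a sketch: io.BytesIO and the sample bytes stand in for the real response, and returning None for a missing line is a choice made here, not something from the code above.

```python
import io
from itertools import islice

def get_nth_line(resp, n):
    """Return line n (1-based) from a file-like object, or None if it has fewer lines."""
    # islice lazily skips the first n - 1 lines and yields only line n.
    return next(islice(resp, n - 1, n), None)

# Hypothetical stand-in for the object returned by urlopen().
fake_page = io.BytesIO(b"line 1\nline 2\nline 3\n")
print(get_nth_line(fake_page, 2))
```

Like the loop version, this stops reading as soon as the requested line has been returned.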

Suggestions

Use requests for HTTP requests instead of urllib:

requests.get('http://www.python.org').text

Use a regex or bs4 to parse and extract the Python version.

requests & regex example

import re, requests

resp = requests.get('http://www.python.org')
# regex might need adjustments
ver_regex = re.compile(r'<a href\="/downloads/release/python\-2\d+/">(.*?)</a>')
py2_ver = ver_regex.search(resp.text).group(1)
print(py2_ver)

Python 2.7.13
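For the parser route, here is a rough sketch that uses the standard library's html.parser as a stand-in for bs4, so it runs without extra installs. The class name ReleaseLinkParser is made up for this example, and the sample markup is the line from the output shown earlier; the live page may well look different today.

```python
from html.parser import HTMLParser

class ReleaseLinkParser(HTMLParser):
    """Collect the text of <a> tags whose href starts with /downloads/release/."""

    def __init__(self):
        super().__init__()
        self.versions = []
        self._in_release_link = False

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href") or ""
        if tag == "a" and href.startswith("/downloads/release/"):
            self._in_release_link = True
            self.versions.append("")

    def handle_data(self, data):
        # Only keep text that sits inside a matching link.
        if self._in_release_link:
            self.versions[-1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_release_link = False

# Sample markup taken from the output line shown earlier in this thread.
sample = ('<p>Latest: <a href="/downloads/release/python-362/">Python 3.6.2</a> - '
          '<a href="/downloads/release/python-2713/">Python 2.7.13</a></p>')

parser = ReleaseLinkParser()
parser.feed(sample)
print(parser.versions)
```

Matching on the link target rather than on a line number should survive small layout changes to the page.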

Xantium · 7 years ago

I got it working with readlines().

Here is the working script:

import urllib.request

get_page = urllib.request.urlopen('https://www.python.org')
get_ver = get_page.readlines()

print("Current Versions Are: ", get_ver[580])

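It did not work before because readlines() returns a list, so the line has to be picked out by index. Also, the index is 580 rather than 581 because the first line counts as 0.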
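The off-by-one can be checked offline. This is a small sketch with a made-up 600-line body in io.BytesIO standing in for the downloaded page:

```python
import io

# Hypothetical 600-line stand-in for the page body.
fake_page = io.BytesIO(b"".join(b"line %d\n" % i for i in range(1, 601)))

lines = fake_page.readlines()   # a list of bytes objects, one per line
print(type(lines).__name__)
print(lines[580])               # index 580 is the 581st line
```

Note that readlines() loads the whole page into memory, which is fine for a single page of this size.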