Recursively downloading XML pages by following href links with Python

href download web-scraping python-3.x xml

Laveena · Tech Community · 5 years ago

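I have an XML page containing an href link that points to the next page; the last XML page in the chain has no href element. I need to download all of the XML pages recursively, and I am looking for Python code that will help me do this quickly.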

0 replies | as of 5 years ago

Laveena, 5 years ago

import xml.etree.ElementTree as ET
import requests
from requests.auth import HTTPBasicAuth

def iterate_xml_automate(link):
    # Collect every URL visited, starting with the parent page
    all_href = [link]
    auth = HTTPBasicAuth('login', 'Password')  # placeholder credentials
    # Parse the parent page (bytes, so the XML declaration's encoding is honored)
    tree = ET.fromstring(requests.get(link, auth=auth).content)
    # Read the href attribute of every <link> element in the XML tree
    href = [el.attrib['href'] for el in tree.iter('link')]
    # Keep following the first href until a page has no <link> element
    while href:
        all_href.append(href[0])
        tree = ET.fromstring(requests.get(href[0], auth=auth).content)
        # Update href to point at the next XML page
        href = [el.attrib['href'] for el in tree.iter('link')]
    # Return the URLs of all pages visited, in order
    return all_href
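
The function above only collects the URLs, while the question asks for the pages themselves to be downloaded. Below is a minimal sketch of one way to save each page while walking the chain. The names `download_xml_chain`, `fetch_with_requests`, `out_dir`, and `max_pages`, as well as the `page_0000.xml` file naming, are hypothetical and not part of the original post. The sketch assumes that the link element is named `link` with no XML namespace, as in the code above, and that an href may be relative to the current page.

```python
import os
import xml.etree.ElementTree as ET
from urllib.parse import urljoin


def fetch_with_requests(url):
    """Download one page; 'login'/'Password' are the placeholders from the post."""
    # Imported here so the traversal below can be tried without requests installed.
    import requests
    from requests.auth import HTTPBasicAuth

    response = requests.get(url, auth=HTTPBasicAuth('login', 'Password'), timeout=30)
    response.raise_for_status()
    return response.content


def download_xml_chain(start_url, out_dir, fetch=fetch_with_requests, max_pages=1000):
    """Follow the first <link href="..."> on each page and save every page to out_dir."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []      # paths of the files written so far
    seen = set()    # URLs already visited, to stop on a cycle
    url = start_url
    while url and url not in seen and len(saved) < max_pages:
        seen.add(url)
        data = fetch(url)
        path = os.path.join(out_dir, f"page_{len(saved):04d}.xml")
        with open(path, "wb") as fh:
            fh.write(data)
        saved.append(path)
        # Take the first <link> element that carries an href, if there is one
        root = ET.fromstring(data)
        link = next((el for el in root.iter('link') if el.get('href')), None)
        # Resolve a relative href against the current page's URL
        url = urljoin(url, link.get('href')) if link is not None else None
    return saved
```

Calling `download_xml_chain(start_url, "xml_pages")` would write the pages into the `xml_pages` directory and return the list of file paths. A loop is used rather than true recursion because each page leads to at most one next page, so a long chain cannot hit Python's recursion limit; the `seen` set and `max_pages` cap guard against pages that link back to each other.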
