
Specifying the output path when downloading files from a URL

shutil python-requests python-3.x

1

Stefano Potter · Tech Community · 7 years ago

I am downloading some files from a URL.

import requests
from bs4 import BeautifulSoup
import os
import shutil

prefix = 'https://n5eil01u.ecs.nsidc.org/MOST/MOD10A1.006/'
download_url = "https:/path_to_website"

s = requests.session()
soup = BeautifulSoup(s.get(download_url).text, "lxml")

for a in soup.find_all('a', href=True):
    final_link = os.path.join(prefix, a['href'])
    result = s.get(final_link, stream=True)
    # relative file name, so the file is written to the current working directory
    with open(a['href'], 'wb') as out_file:
        shutil.copyfileobj(result.raw, out_file)

This downloads the files fine and puts them in the default directory, C:/User.

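With wget you can choose where the files go, but my attempt with it downloads empty files, as if they were never actually accessed.

I tried wget like this: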

import wget

out_path = "C:/my_path"
prefix = 'https://n5eil01u.ecs.nsidc.org/MOST/MOD10A1.006/'

s = requests.session()
soup = BeautifulSoup(s.get(download_url).text, "lxml")

for a in soup.find_all('a', href=True):
    final_link = os.path.join(prefix, a['href'])
    download = wget.download(final_link, out=out_path)

2 replies | last activity 7 years ago

1

TrakJohnson Stryker 7 years ago

How about using your first method and replacing the path of the file you open with os.path.join(out_path, a['href'])?

import requests
from bs4 import BeautifulSoup
import os
import shutil

out_path = "C:\\my_path"
prefix = 'https://n5eil01u.ecs.nsidc.org/MOST/MOD10A1.006/'
download_url = "https:/path_to_website"

s = requests.session()
soup = BeautifulSoup(s.get(download_url).text, "lxml")

for a in soup.find_all('a', href=True):
    final_link = os.path.join(prefix, a['href'])
    result = s.get(final_link, stream=True)
    new_file_path = os.path.join(out_path, a['href'])
    with open(new_file_path, 'wb') as out_file:    # this will create the new file at new_file_path
        shutil.copyfileobj(result.raw, out_file)
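
One thing that might bite with this approach: if an href contains directory components or a query string, os.path.join(out_path, a['href']) can point into a folder that does not exist. Below is a minimal sketch of one way around that, assuming you only want the last part of the link as the file name. build_targets is a hypothetical helper name and the href in the example is made up; neither comes from the question. It also uses urljoin rather than os.path.join to build the URL, so the result does not depend on the OS path separator.

```python
import os
from urllib.parse import urljoin, urlsplit, unquote


def build_targets(prefix, href, out_path):
    """Return (download_url, local_file_path) for one link.

    Hypothetical helper, for illustration only.
    """
    # Resolve the href against the prefix using URL rules, not file-system rules.
    url = urljoin(prefix, href)
    # Keep only the last path component; any query string or fragment is dropped.
    file_name = os.path.basename(unquote(urlsplit(url).path))
    return url, os.path.join(out_path, file_name)


url, path = build_targets(
    "https://n5eil01u.ecs.nsidc.org/MOST/MOD10A1.006/",
    "2016.01.01/example.hdf",  # made-up href, for illustration
    "out",
)
```

Inside the loop you would then call s.get(url, stream=True) and open path instead of new_file_path.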

2

0

Murtuza Z 7 years ago

target_path = r'c:\windows\temp'
with open(os.path.join(target_path, a['href']), 'wb') as out_file:
    shutil.copyfileobj(result.raw, out_file)
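
For completeness, here is a rough sketch of how the write step could be wrapped so the target directory is created when missing. save_response is a hypothetical helper name, and it assumes the response object behaves like what a requests session returns from get(..., stream=True). It uses iter_content instead of result.raw, which is a swap from the snippet above: iter_content decodes gzip/deflate transfer encodings, whereas reading raw does not by default.

```python
import os


def save_response(response, target_dir, file_name, chunk_size=8192):
    """Write a streamed response into target_dir and return the file path.

    Hypothetical helper: `response` is assumed to expose iter_content(),
    as the object returned by session.get(url, stream=True) does.
    """
    # Create the output directory if it does not exist yet.
    os.makedirs(target_dir, exist_ok=True)
    file_path = os.path.join(target_dir, file_name)
    with open(file_path, "wb") as out_file:
        # Write the body chunk by chunk instead of loading it all into memory.
        for chunk in response.iter_content(chunk_size=chunk_size):
            out_file.write(chunk)
    return file_path
```

In the loop this would be called as save_response(result, target_path, a['href']). If files still come out empty, it may be worth checking result.status_code (or calling result.raise_for_status()) before writing, since an error page or a login redirect would also be saved without complaint.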