
Specifying the output path when downloading files from a URL

  •  1
  • Stefano Potter  · Tech Community  · 7 years ago

    I have some files that I am downloading from a URL.

    import shutil
    from urllib.parse import urljoin
    
    import requests
    from bs4 import BeautifulSoup
    
    prefix = 'https://n5eil01u.ecs.nsidc.org/MOST/MOD10A1.006/'
    download_url = "https:/path_to_website"
    
    s = requests.session()
    soup = BeautifulSoup(s.get(download_url).text, "lxml")
    
    for a in soup.find_all('a', href=True):
        # Build the full URL of each linked file
        final_link = urljoin(prefix, a['href'])
        result = s.get(final_link, stream=True)
        # A bare file name is written to the current working directory
        with open(a['href'], 'wb') as out_file:
            shutil.copyfileobj(result.raw, out_file)
    

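    This downloads the files fine and puts them in the default directory, C:/User.

    I would like to choose where the files are saved. wget lets you set the output path, but my approach with it downloads empty files, as if they were never accessed.

    I tried it with wget like this: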

    # Continues from the snippet above (requests, BeautifulSoup, download_url)
    from urllib.parse import urljoin
    
    import wget
    
    out_path = "C:/my_path"
    prefix = 'https://n5eil01u.ecs.nsidc.org/MOST/MOD10A1.006/'
    
    s = requests.session()
    soup = BeautifulSoup(s.get(download_url).text, "lxml")
    
    for a in soup.find_all('a', href=True):
        final_link = urljoin(prefix, a['href'])
        download = wget.download(final_link, out=out_path)
    

    2 replies  |  as of 7 years ago
        1
  •  1
  •   TrakJohnson Stryker    7 years ago

    Using your first method, why not replace the path of the file you open with os.path.join(out_path, a['href'])?

    import os
    import shutil
    from urllib.parse import urljoin
    
    import requests
    from bs4 import BeautifulSoup
    
    out_path = "C:\\my_path"
    prefix = 'https://n5eil01u.ecs.nsidc.org/MOST/MOD10A1.006/'
    download_url = "https:/path_to_website"
    
    s = requests.session()
    soup = BeautifulSoup(s.get(download_url).text, "lxml")
    
    for a in soup.find_all('a', href=True):
        final_link = urljoin(prefix, a['href'])
        result = s.get(final_link, stream=True)
        new_file_path = os.path.join(out_path, a['href'])
        with open(new_file_path, 'wb') as out_file:    # this will create the new file at new_file_path
            shutil.copyfileobj(result.raw, out_file)
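
    The same idea can be sketched as a small reusable function. This is only an illustration under a few assumptions: the names `local_name` and `download_to` are hypothetical, the output folder is created if it does not exist, and the body is written with `iter_content` rather than `shutil.copyfileobj(result.raw, ...)`, which is a swapped-in technique and not what the answer above uses. `raise_for_status()` is added so that an HTTP error stops the download instead of being saved to disk.

```python
import os
from urllib.parse import urljoin, urlsplit


def local_name(href):
    """Return the last path component of a link, ignoring any query string."""
    return os.path.basename(urlsplit(href).path)


def download_to(session, prefix, href, out_dir, chunk_size=8192):
    """Stream prefix + href into out_dir and return the path that was written.

    `session` is assumed to behave like requests.Session: get(url, stream=True)
    returns a response offering raise_for_status() and iter_content().
    """
    os.makedirs(out_dir, exist_ok=True)        # create the target folder if it is missing
    target = os.path.join(out_dir, local_name(href))
    response = session.get(urljoin(prefix, href), stream=True)
    response.raise_for_status()                # stop on HTTP errors instead of saving them
    with open(target, "wb") as out_file:
        for chunk in response.iter_content(chunk_size=chunk_size):
            out_file.write(chunk)
    return target
```

    Inside the loop it would be called as `download_to(s, prefix, a['href'], out_path)`, with `s`, `prefix` and `out_path` as defined in the answer above.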
    
        2
  •  0
  •   Murtuza Z    7 years ago

    # Inside the download loop from the question (needs os and shutil imported)
    target_path = r'c:\windows\temp'
    with open(os.path.join(target_path, a['href']), 'wb') as out_file:
        shutil.copyfileobj(result.raw, out_file)
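
    The two answers spell the Windows folder differently ("C:\\my_path" with escaped backslashes, r'c:\windows\temp' as a raw string). As a small sketch, the snippet below shows that these spellings, and the forward-slash form used in the question, describe the same path. It uses `PureWindowsPath` so it runs on any operating system; the file name `example.hdf` is a made-up placeholder.

```python
from pathlib import PureWindowsPath

raw = r"c:\windows\temp"        # raw string: backslashes are taken literally
escaped = "c:\\windows\\temp"   # regular string: each backslash is escaped
forward = "c:/windows/temp"     # forward slashes are also accepted for Windows paths

# The raw and escaped literals are the same string
same_text = raw == escaped
# Windows path objects treat both separator styles as the same path
same_path = PureWindowsPath(raw) == PureWindowsPath(forward)

# Joining a (hypothetical) file name onto the folder
target = PureWindowsPath(forward) / "example.hdf"
print(same_text, same_path, target)
```

    Any of the three spellings can therefore be passed to os.path.join as the output folder.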