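I wrote a script to scrape the titles from a YouTube playlist page.
Judging by the print statements, everything works until I try to write the titles to a text file, at which point I get "UnicodeEncodeError: 'charmap' codec can't encode characters in position…"
I tried adding encoding='utf8' when opening the file. That fixes the error, but all the Chinese characters are replaced by seemingly random garbage characters.
I also tried encoding the output string with 'replace' and then decoding it, but that just replaces all the special characters with question marks.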
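To make the 'replace' attempt concrete, here is a minimal sketch of what it does. It assumes the 'charmap' codec in the error is cp1252 (the encoding named in the commented-out part of my code); the sample string is made up for illustration and is not taken from the playlist.

```python
# Hypothetical sample: "caf\u00e9" is "cafe" with an acute accent (present in cp1252),
# followed by two Chinese characters (absent from cp1252).
sample = "caf\u00e9 \u4e2d\u6587"

# Strict encoding fails on the first character cp1252 cannot represent.
try:
    sample.encode("cp1252")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# errors="replace" swaps every unencodable character for "?", so the
# original characters are lost before anything is written to the file.
replaced = sample.encode("cp1252", errors="replace").decode("cp1252")

# UTF-8 can represent every Unicode character, so nothing is lost.
utf8_round_trip = sample.encode("utf-8").decode("utf-8")
```

So the question marks seem to be an expected result of a lossy encoding, not a separate bug.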
Here is my code:
from bs4 import BeautifulSoup as BS
import urllib.request
import re

playlist_url = input("gib nem: ")
# Download the playlist page and decode the response bytes as UTF-8
with urllib.request.urlopen(playlist_url) as response:
    playlist = response.read().decode('utf-8')
soup = BS(playlist, "lxml")
# Collect every tag that carries a data-title attribute
title_attrs = soup.find_all(attrs={"data-title": re.compile(r".*")})
titles = [tag["data-title"] for tag in title_attrs]
titles_str = '\n'.join(titles)#.encode('cp1252','replace').decode('cp1252')
print(titles_str)
# No encoding is passed here, so open() falls back to the platform default
with open("playListNames.txt", "a") as f:
    f.write(titles_str)
Here is an example playlist I have been testing with:
https://www.youtube.com/playlist?list=PL3oW2tjiIxvSk0WKXaEiDY78KKbKghOOo
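
For reference, this is a minimal sketch of how the encoding='utf8' variant can be checked by reading the file back in Python. The helper names `append_titles` and `read_titles` and the sample titles are hypothetical, and the file goes to a temporary directory. It assumes that the garbage I see might come from the file being opened as cp1252 by whatever I use to view it, which I have not confirmed.

```python
import os
import tempfile


def append_titles(path, titles, encoding="utf-8"):
    """Append titles to path, one per line, with an explicit encoding."""
    with open(path, "a", encoding=encoding) as f:
        f.write("\n".join(titles) + "\n")


def read_titles(path, encoding="utf-8", errors="strict"):
    """Read the file back as a list of lines using the given encoding."""
    with open(path, "r", encoding=encoding, errors=errors) as f:
        return f.read().splitlines()


# Hypothetical titles standing in for the scraped data-title values:
# a Chinese title and an accented Latin one.
sample_titles = ["\u4e2d\u6587\u6807\u9898", "caf\u00e9 mix"]

out_path = os.path.join(tempfile.mkdtemp(), "playListNames.txt")
append_titles(out_path, sample_titles)

# Reading with the same encoding returns the titles unchanged.
round_trip = read_titles(out_path)

# Reading the same bytes as cp1252 yields garbage of the kind I describe.
mojibake = read_titles(out_path, encoding="cp1252", errors="replace")
```

If `round_trip` matches the titles, the file itself is fine and the garbage comes from how it is displayed. In that case writing with encoding="utf-8-sig", which puts a byte order mark at the start of the file, may help some Windows editors detect UTF-8.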