urllib request.urlopen(url).read() with SQLAlchemy is storing a hex string instead of HTML [duplicate]

urllib sqlalchemy postgresql python

dangel · 6 years ago

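I'm trying to store HTML fetched with urllib in a database using SQLAlchemy, but something seems to get mangled along the way when inserting/retrieving the HTML.

Using: SQLAlchemy 1.2, Python 3.6, Postgres 10, urllib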

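from urllib import request

from sqlalchemy import Column, Integer, Text, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()


class ParksTxState(Base):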
    __tablename__ = 'parks_tx_state'

    id = Column(Integer, primary_key=True)
    park_name = Column(Text)
    url = Column(Text)
    html = Column(Text)


engine = create_engine("postgresql://<user>:<pass>@localhost/<db>", echo=False)

Session = sessionmaker(bind=engine)
session = Session()

url = 'https://tpwd.texas.gov/state-parks/abilene'
html = request.urlopen(url).read()

print(html)
# b'<!DOCTYPE html>\n<html xmlns="http://www.w3.org/1999/xhtml">\n<head>\n...
# so far so good...

newpark = ParksTxState()
newpark.html = html

print(newpark.html)
# b'<!DOCTYPE html>\n<html xmlns="http://www.w3.org/1999/xhtml">\n<head>\n...
# so we're still good here before committing....

session.add(newpark)
session.commit()

print(newpark.html)
# \x3c21444f43545950452068746d6c3e0a3...
# and here is where the garbage comes in.

For some reason, the HTML is being stored as one long hex string: \x3c21444f43545950452068746d6c3e0a3c68746d6c20786d6c6e733d22687474703a2f2f7777772e7...

With echo=True, the INSERT statement looks correct.

What am I doing wrong?

1 answer | active 6 years ago

dangel · 6 years ago

OK, it looks like request.urlopen(url).read() returns a bytes object (see Methods of File Objects), which needs to be converted to a string with .decode('utf-8') before it is assigned to the Text column:

html = request.urlopen(url).read()
html_string = html.decode('utf-8')
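
A slightly more defensive variant, as a sketch rather than a definitive fix: instead of hard-coding UTF-8, read the charset the server declares and fall back to UTF-8 when none is given. The names `decode_html` and `fetch_html` are hypothetical helpers introduced here for illustration; `get_content_charset()` is the standard `email.message.Message` method available on the response headers.

```python
from typing import Optional
from urllib import request


def decode_html(raw: bytes, charset: Optional[str] = None) -> str:
    """Decode a response body to str (hypothetical helper, falls back to UTF-8)."""
    # Assumption: a few replacement characters on a mislabelled page are
    # preferable to a UnicodeDecodeError, hence errors="replace".
    return raw.decode(charset or "utf-8", errors="replace")


def fetch_html(url: str) -> str:
    """Fetch a URL and return its body as str, suitable for a Text column."""
    with request.urlopen(url) as resp:
        charset = resp.headers.get_content_charset()  # None if not declared
        return decode_html(resp.read(), charset)
```

With this in place, the assignment in the question would become `newpark.html = fetch_html(url)`, so the column only ever receives a `str`.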

Related: Convert bytes to a string?
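
For what it's worth, the stored value is not random garbage. It appears to be the hex form that PostgreSQL uses when printing binary (`bytea`) data, which is presumably what reached the database because the value was a `bytes` object. The sketch below checks that against the first bytes of the page and shows one way to recover rows that were already saved this way. `recover_html` is a hypothetical helper, and UTF-8 is an assumption about the page's encoding.

```python
raw = b'<!DOCTYPE html>\n'

# Hex form of the bytes, with the "\x" prefix seen in the stored value.
stored = "\\x" + raw.hex()
print(stored)  # \x3c21444f43545950452068746d6c3e0a


def recover_html(stored_value: str, encoding: str = "utf-8") -> str:
    """Turn a '\\x...' hex string back into text (hypothetical helper)."""
    if not stored_value.startswith("\\x"):
        return stored_value  # already plain text, leave it untouched
    return bytes.fromhex(stored_value[2:]).decode(encoding)


print(recover_html(stored))
```

The printed hex string matches the start of the value shown in the question, which suggests the original HTML is intact and only needs decoding.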
