Removing non-Unicode characters from a file

non-unicode python-unicode non-ascii-characters ascii python-2.7

Aisha Javed · Tech Community · 7 years ago

I know this is a duplicate question, but I have genuinely tried every solution I could find so far. Can anyone help me remove characters like \xc3\xa2\xc2\x84\xc2\xa2 from a file?

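The file content I am currently trying to clean is:

b'Roasted Onion Dip',"b""['2 pounds large yellow onions, thinly sliced', '3 large shallots, thinly sliced', '4 sprigs thyme', '1/4 cup olive oil', 'Kosher salt and freshly ground black pepper', '1 cup white wine', '2 tablespoons champagne vinegar', '2 cups sour cream', '1/2 cup chopped fresh chives', '1/4 cup plain Greek yogurt', 'Everything seasoning and thyme to garnish', 'Cape Cod Waves\xc3\xa2\xc2\x84\xc2\xa2 Potato Chips for serving']"""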
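For context, the six escapes after "Cape Cod Waves" are consistent with a trademark sign (™, UTF-8 bytes E2 84 A2) whose UTF-8 bytes were decoded as Latin-1 and then encoded to UTF-8 a second time. A small Python 3 check of that reading, assuming the file stores the bytes as literal "\xNN" text; the variable names are illustrative only:

```python
import re

# Assumption: the escapes appear in the file as literal text (backslash, 'x',
# two hex digits), exactly as in the sample above.
escaped = r"\xc3\xa2\xc2\x84\xc2\xa2"

# Turn each literal "\xNN" back into the byte value it names.
raw = bytes(int(h, 16) for h in re.findall(r"\\x([0-9a-fA-F]{2})", escaped))

once = raw.decode("utf-8")                      # 'â\x84¢', still garbled
twice = once.encode("latin-1").decode("utf-8")  # undo the extra encoding step
print(twice == "\u2122")                        # True: the trademark sign
```

If the goal is only to delete the sequence, this is not needed; it just shows what the "characters" originally were.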

I have tried using re.sub("[^\x00-\x7F]+", "", whatevertext), but I can't seem to get anywhere. I suspect the backslashes here are not being treated as special characters.
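A minimal Python 3 check, assuming the file holds the escapes as literal text rather than as raw non-ASCII bytes, shows why the ASCII-range pattern from the question removes nothing. The strings below are made-up samples, not read from the actual file:

```python
import re

# Literal text: backslash, 'x', 'c', '3', ... (every character is ASCII).
literal = r"Cape Cod Waves\xc3\xa2\xc2\x84\xc2\xa2 Potato Chips"
# A genuine non-ASCII character (U+2122, the trademark sign) for comparison.
real = "Cape Cod Waves\u2122 Potato Chips"

non_ascii = re.compile(r"[^\x00-\x7F]+")

# Nothing in the literal version is outside 0x00-0x7F, so nothing is removed.
print(non_ascii.sub("", literal) == literal)  # True
# The same pattern does strip a real non-ASCII character.
print(non_ascii.sub("", real))                # Cape Cod Waves Potato Chips
```

So the pattern itself is fine; it simply targets a different kind of data than what the file appears to contain.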

its me · 7 years ago

You can do it like this:

>>> f = open("test.txt","r")
>>> whatevertext = f.read()
>>> print whatevertext
b'Roasted Onion Dip',"b""['2 pounds large yellow onions, thinly sliced', '3 large shallots, thinly sliced', '4 sprigs thyme', '1/4 cup olive oil', 'Kosher salt and freshly ground black pepper', '1 cup white wine', '2 tablespoons champagne vinegar', '2 cups sour cream', '1/2 cup chopped fresh chives', '1/4 cup plain Greek yogurt', 'Everything seasoning and thyme to garnish', 'Cape Cod Waves\xc3\xa2\xc2\x84\xc2\xa2 Potato Chips for serving']"""

>>> import re
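>>> result = re.sub(r'\\x[0-9a-f]{2}', '', whatevertext)
>>> print result
b'Roasted Onion Dip',"b""['2 pounds large yellow onions, thinly sliced', '3 large shallots, thinly sliced', '4 sprigs thyme', '1/4 cup olive oil', 'Kosher salt and freshly ground black pepper', '1 cup white wine', '2 tablespoons champagne vinegar', '2 cups sour cream', '1/2 cup chopped fresh chives', '1/4 cup plain Greek yogurt', 'Everything seasoning and thyme to garnish', 'Cape Cod Waves Potato Chips for serving']"""

>>>

In the pattern \\x[0-9a-f]{2}, the doubled backslash matches one literal backslash (the raw string r'...' passes it to the regex engine unchanged), and the x must be followed by exactly two hexadecimal digits, each a digit 0-9 or a letter a-f. This works because the file holds each escape as four ordinary ASCII characters, which is also why a non-ASCII range such as [^\x00-\x7F] finds nothing to remove.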
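The same idea as a small standalone script, sketched for Python 3 (it should also run on 2.7). The helper names strip_hex_escapes and clean_file are hypothetical, and UTF-8 is assumed as the file encoding:

```python
import io
import re

# One literal escape such as "\xc3": a backslash, an "x", and exactly two hex
# digits. The {2} keeps the match from also swallowing ordinary letters a-f
# that happen to follow an escape (e.g. "\xa2abc" leaves "abc").
HEX_ESCAPE = re.compile(r"\\x[0-9a-fA-F]{2}")


def strip_hex_escapes(text):
    """Hypothetical helper: delete every literal \\xNN sequence from text."""
    return HEX_ESCAPE.sub("", text)


def clean_file(src_path, dst_path):
    """Hypothetical helper: read src_path, strip escapes, write dst_path."""
    with io.open(src_path, "r", encoding="utf-8") as src:
        cleaned = strip_hex_escapes(src.read())
    with io.open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(cleaned)


if __name__ == "__main__":
    sample = r"'Cape Cod Waves\xc3\xa2\xc2\x84\xc2\xa2 Potato Chips for serving'"
    print(strip_hex_escapes(sample))  # 'Cape Cod Waves Potato Chips for serving'
```

Writing to a separate output path keeps the original file intact in case the pattern removes more than intended.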
