
PostgreSQL 8.4 encoding error

postgresql ruby-on-rails

Harish Shetty · 15 years ago

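I am importing data from a CSV file. One of the fields contains an accented character (Telefónica O2 UK Limited). The application raises an error when inserting the data into the table.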

PGError: ERROR:  invalid byte sequence for encoding "UTF8": 0xf36e6963
HINT:  This error can also happen if the byte sequence does not match the 
encoding expected by the server, which is controlled by "client_encoding".
: INSERT INTO "companies" ("name", "validated") 
    VALUES(E'Telef?nica O2 UK Limited', 't')

Data entry through the form works fine when I type names with accents and umlauts. How can I work around this issue?

Edit

I solved the problem by converting the file encoding: I uploaded the CSV file to Google Docs and exported it back to CSV.

2 replies

Henning · 15 years ago

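The error message is quite clear: your client_encoding setting is UTF8, and you are trying to insert a character that is not encoded in UTF-8. If the CSV came from MS Excel, the file is probably encoded in Windows-1252.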
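As a quick check of that diagnosis, the four bytes quoted in the error (0xf3 0x6e 0x69 0x63) can be inspected in plain Ruby. This is only a sketch; that the file is Windows-1252 is an assumption, and the variable names are made up for illustration.

```ruby
# The four bytes from the error message: 0xf3 0x6e 0x69 0x63
bytes = "\xF3nic".b

# Read as UTF-8, 0xF3 starts a multi-byte sequence that "n" cannot continue.
utf8_ok = bytes.dup.force_encoding('UTF-8').valid_encoding?

# Read as Windows-1252 (an assumption about the file), 0xF3 is "ó".
decoded = bytes.dup.force_encoding('Windows-1252').encode('UTF-8')

puts utf8_ok   # false
puts decoded   # ónic
```

The same bytes are valid in ISO-8859-1 as well, so this confirms a single-byte Western encoding rather than Windows-1252 specifically.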

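You can either convert the data in your application, or change the encoding of the PostgreSQL connection to match the data you are inserting (so that PostgreSQL does the conversion for you). To do the latter, execute SET CLIENT_ENCODING TO 'WIN1252'; on the PostgreSQL connection before inserting that data. After the import, restore the original value with RESET CLIENT_ENCODING;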
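A minimal Ruby sketch of the second option: wrap the import in a helper that sets the client encoding and always resets it afterwards. The helper name with_client_encoding is hypothetical, and conn is assumed to be any object that responds to exec(sql), as PG::Connection from the pg gem does.

```ruby
# Hypothetical helper: run a block with a temporary client_encoding.
# `conn` is assumed to respond to #exec(sql), like PG::Connection.
def with_client_encoding(conn, encoding)
  name = encoding.to_s
  # PostgreSQL encoding names such as WIN1252 or LATIN1 use only letters,
  # digits and underscores; reject anything else before building the SQL.
  unless name.match?(/\A[A-Za-z0-9_]+\z/)
    raise ArgumentError, "unexpected encoding name: #{name.inspect}"
  end

  conn.exec("SET CLIENT_ENCODING TO '#{name}'")
  begin
    yield conn
  ensure
    # Restore the default even if the import raised.
    conn.exec('RESET CLIENT_ENCODING')
  end
end

# Usage (assumes an open connection in `conn`):
#   with_client_encoding(conn, 'WIN1252') do |c|
#     c.exec("INSERT INTO companies (name, validated) VALUES ('...', 't')")
#   end
```

If an insert fails inside an open transaction, the RESET will fail too until the transaction is rolled back, so the import is best run outside a long-lived transaction or with its own error handling.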

HTH!

Jimmy Huang · 14 years ago

I think you could try the Ruby gem rchardet, which may be a better solution. Sample code:

require 'rchardet'
require 'iconv'  # load Iconv explicitly; it left Ruby's standard library in 2.0

cd = CharDet.detect(string_of_unknown_encoding)
encoding = cd['encoding']
converted_string = Iconv.conv('UTF-8', encoding, string_of_unknown_encoding)
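On Ruby 2.0 and later, Iconv is no longer part of the standard library, so the conversion step can be done with the built-in String#encode instead. The sketch below is not the answer's original method; to_utf8 is a hypothetical helper, and the encoding name is passed in explicitly where the code above would use cd['encoding'] (assuming the detector reports a name that Ruby recognizes).

```ruby
# Hypothetical helper: convert raw bytes to UTF-8 with String#encode.
# `source_encoding` stands in for the detector's result, e.g. cd['encoding'].
def to_utf8(raw, source_encoding)
  # Label the bytes with their real encoding; this does not change them.
  tagged = raw.dup.force_encoding(source_encoding)
  # Transcode to UTF-8; raises if a byte is invalid in the source encoding.
  tagged.encode('UTF-8')
end

# The company name from the question, as Windows-1252 bytes.
name = to_utf8("Telef\xF3nica O2 UK Limited".b, 'Windows-1252')
puts name
```

force_encoding raises ArgumentError for a name Ruby does not know, which is a reasonable place to fall back to a default such as Windows-1252.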

Here are some related links:

https://github.com/jmhodges/rchardet

http://www.meeho.net/blog/2010/03/ruby-how-to-detect-the-encoding-of-a-string/