Python, convert 4-byte char to avoid MySQL error "Incorrect string value:"
I need to convert (in Python) a 4-byte character into some other character, so that I can insert it into my utf-8 MySQL database without getting an error such as: "Incorrect string value: '\xF0\x9F\x94\x8E' for column 'line' at row 1"
The question "Warning raised by inserting 4-byte unicode to mysql" shows how to do it this way:
>>> import re
>>> highpoints = re.compile(u'[\U00010000-\U0010ffff]')
>>> example = u'Some example text with a sleepy face: \U0001f62a'
>>> highpoints.sub(u'', example)
u'Some example text with a sleepy face: '
However, I get the same error as the user in the comments, "...bad character range...". This is apparently because my Python is a UCS-2 (not UCS-4) build. What should I do instead?
In a UCS-2 build, Python internally uses two code units for each Unicode character above the \U0000ffff code point. Regular expressions need to work with those code units, so you'd need the following regular expression to match such characters:
highpoints = re.compile(u'[\uD800-\uDBFF][\uDC00-\uDFFF]')
This regular expression matches any code point encoded with a UTF-16 surrogate pair (see UTF-16 Code points U+10000 to U+10FFFF).
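To make the two character classes in that pattern less opaque, here is a minimal sketch of the UTF-16 surrogate arithmetic. The function name `to_surrogate_pair` is hypothetical and only serves the illustration; it is not part of the original answer.

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into its UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in the range U+10000 to U+10FFFF")
    offset = code_point - 0x10000       # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits, lands in D800-DBFF
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits, lands in DC00-DFFF
    return high, low


# The sleepy face U+1F62A from the example text
high, low = to_surrogate_pair(0x1F62A)
print(hex(high), hex(low))  # 0xd83d 0xde2a
```

The high surrogate always falls in D800-DBFF and the low surrogate in DC00-DFFF, which is why the regular expression uses exactly those two ranges back to back.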
To make this compatible across Python UCS-2 and UCS-4 builds, you could use a try:/except to use one or the other:
try:
    highpoints = re.compile(u'[\U00010000-\U0010ffff]')
except re.error:
    # UCS-2 build
    highpoints = re.compile(u'[\uD800-\uDBFF][\uDC00-\uDFFF]')
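The question asks for converting the character into some other character rather than just dropping it. As a minimal sketch, assuming U+FFFD (the Unicode replacement character, which takes 3 bytes in UTF-8) is an acceptable substitute, the pattern can be wrapped in a small helper. The names `replace_highpoints` and `replacement` are hypothetical, not part of the original answer.

```python
import re

try:
    # Wide (UCS-4) builds, and Python 3.3+ where narrow builds no longer exist
    highpoints = re.compile(u'[\U00010000-\U0010ffff]')
except re.error:
    # Narrow (UCS-2) build: match UTF-16 surrogate pairs instead
    highpoints = re.compile(u'[\uD800-\uDBFF][\uDC00-\uDFFF]')


def replace_highpoints(text, replacement=u'\ufffd'):
    """Replace every character above U+FFFF with `replacement` (hypothetical helper)."""
    return highpoints.sub(replacement, text)


cleaned = replace_highpoints(u'Some example text with a sleepy face: \U0001f62a')
print(cleaned == u'Some example text with a sleepy face: \ufffd')  # True
```

Passing `u''` as the replacement reproduces the behaviour of the original `highpoints.sub(u'', example)` call.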
Demonstration on a UCS-2 Python build:
>>> import re
>>> highpoints = re.compile(u'[\uD800-\uDBFF][\uDC00-\uDFFF]')
>>> example = u'Some example text with a sleepy face: \U0001f62a'
>>> highpoints.sub(u'', example)
u'Some example text with a sleepy face: '
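As a rough way to confirm that the cleaned text no longer contains 4-byte characters, the sketch below measures the widest UTF-8 encoding of any single character before and after the substitution. It assumes Python 3.4 or later (for the `default` argument of `max`) and assumes the column accepts at most 3 bytes per character, as MySQL's `utf8` character set does; `max_utf8_width` is a hypothetical helper name.

```python
import re

# Assumes Python 3, where this range compiles on every build
highpoints = re.compile('[\U00010000-\U0010ffff]')


def max_utf8_width(text):
    """Return the largest number of UTF-8 bytes used by any single character."""
    return max((len(ch.encode('utf-8')) for ch in text), default=0)


example = 'Some example text with a sleepy face: \U0001f62a'
print(max_utf8_width(example))                       # 4
print(max_utf8_width(highpoints.sub('', example)))   # 1
```

Any result of 3 or less means the string contains no characters outside the Basic Multilingual Plane.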