代码之家 › 专栏 › 技术社区 › samba

在RedShift表中存储数组的正确方法是什么?

amazon-redshift apache-spark-sql dataframe apache-spark postgresql

samba · 技术社区 · 6 年前

在Redshift中创建表时,我面临以下错误:

Column "main.sales_metrics" has unsupported type "character varying[]".;

在数据帧架构中,如下所示:

|-- sales_metrics: array (nullable = true)
     |-- element: string (nullable = true)

我试着像在PostgreSQL中一样声明这个列: sales_metrics text[] 正如我从文档中读到的,Amazon Redshift不支持PostgreSQL数据类型。

那么我应该如何正确地声明 sales_metrics 存储的列 Array[String] 在RedShift中创建表时?

2 回复 | 直到 6 年前

ittus 6 年前

Redshift does not support arrays ,但是有一些 JSON functions 你可以用。基本上,您可以将数据存储为varchar并使用json函数来查询数据

例如:

create temporary table sales_metrics (col1 varchar(20));
insert into sales_metrics values ('[1,2,3]');

然后

select json_extract_array_element_text(col1, 2) from sales_metrics;
 json_extract_array_element_text
---------------------------------
 3
(1 row)

Dennis 6 年前

除了@ittus的答案,注意Redshift对数组的存储方式很挑剔。

         json_arrays          | is_valid_json_array
------------------------------+---------------------
 []                           | t
 ["a","b"]                    | t
 ["a",["b",1,["c",2,3,null]]] | t
 {"a":1}                      | f
 a                            | f
 {foo, bar}                   | f
 {"one", "two"}               | f
 [x,y,z]                      | f
 [1,2,]                       | f

推荐文章

user1245262 · 筛选Pandas数据帧时出现问题

1 年前

Foroand · 熊猫数据帧中的词频计数耗时过长

1 年前

user14696236 · 如何为每个对应的列创建一行[重复]

2 年前

Shawn Hemelstrand · 为什么我的自定义errorbar函数不能在R中工作?

2 年前

Karim Abou El Naga · 将带字符串的DataFrame绘制到堆叠条形图中

2 年前

The Great · 拆分并存储数据帧,但名称基于特定列中的唯一值

2 年前

nickolakis · 基于R中的列名复制列

2 年前

opposity · 形成一个数据帧,该数据帧包含R中包含类别和子类别的列

2 年前

A. Handler · 有没有办法将数据帧的列与完整列名向量相匹配?

2 年前

JasonX · 运行减法计算

2 年前