
Converting multiple-choice data to numbers


JoeM05 · 6 years ago

I have data like this:

+-------------+------------+------------------+-------------------+------------------+
|   gender    |    age     |      income      | ate_string_cheese | tech_familiarity |
+-------------+------------+------------------+-------------------+------------------+
| A. Female   | D. 45-54   | B. $50K - $80K   | B. Once or twice  | A. Low           |
| A. Female   | C. 35-44   | A. $35K - $49K   | B. Once or twice  | B. Medium        |
| B. Male     | B. 25-34   | B. 50k - 79,999  | B. Once or twice  | C. High          |
| A. Female   | A. 18-24   | D. $100k - $149k | B. Once or twice  | B. Medium        |
+-------------+------------+------------------+-------------------+------------------+

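I want to find correlations between the different observations, so I need numeric values. Is there a simple way to do this in R?

To be clear, the result for the data above should look like this: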

+--------+-----+--------+-------------------+------------------+
| gender | age | income | ate_string_cheese | tech_familiarity |
+--------+-----+--------+-------------------+------------------+
|      1 |   4 |      2 |                 2 |                1 |
|      1 |   3 |      1 |                 2 |                2 |
|      2 |   2 |      2 |                 2 |                3 |
|      1 |   1 |      4 |                 2 |                2 |
+--------+-----+--------+-------------------+------------------+

4 answers | latest 6 years ago

Dean 6 years ago

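To answer the question of how to convert categorical data to numeric data in R, use:

factor() or as.factor()

factor returns an object of class "factor", which has a set of integer codes the length of x and a "levels" attribute of mode character.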
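As a minimal sketch of what that description means in practice, the snippet below builds a factor from a small made-up vector (the values only mirror the tech_familiarity column; the vector itself is an assumption for illustration) and then inspects its levels and integer codes.

```r
# Hypothetical example vector; the values mirror the tech_familiarity column.
x <- c("A. Low", "B. Medium", "C. High", "B. Medium")

f <- factor(x)              # levels default to the sorted unique values of x
lv <- levels(f)             # character labels kept for reference
codes <- as.integer(f)      # underlying integer codes, one per element of x

print(lv)
print(codes)
print(lv[codes])            # indexing the levels by the codes recovers x
```

Because the levels default to the sorted unique values, the code assigned to a label depends on which labels are present in the vector.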

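Pros:

This encodes the data numerically while keeping an attribute (the levels) that maps each code back to its original character value for reference.

Cons:

Be careful when converting categorical data to numbers in order to run a statistical analysis on it. Not every question yields values on an interval or ratio scale, so quantities such as the mean or the difference between levels may not be meaningful. For example, consider whether the distance between adjacent levels is constant, whether there is a natural zero point, and so on.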
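One possible way to respect that caveat for ordered categories (an option chosen here for illustration, not something the answer prescribes) is a rank-based correlation, which uses only the ordering of the codes and not the distances between them. The two vectors below are the age and tech_familiarity codes from the expected output in the question.

```r
# Integer codes taken from the expected output in the question.
age  <- c(4L, 3L, 2L, 1L)
tech <- c(1L, 2L, 3L, 2L)

# Spearman's rho is computed on ranks, so it does not assume that the gap
# between level 1 and level 2 equals the gap between level 2 and level 3.
rho <- cor(age, tech, method = "spearman")
print(rho)
```

This only makes sense for ordered columns such as age or income; for an unordered column such as gender, the numeric codes carry no ranking at all.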

Saurabh Chauhan 6 years ago

# Recreate the data frame from the question
df <- read.table(text = "gender;age;income;ate_string_cheese;tech_familiarity
A. Female;D.45-54;B.$50K - $80K;B.Once or twice;A.Low
A. Female;C.35-44;A.$35K - $49K;B.Once or twice;B. Medium
B. Male;B.25-34;B.50k - 79,999;B.Once or twice;C. High
A. Female;A. 18-24;D.$100k - $149k;B.Once or twice;B. Medium",
  header = TRUE, sep = ";", stringsAsFactors = FALSE)

# Extract the leading letter of each value, lowercase it, and look up
# its position in the alphabet (a -> 1, b -> 2, ...)
sapply(df, function(x) match(tolower(gsub("([A-Za-z]+).*", "\\1", x)), letters))

Output:

      gender age income ate_string_cheese tech_familiarity
[1,]      1   4      2                 2                1
[2,]      1   3      1                 2                2
[3,]      2   2      2                 2                3
[4,]      1   1      4                 2                2

Onyambu 6 years ago

Extract the A, B, C, D part and call factor with levels = LETTERS[1:4] and labels = 1:4.

 structure(factor(sub('\\..*','',trimws(as.matrix(df))),labels=1:4),.Dim=dim(df),dimnames=dimnames(df))

  gender age income ate_string_cheese tech_familiarity
1 1      4   2      2                 1               
2 1      3   1      2                 2               
3 2      2   2      2                 3               
4 1      1   4      2                 2

This is a matrix. It can be converted to a data frame.
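A hedged sketch of that last step: rather than converting the factor-with-dimensions directly, the version below builds a plain integer matrix with match() and passes it to as.data.frame(). The two-column df is a hypothetical stand-in for the question's data, and the names prefix, codes, m and num_df are introduced only for this example.

```r
# Hypothetical stand-in for the letter-prefixed data in the question.
df <- data.frame(
  gender = c("A. Female", "A. Female", "B. Male", "A. Female"),
  age    = c("D. 45-54", "C. 35-44", "B. 25-34", "A. 18-24"),
  stringsAsFactors = FALSE
)

# Keep only the text before the first dot, i.e. the leading letter.
prefix <- sub("\\..*", "", trimws(as.matrix(df)))

# Map A -> 1, B -> 2, C -> 3, D -> 4; match() returns a plain integer vector.
codes <- match(prefix, LETTERS[1:4])

# Rebuild the matrix shape (column-major, same order as as.matrix(df)).
m <- matrix(codes, nrow = nrow(df), dimnames = list(NULL, names(df)))

num_df <- as.data.frame(m)
print(num_df)
```

Going through an integer matrix keeps the columns of the resulting data frame numeric, which is what cor() expects.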

akrun 6 years ago

We can convert the columns to factor and then coerce them to numeric:

df[] <- lapply(df, function(x) as.integer(factor(x)))
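One thing to watch with the one-liner above: factor(x) assigns codes by the sorted unique values that actually occur in each column, so a column containing only "B. Once or twice" is coded 1 rather than 2, and two differently written "B." income labels receive different codes. A possible variant, sketched here under the assumption that every value starts with a letter from A to D followed by a dot, fixes the levels explicitly. The small df and the names naive and by_letter are hypothetical and exist only for this example.

```r
# Hypothetical two-column subset of the question's data.
df <- data.frame(
  income = c("B. $50K - $80K", "A. $35K - $49K",
             "B. 50k - 79,999", "D. $100k - $149k"),
  ate_string_cheese = rep("B. Once or twice", 4),
  stringsAsFactors = FALSE
)

# Codes from the observed labels only: depends on what occurs in each column.
naive <- lapply(df, function(x) as.integer(factor(x)))

# Codes from the letter prefix with fixed levels A..D.
by_letter <- df
by_letter[] <- lapply(df, function(x) {
  prefix <- sub("\\..*", "", trimws(x))            # text before the first dot
  as.integer(factor(prefix, levels = LETTERS[1:4]))
})

print(naive$ate_string_cheese)   # all 1: only one label is present
print(by_letter)
```

With fixed levels, a letter always maps to the same number regardless of which answers happen to appear in the sample.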