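Factors are stored as integers, but they are not plain (atomic) integer vectors, so `is.numeric` returns FALSE for them.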
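As a small illustration of that point (the vector `f` below is made up for this sketch, not taken from the question), a factor is stored as integer codes yet is not treated as numeric, which is why an `is.numeric` test skips factor columns:

```r
# Hypothetical factor, used only for illustration
f <- factor(c("10", "20", "10"))

typeof(f)      # "integer": the underlying storage of the level codes
is.numeric(f)  # FALSE: factors are not treated as numeric
```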
# Remove outliers from a column
remove_outliers <- function(x, na.rm = TRUE, ...) {
  qnt <- quantile(x, probs = c(.25, .75), na.rm = na.rm, ...)
  H <- 1.5 * IQR(x, na.rm = na.rm)
  y <- x
  y[x < (qnt[1] - H)] <- NA
  y[x > (qnt[2] + H)] <- NA
  y
}
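To make the cut-offs concrete, here is a rough sketch of the same arithmetic on a made-up vector (the values in `x` and the names `lower` and `upper` are assumptions for illustration). It repeats the steps of `remove_outliers` by hand, using the default quantile type:

```r
# Hypothetical data with one obvious outlier
x <- c(1, 2, 3, 4, 5, 100)

qnt <- quantile(x, probs = c(.25, .75))  # 25% = 2.25, 75% = 4.75
H <- 1.5 * IQR(x)                        # 1.5 * 2.5 = 3.75

lower <- qnt[1] - H                      # 2.25 - 3.75 = -1.5
upper <- qnt[2] + H                      # 4.75 + 3.75 = 8.5

# Values outside [lower, upper] are the ones remove_outliers sets to NA
outliers <- x[x < lower | x > upper]
outliers                                 # 100
```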
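You can replace columns by index, so you don't need to create a separate data set. Just make sure that you pass the same data to `lapply` that you assign back to; for example, you don't want to do `data[, 1:3] <- lapply(data, FUN)`, which is a mistake I have made many times.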
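# Removes all outliers from a data set
remove_all_outliers1 <- function(df) {
  # We only want the numeric columns; compute the index once so both sides match
  num <- sapply(df, is.numeric)
  # df[num] (no comma) stays a data frame even if only one column is numeric
  df[num] <- lapply(df[num], remove_outliers)
  df
}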
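Similar to the above (and slightly easier, I think), you can pass the entire data set to `lapply`. Also make sure not to write `data <- lapply(data, if (x) something else anotherthing)`, which returns a plain list instead of a data frame, or `data[] <- lapply(data, if (x) something)`, which has no `else` and so returns NULL for every column that fails the test.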
remove_all_outliers2 <- function(df) {
  # df[] keeps the data frame structure; non-numeric columns pass through unchanged
  df[] <- lapply(df, function(x) if (is.numeric(x)) remove_outliers(x) else x)
  df
}
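The two pitfalls mentioned above can be seen on a toy data frame. This is only a sketch: the data frame `df` and the helper `double_numeric` are hypothetical stand-ins for your data and for `remove_outliers`.

```r
# Hypothetical data: one numeric and one character column
df <- data.frame(a = c(1, 2, 3), b = c("x", "y", "z"))
double_numeric <- function(x) if (is.numeric(x)) x * 2 else x

# Pitfall 1: assigning without [] replaces the data frame with a list
as_list <- lapply(df, double_numeric)
class(as_list)      # "list"

# With [] the result is written back into the existing data frame
kept <- df
kept[] <- lapply(df, double_numeric)
class(kept)         # "data.frame"

# Pitfall 2: without else, non-numeric columns come back as NULL
no_else <- lapply(df, function(x) if (is.numeric(x)) x * 2)
is.null(no_else$b)  # TRUE
```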
## test
mt <- within(mtcars, {
  mpg <- factor(mpg)
  gear <- letters[1:2]
})
head(mt)
identical(remove_all_outliers1(mt), remove_all_outliers2(mt))
# [1] TRUE
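Your idea will work with some slight adjustments. Instead of `!is.numeric`, use `Negate(is.numeric)`, the more verbose `function(x) !is.numeric(x)`, or `!sapply(x, is.numeric)`. In general, `function(function)` does not work out of the box in R.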
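A quick check, on a made-up list `x`, that the three spellings agree and that negating the function object directly fails (the names `x`, `not_numeric`, and `direct` are assumptions for this sketch):

```r
# Hypothetical list with numeric and non-numeric elements
x <- list(a = 1:3, b = letters[1:3], c = c(1.5, 2.5))

not_numeric <- Negate(is.numeric)

via_negate <- sapply(x, not_numeric)
via_lambda <- sapply(x, function(v) !is.numeric(v))
via_bang   <- !sapply(x, is.numeric)

identical(via_negate, via_lambda)  # TRUE
identical(via_negate, via_bang)    # TRUE

# Applying ! to the function itself is an error, so capture it with try()
direct <- try(!is.numeric, silent = TRUE)
inherits(direct, "try-error")      # TRUE
```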
# Removes all outliers from a data set
remove_all_outliers <- function(df) {
  # We only want the numeric columns
  ## drop = FALSE in case only one column for either
  a <- df[, sapply(df, is.numeric), drop = FALSE]
  b <- df[, sapply(df, Negate(is.numeric)), drop = FALSE]
  ## note brackets
  a[] <- lapply(a, function(x) remove_outliers(x))
  ## stack them back together, not merge
  ## you could merge if you had a unique id, one id per row
  ## then make sure the columns are returned in the original order
  d <- cbind(a, b)
  d[, names(df)]
}
identical(remove_all_outliers2(mt), remove_all_outliers(mt))
# [1] TRUE