r - Create a numeric variable in a data frame based on the mean of another variable and a factor
OK, I have a simple question, albeit one I have found hard to pose (that may be the root of the problem).
If I have the following example data:
v1 <- c(1,1,1,1,1,2,2,2,2,2)
factor <- factor(v1)
v2 <- c(1,2,3,4,5,6,7,8,9,10)
v3 <- c(10,20,30,40,50,60,70,80,90,100)
test <- data.frame(factor,v2,v3)
How might I go about generating a variable, let's say v4, that is the mean of v3 for each level of factor? I can get the mean values using, for example, tapply:
tapply(test$v3, test$factor, FUN=mean)
which in this case results in 30 and 80 respectively, but I want them in the form of a variable that repeats each mean over the length of the relevant factor level, as follows:
   factor v2  v3 v4
1       1  1  10 30
2       1  2  20 30
3       1  3  30 30
4       1  4  40 30
5       1  5  50 30
6       2  6  60 80
7       2  7  70 80
8       2  8  80 80
9       2  9  90 80
10      2 10 100 80
Any suggestions or solutions are welcome, along with advice on how to better phrase the question.
Use ave instead of tapply:
within(test, { v4 <- ave(v3, factor, FUN = mean) })
   factor v2  v3 v4
1       1  1  10 30
2       1  2  20 30
3       1  3  30 30
4       1  4  40 30
5       1  5  50 30
6       2  6  60 80
7       2  7  70 80
8       2  8  80 80
9       2  9  90 80
10      2 10 100 80
The call is constructed in a similar way to how you've used tapply. I've used within for two reasons: (1) to save typing, and (2) to allow me to automatically create the new column.
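For comparison, here is a minimal sketch of the same ave call without within, assigning the result directly to a new column. It rebuilds the question's example data so that it runs on its own; the choice to assign with `$<-` is only an illustration, not a recommendation over within.

```r
# Rebuild the example data from the question
v1 <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
test <- data.frame(factor = factor(v1),
                   v2 = 1:10,
                   v3 = seq(10, 100, by = 10))

# ave() returns a vector as long as its input, with each element replaced
# by the result of FUN for that element's group. FUN must be named in full
# and in capitals, because ave() treats other arguments as grouping variables.
test$v4 <- ave(test$v3, test$factor, FUN = mean)

print(test)
```

Because ave already returns one value per row, no merge or reshaping step is needed afterwards.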
The data.table package has a convenient syntax for these types of operations:
> library(data.table)
data.table 1.8.8  For help type: help("data.table")
> dt <- data.table(test)
> dt[, v4 := mean(v3), by = factor]
> dt
    factor v2  v3 v4
 1:      1  1  10 30
 2:      1  2  20 30
 3:      1  3  30 30
 4:      1  4  40 30
 5:      1  5  50 30
 6:      2  6  60 80
 7:      2  7  70 80
 8:      2  8  80 80
 9:      2  9  90 80
10:      2 10 100 80
Not to overwhelm the reader, but there are lots of ways to do this. Here are two more solutions in base R (though they are less efficient than the alternatives already shared).
Using aggregate:
merge(test, setNames(aggregate(v3 ~ factor, test, mean), c("factor", "v4")), all = TRUE)
Making use of the tapply output:
temp <- tapply(test$v3, test$factor, FUN=mean)
temp <- data.frame(v4 = temp)
merge(test, temp, by.x = "factor", by.y = "row.names", all = TRUE)
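A further base R variation on the tapply idea, not taken from the answers above, is to skip the merge and look up each row's group mean by level name. This is a sketch under the assumption that the factor levels are usable as names; `group_means` is a name introduced here for illustration.

```r
# Rebuild the example data from the question
v1 <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
test <- data.frame(factor = factor(v1),
                   v2 = 1:10,
                   v3 = seq(10, 100, by = 10))

# tapply() returns one mean per level, named by the levels "1" and "2"
group_means <- tapply(test$v3, test$factor, FUN = mean)

# Index the means by each row's level name so they repeat per row;
# as.vector() drops the names and array attributes left by tapply()
test$v4 <- as.vector(group_means[as.character(test$factor)])

print(test)
```

Unlike merge, this keeps the rows in their original order, since nothing is re-sorted by the key.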