r - Create a numeric variable in a dataframe based upon mean of another variable and a factor


OK, I have a simple question, albeit one I have found hard to pose (which is the root of the problem).

If I have the following example data:

    v1 <- c(1,1,1,1,1,2,2,2,2,2)
    factor <- factor(v1)
    v2 <- c(1,2,3,4,5,6,7,8,9,10)
    v3 <- c(10,20,30,40,50,60,70,80,90,100)
    test <- data.frame(factor,v2,v3)

How might I go about generating a variable, let's say v4, that is the mean of v3 for each level of factor? I can get the mean values using, for example, tapply:

    tapply(test$v3, test$factor, FUN = mean)

which in this case results in 30 and 80 respectively, but I want to form a variable that repeats each mean for the length of the relevant factor level, as follows:

       factor v2  v3 v4
    1       1  1  10 30
    2       1  2  20 30
    3       1  3  30 30
    4       1  4  40 30
    5       1  5  50 30
    6       2  6  60 80
    7       2  7  70 80
    8       2  8  80 80
    9       2  9  90 80
    10      2 10 100 80

Any suggestions and solutions are welcome, along with advice on how to better phrase the question.

Use ave instead of tapply:

    within(test, {
      v4 <- ave(v3, factor, FUN = mean)
    })
       factor v2  v3 v4
    1       1  1  10 30
    2       1  2  20 30
    3       1  3  30 30
    4       1  4  40 30
    5       1  5  50 30
    6       2  6  60 80
    7       2  7  70 80
    8       2  8  80 80
    9       2  9  90 80
    10      2 10 100 80

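The construct is similar to the way you've used tapply. I've used within for two reasons: (1) to save typing, and (2) to let it create the new column automatically. Note that within returns a modified copy, so assign the result back to test if you want to keep v4.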
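To show how ave relates to tapply, here is a minimal sketch that builds v4 both ways on the question's example data. The names `group_means` and `v4_lookup` are made up for illustration; the lookup-by-level-name step is just one way to expand the tapply result, not necessarily what the answer's author had in mind.

```r
# Example data from the question
v1 <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
test <- data.frame(factor = factor(v1),
                   v2 = 1:10,
                   v3 = seq(10, 100, by = 10))

# ave() returns a vector as long as its input, with each group's
# mean repeated for every row in that group
test$v4 <- ave(test$v3, test$factor, FUN = mean)

# The same idea built from tapply(): compute one mean per level,
# then look up each row's mean by its level name
group_means <- tapply(test$v3, test$factor, FUN = mean)
v4_lookup <- as.numeric(group_means[as.character(test$factor)])
```

Both `test$v4` and `v4_lookup` hold one value per row, which is why ave can be assigned straight into the data frame while the raw tapply output (one value per level) cannot.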


The data.table package has a convenient syntax for these types of operations:

    > library(data.table)
    data.table 1.8.8  For help type: help("data.table")
    > dt <- data.table(test)
    > dt[, v4 := mean(v3), by = factor]
    > dt
        factor v2  v3 v4
     1:      1  1  10 30
     2:      1  2  20 30
     3:      1  3  30 30
     4:      1  4  40 30
     5:      1  5  50 30
     6:      2  6  60 80
     7:      2  7  70 80
     8:      2  8  80 80
     9:      2  9  90 80
    10:      2 10 100 80

Not to overwhelm the reader, but there are lots of ways to do this. Here are two more solutions in base R (though they are less efficient than the alternatives already shared).

aggregate

    merge(test,
          setNames(aggregate(v3 ~ factor, test, mean),
                   c("factor", "v4")), all = TRUE)

Making use of the tapply output:

    temp <- tapply(test$v3, test$factor, FUN = mean)
    temp <- data.frame(v4 = temp)
    merge(test, temp, by.x = "factor", by.y = "row.names", all = TRUE)
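As a quick sanity check, the sketch below compares the aggregate-plus-merge route against ave on the question's data. The names `means`, `via_merge` and `via_ave` are hypothetical. One thing to keep in mind: merge sorts its result on the key column by default, so the comparison lines up row for row here only because the example data is already ordered by factor.

```r
# Example data from the question
v1 <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
test <- data.frame(factor = factor(v1),
                   v2 = 1:10,
                   v3 = seq(10, 100, by = 10))

# One row per factor level, with the mean column renamed to v4
means <- setNames(aggregate(v3 ~ factor, test, mean),
                  c("factor", "v4"))

# Join the per-level means back onto the original rows
via_merge <- merge(test, means, all = TRUE)

# Reference result: group means repeated per row
via_ave <- ave(test$v3, test$factor, FUN = mean)

# TRUE when both routes produce the same v4 column
isTRUE(all.equal(via_merge$v4, via_ave))
```

If the original row order matters and the data is not sorted by the grouping column, ave is the simpler choice because it never reorders rows.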
