R: how to set the author of each document in a corpus by parsing its document ID
I have a tm corpus object like this:
    > summary(corp.eng)
    A corpus with 154 text documents

    The metadata consists of 2 tag-value pairs and a data frame
    Available tags are:
      create_date creator
    Available variables in the data frame are:
      metaid

The metadata of each document in the corpus looks like this:
    > meta(corp.eng[[1]])
    Available meta data pairs are:
      author       :
      datetimestamp: 2013-04-18 14:37:24
      description  :
      heading      :
      id           : smith-john_e.txt
      language     : en_ca
      origin       :

I know I can set the author of one document at a time like this:
    meta(corp.eng[[1]], tag = "author") <- paste(rev(unlist(strsplit(meta(corp.eng[[1]], tag = "id"), "[-_]"))[1:2]), collapse = " ")

which gives me this result:
    > meta(corp.eng[[1]], tag = "author")
    [1] "john smith"

How can I do this as a batch job?
Note: this should really be a comment, but since there is a working portion, here goes an example:
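    library(tm)  # provides the crude sample corpus and meta()
    data(crude)
    # Pull the "places" tag of every document into a list
    extracted.values <- meta(crude, tag = "places", type = "local")
    # Write a transformed value back into each document's metadata
    for (i in seq_along(extracted.values)) {
      meta(crude[[i]], tag = "places") <- substr(extracted.values[[i]], 1, 3)
    }

One should be able to do this with lapply as well, but since I'm not familiar with the inner workings of tm, I'll stick with the loop. Substitute the substr function with the one you need, and the tag on the left side, of course. Hope this helps.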
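Applied to the question, the same loop pattern can read each document's ID, parse it, and write the author back. This is a minimal sketch, assuming every ID follows the surname-givenname_suffix pattern shown in the question (e.g. smith-john_e.txt); author_from_id and set_authors_from_ids are hypothetical helper names, not part of tm:

```r
# Hypothetical helper (not part of tm): turn IDs such as "smith-john_e.txt"
# into "john smith". Assumes a "surname-givenname_suffix" naming pattern.
# Accepts a single ID or a character vector of IDs.
author_from_id <- function(id) {
  parts <- strsplit(id, "[-_]")  # one character vector of pieces per ID
  # Keep the first two pieces, reverse them, and join with a space
  vapply(parts, function(p) paste(rev(p[1:2]), collapse = " "), character(1))
}

# Hypothetical wrapper (not part of tm): set the author of every document.
# Needs library(tm) loaded at call time for meta() and meta<-().
set_authors_from_ids <- function(corp) {
  for (i in seq_along(corp)) {
    id <- meta(corp[[i]], tag = "id")
    meta(corp[[i]], tag = "author") <- author_from_id(id)
  }
  corp  # the function modified a local copy, so return it
}
```

Because R copies the corpus when it is modified inside the function, assign the result back, e.g. corp.eng <- set_authors_from_ids(corp.eng). Keeping the parsing in its own function also lets you check it on plain strings before touching the corpus.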