R - How to set the author of each document in a tm corpus by parsing the document ID


I have a tm corpus object like this:

    > summary(corp.eng)
    A corpus with 154 text documents

    The metadata consists of 2 tag-value pairs and a data frame
    Available tags are:
      create_date creator
    Available variables in the data frame are:
      metaid

The metadata for each document in the corpus looks like this:

    > meta(corp.eng[[1]])
    Available meta data pairs are:
      author       :
      datetimestamp: 2013-04-18 14:37:24
      description  :
      heading      :
      id           : smith-john_e.txt
      language     : en_ca
      origin       :

I know I can set the author of one document at a time like this:

    meta(corp.eng[[1]], tag = "author") <-
      paste(
        rev(
          unlist(
            strsplit(meta(corp.eng[[1]], tag = "id"), c("[-_]"))
          )[1:2]
        ),
        collapse = " ")

which gives me this result:

    > meta(corp.eng[[1]], tag = "author")
    [1] "john smith"
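Before batching it, it may help to see what that one-liner does step by step. The sketch below is plain base R (no tm needed) and runs the same parsing on the example ID. It assumes every ID has the `surname-givenname_suffix.txt` shape shown above; the variable names `parts` and `author` are just illustrative.

```r
id <- "smith-john_e.txt"

# strsplit() splits on "-" or "_" and returns a list;
# unlist() flattens it to c("smith", "john", "e.txt")
parts <- unlist(strsplit(id, "[-_]"))

# Keep the first two pieces (surname, given name) and swap their order
author <- paste(rev(parts[1:2]), collapse = " ")

print(author)
#> [1] "john smith"
```

An ID without the `_suffix` part (e.g. `smith-john.txt`) would keep the `.txt` on the given name, so the pattern matters.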

How can I do this as a batch job for every document in the corpus?

Note: this should probably be a comment, but since I have a working piece of code, here is an example:

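    library(tm)
    data(crude)

    # Collect the "places" tag of every document, then write back a modified value
    extracted.values <- meta(crude, tag = "places", type = "local")
    for (i in seq_along(extracted.values)) {
      meta(crude[[i]], tag = "places") <- substr(extracted.values[[i]], 1, 3)
    }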

You should be able to do this with lapply as well, but I'm not familiar with the inner workings of tm, so I'll stick with the loop. Substitute the substr function with the one you need, and your own data on the left-hand side, of course. Hope this helps.
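Adapting that loop to the author/ID case could look like the following sketch. It assumes a recent tm version in which `meta(doc, "id")` reads a document's ID and `meta(doc, "author") <- value` sets its author. Instead of `corp.eng` it builds a two-document stand-in corpus whose IDs are made up for illustration (`doe-jane_f.txt` is hypothetical). It uses `seq_len(length(corp))` so the loop runs over the documents themselves.

```r
library(tm)

# Stand-in corpus; IDs mimic the "surname-givenname_suffix.txt" pattern
corp <- VCorpus(VectorSource(c("text one", "text two")))
ids  <- c("smith-john_e.txt", "doe-jane_f.txt")   # hypothetical IDs
for (i in seq_len(length(corp))) {
  meta(corp[[i]], "id") <- ids[i]
}

# 1. Read every document ID in one pass
doc_ids <- vapply(seq_len(length(corp)),
                  function(i) meta(corp[[i]], "id"),
                  character(1))

# 2. Parse each ID: split on "-" or "_", keep the first two pieces, reverse them
authors <- vapply(strsplit(doc_ids, "[-_]"),
                  function(p) paste(rev(p[1:2]), collapse = " "),
                  character(1))

# 3. Write the authors back, one document at a time
for (i in seq_len(length(corp))) {
  meta(corp[[i]], "author") <- authors[i]
}

print(vapply(seq_len(length(corp)),
             function(i) meta(corp[[i]], "author"),
             character(1)))
#> [1] "john smith" "jane doe"
```

To use it on your own data, drop the stand-in setup and replace `corp` with `corp.eng`. Computing all authors first and then assigning keeps the parsing vectorised; only the write-back needs the per-document loop.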

