python - List the words in a vocabulary according to occurrence in a text corpus, with Scikit-Learn


I have fitted a CountVectorizer to some documents in scikit-learn. I would like to see all the terms and their corresponding frequency in the text corpus, in order to select stop-words. For example:

'and' 123 times, 'to' 100 times, 'for' 90 times, ... and so on

Is there a built-in function for this?

If cv is your CountVectorizer and x is the vectorized corpus, then

# scikit-learn >= 1.0; releases before 1.0 spell this cv.get_feature_names()
zip(cv.get_feature_names_out(), np.asarray(x.sum(axis=0)).ravel())

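returns (term, frequency) pairs, one for each distinct term in the corpus that the CountVectorizer extracted. On Python 3, zip returns an iterator, so wrap it in list() if you need an actual list.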

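(The little asarray + ravel dance is needed to work around some quirks in scipy.sparse: summing a sparse matrix along an axis returns a 2-D numpy.matrix rather than a flat array.)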

