python - List the words in a vocabulary according to occurrence in a text corpus, Scikit-Learn
I have fitted a CountVectorizer to some documents in scikit-learn. I would like to see all the terms and their corresponding frequency in the text corpus, in order to select stop-words. For example: 'and' 123 times, 'to' 100 times, 'for' 90 times, ... and so on.
Is there a built-in function for this?
If cv is your CountVectorizer and X is the vectorized corpus, then
zip(cv.get_feature_names(), np.asarray(X.sum(axis=0)).ravel())
returns a list of (term, frequency) pairs, one for each distinct term in the corpus that the CountVectorizer extracted.
(The little asarray + ravel dance is needed to work around quirks in scipy.sparse.)
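For anyone who wants to try this end to end, here is a minimal sketch of the same idea with the pairs sorted by descending frequency. The toy corpus and the names docs, get_names, and term_counts are made up for illustration. Two caveats: newer scikit-learn releases replaced get_feature_names() with get_feature_names_out(), so the sketch tries the newer name first, and on Python 3 zip returns an iterator rather than a list, so the pairs are materialized with sorted().

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus, for illustration only.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat and the dog",
]

cv = CountVectorizer()
X = cv.fit_transform(docs)  # sparse document-term count matrix

# Newer scikit-learn exposes get_feature_names_out(); older releases only
# have get_feature_names(). Use whichever is available.
get_names = getattr(cv, "get_feature_names_out", None) or cv.get_feature_names
terms = get_names()

# Summing a sparse matrix over axis 0 gives a 1 x n_terms result;
# asarray + ravel flattens it into a plain 1-D array of per-term counts.
counts = np.asarray(X.sum(axis=0)).ravel()

# Pair each term with its total count and sort, most frequent first.
term_counts = sorted(zip(terms, counts), key=lambda pair: pair[1], reverse=True)

for term, count in term_counts[:5]:
    print(term, count)
```

The most frequent terms at the top of term_counts are the natural stop-word candidates; how many of them to drop is a judgment call for your corpus.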