python - List the words in a vocabulary according to occurrence in a text corpus, Scikit-Learn
I have fitted a CountVectorizer to some documents in scikit-learn. I would like to see all the terms and their corresponding frequency in the text corpus, in order to select stop-words. For example,
'and' 123 times, 'to' 100 times, 'for' 90 times, ... and so on. Is there a built-in function for this?
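If cv is your CountVectorizer and X is the vectorized corpus, then
zip(cv.get_feature_names(), np.asarray(X.sum(axis=0)).ravel())
returns (term, frequency) pairs, one for each distinct term in the corpus that the CountVectorizer extracted. In Python 3, zip returns an iterator, so wrap it in list() if you need an actual list. In scikit-learn 1.0 and later, use cv.get_feature_names_out() instead; get_feature_names() was removed in 1.2.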
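(The little asarray + ravel dance is needed to work around some quirks in scipy.sparse: summing a sparse matrix along an axis returns a 2-D numpy.matrix of shape (1, n_terms) rather than a flat array.)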
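For the stop-word use case in the question, it may help to see the whole thing end to end. The following is only a minimal sketch: the three-document corpus is made up for illustration, and the hasattr check is just one way to cope with the method being named differently across scikit-learn versions.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus, invented purely for illustration.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat and the dog",
]

cv = CountVectorizer()
X = cv.fit_transform(docs)  # sparse matrix, shape (n_docs, n_terms)

# Newer scikit-learn releases expose get_feature_names_out();
# older ones only have get_feature_names().
if hasattr(cv, "get_feature_names_out"):
    terms = cv.get_feature_names_out()
else:
    terms = cv.get_feature_names()

# Column sums give each term's total count over the whole corpus.
# X.sum(axis=0) is a 1 x n_terms matrix, so flatten it to a 1-D array.
counts = np.asarray(X.sum(axis=0)).ravel()

# Pair terms with counts and sort by descending frequency, so the
# most common terms (stop-word candidates) come first.
term_freqs = sorted(zip(terms, counts), key=lambda pair: pair[1], reverse=True)

for term, freq in term_freqs[:5]:
    print(term, freq)
```

Sorting by count puts the likely stop-words at the top of the list; where to cut it off is a judgment call that depends on the corpus.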