2010-07-07

Nice bit of piping

I just made a nice little pipeline. I love the Linux shell.

cat file | tr A-Z a-z | tr -c a-z "\n" | sort | uniq -c | sort -nr

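That gives a count of every word in the file, ordered from most used to least used. It's pretty dumb: it breaks words at anything which isn't an alphabetical character, so "don't" comes out as "don" and "t", and any run of two or more non-letters (a full stop followed by a space, say) leaves empty lines which get counted too. But it does the trick.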

The first tr changes uppercase characters to lowercase, the second changes anything which isn't a lowercase letter to a newline, which leaves one word per line. sort then puts the words in alphabetical order so that identical words end up next to each other. uniq -c collapses each run of identical lines into a single line with the count in front of the word, and the final sort -nr orders that list numerically, biggest count first. Lovely.
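If you want to reuse it, the same steps can be wrapped in a small function. This is only a sketch: the name wordfreq is made up for illustration, and the one change from the pipeline above is the -s flag on the second tr, which squeezes each run of newlines into one so that empty lines mostly drop out of the counts.

```shell
# wordfreq FILE
# Hypothetical helper (the name is not from any standard tool).
# Prints "count word" lines for FILE, most frequent word first.
wordfreq() {
    # 1. lowercase everything
    # 2. turn every non-letter into a newline; -s squeezes runs of
    #    newlines into one (a single empty line can still remain if
    #    the file starts with a non-letter)
    # 3. sort so identical words are adjacent
    # 4. count each run of identical lines
    # 5. sort by count, largest first
    tr 'A-Z' 'a-z' < "$1" \
        | tr -cs 'a-z' '\n' \
        | sort \
        | uniq -c \
        | sort -nr
}
```

Something like wordfreq notes.txt | head -n 10 would then show the ten most used words (notes.txt being a stand-in for whatever file you have).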
