Wednesday 19 December 2012

One-line concordance in Linux command line

A recipe for creating a "concordance" (strictly speaking, a list of the word forms in a text with their frequencies) using nothing but the command line, without programs such as AntConc. AntConc is great, nice and illuminating, but sometimes I just need to prepare a list quickly. It can be done with the following Bash one-liner:

tr '[:punct:]' ' ' < filename1 | tr '[:upper:]' '[:lower:]' | tr -s '[:space:]' '\n' | sed '/^$/d' | sort | uniq -c | sed 's/ \{1,\}/","/g' | sed 's/^",//g' | sed 's/$/"/g' > filename2.csv

Here filename1 is the input file and filename2.csv is the output file in CSV format, with one "count","form" pair per line.
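For readability, the same pipeline can be unpacked into a small shell function with one comment per stage. This is a sketch, not the original recipe: the function name `concordance` is made up for illustration, words are split with `tr -s '[:space:]' '\n'` so that `uniq -c` counts individual words rather than whole lines, and the three `sed` calls are folded into a single substitution that turns `uniq -c` output such as `      3 the` into `"3","the"`. It assumes mostly ASCII text: GNU `tr`, for one, works on bytes, so accented letters may not be lowercased as expected.

```shell
#!/usr/bin/env bash
# Hypothetical helper name: reads text on stdin and writes
# "count","form" CSV lines on stdout, one per distinct word form.
concordance() {
  tr '[:punct:]' ' ' |              # punctuation becomes spaces
    tr '[:upper:]' '[:lower:]' |    # fold case so "The" and "the" merge
    tr -s '[:space:]' '\n' |        # one word per line, squeezing runs of whitespace
    sed '/^$/d' |                   # drop the empty line left by leading whitespace
    sort |                          # uniq only merges adjacent duplicates
    uniq -c |                       # prefix each distinct form with its count
    sed 's/^ *\([0-9]*\) \(.*\)$/"\1","\2"/'  # "      3 the" -> "3","the"
}
```

Usage: `concordance < filename1 > filename2.csv`. Folding the `sed` steps into one substitution also copes with counts too large to be padded by leading spaces, which the `s/^",//` step would otherwise miss.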

Recently there was a discussion on the HUMANIST list about whether "bash scripting is a worthwhile approach to 'tool' development in the Digital Humanities". People tended to reply no: either learn a "real" language (Python was recommended) or develop a GUI ("like most people, humanists have a strong distaste for the commandline"). I think it was 3:1 against the command line.

Obviously, I disagree. Using Bash helped me cross the boundary between user and "programmer": on the command line one just slides from one region into the other, without any formal education. You have a problem, you look for a solution (discovering gratefully that you stand on the shoulders of many colleagues), and bam! it's solved.

I think that digital humanists in general should adopt this kind of sliding, from users to programmers as well as from "classical" to "avant-garde" scholars, as their MO.
