- 143 TEI XML files (including, alas, some duplicates)
- 437,218 words
- 29,637,450 characters
- 16,465.25 Textkarten
- 1029 Druckbogen
The last two, rather unusual categories belong to the German printing tradition, which was influential in the Croatian printing industry; we translated these terms (Textkarte = kartica teksta, Druckbogen = tiskarski arak) and still use them in text accounting.
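The two printing units follow from the character count by simple division. The sketch below assumes 1800 characters per Textkarte and 16 Textkarten per Druckbogen; these factors are not stated in this post, but they are the ones that reproduce the figures in the list. The function name `text_accounting` is made up for illustration.

```shell
# Hypothetical helper: convert a character count into Textkarten and Druckbogen.
# Assumed factors (inferred from the figures in the list, not stated in the post):
#   1 Textkarte  = 1800 characters
#   1 Druckbogen = 16 Textkarten
text_accounting() {
    awk -v c="$1" 'BEGIN {
        karten = c / 1800
        # %d truncates, so only complete Druckbogen are counted
        printf "%.2f Textkarten, %d Druckbogen\n", karten, karten / 16
    }'
}
```

Calling `text_accounting 29637450` prints `16465.25 Textkarten, 1029 Druckbogen`.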
[Technical note.] The numbers were produced by the Linux wc command (cf. recipe) run on all XML files currently in CroALa, which are also available on its SourceForge page. The Linux one-liner for counting characters in multiple XML files (the third column of wc's output minus the first, so that newlines are not counted) was simple:
wc *.xml | awk '{print $3-$1}'
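The one-liner prints one figure per file, followed by the total on the last line. As a variation, here is a minimal sketch that reports only the corpus-wide totals, words included. It is not the command used for the numbers above: the function name `corpus_counts` is hypothetical, and it uses `wc -m` (characters) rather than the default byte count, which gives a different result for non-ASCII text in a UTF-8 locale.

```shell
# Hypothetical helper: total words and characters (newlines excluded)
# for all XML files in the directory given as $1.
corpus_counts() {
    # Concatenating first makes wc print a single line: lines, words, characters.
    cat "$1"/*.xml | wc -lwm |
        awk '{ printf "words: %d\ncharacters: %d\n", $2, $3 - $1 }'
}
```

Run it as `corpus_counts .` in the directory that holds the XML files.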