Thursday 5 January 2012

Rare and Medium

This wintry afternoon I followed in the footsteps of William Whitaker, the author of the WORDS Latin dictionary. The program contains a list of Latin words with very precise lexicographic descriptions -- data on period, area of application, frequency, etc. The last part interested me most.

Whitaker, about whom I know almost nothing but would like to know more (he seems to be outside academia) [1], was very modest and careful in his claims. He repeatedly warned users that the program's philological expertise is limited, that he relied on other authorities and sources, and that the program is intended only as a reading aid, not a research tool. Nevertheless, he has produced, I believe, the most informative freely available digital reference work on Latin usage. I'd like to see his work reviewed in a scholarly journal; I think he has earned it.

Anyway, in the documentation on word frequencies Whitaker says:

FREQ guessed from the relative number of citations given by sources need not be valid, but seems to work. (...)

type FREQUENCY_TYPE is ( -- For dictionary entries
X, -- -- Unknown or unspecified
A, -- very freq -- Very frequent, in all Elementary Latin books, top 1000+ words
B, -- frequent -- Frequent, next 2000+ words
C, -- common -- For Dictionary, in top 10,000 words
D, -- lesser -- For Dictionary, in top 20,000 words
E, -- uncommon -- 2 or 3 citations
F, -- very rare -- Having only single citation in OLD or L+S
I, -- inscription -- Only citation is inscription
M, -- graffiti -- Presently not much used
N -- Pliny -- Things that appear only in Pliny Natural History
);

(Of course, Whitaker knows about Diederich's work -- he is the one who OCR'd Diederich's 1939 thesis and put it online.)

So, we're pleased to report that the Profile of Croatian Neo-Latin Project converted Whitaker's DICTPAGE.RAW to a MySQL table, and learned the following about how Whitaker's ten frequency categories are distributed among the 39,225 lemmata in his wordlist:

  1. X (Unknown or unspecified): 0
  2. A (very freq): 2134
  3. B (frequent): 2747
  4. C (common): 5113
  5. D (lesser): 8365
  6. E (uncommon): 11193
  7. F (very rare): 7974
  8. I (inscription): 430
  9. M (graffiti): 0
  10. N (Pliny): 1269

  Total: 39225
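
For readers who would rather not set up MySQL, a tally like the one above could be sketched in a few lines of Python. This is not the route the project took, only an illustration. It assumes that each dictionary entry carries a bracketed five-letter flag group such as [XXXAO] and that the fourth letter is the FREQUENCY_TYPE code; that layout is an assumption and should be checked against the actual file. The function name and the sample lines are invented for the example.

```python
import re
from collections import Counter

# Assumption: every entry has a bracketed five-letter flag group, e.g. [XXXAO],
# and the fourth letter of that group is the FREQUENCY_TYPE code.
FLAGS = re.compile(r"\[([A-Z]{5})\]")


def tally_frequencies(lines):
    """Count entries per frequency code; lines without a flag group are skipped."""
    counts = Counter()
    for line in lines:
        match = FLAGS.search(line)
        if match:
            counts[match.group(1)[3]] += 1
    return counts


# Invented sample lines, only to show the assumed shape of the input.
sample = [
    "#lemma1  N  [XXXAO] :: gloss one",
    "#lemma2  V  [XXXEO] :: gloss two",
    "#lemma3  ADJ  [XAXEO] :: gloss three",
    "a line with no flag group",
]
counts = tally_frequencies(sample)
```

To run it over the real file, one would pass the open file object in place of `sample`, after confirming that the flag layout really matches the assumption.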


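Now we have something to compare. Interestingly, the largest single category is E (uncommon), with 11,193 lemmata, about 29% of the list; taken together, the categories below D (E, F, I and N) account for 20,866 lemmata, a little over half.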
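
The arithmetic behind these figures is easy to check. Here is a minimal Python sketch that takes the ten counts reported above, verifies that they add up to the stated total, and works out each category's share; the variable names are mine, not Whitaker's.

```python
# Counts per frequency code, copied from the list above.
COUNTS = {
    "X": 0, "A": 2134, "B": 2747, "C": 5113, "D": 8365,
    "E": 11193, "F": 7974, "I": 430, "M": 0, "N": 1269,
}
STATED_TOTAL = 39225

total = sum(COUNTS.values())
# Share of the whole wordlist held by each category.
shares = {code: n / total for code, n in COUNTS.items()}
# The category with the most lemmata.
largest = max(COUNTS, key=COUNTS.get)
# Everything below D: uncommon, very rare, inscription-only, Pliny-only.
below_d = sum(COUNTS[code] for code in "EFIN")

for code, n in COUNTS.items():
    print(f"{code}: {n:6d}  {shares[code]:6.1%}")
print(f"total: {total}, largest: {largest}, below D: {below_d} ({below_d / total:.1%})")
```

The sum matches the stated 39,225, category E comes out at roughly 28.5%, and E, F, I and N together at roughly 53%.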

[Further reading.] There is a recent publication: Joseph Denooz, Nouveau lexique fréquentiel de latin. Alpha-Omega. Reihe A Bd 258. Hildesheim/Zürich/New York: Georg Olms Verlag, 2010. Pp. ix, 453. ISBN 9783487144733. €148.00. It was reviewed recently on BMCR, with a crucial question: "A dictionary such as this is a tool: so what can this one be used for?"

[1] A sad update. Thinking about possible reasons for William Whitaker's absence from the internet, I consulted the obituaries, and found the following:

Colonel William A. Whitaker (USAF-Retired) passed away on Tuesday, December 14, 2010. While at DARPA, he worked on the computer language ADA. In retirement, he created the Latin-English translation software program, "Whitaker Words". (...)
Published in Midland Reporter-Telegram on December 21, 2010


Τάνδε κατ' εὔδενδρον στείβων δρίος εἴρυσα χειρὶ
πτώσσουσαν βρομίας οἰνάδος ἐν πετάλοις,
ὄφρα μοι εὐερκεῖ καναχὰν δόμῳ ἔνδοθι θείη,
τερπνὰ δι' ἀγλώσσου φθεγγομένα στόματος.

Requiescat in pace.
