<HTML>
|
|
|
|
<HEAD>
|
|
<TITLE>WORDS 1.97F (LATIN-ENGLISH DICTIONARY) PROGRAM DOCUMENTATION</TITLE>
|
|
</HEAD>
|
|
<BODY>
|
|
|
|
|
|
<H1><CENTER>WORDS Version 1.97FC</CENTER>
|
|
<CENTER>LATIN-ENGLISH DICTIONARY PROGRAM</CENTER></H1>
|
|
|
|
<BR><BR>
|
|
|
|
<A HREF="#SUMMARY"><B>SUMMARY</B></A><BR>
|
|
<BR>
|
|
<A HREF="#INSTALLATION"><B>INSTALLATION</B></A><BR>
|
|
<A HREF="#Is There a Problem">Is There a Problem?</A><BR>
|
|
<BR>
|
|
|
|
<A HREF="#INTRODUCTION"><B>INTRODUCTION</B></A><BR>
|
|
<BR>
|
|
<A HREF="#OPERATIONAL DESCRIPTION"><B>OPERATIONAL DESCRIPTION</B></A><BR>
|
|
<A HREF="#Program Operation">Program Operation</A><BR>
|
|
<A HREF="#Modes of Operation">Modes of Operation</A><BR>
|
|
<A HREF="#Command Line Operation">Command Line Operation</A><BR>
|
|
<A HREF="#Latin-to-English Examples">Latin-to-English Examples</A><BR>
|
|
<A HREF="#English-to-Latin Examples">English-to-Latin Examples</A><BR>
|
|
<A HREF="#Design of the Meaning Line">Design of the Meaning Line</A><BR>
|
|
<A HREF="#Signs and Abbreviations in Meaning">Signs and Abbreviations in Meaning</A><BR>
|
|
<BR>
|
|
<A HREF="#PROGRAM DESCRIPTION"><B>PROGRAM DESCRIPTION</B></A><BR>
|
|
<A HREF="#Codes in Inflection Line">Codes in Inflection Line</A><BR>
|
|
<A HREF="#Help for Parameters">Help for Parameters</A><BR>
|
|
<A HREF="#Special Cases">Special Cases</A><BR>
|
|
<A HREF="#Uniques">Uniques</A><BR>
|
|
<A HREF="#Tricks">Tricks</A><BR>
|
|
<A HREF="#Trimming of uncommon results">Trimming of uncommon results</A><BR>
|
|
<BR>
|
|
<A HREF="#GUIDING PHILOSOPHY"><B>GUIDING PHILOSOPHY</B></A><BR>
|
|
<A HREF="#Purpose">Purpose</A> <BR>
|
|
<A HREF="#Method">Method</A><BR>
|
|
<A HREF="#Word Meanings">Word Meanings</A><BR>
|
|
<A HREF="#Proper Names">Proper Names</A><BR>
|
|
<A HREF="#Letter Conventions (u/v, i/j, w)">Letter Conventions (u/v, i/j, w)</A><BR>
|
|
<BR>
|
|
<A HREF="#DICTIONARY"><B>DICTIONARY</B></A><BR>
|
|
<A HREF="#Dictionary Codes">Dictionary Codes</A><BR>
|
|
<A HREF="#AGE">AGE</A><BR>
|
|
<A HREF="#AREA">AREA</A><BR>
|
|
<A HREF="#GEO">GEO</A><BR>
|
|
<A HREF="#FREQ">FREQ</A><BR>
|
|
<A HREF="#SOURCE">SOURCE</A><BR>
|
|
<A HREF="#Current Distribution of DICTLINE Flags">
|
|
Current Distribution of DICTLINE Flags</A><BR>
|
|
<A HREF="#Dictionary Conventions">Dictionary Conventions</A><BR>
|
|
<A HREF="#Evolution of the Dictionary">Evolution of the Dictionary</A><BR>
|
|
<A HREF="#Text Dictionary - DICTPAGE.TXT">Text Dictionary - DICTPAGE.TXT</A><BR>
|
|
<A HREF="#Latin Spellchecking - Text Processor List - LISTALL.ZIP">
|
|
Latin Spellchecking - Text Processor List - LISTALL.ZIP</A><BR>
|
|
<BR>
|
|
<A HREF="#INFLECTIONS"><B>INFLECTIONS</B></A><BR>
|
|
<BR>
|
|
<A HREF="#ENGLISH to LATIN"><B>ENGLISH to LATIN</B></A><BR>
|
|
<A HREF="#English Parsing of Meanings">English Parsing of Meanings</A><BR>
|
|
<A HREF="#Ordering English-to-Latin Output">Ordering English-to-Latin Output</A><BR>
|
|
<BR>
|
|
<A HREF="#TESTS AND STATUS"><B>TESTS AND STATUS</B></A><BR>
|
|
<A HREF="#Testing">Testing</A><BR>
|
|
<A HREF="#Current Status and Future Plans">Current Status and Future Plans</A><BR>
|
|
<BR>
|
|
<A HREF="#USER MODIFICATIONS"><B>USER MODIFICATIONS</B></A><BR>
|
|
<A HREF="#Writing DICT.LOC and UNIQUES.LAT">Writing DICT.LOC and UNIQUES.LAT</A><BR>
|
|
<A HREF="#DICT.LOC">DICT.LOC</A><BR>
|
|
<A HREF="#UNIQUES.LAT">UNIQUES.LAT</A><BR>
|
|
<BR>
|
|
<A HREF="#DEVELOPERS AND REHOSTING"><B>DEVELOPERS AND REHOSTING</B></A><BR>
|
|
<A HREF="#Program source code and data">Program source code and data</A><BR>
|
|
<A HREF="#License">License</A><BR>
|
|
<A HREF="#Rehosting WORDS">Rehosting WORDS</A><BR>
|
|
<A HREF="#Feedback">Feedback</A><BR>
|
|
<BR><BR>
|
|
|
|
|
|
<A NAME="SUMMARY">
|
|
<H2><CENTER>SUMMARY</CENTER>
|
|
</H2></A> <BR>
|
|
<BR>
|
|
|
|
<P>
|
|
This program, WORDS, takes keyboard input or a file of Latin text lines and
provides an analysis of each word individually. It uses the files INFLECT.SEC,
UNIQUES.LAT, ADDONS.LAT, STEMFILE.GEN, INDXFILE.GEN, and DICTFILE.GEN, and
possibly a SPECIAL (.SPE) dictionary and DICT.LOC.
|
|
<P>
|
|
The dictionary contains over 39000 entries, as would be counted in an
|
|
ordinary dictionary. This expands to almost twice that number of
|
|
individual stems (the count that the program may display at startup), and,
|
|
through additional word construction with hundreds of prefixes and
|
|
suffixes, may generate more, leading to many hundreds of thousands of
|
|
'words' that can be formed by declension and conjugation. This version of
|
|
WORDS provides a tool to help in translations for the Latin student. It
|
|
is now a large dictionary by any measure and can be helpful to advanced
|
|
users. The dictionary will continue to grow - slowly. <BR>
|
|
<BR>
|
|
|
|
|
|
<A NAME="INSTALLATION">
|
|
<H2><CENTER>INSTALLATION</CENTER>
|
|
</H2></A> <BR>
|
|
|
|
<P>
|
|
The WORDS program, with its accompanying data files, should run on any
machine for which it is adapted, with any monitor. Simply download the
self-extracting EXE files or the compressed file for the appropriate
system and execute/decompress it in your chosen subdirectory on the hard
disk, creating the necessary files. Then call/run WORDS, or do as instructed
in any README.
|
|
|
|
<P>The load includes SPQR.ICO, a possible icon for WORDS,
|
|
but just that, only an icon.
|
|
You have to install the program as per the directions
|
|
(put the downloaded files in a folder,
|
|
run them to expand to the WORDS system, then run from that folder).
|
|
However, if you are Windows-wise, you can use Explorer to
make a shortcut and put it on the desktop.
Windows will make a generic icon,
but you can change it (using Properties)
to whatever other icon you can find, for instance,
the one included with the package. Or not.
Make sure that the Properties of the shortcut
have as Target the WORDS.EXE
in the folder in which the system is loaded.
|
|
|
|
<P>
|
|
See the particular page for each specific system. <BR>
|
|
<A HREF="http://www.erols.com/whitaker/wordsdos.htm"><B>DOS</B></A><BR>
|
|
<A HREF="http://www.erols.com/whitaker/wordsw95.htm"><B>Windows 95/NT/98/2000/XP</B></A><BR>
|
|
<A HREF="http://www.erols.com/whitaker/wordslux.htm"><B>Linux and FreeBSD</B></A><BR>
|
|
<A HREF="http://www.erols.com/whitaker/wordsos2.htm"><B>OS/2</B></A><BR>
|
|
<A HREF="http://www.erols.com/whitaker/wordsmac.htm"><B>MAC OS X</B></A><BR>
|
|
<BR><BR>
|
|
|
|
<A NAME="Is There a Problem">
|
|
<H4>Is There a Problem?</H4></A>
|
|
|
|
<P>Did you download the two appropriate file(s) to your hard disk,
|
|
as listed in the download page for your system?
|
|
|
|
<P>Can you verify that they are there and full size (megabytes as indicated)?
|
|
|
|
<P>Did you execute/run/unzip these programs?
|
|
|
|
<P>If self-extracted, were you asked where to put the generated files?
|
|
(Maybe a default C:\WORDS)?
|
|
If not, did you put them in the folder/subdirectory from which you wish to operate?
|
|
|
|
<P>Can you verify that the full set of files (about 10 MB) was generated in that folder/subdirectory,
|
|
or wherever you chose? At least
|
|
WORDS.EXE, INFLECT.SEC, UNIQUES.LAT, ADDONS.LAT, STEMFILE.GEN, INDXFILE.GEN, and DICTFILE.GEN,
|
|
plus documentation.
|
|
|
|
<P>Did you run/execute WORDS in that folder/subdirectory? e.g. <BR>
|
|
<B>C:\WORDS</B>
|
|
|
|
<P>If when you try to run there is no WORDS.EXE (or equivalent),
|
|
the system should let you know.<BR>
|
|
If there is no INFLECT.SEC, the program will say so and abort immediately.<BR>
|
|
If there are no dictionary files, the program will tell you, but will start
|
|
(you can get Roman numerals!).<BR>
|
|
If there is no ADDONS.LAT or UNIQUES.LAT, the program will tell you,
|
|
and if they are there it will tell you how many.<BR>
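<P>The file check in the list above can be sketched as a small script.
This is only an illustration in Python and is not part of the WORDS distribution;
the function name <TT>missing_words_files</TT> is hypothetical.
It checks only that the files named above are present in a folder,
not that they are full size or readable.

```python
from pathlib import Path

# Files the text above lists as the minimum set for running WORDS.
REQUIRED_FILES = [
    "WORDS.EXE", "INFLECT.SEC", "UNIQUES.LAT", "ADDONS.LAT",
    "STEMFILE.GEN", "INDXFILE.GEN", "DICTFILE.GEN",
]


def missing_words_files(folder):
    """Return the required file names that are not present in folder.

    Hypothetical helper: names are compared in upper case, on the
    assumption that the host file system ignores case as DOS/Windows do.
    """
    folder = Path(folder)
    if not folder.is_dir():
        # Nothing can be present in a folder that does not exist.
        return list(REQUIRED_FILES)
    present = {p.name.upper() for p in folder.iterdir() if p.is_file()}
    return [name for name in REQUIRED_FILES if name not in present]
```

<P>Calling <TT>missing_words_files(r"C:\WORDS")</TT> returns an empty list when
every listed file is there; otherwise it returns the names that still need to be generated.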
|
|
|
|
<BR><BR>
|
|
|
|
<A NAME="INTRODUCTION">
|
|
<H3><CENTER>INTRODUCTION</CENTER>
|
|
</H3></A><BR>
|
|
<BR>
|
|
|
|
<P>
|
|
I am no expert in Latin; indeed, my training is limited to a couple of
years in high school more than 50 years ago. But I always felt that Latin, as
presented after two millennia, was a scientific language. It had the
interesting property of inflection: words were constructed in a logical
manner. I admired this feature, but could never remember the vocabulary
well enough when it came time to exercise it on tests.
|
|
<P>
|
|
I decided to automate an elementary-level Latin vocabulary list. As a
|
|
first stage, I produced a computer program that will analyze a Latin word
|
|
and give the various possible interpretations (case, person, gender,
|
|
tense, mood, etc.), within the limitations of its dictionary. This might
|
|
be the first step to a full parsing system, but, although just a
|
|
development tool, it is useful by itself.
|
|
<P>
|
|
<B>Please remember that this is only a computer exercise in automating a
|
|
Latin dictionary. I am not a Latin scholar and anything in the program or
|
|
documentation is filtered by me from reading the cited Latin dictionaries. Please
|
|
let no one go to his teacher and cite my interpretation as an authority. </B>
|
|
<P>
|
|
While developing this initial implementation, based on different sources,
|
|
I learned (or re-learned) something that I had overlooked at the
|
|
beginning. Latin courses, and even very large Latin dictionaries, are put
|
|
together under very strict ground rules. Some dictionary might be based
|
|
exclusively on 'Classical' (200 BC - 200 AD) texts; it might have every
|
|
word that appears in every surviving writing of Cicero, but nothing much
|
|
before or since. Such a dictionary will be inadequate for translating
|
|
medieval theological or scientific texts. In another example, one
|
|
textbook might use Caesar as its main source of readings (my high school
|
|
texts did), while another might avoid Caesar and all military writings
|
|
(either for pacifist reasons, or just because the author had taught Caesar
|
|
for 30 years and had grown bored with going over the same material, year
|
|
after year). One can imagine that the selection of words in such
|
|
different texts would differ considerably; moreover, even with the same
|
|
words, the meanings attached would be different. This presents a problem
|
|
in the development of a dictionary for general use.
|
|
<P>
|
|
One could produce a separate dictionary for each era and application or a
|
|
universal dictionary with tags to indicate the appropriate application and
|
|
meaning for each word. With such a tag arrangement one would not be
|
|
offered inappropriate or improbable interpretations. The present system
|
|
has such a mechanism, but it is not fully exploited.
|
|
<P>
|
|
The Version 1.97E dictionary may be found to be of fairly general use for
|
|
the student; it has all the easy words that every text uses. It also has the
|
|
adverbs, prepositions, and conjunctions, which are not as
|
|
sensitive to application as are the nouns and verbs. The system also
|
|
tests a few hundred prefixes and suffixes, if the raw word cannot be
|
|
found. Beyond that, there are a large number of TRICKS which may be applied.
|
|
These may be thought of as correcting for variations in spelling.
|
|
This allows an interpretation of many words which would otherwise
|
|
be marked unknown. The result of this analysis is fairly straightforward
|
|
in most cases, accurate but esoteric in some others. Some constructions
|
|
are recognized Latin words, and some are perfectly reasonable words which
|
|
may never have been used by Cicero or Caesar but might have been used by
|
|
Augustine or a monk of Jarrow. For about 1 in 10 constructed words the
|
|
result has no relation to the normal dictionary meaning.
|
|
<P>
|
|
BE WARNED! The program will go to great lengths if all tricks are
|
|
invoked. If you get a word formed with an enclitic, prefix, suffix, and
|
|
syncope, be very suspicious! It may well be right, but look carefully.
|
|
(Try siquempiamque!)
|
|
<P>
|
|
The final try is to look at the input as two words run together. In
|
|
most cases this works out, and is especially useful for late Latin number
|
|
usage. However, this algorithm may go very wrong. If it is not obviously
|
|
right, it is probably incorrect.
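<P>The idea of the two-word try can be sketched as follows.
This is a toy illustration in Python, not the algorithm WORDS actually uses:
the real program works on stems and inflections, while this sketch assumes
a hypothetical <TT>is_known</TT> test on whole words and a made-up word list.

```python
def two_word_splits(text, is_known):
    """Return every (left, right) split of text where both halves are known."""
    return [
        (text[:i], text[i:])
        for i in range(1, len(text))
        if is_known(text[:i]) and is_known(text[i:])
    ]


# Toy stand-in for a real dictionary lookup (an assumption, for illustration).
KNOWN = {"res", "publica", "senatus", "populus"}

print(two_word_splits("respublica", KNOWN.__contains__))
print(two_word_splits("xyzzy", KNOWN.__contains__))
```

<P>The first call finds the single split res + publica; the second finds none.
A longer input may split in more than one place, which is one reason
the result has to be looked at carefully.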
|
|
<P>
|
|
With this facility, and a 39000 word dictionary, trials on some tested
|
|
classical texts and the Vulgate Bible give hit rates of far better than
|
|
99%, excluding proper names (there are very few proper names in this
|
|
dictionary). (I am an old soldier so the dictionary may have
|
|
every possible word for attack or destroy. The system is near perfect for
|
|
Caesar.) The question arises: what hit rate can be expected for a general
dictionary? Classical Latin dictionaries have no references to the
|
|
terminology of Christian theology. The legal documents and deeds of the
|
|
Middle Ages are a challenge of jargon and abbreviations. These areas
|
|
require special knowledge and vocabulary, but even there the ability to
|
|
handle the non-specialized words is a large part of the effort.
|
|
<P>
|
|
The development system allows the inclusion of specialized vocabulary (for
|
|
instance a SPEcial dictionary for specialized words not wanted in most
|
|
dictionaries), and the opportunity for the user to add additional words to
|
|
a DICT.LOC.
|
|
<P>
|
|
It was initially expected that there would be special dictionaries for
|
|
special applications. That is why there is the possibility of a SPECIAL
|
|
dictionary. Now the general dictionary is coded by AGE and application
|
|
AREA. Thus special words used initially/only by St Thomas Aquinas would
|
|
be Medieval (AGE code F) and Ecclesiastical (AREA code E). Eventually
|
|
there needs to be a filter that allows one, upon setting parameters for
|
|
Medieval and Ecclesiastical, to push those words over others. Right now
there is not enough non-classical vocabulary to rely on such a
scheme. The problem is that one needs a very complete classical
dictionary before one can assure that new entries are uniquely Medieval,
that they are not just classical words that appear in a Medieval text.
And the update is only into the D's. So the situation is that the
mechanism is there, but not sufficient data. Nevertheless, that is exactly
|
|
the application I had in mind when I set out to do the program.
|
|
<P>
|
|
One can set a parameter to exclude medieval words if there is a classical
|
|
word answering the same parse. Likewise, the program can ignore rare
|
|
meanings if there is a common meaning for the parse.
|
|
<P>
|
|
The program may be larger than is necessary for the present
|
|
application. It is still in development but some effort has now been put
|
|
into optimization. Nevertheless there is lots of room for speeding it up.
|
|
Specifically, the program is disk-oriented in order to run on small machines,
|
|
such as DOS with the 640KB limitation. Rejecting this limitation and assuming
|
|
that the user has tens of megabytes of memory (clearly realistic today)
|
|
would allow faster processing. The next version may go that way.
|
|
|
|
<P>
|
|
This is a free program, which means it is proper to copy it and pass it on
|
|
to your friends. Consider it a developmental item for which there is no
|
|
charge. However, just for form, it is Copyrighted (c).
|
|
Permission is hereby freely given for any and all use of program and data.
|
|
You can sell it as your own, but at least tell me.
|
|
|
|
<P>
|
|
This version is distributed without obligation, but the developer would
|
|
appreciate comments and suggestions.
|
|
<P>
|
|
<BR>
|
|
William A Whitaker <BR>
|
|
PO Box 3036 <BR>
|
|
McLean VA 22103-3036 <BR>
|
|
USA <BR>
|
|
whitaker@erols.com <BR>
|
|
<BR>
|
|
|
|
|
|
<A NAME="OPERATIONAL DESCRIPTION">
|
|
|
|
<H3><CENTER>OPERATIONAL DESCRIPTION</CENTER>
|
|
</H3></A> <BR>
|
|
|
|
<P>
|
|
This write-up is rudimentary and assumes that the user is experienced with
computers, and as an example assumes a PC with a Windows OS.
Other systems operate essentially the same.
<P>
The WORDS program, Version 1.97E, with its accompanying data files, should
run on a PC in Windows 95/98/NT, with any monitor. Simply download the
self-extracting EXE file and execute it in your chosen subdirectory/folder to
UNZIP the files into a subdirectory of a hard disk. Then call WORDS.
|
|
<P>
|
|
There are a number of files associated with the program. These must be in
|
|
the subdirectory/folder of the program, and the program must be run from that
|
|
subdirectory. WORDS.EXE is the executable program. INFLECT.SEC holds the
|
|
encoded inflection records. STEMFILE.GEN contains the stems of the
|
|
GENERAL dictionary in a searchable form.
|
|
DICTFILE.GEN is an indexed form of the GENERAL dictionary entries with form
|
|
information and meanings. INDXFILE.GEN contains a set of indexes into the
|
|
DICTFILE. In some versions, there may be a set of files for a SPECIAL (.SPE)
|
|
dictionary of the same structure as the GENERAL dictionary, but there is
|
|
no SPECIAL dictionary in the present distribution. A LOCAL dictionary may
|
|
also be used. This is a limited dictionary of a different form, human
|
|
readable and writeable. The knowledgeable user can augment and modify it
|
|
on-line. It would consist of the file DICT.LOC. UNIQUES.LAT contains
|
|
certain words which regular processing does not get. ADDONS.LAT contains
|
|
the set of prefixes, suffixes and enclitics (-que, -ve) and the like.
|
|
Other files may be generated by the program, so run it in a configuration
|
|
that allows the creation of files.
|
|
<P>
|
|
All these files are necessary to run the program (except the optional
|
|
dictionaries SPE and LOC). This excess of files is a consequence of the
|
|
present developmental nature of the program. The files are very simple,
|
|
almost human-readable. Presumably, a later version could condense and
|
|
encode them. Nevertheless, beyond the original COPY, the user need not
|
|
worry about them.
|
|
<P>
|
|
Additionally, there are files that the program may produce on request.
|
|
All of these share the name WORD, with various extensions, and they are
|
|
all ASCII/DOS text files which can be viewed and processed with an ordinary
|
|
editor. The casual user may not want to get involved with
|
|
these. WORD.OUT will record the whole output, WORD.UNK will list only
|
|
words the program is unable to interpret. These outputs are turned on
|
|
through the PARAMETERS mechanism.
|
|
<P>
|
|
PARAMETERS may be changed while running the program by inputting a line
|
|
containing a '#' mark as the only (or first) character. Alternatively,
|
|
WORD.MOD contains the MODES that can be set by CHANGE_PARAMETERS. If this
|
|
file does not exist, default modes will be used. The file may be produced
|
|
or changed when changing parameters. It can also be modified, if the user
|
|
is sufficiently confident, with an editor, or deleted, thereby reverting
|
|
to defaults.
|
|
<P>
|
|
There is another set of developer parameters which may be set
with the input of '!'. These MODES may be changed and saved in a
file WORD.MDV. These are not normal user facilities; probably no one but
the developer would be interested. In any specific release these
|
|
facilities may, or may not, work. They are just mentioned here in case
|
|
they ever come up accidentally, and to point out that there are other
|
|
capabilities, actual and possible, which may be invoked if there is a
|
|
special need. The user is invited to review these parameters to see
|
|
if any address an unusual need.
|
|
<P>
|
|
WORD.OUT is the file produced if the user requests
|
|
output to a file. This output can be used for later manipulation with a
|
|
text editor, especially when the input was a text file of some length. If
|
|
the parameter UNKNOWNS_ONLY is set, the output serves as a sort of a Latin
|
|
spell checker. Those words it cannot match may just not be in the
|
|
dictionary, but alternatively they may be typos. A WORD.UNK file of
|
|
unknowns can be generated.
|
|
<BR>
|
|
<BR>
|
|
<A NAME="Program Operation">
|
|
<H4>Program Operation</H4></A>
|
|
|
|
<P>
|
|
To start the program, in the subdirectory that contains all the files,
|
|
type WORDS. A setup procedure will execute, processing files. Then the
|
|
program will ask for a word to be keyed in. Input the word and give a
|
|
return (ENTER). Information about the word will be displayed.
|
|
<P>
|
|
One can input a whole line at a time, however long,
|
|
but only one line since the return
|
|
at the end of line will start the processing. If the results would fill
|
|
more than a computer screen, the output is halted until the user responds
|
|
to the 'MORE' message with a return. A file containing a text, a series
|
|
of lines, can be input by keying in the character '@', followed (with no
|
|
spaces) by the DOS name of the file of text. This input file need not be
|
|
in the program subdirectory, just use the full path and name of the
|
|
file. This is usually accompanied with the setting of the parameter
|
|
switches to create and write to an output file, WORD.OUT.
|
|
<P>
|
|
One can have a comment in the file, a terminal portion of a line that is
|
|
not parsed. This could be an English meaning, a source where the word was
|
|
found, an indication that it may have been miscopied, etc. A comment
|
|
begins with a double dash [--] and continues to the end of the line. The
|
|
'--' and everything after on that line is ignored by the program.
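<P>The comment rule is simple enough to show in a few lines.
This is a minimal sketch in Python of the rule as stated above, with a
hypothetical function name; it is not taken from the WORDS source.

```python
def strip_comment(line):
    """Drop a trailing '--' comment and surrounding whitespace from a line."""
    # partition() splits at the first '--'; everything after it is discarded.
    text, _separator, _comment = line.partition("--")
    return text.strip()


print(strip_comment("agricolarum  -- Caesar; may have been miscopied"))
print(repr(strip_comment("-- a line that is only a comment")))
```

<P>The first call leaves only agricolarum to be parsed; the second leaves an empty string.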
|
|
<P>
|
|
A simple # character input at the start of a line (that is, a line
|
|
containing only #) will permit the user to set modes to prevent the
|
|
process from trying prefixes and suffixes to get a match on an item
|
|
unknown to the dictionary, put output to a file, etc. Going into the
|
|
CHANGE_PARAMETERS, the '?' character calls help for each entry.
|
|
<P>
|
|
Another set of parameters is invoked by the character !. These developer parameters
are fairly specialized and are probably not required by the average user;
nevertheless, they are available for special applications.
|
|
<P>
|
|
Two successive returns with no text will terminate the program (except in
|
|
text being read from an @ disk file.)
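<P>The special characters described in this section can be summarized in a short sketch.
This is an illustration in Python of the rules as written here, not the program's own
input routine; the function names and the event labels are hypothetical, and
tolerating leading blanks before the special character is an assumption.

```python
def classify_input(line):
    """Return a (kind, payload) pair for one line of keyboard input."""
    stripped = line.strip()
    if not stripped:
        return ("blank", "")
    if stripped.startswith("#"):
        return ("change_parameters", "")
    if stripped.startswith("!"):
        return ("developer_parameters", "")
    if stripped.startswith("@"):
        # The file name follows the '@' with no spaces.
        return ("input_file", stripped[1:])
    return ("text", stripped)


def run_session(lines):
    """Classify lines until two successive blank lines end the session."""
    events = []
    previous_blank = False
    for line in lines:
        kind, payload = classify_input(line)
        if kind == "blank":
            if previous_blank:
                break  # second return in a row: terminate
            previous_blank = True
            continue
        previous_blank = False
        events.append((kind, payload))
    return events
```

<P>In this sketch a single empty line is skipped, and only a second empty line
directly after it stops the session, which mirrors the two successive returns described above.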
|
|
|
|
<A NAME="Modes of Operation">
|
|
<H4>Modes of Operation</H4></A>
|
|
|
|
<P>The mode of operation of WORDS can be specialized by setting some combination
|
|
of available parameters. Here are a couple of example situations.
|
|
|
|
<P>If you want only meanings to show up, set the # parameter
|
|
<BR>
|
|
DO_ONLY_MEANINGS => Yes
|
|
<BR>
|
|
<P>If you do not even want to see the dictionary form (principal parts) set
|
|
# parameter
|
|
<BR>
|
|
DO_DICTIONARY_FORM => No
|
|
<BR>
|
|
<P>If you want to accept only the dictionary entry (amo, but not amas), set
|
|
the ! parameter (this is the tricky one, requiring two parameters set)
|
|
<BR>
|
|
DO_ONLY_INITIAL_WORD => Yes
|
|
<BR>
|
|
<P>This will then require you to input one entry per line, which is not
|
|
unreasonable for a dictionary look-up process. Then you will be offered
|
|
another, otherwise unavailable, option
|
|
<BR>
|
|
FOR_WORD_LIST_CHECK => Yes
|
|
<BR>
|
|
<P>There are a large number of other options. The user is invited
|
|
to consider all the options if needing anything more than the basic parse.
|
|
|
|
<P>Of course, for both sets of parameters, you will want to go to the end
|
|
of the parameter setting menu and save this set so you can restart with
|
|
the same situation.
|
|
|
|
|
|
|
|
<A NAME="Command Line Operation">
|
|
<H4>Command Line Operation</H4></A>
|
|
|
|
|
|
The main mode of usage for WORDS is a simple call, followed by screen interaction.
|
|
<P>
|
|
But there are other, command line, options.
|
|
WORDS may be called with arguments on the same line, in a number of different modes.
|
|
The program will execute with these arguments as input.
|
|
Remember that the saved parameter settings (in WORD.MOD and WORD.MDV)
|
|
are controlling, even for command line input.
|
|
|
|
<P>
|
|
Single argument, either a simple Latin word or an input file.
|
|
|
|
<P>
|
|
WORDS amo
|
|
<BR>which will cause it to execute for that input and then terminate. This is
|
|
for a quick word.
|
|
|
|
<P>
|
|
WORDS infile
|
|
<BR>causes WORDS to execute with the contents of the infile.
|
|
The infile may be from any folder if the full path name is given.
|
|
|
|
<P>
|
|
With two arguments the options are: inputfile and outputfile,
|
|
two Latin words, or a language shift to English (Latin being the startup default)
|
|
and an English word (with no part of speech).
|
|
|
|
<P>
|
|
WORDS infile outfile
|
|
<BR>The program will read as input the INFILE and write
|
|
the output to the OUTFILE (as though it were WORD.OUT). It will then
|
|
await further input from the user. It terminates with a return. If the
|
|
parameters are not legal file names, the program will assume they are
|
|
Latin words to be processed as command line input.
|
|
|
|
<P>
|
|
WORDS amo amas
|
|
|
|
<P>
|
|
WORDS ^e love
|
|
<BR>switches to English input from the default Latin and searches for love.
|
|
|
|
<P>
|
|
With three arguments there could be three Latin words or a language shift
|
|
and an English word and part of speech.
|
|
|
|
<P>
|
|
WORDS amo amas amat
|
|
|
|
<P>
|
|
WORDS ^e love v
|
|
|
|
<P>
|
|
More than three arguments must all be Latin words.
|
|
|
|
<P>
|
|
WORDS amo amas amat amamus amatis amant
|
|
|
|
<P>
|
|
There cannot be more than one English word in the argument list,
|
|
since there can only be one English word per line for WORDS input.
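<P>The argument rules above can be restated as a small decision function.
This is a sketch in Python under stated assumptions, not the logic of WORDS itself:
the function name and the returned labels are hypothetical, an argument is treated
as a file only if it names an existing file, and the language shift is accepted in either case.

```python
import os


def interpret_arguments(args, is_file=os.path.isfile):
    """Map a WORDS-style argument list to a (mode, details) pair."""
    if not args:
        return ("interactive", {})
    if args[0].lower() == "^e":
        # Language shift: exactly one English word, optionally a part of speech.
        if len(args) == 2:
            return ("english", {"word": args[1], "part_of_speech": None})
        if len(args) == 3:
            return ("english", {"word": args[1], "part_of_speech": args[2]})
        raise ValueError("expected one English word and an optional part of speech")
    if len(args) == 1 and is_file(args[0]):
        return ("file", {"infile": args[0], "outfile": None})
    if len(args) == 2 and is_file(args[0]):
        return ("file", {"infile": args[0], "outfile": args[1]})
    # Anything else is taken to be a list of Latin words.
    return ("latin", {"words": list(args)})
```

<P>The <TT>is_file</TT> parameter exists only so the rule can be tried without
touching the disk, for example <TT>interpret_arguments(["amo", "amas"], is_file=lambda name: False)</TT>.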
|
|
|
|
|
|
<P>
|
|
An input file (either from interactive with @ or from command line)
can have changes of language, but the ^E or ^L must be on a separate line.
Note that this capability can create confusing situations.
An input file that starts off Latin then switches to English will be
correctly processed. But if it is followed by a similar input file, the
second file will start off English (from the setting in the earlier file) and fail
on the Latin input. Thus even submitting the same file twice in a run
will give different results. This problem can be alleviated by starting each
input file with an explicit language instruction, but this will not normally be
the situation.
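<P>The carry-over of the language setting can be made concrete with a short sketch.
This is an illustration in Python, not WORDS code; the function name is hypothetical,
and it only tags each line with the language in force instead of looking anything up.

```python
def tag_lines(lines, language="LATIN"):
    """Tag each input line with the language in force.

    Returns the tagged lines and the language left set at the end,
    which is what a following file would inherit.
    """
    tagged = []
    for line in lines:
        command = line.strip().upper()
        if command == "^E":
            language = "ENGLISH"
        elif command == "^L":
            language = "LATIN"
        elif line.strip():
            tagged.append((language, line.strip()))
    return tagged, language


sample = ["amo", "^E", "love"]
first, state = tag_lines(sample)           # Latin is the startup default
second, state = tag_lines(sample, state)   # the second pass inherits ENGLISH
print(first)
print(second)
```

<P>In the first pass amo is tagged LATIN; in the second pass the same line is tagged ENGLISH.
Putting ^L on the first line of the file, as in <TT>["^L"] + sample</TT>, makes both passes agree.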
|
|
|
|
|
|
|
|
|
|
<A NAME="Latin-to-English Examples">
|
|
<H4>Latin-to-English Examples</H4></A>
|
|
<P>
|
|
Following are annotated examples of output. Examination of these will
|
|
give a good idea of the system. The present version may not match these
|
|
examples exactly - things are changing - but the principle is there. A
|
|
recent modification is the output of dictionary forms or 'principal parts'
|
|
(shown below for some examples).
|
|
|
|
<PRE><TT>=>agricolarum
agricol.arum N 1 1 GEN P M
agricola, agricolae N M [XAXBO]
farmer, cultivator, gardener, agriculturist; plowman, countryman, peasant;
</TT></PRE>
|
|
|
|
<P>
|
|
This is a simple first declension noun, and a unique interpretation. The
|
|
'1 1' means it is first declension, with variant 1. This is an internal
|
|
coding of the program, and may not correspond exactly with the grammatical
|
|
numbering. The 'N' means it is a noun. It is the form for genitive
|
|
(GEN), plural ('P'). The stem is masculine (M).
|
|
The stem is given as 'agricol' and the ending is
|
|
'arum'. The stem is normal in this case, but is a product of the program,
|
|
and may not always correspond to conventional usage.
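<P>For anyone post-processing WORD.OUT, the fields just described can be pulled apart mechanically.
This is a minimal sketch in Python covering noun lines only; the names are hypothetical,
and other parts of speech carry different fields and would need their own handling.

```python
from typing import NamedTuple


class NounParse(NamedTuple):
    stem: str
    ending: str
    declension: int
    variant: int
    case: str
    number: str
    gender: str


def parse_noun_line(line):
    """Split a noun inflection line such as 'agricol.arum N 1 1 GEN P M'."""
    fields = line.split()
    if len(fields) != 7 or fields[1] != "N":
        raise ValueError("not a noun inflection line: " + repr(line))
    # The period separates the stem from the ending.
    stem, _dot, ending = fields[0].partition(".")
    return NounParse(stem, ending, int(fields[2]), int(fields[3]),
                     fields[4], fields[5], fields[6])


print(parse_noun_line("agricol.arum N 1 1 GEN P M"))
```

<P>Splitting on white space means the sketch does not depend on the column alignment of the output.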
|
|
|
|
<P>On the next line is given the expansion of the form that one might find
|
|
in a paper dictionary, the nominative and genitive (agricola, agricolae).
|
|
The [XAXBO] is an internal code of the program and is documented below as Dictionary Codes.
|
|
Several codes are associated with each dictionary entry (presently AGE, AREA, GEO, FREQ, SOURCE).
|
|
These provide some information to enhance the interpretation of the dictionary entry.
|
|
In this case, the interesting piece is the B, which signifies
|
|
that this word is found frequently in texts, in the top 10 percent.
|
|
The O says it has been verified in the Oxford Latin Dictionary.
|
|
The A says it is an agricultural word.
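<P>The bracketed code is positional, so it can be split into its named fields directly.
This is a sketch in Python with hypothetical names; it only labels the five letters
in the order given above and does not attempt to interpret their values.

```python
# Field order as listed above: AGE, AREA, GEO, FREQ, SOURCE.
CODE_FIELDS = ("AGE", "AREA", "GEO", "FREQ", "SOURCE")


def split_dictionary_code(code):
    """Map a bracketed code such as '[XAXBO]' to its five named fields."""
    letters = code.strip().strip("[]")
    if len(letters) != len(CODE_FIELDS):
        raise ValueError("expected five code letters, got " + repr(code))
    return dict(zip(CODE_FIELDS, letters))


print(split_dictionary_code("[XAXBO]"))
```

<P>For [XAXBO] this gives AREA A, FREQ B, and SOURCE O, matching the reading of agricola given above.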
|
|
|
|
<P>The declension/conjugation numbers for nouns and verbs are
|
|
essentially arbitrary (but will be familiar to Latin students).
|
|
The variants are complete inventions.
|
|
They have no real meaning, just codes for the program.
|
|
|
|
<P>(In the case of adjectives, they are even more arbitrary,
|
|
although a Latin student might see how I came by them.
|
|
Again they are only codes for the program.
|
|
The initial release of the program did not put these out,
|
|
but there is some interest on the part of students, so they are now included.
|
|
The user may ignore them altogether.
|
|
There is no relation between the declension/variant codes of a noun
|
|
and the accompanying adjective.
|
|
They only agree in case, number, and gender (NOM S N),
|
|
which are listed in the output.)
|
|
|
|
|
|
<PRE><TT>=>feminae
femin.ae N 1 1 GEN S F
femin.ae N 1 1 DAT S F
femin.ae N 1 1 NOM P F
femin.ae N 1 1 VOC P F
femina, feminae N F [XXXAX]
woman; female;
</TT></PRE>
|
|
|
|
<P>
|
|
This word has several possible interpretations in case and number
|
|
(Singular and Plural). The gender is Feminine. Presumably, the user can
|
|
examine the adjoining words and reduce the set of possibilities.
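<P>One way to reduce the possibilities is to keep only the case, number, and gender
combinations that a noun and its adjective have in common.
The following is a sketch in Python, not a feature of WORDS; the function name is hypothetical,
and the data are the neuter parses of cornu and the adjective parses of bonum
as they appear in the examples in this section.

```python
def agreeing_parses(noun_parses, adjective_parses):
    """Return the (case, number, gender) triples common to both words."""
    shared = set(noun_parses) & set(adjective_parses)
    # Keep the order in which the noun parses were listed.
    return [parse for parse in noun_parses if parse in shared]


# (case, number, gender) triples copied from the example output in this section.
CORNU = [("NOM", "S", "N"), ("DAT", "S", "N"), ("ABL", "S", "N"), ("ACC", "S", "N")]
BONUM = [("NOM", "S", "N"), ("ACC", "S", "M"), ("ACC", "S", "N")]

print(agreeing_parses(CORNU, BONUM))
```

<P>For bonum cornu only the nominative and accusative singular neuter survive.
Exact matching is a simplification: a gender reported as common (C) would have to
be allowed to match both masculine and feminine.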
|
|
|
|
<PRE><TT>=>cornu
corn.u N 4 1 ABL S F
cornus, cornus N F [XXXCO]
cornel-cherry-tree (Cornus mas); cornel wood; javelin (of cornel wood);
corn.u N 4 2 NOM S N
corn.u N 4 2 DAT S N
corn.u N 4 2 ABL S N
corn.u N 4 2 ACC S N
cornu, cornus N N [XXXAO]
horn; hoof; beak/tusk/claw; bow; horn/trumpet; end, wing of army; mountain top;
*
</TT></PRE>
|
|
|
|
<P>
|
|
Here is an example of another declension and two variants. The
|
|
Masculine (and a few Feminine) (-us) nouns of the declension are '4 1' and the Neuter
|
|
(-u) nouns are coded as '4 2'. This word has both.
|
|
The horn parse is very frequent (A), while the cornel option (C) is
|
|
less so but still common.
|
|
|
|
|
|
<PRE><TT>=>ego
ego PRON 5 1 NOM S C
[XXXAX]
I, me; myself;</TT></PRE>
|
|
|
|
<P>
|
|
A pronoun is much like a noun. The gender is common (C), that is, it may
|
|
be masculine or feminine. For some odd words, especially including pronouns,
|
|
there is no dictionary form given.
|
|
|
|
<PRE><TT>=>illud
ill.ud PRON 6 1 NOM S N
ill.ud PRON 6 1 ACC S N
ille, illa, illud PRON [XXXAX]
that; those (pl.); also DEMONST; that person/thing; the well known; the former;
*
</TT></PRE>
|
|
<P>The asterisk means that there are other, less probable forms which have been
|
|
trimmed, but which may be recovered by running with the TRIM parameter reset.
|
|
|
|
<PRE><TT>=>hic
h.ic PRON 3 1 NOM S M
hic, haec, hoc PRON [XXXAX]
this; these (pl.); also DEMONST;
hic ADV POS
hic ADV [XXXCX]
here, in this place; in the present circumstances;</TT></PRE>
|
|
|
|
<P>
|
|
In this case there is an adjectival/demonstrative pronoun, or it may be an
|
|
adverb. The POS means that the comparison of the adverb is positive.
|
|
|
|
<PRE><TT>=>bonum
bon.um N 2 1 ACC S M
bonus, boni N M [XXXCO]
good/moral/honest/brave man; man of honor, gentleman; better/rich people (pl.);
bon.um N 2 2 NOM S N
bon.um N 2 2 ACC S N
bonum, boni N N [XXXAO]
good, good thing, profit, advantage; goods (pl.), possessions, wealth, estate;
bon.um ADJ 1 1 NOM S N POS
bon.um ADJ 1 1 ACC S M POS
bon.um ADJ 1 1 ACC S N POS
bonus, bona -um, melior -or -us, optimus -a -um ADJ [XXXAO]
good, honest, brave, noble, kind, pleasant, right, useful; valid; healthy;
*
</TT></PRE>
|
|
|
|
<P>
|
|
Here we have an adjective, but it might also be a noun. The
|
|
interpretation of the adjective says that it is POSitive, and that
|
|
is the meaning listed, as is the convention for all dictionaries.
|
|
The user must generate from this the meanings for other comparisons.
|
|
Check the comparison value before deciding on the real meaning.
|
|
Again, there is an asterisk, indicating further inflected forms were trimmed out.
|
|
|
|
<PRE><TT>=>facile
|
|
facil.e ADJ 3 2 NOM S N POS
|
|
facil.e ADJ 3 2 ABL S X POS
|
|
facil.e ADJ 3 2 ACC S N POS
|
|
facilis, facile, facilior -or -us, facillimus -a -um ADJ [XXXAX]
|
|
easy, easy to do, without difficulty, ready, quick, good natured, courteous;
|
|
facile ADV POS
|
|
facile, facilius, facillime ADV [XXXBO]
|
|
easily, readily, without difficulty; generally, often; willingly; heedlessly;
|
|
*
|
|
</TT></PRE>
|
|
|
|
<P>
|
|
Here is an adjective or an adverb. Although they are related in meaning,
|
|
they are different words.
|
|
|
|
<PRE><TT>=>acerrimus
|
|
acerri.mus ADJ 3 3 NOM S M SUPER
|
|
acer, acris -e, acrior -or -us, acerrimus -a -um ADJ [XXXAO]
|
|
sharp, bitter, pointed, piercing, shrill; sagacious, keen; severe, vigorous;
|
|
</TT></PRE>
|
|
|
|
<P>
|
|
Here we have an adjective in the SUPERlative. The meanings are all
|
|
POSitive and the user must add the -est by himself.
|
|
|
|
<PRE><TT>=>optime
|
|
optime ADV SUPER
|
|
bene, melius, optime ADV [XXXAO]
|
|
well, very, quite, rightly, agreeably, cheaply, in good style; better; best;
|
|
opti.me ADJ 1 1 VOC S M SUPER
|
|
bonus, bona -um, melior -or -us, optimus -a -um ADJ [XXXAO]
|
|
good, honest, brave, noble, kind, pleasant, right, useful; valid; healthy;
|
|
</TT></PRE>
|
|
|
|
<P>Here is an adjective or an adverb; both are SUPERlative.
|
|
|
|
<PRE><TT>=>monuissemus
|
|
monu.issemus V 2 1 PLUP ACTIVE SUB 1 P
|
|
moneo, monere, monui, monitus V [XXXAX]
|
|
remind, advise, warn; teach; admonish; foretell, presage;
|
|
</TT></PRE>
|
|
|
|
<P>
|
|
Here is a verb for which the form is PLUPerfect, ACTIVE, SUBjunctive, 1st
|
|
person, Plural. It is 2nd conjugation, variant 1.
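<P>A minimal sketch of how an inflection line like the one above might be split into named fields. The `VerbParse` class and the `parse_verb_line` helper are hypothetical names introduced for illustration; they are not part of WORDS. The sketch assumes the nine-field layout of the example and would reject a deponent line, which omits the voice.

```python
from dataclasses import dataclass


@dataclass
class VerbParse:
    stem: str
    ending: str
    conjugation: int
    variant: int
    tense: str
    voice: str
    mood: str
    person: int
    number: str


def parse_verb_line(line):
    """Split a line like 'monu.issemus V 2 1 PLUP ACTIVE SUB 1 P' into fields."""
    fields = line.split()
    if len(fields) != 9 or fields[1] != "V":
        raise ValueError("not a nine-field verb inflection line: " + repr(line))
    form, _part, which, var, tense, voice, mood, person, number = fields
    # The dot in the output separates the stem from the ending.
    stem, _dot, ending = form.partition(".")
    return VerbParse(stem, ending, int(which), int(var),
                     tense, voice, mood, int(person), number)


parsed = parse_verb_line("monu.issemus         V      2 1 PLUP ACTIVE  SUB 1 P")
print(parsed)
```

<P>Splitting on whitespace works here because every field in the verb line is a single token; other parts of speech have different field counts and would need their own layout.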
|
|
|
|
<PRE><TT>=>amat
|
|
am.at V 1 1 PRES ACTIVE IND 3 S
|
|
amo, amare, amavi, amatus V [XXXAO]
|
|
love, like; fall in love with; be fond of; have a tendency to;
|
|
</TT></PRE>
|
|
|
|
<P>
|
|
Another regular verb, PRESent, ACTIVE, INDicative.
|
|
|
|
<PRE><TT>=>amatus
|
|
amat.us VPAR 1 1 NOM S M PERF PASSIVE PPL
|
|
amo, amare, amavi, amatus V [XXXAO]
|
|
love, like; fall in love with; be fond of; have a tendency to;
|
|
amat.us ADJ 1 1 NOM S M POS
|
|
amatus, amata, amatum ADJ [XXXEO] uncommon
|
|
loved, beloved;
|
|
</TT></PRE>
|
|
|
|
<P>
|
|
Here we have the PERFect, PASSIVE ParticiPLe, in the NOMinative, Singular,
|
|
Masculine. In addition, there is the ADJective that is formed from
|
|
this participle. If the ADJective is common, it will likely have its own
|
|
dictionary entry. Sometimes there may be a special or idiomatic meaning
|
|
not obvious from the verb, or the meaning may stray from the original.
|
|
In this case, the verb is very frequent, but the use as an adjective is uncommon.
|
|
|
|
<PRE><TT>=>amatu
|
|
amat.u SUPINE 1 1 ABL S N
|
|
amo, amare, amavi, amatus V [XXXAO]
|
|
love, like; fall in love with; be fond of; have a tendency to;
|
|
</TT></PRE>
|
|
|
|
<P>
|
|
Here is the SUPINE of the verb in the ABLative Singular.
|
|
|
|
<PRE><TT>=>orietur
|
|
ori.etur V 4 1 FUT IND 3 S
|
|
orior, oriri, oritus sum V DEP [XXXAO]
|
|
rise (sun/river); arise/emerge, crop up; get up (wake); begin; originate from;
|
|
be born/created; be born of, decend/spring from; proceed/be derived (from);
|
|
ori.etur V 3 1 FUT IND 3 S
|
|
orior, ori, ortus sum V DEP [XXXBO]
|
|
rise (sun/river); arise/emerge, crop up; get up (wake); begin; originate from;
|
|
be born/created; be born of, decend/spring from; proceed/be derived (from);
|
|
</TT></PRE>
|
|
|
|
<P>
|
|
For DEPonent verbs the passive form is to be translated as if it were
|
|
active voice, so there is no VOICE given in the output.
|
|
|
|
<PRE><TT>=>ab
|
|
ab PREP ABL
|
|
ab PREP ABL [XXXAO]
|
|
by (agent), from (departure, cause, remote origin/time); after (reference);
|
|
</TT></PRE>
|
|
|
|
<P>
|
|
Here is a PREPosition that takes an ABLative for an object.
|
|
|
|
<PRE><TT>=>sine
|
|
sin.e N 2 2 NOM P N
|
|
sin.e N 2 2 ACC P N
|
|
sinum, sini N N [XXXCX]
|
|
bowl for serving wine, etc;
|
|
sin.e V 3 1 PRES ACTIVE IMP 2 S
|
|
sino, sinere, sivi, situs V [XXXAX]
|
|
allow, permit;
|
|
sine PREP ABL
|
|
sine PREP ABL [XXXAX]
|
|
without;
|
|
*
|
|
</TT></PRE>
|
|
|
|
<P>
|
|
Here is a PREPosition that might also be a Verb or a Noun.
|
|
While as a preposition it is so common that it is unlikely
|
|
that any other use would occur, there is no way to indicate that.
|
|
Just be reminded that the frequency given for a verb is for the
|
|
sum of all the couple of hundred forms of the verb, not just
|
|
the one form that is parsed.
|
|
|
|
<PRE><TT>=>contra
|
|
contra ADV POS
|
|
contra ADV [XXXAO]
|
|
facing, face-to-face, in the eyes; towards/up to; across; in opposite direction;
|
|
against, opposite, opposed/hostile/contrary/in reply to; directly over/level;
|
|
otherwise, differently; conversely; on the contrary; vice versa;
|
|
contra PREP ACC
|
|
contra PREP ACC [XXXAO]
|
|
against, facing, opposite; weighed against; as against; in resistance/reply to;
|
|
contrary to, not in conformance with; the reverse of; otherwise than;
|
|
towards/up to, in direction of; directly over/level with; to detriment of;
|
|
</TT></PRE>
|
|
|
|
<P>
|
|
Here is a PREPosition that might also be an ADVerb. This is a very common
|
|
situation, with the meanings being much the same.
|
|
|
|
<PRE><TT>=>et
|
|
et CONJ
|
|
et CONJ [XXXAX]
|
|
and, and even; also, even; (et ... et = both ... and);
|
|
</TT></PRE>
|
|
|
|
<P>
|
|
Here is a straight CONJunction.
|
|
|
|
<PRE><TT>=>vae
|
|
vae INTERJ
|
|
vae INTERJ [XXXBX]
|
|
alas, woe, ah; oh dear; (Vae, puto deus fio - Vespasian); Bah!, Curses!;
|
|
</TT></PRE>
|
|
|
|
<P>
|
|
Here is a straight INTERJection.
|
|
|
|
<PRE><TT>=>septem
|
|
septem NUM 2 0 X X X CARD
|
|
septem, septimus -a -um, septeni -ae -a, septie(n)s NUM [XXXAX]
|
|
7 - (CARD answers 'how many');</TT></PRE>
|
|
|
|
<P>
|
|
Numbers are recognized as such and given a value.
|
|
An additional provision is the attempt to recognize and display the value
|
|
of Roman numerals, even combinations of appropriate letters that do not
|
|
parse conventionally to a value but may be ill-formed Roman numerals.
|
|
|
|
<PRE><TT>=>VII
|
|
VII NUM 2 0 X X X CARD
|
|
7 as a ROMAN NUMERAL;
|
|
</TT></PRE>
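<P>One way the lenient Roman numeral reading described above could work is sketched here. This is an assumed approach, not the routine WORDS actually uses, and `roman_value` is a hypothetical name. A smaller letter standing before a larger one is subtracted, everything else is added, and no well-formedness check is made, so an ill-formed string still receives a value.

```python
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_value(text):
    """Return a value for a string of Roman numeral letters, or None otherwise."""
    letters = text.upper()
    if not letters or any(ch not in ROMAN_VALUES for ch in letters):
        return None
    total = 0
    for i, ch in enumerate(letters):
        value = ROMAN_VALUES[ch]
        # Look one letter ahead; treat the end of the string as value 0.
        following = ROMAN_VALUES[letters[i + 1]] if i + 1 != len(letters) else 0
        total += -value if following > value else value
    return total


for sample in ("VII", "MCMXCIV", "IIII", "amo"):
    print(sample, roman_value(sample))
```

<P>Because only the letters are checked, a real word made up entirely of numeral letters would also get a value, which matches the warning that a number may be reported alongside a word parse.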
|
|
|
|
|
|
<P>Beyond simple dictionary entry words, the program
|
|
can construct additional words with prefixes, suffixes and other ADDONS.
|
|
|
|
<PRE><TT>=>populusque
|
|
que TACKON
|
|
-que = and (enclitic, translated before attached word); completes plerus/uter;
|
|
popul.us N 2 1 NOM S M
|
|
populus, populi N M [XXXAO]
|
|
people, nation, State; public/populace/multitude/crowd; a following;
|
|
members of a society/sex; region/district (L+S); army (Bee);
|
|
</TT></PRE>
|
|
|
|
<P>Here the input word is recognized as a combination of a base word
|
|
and an enclitic (-que) tacked on. This particular enclitic is
|
|
extremely common and its omission, or the omission of the process
|
|
that handles it, would result in a very large number of UNKNOWNs
|
|
in the output.
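<P>The tackon step can be sketched as follows. This is an illustration only: the `TACKONS` list is an assumed sample (the text above names only -que), and `is_known` is a hypothetical stand-in for the full parse of the base word.

```python
TACKONS = ("que", "ne", "ve")  # assumed sample list; only -que is named in the text


def tackon_candidates(word, is_known):
    """Return (base, tackon) splits whose base is accepted by is_known."""
    candidates = []
    for tackon in TACKONS:
        base = word[: len(word) - len(tackon)]
        # Require a non-empty base so that a bare 'que' is not split.
        if word.endswith(tackon) and base and is_known(base):
            candidates.append((base, tackon))
    return candidates


known = {"populus"}
print(tackon_candidates("populusque", known.__contains__))
```

<P>Checking the base against the dictionary before accepting the split keeps ordinary words that merely end in the same letters from being reported as a base plus enclitic.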
|
|
|
|
|
|
|
|
<PRE><TT>=>pseudochristus
|
|
pseudo PREFIX
|
|
false, fallacious, deceitful; sperious; imitation of;
|
|
christ.us N 2 1 NOM S M
|
|
Christus, Christi N M [XEXAO]
|
|
Christ;</TT></PRE>
|
|
|
|
<P>Here there is a prefix and a base. The user must make the combination
|
|
into a word or phrase.
|
|
|
|
<P>
|
|
Generally, the meaning is given for the base word, as is usual for
|
|
dictionaries. For the verb, it will be a present meaning, even when the
|
|
tense given is perfect. For a noun, it will be the singular, and the user
|
|
must interpret when the form is plural.
|
|
|
|
<P>For an adjective, the positive meaning is given,
|
|
even if a comparative or superlative form is output.
|
|
The user is invited to expand to
|
|
comparative (-er) and superlative (-est).
|
|
For a few adjectives, the only stem in the dictionary is COMP or SUPER.
|
|
When there is just one comparison,
|
|
the WORDS dictionary gives that expanded meaning.
|
|
This might be considered inconsistent,
|
|
in that it expects the user to observe the FORM to interpret the meaning,
|
|
but it is consistent with ordinary dictionary practice.
|
|
|
|
|
|
<P>Initially there were more defective adjective entries.
|
|
I had accepted assertions in OLD or L+S and others like
|
|
'comparative does not exist'.
|
|
Later on I went over to the position that
|
|
even if Cicero did not use it, someone might.
|
|
I started generating COMP and SUPER where it seemed reasonable.
|
|
One can also count on a suffix to correct most omissions, and it will.
|
|
|
|
<P>Sometimes a word is constructed from a suffix and a stem of a different
|
|
part of speech.
|
|
Thus an adverb may be constructed from its adjective.
|
|
It will show the base adjective meaning and an indication of how to
|
|
make the adverb in English. The user must make the proper interpretation.
|
|
|
|
<P>
|
|
In some cases an adjective will be found that is a participle of a verb
|
|
that is also found. The participle meaning, as inferred by the user from
|
|
the verb meaning, is not superseded by the explicit adjective entry, but
|
|
supplemented by it with possible specialized meanings. <BR>
|
|
<BR>
|
|
|
|
|
|
|
|
<A NAME="English-to-Latin Examples">
|
|
<H4>English-to-Latin Examples</H4></A>
|
|
|
|
<P>~E (tilde E/e plus Enter/CR)
|
|
changes mode from Latin-to-English to English-to-Latin. ~L changes back.
|
|
|
|
<P>A single input English word is followed by the desired part of speech.
|
|
Omitting the part of speech defaults to all, which is not recommended
|
|
for any word which can be ambiguous. Since the program is looking for a
|
|
part of speech, it would be inconvenient to support the input of several
|
|
English words on a line. While a (@) file of words can be processed in the
|
|
English mode, it must be one word per line.
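<P>A sketch of how an input line might be read in English mode, under stated assumptions. The function names and the mode labels are hypothetical. Only the part codes v, n and prep appear in the examples of this section; the other codes in `PARTS_OF_SPEECH` are guesses.

```python
# Only "v", "n" and "prep" are attested in the examples; the rest are assumed.
PARTS_OF_SPEECH = {"n", "v", "adj", "adv", "prep", "conj", "interj", "pron", "num"}


def next_mode(line, mode):
    """Return the mode after reading line; ~E and ~L switch the language base."""
    command = line.strip().upper()
    if command == "~E":
        return "english-to-latin"
    if command == "~L":
        return "latin-to-english"
    return mode


def read_english_query(line):
    """Return (word, part); part is None when omitted, meaning all parts of speech."""
    fields = line.split()
    if not fields or len(fields) > 2:
        raise ValueError("expected one word and an optional part of speech")
    word = fields[0].lower()
    part = fields[1].lower() if len(fields) == 2 else None
    if part is not None and part not in PARTS_OF_SPEECH:
        raise ValueError("unknown part of speech: " + part)
    return word, part


print(next_mode("~e", "latin-to-english"))
print(read_english_query("love v"))
print(read_english_query("in"))
```

<P>Rejecting a third field reflects the one-word-per-line rule: a second token can only be a part of speech, so a line of several English words cannot be told apart from a word plus its part.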
|
|
|
|
<P>Output looks much like a paper dictionary entry, with form, part of speech,
|
|
gender, etc. Also included are the WORDS coded declension/conjugation and the
|
|
TRANS flags, which give age, frequency and source, information for the user
|
|
in selecting the best translation. The output may also contain a vertical bar
|
|
leading the meaning. This is a continuation symbol which states that there
|
|
are other meanings for the Latin word. The user might want to run the Latin
|
|
phase of WORDS to get the full set of meanings so that no unintended conflicts
|
|
appear.
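<P>The bracketed flags and the continuation bar can be picked out of the output with a few lines of code. In this sketch the helper names are hypothetical, and the flag order (AGE, AREA, GEO, FREQ, SOURCE) is assumed from the order in which the dictionary record description lists the flags.

```python
import re

FLAG_NAMES = ("age", "area", "geo", "freq", "source")  # assumed order


def parse_flags(entry_line):
    """Return the five bracketed flags of an entry line as a dict, or None."""
    match = re.search(r"\[([A-Z]{5})\]", entry_line)
    if match is None:
        return None
    return dict(zip(FLAG_NAMES, match.group(1)))


def has_continuation(meaning_line):
    """True when the meaning starts with '|', i.e. the Latin word has other meanings."""
    return meaning_line.lstrip().startswith("|")


print(parse_flags("amo, amare, amavi, amatus  V   1 1 [XXXAO]"))
print(has_continuation("|overthrow, bring down, depose; kill, destroy;"))
```

<P>A user comparing candidate translations could sort on the frequency flag, then run the Latin phase for any entry where `has_continuation` is true.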
|
|
|
|
|
|
<PRE><TT>
|
|
love v
|
|
|
|
amo, amare, amavi, amatus V 1 1 [XXXAO]
|
|
love, like; fall in love with; be fond of; have a tendency to;
|
|
|
|
diligo, diligere, dilexi, dilectus V 3 1 [XXXAX]
|
|
select, pick, single out; love, value, esteem; approve, aspire to, appreciate;
|
|
|
|
amo, amare, additional, forms V 9 1 [BXXEO]
|
|
love, like; fall in love with; be fond of; have a tendency to;
|
|
|
|
ardeo, ardere, arsi, arsus V 2 1 [XXXAO]
|
|
be on fire; burn, blaze; flash; glow, sparkle; rage; be in a turmoil/love;
|
|
|
|
adamo, adamare, adamavi, adamatus V 1 1 TRANS [XXXBO]
|
|
fall in love/lust with; love passionately/adulterously; admire greatly; covet;
|
|
|
|
deamo, deamare, deamavi, deamatus V 1 1 TRANS [XXXCO]
|
|
love dearly; be passionately/desperately in love with; be delighted with/obliged
|
|
*
|
|
|
|
in prep
|
|
|
|
in PREP ABL [XXXAX]
|
|
in, on, at (space); in accordance with/regard to/the case of; within (time);
|
|
|
|
ante PREP ACC [XXXAO]
|
|
in front/presence of, in view; before (space/time/degree); over against, facing;
|
|
|
|
super PREP ABL [XXXAX]
|
|
over (space), above, upon, in addition to; during (time); concerning; beyond;
|
|
|
|
in PREP ACC [XXXAX]
|
|
into; about, in the mist of; according to, after (manner); for; to, among;
|
|
|
|
prae PREP ABL [XXXAX]
|
|
before, in front; in view of, because of;
|
|
|
|
praeter PREP ACC [XXXAX]
|
|
besides, except, contrary to; beyond (rank), in front of, before; more than;
|
|
*
|
|
|
|
in
|
|
|
|
intro ADV [XXXAX]
|
|
within, in; to the inside, indoors;
|
|
|
|
in PREP ABL [XXXAX]
|
|
in, on, at (space); in accordance with/regard to/the case of; within (time);
|
|
|
|
gener, generi N 2 3 M [XXXBX]
|
|
son-in-law;
|
|
|
|
baro, baronis N 3 1 M [XXXBL]
|
|
baron; magnate; tenant-in-chief (of crown/earl); burgess; official; husband;
|
|
|
|
sororius, sorori(i) N 2 4 M [XXXCX]
|
|
sister's husband, brother-in-law;
|
|
|
|
socrus, socrus N 4 1 M [XXXCX]
|
|
father-in-law; spouse's grandfather/great grandfather;
|
|
*
|
|
|
|
kill v
|
|
|
|
occido, occidere, occidi, occisus V 3 1 [XXXAX]
|
|
kill, murder, slaughter, slay; cut/knock down; weary, be the death/ruin of;
|
|
|
|
interficio, interficere, interfeci, interfectus V 3 1 [XWXAX]
|
|
kill; destroy;
|
|
|
|
consumo, consumere, consumpsi, consumptus V 3 1 TRANS [XXXAO]
|
|
burn up, destroy/kill; put end to; reduce/wear away; annul; extinguish (right);
|
|
|
|
perago, peragere, peregi, peractus V 3 1 [XXXAX]
|
|
disturb; finish; kill; carry through to the end, complete;
|
|
|
|
dejicio, dejicere, dejeci, dejectus V 3 1 TRANS [XXXAS]
|
|
|overthrow, bring down, depose; kill, destroy; shoot/strike down; fell (victim);
|
|
|
|
deicio, deicere, dejeci, dejectus V 3 1 TRANS [XXXAO]
|
|
|overthrow, bring down, depose; kill, destroy; shoot/strike down; fell (victim);
|
|
*
|
|
|
|
death n
|
|
|
|
mors, mortis N 3 3 F [XXXAX]
|
|
death; corpse; annihilation;
|
|
|
|
fatum, fati N 2 2 N [XPXAX]
|
|
utterance, oracle; fate, destiny; natural term of life; doom, death, calamity;
|
|
|
|
funus, funeris N 3 2 N [XXXAX]
|
|
burial, funeral; funeral rites; ruin; corpse; death;
|
|
|
|
nex, necis N 3 1 F [XXXBX]
|
|
death; murder;
|
|
|
|
letum, leti N 2 2 N [XXXBX]
|
|
death, ruin, annihilation; death and destruction;
|
|
|
|
Orcus, Orci N 2 1 M [XXXBX]
|
|
god of the underworld, Dis; death; the underworld;
|
|
*
|
|
|
|
destruction n
|
|
|
|
cinis, cineris N 3 1 C [XXXAO]
|
|
ashes; embers, spent love/hate; ruin, destruction; the grave/dead, cremation;
|
|
|
|
pestis, pestis N 3 3 F [XXXBX]
|
|
plague, pestilence, curse, destruction;
|
|
|
|
exitium, exiti(i) N 2 4 N [XXXBX]
|
|
destruction, ruin; death; mischief;
|
|
|
|
ruina, ruinae N 1 1 F [XXXBX]
|
|
fall; catastrophe; collapse, destruction;
|
|
|
|
interitus, interitus N 4 1 M [XXXBX]
|
|
ruin; violent/untimely death, extinction; destruction, dissolution;
|
|
|
|
excidium, excidi(i) N 2 4 N [XXXCX]
|
|
ruin, destruction, military destruction; overthrow;
|
|
*
|
|
</TT></PRE>
|
|
|
|
|
|
<P>While six prioritized translations may seem like enough,
|
|
and they will likely cover the needs of a student, the full set
|
|
(setting # parameter to not TRIM) contains much valuable information
|
|
for the advanced translator. For instance, for the verb 'live', vivo
|
|
usually works, but there are other options associated with specific
|
|
situations: cohabito means live together, ruror means live in the country,
|
|
adjaceo means live near, judaizo means live in the Jewish manner keeping the law.
|
|
These sorts of meanings are often conveyed in Latin by a single word,
|
|
while in English one might just use live and a modifying word or phrase.
|
|
|
|
<A NAME="Design of the Meaning Line">
|
|
<H4>Design of the Meaning Line</H4></A>
|
|
|
|
<P>The role and complexity of the WORDS meaning line has evolved over time.
|
|
Initially it reflected an elementary, back-of-the-book, textbook dictionary
|
|
with a single word or two for each entry.
|
|
Nevertheless, the size of the MEAN element was set at 80 characters
|
|
(as God, Hollerith and IBM decreed),
|
|
as appropriate for a standard computer screen in text mode.
|
|
(Depending on the system and mode of display, the output
|
|
may be limited to 78 or 79 characters, but the traditional 80
|
|
characters of the century-old IBM card was chosen.
|
|
They will likely appear on printed output.)
|
|
|
|
|
|
<P>With expansion of the dictionary beyond a few thousand elementary entries
|
|
and the extensive inclusion of the Oxford dictionaries,
|
|
a much larger set of possible interpretations surfaced for many words,
|
|
filling and exceeding the 80 character limit.
|
|
A certain discipline was introduced to structure the line.
|
|
|
|
|
|
<P>Through the many phases of development of the
|
|
dictionary, standards were developed and modified and
|
|
rigor was not always maintained; therefore the rules
|
|
below are generally, but not universally, observed.
|
|
Evolution of the dictionary is bringing it more closely
|
|
in line with these rules.
|
|
<BR><BR>
|
|
|
|
|
|
<P>A decision was made to include as many meanings and synonyms as
|
|
convenient. The OLD will sometimes list a dozen or more meaning
|
|
groups with notably different senses, each with several similar meanings.
|
|
Presumably these different meanings were the product of different
|
|
translations of the Latin word, different translators,
|
|
different context, and different eras.
|
|
The WORDS dictionary includes many of these synonyms, and specifically adds
|
|
some more modern ones, in order to give the user inspiration
|
|
for his translation.
|
|
Further, it is important to give the user the full flavor of the word
|
|
that various translations employ. A word with a nominal meaning of
|
|
respect may be found to also mean fear (which may be the basis of all
|
|
respect for the Romans), and that will certainly color the interpretation
|
|
of a passage.
|
|
Going the other way, one might not want to
|
|
apply it to a description of Mother Teresa.
|
|
Also one should be warned if an otherwise simple word also is used as a rude
|
|
reference to female anatomy.
|
|
|
|
<P>There are a couple of other factors that may influence the user in determining
|
|
the appropriate meaning from the list. Some words have different meanings depending
|
|
on the age. If one is reading a text written recently in modern Latin, one must
|
|
consider hints about the meaning. While the classical meaning, the WORDS default,
|
|
may be appropriate, if there is a line with a late AGE code or an indication
|
|
of a modern dictionary source (e.g., Cal), the user should take this into consideration.
|
|
<BR><BR>
|
|
|
|
<A NAME="Signs and Abbreviations in Meaning">
|
|
<H5>Signs and Abbreviations in Meaning</H5></A>
|
|
|
|
, [comma] is used to separate meanings that are similar. The philosophy
|
|
has been to list a number of synonyms just to key the reader in making his
|
|
translation.<BR>
|
|
<BR>
|
|
; [semicolon] is used to separate sets of meanings that differ in intent.
|
|
This is just a general tendency and is not always rigorously enforced. <BR>
|
|
<BR>
|
|
: [colon] is used with an AREA code to specify a single special meaning
|
|
appropriate for that AREA in a series of general meanings. For example,
|
|
L: has the same impact as (legal) before or after a definition in meaning.
|
|
This supplements the use of the AREA code in the set of flags, which
|
|
implies that all or most of the meanings are associated with that area.<BR>
|
|
<BR>
|
|
/ [solidus] means 'or' or gives an alternative word. It sometimes
|
|
replaces the comma and is often used to compress the meaning into a short
|
|
line. <BR>
|
|
<BR>
|
|
(...) [parentheses] set off an optional word or modifier, e.g., '(nearly)
|
|
white' means 'white' or 'nearly white', (matter in) dispute means either
|
|
the matter in dispute or the dispute itself. They are also used to set
|
|
off an explanation, further information about the word or meaning, or an
|
|
example of a translation or a word combination. <BR>
|
|
<BR>
|
|
? [question mark] in a meaning implies a doubt about the interpretation,
|
|
or even about the existence of the word at all. For the purposes of this
|
|
program, it does not matter much. If the dubious word does not exist, no
|
|
one will ask for it. If it appears in his text, the reader is warned that
|
|
the interpretation may be questionable to some degree, but is what is
|
|
available. May indicate somewhat more doubt than (perh.). <BR>
|
|
<BR>
|
|
~ [tilde] stands for the stem or word in question. Usually it does not
|
|
have an ending affixed, as is the convention in other dictionaries, but
|
|
represents the word with whatever ending is proper. It is just a space
|
|
saving shorthand or abbreviation. <BR>
|
|
<BR>
|
|
(~ [tilde] also is the flag for changing the language base. ~E (plus Enter/CR)
|
|
changes from Latin-to-English to English-to-Latin. ~L changes back.)<BR>
|
|
<BR>
|
|
=> in meaning this indicates a translation example. <BR>
|
|
<BR>
|
|
abb. abbreviation. <BR>
|
|
<BR>
|
|
(Dif) - [Diferrari] is used to indicate an additional meaning taken from A
|
|
Latin-English Dictionary of St. Thomas Aquinas by Roy J. Diferrari. This
|
|
is singled out because of the importance of Aquinas. The reference is to
|
|
be applied from the last semicolon before the mark. It is likely that the
|
|
meaning diverges from the base by being medieval and ecclesiastical, but
|
|
not so overwhelming as to deserve a separate entry. <BR>
|
|
<BR>
|
|
(Douay) is used to designate those words for which the meaning has been
|
|
derived or modified by examination of the Douay translation of the Latin
|
|
Vulgate Bible of St Jerome. <BR>
|
|
<BR>
|
|
(eccl.) ecclesiastical - designating a special church meaning in a list of
|
|
conventional meanings, an additional meaning not sufficient to justify a
|
|
separate entry with an ecclesiastical code. <BR>
|
|
<BR>
|
|
esp. [especially] - indicates a significant association, but is only
|
|
advisory. <BR>
|
|
<BR>
|
|
(King James) or (KJames) is used to designate those words for which the
|
|
meaning has been derived or modified by examination of the King James
|
|
Bible in connection with the Latin Vulgate Bible of St Jerome. <BR>
|
|
<BR>
|
|
(KLUDGE) This indicates that the particular form is distorted in order to
|
|
make it come out correctly. This usually takes the form of a special
|
|
conjugational form applied to a few words, not applicable to other words
|
|
of the same conjugation or declension. The user can expect the form and
|
|
meaning to be correct, but the numerical coding will be odd. <BR>
|
|
<BR>
|
|
(L+S) [Lewis and Short] is used to indicate that the meaning starting from
|
|
the previous semicolon is information from Lewis and Short 'A Latin
|
|
Dictionary' that differs from, or significantly expands on, the meaning in
|
|
the 'Oxford Latin Dictionary' (OLD) which is the baseline for this
|
|
program. This is not to imply that the meaning listed is otherwise taken
|
|
directly from the OLD, just that it is not inconsistent with OLD, but the
|
|
L+S information is either inconsistent (likely OLD knows better) or Lewis and
|
|
Short has included meanings appropriate for late Latin writers beyond the
|
|
scope of OLD. The program is just warning the reader that there may be
|
|
some difference. There are cases in which this indication occurs in
|
|
entries that have Lewis and Short as the source. In those cases, the
|
|
basic word is in OLD but the entry is a variant form or spelling not cited
|
|
there. There are cases where OLD and L+S give somewhat different
|
|
spellings and meanings for the 'same' word (same in the sense that both
|
|
dictionaries point to the same citation). In these cases a combination of
|
|
meanings are given for both entries with the (L+S) code distinction and
|
|
the entries of different spelling or declension have the SOURCE coded. <BR>
|
|
<BR>
|
|
NT [New Testament] is a reference in the Bible.
|
|
<BR>
|
|
(OLD) [Oxford Latin Dictionary] is used to indicate an additional meaning
|
|
taken from the Oxford Latin Dictionary in an entry that is otherwise
|
|
attributed. While it is usually true that if a classical word has other
|
|
than OLD as the listed source then it does not appear in that form in OLD,
|
|
this is not always the case. On occasion some other dictionary gives a
|
|
much better or more complete and understandable definition and the honor
|
|
of source is thereto given. <BR>
|
|
<BR>
|
|
OT [Old Testament] is a reference in the Bible.
|
|
<BR>
|
|
Other source indicators are occasionally used and are indicated
|
|
in the general description of SOURCE below.
|
|
<BR><BR>
|
|
(PASS) [passive] - indicates a special, unexpected meaning for the passive
|
|
form of the verb, not easily associated with the active meaning.
|
|
In addition this is often used to remind the user that compounds of facio
|
|
form the passive by using the active of fio. Ex: calefio (calefacio PASS).
|
|
There may be more translation information in the base word cited and
|
|
the user is encouraged to refer to it.<BR>
|
|
<BR>
|
|
perh. [perhaps] - denotes an additional uncertainty, but not as strong as
|
|
(?). <BR>
|
|
<BR>
|
|
(pl.) [plural] means that the Latin word is believed by scholars to be
|
|
used (almost) always in the plural form, with the meaning stated, even
|
|
though that meaning in English may be singular. If it appears in the
|
|
beginning of the meaning, before the first comma, it applies to all the
|
|
meanings. If it appears later, it applies only to that and later
|
|
meanings. For the purpose of this program, this is only advisory. While
|
|
it is used by some tools to find the expected dictionary entry, the
|
|
program does not necessarily exclude a singular form in the output. While it may be
|
|
true that in good, classical Latin it is never used in the singular, this
|
|
does not mean that some text somewhere might not use the singular, nor
|
|
that it is uncommon in later Latin. The TRIM_OUTPUT option may cause only plural
|
|
forms to appear; without TRIM_OUTPUT the singular will be shown. <BR>
|
|
<BR>
|
|
prob. [probably] - denotes some uncertainty, but not as much as
|
|
(perh.). <BR>
|
|
<BR>
|
|
pure Latin ... indicates a pure Latin term for a word which is derived
|
|
from another language (almost certainly Greek). <BR>
|
|
<BR>
|
|
(rude) - indicates that this meaning was used in a rude, vulgar, coarse,
|
|
or obscene manner, not what one should hear in polite company. Such use
|
|
is likely from graffiti or epigrams, or in plays in which the dialogue is
|
|
to indicate that the characters are low or crude. Meanings given by the
|
|
program for these words are more polite, and the user is invited to
|
|
substitute the current street language or obscenity of his choice to get
|
|
the flavor of text. <BR>
|
|
<BR>
|
|
(sg.) [singular] means that the Latin word is believed by scholars to be
|
|
used always in the singular. If it appears in the beginning of the
|
|
meaning, before the first comma, it applies to all the meanings. If it
|
|
appears later, it applies only to that and later meanings. For the
|
|
purpose of this program, this is only advisory. <BR>
|
|
<BR>
|
|
usu. [usually] is weakly advisory. (usu. pl.) is even weaker than (pl.)
|
|
and may imply that the plural tendency occurred only during certain periods.
|
|
<BR>
|
|
<BR>
|
|
w/ means 'with'.
|
|
<BR>
|
|
<BR>
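<P>The comma and semicolon conventions above can be sketched in code. The helper names are hypothetical, and the sketch assumes that separators inside parentheses belong to an explanation and should not split the meaning. It does not expand the solidus or strip a leading continuation bar.

```python
def split_outside_parens(text, separator):
    """Split text on separator, ignoring separators inside parentheses."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")" and depth:
            depth -= 1
        if ch == separator and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return [part for part in parts if part]


def split_meaning(meaning):
    """Return a list of sense groups (split on ';'), each a list of synonyms (split on ',')."""
    return [split_outside_parens(group, ",")
            for group in split_outside_parens(meaning, ";")]


line = "by (agent), from (departure, cause, remote origin/time); after (reference);"
for group in split_meaning(line):
    print(group)
```

<P>Since the semicolon rule is described as a general tendency rather than a strict one, the groups produced this way are a reading aid, not a reliable sense inventory.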
|
|
|
|
|
|
|
|
<A NAME="PROGRAM DESCRIPTION">
|
|
<H3><CENTER>PROGRAM DESCRIPTION</CENTER>
|
|
</H3></A> <BR>
|
|
|
|
<P>
|
|
An effect of the program is to derive the structure and meaning of
individual Latin words. A procedure was devised to examine the ending of
a word, compare it with the standard endings, derive the possible stems
that could be consistent, compare those stems with a dictionary of stems,
and eliminate those for which the ending is inconsistent with the dictionary
stem (e.g., a verb ending with a noun dictionary item). If unsuccessful,
it tries with a large set of prefixes and suffixes, and various tackons
(e.g., -que). Finally it tries various 'tricks' (e.g., 'ae' may be
replaced by 'e', 'inp...' by 'imp...', syncope, etc.), and it reports any
resulting matches as possible interpretations.
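<P>The core of this procedure, stripping each known ending and looking up the remaining stem, can be sketched as below. The two tables are a tiny invented sample in a simplified layout, not the program's real data files, and `analyze` is a hypothetical name. Prefixes, suffixes, tackons and tricks are left out.

```python
# (ending, part, which, var, form description): invented sample rows
ENDINGS = [
    ("o", "V", 1, 1, "PRES ACTIVE IND 1 S"),
    ("as", "V", 1, 1, "PRES ACTIVE IND 2 S"),
    ("as", "N", 1, 1, "ACC P F"),
    ("a", "N", 1, 1, "NOM S F"),
]

# (stem, part, which, var, meaning): invented sample rows
STEMS = [
    ("am", "V", 1, 1, "love, like"),
    ("port", "V", 1, 1, "carry, bring"),
    ("port", "N", 1, 1, "gate, entrance"),
]


def analyze(word):
    """Return every (form, part, description, meaning) consistent with word."""
    results = []
    for ending, part, which, var, form in ENDINGS:
        if not word.endswith(ending):
            continue
        stem = word[: len(word) - len(ending)]
        for dict_stem, d_part, d_which, d_var, meaning in STEMS:
            # Keep only stems whose part and declension/conjugation fit the ending.
            if dict_stem == stem and (d_part, d_which, d_var) == (part, which, var):
                results.append((stem + "." + ending, part, form, meaning))
    return results


for hit in analyze("portas"):
    print(hit)
```

<P>The search is deliberately exhaustive: every ending that fits is tried, and the part-of-speech check is the only filter, so an ambiguous form returns more than one interpretation.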
|
|
<P>
|
|
With the input of a word, or several words in a line, the program returns
|
|
information about the possible accidence, if it can find an agreeable
|
|
stem in its dictionary.
|
|
|
|
<PRE><TT>=>amo
|
|
am.o V 1 1 PRES ACTIVE IND 1 S
|
|
love, like; fall in love with; be fond of; have a tendency to</TT></PRE>
|
|
|
|
<P>
|
|
To support this method, an INFLECT.SEC data file was constructed
|
|
containing possible Latin endings encoded by a structure that identifies
|
|
the part of speech, declension, conjugation, gender, person, number, etc.
|
|
This is a pure computer encoding for a 'brute force' search. No
|
|
sophisticated knowledge of Latin is used at this point. Rules of thumb
|
|
(e.g., the fact, always noted early in any Latin course, that a neuter
|
|
noun has the same ending in the nominative and accusative, with a final -a
|
|
in the plural) are not used in the search. However, it is convenient to
|
|
combine several identical endings with a general encoding (e.g., the
|
|
endings of the perfect tenses are the same for all verbs, and are so
|
|
encoded, not replicated for every conjugation and variant).
|
|
<P>
|
|
Many of the distinguishing differences identifying conjugations come from
|
|
the voiced length of stem vowels (e.g., between the present, imperfect and
|
|
future tenses of a third conjugation I-stem verb and a fourth conjugation
|
|
verb). These aural differences, the features that make Latin 'sound
|
|
right' to one who speaks it, are not relevant in the analysis of written
|
|
endings.
|
|
<P>
|
|
The endings for the verb conjugations are the result of trying to minimize
|
|
the number of individual endings records, while yet keeping the structure
|
|
of the inflections data file fairly readable. There is no claim that the
|
|
resulting arrangement is consonant with any grammarian's view of Latin,
|
|
nor should it be examined from that viewpoint. While it started from the
|
|
conjugations in text books, it can only be viewed as some fuzzy
|
|
intermediate step along a path to a mathematically minimal number of
|
|
encoded verb endings. Later versions of the program might improve the
|
|
system.
|
|
<P>
|
|
There are some egregious liberties taken in the encoding. With the
|
|
inclusion of two present stems, the third conjugation I-stem verbs may
|
|
share the endings of the regular third conjugation. The fourth
|
|
conjugation has disappeared altogether, and is represented internally as a
|
|
variant of the third conjugation (3, 4), but this is
|
|
replaced for the user in output by 4 1. There is an artificial fifth
|
|
conjugation for esse and others, a sixth for eo, and a seventh for other
|
|
irregularities.
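<P>The substitution of the internal fourth-conjugation code for display amounts to a one-line mapping. A sketch, with a hypothetical function name:

```python
def displayed_conjugation(which, var):
    """Map the internal (WHICH, VAR) pair to the pair shown to the user."""
    if (which, var) == (3, 4):
        return (4, 1)  # internal third-conjugation variant 4 is shown as 4 1
    return (which, var)


print(displayed_conjugation(3, 4))
print(displayed_conjugation(3, 1))
```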
|
|
<P>
|
|
As an example, a verb ending record has the structure:
|
|
<BR>PART -- the part code for a verb = V;
|
|
<BR>CONjugation -- consisting of two parts:
|
|
<BR>WHICH -- a conjugation identifier - range 0..9 and
|
|
<BR>VAR -- a variant identifier on WHICH - range 0..9;
|
|
<BR>TENSE -- an enumeration type - range PRES..FUTP + X;
|
|
<BR>VOICE -- an enumeration type - range ACTIVE..PASSIVE + X;
|
|
<BR>MOOD -- an enumeration type - range IND..PPL + X;
|
|
<BR>PERSON -- person, first to third - range 1..3 + 0;
|
|
<BR>NUMBER -- an enumeration type - range S..P + X;
|
|
<BR>KEY -- which stem to be used - range 1..4;
|
|
<BR>SIZE -- number of characters - range 0..9;
|
|
<BR>ENDING -- the ending as a string of SIZE characters;
|
|
<BR>AGE and FREQ flags which are not usually visible to the user.
|
|
<P>
|
|
Thus, the entry for the ending appropriate to 'amo' (with STEM = am) is:
|
|
|
|
<PRE><TT>V 1 1 PRES IND ACTIVE 1 S X 1 o</TT></PRE>
|
|
|
|
<P>
|
|
The elements are straightforward and generally use the
|
|
abbreviations that are common in any Latin text. An X or 0 represents the
|
|
'don't know' or 'don't care' for enumeration or numeric types. Details
|
|
are documented below in the CODES section.
|
|
<P>
|
|
A verb dictionary record has the structure:
|
|
<BR>STEMS -- for a verb there are 4 stems;
|
|
<BR>PART -- part code for a verb = V
|
|
<BR>WHICH -- a conjugation identifier - range 0..9
|
|
<BR>VAR -- a variant identifier - range 0..9;
|
|
<BR>KIND -- enumeration type of verb - range TO_BE..PERFDEF + X;
|
|
<BR>AGE, AREA, GEO, FREQ, and SOURCE flags
|
|
<BR>MEANING -- text for English translations (up to 80 characters).
|
|
<P>
|
|
Thus, an entry corresponding to 'amo amare amavi amatus' is:
|
|
|
|
<PRE><TT>am am amav amat
|
|
V 1 1 X X X X X X
|
|
love, like; fall in love with; be fond of; have a tendency to</TT></PRE>
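<P>
The way an ending record and a dictionary record combine can be sketched in a few lines of Python. This is only an illustration, not the program's Ada code: the class and function names are invented here, and the KEY values (1 for the present ending 'o', 3 for a perfect ending 'i') are assumptions chosen to match the stems of 'amo' shown above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerbEnding:
    which: int    # conjugation identifier (WHICH)
    var: int      # variant on WHICH (VAR)
    tense: str
    voice: str
    mood: str
    person: int
    number: str
    key: int      # which of the four stems to use (1..4)
    ending: str


@dataclass(frozen=True)
class VerbEntry:
    stems: tuple  # the four verb stems, in dictionary order
    which: int
    var: int
    meaning: str


def inflect(entry, ending):
    """Attach an ending to the stem its KEY selects, if the conjugations agree."""
    if (entry.which, entry.var) != (ending.which, ending.var):
        return None
    return entry.stems[ending.key - 1] + ending.ending


AMO = VerbEntry(("am", "am", "amav", "amat"), 1, 1,
                "love, like; fall in love with; be fond of; have a tendency to")

# Hypothetical ending records; the KEY values are assumptions of this sketch.
PRES_1S = VerbEnding(1, 1, "PRES", "ACTIVE", "IND", 1, "S", 1, "o")
PERF_1S = VerbEnding(1, 1, "PERF", "ACTIVE", "IND", 1, "S", 3, "i")

print(inflect(AMO, PRES_1S))
print(inflect(AMO, PERF_1S))
```

The point of the KEY field is that the ending record never names a stem directly; it only says which of the entry's stems it attaches to.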
|
|
|
|
|
|
<P>
|
|
Endings may not uniquely determine which stem, and therefore the right
|
|
meaning. 'portas' could be the accusative plural of 'gate', or the second
|
|
person, singular, present indicative active of 'carry'. In both cases the
|
|
stem is 'port'. All possibilities are reported.
|
|
|
|
<PRE><TT>portas
|
|
port.as V 1 1 PRES IND ACTIVE 2 S X
|
|
carry, bring
|
|
|
|
port.as N 1 1 ACC P F T
|
|
gate, entrance; city gates; door; avenue;</TT></PRE>
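<P>
A minimal sketch of this 'report everything' behavior follows. The two small tables are hypothetical stand-ins for the dictionary and the INFLECTIONS file, holding only the entries quoted in this section; the real program's data layout and search are not shown here.

```python
# Hypothetical miniature dictionary: stem -> (part, which, var, meaning)
STEMS = {
    "port": [
        ("V", 1, 1, "carry, bring"),
        ("N", 1, 1, "gate, entrance; city gates; door; avenue;"),
        ("N", 4, 1, "port, harbor; refuge, haven, place of refuge"),
    ],
}

# Hypothetical miniature ending table: ending -> (part, which, var, codes)
ENDINGS = {
    "as": [("V", 1, 1, "PRES IND ACTIVE 2 S"), ("N", 1, 1, "ACC P")],
    "um": [("N", 4, 1, "ACC S")],
}


def analyze(word):
    """Try every stem/ending split and keep each pairing whose codes agree."""
    results = []
    for cut in range(1, len(word) + 1):
        stem, ending = word[:cut], word[cut:]
        for part, which, var, meaning in STEMS.get(stem, []):
            for e_part, e_which, e_var, codes in ENDINGS.get(ending, []):
                if (part, which, var) == (e_part, e_which, e_var):
                    results.append((f"{stem}.{ending}", part, codes, meaning))
    return results


for result in analyze("portas"):
    print(result)
```

No attempt is made to choose between the verb and the noun; that choice is left to the reader, who has the context.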
|
|
|
|
<P>
|
|
And note that the same stem (port) has other uses (portus = harbor).
|
|
|
|
|
|
<PRE><TT>portum
|
|
port.um N 4 1 ACC S M T
|
|
port, harbor; refuge, haven, place of refuge</TT></PRE>
|
|
|
|
<P>
|
|
PLEASE NOTE: It is certainly possible for the program to find a valid
|
|
Latin construction that fits the input word and to have that
|
|
interpretation be entirely wrong in the context. It is even possible to
|
|
interpret a number, in Roman numerals, as a word! (But the number would
|
|
be reported also.)
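<P>
The Roman numeral case can be illustrated with a small checker. This is a sketch under the assumption that only numerals in the standard subtractive form up to 3999 are recognized; the function name is invented here, and the rules WORDS itself applies may be broader.

```python
import re

# Standard-form Roman numerals from 1 to 3999 (an assumption of this sketch).
ROMAN = re.compile(r"^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$")
VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_value(word):
    """Return the numeric value if the word can be read as a numeral, else None."""
    text = word.upper()
    if not text or not ROMAN.match(text):
        return None
    total = 0
    # Pair each letter with its successor; a smaller letter before a larger one subtracts.
    for ch, nxt in zip(text, text[1:] + " "):
        value = VALUES[ch]
        total += -value if VALUES.get(nxt, 0) > value else value
    return total


print(roman_value("vi"))
print(roman_value("amo"))
```

An input such as 'vi' passes this test and is also a perfectly good inflected word, which is why both readings have to be reported.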
|
|
|
|
<P>
|
|
For the case of defective verbs, the process does not necessarily have to
|
|
be precise. Since the purpose is only to translate from Latin, even if
|
|
there are unused forms included in the algorithm these will not come up
|
|
in any real Latin text. The endings for the verb conjugations are the
|
|
result of trying to minimize the number of individual endings records,
|
|
while keeping the structure of the base INFLECTIONS data file fairly
|
|
readable.
|
|
<P>
|
|
In general the program will try to construct a match with the inflections
|
|
and the dictionaries. There are some specific checks to reject
|
|
certain mathematically correct combinations that do not appear in the
|
|
language, but these checks are relatively few. The philosophy has been to
|
|
allow a generous interpretation. A remark in a text or dictionary that a
|
|
particular form does not exist must be tempered with the realization that
|
|
the author probably means that it has not been observed in the surviving
|
|
classical literature. This body of reference is minuscule compared to the
|
|
total use of Latin, even limited to the classical period. Who is to say
that further texts would not turn up such a form, even if it might
not have been approved of by Cicero? It is also possible that such
|
|
reasonable, if 'improper', constructs might occur in later writings by
|
|
less educated, or just different, authors. Certainly English shows this
|
|
sort of variation over time.
|
|
<P>
|
|
If the exact stem is not found in the dictionary, there are rules for the
|
|
construction of words which any student would try. The simplest situation
|
|
is a known stem to which a prefix or suffix has been attached. The method used
|
|
by the program (if DO_FIXES is on, default is Yes) is to try any fixes that fit,
|
|
to see if their removal results in an identifiable remainder. Then the
|
|
meaning is mechanically implied from the meaning of the fix and the
|
|
stem. The user may need to interpret with a more conventional English
|
|
usage. This technique improves the hit performance significantly. However,
|
|
in about 40% of the instances in which there is a hit, the derivation is
|
|
correct but the interpretation takes some imagination. In something less
|
|
than 10% of the cases, the inferred fix is just wrong, so the user must
|
|
take some care to see if the interpretation makes any sense.
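<P>
The prefix case can be sketched as follows. The prefix table and the one-word dictionary are hypothetical, and whole words are looked up instead of stems to keep the example short; the program itself works on stems and inflections.

```python
# Hypothetical prefix table and dictionary, for illustration only.
PREFIXES = {"re": "back, again", "sub": "under"}
KNOWN = {"porto": "carry, bring"}


def try_prefixes(word):
    """If the word is unknown, strip each prefix that fits and look up the remainder."""
    if word in KNOWN:
        return [(word, KNOWN[word])]
    guesses = []
    for prefix, sense in PREFIXES.items():
        rest = word[len(prefix):]
        if word.startswith(prefix) and rest in KNOWN:
            # The meaning is implied mechanically: prefix sense plus base meaning.
            guesses.append((f"{prefix}+{rest}", f"{sense} + {KNOWN[rest]}"))
    return guesses


print(try_prefixes("reporto"))
```

The composed meaning is deliberately crude. Turning 'back, again + carry, bring' into conventional English is the part left to the user.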
|
|
<P>
|
|
This method is complicated by the tendency for prefixes to be modified
|
|
upon attachment (ab+fero = aufero, sub+fero = suffero). The program's
|
|
'tricks' take many such instances into account. Ideally, one should look
|
|
inside the stem for identifiable fragments. One would like to start with
|
|
the smallest possible stem, and that is most frequently the correct one.
|
|
While it is mathematically possible that the stem of 'actorum' is 'actor'
|
|
with the common inflection 'um', no intuitive first semester Latin student
|
|
would fail to opt for the genitive plural 'orum', and probably be right.
|
|
To first order, the procedure ignores such hints and may report this word in
|
|
both forms, as well as a verb participle. However, it can use certain
|
|
generally applicable rules, like the superlative characteristic 'issim',
|
|
to further guess.
|
|
<P>
|
|
In addition, there is the capability to examine the word for such common
|
|
techniques as syncope, the omission of the 've' or 'vi' in certain verb
|
|
perfect forms (audivissem = audissem).
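<P>
One way to picture the syncope check is to re-insert the missing 'vi' or 've' at each position and see whether any candidate is recognized. This is a sketch only: the set of known forms is hypothetical, and the program processes each candidate through its full stem and ending analysis instead of a simple lookup.

```python
def syncope_candidates(word):
    """Yield spellings with 'vi' or 've' re-inserted at each interior position."""
    seen = set()
    for i in range(1, len(word)):
        for piece in ("vi", "ve"):
            candidate = word[:i] + piece + word[i:]
            if candidate not in seen:
                seen.add(candidate)
                yield candidate


# Hypothetical stand-in for a full analysis of each candidate.
KNOWN_FORMS = {"audivissem"}


def expand_syncope(word):
    """Return the candidates that the (stand-in) analysis recognizes."""
    return [c for c in syncope_candidates(word) if c in KNOWN_FORMS]


print(expand_syncope("audissem"))
```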
|
|
<P>
|
|
If the dictionary can not identify a matching stem, it may be possible to
|
|
derive a stem from 'nearby' stems (an adverb from an adjective is the most
|
|
common example) and infer a meaning. If all else fails, a portion of the
|
|
possible dictionary stems can be listed, from which the user can draw in
|
|
making his own guess. <BR>
|
|
|
|
<A NAME="Codes in Inflection Line">
|
|
|
|
<H4>Codes in Inflection Line</H4></A>
|
|
<P>
|
|
For completeness, the enumeration codes used in the output are listed here
|
|
from the Ada statements. Simple numbers are used for person, declension,
|
|
conjugations, and their variants. Not all the facilities implied by these
|
|
values are developed or used in the program or the dictionary. This list
|
|
is only for Version 1.97E. Other versions may be somewhat different. This
|
|
may make their dictionaries incompatible with the present program.
|
|
<P>
|
|
NOTE: in print dictionaries certain information is conveyed by font
|
|
encoding, e.g., the use of bold face or italics. There is no system
|
|
independent method of displaying such on computers (although individual
|
|
programs can handle these, each in its own unique way). WORDS uses capital
letters to express some such differences, a method that is system independent
in present usage.
|
|
|
|
<PRE><TT>
|
|
type PART_OF_SPEECH_TYPE
|
|
X, -- all, none, or unknown
|
|
N, -- Noun
|
|
PRON, -- PRONoun
|
|
PACK, -- PACKON -- artificial for code
|
|
ADJ, -- ADJective
|
|
NUM, -- NUMeral
|
|
ADV, -- ADVerb
|
|
V, -- Verb
|
|
VPAR, -- Verb PARticiple
|
|
SUPINE, -- SUPINE
|
|
PREP, -- PREPosition
|
|
CONJ, -- CONJunction
|
|
INTERJ, -- INTERJection
|
|
TACKON, -- TACKON -- artificial for code
|
|
PREFIX, -- PREFIX -- here artificial for code
|
|
SUFFIX -- SUFFIX -- here artificial for code
|
|
|
|
type GENDER_TYPE
|
|
X, -- all, none, or unknown
|
|
M, -- Masculine
|
|
F, -- Feminine
|
|
N, -- Neuter
|
|
C -- Common (masculine and/or feminine)
|
|
|
|
type CASE_TYPE
|
|
X, -- all, none, or unknown
|
|
NOM, -- NOMinative
|
|
VOC, -- VOCative
|
|
GEN, -- GENitive
|
|
LOC, -- LOCative
|
|
DAT, -- DATive
|
|
ABL, -- ABLative
|
|
ACC -- ACCusative
|
|
|
|
type NUMBER_TYPE
|
|
X, -- all, none, or unknown
|
|
S, -- Singular
|
|
P -- Plural
|
|
|
|
type PERSON_TYPE is range 0..3;
|
|
|
|
type COMPARISON_TYPE
|
|
X, -- all, none, or unknown
|
|
POS, -- POSitive
|
|
COMP, -- COMParative
|
|
SUPER -- SUPERlative
|
|
|
|
type NUMERAL_SORT_TYPE
|
|
X, -- all, none, or unknown
|
|
CARD, -- CARDinal
|
|
ORD, -- ORDinal
|
|
DIST, -- DISTributive
|
|
ADVERB -- numeral ADVERB
|
|
|
|
type TENSE_TYPE
|
|
X, -- all, none, or unknown
|
|
PRES, -- PRESent
|
|
IMPF, -- IMPerFect
|
|
FUT, -- FUTure
|
|
PERF, -- PERFect
|
|
PLUP, -- PLUPerfect
|
|
FUTP -- FUTure Perfect
|
|
|
|
type VOICE_TYPE
|
|
X, -- all, none, or unknown
|
|
ACTIVE, -- ACTIVE
|
|
PASSIVE -- PASSIVE
|
|
|
|
type MOOD_TYPE
|
|
X, -- all, none, or unknown
|
|
IND, -- INDicative
|
|
SUB, -- SUBjunctive
|
|
IMP, -- IMPerative
|
|
INF, -- INFinitive
|
|
PPL -- ParticiPLe
|
|
|
|
type NOUN_KIND_TYPE
|
|
X, -- unknown, nondescript
|
|
S, -- Singular "only" -- not really used
|
|
M, -- plural or Multiple "only" -- not really used
|
|
A, -- Abstract idea
|
|
G, -- Group/collective Name -- Roman(s)
|
|
N, -- proper Name
|
|
P, -- a Person
|
|
T, -- a Thing
|
|
L, -- Locale, name of country/city
|
|
W -- a place Where
|
|
|
|
type PRONOUN_KIND_TYPE
|
|
X, -- unknown, nondescript
|
|
PERS, -- PERSonal
|
|
REL, -- RELative
|
|
REFLEX, -- REFLEXive
|
|
DEMONS, -- DEMONStrative
|
|
INTERR, -- INTERRogative
|
|
INDEF, -- INDEFinite
|
|
ADJECT -- ADJECTival
|
|
|
|
type VERB_KIND_TYPE
|
|
X, -- all, none, or unknown
|
|
TO_BE, -- only the verb TO BE (esse)
|
|
TO_BEING, -- compounds of the verb to be (esse)
|
|
GEN, -- verb taking the GENitive
|
|
DAT, -- verb taking the DATive
|
|
ABL, -- verb taking the ABLative
|
|
TRANS, -- TRANSitive verb
|
|
INTRANS, -- INTRANSitive verb
|
|
IMPERS, -- IMPERSonal verb (implied subject 'it', 'they', 'God')
|
|
-- agent implied in action, subject in predicate
|
|
DEP, -- DEPonent verb
|
|
-- only passive form but with active meaning
|
|
SEMIDEP, -- SEMIDEPonent verb (forms perfect as deponent)
|
|
-- (perfect passive has active force)
|
|
PERFDEF -- PERFect DEFinite verb
|
|
-- having only perfect stem, but with present force
|
|
|
|
</TT></PRE>
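<P>
As an illustration of how these codes read, the sketch below expands a noun inflection line such as 'port.as N 1 1 ACC P F T' into words. The lookup tables copy the abbreviations listed above; the function name and the reading of the final code as NOUN KIND are assumptions of this sketch.

```python
CASE = {"X": "unknown case", "NOM": "nominative", "VOC": "vocative",
        "GEN": "genitive", "LOC": "locative", "DAT": "dative",
        "ABL": "ablative", "ACC": "accusative"}
NUMBER = {"X": "unknown number", "S": "singular", "P": "plural"}
GENDER = {"X": "unknown gender", "M": "masculine", "F": "feminine",
          "N": "neuter", "C": "common"}
NOUN_KIND = {"X": "unknown kind", "S": "singular only", "M": "plural only",
             "A": "abstract idea", "G": "group name", "N": "proper name",
             "P": "a person", "T": "a thing", "L": "locale", "W": "a place where"}


def describe_noun(line):
    """Expand the codes of a noun inflection line into plain words."""
    form, part, which, var, case, number, gender, kind = line.split()
    if part != "N":
        raise ValueError("this sketch only handles noun lines")
    return (f"{form}: noun, declension {which} variant {var}, "
            f"{CASE[case]} {NUMBER[number]}, {GENDER[gender]}, {NOUN_KIND[kind]}")


print(describe_noun("port.as N 1 1 ACC P F T"))
```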
|
|
|
|
<P>
|
|
The KIND_TYPEs represent various aspects of a word which may be useful to
|
|
some program, not necessarily the present one. They were put in for
|
|
various reasons, and later versions may change the selection and use.
|
|
Some of the KIND flags are never used. In some cases more than one KIND
|
|
flag might be appropriate, but only one is selected. Some seemed to be a
|
|
good idea at one time, but have not since proved out. The lists above are
|
|
just for completeness.
|
|
<P>
|
|
NOUN KIND is used in trimming (when set) the output and removing possibly
|
|
spurious cases (locative for a person, but preserving the vocative).
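<P>
The one rule stated here, dropping a locative reading for a person while keeping the vocative, might look like this in outline. The record layout and function name are hypothetical, and any further rules the program applies are not represented.

```python
def trim_by_noun_kind(parses):
    """Drop LOC readings for nouns whose KIND is P (a person); keep everything else."""
    return [p for p in parses
            if not (p["kind"] == "P" and p["case"] == "LOC")]


readings = [
    {"case": "LOC", "kind": "P"},  # locative of a person: treated as spurious
    {"case": "VOC", "kind": "P"},  # vocative of a person: preserved
    {"case": "LOC", "kind": "L"},  # locative of a locale: preserved
]
print(trim_by_noun_kind(readings))
```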
|
|
<P>
|
|
VERB KIND allows examples (when set) to give a more reasonable meaning. A
|
|
DEP flag allows the example to reflect active meaning for passive form.
|
|
It also allows the dictionary form to be constructed properly from stems.
|
|
TRANS/INTRANS were included to allow a further program a hint as to what
|
|
kind of object it should expect. This flag is only now being fixed during
|
|
the update. There are some verbs which, although mostly used in one way,
|
|
might be either. These are assigned X rather than breaking into two
|
|
entries. This would be of no particular use at this point since it would
|
|
not allow the object to be determined. GEN/DAT/ABL flags have related
|
|
function, but are almost absent. TO_BE is used to indicate that a form of
|
|
esse may be part of a compound verb tense with a participle. TO_BEING
|
|
indicates a verb related to esse (e.g., abesse) which has no object,
nor is it used to form compounds. IMPERS is used to weed out persons
and forms inappropriate to an impersonal verb, and to insert a special
|
|
meaning distinct from a general form associated with the same verb stem.
|
|
|
|
<P>There is a problem in that the values for this parameter are not all orthogonal.
|
|
DEP is a different sort of thing from INTRANS. There ought to be a
|
|
KIND_1 and KIND_2 to separate the different classes. However, this would
|
|
be overkill considering the use made of this parameter, so far.
|
|
|
|
|
|
<P>There is a more difficult DEP problem.
|
|
'Good Latin' requires that the DEP be recognized and
|
|
processed to eliminate active forms.
|
|
In some cases there are dictionary examples, mostly medieval,
|
|
of the deponency being violated.
|
|
Some of those cases have been recognized with a separate entry.
|
|
This is not something that a suffix can handle appropriately,
|
|
even if mechanically it can function.
|
|
A better way might be to include the perfect form but still have the DEP flag,
|
|
thereby allowing the trimming in most cases. This has not been done yet.
|
|
But an active form would be recognized if input, especially if the text is medieval.
|
|
|
|
<P>
|
|
NUMERAL KIND and VALUE are used by the program in constructing the meaning line.
|
|
<BR>
|
|
|
|
<A NAME="Help for Parameters">
|
|
<H4>Help for Parameters</H4></A>
|
|
|
|
<P>
|
|
One can CHANGE_PARAMETERS by inputting a '#' [number sign] character (ASCII
|
|
35) as the input word, followed by a return. (Note that this has changed
|
|
from early versions in which '?' was used.) Each parameter is listed and
|
|
the user is offered the opportunity to change it from the current value by
|
|
answering Y or N (any case). For each parameter there is some explanation
|
|
or help. This is displayed by inputting a '?' [question mark], followed
|
|
by a return. HINT: While going down the list if one has made all the
|
|
changes desired, one need not continue to the end. Just enter a space and
|
|
then give a return. The program will interpret this as an illegal entry
|
|
(not Y or N) and will cancel the rest of the list, while retaining any
|
|
changes made to that point.
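<P>
The behavior of the change dialogue, including the HINT, can be modeled roughly as below. This is a sketch with assumptions: answers are supplied as a list instead of being read from the keyboard, Y and N are taken to set the parameter on and off, the '?' help request is left out, and the function name is invented.

```python
def change_parameters(params, answers):
    """Walk the parameter list; Y/N sets a value, anything else cancels the rest."""
    updated = dict(params)
    for name, answer in zip(params, answers):
        reply = answer.strip().upper()
        if reply == "Y":
            updated[name] = True
        elif reply == "N":
            updated[name] = False
        else:
            # Illegal entry (for example a lone space): stop here,
            # retaining the changes made to this point.
            break
    return updated


current = {"TRIM_OUTPUT": True, "HAVE_OUTPUT_FILE": False, "DO_FIXES": True}
print(change_parameters(current, ["n", " ", "n"]))
```

In the example the first answer turns TRIM_OUTPUT off, the space ends the dialogue, and DO_FIXES is never reached.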
|
|
|
|
<P>Some parameters may not function in the English mode, nor is the documentation
necessarily complete.
|
|
|
|
<P>
|
|
The various help displays are listed here:
|
|
|
|
<PRE><TT>
|
|
|
|
TRIM_OUTPUT_HELP
|
|
This option instructs the program to remove from the output list of
|
|
possible constructs those which are least likely. There is now a fair
|
|
amount of trimming, killing LOC and VOC plus removing Uncommon and
|
|
non-classical (Archaic/Medieval) when more common results are found
|
|
and this action is requested (turn it off in MDV (!) parameters).
|
|
When a TRIM has been done, the output is followed by an asterisk (*).
There certainly is no absolute assurance that the items removed are
|
|
not correct, just that they are statistically less likely.
|
|
Note that poets are likely to employ unusual words and inflections for
|
|
various reasons. These may be trimmed out if this parameter is on.
When in English mode, trim just reduces the output to the top six
results, if there are that many. An asterisk means there are more.
|
|
The default is Y(es)
|
|
|
|
HAVE_OUTPUT_FILE_HELP
|
|
This option instructs the program to create a file which can hold the
|
|
output for later study, otherwise the results are just displayed on
|
|
the screen. The output file is named WORD.OUT
|
|
This means that one run will necessarily overwrite a previous run,
|
|
unless the previous results are renamed or copied to a file of another
|
|
name. This is available if the METHOD is INTERACTIVE, no parameters.
|
|
The default is N(o), since this prevents the program from overwriting
|
|
previous work unintentionally. Y(es) creates the output file.
|
|
|
|
WRITE_OUTPUT_TO_FILE_HELP
|
|
This option instructs the program, when HAVE_OUTPUT_FILE is on, to
|
|
write results to the WORD.OUT file.
|
|
This option may be turned on and off during running of the program,
|
|
thereby capturing only certain desired results. If the option
|
|
HAVE_OUTPUT_FILE is off, the user will not be given a chance to turn
|
|
this one on. Only for INTERACTIVE running. Default is N(o).
|
|
This works in English mode, but output is somewhat different so far.
|
|
|
|
DO_UNKNOWNS_ONLY_HELP
|
|
This option instructs the program to only output those words that it
|
|
cannot resolve. Of course, it has to do processing on all words, but
|
|
those that are found (with prefix/suffix, if that option is on) will
be ignored. The purpose of this option is to allow a quick look to
|
|
determine if the dictionary and process is going to do an acceptable
|
|
job on the current text. It also allows the user to assemble a list
|
|
of unknown words to look up manually, and perhaps augment the system
|
|
dictionary. For those purposes, the system is usually run with the
|
|
MINIMIZE_OUTPUT option, just producing a list. Another use is to run
|
|
without MINIMIZE to an output file. This gives a list of the input
|
|
text with the unknown words, by line. This functions as a spelling
|
|
checker for Latin texts. The default is N(o).
|
|
This does not work in English mode, but may in the future.
|
|
|
|
WRITE_UNKNOWNS_TO_FILE_HELP
|
|
This option instructs the program to write all unresolved words to a
|
|
UNKNOWNS file named WORD.UNK
|
|
With this option on, the file of unknowns is written, even though
|
|
the main output contains both known and unknown (unresolved) words.
|
|
One may wish to save the unknowns for later analysis, testing, or to
|
|
form the basis for dictionary additions. When this option is turned
|
|
on, the UNKNOWNS file is written, destroying any file from a previous
|
|
run. However, the write may be turned on and off during a single run
|
|
without destroying the information written in that run.
|
|
This option is for specialized use, so its default is N(o).
|
|
This does not work in English mode, but may in the future.
|
|
|
|
IGNORE_UNKNOWN_NAMES_HELP
|
|
This option instructs the program to assume that any capitalized word
|
|
longer than three letters is a proper name. As no dictionary can be
|
|
expected to account for many proper names, many such occur that would
|
|
be called UNKNOWN. This contaminates the output in most cases, and
|
|
it is often convenient to ignore these spurious UNKNOWN hits. This
|
|
option implements that mode, and calls such words proper names.
|
|
Any proper names that are in the dictionary are handled in the normal
|
|
manner. The default is Y(es).
|
|
|
|
IGNORE_UNKNOWN_CAPS_HELP
|
|
This option instructs the program to assume that any all caps word
|
|
is a proper name or similar designation. This convention is often
|
|
used to designate speakers in a discussion or play. No dictionary can
|
|
claim to be exhaustive on proper names, so many such occur that would
be called UNKNOWN. This contaminates the output in most cases, and
it is often convenient to ignore these spurious UNKNOWN hits. This
|
|
option implements that mode, and calls such words names. Any similar
|
|
designations that are in the dictionary are handled in the normal
|
|
manner, as are normal words in all caps. The default is Y(es).
|
|
|
|
DO_COMPOUNDS_HELP
|
|
This option instructs the program to look ahead for the verb TO_BE (or
|
|
iri) when it finds a verb participle, with the expectation of finding
|
|
a compound perfect tense or periphrastic. This option can also be a
trimming of the output, in that VPAR that do not fit (not NOM) will be
excluded; possible interpretations are lost. Default choice is Y(es).
|
|
This processing is turned off with the choice of N(o).
|
|
|
|
DO_FIXES_HELP
|
|
This option instructs the program, when it is unable to find a proper
|
|
match in the dictionary, to attach various prefixes and suffixes and
|
|
try again. This effort is successful in about a quarter of the cases
|
|
which would otherwise give UNKNOWN results, or so it seems in limited
|
|
tests. For those cases in which a result is produced, about half give
|
|
easily interpreted output; many of the rest are etymologically true,
|
|
but not necessarily obvious; about a tenth give entirely spurious
|
|
derivations. The user must proceed with caution.
|
|
The default choice is Y(es), since the results are generally useful.
|
|
This processing can be turned off with the choice of N(o).
|
|
|
|
DO_TRICKS_HELP
|
|
This option instructs the program, when it is unable to find a proper
|
|
match in the dictionary, and after various prefixes and suffixes, to
|
|
try every dirty Latin trick it can think of, mainly common letter
|
|
replacements like cl -> cul, vul -> vol, ads -> ass, inp -> imp, etc.
|
|
Together these tricks are useful, but may give false positives (>10%).
|
|
They provide for recognized variants in classical spelling. Most of
|
|
the texts with which this program will be used have been well edited
|
|
and standardized in spelling. Now, moreover, the dictionary is being
|
|
populated to such a state that the hit rate on tricks has fallen to a
|
|
low level. It is very seldom productive, and it is always expensive.
|
|
The only excuse for keeping it as default is that now the dictionary
|
|
is quite extensive and misses are rare. Default is now Y(es).
|
|
|
|
DO_DICTIONARY_FORMS_HELP
|
|
This option instructs the program to output a line with the forms
|
|
normally associated with a dictionary entry (NOM and GEN of a noun,
|
|
the four principal parts of a verb, M-F-N NOM of an adjective, ...).
|
|
This occurs when there is other output (i.e., not with UNKNOWNS_ONLY).
|
|
The default choice is N(o), but it can be turned on with a Y(es).
|
|
|
|
SHOW_AGE_HELP
|
|
This option causes a flag, like '<Late>' to appear for inflection or
|
|
form in the output. The AGE indicates when this word/inflection was
|
|
in use, at least from indications in dictionary citations. It is
|
|
just an indication, not controlling, useful when there are choices.
|
|
No indication means that it is common throughout all periods.
|
|
The default choice is Y(es), but it can be turned off with a N(o).
|
|
|
|
SHOW_FREQUENCY_HELP
|
|
This option causes a flag, like '<rare>' to appear for inflection or
|
|
form in the output. The FREQ indicates the relative usage of the
word or inflection, from indications in dictionary citations. It is
|
|
just an indication, not controlling, useful when there are choices.
|
|
No indication means that it is common throughout all periods.
|
|
The default choice is Y(es), but it can be turned off with a N(o).
|
|
|
|
DO_EXAMPLES_HELP
|
|
This option instructs the program to provide examples of usage of the
|
|
cases/tenses/etc. that were constructed. The default choice is N(o).
|
|
This produces lengthy output and is turned on with the choice Y(es).
|
|
|
|
DO_ONLY_MEANINGS_HELP
|
|
This option instructs the program to only output the MEANING for a
|
|
word, and omit the inflection details. This is primarily used in
|
|
analyzing new dictionary material, comparing with the existing.
|
|
However it may be of use for the translator who knows most all of
|
|
the words and just needs a little reminder for a few.
|
|
The default choice is N(o), but it can be turned on with a Y(es).
|
|
|
|
DO_STEMS_FOR_UNKNOWN_HELP
|
|
This option instructs the program, when it is unable to find a proper
|
|
match in the dictionary, and after various prefixes and suffixes, to
|
|
list the dictionary entries around the unknown. This will likely
|
|
catch a substantive for which only the ADJ stem appears in dictionary,
|
|
an ADJ for which there is only a N stem, etc. This option should
|
|
probably only be used with individual UNKNOWN words, and off-line
|
|
from full translations, therefore the default choice is N(o).
|
|
This processing can be turned on with the choice of Y(es).
|
|
|
|
SAVE_PARAMETERS_HELP
|
|
This option instructs the program to save the current parameters, as
|
|
just established by the user, in a file WORD.MOD. If such a file
|
|
exists, the program will load those parameters at the start. If no
|
|
such file can be found in the current subdirectory, the program will
|
|
start with a default set of parameters. Since this parameter file is
|
|
human-readable ASCII, it may also be created with a text editor. If
|
|
the file found has been improperly created, is in the wrong format, or
|
|
otherwise uninterpretable by the program, it will be ignored and the
|
|
default parameters used, until a proper parameter file is written by
|
|
the program. Since one may want to make temporary changes during a
|
|
run, but revert to the usual set, the default is N(o).
|
|
|
|
</TT></PRE>
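<P>
The letter replacements named in DO_TRICKS_HELP can be pictured as a table of substitutions tried one at a time on an unknown word. The sketch below assumes the input contains the left-hand spelling and is rewritten to the right-hand one; the one-word dictionary and the function name are hypothetical, and the program's real list of tricks is longer.

```python
# Replacements quoted in the help text: (spelling in the input, spelling tried).
TRICKS = [("cl", "cul"), ("vul", "vol"), ("ads", "ass"), ("inp", "imp")]

# Hypothetical stand-in for a full dictionary search.
KNOWN = {"impono"}


def try_tricks(word):
    """Apply each replacement at each position where it fits and look up the result."""
    hits = []
    for old, new in TRICKS:
        start = word.find(old)
        while start != -1:
            candidate = word[:start] + new + word[start + len(old):]
            if candidate in KNOWN:
                hits.append((f"{old} -> {new}", candidate))
            start = word.find(old, start + 1)
    return hits


print(try_tricks("inpono"))
```

Each candidate costs a complete search, which is one reason the help text calls the tricks expensive.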
|
|
|
|
<P>
|
|
There is also a set of DEVELOPER_PARAMETERS that are unlikely to be of
|
|
interest to the normal user. Some of these facilities may be disconnected
|
|
or not work for other reasons. Additional parameters may be included
|
|
without notice or documentation. The HELP may be the most reliable
|
|
source of information. These parameters are mostly for the use in the
|
|
development process. These may be changed or examined through a similar
|
|
change procedure by inputting a '!' [exclamation sign] character, followed
|
|
by a return.
|
|
|
|
<PRE><TT>
|
|
HAVE_STATISTICS_FILE_HELP
|
|
This option instructs the program to create a file which can hold
|
|
certain statistical information about the process. The file is
|
|
overwritten for each new invocation of the program, so old data must be
|
|
explicitly saved if it is to be retained. The statistics are in TEXT
|
|
format. The statistics file is named WORD.STA
|
|
This information is only of development use, so the default is N(o).
|
|
|
|
WRITE_STATISTICS_FILE_HELP
|
|
This option instructs the program, with HAVE_STATISTICS_FILE, to put
|
|
derived statistics in a file named WORD.STA
|
|
This option may be turned on and off during running of the program,
|
|
thereby capturing only certain desired results. The file is reset at
|
|
each invocation of the program, if the HAVE_STATISTICS_FILE is set.
|
|
If the option HAVE_STATISTICS_FILE is off, the user will not be given
|
|
a chance to turn this one on. Default is N(o).
|
|
|
|
SHOW_DICTIONARY_HELP
|
|
This option causes a flag, like 'GEN>' to be put before the meaning
|
|
in the output. While this is useful for certain development purposes,
|
|
it forces off a few characters from the meaning, and is really of no
|
|
interest to most users.
|
|
The default choice is N(o), but it can be turned on with a Y(es).
|
|
|
|
SHOW_DICTIONARY_LINE_HELP
|
|
This option causes the number of the dictionary line for the current
|
|
meaning to be output. This is of use to no one but the dictionary
|
|
maintainer. The default choice is N(o). It is activated by Y(es).
|
|
|
|
SHOW_DICTIONARY_CODES_HELP
|
|
This option causes the codes for the dictionary entry for the current
|
|
meaning to be output. This may not be useful to any but the most
|
|
involved user. The default choice is N(o). It is activated by Y(es).
|
|
|
|
DO_PEARSE_CODES_HELP
|
|
This option causes special codes to be output flagging the different
|
|
kinds of output lines. 01 for forms, 02 for dictionary forms, and
|
|
03 for meaning. The default choice is N(o). It is activated by Y(es).
|
|
There are no Pearse codes in English mode.
|
|
|
|
DO_ONLY_INITIAL_WORD_HELP
|
|
This option instructs the program to only analyze the initial word on
|
|
each line submitted. This is a tool for checking and integrating new
|
|
dictionary input, and will be of no interest to the general user.
|
|
The default choice is N(o), but it can be turned on with a Y(es).
|
|
|
|
FOR_WORD_LIST_CHECK_HELP
|
|
This option works in conjunction with DO_ONLY_INITIAL_WORD to allow
|
|
the processing of scanned dictionaries or text word lists. It accepts
only the forms common in dictionary entries, like NOM S for N or ADJ,
or PRES ACTIVE IND 1 S for V. It is to be used only with DO_ONLY_INITIAL_WORD.
|
|
The default choice is N(o), but it can be turned on with a Y(es).
|
|
|
|
DO_ONLY_FIXES_HELP
|
|
This option instructs the program to ignore the normal dictionary
|
|
search and to go direct to attach various prefixes and suffixes before
|
|
processing. This is a pure research tool. It allows one to examine
|
|
the coverage of pure stems and dictionary primary compositions.
|
|
This option is only available if DO_FIXES is turned on.
|
|
This is entirely a development and research tool, not to be used in
|
|
conventional translation situations, so the default choice is N(o).
|
|
This processing can be turned on with the choice of Y(es).
|
|
|
|
DO_FIXES_ANYWAY_HELP
|
|
This option instructs the program to do both the normal dictionary
|
|
search and then process for the various prefixes and suffixes too.
|
|
This is a pure research tool allowing one to consider the possibility
|
|
of strange constructions, even in the presence of conventional
|
|
results, e.g., alte => deeply (ADV), but al+t+e => wing+ed (ADJ VOC)
|
|
(If multiple suffixes were supported this could also be wing+ed+ly.)
|
|
This option is only available if DO_FIXES is turned on.
|
|
This is entirely a development and research tool, not to be used in
|
|
conventional translation situations, so the default choice is N(o).
|
|
This processing can be turned on with the choice of Y(es).
|
|
------ PRESENTLY NOT IMPLEMENTED ------
|
|
|
|
USE_PREFIXES_HELP
|
|
This option instructs the program to implement prefixes from ADDONS
|
|
whenever and wherever FIXES are called for. The purpose of this
|
|
option is to allow some flexibility while the program is running to
|
|
select various combinations of fixes, to turn them on and off,
|
|
individually as well as collectively. This is an option usually
|
|
employed by the developer while experimenting with the ADDONS file.
|
|
This option is only effective in connection with DO_FIXES.
|
|
This is primarily a development tool, so the conventional user should
|
|
probably maintain the default choice of Y(es).
|
|
|
|
USE_SUFFIXES_HELP
|
|
This option instructs the program to implement suffixes from ADDONS
|
|
whenever and wherever FIXES are called for. The purpose of this
|
|
option is to allow some flexibility while the program is running to
|
|
select various combinations of fixes, to turn them on and off,
|
|
individually as well as collectively. This is an option usually
|
|
employed by the developer while experimenting with the ADDONS file.
|
|
This option is only effective in connection with DO_FIXES.
|
|
This is primarily a development tool, so the conventional user should
|
|
probably maintain the default choice of Y(es).
|
|
|
|
USE_TACKONS_HELP
|
|
This option instructs the program to implement TACKONS from ADDONS
|
|
whenever and wherever FIXES are called for. The purpose of this
|
|
option is to allow some flexibility while the program is running to
|
|
select various combinations of fixes, to turn them on and off,
|
|
individually as well as collectively. This is an option usually
|
|
employed by the developer while experimenting with the ADDONS file.
|
|
This option is only effective in connection with DO_FIXES.
|
|
This is primarily a development tool, so the conventional user should
|
|
probably maintain the default choice of Y(es).
|
|
|
|
DO_MEDIEVAL_TRICKS_HELP
|
|
This option instructs the program, when it is unable to find a proper
|
|
match in the dictionary, and after various prefixes and suffixes, and
|
|
trying every Classical Latin trick it can think of, to go to a few that
|
|
are usually only found in medieval Latin, replacements of caul -> col,
|
|
st -> est, z -> di, ix -> is, nct -> nt. It also tries some things
|
|
like replacing doubled consonants in classical with a single one.
|
|
Together these tricks are useful, but may give false positives (>20%).
|
|
This option is only available if the general DO_TRICKS is chosen.
|
|
If the text is late or medieval, this option is much more useful than
|
|
tricks for classical. The dictionary can never contain all spelling
|
|
variations found in medieval Latin, but some constructs are common.
|
|
The default choice is N(o), since the results are iffy, medieval only,
|
|
and expensive. This processing is turned on with the choice of Y(es).
|
|
|
|
DO_SYNCOPE_HELP
|
|
This option instructs the program to postulate that syncope of
|
|
perfect stem verbs may have occurred (e.g., aver -> ar in the perfect),
|
|
and to try various possibilities for the insertion of a removed 'v'.
|
|
To do this it has to fully process the modified candidates, which can
|
|
have a considerable impact on the speed of processing a large file.
However, this trick seldom produces a false positive, and syncope is
|
|
very common in Latin (first year texts excepted). Default is Y(es).
|
|
This processing is turned off with the choice of N(o).
|
|
|
|
DO_TWO_WORDS_HELP
|
|
There are some few common Latin expressions that combine two inflected
|
|
words (e.g. respublica, paterfamilias). There are numerous examples
|
|
of numbers composed of two words combined together.
|
|
Sometimes a text or inscription will have words run together.
|
|
When WORDS is unable to reach a satisfactory solution with all other
|
|
tricks, as a last stab it will try to break the input into two words.
|
|
This most often fails. Even if mechanically successful, the result is
|
|
usually false and must be examined by the user. If the result is
|
|
correct, it is probably clear to the user. Otherwise, beware.
|
|
This problem will not occur for a well edited text, such as one will
|
|
find on your Latin exam, but sometimes with raw text.
|
|
Since this is a last chance and infrequent, the default is Y(es).
|
|
This processing is turned off with the choice of N(o).
|
|
|
|
INCLUDE_UNKNOWN_CONTEXT_HELP
|
|
This option instructs the program, when writing to an UNKNOWNS file,
|
|
to put out the whole context of the UNKNOWN (the whole input line on
|
|
which the UNKNOWN was found). This is appropriate for processing
|
|
large text files in which it is expected that there will be relatively
|
|
few UNKNOWNS. The main use at the moment is to provide display
|
|
of the input line on the output file in the case of UNKNOWNS_ONLY.
|
|
|
|
NO_MEANINGS_HELP
|
|
This option instructs the program to omit putting out meanings.
|
|
This is only useful for certain dictionary maintenance procedures.
|
|
The combination not DO_DICTIONARY_FORMS, MEANINGS_ONLY, NO_MEANINGS
|
|
results in no visible output, except spacing lines. Default is N(o).
|
|
|
|
OMIT_ARCHAIC_HELP
|
|
THIS OPTION CAN ONLY BE ACTIVE IF WORDS_MODE(TRIM_OUTPUT) IS SET!
|
|
This option instructs the program to omit inflections and dictionary
|
|
entries with an AGE code of A (Archaic). Archaic results are rarely
|
|
of interest in general use. If there is no other possible form, then
|
|
the Archaic (roughly defined) will be reported. The default is Y(es).
|
|
|
|
OMIT_MEDIEVAL_HELP
|
|
THIS OPTION CAN ONLY BE ACTIVE IF WORDS_MODE(TRIM_OUTPUT) IS SET!
|
|
This option instructs the program to omit inflections and dictionary
|
|
entries with AGE codes of E or later, those not in use in Roman times.
|
|
While later forms and words are a significant application, most users
|
|
will not want them. If there is no other possible form, then the
|
|
Medieval (roughly defined) will be reported. The default is Y(es).
|
|
|
|
OMIT_UNCOMMON_HELP
|
|
THIS OPTION CAN ONLY BE ACTIVE IF WORDS_MODE(TRIM_OUTPUT) IS SET!
|
|
This option instructs the program to omit inflections and dictionary
|
|
entries with FREQ codes indicating that the selection is uncommon.
|
|
While these forms are a significant feature of the program, many users
|
|
will not want them. If there is no other possible form, then the
|
|
uncommon (roughly defined) will be reported. The default is Y(es).
|
|
|
|
DO_I_FOR_J_HELP
|
|
This option instructs the program to modify the output so that the j/J
|
|
is represented as i/I. The consonant i was written as j in cursive in
|
|
Imperial times and called i longa, and often rendered as j in medieval
|
|
times. The capital is usually rendered as I, as in inscriptions.
|
|
If this is NO/FALSE, the output will have the same character as input.
|
|
The program default, and the dictionary convention is to retain the j.
|
|
Reset if this is unsuitable for your application. The default is N(o).
|
|
|
|
DO_U_FOR_V_HELP
|
|
This option instructs the program to modify the output so that the u
|
|
is represented as v. The consonant u was written sometimes as uu.
|
|
The pronunciation was as current w, and important for poetic meter.
|
|
With the printing press came the practice of distinguishing consonant
|
|
u with the character v, and this was common for centuries. The practice of
|
|
using only u has been adopted in some 20th century publications (OLD),
|
|
but it is confusing to many modern readers. The capital is commonly
|
|
V in any case, as it was and is in inscriptions (easier to chisel).
|
|
If this is NO/FALSE, the output will have the same character as input.
|
|
The program default, and the dictionary convention is to retain the v.
|
|
Reset if this is unsuitable for your application. The default is N(o).
|
|
|
|
PAUSE_IN_SCREEN_OUTPUT_HELP
|
|
This option instructs the program to pause in output on the screen
|
|
after about 16 lines so that the user can read the output, otherwise
|
|
it would just scroll off the top. A RETURN/ENTER gives another page.
|
|
If the program is waiting for a return, it cannot take other input.
|
|
This option is active only for keyboard entry or command line input,
|
|
and only when there is no output file. It is moot if only single word
|
|
input or brief output. The default is Y(es).
|
|
|
|
NO_SCREEN_ACTIVITY_HELP
|
|
This option instructs the program not to keep a running screen of the
|
|
input. This is probably only to be used by the developer to calibrate
|
|
run times for large text file input, removing the time necessary to
|
|
write to screen. The default is N(o).
|
|
|
|
UPDATE_LOCAL_DICTIONARY_HELP
|
|
This option instructs the program to invite the user to input a new
|
|
word to the local dictionary on the fly. This is only active if the
|
|
program is not using an (@) input file! If an UNKNOWN is discovered,
|
|
the program asks for STEM, PART, and MEAN, the basic elements of a
|
|
dictionary entry. These are put into the local dictionary right then,
|
|
and are available for the rest of the session, and all later sessions.
|
|
The use of this option requires a detailed knowledge of the structure
|
|
of dictionary entries, and is not for the average user. If the entry
|
|
is not valid, reloading the dictionary will raise an exception, and
|
|
the invalid entry will be rejected, but the program will continue
|
|
without that word. Any invalid entries can be corrected or deleted
|
|
off-line with a text editor on the local dictionary file. If one does
|
|
not want to enter a word when this option is on, a simple RETURN at
|
|
the STEM=> prompt will ignore and continue the program. This option
|
|
is only for very experienced users and should normally be off.
|
|
The default is N(o).
|
|
------ NOT AVAILABLE IN THIS VERSION -------
|
|
|
|
UPDATE_MEANINGS_HELP
|
|
This option instructs the program to invite the user to modify the
|
|
meaning displayed on a word translation. This is only active if the
|
|
program is not using an (@) input file! These changes are put into
|
|
the dictionary right then and permanently, and are available from
|
|
then on, in this session, and all later sessions. Unfortunately,
|
|
these changes will not survive the replacement of the dictionary by a
|
|
new version from the developer. Changes can only be recovered by
|
|
considerable processing by the developer, and should be left there.
|
|
This option is only for experienced users and should remain off.
|
|
The default is N(o).
|
|
------ NOT AVAILABLE IN THIS VERSION -------
|
|
|
|
MINIMIZE_OUTPUT_HELP
|
|
This option instructs the program to minimize the output. This is a
|
|
somewhat flexible term, but the use of this option will probably lead
|
|
to less output. The default is Y(es).
|
|
|
|
SAVE_PARAMETERS_HELP
|
|
This option instructs the program to save the current parameters, as
|
|
just established by the user, in a file WORD.MDV. If such a file
|
|
exists, the program will load those parameters at the start. If no
|
|
such file can be found in the current subdirectory, the program will
|
|
start with a default set of parameters. Since this parameter file is
|
|
human-readable ASCII, it may also be created with a text editor. If
|
|
the file found has been improperly created, is in the wrong format, or
|
|
otherwise uninterpretable by the program, it will be ignored and the
|
|
default parameters used, until a proper parameter file is written by
|
|
the program. Since one may want to make temporary changes during a
|
|
run, but revert to the usual set, the default is N(o).
|
|
</TT></PRE>
|
|
|
|
<A NAME="Special Cases">
|
|
<H4>Special Cases</H4></A>
|
|
<P>
|
|
Some adjectives have no conventional positive forms (either missing or
|
|
undeclined), or the POS forms have more than one COMP/SUPER. In these few
|
|
cases, the individual COMP or SUPER form is entered separately. Since it
|
|
is not directly connected with a POS form, and only the POS forms have
|
|
different numbered declensions, the special form is given a declension of
|
|
(0, 0). An additional consequence is that the dictionary form in output
|
|
is only for the COMP/SUPER, and does not reflect all comparisons.
|
|
|
|
<A NAME="Uniques">
|
|
<H4>Uniques</H4></A>
|
|
<P>
|
|
There are some irregular situations which are not convenient to handle
|
|
through the general algorithms. For these a UNIQUES file and procedure
|
|
was established. The number of these special cases is less than one
|
|
hundred, but may increase as new situations arise, and decrease as
|
|
algorithms provide better coverage. The user will not see much
|
|
difference, except in that no dictionary forms are available for these
|
|
unique words.
|
|
|
|
<A NAME="Tricks">
|
|
<H4>Tricks</H4></A>
|
|
<P>
|
|
There are a number of situations in Latin writing where certain
|
|
modifications or conventions regularly are found. While often found,
|
|
these are not the normal classical forms. If a conventional match is not
|
|
found, the program may be instructed to TRY_TRICKS. Below is a partial
|
|
list of current tricks. The syncopated form of the perfect often drops
|
|
the 'v' and loses the vowel. An initial 'a' followed by a double letter
|
|
often is used for an 'ad' prefix, likewise an initial 'ad' prefix is often
|
|
replaced by an 'a' followed by a double letter. An initial 'i' followed
|
|
by a double letter often is used for an 'in' prefix, likewise an initial
|
|
'in' prefix is often replaced by an 'i' followed by a double letter. A
|
|
leading 'inp' could be an 'imp'. A leading 'obt' could be an 'opt'. An
|
|
initial 'har...' or 'hal...' may be rendered by an 'ar' or 'al', likewise
|
|
the dictionary entry may have 'ar'/'al' and the trial word begin with
|
|
'ha...'. An initial 'c' could be a 'k', or the dictionary entry uses 'c'
|
|
for 'k'. A nonterminal 'ae' is often rendered by an 'e'. An initial 'E'
|
|
can replace an 'Ae'. An 'iis...' beginning some forms of 'eo' may be
|
|
contracted to 'is...'. A nonterminal 'ii' is often replaced by just 'i';
|
|
including 'ji', since in this program and dictionary all 'j' are made 'i'.
|
|
A 'cl' could be a 'cul'. A 'vul' could be a 'vol'. There are many others,
|
|
including a procedure to try to break the input word into two.
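<P>
A few of the substitutions listed above can be sketched in Python as a
candidate generator. This is only an illustration of the idea, not the code
of the program: the function name, the tables, and the order of the attempts
are assumptions, and only a handful of the tricks are shown.

```python
# Leading-string swaps named in the text: (found in input, tried instead).
PREFIX_SWAPS = [("inp", "imp"), ("obt", "opt")]
# Swaps tried at the first place they occur inside the word.
INTERNAL_SWAPS = [("cl", "cul"), ("vul", "vol")]


def trick_candidates(word):
    """Return respelled candidates to run through normal recognition."""
    word = word.lower()
    out = []
    for old, new in PREFIX_SWAPS:
        if word.startswith(old):
            out.append(new + word[len(old):])
    # An initial 'a' or 'i' plus a doubled letter may stand for 'ad' or 'in'.
    if len(word) > 3 and word[0] in "ai" and word[1] == word[2]:
        prefix = "ad" if word[0] == "a" else "in"
        out.append(prefix + word[2:])
    for old, new in INTERNAL_SWAPS:
        if old in word:
            out.append(word.replace(old, new, 1))
    return out
```

Each candidate still has to be checked against the stems and endings; the
generator itself knows nothing about whether a respelling is a real word.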
|
|
<P>
|
|
Various manipulations of 'u' and 'v' are possible: 'v' could be replaced
|
|
by 'u', like the new Oxford Latin Dictionary, leading 'U' could be
|
|
replaced by 'V', checking capitalization, all 'U's could have been
|
|
replaced by 'V', like stone cutting. Previous versions had various
|
|
kludges attempting to calculate the correct interpretation. They were
|
|
surprisingly good, but philosophically baseless and certainly failed in a
|
|
number of cases. The present version simply considers 'u' and 'v' as the
|
|
same letter in parsing the word. However, the dictionary entries make the
|
|
distinction and this is reflected in the output.
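<P>
Treating 'u' and 'v' as one letter while parsing, yet keeping the dictionary
spelling in the output, can be pictured with the following Python sketch.
The names fold, build_index, and lookup are hypothetical and chosen only for
this example; the program itself is organized differently.

```python
# Map 'v' to 'u' so that both spellings compare equal during lookup.
FOLD = str.maketrans("v", "u")


def fold(word):
    """Lower-case the word and merge v into u for comparison only."""
    return word.lower().translate(FOLD)


def build_index(entries):
    """Index dictionary spellings by their folded form."""
    index = {}
    for entry in entries:
        index.setdefault(fold(entry), []).append(entry)
    return index


def lookup(word, index):
    """Return dictionary spellings matching the input, whatever its u/v use."""
    return index.get(fold(word), [])
```

With an index built from "uva", "servus", and "volo", the inputs "VVA",
"seruus", and "uolo" each find their entry, and what comes back is the
dictionary spelling, with its u/v distinction intact.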
|
|
<P>
|
|
Various combinations of these tricks are attempted, and each try that
|
|
results in a possible hit is run against the full dictionary, which can
|
|
make these efforts time consuming. That is a good reason to make the
|
|
dictionary as large as possible, rather than counting on a smaller number
|
|
of roots and doing the maximum word formation.
|
|
<P>
|
|
Finally, while the program could succeed on a word that requires two or
|
|
three of these tricks to work in combination, there are limits. Some
|
|
words for which all the modifications are supported will fail, if there
|
|
are just too many. In fact, it is probably better that that be the case,
|
|
otherwise one will generate too many false positives. Testing so far does
|
|
not seem to show excessive zeal on the part of the program, but the user
|
|
should examine the results, especially when several tricks are involved.
|
|
<P>
|
|
There is a basic conflict here. At the state of the 1.97E dictionary there
|
|
are so few words that both fail the main program and are caught by tricks
|
|
that this option could be defaulted to No. However, one could argue that
|
|
there will be very few occasions for trying TRICKS, so that the cost is
|
|
minimal. Unfortunately the degree of completeness of the dictionary for
|
|
classical Latin does not carry over to medieval Latin. With the hope that
|
|
the program will become more useful in that area, the default has been
|
|
set to Yes, reflecting the philosophy early in the development
|
|
for classical Latin.
|
|
|
|
<A NAME="Trimming of uncommon results">
|
|
<H4>Trimming of uncommon results</H4></A>
|
|
<P>
|
|
Trimming has an impact on output. If TRIM_OUTPUT parameter is set, and
|
|
specific parameters set in the MDEV, the program will deprecate those
|
|
possible forms which come from archaic or medieval (non-classical) stems
|
|
or inflections, also stems or inflections which are relatively uncommon.
|
|
It will report such if no classical/common solutions are found. The
|
|
default is set for this, expecting that most users are students and
|
|
unlikely to encounter rare forms. Other users can set the parameters
|
|
appropriately for their situation.
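<P>
The trimming rule, including the fallback when nothing classical or common
remains, might be sketched as below. This is a simplified illustration. The
record layout, the use of "X" for an unspecified age, and the single
uncommon flag standing in for the FREQ codes are all assumptions made for
the example.

```python
from dataclasses import dataclass


@dataclass
class Parse:
    form: str
    age: str        # one-letter AGE code; "X" is assumed to mean unspecified
    uncommon: bool  # stand-in for a FREQ code judged uncommon


def trim(parses, omit_archaic=True, omit_medieval=True, omit_uncommon=True):
    """Drop deprecated parses, but report them if nothing else survives."""
    def keep(p):
        if omit_archaic and p.age == "A":
            return False
        if omit_medieval and p.age != "X" and p.age >= "E":
            return False
        if omit_uncommon and p.uncommon:
            return False
        return True

    kept = [p for p in parses if keep(p)]
    return kept if kept else list(parses)
```

The last line carries the point made above: an archaic, medieval, or
uncommon form is still reported when it is the only solution found.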
|
|
<P>
|
|
This capability is preliminary. It is just becoming useful in that the
|
|
factors are set for about half the dictionary entries. There are still a
|
|
large number of entries and inflections that are not set and will continue
|
|
to be reported until determination of rarity is made.
|
|
<BR>
|
|
<BR>
|
|
|
|
<A NAME="GUIDING PHILOSOPHY">
|
|
<H3><CENTER>GUIDING PHILOSOPHY</CENTER></H3></A>
|
|
|
|
<A NAME="Purpose">
|
|
<H4>Purpose</H4></A>
|
|
<P>
|
|
The dictionary is intended as a help to someone who knows roughly enough
|
|
Latin for the document under study. It gives the accidence and meanings
|
|
possible for an input Latin word. It is for someone reading Latin text.
|
|
|
|
<P>
|
|
This is a translation dictionary. Mostly it provides individual words in
|
|
English that correspond to, and might be used in a translation of, words
|
|
in Latin text. The program assumes a fair command of English. This is in
|
|
contrast to a conventional same-language desktop dictionary which would
|
|
explain the meanings of words in the same language. The distinction may
|
|
be obvious but it is important. A Latin dictionary in medieval times
|
|
would have explanations in Latin of Latin words.
|
|
<P>
|
|
There are various approaches to the preparation of a dictionary. The most
|
|
scholarly might be to select only proper and correct entries, only correct
|
|
derivations, grammar, and spelling. This would be a dictionary for one
|
|
who wished to write 'correct' Latin. (Correct being defined as the way
|
|
Cicero, or your favorite writer or grammarian, used it.) The current
|
|
project has a different goal. This program is successful if a word found in
|
|
text is given an appropriate meaning, whether or not that word is spelled
|
|
in the generally approved way, or is 'good Latin'. Thus the program
|
|
includes various words and forms that may have been rejected by recent
|
|
scholars, but still appear in some texts. Philosophically, this program
|
|
deals with Latin as it was, not as it should have been. I make no
|
|
corrections to Cicero, which some might have been tempted to do if
|
|
producing an academic dictionary instead of a program. Moreover I make no
|
|
corrections of St Jerome. If your copy of the Vulgate has a particular
|
|
spelling, that may be recognized by the program, either through a TRICK or
|
|
as a dictionary entry that I have generated.
|
|
<P>
|
|
A philosophical difference from many dictionary projects is that this one
|
|
has no firm model of the user or application. It is not limited to
|
|
classical Latin, or to 'good practice', or to common words, or to words
|
|
appearing in certain texts. As a result there will be a lot of chaff in
|
|
the output. Some of this may be trimmed out automatically if desired, but
|
|
it is there and available.
|
|
<P>
|
|
However inadequately, I hope to document decisions that went into the
|
|
arrangement of the program and dictionary. I am surprised that there is
|
|
little or no such information to the user of published dictionaries. If
|
|
others generate similar products, or use the data from this one, they can
|
|
do so in knowledge of how and why processes and forms were constructed.
|
|
<P>
|
|
I make few value judgments and those are mechanical, not scholarly, and
|
|
are documented herein. Nevertheless some may be inappropriate, in spite of
|
|
good intentions.
|
|
<BR>
|
|
|
|
<A NAME="Method">
|
|
|
|
<H4>Method</H4></A>
|
|
<P>
|
|
The program subtracts possible endings from an input word and searches a
|
|
list of stems, trying to make a match. If no exact match is possible, it
|
|
tries various modifications, beginning with prefixes and suffixes, and
|
|
eventually involving various regular spelling variations (or 'tricks')
|
|
common in classical and medieval Latin.
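<P>
The first step, subtracting endings and looking up what is left, can be
sketched in a few lines of Python. The stem and ending tables here are
invented toy data, and the function name parse is hypothetical; the real
tables also carry part of speech, declension, and so on, which a match must
satisfy.

```python
# Toy data for illustration only; not taken from the WORDS files.
STEMS = {"puell": "girl", "am": "love", "reg": "king"}
ENDINGS = ["a", "ae", "am", "arum", "is", "as", "o", "at", "em", "es"]


def parse(word):
    """Return (stem, ending, meaning) for every split whose stem is listed."""
    word = word.lower()
    hits = []
    for ending in ENDINGS:
        if not word.endswith(ending):
            continue
        stem = word[: len(word) - len(ending)]
        if stem in STEMS:
            hits.append((stem, ending, STEMS[stem]))
    return hits
```

An input that yields an empty list here is what would be handed on to the
prefix, suffix, and trick procedures.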
|
|
<P>
|
|
A choice was made that the base was classical Latin as defined by the
|
|
Oxford Latin Dictionary (OLD). Their primary time period is
|
|
arbitrarily/roughly 100 BC to 100 AD.
|
|
<P>
|
|
The classical form of words is taken as the base. Modifications are made in
|
|
such a way as to correct to this base. Further additions to local
|
|
dictionaries should keep this in mind. Modifications are made to the
|
|
input words, not to the dictionary stems. It could be done the other way,
|
|
but the present situation was initially much easier. There are some
|
|
consequences of this approach. For instance, it is easy to remove an 'h'
|
|
from an input word to match with a stem. It is much more difficult (but
|
|
not impossible) to add 'h' in all possible positions to check against
|
|
stems.
|
|
<P>
|
|
It would be possible to match most words with a relatively smaller list of
|
|
stems (or roots) and generous application of word construction. This
|
|
approach is not followed. One difficulty is that while words may be
|
|
constructed correctly, and the underlying meaning to be found from this
|
|
construction, the common usage may be obscured by a formal interpretation
|
|
of the parts. In practice this occurs in 20-40% of the cases. This
|
|
method is still very useful in approaching a word for which there has been
|
|
no dictionary interpretation, but it puts a considerable burden on the
|
|
normal user. Further, in about 10% of constructions, the result is just
|
|
wrong.
|
|
<P>
|
|
In normal usage, if the program finds a simple match, it does not go
|
|
further and consider what constructed words might also be valid. (One can
|
|
override and force prefix/suffix construction with a switch, but one might
|
|
not want to force all possible tricks.)
|
|
<P>
|
|
For instance, if there is an adjective that matches, a corresponding
|
|
identically spelled, logically valid noun will not be reported unless it
|
|
is explicitly found in the dictionary, even though it could be constructed
|
|
or inferred from the adjective or constructed with a suffix from a verb in
|
|
the dictionary.
|
|
<P>
|
|
An exception to this is that enclitics (e.g., -que) are always considered.
|
|
Coloque can be a verb or collo-que. The latter is in Virgil and should
|
|
not be omitted. Verb syncope is also favored. In the vast majority of
|
|
cases, if there is a possible syncope it is the correct parse. This is
|
|
given preference over word construction with suffix. Audii is syncope of
|
|
audivi, but it could also be aud-i-i. The latter is considered very
|
|
unlikely.
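<P>
That an enclitic reading is offered alongside a plain match, rather than
instead of it, can be illustrated as follows. This is a sketch under stated
assumptions: only -que is named above, so -ne and -ve are added here as
other familiar enclitics, and the function name and word set are invented
for the example.

```python
# -que is from the text; -ne and -ve are added as an assumption.
ENCLITICS = ("que", "ne", "ve")


def readings(word, known_words):
    """Return (base, enclitic) pairs; the enclitic is '' for a plain match."""
    word = word.lower()
    found = []
    if word in known_words:
        found.append((word, ""))
    for tail in ENCLITICS:
        base = word[: -len(tail)]
        if word.endswith(tail) and base in known_words:
            found.append((base, tail))
    return found
```

With "quoque" and "quo" both known, the input "quoque" gives two readings,
the whole word and quo + que, and it is left to the reader to choose.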
|
|
<P>
|
|
There are a large number of paths and possibilities. Choices have been
|
|
made in the code that result in the exclusion of some. It is hoped that
|
|
they were the best choices. The method was constructed by taking a number
|
|
of primary procedures and combining/assembling them in such a way as to
|
|
give reasonable parses for a number of test cases. Basically, this is
|
|
hacking, but it might be considered an empirical starting point from
|
|
which one could construct a logical rationale.
|
|
<P>
|
|
Therefore, the philosophy is to populate the stem list as densely as
|
|
possible. Even easily resolved differences are included redundantly
|
|
(adligo as well as alligo; ad- accounts for most of the duplicates). The advantage is
|
|
that while regular single-letter modifications are fairly easy, and two
|
|
letter differences are possible (but more expensive), further deviations
|
|
are problematical. The better populated the stem list, the better the
|
|
chance of a result.
|
|
<P>
|
|
Even in easy cases the overpopulation is helpful. Antebasis is easily
|
|
parsed as ante-basis ('pedestal before', which is reasonable), but
|
|
inclusion as a separate word allows the additional information that it is
|
|
the hindmost pillar of the pedestal of a ballista.
|
|
<P>
|
|
The stem list is also populated with variants suggested by different
|
|
sources. The problem is that the remains of classical Latin have gone
|
|
through many monks along the way. These copyists may have made simple
|
|
mistakes (typos!), or have made what they thought were proper corrections
|
|
(spell checkers!). And twenty centuries later scholars work hard to
|
|
reassemble the best Latin to present in the dictionary. But a particular
|
|
document in the form presented to the reader may have a variety of
|
|
spellings for exactly the same word in the same referenced passage
|
|
(Pliny's Natural History is often subject to this problem). (It may even
|
|
be that modern texts and dictionaries have misprints!) All forms found in
|
|
various dictionaries can be included, with the exception of those
|
|
explicitly labeled 'misread' (and the argument probably could mandate
|
|
their inclusion also). However, a single example of a variant in one case
|
|
will not be included as a dictionary entry. If such a word is
|
|
sufficiently important, if it is used frequently or by several authors, it
|
|
will be entered as a UNIQUE.
|
|
<P>
|
|
Lewis and Short seem to be more willing than the more recent Oxford Latin
|
|
Dictionary to raise a few examples of variation to an entry (at least an
|
|
alternate). Generally, I make an entry if some dictionary does so. But
|
|
within an entry I generate additional possible stems not noted elsewhere,
|
|
e.g., I expand first conjugation verbs with '-av' perfect stems, even
|
|
though no example exists in classical Latin. This is often the practice
|
|
in other dictionaries also.
|
|
|
|
<P>
|
|
Verb parts omitted from source dictionaries are mechanically added where it is clear,
|
|
(e.g., where the base verb is documented, but parts are omitted in compounds).
|
|
Whether Cicero used them or not, some later text might.
|
|
|
|
<P>
|
|
In some cases I also have expanded adjectives and adverbs to include comparative
|
|
and superlative stems where they seem reasonable or have corresponding English
|
|
instances, even when there is no specific dictionary citation.
|
|
This effort was motivated primarily by finding examples of such comparisons
|
|
in processing of large amounts of text beyond the classical
|
|
works upon which authoritative dictionaries are based, but even classical
|
|
works yielded examples. The point is that, while these forms would usually be
|
|
caught by the word formation (prefix/suffix) process in the program,
|
|
the process is limited to how many operations can be done serially.
|
|
Having more/expanded stems allows another level of word modification to be
|
|
implemented.
|
|
|
|
<P>Adjectives are extrapolated to COMP and SUPER where it makes sense
|
|
(when those meanings are reasonable, and in many cases they are not)
|
|
even if the source dictionary only lists POS.
|
|
They are fully expanded, especially when the source lists a COMP but no SUPER.
|
|
|
|
<P>
|
|
Perhaps a bit out of context, consider the common question of SECLORUM in
|
|
the Great Seal of the USA. This pure word is not in any dictionary I know of,
|
|
not the OLD or L+S. A simple trick gives seculorum (seculum = world),
|
|
but the favored translation is from the twice modified saeculorum (saeculum = age),
|
|
which would not be found by a minimalistic system.
|
|
<P>
|
|
It is often the practice in paper dictionaries to double up on an entry
|
|
that may be either adjective or noun, usually by leading with the
|
|
adjective and mentioning its use as a noun. A much larger set of
|
|
adjective/noun pairs is favored with separate entries. It is the
|
|
philosophy of this program to make separate entries whenever there is an
|
|
example in any reference dictionary. This might facilitate the task of a
|
|
larger translation program which would handle phrases or sentences.
|
|
However there has been no effort to explicitly generate such pair
|
|
expansion if there is no precedent, and the user must still recognize the
|
|
possibility of unexpanded multiple possibilities for substantives.
|
|
<P>
|
|
An argument against a large stem list is that it increases the storage
|
|
required (but this is extremely modest by current standards) and increases
|
|
processing time for search of the stems (this is far offset by the
|
|
processing which would be required to construct or analyze words working
|
|
from a smaller stem list).
|
|
<P>
|
|
A significant objection is that artificially generated stems may conflict with
|
|
real/common ones and produce false output confusing to the user. A certain
|
|
amount of this is eliminated by trimming the output to emphasize the most
|
|
probable results, but it is still a problem.
|
|
<P>
|
|
Perhaps a counterexample would be an inferred fourth stem to no/nare (swim).
|
|
Natus conflicts with the fourth stem of nascor (be produced/born) and the
|
|
nouns and adjectives stemming from it. The nare natus does not appear in
|
|
dictionaries, nor does it occur in compounds of nare, so it has been omitted
|
|
from the WORDS dictionary.
|
|
<P>
|
|
Additional parts of verbs are included (first conjugation is easily filled
|
|
out, even eccentric verbs if they are compounds of known parts), although
|
|
they may not have been found in any well known texts. Cases can be
|
|
logically constructed that are 'missing' in classical Latin. Verbs with
|
|
prefix can be expanded when the base is known. That a form has not been
|
|
found in surviving copies of classical texts does not mean that it was
|
|
not on the lips of every centurion and his girl friend, or that it might
|
|
not find its way into medieval texts.
|
|
<P>
|
|
It may be argued in some cases that forms are missing because their
|
|
pronunciation would be awkward. This may well be true when Cicero is the
|
|
arbiter, but others may not be so elegant. Moreover, much of the texts
|
|
are represented by medieval documents, Latin that was written but may not
|
|
have been spoken, so the problem did not arise.
|
|
However, I might be willing to accept this argument for considering carefully
|
|
some perfect stems of first conjugation verbs which otherwise would end
|
|
in -avav. In the end, the only one I found that I could not support
|
|
was lavo (wash), and its compounds, for which the perfect is lavi.
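<P>
The mechanical filling out of a first conjugation verb, with the -avav
caution just described, might look like the sketch below. It is purely
illustrative: the function name and the dictionary-of-stems layout are
assumptions, and a perfect stem that would end in -avav is simply left
empty for hand review rather than decided by rule.

```python
def fill_first_conjugation(present_stem):
    """Infer regular first conjugation stems from the present stem."""
    perfect = present_stem + "av"
    if perfect.endswith("avav"):
        # e.g. lavo: an -avav perfect is not generated mechanically.
        perfect = None
    return {
        "present": present_stem,
        "perfect": perfect,
        "participle": present_stem + "at",
    }
```

For the stem "am" this gives amav and amat; for "lav" the perfect is left
unset, matching the case of lavo, whose perfect is lavi.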
|
|
<P>
|
|
In some cases there are good reasons not to do the mathematical expansion,
|
|
and these are pointedly avoided. There is no mechanical generation of,
|
|
for instance, conl- words for every coll- word, unless there is some
|
|
citation or reasonable rationale. They may be paired in almost every
|
|
case, but, for instance, collis and collyra are not. However, forms that
|
|
are mentioned in dictionaries explicitly, or implicitly by being derived
|
|
from words having variant forms, are included in order to reduce the
|
|
dependence on 'tricks'. OLD has a conp- for almost every comp- (except
|
|
derivatives from como). Rare exceptions seem to be rare words for which
|
|
few examples (or only one) exist. Even in some of these cases, OLD
|
|
(mechanically?) gives two forms. L+S follows the same pattern, except for
|
|
words of late Latin (which would not be found in OLD). It is presumed
|
|
that the general practice in later times was always to use comp-, and the
|
|
program dictionary follows that. There are many acc-/adc- pairs, but OLD
|
|
has a fair number of acc- words without mention of a corresponding adc-,
|
|
and so the possible generation of these words has been resisted. If an
|
|
example turns up in text, the appropriate trick procedure should suffice.
|
|
<P>
|
|
One suspects that some amount of analytical expansion is present even in
|
|
the best dictionaries. Otherwise how can one explain four alternate
|
|
spellings for a word which apparently only appears in citation as a single
|
|
inscription.
|
|
<P>
|
|
In some few cases I have inferred a declension to certain very obscure Greek words
|
|
which other dictionaries have treated as indeclinable
|
|
(having only a single classical example of its use).
|
|
My argument is that some later writer, using this word, might attempt
|
|
to decline it in a conventional manner,
|
|
no matter what Vitruvius thought. I have indicated the indecl. option in the meaning.
|
|
<P>
|
|
Adjectives from participles are included if an entry is found in some
|
|
reference dictionary. In some cases the adjective has a special meaning
|
|
not obvious from the verb. The program will return both the adjective and
|
|
the participle with its verb meaning. The user should give some
|
|
additional consideration to the adjective meaning in this case. If the
|
|
adjective is marked rare while the verb is common, it is likely there is
|
|
reference to a special meaning.
|
|
<P>
|
|
Tricks are expensive in processing time. Each possible modification is
|
|
made, then the resulting word goes through the full recognition process.
|
|
If it passes, that is reported as the answer. If it fails, another trick
|
|
is tried. This is effective if very few words get this far. It is
|
|
expected that application of single tricks will solve most of the
|
|
resolvable difficulties. It would be impractical to mechanically apply
|
|
several tricks in series to a word. A large stem population reduces the
|
|
likelihood of multiple tricks being required. If the dictionary is heavily and
|
|
redundantly populated, tricks are rarely necessary (and therefore not an
|
|
overall processing burden) and largely successful (if the input word is a
|
|
valid, but unusual, variant/construction).
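<P>
The cost argument above rests on the shape of the loop: one trick at a time,
each followed by a full recognition pass, stopping at the first success. A
minimal Python sketch of that loop follows. The recognizer and the trick
list are toy stand-ins invented for the example, not the procedures of the
program.

```python
def recognize_with_tricks(word, recognize, tricks):
    """Try the plain word first, then one trick at a time.

    Returns (result, trick_name); trick_name is None when no trick was
    needed or when nothing matched at all.
    """
    result = recognize(word)
    if result:
        return result, None
    for name, trick in tricks:
        candidate = trick(word)
        if candidate == word:
            continue  # trick does not apply, so skip the costly lookup
        result = recognize(candidate)  # full recognition pass per candidate
        if result:
            return result, name
    return [], None


# Toy stand-ins for the real recognizer and trick list.
LEXICON = {"impono": ["impose"], "periculum": ["danger"]}


def toy_recognize(word):
    return LEXICON.get(word, [])


TOY_TRICKS = [
    ("inp -> imp", lambda w: "imp" + w[3:] if w.startswith("inp") else w),
    ("cl -> cul", lambda w: w.replace("cl", "cul", 1)),
]
```

A word already in the lexicon never enters the loop at all, which is why a
densely populated stem list keeps the overall burden of tricks small.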
|
|
<P>
|
|
Further, a conventional dictionary, especially one that wishes to set a
|
|
standard for proper language, excludes words that may not meet criteria of
|
|
propriety, slang, misspellings, etc. This may place the onus on the
|
|
reader to convert words. A computer dictionary ought to relieve the
|
|
reader as much as possible. The present program may be a far way from
|
|
complete, but its goal is to strive for that.
|
|
|
|
<A NAME="Word Meanings">
|
|
<H4>Word Meanings</H4></A>
|
|
<P>
|
|
The meanings listed are generally those in the literature/dictionaries.
|
|
In the case of common words, there is general agreement among authors.
|
|
Some uncommon words display convoluted interpretations.
|
|
<P>
|
|
Generally, the meaning is given for the base word, as is usual for
|
|
dictionaries. For the verb, it will be a present meaning, even when the
|
|
tense input is perfect. For an adjective, the positive meaning is given,
|
|
even if a comparative or superlative form is shown. This is also so when
|
|
a word is constructed with a suffix, thus an adverb constructed from its
|
|
adjective will show the base adjective meaning and an indication of how to
|
|
make the adverb in English.
|
|
<P>
|
|
For the level of usage for this program, and for convenience in coding,
|
|
the meaning field has been fixed at 80 characters. It is possible to have
|
|
multiple 80 character lines for an entry, but this is only necessary for the
|
|
most common words. In order to conserve space, extraneous helpers like
|
|
'a', 'the', 'to', which sometimes appear in dictionary definitions, are
|
|
generally omitted. The solidus ('/') is used both to separate equivalent
|
|
English meanings and to conserve space.
|
|
<P>
|
|
I have taken it upon myself to add some interpretations and synonyms, and
|
|
propose common usage for otherwise complex descriptive definitions. The
|
|
idea is to prompt the reader, expecting that the text may not be that from
|
|
which some dictionary copied the meaning (from some 18th century
|
|
translator!).
|
|
|
|
<P>
|
|
In the meanings I only use words of which I know the meaning.
|
|
I find that in some cases the Oxford Latin Dictionary uses English
|
|
that is not in the Oxford English Dictionary.
|
|
|
|
<P>
|
|
Where available, the Linnean or 'scientific Latin' name is given in
|
|
parentheses, mostly for plants. This is not a classical Latin name, but a
|
|
modern designation. Similarity of this designation to some Latin word may
|
|
not be historically significant.
|
|
<P>
|
|
The spelling of the English meanings is US (plow not plough, color not
|
|
colour, and English corn is rendered as grain or wheat), in spite of the
|
|
fact that most of the Latin dictionaries that I have are British and use
|
|
British spelling. The reason for this is (besides uniformity in the
|
|
program) that there is much computer processing and checking of the
|
|
dictionary data, including spell-checking of the English. (This is not to
|
|
say that everything is correct, but it is much better than it would be
|
|
without the computer checking.) All my programs speak US English, so I can
|
|
count on it. Only some are available in UK English, and I do not have all
|
|
of those versions.
|
|
<P>
|
|
Latin dictionaries seem to be locked into the 19th century. The
|
|
English terms seem stilted, even by current British usage. This is
|
|
probably because much work in translation was started then and later work
|
|
tended to copy from the previous dictionaries. While this dictionary has
|
|
done some modernization, some of the previous obscurities have been
|
|
preserved. This was done in order that certain machine processes could
|
|
compare the results of automatic translation with existing published work.
|
|
|
|
<P>
|
|
In addition, I have given US meanings to some terms that seem to be
|
|
literally translated from the Latin (or German!) (a person who
|
|
steals/drives off cattle is a rustler in the US).
|
|
<P>
|
|
Most dictionaries have an etymological approach; they are driven by the
|
|
derivation of words to distinguish with separate entries words that may be
|
|
identical in spelling but of different derivations.  But they can lump
|
|
entirely different, even contradictory, meanings in a single entry if
|
|
there is some common derivation. Philosophically, this dictionary is
|
|
usually not sensitive to derivations, but sometimes supports multiple
|
|
entries for vastly different meanings, application areas, or eras. <BR>
|
|
|
|
<P>
|
|
In a very small number of cases a source, such as OLD, will have an entry for which no English meaning is provided.
|
|
Instead, a few words of Latin text containing the word are given.
|
|
If they cannot figure it out, I certainly cannot.
|
|
Such a source entry is usually omitted from WORDS.
|
|
|
|
|
|
<A NAME="Proper Names">
|
|
<H4>Proper Names</H4></A>
|
|
<P>
|
|
Only a very few proper names are included, many just for test purposes,
|
|
others that users have requested. The number of proper names is almost
|
|
limitless but very few are applicable to a particular document, and if it
|
|
is an obscure document it is unlikely that the names would be found in any
|
|
dictionary.
|
|
<P>
|
|
Meaning for proper names may cite a likely example of a person with that
|
|
name. This is just an example; there are lots of others with that name.
|
|
<P>
|
|
There is a switch (defaulted to Yes) that allows the program to assume
|
|
that any capitalized unknown word is a proper name, and to ignore it.
|
|
Also, one can make up a local dictionary of names for one's particular
|
|
application.
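<P>
The effect of that switch can be pictured with a short sketch. This is illustrative Python, not the program's code; the function name and the parameter name <TT>ignore_unknown_names</TT> are assumptions, as is the idea of a local name list passed in as a set.

```python
def classify_unknown(word, ignore_unknown_names=True, local_names=frozenset()):
    """Decide what to report for a word the dictionary did not recognize."""
    if word in local_names:
        return "name (local dictionary)"
    if ignore_unknown_names and word[:1].isupper():
        # A capitalized unknown word is assumed to be a proper name and skipped.
        return "assumed proper name, ignored"
    return "unknown"
```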
|
|
|
|
<A NAME="Letter Conventions (u/v, i/j, w)">
|
|
<H4>Letter Conventions (u/v, i/j, w)</H4></A>
|
|
|
|
<H5>U and/or V</H5>
|
|
<P>
|
|
Strictly speaking, Latin did not have a V, just a consonant U, or a U
|
|
character that was easier in capitals (the way Latin was written by the
|
|
Romans) to write or chisel in stone as V. However, many modern texts and
|
|
dictionaries (with the important exception of the OLD) make the
|
|
distinction with two characters (u and v). It appeared most appropriate
|
|
in a computer context (never destroy information) to make the distinction
|
|
and follow the common practice. So all dictionary entries maintain the
|
|
V/v.  However, an input word following the U convention will be found.  In
|
|
an earlier version, an algorithm was kludged to convert where necessary.
|
|
While this worked in most cases, there were difficulties. The present
|
|
system processes the dictionary and the input word as though U and V were
|
|
the same letter, although the basic dictionary maintains the distinction
|
|
and the output reflects this. There is no need for the user to
|
|
set modes for this process.
|
|
|
|
<H5>I and/or J</H5>
|
|
<P>
|
|
A similar situation arises with I, and its consonant form, J. In this
|
|
instance, the common practice is to use only I, but there are many
|
|
counter-examples, in both texts and dictionaries.  (Lewis+Short uses J, but
|
|
OLD does not.)  Because of common practice, the program started out as a
|
|
pure-I dictionary with conversion of J-to-I on input. It remained that
|
|
way through many versions, in spite of the logical inconsistency with U-V.
|
|
The technique worked perfectly, but eventually the aesthetic of
|
|
consistency won out and the U/V technique described above was extended to
|
|
I/J.
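<P>
The idea of treating U and V (and likewise I and J) as the same letter for matching, while the dictionary keeps the distinction for output, might be sketched like this. It is a Python illustration under simplifying assumptions: it matches whole words rather than stems and endings, and the names <TT>fold</TT>, <TT>build_index</TT> and <TT>lookup</TT> are invented for the example.

```python
# Fold v to u and j to i for comparison only; stored spellings are left untouched.
_FOLD = str.maketrans({"v": "u", "V": "U", "j": "i", "J": "I"})

def fold(word):
    """Return the comparison key for a word."""
    return word.translate(_FOLD).lower()

def build_index(entries):
    """Map each folded key to the dictionary spellings that share it."""
    index = {}
    for entry in entries:
        index.setdefault(fold(entry), []).append(entry)
    return index

def lookup(index, word):
    """Find dictionary spellings for an input word in either letter convention."""
    return index.get(fold(word), [])

# Assumed sample entries; the stored forms keep v and j.
INDEX = build_index(["video", "iuvenis", "jam"])
```

<P>
With this arrangement <TT>lookup(INDEX, "uideo")</TT> finds the stored form <TT>video</TT>, so the user sets no mode and the output still shows the dictionary's own spelling.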
|
|
|
|
<H5>W</H5>
|
|
<P>
|
|
While the letter W does not exist in classical Latin,
|
|
there are examples of W in medieval Latin. I have not directly
|
|
faced this, and have few words in the dictionary yet with W. The W
|
|
problem is not analogous to U/V. While W sometimes could correspond to V
|
|
or UU, in most cases it is a valid letter, reflecting a Germanic origin of
|
|
the word. It will be treated as a real letter, and tricks employed as useful.
|
|
|
|
<A NAME="DICTIONARY">
|
|
<H3><CENTER>DICTIONARY</CENTER>
|
|
</H3></A> <BR>
|
|
|
|
|
|
<A NAME="Dictionary Codes">
|
|
<H4>Dictionary Codes</H4></A>
|
|
<P>
|
|
Several codes are associated with each dictionary entry (e.g., AGE,
|
|
AREA, GEO, FREQ, SOURCE). Initially these were provided against the possibility of
|
|
the program using them to make a better interpretation; however,
|
|
this additional information may be of some help to the reader.
|
|
It is carried in codes because it is not available to the program in any
|
|
other way. Other codes, like the KIND code for nouns, may be
|
|
used, others may not. The program is still in development and these are
|
|
put in to experiment with a possible capability. Later versions may use
|
|
them, omit them, or provide others.
|
|
<P>
|
|
The program covers a combination of time periods and application areas.
|
|
This is certainly not the way in which dictionaries are usually prepared.
|
|
Usually there is a clear limit to the time or area of coverage, and with
|
|
good reason. A computer dictionary may have capabilities that mitigate
|
|
those reasons. Time or area can be coded into each entry, so that one
|
|
could return only classical words, even though matching medieval entries
|
|
existed. (The program has that capability now, but it is not yet clear
|
|
how to apply it.)
|
|
<P>
|
|
There is some measure of period and frequency that can be used to
|
|
discriminate between identical forms, but if there is only one possible
|
|
match to an input word, it will be displayed no matter its era or rarity.
|
|
The user can choose to display age and frequency warnings associated with
|
|
stems and meanings, but the present default is not to, although inflections
|
|
are so identified by default.
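<P>
The rule described here (discriminate between several matches by period, but always show a lone match) can be sketched in Python. This is an illustration only; the <TT>Match</TT> record, the function name, and the fallback of keeping every match when none has a wanted AGE are assumptions, not a description of the program's code.

```python
from collections import namedtuple

# Hypothetical record for one dictionary match.
Match = namedtuple("Match", ["meaning", "age", "freq"])

def select_matches(matches, wanted_ages=("X", "C")):
    """Prefer matches from the wanted periods; never hide the only match."""
    matches = list(matches)
    if len(matches) <= 1:
        return matches  # a single match is shown whatever its era or rarity
    kept = [m for m in matches if m.age in wanted_ages]
    # Assumed fallback: if nothing fits the wanted periods, show everything.
    return kept if kept else matches
```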
|
|
<P>
|
|
So far these codes have not been of much use, especially since the only
|
|
significant exercises have been with classical Latin. Other situations
|
|
may change this. Perhaps the only impact now is for those words which
|
|
have different meanings in different applications or periods. For these
|
|
the warning may be useful. Otherwise, if there is only one interpretation
|
|
for a word, that is given.
|
|
<P>
|
|
Rare and age specific inflection forms are also displayed, but there is a
|
|
warning associated with each such. <A NAME="AGE">
|
|
|
|
<H5>AGE</H5>
|
|
<P>
|
|
The designation of time period is very rough. It is presently based on
|
|
dictionary information. If the quotes cited are in the 4th century, and
|
|
none earlier, then the word is assumed to be late Latin, and one might
|
|
conclude that it was not current earlier. One flaw in this argument could
|
|
be that the citation given was just the best illustration from a large
|
|
number covering a wide period. On the other hand, the word could have
|
|
been well known in classical times but did not appear in any surviving
|
|
classical writings. In such a case, it is reasonable to warn the reader
|
|
of Cicero that this is not likely the correct interpretation for his
|
|
example. This capability is still developmental, and its usefulness is
|
|
still an open question.
|
|
<P>
|
|
If there is a classical citation, the word could be designated as
|
|
classical, but unless there is some reason to conclude otherwise, it is
|
|
expected that classical words are valid for use in all periods (X) and are
|
|
universal for well considered (published) Latin.
|
|
<P>
|
|
A designation of Early (B) means that there are no classical citations,
|
|
except for poetry, in which the poet is invoking the past (or just
|
|
straining for meter). Obsolete words occur similarly in English
|
|
literature and poetry.
|
|
<P>
|
|
Much which is designated late or medieval may be vulgar Latin, in common
|
|
use in classical times but not thought suitable for literary works.
|
|
<P>
|
|
In all periods the target is Latin. Archaic Latin, for purposes of the
|
|
program, is still Latin, not Etruscan or Greek. Medieval Latin is that
|
|
which was written by scholars as the universal Latin, not versions of
|
|
early French or Italian.
|
|
|
|
<PRE><TT> type AGE_TYPE is (
|
|
X, -- -- In use throughout the ages/unknown -- the default
|
|
A, -- archaic -- Very early forms, obsolete by classical times
|
|
B, -- early -- Early Latin, pre-classical, used for effect/poetry
|
|
C, -- classical -- Limited to classical (~150 BC - 200 AD)
|
|
D, -- late -- Late, post-classical (3rd-5th centuries)
|
|
E, -- later -- Latin not in use in Classical times (6-10) Christian
|
|
F, -- medieval -- Medieval (11th-15th centuries)
|
|
G, -- scholar -- Latin post 15th - Scholarly/Scientific (16-18)
|
|
H -- modern -- Coined recently, words for new things (19-20)
|
|
);</TT></PRE>
|
|
<A NAME="AREA">
|
|
|
|
<H5>AREA</H5>
|
|
<P>
|
|
While the reader can make his own interpretation of the area of
|
|
application from the given meaning, there may be some cases in which the
|
|
program can also use that information (which it can only get from a direct
|
|
coding). This has not yet been used in the program, but the possibility
|
|
exists. If the reader were doing a medical text, then higher priority
|
|
should be given to words coded B; if a farming book, then A coded words
|
|
should be given preference.
|
|
<P>
|
|
The area need not apply to all the meanings, just that there is some part
|
|
of the meaning that is specialized to or applies specifically to that area
|
|
and so is called out.
|
|
|
|
<PRE><TT>type AREA_TYPE is (
|
|
X, -- All or none
|
|
A, -- Agriculture, Flora, Fauna, Land, Equipment, Rural
|
|
B, -- Biological, Medical, Body Parts
|
|
D, -- Drama, Music, Theater, Art, Painting, Sculpture
|
|
E, -- Ecclesiastic, Biblical, Religious
|
|
      G,      --  Grammar, Rhetoric, Logic, Literature, Schools
|
|
L, -- Legal, Government, Tax, Financial, Political, Titles
|
|
P, -- Poetic
|
|
S, -- Science, Philosophy, Mathematics, Units/Measures
|
|
T, -- Technical, Architecture, Topography, Surveying
|
|
W, -- War, Military, Naval, Ships, Armor
|
|
Y -- Mythology
|
|
);</TT></PRE>
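<P>
The text notes that AREA has not yet been used by the program. If it were, giving preference to one area might look like the following Python sketch. The dictionary-based match records and the function name are assumptions for illustration.

```python
def prioritize_by_area(matches, preferred_area):
    """Put matches coded with the preferred AREA first, keeping the original order otherwise."""
    # sorted() is stable, and False sorts before True, so preferred entries move up
    # without reshuffling the rest.
    return sorted(matches, key=lambda m: m["area"] != preferred_area)
```

<P>
For a medical text one would call it with <TT>"B"</TT>, for a farming book with <TT>"A"</TT>. Nothing is discarded; only the order changes.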
|
|
<A NAME="GEO">
|
|
|
|
<H5>GEO</H5>
|
|
<P>
|
|
This code was included to enable the program to distinguish between
|
|
different usages of a word depending on where it was used or what country
|
|
was the subject of the text. This is a dual usage, origin or subject.
|
|
|
|
<PRE><TT>type GEO_TYPE is (
|
|
X, -- All or none
|
|
A, -- Africa
|
|
      B,   --  Britain
|
|
C, -- China
|
|
D, -- Scandinavia
|
|
E, -- Egypt
|
|
F, -- France, Gaul
|
|
G, -- Germany
|
|
H, -- Greece
|
|
I, -- Italy, Rome
|
|
J, -- India
|
|
K, -- Balkans
|
|
N, -- Netherlands
|
|
P, -- Persia
|
|
Q, -- Near East
|
|
R, -- Russia
|
|
S, -- Spain, Iberia
|
|
U -- Eastern Europe
|
|
);
|
|
</TT></PRE>
|
|
<A NAME="FREQ">
|
|
|
|
<H5>FREQ</H5>
|
|
<P>
|
|
There is an indication of relative frequency for each entry. These codes
|
|
also apply to inflections, with somewhat different meaning. If there were
|
|
several matches to an input word, this key may be used to sort the output,
|
|
or to exclude rare interpretations. The first problem is to provide the
|
|
score. The initial method is to grade each word by how much column space
|
|
is allocated to it in the Oxford Latin Dictionary, or the number of
|
|
citations, on the assumption that many citations mean a word is common.
|
|
This is not the main intent of the compilers of existing dictionaries, but it
|
|
is almost the only indication of frequency that can be inferred from the
|
|
dictionaries. In many cases it seems to be a reasonable guess, certainly
|
|
for those most common words, and for those that are very rare.
|
|
|
|
<P>FREQ guessed from the relative number of citations given by sources
|
|
need not be valid, but seems to work.
|
|
If the compiler's purpose were just to give sufficient
|
|
examples to clarify the use of the word,
|
|
perhaps a single reference would serve for a simple word.
|
|
However one might observe that dictionary people seem to be enamored
|
|
with filling up this section whenever possible.
|
|
('et' has more than a page in OLD.)
|
|
If there is only one citation, they could only find one.
|
|
(This assertion can now easily be verified by searching the texts
|
|
available on the Internet.)
|
|
|
|
|
|
With the
|
|
understanding that adjustments can be made when additional information is
|
|
available, the initial numeric criteria are:
|
|
|
|
<PRE>
|
|
A full column or more, more than 50 citations - very frequent
|
|
B half column, more than 20 citations - frequent
|
|
C    more than 5 citations                       -   common
|
|
D 4-5 citations - lesser
|
|
E 2-3 citations - uncommon
|
|
F only 1 citation - very rare
|
|
</PRE>
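<P>
The citation thresholds in the table above translate directly into a small function. This Python sketch uses only the citation counts (the column-space criteria are left out), and the function name and the use of X for a word with no citations are assumptions.

```python
def freq_from_citations(citations):
    """Map a citation count to an initial FREQ code using the table's thresholds."""
    if citations > 50:
        return "A"  # very frequent
    if citations > 20:
        return "B"  # frequent
    if citations > 5:
        return "C"  # common
    if citations >= 4:
        return "D"  # lesser
    if citations >= 2:
        return "E"  # uncommon
    if citations == 1:
        return "F"  # very rare
    return "X"      # assumed: no citations means unknown
```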
|
|
|
|
|
|
<P>
|
|
In the case of late Latin in Lewis and Short, these frequencies may be
|
|
significant underestimates, since the volume of applicable texts
|
|
considered seems to be much smaller than for classical Latin resulting in
|
|
fewer opportunities for citations. Nevertheless, barring additional
|
|
information, the system is generally followed.
|
|
<P>
|
|
For the situation where there are several slightly different spellings
|
|
given for a word, they all are given the same initial frequency. The
|
|
theory is that the spelling is author's choice while the frequency is
|
|
attached to the word no matter how it is spelled. I presume that for a
|
|
specific text the author always spells the word the same way, that there
|
|
is no distribution of spellings within an individual text.  One exception
|
|
to this rule is the case where a variant spelling is cited only for
|
|
inscriptions. There may be some significance to this and a FREQ of I is
|
|
assigned. The logic of this choice is debatable. However, for some
|
|
variations there is clearly a difference in application and this can be
|
|
reflected in the frequency code. Likewise, there are situations wherein
|
|
words of the same spelling but different meanings may have different
|
|
frequencies. This may help to select the most likely interpretation.
|
|
<P>
|
|
One has a check against the frequency list of Diederich for the most
|
|
common, and those are probably the only ones that matter. But the
|
|
frequency depends on the application, and it should be possible to run a
|
|
new set of frequencies if one had a reasonable volume of applicable text.
|
|
The mechanical verification of word frequency codes is a long-term goal of
|
|
the development, but must wait until the dictionary data is complete.
|
|
<P>
|
|
Inscription and Graffiti are designations of frequency only in that the
|
|
only citations found were of that nature. One might suppose that if
|
|
literary examples were known they would have been used. So one might
|
|
expect that such words would not be found in a student's text. There is
|
|
no implication that they were not common in the spoken language.
|
|
<P>
|
|
A very special case has been created for 'N' words, words for which the
|
|
only dictionary citation is Pliny's Natural History. It seems, from
|
|
reading of dictionaries, that this work may be the only source for these
|
|
words, that they do not appear in any other surviving texts. They are
|
|
usually names for animals, plants or stones, many without identification.
|
|
Such words may appear only in Lewis and Short and the Oxford Latin
|
|
Dictionary, the unabridged Latin classical dictionaries. These words are
|
|
omitted from most other Latin dictionaries and, although they fall in the
|
|
classical period and are from a very well known writer, there is no
|
|
mention of the omission. So there may be an argument to disparage these
|
|
words, unless one is reading Pliny.
|
|
<P>
|
|
Most of these words are of Greek origin (although that is also true for
|
|
much of Latin). For many, the dictionaries report different forms or
|
|
declensions for the word giving the same citation. Often one dictionary
|
|
will give a Greek-like form (-os, -on) where another gives a Latinized
|
|
form (-us). There is no consistency. Both OLD and L+S disagree on Latin
|
|
and Greek forms, with no overwhelming favoritism to one form attached to
|
|
either dictionary. This may be a reflection of the fact that the
|
|
dictionaries grew over a long time with several editors, many workers, and
|
|
no rigid enforcement of standards.
|
|
<P>
|
|
I have made it a point to try to complete (give M, F, N) Greek adjectives
|
|
where other dictionaries give only a single form. To do this I have referred
|
|
to the base Greek in Liddell + Scott Greek-English Lexicon, assuming that
|
|
any Roman scholar pedantic enough to use a Greek form knew the Greek and
|
|
would draw on that knowledge.
|
|
<P>
|
|
There is another problem that is found chiefly in connection with
|
|
Pliny-type words. Since the literature is very sparse on examples, it is
|
|
often uncertain whether a particular usage is appropriately listed as a
|
|
noun, as an adjective, or as adjective used as a substantive. The present
|
|
dictionary, in blessed innocence, records all forms without bias.
|
|
|
|
<PRE><TT> type FREQUENCY_TYPE is ( -- For dictionary entries
|
|
X, -- -- Unknown or unspecified
|
|
      A,  --  very freq  --  Very frequent, in all Elementary Latin books, top 1000+ words
|
|
B, -- frequent -- Frequent, next 2000+ words
|
|
C, -- common -- For Dictionary, in top 10,000 words
|
|
D, -- lesser -- For Dictionary, in top 20,000 words
|
|
E, -- uncommon -- 2 or 3 citations
|
|
F, -- very rare -- Having only single citation in OLD or L+S
|
|
I, -- inscription -- Only citation is inscription
|
|
M, -- graffiti -- Presently not much used
|
|
N -- Pliny -- Things that appear only in Pliny Natural History
|
|
);</TT></PRE>
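<P>
Using FREQ to sort output or to exclude rare interpretations might be sketched as below. This is illustrative Python. The ranking simply follows the declaration order of FREQUENCY_TYPE, so X sorts ahead of A; whether the program orders its output this way is an assumption, as are the function names and the (meaning, freq) pairs.

```python
# Rank codes by their position in the FREQUENCY_TYPE declaration.
DECLARED_ORDER = "XABCDEFIMN"
RANK = {code: position for position, code in enumerate(DECLARED_ORDER)}

def sort_by_freq(matches):
    """Order (meaning, freq) pairs from most to least frequent."""
    return sorted(matches, key=lambda m: RANK[m[1]])

def trim_rare(matches, rarest_kept="D"):
    """Drop matches rarer than the cutoff, unless that would leave nothing."""
    limit = RANK[rarest_kept]
    kept = [m for m in matches if limit >= RANK[m[1]]]
    return kept if kept else list(matches)
```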
|
|
|
|
<P>
|
|
For inflections, the same type is used with different weights:
|
|
|
|
<PRE><TT>
|
|
-- X, -- -- Unknown or unspecified
|
|
-- A, -- most freq -- Very frequent, the most common
|
|
-- B, -- sometimes -- sometimes, a not unusual VARIANT
|
|
-- C, -- uncommon -- occasionally seen
|
|
-- D, -- infrequent -- recognizable variant, but unlikely
|
|
-- E, -- rare -- for a few cases, very unlikely
|
|
-- F, -- very rare -- singular examples,
|
|
-- I, -- -- Presently not used
|
|
-- M, -- -- Presently not used
|
|
-- N -- -- Presently not used
|
|
|
|
</TT></PRE>
|
|
<A NAME="SOURCE">
|
|
|
|
<H5>SOURCE</H5>
|
|
<P>
|
|
Source is the dictionary or grammar which is the source of the
|
|
information, not the Cicero or Caesar text in which it is found.
|
|
|
|
<P>
|
|
For a number of entries, X is now given as Source. This is primarily from
|
|
the vocabulary (about 13000 words) which was in place before the Source
|
|
parameter was put in, and some have not been updated. They are
|
|
from no particular Source, just general vocabulary picked up in various
|
|
texts and readings. Although, during the dictionary update beginning in
|
|
1998, all entries are being checked against sources, it may be improper to
|
|
credit (blame?) a Source when that was not the origin of the entry,
|
|
remembering that the actual entries are of my generation entirely and may
|
|
not correspond exactly to any other view. However, in the second pass (as
|
|
far as it has progressed) all classical entries have been verified with
|
|
the Oxford Latin Dictionary (OLD). (By that I mean that I have checked,
|
|
not to imply that I have not made errors.) This does not mean that the
|
|
entry copies or agrees with the OLD, but that I read the OLD entry with
|
|
great respect and put down what I did anyway. Newer entries, added in
|
|
this process, and those checked later in the process, if found in the OLD,
|
|
have the O code. Words added from Lewis and Short, but not in OLD, have
|
|
the S code, etc.
|
|
|
|
All entries for which there is a Source will be found in
|
|
some form in that Source, but the details of the interpretation of
|
|
declension and meaning are mine.
|
|
Each entry is
|
|
my responsibility alone, and there are significant differences and
|
|
elaborations. They may not necessarily be found as
|
|
primary entries, or even directly referenced, but they will have been
|
|
constructed from information in that source. For instance, the remark 'adp see app'
|
|
in a source dictionary may generate 'adp' WORDS entries that are not explicitly mentioned in the
|
|
source dictionary.
|
|
There might be occasions where the source gives a noun
|
|
but on my own initiative I have also introduced the corresponding adjective
|
|
(or the converse), particularly if that usage was found in a text.
|
|
In such a case the source would be the same. If I have
|
|
done a proper job, the reader will not often be surprised.
|
|
|
|
<P>An important implication of the SOURCE is age. OLD contains words
|
|
from the classical period of Latin, and these are carried forward to all ages.
|
|
Thus AGE for OLD entries will be X (all ages). Those in L+S, but not in OLD,
|
|
might be checked against the premise that they were late/post-classical Latin (D),
|
|
citations being the determining factor. Souter (SOURCE=P) is a wordlist of
|
|
later Latin (AGE=E), so his entries might be presumed not to be common in
|
|
classical times. Other sources, indicated by AGE flags or by parenthesized
|
|
comments, may also indicate to the user the age appropriate for the entry.
|
|
Calepinus Novus (Cal) (SOURCE=K) is especially noteworthy in that it is
|
|
of modern, 20th century Latin and its meanings should probably not be applied to
|
|
earlier texts.
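<P>
The way a SOURCE hints at an AGE can be written down as a small lookup. This Python sketch is illustrative: the O, S and P pairings come from the paragraph above, the pairing of K with H (modern) is my reading of "modern, 20th century Latin", and the function name and the rule that a recorded AGE takes precedence are assumptions.

```python
# Age suggested by the source when no specific AGE has been recorded.
PRESUMED_AGE = {
    "O": "X",  # OLD: classical words carried forward to all ages
    "S": "D",  # L+S but not OLD: possibly late/post-classical
    "P": "E",  # Souter: later Latin
    "K": "H",  # Calepinus Novus: modern (assumed pairing)
}

def presumed_age(source, recorded_age="X"):
    """Return the recorded AGE if one is set, otherwise the age the SOURCE suggests."""
    if recorded_age != "X":
        return recorded_age
    return PRESUMED_AGE.get(source, "X")
```

<P>
The result is only a hint to the reader. Citations remain the determining factor, as noted above.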
|
|
|
|
<P>
|
|
OLD is taken as the most authoritative source
|
|
and if it is in OLD then it was used in classical times within a very limited period.
|
|
An entry with source O will have AGE X (or C if it is unique to classical).
|
|
This also defines good Latin, and the usage should be valid for all ages.
|
|
|
|
<P>
|
|
Lewis and Short (S) is next in authority and also somewhat in time.
|
|
It covers, in addition to classical, a later period. That a word appears in S but not in O
|
|
may mean it is a somewhat later usage.
|
|
If that point is well established, the AGE is D.
|
|
But most often the main source is OLD and there are additional meanings
|
|
indicated as L+S. The user is warned that this may be a case of modified meaning
|
|
coming into use at a later age.
|
|
But it may be that, after review of L+S, OLD differs for a reason and has a better interpretation.
|
|
|
|
<P>
|
|
A formation from a classical word with the natural meaning is usually assumed
|
|
to also be appropriately classical/general - X.
|
|
Such a word with an enhanced, specialized or modified meaning might
|
|
indicate a later usage, and is so labeled.
|
|
It may be that the word was in use earlier but no reference is available.
|
|
In some cases, an additional meaning is identified as (L+S)
|
|
just to give credit, without implying anything further.
|
|
|
|
<P>
|
|
In time, Souter (P) is next.
|
|
Again, if it is in Souter but not O or S, it is very likely later Latin.
|
|
The date may reflect this, but the source is a hint to the user, not a firm promise.
|
|
|
|
<P>
|
|
Next in line in time is Latham (M) for medieval Latin.
|
|
|
|
<P>
|
|
Souter and Latham are poorly represented. There is no attempt to include these sources
|
|
with the thoroughness of the OLD and L+S effort.
|
|
Entries from these sources come up only when a particular
|
|
word is submitted from a text and no other source serves, giving credence to the assumption that
|
|
such entries belong to a later AGE.
|
|
|
|
<P>
|
|
Stelten (Ecc) is more fully represented (goal to complete) since it specializes in an area not well
|
|
covered by other sources. While it is a complete dictionary, with all the general words, it has
|
|
a number of entries specifically or solely applicable to the Christian Church.
|
|
These are from later (non-classical) times, chiefly medieval.
|
|
|
|
<P>
|
|
Licoppe (K) is modern. An additional meaning on a word from an earlier AGE is likely to be uniquely modern.
|
|
|
|
<P>
|
|
Note that there are examples in which different sources at different ages give contrary meanings.
|
|
This may reflect a real and not uncommon shift in meaning, or there may be errors in the sources.
|
|
At least in such cases the sources (and their implied ages) are identified.
|
|
|
|
<P>The list of sources goes far beyond what has been directly used so far.
|
|
There should be no expectation at this point in the development that
|
|
all these sources have even been used. They are listed as I have copies
|
|
and as they might be consulted. They are encoded so that the program might
|
|
recognize and process the source should it come up.
|
|
I have sought and received permission for those which have been
|
|
extensively used. Others have only been used for an occasional check
|
|
(fair use).
|
|
|
|
|
|
|
|
<PRE><TT> type SOURCE_TYPE is (
|
|
X, -- General or unknown or too common to say
|
|
A,
|
|
B, -- C.H.Beeson, A Primer of Medieval Latin, 1925 (Bee)
|
|
C, -- Charles Beard, Cassell's Latin Dictionary 1892 (CAS)
|
|
D, -- J.N.Adams, Latin Sexual Vocabulary, 1982 (Sex)
|
|
E, -- L.F.Stelten, Dictionary of Eccles. Latin, 1995 (Ecc)
|
|
F, -- Roy J. Deferrari, Dictionary of St. Thomas Aquinas, 1960 (DeF)
|
|
G, -- Gildersleeve + Lodge, Latin Grammar 1895 (G+L)
|
|
H, -- Collatinus Dictionary by Yves Ouvrard
|
|
I, -- Leverett, F.P., Lexicon of the Latin Language, Boston 1845
|
|
J,
|
|
K, -- Calepinus Novus, modern Latin, by Guy Licoppe (Cal)
|
|
      L,   --  Lewis, C.T., Elementary Latin Dictionary 1891
|
|
M, -- Latham, Revised Medieval Word List, 1980
|
|
N, -- Lynn Nelson, Wordlist
|
|
O, -- Oxford Latin Dictionary, 1982 (OLD)
|
|
P, -- Souter, A Glossary of Later Latin to 600 A.D., Oxford 1949
|
|
Q, -- Other, cited or unspecified dictionaries
|
|
R, -- Plater & White, A Grammar of the Vulgate, Oxford 1926
|
|
S, -- Lewis and Short, A Latin Dictionary, 1879 (L+S)
|
|
T, -- Found in a translation -- no dictionary reference
|
|
U, -- Du Cange
|
|
V, -- Vademecum in opus Saxonis - Franz Blatt (Saxo)
|
|
W, -- My personal guess
|
|
Y, -- Temp special code
|
|
Z -- Sent by user -- no dictionary reference
|
|
-- Mostly John White of Blitz Latin
|
|
|
|
-- Consulted but used only indirectly
|
|
-- Liddell + Scott Greek-English Lexicon
|
|
|
|
      --  Consulted but used only occasionally, separately referenced
|
|
-- Allen + Greenough, New Latin Grammar, 1888 (A+G)
|
|
-- Harrington/Pucci/Elliott, Medieval Latin 2nd Ed 1997 (Harr)
|
|
-- C.C./C.L. Scanlon Latin Grammar/Second Latin, TAN 1976 (SCANLON)
|
|
-- W. M. Lindsay, Short Historical Latin Grammar, 1895 (Lindsay)
|
|
);
|
|
</TT></PRE>
|
|
|
|
|
|
<A NAME="Current Distribution of DICTLINE Flags">
|
|
<H4>Current Distribution of DICTLINE Flags</H4></A>
|
|
<PRE><TT>
|
|
Number of lines in DICTLINE GENERAL 1.97F 39187
|
|
|
|
AGE
|
|
X 28858
|
|
A 61
|
|
B 446
|
|
C 58
|
|
D 3937
|
|
E 1718
|
|
F 1996
|
|
G 1920
|
|
H 193
|
|
|
|
AREA
|
|
X 29181
|
|
A 2955
|
|
B 912
|
|
D 410
|
|
E 1916
|
|
G 504
|
|
L 1221
|
|
P 181
|
|
S 730
|
|
T 382
|
|
W 722
|
|
Y 73
|
|
|
|
GEO
|
|
X 38147
|
|
A 64
|
|
B 52
|
|
C 1
|
|
D 3
|
|
E 49
|
|
F 67
|
|
G 20
|
|
H 278
|
|
I 141
|
|
J 4
|
|
K 6
|
|
N 8
|
|
P 9
|
|
Q 312
|
|
R 1
|
|
S 25
|
|
U 0
|
|
|
|
FREQ
|
|
X 11
|
|
A 2133
|
|
B 2711
|
|
C 10757
|
|
D 2678
|
|
E 11218
|
|
F 7982
|
|
I 424
|
|
M 0
|
|
N 1273
|
|
|
|
SOURCE
|
|
X 7554
|
|
A 0
|
|
B 41
|
|
C 1751
|
|
D 14
|
|
E 1417
|
|
F 119
|
|
G 59
|
|
H 0
|
|
I 4
|
|
J 116
|
|
K 2100
|
|
L 60
|
|
M 759
|
|
N 84
|
|
O 16039
|
|
P 296
|
|
Q 24
|
|
R 12
|
|
S 8094
|
|
T 88
|
|
U 0
|
|
V 47
|
|
W 316
|
|
Y 35
|
|
Z 158
|
|
</TT></PRE>
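<P>
A simple consistency check on such a distribution is that the counts for each flag add up to the number of DICTLINE lines. The Python sketch below tallies flag values and checks the AGE column from the table above. The function names are assumptions, and no DICTLINE file layout is implied.

```python
from collections import Counter

TOTAL_LINES = 39187  # lines in DICTLINE GENERAL 1.97F, from the table above

# AGE counts copied from the table above.
AGE_COUNTS = {
    "X": 28858, "A": 61, "B": 446, "C": 58, "D": 3937,
    "E": 1718, "F": 1996, "G": 1920, "H": 193,
}

def tally(flag_values):
    """Count how often each flag value occurs in a sequence of codes."""
    return Counter(flag_values)

def is_consistent(counts, total=TOTAL_LINES):
    """Every line carries exactly one value per flag, so the counts must sum to the total."""
    return sum(counts.values()) == total
```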
|
|
|
|
|
|
|
|
<A NAME="Dictionary Conventions">
|
|
<H4>Dictionary Conventions</H4></A>
|
|
<P>
|
|
There are a few special conventions in setting codes.
|
|
<P>
|
|
Proper Names
|
|
<P>
|
|
Proper names are often identified by the AGE in which the person lived,
|
|
not the age of the text in which he is referenced, the AREA of his fame or
|
|
occupation, and the GEO from which he hailed. This refers to some
|
|
most-likely person of this name. A name may be shared by others in
|
|
different ages. Thus Jason, the Argonaut, is Archaic, Myth, Greek (A Y
|
|
H). (It is not likely that a Latin text would refer to a TV star.)
|
|
Tertullian, an early 3rd century Church Father from Carthage, author of
|
|
the first Christian writings in Latin, is Late, Ecclesiastic, Africa (D E
|
|
A). Jupiter is (A E I), which is a bit sloppy since he is present later.
|
|
Today he may be a myth, but then he was a god. But even gods are not
|
|
eternal (X) in language, and an initial place is found for them. Place
|
|
names are likewise coded, although with less confidence.
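<P>
The three-letter shorthand used above, such as (A Y H) for Jason, can be expanded mechanically from the AGE, AREA and GEO tables. The Python sketch below does so. The short labels are abbreviated from those tables, the label for X is paraphrased, and the function name <TT>describe</TT> is an assumption.

```python
AGE = {"X": "all ages", "A": "archaic", "B": "early", "C": "classical",
       "D": "late", "E": "later", "F": "medieval", "G": "scholar", "H": "modern"}

AREA = {"X": "All or none", "A": "Agriculture", "B": "Biological", "D": "Drama",
        "E": "Ecclesiastic", "G": "Grammar", "L": "Legal", "P": "Poetic",
        "S": "Science", "T": "Technical", "W": "War", "Y": "Mythology"}

GEO = {"X": "All or none", "A": "Africa", "B": "Britain", "C": "China",
       "D": "Scandinavia", "E": "Egypt", "F": "France, Gaul", "G": "Germany",
       "H": "Greece", "I": "Italy, Rome", "J": "India", "K": "Balkans",
       "N": "Netherlands", "P": "Persia", "Q": "Near East", "R": "Russia",
       "S": "Spain, Iberia", "U": "Eastern Europe"}

def describe(codes):
    """Expand a triple such as 'A Y H' into (age, area, geo) labels."""
    age, area, geo = codes.split()
    return AGE[age], AREA[area], GEO[geo]
```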
|
|
<P>
|
|
Vertical Bar
|
|
<P>
|
|
While not visible to the user, the dictionary contains certain meanings
|
|
starting with a vertical bar (|).  This is a code used to identify meanings
|
|
that run beyond the conventional 80 characters. One or more vertical bars
|
|
leading the meaning allows tools to recognize that they are additional
|
|
meanings to an entry already encountered, usually the entry immediately
|
|
before when the sort is for that reason. This is only of concern to those
|
|
dealing with the raw dictionary who have asked. <BR>
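<P>
For those working with the raw dictionary, a tool might regroup such continuation meanings as in the following Python sketch. It assumes the input is just the sequence of meaning fields, already in an order where continuations follow their entry; the function name is invented for the example.

```python
def group_meanings(meaning_lines):
    """Attach each meaning that starts with '|' to the entry immediately before it."""
    entries = []
    for line in meaning_lines:
        if line.startswith("|") and entries:
            # One or more leading bars mark an additional meaning line.
            entries[-1].append(line.lstrip("|"))
        else:
            entries.append([line])
    return entries
```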
|
|
|
|
<BR><A NAME="Evolution of the Dictionary">
|
|
<H4>Evolution of the Dictionary</H4></A>
|
|
<P>
|
|
The stem list was originally put together from what might be called
|
|
'common knowledge', those words that most Latin texts have. The first
|
|
version had about 5000 dictionary entries, giving up to 95% coverage of
|
|
simple classical texts. This grew to about 13000 entries with specific
|
|
additions when gaps were found. With this number it was possible to get
|
|
better than a 99% hit rate on Caesar (an area from which the dictionary
|
|
was built). Parse of other works fell to 95-97%, which may be
|
|
mathematically attractive but leaves a lot to be desired in a dictionary,
|
|
since a translator is usually familiar with the vast bulk of the language
|
|
and just needs help on the obscure words. Having just the common words is
|
|
not enough, indeed not much help at all. So an attempt is made to make
|
|
the dictionary as complete as possible. All possible spellings found in
|
|
dictionaries are included.
|
|
<P>
|
|
Starting with the 13000, the expansion project beginning in 1998 sought to
|
|
verify the existing words and supplement with any new found ones. Thus
|
|
all classical Latin words are consistent with the OLD (not to say taken
|
|
from, because most were not, but checked against). Any significant
|
|
deviation is indicated, either as from another source, or in the
|
|
definition itself.
|
|
<P>
|
|
L+S is used for later Latin and to check OLD work. This started with the
|
|
thought that if a word was in L+S but not in OLD it must be later Latin,
|
|
beyond the range of OLD. I was surprised at how many words with classical
|
|
citations were in L+S but not in OLD, and how many are of different
|
|
spelling.
|
|
<P>
|
|
The refinement is proceeding one letter at a time, as is the tradition for
|
|
all great dictionaries. First stage refinement has proceeded through DI.
|
|
<BR>
|
|
<BR>
|
|
|
|
<BR><A NAME="Text Dictionary - DICTPAGE.TXT">
|
|
<H4>Text Dictionary - DICTPAGE.TXT</H4></A>
|
|
|
|
<P>In response to many requests, a simple ASCII text list has been created of
|
|
the WORDS dictionary, in what might be called the paper dictionary form.
|
|
Each coded dictionary entry has been expanded to its dictionary form
|
|
(nominative and genitive for nouns, four principal parts for verbs, etc.).
|
|
In content it is like a paper dictionary, but each entry is on one long line
|
|
and the headwords are in all capitals, convenient for case-sensitive search.
|
|
The headwords are listed alphabetically (not the same as the coded file)
|
|
and offered in an ASCII/DOS text file
|
|
<A HREF="http://www/erols/com/whitaker/dictpage.txt"><B>DICTPAGE.TXT</B></A>
|
|
which may be searched from the user's browser,
|
|
or best downloaded and searched by any editor off-line.
|
|
To make it possible to search on-line, the file is not compressed and so is
|
|
about 3 MB.
|
|
<BR><BR>
|
|
|
|
<BR><A NAME="Latin Spellchecking - Text Processor List - LISTALL.ZIP">
|
|
<H4>Latin Spellchecking - Text Processor List - LISTALL.ZIP</H4></A>
|
|
|
|
<P>I have done a lot of Latin spell checking directly with WORDS.
|
|
All you have to do is put the text in a file,
|
|
run WORDS with a text file (@) input,
|
|
and require output of a WORD.UNK file (see # parameters).
|
|
It is sometimes useful to run without FIXes and TRICKS first,
|
|
then run the resulting first-pass UNKNOWNs
|
|
and look at the full WORD.OUT to make sure the modifications are reasonable.
|
|
|
|
|
|
<P>There are other techniques.
|
|
As I understand it, WORD2000 and other processors take a simple list of valid spellings
|
|
and use that for spellchecking.
|
|
I am speaking on secondhand information.
|
|
I have not tried to do the WORD2000 job.
|
|
However several people have proposed to use my dictionary files to do so.
|
|
|
|
<P>Since Latin is an inflected language, each dictionary entry expands
|
|
to many "words", often hundreds.
|
|
The present WORDS raw dictionary would expand to an enormous number of simple words,
|
|
but that is not the end of it.
|
|
Each of those words might have attached prefixes and suffixes, enclitics, and spelling variations.
|
|
Literally billions of different words can be parsed and analyzed by the WORDS program.
|
|
These are legal Latin words, whether any Roman actually spoke them.
|
|
Of course, one could make a list of all the words in Cicero, or in the Vulgate,
|
|
and make a dictionary of those (and we are close to that),
|
|
but the body of medieval Latin is enormously greater
|
|
than that of classical Latin on which most dictionaries are based.
|
|
|
|
|
|
<P>In response to several requests, a simple ASCII text list has been created of
|
|
the two million primary words
|
|
that the WORDS program and dictionary can form by adding inflections to stems.
|
|
This list has been reduced to half by eliminating duplicates.
|
|
The downloadable
|
|
<A HREF="http://www/erols/com/whitaker/listall.zip"><B>ZIP</B></A>
|
|
of this file is over 2 MB.
|
|
|
|
<P>The purpose of such a list is to provide data for conventional
|
|
word processor spell checking.
|
|
|
|
<P>Currently there are some omissions.
|
|
|
|
<P>1) Latin has a widely used enclitic, -que, also -ne and -ve.
|
|
In principle these could be tacked on to almost any word.
|
|
If the spell checking system had the capability of recognizing
|
|
them, that would be the most convenient way of handling this problem.
|
|
Otherwise, completeness would require their addition to every word,
|
|
quadrupling the size of the list.
|
|
|
|
<P>2) Many Latin verb forms are subject to syncope, contracting the
|
|
form for pronunciation. In WORDS this is handled by a process.
|
|
For the list another method must be used and the contracted words
|
|
generated by modifying both stem and ending.
|
|
|
|
<P>3) There are some common combined words in Latin in which the first
|
|
part of the word is declined, followed by a fixed form. Unlike the
|
|
enclitic situation, these forms are limited and should be generated
|
|
separately (quidam). Other qu- pronouns are handled separately
|
|
in WORDS and need special processing here also.
|
|
|
|
<P>4) Uniques have not yet been added. This is a trivial
|
|
matter.
|
|
|
|
<P>5) There is the problem of prefixes and suffixes. WORDS provides for
|
|
hundreds of these. It would be impractical to multiply the list
|
|
by mechanically including all such possibilities. Fortunately,
|
|
this may not be a significant problem. The philosophy for the
|
|
dictionary has been to include all words, even those which could
|
|
be easily generated by a base and fixes, as they occur or are found
|
|
in sources. This means that the most common compound words are
|
|
in the system, but that coverage is mostly concentrated on classical Latin.
|
|
|
|
<P>6) In later times especially,
|
|
there came some more or less common spelling variations.
|
|
These are handled in WORDS by TRICKS.
|
|
They can be relatively expensive, but are only applied to words
|
|
which otherwise have failed, and these are becoming rarer.
|
|
This process, if generally applied, would not only expand the
|
|
list enormously, the added words would not advance the goal
|
|
of spell checking. They are, in some sense, misspelled words.
|
|
For a reader, it can be useful to have a guess at the word.
|
|
He can examine the form and context and judge whether it makes
|
|
sense. It is not a process to be applied mechanically.
|
|
|
|
<P>7) There is a divergence in the way editors treat the non-Latin
|
|
characters J and V. These are the consonant forms of I and U.
|
|
They are explicit in English, so for convenience, familiarity, and
|
|
pronunciation, general practice in the past has been to use them.
|
|
More recently, some academic purists have rejected this and eliminated
|
|
J and V altogether. (Note that the same purists use lower case
|
|
letters, in spite of the fact that the Romans had only the upper case.)
|
|
WORDS keeps the variant characters in the dictionary and maps them
|
|
to a single character in processing. A list could include both
|
|
expressions, and it would only add a few percent in size. However,
|
|
that would allow inconsistent spelling choices in a text. This
|
|
seems to be contrary to the goals of a spell checker.
|
|
It is probably better eventually to offer two separate lists so that the user
|
|
may select the option appropriate for his work.
|
|
|
|
|
|
|
|
<P>All the above factors are applied by processes in the WORDS program.
|
|
Running WORDS looking for UNKNOWNS will give a superior spell check,
|
|
but the list can be useful in conjunction with common editors.
|
|
Experience will determine its effectiveness.
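<P>As an illustration of items 1) and 7) above, here is a minimal Python sketch of how a list-based checker might strip one enclitic and fold J/V onto I/U before looking a word up.
It is not the process WORDS itself uses; the function names and the five-word list are hypothetical, chosen only for the example.

```python
# Hypothetical sketch: names and the tiny word list are illustrative only.

def normalize(word):
    """Fold case and map the consonant forms J and V onto I and U."""
    return word.lower().replace("j", "i").replace("v", "u")

# Enclitics are normalized too, so that -ve is matched as -ue.
ENCLITICS = tuple(normalize(e) for e in ("que", "ne", "ve"))

def is_known(word, wordlist):
    """True if the word, or the word minus one enclitic, is in the list."""
    w = normalize(word)
    if w in wordlist:
        return True
    for enc in ENCLITICS:
        # Something must remain after the enclitic is removed.
        if w.endswith(enc) and len(w) > len(enc) and w[: -len(enc)] in wordlist:
            return True
    return False

WORDLIST = {normalize(w) for w in ("arma", "virum", "cano", "senatus", "populus")}
```

<P>With this list, is_known("populusque", WORDLIST) and is_known("Virumve", WORDLIST) both succeed without either form being stored.
Folding J/V in this way accepts mixed spellings in one text, which is the inconsistency noted in item 7; two separate lists would avoid that.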
|
|
<BR><BR>
|
|
|
|
|
|
|
|
<A NAME="INFLECTIONS">
|
|
<H3><CENTER>INFLECTIONS</CENTER>
|
|
</H3></A> <BR>
|
|
|
|
<P>
|
|
Inflections for WORDS are in a human-readable file called INFLECTS.LAT.
|
|
Presently there are almost 1800 separate entries.
|
|
This data is processed to produce a file INFLECTS.SEC used by the code.
|
|
The format of INFLECTS.LAT is simple, as for example:
|
|
|
|
<PRE><TT>N 1 1 NOM S C 1 1 a X A
|
|
|
|
V 1 1 PRES ACTIVE IND 1 P 2 4 amus X A
|
|
|
|
PREP ACC 1 0 X A</TT></PRE>
|
|
|
|
<P>
|
|
The part of speech is given, along with the appropriate characteristics
|
|
for a particular inflection. The inflection/ending is specified by
|
|
the number of the stem to which it is attached, the number of characters in the ending, and the ending string.
|
|
There is an AGE and FREQ for each entry.
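<P>The three example lines can be read mechanically. The Python sketch below is only an illustration under the layout just described (characteristics, stem number, ending length, ending string, AGE, FREQ); it is not how WORDS builds INFLECTS.SEC, and the names Inflection and parse_inflection are hypothetical.

```python
from typing import NamedTuple, Tuple

class Inflection(NamedTuple):
    features: Tuple[str, ...]  # part of speech and its characteristics
    stem_key: int              # which stem the ending attaches to
    size: int                  # number of characters in the ending
    ending: str                # the ending itself ("" when size is 0)
    age: str
    freq: str

def parse_inflection(line):
    """Parse one INFLECTS.LAT-style line, working inward from the right."""
    tokens = line.split()
    age, freq = tokens[-2], tokens[-1]
    rest = tokens[:-2]
    if rest[-1].isdigit():
        # A size of 0 means no ending string is present on the line.
        ending, size, stem_key = "", int(rest[-1]), int(rest[-2])
        features = rest[:-2]
    else:
        ending, size, stem_key = rest[-1], int(rest[-2]), int(rest[-3])
        features = rest[:-3]
    if len(ending) != size:
        raise ValueError("ending length does not match its count: " + line)
    return Inflection(tuple(features), stem_key, size, ending, age, freq)
```

<P>For the verb example this gives stem number 2, a four-character ending "amus", AGE X and FREQ A; the PREP line yields an empty ending.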
|
|
|
|
|
|
<BR>
|
|
<BR>
|
|
|
|
|
|
<A NAME="ENGLISH to LATIN">
|
|
<H3><CENTER>ENGLISH to LATIN</CENTER>
|
|
</H3></A> <BR>
|
|
|
|
|
|
<P>A fairly new application for the WORDS dictionary has been an attempt to
|
|
go English to Latin.
|
|
Up to now there is no satisfactory computer facility for this.
|
|
The best on the net is a search of the Perseus dictionary,
|
|
finding all uses of the English word in the text of the dictionary.
|
|
One can do the same with the WORDS dictionary,
|
|
and DICTPAGE.TXT is a convenient form for that purpose.
|
|
In the present release of WORDS, a primitive English-to-Latin
|
|
facility has been implemented, based on this inverted dictionary method.
|
|
|
|
<P>However, except for very simple situations,
|
|
the resulting raw output can be excessive and often spurious.
|
|
It is necessary to TRIM the output for the general user.
|
|
In order to do this, one needs to be able to computer parse the MEAN field
|
|
and prioritize the significance of a word appearing therein.
|
|
This is a more rigorous requirement than the one applied heretofore,
|
|
that MEAN should be human-readable. Now it must be computer parsed.
|
|
Therein lies the reason for a formal set of rules for constructing MEAN.
|
|
These rules are new and certainly have not been applied throughout
|
|
the dictionary yet; further,
|
|
they may change in the future if more powerful ordering algorithms evolve.
|
|
|
|
<P>The primary rule is that nothing should surprise or
|
|
inconvenience the casual user of WORDS.
|
|
Further, for system independence,
|
|
the MEAN line should be readable by anyone in ASCII,
|
|
without special characters or fonts.
|
|
|
|
|
|
<P>I have just begun to work on an English-to-Latin capability.
|
|
Initially this is just an inversion of the WORDS Latin dictionary,
|
|
extracting all the English words in the MEAN field of the WORDS dictionary
|
|
and associating these with the corresponding Latin entry.
|
|
A real English-to-Latin is much more than that.
|
|
To construct from first principles,
|
|
one should take a set of English words and find the Latin equivalent,
|
|
not the reverse.
|
|
Nevertheless, WORDS now has some primitive capability.
|
|
|
|
<P>The raw inversion produces almost 200_000 English words.
|
|
WEEDing them by the present algorithms (eliminate a, the, to, ...,
|
|
and a number of common modifiers when included in meanings not of
|
|
their part of speech)
|
|
reduces this number only by a third.
|
|
But this finally results in only somewhat over 20_000 unique words,
|
|
less than the number of Latin entries!
|
|
This probably reflects more on dictionary makers than on the languages.
|
|
|
|
|
|
<P>English is certainly a far richer language than Latin, measured
|
|
by the number of individual words.
|
|
WORDS has about 40_000 Latin entries, and the corresponding inversion to
|
|
English yields only 22_000 unique English words.
|
|
|
|
|
|
<P>The reason seems to be that,
|
|
while English may have lots of words for love or hate,
|
|
in making up a Latin
|
|
dictionary one will opt to give a simple translation.
|
|
So while love is a proper translation of a number of Latin words,
|
|
and one could as well replace it with any of dozens of English synonyms,
|
|
a dictionary compiler will usually take the simplest English word
|
|
that provides the reader with the meaning. That is what the reader usually wants.
|
|
|
|
|
|
<P>Starting from an English basis to produce an English-to-Latin dictionary
|
|
gives an entirely different outcome.
|
|
In that case, the full power of English can be invoked,
|
|
and it is the Latin that will seem simple by comparison.
|
|
|
|
<P>In many cases, an English-to-Latin dictionary bound in the same volume
|
|
as a Latin-to-English will have been
|
|
developed by a different author, and sometimes they are not consistent.
|
|
At least the inversion procedure assures basic consistency.
|
|
|
|
<P>One problem with the inversion method is that one needs
|
|
to weed out a lot of the chaff before presenting to the user.
|
|
And even then there are a lot of choices for the user.
|
|
GOLD occurs 120 times; COPPER, 57 times; ABANDON, 24 times,
|
|
plus several times for ABANDONED, ABANDONS, ABANDONING, and ABANDONMENT.
|
|
Further trimming has to get very severe!
|
|
|
|
|
|
|
|
<P>If the program is run with TRIM_OUTPUT parameter set
|
|
(this parameter works on both Latin and English output),
|
|
the six highest-priority entries (by FREQ or whatever the current algorithm is)
|
|
will be listed. This should serve for the general user.
|
|
Turning off this parameter allows the program to list all instances
|
|
found in the Latin dictionary,
|
|
which were not removed by WEEDing in the data preparation.
|
|
|
|
<P>Finally there is the problem that most paper Latin dictionaries harken back
|
|
to the 19th century or earlier, even those published more recently.
|
|
Their base English may not be current.
|
|
Take a purely hypothetical example. On the first page of every English-Latin
|
|
dictionary is <B>abase</B>. This is a good 18th century word. Today one is
|
|
more likely to see humble, degrade or humiliate, and those are the words the
|
|
user is more likely to request. But the dictionaries from which WORDS draws
|
|
may be fonder of abase as the meaning of a Latin word which could serve for
|
|
any of these. The user may want to try some synonyms, but this can be
|
|
a considerable burden. A built-in thesaurus could mechanically generate
|
|
a broad range of words to include, but this is surely overkill and will
|
|
generate so many inappropriate results as to render the search excessively
|
|
cumbersome. The user is advised to check the meanings returned for suggestions
|
|
as to what other words might be tried, if the immediate result does not
|
|
seem satisfactory.
|
|
|
|
|
|
<P>One important point is that the program mechanically searches the Latin dictionary.
|
|
If one is looking for an adjective, presently one will find all adjectives for which the
|
|
MEAN contains the search word, no more. However one should be aware that
|
|
participles of appropriate verbs can also serve as adjectives and may be
|
|
a better choice.
|
|
|
|
<P>At the present time there is no complex construction/deconstruction
|
|
of the English input. Thus if the input is 'kill', only Latin entries
|
|
with the exact word 'kill' in their MEAN will be selected. The suffixed words
|
|
'kills'/'killing'/'killed'/'killer'/etc. will not be found. They must
|
|
be queried separately.
|
|
Likewise, unlike the Latin phase of WORDS, prefixes are not extracted.
|
|
It may be desirable in the future to provide such additional capabilities.
|
|
This would be value added over simple search by the program.
|
|
|
|
|
|
|
|
<BR>
|
|
<BR>
|
|
|
|
|
|
<A NAME="English Parsing of Meanings">
|
|
<H4>English Parsing of Meanings</H4></A>
|
|
<BR>
|
|
|
|
|
|
<P>Punctuation in meanings is now formalized, in order
|
|
to allow computer processing of the text.
|
|
Deviation from these rules would make parsing
|
|
of the English very difficult, so they must be enforced.
|
|
There is nothing which will mislead the user,
|
|
but it goes beyond standard text practice.
|
|
|
|
|
|
<P>The semicolon separator has greater significance.
|
|
Various groups of meanings may have varying frequency or likelihood.
|
|
The most likely are placed first and thereby prioritized.
|
|
Within a semicolon group (SEMI) of meaning/synonyms separated by commas or slashes,
|
|
their probability is assumed to be the same.
|
|
Where possible, a PURE word (e.g., 'perhaps') should lead,
|
|
followed by compound meanings (e.g., 'it may be').
|
|
There is much work to do before this ordering is complete.
|
|
|
|
<P>Any PURE meaning (one not involving modifiers) set off by
|
|
commas or slashes is assigned a higher priority on output than
|
|
a modified/compound meaning in the same MEAN SEMI.
|
|
|
|
<P>Semicolons separate meaning groups that have a different
|
|
flavor/sense.
|
|
Initially the interpretation and selection among these were left
|
|
to the user, as in paper dictionaries.
|
|
Recent requirements demand an ordering of these groups.
|
|
The order of the semicolon groups (called SEMIs in the code) should
|
|
indicate the frequency or probability of that meaning
|
|
among different groups, where this inference can be made.
|
|
This ideal is not yet rigorously enforced, even in recent entries,
|
|
and less so in those earlier in the update.
|
|
|
|
<P>Commas separate meanings that are roughly equivalent -
|
|
synonyms. In parsing, a COMMA consists of the words between commas.
|
|
There is no inherent logical order within a SEMI, however,
|
|
to support another application for the dictionary,
|
|
full sentence Latin-to-English translation,
|
|
it is desirable to be able to pick a single,
|
|
simple, modern English word that is most likely to be the translation.
|
|
This should be the first word of the meaning.
|
|
|
|
<P>Question marks and exclamation points may appear as
|
|
an integral part of the meaning. They do not replace
|
|
the comma/semicolon separator, as in normal text.
|
|
|
|
<P>The solidus/slash (/) does the work of 'or' in many cases.
|
|
It is used solely to conserve space,
|
|
to compress the meaning line to no more than 80 characters.
|
|
It separates (generally close) synonyms and
|
|
also alternative options (jump up/out = jump up; jump out).
|
|
|
|
<P>Plus (+) is used in the dictionary, as well as in this documentation,
|
|
in place of ampersand, for compatibility with HTML.
|
|
It is a full separator, between two words, each recognized separately.
|
|
|
|
<P>Hyphen (-) should be used in the dictionary only to break
|
|
into two words in the parse, each recognized separately.
|
|
Thus, book-keeper will appear in the English parse as two words.
|
|
But it is likely that a user looking for an accountant would search
|
|
for bookkeeper, rather than book or keeper.
|
|
The dictionary has not yet been scrubbed for this situation.
|
|
|
|
<P>Parentheses set off both possible supporting words
|
|
(go (down) = go; go down) and explanatory information.
|
|
Since parenthesized words are excluded from the extraction process,
|
|
they are a way to further reduce clutter in the English dictionary.
|
|
Words in the meaning that should not find this entry when searched
|
|
can be excluded from the English dictionary tables by parenthesizing.
|
|
(NOTE: two sets of parentheses not separated by a comma or semicolon
|
|
can cause processing troubles and should be avoided.)
|
|
|
|
<P>Square brackets enclose translation examples or idioms,
|
|
a Latin expression to English equivalent.
|
|
The English translation of the Latin is introduced by =>.
|
|
The parser expects this (=>) token.
|
|
A bracketed expression is always
|
|
at the end of the meanings line so that it may be
|
|
extracted before spellchecking, otherwise the spellcheck
|
|
will fail on the Latin and there are an inconveniently large
|
|
number of these examples.
|
|
Brackets should never be used where parentheses are appropriate.
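<P>The punctuation rules given so far can be sketched in a few lines of Python. This is only an illustration of the conventions, not the MAKEEWDS program; the function name parse_mean and the sample MEAN used below are hypothetical.

```python
import re

def parse_mean(mean):
    """Return (word, semi_number, is_pure) for each searchable English word."""
    mean = mean.lstrip("|")                    # drop continuation flags
    mean = re.sub(r"\[.*?\]", " ", mean)       # drop bracketed Latin examples
    mean = re.sub(r"\([^()]*\)", " ", mean)    # parenthesized words are excluded
    found = []
    semis = [s for s in mean.split(";") if s.strip()]
    for semi_number, semi in enumerate(semis, start=1):
        # Commas and slashes both set off alternatives within a SEMI.
        for group in re.split(r"[,/]", semi):
            # Hyphen and plus are not matched, so they separate words.
            words = re.findall(r"[A-Za-z']+", group)
            for word in words:
                found.append((word.lower(), semi_number, len(words) == 1))
    return found
```

<P>Applied to the made-up line "perhaps, it may be; possibly (with doubt); [forsitan dixerit => he may say];" this returns 'perhaps' as a PURE word in the first SEMI, the three words of 'it may be' as compound, and 'possibly' as PURE in the second SEMI.
One known simplification: in 'jump up/out' the sketch treats 'out' as a PURE word rather than expanding it to 'jump out'.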
|
|
|
|
<P>Generally, articles (a, an, the) are omitted in meanings.
|
|
While this compresses the line, it also reflects the fact
|
|
that Latin does not distinguish between those uses.
|
|
To define agricola as 'a' farmer would disparage the possibility
|
|
of the proper translation being 'the' farmer. Most dictionaries
|
|
report nouns without an article. This one goes further and
|
|
avoids the use of articles almost everywhere.
|
|
|
|
<P>Some dictionaries prefix verb meanings with TO.
|
|
This is superfluous, except in the case of a list of meanings
|
|
not distinguished by part of speech (to cut, a cut), not
|
|
the situation for this dictionary.
|
|
|
|
<P>Vertical bars at the beginning indicate continuation meaning lines.
|
|
There may be several continuation lines,
|
|
numbered/ordered by the number of leading vertical bars.
|
|
For words with a large number of meanings,
|
|
additional meaning lines are provided by another entry for the same stems
|
|
and part with what amounts to a continuation line for MEAN.
|
|
In order to associate the resulting series of meaning lines,
|
|
a vertical bar (|) is placed at the beginning of the
|
|
first continuation MEAN, two bars for the second, etc.
|
|
The dictionary is sorted so as to assure that
|
|
these entries are grouped and ordered.
|
|
This allows checking of the dictionary for spurious duplicate entries
|
|
without flagging intended continuation entries.
|
|
Further it facilitates compression
|
|
of the WORDS output by combining the inflection output for the several
|
|
identical parts followed by the group of meaning lines.
|
|
The STEMS and PART are identical for the base (no |) and all extensions.
|
|
They are all the same word, however they may have different flags,
|
|
that is, there may be different meanings for different AGE or AREA.
|
|
|
|
<P>The bar is a code for MEAN continuation seen only in the raw DICTLINE.
|
|
Bars are removed before WORDS output and are not visible to the user.
|
|
There are also some entries with identical STEMS and PART which are
|
|
really different words,
|
|
different derivation and completely different meanings.
|
|
These will not be | coded and will be reported separately in output.
|
|
(NOTE: The vertical bar should not appear anywhere in meanings except
|
|
at the beginning as a continuation flag.)
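<P>A tool working on the raw dictionary might gather a base entry and its continuations along the following lines. This Python sketch assumes the records are already sorted so that continuation lines directly follow their base entry; the record layout (stems, part, mean) and the function names are hypothetical, and the sample data is invented.

```python
def bar_level(mean):
    """Number of leading vertical bars: 0 for a base entry, 1+ for continuations."""
    return len(mean) - len(mean.lstrip("|"))

def group_meanings(entries):
    """Group (stems, part, mean) records into words with their meaning lines."""
    groups = []
    for stems, part, mean in entries:
        level = bar_level(mean)
        text = mean.lstrip("|")  # bars are never shown to the user
        same_word = bool(groups) and groups[-1]["stems"] == stems and groups[-1]["part"] == part
        if level > 0 and same_word:
            # A barred line extends the entry just before it.
            groups[-1]["means"].append(text)
        else:
            # No bar, or no matching base: treat as a separate word.
            groups.append({"stems": stems, "part": part, "means": [text]})
    return groups
```

<P>Two records with identical STEMS and PART but no bar stay separate, as the text above requires for words of different derivation.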
|
|
|
|
<P>Correct use of symbols/codes in MEAN is very important.
|
|
One must not use them 'free form'.
|
|
They are used in the parsing of MEAN and improper use can defeat a processing program.
|
|
While some main programs have many built-in checks,
|
|
there are a number of secondary tools which are not so 'fool proof'.
|
|
MAKEEWDS is a complicated program which I did not do well.
|
|
If it hits something strange it might well fail to properly
|
|
parse that MEAN. The program will still complete and the
|
|
output will only lose a part of the strange MEAN, affecting only the English mode,
|
|
and may or may not give a report on the failure.
|
|
|
|
|
|
<A NAME="Ordering English-to-Latin Output">
|
|
<H4>Ordering English-to-Latin Output</H4></A>
|
|
<BR>
|
|
|
|
|
|
<P>Essentially we start by associating English words in the dictionary entry meaning
|
|
with the entry number (line number in DICTLINE).
|
|
The list of English words (EWDSLIST) is sorted so that all occurrences of a particular word are together.
|
|
Then, upon inquiry, a list of the associated Latin dictionary entries is output.
|
|
Unfortunately this list could be large (a hundred or more for some common words) and thereby user-unfriendly.
|
|
The task is to order the list and reduce the output to a few most likely entries.
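<P>The association and sort just described amount to building an inverted index. The Python sketch below shows the idea on three invented MEAN lines; the names build_ewds and lookup, the stop-word set, and the sample data are all hypothetical, and the real WEED list is longer.

```python
import re

# Illustrative stop words only.
STOP_WORDS = {"a", "an", "the", "to"}

def build_ewds(dictline):
    """dictline: list of MEAN strings; the entry number is the 1-based line number."""
    pairs = set()
    for number, mean in enumerate(dictline, start=1):
        for word in re.findall(r"[a-z]+", mean.lower()):
            if word not in STOP_WORDS:
                pairs.add((word, number))
    # Sorting brings all occurrences of a word together.
    return sorted(pairs)

def lookup(ewds, word):
    """Entry numbers whose MEAN contains exactly this English word."""
    return [number for w, number in ewds if w == word.lower()]

DICTLINE = ["water;", "love, like; fall in love with;", "love; affection;"]
EWDS = build_ewds(DICTLINE)
```

<P>Here lookup(EWDS, "love") returns entries 2 and 3; ranking those hits is the subject of the rest of this section.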
|
|
|
|
|
|
<P>Priorities for display are based on frequencies. Besides the basic
|
|
FREQ assigned to the entry, it is presumed that the frequency is
|
|
greater for those meanings in the first SEMIs, with gradually
|
|
decreasing frequency assigned to later SEMIs and to bar flagged continuations.
|
|
The algorithm presently used is summarized below,
|
|
but it is subject to modification in future versions.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
<P>Each English word found is given a numerical RANK/priority/weight based on the algorithm below.
|
|
The numerical values of each consideration are added or subtracted to give the priority of the entry.
|
|
|
|
|
|
|
|
<H5>FREQ</H5>
|
|
|
|
<P>The obvious choice for frequency weights might be the comparative paper dictionary citations,
|
|
which would be roughly:
|
|
|
|
<PRE><TT>
|
|
A=>50
|
|
B=>25
|
|
C=>10
|
|
D=> 5
|
|
E=> 3
|
|
F=> 1</TT></PRE>
|
|
|
|
|
|
<P>However these would weight the A frequency so heavily
|
|
that it would be impossible to overcome with anything
|
|
that could be applied to lower frequencies. So we must reject this scale
|
|
for a more manageable set:
|
|
|
|
<PRE><TT>
|
|
A=>70
|
|
B=>60
|
|
C=>50
|
|
D=>40
|
|
E=>30
|
|
F=>20
|
|
etc.</TT></PRE>
|
|
|
|
|
|
<P>(N is a special case: add 25 after the formula.)
|
|
|
|
<H5>Compounds</H5>
|
|
|
|
<P>Compound words ('very tall' vs. 'tall') are often useful,
|
|
indeed the user may be looking for components to make up a compound translation,
|
|
however generally they should be disparaged relative to the pure/simple word.
|
|
A compound A FREQ might be no better than a pure D.
|
|
<BR>
|
|
<BR>
|
|
Compound Yes=> 0<BR>
|
|
Compound No (Pure) => 10<BR>
|
|
<BR>
|
|
|
|
Which SEMI (a SEMI is a part of MEAN set off by semicolons)<BR>
|
|
<BR>
|
|
|
|
-3 per SEMI after 1
|
|
|
|
<P>Further, a word on a continuation line is disparaged by 3 SEMIs (-9).
|
|
|
|
|
|
<P>The words in the first SEMI are enhanced in the expectation that they are
|
|
the primary meaning. This follows the tenuous idea that there is a single simple
|
|
translation for each English word. At least the first SEMI is emphasized. <BR>
|
|
<BR>
|
|
|
|
If PURE and 1st SEMI => 5<BR>
|
|
|
|
<P>Priority = FREQ value + Compound value + SEMI value + Continuation value + First SEMI value <BR>
|
|
<BR>
|
|
|
|
|
|
Example: for lamp - lanum N 3 2 N
|
|
|
|
<PRE><TT>
|
|
FREQ A => 70
|
|
Compound No => 10
|
|
Semi 2 => -3
|
|
Continuation Line No  =>    0
|
|
Pure 1st No => 0
|
|
RANK/priority => 77</TT></PRE>
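<P>The arithmetic of the example can be checked with a short Python sketch of the formula as summarized above. The function name and argument names are hypothetical, only the FREQ codes A through F are covered (the special case N is left out), and the weights may differ in later versions.

```python
# Weights as listed in the FREQ table above (A-F only).
FREQ_WEIGHT = {"A": 70, "B": 60, "C": 50, "D": 40, "E": 30, "F": 20}

def rank(freq, pure, semi, continuation=False):
    """Priority of one English word found in a MEAN.

    freq: FREQ code of the entry; pure: True if the word is not part of a
    compound; semi: 1-based number of the SEMI; continuation: True if the
    word sits on a bar-flagged continuation line.
    """
    score = FREQ_WEIGHT[freq]
    score += 10 if pure else 0                 # Compound No (Pure) => 10
    score -= 3 * (semi - 1)                    # -3 per SEMI after the first
    score -= 9 if continuation else 0          # continuation counts as 3 SEMIs
    score += 5 if (pure and semi == 1) else 0  # PURE and 1st SEMI => 5
    return score
```

<P>rank("A", True, 2) reproduces the 77 of the example; the same word in the first SEMI would score 85.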
|
|
|
|
|
|
|
|
|
|
|
|
<A NAME="TESTS AND STATUS">
|
|
<H3><CENTER>TESTS AND STATUS</CENTER>
|
|
</H3></A> <BR>
|
|
|
|
<A NAME="Testing">
|
|
<H4>Testing</H4></A>
|
|
|
|
<P>
|
|
The program has been run against common classical texts. Initially
|
|
this was mostly a check of the process and reliability of the program. It
|
|
is now possible to run real texts and get valid statistics. Relatively
|
|
few texts have been run multiple times in order to understand exactly
|
|
where failure occurred and to regression test the solutions. Such testing
|
|
has taken place on texts totaling well over a million words. The best
|
|
results come from those which have been run the most times. Caesar and
|
|
the Vulgate are essentially without unknowns (excluding proper names),
|
|
Suetonius and Virgil are at the 0.1% level, Varro and Pliny have somewhat
|
|
more than 1% unknowns due to their specialized vocabulary. While this is
|
|
a mechanical test and does not assure that the form and meaning reported
|
|
by the program is always correct, the actual number of misses found by
|
|
limited detailed examination is vanishingly small.
|
|
|
|
<P>
|
|
A far larger test (with feedback) has been made by John White in the development
|
|
of his Blitz Latin. While not using WORDS, he has a program from much the
|
|
same basis, incorporating approximately the WORDS dictionary. He has run
|
|
a much larger set of texts, including both classical and medieval, to the
|
|
extent of 20 million Latin words, and provided significant unknowns
|
|
to be included in WORDS.
|
|
|
|
<P>
|
|
The hardest test is against another dictionary. While getting a 97+% hit
|
|
rate on long classical texts, a run against a large dictionary might fall
|
|
to 85-90%, the missing words being in those letters which the update has
|
|
not reached. This is to be expected, since we both have the 10000 most
|
|
common words and have made somewhat different additions beyond that. So
|
|
large electronic wordlists are a check on the program, and have been reserved
|
|
for that purpose, not simply incorporated as such.
|
|
|
|
<P>We have gone so far that this is no longer significant and wordlists can be
|
|
integrated. The only real impact has been the inclusion of modern Latin words
|
|
which come from such lists, and not from scans of texts.
|
|
|
|
<BR>
|
|
<BR>
|
|
|
|
|
|
|
|
<H5>English-to-Latin Tests</H5>
|
|
|
|
<P>So far there has been no formal validation of the English-to-Latin capability.
|
|
There have been numerous individual checks and anecdotal testing, as well
|
|
as some mechanical performance tests, but nothing fundamental.
|
|
|
|
<P>The first test proposed is to take a small English-to-Latin dictionary,
|
|
say from the back of an introductory textbook, and check that the Latin
|
|
suggested for each entry is found in the top six returned by WORDS.
|
|
It is expected that there will be a high correspondence (to be shown).
|
|
Taking a much larger example may give a different result.
|
|
It may be that the Latin words chosen by WORDS are not the same as
|
|
the paper dictionary.
|
|
|
|
<BR><BR>
|
|
|
|
|
|
<A NAME="Current Status and Future Plans">
|
|
|
|
<H4>Current Status and Future Plans</H4></A>
|
|
<P>
|
|
The present phase of refinement has incorporated the Oxford Latin
|
|
Dictionary and Lewis and Short entries into <B>D</B> (about a fourth).
|
|
Periodically, when I need a change of task, I run a major author
|
|
to check the
|
|
effectiveness of the code. I may then include some words which turn up
|
|
frequently as unknowns, but this is done as the spirit moves me. Smaller
|
|
sections of later authors may also be processed, giving some growth in
|
|
medieval Latin entries. Recently I have worked the Vulgate of St. Jerome.
|
|
|
|
<P>John White in support of his Blitz Latin program has run a very large
|
|
body of Latin text, including many medieval legal documents. He provides
|
|
input to the dictionary as he finds significant unknowns.
|
|
|
|
<P>
|
|
I will continue to refine the dictionary and the program. The major goal
|
|
is to complete the inclusion of OLD and L+S, and this may take years.
|
|
Along the way, and later, I will expand to medieval Latin. I am not so
|
|
unrealistic as to believe that I will 'finish', indeed, this is a hobby
|
|
and there is no advantage to finishing.
|
|
<P>
|
|
An eventual outcome would be to have some institution, with real Latin
|
|
capability, provide an exhaustive and authoritative program of this
|
|
nature. Until then, I and other individuals will make available our
|
|
programs. <BR>
|
|
<BR><BR>
|
|
|
|
<A NAME="USER MODIFICATIONS">
|
|
<H3><CENTER>USER MODIFICATIONS</CENTER>
|
|
</H3></A>
|
|
<BR>
|
|
<A NAME="Writing DICT.LOC and UNIQUES.LAT">
|
|
<H4>Writing DICT.LOC and UNIQUES.LAT</H4></A>
|
|
<P>
|
|
To make the dictionary files used by the program is not difficult, but it
|
|
takes several auxiliary programs for checking and ordering which are best
|
|
handled by one center. These are available to anyone who needs them, but
|
|
it is better that any general additions to the dictionary be handled
|
|
centrally so that they can be included in the public release for everyone.
|
|
<P>
|
|
However, it is possible for a user to enhance the dictionary for special
|
|
situations. This may be accomplished either by providing new dictionary
|
|
entries in a DICT.LOC file, those to be processed in the regular manner,
|
|
or by adding a unique (single case/number/gender/...) in a text file called
|
|
UNIQUES.LAT. <A NAME="DICT.LOC">
|
|
|
|
<H4>DICT.LOC</H4></A>
|
|
<P>
|
|
A dictionary entry for WORDS (in the simplest, editable form as read in a
|
|
DICT.LOC) is
|
|
|
|
<PRE><TT>
|
|
aqu aqu
|
|
N 1 1 F T X X X X X
|
|
water;
|
|
|
|
</TT></PRE>
|
|
|
|
<P>
|
|
For a noun there are two stems. The definition of STEM is inherent in
|
|
the coding of inflections in the program. Different grammars have
|
|
different definitions. There is no formal connection with any other
|
|
usage.
|
|
<P>
|
|
To these stems are applied, as appropriate, the endings
|
|
|
|
<PRE><TT>
|
|
S P
|
|
NOM a ae
|
|
GEN ae arum
|
|
DAT ae is
|
|
ACC am as
|
|
ABL a is
|
|
</TT></PRE>
|
|
|
|
<P>
|
|
Or rather, the input word is analyzed for possible endings, and when these
|
|
are subtracted a match is sought with the dictionary stems. A file
|
|
(INFLECTS.LAT) gives all the endings.
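<P>
The ending-subtraction idea described above can be sketched as follows. This is only an illustration in Python, not the Ada code of WORDS: the table holds just the first declension endings shown above, the one-entry STEMS dictionary is hypothetical, and the real program distinguishes the two noun stems, which this sketch does not.

```python
# First-declension endings from the table above: (case, number) -> ending.
FIRST_DECLENSION = {
    ("NOM", "S"): "a",  ("NOM", "P"): "ae",
    ("GEN", "S"): "ae", ("GEN", "P"): "arum",
    ("DAT", "S"): "ae", ("DAT", "P"): "is",
    ("ACC", "S"): "am", ("ACC", "P"): "as",
    ("ABL", "S"): "a",  ("ABL", "P"): "is",
}

# Hypothetical in-memory dictionary for illustration: stem -> meaning.
STEMS = {"aqu": "water;"}


def analyze(word):
    """Return sorted (stem, case, number, meaning) readings for a word.

    Each possible ending is subtracted from the input word and the
    remainder is looked up among the dictionary stems.
    """
    word = word.lower()
    readings = []
    for (case, number), ending in FIRST_DECLENSION.items():
        if word.endswith(ending):
            stem = word[: len(word) - len(ending)]
            if stem in STEMS:
                readings.append((stem, case, number, STEMS[stem]))
    return sorted(readings)
```

An input such as aquis yields two readings (DAT P and ABL P), which is why a single form is usually ambiguous until the context is considered.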
|
|
|
|
<P>
|
|
In this example, the first line
|
|
<PRE><TT>
|
|
aqu aqu
|
|
</TT></PRE>
|
|
contains the two noun stems for the word found in printed dictionaries as
|
|
|
|
<PRE><TT>
|
|
aqua, -ae
|
|
</TT></PRE>
|
|
|
|
<P>
|
|
The second line
|
|
|
|
<PRE><TT>
|
|
N 1 1 F T X X X X X
|
|
</TT></PRE>
|
|
says it is a noun (N), of the first declension, first variant, is feminine
|
|
(F), and is a thing (T), as opposed to a person, location, etc. The X X X
|
|
X X represents coding about the age in which it is applicable, the
|
|
geographic and application area of the word, its frequency of use, and the
|
|
dictionary source of the entry. None of this is necessary in a DICT.LOC
|
|
although something must be filled in and X X X X X is always satisfactory.
|
|
|
|
<P>
|
|
The last line is the English definition. It can be as long as 80
|
|
characters.
|
|
|
|
<PRE><TT>
|
|
water;
|
|
</TT></PRE>
|
|
|
|
<P>
|
|
The case and exact spacing of the stems and codes is unimportant, as long
|
|
as they are separated by at least one blank.
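<P>
A small reader for this three-line form might look like the sketch below. It is written in Python purely for illustration. The names Entry and parse_entries are hypothetical, and skipping blank lines and normalizing case are assumptions of the sketch rather than documented behavior of WORDS.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    stems: list    # e.g. ["aqu", "aqu"]
    codes: list    # e.g. ["N", "1", "1", "F", "T", "X", "X", "X", "X", "X"]
    meaning: str   # English definition, at most 80 characters


def parse_entries(text):
    """Parse three-line entries (stems, codes, meaning) from DICT.LOC-style text."""
    # Assumption: blank lines between entries are ignored.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) % 3 != 0:
        raise ValueError("expected groups of three non-blank lines")
    entries = []
    for i in range(0, len(lines), 3):
        # Case and exact spacing are unimportant, so split on any
        # whitespace and normalize the case.
        stems = lines[i].lower().split()
        codes = lines[i + 1].upper().split()
        meaning = lines[i + 2].strip()
        if len(meaning) > 80:
            raise ValueError("meaning longer than 80 characters")
        entries.append(Entry(stems, codes, meaning))
    return entries
```

Running a draft DICT.LOC through such a check before loading it is one way to catch a missing line, since the Ada reader expects the exact form.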
|
|
<P>
|
|
The PART_OF_SPEECH_TYPE values that you are most interested in are (X, N, ADJ,
|
|
ADV, V). X is always a valid entry. It stands for none, or all, or
|
|
unknown. 0 has the same function for numeric types.
|
|
<P>
|
|
The others in the type (PRON, PACK, VPAR, SUPINE, PREP, CONJ, INTERJ, NUM,
|
|
TACKON, PREFIX, SUFFIX) are either less interesting or artificial, used
|
|
only internally to the code.
|
|
<P>
|
|
A noun or a verb has a DECN_RECORD consisting of two small integers. The
|
|
first is the declension/conjugation, and the second is a variant within
|
|
that.
|
|
<P>
|
|
N 1 1 is the conventional first declension. But there are variants (6, 7,
|
|
8) which model Greek-like declensions. (Greek-like variants start at 6.)
|
|
<P>
|
|
N 2 1 is the regular -us, -i second declension.
|
|
<P>
|
|
N 2 2 is the regular -um, -i neuter form.
|
|
<P>
|
|
There is a N 2 3 for 'r' forms like puer, pueri. In this case there is
|
|
the possibility of a difference in stems (ager, agri has stems coded as
|
|
ager, agr).
|
|
<P>
|
|
Again there are Greek-like variants (6, 7, 8, 9).
|
|
<P>
|
|
N 3 1 is regular third declension (lex, legis - lex, leg) for masculine
|
|
and feminine.
|
|
<P>
|
|
N 3 2 is for neuter (iter, itineris - iter, itiner).
|
|
<P>
|
|
Variants 3 and 4 are for I-stems. And so it goes.
|
|
<P>
|
|
Each noun has a GENDER_TYPE (X, M, F, N, C). X for unknown (something I
|
|
avoid for gender - guess if you have to) or all genders (useful in the
|
|
code but not in a dictionary), and C for common (M + F).
|
|
<P>
|
|
There is also a
|
|
|
|
<PRE><TT>
|
|
NOUN_KIND_TYPE (X, -- unknown, nondescript
|
|
N, -- proper Name
|
|
L, -- Locale, country, city
|
|
W, -- a place Where
|
|
P, -- a Person type
|
|
T) -- a Thing
|
|
</TT></PRE>
|
|
which you probably do not care about either. Most entries will be
|
|
Thing.
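<P>
As an illustration of how the noun code line fits together, the following Python sketch checks a line against the GENDER_TYPE and NOUN_KIND_TYPE values listed above. The function name describe_noun_codes is hypothetical, and the assumption that a noun line has exactly ten fields is taken from the examples in this section only.

```python
GENDERS = {
    "X": "unknown or all",
    "M": "masculine",
    "F": "feminine",
    "N": "neuter",
    "C": "common (M + F)",
}

NOUN_KINDS = {
    "X": "unknown, nondescript",
    "N": "proper name",
    "L": "locale, country, city",
    "W": "a place where",
    "P": "a person type",
    "T": "a thing",
}


def describe_noun_codes(line):
    """Split a noun code line such as 'N 1 1 F T X X X X X' into named parts."""
    tokens = line.upper().split()
    # Assumption: N, two integers, gender, kind, then five dictionary codes.
    if len(tokens) != 10 or tokens[0] != "N":
        raise ValueError("expected a noun line with ten fields")
    gender, kind = tokens[3], tokens[4]
    if gender not in GENDERS:
        raise ValueError("unknown gender code: " + gender)
    if kind not in NOUN_KINDS:
        raise ValueError("unknown noun kind code: " + kind)
    return {
        "declension": int(tokens[1]),
        "variant": int(tokens[2]),
        "gender": GENDERS[gender],
        "kind": NOUN_KINDS[kind],
        "dictionary_codes": tokens[5:],
    }
```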
|
|
<P>
|
|
Other codes are enumerated in the body of this document.
|
|
<P>
|
|
Verbs are done likewise, but there are four stems, as described below. An
|
|
example is
|
|
|
|
<PRE><TT>
|
|
am am amav amat
|
|
V 1 1 X X X X A O
|
|
love;
|
|
</TT></PRE>
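<P>
For regular verbs the four stems can be derived mechanically from the principal parts, following the stem rules given in the quick reference later in this section. The Python sketch below is an illustration only. The function verb_stems is hypothetical, it takes the WORDS conjugation number (so audio counts as 3), and it does not handle deponent or irregular verbs.

```python
def verb_stems(first, infinitive, perfect, participle, conjugation):
    """Derive the four WORDS stems from the four principal parts.

    Covers regular verbs only; deponent (DEP) and irregular verbs
    such as sum need hand coding.
    """
    if not first.endswith("o"):
        raise ValueError("first principal part should end in 'o'")
    stem1 = first[:-1]
    # Second conjugation: the characteristic 'e' lives in the inflection.
    if conjugation == 2 and stem1.endswith("e"):
        stem1 = stem1[:-1]

    if infinitive[-3:] not in ("are", "ere", "ire"):
        raise ValueError("infinitive should end in -are, -ere or -ire")
    stem2 = infinitive[:-3]

    if not perfect.endswith("i"):
        raise ValueError("third principal part should end in 'i'")
    stem3 = perfect[:-1]

    if participle[-2:] not in ("us", "um"):
        raise ValueError("fourth principal part should end in -us or -um")
    stem4 = participle[:-2]

    return (stem1, stem2, stem3, stem4)
```

For the example above, verb_stems("amo", "amare", "amavi", "amatus", 1) gives the stems am, am, amav, amat.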
|
|
|
|
<P>
|
|
Now comes the hard part. When starting from a dictionary one has all the
|
|
information to decide the values. Just having a single instance of the
|
|
word lacks a lot. Consider some examples from a user.
|
|
<P>
|
|
Elytris is surely from the Greek for sheath. The question is how
|
|
Latinized did it get. I suspect that by the 17th century it was
|
|
completely Latinized. Even in classical times there was very little left
|
|
in the way of Greek forms: elythris (or -es), elythris (N 3 3), but it
|
|
could be a Greek-like form (N 3 9). I do not even know what case I
|
|
started with: if NOM, then it must be -is, -is; if GEN, then -es, -is is
|
|
reasonable. Then again, if it is DAT P we might have a N 1 1.
|
|
<P>
|
|
All this seems very uncertain, and, in the absence of a real dictionary
|
|
entry, it is. However you can make the choices such that the result (the
|
|
output of the code) matches exactly what you have. If you have more
|
|
information, lots of examples, the uncertainty shrinks. If you have just
|
|
a single isolated example, there are limits. (But if you do 100 and have
|
|
more information about some, you can make better guesses about the rest.)
|
|
<P>
|
|
Next we need a gender. It may not make much difference (if M or F, or C)
|
|
in this case, but sometimes it matters. You might be able to figure that
|
|
out from the text.
|
|
<P>
|
|
It is a thing (T), but X will work for your purposes. For the rest, X X X
|
|
X X works fine.
|
|
<P>
|
|
So we have
|
|
|
|
<PRE><TT>
|
|
elythris elythr
|
|
N 3 3 F T X X X X X
|
|
elytra, wing cover of beetles
|
|
</TT></PRE>
|
|
|
|
<P>
|
|
Sat, I happen to know, is an abbreviated form of satis, so it is easy. If
|
|
you want the adverb form, as you indicate:
|
|
|
|
<PRE><TT>
|
|
Sat
|
|
ADV POS X X X X X
|
|
sufficiently, adequately; quite, well enough; fairly, (moderately)
|
|
</TT></PRE>
|
|
|
|
<P>
|
|
Adverbs have a comparison parameter (X, POS, COMP, SUPER). Most will be
|
|
POS.
|
|
<P>
|
|
It also is an indeclinable (N 9 9) substantive:
|
|
|
|
<PRE><TT>
|
|
sat
|
|
N 9 9 N T X X X X X
|
|
enough, sufficient; enough and some to spare; one of sufficient power
|
|
|
|
</TT></PRE>
|
|
|
|
<P>
|
|
Deplanata seems to be a 1-2 declension adjective, the -us, -a, -um form.
|
|
It also seems to be derived from the verb deplanto (V 1 1) - break off/sever
|
|
(branch/shoot).
|
|
|
|
<PRE><TT>
|
|
deplanat deplanat
|
|
ADJ 1 1 POS X X X X X
|
|
broken off/severed (branch/shoot); (flattened)
|
|
</TT></PRE>
|
|
|
|
<P>
|
|
Adjectives have a DECN and a comparison.
|
|
<P>
|
|
The following were not at the time in the dictionary, but were in the OLD.
|
|
|
|
<PRE><TT>
|
|
alat alat
|
|
ADJ 1 1 POS X X X X X
|
|
winged, having wings; having a broad/expanded margin
|
|
|
|
|
|
(punct - ul - at -> hole/prick/puncture - small - having)
|
|
|
|
punctulat punctulat
|
|
ADJ 1 1 POS X X X X X
|
|
punctured; having small holes/pricks/stabs/punctures
|
|
|
|
appendiculat appendiculat
|
|
ADJ 1 1 POS X X X X X
|
|
appendiculate; having/fringed by small appendages/bodies
|
|
|
|
|
|
acetabul acetabul
|
|
N 2 2 N T X X X X X
|
|
small cup (vinegar), 1/8 pint; cupped part (plant); sucker; socket, (cavity)
|
|
|
|
|
|
ruf ruf
|
|
ADJ 1 1 POS X X X X X
|
|
red (various); tawny; red-haired (persons); strong yellow/moderate orange
|
|
|
|
|
|
testace testace
|
|
ADJ 1 1 POS X X X X X
|
|
bricks; resembling bricks (esp. color); having hard covering/shell (animals)
|
|
</TT></PRE>
|
|
|
|
<P>
|
|
This one had no classical correspondence.
|
|
|
|
<PRE><TT>
|
|
brunne brunne
|
|
ADJ 1 1 POS X X X X X
|
|
brown
|
|
</TT></PRE>
|
|
|
|
<P>
|
|
There is one other remark. It is probably wise to include in the
|
|
definition a more complete English meaning. Just saying the meaning of appendiculatus is
|
|
appendiculate is not as interesting as it might be.
|
|
<P>
|
|
All the inflections are in a file called INFLECTS.LAT, now a part of the
|
|
general distribution of <A HREF="http://www.erols.com/whitaker/wordsall.zip">source code and data files</A>. <BR>
|
|
|
|
<P>
|
|
Here is a quick reference for the most common types.
|
|
|
|
<PRE><TT>
|
|
|
|
-- All first declension nouns - N 1 1
|
|
-- Ex: aqua aquae => aqu aqu
|
|
|
|
-- Second declension nouns in "us" - N 2 1
|
|
-- Ex: amicus amici => amic amic
|
|
|
|
-- Second declension neuter nouns - N 2 2
|
|
-- Ex: verbum verbi => verb verb
|
|
|
|
-- Second declension nouns in "er" whether or not the "er" is in the base - N 2 3
|
|
-- Ex: puer pueri => puer puer
|
|
-- Ex: ager agri => ager agr
|
|
|
|
-- Early (BC) 2nd declension nouns in ius/ium (not filius-like) - N 2 4
|
|
-- for the most part formed GEN S in 'i', not 'ii' -- G+L 33 R 1
|
|
-- Dictionaries often show as ...(i)i
|
|
-- N 2 4 uses GENDER discrimination to reduce to single VAR
|
|
-- Ex: radius rad(i)i => radi radi M
|
|
-- Ex: atrium atr(i)i => atri atri N
|
|
|
|
-- Third declension M or F nouns whose stems end in a consonant - N 3 1
|
|
-- Ex: miles militis => miles milit
|
|
-- Ex: lex legis => lex leg
|
|
-- Ex: frater fratris => frater fratr
|
|
-- Ex: soror sororis => soror soror
|
|
-- All third declension that have the endings -udo, -io, -tas, -x
|
|
-- Ex: pulcritudo pulcritudinis => pulcritudo pulcritudin
|
|
-- Ex: legio legionis => legio legion
|
|
-- Ex: varietas varietatis => varietas varietat
|
|
-- Ex: radix radicis => radix radic
|
|
|
|
-- Third declension N nouns with stems ending in a consonant - N 3 2
|
|
-- Ex: nomen nominis => nomen nomin
|
|
-- Ex: iter itineris => iter itiner
|
|
-- Ex: tempus temporis => tempus tempor
|
|
|
|
-- Third declension nouns I-stems (M + F) - N 3 3
|
|
-- Ex: hostis hostis => hostis host
|
|
-- Ex: finis finis => finis fin
|
|
-- Consonant i-stems
|
|
-- Ex: urbs urbis => urbs urb
|
|
-- Ex: mons montis => mons mont
|
|
-- Also use this for present participles (-ns) used as substantives in M + F
|
|
|
|
-- Third declension nouns I-stems (N) - N 3 4
|
|
-- Ex: mare maris => mare mar -- ending in "e"
|
|
-- Ex: animal animalis => animal animal -- ending in "al"
|
|
-- Ex: exemplar exemplaris => exemplar exemplar -- ending in "ar"
|
|
-- Also use this for present participles (-ns) used as substantives in N
|
|
|
|
-- Fourth declension nouns M + F in "us" - N 4 1
|
|
-- Ex: passus passus => pass pass
|
|
-- Ex: manus manus => man man
|
|
|
|
-- Fourth declension nouns N in "u" - N 4 2
|
|
-- Ex: genu genus => gen gen
|
|
-- Ex: cornu cornus => corn corn
|
|
|
|
-- All fifth declension nouns - N 5 1
|
|
-- Ex: dies diei => di di
|
|
-- Ex: res rei => r r
|
|
|
|
|
|
|
|
-- Adjectives will mostly only be POS and have only the first two stems
|
|
-- ADJ X have four stems, zzz stands for any unknown/non-existent stem
|
|
|
|
-- Adjectives of first and second declension (-us in NOM S M) - ADJ 1 1
|
|
-- Two stems for POS, third is for COMP, fourth for SUPER
|
|
-- Ex: malus mala malum => mal mal pei pessi
|
|
-- Ex: altus alta altum => alt alt alti altissi
|
|
|
|
-- Adjectives of first and second declension (-er) - ADJ 1 2
|
|
-- Ex: miser misera miserum => miser miser miseri miserri
|
|
-- Ex: sacer sacra sacrum => sacer sacr zzz sacerri -- no COMP
|
|
-- Ex: pulcher pulchri => pulcher pulchr pulchri pulcherri
|
|
|
|
-- Adjectives of third declension - one ending - ADJ 3 1
|
|
-- Ex: audax (gen) audacis => audax audac audaci audacissi
|
|
-- Ex: prudens prudentis => prudens prudent prudenti prudentissi
|
|
|
|
-- Adjectives of third declension - two endings - ADJ 3 2
|
|
-- Ex: brevis breve => brev brev brevi brevissi
|
|
-- Ex: facilis facile => facil facil facili facilli
|
|
|
|
-- Adjectives of third declension - three endings - ADJ 3 3
|
|
-- Ex: celer celeris celere => celer celer celeri celerri
|
|
-- Ex: acer acris acre => acer acr acri acerri
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
-- Verbs are mostly TRANS or INTRANS, but X works fine
|
|
-- Deponent verbs must have DEP
|
|
-- Verbs have four stems
|
|
-- The first stem is the first principal part (dictionary entry) - less 'o'
|
|
-- For 2nd conj, the 'e' is omitted, for 3rd conj i-stem, the 'i' is included
|
|
-- Third principal part always ends in 'i', this is omitted in stem
|
|
-- Fourth part in dictionary ends in -us (or -um), this is omitted
|
|
-- DEP verbs omit (have zzz) the third stem
|
|
|
|
-- Verbs of the first conjugation -- V 1 1
|
|
-- Ex: voco vocare vocavi vocatus => voc voc vocav vocat
|
|
-- Ex: porto portare portavi portatus => port port portav portat
|
|
|
|
-- Verbs of the second conjugation - V 2 1
|
|
-- The characteristic 'e' is in the inflection, not carried in the stem
|
|
-- Ex: moneo monere monui monitum => mon mon monu monit
|
|
-- Ex: habeo habere habui habitus => hab hab habu habit
|
|
-- Ex: deleo delere delevi deletus => del del delev delet
|
|
-- Ex: iubeo iubere iussi iussus => iub iub iuss iuss
|
|
-- Ex: video videre vidi visus => vid vid vid vis
|
|
|
|
-- Verbs of the third conjugation, variant 1 - V 3 1
|
|
-- Ex: rego regere rexi rectum => reg reg rex rect
|
|
-- Ex: pono ponere posui positus => pon pon posu posit
|
|
-- Ex: capio capere cepi captus => capi cap cep capt -- I-stem too w/KEY
|
|
|
|
-- Verbs of the fourth conjugation are coded as a variant of third - V 3 4
|
|
-- Ex: audio audire audivi auditus => audi aud audiv audit
|
|
|
|
-- Verbs like to be - coded as V 5 1
|
|
-- Ex: sum esse fui futurus => s . fu fut
|
|
-- Ex: adsum adesse adfui adfuturus => ads ad adfu adfut
|
|
|
|
</TT></PRE>
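<P>
The noun part of this quick reference can be expressed as a small table of endings to strip. The Python sketch below is an illustration under stated assumptions: the names NOUN_RULES and noun_stems are hypothetical, only the regular variants listed above are covered, and the N 2 4 and Greek-like variants are left out.

```python
# (declension, variant) -> (nominative ending, genitive ending) to strip.
# None means the whole nominative form is kept as the first stem.
NOUN_RULES = {
    (1, 1): ("a", "ae"),    # aqua aquae      => aqu aqu
    (2, 1): ("us", "i"),    # amicus amici    => amic amic
    (2, 2): ("um", "i"),    # verbum verbi    => verb verb
    (2, 3): (None, "i"),    # ager agri       => ager agr
    (3, 1): (None, "is"),   # lex legis       => lex leg
    (3, 2): (None, "is"),   # iter itineris   => iter itiner
    (3, 3): (None, "is"),   # hostis hostis   => hostis host
    (3, 4): (None, "is"),   # animal animalis => animal animal
    (4, 1): ("us", "us"),   # manus manus     => man man
    (4, 2): ("u", "us"),    # cornu cornus    => corn corn
    (5, 1): ("es", "ei"),   # dies diei       => di di
}


def _strip(form, ending):
    """Remove a required ending, refusing forms that do not carry it."""
    if not form.endswith(ending):
        raise ValueError(f"{form!r} does not end in {ending!r}")
    return form[: len(form) - len(ending)]


def noun_stems(nominative, genitive, declension, variant):
    """Return the two WORDS stems for a noun given NOM S and GEN S."""
    nom_end, gen_end = NOUN_RULES[(declension, variant)]
    stem1 = nominative if nom_end is None else _strip(nominative, nom_end)
    stem2 = _strip(genitive, gen_end)
    return (stem1, stem2)
```

A form that does not fit the chosen declension raises an error, which is a quick hint that a different declension or variant should be tried.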
|
|
|
|
|
|
<A NAME="UNIQUES.LAT">
|
|
<H4>UNIQUES.LAT</H4></A>
|
|
<P>
|
|
There are a few Latin words that cannot be represented with the scheme of
|
|
stems and endings used by the program. For these very few cases, the
|
|
program invokes a unique procedure. The file UNIQUES.LAT contains a list of
|
|
such words and is read in at the loading of the program. This is a simple
|
|
ASCII/DOS text file which the user can augment. It is expected that there
|
|
will be very few occasions to do so, indeed, the tendency has been that
|
|
better processing has allowed uniques to be removed. If a user finds an
|
|
important word that should be included, please communicate that to the
|
|
author.
|
|
<P>
|
|
The UNIQUES record is essentially the form as one might have it in output
|
|
if the word were processed normally. There are also some additional
|
|
fields that the program presently expects. While these could be
|
|
eliminated, it is convenient for the program not to make the UNIQUES a
|
|
special case. So a noun form
|
|
|
|
<PRE><TT>
|
|
N 3 1 ACC S F T
|
|
</TT></PRE>
|
|
is followed by two zeros and an X
|
|
|
|
<PRE><TT>
|
|
N 3 1 ACC S F T 0 0 X
|
|
</TT></PRE>
|
|
and then the five X's or, more properly, the dictionary codes.
|
|
|
|
<PRE><TT>
|
|
N 3 1 ACC S F T 0 0 X X X X B O
|
|
</TT></PRE>
|
|
|
|
<P>
|
|
These pro forma codes are absolutely necessary, but have no further
|
|
impact.
|
|
<P>
|
|
The program is written in Ada and uses Ada techniques. Ada is designed
|
|
for high reliability systems (there is no claim that WORDS was developed
|
|
with all the other safeguards that that implies!) and as a consequence is
|
|
unforgiving. The exact form is required. If you want to be sloppy you
|
|
have to deliberately program that in.
|
|
<P>
|
|
The following examples, and an examination of the UNIQUES.LAT file, should
|
|
allow the user to insert any unique necessary.
|
|
|
|
<PRE><TT>
|
|
requiem
|
|
N 3 1 ACC S F T 0 0 X X X X B O
|
|
rest (from labor), respite; intermission, pause, break; amusement, hobby;
|
|
bobus
|
|
N 3 1 DAT P C T 0 0 X X X X C X
|
|
ox, bull; cow; cattle (pl.)
|
|
quicquid
|
|
PRON 1 6 NOM S N INDEF 0 0 X X X X B X
|
|
whatever, whatsoever; everything which; each one; each; everything; anything
|
|
mavis
|
|
V 6 2 PRES ACTIVE IND 2 S X 0 0 X X X X B X
|
|
prefer
|
|
cette
|
|
V 3 1 PRES ACTIVE IMP 2 P TRANS 0 0 X X X X B O
|
|
give/bring here!/hand over, come (now/here); tell/show us, out with it! behold!
|
|
</TT></PRE>
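<P>
Because the exact form is required, it can help to check a new record before adding it to UNIQUES.LAT. The Python sketch below is an illustration only. The function parse_unique is hypothetical, and it assumes, from the examples above, that every code line ends with the pro forma fields 0 0 X followed by five dictionary codes.

```python
def parse_unique(word_line, code_line, meaning_line):
    """Split a three-line UNIQUES-style record into its parts.

    Assumption drawn from the examples above: the last eight fields of
    the code line are '0 0 X' followed by five dictionary codes.
    """
    tokens = code_line.split()
    if len(tokens) < 9:
        raise ValueError("code line is too short")
    form, tail = tokens[:-8], tokens[-8:]
    if tail[:3] != ["0", "0", "X"]:
        raise ValueError("expected the pro forma fields 0 0 X")
    return {
        "word": word_line.strip(),
        "part_of_speech": form[0],
        "form": form[1:],              # declension/conjugation, case, tense, ...
        "dictionary_codes": tail[3:],  # the five dictionary codes
        "meaning": meaning_line.strip(),
    }
```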
|
|
|
|
<P>
|
|
<BR>
|
|
<BR>
|
|
<A NAME="DEVELOPERS AND REHOSTING">
|
|
|
|
<H3><CENTER>DEVELOPERS AND REHOSTING</CENTER>
|
|
</H3></A>
|
|
|
|
|
|
<A NAME="Program source code and data">
|
|
|
|
<H4>Program source code and data</H4></A>
|
|
<P>
|
|
The program is written in Ada, and is machine independent. Ada source
|
|
code is available for compiling onto other machines. <BR>
|
|
<BR>
|
|
|
|
<A NAME="Licence">
|
|
<H4>Licence</H4></A>
|
|
<P>
|
|
<B>All parts of the WORDS system, source code and data files, are made freely
|
|
available to anyone who wishes to use them, for whatever purpose.</B><BR>
|
|
<BR>
|
|
|
|
<A NAME="Rehosting WORDS">
|
|
<H4>Rehosting WORDS</H4></A>
|
|
<P>
|
|
There is a <A HREF="wordsall.zip"><B>wordsall.zip</B></A>
|
|
zip of all the Ada source files to port WORDS, and
|
|
support programs and data to generate the necessary dictionaries and
|
|
inflections for re-hosting the WORDS Latin dictionary
|
|
parsing/translation system on any machine with an Ada 95 compiler. (It
|
|
can be made to work with Ada 83 also by replacing just the short driver routine.)
|
|
<P>
|
|
This is a console program (keyboard entry), without fancy Windows GUI, and has
|
|
thereby been made system independent.
|
|
<P>
|
|
wordsall contains the Ada source files for WORDS, and complete details for rehosting
|
|
summarized below:
|
|
<P>
|
|
Ada source files for the WORDS system are:
|
|
<PRE>
|
|
strings_package.ads
|
|
strings_package.adb
|
|
latin_file_names.ads
|
|
latin_file_names.adb
|
|
config.ads
|
|
preface.ads
|
|
word_parameters.ads
|
|
developer_parameters.ads
|
|
preface.adb
|
|
put_stat.adb
|
|
word_parameters.adb
|
|
inflections_package.adb
|
|
inflections_package.ads
|
|
dictionary_package.ads
|
|
dictionary_package.adb
|
|
addons_package.ads
|
|
addons_package.adb
|
|
uniques_package.ads
|
|
word_support_package.ads
|
|
word_support_package.adb
|
|
english_support_package.ads
|
|
english_support_package.adb
|
|
word_package.ads
|
|
line_stuff.ads
|
|
line_stuff.adb
|
|
developer_parameters.adb
|
|
tricks_package.ads
|
|
word_package.adb
|
|
tricks_package.adb
|
|
list_package.ads
|
|
list_sweep.adb
|
|
dictionary_form.adb
|
|
search_english.adb
|
|
put_example_line.adb
|
|
list_package.adb
|
|
parse.adb
|
|
words.adb
|
|
</PRE>
|
|
|
|
<P>
|
|
There are four supporting programs
|
|
|
|
<PRE>
|
|
makedict.adb
|
|
makestem.adb
|
|
makeinfl.adb
|
|
makeefil.adb
|
|
</PRE>
|
|
|
|
<P>
|
|
and DOS ASCII data files for them to act upon to produce WORDS data files
|
|
|
|
<PRE>
|
|
DICTLINE.GEN
|
|
STEMLIST.GEN
|
|
EWDSLIST.GEN
|
|
INFLECTS.LAT
|
|
</PRE>
|
|
|
|
<P>
|
|
and other WORDS DOS ASCII supporting files
|
|
|
|
<PRE>
|
|
ADDONS.LAT
|
|
UNIQUES.LAT
|
|
</PRE>
|
|
|
|
<P>
|
|
<P>
|
|
The process is to download the wordsall.zip and unzip into a
|
|
subdirectory. (If the zip form is unsuitable for your system, I can
|
|
provide the files in an uncompressed form.) The wordy file names are for
|
|
compliance with the restrictions of the GNAT system. They may be renamed,
|
|
and I can provide an alternative. However, the long file names demand an
|
|
UNZIP that preserves them, if GNAT is to be used.
|
|
|
|
<P>For example, in a GNAT
|
|
environment (-O3 optimizes if your system supports it):
|
|
|
|
<PRE>
|
|
gnatmake -O3 words
|
|
gnatmake makedict
|
|
gnatmake makestem
|
|
gnatmake makeefil
|
|
gnatmake makeinfl
|
|
</PRE>
|
|
|
|
<P>
|
|
This produces executables for WORDS, MAKEDICT, MAKESTEM, MAKEEFIL, and MAKEINFL.
|
|
Executing the latter four against the input respectively of
|
|
|
|
<PRE>
|
|
DICTLINE.GEN
|
|
STEMLIST.GEN
|
|
EWDSLIST.GEN
|
|
INFLECTS.LAT
|
|
</PRE>
|
|
|
|
<P>
|
|
(when they ask for DICTIONARY say G) producing
|
|
|
|
<PRE>
|
|
DICTFILE.GEN
|
|
STEMFILE.GEN
|
|
INDXFILE.GEN
|
|
EWDSFILE.GEN
|
|
INFLECTS.SEC
|
|
</PRE>
|
|
|
|
<P>
|
|
Along with ADDONS.LAT and UNIQUES.LAT, this is the set of data for WORDS.
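<P>
The relation between the generator programs and the data set can be summarized in a small table. The Python sketch below is an illustration, not part of the distribution. The function check_data_set is hypothetical, and the pairing of each program with its output files is an assumption inferred from the file names listed above.

```python
# program -> (input file, output files).
# Assumption: which program writes which output is inferred from the names.
GENERATORS = {
    "makedict": ("DICTLINE.GEN", ["DICTFILE.GEN"]),
    "makestem": ("STEMLIST.GEN", ["STEMFILE.GEN", "INDXFILE.GEN"]),
    "makeefil": ("EWDSLIST.GEN", ["EWDSFILE.GEN"]),
    "makeinfl": ("INFLECTS.LAT", ["INFLECTS.SEC"]),
}

# Supporting files that WORDS reads as they are.
PLAIN_FILES = ["ADDONS.LAT", "UNIQUES.LAT"]


def check_data_set(present):
    """Given the file names present, return (programs_to_run, missing_files)."""
    present = set(present)
    to_run, missing = [], []
    for program, (source, outputs) in GENERATORS.items():
        if all(name in present for name in outputs):
            continue  # outputs already generated
        if source in present:
            to_run.append(program)
        else:
            missing.append(source)
    missing.extend(name for name in PLAIN_FILES if name not in present)
    return to_run, missing
```

With a freshly unzipped distribution the first list names all four generators and the second list is empty.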
|
|
<P>
|
|
The only problem that has appeared on porting so far is that one must be
|
|
careful of file names. Problems sometimes turn up but have been easily
|
|
rectified by inspection. All of my systems are case-independent on file
|
|
names. If one is running in a case-dependent system (UNIX), this is a
|
|
point to check. Note that the data files are capitalized, source files
|
|
are not.
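<P>
On a case-dependent system a quick check of the data file names may save some inspection. The following Python sketch is an illustration only. The function miscased_data_files is hypothetical, and the list of expected names is simply the data set named in this section.

```python
import os

# Data files named in this section, capitalized as WORDS expects them.
DATA_FILES = [
    "DICTFILE.GEN", "STEMFILE.GEN", "INDXFILE.GEN", "EWDSFILE.GEN",
    "INFLECTS.SEC", "ADDONS.LAT", "UNIQUES.LAT",
]


def miscased_data_files(directory, expected=DATA_FILES):
    """Return (found, expected) pairs whose names differ only in case."""
    wanted = {name.lower(): name for name in expected}
    pairs = []
    for found in sorted(os.listdir(directory)):
        target = wanted.get(found.lower())
        if target is not None and found != target:
            pairs.append((found, target))
    return pairs
```

Each pair reported is a file that an unzip or copy step has renamed, for instance to lower case, and that should be renamed back before running WORDS.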
|
|
|
|
|
|
<P>The source is in Ada and therefore very readable, which is not claimed for the
|
|
logic, which is my, not Ada's, fault. The source and data are freely available
|
|
for anyone to use for any purpose. It may be converted to other
|
|
languages, used in pieces, or modified in any way without further permission
|
|
or notification.
|
|
|
|
<P>There is one oddity that the reader may remark upon. The code is loaded
|
|
with PUT/print statements which are now commented out. These were used at
|
|
some time for debug purposes and were just left in. They (mostly) are left
|
|
justified and may fairly easily be removed for a cleaner presentation.
|
|
Further there are many blocks of code which during development have been
|
|
moved or removed, but have in their previous place been left commented.
|
|
This is also messy. I cannot really justify not having fixed this, but there it is.
|
|
|
|
|
|
<A NAME="Feedback">
|
|
<H4>Feedback</H4></A>
|
|
<P>
|
|
Feedback is invited. If there is a problem in installing or operating, in
|
|
the results or their display, or if your favorite word is omitted from the
|
|
dictionary, please let me know.
|
|
<P>
|
|
All comments are appreciated. Check back for new version releases at<BR>
|
|
<BR>
|
|
<A HREF="http://www.erols.com/whitaker/words.htm">
|
|
http://www.erols.com/whitaker/words.htm</A>
|
|
<P>
|
|
Contact e-mail <A HREF="mailto:whitaker@erols.com"> <B>whitaker@erols.com</B></A>,
|
|
<P>
|
|
or William Whitaker, PO Box 51225 Midland TX 79710 USA. <BR>
|
|
|
|
|
|
</BODY>
|
|
</HTML>
|