Table of Contents
morphy - discussion of WordNet's morphological processing
Although only base forms of words are usually stored in WordNet, searches
may be done on inflected forms. A set of morphology functions, Morphy,
is applied to the search string to generate a form that is present in
WordNet.
Morphology in WordNet uses two types of processes to try to convert
the string passed into one that can be found in the WordNet database. There
are lists of inflectional endings, based on syntactic category, that can
be detached from individual words in an attempt to find a form of the
word that is in WordNet. There are also exception list files, one for
each syntactic category, in which a search for an inflected form is done.
Morphy tries to use these two processes in an intelligent manner to translate
the string passed to the base form found in WordNet. Morphy first checks
for exceptions, then uses the rules of detachment. The Morphy functions
are not independent from WordNet. After each transformation, WordNet is
searched for the resulting string in the syntactic category specified.
The Morphy functions are passed a string and a syntactic category. A
string is either a single word or a collocation. Since some words, such
as axes can have more than one base form (axe and axis ), Morphy works
in the following manner. The first time that Morphy is called with a specific
string, it returns a base form. For each subsequent call to Morphy made
with a NULL
string argument, Morphy returns another base form. Whenever
Morphy cannot perform a transformation, whether on the first call for
a word or subsequent calls, NULL
is returned. A transformation to a
valid English string will return NULL
if the base form of the string
is not in WordNet.
The morphological functions are found in the WordNet
library. See morph(3WN)
for information on using these functions.
The following table shows the rules of detachment used by
Morphy. If a word ends with one of the suffixes, it is stripped from the
word and the corresponding ending is added. Then WordNet is searched for
the resulting string. No rules are applicable to adverbs.
| POS | Suffix | Ending |
| NOUN | "s" | "" |
| NOUN | "ses" | "s" |
| NOUN | "xes" | "x" |
| NOUN | "zes" | "z" |
| NOUN | "ches" | "ch" |
| NOUN | "shes" | "sh" |
| VERB
| "s" | "" |
| VERB | "ies" | "y" |
| VERB | "es" | "e" |
| VERB | "es" | "" |
| VERB
| "ed" | "e" |
| VERB | "ed" | "" |
| VERB | "ing" | "e" |
| VERB | "ing" | "" |
| ADJ | "er" | "" |
| ADJ | "est" | "" |
| ADJ | "er" | "e" |
| ADJ | "est" | "e"
|
There is one exception list file for each syntactic
category. The exception lists contain the morphological transformations
for strings that are not regular and therefore cannot be processed in
an algorithmic manner. Each line of an exception list contains an inflected
form of a word or collocation, followed by one or more base forms. The
list is kept in alphabetical order and a binary search is used to find
words in these lists. See wndb(5WN)
for information on the format of the
exception list files.
In general, single words are relatively
easy to process. Morphy first looks for the word in the exception list.
If it is found the first base form is returned. Subsequent calls with
a NULL
argument return additional base forms, if present. A NULL
is
returned when there are no more base forms of the word.
If the word is
not found in the exception list corresponding to the syntactic category,
an algorithmic process using the rules of detachment looks for a matching
suffix. If a matching suffix is found, a corresponding ending is applied
(sometimes this ending is a NULL
string, so in effect the suffix is
removed from the word), and WordNet is consulted to see if the resulting
word is found in the desired part of speech.
As opposed to
single words, collocations can be quite difficult to transform into a
base form that is present in WordNet. In general, only base forms of words,
even those comprising collocations, are stored in WordNet, such as
attorney general
. Transforming the collocation attorneys general is then simply a matter
of finding the base forms of the individual words comprising the collocation.
This usually works for nouns, therefore non-conforming nouns, such as
customs duty are presently entered in the noun exception list.
Verb collocations that
contain prepositions, such as ask for it, are more difficult. As with
single words, the exception list is searched first. If the collocation
is not found, special code in Morphy determines whether a verb collocation
includes a preposition. If it does, a function is called to try to find
the base form in the following manner. It is assumed that the first word
in the collocation is a verb and that the last word is a noun. The algorithm
then builds a search string with the base forms of the verb and noun,
leaving the remainder of the collocation (usually just the preposition,
but more words may be involved) in the middle. For example, passed
asking for it, the database search would be performed with
ask for it, which is found
in WordNet, and therefore returned from Morphy. If a verb collocation
does not contain a preposition, then the base form of each word in the
collocation is found and WordNet is searched for the resulting string.
Hyphenation also presents special difficulties when searching
WordNet. It is often a subjective decision as to whether a word is hyphenated,
joined as one word, or is a collocation of several words, and which of
the various forms are entered into WordNet. When Morphy breaks a string
into "words", it looks for both spaces and hyphens as delimiters. It also
looks for periods in strings and removes them if an exact match is not
found. A search for an abbreviation like oct. return the
synset for { October, Oct }. Not
every pattern of hyphenated and collocated string is searched for
properly, so it may be advantageous to specify several search strings
if the results of a search attempt seem incomplete.
Since many noun
collocations contains prepositions, such as line of products, an algorithm
similar to that used for verbs should be written for nouns. In the present
scheme, if Morphy is passed lines of products, the search string becomes
line of product, which is not in WordNet
- WNHOME
- Base directory for WordNet. Unix default is /usr/local/wordnet1.6 , PC
default is C:\wn16 , Macintosh default is : .
- WNSEARCHDIR
- Directory in
which the WordNet database has been installed. Unix default is WNHOME/dict
, PC default is WNHOME\dict , Macintosh default is :Database .
In
directory WNSEARCHDIR :
- pos.exc
- morphology exception lists
wn(1WN)
, wnb(1WN)
, binsrch(3WN)
, morph(3WN)
, wndb(5WN)
.
Table of Contents