Implementing Synonyms for ERG Predicates
Implementing each predicate in Prolog took time, so I wanted to make sure the engine took good advantage of synonyms. For example, after implementing d_get_v_1__exx for phrases like “get the diamond”, I wanted “X the diamond” to work for free, where X is any of “acquire”, “take”, “collect”, “pick up”, “gather up”, “get hold”, “obtain”, “retrieve”, “fetch”, “receive”, “capture”, “seize”, “prehend”, “clutch”, or “grab”. It wasn’t as simple as autocorrecting “take the diamond” to “get the diamond”, because “take” is sometimes used in a sense that isn’t synonymous with my implementation of d_get_v_1__exx.
Luckily, the ERG encodes a lot of information in the name and arguments of a predicate. For example, the predicate d_get_v_1__exx indicates that it:
- has the lemma get
- is a verb, because of the _v_ in the name
- is used in a particular way, because of its argument signature exx (an event followed by two instance arguments, matching ARG0, ARG1 and ARG2 in the MRS)
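As a rough illustration, the pieces of information listed above can be pulled out of a predicate name with a regular expression. This is a minimal sketch, not the engine's code. It assumes names have the shape _lemma_pos_sense__signature, with an optional leading d as in d_get_v_1__exx, and that the sense is optional. PREDICATE_RE, PredicateInfo and parse_predicate are hypothetical names. Abstract predicates such as pron__x simply don't match, and the sketch doesn't try to cover every ERG naming variation.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern: optional "d" prefix, then _lemma_pos[_sense]__signature,
# e.g. "d_get_v_1__exx" or "_the_q__xhh".
PREDICATE_RE = re.compile(
    r"^d?_(?P<lemma>[^_]+)_(?P<pos>[a-z])(?:_(?P<sense>[^_]+))?__(?P<signature>[a-z]+)$"
)


@dataclass(frozen=True)
class PredicateInfo:
    lemma: str               # e.g. "get"
    pos: str                 # part of speech letter, e.g. "v"
    sense: Optional[str]     # e.g. "1", or None when absent
    signature: str           # argument types, e.g. "exx"


def parse_predicate(name: str) -> Optional[PredicateInfo]:
    """Split a predicate name into lemma, part of speech, sense and signature.

    Returns None for names that don't follow the pattern (e.g. "pron__x").
    """
    m = PREDICATE_RE.match(name)
    if m is None:
        return None
    return PredicateInfo(m["lemma"], m["pos"], m["sense"], m["signature"])


print(parse_predicate("d_get_v_1__exx"))
# PredicateInfo(lemma='get', pos='v', sense='1', signature='exx')
```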
In practice, this information turned out to be enough to do a great job of picking synonyms. Because it lives in the MRS predicates, synonyms are applied after the phrase has been parsed into an MRS, not before (as autocorrect is). Here’s the approach I used:
Let’s say the user typed “take the diamond” and the MRS predicate d_take_v_1__exx was not implemented. The algorithm:
- Finds any implemented predicates that list the unimplemented lemma (e.g. take) as a synonym. In this case, d_get_v_1__exx does.
- Checks that the part of speech (e.g. _v_) matches on both
- Checks that their argument signatures are the same (e.g. exx)
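The three checks could be sketched in Python along these lines. This is not the engine's actual (Prolog) code. It assumes a hypothetical synonyms dictionary that maps each implemented predicate, keyed here by its MRS name, to the lemmas it declares as synonyms. It also uses a hypothetical regular expression to split a name into lemma, part of speech and signature.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical pattern: optional "d" prefix, then _lemma_pos[_sense]__signature.
PREDICATE_RE = re.compile(
    r"^d?_(?P<lemma>[^_]+)_(?P<pos>[a-z])(?:_(?P<sense>[^_]+))?__(?P<signature>[a-z]+)$"
)


def lemma_pos_signature(name: str) -> Optional[Tuple[str, str, str]]:
    """Return (lemma, pos, signature), or None if the name doesn't match."""
    m = PREDICATE_RE.match(name)
    return (m["lemma"], m["pos"], m["signature"]) if m else None


def find_synonym_predicate(
    unimplemented: str, synonyms: Dict[str, List[str]]
) -> Optional[str]:
    """Return an implemented predicate that can stand in for `unimplemented`,
    or None if no implemented predicate passes all three checks."""
    target = lemma_pos_signature(unimplemented)
    if target is None:
        return None
    lemma, pos, signature = target
    for implemented, lemmas in synonyms.items():
        candidate = lemma_pos_signature(implemented)
        if candidate is None:
            continue
        # 1. The implemented predicate must list the unimplemented lemma.
        if lemma not in lemmas:
            continue
        # 2. The parts of speech must match (e.g. both "v").
        # 3. The argument signatures must match (e.g. both "exx").
        if candidate[1] == pos and candidate[2] == signature:
            return implemented
    return None


synonyms = {"_get_v_1__exx": ["acquire", "take", "collect", "obtain", "grab"]}
print(find_synonym_predicate("_take_v_1__exx", synonyms))  # _get_v_1__exx
```

The sense number is deliberately not compared. Only the lemma, part of speech and argument signature take part in the match, so a use of “take” with a different signature or part of speech falls through and returns None.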
If all of those checks pass, only the predicate name is replaced in the MRS; everything else is left as is. In this case:
"take the diamond"
[ TOP: h0
INDEX: e2
RELS: < [ pronoun_q__xhh LBL: h4 ARG0: x3 [ x PERS: 2 PT: zero ] RSTR: h5 BODY: h6 ]
[ pron__x LBL: h7 ARG0: x3 [ x PERS: 2 PT: zero ] ]
[ _the_q__xhh LBL: h9 ARG0: x8 [ x PERS: 3 NUM: sg IND: + ] RSTR: h10 BODY: h11 ]
[ _diamond_n_1__x LBL: h12 ARG0: x8 [ x PERS: 3 NUM: sg IND: + ] ]
[ _take_v_1__exx LBL: h1 ARG0: e2 [ e SF: comm TENSE: pres MOOD: indicative PROG: - PERF: - ] ARG1: x3 ARG2: x8 ]
>
HCONS: < h0 qeq h1 h5 qeq h7 h10 qeq h12 > ]
was transformed to:
[ TOP: h0
INDEX: e2
RELS: < [ pronoun_q__xhh LBL: h4 ARG0: x3 [ x PERS: 2 PT: zero ] RSTR: h5 BODY: h6 ]
[ pron__x LBL: h7 ARG0: x3 [ x PERS: 2 PT: zero ] ]
[ _the_q__xhh LBL: h9 ARG0: x8 [ x PERS: 3 NUM: sg IND: + ] RSTR: h10 BODY: h11 ]
[ _diamond_n_1__x LBL: h12 ARG0: x8 [ x PERS: 3 NUM: sg IND: + ] ]
[ _get_v_1__exx LBL: h1 ARG0: e2 [ e SF: comm TENSE: pres MOOD: indicative PROG: - PERF: - ] ARG1: x3 ARG2: x8 ]
>
HCONS: < h0 qeq h1 h5 qeq h7 h10 qeq h12 > ]
The only change is that one predicate name. This works because the signature (i.e. the types of the arguments) a predicate takes says a lot about how it is used structurally. That, plus the part-of-speech match, goes a long way towards ensuring the right synonym is used.
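The substitution step can be sketched over a simplified, hypothetical representation of the MRS relations. This is not a real MRS library. The EP class and apply_synonyms function are illustrative names, and only a few of the relations from the “take the diamond” MRS are included. Each relation's predicate name is looked up in a mapping from unimplemented to implemented predicates. Its label and arguments are carried over unchanged, which is what keeps the rest of the MRS identical.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass
class EP:
    """Hypothetical, simplified elementary predication (one entry in RELS)."""
    predicate: str
    label: str
    args: Dict[str, str]  # role name -> variable, e.g. {"ARG0": "e2"}


def apply_synonyms(rels: List[EP], substitutions: Dict[str, str]) -> List[EP]:
    """Return a new RELS list in which only predicate names found in
    `substitutions` are swapped; labels and arguments are carried over."""
    return [
        replace(ep, predicate=substitutions.get(ep.predicate, ep.predicate))
        for ep in rels
    ]


rels = [
    EP("pron__x", "h7", {"ARG0": "x3"}),
    EP("_diamond_n_1__x", "h12", {"ARG0": "x8"}),
    EP("_take_v_1__exx", "h1", {"ARG0": "e2", "ARG1": "x3", "ARG2": "x8"}),
]
rewritten = apply_synonyms(rels, {"_take_v_1__exx": "_get_v_1__exx"})
print(rewritten[2])
# EP(predicate='_get_v_1__exx', label='h1', args={'ARG0': 'e2', 'ARG1': 'x3', 'ARG2': 'x8'})
```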