Implementing Synonyms for ERG Predicates

Next: Building Scope-Resolved Trees From MRS

Previous: Understanding the Output of the ERG: the Minimal Recursion Semantics Format

Implementing each predicate in Prolog took time, so I wanted the engine to take advantage of synonyms reasonably well. For example, after implementing d_get_v_1__exx for phrases like “get the diamond”, I wanted the same phrase to work for free with any of these synonyms in place of “get”: “acquire”, “take”, “collect”, “pick up”, “gather up”, “get hold”, “obtain”, “retrieve”, “fetch”, “receive”, “capture”, “seize”, “prehend”, “clutch”, “grab”. It wasn’t as simple as autocorrecting “take the diamond” to “get the diamond”, because the word “take” is sometimes used in a sense that is not synonymous with my implementation of d_get_v_1__exx.

Luckily, the ERG provides a lot of data in the name and arguments of a predicate. For example, the ERG predicate d_get_v_1__exx indicates that it:

has the lemma get
is a verb because of the _v_ in the name
is used in a particular way because of the signature exx
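
To make those three pieces of information concrete, here is a minimal sketch in Python (not the Prolog the engine itself is written in) that splits a predicate name into lemma, part of speech, sense, and signature. The names parse_predicate and PredicateName are hypothetical, and the regular expression is an assumption based only on the predicate names shown in this article. It treats the leading d as optional, since the names in the text carry it and the names in the MRS output do not.

```python
import re
from typing import NamedTuple, Optional


class PredicateName(NamedTuple):
    lemma: str             # e.g. "get"
    pos: str               # part of speech, e.g. "v" for verb
    sense: Optional[str]   # e.g. "1"; absent in names like _the_q__xhh
    signature: str         # argument signature, e.g. "exx"


# Assumed shape: [d]_<lemma>_<pos>[_<sense>]__<signature>
_PATTERN = re.compile(
    r"^d?_(?P<lemma>[^_]+)_(?P<pos>[a-z])(?:_(?P<sense>[^_]+))?__(?P<signature>[a-z]+)$"
)


def parse_predicate(name: str) -> Optional[PredicateName]:
    """Return the parts of a lemma-based predicate name, or None if it does not fit."""
    m = _PATTERN.match(name)
    if m is None:
        return None
    return PredicateName(m["lemma"], m["pos"], m["sense"], m["signature"])


print(parse_predicate("d_get_v_1__exx"))
# PredicateName(lemma='get', pos='v', sense='1', signature='exx')
```

Names without a leading underscore, such as pron__x, do not fit this pattern and come back as None, which is convenient here because only lemma-based predicates are candidates for synonym replacement.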

In practice this seemed to be enough to do a great job of picking synonyms. So, synonyms are applied after we have the MRS, not before (as autocorrect would be). Here’s the approach I used:

Suppose the user typed “take the diamond” and the MRS predicate d_take_v_1__exx was not implemented. The algorithm:

Finds any implemented predicates that list the unimplemented lemma (e.g. take) as a synonym. In this case d_get_v_1__exx does.
Checks that the part of speech (e.g. _v_) matches on both predicates.
Makes sure their argument signatures are the same (e.g. exx).
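
The three checks above can be sketched roughly as follows. This is an illustration in Python under stated assumptions, not the engine's actual Prolog code: the SYNONYMS table, its shortened synonym list, and the function find_replacement are all hypothetical, and the name pattern is inferred from the predicate names in this article.

```python
import re
from typing import Dict, Optional, Set

# Assumed name shape: [d]_<lemma>_<pos>[_<sense>]__<signature>
_NAME = re.compile(
    r"^d?_(?P<lemma>[^_]+)_(?P<pos>[a-z])(?:_[^_]+)?__(?P<sig>[a-z]+)$"
)

# Hypothetical table: each implemented predicate lists the lemmas it accepts as synonyms.
SYNONYMS: Dict[str, Set[str]] = {
    "d_get_v_1__exx": {"acquire", "take", "collect", "obtain", "retrieve", "fetch", "grab"},
}


def find_replacement(
    unimplemented: str, synonyms: Optional[Dict[str, Set[str]]] = None
) -> Optional[str]:
    """Return an implemented predicate that can stand in for `unimplemented`, if any."""
    table = SYNONYMS if synonyms is None else synonyms
    wanted = _NAME.match(unimplemented)
    if wanted is None:
        return None
    for implemented, lemmas in table.items():
        have = _NAME.match(implemented)
        if have is None:
            continue
        if (
            wanted["lemma"] in lemmas            # 1. listed as a synonym
            and wanted["pos"] == have["pos"]     # 2. same part of speech
            and wanted["sig"] == have["sig"]     # 3. same argument signature
        ):
            return implemented
    return None


print(find_replacement("d_take_v_1__exx"))  # d_get_v_1__exx
print(find_replacement("d_take_n_1__x"))    # None: a noun with a different signature
```

Checking the signature as well as the lemma is what keeps a noun sense of “take”, or a verb sense with different arguments, from being mapped onto d_get_v_1__exx.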

If all three checks passed, only the predicate name was replaced in the MRS; everything else was left as is. In this case:

"take the diamond"

[ TOP: h0
INDEX: e2
RELS: < [ pronoun_q__xhh LBL: h4 ARG0: x3 [ x PERS: 2 PT: zero ] RSTR: h5 BODY: h6 ]
[ pron__x LBL: h7 ARG0: x3 [ x PERS: 2 PT: zero ] ]
[ _the_q__xhh LBL: h9 ARG0: x8 [ x PERS: 3 NUM: sg IND: + ] RSTR: h10 BODY: h11 ]
[ _diamond_n_1__x LBL: h12 ARG0: x8 [ x PERS: 3 NUM: sg IND: + ] ]
[ _take_v_1__exx LBL: h1 ARG0: e2 [ e SF: comm TENSE: pres MOOD: indicative PROG: - PERF: - ] ARG1: x3 ARG2: x8 ]
>
HCONS: < h0 qeq h1 h5 qeq h7 h10 qeq h12 > ]

was transformed to:

[ TOP: h0
INDEX: e2
RELS: < [ pronoun_q__xhh LBL: h4 ARG0: x3 [ x PERS: 2 PT: zero ] RSTR: h5 BODY: h6 ]
[ pron__x LBL: h7 ARG0: x3 [ x PERS: 2 PT: zero ] ]
[ _the_q__xhh LBL: h9 ARG0: x8 [ x PERS: 3 NUM: sg IND: + ] RSTR: h10 BODY: h11 ]
[ _diamond_n_1__x LBL: h12 ARG0: x8 [ x PERS: 3 NUM: sg IND: + ] ]
[ _get_v_1__exx LBL: h1 ARG0: e2 [ e SF: comm TENSE: pres MOOD: indicative PROG: - PERF: - ] ARG1: x3 ARG2: x8 ]
>
HCONS: < h0 qeq h1 h5 qeq h7 h10 qeq h12 > ]

The only change is that one predicate name. This works because the signature (i.e. the set of arguments) used by a term says a lot about its structural usage. That, plus the part-of-speech match, goes a long way towards ensuring the right synonym is used.
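
As a rough illustration of the replacement step, the sketch below swaps a predicate name in the textual MRS and leaves labels, arguments, and variable properties untouched. The function rename_predicate is hypothetical, and working directly on the text with a regular expression is an assumption made for brevity; an engine that has already parsed the MRS into a data structure would rename the predicate there instead.

```python
import re


def rename_predicate(mrs_text: str, old: str, new: str) -> str:
    """Replace `old` with `new` only where it opens a relation, i.e. right after '['."""
    pattern = re.compile(r"(\[\s*)" + re.escape(old) + r"(?=\s)")
    # A function is used as the replacement so `new` is inserted literally.
    return pattern.sub(lambda m: m.group(1) + new, mrs_text)


rel = "[ _take_v_1__exx LBL: h1 ARG0: e2 ARG1: x3 ARG2: x8 ]"
print(rename_predicate(rel, "_take_v_1__exx", "_get_v_1__exx"))
# [ _get_v_1__exx LBL: h1 ARG0: e2 ARG1: x3 ARG2: x8 ]
```

Anchoring the match to the opening bracket means the name is only rewritten in predicate position, so other relations pass through unchanged.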
