Building Scope-Resolved Trees From MRS

Next: Conversion From Scope-Resolved Tree to Prolog

Previous: Implementing Synonyms for ERG Predicates

To understand this section, first make sure you have a basic understanding of the MRS format. The original paper describing it is here.

Let’s use the sentence “every book is in a cave” as an example. To understand the meaning of a phrase in the MRS, you eventually need to end up with a tree where each node is an MRS predicate like _every_q__xhh and the branches of the tree are links between the arguments of the predicate and another node like this:

                       ┌_book_n_of__xi:x3,i8
 _every_q__xhh:x3,h5,h6┤
                       │                    ┌_cave_n_1__x:x9
                       └_a_q__xhh:x9,h11,h12┤
                                            └_in_p_loc__exx:e2,x3,x9

This tree represents one interpretation of “every book is in a cave”, namely, “every book is in a (possibly different) cave”. But this is only one interpretation, another is: “all books are in the same exact cave”.

A key feature of the MRS structure as is that it is underspecified. This means that, even though there are multiple ways to interpret the phrase, they can all be represented by a single MRS structure:

MRS for `Every book is in a cave`:

[ TOP: h0
INDEX: e2
RELS: < 
[ _a_q__xhh LBL: h10 ARG0: x9 [ x PERS: 3 NUM: sg IND: + ] RSTR: h11 BODY: h12 ]
[ _cave_n_1__x LBL: h13 ARG0: x9 [ x PERS: 3 NUM: sg IND: + ] ]
[ _every_q__xhh LBL: h4 ARG0: x3 [ x PERS: 3 NUM: sg IND: + ] RSTR: h5 BODY: h6 ]
[ _book_n_of__xi LBL: h7 ARG0: x3 [ x PERS: 3 NUM: sg IND: + ] ARG1: i8 ]
[ _in_p_loc__exx LBL: h1 ARG0: e2 [ e SF: prop TENSE: pres MOOD: indicative PROG: - PERF: - ] ARG1: x3 ARG2: x9 ]
>
HCONS: < h0 qeq h1 h5 qeq h7 h11 qeq h13 > ]

This is because it avoids building a single tree which would lock you into one interpretation. Instead, it leaves “holes” in various arguments in the MRS predicates and provides constraints (the HCONS) for plugging the predicates together “legally’. If you combine the predicates and follow the contraints you’ll end up with a “scope-resolved” MRS which defines a valid “interpretation” of the sentence.

This interpretation is what we need in order to execute the sentence in a logic engine. This section describes how to derive it.

Holes and Constraints

“Holes” are h arguments that refer to a predicate label (indicated by LBL: in the MRS) that is not defined. In the above MRS h0 (the TOP:), h11, h12, h5, and h6 are all holes since none of the predicates use those labels as their LBL:.

The HCONS section of the MRS puts CONStraints on where the Handles can be put and still end up with a legal interpretation of the phrase.

All I’ve ever seen here are qeq constraints. A qeq constraint always relates a hole to a (non-hole) handle and says that the handle must be a direct or eventual child in the tree and, if not direct, the only things between the hole and the handle can be quantifiers. Said a different way:

A qeq constraint of h0 qeq h1 (as in the above example) says that the direct path from h0 to h1 must only contain quantifier predicates, but can contain as many as you want as long as they don’t violate other constraints.

So, in this example, h1 is the LBL of the _in_p_loc__exx predicate. Given the qeq constraint of h0 qeq h1 it would be perfectly valid to say that h0 = h1 since the path from h0 to h1 is direct.

You could also say that h0 = h4 since LBL: h4 _every_q__xhh... is a quantifier, followed by h6 = h1 (h6 is an argument of _every_q__xhh) since the path from h0 to h1 only includes one quantifier and h1 itself.

Once you fill all the holes and you follow all of the qeq constraints, you’ll end up with a tree that might be valid. There is one more set of constraints to check.

X Variable Scoping

All of the arguments that aren’t handles in the MRS for Every book is in a cave except two (e2 and i8) are x variables:

[ TOP: h0
INDEX: e2
RELS: < 
[ _a_q__xhh LBL: h10 ARG0: x9 [ x PERS: 3 NUM: sg IND: + ] RSTR: h11 BODY: h12 ]
[ _cave_n_1__x LBL: h13 ARG0: x9 [ x PERS: 3 NUM: sg IND: + ] ]
[ _every_q__xhh LBL: h4 ARG0: x3 [ x PERS: 3 NUM: sg IND: + ] RSTR: h5 BODY: h6 ]
[ _book_n_of__xi LBL: h7 ARG0: x3 [ x PERS: 3 NUM: sg IND: + ] ARG1: i8 ]
[ _in_p_loc__exx LBL: h1 ARG0: e2 [ e SF: prop TENSE: pres MOOD: indicative PROG: - PERF: - ] ARG1: x3 ARG2: x9 ]
>
HCONS: < h0 qeq h1 h5 qeq h7 h11 qeq h13 > ]

The rules for MRS say that any variable in the MRS is “global” (or existentially qualified in logic terms) to the whole structure except for x variables. So, both e2 and i8 don’t need any special handling, they are globally defined.

x variables, on the other hand, can only be declared by quantifiers, and are only in scope to the branches of the tree that are attached to their arguments: RSTR and BODY.

So, while the predicates can be in any order in the tree with respect to their e (or i or u if it had them) arguments, the tree must also be checked to make sure all of the x arguments have a parent which is a quantifier which introduces them (i.e. has the x variable as its ARG0). This is an additional constraint that has to be checked to build a valid tree.

If the built tree passes all qeq arguments, and the x variables are all properly scoped, then it is a *scope-resolved” tree that we can now execute in a logic engine. That’s what we’re going for here.

Resolving the tree

Finding ways to efficiently create these trees is an area of active research because natural language can easily create MRS structures that have a ton of holes. N holes, in the worst case, can require N! checks to resolve if done exhaustively. So, an MRS structure with 12 holes (which is easy to generate) could require about 480,000,000 checks.

My first implementation simply generated all possible combinations, did the qeq and x scoping checks, and threw away invalid trees. It was definitely not efficient enough to use for the prototype. I needed a better approach.

I landed on the following algorithm is able to prune the search space and works much faster. After doing some non-scientific testing, the Python implementation can usually handle 12 holes in around 1.5s (with some outliers being slower) on a MacBook. My prototype doesn’t attempt to understanding anything larger than 12 holes because it just takes too long. Something like “put the diamond on the table where the safe is and the book is” generates MRS structures with up to 14 holes and could take up to 30 seconds to generate all the valid interpretations (1500+ valid interpretations in some cases!) for each MRS. Because it scales factorially, things go south fast after 12 holes.

There are definitely more efficient approaches but the algorithm below has the advantage of being very simple. Here is one alternative. There are definitely more.

A Simple, Fast Enough, Algorithm

First some definitions:

Hole: A scopal (i.e. h type) argument in an MRS predicate that has not yet been assigned a predication
Floater: A tree of predications that have had their scopal (i.e. h type) arguments filled by any predications that are initially assigned by the MRS. [This is not at official MRS term, it is one I made up for this algorithm]

As a reminder, a tree is “scope-resolved” if:

Each floater is assigned to one, and only one, hole. No holes
or floaters are left at the end
None of the assignments of floaters to holes violates a qeq constraint
Any variable introduced by a quantifier is not used outside of
the branches assigned to its RSTR or Body arguments

Here’s the intuition for how the algorithm works: We are going to walk a search tree. Every node of the tree represents a partial assignment of floaters to holes that meets the above 3 constraints. Every arc from a parent node to a child node represents a single new assignment of a (currently unassigned) floater represented by the arc to a (currently empty) hole in the parent node. If that assignment violates a constraint, the node is not valid and we stop searching that whole branch of the tree (this pruning is what makes it faster than the really naive “try every option” approach). Every node that has no holes left to assign is a solution.

Algorithm Flow: We start at the TOP: hole and record any qeq constraints that apply to it and any X variables that are in scope for it. As we traverse an arc and assign a new floater to a hole, we propagate any constraints and in-scope variables from the (parent) hole to the holes in the (child) floater. Then we recurse.

Start with:

allHolesDict: Dictionary containing all the holes in the MRS. Each hole has information about: - the qeq constraints it inherits - the X variables that are in scope - the floater it is from
nodeAssignmentList: Assignments of floaters to holes for this node. Initially empty.
nodeRemainingHolesList: Holes left to fill in this node. Initially filled only with the TOP: hole
nodeRemainingFloatersList: Floaters unassigned so far. Initially contains all floaters. Each floater contains information about: - a list of holes they contain - a list of unresolved x variables they contain - a list of any Lo parts of a qeq constraint they contain (if they don’t also have the Hi part in the floater)

Algorithm:

Get currentHole by removing the first hole from nodeRemainingHolesList
Get currentFloater by removing each floater from nodeRemainingFloatersList and:
- If currentFloater does not violate the constraints in currentHole:
  - Add currentHole = currentFloater to nodeAssignmentList
  - Propagate the constraints and variables from the new parent to all holes in currentFloater
  - Add holes from currentFloater to the end of nodeRemainingHolesList
  - Check number of holes left:
    - if == 0, return nodeAssignmentList as a solution
    - otherwise recurse and continue the search

Once this has run its course you will have all the valid scope-resolved trees for the MRS.

Next: Conversion From Scope-Resolved Tree to Prolog

Previous: Implementing Synonyms for ERG Predicates