Let’s use the sentence “every book is in a cave” as an example. To understand the meaning of a phrase in the MRS, you eventually need to end up with a tree where each node is an MRS predicate like
_every_q__xhh and the branches of the tree are links between the arguments of the predicate and another node like this:
┌_book_n_of__xi:x3,i8 _every_q__xhh:x3,h5,h6┤ │ ┌_cave_n_1__x:x9 └_a_q__xhh:x9,h11,h12┤ └_in_p_loc__exx:e2,x3,x9
This tree represents one interpretation of “every book is in a cave”, namely, “every book is in a (possibly different) cave”. But this is only one interpretation, another is: “all books are in the same exact cave”.
A key feature of the MRS structure as is that it is underspecified. This means that, even though there are multiple ways to interpret the phrase, they can all be represented by a single MRS structure:
MRS for `Every book is in a cave`: [ TOP: h0 INDEX: e2 RELS: < [ _a_q__xhh LBL: h10 ARG0: x9 [ x PERS: 3 NUM: sg IND: + ] RSTR: h11 BODY: h12 ] [ _cave_n_1__x LBL: h13 ARG0: x9 [ x PERS: 3 NUM: sg IND: + ] ] [ _every_q__xhh LBL: h4 ARG0: x3 [ x PERS: 3 NUM: sg IND: + ] RSTR: h5 BODY: h6 ] [ _book_n_of__xi LBL: h7 ARG0: x3 [ x PERS: 3 NUM: sg IND: + ] ARG1: i8 ] [ _in_p_loc__exx LBL: h1 ARG0: e2 [ e SF: prop TENSE: pres MOOD: indicative PROG: - PERF: - ] ARG1: x3 ARG2: x9 ] > HCONS: < h0 qeq h1 h5 qeq h7 h11 qeq h13 > ]
This is because it avoids building a single tree which would lock you into one interpretation. Instead, it leaves “holes” in various arguments in the MRS predicates and provides constraints (the HCONS) for plugging the predicates together “legally’. If you combine the predicates and follow the contraints you’ll end up with a “scope-resolved” MRS which defines a valid “interpretation” of the sentence.
This interpretation is what we need in order to execute the sentence in a logic engine. This section describes how to derive it.
Holes and Constraints
h arguments that refer to a predicate label (indicated by
LBL: in the MRS) that is not defined. In the above MRS
h6 are all holes since none of the predicates use those labels as their
HCONS section of the MRS puts CONStraints on where the Handles can be put and still end up with a legal interpretation of the phrase.
All I’ve ever seen here are
qeq constraints. A
qeq constraint always relates a hole to a (non-hole) handle and says that the handle must be a direct or eventual child in the tree and, if not direct, the only things between the hole and the handle can be quantifiers. Said a different way:
A qeq constraint of
h0 qeq h1(as in the above example) says that the direct path from
h1must only contain quantifier predicates, but can contain as many as you want as long as they don’t violate other constraints.
So, in this example,
h1 is the LBL of the
_in_p_loc__exx predicate. Given the qeq constraint of
h0 qeq h1 it would be perfectly valid to say that
h0 = h1 since the path from
h1 is direct.
You could also say that
h0 = h4 since
LBL: h4 _every_q__xhh... is a quantifier, followed by
h6 = h1 (
h6 is an argument of
_every_q__xhh) since the path from
h1 only includes one quantifier and
Once you fill all the holes and you follow all of the
qeq constraints, you’ll end up with a tree that might be valid. There is one more set of constraints to check.
X Variable Scoping
All of the arguments that aren’t handles in the MRS for
Every book is in a cave except two (
[ TOP: h0 INDEX: e2 RELS: < [ _a_q__xhh LBL: h10 ARG0: x9 [ x PERS: 3 NUM: sg IND: + ] RSTR: h11 BODY: h12 ] [ _cave_n_1__x LBL: h13 ARG0: x9 [ x PERS: 3 NUM: sg IND: + ] ] [ _every_q__xhh LBL: h4 ARG0: x3 [ x PERS: 3 NUM: sg IND: + ] RSTR: h5 BODY: h6 ] [ _book_n_of__xi LBL: h7 ARG0: x3 [ x PERS: 3 NUM: sg IND: + ] ARG1: i8 ] [ _in_p_loc__exx LBL: h1 ARG0: e2 [ e SF: prop TENSE: pres MOOD: indicative PROG: - PERF: - ] ARG1: x3 ARG2: x9 ] > HCONS: < h0 qeq h1 h5 qeq h7 h11 qeq h13 > ]
The rules for MRS say that any variable in the MRS is “global” (or existentially qualified in logic terms) to the whole structure except for
x variables. So, both
i8 don’t need any special handling, they are globally defined.
x variables, on the other hand, can only be declared by quantifiers, and are only in scope to the branches of the tree that are attached to their arguments:
So, while the predicates can be in any order in the tree with respect to their
u if it had them) arguments, the tree must also be checked to make sure all of the
x arguments have a parent which is a quantifier which introduces them (i.e. has the
x variable as its ARG0). This is an additional constraint that has to be checked to build a valid tree.
If the built tree passes all
qeq arguments, and the
x variables are all properly scoped, then it is a *scope-resolved” tree that we can now execute in a logic engine. That’s what we’re going for here.
Resolving the tree
Finding ways to efficiently create these trees is an area of active research because natural language can easily create MRS structures that have a ton of holes. N holes, in the worst case, can require N! checks to resolve if done exhaustively. So, an MRS structure with 12 holes (which is easy to generate) could require about 480,000,000 checks.
My first implementation simply generated all possible combinations, did the
x scoping checks, and threw away invalid trees. It was definitely not efficient enough to use for the prototype. I needed a better approach.
I landed on the following algorithm is able to prune the search space and works much faster. After doing some non-scientific testing, the Python implementation can usually handle 12 holes in around 1.5s (with some outliers being slower) on a MacBook. My prototype doesn’t attempt to understanding anything larger than 12 holes because it just takes too long. Something like “put the diamond on the table where the safe is and the book is” generates MRS structures with up to 14 holes and could take up to 30 seconds to generate all the valid interpretations (1500+ valid interpretations in some cases!) for each MRS. Because it scales factorially, things go south fast after 12 holes.
There are definitely more efficient approaches but the algorithm below has the advantage of being very simple. Here is one alternative. There are definitely more.
A Simple, Fast Enough, Algorithm
First some definitions:
- Hole: A scopal (i.e.
htype) argument in an MRS predicate that has not yet been assigned a predication
- Floater: A tree of predications that have had their scopal (i.e.
htype) arguments filled by any predications that are initially assigned by the MRS. [This is not at official MRS term, it is one I made up for this algorithm]
As a reminder, a tree is “scope-resolved” if:
- Each floater is assigned to one, and only one, hole. No holes
or floaters are left at the end
- None of the assignments of floaters to holes violates a
- Any variable introduced by a quantifier is not used outside of
the branches assigned to its
Here’s the intuition for how the algorithm works: We are going to walk a search tree. Every node of the tree represents a partial assignment of floaters to holes that meets the above 3 constraints. Every arc from a parent node to a child node represents a single new assignment of a (currently unassigned) floater represented by the arc to a (currently empty) hole in the parent node. If that assignment violates a constraint, the node is not valid and we stop searching that whole branch of the tree (this pruning is what makes it faster than the really naive “try every option” approach). Every node that has no holes left to assign is a solution.
Algorithm Flow: We start at the
TOP: hole and record any
qeq constraints that apply to it and any
X variables that are in scope for it. As we traverse an arc and assign a new floater to a hole, we propagate any constraints and in-scope variables from the (parent) hole to the holes in the (child) floater. Then we recurse.
allHolesDict: Dictionary containing all the holes in the MRS. Each hole has information about: - the
qeqconstraints it inherits - the
Xvariables that are in scope - the floater it is from
nodeAssignmentList: Assignments of floaters to holes for this node. Initially empty.
nodeRemainingHolesList: Holes left to fill in this node. Initially filled only with the
nodeRemainingFloatersList: Floaters unassigned so far. Initially contains all floaters. Each floater contains information about: - a list of holes they contain - a list of unresolved
xvariables they contain - a list of any
Loparts of a
qeqconstraint they contain (if they don’t also have the
Hipart in the floater)
currentHoleby removing the first hole from
currentFloaterby removing each floater from
currentFloaterdoes not violate the constraints in
currentHole = currentFloaterto
- Propagate the constraints and variables from the new parent to all holes in
- Add holes from
currentFloaterto the end of
- Check number of holes left:
- if == 0, return
nodeAssignmentListas a solution
- otherwise recurse and continue the search
- if == 0, return
Once this has run its course you will have all the valid scope-resolved trees for the MRS.