Using ERG for a Natural Language Interface Flow
This is an overview of how I built the Perplexity prototype, a Natural Language “Proof of Concept” game. The goal of the prototype was to see if I could build a system that truly understood and acted upon every single word and construction in the phrases the player gives it (i.e. deep semantic analysis), so that it really does seem like the computer understands what is being said. I wanted to avoid the common experience of having the computer “cherry-pick keywords” or try to get the “gist” of a phrase and do something that is not quite right. I wanted to see how far current technologies could get me.
You can decide for yourself whether or not the prototype was successful by running it here.
To build it, I took the output of the Delph-In English Resource Grammar and executed it in SWI-Prolog with some Python glue in-between. To fully understand what I’ve done, you’ll need to have at least a passing understanding of the MRS (Minimal Recursion Semantics) format, the Prolog Programming Language, and Hierarchical Task Networks (HTN). That last one is only necessary to get really deep into how some of the engine is implemented. An understanding of Python is only necessary if you want to understand the sample code itself. It isn’t necessary for understanding the conceptual flow.
This page serves as an overview of the whole process, but each of the steps has a drill-down that gives more detail.
The Flow of Execution
The flow of execution starts with the player typing a phrase like "Every book is in a cave":
- Autocorrect is applied to the phrase, since the ERG is sensitive to proper spelling and sometimes to capitalization and punctuation. I handled a few common cases before sending the phrase to the parser, but it really didn’t need much to work well.
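Conceptually this is just a tiny normalization pass before calling the parser. Here is a minimal sketch in Python; the specific corrections shown are illustrative assumptions, not the prototype’s actual list:

```python
import re

# A minimal sketch of the kind of cleanup done before calling the parser.
# The entries below are made-up examples; the prototype only handled a few
# common cases.
COMMON_FIXES = {
    "dont": "don't",
    "cant": "can't",
    "whats": "what's",
}

def autocorrect(phrase: str) -> str:
    # Normalize whitespace so the parser sees a clean, single-spaced phrase
    phrase = re.sub(r"\s+", " ", phrase.strip())
    # Replace a handful of known misspellings, word by word
    words = [COMMON_FIXES.get(w.lower(), w) for w in phrase.split(" ")]
    fixed = " ".join(words)
    # The ERG can be sensitive to capitalization: capitalize the first letter
    return fixed[0].upper() + fixed[1:] if fixed else fixed

print(autocorrect("every book is in a cave"))  # -> "Every book is in a cave"
```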
- The ERG parser (called “Ace”) parses it into N valid MRS interpretations and returns them in “most likely” order, which is key since some “valid” interpretations can be pretty far out.
  - MRS is “underspecified”, meaning that there are still many ways of interpreting each of the N parses. E.g. “Every book is in a cave” could mean “all books are in the same cave” or “every book is in a (possibly different) cave”.
  - With each MRS parse we don’t know which interpretation is meant, but we do know which logical predicates are involved and the constraints on how they can be validly put into a tree:
MRS for `Every book is in a cave`:
[ TOP: h0
INDEX: e2
RELS: <
[ _a_q__xhh LBL: h10 ARG0: x9 [ x PERS: 3 NUM: sg IND: + ] RSTR: h11 BODY: h12 ]
[ _cave_n_1__x LBL: h13 ARG0: x9 [ x PERS: 3 NUM: sg IND: + ] ]
[ _every_q__xhh LBL: h4 ARG0: x3 [ x PERS: 3 NUM: sg IND: + ] RSTR: h5 BODY: h6 ]
[ _book_n_of__xi LBL: h7 ARG0: x3 [ x PERS: 3 NUM: sg IND: + ] ARG1: i8 ]
[ _in_p_loc__exx LBL: h1 ARG0: e2 [ e SF: prop TENSE: pres MOOD: indicative PROG: - PERF: - ] ARG1: x3 ARG2: x9 ]
>
HCONS: < h0 qeq h1 h5 qeq h7 h11 qeq h13 > ]
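The Python glue that drives Ace isn’t shown in this overview. As a rough sketch (not the prototype’s actual code), the parser can be driven as a subprocess, assuming the `ace` binary and a compiled ERG grammar image (here called `erg.dat`) are available locally:

```python
import subprocess

# Rough sketch of calling the ACE parser from Python. Paths and file names
# are placeholders; ACE's default parse output puts one MRS per result on
# stdout (check the ACE docs for your version), in "most likely first" order.
def parse_to_mrs(phrase: str, grammar: str = "erg.dat") -> list[str]:
    proc = subprocess.run(
        ["ace", "-g", grammar],   # parse mode: sentences are read from stdin
        input=phrase + "\n",
        capture_output=True,
        text=True,
    )
    return [line for line in proc.stdout.splitlines() if line.strip()]

for mrs in parse_to_mrs("Every book is in a cave"):
    print(mrs)
```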
- Resolve each of the N MRS interpretations into M scope-resolved trees so they can actually be executed. We now have NxM possible meanings!
  - This step removes the “underspecification” from the MRS and puts the predicates into a tree that can be executed.
  - The process can be quite time-intensive depending on the phrase. Making it efficient is an active area of research.
The two scope-resolved trees for the above MRS:
┌_book_n_of__xi:x3,i8
_every_q__xhh:x3,h5,h6┤
│ ┌_cave_n_1__x:x9
└_a_q__xhh:x9,h11,h12┤
└_in_p_loc__exx:e2,x3,x9
┌_cave_n_1__x:x9
_a_q__xhh:x9,h11,h12┤
│ ┌_book_n_of__xi:x3,i8
└_every_q__xhh:x3,h5,h6┤
└_in_p_loc__exx:e2,x3,x9
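Conceptually, scope resolution finds every way of plugging the quantifiers’ RSTR/BODY “holes” with labels while honoring the qeq constraints. The sketch below is hugely simplified (it just nests the quantifiers in every possible order and ignores the real constraint solving), but it reproduces the two trees above:

```python
from itertools import permutations

# Hugely simplified scope-resolution sketch: each quantifier carries its
# variable and restriction; the verb-like predication is the innermost body.
# Real resolution must respect the MRS handles and qeq constraints, which
# this sketch glosses over.
quantifiers = [
    ("_every_q", "x3", ("_book_n_of", "x3")),
    ("_a_q",     "x9", ("_cave_n_1", "x9")),
]
body = ("_in_p_loc", "e2", "x3", "x9")

def nest(order, innermost):
    # Wrap the innermost predication in the quantifiers, outermost first.
    tree = innermost
    for q_name, var, restriction in reversed(order):
        tree = (q_name, var, restriction, tree)
    return tree

for order in permutations(quantifiers):
    print(nest(order, body))
# Two orderings -> the two scope-resolved trees shown above:
#   every_q(x3, book(x3), a_q(x9, cave(x9), in(e2, x3, x9)))
#   a_q(x9, cave(x9), every_q(x3, book(x3), in(e2, x3, x9)))
```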
- Convert each of the NxM scope-resolved trees (i.e. possible meanings) into Prolog so they can be executed
  - Since Prolog is an engine for executing logical statements and the ERG natively generates “predicate logic-like predicates”, there is a “naive” recursive procedure that can be applied that illustrates conceptually what is going on: walk the tree and recursively output the terms.
  - However, I had to deviate in several very important ways from the “naive” procedure to make things work:
The "naive" and "final" Prolog transformations for first of the above trees:
naive:
_a_q__xhh(
x9,
_cave_n_1__x(x9),
_every_q__xhh(
x3,
_book_n_of__xi(x3, i8),
_in_p_loc__exx(e2, x3, x9)))
final:
erg(1, d_a_q__xhh(
arg(X9104, var([name(x9), type(reg), pers(3), num(sg)])),
arg(erg(2, d_noun__x(arg('cave', word('cave')),
arg(X9104, var([name(x9), type(reg), pers(3), num(sg)])))), term),
arg(erg(3, d_every_q__xhh(
arg(X9106, var([name(x3), type(reg), pers(3), num(sg)])),
arg(erg(4, d_noun__xi(
arg('book', word('book')),
arg(X9106, var([name(x3), type(reg), pers(3), num(sg)])),
arg(i, term))), term),
arg(erg(5, d_in_p_loc__exx(
arg(E9107, var([name(e2), index(yes), tense(pres)])),
arg(X9106, var([name(x3), type(reg), pers(3), num(sg)])),
arg(X9104, var([name(x9), type(reg), pers(3), num(sg)])),
arg(create, var([type(quote)])))), term),
arg(Quote9110, var([type(quote)])))), term),
arg(Quote9108, var([type(quote)])))).
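The “naive” transformation really is just a recursive walk over the scope-resolved tree. Here is a sketch of that walk, using simplified tuple trees rather than the prototype’s actual data structures:

```python
# Render a scope-resolved tree as a Prolog-style term by walking it
# recursively. This only illustrates the "naive" transformation; the "final"
# form above adds argument wrappers, variable metadata, and quoted branches.
def to_prolog(node) -> str:
    if isinstance(node, tuple):
        functor, *args = node
        return f"{functor}({', '.join(to_prolog(a) for a in args)})"
    return str(node)

tree = ("_a_q__xhh", "x9",
        ("_cave_n_1__x", "x9"),
        ("_every_q__xhh", "x3",
         ("_book_n_of__xi", "x3", "i8"),
         ("_in_p_loc__exx", "e2", "x3", "x9")))

print(to_prolog(tree))
# _a_q__xhh(x9, _cave_n_1__x(x9),
#           _every_q__xhh(x3, _book_n_of__xi(x3, i8),
#                         _in_p_loc__exx(e2, x3, x9)))
```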
- Fix up the final Prolog predicates to account for synonyms
  - Mapping synonyms to a core predicate allowed the system to understand vastly more words cheaply
  - This was not as simple as just doing autocorrect on individual words
  - Since the MRS predicate signatures (both the name and arguments) provide lots of data about how the word was used, this process worked very well
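As an illustration only (these are not the prototype’s actual mappings), a synonym table keyed by the full predicate signature might look like this:

```python
# Map synonym predicates onto the core predicates the engine actually
# implements. The keys include the argument signature, since the same word
# used differently produces a different ERG predicate. Entries here are
# illustrative assumptions, not the prototype's real table.
SYNONYMS = {
    "_boulder_n_1__x": "_rock_n_1__x",     # "boulder" behaves like "rock"
    "_cavern_n_1__x":  "_cave_n_1__x",     # "cavern" behaves like "cave"
    "_grab_v_1__exx":  "_take_v_1__exx",   # "grab X" behaves like "take X"
}

def map_synonyms(predicates: list[str]) -> list[str]:
    return [SYNONYMS.get(p, p) for p in predicates]

print(map_synonyms(["_a_q__xhh", "_cavern_n_1__x", "_in_p_loc__exx"]))
# ['_a_q__xhh', '_cave_n_1__x', '_in_p_loc__exx']
```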
- Fail gracefully if something is not implemented
  - [No drill-down on this one since it is pretty obvious how to check whether predicates are in the system]
  - If the system doesn’t have an implemented predicate (or a synonym) for every MRS predicate generated, it fails and reports which terms are not known
  - It also fails if the user isn’t speaking in the present tense, since future, past, etc. weren’t explored in the prototype
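The check itself is just a set difference between the predicates a parse needs and the predicates the engine (plus its synonym table) implements. A tiny sketch with placeholder names:

```python
# Fail gracefully: report exactly which predicates the engine does not know.
# `known` stands in for whatever registry the engine keeps of its Prolog
# predicates plus the synonym table; the predicate names are placeholders.
def unknown_predicates(parse_predicates: set[str], known: set[str]) -> set[str]:
    return parse_predicates - known

needed = {"_a_q__xhh", "_cave_n_1__x", "_sparkle_v_1__ex"}
known = {"_a_q__xhh", "_cave_n_1__x", "_in_p_loc__exx"}

missing = unknown_predicates(needed, known)
if missing:
    print(f"Sorry, I don't know: {sorted(missing)}")
```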
- Add a final predicate which actually executes the key word (usually a verb) in the sentence
  - The Prolog is executed differently depending on whether the phrase is a command (“look around”), a question (“Is there a book in there?”), or a proposition (“There is a table in the cave”)
  - Questions and propositions are simply Prolog queries
  - Commands actually change the state of the world, and I found that I needed a simple Prolog implementation of a planner to do this.
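In the MRS, the sentential force is carried on the index event variable (SF: prop, ques, or comm), so the dispatch can key off that. A rough sketch, with the query and planner entry points stubbed out as placeholders:

```python
# Dispatch on the sentence force (SF) carried by the MRS index event variable:
# propositions and questions become plain Prolog queries, while commands go
# through the planner so they can change world state. The two helpers below
# are stand-ins for the real engine entry points.
def run_phrase(sentence_force: str, prolog_term: str):
    if sentence_force in ("prop", "ques"):
        return prolog_query(prolog_term)    # just ask the engine
    elif sentence_force == "comm":
        return run_planner(prolog_term)     # plan and apply state changes
    raise ValueError(f"unsupported sentence force: {sentence_force}")

def prolog_query(term: str):
    print(f"querying: {term}")

def run_planner(term: str):
    print(f"planning for: {term}")

run_phrase("ques", "d_in_p_loc__exx(E, Book, Cave)")
```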
- Execute the phrase in the Prolog engine.
  - Designing the “ontology” for all this, including a logic-based physics model, was a very important part of it and definitely took some time.
  - Figuring out how to say “why” something failed in Prolog was a key part of making the system work.
  - Writing the predicates themselves involved a lot of details to get right.
  - The prototype scenario is a very simple world that has 3 connected caves that contain a few simple items: a person (that you can tell to do things for you or ask questions of), a book, some pages, rocks, a crystal, a diamond, two safes, a key, and a table.
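The prototype’s actual Prolog integration (and its “why did this fail” reporting) is more involved than can be shown here, but as a minimal sketch, a goal can be run against a knowledge base by shelling out to swipl. The file name and the `book`/`in`/`cave` predicates below are placeholders:

```python
import subprocess

# Minimal sketch of running a query against the world model in SWI-Prolog.
# "world.pl" and the predicates in the goal are placeholders; they stand in
# for the prototype's actual ontology and physics model.
def run_goal(goal: str, knowledge_base: str = "world.pl") -> str:
    proc = subprocess.run(
        ["swipl", "-q", "-g", goal, "-t", "halt", knowledge_base],
        capture_output=True,
        text=True,
    )
    return proc.stdout

# e.g. check that every book is in some cave and print the bindings found
print(run_goal("forall(book(B), (in(B, C), cave(C), print(B-C), nl))"))
```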
- Interpret and report the result of all the Prolog versions of the solved trees. This is a non-trivial chunk of work.
  - Of the NxM meanings that were generated, some or all may fail and some or all may succeed. In the end, the user expects one answer.
  - “wh-type questions” like “Where is …” may have multiple answers
  - Some types of commands like “Look around” may generate answers like a query would
  - If everything failed to run, the “best failure” needs to be returned.
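Reporting boils down to “first success in most-likely order wins; otherwise show the best failure”. A stripped-down sketch of that selection, with a placeholder failure score:

```python
# Each of the NxM interpretations produces either a success (possibly with
# answers, as for wh-questions and commands like "look around") or a failure
# with an explanation. Report the first success in "most likely" order,
# otherwise the most informative failure. The scoring is a placeholder.
def report(results: list[dict]) -> str:
    successes = [r for r in results if r["ok"]]
    if successes:
        answers = successes[0].get("answers")
        return "\n".join(answers) if answers else "Yes, that is true."
    # No interpretation worked: pick the most informative failure to show
    best_failure = max(results, key=lambda r: r.get("score", 0))
    return best_failure["why"]

print(report([
    {"ok": False, "why": "I don't know the word 'sparkle'.", "score": 2},
    {"ok": True, "answers": ["The book is in the first cave."]},
]))
```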