One of my dreams has always been to build a rich, virtual, text-based world that you can interact with. I say “text-based” because many fantastical things are possible using language that are very hard or expensive to do graphically.
There is a small but vibrant community of developers building Interactive Fiction who take many approaches to doing this: everything from “choose your own adventure” games, where the user selects from multiple choices at every plot fork, to much richer games that allow structured commands along the lines of the original Zork and Adventure.
Those games are amazing, but I’ve not been able to find any examples that really allow free-form English to be used by the user and understood deeply by the system. The examples I’ve found require limited grammars or end up doing what many chat bots do: looking for keywords or using statistical methods to try to get the “gist” of a sentence. It kind of feels like the way I understand French:
French person: “blah blah vas blah maison blah blah blah!” Me: oh, you mean “go to the house!”
…when in fact it might have meant “do not go into the house, it is very dangerous!”
It ruins the magic for me when obvious English phrases are wildly misunderstood like that.
Given that a prototype that seemed to understand language deeply, Terry Winograd’s SHRDLU, was built between 1968 and 1970, and that Natural Language Understanding has made a huge amount of progress since then, I figured there must be more recent examples with even cooler results!
They were really hard to find. Sure, I could find research papers on how to do natural language search on databases and how to build natural language Q&A systems, among other things, but none had the combination of “deep semantic understanding” coupled with “how to actually build it” and “has a demo” to show how well it worked. Furthermore, it was really hard to find practical guides on how to build a system that can take an English phrase like “Go to the next room” and actually execute it in a program to do that thing.
The Prototype And Approach
So, I went about this the hard way: gleaning whatever knowledge I could from the Internet and figuring it out as I went.
My goal was to explore how to build a game that lets the user type full English sentences and understands them deeply. Or, rather, to see how far I could get using the best tools and research I could find in the modern world. This set of blog posts documents my results and should allow others looking to build this kind of system to replicate them. Maybe most importantly, you can actually use the system to get a feel for how it really works.
Try the Perplexity prototype here: https://www.InductorSoftware.com/Perplexity
Perplexity is what is often called a “microworld”: a controlled, closed system with a limited set of things to interact with. As a microworld, it avoids many of the problems that crop up when using natural language to interact with the “real world”, the kinds of problems that Google or Siri have to contend with. Perplexity is a game, though it is probably better described as a “proof-of-concept scenario” since it honestly isn’t that fun yet. It is more a demonstration of what can be done.
To truly build a natural language game you need subsystems (each likely designed with its own Perplexity-like prototype) in many of the following areas, depending on the game:
Addressed to some extent in Perplexity:
- Understand Natural Language
- Represent knowledge (facts about the world)
- Model real-world physics (position, motion, solid bodies, etc.)
- Logical Understanding and “Theorem Proving”
- Build plans in complex environments to accomplish user commands
Not addressed at all in Perplexity:
- Generate Natural Language
- Generate Plots
- Realistic dialog
- Model humans and emotions
- Common sense reasoning
All of these are “unsolved problems” that are actively being researched. I did a broad sweep of the areas before starting Perplexity, and found a ton of interesting research on them. Clearly, building a working system was going to require putting some constraints on the problem to make it tractable.
So, the Perplexity prototype is focused primarily on understanding natural language in a microworld. To allow the user to interact, it requires some amount of logical understanding and planning, and of course there must be a way to represent knowledge about the world.
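To make “represent knowledge about the world” concrete, here is a minimal sketch of one way a microworld knowledge base and a query over it could look. This is illustrative Python with made-up facts and names, not Perplexity’s actual representation (which, as described later, lives in Prolog):

```python
# Hypothetical sketch, not Perplexity's actual code: a microworld
# knowledge base as a set of ground facts, plus a wildcard query.

facts = {
    ("in", "player", "livingroom"),
    ("in", "keys", "bedroom"),
    ("exit", "livingroom", "bedroom"),
}

def query(pred, *args):
    """Yield facts matching pred, with None acting as a wildcard."""
    for f in facts:
        if f[0] == pred and all(a is None or a == v
                                for a, v in zip(args, f[1:])):
            yield f

# A command like "where are the keys?" bottoms out in a query like:
locations = [f[2] for f in query("in", "keys", None)]  # → ["bedroom"]
```

Even this toy version shows the shape of the problem: understanding a sentence is only useful once it can be turned into queries (and updates) against a store of facts like this.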
As you’ll see if you run Perplexity, there is hardly any attempt to have the system respond to you using proper English (although it expects that of you!). There is no plot to speak of and you’ll notice the lack of common sense in some scenarios (although being a microworld tends to hide this).
What Language Is Understood?
The language that is understood also has some limitations:
- It only knows about the present tense
- There is no “pronoun model” so referring back to something as “it” won’t work
- There is no “memory” about what has happened so asking “did you X?” or “grab the thing that you put over there” won’t work
Also important is that only a limited number of words and grammar constructions are supported (although support for synonyms increases this drastically). I’ve not hit anything yet that couldn’t conceptually be implemented, but, obviously, the limitations of common sense reasoning and the depth of knowledge representation will limit how successful the effort will be.
All that said, what I found most surprising about the effort was how rich the world could be even with those limitations.
The Engine and Technology
The DELPH-IN English Resource Grammar (ERG) has been around for quite some time and takes a very different approach from many of the more recently created NLU technologies: the rules for understanding English are hand-crafted, not machine-learned. I ended up going with the ERG instead of some kind of machine learning system because I wanted to focus on deep understanding of what the user typed. I didn’t want the “intent” or the “main points” of the sentence intuited by a black box. Furthermore, and perhaps most importantly, the output of the ERG is a set of predicate logic-like predicates that represent what was said. This seemed like a great starting point for building a logic-based system.
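To give a rough idea of what “predicate logic-like predicates” means, here is a deliberately simplified, MRS-flavored sketch of “go to the house”, written as Python data. These are not the ERG’s actual predicate names or structure (the real output uses names like _go_v_1 and tracks quantifier scope with handles, covered in a later section); it only illustrates the flavor of the representation:

```python
# Illustrative only: a simplified, MRS-flavored rendering of
# "go to the house" as predicate-logic-like terms. Variables starting
# with "e" stand for events, "x" for individuals.

mrs = [
    ("go", "e1", "x1"),   # a going event e1, performed by the addressee x1
    ("to", "e1", "x2"),   # the event is directed toward x2
    ("the_q", "x2"),      # x2 is quantified by "the"
    ("house", "x2"),      # x2 is a house
]

def predicates_for_variable(mrs, var):
    """Collect the predicate names that mention a given variable."""
    return [p[0] for p in mrs if var in p[1:]]

predicates_for_variable(mrs, "x2")  # → ["to", "the_q", "house"]
```

The key point is that each predicate is small and composable: the system can resolve “the house” by finding an object satisfying house(x2), then execute go/to against it, rather than guessing at the sentence’s gist.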
While there are lots of logic systems around (especially in the area of “theorem provers”), I ended up going with SWI Prolog. The Prolog language has been around forever, has lots of resources available for understanding it, and I had some familiarity with it. It also gets used quite often in research so learning it had some side benefits. SWI Prolog, in particular, has been a rock-solid open-source Prolog engine, with great support, lots of helpful libraries and a healthy user community.
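To give a flavor of why a Prolog-style engine is a good fit, here is a toy, propositional version of backward chaining in Python. The facts and rules are invented for illustration; real Prolog adds unification over variables, which is a large part of what makes it powerful for this job:

```python
# Toy backward chainer (propositional only). A goal is proven if it is
# a known fact, or if some rule for it has a fully provable body.

rules = {
    "can_take_keys":     ["player_in_bedroom", "keys_in_bedroom"],
    "player_in_bedroom": ["walked_to_bedroom"],
}
facts = {"walked_to_bedroom", "keys_in_bedroom"}

def prove(goal):
    if goal in facts:
        return True
    return any(all(prove(subgoal) for subgoal in body)
               for head, body in rules.items() if head == goal)

prove("can_take_keys")  # → True
```

This goal-directed search (“to prove X, prove everything X depends on”) is exactly what Prolog gives you natively, which is why translating the ERG’s predicates into Prolog terms, as later sections describe, is such a natural move.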
Using Python as the glue code was just expedient. It could have been anything, but I found Python to be a well-supported and productive language for cranking out code.
How To Read The Write-up
First I recommend running Perplexity to understand what it does and form your own opinion about how successful it is.
There was a lot of information I had to learn and digest to build Perplexity. My goal with the write-up is to distill it so others can benefit. That said, it is still a lot of information, so there are two ways I recommend reading through it:
- Top-Down: If you are the kind of person who likes a conceptual approach, where you start at the top and dive into layers of detail as you go, start with the “Execution Flow” section. It has a summary, with links for detail, of all the steps required to turn English into a real computer interaction.
- Bottom-Up: The “Writing Prolog MRS Predicates” section walks through the process of writing the actual Prolog code to implement a phrase like “Where are you?” in very concrete detail. There are links sprinkled throughout that supply the background to understand it.
Finally, if you want to just jump to a topic, here’s a list of all the sections, in a pseudo top-down order:
- Resources I Collected And Referenced To Build Perplexity
- Natural Language Processing: Natural Language as Logic
Going from English to a representation that is ready to be executed (MRS):
- Using ERG for a Natural Language Interface Flow
- Autocorrect for the ERG
- Understanding the Output of the ERG: the Minimal Recursion Semantics Format
- Implementing Synonyms for ERG Predicates
- Building Scope-Resolved Trees From MRS
- Conversion From Scope-Resolved Tree to Prolog
Key concepts for implementing ERG predicates in Prolog:
- Understanding ERG Events
- ‘Quoting’ in Predicates With Events
- Handling Logic That Fails
- Performance Issue: thing and Free Variables
- Planning Using Hierarchical Task Networks
Actually writing the ERG predicates in Prolog:
Some case studies: