Parsing Poor English Grammar

It turns out that many users do not want to type full English sentences into Perplexity. When they do type, they often revert to what I’ll call “computereze”: partly the default syntax used by Interactive Fiction (IF) engines like Inform 7, partly the simplified keyword syntax you might type into a search engine.

Computereze        -> English
'examine backpack' -> 'examine the backpack'
'put key in gate'  -> 'put the key in the gate'
'examine fence'    -> 'examine the fence'
'examine house'    -> 'examine the house'

I think there are several reasons: people who have played a lot of IF are simply used to this, so it is muscle memory. Even without that experience, it is often just faster. Finally, interacting with computers over the last N years has taught everyone that computers don’t really understand English and that full sentences often just confuse them. So we have been trained to do it.

The tricky part is that this is a different, but overlapping, syntax from English. “Eat food” is a valid sentence in both languages, as is “take root”. The former means the same thing when parsed in either, but the English meaning of “take root” is very different from the computereze meaning, which would be “take the root” in English. Furthermore, the ERG parser will often happily parse a phrase like “open door” since it is a valid fragment of a sentence like “go to the open door”, and when executed in Perplexity it ends up running a query which checks for the existence of an open door, i.e. as if the user asked “open door?” So, you have to pick which language to prioritize when parsing, and only try the other if it fails.

The approach I’ve taken in Perplexity is to prioritize English and then fall back to computereze if the phrase doesn’t parse, or if it parses but doesn’t succeed. So the logic is (a rough code sketch follows the list):

  1. User types the phrase “open door”.
  2. The ERG parses the phrase and it works (as a fragment meaning “open door?”).
  3. Execute the logic generated by the ERG. If it turns out there is an open door, it will succeed.
  4. If it fails or doesn’t parse, run the computereze parser to convert the phrase to English.
  5. If there is no computereze parse: stop and return the error from the first parse.
  6. Otherwise: parse the new phrase and execute it.
  7. If it succeeds, return that result to the user.
  8. If it fails, assume the user was typing computereze (since it parsed as computereze) and return the error from the changed phrase to the user.
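
Here is that flow as a minimal Python sketch. The Result type and the parse_and_execute and computereze_to_english callables are hypothetical stand-ins for the real ERG/Perplexity machinery, not its actual API:

from dataclasses import dataclass

@dataclass
class Result:
    succeeded: bool
    text: str

def handle_phrase(phrase, parse_and_execute, computereze_to_english):
    # parse_and_execute and computereze_to_english are hypothetical stand-ins
    # for the real ERG/Perplexity machinery and the alternative parser.

    # Steps 1-3: parse and execute the phrase as plain English first
    first_try = parse_and_execute(phrase)
    if first_try.succeeded:
        return first_try

    # Step 4: it failed (or didn't parse), so ask the computereze parser
    # for a corrected English phrase
    corrected = computereze_to_english(phrase)
    if corrected is None:
        # Step 5: no computereze reading either; report the original error
        return first_try

    # Steps 6-8: run the corrected phrase and report its outcome either way,
    # prefixed with "I heard '<corrected phrase>'" so the player can tell
    # the alternative parser kicked in
    second_try = parse_and_execute(corrected)
    return Result(second_try.succeeded,
                  f"I heard '{corrected}'\n{second_try.text}")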

This does mean that typing in computereze is slower, but it is the best approach I’ve found so far.

You can tell when the alternative parser kicks in because Perplexity will say “I heard ‘<corrected phrase>’” before showing you what happened.

Below are the approaches I’ve tried for building the alternative parser.

Approach #1

The first approach I tried was a simple alternative phrase builder that is very brute force, so I call it “the brute force parser”. It mechanically inserts articles into the phrase and tries parsing each variant until it runs out of time.

The approach was obviously very inefficient and didn’t always find the right answer before the timeout.
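
For illustration, here is a rough Python sketch of that idea; try_parse is a hypothetical stand-in for a call into the ERG, and the number of candidates (three choices per gap between words) is exactly why it is so inefficient:

import itertools
import time

ARTICLES = ["the", "a"]

def brute_force_candidates(phrase):
    # For each gap between words, either leave it alone or insert an article,
    # e.g. "put key in gate" eventually yields "put the key in the gate"
    words = phrase.split()
    if not words:
        return
    options = [[None] + ARTICLES for _ in range(len(words) - 1)]
    for choices in itertools.product(*options):
        candidate = [words[0]]
        for article, word in zip(choices, words[1:]):
            if article:
                candidate.append(article)
            candidate.append(word)
        yield " ".join(candidate)

def brute_force_parse(phrase, try_parse, timeout_seconds=5.0):
    # try_parse is a hypothetical stand-in for a call into the ERG that
    # returns a parse (or None) for a candidate phrase
    deadline = time.monotonic() + timeout_seconds
    for candidate in brute_force_candidates(phrase):
        if time.monotonic() > deadline:
            return None  # ran out of time before finding a parse
        result = try_parse(candidate)
        if result is not None:
            return result
    return None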

Approach #2

In this approach I tried training the GPT-3 AI engine to do the translation from computereze to English and then parsing the result. This approach worked amazingly well (but incurs a slight cost). Here’s how to replicate what I did:

The easiest way to get started is to set up an account on openai.com and use the “Playground”. Once it is doing nearly what you want, it is trivial (5 lines of code?) to get it working in your language of choice; they have good docs for that.

So far, I’ve used what OpenAI calls “text completion” to do the parsing, which basically means literally writing down instructions, followed by examples, and then giving it the text you want transformed and hoping it will follow the pattern.
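
For reference, here is roughly what that looks like outside the Playground, as a sketch against the pre-1.0 openai Python package that was current at the time (the newer client API differs). The correct_phrase helper and the 60-token limit are my own choices for illustration, not part of Perplexity:

import os
import openai  # the pre-1.0 openai package that was current at the time

openai.api_key = os.environ["OPENAI_API_KEY"]

def correct_phrase(phrase, few_shot_prompt):
    # few_shot_prompt is the instruction-plus-examples text shown later in
    # this post, *without* its final '"<the text you want to test>" should be' line
    prompt = f'{few_shot_prompt}\n"{phrase}" should be'
    response = openai.Completion.create(
        model="text-davinci-002",  # matches the Playground settings below
        prompt=prompt,
        temperature=0,             # no creativity, just a consistent answer
        max_tokens=60,
        stop=["\n"],               # stop at the end of the corrected line
    )
    return response.choices[0].text.strip().strip('"')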

I have to post-filter what it gives me to detect when it goes off the rails; so far that only takes two rules.

And it gives very few bogus answers with what I’m using now. Note that you should also run OpenAI’s free “content” filter over the results in case it goes really crazy and sends back something you will regret. The filter will flag offensive stuff. Mostly. No guarantees.
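
As best I can reconstruct it, calling that free content filter from the same era of the openai package looked something like the sketch below; the “content-filter-alpha” engine name and prompt format follow OpenAI’s content filter guide of the time, so treat the details as a best-effort recollection rather than current API:

import os
import openai  # same pre-1.0 openai package as above

openai.api_key = os.environ["OPENAI_API_KEY"]

def looks_unsafe(text):
    # The filter returns a label: 0 = safe, 1 = sensitive, 2 = unsafe. The
    # full guide also recommended a logprobs threshold check before trusting
    # a "2"; that is omitted here for brevity.
    response = openai.Completion.create(
        engine="content-filter-alpha",
        prompt="<|endoftext|>" + text + "\n--\nLabel:",
        temperature=0,
        max_tokens=1,
        top_p=0,
        logprobs=10,
    )
    return response.choices[0].text == "2"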

To try out what I’ve done, go to the playground, and use the following settings:

Model: text-davinci-002 (the richest model, I've had mixed success so far on others)
Temperature: 0 (we want no creativity or risks, just a consistent answer)

Leave the rest of the settings at their defaults

Below is the exact text I have been using for my purposes; it has worked very well when tested against over 1,200 phrases (both ones that shouldn’t be corrected and ones that should). You literally paste every single line of it into the playground window, and then change the last line to the text you want corrected. So turn the last line from:

"<the text you want to test>" should be

into (for example):

"who was governor of minnesota when ankahee was released?" should be

And hit submit. The playground will “complete” the phrase with the correction.

As always, this stuff is an art. When a phrase wasn’t corrected the way I wanted, I added the failing phrase to the example set and ran the tests again. I kept doing that until I started getting the consistent results I wanted.

Here is my raw completion text (this also shows what it took, so far, to get it working the way I wanted). All the phrases where the original and the correction are the same are places where the model made a poor suggestion and I had to add the phrase to get it right:

Turn short phrases into full English sentences but don't remove any important words. For example:
"Open door" should be "open the door"
"eat apple" should be "eat the apple"
"put apple in barrel" should be "put the apple in the barrel"
"pacifier in bed" should be "put the pacifier in the bed"
"get safe" should be "get the safe"
"give buttercup" should be "give the buttercup"
"drop backpack" should be "drop the backpack"
"put boot table" should be "put the boot on the table"
"where the diaper bag" should be "where is the diaper bag"
"frog green?" should be "The frog is green?"
"a diamond is blue" should be "A diamond is blue"
"the pen is in the diamond cave" should be "The pen is in the diamond cave"
"there is a pen" should be "There is a pen"
"a bottom is on the slug" should be "A bottom is on the slug"
"describe the rocks" should be "Describe the rocks"
"there is blue paint" should be "There is blue paint"
"blue paint is on the table" should be "Blue paint is on the table"
"a roof is wet" should be "a roof is wet"
"go home" should be "go home"
"restart" should be "restart"
"help" should be "help"
"is a book in the entrance?" should be "Is a book in the entrance?"
"put the diamond in Plage" should be "put the diamond in Plage"
"get the rock on the floor" should be "get the rock on the floor"
"put the crystal on the table where the safe is" should be "put the crystal on the table where the safe is"
"where is the diamond at?" should be "Where is the diamond at?"
"are you still in a cave?" should be "Are you still in a cave?"
"get a hand" should be "get a hand"
"read page 1" should be "read page 1"
"read page 2" should be "read page 2"
"turn page 1" should be "Turn page 1"
"look around" should be "look around"
"paint is on the table" should be "paint is on the table"
"go to a cave" should be "go to a cave"
"is a rock in the cave?" should be "Is a rock in the cave?"
"is a girl in the doorway?" should be "Is a girl in the doorway?"
"what is the keyhole on?" should be "what is the keyhole on?"
"get Plage." should be "get Plage"
"go through the safe" should be "go through the safe"
"leave cave" should be "leave the cave"
"there is a front on a safe" should be "there is a front on a safe"
"drop a rock" should be "drop a rock"
"go into the 1st cave" should be "go into the 1st cave"
"where is a living room" should be "where is a living room"
"where is my grand children's house" should be "where is my grandchildren's house"
"<the text you want to test>" should be

That worked well, but was expensive. My instructions above use what OpenAI calls “few shot learning” to train the model, where you put the examples right in the prompt every time. This costs more (because you are charged by the token, so longer prompts cost more) and isn’t the best training approach, but it is good for initial testing. It worked great for my case.

I’ve now completed testing using what they call their “Fine Tuning” approach, where you upload training data, they train the model on their servers, and then you only send the text you actually want corrected as the prompt. Less text is cheaper, and this training approach gives better (or the same) results.

I also tried my sample phrases on a fine-tuned “Ada” model (as opposed to Davinci), which is a much smaller and much cheaper model, and it performs just as well with the data I’ve used so far. So: less text sent + a cheaper base model + equally good results = much cheaper to use overall (and supposedly faster, but I haven’t measured that).

It did take a bit of work to get the training data right, so here is a sample of what I used. It is the same data as the dataset above, just converted to the fine-tuning format:

{"prompt": "\"examine me\"\n\n###\n\n", "completion":" \"examine me\" END"}
{"prompt": "\"examine backpack\"\n\n###\n\n", "completion":" \"examine the backpack\" END"}
{"prompt": "\"examine gate\"\n\n###\n\n", "completion":" \"examine the gate\" END"}
{"prompt": "\"examine fence\"\n\n###\n\n", "completion":" \"examine the fence\" END"}
{"prompt": "\"examine house\"\n\n###\n\n", "completion":" \"examine the house\" END"}
{"prompt": "\"open gate\"\n\n###\n\n", "completion":" \"open the gate\" END"}
{"prompt": "\"close gate\"\n\n###\n\n", "completion":" \"close the gate\" END"}
{"prompt": "\"examine bell\"\n\n###\n\n", "completion":" \"examine the bell\" END"}
{"prompt": "\"examine door\"\n\n###\n\n", "completion":" \"examine the door\" END"}
{"prompt": "\"examine intercom\"\n\n###\n\n", "completion":" \"examine the intercom\" END"}

...
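
For completeness: the upload-and-train step at the time was roughly a one-liner with the openai command line tool (something like “openai api fine_tunes.create -t <your file>.jsonl -m ada”), and querying the resulting model from the pre-1.0 Python package looked something like the sketch below. The correct_phrase_finetuned helper and the 60-token limit are my own illustration; the prompt suffix and “ END” stop sequence just mirror the training format above:

import os
import openai  # pre-1.0 openai package

openai.api_key = os.environ["OPENAI_API_KEY"]

def correct_phrase_finetuned(phrase, model_id):
    # model_id is whatever id the fine-tune job reports when it finishes
    # (an "ada:ft-..." style name). The "\n\n###\n\n" suffix and the " END"
    # stop sequence must match the training data format shown above.
    response = openai.Completion.create(
        model=model_id,
        prompt=f'"{phrase}"\n\n###\n\n',
        temperature=0,
        max_tokens=60,
        stop=[" END"],
    )
    return response.choices[0].text.strip().strip('"')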