Prolog uses a “Closed World Assumption”. Knowledge that is not known (or cannot be proved) is “false”.
This makes it hard to answer questions well that make no sense like “is the sky happy?”. This would return “false” because we have no data about happiness and the sky. We’d like to answer something like “the sky has no emotions (silly!)”.
It also makes it hard to answer questions that make sense but need more than a “no” like “Is the cave locked?” when there is no locking mechanism. Instead of “No” we’d like to see “No, there isn’t a lock.”
Humans demand more complicated responses than
false. even in simple scenarios. For example, let’s walk through some scenarios where the user types “The rock is green”:
|Scenario||Prolog Response||Plausible Human Response|
|There is a blue rock and a red rock||false.||“There is more than one rock, which do you mean?”|
|There are many rocks of different colors, including green ones||false.||“There is more than one rock, which do you mean? “|
|The person they are talking to doesn’t know the word ‘rock’||false.||“I don’t know what a rock is”|
|The aren’t any rocks||false.||“There aren’t any rocks”|
The approach I’ve taken in Perplexity is able to reply with all of the “plausible” responses above in those scenarios. To do this I’ve:
- Created a mechanism that reports which Prolog predicate failed (this is not built into Prolog for reasons explained below)
- Supported replacing the failed predicate with a more meaningful failure when necessary
- Included metadata with variables to allow richer errors
- Followed some patterns in implementing the Prolog predicates to make the system report the best errors possible
Reporting why a logic statement failed is an active area of research and I do not claim to have solved it. However, the approach I’ve taken has worked suprisingly well and is worth examination. I will note up front that there are still cases where the responses using the approach below are less than optimal. I’ve got a section that goes through some good (bad) examples at the end.
The rest of this post goes through the details of how it works.
Why Was Something “False” In Prolog?
To understand how Perplexity handles failures, first you need some background on how Prolog does. When a Prolog statement can’t be proved, the system simply returns
false. and there is no error information that tells you why. That’s because it doesn’t know.
"the rock is green" Type: proposition [ TOP: h0 INDEX: e2 RELS: < [ _the_q__xhh LBL: h4 ARG0: x3 [ x PERS: 3 NUM: sg ] RSTR: h5 BODY: h6 ] [ _rock_n_1__x LBL: h7 ARG0: x3 [ x PERS: 3 NUM: sg ] ] [ _green_a_2__ex LBL: h1 ARG0: e2 [ e SF: prop TENSE: pres MOOD: indicative PROG: - PERF: - ] ARG1: x3 ] > HCONS: < h0 qeq h1 h5 qeq h7 > ] ┌_rock_n_1__x:x3 _the_q__xhh:x3,h5,h6┤ └_green_a_2__ex:e2,x3 Logic: _the_q__xhh(x3, _rock_n_1__x(x3), _green_a_2__ex(e2, x3)) [Simplified] Prolog: the(X, noun(X, rock), adjective(X, green))
I’ve shown a “simplified Prolog” in the above, which is similar to, but simpler than, what Perplexity would generate for illustrative purposes.
Now, let’s implement the first scenario (a red rock and a green rock exist) in simple Prolog. It uses a “triple store” for storing facts about the world, just like Perplexity, but, again, is simplified:
% things in the world rock1. rock2. gem1. % facts about them fact(rock1, isA, rock). fact(rock1, adjective, red). fact(rock2, isA, rock). fact(rock2, adjective, blue). fact(gem1, isA, gem). noun(X, Type) :- fact(X, isA, Type). adjective(X, Value) :- fact(X, adjective, Value). % Simple implementation of "the" the(_, RSTR, BODY) :- % Count the number of things that are true for RSTR aggregate_all(count, RSTR, Count), % Since this is "the", there should be exactly one Count == 1, % Since there is exactly 1, run RSTR again to get its value RSTR, % And now run the body BODY.
If we now run it:
?- the(X, noun(X, rock), adjective(X, green)) false.
false as expected. But why did it fail from a Prolog perspective? If you examine how Prolog proves things, it turns out many things failed (in order):
aggregate_all(count, noun(X, rock), Count),
noun(X, rock)failed on every fact that was not
fact(X, isa, rock)as it iterated through all the facts:
- FAIL: fact(rock1, adjective, red)
- FAIL: fact(rock2, adjective, blue).
- FAIL: fact(gem1, isA, gem)
Count == 1failed because
Countas actually 2
…and this is ignoring the predicates which might have failed inside of
aggregate_all which really wouldn’t make sense to anyone. Prolog doesn’t know which of those failures are the “ultimate reason for failure”. The real “reason” is that it couldn’t prove the statement was true.
All is not lost though! It turns out a couple of heuristics do a very reasonable job at reporting errors:
- Report the “deepest” failure
- Provide semantic information for places that fail
The next two sections describe how these work in Perplexity.
1. Report the “deepest” failure
Prolog proves logic statements by walking a tree of facts and diving into the logical chain of facts about them until it finds a set that are true (or fails). Intuitively, if you have a whole string of logic predicates in a row, probably the “deepest” failure is going to be a good “reason” why it failed (modulo a few things that I describe later). That’s because that is the farthest it got in proving the statement before it gave up.
If we look at the example above as a “stack” we can see where the failures happened:
?- the(X, noun(X, rock), adjective(X, green)) Prove: the(X, noun(X, rock), adjective(X, green)) Prove: aggregate_all(count, noun(X, rock), Count) FAIL: fact(rock1, adjective, red) FAIL: fact(rock2, adjective, blue). FAIL: fact(gem1, isA, gem) Prove: Count == 1 FAIL: Count == 2
Describing the reason “the rock is green” failed as:
“because ‘the’ means ‘the one thing’ but there are two so I couldn’t figure out which you mean’
seems much more “correct” for why this failed than:
“because rock1 is red” or “because gem1 is a gem”.
However, even though we may have found the failure point that makes the most sense, it’s hard to imagine how to generate the text:
'the' means 'the one thing' but there are two. I couldn't figure out which you mean
from the failure:
Failure at: Count == 1
2. Provide semantic information for places that fail
We can work around Prolog’s lack of a way to report the “most reasonable failure” and provide information that turns a raw predicate like “not(Count == 1)” into something meaningful with the same mechanism: annotate the implementation with extra information that says what it means if we fail at a particular place and remember that information at runtime if this is the deepest failure.
The predicate that does this is called
tryError/2. The first argument is a predicate to try, and the second is what it “means” if it fails. Let’s annotate the implementation of
the using this predicate:
% Simple implementation of "the" the(_, RSTR, BODY) :- % Count the number of things that are true for RSTR aggregate_all(count, RSTR, Count), % Since this is "the" there should be exactly one tryError(Count == 1, context(theMeansOneAndThereIsMoreOrLessThanOne)), % Since there is exactly 1, run RSTR again to get its value RSTR, % And now run the body BODY.
If the Prolog fails at
Count == 1, the caller can now do a separate call to get whatever is in the second argument of
tryError. It is up to the caller to make sense of it, but since you wrote it, presumably you are the caller so you’ll know what to do with it. In this case, converting
theMeansOneAndThereIsMoreOrLessThanOne to a message like:
I'm not sure which you mean by 'the'
would be a fine solution. I’ll show how to get better messages farther down in this post, but for now a little background on how
tryError/2 is implemented.
The algorithm for the implementation is pretty simple. First, we have global state that remembers:
- the stack level of the deepest error so far
- the error that should be returned if that really turns out to be the “deepest” failure
- Before executing a Prolog query, clear out the global state
tryError(Term, Error)is encountered, it:
- Remembers what was the current best error before we got here
- Sets Error as the current error if we are the deepest error
- Then solves Term
- If Term succeeds, put the previous best error back again (if something Term executed didn’t replace it with a better one)
- When the whole query fails, whatever is left in the global state is the error to return
The key predicate implementation is this:
% These three store the current % "best" error encountered so far :- dynamic(currentTermIndex/1). currentTermIndex(0). :- dynamic(currentDepth/1). currentDepth(-1). :- dynamic(currentError/1). currentError(none). tryError(Term, Error) :- % Remember whatever is the current context currentFailureContext(PreviousContext), % If we are now the "deepest" error % set the context to be this error failureContext(Error), % Remember whatever is now the "deepest" error % which might or might not be the one we just tried to set currentFailureContext(NewContext), % Actually run the term Term, % If we succeeded, we need to get rid of the new context % we *might have* set. But only do this if nothing that % Term called changed it. replaceFailureContextIfCurrent(NewContext, PreviousContext).
There are two important details.
First: What does “deepest” mean?
When initially using this approach I noticed that some scenarios caused deep stacks for “non-semantic” reasons (they iterated through a list using tail recursion for example) and thus would always “win” and cause a poorer reason to be returned. I realized that combining how “far” a failure was through the list of top-level predicates with how deep the failure was within that predicate was a better indicator of the “return the failure farthest in the proof” principle. This makes intuitive sense because the ERG only creates predicates that have a semantic meaning, and those are the ones that are executed at the top level. In a sense, this bounded a poor error to a single semantic unit instead of allowing it to take over.
So, “deepest” is actually a combination of two things: the index of the top-level predicate and the depth of the stack within that predicate. Here’s an example to illustrate what I mean:
?- predicate1(X), predicate2(X), predicate3(X).
The “depth” of a failure is
PredicateIndex indicates how far through the top level predicates we are:
PredicateIndex = 1,
PredicateInex = 2, etc. If:
- The first failure we hit is 5 stack frames deep in
predicate1the “depth” is
- If we hit another failure 6 stack frames deep in
predicate1the “depth” is
depth(1, 6)and this is the new failure.
- If we hit a failure 1 stack frame deep in
predicate2the “depth” is
depth(2, 1)which is deeper and so this is the new failure
Second: What is considered a “top-level predicate”?
Due to the fact that the MRS gets resolved to a tree, there is only one “top-level” predicate from the Prolog perspective. If you look at the example for “the rock is green”:
?- the(X, noun(X, rock), adjective(X, green))
you see that the “top-level” predicate is
the. However, experimentally, the best error reporting results happened when each of the MRS predicates was considered “top-level” and the error reported was the farthest through the MRS predicates.
Luckily, that was easy to do since the way conversion from MRS to Prolog happens in the prototype automatically wraps each predicate with a call to
erg/2. The whole point of this call is to mark it so that any errors that happen in the predicate (or in any predicates it “calls”) know the index of the predicate they are currently being called from.
erg/2 does this by putting information in the stack, kind of like a
try/catch block would:
% Used to call an ERG predicate and set the index of it as context % for any errors that occur erg(Index, Goal) :- callWithContext(Goal, Index). % callWithContext/2 makes sure it stays on the stack with Context as an argument % so that, later, we can call getContext/1 and retrieve the current value of % Context from the stack callWithContext(Goal, Context) :- call(Goal), % The retainValue/1 call ensures callWithContext/2 remains on the stack and passing the argument ensures Prolog will not garbage collect the argument. retainValue(Context). retainValue(_). % Retrieve the value of Context by looking up the stack and getting % the nearest call to callWithContext/2 getContext(Context) :- prolog_current_frame(Me), prolog_frame_attribute(Me, parent_goal, callWithContext(_, Context)).
With that in place, I can now show how
tryError pulls this information out and uses it. Here are all of the important predicates in this error handling infrastructure:
% These three store the current % "best" error encountered so far :- dynamic(currentTermIndex/1). currentTermIndex(0). :- dynamic(currentDepth/1). currentDepth(-1). :- dynamic(currentError/1). currentError(none). % Used to call an ERG predicate and set the index of it as context % for any errors that occur erg(Index, Goal) :- callWithContext(Goal, Index). % callWithContext/2 makes sure it stays on the stack with Context as an argument % so that, later, we can call getContext/1 and retrieve the current value of % Context from the stack callWithContext(Goal, Context) :- call(Goal), % The retainValue/1 call ensures callWithContext/2 remains on the stack and passing the argument ensures Prolog will not garbage collect the argument. retainValue(Context). retainValue(_). % Retrieve the value of Context by looking up the stack and getting % the nearest call to callWithContext/2 getContext(Context) :- prolog_current_frame(Me), prolog_frame_attribute(Me, parent_goal, callWithContext(_, Context)). % Called by predicates to record a potential failure that should be reported to the user tryError(Term, Error) :- % Remember whatever is the current context currentFailureContext(PreviousContext), % If we are now the "deepest" error % set the context to be this error failureContext(Error), % Remember whatever is now the "deepest" error % which might or might not be the one we just tried to set currentFailureContext(NewContext), % Actually run the term Term, % If we succeeded, we need to get rid of the new context % we *might have* set. But only do this if nothing that % Term called changed it. replaceFailureContextIfCurrent(NewContext, PreviousContext). % Finally, failureContext/1 is used to actually check if the error % is now the "deepest" by using getContext/1 to figure out the index % of the predicate we are currently resolving and how deep in the stack we are % Case 1: when we are farther in the term list than the current failure for the first time failureContext(Value) :- % Get the currently executing "top level" index getContext(TermIndex), % Get the last recorded "best" error index currentTermIndex(CurrentTermIndex), TermIndex > CurrentTermIndex, % Update all the state since we are deeper now retractall(currentTermIndex(_)), assert(currentTermIndex(TermIndex)), % Update the absolute depth of the stack prolog_current_frame(Frame), prolog_frame_attribute(Frame, level, Depth), retractall(currentDepth(_)), assert(currentDepth(Depth)), % And finally, record the actual error message retractall(currentError(_)), assert(currentError(Value)), !. % Case 2: when the current and last known are the same predicate index % only update, if the absolute depth of the stack is deeper failureContext(Value) :- getContext(TermIndex), currentTermIndex(CurrentTermIndex), TermIndex = CurrentTermIndex, % See if this actually went deeper in the current term prolog_current_frame(Frame), prolog_frame_attribute(Frame, level, Depth), currentDepth(CurrentDepth), Depth >= CurrentDepth, % Update all the state except Index since we are deeper now retractall(currentDepth(_)), assert(currentDepth(Depth)), retractall(currentError(_)), assert(currentError(Value)), !. % Case 3: needs to still succeed even if the context wasn't set or % the actual predicate won't run failureContext(_).
Errors Should Be Reported Against The Variable, Not The Instance
Prolog replaces variables with their value at runtime and it is not possible to know what the original variable was:
% Will always write the *value* of X and there % is no way to know, from inside of writeln, % that the value came from a variable called "X" foo(X) :- writeln(X).
There are cases where this causes problems. For example, “Is Lexi a rock?”:
"Is Lexi a rock?" Type: question [ TOP: h0 INDEX: e2 RELS: < [ _a_q__xhh LBL: h10 ARG0: x4 [ x PERS: 3 NUM: sg IND: + ] RSTR: h11 BODY: h12 ] [ _rock_n_1__x LBL: h13 ARG0: x4 [ x PERS: 3 NUM: sg IND: + ] ] [ proper_q__xhh LBL: h5 ARG0: x3 [ x PERS: 3 NUM: sg IND: + ] RSTR: h6 BODY: h7 ] [ named__cx LBL: h8 CARG: "Lexi" ARG0: x3 [ x PERS: 3 NUM: sg IND: + ] ] [ _be_v_id__exx LBL: h1 ARG0: e2 [ e SF: ques TENSE: pres MOOD: indicative PROG: - PERF: - ] ARG1: x3 ARG2: x4 ] > HCONS: < h0 qeq h1 h6 qeq h8 h11 qeq h13 > ] ┌named__cx:Lexi,x3 proper_q__xhh:x3,h6,h7┤ │ ┌_rock_n_1__x:x4 └_a_q__xhh:x4,h11,h12┤ └_be_v_id__exx:e2,x3,x4 Logic: proper_q__xhh(x3, named__cx(Lexi, x3), _a_q__xhh(x4, _rock_n_1__x(x4), _be_v_id__exx(e2, x3, x4)))
You can see by the way the tree is built that Prolog is going to:
- Find the person named ‘Lexi’ and put that in
- Then backtrack through each rock in the system and put it in
- Finally, it will see if
_be_v_id__exx has a few ways to check if something “is” something else. Once is conceptual, i.e. “is that person a type which derives from that type”? Another just checks are the two things are identical. If the first one fails, you’ll get a good failure message. If the system is written such that the second version is the “deepest stack” (as mine is) you’ll end up failing on a line like this:
... tryError(x3 == x4, context(thingIsNotOtherThing(x3, x4))), ...
which means the error will be
thingIsNotOtherThing(lexi, rock2) which translates to “Lexi is not the second rock”. Clearly not what we wanted. We wanted “Lexi is not a rock”.
What we need here is a way to find out the “set” or “class” of thing that is being talked about, not the particular instance we failed on. The solution here is to get at the “provenance” of the data, i.e. “where did this instance come from?” and report that. Unfortunately, because Prolog loses the variable name, this is not easy.
My solution is to have all the arguments used in Perplexity be a special Prolog predicate called
arg/2 that holds the variable/value in the first position and metadata in the second position. Metadata is information such as the original variable name, tense, and plurality.
With that, the Prolog representation of
_be_v_id__exx (from above):
_be_v_id__exx LBL: h1 ARG0: e2 [ e SF: ques TENSE: pres MOOD: indicative PROG: - PERF: - ] ARG1: x3 ARG2: x4 ]
looks like this:
arg(X57360, var([name(x4), type(reg), pers(3), num(sg)]))
This allows the
tryError predicate to pass all the metadata for the variable (including the instance it is currently checking) in the error like this:
context(thingIsNotOtherThing, arg(idLexi, var([name(x3), type(reg), pers(3), num(sg)])), arg(idRock2, var([name(x4), type(reg), pers(3), num(sg)])))'
Now, when the predicate fails, the caller knows both the instance
iRock2 and the variable it came from
x4. It can respond “Lexi is not a rock” by seeing which predicate introduced
_rock_n_1__x) and use its lemma (“rock”) in the answer. That is what Perplexity does.
Of course, this means that if you ask “Is Lexi the blue rock” you will get the same result: “Lexi is not a rock”. But I’ve found that this kind of failure of reporting errors on the “class” feels less egregious…
Clearly the technique of reporting the deepest error using the provenance of the data does not provably give “why” something failed. That’s an unsolved problem. Instead, it provides a heuristic that works surprisingly well…but: there are definitely ways to use it that give more success.
Now that we have the infrastructure in place (the
tryError/2 predicate and metadata for variables), I want to walk through some best practices I’ve found for using them.
Report failures from “semantically relevant” predicates.
Prolog execution is tree-like: it starts with a top-level predicate, executes that, which runs the predicates inside of it, which themselves have predicates in them, that get run, etc. In this system, the topmost predicates (usually) correspond to actual words the user said, and the farther down the stack you get, the more abstract the predicates are, ending in things like
length/2 which just returns the length of an array and has no coorspondance to anything the user would recognize.
Not surprisingly, I’ve found that the closer to the top an error is, the more “semantically relevant” it is to the user, and the more context it has to give in the error. Reporting errors deep in helper functions usually produces an error that won’t make much sense. Almost all of the
tryError predicates in Perplexity are immediately inside of ERG predicates. There are very few in helper functions.
Here’s a real example from the variant of
d_be_v_id__exx that checks if something is “conceptually” something else (e.g. “Is Lexi a person?”:
d_be_v_id__exx(EventIDArg, ThingArg, OtherThingArg, QuoteArg) :- tryError(specializes(Thing, OtherThing), context(d_be_v_id__exx, contextThingIsNotOtherThing, ThingArg, QuoteArg)).
Group together predicates that are only meaningful together
Often it is the case that it is a group of predicates that “mean” something and not a single one. In those cases, I’ve put the entire group as the first term of
Here’s a real example of the
d_on_p_loc__exx predicate variant that implements the type of “on” used when something is a part of something else, as in “Is the door on the car?” It checks to make sure the thing “on” it is not a conceptual face of the thing. We want “is the left side of the car on the car?” to fail. Kind of a technical detail but illustrative.
It uses bare parenthesis
() to group together
not(getFace(...)) into a single meaningful unit. If it fails, it means the thing is “notDirectionallyRelated”:
d_on_p_loc__exx(EventIDArg, OnIDArg, ObjectIDArg, CreateOrEvalArg) :- tryError( (rel(default, OnID, idPartOfRel, ObjectID), not(getFace(ObjectID, _, OnID))), context(d_on_p_loc__exx3, contextThingNotDirectionallyRelated, on, OnIDArg, ObjectIDArg)).
You could report an error for
rel such as
notAPartOf and a separate error for
not(getFace()) such as
isAConceptualFaceOf3DimensionalObject but that’s not going to make a lot of sense to the human. And if you don’t have an error for them both somehow, you’ll just get
false. which is probably worse.
Sometimes the predicate should be implemented less efficiently to report a good error
One of the variants of
d_the_q__xhh checks to make sure there is a single instance of the thing in question. It could have a single error that says “there is not exactly one of these” but instead breaks it into two so that it can say “there are none” or “there is more than one” which reads much better:
% "the" sometimes means a single item which is an *instance* of whatever got passed in d_the_q__xhh(VariableArg, RstrArg, BodyArg, QuoteOrEvalArg) :- % Make sure we are only counting *instances* X = (Rstr, isInstance(Variable)), count(Count, resolveTerm(X, Quote)), tryError(not(<(Count, 1)), location(d_the_q__xhh, countIsZero, X)), tryError(not(>(Count, 1)), location(d_the_q__xhh, moreThanOneItem, X)), % if that all worked, do it again for as many backtracks as there are resolveTerm(X, Quote), % And the run the body resolveTerm(Body, Quote).
Predicates that fail and we don’t care: “not”, “findall”?
Some predicates are implemented using
not() where they succeed if something fails. If the predicate inside of
not() is using
tryError to report why it fails, then that will get reported as a potential failure if the whole thing fails eventually. Clearly not what we wanted (so to speak).
For this case, I’ve got a helper predicate called
ignoreErrorIfSuccess(Term) that is used to ignore any recorded errors if it succeeds. So writing
ignoreErrorIfSuccess(not(...whatever...)) makes this case work like you’d expect.
Something similiar happens with the built-in predicate
findall. It will never fail, by definition, but the predicates it call can fail. Unless the call to
findall is wrapped with
ignoreErrorIfSuccess it could spuriously record errors that happen during its execution.
Returning a special error if no predicates succeed
I’m sure there are better examples of this, but here’s one I hit while implementing verbs that use prepositions.
To simplify, imagine that to implement the verb
go that handles predicates like
go *to* the room or
go *through* the door we simply implement a Prolog predicate for each type of preposition we want to handle, like this:
go(X) :- checkForThrough(X), ...logic to go *through*... go(X) :- checkForTo(X), ...logic to go *to*... go(X) :- checkForUnder(X), ...logic to go *under*...
Furthermore, imagine that each
checkFor... predicate uses
tryError to return a nice message if the preposition isn’t there like:
No 'through' preposition found.
If you then call
go with a preposition that matches none of them like
go *over* the door, you’ll get an error back that is basically randomly choosing one of the failures like
No 'to' preposition found, which is true, but misleading. Instead, you’d like to return an error like
I don't understand that preposition.
The pattern I’ve used here is remove the error checking from each
checkFor... predicate and elevate it to a final predicate that returns an error if none of them worked like this:
go(X) :- checkForThroughNoError(X), ...logic to go *through*... go(X) :- checkForToNoError(X), ...logic to go *to*... go(X) :- checkForUnderNoError(X), ...logic to go *under*... go(X) :- failWithErrorIfNoUnderstoodPrepositions(X),
But these are all just heuristics, it sometimes still doesn’t work well
This system works surprisingly well, well enough that I think I could launch a game using the basic approach. There are still places for improvement, though.
The easiest way to test the error handling is to make assertions about the world and see what the system says. It will always return the error generated as part of “No, that isn’t true: …[error here]”. Note that many of the problems you’ll see when doing this are not failures of the error handling approach described here, but are just poor implementations of the logic.
Some examples of things I’d like to fix:
User: The diamond is part of a safe. Lexi: No, that isn't true: a diamond is not a part [It should say "...part of a safe" but we don't dig into the provenence deep enough to see that the instance is also modified by `_safe_n`] User: There is an entrance cave. Lexi: No, that isn't true: There aren't any cave. [Another example of not checking the provenance of the data deeply enough. Knows the failure variable came from `_cave_n` but doesn't work back through `compound` and `_entrance_n`] User: [From the green cave] You are in this safe. Lexi: No, that isn't true: I don't know what is in a safe [In the green cave, "this safe" is locked and she doesn't know what is in it, so she says that. This fails because of how I've implemented hiding of things she doesn't know...And because the conceptual leap that she knows she's not in it without looking is not implemented either...] User: [From the diamond cave] You are in this safe. Lexi: No, that isn't true: I is not in a safe. [In the diamond cave, "this safe" is unlocked and open and she gives a reasonable if grammatically incorrect, answer.]