Handling Logic That Fails

Prolog uses a “Closed World Assumption”. Knowledge that is not known (or cannot be proved) is “false”.

This makes it hard to answer questions well that make no sense like “is the sky happy?”. This would return “false” because we have no data about happiness and the sky. We’d like to answer something like “the sky has no emotions (silly!)”.

It also makes it hard to answer questions that make sense but need more than a “no” like “Is the cave locked?” when there is no locking mechanism. Instead of “No” we’d like to see “No, there isn’t a lock.”

Humans demand more complicated responses than false. even in simple scenarios. For example, let’s walk through some scenarios where the user types “The rock is green”:

Scenario Prolog Response Plausible Human Response
There is a blue rock and a red rock false. “There is more than one rock, which do you mean?”
There are many rocks of different colors, including green ones false. “There is more than one rock, which do you mean? “
The person they are talking to doesn’t know the word ‘rock’ false. “I don’t know what a rock is”
The aren’t any rocks false. “There aren’t any rocks”

The approach I’ve taken in Perplexity is able to reply with all of the “plausible” responses above in those scenarios. To do this I’ve:

Reporting why a logic statement failed is an active area of research and I do not claim to have solved it. However, the approach I’ve taken has worked suprisingly well and is worth examination. I will note up front that there are still cases where the responses using the approach below are less than optimal. I’ve got a section that goes through some good (bad) examples at the end.

The rest of this post goes through the details of how it works.

Why Was Something “False” In Prolog?

To understand how Perplexity handles failures, first you need some background on how Prolog does. When a Prolog statement can’t be proved, the system simply returns false. and there is no error information that tells you why. That’s because it doesn’t know.

To show why, let’s continue with our simple example from above. Running “the rock is green” through MRS, scope-resolving a tree, and generating the Prolog gives:

"the rock is green"

Type: proposition

[ TOP: h0
INDEX: e2
RELS: < 
[ _the_q__xhh LBL: h4 ARG0: x3 [ x PERS: 3 NUM: sg ] RSTR: h5 BODY: h6 ]
[ _rock_n_1__x LBL: h7 ARG0: x3 [ x PERS: 3 NUM: sg ] ]
[ _green_a_2__ex LBL: h1 ARG0: e2 [ e SF: prop TENSE: pres MOOD: indicative PROG: - PERF: - ] ARG1: x3 ]
>
HCONS: < h0 qeq h1 h5 qeq h7 > ]

                     ┌_rock_n_1__x:x3
 _the_q__xhh:x3,h5,h6┤
                     └_green_a_2__ex:e2,x3

Logic: _the_q__xhh(x3, _rock_n_1__x(x3), _green_a_2__ex(e2, x3))

[Simplified] Prolog: the(X, noun(X, rock), adjective(X, green)) 

I’ve shown a “simplified Prolog” in the above, which is similar to, but simpler than, what Perplexity would generate for illustrative purposes.

Now, let’s implement the first scenario (a red rock and a green rock exist) in simple Prolog. It uses a “triple store” for storing facts about the world, just like Perplexity, but, again, is simplified:

% things in the world
rock1.
rock2.
gem1.

% facts about them
fact(rock1, isA, rock).
fact(rock1, adjective, red).
fact(rock2, isA, rock).
fact(rock2, adjective, blue).
fact(gem1, isA, gem).

noun(X, Type) :- fact(X, isA, Type).
adjective(X, Value) :- fact(X, adjective, Value).

% Simple implementation of "the"
the(_, RSTR, BODY) :-  
  % Count the number of things that are true for RSTR  
  aggregate_all(count, RSTR, Count),  
  % Since this is "the", there should be exactly one  
  Count == 1,  
  % Since there is exactly 1, run RSTR again to get its value  
  RSTR,  
  % And now run the body  
  BODY.

If we now run it:

?- the(X, noun(X, rock), adjective(X, green))
false.

Prolog returns false as expected. But why did it fail from a Prolog perspective? If you examine how Prolog proves things, it turns out many things failed (in order):

…and this is ignoring the predicates which might have failed inside of aggregate_all which really wouldn’t make sense to anyone. Prolog doesn’t know which of those failures are the “ultimate reason for failure”. The real “reason” is that it couldn’t prove the statement was true.

All is not lost though! It turns out a couple of heuristics do a very reasonable job at reporting errors:

  1. Report the “deepest” failure
  2. Provide semantic information for places that fail

The next two sections describe how these work in Perplexity.

1. Report the “deepest” failure

Prolog proves logic statements by walking a tree of facts and diving into the logical chain of facts about them until it finds a set that are true (or fails). Intuitively, if you have a whole string of logic predicates in a row, probably the “deepest” failure is going to be a good “reason” why it failed (modulo a few things that I describe later). That’s because that is the farthest it got in proving the statement before it gave up.

If we look at the example above as a “stack” we can see where the failures happened:

?- the(X, noun(X, rock), adjective(X, green))
Prove: the(X, noun(X, rock), adjective(X, green))
  Prove: aggregate_all(count, noun(X, rock), Count)
    FAIL: fact(rock1, adjective, red)
    FAIL: fact(rock2, adjective, blue).
    FAIL: fact(gem1, isA, gem)
    Prove: Count == 1
      FAIL: Count == 2

Describing the reason “the rock is green” failed as:

“because ‘the’ means ‘the one thing’ but there are two so I couldn’t figure out which you mean’

seems much more “correct” for why this failed than:

“because rock1 is red” or “because gem1 is a gem”.

However, even though we may have found the failure point that makes the most sense, it’s hard to imagine how to generate the text:

'the' means 'the one thing' but there are two. I couldn't figure out which you mean

from the failure:

Failure at: Count == 1

2. Provide semantic information for places that fail

We can work around Prolog’s lack of a way to report the “most reasonable failure” and provide information that turns a raw predicate like “not(Count == 1)” into something meaningful with the same mechanism: annotate the implementation with extra information that says what it means if we fail at a particular place and remember that information at runtime if this is the deepest failure.

The predicate that does this is called tryError/2. The first argument is a predicate to try, and the second is what it “means” if it fails. Let’s annotate the implementation of the using this predicate:

% Simple implementation of "the"
the(_, RSTR, BODY) :-  
  % Count the number of things that are true for RSTR  
  aggregate_all(count, RSTR, Count),  
  % Since this is "the" there should be exactly one  
  tryError(Count == 1, context(theMeansOneAndThereIsMoreOrLessThanOne)),
  % Since there is exactly 1, run RSTR again to get its value  
  RSTR,  
  % And now run the body  
  BODY.

If the Prolog fails at Count == 1, the caller can now do a separate call to get whatever is in the second argument of tryError. It is up to the caller to make sense of it, but since you wrote it, presumably you are the caller so you’ll know what to do with it. In this case, converting theMeansOneAndThereIsMoreOrLessThanOne to a message like:

I'm not sure which you mean by 'the'

would be a fine solution. I’ll show how to get better messages farther down in this post, but for now a little background on how tryError/2 is implemented.

Implementation of tryError/2

The algorithm for the implementation is pretty simple. First, we have global state that remembers:

At runtime:

  1. Before executing a Prolog query, clear out the global state
  2. When tryError(Term, Error) is encountered, it:
    • Remembers what was the current best error before we got here
    • Sets Error as the current error if we are the deepest error
    • Then solves Term
    • If Term succeeds, put the previous best error back again (if something Term executed didn’t replace it with a better one)
  3. When the whole query fails, whatever is left in the global state is the error to return

The key predicate implementation is this:

% These three store the current 
% "best" error encountered so far
:- dynamic(currentTermIndex/1).  
currentTermIndex(0).  
:- dynamic(currentDepth/1).  
currentDepth(-1).  
:- dynamic(currentError/1).  
currentError(none).  

tryError(Term, Error) :-  
  % Remember whatever is the current context
  currentFailureContext(PreviousContext),  
  % If we are now the "deepest" error
  % set the context to be this error
  failureContext(Error),  
  % Remember whatever is now the "deepest" error
  % which might or might not be the one we just tried to set
  currentFailureContext(NewContext),  
  % Actually run the term
  Term,  
  % If we succeeded, we need to get rid of the new context 
  % we *might have* set. But only do this if nothing that 
  % Term called changed it.
  replaceFailureContextIfCurrent(NewContext, PreviousContext).

There are two important details.

First: What does “deepest” mean?

When initially using this approach I noticed that some scenarios caused deep stacks for “non-semantic” reasons (they iterated through a list using tail recursion for example) and thus would always “win” and cause a poorer reason to be returned. I realized that combining how “far” a failure was through the list of top-level predicates with how deep the failure was within that predicate was a better indicator of the “return the failure farthest in the proof” principle. This makes intuitive sense because the ERG only creates predicates that have a semantic meaning, and those are the ones that are executed at the top level. In a sense, this bounded a poor error to a single semantic unit instead of allowing it to take over.

So, “deepest” is actually a combination of two things: the index of the top-level predicate and the depth of the stack within that predicate. Here’s an example to illustrate what I mean:

?- predicate1(X), predicate2(X), predicate3(X).

The “depth” of a failure is depth(PredicateIndex, StackDepth). PredicateIndex indicates how far through the top level predicates we are: predicate1 has PredicateIndex = 1, predicate2 has PredicateInex = 2, etc. If:

Second: What is considered a “top-level predicate”?

Due to the fact that the MRS gets resolved to a tree, there is only one “top-level” predicate from the Prolog perspective. If you look at the example for “the rock is green”:

?- the(X, noun(X, rock), adjective(X, green))

you see that the “top-level” predicate is the. However, experimentally, the best error reporting results happened when each of the MRS predicates was considered “top-level” and the error reported was the farthest through the MRS predicates.

Luckily, that was easy to do since the way conversion from MRS to Prolog happens in the prototype automatically wraps each predicate with a call to erg/2. The whole point of this call is to mark it so that any errors that happen in the predicate (or in any predicates it “calls”) know the index of the predicate they are currently being called from. erg/2 does this by putting information in the stack, kind of like a try/catch block would:

% Used to call an ERG predicate and set the index of it as context  
% for any errors that occur  
erg(Index, Goal) :-  
  callWithContext(Goal, Index).
  
% callWithContext/2 makes sure it stays on the stack with Context as an argument
% so that, later, we can call getContext/1 and retrieve the current value of 
% Context from the stack 
callWithContext(Goal, Context) :-  
  call(Goal),  
  % The retainValue/1 call ensures callWithContext/2 remains on the stack and passing the argument ensures Prolog will not garbage collect the argument.  
  retainValue(Context).  
  
retainValue(_).  
  
% Retrieve the value of Context by looking up the stack and getting
% the nearest call to callWithContext/2
getContext(Context) :-  
  prolog_current_frame(Me),  
  prolog_frame_attribute(Me, parent_goal, callWithContext(_, Context)).

With that in place, I can now show how tryError pulls this information out and uses it. Here are all of the important predicates in this error handling infrastructure:

% These three store the current 
% "best" error encountered so far
:- dynamic(currentTermIndex/1).  
currentTermIndex(0).  
:- dynamic(currentDepth/1).  
currentDepth(-1).  
:- dynamic(currentError/1).  
currentError(none).  

% Used to call an ERG predicate and set the index of it as context  
% for any errors that occur  
erg(Index, Goal) :-  
  callWithContext(Goal, Index).
  
% callWithContext/2 makes sure it stays on the stack with Context as an argument
% so that, later, we can call getContext/1 and retrieve the current value of 
% Context from the stack 
callWithContext(Goal, Context) :-  
  call(Goal),  
  % The retainValue/1 call ensures callWithContext/2 remains on the stack and passing the argument ensures Prolog will not garbage collect the argument.  
  retainValue(Context).  
  
retainValue(_).  
  
% Retrieve the value of Context by looking up the stack and getting
% the nearest call to callWithContext/2
getContext(Context) :-  
  prolog_current_frame(Me),  
  prolog_frame_attribute(Me, parent_goal, callWithContext(_, Context)).

% Called by predicates to record a potential failure that should be reported to the user
tryError(Term, Error) :-  
  % Remember whatever is the current context
  currentFailureContext(PreviousContext),  
  % If we are now the "deepest" error
  % set the context to be this error
  failureContext(Error),  
  % Remember whatever is now the "deepest" error
  % which might or might not be the one we just tried to set
  currentFailureContext(NewContext),  
  % Actually run the term
  Term,  
  % If we succeeded, we need to get rid of the new context 
  % we *might have* set. But only do this if nothing that 
  % Term called changed it.
  replaceFailureContextIfCurrent(NewContext, PreviousContext).

% Finally, failureContext/1 is used to actually check if the error
% is now the "deepest" by using getContext/1 to figure out the index
% of the predicate we are currently resolving and how deep in the stack we are

% Case 1: when we are farther in the term list than the current failure for the first time  
failureContext(Value) :-  
  % Get the currently executing "top level" index
  getContext(TermIndex),  
  % Get the last recorded "best" error index
  currentTermIndex(CurrentTermIndex),  
  TermIndex > CurrentTermIndex,  
  % Update all the state since we are deeper now  
  retractall(currentTermIndex(_)),  
  assert(currentTermIndex(TermIndex)),  
  % Update the absolute depth of the stack 
  prolog_current_frame(Frame),  
  prolog_frame_attribute(Frame, level, Depth),  
  retractall(currentDepth(_)),  
  assert(currentDepth(Depth)),  
  % And finally, record the actual error message
  retractall(currentError(_)),  
  assert(currentError(Value)),  
  !.  
  
% Case 2: when the current and last known are the same predicate index
% 	only update, if the absolute depth of the stack is deeper  
failureContext(Value) :-  
  getContext(TermIndex),  
  currentTermIndex(CurrentTermIndex),  
  TermIndex = CurrentTermIndex,  
  % See if this actually went deeper in the current term  
  prolog_current_frame(Frame),  
  prolog_frame_attribute(Frame, level, Depth),  
  currentDepth(CurrentDepth),  
  Depth >= CurrentDepth,  
  % Update all the state except Index since we are deeper now  
  retractall(currentDepth(_)),  
  assert(currentDepth(Depth)),  
  retractall(currentError(_)),  
  assert(currentError(Value)),  
  !.  
  
% Case 3: needs to still succeed even if the context wasn't set or
% the actual predicate won't run
failureContext(_).

Errors Should Be Reported Against The Variable, Not The Instance

Prolog replaces variables with their value at runtime and it is not possible to know what the original variable was:

% Will always write the *value* of X and there 
% is no way to know, from inside of writeln, 
% that the value came from a variable called "X"
foo(X) :- writeln(X). 

There are cases where this causes problems. For example, “Is Lexi a rock?”:

"Is Lexi a rock?"

Type: question
[ TOP: h0
INDEX: e2
RELS: < 
[ _a_q__xhh LBL: h10 ARG0: x4 [ x PERS: 3 NUM: sg IND: + ] RSTR: h11 BODY: h12 ]
[ _rock_n_1__x LBL: h13 ARG0: x4 [ x PERS: 3 NUM: sg IND: + ] ]
[ proper_q__xhh LBL: h5 ARG0: x3 [ x PERS: 3 NUM: sg IND: + ] RSTR: h6 BODY: h7 ]
[ named__cx LBL: h8 CARG: "Lexi" ARG0: x3 [ x PERS: 3 NUM: sg IND: + ] ]
[ _be_v_id__exx LBL: h1 ARG0: e2 [ e SF: ques TENSE: pres MOOD: indicative PROG: - PERF: - ] ARG1: x3 ARG2: x4 ]
>
HCONS: < h0 qeq h1 h6 qeq h8 h11 qeq h13 > ]

                       ┌named__cx:Lexi,x3
 proper_q__xhh:x3,h6,h7┤
                       │                    ┌_rock_n_1__x:x4
                       └_a_q__xhh:x4,h11,h12┤
                                            └_be_v_id__exx:e2,x3,x4

Logic: proper_q__xhh(x3, named__cx(Lexi, x3), _a_q__xhh(x4, _rock_n_1__x(x4), _be_v_id__exx(e2, x3, x4)))

You can see by the way the tree is built that Prolog is going to:

  1. Find the person named ‘Lexi’ and put that in x3
  2. Then backtrack through each rock in the system and put it in x4
  3. Finally, it will see if x3 “be” x4 using _be_v_id__exx

_be_v_id__exx has a few ways to check if something “is” something else. Once is conceptual, i.e. “is that person a type which derives from that type”? Another just checks are the two things are identical. If the first one fails, you’ll get a good failure message. If the system is written such that the second version is the “deepest stack” (as mine is) you’ll end up failing on a line like this:

...
tryError(x3 == x4, context(thingIsNotOtherThing(x3, x4))),
...

which means the error will be thingIsNotOtherThing(lexi, rock2) which translates to “Lexi is not the second rock”. Clearly not what we wanted. We wanted “Lexi is not a rock”.

What we need here is a way to find out the “set” or “class” of thing that is being talked about, not the particular instance we failed on. The solution here is to get at the “provenance” of the data, i.e. “where did this instance come from?” and report that. Unfortunately, because Prolog loses the variable name, this is not easy.

My solution is to have all the arguments used in Perplexity be a special Prolog predicate called arg/2 that holds the variable/value in the first position and metadata in the second position. Metadata is information such as the original variable name, tense, and plurality.

With that, the Prolog representation of x4 in _be_v_id__exx (from above):

_be_v_id__exx LBL: h1 ARG0: e2 [ e SF: ques TENSE: pres MOOD: indicative PROG: - PERF: - ] ARG1: x3 ARG2: x4 ]

looks like this:

arg(X57360, var([name(x4), type(reg), pers(3), num(sg)]))

This allows the tryError predicate to pass all the metadata for the variable (including the instance it is currently checking) in the error like this:

context(thingIsNotOtherThing, 
  arg(idLexi, var([name(x3), type(reg), pers(3), num(sg)])), 
  arg(idRock2, var([name(x4), type(reg), pers(3), num(sg)])))'

Now, when the predicate fails, the caller knows both the instance iRock2 and the variable it came from x4. It can respond “Lexi is not a rock” by seeing which predicate introduced x4 (_rock_n_1__x) and use its lemma (“rock”) in the answer. That is what Perplexity does.

Of course, this means that if you ask “Is Lexi the blue rock” you will get the same result: “Lexi is not a rock”. But I’ve found that this kind of failure of reporting errors on the “class” feels less egregious…

Patterns For tryError/2

Clearly the technique of reporting the deepest error using the provenance of the data does not provably give “why” something failed. That’s an unsolved problem. Instead, it provides a heuristic that works surprisingly well…but: there are definitely ways to use it that give more success.

Now that we have the infrastructure in place (the tryError/2 predicate and metadata for variables), I want to walk through some best practices I’ve found for using them.

Report failures from “semantically relevant” predicates.

Prolog execution is tree-like: it starts with a top-level predicate, executes that, which runs the predicates inside of it, which themselves have predicates in them, that get run, etc. In this system, the topmost predicates (usually) correspond to actual words the user said, and the farther down the stack you get, the more abstract the predicates are, ending in things like length/2 which just returns the length of an array and has no coorspondance to anything the user would recognize.

Not surprisingly, I’ve found that the closer to the top an error is, the more “semantically relevant” it is to the user, and the more context it has to give in the error. Reporting errors deep in helper functions usually produces an error that won’t make much sense. Almost all of the tryError predicates in Perplexity are immediately inside of ERG predicates. There are very few in helper functions.

Here’s a real example from the variant of d_be_v_id__exx that checks if something is “conceptually” something else (e.g. “Is Lexi a person?”:

d_be_v_id__exx(EventIDArg, ThingArg, OtherThingArg, QuoteArg) :-   
  tryError(specializes(Thing, OtherThing),  
    context(d_be_v_id__exx, contextThingIsNotOtherThing, ThingArg, QuoteArg)).

Group together predicates that are only meaningful together

Often it is the case that it is a group of predicates that “mean” something and not a single one. In those cases, I’ve put the entire group as the first term of tryError.

Here’s a real example of the d_on_p_loc__exx predicate variant that implements the type of “on” used when something is a part of something else, as in “Is the door on the car?” It checks to make sure the thing “on” it is not a conceptual face of the thing. We want “is the left side of the car on the car?” to fail. Kind of a technical detail but illustrative.

It uses bare parenthesis () to group together rel() and not(getFace(...)) into a single meaningful unit. If it fails, it means the thing is “notDirectionallyRelated”:

d_on_p_loc__exx(EventIDArg, OnIDArg, ObjectIDArg, CreateOrEvalArg) :-   
  tryError(  
    (rel(default, OnID, idPartOfRel, ObjectID),  
    not(getFace(ObjectID, _, OnID))),  
  context(d_on_p_loc__exx3, contextThingNotDirectionallyRelated, on, OnIDArg, ObjectIDArg)).

You could report an error for rel such as notAPartOf and a separate error for not(getFace()) such as isAConceptualFaceOf3DimensionalObject but that’s not going to make a lot of sense to the human. And if you don’t have an error for them both somehow, you’ll just get false. which is probably worse.

Sometimes the predicate should be implemented less efficiently to report a good error

One of the variants of d_the_q__xhh checks to make sure there is a single instance of the thing in question. It could have a single error that says “there is not exactly one of these” but instead breaks it into two so that it can say “there are none” or “there is more than one” which reads much better:

% "the" sometimes means a single item which is an *instance* of whatever got passed in  
d_the_q__xhh(VariableArg, RstrArg, BodyArg, QuoteOrEvalArg) :-   
  % Make sure we are only counting *instances*  
  X = (Rstr, isInstance(Variable)),  
  count(Count, resolveTerm(X, Quote)),  
  tryError(not(<(Count, 1)),  
    location(d_the_q__xhh, countIsZero, X)),  
  tryError(not(>(Count, 1)),  
    location(d_the_q__xhh, moreThanOneItem, X)),  
  % if that all worked, do it again for as many backtracks as there are  
  resolveTerm(X, Quote),  
  % And the run the body  
  resolveTerm(Body, Quote).

Predicates that fail and we don’t care: “not”, “findall”

Some predicates are implemented using not() where they succeed if something fails. If the predicate inside of not() is using tryError to report why it fails, then that will get reported as a potential failure if the whole thing fails eventually. Clearly not what we wanted (so to speak).

For this case, I’ve got a helper predicate called ignoreErrorIfSuccess(Term) that is used to ignore any recorded errors if it succeeds. So writing ignoreErrorIfSuccess(not(...whatever...)) makes this case work like you’d expect.

Something similiar happens with the built-in predicate findall. It will never fail, by definition, but the predicates it call can fail. Unless the call to findall is wrapped with ignoreErrorIfSuccess it could spuriously record errors that happen during its execution.

Ignoring the error from a predicate

Sometimes a predicate “knows” that it has a better error for the user than the one that will be set by a called predicate because it has more context. One example is d_compound__exx. It gets used in many situations, but one is in a phrase like “aluminum desk”. It gets passed both words and has to find something that is an “aluminum desk”. One of the alternatives it tries is seeing if “aluminum” is an adjective property of “desk”.

It turns out that the predicate for detecting adjectives, adjectiveValueOfType sets the current error so that adjective errors are consistent. But in this case, we know that the user isn’t looking for an adjective, they are looking for a compound term. So, if the user says “There is an aluminum desk”, instead of letting the default error “aluminum is not an adjective of desk” happen, I ignore the error and instead return “couldn’t find anything named aluminum desk”.

d_compound__exx(_, ActualItemIDArg, AdjectiveArg, _) :- Variant = d_compound__exx5, arg(ActualItemID, _) = ActualItemIDArg, arg(Adjective, _) = AdjectiveArg,
    % Find an adjective on ActualItem that is the same type and value
    % But ignore the errors because "contextNoCompoundName" is a better error than "x is not an adjective of Y"
    tryError(ignoreErrors(adjectiveValueOfType(ActualItemIDArg, _, AdjectiveArg)),
        context(Variant, contextNoCompoundName, AdjectiveArg, ActualItemIDArg)).

Here is the implementation of ignoreErrors:

% Used to ignore any errors generated by Term
ignoreErrors(Term) :-
 	% Remember whatever is the current context
	currentFailureContext(PreviousContext),
	% put it back in both success and failure of term
    (Term *->
        replaceFailureContext(PreviousContext);
        (replaceFailureContext(PreviousContext), fail)).

Returning a special error if no predicates succeed

I’m sure there are better examples of this, but here’s one I hit while implementing verbs that use prepositions.

To simplify, imagine that to implement the verb go that handles predicates like go *to* the room or go *through* the door we simply implement a Prolog predicate for each type of preposition we want to handle, like this:

go(X) :- 
  checkForThrough(X), ...logic to go *through*...
go(X) :- 
  checkForTo(X), ...logic to go *to*...
go(X) :- 
  checkForUnder(X), ...logic to go *under*...

Furthermore, imagine that each checkFor... predicate uses tryError to return a nice message if the preposition isn’t there like: No 'through' preposition found.

If you then call go with a preposition that matches none of them like go *over* the door, you’ll get an error back that is basically randomly choosing one of the failures like No 'to' preposition found, which is true, but misleading. Instead, you’d like to return an error like I don't understand that preposition.

The pattern I’ve used here is remove the error checking from each checkFor... predicate and elevate it to a final predicate that returns an error if none of them worked like this:

go(X) :- 
  checkForThroughNoError(X), ...logic to go *through*...
go(X) :- 
  checkForToNoError(X), ...logic to go *to*...
go(X) :- 
  checkForUnderNoError(X), ...logic to go *under*...
go(X) :- 
  failWithErrorIfNoUnderstoodPrepositions(X),

But these are all just heuristics, it sometimes still doesn’t work well

This system works surprisingly well, well enough that I think I could launch a game using the basic approach. There are still places for improvement, though.

The easiest way to test the error handling is to make assertions about the world and see what the system says. It will always return the error generated as part of “No, that isn’t true: …[error here]”. Note that many of the problems you’ll see when doing this are not failures of the error handling approach described here, but are just poor implementations of the logic.

Some examples of things I’d like to fix:

User: The diamond is part of a safe.
Lexi: No, that isn't true: a diamond is not a part
[It should say "...part of a safe" but we don't dig into the 
provenence deep enough to see that the instance is also modified by
`_safe_n`]

User: There is an entrance cave.
Lexi: No, that isn't true: There aren't any cave.
[Another example of not checking the provenance of the data deeply enough. 
Knows the failure variable came from `_cave_n` but doesn't work back 
through `compound` and `_entrance_n`]

User: [From the green cave] You are in this safe.
Lexi: No, that isn't true: I don't know what is in a safe
[In the green cave, "this safe" is locked and she doesn't know what is in it, so she says that. 
This fails because of how I've implemented hiding of things she doesn't 
know...And because the conceptual leap that she knows she's not in it without 
looking is not implemented either...]

User: [From the diamond cave] You are in this safe.
Lexi: No, that isn't true: I is not in a safe.
[In the diamond cave, "this safe" is unlocked and open and she gives a reasonable
if grammatically incorrect, answer.]