Performance Issue: `thing` and Free Variables
It turns out that lots of MRS representations end up using
thing(X) as a predicate, which means “filter X to the set of all things”. This is a large set. Because Prolog doesn’t operate on “sets” but instead backtracks through instances one by one this ended up creating a big performance bottleneck.
Here’s an example:
"what is on the table?" [ TOP: h0 INDEX: e2 RELS: < [ which_q__xhh LBL: h5 ARG0: x3 [ x PERS: 3 NUM: sg ] RSTR: h6 BODY: h7 ] [ thing__x LBL: h4 ARG0: x3 [ x PERS: 3 NUM: sg ] ] [ _the_q__xhh LBL: h9 ARG0: x8 [ x PERS: 3 NUM: sg IND: + ] RSTR: h10 BODY: h11 ] [ _table_n_1__x LBL: h12 ARG0: x8 [ x PERS: 3 NUM: sg IND: + ] ] [ _on_p_loc__exx LBL: h1 ARG0: e2 [ e SF: ques TENSE: pres MOOD: indicative PROG: - PERF: - ] ARG1: x3 ARG2: x8 ] > HCONS: < h0 qeq h1 h6 qeq h4 h10 qeq h12 > ] ┌thing__x:x3 which_q__xhh:x3,h6,h7┤ │ ┌_table_n_1__x:x8 └_the_q__xhh:x8,h10,h11┤ └_on_p_loc__exx:e2,x3,x8 Logic: which_q__xhh(x3, thing__x(x3), _the_q__xhh(x8, _table_n_1__x(x8), _on_p_loc__exx(e2, x3, x8)))
Let’s say there are 1000 “things” in the world (which is a small number if you consider that properties are things, names are things, etc. The reality is much bigger). Because
thing(x3) appears right at the beginning, Prolog will literally backtrack 1000 times through everything after the
Now notice that all
thing__x is doing is handing, one by one, a list of things in the world to
_on_p_loc__exx which checks to see if they are “on the table”. Because of the way Prolog works, if you instead run the query
_on_p_loc__exx with an unbound variable (i.e. one that hasn’t been set to anything), a well-structured Prolog program will list out the things on the table much more efficiently.
I found this
thing(X) pattern to be a bad enough performance problem in many queries that I had to come up with an alternative solution: have the implementation of
thing(x) simply leave the variable unbound (i.e. don’t have thing(x) do anything!) and have it work through the magic of Prolog!
But, if it were that simple I wouldn’t have this whole section for it…
The problem occurs with quoting. If, in this example, you asked
_on_p_loc__exx to quote itself (perhaps with the phrase “put everything on the table”), it has to come up with a representation for the variable that can be stored to disk easily. Because Prolog can’t round trip a variable and have it come back as the same variable, we need a way to map it:
% Showing what happens if you convert a variable to a string and back ?- term_string(Z, X), writeln(Z), term_string(Y, X), writeln(Y), Y=a, writeln(Z). % It isn't the same variable anymore! _18908 % This is what Prolog calls "Z" internally _18918 % This is what Prolog calls "Y" internally after we create it % using the string representation of Z! It isn't the same! _18908 % This is the value of Z at the very end, clearly "Y=a" didn't set it % So Z was not successfully round tripped through a string X = _18908, % This is the actual string representation of Z, which is expected Y = a.
Serializing Free Variables
So, the work-around is pretty simple…but a little hacky.
First: when we are asked to serialize a free variable in a triple using the Perplexity Ontology, we store a magic value
ERGVariable is the original name of the variable from the MRS such as
x3 in the example from above:
Then, when we deserialize it, we need a way to map from something like
idVariable(x3) back to the real Prolog variable so that Prolog can report the answer correctly. As I showed in the previous section, this isn’t as simple as calling
term_string on the variable. Instead, I create a mapping of ERG variable name to Prolog variable for all free variables. I store it on the stack as an alternative to having a global variable around that needs to be managed.
To keep it on the stack, I just use
freeVariables/2 (see below) to call any predicates I want to have access to this information. Within that call, any predicate can call
getFreeVariable/2 to retrieve the proper variable:
% expects Variables to be of the form: [item(ergName, ActualPrologVariable), ...] freeVariables(Goal, Variables) :- call(Goal), % The retainVariables/1 call ensures freeVariables/2 remains % on the stack and passing the argument ensures Prolog will not % garbage collect the argument. retainVariables(Variables). retainVariables(_). % Retrieve the Prolog variable given an erg variable name in "Name" getFreeVariable(Name, Variable) :- prolog_current_frame(Me), prolog_frame_attribute(Me, parent_goal, freeVariables(_, Variables)), memberchk(item(Name, Variable), Variables).
This means that all Prolog code can ignore this optimization, as long as:
- There is a call to
freeVariables/2that surrounds all the predicates
- The code that stores triples uses
You can see this in practice in the Executing the Prolog section since that is where the predicates get wrapped with