Building a Custom State Object

We are getting to the point where the examples need to get richer, and hard-coding the state of the world or using the base State object will no longer be good enough. We need to step back and think about how to model the file system state more robustly. Even if you aren’t building an interface to a file system, many of the concerns and solutions described below may be useful to you: how your state object works is a key design decision when building a Perplexity application.

The Perplexity State object

The default State object contains only a small amount of code for manipulating application state; the rest of its implementation manipulates MRS variables. That code is literally one method, all_individuals(), which we used in the Action Verbs topic. Here it is again:

class State(object):
    def __init__(self, objects):

        ...

        self.objects = objects

    ...
    
    def all_individuals(self):
        for item in self.objects:
            yield item

    ...

It has even this basic implementation only because the system implementation of in_scope needs it to determine which objects are “proximate” to the user. Outside of that, we are free to implement our system state however we like, as long as it remains immutable. Any method we write that makes a change must follow the pattern used by set_x() and add_to_e(): return a modified copy instead of changing the object directly. This is what makes the solver’s backtracking algorithm work.
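The copy-on-write rule can be illustrated with a minimal sketch. ImmutableState and with_binding() are hypothetical names invented for this example; with_binding() only mirrors the pattern of set_x(), and Perplexity’s real State class and method signatures may differ.

```python
import copy


class ImmutableState:
    """Hypothetical stand-in illustrating copy-on-write; not Perplexity's State."""

    def __init__(self, objects, variables=None):
        self.objects = objects
        self.variables = {} if variables is None else variables

    def all_individuals(self):
        yield from self.objects

    def with_binding(self, name, value):
        # Never mutate self: copy the whole state, change the copy, return it
        new_state = copy.deepcopy(self)
        new_state.variables[name] = value
        return new_state


original = ImmutableState(["file1", "file2"])
changed = original.with_binding("x1", "file1")
print(original.variables)  # {}
print(changed.variables)   # {'x1': 'file1'}
```

Because the original state is never touched, backtracking is simply a matter of discarding the copy and continuing from the state that came before it.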

So we need to add a notion of files and folders to State and provide ways to query the system about them. All of this will be used only by the code we write to implement our custom predications; Perplexity itself ignores it completely.

Identity

Because the system is built around immutable state, we will sometimes end up with two State objects and need to find the same object in either one, so we need a way to compare objects across State objects. The easiest way is to give every object in the system a globally unique id that can be compared cheaply. We’ll create a base class, UniqueObject, that does this, and derive everything from it:

import uuid


class UniqueObject(object):
    def __init__(self):
        self.unique_id = uuid.uuid4()
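To see why a unique id helps, here is a minimal sketch: copy.deepcopy() (which state copies rely on) produces new Python objects, but the unique_id attribute is copied along with them, so the "same" object can be found in either copy. Thing and find_same() are hypothetical helpers for illustration only.

```python
import copy
import uuid


class UniqueObject(object):
    def __init__(self):
        self.unique_id = uuid.uuid4()


class Thing(UniqueObject):  # hypothetical subclass for illustration
    def __init__(self, name):
        super().__init__()
        self.name = name


def find_same(obj, candidates):
    # Locate obj's counterpart among another state's objects by unique id
    return next((c for c in candidates if c.unique_id == obj.unique_id), None)


state_a = [Thing("file1"), Thing("file2")]
state_b = copy.deepcopy(state_a)  # what copying a state does to its objects

original = state_a[1]
counterpart = find_same(original, state_b)
print(counterpart is original)                      # False: a different Python object
print(counterpart.unique_id == original.unique_id)  # True: the same logical object
```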

Containment and Location

One of the main concepts in a file system is “containment”: folders contain files, files contain text, and so on. We want to model this generally so that words like “in” or “contains” work across all kinds of objects. We’ll do this with two methods that objects can implement:

# Implement by yielding all objects that this object contains
def contained_items(self, variable_data):
    ...

# Implement by yielding all the places that this object "is" 
def all_locations(self, variable_data):
    ...
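As a rough illustration of why these two methods are enough, the sketch below implements them on a hypothetical Box class and builds "in" and "contains" checks that never look at an object’s concrete type. The variable_data parameter is kept only to mirror the signatures above and is unused here. This toy treats contained_items() as yielding direct children only, which is a choice made for the sketch, not necessarily how the file system implements it.

```python
class Box:  # hypothetical container, for illustration only
    def __init__(self, name, parent=None):
        self.name = name
        self.items = []
        self.parent = parent

    def contained_items(self, variable_data=None):
        # Direct children only
        yield from self.items

    def all_locations(self, variable_data=None):
        # Walk up the chain of parents: the thing is "in" each of them
        place = self.parent
        while place is not None:
            yield place
            place = place.parent


def is_in(thing, place):
    # Works for any object that implements all_locations()
    return any(location is place for location in thing.all_locations())


def contains(container, thing):
    # Works for any object that implements contained_items()
    return any(item is thing for item in container.contained_items())


root = Box("/")
desktop = Box("Desktop", parent=root)
file2 = Box("file2.txt", parent=desktop)
root.items.append(desktop)
desktop.items.append(file2)

print(is_in(file2, root))      # True: file2 is in Desktop, which is in /
print(contains(root, file2))   # False: / contains file2 only indirectly
```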

Files and Folders

Because users may talk about files or folders that don’t exist yet, or that may need to be created, the File and Folder objects must be able to represent files and folders that don’t actually exist. So these objects hold only a small amount of information and call a FileSystem object for the rest (we’ll implement that object in the next section).

It is very important that these objects implement __hash__, since that allows them to be stored in sets and used as dictionary keys, both of which Perplexity requires. Perplexity also needs to check whether objects are equal, which is why __eq__ is implemented. __repr__ just makes debugging nicer. As described above, both classes implement contained_items and all_locations to support predications that involve containment. The rest of the methods are helpers.

[Note that the File object also has a simplistic notion of a “linked file” (as in Unix) so that we can show the system answering questions about things that are in more than one place.]

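import os
import pathlib

# MessageException, Measurement and Megabyte are defined elsewhere
# in the tutorial's code and are not shown here
class File(UniqueObject):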
    def __init__(self, name, size=None, file_system=None, link=None):
        super().__init__()
        self.name = name
        self.size = size
        self.file_system = file_system
        self.link = link

        # If we assume link objects always have the same name, then if we only hash the name
        # link objects will have the same hash, but so will anything else with that name (which is OK)
        # Any other files in the system with the same name (including raw file specifiers) will
        # hash to the same value too.
        # This means that there could be collisions if there are lots of files with the same name
        # but it is unclear how else to do this
        self._hash = hash(self.file_name())

    def __repr__(self):
        return f"File(name={self.name}, size={self.size})"

    # The only required property is that objects which compare equal have the same hash value
    # But: objects with the same hash aren't required to be equal
    # It must remain the same for the lifetime of the object
    def __hash__(self):
        return self._hash

    def __eq__(self, obj):
        if isinstance(obj, File) and self._hash == obj._hash:
            if self.has_path() and obj.has_path():
                # If they both have a path, then the entire path must be ==
                # to make them ==
                # Unless there is a symbolic link, in which case the links must match
                self_name = self.name if self.link is None else self.link
                obj_name = obj.name if obj.link is None else obj.link
                return self_name == obj_name

            else:
                # If one or both of them doesn't have a path specified then it is a pure filename
                # which means it == the other object if the file name alone matches
                return self.file_name() == obj.file_name()

        # Not a File, or the hashes differ
        return False

    def all_locations(self, variable_data):
        if self.exists():
            folder = self.file_system.item_from_path(str(pathlib.PurePath(self.name).parent), is_file=False)
            yield folder
            yield from folder.all_locations(variable_data)

        else:
            raise MessageException("notFound", [variable_data.name])

    def contained_items(self, variable_data):
        yield from self.file_system.contained_items(self, variable_data)

    def exists(self):
        return self.file_system.exists(self.name, is_file=True)

    # False if there is no path specified at all
    # including "./". Indicates the object is a raw
    # file specifier
    def has_path(self):
        return os.path.dirname(self.name) != ""

    def file_name(self):
        return pathlib.PurePath(self.name).parts[-1]

    def can_interpret_as(self, value):
        return pathlib.PurePath(self.name).match(value)

    def size_measurement(self):
        return Measurement(Megabyte(), self.size/1000000)


class Folder(UniqueObject):
    def __init__(self, name, size=0, file_system=None):
        super().__init__()
        self.name = name
        self.size = size
        self.file_system = file_system
        self._hash = hash(self.name)

    def __hash__(self):
        return self._hash

    def __repr__(self):
        return f"Folder(name={self.name}, size={self.size})"

    def __eq__(self, obj):
        return isinstance(obj, Folder) and str(self.name) == str(obj.name)

    def contained_items(self, variable_data):
        yield from self.file_system.contained_items(self, variable_data)

    def all_locations(self, variable_data):
        if self.exists():
            path = pathlib.PurePath(self.name)
            for parent_path in path.parents:
                yield self.file_system.item_from_path(parent_path, is_file=False)

        else:
            raise MessageException("notFound", [variable_data.name])

    def can_interpret_as(self, value):
        return pathlib.PurePath(self.name).match(value)

    def exists(self):
        return self.file_system.exists(self.name, is_file=False)
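The interplay between File.__hash__ and File.__eq__ is easiest to see in isolation. FileSpec below is a hypothetical, trimmed-down stand-in that reproduces only the hashing and equality rules of File (no file system), so it can run on its own:

```python
import os
import pathlib


class FileSpec:  # hypothetical stand-in reproducing File's hash/eq rules
    def __init__(self, name, link=None):
        self.name = name
        self.link = link
        # Hash only the final path component, so a bare file name and a
        # full path to a file with that name fall into the same hash bucket
        self._hash = hash(self.file_name())

    def __hash__(self):
        return self._hash

    def __eq__(self, obj):
        if not (isinstance(obj, FileSpec) and self._hash == obj._hash):
            return False
        if self.has_path() and obj.has_path():
            # Both are full paths: compare link targets if present, else paths
            self_name = self.name if self.link is None else self.link
            obj_name = obj.name if obj.link is None else obj.link
            return self_name == obj_name
        # At least one side is a bare file name: match on the file name alone
        return self.file_name() == obj.file_name()

    def has_path(self):
        return os.path.dirname(self.name) != ""

    def file_name(self):
        return pathlib.PurePath(self.name).parts[-1]


files = {FileSpec("/Desktop/file2.txt"), FileSpec("/documents/file1.txt")}
print(FileSpec("file2.txt") in files)             # True: the bare name matches
print(FileSpec("/documents/file2.txt") in files)  # False: both have paths, and they differ
```

Note that this equality is deliberately loose and therefore not transitive: "file2.txt" equals both "/Desktop/file2.txt" and "/documents/file2.txt", which are not equal to each other. That is the price of letting a user refer to a file by name alone.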

FileSystem

We’ll be using a fake FileSystem object for the examples so that we can inject the files and folders that we want to test, but the class is built to allow it to be implemented on top of a real file system as well. There are a lot of implementation details in this class that aren’t important for understanding how the system works – you can see the full implementation here. For our purposes, the important parts are the constructor and the implementation of all_individuals():

# Allows mocking up a file system for testing
class FileSystemMock(State):
    # current = the user's current directory as a string
    #
    # file_list must be in the form:
    # [(True, "/dir1/dir2/filename.txt", {"size": 1000}),  # True for a file
    #  (False, "/dir3/dir4")                              # False for a directory
    # ]
    # Adds the entire path of each directory as individual directories
    # in the file system
    def __init__(self, file_list, current):

        ...


    def all_individuals(self):

        ...

The constructor allows us to create a mock file system with whatever files and folders we want, as well as setting a “current” directory. It is used like this:

FileSystemMock([(True, "/documents/file1.txt", {"size": 1000}),
                (False, "/Desktop", {"size": 10000000}),
                (True, "/Desktop/file2.txt", {"size": 10000000}),
                (True, "/Desktop/file3.txt", {"size": 1000})],
                "/Desktop")

It takes a list of tuples that describe files and folders. The first element of each tuple is True if the item is a file and False if it is a folder, the second is its path, and the third is a dictionary of properties such as its size. The last argument is the folder that serves as the “current” directory.
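The constructor’s body is elided above, but its comment says it "adds the entire path of each directory as individual directories". A minimal sketch of what that expansion could look like follows; expand_file_list() is a hypothetical helper, not the actual FileSystemMock code. It maps each path to an (is_file, properties) pair and adds every ancestor directory as a folder:

```python
import pathlib


def expand_file_list(file_list):
    """Hypothetical helper: map each path to (is_file, properties),
    adding every ancestor directory as a folder entry."""
    items = {}
    for entry in file_list:
        is_file, path = entry[0], entry[1]
        properties = entry[2] if len(entry) > 2 else {}
        # An explicitly listed item always wins over an implied folder
        items[path] = (is_file, properties)
        for parent in pathlib.PurePosixPath(path).parents:
            # setdefault: never overwrite an explicitly listed folder
            items.setdefault(str(parent), (False, {}))
    return items


items = expand_file_list([(True, "/documents/file1.txt", {"size": 1000}),
                          (False, "/Desktop", {"size": 10000000}),
                          (True, "/Desktop/file2.txt", {"size": 10000000}),
                          (True, "/Desktop/file3.txt", {"size": 1000})])
print(sorted(items))
# ['/', '/Desktop', '/Desktop/file2.txt', '/Desktop/file3.txt', '/documents', '/documents/file1.txt']
```

With this approach "/documents" and "/" exist as folders even though they were never listed.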

Actor

We will encounter phrases that have an explicit person like “where am I?” as well as an implied person like “delete a file” (i.e. “[you] delete a file”). Either case generates predications that need “actors” to be modelled in the system:

# Represents something that can "do" things, like a computer
# or a human (or a dog, etc)
class Actor(UniqueObject):
    def __init__(self, name, person, file_system=None):
        super().__init__()
        self.name = name
        self.person = person
        self.file_system = file_system
        self._hash = hash((self.name, self.person))

    def __hash__(self):
        return self._hash

    def __eq__(self, other):
        # Actors are equal when their name and person match
        return isinstance(other, Actor) and self._hash == other._hash

    def __repr__(self):
        return f"Actor(name={self.name}, person={self.person})"

    def all_locations(self, variable_data):
        if self.person == 1:
            # Return the locations for the user "me"
            yield self.current_directory()
            yield from self.current_directory().all_locations(variable_data)

    def current_directory(self):
        return self.file_system.current_directory()

An Actor has a person property that indicates which pronoun role it plays: 1 means a first-person pronoun like “I” or “me”, 2 means a second-person pronoun (“you”), which is always “the computer” in this system, and so on. It also has a FileSystem member so it can find its “current directory”. Finally, its all_locations() method lets us find out where the Actor is.
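For the user (person 1), all_locations() yields the current directory and then everything Folder.all_locations() yields for it, which is each of its parent paths. The sketch below reproduces that order with plain path strings; user_locations() is a hypothetical helper, not part of the tutorial code:

```python
import pathlib


def user_locations(current_directory):
    """Hypothetical sketch of Actor.all_locations() for person == 1:
    the current directory first, then every folder above it."""
    current = pathlib.PurePosixPath(current_directory)
    yield str(current)
    for parent in current.parents:
        yield str(parent)


print(list(user_locations("/Desktop")))   # ['/Desktop', '/']
print(list(user_locations("/a/b")))       # ['/a/b', '/a', '/']
```

This is why a question like “where am I?” can have more than one true answer: the user is in the current directory and, transitively, in every folder above it.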

FileSystemState

The last step is to create the new State object that we’ll actually use in the samples, built on top of the FileSystem object. It must derive from the Perplexity State object so that the system can use it, and it must implement State.all_individuals() so that in_scope works. Note that all_individuals() needs to return both the actors and the file system objects, since together they make up all the objects in the system.

There is also a save() method that uses Python “pickling” to save the state of the world, which is an easy way to persist a small set of objects like these.

import pickle

import file_system_example.objects

# The state representation used by the file system example
# note that the core system doesn't care at all what this object
# looks like. It is only the predications that interact with it
class FileSystemState(State):
    def __init__(self, file_system, current_user=None, actors=None):
        super().__init__([])
        self.file_system = file_system
        self.current_user = file_system_example.objects.Actor(name="User", person=1, file_system=file_system) if current_user is None else current_user
        self.actors = [self.current_user,
                       file_system_example.objects.Actor(name="Computer", person=2, file_system=file_system)] if actors is None else actors

    def save(self, file):
        pickle.dump(self.file_system, file, 5)
        pickle.dump(self.current_user, file, 5)
        pickle.dump(self.actors, file, 5)

    def all_individuals(self):
        yield from self.file_system.all_individuals()
        yield from self.actors

    def user(self):
        return self.current_user
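save() writes three pickles one after another, so reading them back means calling pickle.load() three times in the same order. The tutorial doesn’t show a loader; load_state() below is a hypothetical sketch, with plain dicts and strings standing in for the real FileSystem and Actor objects. Protocol 5 requires Python 3.8 or later.

```python
import io
import pickle


def save_state(file, file_system, current_user, actors):
    # Same order and protocol as FileSystemState.save()
    pickle.dump(file_system, file, 5)
    pickle.dump(current_user, file, 5)
    pickle.dump(actors, file, 5)


def load_state(file):
    # Hypothetical counterpart: read the objects back in the order written
    file_system = pickle.load(file)
    current_user = pickle.load(file)
    actors = pickle.load(file)
    return file_system, current_user, actors


buffer = io.BytesIO()
save_state(buffer, {"/Desktop": "folder"}, "User", ["User", "Computer"])
buffer.seek(0)
print(load_state(buffer))  # ({'/Desktop': 'folder'}, 'User', ['User', 'Computer'])
```

The three loaded values line up with the file_system, current_user and actors parameters of the FileSystemState constructor, so a state could be rebuilt from them directly.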

The base State class handles copying the object when set_x or add_to_e is called. We don’t need to do anything special to support the copy, because all of these objects have been carefully built to work with Python’s copy.deepcopy(). They also all derive from UniqueObject, so they can be compared across State objects.

Using FileSystemState

To use the object, we modify the hello_world.py reset() function to return the new FileSystemState object instead of the default State object, like this:

... 

vocabulary = Vocabulary()


def reset():
    # return State([])

    return FileSystemState(FileSystemMock([(True, "/documents/file1.txt", {"size": 1000}),
                                           (False, "/Desktop", {"size": 10000000}),
                                           (True, "/Desktop/file2.txt", {"size": 10000000}),
                                           (True, "/Desktop/file3.txt", {"size": 1000})],
                                           "/Desktop"))

That change will ensure that our FileSystemState object is used by the solver in every call to one of our predications. That, and modifying the predications to start using the new state, is all that is needed.

Example

Only the _file_n_of, _folder_n_of, _large_a_1, _delete_v_1 and pron predications need to be updated to use the new objects, along with the DeleteOperation class. These are all relatively minor changes; the final code is listed below:

@Predication(vocabulary, names=["_file_n_of"])
def file_n_of(context, state, x_binding, i_binding):
    def bound_variable(value):
        if isinstance(value, File):
            return True
        else:
            context.report_error(["valueIsNotX", value, x_binding.variable.name])
            return False

    def unbound_variable():
        for item in state.all_individuals():
            if bound_variable(item):
                yield item

    yield from combinatorial_predication_1(context,
                                           state,
                                           x_binding,
                                           bound_variable,
                                           unbound_variable)


# true for both sets and individuals as long as everything
# in the set is a folder
@Predication(vocabulary, names=["_folder_n_of"])
def folder_n_of(context, state, x_binding, i_binding):
    def bound_variable(value):
        if isinstance(value, Folder):
            return True
        else:
            context.report_error(["valueIsNotX", value, x_binding.variable.name])
            return False

    def unbound_variable():
        for item in state.all_individuals():
            if bound_variable(item):
                yield item

    yield from combinatorial_predication_1(context,
                                           state,
                                           x_binding,
                                           bound_variable,
                                           unbound_variable)


@Predication(vocabulary,
             names=["_large_a_1"],
             handles=[("DegreeMultiplier", EventOption.optional)])
def large_a_1(context, state, e_introduced_binding, x_target_binding):
    # See if any modifiers have changed *how* large we should be
    degree_multiplier = degree_multiplier_from_event(context, state, e_introduced_binding)

    # "large" is being used "predicatively" as in "the dogs are large". This needs to force
    # the individuals to be separate (i.e. not part of a group)
    def criteria_bound(value):
        if hasattr(value, 'size') and value.size > degree_multiplier * 1000000:
            return True

        else:
            context.report_error(["adjectiveDoesntApply", "large", x_target_binding.variable.name])
            return False

    def unbound_values():
        # Find all large things
        for value in state.all_individuals():
            if hasattr(value, 'size') and value.size > degree_multiplier * 1000000:
                yield value

    yield from combinatorial_predication_1(context,
                                           state,
                                           x_target_binding,
                                           criteria_bound,
                                           unbound_values)


# Delete only works on individual values: i.e. there is no semantic for deleting
# things "together" which would probably imply a transaction or something
@Predication(vocabulary, names=["_delete_v_1"])
def delete_v_1_comm(context, state, e_introduced_binding, x_actor_binding, x_what_binding):
    # We only know how to delete things from the
    # computer's perspective
    if x_actor_binding.value[0].name == "Computer":
        def criteria(value):
            # Only allow deleting files and folders that exist
            if isinstance(value, (File, Folder)) and value.exists():
                return True

            else:
                context.report_error(["cantDo", "delete", x_what_binding.variable.name])
                return False

        def unbound_what():
            context.report_error(["cantDo", "delete", x_what_binding.variable.name])

        for new_state in individual_style_predication_1(context,
                                                        state,
                                                        x_what_binding,
                                                        criteria,
                                                        unbound_what,
                                                        ["cantDeleteSet", x_what_binding.variable.name]):
            yield new_state.record_operations([DeleteOperation(new_state.get_binding(x_what_binding.variable.name))])

    else:
        context.report_error(["dontKnowActor", x_actor_binding.variable.name])


# Delete any object in the system
class DeleteOperation(object):
    def __init__(self, binding_to_delete):
        self.binding_to_delete = binding_to_delete

    def apply_to(self, state):
        state.file_system.delete_item(self.binding_to_delete)


@Predication(vocabulary, names=["pron"])
def pron(context, state, x_who_binding):
    person = int(state.get_binding("tree").value[0]["Variables"][x_who_binding.variable.name]["PERS"])

    def bound_variable(value):
        return isinstance(value, Actor) and value.person == person

    def unbound_variable():
        for item in state.all_individuals():
            if bound_variable(item):
                yield item

    yield from combinatorial_predication_1(context, state, x_who_binding, bound_variable, unbound_variable)
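Note that delete_v_1_comm never deletes anything itself: it records a DeleteOperation on a new state, and the operation is applied later. The sketch below illustrates that record-then-apply idea with hypothetical stand-ins (FakeFileSystem, SketchState, DeletePathOperation); it simplifies by passing a path rather than a binding, and it does not reproduce how or when Perplexity actually applies recorded operations.

```python
import copy


class FakeFileSystem:  # hypothetical stand-in
    def __init__(self, paths):
        self.paths = set(paths)

    def delete_item(self, path):
        self.paths.discard(path)


class SketchState:  # hypothetical; not Perplexity's State
    def __init__(self, file_system):
        self.file_system = file_system
        self.operations = []

    def record_operations(self, operations):
        # Copy-on-write: the solver may still backtrack past this state
        new_state = copy.deepcopy(self)
        new_state.operations = new_state.operations + list(operations)
        return new_state


class DeletePathOperation:  # simplified: holds a path, not a binding
    def __init__(self, path):
        self.path = path

    def apply_to(self, state):
        state.file_system.delete_item(self.path)


start = SketchState(FakeFileSystem(["/Desktop/file2.txt", "/Desktop/file3.txt"]))
solution = start.record_operations([DeletePathOperation("/Desktop/file2.txt")])

# Nothing is deleted yet: the operation has only been recorded
print(sorted(solution.file_system.paths))  # ['/Desktop/file2.txt', '/Desktop/file3.txt']

# Once a solution is chosen, its recorded operations are applied to it
for operation in solution.operations:
    operation.apply_to(solution)
print(sorted(solution.file_system.paths))  # ['/Desktop/file3.txt']
print(sorted(start.file_system.paths))     # ['/Desktop/file2.txt', '/Desktop/file3.txt']
```

Deferring the side effect this way keeps every state the solver explores immutable, so an abandoned solution never leaves a file deleted.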

With those changes, all the examples from before still work the same but now use the new objects:

python ./hello_world.py
? a file is large
Yes, that is true.

? which file is large?
(File(name=/Desktop/file2.txt, size=10000000),)

? what file is very large?
a file is not large

? a file is very large
a file is not large

? delete a large file
Done!

? a file is large
a file is not large
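The answers above follow from the size test in large_a_1: an item is large when its size is strictly greater than degree_multiplier * 1000000. The sketch below applies that arithmetic to the three files in the mock. The exact multiplier that degree_multiplier_from_event() returns for “very” isn’t shown here; 10 is an assumption for illustration. Since the test is strict and the largest file is exactly 10,000,000 bytes, any multiplier of 10 or more makes every file fail. (The /Desktop folder also passes the size test for plain “large”; it is file_n_of that keeps it out of “which file is large?”.)

```python
# Sizes of the three files in the mock file system, as passed to FileSystemMock
file_sizes = {
    "/documents/file1.txt": 1000,
    "/Desktop/file2.txt": 10000000,
    "/Desktop/file3.txt": 1000,
}


def large_files(sizes, degree_multiplier=1):
    # Same test as large_a_1: size strictly greater than multiplier * 1,000,000
    return [path for path, size in sizes.items() if size > degree_multiplier * 1000000]


print(large_files(file_sizes))                        # ['/Desktop/file2.txt']
print(large_files(file_sizes, degree_multiplier=10))  # [] (10 is an assumed value for "very")
```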

Comprehensive source for the completed tutorial is available here

Last update: 2024-10-25 by Eric Zinda