Tree data structures

Trees are a very common data structure. There are many different kinds of trees that are used for many different purposes. All trees, however, have two things in common: the data structures themselves are recursive and the methods used to process them are recursive. Below are two of the many possible approaches to creating trees in OmniMark using records and recursive functions.

Shelves as trees

Both a shelf and a tree are ordered collections of values. You can use a shelf to hold the values you want to order in a tree structure, and use record references to represent the tree structure.

To create such a shelf you declare a record type that will act both as an item on the shelf and as a node on a tree. Since trees are recursive data structures, the key characteristic of a "node" record is that one of its fields is itself of type node:

  declare record node elsewhere
  
  declare record node
   field stream name
   field node children variable

There are two important things to note about the above declaration. First, the "node" record type must be pre-declared using elsewhere, so that its name will be available to declare as a field type in the full declaration. Second, the field children, which is of type node, must be declared variable. This is for two reasons, first, we may want to add several child nodes to any given node, but more importantly, we must leave open the possibility that some nodes will have no children. Unless we do, every node will immediately have one child node as soon as it is created, and your program will stall as it tries to create an infinite chain of nodes.

Once we have created the node record type, we can create our shelf/tree by creating a shelf of type node:

  process
     local node tree variable
     
     ; make some nodes
     new tree{"furniture"}
     set tree{"furniture"}:name to "furniture"
     new tree{"table"}
     set tree{"table"}:name to "table"
     new tree{"chair"}
     set tree{"chair"}:name to "chair"
     new tree{"stool"}
     set tree{"stool"}:name to "stool"
     new tree{"arm chair"}
     set tree{"arm chair"}:name to "arm chair"
     new tree{"coffee table"}
     set tree{"coffee table"}:name to "coffee table"

Immediately we see that adding items to the tree shelf has become a two step process. First we have to create a new item on the tree shelf and then we have to set the value of the name field. We can simplify this by creating a function to create new nodes:

  define node function new-node
   value string name
   as
     local node x
     set x:name to name
     return x

This function returns a node, so we can call it to create a new node on the tree shelf:

  process
     local node tree variable
     
     ; make some nodes
     set new tree{"furniture"} to new-node "furniture"
     set new tree{"table"} to new-node "table"
     set new tree{"chair"} to new-node "chair"
     set new tree{"stool"} to new-node "stool"
     set new tree{"arm chair"} to new-node "arm chair"
     set new tree{"coffee table"} to new-node "coffee table"

Notice that we are duplicating the values in the shelf keys and in the node names. One of the implications of this is that the items in the tree must all have unique values. This approach would not work for trees in which duplicate values can occur, such as a directory tree. Duplicating the values in the shelf keys makes it easier to insert new items into the tree.

So far we have a shelf of nodes, but not a tree. Next we can assign the various items on the shelf to the nodes to which they belong. In this case, we want to represent a family tree of furniture types. The shape of the final tree can be represented as follows:

  furniture
    table
      coffee table
    chair
      arm chair
      stool

To accomplish this, we assign each shelf item as a node of its parent in the tree:

     set new tree{"furniture"}:children{"table"} to tree{"table"}
     set new tree{"furniture"}:children{"chair"} to tree{"chair"}
     set new tree{"table"}:children{"coffee table"} to tree{"coffee table"}
     set new tree{"chair"}:children{"stool"} to tree{"stool"}
     set new tree{"chair"}:children{"arm chair"} to tree{"arm chair"}

The first line of the code above assigns the shelf item tree{"table"} as a new items on the children shelf that is a field of tree{"furniture"}.

tree{"furniture"} and tree{"table"} are both items on the tree shelf, but tree{"table"} is now also a child of tree{"furniture"} in the tree representation of the shelf structure. There is no duplication of information involved here. All record variables are references to records, not records themselves. What the line of code above does is create two references to the node with the name "table". In shelf terms, it is tree{"furniture"}, and in tree terms it is tree{"furniture"}:children{"table"}.

Now that we have a tree representation of the data, how do we navigate the tree? Because the tree is also a shelf, adding and reading individual items is relatively easy. In the assignment statements above, for example, we don't have to navigate the entire tree to insert an item, we can go directly to the node we are interested in by its key. Similarly, we don't have to walk the entire tree to find all the children of "chair". We can examine the children of chair simply by repeating over tree{"chair"}:children:

  repeat over tree{"chair"}:children as c
     output c:name || "%n"
  again

However, when you want to walk the tree, and bring out the tree-like relationships it contains, you need a recursive function. The following function recursively walks the tree and prints out its hierarchy:

  define string source function walk-tree
   value node t
   indent value integer indent optional initial {0}
   as
      output " " ||* indent
      output t:name || "%n"
      repeat over t:children as n
         output walk-tree n indent (indent + 2)
      again

This function outputs the name of the current node and then repeats over the children of that node, calling itself for each child. To show the tree structure visually, it introduces an indent as an optional parameter. The indent is increased by 2 on each recursive call. The output of this function looks like this:

  furniture
    table
      coffee table
    chair
      stool
      arm chair

Here is the complete program:

  declare record node elsewhere
  
  declare record node
   field stream name
   field node children variable
   
  define string source function walk-tree
   value node t
   indent value integer indent optional initial {0}
   as
      output " " ||* indent
      output t:name || "%n"
      repeat over t:children as n
         output walk-tree n indent (indent + 2)
      again
      
  define node function new-node
   value string name
   as
     local node x
     set x:name to name
     return x
   
  process
     local node tree variable
     
     ; make some nodes
     set new tree{"furniture"} to new-node "furniture"
     set new tree{"table"} to new-node "table"
     set new tree{"chair"} to new-node "chair"
     set new tree{"stool"} to new-node "stool"
     set new tree{"arm chair"} to new-node "arm chair"
     set new tree{"coffee table"} to new-node "coffee table"
     
     ;build the tree
     set new tree{"furniture"}:children{"table"} to tree{"table"}
     set new tree{"furniture"}:children{"chair"} to tree{"chair"}
     set new tree{"table"}:children{"coffee table"} to tree{"coffee table"}
     set new tree{"chair"}:children{"stool"} to tree{"stool"}
     set new tree{"chair"}:children{"arm chair"} to tree{"arm chair"}
     
     output walk-tree tree{"furniture"}

A true tree

The shelf as tree technique is valuable in many situations. However, it does not allow duplicate keys on any of the nodes. So you can only use this technique where the nodes have unique keys, or where keys are not used. (Not using keys, however, removes many of the advantages of the technique).

The following program creates a tree by creating new nodes directly on the "children" field of existing nodes. The entire tree, therefore, is built on a single fixed-size node variable called tree.

  declare record node elsewhere
  
  declare record node
   field stream name
   field node children variable
   
  define string source function walk-tree
   value node t
   indent value integer indent optional initial {0}
   as
      output " " ||* indent
      output t:name || "%n"
      repeat over t:children as n
         output walk-tree n indent (indent + 2)
      again
      
  define node function new-node
   value string name
   as
     local node x
     set x:name to name
     return x
   
  process
     local node tree initial {new-node "furniture" with key "furniture"} 
     
     ; build tree
     set new tree{"furniture"}:children{"table"} to new-node "table"
     set new tree{"furniture"}:children{"table"}:children{"coffee table"} 
      to new-node "coffee table"
     set new tree{"furniture"}:children{"chair"} to new-node "chair"
     set new tree{"furniture"}:children{"chair"}:children{"stool"} 
      to new-node "stool"
     set new tree{"furniture"}:children{"chair"}:children{"arm chair"} 
      to new-node "arm chair"
     set new tree{"furniture"}:children{"chair"}:children{"bench"} 
      to new-node "bench"
     set new tree{"furniture"}:children{"table"}:children{"bench"} 
      to new-node "bench"
  
     output walk-tree tree{"furniture"}

There are two things to notice about this program. The first is that new nodes have to be created by specifying their exact position in the tree from the root node:

  set new tree{"furniture"}:children{"chair"}:children{"stool"} 
   to new-node "stool"  

The second is that it is now possible to have two nodes with the same key on different branches of the shelf. Thus there is a node called "bench" on both the "chair" branch and the "table" branch.

It is also worth noting that (since record variables are references) we can place the same record on the tree in two different places. Thus if the "bench" record is the same record on both the "chair" and the "table" branches (rather than two different records both called "bench") then we can place the same record in both places like this:

  set new tree{"furniture"}:children{"chair"}:children{"bench"} 
   to new-node "bench"
  set new tree{"furniture"}:children{"table"}:children{"bench"} 
   to tree{"furniture"}:children{"chair"}:children{"bench"}  

Inserting records by stating their full path in code, as in the example above, can quickly get cumbersome to manage. We can make life easier by writing a function to insert items at a specified place in the tree. Here is the program again with an insert-node function added:

  declare record node elsewhere
  
  declare record node
   field stream name
   field node children variable
   
  define string source function walk-tree
   value node t
   indent value integer indent optional initial {0}
   as
      output " " ||* indent
      output t:name || "%n"
      repeat over t:children as n
         output walk-tree n indent (indent + 2)
      again
      
  define node function new-node
   value string name
   as
     local node x
     set x:name to name
     return x
     
  define function insert-node
   value string name
   into modifiable node n
   at value string address optional
   as
     
     do when address is specified
       do scan address
          match any++ => node-key (";" | =|)
             insert-node name into n:children{node-key} 
                                at #current-input
          else
             set new n:children{name} to new-node name 
       done
     else
       set new n:children{name} to new-node name 
     done
   
  process
     local node tree initial {new-node "furniture" with key "furniture"} 
     
     ; make some nodes
     insert-node "table" into tree 
     insert-node "coffee-table" into tree at "table"
     insert-node "chair" into tree 
     insert-node "stool" into tree at "chair"
     insert-node "arm chair" into tree at "chair"
     insert-node "foot stool" into tree at "chair;stool"
  
     output walk-tree tree{"furniture"}

As you would expect in a function that works on a tree, the insert-node function is recursive. It takes an optional string, heralded by the word "at" that specifies the place in the tree where the node is to be inserted. Each recursive call removes and evaluates one element of the address string until it is empty. The new node is then inserted at that point in the tree. The use of shelf keys to index nodes is once again evident, as it allows you to immediately locate the desired node as you navigate up the tree.

The insert-node function above will fail if the path specified does not exist. You can handle this one of two ways. If you don't want new paths to be created spontaneously, you can insert a catch into the program to handle the error. If you do want new paths created automatically, you can adapt the function to insert new nodes at any point in the tree:

  define function insert-node
   value string name
   into modifiable node n
   at value string address optional
   as
     
     do when address is specified
       do scan address
          match any++ => node-key (";" | =|)
              set new? n:children{node-key} to new-node node-key
              insert-node name 
              into n:children{node-key} 
              at #current-input
          else
             set new n:children{name} to new-node name 
       done
     else
       set new n:children{name} to new-node name 
     done

This has the additional benefit that you can reduce the number of calls required to build the tree:

  process
     local node tree initial {new-node "furniture" with key "furniture"} 
     
     insert-node "coffee-table" into tree at "table"
     insert-node "arm-chair" into tree at "chair"
     insert-node "foot-stool" into tree at "chair;stool"
     insert-node "swag lamp" into tree at "lighting;lamp"
  
     output walk-tree tree{"furniture"}

Other types of trees

You can construct many types of trees using the techniques explained above. Trees can also be used to assist in certain types of scanning operations. Some types of trees, particularly binary trees, benefit from the ability to positively express the concept of an empty node. Two techniques for doing this are discussed in representing emptiness.