Scopes
Scopes play a particularly important role in the design and execution of OmniMark programs. OmniMark has five kinds of scopes: lexical scopes, execution scopes, output scopes, input scopes, and referent scopes.
A lexical scope is a scope in the written structure of the program. For instance, a rule is a lexical scope -- it is written as a series of lines one after another. A function is also a lexical scope. Within a rule or function, a repeat loop or a do...done block is also a lexical scope.
Lexical scopes define the visibility of variables. You can declare a local variable in any lexical scope and it will be visible only to code within that scope. Where one lexical scope is nested inside another, variables declared in the outer scope are visible in the inner scope, unless a variable of the same name is declared in the inner scope. In this case, the variable in the outer scope is hidden within the inner scope, but it still exists in the outer scope.

  process
     local stream foo
     local stream bar
     set foo to "A"
     set bar to "Z"
     output foo || bar
     do
        local stream foo
        set foo to "B"
        set bar to "Y"
        output foo || bar
     done
     output foo || bar

In this program the process rule is one lexical scope. The do...done block is another lexical scope nested inside the lexical scope of the rule. The program outputs "AZBYAY". The variable "bar", declared in the outer scope, is visible in the inner scope, so when its value is changed in the inner scope, the original variable is changed. The variable "foo", on the other hand, is a different variable inside the do...done block from the one declared in the rule. Changing the value of foo in the do...done block does not change the value of foo in the outer scope.
An execution scope is a lexical scope that is actually being executed at a particular point in a program. Just as lexical scopes can be nested in each other lexically, as shown above, execution scopes can be nested in each other in the course of program execution. The most straightforward case of execution nesting is a function call.

  define integer function sum (value integer foo,
                               value integer bar
                              ) as
     return foo + bar

  process
     local stream foo
     local stream bar
     set foo to "A"
     set bar to "Z"
     output foo || bar
     do
        local stream foo
        set foo to "B"
        set bar to "Y"
        output foo || bar
        output "d" % sum (2, 4)
     done
     output foo || bar

Here the function "sum" is an entirely separate lexical scope. The variable names "foo" and "bar" used in the function have nothing to do with the variable names "foo" and "bar" in the process rule. But as the program is executed, the execution scope of the function is nested inside the execution scope of the process rule.
A more common case, in OmniMark, is the nested execution scoping that occurs when a find rule fires as a result of a submit in a rule:

  process
     output "<rhyme>"
     submit "Mary had a little lamb"
     output "</rhyme>"

  find ("Mary" | "lamb") => person
     output "<person>" || person || "</person>"

This program outputs "<rhyme><person>Mary</person> had a little <person>lamb</person></rhyme>". In this program, the execution of the find rule is nested inside the execution of the process rule. The submit initiates the scanning of the input data and invokes the find rules. It is this execution scoping that ensures that the "<rhyme>" and "</rhyme>" tags get wrapped around the material output as a result of the submit.
The find rule and the process rule are independent lexical scopes but nested execution scopes. Note, however, that unlike the previous example in which the nested execution scope of the function was directly invoked by the function call, in this case it is the data that determines if and when a find rule will be executed in the execution scope established by the process rule. The fact that the data drives program execution in this way is what makes OmniMark such a powerful text processing tool.
While local variables are never visible outside their lexical scope, they are still instantiated for as long as their lexical scope is in execution scope, and they may well be active. Consider the following program:

  process
     local stream foo
     open foo as file "foo.txt"
     using output as foo
     do
        output "<rhyme>"
        submit "Mary had a little lamb"
        output "</rhyme>"
     done

  find ("Mary" | "lamb") => person
     output "<person>" || person || "</person>"

In this case the local stream variable "foo" created in the process rule is the current output stream for the lexical scope bounded by using output as foo do and done. While it is not lexically in scope in the find rule, and you cannot put any code in the find rule to address or manipulate it, it is still very much active. It is the stream that output goes to when you say "output" in the find rule.
What we saw above was in fact the establishment of an output scope. In most languages, "output" or its equivalent takes the form of an assignment statement, and the variable the assignment is made to must be in lexical scope. In OmniMark, the question of where output goes to is separated from the act of creating output, meaning that the stream that receives output does not have to be lexically in scope for you to output to it. Instead, a stream can be placed in an output scope. Once a stream is in the current output scope, all output will go to it, no matter what lexical scope the output statement occurs in.
You can use the keywords using output as to create an output scope and to place a stream variable into that output scope. Like any other kind of scope, output scopes can be nested:

  process
     local stream foo
     open foo as file "foo.txt"
     using output as foo
     do
        output "<rhyme>"
        submit "Mary had a little lamb"
        output "</rhyme>"
     done

  find ("Mary" | "lamb") => person
     local stream foo
     reopen foo as file "foo2.txt"
     using output as foo
        output "<person>" || person || "</person>"

Here a new output scope is established in the find rule, causing the material output in the find rule to be sent to a different destination. This output scope is nested inside the output scope created in the process rule. The process rule's output scope becomes the current output scope again as soon as the find rule exits.
You can also place a stream into the current output scope, without creating a new output scope, using output-to:

  global stream foo
  global stream bar

  process
     open foo as buffer
     open bar as buffer
     using output as foo
        submit "Mary had a little lamb"
     close foo
     close bar
     output "Foo contains: " || foo
     output "%nBar contains: " || bar

  find " a "
     output-to bar

This program outputs the following:

  Foo contains: Mary had
  Bar contains: little lamb

The output-to in the find rule resets the destination of the output scope established by the using output as in the process rule. Thus the rest of the text goes to the new destination.
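For contrast, the following sketch changes only the find rule of the program above so that it creates a nested output scope with using output as instead of calling output-to. The pattern variable name "separator" is introduced here for illustration, and the result described afterwards is what the scoping rules in this section imply rather than a captured run.

```omnimark
global stream foo
global stream bar

process
   open foo as buffer
   open bar as buffer
   using output as foo
      submit "Mary had a little lamb"
   close foo
   close bar
   output "Foo contains: " || foo
   output "%nBar contains: " || bar

; The nested output scope applies only to the single output action
; below; the scope created in the process rule is restored afterwards.
find " a " => separator
   using output as bar
      output separator
```

Because the nested scope ends before the find rule exits, only the matched " a " should reach bar, while the unmatched text on both sides of it should reach foo.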
In general, you should use using output as rather than output-to, but output-to is useful in certain situations, especially when the destination of data is determined by examining the data itself.
Consider a piece of XML that might be used to send files across a network. It encapsulates the name of the file and its contents inside "name" and "data" elements, which are in turn wrapped in a "file" element:

  <file>
    <name>myfile.txt</name>
    <data>The content of the file.</data>
  </file>

We can process this with the following program:

  global stream file-data

  process
     do xml-parse document
        scan file "files.xml"
        suppress
     done

  element file
     suppress
     close file-data

  element name
     open file-data as file "%c"
     output-to file-data

  element data
     output "%c"

Here the entire processing of the XML is done in a single output scope, but every time we find a filename in the input we change the destination of the current output scope. It would be difficult to do the same thing with using output as, because the "data" element is not nested inside the "name" element, so an output scope established in the "name" element would have expired before the "data" element was processed.
(By the way, the example shows poor XML language design. It would have been better to make the filename an attribute of the "file" element. But you can't always control the format of the data you have to process.)
If you use output-to, note that placing a stream into an output scope does not exempt it from the rules of lexical scoping when it comes to the life span of variables. Local variables are created when their lexical scope enters execution scope and destroyed when their lexical scope leaves execution scope. It is an error to allow a local variable to go out of execution scope while it is still in an output scope. You will avoid this problem if you always create output scopes with using output as.
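As a rough illustration of this lifetime rule, the sketch below places a local stream into an output scope in two different ways. The stream name "scratch" and the choice of patterns are hypothetical, and the comments describe what the rule above implies rather than a recorded error message.

```omnimark
process
   submit "Mary had a little lamb"

; Safe: the output scope created by "using output as" ends with the
; output action, before the local stream is destroyed.
find "Mary"
   local stream scratch
   open scratch as buffer
   using output as scratch
      output "Mary"
   close scratch

; Unsafe: output-to changes the destination of the output scope that
; was current when the rule fired, and that scope outlives the rule.
; When the rule ends, "scratch" leaves execution scope while it is
; still in an output scope, which is the error described above.
find "lamb"
   local stream scratch
   open scratch as buffer
   output-to scratch
   output "lamb"
```

If the destination really must outlive the rule, as in the XML example, declaring the stream as a global keeps it alive for as long as the output scope needs it.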
We have already seen several examples of an input scope. Every example above that uses a submit or do xml-parse is creating a new input scope. Input scopes are the flip side of output scopes. Just as output scopes determine where output goes, so input scopes determine where input comes from. Just as we never have to say where output goes to in an output statement, we never have to say where the input comes from when we write a find rule. Output goes to the current output scope. Input comes from the current input scope.
A new input scope is created by every submit, every variant of scan, and every matches test. They establish the current input for the execution scopes contained within them. They also initiate scanning of that source. You can change the current input scope without initiating scanning with using input as.
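A minimal sketch of that last point follows, assuming that using input as accepts a string as its source and can prefix a do...done block in the same way using output as does. The keyword #current-input, which starts the scan, is described in the next paragraph. Treat the exact form as an assumption to check against the language reference.

```omnimark
process
   ; The string is a stand-in for any input source.
   using input as "Mary had a little lamb"
   do
      ; Establishing the input scope does not start scanning,
      ; so this output happens before any find rule fires.
      output "<rhyme>"
      ; Scanning starts here, against the current input scope.
      submit #current-input
      output "</rhyme>"
   done

find ("Mary" | "lamb") => person
   output "<person>" || person || "</person>"
```

The point of the separation is that the code that decides where input comes from does not have to be the code that starts scanning it.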
Once an input scope is established, it is in effect for the execution scope of the submit, scan, or using input as that established it. Within that scope, you can initiate a new scan of the current source using #current-input. This allows you to perform efficient one-pass scanning of nested structures by initiating a new scan for each level of nesting, without the need to capture the whole structure and re-scan it.
The following code demonstrates this with the function "sum-of-csv", which calculates the sum of a series of comma-separated values found in the current input. This function could be called anywhere there is a current input scope, and it will consume a series of comma-separated numeric values from the current input scope and return the sum. It will exit as soon as it encounters data that does not fit the pattern it is looking for, leaving the current input scope intact, but with the comma-separated-value data consumed.

  define integer function sum-of-csv as
     local integer sum initial {0}
     repeat scan #current-input
        match white-space* digit+ => number white-space* ","?
           set sum to sum + number
     again
     return sum

  process
     repeat scan "Results: (12,34,65, 92 , 75 )"
        match "Results:" white-space* "("
           output "Total: " || "d" % sum-of-csv
        match ")"
     again

Note the difference between this code and the more common programming practice represented by the following program:

  define integer function sum
        read-only integer numbers
     as
     local integer total initial {0}
     repeat over numbers
        set total to total + numbers
     again
     return total

  process
     local integer numbers variable
     repeat scan "Results: (12,34,65, 92 , 75 )"
        match "Results:" white-space* "(" [digit or space or ","]* => csv ")"
           repeat scan csv
              match digit+ => num
                 set new numbers to num
              match any
           again
           output "Total: " || "d" % sum numbers
     again

The differences between these two pieces of code are twofold. First, in the second, more conventional, code the outer level of code is responsible for identifying the whole nested structure. This has a kind of symmetry about it, but it is misleading symmetry. The task of recognizing the beginning of a nested structure takes place outside the nested structure. (You find the door marked "IN" when you are outside; you find the door marked "OUT" when you are inside.) The task of recognizing the end of a nested structure should take place inside the nested structure. In our first example, the function that handles the comma-separated values is responsible for figuring out when the comma-separated values end. It does this very easily by exiting the repeat scan as soon as it sees a character that does not fit the pattern it is looking for.
The second difference between the two programs is that the second program has to scan the csv data twice -- once when it is trying to find it in the data stream, and again when it is analyzed in the second repeat scan. The first program processes the csv data and finds the end of the structure in one pass.
Referents also exist in scopes. The default referent scope is established at the start of a program and is resolved when the program ends. You can use the code using nested referents to establish a nested referents scope. A nested referents scope is in effect for the duration of the execution scope with which it is established. The advantages of creating nested referent scopes are three:
Copyright © Stilo International plc, 1988-2008.