external-text-entity entity-name condition?


A rule used to provide OmniMark's SGML parser with the text of an external text entity (that is, an external entity that is not cdata, sdata, ndata or subdoc) whenever such an entity is referenced in an SGML document. Its most important property is that everything written to the #markup-parser stream within the rule is considered part of the entity's text.

For example, this rule specifies that the text of the entity named "version" is the content of the file named "version.txt":

  external-text-entity version
     output file "version.txt"

If an external-text-entity rule outputs no text, the SGML parser treats the entity's replacement text as having zero characters. This is not an error.

In a context-translation, an external-text-entity rule can be performed while the find-start rules are being performed. This happens if the find-start rules output the text of an SGML declaration and there are external-text-entity rules for processing the entities represented by the public identifiers in that declaration.

The output action is usually used inside an external-text-entity rule to provide the SGML parser with the entity's replacement text. In an external-text-entity rule, the default #current-output stream contains only the #markup-parser stream. That allows the replacement text of the entity to be fed to the parser using output actions.

Because anything written to the #markup-parser stream in an external-text-entity rule becomes part of the entity's text, the entity's text can be made up of one or more pieces from one or more sources.
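
As a minimal sketch of this, the following rule builds the replacement text of the "version" entity from a literal string, a file, and another literal string. The file name "version.txt" is taken from the earlier example; the surrounding literal text is an assumption made purely for illustration.

  external-text-entity version
     ; Each output action appends to the entity's replacement text.
     output "Version "
     output file "version.txt"
     output " (internal build)"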

If the replacement text of some of the external entities is small, all of the entities can be defined in a single file. This technique can be used to construct a "control file" for configurable documents.

The external-text-entity rule is invoked by the parser. It then starts up a new process to feed data into the parser. Data output in this process is fed to the parser (unless you explicitly change its destination). The new process consists of the code in the body of the rule, and any code invoked by the code in the body. It runs as a parallel process with the parser until the data it produces has been completely parsed.

Where an external text entity's text does not need processing, an external-text-entity rule can simply use an output or put action to provide the file's text to the SGML parser.

Even if some processing is required, it can be done with a do scan or repeat scan in the external-text-entity rule, where each match emits the processed text using output or put.
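
The following sketch illustrates light processing with repeat scan. It assumes, hypothetically, that "version.txt" may contain the placeholder "@PRODUCT@", which is replaced before the text reaches the parser. The placeholder, its replacement, and the pattern variable name plain-text are all illustrative assumptions, not part of any OmniMark convention.

  external-text-entity version
     repeat scan file "version.txt"
     match "@PRODUCT@"
        ; Substitute the placeholder.
        output "OmniMark"
     match "@"
        ; A lone "@" is passed through unchanged.
        output "@"
     match [ \ "@"]+ => plain-text
        ; Everything else is passed through unchanged.
        output plain-text
     again

Every alternative consumes at least one character, so the scan always advances through the file.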

On the other hand, if substantial processing is required, it is often more appropriate to submit the text of the file for processing by find rules. In this case, any output of the find rules that process the submitted text is considered part of the text of the entity.

If the find rules are different from those used to process the main input, a using group prefix is needed on the submit action to specify which find rules process the submitted text. For example:

  external-text-entity #implied when entity is (public & in-library)
     using group entity-processing
        submit file "%pq"
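
The find rules themselves would be declared in the named group. A minimal sketch, assuming a hypothetical placeholder "@PRODUCT@" in the submitted files:

  group entity-processing

  ; Rules that follow belong to the entity-processing group
  ; until the next group declaration.
  find "@PRODUCT@"
     output "OmniMark"

Submitted text that no find rule matches is expected to pass through to the current output, which here is the parser, so only the placeholder is changed.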

Note that when a category is handled with a condition, all instances failing the condition must be handled using other rules. For example, if an external-text-entity #implied rule has a when condition, a second rule must be used to handle the entities for which that condition fails (the unless case).
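
A sketch of such a pair of rules follows. The choice of condition and the wording of the error message are assumptions made for illustration only.

  external-text-entity #implied when entity is system
     ; The system identifier is taken to be a file name.
     output file "%eq"

  external-text-entity #implied unless entity is system
     ; No system identifier: report it and supply no replacement text.
     put #error "No system identifier for entity %q%n"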

An output-to action is allowed in an external-text-entity rule. output-to in an external-text-entity rule remains in effect until the end of the rule, unless it is overridden by a further output-to.

Usually, the only active output stream in an external-text-entity rule is the #markup-parser stream, so text written using the output action becomes part of the replacement text of the external text entity. The output-to action allows the OmniMark programmer to redirect the output to another destination.
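
As a hedged sketch of redirection, the rule below first feeds the file to the parser, then switches the output destination so that a progress message does not become part of the entity's text. Writing the message to #error is an assumption chosen for illustration; any other stream could be named instead.

  external-text-entity #implied when entity is system
     ; Goes to the parser as replacement text.
     output file "%eq"
     ; From here to the end of the rule, output goes to #error instead.
     output-to #error
     output "Resolved entity %q from %eq%n"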

The following code shows how external-text-entity rules can be used to match named entities. The first rule matches all entities named in the program, except the #dtd entity and those used in the SGML declaration (because they don't have names). This allows the #dtd entity to be processed differently from named entities. The second rule matches all entities in the DTD and the document instance by including both the #dtd and #implied entities:

  external-text-entity #implied when
  external-text-entity (#implied | #dtd) when

Entity replacement text can also be constructed from multiple sources.

The following external-text-entity rule processes any external text entity that has a system identifier. It treats the system identifier as a sequence of file names, separated by semicolons, and concatenates the text from all of the files together as the entity's replacement text.

  external-text-entity #implied when entity is system
     repeat scan "%eq"
     match [ \ ";"]+ => file-name
        ; Each run of non-semicolon characters is one file name.
        output file file-name
     match ";"
        ; Ignore any semicolon
     again

For example, a general entity might represent the chapters that make up the "advanced" part of a textbook, with its system identifier listing several file names:

  <!ENTITY advanced SYSTEM "chapter7.sgm;chapter8.sgm;chapter9.sgm">