Line breaking

You can have OmniMark add line breaks to the data in a stream.

The usual way to add a line break to a stream is to output %n in the appropriate places. However, in some cases you don't have direct access to the output (for example, the data content of an XML or SGML element), or you don't want to write the code that determines where line breaks should fall.

In these cases, you can set up automatic line breaking by including a line-breaking rule in your program. Before you can use a line-breaking rule, however, you need to tell OmniMark where to place line breaks. You can use the break-width declaration to establish where lines are broken and an insertion-break rule to determine how they are broken:

  break-width 72
  insertion-break "%n"
  
  process
     do xml-parse document scan #main-input
        output "%c"
     done
  
  element #implied
     output "%c"

The program above will output the data content of an XML file in 72-character lines. It will break each line after exactly 72 characters, even in the middle of a word. It will do so by inserting the %n character specified in the insertion-break rule.

If you want to have your lines broken only at the end of words, you can replace insertion-break with replacement-break. replacement-break replaces a specified character (usually a space) with the line break character:

  break-width 72
  replacement-break " " "%n"
  
  process
     do xml-parse document scan #main-input
        output "%c"
     done
  
  element #implied
     output "%c"

The above code replaces a space with a %n whenever the next space in the data is past the 72nd character position on the line.

In some cases, a space (or other specified character) may not occur within the number of characters specified by break-width. In this case, OmniMark will break at the first space (or other specified character) it does find, creating a line that is longer than the specified break-width limit. You can force OmniMark to report an error if a line exceeds a certain absolute maximum by specifying a second parameter for break-width:

  break-width 72 to 255
  replacement-break " " "%n"
  
  process
     do xml-parse document scan #main-input
        output "%c"
     done
  
  element #implied
     output "%c"

In the program above, lines will be broken to fit into the 72-character limit, unless a suitable breakpoint is not found, in which case they will break at the first suitable breakpoint after 72. If, however, no suitable breakpoint is found at or before position 255, an error occurs.

Note that the rule replacement-break " " "%n" replaces the space with %n; it does not simply insert %n after the space. If you want the space character to remain at the end of the line, you must re-insert it into the break string:

  replacement-break " " " %n"

The line-breaking string can be anything you like. For instance, you could use a replacement-break rule like this to insert email-style quoting:

  replacement-break " " "%n> "

If you do this, remember that natural line breaks (those already present in the data) are not affected by the replacement-break rule, so you would need to add a find rule to insert the quote character after a natural line break:

  find "%n"
     output "%n> "

Similarly, you can replace a character, other than a space, by specifying it in the replacement-break rule. The following code would break lines at an ampersand (removing the ampersand in the process):

  replacement-break "&" "%n"

Only data output in the course of an XML or SGML parse (that is, output by the parser itself or by output actions in the element rules of your program) is breakable by default. No other kind of output can be broken unless you mark the permitted breakpoints by inserting the format item %/ into the stream before each breakable character.

The following program will do line breaking and insert email-style quoting into data read from a text file. It enables line breaking by inserting %/ before every space in the file:

  break-width 60 to 72
  replacement-break " " "%n> "
  
  process
     submit file "linebreak.txt"
  
  find " "
     output "%/ "
  
  find "%n"
     output "%n> "

Limiting the Scope of Line Breaking

When break-width is used as a global declaration, OmniMark applies line breaking to output written to #main-output. To apply line breaking to a particular stream only, use break-width as a modifier to either the open or the set action:

  replacement-break " " "%n> "
  
  process
     local stream foo
     local stream bar
     local stream baz
  
     open foo as file "foo.txt" with break-width 72 to 80
     set bar with (break-width 36 to 50) to make-breakable-file ("bar.txt")
     open baz as file "baz.txt"

In the above program, any data written to foo will be subject to line breaking at 72 characters with a maximum of 80. The contents of the buffer attached to bar will be subject to line breaking at 36 characters with a maximum of 50. (The function make-breakable-file () will need to insert %/ into the stream at the appropriate places to enable line breaking.) Any data written to baz will not be subject to line breaking.

Note that parentheses are used to offset the break-width in the set action. They are required to distinguish the to belonging to break-width from the to belonging to set.

Preventing Line Breaking

You can prevent line breaking where it would normally be active by using either the %[ and %] format items or the h format modifier. Text between %[ and %] is never broken, and it is counted only towards the maximum break width, not towards the preferred break width. You can use this facility when the output contains "hidden" text that does not affect the length of displayed lines and break-width is being used to format that output. It is deprecated in all other circumstances.

A %[ and the matching %] can be written by separate output or put actions. You can insert nested instances of %[ and %] into a stream.
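
As a minimal sketch of the "hidden" text case, the program below assumes a hypothetical element named emph whose content is to be wrapped in HTML-style <b> tags. The element name and the tags are illustrative assumptions, not part of the original examples. Because each tag sits between %[ and %], it is never split across lines and does not count towards the preferred width of 72:

  break-width 72
  replacement-break " " "%n"
  
  process
     do xml-parse document scan #main-input
        output "%c"
     done
  
  ; "emph" is a hypothetical element name used only for illustration.
  ; The tags are protected from breaking; the content (%c) is not.
  element "emph"
     output "%[<b>%]%c%[</b>%]"
  
  element #implied
     output "%c"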

You can use the h format modifier on a %c or %v to switch off line breaking altogether. If you output %hc in an element rule, no line breaking occurs for the data content of the element and its children, and no error occurs if an output line exceeds the maximum specified line length. Line breaking remains in effect for any other data output directly in that element rule.
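
The following sketch shows one way the h modifier might be used. It assumes a hypothetical element named listing that holds preformatted text whose lines should be passed through unchanged; the element name is an assumption made for illustration:

  break-width 72 to 255
  replacement-break " " "%n"
  
  process
     do xml-parse document scan #main-input
        output "%c"
     done
  
  ; "listing" is a hypothetical element name. Its content, and the
  ; content of its children, is output without any line breaking.
  element "listing"
     output "%hc"
  
  ; All other elements are broken at the preferred width as usual.
  element #implied
     output "%c"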

Line Breaking and Referents

You can apply line breaking to a stream that has referents allowed, but line breaking will be done before any referents are resolved. When the referents are resolved, they will add to the length of the lines in which they occur, and no further line breaking will be done.

insertion-break versus replacement-break

When both insertion-break and replacement-break declarations apply, preference is given to the applicable replacement-break. Breaking between words is generally preferable to breaking within words, even if the latter can be done within the rules of word division.

Limitations

You can have any number of replacement-break rules in a program; however, you must use conditions to ensure that only one insertion-break rule and one replacement-break rule are active at a time.

The condition on a line-breaking rule cannot contain function calls.

The character to replace in a replacement-break rule must be a single literal character.

The replacement string in either type of line-breaking rule must contain only static text, and must contain at least one end-of-line sequence.
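
The sketch below illustrates how two replacement-break rules might coexist under these limitations. It assumes a hypothetical global switch named quoting and assumes that the condition is written at the end of the rule, as it is for other OmniMark rules; check both against the reference entries for replacement-break before relying on them. The two conditions are mutually exclusive, so only one rule is active at a time, and neither contains a function call:

  ; "quoting" is a hypothetical switch introduced for this example.
  global switch quoting initial {true}
  
  break-width 60 to 72
  replacement-break " " "%n> " when quoting
  replacement-break " " "%n" unless quoting
  
  process
     submit file "linebreak.txt"
  
  ; Mark every space as a permitted breakpoint.
  find " "
     output "%/ "
  
  ; Natural line breaks are quoted only when quoting is on.
  find "%n"
     do when quoting
        output "%n> "
     else
        output "%n"
     done

Both replacement strings are static text containing %n, and the character to replace is a single literal space, so the remaining limitations are also respected.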
