XXIIVV

String rewriting languages are usually a series of rules that transform strings in a given alphabet.

A string rewriting system is based on an alphabet which is just a set of symbols. Strings over a given alphabet are arbitrary sequences of symbols, each taken from the alphabet. The empty sequence or null string is included. A string rewriting system consists of an alphabet and a relation ->, a set of such pairs or rules.

swap x y -> y x

These languages have multiple, distinct phases with different syntactic and semantic rules. There are often two phases; the first gives a set of rules, and the second provides objects on which those rules are to be applied.

swap foo bar ; bar foo

Thue is a matrioshka esoteric computer based on string rewriting rules.

A Thue program consists of two parts: a list of substitution rules, which is terminated with a line having both sides of the operator empty, followed by a string representing the initial program state.

#::=Unused rules are comments
a::=~Hello Thue!
::=
[a] []

Execution consists of picking, from the list of rules, an arbitrary rule whose original string exists as a substring somewhere in the program state, and replacing that substring by the rule's replacement string. This process repeats until there are no rules that can be applied, at which point, the program ends.

#::=Increment binary number
1_::=1++
0_::=1
01++::=10
11++::=1++0
_0::=_
_1++::=10
::=
_10010011_ _10010100

Thue represents one of the simplest possible constraint-based programming language. It is to the constraint-based paradigm what languages like OISC are to the imperative paradigm.

Input

Added to this simple system are two strings which are used to permit Thue to communicate with the outside world. The first of these is the input symbol (":::"). The input symbol is actually the lhs of an implicit rule of which the user (or system's "input stream") is a component. The input symbol, therefore, is replaced by a line of text received from the "input stream."

Output

As a counterpart of input, the output symbol ("~") is supplied. Like the input symbol, the output symbol triggers an implicit rule which, in this case, encompasses the "output stream." The specific effect is that all text to the right of the output symbol in the rhs of a production is sent to the output stream.

Note that either (or both) of these implicit rules may be overridden by providing explicit rules that perform some other task.

#::=Sierpinski's triangle, backticks are linebreaks
X::=~_
Y::=~*
Z::=~`
_.::=._X
_*::=*_Y
._|::=.Z-|
*_|::=Z
..-::=.-.
**-::=*-.
*.-::=*-*
.*-::=.-*
@.-::=@_.
@*-::=@_*
::=
@_*...............................|
It is pitch black. You are likely to be eaten by a Thue.

Modal is a programming language based on string rewriting.

Modal programs are represented as a series of substitution rules, applied to a given tree which gets continually modified until no rules match any given part of the tree. The principale elements of modal are:


The documentation below displays the examples as a series of rules, followed by the rewriting steps in the following format:

<> A rule

.. The input program
04 The result of applying rule #4
-1 The result of applying a lambda

Modal's evaluation model is based on scanning from left-to-right across a string that represents a serialized tree. We only match from the start of the string, and if we can't find a rule that matches, we move one token or subtree forward. All rules match against the start of the string, and if one matches, the matched pattern is erased, and the right-hand side of the rule is written to the end of the string.

Rules

To define a new rule, start with <>, followed by a left and a right statement, which is either a word, or a tree. The program evaluation starts at the first character of the string and walks through to the end trying to match a transformation rule from that location:

<> hello (good bye)  This is a rule

.. hello world       This is program data
00 good bye world    This is the result

Rules can be also be undefined using the >< operation that will try matching a previously declared rule's left statement:

<> cat owl
<> owl bat
<> owl rat
>< owl

.. cat 
00 owl 
02 rat

Modal is homoiconic, meaning that any string is a potential program and new rules can be composed directly during the evaluation. For instance, here is a rule that defines a new rules definition infix syntax:

<> (?x -> ?y) (<> ?x ?y)
fruit_a -> apple
fruit_b -> banana
(apple banana) -> fruit-salad

.. fruit_a fruit_b
01 apple fruit_b
02 apple banana
03 fruit-salad

Registers

Registers are single-character identifiers bound to an address in a pattern used in rewriting:

<> (copy ?a) (?a ?a)

.. copy cat
00 cat cat

When a register is used in a pattern, and when we try to match a given tree with a pattern, each register is bound to a corresponding an address to the left of a rule, and referenced to the right:

<> (swap ?x ?y) (?y ?x)

.. (swap fox rat)
00 (rat fox)

When a register appears more than once in a rule, each instance is bound to the first address, but differently named registers can still match on the same pattern:

<> ((?x ?x ?x)) match3
<> ((?x ?y)) match2

.. (fox fox fox) (bat bat) (bat cat)
00 match3 (bat bat) (bat cat)
01 match3 match2 (bat cat)
01 match3 match2 match2

Trees

Trees can be found in rules and program data, they include words, registers and nested trees. Rules can match specific trees and rewrite their content in a new sequence.

<> (rotate ?x (?y) ?z) (?y (?z) ?x)

.. rotate foo (bar) baz
00 bar (baz) foo

An efficient way to represent an array is to store information in nested lists, it allows for rules to target specific segments of the list, similarly to Lisp's car and cdr primitives. To print each element of such a structure, we can use the following recursive rules:

<> (putrec (?: ?x)) (putrec ?: ?x)
<> ((putrec (?:))) (?:)

.. (putrec (a (b (c (d (e))))))
00 (putrec (b (c (d (e)))))
00 (putrec (c (d (e))))
00 (putrec (d (e)))
00 (putrec (e))
01 

> abcde

Logic

Let us build a logic system, starting by comparing two registers:

<> (eq ?x ?x) (#t)
<> (eq ?x ?y) (#f)

.. (eq fox bat)
01 (#f)

We can implement the truth tables by defining each case:

<> (and #t #t) #t <> (or #t #t) #t 
<> (and #t #f) #f <> (or #t #f) #t
<> (and #f #t) #f <> (or #f #t) #t 
<> (and #f #f) #f <> (or #f #f) #f
<> (not #t) #f    <> (not #f) #t

.. (or #f #t)
08 (#t)

Building on the comparison rule above, we can write conditionals with a ternary statement:

<> (ife #t ?t ?f) (?t)
<> (ife #f ?t ?f) (?f)
<> (print ?:) (?:)

.. ife #f (print True!) (print False!)
13 (print False!)
14 ()

Types

Understanding how to use typeguard to reach a specific evaluation order is important to become proficient with Modal. Creating a type system is merely a matter of creating stricter rules expecting a specific grammar. Notice in the example below, how join-strings expects to match two String typed words. Without typed inputs, the rule is not matched.

<> (join-strings (String ?x) (String ?y)) (?x?y)

.. join-strings (String foo) (String bar)
00 foobar

Lambdas

A lambda is created by using the ?(body) special register. Rules created that way exist only for the length of one rewrite and must match what is found immediately after:

.. ?((?x ?y) (?y ?x)) foo bar
-1 bar foo

Outgoing Events

Sending a message is done with the ?: special register, it sends a word or a tree to be handled by a device:

<> (print ?:) (?:)

.. print (hello world\n)

hello world

Incoming Events

Similarly, listening to incoming messages is done with the ?~ special register:

<> (?: print) (?:)
<> (READ ?~) ((You said: ?~ \n) print)

.. (READ stdin)

You said: 
modal(adj.): of, or relating to structure as opposed to substance.

Special Registers Reference

IO
Read?~Read from devices
Send?:Send to devices
Substrings
Explode token?(?* ?*) abca (b (c ()))
Explode tuple?(?* ?*) (abc def ghi)abc (def (ghi ()))
Unpack?(?. ?.) (abc def)abc def
Join?(?^ ?^) (abc def ghi)abcdefghi

The ?* special register explodes a token or tuple into a nested list, and the ?^ register to join it back into a single word. Notice how the following program makes use the List type to ensure a specific evaluation order:

<> (reverse List () ?^) (?^)
<> (reverse (?*)) (reverse List (?*) ())
<> (reverse List (?x ?y) ?z) (reverse List ?y (?x ?z))

.. (reverse (modal))
01 (reverse List (m (o (d (a (l ()))))) ())
02 (reverse List (o (d (a (l ())))) (m ()))
02 (reverse List (d (a (l ()))) (o (m ())))
02 (reverse List (a (l ())) (d (o (m ()))))
02 (reverse List (l ()) (a (d (o (m ())))))
02 (reverse List () (l (a (d (o (m ()))))))
00 (ladom)

sierpiński.modal

To review everything documented above, here is a small program that prints the Sierpiński triangle fractal:

?(?-) (Rules)

<> (* (. > (. ?x))) (* (. (. > ?x))) 
<> (. (. > (* ?x))) (* (. (* > ?x))) 

?(?-) (Physics)

<> (Tri > (?x ?y)) (Tri (?x > ?y))
<> (Tri (?x > (?y ?z))) (Tri (?x (?y > ?z)))
<> (?x (?y > (?z ?n))) (. (?y (?z > ?n)))
<> ((?x > ())) (< ())
<> (Tri < (* ?^)) (?(?: ?:) (*?^ \n))
<> ((?x < ?y)) (< (?x ?y))

?(?-) (Print)

<> (Tri.join ?x ?:) (Tri > ?x ?:)
<> (Tri.dup ?x ?^) (Tri.join ?x ?^)
<> (Tri < ?x) (Tri.dup (. ?x) (?x \n))

?(?* (Tri < (?*))) ...............*...............

Implementation

The language runtime can be implemented in about 300 lines.

cc modal.c -o modal view raw