An introduction to combinatory calculus, by Brent Kerby

We'll now introduce the notion of a "function", a notion which is
quite vague. A **function** can be thought of as an imaginary table
(possibly infinite) with two columns, the things on the left being called
**parameters** and the things on the right being called **results**.
A function maps parameters to results, like a dictionary maps words
to their definitions.

One example of a function is arithmetic negation, which given a number as a parameter, yields another number (the parameter's opposite) as a result. Negation would, for instance, given "3", yield "-3".

Here are some other examples of functions:

- A function that given any number "x", yields "x+1"
- A function that given any number "x", yields "x*x"
- A function that given any list, yields the number of items in it
- A function that given any person, yields the person's father

It is often convenient to assign abbreviated names to certain functions. For instance, the function mentioned in the second example we'll call simply "sqr" (an abbreviation for "squaring", appropriate because that function yields the square of its parameter). Then, the square of a particular number "x" we'll write as "sqr(x)" (the "application" of the function "sqr" to parameter "x"). For instance, the number "9" could be written as "sqr(3)".

Now, it should not be thought that we use "sqr(3)" to denote the "action" of finding the square of "3". We use "sqr(3)" simply to denote the square of "3" itself (in other words, "9").

Also, we distinguish between "sqr" and "sqr(x)". The former is the function of squaring; the latter is the square of a particular number "x".

At this point, we offer a few simple problems to ensure that the reader understands our conventions:

- If "succ" is the function that adds one to its parameter (i.e., the "successor" function), then what is "succ(sqr(4))"?
- Then, what is "sqr(succ(4))"

All of the functions mentioned so far took a single parameter to yield a result. However, there are many functions that require several parameters to yield a result. For instance, the functions of addition, subtraction, multiplication, and division ("+", "-", "*", "/") each require two parameters.

We'll call "add" the function of addition. Then, we'll use "add(3,5)" to mean the sum of "3" and "5", or in other words, "8".

In general, if "f" is a function that takes two parameters then we write the application of "f" to two parameters "x" and "y" as "f(x,y)".

Whereas functions that take only one parameter are called **unary**,
functions that take two parameters are called **binary**. Functions
that take three parameters are called **tertiary**. In general,
the number of parameters that a function takes is called the function's
**order**.

- If "mul" is the multiplication function, then what is "add(mul(2,3),mul(3,4))?
- Is "add" a unary function?

Up to this point, we have mostly only dealt with functions that
take numbers as parameters and yield numbers as results. However,
there are many other interesting things that functions could take or yield
(for instance: lists of numbers). In particular, a function could take
or yield another function. A function that takes or yields a function
is called a **higher-order** function.

One example of a higher-order function is the "composition" function. This function takes two parameters (each parameter being a unary function) and yields a unary function that is a "composite" of the two functions.

By a "composite" of two unary functions "f" and "g", we mean a function that, given a parameter "x", yields "f(g(x))". In other words, the composition of "f" and "g" applies "g" to its parameter and then "f".

As an example use of the composition function (which we'll call "comp"), take "comp(sqr,succ)", the composition of squaring with succession. This is a function that takes a parameter and yields the square of its successor; given a parameter "x", it would yield "sqr(succ(x))" or in other words "(x+1)*(x+1)". Note that "comp(sqr,succ)" is very different from "comp(succ,sqr)".

In general, for composition, we can say that "(comp(f,g))(x)" is the same as "f(g(x))".

- Give an example parameter for which the functions "comp(succ,sqr)" and "comp(sqr,succ)" give different results.
- Is "sqr" a higher-order function?
- Is "sqr(sqr)" the same as "comp(sqr,sqr)"?

There is a way in which unary higher-order functions can be used to emulate multiple-parametered functions. This technique is very interesting and is called "currying". We'll illustrate it with an example:

Consider the function that adds 1 to its parameter (the previously mentioned "succ" function). Now consider the function that adds 2 to its parameter, and the function that adds 3 to its parameter. Clearly, there is many similar functions like these, each adding a certain amount to their parameter. We could capture them all if we had a higher-order function (which we'll call "plus") that given a number, yields a function that adds that number to its parameter. For example, "plus(1)" would be the successor function, while "plus(2)" would be the function that adds 2 to its parameter.

This "plus" function is very similar to the original multiple-parametered function "add". Whereas the "add" function took both parameters at once, the "plus" function takes just one, yielding a function that takes the second parameter, yielding the result. We call "plus" the "curried" version of "add". In general, we can say about "plus", that "(plus(x))(y)" is the same as "x+y", or in other words, "add(x,y)".

This technique of currying can be used with all multiple-parametered functions, including those that take 3 or more parameters. From now on, we will generally not use multiple-parametered functions, in favor of curried functions.

- How can the multiple-parametered function of subtraction be thought of as a single-parametered one, using currying?

Now that we no longer have to worry about multiple-parametered functions, we can adopt a new notation for function application that will be less cluttery (requiring fewer parentheses).

We illustrate our new notation with a few examples:

Old Notation | New Notation |
---|---|

succ(1) | succ 1 |

(plus(x))(y) | plus x y |

(add(succ(1)))(succ(2)) | add (succ 1) (succ 2) |

Basically, we write the application of "f" to "x" as just "f x", unless "x" is a large expression (itself containing applications), in which case the "x" will be surrounded by parentheses.

Note that even if the left side of the application (the "f") is a large expression, it is not necessary to surround it by parentheses; that is, "plus x y" is understood to mean "(plus x) y". Also, if there was a function "f" that took three parameters through currying, then "f x y z " would be understood to mean "((f x) y) z".

This notation is very convenient. In fact, the convenience of this kind of notation is probably the most important reason to use curried functions instead of multiple-parametered ones (just kidding).

- Write "add(3,sqr(succ(4)))" in the new notation, using a curried version of "add".

- "succ(sqr(4))" is the successor of 16, or in other words, 17.
- "sqr(succ(4))" is the square of 5, or in other words, 25.

- "add(mul(2,3),mul(3,4))" is the sum of 6 and 12, or in other words, 18.
- No. "add" is a binary function, meaning that it takes two parameters, Examples of unary functions include "succ" and "sqr".

- The number 1 will do as an example. The first function will result in 2 while the second will result in 4. Actually, any number will do as an example except 0 (for which both functions yield 1).
- No, "sqr" is not a higher-order function. As we've defined it, "sqr" only takes and yields numbers, not other functions. An example of a higher-order function is the composition function.
- No, "sqr(sqr)" (the squaring function squared) is quite a meaningless expression, since "sqr" applies only to numbers, not functions. However, "comp(sqr,sqr)" is quite meaningful; it is the function that squares a number twice (i.e., raises it to the fourth power).

- One can think of a higher-order function "minus", that given a number, yields a function that subtracts that number. That is, "minus(1)" would be a function that subtracts one, while "minus(2)" would be a function that subtracts two. So, "(minus(2))(5)" is the same as "3". (Note that this "minus" actually takes the parameters in the opposite order of the traditional "-", taking the thing to be subtracted first, rather than second. There is, of course, a higher-order function "minusfrom" that takes them in the traditional order)

- "add(3,sqr(succ(4)))" would be written as "plus 3 (sqr (succ 4))".

The notion of a "function" has, for its simplicity, been around a surprisingly short time. It originated, I think, sometime in the 1800s with mathematicians like G. W. Leibniz and Leonhard Euler. The idea of applying functions to things other than numbers (in particular, to other functions) began, it seems, with Gottlob Frege in the late 1800s and early 1900s.

Frege, it seems, was the first to think of emulating multiple-parametered functions using higher-order functions, but the technique ("currying") has come to be named after Haskell Curry, an influential mathematician who popularized the technique.

In the last section we defined some basic terminology relating to functions and explained the technique of currying, a way of emulating multiple-parameter functions using higher-order unary ones. In this section, we'll describe some very special higher-order functions called "combinators".

The first combinator we'll describe is the one which has come to be known simply as "B". In the last section we mentioned a composition function "comp", which was presented as a binary function that takes two parameters (each unary functions) and yields their composite. The "B" combinator is related to the composition function and, in fact, is basically just a curried version of it.

In the last section, we mentioned that the composition function had the property (for all "f", "g", and "x")

(comp(f,g))(x) = f(g(x))

The "B" combinator has a similar property:

Bfgx = f(gx)

Using extra parentheses and spaces, the statement could be written as:

((B f) g) x = f (g x)

This property (sometimes called "B"'s "rewrite rule") can be thought of as the definition of what we mean by "B".

As described in the last section using "comp", the "B" combinator has many uses. Given a function "plus 2" that adds two, and a function "times 3" that multiplies by three, we can form a function "B (plus 2) (times 3)" that multiplies its parameter by three and then adds two.

In general, "B" is used to sort of "chain" functions together... Remember that the order of the parameters does matter with "B"; i.e., "Bfg" is not always the same as "Bgf".

Although "B" is often applied to two things together, yielding their composite, it is also meaningful when applied to just one parameter. For instance, "B sqr" is a function that is like squaring, except that instead of taking just one parameter (a number), it takes two parameters (a function and a number), the function being applied to the number before being passed to the original squaring function. One might think of "B sqr" as a version of "sqr" in which the parameter has been split into two. This function might be called the "composition of squaring".

We've only used "comp" on functions that are unary; however "Bfg"
may be meaningful even if "f" takes more than one parameter (through
currying). For instance, "B plus sqr" (the composition of addition
with squaring) is meaningful: Even though "plus" can be thought of
as a binary function (taking two numbers through currying and adding
them together), it can also be thought of as a (higher-order) unary
one that takes a number and yields a function that adds that number.
So, if "plus" takes a number and yields a function that adds that number,
then "B plus sqr" takes a number and yields a function that adds
*the square of* that number. That is:

plus x y = x + y (B plus sqr) x y = x^{2}+ y

So, we see that the composition of addition and squaring is an adding function, except that its first parameter is squared.

In general, when composing a multiple-parametered function "f" with function "g", only the first parameter of "f" is affected (in speaking of a "multiple-parametered" function we mean a function that takes multiple parameters through currying; this is how we will usually use "multiple-parametered" from now on).

- What is "B succ (B (times 3) sqr)"?
- What is "B (B succ (times 3)) sqr"?

Now we introduce another very interesting higher-order function, the "C" combinator. Basically, this function swaps the order of a function's parameters. That is, if "C" is applied to a binary function "f", then it yields a modification of "f" that is the like "f", except the order of its two parameters are reversed.

For instance, if "minus" is a subtraction function such that "minus 3 5" is the same as "2", then "C minus" will be like the minus function, except that it takes the two parameters in reverse order; that is, "(C minus) 3 5" is the same as "minus 5 3", and "-2".

In general, we can say for "C" that

(Cf)xy = fyx

or without the extra parentheses:

Cfxy = fyx

Another way to think of "C" is that it can be used to postpone
a function's first parameter. Using "C", it is possible to
"sneak in", so to speak, a function's second parameter
without giving the first. Suppose, for instance, that we have "minus"
and want to form a unary function that subtracts a number
*from* "3"; now, the number to subtract *from* comes second
in "minus". However, we can sneak in the second parameter using "C";
"C minus 3" is a function that subtracts a number from "3".

(later, we will show how variations of the "C" combinator can be used to sneak in a function's third, fourth, or other parameter).

- What is the result of applying "C" to "plus"? (i.e., what is the converse of addition?)
- What is the result of applying "C" to "B"?

We now turn our attention to another interesting combinator, the W combinator. Basically, this combinator duplicates a function's parameter. That is, if "W" is applied to a binary function "f", the result is a unary function that is like "f" except that its one parameter plays the role of both parameters.

For instance, "W plus" is the doubling function. Whereas "plus" is a binary function adding the new parameters together, "W plus" (doubling) is a unary function that adds a parameter with itself. That is, "(W plus) 5" is the same as "plus 5 5".

In general, we can say about "W" that

(Wf)x = fxx

- What is the result of applying "W" to "times"?

Whereas the W combinator is used to construct functions that use their parameter more than once (so to speak), the K combinator is used to construct functions that ignore their parameter. In particular, applying "K" to an "x" yields a function that always results in "x".

For instance, "K 3" is a function that always results in three, regardless of its parameter. That is, "(K 3) 0" is "3". Also, "(K 3) W", "(K 3) B", and "(K 3) C" are also "3". The second parameter to K is always ignored.

In general, we can say about "K" that

Kxy = x

- What is "K 2 2"?

We've now mentioned several combinators (B, C, W, and K) and shown how, given some primitives like "plus" and "times", they can be used to form other interesting functions (like the doubling function, the squaring function, the function that doubles and then squares, etc.)

One interesting thing about combinators is that, even without other primitives like "plus" and "times", very interesting things can be formed simply by applying combinators to one another.

For instance, consider "WK". "K" can be thought of as a binary function that always yields the first parameter; so "WK" is then a unary function whose parameter plays the role of both. The result is that "WK" always yields its parameter unchanged and is another name for the "I combinator", for which we can say:

Ix = x

To see this more clearly, note that "WKx" is the same as "Kxx" (by W's rewrite rule) and thus "x" (by K's rewrite rule). So, "(WK)x" is the same as "x", and "WK" is a function that results exactly in its parameter, with no change.

- Is "WK" the same as "KW"?

We've just shown how the I combinator can be derived from W and K. Now, we'll give a much more elaborate example. We'll show how a combinator called "S" can be derived from B, C, and W. To begin, by "S", we mean the "S" that has this rewrite rule:

Sfgx = fx(gx)

This "S" is actually quite an interesting combinator. It could be thought of as a generalized version of "W". Remember that "W" duplicates a parameter of a binary function "f". Similarly, "S" duplicates a parameter of a binary function "f" but allows the duplicated parameter to be modified by a function "g" before being passed as the second parameter to "f".

For example, "W plus" is a doubling function; i.e., it is a function
that adds a parameter with itself. Then, "S plus sqr" is a function that
adds a parameter, not with itself, but with its square. For instance,
"(S plus sqr) 4" is "plus 4 (sqr 4)" or in other words "4 + 4^{2}"
or "20".

Now, we'll show how "S" may be constructed from B, C, and W. In fact, "S" is quite simply

B(BW)(BBC)

(Read: "the composition of composed duplication with composition itself and the swapping function". Just kidding.)

At first, this may appear to be jibberish, but a closer look will show that it is indeed the "S" combinator.

In particular, when this combinator is applied to a parameter "f", one gets

B(BW)(BBC)f BW(BBCf) by "B"'s rewrite rule BW(B(Cf)) by "B"'s rewrite rule (again)

If we apply this to another parameter "g", we get

BW(B(Cf))g W(B(Cf)g) by "B"'s rewrite rule

Finally, if we apply this to a third parameter "x", we get

W(B(Cf)g)x B(Cf)gxx by "W"'s rewrite rule Cf(gx)x by "B"'s rewrite rule fx(gx) by "C"'s rewrite rule

So, we can see that "B(BW)(BBC)" is a combinator that, given three parameters "f", "g", and "x", yields "fx(gx)", and is the same as the "S" combinator.

- What is "S times (plus 2)"?
- What is "S times I"?
- What is "S plus I"?

In this section, we've described some specific higher-order functions called "combinators" but have not yet told what exactly we mean by saying that they are "combinators". We'll now fill in this gap.

Before giving a precise definition of "combinator", we first introduce another term, a "combination". We define this term using an example; here are some combinations of "x", "y", "z":

- x(yz)
- x(yzz)
- yzz
- xyyzz
- xyz
- zyx
- x
- y
- z

Basically, to form a combination of a set of things, just pick some out (possibly picking some more than once), scramble them up a bit, and insert some parentheses. (This definition of "combination" is somewhat vague, but it will do for now).

Now, with that in mind, we finally define what we mean by "combinator".
A **combinator** is a function that, given some parameters, yields
a combination of those parameters.

We can see that B, C, W, K, I, and S are all functions that take some parameters and always yield a combination of them.

We've described several interesting combinators, including B, C, W, K, I, and S. Some of these combinators can be constructed from one another (as "I" can from "W" and "K"), and many interesting other combinators can be constructed from all these.

Actually, the set of combinators is infinite, and there are lots of interesting ones. One thing that may be a bit surprising is that all combinators can be constructed from just B, C, W, and K. It may also be surprising that all combinators can be constructed from just S and K. This topic will be discussed more in the next section, on Completeness.

- Well, "B (times 3) sqr" is a function that squares a parameter and
then multiplies it by three; so, "B succ (B (times 3) sqr)" is
a function that does that and then adds one. In other words,
if we call this function "f", we have:
fx = x

^{2}*3 + 1 - "B succ (times 3)" is the function that multiplies a parameter by
three and then adds one. So, "B (B succ (times 3)) sqr" is the
function that squares its parameter and then does that. So,
if we call this function "f", we have:
fx = x

^{2}*3 + 1Note that this function is the same as the function in the last example. In general, "Bf(Bgh)" is the same as "B(Bfg)h", in that both, given a parameter "x", result in "f(g(hx))". This property of "B" is called its

**associativity**.

- The result is the addition function (there is no change). Since the order of parameters does not matter in addition, "C plus" is the same as "plus".
- The result is like "B" except that it takes the two parameters in reverse order.

- The result is the squaring function, the function that multiplies a parameter by itself.

- "K 2 2" is the same as "2". Note that the first instance of "2" in "K 2 2" is the one that matters; the second one could be changed to something else (as in "K 2 5") and the result would still be "2".

- No. "WK" always yields its parameter. In contrast, "KW" always just yields "W", ignoring its parameter.

- "S times (plus 2)" is a function that multiplies a number by the number plus two. That is, given a number "x", the function yields "x * (x+2)". For a parameter of 3, the function yields "3 * (3+2)", or in other words, 15.
- "S times I" is a function that multiplies a number by its own self. In other words, it is a squaring function and is also "W times". We can see that "S times I" when applied to a parameter "x" yields "times x (I x)" (by S's rewrite rule), and this is the same as "times x x" (since "I x" is the same as "x").
- "S plus I" is a function that adds a number with its own self. It is a doubling function is and is also the same as "W plus". This is very similar to the last problem. Actually, "SfI" is, in all cases of "f", the same as "Wf".

The first studies of combinators began in 1924 with a mathematician named Moses Schonfinkel; he introduced the I, B, C, S, and K combinators (he called them "I", "Z", "T", "S", and "C", respectively).

Combinators were independently studied about the same time by Haskell Curry (he found out about Schonfinkel's work in late 1927). The names he used for them are the ones we use (they've sort of come to be the standard). For curiosity, here are the (arbitrary) reasons why he chose the names:

I | suggested by "identity" |

B | suggested by "substitution"; he was reserving "S" for other uses (possibly for "sum" or "successor"), although he ended up adopting Schonfinkel's use of "S" when he heard of it. |

C | suggested by "converse" |

W | suggested by the letter's appearance; "W" is often rendered as two side-by-side "V"'s, suggesting duplication. |

K | suggested by "constant", which is pronounced with a "k" sound. |

In the last section we showed how using combinators (like B, C, W, and K) and certain primitive functions like "plus" and "times", it is possible to form many other interesting functions. In this section, we'll show how a very different approach, using a construct called a "lambda" instead of combinators, can also be used to form functions.

We illustrate the approach with an example. Consider the function that takes the square of the successor of a number. Using the "B" combinator, this function might be represented as

B sqr succ

On the other hand, using a "lambda", the function could be represented as

x\ sqr (succ x)

(Read: "the function that given 'x', yields the square of the successor of 'x'")

A second example: Consider the function of doubling (adding something with itself). Using the "W" combinator, this function might be represented in terms of "plus" as

W plus

On the other hand, using a lambda, the function could be represented as

x\ plus x x

(Read: "the function that given 'x', yields the sum of 'x' and 'x'")

In general, we say that "x\F" (where "F" is an expression possibly
mentioning "x") is the function that, given a parameter "x", yields "F".
We call "\" the **lambda** operator.

To ensure that the reader understands this new notation, we offer a few simple problems:

- Does "x\ plus x x" mean the same thing as "y\ plus y y" ?
- What is "(x\ plus x x) 5"?
- What is "x\ sqr x"?
- What is "x\ plus (sqr x) (sqr x)" ?
- How it could it be written using combinators instead of a lambda?

When using the combinator approach to forming functions, we used currying to model multiple-parametered functions. When using the lambda approach, we can (and will) still use currying.

So far, we have only used lambdas to form unary functions (like the doubling function and the double-and-square function). But lambdas can also be used to form multiple-parametered functions (in the curried sense).

For instance, consider the "minusfrom" function (a subtraction such that "minusfrom 5 3" is "2"). Using the C combinator, this can be expressed in terms of "minus" as

C minus

It can also be expressed using lambdas as

x\ (y\ minus y x)

(Read: "the function that given 'x', yields the function that given 'y', yields the subtraction of 'y' from 'x'")

Note that we are using a lambda within a lambda (i.e., nested lambdas); this is common when using lambdas to form multiple-parametered functions. The parentheses and extra space in the last expression are not necessary and could have been omitted, as in

x\y\ minus y x

(In our notation, a lambda should always be interpreted as extending as far to the right as possible; for instance, "x\plus 2 x" should be interpreted as "x\ (plus 2 x)", and not "(x\plus) 2 x").

- What is "x\y\ plus x y"?
- What is "x\y\ minusfrom (sqr x) (sqr y)"?
- What is "x\y\ times (plus x y) (minusfrom x y)"?

So far we've only used lambdas to form functions that actually use their parameter; however, a lambda can also be used to form functions that ignore their parameters. For instance, the function that always results in "3", regardless of its parameter, we can write using a lambda as

x\3

This function could also be formed easily using the K combinator:

K 3A function that ignores its parameter is said to be

- Is "K 3" a vacuous function?
- What is "(x\3) 5"?

We have seen that many functions that can be formed using lambdas can also be formed using combinators. However, it is also possible to form the combinators themselves using lambdas. For instance, the I combinator can be formed using a lambda as

x\x

It is the function that, given parameter 'x', yields 'x'. Similarly, it is possible to form the W combinator using lambdas as

f\x\ fxx

It is the function that, given parameter 'f', yields the function that given parameter 'x', yields 'f' applied to 'x' and 'x'. In a similar way, it is possible to form other combinators using lambdas:

K = x\y\ x B = f\g\x\ f(gx) C = f\x\y\ fyx

- How can the S combinator be written using lambdas?

When discussing combinators, we assigned each combinator a rewrite rule; in particular, we said that (for all "f", "g", "x", "y", and "z")

Ix = x Bfgx = f(gx) Cfxy = fyx Wfx = fxx Sxyz = fx(gx) Kxy = x

These rewrite rules sort of acted as definitions of the combinators. We'll now give a similar rewrite rule that will more precisely define the lambda construct. Before precisely stating the rule, we'll give some instances of it. One instance is that

(x\ sqr (succ x)) 3 = sqr (succ 3)

Since "x\ sqr (succ x)" is the function that, given "x", yields the square of the successor of "x", it makes sense that applying the function to a specific parameter "3" would yield the square of the successor of "3". Some other instances of the rule:

(x\ plus x x) 3 = plus 3 3 (x\ plus x x) (sqr 3) = plus (sqr 3) (sqr 3) (x\ x) 3 = 3 (x\ y\ minus y x) 3 = (y\ minus y 3) (x\ y\ minus y x) (sqr 3) = (y\ minus y (sqr 3)) (x\ y\ x) 3 = (y\ 3) (x\ 1) 3 = 1

Basically, the general rule is that "(x\F) G" is the same as just
"F", except that every instance of "x" must be replaced with "G".
This rule is called the rule of **beta-reduction**.

- What is "(f\g\x\ f(gx)) sqr succ"?
- What is "x\ xx"?
- What does it give when applied to the K combinator?

Each time one forms a function using a lambda, it is necessary to make up an arbitrary name (like "x") to stand for the function's parameter. If one is using nested lambdas (say, to form a multiple-parametered function), then when the time comes to pick the name for the second parameter, a name (say "x") will already be used up by the first one; so, it is necessary to pick a different name (like "y"). Obviously, if one picks the same name for both parameters, then problems will occur.

A person is probably not likely to initially pick the same name for two different parameters; however, it is quite easy to make a mistake while doing beta-reductions. It is possible for an expression initially free of naming conflicts to cause a conflict when beta-reduced. For instance, consider the expression

(z\ zz) (x\y\x)

This is the application of the K combinator to itself (mentioned in the last problem). The expression is free of name conflicts, but if we naively beta-reduce it, we get

(x\y\x) (x\y\x)

and then

y\x\y\x

There are now two parameters with the name "y". In this case, the problem is not really too severe, since "y" is not actually referenced; however, there are worse cases (one is mentioned in the next problem), where several referenced parameters end up having the same name.

Note in the second step of the last example, that there is not yet a naming conflict with "(x\y\x) (x\y\x)". Although there are two parameters each with names "x" and "y", the functions with those parameter names are not nested within each other; the expression is easily identifiable as the application of the K combinator to itself. The problem does not occur until the final step.

The problem with the final step could have been solved if the variables in one of the "x\y\x"s had been renamed. For instance, the expression could have been rewritten as

(x\y\x) (f\g\f)

before proceeding. Then the final step would yield

y\f\g\f

which has no naming conflicts.

- What is a correct beta-reduction of "(x\y\x) (x\y\y)"?

The most common kind of rewrite used with lambdas is beta-reduction, but there is another kind, alluded to in the examples, known as "eta-reduction". Eta-reduction is the kind of rewrite used when we say that

x\ sqr x

is the same as just "sqr". In general, **eta-reduction** is
the rewriting of a expression of the form "x\ F x" (where "F" contains
no references to "x") to just "F".

Eta-reduction is a subset of a more general principle, called
the **principle of extensionality**. The principle of extensionality
says that if "Fx = Gx" for an arbitrary "x", then "F = G". In other
words, if two functions always yield the same thing given the same
parameter, then they are same.

- How can it be shown using eta-reduction that "x\y\ plus x y" is the same as just "plus"?

Before proceeding to discuss lambda expressions and their relation to combinators, it may be useful to step back and reflect on our technique of referring to things.

By **expression**, we mean a symbol (typically a sequence of
English characters/puncuators) used to denote something.
In a broad sense, we might call the English sentence "The cat
jumped over the hat" an expression, but by "expression" usually
we mean something like "sqr", "3", "plus 5", "B sqr succ"; i.e.,
one of those symbols not common in ordinary English but assigned
a precise meaning by us.

In fact, throughout this work we've been developing a sort
of formal language. Because ordinary English does not have
a concise way of precisely referring to things like the
B, C, W, and K combinators and the function of squaring
(etc.), we have made up our own abbreviated words for referring to them.
Our abbreviated words like "B", "C", "W", "K", "sqr", and "plus"
that are assigned specific meanings we will call **primitive expressions**.

There are some expressions that are not primitive expressions.
The expression "B sqr succ" we will call a **compound expression**
(and not a primitive expression) because it is formed several parts,
parts which individually have meaning.

In English and other natural languages a myriad of strange constructs are used for forming compound expressions. Often a sentence in English appears as a long string of primitive words, possibly with puncuators. Complicated traditions (involving verbs, subjects, direct objects, prepositions, etc.) determine how the meaning of overall sentence is to be drawn from the individual parts.

In contrast, in our little formal language, compound expressions
are generally formed from primitives using only the simple construct of
**function application**. An expression in our formal language
that is written using only primitives and function application is
said to be **purely applicative**. A purely applicative expression
is represented nicely using a "binary tree"; that is, a diagram like this:

* / \ / \ / \ / \ * * / \ / \ / \ sqr 3 plus * / \ sqr 5

Each "*" represents a function application, the thing to the left being the function and the thing on the right being the parameter. This particular diagram represents "plus (sqr 5) (sqr 3)", which, with extra parentheses, is written "(plus (sqr 5)) (sqr 3)".

Because binary trees are a bit awkward to draw, we've adopted the convention of writing expressions as a string of primitives with parentheses that convey the tree structure. Relevant to parentheses, it is always a good idea to keep in mind that by "f x y z" we mean "((f x) y) z", or, using with a binary tree diagram:

* / \ * z / \ * y / \ f x

Now, it should not be thought that all our expressions are
purely applicative. In particular, in this last section, we've
used the lambda construct, a construct that is neither a primitive
nor a function application. An expression that is purely applicative
except for uses of lambda will be called a **lambda expression**.

The lambda construct involves odd things called "variables". Symbolized using letters like "x", "y", "z", these symbols do not have any concrete meaning permanently associated with them, and thus are not primitives. Thus, the lambda construct is quite odd. There is then great interest (and practical use, we suspect, in computing systems) in seeing how lambdas can be eliminated in place of combinators.

Earlier we showed how certain expressions written with lambdas could be rewritten using combinators instead. For instance, the expression

x\ sqr (succ x)

can be rewritten, using the B combinator, as

B sqr succ

Some expressions are apparently more difficult to rewrite, for example,

x\y\ minusfrom (sqr x) (sqr y)

In a moment we'll show how this expression, and in general all lambda expressions, can be written as purely applicative expressions, using combinators. But first, we'll give some more concrete examples, demonstrating some of our techniques.

x\ minus 2 (sqr (succ x))We want to eliminate the lambda in this expression. We can see that this is the composition of three functions, "minus 2", "sqr", and "succ". Thus, the expression could be written without lambdas as

B (minus 2) (B sqr succ)

x\ minus x 3This is the function that subtracts a number from 3. It is the "minus" function with the second parameter "snuck in" to be "3". It can be written without a lambda as

C minus 3

x\ minus (sqr x) 3

This function subtracts the square of a number from 3. It is just like the last example except that the parameter is squared first. So, if we compose the last example with squaring, we get

B (C minus 3) sqr

But there is an alternative solution to this one. Consider the function "B minus sqr"; this is the subtraction function with the first parameter squared. If we snuck in "3" as the second parameter to this function, we would get the function we want, the function that subtracts a number's square from 3:

C (B minus sqr) 3

Now that we've given these examples, we'll give a couple simple techniques that can be used to help eliminate lambdas. The first technique is that if one has an expression of the form

x\ F G

where "G" mentions "x" but "F" does not, then the lambda may be "pushed" deeper into the right, using the B combinator:

B F (x\G)

For instance, if one had the expression

x\ sqr (jiba x 3)then, since the left side ("sqr") does not mention "x", it is possible to "push" the lambda to the right, as in

B sqr (x\ jiba x 3)

This makes sense; the original function does something to "x" (finds "jiba x 3") and then takes the "sqr" of the whole thing. Since "sqr" is being taken of the whole thing, it makes sense that the function is the composition of "sqr" with the other part. Also it can be verified that each function, given an arbitrary parameter "z", yields "sqr (jiba z 3)"; for the original function, this is obvious; for the new function, the result is

B sqr (x\ jiba x 3) z sqr ((x\ jiba x 3) z) (B's rewrite rule) sqr (jiba z 3) (beta-reduction)

So, we see that the technique of pushing a lambda to the right (when the variable is only mentioned on the right) is sound. Now, we'll demonstrate a similar technique; if the only occurence of the variable is on the left, then the lambda can be pushed to the left, using the C combinator. For instance, in the expression

x\ jiba x 3 4

which can be redundantly parenthesized as

x\ (jiba x 3) 4

the left side is "jiba x 3" and the right side is "4". Since only the left side mentions "x", we can push the lambda to the left, using the C combinator:

C (x\ jiba x 3) 4

In the resulting lambda, the left side is "jiba x" and right side is "3". So, the lambda can be pushed left further:

C (C (x\ jiba x) 3) 4

Now, "x\ jiba x" can be eta-reduced to "jiba", so we get

C (C jiba 3) 4

This makes sense; the original function "x\ jiba x 3 4" is like "jiba" except that the second and third parameters are snuck in to be "3" and "4" respectively. Our function is formed by sneaking in "3" as the second parameter ("C jiba 3"), and then sneaking in "4" as the new second parameter. It can be verified that we get the right answers when we apply this function to an arbitrary parameter "z":

C (C jiba 3) 4 z C jiba 3 z 4 (C's rewrite rule) jiba z 3 4 (C's rewrite rule)

These two techniques are often quite helpful ways of eliminating lambdas. Note that in order for these techniques to work, the lambda's variable must be referenced only on one side of the expression or the other (the left or the right); the variable must not be referenced on both sides.

- How may "x\ times 3 (minus x 3)" be written without lambdas?
- How may "x\y\ minusfrom (sqr x) (sqr y)" be written without lambdas? (this is question given earlier that was never answered)
- How may "f\g\x\y\ f(gxy)" be written without lambdas?

We've given a couple techniques that can help one rewrite lambda expressions as purely applicative ones. We'll now describe a couple more, and then tie them all together into a specific procedure for eliminating lambdas.

We've already shown how a lambda expression of the form "x\ F G" can be simplified if "x" is mentioned by either "F" or "G" (but not both). There are a couple cases we haven't considered:

- What if "x" is not mentioned by either one?
- What if "x" is mentioned by both?

The first case, if "x" is not mentioned by either one, is very easy. The entire lambda can be eliminated in one swell foop using the K combinator.

The second case is a bit trickier. The thing to do in the second case is to push the lambda inward to both "F" and "G", using the S combinator. Remember that the S combinator has the rewrite rule

Sfgx = fx(gx)

Now, what we're saying is that "x\ F G" could be rewritten as "S (x\F) (x\G)". To see that this is sound (i.e., that "x\ F G" is in all cases the same as "S (x\F) (x\G)"), check that both "x\ F G" and "S (x\F) (x\G)" give them same thing when applied to an arbitrary parameter "z":

The original expression "x\ F G" when applied to an "z" will give "F G", but with every occurence of "x" changed to "z". Also, "S (x\F) (x\G)" when applied to "z" will give "(x\F) z ((x\G) z)", or in other words "F G", with every occurence of "x" within "F" changed to "z" and also every occurence of "x" within "G" changed to "z". So, the technique is sound. We'll give an example of it now, in rewriting this expression:

x\ plus (sqr x) (succ x)

The left side is "plus (sqr x)" and the right is "succ x"; each side mentions "x". So, we use the S combinator to push the lambda inward both ways:

S (x\ plus (sqr x)) (x\ succ x) S (x\ plus (sqr x)) succ (eta-reduction) S (B plus sqr) succ (eliminate remaining lambda)

We can see that this function is the same as the original function (that adds the square of a number with the successor of the number):

S (B plus sqr) succ x B plus sqr x (succ x) (S's rewrite rule) plus (sqr x) (succ x) (B's rewrite rule)

Now that we can see that this technique works, we can tie together a complete procedure for eliminating a lambda:

Suppose that we have a lambda expression of the form "x\Z" (where "Z" is a purely applicative expression; this must be an innermost lambda). Then, the structure of "Z" determines how the expression should be rewritten:

- If "Z" is of the form "F x", where "F" does not mention "x", then eta-reduce the expression to just "F".
- If "Z" is of the form "x", then the expression is the I combinator, and should be rewritten as just "I".
- If "Z" does not mention "x", then the expression should be rewritten as "K Z".
- If "Z" is of the form "F G" and "x" is mentioned only in "F", then the expression should be rewritten as "C (x\F) G".
- If "Z" is of the form "F G" and "x" is mentioned only in "G", then the expression should be rewritten as "B F (x\G)".
- If "Z" is of the form "F G" and "x" is mentioned in both "F" and "G", then the expression should be rewritten as "S (x\F) (x\G)".

Note that "Z" will always be in at least one of those forms. The first three cases are the base cases. It would be nice to get to one of those cases, because then the lambda can be gotten rid of entirely. However, in the other three cases, the rewrite of the expression still involves a lambda; but, the body of each lambda is smaller than the body of the original lambda, which brings the expression closer to a base case.

Now, as for eliminating lambdas in expressions that have more than one lambda ... Simply pick a lambda that has no lambdas within it (i.e., pick an innermost lambda; for, there will always be at least one innermost lambda if there are any lambdas at all). Eliminate it. Then there will be one less lambda. Repeat the process until there are no more.

- How may "f\x\ fxx" be rewritten without lambdas, using the procedure we're outlined?

We've just shown how using the I, K, B, C, and S combinators it
is possible to eliminate lambdas in an otherwise purely applicative
expression. Since all combinators can be written as lambda expressions,
this means that I, K, B, C, and S can be used to form any combinator.
Because of this, we say that {I, K, B, C, S} (i.e., the set containing
these five combinators) is a **complete combinatory base**.

However, this is not necessarily the most special base, or even the smallest. For, if the B and C combinators are omitted from the base, it will still be complete (i.e., {I, K, S} is also complete). In the cases where a lambda is pushed to the left or to the right using "B" or "C", it also works to push it both ways, using "S"; this will cause it to be pushed to a side where it is not needed, but this not a problem, and will just require an extra use of the K combinator. And, in fact, since {I, K, S} is a complete combinator base, it is possible to construct the B and C combinators from I, K, and S. Applying our procedure to the B and C combinators will show that

B = S(KS)K C = S(BBS)(KK) = S(S(K(S(KS)K))S)(KK)

It is also interesting to note that the I combinator can be constructed from just "S" and "K". In fact, "SKx" for any "x" is an identity combinator; for "(SKx)z" rewrites to "Kz(xz)" and thus "z", regardless of what "x" is. So, "SKK" is the identity combinator, then (as is "SKS").

This means that {S, K} is a complete base (and in fact, is a smallest base; no base with just one combinator can be complete, as we'll show in a later section). Still, it is not necessarily the most elegant, or the most useful in all cases.

There might be some interest in a base that does not involve "S". "S" is sort of an odd combinator that might seem kind of complex and thus not well fit to be taken as primitive. It is possible to use "W" in some cases as a primitive instead of "S". In particular, {I, B, C, W, K} is a complete base that does not involve "S". To show that this is complete, it is enough to show that "S" can be constructed from these. It can; remember that

S = B(BW)(BBC)

Also, we might like to know a good procedure for eliminating lambdas using this base (simply using our original procedure with the derived version of "S" would not be a very elegant solution). With "W", it is possible, at the very beginning stage of eliminating lambdas, to change all lambdas with more than one occurence of their variables into ones that only have one occurence (once this is done, there will be no need to use "S" to push a lambda inward, since a variable could not possibly occur on both sides if there is only one occurence). In a lambda "x\Z", if "Z" mentions "x" more than one time, then the lambda can be rewritten as "W (x\y\Z)", with one of the occurences of "x" changed to a "y"; the process may need to be repeated, if "x" still has more than one occurence (i.e., if there were three or more at the beginning).

Now we'll give a specific example of eliminating lambdas with the base {I, B, C, W, K}:

x\y\ times (plus x y) (minusfrom x y)

To begin with, there are multiple occurences of "y"; so, we'll use "W" to split the lambda:

x\W (y\y2\ times (plus x y) (minusfrom x y2))

Now, there are still multiple occurences of "x", so we'll split that lambda also:

W (x\x2\ W (y\y2\ times (plus x y) (minusfrom x2 y2)))

Now it is just a matter of eliminating the lambdas with B and C.

W x\x2\ W y\y2\ times (plus x y) (minusfrom x2 y2) W x\x2\ W y\ B (times (plus x y)) (minusfrom x2) W x\x2\ W (C (B B (B times (plus x))) (minusfrom x2)) W x\B W (B (C (B B (B times (plus x)))) minusfrom) W (B (B W) (C (B B (B C (B (B B) (B (B times) plus))) minusfrom))

Needless to say, our procedure gave quite a messy result in this case.

- How can the combinator "x\f\ fx" be written using only combinators in {I, B, C, W, K, S}?
- How can the combinator "f\g\h\x\ f(gx)(hx)" be written using only combinators in {I, B, C, W, K, S}?

- Yes, they both represent the doubling function ("W plus"). The name of the variable does not matter (but we will usually use "x", "y", "z", "f", "g", or "h").
- It is the application of the doubling function to "5", or in other words, "10".
- It is the function that, given parameter "x", yields the square of "x". This function "x\ sqr x" is the same as just "sqr".
- It is the function that, given parameter "x", yields the square of "x" plus the square of "x". In other words, it yields the double of the square of "x". Given a parameter of, say, "3", it would yield the double of "9", or in other words, "18".
- Since the function takes the double of a square, it could be written as "B dub sqr" (where "dub" is a doubling function), or in other words "B (W plus) sqr".

- It is the function that, given parameters "x" and "y", yields the sum of "x" and "y". This function is the same as just "plus".
- It is the function that, given two parameters "x" and "y",
yields the square of "x" minus the square of "y", or in other words,
"x
^{2}- y^{2}". Given parameters of "5" and "2", it would yield "25 - 4", or in other words, "21".(The reader may be wondering if this function can be formed using combinators instead of lambdas; it can, but because it is a little complicated, we'll delay showing how for a bit.)

- It is the function that, given two parameters "x" and "y",
yields "(x+y) * (x-y)". Given parameters of "5" and "2", it
would yield "(5 + 2) * (5 - 2)", or in other words, "7 * 3", or "21".
(Incidentally, it is a fairly well-known fact of arithmetic that this function and the function mentioned in problem 2 always yield the same thing, given the same parameters).

- Yes. It is the example of a vacuous function that we just used.
- "3". Since "x\3" is a function that always results in "3", the application of it to any particular thing (in this case, "5") is "3".

- It could be written as "f\g\x\ fx(gx)", or perhaps "x\y\z\ xz(yz)" (it does not matter which variable names are used).

- Since "f\g\x\ f(gx)" is the composition function "B", we can see
then that "(f\g\x\ f(gx)) sqr succ" is the composition of
the squaring function with the succession function (this function
often seems to turn up in our examples). Another way to think of
it is to realize that
(f\g\x\ f(gx)) sqr succ

beta-reduces to

(g\x\ sqr (gx)) succ

and then to

x\ sqr (succ x)

Then we can see that this is the composition of squaring with succession.

- It is the function that applies a function to itself. This function we will later call the "M" combinator. Many functions do not make sense when applied to themselves (for instance, "sqr" would not make sense applied to itself, since it only applies to numbers, not functions). However, some functions are interesting when applied to themselves (for instance, the K combinator).
- It the K combinator applied to itself, or in other words, a function that always yields the K combinator, regardless of its parameter. That is, "KK" (or in other words, "MK") is "x\K", or in other words, "x\y\z\y". This function takes three parameters and always yields the second.

- Beta-reducing "(x\y\x) (x\y\y)" would yield "y\x\y\y", which has a naming conflict. So, we first rename variables in the expression, getting "(x\y\x) (f\g\g)". Then, when we beta-reduce, we get "y\f\g\g", a correct answer. "x\y\z\z" is also a correct answer, then.

- Well, the expression "x\y\ plus x y" contains the expression "y\ (plus x) y", which can be eta-reduced to just "plus x", leaving the full expression to be "x\ plus x", which can by another eta-reduction be reduced to "plus".

- In the expression "x\ times 3 (minus x 3)", only the right side
mentions "x", so we can push the lambda to the right, giving
B (times 3) (x\ minus x 3)

Then, in the new lambda, only the left side is mentioned, so we can push the lambda left with "C", giving

B (times 3) (C (x\ minus x) 3)

Then, since "x\ minus x" can be eta-reduced to "minus", we get

B (times 3) (C minus 3)

- One way to eliminate the lambdas in "x\y\ minusfrom (sqr x) (sqr y)"
is to focus first on the inner lambda. In the inner lambda,
only the right side ("sqr y") mentions "y", so we can push the
lambda right, using "B"; this gives
x\ B (minusfrom (sqr x)) (y\ sqr y) x\ B (minusfrom (sqr x)) sqr (by eta-reduction)

We are now left only with one lambda. The only occurence of the "x" is on the left, so we can push the lambda left with "C":

C (x\ B (minusfrom (sqr x))) sqr

Now it is just a matter of pushing the lambda right a couple times:

C (B B (x\ minusfrom (sqr x))) sqr C (B B (B minusfrom (x\ sqr x))) sqr C (B B (B minusfrom sqr)) sqr

- To eliminate the lambdas is "f\g\x\y\ f(gxy)" we'll again focus
on the innermost lambda, the one using the variable "y". The left
side is "f" and the right side is "gxy". Only the right side mentions
"y". So, we use "B" to push the lambda to the right, giving
f\g\x\ Bf(y\gxy) f\g\x\ Bf(gx) (eta-reduction)

Now, we focus on the next lambda, the one with the "x". Once again, only the right side mentions "x", so we push right with "B":

f\g\ B(Bf)(x\gx) f\g\ B(Bf)g (eta-reduction)

Now, we focus on the next lambda, the one with the "g". It can be gotten rid of immediately using an eta-reduction, leaving

f\ B(Bf)

Now, our answer is simply

BB(f\Bf) BBB (eta-reduction)

("Composition composed with composition")

- This is the W combinator. We begin by trying to remove the innermost lambda
(with variable name "x"), in "f\x\ fxx". The left side is "fx"
and the right side is "x". Both sides mention "x", so we rewrite
using "S":
f\ S (x\fx) (x\x) f\ S f (x\x) (eta-reduction) f\ S f I ("x\x" is I combinator)

Now, "f" is mentioned on the left ("S f") only, so we rewrite with "C":

C (f\ S f) I CSI (eta-reduction)

This gives us an interesting way of writing the W combinator in terms of "C", "S", and "I".

- It may be written as "CI". This is an interesting combinator which we will call "T", or the "reverse-applicator". Given a parameter, it yields a function that passes that parameter to a function.
- It may be written as "B(BS)B":
f\g\h\x\ f(gx)(hx) f\g\h\ S(Bfg)h f\g\ S(Bfg) f\ BS(Bf) B(BS)B

This is an interesting combinator which we will call "N". Whereas "S" was a generalized version of "W", "N" is a more general form still. From "S"'s rewrite rule,

Sfgx = fx(gx)

we see that "S" is like "W" except that it allows the second "x" to be modified by a function before being passed to the binary "f". "N", with the rewrite rule,

Nfghx = f(gx)(hx)

allows both "x"s to be modified (each by a different function) before each is passed to the binary "f".

The idea of using a construct like a lambda to refer to functions began, it seems, with Alonzo Church in the 1930s or so as a device for his "lambda calculus" (Gottlob Frege I think may have used a similar notation a bit earlier). Before lambdas were used, functions were apparently referred to in this sort of way:

The function, f(x) = 2*x + 3

This sort of notation requires that the function be given an arbitrary name "f" (in addition to an arbitrary name "x" for the parameter), whereas with a lambda construct, a name like "f" is not needed.

There are actually many different notations in use for the lambda construct. In the most standard notation, the last function would be written

x. 2*x + 3

using "", the greek letter "lambda". (We've avoided this notation because it is somewhat cluttery and because it is difficult to get the lambda to match the font size in HTML). One might also see

\x. 2*x + 3

where a backslash ("\") is used in place of the lambda symbol. Also, in the Haskell programming language,

\x-> 2*x + 3

is used.

We've already introduced several interesting combinators, including I, B, C, W, K, and S. In this section, we'll introduce a series of combinators called "iterators". These combinators are very similar to the natural numbers "0", "1", "2", "3", and so forth.

The first iterator we'll introduce is the one which corresponds to the number two. This combinator, "2", has the rewrite rule:

2fx = f(fx)

This combinator takes a function "f" and yields a function that applies "f" two times. For instance, "2 sqr" would be a function that squares a number twice (i.e., raises it to the fourth power); applying "2 sqr" to a number "3" would give "sqr (sqr 3)", or in other words "81". "2 sqr" is the same as "B sqr sqr".

Now, we introduce the iterator "3", with the rewrite rule:

3fx = f(f(fx))

This combinator take a function "f" and yields a function that applies "f" three times. In other words, given "f", it yields "Bf(Bff)", which is the same as "B(Bff)f". Continuing this pattern, we define "4" to have the rewrite rule:

4fx = f(f(f(fx)))

Combinators that are like "2", "3", and "4" are called **iterators**.
In an expression "4f", the "f" is called the **iterand**, and
the the result is called an **iteration** of "f".

- What is "2Kx"?

There are, of course, iterators for the two very special numbers one and zero. "1" is an iterator that takes a function "f" and yields a function that applies "f" a single time (in other words, it makes no change on "f"); it would have the rewrite rule:

1fx = fx

In extension, the "1" iterator is the same as the "I" combinator, since it makes no change to its parameter.

Next, the "0" iterator is a combinator that takes a function "f" and yields a function that applies "f" no times at all:

0fx = x

- What is "0K"?

In the last section we showed how the I, B, C, W, and K combinators could be used to form all other combinators. Since an iterator is a kind of combinator, iterators can also be formed from I, B, C, W, and K.

We already mentioned that the "1" iterator is the same as the I combinator. The "0" iterator when applied to any function yields the I combinator, so we can say that

0 = KI

Also, the "2" iterator takes a function "f" and yields "Bff" (composition of "f" with itself). Because of this, we say that

2 = WB

Now, as for the "3" iterator, it is

f\x\ f(f(fx)) f\ Bf(Bff) S(f\Bf)(f\Bff) SB(WB)

And, the "4" iterator is

f\x\ f(f(f(fx))) f\Bf(Bf(Bff)) SB(f\Bf(Bff)) SB(SB(f\Bff)) SB(SB(WB))

At this point, it looks like "SB" may be a successor function, since "SB2" is "3" and "SB3" is "4". In fact, "SB" is a successor function; it may be useful to write it using lambdas:

SB n\f\ Bf(nf) n\f\x\ f(nfx)

It takes three parameters; the first one, "n", is the iterator
to take the successor of. The next one is "f", the function to
iterate. The third is "x", the thing that will have "f" applied
to it some number of times. Then, after taking these three parameters,
"SB" yields, not "nfx" (which would be "f" applied to "x" the original
number of times), but "f(nfx)", which is "f" applied to "x" the
original number of times *plus* one more time.

- What is "SBI"?
- What is "SB(KI)"?
- What is "K(3Kx)"?
- What is "2K(3Kx)"?

We found that "SB" is a successor function, a function that adds one. We'll now show that "NB" is a general addition function, where "N" is the combinator such that

Nfghx = f(gx)(hx)

If "N" is this combinator, then "NB", the addition function, is

m\n\f\ B(mf)(nf) m\n\f\x\ mf(nfx)

Unlike the successor function which took three parameters, this function takes four parameters; "m" is the iterator that is being added (i.e., the amount to add); "n" is the original iterator; "f" is the function being iterated; and "x" is the thing that "f" will be applied to some number of times. Unlike the successor function which gave "f(nfx)" ("f" applied to "x" the original number of times plus one more), the addition function yields "mf(nfx)" ("f" applied to "x" the original number of times plus "m" more times).

- What is "(NB 2 3) K x"?
- What is "2K(2K(2Kx))"?

As noted in the last problem, "3(2K)" is the same as "6K". And, in general, "3(2f)" is the same as "6f". Thus, "B 3 2" is the same as "6". It seems that the B combinator can be used to multiply iterators, and it can.

This makes sense because in general, if "m" and "n" are iterators, then "m(nf)" is a function that applies "nf" "m" times, but "nf" is the applies "f" "n" times, so "m(nf)" applies "f" a total of "m * n" times. So, in general, "B m n", given a function "f", yields a function that applies "f" "m * n" times.

It is interesting that the combinator for multiplication is even simpler than the combinator for addition.

- What is "3(2f)"?
- What is "2(3(4f))"?
- What is "2(2(2f))"?

As demonstrated in the last problem, "3 2" is the cube of "2", or
in other words, "8". In general, if "m" and "n" are iterators,
then "m n" is "n^{m}".

This makes sense because in general, if "n" is an iterator,
then "n f" is "f" composed with itself "n" times. And, if "f"
is an iterator, then composing it with itself "n" is the same as
multiplying it with itself "n" times (since composition is used for
multiplication of iterators), and multiplying something by
itself "n" times is what is meant by raising it to the "n"^{th}
exponent.

So, essentially, the "2" iterator acts as a squaring function, when applied to other iterators. And, "3" acts as a cubing function. And, "1" acts a function that raises a number to the power of "1" (i.e., makes no change). And, "0" acts as a function that raises a number to the power of "0" (i.e., always yields "1").

A note about our notation: although we often omit spaces in expressions (as in "2Kx"), we will always leave a space between the names of two iterators. That is, we will not write "32" if we mean the cube of two; we will instead write "3 2". When we do write "32", we are referring to the iterator for thirty-two.

- What is "2 3"?

There are several properties that numbers are traditionally taken as having; here are some (where "^" is exponentiation):

0 + x = x 1 * x = x 0 * x = 0 x ^ 1 = x x ^ 0 = 1 (x + y) + z = x + (y + z) (x * y) * z = x * (y * z) (x + y) * z = (x * z) + (y * z) z ^ (x + y) = (z ^ x) * (z ^ y) z ^ (x * y) = (z ^ y) ^ x

These properties hold for all iterators "x", "y", and "z", as nearly trivial consequences of the definitions we've given of "+", "*", "^", "0", and "1". We won't bother to show how every single one can be proven, but we will prove a few...

For instance, the property that

(x * y) * z = x * (y * z)can be proven by showing that "B" is associative, that "B(Bfg)h" is in all cases the same as "Bf(Bgh)". This is fairly obvious, because both "B(Bfg)h" and "Bf(Bgh)" give "f(g(hx))" when applied to a parameter "x".

One particularly interesting property is

z ^ (x * y) = (z ^ y) ^ xWritten using "B" for "*", it says:

(Bxy)z = x(yz)

So, this property is the most fundamental property of "*" of all (B's rewrite rule), and is a sort of definition of multiplication in terms of exponentiation.

Similarly, the property

z ^ (x + y) = (z ^ x) * (z ^ y)

is a fundamental definition of addition in terms of multiplication and exponentiation.

Now, it should not be thought that all properties of numbers are trivial to prove with iterators. One that is typically not-so-easy is

x * y = y * x

In a theory of iterators, this might be expressed as

Bxy = Byx

Yet, this does not hold for all "x" and "y" (remember that composition is not commutative). However, it does hold if "x" and "y" are both iterators, but this is not a simple thing to prove (in fact, we won't attempt to prove it here at all).

Up until now, we have only used iterators corresponding to the natural numbers: 0, 1, 2, 3, (etc.). We thought of "2" as the iterator applies a function twice; "1" is the iterator that applies a function once; "0" is the iterator that applies a function no times at all. If we continued the pattern, we would take "-1" to be the iterator that "takes away" a function from a parameter. In other words, "-1", given a function, would yield the function's "inverse", a function that has the opposite effect as the original.

Unfortunately, although iterators for "0", "1", "2", (etc.) could be defined in terms of combinators, there is no way that this can be done for "-1". For, consider our definitions of "3", "2", "1", and "0":

3 = f\x\ f(f(fx)) 2 = f\x\ f(fx) 1 = f\x\ fx 0 = f\x\ x

There does not seem to be a way to continue this pattern. However, if we added an odd extension to our lambda construct, we could, in this way:

-1 = f\(fx)\ x -2 = f\(f(fx))\ x -3 = f\(f(f(fx)))\ x

But this is quite an abuse of the lambda construct, as we've currently defined it.

An addition of a "-1" as a primitive would add great strength to a language which otherwise could only express combinators, because, using multiplication, it is possible to form all negative integers (for instance, "B 3 (-1)" is "-3"). Also, fractions could be formed by raising a integer to the power of "-1" (for instance, "(-1) 3" is one third). Then, by raising numbers to fractional powers, roots of numbers could be expressed (for instance, "(-1) 2 3" is the square root of three, because "(-1) 2" is the inverse of the squaring function, the square root function). Roots of negative numbers could be used to express imaginary numbers.

However, because consideration of "-1" and similar functions would cause many complications that we are not yet prepared to handle, we will postpone discussing them further until a later chapter.

Often in this work we'll refer to compositions of various functions. With our current syntax for composition, this can be quite cluttery. For instance, the composition of functions "f", "g", "h", and "x" might be written:

Bf(Bg(Bhx))

Since this is too cluttery, we introduce a new notation. In our new notation, the above function would be written:

f * g * h * x

In general, "Bfg" will be abbreviated to "f * g". The use of "*" is, of course, suggested by "B"'s use for multiplication.

Also, we will sometimes abbreviate "NBfg" to "f + g". This is sometimes meaningful even when "f" and "g" are not iterators. Whereas "*" is used for composing unary functions, "+" is used to sort of "compose" binary functions. For instance, "add + mul" is a function that multiplies by number and then adds by the same number. That is, "(add + mul) 3" is a function that multiplies by 3 and then adds 3, and is the same as "add 3 * mul 3". Applying "(add + mul) 3" to, say, the number "2" would give "add 3 (mul 3 2)", or in other words, "9".

- It is "K(Kx)", a function that ignores its parameter and yields "Kx", which is a function that in turn ignores its parameter and yields "x". In other words, "K(Kx)" takes two parameters through currying, ignores them, and yields "x". "K(K(Kx))" would be a function that ignored three parameters, yielding "x".

- "0K" is a function that applies "K" no times at all. "0Kx" is then the same as, not "K(Kx)" or "Kx", but just "x". So, "0K" is the I combinator.

- Since "SB" is the successor function, we might expect "(SB)I" to
be the "2" iterator (the successor of "1"). It is. Remember that
in general "SfI" is the same as "Wf"; so, "SBI" is the same as "WB",
the "2" iterator. Also, we can see that "SBI" when applied
to parameters "f" and "x" gives
SBIfx Bf(If)x (S's rewrite rule) Bffx (I's rewrite rule) f(fx) (B's rewrite rule)

as we'd expect the "2" iterator to. - Since "KI" is the "0" iterator, we'd expect "SB(KI)" to be
the "1" iterator, the I combinator. It is, as we can see
when we apply "SB(KI)" to parameters "f" and "x":
SB(KI)fx Bf(KIf)x (S's rewrite rule) BfIx (K's rewrite rule) f(Ix) (B's rewrite rule) fx (I's rewrite rule)

- First, "3Kx" is "K(K(Kx))" the function that yields "x" after ignoring 3 parameters. So, "K(3Kx)" is "K(K(K(Kx)))", ignoring one more parameter than "3Kx"; i.e., it ignores 4 parameters and is the same as "4Kx".
- It is "5Kx", or in other words, "K(K(K(K(Kx))))".

- "(NB 2 3) K x" is
B(2K)(3K)x (N's rewrite rule) 2K(3Kx) (B's rewrite rule)

This is the application of "K" upon "x" 3 times and then 2 more times. In other words, it is "5Kx", or "K(K(K(K(Kx))))". We could have seen this easily if we'd noticed that "NB 2 3" (the sum of 2 and 3) is "5".

- It is "K(K(K(K(K(Kx)))))", or "6Kx". Note that "2K(2K(2Kx))" could also have been written "3(2K)x". This means that "3(2K)" (the application of "2K" three times) is the same as "6K".

- It is a function that applies "f" 6 times; in other words, "3(2f)"
is the same as "6f". This can be verified:
3(2f)x 2f(2f(2fx)) 2f(2f(f(fx))) 2f(f(f(f(fx)))) f(f(f(f(f(fx)))))

So, "3(2f)", given an "x", gives "f(f(f(f(f(fx)))))", just as "6f" does.

- It is a function that applies "f" 24 times, in other words, "24 f".
- It is a function that applies "f" 8 times, in other words, "8f". Note that this function could also have been written "3 2 f" (which, by "3"'s rewrite rule, is the same as "2(2(2f))").

- It is the square of "3", or in other words, "9".

The idea of using combinators in this way to represent natural numbers originated with Alonzo Church in the early 1930s. Actually, Church did not use the iterator for zero (he began with one); apparently, he did not like the zero iterator, because it is a function ("KI") that ignores its parameter (in general Church did not like functions that ignore their parameters).

We've already mentioned several combinators, including I, B, C, W, K, S, N, T, and several iterators, with these rewrite rules:

Ix = x Bfgx = f(gx) Cfxy = fyx Wfx = fxx Kxy = x Sfgx = fx(gx) Nfghx = f(gx)(hx) Txf = fx 0fx = x 1fx = fx 2fx = f(fx) 3fx = f(f(fx))

In this section, we'll describe several other interesting combinators, primarily variations of the above combinators.

We've already used the B combinator extensively. Now we'll focus on various iterations of "B". For instance, "2B" (read: "B squared"), or in other words, "BBB" or "B * B", has the rewrite rule:

(2B)fgxy = f(gxy)

This is clear, because "(2B)fgxy" is the same as

2Bfgxy B(Bf)gxy (2's rewrite rule) Bf(gx)y (B's rewrite rule) f(gxy) (B's rewrite rule)

Whereas "B" is used to compose a function "f" upon a unary function "g", "2B" can be used to compose "f" upon a binary function. For instance, "2B sqr plus" is a binary function that adds two numbers and then squares the result.

Note that "B sqr plus" (as opposed to "2B sqr plus") is not particularly meaningful; whereas "2B sqr plus" waits for two parameters "x" and "y" (so to speak), giving them to "plus" before taking the square, "B sqr plus" only waits for one parameter to give to "plus", and takes the square (the square is then taken of an adding function then, rather than a number). This actually could be meaningful after all if "sqr" is interpreted as the "2" iterator; for sake of clarity, however, we're using "sqr" and "plus" to mean the functions of squaring and addition over numbers in the ordinary sense, not iterators.

Now, "3B", as one might expect, has the rewrite rule:

(3B)fgxyz = f(gxyz)

because "(3B)fgxyz" is the same as

3Bfgxyz B(B(Bf))gxyz (3's rewrite rule) B(Bf)(gx)yz (B's rewrite rule) Bf(gxy)z (B's rewrite rule) f(gxyz) (B's rewrite rule)

"3B" could be useful when composing "f" with a tertiary function "g" that needs to take 3 parameters before the result is passed to "f". And, of course, "4B", "5B", (etc.) could be used for compositions upon functions of 4, 5, and more parameters.

- What is "2B (plus 1) times"?
- What is "2B plus times"?
- What is "0B plus 1"?

Unlike "B", when "C" is iterated, new interesting combinators are not formed. For instance, "2C" is the same as "I", because "C(Cf)" is in all cases the same as "f". This is because taking the converse of a function one time flips the order of its two parameters, but taking the converse again restores them to the original order.

Then, "3C" is the same as "C", and "4C" is the same as "I". In general, if "n" is an iterator, then "nC" is either "C" or "I"; it is "C" if "n" is odd, but "I" if "n" is even.

- What is "5C"?
- What is "1C"?
- What is "0C"?

The W combinator does yield interesting variations when iterated. For instance, "2W" has the rewrite rule:

(2W)fx = fxxx

This is because "(2W)fx" is the same as

2Wfx W(Wf)x (2's rewrite rule) Wfxx (W's rewrite rule) fxxx (W's rewrite rule)

Whereas "W" was used to unify a binary function to a unary one, "2W" can be used to unify a tertiary function to a unary one. Then, as would be expected, "3W" can be used to unify 4-ary functions, and has the rewrite rule:

(3W)fx = fxxxx

This is clear because "(3W)fx" is the same as:

(3W)fx W(W(Wf))x (3's rewrite rule) W(Wf)xx (W's rewrite rule) Wfxxx (W's rewrite rule) fxxxx (W's rewrite rule)

One can think of "3W" as making 3 duplicates of its second parameter, "x" (making a total of 4 instances).

- What is "2W (2B plus times)"?

We've already mentioned some iterations of "K" in the last section. We saw that while "K" can be used to form a function that ignores a parameter, "2K" can be used to form a function that ignores two parameters. In particular, "2K" has the rewrite rule:

(2K)xyz = x

We know this because "(2K)xyz" is the same as

2Kxyz K(Kx)yz (2's rewrite rule) Kxz (K's rewrite rule) x (K's rewrite rule)

Also, "3K" can be used to form a function that ignores three parameters; it has the rewrite rule:

(3K)xyzf = x

This is clear because "(3K)xyzf" is the same as

3Kxyzf K(K(Kx))yzf (3's rewrite rule) K(Kx)zf (K's rewrite rule) Kxf (K's rewrite rule) x (K's rewrite rule)

- What is "2KI"?
- What is "2K(2K)"?

We'll now look at what is formed when "B", and iterations of "B", are applied to "C". We've seen that "C" can be used to swap the first two parameters of a function that is at least binary. it has the rewrite rule:

Cfxy = fyx

Now, "BC" is similar to "C"; it takes a function and swaps its second and third parameters (leaving the first intact). It has the rewrite rule:

(BC)fxyz = fxzy

Note that the "y" and "z" are interchanged on the right. To verify that "BC" behaves this way, see that "(BC)fxyz" is the same as:

BCfxyz C(fx)yz (B's rewrite rule) fxzy (C's rewrite rule)

Now, "B(BC)", or in other words, "2BC", is also similar to "C". It takes a function and swaps its third and fourth parameters (leaving the first and second intact). It has the rewrite rule:

(2BC)fxyzg = fxygz

This can be verified by noting that "(2BC)fxyzg" is the same as:

2BCfxyzg B(BC)fxyzg BC(fx)yzg C(fxy)zg fxygz

As one might expect, "3BC" is like "2BC" except it postpones
the interchange for one more parameter (i.e., it swaps a function's
fourth and fifth parameters). In general, if "n" is an iterator,
then "nBC" takes a function and swaps its "n+1"^{th} and
"n+2"^{th} parameters.

- What is "BC (2B plus times)"?

We'll now look at the results of applying iterations of "B" to "W". We know that "W" can be used on an at-least binary function to duplicate its first parameter (i.e., make it take the place of the first and second parameters); it had the rewrite rule:

Wfx = fxxSimilarly, "BW" can be used on an at-least tertiary function to duplicate its second parameter, leaving the first parameter intact. It has the rewrite rule:

(BW)fxy = fxyy

This can be verified by noting that "(BW)fxy" is the same as:

BWfxy W(fx)y (B's rewrite rule) fxyy (W's rewrite rule)

As one might expect, "2BW", or in other words, "B(BW)", has the rewrite rule:

(2BW)fxyz = fxyzz

This is clear because "(2BW)fxyz" is the same as:

2BWfxyz B(BW)fxyz (2's rewrite rule) BW(fx)yz (B's rewrite rule) W(fxy)z (B's rewrite rule) fxyzz (W's rewrite rule)

It is clear that each time "B" is applied the operation of duplication is postponed another parameter.

- What is "BW (2B plus times)"?

The situation with "K" is similar to that of "C" and "W". We know that "K" can be used on a function (possibly a 0-ary function; i.e., not a function at all) to yield a function like the original, but with a "dummy" parameter inserted first in the parameter list. "K" has the rewrite rule:

Kfx = f

Now, "BK" is like "K", except that the second parameter of the resulting function is ignored, instead of the first. "BK" has the rewrite rule:

(BK)fxy = fx

This is clear because "(BK)fxy" is the same as:

BKfxy K(fx)y (B's rewrite rule) fx (K's rewrite rule)

As one would expect, applying "2BK" to a function results in the function but with a dummy parameter inserted after two real ones. Then, "3BK" takes a function and inserts a dummy parameter after three real ones.

- What is "BK (2B plus times)"?

Iterating "B" upon itself has some interesting results.
Recall that "B" essentially takes a function and yields a
function like the original, but with the first parameter "split"
into two (function and parameter). We see that "BB" is like
"B", except that it splits a function's *second* parameter
instead of its first:

(BB)fxgy = fx(gy)

Similarly, "2BB" would split a function's third parameter (skipping over 2 of them):

(2BB)fxygz = fxy(gz)

This is clear because "(2BB)fxygz" is the same as:

2BBfxygz B(BB)fxygz (2's rewrite rule) BB(fx)ygz (B's rewrite rule) B(fxy)gz (B's rewrite rule) fxy(gz) (B's rewrite rule)

- What is "B (2B plus times)"?
- What is "BB (2B plus times)"?
- What is "2BB (2B plus times)"?

We've now seen the results of iterating and composing the basic combinators B, C, W, and K. In summary, iterating each combinator three times gives:

3B = B * B * B = f\g\x\y\z\ f(gxyz) 3C = C * C * C = f\x\y\ fyx 3W = W * W * W = f\x\ fxxxx 3K = K * K * K = f\x\y\z\ f

Remember that by "*" we mean composition, and that iterating a function yields that function's composition with itself so many times.

Then, to summarize our discussion of compositions of B, C, W, and K, note that composing each two times gives:

2BB = B(BB) = f\x\y\g\z\ fxy(gz) 2BC = B(BC) = f\x\y\z\g\ fxygz 2BW = B(BW) = f\x\y\z\ fxyzz 2BK = B(BK) = f\x\y\z\ fxy

The B, C, W, and K combinators all take a function "f" and yield a function like "f", but with the parameter list rearranged (some parameters possibly duplicated, ignored, or applied to each other). Because these combinators behave this way, they are called "regular combinators". Applying "B" to any regular combinator yields another regular combinator like the original but with the effect on the parameter list postponed one parameter.

To more precisely define "regular combinator", we say that a
combinator is a **regular combinator** if and only if
it mentions its first parameter only one time, and at the very front of
the right side of the rewrite rule. The combinators I, B, C, W, K, and S
are all examples of regular combinators. The combinators T and M,
however, are irregular:

Txf = fx Mf = ff

We'll now examine interesting variations of the C combinator which we'll call "pushing" combinators; a pushing combinator is a regular combinator that when applied to a function yields a function like the original but with the function's first parameter pushed back some number of parameters (i.e., the first parameter is postponed).

The C combinator is the most elementary pushing combinator; it postpones a parameter back one place. The next pushing combinator is:

BC * C

Essentially, applied to a function, it takes the function's converse, pushing the first parameter back one place; then, "BC" is taken of the result, swapping the function's new second parameter with the third, pushing it back one further.

The next pushing combinator is:

2BC * BC * C

This combinator takes a function, swaps its first parameter with its second, its new second parameter with its third, and then its new third parameter with its fourth; in the final result, the original first parameter is swapped into the fourth place and the other three are shifted forward.

The pushing combinators are quite useful; thus, we'll use
a special syntax for naming them ... "BC * C" will be called
simply "C_{2}", and "2BC * BC * C" will be called "C_{3}"
and so forth.

- What is "C
_{2}(2B plus times) 3 4"?

There is another kind of variation of "C", "pulling" combinators. These behave in the opposite way as pushing combinators; whereas a pushing combinator postpones a parameter some number of places, a pulling combinator takes a parameter some number of places back and pulls it to the front. That is, a pulling combinator, when applied to a function of several parameters, yields a function like the original but with one of the later parameters taken first.

The C combinator is the most elementary pulling combinator; it pulls a function's second parameter to the front. The next pulling combinator is:

C * BC

When applied to a function, it swaps the function's second and third parameters, then swaps the new second parameter with the first. The result is that the third parameter is pulled to the first place while the original first and second parameters are pushed back.

The next pulling combinator is:

C * BC * 2BC

This combinator pulls a function's fourth parameter to the front.

It is interesting that pulling combinators are formed in quite a similar way to pushing combinators. Also, all pushing combinators and pulling combinators can be formed from just "B" and "C". For instance, the pulling combinator just mentioned would be written without the "*" abbreviation as:

BC(B(BC)(B(BC)))

Because pulling combinators are quite useful, we also use a special syntax for naming them;

C * BC

will be called "C_{2}^{-1}", while

C * BC * 2BC

will be called "C_{3}^{-1}". This is a natural
way of naming pulling combinators, since they are inverses of
corresponding pushing combinators (the inverse of a function
"f" is commonly (and naturally) written "f^{-1}").

- What is "(C * BC) (2B plus times) 3"?

- It's a binary function, taking two parameters, multiplying them together and then adding one.
- At first, this may appear to be meaningless, but it is
meaningful. Given two parameters "x" and "y", it yields
"plus (times x y)", a function that adds the product of "x" and "y".
So, this is a tertiary function, multiplying the first two
parameters together and then adding the third. It could
be written using lambdas as:
x\y\z\ (x*y) + z

- It is "plus 1". Remember that "0" applied to anything is "I"; so "0B" is "I" and "0B plus 1" is the same as "plus 1". This makes sense; whereas "2B" applied to "f" and "g" waits for two parameters to give to "g" before applying "f", and "B" waits for one parameter to give to "g", "0B" waits for no parameters at all and, applied to "f" and "g", is just "fg".

- It is "C", because it an odd iteration of "C". Another way to think of it is that "5C" is the same as "4C * C", and thus "I * C" (since "4C" is "I") and thus "C" (since composing anything with the identity is itself).
- It is just "C", the function that takes a converse just one time. Remember that "1" is the same as the I combinator.
- It is "I", the function that takes no converse at all. Another way to think of it is that since "0" is even, this is a special case of our rule that an even iteration of "C" is "I".

- "2B plus times" was the function mentioned earlier,
x\y\z\ (x*y) + z

So, applying "2W" to this unifies all three parameters, giving

x\ (x*x) + x

This function could also have been written,

N plus sqr I

- It is a function that ignores two parameters, yielding "I".
In other words, it takes three parameters, and yields the last.
It has the rewrite rule:
(2KI)xyz = z

- It is a function that ignores two parameters, yielding "2K",
which takes a parameter "z", ignores two more parameters,
yielding "z". In other words, it takes 5 parameters, and yields
the third. It has the rewrite rule:
(2K(2K))xyzfg = z

In general, if "m" and "n" are iterators, then "mK(nK)" is a function that ignores "m" parameters, takes the important parameter, ignores "n" more parameters, and yields the important one. It takes a total of "m+n+1" parameters, "m+n" being the number of parameters that are ignored.

Note that our last example, "2KI", could be written "2K(0K)"; as one would think from the rule stated above, it ignores two parameters, takes the important one, doesn't ignore any more, and yields the important one.

- We first recall that "2B plus times" is:
x\y\z\ (x*y) + z

Then, applying "BC" to this would interchange the second and third parameters, giving:

x\y\z\ (x*z) + y

- We know that "2B plus times" is:
x\y\z\ (x*y) + z

So, applying "BW" to this would duplicate the second parameter "y", making it take the place of the third parameter "z" also, giving:

x\y\ (x*y) + y

- It is like "2B plus times", but with a dummy parameter inserted
after the first real one. It could be written:
x\f\y\z\ (x*y) + z

- It is like "2B plus times", but the first parameter is split:
f\x\y\z\ (fx * y) + z

- It is like "2B plus times", but the second parameter is split:
x\f\y\z\ (x * fy) + z

- It is like "2B plus times", but the third parameter is split:
x\y\f\z\ (x * y) + fz

- "C
_{2}(2B plus times)" is like "2B plus times" but with the first parameter pushed back two places:y\z\x\ (x*y) + z

Then, applying "3" and "4" to this gives:

x\ (x*3) + 4

- It is like "2B plus times", but with the third parameter snuck
in to be "3":
x\y\ (x*y) + 3

Often in mathematics, one talks about "ordered pairs"
of numbers (or other things). The **tuple**, an ordered collection
of things, is the topic of this section. An ordered pair is
a kind of tuple, a tuple in which there are two elements.

A tuple of "x", "y", and "z", we'll write as "[x, y, z]". In general, a tuple can be written as a comma-seperated list of its members, enclosed by square brackets.

Tuples are quite useful. They are often used to represent points in coordinate geometry, and they (or varieties of them) are essential data structures in most programming languages.

It is possible to think of a tuple as a kind of function, a function that passes some parameters. For instance, the tuple "[2,3]" might be thought of as a function that takes a function and passes the parameters "2" and "3" to it; i.e., the function:

f\ f 2 3

If "[2,3]" is thought of in this way, then "[2,3] plus" is the same as "plus 2 3", or "5". The tuple "[2,3]" could be written using combinators as:

f\ f 2 3 C (f\ f 2) 3 C (T 2) 3

As another example, the tuple "[2,3,4]" could be thought of as a function that passes "2", "3", and "4" to a function, or in other words,

f\ f 2 3 4 C (f\ f 2 3) 4 C (C (f\f 2) 3) 4 C (C (T 2) 3) 4

A tuple with only one element in it, such as "[2]", is called
a **unit tuple**. Note that "[2]" is distinct from "2".
"[2]" is a function that passes "2", and is the same as "T 2".

A moment ago it was mentioned that "[2,3]" could be written using combinators as

C (T 2) 3

In general, "[x,y]" can be written with combinators as

C(Tx)y

Because of this, we can see that there is a binary function that yields the pair of two parameters:

x\y\ C(Tx)y C * T

This function is the same as "C_{2}I", or in other words,

(BC * C)I

In general, "C_{n}I" is a function that takes "n" parameters
forming a "n"-tuple with those parameters in it.

(this section needs to be finished)