INTERCAL functions in DD/SH

General Health Warning

Reading this material may cause interleaving of your brain hemispheres

You have been warned

Table of contents

Why?

Well, and why not? People often ask how to translate programs between INTERCAL and DD/SH, and the major stumbling block is the absence of INTERCAL-style arithmetic in DD/SH. This problem has now been resolved.

This article provides an example implementation of unary logical operators (or, and, and exclusive or) as well as the two main binary operators (interleave and select). A simple expression parser is also included, allowing the program to be used as a simple INTERCAL calculator (just like intercalc, only different).

The Example Calculator

The easiest way to see these functions in effect is to copy the program to a local file, make it executable, and run it. It will print a question mark, to which you can reply with a simple expression, such as #V1 or #1¢#2. The program will make the required calculation, and print the result (in wimp mode): the two examples would therefore produce 32769 and 3, respectively. You can now enter another expression, or kill the program.

The expressions accepted are a subset of standard INTERCAL expressions: registers and interleaving are not available, but interleave, select, unary AND, unary OR and unary Exclusive OR are all available; the interleave can be specified with the CLC-INTERCAL's "change" (¢) or the C-INTERCAL's "big money" ($), as well as another currency symbol which does not have a name in the INTERCAL character set (£). Similarly, the exclusive OR can be specified using the CLC-INTERCAL's "bookworm" (¥) or C-INTERCAL's "what" (?). In other word, the following expressions are all equivalent, and all produce the same result (2147483650): #?1$#0, #?1£#0, #?1¢)#0, #¥1$#0, #¥1£#0, #¥1¢)#0.

The Functions

The program defines three unary functions: f_and, f_or and f_xor, which take one argument and perform the relevant operation, assigning the result to shell variable result. The arguments are reverse bit strings with the correct number of bits: for example #1 is represented as 1000000000000000 or 10000000000000000000000000000000 depending on whether it is to be understood as a 16 or 32 bit number.

The program also defines two binary functions: f_select and f_interleave, which take two arguments and assign the result of the calculation to shell variable result. The numbers are represented in the same way as the unary functions, so for example, after calling f_interleave 1000000000000000 1000000000000000 one finds 11000000000000000000000000000000 in $result.

Other useful functions are provided to convert a number to a bit string and vice versa. Function parse_number takes a number, without the initial #, and assigns a bit string to the variable result. If the number starts with a unary operation it also executes the operation, so calling parse_number V1 assigns 1000000000000001 (32769) to result. Function num2wimp converts a bit string to a number (in wimp mode), again assigning the result to shell variable result. Hence after calling num2wimp 1000000000000001 one could do something like echo $result to produce 32769 in the output.

The implementation of functions to convert bit strings to Roman numerals is left as an exercise to the reader

How does it work?

Glad you asked! It's all very simple, really. The representation of numbers as reverse bit strings is to allow arithmetic to proceed left-to-right, which is easier because one can just use DD to extract the first character, and another call to DD allows to skip the first character and have a string ready for another iteration.

The following program sections are described:

Initialisation

The program initialisation is copied from the standard DD/SH program template: it clears the PATH to prevent cheating and provides a variable to access the DD program. It also takes care of cleaning up of temporary files at the end of the program.

#!/bin/sh

# /bin/rm is not really required, but it is nice to clean up temporary files

PATH=
dd=/bin/dd
rm=/bin/rm

# temporary files we might need
tmp=/tmp/silly.$$
trap "$rm -f $tmp $tmp.*; exit" 0 HUP INT QUIT

# from now on, no more rm - the above trap is enough
unset rm

# we do interesting things with IFS, but better save it...
saveIFS="$IFS"
    

Standard functions

Some shells don't have a builtin "echo" command, and some inferior operating systems may lack a /dev/zero. Yes, it sounds unlikely in the 21st century, but such is life.

We replace these possibly missing functionality by defining shell functions; we also define "true" and "false" functions to use when writing loops and conditionals.

Function "zero" behaves essentially like a dd if=/dev/zero except that it produces an endless file obtained by looping the program's source code as many times as required; the "broken pipe" signal tells us when to stop.

Echo () {
case "$1" in
  -n) shift
      $dd of=$tmp 2>/dev/null <<EOF 
$@
EOF
      IFS="+"
      set `$dd if=$tmp bs=1 of=/dev/null skip=1 2>&1`
      IFS="$saveIFS"
      $dd if=$tmp bs=1 count=$1 2>/dev/null
      ;;
  *)  $dd 2>/dev/null <<EOF 
$@
EOF
      ;;
esac
}

true () {
    return 0;
}

false () {
    return 1;
}

zero () {
  ( trap 'go=false' PIPE
    go=true
    while $go
    do
      $dd "if=$0"
      case "$?" in
	0) ;;
	*) go=false ;;
      esac
    done
  ) 2>/dev/null
}
    

Stashing and retrieving shell variables

While parsing the expression, this program needs to call functions recursively. This means that local variables need to be stashed and retrieved otherwise things can get rather confused. Some shells allow to specify variables as local to functions, other shells don't. To make the program portable, we don't rely on this shell functionality: instead, we define two functions to stash and retrieve shell variables. These functions will also be useful in general programs.

Stashing (invoked by stash variablename) works by copying the variable to be stashed to a new variable, which has name s_variablename. If the variable is stashed a second time, it is copied to ss_variablename, and so on as many times as necessary. Variable stash_variablename contains the name of the variable used to stash variablename. For example, after calling stash x twice, the two values will be found in s_x and ss_x, respectively, and variable stash_x will contain the name of the last stash variable, ss_x

stash () {
    eval sname=\$stash_$1
    case "$sname" in
	?*) sname="s$sname" ;;
	*)  sname="s_$1" ;;
    esac
    eval stash_$1="$sname"
    eval $sname=\$$1
}
    

Retrieving works like stashing, only in reverse: it is invoked with retrieve variablename, and it uses the variable stash_variablename to find the name of the shell variable containing the actual value to be retrieved; after that, the variable stash_variablename is modified so that it points at the next stash variable: this is achieved by simply removing one letter "s" from the name.

retrieve () {
    eval fname=\$stash_$1
    eval value=\$$fname
    eval $1=\$$fname
    eval sv=\$stash_$1
    value="`Echo -n $sv | $dd bs=1 skip=1 2>/dev/null`"
    eval stash_$1=$value
}
    

Parsing the expressions

The expression is parsed by assigning the text to be parsed to variable text and calling function parse. This will parse the expression, perform the calculation, and assign the result to variable result. It will also remove from variable text the text which was parsed successfully, so if a parse error occurs the text left in this variable indicates the location of the error. The function returns a true status if some text could be parsed (check if $text is empty to see if there was some text left unparsed); it returns a false status if it encounters an error.

Function parse expects to find a single term (such as a number, or a subexpression enclosed in sparks or rabbit ears), optionally followed by a binary operator and another term (and so on: any number of binary operators can be used this way). After an initial call to parse_term, which stores the term's value in result and removes the text it parses, the function enters a loop in which it checks for the presence of a binary operator. If one is found, a second term is parsed, the operation executed, and the loop repeated. If no operator is found, the function simply returns: the caller then determines if the text left unparsed is a sign or parse error or what.

parse () {
    if ! parse_term
    then
	return 1
    fi
    while true
    do
	case "$text" in
	    '¢'*) f=f_interleave ;;
	    '$'*) f=f_interleave ;;
	    '£'*) f=f_interleave ;;
	    '~'*) f=f_select ;;
	    *)    return 0 ;;
	esac
	text="`Echo -n "$text" | $dd bs=1 skip=1 2>/dev/null`"
	left="$result"
	stash left
	stash f
	if ! parse_term
	then
	    retrieve f
	    retrieve left
	    return 1
	fi
	retrieve f
	retrieve left
	right="$result"
	if ! $f result "$left" "$right"
	then
	    return 1
	fi
    done
}
    

Function parse_term recognises a simple term, which can be a number (identified by its mesh) or a subexpression enclosed in rabbit ears or sparks. Anything else implies an invalid expression. After removing an initial mesh, this function calls parse_number to do the rest; after removing an initial spark or rabbit ears, this function calls parse recursively to parse and remove the subexpression, then it checks that a second spark or rabbit ears follows.

parse_term () {
    code="`Echo -n "$text" | $dd bs=1 count=1 2>/dev/null`"
    text="`Echo -n "$text" | $dd bs=1 skip=1 2>/dev/null`"
    case "$code" in
	'#') if parse_number
	     then
		 return 0
	     else
		 return 1
	     fi
	     ;;
	'"') if ! parse
	     then
		 return 1
	     fi
	     case "$text" in
		  '"'*) ;;
		  *)    return 1 ;;
	     esac
	     text="`Echo -n "$text" | $dd bs=1 skip=1 2>/dev/null`"
	     return 0
	     ;;
	"'") if ! parse
	     then
		 return 1
	     fi
	     case "$text" in
		  "'"*) ;;
		  *)    return 1 ;;
	     esac
	     text="`Echo -n "$text" | $dd bs=1 skip=1 2>/dev/null`"
	     return 0
	     ;;
	*)   return 1
	     ;;
    esac
}
    

Function parse_number attempts to convert a number into a reverse bit string. If the number starts with an unary operator, it calls itself recursively and then applies the operator to the result. If the number does not start with an unary operator, it calls parse_n1 repeatedly to process each digit, until that funtion reports an error (the text does not start with a digit). At this point, if any digit has been processed the function returns a true status, otherwise it returns a false status (missing number).

parse_number () {
    $debug "parse_number($text)"
    case "$text" in
	'V'*) f=f_or;  c=true ;;
	'¥'*) f=f_xor; c=true ;;
	'?'*) f=f_xor; c=true ;;
	'&'*) f=f_and; c=true ;;
	*)    f=true;  c=false ;;
    esac
    stash f
    problem=true
    if $c
    then
	text="`Echo -n "$text" | $dd bs=1 skip=1 2>/dev/null`"
	if parse_number
	then
	    problem=false
	fi
    else
	result=0000000000000000
	$debug "f=$f c=$c -- text=($text)"
	while parse_n1
	do
	    problem=false
	    $debug "result=$result text=($text)"
	done
    fi
    retrieve f
    if $problem
    then
	return 1
    fi
    arg="$result"
    $f result "$arg"
    $debug "result($result) -- text=($text)"
    return 0
}
    

Function parse_n1 is practically self-explanatory. It multiplies the number on result by 10 and adds the new digit. To do this, it first convert the digit to binary (the first case statement), then it loops over the 16 bits of result performing an addition of four values: the result shifted by 1 and 3 bits (in other words, result multiplied by 2 and 8: when added, these contribute result × 10), the digit, and a carry bit (in fact, there are two carry bits, because the four numbers can add to 4, but I won't discuss these minor details). At the end of the loop, the carry is checked to verify that there has not been an overflow.

parse_n1 () {
    digit="`Echo -n "$text" | $dd bs=1 count=1 2>/dev/null`"
    case "$digit" in
	0) num=0000000000000000 ;;
	1) num=1000000000000000 ;;
	2) num=0100000000000000 ;;
	3) num=1100000000000000 ;;
	4) num=0010000000000000 ;;
	5) num=1010000000000000 ;;
	6) num=0110000000000000 ;;
	7) num=1110000000000000 ;;
	8) num=0001000000000000 ;;
	9) num=1001000000000000 ;;
	*) return 1 ;;
    esac
    text="`Echo -n "$text" | $dd bs=1 skip=1 2>/dev/null`"
    # check if result is already too big
    case "$result" in
	*1) return 1 ;;
	*01) return 1 ;;
	*001) return 1 ;;
    esac
    # shift left result by 1 and 3
    res1="0`Echo -n "$result" | $dd bs=1 count=15 2>/dev/null`"
    res3="000`Echo -n "$result" | $dd bs=1 count=13 2>/dev/null`"
    # now add the numbers
    c1=0
    c2=0
    temp=''
    for digit in 0 1 2 3 4 5 6 7 8 9 A B C D E F
    do
	dig1="`Echo -n "$res1" | $dd bs=1 count=1 2>/dev/null`"
	res1="`Echo -n "$res1" | $dd bs=1 skip=1 2>/dev/null`"
	dig3="`Echo -n "$res3" | $dd bs=1 count=1 2>/dev/null`"
	res3="`Echo -n "$res3" | $dd bs=1 skip=1 2>/dev/null`"
	dign="`Echo -n "$num" | $dd bs=1 count=1 2>/dev/null`"
	num="`Echo -n "$num" | $dd bs=1 skip=1 2>/dev/null`"
	case "$dig1$dign$dig3$c1" in
	    0000) add=0; carry=0 ;;
	    0001) add=1; carry=0 ;;
	    0010) add=1; carry=0 ;;
	    0011) add=0; carry=1 ;;
	    0100) add=1; carry=0 ;;
	    0101) add=0; carry=1 ;;
	    0110) add=0; carry=1 ;;
	    0111) add=1; carry=1 ;;
	    1000) add=1; carry=0 ;;
	    1001) add=0; carry=1 ;;
	    1010) add=0; carry=1 ;;
	    1011) add=1; carry=1 ;;
	    1100) add=0; carry=1 ;;
	    1101) add=1; carry=1 ;;
	    1110) add=1; carry=1 ;;
	    1111) add=1; carry=2 ;;
	esac
	temp="$temp$add"
	case "$c2$carry" in
	    00) c1=0; c2=0 ;;
	    01) c1=1; c2=0 ;;
	    02) c1=0; c2=1 ;;
	    10) c1=1; c2=0 ;;
	    11) c1=0; c2=1 ;;
	    12) c1=1; c2=1 ;;
	esac
    done
    case "$c1" in
	1) return 1 ;;
    esac
    case "$c2" in
	1) return 1 ;;
    esac
    result="$temp"
    return 0
}
    

Converting back to numbers

Function num2wimp, called with a single argument, which should contain a reverse bit string, converts its argument to a decimal number (in wimp mode) and assigns the result to variable result. Since it is easier to convert a bit string than a reverse bit string, the function starts by reversing its argument. Then it removes a bit at a time and calculates the result. The calculation performed for each bit is essentially the addition of twice the accumulated result with the new bit, and is essentially a simplification of the loop in parse_n1, with the difference that the big case statement has been replaced by a few calls to DD just for variety. The result is produced in reverse decimal (123 is represented as 321), because it is easy to calculate on, and reversed at the end.

num2wimp () {
    reverse "$1"
    num="$result"
    result=0
    while true
    do
	case "$num" in
	    ?*) ;;
	    *)  break ;;
	esac
	carry="`Echo -n "$num" | $dd bs=1 count=1 2>/dev/null`"
	num="`Echo -n "$num" | $dd bs=1 skip=1 2>/dev/null`"
	temp="$result"
	result=''
	while true
	do
	    case "$temp" in
		?*) ;;
		*)  break ;;
	    esac
	    dig="`Echo -n "$temp" | $dd bs=1 count=1 2>/dev/null`"
	    temp="`Echo -n "$temp" | $dd bs=1 skip=1 2>/dev/null`"
	    > $tmp
	    zero | $dd bs=2 count=$dig 2>/dev/null >> $tmp
	    zero | $dd bs=1 count=$carry 2>/dev/null >> $tmp
	    IFS="+"
	    set `$dd if=$tmp bs=1 of=/dev/null 2>&1`
	    IFS="$saveIFS"
	    case "$1" in
		??*) carry="`Echo -n "$1" | $dd bs=1 count=1 2>/dev/null`"
		     ndig="`Echo -n "$1" | $dd bs=1 skip=1 2>/dev/null`"
		     ;;
		?)   carry=0
		     ndig="$1"
		     ;;
		*)   carry=0
		     ndig=0
		     ;;
	    esac
	    result="$result$ndig"
	done
	case "$carry" in
	    0) ;;
	    *) result="$result$carry" ;;
	esac
    done
    reverse "$result"
}
    

Function reverse helps num2wimp twice: to transform the reverse bit string argument to a bit string, and to transform the reverse decimal result to a decimal number. It works by removing one digit at a time from its argument, and tacking it to the front of the result.

reverse () {
    text="$1"
    result=''
    while true
    do
	case "$text" in
	    ?*) ;;
	    *)  return ;;
	esac
	first="`Echo -n "$text" | $dd bs=1 count=1 2>/dev/null`"
	text="`Echo -n "$text" | $dd bs=1 skip=1 2>/dev/null`"
	result="$first$result"
    done
}
    

Interleave

Interleave is the easiest INTERCAL operation to implement, as it is simple text manipulation: after checking that its operands can be represented in 16 bits, function f_interleave just takes the bits in the right order and joins them all together. Note that we assume that the numbers are represented as reverse bit strings, so the bits of the second number are added before the bits of the first number. The function is called with three arguments: the name of a variable where the result will be stored, and the two numbers.

f_interleave () {
    res="$1"
    num1="$2"
    num2="$3"
    case "$num1" in
	????????????????)
	    ;;
	????????????????0000000000000000)
	    num1="`Echo "$num1" | $dd bs=1 count=16 2>/dev/null`"
	    ;;
	*)
	    Echo "Too many spots" 2>&1
	    return 1
	    ;;
    esac
    case "$num2" in
	????????????????)
	    ;;
	????????????????0000000000000000)
	    num2="`Echo "$num2" | $dd bs=1 count=16 2>/dev/null`"
	    ;;
	*)
	    Echo "Too many spots" 2>&1
	    return 1
	    ;;
    esac
    val=''
    for bits in 0 1 2 3 4 5 6 7 8 9 A B C D E F
    do
	dig1="`Echo -n "$num1" | $dd bs=1 count=1 2>/dev/null`"
	num1="`Echo -n "$num1" | $dd bs=1 skip=1 2>/dev/null`"
	dig2="`Echo -n "$num2" | $dd bs=1 count=1 2>/dev/null`"
	num2="`Echo -n "$num2" | $dd bs=1 skip=1 2>/dev/null`"
	val="$val$dig2$dig1"
    done
    eval $res=$val
    return 0
}
    

Select

Function f_select performs the corresponding INTERCAL operation by extracting two bit strings from its first argument, depending on whether the second argument has a 0 or a 1 in the corresponding bit position. The function works by first making sure both numbers have the same number of bits, and the executing a loop (one iteration per bit): this extracts a bit from the second number: if this bit is 1, it adds the corresponding bit of the first number to shell variable val, otherwise it adds a zero to shell variable end. Therefore, after the loop terminates, the result is simply the concatenation of these two variables. The function is called with three arguments: the name of a variable where the result will be stored, and the two numbers.

f_select () {
    res="$1"
    num1="$2"
    num2="$3"
    case "$num2" in
	????????????????)
	    case "$num1" in
		????????????????)
		    ;;
		????????????????????????????????)
		    num1="`Echo "$num1" | $dd bs=1 count=16 skip=16 2>/dev/null`"
		    ;;
	    esac
	    ;;
	????????????????????????????????)
	    case "$num1" in
		????????????????)
		    num1="${num1}0000000000000000"
		    ;;
		????????????????????????????????)
		    ;;
	    esac
	    ;;
    esac
    val=''
    end=''
    while true
    do
	case "$num1" in
	    ?*) ;;
	    *) break ;;
	esac
	dig1="`Echo -n "$num1" | $dd bs=1 count=1 2>/dev/null`"
	num1="`Echo -n "$num1" | $dd bs=1 skip=1 2>/dev/null`"
	dig2="`Echo -n "$num2" | $dd bs=1 count=1 2>/dev/null`"
	num2="`Echo -n "$num2" | $dd bs=1 skip=1 2>/dev/null`"
	case "$dig2" in
	    0) end="${end}0" ;;
	    1) val="${val}$dig1" ;;
	esac
    done
    eval $res=$val$end
    return 0
}
    

Unary operations

All unary operations are implemented the same way: the difference is the value produced by each combination of bits in the argument. Functions f_or, f_xoe and f_and take two arguments: the name of a variable where the result will be stored, and the argument in the form of a reverse bit string. They simply call function unary with their two arguments followed by four bit values, representing the four possible combination of bits.

f_or () {
    unary "$1" "$2" 0 1 1 1
}

f_xor () {
    unary "$1" "$2" 0 1 1 0
}

f_and () {
    unary "$1" "$2" 0 0 0 1
}
    

Function unary takes six arguments: the name of a variable where the result will be stored, the number to operate on, and the four bit values used to calculate the result. It first adds the least significant bit to the end of the argument, then enters a loop: in each iteration, the first two bits are used to generate a bit of the result, then one of these two bits is removed from the number. The loop terminates when just one bit is left (this would be the bit added to the number just before starting the loop).

unary () {
    res="$1"
    num="$2"
    d1="$3"
    d2="$4"
    d3="$5"
    d4="$6"
    case "$num" in
	0*) num="${num}0" ;;
	1*) num="${num}1" ;;
	*)  Echo "Something bad happened, bit value is wrong" >&2 ; return 1 ;;
    esac
    val=''
    while true
    do
	case "$num" in
	    ?) break ;;
	esac
	dig="`Echo -n "$num" | $dd bs=1 count=2 2>/dev/null`"
	num="`Echo -n "$num" | $dd bs=1 skip=1 2>/dev/null`"
	case "$dig" in
	    00) val="${val}$d1" ;;
	    01) val="${val}$d2" ;;
	    10) val="${val}$d3" ;;
	    11) val="${val}$d4" ;;
	esac
    done
    eval $res="$val"
}
    

Main program

With the above functions, it is a simple matter to write an INTERCAL calculator: just obtain a line from standard input, call parse to do the parsing and the calculation, and then call num2wimp to convert the result to a decimal number in wimp mode.

Echo -n "? "
while read line
do
    case "$line" in
	?*) ;;
	*)  Echo -n "?? "
	    continue
	    ;;
    esac
    text="$line"
    parse
    case "$result" in
	*?) case "$text" in
		?*) Echo "Parse error just before $text"
		    ;;
		*)  num2wimp "$result"
		    Echo "$line -> $result"
		    ;;
	    esac
	    ;;
	*)  Echo "Something went wrong" ;;
    esac
    Echo -n "? "
done