Reading this material may cause interleaving of your brain hemispheres
You have been warned
Well, and why not? People often ask how to translate programs between INTERCAL and DD/SH, and the major stumbling block is the absence of INTERCAL-style arithmetic in DD/SH. This problem has now been resolved.
This article provides an example implementation of unary logical operators (or, and, and exclusive or) as well as the two main binary operators (interleave and select). A simple expression parser is also included, allowing the program to be used as a simple INTERCAL calculator (just like intercalc, only different).
The easiest way to see these functions in effect is to copy
the program to a local file, make it executable,
and run it. It will print a question mark, to which you can reply with
a simple expression, such as #V1 or #1¢#2.
The program will make the required calculation, and print the result
(in wimp mode): the two examples would therefore produce 32769 and
3, respectively. You can now enter another expression, or kill the program.
The expressions accepted are a subset of standard INTERCAL expressions:
registers and interleaving are not available, but interleave, select,
unary AND, unary OR and unary Exclusive OR are all available; the
interleave can be specified with the CLC-INTERCAL's "change" (¢)
or the C-INTERCAL's "big money" ($), as well as another currency
symbol which does not have a name in the INTERCAL character set (£).
Similarly, the exclusive OR can be specified using the CLC-INTERCAL's
"bookworm" (¥) or C-INTERCAL's "what" (?). In other word, the
following expressions are all equivalent, and all produce the same result
(2147483650):
#?1$#0, #?1£#0, #?1¢)#0,
#¥1$#0, #¥1£#0,
#¥1¢)#0.
The program defines three unary functions: f_and,
f_or and f_xor, which take one argument
and perform the relevant operation, assigning the result to shell
variable result. The arguments are reverse bit strings
with the correct number of bits: for example #1 is
represented as 1000000000000000 or 10000000000000000000000000000000
depending on whether it is to be understood as a 16 or 32 bit number.
The program also defines two binary functions: f_select
and f_interleave, which take two arguments and assign
the result of the calculation to shell variable result.
The numbers are represented in the same way as the unary functions,
so for example, after calling f_interleave 1000000000000000
1000000000000000 one finds 11000000000000000000000000000000
in $result.
Other useful functions are provided to convert a number to a bit
string and vice versa. Function parse_number takes
a number, without the initial #, and assigns a
bit string to the variable result. If the number
starts with a unary operation it also executes the operation,
so calling parse_number V1 assigns 1000000000000001
(32769) to result. Function num2wimp
converts a bit string to a number (in wimp mode), again assigning
the result to shell variable result. Hence
after calling num2wimp 1000000000000001 one could
do something like echo $result to produce 32769 in
the output.
The implementation of functions to convert bit strings to Roman numerals is left as an exercise to the reader
Glad you asked! It's all very simple, really. The representation of numbers as reverse bit strings is to allow arithmetic to proceed left-to-right, which is easier because one can just use DD to extract the first character, and another call to DD allows to skip the first character and have a string ready for another iteration.
The following program sections are described:
The program initialisation is copied from the standard DD/SH program template: it clears the PATH to prevent cheating and provides a variable to access the DD program. It also takes care of cleaning up of temporary files at the end of the program.
#!/bin/sh
# /bin/rm is not really required, but it is nice to clean up temporary files
PATH=
dd=/bin/dd
rm=/bin/rm
# temporary files we might need
tmp=/tmp/silly.$$
trap "$rm -f $tmp $tmp.*; exit" 0 HUP INT QUIT
# from now on, no more rm - the above trap is enough
unset rm
# we do interesting things with IFS, but better save it...
saveIFS="$IFS"
Some shells don't have a builtin "echo" command, and some inferior
operating systems may lack a /dev/zero. Yes, it sounds
unlikely in the 21st century, but such is life.
We replace these possibly missing functionality by defining shell functions; we also define "true" and "false" functions to use when writing loops and conditionals.
Function "zero" behaves essentially like a dd if=/dev/zero
except that it produces an endless file obtained by looping the
program's source code as many times as required; the "broken pipe"
signal tells us when to stop.
Echo () {
case "$1" in
-n) shift
$dd of=$tmp 2>/dev/null <<EOF
$@
EOF
IFS="+"
set `$dd if=$tmp bs=1 of=/dev/null skip=1 2>&1`
IFS="$saveIFS"
$dd if=$tmp bs=1 count=$1 2>/dev/null
;;
*) $dd 2>/dev/null <<EOF
$@
EOF
;;
esac
}
true () {
return 0;
}
false () {
return 1;
}
zero () {
( trap 'go=false' PIPE
go=true
while $go
do
$dd "if=$0"
case "$?" in
0) ;;
*) go=false ;;
esac
done
) 2>/dev/null
}
While parsing the expression, this program needs to call functions recursively. This means that local variables need to be stashed and retrieved otherwise things can get rather confused. Some shells allow to specify variables as local to functions, other shells don't. To make the program portable, we don't rely on this shell functionality: instead, we define two functions to stash and retrieve shell variables. These functions will also be useful in general programs.
Stashing (invoked by stash variablename)
works by copying the variable to be stashed to a new variable,
which has name s_variablename. If the
variable is stashed a second time, it is copied to
ss_variablename, and so on as many times
as necessary. Variable stash_variablename
contains the name of the variable used to stash variablename.
For example, after calling stash x twice, the two
values will be found in s_x and ss_x,
respectively, and variable stash_x will contain the
name of the last stash variable, ss_x
stash () {
eval sname=\$stash_$1
case "$sname" in
?*) sname="s$sname" ;;
*) sname="s_$1" ;;
esac
eval stash_$1="$sname"
eval $sname=\$$1
}
Retrieving works like stashing, only in reverse: it is invoked
with retrieve variablename, and it uses the
variable stash_variablename to find the name
of the shell variable containing the actual value to be retrieved;
after that, the variable stash_variablename
is modified so that it points at the next stash variable: this is
achieved by simply removing one letter "s" from the name.
retrieve () {
eval fname=\$stash_$1
eval value=\$$fname
eval $1=\$$fname
eval sv=\$stash_$1
value="`Echo -n $sv | $dd bs=1 skip=1 2>/dev/null`"
eval stash_$1=$value
}
The expression is parsed by assigning the text to be parsed to variable
text and calling function parse. This will
parse the expression, perform the calculation, and assign the result to
variable result. It will also remove from variable text
the text which was parsed successfully, so if a parse error occurs the
text left in this variable indicates the location of the error.
The function returns a true status if some text could be parsed (check if
$text is empty to see if there was some text left unparsed);
it returns a false status if it encounters an error.
Function parse expects to find a single term (such as a
number, or a subexpression enclosed in sparks or rabbit ears), optionally
followed by a binary operator and another term (and so on: any number of
binary operators can be used this way). After an initial call to
parse_term, which stores the term's value in result
and removes the text it parses, the function enters a loop in which it
checks for the presence of a binary operator. If one is found, a second
term is parsed, the operation executed, and the loop repeated. If no
operator is found, the function simply returns: the caller then determines
if the text left unparsed is a sign or parse error or what.
parse () {
if ! parse_term
then
return 1
fi
while true
do
case "$text" in
'¢'*) f=f_interleave ;;
'$'*) f=f_interleave ;;
'£'*) f=f_interleave ;;
'~'*) f=f_select ;;
*) return 0 ;;
esac
text="`Echo -n "$text" | $dd bs=1 skip=1 2>/dev/null`"
left="$result"
stash left
stash f
if ! parse_term
then
retrieve f
retrieve left
return 1
fi
retrieve f
retrieve left
right="$result"
if ! $f result "$left" "$right"
then
return 1
fi
done
}
Function parse_term recognises a simple term, which can
be a number (identified by its mesh) or a subexpression enclosed in
rabbit ears or sparks. Anything else implies an invalid expression.
After removing an initial mesh, this function calls parse_number
to do the rest; after removing an initial spark or rabbit ears, this
function calls parse recursively to parse and remove the
subexpression, then it checks that a second spark or rabbit ears follows.
parse_term () {
code="`Echo -n "$text" | $dd bs=1 count=1 2>/dev/null`"
text="`Echo -n "$text" | $dd bs=1 skip=1 2>/dev/null`"
case "$code" in
'#') if parse_number
then
return 0
else
return 1
fi
;;
'"') if ! parse
then
return 1
fi
case "$text" in
'"'*) ;;
*) return 1 ;;
esac
text="`Echo -n "$text" | $dd bs=1 skip=1 2>/dev/null`"
return 0
;;
"'") if ! parse
then
return 1
fi
case "$text" in
"'"*) ;;
*) return 1 ;;
esac
text="`Echo -n "$text" | $dd bs=1 skip=1 2>/dev/null`"
return 0
;;
*) return 1
;;
esac
}
Function parse_number attempts to convert a number into a
reverse bit string. If the number starts with an unary operator, it
calls itself recursively and then applies the operator to the result.
If the number does not start with an unary operator, it calls
parse_n1 repeatedly to process each digit, until that
funtion reports an error (the text does not start with a digit).
At this point, if any digit has been processed the function returns
a true status, otherwise it returns a false status (missing number).
parse_number () {
$debug "parse_number($text)"
case "$text" in
'V'*) f=f_or; c=true ;;
'¥'*) f=f_xor; c=true ;;
'?'*) f=f_xor; c=true ;;
'&'*) f=f_and; c=true ;;
*) f=true; c=false ;;
esac
stash f
problem=true
if $c
then
text="`Echo -n "$text" | $dd bs=1 skip=1 2>/dev/null`"
if parse_number
then
problem=false
fi
else
result=0000000000000000
$debug "f=$f c=$c -- text=($text)"
while parse_n1
do
problem=false
$debug "result=$result text=($text)"
done
fi
retrieve f
if $problem
then
return 1
fi
arg="$result"
$f result "$arg"
$debug "result($result) -- text=($text)"
return 0
}
Function parse_n1 is practically self-explanatory. It multiplies
the number on result by 10 and adds the new digit. To do this,
it first convert the digit to binary (the first case statement),
then it loops over the 16 bits of result performing an addition
of four values: the result shifted by 1 and 3 bits (in other words, result
multiplied by 2 and 8: when added, these contribute result × 10),
the digit, and a carry bit (in fact, there are two carry bits, because the
four numbers can add to 4, but I won't discuss these minor details).
At the end of the loop, the carry is checked to verify that there has not
been an overflow.
parse_n1 () {
digit="`Echo -n "$text" | $dd bs=1 count=1 2>/dev/null`"
case "$digit" in
0) num=0000000000000000 ;;
1) num=1000000000000000 ;;
2) num=0100000000000000 ;;
3) num=1100000000000000 ;;
4) num=0010000000000000 ;;
5) num=1010000000000000 ;;
6) num=0110000000000000 ;;
7) num=1110000000000000 ;;
8) num=0001000000000000 ;;
9) num=1001000000000000 ;;
*) return 1 ;;
esac
text="`Echo -n "$text" | $dd bs=1 skip=1 2>/dev/null`"
# check if result is already too big
case "$result" in
*1) return 1 ;;
*01) return 1 ;;
*001) return 1 ;;
esac
# shift left result by 1 and 3
res1="0`Echo -n "$result" | $dd bs=1 count=15 2>/dev/null`"
res3="000`Echo -n "$result" | $dd bs=1 count=13 2>/dev/null`"
# now add the numbers
c1=0
c2=0
temp=''
for digit in 0 1 2 3 4 5 6 7 8 9 A B C D E F
do
dig1="`Echo -n "$res1" | $dd bs=1 count=1 2>/dev/null`"
res1="`Echo -n "$res1" | $dd bs=1 skip=1 2>/dev/null`"
dig3="`Echo -n "$res3" | $dd bs=1 count=1 2>/dev/null`"
res3="`Echo -n "$res3" | $dd bs=1 skip=1 2>/dev/null`"
dign="`Echo -n "$num" | $dd bs=1 count=1 2>/dev/null`"
num="`Echo -n "$num" | $dd bs=1 skip=1 2>/dev/null`"
case "$dig1$dign$dig3$c1" in
0000) add=0; carry=0 ;;
0001) add=1; carry=0 ;;
0010) add=1; carry=0 ;;
0011) add=0; carry=1 ;;
0100) add=1; carry=0 ;;
0101) add=0; carry=1 ;;
0110) add=0; carry=1 ;;
0111) add=1; carry=1 ;;
1000) add=1; carry=0 ;;
1001) add=0; carry=1 ;;
1010) add=0; carry=1 ;;
1011) add=1; carry=1 ;;
1100) add=0; carry=1 ;;
1101) add=1; carry=1 ;;
1110) add=1; carry=1 ;;
1111) add=1; carry=2 ;;
esac
temp="$temp$add"
case "$c2$carry" in
00) c1=0; c2=0 ;;
01) c1=1; c2=0 ;;
02) c1=0; c2=1 ;;
10) c1=1; c2=0 ;;
11) c1=0; c2=1 ;;
12) c1=1; c2=1 ;;
esac
done
case "$c1" in
1) return 1 ;;
esac
case "$c2" in
1) return 1 ;;
esac
result="$temp"
return 0
}
Function num2wimp, called with a single argument, which
should contain a reverse bit string, converts its argument to a decimal
number (in wimp mode) and assigns the result to variable result.
Since it is easier to convert a bit string than a reverse bit string,
the function starts by reversing its argument. Then it removes a bit at
a time and calculates the result. The calculation performed for each
bit is essentially the addition of twice the accumulated result with
the new bit, and is essentially a simplification of the loop in
parse_n1, with the difference that the big case
statement has been replaced by a few calls to DD just for variety.
The result is produced in reverse decimal (123 is represented as 321),
because it is easy to calculate on, and reversed at the end.
num2wimp () {
reverse "$1"
num="$result"
result=0
while true
do
case "$num" in
?*) ;;
*) break ;;
esac
carry="`Echo -n "$num" | $dd bs=1 count=1 2>/dev/null`"
num="`Echo -n "$num" | $dd bs=1 skip=1 2>/dev/null`"
temp="$result"
result=''
while true
do
case "$temp" in
?*) ;;
*) break ;;
esac
dig="`Echo -n "$temp" | $dd bs=1 count=1 2>/dev/null`"
temp="`Echo -n "$temp" | $dd bs=1 skip=1 2>/dev/null`"
> $tmp
zero | $dd bs=2 count=$dig 2>/dev/null >> $tmp
zero | $dd bs=1 count=$carry 2>/dev/null >> $tmp
IFS="+"
set `$dd if=$tmp bs=1 of=/dev/null 2>&1`
IFS="$saveIFS"
case "$1" in
??*) carry="`Echo -n "$1" | $dd bs=1 count=1 2>/dev/null`"
ndig="`Echo -n "$1" | $dd bs=1 skip=1 2>/dev/null`"
;;
?) carry=0
ndig="$1"
;;
*) carry=0
ndig=0
;;
esac
result="$result$ndig"
done
case "$carry" in
0) ;;
*) result="$result$carry" ;;
esac
done
reverse "$result"
}
Function reverse helps num2wimp twice: to transform
the reverse bit string argument to a bit string, and to transform the reverse
decimal result to a decimal number. It works by removing one digit at a time
from its argument, and tacking it to the front of the result.
reverse () {
text="$1"
result=''
while true
do
case "$text" in
?*) ;;
*) return ;;
esac
first="`Echo -n "$text" | $dd bs=1 count=1 2>/dev/null`"
text="`Echo -n "$text" | $dd bs=1 skip=1 2>/dev/null`"
result="$first$result"
done
}
Interleave is the easiest INTERCAL operation to implement, as it is
simple text manipulation: after checking that its operands can be
represented in 16 bits, function f_interleave just
takes the bits in the right order and joins them all together.
Note that we assume that the numbers are represented as reverse
bit strings, so the bits of the second number are added before
the bits of the first number. The function is called with three
arguments: the name of a variable where the result will be stored,
and the two numbers.
f_interleave () {
res="$1"
num1="$2"
num2="$3"
case "$num1" in
????????????????)
;;
????????????????0000000000000000)
num1="`Echo "$num1" | $dd bs=1 count=16 2>/dev/null`"
;;
*)
Echo "Too many spots" 2>&1
return 1
;;
esac
case "$num2" in
????????????????)
;;
????????????????0000000000000000)
num2="`Echo "$num2" | $dd bs=1 count=16 2>/dev/null`"
;;
*)
Echo "Too many spots" 2>&1
return 1
;;
esac
val=''
for bits in 0 1 2 3 4 5 6 7 8 9 A B C D E F
do
dig1="`Echo -n "$num1" | $dd bs=1 count=1 2>/dev/null`"
num1="`Echo -n "$num1" | $dd bs=1 skip=1 2>/dev/null`"
dig2="`Echo -n "$num2" | $dd bs=1 count=1 2>/dev/null`"
num2="`Echo -n "$num2" | $dd bs=1 skip=1 2>/dev/null`"
val="$val$dig2$dig1"
done
eval $res=$val
return 0
}
Function f_select performs the corresponding INTERCAL operation
by extracting two bit strings from its first argument, depending on whether
the second argument has a 0 or a 1 in the corresponding bit position. The
function works by first making sure both numbers have the same number of
bits, and the executing a loop (one iteration per bit): this extracts a
bit from the second number: if this bit is 1, it adds the corresponding
bit of the first number to shell variable val, otherwise
it adds a zero to shell variable end. Therefore, after the
loop terminates, the result is simply the concatenation of these two
variables. The function is called with three arguments: the name of a
variable where the result will be stored, and the two numbers.
f_select () {
res="$1"
num1="$2"
num2="$3"
case "$num2" in
????????????????)
case "$num1" in
????????????????)
;;
????????????????????????????????)
num1="`Echo "$num1" | $dd bs=1 count=16 skip=16 2>/dev/null`"
;;
esac
;;
????????????????????????????????)
case "$num1" in
????????????????)
num1="${num1}0000000000000000"
;;
????????????????????????????????)
;;
esac
;;
esac
val=''
end=''
while true
do
case "$num1" in
?*) ;;
*) break ;;
esac
dig1="`Echo -n "$num1" | $dd bs=1 count=1 2>/dev/null`"
num1="`Echo -n "$num1" | $dd bs=1 skip=1 2>/dev/null`"
dig2="`Echo -n "$num2" | $dd bs=1 count=1 2>/dev/null`"
num2="`Echo -n "$num2" | $dd bs=1 skip=1 2>/dev/null`"
case "$dig2" in
0) end="${end}0" ;;
1) val="${val}$dig1" ;;
esac
done
eval $res=$val$end
return 0
}
All unary operations are implemented the same way: the difference is the
value produced by each combination of bits in the argument. Functions
f_or, f_xoe and f_and take two
arguments: the name of a variable where the result will be stored, and
the argument in the form of a reverse bit string. They simply call
function unary with their two arguments followed by four
bit values, representing the four possible combination of bits.
f_or () {
unary "$1" "$2" 0 1 1 1
}
f_xor () {
unary "$1" "$2" 0 1 1 0
}
f_and () {
unary "$1" "$2" 0 0 0 1
}
Function unary takes six arguments: the name of a variable where
the result will be stored, the number to operate on, and the four bit values
used to calculate the result. It first adds the least significant bit to the
end of the argument, then enters a loop: in each iteration, the first two
bits are used to generate a bit of the result, then one of these two bits
is removed from the number. The loop terminates when just one bit is left
(this would be the bit added to the number just before starting the loop).
unary () {
res="$1"
num="$2"
d1="$3"
d2="$4"
d3="$5"
d4="$6"
case "$num" in
0*) num="${num}0" ;;
1*) num="${num}1" ;;
*) Echo "Something bad happened, bit value is wrong" >&2 ; return 1 ;;
esac
val=''
while true
do
case "$num" in
?) break ;;
esac
dig="`Echo -n "$num" | $dd bs=1 count=2 2>/dev/null`"
num="`Echo -n "$num" | $dd bs=1 skip=1 2>/dev/null`"
case "$dig" in
00) val="${val}$d1" ;;
01) val="${val}$d2" ;;
10) val="${val}$d3" ;;
11) val="${val}$d4" ;;
esac
done
eval $res="$val"
}
With the above functions, it is a simple matter to write an INTERCAL calculator:
just obtain a line from standard input, call parse to do the parsing
and the calculation, and then call num2wimp to convert the result
to a decimal number in wimp mode.
Echo -n "? "
while read line
do
case "$line" in
?*) ;;
*) Echo -n "?? "
continue
;;
esac
text="$line"
parse
case "$result" in
*?) case "$text" in
?*) Echo "Parse error just before $text"
;;
*) num2wimp "$result"
Echo "$line -> $result"
;;
esac
;;
*) Echo "Something went wrong" ;;
esac
Echo -n "? "
done