Efficient BASIC coding for the ZX Spectrum (III)

[Click here to read this in English ]

Éste es el tercero de una serie de artículos que explican los fundamentos de la (in)eficiencia de los programas en BASIC puro para el ZX Spectrum:
I. Sobre los números de línea
II. Sobre las variables
III. Sobre las expresiones
IV. Funcionalidades diversas y medida del tiempo
V. Operaciones en la pantalla basadas en caracteres

Empezaremos esta entrada señalando un hecho bien conocido en la ciencia de la computación:

Las expresiones forman todo un sub-lenguaje dentro de cualquier lenguaje imperativo

Esto significa que evaluar una expresión, por ejemplo 2*cos(x) - 23 o "hola, " + "mundos"(1 TO 5), necesita un trabajo importante de análisis sintáctico además de la propia evaluación, trabajo que, en el caso del ZX Spectrum, realiza el intérprete de la ROM con ayuda del calculador. Por tanto, en máquinas con escasa capacidad de cómputo es necesario tener una especial atención con las expresiones.

De forma general, esto implica que:

Deberíamos utilizar expresiones cortas (en el sentido de pocas operaciones), y simples (en el sentido de operaciones no demasiado complejas ni que involucren demasiados pasos de interpretación / cálculo).

La utilidad de análisis de ZX-Basicus (-a) puede ayudar en este aspecto, pues produce una lista ordenada por complejidad decreciente de todas las expresiones del programa: número de elementos involucrados en la expresión, tanto operaciones como datos, sin considerar sus longitudes textuales ni en memoria. Esto sirve para identificar rápidamente las expresiones más complejas (que deberían ser pocas). Asimismo, el intérprete incluido en ZX-Basicus (-r) puede generar información del perfil de frecuencias de ejecución (--profile) de las expresiones del programa, ordenadas de más frecuentes a menos, lo que completa la información necesaria para saber si hay expresiones complicadas que son evaluadas frecuentemente y que, por tanto, deberían tener formas más simples.

Se ha de tener especial cuidado de que las expresiones sean cortas y simples especialmente dentro de los bucles que más frecuentemente se repitan y que tengan necesidades de rapidez. Esto excluye casi del todo las llamadas a funciones de usuario: FN no es conveniente, ya que no sólo implica el análisis sintáctico de la expresión y su evaluación, sino la búsqueda previa de la línea que contiene al correspondiente DEF FN y la copia de los argumentos en ese sitio, algo que ya señalábamos como fuente de ineficiencia en la primera entrega.

Asimismo, si una expresión se repite en varios lugares cercanos, es conveniente usar una variable para almacenar su valor y así calcularla sólo una vez: siempre es más rápido evaluar una sola variable que la expresión completa. No hay que olvidar inicializar la variable al principio de la ejecución del programa para acelerar aún más sus accesos, como explicábamos en la segunda entrega.

Deteniéndonos en los componentes de cualquier expresión, encontramos que éstas constan de cuatro: literales (números y cadenas de caracteres en el caso del ZX), operadores (+,-,*,OR, AND, indexaciones de tablas o cadenas,…), llamadas a funciones (COS, SIN, INT, FN(),…) y agrupadores (paréntesis). Como ya se ha dicho, algunas llamadas a funciones son bastante ineficientes; los paréntesis pueden eliminarse siempre que se comprendan las precedencias de evaluación de los operadores; los operadores no pueden optimizarse demasiado en sí mismos (véanse los siguientes apartados); pero de los literales podemos decir más cosas:

Los literales numéricos que forman parte de las expresiones pueden ser optimizados, especialmente en su tamaño en memoria, pero eso tiene consecuencias en el tiempo de ejecución.

NOTA: En un programa BASIC ya “tokenizado” en memoria, cada literal numérico está almacenado como el texto que muestra su valor seguido de 6 bytes con el mismo valor codificado en binario; curiosamente, no se almacenan nunca valores negativos en la codificación binaria -aunque el intérprete es capaz de manejarlos sin problemas-: el operador “-” que precede a dichos literales no se considera parte del número, sino una operación previa, distinta del literal.

Existe truco muy utilizado para reducir el tamaño en memoria de los literales numéricos: consiste en utilizar la función VAL y una cadena de texto para reemplazar un literal numérico. Por ejemplo, escribir VAL "32768" en lugar del literal 32768. El literal ocupa en memoria tantos bytes como dígitos tiene el número más 6 por culpa de un valor oculto que crea el intérprete cuando hace el análisis léxico; mientras que si usamos VAL ocupamos los mismos bytes debidos a los dígitos y sólo 3 más (1 byte de VAL más las dos comillas).

La herramienta de optimización de ZX-Basicus (-t) incluye una opción para hacer automáticamente este truco en un programa fuente (--valtrick). Sin embargo, hay que tener tres cosas en cuenta a la hora de usarlo, y nunca hacerlo sin reflexionar sobre cada caso:

Esta técnica ahorra memoria siempre… que no haya otra forma de obtener el número sin usar su literal. Por ejemplo, PI es mucho más eficiente en memoria que 3.1416 o que VAL "3.1416", ya que ocupa 1 byte (PI se tokeniza), por lo que no sólo es lo más corto sino lo más preciso; es más, se puede usar NOT PI en cualquier expresión como equivalente a cero, o SGN PI como equivalente a 1, y sólo ocuparán 2 bytes (los dos tokens) en lugar de los 7 que ocuparía el literal. También podemos ahorrar memoria usando notación científica: se puede escribir 4E3 en vez de 4000, ahorrando 1 byte (VAL "4E3" en lugar de VAL "4000" ahorra también 1 byte, por lo que no merece la pena por su tiempo extra de cómputo).
Hay que considerar que esta técnica incrementa el tiempo de ejecución, ya que sustituye un literal por una expresión más compleja. No es lo mismo tener que evaluar una expresión, y mucho menos un VAL o un VAL$, para los que el intérprete debe crear un contexto nuevo de evaluación de expresión dentro del original, que leer el valor literal y usarlo directamente. Éste es el principal inconveniente. Hay ocasiones en que, sin embargo, no importa esa pérdida de eficiencia: por ejemplo, PAUSE NOT PI está plenamente justificado puesto que ahorra 4 bytes respecto a PAUSE 0 y el retardo adicional de la evaluación de la expresión NOT PI no va a influir en absoluto en el tiempo que tarde el usuario en pulsar una tecla.
Muchos programas en BASIC puro del ZX suelen tener memoria suficiente para almacenarse, debido a su relativamente escasa complejidad (otra cosa son los programas en ensamblador, que necesitan cantidades grandes de datos para gráficos y demás), por lo que hay que pensar si realmente es necesario ahorrar tan poco espacio. Por ejemplo, si no se modifican los canales del sistema y si no se ha hecho CLEAR y se ha dejado el tope usable de RAM en 65367 para almacenar tras él los 21 UDGs, la memoria para el BASIC estará como sigue: el programa comenzará en la dirección 23755 y el último byte de las zonas más altas de trabajo estará en 41926 al comenzar a ejecutarlo. Esto nos da 41926 – 23755 = 18171 bytes, que, si sólo ocupáramos con el programa (no es cierto), nos permitiría almacenar del orden de 2019 líneas vacías, sólo con el número de línea y la información de su tamaño y marca de fin: esto ocupa 9 bytes por línea si los números de línea tienen cuatro dígitos. Normalmente, los programas BASIC que se escriben tienen mucho menos de 2000 líneas, y habitualmente no son muy largas (aunque hay que tener en cuenta lo que decíamos en la primera entrega de esta serie sobre eso).

Para tener una idea más precisa de cuánto supone usar el truco del VAL en un programa, se puede ejecutar el siguiente, que mide el tiempo de uso repetido del mismo:

10  POKE 23672 , 0 :  POKE 23673 , 0 :  POKE 23674 , 0 :  FOR f = 1 TO 10000 :  LET c =  VAL “65535” :  NEXT f :  LET T =  PEEK 23672 + 256 *  PEEK 23673 + 65536 *  PEEK 23674 :  PRINT “VAL: “ ; T ; ”  “ ; T * 0.02 ; ”  “ ;  ( T * 0.02 / 10000 ) 

Si se ejecuta este programa (unos 2 minutos), se vuelve luego a ejecutar tras cambiarle solamente el VAL "65535" por 65535, y se hace la diferencia entre los tiempos medios por iteración del bucle en ambos casos, se puede ver que el uso de VAL es 15 milisegundos más lento, de media, que el literal numérico simple, lo cual es muchísimo tiempo cuando se ejecuta con frecuencia la expresión (la precisión de este cálculo de tiempos es de +/-115.5 microsegundos con el 95% de probabilidad).

El resto de esta entrada lo dedicaremos a cada uno de los principales tipos de expresiones que existen en Sinclair BASIC (y en cualquier lenguaje de programación imperativo).

Las expresiones de manipulación de textos son particularmente costosas para el intérprete de BASIC.

Aun así, también pueden ser optimizadas. Es poco eficiente sumar (concatenar textos), pues implica cambiar su tamaño en memoria y, por tanto, desplazar una cantidad potencialmente grande de datos en el proceso, como ya se explicó en la creación de variables en la segunda entrega. Asimismo, comparar textos es un proceso costoso, del orden de O(n), siendo n la menor longitud de los dos textos. Si nos aseguramos de que uno de los dos términos de la comparación es muy corto (1 ó 0 caracteres), incrementamos la eficiencia.

A veces es más conveniente para ahorrar memoria almacenar los datos numéricos como textos (en lugar de los 5 bytes que necesita un valor numérico). Por ejemplo, si los números que almacena son enteros menores de 10000, una tabla con los mismos resulta más corta en memoria si se introducen como cadenas de caracteres, a costa de un mayor coste de ejecución al recuperarlos más tarde como números por medio de la función VAL.

En cuanto a las expresiones lógicas, hay que tener bien presente que no funcionan exactamente igual que en los lenguajes modernos, y también que existen varias expresiones distintas, con distinta eficiencia, para algunos cálculos comunes.

Las expresiones lógicas son aquéllas que se espera que sólo den dos valores, equivalentes de alguna manera a “verdadero” y “falso”. En Sinclair BASIC dichas expresiones se forman con los operadores lógicos NOT, AND y OR, aunque los dos últimos son algo más potentes que los habituales, como se verá en el apartado de expresiones numéricas. El punto importante para la eficiencia en este tipo de expresiones es que el intérprete no hace evaluación perezosa o, más concretamente, de “cortocircuito”: los dos operandos son evaluados antes de evaluar los operadores AND y OR, cuando, en realidad, podría ahorrarse la evaluación del segundo si el primero es 0 ó 1, respectivamente. Por tanto, para ahorrar tiempo de ejecución deberíamos usar sólo expresiones muy simples como segundo operando de estos operadores.

Además, hay varios cálculos lógicos muy comunes que pueden implementarse con varias expresiones distintas, de modo que es conveniente seleccionar la más eficiente. En la siguiente tabla recogemos algunas que el programador @igNaCoBo ha utilizado en su juego ArkanoidB2B:

Cálculo	Expresión	T (ms)	Ganancia (microsegs = us)
¿Es `g` distinto de `0`?	`g <> 0`	38,38
	`abs g`	37,78	600 us + rápido que `g<>0`
	`sgn g`	37,76	620 us + rápido que `g<>0`
	`g`	37,04	1,34 milisegs + rápido que `g<>0`
¿Es `g` igual a `0`?	`g = 0`	37,60	780 us + rápido que `g<>0`
	`not g`	37,12	480 us + rápido que `g=0`
¿Es `g` mayor que `0`?	`g > 0`	37,60	780 us + rápido que `g<>0`; igual que `g=0`
¿Es `g` menor que `0`?	`g < 0`	37,80	580 us + rápido que `g<>0`; 200 us + lento que `g=0`
¿Es `g` igual a `1`?	`g = 1`	37,68	80 us + lento que `g=0`
¿Es `g` igual a `-1`?	`g = -1`	38,40	800 us + lento que `g=0`

Los tiempos de la tabla se han obtenido usando el programa patrón que se muestra abajo. La columna “T” de la tabla recoge los tiempos esperados en una iteración del bucle de las líneas 10 a 50; las entradas en cursiva de la última columna son sólo resultados interesantes, quizás no muy útiles para optimizar.

1  CLS  :  POKE 23672 , 0 :  POKE 23673 , 0 :  POKE 23674 , 0
10  FOR f =  – 499 TO 500
20  LET h = f + 499 :  LET g = h –  INT  ( h / 3 )  * 3 – 1 :  PRINT  AT 0 , 0 ; g ; ” “ ; 
30  IF g <> 0 THEN  PRINT “T “ :  GOTO 50
40  PRINT “F “
50  NEXT f
60  LET t =  PEEK 23672 + 256 *  PEEK 23673 + 65536 *  PEEK 23674 :  PRINT “total: “ ; t ; ” s (“ ;  ( t / 60 )  ; ” m)” ‘ “iter time: “ ;  ( t * 0.02 / 1000 )  ; ” secs”

Este programa ha sido diseñado para que se evalúe la expresión para casos en que g < 0, g = 0 y g > 0 (el cálculo de la variable g en la línea 20 produce -1, 0 ó 1 cíclicamente). El programa repite 1000 iteraciones de un bucle en el que se evalúa dicha expresión y se hacen otras cosas; si dividimos por 1000 obtenemos una estimación del tiempo esperado para una sola iteración (columna “T”). Ese tiempo puede entonces compararse con el del mismo programa cuando se cambia la expresión a evaluar por otra, de lo que deducimos los datos de la última columna.

Finalmente, las expresiones numéricas ofrecen posibilidades de optimización importantes si son enteras y trabajan con números que no excedan los 16 bits.

El calculador de la ROM es capaz de distinguir dos tipos de datos: enteros y reales (representados en memoria en coma flotante). Los primeros, en particular aquéllos dentro del rango -65535 hasta +65535, los manipula mucho más rápidamente que los segundos, por lo que, por ejemplo, no es conveniente usar bucles FOR con pasos (STEP) reales en lugares que necesiten rapidez (por cierto, las expresiones para el paso y el límite de un bucle FOR son evaluadas sólo una vez, y guardadas junto a la variable del bucle para no tener que repetir ese cálculo). Tampoco deben usarse funciones reales (trigonometría, por ejemplo) en zonas críticas. Una alternativa, si no tenemos más remedio que usar números reales, sería representar con números enteros esos valores usando coma fija, es decir, suponer que una serie de dígitos del número son la parte decimal, aunque no lo sean, y trabajar siempre con él como entero. Por ejemplo: 255 puede significar para nosotros 2.55, aunque lo usemos como 255 en el programa. Las sumas y restas no cambian el valor final por el hecho de usar coma fija; la coma puede cambiar de sitio sólo en la multiplicación y división. Otra alternativa es precalcular los valores de las expresiones reales al principio del programa (o cargarlos en una tabla desde cinta) y sólo consultarlos luego; por ejemplo, preparar una tabla de senos o cosenos.

Cuando nos haga falta un número entero a partir de un valor real siempre podemos usar la función INT, pero hay que tener en cuenta que ésta hace un redondeo hacia abajo, es decir, elimina la parte decimal del número por completo. Si queremos un redondeo hacia arriba (obtener el número entero más pequeño que sea mayor o igual al valor real que estamos redondeando) podemos aprovecharnos de que el intérprete de BASIC hace exactamente esa operación cuando a una sentencia que necesita un número entero se le pasa uno real. Así, si hacemos LET a = INT 10.6 obtendremos el valor 10, pero si hacemos POKE 40000,10.6 el valor que obtendremos con PEEK 40000 será 11. En cualquier caso, ten en cuenta que el tiempo que tarda en ejecutar INT(10.6 + 0.5), que hace el mismo redondeo hacia arriba, puede ser menor: en el caso de usar POKE/PEEK como antes, INT tarda aproximadamente 3 milisegundos menos. También es más eficiente en cuanto al consumo de memoria.

Un caso particular de expresiones con números enteros que se puede acelerar en BASIC es la descomposición de valores de 16 bits en sus dos bytes (alto y bajo). Esto es necesario en diversas ocasiones, pero el BASIC del ZX Spectrum no tiene ninguna función que lo realice, por lo que habría que escribir un código bastante ineficiente en tiempo de cómputo: LET h = int(v/256) : LET l = v - h * 256. Hay una forma de hacer esta descomposición más rápidamente aprovechándose de rutinas de la ROM ideadas para otros asuntos: RANDOMIZE v : LET l = PEEK 23670 : LET h = PEEK 23671 , que da el mismo resultado puesto que RANDOMIZE se encarga de la descomposición por nosotros, dejándola en la variable del sistema SEED, localizada en las direcciones de memoria 23670 y 23671. Eso sí, estamos cambiando la semilla de los números aleatorios que generemos para el programa a partir de ese momento, y ¡no podemos hacerlo si v es 0, porque entonces el comando RANDOMIZE no guarda el 0 en esa variable del sistema!

Otro caso especial de aceleración de expresiones numéricas es cuando queremos incrementar una variable de manera cíclica, es decir, que vaya de 1 hasta cierto valor n y luego vuelva a 1. Esto se puede hacer sin IF así: LET v = v + 1 OR v = n , donde además hemos aprovechado las precedencias en la evaluación de operadores para evitarnos tener que escribir los paréntesis. Este truco se basa en el comportamiento del operador OR en Sinclair BASIC, que no es un OR lógico ni de bits exactamente. Existe la correspondiente variante con AND, que permite ir cíclicamente entre 0 y n-1: LET v = v + 1 AND v < n - 1, quizás más natural para los programadores de lenguajes donde los índices de arrays empiezan en 0.

Se puede medir la ganancia en tiempo de estos incrementos cíclicos de variable entera mediante el siguiente programa:

1  CLS  :  LET v = 1 :  POKE 23672 , 0 :  POKE 23673 , 0 :  POKE 23674 , 0
10  FOR f = 1 TO 1000
20  LET v = v + 1 :  IF v = 11 THEN  LET v = 1
30  PRINT  AT 0 , 0 ; v ; ”  “
50  NEXT f
60  LET t =  PEEK 23672 + 256 *  PEEK 23673 + 65536 *  PEEK 23674 :  PRINT “total: “ ; t ; ” s (“ ;  ( t / 60 )  ; ” m)” ‘ “iter time: “ ;  ( t * 0.02 / 1000 )  ; ” secs”

Ejecutándolo primero tal y cual está, y luego cambiando la línea 20 por:

20  LET v = v + 1 OR v = 10

nos da una diferencia de tiempos de 540 microsegundos a favor de la segunda opción. Esta aceleración puede ser importante (¡ejecutar tan sólo dos veces esta nueva línea 20 nos mejora los tiempos en más de 1 milisegundo!). Por otra parte, si hacemos lo análogo con los incrementos cíclicos de 0 a n-1 obtenemos una mejora de 420 microsegundos respecto a usar IF. De este experimento también se puede deducir que contar de 0 a n-1 de manera acelerada es 540 microsegundos más rápido que contar de manera acelerada de 1 a n.

Para terminar esta sección hay que recordar que toda expresión que contenga partes que no varían debería ser re-escrita con esas partes ya evaluadas, un trabajo que el intérprete de la ROM, al contrario que los compiladores modernos, no hará por nosotros. Esto gana tanto en tiempo como en espacio. Por ejemplo, no escribas A * PI / 180 para pasar el valor de A de grados a radianes; es más conveniente escribir A * 0.017453293, aunque quede más feo y no deba hacerse en plataformas y lenguajes actuales.

.oOo.

[Click here to read this in Spanish ]

This is the third in a series of posts that explain the foundations of the (in)efficiency of pure BASIC programs written for the ZX Spectrum:
I. On line numbers
II. On variables
III. On expressions
IV. Some statements and time measurement
V. Screen operations based on characters

We begin this post by recalling a well-known fact in computer science:

Expressions are an entire sub-set of any imperative programming language

That means that evaluating an expression, let say 2*cos(x) - 23 or "hello, " + "worlds"(1 TO 5) involve an important work of syntactical analysis in addition to the very evaluation, a work that, in the case of the ZX Spectrum, must be done by the interpreter coded in the ROM at execution time. In machines like that, with scarce computational resources, it is therefore necessary to pay special attention to expressions.

In general, for improving efficiency,

We should use expressions that are short (few operations in them) and simple (not too complex operations or involving too many steps of interpretation / calculation).

The ZX-Basicus utility can help with that through its analysis tool (-a): it produces a list of all the expressions of a program ordered by decreasing complexity (number of elements involved in them, both operations and data, without considering their textual lengths). This serves to identify the most complex ones, that should be few. Also, the ZX-Basicus interpreter (-r) can generate a profile (--profile) with the frequency of use of each expression, which completes the information needed to decide whether there are too complex expressions that are evaluated frequently and, therefore, need to be shortened and simplified.

Expressions should be short and simple specially those inside loops that are frequently executed and that need to be fast. This excludes almost completely to use user function calls: FN involves not only the syntactical analysis of the expression and its evaluation, but a search of the line that contains the corresponding DEF FN and the copy of the arguments into the placeholders of that function, as it was also explained in the first post.

If the same expression is repeated in nearby places, it is convenient to store its result once and reuse it the rest of times: it is always faster to evaluate a single variable than a whole expression. Do not forget to initialize the variable once at the beginning of the execution (along the rest of variables that the program uses) in order to make this even more efficient, as it was explained in the second post of this series.

Let deep in more detail in the components of any expression. They are four: literals (numbers and strings in the case of the ZX), operators (+,-,*,OR, AND, indexing of tables or strings,…), function calls (COS, SIN, INT, FN(),…) and groupings (parentheses). As I said before, function calls are pretty inefficient; parentheses can be avoided as long as we know well the operator evaluation precedences; operators cannot be optimized much (see the rest of this post); but we can say a little more about literals:

Numeric literals can be optimized, especially in its storage, but that has consequences in the execution time.

NOTE: In a “tokenized” BASIC program, each numeric literal is stored as the text that shows its value followed by 6 bytes containing the same value coded in binary; surprisingly enough, negative values are never stored in that binary coding -though the interpreter is quite capable of dealing with them-: the corresponding positive values are coded in binary because the negation operator is not considered part of the numeric literal.

In this context, it is worth to mention a very popular trick to reduce the size in memory of some expressions. It consists in using the VAL function and a string instead of a numeric literal. For example, VAL "32768" would replace literal 32768. The numeric literal is stored in memory with as many bytes as digits has the number plus other 6 for a hidden value that the interpreter stores there when it does its lexical analysis; on the other hand, the VAL version stores the same bytes for the digits plus only 3 more (1 byte for VAL, 2 bytes for the quotes).

The optimization tool of ZX-Basicus (-t) includes an option to perform automatically this trick in a source program (--valtrick). However, there are three important points that should be considered carefully before using the trick:

This technique saves memory… as long as there is no way of writing the numeric value without using its digits. For instance, VAL "3.1415" is not convenient since we can write PI, which is much more efficient in memory savings since it only occupies 1 byte (once PI is tokenized); furthermore, you can use NOT PI in any expression as equivalent to 0 and SGN PI as equivalent to 1: they only occupy 2 bytes (2 tokens) instead of the 7 bytes of a literal. This can be pushed even farther if we use scientific notation: you can write 4E3 instead of 4000, saving 1 byte (writing VAL "4E3" instead of VAL "4000" also saves 1 byte, but it is not worthy since it takes longer to run).
The technique increases execution time, possibly in an appreciable manner, because it turns a literal into a more complex expression. Evaluating the VAL expression requires to create a new expression environment inside the original expression, which makes things worse than in an expression not using VAL. Reading directly the numeric value is much faster. There are situations where, nevertheless, that loss of efficiency is not relevant: for example, PAUSE NOT PI is completely justified since it saves 4 bytes compared to PAUSE 0 and the additional delay produced by the evaluation of the expression NOT PI has no influence in the time the user takes to press a key.
Most programs written purely in BASIC in the ZX, that, unlike assembly programs, do not need large amounts of data for graphics and stuff, usually have enough memory in RAM for working. Therefore, you have to think carefully if the space saved by the VAL trick is really worthy. For example, if the program does not modify the channels of the system and if CLEAR has not been executed to modify the top limit of usable memory and therefore that limit is still 65367 (default one, for storing 21 UDGs after that), the BASIC memory layout will be as follows: the program will begin at the address 23755 and the last byte of the top working areas will be at 41926 at the start of the execution. That gives us 41926 – 23755 = 18171 bytes that, if filled only with the BASIC program (which is not true), leads to a capacity for 2019 empty program lines, only with their line numbers and their information of size and ending markers (this amounts to 9 bytes if the line numbers have 4 digits). Often, BASIC programs have much less than 2000 lines and they are not too long (although there are some considerations on this commented in the first post of this series).

To get a more precise idea of how longer takes an expression that uses VAL instead of numeric literals, you can test the following program:

10  POKE 23672 , 0 :  POKE 23673 , 0 :  POKE 23674 , 0 :  FOR f = 1 TO 10000 :  LET c =  VAL “65535” :  NEXT f :  LET T =  PEEK 23672 + 256 *  PEEK 23673 + 65536 *  PEEK 23674 :  PRINT “VAL: “ ; T ; ”  “ ; T * 0.02 ; ”  “ ;  ( T * 0.02 / 10000 ) 

If the program is run (around 2 minutes), then run again after changing VAL "65535" by the numeric literal 65535 only, and then you calculate the difference between both average times per loop iteration, you will see that using VAL is 15 milliseconds slower than the numeric literal, which is very slow, particularly when the expression is repeatedly evaluated (the precision in the time estimation of the program above is around +/-115.5 microseconds with 95% of probability).

The rest of this post is devoted to each of the main types of expressions that exist in Sinclair BASIC (and in any imperative programming language).

Firstly, string manipulation expressions are particularly costly for the BASIC interpreter.

String (text) expressions can also be optimized. It is little efficient to add them (concatenation), since that involves to change their size in memory (therefore displacing potentially significant amounts of memory, as it was discussed in the second post), and intermediate strings that are only used for that operation must be created / deleted dynamically. Also, comparing strings is O(n), being n the length of the shortest string to compare. We should be sure to do that only when one of the two compared terms is really short, ideally one character or none.

On the other hand, sometimes it is convenient to store numerical values as text, since that can save memory as explained in the VAL trick. An array of numbers will be shorter in memory if stored as an array of characters as long as the numbers are integers less than 10000. Notice, nevertheless, that this involves to use VAL to recover the numerical values, which increases execution time, as explained before.

Regarding logical expressions, it must be taken into account that they do not work exactly the same as in modern languages, and also that there are different expressions for some common calculations, with different efficiency.

“Logical” expressions produce only two values, expected to be semantically equivalent to “true” and “false”. In Sinclair BASIC these expressions are formed with the NOT, AND and OR operators, and the last two are slightly more powerful than the common ones, as explained in the next section. The important point for efficiency is that the interpreter does not do lazy evaluation in logical expressions, or, more concretely, short-circuit evaluation: both operand are evaluated before evaluating the operators AND and OR, when, actually, the evaluation of the second could be saved if the first is evaluated to 0 or 1, respectively. That increases innecessarily the execution time, thus we should be careful to use just very simple expressions in the second terms of those operands.

Besides, there are some common logical calculations that can be implemented with diverse expressions, thus it is convenient to choose the one more efficient for our purposes. In the next table we list some of these expressions, used by the programmer @igNaCoBo in his game ArkanoidB2B:

Calculation	Expression	T (ms)	Time gain (microsecs = us)
Is `g` different from `0`?	`g <> 0`	38,38
	`abs g`	37,78	600 us faster than `g<>0`
	`sgn g`	37,76	620 us faster than `g<>0`
	`g`	37,04	1,34 milisecs faster than `g<>0`
Is `g` equal to `0`?	`g = 0`	37,60	780 us faster than `g<>0`
	`not g`	37,12	480 us faster than `g=0`
Is `g` greater than `0`?	`g > 0`	37,60	780 us faster than `g<>0`; same speed as `g=0`
Is `g` smaller than `0`?	`g < 0`	37,80	580 us faster than `g<>0`; 200 us slower than `g=0`
Is `g` equal to `1`?	`g = 1`	37,68	80 us slower than `g=0`
Is `g` equal to `-1`?	`g = -1`	38,40	800 us slower than `g=0`

The times in the table have been obtained using the program below. In the table, the “T” column shows the time per iteration (lines 10 to 50); the italic gains are just interesting results, maybe not very useful for optimization.

1  CLS  :  POKE 23672 , 0 :  POKE 23673 , 0 :  POKE 23674 , 0
10  FOR f =  – 499 TO 500
20  LET h = f + 499 :  LET g = h –  INT  ( h / 3 )  * 3 – 1 :  PRINT  AT 0 , 0 ; g ; ” “ ; 
30  IF g <> 0 THEN  PRINT “T “ :  GOTO 50
40  PRINT “F “
50  NEXT f
60  LET t =  PEEK 23672 + 256 *  PEEK 23673 + 65536 *  PEEK 23674 :  PRINT “total: “ ; t ; ” s (“ ;  ( t / 60 )  ; ” m)” ‘ “iter time: “ ;  ( t * 0.02 / 1000 )  ; ” secs”

This program has been carefully designed for evaluating the expression of interest for cases where g < 0, g = 0 and g > 0. The code does 1000 iterations of a loop where that expression is evaluated (and other things happen too); if we divide the total time by 1000 we get an estimation of the expected time per iteration, that can then be compared with the one of the same program when we change the expression by another one (last column in the table)

Finally, numerical expressions offer relevant possibilities for optimization especially if they are integer ones that fit in 16 bits.

The ROM interpreter uses a sophisticated software also included in ROM called “the calculator”. That calculator distinguish two kinds of numeric values: integers and reals (represented as floating point reals). The former are much more efficient to work with than the latter, particularly those in the range -65535 to +65535; therefore, it is not convenient to use real numbers in FOR loops (for instance, as STEP arguments) if those loops need to be fast (by the way, the step and limit of a FOR loop are evaluated just once and stored along with the loop variable, thus the cost of repeatedly evaluating them is saved). You should not call trigonometric functions within critical code either. An alternative to the latter would be to represent the real numbers with fixed point, i.e., to assume that a number of digits of the integer actually represent the digits after the decimal point in the real. In that manner, the program always work with integers, and only convert them into reals when they need to be shown in the screen or the like. For example, 255 can have the meaning of 2.55 in the program. Additions and substractions do not change the place of the decimal point when using fixed point arithmetics; only multiplications and divisions do. A different alternative is to pre-calculate the values of real expressions needed in the program and store those values in an array; that array can be loaded from tape at the beginning. This is often done for cosine and sine tables.

When we need to get an integer number from a real number we can use the INT function, but you should know that it rounds the number towards 0, that is, we lose all the decimal part of the number. If you need to round towards infinity instead (get the smaller integer that is equal or greater than the real value) you can leverage the behaviour of the interpreter, because it does exactly that kind of rounding every time a statement that needs an integer parameter is provided with a real argument. For instance, if we do LET a = INT 10.6 we will get 10, but if we do POKE 40000,10.6 the value that we get with PEEK 40000 will be 11. Anyway, take into account that the time spent in doing INT(10.6 + 0.5) for obtaining the same rounding toward infinity can be shorter: in the case of using POKE/PEEK as before, INT will spend around 3 milliseconds less. The same occurs with the memory footprint of both alternatives.

A particular case of integer number expression that can be accelerated in BASIC is the decomposition of 16 bits integers into their high and low 8-bits parts. This comes handy in a number of situations, but since the ZX Spectrum BASIC has no function that does that, a quite inefficient code should be written: LET h = int(v/256) : LET l = v - h * 256. Luckily, there is a way of doing this decomposition faster by taking advantage of ROM routines devised for other purposes: RANDOMIZE v : LET l = PEEK 23670 : LET h = PEEK 23671. This produces the same result because RANDOMIZE does exactly the desired decomposition and stores it in the system variable SEED, located at memory addresses 23670 y 23671. Take care, though, that we are changing the seed for the random numbers that will be generated for the program from now on, and also that you cannot use this trick when v is 0, because in that case RANDOMIZE does not store that value into SEED.

Another special case of expression with numeric values occurs when we wish to increment a variable cyclically, that is, to assign increasing values from 1 to n and then to 1 again. This can be done without IF in this way: LET v = v + 1 OR v = n , where, in addition, we have used the operator evaluation precedences to avoid the writing of parentheses. This trick is based on the behaviour of the OR operator in Sinclair BASIC, that is not a logical OR or a bitwise OR. There exists a variant with AND that allows us to scan cyclically from 0 to n-1: LET v = v + 1 AND v < n - 1, maybe more natural for programmers of languages where array indexes begin at 0.

We can measure the time gain of these cyclic increments of integer variables with this program:

1  CLS  :  LET v = 1 :  POKE 23672 , 0 :  POKE 23673 , 0 :  POKE 23674 , 0
10  FOR f = 1 TO 1000
20  LET v = v + 1 :  IF v = 11 THEN  LET v = 1
30  PRINT  AT 0 , 0 ; v ; ”  “
50  NEXT f
60  LET t =  PEEK 23672 + 256 *  PEEK 23673 + 65536 *  PEEK 23674 :  PRINT “total: “ ; t ; ” s (“ ;  ( t / 60 )  ; ” m)” ‘ “iter time: “ ;  ( t * 0.02 / 1000 )  ; ” secs”

If we run it as it is and then change line 20 by:

20  LET v = v + 1 OR v = 10

and run it again, we get a difference of 540 microseconds in favor of the second version. This acceleration can be important (running just twice the second version of line 20 saves more than 1 millisecond!). On the other hand, if we do the same with the cyclical increments from 0 to n-1 we save 420 microseconds with respect to using IF. We can also deduce from this experiment that counting (cyclically) from 0 to n-1 in this fast way is 540 microseconds faster than counting (fast, cyclical) from 1 to n.

Finally, do not forget that every expression containing some part that do not vary should be re-written with that part already evaluated, saving in this way effort that the interpreter should do otherwise at running time (unlike modern compilers, the interpreter of the ZX does not do such pre-optimization automatically). This also saves space in RAM. For instance, do not write A * PI / 180 to convert the degrees in A to radians; it is much more efficient to write A * 0.017453293, even when it is uglier and not suitable for modern programming.

4 thoughts on “Efficient BASIC coding for the ZX Spectrum (III)”