Efficient BASIC coding for the ZX Spectrum (V)


This is the fifth and last in a series of articles that explain the foundations of the (in)efficiency of pure BASIC programs for the ZX Spectrum:
I. On line numbers
II. On variables
III. On expressions
IV. Miscellaneous features and time measurement
V. Character-based screen operations

In this last instalment we discuss the ZX Spectrum screen and how to speed up character operations on it when programming in Sinclair BASIC. We do not cover drawing pixels (PLOT, DRAW, ...); for that, see the previous post. The DEFADD trick for moving blocks of memory (also useful for working on the screen) was likewise described in that post.

To make this post easier to navigate, these are its sections:

The screen and character writing. On screen memory and why the efficiency of character writing mattered so much in the design of its structure.
Writing characters. Sprites in BASIC, control characters, screen compression (with the participation of @igNaCoBo).
Reading characters. Execution times of SCREEN$ and ATTR.
Writing enlarged text. How to write large text with the LPRINT trick (in collaboration with @IvanBASIC).
Writing color attributes. How to manipulate color attributes independently of their characters with the LPRINT trick (in collaboration with @IvanBASIC).

The screen and character writing

Although reading or writing screen memory directly is unusual in Spectrum BASIC, knowing how that memory is organized is important for understanding some techniques that can speed up programs (such as the LPRINT trick explained later, or the limits of the DEFADD trick from the previous post when it is applied to the screen). Older posts on this blog (such as this one and this one) already covered some characteristics of the ZX graphics; here we focus on why they are laid out in memory the way they are and which operations are most efficient with that layout.

The first important decision made during the design of the ZX graphics was to assume that writing text on the screen was essential and that, to simplify the software in charge of it (the ZX has no hardware dedicated to that task, so the CPU has to do it), text characters would have a fixed size: 8 x 8 pixels, specifically.

The second decision was that, since the ZX Spectrum, unlike its predecessor the ZX81, had to display a wide range of colors to live up to its name, pixels had to be given more than 2 tones (black and white). Unfortunately, doing that for each pixel separately made the manufacturing cost prohibitive because of the amount of memory required. The sensible option was to stick to the minimum that each text character drawn on the screen would need, that is, to take blocks of 8 x 8 pixels as the spatial granularity for color and to devote to each block the minimum amount of memory needed to store the colors of its pixels, assuming that very often they would be those of a text character.

As a result, for each 8 x 8 pixel block of the screen only 1 byte is stored as the color "attribute" of that "cell", as explained here and illustrated in the figure below. The character visible on the television is obtained by applying that attribute to a "bitmap" made of 8 bytes (on the left in the figure) whose bits select the paper color when they are 0 and the ink color when they are 1.

The screen memory of the ZX Spectrum therefore ended up split into two parts, stored in different but related areas of RAM, which the hardware (specifically, the ULA) reads periodically to refresh the image shown on the television:

The bitmap, which holds the "shapes" or, if you prefer, the "black and white" drawings of all the characters that fit on the screen (24 lines by 32 columns of characters). It starts at address 16384 (right after the end of the ROM) and takes 6144 bytes (24 character lines x 8 pixels of height per character = 192 pixel rows, each with 32 character columns of 1 byte).
The attribute map, which holds the color attributes of those characters. It starts right after the bitmap, at address 22528, and takes 768 bytes (24 lines by 32 columns of color attributes, at 1 byte per attribute).
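The figures in this list can be cross-checked with a few lines of arithmetic. The sketch below uses Python on a modern machine purely as a calculator (it is not code for the Spectrum); the constant and function names are illustrative, and the linear formula in `attr_address` assumes the attribute map is stored line by line, as the text states further on.

```python
# Screen memory map of the ZX Spectrum, using the figures quoted above.
BITMAP_START = 16384                       # first byte after the 16 KB ROM
CHAR_LINES, CHAR_COLUMNS = 24, 32
PIXEL_ROWS = CHAR_LINES * 8                # 192 pixel rows

BITMAP_SIZE = PIXEL_ROWS * CHAR_COLUMNS    # one byte per group of 8 pixels
ATTR_START = BITMAP_START + BITMAP_SIZE    # attributes follow the bitmap
ATTR_SIZE = CHAR_LINES * CHAR_COLUMNS      # one byte per character cell


def attr_address(line, column):
    """Address of the attribute byte of the cell at (line, column)."""
    return ATTR_START + CHAR_COLUMNS * line + column


print(BITMAP_SIZE, ATTR_START, ATTR_SIZE)  # 6144 22528 768
print(attr_address(23, 31))                # 23295, last attribute byte
```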

The third decision is the one that matters most in this series, because it is what makes accessing screen memory (in)efficient when working with things larger or smaller than a character. Since the CPU, that is, the software, had to take care of all writing to the screen, and especially of writing characters, a way had to be found to do that as efficiently as possible. What is the most expensive part? Writing the 8 bitmap bytes of a character (its color attribute is just 1 byte!). And which calculations are the most frequent when the 8 bitmap bytes of each character have to be written to memory? Basically two, which we label (Y) and (X) for reasons that will become clear later:

(Y) computing the memory address where the next bitmap byte of a character cell must be stored (that is, moving 1 pixel row down in the bitmap).
(X) computing the memory address where the bitmap of the next character must be stored (that is, moving 1 column to the right in the bitmap).

The first is needed to store the 8 bitmap bytes of a given character in screen memory so that the ULA can show it on the television, and the second to move on to storing the next character.

Both operations involve incrementing memory addresses, which are unsigned integers. The fastest Z80 machine instructions for incrementing numbers are the INC instructions, which add one to an integer; they are especially fast when the number to increment fits in 8 bits. So, if action (Y) could be achieved by adding 1 to one 8-bit number and action (X) by adding 1 to another, printing text on the screen would be as fast as possible.

Now, a memory address on the Z80 is a 16-bit unsigned integer, and for the Z80 accessing the two bytes of such a number independently is practically trivial; in fact, for almost all purposes these numbers can be regarded as split into the most significant or "high" byte and the least significant or "low" byte. So we have two independent 8-bit increment operations that the Z80 can perform quickly on memory addresses: incrementing the high byte and incrementing the low byte. If the first moved down to the next pixel row within the bitmap of a character and the second moved to the column on the right, where the next character goes, the problem would be solved.

That is exactly what was done in the bitmap memory of the ZX.

With this design we have two cursors that can be moved independently to position ourselves within the screen bitmap: a horizontal one (character columns; moved by adding 1 to the low byte of the address; it takes 32 different values, which need 5 bits) and a vertical one (pixel rows; moved by adding 1 to the high byte of the address; it takes 192 different values, which need 8 bits). Something like this:

However, this design uses only 5 bits of the low byte of the screen address for action (X), which leaves 3 bits of that low byte unused and therefore produces "holes" in screen memory (addresses that are never used), something that obviously has to be avoided. Moreover, with this design the first byte of the screen bitmap would be at memory address 0, which is not even RAM...
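The cost of this first-draft layout can be estimated by counting the addresses it would span. The following Python sketch is only an illustration run on a modern machine; `naive_address` is a hypothetical name for the draft layout (high byte = pixel row, low byte = column), not something that exists in the ROM.

```python
def naive_address(pixel_row, column):
    """First-draft layout: high byte = pixel row, low byte = column."""
    return (pixel_row << 8) | column


# Every byte of a 192 x 32 bitmap under the draft layout.
used = {naive_address(y, x) for y in range(192) for x in range(32)}
span = max(used) - min(used) + 1

print(len(used))   # 6144 bytes actually needed
print(span)        # 48928 addresses spanned, mostly holes
print(min(used))   # 0, which falls inside the ROM
```

Only 32 out of every 256 consecutive addresses would be used, which is the waste the paragraph above refers to.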

The solution was to move part of the bits of the vertical cursor, initially held in the high byte of the address, to the low byte (where there is room for 3 bits). The least significant bits of the high byte cannot be moved, because they are the ones that make action (Y) possible (we are not going to take them away from there!), so the most significant ones had to be moved. As a side effect, this frees bits 15, 14 and 13 of the address (the top three bits of its high byte) from any use; if they are fixed to the value 010 (binary), the first address of the screen bitmap, where both cursors are 0, is exactly 16384 (every address in the bitmap has those 3 most significant bits set to that value). The design would look like this:

If the 3 most significant bits of the vertical cursor (that is, its bits 5, 6 and 7) were moved to the low byte of the address, the high byte would keep 5 useful bits (besides the constant 010). The number held in those 5 bits can take 32 values before wrapping around, which means that action (Y) could cover 4 lines of text characters before wrapping and thus before having to touch the low byte of the address (the 3 bits moved there) to continue.

The problem is that, once the 32 possible values of the horizontal cursor (X) are exhausted, the carry would change the highest bits of the vertical one (Y7, Y6, Y5), so adding one at the last horizontal column (column 31) would jump 4 character lines further down, which is rather odd and not very useful.

Ideally, incrementing past the last column of the screen should move the vertical cursor to the next line of text. To achieve this, the system designers decided to move bits 3, 4 and 5 of the vertical cursor to the low byte of the address instead of bits 5, 6 and 7. That way the part of the vertical cursor kept in the high byte wraps around after performing action (Y) only 8 times (which is enough to reach every bitmap byte of 1 character), but we get the more natural move to the next character line when we reach the last column:

This design has a somewhat unexpected effect: if we consider which parts of the screen correspond to blocks of characters that are contiguous in memory, the bitmap is divided into 3 contiguous sections of 8 character lines each, one for each of the 3 values that bits 6 and 7 of the vertical cursor can hold (they can never be 11 in binary, because the screen has only 192 pixel rows, not 256). This is why, when a screen is loaded from tape, it appears with its pixel rows filled in such a strange order.
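The final bit layout described above can be written down as a small address function. This is a Python sketch for checking the arithmetic on a modern machine, not Spectrum code; the name `bitmap_address` is illustrative.

```python
def bitmap_address(pixel_row, column):
    """Address of the bitmap byte at pixel row 0..191, character column 0..31.

    Bit layout of the 16-bit address, as described in the text:
        high byte: 0 1 0 Y7 Y6 Y2 Y1 Y0
        low byte:  Y5 Y4 Y3 X4 X3 X2 X1 X0
    """
    high = 0x40 | ((pixel_row & 0xC0) >> 3) | (pixel_row & 0x07)
    low = ((pixel_row & 0x38) << 2) | column
    return (high << 8) | low


print(bitmap_address(0, 0))     # 16384, first byte of the bitmap
print(bitmap_address(1, 0))     # 16640, action (Y): high byte + 1
print(bitmap_address(0, 1))     # 16385, action (X): low byte + 1
print(bitmap_address(8, 0))     # 16416, first pixel row of the next text line
print(bitmap_address(64, 0))    # 18432, start of the second section
print(bitmap_address(191, 31))  # 22527, last byte of the bitmap
```

Note that `bitmap_address(0, 31) + 1` equals `bitmap_address(8, 0)`: incrementing past the last column lands on the next character line, which is the property the designers were after.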

It also has some significant drawbacks. To begin with, it requires complicated computations for operations other than (X) and (Y), as explained here, for example. It also limits placing medium-sized blocks of characters on the screen and scrolling with the DEFADD trick explained in the previous post. All things considered, however, the advantages (writing text on the screen as fast as possible) reasonably outweigh these drawbacks.

Everything explained here can be visualized with this very simple BASIC program, which writes values to the bitmap one address after another so that it is easy to see how they spread over the image; it also shows that the attribute map is linear and much simpler to understand:

10 BORDER 1: CLS
20 REM *** FILL ATTRMAP ***
30 FOR b=22528 TO 23295: LET indbl=INT ((b-22528)/256): POKE b,(indbl+2)*8+(indbl+5): NEXT b
40 REM *** FILL BITMAP ***
50 FOR a=16384 TO 22527
60 POKE a,255
70 NEXT a
80 BEEP 1,1: PAUSE 0
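What the listing does can be predicted off-line. The Python sketch below (an illustration on a modern machine, with the hypothetical helper `pixel_row_of` inverting the bit layout described earlier) computes the attribute value that line 30 pokes into each 256-byte block and the pixel rows reached by the first addresses that lines 50 to 70 fill.

```python
def pixel_row_of(address):
    """Pixel row (0..191) displayed by a bitmap address (inverse layout)."""
    high, low = address >> 8, address & 0xFF
    return ((high & 0x18) << 3) | ((low & 0xE0) >> 2) | (high & 0x07)


# Line 30: one attribute value (paper * 8 + ink) per 256-byte block.
attrs = [(block + 2) * 8 + (block + 5) for block in range(3)]
print(attrs)  # [21, 30, 39]

# Lines 50-70: pixel row hit at the start of each of the first 32-byte runs.
order = [pixel_row_of(a) for a in range(16384, 16384 + 10 * 32, 32)]
print(order)  # [0, 8, 16, 24, 32, 40, 48, 56, 1, 9]
```

The second list shows why the fill looks interleaved: consecutive 32-byte runs land 8 pixel rows apart, and only after 8 of them does the fill come back to pixel row 1.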

Writing characters

ZX Spectrum BASIC has statements for writing characters on the screen that are implemented in ROM and hide the details explained in the previous section from the programmer. Speeding up these operations is nevertheless important, because they are essential in almost any program.

If we have to print, say, a sprite several characters high and wide (typically made of user-defined graphics, or UDGs), two nested FOR loops do not look like a good idea, given all the extra computation, jumps and program memory they involve. It would be much better to embed in the sprite control characters that move the print cursor automatically as the sprite is printed. In fact, the ZX ROM processes such control codes embedded in a text string much faster than the program could execute the equivalent explicit statements.

Alas, the ZX interpreter does not implement all the directional cursor control characters (the arrows) properly. The left and right controls have some bugs, and the up and down ones do not work. Only if we always want to draw at the same absolute screen position can we do it quickly, using the AT control character (22). In every other case, the slowness of the loops that draw the sprite can only be mitigated with techniques such as loop unrolling, explained in the first instalment of this series.

Here is an example program that, when run, illustrates some of the problems of the ROM interpreter with the directional control characters:

10 BORDER 1: CLS
20 PRINT AT 10,10; PAPER 5;"0"; PAPER 7;
25 PAUSE 0
30 PRINT CHR$ 9;"R";
35 PAUSE 0
40 PRINT CHR$ 8;CHR$ 8;"L";
45 PAUSE 0
50 PRINT CHR$ 9;CHR$ 11;"U";
55 PAUSE 0
60 PRINT CHR$ 10;CHR$ 10;"D";
65 PAUSE 0
70 PRINT CHR$ 13;"ENTER"
75  PAUSE 0

What is practical to include in the sprite are color control codes, so that it is printed with the desired attributes without using INK, PAPER, etc. inside the PRINT statement, which would take quite a lot of space and also run more slowly. The drawback is that this spoils the readability of the source code when it is viewed in the original ZX code editor.

To estimate how long it would take to draw a sprite of given dimensions, the following program can be run:

5 LET s$="12345678901234567890": LET n=100
10 FOR x=32 TO 16 STEP -1
11 POKE 23672,0: POKE 23673,0: POKE 23674,0
15 FOR f=1 TO n
20 FOR y=0 TO 32-x: PRINT AT y,x-1;s$(1 TO 32-x+1): NEXT y
21 NEXT f
25 LET T=PEEK 23672+256*PEEK 23673+65536*PEEK 23674: PRINT AT 32-x,0;(32-x+1);" ";(T*0.02/n)
30 NEXT x

This program repeatedly prints square blocks of characters of several sizes on the screen, recording the time taken for each size. The resulting average times are shown in the following plot:

The behavior can be split into three additive components: the main one, clearly quadratic (470 microseconds times the total number of characters to print), comes from the printing itself; the secondary one, linear (17 milliseconds times the height of the sprite in characters), is produced by the loop in line 20, which prints each row of characters; the last one, constant (the vertical offset of about 12 milliseconds), is due to the remaining loop work, expression evaluation, etc., which is the same for any sprite size.

As the plot shows, in general only 1 character can be printed if we want an expected rate of 25 fps or higher (and if we also have to erase the sprite, the rate drops to half); with 2x2 sprites (without color control codes, which would increase the size) we would expect to reach only about 20 fps; a little under 8 fps with 6x6 sprites (equivalent to 2x2 sprites with ink colors added to each character); and so on. In practice, sprites larger than 2x2 with colors included take more time than is available in most situations.

Note that the data in the plot are useful not only for estimating how long an average square sprite takes to print, but for any set of characters we want to print, whatever its dimensions. For example, printing a single horizontal line of 32 characters (without color controls) has an expected main time of 470 microseconds x 32 = 15.04 milliseconds if we ignore the constant operations (expressions and so on), since no multi-line printing loop is needed and the linear component is saved. Another example is printing the whole screen (say, 24 x 32 characters, counting a reasonable number of control codes among them) with a single PRINT statement, which we would expect to take mainly 470 microseconds x 24 x 32 = 360.96 milliseconds or, if done line by line with a FOR loop, 0.470 x 24 x 32 + 17 x 24 = 768.96 milliseconds.
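The three-component model can be turned into a small estimator. The Python sketch below simply encodes the coefficients quoted in the text (470 microseconds per character, 17 milliseconds per looped row, about 12 milliseconds of fixed overhead); the function names and the `looped` and `overhead` switches are illustrative assumptions, and the results are estimates from the fitted model, not new measurements.

```python
US_PER_CHAR = 470    # microseconds per printed character
MS_PER_ROW = 17      # milliseconds per iteration of the row loop
MS_CONSTANT = 12     # milliseconds of fixed overhead


def print_time_ms(width, height, looped=True, overhead=True):
    """Estimated time to print a width x height block of characters."""
    total = US_PER_CHAR / 1000 * width * height
    if looped:
        total += MS_PER_ROW * height
    if overhead:
        total += MS_CONSTANT
    return total


def fps(milliseconds):
    return 1000 / milliseconds


for size in (1, 2, 6):
    t = print_time_ms(size, size)
    print(size, round(t, 2), round(fps(t), 1))
# 1 29.47 33.9
# 2 47.88 20.9
# 6 130.92 7.6

# The two examples from the text (main components only).
print(round(print_time_ms(32, 1, looped=False, overhead=False), 2))   # 15.04
print(round(print_time_ms(32, 24, looped=False, overhead=False), 2))  # 360.96
print(round(print_time_ms(32, 24, looped=True, overhead=False), 2))   # 768.96
```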

There is a small improvement in character printing that the BASIC game programmer @igNaCoBo has used in his game ArkanoidB2B: ending the PRINT statement with a semicolon (;) so that the interpreter does not perform a line feed, as it does by default unless it finds that symbol at the end. Performing the line feed takes longer than interpreting the semicolon in its place. Specifically, line 20 of the previous BASIC program can be modified so that the PRINT statement ends with a semicolon, and the (shorter) times of printing characters that way can be measured. The time differences with respect to the original program are shown in the following plots:

On average, a PRINT statement takes 364 microseconds longer to perform the line feed than to process the semicolon (bottom histogram). This may not seem much, but if the statement is repeated several times (as happens when printing sprites) the gain accumulates linearly, as the top figure shows (the middle figure shows the same information as the histogram, but for each sprite size tested).

Besides this small improvement in computation time, there is a particular opportunity for efficiency when what is printed is a run of contiguous spaces: if there are more than 3 of them and the column where the last one must end is known, it is better to use the TAB control character, which automatically prints spaces until the given column is reached. The analysis tool of ZX-Basicus (-a) can help here, because it lists the text literals that contain contiguous spaces.

The COMMA control character can serve the same purpose: it inserts spaces automatically until the next horizontal half of the screen is reached, relative to the current cursor position.

Of course, the whole screen should not be cleared with anything but CLS, or filled with anything but a single text literal 32*24 characters long (that is, one that covers the whole visible area or, at least, everything that is to be printed). That literal can be optimized with TAB or COMMA if it contains contiguous spaces.

Following on from the previous paragraph, there is a trick for quickly filling the whole screen with any character: temporarily redefining the space character for the system; when enough TABs or COMMAs are written, the interpreter prints on the screen not spaces but the character we have defined. For example, to fill the whole screen with bricks very quickly it is enough to run this program (the 44 commas in line 30 are the ones that fill the 22 lines of the screen with spaces; the POKEs at 23606 and 23607 tell the system to use a new character set whose first character, that is, the space, is located at UDG "A", which we define in line 20):

10 BORDER 0: PAPER 7: INK 0: CLS
20 RESTORE 20: FOR f=0 TO 7: READ b: POKE USR "a"+f,b: NEXT f: DATA 255,8,8,8,255,128,128,128
30 LET d=USR "a"-256: RANDOMIZE d: POKE 23606,PEEK 23670: POKE 23607,PEEK 23671: PRINT INK 6; PAPER 2; AT 0,0;,,,,,,,,,,, ,,,,,,,,,,, ,,,,,,,,,,, ,,,,,,,,,,,;: PAUSE 0: CLS : POKE 23606,0: POKE 23607,60
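Two pieces of arithmetic hide in line 30: why 44 commas cover 22 screen lines, and why 256 is subtracted from the UDG address. The Python sketch below checks both on a modern machine; the function names are illustrative, and the comma rule (pad to the next multiple of 16 columns) is an assumption taken from the description of COMMA given above.

```python
def spaces_for_comma(column):
    """Spaces emitted by a comma control starting at the given column."""
    return 16 - column % 16


def chars_pointer(space_glyph_address):
    """Value to store in 23606/23607 so the space glyph is at that address.

    The space has character code 32 and each glyph takes 8 bytes, so the
    pointer must sit 32 * 8 = 256 bytes below the space glyph.
    """
    return space_glyph_address - 32 * 8


column, total = 0, 0
for _ in range(44):
    n = spaces_for_comma(column)
    total += n
    column = (column + n) % 32
print(total, total // 32)  # 704 22

# The standard ROM character set has its space glyph at 15616, which
# matches the values restored at the end of line 30 (0 and 60).
pointer = chars_pointer(15616)
print(pointer % 256, pointer // 256)  # 0 60
```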

Finally, the code synthesizer included in the ZX-Basicus tool can compress screens (or parts of them) using the tricks explained above (option --comprscr). This utility automatically generates the binary data of the screens (definitions of the required character sets, data for defining the variable that will hold the screen by means of the DEFADD trick described in the previous post) and the BASIC code that can display them.

Reading characters

As for examining the screen contents with ATTR (attribute map) and SCREEN$ (bitmap), their ROM implementations are about as efficient as they can be, so, as long as their arguments are not very complex expressions, there is little room to speed them up.

ATTR should be preferred, though: it is faster because it does not have to recognize the character being examined from its bitmap on the screen. This can be checked with the following program, which takes about 2 minutes to run and produces a time measurement:

10 POKE 23672,0: POKE 23673,0: POKE 23674,0: FOR f=1 TO 10000: LET c$=SCREEN$ (11,11): NEXT f: LET T=PEEK 23672+256*PEEK 23673+65536*PEEK 23674: PRINT "SCREEN$: ";T;"  ";T*0.02;"  ";(T*0.02/10000)

If we run it once with SCREEN$ and once, otherwise unchanged, with ATTR (assigned to a numeric variable, since ATTR returns a number), the difference between both times is exactly the difference between the two ways of checking the screen. The result is that, on average, SCREEN$ is about 2.6 milliseconds slower than ATTR (the precision of the program's time measurements is high: the averages obtained may vary, with about 95% probability, by +/-115.5 microseconds from the value shown).
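The timing arithmetic used by this and the earlier measurement programs can be summarized as follows. This is a Python sketch of the conversion only; the function names are illustrative, the 0.02 factor is the frame period the listings themselves use, and the sample byte values are made up to represent a run of about 2 minutes.

```python
def frame_count(byte0, byte1, byte2):
    """Combine the three counter bytes PEEKed from 23672, 23673 and 23674."""
    return byte0 + 256 * byte1 + 65536 * byte2


def seconds_per_iteration(frames, iterations, frame_period=0.02):
    """Average time of one loop iteration, from a frame count."""
    return frames * frame_period / iterations


# Hypothetical reading: 6000 frames, i.e. 120 seconds, over 10000 iterations.
frames = frame_count(112, 23, 0)
print(frames)                                          # 6000
print(round(seconds_per_iteration(frames, 10000), 6))  # 0.012
```

Because the counter only advances once per frame, averaging over many iterations is what gives the measurement its fine resolution.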

Writing enlarged text

This section was written in collaboration with @IvanBASIC, an expert programmer of pure BASIC games for the ZX Spectrum such as Pedro Pómez, Brain 8 or Rompetechos (they can be downloaded here), in which he has squeezed the most out of these and other tricks.

As explained in the first section of this post, the ZX is designed to write characters on the screen whose bitmaps are 8 x 8 pixels. Sometimes, however, it is useful to write with smaller or larger characters. The first case requires assembly code, but the second can be solved to some extent in BASIC with the LPRINT statement, originally designed to send data to the ZX Printer.

Among many other admirable things, the ZX has a general-purpose communication system able to handle very diverse devices, based on the concepts of "streams" and "channels". In short (it can be studied in more depth here), channels are analogous to communication ports to hardware devices together with their associated input/output routines, and streams are software connections that are opened and closed against those channels. There are 16 streams (#0 to #15) that can be assigned to 4 predefined channels ('S' -> upper part of the screen, 'K' -> lower part of the screen and keyboard, 'P' -> ZX Printer, 'R' -> internal use by the ROM); this list of channels can be extended by connecting external hardware devices.

When a ZX Spectrum 16K / 48K starts up, the following streams are open: #0 and #1 to channel 'K', #2 to 'S', and #3 to 'P' ('R' is not accessible from BASIC). Thus PRINT #0 or PRINT #1 send their arguments to the lower part of the screen; PRINT #2 to the upper part (a plain PRINT uses stream #2 by default); and PRINT #3 to the printer, which can also be achieved with the synonymous statement LPRINT.

Every time LPRINT has a line of text to send to the printer (32 characters at most, as on the screen), the output routine associated with channel 'P' processes the bitmap of that line of text (the printer cannot print colors, so attributes are ignored) and leaves the processed data in a memory buffer in RAM, from which the printer is supposed to fetch them. Since a line of text can have a bitmap of at most 8 pixel rows by 32 character columns, the printer buffer is 8 x 32 = 256 bytes long. If fewer characters are printed, a smaller portion of it is used.

The first question is how the bitmap of the text to print is stored in that buffer. The answer: the first (topmost) pixel row of the text bitmap is stored at the start of the buffer; the second pixel row is stored 32 bytes after the start of the first; and so on until the 8 pixel rows are done. If the text is shorter than 32 characters, the unused columns of the buffer are not written (they are skipped).
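The buffer layout just described amounts to storing the byte for pixel row r of character c at offset 32 * r + c. The Python sketch below models that on a modern machine; `fill_printer_buffer` is a hypothetical helper, not a ROM routine, and the glyph bytes are those of the letter "A" quoted at the end of this post.

```python
def fill_printer_buffer(glyphs):
    """Model of the 256-byte buffer for up to 32 glyphs of 8 bytes each."""
    buffer = [0] * 256
    for column, glyph in enumerate(glyphs):
        for pixel_row, byte in enumerate(glyph):
            buffer[32 * pixel_row + column] = byte
    return buffer


LETTER_A = [0, 60, 66, 66, 126, 66, 66, 0]

buf = fill_printer_buffer([LETTER_A, LETTER_A])
print(buf[32:34])    # [60, 60], second pixel row of both characters
print(buf[128:130])  # [126, 126], fifth pixel row of both characters
print(buf[34])       # 0, a column the two-character text never touches
```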

The second question is how to redirect that ROM routine so that it stores the data somewhere else (for example, in screen memory). This is very easy to do in BASIC, because there is a system variable, PR-CC, located at RAM addresses 23680 and 23681, that stores where the printer buffer starts (low and high byte of its start address, respectively). Initially that variable holds the values 0 and 91, which form the address 23296, since 0 + 91 * 256 = 23296, but they can be changed with POKE without any problem. CAUTION: after each LPRINT statement, the system automatically resets PR-CC to the original address, 23296 (which places the printer buffer again right after the screen attribute map and right before the system variables area, as can be read in the ZX Spectrum 16K / 48K manual).
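Splitting a target address into the two values to POKE is a matter of integer division by 256. The Python sketch below shows the arithmetic; `pokes_for_buffer` is an illustrative name, and the addresses tried are the ones that appear in this post.

```python
PR_CC = 23680  # low byte at 23680, high byte at 23681


def pokes_for_buffer(address):
    """(address, value) pairs to POKE so the buffer starts at `address`."""
    low, high = address % 256, address // 256
    return [(PR_CC, low), (PR_CC + 1, high)]


print(pokes_for_buffer(23296))  # [(23680, 0), (23681, 91)], the default
print(pokes_for_buffer(18432))  # [(23680, 0), (23681, 72)], bitmap, 2nd section
print(pokes_for_buffer(22784))  # [(23680, 0), (23681, 89)], attribute map
```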

The LPRINT trick for writing vertically enlarged characters consists of redirecting the printer buffer to the screen bitmap before executing the statement. Because of the way the screen bitmap is organized and the way data are written to the printer buffer, if the buffer is redirected to the first pixel row of some cell of the screen, the text is written on the screen with gaps of 7 pixel rows in between, as this very simple program shows:

Note that line 10 of the program places the printer buffer at address 0 + 72 * 256 = 18432, which is the start of the second of the three sections that make up the screen bitmap, as explained in the first section of this post.

The program above illustrates that LPRINT sends the first pixel row of the text bitmap to the address the buffer points to, then adds 32 bytes to the address where it started writing that row and writes the second pixel row of the text, and so on; these jumps in the printer buffer amount to vertical jumps of 8 pixel rows on the screen, that is, each one moves to the next character line.
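The 8-row jumps can be verified by mapping each buffer row back to a screen pixel row. The Python sketch below does so for a buffer redirected to 18432; `pixel_row_of` is a hypothetical helper that inverts the bitmap address layout from the first section, and it runs on a modern machine, not on the Spectrum.

```python
def pixel_row_of(address):
    """Pixel row (0..191) displayed by a bitmap address."""
    high, low = address >> 8, address & 0xFF
    return ((high & 0x18) << 3) | ((low & 0xE0) >> 2) | (high & 0x07)


base = 72 * 256  # 18432, start of the second section of the bitmap

# Screen pixel row where each of the 8 rows of the text bitmap lands.
rows = [pixel_row_of(base + 32 * text_row) for text_row in range(8)]
print(rows)  # [64, 72, 80, 88, 96, 104, 112, 120]
```

Consecutive rows of the text end up 8 pixel rows apart, which leaves the 7-row gaps mentioned above.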

Therefore, if we repeat LPRINT with the same text but place the printer buffer at each of the 8 bitmap addresses of the first cell or, equivalently, add 1 each time to the high byte of the screen bitmap address (action (Y) from the first section of this post), we obtain the effect of "stretched characters".
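Why the 8 repetitions produce a stretched character can be checked by recording which row of the text bitmap ends up on each screen pixel row. The Python sketch below assumes high-byte values 72 to 79, as used in the listings of this section; `pixel_row_of` is again a hypothetical inverse of the bitmap layout.

```python
def pixel_row_of(address):
    """Pixel row (0..191) displayed by a bitmap address."""
    high, low = address >> 8, address & 0xFF
    return ((high & 0x18) << 3) | ((low & 0xE0) >> 2) | (high & 0x07)


# screen pixel row -> row of the text bitmap written there
coverage = {}
for high in range(72, 80):            # one LPRINT per high-byte value
    for text_row in range(8):         # 8 pixel rows of the text bitmap
        coverage[pixel_row_of(high * 256 + 32 * text_row)] = text_row

print(sorted(coverage) == list(range(64, 128)))  # True: 64 rows, no gaps
print([coverage[r] for r in range(64, 80)])
# [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
```

Each row of the original glyph is repeated on 8 consecutive screen rows, so the text comes out 8 times taller and exactly as wide as before.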

The following program does exactly that, changing the buffer address before each repetition of LPRINT by adding 1 to its high byte, stored in the system variable PR-CC (at 23681). Note that the enlarged characters are 8 times the original height and 1 time the original width:

Of course, we can print at another column by redirecting the buffer to the first pixel row of a different cell through action (X), that is, by incrementing the low byte of its address (the one stored in the low byte of PR-CC, at 23680):

If during printing the buffer covers columns equal to or greater than 32, the LPRINT trick produces something like a rotation within the block of characters being printed, since the bytes that fall "outside" wrap around to the left of the screen, somewhat lower, as can be seen by running this program, which sets the low byte of the printer buffer address to values that include some greater than 31 (variable b):

5 FOR b=10 TO 255
10 FOR a=72 TO 79
20 POKE 23680,b: POKE 23681,a
30 LPRINT " A B C D "
35 NEXT a: PAUSE 10: NEXT b

To close this section, note that each LPRINT in this trick runs at a speed similar to that of PRINT; the trick simply executes it 8 times. In fact, printing the same text 8 times with either statement shows no difference, as this program demonstrates: it writes a text of length 9 and measures 0.18 seconds in both cases, that is, about 20 milliseconds per character for the 8 repetitions (including the execution time of the FOR):

1 LET t0=PEEK 23672+256*PEEK 23673+65536*PEEK 23674
10 FOR a=72 TO 79
20 POKE 23680,0: POKE 23681,a
30 PRINT AT 10,0;" A B C D "
35 NEXT a
40 LET t1=PEEK 23672+256*PEEK 23673+65536*PEEK 23674
50 PRINT AT 0,0;(t1-t0)*0.020;" secs for PRINT"
101 LET t0=PEEK 23672+256*PEEK 23673+65536*PEEK 23674
110 FOR a=72 TO 79
120 POKE 23680,0: POKE 23681,a
130 LPRINT " A B C D "
135 NEXT a
140 LET t1=PEEK 23672+256*PEEK 23673+65536*PEEK 23674
150 PRINT AT 1,0;(t1-t0)*0.020;" secs for LPRINT"

Esta velocidad de impresión puede mejorarse ligeramente usando el mismo truco del punto y coma que comentamos en el apartado de impresión de caracteres, lo que me ha señalado también @igNaCoBo. Si ponemos un punto y coma al final de la sentencia LPRINT, ésta no hace el trabajo de cambiar de línea, que en su caso consiste en vaciar el buffer de impresión, por lo que mejora sus tiempos. El buffer será reescrito con el siguiente LPRINT.

Escribir atributos de color []

Este apartado ha sido elaborado en colaboración con @IvanBASIC, experto programador de juegos en BASIC puro para el ZX Spectrum, como Pedro Pómez, Brain 8 o Rompetechos (aquí pueden descargarse), en los que ha exprimido al máximo éstos y otros trucos.

The fact that the ZX Spectrum screen memory is divided into two independently stored parts, the bitmap and the attribute map, as explained in the first section of this post, makes it possible to carry out graphic operations on just one of them. Thus, many programs manipulate only the bitmap, saving the valuable time that would be spent keeping the attributes updated too (something very common in the first games developed for the Spectrum, but also in those written today in pure BASIC).

Although it is rarer, it is also possible to manipulate only the attribute map; in exchange for a considerable loss of resolution (from 192 x 256 pixels to 24 x 32 cells), a proportionally large increase in speed is obtained. With a lot of creativity, the resulting effect can be interesting. To give a very basic example: using the flash bit of the attributes, a couple of signs can be animated on the screen, as the wonderful “Manic Miner” did in 1983.

Unfortunately, there is no BASIC statement (other than POKE) for manipulating sets of attributes in real time, which reduces efficiency because it forces us to use FOR loops, which are very slow, or complicates the code considerably if techniques such as loop unrolling, explained in other posts of this series, are used.

In spite of that, the system can be fooled with tricks such as the DEFADD one described in the previous post. In this post we add to our arsenal a trick based on LPRINT, similar to the one explained in the previous section, in order to write sets of attributes on the screen very quickly. Next we explain how it is done and what limitations it has.

If we redirect the printer buffer to an address in the attribute map, for example to the attribute cell located at 22784 (that is, if we do POKE 23680,0 and POKE 23681,89, since 0 + 89 * 256 = 22784), the bytes that LPRINT sends to the buffer will be stored as colour attributes:
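The listing for this experiment was an image in the original post. A minimal sketch that matches the description, assuming a 48K Spectrum with no printer attached (the line numbers are arbitrary), could be:

```basic
10 REM point the printer buffer at attribute address 22784
20 POKE 23680,0: POKE 23681,89
30 REM the bitmaps of the 18 characters land in the attribute map
40 LPRINT "Esto no colorea..."
```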

However, although the filling of 8 x 18 attribute cells is done extremely quickly (a single LPRINT statement), the result does not make much sense. This is because LPRINT writes into the printer buffer the bytes of the bitmap of the 18-character text "Esto no colorea..." ("This does not colour..."). Those bytes, when interpreted as colour attributes, produce nothing useful: they were not numbers meant to represent colour attributes, but character bitmaps.

In the image above, each vertical strip of 8 colour attributes corresponds to the bitmap of one of the characters of the text; for example, the first vertical strip holds the 8 bytes of the bitmap of the letter "E" displayed as colour attributes. If instead of "E" we had done LPRINT "A" using this trick, the attributes written would be the bytes that define the bitmap of the letter "A", which are 0, 60, 66, 66, 126, 66, 66 and 0:
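The corresponding listing was also an image. The following is only a reconstruction sketch: the role of line 6 (writing something to the bitmap first) comes from the text, while the characters printed there and the remaining line numbers are assumptions.

```basic
6 FOR r=8 TO 15: PRINT AT r,0;"########": NEXT r
10 REM attribute cell of character row 8, column 0 (address 22784)
20 POKE 23680,0: POKE 23681,89
30 REM the 8 bytes of "A" become the attributes of rows 8 to 15
40 LPRINT "A"
```

Only column 0 of rows 8 to 15 should change colour; the "#" shapes themselves are left untouched.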

This BASIC program first writes something in the bitmap (line 6 of the program); as can be seen, the attributes are manipulated independently of that bitmap, so only the colours of what is there change, not the shapes.

It is clear, therefore, that for this trick to produce something interesting the bytes of the bitmap of the text we print must be the colour attributes we want to store on the screen. In other words, the text to print must be made of characters designed by us so that their bitmaps match the attributes we want. In BASIC, this can be done with user-defined graphics (UDGs).

For example, if we want to make a “Spectrum rainbow” on the screen by manipulating only the colour attributes, using paper colours from 0 to 7 vertically and repeating them horizontally along the whole screen, the 8 attributes to write in each column of the attribute map would be 0, 8, 16, 24, 32, 40, 48 and 56. We can therefore define the UDG “A” with those numbers and LPRINT that new character repeated 32 times, as shown in the following program (the strange character in line 20 is precisely the shape or bitmap of the UDG "A" defined in line 6; it looks like that because the image was captured after running the program):
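Since the original listing was an image, here is a hedged sketch of the same idea. It keeps the UDG definition in line 6 and the LPRINT in line 20, but builds the 32 UDGs with CHR$ 144 (the code of UDG "A") instead of typing them in graphics mode, and it repeats the LPRINT for the three thirds of the attribute map; both choices are assumptions, not the author's code.

```basic
6 FOR f=0 TO 7: POKE USR "a"+f,8*f: NEXT f
10 REM a$ = UDG "A" repeated 32 times (one full text line)
12 LET a$="": FOR i=1 TO 32: LET a$=a$+CHR$ 144: NEXT i
15 REM h = higher byte of PR-CC: 88, 89 and 90 are the three thirds
18 FOR h=88 TO 90
20 POKE 23680,0: POKE 23681,h: LPRINT a$
25 NEXT h
```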

A horizontal scroll can be obtained by rotating the text string that is printed, as the following program does; some attributes different from the rainbow have been inserted so that the scroll is noticeable:
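A possible sketch of such a program follows (the original was an image). The second UDG, "B", with the rainbow upside down, is a hypothetical choice made here just to have some columns that stand out; stop the program with BREAK.

```basic
6 FOR f=0 TO 7: POKE USR "a"+f,8*f: POKE USR "b"+f,56-8*f: NEXT f
10 REM 28 rainbow columns followed by 4 inverted ones
12 LET a$="": FOR i=1 TO 28: LET a$=a$+CHR$ 144: NEXT i
14 LET a$=a$+CHR$ 145+CHR$ 145+CHR$ 145+CHR$ 145
20 POKE 23680,0: POKE 23681,88: LPRINT a$
25 REM rotate the string one character to the left and repeat
30 LET a$=a$(2 TO )+a$(1)
40 GO TO 20
```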

Bear in mind that with the POKE that changes the higher byte of the system variable PR-CC (23681) we can only point the printer buffer to three different sections of the attribute map (POKE values 88, 89 and 90, respectively). Fortunately, and unlike in the bitmap, we can set the lower byte (23680) to any value from 0 to 255; that value modulo 32 (from 0 to 31) sets the horizontal screen position where the attribute is printed, and the integer quotient of the value divided by 32, together with the higher byte, sets the row and therefore the exact cell where the vertical strip of 8 attributes starts (be careful not to run off the attribute map!).

Moreover, a horizontal scroll of attributes can also be done by modifying the lower byte of PR-CC, as this program illustrates:
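As the original listing was an image, this is only a sketch of the technique. It moves an 8-column rainbow band to the right by increasing the lower byte of PR-CC; the leading UDG "B", filled with attribute 56 (white paper, black ink, assumed here to be the background), restores the column that the band leaves behind.

```basic
6 FOR f=0 TO 7: POKE USR "a"+f,8*f: POKE USR "b"+f,56: NEXT f
10 REM one restoring column plus 8 rainbow columns
12 LET a$=CHR$ 145: FOR i=1 TO 8: LET a$=a$+CHR$ 144: NEXT i
20 REM b = lower byte of PR-CC = column where the string starts
25 FOR b=0 TO 23
30 POKE 23680,b: POKE 23681,88: LPRINT a$
40 NEXT b
```

The loop stops at 23 so that the 9 characters never go past column 31.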

The main advantage of the LPRINT trick for attributes is its speed. Starting from the examples explained in this section you can experiment with other text strings and different values of the attribute map address pointed to by the printer buffer. In general, it takes a lot of practice to find effects or to program sceneries (maps) to move around in, and pencil and paper will be needed to make those designs, then turn them into numeric attribute values and finally create them as UDGs.

.oOo.


This is the fifth and last one in a series of posts that explain the foundations of the (in)efficiency of pure BASIC programs written for the ZX Spectrum:
I. On line numbers
II. On variables
III. On expressions
IV. Some statements and time measurements
V. Screen operations based on characters

In this last post of the series we will talk about the ZX Spectrum screen and how to speed up some character printing operations on it when programming in Sinclair BASIC. We do not cover pixel drawing here (PLOT, DRAW, …); for that, please refer to the previous post. The DEFADD trick to move memory blocks (useful for screen operations as well) was also described in that post.

To navigate through this post more easily, these are the sections it contains:

The screen and the printing of characters. On the layout of the display memory and why the efficiency in printing characters was so important for its design.
Printing characters. Sprites in BASIC, control characters, screen compression (with suggestions from the BASIC game programmer @igNaCoBo).
Reading characters. Execution times of SCREEN$ and ATTR.
Printing scaled text. How to print enlarged characters with the LPRINT trick (in collaboration with the BASIC game programmer @IvanBASIC).
Printing colour attributes only. How to manipulate the colour attributes of the screen independently from their characters with the LPRINT trick (in collaboration with the BASIC game programmer @IvanBASIC).

The screen and the printing of characters

Reading and writing the ZX Spectrum screen memory directly is not very common in BASIC, but it is important to know how that memory is organized in order to understand some techniques that can speed up programs (like the LPRINT trick explained further on, or the limitations of the DEFADD trick, explained in the previous post, when it is used on the screen). In older posts of this blog (such as this and that) I already explained some of the characteristics of ZX graphics; here I focus on why they have that particular memory organization and which operations are more efficient because of it.

The first decision made during the design of the original ZX graphic system was related to the importance of writing text on the screen and the lack of dedicated hardware to do so (the burden was on the CPU): to alleviate that burden, text characters would have a fixed size, namely 8 x 8 pixels.

The second decision had to do with the need to display a range of colours to honour the Spectrum name, unlike its black and white predecessor, the ZX81. Unfortunately, doing so for every pixel separately would have driven the price up to a prohibitive level. A sensible approach back then was to provide the minimum required for writing text, that is, to take blocks of 8 x 8 pixels as the spatial granularity for colour, using the smallest amount of memory for storing the colours of the pixels in those blocks, since most of the time they would be used to display text characters.

Consequently, just 1 byte was stored for each 8 x 8 pixel block, called the “attribute” of that screen “cell”, as I explained here and as illustrated in the figure below. The visible character was drawn on the TV display by combining that “attribute” with a bitmap of 8 bytes (shown on the left of the figure) whose bits are set to 0 for “paper colour” and to 1 for “ink colour”.

Because of this, the ZX Spectrum screen memory was split into two parts, stored separately but related to each other, which the hardware (specifically, the ULA) read periodically in order to refresh the TV display:

The bitmap, which stores the “shapes” or, if you prefer, the “black and white drawings” of all the characters that can be written on the screen (24 lines times 32 columns of characters). It starts at address 16384 (right after the end of the ROM). It occupies 6144 bytes (24 lines of characters times 8 pixels of height per character = 192 pixel rows, with 32 character columns of 1 byte each).
The attribute map, which stores the colour attributes for those characters. It starts right after the bitmap, at address 22528. It has 768 bytes (24 lines times 32 columns of colour attributes, 1 byte per attribute).

The third decision is the most important one for us in this series of posts, because it is responsible for the (in)efficiency of direct accesses to the screen memory when working with anything either smaller or larger than one text character. Since the CPU, that is, the software, was in charge of all character writing, that work had to be done as fast as possible. And where was the greatest time cost? In writing the 8 bytes of the bitmap of a character to the screen memory (its attribute is just 1 byte!). And which calculations were the most frequent ones when writing those bitmap bytes? Basically two, which we denote as (Y) and (X) for reasons that will become clear later on:

(Y) To calculate the memory address where the next byte of the bitmap of a given character is stored (that is, to go 1 pixel row down on the screen bitmap).
(X) To calculate the memory address where the bitmap of the next character of the text is stored (that is, to go 1 column to the right on the screen bitmap).

The former is needed to store the 8 bytes of a character bitmap in the screen memory so that the ULA displays them on the TV, and the latter to start doing the same for the next character of the text.

Both operations involve incrementing memory addresses, and memory addresses are non-negative integers. The fastest Z80 machine instructions for incrementing such numbers are the INC instructions, which add 1 to an integer; they are especially fast when the number fits into 8 bits. Therefore, if it were possible to do action (Y) by adding 1 to one 8-bit number and action (X) by adding 1 to another, writing text on the screen would be as fast as it could be.

Well, a Z80 memory address is a 16-bit unsigned integer, and the Z80 finds it extremely easy to treat most 16-bit numbers as two independent 8-bit numbers, called the most significant or “higher” byte of the address and the least significant or “lower” one. Consequently, it can do two independent 1-unit increments on a 16-bit memory address very fast. If incrementing the higher byte of the address moves one pixel row down within the character bitmap, and incrementing the lower byte moves one column to the right, where the next character of the text must be stored, everything is sorted out.

That is exactly what the screen memory designers of the ZX did.

With that design, we have 2 cursors that we can move independently on the screen bitmap: one is horizontal (character columns; it moves by incrementing the lower byte of the memory address; it can take 32 different values, the columns, which need 5 bits) and the other is vertical (pixel rows; it moves by incrementing the higher byte of the memory address; it can take 192 different values, which need 8 bits). That is, the higher byte would hold the vertical cursor bits Y7 Y6 Y5 Y4 Y3 Y2 Y1 Y0 and the lower byte would hold 0 0 0 X4 X3 X2 X1 X0.

However, this design uses only 5 bits of the lower byte of the memory address for action (X), so 3 bits of that byte are left unused, which would produce “gaps” in the screen memory (addresses that would never be used), and that must be avoided. Moreover, with this design the first byte of the screen bitmap would be located at address 0, which is not RAM…

The solution was to move part of the vertical cursor bits, stored in the higher byte of the memory address, to the lower byte (only 3 bits can be moved). We cannot move the least significant bits of the higher byte, because they are the ones involved in action (Y), thus we have to use the most significant bits. That has an added benefit: it frees bits 15, 14 and 13 of the memory address, which can be set to the binary value 010, so that the first screen bitmap address, where both cursors are 0, is exactly 16384 (all the bitmap addresses have the same value in those bits). The design now becomes: higher byte 0 1 0 Y4 Y3 Y2 Y1 Y0, lower byte Y7 Y6 Y5 X4 X3 X2 X1 X0.

Moving the 3 most significant bits of the vertical cursor (that is, its bits 5, 6 and 7) to the lower byte of the memory address leaves 5 bits of the higher byte for that cursor (in addition to the constant bits 010). Those 5 bits can take 32 values before they overflow, which means that, by repeating action (Y), you could cover 4 rows of text before having to modify the 3 bits that we moved to the lower byte.

The problem is that, once the 32 possible values of the horizontal cursor (X) are used up, the carry changes the higher bits of the vertical cursor (Y7, Y6, Y5), which means that, after incrementing past the last horizontal column (31), we jump 4 rows of characters below the current one, which is odd and of little use.

It would be much more natural if, after incrementing past the last column of the screen, the vertical cursor went to the next character row below. To achieve that, the designers decided to move bits 3, 4 and 5 of the vertical cursor to the lower byte of the memory address, instead of bits 5, 6 and 7. In that way, the part of the vertical cursor kept in the higher byte overflows after doing action (Y) only 8 times (which is enough to access all the bytes of the bitmap of one character), but a natural change of character line is achieved when reaching the last column. The final layout is: higher byte 0 1 0 Y7 Y6 Y2 Y1 Y0, lower byte Y5 Y4 Y3 X4 X3 X2 X1 X0.
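As a hedged illustration of this layout, the following sketch computes the bitmap address of a cell from its character row and column, and then fills the cell by repeating action (Y), that is, by adding 256 (one unit in the higher byte) eight times. The variable names and the chosen cell are arbitrary; the formula is just the bit layout described above written as arithmetic.

```basic
10 REM cell to fill: character row r (0-23), column c (0-31)
20 LET r=10: LET c=5
30 REM t = third of the screen (bits 7 and 6 of the pixel row)
40 LET t=INT (r/8)
50 REM 16384 (bits 010) + third + row inside the third + column
60 LET a=16384+2048*t+32*(r-8*t)+c
70 REM action (Y): next pixel row = higher byte + 1 = address + 256
80 FOR n=0 TO 7: POKE a+256*n,255: NEXT n
90 REM the attribute map is linear: 32 bytes per character row
100 POKE 22528+32*r+c,8*2+6
```

Line 100 gives the cell red paper and yellow ink (attribute = 8 * paper + ink), which shows that the colour of the cell is stored independently of its 8 bitmap bytes.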

This design has an unexpected effect: in terms of which parts of the screen are contiguous in memory, the bitmap is split into 3 contiguous sections of 8 character rows each, corresponding to the 3 values that bits 6 and 7 of the vertical cursor can take (they cannot take the binary value 11 because there are only 192 pixel lines on the screen, not 256). This is the reason why a screen loaded from tape appears on the TV display in such a weird sequence of pixel lines.

It also has a couple of important drawbacks. To begin with, it requires more complicated calculations for operations other than (X) and (Y), as explained, for instance, here. Also, it limits the fast memory copies of the DEFADD trick, explained in the previous post of this series, when they are used to put graphic blocks on the screen or to scroll. But all in all, the advantages for fast character writing outweigh these drawbacks.

Everything explained in this section can be seen with this simple BASIC program, which writes to the bitmap sequentially so that you can watch how consecutive addresses are distributed on the screen. The program also illustrates the much simpler linear organization of the attribute map:

10 BORDER 1: CLS
20 REM *** FILL ATTRMAP ***
30 FOR b=22528 TO 23295: LET indbl=INT ((b-22528)/256): POKE b,(indbl+2)*8+(indbl+5): NEXT b
40 REM *** FILL BITMAP ***
50 FOR a=16384 TO 22527
60 POKE a,255
70 NEXT a
80 BEEP 1,1: PAUSE 0

Printing characters

The ZX Spectrum BASIC language has statements to print characters on the screen that hide the details explained in the section above. Nevertheless, printing characters on the screen is essential in any BASIC program for the ZX, so it is worth understanding the (in)efficiency of those statements and giving some hints to accelerate them.

When printing sprites, i.e., contiguous sets of characters with a certain width and height (usually composed of UDGs), it is not convenient to use FOR to scan and print every character, because of the extra computation time and program space in memory; instead, you may think of embedding, among the characters of the sprite, some that control the printing position. The ZX ROM processes this kind of embedded control codes much faster than the equivalent explicit statements.

Unfortunately, the interpreter does not implement correctly the control characters in charge of directional movements (arrows). The left/right controls have some bugs, and the up/down ones do not work. Only if we always print at the same absolute place on the screen can we do it fast through the control character AT (22). In the rest of cases, coping with the lack of directional control characters requires relying on loop unrolling techniques, which we explained in previous posts.

This short program illustrates some of the problems the interpreter has with the directional control characters:

5 REM each PAUSE 0 waits for a key press between tests
10 BORDER 1: CLS
20 PRINT AT 10,10; PAPER 5;"0"; PAPER 7;
25 PAUSE 0
30 PRINT CHR$ 9;"R";
35 PAUSE 0
40 PRINT CHR$ 8;CHR$ 8;"L";
45 PAUSE 0
50 PRINT CHR$ 9;CHR$ 11;"U";
55 PAUSE 0
60 PRINT CHR$ 10;CHR$ 10;"D";
65 PAUSE 0
70 PRINT CHR$ 13;"ENTER"
75 PAUSE 0

There is, however, a practical use of control characters when printing sprites: colour control (and bright, flash, etc.). With them we can avoid the INK, PAPER, etc. statements, which are much slower to run and require evaluating additional expressions (their arguments). We also save program memory. The drawback is that the program listing gets messed up when viewed in the ZX editor, but that has no effect during execution.
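As an illustration, here is a minimal sketch of a 2 x 2 sprite stored in a single string, with its position and colours embedded as control characters. The control codes used are AT (22, followed by row and column), INK (16) and PAPER (17), each followed by its argument; the letters stand in for UDGs, and the variable name and the screen position are arbitrary choices.

```basic
10 REM top row: AT 10,10; INK 2; PAPER 6; then two characters
20 LET s$=CHR$ 22+CHR$ 10+CHR$ 10+CHR$ 16+CHR$ 2+CHR$ 17+CHR$ 6+"AB"
30 REM bottom row: AT 11,10; INK 1; the paper set above is kept
40 LET s$=s$+CHR$ 22+CHR$ 11+CHR$ 10+CHR$ 16+CHR$ 1+"CD"
50 REM a single PRINT draws the sprite: no FOR, no INK or PAPER
60 PRINT s$;
```

The string would be built only once, outside the main loop of the game; in a real program the control codes are usually typed directly into the string literal, which is what messes up the listing.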

To get an estimate of the time spent in printing a sprite of certain size, you can run this program:

4 REM average time to print square blocks of side 1 to 17
5 LET s$="12345678901234567890": LET n=100
10 FOR x=32 TO 16 STEP -1
11 POKE 23672,0: POKE 23673,0: POKE 23674,0
15 FOR f=1 TO n
20 FOR y=0 TO 32-x: PRINT AT y,x-1;s$(1 TO 32-x+1): NEXT y
21 NEXT f
25 LET T=PEEK 23672+256*PEEK 23673+65536*PEEK 23674: PRINT AT 32-x,0;(32-x+1);" ";(T*0.02/n)
30 NEXT x

The program repeatedly prints square blocks of characters of various sizes on the screen, measuring the time taken for each size. The resulting average times are shown in this graph:

This behaviour can be split into three additive components: the main one, clearly quadratic in the side of the square (470 microseconds times the total number of characters to print), comes from the printing itself; the secondary one, linear (17 milliseconds per row of characters), is produced by the loop in line 20, in charge of printing each row of characters; the last one, constant (the vertical offset of about 12 milliseconds), is due to the rest of the work in the loop (expression evaluations, etc.), which is always the same regardless of the size of the sprite.

As you can see in the graph, to get 25 fps or more you can only print one character (not considering the time to erase the sprite, which halves that frequency); with 2×2 sprites (without colour control, which would increase their size) you can only get 20 fps; with 6×6 sprites (or their equivalent: 2×2 sprites with ink colour controls inserted for each character) you fall below 8 fps; and so on. In practice, sprites larger than 2×2 with embedded colour control characters take more time than is available in most situations.

Notice that the data shown in the graph do not only serve to estimate how long it takes on average to print a square sprite; they can be used for any set of characters we need to print, with any dimensions. For instance, printing one horizontal line of 32 characters (no control codes) would take mainly 0.470 x 32 = 15.04 milliseconds if we do not consider the time taken by constant-time operations (expressions and the like), because there is no loop over rows and therefore the linear component of the formula is discarded. Another example would be to print the entire screen (let us say 24 x 32 characters, maybe some of them control characters) with one PRINT statement, which we could expect to take mainly 0.470 x 24 x 32 = 360.96 milliseconds or, if we do one PRINT for each line using a FOR loop, 0.470 x 24 x 32 + 17 x 24 = 768.96 milliseconds.
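The three components can be put together as a rough estimator. The sketch below simply encodes the approximate figures quoted above (0.47 ms per character, 17 ms per printed row, 12 ms of fixed overhead) in a user-defined function; the function name and its parameters w (width) and h (height), both in characters, are my own choices, and the results are only as accurate as those fitted constants.

```basic
10 REM estimated ms to print a w x h block with one PRINT per row
20 DEF FN t(w,h)=0.47*w*h+17*h+12
30 REM estimated time and frame rate for square sprites 1x1 to 6x6
40 FOR n=1 TO 6
50 PRINT n;"x";n;": ";FN t(n,n);" ms, ";1000/FN t(n,n);" fps"
60 NEXT n
```

For blocks printed with a single PRINT statement, as in the examples of the previous paragraph, the 17*h term would be dropped.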

A small improvement when printing characters was suggested to me by the BASIC games programmer @igNaCoBo (he used it in his game ArkanoidB2B): end the PRINT statement with a semicolon (;) to avoid the change to a new line that the interpreter does automatically when the semicolon is not present. Doing a new line takes longer than interpreting the semicolon. The BASIC program above can be modified to measure that difference: just insert the semicolon at the end of the PRINT statement in line 20. The time differences with respect to the original program are shown in these figures:

On average, a PRINT statement takes 364 extra microseconds to do the new line compared with interpreting the semicolon (bottom histogram). This may seem little, but if the statement is repeated (e.g., to draw a sprite), the gain accumulates linearly, as shown in the top figure (the middle figure shows the same information as the bottom one but spread over all the sprite sizes tested).

Besides this small improvement, printing can also be made more efficient when we have to print a sequence of spaces. If there are more than 3, and they must end at a known column, it is convenient to replace them with the TAB control character, which automatically prints spaces until reaching that column (the TAB control character plus its two argument bytes take only 3 bytes, and no expression evaluation is involved). The ZX-Basicus analysis tool (-a) can help here, since it locates the text literals that contain sequences of contiguous spaces.

Sequences of contiguous spaces can also be replaced by the COMMA control character, but that is less flexible: it prints spaces until reaching the next half of the screen with respect to the current cursor position.
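A small sketch of both control characters embedded in strings follows. It assumes the usual control codes of the Spectrum character set: 23 for TAB, followed by two bytes (the column and a 0), and 6 for COMMA; the texts and positions are arbitrary.

```basic
10 REM CHR$ 23 is TAB: here it pads with spaces up to column 20
20 LET a$="SCORE"+CHR$ 23+CHR$ 20+CHR$ 0+"LIVES"
30 REM CHR$ 6 is COMMA: it pads up to the next half of the line
40 LET b$="LEFT"+CHR$ 6+"RIGHT"
50 PRINT AT 5,0;a$
60 PRINT AT 6,0;b$
```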

Of course, you should not print spaces individually to clear the screen; use CLS or, alternatively, print as many COMMA control characters as needed to fill it. You can also use this “trick” to fill parts of the screen with graphics (either built-in graphic blocks or UDGs) faster than printing them one by one.

Related to the last paragraph, there is a trick to fill the entire screen with any character very quickly: it consists of temporarily redefining the space character; when TABs or COMMAs are used, the interpreter will not print blanks, but the character we have defined. For instance, to fill the screen with bricks, just run the following program (the 44 commas in line 30 fill the 22 rows of the screen with spaces; the POKEs to 23606 and 23607 tell the system to use a new charset whose first character, i.e., the space, is located at the UDG “A” that we define in line 20):

10 BORDER 0: PAPER 7: INK 0: CLS
20 RESTORE 20: FOR f=0 TO 7: READ b: POKE USR "a"+f,b: NEXT f: DATA 255,8,8,8,255,128,128,128
30 LET d=USR "a"-256: RANDOMIZE d: POKE 23606,PEEK 23670: POKE 23607,PEEK 23671: PRINT INK 6; PAPER 2; AT 0,0;,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,;: PAUSE 0: CLS : POKE 23606,0: POKE 23607,60

Finally, the synthesizer tool included in ZX-Basicus can compress screens using the tricks explained before (option --comprscr), generating automatically their binary data (new charset definitions, definition of the fake variable that will contain the screen using the DEFADD trick explained in the previous post) and the BASIC programs that use them.

Reading characters

In games it is very useful to read a screen position and tell which character is printed there, or which attribute is used there; in that way, the program can decide, for example, whether there has been a collision between sprites.

Both the ATTR (attribute map) and SCREEN$ (bitmap) functions are intended to solve that problem, and they are quite efficiently implemented in ROM. They cannot be accelerated much beyond using only integer literals as their arguments.

ATTR is faster, though, since it does not have to interpret the bitmap to deduce the character, so it should be the preferred collision-detection method. This can be tested with the following program, which takes about 2 minutes to finish and produces a precise time measurement:

10 POKE 23672,0: POKE 23673,0: POKE 23674,0: FOR f=1 TO 10000: LET c$=SCREEN$ (11,11): NEXT f: LET T=PEEK 23672+256*PEEK 23673+65536*PEEK 23674: PRINT "SCREEN$: ";T;"  ";T*0.02;"  ";(T*0.02/10000)

If we run it first with SCREEN$ and then with ATTR (changing nothing else in the code), the difference between both time measurements is due only to the difference between those functions. The results show that, on average, SCREEN$ is around 2.6 milliseconds slower than ATTR (the precision of the time measurements of this program is high: with approximately 95% probability, the averages are within +/-115.5 microseconds of the values shown).
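A minimal collision-detection sketch based on ATTR could look as follows. It assumes a black screen where only the obstacle is drawn with red ink, so its attribute is 2 while empty cells have attribute 7; all positions, colours and line numbers are arbitrary choices.

```basic
10 BORDER 0: PAPER 0: INK 7: CLS
20 REM the obstacle: red ink on black paper, attribute 2
30 PRINT AT 10,20; INK 2;"#"
40 FOR x=0 TO 30
50 REM look at the cell the player is about to enter
60 IF ATTR (10,x+1)=2 THEN GO TO 100
70 REM erase the old position and draw the player one column on
80 PRINT AT 10,x;" O"
90 NEXT x
100 PRINT AT 0,0;"Hit at column ";x+1
```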

Printing scaled text

This section has been created in collaboration with @IvanBASIC, an expert programmer of pure BASIC games for the ZX Spectrum, such as Pedro Pómez, Brain 8 or Rompetechos (you can download them here), in which he has made the most of these and other tricks.

As we explained in the first section of this post, the ZX Spectrum is designed to write characters of 8 x 8 pixels on the screen as fast as the Z80 CPU can do it. However, sometimes it is interesting to write smaller or larger characters. The former requires assembly programming, but the latter can be addressed, to some extent, with the LPRINT BASIC statement, originally designed to send data to the ZX printer in a serial fashion.

Among many remarkable things, the ZX has a universal serial communication system able to deal uniformly with very diverse devices. It is based on the concepts of streams and channels. In short (here you can find more details), channels are communication ports corresponding to specific hardware devices, along with their input/output data processing routines, while streams are software connections to those channels that can be opened and closed on the fly. There are 16 available streams (from #0 to #15), which can be associated with 4 pre-defined channels (‘S’ -> upper screen, ‘K’ -> lower screen and keyboard, ‘P’ -> printer, ‘R’ -> ROM internal use); the list of channels can be expanded by connecting other external hardware devices.

When a ZX Spectrum 16K / 48K starts up, the following streams are open: #0 and #1 for channel ‘K’, #2 for channel ‘S’, and #3 for channel ‘P’ (channel ‘R’ is not accessible from BASIC). Thus, PRINT #0 or PRINT #1 send their arguments to the lower part of the screen; PRINT #2 to the upper part (a conventional PRINT uses stream #2 by default); and PRINT #3 to the printer, which can also be done with its synonym LPRINT.
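As a small sketch of these default streams (the texts and line numbers are arbitrary):

```basic
10 REM stream 2 is the upper screen, the default for PRINT
20 PRINT #2;"Upper screen"
30 REM streams 0 and 1 go to the lower screen (channel K)
40 PRINT #1;"Lower screen";
50 REM keep the lower screen visible until a key is pressed
60 PAUSE 0
```

The PAUSE is there because the lower part of the screen is cleared as soon as the program ends and the editor takes over.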

Each time LPRINT has a text line to send to the printer (32 characters at most, the same as the screen), the output routine associated with channel ‘P’ processes the bitmap of that text line (the printer cannot deal with colours, so attributes are ignored) and leaves the processed data in a memory buffer in RAM, from which the printer takes them. Since a text line has a bitmap of, at most, 8 pixel lines times 32 character columns, the printer buffer occupies a maximum of 8 x 32 = 256 bytes. If LPRINT prints shorter lines, it fills a smaller portion of the buffer.

The first interesting point here is how the bitmap of the text line is stored in the printer buffer: the first pixel line of the bitmap of the text (the one at the top) is stored at the start of the buffer and takes, at most, 32 bytes; the second line of pixels is stored starting 32 bytes after the first of those bytes; and so on until the 8 lines of pixels of the text bitmap are stored. If the text is shorter than 32 characters, the unused columns of the buffer are not written.

The second interesting point is how we can redirect that output routine so that it stores the data in a different place (for instance, in the screen memory). This is really easy in BASIC, since there is a system variable, PR-CC, placed at RAM addresses 23680 and 23681, that stores the start address of the printer buffer (lower and higher bytes, respectively). Initially, that variable contains the bytes 0 and 91, which form the memory address 23296 because 0 + 91 * 256 = 23296, but they can be changed at any time using POKE. Take into account that, after each LPRINT statement, the content of PR-CC is automatically set back to its original value, 23296, which places the printer buffer right after the attribute map of the screen and before the system variables area of BASIC, as you can read in the ZX Spectrum 16K / 48K manual.
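The buffer layout can be inspected directly from BASIC. The following sketch assumes a 48K Spectrum with no printer attached; it leaves two characters in the buffer (the trailing semicolon prevents it from being emptied) and then reads back their 8 pixel lines.

```basic
10 REM make sure the buffer is at its default address, 23296
20 POKE 23680,0: POKE 23681,91
30 REM the trailing semicolon keeps the data in the buffer
40 LPRINT "AB";
50 REM pixel line n of the character in column c: 23296+32*n+c
60 FOR n=0 TO 7: PRINT PEEK (23296+32*n),PEEK (23296+32*n+1): NEXT n
70 REM finish the line so that the buffer is emptied
80 LPRINT
```

The left column printed on the screen should be the bitmap of the letter "A" (0, 60, 66, 66, 126, 66, 66 and 0), read at 32-byte intervals, and the right column the bitmap of "B".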

The LPRINT trick to write vertically scaled characters is based precisely on redirecting the printer buffer to the screen bitmap memory before executing the statement. In particular, due to the organization of that memory and to how the printer buffer is filled with data, if the buffer is redirected to the first line of pixels of some screen cell, the text is printed on the screen, but with 7 blank pixel lines between each pair of lines of the text bitmap, as shown by this simple program:
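The listing appeared as an image in the original post. A minimal sketch consistent with the description would be the following; only the content of line 10 is taken from the text, and the text string is borrowed from the other listings of this section.

```basic
5 REM printer buffer at 18432, start of the middle third
10 POKE 23680,0: POKE 23681,72
20 LPRINT " A B C D "
```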

Notice that line 10 of the program places the printer buffer at memory address 0 + 72 * 256 = 18432, which corresponds to the first cell (and pixel line) of the second of the three sections of the screen bitmap explained in the first part of this post.

This program illustrates how LPRINT sends the first pixel line of the text bitmap to the start of the printer buffer, then adds 32 bytes to that address and writes the second pixel line there, and so on; the jumps of 32 bytes in the printer buffer are jumps of 8 pixel lines (vertically) on the screen bitmap; in other words, each of them moves to the next character row of the screen.

Consequently, if we repeat LPRINT with the same text but placing the printer buffer at each of the 8 pixel-line addresses of the first cell used above or, equivalently, if we increment the higher byte of the screen bitmap address at each repetition (the (Y) action explained in the first part of this post), we enlarge the text vertically.

The following program does exactly that: before each repetition of LPRINT, it moves the printer buffer (it increments the higher byte of its address, stored in the higher byte of the system variable PR-CC, at 23681). Notice how the enlarged characters have 8 times their original height and the same width:
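This listing was also an image in the original post. The loop below is a sketch of it, taken from the body of the timing program shown at the end of this section (the line numbers are arbitrary):

```basic
10 REM a = higher byte of PR-CC: pixel lines 0 to 7 of the cells
20 FOR a=72 TO 79
30 POKE 23680,0: POKE 23681,a
40 LPRINT " A B C D "
50 NEXT a
```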

Of course, we can print at another column just by placing the printer buffer at the first pixel line of a different cell, doing action (X), that is, by incrementing the lower byte of the screen bitmap memory address where the printer buffer is placed (the byte stored in the lower byte of PR-CC, at 23680):

Furthermore, if, during the printing, the buffer covers columns beyond 31, LPRINT produces a sort of rotation of the text to print, because those bytes that fall “outside” (on the right) will go back to the left of the screen, and a little bit below, as you can observe by running this program (the lower byte of the printer buffer is established through variable b):

5 FOR b=10 TO 255
10 FOR a=72 TO 79
20 POKE 23680,b: POKE 23681,a
30 LPRINT " A B C D "
35 NEXT a: PAUSE 10: NEXT b

To end this part, notice that the execution speed of this trick is similar to that of 8 PRINT statements. Actually, if we print the same text 8 times using either of these two statements, there is no clear difference in their times, as the following program shows: it writes a text of 9 characters 8 times and measures 0.18 seconds in both cases, that is, about 20 milliseconds per enlarged character, or roughly 2.5 milliseconds per character actually written (including the time spent on the FOR loop and the POKEs):

1 LET t0=PEEK 23672+256*PEEK 23673+65536*PEEK 23674
10 FOR a=72 TO 79
20 POKE 23680,0: POKE 23681,a
30 PRINT AT 10,0;" A B C D "
35 NEXT a
40 LET t1=PEEK 23672+256*PEEK 23673+65536*PEEK 23674
50 PRINT AT 0,0;(t1-t0)*0.020;" secs for PRINT"
101 LET t0=PEEK 23672+256*PEEK 23673+65536*PEEK 23674
110 FOR a=72 TO 79
120 POKE 23680,0: POKE 23681,a
130 LPRINT " A B C D "
135 NEXT a
140 LET t1=PEEK 23672+256*PEEK 23673+65536*PEEK 23674
150 PRINT AT 1,0;(t1-t0)*0.020;" secs for LPRINT"

This speed can be slightly improved with the same semicolon trick mentioned above in the section about printing characters, which was pointed out to me by @igNaCoBo. If you write a semicolon at the end of LPRINT, the statement does not perform the change to a new line, which actually consists in clearing the printer buffer, and that saves some time. The buffer will be overwritten by the next LPRINT.

Printing colour attributes only

This section has been created in collaboration with @IvanBASIC, an expert programmer of pure BASIC games for the ZX Spectrum, such as Pedro Pómez, Brain 8 or Rompetechos (you can download them here), in which he has made the most of these and other tricks.

The fact that the ZX Spectrum screen is split into two parts stored separately, the bitmap and the attribute map, as explained in the first section of this post, gives us the opportunity to perform graphical operations on only one of them. Actually, many programs work with the bitmap only, saving valuable time that would otherwise be spent keeping the attributes updated (this was very common in the first games developed for the Spectrum, but also in pure BASIC programs written today).

Although more unusual, we can also work with the attribute map only; even losing considerable resolution (from 192 x 256 pixels to 24 x 32 cells), the proportional speed up is worth it, and with some creativity we can get very interesting effects. Just as a very basic example: using the flash bit of the attributes we can animate a banner on the entire screen, like the wonderful “Manic Miner” did in 1983:

Regrettably, there is no BASIC statement to manipulate groups of attributes in real time (except POKE). This reduces efficiency, since it forces us either to use FOR loops, which are very slow, or to make the code much more complicated with techniques like loop unrolling, explained in other posts of this series.

In spite of that, we can cheat the system by using tricks like the DEFADD one described in the previous post. Here we add to our toolkit another based on LPRINT, similar to the one of the previous part of this post, to write blocks of attributes on the screen very fast. In the following we explain how it is done and what limitations it has.

The trick consists in redirecting the printer buffer to the screen attribute map. If we do that, for example, pointing to the middle part of the screen, that has its first attribute cell at address 22784 (that is, if we do POKE 23680,0 and POKE 23681,89, since 0 + 89 * 256 = 22784), the bytes sent by LPRINT to the printer buffer will be used as colour attributes as this program shows:
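
A minimal sketch of such a program, using the POKE values given above and the text discussed in the next paragraph (the line numbers are ours):

```basic
10 POKE 23680,0: POKE 23681,89
20 LPRINT "Esto no colorea..."
```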

You can observe how, in spite of the speed of filling 8 x 18 attribute cells with just one LPRINT statement, the result does not make much sense. This is because LPRINT writes to the printer buffer the bytes corresponding to the bitmap of the 18-character text "Esto no colorea..." ("This does not colour..." in Spanish). Those bytes, when interpreted as colour attributes, do not produce anything useful: they were not numbers intended to represent colours, but character bitmaps.

In the above picture, each vertical stripe of 8 colour attributes comes from the bitmap of one of the text characters; for instance, the first stripe consists of the 8 bytes of the bitmap of the character "E" visualized as colour attributes. If we had done LPRINT "A" using this trick, the attributes would be the ones of the bitmap of the character "A", that is, 0, 60, 66, 66, 126, 66, 66 and 0:
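
A sketch of that experiment could look like this; line 6 writes something to the screen bitmap first, and its exact text is an assumption of ours:

```basic
6 PRINT AT 8,0;"A sample text"
10 POKE 23680,0: POKE 23681,89
20 LPRINT "A"
```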

This BASIC program writes to the screen bitmap before doing the attribute trick (see line 6 in the listing); as you can observe, the attribute manipulation is done independently from the screen bitmap: only colours change on that area, not the written characters.
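
The eight bitmap bytes of "A" quoted above can be checked from BASIC by reading the character set directly. This is a quick sketch that assumes the default ROM character set; it relies on the system variable CHARS (23606 and 23607), which holds 256 less than the address of the bitmap of the space character, so the bitmap of any character starts at CHARS + 8 * its code (the variable name chars is ours):

```basic
10 LET chars=PEEK 23606+256*PEEK 23607
20 FOR i=0 TO 7
30 PRINT PEEK (chars+8*CODE "A"+i)
40 NEXT i
```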

Consequently, for this trick to do something interesting, we must make the bitmap of the written text equal to the colour attributes we wish to store in the screen. In other words, the text to print must consist of characters designed by us in such a way that their bitmaps correspond to the desired attributes. In BASIC, this can be done with user defined graphics (UDGs).

For example, to draw a “Spectrum rainbow” on the screen by only manipulating attributes, using paper colours ranging from 0 to 7 vertically and repeating that pattern along the entire screen horizontally, the attributes to use in each stripe would be 0, 8, 16, 24, 32, 40, 48 and 56 (that is, 8 times the paper colour, with ink 0). We can thus define the bitmap of the UDG “A” with those numbers and do LPRINT of that character 32 times, as the following program does (the strange character in line 20 is precisely the bitmap of the UDG "A" defined in line 6; it already appears in its user defined shape because we have captured the screen after running the program):
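
A sketch along those lines, keeping the UDG definition in line 6 and the LPRINT in line 20. Since a graphic character cannot be typed into a plain text listing, here the 32-character string is built with CHR$ 144 (the code of UDG "A"), which is a substitution of ours; the string name a$ and the redirection to the middle part of the attribute map are also our choices:

```basic
6 FOR i=0 TO 7: POKE USR "a"+i,8*i: NEXT i
10 LET a$="": FOR i=1 TO 32: LET a$=a$+CHR$ 144: NEXT i
15 POKE 23680,0: POKE 23681,89
20 LPRINT a$
```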

You can also do horizontal scroll by rotating the text string printed by LPRINT, as the next program does (we have inserted in the string some random characters for the animated scroll to be visualized):
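
One possible way to sketch that rotation, assuming the same UDG as in the rainbow example and two arbitrary characters of ours ("#" and "@") inserted so that the movement is visible; stop it with BREAK:

```basic
6 FOR i=0 TO 7: POKE USR "a"+i,8*i: NEXT i
10 LET a$="": FOR i=1 TO 32: LET a$=a$+CHR$ 144: NEXT i
12 LET a$(5)="#": LET a$(17)="@"
20 POKE 23680,0: POKE 23681,89
30 LPRINT a$
40 LET a$=a$(2 TO )+a$(1)
50 GO TO 20
```

Line 40 moves the first character of the string to its end, so each LPRINT shows the pattern shifted one column to the left.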

When using this trick, notice that the POKE that changes the higher byte of the system variable PR-CC (23681) can only redirect the printer buffer to three different sections of the screen attribute map (POKE 88, 89 or 90, respectively). Fortunately, and unlike in the bitmap, we can set the lower byte (23680) to any value between 0 and 255; that value modulo 32 (which will range from 0 to 31) will set the horizontal position where we print the attribute on screen, and the rest, together with the higher byte, will set the row and therefore the particular cell where the vertical stripe of 8 attributes will be printed (take care not to go beyond the screen attribute map!).
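
As a sketch of that calculation, the two bytes for the attribute cell at a given row and column can be obtained from the start of the attribute map, 88 * 256 = 22528 (the variable names r, c, addr, hi and lo, and the example values, are ours):

```basic
10 LET r=10: LET c=5: REM cell row 0 to 23, column 0 to 31
20 LET addr=22528+32*r+c
30 LET hi=INT (addr/256): LET lo=addr-256*hi
40 PRINT "lo=";lo;" hi=";hi
```

For row 10 and column 5 this gives hi = 89 and lo = 69, that is, 69 = 32 * 2 + 5: column 5 of the third row of the middle section. This only locates the first cell of the stripe; where the remaining seven attributes land still depends on the behaviour of the printer buffer described above.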

Moreover: by changing the lower byte of PR-CC you can do horizontal attribute scroll, as illustrated below:

The main advantage of the attribute LPRINT trick is its speed. Starting with the examples we have included in this part of the post you can experiment with other text strings and different redirections of the printer buffer. In general, it takes practice to find novel effects or program scenarios (game maps) to move around in, and pen and paper will be necessary for making the designs, transforming them into numerical attribute values, and creating the UDGs.