wermi

I made a demo, so you probably can, too

Update (2023-12-09): fixed typos/style

Note: this article was mostly written in Feb 2022. It was then shelved for over a year and is mostly unchanged, aside from minor cleanups and added content. Take it with a grain of salt.

I hope my future employer will not read this!

Due to popular demand, I decided to write an article about how the LZW demogroup and its debut production came to be, or at least how it went from my perspective. I may not have been particularly diligent about version control, but I have recorded the state of the demo pretty much throughout its entire development in other ways, since I communicated with Loni heavily and sent her various video snippets and pictures. I’ll mostly be showing those, instead of boring you too much with the tech talk.

If you haven’t done it already, you might want to watch the final product first:

Teh Beginnings

The inception of LZW happened over a year before the demo became a thing. Zlew convinced me to go to a demoparty (Xenium 2019), which in turn stimulated my desire to contribute something myself. He also pushed the idea of making something for the ZX Spectrum (possibly for the speccy.pl party? but it didn’t happen in the end because of you-know-what…). In the meantime I also asked my friend Loni, who is an excellent pixel artist (& much more!), if she was interested in the demoscene and if she would like to work with me, and then it Sorta Happened.

Zlew asked around and gathered resources for learning Speccy coding, so I messed with that a bit. I had a bit of an easier time, because I had already attempted learning Game Boy assembly coding before (it has a Z80-like syntax, but not that much in common otherwise). I will let my labor of love speak for itself:

Yeah, I basically made that one to make fun of my brother’s misery.

Moving on…

Somewhere in April of 2021, Loni announced “happy party three” (which, as the name implies, is the fourth event from the “happy party” series). Unlike previous ones, which were about bite-size DJ sets, this one was a visual art (or in other words, a short film) festival. Loni also proposed to make a demo for the happy party, which I initially rejected, because there was not enough time (about two weeks) and I didn’t feel confident enough in my programming skills. Eventually the deadline was extended to a whopping 2 months, so we went back to that idea.

The Darned Sinscroller

For the first part, I came up with the brilliant idea of making a scroller with a wavy LZW logo, inspired by Loni’s sketches from 2020.

Wrapping my head around Spectrum’s screen layout was a bit difficult at first, so it took a while to write a working sprite routine and go from this…

to this…

to something like this. While I was struggling to get stuff to work well enough, Loni fattened up the logo and worked on the font.

Being my first finished part, the code is not pretty and there is a lot of stuff done in a weird way. For example, I couldn’t figure out a fast way to clear the screen properly, so I just made the graphics taller and included some empty space, so that the next sprite draw would overwrite any leftover junk from the previous frame. This should also explain why there’s not that much vertical movement overall.
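For anyone wondering what is so confusing about the screen layout mentioned above: the bitmap rows are not stored top to bottom, because the bits of the Y coordinate are shuffled inside the address. Here is a small Python sketch of the standard 48K bitmap address calculation. The function name is made up for illustration, and this is not the demo's actual sprite code.

```python
def screen_address(x: int, y: int) -> int:
    """Address of the bitmap byte holding pixel (x, y) on a ZX Spectrum.

    x is 0..255, y is 0..191; the bitmap starts at 0x4000.
    """
    if not (0 <= x < 256 and 0 <= y < 192):
        raise ValueError("pixel outside the 256x192 bitmap")
    return (
        0x4000
        | ((y & 0xC0) << 5)  # which third of the screen (64 lines each)
        | ((y & 0x07) << 8)  # pixel row inside the 8-line character cell
        | ((y & 0x38) << 2)  # character row inside the third
        | (x >> 3)           # byte column, 8 pixels per byte
    )


if __name__ == "__main__":
    # Moving one pixel row down adds 0x100, not 0x20 as one might expect.
    for y in (0, 1, 8, 64):
        print(y, hex(screen_address(0, y)))
```

The practical upshot is that stepping down one pixel row inside a character cell only means incrementing the high byte of the address, while crossing a cell or a screen third needs extra fix-up.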

Anmeten

Loni had an idea to make an animation and went with rather conservative constraints on her side (160x160px, 12.5fps, no color) to make the job easier on my part. At this point I started looking into compression options available on the Z80 and found a couple intriguing articles… especially this one written in Russian (hi introspec!), which basically mentioned aPLib as the best compromise between speed and compression ratio. So I went with it.

A plot comparing various compressors available on the Z80

The animation uses a very simple delta encoding; the frame data consists of a 400 byte block indicating which 8x8 tiles are unchanged from the previous frame, and another variable size block consisting of the changed tiles. As the final step, the frame data is compressed using the oapack compressor, which happens to have a complexity of O(n³), but still finishes reasonably fast, and in my experience can sometimes save an extra byte over the apultra compressor. The Z80 decompression routine is fast! Frame data depacks in ~50 ms at worst, which gives leeway for my poorly written copy routine (which can take over a frame) and is enough for the target framerate of 12.5fps (80ms).

At this point I have also learned that by massaging the data in a specific way, you can make it compress better. For example, the “commands” indicating if a tile gets skipped or changed were initially interleaved with the pixel data. Separating them into two streams improved the compression by almost 7%.
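To make the two-stream idea more concrete, here is a rough Python sketch of a tile delta encoder and decoder of the kind described above. It assumes a frame is a list of 400 tiles of 8 bytes each, and that a flag value of 0 means "unchanged" and 1 means "changed". The actual flag values and data layout of my converter are not shown here, so treat those details (and the function names) as assumptions. The aPLib packing step is left out.

```python
TILES = 20 * 20   # a 160x160 frame split into 8x8 tiles
TILE_BYTES = 8    # 8 rows of 8 one-bit pixels


def encode_frame(prev, cur):
    """Return (flags, changed): one flag byte per tile, then changed tiles only."""
    flags = bytearray()
    changed = bytearray()
    for old, new in zip(prev, cur):
        if old == new:
            flags.append(0)      # tile is identical to the previous frame
        else:
            flags.append(1)      # tile differs, its 8 bytes go to the second stream
            changed += new
    return bytes(flags), bytes(changed)


def decode_frame(prev, flags, changed):
    """Rebuild the current frame from the previous one and the two streams."""
    out = []
    pos = 0
    for old, flag in zip(prev, flags):
        if flag:
            out.append(changed[pos:pos + TILE_BYTES])
            pos += TILE_BYTES
        else:
            out.append(old)
    return out


if __name__ == "__main__":
    prev = [bytes(TILE_BYTES)] * TILES
    cur = list(prev)
    cur[5] = b"\xff" * TILE_BYTES
    flags, changed = encode_frame(prev, cur)
    print(len(flags), len(changed))
```

Keeping the flags and the pixel data in separate blocks means the compressor sees a long, very regular run of flag bytes followed by pixel data, instead of the two being mixed together.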

I have also informed Loni how the encoding works, and she would try to minimize changes happening from frame to frame, as well as making them happen on 8x8 boundaries whenever possible. This has likely contributed a lot to keeping the animation size small as well.

The final animation (including the ending variation) takes up about 19KB when compressed, which is about 250 bytes per frame (compared to 3200 bytes of raw uncompressed data). Probably not a particularly impressive result for a loop that’s a couple seconds long, but still small enough to fit in a prod :) It also takes up less space than the gif version below (~33KB).

Aside from the animation itself, Loni also drew a pretty border for it. One half of it is flipped, and indeed gets flipped at runtime to avoid redundancy. Though I’m not sure how many bytes it really saves compared to just naively packing the whole thing.

Don’t ask why the end part of the routine suddenly starts operating on the screen mapped to 0xC000, instead of 0x4000 like before. Because I have no clue. It’s also easy to tell that the pixel data is not particularly pretty on the border, as I didn’t bother to include any optimizations in my homebrew gfx converter script.
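As a side note on the runtime flip of the border: mirroring 1-bit graphics horizontally takes two steps, reversing the order of the bytes in each row and reversing the bits inside each byte. The Python sketch below shows the idea. It is not a transcription of the Z80 routine, and the function names are made up.

```python
def reverse_bits(b: int) -> int:
    """Reverse the bit order of one byte (bit 7 becomes bit 0 and so on)."""
    b = ((b & 0xF0) >> 4) | ((b & 0x0F) << 4)  # swap nibbles
    b = ((b & 0xCC) >> 2) | ((b & 0x33) << 2)  # swap bit pairs
    b = ((b & 0xAA) >> 1) | ((b & 0x55) << 1)  # swap neighbouring bits
    return b


def mirror_row(row: bytes) -> bytes:
    """Mirror one row of 1-bit pixels: reverse byte order, then each byte's bits."""
    return bytes(reverse_bits(b) for b in reversed(row))


if __name__ == "__main__":
    # Four set pixels on the far left end up on the far right.
    print(mirror_row(bytes([0xF0, 0x00])).hex())
```

On real hardware the per-byte bit reversal is usually done with a 256-byte lookup table or a short rotate loop, whichever fits the memory budget.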

Cube Rotator

Vectors!!! Fuck Yeah!!!

People who have known me for a while may be familiar with the fact that I used to make games in a tool called Multimedia Fusion 2 (nowadays called Clickteam Fusion). It’s an extremely flawed program, with an abundance of weird quirks and design issues. But I was very used to it, and you can prototype quite fast when you’re familiar with it. In fact, at the beginning of 2020 I learned how matrix multiplication works and made a rotating cube in this very software. It doesn’t even run fullspeed on a 3rd gen i5, but it made me incredibly happy regardless.

One year later, I did (almost) the same thing on a 3.5MHz microprocessor. Fun stuff. Naturally, I wouldn’t be able to do it without cutting a lot of corners.

My first crime is stealing the fastest unsigned 8bit integer multiplication routine from cpcwiki. My second crime is bastardizing it, so that it takes and spits out 8bit signed fixed point values (which also made it at least 50% slower in the process). 1 bit for the sign, 2 bits for the integer part and 5 bits for the fraction.

Lol.

It’s a miracle it works as well as it does with such low precision.
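To give a feel for how little precision that is, here is a Python model of an 8bit fixed point multiply with 5 fraction bits. It assumes two's complement storage and saturation on overflow. Both are assumptions for the sake of the sketch, since the sign handling of the real routine is not described above.

```python
FRAC_BITS = 5
SCALE = 1 << FRAC_BITS  # 32 steps per unit, so the resolution is 1/32


def to_fixed(value: float) -> int:
    """Convert to the 8bit format; the representable range is -4.0 to +3.96875."""
    raw = round(value * SCALE)
    if not -128 <= raw <= 127:
        raise ValueError("value does not fit in 8 bits")
    return raw


def to_float(raw: int) -> float:
    return raw / SCALE


def fixed_mul(a: int, b: int) -> int:
    """Multiply two fixed point values and return a value in the same format."""
    product = (a * b) >> FRAC_BITS        # the product has 10 fraction bits, drop 5
    return max(-128, min(127, product))   # saturate (assumption, see above)


if __name__ == "__main__":
    x = to_fixed(0.7)
    print(to_float(fixed_mul(x, x)), "vs exact", 0.7 * 0.7)
```

Squaring 0.7 in this format gives 0.46875 instead of 0.49, which is the kind of error every matrix element and every transformed vertex has to live with.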

Afterwards, I wrote two routines: one that constructs a rotation matrix using a minimal amount of multiplications, and one that multiplies a vector by the resulting matrix. I wrote some quick Python scripts calculating the same thing to compare the results against. After confirming that the values were seemingly correct, I wanted to visually inspect the fruits of my labor, so I drew the transformed points using the simplest method I could think of, which is using the plot routine from the ROM.
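A reference script of that kind could look roughly like the sketch below. It uses floats, assumes the rotation order Z·Y·X, and reuses the two products `sa * sb` and `ca * sb` instead of recomputing them. The rotation order and the function names are assumptions, and I am not claiming this is the minimal number of multiplications.

```python
import math


def rotation_matrix(a: float, b: float, c: float):
    """3x3 matrix for a rotation by a around X, then b around Y, then c around Z."""
    sa, ca = math.sin(a), math.cos(a)
    sb, cb = math.sin(b), math.cos(b)
    sc, cc = math.sin(c), math.cos(c)
    sasb = sa * sb  # shared by two matrix elements
    casb = ca * sb  # shared by two matrix elements
    return [
        [cb * cc, sasb * cc - ca * sc, casb * cc + sa * sc],
        [cb * sc, sasb * sc + ca * cc, casb * sc - sa * cc],
        [-sb,     sa * cb,             ca * cb],
    ]


def transform(m, v):
    """Multiply the 3x3 matrix m by the column vector v."""
    return tuple(sum(m[row][k] * v[k] for k in range(3)) for row in range(3))


if __name__ == "__main__":
    m = rotation_matrix(0.0, 0.0, math.pi / 2)
    print(transform(m, (1.0, 0.0, 0.0)))  # roughly (0, 1, 0)
```

Feeding the same angles to the Z80 routine and to a script like this makes it easy to spot a swapped sign or a transposed element before anything gets drawn.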

I mean, it sorta looks like a cube?

Don’t worry, your buddy Bresenham has got you covered:
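In case you have not met him yet: Bresenham's algorithm draws a line using only integer additions and comparisons, which is why it is a natural fit for a Z80. This is a generic all-directions version in Python, not a port of my assembly routine.

```python
def bresenham(x0: int, y0: int, x1: int, y1: int):
    """Return the list of pixels on the line from (x0, y0) to (x1, y1)."""
    points = []
    dx = abs(x1 - x0)
    dy = -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy                 # running error term, integers only
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:              # step along X
            err += dy
            x0 += sx
        if e2 <= dx:              # step along Y
            err += dx
            y0 += sy
    return points


if __name__ == "__main__":
    print(bresenham(0, 0, 5, 2))
```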

But it’s still a flickery mess! Isn’t it amazing how I did not bother to figure out how to toggle screens on a Spectrum until about halfway through the development of the demo?

I also added another friend:

I quickly noticed that the wireframe on the tetrahedron can look really awkward (think spinning dancer illusion), so I decided to look into backface culling. It took a lot of contemplating on my part before implementing it. This time, a YouTube comment saves the day:

A very simple method… the only problem is that my 8bit fixed point abomination lacks the precision to make it work well. Whatever, let’s hope nobody notices erroneously disappearing faces on the cube.
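The comment itself is not reproduced here, so the sketch below is only a guess at the kind of method involved. One common simple test looks at the projected 2D vertices of a face and checks the sign of the cross product of two of its edges, which tells you whether the vertices appear in clockwise or counter-clockwise order on screen. Which sign counts as "front" depends on your winding convention and on whether the Y axis points up or down.

```python
def is_front_facing(p0, p1, p2) -> bool:
    """True if the projected 2D points p0, p1, p2 are wound counter-clockwise.

    Assumes a Y-up coordinate system; flip the comparison for Y-down screens.
    """
    cross = (p1[0] - p0[0]) * (p2[1] - p0[1]) - (p1[1] - p0[1]) * (p2[0] - p0[0])
    return cross > 0


if __name__ == "__main__":
    print(is_front_facing((0, 0), (1, 0), (0, 1)))  # counter-clockwise
    print(is_front_facing((0, 0), (0, 1), (1, 0)))  # same face seen from behind
```

When a face is close to edge-on, the cross product is close to zero, so a few bits of rounding error are enough to flip its sign. That fits the erroneously disappearing faces I was seeing.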

Adding another friend…

This one ended up being quite slow and the disappearing faces were too prominent, so I ultimately rejected it.

Speaking of the multiplication routine, I’ve had two problems with it. The first one is that the precalc routine expects the memory to be zeroed, otherwise it won’t work correctly. The other problem is that whoever wrote it was using an old assembler that did not respect operator precedence in expressions. Having a decent debugger helped a lot in diagnosing those problems.

Plasma

Let’s rip off an effect from another demo! The dithering pattern and method of filling the screen (alternating the “field” redraws to increase perceived smoothness) is directly inspired by the plasma in the Nightmares demo by Noice. Though mine happens to be slower.

I started out by calculating an array of 8bit values and… dumping the memory to see if the result is anywhere close to what I expect.

One dimension looks fine. Let’s expand the loop by another dimension and see if it still works…

Not fun.

The plasma was probably one of the most troublesome parts in the demo, and frankly I’m not that satisfied with the end result. The precalc is slow and it looks quite unorthodox.

Bezier

Ran out of demos to copy? Don’t worry, you can just rip off one of Windows' oldschool screensavers!

I found an interesting article on calculating bezier curves using something called “forward differencing”. I only vaguely understand how it works, but the nice thing about algorithms and formulas is that you can treat them as a black box and there’s actually no need to understand anything! I found some other implementation of the method and wrote my own based on it.

I opted for the lowest precision I could get away with and constrained the routine to a fixed amount of 17 points. I ended up with a pretty fast implementation, but the other end of the curve was extremely shaky due to the precision loss.

Solution? Copy the routine, make it calculate the points backwards and stop halfway through.
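For the curious, this is roughly what forward differencing does for one coordinate of a cubic bezier. The curve is rewritten as a polynomial, and after some setup each new point costs only three additions. The sketch is in floating point Python with 16 steps (17 points) and is not a port of my low precision Z80 routine.

```python
def bezier_direct(p0, p1, p2, p3, t):
    """Evaluate one coordinate of a cubic bezier the straightforward way."""
    u = 1.0 - t
    return u * u * u * p0 + 3 * u * u * t * p1 + 3 * u * t * t * p2 + t * t * t * p3


def bezier_forward_diff(p0, p1, p2, p3, steps=16):
    """Return steps + 1 points of one coordinate using forward differencing."""
    # Polynomial form: f(t) = a*t^3 + b*t^2 + c*t + p0
    a = -p0 + 3 * p1 - 3 * p2 + p3
    b = 3 * p0 - 6 * p1 + 3 * p2
    c = -3 * p0 + 3 * p1
    h = 1.0 / steps
    f = p0
    df = a * h ** 3 + b * h ** 2 + c * h    # first difference at t = 0
    d2f = 6 * a * h ** 3 + 2 * b * h ** 2   # second difference at t = 0
    d3f = 6 * a * h ** 3                    # third difference, constant
    points = [f]
    for _ in range(steps):
        f += df        # three additions per point, no multiplications
        df += d2f
        d2f += d3f
        points.append(f)
    return points


if __name__ == "__main__":
    print(bezier_forward_diff(0.0, 10.0, 90.0, 100.0))
```

Every point is built on top of the previous one, so any rounding error in the differences piles up towards the far end of the curve. With floats you will not notice it, but with a handful of fraction bits it is exactly the shakiness described above, and running a second copy from the other end keeps each half short.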

IMO there’s one way my bezier improves upon the Windows screensaver: the points move over Lissajous curves instead of bouncing around in linear fashion. The Lissajous are implemented by having a couple counters, incrementing them on each iteration and checking if they reached a specified value. If so, reset the counter and increment sine table position.

But the movement is way too slow if you do that only once a frame.

Solution? Copy the routine call until desired speed is reached.
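Here is a loose Python model of the counter scheme, including the "call it more times per frame" speed-up. The class name, the 256 entry table and the amplitude are all invented for illustration.

```python
import math

# 256 entry signed sine table, amplitude 127 (sizes chosen for illustration)
SINE = [round(127 * math.sin(2 * math.pi * i / 256)) for i in range(256)]


class Axis:
    """One coordinate of a control point, moving along a sine wave."""

    def __init__(self, period: int, phase: int = 0):
        self.period = period  # how many steps before the table position advances
        self.counter = 0
        self.phase = phase    # current position in the sine table

    def step(self) -> None:
        self.counter += 1
        if self.counter == self.period:
            self.counter = 0
            self.phase = (self.phase + 1) % 256

    def value(self) -> int:
        return SINE[self.phase]


def advance(axes, calls_per_frame: int) -> None:
    """Step every axis several times per frame to speed the movement up."""
    for _ in range(calls_per_frame):
        for axis in axes:
            axis.step()


if __name__ == "__main__":
    x, y = Axis(period=3), Axis(period=2, phase=64)
    advance([x, y], calls_per_frame=6)
    print(x.value(), y.value())
```

Giving the X and Y axes different periods is what turns the motion into a Lissajous figure rather than a circle or a straight line.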

There’s some other exquisite stuff in here:

Not much to say here. There’s actually a bug in the routine that makes it not copy the first character column, but I never noticed it because the first column is completely empty in the image.

Credits part

Before doing any coding work, I first conceptualized the part by making a mockup in Photoshop. The final product doesn’t look that much different.

I wrote a simple text printing routine with variable width font support. No fancy features like word wrap or alignment, but there are control characters for arbitrary horizontal offset and arbitrary delays in character printing, which I typed manually into the text files.
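To illustrate how such a printer can work, here is a small Python sketch that walks a string and produces a list of draw and delay events. The control codes (`\x01` for "set X", `\x02` for "delay"), the glyph widths and the one pixel spacing are all hypothetical, and the real routine's encoding is different.

```python
SET_X = "\x01"   # hypothetical control code: next byte is the new X position
DELAY = "\x02"   # hypothetical control code: next byte is a delay in frames

# Hypothetical glyph widths in pixels; anything not listed uses DEFAULT_WIDTH.
WIDTHS = {"i": 2, "l": 2, "m": 7, "w": 7, " ": 3}
DEFAULT_WIDTH = 5
SPACING = 1      # empty pixel column after every glyph


def layout(text: str, start_x: int = 0):
    """Turn text with control characters into ('draw', char, x) / ('delay', n) events."""
    events = []
    x = start_x
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in (SET_X, DELAY):
            arg = ord(text[i + 1])   # control codes always take one argument byte
            if ch == SET_X:
                x = arg
            else:
                events.append(("delay", arg))
            i += 2
            continue
        events.append(("draw", ch, x))
        x += WIDTHS.get(ch, DEFAULT_WIDTH) + SPACING
        i += 1
    return events


if __name__ == "__main__":
    print(layout("\x01\x20hi\x02\x0a!"))
```

Since there is no word wrap, line breaks and indentation have to be baked into the text by hand, which is where the offset control code earns its keep.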

The plasma is mostly recycled, but with a little twist: the “cells” are still 8x4px, but they have proper dither and it’s rainbow colored now.

I couldn’t settle on a single background color, as both candidates were quite appealing. Loni drew the sprites with a black background in mind, so that put a constraint on the final panel. I ended up using both and alternating between them.

Final words

In conclusion, I would say making an oldskool demo in 2021 while having little coding experience is Not That Bad, actually! There are plenty of publicly available resources and excellent tools, which make the job a lot easier.

Personally, the development experience was nothing short of incredible. You know the feeling when you think something is out of your reach, but then you try and… actually manage to do it? That’s what this demo is for me, essentially. Even if the end result is not that impressive in the grand scheme of things. I have learned a lot of things, the demoscene rejuvenated my interest in programming and I managed to consistently work on one project for a prolonged period of time (as much as 13 hours a day nearing the deadline!), despite having a track record of never finishing stuff (I wonder why…). Last but not least, it has been great to have friends by my side that actually believed in me, even though I barely had anything to show for it. I couldn’t have asked for a better team. Thank you <3

Other Fun Stuff: fixing a fuse-libretro bug

During the creation of the demo, I had other chances to deal with code of excellent quality written by other people! For instance Fuse, or more specifically its libretro port.

At some point I thought that it would be a good idea to figure out an emulator to capture the demo with. Initially during the development I was using the native Windows version of Fuse, but it didn’t have an option to capture the video to a file. Window capture was out of the question because Fuse’s window rendering is, simply put, awful. I don’t think I’ve seen anything stutter/drop frames this badly.

Fast forward, I came up with the genius idea of using RetroArch and the libretro port of Fuse. I can record a video directly and it looks pretty much perfect! Everything’s great in theory, right? But then I noticed something: no matter what AY stereo separation option was picked in the frontend, the audio always played in ACB. Which is not very convenient, considering Zlew’s tune is meant for ABC playback. I did a quick search and learned the issue had already been reported in 2017 (!) and left unfixed since then. Of course, I could have easily swapped the channels in the module itself as a workaround and forgotten about it, but my perfectionism and spite could not settle for such an option. With barely any C knowledge at the time, I decided to attempt fixing the bug myself.

I spent quite a significant amount of time on getting the debugger to work at all, then quickly learned the “bug” was simply caused by a hardcoded stereo value in the port-specific code. Likely a leftover hack from the initial porting phase. Fun.

While reading through Fuse’s codebase, I also stumbled on some other interesting things. “Experts” will try to convince you they understand what this means:

BTW, unchanged since 2008:

Some time after fixing the bug, which marked my first open source contribution, I learned about how sketchy RetroArch and certain people involved in it are. Unrelated to that, I ended up using Spectaculator for the capture and a good chunk of the development process, because somehow an unmaintained shareware program made by one guy has better human factors than many of the “free and open source” alternatives. Truly surprising, isn’t it.