CD+G

Aug 8-17 2020

Have you ever heard of the CD+G format? Before CD-ROMs and just after CD Audio, the engineers at Philips and Sony came up with this format that stashed 16-color animations in the spare subcode channels that travel alongside the audio, so they would play along with the music. it wasn’t super popular, but it found a very popular niche in karaoke releases.

before it left the mainstream though, they got this masterpiece of 8-bit CD multimedia art out: Holst’s The Planets.

playlist?list=PL8RnW3nRCF9li1DR-NTxslLBOEB_H-zBZ

I’ve been reading about how the CD+G format works, and I’m fascinated by its potential as an art medium, especially since it could be played on so many different old games consoles (not to mention any karaoke machine). If there were more of an audience and software support around it, it would make a cool follow-up to the Merveilles hyperjam from @rek & @neauoire, or to some other demo jam (sadly I don’t think it’s ready for that). It’s like a lo-fi musical GIF.

Tech Flashback: The CD+Graphics Format (CD+G) | Gough’s Tech Zone

looking at the Holst: The Planets video, it seems the CD+G command stream is capable of altering palettes and doing some kind of simple scrolling, so it can do palette-cycling-style animations. it really plays into this weird idea i have of a kind of multimedia stream format that is something between GIF and teletext: an animation or vector drawing format that’s efficient enough to be played back in sync from one track of a stereo audio cassette or over FM radio.

but also low-power enough to implement on an 8-bit machine: Commodore 64, NES, Game Boy Color, and so on.

one iteration of that is imagining putting something in between the NES CPU and PPU that could capture the digital signals sent to the PPU and play them back. what would that look like? what if, in an alternate 1980s, whole cutscenes with audio were stored on a cassette tape the game could cue up and play back at will?

I found this javascript/canvas player for CDG, this could come in handy. I just wish I could find the original CDG files for Holst’s The Planets. They’re like brilliant animated collages of every image associated with each planet.

https://github.com/bhj/cdgraphics

so, software engineering challenge:

you have a normal C90 audio cassette tape: it holds four 45-minute tracks, i.e. 45 minutes of stereo audio on each side.
let’s use one of the tracks on each side for mono audio, and the other track for data
we’ll use the data for some synchronised multimedia presentation displayed on, say, some 1980s-ish 8-bit micro; pick your poison with whatever RAM or graphics upgrades you want
assume the most naive modem scheme: 300 baud: 300 bits per second.
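A quick sanity check of how tight that budget is, as a sketch. It assumes baud equals bits per second (as above) and that the data track runs for the full 45 minutes; the 8N1 framing figure (1 start bit, 8 data bits, 1 stop bit) is my own assumption about a plain UART link, and side_capacity_bytes is a hypothetical helper.

```python
# Rough data budget for one side of a C90, assuming one track per side
# carries data for the full 45 minutes and baud == bits per second.
SIDE_SECONDS = 45 * 60


def side_capacity_bytes(bits_per_second, bits_per_byte=8):
    """Bytes that fit on one side at a given raw bit rate.

    bits_per_byte=8 ignores framing; a UART-style 8N1 frame
    (1 start bit + 8 data bits + 1 stop bit) costs 10 bits per byte.
    """
    return bits_per_second * SIDE_SECONDS // bits_per_byte


for rate in (300, 2400):
    raw = side_capacity_bytes(rate)
    framed = side_capacity_bytes(rate, bits_per_byte=10)
    print(f"{rate} baud: {raw} bytes raw, {framed} bytes with 8N1 framing")
# 300 baud: 101250 bytes raw, 81000 bytes with 8N1 framing
# 2400 baud: 810000 bytes raw, 648000 bytes with 8N1 framing
```

So at 300 baud a whole side holds roughly 100 kB, spread over 45 minutes.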

use whatever software or hardware trickery you want: vector drawing, compression schemes, whatever, so long as it’s roughly achievable using 1980s or 1990s tech. so no deep learning, but a plain freeform mode 13h-style framebuffer is fine. there are lots of opportunities for tricks like copyrect from:to within a mutable framebuffer, building up off-screen scratch buffers and so on.

here is one tutorial on how to set up about the simplest possible UART->cassette circuit, running at 300 baud.

https://maker.pro/pcb/projects/make-uart-cassette-tape-interface/

apparently, one popular answer to this question in the 1980s was called NAPLPS, as used in a system called Telidon

http://fileformats.archiveteam.org/wiki/NAPLPS

Going further down this rabbit hole has unearthed a 1980s cable channel that I thought was a dream. Genesis Storytime, apparently a development from Canada’s Telidon, used something like a teletext extension called NAPLPS. I remember the giraffe with the ties from my childhood. It was always very strange and compelling to watch the computer system slowly build up images with vector shapes.

Genesis StoryTime History: Resurfacing A Missing Cable Channel

my sketch for the “software engineering challenge” is this:

start with a symbol stream that is 6 bits per symbol. this makes it easily representable as base64 characters. at 300 baud that’s roughly 50 symbols per second you can transmit.
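A minimal sketch of that symbol layer, assuming the standard base64 alphabet ordering (A-Z, a-z, 0-9, +, /) maps symbol values 0..63 to characters. symbols_to_text, text_to_symbols and symbols_per_second are hypothetical helpers, and the rate ignores start/stop bits and error correction.

```python
import string

# Standard base64 alphabet: index i <-> 6-bit symbol value i.
B64 = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def symbols_to_text(symbols):
    """Render a list of 6-bit symbols (0..63) as base64 characters."""
    return "".join(B64[s] for s in symbols)


def text_to_symbols(text):
    """Inverse of symbols_to_text."""
    return [B64.index(c) for c in text]


def symbols_per_second(baud, bits_per_symbol=6):
    # Raw rate only: ignores framing bits and error correction.
    return baud / bits_per_symbol


print(symbols_to_text([0, 25, 26, 63]))  # AZa/
print(symbols_per_second(300))           # 50.0
```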

then, build up a Forth-like RPN stack-based language where each symbol is pushed onto a stack by default, followed by stack ops that consume what was pushed.

the virtual machine’s memory is a 512x512x4bit “artboard” with a configurable clipping region that’s the “camera”.
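Here is a rough sketch of how the stack language and the artboard could fit together. The symbol assignment (0..47 are literals, 48..50 are opcodes) and the opcode names CAT, PSET and CAM are hypothetical, chosen only so that numbers up to 511 can be built from two 6-bit symbols.

```python
# Hypothetical symbol assignment, for illustration only:
#   0..47      push the value onto the stack (literals are the default)
#   48 CAT     ( hi lo -- hi*48+lo )  build numbers larger than 47
#   49 PSET    ( x y color -- )       plot a 4-bit pixel on the artboard
#   50 CAM     ( x y w h -- )         set the visible "camera" clip region
LITERAL_LIMIT = 48
CAT, PSET, CAM = 48, 49, 50


class VM:
    def __init__(self, size=512):
        self.size = size
        # one byte per pixel for simplicity; only the low 4 bits are used
        self.artboard = bytearray(size * size)
        self.camera = (0, 0, 256, 192)  # arbitrary default view
        self.stack = []

    def pop(self, n):
        """Pop n values, returned in the order they were pushed."""
        values = self.stack[-n:]
        del self.stack[-n:]
        return values

    def run(self, symbols):
        for s in symbols:
            if s < LITERAL_LIMIT:
                self.stack.append(s)
            elif s == CAT:
                hi, lo = self.pop(2)
                self.stack.append(hi * LITERAL_LIMIT + lo)
            elif s == PSET:
                x, y, color = self.pop(3)
                self.artboard[(y % self.size) * self.size + x % self.size] = color & 0xF
            elif s == CAM:
                self.camera = tuple(self.pop(4))
            else:
                raise ValueError(f"unassigned symbol {s}")

    def pixel(self, x, y):
        return self.artboard[y * self.size + x]
```

For example, `VM().run([2, 4, CAT, 5, 7, PSET])` plots color 7 at (100, 5), since 2 * 48 + 4 = 100.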

I’m still up in the air about what the drawing primitives should be. I’m quite happy shamelessly copying ideas from PICO-8’s API, but I also like the notion of simplifying a lot of stuff like sprites and maps down to a copyrect operation for copying bits and pieces off the scratch area of the artboard into the visible area. I suppose the configurable clip region should have a “wrap” area so that old-style scrolling can be done.
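A sketch of what copyrect plus a wrapping camera could look like, assuming the artboard is a square list of rows and every coordinate wraps modulo its size. copy_rect and camera_view are hypothetical names, not PICO-8 API calls.

```python
def copy_rect(board, size, sx, sy, w, h, dx, dy):
    """Copy a w x h block from (sx, sy) to (dx, dy) on a size x size board.

    Coordinates wrap modulo size, so the board behaves like a torus and a
    copy can straddle an edge. The source is read into a temporary buffer
    first so overlapping copies behave like memmove rather than memcpy.
    """
    block = [[board[(sy + j) % size][(sx + i) % size] for i in range(w)]
             for j in range(h)]
    for j in range(h):
        for i in range(w):
            board[(dy + j) % size][(dx + i) % size] = block[j][i]


def camera_view(board, size, cx, cy, w, h):
    """Return the visible w x h window at (cx, cy), wrapping at the edges.

    With wrapping, old-style endless scrolling is just a matter of moving cx/cy.
    """
    return [[board[(cy + j) % size][(cx + i) % size] for i in range(w)]
            for j in range(h)]
```

Copying a 2x2 sprite from the scratch corner to (7, 7) on an 8x8 board splits it across all four corners, and a camera placed at (7, 7) sees it reassembled.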

the “xor print” command of CD+G is interesting and I want to copy and extend it a bit. in the CD+G protocol, all patterns are 1 bit, but you can choose to “xor” them into the 4-bit color buffer and select a “bit plane” to xor into, so you can build up a 16-color image progressively by separately sending bit planes, and do clever compression by sending only the bit planes an image region actually uses (the non-empty ones).
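As I understand the CD+G tile XOR instruction (from the references linked in this post), each bit of a 6x12 tile pattern picks one of two color indices, and that index is XORed into the pixel already on screen; choosing 0 and 2^k as the two colors flips only bit plane k. A sketch of that plus the "send only non-empty planes" split; xor_tile and tile_planes are hypothetical names.

```python
TILE_W, TILE_H = 6, 12  # CD+G tile size in pixels


def xor_tile(screen, tx, ty, pattern, color0, color1):
    """XOR a 1-bit 6x12 pattern into a 4-bit indexed screen.

    pattern is 12 ints, one per row, with bit 5 as the leftmost pixel.
    Each pattern bit picks color0 or color1, and that index is XORed into
    the existing pixel. With color0 = 0 and color1 = 1 << k, only bit
    plane k is touched.
    """
    for row in range(TILE_H):
        for col in range(TILE_W):
            bit = (pattern[row] >> (TILE_W - 1 - col)) & 1
            c = color1 if bit else color0
            y, x = ty * TILE_H + row, tx * TILE_W + col
            screen[y][x] ^= c & 0xF


def tile_planes(tile):
    """Split a 6x12 tile of 4-bit pixels into its non-empty bit planes.

    Returns {plane_index: pattern}; all-zero planes are skipped, so they
    cost nothing to send.
    """
    planes = {}
    for k in range(4):
        pattern = [sum(((tile[r][c] >> k) & 1) << (TILE_W - 1 - c)
                       for c in range(TILE_W)) for r in range(TILE_H)]
        if any(pattern):
            planes[k] = pattern
    return planes
```

A tile that only uses colors 0 and 5 (binary 0101) splits into planes 0 and 2, so two tile packets rebuild it instead of four.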

this could be taken further by adding power-of-2 scaling to “mosaic fade in” large detailed images progressively, by xoring alternating rows and bit planes in.
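A sketch of a power-of-2 scaled XOR blit for a single bit plane; xor_scaled is a hypothetical primitive, not part of CD+G. Since XOR is its own inverse, drawing the same coarse pattern twice erases it, so a mosaic layer can be faded in and later cancelled cheaply.

```python
def xor_scaled(plane, pattern, x0, y0, scale_log2):
    """XOR a 1-bit pattern into a 1-bit plane at (x0, y0).

    Each source pixel is drawn as a (2 ** scale_log2)-sided square, so a
    small pattern can cover a large area as a coarse mosaic.
    """
    s = 1 << scale_log2
    for py, row in enumerate(pattern):
        for px, bit in enumerate(row):
            if not bit:
                continue
            for dy in range(s):
                for dx in range(s):
                    plane[y0 + py * s + dy][x0 + px * s + dx] ^= 1
```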

I believe another big win would be the ability to define and send simple functions like an “on frame” function that takes a frame number, and templates it into a set of drawing commands that can repeat every frame until a command is sent to stop it.
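One way the “on frame” idea might look, sketched in Python with a callable standing in for whatever template would actually be sent in the symbol stream. FrameHooks and its start/stop/tick methods are hypothetical; the drawing commands are just tuples.

```python
class FrameHooks:
    """Hypothetical 'on frame' mechanism: a function is sent once, then
    re-run every frame with the frame number as input until stopped."""

    def __init__(self):
        self.hooks = {}
        self.frame = 0

    def start(self, hook_id, fn):
        self.hooks[hook_id] = fn

    def stop(self, hook_id):
        self.hooks.pop(hook_id, None)

    def tick(self):
        """Advance one frame and return the drawing commands all hooks emit."""
        commands = []
        for fn in self.hooks.values():
            commands.extend(fn(self.frame))
        self.frame += 1
        return commands
```

The bandwidth win is that a looping scroll or palette rotation costs one `start` and one `stop` instead of a command every frame.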

i am obsessed with this dicky utterly failed alternate 1980s reality

https://youtube.com/watch?v=sgYkpk9nJnE

so, i guess i am nominally lifting the bandwidth to 2400 baud because 300 is just kind of impossible. the acid tests i wanna try and compress would be “move your feet” and “bad apple”.

for moveyourfeet to work i gotta get it under ~64 kB. as an optimised GIF i can get it down to ~1.4 MB, which is still roughly 22 times too big.

thought process for compression

break frame n+1 into 8x8 pixel blocks.
separate the blocks into bit planes + palettes, sorting colors by luminance first.
exhaustively search for appearances of each bit block in frame n. for each region being checked, repeat the separation procedure: reduce the region to bit planes + palettes and check for matches in each.

encode matched blocks as copies from frame n to frame n+1 (source over), and XOR them to attempt a reconstruction of frame n+1 from frame n.
encode the residual diffs as XOR blocks.

increment n.

go back to the first step.
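The loop above, simplified to indexed-color frames and without the bit-plane/palette split, might look roughly like this. It's a sketch under assumptions: frame sizes are multiples of 8, the search is limited to a small radius to keep it tractable, and residuals are XORs of palette indices. encode_frame and decode_frame are hypothetical names.

```python
BLOCK = 8


def get_block(frame, x, y):
    return [row[x:x + BLOCK] for row in frame[y:y + BLOCK]]


def encode_frame(prev, cur, radius=4):
    """Encode cur as copies from prev plus XOR residuals, one per 8x8 block.

    For each block of cur, search prev within +/-radius pixels for the
    source block with the fewest differing pixels (plain exhaustive motion
    search), then store the XOR of the two blocks as the residual, or None
    when the copy is already exact.
    """
    h, w = len(cur), len(cur[0])
    ops = []
    for by in range(0, h, BLOCK):
        for bx in range(0, w, BLOCK):
            target = get_block(cur, bx, by)
            best = None
            for sy in range(max(0, by - radius), min(h - BLOCK, by + radius) + 1):
                for sx in range(max(0, bx - radius), min(w - BLOCK, bx + radius) + 1):
                    src = get_block(prev, sx, sy)
                    cost = sum(a != b for ra, rb in zip(src, target)
                               for a, b in zip(ra, rb))
                    if best is None or cost < best[0]:
                        best = (cost, sx, sy, src)
            cost, sx, sy, src = best
            residual = [[a ^ b for a, b in zip(ra, rb)] for ra, rb in zip(src, target)]
            ops.append((bx, by, sx, sy, residual if cost else None))
    return ops


def decode_frame(prev, ops):
    """Rebuild the next frame: copy each source block from prev, XOR residual."""
    out = [row[:] for row in prev]
    for bx, by, sx, sy, residual in ops:
        src = get_block(prev, sx, sy)
        for j in range(BLOCK):
            for i in range(BLOCK):
                r = residual[j][i] if residual else 0
                out[by + j][bx + i] = src[j][i] ^ r
    return out
```

Blocks with a `None` residual are pure copies; everything else costs a copy plus an XOR patch, which is where the bit-plane tricks would come back in.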

i tried out Sublime Text’s anim_encoder and this defunct Mac app called Phosphor, which both take the approach of finding unique 8x8 pixel blocks across frames, packing them into a single PNG, and using JavaScript plus some decoding data to reproduce the original animation. i wasn’t able to gain much this way for moveyourfeet; neither attempts motion vector blocks, which are admittedly tricky.

there’s a tradeoff here too. am i making a thing that just compresses animated pixel art REALLY well, or a constrained tool that requires you to author everything from scratch under constrained rules? perhaps i start with one and then turn it into the other?

a challenge in moveyourfeet is that it includes palette fades and crossfades. my hope is that the step of breaking down the image into pattern blocks with palettes, normalised by luminance, might mitigate this somewhat. that way a fade is just the same pattern with a different palette. i don’t know for sure that will work without trying.
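A minimal sketch of the luminance-normalised split, assuming RGB tuples and Rec. 601 luma weights (my choice; any brightness measure that fades scale monotonically would do). split_block is a hypothetical name. It only shows the bookkeeping: a fade that keeps the colors' brightness order leaves the pattern unchanged and only swaps the palette.

```python
def luminance(rgb):
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b  # Rec. 601 luma weights


def split_block(block):
    """Split a block of RGB pixels into (pattern, palette).

    The palette holds the block's distinct colors sorted by luminance (ties
    broken by the RGB value so the result is deterministic); the pattern
    holds each pixel's index into that palette.
    """
    palette = sorted({px for row in block for px in row},
                     key=lambda c: (luminance(c), c))
    index = {c: i for i, c in enumerate(palette)}
    pattern = tuple(tuple(index[px] for px in row) for row in block)
    return pattern, palette
```

A crossfade between two different images will usually reorder brightnesses and change the pattern, so those frames would still need another technique.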

I just realised i am somewhat inspired by the fast framebuffer driver for the ILI9341 LCDs, which have a low-bandwidth wire protocol similar to 80s and 90s game console video chips. the tricks it uses to take plain framebuffer output from an emulator and transform it into low-bandwidth drawing instructions are inspired.

i have been studying the “move your feet” video and thinking. i can see how to achieve many of its effects with palette cycling, but the challenge is: what algorithm can convert the final frames back into a palette cycling animation?

i’m kinda stupid so take it easy. suppose you have two low-bit-depth frames of the animation. if you treated every x,y as a sextuplet, r,g,b,r,g,b, like it’s a six-component color, counted up the uniques, and indexed the pixels, that’s a palette cycle animation.
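Here's that idea as a sketch, generalised from two frames to N: each pixel's color sequence over all frames becomes a key, and every unique key gets a palette index. to_palette_cycle is a hypothetical name; frames are assumed to be equal-sized lists of rows of RGB tuples.

```python
def to_palette_cycle(frames):
    """Turn N frames into one index image plus N palettes.

    Each pixel's color sequence across all frames (the 'sextuplet' for two
    RGB frames) becomes one key; every unique key gets an index, in order
    of first appearance. Frame f is reproduced exactly by looking up
    palettes[f][index[y][x]].
    """
    h, w = len(frames[0]), len(frames[0][0])
    keys = {}
    index = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            seq = tuple(frame[y][x] for frame in frames)
            index[y][x] = keys.setdefault(seq, len(keys))
    palettes = [[None] * len(keys) for _ in frames]
    for seq, i in keys.items():
        for f, color in enumerate(seq):
            palettes[f][i] = color
    return index, palettes
```

The number of palette entries it produces (`len(palettes[0])`) is exactly the "unique colors" count that decides whether a run of frames is cheap as a palette cycle.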

the nuance is, what threshold should i treat as a reasonable palette shift, vs some other compression technique? which should i try first? in the Sublime Text approach, there’s no palette shifting: slight changes in colors mean a whole-screen repaint. palette shifts seem like an unexplored compression avenue to me though.

is there a way to turn an N-color, N-index palette cycle animation (any arbitrary number of frames in any number of colors can become this by the process i described) into a 256- or 16-index palette cycle augmented by low-bandwidth “touchup” patches XOR’d in each frame? is there a mathematically sound transform from one form to the other?
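I don't know of a mathematically sound transform, but here is one greedy heuristic to experiment with, offered as an assumption rather than an answer: keep the k most-used indices, merge every other index into the kept one whose whole color sequence is closest, and record the pixels that come out wrong in each frame as touchup patches. reduce_cycle is a hypothetical name; it keeps the original index numbers rather than compacting them to 0..k-1.

```python
from collections import Counter


def dist(a, b):
    """Squared RGB distance between two colors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def reduce_cycle(index, palettes, k):
    """Greedy reduction of a palette cycle to k indices plus touchups.

    index/palettes are one index image and one palette per frame. Dropped
    indices are merged into the kept index whose color sequence is closest
    (summed squared distance over all frames). Returns the new index image
    and, per frame, the (x, y) pixels that would need an XOR touchup.
    """
    counts = Counter(i for row in index for i in row)
    kept = [i for i, _ in counts.most_common(k)]

    def seq(i):
        return [pal[i] for pal in palettes]

    remap = {}
    for i in counts:
        remap[i] = i if i in kept else min(
            kept, key=lambda j: sum(dist(a, b) for a, b in zip(seq(i), seq(j))))
    new_index = [[remap[i] for i in row] for row in index]
    touchups = [[(x, y) for y, row in enumerate(index)
                 for x, i in enumerate(row) if pal[remap[i]] != pal[i]]
                for pal in palettes]
    return new_index, touchups
```

The total length of the touchup lists is a direct measure of how much a given k costs, which is one way to pick the threshold between palette shifting and patching.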

this pico-8 cart manages a 4 bytes per frame compression of bad apple.

https://www.lexaloffle.com/bbs/?tid=3263

This is a pretty good indication of about what is possible at the more optimistic side of data rates from audio cassettes.

(MSX had 2400 baud, i.e. 2400 bits per second, which gives 6 bytes per frame at 50 frames per second, or 5 bytes per frame at 60 frames per second)

And so, is it even possible to do better than this blobby mess?

the one-rectangle-or-circle-per-frame compression scheme seems to be able to refine the image only when the animation is nearly still. In the scheme I have in my head, efficiencies can be gained by having an off-screen scratch buffer that the main animation can copy from, but the simplest copy operation requires two rectangles, or two fixed-size squares. Hm.
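To put numbers on that, here's a rough cost model. Assumptions: pixel coordinates on the 512x512 artboard take 9 bits each (treating width/height as value-1 so 512 fits), opcodes cost nothing extra, and the fixed squares are block-aligned. copy_cost_bits is a hypothetical helper.

```python
from math import ceil, log2

ARTBOARD = 512  # 512x512 artboard, so a pixel coordinate needs 9 bits


def copy_cost_bits(kind, block=8):
    """Operand bits for one copy command (opcode not counted)."""
    coord_bits = int(log2(ARTBOARD))
    if kind == "rect":    # src x, y, w, h and dst x, y, all pixel-precise
        return 6 * coord_bits
    if kind == "square":  # fixed-size squares on a block-aligned grid
        grid_bits = int(log2(ARTBOARD // block))
        return 4 * grid_bits  # src bx, by and dst bx, by
    raise ValueError(kind)


for kind in ("rect", "square"):
    bits = copy_cost_bits(kind)
    print(kind, bits, "bits =", ceil(bits / 6), "six-bit symbols")
# rect 54 bits = 9 six-bit symbols
# square 24 bits = 4 six-bit symbols
```

So under these assumptions a free-form rectangle copy (54 bits) doesn't even fit in the 48 bits per frame that 6 bytes per frame allows, while an aligned 8x8 square copy fits twice.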

the other thing that I’m thinking is that enormous efficiencies could possibly be gained from palette manipulation. Palettes could be preloaded into a scratchbuffer somewhere and a palette pointer could be incremented with a single instruction, or somehow set to automatically increment. While that’s happening the pixel buffer could be modified, hiding future updates in the regions of merged colors. One just needs to find the algorithm that makes best use of these possibilities.
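A sketch of the palette-pointer idea; PaletteBank, set_pointer and next_frame are hypothetical names. The point is the cost model: a whole sequence of palettes is preloaded once, then a single command sets the pointer and its per-frame step, and every following frame advances for free.

```python
class PaletteBank:
    """Hypothetical palette scratch area with an auto-incrementing pointer."""

    def __init__(self, palettes):
        self.palettes = palettes  # list of preloaded 16-color palettes
        self.pointer = 0
        self.step = 0             # how far the pointer moves each frame

    def set_pointer(self, p, step=0):
        """The single instruction: jump to palette p and set the auto-step."""
        self.pointer, self.step = p, step

    def next_frame(self):
        """Return this frame's palette, then auto-advance (wrapping)."""
        current = self.palettes[self.pointer % len(self.palettes)]
        self.pointer += self.step
        return current
```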

I need to get past thinking and into doing somehow. This is a summary of the whole scheme in my head, in the hopes that I can chip away at implementing it.

build a palette for each frame
for runs of frames, try to combine them into single “pseudoframes” that can reproduce the originals with palette shifts, staying under some budget of color indexes.
optimise the efficiency of the previous step by brute-force checking scroll offsets against each pair of frames I’m combining, for the best frame:color ratio
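The scroll-offset check might look like this as a brute-force sketch, assuming frames are equal-sized grids of color values (or palette indices) and that scrolling wraps around. The number of distinct (color in a, color in shifted b) pairs is exactly how many entries a merged pseudoframe's palette would need. best_scroll and unique_pairs are hypothetical names.

```python
def unique_pairs(a, b, dx, dy):
    """Distinct (color in a, color in b shifted by dx, dy) pairs.

    This is the palette size a combined pseudoframe would need if b is
    drawn scrolled by (dx, dy) relative to a. Edges wrap.
    """
    h, w = len(a), len(a[0])
    return len({(a[y][x], b[(y + dy) % h][(x + dx) % w])
                for y in range(h) for x in range(w)})


def best_scroll(a, b, max_shift=4):
    """Brute-force the offset that minimises the combined palette size,
    preferring smaller shifts on ties."""
    candidates = [(dx, dy) for dy in range(-max_shift, max_shift + 1)
                  for dx in range(-max_shift, max_shift + 1)]
    return min(candidates,
               key=lambda d: (unique_pairs(a, b, *d), abs(d[0]) + abs(d[1])))
```

If b is just a scrolled copy of a, the right offset collapses the pairs down to a's own colors, so the merged pseudoframe costs nothing extra.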

compute pixel diffs for each pair of pseudoframes. try to find minimal deltas by brute-force searching scroll offsets and index reassignments.
use the deltas to find the minimal set of 8x8 pixel blocks that need to be repainted to transform one pseudoframe into the next.
of these, find 8x8 regions in the previous pseudoframe that they could be copied from. (motion estimation vector)
try to combine “motion” blocks that have the same move vector
Store the remaining 8x8 unique blocks
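A sketch of the “combine motion blocks” step, assuming each move is stored as (bx, by, vx, vy), meaning block (bx, by) is copied from (bx + vx, by + vy) in the previous pseudoframe. It only merges horizontal runs within one block row; merging into taller rectangles is left out. merge_motion_blocks is a hypothetical name.

```python
from collections import defaultdict


def merge_motion_blocks(moves, block=8):
    """Merge 8x8 copy operations that share a motion vector.

    Horizontally adjacent blocks with the same vector become one wider
    copyrect, so a run of n blocks costs one command instead of n.
    Returns tuples (x, y, w, h, vx, vy).
    """
    by_vector = defaultdict(list)
    for bx, by, vx, vy in moves:
        by_vector[(vx, vy)].append((by, bx))
    rects = []
    for (vx, vy), blocks in by_vector.items():
        blocks.sort()                      # row-major: by y, then x
        runs = []                          # each run is [x_start, y, x_end]
        for by, bx in blocks:
            if runs and runs[-1][1] == by and runs[-1][2] + block == bx:
                runs[-1][2] = bx           # extend the current run
            else:
                runs.append([bx, by, bx])  # start a new run
        for x0, y, x1 in runs:
            rects.append((x0, y, x1 - x0 + block, block, vx, vy))
    return rects
```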

can I squeeze more efficiency out by making “pseudo” blocks in the same way I make pseudoframes?
I’m intrigued by the “xor” paint operation from CD+G and really wonder how it could be exploited for further efficiencies. It can be used to build up higher-bit-depth images from 1-bit planes, so, among the unique blocks, how many redundant bitplanes are there?

and of those unique bitplanes, how many can be reproduced by XOR’ing a smaller set of simpler bitplanes?
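The second question has a clean answer if planes are treated as vectors over GF(2), where XOR is addition: the rank of the set of unique planes is the smallest number of planes from which all of them can be rebuilt by XOR. A sketch, assuming each plane is packed into a Python int; gf2_rank and pack are hypothetical helpers. Note the rank says how many planes are needed, not that the basis planes will be any simpler to draw.

```python
def pack(plane):
    """Pack a 2D list of 0/1 into an int, row-major, first pixel = lowest bit."""
    bits = [bit for row in plane for bit in row]
    return sum(bit << i for i, bit in enumerate(bits))


def gf2_rank(planes):
    """Rank over GF(2) of a set of 1-bit planes packed as ints.

    Each candidate is reduced against the basis so far: min(p, p ^ b)
    clears b's highest set bit from p whenever p has it. Whatever is left
    is independent of the basis and is added to it.
    """
    basis = []
    for p in planes:
        for b in basis:
            p = min(p, p ^ b)
        if p:
            basis.append(p)
    return len(basis)
```

For example, if a third plane is exactly the XOR of two others, the rank is 2, so only two planes need sending.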

and then after all that’s done, how much of the remaining entropy can be compressed by vector drawing operations?

I call this insanity super gif.

can you think of a better name?

further efficiencies could be gained by better “prediction” of in-between frames. if you only need to store every other frame, and can near-perfectly recreate the in-between frames from the surrounding ones, that’s savings.

this guy uses a Markov chain scheme to generate in-between frames

lexaloffle.com/bbs/?tid=31240
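As a baseline to compare against that Markov chain approach, here's a far cruder predictor, sketched under my own assumptions (indexed frames, no motion): where the frames on either side agree, assume the middle frame matches; otherwise send the pixel. inbetween_patches and rebuild_mid are hypothetical names.

```python
def inbetween_patches(prev, mid, nxt):
    """Pixels that must be sent to rebuild mid from its neighbours.

    Rule: where prev and nxt agree, predict mid has the same value; where
    they disagree the predictor gives up. A pixel is patched if the
    predictor gave up or guessed wrong. Returns (x, y, value) patches.
    """
    patches = []
    for y, (rp, rm, rn) in enumerate(zip(prev, mid, nxt)):
        for x, (p, m, n) in enumerate(zip(rp, rm, rn)):
            if p != n or m != p:
                patches.append((x, y, m))
    return patches


def rebuild_mid(prev, patches):
    """Every unpatched pixel equals prev (which equals nxt there)."""
    out = [row[:] for row in prev]
    for x, y, v in patches:
        out[y][x] = v
    return out
```

The number of patches per skipped frame is the number to beat with any smarter predictor.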

I remembered, another thing XOR can do is that if you can combine it with scaling, you can use it to do progressive encoding of an image. Bring in the low res version first, and progressively xor details in at finer and finer resolutions.
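A sketch of that progressive XOR idea for 1-bit images, assuming square-block nearest-neighbour mosaics and image sides that are multiples of 2**levels. progressive_encode and progressive_decode are hypothetical names, and each layer is stored full-size here for simplicity.

```python
def sample(img, s):
    """Nearest-neighbour mosaic of img at block size 2**s (full-size output)."""
    k = 1 << s
    return [[img[y - y % k][x - x % k] for x in range(len(img[0]))]
            for y in range(len(img))]


def xor_img(a, b):
    return [[p ^ q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def progressive_encode(img, levels):
    """Return [coarsest mosaic, residual, ..., finest residual].

    Each residual is the XOR difference between one mosaic level and the
    next finer one, so XORing them in order sharpens the picture step by
    step and the last step reproduces img exactly.
    """
    mosaics = [sample(img, s) for s in range(levels, -1, -1)]
    return [mosaics[0]] + [xor_img(a, b) for a, b in zip(mosaics, mosaics[1:])]


def progressive_decode(layers):
    out = layers[0]
    for residual in layers[1:]:
        out = xor_img(out, residual)
    return out
```

The residual between mosaic levels s+1 and s is constant within 2**s x 2**s squares, so in a real stream it could be sent at that coarser resolution and drawn with a scaled XOR.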

this could be interesting if a lossy compression is considered, where a long span of a palette cycling animation is progressively loaded.

i have now done step one of this plan, and the ineffable thing is how telling the arrays of numbers it has generated have been. in a mostly unchanging “fade in”, my algorithm nicely finds 11 unique regions in the image it can independently palette shift to perfectly recreate the original set of frames. once the squirrel enters the scene, the number of unique “colors” needed to reproduce the animation as a palette cycle predictably skyrockets past 300…

what’s interesting is that since each palette entry is a unique color sequence i’ve found on some particular pixel, the least compressible pixels naturally end up with higher indexes, which i can actually use to segment the animation: separate the easy part of the image from the harder part.

either way, that’s 80 frames of fade in animation that i get virtually for free, it’s one image and 80 palettes, 11 colors.

list of videos about color cycling animations

https://www.youtube.com/watch?v=LUbrzg21X9c

https://www.youtube.com/watch?v=Z-VO-hxLsEI

https://www.youtube.com/watch?v=aMcJ1Jvtef0

a list of links relating to html5/js based hand rolled gif replacement animation “codecs”

iPhone 5 website teardown: How Apple compresses video using JPEG, JSON, and

GitHub - divergentmedia/phosphorframework: Player framework for Phosphor encoded video content

GitHub - sublimehq/anim_encoder

it seems that most of the 1980s computers that used cassette tape for data had an ideal rate of 300 bits per second, including any error correction you’d want to get. that’s a very slow 37.5 ASCII characters per second at 8 bits each (7 bits + parity), and fewer once start and stop bits are added.

later computers used the Kansas City Standard

https://en.wikipedia.org/wiki/Kansas_City_standard

which could maybe get up to 2400 bits/s, but include error correction and it’s 1,735 bits/s using FSK

data-to-audio schemes typically use one of 3 basic strategies: ASK, FSK or PSK (amplitude-, frequency- or phase-shift keying).

Now, I’m interested in PSK variants, because supposedly these could get higher data rates. But because it’s PHASE shift keying, it encodes the data by modulating the phase of a carrier wave over time. I’m wondering how robust this would be with the flutter and wow that you get from the inconsistent motor speed of low-end cassette decks.
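A rough way to see why wow and flutter matter for PSK: treat the tape speed error as a fraction ε of nominal speed (a simplification, since real wow and flutter vary over time). A carrier recorded at frequency f_c then drifts in phase by

$$\Delta\phi(t) = 2\pi f_c \int_0^t \varepsilon(\tau)\,d\tau \;\approx\; 2\pi f_c\,\varepsilon\,t \qquad (\varepsilon \text{ constant})$$

With illustrative numbers f_c = 2400 Hz and ε = 0.5%, that's 2400 × 0.005 × 0.1 = 1.2 cycles of drift after only 0.1 s, so an absolute phase reference is lost almost immediately. Over a single symbol at 1200 baud, though, the drift is 2400 × 0.005 / 1200 = 0.01 cycles (about 3.6°), which is why comparing each symbol's phase with the previous one (differential PSK) is a common way around this.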

it seems that, for the most part, the limit for these microcomputer cassette schemes wasn’t necessarily the physical limitations of the cassette tape, but the fact that the decoding was typically done with software, from an audio input that could only detect if the line was high or low. the highest frequency you could detect was based on how many CPU cycles you could spend checking that input pin.
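To make the software-decoding point concrete, here's a sketch of a Kansas City Standard-style 300 baud link. Per the KCS, a 0 bit is four cycles of 1200 Hz and a 1 bit is eight cycles of 2400 Hz, and each byte is framed as one start bit (0), eight data bits LSB first, and two stop bits (1). The 9600 Hz sample rate, the perfect square waves and the bit-aligned decoder are simplifying assumptions of mine, and all function names are hypothetical.

```python
SAMPLE_RATE = 9600
BAUD = 300
SAMPLES_PER_BIT = SAMPLE_RATE // BAUD  # 32


def bit_wave(bit):
    """Square wave for one bit: 0 = 4 cycles of 1200 Hz, 1 = 8 cycles of 2400 Hz."""
    half = SAMPLE_RATE // (2400 if bit else 1200) // 2  # samples per half-cycle
    cycles = 8 if bit else 4
    return ([1] * half + [0] * half) * cycles


def encode_byte(byte):
    """KCS-style frame: start bit 0, 8 data bits LSB first, two stop bits 1."""
    bits = [0] + [(byte >> i) & 1 for i in range(8)] + [1, 1]
    return [s for b in bits for s in bit_wave(b)]


def decode_bits(samples):
    """Recover bits by counting level changes per bit period.

    This mimics a CPU polling a single high/low input pin: no filtering,
    just 'how often did the line flip in this window?'. 1200 Hz gives
    about 8 flips per bit and 2400 Hz about 16, so 12 is the threshold.
    """
    bits = []
    for start in range(0, len(samples) - SAMPLES_PER_BIT + 1, SAMPLES_PER_BIT):
        flips = sum(samples[j] != samples[j - 1]
                    for j in range(max(start, 1), start + SAMPLES_PER_BIT))
        bits.append(1 if flips >= 12 else 0)
    return bits


def decode_byte(samples):
    bits = decode_bits(samples)
    if bits[0] != 0 or bits[9:11] != [1, 1]:
        raise ValueError("bad framing")
    return sum(b << i for i, b in enumerate(bits[1:9]))
```

With 11 bits per byte, 300 baud works out to about 27 bytes per second of payload.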

visual reference: Here is a video of paper punch tape being read at 2400 baud (which I think in this case converts to 2400 bytes per second)

https://www.youtube.com/watch?v=HRPo9PWHJKc

ah, the talk page on Wikipedia has more information they won’t put into the main article because it’s “original research”

https://en.wikipedia.org/wiki/Talk:Cassette_tape/Archive_3

“either way, that’s 80 frames of fade in animation that i get virtually for free, it’s one image and 80 palettes, 11 colors.” actually i need to stop thinking of palette changes as “free”: that is still 11×80 = 880 RGB palette entries that need to be stored and sent

info from https://goughlui.com/2019/03/31/tech-flashback-the-cdgraphics-format-cdg/

300 x 216 pixel area with 288 x 192 pixels of active video surrounded by a single-coloured border.

https://jbum.com//cdg_revealed.html

CD+G uses the six unused subcode channels to carry six bits per frame of data, which is formatted into 24-byte packets containing commands for a CD+G player to interpret. This document contradicts Wikipedia in claiming that the central 294 x 204 pixel area is displayed.

The document does contain a number of errors, later discovered, so a firm specification is likely only available from Philips/Sony with a payment.

A CD-DA disc stores 24 bytes (6 stereo samples) of data in each frame, with 8 bytes of error correction code and 1 byte of subcode with each “bit” being denoted a channel name (P through to W). The first two bits (P/Q) are used for audio navigation and timing, thus are not used by CD+G. As a result, 6-bits of data can be conveyed in each audio frame.

At a sample rate of 44,100 Hz, this translates to a total of 7350 frames/second, or 7350 bytes of subcode per second.

This is approximately 35.28 MB in 80 minutes, which is not much data at all.

As each CD+G packet costs 24 bytes, and the visible area has about 49 x 17 tiles (using the larger value) for 833 total tiles,

it would take around 2.72 seconds to fill the screen with a two-colour image. More time would be needed for extra colours or for system information and effects.
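A sketch to check that arithmetic. CD+G tiles are 6x12 pixels, so 294 x 204 gives 49 x 17 = 833 tiles, and at 7350 / 24 ≈ 306 packets per second that's the 2.72 s figure. One caveat I'm fairly sure of: each 98-frame subcode block spends 2 frames on sync, leaving 96 bytes (4 packets), and there are 75 blocks per second, so the usable rate is 300 packets per second, which nudges the fill time to about 2.78 s. fill_time is a hypothetical helper.

```python
SUBCODE_BYTES_PER_SEC = 44100 // 6  # 7350 frames/s, one subcode byte each
PACKET_BYTES = 24


def fill_time(width, height, tile_w=6, tile_h=12, packets_per_sec=None):
    """(tiles, seconds) to paint every tile once with one packet per tile."""
    tiles = (width // tile_w) * (height // tile_h)
    if packets_per_sec is None:
        packets_per_sec = SUBCODE_BYTES_PER_SEC / PACKET_BYTES  # 306.25
    return tiles, tiles / packets_per_sec


print(fill_time(294, 204))                       # (833, 2.72)
print(fill_time(294, 204, packets_per_sec=300))  # accounting for sync frames
print(SUBCODE_BYTES_PER_SEC * 80 * 60)           # 35280000 bytes in 80 minutes
```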

GitHub - bhj/cdgraphics: CD+Graphics (CD+G) JavaScript/HTML5 canvas player

Various - Rock Paintings

Various - Psychedelia - Preview of an Album

Various - On The Cutting Edge

Various - The Home Video Album

Various - CD+G A New Dimension

Various - A Tribute To Woody Guthrie

Information Society - Information Society

Holst - The Planets

Honeymoon Suite - Racing After Midnight

Jimi Hendrix Experience - Smash Hits

Emmylou Harris - Pieces Of The Sky