The PPU thread

This is a deep rabbit hole, and it was inspired by The 8-Bit Guy's frustration that for his Commander X16 project he has to have an FPGA for generating video, or some other chip many, many times more powerful than the 6502-based CPU at his machine's core.

So, what are the other options? In sum:

Cheap, Period Authentic, HDMI compatible. Pick two.

So it's the 1980s and you're designing a microcomputer. To save on production costs, you want the video to be generated by a standard part. Popular choices in the early 1980s are:

Yamaha is still in the business of making and selling PPUs with the same sprite-based/pattern-based architectures. There's a list of their products on their website here:

Graphic Controller - Electronic Devices - Yamaha Corporation

a PIC chip (24FJ256DA210) that is allegedly capable of generating a 320x240x8bpp VGA signal, with a CLUT for palette cycling, and could possibly be used to decode the weird codec i am working on.
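For anyone who hasn't met palette cycling: with an 8bpp indexed framebuffer, each pixel stores an index into a colour lookup table (CLUT), so rotating a range of CLUT entries animates every pixel that uses those indices without rewriting a single framebuffer byte. A minimal Python sketch, assuming a hypothetical 256-entry CLUT held as a list of (r, g, b) tuples; the function names are made up for illustration and are not the PIC's API:

```python
def rotate_palette_range(clut, start, end):
    """Rotate CLUT entries start..end (inclusive) forward by one slot.

    Every pixel that uses one of these indices changes colour on the next
    scanout, even though the framebuffer itself is never touched.
    """
    segment = clut[start:end + 1]
    clut[start:end + 1] = segment[-1:] + segment[:-1]


def scanout(framebuffer, clut):
    """Resolve palette indices to RGB, the way the display hardware would."""
    return [[clut[index] for index in row] for row in framebuffer]


# Usage: three pixels using indices 1..3, then one palette rotation.
clut = [(0, 0, 0)] * 256
clut[1:4] = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
framebuffer = [[1, 2, 3]]
before = scanout(framebuffer, clut)
rotate_palette_range(clut, 1, 3)
after = scanout(framebuffer, clut)
```

The per-frame cost of the animation is a handful of palette writes instead of rewriting 76,800 bytes (320x240 at 8bpp), which is why a CLUT matters so much on a small chip.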

the Colour Maximite 2 uses something called an ART ACCELERATOR (such an exciting name for a graphics chip). this is a block on its system-on-a-chip, next to its ARM core, that boils down to hardware acceleration for pixel-oriented 2D rectangle DMA and pixel format conversion. blitting, in other words.
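Roughly what a 2D-rect DMA with pixel format conversion does, written out in software: copy a width x height rectangle between two buffers with different strides, converting each pixel on the way. This sketch assumes RGB565 source pixels and (r, g, b) tuple destination pixels; the names and the choice of formats are hypothetical, picked just to show the idea, and are not the real peripheral's register interface.

```python
def rgb565_to_rgb888(pixel):
    """Expand a 16-bit RGB565 value to an (r, g, b) tuple of 8-bit values."""
    r5 = (pixel >> 11) & 0x1F
    g6 = (pixel >> 5) & 0x3F
    b5 = pixel & 0x1F
    # Replicate the high bits into the low bits so full intensity maps to 0xFF.
    return ((r5 << 3) | (r5 >> 2), (g6 << 2) | (g6 >> 4), (b5 << 3) | (b5 >> 2))


def blit_convert(src, src_stride, dst, dst_stride, width, height,
                 src_x=0, src_y=0, dst_x=0, dst_y=0):
    """Copy a rectangle from src to dst (flat lists), converting each pixel.

    Strides are row lengths in pixels, so the rectangle can sit anywhere
    inside larger source and destination buffers.
    """
    for row in range(height):
        s = (src_y + row) * src_stride + src_x
        d = (dst_y + row) * dst_stride + dst_x
        for col in range(width):
            dst[d + col] = rgb565_to_rgb888(src[s + col])


# Usage: copy a 2x2 RGB565 image into the bottom-right of a 3x3 buffer.
src = [0xF800, 0x07E0, 0x001F, 0xFFFF]
dst = [None] * 9
blit_convert(src, 2, dst, 3, 2, 2, dst_x=1, dst_y=1)
```

The hardware version does the same loop, but as DMA, without the CPU touching each pixel.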

the RA8876 is a mass-produced 2D “graphics engine” that outputs RGB values and is often paired with a CH7035B, which can output HDMI. these are available as an HDMI arduino shield and can be driven by a quite low-powered device. but while these are somewhat “standard” parts, they are many times more powerful than even an arduino.

the F18A is an FPGA implementation of the Texas Instruments TMS9918.

so it’s 2020, and most people don’t have a CRT-based television or VGA monitor lying around. they’re actually rare antiques, likely never to be made again.

but the 1980s micros had an ineffable hacker feeling, an ethic, where somehow, you were able to coax a video signal that could drive a CRT out of a circuit. the CPU that you’ve attached is just incidental.

so is THAT the feeling to capture? if so, how, and with what?

so there are 5 main options for retro-hacking a PPU/display system, depending on your ethic/aesthetic/commercial/project goals:

  1. full old school: generating CRT signals for the CRT that you still have for some reason. (standard pong project kit)
  2. generating VGA signals that can be interpreted by some LCD panels (achievable, but is the flat panel VGA maybe cheating?)
  3. not worrying about CPU power purism and just generating HDMI signals using whatever is cheaply available (Gameduino, Maximite)
  4. focus on directly driving a flat panel display, sidestepping the wasteful signal conversion steps. e.g. the ILI9341 panels, which come with a driver/controller chip that accepts a command stream over a serial connection. or you could go hardcore and design your own driver, i guess.
  5. invent an entirely new kind of display. spinning LED strip. ferrofluid. high speed ink plotter. a clickity clacker. the rad 3D pin displays from X-Men. sky’s the limit.
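To give a feel for what a panel "command stream" looks like, here is a sketch of filling a rectangle on an ILI9341-style controller: set the column range (CASET, 0x2A), set the row range (PASET, 0x2B), then send pixels after a memory write command (RAMWR, 0x2C). It assumes the panel has already been initialised and put into 16-bit RGB565 mode, and it only builds the bytes. The function name is hypothetical, and actually shifting the bytes out over SPI (with the D/C pin low for command bytes and high for parameters) is left to whatever hardware you have.

```python
CASET = 0x2A  # column address set
PASET = 0x2B  # page (row) address set
RAMWR = 0x2C  # memory write: following bytes are pixel data


def fill_rect_commands(x0, y0, x1, y1, rgb565):
    """Return (command_byte, parameter_bytes) pairs filling an inclusive rect."""
    pixel_count = (x1 - x0 + 1) * (y1 - y0 + 1)
    colour = bytes([rgb565 >> 8, rgb565 & 0xFF])  # high byte first
    return [
        (CASET, bytes([x0 >> 8, x0 & 0xFF, x1 >> 8, x1 & 0xFF])),
        (PASET, bytes([y0 >> 8, y0 & 0xFF, y1 >> 8, y1 & 0xFF])),
        (RAMWR, colour * pixel_count),
    ]


# Usage: a 2x2 red square with its top-left corner at (10, 20).
commands = fill_rect_commands(10, 20, 11, 21, 0xF800)
```

Even a modest 100x100 fill is 20,000 bytes of pixel data, which gives some idea why these panels tend to get paired with faster MCUs.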

normally the goal here is doing something about the nagging feeling that hobby electronics just aren’t as accessible to newcomers as they were in, say, the 1970s or 1980s, partly because the devices that are normalised now have, at a minimum, an inscrutably complex silicon chip controlling them, with no observable parts.

this may be a problem. but it might, in fact, not be a problem at all. after all, we weren’t particularly concerned about 6502s having no visible vacuum tubes.

there are arguments to be made about capitalism and globalism, and resilience to supply chain failure, sure.

silicon crystals are still gonna need 6 months to grow in a million dollar machine. the industrial infrastructure to make this stuff actually from scratch needs expensive investments. so if we wanna smash the state and seize the means of production DIY, we have to decide how far we’re realistically willing to go with that. me? i think simple fantasy consoles and arm chips running linux are just fine.

as far as video is concerned, quite a lot of the IP on old 1990s consoles has expired, so there’s nothing stopping cheap clones of any popular 1980s or 1990s console video chip from being mass manufactured, other than needing some kind of market to justify the expense.

what concerns me, though, is that past SNES/Genesis-era video hardware, there’s not a lot of stuff in the public domain for, say, even PSX or N64 level graphics.

is that even a problem if you’re not an open source foss purist?

in my view, the problem that really needs solving is stable platforms upon which to build software. it’s possible to preserve things from up until around 2005 because 1. the platforms they were written for are no longer changing, and 2. they tended not to (heavily) depend on running servers. we can emulate this stuff. we can build clones. we can even write homebrew. we have much less confidence that the software we’re writing now has any longevity. even the awesome first wave of 2007 iphone apps

the General Instrument Standard Television Interface Chip (STIC), the AY-3-8900-1, was the PPU on the Intellivision. it had a resolution of 160x96 pixels composed of 8x8 tiles, 8 8x8 sprites, and 16 colors (total).

the Atari ANTIC (1978) was a custom graphics "coprocessor" that would execute a program in its own graphics-specific instruction set, called a "display list", much like modern GPUs. It was used in the Atari 400/800 line of 8-bit computers and the Atari 5200. It had hardware smooth scrolling. It sort of only output pixel data, and a separate chip (CTIA/GTIA) would generate the video signal. I don't really follow how this works or what modes it has available.
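As far as I understand it, the display list is just a sequence of bytes in memory that ANTIC walks through once per frame. In each instruction byte, the low nibble picks the kind: 0 means "some blank scanlines" (the count is bits 4-6 plus one), 1 means a jump (bit 6 set makes it "jump and wait for vertical blank", JVB), and 2-F means "draw one line in this text/graphics mode". On a mode line, bit 6 is Load Memory Scan (LMS), followed by a 16-bit little-endian address where screen data starts. Bits 7, 5 and 4 are the display list interrupt and scroll flags, which this sketch ignores. A small Python decoder, offered as a reading aid rather than an emulator:

```python
def decode_display_list(dl):
    """Decode ANTIC display list bytes into (kind, ...) tuples.

    Stops at the first jump instruction. DLI (bit 7) and scroll
    (bits 5 and 4) flags are not decoded here.
    """
    decoded = []
    i = 0
    while i < len(dl):
        op = dl[i]
        kind = op & 0x0F
        if kind == 0:
            # Blank scanlines: bits 4-6 hold (count - 1).
            decoded.append(("blank", ((op >> 4) & 0x07) + 1))
            i += 1
        elif kind == 1:
            # Jump; bit 6 set means "jump and wait for vertical blank".
            target = dl[i + 1] | (dl[i + 2] << 8)
            decoded.append(("jvb" if op & 0x40 else "jmp", target))
            break
        elif op & 0x40:
            # Mode line with Load Memory Scan: the next two bytes are the
            # little-endian address that screen data is read from.
            decoded.append(("mode", kind, dl[i + 1] | (dl[i + 2] << 8)))
            i += 3
        else:
            # Mode line that continues from the current memory scan address.
            decoded.append(("mode", kind, None))
            i += 1
    return decoded


# A made-up, shortened text-mode list: 24 blank lines, an LMS mode 2
# (40-column text) line reading from $4000, two more mode 2 lines, then
# JVB back to $2000, where this list is assumed to live.
TEXT_MODE_DL = bytes([0x70, 0x70, 0x70, 0x42, 0x00, 0x40,
                      0x02, 0x02, 0x41, 0x00, 0x20])
for instruction in decode_display_list(TEXT_MODE_DL):
    print(instruction)
```

A real full-screen text display list has the same shape, just with more mode 2 lines.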

ah, the VIC-II and the Commodore 64. I've never been a fan of the look of the C64 palette, but plenty of people have nostalgia for it. according to Wikipedia, it was designed to cherry-pick the best features of the ANTIC and the TMS9918.

it's got

the NEC µPD7220 was first released in the PC-9801 in Japan in 1982.

japanese writing had some unique requirements, so wow, they really powered it up. This thing had hardware acceleration for things like lines and rectangles and whatnot. So of course it got used for lots and lots of porn games.

It was such an impressive chip that, if I'm reading this right, DirectX 7's 2D API is based on its features, and checking "2D acceleration" switches this chip on. Maybe.

A lot of what we think of as a "PPU" and its various features are really just elaborate workarounds for the relative expense of memory, particularly in the case of the only PPU that is actually called a "PPU": the NES's graphics chip.

By contrast, a Mac 128K doesn't seem to have anything like this. Instead it just has enough memory to store each pixel as a bit, twice over (a main and an alternate screen buffer), and the memory is fast enough to simply read each pixel as the CRT raster scans. Everything else is done in software.
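The Mac's approach is easy to check with a bit of arithmetic: a 512x342 screen at 1 bit per pixel, two buffers, inside 128 KB of RAM. For contrast, one NES nametable (32x30 one-byte tile indices plus 64 bytes of attributes) is just 1 KB. The constants below are the machines' published specs; the Python only does the sums.

```python
# Macintosh 128K: 512x342 pixels, 1 bit per pixel, 128 KB of RAM.
MAC_WIDTH, MAC_HEIGHT, MAC_RAM = 512, 342, 128 * 1024
mac_frame_bytes = MAC_WIDTH * MAC_HEIGHT // 8   # one screen buffer
mac_screen_bytes = 2 * mac_frame_bytes          # main + alternate buffer
mac_ram_share = mac_screen_bytes / MAC_RAM

# NES: one nametable is 32x30 tile indices plus 64 attribute bytes.
nes_nametable_bytes = 32 * 30 + 64

print(f"Mac: {mac_frame_bytes} bytes per screen, "
      f"{mac_screen_bytes} for two ({mac_ram_share:.0%} of RAM)")
print(f"NES nametable: {nes_nametable_bytes} bytes")
```

So the Mac spends about a third of its RAM on screen memory, roughly 21 times what one NES background layer costs.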

the big "trick" of the NES and C64 PPUs comes from video terminals and so-called "character generator" circuits. These amber-glass typewriter emulators could get away with only about 2 KB of RAM (80x25 = 2000 characters), with each stored character code used as a lookup into character graphics held in a much cheaper ROM chip. Circuitry would step through the appropriate ROM addresses as the raster scanned, and a "video shift register" would clock out each ROM row as pixels. So you get to simulate a high-res display without needing a lot of expensive RAM.
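The character generator trick in Python form: screen RAM holds one character code per cell, the ROM holds 8 rows of bits per glyph, and for each scanline the hardware fetches `char_rom[code * 8 + row_in_glyph]` and shifts its bits out one pixel at a time. This is a simplified sketch assuming 8x8 glyphs, 1 bit per pixel and MSB-first shifting; real terminals varied in cell size and timing.

```python
GLYPH_HEIGHT = 8  # assumed 8x8 character cells


def scanline_pixels(screen_ram, columns, char_rom, scanline):
    """Return the 1bpp pixels for one raster line of a text screen."""
    text_row, glyph_row = divmod(scanline, GLYPH_HEIGHT)
    pixels = []
    for col in range(columns):
        code = screen_ram[text_row * columns + col]          # small RAM
        pattern = char_rom[code * GLYPH_HEIGHT + glyph_row]  # cheap ROM
        for bit in range(7, -1, -1):                         # shift register
            pixels.append((pattern >> bit) & 1)
    return pixels


# Usage: a 2-glyph ROM (glyph 0 blank, glyph 1 two vertical bars)
# and a 2-column screen showing glyph 1 then glyph 0.
char_rom = [0x00] * 8 + [0b10000001] * 8
screen_ram = [1, 0]
line = scanline_pixels(screen_ram, 2, char_rom, 3)
```

For an 80x25 screen that is 2,000 bytes of RAM, versus 16,000 bytes for a 640x200 1bpp bitmap showing the same text.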

The really clever thing the NES did in particular, which no other games console did, was include two rom chips on the standard cartridges, one of which would get wired directly to the video circuitry as the character generator rom. This meant the NES console itself could get away with a similarly minuscule amount of RAM, and get "arcade" level graphics by including different "fonts" with each game. As a bonus, the glitches you would get from a badly connected cartridge were RAD AS HELL

So the thing to note, though, is that the NEC µPD7220 and the PC-9801 were released YEARS before the NES and the Mac, and could do 16-color graphics on a 1024x1024-pixel framebuffer, with hardware-accelerated drawing, far superior to anything else at the time, for many years. But what makes the Mac and the NES clever is their economy. The Mac still had a luxurious amount of video RAM, but managed with just software. The NES used a clever trick to make it seem like it had more power than it did.

the common thread is constraints on the BOM, the cost of the device, directly informed by the cost of parts at the time of their production, not a nostalgic purity. and it's the creativity that came out of those constraints that hooks us, and gives these devices their character. We can copy it, but it wouldn't be authentic to our times. The authentic question for us would be: How would you build the cheapest possible games console now? what clever hacks could you use?

okay, here’s a weird idea.

what if a game engine were designed to generate an H.264 (or some other video) stream directly? lots of cheap hardware has video decoders built in. how efficiently could a program simply spit out I-frames?

here’s a thing i didn’t know about explicitly: blitter chips

Blitter - Wikipedia

these were specific bits of hardware for speeding up the copying of rectangles of various pixel formats with transparency into a framebuffer.

this is distinct from a PPU with hardware sprite support, as those graphics would be generated on the fly, as the CRT beam scanned, using clever memory pointer tricks.

a blitter is more of a brute force, huge chunk of memory based technique. GPU precursors.
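A blitter's core job, reduced to software: copy a rectangle of pixels into a framebuffer, clipping at the edges and skipping a transparent key colour. This sketch assumes one palette index per pixel and a single colour key; real blitters also did masks, raster operations and different pixel depths, which are left out here.

```python
def blit_keyed(src, src_w, src_h, dst, dst_w, dst_h, dst_x, dst_y, key):
    """Copy src (src_w x src_h) into dst at (dst_x, dst_y), skipping `key`."""
    for y in range(src_h):
        ty = dst_y + y
        if not 0 <= ty < dst_h:
            continue  # clip rows outside the destination
        for x in range(src_w):
            tx = dst_x + x
            if not 0 <= tx < dst_w:
                continue  # clip columns outside the destination
            pixel = src[y * src_w + x]
            if pixel != key:  # the key colour is treated as transparent
                dst[ty * dst_w + tx] = pixel


# Usage: a 2x2 sprite with transparent (0) corners, blitted partly
# off the right edge of a 3x3 framebuffer filled with colour 9.
framebuffer = [9] * 9
sprite = [0, 1, 2, 0]
blit_keyed(sprite, 2, 2, framebuffer, 3, 3, 2, 1, key=0)
```

Compare this with the PPU approach, where nothing is copied: the sprite's pixels are fetched on the fly as the beam passes.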

so the HD63484 came out around 1984, as a sort of riff on the NEC chip. it can in theory go up to 4096x4096 pixels at 1bpp, and was allegedly used in CAD and desktop publishing equipment. I can't find any examples of its output, but here's an implementation of it in MAME for some reason. What arcade games used it?

MAMEHub/hd63484.h at master · MisterTea/MAMEHub · GitHub

just learned 8bit guy is an open carry no masker transphobic racist dickhead so probably the actual reason he isn’t satisfied with PPU solutions that actually work in 2020 is he’s an idiot.

May 19, 2021

original thread

NEC created the world’s first GPU (not PPU, GPU) with full framebuffered drawing in 1981, because it was simply the only way to cope with japanese text. the original Macintosh, released in 1984, had the first framebuffered bitmapped display, in monochrome.

meanwhile japan is doing windows 95 level shit

an example screen capture from an early PC-98: a 16-color random circles screensaver, with text drawn to a separate framebuffer and layered over the top.

when i said the Mac had the first framebuffered bitmapped display, i didn’t mean it literally. i meant it in an Apple sort of way. when it came out, Apple were asked why they didn’t use NEC’s chip. the timing was simply off, and by the time it could be used, the Mac was too far along in its development.

the thing was called the µPD7220, and was used in the PC-98 line, and also in a line of PCs apparently popular in Australia called APCs.

according to the history, these then eventually got built into a line of expansion cards compatible with the IBM PC that were also not popular in the west. the µPD7220 was eventually supplanted by the Hitachi HD63484 in expansion cards. if you ever remember seeing a “2D acceleration” checkbox deep in Windows’ settings, it was apparently for these chips.

americans don’t care about Japanese text, so in the english-speaking world these were for CAD systems.

What is the difference between a GPU and a PPU? It’s a fine difference I hadn’t really thought about before deciding to make a list of PPUs, but in short, the difference is a framebuffer. A GPU provides accelerated drawing into a rectangle of memory. A PPU (or VDC) instead provides some clever address translation and lookup logic to access memory that’s already been written to.

so, for instance, tiled backgrounds and sprites are created in 1980s and 1990s consoles by looking up tile indexes and computing ROM address offsets line by line, after looking at some attribute memory. the PPU then looks up those address offsets in ROM to find pattern data, then tile attributes and palette colours, and finally generates the video signal. very convoluted and very tightly timed.

a gpu on the other hand has an equivalent part, which is just a simple pixel counter
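A rough sketch of the difference in Python: a PPU-style lookup computes each pixel from a tile map, pattern data and a palette at scanout time, while a framebuffer lookup is a single address calculation. This deliberately leaves out sprites, attribute tables, bitplanes and timing, and stores patterns as plain lists of colour indices for readability. It illustrates the idea; it is not how any particular chip lays out memory.

```python
TILE = 8  # tiles are 8x8 pixels


def ppu_pixel(x, y, tilemap, map_width, patterns, palette):
    """PPU-style: derive the pixel via tile map -> pattern -> palette."""
    tile_index = tilemap[(y // TILE) * map_width + (x // TILE)]
    colour_index = patterns[tile_index][y % TILE][x % TILE]
    return palette[colour_index]


def framebuffer_pixel(x, y, framebuffer, width):
    """GPU/framebuffer-style: the pixel is simply stored at an address."""
    return framebuffer[y * width + x]


# Usage: a 2x1 tile map (16x8 pixels) built from two 8x8 patterns.
blank_tile = [[0] * TILE for _ in range(TILE)]
solid_tile = [[1] * TILE for _ in range(TILE)]
patterns = [blank_tile, solid_tile]
tilemap = [0, 1]
palette = ["black", "white"]
left = ppu_pixel(3, 3, tilemap, 2, patterns, palette)
right = ppu_pixel(9, 5, tilemap, 2, patterns, palette)
```

The tile map here is 2 bytes describing 128 pixels; a framebuffer holding the same image needs one entry per pixel.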

Feb 26, 2021

Luci for Chai Tea: “here’s a thing i think about now and then- how wo…” - Merveilles

here’s a thing i think about now and then- how would you expand the ANSI graphics escape codes system to use all the special features of the NES ppu, including things like uploading new tile graphics, sprites, palettes, chr bank switching and raster effects?

i sort of get stuck at not being able to program well enough, but i imagine turning the PPU registers into a kind of serial protocol with timecodes. since it’s timing-sensitive, you’d have a kernel in front of it that buffers commands.
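One way the "serial protocol with timecodes" could work: commands arrive early, each tagged with the frame it should take effect on, and the kernel releases the due ones during vblank. A Python sketch of just that buffering, using the NES PPU register addresses $2001 (PPUMASK) and $2005 (PPUSCROLL) as example targets; the command format and class are hypothetical.

```python
import heapq

PPUMASK, PPUSCROLL = 0x2001, 0x2005  # NES PPU register addresses


class TimedCommandQueue:
    """Buffers (frame, register, value) commands until their frame arrives."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker keeps same-frame commands in arrival order

    def push(self, frame, register, value):
        heapq.heappush(self._heap, (frame, self._seq, register, value))
        self._seq += 1

    def due(self, frame):
        """Pop every command scheduled for this frame or earlier."""
        ready = []
        while self._heap and self._heap[0][0] <= frame:
            _, _, register, value = heapq.heappop(self._heap)
            ready.append((register, value))
        return ready


# Usage: commands arrive out of order; the kernel would call due() in vblank.
queue = TimedCommandQueue()
queue.push(2, PPUSCROLL, 10)
queue.push(1, PPUMASK, 0x1E)
queue.push(1, PPUSCROLL, 0)
```

Because late commands are applied on the next vblank rather than dropped, a slow link degrades into lag instead of corrupted frames.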

i guess where i could start is writing an NES rom that basically implements the protocol by using controller 2 as “network” and controller 1 as “user input”, and look up famicom keyboard protocol schemes to piggyback off of.

the real trick would be figuring out how to make decent effects at ridiculously low bandwidths like 2400 or even 300 baud
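Some back-of-envelope numbers for those baud rates, assuming 8N1 framing (10 bits on the wire per byte) and rounding the NES frame rate to 60 fps:

```python
FPS = 60                 # NTSC NES runs at ~60.1 fps; rounded here
WIRE_BITS_PER_BYTE = 10  # 8N1: start bit + 8 data bits + stop bit


def bytes_per_frame(baud):
    """Payload bytes that arrive during one video frame."""
    return baud / WIRE_BITS_PER_BYTE / FPS


def seconds_to_send(n_bytes, baud):
    """Time to push n_bytes of payload through the link."""
    return n_bytes * WIRE_BITS_PER_BYTE / baud


for baud in (300, 1200, 2400):
    print(f"{baud} baud: {bytes_per_frame(baud):.1f} bytes/frame")
```

At 2400 baud that is 4 bytes per frame, so a full 32-byte NES palette takes 8 frames and a 1 KB nametable about 4.3 seconds. At 300 baud you get half a byte per frame.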

then of course my brain scope creeps this out. how far could you take this to be sorta cross platform compatible- gameboy color, megadrive, etc. what are the common denominators that would be safe to put in a protocol, and ways to “progressively enhance” the signal if enhanced abilities are available.


maybe instead of using the NES as a dumb terminal you could have some statefulness and use this mysterious ANSI data stream to transfer small buffered programs instead of just immediate mode commands?

that would be a reasonable thing to try- perhaps a simple bytecode vm, something that can execute efficiently on both 6502 and z80, something like p-code.
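A minimal sketch of what such a bytecode VM could look like, in Python for readability: a tiny stack machine with 8-bit arithmetic, the kind of dispatch loop that could plausibly be hand-ported to 6502 or Z80. The opcode set and numbering are entirely made up for illustration.

```python
# Hypothetical opcodes, one byte each; PUSH and JNZ take one operand byte.
PUSH, ADD, SUB, DUP, JNZ, OUT, HALT = range(7)


def run(program, out):
    """Execute `program` (a list of bytes), appending OUT values to `out`."""
    stack, pc = [], 0
    while True:
        op = program[pc]
        pc += 1
        if op == PUSH:
            stack.append(program[pc])
            pc += 1
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append((a + b) & 0xFF)  # wrap like an 8-bit CPU
        elif op == SUB:
            b, a = stack.pop(), stack.pop()
            stack.append((a - b) & 0xFF)
        elif op == DUP:
            stack.append(stack[-1])
        elif op == JNZ:
            target = program[pc]
            pc += 1
            if stack.pop() != 0:
                pc = target
        elif op == OUT:
            out.append(stack.pop())
        elif op == HALT:
            return
        else:
            raise ValueError(f"bad opcode {op} at {pc - 1}")


# Usage: count down from 3. The loop body starts at address 2.
COUNTDOWN = [PUSH, 3,
             DUP, OUT,
             PUSH, 1, SUB,
             DUP, JNZ, 2,
             HALT]
emitted = []
run(COUNTDOWN, emitted)
```

The sample program emits 3, 2, 1 before halting. In the protocol idea, OUT would be replaced by opcodes that poke PPU registers or VRAM.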

it would come at the potential cost of making efficient screens difficult for non programming artists to make.

not that this is a particularly realistic concern

i don’t know what i think the use case is, i guess i imagine a famicom modeming into a unix mainframe, and the programs there being able to do all the NES things, but without needing to be actual NES programs, just by outputting decorations to their normal text output

support both? Default to your unbuffered output mode and reserve some character or sequence for specifying a buffered program block, and another for calling it and passing it arguments? Maybe this defeats the purpose

that would certainly enable some shortcuts and full games even.

i think there may have even been some ansi extensions that did similar things. i think an IBM version had a whole system that super resembles how html forms work, complete with client side validation.

SkyPix was a whole thing too, but apparently not very archival, since it’s virtually impossible to find examples of SkyPix content

it’s all super cool and ahead of its time, but also makes clear that if I don’t draw a clear line of what this idea is and isn’t, it’s an “oops, accidentally reinvented the web” sorta thing

there’s also this for prior art

i think i also started thinking about this years ago when i saw this ruby thing that had an nes and a normal terminal mode

adding this link to the thread for future me



Max Cahill

glad you’re still thinking about it.

Seeing the STM TAS tool makes me think about how something like this is 5 bucks and has a ~24x clock rate advantage, plus being 32-bit, vs the NES.

3.3V vs 5V though, so it would need some logic level translation for the I/O.

But basically, a hardware solution wouldn’t even be particularly expensive because even little microcontrollers outpace it by a laughable margin.

usb is 5v, if this ends up just being plugged into a USB port on a laptop or raspberry pi, you don’t really need that much to create a serial interface.

i appreciate the design on the first link, where it is pointed out that if you have an FTDI cable, which you probably do if you have any arduino, you can pretty much wire that up directly to an NES.

“FTDI provides the necessary TTL logic signals”

knowing that, we can then wander outside the nes hacker realm to prior work like this:

and gosh, a wifi/nes module becomes tantalisingly cheap- with a sorta straightforward software interface that just resembles normal serial comms.

of course, another way to go is to put something like the esp32 with embedded wifi directly on the nes cart with some circuitry to spoof the rom chips

fwiw this is what i meant here

you basically just need a level shifter (could be a couple of transistors) and firmware to go from your tcp connection into the nes, plus software on the nes to pull out that data; it doesn’t look like it actually supports rs232 or something that looks like it natively so you still have to bitbang the interface.

If you emulate the shift register based controller on the net interface/mcu you dont need much of a buffer on the nes.


host -> TCP -> ESP or raspi or whatever (MCU) -> GPIO -> level shift -> NES controller port -> NES software

the NES reads the controller bits via shift register operations: a write to $4016 strobes (latches) the controllers, then each read of $4016 (controller 1) or $4017 (controller 2) shifts out one bit. you can just feed one bit of “real” input from the MCU each time this happens.
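A sketch of the MCU side, simulated in Python: the device looks like a controller shift register to the NES, but instead of button states it latches the next byte from the network whenever the strobe is released, and hands out one bit per read of $4017 (controller 2). The class and its byte-per-strobe protocol are hypothetical. It sends the MSB first, returns 1 after 8 reads as official pads do, and doesn't model reads made while the strobe is still high.

```python
class ShiftRegisterController:
    """Hypothetical MCU firmware, simulated: a 'controller' fed by network bytes."""

    def __init__(self, incoming):
        self._incoming = iter(incoming)  # bytes arriving over TCP, say
        self._strobe = False
        self._latched = 0
        self._bits_left = 0

    def write_strobe(self, value):
        """CPU wrote `value` to $4016; bit 0 is the strobe line."""
        high = bool(value & 1)
        if self._strobe and not high:
            # Strobe released: latch the next network byte (0 if none).
            self._latched = next(self._incoming, 0)
            self._bits_left = 8
        self._strobe = high

    def read(self):
        """CPU read $4017: shift out one bit, MSB first."""
        if self._bits_left == 0:
            return 1  # official pads report 1 after all 8 bits are read
        self._bits_left -= 1
        return (self._latched >> self._bits_left) & 1


def nes_read_byte(pad):
    """What the NES ROM's strobe-and-read loop would do, written in Python."""
    pad.write_strobe(1)
    pad.write_strobe(0)
    value = 0
    for _ in range(8):
        value = (value << 1) | pad.read()
    return value


# Usage: the MCU has "Hi" queued from the network.
pad = ShiftRegisterController(b"Hi")
received = bytes([nes_read_byte(pad), nes_read_byte(pad)])
```

One strobe plus 8 reads per byte means the NES side needs no UART timing at all; it just polls when it has spare cycles.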

then you have something that can get data from anything that can speak tcp (lua, python, c, shell script) into the nes, and start building your nes rom to act on that.

if i got it working, it could be great for setting up realtime visualisations for concerts and whatnot. all the visualisations could be written in any language and run on a real NES.

what would be nuts is if there was a standard game cart with a bit of ram out there that could be buffer overflowed into accepting a bootstrap code, then the thing could even run without an expensive everdrive thing

or any rom with a buffer overflow exploit really, if the bootstrap can fit into the tiny amount of onboard ram

forgive me but I don’t think (if you’re making a network bridge card) a cart is very far out of reach if you don’t want to go the everdrive route, haha.

Afaik you can get bare boards to solder chips on to because all the hardware is really well understood at this point. Other options include a donor cart and swapping out the prog rom for eeprom and a socket.

there’s all kinds of options. Which is the most practical is a little fuzzy to me. Especially when we’re talking about essentially grafting the 1980’s equivalent of a super computer onto a kid’s toy, trying to define what “practical” even means in this context is fuzzy.

i guess my persona is someone has an NES and wants to do internet things with it, but doesn’t want to do a hardware mod on it. Having a flash cart + a controller interface feels cleanest for me, the developer. But including it all in the cartridge is cleaner for the random goofer with an unmodified NES.

but there is something kind of appealing about the idea of having the NES connected to the internet via the controller port, and it getting pwned in seconds via a buffer overflow

that is another thing I was idly thinking about the other day. There’s a youtube video that walks through the process of assembling one of those bare boards and populating it with off-the-shelf components, except for the CPU and PPU, which are both ASICs specific to the NES. I was trying to work out how possible it is to make them out of discrete components. No one sane would do it, but I wonder if stock 6502s could be made to work with some coprocessor stuff like

I think you’d have more luck reimplementing it in an fpga. Afaik the ppu is totally custom running in parallel and generates the display signal etc internally too.

from what i can tell there is nothing mysterious about the ppu pinout. as you’ve pointed out, your standard microcontroller is more than fast enough to simply emulate the ppu. however, the pitfall is games that depend on exact timing, and things you naively don’t expect games to try: such as cpu reading from the chr rom/ram via the ppu registers.

so most HDMI/RGB mods actually just piggyback on the PPU and tap its signals

USB being 5V only means you have a 5V supply available; many modern MCUs have 3.3V GPIO, which may or may not be 5V tolerant. FTDI chips can often do both depending on supply voltage, because they're interface chips, and Arduinos are 5V because they're an old 8-bit arch. The Raspberry Pi is 3.3V too.

I imagine you could short circuit a huge amount of this with the lua scripting in fceux; have it listen on your pipe or whatever and turn your text api into modifications to the nes memory.

Might be a lot faster to get working locally and make some nes spec demos without sweating about controller bandwidth and timing issues and all of that “fun”.

not a bad idea, i actually had in mind the similar idea of piggybacking on jsnes, since i am way more familiar with javascript

whichever works for you, haha. FCEUX is famously “correct” but it sounds like it’ll just be glue code between the network and the ppu either way and might give you a quick idea of what’s possible.

i can never seem to find a mac version

(linux looking back to work reaction gif)

Yeah I would have an embedded device in the network controller going from (your net protocol) to utf8 bytes on controller 2; the nes can read the whole word out directly and then doesn’t need to do any waveform stuff, just decode text. You’d be limited by the nes cpu speed decoding the cmds more than anything else but would get much more effective bw than 1200 baud for sure as you’d get a whole byte in 8 ctrl reads.

you could do this cheaply these days with a raspberry pi and completely emulate the shift register controller interface with interrupts or similar (the pi would be fast enough to outpace the nes) or have it stream into a larger shift register so the nes can work async.

Back in the day it’d need a lot more engineering work and networking in general wouldn’t have been as “solved”

could be interesting to “closed loop” this for a 2021 audience, with a site streaming the video output of the NES back to observers who can then send new commands 😉 twitch plays nes game programmer, but with a pico8 demoscene flair

if that hardware and asm dev is part of the fun for you, then I’d probably build a little embedded network controller which shifted in the next byte from the network into the shift register on controller strobe and see what bandwidth I’m working with, have that accept TCP connections and handle the buffering, and have the nes rom handle the vblank timing aspect. Would require special instructions for timing effects but would also be simple and move a lot of the “hard stuff” out of the nes.

yeah, i understand that’s more or less what a “modem” originally was, and why they were so expensive. latter-day modems were “softmodems” that did a lot more in software, but the first ones, i think, did more or less exactly this, but with the DB-9 serial port.

as for the famicom, there was a tape drive accessory, a datasette sort of thing like the C64 had, that worked via an audio jack on the keyboard accessory, which in turn plugged into the expansion port.

the bandwidth in that very very unoptimised case was 1200 baud (1200 bits per second)

in that case, it is actually the 6502 doing the work in software of not just reading bits off the controller port, but detecting whether they’re a 600hz or 1200hz square wave
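The core of that software demodulation is timing the gaps between level changes: a 1200 Hz square wave flips twice as often as a 600 Hz one. This Python sketch classifies each half-period as the low or high tone using a midpoint threshold. The sampled input and the function are hypothetical, and it doesn't try to reproduce the actual Famicom Data Recorder bit framing.

```python
def classify_half_periods(samples, sample_rate, low_hz=600, high_hz=1200):
    """Label the time between each pair of level changes as 'low' or 'high'.

    `samples` is a list of 0/1 levels polled at `sample_rate` Hz, which is
    roughly what a 6502 polling loop sees on the input bit.
    """
    low_half = sample_rate / (2 * low_hz)    # half-period of the low tone
    high_half = sample_rate / (2 * high_hz)  # half-period of the high tone
    threshold = (low_half + high_half) / 2
    labels = []
    run = 1
    for prev, cur in zip(samples, samples[1:]):
        if cur == prev:
            run += 1
        else:
            labels.append("low" if run > threshold else "high")
            run = 1
    return labels


# Usage at 9600 samples/s: two short half-periods, then two long ones.
signal = [1] * 4 + [0] * 4 + [1] * 8 + [0] * 8 + [1]
tones = classify_half_periods(signal, 9600)
```

On the real hardware the "sample rate" is however fast the polling loop runs, which is where the CPU time goes.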

i imagine one could get slightly faster speeds without the overhead of needing to deal with nyquist in software


I have a bit of experience with ansi/Xterm type stuff, including bitmap images (sixel et al). If you want some interesting projects in that space I can point you to the better ones.

(autism voice “ok so…” 😺)

BBSes had extensions to ansi.sys for graphics and simple tunes. SyncTERM is a good reference, there is a doc about the codes it supports.

Xterm and similar terminals now support enough to do per-pixel mouse input and graphics. Notcurses is a good project in that space. For a narrative of what’s possible:

The recent state of terminal bitmap support:

For writing a terminal:

The image protocol in kitty is terrible. If you want to make your own, start with something like iTerm2’s image protocol instead.

Sixel isn’t “great”, but it IS palettized, and that could make it very useful for limited hardware. Super easy to decode, really hard to encode, though.
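To show why sixel is easy for limited hardware to decode: after the `ESC P q` introducer, `#n;2;r;g;b` defines palette entries (RGB in percent), and each following character from `?` upwards encodes a 6-pixel-tall column as `63 + bits`, with bit 0 at the top. `$` returns to the start of the current band so another colour can be drawn over it, `-` moves down to the next band, and `ESC \` ends the image. A minimal, unoptimised encoder for an already-palettized image, skipping the optional `!count` run-length compression:

```python
def encode_sixel(pixels, palette):
    """Encode rows of palette indices as a sixel string.

    pixels: list of equal-length rows of palette indices.
    palette: list of (r, g, b) tuples with components 0..255.
    """
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    parts = ["\x1bPq"]  # DCS introducer; 'q' selects sixel graphics
    for index, (r, g, b) in enumerate(palette):
        # Colour register definition; sixel RGB components are 0..100.
        parts.append(f"#{index};2;{r * 100 // 255};{g * 100 // 255};{b * 100 // 255}")
    for top in range(0, height, 6):
        band = pixels[top:top + 6]
        colours = sorted({c for row in band for c in row})
        for n, colour in enumerate(colours):
            parts.append(f"#{colour}")  # select colour register
            for x in range(width):
                bits = 0
                for dy, row in enumerate(band):
                    if row[x] == colour:
                        bits |= 1 << dy  # bit 0 is the top pixel of the band
                parts.append(chr(63 + bits))
            if n < len(colours) - 1:
                parts.append("$")  # carriage return: overprint same band
        if top + 6 < height:
            parts.append("-")  # new line: move down to the next band
    parts.append("\x1b\\")  # ST, string terminator
    return "".join(parts)


# Usage: a 2x6 solid red block.
red_block = encode_sixel([[0, 0]] * 6, [(255, 0, 0)])
```

The decoder side only needs a palette table and a loop that turns each character back into 6 vertical pixels, which is why it suits small machines.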

Zellij did something very interesting: they wrote Rust code to decode, crop, and re-encode sixel without changing the palette. I hadn’t thought of that before, but it’s really cool.

Both chafa and notcurses have insanely fast encoders in C.

Chafa’s PCA-based one can use any palette, which could be really useful.

Mar 17 2021

something i read yesterday that sticks out at me: the hardware architecture of the ZX Spectrum favored virtual machines/bytecode machines.

in that, you had a very tiny amount of executable rom space, but a much larger amount of (non executable?) ram, into which you could expand the rest of your game.

it made me think about how early microcomputers and game consoles all had these different tradeoffs that gave them their character. it reminded me of the research i did for the PPU thread

i am sortof obsessed with PPUs as they seem to be almost game engines in themselves, they were barely working hacks to produce a rich computer display whilst sticking within harsh memory and parts cost constraints. in trying to reimagine a ppu for hobby nostalgia projects today it always comes out feeling lopsided since the average commodity “tv” has vastly more computing power than a 6502. there’s a truer authenticity in finding what can be built with the cheapest possible parts available today

there certainly are some very cheap low power microcontrollers. we have consoles now like the gameduino that get right down to that nostalgia feel, but with parts that actually have economies of scale behind them today.

a little trickier is designing around harvested parts. a cool fantasy is harvesting z80s or 6502s, but you are far more likely to find some epoxy glob, or some customised arm chip, if you harvest say, discarded mobile phones.

(i really am not any good at electronics, but i reeeally don’t know how to deal with miniaturised electronics)

and so, is it possible to carve out an aesthetic or niche from this extremely vaguely stated set of circumstances?

the parts that are easily obtainable are arduino atmega things and arm chips. the displays are little lcd panels from mobile phones, and other random electronics. can we raise the harvesting bar for these things? as it stands they are very difficult to work with.