On memory usage

96 days ago by Team NXT

As I’ve previously discussed, Apus runs on NRender, my custom multimedia middleware. NRender has been designed first and foremost for the PlayStation 1. Its features and limitations still very much guide the development of NRender and games built using it.

One of those limitations is the available memory: The PlayStation provides 2 MiB of system RAM (of which 64 KiB are used by the operating system), 1 MiB of video RAM (which also holds both framebuffers - so quite a lot is always taken) and 512 KiB of sound RAM.

All in all, we have 3.5 MiB of physical memory, of which we can use about 3.2 MiB. Of course, we can’t use it freely: Code, variables and the like need to fit into the 2 MiB of system RAM, while textures need to fit into video memory and so on.

Apus has been designed to never exceed those limits. In fact, it is not even close to filling up memory. System RAM usage seldom exceeds the 1 MiB mark (and if it does, it’s just for caches and the like). Apus tries to avoid using textures, so video RAM doesn’t get used much at all. Sound effects are currently stored a bit inefficiently in sound RAM, though there aren’t enough effects for that to be noticeable.

Things are very different on PC

So, Apus should run fine on a PC with about 4 MiB of RAM, right? Wrong. It won’t even reach the title screen.

Why is that? There are many reasons…

1. Everything has to fit into main memory, including textures and sound effects
2. The executable is much larger: About 750 KiB on DOS instead of about 250 KiB on PSX
3. We throw away pretty much a whole megabyte for the operating system instead of just 64 KiB
4. Memory isn’t always used as efficiently as possible. For example, DJGPP allocates a whopping 512 KiB of RAM just for the stack
5. Drivers and TSRs bite into our memory budget even beyond the first megabyte (like SMARTDRV does)

We can’t really change #1 (aside from demanding more capable hardware that has dedicated memory). #2, while worth trying, isn’t easy either: We’re carrying a lot of compatibility code and drivers to make things work. #3 can be alleviated a bit by allowing the DPMI host to allocate conventional memory for regular usage, but this complicates things a lot. The DPMI host could easily allocate all of conventional memory, yet we need some of it for interfacing with the operating system…

#4 is actually easy and pretty worthwhile for a change. We can safely reduce the stack to 32 KiB, freeing 480 KiB of memory. Pretty nice! Finally, #5 is pretty much out of our hands. If the user loaded stuff into extended memory, we shouldn’t really interfere with that.
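
To put rough numbers on this, here is a back-of-the-envelope budget for a 4 MiB PC, using only the approximate figures quoted above. The helper name and the simple "subtract everything" model are assumptions made for illustration; drivers and TSRs (#5) are left out because they differ from machine to machine.

```c
#include <stdio.h>

/* All sizes in KiB. These are the approximate figures from this post. */
enum {
    TOTAL_RAM_KIB     = 4 * 1024, /* a 4 MiB PC                          */
    OS_RESERVED_KIB   = 1024,     /* "pretty much a whole megabyte"      */
    EXECUTABLE_KIB    = 750,      /* size of the DOS executable          */
    STACK_DEFAULT_KIB = 512,      /* stack DJGPP allocates by default    */
    STACK_REDUCED_KIB = 32        /* stack after the reduction           */
};

/* Hypothetical helper: what is left for heap, textures and sounds
 * once the fixed costs are paid, for a given stack size. */
int free_after_fixed_costs_kib(int stack_kib)
{
    return TOTAL_RAM_KIB - OS_RESERVED_KIB - EXECUTABLE_KIB - stack_kib;
}

/* Print both variants side by side. */
void print_budget(void)
{
    int before = free_after_fixed_costs_kib(STACK_DEFAULT_KIB);
    int after  = free_after_fixed_costs_kib(STACK_REDUCED_KIB);

    printf("512 KiB stack: %d KiB left\n", before);
    printf(" 32 KiB stack: %d KiB left\n", after);
    printf("gained:        %d KiB\n", after - before);
}
```

Under these assumptions, about 1810 KiB remain with the default stack and 2290 KiB with the reduced one. That is the whole budget for everything the game loads, before any driver or TSR takes its share.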

Making more with more

If we can’t easily reduce memory usage by a lot, let’s try taking another route: Accept our memory usage and make things go regardless.

How could that work? Well, some environments and operating systems already make things work: Running the Apus DOS version under CWSDPMI (instead of the shipped HDPMI32) or under Windows, for example, will make the game run - even with just 4 Megs of RAM!

DPMI clients mostly work with memory via the x86 segment mechanism. DPMI hosts are, however, free to also provide virtual memory, utilizing the memory management unit of 80386+ CPUs. CWSDPMI, Windows and HDPMI32 all do this.
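
For readers unfamiliar with paging on the 80386: with 4 KiB pages, the MMU splits a 32-bit linear address into a 10-bit page directory index, a 10-bit page table index and a 12-bit offset into the page. The sketch below only illustrates that split. The struct and function names are made up for this example and are not part of DPMI or of any DPMI host.

```c
#include <stdint.h>

#define PAGE_SHIFT 12u                 /* 4 KiB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)

/* Hypothetical type for illustration: the three parts of a linear address. */
struct linear_parts {
    uint32_t dir_index;   /* bits 31..22: entry in the page directory */
    uint32_t table_index; /* bits 21..12: entry in the page table     */
    uint32_t offset;      /* bits 11..0:  byte within the 4 KiB page  */
};

/* Split a 32-bit linear address the way 80386 paging does. */
struct linear_parts split_linear(uint32_t linear)
{
    struct linear_parts p;

    p.dir_index   = linear >> 22;
    p.table_index = (linear >> PAGE_SHIFT) & 0x3FFu;
    p.offset      = linear & (PAGE_SIZE - 1u);
    return p;
}
```

Because each page has its own page table entry, a host can mark individual pages as not present. That is what makes per-page swapping possible in the first place.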

DPMI hosts are further free to transparently page individual parts of memory out to disk and back in. This is what you might also know as swapping on Unix-like operating systems. Windows and CWSDPMI both do this, but HDPMI32 doesn’t.

Solution #1: Change the DPMI host

DPMI is a specification for using protected mode (16 or 32 bit) as well as extended memory on DOS systems running in real (or V86…) mode. Using DPMI involves targeting the DPMI API in your application (the client), which makes it issue calls to the DPMI implementation (the host or server). While some operating systems implement a DPMI host on their own, like Windows or Novell DOS, most DOS installations don’t. This is why many games and high-performance applications ship their own, like DOS4/GW or CWSDPMI.

Many widespread DPMI hosts implement swapping. The one I chose as the default host for all NRender titles, HDPMI32, does not.

The single reason I’ve chosen HDPMI32 is that its source builds on modern systems using modern-ish tools. There are many other DPMI hosts with available source code, like CWSDPMI, DPMIONE or DOS32. But they typically require Borland’s Turbo Assembler or other proprietary tools.

I wasn’t really in the mood to fix up CWSDPMI or the like to make it build with modern assemblers (though that would be interesting!), so I’ve taken yet another route…

Solution #2: Implement swapping myself

Working through various pages of documentation and stuff in search of a solution, one paragraph in HDPMI32’s README intrigued me:

HDPMI itself doesn't create a swap file. But support for "exception
restartability" has been implemented in version 3.03. This allows a
client to catch page faults occuring inside the host, which makes it
possible to support swapfiles (or memory-mapped files) on the client
level.

This means I could, in theory, implement swapping myself! And that’s precisely what I did.

I knew very well beforehand that this would not be easy. Taking an application’s memory away is simple - but keeping up the lie and sticking it back in place when it’s needed is hard. Getting things just right, especially when it comes to exceptions and interrupts, is no simple feat. To make things even more challenging, I would also have to manipulate DJGPP’s libc in just the right ways to make it play along.

But: I finally got things to work. Things culminated in a library, DPMISWAP, that gets linked into the application. It provides a pool of fake memory 64 MiB in size, which can be freely used by the application as if it were real memory. DPMISWAP will take care, in conjunction with HDPMI32, of making sure pages are swapped out of and back into memory at appropriate times.
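
To make the idea more tangible, here is a small portable simulation of such a pool. It is not DPMISWAP's code and uses no DPMI calls. The big simplification is that the application asks for a page explicitly through a hypothetical pool_touch() instead of triggering a real page fault, and a tmpfile() stands in for the swap file. All names and sizes are assumptions chosen for the example.

```c
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE  4096
#define POOL_PAGES 16   /* size of the fake pool, in pages       */
#define FRAMES     4    /* pages of "real" memory we can afford  */

struct frame {
    int page;                       /* fake page held, -1 if empty */
    unsigned char data[PAGE_SIZE];
};

struct pool {
    FILE *swap;                     /* stand-in for the swap file        */
    struct frame frames[FRAMES];
    int frame_of[POOL_PAGES];       /* frame holding a page, -1 if none  */
    int on_disk[POOL_PAGES];        /* 1 once a page was written to swap */
    int next_frame;                 /* simplest policy: reuse in turn    */
    unsigned long faults;           /* simulated page faults so far      */
};

int pool_init(struct pool *p)
{
    int i;

    p->swap = tmpfile();
    if (p->swap == NULL)
        return -1;
    for (i = 0; i < FRAMES; i++)
        p->frames[i].page = -1;
    for (i = 0; i < POOL_PAGES; i++) {
        p->frame_of[i] = -1;
        p->on_disk[i] = 0;
    }
    p->next_frame = 0;
    p->faults = 0;
    return 0;
}

/* Make a fake page resident and return its contents, or NULL on error.
 * The pointer is only valid until the next call to pool_touch(). */
unsigned char *pool_touch(struct pool *p, int page)
{
    struct frame *f;

    if (page < 0 || page >= POOL_PAGES)
        return NULL;
    if (p->frame_of[page] >= 0)             /* already in memory */
        return p->frames[p->frame_of[page]].data;

    p->faults++;                            /* simulated page fault */
    f = &p->frames[p->next_frame];

    if (f->page >= 0) {                     /* swap the old page out */
        if (fseek(p->swap, (long)f->page * PAGE_SIZE, SEEK_SET) != 0)
            return NULL;
        if (fwrite(f->data, 1, PAGE_SIZE, p->swap) != PAGE_SIZE)
            return NULL;
        p->on_disk[f->page] = 1;
        p->frame_of[f->page] = -1;
        f->page = -1;
    }

    if (p->on_disk[page]) {                 /* swap the wanted page in */
        if (fseek(p->swap, (long)page * PAGE_SIZE, SEEK_SET) != 0)
            return NULL;
        if (fread(f->data, 1, PAGE_SIZE, p->swap) != PAGE_SIZE)
            return NULL;
    } else {
        memset(f->data, 0, PAGE_SIZE);      /* first use: zeroed page */
    }

    f->page = page;
    p->frame_of[page] = p->next_frame;
    p->next_frame = (p->next_frame + 1) % FRAMES;
    return f->data;
}

void pool_close(struct pool *p)
{
    fclose(p->swap);
}
```

In this simulation, the application can write to all 16 pages and read everything back intact although only 4 pages ever exist in memory at once. The hard part described above is exactly what the simulation leaves out: doing the same thing from inside a page fault handler, without the application noticing.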

And yes, this totally makes Apus run on 4 MiB systems!

Possible refinements

DPMISWAP isn’t as capable as a “real”, swapping operating system. Most importantly, it can only swap pages in that 64 MiB fake pool. It cannot swap out pages that belong to the application itself. So the bigger an application’s executable gets, the less space there is to keep pages in memory.

As it is implemented currently, DPMISWAP also pretty much caps the total usable memory at those 64 MiB (plus the bit DJGPP allocated beforehand). There are provisions to allocate more, but things will fall apart when exiting the application.

DPMISWAP could also swap out pages more intelligently. Currently, the algorithm mostly boils down to a round-robin over the fake address space. Seems good enough for now, though.
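
As an illustration of what a round-robin over the fake address space can look like, here is a minimal sketch. It is not taken from DPMISWAP: the names, the tiny pool size and the bookkeeping are assumptions for the example. A "hand" walks the fake pages in order and picks the next one that is currently in memory, no matter how recently it was used.

```c
#include <stddef.h>

#define RR_PAGES 16   /* fake pool size in pages, tiny for illustration */

/* Hypothetical bookkeeping for the sketch. */
struct rr_state {
    unsigned char resident[RR_PAGES]; /* 1 if the fake page is in memory */
    size_t hand;                      /* next fake page to look at       */
};

/* Pick the next resident page at or after the hand, wrapping around.
 * Marks it as swapped out and returns its index, or -1 if no page
 * is resident at all. */
int rr_pick_victim(struct rr_state *s)
{
    size_t i;

    for (i = 0; i < RR_PAGES; i++) {
        size_t page = (s->hand + i) % RR_PAGES;

        if (s->resident[page]) {
            s->resident[page] = 0;
            s->hand = (page + 1) % RR_PAGES;
            return (int)page;
        }
    }
    return -1;
}
```

The appeal of this scheme is that it needs no usage statistics at all. The price is that it may throw out a page that is needed again right away, which is where a smarter policy could help.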

Where we are now

Apus 1.1.1 will ship with DPMISWAP built in and enabled (when possible and feasible). You don’t need to take any extra steps: DPMISWAP will just work when it can. Do make sure to have enough free space on your hard disk, though, as DPMISWAP needs to store pages somewhere.

Apus 1.1.1 will run on most 4 MiB DOS systems, down from the previous 8 MiB. You can also get it to work on systems with even less memory. Disabling SMARTDRV, a huge memory hog, will get it to work on 3 MiB systems. You might also be able to get it to work on 2 MiB? Tell me if you do!