How to write an Amstrad CPC emulator
0. About this text
This text is intended to shed some light on the internals of Amstrad CPC
emulators, giving you an idea about how they work. It focuses mostly on CPE,
which I know best, since I wrote it. (This is the missing section "Technical
information" from the CPE .doc file...) But you will also find some information
about CPCEMU that I have learned from exchanging letters with its author,
Marco Vieth. There are currently (supposedly?) two other CPC emulators
available, and more are being developed (quite a lot, actually!), but I have
less information about these.
I will not describe the CPC specs here in detail, since there are other,
better documents available to do this. (See the CPC Guide at this WEB site).
But I will describe the basic functionality of the hardware.
Occasionally, this text will also deal with other systems than the CPC. For
writing this, I only have as a guideline what seems interesting to me. I
hope you will find it interesting, too.
This document can probably be enhanced. If you have an idea how, please mail me
at <crux@pool.informatik.rwth-aachen.de>.
This text may contain speeling and grammatical mistakes; English is my
second language. Please, either ignore them or mail me and they'll be
corrected.
1. Overview
What needs to be done to emulate a computer system on a completely different
one (which I will call the host system from here)?
The answer is rather simple: Just pretend to the software you intend to run
that all the hardware of the system to be emulated is present and works as
it should. "All the hardware" includes
If these are behaving like on the original system, most software ought to
run. If you like, you can add less important features like sound support. When
the hardware emulation is complete, you may want to enhance the program, so
that the emulator becomes even better than the emulated system. (As a matter of
fact, this has never happened. All emulators are in some respect inferior to
"the real thing". Some important things, especially graphics, are really hard
to emulate correctly.) What might be added at a later stage includes:
The following chapters will describe possibilities for emulating the hardware
mentioned above. I assume throughout that assembly language is used. This is
necessary on current systems to achieve maximum speed. For the future, it
would be interesting if someone wrote a portable CPC emulator in C that runs
in an X Window on Un*x (Linux?). There is an X Window C64 emulator called
X64, which runs reasonably well, although it can't be used for most games
currently. Also, I assume standard PC hardware, because CPE is written for
PCs running DOS.
2. The memory system
Why start with the memory system? Because it is fundamental in a certain way.
The memory layout can strongly affect other parts of the emulation as well,
for example the CPU and the video emulation. For emulating computers that
have memory-mapped I/O (the peripheral chips respond to memory accesses in
certain restricted areas) like the C64 or the Amiga, you would have to think
about how to distinguish different areas of memory on each access.
Basically, a CPC can access 64K RAM. This is a restriction imposed by the
Z80 CPU, which has a 16 bit address bus. Unfortunately (for CPC emulator
writers) this restriction has been overcome to a certain degree by special
hardware (i.e. the Gate Array). On a Spectrum, which has a Z80 as well, only
64K can in fact be addressed (16K ROM, 48K RAM) which makes that part of the
emulation fairly easy. This is a good reason why Spectrum emulators ought to
be faster than CPC emulators, even though the emulated CPU is the
same.
The 64K available in the CPC are split in 4 banks with 16K each. On every
CPC, the upper and the lower bank can be mapped to contain ROM instead of
RAM (except on writes, which are ALWAYS directed to the RAM). Thus, a CPC
has 64K RAM and 32K ROM that can be made visible to the processor. On a CPC
6128, the situation is made even more complicated by RAM-banking. There are 8
RAM banks of which only four are visible at any time to the processor. These
can be exchanged, so that an invisible page becomes visible within the 64K
address range and another previously visible page becomes hidden. To make
matters worse, the video chip ALWAYS accesses memory in the first four
banks, even though they may be invisible to the processor.
ROM banking has to be done in any CPC emulator. If you don't need to emulate
a 6128, you may forget about RAM banking for the moment. You can always
implement it in a kludgy way by copying loads of memory whenever pages are
exchanged, although this is highly inefficient.
In CPE, I first left out RAM banking and only implemented ROM banking. A 64K
sized area is reserved for RAM. (Luckily, an Intel CPU can access at least
THAT much of memory in one block, or I would have shot myself at this
point.) On a write access, data is stored there at the appropriate location.
Nothing could be easier. On a read access, the upper two bits of the address
are masked out and taken as an index to a table of segments. This table
contains four entries, one for each 16K bank of memory. The data is then read
from the appropriate offset in this segment.
With this, switching a ROM bank only involves rewriting this four-word
array and is therefore rather efficient. There is a penalty, though, for
each read access, because the emulator must first look up the correct segment.
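To make the table lookup concrete, here is a minimal sketch in C. CPE itself does this in assembly with a table of segments, so the pointers and all names below are made up for illustration only.

```c
#include <stdint.h>

/* Hypothetical names; CPE uses x86 segments, not C pointers. */
static uint8_t ram[0x10000];        /* 64K RAM, target of every write */
static uint8_t lower_rom[0x4000];   /* 16K lower ROM */
static uint8_t upper_rom[0x4000];   /* 16K upper ROM */

/* One entry per 16K bank: where reads from that bank come from. */
static const uint8_t *read_bank[4] = {
    ram, ram + 0x4000, ram + 0x8000, ram + 0xC000
};

/* Writes always go to RAM, whatever is mapped for reading. */
void mem_write(uint16_t addr, uint8_t value)
{
    ram[addr] = value;
}

/* Reads use the upper two address bits as an index into the table. */
uint8_t mem_read(uint16_t addr)
{
    return read_bank[addr >> 14][addr & 0x3FFF];
}

/* Switching ROMs only rewrites table entries; no memory is copied. */
void set_rom_state(int lower_enabled, int upper_enabled)
{
    read_bank[0] = lower_enabled ? lower_rom : ram;
    read_bank[3] = upper_enabled ? upper_rom : ram + 0xC000;
}
```

The extra indexing in mem_read is the per-read penalty mentioned above.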
The reverse applies to CPCEMU. Here, the whole system memory is stored in a
contiguous 96K area. For reading and writing, two segment registers are set
aside. These usually have different values, because different memory areas
may be visible for read and write accesses. Also, two 16 bit offsets are kept
that are added to any address before the memory access occurs. Take, for
example, the following diagram:
                  type   base address
highest address   ROM    0x0000         lower ROM
                  RAM    0xC000         RAM page 3
                  RAM    0x8000         RAM page 2
                  RAM    0x4000         RAM page 1
                  RAM    0x0000         RAM page 0   <- write segment
lowest address    ROM    0xC000         upper ROM    <- read segment
Write offset: 0x0000, read offset: 0x4000
All writes are directed to the central block containing the RAM banks. When
the CPC is trying to read, say, from address 0xC000, first the read offset
is added to the address, giving a result of 0x0000. This means that the byte
at the address 0x0000 in the read segment is read. Look it up in the diagram
and you will see that this is the beginning of the upper ROM. The described
memory map therefore corresponds to the state "lower ROM disabled, upper ROM
enabled": The CPU can still read the lower 48K RAM, but when reading from
0xC000-0xFFFF, it accesses ROM.
If the banking were switched to the state "only RAM enabled", write and read
segments/offsets would be set to the same values. All this requires very
little overhead.
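The offset scheme can be imitated in C as follows. This is only my reading of the description above, not CPCEMU's actual code: the two pointers stand in for the segment registers, and every name is invented.

```c
#include <stdint.h>

/* 96K laid out as in the diagram, lowest address first:
   upper ROM, RAM pages 0..3, lower ROM. */
uint8_t system_mem[6 * 0x4000];

/* Stand-ins for the two segment registers and the two 16 bit offsets. */
uint8_t *read_seg, *write_seg;
uint16_t read_off, write_off;

/* State "lower ROM disabled, upper ROM enabled" from the diagram. */
void map_upper_rom_only(void)
{
    read_seg  = system_mem;            /* starts at the upper ROM */
    write_seg = system_mem + 0x4000;   /* starts at RAM page 0 */
    read_off  = 0x4000;
    write_off = 0x0000;
}

/* State "only RAM enabled": both views are identical. */
void map_ram_only(void)
{
    read_seg = write_seg = system_mem + 0x4000;
    read_off = write_off = 0x0000;
}

/* The uint16_t cast makes the sum wrap at 64K, as a 16 bit add would. */
uint8_t mem_read(uint16_t addr)
{
    return read_seg[(uint16_t)(addr + read_off)];
}

void mem_write(uint16_t addr, uint8_t value)
{
    write_seg[(uint16_t)(addr + write_off)] = value;
}
```

A read from 0xC000 wraps around to offset 0x0000 of the read segment and lands in the upper ROM, while a write to 0xC000 lands in RAM page 3.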
Unfortunately, it is sometimes necessary to exchange two 16K pages in the
96K area. If you look at the above diagram, you will notice that you can't
achieve that both ROMs are active at the same time. You will have to
exchange the lowest RAM page with the lower ROM page to do this. In old
versions of CPCEMU, this was a problem, because memory had to be physically
copied, and the BASIC emulation became quite slow. Now, the 96K area can be
stored in EMS memory, and the ability of EMM386 to modify the 386's RAM
mappings quickly is used. Thus, RAM access needs hardly any overhead in
CPCEMU, but bank switches are slower than in CPE (calling EMS functions
takes some time).
The last two updates to CPCEMU include a version that uses a different method
for banking. I'm not quite sure about how it's done, but here's my guess:
Two 64K EMS frames are allocated (possible with EMS version 4.0, which is
provided by programs like EMM386). One is used as a segment for reading, and
the other one for writing. The emulator does not have to worry that a
modification of RAM in the write page is not reflected in the read page: It
uses "aliasing", which means that the same EMS page is present at two
different memory locations, so that the CPU sees the same block of memory at two
different addresses. This can be done without a problem using the MMU of 386
CPUs.
Currently, another CPC emulator for the PC is being developed by Herman
Dullink. This is still in beta stage, but it looks very promising. It also
utilizes the advanced memory management features of a 386 CPU to achieve
banking. It has its own DOS extender! Unfortunately, it therefore can't
coexist with EMM386 (or anything else that switches to V86 mode). Of course, if
you can program the 386 MMU directly, you get an enormous speed (the author
says it runs at full CPC speed on a 386SX-16). Unfortunately, I currently don't
have more information about this.
A short note about RAM banking. In CPE, this is done by having two RAM areas.
One is accessible by the Z80 CPU, the other one is a "backup" area where all
the invisible RAM is stored. RAM banking is then done by exchanging pages
between these two areas, either by copying, which is as slow as one would
imagine, or by using EMS, which is a little better. In CPCEMU, this is done
using EMS as well and fits neatly with the system explained above.
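The slow "copying" variant of RAM banking might look like this in C. The array layout and the function name are assumptions made for the sketch; the EMS variant would remap pages instead of copying them.

```c
#include <stdint.h>
#include <string.h>

#define BANK_SIZE 0x4000

uint8_t visible_ram[4][BANK_SIZE];  /* the 64K the Z80 can see */
uint8_t backup_ram[4][BANK_SIZE];   /* the other 64K of a 6128 */

/* Exchange one visible page with one hidden page by copying 3 x 16K. */
void swap_pages(int visible_slot, int backup_slot)
{
    static uint8_t tmp[BANK_SIZE];

    memcpy(tmp, visible_ram[visible_slot], BANK_SIZE);
    memcpy(visible_ram[visible_slot], backup_ram[backup_slot], BANK_SIZE);
    memcpy(backup_ram[backup_slot], tmp, BANK_SIZE);
}
```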
3. The CPU
The CPU used in all the CPCs (as well as in numerous other home computers at
that time, like the Spectrum) is a Zilog Z80.
How does a CPU work? It contains a little region of memory where various
important data is stored. These are the CPU's registers. When it runs, it
reads machine instructions ("opcodes") from a location in memory which is
defined by the value of a special register called the program counter (PC).
The instruction is decoded and an appropriate action is taken. (More complex
CPUs have a special form of software called microcode within them that decodes
the instruction. The Z80 does not have a microcode, all its functionality is
hardwired. This leads to an interesting effect: Some opcodes that are not
officially documented produce interesting and potentially useful results
nevertheless, just the results that "should" be there if these opcodes were
officially documented. Some people say the Z80 was the most complex
processor ever to be made without a microcode. But the M68k FAQ says that
the latest 680x0 CPU, the 68060, has no microcode as well! Interesting, but
back to the topic...) Some instructions that can occur involve
Many instructions affect the value of a special register called the flag
register. For example, load the value 255 in the A register and then add the
value 42 to it. Since the A register is only 8 bit wide, it can only hold
values between 0 and 255. So, you will get a result of 41 (it wraps around).
The flag register will represent this by setting the "carry" flag which is
(roughly speaking) the 9th bit of the result. There are also other flags
like the zero flag and the sign flag (all values with a set 8th bit are
thought to be negative, and the sign flag is set accordingly).
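The 255 + 42 example can be written down as a small C function. It is a sketch that handles only the carry, zero and sign flags (the Z80 has more); the function name is made up, and the bit positions are those of the Z80 flag register.

```c
#include <stdint.h>

/* Z80 flag bit positions: carry is bit 0, zero is bit 6, sign is bit 7. */
enum { FLAG_C = 0x01, FLAG_Z = 0x40, FLAG_S = 0x80 };

/* Adds value to *a and returns the carry, zero and sign flags only;
   the remaining Z80 flags are left out of this sketch. */
uint8_t add8(uint8_t *a, uint8_t value)
{
    unsigned wide = (unsigned)*a + value;   /* up to 9 bits of result */
    uint8_t flags = 0;

    *a = (uint8_t)wide;                     /* keep the low 8 bits */
    if (wide & 0x100) flags |= FLAG_C;      /* the "9th bit" */
    if (*a == 0)      flags |= FLAG_Z;
    if (*a & 0x80)    flags |= FLAG_S;
    return flags;
}
```

Starting with 255 in the accumulator and adding 42 leaves 41 there, with only the carry flag set.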
Most CPUs (including the Z80) have a special register called the stack
pointer. This register contains a memory address where certain data can be
stored. When data is stored there, the stack pointer is decreased and points
to another location to store data. When data is fetched from the stack, the
stack pointer is increased again. This is (for example) used to execute
subroutines: Before you jump to another point in a program, store the
address where the subroutine should return on the stack. When the subroutine
ends, it executes the RET instruction that fetches this address from the
stack and puts it in the PC.
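In an emulator, the stack mechanism boils down to a few lines. The following C sketch uses invented names and a flat 64K array for memory; it assumes that pc already points past the CALL instruction when op_call runs.

```c
#include <stdint.h>

uint8_t mem[0x10000];
uint16_t pc, sp;

/* The Z80 stack grows downwards: decrement first, then store. */
void push16(uint16_t value)
{
    mem[--sp] = (uint8_t)(value >> 8);     /* high byte */
    mem[--sp] = (uint8_t)(value & 0xFF);   /* low byte at the lower address */
}

uint16_t pop16(void)
{
    uint16_t lo = mem[sp++];
    uint16_t hi = mem[sp++];
    return (uint16_t)(lo | (hi << 8));
}

/* CALL: remember where to return to, then jump. */
void op_call(uint16_t target)
{
    push16(pc);
    pc = target;
}

/* RET: fetch the return address from the stack and put it in the PC. */
void op_ret(void)
{
    pc = pop16();
}
```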
The Z80 is an extension of the Intel 8080, and therefore can run all 8080
software. For example, many CP/M programs available for the CPC are written
for 8080 CPUs. The Z80 has a 16 bit address bus (as mentioned above) and an
8 bit data bus. It's a true 8 bit processor, although some of the 8 bit
registers are grouped to 16 bit registers which can be used for arithmetic or
addressing memory. The PC and SP are 16 bit wide. The Z80 runs at 4
MHz.
It is amazing to see how similar the Z80 (or in fact the 8080) architecture
is to the "modern" design found in a Pentium. For example, most registers are
"special purpose" registers, whereas in almost any reasonable newer CPU you
have a large set of general purpose registers. A DOS program written
a couple of years ago uses a Pentium just the way it would use a Z80: it has
a privileged register called the accumulator that more operations can be
performed with than with the other registers, there is a "loop counter"
register, the 16 bit registers are made of two 8 bit half registers which
can be accessed independently, and both access only 64K at a time, which is
a shame. Even the flag register has the same format! (This is in fact quite
fortunate, since converting flag register contents is no fun thing, as you
can see if you look at the source code for the Amiga version of CPE).
How can you simulate all this in software? First, set aside some memory for
the registers. It is usually most efficient to use the processor registers
of the emulating CPU to store the contents of the emulated CPUs registers.
If you don't have enough (on the PC you don't) you'll have to store some of
the less used registers in RAM. CPE stores the Z80 registers SP, IX, IY, R
and I in memory. I think this is true for CPCEMU also. Basically, you have
no choice on a PC. You probably want to have all the registers that are
heavily used to be stored in registers as well, and then you have no space
left.
You can then write a central loop that fetches the next instruction,
increments the PC, decodes the instruction and determines what to do. It
then calls the appropriate handling routine for the opcode. When it has
executed the opcode, it returns to the central loop. This is a
straightforward approach, and it is used in CPE. Decoding an instruction is
done by looking it up in a large table. Actually, it is not that large.
Opcodes are 8 bit wide on the Z80, so you have 256 of them. Four of these
are only prefixes and need to read a sub-opcode which determines the type of
action to be done. So, you have a table that contains pointers to about 700
simulation routines (one for each opcode).
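Here is what such a central loop could look like in C, with a table of function pointers in place of the assembly jump table. Only four real Z80 opcodes are filled in (NOP, INC A, LD A,n and HALT); the struct, the handler names and the treatment of HALT as "stop the loop" are simplifications of mine.

```c
#include <stdint.h>

typedef struct {
    uint8_t  a;             /* accumulator */
    uint16_t pc;            /* program counter */
    int      halted;        /* set when the loop should stop */
    uint8_t  mem[0x10000];
} Cpu;

typedef void (*Handler)(Cpu *);

static void op_nop(Cpu *c)    { (void)c; }
static void op_inc_a(Cpu *c)  { c->a++; }                 /* flags omitted */
static void op_ld_a_n(Cpu *c) { c->a = c->mem[c->pc++]; } /* operand follows opcode */
static void op_halt(Cpu *c)   { c->halted = 1; }
static void op_todo(Cpu *c)   { c->halted = 1; }          /* not written yet */

static Handler table[256];

void init_table(void)
{
    for (int i = 0; i < 256; i++)
        table[i] = op_todo;
    table[0x00] = op_nop;
    table[0x3C] = op_inc_a;
    table[0x3E] = op_ld_a_n;
    table[0x76] = op_halt;
}

/* Fetch, increment the PC, look up the handler, call it, repeat. */
void run(Cpu *c)
{
    while (!c->halted) {
        uint8_t opcode = c->mem[c->pc++];
        table[opcode](c);
    }
}
```

A prefix opcode would get a handler that fetches the sub-opcode and dispatches through a second table in the same way.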
You then have to write all these simulation routines. The amount of work for
this can vary. It can be hard, if the emulated CPU is very different from
the emulating CPU. Look at the Amiga CPE source code to see what I mean. The
flags are handled differently, some Z80 flags don't even exist on the 68000
and you can't access the upper half of a 16 bit register on a 68000 without
some shifting, whereas you can do this on the Z80 without a problem. Emulating
a Z80 on an Intel based PC is much easier. You usually find the same
instructions which affect the flags in the same way. You still make a lot of
silly errors, though, if you have to write 700 such routines. You'll know
there's a bug somewhere in the Z80 part if the 3D graphics in your favourite
game look strangely "melted" :-)
The simple approach described above can be optimized in some ways. First
off, you probably don't want to jump back to a central loop after each
instruction. You can simply append the code that fetches and decodes
instructions to each opcode simulation routine, since it is short. This is
done in both CPE and CPCEMU. CPCEMU does one more, rather clever optimization:
All the instruction simulation routines (at least the first 256 which are
the most common) are aligned at 64 byte boundaries. So CPCEMU does not need
a lookup table to determine the address to jump to, it can just multiply the
opcode by 64 and jump there. I think this is the main reason for the 25%
speed advantage that CPCEMU has over CPE. Unfortunately, my opcode
simulation routines are somewhat longer than those in CPCEMU, and a 128 byte
alignment would cause a HUGE code segment, and since I hate segments, I
don't want to have too many of them...
A Z80 has only a limited number of opcodes, so you can hand-code all of
these, and you probably want to if you need maximum speed. Other 8 bit CPUs
have even fewer meaningful instructions (like the 6510 used in the C64), so
the same method can be used here. But what if you want to try to emulate a
MC68000 which has thousands of instructions? The best thing is probably to
improve the opcode decoding part. The MC68000 has only about 56 different
instructions. The enormous variety is produced by different addressing modes
that can be used with these instructions. You can move data from an address
to a data register, or to a place in memory, etc. There are a lot of
combinations. So, you would probably want to have simulation routines for
the 56 instructions and special code that handles all the different
addressing modes. Thus, a MC68000 emulator might even be about as short as a
Z80 emulator (and much more easily debugged), although the CPU has more
capabilities.
A very interesting possibility to speed up the CPU emulation is to "compile"
the Z80 instructions into native code that the Intel CPU can directly
execute. I know one C64 emulator for the Amiga that comes with a special
tool that can do exactly this and achieves a very good speed by doing so,
even on an Amiga 500. The difficulty is to distinguish code from data, and
self-modifying code is pretty lethal.
4. Interrupts
This chapter is strongly related to the previous one. An interrupt moves the
CPU into a state where it executes a special interrupt code that is stored
at a well-defined location. Interrupts occur when external hardware signals
to the processor that it needs to be serviced. Unless the running program
has temporarily disabled the interrupts, the CPU reacts immediately.
In all computer systems, interrupts can occur for various reasons. In the
CPC, the only source of an interrupt is a timer that runs at approximately
300Hz (actually, it's not really a timer, but we'll forget about this for
now). Other computer systems raise interrupts when a key has been pressed or
a character arrived at the serial port, or the sound card has finished
playing a sample.
When the Z80 executes an interrupt, it usually pushes the current PC to the
stack and starts executing at location 0x0038. (There are other interrupt
modes, but I know only one program that actually uses one of them.)
The problem is, how do we know it's time for an interrupt? The best solution
is to fiddle with the PC's timer chip which can be programmed to generate
interrupts at any frequency. We set it to 300Hz and write a short interrupt
handler that sets an "interrupt occurred" flag, which is tested after each
Z80 instruction by the CPU emulation. If the flag is set, we know it's time
for the next interrupt.
Of course, testing this bit each time a Z80 instruction has been executed is
rather inefficient, since interrupts don't occur often. In CPCEMU, this is
done differently. If you are one of the "structured programming people" who
can't stand assembly language optimizations, please don't read on and
continue with the next chapter.
Still here? Good. In CPCEMU, there is no test of an interrupt flag. Instead,
the timer interrupt handler modifies all the instruction simulation routines,
replacing the jump to the next Z80 instruction with a jump to the routine that
handles a Z80 interrupt. When the timer interrupt handler returns, it does not
matter where the Z80 emulation was interrupted: when the current instruction
is complete, it will jump to the Z80 interrupt routine. Of course, this routine
then has to restore all the simulation routines to their original contents.
Although this method is faster than just checking the interrupt bit, I don't
use it in CPE, because it is very difficult to implement correctly and I can
use the test of the interrupt flag for other purposes as well (short delays
are sometimes needed in the hardware emulation, which can be done by setting
up a counter, setting the interrupt flag to a special value and decrementing
the counter after each Z80 instruction until it reaches zero, then doing
whatever seems appropriate).
What makes the CPCEMU method difficult? From what I've described, it seems to
be a clever trick, but not overly difficult. But you don't know everything
about Z80 interrupts yet!
Interrupts can be disabled. After a Z80 DI (disable interrupt) instruction,
interrupts are forbidden. This does not mean they are ignored, they are just
deferred until they are permitted again. Just forgetting these disabled
interrupts would cause serious problems for some software.
Also, there is a "feature" in the Z80 EI (enable interrupts) instruction which
prevents an interrupt from occurring directly after it. Instead, an interrupt can
only occur after the instruction following the EI.
The HALT instruction, which stops the Z80 until an interrupt occurs, is
easier to implement with an interrupt flag.
All these don't make the method used in CPCEMU impossible (as you can see,
because it works), but quite a lot of work to implement.
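With the plain interrupt flag approach, the three complications (deferred interrupts, the EI delay and HALT) fit into one small check. This C sketch is an illustration under my own naming, not code from CPE; the push of the old PC is left out to keep it short.

```c
#include <stdint.h>

typedef struct {
    volatile int irq_pending; /* set by the host timer handler, kept until served */
    int iff;                  /* 1 after EI, 0 after DI */
    int ei_delay;             /* 1 until the check following EI has passed */
    int halted;               /* 1 while a HALT is waiting for an interrupt */
    uint16_t pc;
} Cpu;

void op_di(Cpu *c)   { c->iff = 0; }
void op_ei(Cpu *c)   { c->iff = 1; c->ei_delay = 1; }
void op_halt(Cpu *c) { c->halted = 1; }

/* Called after each emulated instruction. Returns 1 if an interrupt was taken. */
int check_interrupt(Cpu *c)
{
    if (c->ei_delay) {          /* this is the check directly after EI: skip it */
        c->ei_delay = 0;
        return 0;
    }
    if (!c->irq_pending || !c->iff)
        return 0;               /* a disabled interrupt stays pending */
    c->irq_pending = 0;
    c->iff = 0;                 /* the Z80 disables interrupts when it accepts one */
    c->halted = 0;              /* an interrupt ends a HALT */
    /* push of the old PC omitted in this sketch */
    c->pc = 0x0038;
    return 1;
}
```

Because irq_pending is only cleared when the interrupt is actually taken, an interrupt that arrives after DI is served as soon as the instruction following the next EI has finished.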
5. The video hardware
Most computers generate a video signal using the same basic technique: An
electron beam is moved very quickly across the screen, generating intense
and not-so-intense dots. The beam builds up lines of pixels from the top of
the display to the bottom. Each line is built from left to right. Special
synchronization impulses signal to the monitor that a line or the whole
frame is complete. To determine the intensity and color of each pixel,
information is continually read from the video RAM, which may be in the same
memory as programs and data (as in most early home computers, like the CPC),
or be in a reserved video RAM, as in VGA cards.
In the CPC, two chips are responsible for generating a video signal: The
CRTC (Cathode Ray Tube Controller) and the Gate Array. The CRTC generates
the addresses in the video RAM that are to be read and the VSYNC and HSYNC
impulses. The Gate Array reads the memory and generates the video signal for
the monitor.
The CPC has three different video modes. In mode 2, the usual resolution is
640x200 pixels. Each pixel can have one out of two different colors, which
can be chosen freely among the 27 available colors. In all modes, pixels
have the same height, but they can be twice (mode 1) or four times (mode 0)
as wide as in mode 2. The lower resolution allows for more colors: four in
mode 1, 16 in mode 0 (always out of 27). The Gate Array must be programmed to
set resolution and colors, since this chip creates the video signal.
The resolutions of 160x200, 320x200 or 640x200 are in fact not obligatory.
You can program different resolutions as well, and many games use the ZX
Spectrum resolution of 256x192 pixels (guess why...). This can be done by
programming some of the CRTC's registers.
Let's start with the CRTC. By itself, this chip is probably not evil. But
in the CPC, it has been connected in a very strange way that makes the
organization of the video RAM a mess. Turn on your CPC, scroll up or down a
couple of lines, and try the following BASIC statement to see what I mean:
FOR i=&C000 TO &FFFF:POKE i,255:NEXT
It fills the video memory from the first byte to the last byte. But it does
not seem to fill the screen in any particularly organized pattern. First, it
draws the first line of every character row, then the second, up to the 8th
line. It doesn't even start in the top left corner, but somewhere in the
middle of the screen.
The thought behind all this was probably to make text-mode character drawing
and scrolling as easy as possible. But for graphics, it's a nightmare.
Here's how it works: The CRTC has registers to store the base address of the
video RAM. It can be made to use any of the 16K memory ranges 0x0000-0x3FFF,
0x4000-0x7FFF, 0x8000-0xBFFF or 0xC000-0xFFFF. Usually, the last one is
active in the CPC, but many games use the second one, too, for double
buffering effects.
Additionally, there is a register to store the height of one character in
scan lines. Another register stores a starting offset. With all this
information, the 16 bit address generated by the CRTC looks like this:
Bit 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
    \_ _/ \______/ \______________________________/
      |      |                    |
      |      |                    |- Row offset
      |      |
      |      |
      |      --- character row. Usually values from 0 to 7
      |
      |---------- base address: 0x0000, 0x4000, 0x8000, 0xC000. Doesn't
                  change while the screen is displayed.
When the screen is displayed, the row offset is initialized with the CRTC
offset register. Then, the CRTC draws the first row of the first line of
characters, incrementing the offset after each character. The offset is an 11
bit value, there is NO carry into the character row field. The row field is
incremented until the height of one character has been reached (usually
eight scan lines). Each of these eight lines starts with the same value in
the offset part. After that, the offset is incremented with the number of
characters in one row, and everything starts again for the second row of
characters.
There are usually 80x25 characters in mode 2. Each is 1 byte wide and 8
pixels high, so that there are 80x25x8 = 16000 bytes visible in screen
memory. 384 bytes of the screen bank are unused: The row offset, if
initialized with 0, will only count to 2000 (80x25), so that there are 8
holes in the bitmap with 48 bytes each.
If you program the CRTC to display more pixels, some of the data will appear
twice on the screen. For example, if you make characters nine pixels high
instead of eight, the ninth line of each character will contain the same
data as the first line.
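The address layout described above can be expressed as one C function. It follows the simplified model of this text (one byte per character, offsets counted in bytes); the function and parameter names are invented for the example.

```c
#include <stdint.h>

/* base:      0x0000, 0x4000, 0x8000 or 0xC000
   start:     value of the CRTC offset register
   width:     bytes per character row (80 on a standard screen)
   char_row:  character row, counted from the top
   scan_line: line within the character
   column:    byte within the line */
uint16_t screen_addr(uint16_t base, unsigned start, unsigned width,
                     unsigned char_row, unsigned scan_line, unsigned column)
{
    /* 11 bit offset: wraps around, never carries into the row field */
    unsigned offset = (start + char_row * width + column) & 0x7FF;
    /* 3 bit row field: a ninth line shows the data of the first again */
    unsigned row = scan_line & 7;

    return (uint16_t)(base | (row << 11) | offset);
}
```

With base 0xC000 and a start offset of 0, the second scan line of the screen begins at 0xC800, while the second character row begins at 0xC050.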
Now for emulation. Of course, you have to think of some clever scheme that
will translate the CPC video address into the equivalent VGA address. If you
get this right the first time, you are probably a genius.
There are different ways to ensure that whenever something is written to the
CPC screen memory, the VGA memory is updated. You can monitor each
write access to the CPC RAM. That's done in CPE. Each Z80 instruction that
writes something to memory includes a check whether the region written to lies
in screen memory. CPCEMU performs a sampled update only at the beginning of
each frame. It keeps an extra region of memory where it stores the previous
contents of the CPC screen memory. This is compared to the new contents to
determine whether something has changed. These changes are then written to
the VGA and to the backup region.
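The sampled update can be sketched in C as a simple compare loop. This is my guess at the general shape, not CPCEMU's code; the callback type is a placeholder for whatever actually writes to the VGA.

```c
#include <stddef.h>
#include <stdint.h>

#define SCREEN_SIZE 0x4000

static uint8_t backup[SCREEN_SIZE];  /* screen contents as of the last frame */

/* Hypothetical callback: draws one changed byte on the host display. */
typedef void (*DrawByte)(uint16_t offset, uint8_t value);

/* Called once per frame. Returns the number of bytes that had changed.
   draw may be NULL if only the count is of interest. */
size_t update_screen(const uint8_t *screen, DrawByte draw)
{
    size_t changed = 0;

    for (size_t i = 0; i < SCREEN_SIZE; i++) {
        if (screen[i] != backup[i]) {
            backup[i] = screen[i];          /* remember the new contents */
            if (draw)
                draw((uint16_t)i, screen[i]);
            changed++;
        }
    }
    return changed;
}
```

The cost of this method is one pass over 16K per frame, whether or not anything changed, but the Z80 write routines stay free of screen checks.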
The new emulator by Herman Dullink uses a similar method, but since it has
full control of the 386, it can be even more efficient: It uses the page dirty
bit to determine whether a page was modified.
Occasionally, it may be necessary to redraw the complete screen, for example
when the dimensions have changed or the mode is modified. When the CPC
scrolls the screen, it modifies the offset part. You can then either redraw
the screen (then this routine had better be fast) or you can try to modify
some VGA registers to follow the scrolling. Some CPC programs use double
buffering, switching quickly between two different screens. To avoid
redrawing, you might want to keep an updated second screen in another VGA
page, so that you can switch quickly as well. CPE uses both VGA scrolling and
double buffering emulation, but occasionally the screen has still to be
redrawn. For this, an offset lookup table is generated that allows an easy
translation of CPC addresses to VGA addresses.
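An offset lookup table of this kind could be built as follows. The sketch assumes a standard 80x25 screen with a CRTC start offset of 0 and a VGA mode with 80 bytes per line in each bitplane; it ignores the conversion of the pixel data itself. The names are mine, not CPE's.

```c
#include <stdint.h>

#define BYTES_PER_LINE 80     /* same for the CPC screen and the VGA line */
#define NOT_VISIBLE    (-1)   /* marks the unused bytes of the screen bank */

/* Index: offset within the 16K screen bank. Value: VGA offset or NOT_VISIBLE. */
int16_t vga_offset[0x4000];

void build_offset_table(void)
{
    for (unsigned a = 0; a < 0x4000; a++) {
        unsigned scan = a >> 11;        /* line within the character, 0..7 */
        unsigned off  = a & 0x7FF;      /* 11 bit row offset */

        if (off >= 25 * BYTES_PER_LINE) {
            vga_offset[a] = NOT_VISIBLE;    /* one of the eight 48 byte holes */
        } else {
            unsigned y = (off / BYTES_PER_LINE) * 8 + scan;
            unsigned x = off % BYTES_PER_LINE;
            vga_offset[a] = (int16_t)(y * BYTES_PER_LINE + x);
        }
    }
}
```

The table has to be rebuilt whenever the screen dimensions or the start offset change.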
But there are some even more basic problems. For example, which VGA mode
should one choose?
The answer seems simple: The CPC can display a maximum of 16 colors at a
time, in a maximum resolution of 640x200 pixels. There is a standard VGA
mode which has exactly those specifications. Great, isn't it?
In fact, all CPC emulators I know of use this mode for standard graphics. The
problems begin when someone programs the CRTC to display a resolution of
512x256 pixels. The lower 56 lines will be truncated. There are also some
(less common) programs that make the screen wider than usual.
There is no good solution for this problem. In CPCEMU, the screen is
automatically switched to 640x400 pixels if this is needed. This can be done
without effort by clearing one bit in the VGA control registers. The aspect is
of course distorted, then. In CPE, I use 800x600 pixel SVGA mode (which isn't
distorted, but renders a fairly small picture), or, for non-SVGA boards,
640x350.
It is absolutely vital to do some thinking about how the VGA memory is
organized. In 16 color mode, you usually have four bitplanes. Modifying all of
these can be highly inefficient if it's done the wrong way. Also, the bytes
from the CPC screen memory can't just be written to VGA memory, they need to
be converted. In CPE, all this is done by pre-calculating a 3*256 byte table
in an invisible region of the VGA memory. The byte from the CPC screen memory,
together with the current mode, serves as an index into this table. When you
read one VGA byte in 16 color mode, four bytes are in fact read from memory
into the VGA latches. When you write to another byte in VGA mem, the
contents of these four latches are stored to the destination. This solves the
problem of addressing several bitplanes, and converting the byte value only
involves taking it as an offset. Other schemes to do this are possible
as well (you can turn off the bitplane mode completely, but it needs some more
hacking).
Another problem: The 16 color restriction can be circumvented by clever
programming. In the CPC, interrupts are generated by dividing the HSYNC signal
by 52. The total frame is 312 lines high. (I'm talking PAL here. For NTSC
machines, the timing is different.) Since 6 x 52 = 312, six interrupts occur
per frame: if you wait for the 6th interrupt after a given one, the electron
beam will be at the EXACT SAME position where it was when the first
interrupt occurred. The interrupts split the screen in six zones. By
modifying the mode and color registers of the Gate Array on each interrupt,
you can have six different regions on your screen (although two of them are
usually in the border). This technique is widely used in games.
It is also possible, but much more time-consuming, to use knowledge of the
Z80 instruction timing to generate even better effects. For example, you can
wait for the topmost (vertical blank) interrupt. Then, you can let the Z80
execute 42 times 42 NOP (no operation) instructions. When it's finished, the
electron beam will just be displaying the pixel with the coordinates (314, 159)
and you can determine exactly what color and mode it should have. With this
technique, "copper" effects like on the Amiga are perfectly possible on the
CPC, and perfectly inefficient, which is why usually only demos use
it.
There are ways to emulate these effects. Let's start with the first one, because
it splits the screen only in six distinct zones. We have two problems:
multiple colors and multiple modes.
The color effects won't work on the emulator, because the timing of a VGA
card is different from that of the CPC. If you just change the colors in
sync with the CPC interrupts, everything will flicker. In CPE, I have tried
to solve this problem by making the VGA display the screen at the exact same
frequency of the CPC: 50Hz. I wouldn't recommend trying this. The code is
big, fat and hairy. The problem is to synchronize the PC timer with the VGA
card. It can't be done exactly, so the timer interrupt handler has to check
if they are in sync. If not, it makes the VGA display a couple of lines
shorter. The position of the electron beam on a CPC vertical blank interrupt
will therefore move across the screen relatively fast (you can watch this
behaviour in CPE: When a program initializes its multicolor effects, the
color zones move across the screen quickly). When it hits the VGA vertical
blank interrupt, the VGA display is made a little longer again. The color
zones will then stay relatively stable, but the synchronization can't be
perfect, so the interrupt code has to keep checking. Sometimes, it will
adjust the length of the VGA display a bit. On some monitors, this will
cause the display to "jump".
The multimode effects can be addressed by keeping a six entry table that
contains mode information. It is updated on each interrupt. Screen updates
have to take this into account whenever they modify a byte. This sounds
easy, but the result is often wrong. Both CPE and CPCEMU try to do this,
but sometimes timing problems (or whatever, I haven't completely figured it
out) cause the effect to fail.
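A minimal sketch of such a mode table might look like this. It is not the
code from CPE or CPCEMU. The class and method names are invented, and I
assume that the six interrupts of a frame are 52 raster lines apart, so that
a line number can be turned into a zone index by a simple division.

```python
LINES_PER_ZONE = 52   # assumption: six interrupts per frame, 52 lines apart
ZONES = 6


class ModeTable:
    """Remembers which screen mode was active in each of the six zones."""

    def __init__(self, initial_mode=1):
        self.modes = [initial_mode] * ZONES
        self.zone = 0                     # zone currently being displayed
        self.current_mode = initial_mode  # mode last selected by the program

    def set_mode(self, mode):
        """Called when the emulated program selects a new screen mode."""
        self.current_mode = mode

    def on_interrupt(self, starts_new_frame=False):
        """Record the mode of the zone that just ended, then advance."""
        self.modes[self.zone] = self.current_mode
        self.zone = 0 if starts_new_frame else (self.zone + 1) % ZONES

    def mode_for_line(self, line):
        """Mode that the screen update should use for a raster line."""
        return self.modes[min(line // LINES_PER_ZONE, ZONES - 1)]
```

A screen update routine would call mode_for_line() for the line that
contains the modified byte before deciding how to decode it.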
If you want to emulate the second described effect, which allows total
freedom in color and mode characteristics, things become more difficult.
With the second release of CPE, I include a program called CPE2.EXE which
tries to achieve an exact video emulation in all circumstances. It uses the
standard VGA mode with 320x200 pixels in 256 colors.
You may ask: "What, 320 pixels wide? I thought the CPC had a resolution of
640x200 pixels?!" Of course, you are right. Mode 2 doesn't look terrific
when this program tries to emulate it. But since mode 2 is hardly ever used
in programs which need an exact graphics emulation, this is not so much of a
problem. I have tried to use a VESA mode with 640x200x256 pixels, but it
didn't work at all. (Couldn't set the colors! What is this?)
Using 256 color mode doesn't by itself guarantee an exact color emulation.
CPE2.EXE uses a very different method to update the screen. The screen is
redrawn 50 times a second, and not in one piece, but parallel to the CPU
emulation. First, the CPU executes a certain number of instructions, until
it has used up all the clock cycles that a "normal" Z80 would use while one
raster line is displayed on the screen. After the CPU is finished, the
raster line is drawn with the current color and mode information. (Of
course, this still isn't completely exact if the colors are changed in the
middle of a line, but this hardly ever happens.) When the complete frame is
drawn, the emulation is stopped until a 50Hz timer interrupt happens. Thus,
an exact color emulation and a real-time CPU are achieved with this method.
Unfortunately, this method is very time-consuming. Even on a 486DX2-66, it
can be slower than the original, although usually it has the correct speed.
It might be sped up by allowing the user to specify how often the screen
should be redrawn. If this were set to "only every 2nd frame", the speed of
a 486DX2-66 should be sufficient in all cases, I think.
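The line-by-line method, including the idea of redrawing only every n-th
frame, can be sketched as follows. This is a simplified illustration, not the
CPE2.EXE code. The names run_frame, cpu_step and draw_line are hypothetical,
and the figures of 64 cycles per line and 312 lines per frame are assumptions
that are not taken from this text.

```python
CYCLES_PER_LINE = 64    # assumption: CPU time units available per raster line
LINES_PER_FRAME = 312   # assumption: raster lines per 50Hz frame


def run_frame(cpu_step, draw_line, frame_number, draw_every=1):
    """Emulate one frame, alternating CPU execution and line drawing.

    cpu_step()    executes one instruction and returns its cost in cycles.
    draw_line(n)  draws raster line n with the current colors and mode.
    draw_every    redraw only every n-th frame (1 means every frame).
    Returns the number of cycles by which the last line overran its budget.
    """
    draw = (frame_number % draw_every) == 0
    budget = 0
    for line in range(LINES_PER_FRAME):
        budget += CYCLES_PER_LINE
        # Run the CPU until the time for this raster line is used up.
        while budget > 0:
            budget -= cpu_step()
        # Only now draw the line, using whatever state the CPU left behind.
        if draw:
            draw_line(line)
    return -budget
```

After run_frame() returns, a real emulator would wait for the next 50Hz
timer tick before starting the following frame. Carrying the leftover budget
from one line to the next keeps the average speed correct even though
instructions rarely end exactly on a line boundary.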
By the way, this method is also used in the two best C64 emulators for the
PC. On a C64, this type of effect is even more common, simply because it's much
easier to achieve. And if you wanted to write an Amiga emulator, you would
probably have to extend this concept not only to draw the screen in single
lines, but stop the processor emulation after each instruction to perform the
actions of the custom chips (blitter, copper) and update the next few pixels
on the display. Some people insist that an Amiga emulator is impossible
because of all this, but I don't agree. If one were written, it would
probably be unbearably slow on current hardware. But, look at Amiga CPE and
all the C64 emulators for the Amiga. They are all pretty unusable on a
standard Amiga 500. All that's needed to achieve a reasonable emulation is a
faster CPU.
6. Input/Output
An emulator which can't load programs is not very useful. Emulators for the
CPC (and C64, too) emulate disk and tape access. Disk access is provided by
storing the contents of one 3" disk in a large (200K) file, called the
"disk image". Tape files are simply stored in a special directory and
accessed each time the CPC tries to read from the "tape".
The way the CPC ROM accesses the disk and tape hardware is rather low-level.
The signals coming from the tape recorder are "digitized" and can be read
from a single bit in the 8255 I/O controller. The ROM times how long this
bit stays high or low and constructs a bit stream from these timing
results.
No sensible human being would want to emulate the behaviour of this bit.
(One could imagine, though, sampling a whole CPC tape using a SoundBl*ster
card and analyzing this data with a special program. I think there is a
Spectrum emulator that tries this, with moderate success.) Instead, you want
to trap the ROM routines that are responsible for reading the tape. This can
be done simply by modifying the entry point to contain one of the Z80
illegal opcodes. Then, you have to make your Z80 emulation treat this
special opcode so that an appropriate routine is called that pretends that
the ROM routine was executed.
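Here is a minimal sketch of the trap mechanism. It is not taken from CPE or
CPCEMU. The byte pair ED FF is just one unused Z80 opcode that I picked for
the example, the entry address 0x2000 is hypothetical, and the TinyZ80 class
models only the program counter, the stack pointer and memory.

```python
TRAP_PREFIX = 0xED   # assumption: ED FF serves as the "illegal" trap opcode
TRAP_CODE = 0xFF


class TinyZ80:
    """Only as much CPU state as is needed to show the trap mechanism."""

    def __init__(self):
        self.mem = bytearray(0x10000)
        self.pc = 0
        self.sp = 0xC000
        self.traps = {}   # patched entry address -> replacement routine

    def install_trap(self, address, handler):
        """Overwrite a routine's entry point with the trap opcode."""
        self.mem[address] = TRAP_PREFIX
        self.mem[(address + 1) & 0xFFFF] = TRAP_CODE
        self.traps[address] = handler

    def call(self, address):
        """Emulate a three-byte CALL instruction located at the current pc."""
        ret = (self.pc + 3) & 0xFFFF
        self.sp = (self.sp - 2) & 0xFFFF
        self.mem[self.sp] = ret & 0xFF
        self.mem[(self.sp + 1) & 0xFFFF] = ret >> 8
        self.pc = address

    def step(self):
        """Execute one instruction; only the trap opcode is handled here."""
        first = self.mem[self.pc]
        second = self.mem[(self.pc + 1) & 0xFFFF]
        handler = self.traps.get(self.pc)
        if first == TRAP_PREFIX and second == TRAP_CODE and handler:
            handler(self)   # pretend the ROM routine has run
            # Emulate the RET that would have ended the real routine.
            low = self.mem[self.sp]
            high = self.mem[(self.sp + 1) & 0xFFFF]
            self.sp = (self.sp + 2) & 0xFFFF
            self.pc = low | (high << 8)
            return
        raise NotImplementedError("normal opcodes are decoded here")
```

The replacement routine receives the CPU object, so it can read the
parameters from the registers and store results just as the original routine
would have done.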
In CPE, only one redirection is necessary: CAS READ. This routine is
supposed to read the next block from tape, and the replacement code does
exactly this: it reads the next block from the tape file. If the end of file
is reached, the next file in the directory is scanned. If there is no next
file, it moves to the beginning of the directory. The tape therefore loops
(more a microdrive than a tape really!).
To speed up the searching process, the routine CAS IN OPEN is also
redirected. When the CPC wants to open a specific file, the file pointer is
set to the appropriate entry in the directory. This allows for more speedy
loading.
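The looping tape behaviour can be sketched like this. It is an illustration
only and works on an in-memory list instead of a real directory. The class
TapeDirectory and its method names are invented for this example.

```python
class TapeDirectory:
    """A looping "tape" made of the files of one directory."""

    def __init__(self, files):
        # files: list of (name, [block, block, ...]) in directory order
        self.files = list(files)
        self.file_index = 0
        self.block_index = 0

    def read_block(self):
        """Stand-in for the redirected read: return the next block."""
        if not self.files:
            return None
        # Try each file at most once, plus one extra pass for wrapping
        # around to the start of the file we began with.
        for _ in range(len(self.files) + 1):
            name, blocks = self.files[self.file_index]
            if self.block_index < len(blocks):
                block = blocks[self.block_index]
                self.block_index += 1
                return name, block
            # End of file: move on, wrapping to the first directory entry.
            self.file_index = (self.file_index + 1) % len(self.files)
            self.block_index = 0
        return None   # every file is empty

    def open(self, name):
        """Stand-in for the redirected open: jump straight to a file."""
        for index, (file_name, _blocks) in enumerate(self.files):
            if file_name == name:
                self.file_index = index
                self.block_index = 0
                return True
        return False
```

Without open(), the emulated CPC would have to read and discard every block
of every file that precedes the one it is looking for.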
CPCEMU redirects more ROM routines and therefore can be a little more
user-friendly, showing the tape directory in a window when you type "CAT" or
being able to let you select files. It can also write to tape.
You might use the same approach to emulate disk files, and it is in fact the
easiest. But you may run into trouble with software that directly accesses
the floppy hardware (copy-protected software, for example). Floppy support
can alternatively be implemented by emulating the behaviour of the FDC
(floppy disk controller). This is done by CPE and CPCEMU. The FDC can be
given commands like "step inward", "read sector 4", "write sector 6" etc.
This was the hardest part for me to implement, because I had no
documentation for the FDC. I only had a ROM listing that I had printed out
myself. From this, I tried to guess how the hardware works. Because I don't
really remember how it is done (I am happy that it at least works in CPE and
I need not do anything more about it), I won't describe any details here.
Herman Dullink's new emulator uses the fact that the FDC in the PC is the
same as the one in the CPC. It just provides an interface between the CPC's
port addresses and the PC's FDC addresses. Thus, it can directly read CPC
formatted disks.
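The idea of such an interface can be sketched as a small port translation
layer. This is my own illustration and not Herman Dullink's code. The port
numbers are the commonly documented ones for the FDC status and data
registers on both machines, but they do not come from this text, so treat
them as assumptions. The callbacks host_in and host_out are hypothetical
stand-ins for real port access on the PC.

```python
# Assumed port numbers (not given in the text above).
CPC_FDC_STATUS = 0xFB7E   # CPC: FDC main status register
CPC_FDC_DATA = 0xFB7F     # CPC: FDC data register
PC_FDC_STATUS = 0x3F4     # PC: FDC main status register
PC_FDC_DATA = 0x3F5       # PC: FDC data register

CPC_TO_PC_PORT = {
    CPC_FDC_STATUS: PC_FDC_STATUS,
    CPC_FDC_DATA: PC_FDC_DATA,
}


class FdcBridge:
    """Forwards the emulated CPC's FDC port accesses to the PC's FDC."""

    def __init__(self, host_in, host_out):
        self.host_in = host_in     # host_in(port) -> byte read from the PC
        self.host_out = host_out   # host_out(port, value) writes to the PC

    def cpc_in(self, port):
        """Handle a Z80 IN from a CPC port."""
        pc_port = CPC_TO_PC_PORT.get(port)
        if pc_port is None:
            return 0xFF            # not an FDC port: nothing answers
        return self.host_in(pc_port)

    def cpc_out(self, port, value):
        """Handle a Z80 OUT to a CPC port."""
        # The status register is read-only, so only data writes are passed on.
        if CPC_TO_PC_PORT.get(port) == PC_FDC_DATA:
            self.host_out(PC_FDC_DATA, value)
```

Motor control is handled differently on the two machines and is left out of
this sketch.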
One important I/O device is still missing: The keyboard. The CPC scans a
keyboard matrix 50 times a second. If a key on the PC is pressed or
released, this matrix has to be updated. You might install a keyboard
interrupt handler that reacts to raw keyboard signals. Of course, the PC
keyboard layout is different from the CPC layout, so you have to think of
some "best fit". You will also have trouble with different keyboard
languages. Alternatively, you might make your emulator react to processed
keyboard events by letting the standard keyboard handler do its work and map
the input from the PC key buffer to the CPC key matrix. But then, you may
have to generate more than one CPC keyboard event when one PC key is
pressed, to take shift and control keys into account as well. Usually, you
will prefer the first method.
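The first method might be sketched as follows. The table is deliberately
partial and only meant as an illustration. It assumes the PC sends "set 1"
scancodes, where a release is the make code with the top bit set, and that
the CPC matrix has ten rows of eight keys in which a 0 bit means "pressed".
The matrix positions are the ones I know for these keys. Check them against
a CPC reference before relying on them.

```python
ROWS = 10   # assumption: the CPC keyboard matrix has 10 rows of 8 keys

# Partial, illustrative table: PC make code -> (CPC matrix row, bit).
SCANCODE_TO_CPC = {
    0x01: (8, 2),   # Esc
    0x1E: (8, 5),   # A
    0x39: (5, 7),   # Space
    0x1C: (2, 2),   # Enter, mapped to the CPC Return key
    0x2A: (2, 5),   # Left Shift
    0x1D: (2, 7),   # Left Ctrl, mapped to the CPC Control key
}


class KeyMatrix:
    """The matrix the emulated CPC scans; updated from raw PC scancodes."""

    def __init__(self):
        self.rows = [0xFF] * ROWS   # all bits set: no key pressed

    def on_scancode(self, code):
        """Called by the keyboard interrupt handler for every raw code."""
        released = bool(code & 0x80)
        position = SCANCODE_TO_CPC.get(code & 0x7F)
        if position is None:
            return                  # key has no CPC equivalent in the table
        row, bit = position
        if released:
            self.rows[row] |= 1 << bit
        else:
            self.rows[row] &= ~(1 << bit) & 0xFF

    def read_row(self, row):
        """Value the emulated CPC sees when it scans one matrix row."""
        return self.rows[row]
```

Because the mapping works on physical keys, shift and control need no
special treatment: they are simply two more entries in the table.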
7. "Now I know how I can write a CPC emulator, but WHY should I do so?"
Nostalgia, perhaps? Because you like all the old games you had for this
computer better than today's stuff that comes on six CD-ROMs and plays itself?
Because you have written some good programs for it that you would like to
continue using?
If none of the above apply to you, go write a boring spreadsheet.
This text was written by Bernd Schmidt
(author of the CPE emulator for the PC and Amiga)