Card B – ALU & CU – C74-6502 CPU

Card B holds the ALU, the Control Unit (CU), the Status Register (P), as well as other special registers which are used to implement specific 6502 Undocumented Opcodes. Schematic coordinates in the descriptions below are shown as “(P.RC)”, for Page, Row and Column, e.g. “(1.A1)”. The schematic image below can be opened in a separate window to zoom-in as required and follow along.

NOTE: The original prototype boards described below were patched during debugging. Please see the Project Files in the Internals page for details on these patches.

Card B – ALU & CU Schematic, Page 1 (click to enlarge)

Arithmetic Logical Unit

The C74-6502’s ALU is based on a design by Dieter Mueller. It arranges a Logical Unit, an Adder and a Shifter in sequence along the signal path. It’s an efficient design, with a low chip-count and short total propagation delay. A great choice for this CPU.

Control signals are routed from Microcode ROMs through decoders to the various components of the ALU (see C74-6502 Decoder Values).

ALU Inputs

Input A to the ALU (1.B1) is taken from the CPU Registers through the R-Bus. Input B (1.E2) connects to the internal Data Latch (labelled “B” in the schematics) and the Sign Extension buffer (1.F2) via the B-Bus. The Data Latch captures values from memory. It is normally output-enabled onto the B-Bus, except when the Sign Extension Buffer is active. The Sign Extension Buffer captures the sign of preceding ALU operations and is used as the offset high-byte during branch address calculation.

B.WR Logic

The Data Latch (1.B2) captures data values coming from the data bus to be used by the ALU in the subsequent cycle. The 65C02 TSB and TRB opcodes require that the value in the Data Latch be retained across write-cycles. The NMOS SLO Undocumented Opcode (see C74-6502 Undocumented Opcodes), on the other hand, requires that values on the data bus be captured during write-cycles. To resolve this conflict, B.WR (1.E2) is enabled for all cycles during NMOS operation, but for read cycles only otherwise. (See this post for further details).

Logical Unit

The ALU inputs feed directly into the Logical Unit (1.A4). The Logical Unit (LU) performs bitwise logical operations on its inputs using eight Dual 4-Input Multiplexers. Each dual multiplexer operates upon corresponding bits from inputs A and B, one multiplexer per bit. Dieter Mueller describes here how the LU works.

The key is to note that the input data is routed to the Select Pins of the multiplexers, while control signals drive the Data Pins. In this arrangement, the control signals function as a kind of look-up table, where one of the four control bits is selected based on the input bits taken as a combined value (i.e. input = 0 will select control bit 0, input = 1 will select control bit 1, and so on for all four control bits). The value of the selected control bit, therefore, determines the logical operation that is applied to the inputs. (For example, if control bits 1, 2 and 3 are ones and control bit 0 is a zero, the output is a one whenever either input bit is a one, so an OR logical function is implied). In this sense, it’s useful to think of the control signals as a 4-bit Logical Unit Opcode, or LUOP (1.C2), which then directs the LU to output the following values:

0000 —> $00
1111 —> $FF
1010 —> A (i.e., the value at input A)
0101 —> NOT A
1100 —> B (i.e., the value at input B)
0011 —> NOT B
1110 —> A OR B
1000 —> A AND B
0110 —> A XOR B
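The look-up behaviour described above can be sketched in Python. This is a behavioural model only, not something read from the schematic: the function name `lu` is hypothetical, and it assumes that each LUOP is written with control bit 3 on the left, and that input B supplies the high select bit and input A the low one. That ordering is the one that reproduces the table above.

```python
def lu(luop, a, b):
    """Model one 8-bit Logical Unit output for a 4-bit LUOP (assumed bit order)."""
    out = 0
    for bit in range(8):
        a_bit = (a >> bit) & 1
        b_bit = (b >> bit) & 1
        index = (b_bit << 1) | a_bit          # input bits act as the mux select value
        out |= ((luop >> index) & 1) << bit   # the selected control bit is the output bit
    return out


a, b = 0x0C, 0x0A
print(hex(lu(0b1110, a, b)))  # A OR B  -> 0xe
print(hex(lu(0b1000, a, b)))  # A AND B -> 0x8
print(hex(lu(0b0110, a, b)))  # A XOR B -> 0x6
```

Because the operation is just a 4-entry truth table, any of the sixteen two-input logic functions could in principle be selected, although the table above lists only the nine that are named.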

With Dual Multiplexers, we have two distinct LUs working in parallel, the first generating the X output and the second the Y output. LU.X has a simplified LUOP in the form of A.MX. The A.MX Decoder (1.B2) is a simple OR-gate which maps its 2-bit control signals to the standard LUOP above as follows:

A.MX = 0 maps to LUOP “0000” —> $00
A.MX = 1 maps to LUOP “1010” —> A
A.MX = 2 maps to LUOP “1100” —> B
A.MX = 3 maps to LUOP “1110” —> A OR B
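One way to see why a single OR-gate is enough for this mapping is to write it out bit by bit. The sketch below is an illustration derived from the four table rows, not a transcription of the actual wiring; the function name `amx_to_luop` and the bit assignment are assumptions.

```python
def amx_to_luop(amx):
    """Expand a 2-bit A.MX value into a 4-bit LUOP using one OR (assumed wiring)."""
    m0 = amx & 1
    m1 = (amx >> 1) & 1
    bit3 = m0 | m1   # the only gate needed
    bit2 = m1        # passed straight through
    bit1 = m0        # passed straight through
    bit0 = 0         # always zero for these four operations
    return (bit3 << 3) | (bit2 << 2) | (bit1 << 1) | bit0


for amx in range(4):
    print(amx, format(amx_to_luop(amx), "04b"))
```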

The LU outputs X and Y are then added together by the Adder which follows (1.B6).

The Adder

The Adder (1.B6) is comprised of a standard adder (74AC283) for the low-nibble and a Skip Adder (1.B7) for the high-nibble.

The Skip Adder consists of two adder ICs working in parallel to process two results concurrently for the high-nibble, one which assumes an input carry set to zero (ADR.HI.CLO) and the other with an input carry set to one (ADR.HI.CHI). The low-nibble carry (ADRLC) selects between these two results using the SKIP.ADR multiplexer. This design is much faster than a traditional ripple-carry design since the high-nibble is computed concurrently with the low-nibble, rather than having to wait for the low-nibble carry.

By the same token, the low-nibble carry (ADRLC) is also used to select between the carry outputs of the high-nibble, CLO.C and CHI.C, to generate the Skip Adder’s output Carry. This is done by the C.OUT Multiplexer. Note that the BCD Adjust signal path has a separate carry chain (BCDLC) with its own skip adder mux (SKIP.ADR.BCD.F). This separates the binary and decimal signal paths, allowing for faster processing of the former.
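The carry-select idea behind the Skip Adder can be sketched as follows. This is a behavioural model for the binary path only; the function name `skip_add` and the variable names are lower-case stand-ins for the signals named above, and the BCD carry chain is left out.

```python
def skip_add(x, y, c_in):
    """8-bit add using two speculative high-nibble sums selected by the low-nibble carry."""
    lo = (x & 0x0F) + (y & 0x0F) + c_in
    adrlc = lo >> 4                      # low-nibble carry (ADRLC)

    hi_clo = (x >> 4) + (y >> 4)         # high nibble assuming carry-in of 0
    hi_chi = (x >> 4) + (y >> 4) + 1     # high nibble assuming carry-in of 1

    hi = hi_chi if adrlc else hi_clo     # SKIP.ADR multiplexer
    c_out = hi >> 4                      # C.OUT multiplexer picks the matching carry
    return ((hi & 0x0F) << 4) | (lo & 0x0F), c_out


print(skip_add(0x3F, 0x01, 0))  # (64, 0), i.e. $40 with no carry out
```

In the model both high-nibble sums are computed before `adrlc` is consulted, which mirrors the point made above: only the final selection waits on the low nibble.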

The Shifter

The LSR control signal high enables the LSR.OUT buffer in the Shifter, which is wired to right-shift the data. Left-shifting is accomplished by adding the same quantity to itself in the Adder, so no specific left-shift function is needed. The LSR.OUT buffer inserts a high-order bit into the byte as it shifts it, with the C.MX0 signal determining that value. The inserted bit is a zero when C.MX0 is low (yielding an LSR operation), or is ROR.C when C.MX0 is high (yielding a ROR operation). The ROR.C signal in turn is the value of the C Flag in normal operation, or input data read from the CPU’s built-in SPI interface when in that mode (The C74-6502’s special SPI Opcode shifts an input byte into the A register over 8 cycles. See [SPI Interface]).

If LSR is low, the LSR.OUT buffer is disabled and the ADR.OUT (1.A9) buffer passes the data through unmodified instead. Both LSR.OUT and ADR.OUT are part of the Binary path for the ALU and require that /BIN.OUT be asserted to become active.

C.MX Decoder

The C.MX Decoder (1.E4) controls the ALU’s input Carry, the Shifter and the Sign Extender circuits of the ALU. C.MX 3-bit values from microcode are decoded as follows:

0 — “0” (ALU C.IN = 0)
1 — “1” (ALU C.IN = 1)
2 — “C” (ALU C.IN = C Flag)
3 — “IC” (ALU C.IN = Internal Carry)
4 — LSR (shift right inserting a “0”)
5 — ROR (shift right inserting C)
6 — BIT (signals a BIT instruction)
7 — ADSIC (enables the Sign Extender, C.IN = IC)

… where ALU C.IN refers to the ALU’s input carry (1.B6). The C.MX Decoder also sets the LSR, /ADS and /BIT control signals to trigger other ALU operations. The LSR control signal enables the Shifter, the /ADS control signal enables the Sign Extender circuit (1.F2) during branch-address calculation, and the /BIT control signal is used by the Flag Evaluation Logic to set flags according to the BIT opcode.

ALU Operations

Taken together, the A.MX, LUOP and C.MX microcode signals described above direct the ALU to carry out various operations. To see how these signals interact to yield the various ALU functions, it will be useful to look at some examples:

ORA — LU.X will output “0” and LU.Y will output “A OR B”. The Adder will pass the result through unmodified (add “A OR B” and “0” with a zero input carry). The Shifter is disabled so again will pass the result through unmodified.
ADC — LU.X and LU.Y will output A and B respectively. The Adder will add these together with the Carry Flag. The Shifter is disabled.
ASL A — Adds A to itself with a Carry set to 0, therefore shift-left inserting a 0 into the low-order bit.
ROL A — Same as ASL A, but the input Carry is taken from the C Flag, therefore shift-left inserting the C Flag into the low-order bit.
LSR A — Enables the Shifter with C.MX0 = “0”, therefore shift-right inserting a 0 into the high-order bit.
ROR A — Enables the Shifter with C.MX0 = “1”, therefore shift-right inserting the Carry into the high-order bit.
DEC — Adds “FF” to A with a Carry set to “0”, therefore a decrement operation.
INC — Adds “0” to A with a Carry set to 1, therefore an increment operation.
PASS A — LU.X will output A and it’s passed on unmodified.
ADSIC — The B-Input will be from the Sign Extension circuit, added to A with the Internal Carry. Used for calculating the high byte of branch offsets.
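The examples above can be tried out with a small behavioural model of the LU, Adder and Shifter chain. This is a sketch under stated assumptions rather than a description of the real microcode: the function names, the LUOP bit order (control bit 3 on the left, input B as the high select bit) and the parameter names are all hypothetical, and Decimal mode and the Sign Extender are not modelled.

```python
def lu(luop, a, b):
    """Bitwise look-up: input bits select one of the four LUOP control bits."""
    out = 0
    for bit in range(8):
        index = (((b >> bit) & 1) << 1) | ((a >> bit) & 1)
        out |= ((luop >> index) & 1) << bit
    return out


def alu(x_op, y_op, a, b, c_in, lsr=False, insert_bit=0):
    """Return (result, carry) for LU.X + LU.Y + c_in, optionally right-shifted."""
    total = lu(x_op, a, b) + lu(y_op, a, b) + c_in
    result, carry = total & 0xFF, total >> 8
    if lsr:
        carry = result & 1                       # bit 0 before the shift
        result = (insert_bit << 7) | (result >> 1)
    return result, carry


ZERO, ONES, A, B, A_OR_B = 0b0000, 0b1111, 0b1010, 0b1100, 0b1110

print(alu(ZERO, A_OR_B, 0x0F, 0xF0, 0))                  # ORA
print(alu(A, B, 0x80, 0x80, 1))                          # ADC with carry set
print(alu(A, A, 0x81, 0x00, 0))                          # ASL A
print(alu(A, ZERO, 0xFF, 0x00, 1))                       # INC
print(alu(A, ONES, 0x00, 0x00, 0))                       # DEC
print(alu(A, ZERO, 0x03, 0x00, 0, lsr=True))             # LSR A
print(alu(A, ZERO, 0x02, 0x00, 0, lsr=True, insert_bit=1))  # ROR A with carry set
```

Every operation in the list reduces to a choice of two LUOPs, an input carry and a shift enable, which is why so few control signals are needed.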

ALU Control Circuitry

The ALU Control circuitry (1.F6) selects between the Binary and BCD outputs of the ALU, and it also includes logic to further configure the BCD signal path when the ALU is in Decimal mode.

The END.D control signal is active only during the final cycle of Decimal Mode opcodes, namely ADC and SBC. The BCD operation itself is executed during this final cycle of these opcodes. Hence, the BCD control signal will be asserted to select the BCD Adjust signal path in the ALU if the Decimal Flag in the Status Register is high (D Flag) and the ALU is enabled (ALU.EN). At the same time, the CFG.2-CYCLE-BCD control signal, if high, will trigger a WAIT state to pause the CPU for the next cycle for two-cycle BCD. (CFG signals are generated by the Configuration Options logic).

The high-order bit of the Opcode (OP7) indicates which of ADC or SBC is currently in progress and the control signals are set accordingly (1.F7). Since END.D is only active for ADC and SBC Opcodes, OP7 is enough to differentiate between them.

The final three gates in the circuit (1.F8) are used to select the correct output for the ALU. There are three possibilities:

The ALU.BYPASS buffer (1.G10) will activate if ALU.EN is low. ALU.BYPASS connects the Data Bus directly to the W-Bus for register-load operations, bypassing the ALU and the B Data Latch entirely.
If the ALU is enabled (ALU.EN high) in Binary mode (BCD low), then /BIN.OUT is asserted and the Shifter will drive the W-Bus.
If ALU.EN is high and BCD is also high, then the ALU is in BCD mode and /BCD.OUT is asserted to output-enable the BCD.OUT buffer (1.D10).

All three buffers respect the Bus Management scheme and will drive the W-Bus during Phase 2 only.

BCD Adjust Logic

The BCD Adjust Logic adjusts the result of the Adder to conform to the Binary Coded Decimal format. For ADC instructions, this adjustment is as follows:

If the low-nibble result is greater than 9 or the input carry is high, add 6 to it
If the high-nibble result is greater than 9 or the low-nibble carry is high, add 6 to it

For SBC instructions, the test is for a low carry (borrow), and we subtract 6 rather than add it (which we do by adding the two’s complement of 6, or $A). We need only test the carry in this case, since a high carry implies a result between 0 and 9 for any valid BCD input quantities.
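The adjustment rules can be checked with a short arithmetic sketch. It is a software model of the rules only, not of the circuit: the function names are hypothetical, and it reads "carry" in each rule as the binary carry out of the nibble being adjusted, which is an interpretation of the text rather than something it states outright. Inputs are assumed to be valid BCD.

```python
def bcd_adjust_nibble(total, subtract):
    """Adjust one nibble sum and return (digit, carry_to_next_nibble)."""
    carry = total > 0x0F
    digit = total & 0x0F
    if subtract:
        if not carry:                      # a low carry means a borrow occurred
            digit = (digit + 0xA) & 0x0F   # adding $A subtracts 6 modulo 16
    elif carry or digit > 9:
        digit = (digit + 6) & 0x0F
        carry = True                       # the adjustment yields a decimal carry
    return digit, int(carry)


def bcd_add_sub(a, b, carry_in, subtract=False):
    """Decimal ADC (or SBC when subtract=True) on two packed BCD bytes."""
    if subtract:
        b = ~b & 0xFF                      # SBC adds the inverted operand
    lo, lo_carry = bcd_adjust_nibble((a & 0x0F) + (b & 0x0F) + carry_in, subtract)
    hi, hi_carry = bcd_adjust_nibble((a >> 4) + (b >> 4) + lo_carry, subtract)
    return (hi << 4) | lo, hi_carry


print(bcd_add_sub(0x58, 0x46, 1))                  # 58 + 46 + 1 = 105
print(bcd_add_sub(0x42, 0x13, 1, subtract=True))   # 42 - 13 = 29
```

Note that this model resolves the low nibble before the high nibble, which is the serial arrangement; it says nothing about how the hardware shortens that chain.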

Dieter Mueller explains a basic BCD implementation here. The C74-6502 adapts Dieter’s design, as further optimized by Jeff Laughton (Dr. Jefyll on 6502.org), described here. The result can be a little daunting, so it will be useful to work through it.

Let’s first examine the circuit which performs the BCD adjustment on the low-nibble (1.D8). The BCD.DETECT.LO multiplexer inspects the low-nibble together with the input carry (ADRLC), as follows:

For addition, the multiplexer is disabled if ADRLC is high, which in turn takes BCDLC high and triggers an adjustment. Otherwise, BCDLC will go high only if [ADR0..ADR3] is greater than 9. This correctly detects the adjust condition.
For subtraction, the adjustment is triggered if /BCDLC is high, and since SBC is high, /BCDLC will be high for any ADRLC-low condition. This essentially triggers an adjustment whenever the input carry is low (borrow), which is what we want.

If an adjustment is not required, then BCD.SEL.LO will generate a zero, and BCD.ADJ.LO will therefore pass the low-nibble value unchanged. If, on the other hand, an adjustment is required, then BCD.SEL.LO generates either a $6 or an $A (for addition and subtraction respectively) and BCD.ADJ.LO applies it to the low-nibble. Either way, the output of BCD.ADJ.LO is fed to BCD.OUT as the low-nibble final result.

Now, the same logic could be used for the high-nibble adjustment, with the input carry reflecting whether a low-nibble adjustment was made. Using BCDLC as the high-nibble input carry would achieve the desired result, and the same logic would then work for both low and high nibbles.

Although the symmetry is nice, using the same logic for low and high nibbles extends the carry-chain, making the high-nibble adjustment wait until the low-nibble resolves. Dr Jefyll suggested a way to improve the performance of the circuit significantly by using CBT parts for the high-nibble adjust logic. (CBT parts are FET switches, and as such are lightning fast for the data inputs. If the select inputs are set up in advance, the total propagation delay is minuscule).

The revised circuit dispenses with the need to use the low-nibble carry in the select-input to BCD.DETECT.HI (1.E7). Instead, the circuit uses ADJ1 and ADJ2 at the data-inputs to shift the threshold which determines whether adjustment should take place.

That threshold is normally > 9 for ADC (i.e. if the result of an addition is $A, $B, $C, $D, $E or $F, we add 6 to adjust to decimal), or < 0 for SBC (i.e. if the result is $F, $E, $D, $C, $B or $A, we subtract 6 to adjust to decimal). Instead, the new logic says:

If the low-nibble needed adjustment, set the threshold for the high-nibble at > 8 for ADC, and < $F for SBC

BCD.SEL.LO does exactly that, setting ADJ1 and ADJ2 as appropriate depending on whether the low-nibble was adjusted. Those signals then feed the data-inputs of BCD.DETECT.HI and BCD.DET.HI.AUX. These two multiplexers then generate the correct adjustment value as needed.
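The threshold shift can be illustrated for the ADC case with a few lines of Python. This is a simplified arithmetic sketch, not a model of the CBT multiplexers: it assumes valid BCD digits, treats the high-nibble sum as computed without any low-nibble carry, and uses hypothetical names (`bcdlc` stands in for BCDLC). It only shows that moving the threshold gives the same decision as waiting for the carry.

```python
def hi_adjust_serial(a_hi, b_hi, bcdlc):
    """Decide on adjustment after folding the decimal low-nibble carry into the sum."""
    return (a_hi + b_hi + bcdlc) > 9


def hi_adjust_shifted(a_hi, b_hi, bcdlc):
    """Decide on adjustment from the carry-free sum, moving the threshold instead."""
    threshold = 8 if bcdlc else 9          # > 8 when the low nibble was adjusted
    return (a_hi + b_hi) > threshold


# Both forms agree for every pair of valid BCD digits.
agree = all(
    hi_adjust_serial(a, b, c) == hi_adjust_shifted(a, b, c)
    for a in range(10) for b in range(10) for c in (0, 1)
)
print(agree)  # True
```

The benefit in hardware is that the carry-free sum does not depend on the low nibble at all, so the late-arriving carry only has to steer a selection rather than ripple through another adder.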

The select-inputs of all the CBT parts in the circuit (which require additional time to respond) rely only on signals that are available early in the cycle. The data path is therefore configured well in advance of the data arriving, and the CBT parts respond with lightning speed.

Meanwhile, BCDLC is no longer used in the high-nibble detect logic. Instead, the Carry-Low result from the Skip Adder is used as input to the high-nibble circuit, and BCDLC is added directly into the final adjust-adder BCD.ADJ.HI.

The incremental propagation delay for the high-nibble is reduced dramatically with these changes, albeit at the expense of additional hardware and complexity. In this case, the tradeoff is well worth the added complexity.

It is worth noting that the BCD logic is also made faster in the C74-6502 by pre-charging the R-Bus in advance. Since Decimal mode on the 6502 is valid only for ADC and SBC instructions, we know the A register will be used as input to the ALU. We can therefore output-enable the A register onto the R-Bus in the prior cycle, rather than waiting for the ALU cycle itself. This eliminates significant overhead at the start of the cycle and allows more time for the ALU propagation delay proper. This made single-cycle Decimal mode operations possible at high speeds. (See this post for further details).

C.OUT Generation

The logic for the ALU’s output carry C.OUT (1.F10) is as follows:

for Binary Mode — the low-nibble adder output carry, ADRLC, is used to select the correct output carry from the SKIP.ADR, such that we select CLO.C if ADRLC is low, otherwise we use CHI.C.
for Decimal Mode — BCDHC is the BCD output carry, which we can use as is for addition. For subtraction, however, we need /BCDHC, so a simple inverter is applied before the final carry is selected.

Status Register and Flags

Flag Evaluation Logic

The Flag Evaluation Logic (1.H5) in the C74-6502 has additional complexity in order to support compatibility across both NMOS and CMOS 6502s. A notable difference is the treatment of the flags in Decimal mode for NMOS and CMOS variants. The NMOS 6502 leaves all but the Carry Flag set to an invalid state when operating in Decimal Mode. The 65C02, on the other hand, sets the flags correctly but takes an extra cycle to do so. Accommodating both requires extra circuitry.

The NMOS “invalid” BCD flags result from the fact that the NMOS 6502 evaluates flags (other than the Carry) based on the ALU result before the BCD Adjustment takes place. By contrast, the 65C02 uses the result after the BCD Adjustment, and therefore takes longer to do the job (hence the extra cycle for 65C02 Decimal Mode operations).

On the C74-6502, the flags are evaluated based on the ALU’s final result on the W-Bus, except when in NMOS Decimal Mode. In that case, the result is taken just after the Adder and before the BCD adjustment logic (i.e., [ADR0..ADR3] for the low-nibble and [BCDF4..BCDF7] for the high — note that a separate skip-adder, SKIP.ADR.BCD.F at 1.C7, is required in Decimal Mode to incorporate the BCD low-nibble carry, BCDLC).

Another complexity is the special handling of flags to support the ARR and ANC Undocumented Opcodes. ANC sets the Carry Flag equal to bit 7 of the ALU.A input. See this post for details on ARR flags processing. The C74-6502 implements correct flags logic for ARR in binary mode, but not Decimal Mode.

Dedicated circuitry is also required for opcodes which manipulate the flags directly, namely SEC, CLC, etc., and also PLP, SEP and REP. For these opcodes, the value on the W-Bus is used to write to the P Register directly. The flags are set according to their position in the P Register: NV–DIZC, where the N Flag is bit-7 and the C Flag is bit-0.

Special Flags Handling

Before looking at the flag evaluation proper, we should note the Special Flags Handling circuitry (1.J4) which controls the flag evaluation logic itself, as follows:

SF — this control signal comes from the SF.MX Decoder. It indicates an opcode which explicitly sets the flags (i.e., SEC, CLC, etc.).
P.W — this control signal comes from the WR.MX Decoder on the Registers Card. It indicates a write to the P Register.
END.ARNC — the END.ARNC control signal comes from the NX.MX Decoder. It indicates the last microinstruction for an ARR or ANC Undocumented Opcode. It invokes special flags handling to support their unique requirements.
SSF & PP.W — END.ARNC will assert both these signals together. They are otherwise asserted separately by SF and P.W respectively. This is simply a convenient way to drive the flag evaluation multiplexers when END.ARNC is asserted.
ARR — this control signal is asserted when we detect an ARR Undocumented Opcode (bit-6 of the current opcode and END.ARNC are both high).
CFG.BCD-NMOS — selects the NMOS flags behaviour for BCD operations
BCD — is asserted by the ALU Control Logic when a BCD operation is in progress (i.e. the final cycle of ADC or SBC).
BCD.NMOS — this signals the final cycle of an NMOS BCD operation. All flags will be set to NMOS compatible states.

Note the following notation used below to describe the various logic equations for flag evaluation:

“/” = NOT
“+” = OR
“*” = AND
”^” = XOR

V Flag Logic

The logic equation for the V Flag is as follows:

V = (W7 * /R7 * /Y7) + (/W7 * R7 * Y7)

where W refers to the result of the ALU, R refers to the A-input from the R-Bus, and Y refers to the B-input (after being inverted by the LU for SBC operations). In each case, we are interested in bit-7 of the quantities concerned. This design follows Dieter Mueller’s excellent description of the V Flag evaluation, summarized below.

For NMOS Decimal Mode, the equation is the same, except that instead of W7, we take the bit 7 before the BCD adjustment takes place (labelled BCDF7), as follows:

V = (BCDF7 * /R7 * /Y7) + (/BCDF7 * R7 * Y7)

To render the V Flag in discrete logic, we use De Morgan’s Law to note that:

(/R7 * /Y7) = /(R7 + Y7) = (R7 NOR Y7)

Then we can say that:

V0 = R7 NOR Y7

V1 = R7 AND Y7

And then replace the V flag equation above:

V = (W7 * /R7 * /Y7) + (/W7 * R7 * Y7)

With this:

V0 = (R7 NOR Y7)

V1 = (R7 AND Y7)

V  = (W7 * V0) + (/W7 * V1)

And for NMOS Decimal Mode we get this:

V = (BCDF7 * V0) + (/BCDF7 * V1)

In the circuitry, V0 and V1 are implemented with just a couple of gates (1.K4), and the V Flag logic equation is resolved using the VEVALUATE multiplexer (1.H4). Recall from above that the BCD.NMOS signal is high only if we are executing the final cycle of either an ADC or SBC in NMOS Decimal Mode. We can therefore use it to select the NMOS behaviour for the V Flag.
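As a sanity check on the binary-mode equation, the V0/V1 form can be compared against the arithmetic definition of signed overflow for every input combination. The sketch below is illustrative only; the function names are hypothetical, and Y is taken to be the B-input as the Adder sees it (already inverted for SBC).

```python
def v_flag(w7, r7, y7):
    """V = (W7 * V0) + (/W7 * V1), with V0 = R7 NOR Y7 and V1 = R7 AND Y7."""
    v0 = int(not (r7 or y7))
    v1 = r7 & y7
    return (w7 & v0) | ((1 - w7) & v1)


def signed_overflow(r, y, c_in):
    """Reference: does the signed sum fall outside -128..127?"""
    def signed(v):
        return v - 256 if v & 0x80 else v
    total = signed(r) + signed(y) + c_in
    return int(total < -128 or total > 127)


mismatches = 0
for r in range(256):
    for y in range(256):
        for c_in in (0, 1):
            w = (r + y + c_in) & 0xFF
            if v_flag(w >> 7, r >> 7, y >> 7) != signed_overflow(r, y, c_in):
                mismatches += 1
print(mismatches)  # 0
```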

The VSELECTOR multiplexer then selects the appropriate value of the V-Flag to be used for writing to the P Register (V.W) as follows:

For normal operation, V.W = VEVALUATE.Y
BIT — For BIT instructions, V.W = B6 (the 6th bit of the opcode operand held in the B Register)
SSF — For instructions which explicitly set the flags (e.g., SEC, CLC, etc), V.W = W6
PP.W — For instructions which implicitly set the flags (i.e., PLP, SEP, REP), V.W = W6
SSF & PP.W — For the ARR Undocumented Opcode, V.W = Y6 XOR Y7 (which is equivalent to W5 XOR W6 after the ROR operation)

Z Flag Logic

The Z-Flag is high when the ALU result is all zeroes. As noted above, the result is normally taken from the W-Bus. For NMOS Decimal Mode, however, we look at the result after the Adder [ADR0..ADR3] for the low-nibble and [BCDF4..BCDF7] for the high. The Z-Flag Detector (1.H6) is comprised of NOR gates which in turn generate ALUW.Z from the W bus and ALUA.Z from the Adder.

The Z.SELECTOR multiplexer (1.I6) selects the appropriate value of the Z-Flag to be used for writing to the P Register and generates Z.W, as follows:

For Normal operations, Z.W = ALUW.Z (based on the W-Bus)
BCD.NMOS — For NMOS Decimal Mode operation, Z.W = ALUA.Z (based on [ADR0..ADR3][BCDF4..BCDF7])
END.ARNC — For either ARR or ANC Undocumented Opcodes, Z.W = ALUA.Z
(There are no opcodes which manipulate the Z Flag directly)
P.W — For instructions which write to the P Register (i.e., PLP, SEP, REP), Z.W = W1

C Flag Logic

The output carry from the ALU comes from the C.OUT Selector. The CSELECTOR multiplexer (1.I4) then selects the appropriate value for the C Flag to be used for writing into the P Register and generates C.W as follows:

For Normal operation, C.W = C.OUT (output carry from the ALU)
LSR — For right shifts, C.W = ADR0 (bit-0 before the shift takes place)
SSF or P.W — For instructions which explicitly set the flags (e.g., SEC, CLC, etc.), C.W = W0
PP.W — For instructions which implicitly set the flags (i.e., PLP, SEP, REP), C.W = W0
SSF & PP.W — For the ARR Undocumented Opcode, C.W = ADR7 (bit-7 before the Shifter)

N Flag Logic

The NSELECTOR multiplexer (1.I5) selects the appropriate value for the N Flag to be used for writing into the P Register and generates N.W as follows:

For Normal operation, N.W = W7 (bit-7 of the ALU result)
BIT — For BIT instructions, N.W = B7 (bit-7 of the ALU B operand)
ARR — For ARR instructions, N.W = C (Carry Flag from P Register)
BCD.NMOS – For BCD operations in NMOS mode, N.W = BCDF7

I Flag and D Flag Logic

The I and D Value Selector (1.J6) implements the required logic for these flags. The I and D flags are set using dedicated instructions (SEI, CLI and SED, CLD respectively). The SEI and CLD control signals from the SF.MX Decoder are used to set the appropriate values. These flags can also be set implicitly by operations which write to the P Register. In that case, it is the W-Bus (W2 and W3 respectively) that is used to write to these flags.

P Register

The P Register’s (1.H8) most notable feature is that individual bits can be manipulated independently. It consists of eight individual flip-flops arranged according to the flags’ position in the Register (NV..DIZC).

Two spare bits in the register, bits 4 and 5, are used by the C74-6502 for internal purposes. They are the Internal Carry (IC) and Sign Extension (SE) flags respectively. The Sign Extension bit stores the sign bit of the ALU B-Operand from the previous cycle. It is used by the Sign Extension buffer (1.F2) to apply a high-byte offset to PC during branch address calculation.

The Internal Carry (IC) is used when a carry is needed but the processor C Flag must not be disturbed (e.g., during address calculation). The Internal Carry stores C.OUT every cycle (except when the on-board SPI Interface is active, in which case the Internal Carry is used to capture input data from the interface).

The other bits in the P Register are the processor flags, and their values come from the Flag Evaluation Logic above by way of “.W” control signals. Flip-Flops for each bit are clocked by corresponding “.WR” control signals which are generated by the SF.MX Decoder (e.g., N.WR).

The output of each flip-flop is used as the processor flags and is also routed to the R-Bus via the PBUF buffer (1.H9). The P.R control signal output-enables this buffer onto the R-Bus and is asserted by the microcode to read the contents of the P Register (e.g., for PHP operations).

Note that the Break Flag at bit position 4 in the P Register is not stored explicitly in a flip-flop. Rather it always reflects the value of the /INT.INP control signal. This active-low signal is asserted only while the interrupt-service microcode executes an external interrupt. In other words, the P value pushed onto the stack by the microcode will have a 0 for the BRK flag during external interrupts, and a 1 for BRK instructions. The /INT.INP signal will go high once the processor jumps to the appropriate interrupt vector address. Hence a PHP instruction will always push a 1 for the Break Flag, even when executed within an interrupt service routine.

Bit 5 of the P Register is fixed and will always read as a 1. As with the B Flag, there is no need to store this bit explicitly.
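The effect of these two special bit positions on the value read from the P Register can be sketched as below. The function name and argument order are hypothetical; `int_inp_n` stands in for the level of the active-low /INT.INP signal (0 while an external interrupt is being serviced, 1 otherwise).

```python
def p_value_read(n, v, d, i, z, c, int_inp_n):
    """Assemble the byte read from P: bit 5 fixed at 1, bit 4 follows /INT.INP."""
    return (
        (n << 7) | (v << 6)
        | (1 << 5)            # always reads as 1
        | (int_inp_n << 4)    # Break Flag position, not stored in a flip-flop
        | (d << 3) | (i << 2) | (z << 1) | c
    )


print(hex(p_value_read(0, 0, 0, 0, 0, 0, int_inp_n=1)))  # PHP or BRK: 0x30
print(hex(p_value_read(0, 0, 0, 0, 0, 0, int_inp_n=0)))  # external interrupt: 0x20
```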

One final note — the active-low V.PRE control signal will set the V Flag when asserted. This control signal comes from the circuitry that implements the /SO pin logic.

SF.MX Decoder

The SF.MX Decoder (1.I11) drives the control signals associated with the P Register. The SF.MX0 multiplexer decodes 3-bit control signals [SF.MX0..SF.MX2] from the microcode as follows:

0 — None
1 — Flags selected by the opcode — SF.MXA is enabled by this signal and OP6 and OP7 from the opcode are used to select the correct flags to write-enable. The SF control signal (1.K12) is also asserted to tell the Flag Evaluation logic to use the values on the W-Bus directly. The microcode for the instruction looks after loading the W-Bus as appropriate (with $FF for SET operations and $00 for CLEAR operations).
2 — SEI/CLD — Interrupts perform an implicit SEI just prior to invoking the interrupt service routine. On the 65C02, the D flag is also cleared at this time. The CFG.INT-CLD configuration signal controls the CLD option. The SEI and CLD control signals generated here drive the I/D Flag Value Selectors (1.J6) as described above.
3 — Z (selects the Z flag)
4 — NZ (selects the N and Z flags)
5 — NZC
6 — NZCV
7 — NZV

The SF.MXA multiplexer uses specific bits from the current opcode to further qualify the operation in progress. Specifically, bits 6 and 7 of the opcode (OP6, OP7) tell us which flag to operate on for instructions that manipulate the flags directly (e.g., SEC, CLC), as follows:

00 = C
01 = I
10 = V
11 = D
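This mapping can be checked against the standard 6502 flag opcodes. The sketch below simply reads OP7 and OP6 as a 2-bit value, with OP7 as the high bit; the function and table names are hypothetical.

```python
FLAG_BY_OP7_OP6 = ("C", "I", "V", "D")   # indexed by the 2-bit value OP7:OP6


def flag_for_opcode(opcode):
    """Return which flag a set/clear-flag opcode operates on."""
    return FLAG_BY_OP7_OP6[(opcode >> 6) & 0b11]


for name, opcode in [("CLC", 0x18), ("SEC", 0x38), ("CLI", 0x58), ("SEI", 0x78),
                     ("CLV", 0xB8), ("CLD", 0xD8), ("SED", 0xF8)]:
    print(name, flag_for_opcode(opcode))
```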

Working together, these two multiplexers generate control signals to select which flags will be written to in the P Register. The P.W control signal indicates a write to the P Register as a whole and will enable ALL flags for writing. (P.W is generated by the WR.MX Decoder on Card A)

SF.LATCH (1.G12): All write signals for the flags are routed to SF.LATCH to control the timing of corresponding “.WR” signals which clock the flip-flops of the P Register itself. SF.LATCH is clocked by PHI11, which means the P register is written to on the fall of PHI2 at the end of the cycle. Note that SF.LATCH is cleared by /WR.CLR mid-cycle (see the Monoflop circuit on Card A). SF.LATCH also releases the /END.ON and /INT.ON control signals at the end of the cycle for the SPI Interface and Interrupt Handling logic respectively.

Branch Test Result (BTR)

The Branch Test Multiplexers (1.J10) implement logic to resolve branch instructions. The Branch Test Result output signal (BTR) is low if the test fails, and high otherwise. For regular branch instructions which test flags, the BTR multiplexer decodes the current opcode as follows:

OP3 — “0” indicates a flag-test branch
OP5 — value to test against flag
[OP6..OP7] — flag to test (0 = N, 1 = V, 2 = C, 3 = Z)

The BTR Multiplexer is selected by OP3 = 0, and uses OP6 and OP7 to select the flag in question. It generates a “0” (test fails) if the value of the flag in question is different from OP5.
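The flag-test decode can be sketched in a few lines and checked against the standard branch opcodes. This models the logic only; the function name `branch_test` is hypothetical, and OP7:OP6 is read as a 2-bit value with OP7 as the high bit.

```python
def branch_test(opcode, n, v, c, z):
    """Return BTR (1 = branch taken) for a flag-test branch opcode."""
    flag = (n, v, c, z)[(opcode >> 6) & 0b11]   # OP7:OP6 selects the flag
    wanted = (opcode >> 5) & 1                  # OP5 is the value to test against
    return int(flag == wanted)


print(branch_test(0xF0, n=0, v=0, c=0, z=1))  # BEQ with Z set   -> 1
print(branch_test(0xD0, n=0, v=0, c=0, z=1))  # BNE with Z set   -> 0
print(branch_test(0x10, n=0, v=0, c=0, z=0))  # BPL with N clear -> 1
```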

For 65C02 branch instructions which test bit values (BBR and BBS), the BBTR multiplexer decodes the current opcode as follows:

OP3 — “1” indicates a bitwise test branch
[OP4..OP6] — selects which bit to test

Note that the microcode for BBR inverts the opcode operand, but leaves it alone for BBS. The resulting value in the Y Bus is used as the Branch Test Result directly (i.e., BBR will fail if the operand bit is a “0” after being inverted. BBS will fail if the bit is a “0” without inverting). We use the Y Bus rather than the W-Bus in order to reduce the propagation delay through the ALU (since the Adder and Shifter are not required in this instance).
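The bit-test case can be modelled in the same spirit. In this sketch the inversion that the microcode performs for BBR is decided from OP7, using the standard 65C02 encoding in which BBRn is $0F to $7F and BBSn is $8F to $FF; that detail, and the function name, are assumptions about how to model it rather than a description of the microcode itself.

```python
def bit_branch_test(opcode, operand):
    """Return BTR (1 = branch taken) for a BBR/BBS opcode and its operand byte."""
    bit = (opcode >> 4) & 0b111                  # OP6..OP4 selects the bit to test
    is_bbs = (opcode >> 7) & 1                   # assumed: OP7 separates BBS from BBR
    y = operand if is_bbs else (~operand & 0xFF)  # BBR tests the inverted operand
    return (y >> bit) & 1                        # the selected Y bit is the result


print(bit_branch_test(0x0F, 0xFE))  # BBR0, bit 0 clear -> 1
print(bit_branch_test(0xFF, 0x80))  # BBS7, bit 7 set   -> 1
print(bit_branch_test(0x3F, 0x08))  # BBR3, bit 3 set   -> 0
```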

Control Unit

The architecture of the Control Unit was an important consideration in the design of the C74-6502. The first iteration used a logic array architecture for the instruction decoder and sequencer (shown below) — a design not unlike the original 6502.

Discrete Logic Decoder and Sequencer (simplified 6502 instruction-set)

Ultimately, this design proved impractical and ROMs were introduced to hold microcode.

Microcode ROMs

A pair of control ROMs on each CPU Card generate 16 Microcode Signals which together determine the behaviour of the data path on that card. The ROMs are addressed by the current opcode in the Instruction Register [OP0..OP7] and a 4-Bit State Counter (Q). The Q Counter [Q0..Q3] is normally incremented by one each cycle, so microinstructions are fetched in sequence, up to 16 per opcode.

The ROMs are organized as four independent instruction-sets (as selected by the CMOS and ALT configuration signals). Each of these instruction-sets has two versions, one to support operation with, and the other without, the optional K24 Card. See the C74-6502 Datasheet for further details on the supported instruction-sets.

The ROM outputs are captured by the Microcode Instruction Register (MIR), a key part of the Microcode Pipeline mechanism. The outputs of the MIR then drive Microcode Decoders on each CPU card, which in turn generate card-local control signals for the datapath. (See C74-6502 Decoder Values).

The ALU.EN microcode signal does not go through a decoder, but rather connects directly to the datapath on Card B. When ALU.EN is low, the ALU.BYPASS buffer (1.G10) connects the Data Bus directly to the W-Bus. This bypasses the ALU entirely. It is done for register load operations when the ALU and B Data Latch are not required.

Microcode Decoders

The C74-6502 uses a Vertical Microcode scheme, requiring that signals from ROM be further processed by Microcode Decoders on each card. These decoders enable a shorter control-word (and therefore fewer ROMs), but at the expense of additional propagation delay on the critical path.

The decoders on Card B, generally labelled with the “.MX” suffix, are A.MX, C.MX, LUOP, SF.MX and NX.MX, and are described throughout this document. See C74-6502 Decoder Values for a description of the microcode format.

Microcode Pipeline Logic

The usual arrangement for homemade CPUs is to fetch microinstructions from ROM at the start of each cycle. That works well, but it can be slow since the fetch and execute portions of the cycle must occur in sequence. Even when using fast RAMs to store microcode, sequential fetching imposes a significant penalty on the maximum clock-rate.

The C74-6502 implements a two-stage microcode pipeline which overlaps the fetch and execute operations so they happen concurrently, and thereby achieves a significant performance boost. Below we look at the internal operation of the pipeline.

At a high level, the pipeline’s function is fairly straightforward. After all, with the current opcode in the IR, it’s a fairly simple matter to pre-fetch the next microinstruction and latch it into a Micro-Instruction Register (MIR) ready for use in the next cycle. But there are a few subtleties to be handled.

The first cycle after the opcode-fetch is clearly problematic, since we don’t yet know what opcode to use for the pre-fetch. A review of the 6502 and 65C02 instruction sets shows that nearly all opcodes perform exactly the same function in this initial cycle, namely to fetch the opcode’s operand with the address at PC and increment PC. That being the case, pre-loading the MIR with this default fetch-operand microinstruction satisfies most of the requirement, and leaves only a few exceptions to handle. Arranging the microcode so that an “all-zeroes” control-word produces the default fetch-operand function makes for a convenient mechanism for this.

As for the exceptions to the default, there are only three:

We need to inhibit the incrementing of PC for single-byte opcodes
We need to transform the default fetch-operand microinstruction into a fetch-opcode operation for 65C02 single-cycle NOPs
We need to generate a fetch-opcode in the next cycle of a branch if the branch is not taken

Let’s look at each in turn:

First, single-byte opcodes have the form $x8 or $xA, meaning that a three-input gate can catch them all (1BOP Detector). We don’t have to worry about RTS ($60) and RTI ($40) since they replace the value of PC. Similarly, the KIL ($x2) illegal opcodes certainly need no protection from incrementing PC. It turns out the CPU already has logic for inhibiting +PC on the first cycle of an interrupt (/INH.INC). A couple of gates is all it takes to do the same for single-byte instructions. (To be clear, the default fetch-operand cycle is still performed in this case, but PC is not incremented and the fetched byte is discarded. It is then re-read in the following cycle as the next opcode).

Next, single-cycle NOPs are $x3 and $xB opcodes on the 65C02, except that WAI ($CB) and STP ($DB) need to be excluded from this list (and included in the single-byte opcode list above). A little more logic is needed to trap these (NOP1 Detector) and, if detected, to replace the default fetch-operand microinstruction with a fetch-opcode, and end the current opcode immediately (see FETCH.OP for details).

Finally, interrogating the Branch Test Result (BTR) in the first cycle of a branch instruction is straightforward. All branches are of the form xxx1-0000 so once again fairly simple logic can detect them (BRANCH.OP Detector). If the branch test fails, BRANCH.EXIT will set the FETCH.OP flip-flop. This will generate a fetch-opcode in the next cycle to end the branch (as with NOP1, but in the next cycle).

With those exceptions handled, we can then move the first microinstruction of every opcode, normally a fetch-operand, to the end of the opcode sequence. The microinstruction for cycle 1 of the opcode then sits at index location 0 in ROM, ready to be pre-fetched. And with that, the basic engine is ready to go.

The sequence of cycles for every opcode then becomes:

Execute fetch-opcode, prefetch fetch-operand, reset Q counter
Execute fetch-operand, prefetch the next microinstruction using the current opcode and Q index 0, decode the current opcode and process as follows:
- For single-byte opcodes —> inhibit +PC
- For single-cycle NOPs —> execute a fetch-opcode instead of fetch-operand in the current cycle
- For Branches —> if the branch test fails, set the FETCH.OP flip-flop to execute a fetch-opcode in the next cycle
Execute first microinstruction of the new opcode, or execute a fetch-opcode for failed branches

The only remaining issue is handling page boundaries during address calculation. The 6502 is clever enough to save a cycle if a page boundary is not crossed. The pipeline therefore needs to detect these circumstances, and alter the instruction flow as necessary.

There are two situations when this arises, and each is handled slightly differently:

For branches, the default microcode increments the high-byte of PC after applying a branch offset; if applying the branch offset to the low-byte of PC does not cross a page, we trigger a BRANCH.EXIT to end the microcode sequence without adjusting the high-byte;
For indexed addressing modes, the default microcode does not adjust the high-byte of the calculated address; if a page is crossed, we trigger an INC.DPH to insert an increment-high-byte microinstruction on the fly, and then continue on to complete the memory read or write as the case may be.

Below are more detailed descriptions of both BRANCH.EXIT and INC.DPH, as well as other supporting logic for the microcode pipeline.

NX.MX Decoder

The NX.MX Decoder (1.A12) controls the flow of micro-instructions. Generally, micro-instructions are addressed in sequence, one after the other as stored in ROM, unless the NX.MX Decoder directs otherwise. There are four basic types of flow-control directives, and every microinstruction has one, as follows:

NEXT — executes the next micro-instruction in the sequence (the default)
EXIT — conditionally generates a fetch-opcode micro-instruction in the next cycle to cut short the execution of an opcode
END — resets the Q State Counter to 0 as the last step in an opcode sequence
INCDPH.C — a special case where an extra microinstruction is conditionally inserted into the execution stream to increment the high-byte of a target address

The INCDPH.C directive is used to detect a page-crossing in the cycle in which it is used. It checks the ALU’s output carry (C.OUT) and inserts an extra cycle to increment the high-byte of the calculated address if necessary (see INC.DPH).

There are two EXIT directives, as follows:

EXIT.CC — will force a BRANCH.EXIT on Carry Clear (used to avoid adjusting the high-byte of PC in branch instructions)
EXIT.BTF — will force an exit if a Branch Test Fails for 65C02-specific branch instructions (BBR, BBS)

There are several END directives. They are used as part of the fetch-opcode microinstruction at the end of every opcode sequence to mark it as such. They may add certain qualifiers to the cycle they operate on, as follows:

END — the default; a fetch-opcode without qualification
END.D — indicates the last cycle of a BCD capable opcode, namely ADC and SBC, which triggers special BCD Logic in the ALU if the D Flag is on
END.INT — signals the end of an Interrupt Microcode Sequence; it triggers the /INT.CLR signal on the rise of PHI2 to clear the internal interrupt state
END.ARNC — signals the final cycle of either the ARR or ANC Undocumented Opcodes, which triggers special flags evaluation logic

State Register (Q)

The Q Register is used to index into the microcode ROMs. It is configured to increment each cycle, unless specific control signals direct otherwise, as follows:

/INC.DPH — if asserted, /INC.DPH will inhibit the Q counter from being incremented, thereby inserting an extra cycle into the execution stream
NX.MX2 — END directives all have the NX.MX2 microcode signal high. NX.MX2 high therefore marks the end of an opcode sequence and will cause the Q counter to reset to 0. It is executed along with a fetch-opcode operation at the end of opcode microcode sequences.
/BRANCH.EXIT — will force a fetch-opcode in the next cycle to end execution of the current branch opcode.

The Q Register is clocked by the internal PHI11 clock (i.e. on the fall of PHI2).

BRANCH.OP Detector

Branch opcodes are of the form “xxx1 0000”. The BRANCH.OP Detector (1.B12) asserts BRANCH.OP if the opcode matches the mask and the Q state register is 0. This indicates the fetch-operand cycle of a branch opcode is in progress.
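As an illustration only, the mask test can be modelled in a few lines of Python. The function name and the use of a plain integer for Q are assumptions made for this sketch; they do not come from the schematic.

```python
def branch_op(opcode: int, q: int) -> bool:
    """Model of the BRANCH.OP detector: opcode matches xxx1 0000 and Q is 0."""
    matches_mask = (opcode & 0x1F) == 0x10  # low five bits must be 1 0000
    return matches_mask and q == 0


# The eight conditional branches: BPL, BMI, BVC, BVS, BCC, BCS, BNE, BEQ.
detected = [op for op in range(256) if branch_op(op, q=0)]
print([f"${op:02X}" for op in detected])
```

The 65C02 BRA ($80) falls outside this mask. It is always taken, so there is no branch test that could fail.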

Branch Not Taken

BNT (1.C12) is asserted if a branch condition under test fails. The logic equation for BNT is:

BNT = (BRANCH.OP + EXIT.BTF) * /BTR

Branch Test Result (BTR) is low if the branch condition is NOT met. BNT will be asserted if BTR is low and either a BRANCH.OP is detected or the EXIT.BTF microcode directive is active (EXIT.BTF, “Exit-If-Branch-Test-False”, is used by the BBR and BBS 65C02 instructions. See NX.MX Decoder).

BRANCH.EXIT

The BRANCH.EXIT multiplexer (1.D12) implements the following logic equation:

/BRANCH.EXIT = BNT + [EXIT.CC * /(C.OUT ^ B7)]

The BNT control signal indicates a failed branch condition on its own. The EXIT.CC, on the other hand, is a microcode directive used to check for a page-crossing when applying a branch offset to the low-byte of PC, as follows:

PCL := PCL + B; EXIT.CC

The branch microcode will adjust the high-byte of PC if allowed to execute. EXIT.CC will trigger an EXIT to end the execution of the opcode if a page boundary is NOT crossed. Since branch offsets are signed quantities, the page-crossing is detected by comparing the ALU’s output carry (C.OUT) with the sign bit of the branch offset (B7). If they are equal (i.e. if C.OUT XOR B7 generates a 0), then a page was not crossed and the /BRANCH.EXIT signal is asserted.
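The two equations can be checked with a small Python model. This is a sketch under stated assumptions: signals are modelled as active-high booleans (so branch_exit returns True when /BRANCH.EXIT is asserted), and the helper names same_page and same_page_reference are invented here for illustration. The loop compares the carry-versus-sign test against plain signed arithmetic for every combination of PCL and offset.

```python
def bnt(branch_op: bool, exit_btf: bool, btr: bool) -> bool:
    """BNT = (BRANCH.OP + EXIT.BTF) * /BTR"""
    return (branch_op or exit_btf) and not btr


def branch_exit(bnt_sig: bool, exit_cc: bool, c_out: int, b7: int) -> bool:
    """True when /BRANCH.EXIT is asserted: BNT + [EXIT.CC * /(C.OUT ^ B7)]"""
    return bnt_sig or (exit_cc and (c_out ^ b7) == 0)


def same_page(pcl: int, offset: int) -> bool:
    """Hardware-style test: add the raw offset byte to PCL, compare carry with sign."""
    c_out = ((pcl + offset) >> 8) & 1
    b7 = (offset >> 7) & 1
    return branch_exit(False, True, c_out, b7)


def same_page_reference(pcl: int, offset: int) -> bool:
    """Arithmetic reference: does PCL plus the signed offset stay within 0..255?"""
    signed = offset - 256 if offset & 0x80 else offset
    return 0 <= pcl + signed <= 0xFF


mismatches = [(p, o) for p in range(256) for o in range(256)
              if same_page(p, o) != same_page_reference(p, o)]
print(len(mismatches))  # 0
```

For a positive offset a carry means the page was crossed forwards; for a negative offset the absence of a carry means it was crossed backwards. In both cases "carry equals sign" means the target stays within the current page.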

The /BRANCH.EXIT control signal in turn generates a fetch-opcode in the next cycle by driving the following logic:

Set the FETCH.OP flip-flop to generate a fetch-opcode instruction in the next cycle
Set the MIR.SW flip-flop to switch to the MIR2 in the next cycle (rather than attempting to transform the MIR, the fetch-opcode is more easily synthesized in the MIR2)

The cycles immediately following the /BRANCH.EXIT therefore work as follows:

The MIR.SW signal and the FETCH.OP signal are both asserted — switch to the MIR2 and execute a fetch-opcode, FETCH.OP sets the MIR.SW flip-flop for the next cycle;
MIR.SW is once again asserted, but FETCH.OP is not — we switch to the MIR2 and execute a fetch-operand;
MIR.SW is no longer asserted — we switch back to the MIR and normal operation resumes
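The three cycles above can be traced with a toy model of the two flip-flops. This is a sketch only: the string labels are illustrative, and it assumes FETCH.OP stays asserted for exactly one cycle.

```python
def cycles_after_branch_exit(count: int = 3) -> list[tuple[str, str]]:
    """Trace which register drives the datapath in the cycles after /BRANCH.EXIT."""
    # State as latched at the end of the cycle in which /BRANCH.EXIT fired.
    mir_sw, fetch_op = True, True
    trace = []
    for _ in range(count):
        source = "MIR2" if mir_sw else "MIR"
        if fetch_op:
            action = "fetch-opcode"
        elif mir_sw:
            action = "fetch-operand"
        else:
            action = "normal"
        trace.append((source, action))
        # FETCH.OP sets MIR.SW for the following cycle (assumed one-cycle pulse).
        mir_sw, fetch_op = fetch_op, False
    return trace


for source, action in cycles_after_branch_exit():
    print(source, action)
```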

MIR.SW

In some situations when microinstructions need to be synthesized dynamically, the MIR contains values which are not easily manipulated. It’s easier then to load the required microinstruction into the MIR2, and switch operation to it. The MIR.SW flip-flop (1.B15) is used to enable the MIR2 for the following cycle, as follows:

for a /BRANCH.EXIT, in order to generate a fetch-opcode microinstruction in the next cycle;
for a FETCH.OP, in order to generate a fetch-operand microinstruction in the next cycle;
for /INC.DPH, when a page is crossed during address calculation, in order to generate an increment-high-byte microinstruction in the next cycle.

1BOP Detector

The 1BOP Detector (1.C14) detects one-byte opcodes. These need to be trapped by the pipeline in order to inhibit the normal increment-PC in the fetch-operand microinstruction. The byte fetched by fetch-operand is discarded and then re-read as an opcode by a subsequent fetch-opcode microinstruction.

NMOS 6502 one-byte opcodes are generally of the form “xxxx 10x0”. The exceptions are RTS ($60), RTI ($40) and KIL ($x2), which do not need to inhibit incrementing PC. This logic therefore just implements the bit-mask without bothering to handle the exceptions.

For the CMOS instruction sets, the logic equation (1.D16) is as follows:

CMOS1BOP = CMOS * “xxxx 1011” * (“110x xxxx” + ALT)

All $xB opcodes on the 65816 are single-byte opcodes, and the form “xxxx 1011” traps them. For the 65C02, on the other hand, $xB opcodes other than STP ($DB) and WAI ($CB) must not have +PC inhibited. The form “110x xxxx” is used to trap STP and WAI for the 65C02 instruction set (i.e., when the ALT configuration option is off).
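The two masks can be tried out in Python. This is a minimal sketch: the function names are invented for illustration, CMOS and ALT are modelled as booleans, and how the two terms are combined into the final 1BOP signal is not modelled here.

```python
def mask_1bop(opcode: int) -> bool:
    """Opcode matches xxxx 10x0, i.e. $x8 or $xA."""
    return (opcode & 0x0D) == 0x08


def cmos_1bop(opcode: int, cmos: bool, alt: bool) -> bool:
    """CMOS1BOP = CMOS * "xxxx 1011" * ("110x xxxx" + ALT)"""
    is_xb = (opcode & 0x0F) == 0x0B       # low nibble is $B
    is_110x = (opcode & 0xE0) == 0xC0     # top three bits are 110
    return cmos and is_xb and (is_110x or alt)


# 65C02 (CMOS on, ALT off): only the two $xB opcodes with top bits 110 are trapped.
print([f"${op:02X}" for op in range(256) if cmos_1bop(op, cmos=True, alt=False)])
# ['$CB', '$DB']
```

With ALT on, the same function traps all sixteen $xB opcodes, matching the 65816 case described above.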

NOP1 Detector

The NOP1 Detector (1.D14) captures all single-cycle NOPs. These opcodes are of the form $xB or $x3 for the 65C02 instruction set (CMOS * /ALT), except that once again STP and WAI must be excluded. Rather than trapping those two opcodes specifically, we use the CMOS1BOP signal above to exclude all CMOS one-byte opcodes, with STP and WAI among them. The logic equation implemented in the NOP1 DETECT multiplexer is as follows:

NOP1 = “xxxx x011” * CMOS * /ALT * /CMOS1BOP
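A quick enumeration shows which opcodes this equation selects. The sketch below is illustrative only; the function name is an assumption, and the CMOS1BOP term is recomputed inline so that the example stands alone.

```python
def nop1(opcode: int, cmos: bool, alt: bool) -> bool:
    """NOP1 = "xxxx x011" * CMOS * /ALT * /CMOS1BOP"""
    is_x3_or_xb = (opcode & 0x07) == 0x03
    cmos1bop = (cmos and (opcode & 0x0F) == 0x0B
                and ((opcode & 0xE0) == 0xC0 or alt))
    return is_x3_or_xb and cmos and not alt and not cmos1bop


nops = [op for op in range(256) if nop1(op, cmos=True, alt=False)]
print(len(nops))  # 30: sixteen $x3 plus sixteen $xB, less WAI ($CB) and STP ($DB)
```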

INC.DPH

The INC.DPH (1.C16) control signal is asserted if the ALU’s output carry C.OUT is high and the microcode INCDPH.C directive is active. This condition indicates that a page-crossing has occurred, and the high-byte of the calculated address must be adjusted. The CPU will insert an extra cycle to execute the increment-high-byte microinstruction (DPH := DPH + 1).

To do so, the INC.DPH signal drives the following logic:

The Q State Register to stall the Q counter and generate an extra cycle
The MIR2 to load the increment-high-byte microinstruction for the next cycle
The MIR.SW flip-flop to switch to the MIR2 in the following cycle

The transformation of the default “all-zeroes” fetch-operand microinstruction into the increment-high-byte microinstruction (DPH := DPH + 1) is accomplished by enabling the appropriate control values in the MIR2, as follows:

R.MX = 1 selects DPH as the source register for ALU input A
WR.MX = 1 selects DPH as the target register for the write-back
INC.MX = 2 (INC.MX1) selects NONE as the operation for the INC16 incrementer
ALU.EN = 1 enables the ALU
LUOP = 0 configures the ALU to add 0 to the A input
C.MX = 1 sets the ALU input carry to 1
A.MX = 1 selects the R-Bus as the value for the A input of the ALU
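The net effect on an indexed address can be sketched in Python. This models only the conditional case described here (an extra cycle when C.OUT is high); the function name and return shape are assumptions made for illustration.

```python
def indexed_address(base: int, index: int) -> tuple[int, int]:
    """Return (effective address, extra cycles) for a 16-bit base plus an 8-bit index."""
    low_sum = (base & 0xFF) + (index & 0xFF)
    c_out = low_sum >> 8              # ALU output carry from the low-byte add
    dph = (base >> 8) & 0xFF
    extra_cycles = 0
    if c_out:                         # INCDPH.C active and C.OUT high: INC.DPH fires
        dph = (dph + 0 + 1) & 0xFF    # A input + 0 with carry-in 1 (LUOP = 0, C.MX = 1)
        extra_cycles = 1
    return (dph << 8) | (low_sum & 0xFF), extra_cycles


print(indexed_address(0x1200, 0x20))  # (4640, 0), i.e. $1220 with no extra cycle
print(indexed_address(0x12F0, 0x20))  # (4880, 1), i.e. $1310 with one extra cycle
```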

FETCH.OP

The FETCH.OP (1.E12) control signal is generated in order to end the current opcode early. This is done in two situations:

in the current cycle by the NOP1 Detector, and
in the next cycle by the BRANCH.EXIT control signal.

The FETCH.OP control signal is used to transform an “all zeroes” control word in the MIR into a fetch-opcode operation on-the-fly using a couple of OR-gates on the MIR outputs (Card A 1.B3 and Card B 1.J3). In the case of NOP1, we use the fetch-operand present in the MIR already, and in the case of a BRANCH.EXIT, the MIR2 is used instead. Either way, the transformation involves enabling two microcode control signals, as follows:

IR.LD = 1 loads the IR with the byte being read from memory
NX.MX2 = 1 enables the END directive which resets the Q counter to 0

These transformations have the effect of changing the default fetch-operand microinstruction:

B := DPH := *PC; PC += 1

into the fetch-opcode microinstruction:

IR := *PC; PC += 1; END
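The OR-gate trick can be pictured as forcing two bits of the control word high. In the sketch below the bit positions of IR.LD and NX.MX2 are hypothetical, chosen only for illustration; the real control-word layout is given in C74-6502 Decoder Values.

```python
# Hypothetical bit positions, chosen only for this sketch.
IR_LD = 1 << 0
NX_MX2 = 1 << 1

FETCH_OPERAND = 0  # the "all-zeroes" default control word


def mir_outputs(control_word: int, fetch_op: bool) -> int:
    """OR the FETCH.OP signal into the IR.LD and NX.MX2 lines of the MIR outputs."""
    forced = (IR_LD | NX_MX2) if fetch_op else 0
    return control_word | forced


fetch_opcode = mir_outputs(FETCH_OPERAND, fetch_op=True)
print(bool(fetch_opcode & IR_LD), bool(fetch_opcode & NX_MX2))  # True True
```

Because the default word is all zeroes, no bits need to be cleared; setting the two signals is enough to turn one microinstruction into the other.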

Inhibit INC16

The INH.INC control signal will inhibit the INC16 address incrementer. The INH.INC circuit (1.F15) asserts this signal when:

the INT.ON signal (1.F12) is asserted, which happens during the first SYNC cycle after an interrupt is first detected — this is in effect a fetch-opcode cycle;
the /INT.INP signal (1.F13) is asserted and Q = 0, which is the first cycle of the interrupt service sequence — this is a fetch-operand cycle; or
the 1BOP signal (1.C14) is asserted, which happens during the fetch-operand cycle of one-byte opcodes — this is a fetch-operand cycle.
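Read as a logic equation, the three conditions are simply OR-ed together. A minimal Python sketch, with all signals modelled as active-high booleans and parameter names invented for illustration:

```python
def inh_inc(int_on: bool, int_inp: bool, q: int, one_bop: bool) -> bool:
    """INH.INC = INT.ON + (INT.INP * (Q == 0)) + 1BOP"""
    return int_on or (int_inp and q == 0) or one_bop


# Fetch-operand cycle of a one-byte opcode: PC is held.
print(inh_inc(int_on=False, int_inp=False, q=0, one_bop=True))   # True
# Later cycle of an interrupt sequence: PC increments are not inhibited.
print(inh_inc(int_on=False, int_inp=True, q=3, one_bop=False))   # False
```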

Interrupt Handling

The interrupt sequence begins with the detection circuits for the RESET (1.G15), NMI (1.H15) or IRQ (1.J15) interrupts. These circuits set the interrupt-pending control flags, the /RES.PND, /NMI.PND and /IRQ.PND flags respectively.

The Interrupt Detect circuit (1.F12) will assert the INT.ON control signal if any of the three interrupt-pending signals is active when the next SYNC cycle is executed — effectively the next fetch-opcode cycle.

When INT.ON is asserted, the fetch-opcode cycle is modified as follows:

The /INT.ON signal is triggered by SF.LATCH, which clears the IR (loading BRK as the current opcode), and sets the INT.INP flag (INT.LATCHB 1.F13) to indicate an interrupt is in progress
INH.INC will prevent PC from being incremented

The immediately following fetch-operand cycle also asserts /INH.INC, since the INT.INP flag is on and Q = 0 (the prior fetch-opcode will have reset Q). It is therefore a dummy cycle.

The BRK microcode then executes as normal, with the following provisions:

writes to memory are inhibited if the /RES.PND flag is asserted,
interrupt vectors are fetched from memory based on the interrupt-pending flags in effect (see Constant Generators),
the final microinstruction in the sequence is a modified fetch-opcode, which includes an END.INT directive.

The END.INT microcode directive, part of the NX.MX Decoder, triggers the /INT.CLR (1.A13) control signal to end the interrupt sequence. /INT.CLR will clear any active interrupt-pending flags and the /INT.INP flag.

RESET Interrupt

The Reset Interrupt Detector (1.G15) samples the /RES CPU pin every cycle, on the fall of PHI2. The /RES.L signal tracks /RES at cycle boundaries, and when low will keep the R/W pin high (Card A, 1.D1), effectively disabling writes to memory for any cycle where /RES is low.

The RESET sequence is not triggered until /RES goes high again. At that point, it is latched by the RES.EDGEB flip-flop and takes /RES.PND low to indicate a RESET interrupt is pending. /RES.PND also inhibits writes to memory when asserted. It is cleared by /INT.CLR once the reset sequence is complete.

NMI Interrupt

The NMI Interrupt Detector (1.H15) samples the /NMI CPU pin every cycle, on the fall of PHI2. The /NMI pin is edge-sensitive. It clocks the NMI.EDGEB flip-flop when it goes low, and takes /NMI.PND low to indicate an NMI Interrupt is pending.

/NMI.PND is cleared by /INT.CLR once the interrupt sequence is complete, or by /RES.PND when a reset interrupt is detected. This gives RESET priority over an NMI interrupt.

IRQ Interrupt

The IRQ Interrupt Detector (1.J15) samples the /IRQ CPU pin every cycle, on the fall of PHI2. The /IRQ pin is level-sensitive. It is latched by the INT.LATCHA flip-flop, and, provided the I Flag is low, /IRQ.PND will go low to indicate an IRQ Interrupt is pending.

/IRQ.PND is cleared by /RES.PND to allow a RESET to take priority. /IRQB is a buffered version of the /IRQ pin, which is used to release the CPU from a WAI State.
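The resulting priority among the three pending flags can be summarised in a short Python sketch. The flags are modelled as active-high booleans for readability (the hardware signals are active-low), the function name is invented, and the vector addresses and the NMI-over-IRQ ordering are standard 6502 behaviour rather than details taken from this schematic.

```python
RESET_VECTOR = 0xFFFC
NMI_VECTOR = 0xFFFA
IRQ_VECTOR = 0xFFFE  # shared by IRQ and the BRK instruction


def vector_for(res_pnd: bool, nmi_pnd: bool, irq_pnd: bool) -> int:
    """Pick the vector address, with RESET first, then NMI, then IRQ/BRK."""
    if res_pnd:
        return RESET_VECTOR
    if nmi_pnd:
        return NMI_VECTOR
    return IRQ_VECTOR  # irq_pnd set, or a software BRK when nothing is pending


print(hex(vector_for(res_pnd=True, nmi_pnd=True, irq_pnd=True)))    # 0xfffc
print(hex(vector_for(res_pnd=False, nmi_pnd=True, irq_pnd=True)))   # 0xfffa
```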

Special Pins and Circuits

Page 2 of the Card B – ALU & CU Schematic contains a number of circuits to support special CPU pins and functions.

Card B – ALU & CU Schematic, Page 2 (click to enlarge)

Inter-Card Connectors

Single-row intercard connectors (2.B2) are placed along the top, left and right edges of each CPU Card, and the multi-card CPU assembly is sandwiched together into a single unit. Signals on these connectors are arranged in a regular pattern (GND, SIG, SIG, GND) such that every signal has either a GND (or a bypassed VCC) pin next to it. This makes for low-impedance paths and a compact assembly to support high clock rates. A 4.7uF capacitor is placed close to the VCC intercard pins on each card.

Local Clock Buffers

Each CPU card generates its own PHI1, PHI2 and PHI11 clocks based on the PHI1.X and PHI11.X cross-card signals. All use a single gate delay. The buffers reduce load on individual clock signals and help minimize skew across the various cards.

ML Pin Detector

The 65C02 /ML Pin Detector (2.E2) circuit keeps /ML low for three cycles, beginning with the Read Cycle of Read-Modify-Write instructions. The ML microcode directive (see WR.MX Decoder on Card A) asserts the ML.W signal which triggers the circuit. ML.LATCH then keeps /ML low for the following two cycles.

SO Pin Detector

The /SO pin on the 6502 sets the V-Flag when asserted. It is sampled mid-cycle by SO.LATCHA (2.F2), on the rise of PHI2. Several gates are used to provide a low-going pulse to the /V.PRE signal. It connects to the /PRE input of the V Flag internal flip-flop to set the V Flag asynchronously.

SPI Interface Select

The SPI Interface Select Multiplexer (2.H2) is part of the SPI interface which resides on the optional K24 Card. The multiplexer is used to set IC.W (the Internal Carry) and ROL.C (the carry used in shift operations).

In normal operation, both will reflect standard functions for the Carry:

IC.W is set to the ALU output carry, C.OUT
ROL.C is set to the C Flag from the P Register

When the SPI interface is active (/SPI.TGL is asserted), they are used to transfer data into the A register through the Internal Carry:

IC.W is set to MISO
ROL.C is set to IC (the internal carry)

CFG Configuration Options

The C74-6502 has the following configuration options (2.H8):

CFG.2-CYCLE.BCD — inserts a wait-state as the final cycle of BCD operations. See ALU Control Circuitry.
CFG.INT-CLD — clears the D Flag during interrupt service logic. See SF.MX Decoder.
CFG.BCD-FLAGS-VALID — will set the flags to a valid state after BCD operations. See Special Flags Handling.

These options are set to a default state if the optional K24 Card is not present, such that all three flags are enabled for CMOS instruction sets and disabled otherwise.

Special Registers: Bitwise Constant Generator

The Bitwise Constant Generator (2.B10) is used by the 65C02 RMB and SMB instructions. It uses bits [4..6] of the current Opcode in the IR to drive high a specific bit in the R Bus, which is then used as needed by the microcode.
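In software terms the generator is a 3-to-8 decode of opcode bits 4 to 6. A minimal Python sketch, with an invented function name:

```python
def bit_constant(opcode: int) -> int:
    """Return the one-hot byte selected by bits 4..6 of the opcode."""
    bit_number = (opcode >> 4) & 0x07
    return 1 << bit_number


print(f"{bit_constant(0x37):08b}")  # RMB3 -> 00001000
print(f"{bit_constant(0xF7):08b}")  # SMB7 -> 10000000
```

Bit 7 of the opcode, which distinguishes RMB from SMB, plays no part in selecting the constant.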

Special Registers: A & X

The A&X Register (2.C10) is used by the AXS Undocumented Opcode. It generates the logical AND of the A and X Registers onto the R Bus. Shadow registers are used here for both the A and X registers to avoid complicating the signal path for the primary A and X Registers on Card A. The /A&X.R control signal output-enables this special register onto the R Bus.

Special Registers: DPH+1

The DPH+1 Register (2.F10) calculates DPH+1 when needed within the same cycle. The circuit uses a DPH shadow register and dedicated adders. The result is ready on the B Bus for use by the ALU directly. It is used by the so-called “& H + 1” Unstable Opcodes, where “H” refers to the high-byte of the effective address being used in the particular instruction.

The “& H + 1” opcodes perform a logical operation and write the result to memory in the same cycle. To make this possible, the /DPH+1 control signal output-enables the DPH+1 Special Register onto the B Bus as input to the ALU, and at the same time, the “Y->DB” buffer (2.H5) is enabled to provide a direct path from the output of the ALU’s Logical Unit to the Data Bus.