Microcode

This section describes the rationale for using microcode, its weaknesses, some terminology, and finally the various microcode fields of Sentinel. The default microcode file is also included for study.

If you want the short version of the rationale, so that you can skip ahead to the terminology:

Sentinel is a horizontally-microcoded CPU design because I wanted to create a conforming RV32I_Zicsr CPU with M-Mode that fits into ~1000 ICESTORM_LCs[1] on an iCE40 FPGA. It wasn’t, and still isn’t, clear to me that I could hit these goals without using some of the FPGA Block RAM for a microcode instead of LUTs for a hardwired control unit.

Some References

CPUs implementing a Reduced Instruction Set, like RISC-V, are best optimized for speed by using dedicated circuitry to control the internal components. Today, microcode is generally relegated to special cases. However, both hardwired and microcoded control fulfill the same general purpose: “drive 0s and 1s to a CPU’s various components to move and manipulate data”.

I designed a hardwired CPU for a class many years ago; it took me a while to wrap my head around how microcode works and find the right people to point me in the right direction. I found the following resources useful:

  • Over the years, the Wikipedia article has gotten a lot better.

  • The ZPU Stack Machine CPU has a microcode implementation to study.

  • In my opinion, the gold standard for microcode design is Bit-slice microprocessor design by John Mick and Jim Brick.

    The book was written with the Am2900 series in mind, which are long out-of-production parts. However, the Am2900 series were building blocks designed for microcode. As the book teaches you to make a custom CPU datapath out of Am2900 parts, you by extension learn how to design a microcode.

Hardwired and Microcoded Rationale

Back in the 80s, microcode allowed for flexibility of implementation and quick iteration. If there was a bug or a design change, you might only have to swap the microcode out of EPROMs or RAM, and the rest of your Am2900 building blocks/glue logic would remain untouched.

Today, FPGAs serve much the same purpose, and can swap out an entire CPU- not just microcode- in seconds. In most cases, designing a hardwired control unit on an FPGA will have acceptable iteration time and flexibility thanks to describing circuits in code. A hardwired implementation also makes it easier to implement speed optimizations like tight pipelining, which are difficult to reason about in microcode[2]. Thanks to changes in design process, there is probably no reason to do a performance-oriented microcode RISC-V implementation.

When optimizing for size, you still should probably hardwire your RISC-V core. There are several perfectly usable hardwired RISC-V CPUs in < 1000 LCs, such as:

I could probably design a minimal hardwired Sentinel that doesn’t have that much more circuitry than the current microcoded version. However, I set a goal of a complete RISC-V implementation including M-Mode in ~1000 LCs. It wasn’t clear to me- and still isn’t- that a non-microcode RISC-V implementation could fit in 1000 LCs without making concessions that I didn’t want to[3], such as:

  • Limit datapath width.

  • Remove IRQs and M-Mode.

  • Do not handle illegal instructions.

Even if I did a hardwired control unit, I knew that I was going to be multicycle and not tightly pipelined. My own experience is that a basic pipelined 32-bit RISC CPU takes at least 2000 ICE40 LCs[4] thanks to pipeline control logic. At that point, for any design meeting my requirements, I figured the speed of a microcoded and a hardwired RISC-V without pipeline control would be similar.

Around the same time, in late 2020[5], I found out about Mick and Brick. I found the book fascinating (and still do), so I was already looking for an excuse to write a microcode for fun. Additionally, I realized I could put a microcode to good use by leveraging FPGA block RAM to hold the microcode program. This gave me more precious LUT breathing room that would otherwise be used by a hardwired implementation. Since I already wasn’t expecting my implementation to be fast, all of a sudden I found the potential space savings of a microcode very appealing.

Via creative uses of block RAM and microcoding, Sentinel itself reached my goal of ~1000 LCs while implementing RV32I_Zicsr and M-Mode. However, it’s still hit or miss whether a full SoC fits into my target ICE40HX1K FPGA (1280 LCs max), depending on yosys optimizations and Amaranth changes. We’ll see what the future holds.

Additionally, while not every application needs a super fast CPU, there is plenty of room for Sentinel and other small RISC-V implementations to coexist! Users need to decide for themselves if Sentinel fits their needs.

Terminology

Mick and Brick introduces some jargon that I use in Sentinel:

Note

This list is probably incomplete.

Condition Code Multiplexer

A multiplexer of various conditional tests. The output of this multiplexer, selected by the microcode, becomes a test input to the Sequencer. The conditional test result can often be inverted by microcode to double the number of possible tests.

Test conditions used by Sentinel include:

  • Is the ALU output zero/nonzero?

  • Is a memory access complete/incomplete?

  • Did an exception occur/not occur?

  • Unconditionally true/false.
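The selection-plus-inversion idea can be sketched in a few lines of Python. This is an illustrative behavioral model, not Sentinel's Amaranth source: the function name `condition_mux` and its keyword arguments are hypothetical, while the selector names and values follow the CondTest field documented later on this page.

```python
from enum import IntEnum


class CondTest(IntEnum):
    """Test selectors, mirroring the CondTest microcode field."""
    EXCEPTION = 0
    CMP_ALU_O_ZERO = 1
    MEM_VALID = 2
    TRUE = 3


def condition_mux(sel, invert, *, exception, alu_o, mem_valid):
    """Return the single test bit handed to the sequencer (hypothetical helper)."""
    raw = {
        CondTest.EXCEPTION: bool(exception),
        CondTest.CMP_ALU_O_ZERO: alu_o == 0,
        CondTest.MEM_VALID: bool(mem_valid),
        CondTest.TRUE: True,
    }[CondTest(sel)]
    # One invert bit turns four tests into eight.
    return raw != bool(invert)
```

With `invert` set, `CMP_ALU_O_ZERO` becomes an "ALU output is nonzero" test and `TRUE` becomes a test that always fails.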

Macroinstruction

A unit of execution from the CPU’s instruction set, composed of microinstructions. In Sentinel’s case, macroinstructions are RISC-V instructions.

Mapping (P)ROM

A (P)ROM which maps the macroarchitecture opcode into microprogram jump targets. It is the hardware version of a jump table, where the jump index is retrieved from the macroinstruction opcode.

Each macroinstruction is a loop through the microprogram. The mapping (P)ROM jumps from microcode common to all macroinstructions to microcode specific to each (group of) macroinstruction(s).

In Sentinel, the Mapping “(P)ROM” is implemented in combinational logic.
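As a rough illustration of the jump-table idea, the sketch below maps two RISC-V major opcodes to microprogram addresses. The addresses are inferred from the `origin` directives and label order in the default microcode listing (loads starting at 8, `misc_mem` at 0x30); the function name `mapping_rom` and the exact decode logic are assumptions, not Sentinel's actual decoder.

```python
LOAD = 0b0000011      # RISC-V major opcode for LB/LH/LW/LBU/LHU
MISC_MEM = 0b0001111  # RISC-V major opcode for FENCE


def mapping_rom(insn):
    """Return a microprogram entry point for a 32-bit instruction word.

    Hypothetical sketch: only two major opcodes are handled.
    """
    opcode = insn & 0x7F
    funct3 = (insn >> 12) & 0x7
    if opcode == LOAD:
        # lb_1, lh_1, lw_1, (unused), lbu_1, lhu_1 sit at 8..13,
        # so the entry point is 8 + funct3.
        return 0x08 + funct3
    if opcode == MISC_MEM:
        return 0x30
    return None  # remaining opcodes omitted from this sketch
```

For example, `lw x5, 0(x6)` encodes as `0x00032283` and would map to address 10, the `lw_1` label.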

Microinstruction

A microprogram/microcode instruction. Macroinstructions are composed of multiple microinstructions. Each microinstruction takes one clock cycle.

Microprogram Counter

Register whose value is the address of the microcode instruction which will execute on the next clock cycle, assuming the sequencer chooses to use it.

Pipeline Register

In microcode, the pipeline register specifically refers to a holding register containing the bits of the currently-executing microinstruction.

In Sentinel, the pipeline register is part of the synchronous read port of the Block RAM holding the microprogram. The address input to the microprogram Block RAM is the output of the sequencer; the data for the microinstruction at this address appears on the read port on the next clock cycle.

Sequencer

Component which supplies the address of the microinstruction which will be output on the next clock cycle. It chooses between various sources based on a test condition provided by the Condition Code Multiplexer.

Sources used by Sentinel include:

  • The microprogram counter.

  • An address constant in the microcode instruction.

  • A mapping PROM.

  • An implied constant 0.
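The source selection can be modeled as a small function. This is a behavioral sketch under stated assumptions: `upc` is taken to be the address of the current microinstruction, `mapped` stands in for the mapping PROM output, and the `MAP` case follows the comment in microcode.asm ("use address supplied by decoder if test fails, otherwise unconditional direct"). The function name `next_upc` is hypothetical.

```python
from enum import IntEnum


class JmpType(IntEnum):
    CONT = 0
    MAP = 1
    DIRECT = 2
    DIRECT_ZERO = 3


def next_upc(jmp_type, test, upc, target, mapped):
    """Pick the address of the next microinstruction (illustrative model)."""
    seq = (upc + 1) & 0xFF  # the microprogram has 256 entries
    if jmp_type == JmpType.CONT:
        return seq
    if jmp_type == JmpType.MAP:
        return target if test else mapped
    if jmp_type == JmpType.DIRECT:
        return target if test else seq
    if jmp_type == JmpType.DIRECT_ZERO:
        return target if test else 0
    raise ValueError(f"unknown jump type: {jmp_type}")
```

A wait loop such as `wait_for_ack` is then just `DIRECT` with `target` equal to the current address and a test that stays true until the memory access completes.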

Microcode Fields

Microcode field classes for main microcode file.

This file is used to avoid circular imports and to serve as a single source of truth for the meaning of microcode fields. Each variable defined in this module corresponds to an m5meta field in the default microcode file.

Microcode field order is determined by the microcode assembly file; the order of fields in this module does not matter. However, for consistency, we try to match the microcode.asm order.

The default/main microcode file is stored with the Sentinel package in the same directory as this file, at microcode.asm.
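As a sanity check on how the fields add up, the sketch below tallies the width of every field. It assumes that an enum field occupies the minimum number of bits needed for its distinct values; that is an assumption about m5meta, not something this page states. Field names and variant counts are taken from the definitions on this page, and the total agrees with the `width 48` declared in microcode.asm.

```python
# Fields with an explicit width (unsigned(N) or bool).
FIXED_WIDTHS = {
    "target": 8, "invert_test": 1, "latch_a": 1, "latch_b": 1,
    "reg_read": 1, "reg_write": 1, "mem_req": 1, "latch_adr": 1,
    "latch_data": 1, "write_mem": 1, "insn_fetch": 1,
}

# Enum fields and their number of distinct values
# (jmp_type: cont and nop share the value 0).
ENUM_VALUES = {
    "jmp_type": 4, "cond_test": 4, "pc_action": 3, "a_src": 6,
    "b_src": 8, "alu_op": 9, "alu_i_mod": 2, "alu_o_mod": 3,
    "reg_r_sel": 2, "reg_w_sel": 2, "csr_op": 3, "csr_sel": 2,
    "mem_sel": 4, "mem_extend": 2, "except_ctl": 7,
}


def enum_width(num_values):
    """Minimum bits needed to encode values 0..num_values-1 (assumed rule)."""
    return max(1, (num_values - 1).bit_length())


def microword_width():
    enum_bits = sum(enum_width(n) for n in ENUM_VALUES.values())
    return sum(FIXED_WIDTHS.values()) + enum_bits
```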

sentinel_cpu.ucodefields.Target = unsigned(8)

Jump target supplied by the currently-executing microinstruction. Occasionally used to supply a constant value, like in CSRSel.

class sentinel_cpu.ucodefields.JmpType(*values)

Type of jump to perform for this microinstruction.

CONT = 0

On the next cycle go to the next sequential microinstruction (upc + 1).

Type:

int

NOP = 0

An alias for CONT meant to indicate that the target field is being used for something else.

Type:

int

MAP = 1

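If the condition is met, jump to the address supplied by Target. Otherwise, jump to the address supplied by the MappingROM. This is generally used to jump to code specific to each macroinstruction, or to start exception handling on an invalid instruction (with CondTest set to EXCEPTION and Target pointing at the exception handler, as the check_int microinstruction in the default microcode does).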

Type:

int

DIRECT = 2

If condition is met, jump to the address supplied by Target. Otherwise, go to the next sequential microinstruction, as in CONT.

Type:

int

DIRECT_ZERO = 3

If condition is met, jump to the address supplied by Target. Otherwise, go to the upc address 0.

Type:

int

class sentinel_cpu.ucodefields.OpType(*values)

ALU operation to perform this cycle.

On the next active edge, the ALU output (O) will be equal to the result of the operation performed using its A and B inputs.

ADD = 0

O <= A + B

Type:

int

SUB = 1

O <= A - B

Type:

int

AND = 2

O <= A & B

Type:

int

OR = 3

O <= A | B

Type:

int

XOR = 4

O <= A ^ B

Type:

int

SLL = 5

O <= A << 1

Type:

int

SRL = 6

O <= unsigned(A) >> 1

Type:

int

SRA = 7

O <= signed(A) >> 1

Type:

int

CMP_LTU = 8

O <= bool(unsigned(A) < unsigned(B))

Type:

int
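The operations above can be summarized by a small behavioral model. It is only a sketch: the function `alu` and the string operation names are illustrative, results are truncated to 32 bits, and the one-cycle register delay on O is not modeled.

```python
MASK = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit value as two's complement."""
    x &= MASK
    return x - (1 << 32) if x & 0x80000000 else x


def alu(op, a, b):
    """Combinational result of one ALU op, truncated to 32 bits."""
    a &= MASK
    b &= MASK
    results = {
        "ADD": a + b,
        "SUB": a - b,
        "AND": a & b,
        "OR": a | b,
        "XOR": a ^ b,
        "SLL": a << 1,              # shifts move one bit position per op
        "SRL": a >> 1,
        "SRA": to_signed(a) >> 1,   # arithmetic shift keeps the sign bit
        "CMP_LTU": int(a < b),
    }
    return results[op] & MASK
```

Note that each shift operation moves the value by a single bit, as the field descriptions state.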

class sentinel_cpu.ucodefields.CondTest(*values)

Conditional test to pass through to the Sequencer.

EXCEPTION = 0

Set if an exception occurred this clock cycle.

When InvertTest is asserted, set if an exception did not occur this clock cycle.

Type:

int

CMP_ALU_O_ZERO = 1

Set if the ALU output is 0 this clock cycle.

When InvertTest is asserted, set if the ALU output is nonzero this clock cycle.

Type:

int

MEM_VALID = 2

Set if the contents of the memory bus are valid this cycle.

When InvertTest is asserted, set if the contents of the memory bus are not valid.

The memory bus is valid when sentinel_cpu.top.Top.bus.ack in Top is asserted.

Type:

int

TRUE = 3

Unconditionally set/asserted. When InvertTest is asserted, the test unconditionally fails.

Type:

int

sentinel_cpu.ucodefields.InvertTest = unsigned(1)

If set, invert the result of the conditional test on the output of CondTest this clock cycle.

class sentinel_cpu.ucodefields.PcAction(*values)

Perform an action on the RISC-V Program Counter this cycle.

HOLD = 0

Do not change the Program Counter; hold the current value.

Type:

int

INC = 1

Increment the Program Counter by 4 bytes (1 32-bit word).

Type:

int

LOAD_ALU_O = 2

Set the Program Counter to the value currently on the ALU output.

Type:

int

sentinel_cpu.ucodefields.LatchA = unsigned(1)

If set, latch the selected ASrcMux input to its output.

sentinel_cpu.ucodefields.LatchB = unsigned(1)

If set, latch the selected BSrcMux input to its output.

class sentinel_cpu.ucodefields.ASrc(*values)

Select the source for the ALU A input.

The ALU A input is provided by the latched output of ASrcMux; this field is qualified by LatchA.

GP = 0

Select general purpose register that was read from the reg file last cycle.

Type:

int

IMM = 1

Select the decoded Immediate from the current instruction.

Type:

int

ALU_O = 2

Feed back the ALU output into the input. Intended to facilitate chaining ALU ops together.

Type:

int

ZERO = 3

Supply the literal constant C(0, 32).

Type:

int

FOUR = 4

Supply the literal constant C(4, 32).

Type:

int

THIRTY_ONE = 5

Supply the literal constant C(31, 32).

Type:

int

class sentinel_cpu.ucodefields.BSrc(*values)

Select the source for the ALU B input.

The ALU B input is provided by the latched output of BSrcMux; this field is qualified by LatchB.

GP = 0

Select General Purpose register that was read from the reg file last cycle.

Type:

int

PC = 1

Select the Program Counter register.

Type:

int

IMM = 2

Select the decoded Immediate from the current instruction.

Type:

int

ONE = 3

Supply the literal constant C(1, 32).

Type:

int

DAT_R = 4

Select the unregistered Wishbone read data bus value. The read data bus is only valid when indicated by MEM_VALID.

Type:

int

CSR_IMM = 5

Some RISC-V CSR instructions have an Immediate field that differs from IMM; select the CSR Immediate field instead.

Type:

int

CSR = 6

Select CSR register that was read from the CSR reg file last cycle.

Type:

int

MCAUSE_LATCH = 7

Select the current value of the MCAUSE latch.

Type:

int

class sentinel_cpu.ucodefields.ALUIMod(*values)

Modify ALU inputs before performing ALU op.

This field modifies the ALU inputs A and B just before they are sent to the ALU. Set this field to a value other than NONE on the same cycle as the ALU op you wish to modify. The ALU output (O), modified or otherwise, will be available on the next active edge.

Modifying the inputs is useful for implementing additional ALU operations, such as signed compare using an unsigned comparator.

NONE = 0

Pass through A and B to the ALU unchanged.

INV_MSB_A_B = 1

Invert the most-significant bit of A and B before performing OP.

class sentinel_cpu.ucodefields.ALUOMod(*values)

Modify the result of the currently-executing ALU op.

This field modifies the raw ALU result just before storing the result in O on the next active edge. In other words, this field must be set on the same cycle as when the ALU op you wish to modify is taking place.

Modifying the output is useful for synthesizing additional ALU operations, such as “compare-greater-than-or-equal” or JALR targets.

NONE = 0

Do not modify O.

Type:

int

INV_LSB_O = 1

Invert the least-significant bit of O.

Type:

int

CLEAR_LSB_O = 2

Clear the least-significant bit of O.

Type:

int
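The way the input and output modifiers synthesize extra operations can be checked with a short model. The combinations below mirror the CMP_LT, CMP_GEU and CMP_GE macros in the default microcode; the Python function names are illustrative, and `jalr_target` is a sketch of how CLEAR_LSB_O could be paired with ADD.

```python
MASK = 0xFFFFFFFF
MSB = 0x80000000


def cmp_ltu(a, b):
    """The only comparator in the ALU: unsigned less-than."""
    return int((a & MASK) < (b & MASK))


def cmp_lt(a, b):
    """Signed less-than: INV_MSB_A_B followed by CMP_LTU."""
    return cmp_ltu(a ^ MSB, b ^ MSB)


def cmp_geu(a, b):
    """Unsigned greater-or-equal: CMP_LTU followed by INV_LSB_O."""
    return cmp_ltu(a, b) ^ 1


def cmp_ge(a, b):
    """Signed greater-or-equal: both modifiers at once."""
    return cmp_lt(a, b) ^ 1


def jalr_target(base, imm):
    """ADD followed by CLEAR_LSB_O, as JALR requires."""
    return ((base + imm) & MASK) & ~1
```

Flipping the most-significant bit of both operands shifts the signed range onto the unsigned range while preserving order, which is why a single unsigned comparator is enough.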

sentinel_cpu.ucodefields.RegRead = unsigned(1)

If set, read from the register file this cycle. The results will be valid and available on the read port on the next active edge. The read value will stay valid on the read port until the subsequent active edge where RegRead is asserted or CSROp is not NONE.

The register file is transparent; a write and read to/from the same address on the same cycle will use the value to-be-written on the read port on the next active edge.

This field has no effect if CSROp is not NONE.

Todo

I need to verify what happens when we RegWrite to the same address with deasserted read-enable. Will it “blow away” the current read port value?
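The transparent read behavior described for RegRead can be modeled as follows. This is a behavioral sketch with hypothetical names (`RegFile`, `tick`). In particular, the case raised in the Todo above (a write with read-enable deasserted) is modeled here as holding the old read-port value, which is an assumption and not a verified property of the hardware.

```python
MASK = 0xFFFFFFFF


class RegFile:
    """32-entry register file with one synchronous, transparent read port."""

    def __init__(self):
        self.mem = [0] * 32
        self.read_port = 0

    def tick(self, *, read_en=False, raddr=0, write_en=False, waddr=0, wdata=0):
        """Advance one clock edge."""
        if read_en:
            if write_en and raddr == waddr:
                # Transparent: the value being written shows up on the read port.
                self.read_port = wdata & MASK
            else:
                self.read_port = self.mem[raddr]
        # Assumption: with read_en deasserted, the read port holds its value.
        if write_en:
            self.mem[waddr] = wdata & MASK
```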

sentinel_cpu.ucodefields.RegWrite = unsigned(1)

If set, write to the register file this cycle. The write will be valid on the next active edge.

This field has no effect if CSROp is not NONE.

class sentinel_cpu.ucodefields.RegRSel(*values)

Select the register to be read from the register file.

This field has no effect if CSROp is not NONE.

INSN_RS1 = 0

Read from the register specified in the rs1 field of the current instruction.

Type:

int

INSN_RS2 = 1

Read from the register specified in the rs2 field of the current instruction.

Type:

int

class sentinel_cpu.ucodefields.RegWSel(*values)

Select the register to be written in the register file.

This field has no effect if CSROp is not NONE.

INSN_RD = 0

Write to the register specified in the rd field of the current instruction.

Type:

int

ZERO = 1

Write x0, the zero register. For space reasons, there is no hardcoded zero register. Microcode initialization must write 0 to the reg file when this option is selected, otherwise Undefined Behavior will result pretty quickly.

Type:

int

class sentinel_cpu.ucodefields.CSROp(*values)

Select operation on CSR file.

NONE = 0

Do a read and/or write to the register file this cycle.

This variant qualifies RegRead, RegWrite, RegRSel, and RegWSel; CSRSel has no effect when this variant is selected.

Type:

int

READ_CSR = 1

Read from the CSR file this cycle. The read will be valid on the next active edge. As in RegRead, reads are transparent.

Type:

int

WRITE_CSR = 2

Write to the CSR file this cycle. The write will be valid on the next active edge.

Type:

int

class sentinel_cpu.ucodefields.CSRSel(*values)

Select register from CSR file to read or write.

This field has no effect if CSROp is NONE.

INSN_CSR = 0

Select the CSR register specified by the compressed CSR address, derived from the current instruction.

Type:

int

TRG_CSR = 1

Select the CSR register specified by Target, using the compressed address encoding.

Type:

int

sentinel_cpu.ucodefields.MemReq = unsigned(1)

If set, set Wishbone CYC_O and STB_O to the asserted state, indicating that a memory transfer is imminent. This signal also qualifies AddressAlign outputs.

As per the Wishbone spec, since Sentinel does not use wait states, tying CYC_O and STB_O to the same signal is sound. See Permission 3.40.

class sentinel_cpu.ucodefields.MemSel(*values)

Select memory transfer type in progress.

This field indirectly controls the Wishbone SEL_O, DAT_I (for reads/loads that are not instruction fetches), and DAT_O lines (writes/stores). See sentinel_cpu.align for more information.

AUTO = 0

Memory access is instruction fetch or none at all- data width and SEL_O is determined automatically.

Type:

int

BYTE = 1

Memory access is 8-bit; only one of bit 0, 1, 2, and 3 of SEL_O is asserted. Read and write data will be shifted appropriately.

Type:

int

HWORD = 2

Memory access is 16-bit; either bits 0 and 1 or 2 and 3 of SEL_O are asserted. Read and write data will be shifted appropriately.

Type:

int

WORD = 3

Memory access is 32-bit; all bits of SEL_O asserted.

Type:

int
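The SEL_O patterns described above can be sketched as a function of the transfer size and the low address bits. The assumption here, not stated on this page, is that byte lane n of SEL_O corresponds to byte offset n within the word (little-endian lane order). The function name `sel_o` is hypothetical, and AUTO is left out because its behavior is determined automatically.

```python
def sel_o(mem_sel, addr):
    """Return the 4-bit SEL_O pattern for an aligned access (sketch)."""
    offset = addr & 0x3
    if mem_sel == "BYTE":
        return 1 << offset                 # exactly one lane
    if mem_sel == "HWORD":
        return 0b0011 << (offset & 0x2)    # lanes 0-1 or lanes 2-3
    if mem_sel == "WORD":
        return 0b1111                      # all four lanes
    raise ValueError(f"unsupported mem_sel: {mem_sel}")
```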

class sentinel_cpu.ucodefields.MemExtend(*values)

Extend read data to WORD width.

Sentinel CPU directly reads the DAT_I Wishbone signal when performing instruction fetches and loads. Fetches are always WORD sized, but loads can be variable-sized. RISC-V specifies that loads less than WORD width should have the unused bits filled/extended with either 0 (unsigned/signed) or 1 (signed).

This field will make sure BYTE and HWORD loads are properly extended before latching data for further use by Sentinel. It has no effect for WORD or AUTO loads.

ZERO = 0

Zero-extend DAT_I to WORD width; bits 8-31 are zero for BYTE loads and bits 16-31 are zero for HWORD loads.

Type:

int

SIGN = 1

Sign-extend DAT_I to WORD width, using bit 7 for BYTE loads and bit 15 for HWORD loads.

Type:

int
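The two extension modes amount to the following. The sketch assumes the loaded byte or halfword has already been shifted down to bit 0; the function name `extend` is hypothetical.

```python
def extend(value, width, signed):
    """Extend a `width`-bit value (8 or 16) to 32 bits."""
    low_mask = (1 << width) - 1
    value &= low_mask
    if signed and value & (1 << (width - 1)):
        # Copy the sign bit (bit 7 or bit 15) into all upper bits.
        value |= 0xFFFFFFFF & ~low_mask
    return value
```

For example, the byte 0x80 becomes 0x00000080 under ZERO and 0xFFFFFF80 under SIGN.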

sentinel_cpu.ucodefields.LatchAdr = unsigned(1)

If set, latch the ALU output into an internal register representing the raw byte address for an upcoming Wishbone memory transaction. This internal register indirectly controls the Wishbone ADR_O and SEL_O lines via AddressAlign. Used for both Wishbone reads and writes.

sentinel_cpu.ucodefields.LatchData = unsigned(1)

If set, latch write data into an internal register which directly drives the Wishbone signal DAT_O. The data will be appropriately aligned for an upcoming Wishbone write, based upon the contents of the internal address register controlled by LatchAdr. Used only for Wishbone writes.

sentinel_cpu.ucodefields.WriteMem = unsigned(1)

If set, set Wishbone WE_O to the asserted state, indicating a Wishbone write. Not used by other core components.

sentinel_cpu.ucodefields.InsnFetch = unsigned(1)

If set, indicate that the current Wishbone transaction is an instruction fetch. Currently, this signal overrides address alignment behavior so that instruction fetches will succeed. In the future, this signal will also be used for a Wishbone tag of some sort.

Instruction decode begins automatically upon receipt of Wishbone ACK_I.

class sentinel_cpu.ucodefields.ExceptCtl(*values)

Perform a variety of exception-handling related tasks.

NONE = 0

Do nothing this cycle.

Type:

int

LATCH_DECODER = 1

Check Decode for exceptions and latch results into ExceptionRouter this cycle.

Type:

int

LATCH_JAL = 2

Use ExceptionRouter to check whether a JAL triggered alignment exceptions this cycle. Valid only when the current instruction is in fact a JAL.

Type:

int

LATCH_STORE_ADR = 3

Use ExceptionRouter to check whether a store triggered alignment exceptions this cycle. Valid only when the current instruction is in fact a store.

Type:

int

LATCH_LOAD_ADR = 4

Use ExceptionRouter to check whether a load triggered alignment exceptions this cycle. Valid only when the current instruction is in fact a load.

Type:

int

ENTER_INT = 5

Move MIE to MPIE, set MIE to 0 this cycle. See CSRFile for implementation.

Type:

int

LEAVE_INT = 6

Move MPIE to MIE, set MPIE to 1 this cycle. See CSRFile for implementation.

Type:

int
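The ENTER_INT and LEAVE_INT actions correspond to the usual RISC-V trap entry and MRET updates of mstatus. The sketch below models only the two bits involved, using the bit positions from the RISC-V privileged specification (MIE is bit 3, MPIE is bit 7). The function names are illustrative, and how CSRFile actually stores these bits is not modeled.

```python
MIE = 1 << 3   # mstatus.MIE
MPIE = 1 << 7  # mstatus.MPIE


def enter_int(mstatus):
    """ENTER_INT: move MIE to MPIE, then clear MIE."""
    mpie = MPIE if mstatus & MIE else 0
    return (mstatus & ~(MIE | MPIE)) | mpie


def leave_int(mstatus):
    """LEAVE_INT: move MPIE to MIE, then set MPIE."""
    mie = MIE if mstatus & MPIE else 0
    return (mstatus & ~MIE) | MPIE | mie
```

Entering a trap with interrupts enabled (`0x08`) yields `0x80`; leaving it again yields `0x88`, so interrupts are re-enabled and MPIE reads as 1.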

Default Microcode Annotated Source

Many jump addresses are hardcoded by the mapping PROM. Since there is only room for 256 instructions, the remaining required jumps go to wherever there is extra room. With that said:

  • I try to keep instructions with similar functionality (“major opcode”) together.

  • I try to avoid backward jumps, except for jumping to the next macroinstruction, but they are sometimes unavoidable (see beq and bne labels).

space block_ram: width 48, size 256;

space block_ram;
origin 0;

// Microcode fields in this space correspond to classes defined in
// ucodefields.py. The ordering of microcode fields is taken from this file.
// Width and enum field names are validated against the Amaranth source after
// assembly.
//
// Comments are included for convenience, and efforts are made to ensure
// they don't contradict comments in ucodefields.py. In case of conflict,
// ucodefields.py comments take priority.
fields block_ram: {
  // Target field for direct jmp_type. The micropc jumps to here next
  // cycle if the test succeeds.
  target: width 8, origin 0, default 0;

  // Various jump types to jump around the microcode program next cycle.
  // cont: Increment upc by 1.
  // nop: Same as cont, but indicate we are using the target field for
  //      something else.
  // map: Use address supplied by decoder if test fails. Otherwise, unconditional
  //      direct.
  // direct: Conditionally use address supplied by target field. Otherwise,
  //         cont.
  // direct_zero: Conditionally use address supplied by target field. Otherwise,
  //              0.
  jmp_type: enum { cont = 0; nop = 0; map; direct; direct_zero; }, default cont;

  // Various tests (valid current cycle) for conditional jumps:
  // int: Is interrupt line high?
  // exception: Illegal insn, EBREAK, ECALL, misaligned insn, misaligned ld/st?
  // mem_valid: Is current dat_r valid? Did write finish?
  // true: Unconditionally succeed
  cond_test: enum { exception; cmp_alu_o_zero; mem_valid; true}, default true;

  // Invert the results of the test above. Valid current cycle.
  invert_test: bool, default 0;

  // Modify the PC for the next cycle.
  pc_action: enum { hold = 0; inc; load_alu_o; }, default hold;

  // ALU src latch/selection.
  latch_a: bool, default 0;
  latch_b: bool, default 0;
  a_src: enum { gp = 0; imm; alu_o; zero; four; thirty_one; }, default gp;
  b_src: enum { gp = 0; pc; imm; one; dat_r; csr_imm; csr; mcause_latch }, default gp;
  // Latch the A/B inputs into the ALU. Contents valid next cycle.

  alu_op: enum { add = 0; sub; and; or; xor; sll; srl; sra; cmp_ltu; }, default add;
  // Modify inputs and outputs to ALU.
  alu_i_mod: enum { none = 0; inv_msb_a_b; }, default none;
  alu_o_mod: enum { none = 0; inv_lsb_o; clear_lsb_o }, default none;

  // Either read or write a register in the register file. _Which_ register
  // to read/write comes from the decoded insn.
  // Read contents will be on the data bus the next cycle. When insn_rs1 is
  // paired with insn_fetch, the address sent to the reg file comes directly
  // from bits 15 to 20 on the WB DAT_R bus. Otherwise, the address sent to the
  // reg file is retrieved from a holding register for bits 15 to 20 of the
  // previously-decoded instruction word.
  reg_read: bool, default 0;
  reg_write: bool, default 0;
  reg_r_sel: enum { insn_rs1 = 0; insn_rs2 = 1; }, default insn_rs1;
  reg_w_sel: enum { insn_rd = 0; zero = 1; }, default insn_rd;

  // CSR regs can either be read or written in a given cycle, but not both.
  // CSR ops override reg_ops. This is technically a union.
  csr_op: enum { none = 0; read_csr; write_csr }, default none;
  csr_sel: enum { insn_csr; trg_csr }, default insn_csr;

  // Start or continue a memory request. For convenience, an ack will
  // automatically stop a memory request for the cycle after ack, even if
  // mem_req is enabled. Valid on current cycle.
  mem_req: bool, default 0;
  mem_sel: enum { auto = 0; byte = 1; hword = 2; word = 3; }, default auto;
  mem_extend: enum { zero = 0; sign = 1}, default zero;

  // Latch data address register from ALU output.
  latch_adr: bool, default 0;
  latch_data: bool, default 0;
  write_mem: bool, default 0;

  // Current mem request is insn fetch. Valid on current cycle. If set w/
  // mem_req, mem_sel ignored/calculated automatically.
  insn_fetch: bool, default 0;

  except_ctl: enum { none; latch_decoder; latch_jal; latch_store_adr; \
                     latch_load_adr; enter_int; leave_int; }, default none;
};

#define INSN_FETCH insn_fetch => 1, mem_req => 1
#define INSN_FETCH_EAGER_READ_RS1 INSN_FETCH, READ_RS1
#define SKIP_WAIT_IF_ACK jmp_type => direct_zero, cond_test => mem_valid, target => check_int
#define JUMP_TO_OP_END(trg) cond_test => true, jmp_type => direct, target => trg
#define NOT_IMPLEMENTED jmp_type => direct, target => panic
#define NOP target => 0
#define READ_RS1 reg_read => 1, reg_r_sel => insn_rs1
#define READ_RS2 reg_read => 1, reg_r_sel => insn_rs2
#define WRITE_RD reg_write => 1
#define WRITE_RD_CSR csr_op => write_csr
#define READ_RS1_WRITE_RD READ_RS1, reg_write => 1, reg_w_sel => insn_rd
#define CMP_LT alu_op => cmp_ltu, alu_i_mod => inv_msb_a_b
#define CMP_GEU alu_op => cmp_ltu, alu_o_mod => inv_lsb_o
#define CMP_GE  alu_op => cmp_ltu, alu_i_mod => inv_msb_a_b, alu_o_mod => inv_lsb_o
// The LT[U]/GE[U] tests will either return zero or one; this makes it fine
// to reuse the conditional meant for shift ops.
#define CONDTEST_ALU_ZERO cond_test => cmp_alu_o_zero
// HINT: alu_o_mod -> inv_lsb_o can be used to
// implement a check for ALU output being exactly one. Can
// this be utilized anywhere?
// Also, inv_lsb_o does the same as XOR 1. So ((A XOR 1)) XOR 1 is a no-op,
// if a bit convoluted.
#define CONDTEST_ALU_NONZERO invert_test => 1, cond_test => cmp_alu_o_zero
#define JUMP_TO_ZERO cond_test => true, invert_test=> true, jmp_type => direct_zero
#define STOP_MEMREQ_THEN_JUMP_TO_ZERO mem_req=>0, JUMP_TO_ZERO

// CSR Register addresses in private RAM
#define MSTATUS 0
#define MIE 0x4
#define MTVEC 0x5
#define MSCRATCH 0x8
#define MEPC 0x9
#define MCAUSE 0xA
#define MIP 0xC

fetch:
wait_for_ack: INSN_FETCH_EAGER_READ_RS1, invert_test => 1, cond_test => mem_valid, \
                  jmp_type => direct, target => wait_for_ack;
              // Illegal insn or insn misaligned exception possible
check_int:    jmp_type => map, a_src => gp, latch_a => 1, READ_RS2, \
                  except_ctl => latch_decoder, cond_test => exception, \
                  target => save_pc;
origin 2;
       // Make sure x0 is initialized with 0. PC might not be valid, depending
       // on which microcycle a reset or clock enable (if applicable) was
       // asserted/deasserted. So reset PC to zero also.
       // Additionally, MCAUSE CSR is nominally a copy of a latch, but it also
       // should be 0 (for our implementation) after reset.
       //
       // Stale microcode exists on microcode ROM read port for one cycle after
       // non-power-on-resets, since read port lags by one cycle except for
       // after POR. The effects of stale microcode appear on the second cycle
       // after reset. This has the following consequences which we exploit:
       // * Spec mandates MSTATUS.MIE is zero after reset. The ALU output is
       // initialized to 0 upon reset, so stale microcode on read port will
       // never write a non-zero value to registers.
       // * One full cycle after reset was deasserted, we can make no assumptions
       // about ALU contents. So we must explicitly reinitialize the ALU to 0.
reset: latch_a => 1, latch_b => 1, b_src => one, a_src => zero;
       alu_op => and;
       alu_op => and, reg_write => 1, reg_w_sel => zero;
       jmp_type => direct_zero, pc_action => load_alu_o, csr_op => write_csr, \
            csr_sel => trg_csr, invert_test => 1, cond_test => true, \
            target => MCAUSE;

origin 8;
lb_1: latch_b => 1, b_src => imm, pc_action => inc, jmp_type => direct, \
                target => lb;
lh_1: latch_b => 1, b_src => imm, jmp_type => direct, target => lh;
lw_1: latch_b => 1, b_src => imm, jmp_type => direct, target => lw;
               NOT_IMPLEMENTED;
lbu_1: latch_b => 1, b_src => imm, pc_action => inc, jmp_type => direct, \
                target => lbu;
lhu_1: latch_b => 1, b_src => imm, jmp_type => direct, target => lhu;

lb: alu_op => add;
    latch_adr => 1;
lb_wait:  a_src => zero, b_src => dat_r, latch_a => 1, latch_b => 1, mem_req => 1, invert_test => 1, \
              cond_test => mem_valid, mem_sel => byte, mem_extend => sign, jmp_type => direct, \
              target => lb_wait;
          alu_op => add, JUMP_TO_OP_END(fast_epilog);

lh: alu_op => add;
    latch_adr => 1, except_ctl => latch_load_adr, mem_sel => hword, \
            jmp_type => direct, cond_test => exception, target => save_pc;
lh_wait:  a_src => zero, b_src => dat_r, latch_a => 1, latch_b => 1, mem_req => 1, invert_test => 1, \
              cond_test => mem_valid, mem_sel => hword, mem_extend => sign, jmp_type => direct, \
              target => lh_wait;
          alu_op => add, pc_action => inc, JUMP_TO_OP_END(fast_epilog);

lw: alu_op => add;
    latch_adr => 1, except_ctl => latch_load_adr, mem_sel => word, \
            jmp_type => direct, cond_test => exception, target => save_pc;
lw_wait:  a_src => zero, b_src => dat_r, latch_a => 1, latch_b => 1, mem_req => 1, invert_test => 1, \
              cond_test => mem_valid, mem_sel => word, jmp_type => direct, \
              target => lw_wait;
          alu_op => add, pc_action => inc, JUMP_TO_OP_END(fast_epilog);

lbu: alu_op => add;
     latch_adr => 1;
lbu_wait:  a_src => zero, b_src => dat_r, latch_a => 1, latch_b => 1, mem_req => 1, invert_test => 1, \
              cond_test => mem_valid, mem_sel => byte, jmp_type => direct, \
              target => lbu_wait;
           alu_op => add, JUMP_TO_OP_END(fast_epilog);

lhu: alu_op => add;
     latch_adr => 1, except_ctl => latch_load_adr, mem_sel => hword, \
            jmp_type => direct, cond_test => exception, target => save_pc;
lhu_wait:  a_src => zero, b_src => dat_r, latch_a => 1, latch_b => 1, mem_req => 1, invert_test => 1, \
              cond_test => mem_valid, mem_sel => hword, jmp_type => direct, \
              target => lhu_wait;
           alu_op => add, pc_action => inc, JUMP_TO_OP_END(fast_epilog);

origin 0x24;
// CSR ops take two cycles to decode. This is effectively a no-op in case
// there's an illegal CSR access or something.
csr_trampoline: READ_RS1, jmp_type => map, except_ctl => latch_decoder, \
                     cond_test => exception, target => save_pc;
csrro0_1: a_src => zero, b_src => one, latch_a => 1, latch_b => 1, pc_action => inc, \
            jmp_type => direct, target => csrro0;
csrw_1: a_src => zero, b_src => gp, latch_a => 1, latch_b => 1, pc_action => inc, \
            jmp_type => direct, target => csrwi;
csrrw_1: csr_op => read_csr, csr_sel => insn_csr, a_src => zero, latch_a => 1, \
            b_src => gp, latch_b => 1, pc_action => inc, jmp_type => direct, \
            target => csrrwi;  
csrr_1: csr_op => read_csr, csr_sel => insn_csr, a_src => zero, latch_a => 1, \
            pc_action => inc, jmp_type => direct, target => csrr;   
csrrs_1: csr_op => read_csr, csr_sel => insn_csr, a_src => zero, latch_a => 1, \
            pc_action => inc, jmp_type => direct, target => csrrs;
csrrc_1: csr_op => read_csr, csr_sel => insn_csr, a_src => zero, latch_a => 1, \
            latch_b => 1, b_src => one, pc_action => inc, jmp_type => direct, \
            target => csrrc;
csrwi_1: a_src => zero, b_src => csr_imm, latch_a => 1, latch_b => 1, pc_action => inc, \
            jmp_type => direct, target => csrwi;
csrrwi_1: csr_op => read_csr, csr_sel => insn_csr, a_src => zero, b_src => csr_imm, \
            latch_a => 1, latch_b => 1, pc_action => inc, jmp_type => direct, \
            target => csrrwi;
csrrsi_1: csr_op => read_csr, csr_sel => insn_csr, a_src => zero, latch_a => 1, \
            pc_action => inc, jmp_type => direct, target => csrrsi;
csrrci_1: csr_op => read_csr, csr_sel => insn_csr, a_src => zero, latch_a => 1, \
            latch_b => 1, b_src => one, pc_action => inc, jmp_type => direct, \
            target => csrrci;

origin 0x30;
misc_mem: pc_action => inc, jmp_type => direct, target => fetch;

csrro0: alu_op => and, JUMP_TO_OP_END(fast_epilog);
csrr:  latch_b => 1, b_src => csr;
       alu_op => add, JUMP_TO_OP_END(fast_epilog);
csrwi: alu_op => add, JUMP_TO_OP_END(fast_epilog_csr);
csrrwi: alu_op => add, latch_b => 1, b_src => csr; // Latch old CSR value, pass thru new.
        WRITE_RD_CSR, alu_op => add, JUMP_TO_OP_END(fast_epilog);

csrrsi: latch_b => 1, b_src => csr;
        alu_op => add, b_src => csr_imm, latch_b => 1;
csrrs_2: WRITE_RD, a_src => alu_o, latch_a => 1; // Feed back old CSR value.
         alu_op => or, JUMP_TO_OP_END(fast_epilog_csr);

csrrci: latch_b => 1, b_src => csr, alu_op => sub;  // Synthesize -1 on ALU_O
        // TODO: Unlike GP reads, csr_ops are not sticky. Maybe they should be?
        csr_op => read_csr, csr_sel => insn_csr, alu_op => add, a_src => alu_o, \
            b_src => csr_imm, latch_a => 1, latch_b => 1;
csrrc_2: WRITE_RD, b_src => csr, latch_b => 1, alu_op => xor; // Bit Clear = A & ~B
         a_src => alu_o, latch_a => 1;
         alu_op => and, JUMP_TO_OP_END(fast_epilog_csr);

origin 0x40;
addi_1: latch_b => 1, b_src => imm, pc_action => inc, jmp_type => direct, \
                target => addi;
slli_1:
        // All the Shift-Immediates pass through to the Shift-Register logic.
        // The AND with 31 is harmless for SLLI and SRLI, and required for
        // SRAI because its imm12 has a hardcoded 1 in bit 10.
        READ_RS1, a_src => thirty_one, latch_a => 1, b_src => imm, \
                latch_b => 1, pc_action => inc, jmp_type => direct, \
                target => sll;
slti_1: latch_b => 1, b_src => imm, pc_action => inc, jmp_type => direct, \
                target => slti;
sltiu_1: latch_b => 1, b_src => imm, pc_action => inc, jmp_type => direct, \
                target => sltiu;
xori_1: latch_b => 1, b_src => imm, pc_action => inc, jmp_type => direct, \
                target => xori;
srli_1: READ_RS1, a_src => thirty_one, latch_a => 1, b_src => imm, \
                latch_b => 1, pc_action => inc, jmp_type => direct, \
                target => srl;
ori_1: latch_b => 1, b_src => imm, pc_action => inc, jmp_type => direct, \
                target => ori;
andi_1: latch_b => 1, b_src => imm, pc_action => inc, jmp_type => direct, \
                target => andi;
              NOT_IMPLEMENTED;  // 0b1000  subi?
csrrs: READ_RS1, latch_b => 1, b_src => csr;
        alu_op => add, b_src => gp, latch_b => 1, jmp_type => direct, \
            target => csrrs_2;
csrrc:  READ_RS1, latch_b => 1, b_src => csr, alu_op => sub;  // Synthesize -1 on ALU_O
        csr_op => read_csr, csr_sel => insn_csr, alu_op => add, a_src => alu_o, \
            b_src => gp, latch_a => 1, latch_b => 1, jmp_type => direct, target => csrrc_2;
srai_1: READ_RS1, a_src => thirty_one, latch_a => 1, b_src => imm, \
                latch_b => 1, pc_action => inc, jmp_type => direct, \
                target => sra;

origin 0x50;
auipc: latch_a => 1, latch_b => 1, a_src => imm, b_src => pc;
       alu_op => add, pc_action => inc;
       WRITE_RD, jmp_type => direct, cond_test => true, target => fetch;
           
addi:         alu_op => add, INSN_FETCH, JUMP_TO_OP_END(fast_epilog);
slti:         CMP_LT, INSN_FETCH, JUMP_TO_OP_END(fast_epilog);
sltiu:        alu_op => cmp_ltu, INSN_FETCH, JUMP_TO_OP_END(fast_epilog);
xori:         alu_op => xor, INSN_FETCH, JUMP_TO_OP_END(fast_epilog);
ori:          alu_op => or, INSN_FETCH, JUMP_TO_OP_END(fast_epilog);
andi:         alu_op => and, INSN_FETCH, JUMP_TO_OP_END(fast_epilog);

sll_loop:
              // Subtract 1 from shift cnt, preliminarily save shift results
              // in case we bail (microcode cannot be interrupted, so user
              // will never see this intermediate result).
              // Also write the previous shift, either from prolog or last
              // loop iteration.
              alu_op => sub, a_src => alu_o, latch_a => 1, WRITE_RD;
              // Then, do the shift, and bail if the shift cnt reached zero.
              alu_op => sll, a_src => alu_o, b_src => one, latch_a => 1, latch_b => 1, \
                  jmp_type => direct_zero, CONDTEST_ALU_NONZERO, target => sll_loop;

srl_loop:
              alu_op => sub, a_src => alu_o, latch_a => 1, WRITE_RD;
              alu_op => srl, a_src => alu_o, b_src => one, latch_a => 1, latch_b => 1, \
                  jmp_type => direct_zero, CONDTEST_ALU_NONZERO, target => srl_loop;

sra_loop:
              alu_op => sub, a_src => alu_o, latch_a => 1, WRITE_RD;
              // Then, do the shift, and bail if the shift cnt reached zero.
              alu_op => sra, a_src => alu_o, b_src => one, latch_a => 1, latch_b => 1, \
                  jmp_type => direct_zero, CONDTEST_ALU_NONZERO, target => sra_loop;

origin 0x80;
sb_1: READ_RS2, latch_b => 1, b_src => imm, pc_action => inc, jmp_type => direct, \
                target => sb;
sh_1: READ_RS2, latch_b => 1, b_src => imm, jmp_type => direct, target => sh;
sw_1: READ_RS2, latch_b => 1, b_src => imm, jmp_type => direct, target => sw;

predict_not_taken_neq:
     // Old PC still available in ALU latches. Preemptively assume branch not
     // taken and load new PC. Construct the jump target in case this was a bad
     // assumption, and pass the old PC through.
     pc_action => inc, a_src => zero, latch_a => 1, alu_op => add, \
        CONDTEST_ALU_NONZERO, jmp_type => direct_zero, \
        target => mispredict_branch_was_taken;

predict_not_taken_eq:
     // Old PC still available in ALU latches. Preemptively assume branch not
     // taken and load new PC. Construct the jump target in case this was a bad
     // assumption, and pass the old PC through.
     pc_action => inc, a_src => zero, latch_a => 1, alu_op => add, \
        CONDTEST_ALU_ZERO, jmp_type => direct_zero, \
        target => mispredict_branch_was_taken;

mispredict_branch_was_taken:
     // If the branch is taken, preemptively assume the target address is
     // good and load it into the PC. If that assumption fails, the old PC is
     // still available to roll back and enter the exception handler.
     alu_op => add, pc_action => load_alu_o, except_ctl => latch_jal, \
        jmp_type => direct_zero, cond_test => exception, \
        target => branch_exception_detected;

branch_exception_detected:
     // Old PC is available on ALU output. We have an exception. Roll back
     // the PC and begin the exception handler.
     pc_action => load_alu_o, cond_test => true, jmp_type => direct, \
        target => save_pc;

origin 0x88;
branch_ops:
beq_1: latch_b => 1, b_src => gp, jmp_type => direct, target => beq;
                
bne_1: latch_b => 1, b_src => gp, jmp_type => direct, target => bne;
                NOT_IMPLEMENTED;
                NOT_IMPLEMENTED;
blt_1: latch_b => 1, b_src => gp, jmp_type => direct, target => blt;
bge_1: latch_b => 1, b_src => gp, jmp_type => direct, target => bge;
bltu_1: latch_b => 1, b_src => gp, jmp_type => direct, target => bltu;
bgeu_1: latch_b => 1, b_src => gp, jmp_type => direct, target => bgeu;

beq: a_src => imm, b_src => pc, latch_a => 1, latch_b => 1, alu_op => sub, \
        jmp_type => direct, target => predict_not_taken_eq;
bne: a_src => imm, b_src => pc, latch_a => 1, latch_b => 1, alu_op => sub, \
        jmp_type => direct, target => predict_not_taken_neq;
blt: a_src => imm, b_src => pc, latch_a => 1, latch_b => 1, CMP_LT, \
        jmp_type => direct, target => predict_not_taken_neq;
bge: a_src => imm, b_src => pc, latch_a => 1, latch_b => 1, CMP_GE, \
        jmp_type => direct, target => predict_not_taken_neq;
bltu: a_src => imm, b_src => pc, latch_a => 1, latch_b => 1, alu_op => cmp_ltu, \
        jmp_type => direct, target => predict_not_taken_neq;
bgeu: a_src => imm, b_src => pc, latch_a => 1, latch_b => 1, CMP_GEU, \
        jmp_type => direct, target => predict_not_taken_neq;

origin 0x98;
jalr: b_src => imm, latch_b => 1;
jalr_shared:
      // Bring in PC and prepare to construct PC + 4. Calculate jmp target.
      latch_a => 1, latch_b => 1, a_src => four, b_src => pc, alu_op => add, \
        alu_o_mod => clear_lsb_o;
      // PC + 4 will be avail on next cycle, which fast_epilog will save into
      // RD. If we had an exception, then we have to wait until the old PC
      // is available, which is still latched in ALU B input.
      // Preemptively load PC with jmp target.
      a_src => zero, latch_a => 1, pc_action => load_alu_o, alu_op => add, \
        except_ctl => latch_jal, jmp_type => direct, cond_test => exception, \
        invert_test => 1, target => fast_epilog;
      // Exception detected. Pass the old PC through, and then reload.
      alu_op => add, jmp_type => direct, cond_test => true, \
        target => branch_exception_detected;

sb: a_src => zero, b_src => gp, latch_a => 1, latch_b => 1, alu_op => add;
    alu_op => add, latch_adr => 1;
    mem_sel => byte, latch_data => 1;
sb_wait:  mem_req => 1, invert_test => 1, cond_test => mem_valid, \
              mem_sel => byte, write_mem => 1, jmp_type => direct, target => sb_wait;
          STOP_MEMREQ_THEN_JUMP_TO_ZERO;

sh: a_src => zero, b_src => gp, latch_a => 1, latch_b => 1, alu_op => add;
    alu_op => add, latch_adr => 1, except_ctl => latch_store_adr, mem_sel => hword, \
        jmp_type => direct, cond_test => exception, target => save_pc;
    mem_sel => hword, latch_data => 1, pc_action => inc;
sh_wait:  mem_req => 1, invert_test => 1, cond_test => mem_valid, \
              mem_sel => hword, write_mem => 1, jmp_type => direct, target => sh_wait;
          STOP_MEMREQ_THEN_JUMP_TO_ZERO;

sw: a_src => zero, b_src => gp, latch_a => 1, latch_b => 1, alu_op => add;
    alu_op => add, latch_adr => 1, except_ctl => latch_store_adr, mem_sel => word, \
        jmp_type => direct, cond_test => exception, target => save_pc;
    mem_sel => word, latch_data => 1, pc_action => inc;
sw_wait:  mem_req => 1, invert_test => 1, cond_test => mem_valid, \
              mem_sel => word, write_mem => 1, jmp_type => direct, target => sw_wait;
          STOP_MEMREQ_THEN_JUMP_TO_ZERO;

origin 0xB0;
jal: a_src => imm, b_src => pc, latch_a => 1, latch_b => 1, \
        jmp_type => direct, target => jalr_shared;

fast_epilog: INSN_FETCH_EAGER_READ_RS1, WRITE_RD, SKIP_WAIT_IF_ACK;
fast_epilog_csr: INSN_FETCH_EAGER_READ_RS1, WRITE_RD_CSR, SKIP_WAIT_IF_ACK;

origin 0xc0;
add_1:        latch_b => 1, b_src => gp, pc_action => inc, jmp_type => direct, \
                    target => add;
              // Re: READ_RS1... the reg values read out of the GP file are
              // sticky, but as part of pipelining, we read out RS2's value
              // during dispatch/check_int.
              // We'll need RS1 again, so get it back.
sll_1:        READ_RS1, a_src => thirty_one, latch_a => 1, b_src => gp, \
                    latch_b => 1, pc_action => inc, jmp_type => direct, \
                    target => sll;
slt_1:        latch_b => 1, b_src => gp, pc_action => inc, jmp_type => direct, \
                    target => slt;
sltu_1:       latch_b => 1, b_src => gp, pc_action => inc, jmp_type => direct, \
                    target => sltu;
xor_1:        latch_b => 1, b_src => gp, pc_action => inc, jmp_type => direct, \
                    target => xor;
srl_1:        READ_RS1, a_src => thirty_one, latch_a => 1, b_src => gp, \
                    latch_b => 1, pc_action => inc, jmp_type => direct, \
                    target => srl;
or_1:         latch_b => 1, b_src => gp, pc_action => inc, jmp_type => direct, \
                    target => or;
and_1:        latch_b => 1, b_src => gp, pc_action => inc, jmp_type => direct, \
                    target => and;
sub_1:        latch_b => 1, b_src => gp, pc_action => inc, jmp_type => direct, \
                    target => sub;  // 0b1000
              NOT_IMPLEMENTED;  // 0b1001
              NOT_IMPLEMENTED;  // 0b1010
              NOT_IMPLEMENTED;  // 0b1011
              NOT_IMPLEMENTED;  // 0b1100
sra_1:        READ_RS1, a_src => thirty_one, latch_a => 1, b_src => gp, \
                    latch_b => 1, pc_action => inc, jmp_type => direct, \
                    target => sra;

origin 0xd0;
lui:    a_src => zero, b_src => imm, latch_a => 1, latch_b => 1, pc_action => inc, \
            jmp_type => direct, target => addi;


add:          alu_op => add, INSN_FETCH, JUMP_TO_OP_END(fast_epilog);
slt:          CMP_LT, INSN_FETCH, JUMP_TO_OP_END(fast_epilog);
sltu:         alu_op => cmp_ltu, INSN_FETCH, JUMP_TO_OP_END(fast_epilog);
xor:          alu_op => xor, INSN_FETCH, JUMP_TO_OP_END(fast_epilog);
or:           alu_op => or, INSN_FETCH, JUMP_TO_OP_END(fast_epilog);
and:          alu_op => and, INSN_FETCH, JUMP_TO_OP_END(fast_epilog);
sub:          alu_op => sub, INSN_FETCH, JUMP_TO_OP_END(fast_epilog);

sll:
             // Get first input to shift it. Restrict second input
             // (shift count) to 0-31. Set up b_src for shift loop.
             a_src => gp, latch_a => 1, b_src => one, latch_b => 1, alu_op => and;
             // Do a shift, but also check if shift count was zero.
             // If so, bail. Otherwise, we're all set for the main shift loop.
             a_src => alu_o, latch_a => 1, alu_op => sll, \
                  jmp_type => direct, CONDTEST_ALU_NONZERO, target => sll_loop;
             // Whoops, was a zero shift. Pass through original RS1 and write
             // to dest!
             a_src => zero, b_src => gp, latch_a => 1, latch_b => 1;
             alu_op => add, INSN_FETCH, JUMP_TO_OP_END(fast_epilog);

srl:
             // Same comments as sll apply here.
             a_src => gp, latch_a => 1, b_src => one, latch_b => 1, alu_op => and;
             a_src => alu_o, latch_a => 1, alu_op => srl, \
                  jmp_type => direct, CONDTEST_ALU_NONZERO, target => srl_loop;
             a_src => zero, b_src => gp, latch_a => 1, latch_b => 1;
             alu_op => add, INSN_FETCH, JUMP_TO_OP_END(fast_epilog);

sra:  
             // Same comments as sll apply here.
             a_src => gp, latch_a => 1, b_src => one, latch_b => 1, alu_op => and;
             a_src => alu_o, latch_a => 1, alu_op => sra, \
                  jmp_type => direct, CONDTEST_ALU_NONZERO, target => sra_loop;
             a_src => zero, b_src => gp, latch_a => 1, latch_b => 1;
             alu_op => add, INSN_FETCH, JUMP_TO_OP_END(fast_epilog);

// Interrupt and exception handler.
origin 0xf0;
save_pc: except_ctl => enter_int, csr_op => read_csr, csr_sel => trg_csr, \
            a_src => zero, b_src => pc, latch_a => 1, latch_b => 1, target => MTVEC;
         // Latch MTVEC, pass thru PC.
         alu_op => add, b_src => csr, latch_b => 1;
         // Read mcause_latch, write MEPC, pass thru MTVEC.
         alu_op => add, b_src => mcause_latch, latch_b => 1, csr_op => write_csr, \
            csr_sel => trg_csr, target => MEPC;
         // Write PC, pass thru mcause_latch
         alu_op => add, pc_action => load_alu_o;
         // Write MCAUSE, and start exception handler.
         INSN_FETCH, jmp_type => direct_zero, invert_test => 1, cond_test => true, \
            csr_op => write_csr, csr_sel => trg_csr, target => MCAUSE;


origin 0xf8;
mret:    csr_op => read_csr, csr_sel => trg_csr, a_src => zero, latch_a => 1, target => MEPC;
         // Latch MEPC
         b_src => csr, latch_b => 1;
         // Pass thru MEPC
         alu_op => add;
         // Write PC
         pc_action => load_alu_o;
         except_ctl => leave_int, INSN_FETCH, jmp_type => direct, target => fetch;


origin 0xfe;
halt: jmp_type => direct, target => halt;
origin 0xff;
panic: jmp_type => direct, target => panic;
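
The sll, srl, and sra routines in the listing above appear to shift one bit per trip through their `*_loop` labels, after first ANDing the shift count with 31. The following is a rough behavioral sketch of that idea in Python. The function name `shift_by_ones` and the string opcodes are hypothetical, chosen only for illustration. It models the architectural result only, not the microcode's cycle-by-cycle interleaving of the decrement, the shift, and the preliminary WRITE_RD.

```python
MASK32 = 0xFFFFFFFF  # RV32 registers are 32 bits wide


def shift_by_ones(value, shamt, op):
    """Shift `value` one bit at a time until the masked count reaches zero."""
    # The prolog ANDs the shift amount with 31. For an SRAI-style immediate
    # this also strips the hardcoded bit that sits above the shift amount.
    count = shamt & 31
    result = value & MASK32
    # A zero count skips the loop entirely and returns the input unchanged.
    while count != 0:
        if op == "sll":
            result = (result << 1) & MASK32
        elif op == "srl":
            result >>= 1
        elif op == "sra":
            # Replicate the sign bit into the vacated top position.
            result = (result >> 1) | (result & 0x80000000)
        else:
            raise ValueError(f"unknown shift op: {op}")
        count -= 1
    return result
```

For instance, `shift_by_ones(0x80000000, 0x404, "sra")` masks the immediate down to a count of 4 and returns `0xF8000000`, while a count that masks to zero hands back the original value, mirroring the "zero shift" fall-through path.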

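The csrrc and csrrci routines in the listing above build the bit clear (A & ~B) out of three ALU operations: a sub that synthesizes -1, an xor that inverts the mask, and a final and. Below is a minimal sketch of that arithmetic in Python, assuming 32-bit values. The name `csr_bit_clear` and its return convention are hypothetical, and the sketch ignores how the microcode overlaps these steps with the CSR read and the RD write.

```python
MASK32 = 0xFFFFFFFF  # model 32-bit ALU results


def csr_bit_clear(old_csr, mask):
    """Return (value destined for RD, value written back to the CSR)."""
    # Step 1: 0 - 1 wraps around to all ones ("Synthesize -1 on ALU_O").
    all_ones = (0 - 1) & MASK32
    # Step 2: XOR with all ones inverts every bit of the mask.
    inverted_mask = all_ones ^ (mask & MASK32)
    # Step 3: AND keeps only the CSR bits that the mask did not select.
    new_csr = old_csr & inverted_mask & MASK32
    return old_csr & MASK32, new_csr
```

With an old CSR value of `0b1111` and a mask of `0b0101`, the sketch returns the old value `0b1111` for RD and `0b1010` for the CSR.
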
Footnotes