# Microcode

This section describes the rationale for using a microcode, as well as its weaknesses, some terminology, and finally, the various microcode fields of Sentinel. The default microcode file is also included to study!

If you want the short version of the rationale, so that you can skip ahead to the [terminology](#terminology):

Sentinel is a [horizontally-microcoded](https://en.wikipedia.org/wiki/Microcode#Horizontal_microcode) CPU design because I wanted to create a conforming `RV32I_Zicsr` CPU with M-Mode that fits into ~1000 `ICESTORM_LCs`[^1] on an iCE40 FPGA. It wasn't- _and still isn't_- clear to me that I could hit these goals without using some of the FPGA Block RAM for a microcode instead of LUTs for a hardwired control unit.

## Some References

CPUs implementing a [Reduced Instruction Set](https://en.wikipedia.org/wiki/Reduced_instruction_set_computer), like RISC-V, are best optimized for speed by using [dedicated circuitry](https://en.wikipedia.org/wiki/Control_unit#Hardwired_control_unit) to control the internal components. Today, microcode is generally relegated to special cases. However, both hardwired and microcoded control fulfill the same general purpose: "drive 0s and 1s to a CPU's various components to move and manipulate data".

I designed a hardwired CPU for a class many years ago, but it took me a while to wrap my head around how microcode works and to find the right people to point me in the right direction. I found the following resources useful:

* Over the years, the [Wikipedia article](https://en.wikipedia.org/wiki/Microcode#Horizontal_microcode) has gotten a lot better.
* The ZPU Stack Machine CPU has a [microcode implementation](https://github.com/zylin/zpu/tree/master/zpu/hdl/avalanche) to study.
* In my opinion, the gold standard for microcode design is [Bit-slice microprocessor design](http://bitsavers.informatik.uni-stuttgart.de/components/amd/bitslice/Mick_Bit-Slice_Microprocessor_Design_1980.pdf) by John Mick and Jim Brick.
  The book was written with the [Am2900 series](https://en.wikipedia.org/wiki/AMD_Am2900) in mind; these parts are long out of production. However, the Am2900 series were building blocks _designed_ for microcode. As the book teaches you to make a custom CPU [datapath](https://en.wikipedia.org/wiki/Datapath) out of Am2900 parts, you by extension learn how to design a microcode.

## Hardwired and Microcoded Rationale

Back in the 80s, microcode allowed for flexibility of implementation and quick iteration. If there was a bug or a design change, you might only have to swap out the microcode held in [EPROMs](https://en.wikipedia.org/wiki/EPROM) or RAM, and the rest of your Am2900 building blocks/glue logic would remain untouched.

Today, [FPGAs](https://en.wikipedia.org/wiki/Field-programmable_gate_array) serve much the same purpose, and can swap out an _entire_ CPU- not just microcode- in seconds. In most cases, designing a hardwired control unit on an FPGA will have acceptable iteration time and flexibility thanks to describing circuits in code. A hardwired implementation also makes it easier to implement speed optimizations like tight pipelining, which are difficult to reason about in microcode[^2]. Thanks to changes in design process, there is _probably_ no reason to do a performance-oriented microcode RISC-V implementation.

When optimizing for size, you should probably still hardwire your RISC-V core. There are several perfectly usable hardwired RISC-V CPUs in < 1000 LCs, such as:

* [Award-winning SERV](https://github.com/olofk/serv)
* [FemtoRV32](https://github.com/BrunoLevy/learn-fpga/tree/master/FemtoRV/RTL/PROCESSOR)
* [ice-v](https://github.com/sylefeb/Silice/tree/master/projects/ice-v)
* [VexRiscv](https://github.com/SpinalHDL/VexRiscv)

I could probably design a _minimal_ hardwired Sentinel that doesn't have _that_ much more circuitry than the current microcoded version. However, I set a goal of a complete RISC-V implementation including M-Mode in ~1000 LCs.
It wasn't clear to me- and still isn't- that a non-microcode RISC-V implementation could fit in 1000 LCs without making concessions that I didn't want to make[^3], such as:

* Limit datapath width.
* Remove IRQs and M-Mode.
* Do not handle illegal instructions.

Even if I did a hardwired control unit, I knew that I was going to be multicycle and not tightly pipelined. My own experience is that a basic pipelined 32-bit RISC CPU takes at least 2000 iCE40 LCs[^4] thanks to pipeline control logic. At that point, for any design meeting my requirements, I figured the speed of a microcoded and a hardwired RISC-V without pipeline control would be similar.

Around the same time, in late 2020[^5], I found out about Mick and Brick. I found the book fascinating (and still do), so I was already looking for an excuse to write a microcode for fun. Additionally, I realized I could put a microcode to good use by leveraging FPGA [block RAM](https://nandland.com/lesson-15-what-is-a-block-ram-bram/) to hold the microcode program. This gave me more precious LUT breathing room that would otherwise be used by a hardwired implementation. Since I _already_ wasn't expecting my implementation to be fast, all of a sudden I found the potential space savings of a microcode very appealing.

Via creative uses of block RAM and microcoding, Sentinel itself reached my goal of ~1000 LCs while implementing `RV32I_Zicsr` and M-Mode. However, it's still hit or miss whether a full [SoC](https://en.wikipedia.org/wiki/System_on_a_chip) fits into my target iCE40HX1K FPGA (1280 LCs max), depending on `yosys` optimizations and Amaranth changes. We'll see what the future holds.

Additionally, not every application needs a super fast CPU; _there is plenty of room for Sentinel and other small RISC-V implementations to coexist!_ Users need to decide for themselves if Sentinel fits their needs.
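To make the "control program in a table instead of control logic in LUTs" idea concrete, here is a minimal Python sketch of a microcode sequencer loop. It is a toy model built on assumptions, not Sentinel's implementation: the names `Jump`, `Cond`, `MicroInsn`, `MICROCODE`, `MAPPING_PROM`, `next_address`, and `run_one`, the field layout, and the nine-entry microprogram are all hypothetical and chosen only for illustration. It uses a few terms (sequencer, condition code multiplexer, mapping PROM) that are defined under Terminology.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Jump(Enum):
    """Hypothetical next-address choices (not Sentinel's real encoding)."""
    CONT = auto()    # always use the microprogram counter
    DIRECT = auto()  # if the test passes, use the address constant
    MAP = auto()     # if the test passes, use the mapping PROM output
    ZERO = auto()    # if the test passes, use the implied constant 0


class Cond(Enum):
    """Hypothetical inputs to the condition code multiplexer."""
    TRUE = auto()
    ALU_ZERO = auto()
    MEM_DONE = auto()


@dataclass(frozen=True)
class MicroInsn:
    jump: Jump = Jump.CONT
    cond: Cond = Cond.TRUE
    invert: bool = False
    target: int = 0


# Toy microprogram: the control "logic" is just data in a table.
MICROCODE = [
    MicroInsn(),                                      # 0 fetch: start read
    MicroInsn(Jump.DIRECT, Cond.MEM_DONE, True, 1),   # 1 fetch: spin until done
    MicroInsn(Jump.MAP),                              # 2 decode: dispatch
    MicroInsn(),                                      # 3 add: ALU operation
    MicroInsn(Jump.ZERO),                             # 4 add: write back
    MicroInsn(),                                      # 5 beq: subtract operands
    MicroInsn(Jump.DIRECT, Cond.ALU_ZERO, False, 8),  # 6 beq: taken if zero
    MicroInsn(Jump.ZERO),                             # 7 beq: not taken
    MicroInsn(Jump.ZERO),                             # 8 beq: taken
]

# Toy mapping "PROM": macroinstruction opcode -> microprogram entry point.
MAPPING_PROM = {"add": 3, "beq": 5}


def next_address(insn, upc, flags, mapped):
    """Sequencer: choose the address of the next microinstruction."""
    # Condition code multiplexer: select one test, optionally invert it.
    test = True if insn.cond is Cond.TRUE else flags[insn.cond]
    if insn.jump is Jump.CONT or test == insn.invert:
        return upc
    if insn.jump is Jump.DIRECT:
        return insn.target
    if insn.jump is Jump.MAP:
        return mapped
    return 0


def run_one(opcode, mem_ready=1, alu_zero=False):
    """Return the microinstruction addresses visited by one macroinstruction."""
    addr, trace = 0, []
    while True:
        trace.append(addr)
        cycle = len(trace) - 1
        flags = {Cond.ALU_ZERO: alu_zero, Cond.MEM_DONE: cycle >= mem_ready}
        addr = next_address(MICROCODE[addr], addr + 1, flags,
                            MAPPING_PROM[opcode])
        if addr == 0:  # back at the common fetch code: macroinstruction done
            return trace


print(run_one("add", mem_ready=3))  # [0, 1, 1, 1, 2, 3, 4]
```

In this sketch, changing how a macroinstruction behaves means editing entries in `MICROCODE` and `MAPPING_PROM`; `next_address` stays the same. That is the property that lets the table live in block RAM while only the small, fixed sequencer consumes LUTs.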
## Terminology

Mick and Brick introduce some jargon that I use in Sentinel:

```{note}
This list is probably incomplete.
```

```{glossary}
Condition Code Multiplexer
  A multiplexer of various conditional tests. The output of this multiplexer, selected by the microcode, becomes a test input to the {term}`Sequencer`. The conditional test result can often be inverted by microcode to double the number of possible tests.

  Test conditions used by Sentinel include:

  * Is the ALU output zero/nonzero?
  * Is a memory access complete/incomplete?
  * Did an exception occur/not occur?
  * Unconditionally true/false.

Macroinstruction
  A unit of execution from the CPU's instruction set, composed of microinstructions. In Sentinel's case, macroinstructions are RISC-V instructions.

Mapping (P)ROM
  A (P)ROM which maps the macroarchitecture opcode into microprogram jump targets. It is the hardware version of a [jump table](https://en.wikipedia.org/wiki/Branch_table), where the jump index is retrieved from the macroinstruction opcode.

  Each macroinstruction is a loop through the microprogram. The mapping (P)ROM jumps from microcode common to all macroinstructions to microcode specific to each (group of) macroinstruction(s).

  In Sentinel, the Mapping "(P)ROM" is implemented in combinational logic.

Microinstruction
  A microprogram/microcode instruction. Macroinstructions are composed of multiple microinstructions. Each microinstruction takes one clock cycle.

Microprogram Counter
  Register whose value is the address of the microcode instruction which will execute on the _next_ clock cycle, _assuming the sequencer chooses to use it_.

Pipeline Register
  In microcode, the pipeline register _specifically_ refers to a holding register containing the bits of the currently-executing microinstruction.

  In Sentinel, the pipeline register is part of the _synchronous_ read port of the Block RAM holding the microprogram. The address input to the microprogram Block RAM is the output of the sequencer; the data for the microinstruction at this address appears on the read port on the next clock cycle.

Sequencer
  Component which supplies the address of the microinstruction which will be output on the _next_ clock cycle. It chooses between various sources based on a test condition provided by the {term}`Condition Code Multiplexer`.

  Sources used by Sentinel include:

  * The microprogram counter.
  * An address constant in the microcode instruction.
  * A mapping PROM.
  * An implied constant `0`.
```

## Microcode Fields

```{eval-rst}
.. automodule:: sentinel_cpu.ucodefields
    :member-order: bysource
```

## Default Microcode Annotated Source

Many jump addresses are hardcoded by the mapping PROM. Since there is only room for 256 instructions, the remaining required jumps go to wherever there is extra room. With that said:

* I try to keep instructions with similar functionality ("major opcode") together.
* I try to avoid backward jumps, except for jumping to the next {term}`macroinstruction`, but they are sometimes unavoidable (see the `beq` and `bne` labels).

```{eval-rst}
.. literalinclude:: ../../src/sentinel_cpu/microcode.asm
```

## Footnotes

[^1]: Not the same as a LUT4, but AIUI, in pathologically bad PnR cases, the number of `ICESTORM_LC`s used will be equal to the _sum_ of the number of LUT4s and the number of FFs!

[^2]: Well, at least for me it's difficult to reason about! I've mulled over trying a microcoded 2 or 3 stage pipelined CPU before to see how bad the control flow of a microcode program would get. It might be possible to implement using `n`-way jumps and checking the pipeline state every microinstruction.

[^3]: I already _do_ make a concession in Sentinel's implementation by not implementing the Machine Timer. I justify it because the Machine Timer is Memory-Mapped I/O and not part of the CPU itself. A 64-bit counter is just too much to ask for with everything else going on in 1000 LCs.

[^4]: VexRiscv, a pipelined RISC-V, got LC usage down to 1130 by e.g. [not handling](https://github.com/SpinalHDL/VexRiscv/blob/7f2bccbef256b3ad40fb8dc8ba08a266f9c6256b/src/main/scala/vexriscv/plugin/CsrPlugin.scala#L297-L317) illegal instructions and having no interrupts. So that's cool :D.

[^5]: I did not work on Sentinel between fall 2020 and fall 2023. A number of things went right in fall 2023 such that I felt prepared to finish and maintain Sentinel.