# Sentinel Internal Structure
## Block Diagram
Below is a simplified block diagram of Sentinel, showing how the main {py:class}`~amaranth.lib.wiring.Component`s
connect to each other. Behavior of `Components` not represented in the block
diagram will be explained in the next sections.
Blue arrows represent outputs from the microcode ROM's data [Signal](https://amaranth-lang.org/docs/amaranth/latest/guide.html#signals)s,
while purple arrows represent inputs from each `Component` back to the
microcode ROM address `Signal`s. A blue arrow _into_ the microcode ROM reflects
the fact that some microcode data outputs feed back into the ROM's
address as inputs.
## Register File
```{eval-rst}
.. automodule:: sentinel_cpu.datapath
:members:
```
## Microcode ROM and Control
```{eval-rst}
.. automodule:: sentinel_cpu.ucoderom
:members:
```
```{eval-rst}
.. automodule:: sentinel_cpu.control
:members:
```
```{eval-rst}
.. _mapping-details:
Mapping Details
---------------
At present (1/6/2025), I mostly hand-calculated the :class:`~sentinel_cpu.control.MappingROM`
jump table. Many RV32I :class:`instruction bits ` can
be reconstructed by other microcode fields after
:class:`decoding `, such as
:class:`operand sources ` and
:attr:`immediates `. From there, I found a
reasonably small map in terms of combinational logic by playing with the
remaining instruction bits (mostly
:class:`major ` and
:attr:`minor ` opcodes) in a text file:
.. code-block::
LOAD = 0b00001000 0x08
0b00001001 0x09
0b00001010 0x0A
0b00001100 0x0C
0b00001101 0x0D
MISC_MEM = 0b00110000 0x30
OP_IMM = 0b01000000 0x40
0b01000010 0x42
0b01000011 0x43
0b01000100 0x44
0b01000110 0x46
0b01000111 0x47
0b01000001 0x41
0b01000101 0x45
0b01001101 0x4D
AUIPC = 0b01010000 0x50
STORE = 0b10000000 0x80
0b10000001 0x81
0b10000010 0x82
BRANCH = 0b10001000 0x88
0b10001001 0x89
0b10001100 0x8C
0b10001101 0x8D
0b10001110 0x8E
0b10001111 0x8F
JALR = 0b10011000 0x98
JAL = 0b10110000 0xB0
OP = 0b11000000 0xC0
0b11000001 0xC1
0b11000010 0xC2
0b11000011 0xC3
0b11000100 0xC4
0b11000101 0xC5
0b11001101 0xCD
0b11000110 0xC6
0b11000111 0xC7
0b11001000 0xC8
SYSTEM = 0b11000000 0xC0 (handled specially)
0b11000000 0xC0 (handled specially)
LUI = 0b11010000 0xD0
CSRs are placed wherever they fit; I chose 0x24 as a starting point.
CSR compression relies on the fact that Sentinel doesn't actually implement
most CSRs, and so their addresses can be treated as blanket
`don't cares `_:
.. code-block::
mstatus 0x300 => 0b001100000000 => 0bxxxxx0xxx000 => 0b0000 - ffs
mie 0x304 => 0b001100000100 => 0bxxxxx0xxx100 => 0b0100 - ffs
mtvec 0x305 => 0b001100000101 => 0bxxxxx0xxx101 => 0b0101 - bram
mscratch 0x340 => 0b001101000000 => 0bxxxxx1xxx000 => 0b1000 - bram
mepc 0x341 => 0b001101000001 => 0bxxxxx1xxx001 => 0b1001 - bram
mcause 0x342 => 0b001101000010 => 0bxxxxx1xxx010 => 0b1010 - bram
mip 0x344 => 0b001101000100 => 0bxxxxx1xxx100 => 0b1100 - ffs
Of course, if I physically implement more registers, the table will need
to change, up to and including using more than 16 addresses :).
.. todo::
* Perhaps include rest of the raw notes on how I derived start
locations from opcodes; right now only the results are included.
* Start locations need to be documented as constants, including "base"
constants where the minor opcode is just added to the base to form
the final constant.
```
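The arithmetic behind the two tables above can be sketched in plain Python. This is only an illustration, not the `MappingROM` implementation: the function names `start_address` and `compress_csr` and the `*_BASE` constants are hypothetical, with values read off the tables. The base-plus-minor-opcode rule is an assumption taken from the todo item; it reproduces the LOAD, STORE, and BRANCH rows, while rows such as `0x4D`, `0xC8`, and `0xCD` do not fit the plain rule and remain hand-assigned.

```python
# Hypothetical base constants, read off the opcode table above.
LOAD_BASE = 0x08
STORE_BASE = 0x80
BRANCH_BASE = 0x88


def start_address(base: int, funct3: int) -> int:
    """Form a start location by adding the 3-bit minor opcode to a base."""
    if not 0 <= funct3 <= 7:
        raise ValueError("funct3 is a 3-bit field")
    return base + funct3


def compress_csr(addr: int) -> int:
    """Keep bit 6 and bits 2:0 of a 12-bit CSR address.

    All other address bits are treated as don't cares, as in the CSR
    table above, giving a 4-bit index.
    """
    return (((addr >> 6) & 1) << 3) | (addr & 0b111)


if __name__ == "__main__":
    csrs = [("mstatus", 0x300), ("mie", 0x304), ("mtvec", 0x305),
            ("mscratch", 0x340), ("mepc", 0x341), ("mcause", 0x342),
            ("mip", 0x344)]
    for name, addr in csrs:
        print(f"{name:8} {addr:#05x} -> {compress_csr(addr):#06b}")
```

Because the compression ignores most address bits, it is only valid while the set of physically implemented CSRs stays as small as it is in the table.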
## Instruction Decoder
```{eval-rst}
.. automodule:: sentinel_cpu.decode
:members:
```
```{eval-rst}
.. automodule:: sentinel_cpu.insn
:members:
```
## Arithmetic Logic Unit (ALU)
```{eval-rst}
.. automodule:: sentinel_cpu.alu
:exclude-members: ASrcMux, BSrcMux
```
## Exception Control
Exception control has not yet been incorporated into the above block diagram.
```{eval-rst}
.. automodule:: sentinel_cpu.exception
:members:
```
## ALU Sources
The {py:class}`~sentinel_cpu.alu.ALU`'s two inputs `A` and `B` are fed by two
separate muxes. Each mux can choose from one of up to 8 data sources. Not all
data sources are shared between the two muxes.
```{eval-rst}
.. autoclass:: sentinel_cpu.alu.ASrcMux
:members:
```
```{eval-rst}
.. autoclass:: sentinel_cpu.alu.BSrcMux
:members:
```
These muxes and latches live in the implementation of {py:class}`~sentinel_cpu.top.Top`.
## Fetch/Load/Store Unit
The Fetch/Load/Store Unit is implemented in-line in {py:class}`~sentinel_cpu.top.Top`
using the components from the {py:mod}`~sentinel_cpu.align` module.
```{eval-rst}
.. automodule:: sentinel_cpu.align
:members:
```
Aside from aligning, the glue logic for latching addresses, read data, and
write data is minimal and controlled directly by
{mod}`microcode signals `.
## Instruction Cycle Counts
```{todo}
I need to create a test that gets latency and throughput for each instruction
type of the core.
```
The following counts are general observations (as of 11/18/2023), from
examining the microcode (knowing that each microcode instruction always takes
1 clock cycle):
* _There is room for improvement, even without making the core bigger._
* Fetch/Decode takes a _minimum_ of two cycles thanks to Wishbone classic's
REQ/ACK handshake taking two cycles.
* When Wishbone ACK is asserted, Decode is taking place.
* The GP register file is synchronous, with a single read port and a single
write port. Sentinel loads RS1 out of the register file during Decode.
* All instructions share the same operation the cycle after ACK/Decode:
* Check for exceptions/interrupts, go to exception handler if so.
* Latch RS1 into the ALU.
* Load RS2 out of the register file, in anticipation of a "simple"
instruction.
* Jump to the instruction-specific microcode block.
* At minimum, an instruction (`addi`, `or`, etc.) takes 3 cycles to retire
after the 3 initial shared cycles (2 Fetch/Decode cycles plus the shared cycle
after ACK/Decode). This means Sentinel instructions have a minimum latency of
6 cycles per instruction (CPI).
* I define "retirement" to mean "cycle in which we return to the Fetch/Decode
or Exception Checking part of the microcode program". This usually
corresponds to "cycle after RD/PC was written with results".
* Sentinel instructions have a maximum throughput of 4 CPI by overlapping the
2 Fetch/Decode cycles of the _next_ instruction after the initial 3 shared
cycles of the _current_ instruction when possible ("pipelining").
* Some instructions overlap one of the Fetch/Decode cycles, some don't
overlap either of them. In particular, shift instructions with a nonzero
shift count don't pipeline Fetch/Decode. It may be possible to _always_
overlap at least one cycle, but I haven't tweaked the core yet to ensure
this is a sound optimization.
* _Shift instructions need work_:
* For a shift of zero, shift-immediate _and_ shift-register latency is 9 CPI,
and throughput is 7 CPI.
* For a shift of nonzero `n`, shift-immediate _and_ shift-register latency
and throughput are both 6 + 2*`n` CPI.
* Branch-not-taken latency and throughput are both 6 CPI. Branch-taken latency
and throughput are both 7 CPI.
* JAL/JALR latency is 7 CPI, throughput is 6 CPI.
* Store latency and throughput are both 10 CPI minimum. At least 2 of those
cycles are spent waiting for Wishbone ACK.
* The core _will_ release STB/CYC between the store and fetch of the next
instruction.
* Load latency is 10 CPI minimum, and throughput is 9 CPI. 2 cycles minimum
are spent waiting for Wishbone ACK.
* The core _will_ release STB/CYC between the load and fetch of the next
instruction.
* CSR instructions require an extra Decode cycle compared to all other
instructions (to check for legality).
* At minimum, a read of a read-only zero CSR register has a latency of 7 CPI,
and a throughput of 6 CPI.
* At maximum, `csrrc[i]` has a latency of 11 CPI, and a throughput of 10 CPI.
* Entering an exception handler requires 5, 6 (branch exceptions), or 7 clocks
(JAL[R] exceptions) from the cycle at which the exception condition is detected.
* `mret` has a latency and throughput of 8 CPI.
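The throughput figures above can be turned into a rough cycle estimate for a straight-line instruction sequence. The sketch below is a back-of-the-envelope model only: the `throughput_cpi` and `estimate_cycles` helpers and the instruction-kind names are hypothetical, the numbers are copied from the observations above, and loads and stores use the minimum figures (ACK arriving as early as possible).

```python
def throughput_cpi(kind: str, shift_count: int = 0) -> int:
    """Best-case throughput, in cycles, for one instruction of a given kind."""
    if kind == "shift":
        # Shift of zero: 7 CPI. Nonzero shift of n: 6 + 2*n CPI.
        return 7 if shift_count == 0 else 6 + 2 * shift_count
    table = {
        "simple": 4,            # addi, or, etc. with full Fetch/Decode overlap
        "branch_not_taken": 6,
        "branch_taken": 7,
        "jump": 6,              # JAL/JALR
        "store": 10,            # minimum figure from the list above
        "load": 9,              # minimum figure from the list above
        "mret": 8,
    }
    return table[kind]


def estimate_cycles(program) -> int:
    """Sum best-case throughput over (kind, shift_count) pairs."""
    return sum(throughput_cpi(kind, n) for kind, n in program)


program = [("load", 0), ("simple", 0), ("shift", 3), ("store", 0),
           ("branch_taken", 0)]
print(estimate_cycles(program))
```

The estimate is optimistic by construction, since it assumes every instruction achieves its best-case overlap with the next Fetch/Decode.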
(csrs)=
## CSRs
Sentinel physically implements the following CSRs:
* `mscratch`
* `mcause`
* The core can only physically trigger a subset of defined exceptions:
* Machine external interrupt
* Instruction access misaligned
* Illegal instruction
* Breakpoint
* Load address misaligned
* Store address misaligned
* Environment call from M-mode
In particular, it is worth noting:
* _Misaligned accesses are not implemented in hardware._
* There is no machine timer (a 64-bit counter is a bit too much to
ask for right now :(...).
* `mip`
(meip)=
* Only the `MEIP` bit is implemented. The `MSIP` and `MTIP` bits always read
as zero. The RISC-V Privileged Spec (Version: 20260120 page 46) says:
> `MEIP` is read-only in `mip`, and is set and cleared by a
> platform-specific interrupt controller.
The user must provide their own interrupt controller. See
{class}`sentinel_cpu.top.Top`.
One simple implementation is to `OR` all external interrupt sources
together and feed it to the {attr}`IRQ line `. When
any of the `OR` inputs is asserted, this will be reflected in the `MEIP`
bit, indicating _at least one_ I/O peripheral needs attention. The Sentinel
program will then query each I/O peripheral to figure out _exactly which_
peripherals need attention. An example implementation can be found for the
serial and timer peripherals in {mod}`examples.attosoc` and [`sentinel-rt`](./support-code.md#sentinel-rt).
```{note}
In the future, I may implement the high (platform-specific) 16 bits of
`mip`/`mie` to make interrupt-handling quicker.
```
* `mie`
* Only the `MEIE` bit is implemented.
* `mstatus`
* Only the `MPP`, `MPIE`, and `MIE` bits are implemented.
* `mtvec`
* The `BASE` is writeable; only the Direct `MODE` setting is implemented.
```{todo}
A read-only `BASE` is allowed, but I believe the Rust [support code](./support-code.md)
assumes a writable `BASE`. I don't wish to fork [`riscv-rt`](https://github.com/rust-embedded/riscv/tree/master/riscv-rt)
solely for a read-only `BASE`, so I accept the potential loss of space
savings for now.
Revisit whether read-only `BASE` is feasible in the future.
```
* `mepc`
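The `OR`-based interrupt scheme described under `mip` can be modeled in a few lines of plain Python. This is a behavioral sketch, not Amaranth code and not the `examples.attosoc` implementation; the `Peripheral` class, its `irq_pending` attribute, and the two helper functions are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Peripheral:
    """Hypothetical I/O peripheral with a single interrupt request flag."""
    name: str
    irq_pending: bool = False


def meip(peripherals) -> bool:
    """Model of the IRQ line: the OR of every peripheral's request."""
    return any(p.irq_pending for p in peripherals)


def needs_attention(peripherals):
    """What a handler does once MEIP is set: poll each peripheral in turn."""
    return [p.name for p in peripherals if p.irq_pending]


serial = Peripheral("serial")
timer = Peripheral("timer", irq_pending=True)
if meip([serial, timer]):
    print(needs_attention([serial, timer]))
```

The cost of this design is the polling loop: `MEIP` only says that at least one source is active, so the handler has to check every peripheral.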
The following CSRs are implemented as read-only zero and trigger an exception
on an attempt to write:
* `mvendorid`
* `marchid`
* `mimpid`
* `mhartid`
* `mconfigptr`
The following CSRs are implemented as read-only zero (no exception on write):
* `misa`
* `mstatush`
* `mcountinhibit`
* `mtval`
* `mcycle`
* `minstret`
* `mhpmcounter3-31`
* `mhpmevent3-31`
All remaining machine-mode CSRs are unimplemented and trigger an exception on
_any_ access:
* `medeleg`
* `mideleg`
* `mcounteren`
* `mtinst`
* `mtval2`
* `menvcfg`
* `menvcfgh`
* `mseccfg`
* `mseccfgh`
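The access rules listed above can be summarized as a small lookup. The sketch below works on CSR names rather than addresses and is not how `sentinel_cpu.csr` decodes accesses; the set names and the `access_traps` function are hypothetical, and treating every name outside the listed sets as unimplemented is an assumption made for the sketch.

```python
IMPLEMENTED = {"mscratch", "mcause", "mip", "mie", "mstatus", "mtvec", "mepc"}

# Read as zero; a write attempt raises an exception.
READ_ONLY_ZERO_TRAP_ON_WRITE = {
    "mvendorid", "marchid", "mimpid", "mhartid", "mconfigptr",
}

# Read as zero; writes are ignored without an exception.
READ_ONLY_ZERO = (
    {"misa", "mstatush", "mcountinhibit", "mtval", "mcycle", "minstret"}
    | {f"mhpmcounter{i}" for i in range(3, 32)}
    | {f"mhpmevent{i}" for i in range(3, 32)}
)


def access_traps(csr: str, is_write: bool) -> bool:
    """Return True if the access would trigger an exception."""
    if csr in IMPLEMENTED or csr in READ_ONLY_ZERO:
        return False
    if csr in READ_ONLY_ZERO_TRAP_ON_WRITE:
        return is_write
    # Assumption: anything else (medeleg, mideleg, ...) is unimplemented
    # and traps on any access.
    return True


for name in ("mvendorid", "misa", "medeleg"):
    print(name, access_traps(name, False), access_traps(name, True))
```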
```{eval-rst}
.. automodule:: sentinel_cpu.csr
:members:
```