Memory Stage

LSU (Load Store Unit)

In our RISC-V core, the Load Store Unit (LSU) is the pipeline component that executes load and store instructions and handles their error conditions, particularly address misalignment. It sits between the execution stage and the Memory Management Unit (MMU) and is responsible for address calculation, data transfer, and exception reporting.

LSU Queue: The LSU implements an internal queue that stores all the data needed for pending memory operations.
Address Misaligned: The LSU detects misaligned accesses and resolves them itself. To do so, it records everything it needs about each access, including the memory address, data value, instruction type, and control signals.
Writeback Data Calculation: This part applies to load instructions. In our design, the LSU always sends a word read request to the MMU, even for byte or halfword loads (LB, LH), so once the word returns it must extract the requested data from it.

Implementation Details

Parameter

Parameter	Description
LENGTH	Queue length parameter
DEPTH	Queue depth parameter
DATASIZE	Internal data size for queue operations

Address and Data Processing

Address: The LSU aligns every address it sends to the MMU to a multiple of 4. Data: The LSU uses a byte mask to select which bytes of the write data are written into memory.
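A minimal software model of this address and mask handling, assuming little-endian byte lanes (RISC-V is little-endian, so byte offset 0 maps to mask bit 0). The function names are hypothetical and not taken from the RTL:

```python
def align_word(addr: int) -> int:
    """Clear address bits [1:0] so the MMU always receives a multiple of 4."""
    return addr & ~0x3 & 0xFFFFFFFF


def store_mask(addr: int, size: int) -> int:
    """4-bit byte mask for a `size`-byte store (1 = SB, 2 = SH, 4 = SW).

    Bit i enables byte lane i of the aligned word. Lanes that would fall
    past bit 3 belong to the next word and are dropped here; a misaligned
    store needs a second access for them.
    """
    offset = addr & 0x3
    return (((1 << size) - 1) << offset) & 0xF


def store_data(addr: int, data: int) -> int:
    """Shift the store data so its lowest byte lines up with the addressed lane."""
    offset = addr & 0x3
    return (data << (8 * offset)) & 0xFFFFFFFF
```

For example, an SB to 0x1003 is sent to word 0x1000 with mask 0b1000, and the byte is shifted into bits 31:24 of the write data.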

LSU Queue

The queue stores entries generated from the input opcode and control signals.

Push: when the input instruction is valid, and again when a misaligned condition requires an additional access.
Pop: when memory returns the finished signal.

Store Data:

Signal	Width	Description
mem_addr	32	Target memory address
mem_data	32	Write data value
lb_inst, lh_inst, lw_inst	1	Load instruction type
signed_inst	1	Load instruction is signed
mem_rd, mem_wr	1	Memory read and write signals
mem_mask	4	Byte mask for the store or load
u_type	1	Marks the additional access generated for a misaligned transfer
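The queue behaviour can be modelled as a bounded FIFO of these entries. This is a sketch, assuming that DEPTH is the number of entries and that a full queue simply refuses the push (the real stall logic is not described here). The class and method names are hypothetical:

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class LsuEntry:
    """One queue entry, with fields named after the table above."""
    mem_addr: int               # target memory address
    mem_data: int               # write data
    lb_inst: bool = False       # load-type flags
    lh_inst: bool = False
    lw_inst: bool = False
    signed_inst: bool = False   # sign-extend on writeback
    mem_rd: bool = False        # read request
    mem_wr: bool = False        # write request
    mem_mask: int = 0xF         # 4-bit byte mask
    u_type: bool = False        # extra access created for a misaligned transfer


class LsuQueue:
    """Bounded FIFO; `depth` plays the role of the DEPTH parameter (assumed)."""

    def __init__(self, depth: int = 4):
        self.depth = depth
        self.entries = deque()

    def full(self) -> bool:
        return len(self.entries) >= self.depth

    def push(self, entry: LsuEntry) -> bool:
        """Push a valid instruction (or the second half of a misaligned one).

        Returns False when the queue is full and the push is refused.
        """
        if self.full():
            return False
        self.entries.append(entry)
        return True

    def pop(self, mem_done: bool):
        """Remove and return the oldest entry only when memory signals completion."""
        if mem_done and self.entries:
            return self.entries.popleft()
        return None
```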

Address Misaligned

The LSU includes sophisticated misalignment detection and correction logic. When misaligned memory accesses are detected, the LSU automatically:

Splits the misaligned access into multiple aligned memory transactions
Maintains state information across multiple memory cycles
Reconstructs the final result from partial memory responses
Ensures atomicity of the original memory operation from an architectural perspective

The LSU handles unaligned memory accesses by splitting them into multiple aligned accesses:

Unaligned Detection
- Half-word unaligned: Address bits [1:0] = 2’b11 (the halfword crosses into the next word)
- Word unaligned: Address bits [1:0] ≠ 2’b00
Unaligned Access Handling: The LSU maintains a state machine (u_state) for multi-cycle unaligned operations. It automatically generates a second memory access for an unaligned transfer, and the writeback calculation then reconstructs the data from the two memory responses.
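A functional sketch of the split and the reconstruction, assuming little-endian byte order and that the first access always targets the lower word. It does not model the u_state timing, and the function names are hypothetical:

```python
def needs_split(addr: int, size: int) -> bool:
    """True when a `size`-byte access at `addr` spans two words.

    size 2: only bits [1:0] == 2'b11 spans; size 4: any bits [1:0] != 2'b00.
    Byte accesses never span.
    """
    return (addr & 0x3) + size > 4


def split_access(addr: int, size: int) -> list[tuple[int, int]]:
    """Return the aligned (word_address, byte_mask) accesses for one request."""
    offset = addr & 0x3
    first = addr & ~0x3 & 0xFFFFFFFF
    lanes = ((1 << size) - 1) << offset      # up to 7 lanes across two words
    accesses = [(first, lanes & 0xF)]
    if lanes >> 4:                           # lanes left over for the next word
        accesses.append(((first + 4) & 0xFFFFFFFF, lanes >> 4))
    return accesses


def merge_load(addr: int, size: int, word0: int, word1: int = 0) -> int:
    """Rebuild the raw (not yet sign-extended) load value from the two words.

    word0 is the lower aligned word, word1 the following one.
    """
    window = (word1 << 32) | word0           # 64-bit little-endian window
    return (window >> (8 * (addr & 0x3))) & ((1 << (8 * size)) - 1)
```

For instance, an LH at 0x1003 becomes two accesses, (0x1000, mask 0b1000) and (0x1004, mask 0b0001), and the result takes its low byte from bits 31:24 of the first word and its high byte from bits 7:0 of the second.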

Writeback Value Calculation

For load instructions, the LSU must post-process the returned data, because it always requests a full 32-bit word from the MMU regardless of the actual load instruction type (LB, LH, LW). The writeback calculation unit:

Extracts the relevant bytes from the loaded word based on the address and instruction type
Performs sign extension or zero extension as required
Handles data reconstruction for misaligned accesses
Manages the timing of writeback operations to the register file
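The extraction and extension steps for an aligned access can be sketched as follows. The function names are hypothetical; `size` is 1, 2 or 4 bytes, and `signed` selects LB/LH over LBU/LHU:

```python
def sign_extend(value: int, bits: int) -> int:
    """Sign-extend a `bits`-wide value to 32 bits (returned as unsigned 32-bit)."""
    sign = 1 << (bits - 1)
    return ((value ^ sign) - sign) & 0xFFFFFFFF


def load_writeback(word: int, addr: int, size: int, signed: bool) -> int:
    """Pick the requested byte/halfword/word out of the aligned 32-bit word."""
    offset = addr & 0x3
    raw = (word >> (8 * offset)) & ((1 << (8 * size)) - 1)
    return sign_extend(raw, 8 * size) if signed else raw
```

With the word 0x80FF1234, an LB from offset 2 yields 0xFFFFFFFF (0xFF sign-extended), while an LBU from offset 3 yields 0x00000080.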

Exception Handling

The LSU receives the load_fault and store_fault signals from the MMU and raises the corresponding exception for the current load or store instruction.

MMU (Memory Management Unit)

In RISC-V core architectures, the Memory Management Unit (MMU) serves as a critical component responsible for translating virtual addresses into physical addresses. Beyond basic address translation, the MMU provides essential security and reliability features through comprehensive error detection mechanisms, including address range validation and page accessibility verification. Our MMU design implements a three-tier architecture comprising specialized units that work in concert to deliver efficient virtual memory management:

Architecture

Modules:

mmu_tlb.v: The TLB caches recently used page data. On a hit it returns the cached entry directly; on a miss it sends a request to the PTW to find the correct page and its data.
mmu_ptw.v: The PTW walks the page table with the page indices of the virtual address to find the correct page, then calculates the physical address from the page data. Page fault exceptions are also detected in this unit.
mmu_cache_control.v: Cache control is the interface to the Dcache and Icache. It receives the available (ready) signals from the caches and drives control signals, such as the valid signal that indicates the MMU's current request has finished.

Implementation Details

TLB

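The TLB uses two registers to hold a single cached translation: vpn_q stores the virtual page number and entry_q stores the page entry. It compares the incoming virtual page number with vpn_q. On a TLB hit, it returns the page entry directly; otherwise, the MMU starts the PTW to find the target page and then updates the TLB.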

Register	Width	Description
vpn_q	20	virtual page number
entry_q	32	page entry data
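A behavioural sketch of this single-entry TLB. The valid flag is an assumption added so that the model does not hit on the reset value of vpn_q; the source only describes the two registers. Names are hypothetical:

```python
class SingleEntryTlb:
    """Model of the two-register TLB: vpn_q (20 bits) and entry_q (32 bits)."""

    def __init__(self):
        self.valid = False      # assumed guard bit, cleared at reset
        self.vpn_q = 0
        self.entry_q = 0

    def lookup(self, vaddr: int):
        """Return entry_q on a hit, or None on a miss (MMU then starts the PTW)."""
        vpn = (vaddr >> 12) & 0xFFFFF
        if self.valid and vpn == self.vpn_q:
            return self.entry_q
        return None

    def update(self, vaddr: int, entry: int):
        """Called when the PTW finishes: overwrite the single cached translation."""
        self.vpn_q = (vaddr >> 12) & 0xFFFFF
        self.entry_q = entry & 0xFFFFFFFF
        self.valid = True
```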

PTW

The page table walker (PTW) searches the page table and detects page-fault conditions. Our virtual memory design uses a two-level page table to translate virtual addresses into physical addresses, so the PTW uses a four-state finite state machine to control the walk. The four states are:

State	Description
IDLE	Wait for the MMU to start the PTW
FIRST_LEVEL	Search the first-level page table
SECOND_LEVEL	Search the second-level page table
UPDATE	Get the page data and update the TLB

Virtual Memory (Sv32)

We use Sv32 as our virtual address scheme. Sv32 uses a two-level page table to translate a virtual address into a physical address. The virtual address is structured as follows:

Bits	31:22	21:12	11:0
Field	First-level page index (VPN[1])	Second-level page index (VPN[0])	Page offset
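As a worked example, the virtual address 0x00401234 splits into VPN[1] = 0x001, VPN[0] = 0x001, and offset 0x234. A small sketch of the split (function name hypothetical):

```python
def split_sv32_va(vaddr: int) -> tuple[int, int, int]:
    """Split a 32-bit Sv32 virtual address into (VPN[1], VPN[0], offset)."""
    vpn1 = (vaddr >> 22) & 0x3FF    # bits 31:22, index into the first-level table
    vpn0 = (vaddr >> 12) & 0x3FF    # bits 21:12, index into the second-level table
    offset = vaddr & 0xFFF          # bits 11:0, passed through untranslated
    return vpn1, vpn0, offset
```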

Page Table Entry (PTE) Structure:

Bit Position	Macro Name	Description
0	PRESENT	Page is valid in memory
1	READ	Page has read permission
2	WRITE	Page has write permission
3	EXEC	Page has execute permission
4	USER	Page is accessible in user mode
5	GLOBAL	Page is global (not flushed on context switch)
6	ACCESSED	Page has been accessed
7	DIRTY	Page has been modified
9:8	SOFT	Software-defined bits (2 bits)
29:10	ENTRY	Physical page number
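A sketch of decoding this PTE layout and checking it against an access. The checks actually performed by mmu_ptw.v are not described here, so the rule below (PRESENT required, USER required in user mode, then READ/WRITE/EXEC by access type) is an assumption and deliberately simpler than the full RISC-V privileged rules. Names are hypothetical:

```python
# Bit positions from the PTE table above.
PTE_BITS = {"PRESENT": 0, "READ": 1, "WRITE": 2, "EXEC": 3,
            "USER": 4, "GLOBAL": 5, "ACCESSED": 6, "DIRTY": 7}


def pte_flag(pte: int, name: str) -> bool:
    return bool((pte >> PTE_BITS[name]) & 1)


def pte_ppn(pte: int) -> int:
    """ENTRY field (physical page number), bits 29:10."""
    return (pte >> 10) & 0xFFFFF


def access_allowed(pte: int, access: str, user_mode: bool) -> bool:
    """Simplified permission check; `access` is 'load', 'store' or 'fetch'."""
    if not pte_flag(pte, "PRESENT"):
        return False
    if user_mode and not pte_flag(pte, "USER"):
        return False
    needed = {"load": "READ", "store": "WRITE", "fetch": "EXEC"}[access]
    return pte_flag(pte, needed)
```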

SATP (Supervisor Address Translation and Protection): a register, maintained by the CSR unit, that stores the virtual-memory configuration and controls page management. Its structure is:

Bits	31	30:22	19:0
Field	MODE	ASID	PPN

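MODE: Selects whether virtual address translation is enabled (in standard Sv32, 0 = Bare, no translation; 1 = Sv32).
ASID: Address space identifier of the current process, used to tell address spaces of different processes apart.
PPN: Physical page number of the first-level (root) page table.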
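Putting these pieces together, the four-state PTW can be sketched as a two-level walk that starts from satp.PPN. This assumes the MMU only starts the PTW when MODE enables translation, that each level is indexed by VPN[i] * 4 (4-byte PTEs, as in standard Sv32), that the walk always goes through both levels, and that a PTE with PRESENT clear raises a page fault. `read_word` is a hypothetical stand-in for the memory read path:

```python
PRESENT = 1 << 0


class PageFault(Exception):
    pass


def ptw_walk(satp: int, vaddr: int, read_word) -> tuple[int, int]:
    """Walk IDLE -> FIRST_LEVEL -> SECOND_LEVEL -> UPDATE.

    read_word(paddr) must return the 32-bit word at a physical address.
    Returns (leaf_pte, physical_address); raises PageFault on a missing PTE.
    """
    vpn1 = (vaddr >> 22) & 0x3FF
    vpn0 = (vaddr >> 12) & 0x3FF
    offset = vaddr & 0xFFF
    table_ppn = satp & 0xFFFFF               # satp.PPN, bits 19:0
    pte = 0
    state = "IDLE"
    while True:
        if state == "IDLE":                  # MMU has started the walk
            state = "FIRST_LEVEL"
        elif state == "FIRST_LEVEL":
            pte = read_word((table_ppn << 12) + vpn1 * 4)
            if not pte & PRESENT:
                raise PageFault("first-level PTE not present")
            table_ppn = (pte >> 10) & 0xFFFFF   # ENTRY points to the second-level table
            state = "SECOND_LEVEL"
        elif state == "SECOND_LEVEL":
            pte = read_word((table_ppn << 12) + vpn0 * 4)
            if not pte & PRESENT:
                raise PageFault("second-level PTE not present")
            state = "UPDATE"
        else:                                # UPDATE: leaf PTE goes to the TLB
            ppn = (pte >> 10) & 0xFFFFF
            return pte, ((ppn << 12) | offset) & 0xFFFFFFFF
```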

Dcache

1. I/O port

System Ports

I/O	name	width	purpose
input	clk	1	Timing
input	rst_n	1	Reset dcache at low

CPU Ports

I/O	name	width	purpose
input	cpu_daddr_i	32	Address from the CPU
input	cpu_data_i	32	Write data from the CPU
input	mask_i	4	Byte mask: a word is 4 bytes in a 32-bit CPU, and mask_i selects which of those bytes are accessed
input	cpu_req_wr	1	CPU wants to write data to the cache
input	cpu_req_rd	1	CPU wants to read data from the cache
output	cpu_data_o	32	Read data returned to the CPU
output	dcache_rdy_o	1	Handshake telling the CPU whether the cache can accept a request
output	dcache_vld_o	1	Handshake telling the CPU that the cache has completed the request
output	d_exception	2	Indicates whether a store or load access fault has occurred
input	invalidate_i	1	Invalidate specific cacheline
input	flush_i	1	Flush specific cacheline
input	writeback_i	1	Write all of the data back to DRAM

Memory Ports

I/O	name	width	purpose
output	mem_addr	32	Tell DRAM which address cache wants to write or read

Read

I/O	name	width	purpose
input	rm_rdy	1	Handshake signal telling the cache whether it can send a new read request to the bridge
input	rm_data	256	Data from DRAM
input	rm_success	1	Asserted on a successful cache read
input	rm_complete	1	Asserted when the cache read operation completes
output	rm_vld	1	Notify the bridge to start a read transaction

Write

I/O	name	width	purpose
input	wm_rdy	1	Handshake signal telling the cache whether it can send a new write request to the bridge
output	wm_data	256	Data to DRAM
input	wm_success	1	Asserted on successful cache write
input	wm_complete	1	Asserted when the cache write operation completes
output	wm_vld	1	Notify the bridge to start a write transaction

2. Description

This is a 32KB 2-way set-associative cache. The storage is implemented with single-port BRAM IPs. Each cache line is 256 bits (32 bytes), matching the 256-bit AXI4 data bus width. The cache uses a Least Recently Used (LRU) replacement policy and a write-back + write-allocate strategy to optimize bus bandwidth.
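With 32-byte lines and 2 ways, 32KB gives 32768 / (32 * 2) = 512 sets, so a 32-bit address splits into a 5-bit byte offset, a 9-bit set index, and an 18-bit tag. The tag-only model below sketches the lookup, LRU replacement, and write-back + write-allocate behaviour described above; it ignores data storage, BRAM latency, and the AXI side, and all names in it are hypothetical:

```python
LINE_BYTES = 32                                   # 256-bit cache line
WAYS = 2
SETS = 32 * 1024 // (LINE_BYTES * WAYS)           # 512 sets
OFFSET_BITS = (LINE_BYTES - 1).bit_length()       # 5
INDEX_BITS = (SETS - 1).bit_length()              # 9


def split_addr(addr: int) -> tuple[int, int, int]:
    """Split a 32-bit address into (tag, set index, byte offset)."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


class DcacheTags:
    """Tag-only model: 2 ways, one LRU bit per set, write-back + write-allocate."""

    def __init__(self):
        self.tags = [[None] * WAYS for _ in range(SETS)]
        self.dirty = [[False] * WAYS for _ in range(SETS)]
        self.lru = [0] * SETS                     # way to evict next in each set

    def access(self, addr: int, is_write: bool):
        """Return (hit, address of a dirty line written back to DRAM or None)."""
        tag, index, _ = split_addr(addr)
        ways = self.tags[index]
        victim = None
        if tag in ways:
            way, hit = ways.index(tag), True
        else:
            way, hit = self.lru[index], False     # replace the least recently used way
            if ways[way] is not None and self.dirty[index][way]:
                victim = (ways[way] << (OFFSET_BITS + INDEX_BITS)) | (index << OFFSET_BITS)
            ways[way] = tag                       # allocate on read and write misses
            self.dirty[index][way] = False
        if is_write:
            self.dirty[index][way] = True         # write-back: DRAM updated on eviction
        self.lru[index] = 1 - way                 # 2 ways: the other way becomes LRU
        return hit, victim
```

Addresses that differ by a multiple of 32 * 512 = 0x4000 map to the same set, which is the quickest way to exercise replacement in a testbench.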

Dcache_axiBus_bridge

1. I/O port

System Ports

I/O	name	width	purpose
input	aclk	1	Timing
input	aresetn	1	Reset at low

Data Cache Ports

I/O	name	width	purpose
input	mem_addr	32	Address from the cache for the DRAM read or write

Read

I/O	name	width	purpose
output	rm_rdy	1	Handshake signal telling the cache whether it can send a new read request to the bridge
output	rm_data	256	Data from DRAM
output	rm_success	1	Asserted on a successful read
output	rm_complete	1	Asserted when the read operation completes
input	rm_vld	1	Handshake signal telling the bridge that a new read request is available

Write

I/O	name	width	purpose
output	wm_rdy	1	Handshake signal telling the cache whether it can send a new write request to the bridge
input	wm_data	256	Data from the cache, to be written to DRAM
output	wm_success	1	Asserted on a successful write
output	wm_complete	1	Asserted when the write operation completes
input	wm_vld	1	Handshake signal telling the bridge that a new write request is available

AXI Ports(Only important ports)

AW Channel

I/O	name	width	purpose
output	M_AXI_AWADDR	32	Tell DRAM which address cache wants to write
input	M_AXI_AWREADY	1	Asserted when the write address channel is available
output	M_AXI_AWVALID	1	Asserted when the write address on M_AXI_AWADDR is valid

W Channel

I/O	name	width	purpose
input	M_AXI_WREADY	1	Asserted when the write channel is available
output	M_AXI_WDATA	256	Data to DRAM
output	M_AXI_WVALID	1	Asserted when the bridge wants to write data to dram
output	M_AXI_WLAST	1	Asserted when the data is the last data block of a burst

B Channel

I/O	name	width	purpose
input	M_AXI_BRESP	2	Indicates whether the write succeeded
input	M_AXI_BVALID	1	Asserted when a valid write response is available
output	M_AXI_BREADY	1	Asserted when the bridge can accept a write response

AR Channel

I/O	name	width	purpose
output	M_AXI_ARADDR	32	Tell DRAM which address cache wants to read
input	M_AXI_ARREADY	1	Asserted when the read address channel is available
output	M_AXI_ARVALID	1	Asserted when the read address on M_AXI_ARADDR is valid

R Channel

I/O	name	width	purpose
input	M_AXI_RDATA	256	Data from DRAM
input	M_AXI_RRESP	2	Indicates whether the read succeeded
input	M_AXI_RLAST	1	Asserted when the data is the last data block of a burst
input	M_AXI_RVALID	1	Asserted when valid read data is available
output	M_AXI_RREADY	1	Asserted when the bridge can accept read data

2. Description

The bridge is responsible for translating the custom D Cache memory interface into the standard AXI4 Full Protocol.

Independent Channel Design: To maximize throughput and exploit AXI parallelism, this bridge implements independent control logic for all five channels (AW, W, B, AR, R). This allows the system to issue new read/write addresses while data transfers are still in progress.
Handshake Protocol: The implementation strictly adheres to the AXI4 handshake rules: VALID signals never wait for READY to be asserted, preventing potential deadlocks in the system.
Design Philosophy: While this design prioritizes performance through independent channels, it serves as an educational reference. For simpler designs or area-constrained environments, AMD/Xilinx also suggests implementing a Single Sequence FSM to handle the channels sequentially, which reduces logic complexity at the cost of concurrency.
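A cycle-level sketch of why independent channels help on the write side: the AW and W handshakes complete whenever their own READY arrives, in either order, and the response is taken only after both. This assumes a single-beat write (a 256-bit line fits one 256-bit WDATA beat, so WLAST would be 1) and BREADY held high; the function name and trace format are hypothetical:

```python
def write_handshakes(awready, wready, bvalid):
    """Return the cycles at which the AW, W and B handshakes complete (None if never).

    awready, wready and bvalid are per-cycle lists of the slave-driven signals.
    The bridge raises AWVALID and WVALID in cycle 0 without waiting for READY and
    holds each one until its own handshake, so AW and W progress independently.
    BREADY is held high, so the response is taken on the first cycle BVALID is
    seen after both the address and the data have been accepted.
    """
    aw_done = w_done = b_done = None
    for cycle in range(len(awready)):
        if aw_done is None and awready[cycle]:   # AWVALID && AWREADY
            aw_done = cycle
        if w_done is None and wready[cycle]:     # WVALID && WREADY
            w_done = cycle
        both_done = aw_done is not None and w_done is not None
        if (b_done is None and both_done
                and cycle > max(aw_done, w_done) and bvalid[cycle]):  # BVALID && BREADY
            b_done = cycle
    return aw_done, w_done, b_done
```

In the trace awready = [0, 0, 1, 0, 0], wready = [1, 0, 0, 0, 0], bvalid = [0, 0, 0, 1, 0], the data is accepted in cycle 0, before the address (cycle 2), and the response arrives in cycle 3.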

I cache

1. I/O ports

System Ports

I/O	name	width	purpose
input	clk	1	Timing
input	rst_n	1	Reset icache at low

CPU Ports

I/O	name	width	purpose
input	pc_i	32	Program counter
input	invalidate_i	1	Command to invalidate all cache lines
output	icache_rdy_o	1	Indicates if I-cache is ready to accept a new PC
output	cpu_inst_o	32	Instruction fetched from pc_i
output	icache_vld_o	1	Indicates the output instruction is valid
output	i_exception	1	Indicates a fetch access fault

Memory Ports

I/O	name	width	purpose
output	mem_addr	32	Tell DRAM which address cache wants to write or read

Read

I/O	name	width	purpose
input	rm_rdy	1	Indicates the AXI Bridge is ready to process a read request
output	mem_addr	32	Target DRAM address for cache line refill
input	rm_success	1	Confirms a successful read transaction from DRAM
input	rm_complete	1	Indicates the burst read operation is finished
input	rm_data	256	The 256-bit cache line data returned from DRAM
output	req_rm	1	Request signal to read instruction from dram

2. Description

This is a 16KB 2-way Set-Associative Cache. The storage uses BRAM IPs, with a Cache Line size of 256 bits (32 bytes). It employs a First-In-First-Out (FIFO) replacement policy. To support fast context switching and software reloads, it integrates a DFF-based Valid array that allows for a 1-cycle global invalidation via the invalidate_i signal.
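The same arithmetic for the Icache gives 16384 / (32 * 2) = 256 sets, i.e. a 5-bit offset, an 8-bit index, and a 19-bit tag. The sketch below models FIFO replacement (the replacement pointer advances only on a refill, never on a hit) and the one-step global invalidation of the DFF valid array; tags are assumed to live in BRAM and are not cleared. Names are hypothetical:

```python
LINE_BYTES = 32                                   # 256-bit cache line
WAYS = 2
SETS = 16 * 1024 // (LINE_BYTES * WAYS)           # 256 sets
OFFSET_BITS = (LINE_BYTES - 1).bit_length()       # 5
INDEX_BITS = (SETS - 1).bit_length()              # 8


class IcacheTags:
    """Tag-only model with FIFO replacement and a flash-clearable valid array."""

    def __init__(self):
        self.valid = [[False] * WAYS for _ in range(SETS)]  # DFF array
        self.tags = [[0] * WAYS for _ in range(SETS)]       # BRAM contents
        self.fifo_ptr = [0] * SETS                          # next way to replace

    def fetch(self, pc: int) -> bool:
        """Return True on a hit; on a miss, refill the oldest way and return False."""
        index = (pc >> OFFSET_BITS) & (SETS - 1)
        tag = pc >> (OFFSET_BITS + INDEX_BITS)
        for way in range(WAYS):
            if self.valid[index][way] and self.tags[index][way] == tag:
                return True                       # hits do not change FIFO order
        way = self.fifo_ptr[index]
        self.tags[index][way] = tag
        self.valid[index][way] = True
        self.fifo_ptr[index] = (way + 1) % WAYS
        return False

    def invalidate(self):
        """invalidate_i: clear every valid bit at once (one cycle in hardware)."""
        self.valid = [[False] * WAYS for _ in range(SETS)]
```

Unlike LRU, a line that was just hit can still be the next victim: after lines A and B fill a set, re-fetching A does not protect it from being replaced by the next miss.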

Icache_axiBus_bridge

1. I/O port

System Ports

I/O	name	width	purpose
input	aclk	1	Timing
input	aresetn	1	Reset at low

Instruction Cache Ports

I/O	name	width	purpose
input	mem_addr	32	Target address for the instruction fetch
output	rm_rdy	1	Handshake signal telling the cache whether it can send a new request to the bridge
output	rm_data	256	Instruction cache line from DRAM
output	rm_success	1	Asserted on a successful read
output	rm_complete	1	Asserted when the read operation completes
input	req_rm	1	A handshake signal to inform the bridge the I cache submits a new request

AXI Ports(Only important ports)

AR Channel

I/O	name	width	purpose
output	M_AXI_ARADDR	32	Tell DRAM which address cache wants to read
input	M_AXI_ARREADY	1	Asserted when the read address channel is available
output	M_AXI_ARVALID	1	Asserted when the read address on M_AXI_ARADDR is valid

R Channel

I/O	name	width	purpose
input	M_AXI_RDATA	256	Data from DRAM
input	M_AXI_RRESP	2	Indicates whether the read succeeded
input	M_AXI_RLAST	1	Asserted when the data is the last data block of a burst
input	M_AXI_RVALID	1	Asserted when valid read data is available
output	M_AXI_RREADY	1	Asserted when the bridge can accept read data

2. Description

The Icache_axiBus_bridge is a lightweight, read-only AXI4 master that serves the I-cache's refill requests. Its AXI behavior is the same as that of the Dcache_axiBus_bridge.
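As a hedged sketch of the R side of a refill, the function below collects read beats until RLAST and treats any non-OKAY RRESP as a failed refill. The mapping onto rm_data, rm_success and rm_complete is an assumption about the bridge, not taken from its RTL. With a 256-bit line and 256-bit RDATA a refill fits in one beat, but the loop also handles longer bursts:

```python
OKAY = 0b00  # AXI RRESP/BRESP encoding: 0b01 EXOKAY, 0b10 SLVERR, 0b11 DECERR


def collect_read_burst(r_trace):
    """Consume R-channel beats until RLAST, with RREADY held high.

    r_trace: per-cycle tuples (rvalid, rdata, rresp, rlast) driven by the slave.
    Returns (data_beats, success, complete), loosely mirroring rm_data,
    rm_success and rm_complete.
    """
    beats, success = [], True
    for rvalid, rdata, rresp, rlast in r_trace:
        if not rvalid:            # no handshake this cycle
            continue
        beats.append(rdata)
        success = success and rresp == OKAY
        if rlast:                 # last beat of the burst
            return beats, success, True
    return beats, success, False
```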

cpu_axiLite_bridge

1. I/O port

System Ports

I/O	name	width	purpose
input	aclk	1	Timing
input	aresetn	1	Reset at low

CPU Interface

I/O	name	width	purpose
input	req_rd_mm	1	CPU request for a memory-mapped read operation
input	req_wr_mm	1	CPU request for a memory-mapped write operation
input	mm_addr_i	32	Target peripheral address for the access
input	mm_data_i	32	Data payload for write operations
output	mm_data_out	32	Data returned from the peripheral
output	mm_exception	2	Indicates an access fault
output	mm_rdy	1	Indicates if the bridge is ready to get a new request
output	mm_vld	1	Asserted when the reading or writing process is done

AXI-Lite Ports(Only important ports)

AW Channel

I/O	name	width	purpose
output	m_axi_lite_awvalid	1	Asserted when the write address on m_axi_lite_awaddr is valid
input	m_axi_lite_awready	1	Asserted when the write address channel is available
output	m_axi_lite_awaddr	32	Target peripheral address for the write

W Channel

I/O	name	width	purpose
input	m_axi_lite_wready	1	Asserted when the write channel is available
output	m_axi_lite_wdata	32	Data to target peripheral
output	m_axi_lite_wvalid	1	Asserted when the bridge wants to write data to target peripheral

B Channel

I/O	name	width	purpose
input	m_axi_lite_bresp	2	Indicates whether the write succeeded
input	m_axi_lite_bvalid	1	Asserted when a valid write response is available
output	m_axi_lite_bready	1	Asserted when the bridge can accept a write response

AR Channel

I/O	name	width	purpose
output	m_axi_lite_araddr	32	Target peripheral address (offset) for the read
input	m_axi_lite_arready	1	Asserted when the read address channel is available
output	m_axi_lite_arvalid	1	Asserted when the read address on m_axi_lite_araddr is valid

R Channel

I/O	name	width	purpose
input	m_axi_lite_rdata	32	Data from target peripheral
input	m_axi_lite_rresp	2	Indicates whether the read succeeded
input	m_axi_lite_rvalid	1	Asserted when valid read data is available
output	m_axi_lite_rready	1	Asserted when the bridge can accept read data

2. Description

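The cpu_axiLite_bridge is a lightweight bus master optimized for non-burst, single-word peripheral accesses. Its five channels follow the same handshake relationships as in AXI4; since AXI4-Lite has no bursts, each transaction carries a single 32-bit data transfer.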
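Because each AXI4-Lite transaction returns exactly one BRESP or RRESP, fault detection reduces to decoding that single response. A sketch, assuming SLVERR and DECERR are reported as access faults and assuming a hypothetical bit assignment for the 2-bit mm_exception (the real encoding is not documented here):

```python
OKAY, EXOKAY, SLVERR, DECERR = 0b00, 0b01, 0b10, 0b11   # AXI response codes

LOAD_FAULT = 0b01    # hypothetical mm_exception bit for a failed read
STORE_FAULT = 0b10   # hypothetical mm_exception bit for a failed write


def mm_exception(is_write: bool, resp: int) -> int:
    """Map the single BRESP (write) or RRESP (read) to a 2-bit fault code."""
    if resp in (SLVERR, DECERR):          # slave error or no peripheral at address
        return STORE_FAULT if is_write else LOAD_FAULT
    return 0
```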

Memory & Peripheral Related Diagram