Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

minor-cpu: unable to find destination due to unconditional branch followed by memory access #1130

Open
robhau opened this issue May 14, 2024 · 8 comments
Labels

Comments

@robhau
Copy link
Contributor

robhau commented May 14, 2024

Bug description

L1:
    ....
    j L3
L2:
    lw a1, 0(a0)
    ....
L3:
    ....

The program above can lead to an error when using MinorCPU. The reason is, that the memory access is started after the unconditional jump. If a0 does not point to a valid memory region, an error is thrown in BaseXBar::findPort.

gem5/src/mem/xbar.cc

Lines 333 to 370 in b279e40

PortID
BaseXBar::findPort(AddrRange addr_range, PacketPtr pkt)
{
// we should never see any address lookups before we've got the
// ranges of all connected CPU-side-port modules
assert(gotAllAddrRanges);
// Check the address map interval tree
auto i = portMap.contains(addr_range);
if (i != portMap.end()) {
return i->second;
}
// Check if this matches the default range
if (useDefaultRange) {
if (addr_range.isSubset(defaultRange)) {
DPRINTF(AddrRanges, " found addr %s on default\n",
addr_range.to_string());
return defaultPortID;
}
} else if (defaultPortID != InvalidPortID) {
DPRINTF(AddrRanges, "Unable to find destination for %s, "
"will use default port\n", addr_range.to_string());
return defaultPortID;
}
// We should use the range for the default port and it did not match,
// or the default port is not set. Dump out the port trace if possible.
std::string port_trace = "";
if (pkt) {
std::shared_ptr<TracingExtension> ext =
pkt->getExtension<TracingExtension>();
port_trace = ext ? ext->getTraceInString() :
"Use --debug-flags=PortTrace to see the port trace of the packet.";
}
fatal("Unable to find destination for %s on %s\n%s\n",
addr_range.to_string(), name(), port_trace);
}

src/mem/xbar.cc:368: fatal: Unable to find destination for [0:0x4] on system.membus
Use --debug-flags=PortTrace to see the port trace of the packet.

Affects version
Gem5 development branch, Commit SHA: b279e40

gem5 Modifications
No modifications.

To Reproduce
I pushed a minimal working example in https://github.com/robhau/gem5/tree/minor_cpu_bug_minimal_working_example

  1. Set path to multilib RISC-V compiler in firmware/Makefile
  2. bash execute.sh

Firmware code:

.global start
.type start,@function
start:
    .option push
    .option norelax
    la gp, __global_pointer
    .option pop
    la sp, __stack_top
    li a0, 0
    add a0, a1, a2
    add a0, a1, a2
    add a0, a1, a2
    add a0, a1, a2
    add a0, a1, a2
    add a0, a1, a2
    add a0, a1, a2
    add a0, a1, a2
    add a0, a1, a2
    add a0, a1, a2
    add a0, a1, a2
    li a1, 1
    j L2
L1:
    lw a2, 0(a0)
    lw a3, 4(a0)
    lw a4, 8(a0)
    j exit
L2:
    li a0, 0
    li a0, 0
    li a0, 0
    li a0, 0
    li a0, 0
    li a0, 0
    li a0, 0
    li a0, 0
    li a0, 0
    li a0, 0
    li a0, 0
    li a0, 0
    li a0, 0
    li a0, 0
    li a0, 0
    li a0, 0
    li a0, 0
    li a0, 0
    li a0, 0
    li a0, 0
    li a0, 0
    li a0, 0
    li a0, 0
    li a0, 0
    lui a0, 0x80000
    j L1

exit:
    wfi
    j exit

Makefile:

CC=/opt/riscv-multilib2/bin/riscv64-unknown-elf-
CC_FLAGS = -march=rv32imafc -mabi=ilp32f -mfdiv -mdiv
CC_FLAGS += -O3 -ffreestanding -Wall -ffreestanding -nostartfiles
CC_FLAGS += -Wl,-T,linker.ld
firmware.elf: start.s
	$(CC)gcc $(CC_FLAGS) start.s -o firmware.elf
	$(CC)objdump -D firmware.elf > firmware.txt

clean:
	rm firmware.elf

linker file:

OUTPUT_ARCH(riscv)
MEMORY
{
    rom(rx) : ORIGIN =  0x80000000, LENGTH = 0x8000000
    ram(!rx) : ORIGIN = ORIGIN(rom) + LENGTH(rom), LENGTH = 0x8000000
}

ENTRY(start)

SECTIONS
{
    . = ORIGIN(rom);
    .text :
    {
        *(.text)
        . = ALIGN(4);
    } > rom

    PROVIDE(__stack_top = ORIGIN(ram) + LENGTH(ram));
    PROVIDE(__global_pointer = ORIGIN(ram) + LENGTH(ram) / 2);
}

disassembly of compiled firmware:

firmware.elf:     file format elf32-littleriscv


Disassembly of section .text:

80000000 <start>:
80000000:	0c000197          	auipc	gp,0xc000
80000004:	00018193          	mv	gp,gp
80000008:	10000117          	auipc	sp,0x10000
8000000c:	ff810113          	addi	sp,sp,-8 # 90000000 <__stack_top>
80000010:	4501                	li	a0,0
80000012:	00c58533          	add	a0,a1,a2
80000016:	00c58533          	add	a0,a1,a2
8000001a:	00c58533          	add	a0,a1,a2
8000001e:	00c58533          	add	a0,a1,a2
80000022:	00c58533          	add	a0,a1,a2
80000026:	00c58533          	add	a0,a1,a2
8000002a:	00c58533          	add	a0,a1,a2
8000002e:	00c58533          	add	a0,a1,a2
80000032:	00c58533          	add	a0,a1,a2
80000036:	00c58533          	add	a0,a1,a2
8000003a:	00c58533          	add	a0,a1,a2
8000003e:	4585                	li	a1,1
80000040:	a029                	j	8000004a <L2>

80000042 <L1>:
80000042:	4110                	lw	a2,0(a0)
80000044:	4154                	lw	a3,4(a0)
80000046:	4518                	lw	a4,8(a0)
80000048:	a825                	j	80000080 <exit>

8000004a <L2>:
8000004a:	4501                	li	a0,0
8000004c:	4501                	li	a0,0
8000004e:	4501                	li	a0,0
80000050:	4501                	li	a0,0
80000052:	4501                	li	a0,0
80000054:	4501                	li	a0,0
80000056:	4501                	li	a0,0
80000058:	4501                	li	a0,0
8000005a:	4501                	li	a0,0
8000005c:	4501                	li	a0,0
8000005e:	4501                	li	a0,0
80000060:	4501                	li	a0,0
80000062:	4501                	li	a0,0
80000064:	4501                	li	a0,0
80000066:	4501                	li	a0,0
80000068:	4501                	li	a0,0
8000006a:	4501                	li	a0,0
8000006c:	4501                	li	a0,0
8000006e:	4501                	li	a0,0
80000070:	4501                	li	a0,0
80000072:	4501                	li	a0,0
80000074:	4501                	li	a0,0
80000076:	4501                	li	a0,0
80000078:	4501                	li	a0,0
8000007a:	80000537          	lui	a0,0x80000
8000007e:	b7d1                	j	80000042 <L1>

80000080 <exit>:
80000080:	10500073          	wfi
80000084:	bff5                	j	80000080 <exit>
	...

Disassembly of section .riscv.attributes:

00000000 <.riscv.attributes>:
   0:	4341                	li	t1,16
   2:	0000                	unimp
   4:	7200                	flw	fs0,32(a2)
   6:	7369                	lui	t1,0xffffa
   8:	01007663          	bgeu	zero,a6,14 <start-0x7fffffec>
   c:	0039                	c.nop	14
   e:	0000                	unimp
  10:	7205                	lui	tp,0xfffe1
  12:	3376                	.insn	2, 0x3376
  14:	6932                	flw	fs2,12(sp)
  16:	7032                	flw	ft0,44(sp)
  18:	5f31                	li	t5,-20
  1a:	326d                	jal	fffff9c4 <__stack_top+0x6ffff9c4>
  1c:	3070                	.insn	2, 0x3070
  1e:	615f 7032 5f31      	.insn	6, 0x5f317032615f
  24:	3266                	.insn	2, 0x3266
  26:	3270                	.insn	2, 0x3270
  28:	635f 7032 5f30      	.insn	6, 0x5f307032635f
  2e:	697a                	flw	fs2,156(sp)
  30:	32727363          	bgeu	tp,t2,356 <start-0x7ffffcaa>
  34:	3070                	.insn	2, 0x3070
  36:	7a5f 6d6d 6c75      	.insn	6, 0x6c756d6d7a5f
  3c:	7031                	c.lui	zero,0xfffec
  3e:	0030                	addi	a2,sp,8
  40:	0108                	addi	a0,sp,128
  42:	0b0a                	slli	s6,s6,0x2

Terminal Output
Only with debug flag Exec

gem5 Simulator System.  https://www.gem5.org
gem5 is copyrighted software; use the --copyright option for details.

gem5 version DEVELOP-FOR-24.0
gem5 compiled May 14 2024 12:52:02
gem5 started May 14 2024 13:21:00
gem5 executing on rh, pid 470668
command line: build/ALL/gem5.opt --debug-flags=Exec -r --outdir=cpu_minor configs/control_hazard.py

False
Global frequency set at 1000000000000 ticks per second
src/arch/riscv/isa.cc:276: info: RVV enabled, VLEN = 256 bits, ELEN = 64 bits
src/base/statistics.hh:279: warn: One of the stats is a legacy stat. Legacy stat is a stat that does not belong to any statistics::Group. Legacy stat is deprecated.
src/base/statistics.hh:279: warn: One of the stats is a legacy stat. Legacy stat is a stat that does not belong to any statistics::Group. Legacy stat is deprecated.
src/cpu/minor/execute.cc:166: warn: No functional unit for OpClass SimdUnitStrideSegmentedLoad
src/cpu/minor/execute.cc:166: warn: No functional unit for OpClass SimdUnitStrideSegmentedStore
system.remote_gdb: Listening for connections on port 7000
src/sim/simulate.cc:199: info: Entering event queue @ 0.  Starting simulation...
 296875: system.cpu: T0 : 0x80000000 @start    : auipc gp, 49152            : IntAlu :  D=0xffffffff8c000000
 312500: system.cpu: T0 : 0x80000004 @start+4    : addi gp, gp, 0             : IntAlu :  D=0xffffffff8c000000
 328125: system.cpu: T0 : 0x80000008 @start+8    : auipc sp, 65536            : IntAlu :  D=0xffffffff90000008
 343750: system.cpu: T0 : 0x8000000c @start+12    : addi sp, sp, -8            : IntAlu :  D=0xffffffff90000000
 359375: system.cpu: T0 : 0x80000010 @start+16    : c_li a0, 0                 : IntAlu :  D=0x0000000000000000
 375000: system.cpu: T0 : 0x80000012 @start+18    : add a0, a1, a2             : IntAlu :  D=0x0000000000000000
 390625: system.cpu: T0 : 0x80000016 @start+22    : add a0, a1, a2             : IntAlu :  D=0x0000000000000000
 406250: system.cpu: T0 : 0x8000001a @start+26    : add a0, a1, a2             : IntAlu :  D=0x0000000000000000
 421875: system.cpu: T0 : 0x8000001e @start+30    : add a0, a1, a2             : IntAlu :  D=0x0000000000000000
 437500: system.cpu: T0 : 0x80000022 @start+34    : add a0, a1, a2             : IntAlu :  D=0x0000000000000000
 453125: system.cpu: T0 : 0x80000026 @start+38    : add a0, a1, a2             : IntAlu :  D=0x0000000000000000
 468750: system.cpu: T0 : 0x8000002a @start+42    : add a0, a1, a2             : IntAlu :  D=0x0000000000000000
 484375: system.cpu: T0 : 0x8000002e @start+46    : add a0, a1, a2             : IntAlu :  D=0x0000000000000000
 500000: system.cpu: T0 : 0x80000032 @start+50    : add a0, a1, a2             : IntAlu :  D=0x0000000000000000
 515625: system.cpu: T0 : 0x80000036 @start+54    : add a0, a1, a2             : IntAlu :  D=0x0000000000000000
 531250: system.cpu: T0 : 0x8000003a @start+58    : add a0, a1, a2             : IntAlu :  D=0x0000000000000000
 546875: system.cpu: T0 : 0x8000003e @start+62    : c_li a1, 1                 : IntAlu :  D=0x0000000000000001
src/mem/xbar.cc:368: fatal: Unable to find destination for [0:0x4] on system.membus
Use --debug-flags=PortTrace to see the port trace of the packet.
Memory Usage: 448876 KBytes

Debug flags Exec,Minor

546875: system.cpu: T0 : 0x8000003e @start+62    : c_li a1, 1                 : IntAlu :  D=0x0000000000000001
 656250: system.cpu.execute.lsq.storeBuffer: StoreBuffer step numUnissuedAccesses: 0
 656250: system.cpu.execute: [tid:0] thread_interrupted?=0 isInbetweenInsts?=1
 671875: system.cpu.execute.lsq.storeBuffer: StoreBuffer step numUnissuedAccesses: 0
 671875: system.cpu.execute: [tid:0] thread_interrupted?=0 isInbetweenInsts?=1
 687500: system.cpu.execute.lsq.storeBuffer: StoreBuffer step numUnissuedAccesses: 0
 687500: system.cpu.execute: [tid:0] thread_interrupted?=0 isInbetweenInsts?=1
 703125: system.cpu.execute.lsq.storeBuffer: StoreBuffer step numUnissuedAccesses: 0
 703125: system.cpu.execute: [tid:0] thread_interrupted?=0 isInbetweenInsts?=1
 703125: system.cpu.execute: Attempting to issue [tid:0]
 703125: system.cpu.execute: Trying to issue inst: 0/1.1/2/18.18 pc: 0x80000040 (c_j) to FU: 0
 703125: system.cpu.execute: Issuing inst: 0/1.1/2/18.18 pc: 0x80000040 (c_j) into FU 0
 703125: system.cpu.execute: Reached inst issue limit
 703125: system.cpu.execute: Stepping to next inst inputIndex: 1
 703125: system.cpu.execute.lsq: No matching memory response for inst: 0/1.1/2/18.18 pc: 0x80000040 (c_j)
 718750: system.cpu.execute.lsq.storeBuffer: StoreBuffer step numUnissuedAccesses: 0
 718750: system.cpu.execute: [tid:0] thread_interrupted?=0 isInbetweenInsts?=1
 718750: system.cpu.execute: Attempting to issue [tid:0]
 718750: system.cpu.execute: Trying to issue inst: 0/1.1/2/19.19 pc: 0x80000042 (c_lw) to FU: 0
 718750: system.cpu.execute: Can't issue as FU: 0 isn't capable
 718750: system.cpu.execute: Trying to issue inst: 0/1.1/2/19.19 pc: 0x80000042 (c_lw) to FU: 1
 718750: system.cpu.execute: Can't issue as FU: 1 isn't capable
 718750: system.cpu.execute: Trying to issue inst: 0/1.1/2/19.19 pc: 0x80000042 (c_lw) to FU: 2
 718750: system.cpu.execute: Can't issue as FU: 2 isn't capable
 718750: system.cpu.execute: Trying to issue inst: 0/1.1/2/19.19 pc: 0x80000042 (c_lw) to FU: 3
 718750: system.cpu.execute: Can't issue as FU: 3 isn't capable
 718750: system.cpu.execute: Trying to issue inst: 0/1.1/2/19.19 pc: 0x80000042 (c_lw) to FU: 4
 718750: system.cpu.execute: Can't issue as FU: 4 isn't capable
 718750: system.cpu.execute: Trying to issue inst: 0/1.1/2/19.19 pc: 0x80000042 (c_lw) to FU: 5
 718750: system.cpu.execute: Can't issue as FU: 5 isn't capable
 718750: system.cpu.execute: Trying to issue inst: 0/1.1/2/19.19 pc: 0x80000042 (c_lw) to FU: 6
 718750: system.cpu.execute: Issuing inst: 0/1.1/2/19.19 pc: 0x80000042 (c_lw) into FU 6
 718750: system.cpu.execute.scoreboard0: Inst: 0/1.1/2/19.19 pc: 0x80000042 (c_lw) depends on execSeqNum: 0
 718750: system.cpu.execute: Memory ref inst: 0/1.1/2/19.19 pc: 0x80000042 (c_lw) must wait for inst 0(exec) before issuing
 718750: system.cpu.execute: Pushing mem inst: 0/1.1/2/19.19 pc: 0x80000042 (c_lw)
 718750: system.cpu.execute.scoreboard0: Marking up inst: 0/1.1/2/19.19 pc: 0x80000042 (c_lw) regIndex: 12 final numResults: 1 returnCycle: 49
 718750: system.cpu.execute: Reached inst issue limit
 718750: system.cpu.execute: Stepping to next inst inputIndex: 1
 718750: system.cpu.execute.lsq: No matching memory response for inst: 0/1.1/2/18.18 pc: 0x80000040 (c_j)
 734375: system.cpu.execute.lsq.storeBuffer: StoreBuffer step numUnissuedAccesses: 0
 734375: system.cpu.execute: [tid:0] thread_interrupted?=0 isInbetweenInsts?=1
 734375: system.cpu.execute: Attempting to commit [tid:0]
 734375: system.cpu.execute: Committing micro-ops for interrupt[tid:0]
 734375: system.cpu.execute: Trying to commit canCommitInsts: 1
 734375: system.cpu.execute: Trying to commit from mem FUs
 734375: system.cpu.execute: Issuing mem ref early inst: 0/1.1/2/19.19 pc: 0x80000042 (c_lw) instToWaitFor: 0
 734375: global: ExecContext setting PC: (0x80000042=>0x80000044).(0=>1)
 734375: system.cpu.execute: Initiating memRef inst: 0/1.1/2/19.19 pc: 0x80000042 (c_lw)
 734375: system.cpu.execute.lsq: Pushing request (load) addr: 0x0 size: 4 flags: 0x2 lineWidth : 0x40
 734375: system.cpu.execute.lsq: Setting state from NotIssued to InTranslation for request: 0/1.1/2/19.19 pc: 0x80000042 (c_lw)
 734375: system.cpu.execute.lsq: Submitting DTLB request
 734375: system.cpu.execute.lsq: Received translation response for request: 0/1.1/2/19.19 pc: 0x80000042 (c_lw) delayed:0 
 734375: system.cpu.execute.lsq: Setting state from InTranslation to Translated for request: 0/1.1/2/19.19 pc: 0x80000042 (c_lw)
 734375: system.cpu.execute.lsq: No forwardable data from store buffer
 734375: system.cpu.execute.lsq: Trying to send request: 0/1.1/2/19.19 pc: 0x80000042 (c_lw) addr: 0x0
src/mem/xbar.cc:368: fatal: Unable to find destination for [0:0x4] on system.membus
Use --debug-flags=PortTrace to see the port trace of the packet.
Memory Usage: 448876 KBytes

Expected behavior
The memory access should be canceled after the unconditional jump.

If i remove li a1, 1 from start:, it works.

Host Operating System
Ubuntu 22.04.4 LTS

Host ISA
X86

Compiler used
for gem5:
g++ version 11.4.0
linker: mold version 1.0.3
for firmware:
RISC-V GNU Compiler Toolchain (riscv64-unknown-elf-gcc version 13.2.0, GNU objdump version 2.42)

@robhau robhau added the bug label May 14, 2024
@robhau
Copy link
Contributor Author

robhau commented May 14, 2024

If i remove li a1, 1 from start:, it works.

Comment because it get lost in the text wall.

@robhau robhau changed the title cpu: control hazard in MinorCPU minor-cpu: unable to find destination due to unconditional branch followed by memory access May 28, 2024
@giactra
Copy link
Contributor

giactra commented May 30, 2024

This is a weird issue to happen on a in-order CPU. Is it possible this is a just RISCV issue (eg the instruction wrongly mislabelled) ?

I am looking at the line:

703125: system.cpu.execute.lsq: No matching memory response for inst: 0/1.1/2/18.18 pc: 0x80000040 (c_j)

And I don't understand why a jump instruction should be sent to the LSQ

@robhau
Copy link
Contributor Author

robhau commented May 30, 2024

Hi, thank you for your comment 😄.

Is it possible this is a just RISCV issue (eg the instruction wrongly mislabelled) ?

I do not think that the instruction is mislabelled, because at 0x80000040 there is also a jump instruction in the disassembly.

80000040: a029 j 8000004a <L2>

I am looking at the line:

703125: system.cpu.execute.lsq: No matching memory response for inst: 0/1.1/2/18.18 pc: 0x80000040 (c_j)

And I don't understand why a jump instruction should be sent to the LSQ

In getCommitingThread(), for head_inflight_inst it is checked if the instruction is not in the LSQ or if there is a response for the instruction. This results in the debug message (line 1712)

gem5/src/cpu/minor/execute.cc

Lines 1686 to 1751 in 65b86cf

inline ThreadID
Execute::getCommittingThread()
{
std::vector<ThreadID> priority_list;
switch (cpu.threadPolicy) {
case enums::SingleThreaded:
return 0;
case enums::RoundRobin:
priority_list = cpu.roundRobinPriority(commitPriority);
break;
case enums::Random:
priority_list = cpu.randomPriority();
break;
default:
panic("Invalid thread policy");
}
for (auto tid : priority_list) {
ExecuteThreadInfo &ex_info = executeInfo[tid];
bool can_commit_insts = !ex_info.inFlightInsts->empty();
if (can_commit_insts) {
QueuedInst *head_inflight_inst = &(ex_info.inFlightInsts->front());
MinorDynInstPtr inst = head_inflight_inst->inst;
can_commit_insts = can_commit_insts &&
(!inst->inLSQ || (lsq.findResponse(inst) != NULL));
if (!inst->inLSQ) {
bool can_transfer_mem_inst = false;
if (!ex_info.inFUMemInsts->empty() && lsq.canRequest()) {
const MinorDynInstPtr head_mem_ref_inst =
ex_info.inFUMemInsts->front().inst;
FUPipeline *fu = funcUnits[head_mem_ref_inst->fuIndex];
const MinorDynInstPtr &fu_inst = fu->front().inst;
can_transfer_mem_inst =
!fu_inst->isBubble() &&
fu_inst->id.threadId == tid &&
!fu_inst->inLSQ &&
fu_inst->canEarlyIssue &&
inst->id.execSeqNum > fu_inst->instToWaitFor;
}
bool can_execute_fu_inst = inst->fuIndex == noCostFUIndex;
if (can_commit_insts && !can_transfer_mem_inst &&
inst->fuIndex != noCostFUIndex)
{
QueuedInst& fu_inst = funcUnits[inst->fuIndex]->front();
can_execute_fu_inst = !fu_inst.inst->isBubble() &&
fu_inst.inst->id == inst->id;
}
can_commit_insts = can_commit_insts &&
(can_transfer_mem_inst || can_execute_fu_inst);
}
}
if (can_commit_insts) {
commitPriority = tid;
return tid;
}
}
return InvalidThreadID;
}

@giactra
Copy link
Contributor

giactra commented May 31, 2024

Hi @robhau

Hi, thank you for your comment 😄.

Is it possible this is a just RISCV issue (eg the instruction wrongly mislabelled) ?

I do not think that the instruction is mislabelled, because at 0x80000040 there is also a jump instruction in the disassembly.

80000040: a029 j 8000004a <L2>

By mislabelled I don't mean mis-decoded. Basically in gem5 an instruction is defined (labelled) with some StaticInstFlag which are used by the CPU model to understand which kind of instruction it is dealing with (see https://github.com/gem5/gem5/blob/stable/src/cpu/StaticInstFlags.py). Is the instruction defined with IsUncondControl?

I am looking at the line:
703125: system.cpu.execute.lsq: No matching memory response for inst: 0/1.1/2/18.18 pc: 0x80000040 (c_j)
And I don't understand why a jump instruction should be sent to the LSQ

In getCommitingThread(), for head_inflight_inst it is checked if the instruction is not in the LSQ or if there is a response for the instruction. This results in the debug message (line 1712)

gem5/src/cpu/minor/execute.cc

Lines 1686 to 1751 in 65b86cf

inline ThreadID
Execute::getCommittingThread()
{
std::vector<ThreadID> priority_list;
switch (cpu.threadPolicy) {
case enums::SingleThreaded:
return 0;
case enums::RoundRobin:
priority_list = cpu.roundRobinPriority(commitPriority);
break;
case enums::Random:
priority_list = cpu.randomPriority();
break;
default:
panic("Invalid thread policy");
}
for (auto tid : priority_list) {
ExecuteThreadInfo &ex_info = executeInfo[tid];
bool can_commit_insts = !ex_info.inFlightInsts->empty();
if (can_commit_insts) {
QueuedInst *head_inflight_inst = &(ex_info.inFlightInsts->front());
MinorDynInstPtr inst = head_inflight_inst->inst;
can_commit_insts = can_commit_insts &&
(!inst->inLSQ || (lsq.findResponse(inst) != NULL));
if (!inst->inLSQ) {
bool can_transfer_mem_inst = false;
if (!ex_info.inFUMemInsts->empty() && lsq.canRequest()) {
const MinorDynInstPtr head_mem_ref_inst =
ex_info.inFUMemInsts->front().inst;
FUPipeline *fu = funcUnits[head_mem_ref_inst->fuIndex];
const MinorDynInstPtr &fu_inst = fu->front().inst;
can_transfer_mem_inst =
!fu_inst->isBubble() &&
fu_inst->id.threadId == tid &&
!fu_inst->inLSQ &&
fu_inst->canEarlyIssue &&
inst->id.execSeqNum > fu_inst->instToWaitFor;
}
bool can_execute_fu_inst = inst->fuIndex == noCostFUIndex;
if (can_commit_insts && !can_transfer_mem_inst &&
inst->fuIndex != noCostFUIndex)
{
QueuedInst& fu_inst = funcUnits[inst->fuIndex]->front();
can_execute_fu_inst = !fu_inst.inst->isBubble() &&
fu_inst.inst->id == inst->id;
}
can_commit_insts = can_commit_insts &&
(can_transfer_mem_inst || can_execute_fu_inst);
}
}
if (can_commit_insts) {
commitPriority = tid;
return tid;
}
}
return InvalidThreadID;
}

You are getting the condition wrong:
(!inst->inLSQ || (lsq.findResponse(inst) != NULL));

Checks if there is a response in the LSQ only if the instruction is in the LSQ. So for whatever reason the Jump instruction gets added to the LSQ. This looks wrong to me. I would suggest you to debug the matter with GDB

@robhau
Copy link
Contributor Author

robhau commented Jun 3, 2024

Hi @giactra

Hi @robhau

Hi, thank you for your comment 😄.

Is it possible this is a just RISCV issue (eg the instruction wrongly mislabelled) ?

I do not think that the instruction is mislabelled, because at 0x80000040 there is also a jump instruction in the disassembly.
80000040: a029 j 8000004a <L2>

By mislabelled I don't mean mis-decoded. Basically in gem5 an instruction is defined (labelled) with some StaticInstFlag which are used by the CPU model to understand which kind of instruction it is dealing with (see https://github.com/gem5/gem5/blob/stable/src/cpu/StaticInstFlags.py). Is the instruction defined with IsUncondControl?

Thank you. The instructions are not mislabed.

Jumps (32 Bit format):

0x1b: JOp::jal({{
Rd = rvSext(NPC);
NPC = rvZext(PC + imm);
}}, IsDirectControl, IsUncondControl);

0x0: Jump::jalr({{
Rd = rvSext(NPC);
NPC = rvZext((imm + Rs1) & (~0x1));
}}, IsIndirectControl, IsUncondControl);
}

Compressed Jumps (16 bit format, used here):

0x5: CJOp::c_j({{
NPC = rvZext(PC + imm);
}}, IsDirectControl, IsUncondControl);

0x0: CJOp::c_jal({{
ra_sw = NPC_uw;
NPC_uw = PC_uw + imm;
}}, IsDirectControl, IsUncondControl, IsCall);

0x0: CJump::c_jr({{
if (RC1 == 0) {
return std::make_shared<IllegalInstFault>(
"source reg x0", machInst);
}
NPC = rvZext(Rc1 & (~0x1));
}}, IsIndirectControl, IsUncondControl);

default: CJump::c_jalr({{
ra = rvSext(NPC);
NPC = rvZext(Rc1 & (~0x1));
}}, IsIndirectControl, IsUncondControl, IsCall);

I am looking at the line:
703125: system.cpu.execute.lsq: No matching memory response for inst: 0/1.1/2/18.18 pc: 0x80000040 (c_j)
And I don't understand why a jump instruction should be sent to the LSQ

In getCommitingThread(), for head_inflight_inst it is checked if the instruction is not in the LSQ or if there is a response for the instruction. This results in the debug message (line 1712)

gem5/src/cpu/minor/execute.cc

Lines 1686 to 1751 in 65b86cf

inline ThreadID
Execute::getCommittingThread()
{
std::vector<ThreadID> priority_list;
switch (cpu.threadPolicy) {
case enums::SingleThreaded:
return 0;
case enums::RoundRobin:
priority_list = cpu.roundRobinPriority(commitPriority);
break;
case enums::Random:
priority_list = cpu.randomPriority();
break;
default:
panic("Invalid thread policy");
}
for (auto tid : priority_list) {
ExecuteThreadInfo &ex_info = executeInfo[tid];
bool can_commit_insts = !ex_info.inFlightInsts->empty();
if (can_commit_insts) {
QueuedInst *head_inflight_inst = &(ex_info.inFlightInsts->front());
MinorDynInstPtr inst = head_inflight_inst->inst;
can_commit_insts = can_commit_insts &&
(!inst->inLSQ || (lsq.findResponse(inst) != NULL));
if (!inst->inLSQ) {
bool can_transfer_mem_inst = false;
if (!ex_info.inFUMemInsts->empty() && lsq.canRequest()) {
const MinorDynInstPtr head_mem_ref_inst =
ex_info.inFUMemInsts->front().inst;
FUPipeline *fu = funcUnits[head_mem_ref_inst->fuIndex];
const MinorDynInstPtr &fu_inst = fu->front().inst;
can_transfer_mem_inst =
!fu_inst->isBubble() &&
fu_inst->id.threadId == tid &&
!fu_inst->inLSQ &&
fu_inst->canEarlyIssue &&
inst->id.execSeqNum > fu_inst->instToWaitFor;
}
bool can_execute_fu_inst = inst->fuIndex == noCostFUIndex;
if (can_commit_insts && !can_transfer_mem_inst &&
inst->fuIndex != noCostFUIndex)
{
QueuedInst& fu_inst = funcUnits[inst->fuIndex]->front();
can_execute_fu_inst = !fu_inst.inst->isBubble() &&
fu_inst.inst->id == inst->id;
}
can_commit_insts = can_commit_insts &&
(can_transfer_mem_inst || can_execute_fu_inst);
}
}
if (can_commit_insts) {
commitPriority = tid;
return tid;
}
}
return InvalidThreadID;
}

You are getting the condition wrong: (!inst->inLSQ || (lsq.findResponse(inst) != NULL));

Checks if there is a response in the LSQ only if the instruction is in the LSQ. So for whatever reason the Jump instruction gets added to the LSQ. This looks wrong to me. I would suggest you to debug the matter with GDB

Thank you for the clarification ☺️.

Sorry, I quoted the wrong code section. findResponse is called in evaluate():

gem5/src/cpu/minor/execute.cc

Lines 1554 to 1571 in 3b73071

for (auto const &info : executeInfo) {
if (!info.inFlightInsts->empty()) {
const QueuedInst &head_inst = info.inFlightInsts->front();
if (head_inst.inst->isNoCostInst()) {
head_inst_might_commit = true;
} else {
FUPipeline *fu = funcUnits[head_inst.inst->fuIndex];
if ((fu->stalled &&
fu->front().inst->id == head_inst.inst->id) ||
lsq.findResponse(head_inst.inst))
{
head_inst_might_commit = true;
break;
}
}
}
}

@robhau
Copy link
Contributor Author

robhau commented Jun 3, 2024

Could the branch target predictor be the problem?

The branch predictor predicts taken, because it is an unconditional branch.

bool
BPredUnit::predict(const StaticInstPtr &inst, const InstSeqNum &seqNum,
PCStateBase &pc, ThreadID tid, PredictorHistory* &hist)
{
assert(hist == nullptr);
// See if branch predictor predicts taken.
// If so, get its target addr either from the BTB or the RAS.
// Save off branch stuff into `hist` so we can correct the predictor
// if prediction was wrong.
BranchType brType = getBranchType(inst);
hist = new PredictorHistory(tid, seqNum, pc.instAddr(), inst);
stats.lookups[tid][brType]++;
ppBranches->notify(1);
/* -----------------------------------------------
* Get branch direction
* -----------------------------------------------
* Lookup the direction predictor for every
* conditional branch. For unconditional branches
* the direction is always taken
*/
if (inst->isUncondCtrl()) {
// Unconditional branches -----
hist->condPred = true;
} else {
// Conditional branches -------
++stats.condPredicted;
hist->condPred = lookup(tid, pc.instAddr(), hist->bpHistory);
if (hist->condPred) {
++stats.condPredictedTaken;
}
}
hist->predTaken = hist->condPred;
DPRINTF(Branch,
"[tid:%i, sn:%llu] Branch predictor predicted %i for PC:%#x %s\n",
tid, seqNum, hist->condPred, hist->pc, toString(brType));

After that, the branch target buffer is looked up. However, there is a miss, because we this is the first time this jump is executed
and the branch type is not considered.

// The direction is done now get the target address
// from BTB, RAS or indirect predictor.
hist->targetProvider = TargetProvider::NoTarget;
/* -----------------------------------------------
* Branch Target Buffer (BTB)
* -----------------------------------------------
* First check for a BTB hit. This will be done
* regardless of whether the RAS or the indirect
* predictor provide the final target. That is
* necessary as modern front-end does not have a
* chance to detect a branch without a BTB hit.
*/
stats.BTBLookups++;
const PCStateBase * btb_target = btb->lookup(tid, pc.instAddr(), brType);
if (btb_target) {
stats.BTBHits++;
hist->btbHit = true;
if (hist->predTaken) {
hist->targetProvider = TargetProvider::BTB;
set(hist->target, btb_target);
}
}

// @todo Create some sort of return struct that has both whether or not the
// address is valid, and also the address. For now will just use addr = 0 to
// represent invalid entry.
const PCStateBase *
SimpleBTB::lookup(ThreadID tid, Addr instPC, BranchType type)
{
stats.lookups[type]++;
BTBEntry *entry = findEntry(instPC, tid);
if (entry) {
return entry->target.get();
}
stats.misses[type]++;
return nullptr;
}

Therefore, hist->targetProvider stays TargetProvider::NoTarget. This results in an untaken branch.

if (hist->targetProvider == TargetProvider::NoTarget) {
set(hist->target, pc);
inst->advancePC(*hist->target);
hist->predTaken = false;
}

Debug trace:

531250: system.cpu.execute: Committing micro-ops for interrupt[tid:0]
 531250: system.cpu.execute: Trying to commit canCommitInsts: 1
 531250: system.cpu.execute: Trying to commit from FUs
 531250: global: ExecContext setting PC: (0x8000003a=>0x8000003e).(0=>1)
 531250: system.cpu.execute: Committing inst: 0/1.1/1/16.16 pc: 0x8000003a (add)
 531250: system.cpu.execute: tryToBranch before: (0x8000003a=>0x8000003e).(0=>1) after: (0x8000003a=>0x8000003e).(0=>1)
 531250: system.cpu.execute: Advancing current PC from: (0x8000003a=>0x8000003e).(0=>1) to: (0x8000003e=>0x80000042).(0=>1)
 531250: system.cpu.execute: Unstalling 0 for inst 0/1.1/1/16.16
 531250: system.cpu.execute: Completed inst: 0/1.1/1/16.16 pc: 0x8000003a (add)
 531250: system.cpu.execute: Reached inst commit limit
 531250: system.cpu: T0 : 0x8000003a @start+58    : add a0, a1, a2             : IntAlu :  D=0x0000000000000000
 546875: system.cpu.execute: Attempting to commit [tid:0]
 546875: system.cpu.execute: Committing micro-ops for interrupt[tid:0]
 546875: system.cpu.execute: Trying to commit canCommitInsts: 1
 546875: system.cpu.execute: Trying to commit from FUs
 546875: global: ExecContext setting PC: (0x8000003e=>0x80000040).(0=>1)
 546875: system.cpu.execute: Committing inst: 0/1.1/1/17.17 pc: 0x8000003e (c_li)
 546875: system.cpu.execute: tryToBranch before: (0x8000003e=>0x80000040).(0=>1) after: (0x8000003e=>0x80000040).(0=>1)
 546875: system.cpu.execute: Advancing current PC from: (0x8000003e=>0x80000040).(0=>1) to: (0x80000040=>0x80000044).(0=>1)
 546875: system.cpu.execute: Unstalling 0 for inst 0/1.1/1/17.17
 546875: system.cpu.execute: Completed inst: 0/1.1/1/17.17 pc: 0x8000003e (c_li)
 546875: system.cpu.execute: Reached inst commit limit
 546875: system.cpu: T0 : 0x8000003e @start+62    : c_li a1, 1                 : IntAlu :  D=0x0000000000000001
//
//
//
 671875: system.cpu.fetch2: Trying to predict for inst: 0/1.1/2/18 pc: 0x80000040 (c_j)
 671875: system.cpu.branchPred: [tid:0, sn:18] Branch predictor predicted 1 for PC:0x80000040 DirectUncond
 671875: system.cpu.branchPred: [tid:0, sn:18] PC:0x80000040 BTB:miss
 671875: system.cpu.branchPred: predict(tid:0, sn:18, PC:0x80000040, DirectUncond) -> taken:0, target:(0x80000042=>0x80000046).(0=>1) provider:NoTarget
 671875: system.cpu.branchPred: [tid:0] [sn:18] History entry added. predHist.size(): 1
//
//
//
 687500: system.cpu.fetch2: Not attempting prediction for inst: 0/1.1/2/19 pc: 0x80000042 (c_lw)
 703125: system.cpu.execute: Attempting to issue [tid:0]
 703125: system.cpu.execute: Trying to issue inst: 0/1.1/2/18.18 pc: 0x80000040 (c_j) to FU: 0
 703125: system.cpu.execute: Issuing inst: 0/1.1/2/18.18 pc: 0x80000040 (c_j) into FU 0
 703125: system.cpu.execute: Reached inst issue limit
 703125: system.cpu.execute: Stepping to next inst inputIndex: 1
 703125: system.cpu.fetch2: Not attempting prediction for inst: 0/1.1/2/20 pc: 0x80000044 (c_lw)
 718750: system.cpu.execute: Attempting to issue [tid:0]
 718750: system.cpu.execute: Trying to issue inst: 0/1.1/2/19.19 pc: 0x80000042 (c_lw) to FU: 0
 718750: system.cpu.execute: Can't issue as FU: 0 isn't capable
 718750: system.cpu.execute: Trying to issue inst: 0/1.1/2/19.19 pc: 0x80000042 (c_lw) to FU: 1
 718750: system.cpu.execute: Can't issue as FU: 1 isn't capable
 718750: system.cpu.execute: Trying to issue inst: 0/1.1/2/19.19 pc: 0x80000042 (c_lw) to FU: 2
 718750: system.cpu.execute: Can't issue as FU: 2 isn't capable
 718750: system.cpu.execute: Trying to issue inst: 0/1.1/2/19.19 pc: 0x80000042 (c_lw) to FU: 3
 718750: system.cpu.execute: Can't issue as FU: 3 isn't capable
 718750: system.cpu.execute: Trying to issue inst: 0/1.1/2/19.19 pc: 0x80000042 (c_lw) to FU: 4
 718750: system.cpu.execute: Can't issue as FU: 4 isn't capable
 718750: system.cpu.execute: Trying to issue inst: 0/1.1/2/19.19 pc: 0x80000042 (c_lw) to FU: 5
 718750: system.cpu.execute: Can't issue as FU: 5 isn't capable
 718750: system.cpu.execute: Trying to issue inst: 0/1.1/2/19.19 pc: 0x80000042 (c_lw) to FU: 6
 718750: system.cpu.execute: Issuing inst: 0/1.1/2/19.19 pc: 0x80000042 (c_lw) into FU 6
 718750: system.cpu.execute: Memory ref inst: 0/1.1/2/19.19 pc: 0x80000042 (c_lw) must wait for inst 0(exec) before issuing
 718750: system.cpu.execute: Pushing mem inst: 0/1.1/2/19.19 pc: 0x80000042 (c_lw)
 718750: system.cpu.execute: Reached inst issue limit
 718750: system.cpu.execute: Stepping to next inst inputIndex: 1
 718750: system.cpu.fetch2: Not attempting prediction for inst: 0/1.1/2/21 pc: 0x80000046 (c_lw)
 734375: system.cpu.execute: Attempting to commit [tid:0]
 734375: system.cpu.execute: Committing micro-ops for interrupt[tid:0]
 734375: system.cpu.execute: Trying to commit canCommitInsts: 1
 734375: system.cpu.execute: Trying to commit from mem FUs
 734375: system.cpu.execute: Issuing mem ref early inst: 0/1.1/2/19.19 pc: 0x80000042 (c_lw) instToWaitFor: 0
 734375: global: ExecContext setting PC: (0x80000042=>0x80000044).(0=>1)
 734375: system.cpu.execute: Initiating memRef inst: 0/1.1/2/19.19 pc: 0x80000042 (c_lw)
src/mem/xbar.cc:368: fatal: Unable to find destination for [0:0x4] on system.membus
Use --debug-flags=PortTrace to see the port trace of the packet.
Memory Usage: 386628 KBytes

@giactra
Copy link
Contributor

giactra commented Jun 3, 2024

Could the branch target predictor be the problem?

The branch predictor predicts taken, because it is an unconditional branch.

bool
BPredUnit::predict(const StaticInstPtr &inst, const InstSeqNum &seqNum,
PCStateBase &pc, ThreadID tid, PredictorHistory* &hist)
{
assert(hist == nullptr);
// See if branch predictor predicts taken.
// If so, get its target addr either from the BTB or the RAS.
// Save off branch stuff into `hist` so we can correct the predictor
// if prediction was wrong.
BranchType brType = getBranchType(inst);
hist = new PredictorHistory(tid, seqNum, pc.instAddr(), inst);
stats.lookups[tid][brType]++;
ppBranches->notify(1);
/* -----------------------------------------------
* Get branch direction
* -----------------------------------------------
* Lookup the direction predictor for every
* conditional branch. For unconditional branches
* the direction is always taken
*/
if (inst->isUncondCtrl()) {
// Unconditional branches -----
hist->condPred = true;
} else {
// Conditional branches -------
++stats.condPredicted;
hist->condPred = lookup(tid, pc.instAddr(), hist->bpHistory);
if (hist->condPred) {
++stats.condPredictedTaken;
}
}
hist->predTaken = hist->condPred;
DPRINTF(Branch,
"[tid:%i, sn:%llu] Branch predictor predicted %i for PC:%#x %s\n",
tid, seqNum, hist->condPred, hist->pc, toString(brType));

After that, the branch target buffer is looked up. However, there is a miss, because we this is the first time this jump is executed and the branch type is not considered.

I thought about this at the beginning (wondering why an unconditional branch was predicted as not-taken). However the following reasoning convinced me there's something more to it:

  1. If we are missing in the BTB, there are only two things we can do. One is stalling the front-end, the other is to predict a NT branch. I think always going for the second (even in the counter intuitive case of an unconditional branch) is a reasonable approach.

  2. A CPU should recover anyway from a misprediction as early as possible. In the O3CPU this happens after decode for a direct unconditional branch: https://github.com/gem5/gem5/blob/stable/src/cpu/o3/decode.cc#L712. This doesn't seem to be handled in MinorCPU. Instead this seems to be solved altogether when committing the branch: https://github.com/gem5/gem5/blob/stable/src/cpu/minor/execute.cc#L284C16-L284C27

My suspicion is that the load is issued before the branch commit, therefore before it can resteer the pipeline.

While we should solve it (maybe be adding a similar check as the one we have in the O3CPU?) it seems to me a "niche" corner case. It only happens when virtual memory is disabled (at boot) and therefore we are allowed to send an invalid request down the memory subsystem. Normally (with virtual memory on) the MMU would return an exception to the translation request, and no memory reference would go out of the LSQ.

Let me know if it makes sense to you

@robhau
Copy link
Contributor Author

robhau commented Jun 4, 2024

Thank you @giactra, makes sense to me.

A CPU should recover anyway from a misprediction as early as possible. In the O3CPU this happens after decode for a direct unconditional branch: https://github.com/gem5/gem5/blob/stable/src/cpu/o3/decode.cc#L712. This doesn't seem to be handled in MinorCPU. Instead this seems to be solved altogether when committing the branch: https://github.com/gem5/gem5/blob/stable/src/cpu/minor/execute.cc#L284C16-L284C27

Adapting the handling from O3CPU sounds good. Can we add it right after inserting a decoded instruction into the output buffer of the decode stage?

https://github.com/gem5/gem5/blob/dad5c7b6f7434ec7668192edffad83bc31a1d5f7/src/cpu/minor/decode.cc#L247C1-L248C32

While we should solve it (maybe be adding a similar check as the one we have in the O3CPU?) it seems to me a "niche" corner case. It only happens when virtual memory is disabled (at boot) and therefore we are allowed to send an invalid request down the memory subsystem. Normally (with virtual memory on) the MMU would return an exception to the translation request, and no memory reference would go out of the LSQ.

Since we only carry out bare metal simulations, we run into this problem quite often 😅.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants