fix(parser): fix potential overflows when parsing hexadecimal values (#608)

* test: add unit tests for immediate parsing

* doc: improve immediate param types doc

Also remove unnecessary computation of a well-known value.

* doc(parser): document immediate bounds

* test(parser): test immediates starting with #

* doc: document use of the # character.

* refactor: remove unnecessary increments

This makes it easier to move the logic to a separate method.

* refactor: unify unsigned immediate parsing

* add a method to parse immediate values;
* add unit tests;
* use the method in the parsing of unsigned immediates;
* unify the code to parse %U and %C.

* refactor(parser): remove broken feature

Removes the ability to do arithmetic when loading memory labels as
immediate values (e.g., daddi r1, r0, label+20).

This feature is undocumented and has been broken since the very first
release (there is a bug where the entire token "label+operand" is parsed
as an integer).

As part of the removal, add back an overflow check for the label
address. A test now correctly fails at parsing time rather than at
runtime, so change the expected exception in that test.

* refactor: clean up %I parsing

* use parseImmediate();
* remove unused (and problematic) Converter functions;
* update docs to mention the ability to use memory labels as immediates.

Fixes #450.

* refactor: move parseImmediate to Converter
lupino3 committed Aug 4, 2021
1 parent 8b70d46 commit 4e61be8
Showing 10 changed files with 189 additions and 344 deletions.
7 changes: 6 additions & 1 deletion docs/user/en/src/source-files-format.rst
@@ -147,14 +147,19 @@ Instructions can take three types of parameters:
 * *Immediate values* an immediate value can be a number or a
   label; the number can be specified in base 10 or in base 16: base 10 numbers
   are simply inserted by writing the number, while base 16 number are inserted
-  by putting before the number the prefix "0x"
+  by putting before the number the prefix "0x". Immediate values can be preceded
+  by the # character.
 * *Address* an address is composed by an immediate value followed
   by a register name enclosed in brackets. The value of the register will be
   used as base, the value of the immediate will be the offset.

 The size of immediate values is limited by the number of bits that are
 available in the bit encoding of the instruction.

+When 16-bit immediates can be used, for example in ALU I-Type instructions,
+it's also possible to use as an immediate value a memory label. The assembler
+will put as immediate value the memory address the label points to.
+
 You can use standard MIPS assembly aliases to address the first 32 registers,
 appending the alias to one of the standard register prefixes like "r", "\$"
 and "R". See the next table.
8 changes: 7 additions & 1 deletion docs/user/it/src/source-files-format.rst
@@ -156,14 +156,20 @@ Le istruzioni possono accettare tre tipi di parametri:
   un'etichetta; il numero può essere specificato in base 10 o in base 16. I
   numeri in base 10 sono inseriti semplicemente scrivendo il numero
   utilizzando l'usuale notazione decimale; i numeri in base 16 si inseriscono
-  aggiungendo all'inizio del numero il prefisso "0x";
+  aggiungendo all'inizio del numero il prefisso "0x". I valori immediati
+  possono essere preceduti dal carattere #;
 * *Indirizzi* un indirizzo è composto da un valore immediato
   seguito dal nome di un registro tra parentesi. Il valore del registro sarà
   usato come base, quello dell'immediato come offset.

 La dimensione dei valori immediati è limitata al numero di bit disponibili
 nella codifica associata all'istruzione.

+Nel caso di immediati a 16 bit, come ad esempio i valori immediati delle
+istruzioni ALU I-Type, è possibile utilizzare come valore immediato un'etichetta
+di memoria. L'assembler usera come valore immediato l'indirizzo della locazione
+di memoria a cui l'etichetta punta.
+
 è possibile utilizzare gli alias standard MIPS per i primi 32 registri,
 mettendo in coda ai prefissi standard per i registri ("r", "$", "R") uno
 degli alias indicati nella seguente tabella.
43 changes: 25 additions & 18 deletions src/main/java/org/edumips64/core/Converter.java
@@ -672,30 +672,37 @@ public static boolean isHexNumber(String num) {
     return true;
   }

-  /** Check if is a valid string for an immediate value (rough check on number of
-   * digits, the caller is responsible for checking the actual value).
-   *
-   * TODO: this is not very clean, this function should be removed and the caller
-   * should just convert to a number and check the value.
-   * @param imm the string to validate
-   * @return false if imm isn't a valid immediate, else true
+  /**
+   * Parses an immediate value without any overflow/underflow check.
+   *
+   * The immediate value may be preceded by the # character (which is ignored).
+   * The immediate value may be encoded in base 10 or in base 16. In the latter
+   * case, it must be preceded by the '0x' or '0X' prefix.
+   *
+   * If the # character is used in a base-16 immediate, it must precede the 0x prefix.
+   *
+   * @param immediate a string representing an immediate value.
+   * @throws NumberFormatException if the number is not well-formatted.
+   * @return the parsed integer value.
    */
-  public static boolean isImmediate(String imm) {
-    if (imm.length() == 0) {
-      return false;
+  public static long parseImmediate(String immediate) {
+    if (immediate.length() == 0) {
+      throw new NumberFormatException("Invalid immediate: empty string.");
     }
-
-    if (imm.charAt(0) == '#') {
-      imm = imm.substring(1);
+
+    // Skip the initial #, if present.
+    if (immediate.charAt(0) == '#') {
+      immediate = immediate.substring(1);
     }

-    if (isInteger(imm)) {
-      return true;
-    } else if (isHexNumber(imm) && imm.length() <= 6) {
-      return true;
+    // Check if it's a hexadecimal.
+    int base = 10;
+    if (immediate.length() >= 3 && immediate.substring(0, 2).compareToIgnoreCase("0x") == 0) {
+      immediate = immediate.substring(2);
+      base = 16;
     }

-    return false;
+    return Long.parseLong(immediate, base);
   }
 }
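
Since parseImmediate() performs no overflow/underflow check, the range check has to happen in the caller. The following is a minimal, standalone sketch of that division of work, not the project's actual caller code: the class name ImmediateDemo and the helper fitsSigned16() are hypothetical, and the parsing method simply mirrors the steps of the new method in this hunk.

```java
final class ImmediateDemo {
  // Mirrors the parsing steps of the new parseImmediate(): optional '#',
  // optional 0x/0X prefix, then Long.parseLong in base 10 or 16.
  static long parseImmediate(String immediate) {
    if (immediate.length() == 0) {
      throw new NumberFormatException("Invalid immediate: empty string.");
    }
    // Skip the initial #, if present.
    if (immediate.charAt(0) == '#') {
      immediate = immediate.substring(1);
    }
    // Check if it's a hexadecimal.
    int base = 10;
    if (immediate.length() >= 3 && immediate.substring(0, 2).compareToIgnoreCase("0x") == 0) {
      immediate = immediate.substring(2);
      base = 16;
    }
    return Long.parseLong(immediate, base);
  }

  // Hypothetical caller-side check for a signed 16-bit immediate field.
  static boolean fitsSigned16(long value) {
    return value >= -32768L && value <= 32767L;
  }

  public static void main(String[] args) {
    long ok = parseImmediate("#0x7FFF");
    long tooBig = parseImmediate("0xFFFF");
    System.out.println(ok + " fits: " + fitsSigned16(ok));
    System.out.println(tooBig + " fits: " + fitsSigned16(tooBig));
  }
}
```

The second input illustrates why counting characters is not enough: "0xFFFF" is only six characters long, yet its value (65535) does not fit in a signed 16-bit field, so the check has to be done on the parsed value.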

2 changes: 1 addition & 1 deletion src/main/java/org/edumips64/core/is/ALU_IType.java
@@ -44,7 +44,7 @@ public abstract class ALU_IType extends ComputationalInstructions {
   protected final static int RT_FIELD_LENGTH = 5;
   protected final static int RS_FIELD_LENGTH = 5;
   protected final static int IMM_FIELD_LENGTH = 16;
-  protected final static int IMM_FIELD_MAX = (int) Math.pow(2, IMM_FIELD_LENGTH - 1) - 1;
+  protected final static int IMM_FIELD_MAX = 32767; // 2^15-1
   protected String OPCODE_VALUE = "";

   // Needs to be mutable because LUI's syntax is %R,%I, and IMM_FIELD will be 1 in that case.
6 changes: 3 additions & 3 deletions src/main/java/org/edumips64/core/is/Instruction.java
@@ -117,9 +117,9 @@ public BitSet32 getRepr() {
    * Valid type placeholders:
    * %R General Purpose Register
    * %F Floating Point Register
-   * %I Immediate value (6 bits?)
-   * %U Unsigned Immediate (5 bits?)
-   * %C Unsigned Immediate (3 bits)
+   * %I Signed Immediate Value (16 bits) [also allows a memory label as a value]
+   * %U Unsigned Immediate Value (5 bits)
+   * %C Unsigned Immediate Value (3 bits)
    * %L Memory Label
    * %E Program Label used for Jump Instructions
    * %B Program Label used for Branch Instructions
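
The bit widths documented for %I, %U and %C determine the range of values each placeholder can hold. As a rough illustration only, the sketch below derives those ranges from the widths; the class FieldBounds and its methods are hypothetical names and are not part of the EduMIPS64 code base.

```java
final class FieldBounds {
  // Largest value of a signed two's-complement field: 2^(bits-1) - 1.
  static long signedMax(int bits) {
    return (1L << (bits - 1)) - 1;
  }

  // Smallest value of a signed two's-complement field: -2^(bits-1).
  static long signedMin(int bits) {
    return -(1L << (bits - 1));
  }

  // Largest value of an unsigned field: 2^bits - 1.
  static long unsignedMax(int bits) {
    return (1L << bits) - 1;
  }

  static boolean fitsSigned(long value, int bits) {
    return value >= signedMin(bits) && value <= signedMax(bits);
  }

  static boolean fitsUnsigned(long value, int bits) {
    return value >= 0 && value <= unsignedMax(bits);
  }

  public static void main(String[] args) {
    System.out.println("%I (signed, 16 bits): " + signedMin(16) + " .. " + signedMax(16));
    System.out.println("%U (unsigned, 5 bits): 0 .. " + unsignedMax(5));
    System.out.println("%C (unsigned, 3 bits): 0 .. " + unsignedMax(3));
  }
}
```

For a 16-bit signed field, signedMax(16) evaluates to 32767, the same value as the IMM_FIELD_MAX constant in ALU_IType.java.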