Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]:Ternary Operator in return statement causes (subsequent) stack corruption #25032

Open
damianmoz opened this issue May 14, 2024 · 14 comments

Comments

@damianmoz
Copy link

The following routine, without the ternary operator used in the return statement, works.

inline proc positive(z : string)
{   
    param EMPTY = '';
    
/*
 *  THIS WORKS 
 */ 
    if z == '~' then
        return  ' ';
    else if z == '+' then
        return z;
    else
        return EMPTY;
/*      
 *  THIS IS BROKEN
    return if z == '~' then ' ' else (if z == '+' then z else EMPTY);
 */     
}           

If the BROKEN return statement is enabled (and the earlier code disabled), it must do naughty things to the stack because subsequent assignment to a freshly allocated array of strings gets corrupted. Or am I using strings the wrong way?

Sample file (suitably trimmed of excess functionality) appears here:

inline proc positive(z : string)
{   
    param EMPTY = '';
    
/*
 *  THIS WORKS 
 */ 
    if z == '~' then
        return  ' ';
    else if z == '+' then
        return z;
    else
        return EMPTY;
/*      
 *  THIS IS BROKEN
    return if z == '~' then ' ' else (if z == '+' then z else EMPTY);
 */     
}           
        
proc itoa(a : int, p = '', s = '', g = 0, b = 10)
{       
    // capture the sign of the argument and if negative, use a minus sign
    // if positive, decide on whether to use plus sign, blank, or nothing
    const sign = if a < 0 then '-' else positive(p);
    // compute |a| safely if a == 2^-31
    var n = if a == min(int) then 1:uint + max(int):uint else abs(a):uint;
    // cast base 'b' to an unsigned nteger
    const _b = b:uint;
    // decode |a| into digits
    var m = 0;
    var t : [0..#20] uint(8);

    do
    {
        const j = n / _b;

        t[m] = (n - _b * j):uint(8); n = j; m += 1;
    }
    while n > 0;

    // writeln("digits ", t[0..#m]);

    // reverse the order
    {
        param NUL = '';
        const digit = "0123456789abcdef";
        var rt : [0..m] string;

        rt[0] = sign;
        // BETWEEN HERE and the JOIN below, parts of rt[0] get clobbered
        // writeln("rt[0] prior to assembly ", rt[0..#1]);
        // sometime during this loop, rt[0] is sometimes overwritten
        for i in 1..m
        {
            rt[i] = digit[t[m - i]];
        }
        // writeln("rt[0] after an assembly ", rt[0..#1]);
        // NORMALLY THIS is ''.join(rt)
        return '|'.join(rt);
    }
}

const _2p32 = (1 << 32):real(64);

// writeln(_2p32 * _2p32);
writeln(itoa(1234567, "+"));
writeln(itoa(-1234567, "+"));
writeln(itoa(max(int), "+"), " <--- MAXINT");
writeln(itoa(min(int), "+"));
writeln(itoa(max(int), "+", "_", 6), " <--- +MAXINT - not quite!!");
writeln(itoa(-max(int), "+", "_", 6), " <--- -MAXINT - not quite!!");
// writeln(itoa(0));
// writeln(itoa(0, "+"));
// writeln(itoa(1234567, "+", ",", 3));
// writeln(itoa(0x1234567f, s = "_", g = 2, b = 16));
// writeln(itoa(min(int), s = "_", g = 2, b = 16));
exit(0);

Tagging @mppf and @vasslitvinov. Thanks.

The above code is supposed to encode an integer in base 10 or base 16. The functionality removed for purposes here is related to handling grouping separators, e.g. a comma every three digits or an underscore every two digits in a hexadecimal number.

@damianmoz
Copy link
Author

This routine is normally protected by a wrapper which ensures things like the base b is either 10 or 16. As a I noted, ths is minimalist to facilitate bug detection. Thanks

@damianmoz
Copy link
Author

OUTPUT FOLLOWS:

Correct (no ternary operator)

+|1|2|3|4|5|6|7
-|1|2|3|4|5|6|7
+|9|2|2|3|3|7|2|0|3|6|8|5|4|7|7|5|8|0|7 <--- MAXINT
-|9|2|2|3|3|7|2|0|3|6|8|5|4|7|7|5|8|0|8
+|9|2|2|3|3|7|2|0|3|6|8|5|4|7|7|5|8|0|7 <--- +MAXINT
-|9|2|2|3|3|7|2|0|3|6|8|5|4|7|7|5|8|0|7 <--- -MAXINT

Rubbish (using ternary operator)

+|1|2|3|4|5|6|7
-|1|2|3|4|5|6|7
9|9|2|2|3|3|7|2|0|3|6|8|5|4|7|7|5|8|0|7 <--- MAXINT
-|9|2|2|3|3|7|2|0|3|6|8|5|4|7|7|5|0|0|8
2|2|2|2|3|3|7|2|0|3|6|8|5|4|7|7|5|8|0|7 <--- +MAXINT - not quite!!
-|9|2|2|3|3|7|2|0|3|6|8|5|4|7|7|0|0|0|7 <--- -MAXINT - not quite!!

Without '|' ->

Correct:

+1234567
-1234567
+9223372036854775807 <--- MAXINT
-9223372036854775808
+9223372036854775807 <--- +MAXINT
-9223372036854775807 <--- -MAXINT

Rubbish:

+1234567
-1234567
99223372036854775807 <--- MAXINT
-9223372036854775008
22223372036854775807 <--- +MAXINT - not quite!!

@vasslitvinov
Copy link
Member

@damianmoz a standard question: how critical is this bug in your work / is the workaround you proposed sufficient for you?

@damianmoz
Copy link
Author

It is not stopping me here because I debugged it. It was not fun and consumed way much time. And got lucky. What f I had not got lucky.

I use the ternary if operator all the time. And I mean all the time.

I am not looking forward to dong such debugging. t means you need to have a huge warning in the documentation that says SUSPECT. That it not good for the language. Scary too.

@jabraham17
Copy link
Member

jabraham17 commented May 14, 2024

Heres a simpler reproducer

proc positive( z : string)
{
    param EMPTY = '';
    return if z == '~' then ' ' else (if z == '+' then z else EMPTY);
}
const s = positive('+');
writeln(s);

The problem is that z is a reference, and the expression is creating an owning copy for the return value, which is then de-allocated when the function exits.

I can rewrite the return in subtle ways to get this to work again.

// 1
return if z == '~' then ' ' else (if z == '+' then z:string else EMPTY);
// 2
return if z == '~' then ' ' else (if z == '+' then '+' else EMPTY);
// 3
const temp = if z == '+' then z else EMPTY;
return if z == '~' then ' ' else temp;

@damianmoz
Copy link
Author

Thanks Jade. That is good to know. Still scary

To date, i.e. for the last decade, I have only used the ternary if with numbers. Which is why I have not seen this before. I have hardly ever used strings in the past.

I still think it needs to get fixed but I do not control your priorities.

So strings are broken when used with a multi-level ternary if

Strings seem a bit fragile.

@damianmoz
Copy link
Author

Jade, thanks for looking into this. Both //1 and //2 are not intuitive.

I did not try your //3 yesterday. But I did just now and it works flawlessly. Basically, these results say that the compiler can only handle a trivial ternary if, i.e. one without a nesting of if/then/else. This should be noted in the Chapel documentation - in very big bold letters. Or for people (like me), have a warning issued by the compiler (which cannot be suppressed).

So, to answer your question @vasslitvinov, Jade's solution is a bit clunky but that will do me for now. So, fix at your leisure. But I would suggest that the warning message is super urgent. That's dangerous.

I absolutely love ternary if statements. I fell in love with them in Algol68. I am from the "every statement has a value" camp. They are much more succinct than the equivalent with a normal if/then/else. In C or C+,+ that succinctness also results in far more optimal code! I looked at numerous versions of my smaller C (and Chapel) functions which approximate the function by different rules depending on the domain of the argument (mix). I note that >80% of them have ternary if statements which are 3 deep, and >30% are 4 deep. I even have some which are 6 deep.

This is the first time I have tried a ternary if with a string which is anything except 2 deep.

Thanks Jade, my stress levels have dropped significantly today since yesterday.

@damianmoz
Copy link
Author

It is not just strings with a ternary if that will have a problem. I guess that it will be anything which is creating a copy and owning it and all that stuff. Out of scope for my brain.

Should I close this now?

@mppf
Copy link
Member

mppf commented May 15, 2024

We should keep the issue open until the bug is fixed, or at the very least, the compiler is adjusted to emit a warning or error for this pattern.

@damianmoz
Copy link
Author

Should I be using c_array(T, n) where T is some type to mimic a char? But that what type is T and how do I then write a single char. Then at the end, I would want to return a genuine Chapel string? How do I create a string but I cannot find that mentioned in the doco, although I could be fishing with the wrong phrases so not looking in the right places? Not urgent.

@damianmoz
Copy link
Author

I have had (not string) scenarios where I had tried to pass a sliced array expression as a parameter to a routine whose matching argument was a const. I got a nice compiler message saying to the effect of "cannot handle this". So I had to stick the expression in question into a const temporary. Only added one extra line. Still very readable. That worked well. But the fatal error message from the compiler told me that there was something it could not handle. So, that message should be all that is needed.

In some ways forcing the user to create their own temporary is good because they then know what space needs they are demanding in a given scenario. That said, in my case it is a 32 byte buffer so it is peanuts.

@lydia-duncan
Copy link
Member

Should I be using c_array(T, n) where T is some type to mimic a char? But that what type is T and how do I then write a single char.

You can certainly try using a c_array where T is c_char or c_uchar or something along those lines. But in an ideal world, you shouldn't have to, this should be something that "just works".

Then at the end, I would want to return a genuine Chapel string? How do I create a string but I cannot find that mentioned in the doco, although I could be fishing with the wrong phrases so not looking in the right places? Not urgent.

I think what you're looking for is string.createAdoptingBuffer and string.createCopyingBuffer from https://chapel-lang.org/docs/latest/language/spec/strings.html#String.string.createAdoptingBuffer. The documentation for c_array says that it can coerce to a c_ptr with the same element type, though I'm not finding that true in practice - it looks like an explicit cast is necessary at the moment to send a c_array to those functions (I'm planning on reporting that in a separate issue)

@damianmoz
Copy link
Author

OK. c_array() sounds a little bit hairy.

I did write a new routine with strings yesterday and was ultra careful with my ternary operators and worked with copies and avoided referencing the original string as a whole. And no stack corruption. Woo hoo.

@vasslitvinov
Copy link
Member

Moving the discussion of "how to convert an int to a string" back to the discourse thread, I posted a version that uses c_array: https://chapel.discourse.group/t/encoding-an-integer-into-a-string-in-pure-chapel/33326/10

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants