-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[go] performance downgrade a lot after upgrade antlr from 4.9 to 4.13.1 #4613
Comments
I would need to see your grammar. There are a couple of small performance improvement patches that I intend to release very shortly. But I had debated just not allowing multiple go routines , but people said they used them. I did some work on a build configuration for this, but did not finish it yet. However, benchmarks are fickle. So: Did you first run one parse without measuring to load the ATNs etc? You can also cause it to do that before parsing, but that is the easiest way. It looks like you have have done that given the elapsed times? The version you were running was actually very broken. You are lucky it ran at all I think. I do not remember having to put that mutex in, but if I did, it is because it was needed in comparison to the Java target. This version definitely performs much better, but if your grammar is very ambiguous, then all bets are off. The previous runtime probably did not do half the work it should have done. If you supply your grammar, then I can take a look. But are you trying SLL parsing first? And using the DiagnosticErrorListener to work out where your grammar might be flakey? This runtime is definitely optimized for efficient grammars. Lexing is usually more CPU intensive than parsing. Hence I think that the issue might be that your grammar is being exposed with the new release. I will also check to see what state my single threaded grammar was in. FInally, you are using the latest Go compiler? |
Thanks for the very quick reply, @jimidle . My grammar can be found at https://github.com/milvus-io/milvus/blob/master/internal/parser/planparserv2/Plan.g4. This version has not been launched in our production environment yet, we just compared it to see if any performance improved.
|
Please note new release: https://github.com/antlr4-go/antlr/releases/tag/v4.13.1 This allows you to turn off mutexes with the build tag I will check your grammar and listener in a few days. But in the meantime, upgrade your go to the latest and check your grammar for efficiency and ambiguities. |
I took a Quick Look at the grammar that’s your issue. It is very ambiguous
and days not use left revision well for precedence. Binary ops are usually
higher precedence than everything else other than patens.
So the angle is likely calling your listener a ton of times my to try and
tell it about about, which it ignores. The listener then looks at tokens
and uses the locks. The new runtime will help with locks but don’t just
stop there. Let’s redo your grammar so that you don’t have the issue in the
first place.
Jim
…On Tue, May 14, 2024 at 20:33 Jiquan Long ***@***.***> wrote:
Thanks for the very quick reply, @jimidle <https://github.com/jimidle> .
My grammar can be found at
https://github.com/milvus-io/milvus/blob/master/internal/parser/planparserv2/Plan.g4.
This version has not been launched in our production environment yet, we
just compared it to see if any performance improved.
For other questions you mentioned:
1. We don't parse things in error, the syntax is correct.
2. Yes, we used an error listener, and it can be found at
https://github.com/milvus-io/milvus/blob/master/internal/parser/planparserv2/error_listener.go
.
3. Go version: go version go1.20.7 linux/amd64.
—
Reply to this email directly, view it on GitHub
<#4613 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAJ7TMFRJEO6INDOYC26NS3ZCLCRPAVCNFSM6AAAAABHWCHYRKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMJRGQ3DSMZSGA>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Sorry for the late reply, I'm not feeling well these days and had some rest. |
There are a number of improvements you can make here:
Your performance issue is caused by: | expr op1 = (LT | LE) (Identifier | JSONIdentifier) op2 = (LT | LE) expr # Range
| expr op1 = (GT | GE) (Identifier | JSONIdentifier) op2 = (GT | GE) expr # ReverseRange
| expr op = (LT | LE | GT | GE) expr # Relational
| expr op = (EQ | NE) expr Don't separate the case for Identifier and JSONIdentifier etc. Don't duplicate teh constructs. The parser is likely trying that path, then backtracking, which is why you cannot use SLL parsing. I have not tested this, but I think that is the issue. Everything should be just expression grammar Plan;
expr:
| LPAREN expr RPAREN # Parens
| LBRACK expr (COMMA expr)* COMMA? RBRACK # Array
| expr op = NOT? IN expr # Term
| expr BAND expr # BitAnd
| expr BXOR expr # BitXor
| expr BOR expr # BitOr
| expr POW expr # Power
| expr AND expr # LogicalAnd
| expr OR expr # LogicalOr
| op = (ADD | SUB | BNOT | NOT) expr # Unary
| expr LIKE StringLiteral # Like
// | '(' typeName ')' expr # Cast
| expr op = (MUL | DIV | MOD) expr # MulDivMod
| expr op = (ADD | SUB) expr # AddSub
| expr op = (SHL | SHR) expr # Shift
| expr op = (IN | NIN) # NinIn
| expr op1 = (LT | LE) expr op2 = (LT | LE) expr # Range
| expr op1 = (GT | GE) expr op2 = (GT | GE) expr # ReverseRange
| expr op = (LT | LE | GT | GE) expr # Relational
| expr op = (EQ | NE) expr # Equality
| (JSONContains | ArrayContains) LPAREN expr COMMA expr RPAREN # JSONContains
| (JSONContainsAll | ArrayContainsAll) LPAREN expr COMMA expr RPAREN # JSONContainsAll
| (JSONContainsAny | ArrayContainsAny) LPAREN expr COMMA expr RPAREN # JSONContainsAny
| ArrayLength'('(Identifier | JSONIdentifier)')' # ArrayLength
| EXISTS expr # Exists
| IntegerConstant # Integer
| FloatingConstant # Floating
| BooleanConstant # Boolean
| StringLiteral # String
| Identifier # Identifier
| JSONIdentifier # JSONIdentifier
| EmptyTerm # EmptyTerm
;
// typeName: ty = (BOOL | INT8 | INT16 | INT32 | INT64 | FLOAT | DOUBLE);
// BOOL: 'bool';
// INT8: 'int8';
// INT16: 'int16';
// INT32: 'int32';
// INT64: 'int64';
// FLOAT: 'float';
// DOUBLE: 'double';
LPAREN : '(' ;
RPAREN : ')' ;
COMMA : ',';
LBRACK : '[';
RBRACK : ']';
LT: '<';
LE: '<=';
GT: '>';
GE: '>=';
EQ: '==';
NE: '!=';
LIKE: 'like' | 'LIKE';
EXISTS: 'exists' | 'EXISTS';
ADD: '+';
SUB: '-';
MUL: '*';
DIV: '/';
MOD: '%';
POW: '**';
SHL: '<<';
SHR: '>>';
BAND: '&';
BOR: '|';
BXOR: '^';
AND: '&&' | 'and';
OR: '||' | 'or';
BNOT: '~';
NOT: '!' | 'not';
IN: 'in';
EmptyTerm: '[' (Whitespace | Newline)* ']';
// TODO: Don't do this, use case-insensitive lexer or [Jj][Ss][Oo][Nn] etc.
JSONContains: 'json_contains' | 'JSON_CONTAINS';
JSONContainsAll: 'json_contains_all' | 'JSON_CONTAINS_ALL';
JSONContainsAny: 'json_contains_any' | 'JSON_CONTAINS_ANY';
ArrayContains: 'array_contains' | 'ARRAY_CONTAINS';
ArrayContainsAll: 'array_contains_all' | 'ARRAY_CONTAINS_ALL';
ArrayContainsAny: 'array_contains_any' | 'ARRAY_CONTAINS_ANY';
ArrayLength: 'array_length' | 'ARRAY_LENGTH';
BooleanConstant: 'true' | 'True' | 'TRUE' | 'false' | 'False' | 'FALSE';
IntegerConstant:
DecimalConstant
| OctalConstant
| HexadecimalConstant
| BinaryConstant;
FloatingConstant:
DecimalFloatingConstant
| HexadecimalFloatingConstant;
Identifier: Nondigit (Nondigit | Digit)* | '$meta';
StringLiteral: EncodingPrefix? ('"' DoubleSCharSequence? '"' | '\'' SingleSCharSequence? '\'');
JSONIdentifier: Identifier('[' (StringLiteral | DecimalConstant) ']')+;
fragment EncodingPrefix: 'u8' | 'u' | 'U' | 'L';
fragment DoubleSCharSequence: DoubleSChar+;
fragment SingleSCharSequence: SingleSChar+;
fragment DoubleSChar: ~["\\\r\n] | EscapeSequence | '\\\n' | '\\\r\n';
fragment SingleSChar: ~['\\\r\n] | EscapeSequence | '\\\n' | '\\\r\n';
fragment Nondigit: [a-zA-Z_];
fragment Digit: [0-9];
fragment BinaryConstant: '0' [bB] [0-1]+;
fragment DecimalConstant: NonzeroDigit Digit* | '0';
fragment OctalConstant: '0' OctalDigit*;
fragment HexadecimalConstant: '0' [xX] HexadecimalDigitSequence;
fragment NonzeroDigit: [1-9];
fragment OctalDigit: [0-7];
fragment HexadecimalDigit: [0-9a-fA-F];
fragment HexQuad:
HexadecimalDigit HexadecimalDigit HexadecimalDigit HexadecimalDigit;
fragment UniversalCharacterName:
'\\u' HexQuad
| '\\U' HexQuad HexQuad;
fragment DecimalFloatingConstant:
FractionalConstant ExponentPart?
| DigitSequence ExponentPart;
fragment HexadecimalFloatingConstant:
'0' [xX] (
HexadecimalFractionalConstant
| HexadecimalDigitSequence
) BinaryExponentPart;
fragment FractionalConstant:
DigitSequence? '.' DigitSequence
| DigitSequence '.';
fragment ExponentPart: [eE] [+-]? DigitSequence;
fragment DigitSequence: Digit+;
fragment HexadecimalFractionalConstant:
HexadecimalDigitSequence? '.' HexadecimalDigitSequence
| HexadecimalDigitSequence '.';
fragment HexadecimalDigitSequence: HexadecimalDigit+;
fragment BinaryExponentPart: [pP] [+-]? DigitSequence;
fragment EscapeSequence:
'\\' ['"?abfnrtv\\]
| '\\' OctalDigit OctalDigit? OctalDigit?
| '\\x' HexadecimalDigitSequence
| UniversalCharacterName;
Whitespace: [ \t]+ -> skip;
Newline: ( '\r' '\n'? | '\n') -> skip;
|
@parrt This can be closed. The grammar was the problem, not ANTLR or the Go runtime. |
Thanks a lot for the very detailed suggestions, @jimidle . I'll have a try to fix it. |
Hello, guys, thanks for the awesome project, it saved a lot of time for us to build a parser. Recently, I tried to upgrade the antlr version from 4.9 to 4.13.1, and I followed the https://github.com/antlr/antlr4/blob/master/doc/go-target.md to do this. However, after both the antlr runtime and generated code was upgraded to 4.13, we didn't gain performance improvement. The flamegraph of pprof indicated that the newer version need more time to do the parsing, most of cpu time was consumed by the
sync.Mutex
innerATN.NextTokensNoContext
. I'm very confused about this because in fact a parser won't be used in multiple goroutines in our scenario. Can anyone give me some suggestions?Version 4.9, the parsing cost 25.42% of cpu:
Version 4.13, the parsing cost 35.99% of cpu:
The text was updated successfully, but these errors were encountered: