A script that parses XML takes only 0.00718 seconds to execute through Python, but it takes 17 seconds to execute through Python4J. Why is there such a difference in performance? #10044
Comments
@aizhimin thanks for posting! Can you post the script so I can look? Python version would help as well.
@agibsonccc The Python version is 3.10.2. The script is essentially `root = etree.fromstring(xml_data_str)`. It takes only 0.00718 seconds to execute through Python, but 17 seconds through Python4J. Is it slower to pass the binary data of a file as input from Java? Do I need to pass the file path in and let Python read the file?
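To check whether the input form matters on the Python side at all, both variants can be timed in plain Python first. This is a minimal sketch, assuming the standard-library `xml.etree.ElementTree` (the `etree` in the original script may well be lxml) and a tiny synthetic document. `SAMPLE_XML`, `parse_from_bytes`, `parse_from_path`, and `timed` are hypothetical names used only for illustration.

```python
import os
import tempfile
import time
import xml.etree.ElementTree as etree  # assumption: stdlib parser, not lxml

# Synthetic stand-in for the confidential file.
SAMPLE_XML = b"<root><item id='1'/><item id='2'/><item id='3'/></root>"


def parse_from_bytes(xml_data_bytes):
    """Parse XML that is already in memory (the bytes-as-input variant)."""
    return etree.fromstring(xml_data_bytes)


def parse_from_path(path):
    """Let Python open and parse the file itself (the path-as-input variant)."""
    return etree.parse(path).getroot()


def timed(fn, arg):
    """Return (result, elapsed_seconds) for a single call of fn(arg)."""
    start = time.perf_counter()
    result = fn(arg)
    return result, time.perf_counter() - start


# Write the sample to a temporary file so the path variant has something to read.
with tempfile.NamedTemporaryFile(suffix=".xml", delete=False) as tmp:
    tmp.write(SAMPLE_XML)
    sample_path = tmp.name

root_a, t_bytes = timed(parse_from_bytes, SAMPLE_XML)
root_b, t_path = timed(parse_from_path, sample_path)
os.remove(sample_path)

print(f"from bytes: {t_bytes:.6f}s, from path: {t_path:.6f}s")
```

If the two timings are close in plain Python, any large gap seen through Python4J would more likely come from the bridge (interpreter startup, imports, data transfer) than from the parser itself.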
@aizhimin can you give me something I can run standalone? If I'm going to benchmark something, you and I need a common baseline to work with.
@agibsonccc Sorry, my file is confidential. My question is this: XML parsing code run directly in a Python environment is very fast, but the same parsing run through Python4J is very slow. Is this because the Python parser has to be loaded? Or is it because the input parameters cannot carry file data streams?
@aizhimin I don't care about your secrets. A vague description I can't directly run isn't something I'm inclined to spend time on. I believe you, but putting up barriers to my reproducing the issue isn't going to help get this fixed. Meet me halfway and set up a trivial example you can show me, and I'll be more likely to take a look at this when I get time. The goal is to have a common "language" (in this case code) that lets us both run the same environment and baseline, so we can both agree when the issue is resolved.
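A standalone baseline of the kind requested above does not need the confidential file. This is a minimal sketch, assuming the standard-library `xml.etree.ElementTree` and a generated document. `make_xml` is a hypothetical helper, and the 10,000-item size is an arbitrary choice. It times the import separately from the parse, since the import cost is paid once per interpreter and could otherwise be mistaken for parse time.

```python
import time


def make_xml(n_items):
    """Build a synthetic, non-confidential XML document as UTF-8 bytes."""
    items = "".join(f'<item id="{i}">value {i}</item>' for i in range(n_items))
    return f"<root>{items}</root>".encode("utf-8")


# Time the import on its own: it is a one-time cost per interpreter.
t0 = time.perf_counter()
import xml.etree.ElementTree as etree  # assumption: stdlib parser, not lxml
t_import = time.perf_counter() - t0

xml_data_bytes = make_xml(10_000)  # arbitrary size for illustration

# Time only the parse call, matching the one-line script in the thread.
t0 = time.perf_counter()
root = etree.fromstring(xml_data_bytes)
t_parse = time.perf_counter() - t0

print(f"import: {t_import:.6f}s, parse: {t_parse:.6f}s, children: {len(root)}")
```

Running the same script body from the Java side with the same generated input would give both parties numbers measured against an identical baseline.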
The Python script looks like this:
The Java code looks like this:
Through Java it takes 2.193 s. With Python alone, it takes only 0.0119 s.
Issue Description
A script that parses XML takes only 0.00718 seconds to execute through Python, but it takes 17 seconds to execute through Python4J. Why is there such a difference in performance?
Version Information
Please indicate relevant versions, including, if relevant:
The Java code:
`public static void parserOverlay() {
    try (PythonGIL pythonGIL = PythonGIL.lock()) {
        try (PythonGC gc = PythonGC.watch()) {
            // inputs: read the whole XML file into memory and pass it to Python as bytes
            // (backslashes in the Windows path must be escaped in a Java string literal)
            byte[] xml_data_bytes = FileUtils.readFileToByteArray(new File("D:\\software\\BaiduNetdisk\\download\\231108\\231108\\MGT\\TE01214\\KTOVLRAW_TE01214_OL.xml"));
            List<PythonVariable> inputs = new ArrayList<>();
            inputs.add(new PythonVariable<>("xml_data_bytes", PythonTypes.BYTES, xml_data_bytes));
What's wrong?