[BUG] TypeError: 'JavaPackage' object is not callable after calling feathr_init_script.py #1217

Open · 1 of 4 tasks
alexander-pv opened this issue Aug 31, 2023 · 0 comments
Labels: bug (Something isn't working)
Willingness to contribute

Yes. I can contribute a fix for this bug independently.

Feathr version

1.0.0

System information

  • OS Platform and Distribution: Linux Ubuntu 22.04 (Jammy), container, tag: jupyter/pyspark-notebook:python-3.9.13
  • Python version: 3.9.13
  • Spark version, if reporting runtime issue: 3.3.3

Describe the problem

Hi, thanks for your work! I really enjoyed studying your project. I have now deployed it as a group of services and ran into an error with the feathr_init_script.py script:

Error details:
23/08/31 16:03:04 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://spark-master:7077...
23/08/31 16:03:04 INFO TransportClientFactory: Successfully created connection to spark-master/172.20.0.5:7077 after 16 ms (0 ms spent in bootstraps)
23/08/31 16:03:04 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app ID app-20230831160304-0001
23/08/31 16:03:04 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20230831160304-0001/0 on worker-20230831151258-172.20.0.7-39611 (172.20.0.7:39611) with 12 core(s)
23/08/31 16:03:04 INFO StandaloneSchedulerBackend: Granted executor ID app-20230831160304-0001/0 on hostPort 172.20.0.7:39611 with 12 core(s), 1024.0 MiB RAM
23/08/31 16:03:04 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 37479.
23/08/31 16:03:04 INFO NettyBlockTransferService: Server created on localhost:37479
23/08/31 16:03:04 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
23/08/31 16:03:04 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, localhost, 37479, None)
23/08/31 16:03:04 INFO BlockManagerMasterEndpoint: Registering block manager localhost:37479 with 434.4 MiB RAM, BlockManagerId(driver, localhost, 37479, None)
23/08/31 16:03:04 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, localhost, 37479, None)
23/08/31 16:03:04 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, localhost, 37479, None)
23/08/31 16:03:04 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20230831160304-0001/0 is now RUNNING
23/08/31 16:03:04 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
pyspark_client.py: Preprocessing via UDFs and submit Spark job.
FeatureJoinConfig is provided. Executing FeatureJoinJob.
submit_spark_job: feature_names_funcs: 
{'f_location_avg_fare,f_location_max_fare': <function preprocessing at 0x7f191a3ee9d0>}
set(feature_names_funcs.keys()): 
{'f_location_avg_fare,f_location_max_fare'}
submit_spark_job: Load DataFrame from Scala engine.
Traceback (most recent call last):
  File "/tmp/tmpv8dcsccs/feathr_pyspark_driver.py", line 107, in <module>
    submit_spark_job(feature_names_funcs)
  File "/tmp/tmpv8dcsccs/feathr_pyspark_driver.py", line 64, in submit_spark_job
    dataframeFromSpark = py4j_feature_job.loadSourceDataframe(
TypeError: 'JavaPackage' object is not callable
23/08/31 16:03:05 INFO SparkContext: Invoking stop() from shutdown hook
23/08/31 16:03:05 INFO SparkUI: Stopped Spark web UI at http://localhost:4040
23/08/31 16:03:05 INFO StandaloneSchedulerBackend: Shutting down all executors
23/08/31 16:03:05 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down
23/08/31 16:03:05 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
23/08/31 16:03:05 INFO MemoryStore: MemoryStore cleared
23/08/31 16:03:05 INFO BlockManager: BlockManager stopped
23/08/31 16:03:05 INFO BlockManagerMaster: BlockManagerMaster stopped
23/08/31 16:03:05 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
23/08/31 16:03:05 INFO SparkContext: Successfully stopped SparkContext
23/08/31 16:03:05 INFO ShutdownHookManager: Shutdown hook called
23/08/31 16:03:05 INFO ShutdownHookManager: Deleting directory /tmp/spark-2a93ad8c-6d18-4542-b2ab-6c735cb953ba
23/08/31 16:03:05 INFO ShutdownHookManager: Deleting directory /tmp/spark-479df04e-793a-4c82-a245-660169eab9fa/pyspark-d8b933f6-acf7-4445-b0fe-89ad2537e846
23/08/31 16:03:05 INFO ShutdownHookManager: Deleting directory /tmp/spark-479df04e-793a-4c82-a245-660169eab9fa
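
For context on the failure itself: in py4j, an attribute chain on the gateway (jvm.com.linkedin...) resolves to a JavaPackage object whenever the target class is not actually on the JVM classpath, and calling a JavaPackage raises exactly this TypeError. Below is a minimal probe that can be run with the same session configuration; the class path com.linkedin.feathr.offline.job.FeatureJoinJob is my guess at what the driver resolves, so substitute whatever feathr_pyspark_driver.py actually references:

# Minimal classpath probe, assuming the same SparkSession configuration as the
# failing job. The class name below is an assumption; substitute the class that
# feathr_pyspark_driver.py resolves through the py4j gateway.
from py4j.java_gateway import JavaClass, JavaPackage
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
jvm = spark.sparkContext._jvm

candidate = jvm.com.linkedin.feathr.offline.job.FeatureJoinJob  # assumed class path
if isinstance(candidate, JavaPackage):
    print("Feathr classes are NOT on the JVM classpath (resolved to JavaPackage)")
elif isinstance(candidate, JavaClass):
    print("Feathr class found on the classpath")

If the probe reports JavaPackage, the Feathr assembly JAR never reached the driver's JVM, which would match the traceback above.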

My docker-compose file: Link. Short manual for my example: Link.
Since I'm not very good at understanding Java logs, I am asking for help in debugging the error. Once it is solved, I will be glad to contribute.
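
If the root cause is indeed a missing JAR, the usual remedy is to put the Feathr assembly JAR on both the driver and executor classpaths when the session is built. A sketch of what I mean (the JAR path is an assumption from my container layout, not a Feathr default):

# Sketch: attach the Feathr assembly JAR to the Spark session. The path below
# is an assumption for my container layout; adjust it to wherever the JAR lives.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("spark://spark-master:7077")
    .appName("feathr_init_script")
    # spark.jars ships the JAR to the executors and adds it to the driver classpath.
    .config("spark.jars", "/opt/feathr/feathr-assembly-1.0.0.jar")
    .getOrCreate()
)

The equivalent when submitting the driver script directly is the --jars flag of spark-submit with the same path.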

Tracking information

Same error log as in “Describe the problem” above.

Code to reproduce bug

No response

What component(s) does this bug affect?

  • Python Client: This is the client users use to interact with most of our API. Mostly written in Python.
  • Computation Engine: The computation engine that executes the actual feature join and generation work. Mostly in Scala and Spark.
  • Feature Registry API: The frontend API layer that supports SQL and Purview (Atlas) as storage. The API layer is in Python (FastAPI).
  • Feature Registry Web UI: The web UI for the feature registry. Written in React.