This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

[bug report] When a task was cloned, the TensorBoard port was not regenerated, so the TensorBoard could not be started. #5721

Open
siaimes opened this issue Feb 25, 2022 · 1 comment


siaimes commented Feb 25, 2022

Organization Name:

Short summary about the issue/question:
When a job is cloned, the TensorBoard port is not regenerated, so TensorBoard cannot be started in the cloned job.
Brief what process you are following:
Clone a job.

How to reproduce it:
Submit a job with TensorBoard enabled, then clone it. The TensorBoard section of the YAML configuration is exactly the same in both jobs, which causes the cloned job to fail to start TensorBoard.
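
The collision can be illustrated with a minimal Python sketch. The key names (`tensorboard`, `port`, `log_dir`) and the values are hypothetical placeholders chosen for illustration, not the actual OpenPAI job schema; the point is only that a clone that copies the configuration verbatim carries the original port along with it.

```python
import copy


def clone_job(job_config):
    """Naive clone: a deep copy keeps every field, including the port."""
    return copy.deepcopy(job_config)


# Hypothetical job config; key names and values are illustrative only,
# not the real OpenPAI YAML schema.
original = {
    "name": "train-job",
    "tensorboard": {"port": 6006, "log_dir": "/mnt/tensorboard"},
}

cloned = clone_job(original)
cloned["name"] = "train-job-clone"

# The TensorBoard section is identical in both jobs, so both ask for
# the same port.
same_tensorboard = cloned["tensorboard"] == original["tensorboard"]
```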

OpenPAI Environment:

  • OpenPAI version: v1.8.0
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Hardware (e.g. core number, memory size, storage size, GPU type etc.):
  • Others:

Anything else we need to know:


siaimes commented Mar 2, 2022

Instead of specifying the port number in the YAML file, we should pick an available port the same way the SSH feature does and report it back to the frontend.
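
As a rough sketch of what "pick an available port" could look like, the Python snippet below asks the operating system for a free port by binding to port 0. This is a generic technique, not necessarily how the OpenPAI SSH feature allocates its port; the function name `pick_free_port` is hypothetical.

```python
import socket


def pick_free_port(host="127.0.0.1"):
    """Ask the OS for a currently unused TCP port (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Binding to port 0 lets the OS choose any free port.
        sock.bind((host, 0))
        return sock.getsockname()[1]


port = pick_free_port()
```

Note that the port is released when the socket closes, so another process could take it before TensorBoard binds to it. A scheduler-side allocation would avoid that race.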

Binyang2014 added the bug label Mar 3, 2022