
JMX metrics for JVM monitoring stay up if the JMX port goes down on a VM #1183

Open
kaio-ru opened this issue Apr 3, 2023 · 0 comments

kaio-ru commented Apr 3, 2023

I have a few VMs in Compute Engine running Java applications on RHEL 8. I can monitor them over JMX, and I would like to set up an alert so that when one of those processes dies, Google Monitoring sends out an alarm that the JMX port is down.

Describe the bug
I have set up an alerting policy on workload.googleapis.com/jvm.threads.count for this. When I kill one of the Java processes, the opentelemetry-collector keeps behaving as if the JMX port were still up. There is still a process java -Dorg.slf4j.simpleLogger.defaultLogLevel=info io.opentelemetry.contrib.jmxmetrics.JmxMetrics -config /tmp/jmx-config-3146301998.properties running, and it appears to keep feeding GCP this data. Only when I restart the google-cloud-ops-agent-opentelemetry-collector service on the server does jvm.threads.count drop and an alarm get sent out.

/tmp/jmx-config-3146301998.properties file:

otel.exporter.otlp.endpoint = http://0.0.0.0:42861
otel.exporter.otlp.timeout = 5000
otel.jmx.interval.milliseconds = 60000
otel.jmx.service.url = service:jmx:rmi:///jndi/rmi://localhost:8565/jmxrmi
otel.jmx.target.system = jvm
otel.metrics.exporter = otlp
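
A fresh connection to that JMX service URL fails immediately once the Java process is gone, which is what I would expect the collector to notice on its next scrape. A minimal sketch of what I mean (a plain JMX client against the URL from the config above, not the collector's actual code):

import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JmxProbe {
    public static void main(String[] args) throws Exception {
        // Same service URL as in the generated jmx-config properties file.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:8565/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // jvm.threads.count ultimately comes from the Threading MXBean.
            ThreadMXBean threads = ManagementFactory.newPlatformMXBeanProxy(
                    conn, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class);
            System.out.println("thread count = " + threads.getThreadCount());
        } catch (IOException e) {
            // As soon as the application is stopped, this path is taken.
            System.out.println("JMX endpoint unreachable: " + e.getMessage());
        }
    }
}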

To Reproduce
Steps to reproduce the behavior:

  1. Set up JMX monitoring and an alert on the metric jvm.threads.count: if the metric is above 1 all is good, if it is not, send out an alarm
  2. Start the Java application (a minimal stand-in is sketched after this list); the metric is over 1
  3. Stop the Java application; the JMX port on the virtual machine goes down, but the metric stays above 1 in Google Monitoring
  4. Restart the google-cloud-ops-agent-opentelemetry-collector.service unit
  5. jvm.threads.count drops in Google Monitoring as well and the alarm is triggered
  6. Start the Java application again; the metric goes back over 1 and the alarm clears
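
For step 2, any long-running JVM with remote JMX enabled on port 8565 reproduces this. The DummyApp below is a hypothetical stand-in, not my actual application, launched with the standard com.sun.management.jmxremote flags so the port matches the receiver config:

// Minimal stand-in for the monitored application: it only stays alive so the
// JMX gatherer has something to scrape. Launch it with the standard remote-JMX
// flags, matching the port from the receiver config:
//
//   java -Dcom.sun.management.jmxremote.port=8565 \
//        -Dcom.sun.management.jmxremote.authenticate=false \
//        -Dcom.sun.management.jmxremote.ssl=false \
//        DummyApp
public class DummyApp {
    public static void main(String[] args) throws InterruptedException {
        // Keep the JVM (and with it the JMX endpoint on localhost:8565) up until killed.
        Thread.currentThread().join();
    }
}

Killing this process takes the JMX port down exactly as in step 3.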

(Screenshot: test dashboard in the Google Cloud console Monitoring page)

Expected behavior
The alarm should be triggered without restarting the google-cloud-ops-agent-opentelemetry-collector.service unit.

Environment (please complete the following information):

  • Project ID: demograft
  • VM ID: 197175560824347001
  • VM distro / OS: RHEL 8
  • Ops Agent version: 2.29.0-1.el8
  • Ops Agent configuration:
metrics:
  receivers:
    jvm:
      type: jvm
      endpoint: localhost:8565
      collection_interval: 60s
  service:
    pipelines:
      jvm:
        receivers:
          - jvm
  • Ops Agent log
    health-checks.log
2023/03/31 11:23:38 api_check.go:114: logging client was created successfully
2023/03/31 11:23:38 api_check.go:146: monitoring client was created successfully
2023/03/31 11:23:38 healthchecks.go:78: API Check - Result: PASS
2023/04/03 08:26:44 ports_check.go:60: listening to 0.0.0.0:20202:
2023/04/03 08:26:44 ports_check.go:70: listening to 0.0.0.0:20201:
2023/04/03 08:26:44 ports_check.go:79: listening to [::]:20201:
2023/04/03 08:26:44 healthchecks.go:78: Ports Check - Result: PASS
2023/04/03 08:26:44 healthchecks.go:78: Network Check - Result: ERROR, Detail: Get "https://logging.googleapis.com/$discovery/rest": dial tcp: lookup logging.googleapis.com on 169.254.169.254:53: dial udp 169.254.169.254:53: connect: network is unreachable
2023/04/03 08:26:44 healthchecks.go:78: API Check - Result: ERROR, Detail: can't get GCE metadata: can't get resource metadata: Get "http://169.254.169.254/computeMetadata/v1/instance/zone": dial tcp 169.254.169.254:80: connect: network is unreachable

logging-module.log

[2023/04/03 08:26:47] [ info] [fluent bit] version=2.0.10, commit=, pid=1243
[2023/04/03 08:26:47] [ info] [storage] ver=1.4.0, type=memory+filesystem, sync=normal, checksum=off, max_chunks_up=128
[2023/04/03 08:26:47] [ info] [storage] backlog input plugin: storage_backlog.4
[2023/04/03 08:26:47] [ info] [cmetrics] version=0.5.8
[2023/04/03 08:26:47] [ info] [ctraces ] version=0.2.7
[2023/04/03 08:26:47] [ info] [input:fluentbit_metrics:fluentbit_metrics.0] initializing
[2023/04/03 08:26:47] [ info] [input:fluentbit_metrics:fluentbit_metrics.0] storage_strategy='memory' (memory only)
[2023/04/03 08:26:47] [ info] [input:tail:tail.1] initializing
[2023/04/03 08:26:47] [ info] [input:tail:tail.1] storage_strategy='filesystem' (memory + filesystem)
[2023/04/03 08:26:47] [ info] [input:tail:tail.2] initializing
[2023/04/03 08:26:47] [ info] [input:tail:tail.2] storage_strategy='filesystem' (memory + filesystem)
[2023/04/03 08:26:47] [ info] [input:tail:tail.2] multiline core started
[2023/04/03 08:26:47] [ info] [input:tail:tail.3] initializing
[2023/04/03 08:26:47] [ info] [input:tail:tail.3] storage_strategy='memory' (memory only)
[2023/04/03 08:26:47] [ info] [input:storage_backlog:storage_backlog.4] initializing
[2023/04/03 08:26:47] [ info] [input:storage_backlog:storage_backlog.4] storage_strategy='memory' (memory only)
[2023/04/03 08:26:47] [ info] [input:storage_backlog:storage_backlog.4] queue memory limit: 47.7M
[2023/04/03 08:26:47] [ info] [output:stackdriver:stackdriver.0] metadata_server set to http://metadata.google.internal
[2023/04/03 08:26:47] [ warn] [output:stackdriver:stackdriver.0] client_email is not defined, using a default one
[2023/04/03 08:26:47] [ warn] [output:stackdriver:stackdriver.0] private_key is not defined, fetching it from metadata server
[2023/04/03 08:26:47] [ info] [output:stackdriver:stackdriver.0] worker #0 started
[2023/04/03 08:26:47] [ info] [output:stackdriver:stackdriver.0] worker #2 started
[2023/04/03 08:26:47] [ info] [output:stackdriver:stackdriver.0] worker #3 started
[2023/04/03 08:26:47] [ info] [output:stackdriver:stackdriver.0] worker #1 started
[2023/04/03 08:26:47] [ info] [output:stackdriver:stackdriver.0] worker #4 started
[2023/04/03 08:26:47] [ info] [output:stackdriver:stackdriver.0] worker #5 started
[2023/04/03 08:26:47] [ info] [output:stackdriver:stackdriver.0] worker #6 started
[2023/04/03 08:26:47] [ info] [output:stackdriver:stackdriver.1] metadata_server set to http://metadata.google.internal
[2023/04/03 08:26:47] [ info] [output:stackdriver:stackdriver.0] worker #7 started
[2023/04/03 08:26:47] [ warn] [output:stackdriver:stackdriver.1] client_email is not defined, using a default one
[2023/04/03 08:26:47] [ warn] [output:stackdriver:stackdriver.1] private_key is not defined, fetching it from metadata server
[2023/04/03 08:26:47] [ info] [output:stackdriver:stackdriver.1] worker #0 started
[2023/04/03 08:26:47] [ info] [output:stackdriver:stackdriver.1] worker #2 started
[2023/04/03 08:26:47] [ info] [output:stackdriver:stackdriver.1] worker #1 started
[2023/04/03 08:26:47] [ info] [output:stackdriver:stackdriver.1] worker #5 started
[2023/04/03 08:26:47] [ info] [output:stackdriver:stackdriver.1] worker #6 started
[2023/04/03 08:26:47] [ info] [output:stackdriver:stackdriver.1] worker #7 started
[2023/04/03 08:26:47] [ info] [output:stackdriver:stackdriver.1] worker #3 started
[2023/04/03 08:26:47] [ info] [output:prometheus_exporter:prometheus_exporter.2] listening iface=0.0.0.0 tcp_port=20202
[2023/04/03 08:26:47] [ info] [output:stackdriver:stackdriver.1] worker #4 started
[2023/04/03 08:26:47] [ info] [input:tail:tail.1] inotify_fs_add(): inode=50547303 watch_fd=1 name=/var/log/messages
[2023/04/03 08:26:47] [ info] [input:tail:tail.2] inotify_fs_add(): inode=426960 watch_fd=1 name=/opt/**-**-******-front/logs/**-*******.log
[2023/04/03 08:26:48] [ info] [input:tail:tail.3] inotify_fs_add(): inode=689166 watch_fd=1 name=/var/log/google-cloud-ops-agent/subagents/logging-module.log
[2023/04/03 09:19:00] [ info] [input:tail:tail.2] inode=426960 handle rotation(): /opt/**-**-******-front/logs/**-*******.log => /opt/**-**-*****-front/logs/**-*******.log.2023-03-31.1
[2023/04/03 09:19:00] [ info] [input:tail:tail.2] inotify_fs_remove(): inode=426960 watch_fd=1
[2023/04/03 09:19:00] [error] [/work/submodules/fluent-bit/plugins/in_tail/tail_fs_inotify.c:147 errno=2] No such file or directory
[2023/04/03 09:19:00] [ info] [input:tail:tail.2] inotify_fs_add(): inode=980964 watch_fd=2 name=/opt/**-**-******-front/logs/tv-optimizer.log

