Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Access Log Parsing - Latency not recorded. #1339

Open
codeconsole opened this issue Jul 18, 2023 · 3 comments
Open

Access Log Parsing - Latency not recorded. #1339

codeconsole opened this issue Jul 18, 2023 · 3 comments

Comments

@codeconsole
Copy link

I was patiently hoping that Ops Agent would catch up to the capabilities of the legacy agent, but it has been a couple years now and it is still lagging in basic capabilities. One of which I find very important: the ability to pass latency into the logs.

One would expect this feature to exist and already be available based on existing documentation, but it does not. Nor is the documentation clear on why it does not work.

For example:

The current Tomcat documentation states "What is logged" in which it says "The tomcat_access logs contain the following fields in the LogEntry:" and specifies that httpRequest is one of those fields, but looking at the documentation for httpRequest indicates that latency is part of what is recorded and that is simply not the case.

Digging into the source, I see that tomcat_access uses genericAccessLogParser, but the common_logging_processors.go genericAccessLogParser method has no reference to latency:

Regex: `^(?<http_request_remoteIp>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<http_request_requestMethod>\S+)(?: +(?<http_request_requestUrl>[^\"]*?)(?: +(?<http_request_protocol>\S+))?)?" (?<http_request_status>[^ ]*) (?<http_request_responseSize>[^ ]*)(?: "(?<http_request_referer>[^\"]*)" "(?<http_request_userAgent>[^\"]*)")?(?: "(?<gzip_ratio>[^\"]*)"

Previously in the legacy agent, you were able to override the regex. This is what I had done to get the legacy parser to work:

<source>
  @type tail
  format /^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>(?:[^\"]|\\.)*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>(?:[^\"]|\\.)*)" "(?<agent>(?:[^\"]|\\.)*)" (?<latency>[^ ]*))?$/
  time_format %d/%b/%Y:%H:%M:%S %z
  types code:integer,size:integer
  null_value_pattern '^-$'
  path /var/log/myapp/access_log.%Y-%m-%d.log
  pos_file /var/lib/google-fluentd/pos/myapp.pos
  read_from_head true
  tag apache-access-myapp
</source>

A simple solution would be to allow overriding the regex in the yaml configuration or just optionally append it in a non-breaking way to the url so that the format can be re-written to include latency. Then tomcat can be configured directly or via Spring Boot to add %Ts latency to the request.

@codeconsole
Copy link
Author

Even the NGINX documentation referenced in the source code documents how latency should be appended to the logs.

Another example of the log format enables tracking different time values between NGINX and an upstream server that may help to diagnose a problem if your website experience slowdowns. You can use the following variables to log the indicated time values:

[$upstream_connect_time](https://nginx.org/en/docs/http/ngx_http_upstream_module.html#var_upstream_connect_time) – The time spent on establishing a connection with an upstream server
[$upstream_header_time](https://nginx.org/en/docs/http/ngx_http_upstream_module.html#var_upstream_header_time) – The time between establishing a connection and receiving the first byte of the response header from the upstream server
[$upstream_response_time](https://nginx.org/en/docs/http/ngx_http_upstream_module.html#var_upstream_response_time) – The time between establishing a connection and receiving the last byte of the response body from the upstream server
[$request_time](https://nginx.org/en/docs/http/ngx_http_log_module.html#var_request_time) – The total time spent processing a request
All time values are measured in seconds with millisecond resolution.

However the nginx.go ops agent implementation does not support this either.

@jefferbrecht
Copy link
Member

Hi @codeconsole,

Thank you for your feedback. To clarify, this is working as intended, and the documentation doesn't require any particular sub-fields of HttpRequest to be populated. Please let us know what we can do to make the documentation more clear.

In general, third-party integrations in the Ops Agent, such as the Tomcat receivers, are designed to work out-of-the-box with default application configurations. Since latency is not logged by default, it is not included in the regex.

You can still define custom behavior, including custom regexes, by defining a files receiver for the log file followed by parse_regex and modify_fields processors to shape the logs as needed. The code inside genericAccessLogParser can be used as a starting point for this.

@codeconsole
Copy link
Author

Hi @jefferbrecht,

Thanks, I wasn't aware of the parse_regex + modify_fields existence. I am not sure if that existed when I originally switched from the legacy agent a couple years ago. I was able to get it to work, however, having to use modify_fields seems a bit overkill. Previously the legacy agent used convention over configuration. Why not provide something similar?

This is a bit ugly:

    set_http_request:
      type: modify_fields
      fields:
        httpRequest.remoteIp:
          move_from: jsonPayload.http_request_remoteIp
        httpRequest.requestMethod:
          move_from: jsonPayload.http_request_requestMethod
        httpRequest.requestUrl:
          move_from: jsonPayload.http_request_requestUrl
        httpRequest.protocol:
          move_from: jsonPayload.http_request_protocol
        httpRequest.status:
          move_from: jsonPayload.http_request_status
          type: integer
        httpRequest.responseSize:
          move_from: jsonPayload.http_request_responseSize
        httpRequest.referer:
          move_from: jsonPayload.http_request_referer
          omit_if: jsonPayload.http_request_referer = "-"
        httpRequest.userAgent:
          move_from: jsonPayload.http_request_userAgent
        httpRequest.latency:
          move_from: jsonPayload.latency

I suggest at least providing an option that does automatic jsonPayload mapping if any of these httpRequest fields are present.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants