IIS Access Log Parsing Can Not Work #1382

Open
easonwang0827 opened this issue Aug 18, 2023 · 4 comments

@easonwang0827

easonwang0827 commented Aug 18, 2023

My environment is as follows:

OS: Windows Server 2019
GCP Ops Agent: 2.36.0

I have the following configuration in C:\Program Files\Google\Cloud Operations\Ops Agent\config\config.yaml:
logging:
receivers:
iis_access:
type: files
include_paths:
- 'D:\www\xxx.xxx.com\logs\iis\W3SVC9\u_ex*.log'
accesslog_regex:
type: parse_regex
field: message
regex: "(?\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})\s(?<http_request_serverIp>[^\s]+)\s(?<http_request_requestMethod>[^\s]+)\s(?<cs_uri_stem>/[^\s])\s(?<cs_uri_query>[^\s])\s(?<s_port>\d*)\s(?[^\s]+)\s(?<http_request_remoteIp>[^\s]+)\s(?\w{4}/\d.?\d?)\s(?<http_request_userAgent>[^\s]+)\s(?<http_request_referer>[^\s]+)\s(?[^\s]+)\s(?<http_request_status>\d{3})\s(?<sc_substatus>\d+)\s(?<sc_win32_status>\d+)\s(?<sc_bytes>\d+)\s(?<cs_bytes>\d+)\s(?<time_taken>\d+)\s(?<http_x_forward_ip1>[^\s]+)$"
time_key: timestamp
time_format: "%Y-%m-%d %H:%M:%S"
service:
pipelines:
iis:
receivers:
- iis_access
processors:
- accesslog_regex
-
After completing the configuration, I restarted the GCP Ops Agent. However, Cloud Logging shows the entries in the following format:
{
insertId: "xb87xaf6b9nr9"
jsonPayload: {
message: "2023-08-18 02:21:40 192.168.xxx.xxx GET /xxxx/testurl - 443 - 192.168.50.50 HTTP/2 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/115.0.0.0+Safari/537.36 - xxx.xxx.com 200 0 0 334 2592 63 -"
}
labels: {
compute.googleapis.com/resource_name: "gcpxxx-xxx"
}
logName: "projects/xxx-xxx/logs/iis_access"
receiveTimestamp: "2023-08-18T02:21:47.678317103Z"
resource: {
labels: {
instance_id: "3454610759666887808" (instance_name: gcpxxx-xxxx)
project_id: "xxxx-xxxx-xxx"
zone: "asia-east1-c"
}
type: "gce_instance"
}
timestamp: "2023-08-18T02:21:47.231603900Z"
}

The message is not parsed by the regex into structured JSON fields, even if I change the "type" under "receivers" to "iis_access". The HTTP request fields are all empty.

How can I format these entries like the logs that GCP Load Balancer produces, so that I can search them in Cloud Logging?

@braydonk
Contributor

Hi @easonwang0827,

When you changed the receiver type to iis_access, did you still have the processor on? I think if you have just the iis_access and no processor, it should work (provided your IIS Access Log format is not different from the default).
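
A minimal sketch of what this suggestion might look like in config.yaml. It assumes the built-in iis_access receiver accepts include_paths in the same way a files receiver does, and that IIS is writing its default W3C field set. The path and the pipeline name iis are carried over from the original report.

```yaml
logging:
  receivers:
    iis_access:
      # Built-in IIS receiver: it parses the default access log format itself.
      type: iis_access
      include_paths:
      - 'D:\www\xxx.xxx.com\logs\iis\W3SVC9\u_ex*.log'
  service:
    pipelines:
      iis:
        receivers:
        - iis_access
        # No processors are listed: the custom parse_regex step is left out on purpose.
```

If the IIS log definition has extra or reordered fields, the built-in parser may not match and the line would likely stay in the message field.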

@easonwang0827
Author

Hi @braydonk,

After I changed the type to iis_access and switched the IIS access log format back to the default, it is working now. Thanks, @braydonk.
But I need to record additional IIS log fields (e.g., cs_host, cs_bytes, x-forwarded-for). Is there a way to achieve this?

@jefferbrecht
Member

At this time you would need to make your own files receiver and custom processors in order to change the built-in regex for different log formats, as you've attempted to do already.

Note that your regex YAML in the config should be enclosed by single-quotes, not double-quotes, since YAML parses backslashes as escape sequences when enclosed in a double-quoted string.
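
To illustrate the quoting point, here is a small sketch of the two styles side by side. The processor name quoting_example and the group names status and bytes are hypothetical and exist only for this example.

```yaml
processors:
  quoting_example:        # hypothetical processor name
    type: parse_regex
    field: message
    # Single quotes: \d and \s reach the regex engine unchanged.
    regex: '^(?<status>\d{3})\s(?<bytes>\d+)$'
    # Double quotes would need every backslash doubled to mean the same thing:
    # regex: "^(?<status>\\d{3})\\s(?<bytes>\\d+)$"
```

In a double-quoted YAML string, sequences such as \d and \s are not valid escapes, so a parser may reject the file or alter the pattern before the regex engine sees it.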

Aside from that, there are a few errors in the regex:

  • regex uses Ruby regexes, and Ruby regexes need forward-slashes to be escaped as \/.
  • (?abc) isn't a valid group; use either (?:abc) for a non-capturing group, (abc) for a capturing group, or (?<field>abc) for a named capturing group.
  • timestamp is specified as the time_key but isn't defined in the regex.
  • (?<cs_uri_stem>/[^\s]) only matches a single character after the slash; it should have a + after the character class, i.e. (?<cs_uri_stem>\/[^\s]+).

Putting all of that together I get regex: '(?<timestamp>\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})\s(?<http_request_serverIp>[^\s]+)\s(?<http_request_requestMethod>[^\s]+)\s(?<cs_uri_stem>\/[^\s]+)\s(?<cs_uri_query>[^\s])\s(?<s_port>\d*)\s(?:[^\s]+)\s(?<http_request_remoteIp>[^\s]+)\s(?:\w{4}\/\d.?\d?)\s(?<http_request_userAgent>[^\s]+)\s(?<http_request_referer>[^\s]+)\s(?:[^\s]+)\s(?<http_request_status>\d{3})\s(?<sc_substatus>\d+)\s(?<sc_win32_status>\d+)\s(?<sc_bytes>\d+)\s(?<cs_bytes>\d+)\s(?<time_taken>\d+)\s(?<http_x_forward_ip1>[^\s]+)$'.

Working for me in Rubular but untested in the Ops Agent: https://rubular.com/r/ogfgdVBz5Q5rpX

Beyond the regex, note that all fields parsed by a parse_regex parser will be placed under jsonPayload in the resulting log entry by default. So you will also want to move those jsonPayload.http_request_ fields to httpRequest using a modify_fields processor, e.g. with the following config fragment:

processors:
  move_http_request:
    type: modify_fields
    fields:
      httpRequest.serverIp:
        move_from: jsonPayload.http_request_serverIp
      httpRequest.requestMethod:
        move_from: jsonPayload.http_request_requestMethod
      ... (and so on for the remaining jsonPayload.http_request fields)
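
As a sketch of how the "and so on" might be filled in, the fragment below covers every http_request_ group that the regex above captures. The destination names follow the LogEntry httpRequest field names; the type: integer conversion on status is an assumption (status is numeric in httpRequest, while the regex captures a string) and is untested here.

```yaml
processors:
  move_http_request:
    type: modify_fields
    fields:
      httpRequest.serverIp:
        move_from: jsonPayload.http_request_serverIp
      httpRequest.requestMethod:
        move_from: jsonPayload.http_request_requestMethod
      httpRequest.remoteIp:
        move_from: jsonPayload.http_request_remoteIp
      httpRequest.userAgent:
        move_from: jsonPayload.http_request_userAgent
      httpRequest.referer:
        move_from: jsonPayload.http_request_referer
      httpRequest.status:
        move_from: jsonPayload.http_request_status
        # Assumption: convert the captured string to a number for httpRequest.status.
        type: integer
```

The remaining groups (cs_uri_stem, sc_bytes, time_taken and so on) would stay under jsonPayload unless they are moved as well.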

Hope this helps!

@easonwang0827
Author

Hi @jefferbrecht,

I applied your regex and restarted the service, but the whole line still ends up in the 'message' field rather than being split into separate fields. For example, I expected a field such as:

jsonPayload.http_request_serverIp: 192.168.1.1

but the entry still looks like this:
{
insertId: "wxxxxxv"
jsonPayload: {
message: "2023-08-19 03:36:29 192.168.1.1 GET /xxxxx/testurl - 443 - 8.8.8.8 HTTP/2 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/115.0.0.0+Safari/537.36 - xxx.xxxxx.com 200 0 0 334 2592 62 -"
}
labels: {1}
logName: "projects/worldgymtaiwan-com/logs/iis_access"
receiveTimestamp: "2023-08-19T03:36:46.690142685Z"
resource: {2}
timestamp: "2023-08-19T03:36:46.629798800Z"
}

My IIS access log field definition is as follows:

Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs-version cs(User-Agent) cs(Referer) cs-host sc-status sc-substatus sc-win32-status sc-bytes cs-bytes time-taken X-Forwarded-For

For example:
2023-08-19 03:22:19 192.168.1.1 GET /favicon.ico - 443 8.8.8.8 HTTP/2 Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/115.0.0.0+Safari/537.36 https://xxx.xxxx.com/xxxx/testurl xxapp.xxx.com 404 0 0 187 2546 11 -

My current running configuration file is as follows:

logging:
receivers:
iis_access:
type: files
include_paths:
- 'D:\www\xxxx.xxx.xx.com\logs\xxx\xxxx\u_ex*.log'
service:
pipelines:
iis:
receivers:
- iis_access
processors:
- accesslog_regex:
field: message
regex: '^(?\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})\s(?<http_request_serverIp>[^\s]+)\s(?<http_request_requestMethod>[^\s]+)\s(?<cs_uri_stem>/[^\s]+)\s(?<cs_uri_query>[^\s])\s(?<s_port>\d*)\s(?:[^\s]+)\s(?<http_request_remoteIp>[^\s]+)\s(?:\w{4}/\d.?\d?)\s(?<http_request_version>\w{4}/\d.?\d?\s+)\s(?<http_request_userAgent>[^\s]+)\s(?<http_request_referer>[^\s]+)\s(?:[^\s]+)\s(?<http_request_status>\d{3})\s(?<sc_substatus>\d+)\s(?<sc_win32_status>\d+)\s(?<sc_bytes>\d+)\s(?<cs_bytes>\d+)\s(?<time_taken>\d+)\s(?<http_x_forward_ip1>[^\s]+)$'
time_key: timestamp
time_format: "%Y-%m-%d %H:%M:%S"
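
For comparison, here is a hedged sketch of how this configuration might be laid out, untested in the Ops Agent. In the configuration above the processor appears inline under the pipeline without a type, and the regex contains an unnamed (? group plus two consecutive HTTP-version groups. The sketch instead defines the processor under logging.processors with type: parse_regex, references it by name from the pipeline, and reuses the regex from the earlier reply. One change is an assumption on my part: cs_uri_query uses [^\s]+ so that query strings longer than one character can match.

```yaml
logging:
  receivers:
    iis_access:
      type: files
      include_paths:
      - 'D:\www\xxxx.xxx.xx.com\logs\xxx\xxxx\u_ex*.log'
  processors:
    # Processors are defined here by name and only referenced from the pipeline.
    accesslog_regex:
      type: parse_regex
      field: message
      regex: '(?<timestamp>\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})\s(?<http_request_serverIp>[^\s]+)\s(?<http_request_requestMethod>[^\s]+)\s(?<cs_uri_stem>\/[^\s]+)\s(?<cs_uri_query>[^\s]+)\s(?<s_port>\d*)\s(?:[^\s]+)\s(?<http_request_remoteIp>[^\s]+)\s(?:\w{4}\/\d.?\d?)\s(?<http_request_userAgent>[^\s]+)\s(?<http_request_referer>[^\s]+)\s(?:[^\s]+)\s(?<http_request_status>\d{3})\s(?<sc_substatus>\d+)\s(?<sc_win32_status>\d+)\s(?<sc_bytes>\d+)\s(?<cs_bytes>\d+)\s(?<time_taken>\d+)\s(?<http_x_forward_ip1>[^\s]+)$'
      time_key: timestamp
      time_format: "%Y-%m-%d %H:%M:%S"
  service:
    pipelines:
      iis:
        receivers:
        - iis_access
        processors:
        - accesslog_regex
```

The regex expects one token per entry in the Fields list, so a log line that omits a column (for example cs-username) would not match and would remain unparsed in the message field.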
