URL:
https://github.com/SSSD/sssd/pull/5712
Title: #5712: Health and Support Analyzer - Add request log parsing utility
justin-stephenson commented:
"""
Hi,
I played with the tool a bit.
Thanks for taking the time to do so.
1. When the responder is restarted, the `CID` counter is reset and the log can contain multiple unrelated
requests with the same CID. The tool greps everything, which might be misleading. There is
enough info in the logs to catch this, but it is not very simple.
I am not sure how best to handle this, but I am open to ideas. The default behavior could be to
grep only the most recent logs for that CID when a duplicate is detected (and print a warning
about the duplicate).
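A minimal Python sketch of that default, not the tool's actual code. It assumes that every new connection logs a `Client [CID #N]... connected!` line like the sample in this thread and that related lines carry a `CID #N` tag; `latest_cid_lines` is a hypothetical helper name.

```python
import re

# Assumption: a new client connection is logged as
# "Client [CID #N]... connected!", as in the sample line in this thread.
CONNECT_RE = re.compile(r"Client \[CID #(\d+)\].*connected!")
# Assumption: every line belonging to a client carries a "CID #N" tag.
CID_RE = re.compile(r"CID #(\d+)\b")


def latest_cid_lines(lines, cid):
    """Return (lines, duplicates) for the most recent connection using `cid`.

    `duplicates` counts earlier connections that reused the same CID,
    for example because the responder was restarted.
    """
    generations = []  # one list of lines per connection that used this CID
    for line in lines:
        match = CID_RE.search(line)
        if match is None or int(match.group(1)) != cid:
            continue
        # A "connected!" line starts a new generation for this CID.
        if CONNECT_RE.search(line) or not generations:
            generations.append([])
        generations[-1].append(line)
    if not generations:
        return [], 0
    return generations[-1], len(generations) - 1
```

A caller could print a warning whenever the returned duplicate count is greater than zero.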
2. Probably `--merge` option to merge responder and backend logs into single log
(sorted by timestamp) might make some sense (not sure).
I added a `--merge` option, but it simply sorts the combined output by timestamp; it does
not interweave the backend logs within each cache request lookup. Interweaving the logs
would require more complicated logic to link requests. If the cache request number were
sent across sbus to the backend, this would be easier to do.
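As an illustration of a plain timestamp sort (a sketch only, not the code in this PR), the following assumes every log line starts with a timestamp in the `(YYYY-MM-DD HH:MM:SS:ffffff)` form shown in the sample output in this thread; `merge_logs` and `timestamp` are hypothetical names.

```python
import re
from datetime import datetime

# Assumption: lines start with "(2021-07-22 21:33:21:956195)" style timestamps.
TS_RE = re.compile(r"^\((\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}:\d{6})\)")


def timestamp(line):
    """Parse the leading timestamp of a log line into a datetime."""
    match = TS_RE.match(line)
    if match is None:
        raise ValueError(f"no leading timestamp in line: {line!r}")
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S:%f")


def merge_logs(responder_lines, backend_lines):
    """Combine both logs and order them by timestamp only.

    sorted() is stable, so lines with identical timestamps keep their
    relative order, with responder lines first.
    """
    return sorted([*responder_lines, *backend_lines], key=timestamp)
```

This orders lines purely by time; it makes no attempt to link a backend request to the cache request that triggered it.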
3. What are we going to do with long-running clients that issue thousands of requests
per single connection? Currently the tool will grep everything. That definitely makes sense for
a case like `id user`, but not for a long-running process like a database. Probably a `--cr=N` option
would make sense?
This `--cr=N` option could be added, but I wonder whether this is a use case we need to handle. If
you have 1000+ CR numbers logged in the responder for a single CID, it will be difficult
to know which CR number you actually care about investigating.
4. (Most important)
Currently `--list` output looks like:
```
(2021-07-22 21:33:21:956195): [nss] [accept_fd_handler] (0x0400): Client [CID #3][cmd
id][0xd18ac0][27] connected!
```
This looks very similar to `grep` output.
At the very least I would shorten this to:
```
2021-07-22 21:33:21:956195: CID #3: cmd='id'
```
-- all of that `[nss] [accept_fd_handler] (0x0400)` detail adds no value here and makes
reading more difficult.
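A rough Python sketch of that shortening, assuming the connect line always has the `Client [CID #N][cmd NAME]...` shape of the sample above; `short_list_entry` is a hypothetical helper, not part of the tool.

```python
import re

# Assumption: connect lines look like the sample above:
# (TIMESTAMP): [nss] [accept_fd_handler] (0x0400): Client [CID #3][cmd id][...][...] connected!
CONNECT_LINE_RE = re.compile(
    r"^\((?P<ts>[^)]+)\): .*?Client \[CID #(?P<cid>\d+)\]\[cmd (?P<cmd>[^\]]*)\]"
)


def short_list_entry(line):
    """Return a compact 'timestamp: CID #N: cmd=...' entry, or None if
    the line is not a client connect line."""
    match = CONNECT_LINE_RE.match(line)
    if match is None:
        return None
    return f"{match['ts']}: CID #{match['cid']}: cmd='{match['cmd']}'"
```

Lines that do not match are returned as `None`, so a caller can skip them when building the list.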
Ideally, the output could look like the following:
```
2021-07-22 21:33:21:956195: CID #3: cmd='id', cache requests:
- 'User by name' : 'rpc'
- 'Group by ID' : '32'
- 'Initgroups by name' : 'rpc'
...
```
But this is, of course, not trivial and, again, we need to handle the case of a long-running
process that makes a lot of requests under the same CID.
I improved the `--list` output; please see below:
~~~
# sssctl analyze request --list
******** Listing nss client requests ********
(2021-07-26 16:12:02: CID #1: id
- User by name
- administrator@ad.vm
- Group by ID
- GID:1690200513@ad.vm
- Initgroups by name
- administrator
- Group by ID
- GID:1690200520@ad.vm
- Group by ID
- GID:1690200519@ad.vm
- Group by ID
- GID:1690200512@ad.vm
- Group by ID
- GID:1690200518@ad.vm
- Group by ID
- GID:1690200572@ad.vm
(2021-07-26 16:13:35: CID #2: vim
- User by ID
- UID:0@ad.vm
(2021-07-26 16:17:04: CID #3: vim
- User by ID
- UID:0@ad.vm
(2021-07-26 16:38:20: CID #4: /usr/bin/bash
- User by ID
- UID:0@ad.vm
# sssctl analyze request --list --pam
******** Listing pam client requests ********
(2021-07-26 19:38:00: CID #1: sshd: administrator@ad.vm
- Initgroups by name
- administrator@ad.vm
- Initgroups by name
- administrator@ad.vm
(2021-07-26 19:53:50: CID #2: sshd: administrator@ad.vm
- Initgroups by name
- administrator@ad.vm
- Initgroups by name
- administrator@ad.vm
~~~
Other items mentioned are fixed now.
"""
See the full comment at
https://github.com/SSSD/sssd/pull/5712#issuecomment-887656608