benschweizer/similar

similar - dedup similar lines in unix pipelines

similar is a Unix pipeline drop-in that deduplicates similar lines. It is inspired by Grafana's log deduplication feature and brings it to the command line. Its intended use is alongside other text utilities such as grep, sort and uniq.

Example usage:

$ cat /var/log/messages | grep cron | similar
$ similar -signature /var/log/messages /var/log/messages.1

Setup

$ make build
$ make install

Usage

similar [-none|-exact|-numbers|-signature] <files>

none		:= no deduplication
exact		:= strip all ISO datetimes with milliseconds
numbers		:= strip all numbers (default)
signature	:= strip all numbers, letters and underscores
files		:= list of files to read, defaults to stdin
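
The modes above can be read as "strip part of each line, then compare what is left". The following Python sketch illustrates that idea only; it is not the tool's implementation. The regular expressions, the names `FILTERS`, `key_for` and `dedup`, and the choice to keep the first line seen for each key across the whole input (rather than only collapsing adjacent lines, as uniq does) are all assumptions.

```python
import re

# Hypothetical patterns, one per mode; the real tool's regexes may differ.
FILTERS = {
    "none": None,
    # ISO datetime with milliseconds, e.g. 2024-01-02T03:04:05.678Z
    "exact": re.compile(
        r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}[.,]\d{3}(?:Z|[+-]\d{2}:?\d{2})?"
    ),
    "numbers": re.compile(r"[0-9]+"),
    "signature": re.compile(r"[0-9A-Za-z_]+"),
}


def key_for(line, mode="numbers"):
    """Return the comparison key: the line with the mode's pattern removed."""
    pattern = FILTERS[mode]
    return line if pattern is None else pattern.sub("", line)


def dedup(lines, mode="numbers"):
    """Yield the first line seen for each key (assumed behaviour)."""
    if mode == "none":
        # "none" means no deduplication: pass everything through.
        yield from lines
        return
    seen = set()
    for line in lines:
        key = key_for(line, mode)
        if key not in seen:
            seen.add(key)
            yield line
```

Under these assumptions, `list(dedup(["job 1 started", "job 2 started"]))` keeps only the first line, because both lines reduce to the same key once the numbers are stripped.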

Open issues and ideas for improvement

  • the filters use regular expressions, which are fairly slow; they could be rewritten using byte operations instead
  • more filters could probably be added
  • a build pipeline and versioning are still missing
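
As an illustration of the first idea, the numbers filter can be expressed without a regular expression by deleting digit bytes directly. This is a sketch in Python, not code from the project; `strip_digits` and `dedup_numbers` are hypothetical names, and whether it is actually faster than the regex version would need to be measured.

```python
DIGITS = b"0123456789"


def strip_digits(line: bytes) -> bytes:
    """Delete every ASCII digit byte; no regex engine involved."""
    # bytes.translate with table=None only applies the delete set.
    return line.translate(None, DIGITS)


def dedup_numbers(lines):
    """Yield the first line seen for each digit-free key (assumed behaviour)."""
    seen = set()
    for line in lines:
        key = strip_digits(line)
        if key not in seen:
            seen.add(key)
            yield line
```

The same approach would extend to the signature filter by adding letters and the underscore to the delete set; the datetime filter has positional structure and would need a small hand-written scanner instead.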