Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: bytes, strings: add CutByte #67101

Open
aimuz opened this issue Apr 29, 2024 · 14 comments · May be fixed by #67102
Open

proposal: bytes, strings: add CutByte #67101

aimuz opened this issue Apr 29, 2024 · 14 comments · May be fixed by #67102
Labels
Milestone

Comments

@aimuz
Copy link
Contributor

aimuz commented Apr 29, 2024

Proposal Details

Abstract

This proposal suggests the addition of a new function, CutByte, to the strings and bytes packages in the Go standard library. The function aims to simplify the handling of strings and byte slices by cutting them around the first instance of a specified byte separator. This function is designed to offer a more efficient alternative to the existing Cut function when the separator is a single byte, providing up to a 25% performance improvement in typical use cases.

Background

The existing Cut function in the Go standard library has proven to be extremely useful for handling strings and byte slices by simplifying code and replacing multiple standard library functions. However, in scenarios where the separator is known to be a single byte, the Cut function can be optimized further. The proposed CutByte function addresses this by focusing on byte-level operations, which are common in many real-world applications, such as parsing binary protocols or handling ASCII-based text formats.

Details

./cmd/vet/vet_test.go:			if _, suffix, ok := strings.Cut(text, " "); ok {
./cmd/go/scriptreadme_test.go:	_, lang, ok := strings.Cut(doc.String(), "# Script Language\n\n")
./cmd/go/scriptreadme_test.go:	lang, _, ok = strings.Cut(lang, "\n\nvar ")
./cmd/go/internal/test/test.go:		op, name, found := strings.Cut(s, " ")
./cmd/go/internal/vcweb/script.go:			mPath, version, ok := strings.Cut(args[1], "@")
./cmd/go/internal/modfetch/fetch.go:				_, vers, _ = strings.Cut(vers, "-")
./cmd/go/internal/modfetch/fetch.go:					goos, goarch, _ := strings.Cut(vers[i+1:], "-")
./cmd/go/internal/modfetch/codehost/vcs.go:		before, after, found := strings.Cut(line, ":")
./cmd/go/internal/modfetch/toolchain.go:	_, v, ok := strings.Cut(v, "-")
./cmd/go/internal/modload/list.go:		if path, vers, found := strings.Cut(arg, "@"); found {
./cmd/go/internal/modload/list.go:		if path, vers, found := strings.Cut(arg, "@"); found {
./cmd/go/internal/modload/build.go:	if path, vers, found := strings.Cut(path, "@"); found {
./cmd/go/internal/modload/search.go:			elem, rest, found := strings.Cut(p, string(filepath.Separator))
./cmd/go/internal/modload/init.go:	line, _, _ := strings.Cut(string(buf[:n]), "\n")
./cmd/go/internal/envcmd/env.go:		key, val, found := strings.Cut(arg, "=")
./cmd/go/internal/modindex/build.go:		line, argstr, ok := strings.Cut(strings.TrimSpace(line[4:]), ":")
./cmd/go/internal/modindex/build.go:	name, _, _ = strings.Cut(name, ".")
./cmd/go/internal/modindex/build_read.go:			path, _, ok = strings.Cut(args[1:], "`")
./cmd/go/internal/script/engine.go:		prefix, suffix, ok := strings.Cut(cond.tag, ":")
./cmd/go/internal/script/engine.go:		if prefix, suffix, ok := strings.Cut(tag, ":"); ok {
./cmd/go/internal/script/state.go:		if k, v, ok := strings.Cut(kv, "="); ok {
./cmd/go/internal/modcmd/edit.go:	before, after, found := strings.Cut(arg, "@")
./cmd/go/internal/modcmd/edit.go:	before, after, found := strings.Cut(arg, "@")
./cmd/go/internal/modcmd/edit.go:	before, after, found := strings.Cut(s, ",")
./cmd/go/internal/modcmd/edit.go:	before, after, found := strings.Cut(arg, "=")
./cmd/go/internal/load/godebug.go:	k, v, ok := strings.Cut(strings.TrimSpace(text[i:]), "=")
./cmd/go/internal/vcs/vcs.go:		pattern, list, found := strings.Cut(item, ":")
./cmd/go/internal/vcs/vcs.go:	if scheme, path, ok := strings.Cut(repo, "://"); ok {
./cmd/go/internal/work/gccgo.go:		before, after, _ := strings.Cut(args, "=")
./cmd/go/internal/gover/gomod.go:	s, _, _ = strings.Cut(s, "//") // strip comments
./cmd/go/internal/gover/mod_test.go:		path, vers, _ := strings.Cut(f, " ")
./cmd/go/internal/modget/query.go:	pattern, rawVers, found := strings.Cut(raw, "@")
./cmd/go/internal/cmdflag/flag.go:	name, value, hasValue := strings.Cut(arg[1:], "=")
./cmd/go/internal/workcmd/edit.go:	before, after, found := strings.Cut(arg, "@")
./cmd/go/internal/workcmd/edit.go:	before, after, found := strings.Cut(arg, "=")
./cmd/go/internal/toolchain/select.go:		min, suffix, plus := strings.Cut(gotoolchain, "+") // go1.2.3+auto
./cmd/go/internal/toolchain/select.go:		name, val, hasEq := strings.Cut(a, "=")
./cmd/go/internal/toolchain/select.go:				name, _, _ := strings.Cut(a, "=")
./cmd/go/internal/toolchain/select.go:	path, version, _ := strings.Cut(pkgArg, "@")
./cmd/go/main.go:		_, dir, _ = strings.Cut(a, "=")
./cmd/distpack/pack.go:	version, rest, _ := strings.Cut(string(data), "\n")
./cmd/internal/testdir/testdir_test.go:		goos, goarch, ok := strings.Cut(*target, "/")
./cmd/internal/testdir/testdir_test.go:		line, actionSrc, _ = strings.Cut(actionSrc, "\n")
./cmd/internal/testdir/testdir_test.go:	header, _, ok := strings.Cut(src, "\npackage")
./cmd/internal/testdir/testdir_test.go:			if _, suffix, ok := strings.Cut(text, " "); ok {
./cmd/internal/testdir/testdir_test.go:		lines[i], _, _ = strings.Cut(lines[i], " // ERROR ")
./cmd/internal/testdir/testdir_test.go:		errFile, rest, ok := strings.Cut(errStr, ":")
./cmd/internal/testdir/testdir_test.go:		lineStr, msg, ok := strings.Cut(rest, ":")
./cmd/compile/internal/base/flag.go:		verb, args, found := strings.Cut(line, " ")
./cmd/compile/internal/base/flag.go:		before, after, hasEq := strings.Cut(args, "=")
./cmd/link/internal/ld/ld.go:		verb, args, found := strings.Cut(line, " ")
./cmd/link/internal/ld/ld.go:		before, after, exist := strings.Cut(args, "=")
./cmd/link/internal/ld/go.go:		line, data, _ = strings.Cut(data, "\n")
./cmd/link/internal/ld/go.go:			if before, after, found := strings.Cut(remote, "#"); found {
./cmd/cgo/internal/testcshared/cshared_test.go:				switch prefix, _, _ := strings.Cut(name, "-"); prefix {
./cmd/fix/typecheck.go:				if _, elem, ok := strings.Cut(t, "]"); ok {
./cmd/fix/typecheck.go:				if _, et, ok := strings.Cut(t, "]"); ok {
./cmd/fix/typecheck.go:				if kt, vt, ok := strings.Cut(t[len("map["):], "]"); ok {
./cmd/fix/typecheck.go:				_, value, _ = strings.Cut(t, "]")
./cmd/fix/typecheck.go:				if k, v, ok := strings.Cut(t[len("map["):], "]"); ok {
./cmd/api/main_test.go:			feature, approval, ok := strings.Cut(line, "#")
./cmd/doc/dirs.go:		path, dir, _ := strings.Cut(line, "\t")
./cmd/vendor/golang.org/x/tools/cmd/bisect/main.go:		if _, v, _ := strings.Cut(e, "="); strings.Contains(v, "PATTERN") {
./cmd/vendor/golang.org/x/tools/cmd/bisect/main.go:		k, v, _ := strings.Cut(x, "=")
./cmd/vendor/golang.org/x/tools/cmd/bisect/main.go:		line, all, _ = strings.Cut(all, "\n")
./cmd/vendor/golang.org/x/tools/go/analysis/passes/directive/directive.go:		line, text, _ = strings.Cut(text, "\n")
./cmd/vendor/golang.org/x/tools/go/analysis/passes/directive/directive.go:				_, line, ok = strings.Cut(line, "*/")
./cmd/vendor/golang.org/x/tools/internal/versions/versions.go:	v, _, _ = strings.Cut(v, "-") // strip -bigcorp suffix.
./cmd/vendor/golang.org/x/tools/internal/stdlib/stdlib.go:	typename, name, _ = strings.Cut(sym.Name, ".")
./cmd/vendor/golang.org/x/tools/internal/stdlib/stdlib.go:	recv, name, _ = strings.Cut(sym.Name, ".")
./cmd/vendor/golang.org/x/telemetry/internal/crashmonitor/monitor.go:		_, pcstr, ok := strings.Cut(line, " pc=") // e.g. pc=0x%x
./cmd/vendor/golang.org/x/telemetry/internal/config/config.go:			prefix, _, found := strings.Cut(c.Name, ":")
./cmd/vendor/golang.org/x/telemetry/internal/config/config.go:	prefix, rest, hasBuckets := strings.Cut(counter, "{")
./cmd/vendor/golang.org/x/telemetry/internal/counter/parse.go:		k, v, ok := strings.Cut(line, ": ")
./cmd/vendor/golang.org/x/telemetry/internal/upload/reports.go:				before, _, _ := strings.Cut(k, "\n")
./cmd/vendor/golang.org/x/build/relnote/relnote.go:	dir, rest, _ := strings.Cut(filename, "/")
./cmd/vendor/golang.org/x/build/relnote/relnote.go:	dir, rest, _ = strings.Cut(rest, "/")
./crypto/ecdsa/ecdsa_test.go:			curve, hash, _ := strings.Cut(line, ",")
./crypto/x509/pem_decrypt.go:	mode, hexIV, ok := strings.Cut(dek, ",")
./crypto/x509/verify_test.go:	before, _, ok := strings.Cut(string(out), ".")
./crypto/tls/handshake_test.go:		_, after, ok := strings.Cut(line, " ")
./crypto/tls/handshake_test.go:		before, _, ok := strings.Cut(line, "|")
./strconv/fp_test.go:	if mant, exp, ok := strings.Cut(s, "p"); ok {
./strconv/fp_test.go:	if mant, exp, ok := strings.Cut(s, "p"); ok {
./strings/example_test.go:		before, after, found := strings.Cut(s, sep)
./net/mail/message.go:		k, v, ok := strings.Cut(kv, ":")
./net/platform_test.go:	net, _, _ := strings.Cut(network, ":")
./net/platform_test.go:	switch net, _, _ := strings.Cut(network, ":"); net {
./net/platform_test.go:	switch net, _, _ := strings.Cut(network, ":"); net {
./net/url/url.go:	u, frag, _ := strings.Cut(rawURL, "#")
./net/url/url.go:		rest, url.RawQuery, _ = strings.Cut(rest, "?")
./net/url/url.go:		if segment, _, _ := strings.Cut(rest, "/"); strings.Contains(segment, ":") {
./net/url/url.go:		username, password, _ := strings.Cut(userinfo, ":")
./net/url/url.go:			if segment, _, _ := strings.Cut(path, "/"); strings.Contains(segment, ":") {
./net/url/url.go:		key, query, _ = strings.Cut(query, "&")
./net/url/url.go:		key, value, _ := strings.Cut(key, "=")
./net/url/url.go:		elem, remaining, found = strings.Cut(remaining, "/")
./net/main_posix_test.go:	net, _, _ := strings.Cut(network, ":")
./net/http/transport.go:			_, text, ok := strings.Cut(resp.Status, " ")
./net/http/response.go:	proto, status, ok := strings.Cut(line, " ")
./net/http/response.go:	statusCode, _, _ := strings.Cut(resp.Status, " ")
./net/http/request.go:	username, password, ok = strings.Cut(cs, ":")
./net/http/request.go:	method, rest, ok1 := strings.Cut(line, " ")
./net/http/request.go:	requestURI, proto, ok2 := strings.Cut(rest, " ")
./net/http/cgi/host_test.go:		k, v, ok := strings.Cut(trimmedLine, "=")
./net/http/cgi/host.go:		header, val, ok := strings.Cut(string(line), ":")
./net/http/cgi/child.go:		if k, v, ok := strings.Cut(kv, "="); ok {
./net/http/fs.go:		start, end, ok := strings.Cut(ra, "-")
./net/http/cookie.go:		name, value, found := strings.Cut(s, "=")
./net/http/cookie.go:	name, value, ok := strings.Cut(parts[0], "=")
./net/http/cookie.go:		attr, val, _ := strings.Cut(parts[i], "=")
./net/http/cookie.go:			part, line, _ = strings.Cut(line, ";")
./net/http/cookie.go:			name, val, _ := strings.Cut(part, "=")
./net/http/client_test.go:				first, rest, _ := strings.Cut(final, ",")
./net/http/main_test.go:		_, stack, _ := strings.Cut(g, "\n")
./net/smtp/smtp.go:			k, v, _ := strings.Cut(line, " ")
./net/main_test.go:		_, stack, _ := strings.Cut(s, "\n")
./go/types/eval_test.go:	before, after, _ := strings.Cut(s, sep)
./go/importer/importer_test.go:	compiler, target, _ := strings.Cut(export, ":")
./go/printer/comment.go:		line, text, _ = strings.Cut(text, "\n")
./go/printer/printer.go:	if p, _, ok := strings.Cut(prefix, "*"); ok {
./go/printer/printer.go:	before, _, _ := strings.Cut(last, closing) // closing always present
./go/constant/value_test.go:			if ns, ds, ok := strings.Cut(a[1], "/"); ok && kind == token.FLOAT {
./go/constant/value_test.go:	if as, bs, ok := strings.Cut(lit, "/"); ok {
./go/version/version.go:	v, _, _ = strings.Cut(v, "-") // strip -bigcorp suffix.
./go/doc/example_test.go:				name, kind, found := strings.Cut(sectionName, ".")
./go/doc/comment/text.go:			line, text, _ = strings.Cut(text, "\n")
./go/doc/comment/markdown.go:			line, md, _ = strings.Cut(md, "\n")
./go/doc/comment/print.go:			line, md, _ = strings.Cut(md, "\n")
./go/doc/comment/print.go:		line, rest, ok := strings.Cut(s, "\n")
./go/doc/comment/parse.go:		if _, b, ok = strings.Cut(b, "'"); !ok {
./go/doc/comment/parse.go:		if _, b, ok = strings.Cut(b, "."); !ok {
./go/doc/headscan.go:		inner, s, _ = strings.Cut(s[loc[1]:], html_endh)
./go/build/read_test.go:		beforeP, afterP, _ := strings.Cut(tt.in, "ℙ")
./go/build/read_test.go:		if beforeD, afterD, ok := strings.Cut(beforeP, "𝔻"); ok {
./go/build/build.go:		line, argstr, ok := strings.Cut(strings.TrimSpace(line[4:]), ":")
./go/build/build.go:	name, _, _ = strings.Cut(name, ".")
./go/build/vendor_test.go:			_, pkg, found = strings.Cut(fullPkg, "/vendor/")
./go/build/build_test.go:	errStr, _, _ = strings.Cut(errStr, ";")
./go/build/read.go:			path, _, ok = strings.Cut(args[1:], "`")
./go/build/constraint/vers.go:		_, v, _ := strings.Cut(z.Tag, "go1.")
./regexp/exec_test.go:				loStr, hiStr, _ := strings.Cut(pair, "-")
./regexp/exec_test.go:			if _, flag, ok = strings.Cut(flag[1:], ":"); !ok {
./regexp/regexp.go:		before, after, ok := strings.Cut(template, "$")
./regexp/syntax/parse.go:					lit, t, _ = strings.Cut(t[2:], `\E`)
./archive/tar/writer_test.go:		prefix, _, _ = strings.Cut(prefix, "\x00") // Truncate at the NUL terminator
./archive/tar/strconv.go:	ss, sn, _ := strings.Cut(s, ".")
./archive/tar/strconv.go:	nStr, rest, ok := strings.Cut(s, " ")
./archive/tar/strconv.go:	k, v, ok = strings.Cut(rec, "=")
./path/filepath/path.go:		part, p, _ = strings.Cut(p, "/")
./runtime/testdata/testprog/traceback_ancestors.go:				g, all, _ = strings.Cut(all, "\n\n")
./runtime/testdata/testprog/traceback_ancestors.go:					id, _, _ := strings.Cut(strings.TrimPrefix(g, "goroutine "), " ")
./runtime/pprof/proto_test.go:			in, out, ok := strings.Cut(tt, "->\n")
./runtime/pprof/pprof_test.go:		if _, s, ok = strings.Cut(s, t); !ok {
./runtime/pprof/pprof_test.go:	base, kv, ok := strings.Cut(spec, ";")
./runtime/pprof/pprof_test.go:	k, v, ok := strings.Cut(kv, "=")
./runtime/pprof/proto.go:		loStr, hiStr, ok := strings.Cut(string(addr), "-")
./runtime/trace_cgo_test.go:		prefix, tracePath, found := strings.Cut(got, ":")
./runtime/traceback_test.go:		_, tb, _ = strings.Cut(tb, "\n")
./runtime/traceback_test.go:		line, tb, _ = strings.Cut(tb, "\n")
./runtime/traceback_test.go:			funcName, args, found := strings.Cut(line, "(")
./runtime/traceback_system_test.go:		_, pcstr, ok := strings.Cut(line, " pc=") // e.g. pc=0x%x
./runtime/debug/mod.go:		line, data, ok = strings.Cut(data, newline)
./runtime/debug/mod.go:				key, rawValue, ok = strings.Cut(kv, "=")
./internal/testenv/testenv.go:				line, goMod, _ = strings.Cut(goMod, "\n")
./internal/testenv/testenv.go:			importPath, export, ok := strings.Cut(line, "=")
./internal/zstd/zstd_test.go:			want, _, _ := strings.Cut(name, ".")
./encoding/asn1/common.go:		part, str, _ = strings.Cut(str, ",")
./encoding/xml/typeinfo.go:	if ns, t, ok := strings.Cut(tag, " "); ok {
./encoding/xml/xml.go:	} else if space, local, ok := strings.Cut(s, ":"); !ok || space == "" || local == "" {
./encoding/json/tags.go:	tag, opt, _ := strings.Cut(tag, ",")
./encoding/json/tags.go:		name, s, _ = strings.Cut(s, ",")
./html/template/js.go:	mimeType, _, _ = strings.Cut(mimeType, ";")
./html/template/url.go:	if protocol, _, ok := strings.Cut(s, ":"); ok && !strings.Contains(protocol, "/") {
./html/template/attr.go:	} else if prefix, short, ok := strings.Cut(name, ":"); ok {
./testing/testing_test.go:				if name, _, ok := strings.Cut(trimmed, " "); ok {
./log/slog/slogtest_test.go:		kv, rest, _ := strings.Cut(s, " ") // assumes exactly one space between attrs
./log/slog/slogtest_test.go:		k, value, found := strings.Cut(kv, "=")
./mime/mediatype.go:	if major, sub, ok := strings.Cut(t, "/"); !ok {
./mime/mediatype.go:	base, _, _ := strings.Cut(v, ";")
./mime/mediatype.go:		if baseName, _, ok := strings.Cut(key, "*"); ok {
./mime/encodedword.go:	charset, text, _ := strings.Cut(word, "?")
./mime/encodedword.go:	encoding, text, _ := strings.Cut(text, "?")
./os/user/lookup_unix.go:		u.Name, _, _ = strings.Cut(u.Name, ",")
./os/user/cgo_lookup_unix.go:	u.Name, _, _ = strings.Cut(u.Name, ",")
./os/os_test.go:		host, _, ok := strings.Cut(hostname, ".")
./os/exec/exec_windows_test.go:		k, _, ok := strings.Cut(kv, "=")
./os/exec/exec_test.go:	errLine, body, ok := strings.Cut(string(bs), "\n")
./os/exec/exec.go:		k, _, ok := strings.Cut(kv, "=")
./text/template/option.go:	if key, value, ok := strings.Cut(opt, "="); ok {

Proposal

We propose adding the following functions:

For the strings package:

// CutByte slices s around the first instance of sep,
// returning the text before and after sep.
// The found result reports whether sep appears in s.
// If sep does not appear in s, CutByte returns s, "", false.
func CutByte(s string, sep byte) (before, after string, found bool) {
    if i := IndexByte(s, sep); i >= 0 {
        return s[:i], s[i+1:], true
    }
    return s, "", false
}

For the bytes package:

// CutByte slices s around the first instance of sep,
// returning the text before and after sep.
// The found result reports whether sep appears in s.
// If sep does not appear in s, CutByte returns s, nil, false.
// CutByte returns slices of the original slice s, not copies.
func CutByte(s []byte, sep byte) (before, after []byte, found bool) {
    if i := IndexByte(s, sep); i >= 0 {
        return s[:i], s[i+1:], true
    }
    return s, nil, false
}

Rationale

The CutByte function is specifically optimized for cases where the separator is a single byte. In the analysis of Go's main repository and several large-scale Go projects, a significant number of string manipulations involve single-byte separators. The performance benefit of CutByte over Cut for these cases is approximately 25%, as measured in benchmarks comparing the two functions under similar conditions.

Compatibility

CutByte is a new addition and does not modify any existing interfaces or behavior in the Go standard library. It follows the established patterns and naming conventions of the Go ecosystem, ensuring that it integrates seamlessly with the existing library functions.

Implementation

The implementation of CutByte is straightforward and leverages the existing IndexByte function in both the strings and bytes packages. The proposed functions do not introduce any new dependencies or significant complexities.

Conclusion

Adding CutByte to the Go standard library will provide developers with a more efficient tool for handling common string and byte slice operations involving single-byte separators. This function not only enhances performance but also maintains readability and simplicity, aligning with Go's philosophy of clear and efficient coding practices.

@aimuz aimuz added the Proposal label Apr 29, 2024
@gopherbot gopherbot added this to the Proposal milestone Apr 29, 2024
aimuz added a commit to aimuz/go that referenced this issue Apr 29, 2024
CutByte optimizes slicing operations for single-byte
separators, offering a more efficient alternative when only a single byte
is involved.

There is more discussion on https://golang.org/issue/67101.

Fixes golang#67101.
@aimuz aimuz linked a pull request Apr 29, 2024 that will close this issue
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/582176 mentions this issue: bytes, strings: add CutByte

@adonovan
Copy link
Member

Can the performance of the existing function be improved without adding a new one? For example, how does this compare:

func Cut(s, sep []byte) (before, after []byte, found bool) {
	if len(sep) == 1 {
		if i := IndexByte(s, sep); i >= 0 {
			return s[:i], s[i+1:], true
		}
	}
	if i := Index(s, sep); i >= 0 {
		return s[:i], s[i+len(sep):], true
	}
	return s, nil, false
}

@mateusz834
Copy link
Member

@adonovan Index already has a optimization for one-byte strings

func Index(s, substr string) int {
n := len(substr)
switch {
case n == 0:
return 0
case n == 1:
return IndexByte(s, substr[0])

@ianlancetaylor
Copy link
Contributor

Please don't use images for plain text. Images are hard to read. Plain text is easy. Thanks.

@ianlancetaylor
Copy link
Contributor

See #22148 for a relevant discussion.

@adonovan
Copy link
Member

Index already has a optimization for one-byte strings

So it does. Are you saying that even with this fastpath, the overhead of evaluating len(substr), two switch conditions, and substr[0] makes it 25% slower? Could you furnish the benchmark you used?

@aimuz
Copy link
Contributor Author

aimuz commented Apr 30, 2024

Please don't use images for plain text. Images are hard to read. Plain text is easy. Thanks.

I'm sorry, I've changed it to text, to avoid too much content, now you need to click on Details to see the content


Can the performance of the existing function be improved without adding a new one? For example, how does this compare:

func Cut(s, sep []byte) (before, after []byte, found bool) {
	if len(sep) == 1 {
		if i := IndexByte(s, sep); i >= 0 {
			return s[:i], s[i+1:], true
		}
	}
	if i := Index(s, sep); i >= 0 {
		return s[:i], s[i+len(sep):], true
	}
	return s, nil, false
}

Such code will not improve anything and will even degrade performance


Index already has a optimization for one-byte strings

So it does. Are you saying that even with this fastpath, the overhead of evaluating len(substr), two switch conditions, and substr[0] makes it 25% slower? Could you furnish the benchmark you used?

This is my benchmark code.

func BenchmarkCut(b *testing.B) {
	b.ReportAllocs()
	skip := 32
	s := Repeat(Repeat(" ", skip)+"a"+Repeat(" ", skip), 1<<16/skip)
	b.Run(fmt.Sprintf("Func=Cut-%d", skip), func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			_, _, _ = Cut(s, "a")
		}
	})
	b.Run(fmt.Sprintf("Func=CutByte-%d", skip), func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			_, _, _ = CutByte(s, 'a')
		}
	})
}
benchstat -col /Func  new.txt  
goos: darwin
goarch: arm64
pkg: strings
cpu: Apple M2 Max
       │   Cut-32    │             CutByte-32              │
       │   sec/op    │   sec/op     vs base                │
Cut-12   4.644n ± 3%   3.614n ± 1%  -22.18% (p=0.000 n=10)

       │   Cut-32   │           CutByte-32           │
       │    B/op    │    B/op     vs base            │
Cut-12   0.000 ± 0%   0.000 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal

       │   Cut-32   │           CutByte-32           │
       │ allocs/op  │ allocs/op   vs base            │
Cut-12   0.000 ± 0%   0.000 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal

I would prefer to use the following benchmarking procedure so that I can do benchmark comparisons on different samples. When I tried to run benchstat -col /Func new.txt I got unexpected results, so I switched to the benchmarking above, using a fixed sample of tests

func BenchmarkCut(b *testing.B) {
	b.ReportAllocs()

	for _, skip := range [...]int{2, 4, 8, 16, 32, 64} {
		s := Repeat(Repeat(" ", skip)+"a"+Repeat(" ", skip), 1<<16/skip)
		b.Run(fmt.Sprintf("Func=Cut-%d", skip), func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				_, _, _ = Cut(s, "a")
			}
		})
		b.Run(fmt.Sprintf("Func=CutByte-%d", skip), func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				_, _, _ = CutByte(s, 'a')
			}
		})
	}
}

See #22148 for a relevant discussion.

IndexByte gets a significant performance boost after CL 148578. So I don't think "Using strings.IndexByte is premature optimisation." is entirely applicable now.

@aimuz
Copy link
Contributor Author

aimuz commented Apr 30, 2024

If I've missed any information, please clue me in, thanks!

@egonelbre
Copy link
Contributor

egonelbre commented Apr 30, 2024

I benchmarked this using https://github.com/egonelbre/exp/blob/main/bench/cutbyte/bench_test.go -- and then combining the results, I got the result (using Go master):

cpu: AMD Ryzen Threadripper 2950X 16-Core Processor
               │     Cut     │              CutByte               │
               │   sec/op    │   sec/op     vs base               │
Cut/skip=2-32    75.39n ± 2%   75.79n ± 2%       ~ (p=0.152 n=24)
Cut/skip=4-32    75.85n ± 1%   75.65n ± 1%       ~ (p=0.477 n=24)
Cut/skip=8-32    75.15n ± 2%   76.11n ± 1%       ~ (p=0.270 n=24)
Cut/skip=16-32   75.85n ± 1%   75.80n ± 2%       ~ (p=0.927 n=24)
Cut/skip=32-32   76.68n ± 1%   77.08n ± 3%       ~ (p=0.717 n=24)
Cut/skip=64-32   76.34n ± 1%   76.95n ± 2%       ~ (p=0.724 n=24)
geomean          75.88n        76.23n       +0.47%

Individual results:

cpu: AMD Ryzen Threadripper 2950X 16-Core Processor
                │     Cut     │              CutByte              │
                │   sec/op    │   sec/op     vs base              │
Cut0/skip=2-32    76.13n ± 1%   73.40n ± 3%  -3.59% (p=0.002 n=6)
Cut0/skip=4-32    75.76n ± 4%   72.25n ± 3%  -4.63% (p=0.002 n=6)
Cut0/skip=8-32    75.47n ± 3%   71.11n ± 3%  -5.78% (p=0.002 n=6)
Cut0/skip=16-32   76.63n ± 3%   70.83n ± 3%  -7.56% (p=0.002 n=6)
Cut0/skip=32-32   76.25n ± 4%   71.83n ± 4%  -5.81% (p=0.009 n=6)
Cut0/skip=64-32   75.91n ± 4%   72.01n ± 3%  -5.14% (p=0.002 n=6)
Cut1/skip=2-32    73.66n ± 2%   75.65n ± 3%  +2.69% (p=0.015 n=6)
Cut1/skip=4-32    75.80n ± 4%   75.72n ± 1%       ~ (p=0.848 n=6)
Cut1/skip=8-32    74.35n ± 2%   76.17n ± 2%  +2.44% (p=0.015 n=6)
Cut1/skip=16-32   74.25n ± 2%   75.65n ± 1%       ~ (p=0.180 n=6)
Cut1/skip=32-32   75.48n ± 2%   76.36n ± 4%       ~ (p=0.485 n=6)
Cut1/skip=64-32   75.60n ± 2%   77.56n ± 3%       ~ (p=0.180 n=6)
Cut2/skip=2-32    73.91n ± 2%   77.81n ± 3%  +5.29% (p=0.002 n=6)
Cut2/skip=4-32    74.81n ± 4%   77.52n ± 4%       ~ (p=0.093 n=6)
Cut2/skip=8-32    74.51n ± 4%   77.62n ± 2%  +4.17% (p=0.004 n=6)
Cut2/skip=16-32   76.07n ± 2%   78.75n ± 2%  +3.52% (p=0.002 n=6)
Cut2/skip=32-32   76.93n ± 2%   78.31n ± 2%  +1.79% (p=0.041 n=6)
Cut2/skip=64-32   76.45n ± 2%   78.09n ± 1%  +2.14% (p=0.009 n=6)
Cut3/skip=2-32    76.83n ± 2%   77.17n ± 3%       ~ (p=0.699 n=6)
Cut3/skip=4-32    75.93n ± 2%   75.77n ± 5%       ~ (p=0.909 n=6)
Cut3/skip=8-32    76.61n ± 2%   76.11n ± 1%       ~ (p=0.937 n=6)
Cut3/skip=16-32   75.64n ± 2%   75.89n ± 2%       ~ (p=0.818 n=6)
Cut3/skip=32-32   76.84n ± 9%   78.55n ± 2%       ~ (p=0.589 n=6)
Cut3/skip=64-32   77.31n ± 3%   76.65n ± 2%       ~ (p=0.310 n=6)
geomean           75.71n        75.66n       -0.07%

I used multiple implementations of the same code, because microbenchmarks are prone to statistical noise due to code layout differences.

@aimuz
Copy link
Contributor Author

aimuz commented Apr 30, 2024

Too slow in your tests, maybe a faulty benchmark?

https://github.com/egonelbre/exp/blob/main/bench/cutbyte/bench_test.go#L65

Can you accept it without a global variable?

diff --git a/bench/cutbyte/bench_test.go b/bench/cutbyte/bench_test.go
index f56b9ac..0489ac5 100644
--- a/bench/cutbyte/bench_test.go
+++ b/bench/cutbyte/bench_test.go
@@ -62,8 +62,6 @@ func CutByte3(s string, sep byte) (before, after string, found bool) {
 	return s, "", false
 }
 
-var x, y, z any
-
 func BenchmarkCut0(b *testing.B) {
 	b.ReportAllocs()
 
@@ -71,13 +69,15 @@ func BenchmarkCut0(b *testing.B) {
 		s := strings.Repeat(strings.Repeat(" ", skip)+"a"+strings.Repeat(" ", skip), 1<<16/skip)
 		b.Run(fmt.Sprintf("func=Cut/skip=%d", skip), func(b *testing.B) {
 			for i := 0; i < b.N; i++ {
-				x, y, z = Cut0(s, "a")
+				x, y, z := Cut0(s, "a")
+				_, _, _ = x, y, z
 			}
 		})
 
 		b.Run(fmt.Sprintf("func=CutByte/skip=%d", skip), func(b *testing.B) {
 			for i := 0; i < b.N; i++ {
-				x, y, z = CutByte0(s, 'a')
+				x, y, z := CutByte0(s, 'a')
+				_, _, _ = x, y, z
 			}
 		})
 	}
@@ -89,13 +89,15 @@ func BenchmarkCut1(b *testing.B) {
 		s := strings.Repeat(strings.Repeat(" ", skip)+"a"+strings.Repeat(" ", skip), 1<<16/skip)
 		b.Run(fmt.Sprintf("func=Cut/skip=%d", skip), func(b *testing.B) {
 			for i := 0; i < b.N; i++ {
-				x, y, z = Cut1(s, "a")
+				x, y, z := Cut1(s, "a")
+				_, _, _ = x, y, z
 			}
 		})
 
 		b.Run(fmt.Sprintf("func=CutByte/skip=%d", skip), func(b *testing.B) {
 			for i := 0; i < b.N; i++ {
-				x, y, z = CutByte1(s, 'a')
+				x, y, z := CutByte1(s, 'a')
+				_, _, _ = x, y, z
 			}
 		})
 	}
@@ -108,13 +110,15 @@ func BenchmarkCut2(b *testing.B) {
 		s := strings.Repeat(strings.Repeat(" ", skip)+"a"+strings.Repeat(" ", skip), 1<<16/skip)
 		b.Run(fmt.Sprintf("func=Cut/skip=%d", skip), func(b *testing.B) {
 			for i := 0; i < b.N; i++ {
-				x, y, z = Cut2(s, "a")
+				x, y, z := Cut2(s, "a")
+				_, _, _ = x, y, z
 			}
 		})
 
 		b.Run(fmt.Sprintf("func=CutByte/skip=%d", skip), func(b *testing.B) {
 			for i := 0; i < b.N; i++ {
-				x, y, z = CutByte2(s, 'a')
+				x, y, z := CutByte2(s, 'a')
+				_, _, _ = x, y, z
 			}
 		})
 	}
@@ -127,14 +131,16 @@ func BenchmarkCut3(b *testing.B) {
 		s := strings.Repeat(strings.Repeat(" ", skip)+"a"+strings.Repeat(" ", skip), 1<<16/skip)
 		b.Run(fmt.Sprintf("func=Cut/skip=%d", skip), func(b *testing.B) {
 			for i := 0; i < b.N; i++ {
-				x, y, z = Cut3(s, "a")
+				x, y, z := Cut3(s, "a")
+				_, _, _ = x, y, z
 			}
 		})
 
 		b.Run(fmt.Sprintf("func=CutByte/skip=%d", skip), func(b *testing.B) {
 			for i := 0; i < b.N; i++ {
-				x, y, z = CutByte3(s, 'a')
+				x, y, z := CutByte3(s, 'a')
+				_, _, _ = x, y, z
 			}
 		})
 	}
-}
\ No newline at end of file
+}

I modified your test case a bit and got the following results

goos: darwin
goarch: arm64
pkg: github.com/egonelbre/exp/bench/cutbyte
                │     Cut     │               CutByte               │
                │   sec/op    │   sec/op     vs base                │
Cut0/skip=2-12    4.172n ± 3%   3.139n ± 1%  -24.75% (p=0.000 n=10)
Cut0/skip=4-12    4.143n ± 2%   3.132n ± 2%  -24.40% (p=0.000 n=10)
Cut0/skip=8-12    4.177n ± 3%   3.174n ± 2%  -24.01% (p=0.000 n=10)
Cut0/skip=16-12   4.151n ± 2%   3.147n ± 1%  -24.17% (p=0.000 n=10)
Cut0/skip=32-12   4.633n ± 3%   3.582n ± 1%  -22.70% (p=0.000 n=10)
Cut0/skip=64-12   4.854n ± 5%   3.936n ± 2%  -18.91% (p=0.000 n=10)
Cut1/skip=2-12    4.172n ± 2%   3.152n ± 2%  -24.45% (p=0.000 n=10)
Cut1/skip=4-12    4.175n ± 3%   3.171n ± 1%  -24.04% (p=0.000 n=10)
Cut1/skip=8-12    4.209n ± 2%   3.161n ± 1%  -24.91% (p=0.000 n=10)
Cut1/skip=16-12   4.232n ± 3%   3.159n ± 1%  -25.33% (p=0.000 n=10)
Cut1/skip=32-12   4.601n ± 5%   3.593n ± 1%  -21.92% (p=0.000 n=10)
Cut1/skip=64-12   5.016n ± 3%   3.857n ± 0%  -23.11% (p=0.000 n=10)
Cut2/skip=2-12    4.153n ± 2%   3.134n ± 1%  -24.52% (p=0.000 n=10)
Cut2/skip=4-12    4.152n ± 1%   3.128n ± 1%  -24.66% (p=0.000 n=10)
Cut2/skip=8-12    4.123n ± 2%   3.135n ± 1%  -23.95% (p=0.000 n=10)
Cut2/skip=16-12   4.154n ± 2%   3.111n ± 2%  -25.13% (p=0.000 n=10)
Cut2/skip=32-12   4.548n ± 3%   3.559n ± 1%  -21.75% (p=0.000 n=10)
Cut2/skip=64-12   4.965n ± 4%   3.853n ± 1%  -22.41% (p=0.000 n=10)
Cut3/skip=2-12    4.098n ± 3%   3.128n ± 1%  -23.67% (p=0.000 n=10)
Cut3/skip=4-12    4.154n ± 3%   3.123n ± 2%  -24.81% (p=0.000 n=10)
Cut3/skip=8-12    4.308n ± 3%   3.114n ± 1%  -27.71% (p=0.000 n=10)
Cut3/skip=16-12   4.181n ± 3%   3.125n ± 1%  -25.27% (p=0.000 n=10)
Cut3/skip=32-12   4.569n ± 3%   3.575n ± 1%  -21.76% (p=0.000 n=10)
Cut3/skip=64-12   4.942n ± 4%   3.883n ± 1%  -21.42% (p=0.000 n=10)
geomean           4.360n        3.324n       -23.76%

                │     Cut      │               CutByte               │
                │     B/op     │    B/op     vs base                 │
Cut0/skip=2-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut0/skip=4-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut0/skip=8-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut0/skip=16-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut0/skip=32-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut0/skip=64-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut1/skip=2-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut1/skip=4-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut1/skip=8-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut1/skip=16-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut1/skip=32-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut1/skip=64-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut2/skip=2-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut2/skip=4-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut2/skip=8-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut2/skip=16-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut2/skip=32-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut2/skip=64-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut3/skip=2-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut3/skip=4-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut3/skip=8-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut3/skip=16-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut3/skip=32-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut3/skip=64-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean                      ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                │     Cut      │               CutByte               │
                │  allocs/op   │ allocs/op   vs base                 │
Cut0/skip=2-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut0/skip=4-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut0/skip=8-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut0/skip=16-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut0/skip=32-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut0/skip=64-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut1/skip=2-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut1/skip=4-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut1/skip=8-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut1/skip=16-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut1/skip=32-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut1/skip=64-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut2/skip=2-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut2/skip=4-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut2/skip=8-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut2/skip=16-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut2/skip=32-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut2/skip=64-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut3/skip=2-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut3/skip=4-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut3/skip=8-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut3/skip=16-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut3/skip=32-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut3/skip=64-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean                      ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean


Thanks to @egonelbre's tip, I changed the benchmarking to the following way

func BenchmarkCut(b *testing.B) {
	b.ReportAllocs()

	for _, skip := range [...]int{2, 4, 8, 16, 32, 64} {
		s := Repeat(Repeat(" ", skip)+"a"+Repeat(" ", skip), 1<<16/skip)
		b.Run(fmt.Sprintf("Func=Cut/%d", skip), func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				_, _, _ = Cut(s, "a")
			}
		})
		b.Run(fmt.Sprintf("Func=CutByte/%d", skip), func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				_, _, _ = CutByte(s, 'a')
			}
		})
	}
}
benchstat -col /Func  new.txt 
goos: darwin
goarch: arm64
pkg: strings
cpu: Apple M2 Max
          │     Cut     │               CutByte               │
          │   sec/op    │   sec/op     vs base                │
Cut/2-12    4.134n ± 2%   3.171n ± 2%  -23.30% (p=0.000 n=10)
Cut/4-12    4.190n ± 2%   3.208n ± 2%  -23.44% (p=0.000 n=10)
Cut/8-12    4.137n ± 2%   3.210n ± 1%  -22.42% (p=0.000 n=10)
Cut/16-12   4.128n ± 1%   3.223n ± 1%  -21.92% (p=0.000 n=10)
Cut/32-12   4.617n ± 1%   3.630n ± 1%  -21.39% (p=0.000 n=10)
Cut/64-12   4.957n ± 1%   3.927n ± 0%  -20.76% (p=0.000 n=10)
geomean     4.349n        3.383n       -22.21%

          │     Cut      │               CutByte               │
          │     B/op     │    B/op     vs base                 │
Cut/2-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/4-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/8-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/16-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/32-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/64-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean                ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

          │     Cut      │               CutByte               │
          │  allocs/op   │ allocs/op   vs base                 │
Cut/2-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/4-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/8-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/16-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/32-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/64-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean                ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

@egonelbre
Copy link
Contributor

egonelbre commented Apr 30, 2024

@aimuz, I'm storing in global vars to ensure that code doesn't get optimized away. It could avoid any though to be faster.

I updated the code, and I'm getting:

cpu: AMD Ryzen Threadripper 2950X 16-Core Processor
               │     Cut      │               CutByte               │
               │    sec/op    │   sec/op     vs base                │
Cut/skip=2-32     8.306n ± 1%   7.235n ± 2%  -12.90% (p=0.000 n=24)
Cut/skip=4-32     8.362n ± 1%   7.188n ± 1%  -14.04% (p=0.000 n=24)
Cut/skip=8-32     8.433n ± 7%   7.396n ± 2%  -12.30% (p=0.000 n=24)
Cut/skip=16-32    8.347n ± 1%   7.240n ± 1%  -13.25% (p=0.000 n=24)
Cut/skip=32-32    9.220n ± 1%   7.882n ± 1%  -14.51% (p=0.000 n=24)
Cut/skip=64-32   10.050n ± 1%   8.607n ± 1%  -14.35% (p=0.000 n=24)
geomean           8.764n        7.575n       -13.56%

aimuz added a commit to aimuz/go that referenced this issue May 1, 2024
Optimize the Cut function in both the bytes and strings packages
to immediately return slices when the separator is a single byte (or
character), avoiding more complex index searching logic. This change
can significantly reduce the execution time for these specific cases,
as benchmark tests added to each package demonstrate improvements.

The optimization checks if the length of the separator is one before
proceeding with the existing search strategy. If so, it uses IndexByte
for a faster lookup of the separator's position.

Additionally, benchmark tests have been added for both packages to
demonstrate the performance benefits of this optimization across
various scenarios.

goos: darwin
goarch: arm64
pkg: strings
cpu: Apple M2 Max
                  │ old-cut.txt │             new-cut.txt             │
                  │   sec/op    │   sec/op     vs base                │
Cut/Cut-One/2-12    4.026n ± 2%   3.274n ± 2%  -18.68% (p=0.000 n=10)
Cut/Cut-Two/2-12    8.093n ± 0%   8.357n ± 0%   +3.27% (p=0.000 n=10)
Cut/Cut-One/4-12    4.048n ± 1%   3.324n ± 2%  -17.91% (p=0.000 n=10)
Cut/Cut-Two/4-12    8.105n ± 0%   8.377n ± 1%   +3.35% (p=0.000 n=10)
Cut/Cut-One/8-12    4.089n ± 1%   3.290n ± 1%  -19.53% (p=0.000 n=10)
Cut/Cut-Two/8-12    8.107n ± 1%   8.359n ± 1%   +3.10% (p=0.000 n=10)
Cut/Cut-One/16-12   4.127n ± 1%   3.328n ± 1%  -19.35% (p=0.000 n=10)
Cut/Cut-Two/16-12   8.119n ± 1%   8.374n ± 1%   +3.15% (p=0.000 n=10)
Cut/Cut-One/32-12   4.545n ± 2%   3.675n ± 1%  -19.14% (p=0.000 n=10)
Cut/Cut-Two/32-12   8.708n ± 1%   8.963n ± 1%   +2.92% (p=0.000 n=10)
Cut/Cut-One/64-12   4.825n ± 2%   4.146n ± 1%  -14.08% (p=0.000 n=10)
Cut/Cut-Two/64-12   9.286n ± 0%   9.315n ± 1%        ~ (p=0.105 n=10)
geomean             5.983n        5.486n        -8.32%

                  │ old-cut.txt  │             new-cut.txt             │
                  │     B/op     │    B/op     vs base                 │
Cut/Cut-One/2-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/2-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-One/4-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/4-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-One/8-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/8-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-One/16-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/16-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-One/32-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/32-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-One/64-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/64-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean                        ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                  │ old-cut.txt  │             new-cut.txt             │
                  │  allocs/op   │ allocs/op   vs base                 │
Cut/Cut-One/2-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/2-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-One/4-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/4-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-One/8-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/8-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-One/16-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/16-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-One/32-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/32-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-One/64-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/64-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean                        ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

For golang#67101
aimuz added a commit to aimuz/go that referenced this issue May 1, 2024
Optimize the Cut function in both the bytes and strings packages
to immediately return slices when the separator is a single byte (or
character), avoiding more complex index searching logic. This change
can significantly reduce the execution time for these specific cases,
as benchmark tests added to each package demonstrate improvements.

The optimization checks if the length of the separator is one before
proceeding with the existing search strategy. If so, it uses IndexByte
for a faster lookup of the separator's position.

Additionally, benchmark tests have been added for both packages to
demonstrate the performance benefits of this optimization across
various scenarios.

goos: darwin
goarch: arm64
pkg: strings
cpu: Apple M2 Max
                  │ old-cut.txt │             new-cut.txt             │
                  │   sec/op    │   sec/op     vs base                │
Cut/Cut-One/2-12    4.026n ± 2%   3.274n ± 2%  -18.68% (p=0.000 n=10)
Cut/Cut-Two/2-12    8.093n ± 0%   8.357n ± 0%   +3.27% (p=0.000 n=10)
Cut/Cut-One/4-12    4.048n ± 1%   3.324n ± 2%  -17.91% (p=0.000 n=10)
Cut/Cut-Two/4-12    8.105n ± 0%   8.377n ± 1%   +3.35% (p=0.000 n=10)
Cut/Cut-One/8-12    4.089n ± 1%   3.290n ± 1%  -19.53% (p=0.000 n=10)
Cut/Cut-Two/8-12    8.107n ± 1%   8.359n ± 1%   +3.10% (p=0.000 n=10)
Cut/Cut-One/16-12   4.127n ± 1%   3.328n ± 1%  -19.35% (p=0.000 n=10)
Cut/Cut-Two/16-12   8.119n ± 1%   8.374n ± 1%   +3.15% (p=0.000 n=10)
Cut/Cut-One/32-12   4.545n ± 2%   3.675n ± 1%  -19.14% (p=0.000 n=10)
Cut/Cut-Two/32-12   8.708n ± 1%   8.963n ± 1%   +2.92% (p=0.000 n=10)
Cut/Cut-One/64-12   4.825n ± 2%   4.146n ± 1%  -14.08% (p=0.000 n=10)
Cut/Cut-Two/64-12   9.286n ± 0%   9.315n ± 1%        ~ (p=0.105 n=10)
geomean             5.983n        5.486n        -8.32%

                  │ old-cut.txt  │             new-cut.txt             │
                  │     B/op     │    B/op     vs base                 │
Cut/Cut-One/2-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/2-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-One/4-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/4-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-One/8-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/8-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-One/16-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/16-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-One/32-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/32-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-One/64-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/64-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean                        ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                  │ old-cut.txt  │             new-cut.txt             │
                  │  allocs/op   │ allocs/op   vs base                 │
Cut/Cut-One/2-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/2-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-One/4-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/4-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-One/8-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/8-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-One/16-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/16-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-One/32-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/32-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-One/64-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/64-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean                        ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

For golang#67101
@gopherbot
Copy link
Contributor

Change https://go.dev/cl/582655 mentions this issue: bytes, strings: optimize Cut for single-byte separators

@aimuz
Copy link
Contributor Author

aimuz commented May 1, 2024

If the length is 1, it does improve, but if the length is greater than 1, it will increase accordingly. Maybe this is acceptable?

The code and improved implementation of the benchmark test will be available at https://go.dev/cl/582655

goos: darwin
goarch: arm64
pkg: strings
cpu: Apple M2 Max
                  │ old-cut.txt │             new-cut.txt             │
                  │   sec/op    │   sec/op     vs base                │
Cut/Cut-One/2-12    4.026n ± 2%   3.274n ± 2%  -18.68% (p=0.000 n=10)
Cut/Cut-Two/2-12    8.093n ± 0%   8.357n ± 0%   +3.27% (p=0.000 n=10)
Cut/Cut-One/4-12    4.048n ± 1%   3.324n ± 2%  -17.91% (p=0.000 n=10)
Cut/Cut-Two/4-12    8.105n ± 0%   8.377n ± 1%   +3.35% (p=0.000 n=10)
Cut/Cut-One/8-12    4.089n ± 1%   3.290n ± 1%  -19.53% (p=0.000 n=10)
Cut/Cut-Two/8-12    8.107n ± 1%   8.359n ± 1%   +3.10% (p=0.000 n=10)
Cut/Cut-One/16-12   4.127n ± 1%   3.328n ± 1%  -19.35% (p=0.000 n=10)
Cut/Cut-Two/16-12   8.119n ± 1%   8.374n ± 1%   +3.15% (p=0.000 n=10)
Cut/Cut-One/32-12   4.545n ± 2%   3.675n ± 1%  -19.14% (p=0.000 n=10)
Cut/Cut-Two/32-12   8.708n ± 1%   8.963n ± 1%   +2.92% (p=0.000 n=10)
Cut/Cut-One/64-12   4.825n ± 2%   4.146n ± 1%  -14.08% (p=0.000 n=10)
Cut/Cut-Two/64-12   9.286n ± 0%   9.315n ± 1%        ~ (p=0.105 n=10)
geomean             5.983n        5.486n        -8.32%

                  │ old-cut.txt  │             new-cut.txt             │
                  │     B/op     │    B/op     vs base                 │
Cut/Cut-One/2-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/2-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-One/4-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/4-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-One/8-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/8-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-One/16-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/16-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-One/32-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/32-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-One/64-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/64-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean                        ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                  │ old-cut.txt  │             new-cut.txt             │
                  │  allocs/op   │ allocs/op   vs base                 │
Cut/Cut-One/2-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/2-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-One/4-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/4-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-One/8-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/8-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-One/16-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/16-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-One/32-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/32-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-One/64-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/64-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean                        ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

@aimuz
Copy link
Contributor Author

aimuz commented May 1, 2024

Maybe CL 582655 is the best way to go for the current situation? He can get this one optimised without modifying the code in other locations. But for applications with sep greater than 1, he will have a performance loss greater than 2%, is this acceptable?

In go code, I've investigated, and the proportion of sep lengths of 1 is quite large. Maybe we can merge CL 582655?

aimuz added a commit to aimuz/go that referenced this issue May 1, 2024
Optimize the Cut function in both the bytes and strings packages
to immediately return slices when the separator is a single byte (or
character), avoiding more complex index searching logic. This change
can significantly reduce the execution time for these specific cases,
as benchmark tests added to each package demonstrate improvements.

The optimization checks if the length of the separator is one before
proceeding with the existing search strategy. If so, it uses IndexByte
for a faster lookup of the separator's position.

Additionally, benchmark tests have been added for both packages to
demonstrate the performance benefits of this optimization across
various scenarios.

goos: darwin
goarch: arm64
pkg: strings
cpu: Apple M2 Max
                  │ old-cut.txt │             new-cut.txt             │
                  │   sec/op    │   sec/op     vs base                │
Cut/Cut-One/2-12    4.026n ± 2%   3.274n ± 2%  -18.68% (p=0.000 n=10)
Cut/Cut-Two/2-12    8.093n ± 0%   8.357n ± 0%   +3.27% (p=0.000 n=10)
Cut/Cut-One/4-12    4.048n ± 1%   3.324n ± 2%  -17.91% (p=0.000 n=10)
Cut/Cut-Two/4-12    8.105n ± 0%   8.377n ± 1%   +3.35% (p=0.000 n=10)
Cut/Cut-One/8-12    4.089n ± 1%   3.290n ± 1%  -19.53% (p=0.000 n=10)
Cut/Cut-Two/8-12    8.107n ± 1%   8.359n ± 1%   +3.10% (p=0.000 n=10)
Cut/Cut-One/16-12   4.127n ± 1%   3.328n ± 1%  -19.35% (p=0.000 n=10)
Cut/Cut-Two/16-12   8.119n ± 1%   8.374n ± 1%   +3.15% (p=0.000 n=10)
Cut/Cut-One/32-12   4.545n ± 2%   3.675n ± 1%  -19.14% (p=0.000 n=10)
Cut/Cut-Two/32-12   8.708n ± 1%   8.963n ± 1%   +2.92% (p=0.000 n=10)
Cut/Cut-One/64-12   4.825n ± 2%   4.146n ± 1%  -14.08% (p=0.000 n=10)
Cut/Cut-Two/64-12   9.286n ± 0%   9.315n ± 1%        ~ (p=0.105 n=10)
geomean             5.983n        5.486n        -8.32%

                  │ old-cut.txt  │             new-cut.txt             │
                  │     B/op     │    B/op     vs base                 │
Cut/Cut-One/2-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/2-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-One/4-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/4-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-One/8-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/8-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-One/16-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/16-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-One/32-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/32-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-One/64-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/64-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean                        ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

                  │ old-cut.txt  │             new-cut.txt             │
                  │  allocs/op   │ allocs/op   vs base                 │
Cut/Cut-One/2-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/2-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-One/4-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/4-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-One/8-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/8-12    0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-One/16-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/16-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-One/32-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/32-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-One/64-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
Cut/Cut-Two/64-12   0.000 ± 0%     0.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean                        ²               +0.00%                ²
¹ all samples are equal
² summaries must be >0 to compute geomean

For golang#67101
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
Status: Incoming
Development

Successfully merging a pull request may close this issue.

6 participants