Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: Micro-optimization for better memory utilization #2632

Closed
behnambm opened this issue May 2, 2024 · 4 comments · Fixed by #2636
Closed

proposal: Micro-optimization for better memory utilization #2632

behnambm opened this issue May 2, 2024 · 4 comments · Fixed by #2636

Comments

@behnambm
Copy link
Contributor

behnambm commented May 2, 2024

Hello,

I've noticed that there's a potential for optimizing certain structs by rearranging their fields. This optimization could notably reduce the padding added by the compiler, ensuring better memory utilization and potentially enhancing performance.

It's important to consider that alignment requirements vary between 32-bit and 64-bit architectures. With a focus on optimizing for 64-bit architecture, I believe there's significant room for improvement in this regard.

It's worth emphasizing that this optimization won't alter the program's logic; rather, it involves solely changing the location of fields within the structs.

If okay, I would like to work on this.

@aldas
Copy link
Contributor

aldas commented May 10, 2024

Sorry for the late reply. If you are have interest and time for it please go ahead.

Could you provide small summary of which structs you aim, maybe small example with benchmark. Maybe related links to (blog)posts about this kind of optimizations - so people that are not that knowledgeable could get some introductory into the matter.

If there are no performance gains it would be useful to show that and reason why we see numbers we see. This is useful less experienced coders.

I am a quite ignorant at these things, I sometimes look at micro-optimizations that other frameworks do and think how much this magic actually give them benefits. It does not mean we should not do them but to have honest conversation what the benefits are in numbers.

I would really like if these kind of things come in a form that everyone interested (even not Echo users) could learn something. Having more optimized code is good, but even better is having that and more learned people for Go ecosystem.

@behnambm
Copy link
Contributor Author

Absolutely, sharing knowledge is key here, and I'm fully on board with that.


So let me start with a simple exlanation of what this micro optimization is.

Here we have two important concepts:

  1. Word Size: This refers to the number of bytes the CPU can process at one time. On a 64-bit system, the word size is 8 bytes, meaning the CPU can handle 8 bytes of data in a single operation. Ref
  2. Alignment: For the CPU to process data most efficiently, the data needs to be aligned in memory according to the word size. This means that data types should ideally begin at memory addresses that are multiples of their size, up to the word size. Ref

When struct fields are not naturally aligned with these boundaries, the Go compiler will automatically insert padding (extra space) between fields. This padding ensures that each field starts at an address that aligns with its size, optimizing memory access during runtime. And as a result when a data type is naturally aligned, the CPU fetches it in minimum read cycles.

Look at this struct:

type myStruct struct {
 myBool  bool    // 1 byte
 myFloat float64 // 8 bytes
 myInt   int32   // 4 bytes
}

And this is how the memory allocation for this struct will look like:
image

The red space is the padding added by the compiler to complete a word.

If we just reorder the fields of that struct like this:

type myStructOptimized struct {
 myFloat float64 // 8 bytes
 myBool  bool    // 1 byte
 myInt   int32   // 4 bytes
}

We will get an allocation like this:
image

And finally with just a reorder we can save 8 bytes.

Here's the code if you want to run it locally:

package main

import (
	"fmt"
	"unsafe"
)

type myStruct struct {
	myBool  bool    // 1 byte
	myFloat float64 // 8 bytes
	myInt   int32   // 4 bytes
}

type myStructOptimized struct {
	myFloat float64 // 8 bytes
	myBool  bool    // 1 byte
	myInt   int32   // 4 bytes
}

func main() {
	a := myStruct{}
	b := myStructOptimized{}

	fmt.Println("Size of myStruct:", unsafe.Sizeof(a))          //24
	fmt.Println("Size of myStructOptimized:", unsafe.Sizeof(b)) //16

}

References:
Golang struct size and memory optimisation
Alignment
Structure Matters: Padding, Alignment, and Memory Optimization in Go
Optimize Your Struct with Alignment Trick + Great Visualization
How to Speed Up Your Struct in Golang

@behnambm
Copy link
Contributor Author

behnambm commented May 14, 2024

Benchmark

I've been attempting to fine-tune the Echo framework to incorporate new benchmarks over the past few days. However, I encountered challenges due to the complexity of duplicating code required to establish a suitable benchmark environment. As a result, I decided to forgo benchmarking directly within the Echo framework itself.

I will drop this benchmark code here so you can run it locally too.

main.go

package main

type OptimizedUserStruct struct {
	Email    string // 16 bytes
	Balance  int64  // 8 bytes
	Rank     int32  // 4 bytes
	Age      uint16 // 2 bytes
	IsAdmin  bool   // 1 byte
	IsActive bool   // 1 byte
}

type UserStruct struct {
	IsAdmin  bool   // 1 byte + 7 padding
	Email    string // 16 bytes
	IsActive bool   // 1 byte + 7 padding
	Balance  int64  // 8 bytes
	Age      uint16 // 2 bytes + 2 padding
	Rank     int32  // 4 bytes
}

main_test.go

package main

import (
	"testing"
)

var OptimizedStructList []OptimizedUserStruct
var StructList []UserStruct

func BenchmarkUserStruct(b *testing.B) {
	b.ReportAllocs()
	b.ResetTimer()

	for i := 0; i < b.N; i++ {
		tmp := UserStruct{
			IsAdmin:  true,
			Email:    "email@example.com",
			IsActive: true,
			Balance:  42,
			Age:      42,
			Rank:     42,
		}
		StructList = append(StructList, tmp)
	}
}

func BenchmarkOptimizedUserStruct(b *testing.B) {
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		tmp := OptimizedUserStruct{
			Email:    "email@example.com",
			Balance:  42,
			Rank:     42,
			Age:      42,
			IsAdmin:  true,
			IsActive: true,
		}
		OptimizedStructList = append(OptimizedStructList, tmp)
	}
}

Result

go test -bench=. ./...

goos: linux
goarch: amd64
pkg: lab/padding
cpu: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
BenchmarkUserStruct-8                   10656973               109.0 ns/op           270 B/op          0 allocs/op
BenchmarkOptimizedUserStruct-8          46414730                53.81 ns/op          162 B/op          0 allocs/op
PASS
ok      lab/padding     4.164s


It's crucial to emphasize the title of the issue, which proposes a "micro" optimization. Therefore, we shouldn't anticipate a significant impact on performance. The effects of these types of optimizations become apparent at scale.

behnambm added a commit to behnambm/echo that referenced this issue May 29, 2024
aldas pushed a commit that referenced this issue May 30, 2024
@aldas
Copy link
Contributor

aldas commented May 30, 2024

Thank you @behnambm !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants