Error in Load Testing with MySQL in Firecracker Environment - Buffer Size Issue #1999

Roopsai507 · 2024-02-09T11:35:58Z

I'm hitting an issue while running a load test with HammerDB against MySQL inside a Firecracker environment.
The output shows error messages related to buffer sizes and frame handling. Below is a snippet of the error messages:

2024-02-09T11:12:26.690064527 [9090:main] Receiving buffer is too small to hold frame of current size
2024-02-09T11:12:26.690120360 [9090:main] Receiving buffer is too small to hold frame of current size
2024-02-09T11:12:26.690126586 [9090:main] Receiving buffer is too small to hold frame of current size

frame trace:
ffffc00001e07f50:   ffffffff8005b624
ffffc00001e07fc0:   ffffffff800472ba

loaded klibs:
assertion len <= x->p.pbuf.len failed at /home/circleci/project/src/virtio/virtio_net.c:196 (IP 0xffffffff80106807)  in vnet_input(); halt

frame trace:
ffffc00001d1b060:   ffffffff800467d2
ffffc00001d1b0c0:   ffffffff80047236
ffffc00001d1b3c0:   ffffffff80044168
ffffc00001d1b3d0:   ffffffff800467d2
ffffc00001d1b430:   ffffffff80047236
ffffc00001d1b730:   ffffffff80044168
ffffc00001d1b740:   ffffffff800467d2
ffffc00001d1b7a0:   ffffffff80047236
ffffc00001d1baa0:   ffffffff80044168
ffffc00001d1bab0:   ffffffff800467d2
ffffc00001d1bb10:   ffffffff80047236
ffffc00001d1be10:   ffffffff80044168
ffffc00001d1be20:   ffffffff800467d2
ffffc00001d1be80:   ffffffff80047236
ffffc00001d1c180:   ffffffff80044168
ffffc00001d1c190:   ffffffff800467d2
ffffc00001d1c1f0:   ffffffff80047236
ffffc00001d1c4f0:   ffffffff80044168
ffffc00001d1c500:   ffffffff800467d2
ffffc00001d1c560:   ffffffff80047236
ffffc00001d1c860:   ffffffff80044168
ffffc00001d1c870:   ffffffff800467d2
ffffc00001d1c8d0:   ffffffff80047236
ffffc00001d1cbd0:   ffffffff80044168
ffffc00001d1cbe0:   ffffffff800467d2
ffffc00001d1cc40:   ffffffff80047236
ffffc00001d1cf40:   ffffffff80044168
ffffc00001d1cf50:   ffffffff800467d2
ffffc00001d1cfb0:   ffffffff80047236
ffffc00001d1d2b0:   ffffffff80044168
ffffc00001d1d2c0:   ffffffff800467d2
ffffc00001d1d320:   ffffffff80047236

loaded klibs:
assertion buffer_extend(v, sizeof(void *)) failed at /home/circleci/project/src/runtime/vector.h:73 (IP 0xffffffff80044130)  in vector_push(); halt
validate_page error: objcache 0xffff80000005ad38, footer 0xffffc00001dfffe0, bad magic! (10)
objcache_allocate error: alloc failed

Test Configurations:

Nanos Kernel - ~/.ops/d875bfe/kernel.img
Ops version: 0.1.40
Nanos version: 0.1.49
MySQL: 8.0.35
Firecracker: v1.6.0
HammerDB: 4.7

HammerDB Testing Setup:

./hammerdbcli
hammerdb>cat buildSchema.tcl
puts "SETTING CONFIGURATION"
dbset db mysql
diset connection mysql_host 11.24.45.251
diset connection mysql_port 3306
diset tpcc mysql_pass root
diset tpcc mysql_count_ware 800
diset tpcc mysql_partition true
diset tpcc mysql_num_vu 64
diset tpcc mysql_storage_engine innodb
print dict
buildschema
hammerdb>source buildSchema.tcl


francescolavra · 2024-02-17T09:48:42Z

The issue causing the above error messages has been fixed in #2002. To test this fix, you can create a new image with Ops by specifying --nanos-version 659f598 in your ops image create command, and then use ~/.ops/659f598/kernel.img as kernel image in your Firecracker configuration.
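
As a rough sketch of the second step, a Firecracker config file can be generated with the kernel path derived from the Nanos build hash. The JSON keys are the ones used in the Firecracker config files quoted later in this thread. The helper name `firecracker_config`, the example image path and tap device, and the default sizing are assumptions for illustration only.

```python
import json
from pathlib import Path


def firecracker_config(nanos_version, image_path, tap_device, vcpus=1, mem_mib=2048):
    """Build a Firecracker config dict that boots an Ops-built image with
    the kernel that Ops stored under ~/.ops/<nanos_version>/kernel.img."""
    kernel = Path.home() / ".ops" / nanos_version / "kernel.img"
    return {
        "boot-source": {"kernel_image_path": str(kernel)},
        "drives": [
            {
                "drive_id": "rootfs",
                "path_on_host": str(image_path),
                "is_root_device": True,
                "is_read_only": False,
            }
        ],
        "network-interfaces": [
            {"iface_id": "eth0", "host_dev_name": tap_device}
        ],
        "machine-config": {"vcpu_count": vcpus, "mem_size_mib": mem_mib},
    }


if __name__ == "__main__":
    # Hypothetical image name and tap device; adjust to the local setup.
    cfg = firecracker_config("659f598", Path.home() / ".ops/images/mysql", "tap0")
    print(json.dumps(cfg, indent=2))
```

The printed JSON can be saved to a file and passed to Firecracker with `--no-api --config-file`, as in the command lines shown further down.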

Roopsai507 · 2024-03-19T13:38:50Z

I tried the same test with postgres (image francescolavra/postgres:16.0) and the updated kernel image ~/.ops/659f598/kernel.img.

I got the memory error below, and the kernel crashed.

2024-03-19 13:17:57.245 UTC [4361762560] WARNING:  plancache reference leak: plan 0x10c2e8658 not closed
2024-03-19 13:17:57.255 UTC [4351211264] ERROR:  lock reference 0x1042e89b8 is not owned by resource owner (null)
2024-03-19 13:17:57.257 UTC [4351211264] WARNING:  AbortTransaction while in COMMIT state
2024-03-19 13:17:57.258 UTC [4351211264] PANIC:  cannot abort transaction 4883, it was already committed

*** signal 6 received by tid 15, errno 0, code -6

*** Thread context:
lastvector: 00000000000000ea
     frame: ffffc0000261b800
      type: thread
active_cpu: 00000000ffffffff
 stack top: 0000000000000000

francescolavra · 2024-03-23T17:05:21Z

The above log does not show a kernel crash: it's the postgres program that triggered an unhandled signal (signal 6, i.e. SIGABRT, raised after the PANIC) and was therefore terminated by the kernel. Anyway, I tried load-testing postgres with HammerDB but I couldn't reproduce this issue.
I'm using Firecracker v1.6.0, and tried with both kernel build 659f598 and the latest nightly build.
@Roopsai507 could you share the tcl file you are using to build the postgres schema in HammerDB? Mine is as follows:

puts "SETTING CONFIGURATION"
dbset db pg
diset connection pg_host 11.0.2.15
print dict
buildschema
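
To tell an application abort from a memory fault, the `*** signal ...` lines in these logs can be decoded mechanically. The sketch below assumes Nanos reports signals using the Linux x86-64 numbering (it runs Linux binaries). The function name `describe_signal` and the lookup tables are illustrative and cover only the values that appear in this thread.

```python
import re

# Signal numbers as defined by Linux on x86-64 (assumed to apply to Nanos).
SIGNAL_NAMES = {6: "SIGABRT", 11: "SIGSEGV"}

# si_code values from the Linux headers that are relevant to these logs.
SI_CODES = {
    (6, -6): "SI_TKILL (sent via tkill/tgkill, e.g. by abort())",
    (11, 1): "SEGV_MAPERR (address not mapped)",
    (11, 2): "SEGV_ACCERR (invalid permissions for mapped object)",
}

SIGNAL_LINE = re.compile(
    r"\*\*\* signal (?P<signo>\d+) received by tid (?P<tid>\d+), "
    r"errno (?P<errno>-?\d+), code (?P<code>-?\d+)"
)


def describe_signal(line):
    """Return a dict describing a '*** signal ...' log line, or None."""
    m = SIGNAL_LINE.search(line)
    if m is None:
        return None
    signo, code = int(m["signo"]), int(m["code"])
    return {
        "tid": int(m["tid"]),
        "signal": SIGNAL_NAMES.get(signo, f"signal {signo}"),
        "code": SI_CODES.get((signo, code), f"code {code}"),
    }


if __name__ == "__main__":
    print(describe_signal("*** signal 6 received by tid 15, errno 0, code -6"))
```

Under these assumptions, the postgres log line decodes to SIGABRT, which fits a program that called abort() after its PANIC rather than a kernel crash.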

Roopsai507 · 2024-03-23T17:33:51Z

First, run the build schema tcl file below:

> cat buildSchema.tcl
puts "SETTING CONFIGURATION"
dbset db pg
diset connection pg_host 11.0.2.15
diset connection pg_port 9090
diset tpcc pg_count_ware 5
diset tpcc pg_superuser postgres
diset tpcc pg_num_vu 5
diset tpcc pg_user tpcc
diset tpcc pg_pass tpcc
diset tpcc pg_dbase tpcc
print dict
buildschema

After this, run the tcl file below and the issue will be reproduced:

> cat pgrun.tcl          
#!/bin/tclsh
proc runtimer { seconds } {
    set x 0
    set timerstop 0
    while {!$timerstop} {
        incr x
        after 1000
        if { ![ expr {$x % 60} ] } {
            set y [ expr $x / 60 ]
            puts "Timer: $y minutes elapsed"
        }
        update
        if { [ vucomplete ] || $x eq $seconds } { set timerstop 1 }
    }
    return
}

puts "SETTING CONFIGURATION"
dbset db pg
diset connection pg_host 11.0.2.15
diset connection pg_port 9090
diset tpcc pg_superuser postgres
diset tpcc pg_user tpcc
diset tpcc pg_pass tpcc
diset tpcc pg_dbase tpcc
diset tpcc pg_driver timed
diset tpcc pg_duration 2
diset tpcc pg_duration 5
diset tpcc pg_vacuum true
print dict
vuset logtotemp 1
loadscript
puts "SEQUENCE STARTED"
foreach z {1} {
    puts "$z VU test"
    vuset vu $z
    vucreate
    vurun
    runtimer 480
    vudestroy
    after 5000
}
puts "TEST SEQUENCE COMPLETE"

francescolavra · 2024-04-17T16:34:58Z

I found some issues in both the postgres:16.0 package and the Nanos kernel. The postgres issues have been fixed in a new package I uploaded (francescolavra/postgres:16.2), while the kernel issues have been fixed in 95d807a. With these fixes, the HammerDB load test completes successfully.

Roopsai507 · 2024-04-23T08:43:58Z

I tried testing it with the nanos-kernel version 95d807a
ops image create -c config.json --imagename postgres --package francescolavra/postgres_16.2 --nanos-version 95d807a

But I couldn't get it to start. When I tried starting the Firecracker VM with the kernel image 95d807a, I encountered a segmentation error. It seems there's an issue with the kernel image. I've tested with other kernel versions, and PostgreSQL starts fine with those.

2024-04-23T08:37:39.278245165 [9090:main] Running Firecracker v1.6.0
2024-04-23T08:37:39.376741397 [9090:main] Artificially kick devices.
2024-04-23T08:37:39.376852171 [9090:main] Successfully started microvm that was configured from one single json
warning: ACPI MADT not found, default to 1 processor
en1: assigned 11.244.15.161

*** signal 11 received by tid 2, errno 0, code 2
    fault address 0xfffd7db8

*** Thread context:
lastvector: 000000000000000e (Page fault)
     frame: ffffc00002601800
      type: thread
active_cpu: 00000000ffffffff
 stack top: 0000000000000000
error code: 0000000000000007
   address: 00000000fffd7db8

francescolavra · 2024-04-23T13:26:05Z

I'm unable to reproduce the issue here. Does PostgreSQL start fine if you run it with Ops (i.e. with ops pkg load francescolavra/postgres_16.2 -c config.json --imagename postgres --nanos-version 95d807a)? If you get the same error, could you share the contents of the config.json file you are using in the Ops command line?

Roopsai507 · 2024-04-23T14:42:37Z

The command ops pkg load works fine with the given configuration.

However, when I used the same generated image (postgres) and kernel image (95d807a) with Firecracker, I encountered the error mentioned earlier.
If I replace the kernel image with any other version, such as 659f598 or d875bfe, it works without issues (PostgreSQL starts fine).

cat config.json 
{
    "RunConfig": {
        "IPAddress": "11.244.15.161",
        "NetMask": "255.255.255.252",
        "Gateway": "11.244.15.162",
        "BridgeName": "eth0"
    },
    "BaseVolumeSz": "20000m"
}

Command for image creation :
ops image create -c config.json --imagename postgress --package francescolavra/postgres_16.2

Kernel Given to Firecracker: ~/.ops/95d807a/kernel.img

francescolavra · 2024-04-23T14:59:56Z

It still works fine here. My Firecracker config file is as follows:

$ cat ../vm_config.json 
{
  "boot-source": {
    "kernel_image_path": "/home/francesco/.ops/95d807a/kernel.img"
  },
  "drives": [
    {
      "drive_id": "rootfs",
      "path_on_host": "/home/francesco/.ops/images/postgress",
      "is_root_device": true,
      "is_read_only": false
    }
  ],
  "network-interfaces": [
    {
      "iface_id": "eth0",
      "guest_mac": "AA:FC:00:00:00:01",
      "host_dev_name": "tap1"
    }
  ],
  "machine-config": {
    "vcpu_count": 1,
    "mem_size_mib": 2048
  }
}

Then I run Firecracker as below:

$ /opt/firecracker-v1.6.0-x86_64 --no-api --config-file ../vm_config.json
2024-04-23T16:53:06.399909931 [anonymous-instance:main] Running Firecracker v1.6.0
2024-04-23T16:53:06.420688187 [anonymous-instance:main] Artificially kick devices.
2024-04-23T16:53:06.421089364 [anonymous-instance:main] Successfully started microvm that was configured from one single json
warning: ACPI MADT not found, default to 1 processor
en1: assigned 11.244.15.161
2024-04-23 14:53:06.548 UTC [4296224768] LOG:  starting PostgreSQL 16.2 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 8.4.0-1ubuntu1~16.04.1) 8.4.0, 64-bit
2024-04-23 14:53:06.566 UTC [4296224768] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2024-04-23 14:53:06.574 UTC [4296224768] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2024-04-23 14:53:06.590 UTC [4311586560] LOG:  database system was shut down at 2024-03-27 16:26:45 UTC
2024-04-23 14:53:06.604 UTC [4296224768] LOG:  database system is ready to accept connections
en1: assigned FE80::A8FC:FF:FE00:1

What Firecracker command line are you using? And what are the contents of your Firecracker config file?

Roopsai507 · 2024-04-23T15:35:26Z

sudo firecracker-v1.6.0-x86_64 --no-api --config-file /home/roop/fcconfig.json
2024-04-23T20:58:50.543184053 [anonymous-instance:main] Running Firecracker v1.6.0
2024-04-23T20:58:50.624887552 [anonymous-instance:main] Artificially kick devices.
2024-04-23T20:58:50.625226239 [anonymous-instance:main] Successfully started microvm that was configured from one single json
warning: ACPI MADT not found, default to 1 processor
en1: assigned 11.244.15.161

*** signal 11 received by tid 2, errno 0, code 2
    fault address 0xffde5db8

*** Thread context:
lastvector: 000000000000000e (Page fault)
     frame: ffffc00002601800
      type: thread
active_cpu: 00000000ffffffff
 stack top: 0000000000000000
error code: 0000000000000007
   address: 00000000ffde5db8

   rax: 0000000000000000
   rbx: 0000000000000000
   rcx: 0000155a75c9cc30

FC Command : sudo firecracker-v1.6.0-x86_64 --no-api --config-file /home/roop/fcconfig.json

ConfigFile :

cat fcconfig.json 
{
  "boot-source": {
    "kernel_image_path": "/home/roop/kernel_95d807a.img"
  },
  "machine-config": {
    "vcpu_count": 4,
    "mem_size_mib": 13404
  },
  "network-interfaces": [
    {
      "iface_id": "bond0",
      "host_dev_name": "fc-1000-tap0"
    }
  ],
  "drives": [
    {
      "drive_id": "rootfs",
      "path_on_host": "/home/roop/images/postgress",
      "is_root_device": true,
      "is_read_only": false
    }
  ]
}

I think I found the issue.
I changed the machine-config from

{
    "vcpu_count": 4,
    "mem_size_mib": 13404
}

to the values from your config, and it now works fine:

{
    "vcpu_count": 1,
    "mem_size_mib": 2048
}

May I know why the original settings cause the issue?

Depending on the address of the initial pages allocated in setup_initmap(), the map_setup_2mbpages() function may not find an existing PDPT and/or PDT in the page tables, and thus may need to use a new PDPT and/or a new PDT. However, the PDPT and PDT addresses passed to this function correspond to in-use pages, and as such cannot be reused and assigned to new PTEs. Moreover, when creating a new PTE, the map_setup_2mbpages() function is simply OR-ing the page address with the page flags, failing to set the USER flag and thereby preventing any pages referenced by this directory entry from being mapped for user space access; this may cause bogus segmentation fault signals to be delivered to the user process (#1999 (comment)). This change fixes the above issues by amending map_setup_2mbpages() so that it takes a physical memory region from which to allocate any new pages it may need, and calls new_level_pte() when creating a new PTE.
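
The `error code: 0000000000000007` in the thread context dumps above is consistent with this explanation. The sketch below decodes the x86 page-fault error code and models the rule that a user-mode access needs the USER bit at every level of the page-table walk. The bit positions are the architectural ones. The function names and the example addresses are hypothetical, and the code is not taken from the Nanos sources.

```python
# Page-fault error code bits pushed by x86 CPUs.
PF_BITS = {
    "present": 1 << 0,            # fault on a present page (protection violation)
    "write": 1 << 1,              # the faulting access was a write
    "user": 1 << 2,               # the access came from user mode
    "reserved": 1 << 3,           # a reserved bit was set in a paging entry
    "instruction_fetch": 1 << 4,  # the fault was caused by an instruction fetch
}

# Flag bits shared by x86-64 paging-structure entries at every level.
PAGE_PRESENT = 1 << 0
PAGE_WRITABLE = 1 << 1
PAGE_USER = 1 << 2


def decode_page_fault(error_code):
    """Map each error-code bit name to True/False."""
    return {name: bool(error_code & bit) for name, bit in PF_BITS.items()}


def make_entry(phys_addr, flags):
    """Build a paging entry by OR-ing an aligned address with its flags."""
    return phys_addr | flags


def user_can_access(entries):
    """A user-mode access is allowed only if every entry on the walk
    (PML4E, PDPTE, PDE, PTE) is present and has the USER bit set."""
    return all(e & PAGE_PRESENT and e & PAGE_USER for e in entries)


if __name__ == "__main__":
    print(decode_page_fault(0x7))
    ok = make_entry(0x200000, PAGE_PRESENT | PAGE_WRITABLE | PAGE_USER)
    no_user = make_entry(0x400000, PAGE_PRESENT | PAGE_WRITABLE)
    # One directory entry without USER blocks everything mapped below it.
    print(user_can_access([ok, ok, no_user, ok]))
```

Error code 0x7 decodes to a user-mode write to a present page, i.e. a permission fault rather than a missing mapping, which is what a directory entry created without the USER flag would produce.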

francescolavra · 2024-04-24T11:12:23Z

OK, thanks, you uncovered another bug in the kernel. This is fixed in #2019, and you can retrieve the fixed kernel build with --nanos-version d9743ea.

francescolavra self-assigned this Feb 13, 2024

francescolavra mentioned this issue Feb 16, 2024

Memory leak fixes #2001

Merged

francescolavra mentioned this issue Feb 17, 2024

virtIO fixes #2002

Merged

francescolavra closed this as completed in 53944c7 Feb 24, 2024

eyberg reopened this Mar 19, 2024

francescolavra mentioned this issue Apr 24, 2024

x86: setup_initmap(): fix mapping of initial pages #2019

Merged
