
Dynamic memory control #211

Open
slax81 opened this issue Sep 15, 2022 · 15 comments

@slax81

slax81 commented Sep 15, 2022

Hello, when I create a new VM with 16GB of RAM, I get a minimum of 2GB of RAM, as in the image below. I would like both the min and max RAM to be set to 16GB. Is it possible to do that in Terraform code?

[screenshot: VM memory settings showing a 2GB dynamic minimum]

@slax81
Author

slax81 commented Sep 21, 2022

Of course, the VM in the image above was created using Terraform and terraform-provider-xenorchestra. But in the VM resource we can only set memory_max, not a memory_min limit. At least I haven't found a way. Please help me with this if possible.

@michal-rybinski

michal-rybinski commented Dec 2, 2022

On top of the above, it seems that either the newest XO API or the provider is broken, because setting memory_max does not necessarily set that value on the VM. I've run multiple tests setting this parameter between 0.5GB and 16GB of RAM for a VM. VMs with less than 4GB of RAM assigned through this parameter always crashed during OS installation (the bigger the gap between the allocated memory and 4GB, the sooner the crash), while VMs with more RAM never crashed and always completed the installation.

It seems to be related to the fact that memory_max also sets the dynamic memory parameters. Setting it lower than the 4GB limit (which somehow gets applied automatically on the static max side in that case) seems to crash the VM as it tries to address that memory and is most probably killed by the hypervisor due to a segfault.

Could this please be investigated?

[six screenshots attached, one per memory_max value tested: 2GB, 4GB, 16GB, 0.5GB, 8GB, 1GB]

@ddelnano
Collaborator

ddelnano commented Dec 6, 2022

There isn't a memory_min parameter for the VM resource at the moment, and as you both have seen, there is no control over the dynamic values. I definitely agree that the status quo has issues and that there should be flexibility for more advanced use cases.

@michal-rybinski can you please share some Terraform code for the example(s) you've shown above? I'm not sure I understand which inputs produce the outputs you shared.

I was hoping to give users a "memory" setting that doesn't require specifying dynamic and static values for the simple case. We should definitely extend the provider to allow configuring all the settings, but if possible I think it would be great if the simple case worked properly.
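
For discussion only, here is a sketch of the shape such an extension could take. The memory_min and memory_static_max attributes below are hypothetical and do not exist in the provider today; they only illustrate the simple vs. advanced cases:

resource "xenorchestra_vm" "example" {
  # Simple case: a single value the provider would map onto all four XAPI limits
  memory_max = 4294967296 # 4 GiB

  # Hypothetical advanced case (attributes not implemented yet):
  # memory_min        = 2147483648 # would map to dynamic_min
  # memory_static_max = 4294967296 # would map to static_max

  # ... remaining attributes (cpus, name_label, template, ...) as usual
}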

@michal-rybinski

michal-rybinski commented Dec 6, 2022

Sure, the VM code I use is below:

resource "xenorchestra_vm" "test-vm" {
  #  memory_max        = 17179869184
  #  memory_max        = 8589934592
  memory_max = 4294967296
  #  memory_max        = 2147483648
  #  memory_max        = 1073741824
  #  memory_max        = 536870912

  cpus              = 2
  videoram          = 8
  hvm_boot_firmware = "bios"
  cloud_config      = xenorchestra_cloud_config.test.template
  name_label        = "Test Rocky 8 VM"
  name_description  = "VM managed by Terraform"
  template          = data.xenorchestra_template.template.id
  auto_poweron      = true

  cdrom {
    id = data.xenorchestra_vdi.rocky8.id
  }

  # Prefer to run the VM on the primary pool instance
  affinity_host = data.xenorchestra_pool.pool.master

  network {
    network_id = data.xenorchestra_network.net_mgmt.id
  }
  network {
    network_id = data.xenorchestra_network.net_data.id
  }

  disk {
    sr_id      = data.xenorchestra_sr.local_storage.id
    name_label = "Test Rocky 8"
    size       = 21474836480
  }

  tags = [
    "Rocky",
    "8",
    "Test",
  ]

  // Override the default create timeout from 5 mins to 10
  timeouts {
    create = "10m"
  }
}

The outputs are created by just uncommenting/commenting the memory_max lines.
The files were named accordingly, but it looks like GitHub renames them on attachment, so the names got lost.
The first screengrab is with 2GB, the second 4GB, the third 16GB, the fourth 0.5GB, the fifth 8GB and the sixth 1GB.

The problem occurs when the amount is lower than 4GB: as you can see, static max is then set to 4GB regardless, and that crashes the VM.

Probably the easiest solution would be to set the provided value as all 4 values (though I am not sure what other dependencies need to be met here), or at least to make sure no value is greater than the one provided.

@slax81
Author

slax81 commented Dec 6, 2022

I've solved this by using xo-cli after the VM is created to modify memory_min.
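
For anyone wanting to automate that step from Terraform itself, below is a rough sketch of the same workaround using a null_resource with a local-exec provisioner that shells out to xo-cli. It assumes xo-cli is installed and registered against the XO server, that the VM is the test-vm resource from the config earlier in this thread, and that the xo-server vm.set method accepts a memoryMin parameter in bytes (verify with xo-cli --list-commands vm.set):

resource "null_resource" "set_memory_min" {
  # Re-run whenever the VM is recreated
  triggers = {
    vm_id = xenorchestra_vm.test-vm.id
  }

  provisioner "local-exec" {
    # Assumption: vm.set accepts memoryMin (bytes); adjust to your xo-server version
    command = "xo-cli vm.set id=${xenorchestra_vm.test-vm.id} memoryMin=${xenorchestra_vm.test-vm.memory_max}"
  }
}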

@michal-rybinski

Well, I can change the static max value from Xen Orchestra, and that's what I did, but I'd prefer not to have to do it every time I want to create a VM with less than 4GB of memory.

@4censord
Contributor

4censord commented Dec 7, 2022

seems to crash the VM as it tries to address it and most probably is killed by hypervisor due to seg fault.

but I'd prefer to not have to do it every time I want to create a VM with less than 4GB of memory

I would assume there must be some other factor at play here, in addition to the memory setting, as I regularly create VMs with less than 4GB of RAM and have never had that issue.

Do you maybe have some host or Xen Orchestra logs that could help us understand the problem better?
How is your pool set up, and how big is it?

For reference, this VM was created by the following Terraform config:

resource "xenorchestra_vm" "vm" {
  name_label       = "${local.projekt} ${local.ip_address}"
  template         = data.xenorchestra_template.template.id

  cpus       = 1
  memory_max = 2 * local.gib

  cloud_config = xenorchestra_cloud_config.user_data.template
  cloud_network_config = xenorchestra_cloud_config.network_data.template

  auto_poweron = true

  wait_for_ip = true

  cdrom {
    id = data.xenorchestra_vdi.iso_name.id
  }

  network {
    network_id  = data.xenorchestra_network.network.id
  }

  disk {
    sr_id            = data.xenorchestra_sr.storage.id
    name_label       = "${local.projekt}: grav state"
    size             = 53691285504
  }
  disk {
    sr_id            = data.xenorchestra_sr.storage.id
    name_label       = "${local.projekt}: caddy state"
    size             = 1075838976
  }

  tags = [
    "state",
  ]
  lifecycle {
    replace_triggered_by = [
      xenorchestra_cloud_config.user_data,
      xenorchestra_cloud_config.network_data,
    ]
  }
}

with the template used being Debian 11 (Bullseye)

@michal-rybinski

I was able to get this log from the XCP-ng server:

Dec  7 21:07:32 xcp-ng-r620-2 xapi: [debug||7623 :::80||api_effect] VM.add_to_HVM_boot_params
Dec  7 21:07:32 xcp-ng-r620-2 xapi: [error||7623 HTTPS 192.168.1.77->:::80|VM.add_to_HVM_boot_params D:7a6bd38bb489|sql] Duplicate key in set or map: table VM; field HVM__boot_params; ref OpaqueRef:0edb8dd7-c91f-4b4c-a0a6-cbb46a5da733; key firmware
Dec  7 21:07:32 xcp-ng-r620-2 xapi: [error||7623 :::80||backtrace] VM.add_to_HVM_boot_params D:7a6bd38bb489 failed with exception Db_exn.Duplicate_key("VM", "HVM__boot_params", "OpaqueRef:0edb8dd7-c91f-4b4c-a0a6-cbb46a5da733", "firmware")
Dec  7 21:07:32 xcp-ng-r620-2 xapi: [debug||7620 :::80||api_effect] VM.remove_from_VCPUs_params
Dec  7 21:07:32 xcp-ng-r620-2 xapi: [error||7623 :::80||backtrace] Raised Db_exn.Duplicate_key("VM", "HVM__boot_params", "OpaqueRef:0edb8dd7-c91f-4b4c-a0a6-cbb46a5da733", "firmware")
Dec  7 21:07:32 xcp-ng-r620-2 xapi: [debug||7622 :::80||api_effect] VM.remove_from_platform
Dec  7 21:07:32 xcp-ng-r620-2 xapi: [debug||7621 :::80||api_effect] VM.remove_from_VCPUs_params
Dec  7 21:07:32 xcp-ng-r620-2 xapi: [debug||7618 HTTPS 192.168.1.77->:::80|VM.set_VCPUs_max D:de230b766ee4|audit] VM.set_VCPUs_max: self = dfd8dd53-0f6c-da54-bf5d-04fc5b8271d9 (Test Rocky 8 VM); value = 2
Dec  7 21:07:32 xcp-ng-r620-2 xapi: [error||7623 :::80||backtrace] 1/8 xapi Raised at file ocaml/database/db_cache_impl.ml, line 316
Dec  7 21:07:32 xcp-ng-r620-2 xapi: [error||7623 :::80||backtrace] 2/8 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
Dec  7 21:07:32 xcp-ng-r620-2 xapi: [error||7623 :::80||backtrace] 3/8 xapi Called from file ocaml/xapi/rbac.ml, line 233
Dec  7 21:07:32 xcp-ng-r620-2 xapi: [error||7623 :::80||backtrace] 4/8 xapi Called from file ocaml/xapi/server_helpers.ml, line 101
Dec  7 21:07:32 xcp-ng-r620-2 xapi: [error||7623 :::80||backtrace] 5/8 xapi Called from file ocaml/xapi/server_helpers.ml, line 122
Dec  7 21:07:32 xcp-ng-r620-2 xapi: [debug||7624 :::80||api_effect] VM.add_to_platform
Dec  7 21:07:32 xcp-ng-r620-2 xapi: [error||7624 HTTPS 192.168.1.77->:::80|VM.add_to_platform D:434ad8ece81e|sql] Duplicate key in set or map: table VM; field platform; ref OpaqueRef:0edb8dd7-c91f-4b4c-a0a6-cbb46a5da733; key device-model
Dec  7 21:07:32 xcp-ng-r620-2 xapi: [error||7623 :::80||backtrace] 6/8 xapi Called from file lib/xapi-stdext-pervasives/pervasiveext.ml, line 24
Dec  7 21:07:32 xcp-ng-r620-2 xapi: [debug||7625 HTTPS 192.168.1.77->:::80|VM.set_memory_limits R:68f521b74729|audit] VM.set_memory_limits: self = dfd8dd53-0f6c-da54-bf5d-04fc5b8271d9 (Test Rocky 8 VM); static_min = 2147483648; static_max = 4294967296; dynamic_min = 2147483648; dynamic_max = 2147483648
Dec  7 21:07:32 xcp-ng-r620-2 xapi: [debug||7619 :::80||api_effect] VM.set_affinity
Dec  7 21:07:32 xcp-ng-r620-2 xapi: [error||7623 :::80||backtrace] 7/8 xapi Called from file map.ml, line 135
Dec  7 21:07:32 xcp-ng-r620-2 xapi: [debug||7627 :::80||api_effect] VM.set_tags

where the following line seems to be the one doing the job:

Dec 7 21:07:32 xcp-ng-r620-2 xapi: [debug||7625 HTTPS 192.168.1.77->:::80|VM.set_memory_limits R:68f521b74729|audit] VM.set_memory_limits: self = dfd8dd53-0f6c-da54-bf5d-04fc5b8271d9 (Test Rocky 8 VM); static_min = 2147483648; static_max = 4294967296; dynamic_min = 2147483648; dynamic_max = 2147483648

I run XCP-ng 8.2.1 on an R620 with 32 cores and 96GB of RAM, connecting to it with Xen Orchestra (versions as visible in the attached screenshot).
Terraform v1.3.6
xenorchestra provider v0.23.3
[screenshot: Xen Orchestra version details]

@michal-rybinski

Forgot to add, although I'm not sure if it matters: I'm running Terraform from an Intel-based Mac with macOS 13.0.1.

@michal-rybinski

michal-rybinski commented Dec 7, 2022

Out of curiosity I ran it from an Ubuntu 22.10 laptop and got exactly the same result, so this should not matter.

@michal-rybinski

michal-rybinski commented Dec 7, 2022

@4censord

Tried your config, got the same result regardless

@4censord
Contributor

4censord commented Dec 8, 2022

Okay, I will try to replicate tomorrow.
For now, here are my versions/setup.

Xen Orchestra (appliance on the stable channel, not updated for a few weeks)

- node: 16.14.2
- npm: 8.12.1
- xen-orchestra-upload-ova: 0.1.4
- xo-server: 5.103.1
- xo-server-telemetry: 0.5.0
- xo-server-xoa: 0.15.0
- xo-web-free: 5.104.0
- xoa-cli: 0.31.0
- xoa-updater: 0.43.1

Multiple hosts on

Version: 8.2.1
Build number: release/yangtze/master/58

Shared storage on iSCSI

Terraform:

Terraform v1.3.5
on linux_amd64

@michal-rybinski

michal-rybinski commented Dec 8, 2022

I might have noticed something that is potentially the reason for this behaviour.

It looks like the static limits are set by templates, and I am using one that has the initial limit set to 4GB:
[screenshot: template memory limits showing 4GB]

whereas yours has 1GB set in it:
[screenshot: template memory limits showing 1GB]

It looks like if I provide a value smaller than 4GB, it leaves static max as it was and changes static min along with both dynamic values. If I provide a value bigger than 4GB, it adjusts all the values according to its own algorithm.

In your case, you might notice the behaviour I see if you go for values below 1GB, but then again you might be hitting amounts which, depending on the OS, will simply not be enough for the OS installer to work, and it will crash regardless.

@4censord
Contributor

4censord commented Dec 9, 2022

I have done some testing.
I have attached some terraform files.

I can reproduce the static_max value being set incorrectly to the value of the template when the VM has less memory than its template.

(this is the rocky-1g VM)

mem-issue-repro.zip

I have no idea where to look to fix this, though.

@yerkanian

Do you happen to have any updates on this issue? The RHEL and derivative distribution templates are affected by it, and creating a VM with memory_max < 4GB using Terraform results in the OS installer (booted from the ISO attached to the VM) crashing and rebooting. Fixing the static_max value from the XO UI resolves the issue.
