Arc GPU Not rendering to Display #130

Open
zrosol232 opened this issue Nov 20, 2023 · 8 comments
Labels
compiler/driver/OS issue Issues with the C++ compiler, the OpenCL driver of the device vendor, or the operating system

Comments

@zrosol232

zrosol232 commented Nov 20, 2023

Good morning,
I am trying to run FluidX3D on an Arc A770 GPU for some collegiate racing aerodynamics and heat transfer sims. However, when I attempt to run the program with a custom setup script, I do not get anything on the rendering window output. The same setup script and settings work without issue on my Nvidia laptop.

Arc GPU: A770
Driver Version: 31.0.101.4953 (WHQL Certified)
OS: Windows 11

Nvidia GPU: 3070 Ti Max-Q
Driver Version: 537.58
Setup Script:

void main_setup() { //Pack Dynamics
	//Initialize LBM method
	const uint3 lbm_N = resolution(float3(1.0f, 2.0f, 0.5f), 8000u); // first parameter is the box aspect ratio, second is the VRAM to use in MB
	//start SI unit calling
	const float si_u = 25.0f;
	const float si_length = 2.0f;
	const float si_T = 10.0f;
	const float si_nu = 1.48E-5f;
	const float si_rho = 1.225f;
	const float lbm_length = 0.65f * (float)lbm_N.y;
	const float lbm_u = 0.1f;
	units.set_m_kg_s(lbm_length, lbm_u, 1.0f, si_length, si_u, si_rho);
	print_info("Re = " + to_string(to_uint(units.si_Re(si_length, si_u, si_nu))));
	LBM lbm(lbm_N, units.nu(si_nu));

	//load geometry in it

	float3x3 rotation = float3x3(float3(0, 0, 1), radians(90.0f)) * float3x3(float3(1, 0, 0), radians(270.0f)); // combined rotation: 90° about the z-axis and 270° about the x-axis
	const std::string stlPath = get_exe_path() + "../stl/Full_Pack_21700_Rev2_SMALL.STL";
	Mesh* mesh = read_stl(stlPath, lbm.size(), lbm.center(), rotation, lbm_length);
	mesh->translate(float3(0.0f, 1.0f - mesh->pmin.y + 0.1f * lbm_length, 2.0f - mesh->pmin.z)); // move mesh forward a bit and to simulation box bottom, keep in mind 1 cell thick box boundaries
	//mesh->translate(float3(0.0f, 1.0f - mesh->pmin.y + 0.1f * lbm_length, 1.0f - mesh->pmin.z));
	lbm.voxelize_mesh_on_device(mesh);
	//begin boundary conditions
	const uint Nx = lbm.get_Nx(), Ny = lbm.get_Ny(), Nz = lbm.get_Nz();
	parallel_for(lbm.get_N(), [&](ulong n) {
		uint x = 0u, y = 0u, z = 0u;
		lbm.coordinates(n, x, y, z);
		if (z == 0u) lbm.flags[n] = TYPE_S; // solid floor
		if (lbm.flags[n] != TYPE_S) lbm.u.y[n] = lbm_u; // initialize y-velocity everywhere except in solid cells
		if (x == 0u || x == Nx - 1u || y == 0u || y == Ny - 1u || z == Nz - 1u) lbm.flags[n] = TYPE_E; // all other simulation box boundaries are inflow/outflow
	});

	lbm.graphics.visualization_modes = VIS_FLAG_SURFACE | VIS_Q_CRITERION;
	const uint lbm_T = 10000u; // number of LBM time steps to simulate
	lbm.graphics.set_camera_centered(-40.0f, 20.0f, 78.0f, 1.25f);
	lbm.run(0u); // initialize simulation
	while (lbm.get_t() < lbm_T) { // main simulation loop
		/*
		if (lbm.graphics.next_frame(lbm_T, 25.0f)) { // render enough frames for 25 seconds of 60fps video
			lbm.graphics.set_camera_free(float3(2.5f * (float)Nx, 0.0f * (float)Ny, 0.0f * (float)Nz), 0.0f, 0.0f, 50.0f); // set camera to position 1
			lbm.graphics.write_frame(get_exe_path() + "export/camera_angle_1/"); // export image from camera position 1
			lbm.graphics.set_camera_centered(-40.0f, 20.0f, 78.0f, 1.25f); // set camera to position 2
			lbm.graphics.write_frame(get_exe_path() + "export/camera_angle_2/"); // export image from camera position 2
		}
		*/
		lbm.run(1u); // run 1 LBM time step
	}
	lbm.rho.write_device_to_vtk();
	lbm.u.write_device_to_vtk();
	lbm.flags.write_device_to_vtk();
	//lbm.phi.write_device_to_vtk(); // only for SURFACE extension
	//lbm.T.write_device_to_vtk(); // only for TEMPERATURE extension
	//lbm.write_mesh_to_vtk(mesh); // for exporting triangle meshes
#if defined(GRAPHICS) && !defined(INTERACTIVE_GRAPHICS)
	lbm.graphics.set_camera_centered(-40.0f, 20.0f, 78.0f, 1.25f);
	lbm.run(0u); // initialize simulation
	while (lbm.get_t() <= units.t(si_T)) { // main simulation loop
		if (lbm.graphics.next_frame(units.t(si_T), 10.0f)) lbm.graphics.write_frame();
		lbm.run(1u);
	}
#else // GRAPHICS && !INTERACTIVE_GRAPHICS
	lbm.run();
#endif // GRAPHICS && !INTERACTIVE_GRAPHICS
}
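
For reference, the Reynolds number that the setup script prints can be reproduced by hand. The sketch below uses hypothetical helper names (`reynolds_number`, `lattice_viscosity`) that are not part of FluidX3D; it only applies the definition Re = u·L/ν and the fact that Re must be identical in SI and lattice units.

```cpp
#include <cassert>

// Hypothetical helper: Reynolds number from SI quantities, Re = u * L / nu.
double reynolds_number(double si_u, double si_length, double si_nu) {
	assert(si_nu > 0.0);
	return si_u * si_length / si_nu;
}

// Hypothetical helper: kinematic viscosity in lattice units that yields the
// same Reynolds number for a given lattice velocity and lattice length.
double lattice_viscosity(double lbm_u, double lbm_length, double re) {
	assert(re > 0.0);
	return lbm_u * lbm_length / re;
}
```

With the values from the script (`si_u = 25`, `si_length = 2`, `si_nu = 1.48E-5`), `reynolds_number` gives roughly 3.4 million, which fits the script's use of the SUBGRID option that the defines describe as intended for very large Reynolds numbers.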

Defines:

#pragma once



//#define D2Q9 // choose D2Q9 velocity set for 2D; allocates 53 (FP32) or 35 (FP16) Bytes/cell
//#define D3Q15 // choose D3Q15 velocity set for 3D; allocates 77 (FP32) or 47 (FP16) Bytes/cell
#define D3Q19 // choose D3Q19 velocity set for 3D; allocates 93 (FP32) or 55 (FP16) Bytes/cell; (default)
//#define D3Q27 // choose D3Q27 velocity set for 3D; allocates 125 (FP32) or 71 (FP16) Bytes/cell

#define SRT // choose single-relaxation-time LBM collision operator; (default)
//#define TRT // choose two-relaxation-time LBM collision operator

//#define FP16S // compress LBM DDFs to range-shifted IEEE-754 FP16; number conversion is done in hardware; all arithmetic is still done in FP32
//#define FP16C // compress LBM DDFs to more accurate custom FP16C format; number conversion is emulated in software; all arithmetic is still done in FP32

//#define BENCHMARK // disable all extensions and setups and run benchmark setup instead

//#define VOLUME_FORCE // enables global force per volume in one direction (equivalent to a pressure gradient); specified in the LBM class constructor; the force can be changed on-the-fly between time steps at no performance cost
//#define FORCE_FIELD // enables computing the forces on solid boundaries with lbm.calculate_force_on_boundaries(); and enables setting the force for each lattice point independently (enable VOLUME_FORCE too); allocates an extra 12 Bytes/cell
#define EQUILIBRIUM_BOUNDARIES // enables fixing the velocity/density by marking cells with TYPE_E; can be used for inflow/outflow; does not reflect shock waves
//#define MOVING_BOUNDARIES // enables moving solids: set solid cells to TYPE_S and set their velocity u unequal to zero
//#define SURFACE // enables free surface LBM: mark fluid cells with TYPE_F; at initialization the TYPE_I interface and TYPE_G gas domains will automatically be completed; allocates an extra 12 Bytes/cell
//#define TEMPERATURE // enables temperature extension; set fixed-temperature cells with TYPE_T (similar to EQUILIBRIUM_BOUNDARIES); allocates an extra 32 (FP32) or 18 (FP16) Bytes/cell
#define SUBGRID // enables Smagorinsky-Lilly subgrid turbulence LES model to keep simulations with very large Reynolds number stable
//#define PARTICLES // enables particles with immersed-boundary method (for 2-way coupling also activate VOLUME_FORCE and FORCE_FIELD; only supported in single-GPU)

#define INTERACTIVE_GRAPHICS // enable interactive graphics; start/pause the simulation by pressing P; either Windows or Linux X11 desktop must be available; on Linux: change to "compile on Linux with X11" command in make.sh
//#define INTERACTIVE_GRAPHICS_ASCII // enable interactive graphics in ASCII mode in the console; start/pause the simulation by pressing P
//#define GRAPHICS // run FluidX3D in the console, but still enable graphics functionality for writing rendered frames to the hard drive

#define GRAPHICS_FRAME_WIDTH 2560 // set frame width if only GRAPHICS is enabled
#define GRAPHICS_FRAME_HEIGHT 1600 // set frame height if only GRAPHICS is enabled
#define GRAPHICS_BACKGROUND_COLOR 0x000000 // set background color; black background (default) = 0x000000, white background = 0xFFFFFF
//#define GRAPHICS_TRANSPARENCY 0.7f // optional: comment/uncomment this line to disable/enable semi-transparent rendering (looks better but reduces framerate), number represents transparency (equal to 1-opacity) (default: 0.7f)
#define GRAPHICS_U_MAX 0.25f // maximum velocity for velocity coloring in units of LBM lattice speed of sound (c=1/sqrt(3)) (default: 0.25f)
#define GRAPHICS_Q_CRITERION 0.0001f // Q-criterion value for Q-criterion isosurface visualization (default: 0.0001f)
#define GRAPHICS_F_MAX 0.002f // maximum force in LBM units for visualization of forces on solid boundaries if VOLUME_FORCE is enabled and lbm.calculate_force_on_boundaries(); is called (default: 0.002f)
#define GRAPHICS_STREAMLINE_SPARSE 4 // set how many streamlines there are every x lattice points
#define GRAPHICS_STREAMLINE_LENGTH 128 // set maximum length of streamlines
#define GRAPHICS_RAYTRACING_TRANSMITTANCE 0.25f // transmitted light fraction in raytracing graphics ("0.25f" = 1/4 of light is transmitted and 3/4 is absorbed along longest box side length, "1.0f" = no absorption)
#define GRAPHICS_RAYTRACING_COLOR 0x005F7F // absorption color of fluid in raytracing graphics



// #############################################################################################################

#define TYPE_S 0b00000001 // (stationary or moving) solid boundary
#define TYPE_E 0b00000010 // equilibrium boundary (inflow/outflow)
#define TYPE_T 0b00000100 // temperature boundary
#define TYPE_F 0b00001000 // fluid
#define TYPE_I 0b00010000 // interface
#define TYPE_G 0b00100000 // gas
#define TYPE_X 0b01000000 // reserved type X
#define TYPE_Y 0b10000000 // reserved type Y

#define VIS_FLAG_LATTICE  0b00000001 // lbm.graphics.visualization_modes = VIS_...|VIS_...|VIS_...;
#define VIS_FLAG_SURFACE  0b00000010
#define VIS_FIELD         0b00000100
#define VIS_STREAMLINES   0b00001000
#define VIS_Q_CRITERION   0b00010000
#define VIS_PHI_RASTERIZE 0b00100000
#define VIS_PHI_RAYTRACE  0b01000000
#define VIS_PARTICLES     0b10000000

#if defined(FP16S) || defined(FP16C)
#define fpxx ushort
#else // FP32
#define fpxx float
#endif // FP32

#ifdef BENCHMARK
#undef UPDATE_FIELDS
#undef VOLUME_FORCE
#undef FORCE_FIELD
#undef MOVING_BOUNDARIES
#undef EQUILIBRIUM_BOUNDARIES
#undef SURFACE
#undef TEMPERATURE
#undef SUBGRID
#undef PARTICLES
#undef INTERACTIVE_GRAPHICS
#undef INTERACTIVE_GRAPHICS_ASCII
#undef GRAPHICS
#endif // BENCHMARK

#ifdef SURFACE // (rho, u) need to be updated exactly every LBM step
#define UPDATE_FIELDS // update (rho, u, T) in every LBM step
#endif // SURFACE

#ifdef TEMPERATURE
#define VOLUME_FORCE
#endif // TEMPERATURE

#ifdef PARTICLES // (rho, u) need to be updated exactly every LBM step
#define UPDATE_FIELDS // update (rho, u, T) in every LBM step
#endif // PARTICLES

#if defined(INTERACTIVE_GRAPHICS) || defined(INTERACTIVE_GRAPHICS_ASCII)
#define GRAPHICS
#define UPDATE_FIELDS // to prevent flickering artifacts in interactive graphics
#endif // INTERACTIVE_GRAPHICS || INTERACTIVE_GRAPHICS_ASCII

I am also using a custom STL file and can provide it if you need it.
Thank you in advance for your help.
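
As a side note on the defines above: the `VIS_...` values are single-bit masks, so the script's `VIS_FLAG_SURFACE | VIS_Q_CRITERION` selects two render modes at once. A minimal sketch of that composition follows; the constants are copied from the listing, while `script_modes` and `mode_enabled` are hypothetical helpers for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Bit values copied from the defines listing; constexpr stand-ins for the macros.
constexpr std::uint8_t VIS_FLAG_SURFACE = 0b00000010;
constexpr std::uint8_t VIS_STREAMLINES  = 0b00001000;
constexpr std::uint8_t VIS_Q_CRITERION  = 0b00010000;

// Hypothetical helper: the combination used in the setup script.
constexpr std::uint8_t script_modes() {
	return static_cast<std::uint8_t>(VIS_FLAG_SURFACE | VIS_Q_CRITERION);
}

// Hypothetical helper: true if the given mode bit is set in the combined mask.
constexpr bool mode_enabled(std::uint8_t modes, std::uint8_t flag) {
	return (modes & flag) != 0;
}
```

Here `script_modes()` evaluates to `0b00010010`, so the surface and Q-criterion modes are enabled and streamlines are not.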

@ProjectPhysX
Owner

Hi @zrosol232,

what does the console window (switch to it with Alt+Tab) say in the failed case?

I assume you have the A770 8GB model? Could it be an issue with getting too close to VRAM capacity limit? A laptop discrete GPU usually has the entire VRAM free for applications since the integrated GPU handles Windows overhead, but a dedicated desktop GPU usually has a few hundred MB VRAM occupied by Windows.

Can you test with

const uint3 lbm_N = resolution(float3(1.0f, 2.0f, 0.5f), 6000u);

to see if that still works, and then increase to maybe 7500?

Kind regards,
Moritz
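
To make the VRAM question concrete, here is a rough way to estimate what grid a memory budget allows. This is only a sketch under stated assumptions: `estimate_grid` and `grid_bytes` are hypothetical helpers, not FluidX3D's `resolution()`; the 93 bytes/cell figure is the D3Q19 FP32 value from the defines comments; and 1 MB is assumed to be 1024·1024 bytes. Other allocations are ignored, so the real footprint may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct Grid { std::uint32_t nx, ny, nz; };

// Hypothetical helper: largest grid with aspect ratio (rx, ry, rz) whose
// cells fit into budget_bytes when each cell needs bytes_per_cell bytes.
Grid estimate_grid(double rx, double ry, double rz, std::uint64_t budget_bytes, std::uint32_t bytes_per_cell) {
	assert(rx > 0.0 && ry > 0.0 && rz > 0.0 && bytes_per_cell > 0u);
	const double cells = static_cast<double>(budget_bytes / bytes_per_cell);
	const double scale = std::cbrt(cells / (rx * ry * rz)); // common scale factor for all three axes
	return Grid{
		static_cast<std::uint32_t>(std::floor(scale * rx)),
		static_cast<std::uint32_t>(std::floor(scale * ry)),
		static_cast<std::uint32_t>(std::floor(scale * rz))
	};
}

// Hypothetical helper: memory needed by the grid at the same per-cell cost.
std::uint64_t grid_bytes(const Grid& g, std::uint32_t bytes_per_cell) {
	return static_cast<std::uint64_t>(g.nx) * g.ny * g.nz * bytes_per_cell;
}
```

Calling `estimate_grid(1.0, 2.0, 0.5, budget, 93u)` with budgets for 6000, 7500 and 8000 MB shows how much the grid shrinks per step, which helps when leaving headroom for the VRAM that Windows occupies on a desktop card.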

@zrosol232
Author


I have the Arc A770 16 GB model. The code attempts to load the .stl file, then the program becomes unresponsive along with the rest of the userspace; the screen goes black and then comes back. After that the program does not render, but it still accepts commands from the keyboard.
I've attached the console output from when the program hard-locks; it doesn't throw any error codes.
[image: fluidx3d_error]
The issue persists whether I pass 6000u or 4000u to the resolution function call.

Thanks again!

@zrosol232 zrosol232 reopened this Nov 20, 2023
@ProjectPhysX
Owner

Hi @zrosol232,

this seems to be a problem with the .stl voxelization kernel. Can you attach Full_Pack_21700_Rev2_SMALL.STL so I can reproduce?

Thanks and kind regards,
Moritz

@zrosol232
Author

Full_Pack_21700_Rev2_SMALL.zip
Here is the file. I've also attached the section of code showing how I load the file and orient it.
float3x3 rotation = float3x3(float3(0, 0, 1), radians(90.0f)) * float3x3(float3(1, 0, 0), radians(270.0f)); // combined rotation: 90° about the z-axis and 270° about the x-axis
const std::string stlPath = get_exe_path() + "../stl/Full_Pack_21700_Rev2_SMALL.STL";
Mesh* mesh = read_stl(stlPath, lbm.size(), lbm.center(), rotation, lbm_length);
mesh->translate(float3(0.0f, 1.0f - mesh->pmin.y + 0.1f * lbm_length, 2.0f - mesh->pmin.z)); // move mesh forward a bit and to simulation box bottom, keep in mind 1 cell thick box boundaries
//mesh->translate(float3(0.0f, 1.0f - mesh->pmin.y + 0.1f * lbm_length, 1.0f - mesh->pmin.z));
lbm.voxelize_mesh_on_device(mesh);
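
Since the orientation of the mesh depends on how the two rotations combine, a stand-alone check may help. The sketch below builds axis-angle matrices with Rodrigues' formula and multiplies them in the same order as the snippet (z-axis 90°, x-axis 270°). `Vec3`, `Mat3`, `rotation`, `multiply` and `apply` are hypothetical stand-ins; it assumes the usual right-handed, column-vector convention, and whether FluidX3D's `float3x3` follows that convention is not confirmed in this thread.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<Vec3, 3>; // row-major: m[row][column]

// Rodrigues' formula: rotation by `angle` radians about the unit vector `axis`.
Mat3 rotation(const Vec3& axis, double angle) {
	const double c = std::cos(angle), s = std::sin(angle), t = 1.0 - c;
	const double x = axis[0], y = axis[1], z = axis[2];
	Mat3 m = {{
		{t * x * x + c,     t * x * y - s * z, t * x * z + s * y},
		{t * x * y + s * z, t * y * y + c,     t * y * z - s * x},
		{t * x * z - s * y, t * y * z + s * x, t * z * z + c}
	}};
	return m;
}

// Matrix product a * b.
Mat3 multiply(const Mat3& a, const Mat3& b) {
	Mat3 r{};
	for (int i = 0; i < 3; i++)
		for (int j = 0; j < 3; j++)
			for (int k = 0; k < 3; k++) r[i][j] += a[i][k] * b[k][j];
	return r;
}

// Matrix-vector product m * v (v treated as a column vector).
Vec3 apply(const Mat3& m, const Vec3& v) {
	Vec3 r{};
	for (int i = 0; i < 3; i++)
		for (int k = 0; k < 3; k++) r[i] += m[i][k] * v[k];
	return r;
}
```

Under these assumptions the right-hand factor acts first: in `multiply(Rz, Rx)` a vector is rotated about the x-axis and then about the z-axis, so the STL's +z direction ends up along -x.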

@ProjectPhysX
Owner

ProjectPhysX commented Nov 26, 2023

Hi @zrosol232, this is strange, it works for me with 7GB VRAM occupation.
EDIT: I tried again with the newer 4953 drivers, and it also works.
[image]

@zrosol232
Author

I wonder if the issue is a hardware problem with my Arc GPU. I do get Windows Kernel Error codes when I attempt to run this simulation (specifically a Windows Live Kernel Event with the error codes 141 and 117), so based on your ability to run the simulation, I'm afraid it's something on my end.

@ProjectPhysX
Owner

@zrosol232 driver issues are much more likely. Arc is very picky when any components from old drivers are still present. Try removing all graphics drivers with Display Driver Uninstaller (the Intel drivers, and also Nvidia/AMD drivers if you have used such a card before), then do a fresh install of the Arc drivers.

@ProjectPhysX ProjectPhysX added the compiler/driver/OS issue Issues with the C++ compiler, the OpenCL driver of the device vendor, or the operating system label Nov 27, 2023
@ProjectPhysX
Owner

ProjectPhysX commented Apr 19, 2024

Hi @zrosol232,

I've fixed a bug in the voxelization kernel recently (PR #101), which was an out-of-bounds register access that could potentially cause crashes or freezing. Although the Nvidia and Arc GPU drivers had no issues with the bug in place in my testing, it made the OpenCL CPU runtime fail and could possibly have been the root cause of your issue too.

Could you please check if your issue is still present with current master branch code?

Thanks and kind regards,
Moritz
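
For readers unfamiliar with this bug class, the sketch below shows a generic out-of-bounds write into a small fixed-size buffer and the guard that prevents it. It is not the code from the PR, whose actual change is not shown in this thread; `kMaxHits` and `collect_hits` are hypothetical names chosen only for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kMaxHits = 4; // hypothetical capacity of the fixed-size buffer

// Copies at most kMaxHits candidate values into `hits` and returns how many
// were stored. Without the capacity check, the fifth value would be written
// past the end of the array, which is undefined behavior.
std::size_t collect_hits(const float* candidates, std::size_t count, std::array<float, kMaxHits>& hits) {
	std::size_t stored = 0;
	for (std::size_t i = 0; i < count; i++) {
		if (stored >= kMaxHits) break; // guard against writing out of bounds
		hits[stored++] = candidates[i];
	}
	return stored;
}
```

An unguarded write of this kind can appear harmless on one driver and crash or hang on another, which would be consistent with the behavior described above.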
