Deploy Linux workstations for 3D visualization in Azure

This post has been republished via RSS; it originally appeared at: Microsoft Tech Community - Latest Blogs.

If you need a remote workstation with graphical acceleration for 3D visualization, Azure has several options available, from the original NV series with the NVIDIA Tesla M60 to the fifth generation of the family, the NVads A10 v5 series based on NVIDIA A10 cards. This series is the first to support partitioned NVIDIA GPUs, ranging from 1/6 of a GPU in the smallest size, the Standard_NV6ads_A10_v5, up to 2 full GPUs per virtual machine in the Standard_NV72ads_A10_v5.

 

In addition, this new generation is based on the latest AMD EPYC 74F3V (Milan) processors, with a base frequency of 3.2 GHz and a peak of 4.0 GHz. This hardware configuration makes it one of the best options to cover both the most basic visualization needs and the most demanding ones.

 

If you need to set up a Linux environment with the NVads A10 v5 series, this article guides you step by step. The configuration is based on CentOS 7.9 as the operating system, uses driver version 510.73 due to the requirements imposed by GRID version 14.1, and provides remote access via TurboVNC along with VirtualGL for 3D acceleration.

 

The URN of the exact image used is "OpenLogic:CentOS:7_9-gen2:latest". It is important to keep this in mind, since there are multiple variants available in the Marketplace at this time. This guide is based on the configuration scripts used by the Azure HPC On-Demand Platform, but with updated driver and software versions.
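If you have not yet deployed the virtual machine, a sketch along these lines can create one from the exact image URN above. This assumes the Azure CLI is installed and you have run "az login"; the resource group, VM name, and admin username are placeholders to replace with your own values.

```shell
# Sketch (placeholder names): create the smallest NVads A10 v5 VM from the
# CentOS 7.9 Gen2 image URN used throughout this guide
az vm create \
  --resource-group my-rg \
  --name my-viz-ws \
  --size Standard_NV6ads_A10_v5 \
  --image OpenLogic:CentOS:7_9-gen2:latest \
  --admin-username azureuser \
  --generate-ssh-keys
```

Any other size in the series (up to Standard_NV72ads_A10_v5) can be substituted in the --size parameter.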

 

Preparing the operating system

 

The first step focuses on updating the base image available in Azure and installing the basic dependencies: the Linux kernel headers and Dynamic Kernel Module Support (DKMS). These packages are used by the NVIDIA drivers to build the necessary kernel module and load it without having to modify the kernel. The kernel version used is 3.10.0-1160.76.1.

 

sudo yum update -y
sudo yum install -y kernel-devel
# DKMS is only available in the Fedora EPEL repository.
sudo rpm -Uvh https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
sudo yum install -y dkms
sudo reboot

This restart is important to allow the operating system to apply the changes after the update and to avoid errors later. For example, without it the NVIDIA installer may not find the kernel headers automatically, and you would need to specify them manually.
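After the reboot, a quick check along these lines (illustrative, not from the original guide) confirms that the headers matching the running kernel are in place before launching the NVIDIA installer:

```shell
# Sanity check: the NVIDIA installer looks for headers under /usr/src/kernels
# matching the running kernel version
running="$(uname -r)"
if [ -d "/usr/src/kernels/$running" ]; then
  echo "kernel headers found for $running"
else
  echo "kernel headers missing for $running; reinstall kernel-devel and reboot"
fi
```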

 

Installing NVIDIA GRID drivers

 

Since we are going to use NVIDIA's proprietary drivers, we need to prevent the kernel from loading the open-source Nouveau driver. You can run the following as root, or edit the file directly with your preferred text editor (e.g. nano, vim).

 

sudo su -
cat <<EOF >/etc/modprobe.d/nouveau.conf
blacklist nouveau
blacklist lbm-nouveau
EOF
exit

 

After that, we continue by installing the NVIDIA GRID drivers. It is very important to use the installer provided directly by Microsoft instead of the one available on the NVIDIA website: Microsoft's version already comes with the GRID licensing required in Azure preconfigured.

 

If you use NVIDIA's own drivers, you will have to configure a licensing server and acquire the corresponding licenses, which makes little sense since they are already included in the price of the virtual machine.

 

wget -O NVIDIA-Linux-x86_64-grid.run https://download.microsoft.com/download/6/2/5/625e22a0-34ea-4d03-8738-a639acebc15e/NVIDIA-Linux-x86_64-510.73.08-grid-azure.run
chmod +x NVIDIA-Linux-x86_64-grid.run
sudo ./NVIDIA-Linux-x86_64-grid.run -s

Once the installation succeeds, the NVIDIA GRID settings need to be adjusted. To do this, we will start from the sample file provided by NVIDIA.

 

sudo cp /etc/nvidia/gridd.conf.template /etc/nvidia/gridd.conf

The following changes will need to be made:

  • Remove the FeatureType line, as it is not required in this customized version of the drivers for Azure.
  • Disable the licensing interface in nvidia-settings with EnableUI=FALSE, as licensing is managed automatically in Azure.
  • Add IgnoreSP=FALSE, as reflected in the official Azure documentation.

sudo su -
cat <<EOF >>/etc/nvidia/gridd.conf
IgnoreSP=FALSE
EnableUI=FALSE
EOF
sed -i '/FeatureType=0/d' /etc/nvidia/gridd.conf
reboot

 

After rebooting, the kernel will load the newly installed drivers, and we can check that the card is correctly configured.

nvidia-smi

 

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.73.08    Driver Version: 510.73.08    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A10-4Q       On   | 0000E7AB:00:00.0 Off |                    0 |
| N/A   N/A    P8    N/A /  N/A |      0MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
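For scripting, the same information can also be obtained in a machine-readable form. The following check (illustrative, not part of the original guide) falls back to a message when no NVIDIA driver is present:

```shell
# Machine-readable query of name, driver version, and memory; the guard keeps
# the snippet harmless on a machine without the NVIDIA driver installed
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv
else
  echo "nvidia-smi not found: NVIDIA driver is not installed"
fi
```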

  

Installing VNC Remote Access with TurboVNC and VirtualGL

Linux images in the Azure Marketplace do not ship with a graphical environment by default, so it is necessary to install both the X.org display server and a desktop environment. In this case, we will use Xfce due to its low resource consumption, which makes it ideal for a remote work environment in the cloud.

 

sudo yum groupinstall -y "X Window System"
sudo yum groupinstall -y "Xfce"

 

Once the graphical environment is installed, the next step is to configure VNC access. We will use TurboVNC, a VNC server and client optimized for video and 3D environments. Its integration with VirtualGL gives us a robust, high-performance solution for this type of application on any type of network.

 

sudo yum install -y https://jztkft.dl.sourceforge.net/project/turbovnc/3.0.1/turbovnc-3.0.1.x86_64.rpm
sudo wget --no-check-certificate "https://virtualgl.com/pmwiki/uploads/Downloads/VirtualGL.repo" -O /etc/yum.repos.d/VirtualGL.repo
sudo yum install -y VirtualGL turbojpeg xorg-x11-apps

 

To make sure that permissions are correctly applied when configuring VirtualGL, it is necessary to stop the display manager and unload the NVIDIA kernel modules. Otherwise, the setup wizard will warn you that the changes will not take effect until you do so.

 

sudo service gdm stop
sudo rmmod nvidia_drm nvidia_modeset nvidia
sudo /usr/bin/vglserver_config -config +s +f -t
sudo service gdm start

After that, we configure systemd to boot into graphical mode by default and, to avoid a restart, start it directly in the current session.

 

sudo systemctl set-default graphical.target
sudo systemctl isolate graphical.target

The last step is to indicate which software we want to run when a new connection is established through TurboVNC. In our case, we want a new Xfce desktop session to start working on our workstation.

 

cd $HOME
echo "xfce4-session" > ~/.Xclients
chmod a+x ~/.Xclients
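With the .Xclients file in place, a TurboVNC session can be started on the server so there is something to connect to. This sketch assumes the default install path of the TurboVNC RPM (/opt/TurboVNC/bin) and is guarded so it only prints a message where TurboVNC is not installed:

```shell
# Start a TurboVNC session on display :1 (listens on TCP port 5901);
# the geometry and depth values here are illustrative choices
if command -v /opt/TurboVNC/bin/vncserver >/dev/null 2>&1; then
  /opt/TurboVNC/bin/vncserver :1 -geometry 1920x1080 -depth 24
else
  echo "TurboVNC not installed at /opt/TurboVNC/bin/vncserver"
fi
```

The first run prompts you to set a VNC password, which the TurboVNC client will ask for when connecting.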

All the server-side configuration is now complete. The next step is to install the TurboVNC client on your local machine and connect to the IP address or DNS name associated with your virtual machine deployed in Azure. Make sure that the Network Security Groups applied to the subnet or to the VM network interface are properly configured to grant you access.

 

You should see something similar to the following screenshot. Congratulations, your Linux workstation for 3D visualization is now configured. The next step is to install the applications needed for your scenario and make sure to run them with VirtualGL.

 

[Screenshot: Xfce desktop session displayed through the TurboVNC client]

 

 

Recommended extra configuration

 

PCI Bus Update

If the virtual machine is restarted or redeployed to another host, the PCI bus identifier of the GPU may change. This would cause our graphical environment to stop working because it can no longer find the card. To avoid this situation, it is recommended to configure the following script, which adjusts the PCI BusID setting each time the virtual machine starts.

 

sudo su -
cat <<EOF >/etc/rc.d/rc3.d/busidupdate.sh
#!/bin/bash
BUSID=\$(nvidia-xconfig --query-gpu-info | awk '/PCI BusID/{print \$4}')
nvidia-xconfig --enable-all-gpus --allow-empty-initial-configuration -c /etc/X11/xorg.conf --virtual=1920x1200 --busid \$BUSID -s
sed -i '/BusID/a\ Option "HardDPMS" "false"' /etc/X11/xorg.conf
EOF
chmod +x /etc/rc.d/rc3.d/busidupdate.sh
/etc/rc.d/rc3.d/busidupdate.sh
exit
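To see what the awk filter in busidupdate.sh actually extracts, here is the same expression run against a made-up sample of `nvidia-xconfig --query-gpu-info` output (the PCI address below is invented for illustration):

```shell
# The filter picks the fourth whitespace-separated field of the line
# containing "PCI BusID" -- i.e. the BusID value itself
sample='GPU #0:
  Name      : NVIDIA A10-4Q
  PCI BusID : PCI:0@57003:0:0'
printf '%s\n' "$sample" | awk '/PCI BusID/{print $4}'
# prints: PCI:0@57003:0:0
```

That extracted value is what the script then passes to nvidia-xconfig via --busid when regenerating xorg.conf.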

Create a vglrun alias

3D acceleration can be configured at the graphical-environment level or at the application level. Using Xfce as the desktop environment does not require the first option, so we can dedicate all the GPU resources to our applications.

 

To ensure that applications make use of the acceleration, it is necessary to run them through the vglrun command. To make the process easier and make sure we use all the GPUs available on the node, this script generates an alias with the necessary configuration. To start an application, prepend vglrun to the command and that's all.

 

sudo su -
cat <<EOF >/etc/profile.d/vglrun.sh
#!/bin/bash
ngpu=\$(/usr/sbin/lspci | grep NVIDIA | wc -l)
alias vglrun='/usr/bin/vglrun -d :0.\$(( \${port:-0} % \${ngpu:-1}))'
EOF
exit
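The interesting part of the alias is the modulo arithmetic that spreads sessions across the available GPUs. This standalone demonstration (illustrative only; it pretends the node has two GPUs, and `port` stands in for the session's VNC display number) shows the mapping:

```shell
# With ngpu GPUs, session "port" is sent to X display :0.(port % ngpu),
# round-robining sessions across the GPU-backed screens
ngpu=2
for port in 1 2 3 4; do
  echo "VNC display :$port -> vglrun -d :0.$(( port % ngpu ))"
done
# Sessions 1 and 3 land on :0.1, sessions 2 and 4 on :0.0
```

On a single-GPU size such as the Standard_NV6ads_A10_v5, the modulo is always 0 and everything runs on :0.0.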

Increase the size of network buffers

The default Linux network device configuration may not provide optimal throughput (bandwidth) and latency for these scenarios. It is therefore advisable to increase the size of the read and write buffers at the operating-system level.

 

sudo su -
cat << EOF >>/etc/sysctl.conf
net.core.rmem_max=2097152
net.core.wmem_max=2097152
EOF
exit
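The new limits take effect after a reboot, or immediately if you run `sudo sysctl -p`. A check along these lines (illustrative; the /proc paths assume a standard Linux kernel) confirms the values in force:

```shell
# Read the current socket buffer limits straight from /proc; the guard keeps
# the snippet harmless on kernels where the keys are not exposed
for key in rmem_max wmem_max; do
  if [ -r "/proc/sys/net/core/$key" ]; then
    echo "net.core.$key = $(cat /proc/sys/net/core/$key)"
  else
    echo "net.core.$key not available on this kernel"
  fi
done
```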

 

 
