Optimizing RAM on a Jetson Orin Nano Super for Local LLM Inference


The Jetson Orin Nano Super ships with 8GB of unified LPDDR5 memory shared between CPU and GPU. That sounds generous until you factor in the operating system, background services, and the model weights you actually want to run. A stock JetPack installation quietly consumes 1.5 to 2GB before your workload even starts. Here is how to reclaim most of it with three targeted changes.


Why Unified Memory Changes the Equation

On a conventional PC, system RAM and GPU VRAM are separate pools. On the Jetson, every process (the OS, your application, and the GPU inference engine) draws from the same 8GB. This means a desktop environment sitting idle is not just a cosmetic waste: every megabyte it holds is a megabyte the GPU cannot use for model weights.

The three optimizations below are independent of whatever workload you run. They apply to any Jetson Orin Nano Super used as a dedicated inference node.
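To see what each change actually recovers, it helps to record available memory before and after. The following is a minimal sketch that reads MemAvailable from /proc/meminfo; the helper name mem_available_mb is hypothetical, chosen for this example.

```shell
#!/bin/sh
# Hypothetical helper: print MemAvailable in MiB from a meminfo-style file.
# Defaults to /proc/meminfo when no path is given.
mem_available_mb() {
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# Record a baseline now; re-run after each change and compare the numbers.
if [ -r /proc/meminfo ]; then
    echo "MemAvailable: $(mem_available_mb) MiB"
fi
```

Run it once on the stock install, then again after each of the three steps.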


1. Disable the Desktop GUI (recover ~800MB)

A fresh JetPack installation boots into a full Ubuntu desktop environment (GNOME with all its services). If your Jetson is headless or serves a browser-based interface, that desktop costs roughly 800MB for nothing.

sudo systemctl set-default multi-user.target
sudo systemctl isolate multi-user.target

The first command makes headless mode permanent across reboots. The second applies it immediately without a restart. To revert at any point, replace multi-user.target with graphical.target.
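A quick way to confirm the change stuck is to ask systemd which target it will boot into. This sketch uses systemctl get-default; the function name is_headless_target is a hypothetical helper for this example.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given default target is the headless one.
is_headless_target() {
    [ "$1" = "multi-user.target" ]
}

# Only query systemd where systemctl is available.
if command -v systemctl >/dev/null 2>&1; then
    current=$(systemctl get-default 2>/dev/null || echo unknown)
    if is_headless_target "$current"; then
        echo "Default target is $current: the desktop will not start at boot."
    else
        echo "Default target is $current: the desktop may still start at boot."
    fi
fi
```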


2. Disable nvargus-daemon

JetPack ships with nvargus-daemon, the ISP and camera stack for Jetson's sensor interface. If your project does not use a camera, this service runs in the background consuming resources for no reason.

sudo systemctl disable nvargus-daemon --now

The --now flag stops it immediately in addition to disabling it on future boots. The savings are modest (a few dozen MB) but it is a zero-risk change.
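To check the "few dozen MB" figure on your own board, you can look at the daemon's resident memory before disabling it. This is a rough sketch that assumes the procps version of ps (the default on Ubuntu); sum_rss_mb is a hypothetical helper, and RSS counts shared pages, so treat the result as an upper bound.

```shell
#!/bin/sh
# Hypothetical helper: sum RSS values (KiB, one per line on stdin), print MiB.
sum_rss_mb() {
    awk '{ kb += $1 } END { printf "%d\n", kb / 1024 }'
}

# Resident memory of nvargus-daemon; prints 0 if it is not running.
if command -v ps >/dev/null 2>&1; then
    ps -C nvargus-daemon -o rss= 2>/dev/null | sum_rss_mb
fi
```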


3. Add a Disk-Backed Swapfile (the ZRAM trap)

A stock JetPack already has swap enabled. Running swapon --show reveals something like this:

NAME       TYPE      SIZE  USED PRIO
/dev/zram0 partition 635M    0B  100
/dev/zram1 partition 635M    0B  100
...

Six ZRAM devices, for roughly 3.8GB of swap in total. ZRAM compresses memory pages and keeps them in RAM rather than writing them to disk. It is fast and useful for general OS pressure, but it has a fundamental limitation for LLM workloads: it is still backed by RAM, so the space it frees is limited by how well the pages compress. When Ollama loads a model that barely fits in 8GB and needs to spill memory somewhere, ZRAM offers no real escape hatch.
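The total is simply the six 635M devices added together (6 × 635 = 3810 MiB). As a sketch, the same sum can be read from /proc/swaps, where the third column is the size in KiB; total_swap_mb is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: total configured swap in MiB from a /proc/swaps-style
# file (header line skipped, third column is the size in KiB).
total_swap_mb() {
    awk 'NR > 1 { kb += $3 } END { printf "%d\n", kb / 1024 }' "${1:-/proc/swaps}"
}

if [ -r /proc/swaps ]; then
    echo "Total swap: $(total_swap_mb) MiB"
fi
```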

A disk-backed swapfile on the NVMe is what provides genuine overflow capacity. First, check your current storage layout:

lsblk -o NAME,SIZE,MOUNTPOINT,FSTYPE

The Jetson Orin Nano Super boots from a microSD card by default. The NVMe, if present, will appear as nvme0n1 with no mountpoint unless you explicitly mounted it or migrated your root filesystem to it (as covered in the NVMe migration guide). Depending on your setup, you have two cases.
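If you are unsure which case applies, findmnt can report the device behind /. A minimal sketch, with classify_root as a hypothetical helper that only checks whether the device name starts with /dev/nvme:

```shell
#!/bin/sh
# Hypothetical helper: classify the device backing / by its name.
classify_root() {
    case "$1" in
        /dev/nvme*) echo "nvme-root" ;;
        *)          echo "other-root" ;;
    esac
}

# findmnt -n -o SOURCE / prints just the source device of the root mount.
if command -v findmnt >/dev/null 2>&1; then
    root_src=$(findmnt -n -o SOURCE / 2>/dev/null || echo unknown)
    echo "/ is on $root_src ($(classify_root "$root_src"))"
fi
```

A result of nvme-root corresponds to the first case; anything else means the swapfile belongs on a separately mounted NVMe.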

If the NVMe is your root disk (/ mounted on nvme0n1p1), create the swapfile directly at /var/swap:

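SWAP_PATH=/var/swap

# Reserve 8GB, restrict access to root, then format and enable the swapfile
sudo fallocate -l 8G "$SWAP_PATH"
sudo chmod 600 "$SWAP_PATH"
sudo mkswap "$SWAP_PATH"
sudo swapon "$SWAP_PATH"

# Re-enable the swapfile on every boot
echo "$SWAP_PATH none swap sw 0 0" | sudo tee -a /etc/fstab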

If the NVMe is a separate, unmounted drive, mount it first:

sudo mkdir -p /mnt/nvme
sudo mount /dev/nvme0n1p1 /mnt/nvme
SWAP_PATH=/mnt/nvme/swapfile

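Then run the same commands as above, from fallocate through the /etc/fstab line, keeping this SWAP_PATH instead of /var/swap. This assumes the drive already has a formatted partition at nvme0n1p1. Add the mount itself to /etc/fstab as well so it persists across reboots; otherwise the swapfile will not be found at boot.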
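One caveat with appending to /etc/fstab: running the command twice leaves a duplicate entry. A small guard avoids that. The sketch below uses a hypothetical helper, append_once, that only appends a line when the exact line is not already present; it writes to the file directly, so it would need to run as root for /etc/fstab.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file only if that exact line
# is not already present (grep -x matches whole lines, -F disables regex).
append_once() {
    line=$1
    file=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example, as root (path taken from the separate-drive case above):
#   append_once "/mnt/nvme/swapfile none swap sw 0 0" /etc/fstab
```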

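After this, ZRAM and the NVMe swapfile coexist. The kernel fills swap areas in priority order: the ZRAM devices carry priority 100 in the swapon output above, while a swapfile enabled without an explicit priority gets a lower (negative) default. ZRAM is therefore used first, and the NVMe file takes over under real memory pressure. On a 232GB NVMe, 8GB of swapfile costs essentially nothing.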
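To confirm the ordering on your own system, list the swap areas sorted by priority. This sketch parses /proc/swaps (fifth column is the priority); swaps_by_priority is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print "priority name" for each swap area in a
# /proc/swaps-style file, highest priority first.
swaps_by_priority() {
    awk 'NR > 1 { print $5, $1 }' "${1:-/proc/swaps}" | sort -k1,1nr
}

if [ -r /proc/swaps ]; then
    swaps_by_priority
fi
```

The ZRAM devices should appear above the swapfile. If they do not, swapon accepts an explicit priority with -p, and /etc/fstab accepts a pri= option in place of sw.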
