Optimizing RAM on a Jetson Orin Nano Super for Local LLM Inference
The Jetson Orin Nano Super ships with 8GB of unified LPDDR5 memory shared between CPU and GPU. That sounds generous until you factor in the operating system, background services, and the model weights you actually want to run. A stock JetPack installation quietly consumes 1.5 to 2GB before your workload even starts. Here is how to reclaim most of it with three targeted changes.
Why Unified Memory Changes the Equation
On a conventional PC, system RAM and GPU VRAM are separate pools. On the Jetson, every process (the OS, your application, and the GPU inference engine) draws from the same 8GB. This means a desktop environment sitting idle is not just a cosmetic waste: every megabyte it holds is a megabyte the GPU cannot use for model weights.
The three optimizations below are independent of whatever workload you run. They apply to any Jetson Orin Nano Super used as a dedicated inference node.
1. Disable the Desktop GUI (recover ~800MB)
A fresh JetPack installation boots into a full Ubuntu desktop environment (GNOME with all its services). If your Jetson is headless or serves a browser-based interface, that desktop costs roughly 800MB for nothing.
sudo systemctl set-default multi-user.target
sudo systemctl isolate multi-user.target
The first command makes headless mode permanent across reboots. The second applies it immediately without a restart. To revert at any point, replace multi-user.target with graphical.target.
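To confirm how much memory a change actually frees, it helps to record available memory before and after. The following is a minimal sketch, assuming a Linux kernel that exposes MemAvailable in /proc/meminfo; the helper name mem_available_mib is hypothetical and not part of JetPack.

```shell
# Print MemAvailable in MiB from a meminfo-style file.
# Defaults to /proc/meminfo, whose values are reported in KiB.
mem_available_mib() {
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}

# Run this once before and once after switching targets, then compare.
echo "MemAvailable: $(mem_available_mib) MiB"
```

The difference between the two readings is the figure to trust for your own installation; the ~800MB quoted above will vary with the JetPack release and the services enabled.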
2. Disable nvargus-daemon
JetPack ships with nvargus-daemon, the ISP and camera stack for Jetson's sensor interface. If your project does not use a camera, this service runs in the background consuming resources for no reason.
sudo systemctl disable nvargus-daemon --now
The --now flag stops it immediately in addition to disabling it on future boots. The savings are modest (a few dozen MB) but it is a zero-risk change.
3. Add a Disk-Backed Swapfile (the ZRAM trap)
A stock JetPack already has swap enabled. Running swapon --show reveals something like this:
NAME TYPE SIZE USED PRIO
/dev/zram0 partition 635M 0B 100
/dev/zram1 partition 635M 0B 100
...
Six ZRAM devices for roughly 3.8GB of swap total. ZRAM compresses memory pages in-place rather than writing them to disk. It is fast and useful for general OS pressure, but it has a fundamental limitation for LLM workloads: it is still backed by RAM. When Ollama loads a model that barely fits in 8GB and needs to spill memory somewhere, ZRAM offers no real escape hatch.
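One way to see how much of the configured swap is really extra capacity is to split it by backing store. This is a rough sketch that reads /proc/swaps (sizes in KiB) and treats any /dev/zram* device as RAM-backed and everything else as disk-backed; the function name swap_backing_mib is an illustrative assumption.

```shell
# Summarize swap capacity by backing store from a /proc/swaps-style file.
# Columns are: Filename Type Size(KiB) Used(KiB) Priority.
swap_backing_mib() {
    awk 'NR > 1 {
             if ($1 ~ /^\/dev\/zram/) ram += $3; else disk += $3
         }
         END {
             printf "ram-backed %d MiB\n", ram / 1024
             printf "disk-backed %d MiB\n", disk / 1024
         }' "${1:-/proc/swaps}"
}

swap_backing_mib
```

On a stock installation the disk-backed line should read 0 MiB, which is the gap the swapfile is meant to fill.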
A disk-backed swapfile on the NVMe is what provides genuine overflow capacity. First, check your current storage layout:
lsblk -o NAME,SIZE,MOUNTPOINT,FSTYPE
The Jetson Orin Nano Super boots from a microSD card by default. The NVMe, if present, will appear as nvme0n1 with no mountpoint unless you explicitly mounted it or migrated your root filesystem to it (as covered in the NVMe migration guide). Depending on your setup, you have two cases.
If the NVMe is your root disk (/ mounted on nvme0n1p1), create the swapfile directly at /var/swap:
SWAP_PATH=/var/swap
sudo fallocate -l 8G $SWAP_PATH
sudo chmod 600 $SWAP_PATH
sudo mkswap $SWAP_PATH
sudo swapon $SWAP_PATH
echo "$SWAP_PATH none swap sw 0 0" | sudo tee -a /etc/fstab
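Because tee -a appends unconditionally, re-running the block adds a duplicate line to /etc/fstab. A small guard can avoid that. The sketch below assumes a helper named fstab_has_entry (hypothetical) that checks whether a non-comment entry for a given path already exists.

```shell
# Succeed if FILE ($1) already has a non-comment entry whose first
# field equals TARGET ($2). Commented-out lines do not match because
# their first field starts with "#".
fstab_has_entry() {
    awk -v target="$2" '$1 == target { found = 1 } END { exit !found }' "$1"
}

# Example use on the Jetson (requires root, so it is left commented out):
#   fstab_has_entry /etc/fstab "$SWAP_PATH" ||
#       echo "$SWAP_PATH none swap sw 0 0" | sudo tee -a /etc/fstab
```

With the guard in place, the append only happens the first time the path is seen.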
If the NVMe is a separate, unmounted drive, mount it first:
sudo mkdir -p /mnt/nvme
sudo mount /dev/nvme0n1p1 /mnt/nvme
SWAP_PATH=/mnt/nvme/swapfile
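Then run the remaining commands from the block above, from fallocate through the /etc/fstab line, skipping the SWAP_PATH=/var/swap assignment so that the path set here is used. Add the mount itself to /etc/fstab as well so it persists across reboots; otherwise the swap entry will point at a path that does not exist at boot.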
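For the mount entry, referring to the partition by UUID is generally more robust than using the device name. The following is a sketch under stated assumptions: the partition is formatted as ext4 (check the FSTYPE column from lsblk), the nofail option is wanted so a missing drive does not block boot, and nvme_fstab_line is a hypothetical helper name.

```shell
# Build an /etc/fstab line for a data partition.
# Assumptions (not from the article): the filesystem is ext4, and
# "nofail" is wanted so that a missing drive does not block boot.
nvme_fstab_line() {
    uuid=$1
    mountpoint=$2
    printf 'UUID=%s %s ext4 defaults,nofail 0 2\n' "$uuid" "$mountpoint"
}

# Example use on the Jetson (requires root, so it is left commented out):
#   uuid=$(sudo blkid -s UUID -o value /dev/nvme0n1p1)
#   nvme_fstab_line "$uuid" /mnt/nvme | sudo tee -a /etc/fstab
```

If the partition uses a different filesystem, adjust the type field accordingly before appending the line.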
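After this, ZRAM and the NVMe swapfile coexist. The ZRAM devices carry an explicit priority (100 in the output above), while a swapfile enabled without a pri= option gets a lower, negative priority, so the kernel fills ZRAM first and falls back to the NVMe file only under real memory pressure. On a 232GB NVMe, 8GB of swapfile costs essentially nothing.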
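To check the order in which the kernel will use the swap areas, the priorities can be read from /proc/swaps and sorted. This is a minimal sketch; swap_by_priority is an illustrative name, and it assumes the usual five-column /proc/swaps layout.

```shell
# List swap areas as "priority name", highest priority first.
# The kernel uses higher-priority areas before lower-priority ones.
swap_by_priority() {
    awk 'NR > 1 { print $5, $1 }' "${1:-/proc/swaps}" | sort -k1,1nr
}

swap_by_priority
```

The ZRAM devices should appear at the top and the NVMe swapfile at the bottom with a negative priority.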