When you upload a photo to most online background removal services, your image travels to a remote server, gets processed by a GPU cluster somewhere in the cloud, and a result gets sent back. In those few seconds, your photo has left your device, touched at least one server, and possibly been stored temporarily (or permanently).
This tool works differently. When you drop an image in, the AI runs directly inside your browser tab. The image never leaves your device. Here's how that's actually possible — explained without the jargon.
The AI model: U²-Net
The brain of this tool is a neural network called U²-Net (specifically the lighter u2netp variant). It was designed by researchers at the University of Alberta for salient object detection. In plain English: "find the main thing in this image and separate it from the background."
U²-Net outputs a saliency map: a grayscale image the same size as the input where bright pixels = foreground (keep) and dark pixels = background (remove). This map is the "mask" that gets applied to your photo.
The model is about 4.7 MB as an ONNX file. That's small enough to download in a few seconds and run on consumer hardware.
ONNX Runtime: running the model in a browser
Neural networks are normally run with frameworks like PyTorch or TensorFlow, which usually run in Python on a server or workstation. To run a model in a browser instead, we use a format called ONNX (Open Neural Network Exchange) and a runtime library called ONNX Runtime Web.
ONNX is a standard file format that represents neural networks in a way that any compatible runtime can execute. ONNX Runtime Web is a JavaScript library that loads ONNX models and runs them. It's maintained by Microsoft and widely used in production.
ONNX Runtime Web has two execution backends:
- WebAssembly (WASM): works in every modern browser. WebAssembly is a low-level binary instruction format that runs in a sandbox inside your browser; think of it as a safe, fast virtual machine. It delivers near-native performance for compute-heavy tasks.
- WebGPU: uses your GPU directly via the browser's WebGPU API (similar to Vulkan/Metal/D3D12 but web-safe). On compatible hardware this can be 5–10x faster than WASM for neural network inference. Recent versions of Chrome enable WebGPU by default on supported devices.
When you first process an image, the tool detects whether WebGPU is available. If it is, inference runs on your GPU. If not, it falls back to multi-threaded WASM on your CPU. You'll see which backend is being used in the result view.
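The detection step can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: pickBackend, NavigatorLike and GpuLike are hypothetical names, and the small interfaces only mirror the part of the WebGPU API (navigator.gpu.requestAdapter()) that the check needs.

```typescript
// Minimal structural types so the sketch does not depend on WebGPU typings.
// They mirror navigator.gpu.requestAdapter(); the names are hypothetical.
interface GpuLike {
  requestAdapter(): Promise<unknown | null>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

type Backend = "webgpu" | "wasm";

// Returns "webgpu" only when the API exists AND an adapter is actually
// granted; every other outcome falls back to "wasm".
async function pickBackend(nav: NavigatorLike): Promise<Backend> {
  if (!nav.gpu) return "wasm"; // browser has no WebGPU API at all
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm"; // null means no usable GPU
  } catch {
    return "wasm"; // treat any probing failure as "no GPU"
  }
}
```

In a browser this would be called with the global navigator object. Checking for an adapter, and not just for the presence of navigator.gpu, matters because a browser can expose the API on a device that has no usable GPU.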
What happens step by step
Here's the full pipeline for a single image:
1. Image loading
Your image file is read using the createImageBitmap() browser API. No network request is made. The pixel data stays in memory on your device.
2. Downscaling for inference
U²-Net was trained on 320×320 images. This tool runs inference at up to 1024px on the longest edge; anything larger is downscaled for the inference pass only, and the original is kept for compositing. This avoids exhausting device memory while preserving enough detail for accurate segmentation.
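The size calculation for this step might look like the sketch below. The function name fitWithin and the Dimensions type are hypothetical; only the 1024px limit comes from the description above.

```typescript
interface Dimensions {
  width: number;
  height: number;
}

// Scales a size down so its longest edge is at most maxEdge, keeping the
// aspect ratio. Images that already fit are returned unchanged.
function fitWithin(src: Dimensions, maxEdge = 1024): Dimensions {
  const longest = Math.max(src.width, src.height);
  if (longest <= maxEdge) return { ...src };
  const scale = maxEdge / longest;
  return {
    // Math.max(1, ...) guards against a zero-sized edge on extreme ratios.
    width: Math.max(1, Math.round(src.width * scale)),
    height: Math.max(1, Math.round(src.height * scale)),
  };
}
```

For example, a 4000×3000 photo would be processed at 1024×768, while an 800×600 image would be passed through at its original size.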
3. Normalization
The pixel values are normalized from [0–255] to the float range the model expects, matching the statistics of the training data. This runs in a dedicated Web Worker (background thread) so the UI stays responsive.
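As a rough sketch of what normalization involves: canvas pixel data arrives as interleaved RGBA bytes, while models of this kind usually expect planar float channels (all reds, then all greens, then all blues). The mean and standard deviation values below are the common ImageNet statistics and are an assumption here, as is the function name; the exact constants this tool uses are not stated above.

```typescript
// ASSUMPTION: ImageNet channel statistics, commonly used with U²-Net.
const MEAN = [0.485, 0.456, 0.406];
const STD = [0.229, 0.224, 0.225];

// Converts interleaved RGBA bytes (as found in ImageData.data) into a planar
// float array laid out as [R plane, G plane, B plane]. Alpha is dropped.
function rgbaToPlanarFloat(
  rgba: Uint8ClampedArray,
  width: number,
  height: number,
): Float32Array {
  const plane = width * height;
  if (rgba.length !== plane * 4) throw new Error("unexpected pixel buffer size");
  const out = new Float32Array(3 * plane);
  for (let i = 0; i < plane; i++) {
    for (let c = 0; c < 3; c++) {
      const v = rgba[i * 4 + c] / 255; // scale 0..255 to 0..1
      out[c * plane + i] = (v - MEAN[c]) / STD[c]; // standardize per channel
    }
  }
  return out;
}
```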
4. Neural network inference
The normalized image tensor is passed through U²-Net's 44 layers. This is where WebGPU shines — the convolutions and matrix multiplications that make up the network are executed in parallel on your GPU's shader cores. On WASM, they run across your CPU cores using shared memory threads.
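The inference call itself is short. The sketch below is written against two small interfaces that only loosely mirror ONNX Runtime Web's session and tensor objects (inputNames, outputNames, run), so it can be read and tested without the library; SessionLike, TensorLike, TensorFactory and predictMask are hypothetical names. Taking the first output as the mask and rescaling it to the 0..1 range are assumptions, not confirmed details of this tool.

```typescript
// Hypothetical minimal shapes, loosely modelled on ONNX Runtime Web objects.
interface TensorLike {
  data: ArrayLike<number>;
  dims: readonly number[];
}
interface SessionLike {
  inputNames: readonly string[];
  outputNames: readonly string[];
  run(feeds: Record<string, TensorLike>): Promise<Record<string, TensorLike>>;
}
type TensorFactory = (data: Float32Array, dims: number[]) => TensorLike;

// Runs one forward pass and returns a mask rescaled to the 0..1 range.
async function predictMask(
  session: SessionLike,
  makeTensor: TensorFactory,
  input: Float32Array, // planar RGB floats, length 3 * width * height
  width: number,
  height: number,
): Promise<Float32Array> {
  if (input.length !== 3 * width * height) {
    throw new Error("input must hold 3 * width * height values");
  }
  // NCHW layout: batch of 1, 3 channels, then height and width.
  const feeds = { [session.inputNames[0]]: makeTensor(input, [1, 3, height, width]) };
  const outputs = await session.run(feeds);
  // ASSUMPTION: the first output is the fused saliency map.
  const raw = outputs[session.outputNames[0]].data;

  // Min-max rescale so the brightest value is 1 and the darkest is 0.
  let min = Infinity;
  let max = -Infinity;
  for (let i = 0; i < raw.length; i++) {
    min = Math.min(min, raw[i]);
    max = Math.max(max, raw[i]);
  }
  const range = max - min || 1; // avoid dividing by zero on a flat output
  const mask = new Float32Array(raw.length);
  for (let i = 0; i < raw.length; i++) mask[i] = (raw[i] - min) / range;
  return mask;
}
```

With the real library, the session would typically come from ort.InferenceSession.create(modelUrl, { executionProviders: ["webgpu", "wasm"] }) and the factory would be (data, dims) => new ort.Tensor("float32", data, dims). The library's own types are broader than the interfaces here, so a thin adapter or cast would likely be needed.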
5. Mask output and upscaling
The model outputs a 1-channel float mask at inference resolution. If the input was downscaled, the mask is bilinearly upscaled back to the original image resolution using a high-quality interpolation pass on OffscreenCanvas.
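The tool does this resize on an OffscreenCanvas, but the arithmetic behind bilinear interpolation is easy to show directly. The following is a plain-array sketch of the same idea, not the canvas-based code; resizeMaskBilinear is a hypothetical name.

```typescript
// Resizes a single-channel float mask from (sw x sh) to (dw x dh) by blending
// the four nearest source samples around each destination pixel centre.
function resizeMaskBilinear(
  src: Float32Array,
  sw: number,
  sh: number,
  dw: number,
  dh: number,
): Float32Array {
  const dst = new Float32Array(dw * dh);
  for (let y = 0; y < dh; y++) {
    // Map the destination pixel centre into source coordinates, clamped
    // so edge pixels repeat instead of reading out of bounds.
    const fy = Math.min(Math.max(((y + 0.5) * sh) / dh - 0.5, 0), sh - 1);
    const y0 = Math.floor(fy);
    const y1 = Math.min(y0 + 1, sh - 1);
    const wy = fy - y0;
    for (let x = 0; x < dw; x++) {
      const fx = Math.min(Math.max(((x + 0.5) * sw) / dw - 0.5, 0), sw - 1);
      const x0 = Math.floor(fx);
      const x1 = Math.min(x0 + 1, sw - 1);
      const wx = fx - x0;
      const top = src[y0 * sw + x0] * (1 - wx) + src[y0 * sw + x1] * wx;
      const bottom = src[y1 * sw + x0] * (1 - wx) + src[y1 * sw + x1] * wx;
      dst[y * dw + x] = top * (1 - wy) + bottom * wy;
    }
  }
  return dst;
}
```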
6. Compositing
The mask is applied to the original (full-resolution) image pixel by pixel. Optional adjustments — edge feathering (box blur approximation of Gaussian), threshold clamping — are also applied here. The result is rendered to a second OffscreenCanvas, which you can download or copy to clipboard.
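Applying the mask amounts to writing it into the alpha channel. Here is a minimal sketch under a few assumptions: applyMask is a hypothetical name, the mask holds values in 0..1 at full image resolution, and "threshold" is interpreted as zeroing mask values below a cutoff. Feathering is left out to keep the example short.

```typescript
// Returns a copy of the RGBA pixels whose alpha is scaled by the mask.
// Mask values below `threshold` are treated as fully transparent.
function applyMask(
  rgba: Uint8ClampedArray,
  mask: Float32Array,
  threshold = 0,
): Uint8ClampedArray {
  if (mask.length * 4 !== rgba.length) {
    throw new Error("mask and image sizes differ");
  }
  const out = new Uint8ClampedArray(rgba); // copy; the input stays untouched
  for (let i = 0; i < mask.length; i++) {
    const m = mask[i] < threshold ? 0 : mask[i];
    out[i * 4 + 3] = Math.round(rgba[i * 4 + 3] * m); // colour channels unchanged
  }
  return out;
}
```

Multiplying the existing alpha, instead of overwriting it, keeps any transparency the source image already had.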
Why this is private — technically
Every step above happens in the browser's JavaScript sandbox or in the GPU's local memory. Nothing in a web page leaves your device unless the page's own code sends it, for example by calling fetch(), XMLHttpRequest, WebSocket, or navigator.sendBeacon(). The image processing pipeline makes no such calls.
The only network requests this site makes are:
- Loading the page HTML, CSS, and JavaScript (static files on Cloudflare CDN)
- Downloading the ONNX model file (~5 MB) once, on first use
- Downloading the ONNX Runtime WASM files once
Your image data is never in any of those requests. Once the model is cached (it is, after first use), the tool works entirely offline.
Skeptical? Open your browser's Network tab (F12 → Network) before selecting an image. Once processing completes, you'll see no outgoing request carrying your image. Zero bytes sent.
The worker architecture
The ONNX model runs in a Web Worker — a background thread that runs JavaScript in isolation from the main browser tab. This has two benefits:
- Responsiveness — the UI doesn't freeze while inference runs. You can still interact with the page.
- Memory isolation — the worker has its own memory space. When inference completes, the image tensor data can be garbage collected without affecting the main tab.
Image pixel data is transferred to the worker via a transferable ArrayBuffer — a zero-copy transfer that avoids duplicating the memory. The worker posts the mask data back to the main thread the same way.
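A transfer like this is requested by listing the buffer in the second argument of postMessage. The sketch below uses a hypothetical PostTarget interface in place of a real Worker so it stays self-contained; the message shape and the name sendPixels are also illustrative.

```typescript
// Hypothetical stand-in for the part of the Worker API used here.
interface PostTarget {
  postMessage(message: unknown, transfer: ArrayBuffer[]): void;
}

// Sends pixel data to a worker, handing over ownership of the underlying
// buffer instead of copying it.
function sendPixels(
  target: PostTarget,
  pixels: Uint8ClampedArray,
  width: number,
  height: number,
): void {
  // If the view covers only part of a larger buffer, copy just that part so
  // unrelated bytes are not transferred along with it.
  const coversWholeBuffer =
    pixels.byteOffset === 0 && pixels.byteLength === pixels.buffer.byteLength;
  const buffer = (coversWholeBuffer ? pixels.buffer : pixels.slice().buffer) as ArrayBuffer;
  // Listing the buffer in the transfer list is what makes this zero-copy.
  target.postMessage({ type: "segment", buffer, width, height }, [buffer]);
}
```

With a real Worker, the transferred buffer becomes detached on the sending side once the call returns (its byteLength reads as 0), which is why the main thread must not touch the pixels afterwards.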
Performance expectations
Inference time depends on your device:
- Chrome with WebGPU (modern laptop/desktop GPU): 0.5–2 seconds per image
- Chrome/Firefox with WASM (CPU fallback): 3–8 seconds per image
- Mobile browsers: 5–15 seconds (WebGPU support is still rolling out on mobile)
After the first run, the model is cached in the browser's Cache Storage API (same as service worker cache). Subsequent uses are fast — even offline.
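A cache-first loader along these lines could look like the following sketch. The ResponseLike and CacheLike interfaces are hypothetical and only mirror the handful of Response and Cache members the code uses; loadModelBytes, the cache name and the model path in the usage note are likewise illustrative.

```typescript
// Hypothetical minimal shapes mirroring parts of Response and Cache.
interface ResponseLike {
  ok: boolean;
  clone(): ResponseLike;
  arrayBuffer(): Promise<ArrayBuffer>;
}
interface CacheLike {
  match(url: string): Promise<ResponseLike | undefined>;
  put(url: string, response: ResponseLike): Promise<void>;
}

// Returns the model bytes from the cache when present; otherwise downloads
// them once and stores a copy for later (including offline) use.
async function loadModelBytes(
  url: string,
  cache: CacheLike,
  fetchFn: (url: string) => Promise<ResponseLike>,
): Promise<ArrayBuffer> {
  const hit = await cache.match(url);
  if (hit) return hit.arrayBuffer(); // no network request on a cache hit

  const response = await fetchFn(url);
  if (!response.ok) throw new Error(`model download failed: ${url}`);
  // A response body can be read only once, so cache a clone and read the original.
  await cache.put(url, response.clone());
  return response.arrayBuffer();
}
```

In a browser, the cache argument would come from something like await caches.open("models-v1") and fetchFn would be the global fetch.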
Limitations
Running on-device has some trade-offs:
- Model size vs. quality: we use u2netp, the lighter variant. The full u2net model is 176 MB and impractical to ship over the web. u2netp is 4.7 MB and handles most cases very well.
- No cloud-scale compute: cloud services can use massive GPU clusters for high-quality inpainting and refinement. On-device processing is more resource-constrained.
- Complex edges: a small on-device model handles hair, fur, and transparent materials less cleanly than a larger server-side model such as SAM2. Edge Feather helps, but there are limits.
For everyday use cases — product photos, portrait backgrounds, profile pictures — the quality is genuinely excellent. And the privacy and speed benefits of fully client-side processing are hard to beat.
Try it yourself
Now that you know how it works, try the AI Background Remover. For marketplace product photos with white backgrounds, see the Product Photo Background tool. For processing multiple images at once, try the Batch Background Remover.