How to Run Llama 3.1 Locally and Enable Remote Access

Introduction:

Llama 3.1 is a state-of-the-art family of open-weight Large Language Models (LLMs) released by Meta. The lineup includes 8B (8 billion parameters), 70B (70 billion parameters), and 405B (405 billion parameters) models. At release, the 405B model was the largest one Meta had ever made public.

Hardware Requirements: Please verify your hardware before starting to avoid wasting time.

Windows: NVIDIA RTX 3060 or better (8GB+ VRAM) + 16GB RAM, with at least 20GB of free disk space.

Mac: M1 or M2 chip with 16GB RAM and 20GB+ of disk space.

GPU requirements per model:

llama3.1-8b: At least 8GB of VRAM.
llama3.1-70b: Approximately 70-75 GB of VRAM.
llama3.1-405b: At least 400-450 GB of VRAM, which in practice means a multi-GPU server. Proceed with caution.
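
Where do numbers like these come from? As a rough rule of thumb, the memory needed for the weights alone is the parameter count multiplied by the bytes stored per weight. The sketch below is only that back-of-the-envelope arithmetic. The function name is made up for illustration, and it ignores the extra memory used by the context (KV cache) and runtime overhead, so treat the output as a lower bound rather than a guarantee.

```python
def estimate_weight_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes).

    Illustrative helper, not part of Ollama. It does not account for the
    KV cache or runtime overhead, so real usage will be somewhat higher.
    """
    bytes_per_weight = bits_per_weight / 8
    # (params_billion * 1e9 weights) * bytes_per_weight / 1e9 bytes per GB
    return params_billion * bytes_per_weight


for name, size in [("8b", 8), ("70b", 70), ("405b", 405)]:
    for bits in (4, 8, 16):
        gb = estimate_weight_gb(size, bits)
        print(f"llama3.1-{name} at {bits}-bit: ~{gb:.0f} GB for weights")
```

By this estimate, the figures listed above are roughly what 8-bit weights would need. A more heavily quantized build (for example 4-bit) needs about half as much, and full 16-bit weights about twice as much.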

If your rig meets these requirements, let’s get started!

1. Downloading Ollama

Ollama is an open-source tool for downloading, running, and managing LLMs on your own machine. It provides a command-line interface and a local HTTP API, so you don't need to set up a framework such as TensorFlow or PyTorch yourself. Download it from the official website (https://ollama.com) and choose the version that matches your OS.

2. Installing and Running Ollama

Run the installer (by default it installs to your C: drive). Once it finishes, open Windows PowerShell or CMD and type ollama. If the help menu appears, the installation succeeded.
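
If you would rather script this check, here is a minimal sketch in Python. It assumes the installer put ollama on your PATH and that the CLI accepts the --version flag. The function name is hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def ollama_version() -> Optional[str]:
    """Return the output of `ollama --version`, or None if ollama is not on PATH."""
    exe = shutil.which("ollama")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--version"], capture_output=True, text=True, timeout=30
    )
    # Some builds print warnings to stderr; fall back to it if stdout is empty.
    return (result.stdout or result.stderr).strip()


print(ollama_version() or "ollama not found on PATH")
```

If this prints "ollama not found on PATH", try opening a new terminal window first, since a terminal that was already open during installation may not see the updated PATH.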

3. Downloading the Llama 3.1 Model

In your terminal, run the following command:

ollama run llama3.1:8b

If you have high-end hardware, you can download and run the 70B or 405B models instead:

ollama run llama3.1:70b 
ollama run llama3.1:405b

Wait for the download to complete, and you’ll be dropped into a chat session to test the model.
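
Besides the interactive chat, Ollama serves a local HTTP API (by default at http://127.0.0.1:11434), which is what web UIs and scripts talk to. The sketch below sends a single prompt to the /api/generate endpoint using only the Python standard library. The helper names are illustrative, and it assumes Ollama is running and llama3.1:8b has already been downloaded.

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434"  # Ollama's default local address


def build_generate_request(prompt: str, model: str = "llama3.1:8b") -> urllib.request.Request:
    """Build a POST request for Ollama's /api/generate endpoint."""
    body = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}  # one JSON reply, no streaming
    ).encode("utf-8")
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3.1:8b", timeout: float = 120.0) -> str:
    """Send the prompt and return the model's text reply."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

With the model loaded, calling generate("Why is the sky blue?") should return the answer as a plain string. The first call can be slow while the model is loaded into memory, hence the generous timeout.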

4. Configuring Remote Access

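By default, Ollama listens on http://127.0.0.1:11434, which is reachable only from the same machine. To access it remotely, you need to set the OLLAMA_HOST environment variable. OLLAMA_ORIGINS and OLLAMA_MODELS are related settings you may want to set at the same time.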

Variable	Value	Description
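OLLAMA_HOST	0.0.0.0:8888	Listening IP and port; 0.0.0.0 accepts connections on all network interfaces, and 8888 replaces the default port 11434
OLLAMA_ORIGINS	*	Allowed CORS origins; * allows any origin, or specific domains can be listed instead
OLLAMA_MODELS	C:\Users\Administrator\.ollama	Directory where models are stored; point it at a folder on another drive to move models off C: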
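
One point that often trips people up: 0.0.0.0 is a listening address, not something a client can connect to. Remote clients use the server's real IP address together with the port from OLLAMA_HOST. The sketch below shows that mapping. The function name and the 192.168.1.50 address are placeholders for illustration, and the fallback to port 11434 when no port is given is an assumption based on the default mentioned above.

```python
def client_base_url(ollama_host: str, server_ip: str) -> str:
    """Turn an OLLAMA_HOST value into the URL a remote client would use.

    Illustrative helper: `server_ip` is the address of the machine running
    Ollama as seen from the client (hypothetical value in the example).
    """
    host, _, port = ollama_host.rpartition(":")
    if not host:  # no port in the value, e.g. "0.0.0.0"
        host, port = ollama_host, "11434"  # assumed default port
    if host == "0.0.0.0":  # "all interfaces" is not a connectable address
        host = server_ip
    return f"http://{host}:{port}"


print(client_base_url("0.0.0.0:8888", "192.168.1.50"))
# prints: http://192.168.1.50:8888
```

This is the base URL you would enter in a web UI or API client on another machine.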

How to set environment variables on Windows:

1. Quit the Ollama process entirely.

2. Right-click This PC, then go to Properties > Advanced system settings > Environment Variables. Under "User variables for Administrator", click New and add each of the three variables from the table above.

[Screenshot: Windows environment variable settings dialog]

3. Restart Ollama so that it picks up the new variables.

4. You can now connect using a web-based UI. We highly recommend Open WebUI or LobeChat.

Open WebUI

GitHub: https://github.com/open-webui/open-webui
Documentation: https://docs.openwebui.com/

LobeChat

GitHub: https://github.com/lobehub/lobe-chat
Documentation: https://lobehub.com/zh/docs/self-hosting/start
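
Before pointing a UI at the server, it can help to confirm that the API is actually reachable from another machine. A minimal sketch, assuming the server listens on port 8888 as configured earlier: it calls the /api/tags endpoint, which lists the models installed on the server. The function names are illustrative, and the base URL must be replaced with your server's real address.

```python
import json
import urllib.request
from typing import List


def parse_model_names(payload: dict) -> List[str]:
    """Extract model names from an /api/tags response body."""
    return [model["name"] for model in payload.get("models", [])]


def list_remote_models(base_url: str, timeout: float = 10.0) -> List[str]:
    """Ask a remote Ollama server which models it has installed.

    `base_url` is something like "http://<server-ip>:8888" (placeholder).
    Raises urllib.error.URLError if the server cannot be reached.
    """
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return parse_model_names(json.loads(resp.read()))
```

If list_remote_models raises a connection error, the usual suspects are that Ollama was not restarted after the variables were set, or that a firewall on the server is blocking the port.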

Conclusion:

That’s it! While it looks like a lot of steps, it’s actually quite straightforward. I hope this guide helps you get your own local AI up and running!
