Frustrated with cloud limits and recurring API fees for DeepSeek-R1? You’re not alone. Many developers, AI users, and researchers want full control over their data and workflow. The solution? Running DeepSeek-R1 locally. It’s faster, keeps your data private, and lets you customize the model however you need.
This guide covers four ways to set up DeepSeek-R1 on your system, whether you’re using macOS, Windows (via WSL 2), or Linux.
Each method suits different skill levels and hardware setups. If you want a simple installation, Ollama is a great choice. For full customization, the Python method is ideal. Docker offers a clean, containerized setup, while llama.cpp is best for optimizing performance on lower-end machines.
Table Of Contents 👉
- Why Run DeepSeek R1 Locally?
- Prerequisites
- 4 Ways to Install And Run DeepSeek-R1 Locally
- How to Install DeepSeek-R1 Locally Using Ollama
- How to Install and Run DeepSeek-R1 Using Python & Hugging Face
- How To Install and Run DeepSeek-R1 Locally Using Docker
- How To Install and Run DeepSeek-R1 Locally Using llama.cpp
Why Run DeepSeek R1 Locally?
Before exploring each installation method, let’s first understand why running DeepSeek R1 locally is becoming more popular.
- Data Privacy & Security: Running DeepSeek R1 on your own device keeps sensitive data under your control. This is especially important for industries that require strict confidentiality or for anyone who prefers to avoid third-party servers.
- Offline Access: Need to work without the internet? Running DeepSeek R1 locally lets you stay productive anywhere, whether you’re traveling or dealing with an unreliable connection.
- Customization & Flexibility: Running DeepSeek R1 locally lets you tweak, optimize, and fine-tune it for your projects. You can adjust settings and integrate it with local apps without being restricted by external platform limitations.
- Performance & Speed: While cloud services are fast, using your own hardware offers even better satisfaction. With a strong CPU or GPU, you can enjoy faster performance and lower latency for quick results.
- Cost Efficiency: Nobody likes paying high subscription fees. By running DeepSeek-R1 locally, you avoid ongoing costs, which is especially beneficial as your workload increases. It’s a one-time setup that can grow with you without the constant financial burden.
- Experimentation & Development: Local installations are perfect for developers who love experimenting. You can test different model versions, revert changes, and prototype new features freely, without the concern of external usage limits.
- Enhanced Security for Sensitive Applications: For industries like healthcare or finance with strict regulations, hosting DeepSeek-R1 locally is crucial. It keeps your data behind your firewall, giving you full control over how it’s handled and ensuring compliance with security standards.
Prerequisites
Before you start, ensure your system meets these requirements:
Operating System:
- macOS (Intel or Apple Silicon)
- Linux (x86_64 or ARM64, e.g., Ubuntu 24.04)
- Windows (via Windows Subsystem for Linux [WSL 2])
Hardware:
- At least 8GB of RAM (though 16GB or more is recommended).
- 10GB+ of free storage space.
- A compatible GPU (optional but highly recommended for faster inference).
Software:
- Terminal access (e.g., zsh or bash on macOS/Linux, Command Prompt/PowerShell on Windows via WSL).
- Basic tools like Python 3.10+, pip, and git.
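You can quickly confirm the software side from your terminal (exact version numbers will vary by system):
python3 --version   # should report 3.10 or newer
pip3 --version
git --version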
Once you’ve checked all these boxes, you’re good to go!
4 Ways to Install And Run DeepSeek-R1 Locally

Now that we’ve discussed the reasons and requirements, let’s check out the installation methods. Each option has its own advantages and challenges, so choose the one that fits your skill level or project needs best.
1. Ollama — Easiest for Most Users
If you’re looking for a hassle-free setup, Ollama is a great choice. It takes care of most of the setup for you, making it easy to use. However, this convenience comes with limited customization, so if you want more control, keep reading for the other methods.
2. Python (Hugging Face Integration) — Maximum Flexibility
If you’re someone who loves controlling every aspect of a machine learning pipeline, the Python route is your playground.
By installing DeepSeek-R1 through Python—often with the help of Hugging Face—you can tweak parameters, fine-tune the model, or integrate it into larger applications. Just be prepared for a bit more hands-on configuration.
3. Docker — Consistent and Reliable
Docker provides a clean way to containerize your DeepSeek-R1 setup. It keeps your dependencies and runtime consistent across any OS.
If you value reproducible builds and want to avoid “it worked on my machine” issues, Docker is a great option. Plus, you can easily start or stop the container whenever you need.
4. llama.cpp — Lightweight Optimization
Last but certainly not least, llama.cpp provides an optimized way to run language models on CPU (and some GPU configurations).
It might not be as beginner-friendly, but it’s known for running efficiently, especially on systems with modest hardware.
If you’re looking to push the performance envelope or run DeepSeek-R1 on a wider range of devices, llama.cpp is worth exploring.
How to Install DeepSeek-R1 Locally Using Ollama
Running DeepSeek-R1 on your local machine can feel a bit like setting up your own mini AI lab—exciting and empowering.
Whether you’re using macOS, Linux, or even Windows via WSL, this guide will walk you through the process using Ollama, a tool that makes managing large language models (LLMs) straightforward and efficient.
Step 1: Installing Ollama
To get started with DeepSeek-R1, ensure that Ollama is set up and operational. Here’s how to proceed:
For macOS Users
- Download the App: Visit Ollama.com and grab the macOS installer.
- Install the Application: Simply drag the Ollama icon into your Applications folder.
- Launch the App: Open Ollama to automatically start the background service.
For Linux/Ubuntu 24.04 and WSL (Windows)
- Run the Installation Script: Open your terminal and execute:
curl -fsSL https://ollama.ai/install.sh | sh
- Start the Service: After installation, kick things off by running:
ollama serve
Verifying Your Ollama Installation
To make sure everything is set up correctly, run:
ollama --version
If the installation was successful, you’ll see a version number (for example, ollama version 0.1.25), confirming that Ollama is ready to use.
Step 2: Downloading and Installing DeepSeek R1
DeepSeek R1 might not be immediately visible in Ollama’s default library. Depending on its availability, you can choose one of two methods to bring it on board.
Method 1: Pulling Directly from Ollama (If Available)
- Check Your Models: Type in:
ollama list
This command shows the models already installed on your machine; to confirm that DeepSeek-R1 is available for download, browse the model library on Ollama.com.
- Pull the Model: If DeepSeek-R1 is available, simply execute:
ollama pull deepseek-r1
Note: Since these language models can be several gigabytes in size, be patient—your download time will depend largely on your internet speed.
Method 2: Manual Setup Using a Modelfile
If DeepSeek R1 isn’t listed, you can manually install it using these steps:
- Download the Model File: Obtain the DeepSeek-R1 model in GGUF format (for example, deepseek-r1.Q4_K_M.gguf) from trusted sources like Hugging Face or the official DeepSeek repository. Save this file in a dedicated folder (e.g., ~/models).
- Create a Modelfile: Inside the same folder, create a file named Modelfile with the following content:
FROM ./deepseek-r1.Q4_K_M.gguf
(Make sure to replace the filename with your actual file name if it differs.)
- Build the Model in Ollama: Run this command to create your DeepSeek-R1 model:
ollama create deepseek-r1 -f Modelfile
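Beyond the FROM line, Ollama’s Modelfile format also accepts optional directives for tuning defaults. A hedged example (check the Ollama documentation for the directives your version supports, and adjust the values to taste):
FROM ./deepseek-r1.Q4_K_M.gguf
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
SYSTEM "You are a helpful coding assistant."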
Step 3: Running DeepSeek-R1
Now comes the fun part—interacting with your freshly installed model!
- Start the Model: Launch DeepSeek-R1 by running:
ollama run deepseek-r1
- Test with a Prompt: At the prompt (indicated by >>>), try asking something like: Write a Python function to calculate Fibonacci numbers. This will not only test the model’s response but also give you a glimpse of its conversational capabilities.
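If you’d rather talk to the model programmatically than through the interactive prompt, Ollama also exposes a local REST API (by default on port 11434). A minimal sketch using curl; see the Ollama API documentation for the full set of options:
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1",
  "prompt": "Write a Python function to calculate Fibonacci numbers.",
  "stream": false
}'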
Step 4: Verifying the Installation (Optional)
To double-check that DeepSeek-R1 is active, run:
ollama list
If everything is in order, you should now see deepseek-r1 listed. Feel free to run a few sample prompts to evaluate both the speed and the quality of responses.
Step 5: Running DeepSeek in a Web UI
While interacting through the command line works well, some users prefer a more visual, user-friendly experience. Enter the Ollama Web UI—perfect for those who enjoy the convenience of a web browser interface.
Setting Up the Web UI
1. Create a Virtual Environment: This helps isolate your Python dependencies.
- First, install the virtual environment package if needed:
sudo apt install python3-venv
- Then, create and activate a new virtual environment:
python3 -m venv ~/open-webui-venv
source ~/open-webui-venv/bin/activate
2. Install Open WebUI: Use pip to install:
pip install open-webui
3. Start the Server: Launch the web UI server by running:
open-webui serve
4. Access the Interface: Open your favorite browser and navigate to http://localhost:8080. You should now see a clean, interactive interface for managing your Ollama models, including DeepSeek-R1.
Troubleshooting Tips
Even the best-laid plans can hit a snag now and then. Here are a few tips to troubleshoot common issues:
- Model Not Found:
- Double-check that you’ve spelled the model name correctly.
- If DeepSeek-R1 isn’t available, try the manual GGUF setup.
- Browse Ollama’s Model Registry for alternatives like deepseek-coder.
- Performance Issues:
- Consider allocating more system memory (RAM/VRAM) if responses seem sluggish.
- Simplify your prompts for quicker results.
- WSL Errors (for Windows Users):
- Update WSL by running:
wsl --update
- Restart the Ollama service to see if that resolves the issue.
For any additional questions or the latest updates, always refer back to the Ollama Documentation or the official DeepSeek repository.
How to Install and Run DeepSeek-R1 Using Python & Hugging Face
If you’re eager to explore DeepSeek-R1 and enjoy programming in Python, the Hugging Face Transformers library is a good option.
It gives you access to advanced features and lets you integrate top-tier models into your projects. Let’s go through the setup step by step.
Step 1: Installing Dependencies
Before you start working with the model, make sure your environment is ready. First, check that Python is installed on your machine.
Once that’s done, you’ll need some key libraries. Open your terminal or command prompt and run this command:
pip install torch transformers accelerate
Here’s a quick rundown of what each package offers:
- Transformers: This library provides access to pre-trained models and equips you with all the essential tools to use them effectively.
- Accelerate: Particularly useful when dealing with larger models and GPU operations, this package helps streamline and optimize model execution.
- Torch: Provides the backbone for many machine learning operations, ensuring your computations run smoothly.
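Before moving on, it’s worth confirming that the packages import cleanly and whether a GPU is visible to PyTorch. A quick check you can run from the terminal:
python3 -c "import torch, transformers; print(transformers.__version__, torch.__version__, torch.cuda.is_available())"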
Step 2: Downloading the DeepSeek-R1 Model
The next step is to obtain DeepSeek R1, available on the Hugging Face Model Hub. To download the model, just clone its repository. Run this command in your terminal:
git clone https://huggingface.co/deepseek-ai/deepseek-r1
This command downloads all the necessary files, preparing you to start coding immediately.
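Note that cloning a Hugging Face model repository with git usually requires Git LFS to fetch the large weight files. If you’d rather skip git entirely, the huggingface_hub library (installed alongside transformers) can fetch the same files; a minimal sketch, assuming the repository ID matches the clone URL above:
from huggingface_hub import snapshot_download

# Downloads the model files into the local Hugging Face cache and returns the path
local_dir = snapshot_download("deepseek-ai/deepseek-r1")
print(local_dir)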
Step 3: Running Inference
Now comes the main part—making the model work! Start by creating a Python script (let’s call it inference.py). Open your favorite code editor and insert the following code:
from transformers import AutoModelForCausalLM, AutoTokenizer
# Load the model and tokenizer from Hugging Face
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-r1")
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-r1")
# Define your prompt
prompt = "Explain quantum computing in simple terms."
# Tokenize the prompt and prepare inputs for the model
inputs = tokenizer(prompt, return_tensors="pt")
# Generate the output with a maximum token limit
outputs = model.generate(**inputs, max_length=200)
# Decode and print the response
print(tokenizer.decode(outputs[0]))
A few things to note:
- Model and Tokenizer Loading: The script loads both the model and its tokenizer from the repository, ensuring they work hand-in-hand.
- Prompt Setup: The example prompt is simple yet effective—feel free to replace it with your own creative queries.
- Output Generation: By specifying a max_length of 200 tokens, the script ensures your output is detailed but doesn’t run away with the conversation.
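If you want more control over how the text is generated, generate() accepts additional decoding parameters. A hedged variation of the call above (the values are just illustrative starting points):
# Sampled generation instead of greedy decoding
outputs = model.generate(
    **inputs,
    max_new_tokens=200,   # limit only the newly generated tokens
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))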
Step 4: Executing the Script
Once your script is saved, it’s time to see DeepSeek R1 in action. Run the script using Python:
python inference.py
Sit back and watch as the model processes your prompt and generates a response. It’s almost like having a mini AI assistant right on your local machine!
Troubleshooting Tips
As you get everything set up, you may face a few challenges along the way. Here are some common problems and their solutions:
- Out-of-Memory Errors: If you run into memory issues, try modifying the model loading by adding device_map="auto" like so:
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-r1", device_map="auto")
This helps distribute the model across available devices more efficiently.
- Slow Performance: For a noticeable speed boost, consider using quantization. You can load the model with quantization enabled, for example, using load_in_4bit=True if your setup supports it. This reduces memory usage and can speed up inference.
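With recent versions of transformers, 4-bit loading is configured through BitsAndBytesConfig and requires the bitsandbytes package plus a supported GPU. A minimal sketch under those assumptions:
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit quantization config (requires the bitsandbytes package and a CUDA GPU)
bnb_config = BitsAndBytesConfig(load_in_4bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "deepseek-ai/deepseek-r1",
    quantization_config=bnb_config,
    device_map="auto",
)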
By following these steps, you’ll be ready to use DeepSeek-R1 with Python and Hugging Face.
This approach gives you control, flexibility, and speed, making it perfect for both quick tests and bigger projects.
How To Install and Run DeepSeek-R1 Locally Using Docker
Imagine creating your own local AI workspace without all the installation hassles. With Docker, you can run DeepSeek R1 smoothly and consistently—no more tedious manual setups!
Step 1: Get Docker Installed
Before getting started, ensure that Docker and Docker Compose are installed on your system.
These tools make everything easier by packaging your application into a clean container. Here’s what you need to do:
- For Windows and macOS: Open Docker’s official website and download Docker Desktop. It’s a straightforward install—just follow the on-screen instructions.
- For Linux (Ubuntu/Debian): Open your terminal and run:
sudo apt-get update && sudo apt-get install docker.io
This command updates your package list and installs Docker. Easy, right?
Step 2: Pull the DeepSeek Docker Image
With Docker set up, it’s time to get the DeepSeek image. This image is a pre-configured snapshot of DeepSeek, ready to run in a container.
Use this command (make sure to check DeepSeek’s latest documentation for any updates to the image name):
docker pull deepseek/deepseek-llm:latest # Replace with the correct image name if needed
Be patient—the download might take a few minutes, especially if the image is large. A stable internet connection really helps here!
Step 3: Launch the DeepSeek Container
With the image safely downloaded, you can now start your DeepSeek container.
This command will run the container in the background, allocate the necessary resources, and set up the network port mapping so you can interact with it locally:
docker run -d --name deepseek-container -p 8080:8080 deepseek/deepseek-llm:latest
(Use the same image name you pulled in the previous step.)
Let’s break that down:
- -d runs the container in detached mode (i.e., it runs in the background).
- --name deepseek-container assigns a friendly name for easy reference.
- -p 8080:8080 maps port 8080 in the container to port 8080 on your machine, allowing you to access the service via your browser or API calls.
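If you prefer Docker Compose (installed in Step 1), the same container can be described declaratively. A hedged sketch of a docker-compose.yml using the image name from the pull step; adjust the image and ports to whatever the DeepSeek documentation specifies:
services:
  deepseek:
    image: deepseek/deepseek-llm:latest   # replace with the correct image name if needed
    container_name: deepseek-container
    ports:
      - "8080:8080"
    restart: unless-stopped
Start it with docker compose up -d and stop it with docker compose down.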
Step 4: Verify the Installation
After launching, it’s a good idea to double-check that everything’s running as expected. Use this command to see your container’s status:
docker ps -a | grep deepseek-container
If your container is up and running, you’ll see it listed along with its details. It’s a simple check that gives you peace of mind.
Step 5: Interact with DeepSeek
It’s now time to enjoy the fun part of your installation — testing. You can send a quick API request to see if DeepSeek is ready to chat. Open your terminal and run:
curl -X POST http://localhost:8080/v1/completions \
-H "Content-Type: application/json" \
-d '{"prompt": "Hello, DeepSeek!", "max_tokens": 50}'
This command sends a POST request with a friendly prompt. The model will process it and respond with a generated completion. It’s like giving DeepSeek a quick hello and watching it reply!
Important Considerations
Before you get too comfortable, here are a few extra pointers to ensure a smooth experience:
- GPU Support: If you want to leverage GPU acceleration, make sure your system has the appropriate NVIDIA drivers installed, along with the NVIDIA Container Toolkit.
- Model Weights: Some DeepSeek models might require you to download additional weights. Always check the latest DeepSeek documentation for any extra steps.
- Configuration Tweaks: Depending on your specific needs, you might have to set up extra environment variables. These can fine-tune model parameters, secure your API, or allocate resources more efficiently.
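For the GPU case, once the NVIDIA drivers and the NVIDIA Container Toolkit are in place, GPU access is granted with the --gpus flag. A hedged example reusing the run command from Step 3:
docker run -d --gpus all --name deepseek-container -p 8080:8080 deepseek/deepseek-llm:latest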
With Docker, setting up DeepSeek locally becomes a streamlined process, removing much of the guesswork and complexity.
How To Install and Run DeepSeek-R1 Locally Using llama.cpp
This method is ideal for CPU-only configurations or for those aiming for lightweight GPU usage.
Getting started with DeepSeek using llama.cpp is a great way to keep things fast and efficient.
Whether you’re on x86-64, ARM-based Apple Silicon, or tapping into some GPU power, these steps will walk you through the entire setup and help you get up and running smoothly.
Prerequisites
Before you begin, make sure you have everything in place. Here’s a quick rundown of the essentials:
- Hardware:
- CPU: A modern x86-64 or ARM processor (Apple Silicon works great).
- GPU (Optional): If you’re planning on tapping into GPU acceleration, you’ll need either NVIDIA (with CUDA), AMD (with ROCm), or Apple Metal support.
- Software & Tools:
- C++ Compiler: Ensure you have a compatible compiler installed (for example, g++).
- CMake: This is crucial for building llama.cpp.
- Git: You’ll need it to clone the repository.
- Python (Optional): For those who want to leverage Python bindings later on.
Step 1: Clone the llama.cpp Repository
First things first—get the source code for llama.cpp. Open your terminal and execute:
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp && make
This simple process pulls down the repository and initiates a basic build. It’s a quick and straightforward way to lay the groundwork.
Step 2: Build llama.cpp
Building the executable depends on your operating system:
For Windows (Using CMake)
- Create a Build Directory:
mkdir build
cd build
- Configure and Build: Run the following commands:
cmake ..
cmake --build . --config Release
After a successful build, you should find the executable in the build/bin directory.
For Linux/macOS
Depending on whether you’re optimizing for Apple GPUs or NVIDIA CUDA, choose one of the following:
- For Apple Metal (macOS):
make clean && make LLAMA_METAL=1
- For NVIDIA CUDA (Linux/macOS):
make clean && make LLAMA_CUBLAS=1
These commands clean any previous builds and compile llama.cpp with the appropriate flags for your hardware.
Step 3: Download the DeepSeek GGUF Model
Now, it’s time to get your hands on the DeepSeek model in the GGUF format. You have two main options:
Option 1: Pre-Converted Model
- Hugging Face Hub: Search for “deepseek-gguf” on the Hugging Face Hub. Download the model version that suits your needs—this is the quickest route if a pre-converted model is available.
Option 2: Manual Conversion (Advanced)
If you prefer to convert the raw model yourself, follow these steps:
- Convert from PyTorch/Safetensors to GGUF: Run a command similar to:
python3 convert.py --ctx-size 4096 --outtype f16 /path/to/deepseek-model-dir
Adjust the parameters as necessary for your model.
- Quantize the Model: For instance, to generate a 4-bit quantized version:
./quantize /path/to/deepseek-model.gguf /path/to/deepseek-model-Q4_K_M.gguf Q4_K_M
This manual route offers flexibility and might be needed if your model isn’t directly available in GGUF format.
Step 4: Run the DeepSeek Model
With everything in place, you’re ready to run the model. Use the main executable provided by llama.cpp:
- For CPU Usage:
./main -m /path/to/deepseek-r1.Q4_K_M.gguf -p "Hello, DeepSeek!" -n 512
- For GPU Acceleration (e.g., NVIDIA CUDA):
./main -m /path/to/deepseek-r1.Q4_K_M.gguf -p "Hello, DeepSeek!" -n 512 --ngl 50
Here, the -p flag specifies your prompt (feel free to experiment with different queries), and -n 512 controls the number of tokens for the generated output. Adding --ngl 50 leverages GPU acceleration if available.
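You can also tune context size and CPU thread count at launch; flag names occasionally change between llama.cpp releases, so confirm with ./main --help. An illustrative example:
./main -m /path/to/deepseek-r1.Q4_K_M.gguf -p "Hello, DeepSeek!" -n 512 -c 4096 -t 8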
Step 5: Use the API Server
For those who prefer an API-driven approach, you can run DeepSeek as an OpenAI-compatible API server. This makes it easier to integrate DeepSeek into web services or other applications.
Launch the server with:
./server -m /path/to/deepseek-r1.Q4_K_M.gguf --port 8000 --host 0.0.0.0 --ctx-size 4096
Then, test the setup by sending a sample request via curl:
curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d '{
"prompt": "Explain AI alignment",
"max_tokens": 200
}'
This should kick off a conversation with DeepSeek, and you can adjust the parameters to suit your needs.
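Because the server speaks an OpenAI-compatible API, you can also call it from Python with the openai client library instead of curl. A minimal sketch, assuming the server from the command above is running on port 8000 (the model name here is just a placeholder; the server serves whichever GGUF file it was launched with):
# pip install openai
from openai import OpenAI

# Point the client at the local llama.cpp server; no real API key is needed
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.completions.create(
    model="deepseek-r1",
    prompt="Explain AI alignment",
    max_tokens=200,
)
print(response.choices[0].text)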
Troubleshooting Tips
Even the best-laid plans can encounter bumps along the way. Here are some common troubleshooting tips:
- Model Compatibility: Ensure that the DeepSeek model you’re using is compatible with llama.cpp. If not, you might need to convert or adjust the model with the proper tools.
- Memory Issues: If you hit memory errors, consider using quantized versions (e.g., files ending in .ggml.q4_0 or .gguf.q5_1). These versions are optimized for lower resource usage.
- Performance Tweaks: For enhanced performance, especially if you have GPU support, double-check your build flags. Leveraging GPU acceleration can significantly speed up inference times.
Setting up DeepSeek with llama.cpp might seem a bit technical at first glance, but once you’re through these steps, you’ll have a robust local setup that lets you experiment freely with DeepSeek’s capabilities.
Enjoy exploring and fine-tuning your own deep learning environment, and don’t hesitate to revisit these instructions if you need to tweak your setup further.
We hope these four methods for running DeepSeek-R1 locally for free helped you. Happy coding!