In a previous post, I experimented with running LLMs locally using llama.cpp on a Windows machine with a GPU. It worked well, but setup took quite a few steps. Since then, I found Ollama and tried it out on a MacBook Pro (M4 Pro).
I learned two things: Ollama makes it really easy to experiment with running an LLM locally, and local LLM inference on the M4 Pro's GPU is actually usable!
Here are the steps if you want to experiment for yourself on a Mac. I’m assuming that you’ve already got Homebrew installed to make installation easier.
- Start by installing Ollama with brew install ollama.
- Start the Ollama server with ollama serve. Leave it running in the background and open a new Terminal tab.
- Enter ollama run llama3.1 in your new Terminal tab. This will pull down llama3.1 and then launch an interactive prompt where you can start a conversation with the model.
- When you’re done, enter /bye to terminate the chat. You can now Ctrl+C the running server to shut down Ollama.
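If you'd rather script a prompt than type it into the interactive chat, the running server also accepts HTTP requests. Here's a minimal sketch in Python, assuming the server from the steps above is still running on Ollama's default port (11434) and that llama3.1 has already been pulled. The helper names build_request and ask are my own, made up for illustration, and not part of Ollama.

```python
import json
import urllib.request

# Assumption: ollama serve is listening on its default address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model, prompt):
    """Build a POST request for Ollama's generate endpoint (hypothetical helper)."""
    # "stream": False asks for a single JSON reply instead of a token stream.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(model, prompt):
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    # Fails with a connection error if the server isn't running.
    print(ask("llama3.1", "Why is the sky blue? Answer in one sentence."))
```

This only uses the Python standard library, so there's nothing extra to install. Swap the model name for any other model you've pulled.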
It’s great how easy this makes getting started with local LLMs! There are plenty of other models in the Ollama library if you want to experiment further. For example, you can try out DeepSeek-R1 with ollama run deepseek-r1.
