How To Run LLMs Locally With Ollama & Next.js


I’ve created AI solutions at work to support various projects. Several of those leveraged APIs for large language models (LLMs), like GPT-4, to generate content or act as virtual assistants. I figured it would be useful to learn how to run LLMs locally on my Windows laptop, so I can eliminate external dependencies and explore freely without incurring additional expenses.

I originally came across this article about building a local chatbot. Since I tend to follow my own path, I decided to jump straight into the Next.js starter on GitHub rather than follow the article word-for-word.

I already had WSL (Windows Subsystem for Linux) installed, so getting this set up in that environment was a breeze! I’ll provide an additional resource for installing WSL and Ollama at the end of this article.
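If you don’t already have it, recent versions of Windows let you install WSL with a single command from an elevated PowerShell prompt (a reboot is typically required afterward):

wsl --install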

The Steps I Followed:

  1. Install Ollama.
  2. Pull the model I wanted to use: ollama pull llama2:chat (reference)
  3. Clone the repository: git clone https://github.com/lgrammel/modelfusion-ollama-nextjs-starter.git
  4. Install dependencies: npm install
  5. Start the development server: npm run dev
  6. Go to http://localhost:3000/
  7. Check out the API code: app/api/chat/route.ts (a rough sketch of what this route boils down to follows below)
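To give a feel for what that route does, here’s a minimal sketch of a Next.js route handler that calls Ollama’s HTTP chat endpoint directly. The actual starter wires this up through the ModelFusion library, so treat this as an illustration of the request/response flow rather than the starter’s implementation; the model name matches the one pulled in step 2:

```ts
// app/api/chat/route.ts (minimal sketch): calls Ollama's HTTP chat API directly.
// NOTE: the actual starter uses ModelFusion; this only illustrates the flow.
export async function POST(req: Request) {
  // The chat UI posts the running message history as JSON.
  const { messages } = await req.json();

  // Forward the conversation to the local Ollama server (default port 11434).
  const ollamaResponse = await fetch("http://localhost:11434/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama2:chat", // the model pulled in step 2
      messages,             // [{ role: "user" | "assistant", content: string }, ...]
      stream: false,        // set to true to stream tokens back to the client instead
    }),
  });

  const data = await ollamaResponse.json();
  return Response.json({ content: data.message.content });
}
```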

Here’s a video showing everything set up and demonstrating Llama2’s inference performance on a modest NVIDIA RTX 4060 GPU. I did NOT speed up this video in any way:

If you’d like to see a comparison of running inference on a CPU, check out the video at the end of this article (starting at the 5-minute mark). It makes me glad that I invested in my new Lenovo Legion laptop! 🤣

Access on Mobile

I’m using the following command to run all this on a specific port and listen for network traffic, so I can access the Next.js UI from my phone. However, it’s not working as I expected. I’ll provide an update once I get it working; I imagine it has something to do with using WSL.

npx next dev -H 0.0.0.0 -p 3000
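My current theory: WSL2 puts Linux processes behind a NAT’ed virtual network, so a server bound inside WSL isn’t automatically reachable from other devices on the LAN. One workaround I plan to try (untested by me so far, so treat it as an assumption) is adding a Windows-side port proxy from an elevated PowerShell prompt, where <wsl-ip> is a placeholder for the address reported by hostname -I inside WSL:

netsh interface portproxy add v4tov4 listenaddress=0.0.0.0 listenport=3000 connectaddress=<wsl-ip> connectport=3000

A Windows Firewall rule allowing inbound traffic on port 3000 may also be needed.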

Testing Multiple Models

Following this process, we’re now set up to run several different models out of the box! Here’s a list of models that already have API routes and pages set up in the Next.js starter:

  • Llama2
  • Mistral
  • Neural-Chat
  • OpenHermes
  • Vicuna

Here’s the full list of models provided through Ollama!

I haven’t done this myself yet, but you should just need to pull those models down through Ollama (step 2 above) and open the appropriate URL (for example: http://localhost:3000/llama2).
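For example, trying Mistral should be as simple as:

ollama pull mistral

…and then opening http://localhost:3000/mistral in your browser.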

I’ve seen articles claiming that Mistral is better than GPT, so I’ll likely be testing that theory myself.


Installing WSL

In case it’s helpful, I thought I’d include this video I found for installing WSL and Ollama, since Ollama isn’t supported on Windows without WSL (yet). He doesn’t get into using Next.js, but he does use the command line to demonstrate how the Mistral model performs on a CPU.


Shutting Down Ollama

It’s good to make sure your system resources aren’t being drained by unwanted processes. I kept finding that Ollama was still running when I didn’t expect it to be. I even found an issue on GitHub about it.

To shut it down completely, use the following command in WSL:

sudo service ollama stop
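You can confirm it’s actually stopped with:

sudo service ollama status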

To avoid this issue in the future, I’m moving to running Ollama in Docker, so I can simply shut down the container.
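For reference, the Docker setup I’m planning to try looks something like this, based on the official ollama/ollama image (the --gpus=all flag assumes the NVIDIA Container Toolkit is installed on the host):

docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

Then, when I’m done:

docker stop ollama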


Conclusion

Now that I know how to run LLMs locally using Ollama and Next.js, I have the foundation in place for me to start exploring a variety of ideas I’ve been kicking around. If you’d like to follow my journey, be sure to subscribe to get new articles by email!



Header Image

Continuing to share how I’m making my header images using Stable Diffusion… the funniest thing about using the Disney/Pixar models with a wide aspect ratio is that the only way to get two llamas as the focus is to emphasize ((one)) 😅

I do love how quickly you can generate images though, which is helpful when it takes a few iterations before you get a good one.

Prompt: (one llama), glasses, colorful, playful, fancy library, colorful books, sun light, book_shelves background, open mouth smile, stacked books, high contrast, ((laptop)), windows

Negative prompt: canvas frame, ((disfigured)), ((bad art)), ((deformed)),((close up)),((b&w)), blurry, (((duplicate))), [out of frame], mutated hands, ((ugly)), blurry, (((bad proportions))), (((disfigured))), out of frame, ugly, gross proportions, ugly, tiling, poorly drawn, out of frame, disfigured, deformed, negative_easynegative

Steps: 35, Sampler: DPM2 Karras, CFG scale: 5, Seed: 382666114, Size: 1344×768, Model hash: 5990f18f0e, Model: PixarXL, Style Selector Enabled: True, Style Selector Randomize: False, Style Selector Style: base, Version: v1.6.1

Time taken: 44.7 sec. A: 4.14 GB, R: 6.31 GB, Sys: 8.0/7.99609 GB (100.0%)
