Playing with AIs at home – beginning the journey


After upgrading my home server, I found myself with an abundance of both CPU power and memory, both of which are meant to be used. Having given my other, down-scaled workloads the memory and CPU they truly need, I decided to see what the new hardware could be used for. One of the things the Intel Core Ultra 9 has is an NPU – a neural processing unit – as well as a GPU that can be used to speed up AI tasks.

But AI is a large field. Where do you start? Everyone has tried basic ChatGPT usage, and some of us use AI coding tools at work and for hobby projects. Behind all AI tools, however, there are some basic concepts.

  • Models are neural networks trained on large amounts of data – generally the larger the better, but the larger they are, the more hardware you need.
  • Tools in various forms – exposed either via OpenAPI specifications or via MCP (Model Context Protocol) – are interfaces that allow an AI to perform specific tasks.

There are general-purpose models and highly specialized ones. To get started, I simply wanted a setup that could run a few basic models, preferably with some tools support.

Ollama

Ollama is a tool that runs models locally, and an excellent introduction to the basics of AI. It can also run tools, and with the right models and tools, analyze and generate things like images.

Ollama is basically a command-line tool, but there is also a user interface that can be put on top of it: Open WebUI.
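
To give a feel for the command-line side, here is a quick sketch of basic usage – assuming you have the ollama binary available, and with llama3.1:8b as a stand-in for whatever model you choose:

# Download a model from the Ollama library (the model name is just an example)
ollama pull llama3.1:8b

# Start an interactive chat with it
ollama run llama3.1:8b

# Or ask a single question and exit
ollama run llama3.1:8b "Explain in two sentences what an NPU is."

# See which models are downloaded locally
ollama list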

To get the most out of my Core Ultra 9, the GPU is – as far as my current understanding goes – the best bet. The NPU in its current state isn't as useful for general-purpose LLMs, so I needed a build of Ollama with Intel GPU support. It turns out I was in luck, because I found https://docs.openwebui.com/tutorials/integrations/ipex_llm/ – a post detailing how to run Intel GPU-accelerated Ollama with Open WebUI.

There’s a ton of parameters to tune on this, various docker images are hosted. It took me a bit, but I finally got a setup that worked.

Ollama Deployment in Kubernetes with Intel GPU support

After quite a bit of tinkering, I ended up with the following deployment.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  namespace: ai
  labels:
    app.kubernetes.io/name: ollama
    app.kubernetes.io/instance: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: ollama
      app.kubernetes.io/instance: ollama
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ollama
        app.kubernetes.io/instance: ollama
    spec:
      securityContext: {}
      priorityClassName: low-priority
      containers:
        - name: ollama
          securityContext: {}
          image: "intelanalytics/ipex-llm-inference-cpp-xpu:2.3.0-SNAPSHOT"
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: 11434
              protocol: TCP
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /
              port: http
            periodSeconds: 30
            timeoutSeconds: 15
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /
              port: http
            periodSeconds: 30
            timeoutSeconds: 15
          resources:
            limits:
              gpu.intel.com/i915: "1"
              memory: 16Gi
              nvidia.com/gpu: "0"
          env:
            - name: DEVICE
              value: "Arc"
            - name: OMP_NUM_THREADS
              value: "12"
            - name: OLLAMA_NUM_THREADS
              value: "12"
            - name: OLLAMA_ORIGINS
              value: "*"
            - name: OLLAMA_HOST
              value: "0.0.0.0"
            - name: OLLAMA_CONTEXT_LENGTH
              value: "8192"
          volumeMounts:
            - name: llm-data
              mountPath: /root/.ollama
            - name: dev-dri
              mountPath: /dev/dri
          command: ["/bin/bash", "-lc"]
          args:
            - |
              cd /llm/scripts/ &&
              source ipex-llm-init --gpu --device ${DEVICE} &&
              bash start-ollama.sh &&
              tail -f /llm/ollama/ollama.log
      volumes:
        - name: llm-data
          persistentVolumeClaim:
            claimName: ollama
        - name: dev-dri
          hostPath:
            path: /dev/dri
            type: Directory

As you can see, I feed it the path to /dev/dri, which is the directory where my GPU interface devices live. I also limit the amount of memory it can use and set some thread and memory boundaries to prevent it from consuming all my resources. After all, I do have workloads here that I care about, and Kubernetes itself isn't too happy about not having any resources left for itself. It also needed a disk, which I have currently set to 100 GB. This is where the models and other data live, so you need some disk space if you want to play around.
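
For completeness, the llm-data volume is backed by a PersistentVolumeClaim along these lines – a minimal sketch, assuming the cluster has a storage class that can satisfy it:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama
  namespace: ai
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi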

I then, of course, need a service to make sure workloads can find and talk to it:

apiVersion: v1
kind: Service
metadata:
  name: ollama
  labels:
    app.kubernetes.io/name: ollama
    app.kubernetes.io/instance: ollama
spec:
  type: ClusterIP
  ports:
    - port: 11434
      targetPort: 11434
      protocol: TCP
      name: http
  selector:
    app.kubernetes.io/name: ollama
    app.kubernetes.io/instance: ollama

Running this, I have an Ollama pod, but I also want Open WebUI as a UI on top of it, for management and interactions. First, though, there is another component: ollama-webui-pipelines, which sits between Open WebUI and Ollama and adds things like tools support:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama-pipelines
  labels:
    app.kubernetes.io/name: ollama
    app.kubernetes.io/instance: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: ollama-webui-pipelines
      app.kubernetes.io/instance: ollama
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ollama-webui-pipelines
        app.kubernetes.io/instance: ollama
    spec:
      containers:
        - name: webui
          image: "ghcr.io/open-webui/pipelines:latest"
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: 9099
              protocol: TCP
          env:
            - name: PIPELINES_DIR
              value: "/app/pipelines"
          resources:
            requests:
              cpu: "100m"
              memory: "100Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /
              port: http
            periodSeconds: 30
            timeoutSeconds: 15
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /
              port: http
            periodSeconds: 30
            timeoutSeconds: 15
          volumeMounts:
            - name: pipelines-data
              mountPath: /app/pipelines
      volumes:
        - name: pipelines-data
          persistentVolumeClaim:
            claimName: ollama-pipelines

And we, of course, need a service, plus the PVC I have referenced:

apiVersion: v1
kind: Service
metadata:
  name: ollama-pipelines
  labels:
    app.kubernetes.io/name: ollama
    app.kubernetes.io/instance: ollama
spec:
  type: ClusterIP
  ports:
    - port: 9099
      targetPort: http
      protocol: TCP
      name: http
  selector:
    app.kubernetes.io/name: ollama-webui-pipelines
    app.kubernetes.io/instance: ollama
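
The PVC behind the pipelines only needs to hold the pipeline scripts and their settings, so something small along these lines will do (adjust the size to taste):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-pipelines
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi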

And now, we are ready to deploy Open WebUI:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama-webui
  labels:
    app.kubernetes.io/name: ollama
    app.kubernetes.io/instance: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: ollama-webui
      app.kubernetes.io/instance: ollama
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ollama-webui
        app.kubernetes.io/instance: ollama
    spec:
      containers:
        - name: webui
          image: "ghcr.io/open-webui/open-webui:0.7.1"
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          resources:
            requests:
              cpu: "500m"
              memory: "500Mi"
            limits:
              cpu: "1000m"
              memory: "1Gi"
          env:
            - name: OLLAMA_BASE_URL
              value: http://ollama:11434
            - name: OLLAMA_PROXY_URL
              value: http://ollama:11434
            - name: DATA_DIR
              value: "/app/backend/data"
            - name: OPENAI_API_KEY
              value: "something_secret"
            - name: OPENAI_API_BASE_URL
              value: http://ollama-pipelines:9099
            - name: OAUTH_CLIENT_ID
              value: openwebui
            - name: OAUTH_CLIENT_SECRET
              valueFrom:
                secretKeyRef:
                  name: openwebui-oidc
                  key: client_secret
            - name: OPENID_PROVIDER_URL
              value: https://keycloak.engen.priv.no/realms/Klauvsteinen/.well-known/openid-configuration
            - name: OPENID_REDIRECT_URI
              value: https://ai.engen.priv.no/oauth/oidc/callback
            - name: OAUTH_SCOPES
              value: "openid email profile"
            - name: ENABLE_OAUTH_ROLE_MANAGEMENT
              value: "true"
            - name: OAUTH_ROLES_CLAIM
              value: "roles"
            - name: GLOBAL_LOG_LEVEL
              value: "DEBUG"
            - name: ENABLE_OAUTH_SIGNUP
              value: "true"
            - name: OAUTH_MERGE_ACCOUNTS_BY_EMAIL
              value: "true"
            - name: OAUTH_ALLOWED_ROLES
              value: "openwebui_user,openwebui_admin"
            - name: OAUTH_ADMIN_ROLES
              value: "openwebui_admin"

As you can see, it supports Keycloak, which I am using for all my home lab authentication needs wherever it's supported. I've described Keycloak earlier, so I am not going to explain that here; I've decided to run Open WebUI at ai.engen.priv.no.

So we'll of course need a service, an IngressRoute, TLS certificates, DNS and all that, which by now is pretty trivial – I've described my setup in earlier blog posts, so this is just yet another site. To be able to log on, I of course also needed to add the Keycloak client for Open WebUI and give my user admin rights in Open WebUI.
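
For reference, the front-end plumbing looks roughly like this – a sketch only, assuming Traefik as the ingress controller (that's what IngressRoute implies) and an already-provisioned TLS secret, whose name here is just a placeholder:

apiVersion: v1
kind: Service
metadata:
  name: ollama-webui
spec:
  type: ClusterIP
  ports:
    - port: 8080
      targetPort: http
      protocol: TCP
      name: http
  selector:
    app.kubernetes.io/name: ollama-webui
    app.kubernetes.io/instance: ollama
---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: ollama-webui
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`ai.engen.priv.no`)
      kind: Rule
      services:
        - name: ollama-webui
          port: 8080
  tls:
    secretName: ai-engen-priv-no-tls # placeholder for the actual certificate secret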

Using Ollama

Now that I have Open WebUI installed, I just head over to the site and am greeted by an interface centered around an AI chat. In Open WebUI you can also install tools and MCP servers, download models, set up prompts, and do every other bit of fun you expect from an AI rig. There are tons of things to play with, and I'm only going to touch on a few of them.

Selecting a model

Selecting a model isn't easy, and if you are in this for the experimenting, you'll end up with a few. There are far too many to mention them all, but there are some things to know.

Most models come in more than one size, which basically says how many parameters the model has. The larger, the better, of course, from a capability standpoint, but a larger model will obviously feel slower. In my setup I can live with everything up to 8b, occasionally dabbling higher to test specific things. 8b means 8 billion parameters – parameters are the weights that make up the model, and the usual measure of the size of an LLM.

There is a list of models to use with Ollama at https://ollama.com/library – where each model's strengths and purpose are detailed.

Some models are better suited than others for use with tools, some can analyze images, and so on. This is where the fun starts, and I don't really have any recommendations – it all depends on what you want to play with. I tend to go with the llama models for general-purpose usage, while I am, for example, experimenting with gemma models for image analysis.

Once you have found a model you want, you need to download it, after which you can select it in the chat interface.
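
If you prefer the command line over the admin UI, the model pull can also be done through the Ollama API – a sketch assuming you run it from something that can reach the ollama service inside the cluster, with llama3.1:8b again just an example:

# Pull a model through the Ollama API
curl http://ollama:11434/api/pull -d '{"model": "llama3.1:8b"}'

# Quick sanity check once the pull has finished
curl http://ollama:11434/api/generate -d '{"model": "llama3.1:8b", "prompt": "Say hello", "stream": false}'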

Using tools

Tools are configured in a similar way. The best way to find tools to play with is to head over to https://openwebui.com/search and search for tools. There you'll find a set of community-managed tools, ready for you to play with. You can also connect to OpenAPI-compatible services and MCP services of the «streamable http» type. I am currently playing with one for my Home Assistant, but without much success yet. This isn't an exact science: prompts and instructions matter, and the choice of model matters. There are a ton of parameters to tweak to get the AI to do what you want.

Other possibilities

You can also hook it up to text-to-speech services, to image-generation AIs – for example a self-hosted ComfyUI – and various other things. If you are interested in playing with AI, then go ahead and install these tools! They are free to use, and an excellent way of getting to understand a bit more about AI than just how to use ChatGPT.

Wrapping it up

There are far too many possibilities with self-hosted LLMs to cover them all. I'm currently tinkering with creating a chat bot; you can also use them for home automation, image generation and analysis, and just about everything else you can use an AI for today.

The first step is to have a system to run them on, and I believe Ollama is one of the best and easiest ways to start, being both feature-rich and relatively simple to use.

Also note: there is some controversy around the copyright of the material that goes into models. A model is trained on a lot of data, and some of it is probably mine and yours. The model makers generally haven't asked for permission for that. I am a bit unsure what to think of it all, but in the end we'll probably need a generic framework for letting an LLM know whether or not you want it to use your content – much like how you allow or disallow search engines on your web site. I am a geek at heart and I like to play with cool functionality, so for now I'll stay away from those issues and just learn to use the technology. It can be great fun!
