Playing with AIs at home – beginning the journey


After upgrading my home server, I found myself with an abundance of both CPU power and memory, both of which are meant to be used. Having given my other, down-scaled workloads the memory and CPU they truly need, I decided to see what the new hardware could be used for. One of the things the Intel Core Ultra 9 has is an NPU – a neural processing unit – as well as a GPU that can be used to speed up AI tasks.

But AI is a large field. Where do you start? Everyone has tried basic ChatGPT usage, and some of us use AI coding tools at work and for hobby projects. Behind all AI tools, however, there are some basic concepts.

  • Models are neural networks trained on large amounts of data – generally the larger the better, but the larger they are, the more hardware you need.
  • Tools in various forms – exposed either via OpenAPI specifications or via MCP (Model Context Protocol) – are interfaces that allow an AI to perform specific tasks.

There are general-purpose models and highly specialized ones. To get started, I simply wanted a setup that could run a few basic models, preferably with some tools support.

Ollama

Ollama is a tool that runs models locally, and an excellent introduction to the basics of AI. It can also run tools, and with the right models and tools, analyze and generate things like images.

Ollama is basically a command-line tool, but there is also a user interface that can be put on top of it: Open WebUI.
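
To give a feel for the command-line side, here is a quick sketch of basic usage – assuming you have the ollama binary available, and with llama3.1:8b as a stand-in for whatever model you choose:

# Download a model from the Ollama library (the model name is just an example)
ollama pull llama3.1:8b

# Start an interactive chat with it
ollama run llama3.1:8b

# Or ask a single question and exit
ollama run llama3.1:8b "Explain in two sentences what an NPU is."

# See which models are downloaded locally
ollama list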

To get the most out of my Core Ultra 9, the GPU is – as far as my current understanding goes – the best bet. The NPU in its current state isn't as useful for general-purpose LLMs, so I needed a build of Ollama with Intel GPU support. It turns out I was in luck, because I found https://docs.openwebui.com/tutorials/integrations/ipex_llm/ – a post detailing how to run Intel GPU-accelerated Ollama with Open WebUI.

There’s a ton of parameters to tune on this, various docker images are hosted. It took me a bit, but I finally got a setup that worked.

Ollama Deployment in Kubernetes with Intel GPU support

After quite a bit of tinkering, I ended up with the following deployment.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
  namespace: ai
  labels:
    app.kubernetes.io/name: ollama
    app.kubernetes.io/instance: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: ollama
      app.kubernetes.io/instance: ollama
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ollama
        app.kubernetes.io/instance: ollama
    spec:
      securityContext: {}
      priorityClassName: low-priority
      containers:
        - name: ollama
          securityContext: {}
          image: "intelanalytics/ipex-llm-inference-cpp-xpu:2.3.0-SNAPSHOT"
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: 11434
              protocol: TCP
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /
              port: http
            periodSeconds: 30
            timeoutSeconds: 15
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /
              port: http
            periodSeconds: 30
            timeoutSeconds: 15
          resources:
            limits:
              gpu.intel.com/i915: "1"
              memory: 16Gi
              nvidia.com/gpu: "0"
          env:
            - name: DEVICE
              value: "Arc"
            - name: OMP_NUM_THREADS
              value: "12"
            - name: OLLAMA_NUM_THREADS
              value: "12"
            - name: OLLAMA_ORIGINS
              value: "*"
            - name: OLLAMA_HOST
              value: "0.0.0.0"
            - name: OLLAMA_CONTEXT_LENGTH
              value: "8192"
          volumeMounts:
            - name: llm-data
              mountPath: /root/.ollama
            - name: dev-dri
              mountPath: /dev/dri
          command: ["/bin/bash", "-lc"]
          args:
            - |
              cd /llm/scripts/ &&
              source ipex-llm-init --gpu --device ${DEVICE} &&
              bash start-ollama.sh &&
              tail -f /llm/ollama/ollama.log
      volumes:
        - name: llm-data
          persistentVolumeClaim:
            claimName: ollama
        - name: dev-dri
          hostPath:
            path: /dev/dri
            type: Directory

As you can see, I feed it the path to /dev/dri, which is the directory where my GPU interface devices live. I also limit the amount of memory it can use and set some thread and memory boundaries to prevent it from consuming all my resources. After all, I do have workloads here that I care about, and Kubernetes itself isn't too happy about not having any resources left for itself. It also needed a disk, which I have currently set to 100 GB. This is where the models and other data live, so you need some disk space if you want to play around.
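
For completeness, the llm-data volume is backed by a PersistentVolumeClaim along these lines – a minimal sketch, assuming the cluster has a storage class that can satisfy it:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama
  namespace: ai
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi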

I then, of course, need a service to make sure workloads can find and talk to it:

apiVersion: v1
kind: Service
metadata:
  name: ollama
  labels:
    app.kubernetes.io/name: ollama
    app.kubernetes.io/instance: ollama
spec:
  type: ClusterIP
  ports:
    - port: 11434
      targetPort: 11434
      protocol: TCP
      name: http
  selector:
    app.kubernetes.io/name: ollama
    app.kubernetes.io/instance: ollama

Running this, I have an Ollama pod, but I also want Open WebUI as a UI on top of it, for management and interactions. First, though, there is another component: ollama-webui-pipelines, which sits between Open WebUI and Ollama and adds things like tools support:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama-pipelines
  labels:
    app.kubernetes.io/name: ollama
    app.kubernetes.io/instance: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: ollama-webui-pipelines
      app.kubernetes.io/instance: ollama
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ollama-webui-pipelines
        app.kubernetes.io/instance: ollama
    spec:
      containers:
        - name: webui
          image: "ghcr.io/open-webui/pipelines:latest"
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: 9099
              protocol: TCP
          env:
            - name: PIPELINES_DIR
              value: "/app/pipelines"
          resources:
            requests:
              cpu: "100m"
              memory: "100Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
          livenessProbe:
            failureThreshold: 3
            httpGet:
              path: /
              port: http
            periodSeconds: 30
            timeoutSeconds: 15
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /
              port: http
            periodSeconds: 30
            timeoutSeconds: 15
          volumeMounts:
            - name: pipelines-data
              mountPath: /app/pipelines
      volumes:
        - name: pipelines-data
          persistentVolumeClaim:
            claimName: ollama-pipelines

And we, of course, need a service, plus the PVC I have referenced:

apiVersion: v1
kind: Service
metadata:
  name: ollama-pipelines
  labels:
    app.kubernetes.io/name: ollama
    app.kubernetes.io/instance: ollama
spec:
  type: ClusterIP
  ports:
    - port: 9099
      targetPort: http
      protocol: TCP
      name: http
  selector:
    app.kubernetes.io/name: ollama-webui-pipelines
    app.kubernetes.io/instance: ollama
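
The PVC behind the pipelines only needs to hold the pipeline scripts and their settings, so something small along these lines will do (adjust the size to taste):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ollama-pipelines
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi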

And now, we are ready to deploy Open WebUI:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama-webui
  labels:
    app.kubernetes.io/name: ollama
    app.kubernetes.io/instance: ollama
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: ollama-webui
      app.kubernetes.io/instance: ollama
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ollama-webui
        app.kubernetes.io/instance: ollama
    spec:
      containers:
        - name: webui
          image: "ghcr.io/open-webui/open-webui:0.7.1"
          imagePullPolicy: IfNotPresent
          ports:
            - name: http
              containerPort: 8080
              protocol: TCP
          resources:
            requests:
              cpu: "500m"
              memory: "500Mi"
            limits:
              cpu: "1000m"
              memory: "1Gi"
          env:
            - name: OLLAMA_BASE_URL
              value: http://ollama:11434
            - name: OLLAMA_PROXY_URL
              value: http://ollama:11434
            - name: DATA_DIR
              value: "/app/backend/data"
            - name: OPENAI_API_KEY
              value: "something_secret"
            - name: OPENAI_API_BASE_URL
              value: http://ollama-pipelines:9099
            - name: OAUTH_CLIENT_ID
              value: openwebui
            - name: OAUTH_CLIENT_SECRET
              valueFrom:
                secretKeyRef:
                  name: openwebui-oidc
                  key: client_secret
            - name: OPENID_PROVIDER_URL
              value: https://keycloak.engen.priv.no/realms/Klauvsteinen/.well-known/openid-configuration
            - name: OPENID_REDIRECT_URI
              value: https://ai.engen.priv.no/oauth/oidc/callback
            - name: OAUTH_SCOPES
              value: "openid email profile"
            - name: ENABLE_OAUTH_ROLE_MANAGEMENT
              value: "true"
            - name: OAUTH_ROLES_CLAIM
              value: "roles"
            - name: GLOBAL_LOG_LEVEL
              value: "DEBUG"
            - name: ENABLE_OAUTH_SIGNUP
              value: "true"
            - name: OAUTH_MERGE_ACCOUNTS_BY_EMAIL
              value: "true"
            - name: OAUTH_ALLOWED_ROLES
              value: "openwebui_user,openwebui_admin"
            - name: OAUTH_ADMIN_ROLES
              value: "openwebui_admin"

As you can see, it supports Keycloak, which I am using for all my home lab authentication needs wherever it's supported. I've described Keycloak earlier, so I am not going to explain that here; I've decided to run Open WebUI at ai.engen.priv.no.

So we'll of course need a service, an IngressRoute, TLS certificates, DNS and all that, which by now is pretty trivial – I've described my setup in earlier blog posts, so this is just yet another site. To be able to log on, I of course also needed to add the Keycloak client for Open WebUI and give my user admin rights in Open WebUI.
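
For reference, the front-end plumbing looks roughly like this – a sketch only, assuming Traefik as the ingress controller (that's what IngressRoute implies) and an already-provisioned TLS secret, whose name here is just a placeholder:

apiVersion: v1
kind: Service
metadata:
  name: ollama-webui
spec:
  type: ClusterIP
  ports:
    - port: 8080
      targetPort: http
      protocol: TCP
      name: http
  selector:
    app.kubernetes.io/name: ollama-webui
    app.kubernetes.io/instance: ollama
---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: ollama-webui
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`ai.engen.priv.no`)
      kind: Rule
      services:
        - name: ollama-webui
          port: 8080
  tls:
    secretName: ai-engen-priv-no-tls # placeholder for the actual certificate secret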

Using Ollama

Now that I have Open WebUI installed, I just head over to the site and am greeted by an interface centered around an AI chat. In Open WebUI you can also install tools and MCP servers, download models, set up prompts, and do every other bit of fun you expect from an AI rig. There are tons of things to play with, and I'm only going to touch on a few of them.

Selecting a model

Selecting a model isn't easy, and if you are in this for the experimenting, you'll end up with a few. There are far too many to mention them all, but there are some things to know.

Most models come in more than one size, which basically says how many parameters the model has. The larger, the better, of course, from a capability standpoint, but a larger model will obviously feel slower. In my setup I can live with everything up to 8b, occasionally dabbling higher to test specific things. 8b means 8 billion parameters – parameters are the weights that make up the model, and the usual measure of the size of an LLM.

There is a list of models to use with Ollama at https://ollama.com/library – where each model's strengths and purpose are detailed.

Some models are better suited than others for use with tools, some can analyze images, and so on. This is where the fun starts, and I don't really have any recommendations – it all depends on what you want to play with. I tend to go with the llama models for general-purpose usage, while I am, for example, experimenting with gemma models for image analysis.

Once you have found a model you want, you need to download it, after which you can select it in the chat interface.
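
If you prefer the command line over the admin UI, the model pull can also be done through the Ollama API – a sketch assuming you run it from something that can reach the ollama service inside the cluster, with llama3.1:8b again just an example:

# Pull a model through the Ollama API
curl http://ollama:11434/api/pull -d '{"model": "llama3.1:8b"}'

# Quick sanity check once the pull has finished
curl http://ollama:11434/api/generate -d '{"model": "llama3.1:8b", "prompt": "Say hello", "stream": false}'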

Using tools

Tools are configured in a similar way. The best way to find tools to play with is to head over to https://openwebui.com/search and search for tools. There you'll find a set of community-managed tools, ready for you to play with. You can also connect to OpenAPI-compatible services and MCP services of the «streamable http» type. I am currently playing with one for my Home Assistant, but without much success yet. This isn't an exact science: prompts and instructions matter, and the choice of model matters. There are a ton of parameters to tweak to get the AI to do what you want.

Other possibilities

You can also hook it up to text-to-speech services, to image-generation AIs – for example a self-hosted ComfyUI – and various other things. If you are interested in playing with AI, then go ahead and install these tools! They are free to use, and an excellent way of getting to understand a bit more about AI than just how to use ChatGPT.

Wrapping it up

There are far too many possibilities with self-hosted LLMs to cover them all. I'm currently tinkering with creating a chat bot; you can also use them for home automation, image generation and analysis, and just about everything else you can use an AI for today.

The first step is to have a system to run them on, and I believe Ollama is one of the best and easiest ways to start, being both feature-rich and relatively simple to use.

Also note: there is some controversy around the copyright of the material that goes into models. A model is trained on a lot of data, and some of it is probably mine and yours. The model makers generally haven't asked for permission for that. I am a bit unsure what to think of it all, but in the end we'll probably need a generic framework for letting an LLM know whether or not you want it to use your content – much like how you allow or disallow search engines on your web site. I am a geek at heart and I like to play with cool functionality, so for now I'll stay away from those issues and just learn to use the technology. It can be great fun!
