My self-hosted AI journey Part 2: Using Ollama as your coding assistant


One of the big use-cases for AI among developers today is coding assistants. A coding assistant basically serves as a source of useful suggestions, a sparring partner, and occasionally something you can hand larger tasks off to while you are taking a lunch break.

Hiring human assistants means you can have them sign work contracts and confidentiality agreements – which they might or might not uphold, I’ll give you that. Using an AI is in some ways very different, but in some ways not. You can of course use a service which promises not to leak your data to anyone, and you can use paid services where this is part of the contract. Nevertheless, you are still trusting a big data company somewhere with your code, either in full or in bits and pieces, and for some people, this isn’t acceptable. And some people like me simply like to run services themselves. But can you do this with AI services? Of course you can!

In my previous article I described how I was running Ollama for generic AI chat a la ChatGPT. In this article, I'm exploring using Ollama as a coding assistant.

I'm not a big IDE user, but one IDE I actually have gotten used to, mostly because we also use it at work, is Microsoft VSCode. We also use AI agents and the like at work, integrated with it, so I know where to start and what to compare with. There are various options for integrating VSCode with Ollama. The most versatile and flexible one seems to be Continue, so I decided to try that one.

Plugins in VSCode are easy to install, but once it was installed, I needed it to talk to my Ollama. And a big fat warning here: there's no authentication layer in Ollama, so I had to open my Ollama pod up for traffic from my internal network. I'm not extremely worried, but I'm likely going to try to improve on that.

Prerequisites: Exposing Ollama

This is pretty easy: I just need a load balancer in front of Ollama, and to let traffic from my internal network through, both in Calico and in my Unifi gateway.

The load balancer:

apiVersion: v1
kind: Service
metadata:
  name: ollama-api
  namespace: ai
  annotations:
    projectcalico.org/ipv6pools: '["loadbalancer-ipv6-pool-internal"]'
    projectcalico.org/ipv4pools: '["loadbalancer-ipv4-pool-internal"]'
    external-dns.alpha.kubernetes.io/hostname: ollama.engen.priv.no
    external-dns/internal: "true"
    external-dns.alpha.kubernetes.io/ttl: "300"
    ipchanger.alpha.kubernetes.io/patch: "true"
    unifi.engen.priv.no/firewall-group: internalweb
spec:
  externalTrafficPolicy: Local
  type: LoadBalancer
  ipFamilyPolicy: PreferDualStack
  ipFamilies:
    - IPv6
    - IPv4
  ports:
    - name: api
      port: 11434
  selector:
    app.kubernetes.io/name: ollama

This also adds it to the internalweb group, so that the correct firewall opening is created, as well as adding it to my internal (Unifi) DNS under the name ollama.engen.priv.no. Once you have these infrastructure components/operators in place in Kubernetes, adding new services is a piece of cake.

For the Calico policy, I need:

apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: allow-traffic-to-ollama-api
  namespace: ai # Change this to your namespace
spec:
  types:
    - Ingress
  ingress:
    - action: Allow
      protocol: TCP
      source:
        nets:
          - 192.168.0.0/16
      destination:
        ports:
          - 11434
    - action: Allow
      protocol: TCP
      source:
        nets:
          - <my ipv6 range>::/56
      destination:
        ports:
          - 11434
  selector: app.kubernetes.io/name == "ollama"

With that done, I’ll be able to access Ollama on http://ollama.engen.priv.no:11434/ from the internal network.
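A quick sanity check from a machine on the internal network is to ask the API which models it knows about. The /api/tags endpoint is part of Ollama's standard REST API:

# List the installed models; getting a JSON response back confirms the service is reachable.
curl http://ollama.engen.priv.no:11434/api/tags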

Installing and configuring Continue

Installing Continue is done like any other plugin in VSCode, through the UI, so I'll not cover that here.
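If you prefer the command line, the VSCode CLI can install it as well. A minimal sketch, assuming the Marketplace extension ID is still Continue.continue:

# Install the Continue extension from the terminal (extension ID assumed; check the Marketplace page if it fails).
code --install-extension Continue.continue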

For configuring, you'll need to create/modify the file ~/.continue/config.yaml. This sets up the models and other configuration, like prompts and context providers. I'll go through mine step by step. Note: this doesn't work perfectly yet; in fact, some of it still acts up in weird ways, but at least it tries to do semi-intelligent things to my code. The autocomplete is also decent.

name: Local Config
version: 1.0.0
schema: v1
prompts:
  - name: repo-safe-agent
    description: "Safe, repo-grounded agent behavior"
    prompt: |
      You are an AI coding assistant working ONLY with the files and context provided from the current VS Code workspace.

      Rules:
      - When referring to an existing file, first read and, if asked, show its REAL current contents from the workspace.
      - If you do not see a file or its content in the provided context, say you cannot access it and ask the user to attach or open it.
      - Prefer using context providers such as code, folder, and codebase to automatically select relevant files.
      - When unsure, ask clarifying questions instead of guessing.
      - Clearly distinguish between EXISTING files and NEW files you propose to add.

      Always follow these rules.

This is an attempt at getting it to do consistent things. Some of it is an attempt to compensate for me using unsuitable models in agent mode, but it stays for now.

The next block is the models and what they should be used for. This is also very much a work in progress, but the most consistent results I have gotten so far are from this set. Note that llama3.1:8b came highly recommended for tool use, even though it's a relatively small model. This is the model I use in agent mode now.

I’ve left in a few models for each use case. These models will also have to be installed in ollama before they can be used, of course.
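Installing a model can be done either through the exposed API or by exec-ing into the pod. Both commands below are sketches: /api/pull is part of Ollama's REST API, while deploy/ollama is just my assumption about what the Deployment in the ai namespace is called.

# Pull a model over HTTP via the Ollama REST API (POST /api/pull).
curl http://ollama.engen.priv.no:11434/api/pull -d '{"model": "llama3.1:8b"}'

# Or run the ollama CLI inside the pod; adjust "deploy/ollama" to your actual Deployment name.
kubectl exec -n ai deploy/ollama -- ollama pull qwen2.5-coder:32b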

models:
  - name: deepseek-coder:33b
    provider: ollama
    model: deepseek-coder:33b
    apiBase: http://ollama.engen.priv.no:11434
    roles:
      - chat
      - edit

  - name: Qwen2.5Coder2.5b
    provider: ollama
    model: qwen2.5-coder:2.5b
    apiBase: http://ollama.engen.priv.no:11434
    roles:
      - autocomplete

  - name: qwen2.5-coder:32b
    provider: ollama
    model: qwen2.5-coder:32b
    apiBase: http://ollama.engen.priv.no:11434
    roles:
      - chat
      - edit

  - name: llama31-8b
    provider: ollama
    model: llama3.1:8b
    apiBase: http://ollama.engen.priv.no:11434
    roles:
      - chat
      - apply
    capabilities:
      - tool_use

  - name: qwen2.5-coder:7b
    provider: ollama
    model: qwen2.5-coder:7b
    apiBase: http://ollama.engen.priv.no:11434
    roles:
      - autocomplete

Last, the context providers. For now, I have these:

context:
  - provider: code
  - provider: diff
  - provider: terminal
  - provider: folder
  - provider: repo-map
  - provider: open
  - provider: tree

The descriptions of these can be found in the Continue Reference guide.

Getting it all to work – or at least trying.

So, what are some good tips for getting good results? This is still a learning project for me, but I do have some advice.

Write and reference documentation!

If you have some documentation you can reference, outlining the conventions you use, examples, and so on, your AI will thank you for it. Or maybe it won't, but at least there's a better chance it will produce better code.

Create some good rules

If you add files in .continue/rules/ with instructions for Continue, you increase the chance of better results. I have tried a few examples. The results are somewhat mixed; there's no magic wand that makes the AI do exactly what you want unless you give some thought to what your AI should know and what it should do.

Here are a few examples I am playing with now:

rules % cat kubernetes.md
---
name: Kubernetes YAML edits
globs: ["**/*.yaml", "**/*.yml"]
alwaysApply: true
---

When editing Kubernetes manifests:

- Output **only** the final YAML, with no natural-language explanation.
- Do not wrap YAML in Markdown fences.
- Do not repeat the file twice.
- Do not include greetings, notes, or signatures.

This was written to counter specific errors I had, although it's likely that switching models has had a greater effect on the result.

You can even tell it which documents it should reference, in a rule:

---
name: Repo docs
alwaysApply: true
---

When answering questions about this repo, prefer using these docs:

- docs/argocd_repo_layout.md
- docs/kubernetes_architecture.md

Reference them explicitly in your reasoning and keep answers consistent with them.

Use correct models

The different models have different strengths, and larger doesn't always mean better. My current agent model is llama3.1:8b, while my chat/edit models are qwen2.5-coder:32b and deepseek-coder:33b. The latter two didn't give nearly as good agent results as the first one, even though the first is much smaller.

Conclusion

It's definitely possible to have a self-hosted coding assistant, but it's not a fire-and-forget thing; it needs some configuration, which will probably need to be updated over time.

Documentation in the repo helps a lot, so running a code agent might also help with doing QA on the documentation… Think of the AI as a junior coder, prone to misinterpreting almost everything you tell it. The clearer you state what you want, and the less room there is for misunderstanding, the greater the chance of good results.

I might revisit this and either write a new article or update this one when I feel I have some groundbreaking results, but for now it has been a nice experiment that I'll likely continue with.

