My self-hosted AI journey Part 2: Using Ollama as your coding assistant


One of the big use-cases for AI among developers today is coding assistants. A coding assistant basically serves as a source of useful suggestions, a sparring partner, and occasionally something you can hand larger tasks off to while you are taking a lunch break.

Hiring human assistants means you can have them sign work contracts and confidentiality agreements – which they might or might not uphold, I’ll give you that. Using an AI is in some ways very different, but in some ways not. You can of course use a service which promises not to leak your data to anyone, and you can use paid services where this is part of the contract. Nevertheless, you are still trusting a big data company somewhere with your code, either in full or in bits and pieces, and for some people, this isn’t acceptable. And some people like me simply like to run services themselves. But can you do this with AI services? Of course you can!

In my previous article I described how I was running Ollama for generic AI chat a la ChatGPT. In this article, I'm exploring using Ollama as a coding assistant.

I'm not a big IDE user, but one IDE I actually have gotten used to, mostly because we also use it at work, is Microsoft VSCode. We also use AI agents and the like at work, integrated with it, so I know where to start and what to compare with. There are various options for integrating VSCode with Ollama. The most versatile and flexible one seems to be Continue, so I decided to try that one.

Plugins in VSCode are easy to install, but once it was installed, I needed it to talk to my Ollama. And a big fat warning here: there's no authentication layer in Ollama, so I had to open my Ollama pod up for traffic from my internal network. I'm not extremely worried, but I'm likely going to try to improve on that.

Prerequisites: Exposing Ollama

This is pretty easy: I just need a load balancer in front of Ollama, and to let traffic from my internal network through, both in Calico and in my Unifi gateway.

The load balancer:

apiVersion: v1
kind: Service
metadata:
  name: ollama-api
  namespace: ai
  annotations:
    projectcalico.org/ipv6pools: '["loadbalancer-ipv6-pool-internal"]'
    projectcalico.org/ipv4pools: '["loadbalancer-ipv4-pool-internal"]'
    external-dns.alpha.kubernetes.io/hostname: ollama.engen.priv.no
    external-dns/internal: "true"
    external-dns.alpha.kubernetes.io/ttl: "300"
    ipchanger.alpha.kubernetes.io/patch: "true"
    unifi.engen.priv.no/firewall-group: internalweb
spec:
  externalTrafficPolicy: Local
  type: LoadBalancer
  ipFamilyPolicy: PreferDualStack
  ipFamilies:
    - IPv6
    - IPv4
  ports:
    - name: api
      port: 11434
  selector:
    app.kubernetes.io/name: ollama

This also adds it to the internalweb group, so that the correct firewall opening is created, as well as adding it to my internal (Unifi) DNS under the name ollama.engen.priv.no. Once you have these infrastructure components/operators in place in Kubernetes, adding new services is a piece of cake.

For the Calico policy, I need:

apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: allow-traffic-to-ollama-api
  namespace: ai # Change this to your namespace
spec:
  types:
    - Ingress
  ingress:
    - action: Allow
      protocol: TCP
      source:
        nets:
          - 192.168.0.0/16
      destination:
        ports:
          - 11434
    - action: Allow
      protocol: TCP
      source:
        nets:
          - <my ipv6 range>::/56
      destination:
        ports:
          - 11434
  selector: app.kubernetes.io/name == "ollama"

With that done, I’ll be able to access Ollama on http://ollama.engen.priv.no:11434/ from the internal network.
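A quick sanity check from a machine on the internal network is to ask the API which models it knows about. The /api/tags endpoint is part of Ollama's standard REST API:

# List the installed models; getting a JSON response back confirms the service is reachable.
curl http://ollama.engen.priv.no:11434/api/tags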

Installing and configuring Continue

Installing Continue is done like any other plugin in VSCode, through the UI, so I'll not cover that here.
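If you prefer the command line, the VSCode CLI can install it as well. A minimal sketch, assuming the Marketplace extension ID is still Continue.continue:

# Install the Continue extension from the terminal (extension ID assumed; check the Marketplace page if it fails).
code --install-extension Continue.continue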

For configuring, you'll need to create/modify the file ~/.continue/config.yaml. This sets up the models and other configuration, like prompts and context providers. I'll go through mine step by step. Note: this doesn't work perfectly yet; in fact, some of it still acts up in weird ways, but at least it tries to do semi-intelligent things to my code. The autocomplete is also decent.

name: Local Config
version: 1.0.0
schema: v1
prompts:
  - name: repo-safe-agent
    description: "Safe, repo-grounded agent behavior"
    prompt: |
      You are an AI coding assistant working ONLY with the files and context provided from the current VS Code workspace.

      Rules:
      - When referring to an existing file, first read and, if asked, show its REAL current contents from the workspace.
      - If you do not see a file or its content in the provided context, say you cannot access it and ask the user to attach or open it.
      - Prefer using context providers such as code, folder, and codebase to automatically select relevant files.
      - When unsure, ask clarifying questions instead of guessing.
      - Clearly distinguish between EXISTING files and NEW files you propose to add.

      Always follow these rules.

This is an attempt at getting it to do consistent things. Some of it is an attempt to compensate for me using unsuitable models in agent mode, but it stays for now.

The next block is the models and what they should be used for. This is also very much a work in progress, but the most consistent results I have gotten so far are from this set. Note that llama3.1:8b came highly recommended for tool use, even though it's a relatively small model. This is the model I use in agent mode now.

I’ve left in a few models for each use case. These models will also have to be installed in ollama before they can be used, of course.
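Installing a model can be done either through the exposed API or by exec-ing into the pod. Both commands below are sketches: /api/pull is part of Ollama's REST API, while deploy/ollama is just my assumption about what the Deployment in the ai namespace is called.

# Pull a model over HTTP via the Ollama REST API (POST /api/pull).
curl http://ollama.engen.priv.no:11434/api/pull -d '{"model": "llama3.1:8b"}'

# Or run the ollama CLI inside the pod; adjust "deploy/ollama" to your actual Deployment name.
kubectl exec -n ai deploy/ollama -- ollama pull qwen2.5-coder:32b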

models:
  - name: deepseek-coder:33b
    provider: ollama
    model: deepseek-coder:33b
    apiBase: http://ollama.engen.priv.no:11434
    roles:
      - chat
      - edit

  - name: Qwen2.5Coder2.5b
    provider: ollama
    model: qwen2.5-coder:2.5b
    apiBase: http://ollama.engen.priv.no:11434
    roles:
      - autocomplete

  - name: qwen2.5-coder:32b
    provider: ollama
    model: qwen2.5-coder:32b
    apiBase: http://ollama.engen.priv.no:11434
    roles:
      - chat
      - edit

  - name: llama31-8b
    provider: ollama
    model: llama3.1:8b
    apiBase: http://ollama.engen.priv.no:11434
    roles:
      - chat
      - apply
    capabilities:
      - tool_use

  - name: qwen2.5-coder:7b
    provider: ollama
    model: qwen2.5-coder:7b
    apiBase: http://ollama.engen.priv.no:11434
    roles:
      - autocomplete

Last, the context providers. For now, I have these:

context:
  - provider: code
  - provider: diff
  - provider: terminal
  - provider: folder
  - provider: repo-map
  - provider: open
  - provider: tree

The descriptions of these can be found in the Continue Reference guide.

Getting it all to work – or at least trying.

So, what are some good tips for getting good results? This is still a learning project for me, but I do have some advice.

Write and reference documentation!

If you have some documentation you can reference, outlining the conventions you use, examples, and so on, your AI will thank you for it. Or maybe it won't, but at least there's a better chance it will produce better code.

Create some good rules

If you add files in .continue/rules/ with instructions for Continue, you increase the chance of better results. I have tried a few examples. The results are somewhat mixed; there's no magic wand that makes the AI do exactly what you want unless you give some thought to what your AI should know and what it should do.

Here are a few examples I am playing with now:

rules % cat kubernetes.md
---
name: Kubernetes YAML edits
globs: ["**/*.yaml", "**/*.yml"]
alwaysApply: true
---

When editing Kubernetes manifests:

- Output **only** the final YAML, with no natural-language explanation.
- Do not wrap YAML in Markdown fences.
- Do not repeat the file twice.
- Do not include greetings, notes, or signatures.

This was written to counter specific errors I had, although it's likely that switching models has had a greater effect on the result.

You can even tell it which documents it should reference, in a rule:

---
name: Repo docs
alwaysApply: true
---

When answering questions about this repo, prefer using these docs:

- docs/argocd_repo_layout.md
- docs/kubernetes_architecture.md

Reference them explicitly in your reasoning and keep answers consistent with them.

Use correct models

The different models have different strengths, and larger doesn't always mean better. My current agent model is llama3.1:8b, while my chat/edit models are qwen2.5-coder:32b and deepseek-coder:33b. The latter two didn't give nearly as good agent results as the first one, even though the first is much smaller.

Conclusion

It's definitely possible to have a self-hosted coding assistant, but it's not a fire-and-forget thing; it needs some configuration, which will probably need to be updated over time.

Documentation in the repo helps a lot, so running a code agent might also help with doing QA on the documentation… Think of the AI as a junior coder, prone to misinterpreting almost everything you tell it. The clearer you state what you want, and the less room there is for misunderstanding, the greater the chance of good results.

I might revisit this and either write a new article or update this one when I feel I have some groundbreaking results, but for now it has been a nice experiment that I'll likely continue with.

