Local AI Agent Architecture with Cline, Ollama, and Model Context Protocol
There is a particular kind of developer frustration that arrives by surprise, usually around the third subscription confirmation email of the week. One tool writes code, another reads your files, another connects to GitHub, another queries your database — and each one wants API credits, a seat licence, or a monthly plan. At least one of them is happily burning tokens in the background while you are still deciding what to type.
At some point, the question is no longer “can this AI assistant help me?” It becomes “can I actually understand what it is doing, what it can reach, and what it will cost me?”
That is what drew me to the combination of Cline + Ollama + MCP. Not because it is perfect — it is not — but because each part has a clear, honest role, and together they produce a local AI coding setup where the pricing story does not require a magnifying glass and a lawyer.
Cline is an open-source AI coding agent for VS Code that connects local LLMs to MCP tool servers, requiring explicit user approval before executing any terminal command or file modification. Ollama is a local model inference server that exposes an OpenAI-compatible REST API at localhost:11434, enabling any tool that speaks the OpenAI API format to run open-weight models on-device. Together, Cline and Ollama with MCP create something that feels less like renting intelligence from a cloud and more like building a capable workbench you actually control.
Cline’s pricing is refreshingly direct: the open-source extension is free, and you only pay for inference if you choose a paid model. No seat fees, no subscriptions, no surprises. (Cline) Pairing it with Ollama means the inference itself can also run locally, on your own hardware. (Ollama) “Local” does not mean “free”, of course — your laptop still pays in heat, battery drain, RAM, and patience. But at least the bill is visible, which is more than can be said for some alternatives.
What is MCP? Model Context Protocol Explained
Model Context Protocol (MCP) is an open standard for connecting AI applications to external systems such as files, databases, APIs, and workflows, enabling models to call structured tools rather than predict from training data alone. The official documentation describes it as a standard port for AI systems, almost like USB but for context. (MCP Docs) If that still sounds abstract, the practical version is simpler: a model on its own only predicts text. A model with tools can actually do things.
Without tools, you can ask a model “how would I check my database schema?” and it will give you a plausible answer based on its training data. With MCP tools connected, it can query your schema directly, inspect the actual result, and explain what it found in your specific database — not a hypothetical one it has seen during training. Once you have experienced that difference, going back to pure chat feels like navigating by memory instead of a map.
The protocol gives AI assistants a standard vocabulary for tool use:
Model = the brain
MCP server = the tool provider
MCP client = the application connecting the model to those tools
Tool call = the model asking for something to be done
MCP tools can expose anything from querying a database and calling an API to searching files or running a small helper script. (MCP Docs) Cline acts as the MCP client — the thing that wires your model to whatever tools you choose to expose.
Why MCP Transforms Local Model Usefulness: From Guessing to Grounded Tool Access
Local models are genuinely attractive: private by default, great for experimentation, and they let you iterate on ideas without watching a token counter climb. The catch is that they arrive knowing nothing about your project.
A local model has no idea what files changed yesterday, what your database schema looks like, or how your internal APIs are structured. It only knows what you paste into the prompt, which makes it feel less like a useful colleague and more like a well-read stranger you have to brief from scratch every single time.
MCP changes this. Instead of asking the model to guess, you give it controlled access to tools that let it actually look: read project files, inspect documentation, query a local SQLite database, search a codebase, or fetch structured project metadata. The model stops sitting outside your project and starts feeling like an assistant at the workbench with you — not omniscient, not magic, but usefully grounded in what is actually there.
Why Cline for MCP: Open-Source VS Code Agent with Explicit Tool Approval Workflow
MCP only matters if you have a client that can actually work with MCP servers, and Cline is currently one of the most practical options. It is an open-source coding agent for VS Code, with Plan/Act modes, terminal-first workflows, and MCP integration built in from the start. (Cline)
Crucially, before Cline executes any terminal command or modifies a file, it asks for your explicit approval — which means you review its actions the way you would review a pull request from a capable but unsupervised junior developer. From a developer’s perspective, this means you can inspect files, suggest edits, run terminal commands, and connect to MCP tools all from a familiar environment — without wiring up your own agent loop from scratch, which is a rather thankless afternoon.
Cline also has an MCP marketplace where you can browse and add available tool servers. (Cline) You can absolutely build your own MCP client later if you want that level of control. But for learning the stack and daily coding, Cline gives you a working playground without the ceremony.
Why Ollama for Local Inference: OpenAI-Compatible REST API at localhost:11434
Ollama is simply the least painful way to run open models locally. Instead of downloading weights manually, configuring an inference server, and writing boilerplate to glue it together, you typically need two commands:
ollama pull qwen3
ollama run qwen3
For general instruction following, qwen3 is a solid starting point. For coding tasks specifically — which is what we are building towards — a coding-focused model will serve you better:
ollama pull qwen2.5-coder:7b # good balance of quality and speed on 16 GB RAM
ollama pull devstral # Mistral's coding-optimised model, strong on multi-file tasks
ollama pull deepseek-coder-v2 # capable on code generation and explanation
For a closer look at running DeepSeek models locally, there is an earlier walkthrough on DeepSeek V3 with Ollama.
Use a coding-focused model as your default in Cline, and drop back to a general model only when you need broader reasoning or a larger context window. Which specific model fits depends on your available RAM and how patient you are prepared to be — a 13B parameter model on a 16 GB laptop will remind you that local models are real workloads, not cloud models hiding inside your machine.
One detail worth knowing: Ollama exposes an OpenAI-compatible REST API at http://localhost:11434. This is why Cline, Continue, and most other AI coding tools can connect to it with minimal configuration — they already speak the OpenAI API format, and Ollama simply answers on that same interface using your local model. No custom integration required.
Cline’s documentation on running models locally suggests 16–32 GB of RAM for small or quantised models, 32–64 GB for mid-size coding models, and 64 GB or more for larger models with bigger context windows. (Cline) One nuance worth flagging for PC users: on Apple Silicon, unified memory is shared between CPU and GPU, so 32 GB works well for mid-size models. On a discrete GPU, the relevant figure is VRAM — if the model spills out of VRAM into standard system RAM, inference speed drops to roughly 1–2 tokens per second, which turns a useful tool into a patience exercise. Ollama recommends at least a 32K token context window for coding tools. (Ollama Docs) Experiment gradually — this is not the kind of setup where you want to discover RAM limits mid-deadline.
Cline + Ollama + MCP Architecture: Model, Inference, and Tool Layer Explained
The mental model is straightforward once you see it:
VS Code
|
v
Cline
|
+--> Ollama
| |
| v
| Local open-source model
|
+--> MCP servers
|
+--> files
+--> APIs
+--> databases
+--> search tools
+--> custom scripts
Cline is the workspace. Ollama runs the model. MCP servers expose the tools. The model reasons over the task and requests tool calls as needed — and Cline handles the back and forth, including showing you what the model wants to do before it does it, which turns out to be rather important.
The practical difference this makes: instead of a chatbot guessing why your Flask route is failing based on the error text you pasted, a Cline + MCP setup can inspect the actual project files, trace the route definition, compare it against the template, and suggest a fix rooted in what is actually there. The difference is context. In programming, context is usually everything.
Step-by-Step MCP Setup: Installing Ollama, Cline, and Registering MCP Tool Servers
1. Install Ollama
Install Ollama from ollama.com and confirm it is running.
Then pull a model — for coding work, start here:
ollama pull qwen2.5-coder:7b
You can test it directly before involving Cline:
ollama run qwen2.5-coder:7b
You can skip running qwen2.5-coder for this tutorial, though, since we do not need it for our MCP example to work.
2. Install Cline
Install the Cline extension in VS Code, open its settings panel, and set the API provider to Ollama. Point it at the model name and increase the context window — Ollama recommends at least 32K tokens for coding tools. (Ollama Docs)
I use PyCharm, and Cline is easy to install via Plugins in Settings.
3. Enable local-model-friendly settings
Local models tend to have smaller effective context windows and less reliable instruction-following than frontier cloud models. Cline’s documentation specifically recommends enabling Use Compact Prompt for local setups. (Cline) It is one of those small settings that makes the experience noticeably less frustrating — the kind of thing you only discover after an afternoon of wondering why the model keeps losing track of what it was doing.
4. Add MCP tools
MCP servers in Cline are configured through the MCP panel in the sidebar, which writes to cline_mcp_settings.json in your VS Code settings directory. A typical entry looks like this:
{
"mcpServers": {
"my_local_tool": {
"type": "stdio",
"command": "python",
"args": ["/Users/elena/git/mcp_semantic_search/code/local_ai_mcp_fastapi/mcp_server.py"]
}
}
}
A quick note for JetBrains & VSCodium users:
Cline is a VS Code (and compatible) extension, which means cline_mcp_settings.json is only created where Cline is installed — not in PyCharm’s config tree. For MCP in the JetBrains world, you’ll use whatever that specific IDE/plugin supports, or simply run Claude Code / Cursor / VS Code+Cline alongside PyCharm.
- VS Code: Open Cline → MCP (server icon) → Configure → Configure MCP Servers.
- PyCharm: Look in
~/.cline/data/settings/cline_mcp_settings.json. - VSCodium, Cursor, or Insiders: Replace
Codewith the matching folder (e.g.VSCodium,Cursor,Code - Insiders) and search underglobalStoragefor a cline-related extension ID if the path differs.
The type field tells Cline how to communicate with the server. Three transports show up in practice. stdio — Cline launches the MCP server as a child process and talks over stdin/stdout: ideal for small Python or Node scripts, no open ports, no separate daemon. Streamable HTTP — the direction of the current MCP spec for network servers: one HTTP endpoint, stateless or sessioned, easy to put behind TLS and auth; Claude Code can register these with claude mcp add --transport http …. SSE (older HTTP + Server-Sent Events pattern) is still common in examples and bridges; some desktop clients expect stdio first, so tools like mcp-remote wrap an HTTP/SSE URL as a local stdio shim. For a quick single-file tool, stdio remains the lowest-friction default; for a shared “always on” tool service, FastAPI + Streamable HTTP is a better match. (Ollama Docs)
The key design idea is that Cline does not need every capability built in — it connects to tools through MCP, which makes the whole system modular. Start with one tool. Add another. Eventually write your own.
MCP Tool Implementation: Semantic File Search with Ollama nomic-embed-text and Cosine Similarity
You can find all code in the repository https://github.com/edaehn/local_ai_mcp_fastapi, I appreciate your stars if you like this tutorial. Thank you so much for your support :)
Most developers accumulate documentation over time — architecture decision records, internal guides, personal engineering notes, README files scattered across projects. Finding the right document from memory is fine at ten files. At a hundred, it quietly taxes your attention every time you need to look something up.
This MCP tool solves that problem entirely locally. Given a plain-language query, it finds the text and Markdown files in a directory most semantically similar to what you are looking for — not by keyword matching, but by meaning. Ask “caching strategies for expensive database queries” and it surfaces the relevant document even if the file never uses those exact words.
The implementation uses Ollama’s embedding API with nomic-embed-text, a small model (274 MB) purpose-built for semantic search. Since you are already running Ollama for inference, no new infrastructure is needed. Pull the model once:
ollama pull nomic-embed-text
The model name is a single top-level constant (EMBED_MODEL = "nomic-embed-text") in semantic_core.py — swapping to a different embedding model is a one-line change if you want to experiment.
Then the tool embeds both files and the query and ranks by cosine similarity — all offline, on your machine.
Runnable project in this repository
Source, tests, and setup live under:
mcp_semantic_search/code/local_ai_mcp_fastapi/
| Piece | Role |
|---|---|
semantic_core.py |
Embeddings via Ollama nomic-embed-text, cosine similarity, cache keyed by path + mtime. File types scanned: .md, .txt, .rst, .adoc — all others are silently skipped |
mcp_tools.py |
Registers find_similar_files on a FastMCP instance — shared by stdio and HTTP so the two servers never drift apart |
mcp_server.py |
Stdio MCP: mcp.run(); no port. Best default for Cline and native Claude Desktop command/args entries |
main.py |
FastAPI: Streamable HTTP MCP on /mcp, legacy SSE MCP under /mcp-sse, plus GET /health and POST /api/search |
README.md |
Setup, when to use stdio vs HTTP, Cline / Claude Desktop / Claude Code snippets, troubleshooting |
One cache detail worth knowing: embeddings are stored under the key {absolute_path}::{mtime}. A file that is touched — same content, newer modification time — will be re-embedded on the next run. In practice this is rarely disruptive, but it explains any unexpected Ollama calls against an apparently unchanged directory.
Install and run (after ollama pull nomic-embed-text once):
cd mcp_semantic_search/code/local_ai_mcp_fastapi
python3 -m venv .venv && source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -r requirements.txt
uvicorn main:app --host 127.0.0.1 --port 8765
The POST /api/search endpoint lets you test the search logic without a running MCP client:
curl -s -X POST http://127.0.0.1:8765/api/search \
-H "Content-Type: application/json" \
-d '{"query": "ollama", "directory": "/Users/elena/git/blog/_posts", "top_n": 3}'
You will get ranked output like this:
[{"file":"2025-01-28-deepseek-with-ollama.md","score":0.448,"excerpt":"---"},{"file":"2026-06-06-local-ai-lab-setup-ollama-open-webui-rag.md","score":0.4408,"excerpt":"---"},{"file":"2026-09-22-a-minimal-langgraph-project-in-docker.md","score":0.4403,"excerpt":"---"}]%
Run the unit tests (no live Ollama; embeddings are mocked):
pytest tests/ -v
pytest tests/ -v
===================================================================================== test session starts =====================================================================================
platform darwin -- Python 3.13.5, pytest-9.0.1, pluggy-1.6.0 -- /usr/local/bin/python3
cachedir: .pytest_cache
rootdir: /Users/elena/git/mcp_semantic_search/code/local_ai_mcp_fastapi
configfile: pytest.ini
plugins: anyio-4.11.0, cov-7.1.0
collected 10 items
tests/test_semantic_core.py::test_cosine_similarity_identical_vectors PASSED [ 10%]
tests/test_semantic_core.py::test_cosine_similarity_orthogonal_vectors PASSED [ 20%]
tests/test_semantic_core.py::test_cosine_similarity_zero_vector_returns_zero PASSED [ 30%]
tests/test_semantic_core.py::test_excerpt_skips_markdown_headings PASSED [ 40%]
tests/test_semantic_core.py::test_load_cache_returns_empty_for_corrupt_file PASSED [ 50%]
tests/test_semantic_core.py::test_embed_returns_cached_value_without_calling_ollama PASSED [ 60%]
tests/test_semantic_core.py::test_embed_calls_ollama_on_cache_miss_and_stores_result PASSED [ 70%]
tests/test_semantic_core.py::test_embed_truncates_long_text_before_ollama PASSED [ 80%]
tests/test_semantic_core.py::test_find_similar_files_raises_for_invalid_directory PASSED [ 90%]
tests/test_semantic_core.py::test_find_similar_files_ranks_by_semantic_similarity PASSED [100%]
===================================================================================== 10 passed in 0.05s ======================================================================================
Stdio MCP: mcp_server.py (why it is still the right default)
The stdio transport is the shape MCP had first: the client (Cline, Claude Desktop, Claude Code in stdio mode) starts your server as a subprocess and sends JSON-RPC over stdin/stdout. Nothing listens on a TCP port; you do not leave a daemon running when the editor is closed. That is educationally useful because it strips the problem down to “one Python file + one tool” — the same mental model as the JSON block in the Cline MCP panel (command / args).
It is also practically useful for daily work with Cline, which treats stdio MCP as the well-lit path: paste a venv python path and the absolute path to mcp_server.py, reload MCP, and you are done. Claude Desktop is the same story — claude_desktop_config.json wants command + args, not a raw http://localhost URL (remote connectors are a different flow). For a personal laptop setup, stdio often wins on simplicity and isolation: each session gets a fresh process; when you change the script, the next spawn picks it up.
The repo ships mcp_server.py next to main.py. Both import mcp_tools.py, so the HTTP server and the stdio server always advertise the same tool (find_similar_files) with the same behaviour.
Run (under a real client — running it bare in a terminal will wait on stdin, which looks like a hang):
cd mcp_semantic_search/code/local_ai_mcp_fastapi
source .venv/bin/activate
python mcp_server.py
Cline — type: stdio, absolute paths to the venv interpreter and script:
{
"mcpServers": {
"local-doc-search": {
"type": "stdio",
"command": "/ABS/PATH/TO/local_ai_mcp_fastapi/.venv/bin/python",
"args": ["/ABS/PATH/TO/local_ai_mcp_fastapi/mcp_server.py"]
}
}
}
Claude Desktop — the same command / args pair inside mcpServers in ~/Library/Application Support/Claude/claude_desktop_config.json (macOS path shown; see Anthropic docs for Windows).
Claude Code — stdio without uvicorn:
claude mcp add --transport stdio local-docs -- /ABS/PATH/TO/local_ai_mcp_fastapi/.venv/bin/python /ABS/PATH/TO/local_ai_mcp_fastapi/mcp_server.py
You will get the .claude.json modified:
Added stdio MCP server local-docs with command: /Users/elena/git/mcp_semantic_search/code/local_ai_mcp_fastapi/.venv/bin/python /Users/elena/git/mcp_semantic_search/code/local_ai_mcp_fastapi/mcp_server.py to local config
File modified: /Users/elena/.claude.json [project: /Users/elena/git/mcp_semantic_search]
With /mcp list, you can see your MCP servers:
Next, you can simply run prompts from command line with claude:
claude "Call find_similar_files on directory /Users/elena/git/blog/_posts with query Ollama , top_n 2. Summarise the top 2 files in one sentence each."
Honestly, with this prompt, I have received a poor result:
Here are the top 2 files most semantically similar to "Ollama" under _posts:
1. genai.md (score 0.596) — A tag landing page for "Generative AI" that defines generative AI and lists its key features (content creation, pattern recognition,
customization, automation, and continuous learning).
2. os.md (score 0.572) — A tag landing page for "OS (Operating System)" that collects posts tagged around operating systems and the command line.
Note: both top matches are tag index pages (mostly frontmatter), not deep articles — likely because no individual post centers explicitly on Ollama, so the
embedder surfaced the closest thematic tag pages.
This happened because I had mixed files, for posts and topic tags. So I had to adjust my prompt as follows:
claude “Call find_similar_files with exactly directory = /Users/elena/git/blog/_posts query = Ollama top_n = 2 After find_similar_files, ignore files whose name is only *.md with a single slug like genai.md / os.md, or ignore layout: tag_paginated if you read the file — then re-rank from the remaining hits.”
I was thrilled when I saw what happened next. Claude actually sorted out what is happening and recovered the right results from the pool of mixed files while advising on what to do next:
Claude Code v2.1.165 with Opus 4.8 (1M context) using my MCP server and advises on a solution
When you do want a long-lived process, /health, or HTTP-native clients, use uvicorn main:app as below — not because stdio is worse, but because different transports solve different deployment shapes.
Claude Code — Streamable HTTP MCP
Claude Code can register an MCP server by URL. Two structural details about main.py worth knowing before you wire this up: the FastMCP instance is created with stateless_http=True, meaning each HTTP request is self-contained and no per-session state is held server-side; and the FastAPI lifespan context manager calls mcp.session_manager.run(). That lifespan call is not optional — remove it and the Streamable HTTP transport initialises silently but refuses requests.
With uvicorn running:
claude mcp add --transport http local-docs http://127.0.0.1:8765/mcp
claude mcp list
In a session, ask for something that forces a tool call — for example, to run find_similar_files on a trusted directory of Markdown notes with query “authentication” and top_n 5. The tool can read any path the server process can read, so treat directory like a filesystem permission you hand to a script.
Remove the entry when you are done:
claude mcp remove local-docs
Claude Desktop — bridge SSE with mcp-remote
Claude Desktop’s claude_desktop_config.json is built around stdio MCP servers. For this FastAPI app’s SSE mount, spawn mcp-remote so Desktop talks stdio to Node, and Node talks HTTP to your app (macOS config path: ~/Library/Application Support/Claude/claude_desktop_config.json):
{
"mcpServers": {
"local-docs-sse": {
"command": "npx",
"args": ["-y", "mcp-remote", "http://127.0.0.1:8765/mcp-sse/sse"]
}
}
}
Fully quit and reopen Claude Desktop after saving. For hosted HTTPS MCP with OAuth, Anthropic’s flow is Settings → Connectors (“custom connector”); see their remote MCP overview.
With find_similar_files available as an MCP tool — stdio via mcp_server.py in Cline or Desktop, HTTP via uvicorn in Claude Code, or bridged SSE in Desktop — you can ask things like:
“Find the most relevant docs to how we handle authentication in this project, then tell me what is missing.”
The model calls the tool and receives a ranked JSON list:
[
{"file": "auth-overview.md", "score": 0.9142, "excerpt": "Token-based authentication flow for the API"},
{"file": "security-notes.md", "score": 0.8703, "excerpt": "Rate limiting and session invalidation strategy"}
]
The excerpt field is the first non-empty, non-heading line of each file, capped at 120 characters — enough for the model to decide whether to open the full file. The model reads the relevant files and gives you a grounded answer. This is the combination that makes local AI agents genuinely useful — not a smarter model, but a model with honest access to what is actually in your project.
MCP Server Discovery: Glama Registry, MCP.so Directory, and Cline Marketplace
The companion code under mcp_semantic_search/code/local_ai_mcp_fastapi/ is deliberately small: semantic_core.py holds the embedding logic, mcp_tools.py registers the MCP tool once, and mcp_server.py versus main.py differ only in transport (stdio versus FastAPI + HTTP). That separation is the shape you want before you publish anything — clients discover capabilities from tool names and descriptions, so keeping the core testable and the transport thin pays off quickly.
Most people do not author their first MCP server from a blank page; they compose what already exists. Useful layers to know:
Specification and reference implementations. The protocol is documented in the official MCP documentation. Anthropic’s modelcontextprotocol/servers repository on GitHub collects reference servers (filesystem, Git, fetch, and many others) — the right place to copy idioms for tool schemas, packaging, and how much responsibility belongs in a single server. (GitHub)
IDE marketplaces. Cline exposes an MCP marketplace inside VS Code so you can add servers without spelunking npm first. (Cline) Other assistants (Cursor, Continue, JetBrains-style setups, and so on) follow the same idea: curated or templated entries, then a JSON block that points at command / args, an HTTP URL, or a hosted connector.
Independent directories and inspectors. Two large, community-facing indexes are worth bookmarking: Glama’s MCP registry, which lists servers, offers browser-based inspection, and documents maintainer flows for indexing and hosting; and MCP.so, a broad third-party directory of servers and clients. Use both as discovery aids, not guarantees — they help you shortlist integrations, but you should still read manifests, source, and permissions before you grant access to files, shells, or APIs. (Glama) (MCP.so)
Sharing your own server. “Submitting” really means making something trustworthy to install: a public Git repository, a README with reproducible steps (uv, pip, or npx), honest tool descriptions, and a clear story about data leaving the machine (or not). Many authors publish to npm or PyPI under their own namespace and link the package from a directory when they want visibility. Registries such as Glama index GitHub repositories and describe maintainer metadata (for example, adding a glama.json at the repo root — see Glama’s own “add server” documentation for the current schema). MCP.so runs a community submission flow (typically via their GitHub issues or an on-site submit form — check their FAQ for the exact process, because it can change). None of that removes your obligation to review third-party code before you run it.
The through-line is the same as the rest of this post: directories and marketplaces reduce search cost; they do not remove security judgement. Treat every new MCP server like a new dependency with full access to whatever credentials and paths you give the host process.
MCP Server Security Review: Local Lab Use vs. Public Directory Listing Requirements
Is this code secure? For learning on your own machine, it is reasonable if you use it deliberately: install dependencies in a venv, point the tool’s directory argument only at folders you are happy for Python to read, and keep uvicorn on localhost (or behind proper HTTPS + auth + rate limits before anyone suggests putting it on the public internet).
It is not hardened like a bank or a SaaS product. In plain terms:
find_similar_filesis a strong read primitive — it walks a tree of text files under whatever path the model or user supplies. That is the point for local documentation; it is also why you must not treat “local” as “harmless.”- HTTP mode opens a port on your machine. Even
127.0.0.1can be reached by other software on the same computer, not only by you. - Embeddings go through Ollama — still on your laptop if Ollama is local, but it is a second program you are trusting with file contents.
There is no hidden malware in the “read files + call Ollama” design itself; the risk is misuse and deployment — exactly what you should document in a real project’s README or SECURITY.md.
Can you submit this MCP server to a directory (Glama, MCP.so, …)? You can apply, but directories usually want a standalone repository (or a published package) that looks maintained and scoped — not only a subfolder inside a personal blog repo. This layout is ideal for teaching; as a public listing, it is incomplete until you split it out, add a clear license, and spell out who should run it and what paths it may read.
Reviewers may also flag the tool honestly: “recursive read of user-supplied directory” is high impact. That does not make it evil — it makes it a sharp tool. Listings are discovery, not a certificate of safety; always read someone else’s server the same way you would read yours before a coursework demo goes near real data.
One-line takeaway: use this repo to learn MCP structure and transports; treat directories as phone books; upgrade packaging and constraints before you invite the world to install your server.
The MCP security threat model applies whether you are installing a server from a directory or shipping a script of your own — context changes, the security questions do not.
MCP Security Model: Prompt Injection and Tool Poisoning Attack Vectors in Local Agent Setups
There is a tempting thought that runs something like: “if it runs locally, it must be safe.” Not quite.
Local models do improve privacy, because your prompts and files do not automatically leave your machine. But the security boundary shifts rather than disappears. MCP tools can read files, call commands, access APIs, and interact with external services. So the question changes from “who receives my prompt?” to “what can this tool actually do?” — which is a rather more consequential question, and one worth thinking through before you start connecting third-party MCP servers from a public marketplace or directory, however well indexed it looks.
Prompt injection and tool poisoning are the two primary MCP security attack classes that affect local agent setups regardless of where inference runs. Two specific attack classes are worth understanding. (Tom’s Hardware)
Prompt injection is a MCP security vulnerability where a malicious tool response embeds text designed to redirect the model’s behaviour. A tool that returns search results could include hidden instructions telling the model to exfiltrate data or take an unintended action. The model may comply because it cannot reliably distinguish tool output from legitimate user intent — it sees all of it as context, which is precisely what makes this attack effective.
Tool poisoning is where a malicious or compromised MCP server advertises capabilities that look benign but do something harmful when invoked. Because the model sees tool descriptions rather than source code, it cannot verify that read_schema() actually does only what the name suggests.
Both are realistic threats, and both get more dangerous the more permissive your tool setup is. My own rule is straightforward: give the model tools the way you would give permissions to a junior developer you trust but still review. In practice, that means starting with read-only tools, avoiding broad shell access early on, preferring narrow tools with clearly bounded inputs, keeping secrets out of tool responses, reviewing file changes before accepting them, and not connecting production systems while you are still experimenting. Bridges need guardrails.
Local LLM Task Boundaries: Supported Developer Workflows vs. Frontier API Requirements
Local open-source models are already genuinely useful for a wide range of developer tasks: explaining code, generating boilerplate, writing tests, refactoring small functions, summarising files, drafting documentation, exploring unfamiliar libraries, and catching common errors. With MCP tools providing real project context, they become better still at grounded tasks where inspecting the actual codebase matters rather than reasoning from training data alone.
Where they still struggle is at the edges — very large codebases, long multi-step planning, subtle architecture decisions, complex debugging, and reliable tool use on limited hardware. This is not a failure; it is the current trade-off, and it is worth being honest about rather than pretending local models are interchangeable with frontier cloud models. They are not, and treating them as such is a reliable route to disappointment.
Cline does not lock you into one path. You can use local inference through Ollama, bring your own API key, or switch to paid inference when the task genuinely demands it. (Cline) The open-source version has no vendor lock-in — you pay only for the inference you choose to use.
Hybrid Inference Workflow: Local LLMs for Routine Tasks, Frontier APIs for Complex Reasoning
The most realistic workflow is not “use local models for everything” — that way lies frustration and slow afternoons. It is “use local models where they are good enough, and save paid inference for where it genuinely earns its keep.”
In practice, local models handle first drafts, small edits, documentation, simple tests, reading local files, checking project structure, and repetitive tasks quite comfortably. Cloud frontier models earn their keep on hard debugging, architecture design, security-sensitive reviews, complex refactoring across large unfamiliar codebases, and any task where the cost of getting it wrong is high. Think of local models as the daily workbench and paid models as the specialist consultant you bring in for the tricky jobs. That feels considerably healthier than routing every trivial task through a premium API — and kinder to the monthly bill.
Developer Autonomy with MCP: Controlling Model Selection, Tool Access, and Data Boundaries
The biggest benefit of this setup is not cost. It is control.
With Cline + Ollama + MCP, you can ask questions that most AI tool setups simply do not let you ask: which model do I want here? Which tools should it be able to reach? Which data stays local? Which actions require my explicit review before they happen? Which workflows can I automate safely, and which ones can’t I? This is where AI development starts to feel more like engineering and less like pressing buttons and hoping for the best.
The early chatbot era was about prompts. The next stage is about systems — and a useful AI coding setup is not just one model, but a small deliberate ecosystem:
model + tools + context + permissions + review
MCP provides the standard layer for connecting tools. Ollama provides a practical way to run open models without cloud dependency. Cline provides the editor workflow that makes all of it usable day to day.
Getting Started: A Minimal MCP Experiment with One Tool and One Local Ollama Model
If you want to experience the difference without building something complicated, start genuinely small. Get Cline running with Ollama and one local model, then point it at a project you know well. Try prompts like:
Explain the structure of this project.
Find the main entry point and summarise how the app starts.
Look for duplicated logic and suggest one safe refactor.
Write a README section explaining how to run this app locally.
Then add exactly one MCP tool — a file search, a local documentation lookup, a database schema reader, or a simple project metadata tool. Not ten. One. The goal is to feel the difference between a model that talks about your project and one that can actually inspect it. That difference seems modest at first. Then, once you have used it a few times, it becomes difficult to go back.
Why Cline + Ollama + MCP is the Right Foundation for Local AI Agent Development
Cline + Ollama + MCP works as a combination because each part has a clear, honest role and none of them tries to do everything. Cline is the interface. Ollama is the inference layer. MCP is the connection standard that makes tools composable. The result is not a magic developer replacement — which is fine, because magic in software tends to mean “something you cannot debug at 11pm”.
What I want from an AI coding setup is tools I can understand: tools that tell me what runs locally, what costs money, what can touch my files, and what needs my review before it changes anything important. This combination gets closer to that than most.
It brings AI coding back towards something like the Unix spirit — small tools, clear boundaries, composable workflows. Only this time, one of the components happens to be a language model. Used carefully, that is a genuinely powerful thing to have on the workbench.
Open-Source Cline Alternatives with MCP Support: Continue, Kilo Code, OpenHands, OpenCode, Zed
Cline is not the only open-source tool in this space. I use it here because it gives a clear, practical path: VS Code, local models through Ollama, MCP support, and a very understandable pricing story. But it is worth knowing the alternatives, especially as MCP becomes a shared layer across many AI coding tools.
| Tool | Type | MCP support | Local model support | Best fit |
|---|---|---|---|---|
| Continue | VS Code / JetBrains assistant | Yes | Yes, supports many model providers | Teams who want configurable AI assistance, autocomplete, chat, and repo-level workflows |
| Kilo Code | VS Code / JetBrains / CLI agent | Yes | Yes, via supported providers | Developers who want a broader agentic platform with IDE and CLI options |
| OpenCode | Terminal-first coding agent | Yes | Yes, bring-your-own-provider style | Developers who prefer terminal workflows and portable configuration |
| OpenHands | Autonomous software agent platform | Yes, through its agent/tool system | Yes, bring your own LLM | Larger autonomous coding tasks, sandboxed experiments, and self-hosted agent workflows |
| Zed | Open-source code editor | Yes | Depends on configured model/provider | Developers who want a fast editor with built-in agentic workflows |
Continue is a different kind of tool: less “single autonomous coding agent” and more a configurable AI assistant for coding workflows, with broad model-provider support that makes it useful for teams. (Continue Docs) Kilo Code presents itself as an agentic engineering platform across VS Code, JetBrains, CLI, and cloud, with a dedicated MCP section in its documentation. (GitHub)
OpenCode is interesting if you prefer to stay in the terminal — an open-source coding agent with support for both local and remote MCP servers. (OpenCode) OpenHands is considerably more ambitious: closer to an autonomous software development agent than an editor assistant, designed for running agents locally or at scale with MCP integration throughout. (GitHub) Zed is different again — a full editor rather than a VS Code extension, with MCP support for extending its AI agent with external tools and integrations. (Zed)
One caution
“Supports MCP” does not mean the same thing across every tool. In some, MCP is central to the workflow; in others, it is available but still evolving. Before building a serious workflow around any assistant, check how it handles permissions, tool approval, local model configuration, and project-level MCP settings. The table above is a starting point, not a guarantee.
References
Enjoyed this? Get more like it.
Weekly notes on AI tools, Python, and what I'm actually building — plus a free copy of Fantastic AI: The 2026 Toolkit.