Local AI vs cloud AI for regulated professionals: a UK decision framework

At a glance
- The question is not which is better. Local AI and cloud AI optimise for different things. The right question is which fits a specific buyer's data sensitivity, workload, and operational tolerance.
- Five decision axes: data sovereignty, capability, latency, 36-month cost, operational burden. Each is a real trade-off, not a marketing point.
- Hybrid is a real third option. A local model handles the majority of workloads with a defined and pre-agreed escalation path to a cloud LLM for tasks beyond local capability. The boundary is the contract, not a configuration setting.
- Local AI suits UK regulated professionals where the default answer to "has client data left the building?" must be no. Cloud AI suits the broader UK SME and enterprise market where data sensitivity allows a standard DPA-based deployment.
- Most buyers do not need to choose forever. The decision is reversible at quarterly review. The right answer in 2026 may not be the right answer in 2028.
What this comparison is, and what it is not
This article is a decision framework for UK regulated professionals evaluating whether to deploy AI locally on their own hardware or through a hosted cloud LLM. It is not a benchmark write-up, a model-versus-model comparison, or a list of features. It is the structured form of the conversation we run during scoping calls, written down so buyers can work through it before a paid engagement begins.
The audience is principals and partners in solicitor practices, IFAs, accountants, family offices, and private clinical practice. The framing assumes that the buyer is commercially literate, time-constrained, and outcome-focused. The buyer has already decided that an AI assistant of some kind will be useful; the question is whether the deployment topology should be on-premises or in the cloud.
Where useful, we distinguish between the two routes The AI Consultancy delivers: Claude Implementation for cloud deployments under standard Data Processing Agreements, and Private AI Concierge for on-premises deployments with optional cloud fallback in hybrid mode.
Definitions
Cloud AI, in this article, means a hosted large language model accessed through a third-party API or web application. Examples relevant to UK practice include Claude.ai (Anthropic), ChatGPT (OpenAI), Microsoft 365 Copilot (which uses OpenAI models behind a Microsoft tenancy), and the Claude API delivered through AWS Bedrock or Google Cloud Vertex AI. In each case the buyer's prompt and any attached data are transmitted to the provider's infrastructure for inference.
Local AI means an open-weight model running on inference hardware the buyer owns and operates, typically on the buyer's premises. The model itself is downloaded once and runs offline. Channels (email, messaging) and integrations (case management, accounting software) connect to the local agent rather than to a hosted service.
Hybrid is a configuration of local AI in which the local model handles routine workloads and a pre-agreed set of harder tasks is routed to a cloud LLM. The routing rules and the data classes that may be sent to cloud are documented in writing as part of the engagement.
The five decision axes
Axis 1: Data sovereignty
The starting question: where does the data go, who is the processor, and what does the regulator or the client need you to be able to say about it?
For most UK businesses, a cloud LLM under a standard DPA is a workable answer. The major hosted providers operate UK or EU data residency configurations, hold relevant certifications (SOC 2 Type II, ISO 27001:2022, ISO 42001:2023 in Anthropic's case), and offer enterprise tiers with no-training-on-customer-data terms. The data still leaves the buyer's network, but the legal apparatus around the transfer is mature.
For UK regulated professionals, the analysis is more demanding. Three points come up repeatedly:
- Privileged matters. Solicitors handling privileged client data face client-confidentiality and SRA Code obligations that do not disappear just because a cloud DPA is technically in order. The conservative reading of the SRA Code in 2026 is that any third-party processor handling privileged data should be considered carefully and disclosed to the client where reasonable.
- Pre-filing IP work. A patent attorney drafting before filing has a hard requirement that no third party can read the draft. Cloud is structurally a poor fit for this even where the DPA is impeccable.
- Regulator-facing audit trails. An IFA running a suitability assessment, a clinician dictating a consultation note, an accountant preparing a director's loan account: in each case the regulator may at some point want a clean answer about who saw the data and where it lived. "On the buyer's hardware, in the buyer's office, on the buyer's network" is a simpler answer than the cloud equivalent, even where both are defensible.
Local AI sits on the right side of this axis by design. There is no third-party processor in the chain; the buyer is the data controller and the only processor. Hybrid mode reintroduces the third-party processor question for a defined and bounded subset of workloads.
Axis 2: Capability
The question buyers ask second, and often the question they should ask first if data sensitivity allows it.
The honest position in mid-2026 is that the gap between hosted frontier models and the best open-weight local models has narrowed considerably but not closed. The places where hosted frontier models still hold a clear advantage:
- Long-context reasoning. Claude Sonnet 4.6 and Opus 4.7 ship a 1 million token context window. Local open-weight models in mid-2026 are typically 32K to 200K tokens depending on the variant. For tasks involving full case bundles, complete contracts, or long transcripts, the cloud advantage is material.
- Complex code generation. Cloud frontier models still produce more reliable code on harder problems. For a buyer whose AI use is principally code-related, this matters.
- The hardest reasoning. Multi-step planning, novel scientific reasoning, and the higher tiers of mathematical and logical work still favour cloud frontier models.
For the workloads that dominate UK professional services practice, the gap is smaller and often not commercially material:
- Drafting client correspondence in the buyer's house style: workable on local models.
- Summarising meetings, attendance notes, and matter updates: workable.
- Structured extraction from documents: workable.
- Classification, triage, and routing: workable.
- Question-and-answer over the buyer's own document set: workable, with the right retrieval setup.
The right way to evaluate this in your specific case is to test the local stack against a held-out set of your own real tasks before committing. We do this in the workflow design sprint of every Private AI Concierge engagement.
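One way to run that test is sketched below, under assumptions: an Ollama server on its default local port, a hypothetical tasks.jsonl file of held-out prompts built from real matters, and a placeholder model name. The keyword check is a first-pass score, not a substitute for principal review.

```python
import json
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "qwen2.5:32b"  # placeholder: whichever open-weight model is under evaluation

def run_local(prompt: str) -> str:
    """Send one held-out task to the local model and return its answer."""
    resp = requests.post(
        OLLAMA_URL,
        json={"model": MODEL, "prompt": prompt, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]

# tasks.jsonl (hypothetical): one {"prompt": ..., "must_contain": [...]} object
# per line, built from real matters, with a keyword check as a first-pass score.
passed = total = 0
with open("tasks.jsonl") as f:
    for line in f:
        task = json.loads(line)
        answer = run_local(task["prompt"])
        total += 1
        passed += all(k.lower() in answer.lower() for k in task["must_contain"])

print(f"{passed}/{total} held-out tasks passed the first-pass check")
```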
Axis 3: Latency
Latency is usually the axis buyers ignore on the way in and notice afterwards. The headline points:
- Local AI is faster on first token. A local model on a Mac mini M4 Pro typically returns the first token in under 200ms. A hosted cloud LLM, particularly a frontier-tier one, often runs 500ms to 2s on first token, sometimes longer at peak load.
- Cloud AI is faster on total throughput for long generations. Hosted frontier models run on infrastructure that is materially more powerful than a Mac mini and finish long outputs faster.
- Network reliability matters more for cloud. A Private AI Concierge deployment continues working through a broadband outage. A cloud-only deployment does not.
The user-experience implication: for short, interactive, conversational use (the dominant pattern in professional services practice), local AI feels noticeably snappier. For long single-pass document generation, cloud is faster. For workflows that mix the two, hybrid mode can route appropriately.
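The local half of that comparison is easy to measure directly. A minimal sketch, again assuming an Ollama server on its default port and a placeholder model name; pointing the same stopwatch at a cloud provider's streaming endpoint gives the comparable cloud figure.

```python
import json
import time
import requests

def time_to_first_token(prompt: str, model: str = "qwen2.5:32b") -> float:
    """Stream a generation from local Ollama and return seconds to first token."""
    start = time.perf_counter()
    with requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": True},
        stream=True,
        timeout=120,
    ) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            if chunk.get("response"):  # first generated token arrives here
                return time.perf_counter() - start
    raise RuntimeError("stream ended before any token was generated")

print(f"first token after {time_to_first_token('Summarise this note: ...') * 1000:.0f} ms")
```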
Axis 4: Total cost over 36 months
The most-misunderstood axis. The instinct is to compare a per-seat cloud subscription to a free open-source local stack and conclude that local is dramatically cheaper. The full comparison, properly modelled, is more nuanced.
For a 5-person UK professional-services practice, an illustrative 36-month total cost of ownership comparison might look as follows. The numbers below are illustrative ranges, not quotes.
| Cost element | Cloud LLM (Claude.ai Team or Enterprise) | Local AI (Private AI Concierge, Practice tier) |
|---|---|---|
| One-off setup | GBP 1,500 to GBP 4,000 (configuration, training) | GBP 4,500 to GBP 6,500 |
| Hardware | None (uses existing devices) | GBP 1,799 to GBP 2,499 (Mac mini M4 Pro) |
| Software licence | GBP 25 to GBP 50 per user per month (typical Team and Enterprise pricing bands) | None (open-weight model) |
| Cloud API consumption (hybrid mode only) | Included in subscription | GBP 30 to GBP 150 per principal per month if hybrid is enabled |
| Retainer or managed service | None typically (provider handles updates) | GBP 500 to GBP 900 per month |
| 36-month total, 5 users, illustrative | GBP 6,000 to GBP 13,000 | GBP 24,000 to GBP 41,000 |
On a pure cost-to-cost basis, cloud wins for most small UK practices. The headline reason is that the setup, hardware, and retainer load on a local deployment is fixed regardless of user count, so at five users it dominates, while cloud cost scales per user. Cloud becomes more expensive than local at higher user counts and longer time horizons, particularly where the buyer would otherwise need an enterprise tier.
The honest framing: do not buy local AI to save money on small-team deployments. Buy local AI because the data sovereignty argument changes the commercial answer. The cost gap is the price of keeping the data on the buyer's network.
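The arithmetic behind the table is worth checking by hand, because it shows where the scaling difference comes from. A short sketch using the table's own illustrative figures (GBP, 5 users, 36 months):

```python
# Reproduce the illustrative 36-month totals from the table above.
# All figures are the table's illustrative ranges in GBP, not quotes.
USERS, MONTHS = 5, 36

def cloud_total(setup: int, per_user_month: int) -> int:
    # Cloud cost scales with user count through the per-seat licence.
    return setup + per_user_month * USERS * MONTHS

def local_total(setup: int, hardware: int, retainer_month: int) -> int:
    # Local cost is fixed: the retainer does not scale with user count.
    return setup + hardware + retainer_month * MONTHS

print("cloud:", cloud_total(1_500, 25), "to", cloud_total(4_000, 50))
# cloud: 6000 to 13000
print("local:", local_total(4_500, 1_799, 500), "to", local_total(6_500, 2_499, 900))
# local: 24299 to 41399 (the table rounds to the nearest thousand)
```

Because only the cloud total multiplies by user count, the comparison flips at larger team sizes and longer horizons, exactly as the prose above describes.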
Axis 5: Operational burden
The axis buyers underestimate most. Cloud LLMs are operated by their providers; software updates, security patches, model upgrades, and capacity management are handled upstream. The buyer's operational burden is essentially zero.
Local AI is the opposite. The agent framework, the inference engine, the model itself, and the host operating system all release on independent schedules. CVE monitoring, regression testing, and skill curation take 4 to 8 hours of skilled work per deployment per month, on average.
This burden has two implications for the buyer:
- Self-managed deployments tend to drift. A buyer who installs a local stack themselves over a weekend will, in most cases, find the system out of date inside the first quarter and quietly stop relying on it. The work of keeping it current is not glamorous and is easily deferred.
- The retainer is the structural answer. Local AI as a managed service, with a monthly retainer that takes responsibility for the operational cycle, is the deployment model that survives contact with the buyer's actual schedule. Local AI without a retainer is a project rather than a system.
For more on the operational pattern, see our companion article on the Hermes Agent and Ollama local AI stack.
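As a concrete picture of what the monthly cycle involves, the sketch below checks installed component versions against a pinned manifest. The manifest contents, component names, and probe commands are illustrative assumptions, not a published schema.

```python
import json
import subprocess

# manifest.json (hypothetical) pins the versions last validated together, e.g.
# {"ollama": "0.6.2", "macos": "15.3"} -- names and versions are illustrative.
MANIFEST = json.load(open("manifest.json"))

# Commands that report each component's installed version.
PROBES = {
    "ollama": ["ollama", "--version"],
    "macos": ["sw_vers", "-productVersion"],
}

for component, pinned in MANIFEST.items():
    installed = subprocess.run(
        PROBES[component], capture_output=True, text=True
    ).stdout.strip()
    if pinned not in installed:
        print(f"DRIFT: {component} pinned {pinned}, found {installed!r}")
```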
Hybrid as the third option, not a compromise
Hybrid mode is often introduced as a halfway house. It is more useful to think of it as a third option with its own commercial logic.
The hybrid configuration runs the local model for routine workloads and routes a defined set of harder tasks to a cloud LLM. The routing rules and the data classes permitted to leave the network are documented in a written hybrid policy that forms part of the engagement record.
Hybrid is appropriate where:
- The buyer's workloads include a small share of tasks that need cloud-frontier capability (long-context analysis, complex code, hardest reasoning) but the majority of work is routine.
- The buyer's data sensitivity is high but not absolute. Data classes that must never leave the network are defined and excluded from the routing policy.
- The marginal capability gain from cloud fallback materially changes the commercial value of the assistant.
Hybrid is not appropriate where:
- The buyer's data sensitivity is absolute (privileged solicitor matters, pre-filing IP, identifiable patient data). For this cohort, local-only is the right answer.
- The buyer is uncomfortable with the operational discipline of maintaining a written hybrid policy and reviewing it regularly. Hybrid mode requires governance commitment.
From a UK GDPR perspective, hybrid mode reintroduces the third-country transfer question for the workloads it routes. It requires a DPIA and the standard analysis around the cloud provider as a US processor. We default the cloud-routing endpoint to AWS Bedrock in the Europe (London) or Europe (Ireland) region for residency reasons and document the routing in writing.
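The written hybrid policy can also be held in machine-readable form, so the router enforces exactly what the document says. A minimal sketch, with hypothetical task types and data-class labels; the names are illustrative, not a fixed schema.

```python
from dataclasses import dataclass, field

# Illustrative policy: which task types may be routed to cloud, and which
# data classes must never leave the network regardless of task type.
CLOUD_ELIGIBLE_TASKS = {"long_context_analysis", "complex_code"}
NEVER_LEAVE_NETWORK = {"privileged_matter", "pre_filing_ip", "patient_identifiable"}

@dataclass
class Task:
    task_type: str
    data_classes: set[str] = field(default_factory=set)

def route(task: Task) -> str:
    """Return 'cloud' only when the written policy permits it; default local."""
    if task.data_classes & NEVER_LEAVE_NETWORK:
        return "local"  # the hard exclusion wins over everything else
    if task.task_type in CLOUD_ELIGIBLE_TASKS:
        return "cloud"
    return "local"

# Example: a long-context task touching a privileged matter stays local.
print(route(Task("long_context_analysis", {"privileged_matter"})))  # -> local
```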
The decision framework
Working through the five axes, the practical decision usually falls out as follows. The matrix below maps from buyer type to recommended deployment.
| Buyer profile | Recommended default | Rationale |
|---|---|---|
| UK SME, ordinary business data, 5 to 100 users | Cloud (Claude or ChatGPT under DPA) | Cost and capability advantages dominate. Data sensitivity allows it. |
| UK enterprise, mixed sensitivity, broad rollout | Cloud, with selective on-prem for specific functions | Cloud handles the bulk; specific high-sensitivity teams run local. |
| Solo solicitor, privileged matter focus | Local-only | Privileged data should not leave the building. Hybrid is not appropriate. |
| Patent or trade-mark attorney, pre-filing work | Local-only | Pre-filing confidentiality is absolute. |
| IFA practice, suitability and review work | Local-only or Hybrid | Client suitability data is fiduciary. Hybrid for anonymised modelling, local for client-specific work. |
| Family office | Local-only | Beneficial ownership, structuring, and intergenerational data is radioactive. |
| Accountancy practice, HNW clients | Hybrid | Mixed sensitivity. Hybrid policy keeps client identifiers local while allowing cloud for anonymised analysis. |
| Private medical or dental practice | Local-only | Identifiable patient data should remain on-site under Caldicott principles. |
| Boutique advisory or corporate finance under heavy NDA | Local-only | Counterparty confidentiality dominates. Cloud is structurally a poor fit. |
This is a starting matrix, not a substitute for the workflow design sprint. The right answer for a specific practice depends on the actual workloads and the actual data classes involved.
Reversibility
One point worth making explicitly: this decision is reversible at quarterly review. The right answer in 2026 is not necessarily the right answer in 2028.
- If open-weight models close the remaining capability gap to cloud frontier, the case for local strengthens further.
- If UK regulators tighten the third-country transfer position, hybrid becomes harder and local-only becomes the default for a wider cohort.
- If a hosted provider releases a UK-sovereign tenancy with provable data residency and processing locality, the case against cloud weakens for some current on-prem holdouts.
- If hardware costs continue to fall, the breakeven point at which local TCO undercuts cloud TCO moves toward smaller team sizes.
The deployment topology is not a permanent commitment. The retainer model used in Private AI Concierge engagements explicitly includes a quarterly architecture review to test whether the original decision still holds.
Where to start
If you have read this far and are unsure whether local, cloud, or hybrid is the right starting point for your practice, the next step is a free 30-minute scoping call. We work through the five axes against your specific workloads and recommend the route that fits.
If your data sensitivity already tells you local is the right answer, the relevant page is Private AI Concierge.
If your data sensitivity allows a cloud route, the relevant page is Claude Implementation.
If your practice is in a sector with an established sector-specific compliance overlay, the supporting articles for solicitors, IFAs and family offices, and private clinical practice walk through the decision in sector terms.
Frequently asked questions
- Is local AI cheaper than cloud AI for a small UK practice?
- Usually not, on a pure cost-to-cost basis, for practices under 10 users. The retainer load on a local deployment is broadly fixed regardless of user count, while cloud cost scales per user. Local becomes cheaper than cloud at higher user counts and longer time horizons. The honest framing is to buy local AI for data sovereignty rather than for cost; the cost gap is the price of keeping data on your network.
- Will an open-weight local model perform as well as Claude or ChatGPT?
- On routine professional-services workloads (drafting, summarisation, classification, structured extraction, question-and-answer over your own documents) the gap is small and often not commercially material. On long-context analysis above 100K tokens, complex code generation, and the hardest reasoning tasks, hosted frontier models still hold a clear advantage. Hybrid mode is the route for buyers who need most workloads handled locally but occasional access to frontier capability.
- What is hybrid mode and how is the boundary controlled?
- Hybrid mode runs the local model for routine workloads and routes a pre-agreed set of harder tasks to a cloud LLM, by default the Claude API through AWS Bedrock in the Europe (London) or Europe (Ireland) region. The routing rules and the data classes permitted to leave the network are documented in a written hybrid policy that forms part of the engagement record. The boundary is the contract, not a configuration setting.
- Is local AI compliant with UK GDPR?
- Local-only mode is structurally easier to defend under UK GDPR because no personal data is transferred to a third-country processor. The buyer is the data controller and the only processor in the chain. Hybrid mode reintroduces the third-country transfer question for the workloads it routes and requires a DPIA covering those data classes. We provide the technical input into the DPIA; the legal sign-off remains with the DPO or solicitor.
- Can the deployment topology be changed later?
- Yes. The decision is reversible at quarterly review. Local-only deployments can move to hybrid if the buyer's workload mix changes; hybrid deployments can revert to local-only if the buyer's data sensitivity tightens. The retainer model in our Private AI Concierge engagements includes a quarterly architecture review specifically to test whether the original deployment decision still holds.