
Private AI Concierge for confidentiality-sensitive UK professionals

Private AI Concierge is an on-premises AI assistant for UK professionals whose client data cannot be routed through public cloud large language models. A Mac mini or Mac Studio with Apple Silicon sits on your premises, runs an open-source agent stack with local model inference, and operates as a persistent personal AI across email, messaging, and your existing tools. The system is designed to give regulated professionals, including solicitors, IP and patent attorneys, IFAs, accountants, family-office principals, and private medical and dental practices, the productivity benefits of an always-on AI assistant without sending sensitive client information off your network. Engagements start at GBP 2,500 plus hardware. The cloud-fallback hybrid option, which uses the Claude API for tasks beyond local-model capability, is enabled only at your written instruction.

Why a local-first AI assistant exists as a distinct service

Most AI consulting work in the UK assumes that client data can be sent to a cloud LLM under a Data Processing Agreement. For most businesses, that assumption is correct. For a specific cohort of regulated and confidentiality-sensitive professionals, it is not.

A solicitor advising on a contested probate matter, an IFA reviewing a client's full balance sheet, an accountant preparing a director's loan account, a patent attorney drafting before filing, a private clinician dictating a consultation note, a family office tracking beneficial ownership: in each case the data is either privileged, fiduciary, regulated under specific professional codes, or commercially radioactive. Even where a cloud DPA is technically permissible, the professional-conduct overhead, the client-consent paperwork, and the procurement risk often make the cloud route uncommercial.

Private AI Concierge solves the same productivity problem with a different deployment topology. The model runs on hardware you own, on your own network, in your own building. The system can still operate as a multi-channel agent across email, Telegram, Slack, Signal, and your existing case or matter management tools. It learns over time and accumulates persistent memory of your preferences, templates, and workflows. The behavioural experience for the user is broadly comparable to a cloud assistant. The compliance posture is materially different.

What is included in a Private AI Concierge engagement

Every engagement is structured as a one-off implementation followed by a monthly retainer. The retainer is where the long-term commercial value sits, and it is the core argument for ongoing custodianship rather than a self-installed product. The underlying agent and model components release at a high cadence, currently roughly weekly, and CVE response, model upgrades, and skill curation cannot reasonably be left to the client.

One-off implementation includes

  • Discovery and workflow design covering your document types, client communication patterns, and existing tool stack
  • Hardware procurement and delivery, passed through at supplier cost and itemised separately on the proposal
  • Hermes Agent (Nous Research, MIT licence) installation with Apple Silicon optimisation
  • Ollama installation and local model selection from the Qwen 3.5 family for tool-calling reliability
  • Channel setup across the platforms you use, typically email, Telegram or Signal, Slack, and SMS gateway where relevant
  • MCP server wiring for your existing case management, practice management, or accounting software where supported
  • Security hardening including FileVault, automatic unlock for headless operation, network isolation, and SSH access controls
  • Client-specific skill creation: the named workflows your assistant runs repeatedly, configured and tested
  • Written documentation covering operation, recovery, and the security posture
  • Two handover and training sessions for the principal and one named technical contact
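
As an illustration of the "named skill" concept above, each skill can be thought of as a small, declarative configuration object that states what triggers it, what it may read, and whether it may ever escalate to cloud. The sketch below is hypothetical, assuming a Python-based agent stack; the class, field names, and example skill are illustrative, not the actual configuration format.

```python
from dataclasses import dataclass

@dataclass
class Skill:
    """One named workflow the assistant runs repeatedly (illustrative only)."""
    name: str
    trigger_channels: list   # e.g. ["email", "signal"]
    data_sources: list       # systems the skill may read from
    output_template: str     # the document or message it produces
    cloud_permitted: bool = False  # local-only by default; opt-in per skill

# Hypothetical example: an intake-summary skill for a probate practice.
intake_summary = Skill(
    name="new-matter-intake-summary",
    trigger_channels=["email"],
    data_sources=["case_management"],
    output_template="intake_summary_v1",
)

# The default posture keeps every skill local unless explicitly changed:
assert intake_summary.cloud_permitted is False
```

Keeping skills declarative like this is what makes the retainer-cycle skill curation auditable: each change to a skill is a visible change to a named object, not an opaque prompt tweak.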

Monthly retainer includes

  • Hermes Agent patching against the upstream release cadence, currently approximately weekly
  • CVE monitoring and response across the agent, model, and OS layers
  • Model upgrades as new releases land in the Qwen, Llama, and Hermes families, with regression testing before swap-in
  • Skill curation and refinement based on your usage patterns
  • Monthly usage review with one written summary
  • Incident response cover within UK business hours
  • Quarterly architecture review covering hardware utilisation, channel load, and any drift between local capability and workload demand

How a Private AI Concierge engagement runs

Engagements run in four stages. We begin with a free 30-minute scoping conversation. The first paid commitment comes only at Stage 2.

Stage 1: Discovery (no charge, 30 minutes)

A scoping call covering use cases, channel preferences, regulatory constraints, and tier fit. Outputs are a written summary and a tier recommendation. We move to a paid engagement only when both sides see a credible fit.

Stage 2: Workflow design (1 to 2 weeks)

A paid discovery sprint that produces a written workflow design document. We define the named skills the assistant will run, the channels it will operate across, the data sources it will read, and the security boundary. This document is the contract for Stage 3.

Stage 3: Implementation (2 to 4 weeks)

Hardware procurement, install, configuration, integration, and on-site handover. Includes user-acceptance testing against your real workflows, not synthetic test cases.

Stage 4: Retainer (ongoing)

Monthly patching, CVE response, model upgrades, skill refinement, and incident cover. The retainer is the relationship.

Pricing

Pricing is published at tier-band level. Final scope and quote follow the workflow design sprint in Stage 2. Hardware is passed through at supplier cost; the configuration and installation fee is itemised separately on the proposal.

Tier | One-off (ex hardware) | Monthly retainer | Target buyer
Solo | From GBP 2,500 | From GBP 250 | Sole practitioners, senior solo consultants, family-office principals
Practice | From GBP 4,500 | From GBP 500 | 2 to 10-person professional services firms, private clinics
Chambers | From GBP 8,000 | From GBP 900 | Larger regulated firms, multi-partner practices, multi-site operations

Indicative hardware cost: GBP 1,799 to GBP 2,499 for the Solo and Practice tiers (Mac mini with M4 Pro silicon, 48 to 64GB unified memory), and GBP 2,099 to GBP 4,000+ for the Chambers tier (Mac Studio with M4 Max silicon, 64 to 128GB unified memory), depending on memory and storage configuration. Hardware is purchased on your behalf and invoiced at supplier cost. We do not earn margin on the device itself; a separate "Hardware procurement and configuration" line at GBP 250 to GBP 400 is itemised on the proposal.

Local-only versus Claude-fallback hybrid

The system operates in one of two modes. The mode is set during workflow design and can be changed later only at your written instruction. Commercially and in regulatory terms, the two modes are different products.

Local-only mode

All inference happens on your on-premises hardware. No client data leaves your network at any point during normal operation. The model is constrained to the capability of locally-runnable open-weight models, principally the Qwen 3.5 family.

Recommended for: solicitors handling privileged matters, patent attorneys before filing, clinicians handling identifiable patient data, family offices handling beneficial ownership, and any buyer whose default answer to "has client data left the building" must be no.

Trade-off: harder reasoning tasks, particularly long-context document analysis above 100K tokens and complex code generation, may run against the limit of what current local models can do.

Claude-fallback hybrid mode

The local model handles the majority of tasks. For workloads beyond local-model capability, the system routes the request to the Claude API, by default through AWS Bedrock in a UK or EU region (London or Ireland) for data residency.

How the boundary works: we document exactly which prompt classes route to cloud and which do not, in a written hybrid policy that forms part of the engagement record. Data classes you mark as restricted never route to cloud regardless of model preference.
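
As a minimal sketch of how that boundary can be enforced mechanically, assuming a Python routing layer: the data classes, prompt classes, and function below are illustrative assumptions, not the production policy. The key property is that a restricted data class always wins over model preference.

```python
# Hypothetical routing policy: restricted data classes always stay local,
# and only pre-agreed prompt classes may escalate to the cloud model.
RESTRICTED_DATA_CLASSES = {"privileged", "patient_identifiable", "beneficial_ownership"}
CLOUD_PERMITTED_PROMPT_CLASSES = {"long_context_analysis", "complex_code_generation"}

def route(prompt_class: str, data_classes: set) -> str:
    """Return 'local' or 'cloud' for a single request."""
    if data_classes & RESTRICTED_DATA_CLASSES:
        return "local"   # restricted data never leaves the network
    if prompt_class in CLOUD_PERMITTED_PROMPT_CLASSES:
        return "cloud"   # pre-agreed harder workloads may escalate
    return "local"       # everything else defaults to local

# Restricted data overrides the prompt class:
print(route("long_context_analysis", {"privileged"}))  # local
print(route("long_context_analysis", {"disclosed"}))   # cloud
```

Because the policy is a pure function of request metadata, it can be versioned alongside the written hybrid policy and tested as part of the engagement record rather than buried in runtime configuration.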

Recommended for: accountants and IFAs working with anonymised or already-disclosed data, boutique advisory firms where the marginal capability gain matters commercially, and any buyer whose data sensitivity is high but not absolute. Hybrid mode requires explicit written client consent and is not enabled by default.

On-premises versus managed off-site deployment

Default deployment is on-premises at your office. The privacy story is at its strongest when the device is physically on your network and under your direct control. The trade-offs to plan for:

  • Physical access. The device requires an occasional physical reboot or hardware replacement. We advise on placement, power, and ventilation during workflow design.
  • Network configuration. We coordinate with your existing IT provider where one exists. For solo practitioners on consumer broadband, we configure the network ourselves.
  • Device ownership. The device is yours. Hardware is invoiced to you directly at supplier cost. The retainer covers operational custodianship; it does not transfer ownership.

Managed off-site deployment, in which the device sits in a UK colocation facility under our operational control, is available on request for clients who prefer that topology. It changes the regulatory analysis materially and is priced separately. We discuss this only if requested.

Security posture

The security review is part of the standard engagement. The headline elements:

  • Open-source agent stack. Hermes Agent is MIT licensed, container-hardened, and run against the published release cadence with active CVE monitoring.
  • Software selection discipline. We do not install community agent frameworks with active high-severity CVEs in the trailing 12 months. Software choices are reviewed at every monthly retainer cycle and replaced if the security posture degrades.
  • Network isolation. The device runs on a dedicated network segment with locked-down inbound rules. No remote shell access except through a documented bastion path.
  • Disk encryption. FileVault on, with a documented recovery process for the principal.
  • Hybrid-mode payload control. Where Hybrid mode is enabled, the data classes routed to the Claude API are documented and bounded. The boundary is part of the contract, not a configuration setting.

Who Private AI Concierge is for

The service is built for principals and partners in:

  • Solicitor practices, particularly probate, family, immigration, IP, and contentious work
  • Patent and trade-mark attorneys, especially pre-filing
  • Independent financial advisers, wealth managers, and family offices
  • Accountancy practices handling director-level and HNW clients
  • Private medical and dental practices, particularly cosmetic, mental health, and reproductive medicine
  • Boutique advisory and corporate finance firms working under NDA-heavy engagements
  • Single-practitioner specialists across regulated professions

The service is not the right fit for businesses where cloud LLMs are already an acceptable answer, where data volume is low enough that an off-the-shelf consumer subscription suffices, or where the buyer is unwilling to accept ongoing retainer custodianship.

Why The AI Consultancy

Private AI Concierge sits within a wider Anthropic-aligned delivery practice. Hybrid mode uses the Claude API as the cloud fallback option. Local-only mode is the appropriate answer when the cloud answer is wrong. We deploy whichever fits the regulatory reality.

Delivery credentials relevant to this service line:

  • AWS, Google Cloud, and Nvidia certified consultancy
  • Active Claude implementation track record across UK SMEs and enterprise
  • UK delivery, London-based, 70 Horseferry Road
  • Engagement structure compatible with Innovate UK BridgeAI and qualifying R&D tax credit claims where eligible

Related services

  • Claude Implementation: the full Claude rollout service for UK buyers whose data sensitivity allows a cloud-LLM deployment under a standard DPA. Private AI Concierge is the alternative path when a cloud rollout is the wrong answer.
  • AI Readiness Assessment: structured pre-implementation screen covering data, processes, skills, and compliance. For some regulated buyers the recommendation will be Private AI Concierge rather than a cloud-LLM rollout.
  • AI Strategy Consulting: board-level AI strategy where Private AI Concierge is one option in a wider portfolio across multiple AI vendors and use cases.
  • Grant-Funded AI Implementation: eligibility screening and application support where part of the Private AI Concierge cost can be met by Innovate UK BridgeAI, Smart Grants, or qualifying R&D tax credit work.

Frequently asked questions

Why on-premises rather than a private cloud tenant?
Private cloud tenants reduce the risk of data co-mingling but they do not eliminate the regulatory question of whether client data has left your network. For solicitors handling privileged matters, patent attorneys before filing, and clinicians handling identifiable patient data, the only defensible answer to a regulator or client is that the data did not leave the building. On-premises deployment makes that answer factual rather than contractual.
Will a local model perform as well as Claude or ChatGPT?
On most professional-services workloads, the gap is smaller than expected. The Qwen 3.5 family handles drafting, summarisation, classification, and structured extraction at a level that is workable for daily use. The gap shows up on hard reasoning, long-context analysis above 100K tokens, and complex code generation. Hybrid mode covers these specific cases by routing them to the Claude API, with the routing policy agreed in writing.
What happens if the Mac mini fails?
Hardware failure is rare on Apple Silicon but not zero. We hold a configuration backup off-device, in encrypted form, that allows a replacement Mac mini or Mac Studio to be provisioned to the same state within one working day in the UK. The retainer includes incident cover within UK business hours.
Can I run this myself without the retainer?
Technically yes. Commercially we do not recommend it. The agent and underlying open-weight models release approximately weekly. CVE monitoring, regression testing, and skill curation are skilled work that takes hours per month and degrades quickly when neglected. Self-managed deployments tend to drift out of date within the first quarter and become a security liability.
Is this GDPR compliant?
Local-only mode is structurally easier to defend under UK GDPR because no personal data is transferred to a third-country processor. You remain the sole data controller and the only processor in the chain. Hybrid mode requires the standard analysis around the Claude API as a US processor, with the data classes routed to cloud documented in your written hybrid policy. We provide the technical input into your DPIA; the legal sign-off remains with your DPO or solicitor.
How does this affect my Anthropic spend?
Local-only mode makes no calls to the Claude API. Hybrid mode increases Claude API consumption for the workloads it routes upstream. Most Hybrid clients see a small but non-trivial Claude bill, typically GBP 30 to GBP 150 per principal per month, layered on top of the retainer. We are transparent about this in proposals.
Can the service be grant-funded?
Some elements may be eligible for Innovate UK BridgeAI Innovation Exchange funding where the buyer is an SME and the engagement counts as a structured AI adoption project. R&D tax credits can apply to qualifying technical work that goes beyond standard configuration. Eligibility depends on sector, project type, and applicant size. We run the eligibility screen alongside Stage 2 workflow design where it is plausibly in scope.
What is the difference between local-only and hybrid mode?
Local-only mode runs all inference on your on-premises hardware. No client data leaves your network during normal operation. Hybrid mode routes a defined and pre-agreed set of harder workloads to the Claude API; the routing policy is written down at the workflow design stage and forms part of the engagement contract. Hybrid mode is opt-in only and requires explicit written client consent before activation.

Book a free 30-minute Private AI scoping call

If you are evaluating an on-premises AI assistant for a confidentiality-sensitive UK practice, we will run a free scoping call covering tier fit, channel setup, hybrid policy, and an outline implementation plan. Pricing is published at tier-band level above; final scope and quote follow the workflow design sprint.