
Private AI Concierge for confidentiality-sensitive UK professionals
Private AI Concierge is an on-premises AI assistant for UK professionals whose client data cannot be routed through public cloud large language models. A Mac mini or Mac Studio with Apple Silicon sits on your premises, runs an open-source agent stack with local model inference, and operates as a persistent personal AI across email, messaging, and your existing tools. The system is designed to give regulated professionals, including solicitors, IP and patent attorneys, IFAs, accountants, family-office principals, and private medical and dental practices, the productivity benefits of an always-on AI assistant without sending sensitive client information off your network. Engagements start at GBP 2,500 plus hardware. The cloud-fallback hybrid option, which uses the Claude API for tasks beyond local-model capability, is enabled only at your written instruction.
Why a local-first AI assistant exists as a distinct service
Most AI consulting work in the UK assumes that client data can be sent to a cloud LLM under a Data Processing Agreement. For most businesses, that assumption is correct. For a specific cohort of regulated and confidentiality-sensitive professionals, it is not.
A solicitor advising on a contested probate matter, an IFA reviewing a client's full balance sheet, an accountant preparing a director's loan account, a patent attorney drafting before filing, a private clinician dictating a consultation note, a family office tracking beneficial ownership: in each case the data is either privileged, fiduciary, regulated under specific professional codes, or commercially radioactive. Even where a cloud DPA is technically permissible, the professional-conduct overhead, the client-consent paperwork, and the procurement risk often make the cloud route uncommercial.
Private AI Concierge solves the same productivity problem with a different deployment topology. The model runs on hardware you own, on your own network, in your own building. The system can still operate as a multi-channel agent across email, Telegram, Slack, Signal, and your existing case or matter management tools. It learns over time and accumulates persistent memory of your preferences, templates, and workflows. The behavioural experience for the user is broadly comparable to a cloud assistant. The compliance posture is materially different.
What is included in a Private AI Concierge engagement
Every engagement is structured as a one-off implementation followed by a monthly retainer. The retainer is where the long-term commercial value sits, and it is the core argument for ongoing custodianship rather than a self-installed product. The release velocity of the underlying agent and model components is high, currently weekly, and CVE response, model upgrades, and skill curation cannot reasonably be left to the client.
One-off implementation includes
- Discovery and workflow design covering your document types, client communication patterns, and existing tool stack
- Hardware procurement and delivery, passed through at supplier cost and itemised separately on the proposal
- Hermes Agent (Nous Research, MIT licence) installation with Apple Silicon optimisation
- Ollama installation and local model selection from the Qwen 3.5 family for tool-calling reliability
- Channel setup across the platforms you use, typically email, Telegram or Signal, Slack, and SMS gateway where relevant
- MCP server wiring for your existing case management, practice management, or accounting software where supported
- Security hardening including FileVault, automatic unlock for headless operation, network isolation, and SSH access controls
- Client-specific skill creation: the named workflows your assistant runs repeatedly, configured and tested
- Written documentation covering operation, recovery, and the security posture
- Two handover and training sessions for the principal and one named technical contact
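The MCP server wiring in the list above is typically expressed as a small JSON configuration mapping each tool server to a launch command. A minimal illustrative sketch only: the server package names, commands, and environment variables below are hypothetical placeholders, not the actual software installed during an engagement.

```json
{
  "mcpServers": {
    "practice-management": {
      "command": "npx",
      "args": ["-y", "practice-mcp-server"],
      "env": { "PM_API_URL": "http://localhost:8443" }
    },
    "email": {
      "command": "uvx",
      "args": ["email-mcp-server", "--imap-host", "mail.example.co.uk"]
    }
  }
}
```

Each entry gives the agent a named tool surface; the actual servers and credentials are agreed during workflow design and documented in the handover pack.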
Monthly retainer includes
- Hermes Agent patching against the upstream release cadence, currently approximately weekly
- CVE monitoring and response across the agent, model, and OS layers
- Model upgrades as new releases land in the Qwen, Llama, and Hermes families, with regression testing before swap-in
- Skill curation and refinement based on your usage patterns
- Monthly usage review with one written summary
- Incident response cover within UK business hours
- Quarterly architecture review covering hardware utilisation, channel load, and any drift between local capability and workload demand
How a Private AI Concierge engagement runs
Engagements run in four stages. We begin with a free 30-minute scoping conversation. The first paid commitment comes only at Stage 2.
Stage 1: Discovery (no charge, 30 minutes)
A scoping call covering use cases, channel preferences, regulatory constraints, and tier fit. Outputs are a written summary and a tier recommendation. We move to a paid engagement only when both sides see a credible fit.
Stage 2: Workflow design (1 to 2 weeks)
A paid discovery sprint that produces a written workflow design document. We define the named skills the assistant will run, the channels it will operate across, the data sources it will read, and the security boundary. This document is the contract for Stage 3.
Stage 3: Implementation (2 to 4 weeks)
Hardware procurement, install, configuration, integration, and on-site handover. Includes user-acceptance testing against your real workflows, not synthetic test cases.
Stage 4: Retainer (ongoing)
Monthly patching, CVE response, model upgrades, skill refinement, and incident cover. The retainer is the relationship.
Pricing
Pricing is published at tier-band level. Final scope and quote follow the workflow design sprint in Stage 2. Hardware is passed through at supplier cost; the configuration and installation fee is itemised separately on the proposal.
| Tier | One-off (ex hardware) | Monthly retainer | Target buyer |
|---|---|---|---|
| Solo | From GBP 2,500 | From GBP 250 | Sole practitioners, senior solo consultants, family-office principals |
| Practice | From GBP 4,500 | From GBP 500 | 2 to 10-person professional services firms, private clinics |
| Chambers | From GBP 8,000 | From GBP 900 | Larger regulated firms, multi-partner practices, multi-site operations |
Indicative hardware cost: GBP 1,799 to GBP 2,499 for the Solo and Practice tiers (Mac mini with M4 Pro silicon, 48 to 64GB unified memory); GBP 2,099 to GBP 4,000+ for the Chambers tier (Mac Studio with M4 Max silicon, 64 to 128GB unified memory), depending on memory and storage configuration. Hardware is purchased on your behalf and invoiced at supplier cost. We do not earn margin on the device itself; a separate "Hardware procurement and configuration" line at GBP 250 to GBP 400 is itemised on the proposal.
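As a worked illustration of how these bands combine over a typical planning horizon, the sketch below computes a 36-month cost for the Solo tier at the published "from" prices and the low end of the indicative hardware band. Actual quotes follow the Stage 2 workflow design sprint.

```python
# Illustrative 36-month cost, Solo tier, using published "from" prices
# and the low end of the indicative hardware band (all figures GBP).
one_off = 2_500          # Solo implementation fee (from)
hardware = 1_799         # Mac mini, low end of indicative band
procurement_fee = 250    # "Hardware procurement and configuration" line, low end
retainer_monthly = 250   # Solo retainer (from)

months = 36
total = one_off + hardware + procurement_fee + retainer_monthly * months
print(total)  # 13549
```

The retainer dominates the 36-month figure, which is consistent with the point above: the ongoing custodianship, not the install, is where the engagement's value sits.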
Local-only versus Claude-fallback hybrid
The system operates in one of two modes. The mode is set during workflow design and can be changed later only at your written instruction. Commercially and in regulatory terms, the two modes are different products.
Local-only mode
All inference happens on your on-premises hardware. No client data leaves your network at any point during normal operation. The model is constrained to the capability of locally runnable open-weight models, principally the Qwen 3.5 family.
Recommended for: solicitors handling privileged matters, patent attorneys before filing, clinicians handling identifiable patient data, family offices handling beneficial ownership, and any buyer whose default answer to "has client data left the building" must be no.
Trade-off: harder reasoning tasks, particularly long-context document analysis above 100K tokens and complex code generation, may run against the limit of what current local models can do.
Claude-fallback hybrid mode
The local model handles the majority of tasks. For workloads beyond local-model capability, the system routes the request to the Claude API, by default through AWS Bedrock in the Europe (London) or Europe (Ireland) region for data residency.
How the boundary works: we document exactly which prompt classes route to cloud and which do not, in a written hybrid policy that forms part of the engagement record. Data classes you mark as restricted never route to cloud regardless of model preference.
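The boundary described above can be sketched as a small policy function. A minimal illustration under assumed names: the prompt classes, restricted data classes, and set membership below are hypothetical placeholders, not the contents of any actual hybrid policy document.

```python
from dataclasses import dataclass, field

# Hypothetical prompt classes and restricted data classes; the real
# hybrid policy is a written document agreed during workflow design.
CLOUD_ELIGIBLE_CLASSES = {"long_context_analysis", "complex_codegen"}
RESTRICTED_DATA_CLASSES = {"privileged", "patient_identifiable"}

@dataclass
class Request:
    prompt_class: str
    data_classes: set = field(default_factory=set)

def route(req: Request) -> str:
    """Return 'local' or 'cloud' for a request.

    Restricted data never routes to cloud, regardless of how much the
    workload would benefit from the larger model.
    """
    if req.data_classes & RESTRICTED_DATA_CLASSES:
        return "local"
    if req.prompt_class in CLOUD_ELIGIBLE_CLASSES:
        return "cloud"
    return "local"

print(route(Request("complex_codegen", {"privileged"})))  # local
print(route(Request("complex_codegen")))                  # cloud
print(route(Request("routine_drafting")))                 # local
```

The key property is the first guard: data marked restricted short-circuits the decision before capability is even considered, which is the contractual behaviour described above.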
Recommended for: accountants and IFAs working with anonymised or already-disclosed data, boutique advisory firms where the marginal capability gain matters commercially, and any buyer whose data sensitivity is high but not absolute. Hybrid mode requires explicit written client consent and is not enabled by default.
On-premises versus managed off-site deployment
Default deployment is on-premises at your office. The privacy story is at its strongest when the device is physically on your network and under your direct control. The trade-offs to plan for:
- Physical access. The device may occasionally require a physical reboot or hardware replacement. We advise on placement, power, and ventilation during workflow design.
- Network configuration. We coordinate with your existing IT provider where one exists. For solo practitioners on consumer broadband, we configure the network ourselves.
- Device ownership. The device is yours. Hardware is invoiced to you directly at supplier cost. The retainer covers operational custodianship; it does not transfer ownership.
Managed off-site deployment, in which the device sits in a UK colocation facility under our operational control, is available on request for clients who prefer that topology. It changes the regulatory analysis materially and is priced separately. We discuss this only if requested.
Security posture
The security review is part of the standard engagement. The headline elements:
- Open-source agent stack. Hermes Agent is MIT licensed, container-hardened, and run against the published release cadence with active CVE monitoring.
- Software selection discipline. We do not install community agent frameworks with active high-severity CVEs in the trailing 12 months. Software choices are reviewed at every monthly retainer cycle and replaced if the security posture degrades.
- Network isolation. The device runs on a dedicated network segment with locked-down inbound rules. No remote shell access except through a documented bastion path.
- Disk encryption. FileVault on, with a documented recovery process for the principal.
- Hybrid-mode payload control. Where Hybrid mode is enabled, the data classes routed to the Claude API are documented and bounded. The boundary is part of the contract, not a configuration setting.
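On macOS, the inbound lockdown described above can be expressed as a short pf ruleset. An illustrative fragment only: the bastion address and LAN segment below are placeholders, and the deployed ruleset is written per-site during implementation.

```
# Illustrative pf anchor -- placeholder addresses, not a deployed config
block in all                                         # deny all inbound by default
pass in proto tcp from 10.0.20.5 to any port 22      # SSH only via the documented bastion
pass in proto tcp from 10.0.20.0/24 to any port 443  # agent channels from the LAN segment
pass out all keep state                              # allow outbound, stateful
```

The default-deny first rule is the substance of the isolation claim; everything else is a narrow, documented exception.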
Who Private AI Concierge is for
The service is built for principals and partners in:
- Solicitor practices, particularly probate, family, immigration, IP, and contentious work
- Patent and trade-mark attorneys, especially pre-filing
- Independent financial advisers, wealth managers, and family offices
- Accountancy practices handling director-level and HNW clients
- Private medical and dental practices, particularly cosmetic, mental health, and reproductive
- Boutique advisory and corporate finance firms working under NDA-heavy engagements
- Single-practitioner specialists across regulated professions
The service is not the right fit for businesses where cloud LLMs are already an acceptable answer, where data volume is low enough that an off-the-shelf consumer subscription suffices, or where the buyer is unwilling to accept ongoing retainer custodianship.
Why The AI Consultancy
Private AI Concierge sits within a wider Anthropic-aligned delivery practice. Hybrid mode uses the Claude API as the cloud fallback option. Local-only mode is the appropriate answer when the cloud answer is wrong. We deploy whichever fits the regulatory reality.
Delivery credentials relevant to this service line:
- AWS, Google Cloud, and Nvidia certified consultancy
- Active Claude implementation track record across UK SMEs and enterprise
- UK delivery, London-based, 70 Horseferry Road
- Engagement structure compatible with Innovate UK BridgeAI and qualifying R&D tax credit claims where eligible
Where to start
Most UK buyers come to Private AI Concierge with a sector-specific question. Each links to a focused supporting article.
Local AI versus cloud AI for regulated professionals
The five-axis comparison, the decision framework, and where each is the right answer for a UK regulated practice.
Private AI for solicitors UK
SRA Code, privileged matters, the five workflows where local AI is workable today, and the two where Hybrid is sometimes appropriate.
GDPR and AI assistants for UK private practitioners
UK GDPR obligations, the third-country transfer question post-Schrems II, and the on-premises route as the lowest-friction compliance answer.
AI for IFAs and family offices: the data sovereignty question
FCA Consumer Duty, suitability reviews, beneficial ownership, and the 36-month TCO of on-premises versus Claude.ai Enterprise.
AI for private dental and medical practices
Caldicott principles, the five clinical workflows where on-premises AI delivers value, and the practical hardware footprint for a 4-chair practice.
Hermes Agent and Ollama: a UK consultant's view of the local AI agent stack
The technical anchor: what is mature, what is not, hardware fit, MCP wiring, and operational burden in 2026.
Related services
- Claude Implementation: the full Claude rollout service for UK buyers whose data sensitivity allows a cloud-LLM deployment under a standard DPA. Private AI Concierge is the alternative path when a cloud rollout is the wrong answer.
- AI Readiness Assessment: structured pre-implementation screen covering data, processes, skills, and compliance. For some regulated buyers the recommendation will be Private AI Concierge rather than a cloud-LLM rollout.
- AI Strategy Consulting: board-level AI strategy where Private AI Concierge is one option in a wider portfolio across multiple AI vendors and use cases.
- Grant-Funded AI Implementation: eligibility screening and application support where part of the Private AI Concierge cost can be met by Innovate UK BridgeAI, Smart Grants, or qualifying R&D tax credit work.
Frequently asked questions
Why on-premises rather than a private cloud tenant?
Will a local model perform as well as Claude or ChatGPT?
What happens if the Mac mini fails?
Can I run this myself without the retainer?
Is this GDPR compliant?
How does this affect my Anthropic spend?
Can the service be grant-funded?
What is the difference between local-only and hybrid mode?
Book a free 30-minute Private AI scoping call
If you are evaluating an on-premises AI assistant for a confidentiality-sensitive UK practice, we will run a free scoping call covering tier fit, channel setup, hybrid policy, and an outline implementation plan. Pricing is published at tier-band level above; final scope and quote follow the workflow design sprint.