How to measure ROI on an AI deployment in 90 days: a framework for UK SMEs

Why most AI ROI calculations do not survive the first finance review
PwC's 29th Global CEO Survey, published in January 2026 and covering 4,454 business leaders across 95 countries, reported that 56% of CEOs had seen neither higher revenues nor lower costs from AI in the previous 12 months. Only 12% reported both. The sample is weighted toward large enterprises, so the headline number is not a direct read on UK SMEs. Focused SME deployments of customer-service automation, document processing, or invoice routing typically perform better than broad enterprise transformation programmes. But the underlying problem PwC identified applies to UK SMEs too: the four things that separate the 12% from the rest are structured technical foundations, a clear roadmap, formal risk processes, and organisational alignment. None of those is specific to enterprise scale.
The methodological failure is usually at one of three points. Baselines are not captured before deployment, so there is nothing credible to compare against. Loaded hourly cost is replaced by raw salary, understating the value of time saved by a third or more. Quality cost, the time staff spend reviewing and correcting AI outputs, is not tracked, so gross savings look rosier than net. Fix those three and a 90-day window is long enough to produce numbers a finance director will accept.
This article assumes the business has already made the decision to deploy an AI tool and is now in the measurement phase. For the upstream questions of whether the business is ready and which use cases to target first, a separate readiness assessment is the right starting point.
Week 0: establish baselines before you switch anything on
Baseline data is the single most important input to an AI ROI calculation. Without it, any claim about improvement is an estimate. The baseline needs to cover the specific process being automated, measured over enough time to average out a normal operating week.
For customer service automation, capture average handle time per ticket, tickets per agent per day, first-contact resolution rate, and customer satisfaction score. For document processing, capture time per document, error rate, and rework rate. For sales proposal generation, capture time per proposal, conversion rate, and average deal size. Two weeks of baseline data is a practical minimum; four weeks is safer if the business has weekly or monthly cycles.
Run the baseline capture with the staff who will continue doing the work after deployment. Capture it before the tool is announced, because awareness of an incoming change shifts performance up or down depending on how the team feels about it. Record the measurement method alongside the numbers, so the post-deployment measurement can be run identically.
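One way to make the measurement method reproducible is to compute the baseline with a short script that can be rerun unchanged on the post-deployment data. The sketch below uses illustrative customer-service fields and figures; a spreadsheet with the same columns does the same job.

```python
# Illustrative sketch: averaging two weeks of customer-service baseline data.
# Field names and figures are examples, not prescriptions; the same structure
# works for document processing or proposal generation with different metrics.
from statistics import mean

# One record per agent per working day, logged in a spreadsheet and exported as rows.
baseline_rows = [
    {"agent": "A", "tickets": 32, "avg_handle_mins": 11.5, "first_contact_resolved": 24},
    {"agent": "B", "tickets": 28, "avg_handle_mins": 13.0, "first_contact_resolved": 19},
    # ... one row per agent per working day across the two-week window
]

tickets_per_agent_day = mean(r["tickets"] for r in baseline_rows)
avg_handle_time = mean(r["avg_handle_mins"] for r in baseline_rows)
fcr_rate = sum(r["first_contact_resolved"] for r in baseline_rows) / sum(
    r["tickets"] for r in baseline_rows
)

print(f"Baseline: {tickets_per_agent_day:.1f} tickets/agent/day, "
      f"{avg_handle_time:.1f} min handle time, {fcr_rate:.0%} first-contact resolution")
```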
If baseline capture feels disproportionate for a small pilot, reduce the pilot scope rather than skip the step. A two-person baseline over two weeks is better than no baseline at all.
Baseline capture also forces an honest description of the current process. Teams often discover during baselining that the problem they want AI to solve is not quite what they thought it was. The real bottleneck might be elsewhere, or the metric they assumed mattered turns out not to correlate with the outcome they actually want. That discovery is cheaper before deployment than after.
Days 1 to 30: instrument the deployment and resist early conclusions
The first 30 days are for learning and instrumentation, not ROI claims. Staff are adapting to the tool, integrations are settling in, and early productivity often dips before it recovers. A calibration period of roughly four weeks is normal.
Set up tracking to capture the same metrics as the baseline. Where possible, pull the numbers automatically from the underlying system rather than asking staff to self-report. Self-reported time savings skew toward whatever answer the respondent thinks management wants. For customer service tools, the ticketing system usually has the data. For document processing, the document management system or a simple spreadsheet logged by the team will do.
Expect the Week 1 data to look worse than baseline. This is not a reason to abandon the pilot. It is a reason to check that the measurement pipeline is working correctly, that the team knows what they are being asked to do, and that the tool has been configured for the actual workload rather than the demo workload.
Days 31 to 60: compare against baseline and measure quality
By Day 31 there should be enough post-deployment data to start drawing conclusions. Compare the metrics against baseline. Identify where the tool is delivering the expected improvement and where it is not.
This is the point at which quality measurement has to start in earnest. Gross time saved means nothing if a high percentage of AI outputs require human review or rework. Track two numbers: the percentage of outputs that require any correction, and the average time spent on that correction. A 40% review rate at five minutes per output is a meaningful quality cost. So is a 5% review rate at 30 minutes per output.
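The arithmetic behind those two numbers is worth writing down once. The sketch below compares the two review profiles just mentioned; the 300-outputs-per-week volume is an illustrative assumption, not a figure from any pilot.

```python
# Sketch of the quality-cost arithmetic for two review profiles.
# The 300-outputs-per-week volume is an illustrative assumption.
def weekly_rework_hours(outputs_per_week: int, review_rate: float, mins_per_correction: float) -> float:
    """Hours per week spent reviewing and correcting AI outputs."""
    return outputs_per_week * review_rate * mins_per_correction / 60

volume = 300  # assumed weekly output volume, for illustration only
print(weekly_rework_hours(volume, 0.40, 5))   # 40% reviewed at 5 min each -> 10.0 hours/week
print(weekly_rework_hours(volume, 0.05, 30))  # 5% reviewed at 30 min each -> 7.5 hours/week
```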
Adjust usage patterns based on findings. If the tool produces good output for simple tickets and poor output for complex ones, route complex tickets back to humans. If a particular prompt or template is producing better results than others, standardise on it. The pilot is still calibrating at this stage, so treat the numbers as direction-setting rather than final.
Days 61 to 90: calculate the ROI and report to leadership
With 60 days of post-deployment data, the ROI calculation is defensible. The calculation has five components.
Monthly time benefit: gross hours saved per week minus rework hours per week, multiplied by 4.33 weeks per month, multiplied by the loaded hourly cost of the affected roles.
Monthly cost: the subscription fee, plus a pro-rated share of the implementation cost spread over the three-year horizon, plus management time spent overseeing the tool.
Monthly net benefit: monthly benefit minus monthly cost.
Payback period: total implementation cost divided by monthly net benefit.
Three-year ROI: total benefit over three years minus total cost over three years, divided by total cost over three years, expressed as a percentage.
A worked example makes the maths concrete. A customer-service AI tool costs £800 per month with a £5,000 one-off implementation charge. The three-year total cost is £33,800. The tool saves 15 hours per week across the team, at a loaded hourly cost of £25 for the roles involved. Gross monthly benefit is 15 hours × 4.33 weeks × £25 = £1,624. Over three years that is £58,464. Three-year ROI is (£58,464 - £33,800) / £33,800 = 73%. Payback is around seven months: £5,000 divided by a monthly net benefit of roughly £685, once the subscription and the pro-rated implementation charge are deducted.
That 73% looks healthy. It stops looking healthy if the team actually saves eight hours per week rather than 15, because quality review absorbs the rest. Eight net hours is 8 × 4.33 weeks × £25 = £866 per month, or £31,176 over three years. Three-year ROI becomes roughly negative eight per cent. This is why quality cost is not optional in the arithmetic.
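To make the components auditable, here is a minimal sketch of the same calculation in Python. It simply encodes the five components above and reproduces both scenarios; management time is set to zero because the example does not specify a figure for it.

```python
# Sketch of the five-component calculation, reproducing the worked example.
WEEKS_PER_MONTH = 4.33
MONTHS = 36  # three-year horizon

def roi_summary(gross_hours_per_week, rework_hours_per_week, loaded_hourly_cost,
                monthly_subscription, implementation_cost, monthly_mgmt_cost=0.0):
    monthly_benefit = (gross_hours_per_week - rework_hours_per_week) * WEEKS_PER_MONTH * loaded_hourly_cost
    monthly_cost = monthly_subscription + implementation_cost / MONTHS + monthly_mgmt_cost
    monthly_net_benefit = monthly_benefit - monthly_cost
    total_cost = MONTHS * monthly_cost        # £33,800 in the worked example
    total_benefit = MONTHS * monthly_benefit
    payback_months = implementation_cost / monthly_net_benefit if monthly_net_benefit > 0 else None
    three_year_roi = (total_benefit - total_cost) / total_cost
    return monthly_net_benefit, payback_months, three_year_roi

# Headline scenario: 15 hours saved per week, no rework tracked.
print(roi_summary(15, 0, 25, 800, 5_000))  # ~£685/month net, ~7.3 month payback, ~73% ROI

# Quality-adjusted scenario: only 8 net hours once review time is deducted.
print(roi_summary(8, 0, 25, 800, 5_000))   # negative net benefit, no payback, roughly -8% ROI
```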
Loaded hourly cost, not salary
UK SMEs consistently understate loaded hourly cost, and it is the single most common reason AI ROI calculations look worse than they should. A £35,000 salary is not £17 per hour. Once employer National Insurance at 15%, pension contributions at a minimum of 3%, holiday and sick cover, and a fair share of management and overhead are included, the loaded cost is typically between £25 and £30 per hour for professional services roles.
The other adjustment is working hours. A full-time UK employee works around 1,760 hours per year after the statutory 28 days of holiday including bank holidays. A £35,000 salary across 1,760 hours is £19.89 per hour. Loaded at a 25% to 35% overhead for pension, NI, and allocated management time, the defensible cost is £25 to £27 per hour. Use the higher end for professional services and the lower end for operational roles.
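The same working fits in a few lines if it helps to keep it alongside the report. The 1,760-hour year and the 25% to 35% overhead band are the planning assumptions to adjust for your own cost base.

```python
# Sketch of the loaded hourly cost working. The 1,760-hour working year and the
# 25% to 35% overhead band are planning assumptions; adjust both for your own business.
def loaded_hourly_cost(salary: float, annual_hours: float = 1_760, overhead: float = 0.30) -> float:
    """Salary converted to an hourly rate, then uplifted for NI, pension and allocated overhead."""
    return salary / annual_hours * (1 + overhead)

print(f"£{loaded_hourly_cost(35_000, overhead=0.25):.2f}")  # ~£24.86, lower end of the band
print(f"£{loaded_hourly_cost(35_000, overhead=0.35):.2f}")  # ~£26.85, upper end of the band
```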
Showing this calculation on the first page of the ROI report is worth the space it takes. Finance directors who have not done the arithmetic before will often challenge it, and having the working shown is the difference between a credible number and a number that gets sent back for rework.
Quality cost is the other blind spot
Vendor ROI calculators and self-reported productivity surveys both tend to overstate net savings. The 7.75 hours per week figure that Microsoft's June 2025 Work Trend Index reported for UK AI users is a self-reported number. So is the 11 hours per week for trained employees in the LSE Inclusion Initiative study published in October 2025. Both numbers are credible for gross time saved, but neither study tracks rework time systematically.
A conservative planning assumption is 15% to 25% productivity improvement in the processes the tool directly supports, rather than the 40% to 60% figures that appear in vendor marketing. The Resultsense framework published in October 2025 recommends this calibration, and independent UK case studies support the lower range for most operational use cases. Vendor calculators assume maximum adoption, zero rework, and full realisation of savings. Treat them as best-case scenarios, not as a planning basis.
A simple way to track quality cost in practice is a weekly sample. Pull ten random AI outputs from the week. Check each one. Record which required any correction and how long the correction took. Ten samples a week across a three-person team gives enough data to produce a defensible quality-cost percentage by Day 60, without creating a measurement burden that distracts from the actual work.
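Turning those weekly samples into the two quality numbers from the previous section takes only a short tally; the sample values below are illustrative.

```python
# Sketch: turning a week of sampled AI outputs into the two quality numbers.
# Each tuple is (needed_correction, minutes_spent_correcting); values are illustrative.
week_sample = [
    (False, 0), (True, 4), (False, 0), (False, 0), (True, 12),
    (False, 0), (True, 3), (False, 0), (False, 0), (False, 0),
]

corrected = [mins for needed, mins in week_sample if needed]
review_rate = len(corrected) / len(week_sample)
avg_correction_mins = sum(corrected) / len(corrected) if corrected else 0

print(f"Review rate: {review_rate:.0%}, average correction: {avg_correction_mins:.1f} min")
# With the illustrative sample above: 30% review rate, ~6.3 minutes per correction.
```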
When 90 days is not enough
The 90-day window works for operational automation. Customer service chatbots, document processing, invoice routing, and similar focused deployments produce measurable numbers inside a quarter.
Strategic AI takes longer. Demand forecasting, decision support, and product development tools typically need six to twelve months before the financial signal is separable from normal business variance. Pilots in regulated sectors also run longer because approval cycles and compliance reviews add calendar time. The judgment call is this: if the first 90 days produce no measurable signal at all on an operational deployment, the project should be redesigned or stopped. If the signal is weak but present on a strategic deployment, it needs more time before a verdict is drawn.
Questions to answer before the next pilot
Any UK SME about to start an AI pilot should have clear answers to five questions before Day 1. What is the specific process being automated? What baseline metrics will we capture and over what period? What is the loaded hourly cost of the roles involved? How will we measure rework and review time? Who signs off the ROI report at Day 90?
If any of those answers is vague, the measurement will be too. The framework above does not need sophisticated software. A spreadsheet and a disciplined two weeks of baseline capture are enough to produce a number that holds up under scrutiny. For the arithmetic, see The AI Consultancy's ROI calculator. For a fuller readiness review before a pilot, see our AI readiness assessment.
Frequently asked questions
- How long does it take to see ROI from AI?
- For operational automation, a credible ROI number is achievable within 90 days if baselines are captured before deployment and quality cost is tracked alongside time savings. Strategic AI deployments for demand forecasting or decision support typically take six to twelve months before the financial signal separates from normal business variance.
- What is a loaded hourly cost and why does it matter for AI ROI?
- Loaded hourly cost is the full hourly cost of employing someone, including employer National Insurance, pension contributions, holiday and sick cover, and a fair share of management and overhead. For a UK professional services role on a £35,000 salary, the loaded cost is typically £25 to £30 per hour, not the £17 implied by raw salary arithmetic. Using loaded cost rather than raw salary typically increases the calculated value of the time saved by 30% or more.
- How do I set baselines before deploying AI?
- Capture the specific metrics of the process being automated, over at least two weeks of normal operation, using the same measurement method you will apply after deployment. For customer service, that means average handle time, tickets per agent per day, first-contact resolution rate, and customer satisfaction. Record baselines before the team knows the tool is coming, to avoid shifts in performance driven by anticipation.
- How do I account for the time staff spend reviewing and correcting AI outputs?
- Track two numbers through the pilot: the percentage of AI outputs that require any human correction, and the average time spent on each correction. Multiply those together to get weekly rework hours. Subtract the rework time from the gross time saved before calculating the monthly financial benefit. A tool that saves 15 hours a week gross but creates five hours a week of rework has a net benefit of 10 hours, not 15.
- What is a realistic ROI expectation for AI in a UK SME?
- Plan for a 15% to 25% productivity improvement on the specific processes the tool supports, rather than the 40% to 60% figures that appear in vendor marketing. For a focused operational deployment with a subscription around £800 per month and a £5,000 implementation charge, a three-year ROI in the 50% to 100% range is a realistic planning expectation. Payback periods are typically six to twelve months. Anything above that needs careful scrutiny of the assumptions.