A problem affecting 300 million surgeries per year
Every time a patient needs surgery, an anaesthetist assesses their health status and assigns a score: the ASA-PS classification (American Society of Anesthesiologists Physical Status). It's one of the most widely used scoring systems in medicine, in use for over 80 years.
The problem? Doctors disagree. Studies on hundreds of anaesthetists show the correct classification is assigned only 70% of the time. In a third of assessments, consensus isn’t even reached. A patient classified ASA 2 by one doctor may be classified ASA 3 by another — with real consequences for anaesthetic precautions, operating room preparation and post-operative management.
It’s not a competence issue: it’s a problem of inherent variability in a system based on subjective judgements.
The insight: AI reasons, it doesn’t guess
In 2024, HT-X started asking: can new-generation language models — those capable of structured reasoning (chain-of-thought) — do better?
Not better than the best specialists. Better than the average doctor, with a consistency no human can guarantee across thousands of assessments.
Answering this required scientific rigour, not a demo. It required validated data, a serious clinical partner, and a method publishable in a peer-reviewed journal.
The partner: Centro Ortopedico di Quadrante (Ramsay Santé)
HT-X collaborated with the Centro Ortopedico di Quadrante, part of the international Ramsay Santé group, one of Europe’s largest hospital groups. The clinical team — anaesthetists and hospital data scientists — worked with HT-X researchers to design a rigorous study.
The collaboration produced a scientific paper submitted to Informatics in Medicine Unlocked (Elsevier): “Improving ASA-PS Classification Accuracy Using Privacy-Preserving Large Language Models: A Multilingual On-Premise Evaluation”.
The study: 11 AI models, 20 clinical cases, 2 languages
The team tested 11 different AI models — from early ones (GPT-4, LLaMA, Mistral, Phi-4) to advanced reasoning models (GPT-o3, GPT-o4-mini, Claude Sonnet 3.7, Gemini 2.5, DeepSeek R1) — on 20 standardised clinical cases from the scientific literature.
Each case was evaluated in both English and Italian, to verify the AI works in the hospital’s language.
Results
| Metric | Human doctors | Early-gen LLMs | Reasoning LLMs |
|---|---|---|---|
| Mean accuracy | 7.7/10 (77%) | 7.7/10 (77%) | 9.75/10 (97.5%) |
| Errors per 10 cases | 2.3 | 2.3 | 0.25 |
| Error reduction | — | — | 89% |
Key figures:
- 97.5% accuracy for advanced models (95% CI: 92.9%–99.1%)
- 89% error reduction versus both doctors and early-generation models
- DeepSeek R1: perfect accuracy (10/10) with total reproducibility across repeated trials
- No difference between English and Italian evaluations
- Under 10 seconds per classification
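The headline numbers above are internally consistent: the per-10-case error counts follow from the mean accuracies, and the 89% reduction follows from the two error rates. A minimal Python sanity check:

```python
# Sanity check on the reported results: errors per 10 cases follow from
# mean accuracy, and the ~89% error reduction from the two error rates.
human_accuracy = 0.77       # 7.7/10 (same for early-generation LLMs)
reasoning_accuracy = 0.975  # 9.75/10

human_errors = 10 * (1 - human_accuracy)          # errors per 10 cases
reasoning_errors = 10 * (1 - reasoning_accuracy)  # errors per 10 cases

reduction = 1 - reasoning_errors / human_errors

print(f"{human_errors:.1f} vs {reasoning_errors:.2f} errors per 10 cases")
print(f"error reduction: {reduction:.0%}")  # → 89%
```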
The most relevant figure for a healthcare organisation: the difference between early and advanced models is statistically significant (p = 0.0008, Cohen’s d ≈ 1.21 — a “very large” effect).
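Cohen's d is simply the difference between two group means divided by their pooled standard deviation; values above 1.2 are conventionally called "very large". A minimal sketch of the computation, using hypothetical per-case scores for illustration (the paper's raw data is not reproduced here):

```python
import math

def cohens_d(a: list[float], b: list[float]) -> float:
    """Standardised mean difference: (mean_a - mean_b) / pooled SD."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    # Sample variances (n - 1 denominator), pooled across both groups.
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (ma - mb) / pooled_sd

# Hypothetical scores out of 10, for illustration only.
reasoning = [10, 10, 9, 10]
early = [8, 7, 8, 8]
print(round(cohens_d(reasoning, early), 2))  # → 4.0
```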
Why on-premise and not ChatGPT
One of the paper’s central aspects — and of the KOI product that derives from it — is the choice of on-premise AI.
38% of LLM studies in healthcare don’t even address patient data privacy. HT-X made it central:
- DeepSeek R1 runs on EU cloud: no patient data leaves Europe
- GDPR and healthcare regulation compliant by design
- AI Act: the system is currently Research Use Only and undergoing medical device certification, with complete audit trail and human oversight
- Identical performance to cloud models: DeepSeek R1 (on-premise) achieves the same 10/10 as GPT-o3 and Claude Sonnet (cloud)
Using ChatGPT to classify patients would mean sending medical histories, diagnoses and clinical data to OpenAI’s servers. For a European hospital, that’s not an option.
From paper to product: how KOI was born
The scientific study wasn’t meant to stay in a journal. It’s the foundation on which HT-X built KOI, a clinical decision support system for anaesthesia classification.
The journey from problem to product:
1. Identifying the clinical need → Variability in ASA-PS classification has been documented for decades. Guidelines aren’t lacking — consistency in applying them is.
2. Rigorous scientific research → Benchmarks on standardised cases from literature, comparison with published human data, complete statistical analysis, peer review.
3. Technology choice → Open-source models (DeepSeek R1) installable on-premise, no cloud provider dependency, PRISMA infrastructure (Private Intelligence Stack for Modular AI).
4. Multilingual validation → AI must work in the hospital’s language. Italian results are identical to English ones.
5. Regulatory pathway → Medical device certification (MDR, IEC 62304). The system is a support tool: the anaesthetist decides.
6. Clinical deployment → On-premise installation in hospital infrastructure, integration with existing information systems.
What this means for healthcare organisations
This case demonstrates an approach HT-X applies systematically:
- Start from a real problem — not from technology
- Validate scientifically — with publishable studies, not demos
- Build on-premise — because healthcare data cannot leave the organisation
- Certify — because software touching clinical decisions is a medical device
If your facility has clinical processes where inter-operator variability is a known problem — classifications, triage, report interpretation — the approach is the same: start from data, validate rigorously, deploy with privacy.
The paper
The study “Improving ASA-PS Classification Accuracy Using Privacy-Preserving Large Language Models: A Multilingual On-Premise Evaluation” has been submitted for peer review to Informatics in Medicine Unlocked (Elsevier). Authors: Francesco Menegoni (HT-X), Claudio Trotti, Maria Beatrice Pagani, Paola Pisano.
For information about KOI or to assess AI opportunities in your healthcare facility, contact HT-X.
Frequently asked questions
Can't we just use ChatGPT for ASA-PS classification?
Early models like GPT-4 achieve about 77% accuracy — the same level as human doctors. But the real problem isn't accuracy: it's that ChatGPT sends patient clinical data to OpenAI's servers in the USA, conflicting with GDPR and European healthcare data-protection rules. HT-X's KOI uses on-premise AI models (like DeepSeek R1) achieving 97.5% accuracy without any data leaving the hospital.
What is ASA-PS classification and why does it matter?
ASA-PS (American Society of Anesthesiologists Physical Status) classification is the global standard for preoperative risk assessment. It ranges from ASA 1 (healthy patient) to ASA 5 (moribund patient). It's critical because it determines anaesthetic precautions, yet doctors agree on the correct class only 70% of the time — a problem AI can help address.
Is KOI a certified medical device?
KOI is undergoing certification as a medical device under the European MDR regulation and the IEC 62304 standard for medical software. The system is designed as a decision-support tool: the final classification remains the anaesthetist's responsibility. The scientific study has been submitted for peer review to Informatics in Medicine Unlocked (Elsevier).
Looking for a private ChatGPT for your business?
ORCA is the on-premise AI platform by HT-X (Human Technology eXcellence): your data stays yours, GDPR and AI Act compliant.
Discover ORCA