A newsletter on the latest in AI for healthcare.

Welcome back,

In this issue’s top paper, we highlight the Breast cancer Intelligent Non-invasive Diagnosis System (BINDS), a multimodal AI model for pre-biopsy breast cancer assessment published in Nature Biomedical Engineering. In top health AI news, MHRA leadership wants regulation to help proven AI tools reach the UK’s National Health Service faster. And χ-Bench, a benchmark for healthcare agents, shows today’s systems still struggle with long, policy-heavy workflows.

SUMMARY

Top Research Paper

  • BINDS (Breast cancer Intelligent Non-invasive Diagnosis System) is a multimodal breast imaging model, published in Nature Biomedical Engineering, that reached 0.973 AUC and could cut benign biopsies by up to 32.4%.

Top AI News

Top Healthcare AI Benchmark

Bedside Bets

Healthcare AI rounds, partnerships, and market moves.

Pulse Check

Quick reads across health AI.

TOP PAPER

🧬 BINDS helped radiologists cut benign breast biopsies by up to 32.4%

Source: Nature Biomedical Engineering · 19 May 2026

BINDS, or Breast cancer Intelligent Non-invasive Diagnosis System, is a deep learning system for breast cancer imaging. It combines ultrasound, mammography and magnetic resonance imaging (MRI) to estimate cancer risk and classify cancer subtype.

The key idea is workflow fit. BINDS starts with the scans most clinics already use first: ultrasound and/or mammography. It then adds MRI only when the early result is uncertain. That matters because MRI is more sensitive, but also more expensive and harder to use as a default test.
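
The escalation logic can be sketched in a few lines. This is a minimal illustration, not the BINDS implementation: the function name, the risk scores and both thresholds (0.2 and 0.8) are hypothetical placeholders, and the paper's actual decision rule may differ.

```python
from typing import Callable, Optional

# Hypothetical cut-offs for illustration only; not taken from the paper.
LOW_RISK = 0.2
HIGH_RISK = 0.8


def two_stage_risk(
    first_line_risk: float,
    mri_risk: Optional[Callable[[], float]] = None,
) -> tuple[float, bool]:
    """Return (risk, used_mri).

    first_line_risk: score from ultrasound and/or mammography (stage 1).
    mri_risk: callable producing a score once MRI is included (stage 2);
              it is called only when the stage-1 score is uncertain.
    """
    uncertain = LOW_RISK < first_line_risk < HIGH_RISK
    if uncertain and mri_risk is not None:
        return mri_risk(), True
    return first_line_risk, False


print(two_stage_risk(0.05, lambda: 0.5))  # confident stage-1 result, MRI skipped
print(two_stage_risk(0.55, lambda: 0.9))  # uncertain result, MRI consulted
```

Passing the MRI step as a callable mirrors the workflow argument: the more expensive scan is only requested when the first-line result falls in the uncertain band.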

Research question

  • Can a breast imaging AI model improve diagnosis and help avoid unnecessary needle biopsies, while still working when hospitals have different combinations of scans available?

Source: Li et al.

Approach

  • Built a two-stage model that follows routine breast imaging workflow.

  • Used ultrasound, mammography and MRI, with flexible inputs when one scan type is missing.

  • Trained and validated the system on 27,048 participants from 8 centres and 7 public datasets.

  • Used pathology images during training to help the model learn features linked to real tissue diagnosis.

  • Compared BINDS with junior and senior radiologists in a reader study of 208 BI-RADS 4 lesions.

  • Released PyTorch code, preprocessing scripts and model weights on GitHub.

Results

  • The two-stage BINDS workflow reached 0.973 AUC for cancer risk assessment on the internal test cohort.

  • It reached 0.941 AUC on an external cancer risk assessment cohort.

  • BINDS outperformed junior radiologists in trimodal diagnosis, with 0.933 accuracy versus 0.894.

  • With BINDS support, senior radiologists cut benign biopsies by 32.4%, from 37 to 25.

  • Junior radiologists cut benign biopsies by 22.5%, from 40 to 31.

  • The reduction focused on benign lesions, while biopsy rates for malignant lesions were maintained.
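
The two headline percentages follow directly from the biopsy counts reported above. A quick check, using a helper name (`percent_reduction`) that is ours rather than the paper's:

```python
def percent_reduction(before: int, after: int) -> float:
    """Relative drop from `before` to `after`, as a percentage."""
    return 100 * (before - after) / before


senior = percent_reduction(37, 25)  # senior radiologists: 37 -> 25 benign biopsies
junior = percent_reduction(40, 31)  # junior radiologists: 40 -> 31 benign biopsies

print(f"senior: {senior:.1f}%")  # senior: 32.4%
print(f"junior: {junior:.1f}%")  # junior: 22.5%
```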

Caveats

  • The work was retrospective, so prospective clinical testing is still needed.

  • The in-house data came from medical centres in China, which may limit generalisability.

  • Paired radiology and pathology data came from one centre, so the radiology-pathology alignment learned during training may not transfer to other sites.

  • The system used B-mode ultrasound only, not Doppler or elastography.

Potential impact: If validated prospectively, models like BINDS could help clinicians reach a confident diagnosis through multimodal imaging before moving to invasive biopsy. That could reduce unnecessary procedures, lower costs and patient burden, and reserve biopsies for cases where tissue confirmation is still needed.

TOP NEWS

Smarter MHRA regulation could give healthcare AI companies a faster route into the NHS

Source: Politics UK

MHRA Chief Executive Lawrence Tallon argued that AI regulation should become a catalyst for safe adoption, not only a barrier to entry. His case has three parts:

  • The current pathway is too narrow. It was built mainly around image-recognition tools, especially in radiology, while the next wave includes large language models, large medical language models and adaptive systems.

  • Regulation should count the risk of inaction. Tallon argued that proven tools should not be kept from clinicians and patients when they are demonstrably better than current practice.

  • Approval should become an ongoing process. Tallon wants less “high jump” and more “hurdles race,” with proportionate checks, real-world evidence, post-market monitoring and repeated assessment in NHS settings.

That matters because public trust, clinical safety and commercial adoption are now tied together.

Why it matters: Good regulation gives useful, well-tested AI models the best chance of being implemented in healthcare. For companies, evidence can translate into adoption, not just another pilot. For the NHS, the best tools get a clearer route to changing care while still being monitored after deployment.

TOP HEALTHCARE AI BENCHMARK

χ-Bench shows healthcare AI agents still fail most long, policy-heavy workflows

χ-Bench, or Clinical Healthcare In-Situ Benchmark, is a healthcare agent benchmark. It tests whether frontier AI agents can complete realistic, end-to-end healthcare operations.

The benchmark covers provider prior authorisation, payer utilisation management and care management. These are exactly the kinds of policy-heavy workflows where hospitals, payers and vendors want automation, but where errors can create delays, denials or unsafe handoffs.

What stands out

  • Tests agents in high-fidelity healthcare software environments.

  • Covers long workflows with policy retrieval, multi-role handoffs and multi-turn interactions.

  • Uses simulated healthcare apps exposed through Model Context Protocol (MCP) tools.

  • Includes a managed-care operations handbook of 1,279 documents.

  • Evaluates 30 agent harness/model configurations.

  • Best performance reached only 28.0% task resolution when agents got one attempt at each task.

  • No agent cleared 20% when asked to complete the same task successfully three times in a row.

  • Task resolution on long sequences of connected tasks dropped to 3.8%, showing agents became much less reliable across extended workflows.
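
The gap between the one-attempt score and the three-in-a-row score is easier to see with a toy calculation. The sketch below is an assumption about how such rates can be computed, not χ-Bench's actual scoring code; the task names and outcomes are made up.

```python
def resolution_rates(attempts: dict[str, list[bool]]) -> tuple[float, float]:
    """Return (single_attempt_rate, all_attempts_rate) across tasks.

    attempts maps a task id to the outcomes of its repeated attempts.
    single_attempt_rate: share of tasks whose first attempt succeeded.
    all_attempts_rate: share of tasks where every attempt succeeded.
    """
    n = len(attempts)
    first = sum(outcomes[0] for outcomes in attempts.values()) / n
    every = sum(all(outcomes) for outcomes in attempts.values()) / n
    return first, every


# Made-up outcomes for four hypothetical tasks, three attempts each.
toy = {
    "prior_auth_1": [True, True, True],
    "prior_auth_2": [True, False, True],
    "care_mgmt_1": [False, False, True],
    "util_mgmt_1": [False, False, False],
}

print(resolution_rates(toy))  # (0.5, 0.25)
```

A task only counts towards the stricter rate if every attempt succeeds, so an agent that is right some of the time scores far lower on consistency than on a single try.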

Developer value: For healthcare AI builders, χ-Bench is a stress test for product readiness. It shows where to harden agents before deployment: policy lookup, tool use, handoffs, consent checks and recovery from mistakes.

NEWSLETTER BY:
Dr Ezekiel Dinama

MD and PhD Researcher at Cambridge University applying physics-informed ML/AI to neurophysiological research.
