How to Evaluate Mental Health AI

This post was co-written with team members from the Division of Digital Psychiatry, Aoife Keane and Jermic Aryee.

The rise of artificial intelligence (AI) chatbots for mental health is impossible to ignore. The Harvard Business Review reported in 2025 that the leading use case of generative AI is now therapy/companionship. Some patients may even be using AI to prepare for sessions with their therapist. And clinicians are interested, too. According to Cross et al., 79 percent (68/86) of the surveyed mental health professionals expressed comfort in using chatbots for mental healthcare.

Despite this growing interest, there is no established framework to practically ensure that chatbots’ outputs are accurate, ethical, and transparent. Awareness of the need to regulate mental health chatbots has increased significantly, with organizations such as the U.S. Food and Drug Administration, the American Psychological Association, and the American Medical Association releasing policies for evaluating AI models. We can expect updates and new recommendations in the next 12 months as regulators and AI companies collaborate to enhance the safety of the technology. However, many AI models for mental health may fall outside regulation by claiming to offer wellness services rather than care. The distinction may be subtle, but it matters: it likely means you need to make your own decision about using or recommending a given AI tool.

Below, we highlight three promising frameworks that provide valuable guidance and resources for assessing AI chatbots in mental health. Many other frameworks offer tremendous value as well, but these three give a practical starting point.

1. READI Framework

This comprehensive framework, geared specifically toward mental health, covers six domains: safety, privacy, equity, effectiveness, engagement, and implementation. It identifies the shortcomings of earlier frameworks and incorporates the elements they were missing. READI provides numerous evaluation questions for each domain, giving evaluators a clearer picture of whether an AI model is ready to operate in the mental health field.

2. American Psychological Association Checklist

In 2024, the American Psychological Association (APA) released an AI evaluation checklist for practitioners who intend to integrate AI tools into their practice. The checklist pays specific attention to clinical evidence, data privacy, and chatbot utility. Questions such as whether health data is encrypted, whether the tool complies with HIPAA regulations, and whether data is shared with third-party companies are important to consider when using chatbots for mental health care.
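If you review several tools, answers like these can also be recorded as structured data. Below is a minimal sketch in Python; the field names, the example tool name, and the default of treating every question as unanswered until verified are our own illustrative assumptions, not part of the APA’s published checklist.

from dataclasses import dataclass

@dataclass
class ChatbotChecklist:
    """One record per tool; None means the question has not yet been verified."""
    tool_name: str
    health_data_encrypted: bool | None = None
    hipaa_compliant: bool | None = None
    shares_data_with_third_parties: bool | None = None
    clinical_evidence_cited: bool | None = None

    def unresolved(self) -> list[str]:
        # List the checklist questions that still lack a verified answer.
        return [name for name, value in vars(self).items() if value is None]

if __name__ == "__main__":
    # Hypothetical example: only encryption has been verified so far.
    review = ChatbotChecklist(tool_name="ExampleBot", health_data_encrypted=True)
    print("Still needs verification:", review.unresolved())

Running this prints the three remaining questions (HIPAA compliance, third-party sharing, and clinical evidence), which is the point: a tool is not cleared until every answer has been checked.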

3. ACCEPT-AI

The ACCEPT-AI framework advocates for the careful use of pediatric data in AI research, emphasizing thoughtful evaluation of six key domains: age, communication, consent and assent, equity, data protection, and technological considerations, including transparency in AI development and testing. Distinguishing pediatric from adult data is necessary to avoid algorithmic bias. With few studies or standards governing the use of pediatric data in AI models, this framework provides a crucial starting point for future frameworks aimed at minimizing bias.

While these three frameworks offer helpful information, they cannot make the final decision for you about whether an AI system is safe or effective. If unsure, it is best to assume “no” until proven otherwise, as there is clear evidence that some AI chatbots can cause harm, and the evidence of their benefits is mixed and still evolving.
