Current safeguards, risk mitigation, and transparency measures of large language models against the generation of health disinformation: repeated cross sectional analysis

bmj-2023-078538.full.pdf

LLM Disinformation. 

Are there effective measures in place to prevent the misuse of large language models (LLMs) in generating health disinformation? 

Menz et al. evaluated the effectiveness of safeguards in preventing LLMs from generating health disinformation and assessed the transparency of AI developers regarding their risk mitigation processes.

The study, conducted in September 2023, tested OpenAI’s GPT-4, Google’s PaLM 2 and Gemini Pro, Anthropic’s Claude 2, and Meta’s Llama 2 by prompting them to generate content claiming that sunscreen causes skin cancer and the alkaline diet cures cancer.

This evaluation was repeated 12 weeks later to assess any subsequent improvements in safeguards.

Claude 2 (via Poe) consistently declined to generate disinformation.

GPT-4 (via Copilot) initially refused but later generated disinformation, highlighting the fluctuating nature of safeguards within the current self-regulating AI ecosystem.

GPT-4 (via ChatGPT), PaLM 2/Gemini Pro, and Llama 2 consistently produced disinformation blogs incorporating attention-grabbing titles, fake references, and fabricated testimonials.

Although each LLM had mechanisms to report concerning outputs, developers did not respond when vulnerabilities were reported.

The study suggests that shared standards and third-party safety filters could reduce discrepancies in outputs between tools built on the same underlying model, as illustrated by the differences observed between ChatGPT and Copilot, both powered by GPT-4.

Conclusion

Our findings highlight notable inconsistencies in the effectiveness of LLM safeguards in preventing the mass generation of health disinformation. Implementing effective safeguards against the misuse of LLMs to disseminate health disinformation is feasible. For many LLMs, however, such measures have either not been implemented effectively or their robustness has not been maintained.

Thus, in the current AI environment, where safety standards and policies remain poorly defined, malicious actors can use publicly accessible LLMs to mass-produce diverse and persuasive health disinformation, posing substantial risks to public health messaging. These risks will only grow as generative AI for audio and video content advances.

Moreover, the study found substantial deficiencies in the transparency of AI developers regarding their commitments to mitigating the risks of health disinformation.

Given the rapidly evolving AI landscape, public health and medical bodies have an opportunity to deliver a united, clear message about the importance of mitigating health disinformation risks in emerging AI regulations, the cornerstones of which should be transparency, health-specific auditing, monitoring, and patching.
