Democratising science with AI

BCS SGAI member Dr Mercedes Arguello Casteleiro explores how AI may work with society rather than just for it by enabling citizen scientists to participate in the scientific process.

The 21st century brings major global challenges, including climate change and rapid population ageing, which call for transdisciplinary involvement from academics, students, companies and societal organisations. Scientific progress requires reliable knowledge, yet explainability, complex reasoning and the reliable handling of facts remain hard limits for today’s AI.

Against this backdrop, a group of AI practitioners from the BCS Specialist Group on AI (SGAI) has been exploring how AI can be made more accessible and inclusive for citizen scientists. Specifically, the group has been looking at democratising AI through low-code/no-code (LCNC) principles. The group hopes that such an AI-enabled approach, if done rigorously and correctly, may allow wider citizen participation in science and scientific breakthroughs.

Democratising AI

Lately, the world has witnessed significant advancements in AI through the development of large language models (LLMs) trained on text, images, audio, videos, and programming code. Generative AI, such as OpenAI’s ChatGPT, has become well known to the public, and the latest multimodal LLMs such as Google’s Gemini 1.5 Pro are even more versatile.

The UK AI market is projected to grow to over $1 trillion by 2035, driven by the adoption of cloud-based technology and advancements in digitalisation, fostering unique opportunities for One Health approaches (human-animal-environment health). LLMs can also process and learn from chemical molecules, with enormous potential for biomarker and materials discovery.

The distinction between closed-source and open-source LLMs is critical in achieving AI democratisation (in other words, expansion beyond large technology firms and elite universities). Major companies like Google, Facebook, Microsoft and OpenAI release both open-source and closed-source models. AI startup Hugging Face, valued at $4.5 billion (USD), has significantly contributed to the democratisation of AI by providing access to 685,844 LLMs and 154,515 datasets, all open-source and publicly available.

Low-code/no-code

LCNC platforms such as Amazon’s Honeycode, Microsoft’s Power Apps, and Google’s AppSheet are designed for citizen developers at affordable prices. LCNC platforms are highly visual, including drag-and-drop options and ready-made templates.

LCNC principles are not new, and citizens may not need to become citizen developers to be citizen scientists. Excel is a widely used no-code application for manipulating CSV files, a format used for Hugging Face datasets. Hugging Face promotes using LLMs in Google Colab or Amazon SageMaker Studio Lab; both services are free of charge and require no setup.
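
As a rough illustration of how little code the same kind of CSV manipulation needs outside a spreadsheet, the sketch below counts the labels in a small dataset using only Python's standard library. The sample rows, the column names and the count_labels helper are invented for illustration and are not taken from any real Hugging Face dataset.

```python
import csv
import io

# Made-up sample standing in for a downloaded CSV dataset file.
SAMPLE_CSV = """text,label
The river water looked clear today,clean
Foam and an oily film near the outflow,polluted
Plenty of mayfly larvae under the stones,clean
"""


def count_labels(csv_text: str, column: str = "label") -> dict[str, int]:
    """Count how often each value appears in one column of a CSV file."""
    counts: dict[str, int] = {}
    # DictReader uses the first row as column names, much like a spreadsheet header.
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts[row[column]] = counts.get(row[column], 0) + 1
    return counts


print(count_labels(SAMPLE_CSV))  # {'clean': 2, 'polluted': 1}
```

In a notebook, the sample text would typically be replaced by the contents of a real file opened with open(...), with the rest left unchanged.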

Besides free access to computing resources, democratising AI with LLMs also requires lowering the technical skills overhead, for example the need to know LLM architectures and training strategies. Among Hugging Face models, it is increasingly common to find a base LLM alongside its instruct (or chat) version, which is intended for prompting; indeed, prompt engineering is emerging as an attractive no-code approach for citizens interested in contributing to solving scientific problems without becoming programming experts. Hugging Face pipelines allow LLMs to be used with minimal Python coding; with some light systematisation, it is feasible to use Hugging Face pipelines with text, images and sound in just three lines of Python code. Updating or altering a few lines of code can be as effective as making changes in a drag-and-drop interface.
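
A minimal sketch of what such a three-line use of a pipeline might look like is given below, assuming the transformers library and a backend such as PyTorch are installed. The wrapper function classify_sentiment and its optional classifier parameter are hypothetical additions made here so the example can be tried without downloading a model; the essential steps are the import, the pipeline(...) call and the call to the resulting object.

```python
from typing import Callable, Optional


def classify_sentiment(texts: list[str], classifier: Optional[Callable] = None) -> list[dict]:
    """Label each text with a Hugging Face pipeline, or with any stand-in callable."""
    if classifier is None:
        # The "three lines" in practice: import, build the pipeline, call it.
        # Assumes `pip install transformers` plus a backend such as PyTorch.
        from transformers import pipeline

        # With no model named, the library picks a default and downloads it on first use.
        classifier = pipeline("sentiment-analysis")
    return list(classifier(texts))
```

Calling classify_sentiment(["Citizen science is rewarding."]) without a stand-in returns a list of dictionaries, each holding a label and a score. Changing the task string, for example to "image-classification" or "automatic-speech-recognition", is how the same pattern would extend to images and sound.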

On 8th May 2024, the SGAI virtual seminar included a presentation with university undergraduates about democratising AI with LLMs. Among the feedback received:

"I thought the students did an excellent job of explaining a complex subject in an understandable way. Well done!" Matt Armstrong-Barnes, Chief Technology Officer, Hewlett Packard Enterprise

"I was impressed by the students and how easy it is to use the models, and am planning to run a few tests myself" Dr Mathias Kern, Senior Research Manager, BT

The slides for the presentation can be found on the SGAI website.

Conclusion

Democratising AI using LLMs through LCNC principles and open-source initiatives can create a more inclusive and dynamic scientific community. The approach may also empower citizen scientists to participate more fully in the scientific process, drive innovation from the ground up, and ensure that the benefits of AI are widely shared.

About the author

Dr Mercedes Arguello Casteleiro is a lecturer in Electronics and Computer Science at the University of Southampton, and has a PhD in Physics. She is interested in "One Health" and "Planetary Health", investigating more transparent and explainable AI models by integrating deep learning with symbolic reasoning (Neurosymbolic AI).