Can AI Power the Global Mapping and Quantification of Animal Suffering? The Pain Atlas Project

Wladimir J Alonso, Cynthia Schuck-Paim

One of the key figures behind many technological advancements in AI is Demis Hassabis. In his TED talk “How AI Is Unlocking the Secrets of Nature”, he describes a groundbreaking achievement in which AI saved an estimated one billion years’ worth of research time in molecular biology. Traditionally, solving the three-dimensional structure of a single protein takes a PhD student 4-5 years, but with AI this can be done in minutes. In just one year, the AI-based AlphaFold project predicted the structures of the 200 million known proteins, a contribution that will greatly advance our understanding and treatment of many diseases.
The capabilities of AI have the potential to revolutionize other scientific fields too, at a pace that is difficult to comprehend. We propose that one such field is the large-scale mapping and quantification of animal suffering, and that the analytical tools required for this task are already available.

Recruiting AI to help quantify animal suffering

Despite AI’s extraordinary capabilities, it still requires proper methods and instructions to solve specific problems effectively: AI cannot solve problems through cleverness alone; it needs structured approaches. Traditionally, protein structures were determined using expensive, labor-intensive techniques like X-ray crystallography, NMR spectroscopy, and cryo-EM. AlphaFold instead trained a deep learning model on a vast dataset of known protein structures and their corresponding amino acid sequences, enabling it to predict new structures from new sequences.

To quantify affective experiences, we start from a very different reality, since there is not yet any method for directly accessing subjective affective states. Unlike proteins, therefore, there is no dataset of known correspondences that deep learning models can be trained on.

However, a method to estimate the key elements of affective experiences – duration and intensity – has recently become available through the development of the Welfare Footprint Framework (WFF). Over the last few years, the WFF has been used to estimate the welfare impact of animal welfare reforms and interventions in different species. Although originally developed for use by researchers, the WFF is also highly suitable for use by AI, specifically by Large Language Models (LLMs) like OpenAI’s GPT series, Google’s Gemini, Anthropic’s Claude, and Meta’s LLaMA.

Using the Welfare Footprint Framework to leverage AI capabilities

The Welfare Footprint Framework was born in 2018, with the long-term goal of achieving a comprehensive picture of the causes and distribution of animal suffering across species, systems, products, practices and geographies. Because of its systematic approach to channeling existing evidence into suffering estimates, the WFF is particularly well suited to making use of AI capabilities. The figure below illustrates the method, which involves breaking down animals’ lives into modules (I) to identify the resulting affective states in individuals (II), which are then studied in detail (III).

This figure outlines the assessment process proposed by the WFF, which is typically divided into three stages:

I - Framework for the Description of Living Circumstances: defines the hierarchical layers of context affecting an individual, including Species, Systems (e.g., intensive, semi-intensive), Life-Fates (e.g., market animals, breeders), Life Phases (e.g., hatchery, grow-out), and Circumstances (e.g., stocking density, feed amount, climate).

II - Identification of Biological Conditions and Affective States: inventory of the Conditions (e.g., wounds, immunity changes, diseases, aggression from a conspecific) and resulting Affective States (e.g., thirst, hunger, fear, pain from injuries, pain from diseases) derived from the Circumstances.

III - Description and Quantification of the Affective States: affective states are described by the Welfare Footprint Method using the Pain-Track and Pleasure-Track, resulting in the metrics of Cumulative Pain and Cumulative Pleasure, respectively, both for individuals and populations.
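To make this hierarchy concrete, here is a minimal sketch of how its layers might be represented as a data structure. It is purely illustrative: the class and field names are our own shorthand, not part of the WFF specification.

```python
from dataclasses import dataclass, field

# Illustrative data model of the WFF assessment hierarchy (stages I-II).
# All class and field names are our own shorthand, not official WFF terms.

@dataclass
class AffectiveState:
    name: str                      # e.g., "pain from castration wound"

@dataclass
class Condition:
    name: str                      # e.g., "scrotal wound"
    affective_states: list[AffectiveState] = field(default_factory=list)

@dataclass
class Circumstance:
    description: str               # e.g., "high stocking density"
    conditions: list[Condition] = field(default_factory=list)

@dataclass
class LifePhase:
    name: str                      # e.g., "hatchery", "grow-out"
    circumstances: list[Circumstance] = field(default_factory=list)

@dataclass
class LifeFate:
    name: str                      # e.g., "market animal", "breeder"
    phases: list[LifePhase] = field(default_factory=list)

@dataclass
class System:
    name: str                      # e.g., "intensive", "semi-intensive"
    fates: list[LifeFate] = field(default_factory=list)

@dataclass
class SpeciesAssessment:
    species: str                   # top layer of the hierarchy
    systems: list[System] = field(default_factory=list)
```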

One particularly critical part of this process, and the WFF’s major contribution, is the description and quantification of affective states. This is explained in more detail elsewhere but, in short, negative experiences are described with the Pain-Track, following two basic stages:

Breaking Down the Experience: because the intensity of most experiences fluctuates over time (e.g., it may increase as a disease advances, or reduce as healing progresses), each experience is divided into meaningful time segments, each representing an expected intensity pattern. For example, the pain from stubbing a toe starts with sharp, immediate pain lasting seconds, followed by acute pain with swelling, lasting minutes. This continues through secondary pain and bruising (minutes to hours), subacute pain and healing (hours to days), and mild residual pain during recovery. This approach can be applied to any unpleasant experience, such as disease, injury, fear, frustration, or hunger. 

Estimating Intensity and Duration during each Segment: for each temporal segment, the intensity of the negative experience is estimated, ranging from Annoying to Excruciating. To transform scattered knowledge from various fields into estimates within each cell of the Pain-Track, the WFF uses a structured approach that documents existing evidence and highlights how well it supports or contradicts each intensity hypothesis. For example, to estimate the intensity of pain a male piglet endures during surgical castration, a procedure whereby piglets are restrained, the scrotum incised with a scalpel, and the testes extracted by tearing the tissues or severing the spermatic cord, the experience is first divided into meaningful time segments. Next, all relevant information is gathered from the literature. This includes, among others, data on nociceptor densities in the affected tissue, the animal’s behavior, posture and vocalizations during and after the procedure, neurological and physiological responses, the characterization and typical duration of inflammatory processes, the type and dose of analgesics required to alleviate pain, and the evolutionary significance of pain for the animal and in the area affected. Each of these lines of evidence is then compared with the definitions of the four categories of pain intensity used in the WFF to estimate the intensity and duration of the experience at each of these specific moments. Once estimates of the intensity and duration of the experience are made for each time segment of a Pain-Track, calculating welfare loss, measured as Cumulative Time in Pain of different intensities (or Cumulative Pain for short), is straightforward and automatic, as the sketch below illustrates.
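Because this last step is simple arithmetic, it can be automated trivially. Below is a minimal sketch of how Cumulative Pain might be computed from a Pain-Track; the segment intensities and durations are invented placeholders loosely based on the stubbed-toe example above, not actual WFF estimates, and the sleep discount is a simplified version of the eight-hours-per-day adjustment used in the figure below.

```python
from collections import defaultdict

# The four pain-intensity categories used by the WFF, mildest to most severe.
INTENSITIES = ("Annoying", "Hurtful", "Disabling", "Excruciating")

def cumulative_pain(segments, waking_hours_per_day=None):
    """Sum the time spent at each intensity across Pain-Track segments.

    segments: list of (intensity, duration_in_hours) tuples.
    waking_hours_per_day: if given (e.g., 16), day-spanning segments are
    scaled down to discount sleeping time (a simplifying assumption).
    """
    totals = defaultdict(float)
    scale = waking_hours_per_day / 24 if waking_hours_per_day else 1.0
    for intensity, hours in segments:
        assert intensity in INTENSITIES, f"unknown intensity: {intensity}"
        # Only discount sleep for segments long enough to span sleep periods.
        totals[intensity] += hours * scale if hours >= 24 else hours
    return dict(totals)

# Invented Pain-Track for the stubbed-toe example (placeholder values):
stubbed_toe = [
    ("Excruciating", 0.002),  # sharp, immediate pain (~seconds)
    ("Disabling", 0.2),       # acute pain with swelling (~minutes)
    ("Hurtful", 6.0),         # secondary pain and bruising (~hours)
    ("Annoying", 72.0),       # residual pain during recovery (~days)
]
print(cumulative_pain(stubbed_toe, waking_hours_per_day=16))
# {'Excruciating': 0.002, 'Disabling': 0.2, 'Hurtful': 6.0, 'Annoying': 48.0}
```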

Pain-Track with hypotheses about the temporal evolution of pain intensity in piglets castrated without pain relief, and the resulting cumulative time in pain of each intensity (Cumulative Pain). Estimates of intensity and duration are based on a comprehensive review of evidence (not shown here) on indicators of pain (behavior, physiology, neurology, pharmacology, immunology, evolutionary reasoning) at each temporal stage. Cumulative Pain estimates discount eight hours of sleep per day.

Since learning about the powerful capabilities of LLMs to gather and interpret large volumes of data, we have been experimenting with harnessing their potential, chiefly to assist in the creation of Pain-Tracks. One of the tools resulting from this exercise is the ‘Pain-Track’ Custom GPT, intended to provide a starting point for describing and quantifying the impact of different welfare issues in any species.

The tool’s output is organized into the typical sections of a scientific document (introduction, methods, results, discussion), built through a series of consecutive interactions whose outputs are confirmed or refined by the user. The video below demonstrates its operation, using the welfare impact of air asphyxia in fish (trout) as an example.

Video demonstrating how the Pain-Track Custom GPT operates at the time of this writing, using the welfare impact of air asphyxia in fish (trout) as an example.

While the tool’s results should only be interpreted as a starting point of analysis, still requiring human revision, the estimates of Cumulative Pain produced so far are promising. For example, except for Annoying pain, estimates of Cumulative Pain due to surgical castration in male piglets produced by the Pain-Track GPT tool fall within the credibility interval of the estimates developed by researchers, as shown in the Pain-Track above. Still, LLMs are not deterministic, so estimates vary across queries. In this comparison, 3 out of 10 individual estimates of Excruciating pain and 5 out of 10 individual estimates of Disabling pain did not overlap with those produced by the researchers, despite being in the same ballpark. Therefore, if the outputs of this GPT tool are meant to inform decision-making, we advise using the average of about 10 queries, in the same way that human-based estimates are typically the consensus of various researchers.
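As a worked illustration of this averaging (the numbers below are invented placeholders, not the actual estimates from the comparison above):

```python
import statistics

# Hypothetical Disabling-pain estimates (hours) from 10 independent
# Pain-Track GPT queries for the same welfare issue -- invented values.
estimates = [0.8, 1.4, 0.9, 2.1, 1.1, 0.7, 1.6, 1.0, 1.3, 0.9]

print(f"mean = {statistics.mean(estimates):.2f} h, "
      f"sd = {statistics.stdev(estimates):.2f} h")

# Flag individual queries falling outside a (hypothetical) researcher
# credibility interval, as in the overlap check described above.
low, high = 0.6, 1.5  # illustrative bounds only
outliers = [e for e in estimates if not low <= e <= high]
print(f"{len(outliers)} of {len(estimates)} queries outside the interval")
```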

In all queries, the estimates discussed above were produced using the tool’s default values, but in real research situations users can interact with the tool in many ways to obtain greater value from it. Some suggestions include:

Uploading Materials: Enhance analyses by uploading scientific papers, reports, and charts. When doing this, it is worth using the prompt recommended here.

Specific Scenarios: If you have a specific scenario to analyze, provide details at the start. For example, include relevant details about the target animals (which may influence the intensity or temporal evolution of the ‘pain’ experience, such as age, sex, living conditions, comorbidities, and breed, among others), and the interval of analysis you are interested in (e.g., from the moment of injury until pain is gone).

Sleeping time: If you would like to discount pain during sleeping hours, instruct the tool to do so in your initial instructions, for example by stating: “Consider the day has only 16 hours.” (An example opening prompt combining these suggestions appears after this list.)

Interactive Engagement: At any stage, interact with the tool by asking for more details, challenging the provided information, contributing knowledge, or discussing points of special interest.

Focused Conversations: Treat each conversation as a valuable resource. Keep discussions on-topic to maintain the quality of interactions. 

Approving External Functions: At some point, you will be asked to approve the running of external functions. It is important to approve these requests, as some operations are performed using external Python code, enhancing the tool’s functionality and accuracy.
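As an illustration of how the Specific Scenarios and Sleeping time suggestions might combine into an opening prompt (the wording below is our own example, not an official template):

```python
# Illustrative opening prompt for the Pain-Track Custom GPT, combining the
# "Specific Scenarios" and "Sleeping time" suggestions above.
opening_prompt = (
    "Analyse the pain caused by surgical castration in 5-day-old male "
    "piglets, performed without anaesthesia or analgesia, from the moment "
    "of restraint until the pain has fully resolved. Consider the day has "
    "only 16 hours, so that time asleep is not counted as time in pain."
)
```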

From the moment the tool started to perform satisfactorily, our priority has been to make it available to the research and advocacy community, even if it is not yet perfect. We have found it particularly useful for didactic purposes, incorporating it into our workshops as a way to engage participants in understanding and using the Welfare Footprint method. Participants are motivated by being able to immediately describe and quantify sources of suffering in their target species, or even their own pain experiences.

The tool’s existing limitations are primarily due to the constraints of current chat interfaces. Instructions for Custom GPTs, for example, are limited to 8,000 characters (although we offload parts of the analytical process by adding functions and uploading content to the knowledge base), preventing the tool from taking full advantage of the existing capabilities of LLMs. Still, in all cases, it is important to take the following points into account:

LLM Variability: Large Language Models are not deterministic, so estimates are likely to vary with each query. This variability offers diverse perspectives, similar to consulting different experts. If you are going to use the estimates produced by the Pain-Track GPT tool to inform decision-making, we suggest using the average of about 10 queries for the same source of ‘pain’, in the same way that estimates conducted by humans typically reflect the consensus of various researchers (as we did with the example above).

Human Expertise: AI-generated analyses are, so far, only valuable starting points; human expertise and critical evaluation are essential to refine and build upon these initial drafts.

Policy Compliance: Avoid consulting the tool about topics related to torture, extreme suffering, etc., as these are against OpenAI’s policies. If you need to address such topics for scientific purposes, to understand and mitigate hotspots of suffering, explain this in the chatbox at the start of the interaction to reduce the chance of a negative response from the system.

Technical Issues: If you encounter issues like improperly formed tables, frozen stages (particularly when justifications for intensity estimates are being generated) or error codes, use the ‘Regenerate’ and ‘Bad response’ icons. This feedback helps improve the GPT model. Persistent issues can be reported to us at AI@welfarefootprint.org.

If you think this tool can be useful for others, feel free to share this document, rather than sharing the GPT link. This ensures users have the necessary background and access to the newest information, as we will continue to update this document with important developments, including potential transitions to other LLMs or the availability of new tools.

The Pain Atlas Project

It is difficult to imagine anything more important than mapping the sources of suffering in humans and in the animals under our care. Quantifying suffering provides critical data that can inform decision-making in multiple contexts, including the development of more effective animal welfare policies, the identification of key areas for intervention and research priorities, the optimization of resource allocation, and increased public awareness.

We envision a megaproject, the “Pain Atlas Project” (or “AlphaPain”), in collaboration with an AI company or institution, to achieve a leap in the animal welfare sciences similar to what AlphaFold achieved for molecular biology.

The project is structured around three core components:

1. Mapping of Suffering

This component involves a comprehensive analysis of the primary sources of suffering endured by different animal species throughout their lives and across various contexts. These include injuries, diseases, deprivations, and stressors, building on efforts such as the Veterinary Extension of SNOMED CT. The primary aim is to construct a detailed inventory of pain sources, laying the groundwork for subsequent quantification efforts.

2. Quantification of Suffering

This component involves using the Cumulative Pain metric to estimate the magnitude of suffering associated with each of the sources identified. In a large-scale effort such as the Atlas, the instructions would be more detailed than those allowed by user-end resources like Custom GPTs. For instance, the results for each Pain-Track should be the consensus (or average) of the results obtained from different ‘agents’, each with a different specialty (e.g., ‘physiologist’, ‘veterinarian’, ‘neuroscientist’), as sketched after this list.

3. Visualization of Suffering

This final component involves the use of visualization tools to construct a detailed, global landscape of suffering across species and living conditions. The focus will be on highlighting hotspots of suffering and the effectiveness of potential interventions. This analytical phase is critical for transforming raw data into actionable insights, ultimately guiding decision-making and intervention strategies.
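Returning to the quantification component, here is a minimal sketch of what such a multi-agent consensus might look like, using the OpenAI chat API as one possible backend. The model name, persona prompts, and the omitted parsing step are all illustrative assumptions, not a description of an existing pipeline.

```python
from openai import OpenAI  # pip install openai; one possible backend choice

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

SPECIALTIES = ["physiologist", "veterinarian", "neuroscientist"]

def agent_estimate(specialty: str, welfare_issue: str) -> str:
    """Ask one specialist 'agent' for a Pain-Track estimate (sketch only)."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system",
             "content": (f"You are a {specialty}. Using the Welfare Footprint "
                         "Framework, estimate the time (in hours) of Annoying, "
                         "Hurtful, Disabling and Excruciating pain caused by "
                         "the welfare issue below, with justifications.")},
            {"role": "user", "content": welfare_issue},
        ],
    )
    return response.choices[0].message.content

# One answer per specialty; parsing the numeric estimates out of the text
# and averaging them into a consensus estimate is omitted for brevity.
answers = [
    agent_estimate(s, "surgical castration of male piglets without pain relief")
    for s in SPECIALTIES
]
```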

We invite everyone to provide feedback on the EA Forum and to discuss potential collaborations (feel free to also reach out to us at AI@welfarefootprint.org).
