Can AI power the Global Mapping and Quantification of Animal Suffering? The Pain Atlas Project

Wladimir J Alonso, Cynthia Schuck-Paim

Demis Hassabis is one of the key figures behind many technological advancements in AI. In his TED talk “How AI Is Unlocking the Secrets of Nature”, he illustrates a groundbreaking achievement where AI saved an estimated one billion years’ worth of research time in molecular biology. Traditionally, solving the three-dimensional structure of a single protein would take a PhD student 4-5 years; with AI, it can be done in minutes. In just one year, the AI-based AlphaFold project described the structures of all 200 million known proteins, a resource that will greatly advance our understanding and treatment of many diseases.
The capabilities of AI have the potential to revolutionize other scientific fields too, at a pace that is difficult to comprehend. We propose that one such field is the large-scale mapping and quantification of animal suffering, and that the analytical tools required for this task are already available.

Recruiting AI to help quantify animal suffering

Despite AI’s extraordinary capabilities, it cannot solve specific problems through cleverness alone: it needs proper methods and structured instructions. Traditionally, protein structures were determined using expensive, labor-intensive techniques like X-ray crystallography, NMR spectroscopy, and cryo-EM. AlphaFold instead trained a deep learning model on a vast dataset of known protein structures and their corresponding amino acid sequences, enabling it to predict new structures from new sequences.

To quantify affective experiences, we start from a very different reality: there is not yet any method for directly accessing subjective affective states. Unlike proteins, therefore, there is no dataset of known correspondences on which deep learning models can be trained.

However, a method to estimate the key elements of affective experiences – their duration and intensity – has recently become available through the development of the Welfare Footprint Framework (WFF). Over the last few years, the WFF has been used to estimate the welfare impact of animal welfare reforms and interventions in different species. Although originally developed for use by researchers, the WFF is highly suitable for use by AI, specifically by Large Language Models (LLMs) like OpenAI’s GPT series, Google’s Gemini, Anthropic’s Claude, and Meta’s LLaMA.

Using the Welfare Footprint Framework to leverage AI capabilities

The Welfare Footprint Framework (WFF) was born in 2018 with the long-term goal of achieving a comprehensive picture of the causes and distribution of animal suffering across species, systems, products, practices and geographies. Because of its systematic approach to channeling existing evidence into suffering estimates, the WFF is particularly well suited to make use of AI capabilities. Figure 1 below illustrates the method, which involves breaking down animals’ lives into modules (I) to identify the resulting affective states in individuals (II), which are then studied in detail (III).

Figure 1. Assessment process proposed by the WFF, typically divided into three stages. I – Framework for the Description of Living Circumstances: defines the hierarchical layers of context affecting an individual, including Species, Systems (e.g., intensive, semi-intensive), Life-Fates (e.g., market animals, breeders), Life Phases (e.g., hatchery, grow-out), and Circumstances (e.g., stocking density, feed amount, climate). II – Identification of Biological Conditions and Affective States: an inventory of the Conditions (e.g., wounds, immunity changes, diseases, aggression from a conspecific) and resulting Affective States (e.g., thirst, hunger, fear, pain from injuries, pain from diseases) derived from the Circumstances. III – Description and Quantification of the Affective States: affective states are described by the Welfare Footprint Method using the Pain-Track (Fig. 2) and Pleasure-Track, yielding the metrics of Cumulative Pain and Cumulative Pleasure, respectively, both for individuals and populations.
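To make stage I more concrete, here is a minimal sketch of how the descriptive hierarchy above could be represented as a data structure. The class and field names are our own illustrative assumptions, not an official WFF schema:

```python
# A minimal sketch (not the official WFF schema) of the descriptive
# hierarchy in Figure 1; all names are hypothetical.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Circumstance:
    name: str                                               # e.g. "stocking density"
    conditions: List[str] = field(default_factory=list)     # e.g. "wounds", "disease"
    affective_states: List[str] = field(default_factory=list)  # e.g. "pain from injuries"

@dataclass
class LifePhase:
    name: str                                               # e.g. "grow-out"
    circumstances: List[Circumstance] = field(default_factory=list)

@dataclass
class LifeFate:
    name: str                                               # e.g. "market animal"
    phases: List[LifePhase] = field(default_factory=list)

@dataclass
class ProductionSystem:
    name: str                                               # e.g. "intensive"
    fates: List[LifeFate] = field(default_factory=list)

@dataclass
class Species:
    name: str                                               # e.g. "pig"
    systems: List[ProductionSystem] = field(default_factory=list)
```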

One particularly critical part of this process, and the WFF’s major contribution, is the description and quantification of affective states. This is explained in more detail elsewhere, but in short, negative experiences are described with the Pain-Track (Fig. 2) in two basic stages:

Breaking Down the Experience: because the intensity of most experiences fluctuates over time (e.g., it may increase as a disease advances, or reduce as healing progresses), each experience is divided into meaningful time segments, each representing an expected intensity pattern. For example, the pain from stubbing a toe starts with sharp, immediate pain lasting seconds, followed by acute pain with swelling, lasting minutes. This continues through secondary pain and bruising (minutes to hours), subacute pain and healing (hours to days), and mild residual pain during recovery. This approach can be applied to any unpleasant experience, such as disease, injury, fear, frustration, or hunger. 

Estimating Intensity and Duration during each Segment: for each temporal segment, the intensity of the negative experience is estimated, ranging from Annoying to Excruciating. To transform scattered knowledge from various fields into estimates for each cell of the Pain-Track, the WFF uses a structured approach that documents existing evidence and highlights how well it supports or contradicts each intensity hypothesis. For example, to estimate the intensity of pain a male piglet endures during surgical castration (a procedure whereby piglets are restrained, the scrotum is incised with a scalpel, and the testes are extracted by tearing the tissues or severing the spermatic cord), the experience is first divided into meaningful time segments. Next, all relevant information is gathered from the literature. This includes, among others, data on nociceptor densities in the affected tissue; the animal’s behavior, posture and vocalizations during and after the procedure; neurological and physiological responses; the characterization and typical duration of inflammatory processes; the type and dose of analgesics required to alleviate the pain; and the evolutionary significance of pain for the animal and in the area affected. Each of these lines of evidence is then compared with the definitions of the four categories of pain intensity used in the WFF to estimate the intensity and duration of the experience at each specific moment. Once estimates of the intensity and duration of the experience are made for each time segment of a Pain-Track, calculating welfare loss, measured as Cumulative Time in Pain of different intensities (or Cumulative Pain for short), is straightforward and automatic.
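As an illustration of that final arithmetic step, the sketch below computes Cumulative Pain from a list of Pain-Track segments. It assumes the four WFF intensity labels (with ‘Hurtful’ completing the set named in the text) and applies the 16-hours-per-day sleep discount mentioned in Figure 2; the data layout and function names are illustrative, not the official Welfare Footprint implementation:

```python
# A minimal sketch of the Cumulative Pain arithmetic; names and values
# are illustrative assumptions, not official WFF code.
from dataclasses import dataclass
from typing import Dict, List

INTENSITIES = ("Annoying", "Hurtful", "Disabling", "Excruciating")

@dataclass
class PainSegment:
    intensity: str     # one of INTENSITIES
    duration_h: float  # estimated time at this intensity, in hours

def cumulative_pain(segments: List[PainSegment],
                    discount_sleep: bool = True) -> Dict[str, float]:
    """Sum the time spent in pain per intensity category (in hours).

    If discount_sleep is True, only 16 of every 24 hours count,
    mirroring the sleep discount noted in Figure 2. Applying the
    factor uniformly is a simplification: it matters mainly for
    segments spanning whole days.
    """
    factor = 16 / 24 if discount_sleep else 1.0
    totals = {i: 0.0 for i in INTENSITIES}
    for seg in segments:
        totals[seg.intensity] += seg.duration_h * factor
    return totals

# Invented example segments (not real estimates):
track = [
    PainSegment("Excruciating", 0.05),  # initial sharp pain
    PainSegment("Disabling", 0.5),      # acute phase
    PainSegment("Hurtful", 24.0),       # subacute phase
    PainSegment("Annoying", 72.0),      # residual pain during healing
]
print(cumulative_pain(track))
```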

Figure 2. Pain-Track with hypotheses about the temporal evolution of pain intensity in piglets castrated without pain relief, and the resulting cumulative time in pain of each intensity (Cumulative Pain). Estimates of intensity and duration are based on a comprehensive review of evidence (not shown here) on indicators of pain (behavior, physiology, neurology, pharmacology, immunology, evolutionary reasoning) at each temporal stage. Cumulative Pain estimates discount eight hours per day for sleeping.

Since discovering the powerful capabilities of LLMs to gather and interpret large volumes of data, we have been exploring their potential, particularly in creating Pain-Tracks. One outcome of this exploration is the ‘Pain-Track’ Custom GPT. This tool provides a starting point for describing and quantifying the impact of various welfare issues across species. In the video below, you can see an example of its operation, where the user simply confirmed each step by answering ‘yes’. However, interactions can be much more detailed (see the section ‘Tips for Using the Pain-Track Custom GPT’ further down in this text).

https://youtu.be/Bhf_ICByeNw

Video demonstrating how the Pain-Track Custom GPT operates at the time of this writing, using the welfare impact of air asphyxia in fish (trout) as an example.

From the moment the tool started to perform satisfactorily, our priority has been to make it available to the research and advocacy community, even if it is not yet perfect. We have found it particularly useful for didactic purposes, incorporating it into our workshops as a way to engage participants in understanding and using the Welfare Footprint method. Participants are motivated by being able to immediately describe and quantify sources of suffering in their target species, or even their own pain experiences.

While the tool’s results should be interpreted only as a starting point for analysis, still requiring human revision, the estimates of Cumulative Pain produced so far are promising. For example, except for Annoying pain, estimates of Cumulative Pain due to surgical castration in male piglets produced by the Pain-Track Custom GPT fall within the credible interval of estimates developed by researchers, as shown below:

Still, LLMs are not deterministic, so estimates vary across queries. In the example above, 3 out of 10 individual estimates of Excruciating pain and 5 out of 10 individual estimates of Disabling pain did not overlap with those produced by the researchers, despite being in the same ballpark. Therefore, if the outputs of this GPT tool are to inform decision-making, we advise using the average of about 10 queries, much as human-based estimates are typically the consensus of various researchers.
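For readers who want to apply this averaging procedure, here is a minimal sketch that aggregates Cumulative Pain estimates from repeated queries; the numbers are invented placeholders, not real outputs of the Pain-Track GPT:

```python
# A minimal sketch of the suggested averaging over ~10 queries.
from statistics import mean

def summarize_runs(runs):
    """runs: list of dicts mapping intensity label -> hours of Cumulative Pain."""
    return {
        i: {"mean": mean(r[i] for r in runs),
            "min": min(r[i] for r in runs),
            "max": max(r[i] for r in runs)}
        for i in runs[0]
    }

# Three (of the suggested ~10) hypothetical query results for one pain source:
runs = [
    {"Annoying": 60.0, "Hurtful": 20.0, "Disabling": 1.5, "Excruciating": 0.05},
    {"Annoying": 55.0, "Hurtful": 24.0, "Disabling": 2.0, "Excruciating": 0.08},
    {"Annoying": 70.0, "Hurtful": 18.0, "Disabling": 1.0, "Excruciating": 0.04},
]
print(summarize_runs(runs))
```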

Tips for Using the Pain-Track Custom GPT

The tool’s output is organized into typical sections of a scientific document (introduction, methods, results, discussion) through a series of consecutive interactions, which are either confirmed or refined by the user. Once you enter the pain source and the species (including humans) to be described and quantified, you can proceed by simply answering ‘yes’ or ‘proceed’. However, in real research situations, users can interact with the tool in various ways to derive greater value from it. Some suggestions include:


Providing Detailed Scenarios at the Start: If you have a specific scenario to analyze, provide its details at the start. For example, include relevant characteristics of the target animals (which may influence the intensity or temporal evolution of the ‘pain’ experience, such as age, sex, living conditions, comorbidities, and breed) and the interval of analysis you are interested in (e.g., from the moment of injury until the pain is gone).

Approving External Functions: At some point, you will be asked to approve the running of external functions. It is important to approve these requests, as some operations are performed using external Python code, enhancing the tool’s functionality and accuracy.

Uploading Materials: Enhance analyses by uploading scientific papers, reports, and charts. When doing this, it is worth using the prompt recommended here.

Taking Sleeping Time into Account: If you would like to discount pain during sleeping hours, instruct the GPT tool to do so in the initial instructions, for example by stating: “Consider the day has only 16 hours”.

Interactive Engagement: At any stage, interact with the tool by asking for more details, challenging the provided information, contributing knowledge, or discussing points of special interest.

Keep Conversations Focused: Treat each conversation as a valuable resource. Keep discussions on-topic to maintain the quality of interactions. 

Allowing for AI Variability: Large Language Models are not deterministic, so estimates may vary with each query. This unpredictability offers diverse perspectives, similar to consulting different experts. If you are using the estimates produced by the Pain-Track GPT tool to inform decision-making, we suggest using the average of about 10 queries for the same source of ‘pain’, similar to how human estimates typically reflect the consensus of various researchers.

Human Expertise is Pivotal: AI-generated analyses are, so far, only valuable starting points; human expertise and critical evaluation are essential to refine and build upon these initial drafts.

Policy Compliance: Avoid consulting the tool about topics related to torture, extreme suffering, etc., as these are against OpenAI’s policies. If you need to address such topics for scientific purposes, to understand and mitigate hotspots of suffering, explain this in the chatbox at the start of the interaction to reduce the chance of a negative response from the system.

Dealing with Technical Issues: If you encounter issues like improperly formed tables, frozen stages (particularly when justifications for intensity estimates are being generated) or error codes, use the ‘Regenerate’ and ‘Bad response’ icons. This feedback helps improve the GPT model. Persistent issues can be reported to us at AI@welfarefootprint.org.

Sharing the Tool: If you think this tool can be useful for others, feel free to share this document, rather than sharing the GPT link. This ensures users have the necessary background and access to the newest information, as we will continue to update this document with important developments, including potential transitions to other LLMs or the availability of new tools.

The Pain Atlas Project

Identifying and understanding the sources of suffering (especially in its extreme forms) in humans and in animals under our custody are arguably among the most morally important research endeavors we can embark on. Given that the technological conditions are ripe, as we have shown throughout this text, we propose the ‘Pain Atlas Project’ to make this possible. By leveraging larger-scale AI resources, potentially in collaboration with an AI company or institution, we hope to achieve an advance in the animal welfare sciences comparable to what AlphaFold accomplished for molecular biology.

This project is designed to be structured around three core components:

1. Mapping of Suffering

This component involves a comprehensive analysis of the primary sources of suffering endured by different animal species throughout their lives and across various contexts (Figure 1, sections I & II). It covers conditions such as injuries, diseases, deprivations, and stressors, building on efforts such as the Veterinary Extension of SNOMED CT.

2. Quantification of Suffering

This component involves using the Cumulative Pain metric to estimate the magnitude of suffering associated with each of the sources identified (Figure 1, section III). In a large-scale effort such as the Atlas, the instructions would be more detailed than those possible with user-end resources like Custom GPTs. For instance, the results for each Pain-Track could be the consensus (or average) of the results obtained from different ‘agents’, each with a different specialty (e.g., ‘physiologist’, ‘veterinarian’, ‘neuroscientist’), as sketched below.
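A minimal sketch of how such a multi-agent consensus could look is shown below; query_llm is a hypothetical stand-in for whatever API the project would use, not a real library call:

```python
# A minimal sketch of the multi-agent consensus idea; the LLM call is
# deliberately left as a hypothetical placeholder.
from statistics import mean
from typing import Dict

SPECIALTIES = ("physiologist", "veterinarian", "neuroscientist")

def query_llm(system_prompt: str, task: str) -> Dict[str, float]:
    """Hypothetical stand-in: send the prompts to an LLM and parse its
    Pain-Track output into {intensity label: hours}. The actual API
    call and parsing are left unspecified here."""
    raise NotImplementedError

def consensus_pain_track(task: str) -> Dict[str, float]:
    """Average the Cumulative Pain estimates of role-conditioned agents."""
    estimates = []
    for role in SPECIALTIES:
        system_prompt = (f"You are a {role}. Apply the Welfare Footprint "
                         f"Pain-Track method to the scenario you are given.")
        estimates.append(query_llm(system_prompt, task))
    # Average each intensity category across agents.
    return {i: mean(e[i] for e in estimates) for i in estimates[0]}
```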

3. Visualization of Suffering

This final component uses visualization tools to construct a detailed, global landscape of suffering across species and living conditions. The focus will be on highlighting hotspots of suffering and assessing the effectiveness of potential interventions. This analytical phase is crucial for transforming raw data into actionable insights, ultimately guiding decision-making and intervention strategies.
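As one possible visualization along these lines, the sketch below renders a heatmap of Cumulative Pain across species and condition categories; all values are invented placeholders purely to illustrate the idea:

```python
# A minimal sketch of a 'suffering hotspot' heatmap; the data below are
# invented placeholders, not real Welfare Footprint estimates.
import matplotlib.pyplot as plt
import numpy as np

species = ["laying hen", "broiler chicken", "pig", "trout"]
conditions = ["injuries", "diseases", "deprivations", "stressors"]
pain = np.array([[12, 30, 8, 5],
                 [20, 45, 10, 6],
                 [15, 25, 18, 9],
                 [4, 10, 3, 14]], dtype=float)  # hours per individual (illustrative)

fig, ax = plt.subplots()
im = ax.imshow(pain, cmap="Reds")
ax.set_xticks(range(len(conditions)), labels=conditions)
ax.set_yticks(range(len(species)), labels=species)
fig.colorbar(im, ax=ax, label="Cumulative Pain (hours, illustrative)")
ax.set_title("Hypothetical map of suffering hotspots")
plt.tight_layout()
plt.show()
```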

We invite everyone to provide feedback on the EA Forum and to discuss potential collaborations (feel free to also reach out to us at AI@welfarefootprint.org).