In This Issue
Winter Bridge on The Grainger Foundation Frontiers of Engineering
December 13, 2024 Volume 54 Issue 4
This issue features articles by The Grainger Foundation US Frontiers of Engineering 2024 symposium participants. The articles examine cutting-edge developments in microbiology and health, artificial intelligence, the gut-brain connection, and digital twins.

Artificial Social Intelligence? On the Challenges of Socially Aware and Ethically Informed Large Language Models

Author: Maarten Sap

AI systems should be reshaped so that they are socially aware and ethically informed.

In an era where artificial intelligence systems like large language models (LLMs) have become pervasive tools across various sectors (including high-stakes ones like healthcare and law), the question arises: Do these AI systems possess the necessary social intelligence to interact seamlessly and safely with humans? While LLMs such as GPT-4 have exhibited remarkable progress in processing and generating human-like text, their capabilities in understanding, interpreting, and responding to complex social cues remain limited.

In my research group, we explore questions at the intersection of natural language processing (NLP) and society along three different directions. First, we build methods to examine and improve the social intelligence of NLP and AI systems. For example, we recently quantified LLMs’ ability to respond correctly to non-literal statements (e.g., sarcasm, metaphors, exaggerations), finding that most modern LLMs interpret statements much more literally than humans would (Yerukola et al. 2024).
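
To make this kind of evaluation concrete, here is a minimal sketch (not the code from Yerukola et al. 2024) of how one might test whether a model's reply addresses the intended, non-literal meaning of a remark or only its literal surface form. The query_llm helper and the example item are illustrative assumptions, not part of the published study.

```python
# Illustrative sketch, not the authors' evaluation code: checking whether a
# chat model's reply addresses the intended (non-literal) meaning of a remark
# or only its literal surface form. query_llm is a hypothetical stand-in for
# any chat-model API.
def query_llm(prompt: str) -> str:
    """Hypothetical call to a chat model; replace with a real API client."""
    raise NotImplementedError

def judge_interpretation(statement: str, reply: str, intended: str, literal: str) -> str:
    """Use an LLM-as-judge prompt (a human annotator could be used instead) to
    decide which reading the reply addressed: 'intended', 'literal', or 'other'."""
    verdict = query_llm(
        f"Speaker says: '{statement}'\nReply: '{reply}'\n"
        f"Does the reply address the intended meaning ('{intended}') "
        f"or the literal meaning ('{literal}')? Answer with one word: intended, literal, or other."
    )
    return verdict.strip().lower()

# Example item in the spirit of the sarcasm case mentioned above (hypothetical):
item = {
    "statement": "Oh great, another Monday morning meeting.",
    "intended": "The speaker is annoyed about yet another meeting.",
    "literal": "The speaker is genuinely delighted about the meeting.",
}
```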

Second, my research group also explores the ethical and societal implications of NLP and AI systems, building frameworks to measure whom these technologies do and do not work for (Santy et al. 2023). For example, we were the first to uncover strong racial biases in hate speech detection systems (Sap et al. 2019), to document biases against Queer identities in LLM pretraining data (Dodge et al. 2021), and to measure toxicity in LLMs (Gehman et al. 2020).

Finally, we focus on developing AI and NLP systems for social good. For example, I build language technologies that can help detect, explain, and combat socially biased and hateful text (Mun et al. 2023; Sap et al. 2017, 2020; X. Zhou et al. 2023), and systems to computationally analyze the social effects of stories to foster human-human connection (Antoniak et al. 2024; Shen et al. 2023).

In this article, I will dive into three projects at the outer edges of social intelligence in LLMs and discuss the ethical implications inherent in their design and deployment.

Sotopia: Testing the Social Intelligence of AI via Interactions

Humans navigate their world through intricate social interactions, coordinating, collaborating, and competing with one another to achieve a myriad of social goals. Yet existing AI systems, such as LLMs, often falter in social contexts, lacking the nuanced understanding that these human interactions require (Sap et al. 2022; Shapira et al. 2024). In large part, this is because LLMs have become capable enough that traditional methods of evaluating their abilities are now obsolete: their formats (e.g., single-turn question answering) are overly simplistic and lack the realism of how AI systems are actually used, namely in complex multi-turn interactions with users.

Along with my students and collaborators, I developed Sotopia, a simulation environment for measuring LLMs’ ability to interact in pursuit of social goals. This novel computational framework allows us to quantify how well LLMs navigate social interactions, examining multiple dimensions of interaction success, such as goal completion, adherence to social norms, and relationship preservation.
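
For readers who want a concrete picture, the sketch below shows in simplified Python the general shape of such a role-play evaluation: two agents with private social goals exchange turns, and an evaluator scores the episode along several dimensions. It is a hedged illustration, not the actual Sotopia codebase; agent_reply and rate_dimension stand in for LLM (or human) calls, and the dimension names are paraphrased.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    persona: str
    goal: str  # private social goal, not revealed to the other agent

@dataclass
class Episode:
    scenario: str
    agents: tuple[Agent, Agent]
    transcript: list[str] = field(default_factory=list)

# Paraphrased evaluation dimensions; the framework scores several such axes.
DIMENSIONS = ["goal_completion", "social_norms", "relationship", "knowledge"]

def agent_reply(agent: Agent, episode: Episode) -> str:
    """Hypothetical LLM call: produce the agent's next in-character utterance."""
    raise NotImplementedError

def rate_dimension(episode: Episode, dimension: str) -> float:
    """Hypothetical evaluator (LLM-as-judge or human) scoring one dimension."""
    raise NotImplementedError

def run_episode(episode: Episode, max_turns: int = 20) -> dict[str, float]:
    """Alternate turns between the two agents, then score the full transcript."""
    for turn in range(max_turns):
        speaker = episode.agents[turn % 2]
        episode.transcript.append(f"{speaker.name}: {agent_reply(speaker, episode)}")
    return {dim: rate_dimension(episode, dim) for dim in DIMENSIONS}
```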

Sotopia presents LLMs with a diverse tapestry of scenarios in which they must engage in role-play, mimicking real-world social interactions from cooperative tasks to complex negotiations. Across our 900 simulations, we found that LLMs still struggle to navigate social interactions appropriately: they perform significantly worse than humans, even when they are interacting with humans.

In subsequent work led by my student (X. Zhou et al. 2024), we examined why LLMs struggle with interactions. We found that these AI systems struggle with the phenomenon of information asymmetry, an aspect that underlies all human interactions and dictates that interlocutors do not have access to each other’s internal thoughts or mental states. Due to their training paradigm, LLMs are not equipped to handle this asymmetry of information, and thus often fail to interact in natural and intelligent ways. Our findings underscore the need for a robust platform like Sotopia to refine and enhance the social intelligence of future AI systems, and for better methods to develop LLMs that are truly socially intelligent.

Relying on the Unreliable: The Overconfidence of LLMs

As LLMs are used in more interactional settings with lay users, it becomes increasingly important not only to measure their abilities in vitro (e.g., via benchmarks or simulations) but also to examine how users interpret the LLMs’ outputs. In particular, users’ trust in and reliance on LLMs’ outputs present a substantial safety risk, as LLMs are known to confidently output false information (K. Zhou, Jurafsky, and Hashimoto 2023).

My summer intern and I, along with collaborators, developed a method to measure how users rely on LLMs’ expressions of certainty or confidence (e.g., “I’m sure the answer is…”) and uncertainty (e.g., “I don’t know, maybe it’s…”). We first discovered that LLMs output many more expressions of certainty than of uncertainty. Alarmingly, LLMs are often wrongly overconfident, expressing certainty alongside incorrect outputs nearly 50% of the time. We traced this overconfidence to later stages of the LLM training pipeline, in which models are optimized to satisfy human preferences via a process called reinforcement learning from human feedback.
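
As a rough illustration of this kind of measurement (not the study’s exact protocol or marker lexicon), one can tag each model answer with a few certainty and uncertainty marker phrases and compute how often the “certain” answers are wrong. The marker lists and record format below are assumptions for the sketch.

```python
# Simple marker-based tagging; the study's actual lexicon and labels may differ.
CERTAIN_MARKERS = ("i'm sure", "i am certain", "definitely", "without a doubt")
UNCERTAIN_MARKERS = ("i don't know", "maybe", "i'm not sure", "possibly")

def certainty_label(answer: str) -> str:
    text = answer.lower()
    if any(m in text for m in UNCERTAIN_MARKERS):
        return "uncertain"
    if any(m in text for m in CERTAIN_MARKERS):
        return "certain"
    return "plain"

def overconfidence_rate(records: list[dict]) -> float:
    """Fraction of certainty-marked answers that are incorrect.
    Each record is assumed to look like {"answer": str, "correct": bool}."""
    confident = [r for r in records if certainty_label(r["answer"]) == "certain"]
    if not confident:
        return 0.0
    return sum(not r["correct"] for r in confident) / len(confident)
```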

Then, we developed a method to measure how LLM confidence affects user reliance. We created a self-incentivized game in which users are asked hard trivia questions. For each question, users can choose to rely on an LLM answer suggestion, gaining points only if they correctly rely on the AI. We found that LLM-generated expressions of certainty cause users to rely on the answer suggestions, even when the LLM is wrong.
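
A minimal sketch of the scoring and analysis logic for such a game might look as follows; the field names, point values, and framing labels are illustrative assumptions, not the study’s actual design.

```python
def score_round(relied_on_ai: bool, ai_correct: bool, points: int = 1) -> int:
    """Players earn points only when they rely on the AI and the AI is right."""
    return points if (relied_on_ai and ai_correct) else 0

def reliance_rate_by_framing(rounds: list[dict]) -> dict[str, float]:
    """How often users adopt the AI suggestion, split by whether it was framed
    with certainty or uncertainty. Each round is assumed to look like
    {"framing": "certain" or "uncertain", "relied_on_ai": bool, "ai_correct": bool}."""
    rates: dict[str, float] = {}
    for framing in ("certain", "uncertain"):
        subset = [r for r in rounds if r["framing"] == framing]
        if subset:
            rates[framing] = sum(r["relied_on_ai"] for r in subset) / len(subset)
    return rates
```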

These findings highlight a missing focus in AI safety (i.e., user perception and reliance on AI outputs) and point to the need to reassess the training pipeline of LLMs for increased safety.

ParticipAI: Democratizing AI Development and Risk Assessment

The findings above highlight the implications of natural language being the interface between AI systems and humans, enabled by general-purpose systems such as LLMs. Not only are AI systems’ outputs now understandable to anyone, but lay users can also instruct LLMs to accomplish any task they might want (e.g., via the GPT Store[1]). This opens the door to a virtually limitless set of use cases that people could instruct AI systems to carry out, including AI dilemmas: use cases of AI that could simultaneously cause harms and benefits (e.g., a mental health chatbot can help save a life but can also backfire and endorse someone’s suicidal ideation).

In research led by my student (Mun et al. 2024), we created ParticipAI, a participatory surveying framework for anticipating future AI use cases, harms, and benefits. In contrast to current AI development, safeguarding, and governance approaches, which are driven predominantly by industry and policymakers, our framework aims to include lay users in the AI development process. In our surveying framework, lay participants imagine what tasks they might use AI systems for 10 years in the future, brainstorm the possible harms and benefits of allowing or not allowing AI systems to do those tasks, and judge the acceptability of the future AI use case.
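
Purely as an illustration (not the actual ParticipAI survey instrument), one might represent each participant response and aggregate the acceptability judgments roughly as below; all field names and the rating scale are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean

@dataclass
class UseCaseResponse:
    participant_id: str
    use_case: str                      # imagined AI task ~10 years in the future
    harms_if_developed: list[str]
    benefits_if_developed: list[str]
    harms_if_not_developed: list[str]
    benefits_if_not_developed: list[str]
    acceptability: int                 # e.g., 1 (unacceptable) to 5 (acceptable)

def acceptability_by_use_case(responses: list[UseCaseResponse]) -> dict[str, float]:
    """Average acceptability judgment per imagined use case."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for r in responses:
        grouped[r.use_case].append(r.acceptability)
    return {case: mean(scores) for case, scores in grouped.items()}
```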

Through a comprehensive study engaging a demographically diverse group of 295 participants, we discovered that public concerns often diverge from the commercial aims of current AI development. The framework illuminated a vast array of envisioned harms, from societal distrust in AI technologies to fears of techno-solutionism. Notably, participants’ perceptions of the consequences of not developing certain AI applications heavily influenced their judgments of whether those applications should be pursued.

Our findings emphasize the need for inclusive frameworks like ParticipAI to guide the ethical development and governance of AI technologies, resonating with broader societal values and concerns.

Conclusion: Towards Socially Aware and Ethically Informed AI

While the capabilities of LLMs continue to advance, their journey toward safely embodying true social intelligence remains in its infancy. The development of environments like Sotopia, research into the implications of AI overconfidence, and democratizing frameworks like ParticipAI represent crucial steps in reshaping AI to be both socially aware and ethically informed.

Looking ahead, continuous interdisciplinary efforts will be essential in bridging the gap between AI capabilities and human social norms. This involves not only technical advancements but also fostering an inclusive dialogue around the ethical, cultural, and societal ramifications of AI. Together, we can pave the way for AI systems that not only assist in human endeavors but do so with a profound understanding of the social fabric they operate within.

References

Antoniak M, Mire J, Sap M, Ash E, Piper A. 2024. Where do people tell stories online? Story detection across online communities. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 7104–30. Association for Computational Linguistics.

Dodge J, Sap M, Marasović A, Agnew W, Ilharco G, Groeneveld D, Mitchell M, Gardner M. 2021. Documenting large webtext corpora: A case study on the colossal clean crawled corpus. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 1286–1305. Association for Computational Linguistics.

Gehman S, Gururangan S, Sap M, Choi Y, Smith NA. 2020. RealToxicityPrompts: Evaluating neural toxic degeneration in language models. In: Findings of the Association for Computational Linguistics: EMNLP 2020, 3356–69. Association for Computational Linguistics.

Mun J, Allaway E, Yerukola A, Vianna L, Leslie S-J, Sap M. 2023. Beyond denouncing hate: Strategies for countering implied biases and stereotypes in language. In: Findings of the Association for Computational Linguistics: EMNLP 2023, 9759–77. Association for Computational Linguistics.

Mun J, Jiang L, Liang J, Cheong I, DeCario N, Choi Y, Kohno T, Sap M. 2024. Particip-AI: A democratic surveying framework for anticipating future AI use cases, harms and benefits. In: Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, Oct 23, San Jose, California.

Santy S, Liang J, Bras RL, Reinecke K, Sap M. 2023. NLPositionality: Characterizing design biases of datasets and models. In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics. Online at https://doi.org/10.18653/v1/2023.acl-long.505.

Sap M, Bras RL, Fried D, Choi Y. 2022. Neural theory-of-mind? On the limits of social intelligence in large LMs. In: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 3762–80. Association for Computational Linguistics.

Sap M, Card D, Gabriel S, Choi Y, Smith NA. 2019. The risk of racial bias in hate speech detection. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 1668–78. Association for Computational Linguistics.

Sap M, Gabriel S, Qin L, Jurafsky D, Smith NA, Choi Y. 2020. Social bias frames: Reasoning about social and power implications of language. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 5477–90. Association for Computational Linguistics.

Sap M, Prasettio MC, Holtzman A, Rashkin H, Choi Y. 2017. Connotation frames of power and agency in modern films. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics. Online at https://doi.org/10.18653/v1/d17-1247.

Shapira N, Levy M, Alavi SH, Zhou X, Choi Y, Goldberg Y, Sap M, Shwartz V. 2024. Clever Hans or neural theory of mind? Stress testing social reasoning in large language models. In: Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), 2257–73. Association for Computational Linguistics.

Shen J, Sap M, Colon-Hernandez P, Park H, Breazeal C. 2023. Modeling empathic similarity in personal narratives. In: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 6237–52. Association for Computational Linguistics.

Yerukola A, Vaduguru S, Fried D, Sap M. 2024. Is the pope Catholic? Yes, the pope is Catholic. Generative evaluation of non-literal intent resolution in LLMs. In: Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 265–75. Association for Computational Linguistics.

Zhou K, Jurafsky D, Hashimoto T. 2023. Navigating the grey area: Expressions of overconfidence and uncertainty in language models. In: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 5506–24. Association for Computational Linguistics.

Zhou X, Su Z, Eisape T, Kim H, Sap M. 2024. Is this the real life? Is this just fantasy? The misleading success of simulating social interactions with LLMs. In: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics (forthcoming).

Zhou X, Zhu H, Yerukola A, Davidson T, Hwang JD, Swayamdipta S, Sap M. 2023. COBRA frames: Contextual reasoning about effects and harms of offensive statements. In: Findings of the Association for Computational Linguistics: ACL 2023, 6294–6315. Association for Computational Linguistics.

 


[1]  https://openai.com/index/introducing-the-gpt-store/

About the Author: Maarten Sap is an assistant professor at Carnegie Mellon University.