ReSpAct: Harmonizing Reasoning, Speaking, and Acting Towards Building Large Language Model-Based Conversational AI Agents

Vardhan Dongre; Xiaocheng Yang; Emre Can Acikgoz; Suvodip Dey; Gokhan Tur; Dilek Hakkani-Tur

ReSpAct: Harmonizing Reasoning, Speaking, and Acting Towards Building Large Language Model-Based Conversational AI Agents

Vardhan Dongre, Xiaocheng Yang, Emre Can Acikgoz, Suvodip Dey, Gokhan Tur, Dilek Hakkani-Tur

Abstract

Large language model (LLM)-based agents have been increasingly used to interact with external environments (e.g., games, APIs, etc.) and solve tasks. However, current frameworks do not enable these agents to work with users and interact with them to align on the details of their tasks and reach user-defined goals; instead, in ambiguous situations, these agents may make decisions based on assumptions. This work introduces ReSpAct (Reason, Speak, and Act), a novel framework that synergistically combines the essential skills for building task-oriented “conversational” agents. ReSpAct addresses this need for agents, expanding on the ReAct approach. ReSpAct framework enables agents to interpret user instructions, reason about complex tasks, execute appropriate actions and engage in dynamic dialogue to seek guidance, clarify ambiguities, understand user preferences, resolve problems, and use the intermediate feedback and responses of users to update their plans. We evaluated ReSpAct with GPT-4 in environments supporting user interaction, such as task-oriented dialogue (MultiWOZ) and interactive decision-making (Alfworld, WebShop), ReSpAct is flexible enough to incorporate dynamic user feedback and addresses prevalent issues like error propagation and agents getting stuck in reasoning loops. This results in more interpretable, human-like task-solving trajectories than baselines relying solely on reasoning traces. In two interactive decision-making benchmarks, AlfWorld and WebShop, ReSpAct outperforms strong reasoning-only method ReAct by an absolute success rate of 6% and 4%, respectively. In the task-oriented dialogue benchmark MultiWOZ, ReSpAct improved Inform and Success scores by 5.5% and 3%, respectively.

Anthology ID:: 2025.iwsds-1.7
Volume:: Proceedings of the 15th International Workshop on Spoken Dialogue Systems Technology
Month:: May
Year:: 2025
Address:: Bilbao, Spain
Editors:: Maria Ines Torres, Yuki Matsuda, Zoraida Callejas, Arantza del Pozo, Luis Fernando D'Haro
Venues:: IWSDS | WS
SIG:
Publisher:: Association for Computational Linguistics
Note:
Pages:: 72–102
Language:
URL:: https://aclanthology.org/2025.iwsds-1.7/
DOI:
Bibkey:
Cite (ACL):: Vardhan Dongre, Xiaocheng Yang, Emre Can Acikgoz, Suvodip Dey, Gokhan Tur, and Dilek Hakkani-Tur. 2025. ReSpAct: Harmonizing Reasoning, Speaking, and Acting Towards Building Large Language Model-Based Conversational AI Agents. In Proceedings of the 15th International Workshop on Spoken Dialogue Systems Technology, pages 72–102, Bilbao, Spain. Association for Computational Linguistics.
Cite (Informal):: ReSpAct: Harmonizing Reasoning, Speaking, and Acting Towards Building Large Language Model-Based Conversational AI Agents (Dongre et al., IWSDS 2025)
Copy Citation:
PDF:: https://aclanthology.org/2025.iwsds-1.7.pdf

PDF Cite Search Fix data