
Automating Thought of Search: A Journey Towards Soundness and Completeness

The relentless pursuit of automating complex tasks has always been at the heart of artificial intelligence. Among these tasks, generating search components for planning problems has traditionally required significant human input, particularly in the iterative feedback loop. But imagine if this process could be fully automated—Daniel Cao, Michael Katz, Harsha Kokel, Kavitha Srinivas, and Shirin Sohrabi explore this possibility in their groundbreaking paper, "Automating Thought of Search: A Journey Towards Soundness and Completeness."

The Core Idea

The crux of their research is the automation of sound and complete search components using large language models (LLMs). The goal? To eliminate human intervention from the iterative feedback loop integral to the Thought of Search (ToS) framework, aiming for 100% accuracy in solving planning problems.

The Technical Approach

Their approach guides LLMs to autonomously generate search components—specifically the successor function (succ) and goal test (isgoal)—through automated feedback mechanisms. Here’s a glimpse into the process:

  1. Initial Prompting: The LLM is first prompted to generate the successor function and goal test.

  2. Goal Function Check: The generated goal test is validated using predefined unit tests to ensure it accurately identifies goal and non-goal states.

  3. Successor Function Soundness Check: The successor function is verified for soundness by running it through a modified Breadth-First Search (BFS) or Depth-First Search (DFS) algorithm with additional checks.

  4. Successor Function Completeness Check (Optional): Completeness is confirmed by ensuring the function generates all known successors for given states.

If any test fails, automated feedback prompts the LLM to revise the code until all tests pass or a predefined iteration limit is reached.
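In code, the loop above might be sketched as follows. The function and parameter names here (`auto_tos_loop`, `soundness_search`, and so on) are illustrative assumptions, not the paper's actual API; only the overall shape — generate, test, feed errors back, stop at a call limit — follows the description above:

```python
def auto_tos_loop(llm, goal_tests, soundness_search, max_calls=10):
    """Sketch of an AutoToS-style feedback loop (illustrative names).

    `llm` maps a prompt string to candidate Python code defining
    succ(state) and isgoal(state); `goal_tests` is a list of
    (state, expected_bool) pairs; `soundness_search` runs search-based
    checks and returns (passed, feedback_message).
    """
    prompt = "Write Python functions succ(state) and isgoal(state) for the domain."
    for _ in range(max_calls):
        code = llm(prompt)
        namespace = {}
        try:
            exec(code, namespace)
            succ, isgoal = namespace["succ"], namespace["isgoal"]
        except Exception as err:
            prompt = f"Your code failed to load: {err!r}. Please fix it."
            continue
        # Step 2: validate the goal test against labelled example states.
        failed = [s for s, expected in goal_tests if isgoal(s) != expected]
        if failed:
            prompt = f"isgoal misclassified these states: {failed}. Revise it."
            continue
        # Steps 3-4: search-based soundness (and optional completeness) checks.
        passed, feedback = soundness_search(succ, isgoal)
        if passed:
            return succ, isgoal
        prompt = feedback
    return None  # call limit reached without passing all tests
```

Each failed check turns into the next prompt, so the LLM only ever sees concrete counterexamples rather than human-written critiques.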

Distinctive Features

What sets AutoToS apart is its complete automation of the feedback loop. Rather than relying on human experts, AutoToS uses unit tests and debugging statements, ensuring the generated search components are both sound and complete—crucial for reliable planning.
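The soundness check can be pictured as an ordinary BFS augmented with a validity hook on every generated successor. The `is_valid_step` callback below is a hypothetical stand-in for the paper's domain-specific unit tests and debugging statements, which flag the exact parent/child pair that violates soundness:

```python
from collections import deque

def bfs_with_soundness_check(start, succ, isgoal, is_valid_step, max_states=10_000):
    """BFS that searches for a goal and, at the same time, flags the
    successor function as unsound the moment it emits an invalid state.

    `is_valid_step(parent, child)` is a domain-specific check
    (illustrative name). Returns (goal_state, violation): exactly one
    of the two is non-None unless the search exhausts the state space.
    """
    frontier = deque([start])
    seen = {repr(start)}
    while frontier and len(seen) <= max_states:
        state = frontier.popleft()
        if isgoal(state):
            return state, None  # goal found, no soundness violation
        for child in succ(state):
            if not is_valid_step(state, child):
                # Soundness violation: report the offending pair as feedback.
                return None, (state, child)
            if repr(child) not in seen:
                seen.add(repr(child))
                frontier.append(child)
    return None, None  # exhausted without finding a goal or a violation
```

The returned (parent, child) pair is precisely the kind of concrete evidence an automated feedback prompt can hand back to the LLM.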

Another unique feature is its evaluation across multiple LLMs. The research showcases the robustness of AutoToS across various models of different sizes.

Experimental Setup and Results

The team tested AutoToS on five representative search problems: BlocksWorld, PrOntoQA, Mini Crossword, 24 Game, and Sokoban. Using various LLMs, including GPT-4o, Llama3.1, and DeepSeek-CoderV2, they conducted experiments with a maximum of 10 calls per function, repeated five times.
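To make the succ/isgoal interface concrete, here is a hand-written pair for the 24 Game, one of the five benchmark domains. (In AutoToS these functions are generated by the LLM, not written by hand; this sketch only illustrates what a sound and complete pair looks like.) A state is the list of remaining numbers; a move combines two of them with one arithmetic operation:

```python
from fractions import Fraction

def succ(state):
    """Successors of a 24 Game state: pick an ordered pair of numbers,
    combine them with +, -, *, or /, and keep the rest of the state."""
    successors = []
    for i in range(len(state)):
        for j in range(len(state)):
            if i == j:
                continue
            rest = [state[k] for k in range(len(state)) if k not in (i, j)]
            a, b = state[i], state[j]
            results = [a + b, a - b, a * b]
            if b != 0:
                results.append(Fraction(a) / b)  # exact division, no float error
            for r in results:
                successors.append(rest + [r])
    return successors

def isgoal(state):
    """A state is a goal when one number remains and it equals 24."""
    return len(state) == 1 and state[0] == 24

print(isgoal([24]))    # True
print(isgoal([6, 4]))  # False
```

Ordered pairs make the commutative operations appear twice; that costs a few redundant states but keeps the function obviously complete, which is the property the optional check in step 4 verifies.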

The results were nothing short of remarkable:

  • AutoToS achieved 100% accuracy across all domains.

  • The number of calls to the language model was comparable to that of ToS with human feedback.

  • Partial soundness tests significantly improved accuracy.

  • Completeness checks for successors further enhanced performance.

Advantages and Limitations

Advantages

  • High Accuracy: Achieves 100% accuracy in generating sound and complete search components.

  • Minimal Human Intervention: Eliminates the need for human experts in the feedback loop.

  • Scalability: Maintains consistent performance across various LLMs and problem domains.

Limitations

  • Unit Test Dependency: Relies on predefined unit tests and partial soundness checks, which may not always be readily available.

  • Model-Specific Performance Variability: Smaller models may need more iterations to match the accuracy of larger models.

Conclusion

AutoToS represents a significant leap forward in automating complex AI planning tasks. By leveraging automated feedback mechanisms, it removes the need for human intervention while maintaining high accuracy. Its robustness across different models and problem domains makes it a pivotal advancement in AI planning with LLMs. Future research could explore automating unit test generation and fine-tuning smaller models for even better performance.

In essence, AutoToS opens new horizons in AI planning by automating tasks that once required extensive human oversight, paving the way for greater efficiency and scalability.

🚀 Explore the Paper: Interested in how far LLM-driven planning can go without a human in the loop? This paper is a must-read.

Subscribe for more insights like this!