Imagine you want to build a smart assistant that can understand your question, make a plan to answer it step-by-step, execute those steps, adjust the plan if needed, and finally give you a helpful answer.
Sounds complicated? Let me break it down simply, so anyone — even if you’re new to coding — can understand how such an agent works, why it’s useful, and how to build a basic version yourself.
What Is a Plan-and-Execute Agent?
Think of this agent like a chef preparing a complex dish:
- Step 1: Plan the recipe — decide what ingredients and steps are needed.
- Step 2: Execute the recipe — prepare and cook each step one by one.
- Step 3: Adjust the recipe if something goes wrong — maybe add more salt or cook longer.
- Step 4: Present the final dish — serve the completed meal.
In AI terms:
- The input is your question.
- The agent plans a list of steps to find the answer.
- It executes each step, like searching the web or analyzing data.
- If results need tweaking, it re-plans.
- Finally, it gives you the final answer.
Why Build Such an Agent?
- To handle complex questions that can’t be answered in one step.
- To improve accuracy by breaking problems down.
- To mimic human reasoning by planning, acting, and adapting.
- To make AI assistants more reliable and interactive.
Key Concepts You Should Know
| Term | Meaning |
| --- | --- |
| Model | The AI brain (like ChatGPT) that generates text or ideas. |
| Tools | Extra helpers like web search APIs to get information. |
| State | Memory of what's happened so far — input, plan, past results. |
| Graph | A map of steps (nodes) and connections (edges) defining flow. |
| Node | A single task or step in the process (e.g., planning, execution). |
| Edge | The path telling the agent where to go next based on conditions. |
| Checkpoint | Saving the state so you can continue or recover from interruptions. |
| Agent | The whole system combining model, tools, state, and flow control. |
| Gradio | A simple way to create a web interface for your AI agent. |
How Does the Agent Work? The Roadmap
- User Input: You ask a question.
- Planning: The agent creates a plan (list of steps).
- Execution: It performs the first step.
- Re-planning: Based on results, it either:
- Continues with remaining steps, or
- Updates the plan if needed, or
- Ends with a final answer.
- Repeat Execution & Re-planning: Until done.
- Output: The agent gives you the final response.
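The roadmap above can be sketched as a plain-Python loop before we bring in any frameworks. The `plan_steps` and `execute_step` functions here are hypothetical stubs standing in for real LLM calls, just to show the control flow:

```python
def plan_steps(question):
    # Stub planner: a real agent would ask an LLM for these steps
    return [f"Look up background on: {question}", "Summarize the findings"]

def execute_step(step):
    # Stub executor: a real agent would run the step with an LLM and tools
    return f"Result of '{step}'"

def run(question):
    plan = plan_steps(question)      # 1. plan
    past_steps = []
    while plan:
        step = plan.pop(0)           # 2. execute the first remaining step
        past_steps.append((step, execute_step(step)))
        # 3. re-plan: here we simply continue; a real agent could revise `plan`
    return f"Final answer based on {len(past_steps)} steps"  # 4. respond

print(run("What is LangGraph?"))
```

The rest of the tutorial replaces these stubs with a real model, real tools, and a graph that manages the loop for us.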
Building a Simple Plan-and-Execute Agent — Step-by-Step
What You Need
- Programming language: Python
- Libraries:
  - `langchain_openai` — to use AI models (like GPT-4o-mini)
  - `langgraph` — to build and manage the workflow graph
  - `langchain_community.tools` — for the web search tool
  - `pydantic` — for structured data management
  - `gradio` — for a simple web UI
- OpenAI API key: To access GPT models
- Environment: A Python environment with internet
Step 1: Setup and Import Libraries
from langchain_openai import ChatOpenAI
from langchain_community.tools import TavilySearchResults
from langgraph.graph import START, END, StateGraph
from langgraph.checkpoint.memory import MemorySaver
from pydantic import BaseModel, Field
from typing import List, Tuple
from typing_extensions import TypedDict
import gradio as gr
import os

# Make sure your API keys are set before running:
# OPENAI_API_KEY for the model, TAVILY_API_KEY for the search tool
assert "OPENAI_API_KEY" in os.environ, "Please set OPENAI_API_KEY"
Step 2: Define the Data Structures (State)
We keep track of everything happening during the conversation:
class State(TypedDict):
    input: str                         # the user's question
    plan: List[str]                    # steps still to execute
    past_steps: List[Tuple[str, str]]  # executed steps and their results
    response: str                      # final answer once ready
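To make the state concrete, here is an illustrative snapshot mid-run (the question and values are made up for this example):

```python
state = {
    "input": "Who won the 2022 World Cup?",
    "plan": ["Summarize the search results"],          # steps still to run
    "past_steps": [("Search for the 2022 World Cup winner",
                    "Argentina won the 2022 World Cup")],
    "response": "",                                    # empty until the replanner finishes
}
```

Each node in the graph reads from this dictionary and returns the keys it wants to update.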
Step 3: Setup the AI Model and Tools
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
search_tool = TavilySearchResults(max_results=3)
tools = [search_tool]
Step 4: Create the Planner
This is where the agent figures out how to answer:
class Plan(BaseModel):
    steps: List[str] = Field(description="Ordered steps to answer the question")
planner_prompt = """
You are a helpful assistant. Given a question, create a step-by-step plan to find the answer.
Question: {input}
Plan:
"""
# Ask the model for a structured Plan object instead of parsing free text
structured_planner = llm.with_structured_output(Plan)

def planner(state: State) -> dict:
    prompt = planner_prompt.format(input=state["input"])
    plan = structured_planner.invoke(prompt)
    # LangGraph nodes return a partial state update, not a raw object
    return {"plan": plan.steps}
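When the plan comes back as free text rather than a structured object, numbered lines need cleaning up before execution. A small hypothetical helper for that case:

```python
import re

def parse_steps(text):
    # Strip "1." / "1)" style numbering, bullets, and blank lines from an LLM plan
    steps = []
    for line in text.split("\n"):
        line = re.sub(r"^\s*(?:\d+[\.\)]|[-*])\s*", "", line).strip()
        if line:
            steps.append(line)
    return steps

print(parse_steps("1. Search the web\n2. Summarize results\n"))
```

Without this cleanup, steps like `"1. Search the web"` keep their numbering, which then leaks into the executor's prompts.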
Step 5: Create the Executor
Executes one step at a time. (For brevity this version only calls the model; a fuller executor would also invoke the search tool.)
def executor(state: State) -> dict:
    current_step = state["plan"][0]
    prompt = f"Execute this step: {current_step}"
    response = llm.invoke(prompt)  # ChatOpenAI returns an AIMessage
    result = response.content
    return {
        "past_steps": state.get("past_steps", []) + [(current_step, result)],
        "plan": state["plan"][1:],  # remove the executed step
    }
Step 6: Replanning or Ending
If no steps remain, summarize the past results into a final answer; otherwise continue with the remaining plan. (A fuller replanner would also let the model revise the plan here.)
def replanner(state: State) -> dict:
    if not state["plan"]:
        # No steps left: summarize past results into a final answer
        summary_prompt = f"Based on these results: {state['past_steps']}, give a final answer."
        response = llm.invoke(summary_prompt)
        return {"response": response.content}
    else:
        # Steps remain: keep executing the current plan
        return {"plan": state["plan"]}
Step 7: Build the Workflow Graph
We define how to move from planning → executing → replanning → ending.
graph = StateGraph(State)
graph.add_node("planner", planner)
graph.add_node("executor", executor)
graph.add_node("replanner", replanner)

def decide_next(state: State):
    # After replanning: stop if we have a final response, otherwise keep executing
    if state.get("response"):
        return END
    return "executor"

graph.add_edge(START, "planner")  # the planner is the entry point
graph.add_edge("planner", "executor")
graph.add_edge("executor", "replanner")
graph.add_conditional_edges("replanner", decide_next, {"executor": "executor", END: END})
Step 8: Add State Checkpointing
Save progress so agent can resume or recover:
memory = MemorySaver()
compiled_graph = graph.compile(checkpointer=memory)
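Once a checkpointer is attached, every invocation needs a `thread_id` in its config so LangGraph knows which saved conversation to resume. The id value here is arbitrary — one per user or session:

```python
config = {"configurable": {"thread_id": "user-42"}}
# result = compiled_graph.invoke({"input": "your question"}, config)
```

Calling the graph again with the same `thread_id` continues from the saved state; a new id starts fresh.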
Step 9: Create a Gradio UI
A simple web interface to chat with your agent:
def run_agent(query):
    inputs = {"input": query}
    # The checkpointer needs a thread_id to identify this conversation
    config = {"recursion_limit": 50, "configurable": {"thread_id": "demo"}}
    final_response = ""
    for event in compiled_graph.stream(inputs, config):
        # Each event is keyed by node name; look for the replanner's final response
        for output in event.values():
            if isinstance(output, dict) and output.get("response"):
                final_response = output["response"]
    return final_response
iface = gr.Interface(fn=run_agent, inputs="text", outputs="text", title="Plan-and-Execute AI Agent")
iface.launch()
Common Questions
Q: Why do we need planning and replanning?
A: Complex questions can’t be answered in one go. Planning breaks the task into manageable steps. Replanning helps adapt if some steps didn’t give expected results.
Q: Can this work with other models?
A: Yes! You can use GPT-3.5, GPT-4, or any other supported OpenAI model — just change the `model` argument passed to `ChatOpenAI`.
Q: What if a step requires web search?
A: That’s where the search tool comes in. The executor can use it to fetch real-time info.
Q: Can this be used for other tasks?
A: Absolutely! Any problem that benefits from stepwise reasoning — like troubleshooting, research, tutoring — can use this architecture.
Real-World Use Cases
- Customer support bots: Plan diagnosis steps, execute checks, and give answers.
- Research assistants: Break research queries into sub-questions.
- Automation: Plan and execute workflows stepwise.
- Education: Stepwise tutoring agents.
Summary
You now understand:
- What a plan-and-execute AI agent is and why it’s useful.
- How to think about the process: input → plan → execute → replan → output.
- The key components: model, tools, state, graph, nodes, edges.
- How to build a simple version with Python, LangChain, LangGraph, the OpenAI API, and Gradio.