Imagine you want to build a smart assistant that can understand your question, make a plan to answer it step-by-step, execute those steps, adjust the plan if needed, and finally give you a helpful answer.
Sounds complicated? Let me break it down simply, so anyone — even if you’re new to coding — can understand how such an agent works, why it’s useful, and how to build a basic version yourself.
What Is a Plan-and-Execute Agent?
Think of this agent like a chef preparing a complex dish:
- Step 1: Plan the recipe — decide what ingredients and steps are needed.
- Step 2: Execute the recipe — prepare and cook each step one by one.
- Step 3: Adjust the recipe if something goes wrong — maybe add more salt or cook longer.
- Step 4: Present the final dish — serve the completed meal.
In AI terms:
- The input is your question.
- The agent plans a list of steps to find the answer.
- It executes each step, like searching the web or analyzing data.
- If results need tweaking, it re-plans.
- Finally, it gives you the final answer.
Why Build Such an Agent?
- To handle complex questions that can’t be answered in one step.
- To improve accuracy by breaking problems down.
- To mimic human reasoning by planning, acting, and adapting.
- To make AI assistants more reliable and interactive.
Key Concepts You Should Know
| Term | Meaning |
| --- | --- |
| Model | The AI brain (like ChatGPT) that generates text or ideas. |
| Tools | Extra helpers like web search APIs to get information. |
| State | Memory of what's happened so far — input, plan, past results. |
| Graph | A map of steps (nodes) and connections (edges) defining flow. |
| Node | A single task or step in the process (e.g., planning, execution). |
| Edge | The path telling the agent where to go next based on conditions. |
| Checkpoint | Saving the state so you can continue or recover from interruptions. |
| Agent | The whole system combining model, tools, state, and flow control. |
| Gradio | A simple way to create a web interface for your AI agent. |
How Does the Agent Work? The Roadmap
- User Input: You ask a question.
- Planning: The agent creates a plan (list of steps).
- Execution: It performs the first step.
- Re-planning: Based on results, it either:
- Continues with remaining steps, or
- Updates the plan if needed, or
- Ends with a final answer.
- Repeat Execution & Re-planning: Until done.
- Output: The agent gives you the final response.
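The roadmap above can be sketched as a plain-Python loop before we bring in any frameworks. The `plan_steps` and `execute_step` functions here are hypothetical stubs standing in for real LLM calls, just to show the control flow:

```python
def plan_steps(question):
    # Stub planner: a real agent would ask an LLM for these steps
    return [f"Look up background on: {question}", "Summarize the findings"]

def execute_step(step):
    # Stub executor: a real agent would run the step with an LLM and tools
    return f"Result of '{step}'"

def run(question):
    plan = plan_steps(question)      # 1. plan
    past_steps = []
    while plan:
        step = plan.pop(0)           # 2. execute the first remaining step
        past_steps.append((step, execute_step(step)))
        # 3. re-plan: here we simply continue; a real agent could revise `plan`
    return f"Final answer based on {len(past_steps)} steps"  # 4. respond

print(run("What is LangGraph?"))
```

The rest of the tutorial replaces these stubs with a real model, real tools, and a graph that manages the loop for us.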
Building a Simple Plan-and-Execute Agent — Step-by-Step
What You Need
- Programming language: Python
- Libraries:
  - `langchain_openai` — to use AI models (like GPT-4o-mini)
  - `langgraph` — to build and manage the workflow graph
  - `langchain_community.tools` — for the web search tool
  - `pydantic` — for structured data management
  - `gradio` — for a simple web UI
- OpenAI API key: To access GPT models
- Environment: A Python environment with internet
Step 1: Setup and Import Libraries
from langchain_openai import ChatOpenAI
from langchain_community.tools import TavilySearchResults
from langgraph.graph import START, END, StateGraph
from langgraph.checkpoint.memory import MemorySaver
from pydantic import BaseModel, Field
from typing import List, Tuple
from typing_extensions import TypedDict
import gradio as gr
import os

# Make sure your API keys are set before running:
# OPENAI_API_KEY for the model, TAVILY_API_KEY for the search tool
assert "OPENAI_API_KEY" in os.environ, "Please set OPENAI_API_KEY"
Step 2: Define the Data Structures (State)
We keep track of everything happening during the conversation:
class State(TypedDict):
    input: str                         # the user's question
    plan: List[str]                    # steps still to execute
    past_steps: List[Tuple[str, str]]  # executed steps and their results
    response: str                      # final answer once ready
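To make the state concrete, here is an illustrative snapshot mid-run (the question and values are made up for this example):

```python
state = {
    "input": "Who won the 2022 World Cup?",
    "plan": ["Summarize the search results"],          # steps still to run
    "past_steps": [("Search for the 2022 World Cup winner",
                    "Argentina won the 2022 World Cup")],
    "response": "",                                    # empty until the replanner finishes
}
```

Each node in the graph reads from this dictionary and returns the keys it wants to update.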
Step 3: Setup the AI Model and Tools
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
search_tool = TavilySearchResults(max_results=3)
tools = [search_tool]
Step 4: Create the Planner
This is where the agent figures out how to answer:
class Plan(BaseModel):
    steps: List[str] = Field(description="Ordered steps to answer the question")
planner_prompt = """
You are a helpful assistant. Given a question, create a step-by-step plan to find the answer.
Question: {input}
Plan:
"""
# Ask the model for a structured Plan object instead of parsing free text
structured_planner = llm.with_structured_output(Plan)

def planner(state: State) -> dict:
    prompt = planner_prompt.format(input=state["input"])
    plan = structured_planner.invoke(prompt)
    # LangGraph nodes return a partial state update, not a raw object
    return {"plan": plan.steps}
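When the plan comes back as free text rather than a structured object, numbered lines need cleaning up before execution. A small hypothetical helper for that case:

```python
import re

def parse_steps(text):
    # Strip "1." / "1)" style numbering, bullets, and blank lines from an LLM plan
    steps = []
    for line in text.split("\n"):
        line = re.sub(r"^\s*(?:\d+[\.\)]|[-*])\s*", "", line).strip()
        if line:
            steps.append(line)
    return steps

print(parse_steps("1. Search the web\n2. Summarize results\n"))
```

Without this cleanup, steps like `"1. Search the web"` keep their numbering, which then leaks into the executor's prompts.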
Step 5: Create the Executor
Executes one step at a time. (For brevity this version only calls the model; a fuller executor would also invoke the search tool.)
def executor(state: State) -> dict:
    current_step = state["plan"][0]
    prompt = f"Execute this step: {current_step}"
    response = llm.invoke(prompt)  # ChatOpenAI returns an AIMessage
    result = response.content
    return {
        "past_steps": state.get("past_steps", []) + [(current_step, result)],
        "plan": state["plan"][1:],  # remove the executed step
    }
Step 6: Replanning or Ending
If no steps remain, summarize the past results into a final answer; otherwise continue with the remaining plan. (A fuller replanner would also let the model revise the plan here.)
def replanner(state: State) -> dict:
    if not state["plan"]:
        # No steps left: summarize past results into a final answer
        summary_prompt = f"Based on these results: {state['past_steps']}, give a final answer."
        response = llm.invoke(summary_prompt)
        return {"response": response.content}
    else:
        # Steps remain: keep executing the current plan
        return {"plan": state["plan"]}
Step 7: Build the Workflow Graph
We define how to move from planning → executing → replanning → ending.
graph = StateGraph(State)
graph.add_node("planner", planner)
graph.add_node("executor", executor)
graph.add_node("replanner", replanner)

def decide_next(state: State):
    # After replanning: stop if we have a final response, otherwise keep executing
    if state.get("response"):
        return END
    return "executor"

graph.add_edge(START, "planner")  # the planner is the entry point
graph.add_edge("planner", "executor")
graph.add_edge("executor", "replanner")
graph.add_conditional_edges("replanner", decide_next, {"executor": "executor", END: END})
Step 8: Add State Checkpointing
Save progress so agent can resume or recover:
memory = MemorySaver()
compiled_graph = graph.compile(checkpointer=memory)
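Once a checkpointer is attached, every invocation needs a `thread_id` in its config so LangGraph knows which saved conversation to resume. The id value here is arbitrary — one per user or session:

```python
config = {"configurable": {"thread_id": "user-42"}}
# result = compiled_graph.invoke({"input": "your question"}, config)
```

Calling the graph again with the same `thread_id` continues from the saved state; a new id starts fresh.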
Step 9: Create a Gradio UI
A simple web interface to chat with your agent:
def run_agent(query):
    inputs = {"input": query}
    # The checkpointer needs a thread_id to identify this conversation
    config = {"recursion_limit": 50, "configurable": {"thread_id": "demo"}}
    final_response = ""
    for event in compiled_graph.stream(inputs, config):
        # Each event is keyed by node name; look for the replanner's final response
        for output in event.values():
            if isinstance(output, dict) and output.get("response"):
                final_response = output["response"]
    return final_response
iface = gr.Interface(fn=run_agent, inputs="text", outputs="text", title="Plan-and-Execute AI Agent")
iface.launch()
Common Questions
Q: Why do we need planning and replanning?
A: Complex questions can’t be answered in one go. Planning breaks the task into manageable steps. Replanning helps adapt if some steps didn’t give expected results.
Q: Can this work with other models?
A: Yes! You can use GPT-3.5, GPT-4, or any other supported OpenAI model — just change the `model` argument passed to `ChatOpenAI`.
Q: What if a step requires web search?
A: That’s where the search tool comes in. The executor can use it to fetch real-time info.
Q: Can this be used for other tasks?
A: Absolutely! Any problem that benefits from stepwise reasoning — like troubleshooting, research, tutoring — can use this architecture.
Real-World Use Cases
- Customer support bots: Plan diagnosis steps, execute checks, and give answers.
- Research assistants: Break research queries into sub-questions.
- Automation: Plan and execute workflows stepwise.
- Education: Stepwise tutoring agents.
Summary
You now understand:
- What a plan-and-execute AI agent is and why it’s useful.
- How to think about the process: input → plan → execute → replan → output.
- The key components: model, tools, state, graph, nodes, edges.
- How to build a simple version with Python, LangChain, LangGraph, the OpenAI API, and Gradio.