Automating Workflows with AI Agents

The latter half of 2024 has brought significant advances in AI agent capabilities, particularly with Anthropic’s Model Context Protocol (MCP) and enhanced tool use in Claude. After months of working with multi-step prompts and manual context passing, I’m finally experimenting with true autonomous agents: systems that can use tools, make decisions, and execute complex workflows with minimal supervision. This is a major leap from the copy-paste LLM interactions I relied on for most of 2024.

Evolution from Assistants to Agents

The progression I’ve witnessed throughout 2024 has been remarkable:

  • Early 2024 (Post-Copilot): Using GitHub Copilot with Roo Code in VS Code for code completion and ChatGPT for explanations
  • Mid 2024: Multi-step prompts with manual context management between API calls
  • Late 2024: True autonomous agents with MCP enabling seamless tool integration

The key difference is agency: moving from “ask and copy-paste” to “describe the goal and let the agent figure out how to accomplish it.”
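
For contrast, the mid-2024 “manual context management” workflow meant shuttling output between isolated API calls by hand. A rough sketch (using the openai client; the prompts are illustrative):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(prompt: str) -> str:
    """One isolated completion call: no tools, no memory."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Each step's output had to be carried into the next prompt by hand
plan = ask("Outline the steps to add caching to this endpoint: ...")
code = ask(f"Implement step 1 of this plan:\n{plan}")
review = ask(f"Review this implementation for bugs:\n{code}")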

Agent Architecture Patterns

The ReAct Pattern (Reasoning + Acting)

The most successful agent pattern I’ve implemented follows the ReAct framework, alternating between reasoning about what to do next and taking actions:

from langchain.agents import AgentExecutor, create_react_agent
from langchain.tools import BaseTool, Tool
from langchain.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
import json
import subprocess
import requests
import os

class GitCommandTool(BaseTool):
    # Pydantic fields on BaseTool subclasses need explicit type annotations
    name: str = "git_command"
    description: str = "Execute git commands in the current repository"

    def _run(self, command: str) -> str:
        """Execute a git command and return the output"""
        try:
            result = subprocess.run(
                f"git {command}",
                shell=True,
                capture_output=True,
                text=True,
                timeout=30
            )

            if result.returncode == 0:
                return f"Success: {result.stdout}"
            else:
                return f"Error: {result.stderr}"
        except subprocess.TimeoutExpired:
            return "Error: Command timed out"
        except Exception as e:
            return f"Error: {str(e)}"

class FileOperationTool(BaseTool):
    name: str = "file_operations"
    description: str = "Read, write, or modify files in the project"

    def _run(self, operation: str) -> str:
        """Perform file operations"""
        try:
            # maxsplit keeps '|' characters inside file content intact
            parts = operation.split('|', 2)
            action = parts[0]

            if action == "read":
                file_path = parts[1]
                with open(file_path, 'r') as f:
                    content = f.read()
                return f"File content:\n{content}"

            elif action == "write":
                file_path = parts[1]
                content = parts[2]
                with open(file_path, 'w') as f:
                    f.write(content)
                return f"Successfully wrote to {file_path}"

            elif action == "append":
                file_path = parts[1]
                content = parts[2]
                with open(file_path, 'a') as f:
                    f.write(content)
                return f"Successfully appended to {file_path}"

        except Exception as e:
            return f"Error: {str(e)}"

class APITestTool(BaseTool):
    name: str = "api_test"
    description: str = "Test API endpoints and return response status"

    def _run(self, request_info: str) -> str:
        """Test API endpoints"""
        try:
            parts = request_info.split('|', 2)
            method = parts[0].upper()
            url = parts[1]

            if method == "GET":
                response = requests.get(url, timeout=10)
            elif method == "POST":
                data = json.loads(parts[2]) if len(parts) > 2 else {}
                response = requests.post(url, json=data, timeout=10)
            else:
                return f"Unsupported method: {method}"

            return f"Status: {response.status_code}, Response: {response.text[:200]}"

        except Exception as e:
            return f"Error: {str(e)}"

class DeploymentAgent:
    def __init__(self, openai_api_key):
        self.llm = ChatOpenAI(
            openai_api_key=openai_api_key,
            model="gpt-4",
            temperature=0.1
        )

        self.tools = [
            GitCommandTool(),
            FileOperationTool(),
            APITestTool(),
            Tool(
                name="shell_command",
                description="Execute shell commands (use carefully)",
                func=self._execute_shell_command
            )
        ]

        # Create the agent prompt
        self.prompt = PromptTemplate(
            input_variables=["tools", "tool_names", "input", "agent_scratchpad"],
            template="""
            You are an AI agent responsible for deploying web applications safely.

            Your workflow should typically follow these steps:
            1. Check current git status and branch
            2. Run tests to ensure code quality
            3. Build the application if needed
            4. Deploy to staging environment
            5. Run integration tests against staging
            6. If all tests pass, deploy to production
            7. Verify production deployment

            Available tools:
            {tools}

            Tool names: {tool_names}

            IMPORTANT SAFETY RULES:
            - Always run tests before deploying
            - Never deploy if tests are failing
            - Always deploy to staging before production
            - Verify each step before proceeding
            - If anything goes wrong, stop and report the issue

            Use this format:
            Thought: [what you need to do next]
            Action: [tool name]
            Action Input: [tool input]
            Observation: [tool output]

            When you're done, respond with:
            Final Answer: [summary of what was accomplished]

            Human: {input}

            {agent_scratchpad}
            """
        )

        # Create the agent
        self.agent = create_react_agent(
            self.llm,
            self.tools,
            self.prompt
        )

        self.agent_executor = AgentExecutor(
            agent=self.agent,
            tools=self.tools,
            verbose=True,
            max_iterations=10,
            handle_parsing_errors=True
        )

    def _execute_shell_command(self, command: str) -> str:
        """Execute shell command with safety checks"""
        # Safety checks - prevent dangerous commands
        dangerous_commands = [
            'rm -rf', 'sudo rm', 'format', 'fdisk',
            'chmod 777', 'chown -R', '>>', 'dd if='
        ]

        if any(dangerous in command.lower() for dangerous in dangerous_commands):
            return "Error: Dangerous command blocked for safety"

        try:
            result = subprocess.run(
                command,
                shell=True,
                capture_output=True,
                text=True,
                timeout=60
            )

            if result.returncode == 0:
                return f"Success: {result.stdout}"
            else:
                return f"Error: {result.stderr}"
        except Exception as e:
            return f"Error: {str(e)}"

    def deploy_application(self, project_path: str, environment: str = "staging"):
        """Deploy application with the agent"""
        os.chdir(project_path)

        task = f"""
        Deploy the application in the current directory to {environment}.

        Project details:
        - This is a Node.js application with package.json
        - Tests can be run with 'npm test'
        - Build with 'npm run build'
        - Deploy script available as 'npm run deploy:{environment}'

        Follow the safety workflow and report any issues.
        """

        return self.agent_executor.invoke({"input": task})
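
Kicking the agent off is then a couple of lines (the project path is a placeholder):

if __name__ == "__main__":
    agent = DeploymentAgent(openai_api_key=os.environ["OPENAI_API_KEY"])
    result = agent.deploy_application("/path/to/project", environment="staging")
    print(result["output"])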

The Planning Agent Pattern

For complex workflows, I’ve found success with agents that create detailed plans before execution:

import json

class PlanningAgent:
    def __init__(self, openai_api_key):
        self.llm = ChatOpenAI(openai_api_key=openai_api_key, model="gpt-4")

    def create_plan(self, objective: str, constraints: list = None) -> dict:
        """Create a detailed execution plan"""

        constraints_text = ""
        if constraints:
            constraints_text = "Constraints:\n" + "\n".join(f"- {c}" for c in constraints)

        planning_prompt = f"""
        Create a detailed execution plan for this objective: {objective}

        {constraints_text}

        Your plan should include:
        1. High-level strategy
        2. Detailed steps with expected outcomes
        3. Risk assessment and mitigation strategies
        4. Success criteria
        5. Rollback plan if things go wrong

        Format as JSON:
        {{
            "strategy": "brief description of approach",
            "steps": [
                {{
                    "step": 1,
                    "description": "what to do",
                    "expected_outcome": "what should happen",
                    "tools_needed": ["list", "of", "tools"],
                    "risks": ["potential", "issues"],
                    "verification": "how to confirm success"
                }}
            ],
            "success_criteria": ["criteria", "for", "success"],
            "rollback_plan": "what to do if things fail"
        }}
        """

        response = self.llm.invoke(planning_prompt)
        # Assumes the model returns bare JSON; production code should strip
        # markdown fences and validate the structure before parsing
        return json.loads(response.content)

    def execute_plan(self, plan: dict, tools: list) -> dict:
        """Execute the plan step by step"""
        results = {
            "plan": plan,
            "execution_results": [],
            "success": True,
            "final_status": ""
        }

        for step in plan["steps"]:
            print(f"Executing step {step['step']}: {step['description']}")

            # Execute step with appropriate tools
            step_result = self._execute_step(step, tools)

            results["execution_results"].append({
                "step": step["step"],
                "result": step_result,
                "success": step_result.get("success", False)
            })

            # If step fails, consider rollback
            if not step_result.get("success", False):
                print(f"Step {step['step']} failed. Considering rollback...")
                results["success"] = False

                # Ask agent whether to continue or rollback
                continue_decision = self._should_continue(step, step_result, plan)

                if not continue_decision:
                    print("Executing rollback plan...")
                    self._execute_rollback(plan["rollback_plan"], tools)
                    results["final_status"] = "Rolled back due to failure"
                    break

        if results["success"]:
            results["final_status"] = "Plan executed successfully"

        return results
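
The _execute_step and _execute_rollback helpers would be thin dispatch wrappers over the tool list; _should_continue is the interesting one. A sketch of how I approach it, asking the model itself whether a failure is fatal:

    def _should_continue(self, step: dict, step_result: dict, plan: dict) -> bool:
        """Ask the LLM whether the plan can survive this step's failure"""
        prompt = f"""
        Step {step['step']} ("{step['description']}") failed with:
        {step_result}

        The overall strategy is: {plan['strategy']}

        Can the remaining steps still achieve the objective?
        Answer with a single word: CONTINUE or ROLLBACK.
        """
        response = self.llm.invoke(prompt)
        return "CONTINUE" in response.content.upper()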

Practical Agent Applications

Code Review and Refactoring Agent

class CodeRefactoringAgent:
    def __init__(self, openai_api_key):
        self.llm = ChatOpenAI(openai_api_key=openai_api_key, model="gpt-4")

        self.tools = [
            Tool(
                name="analyze_code",
                description="Analyze code files for issues and improvement opportunities",
                func=self._analyze_code
            ),
            Tool(
                name="refactor_code",
                description="Apply refactoring changes to code files",
                func=self._refactor_code
            ),
            Tool(
                name="run_tests",
                description="Execute test suite to verify changes",
                func=self._run_tests
            ),
            Tool(
                name="create_pull_request",
                description="Create a pull request with the changes",
                func=self._create_pull_request
            )
        ]

    def _analyze_code(self, file_path: str) -> str:
        """Analyze code for refactoring opportunities"""
        with open(file_path, 'r') as f:
            code = f.read()

        analysis_prompt = f"""
        Analyze this code for refactoring opportunities:

        {code}

        Look for:
        1. Code duplication
        2. Long methods/functions
        3. Complex conditionals
        4. Performance issues
        5. Maintainability problems

        Return JSON with specific issues and suggested improvements.
        """

        response = self.llm.invoke(analysis_prompt)
        return response.content

    def _refactor_code(self, refactor_request: str) -> str:
        """Apply specific refactoring changes"""
        # Parse the refactoring request, apply the changes to the
        # affected files, and return a summary of what changed
        pass

    def _run_tests(self, _: str = "") -> str:
        """Execute the test suite to verify changes"""
        result = subprocess.run(
            "pytest", shell=True, capture_output=True, text=True, timeout=300
        )
        return f"Exit code {result.returncode}\n{result.stdout[-500:]}"

    def _create_pull_request(self, description: str) -> str:
        """Create a pull request with the changes (stub)"""
        # In practice this shells out to `gh pr create` or calls the GitHub API
        pass

    def refactor_codebase(self, project_path: str, scope: str = "medium"):
        """Autonomously refactor a codebase"""

        task = f"""
        Refactor the codebase in {project_path} with {scope} scope changes.

        Process:
        1. Analyze all Python files for refactoring opportunities
        2. Prioritise improvements by impact and risk
        3. Apply safe refactoring changes one at a time
        4. Run tests after each change
        5. If tests fail, revert the change and try alternative approach
        6. Create a summary report of all changes made

        Focus on low-risk, high-impact improvements.
        """

        # Assumes an AgentExecutor assembled from self.tools, as in DeploymentAgent
        return self.agent_executor.invoke({"input": task})

Content Management Agent

class ContentManagementAgent:
    def __init__(self, openai_api_key):
        self.llm = ChatOpenAI(openai_api_key=openai_api_key)

        self.tools = [
            Tool(
                name="analyze_content_gaps",
                description="Analyze existing content to identify gaps",
                func=self._analyze_content_gaps
            ),
            Tool(
                name="generate_content",
                description="Generate new content based on requirements",
                func=self._generate_content
            ),
            Tool(
                name="optimise_seo",
                description="Optimise content for search engines",
                func=self._optimise_seo
            ),
            Tool(
                name="schedule_publication",
                description="Schedule content for publication",
                func=self._schedule_publication
            )
        ]

    def _analyze_content_gaps(self, content_directory: str) -> str:
        """Analyze existing content for gaps and opportunities"""
        # Scan the content directory, analyze topics, keywords and content
        # types, then identify missing areas
        pass

    def _generate_content(self, brief: str) -> str:
        """Generate new content based on requirements"""
        return self.llm.invoke(f"Write a draft for: {brief}").content

    def _optimise_seo(self, content: str) -> str:
        """Optimise content for search engines"""
        return self.llm.invoke(f"Improve the SEO of this content:\n{content}").content

    def _schedule_publication(self, details: str) -> str:
        """Schedule content for publication (stub)"""
        pass

    def manage_content_calendar(self, goals: str, timeline: str):
        """Autonomously manage content creation and publication"""

        task = f"""
        Manage content creation for the next {timeline} with these goals: {goals}

        Process:
        1. Analyze existing content to identify gaps
        2. Generate content calendar with topic suggestions
        3. Create high-priority content pieces
        4. Optimise all content for SEO
        5. Schedule publication across appropriate channels
        6. Generate performance tracking metrics

        Focus on creating valuable, engaging content that meets the stated goals.
        """

        # Assumes an AgentExecutor assembled from self.tools, as in DeploymentAgent
        return self.agent_executor.invoke({"input": task})

Agent Safety and Reliability

Implementing Safeguards

import os
import re
import subprocess

class SafetyLayer:
    def __init__(self, allowed_commands=None, blocked_patterns=None):
        self.allowed_commands = allowed_commands or []
        self.blocked_patterns = blocked_patterns or [
            r'rm\s+-rf',
            r'sudo\s+rm',
            r'chmod\s+777',
            r'>\s*/dev/',
            r'dd\s+if='
        ]

    def validate_command(self, command: str) -> tuple[bool, str]:
        """Validate that a command is safe to execute"""

        # Check against blocked patterns
        for pattern in self.blocked_patterns:
            if re.search(pattern, command, re.IGNORECASE):
                return False, f"Command blocked by safety pattern: {pattern}"

        # Check if command is in allowed list (if specified)
        if self.allowed_commands:
            command_base = command.split()[0]
            if command_base not in self.allowed_commands:
                return False, f"Command not in allowed list: {command_base}"

        return True, "Command validated"

    def sandbox_execution(self, command: str, working_dir: str = None):
        """Execute command in a sandboxed environment"""

        is_safe, message = self.validate_command(command)
        if not is_safe:
            raise ValueError(f"Unsafe command: {message}")

        # Switch into the working directory if one was given
        if working_dir:
            original_dir = os.getcwd()
            os.chdir(working_dir)

        try:
            result = subprocess.run(
                command,
                shell=True,
                capture_output=True,
                text=True,
                timeout=30,
                env={**os.environ, 'PATH': '/usr/local/bin:/usr/bin:/bin'}
            )

            return {
                'success': result.returncode == 0,
                'stdout': result.stdout,
                'stderr': result.stderr,
                'returncode': result.returncode
            }

        finally:
            if working_dir:
                os.chdir(original_dir)
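
In practice the safety layer sits between the agent and the shell; usage looks like this (paths are placeholders):

safety = SafetyLayer(allowed_commands=['git', 'npm', 'ls', 'cat'])

ok, reason = safety.validate_command("rm -rf /tmp/build")
print(ok, reason)  # False: blocked by the rm -rf pattern

result = safety.sandbox_execution("git status", working_dir="/path/to/project")
print(result['stdout'])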

Human Oversight Integration

import uuid
from datetime import datetime

class HumanOversightAgent:
    def __init__(self, approval_threshold=0.7):
        self.approval_threshold = approval_threshold
        self.pending_approvals = []

    def requires_approval(self, action: str, confidence: float) -> bool:
        """Determine if an action requires human approval"""

        high_risk_actions = [
            'deploy to production',
            'delete database',
            'modify user permissions',
            'send external communications',
            'make financial transactions'
        ]

        # High-risk actions always require approval
        if any(risk_action in action.lower() for risk_action in high_risk_actions):
            return True

        # Low confidence actions require approval
        if confidence < self.approval_threshold:
            return True

        return False

    def request_approval(self, action: str, context: dict, confidence: float) -> str:
        """Request human approval for an action"""

        approval_request = {
            'id': str(uuid.uuid4()),
            'action': action,
            'context': context,
            'confidence': confidence,
            'timestamp': datetime.now(),
            'status': 'pending'
        }

        self.pending_approvals.append(approval_request)

        # In a real system, this would send notifications
        print(f"Approval requested for: {action}")
        print(f"Context: {context}")
        print(f"Confidence: {confidence}")

        # For demo purposes, return a mock approval
        return "approved"  # In practice, would wait for human input

Monitoring and Observability

Agent Performance Metrics

import time
from dataclasses import dataclass, field
from typing import List, Dict, Any

@dataclass
class AgentMetrics:
    agent_id: str
    start_time: float = field(default_factory=time.time)
    end_time: float = 0
    total_steps: int = 0
    successful_steps: int = 0
    failed_steps: int = 0
    tools_used: List[str] = field(default_factory=list)
    tokens_consumed: int = 0
    cost_estimate: float = 0.0
    errors: List[str] = field(default_factory=list)

    def record_step(self, success: bool, tool_used: str = None, error: str = None):
        self.total_steps += 1

        if success:
            self.successful_steps += 1
        else:
            self.failed_steps += 1
            if error:
                self.errors.append(error)

        if tool_used:
            self.tools_used.append(tool_used)

    def finalise(self):
        self.end_time = time.time()

    def get_summary(self) -> Dict[str, Any]:
        duration = self.end_time - self.start_time if self.end_time > 0 else time.time() - self.start_time

        return {
            'agent_id': self.agent_id,
            'duration_seconds': duration,
            'total_steps': self.total_steps,
            'success_rate': self.successful_steps / self.total_steps if self.total_steps > 0 else 0,
            'tools_used': len(set(self.tools_used)),
            'tokens_consumed': self.tokens_consumed,
            'cost_estimate': self.cost_estimate,
            'error_count': len(self.errors)
        }

class MonitoredAgent:
    def __init__(self, base_agent, metrics_collector):
        self.base_agent = base_agent
        self.metrics = metrics_collector

    def execute_with_monitoring(self, task: str):
        """Execute task while collecting metrics"""

        agent_metrics = AgentMetrics(agent_id=f"agent_{int(time.time())}")

        try:
            # _monitored_execution (not shown) wraps the base agent's run
            # loop, calling agent_metrics.record_step() after each tool call
            result = self._monitored_execution(task, agent_metrics)
            agent_metrics.finalise()

            # Store metrics for analysis
            self.metrics.record_session(agent_metrics)

            return result

        except Exception as e:
            agent_metrics.record_step(success=False, error=str(e))
            agent_metrics.finalise()
            self.metrics.record_session(agent_metrics)
            raise
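
The metrics_collector passed into MonitoredAgent isn't a library class; a minimal in-memory sketch is enough:

class MetricsCollector:
    """Minimal in-memory store for per-session agent metrics"""

    def __init__(self):
        self.sessions: List[AgentMetrics] = []

    def record_session(self, metrics: AgentMetrics):
        self.sessions.append(metrics)

    def aggregate(self) -> Dict[str, Any]:
        """Roll up success rates across all recorded sessions"""
        summaries = [m.get_summary() for m in self.sessions]
        if not summaries:
            return {}
        return {
            'sessions': len(summaries),
            'avg_success_rate': sum(s['success_rate'] for s in summaries) / len(summaries),
            'total_errors': sum(s['error_count'] for s in summaries)
        }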

Real-World Results and Lessons

Deployment Automation Success

I deployed an agent that handles routine deployment tasks:

  • Time saved: 3-4 hours per week previously spent on manual deployments
  • Error reduction: 90% fewer deployment-related incidents
  • Consistency: 100% compliance with deployment checklist
  • Team satisfaction: Developers can focus on feature development

Content Generation Agent

An agent that manages technical documentation:

  • Coverage improvement: 40% increase in documented APIs
  • Freshness: Documentation updates within 24 hours of code changes
  • Quality: Consistent formatting and structure across all docs
  • Maintenance: Reduced documentation debt by 60%

Code Review Assistant

An agent that performs initial code reviews:

  • Review speed: First-pass review within 5 minutes of PR creation
  • Issue detection: Catches 70% of common issues before human review
  • Learning: Helps junior developers learn best practices
  • Coverage: Reviews 100% of PRs, unlike human reviewers

Challenges and Limitations

Reliability Issues

  • Non-deterministic behavior: Same input can produce different results
  • Context drift: Long-running agents can lose focus
  • Error propagation: Mistakes compound across steps (one mitigation is sketched after this list)
  • Tool limitations: Agents are constrained by available tools
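
For error propagation in particular, one mitigation is to verify each step's output before it feeds the next, with bounded retries. A sketch (the verify callable is whatever check fits the step):

from typing import Callable

def run_step_with_verification(step: Callable[[], str],
                               verify: Callable[[str], bool],
                               max_retries: int = 2) -> str:
    """Re-run a step until its output passes verification, or give up"""
    for attempt in range(max_retries + 1):
        output = step()
        if verify(output):
            return output
        print(f"Verification failed (attempt {attempt + 1}), retrying...")
    raise RuntimeError("Step failed verification after retries")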

Cost Considerations

class CostManager:
    def __init__(self, budget_limit=100.0):
        self.budget_limit = budget_limit
        self.current_spend = 0.0
        self.token_costs = {
            "gpt-4": 0.03,  # per 1k tokens
            "gpt-3.5-turbo": 0.002
        }

    def estimate_task_cost(self, task_complexity: str, model: str = "gpt-4"):
        """Estimate cost for a task before execution"""

        complexity_multipliers = {
            "simple": 1.0,
            "medium": 3.0,
            "complex": 8.0
        }

        base_tokens = 1000  # Minimum tokens for any task
        multiplier = complexity_multipliers.get(task_complexity, 3.0)
        estimated_tokens = base_tokens * multiplier

        cost_per_token = self.token_costs.get(model, 0.03) / 1000
        estimated_cost = estimated_tokens * cost_per_token

        return estimated_cost

    def can_afford_task(self, task_complexity: str, model: str = "gpt-4"):
        """Check if task is within budget"""
        estimated_cost = self.estimate_task_cost(task_complexity, model)
        return self.current_spend + estimated_cost <= self.budget_limit
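
Before dispatching a task to an agent, the budget check becomes a simple guard:

costs = CostManager(budget_limit=50.0)

if costs.can_afford_task("complex", model="gpt-4"):
    estimate = costs.estimate_task_cost("complex", model="gpt-4")
    print(f"Proceeding with task, estimated cost ${estimate:.2f}")
    costs.current_spend += estimate  # replace with actual usage once known
else:
    print("Task exceeds remaining budget, deferring")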

Future Directions

Multi-Agent Systems

class AgentTeam:
    def __init__(self):
        # Illustrative wiring: several of these specialist agents are
        # aspirational and follow the same shape as the classes above
        self.agents = {
            'planner': PlanningAgent(),
            'coder': CodeGenerationAgent(),
            'reviewer': CodeReviewAgent(),
            'tester': TestingAgent(),
            'deployer': DeploymentAgent()
        }

    def coordinate_development(self, feature_request: str):
        """Coordinate multiple agents to develop a feature"""

        # Planner creates development plan
        plan = self.agents['planner'].create_plan(feature_request)

        # Coder implements the feature
        implementation = self.agents['coder'].implement_feature(plan)

        # Reviewer checks the code
        review = self.agents['reviewer'].review_code(implementation)

        # Tester creates and runs tests
        test_results = self.agents['tester'].test_implementation(implementation)

        # Deployer handles deployment
        if test_results['passed']:
            deployment = self.agents['deployer'].deploy_feature(implementation)
            return deployment
        else:
            return {'status': 'failed', 'reason': 'tests failed'}

Learning and Adaptation

Agents that learn from their experiences and improve over time represent the next frontier. This involves building feedback loops, maintaining performance metrics, and adapting strategies based on outcomes.
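
The minimal shape of such a feedback loop is already clear, though: record each strategy's outcome and bias future choices toward what has worked. A hypothetical sketch (none of this is a real library):

from collections import defaultdict

class StrategyLearner:
    """Track per-strategy outcomes and prefer what historically works"""

    def __init__(self):
        self.outcomes = defaultdict(lambda: {'wins': 0, 'tries': 0})

    def record(self, strategy: str, success: bool):
        self.outcomes[strategy]['tries'] += 1
        if success:
            self.outcomes[strategy]['wins'] += 1

    def best_strategy(self, candidates: list) -> str:
        """Pick the candidate with the best observed win rate"""
        def win_rate(s):
            o = self.outcomes[s]
            return o['wins'] / o['tries'] if o['tries'] else 0.5  # optimistic prior
        return max(candidates, key=win_rate)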

Key Takeaways

  1. Start simple: Begin with single-purpose agents before building complex workflows

  2. Safety first: Implement multiple layers of safeguards and human oversight

  3. Monitor everything: Track performance, costs, and failure modes

  4. Gradual autonomy: Increase agent independence as reliability improves

  5. Human-agent collaboration: Best results come from combining human judgment with agent capabilities

  6. Domain expertise matters: Agents work best in well-defined problem domains

  7. Iteration is key: Continuously refine agent behavior based on real-world performance

AI agents represent a significant evolution in automation capabilities. While they’re not perfect, they’re already providing substantial value in specific domains where the benefits outweigh the risks. As the technology matures, we’ll see agents handling increasingly complex workflows with greater reliability and autonomy.


Have you experimented with building AI agents? What workflows have you found most suitable for automation, and what challenges have you encountered?