First Experiments Building Apps with LLMs Link to heading

After getting my GitHub Copilot license in late February and spending months using ChatGPT as both a sounding board and development tool, I decided to explore building applications that integrate Large Language Models directly through APIs. This journey has been both technically fascinating and conceptually challenging, requiring new approaches to application architecture and user experience design that go beyond the copy-paste workflows I’d been using.

The Shift from Using to Building Link to heading

Using ChatGPT through its web interface is one thing; building applications that leverage LLMs programmatically is quite another. The transition involves understanding:

  • API costs and rate limits (a retry sketch follows this list)
  • Prompt engineering as a core application concern
  • Managing context windows and token limits
  • Handling the non-deterministic nature of AI responses
  • Building reliable applications on top of probabilistic systems
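
Handling rate limits deserves a concrete example. Here is a minimal retry-with-exponential-backoff wrapper around the (pre-1.0) openai client used throughout this post; the function name and retry parameters are illustrative, not from any particular library:

import time
import openai

def chat_with_backoff(messages, model="gpt-3.5-turbo", max_retries=5):
    """Call the chat API, retrying with exponential backoff on rate limits."""
    for attempt in range(max_retries):
        try:
            return openai.ChatCompletion.create(model=model, messages=messages)
        except (openai.error.RateLimitError, openai.error.Timeout):
            # Wait 1s, 2s, 4s, 8s, ... before retrying
            time.sleep(2 ** attempt)
    raise RuntimeError(f"Gave up after {max_retries} attempts")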

Project 1: Personal Knowledge Assistant Link to heading

My first serious LLM application was a personal knowledge assistant that could answer questions about my own notes, documents, and code repositories.

The Challenge Link to heading

I had accumulated thousands of notes, blog posts, and code snippets over the years. Traditional search was inadequate because:

  • Keyword matching missed conceptual relationships
  • Similar ideas were expressed with different terminology
  • Context and connections between ideas were lost
  • No way to ask natural language questions about the content

The Solution: RAG Architecture Link to heading

I implemented a Retrieval-Augmented Generation (RAG) system:

import openai
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Pinecone
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import DirectoryLoader, TextLoader
import pinecone

class KnowledgeAssistant:
    def __init__(self, api_key, pinecone_key, pinecone_env):
        self.openai_api_key = api_key
        openai.api_key = api_key

        # Initialise Pinecone
        pinecone.init(api_key=pinecone_key, environment=pinecone_env)

        # Initialise embeddings
        self.embeddings = OpenAIEmbeddings(openai_api_key=api_key)

        # Initialise vector store
        self.vectorstore = Pinecone.from_existing_index(
            index_name="knowledge-base",
            embedding=self.embeddings
        )

    def index_documents(self, directory_path):
        """Index documents from a directory"""
        # Load documents. Path.glob has no brace expansion, so loop over
        # each extension rather than using "**/*.{txt,md,...}"
        documents = []
        for ext in ["txt", "md", "py", "js", "ts", "json"]:
            loader = DirectoryLoader(
                directory_path,
                glob=f"**/*.{ext}",
                loader_cls=TextLoader,
                show_progress=True
            )
            documents.extend(loader.load())

        # Split into chunks
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000,
            chunk_overlap=200,
            length_function=len,
        )
        docs = text_splitter.split_documents(documents)

        # Create embeddings and store in Pinecone
        self.vectorstore = Pinecone.from_documents(
            docs,
            self.embeddings,
            index_name="knowledge-base"
        )

        print(f"Indexed {len(docs)} document chunks")

    def query(self, question, k=5):
        """Query the knowledge base"""
        # Get relevant documents
        relevant_docs = self.vectorstore.similarity_search(question, k=k)

        # Prepare context
        context = "\n\n".join([doc.page_content for doc in relevant_docs])

        # Create prompt
        prompt = f"""
        Based on the following context from my personal knowledge base,
        please answer the question. If the answer isn't in the context,
        say so clearly.

        Context:
        {context}

        Question: {question}

        Answer:
        """

        # Query GPT-4
        response = openai.ChatCompletion.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": "You are a helpful assistant that answers questions based on provided context."},
                {"role": "user", "content": prompt}
            ],
            max_tokens=500,
            temperature=0.3
        )

        return {
            "answer": response.choices[0].message.content,
            "sources": [doc.metadata for doc in relevant_docs],
            "context_used": len(context.split())
        }
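
Putting the pieces together, usage looks roughly like this (keys, environment name, and the notes path are placeholders):

assistant = KnowledgeAssistant(
    api_key="sk-...",             # OpenAI key
    pinecone_key="...",           # Pinecone key
    pinecone_env="us-west1-gcp"   # example Pinecone environment
)
assistant.index_documents("./notes")

result = assistant.query("What did I conclude about vector databases?")
print(result["answer"])
for source in result["sources"]:
    print("-", source.get("source", "Unknown"))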

Web Interface Link to heading

I built a simple Streamlit interface to interact with the knowledge assistant:

import streamlit as st
from knowledge_assistant import KnowledgeAssistant

st.title("🧠 Personal Knowledge Assistant")

# Initialise assistant
if "assistant" not in st.session_state:
    st.session_state.assistant = KnowledgeAssistant(
        api_key=st.secrets["OPENAI_API_KEY"],
        pinecone_key=st.secrets["PINECONE_API_KEY"],
        pinecone_env=st.secrets["PINECONE_ENV"]
    )

# Query interface
question = st.text_input("Ask a question about your knowledge base:")

if question:
    with st.spinner("Searching knowledge base..."):
        result = st.session_state.assistant.query(question)

    st.markdown("### Answer")
    st.write(result["answer"])

    with st.expander("Sources and Context"):
        st.write(f"Context tokens used: {result['context_used']}")
        st.write("Sources:")
        for source in result["sources"]:
            st.write(f"- {source.get('source', 'Unknown')}")

Lessons Learned Link to heading

  1. Chunking Strategy Matters: Getting the right chunk size and overlap is crucial for good retrieval
  2. Embedding Quality: The quality of embeddings directly impacts retrieval accuracy
  3. Context Management: Balancing context length with API costs and response quality
  4. Source Attribution: Users need to know where information comes from
  5. Graceful Failures: Handling cases where relevant information isn’t found

Project 2: Code Review Assistant Link to heading

My second project was a code review assistant that could analyse pull requests and provide detailed feedback.

Architecture Overview Link to heading

from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from github import Github

class CodeReviewAssistant:
    def __init__(self, openai_api_key, github_token):
        # gpt-4 is a chat model, so use the chat wrapper rather than the
        # completion-style OpenAI class
        self.llm = ChatOpenAI(
            openai_api_key=openai_api_key,
            model_name="gpt-4",
            temperature=0.1,
            max_tokens=1000
        )
        self.github = Github(github_token)

    def analyze_code_diff(self, diff_content, file_path):
        """Analyze a code diff and provide review comments"""

        # Determine language and context
        language = self._detect_language(file_path)

        # Create specialized prompt based on language
        prompt_template = PromptTemplate(
            input_variables=["language", "file_path", "diff"],
            template="""
            You are an expert code reviewer. Analyze this {language} code diff
            for the file {file_path} and provide constructive feedback.

            Focus on:
            1. Code quality and best practices
            2. Potential bugs or issues
            3. Performance considerations
            4. Security vulnerabilities
            5. Maintainability and readability
            6. Test coverage considerations

            Diff:
            {diff}

            Provide specific, actionable feedback with line references where applicable.
            If the code looks good, acknowledge what's done well.

            Review:
            """
        )

        prompt = prompt_template.format(
            language=language,
            file_path=file_path,
            diff=diff_content
        )

        review = self.llm.predict(prompt)
        return review

    def review_pull_request(self, repo_name, pr_number):
        """Review an entire pull request"""
        repo = self.github.get_repo(repo_name)
        pr = repo.get_pull(pr_number)

        reviews = []

        for file in pr.get_files():
            if file.patch:  # patch is None for binary files; skip those
                review = self.analyze_code_diff(file.patch, file.filename)
                reviews.append({
                    "file": file.filename,
                    "review": review,
                    "changes": file.changes,
                    "additions": file.additions,
                    "deletions": file.deletions
                })

        # Generate overall PR summary
        summary_prompt = f"""
        Based on the following file-by-file reviews of pull request #{pr_number}:

        PR Title: {pr.title}
        PR Description: {pr.body}

        File Reviews:
        {self._format_file_reviews(reviews)}

        Provide an overall assessment of this pull request including:
        1. Overall code quality
        2. Key areas of concern
        3. Recommendations for improvement
        4. Approval recommendation (approve, request changes, or comment)

        Overall Assessment:
        """

        overall_review = self.llm.predict(summary_prompt)

        return {
            "file_reviews": reviews,
            "overall_assessment": overall_review,
            "pr_info": {
                "title": pr.title,
                "number": pr.number,
                "author": pr.user.login,
                "files_changed": pr.changed_files
            }
        }

    def _detect_language(self, file_path):
        """Detect programming language from file extension"""
        extensions = {
            ".py": "Python",
            ".js": "JavaScript",
            ".ts": "TypeScript",
            ".vue": "Vue Single File Component",
            ".java": "Java",
            ".go": "Go",
            ".rs": "Rust",
            ".cpp": "C++",
            ".c": "C",
            ".rb": "Ruby",
            ".php": "PHP"
        }

        for ext, lang in extensions.items():
            if file_path.endswith(ext):
                return lang

        return "Unknown"

    def _format_file_reviews(self, reviews):
        """Format file reviews for summary prompt"""
        formatted = []
        for review in reviews:
            formatted.append(f"""
            File: {review['file']}
            Changes: +{review['additions']} -{review['deletions']}
            Review: {review['review']}
            """)
        return "\n".join(formatted)

Integration with GitHub Actions Link to heading

I created a GitHub Action to automatically review PRs:

# .github/workflows/ai-code-review.yml
name: AI Code Review

on:
  pull_request:
    types: [opened, synchronize]  # GitHub requires the US spelling here

jobs:
  ai-review:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
        with:
          fetch-depth: 0

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: "3.9"

      - name: Install dependencies
        run: |
          pip install openai langchain PyGithub

      - name: Run AI Code Review
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          python scripts/ai_review.py \
            --repo ${{ github.repository }} \
            --pr ${{ github.event.pull_request.number }}
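
The scripts/ai_review.py entry point referenced above is essentially a thin wrapper around the CodeReviewAssistant class; a minimal sketch (the module name and the choice to post the assessment back as a PR comment are my assumptions):

# scripts/ai_review.py
import argparse
import os

from code_review_assistant import CodeReviewAssistant  # module name assumed

def main():
    parser = argparse.ArgumentParser(description="AI code review for a pull request")
    parser.add_argument("--repo", required=True, help="owner/name")
    parser.add_argument("--pr", type=int, required=True, help="pull request number")
    args = parser.parse_args()

    assistant = CodeReviewAssistant(
        openai_api_key=os.environ["OPENAI_API_KEY"],
        github_token=os.environ["GITHUB_TOKEN"],
    )
    result = assistant.review_pull_request(args.repo, args.pr)

    # Post the overall assessment back to the PR as a comment
    pr = assistant.github.get_repo(args.repo).get_pull(args.pr)
    pr.create_issue_comment(result["overall_assessment"])

if __name__ == "__main__":
    main()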

Challenges and Solutions Link to heading

Cost Management: GPT-4 API calls can be expensive for large PRs

  • Solution: Intelligent file filtering (sketched after this list) and diff summarisation

Context Limits: Large files exceed token limits

  • Solution: Chunk files and review sections independently

False Positives: AI sometimes flags correct code as problematic

  • Solution: Confidence scoring and human oversight

Integration Complexity: GitHub API integration had edge cases

  • Solution: Robust error handling and fallback strategies
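
For the file-filtering point above, the pre-filter can be as simple as an extension blocklist plus a size cap; a hedged sketch (the patterns and threshold are illustrative):

SKIP_SUFFIXES = (".lock", ".min.js", ".svg", "package-lock.json")

def should_review(filename: str, changes: int, max_changes: int = 400) -> bool:
    """Cheap pre-filter: skip generated/vendored files and oversized diffs."""
    if filename.endswith(SKIP_SUFFIXES):
        return False
    return 0 < changes <= max_changes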

Project 3: Documentation Generator Link to heading

The third project focused on automatically generating and maintaining documentation for codebases.

Smart Documentation Analysis Link to heading

import ast
import os
from typing import Dict, Any
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate

class DocumentationGenerator:
    def __init__(self, openai_api_key: str):
        # gpt-3.5-turbo-16k is a chat model, so use the chat wrapper
        self.llm = ChatOpenAI(
            openai_api_key=openai_api_key,
            model_name="gpt-3.5-turbo-16k",
            temperature=0.2
        )

    def analyze_python_file(self, file_path: str) -> Dict[str, Any]:
        """Analyze a Python file and extract structure"""
        with open(file_path, 'r') as file:
            content = file.read()

        try:
            tree = ast.parse(content)
        except SyntaxError:
            return {"error": "Could not parse Python file"}

        analysis = {
            "classes": [],
            "functions": [],
            "imports": []
        }

        # Functions defined directly inside a class body are methods;
        # collect them first so the walk below can tell them apart
        # from top-level functions.
        method_nodes = {
            item
            for node in ast.walk(tree)
            if isinstance(node, ast.ClassDef)
            for item in node.body
            if isinstance(item, ast.FunctionDef)
        }

        for node in ast.walk(tree):
            if isinstance(node, ast.ClassDef):
                analysis["classes"].append({
                    "name": node.name,
                    "methods": [method.name for method in node.body
                                if isinstance(method, ast.FunctionDef)],
                    "docstring": ast.get_docstring(node),
                    "line_number": node.lineno
                })

            elif isinstance(node, ast.FunctionDef):
                # Skip methods (already captured with their class)
                if node not in method_nodes:
                    analysis["functions"].append({
                        "name": node.name,
                        "args": [arg.arg for arg in node.args.args],
                        "docstring": ast.get_docstring(node),
                        "line_number": node.lineno
                    })

            elif isinstance(node, ast.Import):
                analysis["imports"].extend([alias.name for alias in node.names])

            elif isinstance(node, ast.ImportFrom):
                if node.module:
                    analysis["imports"].append(node.module)

        return analysis

    def generate_documentation(self, file_path: str, analysis: Dict[str, Any]) -> str:
        """Generate documentation for a file"""

        with open(file_path, 'r') as file:
            source_code = file.read()

        prompt = PromptTemplate(
            input_variables=["file_path", "analysis", "source_code"],
            template="""
            Generate comprehensive documentation for the Python file: {file_path}

            File Analysis:
            {analysis}

            Source Code:
            ```python
            {source_code}
            ```

            Create documentation that includes:
            1. File overview and purpose
            2. Class documentation with methods
            3. Function documentation with parameters and return types
            4. Usage examples where appropriate
            5. Dependencies and requirements

            Use Markdown format with clear sections and code examples.

            Documentation:
            """
        )

        documentation = self.llm.predict(prompt.format(
            file_path=file_path,
            analysis=str(analysis),
            source_code=source_code[:4000]  # Truncate for context limits
        ))

        return documentation

    def generate_readme(self, project_path: str) -> str:
        """Generate a README.md for the entire project"""

        # Analyze project structure
        structure = self._analyze_project_structure(project_path)

        # Find existing documentation
        existing_docs = self._find_existing_docs(project_path)

        prompt = PromptTemplate(
            input_variables=["structure", "existing_docs"],
            template="""
            Generate a comprehensive README.md for this project.

            Project Structure:
            {structure}

            Existing Documentation:
            {existing_docs}

            Create a README that includes:
            1. Project title and description
            2. Installation instructions
            3. Usage examples
            4. Project structure overview
            5. Contributing guidelines
            6. License information

            Use proper Markdown formatting with badges, code examples, and clear sections.

            README:
            """
        )

        readme = self.llm.predict(prompt.format(
            structure=structure,
            existing_docs=existing_docs
        ))

        return readme

    def _analyze_project_structure(self, project_path: str) -> str:
        """Analyze the overall project structure"""
        structure_lines = []

        for root, dirs, files in os.walk(project_path):
            # Skip hidden directories and common build artifacts
            dirs[:] = [d for d in dirs if not d.startswith('.')
                      and d not in ['__pycache__', 'node_modules', 'venv', '.git']]

            level = root.replace(project_path, '').count(os.sep)
            indent = ' ' * 2 * level
            structure_lines.append(f"{indent}{os.path.basename(root)}/")

            subindent = ' ' * 2 * (level + 1)
            for file in files:
                if not file.startswith('.') and not file.endswith('.pyc'):
                    structure_lines.append(f"{subindent}{file}")

        return '\n'.join(structure_lines[:50])  # Limit for context

    def _find_existing_docs(self, project_path: str) -> str:
        """Find and summarise existing documentation"""
        doc_files = []

        for root, dirs, files in os.walk(project_path):
            for file in files:
                if file.lower() in ['readme.md', 'readme.txt', 'docs.md', 'documentation.md']:
                    file_path = os.path.join(root, file)
                    try:
                        with open(file_path, 'r') as f:
                            content = f.read()[:1000]  # First 1000 chars
                            doc_files.append(f"File: {file}\nContent: {content}")
                    except Exception:
                        pass

        return '\n\n'.join(doc_files) if doc_files else "No existing documentation found"
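
A minimal driver for the generator might walk a project and write one Markdown file per module; the paths and skip list here are illustrative:

import os

generator = DocumentationGenerator(openai_api_key=os.environ["OPENAI_API_KEY"])

for root, dirs, files in os.walk("./my_project"):
    dirs[:] = [d for d in dirs if d not in {".git", "__pycache__", "venv"}]
    for name in files:
        if name.endswith(".py"):
            path = os.path.join(root, name)
            analysis = generator.analyze_python_file(path)
            docs = generator.generate_documentation(path, analysis)
            with open(path[:-3] + ".md", "w") as f:
                f.write(docs)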

Emerging Patterns and Best Practices Link to heading

Through these experiments, several patterns have emerged for building robust LLM applications:

1. The RAG Pattern Link to heading

Retrieval-Augmented Generation has become the standard pattern for building knowledge-based applications:

  • Vector Database: Store embeddings of your knowledge base
  • Similarity Search: Find relevant context for user queries
  • Context Injection: Include relevant information in prompts
  • Generation: Let the LLM generate responses with context

2. Prompt Engineering as Software Engineering Link to heading

Prompts are becoming a core part of application logic:

  • Template Management: Use structured templates for consistency
  • Versioning: Track and version prompt iterations
  • Testing: Test prompts with various inputs (a minimal sketch follows this list)
  • Optimisation: Iterate on prompts for better results
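
A minimal sketch of what "prompts as code" can look like in practice; the registry layout and test are my own illustration, not a specific library:

# Versioned prompt registry plus a cheap regression test
PROMPTS = {
    "summarise_v1": "Summarise the following text in one sentence:\n{text}",
    "summarise_v2": (
        "You are a concise editor. Summarise the following text "
        "in one sentence of at most 20 words:\n{text}"
    ),
}

def render(prompt_id: str, **kwargs) -> str:
    """Render a named, versioned prompt template."""
    return PROMPTS[prompt_id].format(**kwargs)

def test_prompt_contains_input():
    # The rendered prompt must include the input verbatim
    assert "hello world" in render("summarise_v2", text="hello world")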

3. Error Handling and Graceful Degradation Link to heading

LLM applications need robust error handling:

  • API Failures: Handle rate limits and timeouts gracefully
  • Cost Management: Monitor and limit token usage
  • Quality Assurance: Validate LLM outputs before using them (sketched below)
  • Fallback Strategies: Provide alternative responses when LLMs fail
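
Validation before use mostly means refusing to trust free-form output. For instance, when asking the model for JSON, parse defensively (the required field name is hypothetical):

import json

def parse_model_json(raw: str) -> dict:
    """Validate that a model response is the JSON we asked for,
    returning a safe failure instead of crashing downstream code."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"ok": False, "error": "model did not return valid JSON"}
    if "answer" not in data:  # hypothetical required field
        return {"ok": False, "error": "missing 'answer' field"}
    return {"ok": True, "answer": data["answer"]}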

4. Human-in-the-Loop Design Link to heading

Most successful LLM applications include human oversight:

  • Review Mechanisms: Let humans review and edit AI outputs
  • Confidence Scoring: Indicate when AI is uncertain (see the routing sketch below)
  • Feedback Loops: Learn from user corrections
  • Escalation Paths: Route complex cases to humans
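
In the knowledge assistant, for example, the top retrieval similarity score makes a crude confidence signal; a sketch of the routing logic (the threshold is illustrative):

CONFIDENCE_THRESHOLD = 0.7  # illustrative cut-off

def route_answer(answer: str, confidence: float) -> dict:
    """Auto-accept confident answers; queue uncertain ones for human review."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return {"status": "auto", "answer": answer}
    # Below threshold: escalate to a human instead of replying directly
    return {"status": "needs_review", "answer": answer}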

Technical Challenges Link to heading

Cost Optimisation Link to heading

LLM API costs can escalate quickly:

import openai

class CostOptimisedLLM:
    def __init__(self, openai_api_key, budget_limit=100.0):
        openai.api_key = openai_api_key
        self.budget_limit = budget_limit
        self.current_spend = 0.0
        self.token_costs = {
            "gpt-4": {"input": 0.03/1000, "output": 0.06/1000},
            "gpt-3.5-turbo": {"input": 0.0015/1000, "output": 0.002/1000}
        }

    def estimate_cost(self, prompt, model="gpt-3.5-turbo", max_tokens=500):
        """Estimate API call cost before making it"""
        input_tokens = len(prompt.split()) * 1.3  # Rough approximation
        output_tokens = max_tokens

        costs = self.token_costs.get(model, self.token_costs["gpt-3.5-turbo"])
        estimated_cost = (
            input_tokens * costs["input"] +
            output_tokens * costs["output"]
        )

        return estimated_cost

    def safe_call(self, prompt, model="gpt-3.5-turbo", max_tokens=500):
        """Make API call with budget checking"""
        estimated_cost = self.estimate_cost(prompt, model, max_tokens)

        if self.current_spend + estimated_cost > self.budget_limit:
            raise Exception(f"Would exceed budget limit of ${self.budget_limit}")

        # Make the actual API call (old-style pre-1.0 client, as elsewhere)
        response = openai.ChatCompletion.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=max_tokens
        )

        self.current_spend += estimated_cost
        return response
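
The word-count heuristic above is deliberately rough; tiktoken (OpenAI's tokeniser library) gives exact counts if you want tighter estimates:

import tiktoken

def count_tokens(text: str, model: str = "gpt-3.5-turbo") -> int:
    """Count tokens the way the API will, rather than approximating."""
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))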

Context Window Management Link to heading

Dealing with limited context windows:

def chunk_and_process(text, max_chunk_size=3000, overlap=200):
    """Process long text by chunking with overlap"""
    chunks = []
    start = 0

    while start < len(text):
        end = start + max_chunk_size

        # Try to break at sentence boundary
        if end < len(text):
            sentence_end = text.rfind('.', start, end)
            if sentence_end > start + max_chunk_size // 2:
                end = sentence_end + 1

        chunks.append(text[start:end])

        if end >= len(text):
            break  # final chunk reached; avoid re-emitting the tail
        start = end - overlap

    return chunks

Future Directions Link to heading

Based on these experiments, I see several exciting directions for LLM applications:

Multi-Modal Integration Link to heading

Combining text, images, and code for richer applications

Fine-Tuning for Domain Expertise Link to heading

Creating specialised models for specific use cases

Agent-Based Systems Link to heading

LLMs that can use tools and take actions

Real-Time Collaboration Link to heading

Integrating LLMs into collaborative workflows

Key Takeaways Link to heading

  1. Start Simple: Begin with straightforward use cases and iterate
  2. Prompt Engineering is Critical: Invest time in crafting good prompts
  3. Context is King: Providing relevant context dramatically improves results
  4. Cost Management: Monitor and optimise token usage from the start
  5. Human Oversight: Build in human review and feedback mechanisms
  6. Graceful Failures: Handle API failures and unexpected outputs
  7. Iterative Development: LLM applications require extensive experimentation

Building applications with LLMs is fundamentally different from traditional software development. The probabilistic nature of AI responses, the importance of prompt engineering, and the need for human oversight create new challenges and opportunities.

These experiments have convinced me that LLMs are not just powerful tools; they’re enabling entirely new categories of applications that were previously impossible. The key is learning to work with their strengths while mitigating their weaknesses through careful application design.


Have you experimented with building LLM-powered applications? What challenges have you encountered, and what patterns have you found most effective?