First Experiments Building Apps with LLMs Link to heading
After getting my GitHub Copilot license in late February and spending months using ChatGPT as both a sounding board and development tool, I decided to explore building applications that integrate Large Language Models directly through APIs. This journey has been both technically fascinating and conceptually challenging, requiring new approaches to application architecture and user experience design that go beyond the copy-paste workflows I’d been using.
The Shift from Using to Building Link to heading
Using ChatGPT through its web interface is one thing; building applications that leverage LLMs programmatically is quite another. The transition involves understanding:
- API costs and rate limits
- Prompt engineering as a core application concern
- Managing context windows and token limits
- Handling the non-deterministic nature of AI responses
- Building reliable applications on top of probabilistic systems
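Even something as simple as staying inside the context window stops being the chat UI's problem and becomes application logic. A minimal sketch, assuming the tiktoken library (not used in the projects below), for checking a prompt before it goes out:

```python
import tiktoken

def count_tokens(text: str, model: str = "gpt-3.5-turbo") -> int:
    """Count tokens with the same tokenizer the model uses."""
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

def fits_in_context(prompt: str, model: str = "gpt-3.5-turbo",
                    context_limit: int = 4096, reserve_for_output: int = 500) -> bool:
    """Check that the prompt leaves room for the model's reply."""
    return count_tokens(prompt, model) + reserve_for_output <= context_limit

# Illustrative usage with stand-in content
notes = ["note one", "note two"]
prompt = "Summarise the following notes:\n" + "\n".join(notes)
if not fits_in_context(prompt):
    raise ValueError("Prompt exceeds the context window -- trim the included notes first")
```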
Project 1: Personal Knowledge Assistant Link to heading
My first serious LLM application was a personal knowledge assistant that could answer questions about my own notes, documents, and code repositories.
The Challenge Link to heading
I had accumulated thousands of notes, blog posts, and code snippets over the years. Traditional search was inadequate because:
- Keyword matching missed conceptual relationships
- Similar ideas were expressed with different terminology
- Context and connections between ideas were lost
- No way to ask natural language questions about the content
The Solution: RAG Architecture Link to heading
I implemented a Retrieval-Augmented Generation (RAG) system:
import openai
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Pinecone
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import DirectoryLoader, TextLoader
import pinecone
class KnowledgeAssistant:
def __init__(self, api_key, pinecone_key, pinecone_env):
self.openai_api_key = api_key
openai.api_key = api_key
# Initialise Pinecone
pinecone.init(api_key=pinecone_key, environment=pinecone_env)
# Initialise embeddings
self.embeddings = OpenAIEmbeddings(openai_api_key=api_key)
# Initialise vector store
self.vectorstore = Pinecone.from_existing_index(
index_name="knowledge-base",
embedding=self.embeddings
)
def index_documents(self, directory_path):
"""Index documents from a directory"""
        # Load documents. Path.glob does not expand brace patterns,
        # so each extension gets its own pass.
        documents = []
        for pattern in ["**/*.txt", "**/*.md", "**/*.py",
                        "**/*.js", "**/*.ts", "**/*.json"]:
            loader = DirectoryLoader(
                directory_path,
                glob=pattern,
                loader_cls=TextLoader,
                show_progress=True
            )
            documents.extend(loader.load())
# Split into chunks
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=1000,
chunk_overlap=200,
length_function=len,
)
docs = text_splitter.split_documents(documents)
# Create embeddings and store in Pinecone
self.vectorstore = Pinecone.from_documents(
docs,
self.embeddings,
index_name="knowledge-base"
)
print(f"Indexed {len(docs)} document chunks")
def query(self, question, k=5):
"""Query the knowledge base"""
# Get relevant documents
relevant_docs = self.vectorstore.similarity_search(question, k=k)
# Prepare context
context = "\n\n".join([doc.page_content for doc in relevant_docs])
# Create prompt
prompt = f"""
Based on the following context from my personal knowledge base,
please answer the question. If the answer isn't in the context,
say so clearly.
Context:
{context}
Question: {question}
Answer:
"""
# Query GPT-4
response = openai.ChatCompletion.create(
model="gpt-4",
messages=[
{"role": "system", "content": "You are a helpful assistant that answers questions based on provided context."},
{"role": "user", "content": prompt}
],
max_tokens=500,
temperature=0.3
)
return {
"answer": response.choices[0].message.content,
"sources": [doc.metadata for doc in relevant_docs],
"context_used": len(context.split())
}
Web Interface Link to heading
I built a simple Streamlit interface to interact with the knowledge assistant:
import streamlit as st
from knowledge_assistant import KnowledgeAssistant
st.title("🧠 Personal Knowledge Assistant")
# Initialise assistant
if "assistant" not in st.session_state:
st.session_state.assistant = KnowledgeAssistant(
api_key=st.secrets["OPENAI_API_KEY"],
pinecone_key=st.secrets["PINECONE_API_KEY"],
pinecone_env=st.secrets["PINECONE_ENV"]
)
# Query interface
question = st.text_input("Ask a question about your knowledge base:")
if question:
with st.spinner("Searching knowledge base..."):
result = st.session_state.assistant.query(question)
st.markdown("### Answer")
st.write(result["answer"])
with st.expander("Sources and Context"):
st.write(f"Context tokens used: {result['context_used']}")
st.write("Sources:")
for source in result["sources"]:
st.write(f"- {source.get('source', 'Unknown')}")
Lessons Learned Link to heading
- Chunking Strategy Matters: Getting the right chunk size and overlap is crucial for good retrieval
- Embedding Quality: The quality of embeddings directly impacts retrieval accuracy
- Context Management: Balancing context length with API costs and response quality
- Source Attribution: Users need to know where information comes from
- Graceful Failures: Handling cases where relevant information isn’t found
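On the first point, the quickest way to build intuition is to split the same document with a few different settings and compare. A small sketch (the file path and the size/overlap pairs are arbitrary):

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

with open("notes/example.md") as f:   # any reasonably long document
    sample = f.read()

for chunk_size, chunk_overlap in [(500, 50), (1000, 200), (2000, 400)]:
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
        length_function=len,
    )
    chunks = splitter.split_text(sample)
    print(f"size={chunk_size} overlap={chunk_overlap}: {len(chunks)} chunks")
```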
Project 2: Code Review Assistant Link to heading
My second project was a code review assistant that could analyse pull requests and provide detailed feedback.
Architecture Overview Link to heading
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
from github import Github
import ast
import sys
class CodeReviewAssistant:
def __init__(self, openai_api_key, github_token):
        # gpt-4 is a chat model, so use the chat wrapper rather than the completion LLM
        self.llm = ChatOpenAI(
            openai_api_key=openai_api_key,
            model_name="gpt-4",
            temperature=0.1,
            max_tokens=1000
        )
self.github = Github(github_token)
def analyze_code_diff(self, diff_content, file_path):
"""Analyze a code diff and provide review comments"""
# Determine language and context
language = self._detect_language(file_path)
# Create specialized prompt based on language
prompt_template = PromptTemplate(
input_variables=["language", "file_path", "diff"],
template="""
You are an expert code reviewer. Analyze this {language} code diff
for the file {file_path} and provide constructive feedback.
Focus on:
1. Code quality and best practices
2. Potential bugs or issues
3. Performance considerations
4. Security vulnerabilities
5. Maintainability and readability
6. Test coverage considerations
Diff:
{diff}
Provide specific, actionable feedback with line references where applicable.
If the code looks good, acknowledge what's done well.
Review:
"""
)
prompt = prompt_template.format(
language=language,
file_path=file_path,
diff=diff_content
)
        review = self.llm.predict(prompt)
return review
def review_pull_request(self, repo_name, pr_number):
"""Review an entire pull request"""
repo = self.github.get_repo(repo_name)
pr = repo.get_pull(pr_number)
reviews = []
for file in pr.get_files():
if file.changes > 0: # Only review modified files
review = self.analyze_code_diff(file.patch, file.filename)
reviews.append({
"file": file.filename,
"review": review,
"changes": file.changes,
"additions": file.additions,
"deletions": file.deletions
})
# Generate overall PR summary
summary_prompt = f"""
Based on the following file-by-file reviews of pull request #{pr_number}:
PR Title: {pr.title}
PR Description: {pr.body}
File Reviews:
{self._format_file_reviews(reviews)}
Provide an overall assessment of this pull request including:
1. Overall code quality
2. Key areas of concern
3. Recommendations for improvement
4. Approval recommendation (approve, request changes, or comment)
Overall Assessment:
"""
        overall_review = self.llm.predict(summary_prompt)
return {
"file_reviews": reviews,
"overall_assessment": overall_review,
"pr_info": {
"title": pr.title,
"number": pr.number,
"author": pr.user.login,
"files_changed": pr.changed_files
}
}
def _detect_language(self, file_path):
"""Detect programming language from file extension"""
extensions = {
".py": "Python",
".js": "JavaScript",
".ts": "TypeScript",
".vue": "Vue Single File Component",
".java": "Java",
".go": "Go",
".rs": "Rust",
".cpp": "C++",
".c": "C",
".rb": "Ruby",
".php": "PHP"
}
for ext, lang in extensions.items():
if file_path.endswith(ext):
return lang
return "Unknown"
def _format_file_reviews(self, reviews):
"""Format file reviews for summary prompt"""
formatted = []
for review in reviews:
formatted.append(f"""
File: {review['file']}
Changes: +{review['additions']} -{review['deletions']}
Review: {review['review']}
""")
return "\n".join(formatted)
Integration with GitHub Actions Link to heading
I created a GitHub Action to automatically review PRs:
# .github/workflows/ai-code-review.yml
name: AI Code Review
on:
pull_request:
    types: [opened, synchronize]
jobs:
ai-review:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v3
with:
fetch-depth: 0
- name: Setup Python
uses: actions/setup-python@v4
with:
python-version: "3.9"
- name: Install dependencies
run: |
pip install openai langchain PyGithub streamlit
- name: Run AI Code Review
env:
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
python scripts/ai_review.py \
--repo ${{ github.repository }} \
--pr ${{ github.event.pull_request.number }}
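A sketch of what that entry-point script might look like: parse the arguments, run the review, and post the overall assessment back as a PR comment via PyGithub (the module name is illustrative):

```python
# scripts/ai_review.py -- sketch of the entry point the workflow invokes
import argparse
import os

from code_review_assistant import CodeReviewAssistant  # hypothetical module housing the class above

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--repo", required=True)       # e.g. "owner/name"
    parser.add_argument("--pr", type=int, required=True)
    args = parser.parse_args()

    assistant = CodeReviewAssistant(
        openai_api_key=os.environ["OPENAI_API_KEY"],
        github_token=os.environ["GITHUB_TOKEN"],
    )
    result = assistant.review_pull_request(args.repo, args.pr)

    # Post the overall assessment back onto the pull request
    pr = assistant.github.get_repo(args.repo).get_pull(args.pr)
    pr.create_issue_comment(result["overall_assessment"])

if __name__ == "__main__":
    main()
```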
Challenges and Solutions Link to heading
Cost Management: GPT-4 API calls can be expensive for large PRs
- Solution: Intelligent file filtering and diff summarisation
Context Limits: Large files exceed token limits
- Solution: Chunk files and review sections independently
False Positives: AI sometimes flags correct code as problematic
- Solution: Confidence scoring and human oversight
Integration Complexity: GitHub API integration had edge cases
- Solution: Robust error handling and fallback strategies
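For the file-filtering point, a rough sketch of what that can look like: skip lockfiles, generated assets, and oversized diffs before they ever reach the API (the suffix list and cut-off are arbitrary choices):

```python
SKIP_SUFFIXES = (".lock", ".min.js", ".map", ".svg", "package-lock.json", "yarn.lock")
MAX_PATCH_CHARS = 12_000  # keep any single diff comfortably inside the prompt budget

def should_review(file) -> bool:
    """Decide whether a changed file from pr.get_files() is worth sending to the LLM."""
    if file.filename.endswith(SKIP_SUFFIXES):
        return False
    if file.patch is None:              # binary files have no textual patch
        return False
    if len(file.patch) > MAX_PATCH_CHARS:
        return False                    # candidates for diff summarisation instead
    return True
```

In review_pull_request, the per-file check then becomes `if file.changes > 0 and should_review(file):`.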
Project 3: Documentation Generator Link to heading
The third project focused on automatically generating and maintaining documentation for codebases.
Smart Documentation Analysis Link to heading
import ast
import os
from typing import List, Dict, Any
from langchain.chat_models import ChatOpenAI
from langchain.prompts import PromptTemplate
class DocumentationGenerator:
def __init__(self, openai_api_key: str):
        # gpt-3.5-turbo-16k is a chat model, so use the chat wrapper
        self.llm = ChatOpenAI(
            openai_api_key=openai_api_key,
            model_name="gpt-3.5-turbo-16k",
            temperature=0.2
        )
def analyze_python_file(self, file_path: str) -> Dict[str, Any]:
"""Analyze a Python file and extract structure"""
with open(file_path, 'r') as file:
content = file.read()
try:
tree = ast.parse(content)
except SyntaxError:
return {"error": "Could not parse Python file"}
        analysis = {
            "classes": [],
            "functions": [],
            "imports": [],
            "docstrings": []
        }
for node in ast.walk(tree):
if isinstance(node, ast.ClassDef):
analysis["classes"].append({
"name": node.name,
"methods": [method.name for method in node.body
if isinstance(method, ast.FunctionDef)],
"docstring": ast.get_docstring(node),
"line_number": node.lineno
})
            elif isinstance(node, ast.FunctionDef):
                # Skip methods (already captured under their classes);
                # name-based heuristic, since ast.walk visits classes before their bodies
                method_names = {m for cls in analysis["classes"] for m in cls["methods"]}
                if node.name not in method_names:
                    analysis["functions"].append({
                        "name": node.name,
                        "args": [arg.arg for arg in node.args.args],
                        "docstring": ast.get_docstring(node),
                        "line_number": node.lineno
                    })
elif isinstance(node, ast.Import):
analysis["imports"].extend([alias.name for alias in node.names])
elif isinstance(node, ast.ImportFrom):
if node.module:
analysis["imports"].append(node.module)
return analysis
def generate_documentation(self, file_path: str, analysis: Dict[str, Any]) -> str:
"""Generate documentation for a file"""
with open(file_path, 'r') as file:
source_code = file.read()
prompt = PromptTemplate(
input_variables=["file_path", "analysis", "source_code"],
template="""
Generate comprehensive documentation for the Python file: {file_path}
File Analysis:
{analysis}
Source Code:
```python
{source_code}
```
Create documentation that includes:
1. File overview and purpose
2. Class documentation with methods
3. Function documentation with parameters and return types
4. Usage examples where appropriate
5. Dependencies and requirements
Use Markdown format with clear sections and code examples.
Documentation:
"""
)
        documentation = self.llm.predict(prompt.format(
file_path=file_path,
analysis=str(analysis),
source_code=source_code[:4000] # Truncate for context limits
))
return documentation
def generate_readme(self, project_path: str) -> str:
"""Generate a README.md for the entire project"""
# Analyze project structure
structure = self._analyze_project_structure(project_path)
# Find existing documentation
existing_docs = self._find_existing_docs(project_path)
prompt = PromptTemplate(
input_variables=["structure", "existing_docs"],
template="""
Generate a comprehensive README.md for this project.
Project Structure:
{structure}
Existing Documentation:
{existing_docs}
Create a README that includes:
1. Project title and description
2. Installation instructions
3. Usage examples
4. Project structure overview
5. Contributing guidelines
6. License information
Use proper Markdown formatting with badges, code examples, and clear sections.
README:
"""
)
        readme = self.llm.predict(prompt.format(
structure=structure,
existing_docs=existing_docs
))
return readme
def _analyze_project_structure(self, project_path: str) -> str:
"""Analyze the overall project structure"""
structure_lines = []
for root, dirs, files in os.walk(project_path):
# Skip hidden directories and common build artifacts
dirs[:] = [d for d in dirs if not d.startswith('.')
and d not in ['__pycache__', 'node_modules', 'venv', '.git']]
level = root.replace(project_path, '').count(os.sep)
indent = ' ' * 2 * level
structure_lines.append(f"{indent}{os.path.basename(root)}/")
subindent = ' ' * 2 * (level + 1)
for file in files:
if not file.startswith('.') and not file.endswith('.pyc'):
structure_lines.append(f"{subindent}{file}")
return '\n'.join(structure_lines[:50]) # Limit for context
def _find_existing_docs(self, project_path: str) -> str:
"""Find and summarise existing documentation"""
doc_files = []
for root, dirs, files in os.walk(project_path):
for file in files:
if file.lower() in ['readme.md', 'readme.txt', 'docs.md', 'documentation.md']:
file_path = os.path.join(root, file)
try:
with open(file_path, 'r') as f:
content = f.read()[:1000] # First 1000 chars
doc_files.append(f"File: {file}\nContent: {content}")
except Exception:
pass
return '\n\n'.join(doc_files) if doc_files else "No existing documentation found"
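Using it on a single module and then on the whole project looks roughly like this (the paths and key are placeholders):

```python
# Hypothetical usage -- paths and the API key are placeholders
generator = DocumentationGenerator(openai_api_key="sk-...")

analysis = generator.analyze_python_file("src/knowledge_assistant.py")
module_docs = generator.generate_documentation("src/knowledge_assistant.py", analysis)
with open("docs/knowledge_assistant.md", "w") as f:
    f.write(module_docs)

readme = generator.generate_readme(".")
with open("README.generated.md", "w") as f:   # keep the real README under human control
    f.write(readme)
```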
Emerging Patterns and Best Practices Link to heading
Through these experiments, several patterns have emerged for building robust LLM applications:
1. The RAG Pattern Link to heading
Retrieval-Augmented Generation has become the standard pattern for building knowledge-based applications:
- Vector Database: Store embeddings of your knowledge base
- Similarity Search: Find relevant context for user queries
- Context Injection: Include relevant information in prompts
- Generation: Let the LLM generate responses with context
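Stripped of framework code, the pattern is small enough to show end to end. A minimal sketch using the raw openai 0.x client and a brute-force cosine similarity search, which is fine for a few thousand chunks — roughly where a proper vector database starts to earn its keep:

```python
import numpy as np
import openai

def embed(texts):
    response = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
    return np.array([item["embedding"] for item in response["data"]])

chunks = ["first note ...", "second note ..."]        # the knowledge base
chunk_vectors = embed(chunks)                         # 1. vector store (here: an array)

def answer(question, k=3):
    q = embed([question])[0]
    scores = chunk_vectors @ q / (
        np.linalg.norm(chunk_vectors, axis=1) * np.linalg.norm(q)
    )                                                 # 2. similarity search
    context = "\n\n".join(chunks[i] for i in scores.argsort()[-k:][::-1])
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"   # 3. context injection
    response = openai.ChatCompletion.create(          # 4. generation
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```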
2. Prompt Engineering as Software Engineering Link to heading
Prompts are becoming a core part of application logic:
- Template Management: Use structured templates for consistency
- Versioning: Track and version prompt iterations
- Testing: Test prompts with various inputs
- Optimisation: Iterate on prompts for better results
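Concretely, that means prompts live in version control as named, versioned objects with tests, rather than as string literals scattered through the code. A sketch of the lightweight shape this can take (the dataclass and names are illustrative, not an existing library):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Prompt:
    name: str
    version: str
    template: str

    def render(self, **kwargs) -> str:
        return self.template.format(**kwargs)

SUMMARISE_NOTES = Prompt(
    name="summarise-notes",
    version="2.1",        # bumped whenever the wording changes
    template="Summarise the following notes in {style} style:\n\n{notes}",
)

def test_prompt_renders_without_leftover_placeholders():
    rendered = SUMMARISE_NOTES.render(style="bullet-point", notes="example note")
    assert "example note" in rendered
    assert "{" not in rendered
```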
3. Error Handling and Graceful Degradation Link to heading
LLM applications need robust error handling:
- API Failures: Handle rate limits and timeouts gracefully
- Cost Management: Monitor and limit token usage
- Quality Assurance: Validate LLM outputs before using them
- Fallback Strategies: Provide alternative responses when LLMs fail
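A concrete version of the first and last points — retry transient failures with backoff, then degrade to a cheaper model — using the error types from the openai 0.x client (retry counts, backoff, and the fallback model are arbitrary choices):

```python
import time
import openai

def call_with_fallback(messages, model="gpt-4",
                       fallback_model="gpt-3.5-turbo", retries=3):
    """Retry transient failures with backoff, then degrade to a cheaper model."""
    for attempt in range(retries):
        try:
            return openai.ChatCompletion.create(model=model, messages=messages)
        except (openai.error.RateLimitError, openai.error.Timeout):
            time.sleep(2 ** attempt)          # exponential backoff
        except openai.error.APIError:
            break                             # don't hammer a failing endpoint
    try:
        return openai.ChatCompletion.create(model=fallback_model, messages=messages)
    except openai.error.OpenAIError:
        return None                           # caller shows a "try again later" message
```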
4. Human-in-the-Loop Design Link to heading
Most successful LLM applications include human oversight:
- Review Mechanisms: Let humans review and edit AI outputs
- Confidence Scoring: Indicate when AI is uncertain
- Feedback Loops: Learn from user corrections
- Escalation Paths: Route complex cases to humans
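One way to apply this to the code review assistant is a simple gate: post automatically only when the model sounds sure of itself, otherwise queue the output for a human. A sketch of the idea (the uncertainty markers and thresholds are crude placeholders):

```python
UNCERTAIN_MARKERS = ("not sure", "might be", "unclear", "cannot determine", "possibly")

def route_review(review_text, post_comment, human_queue):
    """Post confident reviews automatically; escalate uncertain or empty ones."""
    uncertain = any(marker in review_text.lower() for marker in UNCERTAIN_MARKERS)
    if uncertain or len(review_text.strip()) < 40:
        human_queue.append(review_text)       # escalation path: a person decides
    else:
        post_comment(review_text)             # automatic path, still auditable later
```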
Technical Challenges Link to heading
Cost Optimisation Link to heading
LLM API costs can escalate quickly:
import openai

class CostOptimisedLLM:
    def __init__(self, openai_api_key, budget_limit=100.0):
        self.openai_api_key = openai_api_key
        openai.api_key = openai_api_key
        self.budget_limit = budget_limit
        self.current_spend = 0.0
self.token_costs = {
"gpt-4": {"input": 0.03/1000, "output": 0.06/1000},
"gpt-3.5-turbo": {"input": 0.0015/1000, "output": 0.002/1000}
}
def estimate_cost(self, prompt, model="gpt-3.5-turbo", max_tokens=500):
"""Estimate API call cost before making it"""
input_tokens = len(prompt.split()) * 1.3 # Rough approximation
output_tokens = max_tokens
costs = self.token_costs.get(model, self.token_costs["gpt-3.5-turbo"])
estimated_cost = (
input_tokens * costs["input"] +
output_tokens * costs["output"]
)
return estimated_cost
def safe_call(self, prompt, model="gpt-3.5-turbo", max_tokens=500):
"""Make API call with budget checking"""
estimated_cost = self.estimate_cost(prompt, model, max_tokens)
if self.current_spend + estimated_cost > self.budget_limit:
raise Exception(f"Would exceed budget limit of ${self.budget_limit}")
        # Make the actual API call (openai 0.x chat endpoint, as used earlier)
        response = openai.ChatCompletion.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=max_tokens
        )
        self.current_spend += estimated_cost
        return response
Context Window Management Link to heading
Dealing with limited context windows:
def chunk_and_process(text, max_chunk_size=3000, overlap=200):
"""Process long text by chunking with overlap"""
chunks = []
start = 0
while start < len(text):
end = start + max_chunk_size
# Try to break at sentence boundary
if end < len(text):
sentence_end = text.rfind('.', start, end)
if sentence_end > start + max_chunk_size // 2:
end = sentence_end + 1
chunk = text[start:end]
chunks.append(chunk)
start = end - overlap
return chunks
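The chunks can then be processed independently and the partial results merged — effectively a map-reduce over the document. A sketch reusing the openai 0.x call pattern from earlier:

```python
import openai

def summarise_long_document(text):
    """Summarise a document that does not fit in one context window."""
    partial_summaries = []
    for chunk in chunk_and_process(text):
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": f"Summarise:\n\n{chunk}"}],
            max_tokens=300,
        )
        partial_summaries.append(response.choices[0].message.content)

    combined = "\n\n".join(partial_summaries)
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": f"Merge these partial summaries into one:\n\n{combined}"}],
        max_tokens=500,
    )
    return response.choices[0].message.content
```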
Future Directions Link to heading
Based on these experiments, I see several exciting directions for LLM applications:
Multi-Modal Integration Link to heading
Combining text, images, and code for richer applications
Fine-Tuning for Domain Expertise Link to heading
Creating specialised models for specific use cases
Agent-Based Systems Link to heading
LLMs that can use tools and take actions
Real-Time Collaboration Link to heading
Integrating LLMs into collaborative workflows
Key Takeaways Link to heading
- Start Simple: Begin with straightforward use cases and iterate
- Prompt Engineering is Critical: Invest time in crafting good prompts
- Context is King: Providing relevant context dramatically improves results
- Cost Management: Monitor and optimise token usage from the start
- Human Oversight: Build in human review and feedback mechanisms
- Graceful Failures: Handle API failures and unexpected outputs
- Iterative Development: LLM applications require extensive experimentation
Building applications with LLMs is fundamentally different from traditional software development. The probabilistic nature of AI responses, the importance of prompt engineering, and the need for human oversight create new challenges and opportunities.
These experiments have convinced me that LLMs are not just powerful tools; they’re enabling entirely new categories of applications that were previously impossible. The key is learning to work with their strengths while mitigating their weaknesses through careful application design.
Have you experimented with building LLM-powered applications? What challenges have you encountered, and what patterns have you found most effective?