Blog & Articles

Insights, tutorials, and best practices from my experience in software development, AI, and cloud technologies

All Posts

Jan 2, 2026 · 5 min read · 855

What is an AI Foundry? The Simple Guide to Custom AI

Microsoft Foundry is basically a platform where you build AI apps and agents without stitching together fifteen different Azure services yourself. That's it. That's the core idea. Before this, you had Azure OpenAI as one thing, Azure AI Services as another, Azure ML Studio doing its own thing, and then something like five different SDKs to talk to all of them. I remember one project where we had azure-ai-inference, azure-ai-generative, and the AzureOpenAI() client all in the same codebase. Different endpoints, different auth patterns. It was a mess.

AI Foundry, Microsoft Foundry, Microsoft

1M Token Context Windows vs RAG: What Nobody Tells You About the Cost

Python · Article

Dec 9, 2025 · 5 min read · 958

Every time a bigger context window drops, people rush to declare RAG dead — but after building these systems for real clients, the answer is way messier than that. Long context is amazing for reasoning across whole documents, but at $15 per million-token query, your finance team will shut that down fast once actual users hit it. RAG is still 50-200x cheaper and way faster for high-volume use cases. The honest answer? Know when each one makes sense and build systems flexible enough to use both.
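To make that cost gap concrete, here is a rough back-of-the-envelope sketch. The $15 per million-token query is the figure quoted above; the 8,000-token RAG prompt and the 10,000 queries per day are illustrative assumptions, not measurements, and the sketch ignores output tokens, embedding costs, and vector store hosting.

```python
# Back-of-the-envelope cost comparison.
# Only the $15 figure comes from the post; everything else is an assumption.
PRICE_PER_MILLION_TOKENS = 15.00  # USD, implied by "$15 per million-token query"


def query_cost(prompt_tokens: int, price_per_million: float = PRICE_PER_MILLION_TOKENS) -> float:
    """Input-token cost of a single query in USD (output tokens ignored)."""
    return prompt_tokens / 1_000_000 * price_per_million


long_context_cost = query_cost(1_000_000)  # whole corpus stuffed into the prompt
rag_cost = query_cost(8_000)               # assumed: retrieved chunks plus the question

queries_per_day = 10_000  # assumed traffic

print(f"Long context: ${long_context_cost:.2f}/query, ${long_context_cost * queries_per_day:,.0f}/day")
print(f"RAG:          ${rag_cost:.2f}/query, ${rag_cost * queries_per_day:,.0f}/day")
print(f"Ratio:        {long_context_cost / rag_cost:.0f}x")
```

With these assumed numbers the ratio lands at 125x, inside the 50-200x range mentioned above. Change the RAG prompt size and the ratio moves with it, which is the point: the gap depends almost entirely on how much context each query actually needs.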

RAG, Long Context Windows, LLM Cost Optimization

Running AI Models Locally Changed How I Build — Here's How

Python · Article

Dec 2, 2025 · 6 min read · 987

If you've got data privacy concerns or just want to stop paying for API calls, running Llama 3 or Mistral locally with Ollama is surprisingly easy — like 10 minutes easy. I set this up for a client who couldn't send data to external APIs and it turned out great. It won't match GPT-4 quality, but for prototyping, private projects, and offline RAG pipelines, it gets the job done at zero monthly cost.
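As a minimal sketch of what talking to a local model can look like, the snippet below calls Ollama's REST endpoint using only the standard library. It assumes an Ollama server is running on its default port (11434) and that a model tagged `llama3` has already been pulled; the helper names `build_request` and `generate` are made up for illustration.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3") -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

With the server up, `generate("Summarize RAG in one sentence.")` should return the model's reply as a string. Nothing leaves the machine, which is the property that matters for the privacy-constrained case described above.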

Local LLMs, Ollama, Llama 3

Why Your Next AI Project Probably Needs More Than One Agent

Python · Article

Nov 25, 2025 · 5 min read · 1,002

I used to think multi-agent workflows were overkill — until a single prompt chain fell apart on a real project. After working with both CrewAI and LangGraph, I found CrewAI is great for quick prototypes but struggles with complex routing, while LangGraph gives you the production-grade control you'll eventually need. The sweet spot? Use them together — LangGraph for orchestration, CrewAI for collaborative tasks inside it — but only reach for multi-agent when single-agent genuinely can't cut it.

Multi-Agent Workflows, LangGraph, CrewAI

Beyond Prompt Engineering: Architecting Runtime Safety in Python

Python · Article

Nov 18, 2025 · 5 min read · 1,029

No matter how good your system prompt is, it won't save you — we learned that the hard way when a client's product chatbot started giving medical advice within two days. The real safety net is what happens at runtime, and after multiple projects I've landed on layering tools: Guardrails AI for output validation, NeMo for conversation flow control, and something like Llama Guard for the truly dangerous stuff. It adds latency and easily doubles your API costs, but it beats explaining to a client why their app generated something it absolutely shouldn't have.
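The layering idea can be sketched in plain Python. This is not the Guardrails AI, NeMo Guardrails, or Llama Guard API; it is a hypothetical stand-in where each layer is just a function that inspects the model's reply at runtime, and the keyword regex is a deliberately crude placeholder for a real classifier.

```python
import re
from typing import Callable, List, Optional

# Each check returns a reason string if the reply should be blocked, else None.
Check = Callable[[str], Optional[str]]

# Crude placeholder for a real safety classifier (assumption, for illustration only).
MEDICAL_TERMS = re.compile(r"\b(dosage|diagnos\w*|prescri\w*)\b", re.IGNORECASE)

FALLBACK = "Sorry, I can't help with that."


def no_medical_advice(reply: str) -> Optional[str]:
    """Block replies that look like medical advice."""
    return "medical advice" if MEDICAL_TERMS.search(reply) else None


def max_length(reply: str, limit: int = 1200) -> Optional[str]:
    """Block replies that run past a length budget."""
    return "reply too long" if len(reply) > limit else None


def guard(reply: str, checks: List[Check], fallback: str = FALLBACK) -> str:
    """Run checks in order; the first one that objects replaces the reply."""
    for check in checks:
        if check(reply) is not None:
            return fallback
    return reply


layers = [max_length, no_medical_advice]  # cheap checks first
print(guard("Our widget ships in blue and green.", layers))
print(guard("The usual dosage is 200mg twice a day.", layers))
```

The design choice worth noting is that the checks run on the model's output, after generation, so they still apply when the system prompt is ignored or bypassed.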

LLM Safety & Guardrails, Guardrails AI, NVIDIA NeMo Guardrails

The Orchestrator-Worker Pattern: Managing Complex Tasks with Sub-Graphs

Python · Article

Nov 11, 2025 · 5 min read · 1,052

The orchestrator-worker pattern in LangGraph is one of those things that sounds complex but is really just: break a task into pieces, run them in parallel, combine the results. We used it at work to cut document processing from 45 seconds down to 14 — same model, same prompts, just smarter architecture. But honestly, don't reach for it too early. If your tasks aren't naturally parallelizable or your orchestrator's planning step is shaky, you'll just burn more tokens on parallel garbage. Start simple, refactor when latency actually becomes the bottleneck.
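The "break it up, run in parallel, combine" shape can be shown without any framework. The sketch below uses a thread pool from the standard library as a stand-in for LangGraph sub-graphs, and `worker` is a placeholder for whatever LLM call each piece would make; none of this is LangGraph's actual API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def plan(document: str) -> List[str]:
    """Orchestrator step: split the task into independent pieces (here, paragraphs)."""
    return [p.strip() for p in document.split("\n\n") if p.strip()]


def worker(section: str) -> str:
    """Worker step: placeholder for an LLM call on one piece."""
    return f"{len(section.split())} words"


def combine(results: List[str]) -> str:
    """Synthesis step: merge worker outputs in their original order."""
    return "; ".join(results)


def run(document: str, max_workers: int = 4) -> str:
    sections = plan(document)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(worker, sections))  # map preserves input order
    return combine(results)


print(run("First section here.\n\nSecond one is a bit longer.\n\nThird."))
# prints: 3 words; 6 words; 1 words
```

Threads suit this case because real workers would spend their time waiting on network calls. If `plan` produces pieces that depend on each other, the parallel step stops being valid, which is the "shaky planning step" failure mode described above.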

LangGraph, Orchestrator-Worker Pattern, AI Agent Architecture

Why Pydantic is the "Standard Library" for Every Modern AI Framework

Python · Blog

Nov 4, 2025 · 4 min read · 1,138

I tried building an agent workflow recently and realized that every library I reached for (LangChain, FastAPI, CrewAI, instructor) had Pydantic under the hood. It's not just a validation library anymore; it's basically infrastructure for Python AI development. The reason is simple: LLMs return messy, unpredictable data, and Pydantic is ridiculously good at enforcing structure on that chaos. If you're doing anything with AI in Python, stop treating it as a background dependency and actually learn it properly. It'll pay off in every framework you touch.
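Here is a small sketch of what "enforcing structure on that chaos" can look like, assuming Pydantic v2 is installed. The `Ticket` schema and the raw strings are hypothetical stand-ins for whatever an LLM actually returns.

```python
from typing import List, Optional

from pydantic import BaseModel, Field, ValidationError


class Ticket(BaseModel):
    """Hypothetical schema for a support-ticket extraction task."""
    title: str
    priority: int = Field(ge=1, le=5)
    tags: List[str] = []


def parse_llm_output(raw: str) -> Optional[Ticket]:
    """Return a validated Ticket, or None if the output doesn't fit the schema."""
    try:
        return Ticket.model_validate_json(raw)
    except ValidationError:
        return None


# The model returned the priority as a string; Pydantic coerces it to an int.
good = parse_llm_output('{"title": "Login fails", "priority": "2", "tags": ["auth"]}')
# "urgent" can't be read as an integer between 1 and 5, so validation fails.
bad = parse_llm_output('{"title": "Login fails", "priority": "urgent"}')

print(good)
print(bad)
```

Returning `None` on failure keeps the example short; in a real pipeline the validation error is often fed back to the model as a retry prompt instead.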