Beyond Prompt Engineering: Architecting Runtime Safety in Python

Prompt engineering is not going to save you. I know that sounds harsh but I have been building LLM-powered apps for long enough to know that no matter how good your system prompt is, someone will find a way around it. Or the model will just hallucinate something dangerous on its own. No prompt needed. We had this exact situation at OZ where a client's chatbot was supposed to only answer questions about their product catalog, and within two days of deployment it was giving medical advice. The system prompt was solid. Didn't matter.

The real question is what happens at runtime. Between the user typing something and your app returning a response, what checks are actually running? If the answer is "just the system prompt," you have a problem.

The tools that actually exist

There are two main frameworks people reach for in Python right now — Guardrails AI and NVIDIA's NeMo Guardrails. They solve the same problem but in very different ways and honestly, I think most people pick the wrong one for their use case.

Guardrails AI is basically a validation layer. You define validators — things like "no PII in the output" or "response must be valid JSON" or "no toxic language" — and it checks the LLM output against them before returning to the user. They have a Hub with pre-built validators you can just pull in. The mental model is simple: input goes in, LLM generates output, validators check it, bad stuff gets caught. It is very Pythonic, feels like writing normal code. You can build custom validators in like 20 lines.

NeMo Guardrails is a different animal. It uses something called Colang, which is basically a domain-specific language for defining conversation flows. You are not just validating output — you are defining what conversations should look like. Input rails, output rails, dialog rails, retrieval rails. It sits as a proxy between user and model, and the runtime engine decides at each step whether the conversation is going in an allowed direction. Think of it like a state machine for conversations.

Here is the thing nobody tells you — Guardrails AI adds latency on every single call because it is running validation after generation. NeMo adds latency too but sometimes more because it is making additional LLM calls to classify user intent before the main model even runs. I measured this on a project last year. Guardrails AI was adding 200-400ms per request. NeMo was adding 500-800ms depending on the complexity of the rails. At scale, that matters. Your users will feel it.

Where this actually breaks

The biggest issue I have seen is people treating these tools like a magic checkbox. "We added guardrails, we are safe now." No. You are safer. There is a difference.

Jailbreak detection sounds great until you realize that new jailbreak patterns show up every week. Your static validators will catch the obvious stuff — the "ignore previous instructions" type attacks — but the sophisticated ones? The ones that use encoding tricks or multi-turn manipulation? Those get through. I tested NeMo's built-in jailbreak detection with about 30 known attack patterns and it caught maybe 22 of them. That is a 73% catch rate, which sounds okay until you think about what the other 27% could do in production.

PII detection is another one. Every guardrails tool claims to catch PII. And they do catch obvious stuff like social security numbers and email addresses. But names? Addresses in non-US formats? Medical record numbers? It is patchy. We had to layer Presidio on top of Guardrails AI just to get reasonable PII coverage for a healthcare client. That is three validation layers for one use case.

The pattern I have landed on after multiple projects is this — don't pick one tool. Use Guardrails AI for structured output validation because it is fast and simple. Use NeMo for conversation flow control when you have a chatbot that needs to stay on topic. And add a separate content classification model like Llama Guard for the really dangerous stuff. Yes, this means more latency. Yes, this means more infrastructure. But the alternative is explaining to your client why their LLM app generated something it absolutely should not have.

One more thing — none of this replaces logging and monitoring. Every guardrail that fires should be logged. Every blocked response should be reviewed. Because your guardrails will have false positives, and they will have false negatives, and the only way to improve them is to actually look at what is happening in production. I am still surprised by how many teams deploy guardrails and then never look at the logs. That is like installing a security camera and never watching the footage.

The cost of running all this adds up fast. You are paying for the main LLM call, plus the classification calls NeMo makes, plus the validator compute, plus whatever content moderation model you are running on the side. Multiply your base API cost by 2-3x minimum. Budget for it or your finance team will come asking questions you don't want to answer.

Beyond Prompt Engineering: Architecting Runtime Safety in Python

The tools that actually exist

Where this actually breaks

Share this article

About the Author

Related Articles

How to Build Adaptive Dialog Management in Microsoft Copilot Studio

How to Build a Copilot Studio Agent From Scratch (Without the Mistakes)

Need help with your project?