Notes on prompting like an engineer
Specs over vibes. Tests over hopes. A practical way to write prompts that behave more like engineering work than wishful thinking.
01 / start with the contract
Good prompts are not magic words. They are small contracts. They say what role the model should play, what input it has, what output shape is acceptable, and what should happen when the input is incomplete.
That sounds less romantic than prompt craft, but it is the part that keeps the work usable. A vague instruction can produce one good answer in a demo. A contract can produce a repeatable answer when the same workflow runs next week with a different file, customer, city, or edge case.
The version I reach for now is simple:
You are helping with [job].
Input: [what the model receives].
Output: [format, fields, constraints].
Rules: [what must never happen].
If uncertain: [how to say so].
The last line matters. Models are very good at filling silence with confidence. Engineering prompts give them a safe path to say, "I do not know from the evidence provided."
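One way to keep that template honest is to store the five fields as data and render the prompt from them. The sketch below is a minimal illustration, not a standard API: `PromptContract`, its field names, and the support-ticket values are all hypothetical. It refuses to render when any field is blank, so a missing "If uncertain" line fails early instead of shipping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptContract:
    """Hypothetical container for the five fields of the template above."""

    job: str
    input_description: str
    output_spec: str
    rules: str
    if_uncertain: str

    def render(self) -> str:
        # Reject blank fields so an incomplete contract never reaches the model.
        for name, value in vars(self).items():
            if not value.strip():
                raise ValueError(f"contract field {name!r} is empty")
        return "\n".join([
            f"You are helping with {self.job}.",
            f"Input: {self.input_description}",
            f"Output: {self.output_spec}",
            f"Rules: {self.rules}",
            f"If uncertain: {self.if_uncertain}",
        ])


# Illustrative values only; swap in the real job and constraints.
contract = PromptContract(
    job="triaging support tickets",
    input_description="one ticket as plain text.",
    output_spec="JSON with the fields category and urgency.",
    rules="never invent an order number.",
    if_uncertain='set category to "unknown" and say what is missing.',
)
print(contract.render())
```

Because the contract is plain data, a change to any one field shows up as a one-line diff in review.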
02 / write examples before adjectives
Words like concise, strategic, robust, or professional mean different things to different people. They mean even less to a model unless you anchor them with examples.
Instead of asking for a "clear summary," show one summary that is too vague, one that is too long, and one that fits. The examples become a calibration set. They teach the shape of the output better than another paragraph of instructions.
When I use AI for product or operations work, I usually include three examples:
- one normal case
- one edge case
- one refusal or uncertainty case
That third example is the difference between a tool that looks smart and a tool that can be trusted. If the model knows what a responsible non-answer looks like, a reviewer can tell a deliberate "not enough information" from a confident guess, and the system becomes easier to review.
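The three-example habit can be enforced mechanically. This is a rough sketch under my own assumptions: the `Example` type, the kind labels, and the sample ticket text are invented for illustration. The renderer fails if any of the three kinds is missing, which makes it hard to forget the uncertainty case.

```python
from dataclasses import dataclass

# The three kinds of example listed above; the label names are an assumption.
REQUIRED_KINDS = ("normal", "edge", "uncertain")


@dataclass(frozen=True)
class Example:
    kind: str
    input_text: str
    output_text: str


def render_examples(examples: list[Example]) -> str:
    """Format examples for a prompt, insisting on one of each required kind."""
    kinds = {e.kind for e in examples}
    missing = [k for k in REQUIRED_KINDS if k not in kinds]
    if missing:
        raise ValueError(f"missing example kinds: {', '.join(missing)}")
    blocks = [
        f"Example {i} ({e.kind})\nInput: {e.input_text}\nOutput: {e.output_text}"
        for i, e in enumerate(examples, start=1)
    ]
    return "\n\n".join(blocks)


# Made-up calibration set for a ticket summarizer.
EXAMPLES = [
    Example("normal", "Order arrived two days late.", "Late delivery, two days."),
    Example("edge", "late late LATE!!! where is it", "Delivery not yet received."),
    Example("uncertain", "???", "I do not know from the evidence provided."),
]
print(render_examples(EXAMPLES))
```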
03 / test prompts like code paths
A prompt is part of the system. It deserves test inputs.
I keep a tiny set of cases beside every production prompt: a clean input, a messy input, a short input, a multilingual input if relevant, and one input that should not produce an answer. Before changing the prompt, I run the old and new versions against the same cases.
You do not need a heavy evaluation platform for this to help. Even a small markdown file with expected behavior catches the obvious regressions: format drift, missing fields, overconfident guesses, hidden assumptions, and answers that sound good while ignoring the input.
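For a sense of scale, a harness like this fits in a few dozen lines. The sketch below is illustrative only: `fake_model` is a deterministic stand-in I made up so the code runs offline, and the case names, prompts, and checks are hypothetical. In practice the `model` argument would wrap whatever client you actually call.

```python
from dataclasses import dataclass
from typing import Callable

# A "model" is any function from (prompt, input text) to output text.
Model = Callable[[str, str], str]


@dataclass(frozen=True)
class Case:
    name: str
    input_text: str
    check: Callable[[str], bool]  # True when the output is acceptable


def run_cases(model: Model, prompt: str, cases: list[Case]) -> dict[str, bool]:
    """Run every case against one prompt and record pass/fail by name."""
    return {c.name: c.check(model(prompt, c.input_text)) for c in cases}


def regressions(model: Model, old: str, new: str, cases: list[Case]) -> list[str]:
    """Names of cases that passed with the old prompt but fail with the new one."""
    before = run_cases(model, old, cases)
    after = run_cases(model, new, cases)
    return [name for name in before if before[name] and not after[name]]


def fake_model(prompt: str, input_text: str) -> str:
    # Stand-in for a real model call: guesses on empty input unless the
    # prompt gives it an "If uncertain" path.
    if not input_text.strip():
        if "If uncertain" in prompt:
            return "I do not know from the evidence provided."
        return "Summary: everything looks fine."
    return f"Summary: {input_text.strip()[:40]}"


CASES = [
    Case("clean", "Order 1182 arrived damaged.", lambda out: out.startswith("Summary:")),
    Case("messy", "  order 1182??? ARRIVED   damaged!!  ", lambda out: out.startswith("Summary:")),
    Case("no_answer", "", lambda out: "I do not know" in out),
]

OLD_PROMPT = "Summarize the ticket.\nIf uncertain: say you do not know from the evidence provided."
NEW_PROMPT = "Summarize the ticket."

print(regressions(fake_model, OLD_PROMPT, NEW_PROMPT, CASES))
```

With the stub above, dropping the "If uncertain" line is reported as a regression on the `no_answer` case, which is exactly the kind of overconfident guess a quick read of the new prompt would miss.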
The habit is the point. Once a prompt has tests, it stops being a sentence someone liked in a chat window. It becomes a small component with behavior you can improve deliberately.
04 / keep prompts boring
The best production prompts I have seen are not clever. They are boring in a useful way: explicit, structured, and easy to diff. They name the job. They define the output. They state what evidence is allowed. They make uncertainty visible.
That is prompting like an engineer: fewer vibes, more interfaces.