My agentic engineering workflow (step by step)

26 May 2026

Shreyas Prakash headshot

Shreyas Prakash

My agentic engineering workflow has changed in the recent past. The models are better, and there is much more freedom in choosing the harness, abilities, and actions you provide.

Table of contents:

I’ll walk you through each of the phases in this workflow which I follow to build my side projects (as of June 1, 2026). It has evolved from IDE chat, to CLI coding agents, to now “slice-driven product development”.. so let’s get right into it..

Pre-planning, pre-idea, pre-everything

To curate the list of ideas I have in the pipeline, I use Trello for a simple kanban board. There are many other tools you could use for a simple, organized list of items, even a TODO list app would suffice. However, I’ve particularly liked Trello for this purpose because it also allows you to add a nice little thumbnail for each idea. Usually, an idea comes to me all of a sudden, out of nowhere. In this fleeting moment, I try my best to capture it as accurately as possible, as if I were feeling it in my veins. So, a picture along with a one-line description helps me capture this fleeting thought..

Trello mobile experience is also nice, and I mostly capture them while on-the-go.

My only heuristic for picking my next side project here would be to go with something I’m most excited with. I have a crematory of 100s of abandoned side projects, and I don’t want to add another one to this ever growing list of dead projects that don’t bite. I would at least want to ship what I start, and for that, the fuel here is motivation. You feel it in the gut, and you want it to guide what you should do next.

The first chat with the agents

Usually, when you capture an idea or thought, it might be in a very urgent mode and you wouldn’t go into the weeds of how this might be envisioned. So to explore what this could mean, I start having a chat with the agents. I usually feel more comfortable doing this on mobile, so I pick this up on ChatGPT. One other reason I’ve been tied into the OpenAI ecosystem is the additional benefit I get from the $20/month subscription package. I get to also use this for leveraging GPT-Codex via Opencode. More on the agentic coding setup later, but I just wanted to mention this right away, as I see great benefits in tying into the ecosystem offered by OpenAI.

Another reason I use ChatGPT is, interestingly, its default typographic choices. Look at Claude, and the mess they’ve made with their default choices. Anthropic folks have used and abused the serif fonts, and their defaults have slowly trickled down to the way everyone vibe codes and makes their half-cooked apps. It’s a mess, and I want to stay away from it.

Another important reason for going with ChatGPT is in its ability to do OAuth with any of the self-learning, persistent memory agents such as Openclaw or Hermes. This allows any user already with a ChatGPT subscription to connect directly with Openclaw, instead of having to buy additional API credits.

While having this on-the-go chat with the agents, I might also end up in a deep-research rabbit hole. An example from the recent past was when I tried to find open source repositories on GitHub for creating music karaoke tracks for my father-in-law, who is practicing to be a singer more recently. I was itching to build a custom solution for his needs, but then, I wanted to double-confirm if there are any ready-made solutions available right off-the-shelf on GitHub which I could fork. (and it just turns out that there was an off-the-shelf solution, so I didn’t have to reinvent the wheel)

So I ask ChatGPT on the Deep Research mode to provide me a list of well-maintained repositories which do the whole thing, or a ‘part’ of the pipeline, really well.

^^ Early explorative conversations done on ChatGPT.

I have noticed that I get better results this way, rather than just trying to piece together a curated list of repos by searching through GitHub manually. While doing forking and modifying, I also try to ensure I have the right licenses to do so.

Having more Socratic dialogues

I have also noticed that this type of search works better than a mere LLM search. I also instruct the agents to “steelman” or “strawman” the concept to identify the fault-lines, or even to cultivate an opinion sometimes.

All software engineering is ultimately tradeoffs, and there exists no perfect solution without tradeoffs, so this line of reasoning helps shape an opinion on what the product should do (without having to do everything under the sun)

Once, GPT presented this list in a table format. (Make sure to provide custom instructions to always use comparison tables wherever necessary.) I began visualizing the concept in my mind, keeping the architecture in mind. Sometimes, I say, “Help me visualize the end-to-end pipeline in ASCII, including the components and libraries we’re using.” At this stage, it’s crucial to hold the complete concept in our minds without drifting away. All these conversations and visualizations help shape our vision. And with these simple ASCII diagrams, the simple act of arrows pointing to each other can help us conceptualize better.

At this stage, all of the chat threads are still on ChatGPT, and I haven’t even opened my laptop yet. All this is on mobile. And when I finally find satisfaction with the chat outputs, I would then do a ‘handoff’ to do some serious work with the foundation already set by my chat. For this handoff, I would try to synthesize the conversation into either a spec .md file, just to see if what I understood and what the agents have understood are the same. I’m looking here for mind-AI convergence here, nothing else. And in case there is some drift, I still have a way to make sure there aren’t any gaps.

The first chat on the terminal

Now that we have something to work with, we start our first chat on the terminal. I primarily use Ghostty as my default terminal application. Surprisingly Ghostty is faster than the native terminal offered on Mac, and I haven’t looked back.

Why use the CLI over a code editor? Because, the job becomes more of pointing the agents at the right location in the codebase, rather than writing code. We’re in the era of CHOP - Chat-oriented programming.

Apart from the speed benefits, it also provides a similar interface as that of Google Chrome: Just like you open multiple tabs on Google Chrome, you can also open multiple chats with agents on Ghostty, and the interface helps a lot to have multiple conversations with the agents. Just to maintain sanity, I keep one project for each tab, and open multiple panes/agents under that tab for that specific project. In that way, I could streamline my chats with multiple agents working, across multiple projects. I’ve tried going this route and have got a dopamine hit from the code throughput, but have realized that it’s much more important to hold an ‘entire problem in your head’ from start to finish. So I’ve let go of context switching, and have embraced FOCUS.

^^ I’m usually opening multiple Opencode conversations via Ghostty.

With Ghostty as the terminal application, I use Opencode as the TUI app for chatting with the agents directly. Think of it as a more hackable, model-agnostic alternative to tools like Claude Code, Codex CLI, Cursor Agent, or Gemini CLI.

I’ve also heard that Pi agent is even more hackable than Opencode, but Opencode strikes a good balance in my view. Pi can even manipulate its own installation, and emits events for everything, making it easier to build reactive UIs on top of it. (With Opencode, you could still do various customizations by means of Opencode plugins).

Setting up the Opencode environment

The default Opencode application itself helps you do most of what you need. These are the skills I use with Opencode:

SkillPurpose
build-cliDesign or improve agent-friendly and human-friendly CLIs.
copy-adsCreate paid ad copy variants for channels like Google, Meta, LinkedIn, X, and TikTok.
copy-marketingWrite persuasive website, landing page, headline, CTA, and value prop copy.
copy-release-notesGenerate user-facing release notes and changelogs from shipped work.
customize-opencodeEdit or create opencode configuration, agents, skills, plugins, MCP servers, or permissions.
frontend-advancedBuild technically ambitious frontend experiences such as shaders, virtual tables, spring physics, and scroll effects.
frontend-performanceImprove frontend loading speed, rendering, animation, images, and bundle performance.
frontend-remotionApply best practices for Remotion video creation in React.
frontend-slidesBuild animated HTML presentations or convert PowerPoint decks into web slides.
meta-design-setupSet up persistent design context and guidelines for a project.
meta-find-skillsHelp discover and install additional agent skills.
meta-thinkingAct as a structured thinking partner for decisions, tradeoffs, mental models, and stress testing ideas.
product-breadboard-reviewReview an existing breadboard against implementation and surface wiring/design drift.
product-breadboardingMap workflows into product affordances, code affordances, stores, and wiring.
product-framingTurn transcripts or interview notes into structured product framing documents.
product-kickoffConvert kickoff transcripts into builder-facing implementation reference docs.
product-namingBrainstorm five memorable product names with rationale.
product-shapingCollaboratively shape a product or feature before implementation.
product-visionCreate inspiring product vision statements and alignment narratives.
research-blueskyDeep research using docs, web, and codebase before planning.
research-deepThorough evidence-backed research across code, docs, and web.
research-last30Recency-focused research across many sources from the last 30 days.
research-lightTargeted lightweight research before planning or implementation.
tool-browserAutomate browser tasks like navigation, screenshots, forms, scraping, and web app testing.
ux-clarityImprove interface microcopy such as labels, buttons, helper text, errors, and empty states.
ux-onboardingDesign or improve onboarding, activation, setup, and first-run flows.
ux-resilienceMake interfaces robust against errors, edge cases, i18n, overflow, and production issues.

You could download my total list of skills here: https://github.com/shreyas-makes/agent-skills

As you can see here, more broadly the list of skills are more focused on UX, research, copywriting, performance and browser-automation testing. I see skills being created and updated as a dynamic ongoing process that needs to be reflected upon every now and then. It would look something like this diagram here: if it’s a process that has been repeated more than 5 times, then definitely create this as a skill.

In the first wave of adoption to agentic coding, we saw a lot of impetus given to designing the right prompt.

In the old era, this would have been quite a useful technique to get a lot more sauce from the models, but in the new way, where the agents have caught up with intelligence, we don’t require any such sorcery.

Prompt engineering is just English grammar in my view, and even if you’re rambling incoherently (making sense sometimes), they are still OK.

Doing light research, deep research

For complex tasks, you might want to research, walk through the planned sequence of steps, and then execute. Sometimes you might need to do all three, and might straightaway jump to execute too, that’s fine too. Especially on the “research” step, I might do a light-research that’s not too rigorous, and a more extensive “deep research” that scrapes every last bit of information arbitrage from the internet..

If I have to do more ‘light-research’ inline while using the terminal, I use the light-research skill popularized by Josh Pigford..

If I’m looking for more “hot” research, especially the word-of-mouth from the zeitgeist, especially since every day is a year in the AI age, I use the /last-30-days skill.

The /last30days skill popularized by Matt vhorn is an ‘AI agent skill that researches any topic across Reddit, X, YouTube, HN, Polymarket, and the web - then synthesizes a grounded summary’.

Reddit upvotes. X likes. YouTube transcripts. TikTok engagement. Polymarket odds backed by real money and insider information. That’s millions of people voting with their attention and their wallets every day. /last30days searches all of it in parallel, scores it by what real people actually engage with, and an AI agent judge synthesizes it into one brief.

Google aggregates editors. /last30days searches people.

You can’t get this search anywhere else because no single AI has access to all of it. Google search doesn’t touch Reddit comments or X posts. ChatGPT has a deal with Reddit but can’t search X or TikTok. Gemini has YouTube but not Reddit. Claude has none of them natively. Each platform is a walled garden with its own API, its own tokens, its own auth. But you can bring your own keys and browser sessions, and suddenly an AI agent can search all of them at once, score them against each other, and tell you what actually matters.

That’s the unlock. Not one better search engine. A dozen disconnected platforms, bridged by an agent.

In their own words, I was baited by the description here where they mention how their search offers conversational intelligence by pulling in info from pretty much all the biggie social platforms):

SourceWhat the people tell you
RedditThe unfiltered take. Top comments with upvote counts, free via public JSON. The real opinions that Google buries.
X / TwitterThe hot take, the expert thread, the breaking reaction. First to know, first to argue.
YouTubeThe 45-minute deep dive. Full transcripts searched for the 5 quotable sentences that matter.
TikTokThe creator reaching 3.6M people with a take you’ll never find on Google.
Instagram ReelsThe influencer perspective with spoken-word transcripts. The visual culture signal.
Hacker NewsThe developer consensus. 825 points, 899 comments. Where technical people actually argue.
PolymarketNot opinions. Odds. Backed by real money. 96% confidence on album sales. 4% on an acquisition.
GitHubFor people: PR velocity, top repos by stars, release notes. For topics: issues and discussions.
DiggCurated story clusters from Digg’s AI 1000 leaderboard (~1000 high-signal AI accounts on X), with attributable inline quotes (no X auth required). Auto-enabled when digg-pp-cli is on PATH.
ThreadsThe post-Twitter text layer. Conversations from creators and brands.
PinterestVisual discovery. Pins, saves, and comments on products and ideas.
BlueskyThe decentralized social layer. AT Protocol posts from the post-Twitter migration.
PerplexityGrounded web search with citations via Sonar Pro.
WebThe editorial coverage, the blog comparisons. One signal of many, not the only one.
/last30days can be used for person research, competitor analysis, feature A versus feature B, etc.

Model choices while working with coding

I mostly have subscribed completely to Peter Steinberger with the obsessive use of the Codex agents. And this is before him joining OpenAI, so I know that the initial take was unbiased.

Sometimes it just silently reads files for 10, 15 minutes before starting to write any code. On the one hand that’s annoying, on the other hand that’s amazing because it greatly increases the chance that it fixes the right thing. Opus on the other hand is much more eager - great for smaller edits - not so good for larger features or refactors, it often doesn’t read the whole file or misses parts and then delivers inefficient outcomes or misses sth. I noticed that even tho codex sometimes takes 4x longer than Opus for comparable tasks, I’m often faster because I don’t have to go back and fix the fix, sth that felt quite normal when I was still using Claude Code. — Peter Steinberger

I also don’t use the “plan mode”, when I need the agent to do a set of instructions, I say “let’s discuss”.. My approach here with building is very iterative. I am not a big fan of taking a complete spec, and putting it in a ralph loop, if that’s so easy, then that should not be a software then.

This is my current stack of model choices:

  1. Codex GPT series
  2. MiniMax or Kimi series (for fallbacks in case I run out of Codex credits)

The design process: sequence of steps while building apps

I follow this sequence of steps in my conversations with the agents while building 0 to 1. I’d outlined this in my previous essay (breadboarding and shaping with AI agents), and have it here handy:

This is how my current process looks like:

StepTermWhat happensWhy it existsOutput artifact
1VisionDescribe the future state of the productAligns all work to a long-term directionVision statement
2ProblemIdentify the concrete obstacle preventing the visionPrevents building random featuresProblem statement
3Requirements (R)Extract constraints and must-have behaviorsCreates a contract to evaluate solutionsRequirement list
4Shaping (Solutions A/B)Propose high-level solution approachesMoves from problem → possible architecturesShape document
5Fit Check (R × A)Verify if the solution actually satisfies requirementsReveals gaps, over-engineering, or missing piecesFit matrix
6SpikesResearch unknown technical areasReduce uncertainty before architecture solidifiesSpike notes
7Fat Marker SketchSketch user interaction and visible stateClarifies product behavior and UI affordancesSimple UX diagram
8BreadboardingMap system wiring (UI + code + data + services)Convert ideas into architectureBreadboard diagram
9Slicing (Scopes)Divide architecture into demoable piecesEnables incremental deliveryVertical slice plan
10Steel ThreadBuild the minimal end-to-end pathProve the architecture integrates correctlyWorking skeleton
11Iterative Slice BuildExpand slices into complete featuresGradually complete the productProduction system

You might be looking at this 11 point list, and questioning why do all this?? Why can’t we just prompt in one line and be okay with whatever AI agents generate? This was my initial line of exploration, and I failed badly after encountering various bugs in the process.

Why vision first? This gives a sense of direction, especially when agents could take you anywhere, and be sycophantic when they say “you’re absolutely right!”. You need a strong, opinionated product vision.

How I start the chat:

Ask me one question at a time so we can define and shape a strong product vision for this idea. Each question should build on my previous answers, helping clarify the user, the problem, the unique insight, the product’s point of view, and the future it is trying to create. Let’s do this iteratively and focus on asking the right questions before jumping into features or implementation. Remember, only one question at a time.

Here’s the idea:

Once you get to the end of the dialogue, you then say:

use your shaping skill to capture the requirements and tease apart the key parts of solution A that I have specified here

More often than not, we give lengthy jumbled up argumentation mixing up the problem and the solution together. By doing it this way, we separate out the problem and the solution neatly.

Alongside the conventional software development lifecycle, what has changed here is me incorporating Ryan Singer’s workflow in terms of building shapes, and slices. Long story short, you don’t constrain yourself too much by finalising a “spec.md” and then telling our AI overlords to “go build it!”. That kind of stuff seldom works. Instead what I do is to come up with a few requirements/constraints, and spend more time ‘shaping’ multiple approaches. Let’s say, if you have shape A, shape B and shape C (with some unknowns on how they tie together to create a complete solution), you then spike the unknowns, resolve (or not resolve them) in the process, and come up with the right “shape”.

Once you have a shape, based on the complexity of the shape, which could be as complex as “rewrite Linux in Rust” or as simple as “build a HTML presentation deck”. If things get way way too complex, then you might also have to slice the shape into multiple pieces. Each “slice” is a demoable piece, which means it’s possible for us to feel something and feed useful information back to the agents instead of just ‘TAB, TAB, TAB, CONTINUE, ENTER, TAB, TAB, TAB…’

This is a remix, or a fork of the conventional agile development lifecycle, adapted to work better with AI agents, and I have found great results so far, especially since this is more universal go-to process that could work for the entire spectrum of simple to complex/heavy-duty stuff.

Another benefit of breaking them into such slices is to reinforce the need to split a giant agent output into reviewable pieces. You would certainly not be able to review a 15,000 line PR.

To incorporate the Shape Up methodology of Ryan Singer, here is the GitHub repo

If slicing doesn’t work, try steelthreading it..

Imagine standing at the edge of a canyon, needing to cross to the opposite cliff. One option is to construct a bridge, starting with logs, ropes, and foundations—carefully assembling each piece until a complete, safe crossing exists. This is similar to how MVPs are often built: you fully develop a core feature or component (like building an engine) before moving on to other parts, ensuring that the piece you create is robust and ready for future scaling.

In product development, a concept known as the steel thread has gained attention for its focus on creating the most direct yet robust path from concept to functionality. Unlike traditional methods such as building a Minimum Viable Product (MVP), which often prioritize incremental construction and polishing a single component before moving to the full system, the steel thread approach prioritizes end-to-end integration early on, even with minimal implementation.

The steel thread approach, however, takes a different perspective. Instead of starting with a full bridge, you imagine a thin steel thread stretched across the canyon. It represents the simplest, lightest, and most minimal path to achieve an end-to-end connection. The steel thread is strong enough to support essential functionality and demonstrates that all critical integrations work together. Even if the overall experience is barebones, the team can traverse the complete journey from point A to point B, proving that the product can function holistically.

From a product development perspective, this method focuses on building the smallest possible version of the full flow. Rather than fully developing isolated components, the goal is to establish a working skeleton that spans the entire product experience. This allows teams to quickly identify integration challenges, potential bottlenecks, and areas of risk. Once the steel thread is in place, subsequent iterations can enhance, reinforce, and expand it—eventually turning the thread into a fully realized product structure.

By prioritizing end-to-end connectivity over depth in one area, the steel thread approach offers several advantages:

  • Rapid validation of system feasibility
  • Early identification of integration issues
  • Efficient feedback collection on the full user journey

In contrast to building a polished engine first (the MVP approach), the steel thread demonstrates the viability of the whole vehicle—even if, at first, it is only a bare-metal prototype. Teams practicing this method move faster toward functional products and discover critical insights earlier in the development lifecycle.

In short, the steel thread method is about achieving the simplest full journey before committing to deep, complex builds. It highlights the importance of robust integration early, providing a strategic pathway to scale confidently and efficiently.

Ensuring that the design in the design process is coherent and consistent..

My mantra here is to first make it work, then, make it fast, and then, make it delightful.. Form should follow the function. And when I reach this stage where I’m happy with the functionality, and there aren’t much glitchiness to the way it achieves its core function, I move on to the design bit.

Hardik Pandya, in his essay — Expose your design system to LLMs, talks about how LLMs undergo design drift, and why it’s important to feed the design system to the AI coding agents, to make it stop guessing.

To achieve a consistent and coherent way of presenting the interface for my app, I set up a design system if I haven’t already. In case of brownfield apps, I use /design-audit skill, inspired by this essay, to comb through all the patterns it could find and translate it into the right theming. This then becomes a ‘DRY’, where I wouldn’t have to repeat myself to the agents for the 100th time.

You could place this at the root of your project:

Audit this project and make the design system LLM-readable.

Step 1: Audit
Scan every CSS/SCSS file. List every hardcoded visual value:
hex colors, rgb/rgba colors, pixel spacing, raw font sizes,
font weights, border radii, z-index values, box shadows,
and transition durations. Group them by category. Count totals.
Report which files have the most hardcoded values.

Step 2: Token layer
Create a tokens.css file with three layers:
- Layer 1: upstream design system tokens (use existing ones
 if the project already uses a design system, otherwise
 derive sensible primitives from the audit)
- Layer 2: project aliases that reference Layer 1 with
 fallbacks, e.g. --color-text: var(--ds-text, #292A2E)
- Layer 3 is the components themselves — they only ever
 reference Layer 2 aliases, never raw values

Include tokens for: colors (text, background, link, border,
interactive states), spacing (at least 8 steps), typography
(font families, sizes, weights, line heights), border radius,
elevation/shadow, z-index, and motion/transitions.

Step 3: Spec files
Create a specs/ directory. Write structured markdown specs:
- specs/foundations/ — color.md, spacing.md, typography.md,
 radius.md, elevation.md, motion.md
- specs/tokens/ — token-reference.md (master map of every
 CSS variable, its value, and when to use it)
- specs/components/ — one file per major component in the
 project. Each spec follows this template:
 1. Metadata (name, category, status)
 2. Overview (when to use, when not to use)
 3. Anatomy (parts of the component)
 4. Tokens used (which CSS variables it references)
 5. Props/API (if applicable)
 6. States (default, hover, active, focus, disabled, error)
 7. Code example
 8. Cross-references (related components)

Only spec components that actually exist in this project.

Step 4: Audit script
Create scripts/token-audit.js (or .sh) that:
- Scans all CSS files for hardcoded values
- Suggests the correct token for each violation
- Prints file, line number, violation, and suggestion
- Returns exit code 1 if any errors found (CI-ready)
- Distinguishes errors (hardcoded colors, spacing) from
 warnings (raw durations, uncommon values)

Step 5: Replace hardcoded values
Go through every CSS file and replace hardcoded values with
the tokens from Step 2. Every color:, background:, padding:,
margin:, gap:, border-radius:, font-size:, font-weight:,
box-shadow:, z-index:, and transition: should reference a
var(--token). No raw values should remain.

Step 6: Project instructions
Add a section to the project's AI instruction file (CLAUDE.md,
.cursorrules, or equivalent) that says:
"Before writing or modifying any UI code, read the relevant
spec file in specs/. Use only tokens from tokens.css. Run the
token audit script before committing. Zero errors required."

Run the audit script at the end and confirm zero violations.

It also happens that AI still makes a lot of common mistakes on spacing, typography, hierarchy, etc., which a very keen design engineer who is trained for that eye can spot, I use /design-type for such tweaks that get repeated.

This is the current list of design specific skills I use, and you could remix or fork them by clicking here.

design-auditProduce a comprehensive UI audit across accessibility, performance, responsiveness, theming, and UX quality.
design-boldMake safe or boring designs more visually striking.
design-colorAdd strategic color to monochromatic or visually flat interfaces.
design-critiqueEvaluate a design’s product and UX effectiveness.
design-delightAdd personality, joy, micro-moments, and memorable touches to interfaces.
design-distillSimplify designs by removing unnecessary complexity.
design-extractExtract reusable components, tokens, and design patterns into a system.
design-layoutImprove spacing, rhythm, composition, and hierarchy.
design-minimalCreate clean editorial minimalism with restrained warm monochrome styling.
design-motionAdd purposeful animations and micro-interactions.
design-normalizeAlign a feature with an existing design system and component language.
design-polishPerform final pre-ship UI refinement.
design-premiumCreate expensive, cinematic, agency-crafted interfaces.
design-quietTone down overly aggressive or loud visual designs.
design-responsiveAdapt designs across screen sizes, devices, and contexts.
design-systematicBuild stricter frontend design systems and measurable UI implementation rules.
design-typeImprove typography, font hierarchy, sizing, weight, and readability.
design-uiCreate distinctive production-grade frontend interfaces from scratch or through major redesign.
You could download my total list of skills here: https://github.com/shreyas-makes/agent-skills

Ensuring the code generated is clear, and reviewed

If you have read the theory of constraints, you would know that a bottleneck is never completely eliminated, it just shifts from one place to the other. Previously we had the bottleneck in terms of generating or writing code, that was a bottleneck. But then once the coding agents were able to like solve that bottleneck, the bottleneck just shifted to code review. Which makes the need for having better tools to support code review, even more important.

There is even an argument floating around in the internet that there is a cost to accelerating the code throughput without reviewing the code properly, leading to complex failures which are harder to resolve, by AI agents, as well as by human engineers. But for which I would say that the only metric that matters are the number of decisions that could be taken per day. At normal velocity, a team might make one or two decisions per week, but at 10x velocity, you see them making multiple a day. The usual bottlenecks where you’re waiting for a slack response, or for scheduling a quick sync later, no longer exist.

I’ve previously used the /review tool available in most coding agents, and yet, the fundamental question which I still don’t have answer to is: if the language models need a separate tool for code review, why can’t it just stitch the review loop onto the code generation? Currently, I’m a bit skeptical about code review, and think this would eventually be a part of the code-generation loop.

Right now, after a major feature update, I try to ask the agent its plan before it writes any code (so that I could perform some pre-emptive strikes), and ask it in plain English what it has written. In a previous note on this topic, I write about how the top layer and bottom layer should still be done by humans, leaving the middle layer for the AI agents:

The AI sandwich technique outlines a structured approach where humans and AI agents collaborate effectively.

In this model, the top layer involves human input, where goals and instructions are clearly defined by humans. This ensures that the desired outcomes are aligned with human intentions. The middle layer is where AI agents take over, handling the orchestration, execution, and processing tasks. This allows for efficient and automated handling of complex operations. Finally, the bottom layer involves human evaluation, where the output is assessed based on subjective human taste and feedback. This ensures that the final result meets human standards and expectations.

Think of the end game polishing done by humans, akin to how pilots and co-pilots still have the final call on the airliner they’re operating in, despite all the automations at place that could technically automate the role of the pilot, but not in principle.

A concept which I recently became aware of is that of a backpressure. It’s described as a pressure which arises from failed builds/tests that pushes the model loop to improve output.

Templatize everything that needs templatizing

If I have already built the feature successfully somewhere else in a different project, I cross-reference that successful implementation to the AI agents for helping diagnose what’s wrong.

I’ve already mentioned my way of planning a feature. I cross-reference projects all the time, esp if I know that I already solved sth somewhere else, I ask codex to look in../project-folder and that’s usually enough for it to infer from context where to look. This is extremely useful to save on prompts. I can just write “look at../vibetunnel and do the same for Sparkle changelogs”, because it’s already solved there and with a 99% guarantee it’ll correctly copy things over and adapt to the new project. That’s how I scaffold new projects as well.

I started templatizing patterns across projects because I was spending too much energy repeating setup decisions in every new repo: how to plan work, how to enforce coding preferences, how to deploy etc. The core idea is simple: separate reusable workflow rules from project-specific code. That gives me a stable operating layer across all work in ~/Projects, while still letting each project keep its own context.

Instead of reinventing process every time, I reuse a consistent scaffold and only customize where the stack or business logic actually differs.

The agent-scripts model came from studying Peter Steinberger’s setup and adapting the parts that matched my own way of working. I kept the structure but changed the intent: a global rules file for hard constraints, stack-specific profiles for Rails Inertia vs Next.js vs Tauri behavior, and command-like prompts for recurring actions such as build, review, research, and ship.

1) Folder/System View
/Users/shreyas/Desktop/Projects/
|
+-- agent-scripts/                    <-- YOUR canonical workflow system
|   |
|   +-- AGENTS.md                     <-- global rules (always-on behavior)
|   +-- stack-profiles/
|   |   +-- rails-inertia.md          <-- stack-specific rules
|   |   +-- nextjs.md
|   |   `-- tauri.md
|   +-- prompts/
|   |   +-- build-feature.md          <-- reusable command templates
|   |   +-- review.md
|   |   +-- research.md
|   |   +-- ship.md
|   |   `-- inspire.md
|   `-- skills/
|       +-- build-feature/SKILL.md    <-- execution workflow skills
|       +-- review/SKILL.md
|       +-- ship/SKILL.md
|       `-- inspiration/SKILL.md
|
+-- my-saas-app/                      <-- your real project
+-- next-app/                         <-- your real project
+-- tauri-tool/                       <-- your real project
`-- others/                           <-- external repos, reference only
    +-- cool-ui-repo/
    `-- random-oss/

In my day-to-day workflow, it looks like this: I open a repo, trigger /build-feature, get a short plan, and then let the agent execute within the detected stack profile. If I get stuck, I run /inspire, which inspects relevant repos under ~/Projects/others and returns transferable patterns without copying code. Once implementation is stable, I run /review for risk-first feedback, and only invoke /ship when I explicitly want release actions.

While inspiration-seeking, I also identify some starred GitHub repos that can hold some clues and examples that could be applied to the current problem at hand.. when that happens, I clone that repo into the Projects/others/ folder and then chat with the repo with a prompt that looks somewhat like this:

read the code for this repo and write a markdown doc telling me everything you can infer or know with certainty about the high-level intent and idea behind this repo ask me questions for anything that isn't clear then pop open the doc for me to review and answer the goal here is to make sure my intent is obvious to any agent reading this code

Closing thoughts

I just want to close by saying that purely spec-driven development, where you make a “perfect” spec to send it to the agents on a ralph loop is not going to work. These are for the same reasons why we have moved away from waterfall to agile, they are still the same reasons. We sometimes revert our earlier decisions, cross over or even contradict what we might have said earlier, as every new iteration of the product is a learning for us, and new facts could evolve.

One thing that’s clear from this exercise of writing this essay is that software has started transitioning from a software development lifecycle, to a “context development lifecycle”, aka CDLC. SDLC, is now being offloaded to agents, with utmost trust, where engineers are now involved in maintenance of context, which constantly gets updated over time, and needs a human owner for its reliability.

Subscribe to get future posts via email (or grab the RSS feed). 2-3 ideas every month across design and tech

Read more

  1. My agentic engineering workflow (step by step)agentic-coding
  2. Every darn thing is a kekulean loop if you notice itdesign-thinking
  3. Hammock driven developmentagentic-coding
  4. Peculiar ways number three fits into our funny little brainsmental-models
  5. AI sandwich as a defacto principle for anything agentic engineering relatedagentic-coding
  6. How I write essays in 2026writing
  7. Authority in the guise of evidencecritical-rationalism
  8. Map is not the territoryphilosophy
  9. Self hypnosis as a manifestation ritualmeditation
  10. Hegelian dialectic for structured reasoning with AI agentsphilosophy
  11. How I prepare for tough negotiations nowadaysnegotiation
  12. When should we steelthread somethingproduct-development
  13. Learning and re-learning my mother tongue in Malayalam
  14. Breadboarding, shaping, slicing, and steelthreading solutions with AI agentsproduct
  15. Healthy conflict in teams have a tipping pointteam-building
  16. How I deslopify AI writingwriting
  17. How I started building softwares with AI agents being non technicalagentic-coding
  18. Read raw transcriptswriting
  19. Legible and illegible tasks in organisationsproduct
  20. L2 Fat marker sketchesdesign
  21. Writing as moats for humanswriting
  22. Beauty of second degree probesdecision-making
  23. Boundary objects as the new prototypesprototyping
  24. One way door decisionsproduct
  25. Finished softwares should existproduct
  26. How I periodically rank my rough draftsobsidian
  27. Flipping questions on its headinterviewing
  28. Vibe writing maximswriting
  29. How I blog with Obsidian, Cloudflare, AstroJS, Githubwriting
  30. How I build greenfield apps with AI-assisted codingagentic-coding
  31. We have been scammed by the Gaussian distribution clubmathematics
  32. Classify incentive problems into stag hunts, and prisoners dilemmasgame-theory
  33. I was wrong about optimal stoppingmathematics
  34. Thinking like a shipmental-models
  35. Hyperpersonalised N=1 learningeducation
  36. New mediums for humans to complement superintelligenceagentic-coding
  37. Maxims for AI assisted codingagentic-coding
  38. Virtual bookshelvesaesthetics
  39. It's computational everythingtrends
  40. Public gardens, secret routesdigital-garden
  41. Git way of learning to codeagentic-coding
  42. Style Transfer in AI writingagentic-coding
  43. Understanding codebases without using codeagentic-coding
  44. Vibe coding with Cursoragentic-coding
  45. Virtuoso Guide for Personal Memory Systemsmemory
  46. Writing in Future Pastwriting
  47. Publish Originally, Syndicate Elsewhereblogging
  48. Poetic License of Designdesign
  49. Idea in the shower, testing before breakfastsoftware
  50. Technology and regulation have a dance of ice and firetechnology
  51. How I ship "stuff"software
  52. Writing is thinkingwriting
  53. Song of Shapes, Words and Pathscreativity
  54. How do we absorb ideas better?knowledge
  55. Read writers who operatewriting
  56. Brew your ideas lazilyideas
  57. Trees, Branches, Twigs and Leaves — Mental Models for Writingwriting
  58. Compound Interest of Private Noteswriting
  59. Conceptual Compression for LLMsagentic-coding
  60. Meta-analysis for contradictory research findingsdigital-health
  61. Proof of workproduct
  62. Gauging previous work of new joinees to the teamleadership
  63. Task management for product managersproduct
  64. Beauty of Zettelswriting
  65. Stitching React and Rails togetheragentic-coding
  66. Exploring "smart connections" for note takingwriting
  67. Deploying Home Cooked Apps with Railssoftware
  68. Repetitive Copypromptingwriting
  69. Questions to ask every decadejournalling
  70. Balancing work, time and focusproductivity
  71. Hyperlinks are like cashew nutswriting
  72. Brand treatments, Design Systems, Vibesdesign
  73. How to spot human writing on the internetwriting
  74. Can a thought be an algorithm?product
  75. Opportunity Harvestingcareers
  76. How does AI affect UI?design
  77. Everything is a prioritisation problemproduct
  78. How I do product roastsproduct
  79. The Modern Startup Stacksoftware
  80. In-person vision transmissionproduct
  81. How might we help children invent for social good?social-design
  82. The meeting before the meetingmeetings
  83. Design that's so bad it's actually gooddesign
  84. Lessons learnt interview prepping for product rolesinterviewing
  85. Obsessing over personal websitessoftware
  86. English is the hot new programming languagesoftware
  87. Better way to think about conflictsconflict-management
  88. The role of taste in building productsdesign
  89. Dear enterprises, we're tired of your subscriptionssoftware
  90. Products need not be user centereddesign
  91. World's most ancient public health problemsoftware
  92. Pluginisation of Modern Softwaredesign
  93. Let's make every work 'strategic'consulting
  94. Making Nielsen's heuristics more digestibledesign
  95. Startups are a fertile ground for risk takingentrepreneurship
  96. Insights are not just a salad of factsdesign
  97. Minimum Lovable Productproduct
  98. Methods are lifejackets not straight jacketsmethodology
  99. How to arrive at on-brand colours?design
  100. Minto principle for writing memoswriting
  101. Importance of Whytask-management
  102. Quality Ideas Trump Executionsoftware
  103. Why I prefer indie softwareslifestyle
  104. Use code only if no code failscode
  105. Self Marketing
  106. Personal Observation Techniquesdesign
  107. Design is a confusing worddesign
  108. A Primer to Service Design Blueprintsdesign
  109. Rapid Journey Prototypingdesign
  110. Visualise detailed file structures on CLIcli
  111. Do's and Don'ts of User Researchdesign
  112. Design Manifestodesign
  113. Complex project management for productproducts
  114. How might we enable patients and caregivers to overcome preventable health conditions?digital-health
  115. Pedagogy of the Uncharted — What for, and Where to?education
  116. Future of Ageing with Mehdi Yacoubiinterviewing
  117. Future of Tacit knowledge with Celeste Volpiinterviewing
  118. Future of Rural Innovation with Thabiso Blak Mashabainterviewing
  119. Future of Equity with Ludovick Petersinterviewing
  120. Future of work with Laetitia Vitaudinterviewing
  121. Future of Mental Health with Kavya Raointerviewing
  122. Future of unschooling with Che Vanniinterviewing
  123. How might we prevent acquired infections in hospitals?digital-health
  124. The why to endure any howentrepreneurship
  125. Design education amidst social tribulationsdesign
  126. How might we assist deafblind runners to navigate?social-design