My agentic engineering workflow (step by step)

32 minutes read agentic-coding

My agentic engineering workflow has changed in the recent past. The models are better, and there is much more freedom in choosing the harness, abilities, and actions you provide.


I’ll walk you through each phase of the workflow I follow to build my side projects (as of June 1, 2026). It has evolved from IDE chat, to CLI coding agents, to what I now call “slice-driven product development”. So let’s get right into it.

Pre-planning, pre-idea, pre-everything

To curate the list of ideas I have in the pipeline, I use Trello as a simple kanban board. There are many other tools you could use for a simple, organized list of items; even a TODO list app would suffice. However, I particularly like Trello for this purpose because it also lets you add a nice little thumbnail for each idea. Usually, an idea comes to me all of a sudden, out of nowhere. In this fleeting moment, I try my best to capture it as accurately as possible, as if I were feeling it in my veins. So a picture along with a one-line description helps me capture this fleeting thought.

Trello’s mobile experience is also nice, and I mostly capture ideas while on the go.

My only heuristic for picking my next side project is to go with whatever I’m most excited about. I have a graveyard of hundreds of abandoned side projects, and I don’t want to add another one to this ever-growing list of dead projects. I would at least want to ship what I start, and for that, the fuel is motivation. You feel it in the gut, and you want it to guide what you should do next.

The first chat with the agents

Usually, when you capture an idea or thought, you are in a hurry and don’t go into the weeds of how it might be realized. So to explore what it could mean, I start a chat with the agents. I feel more comfortable doing this on mobile, so I pick this up on ChatGPT. One other reason I’ve tied myself into the OpenAI ecosystem is the additional benefit I get from the $20/month subscription: I can also use it for GPT-Codex via Opencode. More on the agentic coding setup later, but I wanted to mention this right away, as I see great benefits in tying into the ecosystem offered by OpenAI.

Another reason I use ChatGPT is, interestingly, its default typographic choices. Look at Claude, and the mess they’ve made with their default choices. Anthropic folks have used and abused the serif fonts, and their defaults have slowly trickled down to the way everyone vibe codes and makes their half-cooked apps. It’s a mess, and I want to stay away from it.

Another important reason for going with ChatGPT is in its ability to do OAuth with any of the self-learning, persistent memory agents such as Openclaw or Hermes. This allows any user already with a ChatGPT subscription to connect directly with Openclaw, instead of having to buy additional API credits.

While having this on-the-go chat with the agents, I might also end up in a deep-research rabbit hole. A recent example was when I tried to find open source repositories on GitHub for creating music karaoke tracks for my father-in-law, who has recently started practicing to be a singer. I was itching to build a custom solution for his needs, but first I wanted to double-check whether there were any ready-made solutions on GitHub which I could fork. (It turned out that there was an off-the-shelf solution, so I didn’t have to reinvent the wheel.)

So I ask ChatGPT in Deep Research mode to give me a list of well-maintained repositories which do the whole thing, or a ‘part’ of the pipeline, really well.

^^ Early explorative conversations done on ChatGPT.

I have noticed that I get better results this way, rather than trying to piece together a curated list of repos by searching through GitHub manually. When forking and modifying, I also make sure the license allows it.

Having more Socratic dialogues

I have also noticed that this type of search works better than a mere LLM search. I also instruct the agents to “steelman” or “strawman” the concept to identify the fault-lines, or even to cultivate an opinion sometimes.

All software engineering is ultimately about tradeoffs, and there is no perfect solution without them, so this line of reasoning helps shape an opinion on what the product should do (without having to do everything under the sun).

On one occasion, GPT presented its findings as a comparison table. (Make sure to provide custom instructions to always use comparison tables wherever necessary.) From there, I began visualizing the concept and its architecture. Sometimes, I say, “Help me visualize the end-to-end pipeline in ASCII, including the components and libraries we’re using.” At this stage, it’s crucial to hold the complete concept in mind without drifting away. All these conversations and visualizations help shape the vision. And in these simple ASCII diagrams, the mere act of boxes pointing arrows at each other helps me conceptualize better.

At this stage, all of the chat threads are still on ChatGPT, and I haven’t even opened my laptop yet. All this is on mobile. When I’m finally satisfied with the chat outputs, I do a ‘handoff’ so that the serious work starts from the foundation already set by my chat. For this handoff, I synthesize the conversation into a spec .md file, just to see if what I understood and what the agents understood are the same. I’m looking for mind-AI convergence here, nothing else. And in case there is some drift, I still have a way to make sure there aren’t any gaps.

The first chat on the terminal

Now that we have something to work with, we start our first chat on the terminal. I primarily use Ghostty as my default terminal application. Surprisingly, Ghostty feels faster than the native terminal on Mac, and I haven’t looked back.

Why use the CLI over a code editor? Because the job becomes more about pointing the agents at the right location in the codebase than about writing code. We’re in the era of CHOP: chat-oriented programming.

Apart from the speed benefits, Ghostty also provides an interface similar to Google Chrome’s: just as you open multiple tabs in Chrome, you can open multiple chats with agents in Ghostty, and the interface helps a lot when holding multiple conversations. To maintain sanity, I keep one tab per project, and open multiple panes/agents under that tab for that specific project. In theory, this lets me streamline my chats with multiple agents working across multiple projects. I’ve tried going this route and got a dopamine hit from the code throughput, but I have realized that it’s much more important to hold an ‘entire problem in your head’ from start to finish. So I’ve let go of context switching, and have embraced FOCUS.

^^ I’m usually opening multiple Opencode conversations via Ghostty.

With Ghostty as the terminal application, I use Opencode as the TUI app for chatting with the agents directly. Think of it as a more hackable, model-agnostic alternative to tools like Claude Code, Codex CLI, Cursor Agent, or Gemini CLI.

I’ve also heard that Pi agent is even more hackable than Opencode, but Opencode strikes a good balance in my view. Pi can even manipulate its own installation, and emits events for everything, making it easier to build reactive UIs on top of it. (With Opencode, you could still do various customizations by means of Opencode plugins).

Setting up the Opencode environment

The default Opencode application itself helps you do most of what you need. These are the skills I use with Opencode:

| Skill | Purpose |
| --- | --- |
| build-cli | Design or improve agent-friendly and human-friendly CLIs. |
| copy-ads | Create paid ad copy variants for channels like Google, Meta, LinkedIn, X, and TikTok. |
| copy-marketing | Write persuasive website, landing page, headline, CTA, and value prop copy. |
| copy-release-notes | Generate user-facing release notes and changelogs from shipped work. |
| customize-opencode | Edit or create opencode configuration, agents, skills, plugins, MCP servers, or permissions. |
| frontend-advanced | Build technically ambitious frontend experiences such as shaders, virtual tables, spring physics, and scroll effects. |
| frontend-performance | Improve frontend loading speed, rendering, animation, images, and bundle performance. |
| frontend-remotion | Apply best practices for Remotion video creation in React. |
| frontend-slides | Build animated HTML presentations or convert PowerPoint decks into web slides. |
| meta-design-setup | Set up persistent design context and guidelines for a project. |
| meta-find-skills | Help discover and install additional agent skills. |
| meta-thinking | Act as a structured thinking partner for decisions, tradeoffs, mental models, and stress testing ideas. |
| product-breadboard-review | Review an existing breadboard against implementation and surface wiring/design drift. |
| product-breadboarding | Map workflows into product affordances, code affordances, stores, and wiring. |
| product-framing | Turn transcripts or interview notes into structured product framing documents. |
| product-kickoff | Convert kickoff transcripts into builder-facing implementation reference docs. |
| product-naming | Brainstorm five memorable product names with rationale. |
| product-shaping | Collaboratively shape a product or feature before implementation. |
| product-vision | Create inspiring product vision statements and alignment narratives. |
| research-bluesky | Deep research using docs, web, and codebase before planning. |
| research-deep | Thorough evidence-backed research across code, docs, and web. |
| research-last30 | Recency-focused research across many sources from the last 30 days. |
| research-light | Targeted lightweight research before planning or implementation. |
| tool-browser | Automate browser tasks like navigation, screenshots, forms, scraping, and web app testing. |
| ux-clarity | Improve interface microcopy such as labels, buttons, helper text, errors, and empty states. |
| ux-onboarding | Design or improve onboarding, activation, setup, and first-run flows. |
| ux-resilience | Make interfaces robust against errors, edge cases, i18n, overflow, and production issues. |

You could download my total list of skills here: https://github.com/shreyas-makes/agent-skills

As you can see, the list of skills is broadly focused on UX, research, copywriting, performance and browser-automation testing. I see creating and updating skills as a dynamic, ongoing process that needs to be reflected upon every now and then. My rule of thumb: if it’s a process I have repeated more than 5 times, it should definitely become a skill.

In the first wave of agentic coding adoption, a lot of emphasis was placed on designing the right prompt.

In the old era, this would have been quite a useful technique to get a lot more out of the models, but now that the agents have caught up in intelligence, we don’t require any such sorcery.

Prompt engineering is just English grammar in my view, and even if you’re rambling incoherently (making sense only sometimes), the agents still cope fine.

Doing light research, deep research

For complex tasks, you might want to research, walk through the planned sequence of steps, and then execute. Sometimes you need all three, and sometimes you jump straight to execution, which is fine too. On the “research” step in particular, I might do a “light research” pass that’s not too rigorous, or a more extensive “deep research” pass that scrapes every last bit of information arbitrage from the internet.

If I have to do more ‘light-research’ inline while using the terminal, I use the light-research skill popularized by Josh Pigford.

If I’m looking for more “hot” research, especially the word-of-mouth from the zeitgeist (every day is a year in the AI age), I use the /last30days skill.

The /last30days skill popularized by Matt Van Horn is an ‘AI agent skill that researches any topic across Reddit, X, YouTube, HN, Polymarket, and the web - then synthesizes a grounded summary’.

Reddit upvotes. X likes. YouTube transcripts. TikTok engagement. Polymarket odds backed by real money and insider information. That’s millions of people voting with their attention and their wallets every day. /last30days searches all of it in parallel, scores it by what real people actually engage with, and an AI agent judge synthesizes it into one brief.

Google aggregates editors. /last30days searches people.

You can’t get this search anywhere else because no single AI has access to all of it. Google search doesn’t touch Reddit comments or X posts. ChatGPT has a deal with Reddit but can’t search X or TikTok. Gemini has YouTube but not Reddit. Claude has none of them natively. Each platform is a walled garden with its own API, its own tokens, its own auth. But you can bring your own keys and browser sessions, and suddenly an AI agent can search all of them at once, score them against each other, and tell you what actually matters.

That’s the unlock. Not one better search engine. A dozen disconnected platforms, bridged by an agent.

I was baited by this description, in their own words, of how the search offers conversational intelligence by pulling in info from pretty much all the big social platforms:

| Source | What the people tell you |
| --- | --- |
| Reddit | The unfiltered take. Top comments with upvote counts, free via public JSON. The real opinions that Google buries. |
| X / Twitter | The hot take, the expert thread, the breaking reaction. First to know, first to argue. |
| YouTube | The 45-minute deep dive. Full transcripts searched for the 5 quotable sentences that matter. |
| TikTok | The creator reaching 3.6M people with a take you’ll never find on Google. |
| Instagram Reels | The influencer perspective with spoken-word transcripts. The visual culture signal. |
| Hacker News | The developer consensus. 825 points, 899 comments. Where technical people actually argue. |
| Polymarket | Not opinions. Odds. Backed by real money. 96% confidence on album sales. 4% on an acquisition. |
| GitHub | For people: PR velocity, top repos by stars, release notes. For topics: issues and discussions. |
| Digg | Curated story clusters from Digg’s AI 1000 leaderboard (~1000 high-signal AI accounts on X), with attributable inline quotes (no X auth required). Auto-enabled when digg-pp-cli is on PATH. |
| Threads | The post-Twitter text layer. Conversations from creators and brands. |
| Pinterest | Visual discovery. Pins, saves, and comments on products and ideas. |
| Bluesky | The decentralized social layer. AT Protocol posts from the post-Twitter migration. |
| Perplexity | Grounded web search with citations via Sonar Pro. |
| Web | The editorial coverage, the blog comparisons. One signal of many, not the only one. |

/last30days can be used for person research, competitor analysis, feature A versus feature B comparisons, etc.

Model choices while working with coding

I have mostly bought into Peter Steinberger’s obsessive use of the Codex agents. He wrote this before joining OpenAI, so I know that the initial take was unbiased.

Sometimes it just silently reads files for 10, 15 minutes before starting to write any code. On the one hand that’s annoying, on the other hand that’s amazing because it greatly increases the chance that it fixes the right thing. Opus on the other hand is much more eager - great for smaller edits - not so good for larger features or refactors, it often doesn’t read the whole file or misses parts and then delivers inefficient outcomes or misses sth. I noticed that even tho codex sometimes takes 4x longer than Opus for comparable tasks, I’m often faster because I don’t have to go back and fix the fix, sth that felt quite normal when I was still using Claude Code. — Peter Steinberger

I also don’t use the “plan mode”. When I need the agent to follow a set of instructions, I say “let’s discuss”. My approach to building is very iterative. I am not a big fan of taking a complete spec and putting it in a ralph loop: if building it were that easy, it probably would not need to be software at all.

This is my current stack of model choices:

  1. Codex GPT series
  2. MiniMax or Kimi series (for fallbacks in case I run out of Codex credits)
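
The fallback order above can be pictured as a tiny selection routine. This is only a sketch under stated assumptions: the `Model` class, the `credits_left` field, and the model labels are hypothetical illustrations, not how Opencode or any provider actually tracks quota.

```python
from dataclasses import dataclass


@dataclass
class Model:
    name: str          # label only; not a real provider identifier
    credits_left: int  # hypothetical remaining quota


def pick_model(preference: list[Model]) -> Model:
    """Return the first model in preference order that still has credits."""
    for model in preference:
        if model.credits_left > 0:
            return model
    raise RuntimeError("no model with remaining credits")


stack = [
    Model("codex-gpt", credits_left=0),  # primary, exhausted in this example
    Model("minimax", credits_left=120),  # first fallback
    Model("kimi", credits_left=300),     # second fallback
]
print(pick_model(stack).name)  # prints "minimax"
```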

The design process: sequence of steps while building apps

I follow this sequence of steps in my conversations with the agents while building 0 to 1. I’d outlined this in my previous essay (breadboarding and shaping with AI agents), and have it here handy:

This is what my current process looks like:

| Step | Term | What happens | Why it exists | Output artifact |
| --- | --- | --- | --- | --- |
| 1 | Vision | Describe the future state of the product | Aligns all work to a long-term direction | Vision statement |
| 2 | Problem | Identify the concrete obstacle preventing the vision | Prevents building random features | Problem statement |
| 3 | Requirements (R) | Extract constraints and must-have behaviors | Creates a contract to evaluate solutions | Requirement list |
| 4 | Shaping (Solutions A/B) | Propose high-level solution approaches | Moves from problem → possible architectures | Shape document |
| 5 | Fit Check (R × A) | Verify if the solution actually satisfies requirements | Reveals gaps, over-engineering, or missing pieces | Fit matrix |
| 6 | Spikes | Research unknown technical areas | Reduce uncertainty before architecture solidifies | Spike notes |
| 7 | Fat Marker Sketch | Sketch user interaction and visible state | Clarifies product behavior and UI affordances | Simple UX diagram |
| 8 | Breadboarding | Map system wiring (UI + code + data + services) | Convert ideas into architecture | Breadboard diagram |
| 9 | Slicing (Scopes) | Divide architecture into demoable pieces | Enables incremental delivery | Vertical slice plan |
| 10 | Steel Thread | Build the minimal end-to-end path | Prove the architecture integrates correctly | Working skeleton |
| 11 | Iterative Slice Build | Expand slices into complete features | Gradually complete the product | Production system |
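
To make step 5 (Fit Check, R × A) concrete, here is a minimal sketch of a fit matrix. The requirement texts, the shape names, and which shape covers which requirement are all made-up sample data, and the helper names `fit_matrix` and `gaps` are hypothetical.

```python
# Requirements (R): illustrative examples only.
requirements = {
    "R1": "works offline",
    "R2": "exports a shareable file",
    "R3": "no account needed",
}

# Which requirements each candidate shape claims to satisfy (made-up data).
shapes = {
    "A": {"R1", "R3"},
    "B": {"R1", "R2", "R3"},
}


def fit_matrix(requirements, shapes):
    """Return {shape: {requirement: bool}} for every R x shape pair."""
    return {
        shape: {r: r in covered for r in requirements}
        for shape, covered in shapes.items()
    }


def gaps(requirements, shapes):
    """Return the unmet requirements per shape, in requirement order."""
    return {
        shape: [r for r in requirements if r not in covered]
        for shape, covered in shapes.items()
    }


matrix = fit_matrix(requirements, shapes)
for shape, row in matrix.items():
    cells = "  ".join(f"{r}:{'yes' if ok else 'no'}" for r, ok in row.items())
    print(f"shape {shape}  {cells}")

print(gaps(requirements, shapes))  # {'A': ['R2'], 'B': []}
```

In practice the matrix lives in a document the agent maintains, but the idea is the same: every empty cell is either a gap to spike or a requirement to drop.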

You might be looking at this 11-point list and asking: why do all this? Why can’t we just prompt in one line and be okay with whatever the AI agents generate? That was my initial line of exploration, and I failed badly after encountering various bugs in the process.

Why vision first? This gives a sense of direction, especially when agents could take you anywhere, and be sycophantic when they say “you’re absolutely right!”. You need a strong, opinionated product vision.

How I start the chat:

Ask me one question at a time so we can define and shape a strong product vision for this idea. Each question should build on my previous answers, helping clarify the user, the problem, the unique insight, the product’s point of view, and the future it is trying to create. Let’s do this iteratively and focus on asking the right questions before jumping into features or implementation. Remember, only one question at a time.

Here’s the idea:

Once you get to the end of the dialogue, you then say:

use your shaping skill to capture the requirements and tease apart the key parts of solution A that I have specified here

More often than not, we give lengthy, jumbled-up arguments that mix the problem and the solution together. By doing it this way, we separate the problem from the solution neatly.

Alongside the conventional software development lifecycle, what has changed here is that I have incorporated Ryan Singer’s workflow of building shapes and slices. Long story short, you don’t constrain yourself too much by finalising a “spec.md” and then telling our AI overlords to “go build it!”. That kind of stuff seldom works. Instead, I come up with a few requirements/constraints, and spend more time ‘shaping’ multiple approaches. Say you have shape A, shape B and shape C (with some unknowns on how they tie together to create a complete solution): you then spike the unknowns, resolve them (or not) in the process, and come up with the right “shape”.

Once you have a shape, the next step depends on its complexity, which could range from “rewrite Linux in Rust” to “build a HTML presentation deck”. If things get way too complex, you might also have to slice the shape into multiple pieces. Each “slice” is a demoable piece, which means it’s possible for us to feel something and feed useful information back to the agents instead of just ‘TAB, TAB, TAB, CONTINUE, ENTER, TAB, TAB, TAB…’

This is a remix, or a fork, of the conventional agile development lifecycle, adapted to work better with AI agents. I have found great results so far, especially since this is a more universal go-to process that works across the entire spectrum from simple to complex, heavy-duty stuff.

Another benefit of breaking them into such slices is to reinforce the need to split a giant agent output into reviewable pieces. You would certainly not be able to review a 15,000 line PR.

To incorporate the Shape Up methodology of Ryan Singer, here is the GitHub repo

If slicing doesn’t work, try steelthreading it

Imagine standing at the edge of a canyon, needing to cross to the opposite cliff. One option is to construct a bridge, starting with logs, ropes, and foundations—carefully assembling each piece until a complete, safe crossing exists. This is similar to how MVPs are often built: you fully develop a core feature or component (like building an engine) before moving on to other parts, ensuring that the piece you create is robust and ready for future scaling.

In product development, a concept known as the steel thread has gained attention for its focus on creating the most direct yet robust path from concept to functionality. Unlike traditional methods such as building a Minimum Viable Product (MVP), which often prioritize incremental construction and polishing a single component before moving to the full system, the steel thread approach prioritizes end-to-end integration early on, even with minimal implementation.

The steel thread approach, however, takes a different perspective. Instead of starting with a full bridge, you imagine a thin steel thread stretched across the canyon. It represents the simplest, lightest, and most minimal path to achieve an end-to-end connection. The steel thread is strong enough to support essential functionality and demonstrates that all critical integrations work together. Even if the overall experience is barebones, the team can traverse the complete journey from point A to point B, proving that the product can function holistically.

From a product development perspective, this method focuses on building the smallest possible version of the full flow. Rather than fully developing isolated components, the goal is to establish a working skeleton that spans the entire product experience. This allows teams to quickly identify integration challenges, potential bottlenecks, and areas of risk. Once the steel thread is in place, subsequent iterations can enhance, reinforce, and expand it—eventually turning the thread into a fully realized product structure.

By prioritizing end-to-end connectivity over depth in one area, the steel thread approach offers several advantages:

  • Rapid validation of system feasibility
  • Early identification of integration issues
  • Efficient feedback collection on the full user journey

In contrast to building a polished engine first (the MVP approach), the steel thread demonstrates the viability of the whole vehicle—even if, at first, it is only a bare-metal prototype. Teams practicing this method move faster toward functional products and discover critical insights earlier in the development lifecycle.

In short, the steel thread method is about achieving the simplest full journey before committing to deep, complex builds. It highlights the importance of robust integration early, providing a strategic pathway to scale confidently and efficiently.
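
As a rough illustration of a steel thread in code, here is a minimal sketch in which every stage of a pipeline exists only in its thinnest form, yet one input travels the whole path. The four stages (`ingest`, `process`, `store`, `render`) are hypothetical placeholders, not taken from any of my projects.

```python
def ingest(raw: str) -> dict:
    # Thinnest possible input handling: no validation yet.
    return {"text": raw.strip()}


def process(record: dict) -> dict:
    # Placeholder business logic; a later slice replaces this.
    return {**record, "length": len(record["text"])}


def store(record: dict, db: list) -> int:
    # An in-memory list stands in for a real database.
    db.append(record)
    return len(db) - 1


def render(db: list, record_id: int) -> str:
    # Plain-text output stands in for the real UI.
    record = db[record_id]
    return f"{record['text']} ({record['length']} chars)"


def steel_thread(raw: str) -> str:
    """Run one input through every stage, end to end."""
    db: list = []
    record_id = store(process(ingest(raw)), db)
    return render(db, record_id)


print(steel_thread("  hello  "))  # prints "hello (5 chars)"
```

Each later slice then thickens one stage (real storage, real UI) while the end-to-end call keeps working.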

Ensuring that the design in the design process is coherent and consistent

My mantra here is to first make it work, then make it fast, and then make it delightful. Form should follow function. Once I’m happy with the functionality, and there isn’t much glitchiness in the way the app achieves its core function, I move on to the design bit.

Hardik Pandya, in his essay “Expose your design system to LLMs”, talks about how LLMs undergo design drift, and why it’s important to feed the design system to the AI coding agents so that they stop guessing.

To achieve a consistent and coherent way of presenting the interface for my app, I set up a design system if I haven’t already. For brownfield apps, I use the /design-audit skill, inspired by this essay, to comb through all the patterns it can find and translate them into the right theming. This keeps things DRY: I don’t have to repeat myself to the agents for the 100th time.

You could place this at the root of your project:

Audit this project and make the design system LLM-readable.

Step 1: Audit
Scan every CSS/SCSS file. List every hardcoded visual value:
hex colors, rgb/rgba colors, pixel spacing, raw font sizes,
font weights, border radii, z-index values, box shadows,
and transition durations. Group them by category. Count totals.
Report which files have the most hardcoded values.

Step 2: Token layer
Create a tokens.css file with three layers:
- Layer 1: upstream design system tokens (use existing ones
 if the project already uses a design system, otherwise
 derive sensible primitives from the audit)
- Layer 2: project aliases that reference Layer 1 with
 fallbacks, e.g. --color-text: var(--ds-text, #292A2E)
- Layer 3 is the components themselves — they only ever
 reference Layer 2 aliases, never raw values

Include tokens for: colors (text, background, link, border,
interactive states), spacing (at least 8 steps), typography
(font families, sizes, weights, line heights), border radius,
elevation/shadow, z-index, and motion/transitions.

Step 3: Spec files
Create a specs/ directory. Write structured markdown specs:
- specs/foundations/ — color.md, spacing.md, typography.md,
 radius.md, elevation.md, motion.md
- specs/tokens/ — token-reference.md (master map of every
 CSS variable, its value, and when to use it)
- specs/components/ — one file per major component in the
 project. Each spec follows this template:
 1. Metadata (name, category, status)
 2. Overview (when to use, when not to use)
 3. Anatomy (parts of the component)
 4. Tokens used (which CSS variables it references)
 5. Props/API (if applicable)
 6. States (default, hover, active, focus, disabled, error)
 7. Code example
 8. Cross-references (related components)

Only spec components that actually exist in this project.

Step 4: Audit script
Create scripts/token-audit.js (or .sh) that:
- Scans all CSS files for hardcoded values
- Suggests the correct token for each violation
- Prints file, line number, violation, and suggestion
- Returns exit code 1 if any errors found (CI-ready)
- Distinguishes errors (hardcoded colors, spacing) from
 warnings (raw durations, uncommon values)

Step 5: Replace hardcoded values
Go through every CSS file and replace hardcoded values with
the tokens from Step 2. Every color:, background:, padding:,
margin:, gap:, border-radius:, font-size:, font-weight:,
box-shadow:, z-index:, and transition: should reference a
var(--token). No raw values should remain.

Step 6: Project instructions
Add a section to the project's AI instruction file (CLAUDE.md,
.cursorrules, or equivalent) that says:
"Before writing or modifying any UI code, read the relevant
spec file in specs/. Use only tokens from tokens.css. Run the
token audit script before committing. Zero errors required."

Run the audit script at the end and confirm zero violations.
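
To give a feel for what Step 4 asks for, here is a minimal sketch of a token audit. The prompt asks the agent for scripts/token-audit.js (or .sh); this Python version is only an illustration of the idea, not the script the agent generates. The regexes are a rough heuristic (they check only hex colors and px values, and a selector such as `#fade` would be a false positive), and the names `audit_css` and `exit_code` are hypothetical.

```python
import re

# Rough heuristics: hex colors and raw pixel values only.
HEX_COLOR = re.compile(r"#[0-9a-fA-F]{3,8}\b")
PX_VALUE = re.compile(r"\b\d+px\b")


def audit_css(filename: str, css: str) -> list[str]:
    """Return one report line per hardcoded hex color or px value."""
    findings = []
    for lineno, line in enumerate(css.splitlines(), start=1):
        # Token definitions (custom properties) may hold raw values.
        if line.strip().startswith("--"):
            continue
        for pattern, kind in ((HEX_COLOR, "color"), (PX_VALUE, "size")):
            for match in pattern.finditer(line):
                findings.append(
                    f"{filename}:{lineno}: hardcoded {kind} {match.group()}"
                )
    return findings


def exit_code(findings: list[str]) -> int:
    """1 if anything was found, 0 otherwise, so CI can fail the build."""
    return 1 if findings else 0


sample = """\
:root {
  --color-text: #292A2E;
}
.card {
  color: var(--color-text);
  padding: 12px;
  border: 1px solid #ccc;
}
"""

report = audit_css("card.css", sample)
print("\n".join(report))
print("exit code:", exit_code(report))
```

A real version would also suggest the matching token for each violation and separate errors from warnings, as the prompt describes.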

AI also still makes a lot of common mistakes on spacing, typography, hierarchy, and so on, which a keen design engineer with a trained eye can spot. I use /design-type for such recurring tweaks.

This is the current list of design specific skills I use, and you could remix or fork them by clicking here.

| Skill | Purpose |
| --- | --- |
| design-audit | Produce a comprehensive UI audit across accessibility, performance, responsiveness, theming, and UX quality. |
| design-bold | Make safe or boring designs more visually striking. |
| design-color | Add strategic color to monochromatic or visually flat interfaces. |
| design-critique | Evaluate a design’s product and UX effectiveness. |
| design-delight | Add personality, joy, micro-moments, and memorable touches to interfaces. |
| design-distill | Simplify designs by removing unnecessary complexity. |
| design-extract | Extract reusable components, tokens, and design patterns into a system. |
| design-layout | Improve spacing, rhythm, composition, and hierarchy. |
| design-minimal | Create clean editorial minimalism with restrained warm monochrome styling. |
| design-motion | Add purposeful animations and micro-interactions. |
| design-normalize | Align a feature with an existing design system and component language. |
| design-polish | Perform final pre-ship UI refinement. |
| design-premium | Create expensive, cinematic, agency-crafted interfaces. |
| design-quiet | Tone down overly aggressive or loud visual designs. |
| design-responsive | Adapt designs across screen sizes, devices, and contexts. |
| design-systematic | Build stricter frontend design systems and measurable UI implementation rules. |
| design-type | Improve typography, font hierarchy, sizing, weight, and readability. |
| design-ui | Create distinctive production-grade frontend interfaces from scratch or through major redesign. |

You could download my total list of skills here: https://github.com/shreyas-makes/agent-skills

Ensuring the code generated is clear, and reviewed

If you have read about the theory of constraints, you would know that a bottleneck is never completely eliminated; it just shifts from one place to another. Previously, the bottleneck was generating or writing code. Once the coding agents solved that, the bottleneck shifted to code review, which makes better tools for supporting code review even more important.

There is even an argument floating around on the internet that accelerating code throughput without reviewing the code properly has a cost: complex failures that are harder to resolve, for AI agents as well as for human engineers. To which I would say that the only metric that matters is the number of decisions that can be taken per day. At normal velocity, a team might make one or two decisions per week, but at 10x velocity, you see them making several a day. The usual bottlenecks, where you’re waiting for a Slack response or for a quick sync to be scheduled, no longer exist.

I’ve previously used the /review tool available in most coding agents, and yet the fundamental question I still don’t have an answer to is: if language models need a separate tool for code review, why can’t they just stitch the review loop onto code generation? For now, I’m a bit skeptical about code review as a separate step, and think it will eventually become part of the code-generation loop.

Right now, for a major feature update, I ask the agent for its plan before it writes any code (so that I can perform some pre-emptive strikes), and afterwards ask it to explain in plain English what it has written. In a previous note on this topic, I wrote about how the top layer and bottom layer should still be done by humans, leaving the middle layer for the AI agents:

The AI sandwich technique outlines a structured approach where humans and AI agents collaborate effectively.

In this model, the top layer involves human input, where goals and instructions are clearly defined by humans. This ensures that the desired outcomes are aligned with human intentions. The middle layer is where AI agents take over, handling the orchestration, execution, and processing tasks. This allows for efficient and automated handling of complex operations. Finally, the bottom layer involves human evaluation, where the output is assessed based on subjective human taste and feedback. This ensures that the final result meets human standards and expectations.

Think of the end-game polishing done by humans as akin to how pilots and co-pilots still have the final call on the airliner they’re operating, despite all the automation in place that could technically, but not in principle, take over the role of the pilot.

A concept I recently became aware of is backpressure: the pressure that arises from failed builds and tests and pushes the model loop to improve its output.
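Here is a minimal sketch of what backpressure could look like in code, assuming the agent is wrapped in a callable. The `attempt` hook and the `backpressure_loop` name are hypothetical, made up for illustration; they are not part of any coding agent’s API. The only real machinery is the standard library’s `subprocess.run`, which runs the build or test command and captures its output.

```python
import subprocess
from typing import Callable, Optional


def run_checks(command: list[str]) -> tuple[bool, str]:
    """Run a build/test command and return (passed, combined output)."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr


def backpressure_loop(
    attempt: Callable[[Optional[str]], None],
    check_command: list[str],
    max_rounds: int = 3,
) -> bool:
    """Call `attempt` until the checks pass or the rounds run out.

    `attempt` is a hypothetical stand-in for the coding agent: it receives
    the previous failure output (None on the first round) and is expected
    to change the code in response.
    """
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        attempt(feedback)
        passed, output = run_checks(check_command)
        if passed:
            return True
        # The failure output is the pressure pushed back into the loop.
        feedback = output
    return False
```

The cap on rounds is a design choice on my part: without it, a model that keeps failing the same test would loop forever instead of handing the problem back to a human.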

Templatize everything that needs templatizing

If I have already built the feature successfully in a different project, I point the AI agents at that successful implementation to help diagnose what’s wrong.

I’ve already mentioned my way of planning a feature. I cross-reference projects all the time, esp if I know that I already solved sth somewhere else, I ask codex to look in ../project-folder and that’s usually enough for it to infer from context where to look. This is extremely useful to save on prompts. I can just write “look at ../vibetunnel and do the same for Sparkle changelogs”, because it’s already solved there and with a 99% guarantee it’ll correctly copy things over and adapt to the new project. That’s how I scaffold new projects as well.

I started templatizing patterns across projects because I was spending too much energy repeating setup decisions in every new repo: how to plan work, how to enforce coding preferences, how to deploy etc. The core idea is simple: separate reusable workflow rules from project-specific code. That gives me a stable operating layer across all work in ~/Projects, while still letting each project keep its own context.

Instead of reinventing process every time, I reuse a consistent scaffold and only customize where the stack or business logic actually differs.

The agent-scripts model came from studying Peter Steinberger’s setup and adapting the parts that matched my own way of working. I kept the structure but changed the intent: a global rules file for hard constraints, stack-specific profiles for Rails Inertia vs Next.js vs Tauri behavior, and command-like prompts for recurring actions such as build, review, research, and ship.

1) Folder/System View
/Users/shreyas/Desktop/Projects/
|
+-- agent-scripts/                    <-- YOUR canonical workflow system
|   |
|   +-- AGENTS.md                     <-- global rules (always-on behavior)
|   +-- stack-profiles/
|   |   +-- rails-inertia.md          <-- stack-specific rules
|   |   +-- nextjs.md
|   |   `-- tauri.md
|   +-- prompts/
|   |   +-- build-feature.md          <-- reusable command templates
|   |   +-- review.md
|   |   +-- research.md
|   |   +-- ship.md
|   |   `-- inspire.md
|   `-- skills/
|       +-- build-feature/SKILL.md    <-- execution workflow skills
|       +-- review/SKILL.md
|       +-- ship/SKILL.md
|       `-- inspiration/SKILL.md
|
+-- my-saas-app/                      <-- your real project
+-- next-app/                         <-- your real project
+-- tauri-tool/                       <-- your real project
`-- others/                           <-- external repos, reference only
    +-- cool-ui-repo/
    `-- random-oss/

In my day-to-day workflow, it looks like this: I open a repo, trigger /build-feature, get a short plan, and then let the agent execute within the detected stack profile. If I get stuck, I run /inspire, which inspects relevant repos under ~/Projects/others and returns transferable patterns without copying code. Once implementation is stable, I run /review for risk-first feedback, and only invoke /ship when I explicitly want release actions.
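To make “the detected stack profile” concrete, here is a rough sketch of how detection and rule assembly could work. It assumes the folder layout shown above; the marker files in `STACK_MARKERS` and the function names are my own illustrative guesses, not something the agent tooling provides.

```python
from pathlib import Path
from typing import Optional

# Assumed marker files per profile; adjust them to your own repos.
# Tauri is checked first because a Tauri app can also contain a Next.js frontend.
STACK_MARKERS = {
    "tauri": ["src-tauri/tauri.conf.json"],
    "nextjs": ["next.config.js", "next.config.mjs", "next.config.ts"],
    "rails-inertia": ["config/application.rb"],
}


def detect_stack(repo: Path) -> Optional[str]:
    """Return the first profile whose marker file exists in the repo."""
    for profile, markers in STACK_MARKERS.items():
        if any((repo / marker).exists() for marker in markers):
            return profile
    return None


def assemble_rules(scripts_dir: Path, repo: Path) -> str:
    """Concatenate the global rules with the matching stack profile, if any."""
    parts = [(scripts_dir / "AGENTS.md").read_text(encoding="utf-8")]
    profile = detect_stack(repo)
    if profile is not None:
        profile_file = scripts_dir / "stack-profiles" / f"{profile}.md"
        if profile_file.exists():
            parts.append(profile_file.read_text(encoding="utf-8"))
    return "\n\n".join(parts)
```

The global rules always come first and the stack profile is appended after them, which mirrors the split between always-on constraints and stack-specific behaviour.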

While inspiration-seeking, I also identify some starred GitHub repos that can hold some clues and examples that could be applied to the current problem at hand.. when that happens, I clone that repo into the Projects/others/ folder and then chat with the repo with a prompt that looks somewhat like this:

read the code for this repo and write a markdown doc telling me everything you can infer or know with certainty about the high-level intent and idea behind this repo
ask me questions for anything that isn't clear then pop open the doc for me to review and answer
the goal here is to make sure my intent is obvious to any agent reading this code

Closing thoughts

I just want to close by saying that purely spec-driven development, where you write a “perfect” spec and send it to the agents on a ralph loop, is not going to work, for the same reasons we moved away from waterfall to agile. We sometimes revert our earlier decisions, cross over or even contradict what we said earlier, because every new iteration of the product teaches us something, and new facts can emerge.

One thing that’s clear from the exercise of writing this essay is that software has started transitioning from a software development lifecycle (SDLC) to a “context development lifecycle”, aka CDLC. The SDLC is now being offloaded to agents, with utmost trust, while engineers are now involved in maintaining context, which is constantly updated over time and needs a human owner for its reliability.

I keep mistaking the kekulean loop, for a line.

I see a thing called a “problem”, and another thing called a “solution”, and I imagine the first one comes before the second. I see a person called “religious”, and a set of rituals called “practice”, and I imagine the belief came first and the practice merely expressed it. I see “aging” and “disease”, and I imagine aging as the background condition, while diseases are the foreground enemies we are meant to defeat.

In this essay, I will try to chalk out three such examples to illustrate the point. By the end, I want readers to ask themselves whether there are any such kekulean loops that they are mistaking for a straight line.

the problem and solution co-creative loop

In my first career arc of being an inexperienced product practitioner, I used to think of the problem and the solution as a traditional waterfall.

You think of the problem, define the constraints, check. Head to explore solutions. Check. Finalise the solution. Check. And then pass it on, and move on to the next problem <> solution. It was supposed to always point right —>

It took me that entire decade and a Master’s education in design methodology to realise that it seldom works that way in reality. Problem and solution pairs are in a kekulean dance with each other, eating each other’s tails, biting and pouncing at each other until they end up as a better problem and solution pair.

There is a co-evolution which we miss noticing. In retrospect, it felt fairly obvious, and I kept asking myself why I needed a Master’s education in design theory to realise this trivial aspect of life.

the belief and practise co-creative loop

After all, life is nothing but problem solving right? How can I get this all wrong for something so fundamental?!

I also then realised that I make the same mistake in other aspects too. I have been mistaking bidirectional pairs for unidirectional pairs the whole time.

I’ll give another example of such a mistake from my most recent reading of the book review of Tanya Luhrmann’s How God Becomes Real by Michael Nielsen:

Tanya explores the relationship between belief and practise in this book. I used to think that people who are “religious” in spirit practise a lot of rituals because they’re religious. At least that was the notion I was used to hearing. However, Tanya explores a different dynamic, where she suggests that practise itself instils belief.

Tanya M. Luhrmann has written a beautiful book exploring this question, “How God Becomes Real: Kindling the Presence of Invisible Others”. The book explores the idea that much of the purpose of religious practice is to help practitioners believe. This inverts conventional wisdom, with Luhrmann taking seriously the possibility that sometimes people aren’t worshipping because they believe, but rather believing because they worship. More generally: Luhrmann makes a compelling case that there is a much more complex relationship between belief and religious practice than you might naively suppose, and she explores some of that relationship.

Personally, I come from the other side of the spectrum: I want to be religious for some reason, but I have not managed to embed rituals in my day-to-day life. The thought that religion and practise form a co-evolutionary loop gives me great hope. I’ve envied my mom and my sister for the way they have cultivated their faith through practise, which involves chanting the Hanuman chalisa, Vishnu sahasranaamam, Lalitha sahasranaamam and various other mantras in the evenings as part of their daily, weekly or occasional puja rituals.

Right now, it’s 6:15 PM at dusk, an auspicious time usually meant for evening prayers, and here I am talking about the process of Bhakti (practise of worship) rather than doing the practise itself. Religious belief isn’t something that’s easily attained or endowed, and it’s hard to shift from a somewhat-agnostic mode towards being a worshipper:

Let me review a few of the moves Luhrmann makes in setting up her project. She points out, convincingly, that religious belief usually isn’t something easily attained, despite the fact that many theories of religion “presume that belief is direct and unproblematic – that in most cultures, people simply take spirit and the supernatural to be there. That doesn’t make sense. Gods and spirits cannot be seen. You cannot shake their hands, look them in the eye, or hear their voice when they speak. It seems odd to assume that people just take for granted that they are present.”

I find it weird when I say this, but knowing this theory has actually made me more religious. It has given me more meaning to what’s interpreted as practise.

From the day I read this book review on the co-evolutionary loop between religion and practise, I have not been able to unsee this pattern everywhere. And there are more..

the aging and disease co-creative loop

I sometimes wonder if it’s a cognitive flaw that makes us mistake such systemic effects for linear ones.

Aging has been considered a straight line since ancient times. We grow old, we age, and we die. However, in the recent past, this notion has been contested, with the bet that aging is a disease which can be treated. We now have scientists trying to reverse the age of human cells and, more recently, clinical trials trying to prove this. NewLimit is one such company, working at this bleeding edge of longevity with the thesis that aging is the root cause of most major diseases (resulting in loss of function in our cells).

if you’re under 50, cellular reprogramming drugs are how you’ll live to 150.

even if we cure every disease on Earth, that still wouldn’t get you there on its own.

the oldest person who ever lived, jeanne calment, made it to 122, and in almost 30 years nobody has come close to beating her.

120 is just the natural human limit.

so while curing diseases can keep you healthy right up to that limit, it can’t actually extend the limit.

but what if you treated aging itself as a disease, and cured that?

that’s exactly what brian armstrong’s company just raised $435M to do.

NewLimit is working on something called cellular reprogramming.

in plain terms, every cell carries a set of instructions that decide how young or old it behaves

NewLimit uses RNA to switch on the combinations that make an old cell start acting young again.

they already have a prototype that does this to human liver cells, healing the liver faster after injury and speeding up recovery from alcohol damage.

the first human trial is set for 2027.

now, to be clear, solving aging is unfathomably hard.

this first liver drug is one cell type in one organ, and reversing aging across an entire body is a much bigger problem than repairing the liver.

but the reason it’s even on the table now is AI.

there are more possible combinations of those cellular instructions than any human could test by hand in a billion years

but machine learning is what lets you search that space and find the few that actually make a cell young again.

that’s the bet NewLimit is built on.

so the liver is the proving ground. once you can safely reset the age of human cells there, 120 stops being a fixed ceiling.

cellular reprogramming is the real path to 150, and if you can keep resetting the clock, there’s no obvious reason it has to stop there…

  • From anon, Twitter

closing thoughts

It’s all in an ultimate evolutionary loop. And we fail to realise the systemic complexity hidden underneath the everyday narratives.

Most of us are primed to think linearly with “X causes Y” logic, but we have to adopt a systems thinking lens, and I’m trying to train myself here with the notion that everything is a system (X causes Y, which feeds back into X again..), and it’s not easy..

I will keep hunting for more such co-creative systemic loops. And I hope to find more.

Hammock driven development

5 minutes read agentic-coding

You’ve heard of TDD, and more recently also SDD (spec driven development)… but have you heard of HDD — aka Hammock driven development?

I recently came across a video gifted to me by the YouTube algorithmic gods, with a catchy enough title that sounded more like an “honest bait” than clickbait. It was titled “Hammock driven development”, and it talks about an alternate approach to software development. I’ve been thinking a lot lately about new processes that could improve software development, and this lecture by Rich Hickey from 13 years ago seems like one of those older ideas that needs a revival story. Especially now.

So I jumped right in.

I try here to write in my own words what I understood from the lecture, without referring to any of the supporting transcripts or YouTube timestamps (it’s ‘fresh’ in my mind right now and I want to make use of this moment in time).


We have two types of minds: the waking mind and the background mind. Historically, we’re quite used to the System 1 and System 2 framing by Daniel Kahneman, but this framing is far more encompassing than I’d expected. The background mind, as the OP describes it, is generally good at strategic, holistic thinking.

You would normally want to delegate such strategic tasks to the background mind. Not that the waking mind doesn’t do strategy, but it’s more focussed on input <> output processing, and as such its results are much more tactical and oriented towards immediate output. The OP then suggests leveraging this division of labour to our advantage.

While writing code, we are thinking through problems, and we should first know how to draft a problem in terms of scope, constraints, framing etc. We need to get that right first. Then, when we start solving, we might at times encounter harder problems which we’re not used to encountering (not the usual fetch-from-a-DB-and-display-it-on-a-CRUD-UI type).

When this happens, we then need to think and segregate such problems into the known types which can be easily done by the ‘waking mind’, and the harder subset which needs to be delegated to the background mind. The background mind works in interesting ways, you would not know in advance what the solution might be, you just need to assign it to this background mind and see what the ‘eye of the mind’ unravels.

But you will get there nevertheless, as long as you let this slow-burn process happen without any stress. In fact, stressful environments push you into the ‘waking mind’ mode, and you would not be able to do the slow burn. That’s also one of the reasons why this often happens as a shower thought.

OP also recommends George Pólya’s book How to Solve It, as it brings mathematical rigour to the varied approaches to solving problems. There is another book I found in the YT comments, ‘How to Solve It by Computer’, which gives an algorithmic perspective on solving problems. This is another one on my Umberto Eco-style antilibrary to-read list.

This time when I listened to the lecture, it hit me in a different way, especially since I’ve also been fascinated by the AFK (away from keyboard) and HITL (human in the loop) segregation of work by Matt Pocock, an AI tutor who recently shared this in his talk at the AI engineers forum.

The AFK mode is usually for known solved problems, it’s similar to the waking mind for the agents, where they could run autonomously and completely solve the problem/s without causing much tech debt.

The HITL mode is usually for problems that require a feedback loop with humans, who provide the right inputs so the agent can solve the problem better. This works well when human guidance is aided by the right ontology. Take prompts 1 and 2, for example:

1:

“I have a deep pimple only on my chin is it more likely from diet or hormones”

2:

“sudden cystic acne isolated to chin, is the more likely etiology hormonal fluctuation or pro-inflammatory gut disruption from dietary changes?”

Both 1 and 2 are effectively the same question but 2 provides you vastly different (and better) responses. There is also a Harvard AI safety study which talks about this method, where changing ontology provides different responses.

And for AFK mode, you might not even need such thoroughly grounded ontology. It gets the aim, the objective and the steering right in one go.

Now coming back to the original topic by OP about the waking mind and background mind, and also finding similar analogies in the agentic coding world with the AFK and HITL types, I do think there is a bigger set of problem space where despite the best of agentic coding modes, both HITL and AFK, you would need to consciously use your background mind, for days, weeks or months to probably crack the code.

This process might take days, or months. A question I have for the reader here is how rare or common has it been for you to be in this background mind mode trying to solve a hard problem? Me, personally, it’s been quite rare. It’s such a terrific experience to be in this mode of problem solving, where you don’t want to be disturbed, and neither should you be in front of the computer.

You should be lazing around, doing random chores, or, as the OP suggests, if you do have a hammock in your backyard, then that’s all you need.

Recently I had made a digital painting on Procreate, this was describing the Hindu deities in the Kerala mural art style. What I painted doesn’t matter much for this story, but what I wanted to share was the way in which this digital painting was printed and framed. Naturally, I was interested to get a quote from multiple printing vendors in my hometown, Thrissur, to compare the prices and see which one offers the best price at the best quality of service.

I instinctively went for three quotes from different vendors for comparison, and I asked myself why did I settle for three? Couldn’t I have picked four? Or why not just one, or two, or seven, for that matter.

Another thought that struck me here was how similar the three-quote selection process was to the teachings I had received during my design studies at Delft University. The professors always used to point out that while hunting for solutions it’s best to choose three solutions of equal fidelity, navigate the tradeoffs and constraints through user testing, and then arrive at the final solution. This was similar to the vendor selection process, and it made me curious about the unique nature of having three options.

What kind of psychological window does it offer to help take better decisions? With just a single option to choose from, you’re not thinking about alternatives, so it might mostly be a wrong decision with a huge sunk cost attached to it. With two options, my suspicion is that you are too focussed on steelmanning and straw manning both the sides engaged in a Hegelian dialectic, that you don’t really think outside this context window.

In the design process, when you have to arrive at three options which are more or less similar, you also have to do a lot more preparatory work to make them so, and it becomes harder to make a quick decision on which is the best option. In an ideal world, the options available all have their own unique advantages, and while evaluating, we start thinking hard about the tradeoffs, the constraints, and by the merit of thinking hard about the constraints, we might arrive at a unique solution. With three options, you can still hold the solution space entirely in your head.

With four, it might be so exhaustive that you end up losing sight of the forest for the trees.

If we look closely, the rule of three manifests everywhere, and I now think it has to do with the ability to hold the three options properly in our head. Take storytelling, and you see this as “beginning —> middle —> end”. Or the three click-rule in UX, or even in religion where you see the trinity in Christianity, and trimurthi in Hindu traditions. My theory here is that the comfort of using three has a lot to do with the way it sits neatly into our funny little brains, and less with how the numerologists or the astrologer soothsayers would like to point out..

The AI sandwich technique outlines a process where certain phases of product development are led by AI agents, while others require human intervention. The phases led by AI agents include planning, coding, reviewing, testing, and creating pull requests (PR). These tasks are automated to enhance efficiency and consistency.

In contrast, the phases that should be human-led are brainstorming (deciding what to build) and polishing (ensuring the product is of high quality). These stages require human creativity, judgment, and critical thinking, which AI agents cannot fully replicate.

The AI sandwich technique outlines a structured approach where humans and AI agents collaborate effectively.

In this model, the top layer involves human input, where goals and instructions are clearly defined by humans. This ensures that the desired outcomes are aligned with human intentions. The middle layer is where AI agents take over, handling the orchestration, execution, and processing tasks. This allows for efficient and automated handling of complex operations. Finally, the bottom layer involves human evaluation, where the output is assessed based on subjective human taste and feedback. This ensures that the final result meets human standards and expectations.

Think of the end-game polishing done by humans as akin to how pilots and co-pilots still have the final call on the airliner they’re operating, despite all the automation in place that could technically, but not in principle, take over the role of the pilot.

The technique is particularly effective in scenarios where outcomes can be verified, such as in RLHF (Reinforcement Learning from Human Feedback) environments. Initially, human feedback is crucial, but as the AI system becomes more proficient, the need for human intervention decreases. The analogy of a sandwich or an airliner illustrates this process, where humans set the course and make critical decisions, while AI agents manage the routine operations. This collaborative approach leverages the strengths of both humans and AI, ensuring optimal results.

How I write essays in 2026

7 minutes read writing

I’ve updated my writing process and workflow for 2026, and the main reason I am revisiting it is because the process itself has started to change in a meaningful way.

Writing is no longer just about producing an essay. It has become a feedback loop where each draft I produce also acts as training data. The gap between what AI generates and how I would naturally write is something I am now actively trying to close. Every piece I write contributes to that. The process I am describing here is not just how I write, but how I am shaping future AI outputs to sound more like me. In a way, this essay itself is part of that loop, and I’m naturally interested in the end game, where the AI sounds exactly like me; we’re not even 50% there yet, but I’m curious how this will unfold..

While doing this, I’ve also ensured that the principal source of the “key idea” still comes from me: it should resonate with me first, and that integrity remains uncompromised.

To talk about my writing process, it begins long before I open a document. It starts with a running surface of thoughts that I maintain in Trello.

This is where everything goes first, without pressure to make sense. Ideas arrive as fragments, questions, contradictions, or half-formed intuitions. I don’t try to structure them immediately. What matters is that they are captured quickly. Over time, some of these ideas begin to stand out. They survive small acts of revisiting, get slightly reworded, or start to point in a clearer direction. Sometimes I add a good image thumbnail to entice myself into drafting the essay further. When an idea feels like it has “hit home” and is ready to be explored further, it stops being just a card and becomes something I want to work on deliberately.

That is the point where I move it into Obsidian. This is where the rest of my personal knowledge base resides, and I now try to take advantage of it through a compounding loop which I will talk about later.

I create an Obsidian note with the intention to shape this into an essay, YouTube video, Substack article or anything else. Around it, I begin to gather material. This includes articles, essays, research, and most importantly, my own past writing that might connect to the same theme. The vault becomes a kind of extended memory. I follow links, revisit old notes, and slowly build a context around the central idea. Sometimes this expands what I already think. Other times it challenges it. Either way, the idea becomes less isolated and more dimensional.

As the material builds up, I move into a more deliberate way of processing it using the Zettelkasten approach. I have started to think of this stage as a process of crystallization. Ideas, when they first appear, are in a gaseous state (in this case, the ones that are first documented on Trello). They are diffuse, free-floating, and difficult to pin down. You can sense them, but you cannot yet hold them. As I begin working through notes, something changes. Writing reference notes and especially literature notes from memory forces the idea to condense. It becomes more coherent, more bounded. This is the liquid state. It can still flow and change shape, but it has a form that can be worked with.

During this process of working with my drafts on Obsidian, I also use tools like Enzyme, to start making more semantic connections with my existing notes on the vault. Connections appear between ideas that did not seem related at first. And then, in writing permanent notes, the idea crystallizes. It becomes solid. It takes the form of a single, clear, self-contained thought expressed in my own words. At this point, it is no longer something I am trying to understand. It is something I can use.

The Zettelkasten process, for me, is fundamentally about taking ideas through these states, from gas to liquid to solid, until they become stable enough to build with.

Some ideas still feel unclear even after this. They might have internal tension or multiple possible interpretations. Or might even lack proper framing.

In those cases, I use the Hegelian dialectic to shape them further. I take an initial framing and push it as far as it can go. Then I construct an opposing view and strengthen that as well. The goal is not to balance the two, but to force both sides to become sharp. What emerges from this is usually a better articulation of the idea. Sometimes it is a clearer position. Sometimes it is a reframing that resolves the tension altogether. Often, it simply results in better wording that feels more precise.

Once the idea feels clear enough, I start assembling it into a draft. This is where I bring together the different notes and arrange them into a sequence that makes sense. I pay a lot of attention to what deserves to be central and what should remain in the background. Not every idea should carry the same weight. There is usually one core message that I want to drive, and everything else either supports it or stays peripheral. This shaping is important because without it, the writing tends to become scattered, with too many ideas competing for attention.

If I have used any AI-generated text in the process, I clean it up at this stage. I remove phrases that feel generic, sections that over-explain, and structures that feel predictable. The amount of AI-generated text might vary essay by essay. Sometimes I just ramble on my phone (ChatGPT dictation on chat threads works quite well), and then convert this messy rambling into an AI-generated first draft.

The goal is not just to simplify, but to make the writing feel specific and grounded. After this, I don’t treat the draft as final. Instead, I use it as a reference and start writing again from scratch. I type everything in my own words. This helps me rebuild the flow in a way that feels natural to me. While doing this, I also pay attention to how I feel about what I am writing. There are moments where the writing needs more intensity and others where it needs restraint. This emotional alignment is something I can only achieve when I am actively rewriting rather than editing.

At the end of this, I have two versions of the same piece: one that came out of the structured process and another that reflects how I naturally write. I treat this as an input-output pair to train a custom agent skill that I’ve developed to make AI-generated writing sound like me.

Over time, I am collecting these pairs with the intention of using them to train models that can better match my style. The idea is that the gap between what the AI produces and what I would write myself keeps shrinking. It is not something that happens instantly. It improves gradually as more examples accumulate, and every essay I write contributes to that convergence.
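One simple way to collect such pairs is an append-only JSONL file, one pair per line. The sketch below is only an illustration of that idea: the `StylePair` fields, the function names, and the choice of JSONL are assumptions for the example, not a description of the actual skill.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class StylePair:
    """One essay in two versions (field names are hypothetical)."""
    title: str
    structured_draft: str  # output of the structured process, possibly AI-assisted
    rewritten: str         # the same piece retyped in my own words


def append_pair(path: Path, pair: StylePair) -> None:
    """Append one pair as a single JSON line."""
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(pair), ensure_ascii=False) + "\n")


def load_pairs(path: Path) -> list[StylePair]:
    """Read all pairs back, skipping blank lines."""
    with path.open(encoding="utf-8") as f:
        return [StylePair(**json.loads(line)) for line in f if line.strip()]
```

An append-only file keeps every essay’s pair intact over time, which suits a dataset that is meant to grow gradually.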

Once the writing is ready, I turn it into an artefact. Sometimes it remains a blog post. Other times it becomes a rough note or evolves into something larger like a video.

Alongside this, I am building toward a system where a persistent agent takes over distribution. I use a Hermes agent for this, a self-evolving memory layer that can run independently and learn from how I write, what I publish, and where it performs best. The direction is for it to eventually handle publishing across platforms on my behalf, adapting the same core idea to different contexts without losing the original intent. It is still a work in progress, but it is the natural extension of the system I am building.

Authority in the guise of evidence

4 minutes read critical-rationalism

Authority was once the primary means of determining truth. If the Pope at Vatican City declared something right or wrong, it was accepted without question. Over time, that model weakened, replaced in large part by the rise of rational, scientific inquiry. At least on the surface, we moved from deference to authority toward deference to method. But that shift is less absolute than it appears.

Explanations carry more weight than mere outputs. Progress does not come from accumulating results, but from correcting flawed ideas. A good explanation is not just predictive; it is resilient under scrutiny. The difficulty, however, is that deriving such explanations is expensive. Even if one is capable of independently verifying a claim with pen and paper, the time required makes it impractical at scale. In practice, this constraint forces a compromise. We rely on systems that compress explanation into signals we can consume quickly, even if that means trusting intermediaries.

Peer-reviewed papers in scientific journals illustrate this tension. They represent a system where credibility is delegated to an institutional process rather than individually verified. The assumption is not that every reader will replicate the results, but that the process itself filters for quality. This makes science functionally scalable, but it also reveals its dependence on a form of distributed authority. What appears as objectivity is, in part, structured trust.

Some thinkers, such as Balaji Srinivasan, argue that science will shift from “prestigious citation” toward “independent verification.” The idea is that advances in computation and tooling will allow individuals to verify claims directly, reducing reliance on institutions. Yet this vision runs into a fundamental constraint: verification does not just require tools, it requires attention. Even if verification becomes cheaper, it is unlikely to become free. Most participants in any system will still prefer to trust rather than verify.

The notion of a fully “trustless” society, often imagined in technological circles, rests on the assumption that trust can be eliminated entirely. In reality, trust is not removed but displaced. Systems that rely on code and cryptography shift trust from human institutions to technical ones. The claim that code is neutral ignores the fact that code is written, maintained, and executed within environments controlled by specific actors. Even in systems designed to minimize trust, there remains an implicit reliance on those who design the protocols and the hardware that runs them.

This is evident in emerging approaches such as trusted execution environments, discussed by Vitalik Buterin. These systems attempt to guarantee that computation occurs without leaking information, offering a form of verifiable privacy. Yet even here, the guarantees are not absolute. Hardware can be compromised, assumptions can fail, and users must ultimately trust that the underlying infrastructure behaves as advertised. The system reduces the surface area of trust, but does not eliminate it.

What becomes clear is that authority has not disappeared; it has become more abstract. In pre-modern systems, authority was visible and centralized. In scientific systems, it is procedural and institutional. In computational systems, it is embedded in code, protocols, and hardware. Each transition claims greater objectivity, but each also introduces new, less visible dependencies. Trust moves downward through layers, becoming harder to inspect as it becomes more technical.

This suggests that the future is not a binary choice between trust and verification, but a spectrum of trade-offs. Different domains will tolerate different levels of trust depending on the cost of error and the cost of verification. Financial systems may push toward near-complete verification, while everyday decisions will continue to rely on heuristics and delegated judgment. Rather than eliminating authority, we are learning to compose it—deciding where to rely on institutions, where to rely on code, and where to rely on ourselves.

In that sense, the dream of a fully trustless society is less a destination and more a direction. It reveals a desire to reduce arbitrary power, but it underestimates the irreducible costs of knowing. As long as explanations remain expensive and attention remains scarce, systems of trust will persist. The question is not whether we can remove authority, but how we distribute it, how we audit it, and how we remain aware of where it quietly resides.

Map is not the territory

4 minutes read philosophy

This is one of my favourite ideas from Shane Parrish’s Mental Models book, which I read recently. It’s so applicable that it doesn’t exist for me as an abstract idea that seldom translates to reality. It’s very tangible.

The map is never the territory. In other words, the “description” of the thing is not the thing itself.

The Argentinian writer Jorge Luis Borges, in his brilliant allegorical style, summarized this mental model nicely in a one-paragraph short story, On Exactitude in Science:

… In that Empire, the Art of Cartography attained such Perfection that the map of a single Province occupied the entirety of a City, and the map of the Empire, the entirety of a Province. In time, those Unconscionable Maps no longer satisfied, and the Cartographers Guilds struck a Map of the Empire whose size was that of the Empire, and which coincided point for point with it. The following Generations, who were not so fond of the Study of Cartography as their Forebears had been, saw that that vast Map was Useless, and not without some Pitilessness was it, that they delivered it up to the Inclemencies of Sun and Winters. In the Deserts of the West, still today, there are Tattered Ruins of that Map, inhabited by Animals and Beggars; in all the Land there is no other Relic of the Disciplines of Geography.

—Suarez Miranda, Viajes de varones prudentes, Libro IV,Cap. XLV, Lerida, 1658

This is a fictional tale of a map so precise about its “territory” that it is “useless”. After all, one could just follow the territory instead of using the map. Why use a map then?

It’s a brilliant thought experiment. It highlights that maps are lossy: you lose a lot of detail in the process, and this is not a bug but a feature of being a map.

Even the best maps are reductions, and they’re never perfect. They differ in their exactitude..

And this is my commentary on the notion that AI is taking away all the “human jobs”: at the end of the day, AI can only be an approximation, and cannot exactly replace the human.

It could probably do more than a human can, or sometimes less than what humans are meant to do. Love, care, and kindness are not things it can reproduce exactly; it can only mimic them.

My thinking now is that, as long as these approximations exist, the human jobs are not going away any time soon.

I have a similar argument for suggesting that folks read raw transcripts. This is still quite a contrarian opinion, as we’re more used to referring to summaries. A summary may be a quick nugget that is absorbed faster, but we do sometimes mistake comprehension for understanding. They are not the same. If I have the luxury of time, I personally prefer reading through the raw transcripts to get a finer-grained, higher-dimensional view of what’s going on.

If we extend the analogy, the raw transcript is almost like the “territory” we’re speaking about. And the insights generated, are more or less like the “map”. And the map can never be equivalent to the territory.

Apart from this quirk of reading raw transcripts, I also have a similar asymmetric rule of thumb when it comes to long-form blog posts. Some of them are really high alpha, high signal, so when a writer deeply resonates with me, I go the last mile and read the whole essay end to end.

The principle is the same: I want to read it in its lossless format, without it being stripped of any details. As the “map is not the territory”, I do prefer waltzing through the territory on some occasions instead of using the map.

Self hypnosis as a manifestation ritual

2 minutes read rough-notes

I’ve set up an Apple shortcut to wake me up with a personal voice message. It is scripted in a prophetic future tense, where I’m talking as if whatever I wanted to achieve has already happened.

I’m not sure what the results will be, but I’m planning to run this experiment for the rest of the year and report back on whether it has been successful.

I’m running this experiment as an offshoot of two ideas. The first is that your own voice is very powerful in driving behavioral transformation. The second is that talking in a future past tense, as if the thing you wanted to happen has already happened, can in fact make things happen.

You play this script on repeat every day, and the idea is for it to tap into your subconscious.

For now, I only have a “lean MVP” where it is just my plain voice. In the future, I would like to experiment with theta waves in the background.

I am still skeptical about the science behind it, but apparently certain frequencies shift which mode your brain operates in. Theta waves (4-8 Hz) are associated with deep relaxation, meditation, and suggestibility. This is the state hypnotherapists try to induce. It’s the state right before sleep where your subconscious is most open. So the attempt here is to do this right before you go to bed.

Many types of music, especially ambient, binaural beats, or certain electronic genres, contain frequencies that push your brain toward theta. Using this as a background score for the memo spoken in your own voice could potentially increase the effect.
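If you want to try the background layer yourself, here is a minimal sketch of how a binaural-beat track could be generated. A binaural beat is usually described as two slightly different tones, one per ear, whose difference is the target frequency, so a 200 Hz tone on the left and a 206 Hz tone on the right gives a 6 Hz difference inside the 4-8 Hz theta range. The function name `write_binaural_wav`, the 200 Hz base tone, and the other defaults are my own assumptions for illustration, and whether listening to this has any effect at all is exactly the part I remain skeptical about.

```python
import math
import struct
import wave


def write_binaural_wav(path: str, base_hz: float = 200.0, beat_hz: float = 6.0,
                       seconds: float = 5.0, rate: int = 44100) -> int:
    """Write a stereo WAV whose channels differ by beat_hz; return the frame count.

    All defaults are illustrative assumptions, not recommended settings.
    """
    n_frames = int(seconds * rate)
    amplitude = 0.3 * 32767  # stay well below 16-bit full scale
    frames = bytearray()
    for i in range(n_frames):
        t = i / rate
        left = int(amplitude * math.sin(2 * math.pi * base_hz * t))
        right = int(amplitude * math.sin(2 * math.pi * (base_hz + beat_hz) * t))
        # One stereo frame: two signed 16-bit little-endian samples, left then right.
        frames += struct.pack("<hh", left, right)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(2)      # stereo, one tone per ear
        wav.setsampwidth(2)      # 2 bytes = 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(bytes(frames))
    return n_frames


if __name__ == "__main__":
    write_binaural_wav("theta_background.wav")
```

The resulting file would still have to be mixed under the voice memo in an audio editor, and it only makes sense on headphones, since each ear needs to hear its own tone.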

Wouldn’t it fascinate you to realize that every song you’ve ever listened to on repeat has installed beliefs into your mind without your permission?

I have also embedded “steady” as a post-hypnotic trigger word, to indicate the state I would like to be oriented towards.

You have an interesting thought in your head, and you want to get it out there, in the open, on the internet. It’s spiky, and has all the right characteristics to titillate the audience and, hopefully, convince the reader of an opinion. In the heat of this word-burst moment, you ask yourself to think twice before you hit publish, and now that you think about it, the argument does not feel like it is landing precisely where you want it to.

The argument is probably not false, but not entirely true either. It’s on the fence, neither here nor there. The fault might not even be that the argument lacks “meat”; it could be a victim of being trapped in the wrong frame.

Being trapped in the wrong framing is a worse problem, in my view, than making a false argument. “Ideas are only as good as our ability to communicate them”, and a good framing is half the job here. False arguments can be discarded, but misframed arguments can feel correct for years, while quietly trapping our thinking inside the very setup that produced them. You are the output of your priors.

In the recent agentic coding discourse, we see this in the questions we often ask: should human judgement be delegated to these paperclip maximisers? Should we only be a “human in the loop” layer, while the agents do all the plumbing and grunt work? Or the more trite: should AI writing be banned on well-known forums such as Hacker News, Stack Overflow, etc.?

These questions sound very serious because they force a choice, but they also smuggle in a very bad assumption underneath: that the real work here is picking a side (inside a “frame” that’s already on the table).

“They question the question. Before rushing to answer your question, they question whether it’s the right question to answer. They know the right answer to the wrong question is worse than no answer to the right question.”

Source: High Agency in 30 Minutes, George Mack

Hegelian dialectic helps quite a bit with this kind of rigorous exercise: arriving at a good framing of what needs to be discussed and reasoned about.

I had almost forgotten about Hegel, but my interest in his ideas was renewed by hegelian-dialectic, a Claude/Codex skill shared by Kyle on GitHub.

I’ve tried this on various questions. They were not deep philosophical questions of the “is there free will?” or “why is the sky blue?” kind, but quite practical ones such as “Should I prefer library X over library Y?” and the like. It’s a very practical reasoning method. It’s not mere philosophical cosplay, and it can have a lot of utility in our day-to-day too.

The way this works is to first discover whether the question we’re asking is itself malformed, then pressure-test it until a better question emerges.

I found this example on Reddit which succinctly describes Hegelian dialectic in simple words:

It can be hard to understand, because its such a big idea. Sometimes its good to tackle big ideas like that with an example.

Here’s how it works. Let’s imagine you have a friend. You want to be the best friend you can. You are totally, 10,000% into being their friend. So you do all the stuff you think a friend should do. You say hi to them whenever you see them, you give them big hugs, you give them presents, you tell them how nice they are. And you do this all the time. You even hog your friend, and don’t want them to play with other people! You pour yourself into it, completely [Entäußerrung]

Eventually your friend doesn’t want to be around you because they feel like you don’t give them any room to be themselves. You tried hard to be the best friend you could be, but you ended up being the opposite of a good friend. You did the best you could, and you ended up with the opposite of what you wanted.

What would you do? Would you decide to start picking on your friend and calling them names? No that would be silly. You wouldn’t say “everything I ever thought about being a good friend was TOTALLY wrong.” You would see that what you were doing was partially right. But it was just one little piece of the “Big Picture” of what being a good friend is all about—which includes lots of other stuff like respecting your friend’s space, and letting them be themselves. So now, you have a bigger, better understanding of what it means to be a good friend. [Aufhebung]

Now that I have given you a situational example, I would like to take you through a political example of a Hegelian dialectic. It’s such a broad, all-encompassing idea, casting such a vast net, that it could be applied to anything. Let’s take the politics of governments:

In the dialectic mode, you start with an existing synthesis (say, monarchy), which splits into a new thesis (say, democracy) and an antithesis (say, communism).

These two ideas could converge or diverge, bringing forth more theses and antitheses (such as libertarian democracy, for example). This is how Hegelian ideas evolve over time.

Thesis and antithesis are in a constant dance. Hegel’s determinate negation matters here because it does not merely say, this is wrong, and this is right. It says, this is wrong in a way that reveals what is missing. The negation has shape. The failure is informative.

Synthesis, then, is not compromise. It is a reframing that preserves something real from both sides while making the original dispute feel too narrow to carry the weight you put on it. Reality is far from a black-and-white binary, and synthesis gets you closer to it. The interesting move is usually not “find a better solution” but “change what the problem is about.”

Hegel’s views on synthesis between thesis and antithesis can be viewed in juxtaposition to his view that nature is an evolving organism striving towards progress as indicated here:

In this attempt to arrive at a better synthesis, there is a lot of framing and reframing of the ideas and their supporting questions. One could find more such framing exercises in George Lakoff’s book, Metaphors We Live By. The author doesn’t specifically use the dialectic, but it comes through in the way he keeps reshaping the framing to arrive at a better one. For example, the “immigrant neighbourhood” problem evolves into a “shared belonging” problem, which is much more solvable from both ends of the political spectrum due to its more inclusive framing.

| Old frame | New frame | How this helps |
| --- | --- | --- |
| Immigrant neighborhood decline is a “housing / public order” problem | It is a “shared belonging and memory” problem | Instead of policing difference, you build common identity through story exchange, which reduces alienation and makes cooperation possible |
| Integration of disabled people is a “care delivery” problem | It is an “abilities and contribution” problem | You stop treating people as burdens to manage and start designing roles through which they participate in society |
| Nightlife violence is a “crime control” problem | It is an “urban hospitality / festival infrastructure” problem | You redesign flows, toilets, signage, transit, and care, making the environment less violence-producing |
| Youth employment is a “recruitment / loyalty” problem | It is an “identity and attractiveness” problem | Institutions stop presenting as bureaucratic gatekeepers and become places young people actually want to join |
| Social housing failure is a “buildings” problem | It is a “social fabric” problem | You stop defaulting to demolition and start repairing networks, dignity, trust, and local belonging |

A neighborhood in decline looks like a housing or public-order problem until someone reframes it as a problem of belonging, shared memory, and social contact. Nightlife violence looks like a crime problem until someone reframes it as festival infrastructure: flows, toilets, transport, signage, places to cool down.

Even in the current context, AI-led job layoffs are usually framed as job loss versus innovation. One side warns that the machines are coming for labour, and that institutions should slow down or regulate their usage as much as possible. The other side thinks everything is a positive-sum game, to be treated with the foresight that even more abundance will be bestowed. Even here, a better framing can emerge from a Hegelian dialectic: the real “question behind the question” may be that we need institutions to better redistribute judgement.

This agent skill also showed me that AI agents can help us with this exercise.

This matters especially because humans are not naturally good at holding this way of reasoning and counter-reasoning. We can probably hold one position quite strongly, but we get worse when we try to hold the opposite position too, with equal force, evidence, and emotional seriousness. Our reasoning is most likely to be lopsided in one direction.

Coming back to the Hegelian dialectic skill, it is a process that goes through different phases. In the first phase, Socratic questioning uncovers the assumptions behind the question posed, and then the domain is researched to ground the question in specifics. Next, it creates two “electric monks”, each calibrated to carry the belief burden, with targeted research directives for position-specific evidence. Both monks write fully committed essays to steelman their sides. They are purposefully spawned in separate sessions with no shared context, so that no reasoning path crosses over and there is no context rot. With this approach, we look for “chinks in the armor”: what we’re aiming for is a state of self-sublation, where each position’s own logic undermines itself. After this destructive dissolution comes creative induction, where the important atomic bits are brought together and recombined alongside the monks’ material. Then comes sublation, where both electric monk A and electric monk B are cancelled as complete truths, and a new “framing” emerges. This is what happened with politics as well: capitalism got cancelled, communism got cancelled too, and what emerged was a synthesis/antithesis dance and then, finally, “libertarian democracy”.

This is what the phases of the dialectic look like, as outlined by the author on GitHub:

The process has seven phases.

Phase 1: Elenctic Interview + Research — surface the real contradiction

The orchestrator interviews you Socratically — surfacing hidden assumptions, finding the deepest version of the contradiction, and identifying your belief burden. Then it researches the domain to ground both sides in specifics. The interview surfaces what you’re actually wrestling with; the research ensures the downstream arguments are grounded in specifics, not generics.

Phase 2: Generate Electric Monk Prompts — calibrate the belief assignments

The orchestrator crafts two prompts — one per Monk — calibrated to your specific belief burden. Each prompt includes framing corrections that prevent the Monk from falling into the obvious, boring version of the argument, plus targeted research directives for position-specific evidence.

Phase 3: Spawn the Electric Monks — two fully committed position essays

Two separate AI agents — each in a fresh, isolated context — write fully committed position essays. They don’t hedge. They don’t try to be balanced. Each one inhabits its position and makes the absolute strongest case. Spawning them in separate sessions with no shared context produces structural decorrelation — genuinely different reasoning paths, not the same analysis with different conclusions bolted on.

Phase 4: Determinate Negation — find where each argument undermines itself

The orchestrator analyzes both essays to find: where each position’s own logic undermines itself (self-sublation), what both sides implicitly agree on without realizing it (shared assumptions), and the specific way each position fails — not “it’s wrong” but “it fails in THIS way, which points toward THIS thing that’s missing.”

Then comes Boyd’s destructive deduction: shatter both arguments into atomic parts, break the correspondence between each position and its constituents, and scatter them into what Boyd calls a “sea of anarchy.” Then the creative induction: find common qualities, attributes, or operations among the scattered parts to build cross-domain connections that were invisible when the parts were trapped inside their original positions. In Round 2+, lateral creativity interventions (compressed conflicts, random Wikipedia domain injection, metaphor generation) inject genuinely external material before the decomposition, so it gets shattered and recombined alongside the monks’ material.

Phase 5: Sublation (Aufhebung) — synthesize something neither side could reach

The orchestrator generates a synthesis that simultaneously cancels both positions as complete truths, preserves the genuine insight in each, and elevates to a new concept that transforms the question itself.

This is not compromise. It’s not “use A for some cases and B for others.” It’s a reconceptualization — something neither Monk could have conceived from within their frame, but which, once stated, makes the original contradiction predictable. The synthesis is an abductive hypothesis: what would make it unsurprising that both Monk positions exist with genuine evidence? After drafting, a reversibility check (from Boyd) traces each claim back to specific atomic parts from the decomposition — untraceable claims get flagged as either confabulation or new insights needing their own evidence.

Phase 6: Validation — did the Monks feel elevated or defeated?

Both Monks evaluate the synthesis: were they elevated (their core insight preserved within something larger) or defeated (their position just dismissed)? Then a hostile auditor — a fresh agent with no position — attacks the synthesis for hidden assumptions, compromise disguised as transcendence, structural flaws, and runs its own reversibility check (can each claim trace to material in the essays, or is the synthesis just one monk’s structure wearing the other’s vocabulary?).

Phase 7: Recursion — where the real value lives

Each synthesis generates new contradictions. The orchestrator proposes 2–4 directions; you choose which to pursue. The process repeats — and each round gets sharper, pulling in new cross-domain material that the previous round made relevant.

The first round is calibration — the least insightful output. By Round 2–3, the dialectic has dug past the obvious framing into territory that neither you nor the Monks could have reached from the starting question.
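To make the shape of a single round more concrete, here is a rough Python sketch of phases 2 to 6 as I understand them from the description above. It is not the skill’s actual code: `Agent`, `run_round`, `make_stub_agent`, and all the prompt wording are hypothetical names I made up for illustration, and phase 1 (the interview) is assumed to have already produced the question and the two positions. The one thing it tries to show faithfully is the isolation: every monk, and every validator, gets a freshly spawned agent that only sees its own prompt.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for "a fresh agent session": one prompt in, one text
# out, with nothing remembered between calls.
Agent = Callable[[str], str]


@dataclass
class Round:
    question: str
    essay_a: str
    essay_b: str
    negation: str
    synthesis: str
    verdicts: List[str]


def run_round(question: str, position_a: str, position_b: str,
              spawn_agent: Callable[[], Agent]) -> Round:
    """One pass through phases 2-6 (illustrative prompts, not the skill's own)."""
    # Phase 2: one prompt per monk, each told to commit fully to its side.
    prompt_a = f"Question: {question}\nArgue without hedging that: {position_a}"
    prompt_b = f"Question: {question}\nArgue without hedging that: {position_b}"

    # Phase 3: each monk is a separately spawned agent, so no context is shared.
    essay_a = spawn_agent()(prompt_a)
    essay_b = spawn_agent()(prompt_b)

    # Phase 4: the orchestrator reads both essays and looks for self-undermining.
    orchestrator = spawn_agent()
    negation = orchestrator(
        "Find where each essay undermines itself and what both assume.\n"
        f"ESSAY A:\n{essay_a}\nESSAY B:\n{essay_b}"
    )

    # Phase 5: ask for a reframing built on that analysis.
    synthesis = orchestrator(f"Given this analysis, reframe the question:\n{negation}")

    # Phase 6: each side judges the synthesis, again from a fresh agent.
    verdicts = [
        spawn_agent()(f"You argued: {pos}\nSynthesis: {synthesis}\n"
                      "Were you elevated or defeated?")
        for pos in (position_a, position_b)
    ]
    return Round(question, essay_a, essay_b, negation, synthesis, verdicts)


def make_stub_agent() -> Agent:
    """Offline stub so the sketch runs without any model behind it."""
    def agent(prompt: str) -> str:
        return f"[stub reply to a {len(prompt)}-character prompt]"
    return agent


if __name__ == "__main__":
    result = run_round("Should I prefer library X over library Y?",
                       "library X is the better choice",
                       "library Y is the better choice",
                       make_stub_agent)
    print(result.synthesis)
```

Phase 7 would be a loop around `run_round`, feeding a new contradiction from the synthesis back in as the next question. In real use, `spawn_agent` would start a new model session instead of returning the stub.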

Here is another piece of commentary, from a person who used the dialectic skill to better answer the question of “React versus Vue”:

In test runs, a React/Vue dialectic evolved from “corporate lab vs. auteur” into a “co-evolutionary arms race” framework. An institutional identity dialectic went through seven cycles, pulling in Gödel’s incompleteness theorem, Coasean transaction costs, and jurisprudential concepts that had nothing to do with the original question — but were essential by the time the dialectic reached them.