How I started building softwares with AI agents being non technical

Shreyas Prakash headshot

Shreyas Prakash

At the start of the year, the CEO of Anthropic had made a prediction that 90% of code in enterprises would be written by AI by September. Now that we have crossed September, we now know that the prediction turned out to be false. As Ethan Mollick mentions, he only seems to have been off by a couple of months (this was recently posted by Boris, the creator of Claude Code) where he mentions 100% of his contributions to Claude Code, written by Claude Code!

Last year 2025, by no doubt has been the year of AI agents. And I was tempted right from the beginning to play with this shiny new toy. And as a “technically curious” person, I want to dive right in. Ended up spending most nights and weekends understanding how to build software with AI agents. It was super fun..

Out of the many side projects that I built this year, here are some memorable ones:

Apart from these heavy-duty apps, I also built various micro-tools that served various ad-hoc use cases which include a chrome extension to import X bookmarks as Trello cards, a Windows XP-esque wallpaper with dynamic 3D clouds, an AI chess coach for improving elo score through socratic dialogue, an Obsidian plugin to help me prioritize the worst rough draft of an essay to improve first, and even a Mac-native open source screen recorder.. (all open source, and free for use)

The beginning was quite benign. Early 2025, I felt initially that LLMs could only build toy apps, and were not capable of building anything truly substantial. So I was stuck to creating various one-off prototypes using Lovable and Claude Artefacts to make them. I still wasn’t sure about it’s usage in complex codebases. The only way I was using them was by copy-pasting code snippets to ChatGPT and feeding them back. Then it evolved to simple autocomplete on AI-native IDEs such as Cursor IDE. Then I started using the chat window on the IDEs directly to interact with my codebases. I now run a Ghostty terminal with multiple tabs open with Codex instances.

By this time, mid-2025, the scaling laws were kicking in, and the agents were becoming much more successful in performing longer operations without breaking or hallucinating in between. The recent charts show tasks that take humans up to 5 hours, and plots the evolution of models that can achieve the same goals working independently. 2025 saw some enormous leaps forward here with GPT-5, GPT-5.1 Codex Max and Claude Opus 4.5 able to perform tasks that take humans multiple hours—2024’s best models tapped out at under 30 minutes. With this equipped model capabilities, I was excited to try CLIs and had great success. I hardly look at any code nowadays, not even a code ditor. My current setup looks like this:

It’s all on the terminal with Codex with multiple tasks running on different terminal windows. All I do is, engaging in a socratic dialogue with the models on various aspects: is X more performant than Y? Have you researched on alternative to perform feature Y? Does the API provided by platform Z have any rate limits which need to be considered?

To some extent, it almost feels like coding has evolved to a higher-level of abstraction, and like how Karpathy sensei mentions in this tweet, “here’s a new programmable layer of abstraction to master (in addition to the usual layers below) involving agents, subagents, their prompts, contexts, memory, modes, permissions, tools, plugins, skills, hooks, MCP, LSP, slash commands, workflows, IDE integrations, and a need to build an all-encompassing mental model for strengths and pitfalls of fundamentally stochastic, fallible, unintelligible and changing entities suddenly intermingled with what used to be good old fashioned engineering”

Building such projects was giving me an intuitive understanding of how something as non-deterministic as a large language model can fit into a deterministic workflow of building software. I slowly moved from being a “no code” person, and with AI agents I moved to being a “some code” person. I still couldn’t write code and defined myself as a “non-technical rookie” to some extent. But with AI agents, it just changed the game, I was able to steer them towards what I wanted to achieve, and build great software.

How I use LLMs now

Here are some lessons I learnt while just jumping into the “AI waters” using LLMs and agents, and learning how to swim with them (as of Jan 2, 2025, things change really fast TBH):

Model usage

  • Model usage boils down to economics, with evaluations of tradeoffs between cost and intelligence being done for answering various questions on a frequent basis.. (you wouldn’t really use the most sophisticated model on the leaderboard to figure out how to center a div, for eg.). I now use gpt 5.2-codex-extra-high for complex problems, and gpt5.2-codex-medium for anything else.
  • In my initial explorations, I used to be very open ended in deciding which framework I should use. I was going with the defaults which codex gave. Especially when this gets subjective on a well-oiled, well contributed library, or framework which we can trust. I’ve arrived at a sensible set of defaults which I’m comfortable to understand, and almost always use them for various apps. For any web-app to be built, I use this starter kit which is basically Ruby on Rails in the backend, with Inertia for using React on the frontend. It does a good job of bringing together best of both worlds: React and Rails together, and also has great component libraries such as shadcn baked in. For anything mobile, I build on Expo, and for one-off frontend prototypes, I build React/vite apps. Over time, I’ve also gained an intuition on the prowess of each of these frameworks, so I can understand what their limitations are. language/framework and ecosystems are important decisions taken,and hard lessons have been learnt.
  • In terms of model selection, I almost always choose Codex over anything else, even Claude Code. Claude Code has great DX, and other utilities such as hooks, skills, commands etc, but Codex seems to just “get” it without any such charades. I was using Claude Code until I saw the brilliance of Codex from Peter Steinberger in his talk at the previous Claude Code Anonymous meetup in London. I’ve never really touched Claude Opus/Sonnet after that.
  • Another reason I use Codex is that they’re not as sycophantic as Claude, and pushback whenever necessary. When I make delirious requests.. Codex is like “are you sure you want to do Y, it might break X and Z… here are couple of alternate options a, b and c…”

UI prototyping

  • Another technique for faster UI explorations in a “low fidelity” way is to ask it to generate ASCII diagrams of the UI layouts, and it cooks up something like this, making it easier to iterate on loop.

Article content

  • For the past three projects that I’ve shipped with AI agents I’ve never touched Figma to communicate anythng at all.. despite years of being ingrained in the Figma-way of building prototypes.. Now I just use excalidraw (to draw loose sketches), ascii diagrams (to generate lo-fi mockups) and prototype sandboxes with good design systems (to generate hi-fi mockups)

Workflows

  • I usually parallelize by running multiple tabs with Codex open. I don’t git worktrees or anything of that sort, but in a way I prevent the models from stepping into each other’s toes by means of atomic commits:
Keep commits atomic: commit only the files you touched and list each path explicitly. For tracked files run git commit -m "<scoped message>" -- path/to/file1 path/to/file2. For brand-new files, use the one-liner git restore --staged :/ && git add "path/to/file1" "path/to/file2" && git commit -m "<scoped message>" -- path/to/file1 path/to/file2
  • For debugging, I almost always copy+paste the dev/production tail logs to ChatGPT and it solves 99.99% of the problems. I’ve heard some of my friends have a much more advanced workflow where they integrate Sentry to log all the errors (I haven’t personally tried this yet, I wanted to cross this bridge when I have no other escape route)
  • With AGENTS.md or CLAUDE.md file, I give instructions only on a higher abstract level, as I’ve seen some Twitter folks mentioning that the models almost always bypass the code snippets which are added to the agents file. Think of this as higher level steering instructions. Not too detailed, and not too vague either. I use a variant of this gist for my own purposes.
  • With context window optimisation, I’ve been recently understanding that there is a great-dumbening of the model especially when the context window is more than 40% of it’s actual limit, and the best approach then is to start a new chat with the agents, instead of adding more to the same chat session.
  • I don’t do compaction of the chat windows as I view them as lossy. In case the chat is more than it’s 40% limit and if I haven’t been able to get the problem fixed yet, I instruct the agent to write a markdown file with the output of all the revisions, changes and decisions made. I then reference this file to a new chat
  • No more plan modes. Previously with Claude, I used to build very detailed product specs documents (following Harper’s LLM codegen hero’s journey guide, with Codex now, it’s changed. Instead I just write a “product vision” document. This helped set expectations on the vision I want to build the product towards. This was also 100% written by me without any AI agents help, as this was something I could uniquely contribute. Plan mode was just plain boring, as I was not so excited to create a 50-point to-do lists to build MVPs. I started feeling almost like a mechanical turk blindly pressing “continue continue continue..” ad infinitum without sharp thinking. This process of taking the complete idea as input and then delivering an output was stripping me of my creative process and wasn’t really working well for me. Now, I just start with the boilerplate starter kit and ask questions based on various user stories.. I would say something like “I want users to sign in with Google” and it builds it. then I’m like “I want users to be onboarded on how to use this service” and it builds an onboarding page. one by one, one user story at a time until I build an MVP.
  • break your app down into what users actually do. “a user can sign up with email and password.” “a user can create a new post.” “a user can see a feed of all posts.” this is the language the ai understands. this is how you communicate clearly.

For product vision drafting:

Ask me one question at a time so we can develop a thorough product vision for this idea. Each question should build on my previous answers, and our end goal is to have a detailed product vision, I can hand off to all of you (agents) to provide a direction of the north star. Let’s do this iteratively and dig into every relevant detail. Remember, only one question at a time.

Here’s the idea: [insert idea here]
  • No matter whatever code is written, TDD is still a must. LLMs can still make errors. I instruct them to write tests, and I read through the test scenarios to cross check if the user journey logic is intact.
  • As I’ve now started to build more projects, I have them all neatly organised within a /Projects folder with various projects under them. /Project 1, /Project 2 etc.. If I run into an error which I’ve encountered in a different project that I’ve solved, I reference feature X and it’s implentation from Project 1 into Project 2 and it does it neatly. Over time, as we accumulate exposure to more problems solved by means of LLMs, it almost becomes an art of “compound engineering”, where previous solutions, solve current problems
  • For more “harder” problems to implement, or for new feature implementations, I break the prompts into three parts. Act one would be to research potential ways to integrate the feature where I ask codex to come up with three directions, from which I pick one. Act two, involves asking Codex how it aims to build it, and the series of steps it would entail. Knowing this helps me steer Codex better. Act three involves executing it’s plan. While Act three is ongoing, I do keep an eye on what it’s doing. If something seems fishy, I either abort the operation, or ask followup questions for it to look closely. This was popularised by Dexter Horthy from Human Layer, and is a nice way to separate (research) (plan) and (execute) into different operations for clarity.
  • For vibe coding on mobile, I initially attempted to run a Tailscale server, where I install a headless Claude Code CLI on a VPS server with which I can text over phone via SSH.. however this was quitew slow, and I didn’t enjoy the experience as much. For now, I just use Codex web to chat and create PRs.. once I’m back at my desktop, I just code review the PRs and merge them with the codebase…
  • I’ve also been exploring skills. I recently built a “Wes Kao writing” skill to improve my executive communications at my day job. This was a custom skill fed on all the blog posts written by Wes Kao, and gives much more refined feedback on how I could improve my first drafts in business comms.. I’ve also been using Claude’s frontend-skill for instructing agents with building UI better.. I’ve seen tons of resources (such as this one, but haven’t caught up yet)

Most of these ideas I’ve learnt from Peter Steinberger, Ian Nuttall as well as Tal Raviv / Teresa Torres on Linkedin have also been inspirational to understand how to approach building with AI agents from a product lens. (I recently found Teresa’s “build in public” updates on her recent AI interviewing tool to be quite motivating)

What I haven’t explored yet (but would try soon)

Stage 1: Zero or Near-Zero AI: maybe code completions, sometimes ask Chat questions. Stage 2: Coding agent in IDE, permissions turned on. A narrow coding agent in a sidebar asks your permission to run tools. Stage 3: Agent in IDE, YOLO mode: Trust goes up. You turn off permissions, agent gets wider. Stage 4: In IDE, wide agent: Your agent gradually grows to fill the screen. Code is just for diffs. Stage 5: CLI, single agent. YOLO. Diffs scroll by. You may or may not look at them. Stage 6: CLI, multi-agent, YOLO. You regularly use 3 to 5 parallel instances. You are very fast. Stage 7: 10+ agents, hand-managed. You are starting to push the limits of hand-management. Stage 8: Building your own orchestrator. You are on the frontier, automating your workflow.

I haven’t ventured into these stages personally in 2025, and I’m also not sure if things would change in 2026. Stages 7 and 8 are still very controversial, and debate-able right now, and is still not ripe enough for even the early-adopter’s “adoption”. Agent orchestration seems be the hottest word right now in such AI-pilled dev circles and I’m curious how this would unfold..

Wrapping up 2025..

As 2025 ends, and a new year begins, it feels that everything is possible. It’s the “age of the builder” and understanding “how to write code syntax” is no longer the bottleneck. This is also likely going to be one of the most important decades in human history, and even ordinary actions like “putting an essay on the internet” can be extremely high leverage.

Aiming to think hard about what we’re doing, and post more, write more, participate more in 2026!

Subscribe to get future posts via email (or grab the RSS feed). 2-3 ideas every month across design and tech

Read more

  1. My agentic engineering workflow (step by step)agentic-coding
  2. Keep hunting for co creative systemic loopsdesign-thinking
  3. Keep hunting for co creative loops
  4. Every darn thing is a kekulean loop if you notice itdesign-thinking
  5. Hammock driven developmentagentic-coding
  6. Peculiar ways number three fits into our funny little brains
  7. AI sandwich as a defacto principle for anything agentic engineering relatedagentic-coding
  8. How I write essays in 2026writing
  9. Authority in the guise of evidencecritical-rationalism
  10. Map is not the territoryphilosophy
  11. Self hypnosis as a manifestation ritualmeditation
  12. Hegelian dialectic for structured reasoning with AI agentsphilosophy
  13. How I prepare for tough negotiations nowadaysnegotiation
  14. When should we steelthread somethingproduct-development
  15. How to become a polyglot
  16. Breadboarding, shaping, slicing, and steelthreading solutions with AI agentsproduct-management
  17. Healthy conflict in teams have a tipping pointteam-building
  18. Deslopify AI writingwriting
  19. How I started building softwares with AI agents being non technicalagentic-coding
  20. Read raw transcriptsknowledge
  21. Legible and illegible tasks in organisationsproduct
  22. L2 Fat marker sketchesdesign
  23. Writing as moats for humanswriting
  24. Beauty of second degree probesdecision-making
  25. Boundary objects as the new prototypesprototyping
  26. One way door decisionsproduct
  27. Finished softwares should existproduct
  28. How I periodically rank my rough draftsobsidian
  29. Flipping questions on its headinterviewing
  30. Vibe writing maximswriting
  31. How I blog with Obsidian, Cloudflare, AstroJS, Githubwriting
  32. How I build greenfield apps with AI-assisted codingai-coding
  33. We have been scammed by the Gaussian distribution clubmathematics
  34. Classify incentive problems into stag hunts, and prisoners dilemmasgame-theory
  35. I was wrong about optimal stoppingmathematics
  36. Thinking like a shipmental-models
  37. Hyperpersonalised N=1 learningeducation
  38. New mediums for humans to complement superintelligenceai-coding
  39. Maxims for AI assisted codingai-coding
  40. Personal Website Starter Kitai-coding
  41. Virtual bookshelvesaesthetics
  42. It's computational everythingtrends
  43. Public gardens, secret routesdigital-garden
  44. Git way of learning to codeai-coding
  45. Kaomoji generatorsoftware
  46. Copy, Paste and Citeai-coding
  47. Style Transfer in AI writingai-coding
  48. Understanding codebases without using codeai-coding
  49. Vibe coding with Cursorai-coding
  50. Virtuoso Guide for Personal Memory Systemsmemory
  51. Writing in Future Pastwriting
  52. Publish Originally, Syndicate Elsewhereblogging
  53. Poetic License of Designdesign
  54. Idea in the shower, testing before breakfastsoftware
  55. Technology and regulation have a dance of ice and firetechnology
  56. How I ship "stuff"software
  57. Writing is thinkingwriting
  58. Song of Shapes, Words and Pathscreativity
  59. How do we absorb ideas better?knowledge
  60. Read writers who operatewriting
  61. Brew your ideas lazilyideas
  62. Trees, Branches, Twigs and Leaves — Mental Models for Writingwriting
  63. Compound Interest of Private Notesknowledge
  64. Conceptual Compression for LLMsai-coding
  65. Meta-analysis for contradictory research findingsdigital-health
  66. Proof of workproduct
  67. Gauging previous work of new joinees to the teamleadership
  68. Task management for product managersproduct
  69. Beauty of Zettelswriting
  70. Stitching React and Rails togetherai-coding
  71. Exploring "smart connections" for note takingknowledge
  72. Deploying Home Cooked Apps with Railssoftware
  73. Repetitive Copypromptingwriting
  74. Questions to ask every decadejournalling
  75. Balancing work, time and focusproductivity
  76. Hyperlinks are like cashew nutswriting
  77. Brand treatments, Design Systems, Vibesdesign
  78. How to spot human writing on the internetwriting
  79. Can a thought be an algorithm?product
  80. Opportunity Harvestingcareers
  81. How does AI affect UI?design
  82. Everything is a prioritisation problemproduct-management
  83. Nowlifestyle
  84. How I do product roastsproduct
  85. The Modern Startup Stacksoftware
  86. In-person vision transmissionproduct
  87. How might we help children invent for social good?social-design
  88. The meeting before the meetingmeetings
  89. Design that's so bad it's actually gooddesign
  90. Lessons learnt interview prepping for product rolesinterviewing
  91. Obsessing over personal websitessoftware
  92. English is the hot new programming languagesoftware
  93. Better way to think about conflictsconflict-management
  94. The role of taste in building productsdesign
  95. Dear enterprises, we're tired of your subscriptionssoftware
  96. Products need not be user centereddesign
  97. World's most ancient public health problemsoftware
  98. Pluginisation of Modern Softwaredesign
  99. Let's make every work 'strategic'consulting
  100. Making Nielsen's heuristics more digestibledesign
  101. Startups are a fertile ground for risk takingentrepreneurship
  102. Insights are not just a salad of factsdesign
  103. Minimum Lovable Productproduct
  104. Methods are lifejackets not straight jacketsmethodology
  105. How to arrive at on-brand colours?design
  106. Minto principle for writing memoswriting
  107. Importance of Whytask-management
  108. Quality Ideas Trump Executionsoftware
  109. How to hire a personal doctormental-models
  110. Why I prefer indie softwareslifestyle
  111. Use code only if no code failscode
  112. Self Marketing
  113. Personal Observation Techniquesdesign
  114. Design is a confusing worddesign
  115. A Primer to Service Design Blueprintsdesign
  116. Rapid Journey Prototypingdesign
  117. Visualise detailed file structures on CLIcli
  118. Do's and Don'ts of User Researchdesign
  119. Design Manifestodesign
  120. Complex project management for productproducts
  121. How might we enable patients and caregivers to overcome preventable health conditions?digital-health
  122. Pedagogy of the Uncharted — What for, and Where to?education
  123. Future of Ageing with Mehdi Yacoubiinterviewing
  124. Future of Tacit knowledge with Celeste Volpiinterviewing
  125. Future of Rural Innovation with Thabiso Blak Mashabainterviewing
  126. Future of Equity with Ludovick Petersinterviewing
  127. Future of work with Laetitia Vitaudinterviewing
  128. Future of Mental Health with Kavya Raointerviewing
  129. Future of unschooling with Che Vanniinterviewing
  130. How might we prevent acquired infections in hospitals?digital-health
  131. The why to endure any howentrepreneurship
  132. Design education amidst social tribulationsdesign
  133. How might we assist deafblind runners to navigate?social-design