Agentic developer experience starts with your system, not your prompts

Knut Melvær

The new 'time to hello world' isn't determined by a developer reading your getting started guide. It's someone typing a naive prompt into an agent. How should we think about that?


In 1998, I right-clicked on websites I thought were cool and hit “view source.” The HTML was right there. The thing I was looking at and the thing that made it were the same language. I could copy it, change it, reload. That’s how I started coding. Shamelessly copying other people’s work with a lot of persistence.

There was the “[insert technology] for Dummies” series, computer magazines with tutorials, and the occasional workshop at the local Internet café. The path from curiosity to running code was: find an example, read it, change it, see what happens.

Fast forward almost three decades (yikes!) to a few weeks ago.

I’m standing by a whiteboard at Sanity HQ in San Francisco, talking to the team building our SDKs about what happens when agents try to use our stuff. They wanted to know: how do we write better skills files? How do we give devs better prompts?

These are intuitive questions. All of us have spent the last couple of years figuring out what to tell agents to make them useful. Still, I told them they were starting in the wrong place. I’ve been working through this with several of our teams, and what I keep finding is that the answers aren’t where you’d expect.

When someone types “build me an app with [your product]” into Claude Code, unlike a Google search, it just does stuff. It picks packages. It picks patterns. It makes architectural decisions based on statistical opinions trained on everything it’s seen. And it does all of this before your user has read a single line of your documentation.

The new “time to hello world”

If you make developer tools, you probably think about “time to hello world.” How fast can a developer go from zero to running code with your product? You optimize your getting-started page, your quickstart guides, your API key flow. You design for a human reading your docs. You make it easy to copy-paste. You make wizards that scaffold projects from choices.

But now there’s another type of cognition operating between your product and the developer. Language models wrapped in different affordances (an IDE, web or terminal apps, different system prompts, MCPs, skills), what’s often called “agent harnesses.” These harnesses sometimes go fetch your docs, sometimes use your MCP tools, sometimes just predict code from training data. The line between “what the LLM knows” and “what the harness finds” is blurry, and it’s getting blurrier.

What’s consistent is this: someone types a naive prompt. “Build me a content-driven app with Sanity.” “Set up authentication with Clerk.” “Deploy this to Netlify.” There might be no docs consulted. No getting-started page visited. Just a prompt and an expectation.

That naive prompt is your new first impression.

The new kind of developers

The person typing that prompt isn’t always who you’d expect. Agentic coding tools have lowered the barrier enough that people who wouldn’t have called themselves developers two years ago are now building applications. Designers prototyping interfaces. Marketers building bespoke dashboards. Founders shipping MVPs. They’re not going to read your getting-started tutorial. They might not even know they should. (We had these users before too. They just used to hit a wall that forced them to learn or hire someone. The agent removed the wall but not the gap in knowledge behind it.)

We’re seeing this at Sanity too. Builders on Reddit report that they’re “not technical” while touting that they took a Next.js and Sanity website to production. That tends to work pretty well, because most models have seen a lot of Next.js plus Sanity code.

But when the task goes beyond what’s well-represented in training data, things get interesting. I’ve worked at Sanity for nearly eight years, so I’m obviously invested in how this plays out for us. An agent asked to build a Sanity application, without specific guidance, grabbed the Sanity client, grabbed Next.js, and tried to build a generic app. It had no idea what our App SDK was. It defaulted to whatever had the most representation in its training data. I started calling this “popularity horror”: the agent follows the most-trodden path, not the correct one.

Where you actually have control

This is where the current conversation about “Agent Experience” gets interesting.

Matt Biilmann coined the term AX in early 2025 and did something important: he gave people a name for a design problem nobody was talking about systematically. In his one-year update, he lays out four pillars: Access, Context, Tools, Orchestration. It’s a useful map of the territory. His point about context engineering, managing what’s in the agent’s context window during its tool-use loop, is especially sharp. And the “deploy first, claim later” pattern that Netlify, Clerk, and Prisma have adopted is a genuine innovation in onboarding design.

What I want to add is a question about sequencing. When you have a team with limited engineering time, where do you start?

All four pillars are worth investing in. But I keep finding that the system layer (your API shape, your error messages, your SDK abstractions) does double duty: it improves the experience for agents AND for the humans who were already using your product. The instruction layer (skills, llms.txt, MCP descriptions) is important work, but it’s agent-specific. If I had to pick where to spend the first sprint, I’d pick the system.

Here’s how I think about where your design control actually sits:

Low control: the user’s prompt. You can’t influence what someone types into Claude Code. Zero-control territory.

Medium control: the instruction layer. Skills files, llms.txt, MCP tool descriptions, documentation. You write these, and they matter. But they get interpreted, compressed, sometimes dropped. Whether the agent finds your docs through its harness or predicts from training data, the instruction layer is always mediated. Cloudflare’s developer docs produce an llms-full.txt that’s 3.7 million tokens long. That’s not fitting in any context window.

High control: your system’s surface. API responses, CLI output, error messages, SDK abstractions. These are what the agent actually touches when it interacts with your system, whether it’s predicting code or calling your tools. And this is where I keep finding the real leverage.

Error messages as instructions

There’s already a standard for this, and it’s been around since 2016, when RFC 7807 first defined it; RFC 9457 (“Problem Details for HTTP APIs”) is its 2023 revision. It defines a JSON format for error responses with structured fields: type (a stable URI identifying the problem), title (a human-readable summary), detail (what went wrong this time), and extension members for anything else. The content type is application/problem+json.

The difference between a useless error and a useful one is the difference between the agent spinning its wheels and self-correcting:

```json
// What the agent gets from a lazy error:
{
  "statusCode": 400,
  "error": "Bad Request",
  "message": "Invalid request"
}
```
```json
// RFC 9457 Problem Details: the agent can work with this
{
  "type": "https://api.sanity.io/problems/document-type-not-found",
  "title": "Document type not found",
  "status": 400,
  "detail": "Document type 'post' not found in dataset 'production'",
  "availableTypes": ["article", "page", "author"],
  "hint": "Did you mean 'article'? See https://sanity.io/docs/schema-types"
}
```

Same HTTP status code. Wildly different outcome. The type URI gives the agent a stable identifier it can pattern-match on across occurrences. The detail field explains this specific failure. The extension members (availableTypes, hint) give it everything it needs to self-correct. This isn’t a new idea. It’s a standard that most APIs still don’t implement.
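To make that concrete, here’s a minimal sketch of a server-side helper that emits Problem Details responses. The helper name, the problem URI, and the extension fields are illustrative, not any particular API’s:

```typescript
// Hedged sketch: a small helper for RFC 9457 responses.
// Only the field names and the media type come from the RFC;
// everything else here is a placeholder.

interface ProblemDetails {
  type: string            // stable URI identifying the problem class
  title: string           // short human-readable summary
  status: number          // HTTP status code
  detail?: string         // what went wrong this time
  [ext: string]: unknown  // extension members (hints, valid options, ...)
}

function problem(
  type: string,
  title: string,
  status: number,
  detail?: string,
  extensions: Record<string, unknown> = {}
): { body: ProblemDetails; contentType: string } {
  return {
    body: { type, title, status, detail, ...extensions },
    contentType: "application/problem+json", // RFC 9457 media type
  }
}

// Example: the "document type not found" failure from above.
const res = problem(
  "https://example.com/problems/document-type-not-found",
  "Document type not found",
  400,
  "Document type 'post' not found in dataset 'production'",
  { availableTypes: ["article", "page", "author"] }
)
```

The point of the extension-member escape hatch is that the “agent-friendly” parts (availableTypes, hints) ride along inside a standard envelope any client can parse.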

I ran into a version of this with our own tooling recently. I was trying to deploy Sanity functions from CI using our Blueprints CLI and hit Missing scope configuration for Blueprint. The scope config (project ID, stack ID) lives in .sanity/, which is gitignored because it also contains build cache. CI clones fresh, so the config is never there.

It turns out there’s a dedicated GitHub Action for Blueprints deployments that handles all of this. But neither I nor my agents found it. The agents went straight to the CLI (the path of least resistance), hit the error, and started trying to work around it. One agent, given enough rope, actually read the CLI source code and discovered that the scope can come from environment variables (SANITY_PROJECT_ID, SANITY_BLUEPRINT_STACK_ID), bypassing the config file entirely. It solved the problem. But it solved it the hard way, by reverse-engineering the internals instead of finding the purpose-built solution.

The interesting part is what happened when I shared this with the Blueprints team. Within minutes we were sketching out improvements: what if the error message mentioned the GitHub Action? What if blueprints init offered a --with-github-action flag? What if --help had a CI/CD section? The building blocks were all there. The surfaces just didn’t connect them. Something like:

```text
Error: No scope configured for this project.

  To set up locally:
    sanity blueprints init

  To deploy from CI (recommended):
    Use the official GitHub Action:
    https://sanity.io/docs/blueprints/blueprint-action

  Or set environment variables:
    SANITY_PROJECT_ID, SANITY_BLUEPRINT_STACK_ID

  Run 'sanity blueprints deploy --help' for all options.
```

That’s the same information the agent eventually found by reading source code, surfaced at the moment it’s needed. The agent hitting this error goes from spinning to self-correcting. The human hitting it goes from frustrated to deployed.
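That error can fall directly out of how the CLI resolves its scope. A hedged sketch of the resolution order, where resolveScope and its inputs are hypothetical names for illustration (only the environment variable names come from the actual CLI):

```typescript
// Hedged sketch: config file first, then environment variables,
// then an error that names every way out.

interface Scope {
  projectId: string
  stackId: string
}

function resolveScope(
  fileConfig: Partial<Scope> | null,       // parsed from .sanity/, if present
  env: Record<string, string | undefined>  // process.env in real usage
): Scope {
  // 1. Local development: the scope file written by `blueprints init`.
  if (fileConfig && fileConfig.projectId && fileConfig.stackId) {
    return { projectId: fileConfig.projectId, stackId: fileConfig.stackId }
  }
  // 2. CI: environment variables, since .sanity/ is gitignored.
  const projectId = env.SANITY_PROJECT_ID
  const stackId = env.SANITY_BLUEPRINT_STACK_ID
  if (projectId && stackId) {
    return { projectId, stackId }
  }
  // 3. Neither found: the error message IS the documentation.
  throw new Error(
    [
      "No scope configured for this project.",
      "  To set up locally: sanity blueprints init",
      "  To deploy from CI: use the official GitHub Action,",
      "    or set SANITY_PROJECT_ID and SANITY_BLUEPRINT_STACK_ID.",
    ].join("\n")
  )
}
```

The fallback chain the agent reverse-engineered from source becomes legible in one screen of code, and the terminal state tells you how to leave it.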

The experiment

To make the SDK point concrete, I ran a simplified experiment. I opened the Anthropic Workbench, no IDE, no skills files, no system prompt, just Claude Sonnet and a text box, and typed:

Build me a custom content approval dashboard for Sanity. Editors should see a list of documents pending review, be able to open and read each document, and approve (publish) or reject them. Use React.

Some context: the App SDK is a React toolkit for building custom content tools on top of Sanity. It provides hooks for listing documents, reading them with real-time updates, and publishing them atomically. If the agent finds the SDK, this is a straightforward build.

The agent did not find the SDK. (It’s relatively new, so the training data is dominated by older patterns.)

10,335 tokens of code. It built everything from scratch on top of @sanity/client. To publish a document, it predicted three separate API calls with no transaction, leaving race conditions between the steps. The production-grade version, publishDocument(handle), is a single atomic operation in the SDK the agent didn’t know existed.
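The fragile pattern is easier to see in code. This is a hedged sketch using an in-memory Map as a stand-in for the dataset, not the actual Sanity client API; each step stands in for a separate network call:

```typescript
// In-memory stand-in for a dataset, purely for illustration.
type Doc = { _id: string; title: string }
const dataset = new Map<string, Doc>()

// The shape of what the agent generated: three dependent steps,
// each a separate API round-trip in reality, with no transaction.
function publishTheHardWay(draftId: string, publishedId: string): void {
  const draft = dataset.get(draftId)                       // 1. fetch the draft
  if (!draft) throw new Error(`draft ${draftId} missing`)
  dataset.set(publishedId, { ...draft, _id: publishedId }) // 2. write the published copy
  dataset.delete(draftId)                                  // 3. delete the draft
  // Another writer landing between steps 2 and 3 can leave a stale
  // draft behind or clobber the freshly published copy.
}

// What the SDK collapses this into: a single call that the server
// executes atomically, e.g. publishDocument(handle). Nothing can
// interleave between the steps, because there are no steps.
```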

Second run, I added “real-time” to the requirements. Worse, not better. 17,840 tokens (72% more code), 150 lines of hand-rolled state management. Still no SDK.

Third run, I added a system prompt listing the SDK hooks and their signatures. Dramatically different. The agent used every hook. Cleaner architecture, fewer tokens (11,832), more correct code.

But to write that system prompt, I had to already know the answer. The instruction layer worked, but it required the human to have the expertise the agent lacked.

The hallway, not the signage

There’s an analogy I keep coming back to. If you’re designing a physical space, you can put up signs telling people where to go. Or you can design the hallways so the natural flow takes people where they need to be. Both matter. But if your hallways are confusing, no amount of signage will fix it.

Skills files and llms.txt are signage. Your API design is the hallway.

That’s what good developer experience has always been: making your tooling less vulnerable to oversight. Less vulnerable to the human forgetting a step, skipping a doc page, not knowing what they don’t know. Agents just add another layer where oversight can happen. The SDK absorbs that layer the same way it absorbs the human one. The error message that tells you what went wrong and how to fix it works for both.

Four questions for your next sprint

If you’re on a team building developer tools and you want to start somewhere concrete, here are four questions I’d run every surface through:

1. What does the agent see when it fails?

Pull up your error messages, your CLI output on bad input, your API responses on invalid requests. Read them as if you have no context. Does the failure tell you what went wrong and how to fix it? Or does it just say “invalid request” and leave you guessing?

The Blueprints CLI example is mine. Yours will be different. Maybe it’s an API that returns 400 Bad Request with no body. Maybe it’s a CLI that says “configuration error” without naming the missing field. Maybe it’s a runtime error that shows up in the browser console as an opaque stack trace instead of telling the developer which prop they forgot or which environment variable isn’t set.

That browser console one is worth lingering on. Agents working in coding environments read terminal output and console logs. If a library throws TypeError: Cannot read properties of undefined when someone forgets to wrap their app in a provider component, the agent will flail. If it throws something like Provider not found. Wrap your app in <AppProvider>. See https://example.com/docs/getting-started, the agent fixes it in one pass. Same error, different surface. Every place your system outputs text is a place it can guide the agent.
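From the library side, that fix can be a one-line guard. A sketch with the provider name and docs URL as placeholders: it replaces the undefined-property crash with the instruction.

```typescript
// Hedged sketch: convert "Cannot read properties of undefined" into
// an error that names the fix. The provider name and URL are placeholders.
function requireProvider<T>(
  value: T | undefined,
  providerName: string,
  docsUrl: string
): T {
  if (value === undefined) {
    // This string is what the agent (or human) reads in the console.
    throw new Error(
      `${providerName} not found. Wrap your app in <${providerName}>. ` +
        `See ${docsUrl}`
    )
  }
  return value
}

// In a React hook this would wrap the context lookup, e.g.:
//   const app = requireProvider(useContext(AppContext), "AppProvider",
//     "https://example.com/docs/getting-started")
```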

Fix the worst dead-ends first. (You probably already know which ones they are. Check your support tickets.)

2. Where can you direct agents regardless of user input?

This is the one most teams miss. There are moments in every developer workflow where your system gets to speak, unprompted, into the agent’s context. Not because the user asked for help, but because your tooling is running and producing output.

Think about what happens when someone (or an agent) installs your package. The terminal output from npm install is right there in the agent’s context. A postinstall message that says “Run npx your-tool init to get started” is free guidance. The agent will often just do it.
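A minimal sketch of that postinstall message, assuming a hypothetical your-tool package; the script would be wired up via the package’s scripts.postinstall field in package.json:

```typescript
// Hedged sketch: a postinstall script whose entire job is to put
// one actionable line into the install log. Names and URL are placeholders.
function postinstallMessage(): string {
  return [
    "your-tool installed.",
    "  Next step: npx your-tool init",
    "  Docs: https://example.com/docs/getting-started",
  ].join("\n")
}

// Keep it short; consider skipping it entirely when a CI env var is set,
// since nobody (human or agent) is onboarding there.
console.log(postinstallMessage())
```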

Or think about what your dev server prints on startup. Most frameworks print a URL. What if yours also printed the key configuration it detected, or flagged a missing environment variable before the first request fails? sanity dev could print “Studio running on http://localhost:3333 | dataset: production, project: abc123.” Now the agent knows the project ID without having to parse config files.

CLI help text is another one. When an agent runs your-tool --help (and many harnesses do this as a first step), what comes back? A wall of flags, or a structured list with examples? The --help output is documentation that lives inside the tool itself. It can’t be summarized away or dropped from context. It’s always there.

These aren’t instruction-layer investments. You’re not writing docs or skills files. You’re making your system’s normal output more informative. It’s the difference between a hallway with good lighting and one where you need a flashlight.

3. What’s the path of least resistance?

If someone gives an agent a naive prompt about your product, what does it build? Try it. Open a workbench, type the kind of thing a new user would type, and look at what comes out. Is the agent reaching for the right packages? The right patterns? Or is it building everything from scratch because your newer, better abstractions aren’t well-represented in training data yet?

You can’t control what the agent predicts. But you can control how many paths through your system lead somewhere good. Fewer paths, better defaults, opinionated SDKs. If your product has three ways to do the same thing (the old way, the new way, and the raw API), the agent will pick whichever one has the most training data. That’s usually the old way. Deprecation warnings that suggest the replacement, clear migration guides, and SDKs that make the new way shorter than the old way all shift what “least resistance” means.
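One way to shift what “least resistance” means, sketched with hypothetical function names: a deprecation wrapper that puts the replacement, and a migration link, into the warning text the agent will actually read.

```typescript
// Hedged sketch: deprecation warnings that name the fix.
// oldCreateClient/createClient and the URL are illustrative.
const warned = new Set<string>()

function deprecated<A extends unknown[], R>(
  name: string,
  replacement: string,
  fn: (...args: A) => R
): (...args: A) => R {
  return (...args: A): R => {
    // Warn once per process so logs stay readable.
    if (!warned.has(name)) {
      warned.add(name)
      console.warn(
        `${name} is deprecated. Use ${replacement} instead: ` +
          `https://example.com/docs/migration`
      )
    }
    return fn(...args)
  }
}

// Usage: export the old entry point wrapped, so it keeps working
// while every call site's output suggests the new path.
const oldCreateClient = deprecated(
  "oldCreateClient",
  "createClient",
  (projectId: string) => ({ projectId })
)
```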

4. Who’s evaluating the output?

This is the one that changes the stakes. An experienced developer will catch a fragile publish pattern and go check the docs. A designer shipping their first app with Cursor won’t. They’ll trust whatever the agent produces.

The instruction layer (skills, docs, llms.txt) assumes someone who can evaluate what the agent produces. The system layer (SDKs, error messages, API shape) protects everyone, including the people who can’t. If your user base is shifting toward less experienced builders (and for most developer tools, it is), the system layer matters more than it used to. The question isn’t just “does this work?” It’s “does this work safely for someone who can’t tell the difference?”

What I'd do if I ran a DX team rn

Take one afternoon. Pull up your product’s error messages, your CLI help text, your API responses on bad input. Run a naive prompt through an agent and watch what happens. Then ask the four questions:

  1. What does the agent see when it fails? Find the dead-end errors and rewrite them. This is the highest-leverage change you can make, and it helps your human users too.
  2. Where can you direct agents regardless of user input? Your postinstall messages, your dev server output, your --help text. These are free guidance that can’t be dropped from context.
  3. What’s the path of least resistance? If the agent follows the most obvious path through your system, does it end up somewhere good? If not, that’s a system design problem, not a documentation problem.
  4. Who’s evaluating the output? If your user base includes people who can’t tell good output from bad (and increasingly, it does), the system layer is their only safety net.

The instruction layer matters. I’m not arguing against skills files, llms.txt, or MCP servers. But if you’re deciding where to spend your next sprint, start with the system. It’s the work that helps both your human users and the agents they’re working through. And it’s the work that compounds: every error message you improve, every CLI output you make more informative, every SDK that encodes the production-grade pattern stays improved regardless of which agent harness comes along next month.

Skills files and llms.txt are signage. Your API design is the hallway. The hallways come first.
