March 22, 2026
I wanted to see if I could get LLMs to write non-slop in my voice that I'd enjoy reading, and maybe even put my name on (only if it was good enough).
I built this with Codex in a Ralph Wiggum loop:
The output after about 5 hours and some manual input from me: EVAL.md.
The best part of this process was when I came back after a while and the rubric had gotten nerfed. It removed this rule that I had added manually:
Instant fail — Em-dashes or touching dashes (-14)
Em-dashes (—) or n-dashes without spaces on both sides in the author's own prose. Steve ONLY uses spaced n-dashes:
word – aside – word. Any other dash style is not Steve. Dashes in direct quotations or poem attributions are excluded.
If you think about it, this rule kinda represents a watermark for my writing. Up until recently, I didn't realize that I was typing n-dashes instead of m-dashes! I don't even know how to type an m-dash and now I don't want to know. If you ever see writing with an m-dash or with the dashes touching the words on either side of it – unlike this – then you know it definitively wasn't me.
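The dash rule is mechanical enough to check without an LLM. Here's a minimal sketch in Python. To be clear, this is not the check from EVAL.md: `dash_violations` is a made-up name for illustration, and the sketch ignores the exclusion for direct quotations and poem attributions.

```python
import re

EM_DASH = "\u2014"  # the long dash the rule bans outright
EN_DASH = "\u2013"  # the shorter dash, allowed only with a space on both sides

# An en-dash is "touching" if it lacks whitespace on its left or on its right.
# An en-dash at the very start or end of the text also counts as touching.
TOUCHING_EN_DASH = re.compile(rf"(?<!\s){EN_DASH}|{EN_DASH}(?!\s)")


def dash_violations(text: str) -> list[str]:
    """Return one description per dash that breaks the rule (hypothetical helper)."""
    violations = []
    for match in re.finditer(EM_DASH, text):
        violations.append(f"em-dash at index {match.start()}")
    for match in TOUCHING_EN_DASH.finditer(text):
        violations.append(f"touching en-dash at index {match.start()}")
    return violations


# Spaced en-dashes pass; em-dashes do not.
print(dash_violations("word \u2013 aside \u2013 word"))
print(dash_violations("This tool\u2014and others like it\u2014are stops"))
```

The first call returns an empty list and the second flags both em-dashes. A strict reading of the rule also flags en-dashes in number ranges, since those touch the digits on either side.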
But after a couple hours of my Ralph Wiggum loop, this rule had disappeared! I asked the AI why and it linked me to this passage that I had supposedly written:
This tool—and others like it—are stops on our way toward end-programmer programming.
I laughed out loud. That post was ghost-written for me by our new team member Pete Millspaugh! The Steve Eval was correctly identifying text not written by me!
Even before this result, we had stopped the ghost-writing practice. I'm not a fan. I much prefer when the byline is accurate, and posts are written by the people who say they wrote them. We had only done it that time out of a mistaken feeling that people would prefer to hear from me, the founder. But that only works if we're doing it authentically, so it didn't make sense. I just updated the author on that post to Pete, because after all, he wrote it!
To write a new post, I would vomit out all my thoughts about a new blog by voice and then give AI that transcript. Then I'd tell it to do the following in another Ralph Wiggum loop:
I wrote two blogs with this approach, in part:
The AI-produced results weren't great, so I spent a lot of time editing and in many cases rewriting whole sections by hand. I think those posts are pretty decent in spite of the help I got from AI.
This was a fun exercise, but mostly not very helpful for me yet. I'm excited for when AI is good enough to be able to do this well.
But it is fun to learn that I have a fairly unique watermark – long live my en-dash clauses with spaces around them! – and to get a bit clearer about what I think good writing in my voice looks like.
The folks at Every are thinking in this direction:
A list of AI writing tropes: