
how i use ai

a long-form walk-through of my actual workflow with claude code and openai codex, how i think about them, and why it matters for the kind of product work i want to do next

intro

i don't think of claude code and openai codex as tools. that framing undersells what they actually are in my workflow. they're closer to research partners, ones that never forget context, never get tired of re-reading a 55-file wiki, and never complain when i ask them to re-derive a financial model for the fourth time with different assumptions.

the difference between "using ai" and "working with ai" is the difference between googling a question and having a colleague who has read everything you've ever written. the second only works if you give them something worth reading. that is the wiki.

the setup

three things make the system work together.

the wiki. 55 markdown files following andrej karpathy's pattern for maintaining a personal knowledge base for LLM sessions. one canonical file per topic. changelogs. cross-references. a CLAUDE.md schema that tells any agent how to read the wiki: what operations are allowed, what the folder structure means, what the privacy model is.
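to make that concrete, here is a rough sketch of what a schema file like that could contain. this is not my actual CLAUDE.md; the section names and rules below are illustrative of the pattern, nothing more:

```markdown
# wiki schema (read this before touching any file)

## structure
- one canonical file per topic, kebab-case filenames
- each file ends with a `## changelog` section, newest entry first
- cross-references use relative links to other wiki pages

## allowed operations
- append findings to an existing page, with a changelog entry
- create a new page only if no canonical page covers the topic
- never silently delete content; strike it through and note why

## privacy
- anything under `private/` never leaves the wiki
- redact names of people and companies before anything is published
```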

claude code. my primary working environment. it has full context on the wiki, the site source, and the design guide. i use it for research synthesis, content generation, code generation, and the weekly self-update agent that maintains this site.

openai codex. second opinion and cross-model verification. when i need to stress-test a finding or get an alternative perspective on a structural decision, i run the same question through codex with the same wiki context. disagreements between models are often the most informative signal.
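the mechanics of "same question, same wiki context" are simple enough to sketch. the helper names and the naive line-level comparison here are hypothetical, and the actual model calls are deliberately elided; the real comparison step is a human read:

```python
def build_prompt(wiki_pages: dict[str, str], question: str) -> str:
    """assemble the identical prompt both models receive:
    wiki context first, then the question."""
    context = "\n\n".join(
        f"## {name}\n{body}" for name, body in sorted(wiki_pages.items())
    )
    return f"{context}\n\n---\n\nquestion: {question}"

def disagreements(answer_a: str, answer_b: str) -> list[str]:
    """crude signal: non-empty lines one model emits that the
    other never mentions. a starting point for the human read,
    not a verdict."""
    lines_a = {l.strip().lower() for l in answer_a.splitlines() if l.strip()}
    lines_b = {l.strip().lower() for l in answer_b.splitlines() if l.strip()}
    return sorted(lines_a ^ lines_b)

# usage: feed build_prompt(...) to claude code and codex separately,
# then read disagreements(claude_answer, codex_answer) first.
```

the point of the symmetric difference is the workflow claim above: where the models diverge is usually where the interesting uncertainty lives.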

a typical day

morning starts with a research question, usually something that came up the day before that i need to think through properly. i open claude code, it loads the wiki context, and we start working.

the conversation is not "write me a summary of X." it is "here is what i think about X based on my prior research. here are the three assumptions i am least confident about. help me stress-test them." the wiki means claude already knows what i researched last week, what i concluded, and what i flagged as uncertain.

after a research session, i update the wiki. new findings get their own page or get appended to an existing page with a changelog entry. the wiki grows, which means the next session starts with more context. it compounds.
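the "append with a changelog entry" step is mechanical enough to sketch. this is an illustrative helper, not code i actually run, and the page format (a `## changelog` section at the bottom of each file) is an assumption:

```python
from datetime import date

def append_finding(page: str, finding: str, note: str) -> str:
    """append a finding to a wiki page and record it in the page's
    `## changelog` section (assumed to be the last section)."""
    entry = f"- {date.today().isoformat()}: {note}"
    if "## changelog" in page:
        body, log = page.rsplit("## changelog", 1)
        body = body.rstrip() + f"\n\n{finding}\n"
        return f"{body}\n## changelog{log.rstrip()}\n{entry}\n"
    # page has no changelog yet: create one
    return page.rstrip() + f"\n\n{finding}\n\n## changelog\n{entry}\n"
```

the changelog is what makes the compounding work: the next session can see not just what a page says but when and why it last changed.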

if i am building something (like this site, or cloud brain), the workflow shifts to generation. i describe what i want in terms of the design guide and the existing patterns. claude code generates. i review, tighten, and iterate. the generation is fast. the review is where the value is.

what ai does, what i still do

ai handles synthesis. reading 20 sources, finding patterns, generating structured outputs (tables, flow diagrams, TLDR summaries), re-deriving analysis with new assumptions, cross-referencing claims against my wiki. this is where ai is genuinely 10x faster than doing it manually.

i handle judgment. what to research. what to kill. what the initial theory of an idea actually is. whether a finding is surprising enough to change direction. whether the tone of a piece of copy is right. whether a self-assessment is honest or self-serving. the wiki gives ai context, but context is not judgment.

the split is roughly: ai does 80% of the volume (reading, synthesizing, generating, formatting). i do 80% of the decisions (what to work on, what to cut, what to publish, what to keep private).

the most important skill in ai-native work is not prompting. it is knowing what questions to ask and when to stop asking.

examples

this site. generated from the wiki using claude code and openai codex. the design system, case study structure, and all content were extracted from structured research notes. the weekly self-update agent reads the wiki every sunday, diffs against the site, runs a privacy filter, and proposes changes i review in 10 minutes.
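the sunday agent's loop reduces to a few steps: read the wiki, filter out anything private, diff against what the site currently publishes, and propose the rest. this is a hypothetical sketch of that shape, not the agent itself; the privacy markers and the dict-of-pages representation are assumptions:

```python
PRIVATE_MARKERS = ("PRIVATE", "DO-NOT-PUBLISH")  # assumed convention

def privacy_filter(text: str) -> str:
    """drop any line carrying a private marker before it can reach the site."""
    kept = [l for l in text.splitlines()
            if not any(m in l for m in PRIVATE_MARKERS)]
    return "\n".join(kept)

def propose_changes(wiki: dict[str, str], site: dict[str, str]) -> dict[str, str]:
    """diff the filtered wiki against the published site.
    returns {page: new_content} only for pages that actually changed."""
    proposals = {}
    for page, content in wiki.items():
        public = privacy_filter(content)
        if site.get(page) != public:
            proposals[page] = public
    return proposals

# the step that stays human: review the proposals, then publish
# the ones that pass.
```

the design choice worth noting is the order: the privacy filter runs before the diff, so private lines can never show up as a proposed change in the first place.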

cloud brain. the entire mcp server, wiki parser, retrieval engine, and safety layer were built with claude code. i wrote very little code by hand. the value i added was the product thinking: what the schema should be, what the privacy model should look like, when to pause building and think about distribution.

the eval engine. every case study on this site has an honest self-assessment table and kill criteria. those evaluations were structured with ai assistance but the judgments are mine. the maritime case study was deprioritized because primary research contradicted desk assumptions, and ai helped me see the gap faster by re-running the financial model with corrected inputs.