Walkthrough: Building "Streak" from Scratch

Product Kit is a free, open-source toolkit that brings structured product thinking to Claude. It walks you through 9 steps — from understanding your market to writing a validated product spec — using guided conversations that produce markdown artifacts.

In this walkthrough, we'll build Streak — a habit tracker for people who keep starting habits and quitting. You'll see every conversation, every artifact, and every decision along the way.

Prerequisites

  • Claude with the Cowork plugin, or Claude Code with the CLI
  • Product Kit set up (plugin uploaded, or productkit workspace / productkit init in the CLI — covered in the next two sections)
  • A product idea you want to validate (or follow along with Streak)

Command format

Slash commands look slightly different depending on how you use Product Kit. In the Cowork plugin they take the form /product-kit:step (for example /product-kit:landscape); in the CLI they take the form /productkit.step (for example /productkit.landscape). This walkthrough uses the plugin form; substitute the CLI form if that's your setup.

Estimated time

  • Steps 0–4 (Landscape through Assumptions): ~2 hours
  • Step 5 (Validate): 1–2 weeks of fieldwork (interviews, surveys, data collection)
  • Steps 6–8 (Solution through Spec): ~1 hour

Follow the order. Each step reads the artifacts from previous steps. Skipping ahead means Claude won't have the context it needs to ask the right questions.

Building solo? This walkthrough uses team mode examples (user stories, engineering handoffs). In solo mode, Product Kit adapts — you get direct task lists instead of "As a user..." stories, personal build plans instead of sprint estimates, and techreview resolves questions conversationally instead of deferring to engineers. You'll choose your mode during project initialization.

Already have a product? This walkthrough follows a new product from scratch, but Product Kit works for existing products too. Run /product-kit:bootstrap to generate artifacts from your codebase, then re-run individual steps (like Problem, Assumptions, and Validate) scoped to your next feature. You don't need to redo Landscape or Constitution every time.

Create a Workspace

productkit workspace streak-org

A workspace is a parent directory that holds one or more Product Kit projects. It gives you a shared landscape, a shared knowledge base, and a single place to run /productkit.landscape that feeds every project inside it.

Why use a workspace?

  • Your company has multiple products that share the same landscape (team, stack, competitors)
  • You want a single knowledge/ directory that informs all projects
  • You want landscape.md written once at the org level, not repeated per project

Run the command

$ productkit workspace streak-org

Workspace created successfully!

Workspace: streak-org

Next steps:
  1. cd streak-org
  2. productkit init my-app
  3. claude
  4. /productkit.landscape  (writes to workspace root)

What it creates

streak-org/
├── .productkit/
│   └── config.json          # { "type": "workspace", ... }
├── .claude/
│   └── commands/
│       └── productkit.landscape.md
└── knowledge/
    └── README.md

The workspace gets the landscape slash command and a knowledge/ directory. Individual projects created inside it will automatically detect the workspace and link to it.
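
That auto-detection can be pictured as a walk up the directory tree. Here is a hypothetical sketch (not Product Kit's actual implementation) that searches parent directories for a .productkit/config.json marked as a workspace:

```typescript
import * as fs from "fs";
import * as path from "path";

// Hypothetical sketch of workspace detection: starting from a project
// directory, walk toward the filesystem root looking for a parent whose
// .productkit/config.json declares { "type": "workspace" }.
function findWorkspace(start: string): string | null {
  let dir = path.resolve(start);
  while (true) {
    const configPath = path.join(dir, ".productkit", "config.json");
    if (fs.existsSync(configPath)) {
      const config = JSON.parse(fs.readFileSync(configPath, "utf8"));
      if (config.type === "workspace") return dir;
    }
    const parent = path.dirname(dir);
    if (parent === dir) return null; // reached the root: no workspace above us
    dir = parent;
  }
}
```

A project initialized anywhere under streak-org would resolve streak-org as its workspace; a project outside any workspace would resolve to none.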

Building a single product? You can skip this step entirely and run productkit init streak directly. Workspaces are for orgs managing multiple products under one roof.

Initialize Your Project

productkit init streak

This scaffolds a new Product Kit project with all the slash commands, a CLAUDE.md for agent context, and a knowledge/ directory. If you're inside a workspace, the project automatically links to it.

Run the command

$ cd streak-org
$ productkit init streak

How are you building this product?
  1. Solo — I'm building it myself
  2. Team — I'm working with engineers/designers

Choose (1 or 2): 2

Project initialized successfully!

Next steps:
  1. cd streak
  2. claude
  3. /productkit.constitution

What it creates

streak/
├── .productkit/
│   └── config.json          # { "type": "project", "workspace": "..", "mode": "team", ... }
├── .claude/
│   └── commands/
│       ├── productkit.constitution.md
│       ├── productkit.users.md
│       ├── productkit.problem.md
│       └── ... (all slash commands except landscape)
├── knowledge/
│   └── README.md
├── CLAUDE.md
├── README.md
└── .gitignore

Notice that landscape is not included in the project — it lives at the workspace level. When you run /productkit.landscape from Claude Code, it writes landscape.md to the workspace root, and all projects inside the workspace read from it.

Solo vs. Team mode

The mode you choose during init affects how downstream commands behave:

  • Solo: Task lists instead of user stories, personal build plans instead of sprint estimates, techreview resolves questions conversationally
  • Team: User stories with acceptance criteria, engineering handoffs, sprint-ready story points

You can also skip the prompt by passing --mode solo or --mode team.

Existing project? Run productkit init --existing from inside your project directory to add Product Kit without creating a new folder.

Without a workspace: If you skipped the workspace step, run productkit init streak from any directory. The project gets all slash commands except landscape, which lives at the workspace level. To follow this walkthrough's Step 0, create a workspace first.

Set Up Your Knowledge Base

/product-kit:learn

Before you begin the 9-step workflow, set up your knowledge/ directory. This is where you store research files — competitor analyses, user interview transcripts, market reports, articles, anything that informs your product thinking.

How it works

  • Drop any research files into knowledge/ (PDFs, markdown, text files)
  • Run /product-kit:learn to index them into knowledge-index.md
  • All other commands automatically read this index and reference your evidence

For Streak, you might add articles about habit formation psychology, competitor app reviews, or behavioral science research. When you later run /product-kit:problem, Claude will reference this research when framing the problem.

Tip: You can add files and re-run /product-kit:learn at any point in the workflow. The index updates automatically.
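
To make the workflow concrete, here is a purely hypothetical excerpt of what knowledge-index.md might contain after indexing two research files — the actual format Product Kit generates may differ:

```markdown
# Knowledge Index

## atomic-habits-notes.md
- Topic: Habit formation psychology
- Key claims: habit stacking; environment design beats willpower
- Relevant to: problem framing, solution design

## habitica-app-store-reviews.pdf
- Topic: Competitor reviews (Habitica)
- Key claims: frequent "too complex" complaints in 1-star reviews
- Relevant to: landscape, users
```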

Step 0: Landscape

/product-kit:landscape

"Front-load the context so every future step is smarter."

Why this step matters

Before you can build something useful, Claude needs to understand your world. Without landscape context, later commands will suggest features that don't fit your team size, market position, or tech stack.

This step isn't about your product idea yet — it's about the environment your product will live in. A 2-person bootstrapped startup and a 50-person funded company will get very different suggestions in every subsequent step.

Think of it as giving Claude the same context a new hire would need on their first day. Who are we? What do we do? Who else is doing it?

What to prepare

  • Your company basics (stage, size, funding)
  • Team composition and skills
  • Tech stack and platform priorities
  • 3-5 competitors you've researched
  • Market gaps you've noticed

The conversation

Claude

Let's start with the basics. What does your company do, and what stage are you at?

You

We're a two-person startup building personal productivity tools. Pre-revenue, bootstrapped.

Claude

Got it. Two people — what does each person do? And what's your tech stack?

You

I'm the PM and my co-founder writes code. We're targeting mobile-first, building in React Native with Expo.

Claude

Who are your competitors in the habit tracking space? I need specifics — not just "there are some apps," but which ones and what they get right or wrong.

You

Habitica (gamified, complex — lots of features but overwhelming), Streaks (Apple-only, simple but limited), and Notion templates (flexible but zero guidance). The gap we see is between "too simple" and "too complex."

Full artifact

landscape.md

# Landscape

## Company
- Name: (unnamed startup)
- Stage: Pre-revenue / Bootstrapped
- Team Size: 2 (PM + Engineer)
- Tech Stack: React Native, Expo
- Revenue: None (pre-launch)

## Domain
- Market: Personal productivity — habit tracking
- Competitors:
  - Habitica: Gamified habit tracker. Rich feature set but steep learning curve. Appeals to RPG fans.
  - Streaks: Apple-only, minimal UI. Simple but limited — no flexibility in streak definitions.
  - Notion templates: Infinitely flexible but no built-in guidance or motivation mechanics.
- Competitive Gap: No product occupies the middle ground between oversimplified trackers and complex gamified systems.

## Constraints
- Budget: Zero (bootstrapped)
- Timeline: Aiming for MVP in 8-12 weeks
- Platform: Mobile-first (iOS + Android via React Native)

Common mistakes

Being too vague about competitors. "There are some habit apps out there" gives Claude nothing to work with. Name them, describe what they do well and poorly.
Forgetting to mention constraints. Budget, team size, and technical limitations shape everything downstream. If you don't mention them now, Claude will suggest ideas you can't execute.
Describing what you want to build instead of the market you're entering. This step is about context, not solutions. Save your product ideas for later.

Tips & best practices

Research 3-5 competitors before running this. Spend 30 minutes trying competing products. First-hand experience makes your answers much richer.
Be honest about constraints. Claude calibrates all future suggestions to what you tell it here. Understating your constraints leads to over-scoped recommendations.
Include non-obvious competitors. Notion templates, spreadsheets, pen-and-paper — anything your users currently do instead of using a purpose-built tool.

When to re-run

  • Team size changes (hiring, co-founder leaves)
  • A major new competitor enters the market
  • You pivot to a different market or platform

Commit before re-running. This applies to every step: commit your current artifacts to git before re-running any command. Re-running overwrites the artifact, and git history lets you see how your thinking evolved.

Step 1: Constitution

/product-kit:constitution

"Make the hard tradeoffs before you write a line of code."

Why this step matters

Principles prevent scope creep. When your engineer asks "should we add social features?" the constitution gives you the answer without a debate.

Good principles are opinionated. "We value quality" isn't a principle — everyone values quality. "Simplicity over power" is a principle because it tells you what to sacrifice.

Without explicit principles, every feature request becomes a subjective argument. With them, you have a decision filter that works even when you're not in the room.

What to prepare

  • What your product stands for
  • What you will never do
  • Past decisions you regret or are proud of
  • How you'd resolve a tradeoff between speed and quality

The conversation

Claude

I see you're a 2-person team building mobile-first. Let's define what Streak stands for. What's the one word that describes the experience you want users to have?

You

Simplicity.

Claude

"Simplicity" is vague — Streaks is simple and you said it's too limited. What do you mean specifically? Give me a concrete example.

You

If a feature needs a tooltip or explanation, it's too complex. We'd rather have 3 features that feel effortless than 10 that need a tutorial.

Claude

That's much clearer. Now, when simplicity conflicts with what power users want, what do you sacrifice?

You

Power users. We're not building for people who want 50 configurable settings. If someone needs that, Habitica exists. We're building for people who gave up on habit apps because they were too much work.

Full artifact

constitution.md

# Constitution

## Vision
A habit tracker so simple that keeping a habit feels easier than breaking one.

## Core Principles
1. Effortless over powerful — No feature should require an explanation. If it needs a tooltip, cut it.
2. Momentum over metrics — Show streaks and progress, not spreadsheets and charts. Feeling good matters more than data.
3. Forgiveness over perfection — Missing one day shouldn't feel like failure. The system should encourage recovery, not punishment.

## Non-Negotiables
- No dark patterns (guilt-based notifications, shame screens)
- No data selling — user data stays on device
- Free core experience — monetize through premium, not paywalls on basics

## Decision Framework
When in doubt: Does this feature make the app feel lighter or heavier? If heavier, cut it.

## Anti-Patterns
- Never add a feature just because a competitor has it
- Never show a user a wall of settings on first launch
- Never punish a missed day with negative messaging

Common mistakes

Listing principles that don't create tradeoffs. "We value quality" — who doesn't? A useful principle tells you what you're willing to sacrifice.
Having too many principles. If you have 5+, none of them are memorable. Aim for 3.
Not connecting principles to real decisions. Each principle should clearly tell you what NOT to build.

Tips & best practices

A good principle tells you what NOT to build. "Effortless over powerful" means no advanced configuration screens, no complex onboarding flows, no dense dashboards.
Revisit after step 5 (Validate). Real evidence may change your principles. That's a feature, not a bug.
Share the constitution with your whole team. It's only useful if everyone knows the rules.

When to re-run

  • After validation reveals your assumptions were wrong
  • When team values shift significantly
  • Before a major pivot

Step 2: Users

/product-kit:users

"'Everyone' is not a user."

Why this step matters

Building for "everyone" means building for no one. A specific persona lets you make decisions. Would "The Restarter" want this feature? You can answer that. Would "people" want it? You can't.

Personas turn subjective debates into testable claims. Instead of arguing about whether to add a feature, you ask: "Does our primary persona need this?" That question has a researchable answer.

What to prepare

  • Who you've talked to or observed
  • Demographic details and behavioral patterns
  • What they currently do to solve this problem

The conversation

Claude

I've read your landscape and constitution. Now — who has the problem you're solving? Who specifically keeps failing at building habits?

You

People who want to build habits.

Claude

That's 3 billion people. I need you to be much more specific. Who specifically keeps downloading habit apps and quitting? What's their life like? Walk me through their Tuesday.

You

Mid-20s professionals, 1-3 years into their career. They have unpredictable schedules — some days they're in back-to-back meetings, others they have free time. They've tried habit apps before, used each one for about two weeks, then forgot about them.

Claude

Good. What does their phone look like? How many habit apps are in their app graveyard?

You

They've downloaded 2-3 habit apps. Habitica was too complex, Streaks was too simple, and they tried a Notion template that they forgot about after a week. They feel guilty about it — "I can't even stick to a simple habit tracker."

Full artifact

users.md

# Users

## Primary Persona: The Restarter
- Who: 24-30 year old professionals, 1-3 years into career
- Context: Unpredictable schedule, high ambition, low follow-through on personal goals
- Day-in-the-life: Wakes up with good intentions, gets pulled into work, remembers their habit at 10 PM, feels too tired, skips it, feels guilty
- Pain Points: Guilt from broken streaks, app fatigue, feature overload
- Current Solutions: Downloaded 2-3 habit apps, used each <2 weeks. Currently using nothing.
- Quote: "I can't even stick to a habit tracker, let alone the habit."

## Secondary Persona: The Optimizer
- Who: 28-35 year old self-improvement enthusiasts
- Context: Already tracks multiple habits, finds current apps too simple
- Pain Points: Wants flexible scheduling, finds binary daily tracking too rigid

## Key Insight
Both personas share a frustration with rigid daily tracking. The Restarter is punished by it, the Optimizer is limited by it. A flexible model serves both.

Common mistakes

Describing demographics without behaviors. "25-35 year olds" tells you nothing useful. What do they do? What have they tried? What failed?
Making the persona aspirational. "They love self-improvement" is what you wish they were. "They've failed at 3 habit apps" is what they actually are.
Not including current workarounds. What users do today tells you what you need to beat. It's the most actionable part of the persona.

Tips & best practices

Base personas on real people. Even 2-3 conversations with real users makes your persona dramatically better than one based on imagination.
Include a direct quote. A real quote grounds the persona and makes it memorable for your team.
The "current solutions" section is gold. It tells you what to beat and reveals unmet needs.
B2B / Enterprise adaptation: In B2B, your "user" is often multiple people — the buyer, the admin, and the end user have different goals. Create separate personas for each role. The buyer cares about ROI and compliance; the end user cares about daily workflow. Don't collapse them into one persona.

When to re-run

  • After validation interviews reveal a different primary user
  • When you discover a new user segment you hadn't considered

Step 3: Problem

/product-kit:problem

"'We need an app' is a solution. What's the problem?"

Why this step matters

Most failed products solve the wrong problem. They build what the founder wants, not what users need. Problem framing is the highest-leverage activity in product development — get it wrong and nothing downstream matters.

The key insight: separate the problem from the solution. "We need a habit tracker" is a solution. The problem is something deeper — something that could be solved in multiple ways.

What to prepare

  • User pain points from step 2
  • Specific moments where the problem occurs
  • Root causes vs. symptoms

The conversation

Claude

I've read your users.md. The Restarter feels guilty about broken streaks. What specific pain is driving that guilt?

You

They can't stick to habits. They start strong but always fall off after a week or two.

Claude

That's the symptom, not the problem. Lots of people can't stick to habits for lots of reasons — low motivation, bad goals, unrealistic expectations. What specifically goes wrong for your user?

You

They miss one day because of their unpredictable schedule, the app resets their streak to zero, and they feel like they've failed completely. So they give up.

Claude

Now we're getting somewhere. Is the problem that they break streaks, or that breaking a streak feels like total failure?

You

The second one. They will miss days — that's inevitable. The problem is that existing apps treat a missed day as a reset, so they give up entirely instead of recovering.

Full artifact

problem.md

# Problem

## Problem Statement
Existing habit trackers punish imperfection. When users with unpredictable schedules inevitably miss a day, the all-or-nothing streak model makes them feel like they've failed completely, leading to abandonment rather than recovery.

## Impact
- Frequency: Happens within 2-3 weeks of starting any new habit app
- Severity: Users don't just stop using the app — they stop trying to build the habit entirely
- Scope: Affects anyone with an unpredictable schedule (most working professionals)

## Root Cause
The streak model is binary: you either did it today or you didn't. This works for people with perfectly consistent schedules but fails everyone else. One missed day erases weeks of progress, making the psychological cost of failure disproportionate to the actual setback.

## Current Alternatives & Why They Fail
- Habitica: Still uses daily tracking. Gamification distracts from the core problem.
- Streaks: Rigid daily model with no flexibility in definitions.
- Manual tracking: No motivation or accountability mechanics.

## Scope
- In: How habit apps handle imperfect adherence
- Out: Motivation, goal-setting, social accountability (for now)

## Success Criteria
Users continue tracking habits after missing a day, instead of abandoning the app.

Notice the shift: We went from "build a habit tracker" to a specific, testable problem — "streak models punish imperfection, causing abandonment." That's a much better foundation.

Common mistakes

Describing a solution disguised as a problem. "The problem is there's no app that does X" is a solution, not a problem. The problem should describe pain, not a missing product.
Being too broad. "People struggle with habits" is a PhD thesis, not a product problem. Narrow it to a specific mechanism.
Not separating root cause from symptoms. "Users quit" is a symptom. "The streak model punishes imperfection" is the root cause.

Tips & best practices

Ask "why?" five times. Keep digging until you find the root cause. "They quit" → "Why?" → "Streak reset" → "Why is that bad?" → "It erases progress" → "Why does that matter?" → "It feels like total failure."
A good problem statement doesn't mention your product. If you can't describe the problem without describing the solution, you haven't found the real problem yet.
B2B / Enterprise adaptation: Enterprise problems often have organizational friction layered on top. "The tool is slow" might really be "the procurement process took 6 months, so by the time we got it, our needs changed." Dig into process and organizational pain, not just product pain.

When to re-run

  • After validation proves the problem is different than expected
  • When user research reveals a deeper root cause

Step 4: Assumptions

/product-kit:assumptions

"What are you betting on without knowing it?"

Why this step matters

Every product decision is built on assumptions. Most teams don't make these explicit, so they never test them. The assumptions step surfaces your hidden bets and ranks them by risk.

This is what separates validated products from expensive guesses. If your critical assumptions are wrong, you'll find out before you build, not after.

What to prepare

  • Nothing — Claude reads your artifacts and surfaces assumptions automatically
  • Think about what you're least sure about

The conversation

Claude

I've analyzed your landscape, constitution, users, and problem artifacts. I found 7 assumptions embedded in your work. Let me walk you through the riskiest ones. First: you're assuming users abandon because of the streak model, not because of low motivation. If it's actually low motivation, your entire solution is misdirected.

You

That's our biggest bet, yeah. We're not 100% sure that's the cause.

Claude

That's a critical assumption then — low confidence, high impact. Here's another: you're assuming "Restarters" exist as a distinct, reachable segment. If they're not reachable through online communities, your go-to-market strategy fails. How confident are you about that?

You

Honestly, not very. We think they're on Reddit and Twitter but we haven't tested it.

Full artifact

assumptions.md

# Assumptions

## Critical (Low Confidence, High Impact)
1. Users abandon because of the streak model, not low motivation
  - Source: problem.md
  - Confidence: Low
  - Risk: If it's low motivation, a forgiving streak model won't help
  - Test: Interview 5-8 lapsed habit app users about why they quit

2. "Restarters" exist as a distinct segment and are reachable
  - Source: users.md
  - Confidence: Low
  - Risk: If not a real segment, positioning and marketing fail
  - Test: Survey in r/getdisciplined and r/habits

## Important (Medium Confidence)
3. A "weekly rhythm" model would feel more forgiving than daily streaks
  - Source: constitution.md principle #3
  - Test: Prototype test with 5 users

4. Mobile-first is the right platform choice
  - Source: landscape.md
  - Test: Check where users currently track habits

## Minor (High Confidence)
5. Users prefer simple UI over feature-rich UI
  - Source: constitution.md, competitive analysis
  - Confidence: High (supported by competitor reviews)

## Validation Priority
1. Assumption #1 (streak model causation)
2. Assumption #2 (segment reachability)
3. Assumption #3 (weekly rhythm preference)

Common mistakes

Only listing assumptions you're already confident about. The valuable assumptions are the ones that make you uncomfortable. Those are the ones that need testing.
Not having a testable hypothesis. Every critical assumption needs a concrete test. "Interview 5-8 users" is testable. "Do more research" is not.

Tips & best practices

The most dangerous assumptions are the ones you don't realize you're making. That's why Claude reads your artifacts — it catches things you take for granted.
Focus validation effort on Critical assumptions only. Don't boil the ocean. Test the 2-3 bets that would kill your product if wrong.
If you have 0 critical assumptions, you're not being honest enough. Every early-stage product has bets. Surface them.

When to re-run

  • After completing validation (to update confidence levels)
  • When new information surfaces that changes your risk profile

Step 5: Validate

/product-kit:validate

"Don't guess. Go find out."

Why this step matters

This is where Product Kit earns its keep. Assumptions without evidence are just opinions. Validation turns opinions into data.

This step generates the actual instruments — interview scripts, survey questions, CSV templates — and then processes the results. It's the step that takes real-world effort: you have to talk to people.

It's also the step most people skip. Don't. The cost of 5-8 interviews is a few hours. The cost of building the wrong product is months.

What to prepare

  • Access to potential users (Reddit communities, Twitter, personal network)
  • 1-2 weeks for data collection
  • A willingness to hear that you're wrong

Finding your first 5 interviewees

The hardest part of validation isn't the interview — it's finding people to talk to. Here's where to start:

  • Reddit & online communities: Post in subreddits related to your problem space. For Streak, r/getdisciplined, r/habits, and r/productivity are goldmines. Write a short post: "I'm researching why people quit habit apps. If you've downloaded and abandoned one, I'd love a 15-minute chat. DM me."
  • Twitter/X: Search for people complaining about the problem. "quit habit app" or "broken streak" surface real frustrations in real time.
  • Your extended network: Post on LinkedIn or ask friends to introduce you. "Know anyone who's tried and quit a habit tracker?" One warm intro often leads to 2-3 more.
  • In-person: Coffee shops, co-working spaces, meetups. "Quick question — have you ever used a habit tracking app?" works surprisingly well.
  • Respondent services: If you have budget, tools like Respondent.io or User Interviews can find screened participants in 24-48 hours.

Outreach template: "Hi! I'm building a [category] app and trying to understand why people [problem]. Would you be open to a 15-minute call? No sales pitch — just questions. Happy to send a $10 coffee gift card as thanks."

The conversation (Phase 1: Generate instruments)

Claude

I'll create validation instruments for your 2 critical assumptions. For assumption #1 (streak model causes abandonment), here's an interview guide with open-ended questions that avoid leading the witness.

You

Great. What about reaching Restarters for the survey?

Claude

For assumption #2, here's a short survey you can post in r/getdisciplined and r/habits. I've also created a CSV template for recording interview responses so I can analyze them later.

The conversation (Phase 2: After data collection)

You

I interviewed 8 people and have notes. I also got 45 survey responses from Reddit.

Claude

Drop the files into your project and let me analyze them. ... Done. 6 of 8 interviewees cited "streak reset" as the specific moment they quit. Here's the strongest quote: P3 said "I missed one day and it said Day 1. I just deleted it." Your first critical assumption is validated with strong evidence.

You

What about the Restarter segment?

Claude

Your survey got 45 responses in 3 days from two subreddits. 38 of 45 respondents matched the Restarter profile. The segment exists and is reachable. Both critical assumptions are now validated.

Full artifact

validation.md (after data collection)

# Validation Results

## Assumption Status

1. Users abandon because of the streak model [VALIDATED]
  - Evidence: 6/8 interviewees cited streak reset as the moment they quit
  - Key quote: P3: "I missed one day and it said Day 1. I just deleted it."
  - Quality: Strong (consistent pattern across diverse interviewees)

2. Restarters exist as a reachable segment [VALIDATED]
  - Evidence: 38/45 survey respondents matched the Restarter profile
  - Channels: r/getdisciplined (28 responses), r/habits (17 responses)
  - Quality: Moderate (self-selected sample, but strong signal)

## Validation Gate: PASSED
All critical assumptions validated. Cleared to proceed to Solution.

What if an assumption fails?

You

I interviewed 6 people and... they actually said the streak reset wasn't a big deal. Most of them quit because they simply forgot about the app after a few days. Notifications didn't help either.

Claude

That's a significant finding. Your critical assumption #1 is invalidated — the root cause isn't the streak model, it's forgetting. This changes everything downstream. Here's what I recommend:

1. Update problem.md: The root cause shifts from "streak punishment" to "habit apps fail to stay top-of-mind." That's a different problem with different solutions.
2. Re-run /product-kit:solution: Weekly rhythms may still help, but the core solution should focus on re-engagement, not forgiveness.
3. Don't panic: This is exactly why you validate. You just saved months of building the wrong thing.

Invalidation is a success, not a failure. You spent a few hours discovering that your original direction was wrong. Without validation, you'd have built an entire app around streak forgiveness — and still had the same abandonment problem.

Gate: If critical assumptions still have [PENDING] evidence, /product-kit:solution will refuse to proceed. This prevents building on guesses.

Common mistakes

Only talking to friends and family. They'll tell you what you want to hear. Talk to strangers who match your persona.
Asking leading questions. "Don't you hate when apps reset your streak?" will always get a "yes." Ask open-ended questions instead.
Skipping this step. "We're agile, we'll learn after launch" is how you waste months building the wrong thing.

Tips & best practices

5-8 interviews is enough. You're looking for patterns, not statistical significance. After 5 interviews, you'll start hearing the same things.
If an assumption is invalidated, that's a SUCCESS. You just avoided building the wrong thing. Pivot the assumption, not the whole product.
B2B / Enterprise adaptation: Enterprise validation looks different. You may not have access to 45 survey respondents — 3-5 conversations with decision-makers at target companies can be enough. Focus on procurement blockers, integration requirements, and compliance needs. A single enterprise sales call often reveals more than dozens of consumer surveys.

When to re-run

  • When you collect new data
  • When you discover new assumptions to test

Step 6: Solution

/product-kit:solution

"NOW you can brainstorm what to build."

Why this step matters

Notice that solution is step 6, not step 1. Most teams start here. Product Kit makes you earn the right to brainstorm by validating the problem first.

Now that you know the problem is real and your assumptions are tested, solution ideas are grounded in evidence, not fantasy. Every option gets evaluated against what you've learned.

What to prepare

  • Any solution ideas you've been sitting on (it's normal to have them by now)
  • Competitor approaches you've studied

The conversation

Claude

Your validation confirmed that streak resets cause abandonment. Let me generate 5 solution approaches. Option A: Streak Freeze (let users pause streaks). Option B: Grace Days (allow 1-2 misses per streak). Option C: Weekly Rhythm (track "days this week" with a flexible target). Option D: Decay Model (streaks fade slowly instead of resetting). Option E: Milestone Streaks (celebrate milestones, ignore daily gaps).

You

I like Option C — the weekly rhythm idea.

Claude

What about Option A? Streak Freeze is simpler to build — just add a "pause" button.

You

It still punishes missing days, just delays the punishment. That conflicts with our constitution principle — "Forgiveness over perfection."

Claude

Good point — Option C aligns better with your principles. It reframes the whole mental model: instead of "did I do it today?" it becomes "am I on track this week?" That's fundamentally more forgiving. Let's scope the v1.

Full artifact

solution.md # Solution

## Options Considered
A. Streak Freeze — Pause streaks manually. Pro: Simple. Con: Still binary thinking; conflicts with "forgiveness" principle.
B. Grace Days — Allow 1-2 misses. Pro: Familiar model. Con: Still punishes the 3rd miss; arbitrary threshold.
C. Weekly Rhythm — Track days per week with flexible target. Pro: Reframes the mental model; aligns with constitution. Con: Unfamiliar to users; needs clear onboarding.
D. Decay Model — Streaks fade over days. Pro: Gradual. Con: Hard to visualize; feels abstract.
E. Milestone Streaks — Celebrate milestones only. Pro: Positive. Con: No daily accountability.

## Recommended Solution: Weekly Rhythm (Option C)
Replace all-or-nothing daily streaks with a "weekly rhythm" model. Users set a target like "4 out of 7 days." Missing a day doesn't break anything — you see progress toward the week's goal. Streaks count weeks met, not consecutive days.

## v1 Scope
- Core rhythm tracker with weekly targets
- Simple daily check-in (one tap)
- Recovery nudge after a missed day

## Deferred to v2+
- Social accountability (unvalidated — low confidence)
- Advanced analytics / charts (conflicts with "momentum over metrics")
- Integrations with other apps
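The weekly rhythm mechanics above can be sketched in a few lines. This is an illustrative model only — the `Habit` class and its method names are hypothetical, not part of Product Kit or any Streak codebase:

```python
from datetime import date, timedelta

class Habit:
    """Weekly rhythm model: hit a target number of days per week.

    A missed day never resets anything; progress is measured
    against the current week's target, not consecutive days.
    """

    def __init__(self, name, target_per_week=4):
        self.name = name
        self.target = target_per_week
        self.checkins = set()  # dates the user tapped "did it"

    def check_in(self, day=None):
        self.checkins.add(day or date.today())

    def days_this_week(self, today):
        # Week runs Monday..Sunday; count check-ins from Monday through today.
        monday = today - timedelta(days=today.weekday())
        return sum(1 for d in self.checkins if monday <= d <= today)

    def week_met(self, today):
        return self.days_this_week(today) >= self.target

habit = Habit("stretch", target_per_week=4)
monday = date(2025, 1, 6)  # a Monday
for offset in (0, 1, 3, 5):  # Mon, Tue, Thu, Sat — Wednesday missed
    habit.check_in(monday + timedelta(days=offset))
sunday = monday + timedelta(days=6)
print(habit.days_this_week(sunday))  # 4
print(habit.week_met(sunday))        # True — the missed Wednesday cost nothing
```

Note how "missing a day doesn't break anything" falls out of the data model: there is no consecutive-day counter to reset, only a per-week tally.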

Common mistakes

Falling in love with the first idea. Generate at least 3-5 options before choosing. Your first idea is rarely your best.
Including features that conflict with your constitution. Check every feature against your principles. If it conflicts, cut it.

Tips & best practices

The best v1 solves ONE problem well. Resist the urge to solve everything. You can always add features later.
"Deferred" doesn't mean "never." It means "not yet." Explicitly listing deferred features prevents scope creep.

When to re-run

  • When validation results change
  • When you realize the solution doesn't address the root cause

How it connects

7

Prioritize

/product-kit:prioritize

"What's the smallest thing you can ship that solves the core problem?"

Why this step matters

Even a focused solution has features that can be ranked. Prioritization prevents scope creep by forcing every feature to justify itself with a score. It's the difference between "let's add this too" and "that scored 1.5, it's v2."

What to prepare

  • The solution from step 6
  • A sense of engineering effort for each feature (or access to your engineer)

The conversation

Claude

I've broken your solution into 8 features and scored each one using Impact × Confidence / Effort. The top scorer is "Single daily check-in" at 20.0 — high impact, high confidence, low effort. Social accountability scored 1.5 — I'd cut it from v1.

You

Agreed on cutting social. What about the recovery nudge?

Claude

It scored 8.0 — solid but not essential. Should it be must-have or nice-to-have?

You

Nice-to-have. The core rhythm tracker comes first.

Claude

One note: these effort scores are PM estimates. Share this table with your engineer before we write the spec — engineering effort is the most commonly misjudged score.

Full artifact

priorities.md # Priorities

| Rank | Feature | Impact | Confidence | Effort | Score | Status |
|------|---------|--------|------------|--------|-------|--------|
| 1 | Single daily check-in | 5 | 4 | 1 | 20.0 | v1 must-have |
| 2 | Week completion celebration | 4 | 4 | 1 | 16.0 | v1 must-have |
| 3 | Habit creation flow | 3 | 5 | 1 | 15.0 | v1 must-have |
| 4 | Weekly rhythm tracker | 5 | 5 | 2 | 12.5 | v1 must-have |
| 5 | Recovery nudge | 4 | 4 | 2 | 8.0 | v1 nice-to-have |
| 6 | Progress visualization | 3 | 3 | 2 | 4.5 | v2 |
| 7 | Notifications | 3 | 3 | 3 | 3.0 | v2 |
| 8 | Social accountability | 3 | 2 | 4 | 1.5 | v2 |

## Engineering Review
Effort scores are PM estimates. Engineer validation required before spec.
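The Score column comes from the formula mentioned in the conversation: Impact × Confidence / Effort, with features ranked highest score first. As a quick illustration (plain Python, not a Product Kit command), here is the same table recomputed and sorted:

```python
# ICE-style scoring: score = impact * confidence / effort.
# Feature names and numbers are copied from the priorities table above.
features = [
    ("Single daily check-in",       5, 4, 1),
    ("Weekly rhythm tracker",       5, 5, 2),
    ("Week completion celebration", 4, 4, 1),
    ("Recovery nudge",              4, 4, 2),
    ("Habit creation flow",         3, 5, 1),
    ("Progress visualization",      3, 3, 2),
    ("Notifications",               3, 3, 3),
    ("Social accountability",       3, 2, 4),
]

scored = sorted(
    ((name, impact * confidence / effort)
     for name, impact, confidence, effort in features),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in scored:
    print(f"{score:5.1f}  {name}")
# Top scorer: 20.0 Single daily check-in; bottom: 1.5 Social accountability
```

Dividing by effort is what makes low-effort, high-impact features float to the top — exactly the "best friends" called out in the tips below.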

Common mistakes

Not having your engineer validate effort scores. PMs consistently underestimate effort. Get engineering input before committing.
Making everything a "must-have." If you have more than 3-4 must-haves for v1, you're not prioritizing hard enough.

Tips & best practices

Confidence should be based on validation evidence, not gut feeling. Features backed by interview data get high confidence. Features based on "we think users want this" get low confidence.
Low-effort, high-impact features are your best friends. Ship those first.

When to re-run

  • After engineering review changes effort scores
  • When scope changes significantly

How it connects

8

Spec

/product-kit:spec

"Everything synthesized. Every claim traced. Ready to build."

Why this step matters

The spec is the culmination of all previous work. It synthesizes 8 artifacts into one document that an engineer can build from. Every feature traces to evidence. Every decision traces to a principle.

It's not a requirements doc — it's a validated blueprint. The difference? Requirements docs say "build this feature." A spec says "build this feature because 6 of 8 users cited this as their #1 pain point."

What to prepare

  • All 8 previous artifacts complete
  • Priorities reviewed by engineering

The conversation

Claude

I'll synthesize all your artifacts into a product spec. Every feature will link back to validation evidence and constitution principles.

You

Sounds good. Include acceptance criteria for every feature — specific, testable conditions that must be true for a feature to be considered done.

Claude

Done. I also noticed a gap — your success metrics don't include a retention metric, which is critical given your problem is about abandonment. Want me to add week-over-week retention as a primary metric?

You

Yes, that's a good catch. Add it.

Full artifact

spec.md # Product Spec: Streak

## Overview
A mobile habit tracker that replaces all-or-nothing daily streaks with a forgiving "weekly rhythm" model, designed for 24-30 year old professionals with unpredictable schedules who've abandoned 2-3 habit apps.

## Principles (from constitution.md)
1. Effortless over powerful
2. Momentum over metrics
3. Forgiveness over perfection

## Target User: The Restarter (from users.md)
24-30 year old professionals. Unpredictable schedules. Downloaded 2-3 habit apps, used each <2 weeks.

## Problem (from problem.md)
All-or-nothing streak models punish imperfection, causing abandonment. 6/8 interviewees confirmed (validation.md).

## v1 Features

### Weekly Rhythm Tracker — Must Have
- What: Users set a weekly target (e.g., 4/7 days). Progress shown as filled dots.
- Why: 6/8 interviewees cited streak resets as why they quit (validation.md)
- Acceptance criteria:
  ☐ User can create a habit with a weekly frequency target
  ☐ Missing a day does not reset progress
  ☐ Weekly completion is celebrated, not daily perfection

### Single Daily Check-in — Must Have
- What: One-tap "did it" button. No forms, no details.
- Why: Constitution principle #1 (effortless over powerful)
- Acceptance criteria:
  ☐ Check-in takes <2 seconds
  ☐ No required fields beyond the tap

## Success Metrics
- Week-over-week retention (primary)
- % of users who continue after missing a day
- Weekly rhythm completion rate

## Risks
- Users may not understand the weekly model without onboarding
- Weekly rhythms may feel less motivating than daily streaks for some users

## Appendix
Source artifacts: landscape.md, constitution.md, users.md, problem.md, assumptions.md, validation.md, solution.md, priorities.md
The result: A spec grounded in real evidence. Not "I think users want this" but "6 out of 8 people we talked to said this." Hand spec.md to Claude and start building.
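The primary success metric, week-over-week retention, is worth pinning down precisely: of the users active last week, what fraction were active again this week? A minimal sketch (the function name and sample user IDs are hypothetical, for illustration only):

```python
def week_over_week_retention(active_last_week, active_this_week):
    """Fraction of last week's active users who were active again this week."""
    if not active_last_week:
        return 0.0
    retained = active_last_week & active_this_week  # set intersection
    return len(retained) / len(active_last_week)

last_week = {"ana", "ben", "cho", "dev"}
this_week = {"ana", "cho", "eli"}  # "eli" is new, so doesn't count as retained
print(week_over_week_retention(last_week, this_week))  # 0.5
```

Tracking this weekly gives an early read on exactly the abandonment problem the spec is built around.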

Common mistakes

Treating the spec as final. It should evolve as you build and learn. Update it when priorities shift.
Adding features at spec time that weren't prioritized. If it didn't go through prioritization, it doesn't belong in the spec.

Tips & best practices

Hand spec.md to Claude with "build this." The spec has everything needed — target user, problem, features, acceptance criteria.
Run /product-kit:audit after building. It compares what you specced vs. what you built and flags gaps.

When to re-run

  • When priorities change after engineering review
  • After significant user feedback post-launch

How it connects

Utility Commands

These commands can be run at any point in the workflow. Here's how each one helps during the Streak project.

Clarify

/product-kit:clarify

After writing solution.md, you notice it mentions a "daily check-in" but your constitution says "no daily pressure." Is that a contradiction?

Claude

I found a potential contradiction. Your solution includes a "single daily check-in" but your constitution principle #2 says "momentum over metrics — no daily pressure." However, looking deeper, the check-in is optional and frictionless (one tap). The constitution prohibits pressure, not daily actions. Recommendation: clarify in the spec that the check-in has no penalty for skipping.

Analyze

/product-kit:analyze

After completing step 5, run Analyze for a quality check.

Claude

Score: 7/10. Strengths: strong validation evidence, clear problem framing, well-defined persona. Gaps: competitive differentiation is weak — your problem statement doesn't explain why you will solve this better than a competitor who copies the weekly rhythm model. Recommendation: add a "why us" section to your problem statement.

Bootstrap

/product-kit:bootstrap

Your friend has a half-built habit app with no product artifacts. Run Bootstrap on their codebase.

Claude

I scanned the codebase and found: a React Native app with daily tracking, streak counting, and push notifications. I've auto-drafted users.md (inferred from UI text), problem.md (inferred from the streak reset handler), and solution.md (the current implementation). Review each one and refine — these are starting points, not finished artifacts.

Audit

/product-kit:audit

After building Streak v1, run Audit to compare spec vs. code.

Claude

Spec vs. Code Comparison:
• Weekly rhythm tracker — Implemented
• Single daily check-in — Implemented
• Recovery nudge — Not implemented (was nice-to-have)
• Settings page — Built but not specced (scope creep?)

Learn

/product-kit:learn

You found 3 articles about habit formation psychology and saved them to knowledge/.

Claude

Indexed 3 files into knowledge-index.md. Key findings: "implementation intentions" increase habit adherence by 2-3x (source: habit-research.pdf). I'll reference this in future artifact drafts when relevant.

Tech Review

/product-kit:techreview

Before writing stories, run Tech Review so engineering concerns are surfaced early.

Claude

Feature-by-feature feasibility:
• Weekly rhythm tracker — Straightforward, ~2 days. Uses standard date math.
• Recovery nudge — [Needs engineering input] Push notification timing depends on OS-level scheduling constraints. Consider local notifications vs server-triggered.
• Social accountability — High effort (~2 weeks). Requires real-time sync, friend graph, privacy controls. Recommend deferring to v2.

Stories

/product-kit:stories

After Tech Review, break the spec into implementable stories.

Claude

Epic 1: Core Tracking
• E1-S1: Weekly rhythm setup — Must-have, 3 pts
• E1-S2: Daily check-in — Must-have, 2 pts, depends on E1-S1
Epic 2: Engagement
• E2-S1: Recovery nudge — Nice-to-have, 5 pts
Each story has acceptance criteria and definition of done. Export with productkit export --stories-csv for Jira/Linear import.

Next Steps

You've gone from "I want to build a habit tracker" to a validated, evidence-based product spec. Along the way, you:

  • Mapped your competitive landscape and constraints
  • Defined principles that filter every decision
  • Built a specific, research-grounded persona
  • Framed a testable problem (not a solution in disguise)
  • Surfaced your hidden assumptions and tested them
  • Brainstormed solutions grounded in evidence
  • Prioritized ruthlessly with a scoring framework
  • Synthesized everything into a build-ready spec

The spec is your handoff to development. Give spec.md to Claude and start building the right thing. Need to share your artifacts with stakeholders? Run productkit export to bundle all artifacts into a single markdown file.
