Building the Voice Coach — Step FWD Devlog

Most walking apps motivate you with notifications. A buzz in your pocket, a banner on your lock screen, maybe a confetti animation if you hit a number. We wanted something different — a voice that walks with you. Not a chatbot, not a narrator. A calm presence that knows when to speak and, more importantly, when to stay quiet.

The voice

His name is Kit. Male, measured, calm authority — inspired less by a fitness instructor and more by a co-pilot. Every line was written before a single word was recorded. The voice spec came first, grounded in sports psychology research: autonomy-supportive language from Self-Determination Theory (Ryan & Deci, 2000), identity framing from Carol Dweck’s work on process praise (Mueller & Dweck, 1998), and brevity principles from motor learning research (Wulf, 2013).

The rules were strict. No exclamation marks. No “Great job!” No controlling language like “Don’t stop now.” Kit says things like “You showed up” and “This is discipline” — acknowledging agency, never manufacturing guilt.

157 audio files

The voice coach ships with 157 professionally generated MP3 files, organized across six categories:

Step milestones — 51 files covering 17 thresholds (1,000 to 60,000 steps), each with 3 variants
Time milestones — 45 files for 15 checkpoints (5 minutes to 8 hours), each with 3 variants
Goal progress — 9 files for halfway, approaching, and reached states
Back-and-return — 6 variants for the turnaround point
Session lifecycle — 6 files for start and end
Soft lines — 40 ambient motivational, form, focus, and chunking lines
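
As a quick sanity check on the breakdown above, the per-category counts can be tallied in a few lines of Python. The dictionary keys are labels made up for this sketch, not identifiers from the app.

```python
# File counts per category, copied from the list above.
# Where the list gives thresholds and variants, the product is computed.
category_files = {
    "step_milestones": 17 * 3,   # 17 thresholds, 3 variants each
    "time_milestones": 15 * 3,   # 15 checkpoints, 3 variants each
    "goal_progress": 9,
    "back_and_return": 6,
    "session_lifecycle": 6,
    "soft_lines": 40,
}

total_files = sum(category_files.values())
print(total_files)
```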

Three variants per trigger is the minimum for the system to feel alive. With fewer, you’d hear the same line twice in an hour. With more, you’re shipping audio files users will never hear.

Milestone detection

The voice coach runs on a dual-track system. Steps and time are tracked independently, each with its own milestone thresholds.

Step milestones are tiered: early engagement (1K–10K), builder (12K–20K), endurance (25K–35K), and ultra (40K+). Time milestones follow a similar pattern, from 5-minute check-ins up to 8-hour markers for serious walkers.

When your step count crosses a threshold, the system fires — but only if that milestone hasn’t already been announced. And when it fires, every lower milestone gets marked as crossed too. This prevents the awkward scenario where you start a walk already at 8,000 steps and get congratulated for hitting 1,000.
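
The crossing rule described above can be sketched roughly as follows. This is an illustrative Python version, not the app's code: the `MilestoneTracker` class, its method names, and the short threshold list are all assumptions made for the example (the real step track has 17 thresholds).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Hypothetical subset of step thresholds, for illustration only.
EXAMPLE_THRESHOLDS = [1_000, 2_000, 5_000, 8_000, 10_000]


@dataclass
class MilestoneTracker:
    thresholds: List[int]
    announced: Set[int] = field(default_factory=set)

    def update(self, value: int) -> Optional[int]:
        """Return the highest threshold to announce now, or None."""
        crossed = [t for t in self.thresholds if t <= value]
        if not crossed:
            return None
        highest = max(crossed)
        already_announced = highest in self.announced
        # Mark the highest crossed threshold and every lower one as done,
        # so lower milestones can never fire later.
        self.announced.update(crossed)
        return None if already_announced else highest


tracker = MilestoneTracker(EXAMPLE_THRESHOLDS)
first = tracker.update(8_200)    # walk starts at 8,200 steps: announces 8,000
second = tracker.update(8_900)   # no new threshold crossed: nothing to say
third = tracker.update(10_050)   # crosses 10,000: announces 10,000
print(first, second, third)
```

The same structure would work for the time track by feeding elapsed seconds or minutes into a second tracker with its own threshold list.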

Anti-repetition

Three variants per trigger isn’t enough if the selection is random. You’d still hear the same line back-to-back by chance.

The solution is a rolling window. The system maintains a list of the last 3 primary lines and the last 3 follow-up lines played. When selecting a variant, it filters out anything in the recent window. If all variants are recent (which can happen with 3-variant triggers), it falls back to random selection from the full set. The window is FIFO — oldest entries drop off as new ones are added.

Simple, effective, and no user ever notices the machinery behind it.
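
For readers who want to see the machinery, here is a minimal Python sketch of a rolling-window picker. The `VariantPicker` class and the file names are hypothetical; only the window size of 3, the FIFO behavior, and the fall-back to the full set come from the description above.

```python
import random
from collections import deque
from typing import List, Optional


class VariantPicker:
    """Pick a variant while avoiding the most recently played lines."""

    def __init__(self, window_size: int = 3,
                 rng: Optional[random.Random] = None):
        # A deque with maxlen drops its oldest entry automatically (FIFO).
        self.recent = deque(maxlen=window_size)
        self.rng = rng or random.Random()

    def pick(self, variants: List[str]) -> str:
        fresh = [v for v in variants if v not in self.recent]
        # If every variant is inside the window, fall back to the full set.
        pool = fresh or variants
        choice = self.rng.choice(pool)
        self.recent.append(choice)
        return choice


# One window for primary lines and a separate one for follow-ups.
primary_picker = VariantPicker()
follow_up_picker = VariantPicker()

# Hypothetical file names for one trigger's three variants.
variants = ["steps_5000_a.mp3", "steps_5000_b.mp3", "steps_5000_c.mp3"]
print(primary_picker.pick(variants))
```

Keeping two picker instances mirrors the separate windows for primary and follow-up lines, so a follow-up never pushes a primary line out of its window.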

Composition and follow-ups

Most announcements aren’t a single line. They’re a composition: a primary line, a 1.75-second gap, and then optionally a follow-up soft line.

The follow-up probability varies by context. Goal reached? 100%: Kit always has something to say when you hit your target. Standard step milestone? 50%. Ultra-distance milestone at 40,000 steps or beyond? 30%, because by that point you’ve been walking for hours. Less is more.
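
The follow-up decision amounts to a weighted coin flip per context. A minimal Python sketch, assuming a simple lookup table: the three probabilities are the ones quoted above, while the context names and the `should_follow_up` function are invented for illustration.

```python
import random

# Probabilities from the text; the context names are labels for this sketch.
FOLLOW_UP_PROBABILITY = {
    "goal_reached": 1.0,
    "standard_step": 0.5,
    "ultra_step": 0.3,
}


def should_follow_up(context: str, rng: random.Random) -> bool:
    """Decide whether a follow-up soft line is added for this context."""
    # rng.random() is in [0.0, 1.0), so a probability of 1.0 always passes.
    return rng.random() < FOLLOW_UP_PROBABILITY[context]


rng = random.Random()
print(should_follow_up("goal_reached", rng))
```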

The composition system handles timing precisely: 350ms pre-roll silence, the primary audio, a 1.75-second gap (long enough to feel like a natural pause, short enough to feel connected), the follow-up audio, and 350ms post-roll silence.
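
Laid out as data, a composition is an ordered list of silence and audio segments. The Python sketch below uses the timings quoted above; the `Segment` type, the function names, and the clip durations are assumptions, as is the choice to skip the gap when there is no follow-up.

```python
from dataclasses import dataclass
from typing import List, Optional

PRE_ROLL_S = 0.35
GAP_S = 1.75
POST_ROLL_S = 0.35


@dataclass
class Segment:
    kind: str          # "silence" or "audio"
    duration_s: float
    label: str


def build_composition(primary_s: float,
                      follow_up_s: Optional[float] = None) -> List[Segment]:
    """Build the playback timeline for one announcement."""
    segments = [
        Segment("silence", PRE_ROLL_S, "pre-roll"),
        Segment("audio", primary_s, "primary"),
    ]
    if follow_up_s is not None:
        # Assumption: the gap only exists when a follow-up is played.
        segments.append(Segment("silence", GAP_S, "gap"))
        segments.append(Segment("audio", follow_up_s, "follow-up"))
    segments.append(Segment("silence", POST_ROLL_S, "post-roll"))
    return segments


def total_duration(segments: List[Segment]) -> float:
    return sum(s.duration_s for s in segments)


# Hypothetical clip lengths: a 1.2 s primary line and a 0.9 s follow-up.
with_follow_up = build_composition(1.2, 0.9)
print([s.label for s in with_follow_up])
print(round(total_duration(with_follow_up), 2))
```

With those example clip lengths the whole announcement lasts about 4.55 seconds, of which 2.45 seconds is deliberate silence.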

Background audio ducking

Most of our users will be listening to something while they walk — music, podcasts, audiobooks. Kit needs to play nicely with that.

We use AVAudioSession with the .duckOthers option. When Kit speaks, we activate the session and the system automatically lowers the volume of whatever else is playing. When he finishes, we deactivate the session with the .notifyOthersOnDeactivation option, which tells other audio apps that Kit is done, and the system brings their volume back up. The user’s music dips gently, Kit says his piece, and the music comes back up. No hard cuts, no jarring interruptions.

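Ducking is only active while an announcement is playing, never during the long stretches between announcements. Within an announcement, the ducking envelope wraps the entire composition as one unit, so the music stays lowered through the 1.75-second gap instead of swelling back up between the primary and follow-up lines.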

Cooldowns and soft lines

There’s a gap-filler system for long walks. If 20 minutes pass without any milestone triggering, Kit drops in a soft line — a brief motivational or form cue. “Just the next five minutes.” “Shoulders down.” These are chosen from a pool of 40 lines with the same anti-repetition logic.

The system also enforces a global cooldown of 90 seconds between any two announcements, preventing rapid-fire triggers when multiple milestones hit close together.
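
Both timers can be expressed as a small gate that every announcement has to pass through. The Python sketch below is one possible shape, not the shipped logic: the `AnnouncementGate` class is hypothetical, times are seconds since session start, and it assumes that playing a soft line also resets the 20-minute idle timer (otherwise soft lines would repeat every 90 seconds once the idle limit is reached).

```python
from typing import Optional

GLOBAL_COOLDOWN_S = 90          # minimum spacing between announcements
SOFT_LINE_IDLE_S = 20 * 60      # idle time before a soft line is dropped in


class AnnouncementGate:
    def __init__(self, session_start: float = 0.0):
        self.last_spoken_at: Optional[float] = None
        self.idle_since = session_start

    def can_announce(self, now: float) -> bool:
        """True once the global cooldown has elapsed."""
        if self.last_spoken_at is None:
            return True
        return now - self.last_spoken_at >= GLOBAL_COOLDOWN_S

    def soft_line_due(self, now: float) -> bool:
        """True when nothing has been announced for 20 minutes."""
        return now - self.idle_since >= SOFT_LINE_IDLE_S and self.can_announce(now)

    def record(self, now: float) -> None:
        """Call after any announcement, milestone or soft line."""
        self.last_spoken_at = now
        # Assumption: soft lines reset the idle timer as well.
        self.idle_since = now


gate = AnnouncementGate()
gate.record(10)                      # a milestone plays 10 s into the walk
print(gate.can_announce(60))         # still inside the 90 s cooldown
print(gate.soft_line_due(10 + SOFT_LINE_IDLE_S))
```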

Priority

When multiple milestones trigger simultaneously (crossing a step threshold and a time threshold in the same update cycle), the system resolves conflicts with a priority ladder. Goal reached takes the top spot, followed by ultra-distance steps, hour-mark time milestones, standard steps, standard time, and finally soft lines at the bottom.

The lower-priority announcement doesn’t get lost — it stays queued for the next valid window.
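
The ladder and the deferral can be sketched together in Python. The ordering of the `Priority` enum follows the ladder described above; the enum itself, the `resolve` function, and the example labels are assumptions made for illustration.

```python
from enum import IntEnum
from typing import List, Optional, Tuple


class Priority(IntEnum):
    # Lower value means higher priority.
    GOAL_REACHED = 0
    ULTRA_STEPS = 1
    HOUR_TIME = 2
    STANDARD_STEPS = 3
    STANDARD_TIME = 4
    SOFT_LINE = 5


Pending = Tuple[Priority, str]


def resolve(pending: List[Pending]) -> Tuple[Optional[Pending], List[Pending]]:
    """Return the announcement to play now and the ones left in the queue."""
    if not pending:
        return None, []
    # sorted() is stable, so equal priorities keep their arrival order.
    ordered = sorted(pending, key=lambda item: item[0])
    return ordered[0], ordered[1:]


# A step threshold and a time threshold crossed in the same update cycle.
pending = [
    (Priority.STANDARD_TIME, "30 minutes"),
    (Priority.STANDARD_STEPS, "5,000 steps"),
]
winner, deferred = resolve(pending)
print(winner[1], "| deferred:", [label for _, label in deferred])
```

The deferred list is what stays queued: it would be passed back into `resolve` at the next valid window instead of being discarded.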

What we learned

The voice coach went through several iterations. Early versions were too chatty — hitting milestones every few minutes felt like being nagged. The tiered milestone spacing fixed that: announcements naturally thin out as your walk gets longer, matching the psychological profile of someone deep into an endurance walk.

We also learned that brevity matters more than content. Seventy percent of Kit’s lines are five words or fewer. The best line in the system might be two words: “Still here.” It says everything.

The voice coach is optional. You can walk in silence and Step FWD works exactly the same. But for the walks where you want a quiet companion who respects your pace and your headspace — Kit is there.