Diagnostic & Vocabulary Assessment Game Design for Word Tag
The Word Tag Assessment Project has two parts: a short onboarding diagnostic that estimates learners' vocabulary levels and personalizes content from the start, and Word Fair, a recurring assessment that tracks vocabulary growth and provides data for long-term adaptation. This page covers both.
Work Responsibilities
- Designed the Onboarding Diagnostic Word Game and the Word Fair mini-games from scratch, improving diagnostic precision and learner engagement through research into existing diagnostic word games and competitive analysis.
- Strengthened product usability and iteration by leading two rounds of comprehensive playtests, including protocol design, participant recruitment, and post-test analysis to identify issues and prioritize refinements.
- Facilitated transparent design communication with modular and accessible Design Guides that clarified design rationale and specifications.
- Collaborated closely with cross-functional teams including product managers, developers, and designers to ensure seamless integration of game mechanics with educational objectives and technical feasibility.
Overview
The onboarding diagnostic is a short, game-based assessment that runs during a new player's first login. It quickly evaluates each learner's vocabulary level so that content can be personalized from the very first session.
The Problem
Currently, Word Tag relies on a learner's grade level to set the starting point. While the CatBoost algorithm adapts content over time, it takes several days of gameplay before meaningful adjustments occur. This delay can cause early content mismatches, reducing engagement and learning effectiveness.
The Opportunity
The onboarding diagnostic introduces a fast, approximately two-minute assessment at the start of the game. This diagnostic immediately identifies the learner's vocabulary level, enabling accurate personalization from the first session and creating a smoother, more engaging experience.
Design Tools & Process

Onboarding Diagnostic Design
Assessment Types
The diagnostic includes two multiple-choice question formats, chosen for speed and reliability:
Synonym Matching
Players are shown a target word and select a synonym from three options.
Word-in-Context
Players read a sentence with a blank and select the word that best fits from three options.
Distractor Design Framework
To ensure clear, fair assessment items, each multiple-choice question follows a research-based three-option structure designed to minimize confusion while accurately measuring vocabulary knowledge.
Standard Distractor Pattern. Each question includes:
- Correct answer: a clear synonym or contextually appropriate word
- Distractor 1: an antonym or contrasting concept
- Distractor 2: a "near-miss" option, thematically related but semantically distinct
Special Cases:
When target words lack clear antonyms, we use contextually contrasting or commonly confused terms. This maintains the framework's consistency while ensuring all distractors serve their diagnostic purpose.
Distractor Effectiveness Validation:
Based on research by Gierl et al. (2017), we recommend using this model to evaluate distractor effectiveness:
- Distractors selected less than 5% of the time are too implausible and should be replaced
- Distractors selected more than 25% of the time are too attractive or ambiguous
- Ideal distractors fall within the 5%-25% selection range, indicating they're plausible but distinguishable from the correct answer
Word Selection
Words are selected and organized by Lexile levels, aligning with Word Tag's vocabulary structure. This ensures consistency between the assessment and the game's content progression.
The current prototype and conceptual design were developed for Grade 3 learners as the initial focus. The framework can be extended to additional grade levels in future iterations.
Grade-Based Difficulty Bands:
- 300–500L
- 600–800L
- 900–1100L
Question Distribution
The diagnostic includes a carefully balanced mix of items across difficulty levels:
Synonym Questions (67 total):
- 15 easy-level words
- 30 medium-level words
- 22 hard-level words
Word-in-Context Questions (21 total):
- 6 easy-level passages
- 9 medium-level passages
- 6 hard-level passages
Adaptive Logic
The onboarding diagnostic adjusts question difficulty dynamically based on the learner's responses:
- Correct answer: the next item becomes slightly more difficult.
- Incorrect answer: the next item becomes slightly easier.
This approach allows the diagnostic to quickly converge on an accurate estimate of the learner's vocabulary level within a short session.
Item Response Theory (IRT)
The adaptive sequence is powered by Item Response Theory (IRT), a widely used model in educational assessment that estimates learner ability from response patterns, accounting for item difficulty and other item characteristics.
Each word is tagged with properties such as Lexile level, word frequency, word length, and age of acquisition. These attributes enable the adaptive model to make statistically grounded, pedagogically meaningful adjustments in real time.
The full diagnostic session runs for approximately 138 seconds (2 minutes 18 seconds), including transition animations, ensuring players have about 2 minutes of active response time.
Game Specifications
Structure
- Questions alternate in a repeating pattern: 5 synonym → 1 word-in-context
- Average: 18+ questions completed in 2 minutes
Timing
- Players have 5 seconds per question.
- Faster answers preserve unused time, allowing more questions to be answered.
- No answer within 5 seconds counts as a miss, and the next question begins automatically.
Mechanics
- The player character, Roxy, runs along a 3-lane track.
- For each question:
  - The target word or context sentence appears at the top.
  - After 1 second, three options appear (one per lane).
  - To answer, players tap the word they want to choose; Roxy immediately switches to that lane.

Feedback
Immediate visual and audio feedback helps players stay engaged and understand their progress. There are three feedback states: Correct, Incorrect, and Missed.
Correct Feedback
Shared elements:
- Brick's face icon with a green checkmark
- Selected word turns green
- Green vignette glow around the screen
- Uplifting cheer or celebration sound


Behavior by question type:
- Synonym matching: The chosen word rises beneath the prompt.
- Word-in-context: The chosen word rises and fills the blank correctly.
Incorrect Feedback
Shared elements:
- Brick's face icon with a red crossmark
- Selected word turns red
- Red vignette glow around the screen
- Sympathetic "aww" sound


Behavior by question type:
- Synonym matching: The chosen word rises beneath the prompt but is marked incorrect.
- Word-in-context: The chosen word rises but fails to fill the blank.
Missed Feedback
If the timer runs out before a response:
- All options slide past the player and disappear
- The unselected word hits the player character, causing Roxy to stumble
- Brick's face icon with a red crossmark
- Red vignette glow around the screen
- Sympathetic "aww" sound


Game Flow
Character Spotlight: Brick & Roxy
To ensure a consistent tone across the Word Tag experience, the onboarding diagnostic features two established characters: Brick and Roxy. These characters guide players into the diagnostic and appear throughout the gameplay experience.
Brick
A sporty and enthusiastic hippo who thrives on competition and physical activity. Brick brings energy and confidence to fast-paced moments, balanced by a sincere, trusting, and lighthearted personality.

Roxy
A bold and curious fox with a zest for life. She jumps into new situations without hesitation and balances courage with creativity and determination, maintaining a thoughtful, compassionate streak.

Entry (First Login)
- When new users log into Word Tag for the first time, the diagnostic launches automatically during onboarding.
- Brick welcomes players with an enthusiastic greeting and guides them into the diagnostic experience.

Game Tutorial
After the welcome screen, players click "Let's go" to enter the Game Tutorial, where Brick introduces the core mechanics. The tutorial ends with a clear prompt to begin.



Countdown
A 3-second countdown signals the start of the session.
Gameplay
- Questions follow a repeating pattern of 5 synonym questions followed by 1 word-in-context question.
- Players tap a word to select their answer; Roxy moves to that lane automatically.
- Immediate feedback is provided after each response.
- Each question allows up to 5 seconds, with unused time rolling over to allow more total questions within the 2-minute limit.

Completion & Rewards
At the end of the session, a summary screen displays:
- Total questions answered correctly
- Notification of earned rewards
Players return to the plaza where rewards become available.

User Testing & Iteration
To validate and refine the diagnostic design, we conducted two rounds of user testing with child participants, focusing on gameplay flow and pacing, cognitive load, feedback effectiveness, engagement and motivation, timing appropriateness, and reward clarity.

Prototype Scope
All testing was conducted using a standalone HTML prototype that represented a minimum viable version of the diagnostic experience. This prototype simulated the core interaction flow, game mechanics, and feedback system without integrating into the full Word Tag game. This allowed the team to evaluate the assessment in a controlled environment.
For more details, see the prototype site: lexplorehq.com
Round 1 Testing
- Questions completed: 14-25 per participant
- Average response time: 2-5 seconds, with noticeable hesitation on confusing items
Key observations:
- Distractors caused hesitation and confusion
- Some participants spent too long on single questions
- Time pressure reduced motivation after mistakes
Design updates implemented after Round 1:
1. Distractor Redesign: Adopted the research-based framework (synonym, antonym, near-miss) to reduce confusion
2. Time Management: Introduced 5-second soft limits per question with rollover time for quick responses
3. Enhanced Feedback: Improved visual cues for clearer positive/negative reinforcement
Round 2 Testing
- Questions completed: 25-39 per participant (significant improvement)
- Average response time: Consistent 2-4 seconds, indicating better flow
Key observations:
- Improved pacing and consistency
- Increased familiarity with the question formats over time
- Stronger engagement and interest in replaying the diagnostic
Refinements identified for future consideration:
- Further adjust audio feedback to reduce repetition
- Increase feedback visibility across devices
- Explore device-specific optimizations for mobile gameplay (phones and tablets) during future pilot tests
For the detailed user testing result report, see the Diagnostic User Testing Comparison Report.
Team Collaboration
Our team maintained regular communication and structured planning sessions to stay aligned, track progress, and iterate on design decisions together. Recurring meetings with team members, faculty advisors, and clients provided continuous feedback throughout the design process.
