Diagnostic & Vocabulary Assessment Game Design for Word Tag
The Word Tag Assessment Project has two parts: a short onboarding diagnostic that estimates learners' vocabulary levels and personalizes content from the start, and Word Fair, a recurring assessment that tracks vocabulary growth and provides data for long-term adaptation. This page covers both.
Work Responsibilities
- Designed the Onboarding Diagnostic Word Game and the Word Fair mini-games from scratch, improving diagnostic precision and learner engagement through research into existing diagnostic word games and competitive analysis.
- Strengthened product usability and iteration by leading two rounds of comprehensive playtests, including protocol design, participant recruitment, and post-test analysis to identify issues and prioritize refinements.
- Facilitated transparent design communication with modular and accessible Design Guides that clarified design rationale and specifications.
- Collaborated closely with cross-functional teams including product managers, developers, and designers to ensure seamless integration of game mechanics with educational objectives and technical feasibility.
Overview
The onboarding diagnostic is a short, game-based assessment that runs during a new player's first login. It quickly evaluates each learner's vocabulary level so that content can be personalized from the very first session.
The Problem
Currently, Word Tag relies on a learner's grade level to set the starting point. While the CatBoost algorithm adapts content over time, it takes several days of gameplay before meaningful adjustments occur. This delay can cause early content mismatches, reducing engagement and learning effectiveness.
The Opportunity
The onboarding diagnostic introduces a fast, approximately two-minute assessment at the start of the game. This diagnostic immediately identifies the learner's vocabulary level, enabling accurate personalization from the first session and creating a smoother, more engaging experience.
Design Tools & Process

Onboarding Diagnostic Design
Assessment Types
The diagnostic includes two multiple-choice question formats, chosen for speed and reliability:
Synonym Matching
Players are shown a target word and select a synonym from three options.
Word-in-Context
Players read a sentence with a blank and select the word that best fits from three options.
Distractor Design Framework
To ensure clear, fair assessment items, each multiple-choice question follows a research-based three-option structure designed to minimize confusion while accurately measuring vocabulary knowledge.
Standard Distractor Pattern. Each question includes:
- Correct answer: a clear synonym or contextually appropriate word
- Distractor 1: an antonym or contrasting concept
- Distractor 2: a "near-miss" option, thematically related but semantically distinct
Special Cases:
When target words lack clear antonyms, we use contextually contrasting or commonly confused terms. This maintains the framework's consistency while ensuring all distractors serve their diagnostic purpose.
Distractor Effectiveness Validation:
Based on research by Gierl et al. (2017), we recommend using this model to evaluate distractor effectiveness:
- Distractors selected less than 5% of the time are too implausible and should be replaced
- Distractors selected more than 25% of the time are too attractive or ambiguous
- Ideal distractors fall within the 5%-25% selection range, indicating they're plausible but distinguishable from the correct answer
Word Selection
Words are selected and organized by Lexile levels, aligning with Word Tag's vocabulary structure. This ensures consistency between the assessment and the game's content progression.
The current prototype and conceptual design were developed for Grade 3 learners as the initial focus. The framework can be extended to additional grade levels in future iterations.
Grade-Based Difficulty Bands:
- 300–500L
- 600–800L
- 900–1100L
Question Distribution
The diagnostic includes a carefully balanced mix of items across difficulty levels:
Synonym Questions (67 total):
- 15 easy-level words
- 30 medium-level words
- 22 hard-level words
Word-in-Context Questions (21 total):
- 6 easy-level passages
- 9 medium-level passages
- 6 hard-level passages
Adaptive Logic
The onboarding diagnostic adjusts question difficulty dynamically based on the learner's responses:
- Correct answer: the next item becomes slightly more difficult.
- Incorrect answer: the next item becomes slightly easier.
This approach allows the diagnostic to quickly converge on an accurate estimate of the learner's vocabulary level within a short session.
Item Response Theory (IRT)
The adaptive sequence is powered by Item Response Theory (IRT), a widely used model in educational assessment that estimates learner ability from response patterns, accounting for item difficulty and other item characteristics.
Each word is tagged with properties such as Lexile level, word frequency, word length, and age of acquisition. These attributes enable the adaptive model to make statistically grounded, pedagogically meaningful adjustments in real time.
The full diagnostic session runs for approximately 138 seconds (2 minutes 18 seconds), including transition animations, ensuring players have about 2 minutes of active response time.
Game Specifications
Structure
- Questions alternate in a repeating pattern: 5 synonym → 1 word-in-context
- Average: 18+ questions completed in 2 minutes
Timing
- Players have 5 seconds per question.
- Faster answers preserve unused time, allowing more questions to be answered.
- No answer within 5 seconds counts as a miss, and the next question begins automatically.
Mechanics
- The player character, Roxy, runs along a 3-lane track.
- For each question:
  - The target word or context sentence appears at the top.
  - After 1 second, three options appear (one per lane).
  - To answer, players tap the word they want to choose; Roxy immediately switches to that lane.

Feedback
Immediate visual and audio feedback helps players stay engaged and understand their progress. There are three feedback states: Correct, Incorrect, and Missed.
Correct Feedback
Shared elements:
- Brick's face icon with a green checkmark
- Selected word turns green
- Green vignette glow around the screen
- Uplifting cheer or celebration sound


Behavior by question type:
- Synonym matching: The chosen word rises beneath the prompt.
- Word-in-context: The chosen word rises and fills the blank correctly.
Incorrect Feedback
Shared elements:
- Brick's face icon with a red crossmark
- Selected word turns red
- Red vignette glow around the screen
- Sympathetic "aww" sound


Behavior by question type:
- Synonym matching: The chosen word rises beneath the prompt but is marked incorrect.
- Word-in-context: The chosen word rises but fails to fill the blank.
Missed Feedback
If the timer runs out before a response:
- All options slide past the player and disappear
- The unselected word hits the player character, causing Roxy to stumble
- Brick's face icon with a red crossmark
- Red vignette glow around the screen
- Sympathetic "aww" sound


Game Flow
Character Spotlight: Brick & Roxy
To ensure a consistent tone across the Word Tag experience, the onboarding diagnostic features two established characters: Brick and Roxy. These characters guide players into the diagnostic and appear throughout the gameplay experience.
Brick
A sporty and enthusiastic hippo who thrives on competition and physical activity. Brick brings energy and confidence to fast-paced moments, balanced by a sincere, trusting, and lighthearted personality.

Roxy
A bold and curious fox with a zest for life. She jumps into new situations without hesitation and balances courage with creativity and determination, maintaining a thoughtful, compassionate streak.

Entry (First Login)
- When new users log into Word Tag for the first time, the diagnostic launches automatically during onboarding.
- Brick welcomes players with an enthusiastic greeting and guides them into the diagnostic experience.

Game Tutorial
After the welcome screen, players click "Let's go" to enter the Game Tutorial, where Brick introduces the core mechanics. The tutorial ends with a clear prompt to begin.



Countdown
A 3-second countdown signals the start of the session.
Gameplay
- Questions follow a repeating pattern of 5 synonym questions followed by 1 word-in-context question.
- Players tap a word to select their answer; Roxy moves to that lane automatically.
- Immediate feedback is provided after each response.
- Each question allows up to 5 seconds, with unused time rolling over to allow more total questions within the 2-minute limit.

Completion & Rewards
At the end of the session, a summary screen displays:
- Total questions answered correctly
- Notification of earned rewards
Players return to the plaza where rewards become available.

User Testing & Iteration
To validate and refine the diagnostic design, we conducted two rounds of user testing with child participants, focusing on gameplay flow and pacing, cognitive load, feedback effectiveness, engagement and motivation, timing appropriateness, and reward clarity.

Prototype Scope
All testing was conducted using a standalone HTML prototype that represented a minimum viable version of the diagnostic experience. This prototype simulated the core interaction flow, game mechanics, and feedback system without integrating into the full Word Tag game. This allowed the team to evaluate the assessment in a controlled environment.
For more details, see the prototype site: lexplorehq.com
Round 1 Testing
- Questions completed: 14-25 per participant
- Average response time: 2-5 seconds, with noticeable hesitation on confusing items
Key observations:
- Distractors caused hesitation and confusion
- Some participants spent too long on single questions
- Time pressure reduced motivation after mistakes
Design updates implemented after Round 1:
1. Distractor Redesign: Adopted the research-based framework (synonym, antonym, near-miss) to reduce confusion
2. Time Management: Introduced 5-second soft limits per question with rollover time for quick responses
3. Enhanced Feedback: Improved visual cues for clearer positive/negative reinforcement
Round 2 Testing
- Questions completed: 25-39 per participant (significant improvement)
- Average response time: Consistent 2-4 seconds, indicating better flow
Key observations:
- Improved pacing and consistency
- Increased familiarity with the question formats over time
- Stronger engagement and interest in replaying the diagnostic
Refinements identified for future consideration:
- Further adjust audio feedback to reduce repetition
- Increase feedback visibility across devices
- Explore device-specific optimizations for mobile gameplay (phones and tablets) during future pilot tests
For the detailed user testing result report, see the Diagnostic User Testing Comparison Report.
Team Collaboration
Our team maintained regular communication and structured planning sessions to stay aligned, track progress, and iterate on design decisions together. Recurring meetings with team members, faculty advisors, and clients provided continuous feedback throughout the design process.
