Methodology

Every public number on AgentBoard follows a clear rule.

This page explains how AgentBoard works in practice: what gets collected, how each metric is calculated, what drives the leaderboard, and where the current limits are.

Source of truth
public/hook.sh
public/collect.py
src/lib/checkins/aggregate.ts
src/app/api/checkin/route.ts
supabase/schema.sql

Last reviewed against the current repository state on March 20, 2026.

Aggregate only

The hook and collector send numeric aggregates only. No prompts, responses, source code, or transcript text are uploaded.

Overlap is deduped

Daily coding time is recomputed from merged session intervals so overlapping sessions do not stack into impossible totals.

Rank by work time

Public leaderboard ranking is ordered by coding_time_mins descending. Boost and token volume are contextual fields, not the primary rank key.

Windows are explicit

The current public leaderboard fetches today and week in UTC. Profile pages use the stored user timezone for local week views.

Pipeline

How a local session becomes a public stat

01
Hook fires on Claude Code stop

The local hook reads session_id, transcript_path, and cwd from Claude Code. It throttles to at most one sync every 5 minutes per session.

public/hook.sh
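The throttle can be sketched as follows. This is an illustrative Python model of the shell hook's behavior, with an in-memory map standing in for whatever per-session state the real hook persists between runs; names like should_sync are hypothetical.

```python
THROTTLE_SECONDS = 5 * 60  # at most one sync every 5 minutes per session


def should_sync(session_id: str, last_sync: dict, now: float) -> bool:
    """Return True and record the sync time if the session is past its throttle window."""
    prev = last_sync.get(session_id)
    if prev is not None and now - prev < THROTTLE_SECONDS:
        return False  # synced too recently; skip this stop event
    last_sync[session_id] = now
    return True
```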
02
Collector extracts numeric aggregates

The collector walks the transcript JSONL, counts activity events, tokens, edits, tool uses, files, and projects, then posts one payload per date.

public/collect.py
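A minimal sketch of that walk, assuming transcript events shaped like Claude Code's JSONL (a top-level type, plus a message carrying usage and content blocks). The field names are assumptions for illustration, not a contract; only counts leave the function, never text.

```python
import json


def summarize_transcript(lines):
    """Count numeric aggregates from transcript JSONL lines; no text is kept."""
    totals = {"messages": 0, "tool_calls": 0, "input_tokens": 0, "output_tokens": 0}
    files = set()
    for raw in lines:
        event = json.loads(raw)
        if event.get("type") == "user":
            totals["messages"] += 1  # user messages only, per the Messages metric
        elif event.get("type") == "assistant":
            msg = event.get("message", {})
            usage = msg.get("usage", {})
            totals["input_tokens"] += usage.get("input_tokens", 0)
            totals["output_tokens"] += usage.get("output_tokens", 0)
            for block in msg.get("content", []):
                if block.get("type") == "tool_use":
                    totals["tool_calls"] += 1
                    path = block.get("input", {}).get("file_path")
                    if path:
                        files.add(path)  # unique paths only; contents never read
    totals["files_touched"] = len(files)
    return totals
```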
03
Server recomputes daily totals

Session payloads are upserted by session_id. The API then rebuilds the daily checkin from all of that day's sessions, so recomputation is idempotent and overlapping windows are merged rather than double counted.

src/app/api/checkin/route.ts, src/lib/checkins/aggregate.ts
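The idempotency property can be sketched with an in-memory store standing in for the database; the real route presumably relies on a SQL upsert keyed the same way. Function and field names here are illustrative.

```python
from collections import defaultdict


def upsert_session(store, session):
    """Upsert a session payload keyed by (session_id, user_id, date)."""
    key = (session["session_id"], session["user_id"], session["date"])
    store[key] = session  # a re-send replaces the row, never duplicates it


def rebuild_daily(store, user_id, date):
    """Recompute the daily checkin from all stored sessions for that day."""
    daily = defaultdict(int)
    for (sid, uid, d), s in store.items():
        if uid == user_id and d == date:
            for field in ("input_tokens", "output_tokens", "tool_calls"):
                daily[field] += s.get(field, 0)
            daily["sessions"] += 1
    return dict(daily)
```

Because the daily totals are always rebuilt from the session rows, re-sending the same session any number of times leaves the checkin unchanged.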
Metric definitions

What each metric means and how it is calculated

Coding time

Built from merged activity windows. User messages count for 90s, assistant messages for 60s, tool calls for 45s, and progress events for 30s. Overlapping windows are merged, then the merged total is converted to minutes.

This is the primary leaderboard metric. It estimates active coding time, not passive editor-open time.
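The definition above can be expressed directly. A sketch assuming events arrive as (timestamp in seconds, kind) pairs; the real pipeline works from transcript timestamps, but the merge logic is the same idea:

```python
# Seconds of credited activity per event kind, per the definition above.
WINDOW_SECONDS = {"user": 90, "assistant": 60, "tool": 45, "progress": 30}


def coding_time_mins(events):
    """events: iterable of (timestamp_seconds, kind). Returns merged whole minutes."""
    windows = sorted(
        (ts, ts + WINDOW_SECONDS[kind]) for ts, kind in events if kind in WINDOW_SECONDS
    )
    total = 0.0
    cur_start = cur_end = None
    for start, end in windows:
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start  # close out the previous merged window
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # overlap: extend, never double count
    if cur_end is not None:
        total += cur_end - cur_start
    return int(total // 60)
```

Two events 30 seconds apart merge into one 90-second window (1 minute), while the same two events far apart credit their full 135 seconds (2 minutes).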

AI equivalent time

Calculated as output_tokens / 300, rounded down to whole minutes, with a minimum of 1 minute when output tokens are present.

Shows how much AI output was generated, expressed in a time-based format that is easy to compare.
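The formula is small enough to state in full; a direct sketch of the definition above:

```python
OUTPUT_TOKENS_PER_MINUTE = 300  # conversion constant from the definition above


def ai_time_mins(output_tokens: int) -> int:
    """output_tokens / 300, rounded down, with a 1-minute floor when output exists."""
    if output_tokens <= 0:
        return 0
    return max(1, output_tokens // OUTPUT_TOKENS_PER_MINUTE)
```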

Boost ratio

Computed as ai_time_mins / coding_time_mins. Stored daily in the database and recomputed for weekly and all-time views from summed totals.

Shows AI leverage. It is displayed on cards and leaderboard rows, but it is not the main ranking key.
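The recompute-from-summed-totals detail matters: a weekly boost is the ratio of the week's sums, not the average of the daily ratios, and the two can differ sharply when days vary in size. A sketch:

```python
def boost_ratio(ai_mins: int, coding_mins: int) -> float:
    """ai_time_mins / coding_time_mins; zero when there is no coding time."""
    return ai_mins / coding_mins if coding_mins else 0.0


def weekly_boost(days):
    """days: list of (ai_mins, coding_mins) pairs. Recomputed from summed totals."""
    total_ai = sum(a for a, _ in days)
    total_coding = sum(c for _, c in days)
    return boost_ratio(total_ai, total_coding)
```

For days (60, 30) and (10, 100), the weekly boost is 70/130 ≈ 0.54, while the mean of the daily ratios would be 1.05; summing first weights each day by its coding time.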

Tokens used

Simple sum of input_tokens + output_tokens extracted from Claude Code usage metadata.

Shows total model usage volume for a period.

Input tokens

Read directly from assistant message usage.input_tokens in the transcript.

Useful when you want to separate prompt/context load from generated output.

Output tokens

Read directly from assistant message usage.output_tokens in the transcript.

This powers AI equivalent time and indicates how much model output was produced.

Sessions

Each transcript session is keyed by session_id, user_id, and date. Re-sends update the same session log instead of duplicating it.

Measures distinct synced sessions for that day or rolled-up period.

Messages

Counts user messages only. Assistant replies are tracked for activity windows and tokens, but not included in this metric.

Acts as a lightweight proxy for interaction depth.

Tool calls

Counts every tool_use block inside assistant messages.

Shows how much concrete tool execution happened inside an AI-assisted session.

Files touched

Counts unique file_path values observed in tool inputs for a day.

Captures breadth of edited or accessed files without exposing file contents.

Projects

Counts unique working directories associated with user messages in a day.

Gives a rough sense of how many codebases a user worked across.

Lines changed

Computed as lines_added - lines_removed. For Edit tool calls, the line counts of new_string and old_string supply the diff; for Write, the line count of the written content counts as added.

Net code movement. It can be negative, positive, or zero.

Lines added

Sum of new_string line count for Edit plus content line count for Write.

Tracks gross additions regardless of removals.

Lines removed

Sum of old_string line count for Edit operations.

Tracks deleted or replaced lines separately from additions.
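The three line metrics above reduce to plain line counts over tool inputs; a sketch with hypothetical helper names:

```python
def line_count(text: str) -> int:
    """Number of lines in a tool input string; empty text counts as zero."""
    return len(text.splitlines()) if text else 0


def edit_delta(old_string: str, new_string: str):
    """Edit tool: (added, removed) = new_string lines, old_string lines."""
    return line_count(new_string), line_count(old_string)


def write_delta(content: str):
    """Write tool: every written line counts as added, nothing as removed."""
    return line_count(content), 0
```

Lines changed is then added minus removed, which is why it can go negative when an Edit replaces a large old_string with a short new_string.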

Guardrails

Fairness and anti-gaming rules

Collector-side caps limit summary-mode daily minutes to 960 and single-session minutes to 480 before upload.
API-side clamps reject absurd payload sizes, including caps for minutes, tokens, sessions, messages, projects, tool calls, and files touched.
Session logs are unique on session_id + user_id + date, so repeated syncs update existing rows instead of inflating totals.
Leaderboard SQL exposes the exact rank key: coding_time_mins descending for today, week, and all-time rollups.
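The clamp itself is simple; a sketch using the two caps the text states explicitly (the API's other caps are not reproduced here and the constant names are illustrative):

```python
# Caps stated above; other API-side limits exist but are not listed here.
DAILY_CODING_MINS_CAP = 960
SESSION_CODING_MINS_CAP = 480


def clamp(value: int, cap: int) -> int:
    """Clamp a non-negative aggregate to its cap before accepting it."""
    return max(0, min(value, cap))
```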
Not collected

What AgentBoard does not upload

Prompt text
Assistant response text
Source code contents
Transcript bodies
Repository secrets
Full file diffs

The current collector extracts counts and timestamps from local transcript metadata, then posts aggregate numbers only. The product is built around public proof of work, without sending the work itself.

More context

Want to see how this shows up on the leaderboard?

This page explains the logic. The leaderboard shows the live public result.