AI-POWERED REVIEWS · FOR ENGINEERING TEAMS

Performance reviews, written from evidence, not memory.

Arbor connects to your engineering tools, pulls real signals across the review period, and uses AI to produce a structured evidence brief for every engineer. You make the call.

Free during the beta · Read-only access · No model training
THE PROBLEM

“Reviews get written the weekend before they're due, from whatever the manager remembers.”

Every engineering manager, every cycle
01

Recency bias decides outcomes.

The last six weeks dominate a six-month review. Steady contributors get under-credited; recent firefighters get over-credited.

02

The signals are already in your tools.

GitHub has every PR, every review, every comment. Jira has the tickets, the cycle times, the delivery patterns. The story is there. Nobody has time to read it.

03

Calibration starts from a blank page.

Managers walk into committees with a few anecdotes and a vibe. Calibration drifts toward whoever speaks loudest, not whoever has the strongest evidence.

HOW IT WORKS

Three steps. No setup ceremony.

Arbor is hosted. Connect once during onboarding, then kick off a review run whenever the cycle comes around.

STEP 01

Connect your engineering sources.

GitHub and Jira at launch, with more on the way. Personal access tokens or app-style integrations, whichever your security team prefers. Read-only access, with tokens encrypted at rest.

STEP 02

Select engineers and review period.

Pick a cohort: a team, a sub-team, an org. Pick a date range. Arbor handles the cohort calibration math.
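
For concreteness, a review run is nothing more than a cohort plus a date range. A minimal sketch of what that selection amounts to, with illustrative field names (these are assumptions, not Arbor's actual API):

# Illustrative only: the names below are assumptions, not Arbor's API.
review_run = {
    "cohort": {
        "org": "engineering",
        "teams": ["payments-core", "payments-risk"],  # a team, a sub-team, or a whole org
    },
    "period": {"start": "2024-01-01", "end": "2024-06-30"},
}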

STEP 03

Read the evidence briefs.

One brief per engineer: deterministic metrics, an AI-powered narrative grounded in citations, cohort-relative context. Read in-app or export the structured data.
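
The export is structured data rather than a PDF. A rough sketch of the shape of one brief, with placeholder values and field names that are assumptions rather than a published schema:

# Placeholder values; field names are illustrative, not a published schema.
brief = {
    "engineer": "example-handle",
    "period": {"start": "2024-01-01", "end": "2024-06-30"},
    "metrics": {
        "prs_merged": 34,
        "reviews_given": 87,
        "median_cycle_time_days": 2.4,
    },
    "narrative": [
        {
            "claim": "Example synthesized claim about the period.",
            "citations": ["PR link", "review comment link", "ticket link"],  # every claim cites its sources
        },
    ],
    "cohort_context": {"cohort_size": 9},
}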

WHAT YOU GET

A brief, not a verdict.

Arbor's output is a structured document the manager and calibration committee read together. Every section is grounded in evidence the team can re-open.

01 · DETERMINISTIC

Metrics you can verify.

PR throughput, code review depth, cycle time, ticket health. Every number is computed from raw events you can audit. No black-box scoring.
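
To make "computed from raw events you can audit" concrete, here is a minimal sketch of two of those metrics, assuming a flat list of PR events with opened/merged timestamps (the event shape is an assumption, not Arbor's pipeline):

# Minimal sketch: two metrics computed directly from raw PR events.
# The event shape (dicts with datetime fields) is an assumption, not Arbor's pipeline.
from statistics import median

def pr_throughput(events, start, end):
    """PRs merged by this engineer inside the review period."""
    return sum(1 for e in events
               if e["type"] == "pr_merged" and start <= e["merged_at"] <= end)

def median_cycle_time_days(events, start, end):
    """Median open-to-merge time, in days, for PRs merged in the period."""
    days = [(e["merged_at"] - e["opened_at"]).total_seconds() / 86400
            for e in events
            if e["type"] == "pr_merged" and start <= e["merged_at"] <= end]
    return median(days) if days else None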

02 · NARRATIVE

AI-powered synthesis, fully cited.

Arbor turns hundreds of events into a narrative that surfaces patterns, not platitudes. Every claim links back to the PR, comment, or ticket it came from.

03 · CALIBRATION

Cohort-relative, not absolute.

Engineers are compared within their team, over the same period, doing comparable work. The cohort is the calibration; nothing gets stacked against an external benchmark.
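
A one-function sketch of what "cohort-relative" means in practice: a value is ranked against the same metric for the rest of the cohort, never against an external benchmark. The inputs here are assumptions, not Arbor's internals:

# Sketch only: rank one engineer's metric within their cohort.
def cohort_percentile(value, cohort_values):
    """Fraction of the cohort at or below this value; None for an empty cohort."""
    if not cohort_values:
        return None
    return sum(1 for v in cohort_values if v <= value) / len(cohort_values)

# Example: 30 merged PRs in a cohort of [18, 22, 30, 41, 45] -> 0.6
print(cohort_percentile(30, [18, 22, 30, 41, 45]))  # 0.6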

04 · YOU DECIDE

Evidence brief, not a rating.

Arbor produces the structured artifact your calibration committee anchors on. The brief never outputs a rank or recommendation; that judgment stays human.

SECURITY · ISOLATION · CONTROL

Built for teams that read the security review.

Arbor is a hosted service built to the same constraints your platform team would impose. Read-only access, encrypted credentials, no model training on your data.

  • Read-only scopes. Arbor cannot write to your repos or your tracker.
  • Integration tokens are encrypted at rest with AES-GCM and only decrypted in memory at the moment of an API call; a sketch of the pattern follows the scope summary below.
  • Your data is never used to train models, yours or anyone else's. We use the providers' no-training API tier.
  • Delete a review cycle and the underlying events cascade out with it. Sub-processor list and DPA available on request.
# scopes Arbor requests · all read-only
github: "repo:read" "pull_request:read"
jira: "read:issue" "read:project"

# tokens at rest
storage: "aes-256-gcm"
in-memory: "per-call only"

# your data, your kill switch
retention: "on-cycle-delete"
training: "never"
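
For readers who want the encrypt-at-rest, decrypt-per-call pattern spelled out, here is a minimal sketch using Python's `cryptography` package; illustrative only, not Arbor's actual implementation:

# Minimal sketch; in practice the key lives in a KMS or secret manager.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)

def seal(token: bytes) -> bytes:
    """Encrypt an integration token for storage: returns nonce || ciphertext."""
    nonce = os.urandom(12)
    return nonce + AESGCM(key).encrypt(nonce, token, None)

def open_for_call(blob: bytes) -> bytes:
    """Decrypt only at the moment of an API call; the plaintext is never persisted."""
    nonce, ciphertext = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ciphertext, None)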
FAQ

Worth asking.

Will my engineers feel surveilled by this?
Arbor reads what's already public to your team: the same PRs, comments, and tickets a manager would scroll through manually before review season. There's no new instrumentation, no time tracking, no IDE telemetry, no presence detection.
What about work that doesn’t show up in a PR or ticket?
For docs (design docs, RFCs, ADRs, postmortems, runbooks), upload them into Arbor and the doc activity folds into the brief alongside code and tickets. For work that's truly off-tool (mentoring, hiring, on-call rotations), that's exactly why Arbor ships an evidence brief instead of a rating: you and the engineer add the rest before the calibration committee sees it.
Won’t this just push engineers to game the metrics: more PRs, more lines?
Three guards. The metrics are cohort-relative, so the arms race is bounded by what's actually shippable on this team. The narrative grounds claims in linked evidence, so a wall of rubber-stamp PRs reads as suspicious instead of impressive. And the brief itself never outputs a rank, score, or rating. Engineers don't have a number to optimize toward.
Why trust AI with something this consequential?
Every metric is auditable from raw events; every narrative claim links back to the PR, comment, or ticket it came from. If you can't re-open the source, the claim doesn't ship. You're not trusting the AI; you're auditing it.
What if my team uses a different tracker?
GitHub and Jira are the launch sources. Other code platforms and trackers are on the roadmap; if a specific tool blocks a real rollout for your team, that's exactly the kind of feedback the beta is for.
What does it cost?
Free during the beta. Paid pricing isn't finalized; per-seat-per-month is the likely model, and we'll tell you before any charge ever happens.