Turn engineering work into trusted performance evidence

Arbor builds source-backed review briefs from GitHub, Jira, and docs, so calibration starts from examples and evidence instead of memory.

WHAT IT LOOKS LIKE

Before calibration, see the evidence trail.

Arbor turns the work already sitting in engineering systems into review briefs with examples, patterns, gaps, and source links attached.

Without Arbor
GitHub PR #421 · GitHub PR #430 · Jira JIRA-1831 · Review doc

Jordan Reyes review notes

Due: tomorrow

Jordan owned the payments rollout this cycle. Need to make this specific, not just "strong ownership".

PR #421 - caught retry edge case before merge. Was this customer-facing risk or just cleanup? Pull quote from review thread.

Impact sentence maybe: made the launch calmer for support and infra, but find dates first.

Evidence to check
  • JIRA-1831 says rollout owner - verify handoff dates
  • Migration notes might have concrete customer timing
  • Need one peer example that is not only PR volume

Growth: better early status updates? Needs evidence. Do not write from memory.

Draft shape

Start with launch ownership, then show the review quality example. Avoid making this sound like generic "high ownership" unless I can tie it to the rollout dates and the retry thread.

  • Strength: stayed close to release risk and follow-through
  • Example: retry behavior review on PR #421
  • Open question: was dependency handoff Jordan or Priya?

TODO before final: add migration note, peer example, one growth area.

With Arbor
FROM THE MANAGER'S CHAIR

“Review time arrives, and months of work have to be reconstructed from memory, tabs, and anecdotes.”

Every engineering manager before calibration
01

The last six weeks get overweighted.

Months of work get compressed into what a manager can remember quickly. Steady contributors disappear; recent firefighting becomes the whole story.

02

The evidence is scattered across systems.

GitHub has the reviews. Jira has the ownership trail. Confluence has the decisions. The manager has to reconstruct the story from all of it.

03

Calibration rewards the loudest story.

Committees should compare the work, not anecdotes. When examples are hard to retrieve, the room drifts toward whoever has the strongest story.

AND ON THE OTHER SIDE

Engineers feel it too. A different version of the same problem.

Managers can't see the work clearly. Engineers can't see the decision at all. We asked people across consulting, banking, big tech, and startups how they're actually reviewed. Many different processes, the same underlying disease.

The visible process is one thing. The real process is calibration, and we never see it.
Software Engineer, Amazon
Manager's opinion only. Performance calls are just formality.
Product Manager, Standard Chartered
You're assigned a coach from another team who fights your case at review. How good they are matters more than the work.
Software Engineer, Deloitte
Structured on paper. Manager's opinion in practice.
Data Engineer, HDFC Bank
Whoever you ask for feedback matters more than what you actually did.
Data Engineer, Thorogood
GRAD takes me a week to fill, just for me to be wondering who sees what.
Software Engineer, Google

Quotes from individual contacts describing their employer's process. Lightly edited for clarity. Not affiliated with or endorsed by any company named.

HOW IT WORKS

Three steps. No setup ceremony.

Arbor is hosted. Connect once during onboarding, then kick off a review run before calibration, promotion, or compensation decisions.

STEP 01

Connect GitHub, Jira, and docs.

Connect the systems where engineering work already lives. Personal access tokens or OAuth, whichever your security team prefers. Credentials are encrypted at rest.

STEP 02

Select engineers and review period.

Pick a cohort: a team, a sub-team, an org. Pick a date range. Arbor keeps the comparison window consistent across the group.
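
As a rough illustration of what a consistent comparison window means, the sketch below applies one date range to every engineer in a cohort. It is not Arbor's implementation: the `Event` shape, the `events_in_window` helper, the dates, and the engineer "sam" are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    """Hypothetical activity record; the field names are assumptions."""
    engineer: str
    kind: str          # e.g. "pr", "review", "ticket"
    occurred_on: date


def events_in_window(events, cohort, start, end):
    """Group events by engineer, keeping only cohort members and
    events dated within the inclusive range [start, end]."""
    grouped = {name: [] for name in cohort}
    for ev in events:
        if ev.engineer in grouped and start <= ev.occurred_on <= end:
            grouped[ev.engineer].append(ev)
    return grouped


# Placeholder data for illustration only.
events = [
    Event("jordan", "pr", date(2024, 2, 10)),
    Event("jordan", "review", date(2024, 7, 1)),   # outside the window
    Event("priya", "ticket", date(2024, 3, 5)),
    Event("sam", "pr", date(2024, 3, 6)),          # not in the cohort
]
window = events_in_window(
    events, ["jordan", "priya"], date(2024, 1, 1), date(2024, 6, 30)
)
counts = {name: len(evs) for name, evs in window.items()}
print(counts)  # {'jordan': 1, 'priya': 1}
```

Because the same start and end dates are applied to everyone, no engineer is compared on a longer or shorter stretch of work than their peers.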

STEP 03

Get sourced review briefs.

One workspace per engineer: patterns, metrics, examples, evidence gaps, and source links from the period you selected. Read in-app or export the structured data.
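
To make "structured data" concrete, here is one possible shape for an exported brief. The export format is not specified on this page, so every field name, the `ReviewBrief` and `Example` classes, the URL, and the sample values are assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Example:
    """Hypothetical example entry: a summary plus the link it came from."""
    summary: str
    source_url: str


@dataclass
class ReviewBrief:
    """Hypothetical brief mirroring the parts listed above."""
    engineer: str
    period_start: str
    period_end: str
    patterns: list = field(default_factory=list)
    metrics: dict = field(default_factory=dict)
    examples: list = field(default_factory=list)
    evidence_gaps: list = field(default_factory=list)


# Placeholder values, not real output.
brief = ReviewBrief(
    engineer="Jordan Reyes",
    period_start="2024-01-01",
    period_end="2024-06-30",
    patterns=["ownership", "review quality"],
    metrics={"prs_reviewed": 12},
    examples=[Example("Caught retry edge case before merge",
                      "https://example.com/pr/421")],
    evidence_gaps=["Dependency handoff owner unclear"],
)

# asdict() converts nested dataclasses too, so the result is JSON-serializable.
exported = json.dumps(asdict(brief), indent=2)
print(exported)
```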

WHAT YOU GET

The examples, patterns, and gaps before calibration starts.

Arbor scans the tools your team already uses and gives you the evidence you wish you had before calibration: what happened, why it mattered, what is missing, and where every important example came from.

01 · EXAMPLES

Source-backed examples.

Arbor scans the full review period across PRs, reviews, tickets, and docs, so steady work does not disappear behind the most recent launch.

02 · PATTERNS

Work patterns.

Arbor groups raw activity into themes you can use: ownership, review quality, follow-through, execution risk, and collaboration shape.

03 · GAPS

Evidence gaps.

See where the source trail is thin, ambiguous, or missing, so managers know what to verify before the room starts making decisions.

04 · BRIEFS

Calibration-ready briefs.

Each brief keeps the work, source links, metrics, and manager questions together before writing or calibration starts.

REVIEWS FIRST, USEFUL BETWEEN CYCLES

Review evidence first. Manager prep from the same work trail.

Arbor helps you read the full cycle before reviews, then reuse the same evidence layer for manager conversations and follow-ups.

Review cycle

Find the work before calibration.

Each engineer gets the patterns, examples, metrics, gaps, and source links you need before writing starts.

Source trail

Re-open the source when it matters.

Examples link back to PRs, reviews, tickets, docs, or decision threads.

Between cycles

Use the same evidence layer between reviews.

Scan recent team activity and report cards before manager conversations and follow-ups.

Follow-ups

Track private follow-ups.

Suggested manager actions stay private until they become a conversation or tracked work.

SECURITY · ISOLATION · CONTROL

Built for teams that read the security review.

Arbor is hosted under the same constraints your platform team would impose: read-only access, encrypted credentials, no model training on your data.

  • Read-only scopes. Arbor cannot write to your repos or your tracker.
  • Integration tokens are encrypted at rest with AES-GCM and only decrypted in-memory at the moment of an API call.
  • Your data is never used to train models, yours or anyone else's. We use the providers' no-training API tier.
  • Delete a review cycle and the underlying events cascade out with it. Sub-processor list and DPA available on request.
# scopes Arbor requests · all read-only
github: "repo:read", "pull_request:read"
jira: "read:issue", "read:project"
confluence: "read:confluence-content.all", "read:confluence-space.summary"
# tokens at rest
storage: "aes-256-gcm"
in-memory: "per-call only"
# your data, your kill switch
retention: "on-cycle-delete"
training: "never"
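
The cascade-on-delete behavior described above can be pictured with a foreign key constraint. This is a minimal sketch using SQLite, assuming a two-table layout; the `review_cycle` and `event` tables and their columns are hypothetical, and Arbor's actual storage is not described here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is on for the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE review_cycle (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE event (
        id       INTEGER PRIMARY KEY,
        cycle_id INTEGER NOT NULL
                 REFERENCES review_cycle(id) ON DELETE CASCADE,
        source   TEXT NOT NULL
    );
""")

# Placeholder rows for illustration.
conn.execute("INSERT INTO review_cycle (id, name) VALUES (1, 'H1 calibration')")
conn.executemany(
    "INSERT INTO event (cycle_id, source) VALUES (?, ?)",
    [(1, "github"), (1, "jira"), (1, "confluence")],
)

before = conn.execute("SELECT COUNT(*) FROM event").fetchone()[0]
# Deleting the cycle removes its dependent events in the same statement.
conn.execute("DELETE FROM review_cycle WHERE id = 1")
after = conn.execute("SELECT COUNT(*) FROM event").fetchone()[0]
print(before, after)  # 3 0
```

Putting the rule in the schema means a deleted cycle cannot leave orphaned events behind, even if application code forgets to clean up.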
FAQ

Worth asking.

Is this only useful once or twice a year?
Reviews are the entry point because they create urgency. The same evidence trail also helps with promotion packets, off-cycle compensation decisions, manager prep, coaching follow-ups, and understanding who has been doing what as the team scales.
Will my engineers feel surveilled by this?
Arbor reads what's already visible to your team: the same PRs, comments, and tickets a manager would scroll through manually before calibration. There's no new instrumentation, no time tracking, no IDE telemetry, no presence detection.
What about work that doesn’t show up in a PR or ticket?
For docs (design docs, RFCs, ADRs, postmortems, runbooks), connect Confluence and Arbor pulls page activity automatically into the brief alongside code and tickets. For work that's truly off-tool (mentoring, hiring, on-call rotations), that's exactly why Arbor ships evidence briefs instead of a rating: you and the engineer add the rest before the calibration committee sees it.
Won’t this just push engineers to game the metrics with more PRs and more lines?
Three guards. The metrics are cohort-relative, so the arms race is bounded by what's actually shippable on this team. The brief points back to linked work, so a wall of rubber-stamp PRs reads as suspicious instead of impressive. And the brief itself never outputs a rank, score, or rating. Engineers don't have a number to optimize toward.
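
One way to read "cohort-relative" is to express each value as a ratio to the cohort median. That is an assumption made for this sketch, not a description of Arbor's formula; the `relative_to_cohort` helper and the counts are hypothetical.

```python
from statistics import median


def relative_to_cohort(values):
    """Express each engineer's raw count as a ratio to the cohort median.
    Returns None for everyone if the median is zero (ratio undefined)."""
    mid = median(values.values())
    if mid == 0:
        return {name: None for name in values}
    return {name: round(v / mid, 2) for name, v in values.items()}


# Placeholder counts for illustration.
merged_prs = {"jordan": 12, "priya": 8, "sam": 10}
baseline = relative_to_cohort(merged_prs)
print(baseline)  # {'jordan': 1.2, 'priya': 0.8, 'sam': 1.0}

# If everyone doubles their PR count, the relative picture does not move.
inflated = relative_to_cohort({name: v * 2 for name, v in merged_prs.items()})
print(inflated == baseline)  # True
```

Under this reading, inflating raw volume across the team changes nothing, which is the sense in which the arms race is bounded.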
Why trust AI with something this consequential?
Every metric is auditable from raw events; every important example links back to the PR, comment, or ticket it came from. If Arbor can't re-open the source, it leaves the example out. You're not trusting the AI, you're auditing it.
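
The "leave it out if the source can't be re-opened" rule can be sketched as a filter. The `Example` shape, the `keep_resolvable` helper, and the URLs are hypothetical, and the `can_open` check is a stand-in for whatever real lookup a system would perform.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Example:
    """Hypothetical example entry with the link it was drawn from."""
    summary: str
    source_url: str


def keep_resolvable(
    examples: Iterable[Example], can_open: Callable[[str], bool]
) -> List[Example]:
    """Keep only examples that have a source link and whose link resolves."""
    return [ex for ex in examples if ex.source_url and can_open(ex.source_url)]


# Stand-in for a real reachability check against the source system.
reachable = {"https://example.com/pr/421"}

examples = [
    Example("Caught retry edge case before merge", "https://example.com/pr/421"),
    Example("Led dependency handoff", ""),                           # no source
    Example("Migration note", "https://example.com/doc/missing"),    # dead link
]
kept = keep_resolvable(examples, lambda url: url in reachable)
print([ex.summary for ex in kept])  # ['Caught retry edge case before merge']
```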
What if my team uses a different tracker?
GitHub, Jira, and Confluence are connected today. Other code platforms and trackers are on the roadmap; if a specific tool blocks a real rollout for your team, that's exactly the kind of feedback the beta is for.
What does it cost?
Pricing isn't finalized yet. Per-seat-per-month is the likely model, and we'll tell you before any charge ever happens.