GitHub

Extract Viewer

Link extracted fields to their sources — hover a field to highlight where its value came from in the document and scroll to it.

A source is the provenance of an extracted value: where in the document it was found. This page covers the source system — a small, composable abstraction that links any field-rendering component to any document viewer — and the two blocks built on it.

Hover or select a field; its source highlights in the PDF and scrolls into view.

Hover a field to view its source

The abstraction

Three decoupled pieces. The field component and the viewer never import each other — they're joined by a field path and a tiny mediator.

PieceRole
document-sourcethe data model + extractionSourcesToSourceMap
useSourceLinkthe mediator: hover/pin state, drives the viewer
pdf-sourcethe PDF adapter (usePdfSourceTarget, renderPdfSourceOverlay)
 emitter (fields) ──hover path──► useSourceLink ──location──► viewer (highlight + scroll)
                                       ▲
                                  SourceMap: path → source

The flow is unidirectional: fields drive the viewer, never the reverse.

Source data

Sources come from Retab's GET /v1/extractions/{id}/sources (ExtractionSourcesResponse): the extracted value tree plus a parallel sources tree whose leaves are { value, source }. A Source is { content, anchor }, where the anchor is a discriminated union per document format — for PDFs, pdf_bbox with a page and a normalized [0, 1] box.

extractionSourcesToSourceMap flattens that tree into a flat, path-keyed SourceMap. The keys are dotted paths (owner.name, properties.0.gas_volume) — the same encoding react-hook-form (and json-form) use for field names, so they join with no conversion.

import { extractionSourcesToSourceMap } from "@/lib/document-source"
 
const sources = extractionSourcesToSourceMap(response.sources)
// { "owner.name": { content: "L3 RESOURCES LLC", anchor: { kind: "pdf_bbox", page: 1, … } }, … }

Composing json-form with the PDF viewer

Because json-form already emits the hovered field path through setSourcesFieldPath, linking it to the viewer is just wiring useSourceLink between them — no per-field glue.

Hover a field to view its source
import * as React from "react"
import { extractionSourcesToSourceMap } from "@/lib/document-source"
import { useSourceLink } from "@/hooks/use-source-link"
import { UiForm, UiFormContent } from "@/components/json-form/json-form"
import { PdfViewer, type PdfViewerHandle } from "@/components/ui/pdf-viewer"
import {
  usePdfSourceTarget,
  renderPdfSourceOverlay,
} from "@/components/ui/pdf-source"
 
export function Example({ response, form }) {
  const viewerRef = React.useRef<PdfViewerHandle>(null)
  const target = usePdfSourceTarget(viewerRef)
  const sources = React.useMemo(
    () => extractionSourcesToSourceMap(response.sources),
    [response]
  )
  const link = useSourceLink({ sources, target })
 
  return (
    <div className="flex h-full">
      <PdfViewer
        ref={viewerRef}
        src="/document.pdf"
        bare
        className="min-w-0 flex-1"
        renderPageOverlay={renderPdfSourceOverlay(link.activeLocation)}
      />
      <UiForm
        form={form}
        schema={response.schema}
        setSourcesFieldPath={link.onFieldHover}
        /* …other UiForm props… */
      >
        <UiFormContent />
      </UiForm>
    </div>
  )
}

To link a different field component, point its hover at link.onFieldHover and its click at link.selectField (that's all extract-viewer-block does with its own field list). To support a different viewer, write one adapter — a SourceTarget (resolve + scrollTo) and an overlay renderer.

API

document-source

ExportDescription
Source{ content, anchor } — a value's provenance.
SourceAnchordiscriminated union on kind: pdf_bbox, image_bbox, csv_cell, spreadsheet_cell, docx_text_span, docx_table_cell, text_span.
ExtractionSourcesResponsethe /v1/extractions/{id}/sources shape.
SourceMapRecord<path, Source> — flat, dotted-path keyed.
extractionSourcesToSourceMap(sources) => SourceMap — flatten the sources tree.
SourceLocation / SourceAreaa normalized page region (page + % box).
sourceLocationKeystable key for dedupe.

useSourceLink({ sources, target })

MemberDescription
onFieldHover(path | null)report the hovered field — wire to setSourcesFieldPath.
selectField(path)pin a field (e.g. on click).
activeLocationthe location to highlight — feed to the overlay.
activePath / activeSourcethe active field and its source.

pdf-source

ExportDescription
usePdfSourceTarget(ref)a stable SourceTarget over a PdfViewer ref.
renderPdfSourceOverlay(location)a renderPageOverlay drawing the highlight on its page.
pdfAnchorToLocation(anchor)normalized bbox → page % location.

Both compositions ship as blocks: the Extract Viewer (a field list) and JSON Form Sources (json-form).