A source is the provenance of an extracted value: where in the document it was found. This page covers the source system — a small, composable abstraction that links any field-rendering component to any document viewer — and the two blocks built on it.
Hover or select a field; its source highlights in the PDF and scrolls into view.
The abstraction
Three decoupled pieces. The field component and the viewer never import each other; they are joined only by a field path and a tiny mediator.
| Piece | Role |
|---|---|
| document-source | the data model plus extractionSourcesToSourceMap |
| useSourceLink | the mediator: hover/pin state; drives the viewer |
| pdf-source | the PDF adapter (usePdfSourceTarget, renderPdfSourceOverlay) |
```text
emitter (fields) ──hover path──► useSourceLink ──location──► viewer (highlight + scroll)
                                       ▲
                           SourceMap: path → source
```
The flow is unidirectional: fields drive the viewer, never the reverse.
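To make the one-way flow concrete, here is a minimal, framework-free sketch of what a mediator like useSourceLink does. It is not the hook's implementation: createSourceLink, ViewerTarget, FieldSource, and PageRegion are hypothetical names, the target only mirrors the resolve-and-scrollTo contract of a SourceTarget, and letting a hover take precedence over a pinned field is an assumption.

```ts
// Hypothetical, framework-free sketch of a source-link mediator.
// None of these names are the library's; all shapes are assumptions.

// A highlight region on one page, in percent of the page box (assumed shape).
type PageRegion = { page: number; left: number; top: number; width: number; height: number }
type FieldSource = { content: string; anchor: unknown }
type FieldSourceMap = Record<string, FieldSource>

// Mirrors the SourceTarget contract: resolve a source, scroll the viewer to it.
interface ViewerTarget {
  resolve(source: FieldSource): PageRegion | null
  scrollTo(region: PageRegion): void
}

function createSourceLink(sources: FieldSourceMap, target: ViewerTarget) {
  let hovered: string | null = null
  let pinned: string | null = null

  // Assumption: while a field is hovered it wins; otherwise fall back to the pin.
  const activePath = (): string | null => hovered ?? pinned

  const activeLocation = (): PageRegion | null => {
    const path = activePath()
    const source = path === null ? undefined : sources[path]
    return source ? target.resolve(source) : null
  }

  // One-way: state changes push to the viewer; nothing is read back from it.
  const sync = (): void => {
    const region = activeLocation()
    if (region) target.scrollTo(region)
  }

  return {
    onFieldHover(path: string | null): void {
      hovered = path
      sync()
    },
    selectField(path: string): void {
      pinned = path
      sync()
    },
    activePath,
    activeLocation,
  }
}
```

Nothing in the sketch reads from the viewer: fields call in and the target is only driven, which is what keeps the field component and the viewer unaware of each other.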
Source data
Sources come from Retab's GET /v1/extractions/{id}/sources endpoint
(ExtractionSourcesResponse): the extracted value tree plus a parallel sources
tree whose leaves are { value, source }. A Source is { content, anchor },
where the anchor is a discriminated union keyed by document format. For PDFs
it is pdf_bbox, carrying a page and a box normalized to [0, 1].
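A sketch of how a format-discriminated anchor narrows in TypeScript, covering four of the seven kinds. Only the kind values and the page on pdf_bbox come from this page; the bbox, row/column, and start/end field names are assumptions, and Anchor, ExtractedSource, and describeAnchor are hypothetical stand-ins rather than the library's SourceAnchor.

```ts
// Hypothetical stand-ins, not the library's SourceAnchor. Only the `kind` values
// and `page` on pdf_bbox come from this page; the other fields are assumptions.
type UnitBox = { left: number; top: number; width: number; height: number } // each in [0, 1]

type Anchor =
  | { kind: "pdf_bbox"; page: number; bbox: UnitBox }
  | { kind: "image_bbox"; bbox: UnitBox }
  | { kind: "csv_cell"; row: number; column: number }
  | { kind: "text_span"; start: number; end: number }

type ExtractedSource = { content: string; anchor: Anchor }

// Switching on `kind` narrows the anchor, so each branch sees only its own fields.
function describeAnchor(anchor: Anchor): string {
  switch (anchor.kind) {
    case "pdf_bbox":
      return `PDF page ${anchor.page}`
    case "image_bbox":
      return "image region"
    case "csv_cell":
      return `CSV row ${anchor.row}, column ${anchor.column}`
    case "text_span":
      return `characters ${anchor.start} to ${anchor.end}`
    default: {
      // Exhaustiveness check: adding a kind without a case fails to compile here.
      const unhandled: never = anchor
      return unhandled
    }
  }
}
```

The never-typed default is what makes a per-format union safe to extend: a new kind forces every switch over it to be updated.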
extractionSourcesToSourceMap flattens that tree into a path-keyed SourceMap.
The keys are dotted paths (owner.name, properties.0.gas_volume), the same
encoding react-hook-form and json-form use for field names, so sources and
fields join with no conversion.
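The flattening step can be sketched as a recursive walk that appends object keys and array indices to a dotted path. This is a hypothetical re-sketch (flattenSourcesTree, Leaf, and LeafSource are not library exports); it assumes a leaf is any non-array object with a source key and skips leaves whose source is null.

```ts
// Hypothetical re-sketch, not the library's implementation.
type LeafSource = { content: string; anchor: unknown }
type Leaf = { value: unknown; source: LeafSource | null }

// Assumption: any non-array object with a "source" key is a leaf.
function isLeaf(node: unknown): node is Leaf {
  return typeof node === "object" && node !== null && !Array.isArray(node) && "source" in node
}

function flattenSourcesTree(
  node: unknown,
  prefix = "",
  out: Record<string, LeafSource> = {},
): Record<string, LeafSource> {
  const join = (segment: string) => (prefix ? `${prefix}.${segment}` : segment)
  if (isLeaf(node)) {
    if (node.source) out[prefix] = node.source // assumption: leaves without a source are skipped
  } else if (Array.isArray(node)) {
    // Array items contribute their index as a segment: properties.0.gas_volume
    node.forEach((child, i) => flattenSourcesTree(child, join(String(i)), out))
  } else if (typeof node === "object" && node !== null) {
    for (const [key, child] of Object.entries(node)) flattenSourcesTree(child, join(key), out)
  }
  return out
}
```

Detecting leaves by a source key is a heuristic: an extracted field that is itself named source would be misread, which is one reason to use extractionSourcesToSourceMap rather than a hand-rolled walk.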
```ts
import { extractionSourcesToSourceMap } from "@/lib/document-source"

const sources = extractionSourcesToSourceMap(response.sources)
// { "owner.name": { content: "L3 RESOURCES LLC", anchor: { kind: "pdf_bbox", page: 1, … } }, … }
```

Composing json-form with the PDF viewer
Because json-form already emits the hovered field path through
setSourcesFieldPath, linking it to the viewer only takes wiring useSourceLink
between them, with no per-field glue.
```tsx
import * as React from "react"

import { extractionSourcesToSourceMap } from "@/lib/document-source"
import { useSourceLink } from "@/hooks/use-source-link"
import { UiForm, UiFormContent } from "@/components/json-form/json-form"
import { PdfViewer, type PdfViewerHandle } from "@/components/ui/pdf-viewer"
import {
  usePdfSourceTarget,
  renderPdfSourceOverlay,
} from "@/components/ui/pdf-source"

export function Example({ response, form }) {
  const viewerRef = React.useRef<PdfViewerHandle>(null)
  // Adapter that lets the mediator resolve sources against, and scroll, this viewer
  const target = usePdfSourceTarget(viewerRef)

  // Flatten only when the sources tree changes, so the map keeps a stable identity
  const sources = React.useMemo(
    () => extractionSourcesToSourceMap(response.sources),
    [response.sources]
  )
  const link = useSourceLink({ sources, target })

  return (
    <div className="flex h-full">
      <PdfViewer
        ref={viewerRef}
        src="/document.pdf"
        bare
        className="min-w-0 flex-1"
        /* Draws the active field's highlight on the page it belongs to */
        renderPageOverlay={renderPdfSourceOverlay(link.activeLocation)}
      />
      <UiForm
        form={form}
        schema={response.schema}
        /* json-form reports the hovered field path; the mediator does the rest */
        setSourcesFieldPath={link.onFieldHover}
        /* …other UiForm props… */
      >
        <UiFormContent />
      </UiForm>
    </div>
  )
}
```

To link a different field component, point its hover at link.onFieldHover
(passing null when the pointer leaves) and its click at link.selectField; that
is all extract-viewer-block does with its own field list. To support a different
viewer, write one adapter: a SourceTarget (resolve and scrollTo) and an overlay
renderer.
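A minimal sketch of such an adapter for a hypothetical single-image viewer, under stated assumptions: Region and SourceTargetLike only approximate SourceLocation and SourceTarget, the image_bbox anchor is assumed to carry a bbox of left/top/width/height in [0, 1], and scrollToPercent stands in for whatever scrolling API the viewer actually exposes. createImageSourceTarget and imageOverlayStyle are illustrative names.

```ts
// Hypothetical adapter for a single-image viewer; every name here is illustrative.
// Region and SourceTargetLike only approximate SourceLocation and SourceTarget.
type Region = { page: number; left: number; top: number; width: number; height: number } // percent
type AnyAnchor = { kind: string; [field: string]: unknown }
type SourceLike = { content: string; anchor: AnyAnchor }
type UnitBox = { left: number; top: number; width: number; height: number } // assumed [0, 1]

interface SourceTargetLike {
  resolve(source: SourceLike): Region | null
  scrollTo(region: Region): void
}

// scrollToPercent stands in for the viewer's own scrolling API (assumed, injected).
function createImageSourceTarget(scrollToPercent: (topPercent: number) => void): SourceTargetLike {
  return {
    resolve(source) {
      const anchor = source.anchor
      // Only image anchors belong to this viewer; other kinds resolve to nothing.
      if (anchor.kind !== "image_bbox") return null
      const box = anchor["bbox"] as UnitBox // assumed field name and shape
      // A single image is treated as page 1 (assumption).
      return { page: 1, left: box.left * 100, top: box.top * 100, width: box.width * 100, height: box.height * 100 }
    },
    scrollTo(region) {
      scrollToPercent(region.top)
    },
  }
}

// The overlay half: inline styles that position a highlight over the image.
function imageOverlayStyle(region: Region | null): Record<string, string> | null {
  if (!region) return null
  return {
    position: "absolute",
    left: `${region.left}%`,
    top: `${region.top}%`,
    width: `${region.width}%`,
    height: `${region.height}%`,
  }
}
```

Returning null for other anchor kinds is this sketch's way of saying "nothing to highlight", so one mediator can sit in front of a viewer that only understands its own format.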
API
document-source
| Export | Description |
|---|---|
| Source | { content, anchor }: a value's provenance. |
| SourceAnchor | discriminated union on kind: pdf_bbox, image_bbox, csv_cell, spreadsheet_cell, docx_text_span, docx_table_cell, text_span. |
| ExtractionSourcesResponse | the /v1/extractions/{id}/sources response shape. |
| SourceMap | Record<path, Source>: flat, keyed by dotted path. |
| extractionSourcesToSourceMap | (sources) => SourceMap; flattens the sources tree. |
| SourceLocation / SourceArea | a normalized page region (page + % box). |
| sourceLocationKey | a stable key for deduplicating locations. |
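One plausible use of a stable location key: when several fields resolve to the same region, keying by location lets an overlay draw one highlight instead of stacking duplicates. The sketch is hypothetical (regionKey, dedupeRegions, and PageBox are illustrative, and the real sourceLocationKey format is not documented here); it rounds coordinates to two decimals so floats that agree to two places share a key.

```ts
// Hypothetical stand-ins: the real SourceLocation shape and key format may differ.
type PageBox = { page: number; left: number; top: number; width: number; height: number } // percent

// A stable string key; rounding to two decimals makes floats that agree to two places collide.
function regionKey(r: PageBox): string {
  const f = (n: number) => n.toFixed(2)
  return [String(r.page), f(r.left), f(r.top), f(r.width), f(r.height)].join(":")
}

// Keep the first region for each key, preserving input order (Map keeps insertion order).
function dedupeRegions(regions: PageBox[]): PageBox[] {
  const seen = new Map<string, PageBox>()
  for (const r of regions) {
    const key = regionKey(r)
    if (!seen.has(key)) seen.set(key, r)
  }
  return Array.from(seen.values())
}
```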
useSourceLink
useSourceLink({ sources, target }) →
| Member | Description |
|---|---|
| onFieldHover(path \| null) | report the hovered field (null when none); wire it to setSourcesFieldPath. |
| selectField(path) | pin a field (e.g. on click). |
| activeLocation | the location to highlight; feed it to the overlay. |
| activePath / activeSource | the active field and its source. |
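A custom field list needs only the first two members, plus activePath if it shows its own active state. The helper below is hypothetical: fieldRowHandlers is an illustrative name, LinkLike is a structural stand-in for the hook's return value rather than its real type, and activePath is assumed to be a plain value, as the table suggests.

```ts
// Structural stand-in for the members used here; the hook's real return type has more.
type LinkLike = {
  onFieldHover(path: string | null): void
  selectField(path: string): void
  activePath: string | null // assumed to be a plain value
}

// Hypothetical helper: props a custom field row can spread onto its element.
function fieldRowHandlers(link: LinkLike, path: string) {
  return {
    onMouseEnter: () => link.onFieldHover(path),
    onMouseLeave: () => link.onFieldHover(null), // clear the hover when the pointer leaves
    onClick: () => link.selectField(path), // pin the field
    "data-active": link.activePath === path, // lets the row style its own active state
  }
}
```

Because activePath is read when the helper is called, calling it during render keeps data-active in step with the link's current state.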
pdf-source
| Export | Description |
|---|---|
| usePdfSourceTarget(ref) | a stable SourceTarget over a PdfViewer ref. |
| renderPdfSourceOverlay(location) | a renderPageOverlay that draws the highlight on its page. |
| pdfAnchorToLocation(anchor) | converts a normalized bbox to a page-% location. |
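The conversion pdfAnchorToLocation performs can be sketched as scaling a [0, 1] box to percentages of the page. This version is hypothetical: pdfBboxToPercent, PdfBboxAnchor, and the left/top/width/height field names are assumptions, and clamping boxes that spill past the page edge is this sketch's choice, not documented behavior.

```ts
// Hypothetical re-sketch of the normalized-bbox to page-percent conversion.
// Assumed anchor fields: page plus a [0, 1] box of left/top/width/height.
type PdfBboxAnchor = {
  kind: "pdf_bbox"
  page: number
  bbox: { left: number; top: number; width: number; height: number }
}
type PagePercentRegion = { page: number; left: number; top: number; width: number; height: number }

const clamp01 = (n: number) => Math.min(1, Math.max(0, n))

function pdfBboxToPercent(anchor: PdfBboxAnchor): PagePercentRegion {
  const { left, top, width, height } = anchor.bbox
  const l = clamp01(left)
  const t = clamp01(top)
  // Clamp the far edges too, so a box that spills off the page is trimmed to it.
  const r = clamp01(left + width)
  const b = clamp01(top + height)
  return { page: anchor.page, left: l * 100, top: t * 100, width: (r - l) * 100, height: (b - t) * 100 }
}
```

A likely reason for percentages: the overlay is positioned relative to the rendered page rather than in pixels, so the same location stays correct at any zoom level.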
Both compositions ship as blocks: the Extract Viewer (a field list) and JSON Form Sources (json-form).