Virtualized PDF Rendering System

Overview

The PDF annotation system virtualizes page rendering so that large documents stay responsive. Instead of rendering every page at once, it renders only the pages visible in the viewport plus a small overscan buffer, which reduces both rendering work and memory usage.

Architecture

Core Concept

The virtualization system works by:

1. Calculating the heights of all pages at the current zoom level
2. Determining which pages are visible in the viewport
3. Rendering only those pages (plus overscan)
4. Ensuring the pages of selected items are always rendered

Component Structure

PDF.tsx (Virtualization Engine)
├── Manages visible page range
├── Handles scroll events
├── Calculates page positions
└── Renders PDFPage components conditionally
    └── PDFPage.tsx (Individual Page)
        ├── Renders PDF canvas when visible
        ├── Displays annotations for the page
        └── Manages its own lifecycle

Implementation Details

Page Height Calculation

When the PDF loads or the zoom level changes, the height of every page is recomputed and cached:

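// In PDF.tsx
useEffect(() => {
  if (!pdfDoc) return;
  let cancelled = false; // guards against a stale run finishing after a newer one
  (async () => {
    const h: number[] = [];
    for (let i = 1; i <= pdfDoc.numPages; i++) {
      const page = await pdfDoc.getPage(i); // pdf.js page numbers are 1-based
      // 32px of extra vertical space is added to each page
      h.push(page.getViewport({ scale: zoomLevel }).height + 32);
    }
    if (!cancelled) setPageHeights(h); // Cache heights at this zoom level
  })();
  return () => {
    cancelled = true; // pdfDoc or zoomLevel changed, or the component unmounted
  };
}, [pdfDoc, zoomLevel]);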
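The loop above awaits each page in turn. As a possible variation (not the approach taken in PDF.tsx), the same heights could be gathered by requesting all pages concurrently. The sketch below is written against minimal structural types so it can run outside the component. `PageLike`, `DocLike`, `PAGE_GAP`, and `computePageHeights` are hypothetical names; only `numPages`, `getPage`, and `getViewport` come from the snippet above.

```typescript
// Minimal structural types: only the members this sketch touches.
interface PageLike {
  getViewport(params: { scale: number }): { height: number };
}
interface DocLike {
  numPages: number;
  getPage(pageNumber: number): Promise<PageLike>;
}

// Assumed name for the 32px added to every page height in the snippet above.
const PAGE_GAP = 32;

// Requests every page concurrently, then measures each one at the given zoom.
async function computePageHeights(
  doc: DocLike,
  zoomLevel: number,
  gap: number = PAGE_GAP
): Promise<number[]> {
  // Page numbers are 1-based, so build [1, 2, ..., numPages].
  const pageNumbers = Array.from({ length: doc.numPages }, (_, i) => i + 1);
  const pages = await Promise.all(pageNumbers.map((n) => doc.getPage(n)));
  return pages.map((page) => page.getViewport({ scale: zoomLevel }).height + gap);
}
```

Whether concurrent requests are actually faster depends on how the PDF library schedules page loads, so this is worth measuring before adopting.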

Cumulative Heights

A prefix-sum array of the page heights gives each page's top offset in constant time:

const cumulative = useMemo(() => {
  const out: number[] = [0];
  for (let i = 0; i < pageHeights.length; i++) {
    out.push(out[i] + pageHeights[i]);
  }
  return out; // cumulative[i] = top position of page i
}, [pageHeights]);
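To make the shape of this array concrete, here is the same prefix-sum logic as a standalone function with a small worked input. `buildCumulative` is a hypothetical name used only for this illustration.

```typescript
// Prefix sums: out[i] is the top offset of page i, and the final
// element is the total height of all pages.
function buildCumulative(pageHeights: number[]): number[] {
  const out: number[] = [0];
  for (let i = 0; i < pageHeights.length; i++) {
    out.push(out[i] + pageHeights[i]);
  }
  return out;
}

// Three pages of 832px, 832px, and 632px (heights already include the gap).
const exampleCumulative = buildCumulative([832, 832, 632]);
// exampleCumulative is [0, 832, 1664, 2296]
```

Note that the array has one more element than there are pages: the last entry is the bottom of the final page, which is what the spacer element uses as the total scroll height.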

Visible Range Detection

Because the cumulative array is sorted, the first and last visible pages can be found with binary search in O(log n) time:

const calcRange = useCallback(() => {
  const el = getScrollElement();
  const scroll = /* current scroll position */;
  const viewH = /* viewport height */;

  // Binary search for first visible page (valid indices are 0..pageCount - 1)
  let lo = 0, hi = Math.max(0, cumulative.length - 2);
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (cumulative[mid + 1] < scroll) lo = mid + 1;
    else hi = mid;
  }
  const first = lo;

  // Find last visible page
  const limit = scroll + viewH;
  // ... binary search for last visible

  // Add overscan for smooth scrolling
  const overscan = 2;
  let start = Math.max(0, first - overscan);
  let end = Math.min(pageCount - 1, last + overscan);

  setRange([start, end]);
}, [/* dependencies */]);
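The snippet above elides the second search and the scroll measurements. As a self-contained illustration of the same idea, the sketch below takes the scroll position and viewport height as plain arguments and runs both searches. `firstPageEndingAtOrAfter` and `computeVisibleRange` are hypothetical names, and returning `[0, -1]` for an empty document is an assumption of this sketch, not something the source specifies.

```typescript
// Index of the first page whose bottom edge is at or below `offset`,
// clamped to the last page. cumulative[i + 1] is the bottom of page i.
function firstPageEndingAtOrAfter(cumulative: number[], offset: number): number {
  let lo = 0;
  let hi = Math.max(0, cumulative.length - 2); // last valid page index
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (cumulative[mid + 1] < offset) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Inclusive [start, end] page range covering the viewport plus overscan.
function computeVisibleRange(
  cumulative: number[],
  scroll: number,
  viewH: number,
  overscan: number = 2
): [number, number] {
  const pageCount = cumulative.length - 1;
  if (pageCount <= 0) return [0, -1]; // assumed convention for "no pages"

  const first = firstPageEndingAtOrAfter(cumulative, scroll);
  const last = firstPageEndingAtOrAfter(cumulative, scroll + viewH);

  return [
    Math.max(0, first - overscan),
    Math.min(pageCount - 1, last + overscan),
  ];
}
```

For ten pages of 100px each, a scroll position of 450 and a viewport height of 200 put pages 4 through 6 on screen, so an overscan of 2 yields the range `[2, 8]`.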

Smart Range Expansion

After the visible range is computed, it is widened so that the page of any selected annotation, search result, or chat source highlight is always rendered:

// Force selected annotation's page to be visible
if (selectedPageIdx !== undefined) {
  start = Math.min(start, selectedPageIdx);
  end = Math.max(end, selectedPageIdx);
}

// Same for search results
if (selectedSearchPageIdx !== undefined) {
  start = Math.min(start, selectedSearchPageIdx);
  end = Math.max(end, selectedSearchPageIdx);
}

// And chat source highlights
if (selectedChatSourcePageIdx !== undefined) {
  start = Math.min(start, selectedChatSourcePageIdx);
  end = Math.max(end, selectedChatSourcePageIdx);
}
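The three blocks above apply the same min/max widening. One way to express that once is a small helper that accepts any number of optional page indices. `expandRangeToInclude` is a hypothetical name, not a function from the codebase.

```typescript
// Widens an inclusive [start, end] range so it contains every defined index.
// Undefined entries (nothing selected) are skipped.
function expandRangeToInclude(
  range: [number, number],
  ...pageIndices: Array<number | undefined>
): [number, number] {
  let [start, end] = range;
  for (const idx of pageIndices) {
    if (idx === undefined) continue;
    start = Math.min(start, idx);
    end = Math.max(end, idx);
  }
  return [start, end];
}

// Viewport covers pages 4..8; an annotation on page 2 and a search hit on
// page 11 are selected, and no chat source is active.
const widened = expandRangeToInclude([4, 8], 2, 11, undefined);
// widened is [2, 11]
```

Because the range is a single contiguous interval, a selection far from the viewport also pulls in every page between it and the visible pages. That keeps the logic simple at the cost of extra rendering while such a selection is active.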

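Rendering Loop

Every page gets an absolutely positioned placeholder of the correct height, but a PDFPage component is mounted only for pages inside the range: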

return (
  <div style={{ position: "relative" }}>
    {pageInfos.map((pInfo, idx) => {
      const top = cumulative[idx];
      const height = pageHeights[idx];
      const visible = idx >= range[0] && idx <= range[1];

      return (
        <div
          key={pInfo.page.pageNumber}
          style={{
            position: "absolute",
            top,
            height,
            width: "100%",
          }}
        >
          {visible && (
            <PDFPage
              pageInfo={pInfo}
              /* other props */
            />
          )}
        </div>
      );
    })}
    {/* Spacer maintains correct scroll height */}
    <div style={{ height: cumulative[cumulative.length - 1] }} />
  </div>
);

Scroll-to-Annotation System

Scrolling to a specific item happens in two phases, because the target element may not exist in the DOM until its page has been rendered:

Phase 1: Page-Level Scroll (PDF.tsx)

When an annotation is selected:

1. Calculate which page contains the annotation
2. Scroll the container so the page is visible
3. Set a pending scroll ID for phase 2

useEffect(() => {
  if (selectedAnnotations.length === 0 || pageHeights.length === 0) return;
  if (selectedPageIdx === undefined) return;

  const targetId = selectedAnnotations[0];

  // Scroll to page
  const topOffset = Math.max(0, cumulative[selectedPageIdx] - 32);
  getScrollElement().scrollTo({ top: topOffset, behavior: "smooth" });

  // Tell PDFPage to center the annotation
  setPendingScrollId(targetId);
}, [selectedAnnotations, selectedPageIdx, /* ... */]);

Phase 2: Element-Level Scroll (PDFPage.tsx)

Once the page is rendered:

1. PDFPage checks for pending scroll requests
2. It finds the specific annotation element
3. It scrolls the element into view, centered in the viewport

useEffect(() => {
  if (!hasPdfPageRendered || !pendingScrollId) return;

  const pageOwnsAnnotation = /* check if annotation is on this page */;
  if (!pageOwnsAnnotation) return;

  let cancelled = false;
  const tryScroll = () => {
    if (cancelled) return;
    const el = document.querySelector(`.selection_${pendingScrollId}`);
    if (el) {
      el.scrollIntoView({ behavior: "smooth", block: "center" });
      setPendingScrollId(null); // Clear the pending request
    } else {
      requestAnimationFrame(tryScroll); // Element not mounted yet; retry next frame
    }
  };
  tryScroll();

  // Stop retrying when the effect re-runs or the page unmounts
  return () => {
    cancelled = true;
  };
}, [hasPdfPageRendered, pendingScrollId, /* ... */]);
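The retry loop above keeps polling until the element appears. If the element never mounts, the loop never ends, so one possible refinement is to cap the number of attempts. The sketch below isolates that retry logic from React and the DOM by taking the lookup and the frame scheduler as parameters. `retryUntilFound`, `Schedule`, and the default of 60 attempts are assumptions made for this illustration.

```typescript
type Schedule = (cb: () => void) => void;

// Calls `find` once immediately and then once per scheduled tick until it
// returns a non-null value, the attempt limit is hit, or the returned
// cancel function is called.
function retryUntilFound<T>(
  find: () => T | null,
  onFound: (value: T) => void,
  schedule: Schedule,
  maxAttempts: number = 60
): () => void {
  let cancelled = false;
  let attempts = 0;

  const attempt = () => {
    if (cancelled) return;
    attempts++;
    const value = find();
    if (value !== null) {
      onFound(value);
    } else if (attempts < maxAttempts) {
      schedule(attempt); // try again on the next tick
    }
  };

  attempt();
  return () => {
    cancelled = true;
  };
}
```

In a component, `schedule` would wrap `requestAnimationFrame`, `find` would wrap the `document.querySelector` call, and the returned function would serve as the effect's cleanup.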

Performance Benefits

Memory Usage

- Only visible pages (plus overscan) hold rendered canvases
- Annotations for non-visible pages aren't mounted
- The savings grow with document length and are most noticeable for documents with 100+ pages

Rendering Performance

- Initial load renders only the visible pages
- Scrolling renders only newly visible pages
- Zoom changes re-render only the mounted pages, although page heights are recalculated for every page

Smooth Scrolling

- Overscan ensures pages are rendered before they scroll into view
- Cached heights avoid recomputing page positions on every scroll event
- requestAnimationFrame aligns scroll work with the browser's frame timing

Configuration

Overscan Amount

const overscan = 2; // Pages to render above/below viewport

Scroll Container

The system supports both window scrolling and container scrolling:

const getScrollElement = useCallback((): HTMLElement | Window => {
  const el = scrollContainerRef?.current;
  if (el && el.scrollHeight > el.clientHeight) return el;
  return window; // Fallback to window scrolling
}, [scrollContainerRef]);
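Because `getScrollElement` can return either an element or the window, any code that reads the scroll position has to handle both shapes. The sketch below shows one way to normalize them. It uses minimal structural types so it runs without a DOM. `WindowLike`, `ElementLike`, and `getScrollMetrics` are hypothetical names, while `scrollTop`, `clientHeight`, `scrollY`, and `innerHeight` are the standard DOM properties.

```typescript
// Only the properties this sketch reads from each kind of scroll target.
interface WindowLike {
  scrollY: number;
  innerHeight: number;
}
interface ElementLike {
  scrollTop: number;
  clientHeight: number;
}

// Returns the current scroll offset and viewport height for either target.
function getScrollMetrics(
  target: WindowLike | ElementLike
): { scroll: number; viewH: number } {
  if ("scrollTop" in target) {
    // Container scrolling: the element carries its own offset and height.
    return { scroll: target.scrollTop, viewH: target.clientHeight };
  }
  // Window scrolling: use the page-level offset and the window height.
  return { scroll: target.scrollY, viewH: target.innerHeight };
}
```

A real `HTMLElement` and the global `window` both satisfy these interfaces, so the value returned by `getScrollElement()` can be passed in directly.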

Best Practices

1. Keep overscan reasonable - Too much defeats the benefits of virtualization
2. Cache computations - Page heights are expensive to calculate
3. Use binary search - A linear scan on every scroll event adds up for large documents
4. Handle edge cases - The pages of selected items must always be rendered
5. Throttle scroll handling - Use requestAnimationFrame so the range is recalculated at most once per frame
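As an illustration of the last point, the sketch below coalesces any number of scroll events into at most one call per scheduled frame. The scheduler is passed in so the function does not depend on browser globals. `frameThrottle` and `FrameScheduler` are hypothetical names, not part of the codebase.

```typescript
type FrameScheduler = (cb: () => void) => void;

// Wraps `fn` so that repeated calls within the same frame run it only once.
function frameThrottle(fn: () => void, schedule: FrameScheduler): () => void {
  let queued = false;
  return () => {
    if (queued) return; // a call is already waiting for the next frame
    queued = true;
    schedule(() => {
      queued = false;
      fn();
    });
  };
}
```

In the browser, the scheduler would typically be `(cb) => requestAnimationFrame(cb)`, and the wrapped function would be the range calculation attached to the scroll listener.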

Future Enhancements

1. Dynamic overscan - Adjust the overscan based on scroll velocity
2. Progressive rendering - Show a low-resolution preview while scrolling
3. Intersection Observer - Potentially more efficient visibility detection
4. Memory pressure handling - Reduce overscan under memory constraints
5. Predictive preloading - Anticipate the scroll direction