PDF Annotation System Overview
Table of Contents
- Key Questions
- High-level Architecture
- Layer System
- Component Hierarchy
- Major Features
- Virtualized Rendering System
- State Management
- Specific Component Deep Dives
 
Key Questions
1. How is the PDF loaded?
- The PDF is loaded in the `DocumentKnowledgeBase` component when it receives document data from the GraphQL query.
- The component uses `pdfjs-dist` to load the PDF file specified by `document.pdfFile`.
- Loading progress is tracked and displayed to the user.
- Once loaded, the PDF document proxy and PAWLS parsing data are combined to create `PDFPageInfo` objects for each page.
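The combining step can be sketched as follows. This is a minimal sketch, not the component's actual code: it assumes the PAWLS file is an array of pages shaped like `{ page: { width, height, index }, tokens: [...] }` with a 0-based `index`, and `PDFPageInfoSketch`, `buildPageInfos`, and the `onProgress` callback are hypothetical names. In the real flow, `doc` would be the `PDFDocumentProxy` resolved by pdfjs-dist's `getDocument(url).promise`; only its `numPages` and `getPage(n)` members are modeled here, so the sketch runs without the library.

```typescript
// Structural stand-ins for the two pdfjs-dist members this sketch needs.
// pdfjs-dist page numbers are 1-based.
interface PageProxyLike {
  pageNumber: number;
}

interface DocumentProxyLike {
  numPages: number;
  getPage(pageNumber: number): Promise<PageProxyLike>;
}

// Assumed PAWLS layout: one entry per page, with a 0-based page index.
interface PawlsToken {
  x: number;
  y: number;
  width: number;
  height: number;
  text: string;
}

interface PawlsPage {
  page: { width: number; height: number; index: number };
  tokens: PawlsToken[];
}

// Hypothetical stand-in for the real PDFPageInfo class.
interface PDFPageInfoSketch {
  page: PageProxyLike;
  tokens: PawlsToken[];
  bounds: { width: number; height: number };
}

async function buildPageInfos(
  doc: DocumentProxyLike,
  pawls: PawlsPage[],
  onProgress?: (loaded: number, total: number) => void,
): Promise<PDFPageInfoSketch[]> {
  // Index PAWLS pages by page index so file order does not matter.
  const pawlsByIndex = new Map<number, PawlsPage>();
  for (const p of pawls) pawlsByIndex.set(p.page.index, p);

  const infos: PDFPageInfoSketch[] = [];
  for (let n = 1; n <= doc.numPages; n++) {
    const page = await doc.getPage(n);
    const pawlsPage = pawlsByIndex.get(n - 1); // PAWLS index is 0-based
    infos.push({
      page,
      tokens: pawlsPage ? pawlsPage.tokens : [],
      bounds: pawlsPage
        ? { width: pawlsPage.page.width, height: pawlsPage.page.height }
        : { width: 0, height: 0 },
    });
    onProgress?.(n, doc.numPages); // drives the loading indicator
  }
  return infos;
}
```

Pages are fetched sequentially so progress can be reported page by page; a `Promise.all` over `getPage` calls would be faster but loses that ordering.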
2. Where and how are annotations loaded?
- Annotations are loaded via the `GET_DOCUMENT_KNOWLEDGE_AND_ANNOTATIONS` GraphQL query in `DocumentKnowledgeBase`.
- The query fetches:
  - Document metadata and file paths
  - All annotations, returned as two separate arrays:
    - `allAnnotations`: regular user/system annotations
    - `allStructuralAnnotations`: structural markup annotations (sections, paragraphs, etc.)
  - Document type annotations
  - Annotation relationships (with a `structural` boolean property)
  - Corpus label information
  - Document notes and relationships
  - Summary version history
- Annotations are processed and stored in separate Jotai atoms:
  - `pdfAnnotationsAtom`: regular annotations only
  - `structuralAnnotationsAtom`: structural annotations only (kept separate to prevent duplication)
  - `allAnnotationsAtom`: a computed atom that merges and deduplicates both arrays
- Each annotation has a `structural: boolean` property for filtering.
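A derived atom like `allAnnotationsAtom` amounts to a pure merge-and-deduplicate function over the two source arrays. The sketch below shows that function on its own; the `AnnotationLike` shape and the `mergeAnnotations` and `onlyStructural` names are hypothetical, and it assumes annotations are deduplicated by `id` with the regular copy winning. In Jotai it would be wrapped in a read-only derived atom, for example `atom((get) => mergeAnnotations(get(pdfAnnotationsAtom), get(structuralAnnotationsAtom)))`, so it recomputes whenever either source atom changes.

```typescript
// Hypothetical minimal annotation shape; the real type carries many more fields.
interface AnnotationLike {
  id: string;
  page: number;
  structural: boolean;
}

// Merge regular and structural annotations, keeping the first copy of each id.
// Regular annotations come first, so they win if an id appears in both arrays.
function mergeAnnotations(
  regular: AnnotationLike[],
  structural: AnnotationLike[],
): AnnotationLike[] {
  const seen = new Set<string>();
  const merged: AnnotationLike[] = [];
  for (const annotation of [...regular, ...structural]) {
    if (seen.has(annotation.id)) continue;
    seen.add(annotation.id);
    merged.push(annotation);
  }
  return merged;
}

// The `structural` flag still lets consumers split the merged list back apart.
function onlyStructural(all: AnnotationLike[]): AnnotationLike[] {
  return all.filter((a) => a.structural);
}
```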
3. Where is the PAWLS layer loaded?
- PAWLS data is loaded alongside the PDF in `DocumentKnowledgeBase`.
- The `getPawlsLayer` function fetches the token data from `document.pawlsParseFile`.
- PAWLS data provides token-level information for each page, enabling precise text selection and annotation.
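To illustrate why token-level data enables precise selection: a rectangle drawn on the rendered page can be mapped back to PAWLS tokens with a simple overlap test. This is a hedged sketch; it assumes PAWLS token boxes use the top-left `x`/`y`/`width`/`height` convention in unscaled page coordinates, and that the on-screen rectangle only needs dividing by the zoom `scale`. `tokensInSelection` and the `Rect` type are hypothetical, not the project's API.

```typescript
// Assumed PAWLS shapes (unscaled page coordinates, origin at top-left).
interface PawlsToken {
  x: number;
  y: number;
  width: number;
  height: number;
  text: string;
}

interface PawlsPage {
  page: { width: number; height: number; index: number };
  tokens: PawlsToken[];
}

// Selection rectangle in rendered (zoomed) pixels.
interface Rect {
  left: number;
  top: number;
  right: number;
  bottom: number;
}

// Return indices of tokens whose boxes overlap the selection.
function tokensInSelection(pawlsPage: PawlsPage, selection: Rect, scale: number): number[] {
  // Convert the on-screen selection back into unscaled PAWLS coordinates.
  const sel = {
    left: selection.left / scale,
    top: selection.top / scale,
    right: selection.right / scale,
    bottom: selection.bottom / scale,
  };
  const hits: number[] = [];
  pawlsPage.tokens.forEach((t, i) => {
    const overlaps =
      t.x < sel.right && t.x + t.width > sel.left &&
      t.y < sel.bottom && t.y + t.height > sel.top;
    if (overlaps) hits.push(i);
  });
  return hits;
}
```

Returning token indices rather than raw text keeps the selection stable under zoom changes, since the indices do not depend on the render scale.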
 
High-level Architecture
The PDF annotation system uses a sophisticated dual-layer architecture:
- Document Layer: traditional PDF/text viewing with annotations
- Knowledge Layer: summary view with version history and editing
 
Key architectural components:
- Virtualized Rendering: only visible pages are rendered, for performance
- State Management with Jotai: centralized, reactive state management
- Computed Derivations: automatic updates when dependencies change
- Unified Feed System: combines notes, annotations, and relationships in one view
- Summary Versioning: Git-like version control for document summaries
- Resizable Panels: flexible layout with chat panel width management
 
Layer System
The `DocumentKnowledgeBase` implements a dual-layer architecture:

Document Layer
- PDF/text document viewing with annotations
- Search functionality
- Annotation creation and editing
- Extract and analysis results
- Traditional document interaction

Knowledge Layer
- Document summary viewing and editing
- Version history browsing
- Markdown-based content
- Knowledge synthesis view
 
Users can switch between layers based on their current task, with some features (like chat) available in both layers.
Component Hierarchy
DocumentKnowledgeBase
├── Layer Management (activeLayer: "knowledge" | "document")
├── Tab Navigation System
│   ├── Summary (knowledge layer)
│   ├── Chat (both layers)
│   ├── Notes (both layers)
│   ├── Relationships (both layers)
│   ├── Annotations (document layer)
│   ├── Relations (document layer)
│   ├── Search (document layer)
│   ├── Analyses (document layer)
│   └── Extracts (document layer)
├── Document Layer Components
│   ├── PDF (Virtualization Layer)
│   │   └── PDFPage (Rendered only when visible)
│   ├── TxtAnnotatorWrapper (for text files)
│   ├── FloatingDocumentControls
│   ├── FloatingDocumentInput
│   └── ZoomControls
├── Knowledge Layer Components
│   ├── UnifiedKnowledgeLayer
│   ├── VersionHistorySidebar
│   └── Markdown Editor/Viewer
├── Shared Components
│   ├── UnifiedContentFeed (feed mode)
│   ├── ChatTray
│   ├── FloatingSummaryPreview (PiP view)
│   └── UnifiedLabelSelector
└── Resizable Right Panel System

Major Features
1. Unified Feed System
Components: UnifiedContentFeed, SidebarControlBar (references in 4:1940-1973 and 4:1988-2042)
The unified feed combines multiple content types into a single, filterable view:
- Notes
- Annotations
- Relationships
- Search results

Features:
- Filter by content type
- Sort by page order or chronologically
- Seamless switching between chat mode and feed mode
- Real-time updates as content changes
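The filter-and-sort behavior can be sketched as a pure function over a combined item list. The `FeedItem` shape, the `buildFeed` name, and the tie-breaking rules (creation time within a page; page-less items such as document-level notes placed last in page order) are assumptions for illustration, not the `UnifiedContentFeed` implementation.

```typescript
type FeedItemType = "note" | "annotation" | "relationship" | "search";
type SortMode = "page" | "chronological";

// Hypothetical normalized feed entry.
interface FeedItem {
  id: string;
  type: FeedItemType;
  page?: number; // absent for document-level items
  createdAt: string; // ISO-8601 timestamp
}

function buildFeed(items: FeedItem[], enabledTypes: Set<FeedItemType>, sortMode: SortMode): FeedItem[] {
  // filter() returns a new array, so sorting it does not mutate the input.
  const filtered = items.filter((item) => enabledTypes.has(item.type));
  return filtered.sort((a, b) => {
    if (sortMode === "page") {
      // Items without a page sort after all paged items.
      const pageA = a.page ?? Number.POSITIVE_INFINITY;
      const pageB = b.page ?? Number.POSITIVE_INFINITY;
      if (pageA !== pageB) return pageA - pageB;
    }
    // Chronological order, also used as the tie-breaker within a page.
    return Date.parse(a.createdAt) - Date.parse(b.createdAt);
  });
}
```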
2. Summary Version History
Hook: useSummaryVersions (referenced in 4:1704-1713)
Git-like version control for document summaries:
- View all previous versions
- Compare changes between versions
- Create new versions when editing
- Author and timestamp tracking
- Revert to previous versions
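One way to get the Git-like behavior listed above is an append-only history in which reverting creates a new version carrying old content instead of deleting later versions, much like `git revert`. The sketch is hypothetical: `SummaryVersion`, `createVersion`, and `revertTo` are illustrative names, and whether `useSummaryVersions` reverts this way is an assumption.

```typescript
// Hypothetical version record with author and timestamp tracking.
interface SummaryVersion {
  version: number;
  content: string;
  author: string;
  createdAt: string; // ISO-8601
}

// Append a new version; history is never rewritten.
function createVersion(
  history: SummaryVersion[],
  content: string,
  author: string,
  now: Date = new Date(),
): SummaryVersion[] {
  const latest = history.reduce((max, v) => Math.max(max, v.version), 0);
  return [...history, { version: latest + 1, content, author, createdAt: now.toISOString() }];
}

// "Revert" by re-publishing an older version's content as the newest version.
function revertTo(history: SummaryVersion[], version: number, author: string, now?: Date): SummaryVersion[] {
  const target = history.find((v) => v.version === version);
  if (!target) throw new Error(`Unknown version ${version}`);
  return createVersion(history, target.content, author, now);
}
```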
3. Floating Summary Preview
Component: FloatingSummaryPreview (referenced in 4:2099-2124)
Picture-in-picture style preview that:
- Shows the current summary while in the document layer
- Allows quick switching to the knowledge layer
- Updates in real time
- Can be minimized or expanded
4. Chat Panel Width Management
Hook: useChatPanelWidth (referenced in 4:280-291)
Sophisticated resizable panel system:
- Preset sizes: quarter (25%), half (50%), full (90%)
- Custom width with a drag handle
- Auto-minimize when hovering over the document
- Persistent width preferences
- Smooth animations
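The preset sizes translate into pixel widths roughly as follows. The fractions come from the list above; the `WidthMode` type, the `panelWidthPx` function, the 320 px minimum, the half-width fallback for an unset custom width, and the choice to clamp custom widths to the 90% maximum are hypothetical, not taken from `useChatPanelWidth`.

```typescript
type WidthMode = "quarter" | "half" | "full" | "custom";

// Preset fractions of the viewport width, as listed above.
const PRESET_FRACTIONS: Record<Exclude<WidthMode, "custom">, number> = {
  quarter: 0.25,
  half: 0.5,
  full: 0.9,
};

// Hypothetical lower bound so the chat stays usable on narrow presets.
const MIN_WIDTH_PX = 320;

function panelWidthPx(mode: WidthMode, viewportWidth: number, customWidthPx?: number): number {
  const maxWidth = viewportWidth * PRESET_FRACTIONS.full;
  const raw =
    mode === "custom"
      ? customWidthPx ?? viewportWidth * PRESET_FRACTIONS.half // fallback if no drag yet
      : viewportWidth * PRESET_FRACTIONS[mode];
  // Clamp between the minimum and the "full" preset, then round to whole pixels.
  return Math.round(Math.min(maxWidth, Math.max(MIN_WIDTH_PX, raw)));
}
```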
5. Tab-based Navigation
Array: allTabs (defined in 4:1223-1272)
Organized sidebar navigation with:
- Icons and labels for each feature
- Layer-aware tabs (some only in the document layer)
- Visual indicators for the active tab
- Collapsible sidebar on hover
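Layer awareness can be modeled by tagging each tab with the layer it belongs to. The layer assignments below follow the component hierarchy; the `TabSpec` shape, the helper names, and the rule that clicking a tab belonging to the other layer switches `activeLayer` are assumptions, not the actual `allTabs` definition or click handler.

```typescript
type Layer = "knowledge" | "document";

// Hypothetical tab descriptor; "both" marks tabs usable in either layer.
interface TabSpec {
  key: string;
  label: string;
  layer: Layer | "both";
}

const tabsSketch: TabSpec[] = [
  { key: "summary", label: "Summary", layer: "knowledge" },
  { key: "chat", label: "Chat", layer: "both" },
  { key: "notes", label: "Notes", layer: "both" },
  { key: "relationships", label: "Relationships", layer: "both" },
  { key: "annotations", label: "Annotations", layer: "document" },
  { key: "relations", label: "Relations", layer: "document" },
  { key: "search", label: "Search", layer: "document" },
  { key: "analyses", label: "Analyses", layer: "document" },
  { key: "extracts", label: "Extracts", layer: "document" },
];

// Assumed rule: "both" tabs keep the current layer; other tabs switch to their own.
function layerAfterTabClick(tab: TabSpec, current: Layer): Layer {
  return tab.layer === "both" ? current : tab.layer;
}

// Tabs that can be used without leaving the given layer.
function tabsInLayer(layer: Layer): TabSpec[] {
  return tabsSketch.filter((t) => t.layer === layer || t.layer === "both");
}
```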
6. Note Management System
Components: NoteModal, NotesGrid, PostItNote (imported in 4:147)
Rich note-taking features:
- Sticky-note visual style
- Markdown content support
- Edit and create capabilities
- Author attribution
- Chronological organization
7. Extract and Analysis Management
Components: ExtractTraySelector, AnalysisTraySelector (imported in 4:139-140)
Document analysis features:
- Run custom analyzers on documents
- View extract results in a structured format
- Create new extracts with fieldsets
- Single-document results view
8. Floating Controls
Components: FloatingDocumentControls, FloatingDocumentInput, ZoomControls
Modern floating UI elements:
- Zoom in/out controls
- Quick chat/search input
- Document action buttons
- Context-aware visibility
- Annotation Controls:
  - Shown when the right panel is closed
  - Provide the same filtering options as the sidebar
  - Label display settings (Always/On Hover/Hide)
  - Label filters for selective viewing
  - Structural annotation toggle
9. Structural Annotation System
Atoms: structuralAnnotationsAtom, showStructuralAnnotationsAtom
Sophisticated handling of structural annotations:
- Separate Storage: structural annotations are stored separately from regular annotations
- Performance Optimization: hidden by default to reduce visual noise
- Smart Toggle: enabling the structural view automatically enables "Show Selected Only"
- Unified Filtering: a single useVisibleAnnotations hook handles all visibility logic
- Backend Consistency: mirrors the backend's separation of annotation types
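The visibility rules and the smart toggle can be sketched as one filter plus one state-update helper, in the spirit of `useVisibleAnnotations`. This is a minimal sketch under stated assumptions: `VisibilityPrefs`, `visibleAnnotations`, and `setShowStructural` are hypothetical names, an empty label filter is assumed to mean "show all labels", and turning the structural view off is assumed to leave "Show Selected Only" unchanged.

```typescript
// Hypothetical minimal annotation shape.
interface AnnotationLike {
  id: string;
  structural: boolean;
  label: string;
}

interface VisibilityPrefs {
  showStructural: boolean;
  showSelectedOnly: boolean;
  selectedIds: Set<string>;
  labelFilter: Set<string>; // empty set means "no label filter" (assumption)
}

// One place that decides visibility, so every consumer applies the same rules.
function visibleAnnotations(all: AnnotationLike[], prefs: VisibilityPrefs): AnnotationLike[] {
  return all.filter((a) => {
    if (a.structural && !prefs.showStructural) return false; // hidden by default
    if (prefs.showSelectedOnly && !prefs.selectedIds.has(a.id)) return false;
    if (prefs.labelFilter.size > 0 && !prefs.labelFilter.has(a.label)) return false;
    return true;
  });
}

// Smart toggle: turning structural annotations on also turns on "Show Selected Only".
function setShowStructural(prefs: VisibilityPrefs, enabled: boolean): VisibilityPrefs {
  return {
    ...prefs,
    showStructural: enabled,
    showSelectedOnly: enabled ? true : prefs.showSelectedOnly,
  };
}
```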

Virtualized Rendering System
The PDF component implements a sophisticated virtualization system to handle large documents efficiently:

How It Works
- Page Height Calculation
  - On mount and on zoom changes, the system calculates the height of each page.
  - Heights are cached per zoom level to avoid recalculation.
  - A cumulative array stores the top position of each page for quick lookups.
- Visible Range Detection
  - The system tracks the scroll position of the container.
  - Binary search determines which pages intersect the viewport.
  - An overscan of 2 pages is added above and below for smooth scrolling.
- Smart Range Expansion
  - If an annotation is selected, its page is forced into the visible range.
  - The same logic applies to search results and chat source highlights.
  - This ensures important content is always rendered when needed.
- Absolute Positioning
  - All pages are absolutely positioned based on cumulative heights.
  - Only pages within the visible range actually render their content.
  - A spacer div at the bottom maintains the correct scroll height.
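The four steps above can be sketched as plain functions. This is an illustrative sketch, not the code in `PDF.tsx`: `OVERSCAN = 2` comes from the text, while the function names, the per-document height cache keyed by zoom, and the decision to add forced pages (selected annotation, search hit, chat source) individually rather than stretching the contiguous range are assumptions. Adding them individually keeps a selection far down the document from forcing every intervening page to render.

```typescript
const OVERSCAN = 2; // pages rendered above and below the viewport

// Per-zoom cache of page heights (one cache per loaded document).
const heightCache = new Map<number, number[]>();

function pageHeightsAt(zoom: number, baseHeights: number[]): number[] {
  let heights = heightCache.get(zoom);
  if (!heights) {
    heights = baseHeights.map((h) => h * zoom);
    heightCache.set(zoom, heights);
  }
  return heights;
}

// tops[i] is the absolute `top` of page i; tops[n] is the spacer's total height.
function cumulativeTops(heights: number[]): number[] {
  const tops = [0];
  for (const h of heights) tops.push(tops[tops.length - 1] + h);
  return tops;
}

// Binary search for the last page whose top is at or before offset y.
function pageAt(tops: number[], y: number): number {
  let lo = 0;
  let hi = tops.length - 2; // index of the last page
  while (lo < hi) {
    const mid = Math.ceil((lo + hi) / 2);
    if (tops[mid] <= y) lo = mid;
    else hi = mid - 1;
  }
  return lo;
}

function pagesToRender(
  tops: number[],
  scrollTop: number,
  viewportHeight: number,
  forcedPages: number[] = [],
): number[] {
  const pageCount = tops.length - 1;
  if (pageCount <= 0) return [];
  const first = Math.max(0, pageAt(tops, scrollTop) - OVERSCAN);
  const last = Math.min(pageCount - 1, pageAt(tops, scrollTop + viewportHeight) + OVERSCAN);
  const pages = new Set<number>();
  for (let p = first; p <= last; p++) pages.add(p);
  // Pages that must be mounted regardless of scroll position.
  for (const p of forcedPages) {
    if (p >= 0 && p < pageCount) pages.add(p);
  }
  return Array.from(pages).sort((a, b) => a - b);
}
```

Each page element would then be positioned with `top: tops[i]`, and the container's spacer sized to `tops[pageCount]`.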
 
State Management
The system uses Jotai atoms for reactive state management:

Core Atoms
- `pdfAnnotationsAtom`: regular annotations only
- `structuralAnnotationsAtom`: structural annotations only (kept separate)
- `allAnnotationsAtom`: computed atom that merges and deduplicates both
- `perPageAnnotationsAtom`: page-indexed annotation map for O(1) lookups
- `selectedAnnotationsAtom`: currently selected annotation IDs
- `chatSourceStateAtom`: chat message source tracking
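`perPageAnnotationsAtom` can be thought of as a grouping step over the merged annotation list, so a page component retrieves its own annotations with one map lookup instead of scanning every annotation. The `indexByPage` helper and the `page` field name are hypothetical; in Jotai the helper would sit inside a derived atom such as `atom((get) => indexByPage(get(allAnnotationsAtom)))`.

```typescript
// Group annotations by page number; each page then costs one Map lookup.
function indexByPage<T extends { page: number }>(annotations: T[]): Map<number, T[]> {
  const byPage = new Map<number, T[]>();
  for (const annotation of annotations) {
    const bucket = byPage.get(annotation.page);
    if (bucket) bucket.push(annotation);
    else byPage.set(annotation.page, [annotation]);
  }
  return byPage;
}
```

Pages with no annotations simply have no entry, so callers should fall back to an empty array.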

UI State Atoms
- `showStructuralAnnotationsAtom`: toggle for structural annotations (default: false)
- `showSelectedAnnotationOnlyAtom`: show only selected annotations (auto-enabled with structural)
- `showAnnotationBoundingBoxesAtom`: toggle bounding box visibility
- `showAnnotationLabelsAtom`: label display mode (ALWAYS/ON_HOVER/HIDE)
- `spanLabelsToViewAtom`: active label filters
- `zoomLevelAtom`: PDF zoom level
- `chatPanelWidthModeAtom`: panel width mode (quarter/half/full/custom)

Local Component State
- `activeLayer`: current layer (knowledge/document) in `DocumentKnowledgeBase`
- `showRightPanel`: right panel visibility in `DocumentKnowledgeBase`
- `sidebarViewMode`: chat vs. feed mode in the right panel

Computed State
- Annotations are filtered automatically according to user preferences via `useVisibleAnnotations`.
- Visible pages are recalculated from the scroll position.
- Summary versions update when changes are saved.
 
Specific Component Deep Dives

DocumentKnowledgeBase.tsx
The main container component that:
- Manages the overall layout with resizable panels
- Handles data fetching via GraphQL
- Coordinates between the knowledge base view and the document annotation view
- Manages chat conversations, notes, and document relationships
- Controls layer switching and tab navigation
- Handles initial annotation selection from props or URL

Key responsibilities:
- Data loading and transformation (referenced in 4:419-590)
- Panel resize management (referenced in 4:1356-1403)
- Tab click handling (referenced in 4:1899-1924)
- Layer switching logic
- URL parameter synchronization

PDF.tsx
The virtualization engine that:
- Calculates which pages should be visible based on scroll position
- Manages page height calculations and caching
- Coordinates scrolling to specific annotations/search results
- Provides the container structure for all PDF pages

PDFPage.tsx
Renders individual PDF pages when visible:
- Manages its own canvas and PDF rendering
- Displays all annotations for the page
- Handles user selection and annotation creation
- Integrates search results and chat source highlights

UnifiedContentFeed
New component that provides a unified view of all document content:
- Combines notes, annotations, relationships, and search results
- Sortable by page order or chronologically
- Filterable by content type
- Provides consistent interaction patterns

FloatingSummaryPreview
Picture-in-picture style component that:
- Shows the document summary while in the document layer
- Allows quick navigation to the knowledge layer
- Displays current version information
- Can be expanded to show more content

This architecture creates a flexible, highly performant system for both document annotation and knowledge management, with smooth transitions between different viewing modes and consistent state management across the application.