Floor Plan AI Extraction
A Python pipeline that extracts individual apartment units from architectural PDFs using Meta's SAM3, Gemini OCR, and OpenCV, with GPU inference on Modal H100s.
The Problem
Extracting individual unit floor plans from full-building architectural PDFs is a manual, tedious process. Property managers need isolated unit images for marketing, leasing, and resident communication, and no affordable existing tool handles this reliably for multi-unit residential buildings.
The Pipeline
PDF Rasterization
Converts architectural PDFs to 300 DPI images. Detects scale indicators (e.g., 1/8" = 1'-0"). Calculates pixels-per-foot.
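The scale arithmetic is simple once the note is found. At 300 DPI, a scale of 1/8" = 1'-0" means one real foot spans 1/8 inch of paper, so 300 × 1/8 = 37.5 px per foot. The sketch below is a minimal, hypothetical version of that step. It assumes the scale note has already been read as text, and the regex, `parse_scale`, and `pixels_per_foot` are illustrative names, not the project's code.

```python
import re
from fractions import Fraction
from typing import Optional

DPI = 300  # rasterization resolution used by the pipeline

# Hypothetical pattern for scale notes such as 1/8" = 1'-0" or 1" = 20'
# (straight or curly quotes). Mixed numbers like 1 1/2" are not handled here.
SCALE_RE = re.compile(r"""(\d+(?:/\d+)?)\s*["”]\s*=\s*(\d+)\s*['’]\s*-?\s*(\d+)?""")


def parse_scale(text: str) -> Optional[Fraction]:
    """Return paper inches per real-world foot, or None if no scale note is found."""
    m = SCALE_RE.search(text)
    if m is None:
        return None
    paper_inches = Fraction(m.group(1))
    real_feet = int(m.group(2)) + Fraction(int(m.group(3) or 0), 12)
    if real_feet == 0:
        return None
    return paper_inches / real_feet


def pixels_per_foot(scale: Fraction, dpi: int = DPI) -> float:
    """Pixels per real foot = DPI x paper inches per real foot."""
    return float(scale * dpi)
```

For example, `pixels_per_foot(parse_scale('1/8" = 1\'-0"'))` gives 37.5, and an engineering scale of `1" = 20'` gives 15.0. Using `Fraction` keeps 1/8 and similar ratios exact until the final conversion.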
Tiled Gemini OCR
Adaptive tiling (768 to 1536 px, based on text density) with Gemini Flash Lite. Extracts unit labels, unit numbers, sqft, room labels, and dimensions. Deduplicates results across tile overlaps.
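A minimal sketch of the tiling and deduplication logic, under stated assumptions. The linear density-to-tile-size mapping, the 40 px merge radius, and the per-detection `confidence` score are all hypothetical; the project does not specify how density is measured or how duplicates are matched. The Gemini call itself is omitted. Detections are assumed to have already been shifted from tile coordinates into full-page pixel coordinates.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

MIN_TILE, MAX_TILE = 768, 1536  # tile-size range from the pipeline description


def tile_size_for_density(density: float) -> int:
    """Hypothetical linear mapping: dense text -> small tiles, sparse text -> large tiles.

    `density` is assumed to be a 0..1 estimate, e.g. the fraction of dark pixels.
    """
    d = min(max(density, 0.0), 1.0)
    return int(round(MAX_TILE - d * (MAX_TILE - MIN_TILE)))


def tile_origins(length: int, tile: int, overlap: int) -> list[int]:
    """Evenly spaced tile start offsets along one axis.

    Neighbouring tiles overlap by about `overlap` px or more, and the last tile
    ends exactly at the image edge.
    """
    if overlap >= tile:
        raise ValueError("overlap must be smaller than the tile size")
    if length <= tile:
        return [0]
    n = math.ceil((length - tile) / (tile - overlap)) + 1
    return [round(i * (length - tile) / (n - 1)) for i in range(n)]


@dataclass
class Detection:
    text: str
    x: float  # label center in full-page pixel coordinates
    y: float
    confidence: float  # hypothetical score used to pick which duplicate survives


def dedupe(detections: list[Detection], radius: float = 40.0) -> list[Detection]:
    """Drop repeats of the same text found within `radius` px, keeping the most confident.

    A label lying in the overlap of two tiles is read twice at nearly the same position.
    """
    kept: list[Detection] = []
    for det in sorted(detections, key=lambda d: d.confidence, reverse=True):
        key = det.text.strip().upper()
        duplicate = any(
            k.text.strip().upper() == key and math.hypot(k.x - det.x, k.y - det.y) <= radius
            for k in kept
        )
        if not duplicate:
            kept.append(det)
    return kept
```

Spreading tiles evenly, rather than stepping by a fixed stride and appending a final edge tile, avoids a last tile that almost entirely repeats its neighbour. Matching on both normalized text and position keeps two genuinely different labels with the same text (say, "BATH" in adjacent units) from being merged.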
Wall Detection
OpenCV adaptive thresholding + morphological operations. Ray-casting from OCR seed points to find wall boundaries. Flood fill for complex shapes. Area validation against labeled sqft.
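The ray-casting and flood-fill steps can be sketched on a boolean wall mask. This sketch assumes `walls` is a 2D NumPy array where `True` marks wall pixels, as could be produced by `cv2.adaptiveThreshold` followed by a `cv2.morphologyEx` closing; that preprocessing is not shown. The function names and the `max_pixels` leak guard are illustrative, not taken from the project.

```python
from __future__ import annotations

from collections import deque
from typing import Optional

import numpy as np

DIRECTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def cast_rays(walls: np.ndarray, seed: tuple[int, int]) -> dict[str, int]:
    """March from `seed` (row, col) until the next pixel is a wall or off the image.

    Returns the last free row (up/down) or column (left/right) along each ray.
    """
    h, w = walls.shape
    hits: dict[str, int] = {}
    for name, (dr, dc) in DIRECTIONS.items():
        r, c = seed
        while 0 <= r + dr < h and 0 <= c + dc < w and not walls[r + dr, c + dc]:
            r, c = r + dr, c + dc
        hits[name] = r if dr else c
    return hits


def flood_fill(
    walls: np.ndarray, seed: tuple[int, int], max_pixels: Optional[int] = None
) -> Optional[np.ndarray]:
    """4-connected BFS over non-wall pixels starting at `seed`.

    Returns None if the region grows past `max_pixels`, which usually means the
    fill leaked through a gap such as an open doorway.
    """
    h, w = walls.shape
    region = np.zeros(walls.shape, dtype=bool)
    if walls[seed]:
        return region
    region[seed] = True
    queue = deque([seed])
    count = 1
    while queue:
        r, c = queue.popleft()
        for dr, dc in DIRECTIONS.values():
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and not walls[nr, nc] and not region[nr, nc]:
                region[nr, nc] = True
                queue.append((nr, nc))
                count += 1
                if max_pixels is not None and count > max_pixels:
                    return None
    return region


def area_sqft(region: np.ndarray, px_per_ft: float) -> float:
    """Convert a pixel count to square feet: each pixel covers (1 / px_per_ft)^2 sq ft."""
    return float(region.sum()) / px_per_ft ** 2
```

The rays give a cheap bounding estimate around the OCR seed, while the flood fill follows irregular outlines. A leaked fill inflates the area, which is exactly what comparing `area_sqft` against the labeled sqft is meant to catch.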
Crop & Export
RGBA transparent-background crops. Bedroom classification from label text or area. 15% tolerance validation. Structured output: raw_crop.png + metadata.json per unit.
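A minimal sketch of the export checks, assuming the unit's region is a boolean mask aligned with the page image. The label regexes and the area cut-offs used when a label has no bedroom count are hypothetical placeholders, and the 15% check assumes the tolerance is applied to measured versus labeled sqft.

```python
from __future__ import annotations

import re
from typing import Optional

import numpy as np

BEDROOM_RE = re.compile(r"\b(\d)\s*(?:BR|BD|BED(?:ROOM)?S?)\b", re.IGNORECASE)
STUDIO_RE = re.compile(r"\bSTUDIO\b", re.IGNORECASE)

# Hypothetical area cut-offs (sq ft), used only when the label has no bedroom count.
AREA_FALLBACK = ((0, 550.0), (1, 800.0), (2, 1150.0))


def rgba_crop(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Crop an RGB image to the mask's bounding box; pixels outside the mask get alpha 0."""
    if not mask.any():
        raise ValueError("empty mask")
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    r0, r1 = rows[0], rows[-1] + 1
    c0, c1 = cols[0], cols[-1] + 1
    rgb = image[r0:r1, c0:c1, :3].astype(np.uint8)
    alpha = np.where(mask[r0:r1, c0:c1], 255, 0).astype(np.uint8)
    return np.dstack([rgb, alpha])


def classify_bedrooms(label: str, area_sqft: Optional[float] = None) -> Optional[int]:
    """Bedroom count from label text ("STUDIO", "2BR", "1 BED"), else from area."""
    if STUDIO_RE.search(label):
        return 0
    m = BEDROOM_RE.search(label)
    if m:
        return int(m.group(1))
    if area_sqft is None:
        return None
    for beds, upper in AREA_FALLBACK:
        if area_sqft < upper:
            return beds
    return 3


def within_tolerance(measured: float, labeled: float, tolerance: float = 0.15) -> bool:
    """True if the measured area is within +/-15% (by default) of the labeled sqft."""
    return abs(measured - labeled) <= tolerance * labeled
```

An `(H, W, 4)` `uint8` array like the one returned by `rgba_crop` can be saved as `raw_crop.png` with Pillow, for example `Image.fromarray(crop).save("raw_crop.png")`, which infers RGBA from the four channels.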
Gemini Polish
Gemini Flash Image redraws extracted units as clean technical line drawings with standardized architectural symbols.
SAM3 Refinement
Optional. Deploys to a Modal H100 GPU to refine flagged units with Meta's Segment Anything 3 (SAM3), using point-based prompting.
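A sketch of how flagged units might be turned into point prompts, under clearly stated assumptions. Here a unit is "flagged" when its measured area misses the labeled sqft by more than 15%, and each prompt uses one positive point on the unit's OCR label plus negative points on the nearest neighbouring units' labels. The coordinate and label arrays follow the `point_coords`/`point_labels` convention of the original Segment Anything predictor (1 = foreground, 0 = background); whether SAM3's interface takes exactly this form is an assumption. The Modal deployment and model call are omitted, and `Unit`, `is_flagged`, and `point_prompt` are hypothetical names.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

import numpy as np


@dataclass
class Unit:
    name: str
    seed: tuple[float, float]  # (x, y) center of the unit's OCR label, in page pixels
    measured_sqft: float
    labeled_sqft: float


def is_flagged(unit: Unit, tolerance: float = 0.15) -> bool:
    """Assumed flag rule: measured area misses the labeled sqft by more than the tolerance."""
    return abs(unit.measured_sqft - unit.labeled_sqft) > tolerance * unit.labeled_sqft


def point_prompt(
    unit: Unit, all_units: list[Unit], max_negatives: int = 3
) -> tuple[np.ndarray, np.ndarray]:
    """One positive point on this unit's label, negatives on the nearest other units' labels."""
    neighbors = sorted(
        (u for u in all_units if u.name != unit.name),
        key=lambda u: math.dist(u.seed, unit.seed),
    )[:max_negatives]
    coords = np.array([unit.seed] + [u.seed for u in neighbors], dtype=np.float32)
    labels = np.array([1] + [0] * len(neighbors), dtype=np.int32)
    return coords, labels
```

Negative points on adjacent units' labels give the model an explicit signal about where the target unit should stop, which is the typical failure mode when a shared wall is thin or broken in the drawing.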
This project was an exploration of ML and computer vision. The pipeline works, but building it proved extremely challenging: the variety of architectural drawing styles makes reliable extraction a hard problem. It is included to show a willingness to tackle difficult technical challenges.
Built With