Exploration

Floor Plan AI Extraction

A Python pipeline that extracts individual apartment units from architectural PDFs using Meta's SAM3, Gemini OCR, and OpenCV, with GPU inference on Modal H100s.

The Problem

Extracting individual unit floor plans from full-building architectural PDFs is a manual, tedious process. Property managers need isolated unit images for marketing, leasing, and resident communication. No affordable existing tool handles this reliably for multi-unit residential buildings.

The Pipeline

0

PDF Rasterization

Converts architectural PDFs to 300 DPI images. Detects scale indicators (e.g., 1/8" = 1'-0"). Calculates pixels-per-foot.
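
The pixels-per-foot calculation can be illustrated with a minimal sketch. It assumes the scale indicator has already been read off the page as text in the form shown above; the function names and the regular expression are hypothetical and are not taken from the project's code.

```python
import re
from fractions import Fraction

# Matches scale notes such as: 1/8" = 1'-0"  (paper inches = real feet-inches).
# The pattern is an illustrative assumption and only covers this common form.
SCALE_RE = re.compile(r'''(\d+(?:/\d+)?)\s*"\s*=\s*(\d+)\s*'\s*-?\s*(\d+)?\s*"?''')


def pixels_per_foot(scale_text, dpi=300):
    """Return how many raster pixels span one real-world foot."""
    match = SCALE_RE.search(scale_text)
    if match is None:
        raise ValueError(f"unrecognized scale indicator: {scale_text!r}")
    paper_inches = Fraction(match.group(1))
    real_feet = Fraction(int(match.group(2))) + Fraction(int(match.group(3) or 0), 12)
    if real_feet == 0:
        raise ValueError("scale maps to zero real-world length")
    # dpi pixels per paper inch, paper_inches per real_feet of building.
    return float(paper_inches * dpi / real_feet)


def pixel_area_to_sqft(pixel_area, ppf):
    """Convert an area measured in pixels to square feet."""
    return pixel_area / (ppf ** 2)


ppf = pixels_per_foot('1/8" = 1\'-0"')  # 37.5 at 300 DPI
```

At 300 DPI, a 1/8" = 1'-0" drawing works out to 37.5 pixels per foot, so a region of 140,625 pixels corresponds to 100 sq ft.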

1

Tiled Gemini OCR

Adaptive tiling (768 to 1536px based on text density) with Gemini Flash Lite. Extracts unit labels, numbers, sqft, room labels, dimensions. Deduplicates across tile overlaps.
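
As a rough sketch of the tiling and deduplication idea: overlapping tiles are laid out along each axis, OCR hits are shifted back into page coordinates, and hits with the same text that land close together are merged. The overlap width, the merge radius, and the detection dictionary layout are assumptions for illustration; the Gemini call itself and the density-based choice of tile size are left out.

```python
import math


def tile_origins(length, tile, overlap):
    """Start offsets of tiles that cover [0, length) with the given overlap."""
    if tile <= overlap:
        raise ValueError("tile size must exceed overlap")
    if length <= tile:
        return [0]
    step = tile - overlap
    starts = list(range(0, length - tile, step))
    starts.append(length - tile)  # final tile is flush with the page edge
    return starts


def to_page_coords(detection, tile_x, tile_y):
    """Shift a tile-local detection into page coordinates."""
    return {**detection, "x": detection["x"] + tile_x, "y": detection["y"] + tile_y}


def dedupe(detections, radius=20.0):
    """Keep one detection per (text, location) cluster across tile overlaps."""
    kept = []
    for det in detections:
        key = det["text"].strip().upper()
        duplicate = any(
            key == k["text"].strip().upper()
            and math.hypot(k["x"] - det["x"], k["y"] - det["y"]) <= radius
            for k in kept
        )
        if not duplicate:
            kept.append(det)
    return kept


# The same label seen in two neighbouring tiles collapses to one entry.
hits = [
    to_page_coords({"text": "UNIT 204", "x": 950, "y": 300}, 0, 0),
    to_page_coords({"text": "unit 204", "x": 152, "y": 301}, 800, 0),
    to_page_coords({"text": "UNIT 205", "x": 400, "y": 300}, 800, 0),
]
unique = dedupe(hits)
```

Matching on normalized text plus distance, rather than text alone, keeps legitimately repeated labels (for example "BEDROOM" in different units) as separate detections.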

2

Wall Detection

OpenCV adaptive thresholding + morphological operations. Ray-casting from OCR seed points to find wall boundaries. Flood fill for complex shapes. Area validation against labeled sqft.
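
The ray-casting step can be sketched as follows. To stay self-contained, this example uses a plain list-of-lists binary mask (1 = wall pixel) in place of the thresholded OpenCV image, and it casts only four axis-aligned rays; both choices are simplifying assumptions, and the function names are hypothetical.

```python
def cast_ray(mask, x, y, dx, dy):
    """Walk from (x, y) in direction (dx, dy) until a wall pixel or the image edge.

    Returns (free_steps, hit_wall).
    """
    height, width = len(mask), len(mask[0])
    steps = 0
    cx, cy = x + dx, y + dy
    while 0 <= cx < width and 0 <= cy < height and not mask[cy][cx]:
        steps += 1
        cx += dx
        cy += dy
    hit_wall = 0 <= cx < width and 0 <= cy < height
    return steps, hit_wall


def room_bounds(mask, seed_x, seed_y):
    """Bounding box (x0, y0, x1, y1) of free space around a seed, or None.

    None means the seed sits on a wall or a ray escaped the image without
    hitting one, so the region is not enclosed along that axis.
    """
    if mask[seed_y][seed_x]:
        return None
    left, hit_l = cast_ray(mask, seed_x, seed_y, -1, 0)
    right, hit_r = cast_ray(mask, seed_x, seed_y, 1, 0)
    up, hit_u = cast_ray(mask, seed_x, seed_y, 0, -1)
    down, hit_d = cast_ray(mask, seed_x, seed_y, 0, 1)
    if not (hit_l and hit_r and hit_u and hit_d):
        return None
    return (seed_x - left, seed_y - up, seed_x + right, seed_y + down)


# A 5x3 room enclosed by one-pixel walls; the seed stands in for an OCR label position.
mask = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1],
]
bounds = room_bounds(mask, 3, 2)  # (1, 1, 5, 3)
```

A four-ray box like this only suits rectangular rooms, which is why a flood fill is the natural fallback for L-shaped or otherwise irregular units.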

3

Crop & Export

RGBA transparent-background crops. Bedroom classification from label text or area. 15% tolerance validation. Structured output: raw_crop.png + metadata.json per unit.
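
A minimal sketch of the classification and validation logic, under stated assumptions: the 15% tolerance is taken to compare measured area against the labeled sqft, the area thresholds used as a fallback are invented for illustration, and the metadata field names are hypothetical rather than the project's actual schema.

```python
import json
import re


def within_tolerance(measured_sqft, labeled_sqft, tol=0.15):
    """True when the measured area is within tol (relative) of the labeled area."""
    if labeled_sqft <= 0:
        raise ValueError("labeled area must be positive")
    return abs(measured_sqft - labeled_sqft) / labeled_sqft <= tol


def classify_bedrooms(label, area_sqft):
    """Bedroom count from label text, falling back to area when the label is silent."""
    text = label.upper()
    if "STUDIO" in text:
        return 0
    match = re.search(r"(\d+)\s*(?:BR|BED)", text)
    if match:
        return int(match.group(1))
    # Fallback thresholds are illustrative assumptions, not project values.
    if area_sqft < 550:
        return 0
    if area_sqft < 850:
        return 1
    if area_sqft < 1200:
        return 2
    return 3


def build_metadata(unit_label, measured_sqft, labeled_sqft):
    """Assemble a per-unit record of the kind that could be written to metadata.json."""
    return {
        "unit": unit_label,
        "bedrooms": classify_bedrooms(unit_label, measured_sqft),
        "measured_sqft": measured_sqft,
        "labeled_sqft": labeled_sqft,
        "area_ok": within_tolerance(measured_sqft, labeled_sqft),
    }


record = build_metadata("Unit 204 - 2BR", 920.0, 850.0)
metadata_json = json.dumps(record, indent=2)
```

Units whose `area_ok` flag comes back false are natural candidates for flagging and later refinement.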

4

Gemini Polish

Gemini Flash Image redraws extracted units as clean technical line drawings with standardized architectural symbols.

5

SAM3 Refinement

Optional. Flagged units are sent to a Modal H100 GPU for refinement with Meta's Segment Anything 3 (SAM3), using point-based prompting.

This project is an exploration into ML and computer vision. The pipeline works, but building it proved extremely challenging: architectural drawing styles vary so widely that reliable extraction remains a hard problem. It is included to show a willingness to tackle difficult technical challenges.

Built With

Python, PyTorch, SAM3, Gemini, OpenCV, Modal