ryan's digital forge

project: kintsugi

what is it? Shredded Document Reconstruction via Pairwise Seam Classification. Basically, it uses AI to piece shredded documents back together so you don't have to.

accuracy: 82.7%

pytorch streamlit opencv resnet-18 huggingface


the tech stack

Libraries: PyTorch, Streamlit, OpenCV, NumPy, Plotly, Matplotlib, Seaborn, HuggingFace, Pillow, PDF2Image/Poppler.
Hardware: Rented from Vast.ai. 2x RTX 3070, AMD EPYC 7B12 64-Core Processor, 128 GB of DDR4 RAM. Accessed over SSH from a terminal, with FileZilla for file transfers.

the model: SeamResNet

Architecture: ResNet-18 backbone initialized with ImageNet weights, so the pretrained low-level feature extractors (edges, textures) are available from the start. Classification is handled by a custom dense block on top of the backbone.
Early Fusion: The two strips are combined into a single input before the first layer, so the convolution filters' receptive fields span the seam and cover both strips at once. This lets the network pick up micro-texture spatial coherence across the cut directly.
Adversarial Training: Uses Hard Negative Mining to create pairs that are semantically similar but spatially misaligned. Training on these penalizes even the smallest alignment errors, which counteracts the translation invariance a CNN would otherwise fall back on.
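
To make the early fusion and hard negative ideas concrete, here is a minimal NumPy sketch of how a training pair might be built. It is an illustration under assumptions, not the project's actual pipeline: the function names `fuse_pair` and `hard_negative`, the `edge_px` crop width, and the choice of a small vertical shift as the "spatially inaccurate but semantically similar" negative are all hypothetical.

```python
import numpy as np

def fuse_pair(left_strip, right_strip, edge_px=16):
    """Early fusion: put the right edge of one strip next to the left edge
    of the other, so a single input image contains the candidate seam."""
    left_edge = left_strip[:, -edge_px:]
    right_edge = right_strip[:, :edge_px]
    return np.concatenate([left_edge, right_edge], axis=1)

def hard_negative(left_strip, right_strip, shift_px, edge_px=16):
    """Assumed hard negative: the same two strips (same content), but the
    right strip is shifted vertically so the seam no longer lines up.
    np.roll wraps rows around the top/bottom, which is fine for a sketch."""
    if shift_px == 0:
        raise ValueError("shift_px must be non-zero for a negative example")
    shifted = np.roll(right_strip, shift_px, axis=0)
    return fuse_pair(left_strip, shifted, edge_px)

# Toy "page" of random texture, cut into two vertical strips.
rng = np.random.default_rng(0)
page = rng.integers(0, 256, size=(64, 40), dtype=np.uint8)
left, right = page[:, :20], page[:, 20:]

positive = fuse_pair(left, right, edge_px=8)                   # label 1: true neighbours
negative = hard_negative(left, right, shift_px=3, edge_px=8)   # label 0: misaligned seam
```

Because the negative reuses the same strips, a classifier cannot separate it from the positive on content alone. It has to look at whether the pixels actually line up across the seam.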

the data

Streamed on the fly from the RVL-CDIP dataset via Hugging Face.
400,000 images (320k train, 40k validation, 40k test) split across 16 categories such as forms, emails, and invoices.
Total size: ~35 GB.

limitations

Currently limited strictly to vertical strip-cut documents.
The Multi-start Greedy Heuristic used to order the strips is the runtime bottleneck, and it is not guaranteed to find the globally optimal sequence.
Sensitive to heavily colored pages and photographs. A quick grayscale preprocessing step restores performance on standard, lightly colored documents.
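
For a feel of what a multi-start greedy ordering looks like, and why it can be both slow and suboptimal, here is a small pure-Python sketch. It is an assumption about the general technique, not the project's code: `greedy_order`, `multi_start_greedy`, and the `scores` matrix (where `scores[i][j]` stands in for the classifier's confidence that strip `j` sits directly to the right of strip `i`) are made up for illustration.

```python
def greedy_order(scores, start):
    """Build a left-to-right order from `start` by always appending the
    unused strip with the best seam score against the current last strip."""
    n = len(scores)
    order, used, total = [start], {start}, 0.0
    while len(order) < n:
        last = order[-1]
        nxt = max((j for j in range(n) if j not in used),
                  key=lambda j: scores[last][j])
        total += scores[last][nxt]
        order.append(nxt)
        used.add(nxt)
    return order, total

def multi_start_greedy(scores):
    """Run the greedy pass once per possible starting strip and keep the
    sequence with the highest total seam score."""
    runs = (greedy_order(scores, s) for s in range(len(scores)))
    return max(runs, key=lambda run: run[1])

# Hypothetical pairwise seam scores for 4 strips (true order is 0, 1, 2, 3).
scores = [
    [0.0, 0.9, 0.1, 0.2],
    [0.1, 0.0, 0.8, 0.3],
    [0.2, 0.1, 0.0, 0.7],
    [0.3, 0.2, 0.1, 0.0],
]
best_order, best_score = multi_start_greedy(scores)
```

In this sketch each greedy pass costs O(n²) score lookups and there are n starts, so the work grows roughly with n³. Every choice is also locally best only, so one confident but wrong seam score early in a pass can lock in a bad sequence.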