Project 05 — Synthetic Dataset Generation + CAD Parts Detection

An AI company building agents for industrial CAD workflows needed to detect and localize specific parts in CAD software screenshots — but had no training data. We built the entire pipeline: procedural 3D model generation, synthetic dataset creation, and YOLO model training.

The Challenge

The client's AI agents need to identify specific CAD parts and localize their sub-parts in screen captures of CAD authoring software. The fundamental obstacle was data: the client had no meaningful dataset of annotated CAD screenshots, and without one they could not train the detection model their agents depend on.

What We Built

An end-to-end pipeline, from synthetic data generation to a trained detection model:

1. Procedural 3D model creation — parametric models in Houdini/Blender for 5 part classes, with plausible parameter distributions and inter-dependencies between features.

2. Non-photorealistic rendering — models rendered to look like CAD software displays (stylized, not photorealistic), with correct camera parameters and automatic ground truth annotations for every frame.

3. Image compositing — models composited onto realistic CAD software screen backgrounds to maximize domain realism.

4. YOLO model training — object detection model for class identification (5 classes) and a modified YOLO variant for sub-part localization with bounding boxes and interest point coordinates.
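To make step 1 more concrete, here is a minimal sketch of how parameters with inter-dependencies might be sampled before being handed to a procedural modeling tool. It is an illustration only, not the delivered code: the flange part class, the parameter names, the value ranges, and the dependency rules are all hypothetical, and the Houdini/Blender side is left out entirely.

```python
import random
from dataclasses import dataclass


@dataclass
class FlangeParams:
    """Hypothetical parameter set for one illustrative part class."""
    outer_diameter: float  # mm
    bore_diameter: float   # mm
    thickness: float       # mm
    bolt_count: int


def sample_flange(rng: random.Random) -> FlangeParams:
    # Independent parameter: drawn from an assumed plausible range.
    outer = rng.uniform(80.0, 300.0)
    # Dependent parameter: the bore is a fraction of the outer diameter,
    # so it always fits inside the part.
    bore = outer * rng.uniform(0.3, 0.6)
    # Dependent parameter: larger parts tend to be thicker, with some noise
    # and a lower bound of 5 mm.
    thickness = max(5.0, rng.gauss(0.08 * outer, 2.0))
    # Dependent parameter: larger flanges get more bolts, always an even count.
    bolt_count = 4 + 2 * int(outer // 75)
    return FlangeParams(outer, bore, thickness, bolt_count)


def sample_dataset(n: int, seed: int = 0) -> list[FlangeParams]:
    # A seeded generator keeps the synthetic dataset reproducible.
    rng = random.Random(seed)
    return [sample_flange(rng) for _ in range(n)]
```

Deriving some values from others, rather than sampling every parameter independently, is one way to avoid implausible combinations such as a bore wider than the part itself.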

Procedural 3D model generation for 5 part classes (~10K models per class)
Non-photorealistic CAD-style rendering with automatic ground truth annotations
Compositing onto realistic CAD software screen backgrounds
YOLO object detection for class identification
Modified YOLO variant for sub-part localization with bounding boxes and interest point (keypoint) coordinates
Complete retrainable pipeline delivered (not just the model)
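Because the renders are pasted onto screen backgrounds, the ground truth has to follow the paste offset. The sketch below shows one way a box and a single interest point could be shifted into background coordinates and written as a normalized, YOLO-style label line. The function name, the single-keypoint layout, and the exact label format are assumptions for illustration; the format used by the modified YOLO variant is not specified here. The sketch also assumes the render lies fully inside the background, so no clipping is done.

```python
def to_yolo_label(class_id, box, keypoint, offset, canvas_size):
    """Build a YOLO-style label line (hypothetical layout).

    box         -- (x_min, y_min, x_max, y_max) in render pixel coordinates
    keypoint    -- (x, y) interest point in render pixel coordinates
    offset      -- (x, y) top-left corner where the render is pasted
    canvas_size -- (width, height) of the background screenshot
    """
    off_x, off_y = offset
    canvas_w, canvas_h = canvas_size
    x_min, y_min, x_max, y_max = box

    # Shift the box and keypoint from render space into background space.
    x_min, x_max = x_min + off_x, x_max + off_x
    y_min, y_max = y_min + off_y, y_max + off_y
    kp_x, kp_y = keypoint[0] + off_x, keypoint[1] + off_y

    # Normalize to [0, 1]: box as center/size, keypoint as a point.
    x_center = (x_min + x_max) / 2 / canvas_w
    y_center = (y_min + y_max) / 2 / canvas_h
    width = (x_max - x_min) / canvas_w
    height = (y_max - y_min) / canvas_h
    kp_x, kp_y = kp_x / canvas_w, kp_y / canvas_h

    values = (x_center, y_center, width, height, kp_x, kp_y)
    return f"{class_id} " + " ".join(f"{v:.6f}" for v in values)


# Render pasted at (100, 200) on a 1000x500 background.
label = to_yolo_label(2, (10, 20, 110, 70), (60, 45), (100, 200), (1000, 500))
print(label)  # 2 0.160000 0.490000 0.100000 0.100000 0.160000 0.490000
```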

The Result

The trained models achieved over 90% mAP at IoU > 0.5 for both part and sub-part detection. The system runs fully offline in a container with CUDA GPU support, with no internet connection required. We delivered the complete training pipeline so the client can retrain on new data independently.
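For readers unfamiliar with the metric, the IoU criterion behind the mAP figure can be sketched as follows. This is a generic illustration of intersection over union for axis-aligned boxes, not the evaluation code used in the project; the function names are hypothetical, and the full mAP computation (ranking by confidence, precision-recall averaging) is omitted.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Overlap extent along each axis; zero if the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def matches_ground_truth(predicted, ground_truth, threshold=0.5):
    # A prediction counts as a match when its IoU exceeds the threshold.
    return iou(predicted, ground_truth) > threshold


# Two 10x10 boxes overlapping by half: intersection 50, union 150.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

In this example the IoU is one third, so the prediction would not count as a match at the 0.5 threshold.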