How We Built a Visual Product Search Engine with CLIP
A deep dive into building Trooply Search — a GPU-accelerated visual search engine using OpenAI's CLIP model, Qdrant vector database, and FastAPI. From architecture decisions to production deployment.
The Problem
Most product search relies on text — keywords, filters, categories. But what if a customer has a photo of what they want and no words to describe it? What if they see a pair of shoes on the street and want to find where to buy them?
Text search fails here. Visual search doesn't.
We built Trooply Search to solve this: upload any product image, and instantly find matching or similar products from a catalog of millions. No keywords needed.
Why CLIP?
OpenAI's CLIP (Contrastive Language-Image Pre-training) is a neural network that understands both images and text in the same embedding space. This means:
- An image of red sneakers and the text "red running shoes" produce similar vectors
- You can search by image, by text, or by both simultaneously
- No need to manually tag or categorize products
We use the ViT-L/14 variant — the largest CLIP model OpenAI has released publicly. It produces 768-dimensional embeddings that capture fine-grained visual detail. The tradeoff is size (~900MB in memory), but on our RTX GPU, inference takes under 50ms per image.
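The shared embedding space makes "similarity" concrete: images and text both map to vectors, and relatedness is just cosine similarity between them. A toy sketch with made-up 4-dimensional vectors (real CLIP embeddings are 768-dimensional and come from the model, not hand-written values):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for CLIP outputs:
image_red_sneakers = [0.9, 0.1, 0.3, 0.2]
text_red_running_shoes = [0.8, 0.2, 0.4, 0.1]
text_blue_jeans = [0.1, 0.9, 0.1, 0.8]

print(cosine_similarity(image_red_sneakers, text_red_running_shoes))  # high
print(cosine_similarity(image_red_sneakers, text_blue_jeans))         # low
```

Because the image and the matching text land close together in the same space, a single nearest-neighbor query covers image-to-image, text-to-image, and mixed search.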
Architecture
Our search pipeline looks like this:
- Ingestion: Client uploads product images via REST API
- Embedding: CLIP encodes each image into a 768-dim vector (GPU-accelerated, FP16)
- Storage: Vectors are stored in Qdrant with product metadata
- Search: Query image → CLIP embedding → Qdrant nearest-neighbor search → ranked results
- Serving: FastAPI serves results with sub-200ms latency
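Conceptually, steps 2-4 reduce to: embed, store, rank by similarity. A minimal in-memory sketch of that loop (the `embed` stub below is a hypothetical stand-in for the CLIP forward pass, and the brute-force scan is what Qdrant replaces at scale):

```python
import math

def l2_normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def embed(image_bytes):
    """Stub for CLIP encoding: returns a deterministic fake vector.
    In the real pipeline this is a GPU forward pass producing 768 dims."""
    h = sum(image_bytes) % 97  # toy determinism, not a real embedding
    return l2_normalize([(h + i) % 7 + 1 for i in range(8)])

store = []  # (product_id, vector) pairs; Qdrant plays this role in production

def ingest(product_id, image_bytes):
    store.append((product_id, embed(image_bytes)))

def search(query_image_bytes, top_k=5):
    """Rank stored products by dot product with the query embedding.
    Vectors are L2-normalized, so dot product == cosine similarity."""
    q = embed(query_image_bytes)
    scored = [(pid, sum(a * b for a, b in zip(q, v))) for pid, v in store]
    return sorted(scored, key=lambda s: -s[1])[:top_k]

ingest("sneaker-red", b"\x01\x02")
ingest("jeans-blue", b"\x40\x41")
print(search(b"\x01\x02", top_k=1))  # the identical image ranks first
```

Swapping the linear scan for Qdrant's ANN index is what keeps this sub-200ms over millions of products.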
The Stack
- FastAPI — async Python API framework, handles 500+ concurrent requests
- CLIP (ViT-L/14) — image/text encoding, runs on NVIDIA RTX GPU
- Qdrant — purpose-built vector database, handles billion-scale collections
- PostgreSQL — tenant management, API keys, usage tracking
- Redis — rate limiting, caching, session management
- Docker — isolated per-tenant deployment
Multi-Tenancy
Trooply Search is a SaaS product — multiple customers share the same infrastructure but their data is completely isolated.
Each tenant gets:
- Their own Qdrant collection (no cross-tenant access)
- Scoped API keys with rate limits
- Separate usage tracking and billing
- Isolated search results
We use Qdrant's collection-level isolation rather than namespace-level — this gives stronger security guarantees and independent scaling per tenant.
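One way to enforce collection-per-tenant isolation is to derive the collection name server-side from the authenticated tenant ID, so clients can never address another tenant's collection. A sketch (the naming scheme and validation pattern here are illustrative, not our exact convention):

```python
import re

# Tenant IDs: lowercase alphanumerics, dashes/underscores, 3-32 chars.
TENANT_RE = re.compile(r"^[a-z0-9][a-z0-9_-]{2,31}$")

def collection_for(tenant_id: str) -> str:
    """Map a validated tenant ID to its dedicated Qdrant collection name.
    Callers never pass a collection name directly, so cross-tenant
    reads are impossible by construction."""
    if not TENANT_RE.match(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return f"tenant_{tenant_id}_products"

print(collection_for("acme-shoes"))  # tenant_acme-shoes_products
```

Validating the ID before interpolating it also blocks injection-style names like `../other_tenant`, and having one collection per tenant means each can be resized, re-indexed, or deleted independently.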
GPU Optimization
Running CLIP on CPU takes ~2 seconds per image. On our RTX GPU with FP16 inference:
- Single image: 45ms
- Batch of 32 images: 180ms (5.6ms per image)
- Memory: ~3.9GB total (model + CUDA runtime + framework overhead)
Key optimizations:
- FP16 inference: Half-precision floats on GPU — 2x speed, half the VRAM
- Batch processing: Group incoming images and encode in batches
- Pre-computed category embeddings: Common search queries are pre-encoded at startup
- Background removal: rembg strips backgrounds before encoding, improving match quality by 15-20%
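The batching step can be as simple as slicing the incoming queue into fixed-size chunks before each GPU call. A sketch (BATCH_SIZE matches the numbers above; `encode_batch` is a hypothetical stand-in for the FP16 CLIP forward pass):

```python
BATCH_SIZE = 32

def chunked(items, size=BATCH_SIZE):
    """Yield consecutive batches of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def encode_batch(batch):
    """Stand-in for one FP16 CLIP forward pass over a whole batch."""
    return [f"vec({img})" for img in batch]

images = [f"img{i}" for i in range(70)]
vectors = [v for batch in chunked(images) for v in encode_batch(batch)]
# 70 images -> 3 GPU calls (32 + 32 + 6) instead of 70 single-image calls
```

This is where the amortization in the numbers above comes from: 180ms for a batch of 32 works out to ~5.6ms per image, versus 45ms encoded one at a time.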
Results
In production with real customer catalogs:
- Search latency: p50 = 89ms, p99 = 180ms
- Relevance: 87% of top-5 results rated "relevant" by human evaluators
- Throughput: 500+ concurrent searches on a single GPU
- Uptime: 99.9% over 6 months
What We Learned
- Vector search is only as good as your embeddings. CLIP is excellent for general products but struggles with very similar items (e.g., different shades of the same shoe). Fine-tuning on domain-specific data helps significantly.
- Background matters more than you think. A product on a white background vs. a lifestyle photo produces very different embeddings. Background removal before encoding improved relevance by 15-20%.
- GPU memory is the bottleneck, not compute. The CLIP model + PyTorch + CUDA runtime takes 3.9GB. On a 16GB GPU, that leaves room for ~3 more concurrent model instances. Plan your GPU memory budget carefully.
- Self-hosting wins for privacy-sensitive clients. Several customers chose Trooply Search specifically because their product images never leave their infrastructure. This is a real competitive advantage over cloud-only solutions.
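The memory-budget arithmetic behind the third lesson, as a quick sanity check (numbers from above; real planning should also reserve headroom for batch activations):

```python
GPU_VRAM_GB = 16.0
PER_INSTANCE_GB = 3.9  # model + CUDA runtime + framework overhead

# How many full model instances fit on the card?
fits = int(GPU_VRAM_GB // PER_INSTANCE_GB)
print(f"{fits} instances fit -> {fits - 1} more beyond the first")
```

Integer division floors the result, which is the conservative answer you want for capacity planning.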
Try It
Trooply Search is live at search.trooply.in. Upload an image, get results in under 200ms.
Want to integrate visual search into your product? Contact us — we can have you running in a day.
Want to build something similar?
We help companies build and deploy AI products. Let's talk about your project.