Inside the CloneGen Pipeline: What Happens to Your 3 Photos
When you upload three photos to CloneGen, a sequence of steps runs behind the scenes to turn them into a usable identity model. This post walks through that sequence at a concept level: what happens at each stage, what each generation mode costs in credits, and why the pipeline is structured the way it is. If you like knowing what you are paying for, this is for you. We're keeping the specific model providers and implementation details out of this write-up on purpose — those are product decisions that can and do change.
Step 1 — Upload and validation
Your three photos are uploaded directly from your browser to CloneGen's EU-hosted object storage via a presigned upload URL, bypassing our application server entirely. The server never handles the raw bytes; it only receives a confirmation with the final storage path. This keeps uploads fast and means photo size never counts against the application server's bandwidth.
Once the URLs are confirmed, a model row goes into our database with photo_status: pending and the three storage URLs attached. From there the background pipeline takes over.
Step 2 — Model preparation
For upload models, there is no "training" step in the classical sense where we fine-tune a per-user checkpoint. CloneGen uses identity-preserving diffusion pipelines that accept reference images as input on every generation call — there is no per-user model weight file to train or store.
During the 2–3 minutes that model creation takes, four things happen: (1) the three photos are downloaded from storage to the background worker, (2) they are validated against the pipeline's input constraints, (3) each photo is tagged by its intended position (close-up, portrait, full-body), and (4) the row flips to photo_status: ready. For generated models, where you pick traits instead of uploading photos, this step also runs three portrait generations and stores the results as the model's photos. The rest of the pipeline treats them identically to uploaded photos.
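The preparation steps can be pictured as a small state transition on the model row. The sketch below is illustrative only: ModelRow, prepare_model, and the is_valid callback are hypothetical names, the download step is left out, and tagging photos by upload order is an assumption, since this post does not say how positions are assigned.

```python
from dataclasses import dataclass, field
from typing import Callable

# The three positions named in this post.
POSITIONS = ("close-up", "portrait", "full-body")


@dataclass
class ModelRow:
    """Hypothetical stand-in for the database row described in Step 1."""
    photo_urls: list[str]
    photo_status: str = "pending"
    photo_tags: dict[str, str] = field(default_factory=dict)


def prepare_model(row: ModelRow, is_valid: Callable[[str], bool]) -> ModelRow:
    """Validate the photos, tag each by position, then mark the row ready."""
    if len(row.photo_urls) != len(POSITIONS):
        raise ValueError("expected exactly three photos")
    for url in row.photo_urls:
        # is_valid stands in for the pipeline's real input constraints.
        if not is_valid(url):
            raise ValueError(f"photo failed validation: {url}")
    # Assumption: photos are tagged in the order they were uploaded.
    row.photo_tags = dict(zip(POSITIONS, row.photo_urls))
    row.photo_status = "ready"
    return row
```

If validation raises, the row stays at photo_status: pending in this sketch, so later stages never see a model that is only partly prepared.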
Step 3 — Generation routing
When you click Generate, CloneGen routes the request to the right pipeline based on four inputs: generation type (Prompt to Image / Identity Recreation / Video Recreate / Prompt to Video), content tier (safe / spicy), model type (upload / generated), and whether NSFW mode is enabled.
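To make the four routing inputs concrete, here is a minimal sketch that models them as a typed request and derives a lookup key from it. The class names, the select_pipeline function, and the key format are all hypothetical; the real routing rules and pipeline names are not described in this post.

```python
from dataclasses import dataclass
from enum import Enum


class GenerationType(Enum):
    PROMPT_TO_IMAGE = "prompt_to_image"
    IDENTITY_RECREATION = "identity_recreation"
    VIDEO_RECREATE = "video_recreate"
    PROMPT_TO_VIDEO = "prompt_to_video"


class ContentTier(Enum):
    SAFE = "safe"
    SPICY = "spicy"


class ModelType(Enum):
    UPLOAD = "upload"
    GENERATED = "generated"


@dataclass(frozen=True)
class GenerationRequest:
    """The four inputs the router looks at."""
    generation_type: GenerationType
    content_tier: ContentTier
    model_type: ModelType
    nsfw_enabled: bool


def select_pipeline(req: GenerationRequest) -> str:
    """Build a hypothetical pipeline key from the four routing inputs."""
    nsfw = "nsfw-on" if req.nsfw_enabled else "nsfw-off"
    return "/".join(
        [req.generation_type.value, req.content_tier.value, req.model_type.value, nsfw]
    )
```

Because the request is a frozen dataclass, it is hashable, so a real router could just as easily use it directly as a dictionary key that maps to a pipeline handler.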
Credit costs per mode:
- Prompt to Image: 20 credits. Text-to-image generation styled to your model's identity. Works across both safe and spicy tiers.
- Identity Recreation: 10 credits. Drop in a scene or reference image and we re-render it with your model's face replacing whatever face was in the original. Handles safe and spicy tiers.
- Video Recreate: 5 credits per second (or 10 credits per second with advanced motion control). Takes your reference video and re-renders it frame by frame, keeping your identity consistent across frames.
- Prompt to Video: 25 credits for 5-second clips, 50 credits for 10-second clips. Text-to-video from a description.
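The costs above reduce to a few lines of arithmetic. The sketch below uses the credit figures from the list; the function and constant names are illustrative, and billing Video Recreate in whole seconds is an assumption, since this post does not say how fractional seconds are handled.

```python
# Flat per-generation costs, in credits.
PROMPT_TO_IMAGE_COST = 20
IDENTITY_RECREATION_COST = 10

# Prompt to Video is priced per clip length (seconds -> credits).
PROMPT_TO_VIDEO_COSTS = {5: 25, 10: 50}


def video_recreate_cost(seconds: int, advanced_motion: bool = False) -> int:
    """5 credits per second, or 10 per second with advanced motion control."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    rate = 10 if advanced_motion else 5
    return seconds * rate


def prompt_to_video_cost(clip_seconds: int) -> int:
    """Only the two clip lengths listed above are priced."""
    if clip_seconds not in PROMPT_TO_VIDEO_COSTS:
        raise ValueError("clip length must be 5 or 10 seconds")
    return PROMPT_TO_VIDEO_COSTS[clip_seconds]
```

For example, an 8-second Video Recreate costs 8 × 5 = 40 credits, or 8 × 10 = 80 credits with advanced motion control.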
Step 4 — Credit handling
Credits are deducted before the generation starts, not after. This is a deliberate choice: if we deducted only on success, a user could fire hundreds of parallel requests that all pass the balance check before any of them is charged. The deduction lives inside a single atomic Postgres function that either decrements the balance and returns the new total, or fails fast with an insufficient-credits error.
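The key idea is that the balance check and the decrement happen in one statement, so no other request can slip in between them. The sketch below illustrates that with Python's built-in sqlite3 module standing in for Postgres; the balances table, deduct_credits, and InsufficientCredits are hypothetical names, not CloneGen's actual schema or function.

```python
import sqlite3

# Hypothetical schema, for illustration only.
SCHEMA = "CREATE TABLE balances (user_id TEXT PRIMARY KEY, credits INTEGER NOT NULL);"


class InsufficientCredits(Exception):
    """Raised when the balance cannot cover the requested cost."""


def deduct_credits(conn: sqlite3.Connection, user_id: str, cost: int) -> int:
    """Decrement the balance and return the new total, or fail fast."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            # The check (credits >= ?) and the decrement are one statement.
            "UPDATE balances SET credits = credits - ? "
            "WHERE user_id = ? AND credits >= ?",
            (cost, user_id, cost),
        )
        if cur.rowcount != 1:
            # No row matched: balance too low (or, in this sketch, unknown user).
            raise InsufficientCredits(f"user {user_id} cannot cover {cost} credits")
        row = conn.execute(
            "SELECT credits FROM balances WHERE user_id = ?", (user_id,)
        ).fetchone()
    return row[0]
```

A request that arrives with too small a balance matches zero rows and is rejected without changing anything, which is the fail-fast behaviour described above.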
If the generation fails at any stage (provider error, timeout, safety filter rejection), credits are refunded via an idempotent refund path keyed on the generation ID. Idempotent means you can retry the refund safely — the second call is a no-op. Every paid generation goes through this path, which is why you never lose credits to transient failures.
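One common way to get this kind of idempotency is to record each refund under a unique key and only credit the balance when the record is new. The sketch below shows that pattern with Python's built-in sqlite3 module; the tables and the refund_generation function are hypothetical, and this is not necessarily how CloneGen implements its refund path.

```python
import sqlite3

# Hypothetical schema: the primary key on generation_id enforces one refund each.
SCHEMA = """
CREATE TABLE balances (user_id TEXT PRIMARY KEY, credits INTEGER NOT NULL);
CREATE TABLE refunds (
    generation_id TEXT PRIMARY KEY,
    user_id TEXT NOT NULL,
    amount INTEGER NOT NULL
);
"""


def refund_generation(
    conn: sqlite3.Connection, generation_id: str, user_id: str, amount: int
) -> bool:
    """Refund once per generation ID. Returns True only if credits were restored."""
    with conn:  # the refund record and the balance update commit together
        cur = conn.execute(
            "INSERT OR IGNORE INTO refunds (generation_id, user_id, amount) "
            "VALUES (?, ?, ?)",
            (generation_id, user_id, amount),
        )
        if cur.rowcount == 0:
            # A refund for this generation already exists: do nothing.
            return False
        conn.execute(
            "UPDATE balances SET credits = credits + ? WHERE user_id = ?",
            (amount, user_id),
        )
    return True
```

Calling refund_generation twice with the same generation ID restores the credits once; the second call returns False and leaves the balance untouched.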
Step 5 — Output storage
When the generation completes, the output is re-uploaded to CloneGen's own EU-hosted storage at a per-user namespace, and the generation row is updated with status: completed and the final public URL. We do not serve generations from any third-party CDN. If a pipeline provider ever rotates their storage, your gallery still works — it lives with us.
Why the pipeline looks like this
Three design constraints shape the architecture: (1) every generation must survive a provider outage without corrupting user state, so deduct-then-refund is non-negotiable; (2) identity consistency comes from multi-image reference input, not fine-tuned checkpoints, so there is no "training" step to bill for; and (3) output files must be portable, so everything lives on our own storage rather than upstream CDNs. The result is a pipeline where the only variable you see from the user side is the credit cost per mode — everything else is invisible.
Want to see the credit math for your workflow? The full cost table is on the pricing page, and you can work out how many generations each subscription tier buys from the numbers in the how-to guide.
Ready to build your AI clone? Free credits on sign-up, no card required.
Get started free →