CLIP Progressive Steering Pipeline
Compare baseline CLIP retrieval with 5 embedding-steering methods side by side on CLIP ViT-B/16 image retrieval.
Steering Attributes
Use + to add attributes, ✕ to remove them, and drag the sliders to set weights. Leave both lists empty to let the LLM auto-generate attributes on the next search.
Positive (steer toward)
Negative (steer away from)
Run a search to see feedback
1. Baseline CLIP
Pure cosine similarity — no steering
2. LLM Linear Steering
q′ = q + α·pos − β·neg
3. Contrastive Subspace
Centroid-based steering direction
4. Energy-Based
Gradient-descent optimisation in embedding space
5. Per-Concept Weighted
Normalised per-attribute weight steering
6. SAE PRF Steering
Pseudo-relevance feedback in sparse autoencoder latent space
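The first two methods can be sketched in a few lines. This is a minimal illustration, not the pipeline's actual code: it assumes the query, attribute, and image embeddings are already available as NumPy arrays (random vectors stand in for CLIP outputs here), that several attributes are combined by averaging, and that the steered query is re-normalised. The function names are hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along the given axis."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)

def top_k(query, image_embs, k=5):
    """Method 1: indices of the k images most cosine-similar to the query."""
    sims = l2_normalize(image_embs) @ l2_normalize(query)
    return np.argsort(-sims)[:k]

def linear_steer(q, pos, neg, alpha=0.4, beta=0.4):
    """Method 2: q' = q + alpha*pos - beta*neg.

    pos and neg have shape (n_attributes, dim). Averaging the attribute
    embeddings and re-normalising q' are assumptions of this sketch.
    """
    pos_dir = l2_normalize(pos).mean(axis=0) if len(pos) else 0.0
    neg_dir = l2_normalize(neg).mean(axis=0) if len(neg) else 0.0
    return l2_normalize(q + alpha * pos_dir - beta * neg_dir)

# Random stand-ins for CLIP embeddings (512 dimensions, 100 images).
rng = np.random.default_rng(0)
image_embs = rng.normal(size=(100, 512))
q = rng.normal(size=512)
pos = rng.normal(size=(2, 512))  # e.g. two positive attributes
neg = rng.normal(size=(2, 512))  # e.g. two negative attributes

baseline = top_k(q, image_embs)
steered = top_k(linear_steer(q, pos, neg), image_embs)
print("baseline top-5:", baseline)
print("steered  top-5:", steered)
```

With alpha and beta both set to 0 the steered query points in the same direction as q, so the steered ranking matches the baseline.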
User Study
Complete the steps below in order. Progress is saved as you go.
You will complete 22 image retrieval tasks.
Important: Please complete this study on your own — we need your personal opinion, not anyone else's.
How each query works
For each query you will see two sets of 5 images:
- Baseline (CLIP) — top 5 results from plain CLIP similarity (top of the page). These never change when you edit attributes.
- Linear Steering — top 5 results after steering (below). Only this section updates when you refine attributes.
Attributes and refinement
When a query first loads, the system uses an LLM to auto-generate a set of positive and negative attributes. You are free to edit, add, or remove them.
- Positive attributes steer the results toward that concept.
- Negative attributes steer the results away from that concept.
You can refine for up to 3 rounds per query. After 3 rounds, only "Satisfied" is available.
Alpha (α) and Beta (β)
- Alpha (α) controls how strongly positive attributes pull the results (default 0.4).
- Beta (β) controls how strongly negative attributes push the results away (default 0.4).
- You can adjust these sliders before clicking Apply Refinement.
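To make the slider values concrete, here is a toy illustration of how α and β move a query under the q′ = q + α·pos − β·neg rule. The 2-D vectors are hypothetical stand-ins chosen for readability; real CLIP embeddings have hundreds of dimensions, and the study's exact computation may differ.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 2-D stand-ins for the query and one attribute of each kind.
q = np.array([1.0, 0.0])
pos = np.array([0.0, 1.0])
neg = np.array([0.6, -0.8])

def steered(alpha, beta):
    """Steered query q' = q + alpha*pos - beta*neg (not re-normalised here)."""
    return q + alpha * pos - beta * neg

# No steering, the default sliders, and a stronger positive pull.
for alpha, beta in [(0.0, 0.0), (0.4, 0.4), (0.6, 0.4)]:
    s = steered(alpha, beta)
    print(f"alpha={alpha} beta={beta} "
          f"cos(q', pos)={cosine(s, pos):.3f} cos(q', neg)={cosine(s, neg):.3f}")
```

In this toy setup, raising α increases the similarity between the steered query and the positive attribute, and a non-zero β lowers its similarity to the negative attribute.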
Example walk-through
Suppose the query is "a cozy living room":
- The LLM might suggest positive = warm lighting, soft furniture and negative = cluttered, dark.
- You look at the Linear Steering results. They look warm but too modern.
- Round 2: You remove soft furniture, add rustic to positive, and add modern to negative. Click Apply Refinement.
- The Linear Steering images update — now they look more rustic. Baseline stays the same.
- Round 3: You raise alpha (α) from the default 0.4 to 0.6 for a stronger pull. Click Apply Refinement one last time.
- Happy with the results → click Satisfied.
After each query
- You will label whether each of the 10 retrieved images matches your intended meaning (Yes / No).
- You will answer a short comparison question and four rating questions.
- Then you move to the next query.
There are no correct answers. We are studying how people interpret subjective concepts.
Progress: Query 1 / 22 → Query 22 / 22
Estimated time: 20–30 minutes.
Participant information
Registered. Click below to start the first query.
Query 1 / 22
Round 1 / 3
🔵 Baseline (CLIP) Results
🟢 Linear Steering Results (only this section changes when you refine)
Steering Attributes — Add or remove attributes, then click Apply Refinement.
Positive (steer toward)
Negative (steer away from)
Image Annotations
For each query, annotate all images, then click Done with Query X to save. You can update your annotations later if needed.
After all queries are saved, fill in the Final Survey at the bottom and click Submit & Finish.
Query 1 / 22: "a golden retriever"
Overall, did Linear Steering give you better top-5 results than Baseline for this query?
Consider: even if both returned correct images, did Linear provide more relevant or better-matching ones?
Query 2 / 22: "Dog on the beach"
Overall, did Linear Steering give you better top-5 results than Baseline for this query?
Consider: even if both returned correct images, did Linear provide more relevant or better-matching ones?
Query 3 / 22: "Dog looking guilty"
Overall, did Linear Steering give you better top-5 results than Baseline for this query?
Consider: even if both returned correct images, did Linear provide more relevant or better-matching ones?
Query 4 / 22: "friendly looking dog"
Overall, did Linear Steering give you better top-5 results than Baseline for this query?
Consider: even if both returned correct images, did Linear provide more relevant or better-matching ones?
Query 5 / 22: "aggressive looking dog"
Overall, did Linear Steering give you better top-5 results than Baseline for this query?
Consider: even if both returned correct images, did Linear provide more relevant or better-matching ones?
Query 6 / 22: "nervous looking dog"
Overall, did Linear Steering give you better top-5 results than Baseline for this query?
Consider: even if both returned correct images, did Linear provide more relevant or better-matching ones?
Query 7 / 22: "Hyper active dog"
Overall, did Linear Steering give you better top-5 results than Baseline for this query?
Consider: even if both returned correct images, did Linear provide more relevant or better-matching ones?
Query 8 / 22: "a person riding a bicycle"
Overall, did Linear Steering give you better top-5 results than Baseline for this query?
Consider: even if both returned correct images, did Linear provide more relevant or better-matching ones?
Query 9 / 22: "A dog playing"
Overall, did Linear Steering give you better top-5 results than Baseline for this query?
Consider: even if both returned correct images, did Linear provide more relevant or better-matching ones?
Query 10 / 22: "an exciting action scene"
Overall, did Linear Steering give you better top-5 results than Baseline for this query?
Consider: even if both returned correct images, did Linear provide more relevant or better-matching ones?
Query 11 / 22: "a joyful moment"
Overall, did Linear Steering give you better top-5 results than Baseline for this query?
Consider: even if both returned correct images, did Linear provide more relevant or better-matching ones?
Query 12 / 22: "A kid having fun"
Overall, did Linear Steering give you better top-5 results than Baseline for this query?
Consider: even if both returned correct images, did Linear provide more relevant or better-matching ones?
Query 13 / 22: "peaceful scene"
Overall, did Linear Steering give you better top-5 results than Baseline for this query?
Consider: even if both returned correct images, did Linear provide more relevant or better-matching ones?
Query 14 / 22: "a photo with motion"
Overall, did Linear Steering give you better top-5 results than Baseline for this query?
Consider: even if both returned correct images, did Linear provide more relevant or better-matching ones?
Query 15 / 22: "wearing eyeglasses"
Overall, did Linear Steering give you better top-5 results than Baseline for this query?
Consider: even if both returned correct images, did Linear provide more relevant or better-matching ones?
Query 16 / 22: "a person smiling"
Overall, did Linear Steering give you better top-5 results than Baseline for this query?
Consider: even if both returned correct images, did Linear provide more relevant or better-matching ones?
Query 17 / 22: "looking guilty"
Overall, did Linear Steering give you better top-5 results than Baseline for this query?
Consider: even if both returned correct images, did Linear provide more relevant or better-matching ones?
Query 18 / 22: "looking happy"
Overall, did Linear Steering give you better top-5 results than Baseline for this query?
Consider: even if both returned correct images, did Linear provide more relevant or better-matching ones?
Query 19 / 22: "looking sad"
Overall, did Linear Steering give you better top-5 results than Baseline for this query?
Consider: even if both returned correct images, did Linear provide more relevant or better-matching ones?
Query 20 / 22: "looking suspicious"
Overall, did Linear Steering give you better top-5 results than Baseline for this query?
Consider: even if both returned correct images, did Linear provide more relevant or better-matching ones?
Query 21 / 22: "looking tired"
Overall, did Linear Steering give you better top-5 results than Baseline for this query?
Consider: even if both returned correct images, did Linear provide more relevant or better-matching ones?
Query 22 / 22: "looking confident"
Overall, did Linear Steering give you better top-5 results than Baseline for this query?
Consider: even if both returned correct images, did Linear provide more relevant or better-matching ones?
Overall Experience Ratings
Think about the entire study across all 22 queries.
Rate each statement (1 = strongly disagree, 7 = strongly agree):
Final Survey
Thank you!
Your responses have been saved. We appreciate your participation.