Duolingo “Speak About the Photo” Practice

The Speak About the Photo task gives you 20 seconds to look at an image and 30 to 90 seconds to describe it out loud. That’s it. No prompts, no questions, no hints — just you and a photograph.

Most students struggle with this task for one of three reasons. They rush through the observation period and miss important details. They run out of things to say after 20 seconds and start repeating themselves. Or they describe a flat list of objects (“I see a table, I see a chair, I see a window”) instead of a connected, fluent description.

The fix is daily practice with real image descriptions, and that’s exactly what this page gives you. Below, you’ll find six free practice sets organized by difficulty, from a single-person portrait all the way to a complex multi-element scene. Each one includes timing instructions, a sample high-scoring response, and specific tips for that image type.

Come back to this page regularly. The more photo descriptions you practice, the faster your descriptive vocabulary and fluency will grow — and that shows directly in your DET speaking score.


Choose Your Image to Describe

People, Landscapes, and Action Shots (Sets 1–6)

Work through these sets in order if you’re new to this task. If you’re more confident, jump to Sets 4–6 for a real challenge. For each set: read the image description slowly, give yourself exactly 20 seconds to absorb it, then speak for 30 to 90 seconds. Record yourself if at all possible — you’ll hear things in the playback that you completely missed while speaking.

Set 1: Portrait — A Young Woman at a Café

Image Description:
The image shows a young woman sitting alone at a small round table inside a café. She appears to be in her mid-twenties and is wearing a light blue denim jacket over a white shirt. Her dark hair is pulled back loosely. She’s looking down at an open book on the table in front of her, and there’s a ceramic coffee mug to her right. The café behind her has exposed brick walls, warm lighting from hanging bulbs overhead, and a chalkboard menu visible in the background. A few other customers are slightly blurred in the far background. The overall atmosphere feels quiet and relaxed.

Instructions: Study this description for 20 seconds as if you’re looking at a real photograph. Then record yourself speaking for 30 to 90 seconds describing what you see in as much detail as possible.

[Record your response here before reading the sample below.]

_________________________________________________________________________________________________

Sample High-Scoring Response:

“This photo shows a young woman who looks like she’s in her mid-twenties, sitting at a small café table by herself. She’s wearing a light blue denim jacket and a white shirt, and her hair is tied back loosely — she looks relaxed and comfortable.

She’s looking down at a book that’s open on the table, and there’s a ceramic coffee mug next to it on her right side. The way she’s holding her posture and focusing on the book, it looks like she’s really absorbed in what she’s reading — maybe she’s studying, or maybe it’s just a novel she enjoys.

The café itself has a really warm, cozy feel. There are exposed brick walls in the background and some hanging light bulbs that give the space a soft, golden light. You can also see a chalkboard menu on the wall, which suggests this is a small, independent café rather than a chain. There are a few other people in the background, but they’re out of focus, so the attention stays on the woman.

Overall, the mood of this photo is calm and peaceful. It feels like a quiet afternoon moment — the kind of place people go when they want to read or think without being disturbed.”

💡 Pro Tip: For portraits, the most powerful sentence is usually the inference: “Based on her expression and posture, she looks like she’s…” It shows you’re engaging with the image, not just listing what’s visible.

Key Speaking Tips for Portrait Shots:

  • Start with a broad overview: who is in the image, approximate age, where they are.
  • Describe what the person is doing — use present continuous tense (“she is reading,” “he is smiling”).
  • Move from the subject outward to the setting and background.
  • Always include an inference about mood or situation — this is what separates average responses from high-scoring ones.

Common Mistakes to Avoid:

  • Don’t say “I see a woman” and stop there. That’s a label, not a description.
  • Don’t skip the background. The setting tells half the story.
  • Don’t guess too specifically (“she is 24 years old”). Say “she appears to be in her mid-twenties.”

How did you do? Did you mention the brick walls and the chalkboard menu, or did you focus only on the woman? Compare your response with the sample and note what you missed.


Set 2: Small Group — A Family Dinner Table

Image Description:
The image shows a family of four sitting around a rectangular wooden dining table for what appears to be an evening meal. Two adults — a man and a woman who look like they’re in their forties — are seated across from each other. Two children, a boy and a girl who look about eight and ten years old, are seated on either side. Everyone is mid-conversation; the adults are smiling and the children look animated. The table is set with plates, glasses of water, a bowl of salad in the center, and a large dish of pasta. The kitchen is visible in the background with warm yellow lighting. There are framed photos on the wall to the left.

Instructions: Study this description for 20 seconds. Then speak for 30 to 90 seconds.

[Record your response here before reading the sample below.]

_________________________________________________________________________________________________

Sample High-Scoring Response:

“This photo shows what looks like a family having dinner together. There are four people around the table: two adults who look like the parents, probably in their forties, and two children, a boy and a girl who seem to be around eight and ten years old.

Everyone looks engaged in conversation. The parents are smiling, and the kids look like they’re telling a story or reacting to something — there’s definitely a lot of energy at the table. It feels like a regular weeknight dinner rather than a special occasion, which actually makes the image feel very natural and warm.

The table itself has a nice spread on it. There’s a large bowl of salad in the middle, a dish of pasta, and glasses of water. The wooden table and the plates look simple and homey.

In the background, you can see part of the kitchen — the lighting is warm and yellow, which adds to the cozy atmosphere. On the wall to the left, there are some framed photos, which suggests this is a family home that’s been lived in for a while.

The overall feeling of this image is very genuine and comfortable. It’s the kind of photo that captures an ordinary moment, but in a really meaningful way.”

💡 Pro Tip: For group images, one sentence describing the energy or interaction between people is very powerful. “Everyone seems engaged in conversation” is a much stronger line than just listing who’s sitting where.

Key Speaking Tips for Group Shots:

  • Describe the group first as a whole, then break it down by individual people.
  • Note relationships if you can infer them — “they appear to be a family,” “they look like colleagues.”
  • Describe the interaction between people, not just each person separately.
  • Use spatial language: “on the left,” “across from each other,” “in the center of the table.”

Common Mistakes to Avoid:

  • Don’t describe each person as a separate, disconnected item. Connect them: “the two adults are sitting across from each other.”
  • Don’t forget the objects on the table — food, drinks, and tableware all add descriptive detail.
  • Don’t rush past the background. The kitchen tells you this is a home, not a restaurant.

Set 3: Indoor Scene — An Office Workspace

Image Description:
The image shows a tidy home office. In the center of the frame is a white desk with a large computer monitor, a keyboard, and a small plant in a terracotta pot on the left corner. A black office chair is pushed halfway back from the desk, as if someone just stood up. On the desk there are also a few notebooks, a pen holder, and a glass of water. Behind the desk, there’s a large window with natural light coming through white curtains — you can see green trees outside. To the right of the window, there’s a bookshelf filled with books and a few small decorative items. The floor is light wood. The overall space looks organized and calm.

Instructions: Study this description for 20 seconds. Then speak for 30 to 90 seconds.

[Record your response here before reading the sample below.]

_________________________________________________________________________________________________

Sample High-Scoring Response:

“This photo shows a home office that looks very clean and well-organized. The main focus of the image is a white desk in the center, with a large monitor, a keyboard, and a few notebooks arranged neatly on the surface. There’s also a small plant in a terracotta pot on the left side of the desk, which adds a bit of color and life to the workspace.

The chair is pushed back slightly, which gives the impression that whoever works here stepped away just a moment ago. It makes the scene feel very real and lived-in, not staged.

Behind the desk, there’s a big window with light curtains, and natural daylight is coming through. You can see green trees outside, so this is probably a ground-floor or low-level office with a garden view. That kind of natural light would make this a really nice place to work.

To the right of the window, there’s a bookshelf with a variety of books and some small decorative objects — a globe, maybe, or a few figures. It’s hard to say exactly, but it adds personality to the space.

The floor is light wood, and everything about the room has a calm, focused atmosphere. It looks like the workspace of someone who takes their work seriously but also cares about their environment.”

💡 Pro Tip: Indoor scenes reward structure. Try moving from front to back, or center to edges. An organized description is easy to follow; one that jumps between random objects is not.

Key Speaking Tips for Indoor Scenes:

  • Organize spatially: center, left, right, background, floor.
  • Look for clues about who uses the space — personal items, style, organization level.
  • Describe light sources: is it natural or artificial? Warm or cool?
  • Always end with the atmosphere or mood the room creates.

Common Mistakes to Avoid:

  • Don’t list furniture items one by one without connecting them. Walk through the space logically.
  • Don’t ignore small details like the plant or the glass of water — these show you observed carefully.
  • Don’t miss the window or background view. It gives context to the entire scene.

Set 4: Outdoor Landscape — A Mountain Path at Sunrise

Image Description:
The image shows a narrow hiking trail cutting through a mountain landscape at what appears to be early morning. The trail runs from the lower left corner of the image diagonally toward a distant peak. On either side of the path, there is tall green grass and scattered wildflowers in shades of yellow and purple. The sky takes up roughly the top third of the image and is a mix of deep orange, pink, and pale blue — classic sunrise colors. There are no people visible on the trail. In the mid-distance, a row of dark pine trees runs across the hillside. The light is soft and golden, casting long shadows across the grass.

Instructions: Study this description for 20 seconds. Then speak for 30 to 90 seconds.

[Record your response here before reading the sample below.]

_________________________________________________________________________________________________

Sample High-Scoring Response:

“This is a landscape photo taken outdoors, and it looks like it was shot very early in the morning — the lighting and the colors in the sky really suggest sunrise.

The main feature of the image is a narrow hiking trail that starts in the lower left corner and winds its way up toward a distant mountain peak. The trail looks well-worn, like it’s used regularly, but there’s nobody on it right now. That actually gives the image a really peaceful, almost lonely quality.

On both sides of the trail, there’s tall grass and wildflowers — I can see yellow and purple flowers scattered through the green. It’s very colorful and natural-looking, not a manicured park but a genuinely wild landscape.

In the middle distance, there’s a line of dark pine trees running across the hillside, and they create a nice contrast against the lighter sky and grass. Beyond the trees, you can see the mountain peak in the background.

The sky is maybe the most striking part of the image. It’s a mix of deep orange, pink, and pale blue — those early morning colors that only last for maybe twenty minutes. The light from the sunrise is soft and golden, and you can see long shadows stretching across the grass.

Overall, this photo has a very calm and expansive feeling. It makes you want to walk that trail.”

💡 Pro Tip: Most people miss the background details — scan the whole image from top to bottom before you start speaking. In landscape shots especially, the sky and distant elements often carry the mood of the entire photograph.

Key Speaking Tips for Landscape Shots:

  • For landscapes, pick one direction and keep to it. You can start with the sky and background and work forward to the foreground, or follow a leading line such as a trail from the foreground into the distance, as the sample response does.
  • Describe the light: time of day, direction, quality (harsh, soft, golden).
  • Use color vocabulary generously. Landscapes reward specific color description.
  • End with an emotional response or inference: what does the landscape make you feel or want to do?

Common Mistakes to Avoid:

  • Don’t jump back and forth between near and far details. Move through the landscape in one direction.
  • Don’t neglect the sky — in many landscape photos it’s the most visually dominant element.
  • Don’t say “it’s a nice place.” Be specific: “the golden morning light and the empty trail give the image a calm, almost meditative quality.”

Set 5: Action Shot — A Street Basketball Game

Image Description:
The image captures a mid-air moment during an outdoor basketball game on a city court. In the center of the frame, a young man wearing a red jersey is jumping high to shoot the ball toward a metal hoop. His body is fully extended — one arm raised with the ball, the other arm out for balance. Directly below and behind him, two other players are watching the ball, their arms raised in defense. The court has faded painted lines on asphalt. Around the edges of the court, a small group of spectators — maybe six or eight people — are watching from behind a chain-link fence. In the background, there are low apartment buildings and a cloudy sky.

Instructions: Study this description for 20 seconds. Then speak for 30 to 90 seconds.

[Record your response here before reading the sample below.]

_________________________________________________________________________________________________

Sample High-Scoring Response:

“This photo captures a really dynamic moment during an outdoor basketball game. The main subject is a young man in a red jersey who’s in mid-air — he’s jumped up and he’s extending his arm to shoot the ball toward the basket. You can see the full length of his body is stretched out, which gives the image a lot of energy and movement.

Right beneath him, there are two other players with their arms up. They’re clearly trying to block the shot, so this looks like it’s a competitive moment — the kind where everyone watching holds their breath.

The court itself is outdoors, on asphalt, and the painted lines have faded quite a bit. That tells you this is a neighborhood court, not a professional facility — it gives the image a very authentic, street-level feeling.

Around the edges of the court, you can see a small group of spectators standing behind a chain-link fence, watching the game. There are maybe six or eight people. Some of them look engaged, leaning forward slightly.

The background shows low apartment buildings and a cloudy sky, which reinforces the urban setting. The light is flat and even, which is typical for an overcast day.

The overall feeling of this photo is raw and energetic. It feels like a real game in a real neighborhood, not something staged for a commercial.”

💡 Pro Tip: Action shots are excellent for demonstrating present continuous tense naturally and fluently. Lean into it: “the player is leaping,” “the defenders are stretching their arms upward.” It sounds smooth and it’s grammatically strong.

Key Speaking Tips for Action Shots:

  • Use present continuous tense throughout: “a man is jumping,” “two players are raising their arms.”
  • Describe the frozen moment in detail — what is each person’s body doing right now?
  • Talk about what’s about to happen or just happened — inference is especially powerful in action shots.
  • Don’t forget the context: the crowd, the setting, the weather.

Common Mistakes to Avoid:

  • Don’t use past tense (“he jumped”) for what’s visible in the photo. Use present tense.
  • Don’t describe the action as a single event. Slow down and describe every body and every position visible.
  • Don’t forget the spectators and background — they add context that elevates your description.

Set 6: Complex Scene — A Busy Weekend Market

Image Description:
The image shows a busy outdoor market on what appears to be a sunny weekend morning. The market is set up along a wide pedestrian street, with wooden stalls lining both sides. In the foreground, a woman in a yellow coat is examining a display of handmade ceramic bowls. To her left, a vendor — an older man with a gray beard and an apron — is talking to another customer. Behind them, the street is crowded with people walking in both directions. Stalls sell a variety of things: flowers in buckets, vegetables, bread loaves, and what looks like vintage clothing. Colorful banners hang overhead between the stalls. The sky is a clear, deep blue. Trees with full green leaves line the far end of the street.

Instructions: Study this description for 20 seconds. Then speak for 30 to 90 seconds. This is the most complex scene — push yourself to reach 75 to 90 seconds.

[Record your response here before reading the sample below.]

_________________________________________________________________________________________________

Sample High-Scoring Response:

“This photo shows a really lively outdoor market that looks like it’s happening on a sunny weekend morning. There’s a lot going on, so I’ll try to describe it from front to back.

In the foreground, the most prominent figure is a woman wearing a bright yellow coat. She’s standing at a stall and looking closely at a collection of handmade ceramic bowls — she looks like she’s deciding whether to buy one. Right next to her, there’s an older vendor with a gray beard wearing an apron, and he’s in conversation with another customer. The interaction looks friendly and relaxed.

Behind them, the street is quite crowded. Lots of people are moving in both directions, and you get the sense that this is a popular local market, not just a small neighborhood event. The stalls on both sides are selling different things — I can see flowers in buckets, vegetables, what looks like artisan bread, and some vintage or second-hand clothing hanging on racks.

Overhead, there are colorful banners or flags strung between the stalls, which gives the whole street a festive, celebratory feel. The sky above is a really vivid, clear blue, and the sunlight looks strong — it’s casting sharp shadows on the pavement.

At the far end of the street, there are trees with full green leaves, which suggests this is late spring or summer.

The overall atmosphere is warm, busy, and communal. It feels like the kind of market where local people spend their Saturday mornings — relaxed, social, and connected to the community.”

💡 Pro Tip: For complex scenes, your first sentence should always give the big picture: “This photo shows a busy outdoor market on a sunny day.” From there, zoom in. Starting with the overview shows that you can organize information, and that’s a fluency signal in itself.

Key Speaking Tips for Complex Scenes:

  • Organize by zone: foreground, middle ground, background. Don’t jump randomly between areas.
  • Prioritize the most active or prominent elements first, then move to supporting details.
  • In crowded scenes, describe general activity (“people are walking between stalls”) rather than trying to describe every individual.
  • Describe materials, textures, and colors generously — wooden stalls, ceramic bowls, colorful banners.

Common Mistakes to Avoid:

  • Don’t describe a complex scene randomly. A listener should be able to follow your eyes moving through the image.
  • Don’t spend all your time on one element. A complex scene rewards breadth of description.
  • Don’t forget overhead elements like banners, sky, or signs — most people only describe what’s at eye level.

What Is the Speak About the Photo Task?

20 Seconds to Look, 30–90 Seconds to Speak

Here’s the exact format. When this task appears on your DET, you’ll see a photograph on screen. You have 20 seconds to observe it — the timer is visible, and it counts down. You cannot pause it. When the 20 seconds end, the recording starts automatically and you must speak.

The minimum speaking time is 30 seconds. The maximum is 90 seconds. Speaking for only 30 seconds is technically allowed, but it gives the AI very little audio data to score — your pronunciation, vocabulary range, fluency, and intonation all need time to be measured properly. Aim for 60 to 75 seconds as your target, and push to 90 if you have enough to say.
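
For self-timed practice, the limits above can be turned into a quick check on the length of a recording. The sketch below is only an illustration: the function name `rate_duration` and the label wording are made up for this page, and the thresholds simply restate the 30-second minimum, the 60 to 75 second target, and the 90-second maximum described above. It says nothing about how the DET itself scores a response.

```python
MIN_SECONDS = 30   # minimum speaking time stated above
TARGET_LOW = 60    # lower end of the suggested target
TARGET_HIGH = 75   # upper end of the suggested target
MAX_SECONDS = 90   # maximum speaking time stated above


def rate_duration(seconds: float) -> str:
    """Label a practice recording by its length (hypothetical helper)."""
    if seconds < 0:
        raise ValueError("duration cannot be negative")
    if seconds < MIN_SECONDS:
        return "below the 30-second minimum"
    if seconds < TARGET_LOW:
        return "allowed, but short of the 60-second target"
    if seconds <= TARGET_HIGH:
        return "inside the 60 to 75 second target"
    if seconds <= MAX_SECONDS:
        return "stretch range, up to the 90-second maximum"
    return "over the 90-second maximum"


if __name__ == "__main__":
    # Try a few practice lengths, in seconds.
    for length in (25, 45, 70, 85):
        print(length, "->", rate_duration(length))
```

Logging this label after every practice recording makes it easy to see whether your responses are drifting toward the short end.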

The 20-second observation window is the most underused resource on this entire task. Think of it like scanning a room when you walk in — you don’t stare at the doorknob. You take in the whole space first, then notice details. Train yourself to look at the image in a structured way: overall scene → foreground figures → background → colors and light → mood or atmosphere. Twenty seconds is enough time to cover all five if you’re deliberate about it.
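
One way to rehearse the five-step scan is to give each step a fixed slice of the 20 seconds. The sketch below assumes an even split, which is a simplification chosen for illustration rather than a recommendation from the test; `observation_plan` and `SCAN_STEPS` are hypothetical names.

```python
# The five scan steps listed above, in order.
SCAN_STEPS = [
    "overall scene",
    "foreground figures",
    "background",
    "colors and light",
    "mood or atmosphere",
]


def observation_plan(total_seconds: float = 20) -> list[tuple[str, float, float]]:
    """Split the observation window evenly across the scan steps.

    Returns (step, start_second, end_second) tuples. The even split is an
    assumption; adjust the shares if some steps take you longer.
    """
    share = total_seconds / len(SCAN_STEPS)
    return [
        (step, index * share, (index + 1) * share)
        for index, step in enumerate(SCAN_STEPS)
    ]


if __name__ == "__main__":
    for step, start, end in observation_plan():
        print(f"{start:.0f}s to {end:.0f}s: {step}")
```

With the default 20 seconds, each step gets four seconds. If you find that the foreground always takes longer, shift a second or two from another step and practice with that split instead.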

The Difference Between Writing and Speaking About a Photo

This matters more than most students realize, and getting it wrong can cost you fluency points.

When you write a photo description — for a writing task or an essay — you have time to edit. You can delete a weak sentence, restructure a paragraph, and choose a better word. Written descriptions also benefit from formal grammar, carefully constructed complex sentences, and precise punctuation. The reader has time to re-read.

Speaking is completely different. You cannot edit. Once a sentence leaves your mouth, it’s recorded. This means your speech needs to be organized before you open your mouth — which is exactly why the 20-second observation window is so important.

At the same time, spoken English has its own natural rhythms. Contractions are normal (“it’s,” “there’s a woman,” “she’s reading”). Slightly shorter sentences are fine. Natural connectors like “and then,” “which gives the scene,” or “so you get the sense that” sound fluent and human in speech, even though they might feel informal in writing.

One thing counts against you in speaking that has no equivalent in writing: filler words. “Um,” “uh,” and “like” used as a pause all register as fluency breaks. The goal isn’t perfection; it’s replacing filler words with brief, purposeful pauses or with natural bridge phrases like “what I mean is” or “to describe it another way.”
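
If you type out or auto-transcribe a practice recording, a few lines of Python can show how often fillers appear. This is a rough self-check under stated assumptions, not a model of how the DET scores fluency: `count_fillers` is a hypothetical helper, and the default list covers only “um” and “uh” because plain text cannot tell a filler “like” from a legitimate one (“it looks like”).

```python
import re
from collections import Counter

# Assumed default list. "like" is left out on purpose: a transcript alone
# cannot separate the filler from normal uses such as "it looks like".
DEFAULT_FILLERS = ("um", "uh")


def count_fillers(transcript: str, fillers=DEFAULT_FILLERS) -> Counter:
    """Count whole-word occurrences of each filler in a transcript."""
    words = re.findall(r"[a-z']+", transcript.lower())
    return Counter(word for word in words if word in fillers)


if __name__ == "__main__":
    sample = "Um, this photo shows, uh, a woman. Um, it looks like she's reading."
    for filler, total in count_fillers(sample).most_common():
        print(filler, total)
```

Comparing the counts across a week of recordings gives you a simple trend line. Pass your own tuple as `fillers` if you notice a different habit word in your playback.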

One more key difference: intonation. In writing, emphasis comes from bold text, punctuation, or word choice. In speech, it comes from the rise and fall of your voice. A flat, monotone delivery sounds less fluent than a varied one, even if the words are identical. Practice stressing the important words in your description naturally — “the WOMAN in the YELLOW coat,” “a really BUSY outdoor MARKET.”

Stop AI Guesswork: Get Real Human Speaking Feedback

The practice sets on this page are genuinely useful, and if you work through all six sets regularly, you will improve. But there’s a ceiling to what self-practice can give you.

When you record yourself and listen back, you can catch your own “ums” — eventually. What you can’t catch is that you’re consistently mispronouncing a common sound in a way that the DET’s AI is flagging. You can’t always hear that your sentence rhythm drops at the end of long descriptions, making you sound uncertain. You can’t see that your observation strategy is missing whole sections of the image because of a habit you’ve developed.

Real human feedback — from a specialist who has analyzed hundreds of DET speaking responses — identifies those gaps in a single session. Teacher Leda works with students specifically on pronunciation accuracy, descriptive vocabulary range, fluency under time pressure, and the kinds of speaking patterns that prevent high Conversation subscores. The difference between a student who self-studies for four weeks and one who gets two live coaching sessions is often 15 to 20 points on the speaking subscores.

Self-study is the foundation. Live feedback is the accelerator.


Frequently Asked Questions

How much time do I get to look at the photo before speaking?
You have 20 seconds to observe the image before the recording starts. The timer is shown on screen and counts down automatically. You can’t pause it or ask for more time. This is why practicing deliberate observation — scanning the image in a structured way from foreground to background — is so important. Twenty seconds sounds short, but if you use it with a system (overall scene, main subjects, background details, colors, mood), it’s genuinely enough time to build a solid mental outline for a 60 to 90 second response.
How long must I speak about the photo?
You must speak for a minimum of 30 seconds, and you can speak for up to a maximum of 90 seconds. Speaking for exactly 30 seconds is not recommended — it gives the AI’s scoring system very limited data to evaluate your pronunciation, vocabulary, and fluency. Aim for at least 60 seconds as a baseline, and push toward 90 seconds when the image gives you enough to describe. The practice sets on this page are designed to help you reach 60 to 90 seconds naturally, without running out of things to say.
Is the photo in color or black and white?
Photos on the DET are typically in full color. This is actually helpful, because color vocabulary is one of the easiest ways to add descriptive detail to your response — “a woman in a bright yellow coat,” “a deep blue sky,” “golden morning light.” Don’t pass up the opportunity to mention colors. They demonstrate vocabulary range and make your description more vivid. If a photo ever appeared in black and white, that itself would be worth mentioning as a descriptive detail.
What happens if I describe the wrong thing in the image?
The AI grades for relevance. If your description doesn’t match the image — for example, if you describe a beach scene when the photo shows a city street — your score will drop significantly. The system is comparing your spoken words against the content of the image using visual recognition. This is why the observation window matters so much. If you’re unsure about a detail, describe what you can see clearly and use hedging language: “it looks like,” “what appears to be,” “in the background, I think I can see.” That’s more honest and more fluent than confidently describing something incorrectly.
Can I just point at the screen and say “this is a…”?
Pointing isn’t possible in a spoken response — the microphone records only audio, not your gestures. You need to describe location entirely through spatial language and prepositions. Instead of pointing, you’d say “on the left side of the image,” “in the foreground,” “behind the woman in the yellow coat,” “in the upper right corner,” or “in the center of the frame.” Practicing spatial prepositions is one of the most practical things you can do for this task. They help you organize your description logically and give the listener — or the AI — a clear mental map of where everything is in the photo.
How many times will I have to speak about a photo in the test?
The DET is adaptive, meaning the number and difficulty of tasks adjusts based on your performance as you go. Typically, you can expect to see 2 to 4 Speak About the Photo tasks during a full test session. If your early responses are strong, the tasks may become more complex — images with more elements, more ambiguous content, or more challenging vocabulary demands. If you’re struggling, the tasks may stay simpler. Either way, practicing across the full range of image types — portraits, landscapes, action shots, and complex scenes — prepares you for whatever variation the adaptive test sends your way.

Looking for more DET speaking practice? Visit our Main Dashboard for the W-H Method and full response structure technique.

Stop Guessing. Get Expert DET Feedback.

If you’re making the same type of error repeatedly, self-practice isn’t enough. Get personalized coaching, error analysis, and a practice plan built around your actual results.
