FREE TEMPLATE
Generate Gemini Image Captions for Social Posts
Views: 11
Downloads: 0
Nodes: 16
Price: Free
Utility Rating: 6 / 10
Business Function: Marketing
Automation Orchestrator: n8n
Integrations: Google Gemini
Trigger Type: Manual
Approximate setup time: ≈ 25 minutes
Need help setting up this template? Ask in our free Futurise community.

How to Generate Gemini Image Captions for Social Posts?

Leon Petrou

Description

Turn any image into a ready-to-post asset. The workflow writes a title and caption, then places the text neatly on the picture. It suits social posts, blogs, and quick creative needs without a designer.

Here is how it works. A manual trigger starts the run and an HTTP Request node downloads an image. The image is resized to 512 × 512 pixels for fast AI processing, and basic image details are read. A Gemini vision model receives the image and returns a structured title and caption using a built-in parser. A Code node calculates where the text should go based on the image size. Merge nodes combine the image, the caption, and the positions. Finally, the Edit Image node draws a subtle background box and overlays the title and caption on the image.
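
The structured title and caption can be pictured as a small JSON object. The following is a minimal sketch, in Python rather than as an n8n node, of checking such a reply before it is drawn on the image. The field names caption_title and caption_text are the ones listed in the setup steps; the function name parse_caption and the sample reply are hypothetical and are not part of the template.

```python
import json

# Field names taken from the setup steps; everything else here is illustrative.
REQUIRED_FIELDS = ("caption_title", "caption_text")


def parse_caption(raw: str) -> dict:
    """Parse a JSON reply and confirm both caption fields are non-empty strings."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    result = {}
    for field in REQUIRED_FIELDS:
        value = data.get(field)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"missing or empty field: {field}")
        result[field] = value.strip()
    return result


# Hypothetical model reply, used only to exercise the function.
reply = '{"caption_title": "Golden Hour", "caption_text": "Sunset over the bay."}'
caption = parse_caption(reply)
print(caption["caption_title"])
```

Failing early on a missing field is one way to avoid drawing an empty text box on the final image.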

Setup is straightforward. You need a Google Gemini API key and a Gemini credential configured in n8n Cloud. Expect faster production of branded graphics, less editing time, and consistent text placement. Use it for auto-captioned hero images, watermarked visuals, and quick social content. With small changes, it can batch-process many images and keep your style consistent.


Tools Required

What does this workflow do?

  • Image download with HTTP Request to pull assets from a URL as binary data.
  • Image resizing to 512 by 512 for faster AI processing and stable results.
  • Vision captioning using Google Gemini 1.5 Flash through an LLM chain.
  • Structured Output Parser to return a title and caption in a clean JSON shape.
  • Code node calculates font size, line length, and bottom placement based on image size.
  • Merge nodes combine the image, AI text, and positions before editing.
  • Multi-step Edit Image node draws a semi-transparent background box and overlays the text.
  • Manual trigger for safe testing with easy switch to other triggers later.
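
The positioning step described above can be sketched roughly as follows. This is an illustrative Python version and not the template's actual Code node: the function name calculate_positioning, the returned keys, and every ratio (font size as 1/25 of the width, a 1.4 line-height factor, glyph width estimated as half the font size) are assumptions chosen for the example.

```python
def calculate_positioning(width: int, height: int, caption_length: int) -> dict:
    """Derive overlay geometry from the image size. All ratios are assumptions."""
    font_size = max(12, width // 25)        # text scales with image width
    line_height = int(font_size * 1.4)      # vertical spacing between lines
    margin = max(8, width // 32)            # padding inside the background box
    # Rough estimate: an average glyph is about half the font size wide.
    max_chars = max(1, (width - 2 * margin) // max(1, font_size // 2))
    caption_lines = max(1, -(-caption_length // max_chars))  # ceiling division
    # One extra line for the title, plus padding above and below.
    box_height = line_height * (caption_lines + 1) + 2 * margin
    box_top = max(0, height - box_height)   # anchor the box to the bottom edge
    return {
        "font_size": font_size,
        "line_height": line_height,
        "max_chars": max_chars,
        "box_top": box_top,
        "box_height": box_height,
        "title_y": box_top + margin,
        "caption_y": box_top + margin + line_height,
    }


pos = calculate_positioning(512, 512, caption_length=60)
print(pos)
```

Deriving every value from the width and height is what keeps placement consistent when the input images differ in size.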

What are the benefits?

  • Reduce manual captioning and layout from 10 minutes to 1 minute per image
  • Automate 80% of repetitive design work for simple social graphics
  • Keep text placement consistent across all images to improve brand quality
  • Handle dozens of images in a single run without extra tools
  • Eliminate copy paste steps by generating and applying text in one flow

How to set this up?

  1. Import the template into n8n: Create a new workflow in n8n > Click the three dots menu > Select 'Import from File' > Choose the downloaded JSON file.
  2. You'll need a Google Gemini account. See the Tools Required section above for a link to create one.
  3. In n8n Cloud, open the Google Gemini Chat Model node, click the 'Credential to connect with' dropdown, select 'Create new credential', and follow the on-screen steps.
  4. Get your Google Gemini API key from the Google AI Studio site, paste it into the new credential, and save. Name the credential clearly, for example Gemini Prod.
  5. Open the HTTP Request node and confirm the image URL works. Set Response Format to File and make sure Binary Property is set to data to store the image as binary.
  6. Check the Resize For AI node. Set width to 512 and height to 512. Confirm the input binary property matches the HTTP Request output.
  7. Open the Image Captioning Agent node. Ensure the language model is set to the Google Gemini node and the Structured Output Parser is connected with caption_title and caption_text fields.
  8. Review the Code node named Calculate Positioning. Adjust the lineHeight value if the text appears too large or too small for your images.
  9. Open the Apply Caption to Image node. Confirm it draws the background rectangle and then writes the title and caption using the fields from the merged data.
  10. Click Test workflow to run with the manual trigger. Verify the final image shows a readable overlay and correct text. If the image fails, check API key permissions and confirm the model is available in your region.
  11. If text is cut off, increase the image width or reduce the font size logic in the Code node. If the overlay hides content, adjust the rectangle height or placement values.
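
When troubleshooting cut-off text (step 11), it can help to check how a caption wraps before the image is edited. The sketch below uses Python's standard textwrap module; the function name wrap_caption and the limits of 20 characters and 2 lines are hypothetical values for illustration, not settings taken from the template.

```python
import textwrap


def wrap_caption(text: str, max_chars: int, max_lines: int) -> tuple[list[str], bool]:
    """Wrap text to max_chars per line; report whether it exceeds max_lines."""
    lines = textwrap.wrap(text, width=max_chars)
    overflow = len(lines) > max_lines
    return lines[:max_lines], overflow


# Hypothetical caption and limits, used only to exercise the function.
lines, overflow = wrap_caption(
    "A quiet harbor at dawn with fishing boats", max_chars=20, max_lines=2
)
print(lines, overflow)
```

If the overflow flag is set, the options from step 11 apply: allow more characters per line by widening the image or shrinking the font, or make the background box taller so it can hold another line.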


Similar Templates

n8n
Marketing
Automate Gemini Asset Tagging
Turn images and PDFs into clear tags, captions, and summaries using Google Gemini. Great for marketing teams that manage many assets and need fast, consistent results across different media types. Pick the method that fits your task, from quick single image checks to full control API calls. The flow starts on manual run and branches into five paths. One path sends a single image straight to an AI agent with binary passthrough for the fastest setup. Another path processes multiple images with custom prompts and loops through each item. A third path follows the standard n8n item model, converts files to base64, and calls Gemini directly. The fourth path fetches a PDF, converts it to base64, and asks Gemini for a summary. The fifth path does the same for a single image via a custom API call. You can filter inputs, split data, and control prompts per item. You need a Google Gemini API key and credentials in n8n. Add your image URLs and prompts in the Set nodes, or point to your PDFs. Run a branch, check the output text, and adjust your instructions for better tags. Expect faster media review, more consistent labels, and less manual copywriting work. Useful for product catalogs, social content planning, brand audits, and document summaries.
12 views
n8n
Marketing
Automate Gemini Podcast Production
Turn fresh news into ready-to-publish audio in minutes. This build pulls articles from a news homepage, chooses the best stories, writes a clean script, and produces a voice-ready file. It helps content teams ship daily news podcasts without manual copy-and-paste work. Here is how it runs. A manual trigger starts a fetch of the news page, then HTML nodes extract article cards, titles, links, and short descriptions. Items are split out and a Limit node caps volume. A Gemini-based classifier scores each headline to decide if it fits a podcast. Only suitable items move on. The workflow fetches each full article page, extracts the body, removes empty results, and aggregates the text. A Gemini LLM then turns the set of stories into a structured podcast script using an output parser. If a script exists, a Hugging Face text-to-speech node generates the audio file. You need access to Google Gemini and Hugging Face and the right credentials in n8n. Expect production time to drop from hours to minutes and more consistent story tone. Great for news roundups, brand updates, or internal briefings. Adjust CSS selectors if your source site layout changes and tune the classifier to match your editorial style.
15 views
n8n
Marketing
Generate Gemini YouTube SEO Content
Turn any public YouTube link into useful content on demand. The flow can create a clean transcript, a timestamped transcript, a scene description, short clip ideas, or a summary for blogs and video SEO. It is ideal for marketing and content teams that want fast, repeatable outputs from videos. The run starts with a manual trigger and sets core fields like automation ID, API key, model, prompt type, and the video URL. A switch chooses the right prompt based on the prompt type you set, then packages the request. The HTTP Request node calls the Google Generative Language API with your prompt and the video URL. A small code step merges the API response with earlier fields, and a mapping step extracts the final answer so it is easy to pass downstream. An optional error path helps you add custom handling if the API returns an issue. Setup is simple. You only need a Google API key and a public YouTube URL. Paste the API key once and choose the output you want. Expect to reduce manual transcription and packaging from hours to minutes. Common uses include full YouTube SEO packages, timestamped notes for editing, social clip ideas, and concise blog summaries that match the video.
5 views

These templates were sourced from publicly available materials across the web, including n8n’s official website, YouTube and public GitHub repositories. We have consolidated and categorized them for easy search and filtering, and supplemented them with links to integrations, step-by-step setup instructions, and personalized support in the Futurise community. Content in this library is provided for education, evaluation and internal use. Users are responsible for checking and complying with the license terms with the author of the templates before commercial use or redistribution. Where an original author was identified, attribution has been provided. Some templates did not include author information. If you know who created this template, please let us know so we can add the appropriate credit and reference link. If you are the author and would like this template removed from the library, email us at info@futurise.com and we will remove it promptly.