Building a custom Docker image
This page shows how to build a custom Docker image for your script tasks.
You can bake all dependencies needed for your script tasks directly into Kestra's base image. Here is an example that installs Python dependencies:
FROM kestra/kestra:latest-full
USER root
RUN apt-get update -y && apt-get install pip -y
RUN pip install --no-cache-dir pandas requests boto3
Then, point to that Dockerfile in your docker-compose.yml file:
services:
  kestra:
    build:
      context: .
      dockerfile: Dockerfile
    image: kestra-python:latest
Once you start the Kestra containers with docker compose up -d, you can create a flow that runs Python tasks with your custom dependencies using the PROCESS runner:
id: python_process
namespace: company.team
tasks:
  - id: custom_dependencies
    type: io.kestra.plugin.scripts.python.Script
    runner: PROCESS
    script: |
      import pandas as pd
      import requests
      import boto3
      print(f"Pandas version: {pd.__version__}")
      print(f"Requests version: {requests.__version__}")
      print(f"Boto3 version: {boto3.__version__}")
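If a package is missing from the image, the script above stops at the first failing import. As a minimal sketch, the helper below uses the standard-library importlib.metadata module to report which distributions are installed before any heavy import runs. The function names installed_versions and missing are hypothetical and not part of Kestra.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution was not baked into the image.
            versions[name] = None
    return versions


def missing(packages):
    """Return the distribution names that are not installed."""
    return [name for name, version in installed_versions(packages).items() if version is None]
```

Inside the script property of the flow, you could call missing(["pandas", "requests", "boto3"]) and raise an error when the returned list is not empty, so the task log names the missing packages directly.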
Building a custom Docker image for your script tasks
Imagine you use the following flow:
id: zip_to_python
namespace: company.team
variables:
  file_id: "{{ execution.startDate | dateAdd(-3, 'MONTHS') | date('yyyyMM') }}"
tasks:
  - id: get_zipfile
    type: io.kestra.plugin.core.http.Download
    uri: "https://divvy-tripdata.s3.amazonaws.com/{{ render(vars.file_id) }}-divvy-tripdata.zip"
  - id: unzip
    type: io.kestra.plugin.compress.ArchiveDecompress
    algorithm: ZIP
    from: "{{ outputs.get_zipfile.uri }}"
  - id: parquet_output
    type: io.kestra.plugin.scripts.python.Script
    warningOnStdErr: false
    runner: DOCKER
    docker:
      image: ghcr.io/kestra-io/pydata:latest
    env:
      FILE_ID: "{{ render(vars.file_id) }}"
    inputFiles: "{{ outputs.unzip.files }}"
    script: |
      import os
      import pandas as pd
      file_id = os.environ["FILE_ID"]
      file = f"{file_id}-divvy-tripdata.csv"
      df = pd.read_csv(file)
      df.to_parquet(f"{file_id}.parquet")
    outputFiles:
      - "*.parquet"
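The file_id variable in the flow above resolves to the year and month three months before the execution start date, and that value names both the downloaded archive and the CSV file inside it. The sketch below reproduces the arithmetic in plain Python so you can predict which file a given run requests. It assumes that only the year and month matter for the yyyyMM format; the helper names file_id and zip_url are hypothetical.

```python
from datetime import date


def file_id(start: date, months_back: int = 3) -> str:
    """Return the yyyyMM string for the month `months_back` months before `start`."""
    # Count months from year 0 so the subtraction rolls over year boundaries.
    index = start.year * 12 + (start.month - 1) - months_back
    year, month_zero_based = divmod(index, 12)
    return f"{year:04d}{month_zero_based + 1:02d}"


def zip_url(start: date) -> str:
    """Build the download URL used by the get_zipfile task for a given start date."""
    return f"https://divvy-tripdata.s3.amazonaws.com/{file_id(start)}-divvy-tripdata.zip"
```

For example, file_id(date(2024, 2, 15)) returns "202311", so a run started in February 2024 would request 202311-divvy-tripdata.zip.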
The Python task requires pandas to be installed. pandas is a large library that is not included in the default Python image. In this case, you have the following options:
- Install pandas in the beforeCommands property of the Python task.
- Use one of our pre-built images that already include pandas, such as the ghcr.io/kestra-io/pydata:latest image.
- Build your own custom Docker image that includes pandas.
 
1) Installing pandas in the beforeCommands property
id: install_pandas_at_runtime
namespace: company.team
tasks:
  - id: custom_dependencies
    type: io.kestra.plugin.scripts.python.Script
    runner: PROCESS
    beforeCommands:
      - pip install pyarrow pandas
    script: |
      import pandas as pd
      print(f"Pandas version: {pd.__version__}")
2) Using one of our pre-built images
id: use_prebuilt_image
namespace: company.team
tasks:
  - id: custom_dependencies
    type: io.kestra.plugin.scripts.python.Script
    runner: DOCKER
    docker:
      image: ghcr.io/kestra-io/pydata:latest
    script: |
      import pandas as pd
      print(f"Pandas version: {pd.__version__}")
3) Building a custom Docker image
If you want to build a custom Docker image for some of your scripts, first create a Dockerfile:
FROM python:3.11-slim
RUN pip install --upgrade pip
RUN pip install --no-cache-dir kestra requests pyarrow pandas amazon-ion
Then, build the image:
docker build -t kestra-custom:latest .
Finally, use that image in your flow:
id: zip_to_python
namespace: company.team
variables:
  file_id: "{{ execution.startDate | dateAdd(-3, 'MONTHS') | date('yyyyMM') }}"
tasks:
  - id: get_zipfile
    type: io.kestra.plugin.core.http.Download
    uri: "https://divvy-tripdata.s3.amazonaws.com/{{ render(vars.file_id) }}-divvy-tripdata.zip"
  - id: unzip
    type: io.kestra.plugin.compress.ArchiveDecompress
    algorithm: ZIP
    from: "{{ outputs.get_zipfile.uri }}"
  - id: parquet_output
    type: io.kestra.plugin.scripts.python.Script
    warningOnStdErr: false
    runner: DOCKER
    docker:
      image: kestra-custom:latest # ⚡️ Use your custom image here
      pullPolicy: NEVER # ⚡️ Use the local image instead of pulling it from DockerHub
    env:
      FILE_ID: "{{ render(vars.file_id) }}"
    inputFiles: "{{ outputs.unzip.files }}"
    script: |
      import os
      import pandas as pd
      file_id = os.environ["FILE_ID"]
      file = f"{file_id}-divvy-tripdata.csv"
      df = pd.read_csv(file)
      df.to_parquet(f"{file_id}.parquet")
    outputFiles:
      - "*.parquet"
Note the pullPolicy: NEVER property: it makes Kestra use the local image instead of trying to pull it from Docker Hub.
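Because the image is never pulled, the task can only be expected to start if kestra-custom:latest already exists on the host whose Docker daemon runs the container. A minimal sketch of a pre-flight check follows. It assumes the docker CLI is on the PATH and relies on docker image inspect exiting with a non-zero status when the image is absent; the helper names inspect_command and image_exists_locally are hypothetical.

```python
import subprocess


def inspect_command(image: str) -> list:
    """Build the CLI invocation that looks up a local image by name and tag."""
    return ["docker", "image", "inspect", image]


def image_exists_locally(image: str, run=subprocess.run) -> bool:
    """Return True if `docker image inspect` succeeds for the given image."""
    try:
        result = run(inspect_command(image), capture_output=True, check=False)
    except FileNotFoundError:
        # The docker CLI itself is not installed or not on the PATH.
        return False
    return result.returncode == 0
```

Calling image_exists_locally("kestra-custom:latest") before triggering the flow tells you whether the docker build step has to be run first. The run parameter exists only so the check can be exercised without a Docker daemon.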