Aiven for Apache Kafka® as a source for Aiven for ClickHouse®#

This article shows, by way of example, how to integrate Aiven for Apache Kafka® with Aiven for ClickHouse® using the Terraform provider for Aiven. An Apache Kafka® topic is used as the data source, and Aiven for ClickHouse® filters or transforms the raw data with a materialized view before writing it to a regular table.

Sample of sensor data#

First, check out what the sensor data can look like, for a better understanding of this recipe and of the clickhouse_kafka_user_config Terraform block used in this article.

{
  "sensor_id": 10000001,
  "ts": "2022-12-01T10:08:24.446369",
  "key": "cpu_usage",
  "value": 96
}
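The Kafka source table configured later in this article reads messages in JSONEachRow format, one JSON object per line. As a hypothetical illustration (not part of the recipe itself), a producer could serialize a reading like this:

```python
import json
from datetime import datetime

def encode_measurement(sensor_id, key, value, ts=None):
    """Serialize one sensor reading as a single JSON line (JSONEachRow format)."""
    record = {
        "sensor_id": sensor_id,
        "ts": (ts or datetime.now()).isoformat(),
        "key": key,
        "value": value,
    }
    # json.dumps never emits newlines for flat objects, so each record is one line.
    return json.dumps(record)

message = encode_measurement(
    10000001, "cpu_usage", 96, datetime(2022, 12, 1, 10, 8, 24, 446369)
)
print(message)
# {"sensor_id": 10000001, "ts": "2022-12-01T10:08:24.446369", "key": "cpu_usage", "value": 96}
```

Each such line becomes one row in the ClickHouse table, with fields mapped to the columns declared in the service integration below.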

Let’s cook!#

flowchart LR
    id1(iot_measurements_topic)
    id2[(iot_measurements)]
    id3(edge_measurements_raw_table)
    id4(cpu_high_usage_table)
    subgraph Aiven for Apache Kafka
        id1
    end
    subgraph Aiven for ClickHouse
        id2
        subgraph iot_measurements DB
            id3-->|Filter|id4
        end
    end
    id1-->|Service integration|id2

Imagine that you’ve been collecting IoT measurements from thousands of sensors, and these metrics are being written to an Apache Kafka topic called iot_measurements. Now you’d like to set up an Aiven for ClickHouse database and write the filtered messages into a cpu_high_usage table.

This recipe calls for the following:

  1. Set up an Aiven for ClickHouse database for writing and processing raw data.

  2. Insert the measurements data from Apache Kafka topic iot_measurements into the Aiven for ClickHouse database.

  3. Filter the data and save the output to the new cpu_high_usage table.
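The filtering in step 3 happens inside ClickHouse, but the predicate itself is simple. As a plain-Python sketch of the same logic (the threshold of 85 is an illustrative assumption, not a value from this recipe):

```python
# Illustrative sketch of the filter applied in step 3: keep only readings
# where the metric is cpu_usage and the value exceeds a chosen threshold.
# The threshold of 85 is an assumption for illustration only.
CPU_USAGE_THRESHOLD = 85

def is_high_cpu_usage(record):
    return record["key"] == "cpu_usage" and record["value"] > CPU_USAGE_THRESHOLD

measurements = [
    {"sensor_id": 10000001, "key": "cpu_usage", "value": 96},
    {"sensor_id": 10000002, "key": "memory_usage", "value": 97},
    {"sensor_id": 10000003, "key": "cpu_usage", "value": 42},
]

cpu_high_usage = [m for m in measurements if is_high_cpu_usage(m)]
print(cpu_high_usage)  # only sensor 10000001 passes both conditions
```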

Configure common files#


Navigate to a new folder and add the following files:

  1. File with the provider configuration

terraform {
  required_providers {
    aiven = {
      source  = "aiven/aiven"
      version = "~> 3.10.0"
    }
  }
}

provider "aiven" {
  api_token = var.aiven_api_token
}

You can set the environment variable TF_VAR_aiven_api_token for the api_token property so that you don’t need to pass the -var-file flag when executing Terraform commands.
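For example (the token value below is a placeholder; use your own token from the Aiven console):

```shell
# Export the token so Terraform reads it as var.aiven_api_token.
export TF_VAR_aiven_api_token="your-aiven-api-token"
# Confirm the variable is set without printing the secret itself.
echo "token variable is set: ${TF_VAR_aiven_api_token:+yes}"
```

With this in place, terraform plan and terraform apply pick up the token automatically.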

  2. Variables file

Use it for defining the variables to avoid including sensitive information in source control. The file defines the API token, the project name, and the prefix for the service name.

variable "aiven_api_token" {
  description = "Aiven console API token"
  type        = string
}

variable "project_name" {
  description = "Aiven console project name"
  type        = string
}
  3. *.tfvars file

Use it to set the actual values of the variables so that they can be passed to Terraform at runtime (with the -var-file= flag) and excluded from source control. Configure the var-values.tfvars file as follows:


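A minimal var-values.tfvars sketch for the two variables defined above; both values are placeholders to replace with your own:

```
aiven_api_token = "your-aiven-api-token"
project_name    = "your-project-name"
```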
Configure the file#

The following Terraform script initializes both Aiven for Apache Kafka and Aiven for ClickHouse services, creates the service integration, the source Apache Kafka topic, and the Aiven for ClickHouse database.

resource "aiven_kafka" "kafka" {
  project                 = var.project_name
  cloud_name              = "google-europe-west1"
  plan                    = "business-4"
  service_name            = "kafka-gcp-eu"
  maintenance_window_dow  = "monday"
  maintenance_window_time = "10:00:00"
}

resource "aiven_kafka_topic" "source" {
  project      = var.project_name
  service_name = aiven_kafka.kafka.service_name
  partitions   = 50
  replication  = 3
  topic_name   = "iot_measurements"
}

resource "aiven_clickhouse" "clickhouse" {
  project                 = var.project_name
  cloud_name              = "google-europe-west1"
  plan                    = "startup-8"
  service_name            = "clickhouse-gcp-eu"
  maintenance_window_dow  = "monday"
  maintenance_window_time = "10:00:00"
}

resource "aiven_service_integration" "clickhouse_kafka_source" {
  project                  = var.project_name
  integration_type         = "clickhouse_kafka"
  source_service_name      = aiven_kafka.kafka.service_name
  destination_service_name = aiven_clickhouse.clickhouse.service_name
  clickhouse_kafka_user_config {
    tables {
      name        = "edge_measurements_raw"
      group_name  = "clickhouse-ingestion"
      data_format = "JSONEachRow"
      columns {
        name = "sensor_id"
        type = "UInt64"
      }
      columns {
        name = "ts"
        type = "DateTime64(6)"
      }
      columns {
        name = "key"
        type = "LowCardinality(String)"
      }
      columns {
        name = "value"
        type = "Float64"
      }
      topics {
        name = aiven_kafka_topic.source.topic_name
      }
    }
  }
}

resource "aiven_clickhouse_database" "measurements" {
  project      = var.project_name
  service_name = aiven_clickhouse.clickhouse.service_name
  name         = "iot_measurements"
}

Execute the Terraform files#

  1. Run the following command:

terraform init

The init command performs initialization operations to prepare the working directory for use with Terraform. For this recipe, init automatically finds, downloads, and installs the necessary Aiven Terraform Provider plugins.

  2. Run the following command:

terraform plan -var-file=var-values.tfvars

The plan command creates an execution plan and shows the resources to be created (or modified). This command doesn’t actually create any resources but gives you a heads-up on what’s going to happen next.

  3. If the output of terraform plan looks as expected, run the following command:

terraform apply -var-file=var-values.tfvars

The terraform apply command creates (or modifies) your infrastructure resources.

Check out the results#

  • Resource aiven_clickhouse creates an Aiven for ClickHouse service with the project name, the cloud name (provider, region, zone), the Aiven service plan, and the service name as specified in the file.

  • Resource aiven_clickhouse_database creates a database that can be used to further transform the ingested data and perform analytics on it.

  • Resource aiven_kafka creates an Aiven for Apache Kafka cluster.

  • Resource aiven_kafka_topic creates Apache Kafka topic iot_measurements.

  • Resource aiven_service_integration creates the integration between the Aiven for Apache Kafka and the Aiven for ClickHouse service.

The service integration creates a database to insert the ingested data into. In this instance, the database name is service_kafka-gcp-eu (derived from the Kafka service name) and the table name is edge_measurements_raw, as specified in the code.

Learn more#

When you use this recipe, parameters and configurations will vary from those used in this article. For Aiven for Apache Kafka and Aiven for ClickHouse advanced parameters, a related blog, and instructions on how to get started with Aiven Terraform Provider, see Set up your first Aiven Terraform project.

Follow up#