Block Data Delivery

For clients who have purchased block row-level data, Carbon Arc provides two access methods:

Iceberg REST Catalog — Query data directly using industry-standard Iceberg table format (Recommended)
Amazon S3 — Direct file access via AWS S3 buckets (Legacy)

Both methods provide access to the same underlying data. We recommend the Iceberg REST Catalog (Polaris) for new integrations because it provides a query-ready interface and removes the need to manage file ingestion pipelines.

Manage Block access from the SDK

The Carbon Arc Python SDK can discover Block datasets, request trial access, track the approval lifecycle, and register S3 ARNs programmatically. See Block for Devs.

Iceberg REST Catalog (Polaris) — Recommended

Overview

Carbon Arc provides access via an Iceberg REST Catalog, allowing you to connect your data platform directly to Carbon Arc's block data warehouse. This is the recommended approach for new clients as it offers:

No ETL required — Query tables directly without building ingestion pipelines
Always up-to-date — Access the latest data without managing incremental updates
Industry standard — Compatible with Snowflake, Databricks, ClickHouse, Spark, Trino, and more
Schema evolution — Automatic handling of schema changes

Connection Details

Parameter	Value
Catalog URI	`https://bulk.apps.carbonarc.co/api/catalog`
Warehouse	`bulk`
Auth Scope	`PRINCIPAL_ROLE:ALL`
OAuth Token Endpoint	`https://bulk.apps.carbonarc.co/api/catalog/v1/oauth/tokens`

Credentials

Your Client ID and Client Secret will be provided via a secure 1Password link after purchase. Keep these credentials secure and do not share them.
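The connection details and credentials above can be checked outside any data platform. The following is a minimal sketch, not an official client. It builds the token request with only the Python standard library. It assumes the token endpoint accepts the standard OAuth2 client-credentials form exchange used by Iceberg REST catalogs, which this page does not confirm. The function name `build_token_request` is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Values copied from the Connection Details table.
TOKEN_ENDPOINT = "https://bulk.apps.carbonarc.co/api/catalog/v1/oauth/tokens"
AUTH_SCOPE = "PRINCIPAL_ROLE:ALL"


def build_token_request(client_id: str, client_secret: str) -> Request:
    """Build (but do not send) a client-credentials token request.

    Assumption: the endpoint takes a form-encoded POST with grant_type,
    client_id, client_secret and scope fields.
    """
    form = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": AUTH_SCOPE,
        }
    ).encode("ascii")
    return Request(
        TOKEN_ENDPOINT,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send the request. The shape of the response is not documented here, so inspect it before relying on any particular field.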

Platform Connection Guides

Select your data platform below for specific connection instructions:

Snowflake
ClickHouse
Databricks
Apache Spark
Trino / Starburst

Snowflake Integration

Step 1: Create Catalog Integration

CREATE OR REPLACE CATALOG INTEGRATION carbon_arc
  CATALOG_SOURCE = POLARIS
  TABLE_FORMAT = ICEBERG
  REST_CONFIG = (
    CATALOG_URI = 'https://bulk.apps.carbonarc.co/api/catalog'
    WAREHOUSE = 'bulk'
    ACCESS_DELEGATION_MODE = VENDED_CREDENTIALS
  )
  REST_AUTHENTICATION = (
    TYPE = OAUTH
    OAUTH_CLIENT_ID = '<your_client_id>'
    OAUTH_CLIENT_SECRET = '<your_client_secret>'
    OAUTH_ALLOWED_SCOPES = ('PRINCIPAL_ROLE:ALL')
  )
  ENABLED = TRUE;

Step 2: Create Linked Database

CREATE DATABASE carc
  LINKED_CATALOG = (
    CATALOG = 'carbon_arc'
  );

Step 3: Query Data

Once connected, you can query tables directly:

SELECT * FROM carc.sloth.app_performance_data_daily LIMIT 100;

Note: Replace <your_client_id> and <your_client_secret> with the credentials provided via 1Password.

ClickHouse Integration

Step 1: Enable Experimental Feature

SET allow_experimental_database_iceberg = 1;

Step 2: Create Database Connection

CREATE DATABASE carc
ENGINE = DataLakeCatalog('https://bulk.apps.carbonarc.co/api/catalog')
SETTINGS
  catalog_type = 'rest',
  catalog_credential = '<your_client_id>:<your_client_secret>',
  warehouse = 'bulk',
  auth_scope = 'PRINCIPAL_ROLE:ALL',
  oauth_server_uri = 'https://bulk.apps.carbonarc.co/api/catalog/v1/oauth/tokens';

Step 3: Query Data

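-- ClickHouse exposes each catalog table as `namespace.table`, so the name must be quoted with backticks.
SELECT * FROM carc.`sloth.app_performance_data_daily` LIMIT 100;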

Note: Replace <your_client_id> and <your_client_secret> with the credentials provided via 1Password.

Tracking Data Updates with Changelog Tables

Every client-facing data table has a companion changelog table in the same namespace, named {table_name}_changelog. Carbon Arc writes a row to the changelog each time a new partition is written to the data table, so you can drive incremental ingestion from a single audit stream instead of scanning the data table itself.

If you have access to a data table, you automatically have access to its changelog — no additional setup is required.

Schema

Column	Type	Description
`update_id`	`STRING`	Unique identifier for the update event
`event_timestamp`	`TIMESTAMP`	UTC timestamp when the partition was written
`action`	`STRING`	`FULL_REFRESH` (reinstatement) or `INCREMENTAL` (daily drop)
`drop_partition`	`STRING`	The `drop_partition` value written to the data table
`dt`	`DATE`	Date partition column — always include in filters for efficient querying

Example Changelog Tables

Data Table	Changelog Table
`dalmatian.clickstream_data`	`dalmatian.clickstream_data_changelog`
`sloth.app_performance_data_daily`	`sloth.app_performance_data_daily_changelog`
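The naming rule behind these pairs is simple enough to derive in code. The helper below is an illustrative sketch; the function name `changelog_table` is hypothetical and not part of any Carbon Arc SDK.

```python
def changelog_table(data_table: str) -> str:
    """Return the changelog companion of a 'namespace.table' identifier."""
    namespace, sep, table = data_table.rpartition(".")
    if not sep or not namespace or not table:
        raise ValueError(f"expected 'namespace.table', got {data_table!r}")
    # The changelog lives in the same namespace with a '_changelog' suffix.
    return f"{namespace}.{table}_changelog"
```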

Recommended Ingestion Workflow

Persist a cursor — track the maximum event_timestamp you have processed so far.
Poll the changelog — on each run, read new rows where event_timestamp > <cursor>, filtered on dt for partition pruning.
Handle INCREMENTAL rows — re-read only the listed drop_partition values from the data table and merge them into your downstream store.
Handle FULL_REFRESH rows — the upstream vendor data was fully reinstated. Truncate your local copy of the table and re-ingest it from scratch.
Advance the cursor — persist the new maximum event_timestamp once ingestion succeeds.
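The decision logic in the steps above can be sketched as a pure function over changelog rows that have already been fetched. This is a hedged illustration, not Carbon Arc's implementation. The names `ChangelogRow`, `IngestionPlan`, and `plan_ingestion` are hypothetical. It also assumes that when a `FULL_REFRESH` appears among the new rows, re-ingesting the whole table covers every partition written so far, so individual partitions need not be listed.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class ChangelogRow:
    update_id: str
    event_timestamp: datetime
    action: str  # "FULL_REFRESH" or "INCREMENTAL"
    drop_partition: str


@dataclass
class IngestionPlan:
    truncate_first: bool            # True: wipe the local copy and reload everything
    partitions: List[str]           # drop_partition values to re-read and merge
    new_cursor: Optional[datetime]  # persist only after ingestion succeeds


def plan_ingestion(rows: Iterable[ChangelogRow],
                   cursor: Optional[datetime]) -> IngestionPlan:
    """Turn changelog rows newer than the cursor into an ingestion plan."""
    new_rows = sorted(
        (r for r in rows if cursor is None or r.event_timestamp > cursor),
        key=lambda r: r.event_timestamp,
    )
    if not new_rows:
        return IngestionPlan(False, [], cursor)

    new_cursor = new_rows[-1].event_timestamp
    if any(r.action == "FULL_REFRESH" for r in new_rows):
        # Assumption: a full reload supersedes every pending incremental.
        return IngestionPlan(True, [], new_cursor)

    partitions: List[str] = []
    for r in new_rows:
        if r.action == "INCREMENTAL" and r.drop_partition not in partitions:
            partitions.append(r.drop_partition)
    return IngestionPlan(False, partitions, new_cursor)
```

Returning the new cursor rather than storing it keeps the last step explicit: the caller writes it only once the merge or reload has completed.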

Example Queries

Fetch all updates since your cursor:

SELECT update_id, event_timestamp, action, drop_partition
FROM carc.sloth.app_performance_data_daily_changelog
WHERE dt >= DATE '2026-04-01'
  AND event_timestamp > TIMESTAMP '2026-04-14 00:00:00'
ORDER BY event_timestamp;
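In a scheduled job, the literal dates in the query above would come from the persisted cursor. The sketch below renders the same query from a cursor value. This page does not state how `dt` relates to `event_timestamp`, so the `lookback_days` margin is an assumption that you should tune. The function name `changelog_poll_query` is hypothetical.

```python
from datetime import datetime, timedelta


def changelog_poll_query(changelog_table: str,
                         cursor: datetime,
                         lookback_days: int = 14) -> str:
    """Render the polling query for rows newer than `cursor`.

    `changelog_table` is interpolated directly, so pass only trusted,
    hard-coded identifiers. Sub-second precision is dropped from the
    cursor, which can re-read a row but will not skip one.
    """
    # Assumption: a dt floor this far behind the cursor still covers all new rows.
    dt_floor = (cursor - timedelta(days=lookback_days)).date()
    return (
        "SELECT update_id, event_timestamp, action, drop_partition\n"
        f"FROM {changelog_table}\n"
        f"WHERE dt >= DATE '{dt_floor.isoformat()}'\n"
        f"  AND event_timestamp > TIMESTAMP '{cursor:%Y-%m-%d %H:%M:%S}'\n"
        "ORDER BY event_timestamp;"
    )
```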

Find the most recent full refresh (use this as a hard reset point):

SELECT MAX(event_timestamp) AS last_full_refresh
FROM carc.sloth.app_performance_data_daily_changelog
WHERE dt >= DATE '2026-01-01'
  AND action = 'FULL_REFRESH';

List every incremental partition written since the last full refresh:

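SELECT drop_partition, MIN(event_timestamp) AS written_at
FROM carc.sloth.app_performance_data_daily_changelog
WHERE dt >= DATE '2026-01-01'
  AND action = 'INCREMENTAL'
  -- COALESCE keeps the filter valid when no full refresh exists in the window;
  -- without it the comparison is against NULL and no rows are returned.
  AND event_timestamp > COALESCE((
    SELECT MAX(event_timestamp)
    FROM carc.sloth.app_performance_data_daily_changelog
    WHERE dt >= DATE '2026-01-01'
      AND action = 'FULL_REFRESH'
  ), TIMESTAMP '1970-01-01 00:00:00')
GROUP BY drop_partition;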

Tip: A single update event can produce multiple changelog rows, one per drop_partition written. Group by update_id if you need to collapse them back into a single event.
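If the rows are already in application memory, the same collapse can be done client-side. This is a small sketch, assuming each row has been reduced to an `(update_id, drop_partition)` pair. The function name `partitions_by_update` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def partitions_by_update(rows: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group drop_partition values under the update_id that wrote them."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for update_id, drop_partition in rows:
        grouped[update_id].append(drop_partition)
    return dict(grouped)
```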

Amazon S3 — Legacy

Legacy Access Method

S3 file delivery is maintained for existing integrations. For new implementations, we recommend the Iceberg REST Catalog (Polaris) instead.

Overview

Block data is delivered to dedicated S3 buckets with a standardized folder structure. Your AWS IAM user or role is granted read-only access to the bucket containing your purchased data assets.

Bucket Access

After purchase, you'll receive:

Bucket ARN: The S3 bucket location (e.g., arn:aws:s3:::carc-ext-{dataset})
IAM Access: Your AWS principal is granted read access to the bucket

Delivery Structure

Data is organized into two delivery patterns:

Incremental Updates

For ongoing data updates, we follow a standardized incremental delivery pattern:

Attribute	Value
Path Structure	`{drop_date}/Incremental/{table_directory}/[data_files]`
Content	Contains only new records received from vendor

Example path:

s3://carc-ext-sloth/20260203/Incremental/sloth_app_performance_data_daily/

Full Reinstatement Deliveries

Complete data reinstatements are delivered when the ontology changes or upstream data is significantly corrected:

Attribute	Value
Path Structure	`{drop_date}/Full/{table_directory}/[data_files]`
Content	Complete data asset including all historical data ingested to date

Example paths:

s3://carc-ext-sloth/20260129/Full/sloth_app_performance_data_daily/
s3://carc-ext-sloth/20260129/Full/sloth_app_performance_data_monthly/
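Both delivery patterns share one layout, so a single parser can classify any delivery prefix. The sketch below infers from the example paths that the drop date is formatted as `YYYYMMDD` and that a per-table directory follows the delivery type. Neither detail is stated explicitly on this page. The names `Delivery` and `parse_delivery` are hypothetical.

```python
import re
from datetime import date, datetime
from typing import NamedTuple


class Delivery(NamedTuple):
    bucket: str
    drop_date: date
    kind: str       # "Incremental" or "Full"
    table_dir: str


# Assumption: s3://<bucket>/<YYYYMMDD>/<Incremental|Full>/<table directory>/...
_DELIVERY_RE = re.compile(
    r"^s3://(?P<bucket>[^/]+)/(?P<drop>\d{8})/"
    r"(?P<kind>Incremental|Full)/(?P<table>[^/]+)(?:/|$)"
)


def parse_delivery(uri: str) -> Delivery:
    """Split a delivery URI into bucket, drop date, delivery type and table directory."""
    match = _DELIVERY_RE.match(uri)
    if match is None:
        raise ValueError(f"not a recognised delivery path: {uri!r}")
    drop_date = datetime.strptime(match["drop"], "%Y%m%d").date()
    return Delivery(match["bucket"], drop_date, match["kind"], match["table"])
```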

When is Data Reinstated?

We generally reinstate data on the first Monday of the month if there are changes to the ontology or significant upstream data corrections.
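A pipeline that wants to check for a new Full delivery around that date can compute it directly. This is only a scheduling aid. The page says reinstatements "generally" land on the first Monday, so the result is a hint rather than a guarantee. The function name `first_monday` is hypothetical.

```python
from datetime import date, timedelta


def first_monday(year: int, month: int) -> date:
    """Return the first Monday of the given month."""
    first = date(year, month, 1)
    # date.weekday() is 0 for Monday, so this is the gap to the next Monday (0 if already Monday).
    return first + timedelta(days=(7 - first.weekday()) % 7)
```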

Data Consumption Guidelines

Recommended Approach:

Start with Full Reinstatement — Always consume the most recent Full reinstatement as your baseline
Append Incrementals — Apply the Incremental deliveries that occurred after the latest Full refresh
Re-ingest on New Full — When a new Full reinstatement is available, delete your existing ingested data and re-ingest the complete Full reinstatement
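The selection rule in these steps can be sketched as a function over the deliveries found in the bucket. It is an illustration under stated assumptions. Deliveries are represented as `(drop_date, kind)` pairs. An Incremental with the same drop date as the latest Full is treated as already included in that Full, which this page does not specify. The function name `select_deliveries` is hypothetical.

```python
from datetime import date
from typing import Iterable, List, Tuple

DeliveryKey = Tuple[date, str]  # (drop_date, "Full" or "Incremental")


def select_deliveries(deliveries: Iterable[DeliveryKey]) -> List[DeliveryKey]:
    """Return the latest Full followed by the Incrementals dropped after it."""
    items = sorted(set(deliveries))
    full_dates = [drop for drop, kind in items if kind == "Full"]
    if not full_dates:
        raise ValueError("no Full reinstatement found to use as a baseline")
    baseline = max(full_dates)
    # Assumption: only Incrementals strictly after the baseline date are needed.
    later = [(drop, kind) for drop, kind in items
             if kind == "Incremental" and drop > baseline]
    return [(baseline, "Full")] + later
```

Re-running the function after each bucket listing also covers the third step: when a newer Full appears, the returned baseline changes, which signals that the local copy should be dropped and rebuilt.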

Best Practice

Monitor the S3 bucket for new Full directories. When one appears, schedule a complete re-ingestion to ensure data consistency.

Available Datasets

The specific tables and feeds available depend on your purchased data package.

Info: Contact your Carbon Arc representative for the complete schema documentation for your purchased data assets.

Support

For questions about block data access or connection issues:

Email: support@carbonarc.ai
