
Block Data Delivery

For clients who have purchased block row-level data, Carbon Arc provides two access methods:

  1. Iceberg REST Catalog — Query data directly using industry-standard Iceberg table format (Recommended)
  2. Amazon S3 — Direct file access via AWS S3 buckets (Legacy)

Both methods provide access to the same underlying data. We recommend the Iceberg REST Catalog (referred to below as Polaris) for new integrations because it provides a query-ready interface and removes the need to manage file ingestion pipelines.


Iceberg REST Catalog (Recommended)

Overview

Carbon Arc provides access via an Iceberg REST Catalog, allowing you to connect your data platform directly to Carbon Arc's block data warehouse. This is the recommended approach for new clients as it offers:

  • No ETL required — Query tables directly without building ingestion pipelines
  • Always up-to-date — Access the latest data without managing incremental updates
  • Industry standard — Compatible with Snowflake, Databricks, ClickHouse, Spark, Trino, and more
  • Schema evolution — Automatic handling of schema changes

Connection Details

Parameter | Value
Catalog URI | https://bulk.apps.carbonarc.co/api/catalog
Warehouse | bulk
Auth Scope | PRINCIPAL_ROLE:ALL
OAuth Token Endpoint | https://bulk.apps.carbonarc.co/api/catalog/v1/oauth/tokens
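
As a quick way to check credentials outside of any data platform, the token endpoint above can be called directly. The sketch below assumes the endpoint accepts a standard OAuth2 client-credentials form POST and returns JSON with an `access_token` field, as the Iceberg REST catalog specification describes; this page does not confirm the exact request shape, so treat it as illustrative. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Values taken from the Connection Details table.
TOKEN_ENDPOINT = "https://bulk.apps.carbonarc.co/api/catalog/v1/oauth/tokens"
AUTH_SCOPE = "PRINCIPAL_ROLE:ALL"


def build_token_request(client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) a client-credentials token request.

    Assumption: the endpoint expects form-encoded fields named
    grant_type, client_id, client_secret, and scope.
    """
    form = urllib.parse.urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": AUTH_SCOPE,
        }
    ).encode("ascii")
    return urllib.request.Request(
        TOKEN_ENDPOINT,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_access_token(client_id: str, client_secret: str) -> str:
    """Send the request and return the bearer token from the JSON response."""
    request = build_token_request(client_id, client_secret)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["access_token"]
```

Separating the request builder from the network call makes it possible to inspect the request without contacting the service.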

Credentials

Your Client ID and Client Secret will be provided via a secure 1Password link after purchase. Keep these credentials secure and do not share them.
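
One way to keep the credentials out of source code is to read them from environment variables and pass them to an Iceberg client. The following is a minimal sketch using PyIceberg's REST catalog properties (`uri`, `warehouse`, `credential`, `scope`). The environment variable names and the `carbon_arc` catalog name are assumptions chosen for illustration, and PyIceberg is not listed among the supported platforms on this page, so compatibility is assumed rather than confirmed.

```python
import os

# Values taken from the Connection Details table.
CATALOG_URI = "https://bulk.apps.carbonarc.co/api/catalog"
WAREHOUSE = "bulk"
AUTH_SCOPE = "PRINCIPAL_ROLE:ALL"


def catalog_properties(env=os.environ) -> dict:
    """Assemble PyIceberg REST catalog properties from environment variables.

    CARBONARC_CLIENT_ID and CARBONARC_CLIENT_SECRET are hypothetical
    variable names; use whatever your secret manager exposes.
    """
    client_id = env["CARBONARC_CLIENT_ID"]
    client_secret = env["CARBONARC_CLIENT_SECRET"]
    return {
        "type": "rest",
        "uri": CATALOG_URI,
        "warehouse": WAREHOUSE,
        # PyIceberg expects the credential as "client_id:client_secret".
        "credential": f"{client_id}:{client_secret}",
        "scope": AUTH_SCOPE,
    }


def connect(env=os.environ):
    """Open the catalog. Requires the third-party `pyiceberg` package."""
    from pyiceberg.catalog import load_catalog  # imported lazily

    return load_catalog("carbon_arc", **catalog_properties(env))
```

With the variables set, `connect().list_namespaces()` would be a reasonable first call to confirm that the credentials work.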

Platform Connection Guides

Select your data platform below for specific connection instructions:

Snowflake Integration

Step 1: Create Catalog Integration

-- Register Carbon Arc's Iceberg REST catalog with Snowflake.
CREATE OR REPLACE CATALOG INTEGRATION carbon_arc
  CATALOG_SOURCE = POLARIS
  TABLE_FORMAT = ICEBERG
  REST_CONFIG = (
    CATALOG_URI = 'https://bulk.apps.carbonarc.co/api/catalog'
    WAREHOUSE = 'bulk'
    ACCESS_DELEGATION_MODE = VENDED_CREDENTIALS
  )
  REST_AUTHENTICATION = (
    TYPE = OAUTH
    -- Placeholders: substitute the credentials you received.
    OAUTH_CLIENT_ID = '<your_client_id>'
    OAUTH_CLIENT_SECRET = '<your_client_secret>'
    OAUTH_ALLOWED_SCOPES = ('PRINCIPAL_ROLE:ALL')
  )
  ENABLED = TRUE;

Step 2: Create Linked Database

CREATE DATABASE carc
  LINKED_CATALOG = (
    CATALOG = 'carbon_arc'
  );

Step 3: Query Data

Once connected, you can query tables directly:

SELECT * FROM carc.sloth.app_performance_data_daily LIMIT 100;

Note: In Step 1, replace <your_client_id> and <your_client_secret> with the credentials provided via 1Password.


Amazon S3 — Legacy

Legacy Access Method

S3 file delivery is maintained for existing integrations. For new implementations, we recommend the Iceberg REST Catalog (Polaris) instead.

Overview

Block data is delivered to dedicated S3 buckets with a standardized folder structure. Your AWS IAM user or role is granted read-only access to the bucket containing your purchased data assets.

Bucket Access

After purchase, you'll receive:

  • Bucket ARN: The S3 bucket location (e.g., arn:aws:s3:::carc-ext-{dataset})
  • IAM Access: Your AWS principal is granted read access to the bucket
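
Most S3 tooling takes a bucket name rather than an ARN, so the ARN you receive usually needs converting first. The sketch below shows that conversion, plus an optional listing of top-level prefixes with boto3. The function names are hypothetical, and the listing step assumes your default AWS credentials resolve to the principal that was granted access.

```python
S3_ARN_PREFIX = "arn:aws:s3:::"


def bucket_from_arn(arn: str) -> str:
    """Extract the bucket name from an S3 bucket ARN."""
    if not arn.startswith(S3_ARN_PREFIX):
        raise ValueError(f"not an S3 ARN: {arn!r}")
    # Anything after the first "/" would be an object key, not the bucket.
    bucket = arn[len(S3_ARN_PREFIX):].split("/", 1)[0]
    if not bucket:
        raise ValueError(f"ARN has no bucket name: {arn!r}")
    return bucket


def list_top_level_prefixes(bucket: str) -> list:
    """List top-level "folders" in the bucket. Requires the `boto3` package."""
    import boto3  # imported lazily so bucket_from_arn works without it

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    prefixes = []
    for page in paginator.paginate(Bucket=bucket, Delimiter="/"):
        for entry in page.get("CommonPrefixes", []):
            prefixes.append(entry["Prefix"])
    return prefixes
```

For example, `bucket_from_arn("arn:aws:s3:::carc-ext-sloth")` gives the bucket name `carc-ext-sloth`, which can then be passed to `list_top_level_prefixes`.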

Delivery Structure

Data is organized into two delivery patterns:

Incremental Updates

For ongoing data updates, we follow a standardized incremental delivery pattern:

Attribute | Value
Path Structure | {drop_date}/Incremental/[data_files]
Content | Contains only new records received from vendor

Example path:

s3://carc-ext-sloth/20260203/Incremental/sloth_app_performance_data_daily/

Full Reinstatement Deliveries

Complete data reinstatements are delivered when upstream data is updated:

Attribute | Value
Path Structure | {drop_date}/Full/[data_files]
Content | Complete data asset including all historical data ingested to date

Example paths:

s3://carc-ext-sloth/20260129/Full/sloth_app_performance_data_daily/
s3://carc-ext-sloth/20260129/Full/sloth_app_performance_data_monthly/
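
Both delivery patterns share one path layout, so a single parser can classify any delivery path. The sketch below assumes `{drop_date}` is always an eight-digit `YYYYMMDD` value, which is inferred from the example paths rather than stated. The `Delivery` type and function name are hypothetical.

```python
import re
from datetime import date, datetime
from typing import NamedTuple

# s3://<bucket>/<drop_date>/<Incremental|Full>/<asset path>
_DELIVERY_URI = re.compile(
    r"^s3://(?P<bucket>[^/]+)/(?P<drop>\d{8})/(?P<kind>Incremental|Full)/(?P<rest>.*)$"
)


class Delivery(NamedTuple):
    bucket: str
    drop_date: date
    kind: str  # "Incremental" or "Full"
    asset_path: str  # everything after the delivery-type folder


def parse_delivery_path(uri: str) -> Delivery:
    """Split a delivery URI into bucket, drop date, delivery type, and asset path."""
    match = _DELIVERY_URI.match(uri)
    if match is None:
        raise ValueError(f"unrecognized delivery path: {uri!r}")
    # strptime also rejects impossible dates such as 20261340.
    drop_date = datetime.strptime(match["drop"], "%Y%m%d").date()
    return Delivery(match["bucket"], drop_date, match["kind"], match["rest"].rstrip("/"))
```

Parsing the drop date into a `date` object, rather than keeping it as a string, makes later comparisons between Full and Incremental deliveries explicit.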

When is Data Reinstated?

We generally reinstate data on the first Monday of the month if there are changes to the ontology or significant upstream data corrections.
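
If you schedule checks for new Full deliveries, it can help to compute the first Monday of a given month. The helper below is a small sketch using only the standard library. Because the schedule is described as general rather than guaranteed, the computed date is a hint for when to look, not a substitute for checking the bucket.

```python
from datetime import date, timedelta


def first_monday(year: int, month: int) -> date:
    """Return the date of the first Monday in the given month."""
    first = date(year, month, 1)
    # date.weekday() is 0 for Monday, so this is the number of days
    # to advance from the 1st (0 when the 1st is itself a Monday).
    return first + timedelta(days=(7 - first.weekday()) % 7)
```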

Data Consumption Guidelines

Recommended Approach:

  1. Start with Full Reinstatement — Always consume the most recent Full reinstatement as your baseline
  2. Append Incrementals — Apply the Incremental deliveries that occurred after the latest Full refresh
  3. Re-ingest on New Full — When a new Full reinstatement is available, delete your existing ingested data and re-ingest the complete Full reinstatement

Best Practice: Monitor the S3 bucket for new Full directories. When one appears, schedule a complete re-ingestion to ensure data consistency.
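
The consumption steps above can be sketched as a function that takes object keys listed from the bucket and decides what to load. Two assumptions are made here: drop dates are `YYYYMMDD` strings (so they sort chronologically as text), and only Incrementals with a drop date strictly later than the latest Full are applied. How to treat an Incremental that shares a drop date with a Full is not specified, so confirm that case before relying on this logic. All names are hypothetical.

```python
import re
from typing import Iterable, NamedTuple

# Keys are relative to the bucket, e.g. "20260203/Incremental/<table>/<file>".
_KEY_PREFIX = re.compile(r"^(?P<drop>\d{8})/(?P<kind>Full|Incremental)/")


class IngestionPlan(NamedTuple):
    full_drop_date: str  # baseline to load first
    incremental_drop_dates: list  # to append, in chronological order


def plan_ingestion(keys: Iterable[str]) -> IngestionPlan:
    """Pick the latest Full delivery and the Incrementals that follow it."""
    fulls, incrementals = set(), set()
    for key in keys:
        match = _KEY_PREFIX.match(key)
        if match is None:
            continue  # ignore keys outside the delivery layout
        target = fulls if match["kind"] == "Full" else incrementals
        target.add(match["drop"])
    if not fulls:
        raise ValueError("no Full delivery found; cannot establish a baseline")
    latest_full = max(fulls)
    later = sorted(d for d in incrementals if d > latest_full)
    return IngestionPlan(latest_full, later)


def needs_reingest(plan: IngestionPlan, last_ingested_full: str) -> bool:
    """True when the bucket holds a Full delivery other than the one already loaded."""
    return plan.full_drop_date != last_ingested_full
```

Running `plan_ingestion` on each bucket listing and comparing the result against the last baseline you loaded is one simple way to detect that a new Full directory has appeared.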


Available Datasets

The specific tables and feeds available depend on your purchased data package.

Info: Contact your Carbon Arc representative for the complete schema documentation for your purchased data assets.


Support

For questions about block data access or connection issues: