
Library API

The Library provides programmatic access to explore all available data assets in the Carbon Arc platform. Use these APIs to discover data assets, understand their structure, preview sample data, and track changes.


Available Methods

| Method | Description |
| --- | --- |
| client.data.get_datasets() | List all available datasets (paginated) |
| client.data.get_dataset_information() | Get details for a specific dataset |
| client.data.get_data_dictionary() | Get column definitions and metadata |
| client.data.get_data_sample() | Preview sample data rows |
| client.data.get_library_version_changes() | Check for updates and changes |

Quick Start

from carbonarc import CarbonArcClient

# Initialize the client
client = CarbonArcClient(
    host="https://api.carbonarc.co",
    token="YOUR_API_TOKEN"
)

List All Datasets

Retrieve a paginated list of all available data assets in the Carbon Arc library.

response = client.data.get_datasets()

Response Structure

{
  "page": 1,
  "size": 25,
  "total_pages": 3,
  "datasources": [
    {
      "dataset_id": ["CA0056"],
      "dataset_name": "Credit Card – US Complete Panel",
      "description": "US credit card transaction data...",
      "provider_name": "Facteus",
      "last_updated_timestamp": "2026-02-18 21:45:08",
      "blocked": false,
      "is_current": true,
      "data": {
        "Topics": ["Core Panel", "by Payment Method"],
        "Key Metrics": ["Credit Card Spend", "Credit Card Transactions", "Credit Card Users"],
        "Coverage": {"Product Brands": "2k+", "Retailers": "1.6k+"}
      }
    }
  ]
}

Example: Display All Datasets

import pandas as pd

response = client.data.get_datasets()

# Extract datasets from response
datasets = response.get('datasources', [])

print(f"Total Datasets: {len(datasets)}")
print(f"Page {response.get('page')} of {response.get('total_pages')}")

# Create a summary DataFrame
df = pd.DataFrame([{
    'dataset_id': ds.get('dataset_id', ['N/A'])[0] if isinstance(ds.get('dataset_id'), list) else ds.get('dataset_id'),
    'dataset_name': ds.get('dataset_name'),
    # Guard against a missing or null description before slicing
    'description': (ds.get('description') or '')[:80] + '...'
} for ds in datasets])

print(df.to_string(index=False))
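
Because the listing is paginated, a single call returns only one page. The sketch below shows one way to walk every page using the page and total_pages fields from the response above. The helper name iter_datasets and the fetch_page callable are hypothetical, and fake_fetch is a stand-in for the API so the example runs offline.

```python
from typing import Any, Callable, Dict, Iterator


def iter_datasets(fetch_page: Callable[[int], Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield every dataset across all pages.

    fetch_page(page_number) is expected to return a dict shaped like the
    response above: "page", "total_pages", and "datasources".
    """
    page = 1
    while True:
        response = fetch_page(page)
        yield from response.get("datasources", [])
        total_pages = response.get("total_pages") or 1
        if page >= total_pages:
            break
        page += 1


# Stand-in for the real API (hypothetical data): three pages of two datasets each.
def fake_fetch(page: int) -> Dict[str, Any]:
    return {
        "page": page,
        "size": 2,
        "total_pages": 3,
        "datasources": [{"dataset_id": [f"DS{page}-{i}"]} for i in range(2)],
    }


all_datasets = list(iter_datasets(fake_fetch))
print(f"Collected {len(all_datasets)} datasets")
```

With a real client, fetch_page could be something like lambda p: client.data.get_datasets(page=p), assuming get_datasets() accepts a page argument. That parameter is not documented on this page, so check the method signature before relying on it.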

Get Dataset Information

Retrieve detailed information about a specific data asset, including available topics.

dataset_info = client.data.get_dataset_information(dataset_id="CA0056")

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| dataset_id | string | Yes | The unique dataset identifier (e.g., "CA0056") |

Response Structure

{
  "dataset_id": ["CA0056"],
  "dataset_name": "Credit Card – US Complete Panel",
  "description": "US credit card transaction data from 9 provider sources...",
  "entity_topics": [
    {
      "entity_topic_id": 1,
      "entity_topic_label": "Core Panel"
    },
    {
      "entity_topic_id": 2,
      "entity_topic_label": "by Payment Method"
    }
  ]
}

Example: Explore Dataset Details

dataset_info = client.data.get_dataset_information(dataset_id="CA0056")

# Handle dataset_id that might be returned as a list
ds_id = dataset_info.get('dataset_id', 'N/A')
if isinstance(ds_id, list):
    ds_id = ds_id[0] if ds_id else 'N/A'

print(f"Dataset ID: {ds_id}")
print(f"Name: {dataset_info.get('dataset_name', 'N/A')}")
print(f"Description: {dataset_info.get('description', 'N/A')}")

# List available topics
topics = dataset_info.get('entity_topics', dataset_info.get('topics', []))
if topics:
    print(f"\nAvailable Topics ({len(topics)}):")
    for topic in topics:
        topic_id = topic.get('entity_topic_id', topic.get('id', 'N/A'))
        topic_name = topic.get('entity_topic_label', topic.get('name', 'N/A'))
        print(f" • {topic_name} (entity_topic_id: {topic_id})")

Tip: The entity_topic_id values returned here can be used to filter results in get_data_dictionary() and get_data_sample().
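
As a small illustration of that hand-off, the sketch below looks up an entity_topic_id by its label in a response shaped like the one above. The helper name topic_id_for_label is hypothetical, and the case-insensitive match is a convenience choice rather than documented API behavior.

```python
from typing import Any, Dict, Optional


def topic_id_for_label(dataset_info: Dict[str, Any], label: str) -> Optional[int]:
    """Return the entity_topic_id whose label matches (case-insensitive), or None."""
    wanted = label.lower()
    for topic in dataset_info.get("entity_topics", []):
        if str(topic.get("entity_topic_label", "")).lower() == wanted:
            return topic.get("entity_topic_id")
    return None


# Sample data copied from the response structure above.
dataset_info = {
    "dataset_id": ["CA0056"],
    "entity_topics": [
        {"entity_topic_id": 1, "entity_topic_label": "Core Panel"},
        {"entity_topic_id": 2, "entity_topic_label": "by Payment Method"},
    ],
}

topic_id = topic_id_for_label(dataset_info, "by payment method")
print(topic_id)
```

The returned value can then be passed as entity_topic_id to get_data_dictionary() or get_data_sample().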


Get Data Dictionary

Retrieve column definitions and metadata for a data asset. Optionally filter by topic.

# Get dictionary for entire dataset
data_dict = client.data.get_data_dictionary(dataset_id="CA0056")

# Filter by specific topic
data_dict = client.data.get_data_dictionary(
    dataset_id="CA0056",
    entity_topic_id=1
)

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| dataset_id | string | Yes | The unique dataset identifier |
| entity_topic_id | integer | No | Filter to a specific topic (from get_dataset_information()) |

Response Structure

The response is typically a list of column definitions:

[
  {
    "column_name": "date",
    "data_type": "DATE",
    "description": "Transaction date"
  },
  {
    "column_name": "brand_name",
    "data_type": "STRING",
    "description": "Name of the brand"
  },
  {
    "column_name": "spend",
    "data_type": "FLOAT",
    "description": "Total dollar value of purchases"
  }
]

Example: Display Column Definitions

import pandas as pd

data_dict = client.data.get_data_dictionary(dataset_id="CA0056")

if isinstance(data_dict, list) and data_dict:
    df = pd.DataFrame(data_dict)
    print(f"Total Columns: {len(df)}")

    # Display key columns
    display_cols = ['column_name', 'data_type', 'description']
    available = [c for c in display_cols if c in df.columns]
    if available:
        print(df[available].to_string(index=False))

Get Data Sample

Preview sample data rows from a data asset. Useful for understanding data structure before building frameworks.

# Get sample for entire dataset
sample = client.data.get_data_sample(dataset_id="CA0056")

# Filter by specific topic
sample = client.data.get_data_sample(
    dataset_id="CA0056",
    entity_topic_id=1
)

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| dataset_id | string | Yes | The unique dataset identifier |
| entity_topic_id | integer | No | Filter to a specific topic |

Response Structure

{
  "dataset_id": "CA0056",
  "samples": [
    {
      "date": "2025-11-04",
      "brand_id": 56290,
      "brand_name": "Under Armour",
      "company_id": 65116,
      "company_name": "Under Armour, Inc",
      "ticker_id": 1234,
      "ticker_name": "UAA",
      "spend": 125.50,
      "transactions": 3,
      "users": 2
    }
  ]
}

Example: Preview Sample Data

import pandas as pd

sample_response = client.data.get_data_sample(dataset_id="CA0056")

# Extract samples from nested response
if isinstance(sample_response, dict) and 'samples' in sample_response:
    samples = sample_response['samples']

    if isinstance(samples, list) and samples:
        df = pd.DataFrame(samples)
        print(f"Sample Size: {len(df)} rows")
        print(f"Columns: {list(df.columns)}")
        print(df.head(10))
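
The data dictionary and the sample rows can also be combined: the sketch below uses the data_type of each column to cast sample values into Python types. The CASTERS mapping covers only the three type names shown in the data dictionary example (DATE, STRING, FLOAT) and is an assumption; the real set of type names may be larger. The helper cast_rows is hypothetical, and columns without a known type are left unchanged.

```python
from datetime import date
from typing import Any, Callable, Dict, List

# Assumed mapping from dictionary type names to Python converters.
CASTERS: Dict[str, Callable[[Any], Any]] = {
    "DATE": date.fromisoformat,
    "STRING": str,
    "FLOAT": float,
}


def cast_rows(rows: List[Dict[str, Any]], data_dict: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return copies of rows with values cast according to the data dictionary."""
    casters: Dict[str, Callable[[Any], Any]] = {}
    for col in data_dict:
        caster = CASTERS.get(col.get("data_type"))
        if caster is not None and "column_name" in col:
            casters[col["column_name"]] = caster

    typed_rows = []
    for row in rows:
        typed_rows.append({
            key: (casters[key](value) if key in casters and value is not None else value)
            for key, value in row.items()
        })
    return typed_rows


# Sample data taken from the response structures above.
data_dict = [
    {"column_name": "date", "data_type": "DATE"},
    {"column_name": "brand_name", "data_type": "STRING"},
    {"column_name": "spend", "data_type": "FLOAT"},
]
samples = [{"date": "2025-11-04", "brand_id": 56290, "brand_name": "Under Armour", "spend": 125.50}]

typed = cast_rows(samples, data_dict)
print(typed[0]["date"].year, typed[0]["spend"])
```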

Check Library Version Changes

Track updates and changes to the library. Useful for monitoring when entities or data assets are added, modified, or deprecated.

# Get latest version changes
changes = client.data.get_library_version_changes(version="latest")

# With filters and pagination
changes = client.data.get_library_version_changes(
    version="latest",
    dataset_id="CA0056",  # Optional: filter to specific dataset
    page=1,
    size=100
)

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| version | string | Yes | Version to check (e.g., "latest", "2026.1.1") |
| dataset_id | string | No | Filter to a specific dataset |
| topic_id | integer | No | Filter to a specific topic |
| entity_representation | string | No | Filter by entity representation (e.g., "company", "ticker") |
| page | integer | No | Page number for pagination |
| size | integer | No | Number of results per page |
| order | string | No | Sort direction ("asc" or "desc") |

Response Structure

{
  "total": 20278,
  "page": 1,
  "size": 25,
  "pages": 812,
  "entities": [
    {
      "entity_id": "12345",
      "entity_name": "Taylor Swift",
      "entity_representation_name": "entertainer",
      "entity_topic_id": 1,
      "entity_topic_label": "Core Panel",
      "dataset_id": "CA0010",
      "status": "Entity Added",
      "prev_ontology_version": "2026.1.1",
      "current_ontology_version": "2026.1.2",
      "version_release_date": "2026-01-29T22:36:26"
    }
  ]
}

Example: Display Recent Changes

changes = client.data.get_library_version_changes(version="latest")

total = changes.get('total', 0)
page = changes.get('page', 1)
pages = changes.get('pages', 1)

print(f"Total Changes: {total}")
print(f"Page {page} of {pages}")

entities = changes.get('entities', [])
if entities:
    print(f"\n{'Status':<20} | {'Entity Name':<30} | {'Dataset':<10} | {'Topic'}")
    print("-" * 80)

    for item in entities[:15]:
        status = item.get('status', 'N/A')
        entity_name = item.get('entity_name', 'N/A')[:30]
        dataset_id = item.get('dataset_id', 'N/A')
        topic = item.get('entity_topic_label', '')[:25]
        print(f"{status:<20} | {entity_name:<30} | {dataset_id:<10} | {topic}")

    # Show version info
    if 'current_ontology_version' in entities[0]:
        print(f"\nCurrent Version: {entities[0].get('current_ontology_version')}")
        print(f"Release Date: {entities[0].get('version_release_date', 'N/A')}")
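
When a release contains thousands of changes, a tally is often easier to read than a row-by-row listing. The sketch below counts entries by status and by dataset_id using the fields from the response above. The helper name count_by is hypothetical. Only the "Entity Added" status appears in the documented response; the "Entity Removed" value in the sample data is a made-up placeholder to show a second bucket.

```python
from collections import Counter
from typing import Any, Dict, List


def count_by(entities: List[Dict[str, Any]], field: str) -> Counter:
    """Count change records by the given field, bucketing missing values as 'Unknown'."""
    return Counter(item.get(field) or "Unknown" for item in entities)


# Illustrative records shaped like the "entities" list above.
entities = [
    {"entity_name": "Taylor Swift", "dataset_id": "CA0010", "status": "Entity Added"},
    {"entity_name": "Example Entity A", "dataset_id": "CA0010", "status": "Entity Added"},
    # Hypothetical status value, used only for illustration.
    {"entity_name": "Example Entity B", "dataset_id": "CA0056", "status": "Entity Removed"},
]

status_counts = count_by(entities, "status")
dataset_counts = count_by(entities, "dataset_id")

for status, count in status_counts.most_common():
    print(f"{status:<20} {count}")
```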

Complete Workflow Example

Here's a complete workflow demonstrating how to explore a data asset from discovery to data preview:

import pandas as pd
from carbonarc import CarbonArcClient

# Initialize client
client = CarbonArcClient(
    host="https://api.carbonarc.co",
    token="YOUR_API_TOKEN"
)

TARGET_DATASET_ID = "CA0056"

# ─────────────────────────────────────────────────────────────────
# Step 1: Get dataset information and available topics
# ─────────────────────────────────────────────────────────────────
print("STEP 1: Dataset Information")
info = client.data.get_dataset_information(dataset_id=TARGET_DATASET_ID)

print(f"Name: {info.get('dataset_name')}")
# Guard against a missing or null description before slicing
print(f"Description: {(info.get('description') or '')[:100]}...")

topics = info.get('entity_topics', info.get('topics', []))
print(f"\nAvailable Topics ({len(topics)}):")
for topic in topics:
    print(f" • {topic.get('entity_topic_label')} (ID: {topic.get('entity_topic_id')})")

# ─────────────────────────────────────────────────────────────────
# Step 2: Get data dictionary to understand columns
# ─────────────────────────────────────────────────────────────────
print("\nSTEP 2: Data Dictionary")

# Use first topic if available
if topics:
    first_topic_id = topics[0].get('entity_topic_id')
    data_dict = client.data.get_data_dictionary(
        dataset_id=TARGET_DATASET_ID,
        entity_topic_id=first_topic_id
    )
else:
    data_dict = client.data.get_data_dictionary(dataset_id=TARGET_DATASET_ID)

if isinstance(data_dict, list):
    print(f"Columns found: {len(data_dict)}")
    for col in data_dict[:10]:
        print(f" • {col.get('column_name')}: {col.get('data_type')}")

# ─────────────────────────────────────────────────────────────────
# Step 3: Preview sample data
# ─────────────────────────────────────────────────────────────────
print("\nSTEP 3: Sample Data Preview")

sample = client.data.get_data_sample(dataset_id=TARGET_DATASET_ID)

if isinstance(sample, dict) and 'samples' in sample:
    samples = sample['samples']
    if isinstance(samples, list) and samples:
        df = pd.DataFrame(samples)
        print(f"Sample rows: {len(df)}")
        print(f"Columns: {list(df.columns)[:8]}...")
        print(df.head(5))

print("\nExploration complete!")

Common Dataset IDs

Here are some commonly used dataset IDs for reference. To get the full list of all available data assets, use client.data.get_datasets().

| Dataset ID | Name | Type |
| --- | --- | --- |
| CA0056 | Credit Card – US Complete Panel | Wallet |
| CA0028 | Credit Card – US Detailed Panel | Wallet |
| CA0029 | POS - Convenience Stores | Wallet |
| CA0034 | POS - Instore and Online | Wallet |
| CA0030 | Clickstream | Attention |
| CA0013 | Mobile App | Attention |
| CA0054 | App Intelligence | Attention |
| CA009 | Digital Advertising | Attention |
| CA0049 | Medical & Pharmacy Open Claims | Balance Sheet |
| CA0041 | Medicare Claims & Commercial Price Transparency | Balance Sheet |
| CA0040 | Trade Claims | Logistics |
| CA0025 | Freight Volume - North America | Logistics |

Error Handling

Always wrap API calls in try-except blocks to handle potential errors gracefully:

try:
    dataset_info = client.data.get_dataset_information(dataset_id="CA0056")
    print(f"Dataset: {dataset_info.get('dataset_name')}")
except Exception as e:
    print(f"Error retrieving dataset information: {e}")
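
For transient failures such as network hiccups, one option is to retry a call a few times before giving up. The sketch below is a generic wrapper, not part of the carbonarc client: call_with_retry is a hypothetical helper, and it catches Exception broadly because this page does not list the exception types the client raises. The flaky_lookup function is a stand-in that fails twice so the example runs offline.

```python
import time
from typing import Any, Callable


def call_with_retry(func: Callable[..., Any], *args: Any,
                    attempts: int = 3, delay_seconds: float = 1.0, **kwargs: Any) -> Any:
    """Call func, retrying on any exception with a linearly growing pause."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return func(*args, **kwargs)
        except Exception as exc:  # Broad on purpose: client exception types are not documented here
            last_error = exc
            if attempt < attempts:
                time.sleep(delay_seconds * attempt)
    raise RuntimeError(f"Call failed after {attempts} attempts") from last_error


# Stand-in for an API call that fails twice, then succeeds.
calls = {"count": 0}


def flaky_lookup(dataset_id: str) -> dict:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return {"dataset_id": dataset_id, "dataset_name": "Example"}


result = call_with_retry(flaky_lookup, "CA0056", attempts=3, delay_seconds=0)
print(result["dataset_id"], calls["count"])
```

With a real client, the call might look like call_with_retry(client.data.get_dataset_information, dataset_id="CA0056"). Retrying only makes sense for errors that can resolve on their own; an invalid dataset ID or token will fail on every attempt.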

Next Steps

Once you've explored the Library and identified the data assets you need:

  1. Use the Ontology API to search for specific entities within data assets
  2. Build Frameworks to query and purchase data
  3. Set up Scheduled Deliveries for recurring data needs