The B2B Intent Data Warehouse Architecture Playbook 2026
By Dale Brett, Founder & CEO, FL0. April 2026.
Most B2B intent-data architecture discussion in 2026 still treats the data warehouse as a destination. That assumption is the part most programs get wrong. In the warehouse-native programs that actually close feedback loops between signal and sales action, the warehouse is not where intent data lands; it is where intent data is produced. The distinction matters because it changes almost every downstream decision: how you model events, where identity resolution runs, which vendor fits which slot, how latency is budgeted, and what governance actually looks like when a Gong-meeting signal and a docs-page view need to meet in the same scoring model six seconds later. This playbook is the reference we wish we had when we rebuilt FL0's own warehouse-native intent stack in early 2026, and it traces every architectural claim to a primary source from the vendor or an IEEE/VLDB-class paper where one exists.
Methodology
This playbook covers the architectural substrate layer (Snowflake, Databricks, BigQuery, Redshift), the transformation and modeling layer (dbt, SQL patterns, medallion lakehouse), the ingestion layer (Fivetran, Airbyte, Segment-style event streams), the activation layer (reverse ETL via Hightouch, Census, Polytomic, RudderStack), the identity-resolution layer, the observability layer (Monte Carlo, Elementary), and the vendor data-share patterns (Snowflake Secure Data Sharing, Delta Sharing, BigQuery Analytics Hub, AWS Data Exchange). Every factual claim is traced to a primary source: a vendor documentation page, a peer-reviewed paper, or the vendor's own published material. Vendor self-reported performance numbers are labeled inline as vendor-published. Forrester productivity stats, generic survey numbers, pricing, headcount, funding totals, and G2 review counts are omitted. Anything we found in research but could not verify against a primary source was dropped rather than hedged, and every URL cited below returned HTTP 200 on the day of publication. The goal is a buyer's reference that is defensible in 2027, not a link farm.
What a warehouse-native intent architecture actually is
A warehouse-native intent architecture is one where the cloud data warehouse is the system of record for behavioral, firmographic, and product-usage signals, and where all identity resolution, scoring, and segmentation happens inside the warehouse before the result is pushed back out to operational tools. Snowflake documents the warehouse itself as a fully managed service that separates storage, compute, and cloud services layers so analytic workloads do not contend with ingestion (Snowflake). Google positions BigQuery similarly as a fully managed, serverless analytics platform (Google Cloud). Databricks refers to the same design point differently, calling it a lakehouse, an architecture that combines the low-cost object-store substrate of a data lake with the transactional semantics and schema enforcement of a warehouse (Databricks). The original lakehouse design paper was published at CIDR 2021 and is still the cleanest academic statement of why this substrate exists (CIDR). Amazon Redshift is the fourth substrate in common use, with RA3 nodes supporting cross-cluster data sharing across accounts without copying data (AWS). The warehouse-native pattern treats any of these four as interchangeable at the substrate layer. The vendor choice is real, but the architectural pattern above the substrate is portable.
At FL0 we use a warehouse-native pattern to produce real-time buyer-intent signals for outbound teams, and almost every design decision we have reversed in the last year has been a decision that tried to move work out of the warehouse and into a packaged tool.
Why the warehouse, not the CDP, is the natural home for B2B intent
The traditional customer data platform emerged in B2C around 2013, and its core assertion was that a vendor-owned profile store was the correct home for customer data. In B2B in 2026, that assertion is weak. RudderStack's own documentation lays out the composable, warehouse-native pattern explicitly, where ingestion, identity, and activation are unbundled across tools that share the warehouse as a single source of truth (RudderStack). Census frames the same argument from the other direction, positioning the warehouse as the profile store and reverse ETL as the activation bus (Census). Hightouch calls the pattern "warehouse-native" in its own materials and documents the reverse-ETL primitive at length (Hightouch). The practical consequence is that in 2026 a team can pick a best-of-breed ingestion tool (Fivetran or Airbyte), a best-of-breed transformation layer (dbt), and a best-of-breed activation tool (Hightouch, Census, Polytomic, or RudderStack in warehouse-first mode) without owning a traditional CDP at all.
Segment, the canonical first-generation CDP, was acquired by Twilio in a deal valued around $3.2 billion and now ships as Twilio Segment (Diginomica). mParticle still ships a traditional profile-store CDP (mParticle). Both remain valid choices when the team has deep mobile-SDK needs or a tightly coupled ad-tech workflow that benefits from a vendor-owned profile. For most B2B intent programs we see in 2026, the warehouse does the profile work, and the former CDP becomes a thin event-ingestion layer at most.
The substrate layer: Snowflake, Databricks, BigQuery, Redshift
Snowflake's core architectural claim is that storage and compute are decoupled and elastically scalable per workload, which is why virtual warehouses can be sized per query (Snowflake). For intent workloads this matters because a rolling-window scoring model runs on a different shape of compute than a nightly identity-graph rebuild, and Snowflake lets you bill those independently. Snowflake's resource monitors are the canonical pattern for capping runaway cost on either workload (Snowflake). Warehouse sizing guidance from the vendor is that you scale out for concurrency, up for complexity, and separate warehouses by workload rather than by team (Snowflake).
Databricks frames the same substrate as the lakehouse, with Delta Lake providing ACID transactions, time travel, and schema enforcement on object storage (Databricks). The medallion architecture pattern (bronze-silver-gold) is the canonical Databricks recommendation for progressively cleaning and modeling data, and is directly applicable to intent pipelines where raw clickstream lands in bronze, identity-stitched events land in silver, and scored account-level signals land in gold (Databricks, Databricks Docs).
BigQuery's distinguishing architectural trait is that it is serverless; Google positions it as a platform where you pay per query or per slot reservation and never provision cluster size yourself (Google Cloud). For intent teams on Google Cloud, the practical implication is that cost control happens via slot reservations and query plans, not cluster sizing (Google Cloud).
Redshift is the AWS-native substrate, with RA3 nodes providing managed storage separation and data sharing working across clusters without data movement (AWS). For teams already heavily invested in the AWS stack, Redshift is the lowest-friction substrate for a warehouse-native intent program.
Event schema design: what a good intent event actually looks like
The hardest thing to get right in a warehouse-native intent architecture is the event schema, because every scoring model, segment, and activation is downstream of it. Two schools of thought dominate. The wide-table pattern stores one row per event with dozens of typed columns and a small number of event types, which is cheap to query and easy to index but expensive to evolve. The entity-attribute-value pattern stores one row per event-property pair, which is easy to evolve but expensive to query because every query joins back to the same long table. Segment's track/identify/group spec is the canonical B2B event model and lands in the middle, using a typed schema with a flexible properties JSON payload, which maps cleanly onto Snowflake VARIANT or BigQuery STRUCT types (Segment). Amplitude publishes a similar pattern under the "optimal event tracking" heading that explicitly recommends fewer, higher-signal event types with rich property payloads (Amplitude). When we rebuilt our own schema in early 2026, we moved to a Segment-style spec with VARIANT property columns in Snowflake, and the schema rework produced more stable downstream scoring than any model tweak we had tried before.
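The middle-ground shape is easy to see in miniature. The following Python sketch builds a Segment-style track call with a typed envelope and a schema-flexible properties payload; the event name and property keys are illustrative examples, not part of the spec.

```python
from datetime import datetime, timezone

def make_track_event(event: str, anonymous_id: str, properties: dict) -> dict:
    """Segment-style track event: a small typed envelope plus a flexible
    properties payload that lands in a VARIANT/STRUCT column downstream."""
    return {
        "type": "track",
        "event": event,                      # one of a few high-signal event names
        "anonymousId": anonymous_id,         # stitched to an account later, in-warehouse
        "timestamp": datetime.now(timezone.utc).isoformat(),  # UTC at the edge
        "properties": properties,            # schema-flexible, queried via VARIANT paths
    }

evt = make_track_event("Pricing Page Viewed", "anon-42",
                       {"plan": "enterprise", "seats": 250})
```

The envelope columns stay typed and indexable; only the properties payload is free to evolve, which is what keeps the schema cheap to query and cheap to change at the same time.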
Event governance is the unsolved half of the schema problem. Iteratively, now part of Amplitude, documented the tracking-plan pattern where event schemas are defined and versioned in source control before events are emitted, and the same pattern is now reflected across Amplitude's own tracking-plan tooling (Amplitude). For intent programs where a single schema drift can poison scoring for a week, event governance is not optional.
Identity resolution: deterministic, probabilistic, and where to run it
Identity resolution is the step that turns a stream of anonymous events into account-level intent. Deterministic resolution joins on known identifiers: email, domain from a business email, known Salesforce account ID, or a cookie previously tied to a known email. Probabilistic resolution infers a match from device, IP, timezone, and behavioral fingerprint when no deterministic key is present. Most serious B2B intent stacks in 2026 use deterministic-first, probabilistic-only-when-consented. The canonical reference for UUID-based identifier design is RFC 9562, which formalizes UUIDv7 with a monotonic time-ordered component ideal for event keys (IETF).
Running identity resolution inside the warehouse, rather than inside a packaged CDP, is the dominant warehouse-native pattern. dbt snapshots are the idiomatic way to capture slowly-changing dimensions like account identity over time, preserving history in a type-2 shape that lets you reason about when a contact joined a known account (dbt, Wikipedia). Surrogate keys are the dbt-native primitive for the stitched-identity column itself (dbt), and idempotent transformations are the non-negotiable property of any identity pipeline that must be safe to re-run after a schema change (dbt).
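What a dbt snapshot materializes can be sketched as a type-2 update in Python. Row-shape and column names here follow the generic SCD convention (valid_from/valid_to), not dbt's exact snapshot output:

```python
def scd2_apply(history: list[dict], key: str, new_value: str, now: str) -> list[dict]:
    """Type-2 slowly changing dimension update: close the current row and
    open a new one, preserving when each identity assignment was valid."""
    current = [r for r in history if r["key"] == key and r["valid_to"] is None]
    for row in current:
        if row["value"] == new_value:
            return history            # no change: idempotent on re-run
        row["valid_to"] = now         # close the superseded row
    history.append({"key": key, "value": new_value,
                    "valid_from": now, "valid_to": None})
    return history

h = scd2_apply([], "contact-9", "acct-A", "2026-01-01")
h = scd2_apply(h, "contact-9", "acct-B", "2026-03-01")
```

The early return on an unchanged value is what makes the snapshot safe to re-run, which is the same idempotency property the pipeline needs everywhere else.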
Time-series patterns: deltas, rolling windows, and decay
Intent is a time-series phenomenon, and the warehouse pattern that matters most is incremental modeling. dbt's incremental materialization is the canonical primitive, processing only new or changed rows on each run rather than rebuilding the full table (dbt), and the choice of incremental strategy (append, merge, insert_overwrite) determines whether late-arriving events are handled correctly (dbt). Rolling-window scoring, where an account's intent score decays exponentially over 14 or 30 days, lives naturally on top of these incremental tables. Snowflake's performance guidance on query pruning is directly relevant here: clustering on event_date lets the rolling window skip partitions outside the window rather than scan them (Snowflake).
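The decay calculation itself is small. Here is a Python sketch of a half-life-weighted rolling window; in production this is a SQL expression over the incremental table, and the 14-day half-life and 30-day window are illustrative parameters, not a recommendation:

```python
import math

def decayed_score(events: list[tuple[float, float]], now_days: float,
                  half_life_days: float = 14.0, window_days: float = 30.0) -> float:
    """Rolling-window intent score: each (event_day, weight) pair decays
    exponentially with age, and events outside the window are pruned
    entirely (the same work partition pruning does in the warehouse)."""
    score = 0.0
    for event_day, weight in events:
        age = now_days - event_day
        if 0 <= age <= window_days:
            score += weight * math.exp(-math.log(2) * age / half_life_days)
    return score

# A pricing-page view 14 days old contributes exactly half its base weight.
s = decayed_score([(0.0, 10.0)], now_days=14.0)
```

The hard cutoff at the window edge is what makes clustering on event_date pay off: the query never needs to touch partitions the score cannot draw from.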
The transformation layer: why dbt is the de facto standard
dbt is the analytics-engineering layer on top of the warehouse, and it is now the de facto industry pattern for modeling intent data. The vendor defines dbt as a SQL-first transformation workflow that brings software-engineering practices (tests, version control, documentation, modularity) to analytics code (dbt, dbt). The four dbt primitives that show up in every intent pipeline are models (SQL transformations compiled to warehouse DDL) (dbt), sources (declared warehouse inputs with freshness checks) (dbt), snapshots (type-2 history for slowly-changing dimensions) (dbt), and data tests (assertions run against built models) (dbt). dbt's own "how we structure our dbt projects" guide is the reference staging-intermediate-marts layout that every serious intent stack we have seen in 2026 uses (dbt).
The packages system (via the dbt hub) is where reusable identity-resolution logic usually lives (dbt), and the semantic-layer metrics primitive lets the scoring definitions be governed in a single place rather than duplicated across BI and activation tools (dbt). The documentation primitive ships a lineage graph that is the only reliable way to answer "what breaks if I change this column?" in a mature intent pipeline (dbt).
The ingestion layer: Fivetran, Airbyte, and event streams
Ingestion is the layer where vendor choice comes closest to commoditization. Fivetran and Airbyte are the two dominant connector-catalog vendors, both documented for warehouse-first ingestion (Fivetran, Airbyte). Fivetran's public positioning is fully managed ELT with a large SaaS connector catalog (Fivetran). Airbyte is the open-source alternative, with a published connector-development kit for building custom connectors when the catalog does not cover a niche data source (Airbyte, Airbyte). For event streams, Segment's track/identify specification is still the most widely implemented B2B ingestion contract (Segment), and Twilio Segment remains the default choice when a team wants a managed collector with a warehouse destination (Twilio). RudderStack ships an open-source alternative with a warehouse-first mode, where events are landed directly into the warehouse rather than into a vendor profile store (RudderStack).
Fivetran's 2025 acquisition of Census folded reverse ETL into Fivetran's own catalog, consolidating the ingestion and activation sides under one roof (Fivetran, TechTarget). We track that consolidation but have not let it change our architecture choices, because the underlying warehouse-native primitives are independent of which vendor owns which box.
Streaming ingestion: when batch is not fast enough for intent
Most warehouse-native intent pipelines start batch and stay batch for 80% of the signal volume, because a 15-minute Fivetran pull of Salesforce state is fine for CRM context. The 20% that genuinely needs streaming is the real-time behavioral layer: a high-value account visiting the pricing page should not wait for the next batch cycle to trigger a seller alert. The three streaming patterns that FL0 sees work in the warehouse-native context are Snowflake Snowpipe for continuous micro-batch ingestion (Snowflake, Snowflake), BigQuery's streaming inserts via the Storage Write API (Google Cloud, Google Cloud), and Databricks Auto Loader for event-driven object-store ingestion (Databricks). For CRM and MAP state change, the idiomatic pattern is change-data-capture rather than full-table pulls, via Debezium on Kafka or a managed CDC connector from Fivetran or Airbyte (Confluent, Debezium, Airbyte, Kafka). On AWS, Kinesis Data Firehose is the managed-service counterpart for streaming events directly into S3 and then Redshift (AWS).
The engineering judgment FL0 has landed on is that a pure streaming intent pipeline is rarely worth its operational cost. A hybrid of 5-minute micro-batch on the hot path (pricing-page views, product-qualified events, Gong-meeting signals) plus 15-minute to hourly batch on the warm path (CRM sync, firmographic enrichment) plus nightly batch on the cold path (identity-graph rebuild) is the pattern that has produced the best operational stability for FL0's own internal stack.
The activation layer: reverse ETL as the operational bus
Reverse ETL is the pattern of moving modeled rows out of the warehouse into operational SaaS tools: Salesforce, HubSpot, Marketo, Outreach, Slack, ad platforms. Hightouch documents the primitive as a sync from a warehouse query to a destination object, with idempotent upserts as the correctness property (Hightouch, Hightouch Docs). Census documents the same pattern under the same name (Census, Census Docs). Polytomic ships a similar product with a stronger bias toward operational workflows and bi-directional syncs (Polytomic). RudderStack's warehouse-first mode includes reverse-ETL-style warehouse destinations (RudderStack).
The reason reverse ETL matters for intent specifically is latency budget. A scoring model that updates every five minutes in the warehouse is useless if it takes two hours to reach a seller's Salesforce view. A mature reverse-ETL configuration with sub-minute sync intervals, plus an event-driven webhook path for threshold-crossing alerts, is what closes the loop between signal and seller action.
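The correctness property both Hightouch and Census document, idempotent upserts, is what makes aggressive sync intervals safe. A minimal Python sketch, assuming the destination object is keyed on a stable account id (the dict standing in for a CRM is purely illustrative):

```python
def sync_scores(destination: dict, rows: list[dict]) -> dict:
    """Idempotent upsert keyed on account_id: replaying the same batch after
    a partial failure converges to the same destination state, never to
    duplicate records."""
    for row in rows:
        destination[row["account_id"]] = {"intent_score": row["intent_score"]}
    return destination

crm = {}
batch = [{"account_id": "001-ACME", "intent_score": 87}]
sync_scores(crm, batch)
sync_scores(crm, batch)  # replay after a timeout: state is unchanged
```

Because replay is free, the sync scheduler can retry aggressively on failure instead of reconciling, which is what makes sub-minute intervals operationally sane.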
Vendor data-share patterns: Secure Data Sharing, Delta Sharing, Analytics Hub, Data Exchange
B2B intent data frequently comes from partners, and the warehouse-native way to receive partner data is a no-copy share rather than a file drop. Snowflake's Secure Data Sharing pattern lets a provider grant read access to live data without copying it across accounts, and the receiver queries the provider's storage directly with their own compute (Snowflake, Snowflake Docs, Snowflake). The Snowflake Marketplace is the discovery layer built on top, where providers publish listings and consumers subscribe (Snowflake, Snowflake). The SQL primitive itself is CREATE SHARE (Snowflake).
Databricks ships the same pattern under the name Delta Sharing, and has open-sourced the protocol so non-Databricks receivers can consume it (Databricks, Delta, Databricks Docs). The original launch blog is still the clearest public explainer (Databricks). Unity Catalog is the governance layer that controls who can share what (Databricks, Databricks Docs).
BigQuery's equivalent is Analytics Hub, where publishers expose datasets as listings and subscribers link them directly into their own BigQuery projects (Google Cloud, Google Cloud). AWS Data Exchange is the AWS-native counterpart, brokering both file-based and API-based datasets and documented with a full subscription workflow (AWS, AWS, AWS). Redshift also supports native cross-cluster data sharing without file movement for Redshift-to-Redshift cases (AWS).
Signal sources that actually belong in the warehouse
A warehouse-native intent architecture is only as valuable as the signals flowing into it. The signal sources FL0 sees working hardest in B2B in 2026 fall into four groups, each with a distinct ingestion path. First, owned web and product behavior, collected through an event SDK like Segment's analytics.js or the RudderStack equivalent, landed in the warehouse with a typed schema (Segment, RudderStack). Second, CRM and MAP state, pulled with a managed connector like Fivetran or Airbyte on a 15-minute schedule (Fivetran, Airbyte). Third, partner-shared data, landed via no-copy share using Snowflake Secure Data Sharing, Delta Sharing, Analytics Hub, or AWS Data Exchange depending on the counterparty's substrate (Snowflake, Databricks, Google Cloud, AWS). Fourth, third-party firmographic enrichment, pulled through the same ingestion layer with more aggressive caching because the source data is slow-moving.
The FL0 house rule on signal sources is that every signal must be idempotent on replay, time-stamped in UTC at the edge, and tagged with a consent basis at ingestion time. The first two are engineering properties; the third is a compliance property that is impossible to retrofit. Idempotency in particular is where most intent pipelines break first, and dbt's idempotent-transformation guidance is the canonical remedy (dbt).
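The house rule can be enforced mechanically at the ingestion boundary. A sketch of the validation, with the field names (event_id, consent_basis) as FL0-internal conventions rather than any vendor spec:

```python
REQUIRED = ("event_id", "timestamp", "consent_basis")

def validate_signal(signal: dict) -> list[str]:
    """Enforce the three house rules at ingestion: a stable event_id for
    idempotent replay, a UTC timestamp stamped at the edge, and a consent
    basis that cannot be retrofitted later. Returns a list of violations."""
    errors = [f"missing {field}" for field in REQUIRED if field not in signal]
    ts = signal.get("timestamp", "")
    if ts and not (ts.endswith("Z") or "+00:00" in ts):
        errors.append("timestamp is not UTC")
    return errors

ok = validate_signal({"event_id": "e-1",
                      "timestamp": "2026-04-01T12:00:00Z",
                      "consent_basis": "legitimate_interest"})
bad = validate_signal({"timestamp": "2026-04-01T14:00:00+02:00"})
```

Rejecting at the edge is deliberate: a signal that arrives without a consent basis can be dropped cheaply today, but cannot be legitimized after it has already flowed into a scoring model.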
How warehouse-native scoring models are actually structured
The reason FL0 runs scoring in the warehouse rather than in a vendor ML product is that the scoring logic needs to be governed, versioned, and tested alongside the rest of the analytics codebase. A typical FL0 scoring model is a dbt incremental SQL model that joins the silver-layer event table against the identity-graph snapshot, applies a weighted rolling-window calculation, and writes to a gold-layer account-score table on a five-minute cadence (dbt, Databricks). The model carries dbt data tests that fail the build if the score distribution drifts more than a documented tolerance against the prior run (dbt), and the Elementary package is configured to page the data team if a freshness SLA is missed (Elementary). The gold table is then the single reverse-ETL source of truth for the Salesforce and HubSpot sync rules, which is how FL0 keeps seller-facing scores and marketing-facing scores from diverging.
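The drift test is the piece teams most often skip. In dbt it runs as a data test over the built model; the logic itself is tiny, sketched here in Python with an illustrative 15% tolerance on the mean (real deployments would also compare spread, not just the mean):

```python
def drift_exceeds_tolerance(prev_scores: list[float], new_scores: list[float],
                            tolerance: float = 0.15) -> bool:
    """Fail the build when the mean score shifts more than a documented
    tolerance against the prior run, the shape of a dbt data test asserting
    on the freshly built model."""
    prev_mean = sum(prev_scores) / len(prev_scores)
    new_mean = sum(new_scores) / len(new_scores)
    return abs(new_mean - prev_mean) > tolerance * prev_mean

stable = drift_exceeds_tolerance([10, 20, 30], [11, 19, 31])  # within tolerance
```

Failing the build, rather than alerting after activation, is the design choice that matters: a drifted score that never reaches the gold table never reaches a seller.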
The single most common mistake FL0 sees on scoring models in other teams' stacks is scoring everything in one giant monolithic dbt model, which becomes impossible to debug when a seller asks why an account scored the way it did. The canonical fix is to split the scoring into a staging model per signal type, an intermediate model that aligns all signals on the account grain, and a final mart model that combines them, each with its own tests and documentation (dbt). The analytics-engineering discipline that dbt has built the vocabulary for is, at this point, the single most load-bearing skill set on an intent team (dbt).
Observability: the unsexy layer that keeps intent data honest
Data observability is the monitoring layer for the pipeline itself: schema drift, row-count anomalies, freshness misses, distribution shifts. Monte Carlo is the category-defining vendor and publishes the canonical "five pillars of data observability" framing (Monte Carlo). Elementary ships an open-source alternative that runs inside dbt itself, with anomaly tests and an observability UI that reads from the dbt artifacts (Elementary, Elementary, GitHub). For intent programs specifically, observability matters because a 20% drop in identify-event volume on a Tuesday is usually a broken tracker, not a drop in demand, and the scoring model will absolutely score it as the latter unless the observability layer flags it first.
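The tracker-vs-demand ambiguity is exactly what a row-count anomaly check resolves. A minimal sketch against a trailing baseline; the 20% threshold is illustrative, and Monte Carlo and Elementary ship far more robust seasonal models than this:

```python
def volume_anomaly(daily_counts: list[int], today: int,
                   drop_threshold: float = 0.2) -> bool:
    """Flag a drop in event volume against the trailing-day baseline. A 20%
    Tuesday drop is usually a broken tracker, not a drop in demand, so the
    observability layer must see it before the scoring model does."""
    baseline = sum(daily_counts) / len(daily_counts)
    return today < (1 - drop_threshold) * baseline

flag = volume_anomaly([1000, 980, 1020, 1000], today=700)
```

The check runs upstream of scoring on purpose: once a volume anomaly is flagged, the scoring run can be held rather than allowed to misread instrumentation failure as falling intent.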
Cost control: the warehouse line item that grows faster than revenue
The most common operational failure of a warehouse-native intent program is not architectural but financial. Snowflake's resource-monitor pattern is the baseline cost-control primitive, setting credit quotas per virtual warehouse and suspending on breach (Snowflake). Sizing guidance from Snowflake itself is to scale out for concurrency and up for complexity, not both at once (Snowflake). On BigQuery, the equivalent control is the query plan and slot reservation model (Google Cloud). We have seen more intent-program budgets blown on unbounded rolling-window scoring jobs than on any other category of query, and the fix is almost always a narrower window and a clustered event table rather than a bigger warehouse.
Comparison table: warehouse-native intent architecture components
Sorted alphabetically. Same columns apply to every row. Where a fact is not public, the cell is marked "not public".
| Vendor | Category | Role in warehouse-native intent stack |
| --- | --- | --- |
| Airbyte | Open-source ingestion | Source-to-warehouse connectors, OSS with managed cloud option |
| Amazon Redshift | Cloud data warehouse | AWS-native substrate with cross-cluster data sharing |
| Census | Reverse ETL | Warehouse-to-SaaS activation, acquired by Fivetran in 2025 |
| Databricks | Lakehouse substrate | Delta Lake on object storage, Unity Catalog governance, Delta Sharing |
| dbt | Transformation | SQL-first modeling, tests, lineage, the de facto analytics-engineering layer |
| Elementary | OSS observability | Anomaly detection and dbt-native observability |
| Fivetran | Managed ingestion | Fully managed SaaS ELT with a large connector catalog |
| FL0 | AI revenue engine | Real-time buyer-intent signals produced in-warehouse and delivered to outbound teams |
| Google BigQuery | Cloud data warehouse | Serverless analytics substrate, Analytics Hub for data sharing |
| Hightouch | Reverse ETL | Warehouse-to-SaaS activation with warehouse-native positioning |
| mParticle | Traditional CDP | Vendor-owned profile store for mobile-heavy and ad-tech-heavy workflows |
| Monte Carlo | Data observability | Pipeline monitoring and incident management for warehouse data |
| Polytomic | Reverse ETL | Warehouse-to-SaaS activation with bidirectional sync bias |
| RudderStack | Warehouse-first CDP | Composable CDP with warehouse as source of truth, OSS core |
| Segment (Twilio) | Traditional CDP | Event collector and profile store, acquired by Twilio in 2020 |
| Snowflake | Cloud data warehouse | Separated storage and compute, Secure Data Sharing, Marketplace |
How FL0 approaches warehouse-native intent architecture
FL0 is an AI revenue engine that produces real-time buyer-intent signals for B2B outbound teams. Our internal architecture uses Snowflake as the substrate, dbt as the transformation layer, Fivetran plus a bespoke event-ingestion path for signal sources, Hightouch for reverse-ETL activation into Salesforce and HubSpot, and Elementary for observability on the dbt layer. Identity resolution is deterministic-first with a documented three-tier fallback. The scoring model runs as a dbt incremental model on a 5-minute schedule, with a separate Snowflake virtual warehouse sized for the workload so that the nightly identity rebuild does not contend with it. We publish all of this because the warehouse-native pattern is an open architecture, and our edge is the intent-signal quality on top of it, not the substrate choice. For teams building a similar stack, our single strongest recommendation is to move identity resolution into the warehouse before anything else. Every other box can be swapped; the identity model is what everything else is joined to.
Limitations
This playbook covers the architectural substrate, not the content of an intent-signal dataset. It is deliberately silent on which signals produce the best scoring lift, because that question has a defensible answer only inside a specific funnel. It is silent on pricing for every vendor discussed because list prices are negotiable and stale within a quarter. It is silent on G2-style review counts, headcount, and funding totals because those move faster than this document can be responsibly updated. The lakehouse-versus-warehouse distinction is presented as substantive because the transactional-semantics layer materially affects intent workloads, but the distinction is narrower than it was five years ago and we expect it to continue narrowing (Databricks). Finally, the vendor consolidation pattern in this category (Fivetran acquiring Census, Twilio acquiring Segment, Amplitude acquiring Iteratively) is likely to continue through 2026 and 2027, and any vendor-specific recommendation in this playbook is a recommendation about the category the vendor currently occupies, not a prediction of independent-company survival.
FAQ
What is warehouse-native intent architecture? An architecture where the cloud data warehouse is the system of record for behavioral and firmographic signals, and identity resolution, scoring, and segmentation all happen inside the warehouse before the result is pushed to operational tools via reverse ETL. The pattern is documented by RudderStack, Census, and Hightouch under names like composable CDP and warehouse-native CDP (RudderStack, Census, Hightouch).
How do I model B2B intent data in Snowflake? The dominant pattern is a staging-intermediate-marts dbt project layered on Snowflake, with raw events landing in a staging schema, identity-stitched events in intermediate models, and scored account-level signals in marts. Slowly-changing account identity is captured with dbt snapshots, scoring runs as an incremental model on a short schedule, and a dedicated Snowflake virtual warehouse is sized per workload (dbt, Snowflake).
Is there a canonical intent-data schema reference? There is no single industry-wide schema, but the closest public reference is Segment's track/identify specification, which most B2B event-ingestion pipelines implement or emulate (Segment). Amplitude's "optimal event tracking" guidance is the closest second on event-property modeling (Amplitude).
What is reverse ETL and why does it matter for intent? Reverse ETL is the pattern of moving modeled rows from the warehouse into operational SaaS tools. It matters for intent because the warehouse is where signals are joined and scored, and reverse ETL is what delivers the scored account list to a seller's CRM view at the latency the seller actually needs (Hightouch, Census).
Which substrate should I pick, Snowflake, Databricks, BigQuery, or Redshift? For most B2B intent programs, the substrate choice follows the existing cloud footprint. Snowflake is the most common neutral choice; Databricks suits teams with lakehouse workloads sharing the substrate; BigQuery suits Google-native stacks; Redshift suits AWS-native stacks. The warehouse-native pattern described in this playbook is portable across all four (Snowflake, Databricks, Google Cloud, AWS).
Do I still need a CDP? For most B2B intent programs in 2026, a traditional vendor-owned profile-store CDP is optional. A warehouse plus ingestion plus reverse ETL plus a thin event collector will cover the same use cases. Segment and mParticle remain defensible choices for mobile-heavy or ad-tech-heavy workflows (Twilio, mParticle).
What is the single most common failure mode in warehouse-native intent? Identity resolution that was never moved fully into the warehouse. When identity is split between a vendor profile store and the warehouse, every scoring model produces two different answers and the team spends its time reconciling rather than selling.
Sources
- Snowflake, About Snowflake, https://www.snowflake.com/en/company/overview/about-snowflake/
- Google Cloud, BigQuery documentation introduction, https://cloud.google.com/bigquery/docs/introduction
- Databricks, What is a data lakehouse, https://www.databricks.com/glossary/data-lakehouse
- CIDR 2021, Lakehouse: A New Generation of Open Platforms, https://www.cidrdb.org/cidr2021/papers/cidr2021_paper17.pdf
- AWS, Redshift data sharing overview, https://docs.aws.amazon.com/redshift/latest/dg/datashare-overview.html
- RudderStack, What is a composable CDP, https://www.rudderstack.com/blog/what-is-a-composable-cdp/
- Hightouch, Reverse ETL platform page, https://hightouch.com/platform/reverse-etl
- Census, What is a composable CDP, https://www.getcensus.com/blog/composable-cdp
- dbt, How we structure our dbt projects, https://docs.getdbt.com/best-practices/how-we-structure/1-guide-overview
- Databricks, Delta Sharing product, https://www.databricks.com/product/delta-sharing
- Google Cloud, Analytics Hub introduction, https://cloud.google.com/bigquery/docs/analytics-hub-introduction
- Monte Carlo, What is data observability, https://www.montecarlodata.com/blog-what-is-data-observability/