Why We Built a Real-Time Integration That's 900% More Powerful Than Containers

At Cognitiv, we run large, sophisticated deep learning models in real time to drive advertising performance. Not rules engines. Not logistic regression. Full-scale neural networks with large input signals, executing inference on billions of impressions—in under 10 milliseconds.

This is not possible using any of the current containerized solutions. 

That's not a knock on containerized solutions. They serve a purpose, as I'll explain below. Our solution is highly sophisticated and requires advanced engineering. But when your mission is to bring genuine intelligence to every impression, everywhere it runs, you need infrastructure that can keep up with the ambition of your models. So we built our own.

We Needed More

We wanted to do more than what is possible with containers. We wanted our contextual product to go beyond computing a simple embedding of the URL: we wanted a deep understanding of the content, with the ability to layer models on top of that. That is impossible within the constraints of containerized environments, and container solutions also require a new deployment, which can take hours or days, for every change.

That’s why we built a system with 908% more central processing unit (CPU) cores, 7,260% more memory, and network connectivity to petabyte-scale, high-performance non-volatile memory express (NVMe) solid-state drives.

What We Mean by "Deep Learning First"

While most programmatic optimization uses fast, cheap rules or lightweight ML to target proxy metrics like click-through rate (CTR), these approaches quickly plateau.

When a brand wants to optimize for real business outcomes—conversions, store visits, incremental return on ad spend (ROAS)—the signal is sparse, the feature space is vast, and the relationships between inputs are deeply non-linear. This is exactly where deep learning excels: learning complex patterns across hundreds of features that simpler models can't represent.

Cognitiv’s approach uses a bespoke neural network for each product: Custom Algorithms for outcomes data, ContextGPT for ID-less, privacy-safe open-web contexts, and Performance CTV for connected television.

The common thread across all three products: large, expressive neural networks making high-fidelity predictions on every single impression opportunity, in real time.

Activating Intelligence Everywhere

Our philosophy is different from most companies in this space. We don't think the value should be locked inside a single buying platform. Intelligence is the product—not the pipes.

This is why Cognitiv offers two activation paths. Advertisers can run campaigns directly through our Deep Learning DSP, where they get the full end-to-end experience. Or they can keep their existing DSP workflow entirely intact and access Cognitiv's models through Curation and Dynamic Deals—we supply the intelligence, the buyer keeps their tools. We call this approach Cognitiv Everywhere.

That second path is where the engineering challenge gets interesting: we have to evaluate every impression opportunity with a full deep learning model, regardless of where the media is ultimately purchased. Standard industry approaches simply didn't offer enough capability.

How Our Real-Time Integration Works

Here's the flow: a publisher sends an ad request to the supply-side platform (SSP). The SSP forwards that request to Cognitiv. Within 5 milliseconds, we must receive the request, resolve identity, load hundreds of features from our stores, run a large neural network, score the impression, and respond. We label qualifying impressions with a private marketplace (PMP) deal ID. The SSP then passes those deal IDs downstream to the demand-side platform (DSP), where the buyer can purchase the inventory through their normal workflow.

Five milliseconds. To put that in perspective, we need to complete this entire pipeline—identity resolution, feature lookup, neural network inference, and response—faster than the most popular cloud-based databases can return a single record.

Meeting that latency budget required us to optimize every layer of the system. Our servers are co-located inside the same data centers as the SSPs, connected over low-latency links. The application code is C++ compiled to native binary, running on bare metal—no virtualization overhead anywhere in the stack. We built a custom low-latency communication protocol with an optimized binary message format. We wrote our own threading library tuned for this workload. We built our own high-performance inference engine. Internally, we measure in microseconds and are constantly pushing the boundary.

This isn't infrastructure for infrastructure's sake. Every optimization exists to serve a single goal: giving our models enough compute headroom to make the best decision on every impression.

Why Containerized RTB Can't Do This

Containerized real-time bidding (RTB) (which evolved into the Agentic Real Time Framework, or ARTF) is a cost-effective, fast-to-deploy, standardized framework that works well for simple operations, such as applying a blocklist, running a rules engine, or executing a logistic regression model.

However, the constraints are severe: Containerized RTB environments offer limited CPU, minimal memory, and essentially no persistent storage. This lack of capacity makes it impossible to maintain the large feature stores, model weights, and complex multi-step inference pipelines required for high-quality deep learning predictions.

The scale of difference is massive: our real-time integration provides over two orders of magnitude more compute capacity than market alternatives. This isn't a minor difference; it's the gap between running a lookup table and running a neural network, between matching keywords and understanding language, and between optimizing for a proxy metric versus optimizing for real business outcomes.