I Have Seen The Future Of Measurement, And It Is … Messy

“Data-Driven Thinking” is written by members of the media community and contains fresh ideas on the digital revolution in media. Today’s column is written by Martin Kihn, senior vice president of strategy, Salesforce Marketing Cloud.

Measurement is a footnote – outglamorized by targeting and the opera of browsers, pushed into the corner during debates about the future of media. But it’s arguably more important than aim and requires more discipline.

A few weeks ago, the World Federation of Advertisers (WFA) released a statement and a technical proposal for a cross-media measurement framework. It was the outcome of a yearlong global peer review and an even longer discussion among participants including Google, Facebook, Twitter, holding companies, industry groups such as the ANA and MRC and large advertisers including Unilever, P&G and PepsiCo.

Reactions ranged from enthusiastic to less so, but few people seem to have read more than the press release. After all, it’s not a product and could be yet another in a parade of grand ambitions in online-offline media measurement, dating back to Facebook’s Atlas.

But it describes a realistic scenario for the future of measurement. Sketchy in spots, the WFA’s proposal is ironically the clearest screed of its kind and is worth a closer look.

To be sure, this is a project focused on solving a particular problem: measuring the reach and frequency of campaigns that run on both linear TV and digital channels, including Facebook, YouTube and CTV. In other words, the kinds of campaigns that cost participating advertisers such as P&G a reported $900 billion a year.

And P&G’s own Marc Pritchard is on record calling the proposal “a positive step in the right direction.”

The need is clear. Advertisers today rely on a patchwork of self-reported results from large publishers, ad server log files, aggregate TV ratings data and their own homegrown models to try to triangulate how many different people saw their ads (reach), how often (frequency) and how well those ads fueled desired outcomes, such as sales lift.

The latter goal is acknowledged in the current proposal, which doesn’t try to solve it. But the WFA, building on previous work from the United Kingdom’s ISBA, Google, the MRC and others, lays out a multi-front assault on reach and frequency that covers a lot of ground.

How does it work?

The proposal combines a user panel with census data provided by participating publishers and broadcasters, as well as a neutral third-party data processor. The technical proposal spends some time talking about various “virtual IDs” and advanced modeling processes that are loosely defined. Their goal is to give platforms that don’t share a common ID a way to piece together a version of one.

Needless to say, a lot of the virtualizing and modeling and aggregating in the WFA’s workflow exists to secure user-level data. It’s a privacy-protection regime. It also engages with the much-discussed third-party cookieless future.

Panel of truth

The proposal leans heavily on a single-source panel of opted-in users. At one point, it calls this panel the “arbiter of truth,” and it’s clear most of the hard work is done here. Panelists agree to have an (unnamed) measurement provider track their media consumption online and offline. Panels are a workhorse of media measurement as provided by Nielsen and others, but they are expensive to recruit and maintain. It’s not clear who would build or fund this one.

In the past, other panels have struggled to collect certain kinds of cross-device data, particularly from mobile apps. Panels also get less reliable in regions or publishers where they have less coverage, a problem that could be addressed by joining multiple panels together.

In addition to the media consumption, demographic and attitudinal data it provides, the panel is used to “calibrate and adjust” much more detailed census data voluntarily provided by publishers (including broadcasters).

Publisher-provided data

No walls here – at least in theory. Given that Google and Facebook support the WFA’s proposal, it’s implied they’re open to some form of data sharing. It’s already been reported – although it is not in the proposal itself – that some participants will only share aggregated data, but it’s better than nothing. The WFA’s idea of “census data” includes publisher log files, TV operator data and information collected from set-top boxes.

This census data is married at the person-level with the panel data using a series of undefined “double-blind joins of census log data with panelist sessions.” Joined together, the different data sets can correct one another: The panel fills gaps where there is no census data, and the more detailed census data can adjust the panel’s output.

Virtual IDs, anyone?

The census data will have to be freely provided, and so wide-ranging participation across many publishers is required for success. Another requirement is a way to tie impressions that occur on different publishers (which don’t share a common ID, remember) to individuals to calculate unduplicated reach and frequency.
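To see why a common identifier matters, here is a minimal sketch of the arithmetic, assuming a toy impression log in which every row already carries a shared `person_id`. That field is hypothetical: no such ID exists across publishers today, which is exactly the gap the proposal tries to close. The names `impressions`, `reach_and_frequency` and `summed_publisher_reach` are illustrative only.

```python
from collections import Counter

# Hypothetical impression log: (person_id, publisher).
# A shared person_id across publishers is an assumption made for illustration.
impressions = [
    ("p1", "tv"), ("p1", "video_site"), ("p1", "video_site"),
    ("p2", "tv"),
    ("p3", "social"), ("p3", "tv"),
    ("p4", "social"),
]


def reach_and_frequency(rows):
    """Return (unique people reached, average impressions per person)."""
    per_person = Counter(person for person, _ in rows)
    reach = len(per_person)
    avg_frequency = sum(per_person.values()) / reach if reach else 0.0
    return reach, avg_frequency


def summed_publisher_reach(rows):
    """Naive total: add up each publisher's own reach, double-counting overlap."""
    by_publisher = {}
    for person, publisher in rows:
        by_publisher.setdefault(publisher, set()).add(person)
    return sum(len(people) for people in by_publisher.values())


reach, avg_frequency = reach_and_frequency(impressions)
naive = summed_publisher_reach(impressions)
print(reach, avg_frequency, naive)  # prints: 4 1.75 6
```

Four people saw the campaign, yet adding up each publisher's self-reported reach gives six, because p1 and p3 are counted more than once.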

In a general way, the proposal describes a process of assigning a “Virtual ID” (VID) to every impression. This VID may – but may not – denote a unique individual. How is it assigned? Based on a publisher-specific model that is refreshed periodically and provided by the neutral measurement provider. It appears to use cookies (and other data) in its first version, graduating to a cookieless solution based on publisher first-party data in the future.
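The proposal does not spell out the VID model, so the following is only a rough sketch of the general idea, not the WFA's method. It assumes the simplest possible stand-in: hashing a publisher's own identifier (a cookie or login ID, both hypothetical here) with a model-specific salt into a fixed pool of virtual IDs. The function `assign_vid`, the salt string and the pool size are all invented for illustration.

```python
import hashlib


def assign_vid(publisher_event_id: str, model_salt: str, pool_size: int) -> int:
    """Map a publisher-side identifier to a virtual ID in [0, pool_size).

    Assumption: a salted hash stands in for the periodically refreshed,
    publisher-specific model described in the proposal.
    """
    digest = hashlib.sha256(f"{model_salt}:{publisher_event_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % pool_size


# Hypothetical publisher log rows keyed by a first-party cookie.
log = [
    {"cookie": "abc123", "campaign": "spring"},
    {"cookie": "abc123", "campaign": "spring"},
    {"cookie": "zzz999", "campaign": "spring"},
]

SALT = "publisher-x-model-v1"  # hypothetical; would change when the model is refreshed
POOL = 1_000_000               # hypothetical size of the virtual population

# Pseudonymized output: the cookie is dropped and replaced by the VID.
pseudonymized = [
    {"vid": assign_vid(row["cookie"], SALT, POOL), "campaign": row["campaign"]}
    for row in log
]
```

Even this toy version shows why a VID may or may not denote one person: the same person with two cookies gets two VIDs, and two different cookies can occasionally land on the same VID.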

The output here is a pseudonymized log file with a VID attached to each impression, overlaid with demographic data – at least TV-style age and gender cohorts – extrapolated from the panel.

Doing the math

In the next step, each individual publisher will perform some kind of aggregation into “sketches.” These sketches are likely groups of VIDs that belong to the same demographic or interest segment, by campaign. And it is worth noting here that the “sketches” can’t be reidentified to individuals and are somewhat similar to proposals in Google’s “privacy sandbox.”

In the penultimate step, each individual publisher sends its “sketches” to an unnamed independent service that will “combine and deduplicate VIDs” to provide an estimate of reach and frequency across the campaign. The WFA has a proposal for this Private Reach & Frequency Estimator posted on GitHub.
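To give a feel for how sketches can be combined and deduplicated without exchanging the VIDs themselves, here is a minimal example using a plain bitmap with linear counting, a textbook cardinality estimator. It is a swapped-in technique chosen for brevity, not the algorithm in the WFA's GitHub proposal, and it leaves out the encryption and noise a real privacy-preserving system would need. `NUM_BITS`, the function names and the VID ranges are all assumptions.

```python
import hashlib
import math

NUM_BITS = 4096  # assumed sketch size; larger sketches give tighter estimates


def build_sketch(vids):
    """Publisher side: set one bit per VID. The VIDs never leave the publisher."""
    bits = 0
    for vid in vids:
        digest = hashlib.sha256(str(vid).encode()).digest()
        position = int.from_bytes(digest[:8], "big") % NUM_BITS
        bits |= 1 << position
    return bits


def merge_sketches(*sketches):
    """Aggregator side: bitwise OR. A VID seen by two publishers sets the same bit once."""
    merged = 0
    for sketch in sketches:
        merged |= sketch
    return merged


def estimate_reach(sketch):
    """Linear counting estimate: n is roughly -m * ln(fraction of bits still zero)."""
    zero_bits = NUM_BITS - bin(sketch).count("1")
    if zero_bits == 0:
        raise ValueError("sketch is saturated; use more bits")
    return -NUM_BITS * math.log(zero_bits / NUM_BITS)


# Two hypothetical publishers whose audiences overlap on VIDs 400..599.
sketch_a = build_sketch(range(0, 600))
sketch_b = build_sketch(range(400, 1000))

combined = estimate_reach(merge_sketches(sketch_a, sketch_b))
summed = estimate_reach(sketch_a) + estimate_reach(sketch_b)
print(round(combined), round(summed))
```

The combined estimate should land near the true unduplicated reach of 1,000, while the sum of the two separate estimates comes out near 1,200. A bitmap like this only answers the reach question; estimating frequency would need per-position counters rather than single bits.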

A GitHub explainer mentioning data structures and count vector algorithms is ad tech’s new sign of sincerity.

Finally, outputs are provided via APIs and dashboards, which support both reporting and media planning. End to end, it’s an ambitious proposal that has many of the right players and pieces to work. Its next steps are validation and feasibility testing led by the ISBA in the United Kingdom and the ANA in the United States.

Whatever happens, we’ve learned something from the WFA’s proposal. Even in a best-case scenario, accurate global campaign measurement will definitely require heroic levels of cooperation.