One of the most common questions I get as the CEO and Co-Founder of Mutinex is why a generalized marketing mix model is better than a fitted model. If someone builds a model bespoke to your business, don’t they understand your business better? Hasn’t the model been built around your business, and isn’t it therefore far more robust for your needs?
I struggled with the answer for some years, until I read an excellent blog post by Byron Sharp. Sharp argues that most models are built for fit (no surprise that he’s right here) by data scientists, and thus are poor at explaining reality. I happen to agree with this perspective. I’ve seen many a model with a strong predictive fit but poorly explained marketing dynamics (like Search generating 60%+ of sales, something I have commonly seen in bad marketing mix models). The post is incredibly insightful as to the limitations of conventional marketing mix modeling (MMM) and MMM research.
Sharp essentially highlights four key components in the post:
- Modelers have immense bias
- Modelers are picking for fit
- Causality is a big issue
- Lack of variation is also a big issue
So why is marketing mix modeling suddenly improving?
I think the answer lies in changing our assumptions around MMM. We need to move away from the world of well-fitted data and start moving towards the world of generalized models that fundamentally understand domains. In doing so, we can run more controlled experiments across datasets to better understand our model.
The best way to understand the difference is through priors. Priors allow us to incorporate stronger domain beliefs established by the academic world into models. Additional tests, like causality testing and experimentation frameworks, allow us to prove those out.
When we talk about priors at Mutinex, it’s important to note we don’t believe priors should be heavily configured. In fact, we think configuring priors is a surefire way to move away from data science and towards consulting. This is why we use uninformed priors: elements that tell the model how to think, but do not tell the model what its outputs should be.
A good analogy for this exists in physics. If I tell the model gravity exists and it should expect a ball dropped from a height to fall, I don’t need to specify how fast. The model can measure that for me. Modelers who configure priors heavily, by contrast, are looking at the height (often from a distance) and saying, “I think the ball will travel at this exact trajectory and speed.”
In this analogy, a brand is the ball. We don’t know how fast the ball will travel in your specific case; that depends on the height, the trajectory, how hard it was thrown and so on. We only know that it will fall. That’s why we use uninformed priors: to tell the model gravity exists without telling it how fast the ball will travel.
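To make the analogy concrete, here is a minimal sketch of what a direction-only prior can look like in a Bayesian model. The library (PyMC), the variable names and every number below are illustrative assumptions rather than a description of Mutinex’s actual setup: the half-normal prior tells the model the channel cannot hurt sales (gravity exists), but leaves the size of the effect for the data to measure (how fast the ball falls).

```python
import numpy as np
import pymc as pm

# Hypothetical weekly data: media spend and sales (purely synthetic).
rng = np.random.default_rng(42)
spend = rng.gamma(shape=2.0, scale=50.0, size=104)         # two years of weekly spend
sales = 1000 + 0.8 * spend + rng.normal(0, 40, size=104)   # the "speed of the ball" is unknown to the model

with pm.Model() as mmm_sketch:
    # "Gravity exists": the channel's effect on sales is constrained to be
    # non-negative, but its magnitude is left to the data.
    beta = pm.HalfNormal("beta", sigma=5.0)
    intercept = pm.Normal("intercept", mu=0.0, sigma=1000.0)
    noise = pm.HalfNormal("noise", sigma=100.0)

    mu = intercept + beta * spend
    pm.Normal("obs", mu=mu, sigma=noise, observed=sales)

    idata = pm.sample(1000, tune=1000, chains=2, random_seed=42)
```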
Modeling for fit is fake science
Byron Sharp is right: modeling for fit is fake science. Anyone can build a fitted model if they’re able to feature-engineer a dodgy signal. And anyone can turn fitted models into reasonable-looking outputs. The major challenge we have at the moment in data science is overfitted signals that constantly break down in their domains, because they lack a first-principles link between the data science and the way things actually work.
In essence: if I can make a model work in one tightly controlled environment, that’s not evidence that my model fundamentally understands the problem. In fact, it’s far from it. The fact that my model works is simply evidence that I can make it fit the data, usually with a collinear signal.
In case you’re wondering what a collinear signal is, it’s essentially a signal that moves up at the same time as another signal without causing it. The big ones in MMM are ad spend and sales. Ad spend typically goes up at key seasonal sales periods (e.g. Christmas), exactly when sales are spiking anyway. If the model credits that spike to advertising, are you modelling marketing effectiveness or just modelling for fit?
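Here is a small, self-contained illustration of that trap, on synthetic data where spend contributes nothing by construction. Because spend and sales both spike at Christmas, a naive regression happily “explains” sales with advertising; the names and numbers are made up purely to show the mechanism.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_weeks = 156                          # three years of weekly data (illustrative)
week_of_year = np.arange(n_weeks) % 52

# Sales spike at Christmas; ad spend is also raised at Christmas, so the two
# series move together even though, by construction, spend does nothing here.
christmas = (week_of_year >= 49).astype(float)
spend = 100 + 200 * christmas + rng.normal(0, 10, n_weeks)
sales = 1000 + 500 * christmas + rng.normal(0, 50, n_weeks)

print("correlation(spend, sales):", np.corrcoef(spend, sales)[0, 1])

# A "fit for fit's sake" regression credits the Christmas lift to advertising.
naive = LinearRegression().fit(spend.reshape(-1, 1), sales)
print("R^2:", naive.score(spend.reshape(-1, 1), sales))
print("implied sales per unit of spend:", naive.coef_[0])  # true effect is 0, reads as ~2.5
```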
To push the example further: if I’m in model land, just modelling for fit without causal inference, I can throw in a bunch of broken-up weeks referencing every holiday known to man (trust me, there’s a variable for every week if you want to find it). The model will fit very, very well on MAPE and R-hat, yet still be virtually useless in the hands of a marketer, because it holds little predictive or marketing knowledge within it.
Generalized models: big models that understand domain fundamentals
Bayesian statistics, a branch of statistics where we can largely define how a machine learning model searches for answers, offers some really good ways to build in domain understanding and certainty. For example, we can begin to build clever channel constraints that reflect reality.
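As a rough illustration of such a constraint (not Mutinex’s model; every distribution and number here is an assumption), a channel can be expressed as a saturating response whose ceiling and half-saturation point are estimated from the data, while the model structure itself enforces non-negative, diminishing returns:

```python
import numpy as np
import pymc as pm

# Illustrative weekly channel spend and sales (purely synthetic).
rng = np.random.default_rng(1)
spend = rng.gamma(2.0, 50.0, size=104)
sales = 800 + 300 * (1 - np.exp(-spend / 60.0)) + rng.normal(0, 25, size=104)

with pm.Model() as constrained_channel:
    # Constraints that reflect marketing reality rather than a hand-picked answer:
    # the channel cannot reduce sales, and its returns must diminish with spend.
    effect_ceiling = pm.HalfNormal("effect_ceiling", sigma=500.0)
    saturation = pm.HalfNormal("saturation", sigma=100.0)
    baseline = pm.Normal("baseline", mu=0.0, sigma=1000.0)
    noise = pm.HalfNormal("noise", sigma=100.0)

    channel_contribution = effect_ceiling * (1 - pm.math.exp(-spend / saturation))
    pm.Normal("obs", mu=baseline + channel_contribution, sigma=noise, observed=sales)

    idata = pm.sample(1000, tune=1000, chains=2, random_seed=1)
```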
The best thing about Bayesian inference, though, is that we can also test features in a generalized context. If I build something on one customer that fundamentally breaks others, I’m modelling for fit rather than building something that adds to a general understanding of the domain. This is a massive and important distinction between bespoke and generalized domain modeling, and why it is far more powerful to work in the world of generalized modeling as much as possible.
Fundamental dynamics versus bespoke dynamics
To give an example: if I build a feature, does that feature fundamentally reflect a marketing dynamic that needs to hold true across a range of customers? Does it break when I expose it to different conditions? Does it hold true predictive value in the real world?
Here are some examples:
- Promotions need to be treated as relative indexes, with their “competitiveness” (or at least their relative movements) engineered into the underlying model, ideally with competitor data loaded in. Simply telling the model that promotions are valuable for this customer can break down at another customer, because what matters is the relative value and movement.
- Weather is especially important to a lot of categories. Weather should fundamentally impact a product, but the truly fundamental shifts come from extremes in weather. Looking for those relative extremes or behavioural breakpoints is key: heavy rainfall should trigger reductions in retail and automotive, and extreme temperatures should spike ice cream sales (a minimal feature sketch follows this list).
- Seasonality is an important consideration for businesses. When estimating a baseline, the model should expect to see a natural variation pattern. In fact, not seeing this natural variation can be a good signal that your model is wrong or not fitting correctly. An unchanging baseline is cause for alarm, because it simply wouldn’t reflect what is actually happening in your business.
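As a minimal sketch of the weather point above (column names, thresholds and distributions are all hypothetical), breakpoint features flag relative extremes rather than feeding raw weather values into the model:

```python
import numpy as np
import pandas as pd

# Hypothetical daily weather data (purely synthetic).
rng = np.random.default_rng(2)
df = pd.DataFrame({
    "temperature_c": rng.normal(18, 8, 365),
    "rainfall_mm": rng.gamma(1.0, 4.0, 365),
})

# It is the extremes, not the averages, that shift behaviour: flag days beyond a
# relative threshold (here the 90th percentile) instead of using raw values.
df["extreme_heat"] = (df["temperature_c"] > df["temperature_c"].quantile(0.90)).astype(int)
df["heavy_rain"] = (df["rainfall_mm"] > df["rainfall_mm"].quantile(0.90)).astype(int)

# These breakpoint features, not raw temperature or rainfall, are what the model
# would expect to spike ice cream sales or suppress retail and automotive demand.
```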
The risk: a model that becomes so big it can always fit
The big risk with generalized models is that they become so big and clunky that they always try to fit the space. That’s why a good model should be additive (gradually adding in features that make sense until it finds a solid fit) rather than subtractive (adding all features in and then stripping them back to find fit). The biggest challenge with most modeling techniques in machine learning is that they tend to move towards subtractive modelling, which in turn is likely to create and exacerbate the “always fit” problem.
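One way to see the additive-versus-subtractive difference is forward versus backward feature selection. This is only an analogy, sketched with scikit-learn on synthetic data rather than any real MMM pipeline: forward selection starts from nothing and only admits features that earn their place, while backward elimination starts with everything and prunes.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for a wide MMM design matrix: many candidate features,
# only a handful genuinely informative.
X, y = make_regression(n_samples=150, n_features=40, n_informative=5,
                       noise=10.0, random_state=0)

estimator = LinearRegression()

# Additive: build up from nothing, one justified feature at a time.
forward = SequentialFeatureSelector(
    estimator, n_features_to_select=5, direction="forward", cv=5).fit(X, y)

# Subtractive: throw everything in and strip back, the direction most techniques
# drift towards and the one more likely to leave an "always fits" model behind.
backward = SequentialFeatureSelector(
    estimator, n_features_to_select=5, direction="backward", cv=5).fit(X, y)

print("forward keeps:", np.flatnonzero(forward.get_support()))
print("backward keeps:", np.flatnonzero(backward.get_support()))
```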
Why a generalized marketing mix model is better than a fitted model
Commonly, when I listen to businesses who want consulting-style services within the MMM space, I hear the argument that a bespoke model will understand their business better. I think it’s a dangerous assertion. What it really means is that the modeler will understand how to fit a model more effectively to the data. Bespoke modelling isn’t a path to marketing science for your business. Instead, it’s a path to you configuring a model that tells you what you want to hear. In data, this is the single most dangerous thing I can think of.
Generalized modeling should hold up over different conditions with different datasets in the longer term
One of the true benchmarks of a solid model is that it holds up well under different conditions and provides insights that make sense to someone trained in marketing science. If the model holds up well both over time and between brands, we know we have a genuine solution to many of the issues plaguing marketing mix models that are built for fit. And importantly, we know that the MMM we’re using fundamentally works off the domain, rather than working to fit the data.
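A simple way to pressure-test that claim is to hold out entire brands (or time periods) rather than random rows. The sketch below uses synthetic panel data and hypothetical column names; the only point is the validation scheme: a specification that only works for the brand it was tuned on is fitted, not generalized.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.model_selection import GroupKFold

# Hypothetical panel: the same feature specification observed across four brands.
rng = np.random.default_rng(4)
n = 400
df = pd.DataFrame({
    "brand": rng.integers(0, 4, n),
    "spend": rng.gamma(2.0, 50.0, n),
    "promo_index": rng.uniform(0.8, 1.2, n),
})
df["sales"] = 500 + 0.6 * df["spend"] + 300 * (df["promo_index"] - 1) + rng.normal(0, 30, n)

X, y = df[["spend", "promo_index"]], df["sales"]

# Hold out one brand at a time: a generalized specification should score
# comparably on brands it never saw during fitting.
for train_idx, test_idx in GroupKFold(n_splits=4).split(X, y, groups=df["brand"]):
    model = LinearRegression().fit(X.iloc[train_idx], y.iloc[train_idx])
    mape = mean_absolute_percentage_error(y.iloc[test_idx], model.predict(X.iloc[test_idx]))
    held_out = sorted(df["brand"].iloc[test_idx].unique().tolist())
    print(f"held-out brand(s) {held_out}: MAPE={mape:.3f}")
```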
Long term: we need prediction monitoring which should lead to more robust track and test frameworks
The long-term future of MMM is prediction tracking, in my view. Most robust vendors already talk about the value of operating in a test-and-learn environment to ensure that models are robust. And as time goes on, the real power of a model lies in its predictions.
This is why tracking predictions is so important. By monitoring how our model performs over time, we can gain valuable insights into how the market is changing. We can see which channels are becoming more or less effective, and we can identify new trends and patterns that were not present when the model was first built.
Of course, tracking predictions is not always easy. And to be honest, this feels less like an operations problem and more like a product problem. Other vendors have created feedback loops around the last 30 days of a model’s dataset and how well the model holds up. This is a great first step, but it doesn’t go far enough. We need to deeply understand and surface insight into how predictions hold up as a key product feature within MMM 2.0, or we run the risk of eroding the value of MMM over the longer term.
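As a rough sketch of what prediction tracking can look like (synthetic data, a hypothetical refit cadence and a deliberately simple model), a rolling-origin evaluation refits on history and scores the next few weeks, so drift in the rolling error surfaces as the market moves:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_percentage_error

# Hypothetical weekly history with features already engineered elsewhere.
rng = np.random.default_rng(5)
weeks = 156
df = pd.DataFrame({"spend": rng.gamma(2.0, 50.0, weeks)})
df["sales"] = 1000 + 0.7 * df["spend"] + rng.normal(0, 40, weeks)

window = 4  # refit cadence: track how the next four weeks of predictions held up
records = []
for end in range(104, weeks, window):
    train, test = df.iloc[:end], df.iloc[end:end + window]
    model = LinearRegression().fit(train[["spend"]], train["sales"])
    mape = mean_absolute_percentage_error(test["sales"], model.predict(test[["spend"]]))
    records.append({"as_of_week": end, "rolling_mape": mape})

# A drifting rolling_mape is the early warning that channel effectiveness or
# market dynamics have moved since the model was last built.
print(pd.DataFrame(records))
```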
At the end of the day, MMM is all about understanding the complex dynamics of the market. And by tracking predictions, we can gain the insights we need to stay ahead of the curve and make better decisions for our business. A generalized model approach helps with this and that’s why a generalized marketing mix model is better than a fitted model.