What machine learning has to say about the essence of life (satisfaction)
So, I’m a little obsessed with the Understanding Society dataset, and have spent some time getting my head around it. With more than 3,000 variables and around 30,000 responses on most variables, there’s a lot to explore. [Indeed, I’m wondering if someone has written a book about Britain based on this data — if not, maybe I should do it?]
I’m starting modestly, looking only at the individual responses in a single wave, and I want to keep building my understanding of the potential drivers of life satisfaction. With so many variables to start with, many of which will not be independent of each other, throwing them all into a linear regression is out of the question: the coefficients would be unstable and hard to interpret. [And, in any case, I was taught when I was doing my MBA that you are not allowed to touch the regression until you have a solid theory and some hypotheses you are trying to test.]
The nice thing about machine learning is that there’s no such “technical” limitation. Even if it is “wrong”, the computer will not throw an error (or other abuse at you) if you ask it to run a prediction algorithm using tonnes of correlated variables. I’m not saying you’d want to use this model “in real life”, but I do think it’s a great way to quickly get an idea about the patterns in the data. We can then do seriously careful causal work on a more filtered and targeted set of variables. [In reality, I only included around 400 variables in the model — those for which there were more than 15,000 valid responses. For now, I also restricted the sample to those in paid work.]
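For the curious, here is a minimal sketch of what that filtering step could look like in pandas. It is not the exact code I ran. It assumes that missing or inapplicable answers are stored as negative codes (or as blanks), and the column names `in_paid_work`, `life_sat` and `rarely_asked` are made up for illustration, as is the tiny threshold used on the toy data.

```python
import pandas as pd

def select_variables(df, min_valid=15_000):
    """Keep columns with more than `min_valid` valid responses.

    Assumption: all columns are numeric and a response is "valid"
    when it is non-negative. NaN compares as False, so blanks are
    treated as invalid too.
    """
    valid_counts = (df >= 0).sum()
    keep = valid_counts[valid_counts > min_valid].index
    return df.loc[:, keep]

# Toy data with hypothetical column names; negative values stand in
# for missing or inapplicable answers.
toy = pd.DataFrame({
    "in_paid_work": [1, 1, 0, 1, 1],
    "life_sat": [6, 5, 7, -9, 4],
    "rarely_asked": [-8, -8, 3, -8, 2],
})

# Restrict the sample to people in paid work, then drop sparse variables.
workers = toy[toy["in_paid_work"] == 1]
selected = select_variables(workers, min_valid=2)
print(list(selected.columns))
```

On the real data the threshold would be the 15,000 valid responses mentioned above rather than 2.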
So, I trained a deep learning model to predict the variable “life satisfaction”, and then asked the model which of the explanatory variables were most important in predicting that outcome. The results are shown in the chart, and are fascinating, but perhaps not surprising. A number of patterns strike me.
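One common way to "ask" a model which variables matter is permutation importance: shuffle one variable at a time and see how much worse the predictions get. The sketch below illustrates the idea and is not necessarily the method behind the chart. To keep it short and runnable, it uses synthetic data and a plain least-squares fit as a stand-in for the neural network; the function only needs something that can make predictions, so any fitted model could be plugged in. All names here are my own for illustration.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in squared error when each column of X is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = np.mean((predict(X) - y) ** 2)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Break the link between column j and the outcome.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            error = np.mean((predict(X_perm) - y) ** 2)
            increases.append(error - baseline)
        importances[j] = np.mean(increases)
    return importances

# Synthetic data (an assumption, not survey data): the outcome depends
# strongly on column 0, weakly on column 1, and not at all on column 2.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)

# Stand-in model: ordinary least squares in place of a deep network.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

def predict(data):
    return data @ coef

scores = permutation_importance(predict, X, y)
ranking = np.argsort(scores)[::-1]  # most important column first
print(ranking, scores)
```

In a more careful version the importances would be measured on held-out data rather than on the data the model was trained on.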
As in previous work, I found that the key predictors of life satisfaction fall into a few categories. Markers of mental health were particularly prominent: among the top 20 variables (out of around 400) were whether someone felt downhearted, depressed, lonely, worthless, isolated, or unhappy; or conversely, optimistic, close to others, relaxed, or useful. Physical health also features (but less prominently), with general health, and health not interfering with social life, being predictive of overall life satisfaction.
There are a couple of items, too, that speak to people’s resilience and sense of control over their own lives (which has previously been found to be one of the most important drivers of satisfaction). People who were able to say that they were dealing with problems well, that they had fewer problems overcoming difficulties, and that they were thinking clearly, also had higher life satisfaction.
Finally, there are the important matters of money and jobs. Whether someone said they were living comfortably, or finding it very difficult to make ends meet, was the second most important variable in terms of its predictive power. [The exact question asked was: “How well would you say you yourself are managing financially these days?” and the scale was from 1 (comfortable) to 5 (very difficult).]
And, as I’ve argued (ad nauseam?), people’s satisfaction with their job had a major link to their satisfaction with their overall lives. Out of the nearly 400 variables, job satisfaction was the 11th most predictive one and, arguably, if one were to group many of the mental health variables together (which I will do later), its importance might be even higher than that.
Clearly, there are complex interactions between these variables, so proving causality is probably going to be quite challenging. But for now, I thought it was just a fun exercise to see what a machine learning algorithm would make of the data!