I’m spending a lot of time on the Autotrader website, looking at near-identical pictures of Mercedes Sprinters. When I’m doing this, I’m trying to answer one of two related questions:
- Is this van good value for the condition it’s in?
- How does it compare in value with other vans I’ve seen that look similar?
But there are so many of them. They’re all white, they’re all in much the same condition. After a while they all blur into one another.
I love to nerd out with systems and technology, and I’m always interested in defining methods for solving problems, rather than ad hoc approaches. And this problem is a great candidate for such an approach. That ‘blurriness’ is a factor of the quantity of data, and its homogeneity. However, those factors also favour a data analysis approach – albeit a pretty casual one.
There are many considerations to weigh up when looking for a van for a campervan conversion. But we’ve fixed many of the variables:
- Model: Mercedes Sprinter
- Long wheelbase
- High top
- NCV (post-2006) variant
- From a dealer, not a private seller
- Under 100,000 miles on the clock
There are lots of Sprinters on the market with these specs, so it’s easy to compare them across the other factors that we’re considering:
- Price
- Mileage
- Age
- Condition
Condition seems to be mostly a factor of age and mileage, so that gives us three variables to compare. So I made a spreadsheet to record some basic information about each van:
- URL or identifying label
- Registration plate, location, or other comments
- Mileage
- Year
- Price (inc. VAT if applicable)
(Interestingly, VAT-free vans don’t seem to be much better value. It looks like the sellers are recovering the 20% ‘discount’.)
Normalising
I wanted to give each van a score to indicate how ‘good’ it is – a factor of how new, and how few miles are on the clock. In order to compare these very different values I needed to normalise them to a standard range. Fortunately, there’s a formula for that:
zi = ( xi − min(x) ) ÷ ( max(x)−min(x) )
Bear in mind when doing this calculation that higher numbers are better for year of manufacture, but lower numbers are better for mileage. So ‘max’ and ‘min’ could be more usefully substituted for ‘best’ and ‘worst’
That gives each van a score of 0 – 1 for year, and 0 – 1 for mileage. Combining (adding) these gives an overall score for each van.
But those factors aren’t equally important, so I also included a variable weighting modifier. For example, if we think that mileage is twice as important as age, we can multiply the mileage score by 2 before adding the two values together. Then I multiplied the result by 100 to make the numbers easier to read.
The final ‘how good is this van’ quality score is calculated like this:
Quality = 100 * ( (Mileage Weighting * Normalised Mileage Score) + (Year Weighting * Normalised Year Score) )
Using a weighting of 1 for the age and 2 for the mileage, this gives every van a score of 0 – 300.
Plotting a trend
Putting this quality score on a scatter chart mapped against asking price gives a picture of how all the vans vary in value for money.
Adding a trend line allows us to see which vans are good value for money across the price range. Everything below the line is comparatively cheap, and can be easily contrasted with other vans that are of the same ‘quality’ or the same price.
It’s also easy to set a price limit (draw a horizontal line across the chart at the maximum price we willing to pay) and identify all the good value vans below this limit.
Outliers
The more data we can collect the better. It helps to reduce the impact of outliers on the overall picture – outlier data can skew the average and make the trend line misleading. For example, private sellers are often much cheaper, but not necessarily where we want to buy from. Adding lots of cheap vans from private sellers pulls the trend line down, making the other vans seem much worse value (which may be accurate if you’re happy to buy from a private seller). Conversely vans from Mercedes dealerships come at a premium, and push the trend line up.
On the flipside, the chart makes it very easy to see outliers, and then prompt further research to explain them.
Perfection is the enemy of good
I’ve found this to be a useful tool to help make sense of all this data. But I’m conscious that fiddling around with spreadsheets and scatter charts could just be my way of stalling; of avoiding making a decision – a commitment.
To some extent this approach is about managing uncertainty and the discomfort that comes with it. We want to buy a good van, at a good price, and it’s uncomfortable negotiating that purchase process when we’re not experts in the field. But you can’t remove all risk, you can’t avoid uncertainty in life (that’s called ‘being dead’). At some point we have to let go.