Improving Forecasting for Programmers
Using computers to forecast is a common task. Forecasting can mean many different things, but in this case I mean “making a prediction on what number X will be at time Y”. Examples include “how many movie tickets will we sell next week at all 50 locations” or “what will the price of all the ingredients in our burrito be in January”.
Basically 1/4 of what people use spreadsheets for is forecasting. The spreadsheet method works well enough at the start, when the project is small. But as the project becomes more important, the forecast needs to be improved, and the spreadsheet method usually hits its limits. So if you are a programmer, in a lot of different domains, a fairly common goal is to “improve our forecast of $FOO”.
That’s who this article is targeted at.
Weighted Average Percentage Error.
To do this, the first thing you need to do is measure how well the forecast is doing. The standard way to do this if you’re forecasting for dozens to thousands of items is WAPE.
WAPE takes the absolute error across all your forecasts and divides it by the sum of the actual values. It strikes a balance between being easy to explain (sum of errors over sum of actuals) and still weighing larger errors more heavily. That makes it a better starting point than the simpler Mean Absolute Percentage Error (MAPE), while also being much easier to explain than Weighted MAPE and not needing the business to rank the relative importance of different forecasts. A fuller explanation of these trade-offs is easy to find if you want to go deeper.
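As a concrete reference, here's a minimal sketch of WAPE in plain Python; the numbers are made up for illustration.

```python
def wape(actuals, forecasts):
    """Weighted Average Percentage Error:
    sum of absolute errors divided by the sum of actuals."""
    total_error = sum(abs(a - f) for a, f in zip(actuals, forecasts))
    return total_error / sum(actuals)

# Two items: forecast 230 vs actual 100, forecast 320 vs actual 220
print(wape([100, 220], [230, 320]))  # (130 + 100) / (100 + 220) = 0.71875
```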
So once you have this forecast and you’re measuring it, how do you improve it? A lot of times developers will throw up their hands since they “aren’t data scientists” or don’t have the statistical background. Not having a data science background only really limits you once the forecast is already working very well, and that’s rarely the case.
Usually there are easy wins through better code and data engineering
By this I mean there are often issues in the code that does the forecast, such as rounding issues, handling nulls incorrectly, inner joins dropping rows, missing data sources, etc. None of these require “Data Science” to fix or improve.
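As a hypothetical example of the kind of bug I mean, an inner join silently drops items that only exist on one side; the DataFrames and column names here are made up:

```python
import pandas as pd

forecasts = pd.DataFrame({"movie_id": [1, 2, 3], "forecast": [220, 230, 320]})
actuals = pd.DataFrame({"movie_id": [2, 3, 4], "actual": [100, 220, 150]})

# Bug: an inner join silently drops movie 1 (forecast, no actual)
# and movie 4 (actual, no forecast), hiding two entire error categories.
inner = forecasts.merge(actuals, on="movie_id", how="inner")

# Fix: an outer join keeps both sides; fill the missing values with 0.
outer = forecasts.merge(actuals, on="movie_id", how="outer").fillna(0)
print(outer)
```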
Step 1. Make sure the forecast and measurement are correct
A lot of times the initial forecast was built in Excel or SQL by someone on the business side who needed it. While they’re probably an expert in their field, there’s a high likelihood of a mismatch between the logic they think they programmed and what they actually did. Usually the first step is to source control and code review what they were already doing. Depending on how you implement it, you might be able to add unit tests.
Implementing the current solution has the added benefit that when you improve the forecast, you get credit for it.
The reasons to source control the forecast and to understand the measurement are different. WAPE and MAPE are fairly standard ways to measure error, but again it’s possible there was a coding error. The main reason, though, is that you want to understand your feedback loop and measure what actually matters. In the movie example above, suppose you’re predicting ticket sales 4 weeks from now, but it takes an additional 2 weeks to get the actual sales numbers, and you have to cancel out any complimentary tickets that were used. You actually have a 6 week feedback loop and need to pull in an additional data source. You want to know this sooner rather than later.
Step 2. Group forecast errors by type
Once you have the forecasting method, and trust it, you can start improving it. Each forecast fits into one of 5 categories:
- No Actual - Items you forecast for but have no actuals for. You forecast 100 ticket sales of a movie and had 0 sales
- Overbiased Forecast - Items you forecast for but sold fewer of. You forecast 100 ticket sales of a movie and sold 80
- Perfect - Items you forecast for perfectly.
- Underbiased Forecast - Items you forecast for but sold more of
- No Forecast - Items you don’t forecast for but have actual sales of.
Once you have these 5 categories you can get the total error grouped by the type of forecast. This tells you where to focus your efforts. For example, if the data looked like this:
id | type | actual ticket sales | forecasted ticket sales | correct | overbias | underbias |
---|---|---|---|---|---|---|
1 | no actual | 0 | 220 | 0 | 220 | 0 |
2 | overbiased | 100 | 230 | 100 | 130 | 0 |
3 | overbiased | 220 | 320 | 220 | 100 | 0 |
4 | perfect | 150 | 150 | 150 | 0 | 0 |
5 | perfect | 240 | 240 | 240 | 0 | 0 |
6 | underbiased | 210 | 170 | 170 | 0 | 40 |
7 | underbiased | 180 | 140 | 140 | 0 | 40 |
8 | underbiased | 230 | 220 | 220 | 0 | 10 |
9 | underbiased | 330 | 310 | 310 | 0 | 20 |
10 | no forecast | 100 | 0 | 0 | 0 | 100 |
Total | | | | | 450 | 210 |
The majority of your wrong forecasts are underbiased, but a third of your total error comes from the single case where you have a forecast of 220 and no actual sales. Each category usually has a different cause for its error and a different solution.
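A minimal sketch of how you might derive these categories and totals with pandas, assuming the per-item actuals and forecasts are already loaded (the data below is just the table above):

```python
import pandas as pd

# The 10-item example from the table above.
df = pd.DataFrame({
    "id": range(1, 11),
    "actual":   [0, 100, 220, 150, 240, 210, 180, 230, 330, 100],
    "forecast": [220, 230, 320, 150, 240, 170, 140, 220, 310, 0],
})

def categorize(row):
    if row["actual"] == 0:
        return "no actual"
    if row["forecast"] == 0:
        return "no forecast"
    if row["forecast"] > row["actual"]:
        return "overbiased"
    if row["forecast"] < row["actual"]:
        return "underbiased"
    return "perfect"

df["type"] = df.apply(categorize, axis=1)
df["correct"] = df[["actual", "forecast"]].min(axis=1)
df["overbias"] = (df["forecast"] - df["actual"]).clip(lower=0)
df["underbias"] = (df["actual"] - df["forecast"]).clip(lower=0)

# Total error grouped by category shows where to focus first.
print(df.groupby("type")[["overbias", "underbias"]].sum())
```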
Step 3. Visualize it
When you group your results by the 5 categories you can then visualize them. This makes it easier for visual thinkers to grasp the size/percentage of the errors. You might notice that most of your forecasts are ‘close enough’ or that a small subset of items is causing most of your error.
The brain is better at spotting patterns in images than spreadsheets
For each item you want a row with the id, how much of the forecast was overbiased, how much was underbiased, and how much was correct. You want all the forecasts in the same category to be together, but within a category you might want to sort by the error or by the total, depending on what matters more. So for the 10 item data set above you would use the values from the id, correct, overbias and underbias columns. For smaller data sets (up to ~10k rows) you can easily do this in Excel using stacked bar charts, with the sum of correct, overbias and underbias on one axis and the ids on the other; this makes it easier to deep dive specific cases.
For larger (50k+) datasets I needed to use gnuplot, with each row drawn as a single-pixel-wide column. Even then it becomes harder to spot patterns. Often it’s worth looking at specific subgroups of forecasts, such as a specific brand or region you are forecasting for.
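If you'd rather stay in Python, here's a rough sketch of the same stacked-column idea using matplotlib instead of Excel or gnuplot, with the correct/overbias/underbias values from the table above:

```python
import matplotlib.pyplot as plt
import pandas as pd

# One row per item: correct / overbias / underbias, already sorted by category.
df = pd.DataFrame({
    "id":        [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "correct":   [0, 100, 220, 150, 240, 170, 140, 220, 310, 0],
    "overbias":  [220, 130, 100, 0, 0, 0, 0, 0, 0, 0],
    "underbias": [0, 0, 0, 0, 0, 40, 40, 10, 20, 100],
})

x = range(len(df))
fig, ax = plt.subplots(figsize=(10, 4))
ax.bar(x, df["correct"], width=1.0, label="correct")
ax.bar(x, df["overbias"], width=1.0, bottom=df["correct"], label="overbias")
ax.bar(x, df["underbias"], width=1.0,
       bottom=df["correct"] + df["overbias"], label="underbias")
ax.set_xticks(list(x))
ax.set_xticklabels(df["id"])  # drop the labels once you have more than a few hundred items
ax.legend()
plt.show()
```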
Graphing a 100+ forecast data set ends up looking something like this:
With a visualization it’s easy to see:
- Spikes in errors - cases where the error is much larger (either in absolute terms or as a % of nearby forecasts) than the actual. Usually there’s some simple fix for these, either an incorrect assumption in the model or a missing data source.
- Size of error in each category - you can get the raw numbers from Excel, but it’s easier to get a sense for it visually. In this case the underbias category has many more data points.
- Average percentage error in each category - in the data set above there were more underbiased forecasts, but the average error per overbiased forecast was actually much larger.
Step 4. Look at the ‘No Forecast’ or ‘No Actual’ errors first.
People overlook these categories, but they’re one of the easier wins: a large % of the error in them usually comes from a data set that’s missing from your forecast inputs. If a movie theatre has an “only show movies for 90 days” policy or a “play Rocky Horror the week of Halloween” rule and you’re not taking it into account in your forecast, it doesn’t matter how good your statistics are.
Usually the ‘No Forecast’ and ‘No Actual’ categories each have 2 or 3 data sources that need to be pulled into the forecast to improve their error rate. Underbias and Overbias are also usually missing data sources, typically seasonality and any ads/promotions, but it’s not as large a percentage and mainly helped me with spiky errors.
Step 5. Does under or over forecasting matter more?
WAPE has a known issue in that it penalizes over forecasting more than under forecasting. If you forecast 100 and the actual is 150, your WAPE is 50/150, only 33%. If you forecast 150 and the actual is 100, your WAPE is 50/100, which is 50%. If the business thinks of these as basically the same error, WAPE can push in a direction contrary to what they want. This can be handled with symmetric MAPE (sMAPE).
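A quick sketch of the asymmetry and the symmetric MAPE fix; the per-item error below is what WAPE reduces to for a single forecast:

```python
def pct_error(actual, forecast):
    # What WAPE reduces to for a single forecast: error over actual.
    return abs(actual - forecast) / actual

def smape(actual, forecast):
    # Symmetric MAPE: divide by the average of actual and forecast instead.
    return abs(actual - forecast) / ((actual + forecast) / 2)

print(pct_error(150, 100), pct_error(100, 150))  # 0.33 vs 0.50 - asymmetric
print(smape(150, 100), smape(100, 150))          # 0.40 vs 0.40 - symmetric
```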
Beyond this issue, whether under or over forecasting is worse is very business dependent. In the movie theatre example above, suppose you are using the forecast to decide how much popcorn and candy to buy. Candy and popcorn have a long shelf life, so if you over forecast you can use the extra supplies in the following weeks and simply buy less next time. But if you under forecast you might run out. Since concession sales are where theatres make most of their money, under forecasting (in those categories) is much worse.
In these types of cases you want to understand which is more important and whether it’s worth making your measurement more complicated by weighing underbias and overbias asymmetrically. Similarly, if specific categories of forecast are more important than others, such as the popcorn forecast being more important than the soda forecast, Weighted MAPE might make more sense.
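One hedged way to weigh the two directions differently is below; the 2x penalty on underbias is purely an illustrative choice, not a standard:

```python
def asymmetric_wape(actuals, forecasts, under_weight=2.0, over_weight=1.0):
    """WAPE variant that penalizes under-forecasting more heavily.
    The weights are a business decision, not a statistical one."""
    total = 0.0
    for a, f in zip(actuals, forecasts):
        if f < a:   # under forecast: ran out of popcorn
            total += under_weight * (a - f)
        else:       # over forecast: extra stock for next week
            total += over_weight * (f - a)
    return total / sum(actuals)

print(asymmetric_wape([150, 100], [100, 150]))  # (2*50 + 1*50) / 250 = 0.6
```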
Another source of false errors is rounding in a point forecast. In the movie theatre example, suppose you forecast “we will sell 118.5 Twizzlers” but Twizzlers have to be bought in packs of 50. If the manager only purchases 2 packs of 50, then your forecast is “wrong” because only 100 Twizzlers could be purchased. There isn’t a standard fix for this problem, but it’s one to be aware of.
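One option, sketched below, is to snap the point forecast to the quantities that can actually be purchased before measuring the error; the pack size of 50 comes from the example above, and the rounding rule is an assumption:

```python
def round_to_pack(forecast, pack_size=50):
    # Snap the point forecast to the nearest purchasable quantity,
    # mirroring whatever rounding rule the buyer actually uses.
    return round(forecast / pack_size) * pack_size

print(round_to_pack(118.5))  # 100, i.e. the two packs the manager actually buys
```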
Step 6. Learn some data science
If the error is spread across multiple categories, you have likely taken all the low hanging fruit from a software engineering perspective. To get better results you might have to do some ‘data science’ such as linear regression. Data science is a relatively new field, and until ~2020 most of the data scientists I worked with were CS majors. There are plenty of articles on the web, with Baeldung and Towards Data Science being the most obvious places to start. I also found this book helpful for thinking about forecasting.
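As a small taste of what that looks like, here is a minimal linear trend forecast with numpy; the weekly sales numbers are made up, and a real forecast would fold in the seasonality, promotions and other data sources discussed above:

```python
import numpy as np

# Eight weeks of ticket sales for one movie (made-up numbers).
weeks = np.arange(8)
sales = np.array([210, 205, 190, 185, 170, 160, 150, 140])

# Fit a straight line (degree-1 polynomial) and extrapolate one week ahead.
slope, intercept = np.polyfit(weeks, sales, 1)
next_week = slope * 8 + intercept
print(round(next_week))  # ~130: sales are trending down about 10 tickets/week
```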