Customizing Time Series Forecasting Baseline Models: Improving Accuracy with Trend and Seasonality
In Part 1, we introduced the basics of time series decomposition, focusing on the Seasonal Naive model to forecast daily minimum temperatures. While the model captured broad seasonal patterns, it failed to account for the underlying trend, leading to a Mean Absolute Percentage Error (MAPE) of 28.23%. To improve accuracy, we need to extend our approach beyond simple baseline models by incorporating both trend and seasonality into our forecast.

Understanding Decomposition Methods

We began by manually decomposing a 14-day sample of temperature data to understand the decomposition process. This involved:

- Calculating the Trend: Using a 3-day centered moving average to smooth out daily fluctuations and capture the overall direction of the data.
- Isolating Seasonality: Subtracting the trend from the observed values to get the detrended series, then averaging the detrended values for each day of the week to identify recurring weekly patterns.
- Extracting Residuals: Subtracting both the trend and the seasonal component from each observed value (residual = observed - trend - seasonality) to capture the remaining random noise.

This manual process helped us understand how decomposition works, but for practical applications we used the seasonal_decompose function from the statsmodels library, which automatically calculates the trend, seasonality, and residuals for larger datasets.

Building a Better Baseline Model

To create a more accurate baseline model, we moved from the 14-day sample to the full dataset and used seasonal_decompose to split the training portion into trend and seasonality components. We then constructed a baseline forecast by adding the trend and seasonality together, assuming the residuals were negligible. Here are the key steps, starting with data preparation:

- Data Preparation: Loaded the dataset, converted the 'Date' column to datetime format, and set it as the index. Missing values were filled using forward filling.
- Train-Test Split: Split the data into a training set (the first 9 years) and a test set (the final year).
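The manual decomposition described at the start of this section can be reproduced with pandas. The sketch below is illustrative only: the 14 temperature values are made up, the start date is arbitrary (2024-01-01 happens to be a Monday), and `manual_decompose` is a hypothetical helper name, not part of any library. It follows the same three steps: a 3-day centered moving average for the trend, day-of-week averages of the detrended values for seasonality, and whatever is left over as the residual.

```python
import pandas as pd


def manual_decompose(observed: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: additive decomposition of a short daily series
    with a weekly season. `observed` must have a daily DatetimeIndex."""
    # Trend: 3-day centered moving average. The first and last values are NaN
    # because a centered window of 3 needs one neighbor on each side.
    trend = observed.rolling(window=3, center=True).mean()

    # Detrended series: observed minus trend.
    detrended = observed - trend

    # Seasonality: mean detrended value for each weekday (0 = Monday),
    # broadcast back to every row with that weekday. mean() skips NaNs.
    seasonal = detrended.groupby(observed.index.dayofweek).transform("mean")

    # Residual: what remains after removing both trend and seasonality.
    residual = observed - trend - seasonal

    return pd.DataFrame(
        {"observed": observed, "trend": trend, "seasonal": seasonal, "residual": residual}
    )


# Made-up 14-day sample of daily minimum temperatures (placeholder values).
dates = pd.date_range("2024-01-01", periods=14, freq="D")
temps = pd.Series(
    [15.0, 16.0, 14.0, 17.0, 18.0, 16.0, 15.0, 16.0, 17.0, 15.0, 18.0, 19.0, 17.0, 16.0],
    index=dates,
)

components = manual_decompose(temps)
print(components.round(2))
```

By default, statsmodels' seasonal_decompose differs from this sketch in two ways: its trend uses a centered moving average whose length matches the seasonal period (7 days for weekly data) rather than 3 days, and it centers the seasonal averages so they sum to zero over one period. Its components will therefore not match the manual version exactly.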
The model itself was then built and evaluated in three further steps:

- Decomposition: Applied seasonal_decompose to the training data to extract the trend and seasonality components.
- Forecast Construction:
  - Seasonality: Repeated the last 365 days of the extracted seasonal component across the test set.
  - Trend: Extended the last valid trend value across the entire test period. The centered moving average behind the trend leaves missing values at both ends of the series, so the most recent days have no trend estimate of their own.
- Model Evaluation: Computed the MAPE to assess the model's performance.

The MAPE for this baseline model was 21.21%, a substantial improvement over the Seasonal Naive model's 28.23%.

Creating Custom Baselines

Day-of-Year Average Method

Recognizing that the temperature dataset exhibits strong seasonal patterns, we developed a custom baseline using the average temperature for each day of the year. This method:

- Calculated Daily Averages: Grouped the training data by day of the year (1 to 365, or 366 in leap years) and computed the mean temperature.
- Forecasted Test Data: Mapped each test day to its corresponding average from the training data.

This approach achieved a MAPE of 21.17%, slightly better than the decomposition-based baseline.

Calendar-Day Average Method

The day-of-year index has a weakness in leap years: February 29 shifts every later date by one position, so March 1 is day 60 in a common year but day 61 in a leap year, and the averages end up mixing neighboring calendar dates. To address this, we further customized the baseline using the average temperature for each calendar day (month and day). Steps included:

- Extract Month and Day: Added 'month' and 'day' columns to both the training and test sets.
- Calculated Calendar-Day Averages: Grouped the training data by (month, day) pairs and computed the mean temperature.
- Forecasted Test Data: Mapped each test row to the corresponding calendar-day average.

This method achieved a MAPE of 21.09%, slightly better than the day-of-year average.

Blended Custom Baseline

To combine the strengths of the calendar-day average and the previous day’s temperature, we created a blended custom baseline. Each forecast was a weighted sum of the calendar-day average (70%) and the previous day’s temperature (30%). To build it, we first created a previous-day temperature column by shifting the temperature series by one day across the dataset, so that the first test day’s previous-day value is the last temperature from the training period.
With this column in place, we merged the previous day’s temperature into the test data, calculated the weighted average for each test day, and evaluated the result: the blended baseline achieved a MAPE of 18.73%, the highest accuracy among the tested baselines.

Summary of Models and Their Performance

| Model                        | MAPE (%) |
|------------------------------|----------|
| Seasonal Naive               | 28.23    |
| Decomposition-Based Baseline | 21.21    |
| Day-of-Year Average          | 21.17    |
| Calendar-Day Average         | 21.09    |
| Blended Custom Baseline      | 18.73    |

Next Steps

Before diving into more complex time series models like ARIMA and SARIMA, it’s crucial to solidify our understanding of decomposition techniques and custom baseline creation. In Part 3, we will explore STL decomposition (Seasonal-Trend decomposition using LOESS), which provides a more flexible way to decompose time series data, and continue enhancing our forecasting capabilities.

Industry Insights and Company Profiles

Industry practitioners emphasize the importance of building robust baseline models before deploying advanced algorithms: a strong baseline makes it possible to check that the gains from a complex model are substantial and meaningful. Companies like Facebook and Google, which deal with vast amounts of time series data, often use custom baselines to establish benchmarks and validate the effectiveness of more sophisticated models. Understanding and applying these baseline methods not only improves forecast accuracy but also builds the foundational knowledge needed to tackle real-world, multivariate time series problems.
