Unemployment is an issue currently faced by the vast majority of economies. It is a hot topic in studies carried out by economists and forecasters. Analyses often offer explanations, consequences and possible solutions to the problem, using different models that simplify real-world complexity. Many jobless people face constraints that generate problems of a macroeconomic nature, such as a decrease in consumption and investment, which eventually affect GDP. Moreover, unemployment is also related to welfare problems such as inequality and social exclusion. For these reasons at least, it is of the utmost importance to correctly predict and evaluate unemployment in order to monitor its evolution, anticipate trend shifts, and design pro-employment policies. Spain is a country with a high unemployment level compared with its peers, which peaked at 5 million registered unemployed workers during the 2013 recession. For the purpose of this study, we use the official figures provided by the Spanish Public Employment Service (SEPE). Typically, unemployment data are released with a certain delay, which means that leading, or coincident, indicators are useful for anticipating its evolution and improving its forecasts (see, e.g., Stock and Watson 1993, for details on leading indicators). With this in mind, the aim of this work is to propose some simple alternatives to univariate models for predicting Spanish unemployment. We search for models which include additional, free-of-charge, publicly available and up-to-date information.
We look for this information on Internet search engines. These applications contain a large amount of information, available almost instantaneously, and reveal many aspects of individuals' preferences through their search histories. In this paper, without loss of generality, we focus on searches in Google. More specifically, we use one of its tools, known as Google Trends (GT).
Our hypothesis is that, using updated search indices obtained from GT, there is a large margin to improve the predictions of Spanish unemployment provided by a suitable univariate model. However, any forecaster will soon discover that GT is not a panacea. As we will discuss in the next sections, some nontrivial decisions must be made when trying to make the most of the information gathered from GT. This issue is treated in the paper in an application to Spanish unemployment forecasting, although the procedures suggested could be applied in other contexts.
By means of a recursive forecasting exercise, we find that a SARIMA model with additional GT queries, applied to the Spanish unemployment series and relative to a univariate benchmark model, yields a statistically significant improvement in forecasting accuracy that ranges from 10 to 25%. This gain depends on the way the GT information is treated, with Principal Components Analysis (PCA) or Forward Stepwise Selection (FSS), and is robust to the variables that affect the results of the forecasting exercise.
Section 2 reviews the literature on the use of GT as explanatory variables, focusing on unemployment applications. Section 3 details the data employed in the analysis, paying particular attention to the GT queries and how they are generated and obtained. Section 4 presents the benchmark model, the proposed alternatives and their relation to other common methods in the literature. The latter are based on data reduction methods, which are introduced in Sect. 5. Section 6 compares the forecasting results of the proposed models relative to the benchmark, and Sect. 7 analyzes the robustness of the previous results. The last section highlights the main findings of the paper.
This line of research began in 2004 and has been gaining popularity since then, boosted by the increasing use of the Internet worldwide.