Numerical weather prediction models continue to move toward higher resolution, which, in turn, provides both a finer level of detail and a more realistic structure in the resulting forecast. However, it is widely acknowledged that using traditional verification metrics for evaluation may unfairly penalize these high-resolution forecasts. Traditional verification requires near-perfect spatial and temporal placement for a forecast to be considered good; this approach favors the smoother forecast fields of coarser-resolution models and offers no meaningful insight into why a forecast is considered good or bad. In contrast, more advanced spatial verification techniques can do both: object-based methods can quantify differences between forecast and observed objects in terms of displacement, orientation, intensity, and coverage area, while neighborhood methods can identify the spatial scale at which a forecast becomes skillful.
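The neighborhood idea can be made concrete with the Fractions Skill Score (FSS), which compares the fraction of grid points exceeding a precipitation threshold within an n x n neighborhood of each point in the forecast and observed fields. The sketch below is illustrative only (the function names and the cumulative-sum box filter are our own choices, not taken from the study); it shows how FSS rewards a displaced but otherwise correct forecast as the neighborhood size grows.

```python
import numpy as np

def box_fractions(binary_field, n):
    """Fraction of points exceeding the threshold in each n x n
    neighborhood, computed with a cumulative-sum box filter
    (valid windows only, so the output shrinks by n-1 per axis)."""
    c = np.pad(binary_field.astype(float), ((1, 0), (1, 0)))
    c = c.cumsum(axis=0).cumsum(axis=1)
    window_sums = c[n:, n:] - c[:-n, n:] - c[n:, :-n] + c[:-n, :-n]
    return window_sums / (n * n)

def fss(forecast, observed, threshold, n):
    """Fractions Skill Score at neighborhood size n:
    FSS = 1 - MSE(fractions) / MSE_reference, ranging from 0 (no skill)
    to 1 (perfect); it approaches 1 as n grows past the displacement
    scale of the forecast errors."""
    pf = box_fractions(np.asarray(forecast) >= threshold, n)
    po = box_fractions(np.asarray(observed) >= threshold, n)
    mse = np.mean((pf - po) ** 2)
    mse_ref = np.mean(pf ** 2 + po ** 2)  # largest possible MSE (no overlap)
    return 1.0 - mse / mse_ref if mse_ref > 0 else np.nan
```

For a forecast that places a rain feature two grid points away from its observed location, FSS is 0 at the grid scale (n = 1) but rises once the neighborhood is wide enough to contain both features, which is exactly the "scale at which the forecast becomes skillful".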

The Developmental Testbed Center (DTC) performed an extensive evaluation of the Global Forecast System (GFS) and the North American Mesoscale (NAM) operational models to quantify the differences in the performance of Quantitative Precipitation Forecasts (QPF) produced by two modeling systems that vary significantly in horizontal resolution. Traditional verification metrics computed for this test included frequency bias and the Gilbert Skill Score (GSS). Two advanced spatial techniques were also examined - the Method for Object-Based Diagnostic Evaluation (MODE) and the Fractions Skill Score (FSS) - in an attempt to better associate precipitation forecast differences with the different model horizontal scales.
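For reference, both traditional metrics come from the 2x2 contingency table of threshold exceedances. A minimal sketch of the standard formulas follows, assuming gridded forecast and observed precipitation fields and a fixed threshold; the function name is our own, and this is not code from the DTC evaluation.

```python
import numpy as np

def contingency_scores(forecast, observed, threshold):
    """Frequency bias and Gilbert Skill Score (GSS, also known as the
    Equitable Threat Score) from a 2x2 contingency table of events
    where the field meets or exceeds `threshold`."""
    f = np.asarray(forecast) >= threshold
    o = np.asarray(observed) >= threshold
    hits = np.sum(f & o)
    false_alarms = np.sum(f & ~o)
    misses = np.sum(~f & o)
    total = f.size
    # Frequency bias: ratio of forecast event count to observed event count.
    # Bias > 1 means the event is over-forecast, < 1 under-forecast.
    bias = (hits + false_alarms) / (hits + misses)
    # GSS discounts hits expected by chance given the marginal frequencies.
    hits_random = (hits + misses) * (hits + false_alarms) / total
    gss = (hits - hits_random) / (hits + misses + false_alarms - hits_random)
    return float(bias), float(gss)
```

A perfect forecast gives bias = 1 and GSS = 1; because GSS demands exact gridpoint placement, a high-resolution forecast with realistic but slightly displaced precipitation features scores poorly, which is precisely the penalty the spatial methods above are meant to diagnose.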