This paper introduces, and reports results from, a rigorous testing methodology that allows pure building model calibration systems to be compared fairly on traditional output errors (e.g., how well does the simulation output match utility bills?) as well as on input-side errors (e.g., how well, variable by variable, did the calibration capture the true building's description?). The methodology is then used to generate data for a correlation study of output and input error measures that validates the CV(RMSE) and NMBE metrics put forth by ASHRAE Guideline 14 and suggests possible alternatives.
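
For readers unfamiliar with the two output-side metrics, the following is a minimal sketch of how CV(RMSE) and NMBE are commonly computed from measured and simulated energy use. It assumes the frequently cited form with an (n - p) denominator, where p is the number of adjustable model parameters (p = 1 is a common choice), and takes the residual as measured minus simulated; sign and denominator conventions vary between references, so these should be checked against the guideline text. The function and argument names are illustrative and do not come from the paper.

```python
import math


def cv_rmse(measured, simulated, p=1):
    """Coefficient of variation of the RMSE, in percent.

    Assumed form: 100 * sqrt(sum((m - s)^2) / (n - p)) / mean(m).
    """
    n = len(measured)
    if n != len(simulated) or n <= p:
        raise ValueError("series must have equal length greater than p")
    mean_measured = sum(measured) / n
    squared_error = sum((m - s) ** 2 for m, s in zip(measured, simulated))
    return 100.0 * math.sqrt(squared_error / (n - p)) / mean_measured


def nmbe(measured, simulated, p=1):
    """Normalized mean bias error, in percent.

    Assumed form: 100 * sum(m - s) / ((n - p) * mean(m)).
    A positive value means the simulation under-predicts on average.
    """
    n = len(measured)
    if n != len(simulated) or n <= p:
        raise ValueError("series must have equal length greater than p")
    mean_measured = sum(measured) / n
    bias = sum(m - s for m, s in zip(measured, simulated))
    return 100.0 * bias / ((n - p) * mean_measured)


# Hypothetical utility-bill data (arbitrary energy units), for illustration only.
bills = [100.0, 120.0, 80.0, 100.0]
model = [90.0, 130.0, 80.0, 100.0]
print(f"CV(RMSE) = {cv_rmse(bills, model):.2f}%")
print(f"NMBE     = {nmbe(bills, model):.2f}%")
```

In this illustrative data set the over- and under-predictions cancel, so the NMBE is zero while the CV(RMSE) is not. This is why the two metrics are usually reported together: a model can be unbiased on average and still fit individual billing periods poorly.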