We compare evaluation functions that are all designed to measure the quality of a timetable from the passengers’ perspective. Even in small examples, evaluation functions that appear similar can prefer fundamentally different timetables. To investigate this effect in practice, we design a set of evaluation functions that represent a wide range of functions commonly used in optimization models, evaluation applications, or choice models. We compare these functions by analyzing the values they assign to multiple timetables in three case studies. To investigate to what extent the evaluation functions agree on what constitutes a good or a bad timetable, we apply cluster analysis as well as a novel methodology that quantifies the similarity of pairs of evaluation functions based on the values they yield on different timetables. We empirically show that the choice of evaluation function can have a significant impact on the assessed quality of timetables, and thus also on which timetable is considered optimal, even though all evaluation functions are meant to evaluate the same thing: the quality of a timetable from the passengers’ perspective. Due to the structure of the designed evaluation functions, it is further possible to identify which components of the functions influence the results of an evaluation and under which conditions this influence is most pronounced. These insights can be beneficial when designing timetable evaluation functions for passengers.
|Publication status||Published - 2019|