TY - GEN
T1 - Predicting septic shock outcomes in a database with missing data using fuzzy modeling: Influence of pre-processing techniques on real-world data-based classification
AU - Pereira, RDMA
AU - Almeida e Santos Nogueira, Rui
AU - Kaymak, U
AU - da Silva Vieira, SM
AU - Sousa, JMC
AU - Reti, SR
AU - Howell, MD
AU - Finkelstein, SN
PY - 2011/6/27
Y1 - 2011/6/27
N2 - Real-world databases often contain missing data, and existing correction algorithms deliver varying performance. Also, most modeling techniques are not suited to handling missing data automatically. In this study we examine different approaches to predicting septic shock in the presence of missing data. Some pre-processing techniques for managing missing data involve disregarding data or replacing it with information that by design introduces bias. We show that predictive performance improves by employing a minimal pre-processing technique, the Zero-Order-Hold (ZOH) method, by applying a Fuzzy C-Means clustering technique based on the partial distance calculation strategy (FCM-PDS), and by computing the final classification regarding the samples from each patient. Performance improvements continue to occur when up to approximately 60% of the data is missing, though for higher percentages the classification performance is still statistically improved. We further validate this approach by making comparisons with previous studies.
AB - Real-world databases often contain missing data, and existing correction algorithms deliver varying performance. Also, most modeling techniques are not suited to handling missing data automatically. In this study we examine different approaches to predicting septic shock in the presence of missing data. Some pre-processing techniques for managing missing data involve disregarding data or replacing it with information that by design introduces bias. We show that predictive performance improves by employing a minimal pre-processing technique, the Zero-Order-Hold (ZOH) method, by applying a Fuzzy C-Means clustering technique based on the partial distance calculation strategy (FCM-PDS), and by computing the final classification regarding the samples from each patient. Performance improvements continue to occur when up to approximately 60% of the data is missing, though for higher percentages the classification performance is still statistically improved. We further validate this approach by making comparisons with previous studies.
U2 - 10.1109/FUZZY.2011.6007606
DO - 10.1109/FUZZY.2011.6007606
M3 - Conference proceeding
SP - 2507
EP - 2512
BT - Proceedings of the 2011 IEEE International Conference on Fuzzy Systems
ER -