Status
Ready
The app asks for data on life circumstances and physiological characteristics and returns an assessment of the risk of stroke. Its development is documented in detail on Kaggle. The app is part of a project that shows what needs to be considered when developing apps that contain AI.

Use Cases Limitations Evidence Owner's Insight

The analysis deals with the risk, i.e. the probability, of a stroke. It shows that such predictions are possible and can be a suitable tool for physicians, e.g. for making more reliable statements about possible risks when advising patients. Several techniques from the field of statistical learning are applied. The underlying data set, as it exists, is not directly suitable for such predictions because of a strong class imbalance (minority class problem). This problem is addressed synthetically. Various supervised machine learning algorithms are then trained. Using a model-agnostic method (SHAP), they are checked for whether the impact of each feature on the model output is plausible. Finally, an ensemble of the strongest models is created, which performs better than any of the individual models. This ensemble is deployed as a freely accessible app that can be used on a PC/Mac or mobile phone.
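The two steps named above, handling the minority class synthetically and combining several models into an ensemble, can be illustrated with a minimal sketch. The text does not say which oversampling technique or which ensemble rule was used, so the SMOTE-style interpolation and the weighted average of predicted probabilities below are assumptions chosen for illustration. The function names, the two features, and the toy values are hypothetical.

```python
import random


def oversample_minority(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (SMOTE-style)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        t = rng.random()  # position on the segment from base to neighbour
        synthetic.append(tuple(a + t * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


def soft_vote(probabilities, weights=None):
    """Combine per-model stroke probabilities into one weighted average."""
    weights = weights or [1.0] * len(probabilities)
    return sum(w * p for w, p in zip(weights, probabilities)) / sum(weights)


# Hypothetical minority-class rows: (age, average glucose level).
stroke_cases = [(67.0, 228.7), (80.0, 105.9), (74.0, 70.1), (61.0, 202.2)]
new_cases = oversample_minority(stroke_cases, n_new=6)

# Hypothetical probabilities from three already-trained models for one person.
ensemble_probability = soft_vote([0.2, 0.4, 0.6])
```

Synthetic rows should only be added to the training split. If they leak into the validation data, the measured performance will look better than it is.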

Despite the good results, there are several points of criticism. It is not entirely clear how the data was collected, so there could be systematic biases. For example, the association between BMI and stroke may weaken so strongly above a BMI of 30 because patients with a very high BMI die earlier of other causes. There might therefore be too few such cases in the dataset to make a statistically sound statement. Conversely, the prognosis for recovery is often significantly better for overweight patients than for normal-weight patients. This effect is called the "obesity paradox" [cf. Forlivesi, Chappellari, Bonetti, 2020 p. 417]. The dataset used for training should therefore urgently be enriched with further observations.

This leads to the next problem: the app must be checked and updated whenever new data becomes available. Simply retraining is not enough. Every algorithm must be validated again in the same way, against scientific findings, as was done in the analysis.

To refine the model considerably, further variables should be added, such as physical activity, alcohol consumption, additional blood values, medication, or dietary habits. Basing the prognosis on a person's life circumstances has the advantage that the data is easily available. The main difficulty is the average glucose level, which is more likely to be known for diabetics. However, adding medication data would increase accuracy. For example, it was not possible to test for the use of statins, which have a protective effect against strokes.

Furthermore, it might be worthwhile to increase the number of folds in the cross-validation, so that the models are trained and validated on a more systematic set of splits and their results become more comparable with one another.
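A small sketch can show what changing the number of folds does on an imbalanced dataset. It assumes stratified splitting, which the text does not confirm. The function name and the toy labels (10 positive cases out of 100) are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds, seed=0):
    """Return n_folds lists of row indices with roughly equal class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the indices of each class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    return folds


# Hypothetical labels: 1 = stroke (minority class), 0 = no stroke.
labels = [1] * 10 + [0] * 90

for n_folds in (5, 10):
    folds = stratified_folds(labels, n_folds)
    positives_per_fold = [sum(labels[i] for i in fold) for fold in folds]
    print(n_folds, [len(fold) for fold in folds], positives_per_fold)
```

More folds give each model more training data per split and more validation scores to compare, but each validation fold then contains fewer minority-class cases, so the individual fold scores become noisier.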

https://www.kaggle.com/code/frankmollard/machine-learning-process-idea-2-app

The performance of the models is generally satisfactory. The analysis shows that the data contain strong non-linearities, which are difficult or impossible for humans to detect unaided. Several combinations of life circumstances lead to very different results. The resulting model can add value by making people aware that they are in a risk situation. As the results show, this is not immediately obvious for every combination. The final model can help people orient themselves in a healthier direction.

Prototype

Warning: Not intended for clinical use. Assume outputs are unsafe and unvalidated. Use carefully.


  • Favorites: 1
  • Executions: 21

  • Clinical Informatics

Owner

Mr Minister