Code Availability: Received: 5 February 2020 Accepted: 17 April 2020 Published: XX XX XXXX
machine learning algorithms as provided, for example, by scikit-learn (https://scikit-learn.org) or popular deep
learning frameworks such as TensorFlow (https://www.tensorflow.org) or PyTorch (https://pytorch.org).
Code availability
The code for dataset preparation will not be released, as it has no potential for reuse.
We provide the stratified sampling routine in Supplementary File 1 so that users can create stratified folds
according to their own preferences.
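Supplementary File 1 is not reproduced here. To illustrate what multi-label stratification involves, the following Python sketch implements the iterative stratification procedure of Sechidis et al. (ref. 18) on a binary label matrix. It is a minimal sketch under stated assumptions, not the routine distributed with the dataset: the function name `iterative_stratification`, its parameters, the use of NumPy, equal fold sizes and random tie-breaking are our own illustrative choices, and no grouping constraints between records are enforced.

```python
import numpy as np


def iterative_stratification(labels, n_folds=10, seed=0):
    """Hypothetical helper: assign samples to folds by iterative stratification.

    Follows the scheme of Sechidis et al. (2011); this is an illustrative
    sketch, not the PTB-XL routine from Supplementary File 1.

    labels: (n_samples, n_labels) binary array, True where a sample has a label.
    Returns an int array with one fold index in [0, n_folds) per sample.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels, dtype=bool)
    n_samples, _ = labels.shape
    ratios = np.full(n_folds, 1.0 / n_folds)  # assumption: equally sized folds

    # Remaining demand per fold, overall and per label (n_labels x n_folds).
    desired = n_samples * ratios
    desired_per_label = np.outer(labels.sum(axis=0), ratios)

    folds = np.full(n_samples, -1, dtype=int)
    remaining = np.ones(n_samples, dtype=bool)

    while True:
        counts = labels[remaining].sum(axis=0)
        if not counts.any():
            break
        # Process the label with the fewest remaining (but > 0) samples first.
        label = np.where(counts > 0, counts, np.inf).argmin()
        for idx in np.flatnonzero(remaining & labels[:, label]):
            need = desired_per_label[label]
            cands = np.flatnonzero(need == need.max())
            # Break ties by overall remaining demand, then at random.
            cands = cands[desired[cands] == desired[cands].max()]
            fold = rng.choice(cands)
            folds[idx] = fold
            remaining[idx] = False
            desired[fold] -= 1
            # Reduce the demand of every label this sample carries.
            desired_per_label[labels[idx], fold] -= 1

    # Samples without any label go to the folds with the largest remaining demand.
    for idx in np.flatnonzero(remaining):
        cands = np.flatnonzero(desired == desired.max())
        fold = rng.choice(cands)
        folds[idx] = fold
        desired[fold] -= 1

    return folds
```

For example, `iterative_stratification(Y, n_folds=10)` returns one fold index per record, and the folds can then be combined into training, validation and test sets. Handling the rarest label first is the central design choice: it keeps rare labels from being exhausted before every fold has received its share.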
References
1. Dagenais, G. R. et al. Variations in common diseases, hospital admissions, and deaths in middle-aged adults in 21 countries from
five continents (PURE): a prospective cohort study. The Lancet (2019).
2. Hannun, A. Y. et al. Cardiologist-level arrhythmia detection and classification in ambulatory electrocardiograms using a deep neural
network. Nature Medicine 25, 65–69 (2019).
3. Attia, Z. I. et al. An artificial intelligence-enabled ECG algorithm for the identification of patients with atrial fibrillation during sinus
rhythm: a retrospective analysis of outcome prediction. The Lancet 394, 861–867 (2019).
4. Schläpfer, J. & Wellens, H. J. Computer-Interpreted Electrocardiograms. Journal of the American College of Cardiology 70, 1183–1192
(2017).
5. Wagner, P., Strodthoff, N., Bousseljot, R., Samek, W. & Schaeffter, T. PTB-XL, a large publicly available electrocardiography dataset.
PhysioNet. https://doi.org/10.13026/6sec-a640 (2020).
6. Bousseljot, R., Kreiseler, D. & Schnabel, A. Nutzung der EKG-Signaldatenbank CARDIODAT der PTB über das Internet
[Use of the PTB ECG signal database CARDIODAT via the Internet]. Biomedizinische Technik/Biomedical Engineering 40, 317–318 (1995).
7. Bousseljot, R. & Kreiseler, D. Ergebnisse der EKG-Interpretation mittels Signalmustererkennung [Results of ECG interpretation
by means of signal pattern recognition]. Herzschrittmachertherapie + Elektrophysiologie 11, 197–206 (2000).
8. Bousseljot, R. & Kreiseler, D. Waveform recognition with 10,000 ECGs. Computers in Cardiology 27, 331–334 (2000).
9. Bousseljot, R. & Kreiseler, D. ECG signal pattern comparison via Internet. Computers in Cardiology 28, 577–580 (2001).
10. Bousseljot, R. et al. Telemetric ECG diagnosis follow-up. Computers in Cardiology 30, 121–124 (2003).
11. Bousseljot, R., Kreiseler, D., Mensing, S. & Safer, A. Two probabilistic methods to characterize and link drug related ECG changes to
diagnoses from the PTB database: Results with Moxifloxacin. Computers in Cardiology 35, 217–220 (2008).
12. ISO Central Secretary. Health informatics – Standard communication protocol – Part 91064: Computer-assisted electrocardiography.
Standard ISO 11073-91064:2009, International Organization for Standardization, Geneva, CH (2009).
13. Goldberger, A. L. et al. PhysioBank, PhysioToolkit, and PhysioNet. Circulation 101, e215–e220 (2000).
14. Clifford, G. et al. AF Classification from a Short Single Lead ECG Recording: the PhysioNet Computing in Cardiology Challenge
2017. In 2017 Computing in Cardiology Conference, vol. 44, 1–4 (Computing in Cardiology, 2017).
15. Liu, F. et al. An Open Access Database for Evaluating the Algorithms of Electrocardiogram Rhythm and Morphology Abnormality
Detection. Journal of Medical Imaging and Health Informatics 8, 1368–1373 (2018).
16. Arnaud, P. et al. Common Standards for Quantitative Electrocardiography: Goals and Main Results. Methods of Information in
Medicine 29, 263–271 (1990).
17. Kohavi, R. A study of cross-validation and bootstrap for accuracy estimation and model selection. In 14th International Joint
Conference on Artificial Intelligence (IJCAI), vol. 2, 1137–1143 (1995).
18. Sechidis, K., Tsoumakas, G. & Vlahavas, I. On the Stratification of Multi-label Data. In Gunopulos, D., Hofmann, T., Malerba, D. &
Vazirgiannis, M. (eds) Machine Learning and Knowledge Discovery in Databases, 145–158 (Springer Berlin Heidelberg, 2011).
19. Mason, J. W., Hancock, E. W. & Gettes, L. S. Recommendations for the standardization and interpretation of the electrocardiogram.
Journal of the American College of Cardiology 49, 1128–1135 (2007).
20. Moody, G. B. & Mark, R. G. Development and evaluation of a 2-lead ECG analysis program. Computers in Cardiology 9, 39–44 (1982).
21. Zhang, J., Wang, L., Liu, X., Zhu, H. & Dong, J. Chinese Cardiovascular Disease Database (CCDD) and Its Management Tool. In
2010 IEEE International Conference on BioInformatics and BioEngineering, 66–72 (2010).
22. Couderc, J.-P. The telemetric and Holter ECG warehouse initiative (THEW): A data repository for the design, implementation and
validation of ECG-related technologies. In 2010 Annual International Conference of the IEEE Engineering in Medicine and Biology,
6252–6255 (IEEE, 2010).
23. Moody, G. B., Muldrow, W. & Mark, R. G. A noise stress test for arrhythmia detectors. Computers in Cardiology 11, 381–384 (1984).
24. Moody, G. & Mark, R. The impact of the MIT-BIH Arrhythmia Database. IEEE Engineering in Medicine and Biology Magazine 20,
45–50 (2001).
25. Greenwald, S. D. The development and analysis of a ventricular fibrillation detector. Master’s thesis, Massachusetts Institute of
Technology (1986).
26. Nolle, F., Badura, F., Catlett, J., Bowser, R. & Sketch, M. CREI-GARD, a new concept in computerized arrhythmia monitoring
systems. Computers in Cardiology 13, 515–518 (1986).
27. Taddei, A. et al. The European ST-T database: standard for evaluating systems for the analysis of ST-T changes in ambulatory
electrocardiography. European Heart Journal 13, 1164–1172 (1992).
Acknowledgements
The authors thank Dr. Lothar Schmitz for numerous annotations and for providing medical expertise, and Dr.
Hans Koch for initiating and overseeing the creation of the original database. This work was supported by
the Bundesministerium für Bildung und Forschung (BMBF) through the Berlin Big Data Center under Grant
01IS14013A and the Berlin Center for Machine Learning under Grant 01IS18037I and by the EMPIR project
18HLT07 MedalCare. The EMPIR initiative is co-funded by the European Union’s Horizon 2020 research and
innovation program and the EMPIR Participating States.
Author contributions
Creation and maintenance of the original database: R.D.B. and D.K.; ECG quality assessment: R.D.B., D.K.
and F.I.L.; Conception of the release process: P.W., N.S. and T.S.; Data harmonization: P.W. and N.S.; Providing
conversion routines: P.W.; Manuscript preparation: P.W. and N.S.; Supervision of the project: W.S. and T.S.;
Critical comments and revision of the manuscript: all authors.