

Sven Van Poucke, Anesthesiologist, Emergency Physician, ZOL Genk
In this presentation, Sven discusses the current position of RapidMiner as a tool for personalized medicine and what is needed to promote the understanding and adoption of RapidMiner in the new era of personalized medicine, genomics, etc.
00:04 [music] Okay. So in less than 15 minutes, I will guide you through my work, my data, my RapidMiner solutions and scientific output. The face of Maria, her virtual name, speaks a thousand words. There’s something not right. She’s not healthy. Even a layperson notices that. But what is actually wrong? What we see on her face is a representation of an underlying disease without any label yet. We can, at best, make some good guesses. But in order to fully diagnostically investigate her, several additional tests are required. As you all know, treating chronic diseases requires multiple visits over an extended period of time. In contrast, split‐second decisions are sometimes needed to differentiate life from death in situations with more uncertainties, without a diagnosis, without a treatment plan and outcome estimation. This is the real‐world environment the physician‐‐ so myself‐‐ faces on a daily basis.
01:21 Luckily, the human mind excels at estimating the motion and interaction of objects in the physical world, and inferring cause and effect from a limited number of examples, and extrapolating those examples to determine plans of action to cover previously unencountered circumstances. Computers, on the other hand, are not good at coming to decisions. And classical approaches to artificial intelligence do not easily capture the idea of a good enough solution, which is, in the majority of decisions, needed. But the data generated and required to make a correct diagnosis for a patient at a certain point in time and dynamically adapt to treatment in each new situation in the evolution of one or more diseases in the patient, well, the amount of data exceeds the computational power and memory of a human brain. Also mine.
02:29 For most of human history, the practice of medicine‐‐ and maybe that’s a surprise, but the practice of medicine was predominantly heuristic and anecdotal. Traditionally, quantitative patient data was relatively sparse. Decision‐making was based on clinical impression, and outcomes were difficult to relate with much certainty to the quality of the decision made. The transition, however, to evidence‐based medicine, to the integration of AI and ML, is quite an endeavor in the medical sector. One can even question what was actually meant by evidence‐based medicine, taking into consideration all the steps needed in machine learning to reduce bias, validate correctly, etc. In one of the papers I published, I looked into whether RCTs, randomized clinical trials, are still the gold standard. And I put the gold, the G, between brackets.
03:34 The proportion of patients with two or more medical conditions simultaneously is, however, rising steadily. Some of these multimorbidity clusters will occur by chance alone. Many, however‐‐ and Martin can confirm that‐‐ will be non-random because of common genetic, behavioral, or environmental pathways of disease. Identifying these clusters is a priority and will help us to more systematically approach and treat multimorbidity. Clinical trials that resulted in a common standard of good clinical practice often exclude patients with more than one clinical condition. Qualitative vertical integration exists from bench to bedside for a single condition or disease. But there’s little or no horizontal integration between diseases that often coexist. Additionally, as bartenders will tell you, “Never mix, never worry.” Many patients take more than one medication, and not everyone reacts the same way to the various combinations of drugs. So drug interactions, when you take more than two drugs, are notoriously difficult. This high‐dimensional context will require an intellectual shift in research, training, practice, and virtually every discipline.
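[Editor's illustration, not part of the talk] One simple way to check whether a pair of conditions co-occurs more often than chance alone would predict is to compare the observed number of patients with both conditions to the number expected if the two were independent. The sketch below is a minimal example under that independence assumption; the function name `comorbidity_ratios` and the toy cohort are hypothetical and are not taken from the speaker's work.

```python
from collections import Counter
from itertools import combinations


def comorbidity_ratios(patients):
    """Return the observed/expected co-occurrence ratio per condition pair.

    patients: list of sets, one set of condition labels per patient.
    The expected count assumes the two conditions occur independently.
    Pairs that never co-occur are not included in the result.
    """
    n = len(patients)
    single = Counter()  # patients per condition
    pair = Counter()    # patients per pair of conditions
    for conditions in patients:
        single.update(conditions)
        pair.update(combinations(sorted(conditions), 2))

    ratios = {}
    for (a, b), observed in pair.items():
        expected = single[a] * single[b] / n
        ratios[(a, b)] = observed / expected
    return ratios


# Hypothetical toy cohort of eight patients (illustration only).
cohort = [
    {"diabetes", "hypertension"},
    {"diabetes", "hypertension"},
    {"diabetes"},
    {"asthma"},
    {"hypertension", "asthma"},
    {"asthma"},
    set(),
    set(),
]
ratios = comorbidity_ratios(cohort)
for conditions, ratio in sorted(ratios.items()):
    print(conditions, round(ratio, 2))
```

A ratio well above 1 suggests a non-random cluster worth a closer look; a real analysis would also need a significance test and correction for the many pairs compared.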
05:08 In the next slides, I will be covering a variety of use cases, complex ones but also low‐hanging fruit, that I used RapidMiner for in the last decade. It’s essential to understand that many companies mentioning biomed or health on their website frequently provide solutions only related to logistic problems or related to hospital admissions or optimization of patient flows. As a use case or, well, as a simple example, we used Auto Model for our emergency department. But in contrast, only a few enterprises have the guts or the resources to really dive into medical problems and provide solutions resulting in better care and outcomes. There are several explanations for that, with data privacy, and thus the lack of proper data, and the complexity of medical science repeatedly reported. The latter can only be solved by reducing the gap between the data science community and the medical community. This will also be my take-home message. Physicians might be weird people, but it’s essential to get them actively on board, on your team, when entering biomed‐related domains.
06:33 Luckily, over the last decades, data has become available. We have algorithms to de-identify medical records, and even with the European GDPR regulations and HIPAA rules, secondary analysis of electronic health records can be used and can be seen, in fact, as analyzing the historical footprint of how medicine is performed. However, providing solutions able to demonstrate that a medical action is causing an effect, thus actually integrating causality in a model at any moment, is quite a challenge. One of the reasons is the fact that the data, the patient’s state, changes when your intervention has any effect. Moreover, the genotype/phenotype divide has limited the practical value of genomic science in treating disease, since people with the same genetic mutations experience different symptoms for the same disease and, in contrast, in some instances experience no symptoms at all.
07:52 The medical literature has been quick to embrace big data, but outdated privacy laws, competition based on profit motives, a culture wary of innovation and collaboration, and disparate data representations continue to hinder efforts to truly benefit from the fruits of the big data revolution. An interesting database we have been using in several papers is the MIMIC database, originally generated here in Boston in a joint effort between the Beth Israel Deaconess Hospital and MIT. It is currently available for use on two platforms, Google and Amazon. Our first access to and research on the MIMIC data set dates back to 2015, using RapidMiner. I was invited to present this at MIT, and as a physician, I felt honored to present this to an ICT community. And recently, we could connect to the BigQuery cloud version of MIMIC‐III. Now, MIMIC‐IV is coming within a few weeks, as demonstrated in the following slides. Excuse me. Without any doubt, more to follow in the near future regarding this access.
09:32 As a RapidMiner community member, I feel responsible to spread the word about what RapidMiner is capable of. For example, by using ensemble methods on the MIMIC database. The paper was quite popular. I don’t have as many points as number one here in the community, but my paper was quite successful with more than 8,000 views, and I find that very, very comforting‐‐ or that gave me a lot of positive influence. Let’s see. Besides data from electronic health records, tons of data become available. And initiatives such as cBioPortal provide interesting tools with web APIs and interactive dashboards where genomics can be integrated in your RapidMiner research. I must admit, the dashboards they provide are state‐of‐the‐art and are very fast. I really invite you to take a look at them. We also tested whether we could access these web APIs from RapidMiner, and that was no problem using the Read URL operator. Next, the data could be easily extracted, and analytical processes could follow. However, the dimensionality when adding genomics or the omics environment to the clinical environment is enormous. And the changes of tumor cells, with resistance to chemotherapy changing over time, make the Auto Model simulator not suited yet, although there’s a significant need to use simulators in order to predict therapeutic impact for the “patient like mine.” Meaning patient similarity is a topic which will only gain importance in the near future.
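[Editor's illustration, not part of the talk] Outside RapidMiner, the same "read a web API, then extract the data" step can be sketched in a few lines of Python. The sketch assumes that the portal mentioned is cBioPortal, that its public REST API lives at `https://www.cbioportal.org/api`, and that a `/studies` endpoint returns a JSON array whose items carry `studyId` and `name` fields. These are assumptions to verify against the portal's own API documentation, and the helper names are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumed base URL of the public API; check the portal's documentation.
BASE_URL = "https://www.cbioportal.org/api"


def studies_url(base_url=BASE_URL):
    """Build the URL of the (assumed) endpoint that lists studies."""
    return f"{base_url.rstrip('/')}/studies"


def parse_studies(json_text):
    """Turn a JSON array of study objects into (studyId, name) rows."""
    records = json.loads(json_text)
    return [(record.get("studyId"), record.get("name")) for record in records]


def fetch_studies(base_url=BASE_URL, timeout=30):
    """Download and parse the study list (needs network access)."""
    with urlopen(studies_url(base_url), timeout=timeout) as response:
        return parse_studies(response.read().decode("utf-8"))


# Offline demonstration with a made-up payload in the assumed shape.
sample = '[{"studyId": "example_study", "name": "Example Study"}]'
rows = parse_studies(sample)
print(rows)
```

Keeping the parsing separate from the download makes it possible to test the extraction step on a saved response before pointing it at the live service.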
11:53 With the growing need for precision medicine, the next generation of electronic health records will need to support dynamic clinical data mining. In particular, the DCDM‐‐ so the dynamic clinical data mining‐‐ would enable examination of any single medical encounter within the context of similar encounters, where similarity is defined by some metric for grouping, which is still largely unknown. To illustrate the complexity of precision medicine, a recent paper suggests that the microbiome‐‐ so the germs in your belly‐‐ governs the cancer‐immune set point for cancer‐bearing individuals, and that manipulating the gut ecosystem to circumvent primary resistance to therapy may become feasible.
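[Editor's illustration, not part of the talk] To make "similar encounters under some metric" concrete, the sketch below ranks encounters by Euclidean distance after standardizing each feature. The choice of metric, the three features, and all values are assumptions made for illustration; as the talk notes, which metric is appropriate is an open question.

```python
import math


def standardize(rows):
    """Scale each feature column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in col) / len(col)) or 1.0
        for col, m in zip(columns, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def most_similar(encounters, index, k=2):
    """Return the indices of the k encounters closest to encounters[index]."""
    scaled = standardize(encounters)
    target = scaled[index]
    distances = [
        (math.dist(target, row), i)
        for i, row in enumerate(scaled)
        if i != index
    ]
    return [i for _, i in sorted(distances)[:k]]


# Hypothetical encounters: [age in years, heart rate in bpm, lactate in mmol/L]
encounters = [
    [70, 110, 4.0],
    [72, 105, 3.8],
    [30, 70, 1.0],
    [66, 118, 4.5],
    [25, 65, 0.9],
]
neighbours = most_similar(encounters, 0, k=2)
print(neighbours)
```

Standardizing first keeps a feature with a large numeric range, such as heart rate, from dominating the distance; with real clinical data, mixed data types and missing values would need extra handling.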
12:55 Explainability and interpretability form a significant barrier to implementing AI and ML tools when physicians need to be convinced that the investment in new technology leads to a higher standard of care, as we experienced using papers such as these in meetings with my colleagues. There’s a reason to be optimistic, however. Across the globe, governments are partnering with universities and industry to build a machine learning roadmap or multiple roadmaps. Programs and events bring together clinicians, computer scientists, and engineers to create collaborative ecosystems that can leverage the power of data science. Explainability and interpretability become even more of an issue, as shown, just as an example, in a paper where they didn’t use RapidMiner, but an ensemble method with LSTMs to predict and to show superiority in a framework to handle the diversity of clinical data. This is becoming very difficult to explain to physicians without any educational background in AI or machine learning. However, in order to induce some smooth adoption of AI/ML, I recently uploaded a webinar where I explained how Auto Model could be used based on medical data. I hope in the future to continue doing this. And finally, thank you for listening, and please, contact me. Collaborate, collaborate, collaborate. Any questions? [music]