Open Source Software Simplifies and Streamlines Machine Learning

Introduction

For some time now, health care providers and hospitals have been investing huge amounts of time and energy in adopting electronic health records, turning hurriedly scribbled doctors’ notes into robust sources of information. However, collecting this data is less than half of the work. More time and effort are required to turn these records into real insights, ones that use the lessons of the past to inform future decisions.

Software engineers and researchers at MIT’s Data to AI Lab (DAI Lab) have built a software system named Cardea to help with this process. By running hospital data through an ever-growing set of machine learning (ML) models, the system could help hospitals prepare for events as small as no-show appointments and as large as global pandemics.

With Cardea, hospitals may finally be able to solve “various ML problems,” states Kalyan Veeramachaneni, principal investigator of the DAI Lab and a principal research scientist in MIT’s Laboratory for Information and Decision Systems (LIDS). Because Cardea is built on an open-source framework and uses generalized techniques, hospitals can also share solutions with each other, enabling collaboration and increasing transparency.

Automated for Humans

Cardea belongs to a field called automated ML, or AutoML. ML is currently being used for everything from drug development to credit card fraud detection. The goal of AutoML is to make these predictive tools easier for people, including non-experts, to develop, use, and understand, states Veeramachaneni.

AutoML systems such as Cardea do not require humans to design and code a complete ML model. Instead, these systems supply existing models along with explanations of what the models do and how they work. Users can then combine these modules to achieve their goals, much like going to a buffet instead of cooking a meal from scratch.

Cardea is currently designed to help with four types of resource-allocation questions. However, because the pipeline includes so many different models, the system can easily be adapted to other scenarios that may arise. As Cardea develops, the aim is for stakeholders to be able to use it to solve a wide range of prediction problems in health care.

Researchers tested the accuracy of Cardea against users of a popular data science platform and found that Cardea outperformed 90 percent of them. The team also tested the system’s efficacy by asking data analysts to use Cardea to make predictions on a demo health care dataset. The results showed that Cardea significantly improved the analysts’ efficiency.

Hospital workers are frequently asked to make high-stakes decisions. It is therefore essential that they can trust the tools they use along the way, including Cardea. It is not enough for users to key in some numbers, press a button, and be shown an answer. “Users need to get a sense of the model and they need to know what exactly is going on,” states Dongyu Liu, a postdoc in LIDS.

Conclusion

To achieve greater transparency, Cardea’s next step is a model audit. Like all predictive tools, ML models have both strengths and weaknesses. By laying these out, Cardea gives users the ability to decide whether to accept a model’s results or to start over with a new one. The team also plans to build in more explanations and data visualizers, both to provide an even deeper understanding and to make the software system more accessible to non-experts. The hope is that people will adopt it and then start contributing to it.