Welcome to the second entry in Verto’s three-part blog series on AI in healthcare with Verto’s Director of AI & Technology Transformation, Martin Persaud. Last time out, Martin kicked off the series with a blog providing a brief history of AI and demystifying some of the terms commonly encountered in the field.

For this instalment, we will be diving into the importance of the emerging field of Machine Learning Operations (“MLOps”) as the essential foundation for any trusted, scalable, and effective AI application in healthcare. As a forewarning, this blog will be a bit more technical than our previous entry, but we will do our best to make it as accessible as possible for a general audience. And just like last time, there will be a list of resources at the end for those keen to learn more.

Negative RO(A)I

It’s time someone addressed the elephant in the room when it comes to AI: it’s really freakin’ expensive. While much of the promise of AI is that it can help us automate tasks and save humans time and money, the reality is that the vast majority of AI applications today are more expensive than their human equivalents, as per a recent MIT study.

The question then becomes: how can organizations ensure that their AI teams don’t just become cost centers? How can AI be implemented to meet current business objectives despite its novelty in real-world settings? To answer these questions, it’s worth looking at the reasons why AI has failed in the past.

When an organization first introduces AI, talented individuals typically start by building models in silos without the necessary understanding, processes, and tools to feasibly deploy solutions in the real world. Below are a few key considerations that are generally missed right out of the gate:

  • Emphasis on clean, modular code
    • Code should be written so it can move through all stages of a typical development cycle (development, staging, and production).
  • Creating proper infrastructure
    • Ensure there is easy access to models and the appropriate pre-processing steps in a production environment.
  • The importance of inference time
    • Inference time is how long a model takes to process new data, make a prediction, and return the expected output. Doctors and clinicians in healthcare settings often require near real-time predictions, while users in a research setting usually have less demanding timelines (see the latency sketch after this list).
  • Controls and version management
    • Required to drive data hygiene and enable traceability across a model repository.
  • Constant monitoring
    • Tools and controls should be in place to monitor the effectiveness of a model over time and to manage (or, even better, avoid) model drift.
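
Because inference time matters so much, it helps to measure it from day one. Below is a minimal sketch of how one might benchmark a model’s prediction latency in Python; `predict_fn` is a stand-in for any model’s prediction callable, not a specific library API.

```python
import time
import statistics

def measure_inference_latency(predict_fn, sample_input, n_runs=100, warmup=10):
    """Benchmark a prediction callable and report latency percentiles."""
    for _ in range(warmup):
        predict_fn(sample_input)  # warm-up runs (caches, lazy loading, etc.)
    timings = []
    for _ in range(n_runs):
        start = time.perf_counter()
        predict_fn(sample_input)
        timings.append(time.perf_counter() - start)
    return {
        "p50_ms": statistics.median(timings) * 1000,
        "p95_ms": statistics.quantiles(timings, n=20)[18] * 1000,  # 95th percentile
        "max_ms": max(timings) * 1000,
    }
```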

MLOps to the Rescue

MLOps can create the foundation to address and solve these problems. In a nutshell, MLOps borrows from the principles of traditional DevOps, which combines development and operations to increase the efficiency, speed, and security of software development and delivery compared to conventional processes. Agile software development lifecycles create a competitive advantage for businesses and their customers, and MLOps aims to provide the same benefits for AI development: a foundation of principles that drives the iterative, efficient development of machine learning models that can be deployed in production. Nowhere is solving these issues more important than in healthcare, for the following reasons:

Data Heterogeneity

Data is king for any application of AI, yet the current state of healthcare data makes it challenging to surface the right kind of data. While there are efforts across the globe to standardize healthcare data, clinical workflows and terminology remain heavily localized, which can make it near-impossible to amass large-scale, clean data sets.

Even if the healthcare world had successfully standardized, the industry was one of the last to move toward digitization, a transition that is still in progress today. As a result, significant investment goes into using historical health data, which is commonly found in document-based formats and free text. Data collection, transformation, and management across models will therefore be integral to building AI-based solutions efficiently.

Emphasis on Real-Time Interaction

Healthcare is highly interactive and inherently service-oriented. As a result, slow inference times in healthcare AI applications would limit adoption in the real world. If we design the architecture and testing to reduce inference time right out of the gate, we can save precious development time and set the solution up for long-term success.

Interpretability & Traceability

The average personal user of an AI application like ChatGPT is only interested in the answer to their prompt; they typically aren’t too interested in the algorithms used to generate the response. Recommender systems for streaming services are another example: consumers don’t really care how the algorithm arrives at a recommendation. Simply put, the general population is primarily concerned with whether these recommendations surface the content they want to consume.

However, when using AI in a healthcare context, an integrator must understand the decision path of any automated system in order to build trust and controls around any system designed to dispense healthcare. MLOps can serve as a tool to track data transformations, configurations, and model layers (including their hyperparameters), making it easier to build interpretable AI systems.

Continual Learning

AI solutions start becoming stale the minute we deploy them. This is especially true at an initial go-live, where we inevitably come across data and patterns not reflected in the limited training data. To mitigate model drift (poorer performance over time), we should monitor closely from the point of go-live. Because healthcare data carries an especially high level of idiosyncrasy, the risk of model drift is all the higher in this domain.

Use cases in healthcare will continue to evolve, so models will be forced to adapt continuously to new, related tasks and environments. Managing the many versions of new data sets, models, pre-processing steps, and so on can quickly become overwhelming in these real-world scenarios, making MLOps a critical backstop against unruly data.

The MLOps Cycle

MLOps is typically depicted as a non-exhaustive cycle that AI solutions perpetually move through. Let’s quickly break down each stage.

Exploratory Data Analysis (EDA)

This is where we discover and explore new data to (i) determine whether a data set is appropriate for the specific training task, (ii) plan for any appropriate feature engineering, and (iii) identify the pre-processing steps required for training. Key outputs include visualizations and profiling that other team members can easily understand.
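
As a minimal illustration of an EDA pass, the snippet below profiles a hypothetical extract of encounter records with pandas (the file and column names are placeholders, not Verto’s actual data):

```python
import pandas as pd

# Load a hypothetical extract of encounter records
df = pd.read_csv("encounters.csv")

# Profile the basics: shape, types, and missingness
print(df.shape)
print(df.dtypes)
print(df.isna().mean().sort_values(ascending=False))  # fraction missing per column

# Summary statistics for numeric columns, value counts for categoricals
print(df.describe())
print(df["discharge_disposition"].value_counts(normalize=True))
```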

Data Prep

The continuous aggregation and transformation of data to prepare it for training. This includes feature engineering (e.g., new columns and derived values) and the storage of those features for easy access across team members.
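
Feature engineering at this stage often means deriving new columns from raw ones. A small sketch, again with hypothetical column names:

```python
import pandas as pd

def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Derive training features from raw encounter records (hypothetical columns)."""
    out = df.copy()
    out["admit_date"] = pd.to_datetime(out["admit_date"])
    out["discharge_date"] = pd.to_datetime(out["discharge_date"])
    out["length_of_stay_days"] = (out["discharge_date"] - out["admit_date"]).dt.days
    out["is_weekend_admit"] = out["admit_date"].dt.dayofweek >= 5
    return out
```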

(Re)Training

Train and fine-tune specific models based on the task and the data available. There are many tools available to automate and simplify the training process. For instance, HuggingFace simplifies modelling for natural language processing tasks with pre-trained tokenizers, models, and pipeline tools.
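
For example, a minimal classification setup using the transformers pipeline API might look like this (the model and example sentence are illustrative placeholders):

```python
from transformers import pipeline

# Load a pre-trained model and its tokenizer in one step
classifier = pipeline("text-classification",
                      model="distilbert-base-uncased-finetuned-sst-2-english")

result = classifier("The patient tolerated the procedure well.")
print(result)  # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```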

Review

Continuously evaluate models, manage versioning, and create data/model lineage across the pipeline to determine the appropriate architecture for deployment. There are many open-source tools, such as MLflow, to track this process and manage artifacts that can be reviewed and reused in the future.
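
As a sketch of what this tracking looks like, MLflow logs each training attempt as a run with its parameters, metrics, and artifacts (the values below are hypothetical):

```python
import mlflow

# Each training attempt is logged as a run, so results stay comparable
with mlflow.start_run(run_name="triage-classifier-v1"):
    mlflow.log_param("learning_rate", 3e-5)
    mlflow.log_param("epochs", 4)
    # ... training happens here ...
    mlflow.log_metric("val_f1", 0.87)
    mlflow.log_artifact("confusion_matrix.png")
```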

Inference

Manage the models to be deployed in production and ensure a workflow exists for continuously maintaining and enhancing production pipelines. There are many well-known tools, such as Airflow, to orchestrate each step through to inference.
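
A minimal sketch of an Airflow DAG that runs extraction and then inference on a daily schedule (task bodies elided; assumes a recent Airflow 2.x):

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_data():
    ...  # pull new records from the source system

def run_inference():
    ...  # load the production model and score the new records

with DAG(dag_id="daily_inference",
         start_date=datetime(2024, 1, 1),
         schedule="@daily",
         catchup=False) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_data)
    infer = PythonOperator(task_id="infer", python_callable=run_inference)
    extract >> infer  # run extraction before inference
```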

Monitor

Store and manage models in a model registry where they can be easily accessed (e.g., served behind FastAPI), and return evaluation and monitoring reports for continuous assessment.
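
A minimal sketch of serving a registered model behind FastAPI (the endpoint, input schema, and placeholder prediction are illustrative only):

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class PredictionRequest(BaseModel):
    note_text: str  # hypothetical input: a free-text clinical note

@app.post("/predict")
def predict(req: PredictionRequest):
    # In a real service, the model would be loaded from the registry at startup
    score = 0.5  # placeholder for model.predict(req.note_text)
    return {"prediction": score, "model_version": "v1"}
```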

Key MLOps Responsibilities

Once the data is usable, the AI team can execute the rest of the cycle. However, without the appropriate processes in place, it is very easy for things to go off the rails and for costs to balloon. Below are just a handful of the items MLOps teams must dynamically manage, track, and version across the pipeline to land on the right model for a problem.

Run Management

A ‘run’ consists of one or more blocks of code that consume inputs and produce outputs used either in a subsequent step or in the final task itself. Testing without organized runs makes it difficult to (i) understand computational resource usage and manage costs, (ii) run evaluation metrics on specific tasks, and (iii) identify the specific components that were combined to achieve certain results.
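
One way to picture the bookkeeping involved: every run should carry a record of its inputs, outputs, and compute usage. The manifest below is a hypothetical sketch, not a specific tool’s format:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
import json, uuid

@dataclass
class RunRecord:
    """A minimal, hypothetical run manifest: what went in, what came out."""
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    started_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
    inputs: dict = field(default_factory=dict)   # data versions, config, upstream run ids
    outputs: dict = field(default_factory=dict)  # artifact paths, metrics
    compute: dict = field(default_factory=dict)  # machine type, runtime, cost estimate

record = RunRecord(inputs={"dataset_version": "notes-v3"},
                   outputs={"model_path": "models/triage/v1"},
                   compute={"runtime_s": 1240})
print(json.dumps(record.__dict__, indent=2))
```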

Pipeline Components

Because even a single task may require the deployment of multiple models, pipeline components must be readily available for testing and orchestration within either a single run or multiple runs. A strong MLOps practice will ensure these components are modular, well understood, and easily accessible where necessary throughout a pipeline.
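
One common way to keep components modular is to give them a shared interface so they can be swapped and re-orchestrated. A hypothetical sketch in Python:

```python
from typing import Protocol, Any

class PipelineComponent(Protocol):
    """A hypothetical common interface so components can be swapped and reused."""
    name: str
    def run(self, inputs: dict[str, Any]) -> dict[str, Any]: ...

def run_pipeline(components: list[PipelineComponent], inputs: dict) -> dict:
    # Each component's outputs become the next component's inputs
    for component in components:
        inputs = component.run(inputs)
    return inputs
```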

Versioned Data

The best practice is to version data based on the use case and algorithm requirements. At Verto, we independently test multiple algorithms and components of the pipeline by managing various versions of the same data. Organizing our data sets by version and maintaining data lineage is paramount so we can accurately document pipelines and our choices.
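
Dedicated tools exist for data versioning, but even a simple content hash gives each data set version a stable, verifiable identifier. A minimal sketch (the file name is a placeholder):

```python
import hashlib

def dataset_fingerprint(path: str) -> str:
    """Content-hash a data file so each version gets a stable, verifiable ID."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()[:12]

# e.g. register "encounters.csv" under the version "encounters-<fingerprint>"
print(dataset_fingerprint("encounters.csv"))
```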

Hyperparameters

These are parameters whose values are set before the machine learning process begins. Fine-tuning hyperparameters is a critical step in developing any AI model. Complex models can have many hyperparameters, with an even larger number of combinations that must be validated and tested. For this reason, one of the key responsibilities of MLOps teams is ensuring hyperparameters are consistently logged and easily accessible for testing.
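
Even a small search space multiplies quickly. The sketch below enumerates a hypothetical grid of 3 × 3 × 2 = 18 configurations, each of which should be logged alongside its results:

```python
from itertools import product

# A hypothetical search space; three small lists already yield 18 runs
search_space = {
    "learning_rate": [1e-5, 3e-5, 1e-4],
    "batch_size": [16, 32, 64],
    "dropout": [0.1, 0.3],
}

for values in product(*search_space.values()):
    config = dict(zip(search_space.keys(), values))
    # train_and_evaluate(config)  # each config is logged with its metrics
    print(config)
```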

Model Versioning and Access

Each time models are trained, validated, and tested, they must be saved and tracked. Each version has its own expected inputs, outputs, and tasks. Sometimes versions can be based simply on the data on which they were trained (i.e., one model version per data set). As a practice grows, the ability to easily access and manage models in a registry for a given task will be vital to successful use cases.
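
As a sketch of registry-based access using MLflow (the model name and run id are placeholders), a trained model is registered once and then loaded by name and version rather than by file path:

```python
import mlflow

# After a run logs a model artifact, promote it into the registry by name
model_uri = "runs:/<run_id>/model"  # placeholder run id
mlflow.register_model(model_uri=model_uri, name="triage-classifier")

# Later, consumers load a specific registered version, not a file path
model = mlflow.pyfunc.load_model("models:/triage-classifier/1")
```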

Monitoring and Continual Learning

Simply deploying a model doesn’t mean the job is done. We must also build the tools and mechanisms to consistently track performance and detect data drift before it becomes pervasive.
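
As one simple illustration, a two-sample Kolmogorov-Smirnov test can flag when a numeric feature’s live distribution has shifted away from the training distribution:

```python
from scipy.stats import ks_2samp

def feature_drifted(train_values, live_values, alpha=0.05) -> bool:
    """Two-sample Kolmogorov-Smirnov test: has this feature's distribution shifted?"""
    statistic, p_value = ks_2samp(train_values, live_values)
    return p_value < alpha  # a small p-value suggests the live data no longer matches training

# e.g. run weekly over each numeric feature and alert on any True
```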

MLOps and Healthcare

So, what does all of this mean for healthcare? While we can’t speak for the entire industry, Verto is working toward solutions that drive the utility of healthcare data for a broad range of users across a variety of different systems and use cases.

Using an MLOps framework helps our technology aggregate data across heterogeneous sources and enrich that data from multiple angles, be it patient record consolidation, transformation to a provenance-rich time-series data set, a drive toward syntactic or data-model standardization, or semantic standardization.

Achieving this level of aggregation and insight has historically required brute force to transform such widely varying heterogeneous data, but that doesn’t have to be the case. Verto’s solutions are making significant strides toward this level of standardization and aggregation with a novel approach that leverages AI technology as a critical component.

A big part of that approach has been the use of Large Language Models (LLMs) to handle the fact that a large proportion of healthcare data lives in free-text and document formats. However, LLMs are complex, ever-growing, and require even more rigour within the MLOps practice. Frankly, “LLM-Ops” is specific enough to be its own topic.

On top of these complexities, even simple tasks, such as obtaining healthcare data and creating the appropriate data-mapping labels, can quickly become expensive. As a result, MLOps has been a key factor in managing complexity and cost, and it is playing an ever more significant role in our AI development efforts.

The Foundation for the Future

Without the appropriate strategy and use of MLOps, our team would quickly drown in the layers of decision-making around pipeline components, hyperparameters, model selection, data set creation… the list goes on. We were fortunate to adopt these MLOps practices early and set the foundation for a future of sustainable development.


Thanks so much for reading the second entry in our AI series with Verto’s Director of AI and Technology Transformation, Martin Persaud. Stay tuned for the series’ conclusion, where we sit down with Martin to discuss some of the pressing issues of the day regarding AI and healthcare.

Did this blog pique your interest? Check out the following links to learn more about the ever-evolving world of MLOps.

For those who are just eager to learn more: https://www.databricks.com/glossary/mlops

For those who are interested in learning more about MLOps for LLMs: https://www.databricks.com/glossary/llmops

For those who are interested in learning more about MLOps in the healthcare context: https://www.researchgate.net/publication/369540862_MLHOps_Machine_Learning_for_Healthcare_Operations