Updated: Dec 6, 2020
The convergence of Artificial Intelligence (AI) / Machine Learning (ML) and automated software development methodologies is accelerating the delivery of mission critical capabilities. By taking a factory approach to software development, Department of Defense (DoD) stakeholders are rapidly building, testing, and deploying software using automated pipelines. New sources of data, new application architectures (Microservices / Service Mesh), and new platform capabilities (Serverless) are available to deliver Machine Learning and Data Applications.
These automated deployment methods facilitate the testing, evaluation, and categorization of performance metrics to provide faster insights through continuous feedback. The new application stacks support rich data processing such as voice-to-text, image recognition, natural language processing, and Internet of Things (IoT) systems. The figure below shows the three main user personas who will interact with the AI/ML platform: Data Scientists, Application Developers, and DevSecOps Engineers. The use cases for all three personas need to be addressed in the platform to deploy and manage AI/ML application development at scale.
The MLOps lifecycle begins with Data Set Development/Curation and Test Harness Development. Machine learning application development requires dataset management, with datasets prepared using MLOps pipelines. The figure below shows an MLOps application being deployed in stages. Notebooks can be stored in a version control repository like GitLab, and changes to the repository trigger an action that compiles the Jupyter Notebook into a KubeFlow pipeline. Kubernetes supports init-containers, which let us plug in specific versions of machine learning models as containers. When serving different versions of a model, this capability allows us to change only the model, not the serving program. These containers can then be uploaded to a registry for reuse in other pipelines.
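The init-container pattern described above can be sketched as follows. This is a minimal illustration, not a full Kubernetes manifest: the registry paths and image names are hypothetical placeholders, and the spec is shown as a plain Python dict.

```python
# Sketch of the init-container model-versioning pattern: the serving
# container stays fixed across releases, while only the init-container
# image (carrying the model artifact) changes per model version.
# Registry paths and image names below are hypothetical.

def pod_spec(model_version: str) -> dict:
    """Build a pod spec where an init-container copies a versioned model
    into a shared volume that an unchanged serving container reads."""
    return {
        "initContainers": [{
            "name": "model-loader",
            # only this image changes between model versions
            "image": f"registry.example.mil/models/classifier:{model_version}",
            "volumeMounts": [{"name": "model-store", "mountPath": "/models"}],
        }],
        "containers": [{
            "name": "model-server",
            # the serving program is identical for every model version
            "image": "registry.example.mil/serving/model-server:stable",
            "volumeMounts": [{"name": "model-store", "mountPath": "/models"}],
        }],
        "volumes": [{"name": "model-store", "emptyDir": {}}],
    }

v1 = pod_spec("1.0.0")
v2 = pod_spec("2.0.0")
```

In practice a spec like this would be templated into a Deployment manifest (for example via Helm or Kustomize), so promoting a new model is a one-line image-tag change.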
Data scientists focus on model building algorithms, hyperparameter tuning, and model validation, but they are not trained in building deployment pipelines. DevSecOps teams focus on CI/CD delivery, but they are not data science experts. There is a need to bridge the gap between these personas by constructing secure MLOps pipelines with custom interfaces that expose only the right models, data sets, and containers based on Identity and Access Management (IdAM) controls that extend to the cloud. The automation of build and test on a Secure Processing platform needs to take into account Human Centric Design principles to simplify the onboarding and handoffs required to deploy applications faster.
The MLOps data lifecycle has an Ingest, Infer, and Monitor flow as shown below. These stages need specific tooling to orchestrate and manage the application lifecycle beyond the initial deployment, since real-world data may drift from the test data. Curating these diverse data sources involves analysis to narrow down the features that are critical to developing AI models pragmatically. Data ingestion and access will be tightly controlled using Identity and Access Management tied to Role Based Access Controls (RBAC). Data governance includes controls at the data store level; sensitive data should be handled with strict controls using capabilities like column- and cell-level masking and differential privacy. Decentralized data governance can be put in place using a combination of IdAM and multi-level, purpose-based controls at the resource level. Techniques like homomorphic encryption can further restrict access and provide better privacy during model development.
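As a minimal illustration of column-level masking, the sketch below one-way hashes values in designated sensitive columns. The salt, column names, and token length are illustrative; a real deployment would source its masking policy and key material from the governance/IdAM layer rather than hard-coding them.

```python
import hashlib

# Illustrative salt; in practice this would come from a managed key store.
SALT = b"example-salt"

def mask_value(value: str) -> str:
    """One-way mask: equal inputs yield equal tokens (so joins and
    aggregations still work) while the raw value is never exposed."""
    return hashlib.sha256(SALT + value.encode("utf-8")).hexdigest()[:16]

def mask_columns(rows, sensitive):
    """Replace values in the listed sensitive columns with masked tokens."""
    return [
        {col: mask_value(val) if col in sensitive else val
         for col, val in row.items()}
        for row in rows
    ]

rows = [
    {"name": "Jane Doe", "unit": "Alpha"},
    {"name": "Jane Doe", "unit": "Bravo"},
]
masked = mask_columns(rows, {"name"})
```

Because the mask is deterministic per salt, analysts can still group or join on the masked column; rotating the salt breaks linkability across data releases, which is one knob a governance team can control.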
The figure above shows the three stages of an AI/ML application. The left-hand side shows the different data sources that form the input to the models and applications. In cases where the raw data is not available, synthetic data generation tools can be used to stand in for or obscure the underlying data. These synthetic data generation requirements can be addressed using Python libraries such as Gretel and SDGym. The middle layer shows the processing steps and the capabilities required to run the higher-level services that provide consistent access to the application teams.
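To illustrate the generate-instead-of-share idea (this is not the Gretel or SDGym API, just the underlying pattern), the sketch below samples synthetic records from the per-column value frequencies of the real data. Production tools also model inter-column correlations and provide privacy guarantees; this only shows the simplest marginal-sampling case.

```python
import random
from collections import Counter

def fit_marginals(rows):
    """Record each column's observed values and their frequencies."""
    marginals = {}
    for col in rows[0]:
        counts = Counter(row[col] for row in rows)
        values = list(counts)
        weights = [counts[v] for v in values]
        marginals[col] = (values, weights)
    return marginals

def sample_synthetic(marginals, n, seed=0):
    """Draw n synthetic rows, column by column, from the learned marginals."""
    rng = random.Random(seed)
    return [
        {col: rng.choices(vals, weights=wts)[0]
         for col, (vals, wts) in marginals.items()}
        for _ in range(n)
    ]

real = [
    {"sensor": "A", "status": "ok"},
    {"sensor": "B", "status": "ok"},
    {"sensor": "A", "status": "fault"},
]
fake = sample_synthetic(fit_marginals(real), 10)
```

The synthetic rows preserve the real data's value distributions without reproducing any specific original record, which is the property that lets downstream teams develop against data they could not otherwise access.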
The application teams will utilize the DevSecOps capabilities and endpoints provided by Platform One. The Infer stage in the middle of the figure above shows the automation steps and the technologies that can be stitched together to build a solution. The ML pipeline is built using KubeFlow, which performs data ingestion, data preparation, testing, model development, security testing, and model serving. That process requires interaction with data sources, which can be located on cloud storage, and notebooks, which would be stored in Git. The model development workspace on top allows data scientists to choose their framework of choice so they can quickly test and tweak AI algorithms.
The right side of the figure shows the data outputs from the process. Data processing and monitoring at this stage is critical for identifying inference deviations from a security perspective, addressing adversarial attacks and concept drift detection. This stage also includes higher-level functions such as model inspection and interpretation, which are critical to understanding why the model predicted the outcomes that it did. This requires integration with tools like Seldon Alibi for model monitoring and for interpreting the features that most influence the outcomes. Custom dashboards can provide insights into the quality and characteristics of the raw and curated data. Visualization dashboards and other custom applications can form the testing and monitoring layer required to ensure that the lifecycle of data and applications is managed in a comprehensive manner.
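As a toy illustration of post-deployment drift monitoring, the sketch below compares live inference inputs against a training-time baseline using a simple standardized-mean shift. Production monitors such as the detectors in Seldon's Alibi family use stronger statistical tests (e.g., Kolmogorov-Smirnov or MMD), and the data and thresholds here are purely illustrative.

```python
import statistics

def drift_score(baseline, live):
    """Shift of the live-data mean, measured in baseline standard
    deviations. A large score suggests drift worth alerting on."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    return abs(statistics.mean(live) - mu) / sigma

# Illustrative feature values: a stable window and a drifted window.
baseline = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05, 1.0, 0.98]
stable   = [1.02, 0.97, 1.01, 0.99]
drifted  = [1.8, 1.9, 2.1, 2.0]
```

A monitoring layer would compute a score like this per feature on a rolling window and route threshold breaches to the dashboards described above, closing the feedback loop back to retraining.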
When multiple teams are looking to build AI/ML applications rapidly, a catalog of prebuilt pipelines and pipeline components is a critical requirement. By standardizing the types of tools and the compliance controls that need to be in place for ensuring security, Platform team members can address and onboard different types of applications at scale. Managing the lifecycle of these ML applications involves the automation of a different set of tooling and a new approach to pipeline construction.
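One way to picture such a catalog is a registry of vetted, reusable pipeline steps that teams assemble by name instead of rebuilding from scratch. The component names and step logic below are illustrative placeholders for vetted, compliance-checked components.

```python
# Shared catalog: teams register vetted pipeline components once,
# then assemble new pipelines from the catalog by name.
CATALOG = {}

def component(name):
    """Decorator that registers a vetted pipeline step in the catalog."""
    def register(fn):
        CATALOG[name] = fn
        return fn
    return register

@component("ingest")
def ingest(data):
    # drop missing records on the way in
    return [x for x in data if x is not None]

@component("normalize")
def normalize(data):
    # scale values into [0, 1] relative to the largest observation
    top = max(data)
    return [x / top for x in data]

def run_pipeline(step_names, data):
    """Assemble and run a pipeline from catalog components by name."""
    for name in step_names:
        data = CATALOG[name](data)
    return data

result = run_pipeline(["ingest", "normalize"], [2, None, 4])
```

Because every team draws steps from the same registry, the platform team can enforce security and compliance controls once, at registration time, rather than re-reviewing each pipeline.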