NannyML: Revolutionizing Post-Deployment Model Performance Monitoring

Introduction to NannyML

NannyML is an open-source Python library that estimates post-deployment model performance without access to targets (the ground-truth labels). It also detects data drift and links drift alerts back to changes in model performance. The library offers a simple interface and interactive visualizations, is model-agnostic, and supports tabular use cases, including classification and regression.

Key Features of NannyML

Performance Estimation: Estimate performance metrics while targets are unavailable, using confidence-based performance estimation (CBPE) for classification metrics such as ROC AUC and direct loss estimation (DLE) for regression metrics such as RMSE.
Data Drift Detection: Detect multivariate feature drift using PCA-based data reconstruction and univariate feature drift through statistical tests.
Intelligent Alerting: Reduce alert fatigue by linking data drift alerts to performance drops, ensuring that data scientists react only when necessary.
Easy Setup: Install NannyML with pip or conda, or run it from a Docker image, and point it at your model's inputs and outputs.
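
To make the idea of univariate drift detection through statistical tests more concrete, here is a minimal sketch of one common choice, the two-sample Kolmogorov-Smirnov statistic, written in plain Python. It is an illustration only, not NannyML's implementation: the function name `ks_statistic` and the age values are made up for this example, and which tests NannyML applies to a given feature is not covered here.

```python
from bisect import bisect_right


def ks_statistic(reference, analysis):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the empirical CDFs of the
    two samples: 0.0 means identical distributions, 1.0 means no overlap.
    (Hypothetical helper for illustration, not part of NannyML.)
    """
    if not reference or not analysis:
        raise ValueError("both samples must be non-empty")
    ref_sorted = sorted(reference)
    ana_sorted = sorted(analysis)
    max_gap = 0.0
    # The gap can only change at observed values, so checking those suffices.
    for value in ref_sorted + ana_sorted:
        ref_cdf = bisect_right(ref_sorted, value) / len(ref_sorted)
        ana_cdf = bisect_right(ana_sorted, value) / len(ana_sorted)
        max_gap = max(max_gap, abs(ref_cdf - ana_cdf))
    return max_gap


# Made-up values of a single feature (age) in three periods.
reference_ages = [23, 31, 35, 40, 44, 52, 58, 61]
similar_ages = [25, 30, 36, 41, 45, 50, 57, 63]
shifted_ages = [55, 60, 62, 64, 66, 70, 71, 75]

print(ks_statistic(reference_ages, similar_ages))  # small gap: little drift
print(ks_statistic(reference_ages, shifted_ages))  # large gap: likely drift
```

A statistic near zero suggests the feature's distribution is unchanged, while a value near one suggests a strong shift. In practice the statistic is usually paired with a p-value or a threshold before an alert is raised.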

Technical Architecture and Implementation

NannyML is built on a robust architecture that leverages novel algorithms for performance estimation and data drift detection. The library is designed to be model-agnostic, meaning it can be applied to any machine learning model without requiring modifications. The core algorithms include:

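Confidence-Based Performance Estimation (CBPE): Estimates classification performance from the model's own predicted probabilities (its confidence scores), calibrated on a reference dataset for which targets are known.
Direct Loss Estimation (DLE): Trains an additional model on reference data to predict the monitored model's loss, which yields estimates of regression metrics.
PCA-Based Data Reconstruction: Compresses the features with PCA, reconstructs them, and tracks the reconstruction error; a sustained change in that error signals multivariate drift.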
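
The intuition behind confidence-based estimation can be sketched in a few lines of plain Python. The sketch assumes the predicted probabilities are well calibrated, meaning that a score of 0.9 is correct about 90% of the time, and it estimates accuracy only. The function name `estimate_accuracy` and the scores are hypothetical; this is a simplified illustration of the idea, not NannyML's CBPE code.

```python
def estimate_accuracy(predicted_probabilities, threshold=0.5):
    """Estimate accuracy from positive-class probabilities, without labels.

    Assumes the probabilities are calibrated. (Hypothetical helper for
    illustration, not part of NannyML.)
    """
    if not predicted_probabilities:
        raise ValueError("need at least one prediction")
    expected_correct = 0.0
    for p in predicted_probabilities:
        predicted_positive = p >= threshold
        # A positive prediction is right with probability p,
        # a negative prediction is right with probability 1 - p.
        expected_correct += p if predicted_positive else 1.0 - p
    return expected_correct / len(predicted_probabilities)


# Made-up scores: a confident batch and an uncertain batch.
confident_scores = [0.95, 0.05, 0.90, 0.10]
uncertain_scores = [0.60, 0.40, 0.55, 0.45]

print(estimate_accuracy(confident_scores))  # about 0.925
print(estimate_accuracy(uncertain_scores))  # about 0.575
```

When the production data drifts toward regions where the model is less sure, the scores move toward 0.5 and the estimated accuracy drops, even though no targets have been observed yet.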

Installation Process

To install NannyML, you can use either pip or conda. Here are the commands:

pip install nannyml

conda install -c conda-forge nannyml

For Docker users, you can run:

docker run -v /local/config/dir/:/config/ nannyml/nannyml nml run

Usage Examples

Here’s a quick start example demonstrating how to use NannyML for performance estimation:

import nannyml as nml
import pandas as pd

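# Load real-world data: a reference set and an analysis set.
# The third return value (analysis targets) is not needed for estimation.
reference_df, analysis_df, _ = nml.load_us_census_ma_employment_data()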

# Initialize estimator:
estimator = nml.CBPE(
    problem_type='classification_binary',
    y_pred_proba='predicted_probability',
    y_pred='prediction',
    y_true='employed',
    metrics=['roc_auc'],
)

# Fit on the reference data (targets known), then estimate
# performance on the analysis data (targets not required):
estimator = estimator.fit(reference_df)
estimated_performance = estimator.estimate(analysis_df)

# Show results:
figure = estimated_performance.plot()
figure.show()

Community and Contribution

NannyML is a community-driven project, and contributions are welcome! You can propose new features or report bugs on the GitHub Issues page. Join our Community Slack to connect with other users and contributors.

License and Legal Considerations

NannyML is distributed under the Apache License Version 2.0. You can find the complete license details in the repository.

Project Roadmap and Future Plans

The NannyML team is continuously working on enhancing the library. Check out our roadmap for upcoming features and improvements.

Conclusion

NannyML is a powerful tool for data scientists looking to maintain visibility and trust in their deployed machine learning models. By combining performance estimation with data drift detection, it helps users catch model performance issues early, before targets become available.

Learn More

For more information, visit the official NannyML website, read the documentation, or browse the NannyML GitHub repository.

FAQ Section

What is NannyML?

NannyML is an open-source Python library for estimating post-deployment model performance and detecting data drift.

How can I install NannyML?

You can install NannyML with pip (pip install nannyml) or conda (conda install -c conda-forge nannyml).

Where can I find the documentation?

The documentation is available at Read the Docs.