How productive and engaged are teams working on products that apply machine learning? That was the question we asked ourselves while discussing current product and tech opportunities. Is there anything we can help with, given our experience building products and making dev teams productive? We were fascinated by what machine learning has made possible in recent years. But how mature is the tooling that helps teams build those incredible things, compared with standard product development and DevOps practices? We interviewed over a dozen companies to learn the state of the art of current machine learning projects, and here we share the insights with you.

Resonating topics

Defining the right goals and measures is tricky

Problem framing and the definition of objectives is a surprisingly tricky task: what do we actually optimize, and how do we measure it? E.g. what is our definition of "importance"? How can we be sure we don't oversimplify the metric (like evaluating just conversion but not retention)? How do we ensure the training data won't introduce a feedback loop? While it definitely takes any company a while to learn this art, it's not specific to machine learning.

Orchestrate experiments' evaluation and replicate older experiments

Model experiments' orchestration and versioning is a chore

Machine-learning scientists like experimenting with model and network architectures as a creative task that applies their experience, intuition, and knowledge from research papers, but they hate the chores related to pipeline orchestration.

They don't like managing the versioning of all the code, model hyperparameters, and data that must be kept together to replicate a past experiment. It's easy to forget a small change that later makes it hard to replicate older experiments and retrain models. E.g. a normalization step in code might influence the model's performance.

Machine-learning scientists are not that comfortable with DevOps practices and version control management, and even specialized tools like DVC struggle to version the whole ML pipeline simply or efficiently enough and to visualize what changed.

The trend in model experimenting is to quickly prototype an initial model, compare it with a baseline, and then optimize it iteratively and efficiently. Orchestration and versioning get harder when parallel model experiments are being conducted, or when you iterate to improve an older model that is already up and running.

AutoML is The Next Big Thing

AutoML solutions that search for the best ML model for given data by comparing dozens of alternatives automatically are gaining traction. They might find a model that is good enough for the task in hours instead of weeks. That can accelerate the work of machine-learning scientists, or even produce a model that is good enough for the task, so further fine-tuning for the last percent of improvement might not be worth the effort for a given business. E.g. H2O.ai claims their AutoML performs like a better-than-average ML scientist, and Google bets on its own AutoML as well.

Tools

Jupyter Notebook, Google Colab, TensorBoard, Deepnote, neptune.ai, and MLflow Tracking are being used for managing and comparing experiments. DVC addresses version control management of large data structures. AutoML solutions: so far we have seen just H2O.ai, but Google and Microsoft have AutoML solutions as well, and there are more, like SAS, Dataiku, DataRobot…
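To make the experiment-tracking chore more concrete, here is a minimal sketch of how a run could be logged with MLflow Tracking. It is only an illustration: the experiment name, hyperparameters, and metric values are made up, not taken from any of the interviewed companies.

```python
# Minimal experiment-tracking sketch with MLflow Tracking.
# Experiment name, hyperparameters and metric values are illustrative only.
import mlflow

mlflow.set_experiment("churn-model-prototypes")

with mlflow.start_run(run_name="baseline-vs-tuned"):
    # Log the hyperparameters needed to replicate this experiment later.
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("batch_size", 64)
    mlflow.log_param("normalization", "z-score")

    # ... train and evaluate the model here ...
    validation_auc = 0.91  # placeholder result

    # Log the metric so parallel experiments can be compared in the UI.
    mlflow.log_metric("val_auc", validation_auc)

    # Attach a small artifact (e.g. notes or a data snapshot reference)
    # so the run can be traced back later.
    with open("run_notes.txt", "w") as f:
        f.write("data snapshot: 2020-05-01")
    mlflow.log_artifact("run_notes.txt")
```

Note that the tracking server only records what you explicitly log; versioning the data itself (which tools like DVC target) still has to happen alongside this.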
CI/CD: Automated deployment of the whole ML pipeline

When it comes to productization, machine-learning scientists struggle with the deployment and orchestration needed to integrate the winning model into the company's production infrastructure or runtime environment. They don't feel as confident as developers in the related tooling and DevOps processes.

It's common that ML scientists are not part of the engineering teams that use their ML models in an app, because the teams' dev cycles can differ quite a bit. So ML scientists either struggle with DevOps practices, or they hand over the model to engineering teams and lose touch with the runtime behavior of their models, which reduces productivity.

Whole ML pipeline deployment

Easy deployment orchestration of the whole ML pipeline in production would make teams more productive. Simpler deployment would help machine-learning scientists stay more engaged and prevent future doubts about what was actually deployed (e.g. by enforcing rules like linking the related experiment documentation, or sandboxing).

Customer deployments

Companies and agencies that adapt their core models to the specific data of their customers need to maintain, version, and distribute these adapted models to their customer implementations selectively.

Agencies providing custom ML solutions to their clients struggle to deploy the models in the clients' infrastructure. Due to silos or different priorities across company departments, or when IT is not experienced enough in ML infrastructure, it can take weeks to build an ML pipeline and months to go through the client's IT approval, change management, and security review processes.

Tools

Seldon, GoCD, Pachyderm, Kubeflow, MLflow, MS ML Studio, TensorFlow Extended (TFX)

Feature engineering (and data cleaning)

Time-consuming data cleaning

Cleaning initial data that comes from multiple sources in various formats, with duplicates and noise, is the most time-consuming task (apart from labeling). It's hard to automate because it requires understanding the data semantics. What does the data actually mean? How is it related? Aren't there any biases? Do the series from multiple sources actually match?

Feature engineering

Real data can be quite complex and contained in deep structures (e.g. JSON documents) that are hard to encode into vectors suitable for model training. The process of denormalizing, deduplicating, debiasing, encoding, recoding, removing data leaks, and separating data into characteristic features for machine learning is done mostly manually today, except for H2O.ai, which provides auto-feature-engineering (of pre-cleaned data).

Tools

It's done mostly manually now, with the help of custom visualizations. Auto feature-engineering (like H2O.ai's) might take over.
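As a small, hypothetical illustration of the manual work described above, the sketch below flattens nested JSON records into a flat table, drops duplicates, and encodes a categorical field with pandas. The structure and field names are made up for illustration.

```python
# Hypothetical sketch of manual feature preparation with pandas.
# The nested structure and field names are made up for illustration.
import pandas as pd

records = [
    {"id": 1, "user": {"country": "CZ", "plan": "pro"}, "events": {"clicks": 12}},
    {"id": 2, "user": {"country": "DE", "plan": "free"}, "events": {"clicks": 3}},
    {"id": 2, "user": {"country": "DE", "plan": "free"}, "events": {"clicks": 3}},  # duplicate row
]

# Flatten the deep structure into columns like "user.country".
df = pd.json_normalize(records)

# Deduplicate and encode the categorical column into numeric features.
df = df.drop_duplicates()
df = pd.get_dummies(df, columns=["user.plan"])

print(df.head())
```

In practice, each of these steps hides domain questions (is that row really a duplicate? is the set of categories stable?), which is why the companies we interviewed still do most of this by hand.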
Observability: debug, interpret and monitor model biases

What made a model decide the way it did? Strictly regulated businesses (like banking and healthcare) are required to ensure and prove that a model (e.g. a scoring model) is not biased, both during training and in production. Model fairness might actually be more important to them than its precision. E.g. what inputs made the model suggest not to provide a loan?

While training the model, they need to understand the training data distribution and uncover its biases (such as feedback loops, training/testing dataset dependence, what the model is sensitive to, and more). In production, they need to compare it with a reference (did the inputs change, isn't the model drifting?).

It can be important even in non-regulated businesses, e.g. to ensure that the Top-10 results are actually being updated in time. The caveat might be how to age the data when the inputs change over time.

Tools

H2O.ai, tensorflow.org/tfx, SHAP

Monitor model performance

Monitoring how a model performs in production and detecting when it starts to underperform or drift is often underestimated. A basic solution might be to monitor just business KPIs, but when a company has tens or hundreds of models deployed, it also needs to ensure that iterative improvements won't degrade model performance or introduce blind spots. The caveat is to ensure the company does not evaluate the wrong metric. E.g. to find out that a new kind of data the model wasn't trained on is starting to appear.

Regulated businesses (banking/health) need to go deeper and log the reasons why a model decided the way it did, using model observability/debuggability techniques. Standard analytics tools are typically used for monitoring now.

Annotate/label data

There are myriads of tools and services to annotate, classify, rate, or highlight zillions of data samples effortlessly, efficiently, and accurately. Yet many companies built their own tools, because especially for classification, ranking, or rating it might be as complex as adapting an existing one, and they want total control over the quality of the annotations and to prevent annotators' biases.

A relationship with good annotators, their training, and monitoring of their work are essential to quality. The trend is to automate the work for easy cases and use manual annotation only for the hard ones.

Companies adapting models to customer data need to provide the customer with their own labeling/annotation tool, to ensure it is easy to use for inexperienced operators in their context.

Tools

While most of the companies we researched use their own tools, there are plenty of tools and services for this, like Scale AI, Appen, Hive, prodi.gy, Lionbridge.ai, Supervise.ly, Hasty.ai, datagym.ai, Tagtog, LightTag, and Humans in the Loop, to name just a few.

Bonus: Transfer learning can compensate for a lack of data

Especially for agencies, there is often not enough training data for a specific task (e.g. in a local language). The transfer-learning approach is to take a pre-trained network and retrain the top layers to adapt the output to the new use case. E.g. it can be used to change the way lexical analysis is evaluated in NLP.
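Here is a minimal sketch of that approach, assuming an image-classification use case with Keras (the class count, input size, and layer choices are placeholders): a pre-trained backbone is frozen and only a small new top is trained on the limited data.

```python
# Transfer-learning sketch with Keras: reuse a pre-trained backbone,
# freeze it, and train only a small new "top" for the new task.
# The number of classes, input size and data pipeline are placeholders.
import tensorflow as tf

NUM_CLASSES = 5  # placeholder for the new use case

# Pre-trained feature extractor without its original classification head.
base = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights="imagenet"
)
base.trainable = False  # keep the learned general-purpose features

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),  # new top layer
])

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# model.fit(train_dataset, validation_data=val_dataset, epochs=5)
```

The same idea applies in NLP: take a pre-trained language model and fine-tune only its top layers on the smaller, local-language dataset.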
Researched companies

Fine-tuning internal models: Avast, Google, O2, Roboauto, Seznam.cz, Socialbakers, Zuri
AutoML framework: H2O.ai
Adapting a vertical use-case to customers: DataVision, PEKAT VISION, ROSSUM
Custom development for clients: Artin, DataSentics, Trask solutions

How about you?

This probe summarizes how machine learning is being done in companies today. When we started to explore this area as ML outsiders, we did not realize how advanced, yet fragmented, the whole market is, and how many tools are already available but not always ideal.

So how about you? What does your company find the most challenging, and why? Where do you see the boundary between data cleaning and feature engineering? Share in the comments.

Thank you, Petr Meissner & Roman Pichlík