Contents

1. Introduction
2. Methodology
3. AutoML Systems
4. Characteristics, Requirements and Boundaries
   4.1. Characteristics
   4.2. Requirements and Boundaries
5. Performance
6. Functionalities
   6.1. Data Integration
   6.2. Data Preparation
   6.3. Modeling
   6.4. Deployment
7. Results
8. Practical Examples
9. Wrap Up
10. Bibliography

AutoML Benchmark in Production

Comparison and Analysis of Different AutoML Systems in the Production Domain.
AutoML systems are tools that propose to automate the machine learning (ML) pipeline: integration, preparation, modeling and model deployment. Although all AutoML systems aim to facilitate the usage of ML in production, they differ in how they accomplish this objective, approaching the ML pipeline at different levels. The purpose of this benchmark is to evaluate, using AutoML systems currently available in the market, how each system approaches the ML pipeline, and to help users choose which system to pick. 25 AutoML systems are presented in this benchmark, listed in chapter AutoML Systems. For a more objective evaluation, this research assesses each system against several criteria: Requirements and Boundaries, Performance, Data Integration, Data Preparation, Modeling, Deployment and Results.
Ending the benchmark, chapter Practical Examples presents which systems are recommended for four example users.

1. Introduction to AutoML in Production

 In the pursuit of leveraging the potential of data, companies usually rely on Machine Learning (ML) technologies. Applications of ML range from detecting anomalies in operations and enabling non-specialized machines to learn by themselves, to optimizing routing and forecasting demand. Its most common use cases are knowledge extraction operations and serving as the core technology underlying the control of processes and products. Projects that rely on the application of ML are usually referred to as ML projects.

 However, the development of an ML project is not an automated task - much of it still relies on the expertise of data scientists and knowledge of the manufacturing process. Seeking a solution to this problem, AutoML systems are currently being developed, enabling a wide range of users to benefit from the valuable potential of data and facilitating the use of ML in real-world problems.

 In short, AutoML systems propose to automate the ML pipeline. Although this name conveys the main idea at a glance, it conceals several distinct tasks, as depicted in Fig. 1.

Fig. 1. AutoML pipeline in the context of production

2. Methodology of the AutoML Benchmark

 This benchmark is divided into three steps: AutoML Systems, Evaluation and Usage.

3. AutoML Systems

 AutoML systems are evaluated based on publicly available data from the production domain. These are the systems included in this benchmark:

  • Hyperopt-sklearn [1]
  • Auto-sklearn [2]
  • TPOT [3]
  • H2O AutoML [4]
  • SAS [5]
  • MLBox [6]
  • Google AutoML [7]
  • Azure Machine Learning [8]
  • MLJar [9]
  • ATM [10]
  • Auto_ml [11]
  • Amazon SageMaker [12]
  • AutoKeras [13]
  • Feature Tools [14]
  • tsfresh [15]

 Some systems were included in the analysis but were not evaluated against every criterion:

 The following systems were not included in the analysis:

4. Characteristics, Requirements and Boundaries

 The chapter Characteristics presents general information about each system, while Requirements and Boundaries demonstrates how a user's individual circumstances affect the choice of system.

4.1 Characteristics

 Each benchmarked system was tested in a specific version, as can be seen in the "Tested at" column of Table 1.

 Even though this research is focused on AutoML systems, not every evaluated system covers the whole AutoML Pipeline. This distinction is presented in the "AutoML" column of Table 1, where systems marked with a Yes cover a large portion of the Pipeline, while the coverage of the remaining systems is described in plain text.

 Feature Tools and tsfresh automate only the Data Preparation step of the AutoML Pipeline. Nevertheless, they were kept in the analysis since they propose to automate the most time-consuming step of the pipeline.

4.2 Requirements and Boundaries

 The systems differ in required user knowledge, hardware and software requirements, and price, which may imply limitations for different users and use cases. Table 2 exhibits these limitations for each system.

 Most of the systems are free to use through a Python-based API and require little knowledge of the programming language. Users with no programming experience should choose a cloud-based paid system, since these offer an easy-to-use interface.
Deep knowledge of Data Science is usually not required, but may affect the results depending on the use case.
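 As an illustration of how little code a free, Python-based system demands, the following minimal sketch runs Auto-sklearn on a small example data set; the data set and the time budget are illustrative choices, not part of the benchmark setup.

    # Minimal sketch: Auto-sklearn on an example data set.
    # The data set and the 5-minute time budget are illustrative choices.
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    import autosklearn.classification

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

    # The time budget (in seconds) is the main knob a novice has to set.
    automl = autosklearn.classification.AutoSklearnClassifier(
        time_left_for_this_task=300,
    )
    automl.fit(X_train, y_train)
    print(automl.score(X_test, y_test))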

5. Performance of the AutoML Systems

 In order to compare the performance of the models created by the AutoML systems, they were tested on an ML use case from production based on the CNC Mill Tool Wear data set, which contains data from a CNC mill. This is a classification problem where the objective is to predict the success of a test. The data set was pre-processed before being passed to each system, resulting in 7,586 instances and 50 dimensions. The results are presented in Table 3.

 A future version of this benchmark will further include the results generated with the SECOM data set.

 An overview of other publicly available data sets for production can be found at Fraunhofer IPT Application Fields and Free-Access Data Records.

 All systems achieved state-of-the-art results, with high Accuracy, high F1 score and low Loss; each presented a more than acceptable model for this particular problem, so a distinction between better and worse systems is hard to establish here.
Performance also depends on the runtime, since running a system for a longer time can yield better results.
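 For reference, the following sketch shows how such metrics can be computed with scikit-learn; a synthetic data set with the same shape as the pre-processed CNC data (7,586 instances, 50 dimensions) and a plain random forest serve as stand-ins for the systems' actual models.

    # Sketch: computing the Table 3 metrics for any fitted model.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score, f1_score, log_loss
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for the pre-processed CNC data set.
    X, y = make_classification(n_samples=7586, n_features=50, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Any model produced by an AutoML system could be evaluated the same way.
    model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

    y_pred = model.predict(X_test)
    y_prob = model.predict_proba(X_test)
    print("Accuracy:", accuracy_score(y_test, y_pred))
    print("F1 score:", f1_score(y_test, y_pred))
    print("Loss:", log_loss(y_test, y_prob))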

6. Functionalities of the AutoML Systems

 The systems presented here propose to automate the AutoML Pipeline, or parts of it. With that in mind, this chapter of the benchmark evaluates to what degree each step of the Pipeline is covered by each system.

6.1 Data Integration

 Data Integration aims to integrate data residing in different sources. Table 4 displays the data types accepted by each system, as well as whether the system automates the data integration process.

 No system proposes to automate the Data Integration step, since adapting to the vast variety of different data sources is not trivial. Regarding data types, some systems are more limited and others more inclusive, for example Auto_ml and Google AutoML respectively.
For a specific use case, some systems can already be filtered out based on Table 4 when looking for what system to use. For example, a user facing an audio classification problem with audio files as input data can pick Uber Ludwig [16], but not ATM.
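 Since no system automates this step, a user typically writes the integration code by hand. The following sketch joins sensor readings from a CSV file with job metadata from a database; the file names, table and join key are hypothetical.

    # Sketch of a manual Data Integration step (not automated by any system).
    # File names, the table name and the join key are hypothetical.
    import sqlite3
    import pandas as pd

    sensors = pd.read_csv("cnc_sensor_readings.csv")       # flat file source
    with sqlite3.connect("production.db") as conn:
        jobs = pd.read_sql("SELECT * FROM jobs", conn)     # database source

    # One integrated table the AutoML systems can consume.
    dataset = sensors.merge(jobs, on="job_id", how="left")
    dataset.to_csv("integrated_dataset.csv", index=False)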

6.2 Data Preparation

 After assembling the data set in a form the AutoML systems can utilize, it has to be prepared for the Modeling phase. That means increasing the quality of the data (Data Preprocessing) and restructuring it in order to facilitate the extraction of knowledge by an ML algorithm (Feature Engineering). As can be seen in Table 5, each system was evaluated with respect to how deeply it explores this preparation step.

 The exact approach taken for each of these methods is not specified in this benchmark (e.g. whether a system uses PCA for Data Reduction); a Yes means that some approach is used for the method and a No means that none is. Nevertheless, this overview can be helpful when deciding which system to use for a certain use case: H2O AutoML might grant better results when dealing with unbalanced data sets, whereas other systems may be more effective for data sets with many dimensions, for example.
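 As a concrete example of automated Feature Engineering, the sketch below uses tsfresh, one of the two Data Preparation systems in the benchmark; the tiny long-format time series is purely illustrative.

    # Sketch: automated feature extraction with tsfresh.
    import pandas as pd
    from tsfresh import extract_features

    # Long-format time series: one row per measurement, illustrative values.
    df = pd.DataFrame({
        "id":    [1, 1, 1, 2, 2, 2],
        "time":  [0, 1, 2, 0, 1, 2],
        "value": [0.5, 0.7, 0.6, 1.2, 1.1, 1.3],
    })

    # Each series becomes one wide row with hundreds of generated features.
    features = extract_features(df, column_id="id", column_sort="time")
    print(features.shape)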

6.3 Modeling

 The Modeling phase requires a prepared data set to output effective results. To that end, the systems might iterate through Data Preparation and Modeling multiple times before reaching the final result.
Some of the algorithms used by each system, as well as how they are selected, are shown in Table 6.

 Keep in mind that some systems allow exporting specific algorithms, while others only output the best model and the results; see Table 8 for more information. More visual results can be obtained from systems that provide graphs and other metrics; see the Diagnosis column of Table 6.
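 TPOT is one example of a system that exports the selected algorithm: the sketch below writes the best pipeline found as plain scikit-learn code. The data set and the search budget are illustrative.

    # Sketch: exporting the selected pipeline with TPOT.
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from tpot import TPOTClassifier

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    # Small search budget for illustration; larger budgets explore more pipelines.
    tpot = TPOTClassifier(generations=5, population_size=20, random_state=42)
    tpot.fit(X_train, y_train)
    print(tpot.score(X_test, y_test))

    # The exported file contains plain scikit-learn code for the best pipeline.
    tpot.export("best_pipeline.py")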

6.4 Deployment

 Ending the AutoML Pipeline, deployment aims to make the model available to users. For that, Table 7 answers the following questions:

  1. How can the model be deployed?
  2. How can the model be accessed by the user?
  3. Is training during production possible?

 There is no standard way of exporting the generated model, as can be seen in the Productionize Model column of Table 7.
Regarding the Web Service option, usually a REST API is provided, through which the model can be accessed via an endpoint.
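 To illustrate this access pattern, the sketch below scores against such an endpoint; the URL, the authentication scheme and the payload layout are hypothetical and depend entirely on the chosen system.

    # Sketch: querying a deployed model through a REST endpoint.
    # URL, token and payload schema are hypothetical.
    import requests

    ENDPOINT = "https://example.com/model/v1/predict"
    payload = {"instances": [{"feedrate": 6.0, "clamp_pressure": 4.0}]}

    response = requests.post(
        ENDPOINT,
        json=payload,
        headers={"Authorization": "Bearer <token>"},   # placeholder token
        timeout=10,
    )
    print(response.json())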

7. Results

Table 8 shows how each system presents the results of its trained models to the user.
Some systems also provide intermediary results: in case Modeling takes a long time, it can be useful to have access to a model that has already been fully trained, even if it is not the best one.

 As can be seen in the table, the output varies: sometimes a list of models and their characteristics, sometimes just the best model and its evaluations. Consequently, the user is not always able to decide which model is exported and therefore run in production.
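 H2O AutoML is one example of a system that returns a list of models: its leaderboard ranks every trained model and lets the user pick one, as the following sketch shows. The CSV file and the target column name are illustrative.

    # Sketch: a ranked list of models via H2O AutoML's leaderboard.
    # The CSV file and the target column name are illustrative.
    import h2o
    from h2o.automl import H2OAutoML

    h2o.init()
    frame = h2o.import_file("integrated_dataset.csv")
    target = "passed_visual_test"                 # illustrative column name
    frame[target] = frame[target].asfactor()      # make it a classification task
    train, test = frame.split_frame(ratios=[0.8], seed=1)

    aml = H2OAutoML(max_runtime_secs=300, seed=1)
    aml.train(y=target, training_frame=train)

    print(aml.leaderboard.head())                 # all candidate models, ranked
    print(aml.leader.model_performance(test))     # best model on held-out data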
 Looking at the non-AutoML systems: since Feature Tools and tsfresh propose to automate only the Data Preparation phase, their outputs are structured as a new data set, containing new features and transformed old ones.

8. Practical Examples

 In order to simulate the process of finding systems that fulfill a person's restrictions and requirements, the Persona Cards were created.
 Each card represents a persona, i.e. an example user. At the end of each card, the chosen systems are displayed, followed by the reasons why they were designated for the specific persona and use case.

Paid systems, such as Google AutoML and Azure Machine Learning, can be used by personas with no budget since these systems provide free trials, but this would not be a long-term solution.
The recommended systems are not ordered from most to least recommended.

 To test the systems in case of security issues or non-availability of data, it is possible to use publicly available data sets from production; see Fraunhofer IPT Application Fields and Free-Access Data Records.

9. Wrap Up

 Past developments in the area of AutoML indicate that progress towards improved automation of specific steps within the AutoML Pipeline can be expected. A full automation of the whole Pipeline, from Data Integration to Deployment, is still a concept that requires more research. In the near future, it can be expected that tasks such as Modeling will be automated to the point that ML models can be created with little to no ML knowledge. Semi-AutoML systems that support data scientists in other activities can be expected in the future as well.

10. Bibliography

  1. Komer B, Bergstra J, Eliasmith C (2019) Hyperopt-Sklearn. in Hutter F, Kotthoff L, Vanschoren J, (Eds.). Automated Machine Learning. Springer International Publishing. Cham, pp. 97–111. 

  2. Feurer M, Klein A, Eggensperger K, Springenberg JT, Blum M, Hutter F (2019) Auto-sklearn: Efficient and Robust Automated Machine Learning. in Hutter F, Kotthoff L, Vanschoren J, (Eds.). Automated Machine Learning. Springer International Publishing. Cham, pp. 113–134. 

  3. Olson RS, Moore JH (2019) TPOT: A Tree-Based Pipeline Optimization Tool for Automating Machine Learning. in Hutter F, Kotthoff L, Vanschoren J, (Eds.). Automated Machine Learning. Springer International Publishing. Cham, pp. 151–160. 

  4. H2O.ai. AutoML: Automatic Machine Learning — H2O 3.26.0.10 documentation. http://docs.h2o.ai/h2o/latest-stable/h2o-docs/automl.html. Accessed on 20.11.2019. 

  5. SAS Institute Inc. SAS Visual Data Mining and Machine Learning. https://www.sas.com/en_us/software/visual-data-mining-machine-learning.html. Accessed on 26.01.2020. 

  6. Aronio de Romblay A. MLBox Documentation. https://mlbox.readthedocs.io/en/latest/index.html. Accessed on 20.11.2019. 

  7. Google Cloud. Best practices for creating training data | AutoML Tables Documentation | Google Cloud. https://cloud.google.com/automl-tables/docs/data-best-practices#tables-does. Accessed on 20.11.2019. 

  8. Microsoft Azure. Azure Machine Learning documentation. https://docs.microsoft.com/en-us/azure/machine-learning/. Accessed on 20.11.2019. 

  9. MLJAR. mljar-docs. https://docs.mljar.com/. Accessed on 06.02.2020. 

  10. Swearingen T, Drevo W, Cyphers B, Cuesta-Infante A, Ross A, Veeramachaneni K (2017) ATM: A distributed, collaborative, scalable system for automated machine learning. 2017 IEEE International Conference on Big Data (Big Data). IEEE, pp. 151–162. 

  11. Parry P. auto_ml 0.1.0 documentation. https://auto-ml.readthedocs.io/en/latest/index.html. Accessed on 20.11.2019. 

  12. AWS. Amazon SageMaker - Developer Guide. https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-dg.pdf. Accessed on 20.11.2019. 

  13. Jin H, Song Q, Hu X (2019) Auto-Keras: An Efficient Neural Architecture Search System. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. ACM, pp. 1946–1956. 

  14. Feature Labs. Featuretools 0.12.0 documentation. https://docs.featuretools.com/en/stable/index.html. Accessed on 20.11.2019. 

  15. Christ M, Braun N, Neuffer J. tsfresh — tsfresh 0.12.0 documentation. https://tsfresh.readthedocs.io/en/latest/index.html. Accessed on 06.02.2020. 

  16. Uber Ludwig documentation. https://uber.github.io/ludwig/. Accessed on 27.05.2020.

  17. Salesforce.com, Inc. AutoML library for building modular, reusable, strongly typed machine learning workflows on Spark from Salesforce Engineering. https://transmogrif.ai/. Accessed on 26.01.2020. 

  18. The Automatic Statistician. https://link.springer.com/chapter/10.1007/978-3-030-05318-5_9. Accessed on 20.05.2020.

  19. Kotthoff L, Thornton C, Hoos HH, Hutter F, Leyton-Brown K (2019) Auto-WEKA: Automatic Model Selection and Hyperparameter Optimization in WEKA. in Hutter F, Kotthoff L, Vanschoren J, (Eds.). Automated Machine Learning. Springer International Publishing. Cham, pp. 81–95. 

  20. SparkCognition (2019) From Data to Application: Darwin's Unique Approach to AutoML. 

  21. DataRobot. https://www.datarobot.com/. Accessed on 12.05.2020.

  22. Devol. https://github.com/joeddav/devol. Accessed on 20.05.2020.

  23. ExploreKit. http://people.eecs.berkeley.edu/~dawnsong/papers/icdm-2016.pdf. Accessed on 20.05.2020.

  24. AutoML Zero. https://arxiv.org/pdf/2003.03384.pdf. Accessed on 12.05.2020.

  25. Mendoza H, Klein A, Feurer M, Springenberg JT, Urban M, Burkart M, Dippel M, Lindauer M, Hutter F (2018) Towards Automatically-Tuned Deep Neural Networks. in Hutter F, Kotthoff L, Vanschoren J, (Eds.). AutoML: Methods, Systems, Challenges. Springer, pp. 141–156.