Automation of feature engineering is a research topic that dates back to the 1990s. Machine learning software that incorporates automated feature engineering has been commercially available since 2016. Related academic literature can be roughly separated into two types:
• Multi-relational Decision Tree Learning (MRDTL) uses a supervised algorithm that is similar to a decision tree.
• Deep Feature Synthesis uses simpler methods.
=== Multi-relational Decision Tree Learning (MRDTL) ===
Multi-relational Decision Tree Learning (MRDTL) extends traditional decision tree methods to relational databases, handling complex data relationships across tables. It uses selection graphs as decision nodes, which are refined systematically until a specific termination criterion is reached.
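The idea of refining a selection graph can be illustrated with a deliberately simplified sketch. It assumes a toy two-table database (hypothetical `CUSTOMERS` and `ORDERS` data) and a hypothetical `SelectionGraph` class that stores only conditions on the target table and on one joined child table. It omits the scoring of candidate refinements and the termination criterion, so it should be read as an illustration of the concept rather than as the MRDTL algorithm itself.

```python
from dataclasses import dataclass

# Toy relational data (hypothetical): customers are the target table,
# and orders reference customers through customer_id.
CUSTOMERS = [
    {"id": 1, "age": 25},
    {"id": 2, "age": 40},
    {"id": 3, "age": 58},
]
ORDERS = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 2, "amount": 150.0},
    {"customer_id": 2, "amount": 30.0},
    {"customer_id": 3, "amount": 15.0},
]


@dataclass(frozen=True)
class SelectionGraph:
    """Simplified selection graph (an assumption of this sketch):
    conditions on the target row plus conditions that at least one
    joined child row must satisfy."""
    target_conditions: tuple = ()
    child_conditions: tuple = ()

    def covers(self, customer, orders):
        # The target row itself must satisfy every target condition.
        if not all(cond(customer) for cond in self.target_conditions):
            return False
        if not self.child_conditions:
            return True
        # Some related order must satisfy all child conditions at once.
        related = [o for o in orders if o["customer_id"] == customer["id"]]
        return any(all(cond(o) for cond in self.child_conditions) for o in related)

    def refine_target(self, cond):
        # Refinement: add a condition on the target table.
        return SelectionGraph(self.target_conditions + (cond,), self.child_conditions)

    def refine_child(self, cond):
        # Refinement: add a condition on the joined child table.
        return SelectionGraph(self.target_conditions, self.child_conditions + (cond,))


def select(graph, customers, orders):
    """Ids of the target rows covered by the graph."""
    return [c["id"] for c in customers if graph.covers(c, orders)]


def split(parent, refined, customers, orders):
    """Partition the rows covered by the parent graph into those that the
    refined graph still covers and those that it no longer covers."""
    covered = [c for c in customers if parent.covers(c, orders)]
    left = [c["id"] for c in covered if refined.covers(c, orders)]
    right = [c["id"] for c in covered if not refined.covers(c, orders)]
    return left, right


root = SelectionGraph()
older = root.refine_target(lambda c: c["age"] >= 30)
older_big_order = older.refine_child(lambda o: o["amount"] > 100)

print(select(root, CUSTOMERS, ORDERS))
print(select(older, CUSTOMERS, ORDERS))
print(split(older, older_big_order, CUSTOMERS, ORDERS))
```

Each call to `split` plays the role of one decision node: the refined graph and its complement send the covered records down two branches, much as a threshold test does in a single-table decision tree.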
=== Open-source implementations ===
There are a number of open-source libraries and tools that automate feature engineering on relational data and time series:
• featuretools is a Python library for transforming time series and relational data into feature matrices for machine learning.
• MCMD is an open-source feature engineering algorithm for joint clustering of multiple datasets.
• getML community is an open-source tool for automated feature engineering on time series and relational data. It is implemented in C/C++ with a Python interface. It evaluates the quality of the features using hypothesis testing.
• tsflex is an open-source Python library for extracting features from time series data. Despite being written entirely in Python, it has been shown to be faster and more memory-efficient than tsfresh, seglearn or tsfel.
• seglearn is an extension for multivariate, sequential time series data to the scikit-learn Python library.
• tsfel is a Python package for feature extraction on time series data.
• kats is a Python toolkit for analyzing time series data.
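As an illustration of the kind of work such tools automate on relational data, the following minimal sketch builds features by aggregating child rows into their parent rows and then stacking a second aggregation on top of the first. It uses only the Python standard library and does not reproduce the API of any library listed above. The tables, the `aggregate` helper, the `PRIMITIVES` mapping and the generated feature names are all hypothetical.

```python
from statistics import mean

# Toy relational data (hypothetical): customers -> orders -> items.
CUSTOMERS = [{"customer_id": 1}, {"customer_id": 2}]
ORDERS = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 1},
    {"order_id": 12, "customer_id": 2},
]
ITEMS = [
    {"order_id": 10, "price": 5.0},
    {"order_id": 10, "price": 15.0},
    {"order_id": 11, "price": 30.0},
    {"order_id": 12, "price": 8.0},
]

# Aggregation primitives applied to the values of a child column.
PRIMITIVES = {"SUM": sum, "MEAN": mean, "COUNT": len}


def aggregate(parents, children, key, columns):
    """Return copies of the parent rows with one new feature per
    (primitive, child column) pair, computed over the child rows that
    share the parent's key. Parents without children get None."""
    enriched = []
    for parent in parents:
        related = [c for c in children if c[key] == parent[key]]
        row = dict(parent)
        for col in columns:
            values = [c[col] for c in related]
            for name, func in PRIMITIVES.items():
                row[f"{name}({col})"] = func(values) if values else None
        enriched.append(row)
    return enriched


# Depth 1: summarize items per order.
orders_feat = aggregate(ORDERS, ITEMS, "order_id", ["price"])

# Depth 2: summarize the per-order totals per customer.
customers_feat = aggregate(CUSTOMERS, orders_feat, "customer_id", ["SUM(price)"])

for row in customers_feat:
    print(row)
```

Here the feature `MEAN(SUM(price))` is the average order total per customer. Stacking simple aggregations along table relationships in this way yields features that would otherwise require hand-written joins and group-by queries.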
=== Deep feature synthesis ===
The deep feature synthesis (DFS) algorithm beat 615 of 906 human teams in a competition.

== Feature stores ==