MarketMulti-task learning
Company Profile

Multi-task learning

Multi-task learning (MTL) is a subfield of machine learning in which multiple learning tasks are solved at the same time, while exploiting commonalities and differences across tasks. This can result in improved learning efficiency and prediction accuracy for the task-specific models, when compared to training the models separately. Inherently, Multi-task learning is a multi-objective optimization problem having trade-offs between different tasks. Early versions of MTL were called "hints".

Methods
The key challenge in multi-task learning, is how to combine learning signals from multiple tasks into a single model. This may strongly depend on how well different task agree with each other, or contradict each other. There are several ways to address this challenge: Task grouping and overlap Within the MTL paradigm, information can be shared across some or all of the tasks. Depending on the structure of task relatedness, one may want to share information selectively across the tasks. For example, tasks may be grouped or exist in a hierarchy, or be related according to some general metric. Suppose, as developed more formally below, that the parameter vector modeling each task is a linear combination of some underlying basis. Similarity in terms of this basis can indicate the relatedness of the tasks. For example, with sparsity, overlap of nonzero coefficients across tasks indicates commonality. A task grouping then corresponds to those tasks lying in a subspace generated by some subset of basis elements, where tasks in different groups may be disjoint or overlap arbitrarily in terms of their bases. Task relatedness can be imposed a priori or learned from the data. Hierarchical task relatedness can also be exploited implicitly without assuming a priori knowledge or learning relations explicitly. For example, the explicit learning of sample relevance across tasks can be done to guarantee the effectiveness of joint learning across multiple domains. and combine tasks efficiently. Transfer of knowledge Related to multi-task learning is the concept of knowledge transfer. Whereas traditional multi-task learning implies that a shared representation is developed concurrently across tasks, transfer of knowledge implies a sequentially shared representation. Large scale machine learning projects such as the deep convolutional neural network GoogLeNet, an image-based object classifier, can develop robust representations which may be useful to further algorithms learning related tasks. For example, the pre-trained model can be used as a feature extractor to perform pre-processing for another learning algorithm. Or the pre-trained model can be used to initialize a model with similar architecture which is then fine-tuned to learn a different classification task. Multiple non-stationary tasks Traditionally Multi-task learning and transfer of knowledge are applied to stationary learning settings. Their extension to non-stationary environments is termed Group online adaptive learning (GOAL). Sharing information could be particularly useful if learners operate in continuously changing environments, because a learner could benefit from previous experience of another learner to quickly adapt to their new environment. Such group-adaptive learning has numerous applications, from predicting financial time-series, through content recommendation systems, to visual understanding for adaptive autonomous agents. Multi-task optimization Multi-task optimization focuses on solving optimizing the whole process. The paradigm has been inspired by the well-established concepts of transfer learning and multi-task learning in predictive analytics. The key motivation behind multi-task optimization is that if optimization tasks are related to each other in terms of their optimal solutions or the general characteristics of their function landscapes, the search progress can be transferred to substantially accelerate the search on the other. The success of the paradigm is not necessarily limited to one-way knowledge transfers from simpler to more complex tasks. In practice an attempt is to intentionally solve a more difficult task that may unintentionally solve several smaller problems. There is a direct relationship between multitask optimization and multi-objective optimization. In some cases, the simultaneous training of seemingly related tasks may hinder performance compared to single-task models. Commonly, MTL models employ task-specific modules on top of a joint feature representation obtained using a shared module. Since this joint representation must capture useful features across all tasks, MTL may hinder individual task performance if the different tasks seek conflicting representation, i.e., the gradients of different tasks point to opposing directions or differ significantly in magnitude. This phenomenon is commonly referred to as negative transfer. To mitigate this issue, various MTL optimization methods have been proposed. It has been reported that meta-knowledge transfer could help avoid negative transfer.Besides, the per-task gradients are combined into a joint update direction through various aggregation algorithms or heuristics. There are several common approaches for multi-task optimization: Bayesian optimization, evolutionary computation, and approaches based on Game theory. The method builds a multi-task Gaussian process model on the data originating from different searches progressing in tandem. The captured inter-task dependencies are thereafter utilized to better inform the subsequent sampling of candidate solutions in respective search spaces. Evolutionary multi-tasking Evolutionary multi-tasking has been explored as a means of exploiting the implicit parallelism of population-based search algorithms to simultaneously progress multiple distinct optimization tasks. By mapping all tasks to a unified search space, the evolving population of candidate solutions can harness the hidden relationships between them through continuous genetic transfer. This is induced when solutions associated with different tasks crossover. Recently, modes of knowledge transfer that are different from direct solution crossover have been explored. Game-theoretic optimization Game-theoretic approaches to multi-task optimization propose to view the optimization problem as a game, where each task is a player. All players compete through the reward matrix of the game, and try to reach a solution that satisfies all players (all tasks). This view provide insight about how to build efficient algorithms based on gradient descent optimization (GD), which is particularly important for training deep neural networks. In GD for MTL, the problem is that each task provides its own loss, and it is not clear how to combine all losses and create a single unified gradient, leading to several different aggregation strategies. This aggregation problem can be solved by defining a game matrix where the reward of each player is the agreement of its own gradient with the common gradient, and then setting the common gradient to be the Nash Cooperative bargaining of that system. == Applications ==
Applications
Algorithms for multi-task optimization span a wide array of real-world applications. Recent studies highlight the potential for speed-ups in the optimization of engineering design parameters by conducting related designs jointly in a multi-task manner. In addition, the concept of multi-tasking has led to advances in automatic hyperparameter optimization of machine learning models and ensemble learning. Applications have also been reported in cloud computing, with future developments geared towards cloud-based on-demand optimization services that can cater to multiple customers simultaneously. Recent work has additionally shown applications in chemistry. In addition, some recent works have applied multi-task optimization algorithms in industrial manufacturing. == Mathematics ==
Mathematics
Reproducing Hilbert space of vector valued functions (RKHSvv) The MTL problem can be cast within the context of RKHSvv (a complete inner product space of vector-valued functions equipped with a reproducing kernel). In particular, recent focus has been on cases where task structure can be identified via a separable kernel, described below. The presentation here derives from Ciliberto et al., 2015. suggested setting F as the Frobenius norm \sqrt{tr(A^\top A)}. They optimized directly using block coordinate descent, not accounting for difficulties at the boundary of \mathbb R^{n\times T} \times S_+^T. Clustered tasks learning - Jacob et al suggested to learn A in the setting where T tasks are organized in R disjoint clusters. In this case let E\in \{0,1\}^{T\times R} be the matrix with E_{t,r}=\mathbb I (\text{task }t\in \text{group }r). Setting M = I - E^\dagger E^T, and U = \frac 1 T \mathbf{11}^\top , the task matrix A^\dagger can be parameterized as a function of M : A^\dagger(M) = \epsilon _M U+\epsilon_B (M-U)+\epsilon (I-M) , with terms that penalize the average, between clusters variance and within clusters variance respectively of the task predictions. M is not convex, but there is a convex relaxation \mathcal S_c = \{M\in S_+^T:I-M\in S_+^T \land tr(M) = r \} . In this formulation, F(A)=\mathbb I(A(M)\in \{A:M\in \mathcal S_C\}) . Generalizations Non-convex penalties - Penalties can be constructed such that A is constrained to be a graph Laplacian, or that A has low rank factorization. However these penalties are not convex, and the analysis of the barrier method proposed by Ciliberto et al. does not go through in these cases. Non-separable kernels - Separable kernels are limited, in particular they do not account for structures in the interaction space between the input and output domains jointly. Future work is needed to develop models for these kernels. ==Software package==
Software package
A Matlab package called Multi-Task Learning via StructurAl Regularization (MALSAR) implements the following multi-task learning algorithms: Mean-Regularized Multi-Task Learning, Multi-Task Learning with Joint Feature Selection, Robust Multi-Task Feature Learning, Trace-Norm Regularized Multi-Task Learning, Alternating Structural Optimization, Incoherent Low-Rank and Sparse Learning, Robust Low-Rank Multi-Task Learning, Clustered Multi-Task Learning, Multi-Task Learning with Graph Structures. == Literature ==
Literature
• Multi-Target Prediction: A Unifying View on Problems and Methods Willem Waegeman, Krzysztof Dembczynski, Eyke Huellermeier https://arxiv.org/abs/1809.02352v1 ==See also==
tickerdossier.comtickerdossier.substack.com