
Some Comments on Gradient-Free Optimization

1. Slow

OpenAI: Evolution Strategies as a Scalable Alternative to Reinforcement Learning

Blog: Evolution Strategies as a Scalable Alternative to Reinforcement Learning

In our preliminary experiments we found that using ES to estimate the gradient on the MNIST digit recognition task can be as much as 1,000 times slower than using backpropagation. It is only in RL settings, where one has to estimate the gradient of the expected reward by sampling, where ES becomes competitive.
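For reference, the ES gradient estimator that the blog post describes can be sketched in a few lines. This is a toy illustration, not OpenAI's code (the function names and hyper-parameters are my own): each sample costs two full function evaluations, which is why the estimate is so much more expensive than a single backward pass.

```python
import numpy as np

def es_gradient(f, theta, sigma=0.1, n_samples=5000, seed=0):
    """Antithetic evolution-strategies estimate of grad f(theta):
    E[(f(theta + s*eps) - f(theta - s*eps)) / (2*s) * eps] = grad f(theta)
    for Gaussian eps (exact for quadratics, O(sigma^2) bias otherwise)."""
    rng = np.random.default_rng(seed)
    grad = np.zeros_like(theta)
    for _ in range(n_samples):
        eps = rng.standard_normal(theta.size)
        grad += (f(theta + sigma * eps) - f(theta - sigma * eps)) * eps
    return grad / (2 * sigma * n_samples)

# toy objective ||x||^2, whose true gradient is 2x
f = lambda x: float(x @ x)
theta = np.array([1.0, -2.0, 0.5])
g_es = es_gradient(f, theta)   # costs 2 * n_samples function evaluations
g_true = 2 * theta             # one exact gradient, as backprop would give
print(g_es, g_true)
```

Even on this three-dimensional toy problem, thousands of evaluations buy only a noisy version of what one analytic gradient gives exactly.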

Comments by Y. LeCun (via twitter.com/soumithchin)

「…Gradient-based optimization is immensely more efficient than black-box optimization… black-box optimization should be a last resort, when there is absolutely no way to use gradient-based opt. Also, no need to backprop gradients is a deficiency, not a feature… this blog post would be unnecessary if people knew the difference between RL and black box optimization」

twitter.com/hardmaru/st

It seems many senior deep learning researchers have a deep bias against the use of evolutionary computing methods.

2. Very slow

Random Gradient-Free Minimization of Convex Functions

Our experiments confirm the following conclusion. If the computation of the gradient is feasible, then the cost per iteration of the random methods and the cost per iteration of the gradient methods are the same. In this situation, the total time spent by the random methods is typically O(n) times bigger than the time of the gradient schemes. Hence, the random gradient-free methods should be used only if creating the code for computing the gradient is too costly or simply impossible.
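The O(n) factor is easy to reproduce on a toy problem. Below is a minimal sketch, not the paper's experiments: the step sizes follow the paper's recommendations for smooth functions (1/(2L) for the gradient method, 1/(4(n+4)L) for the random method), but the test function and constants are my own illustration.

```python
import numpy as np

def run(f, grad_est, x0, step, tol=1e-6, max_iter=200_000):
    """Iterate x <- x - step * grad_est(x) until f(x) < tol; return iteration count."""
    x = x0.copy()
    for t in range(max_iter):
        if f(x) < tol:
            return t
        x = x - step * grad_est(x)
    return max_iter

n, L = 20, 2.0                 # dimension; Lipschitz constant of grad of ||x||^2
f = lambda x: float(x @ x)
grad = lambda x: 2 * x         # exact gradient
rng = np.random.default_rng(0)

def gf(x, mu=1e-7):
    """Random gradient-free oracle: forward difference along a Gaussian direction."""
    u = rng.standard_normal(n)
    return (f(x + mu * u) - f(x)) / mu * u

x0 = np.ones(n)
it_gd = run(f, grad, x0, 1 / (2 * L))            # gradient method
it_gf = run(f, gf, x0, 1 / (4 * (n + 4) * L))    # random gradient-free method
print(it_gd, it_gf)
```

Each random step carries gradient information along one direction only, so the iteration gap grows roughly linearly with the dimension n.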

3. Stuck

Differential Evolution algorithms applied to Neural Network training suffer from stagnation

A large number of population-based Differential Evolution algorithms have been proposed in the literature, and their good performance is often reported on benchmark problems. However, when applied to Neural Network training for regression, these methods usually perform worse than the classical Levenberg–Marquardt algorithm. The major aim of the present paper is to clarify why. In this research, in which Neural Networks are used for a real-world regression problem, it is shown empirically that various Differential Evolution algorithms fall into stagnation during Neural Network training: after some time the individuals stop improving, or improve only occasionally, even though the population diversity remains high.
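The stagnation pattern the paper identifies — no progress while diversity stays high — is straightforward to monitor in one's own runs. A hedged sketch (the function name and all thresholds are illustrative, not taken from the paper):

```python
import numpy as np

def stagnated(best_history, population, patience=50,
              rel_improve=1e-12, div_thresh=1e-3):
    """Flag the stagnation pattern described in the paper: the best fitness
    has not improved for `patience` generations while the population spread
    stays high (distinguishing stagnation from premature convergence)."""
    if len(best_history) < patience + 1:
        return False
    old, new = best_history[-patience - 1], best_history[-1]
    no_progress = (old - new) <= rel_improve * max(abs(old), 1.0)
    diversity = float(np.mean(np.std(population, axis=0)))
    return no_progress and diversity > div_thresh
```

A flat best-fitness history with a spread-out population triggers the flag; an improving history, or a collapsed (converged) population, does not.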

4. Tiny

Linear Convergence of Comparison-based Step-size Adaptive Randomized Search via Stability of Markov Chains

Comparison-based adaptive evolution strategies converge linearly, with a convergence rate inversely proportional to the dimension of the search space: $\lim_{t \to \infty} \frac{\|x_{t+1} - x^*\|}{\|x_t - x^*\|} = \exp\left(-\frac{c}{n}\right)$.
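The $\exp(-c/n)$ rate can be observed empirically. The following is a minimal sketch, not the paper's Markov-chain analysis: a (1+1)-ES with an illustrative 1/5th-success-rule step-size adaptation on the sphere function. The measured per-iteration rate scales like $1/n$, so $n \times$ rate stays roughly constant across dimensions.

```python
import numpy as np

def empirical_rate(n, iters=3000, seed=0):
    """Measure the per-iteration linear convergence rate of a (1+1)-ES
    with 1/5th-success-rule step-size adaptation on f(x) = ||x||^2."""
    rng = np.random.default_rng(seed)
    x = np.ones(n)
    sigma = 0.3
    fx = x @ x
    for _ in range(iters):
        y = x + sigma * rng.standard_normal(n)
        fy = y @ y
        if fy <= fx:
            x, fx = y, fy
            sigma *= np.exp(0.2)    # expand the step on success
        else:
            sigma *= np.exp(-0.05)  # shrink on failure (targets ~1/5 success)
    # rate r such that ||x_t|| ~ ||x_0|| * exp(-r * t)
    return -np.log(np.linalg.norm(x) / np.sqrt(n)) / iters

for n in (4, 8, 16):
    print(n, n * empirical_rate(n))  # roughly constant across n
```

Doubling the dimension roughly halves the rate, i.e. the runtime to reach a fixed precision grows linearly with n even on this ideal problem.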

5. What you think matters is actually nonsense

Theory of Evolution Strategies: A New Perspective

We believe that global convergence is rather meaningless in practice. Pure random search converges with probability one to the global optimum for a broad class of functions, where the main assumption is that a neighborhood of the global optimum is reachable by the search distribution with positive probability. However, the convergence rate is sub-linear of degree 1/n, so the running time is proportional to $(1/\epsilon)^n$. Even for moderate dimensions, e.g., n = 10, this is prohibitively slow in practice.
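The $(1/\epsilon)^n$ running time follows from a one-line waiting-time argument. A sketch, assuming for simplicity a search distribution that is uniform on $[0,1]^n$:

```latex
\Pr\big[\|X - x^*\| \le \epsilon\big]
  = \mathrm{vol}\big(B_\epsilon(x^*) \cap [0,1]^n\big)
  \propto \epsilon^n
\quad\Longrightarrow\quad
\mathbb{E}[T_\epsilon] \propto \epsilon^{-n} = (1/\epsilon)^n .
```

The hitting time is geometric with success probability proportional to the volume of the $\epsilon$-ball. For $n = 10$ and $\epsilon = 0.1$ that is on the order of $10^{10}$ function evaluations, which is the sense in which global convergence guarantees are "meaningless in practice".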

「Fast convergence may be more important than global yet slow convergence, especially when considering restart strategies.」

Mike Powell

On the other hand, he (M. Powell) is probably partly responsible for the sceptical view that the classical optimization community has always held of random search methods, be they inspired by evolutionary analogues or not. He believed that local optimality is all one can get in the nonconvex case.

6. Retreating into mysticism: the last fig leaf

Continuous lunches are free!

Continuous Lunches Are Free Plus the Design of Optimal Optimization Algorithms

No-Free-Lunch theorems in the continuum

Teoremes de No Free Lunch en el continu i el model del Pont Brownià en optimització (No Free Lunch theorems in the continuum and the Brownian Bridge model in optimization)

This paper investigates extensions of No Free Lunch (NFL) theorems to countably and uncountably infinite domains. The original NFL, due to Wolpert and Macready, states that all search heuristics have the same performance when averaged over the uniform distribution over all possible functions. For infinite domains, extending the concept of a distribution over all possible functions involves measurability issues and stochastic process theory. For countably infinite domains, we prove that the natural extension of NFL theorems does not hold, but that a weaker form of NFL does hold, by establishing the existence of non-trivial distributions of fitness leading to equal performance for all search heuristics. Our main result is that for continuous domains, NFL does not hold.
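The original finite-domain NFL statement can be checked exhaustively on a tiny example. A sketch, assuming (for simplicity) non-adaptive, non-repeating search orders and "best value found in the first k evaluations" as the performance measure — the theorem itself also covers adaptive heuristics:

```python
import itertools

X = range(4)       # tiny search domain
Y = (0, 1)         # tiny codomain of fitness values

def perf(order, f, k=2):
    """Best (max) fitness seen in the first k evaluations of a search order."""
    return max(f[x] for x in order[:k])

heuristic_a = (0, 1, 2, 3)   # fixed scan order
heuristic_b = (3, 1, 0, 2)   # an arbitrary different order

# enumerate every possible function f: X -> Y and average the performance
all_fs = [dict(zip(X, vals)) for vals in itertools.product(Y, repeat=len(X))]
avg_a = sum(perf(heuristic_a, f) for f in all_fs) / len(all_fs)
avg_b = sum(perf(heuristic_b, f) for f in all_fs) / len(all_fs)
print(avg_a, avg_b)
```

Averaged over all 16 functions, the two orders perform identically; the set of all functions on a finite domain is trivially closed under permutation, which is exactly the condition the continuous and permutation-closure results below argue about.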

On the non-separable functions of our small test function set, s-PSO (standard PSO) is consistently outperformed by the CMA-ES. Our benchmark functions were specifically selected to test the behavior on non-separable, ill-conditioned problems, where dependencies between the design parameters play a decisive role. One might argue that this implies s-PSO must be better on other benchmark functions or on real-world problems, as the no free lunch (NFL) theorem seems to imply [24, 25]. This would render experimental results fairly meaningless for practical implications, and would indeed be true if the necessary conditions for NFL held, namely, if the set of all considered functions were closed under permutation [26]. Fortunately, we have little reason to believe that this condition holds on any interesting set of test and/or real-world problems (including all interesting or all ever considered problems) [27, 28]. Additionally, in the continuous domain, NFL seems not to be available at all [29]. After all, NFL theorems can hardly have a grave impact on the relevance of such empirical findings.

Reinterpreting no free lunch

Continuous Lunches Are Not Free (arXiv:1507.00581)

Stemming from a paper of Auger and Teytaud, there is a common misconception that for continuous domains No Free Lunch (NFL) does not hold. However, Rowe, Vose, and Wright have demonstrated that NFL holds for arbitrary domains and co-domains. This paper resolves the apparent contradiction.

Recommended reading:

Generalized linear models and logistic regression
report of learning optimization
The optimal stopping problem, from algorithms that guide your life
(Unconstrained optimization) Theoretical summary of optimization methods, part 1
Optimization algorithms in DL and their implementation

TAG: Notes | Optimization |