Connections between Neural Networks and Pure Mathematics

How an esoteric theorem gives important clues about the power of Artificial Neural Networks

by Marco Tavora

Nowadays, artificial intelligence is present in almost every part of our lives. Smartphones, social media feeds, recommendation engines, online ad networks, and navigation tools are examples of AI-based applications that affect us on a daily basis.

Deep learning has been systematically improving the state of the art in areas such as speech recognition, autonomous driving, machine translation, and visual object recognition. However, the reasons why deep learning works so spectacularly well are not yet fully understood.

Hints from Mathematics

Paul Dirac, one of the fathers of quantum mechanics and arguably the greatest English physicist since Newton, once remarked that progress in physics based on mathematical reasoning would

“…enable[s] one to infer results about experiments that have not been performed. There is no logical reason why the […] method should be possible at all, but one has found in practice that it does work and meets with reasonable success. This must be ascribed to some mathematical quality in Nature, a quality which the casual observer of Nature would not suspect, but which nevertheless plays an important role in Nature’s scheme.”
— Paul Dirac, 1939

There are many examples in history where purely abstract mathematical concepts eventually led to powerful applications way beyond the context in which they were developed. This article is about one of those examples.

Though I've been working with machine learning for a few years now, I'm a physicist by training, and I have a soft spot for pure mathematics. Lately, I have been particularly interested in the connections between deep learning, pure mathematics, and physics.

This article provides examples of powerful techniques from pure mathematics. My goal is to use rigorous mathematical results to try to “justify”, at least in some respects, why deep learning methods work so surprisingly well.

A Beautiful Theorem

In this section, I will argue that one of the reasons why artificial neural networks are so powerful is intimately related to the mathematical form of the output of their neurons.

I will justify this bold claim using a celebrated theorem originally proved by two Russian mathematicians in the late 50s, the so-called Kolmogorov–Arnold representation theorem.

Hilbert's 13th problem

In 1900, David Hilbert, one of the most influential mathematicians of the 20th century, presented a famous list of 23 problems that effectively set the course of 20th-century mathematics research.

The Kolmogorov–Arnold representation theorem is related to one of these celebrated Hilbert problems, all of which hugely influenced 20th-century mathematics.

Closing in on the connection with neural networks

A generalization of one of these problems, the 13th problem specifically, considers the possibility that a function of n variables can be expressed as a combination of sums and compositions of just two functions of a single variable, denoted by Φ and ϕ.

More concretely:

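(The equation originally shown here as an image is not reproduced. A standard way to state the result, consistent with the notation that follows, is that every continuous function f of n variables on the unit cube can be written as

f(x₁, …, xₙ) = Σ_{q=0}^{2n} Φ_q( Σ_{p=1}^{n} ϕ_{q,p}(x_p) ),

and later refinements by Sprecher and Lorentz show that the outer functions can be replaced by a single Φ and the inner ones by a single ϕ evaluated at shifted arguments x_p + ηq and weighted by constants λ_p, which is the two-function form referred to below.)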

Here, η and the λs are real numbers. It should be noted that these two univariate functions, Φ and ϕ, can have a highly complicated (fractal) structure.

Three articles, by Kolmogorov (1957), Arnold (1958), and Sprecher (1965), provided a proof that such a representation must exist. This result is rather unexpected since, according to it, the bewildering complexity of multivariate functions can be “translated” into trivial operations on univariate functions, such as additions and function compositions.

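As a toy illustration of the spirit of the result (not the theorem itself, which concerns all continuous functions), the product of two positive numbers can already be written using only addition and univariate functions: x·y = exp(ln x + ln y). The theorem guarantees that, in a suitable sense, every continuous multivariate function admits such a description in terms of sums and compositions of univariate functions.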

Now what?

If you got this far (and I would be thrilled if you did), you are probably wondering: how could an esoteric theorem from the 50s and 60s be even remotely related to cutting-edge algorithms such as artificial neural networks?

A Quick Reminder of Neural Network Activations

The expressions computed at each node of a neural network are compositions of other functions, in this case, the so-called activation functions. The degree of complexity of such compositions depends on the depth of the hidden layer containing the node. For example, a node in the second hidden layer performs the following computation:

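(The formula originally shown here as an image is not reproduced. In the notation used below, a rough sketch of what the i-th node of the second hidden layer computes is

y_i^(2) = ϕ( Σ_j w_ij^(2) · ϕ( Σ_k w_jk^(1) · x_k + b_j^(1) ) + b_i^(2) ),

i.e. a weighted sum of first-layer activations, shifted by a bias and passed through the activation function once more.)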

Here the w's are the weights and the b's are the biases. The similarity with the multivariate function f shown a few paragraphs above is evident!

Let us quickly write down a Python function for forward propagation only, which outputs the calculations performed by the neurons. The function below proceeds in the following steps:

  • First line: the first activation function ϕ acts on the first linear step given by:

x0.dot(w1) + b1

where x0 is the input vector.

  • Second line: the second activation function acts on the second linear step

y1.dot(w2) + b2
  • Third line: a softmax is used in the final layer of the neural network, acting on the third linear step

y2.dot(w3) + b3

The full function is:

def forward_propagation(w1, b1, w2, b2, w3, b3, x0):
    # x0 is the input vector; w1, w2, w3 are weight matrices and b1, b2, b3 are biases
    y1 = phi(x0.dot(w1) + b1)          # first hidden layer
    y2 = phi(y1.dot(w2) + b2)          # second hidden layer
    y3 = softmax(y2.dot(w3) + b3)      # output layer
    return y1, y2, y3
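
The activation phi and the output function softmax are not defined in the article. A minimal sketch that makes the function above runnable, assuming NumPy arrays for the inputs and weights and a sigmoid as the hidden activation (that particular choice is an assumption; any other activation would work the same way), could be:

import numpy as np

def phi(z):
    # assumed hidden-layer activation: elementwise sigmoid
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # numerically stable softmax along the last axis
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

With these definitions, calling forward_propagation with weight matrices and bias vectors of compatible shapes (for example, created with np.random.randn) returns the two hidden activations y1, y2 and the softmax output y3.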

To compare this with our expression above we write:

y2 = phi(phi(x0.dot(w1) + b1).dot(w2) + b2)

The correspondence can be made even clearer:

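(The figure that made this correspondence explicit is not reproduced here. Loosely speaking, each linear step such as x0.dot(w1) + b1 plays the role of the inner sums over single variables in the Kolmogorov–Arnold expression, while the repeated application of ϕ, and of softmax in the last layer, plays the role of the outer composition with Φ.)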

A Connection Between Two Worlds

We, therefore, conclude that the result proved by Kolmogorov, Arnold, and Sprecher implies that neural networks, whose output is nothing but the repeated composition of functions, are extremely powerful objects, which can represent any multivariate function or equivalently almost any process in nature. This partly explains why neural networks work so well in so many fields. In other words, the generalization power of neural networks is, at least in part, a consequence of the Kolmogorov-Arnold representation theorem.

As has been pointed out, the generalization power of forming functions of functions of functions, ad nauseam, was in a way “discovered independently also by nature”, since neural networks, which as shown above do precisely that, are a simplified way of describing how our brains work.

Thanks a lot for reading! Constructive criticism and feedback are always welcome!

My website has some other interesting material on both data science and physics.

There is a lot more to come, stay tuned!
