From: Misha Gromov <
Sent: Wednesday, October 11, 2017 7:47 PM
To: Jeffrey E.
Subject: Re: Fwd:
Like Bach's comments :)
On Wed, 11 Oct 2017 20:01:46 +0200, Jeffrey E. wrote:
---------- Forwarded message ----------
From: Joscha Bach <
Date: Wed, Oct 11, 2017 at 7:55 PM
Subject: Re:
To: Jeffrey Epstein <[email protected]>
After skimming their paper, the idea seemed unexciting to me at first: basically, if we have enough feature
dimensions we can almost always find a linear separation. This is also related to how Support Vector Machines work:
they project the data into an extremely high-dimensional space, find a maximum-margin separating hyperplane there, and
then project that plane back into the original space as the separator. A similar idea is behind Echo State networks,
which use a randomly wired recurrent neural network and then only train the output layer with a single linear
regression.
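
To make that concrete, here is a minimal sketch of the random-projection idea in Python (the data, dimensions, and ridge readout are my own illustration, not anything from the paper or from an Echo State implementation):

import numpy as np

rng = np.random.default_rng(0)

# Toy 2-D data that no single line can separate (XOR-like quadrants).
X = rng.uniform(-1, 1, size=(400, 2))
y = np.sign(X[:, 0] * X[:, 1])          # +1 / -1 labels

# Fixed random projection into many dimensions, plus a nonlinearity;
# these weights are never trained (the random-features trick).
W = rng.normal(size=(2, 500))
b = rng.normal(size=500)
H = np.tanh(X @ W + b)

# Train only the linear readout, with one ridge regression.
lam = 1e-2
w = np.linalg.solve(H.T @ H + lam * np.eye(500), H.T @ y)

acc = ((H @ w > 0) == (y > 0)).mean()
print("train accuracy:", acc)           # near 1.0 despite frozen features

In the high-dimensional feature space the XOR-like data becomes linearly separable, so a single linear fit on the readout suffices.
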
The authors take an existing trained neural network, and whenever it makes a mistake, they train a linear
classifier on the network state and data, i.e. they try to find out when the network goes wrong. Instead of improving the
network (which is also likely to make it worse in other cases), they add an additional layer to it. For engineering, this
makes a lot of sense, because large neural networks are cheap to use and deploy but expensive to train.
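
A hedged sketch of that scheme as I read it, for a binary classifier: base_hidden, base_predict, and all data below are hypothetical stand-ins for the trained network, not the authors' code.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
W_frozen = rng.normal(size=(10, 32))    # hypothetical frozen network weights
w_frozen = rng.normal(size=32)

def base_hidden(X):
    # Stand-in for the trained network's internal state on input X.
    return np.tanh(X @ W_frozen)

def base_predict(H):
    # Stand-in for the trained network's (frozen) output.
    return (H @ w_frozen > 0).astype(int)

# Held-out data on which we can observe the base network's mistakes.
X_val = rng.normal(size=(1000, 10))
y_val = rng.integers(0, 2, size=1000)   # invented labels for illustration

H = base_hidden(X_val)
wrong = (base_predict(H) != y_val).astype(int)

# The added piece: a linear classifier trained on the network state to
# predict when the network goes wrong, leaving the network itself untouched.
detector = LogisticRegression(max_iter=1000).fit(H, wrong)

def corrected_predict(X):
    H = base_hidden(X)
    out = base_predict(H)
    flip = detector.predict(H).astype(bool)   # local override, no retraining
    out[flip] = 1 - out[flip]                 # binary case: flip the label
    return out

The base network stays frozen; only the small linear detector is fit, which matches the engineering point that large networks are cheap to run but expensive to retrain.
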
On a more philosophical level, it is tempting to ask if that might be a general learning principle for brains: when
you don't perform well, add more control structure on top. It probably makes sense whenever you are confident that
training the existing structure won't improve it that much, but unlike training the weights in an existing network, it also
adds quite a few milliseconds to the processing time. There is probably an optimal tradeoff for this. The other thing is
that the new layer is a linear classifier only (at least in this paper), and it is creating a local override on the system's
results, instead of integrating with it, somewhat similar to how reasoning might override our subconscious behavior.
One of the drawbacks is that this won't allow us to use the new layer for simulating/understanding the structure of the
domain modeled by the rest of the network.
— Joscha
> On Oct 10, 2017, at 09:43, Jeffrey E. <[email protected]> wrote:
> https://www.sciencedaily.com/releases/2017/08/170821102725.htm