

Toward Theoretical Understanding of Deep Learning

Sanjeev Arora

Princeton University

Institute for Advanced Study

http://www.cs.princeton.edu/~arora/ @prfsanjeevarora

Group website: unsupervised.princeton.edu

Amazon Research, Mozilla Research, DARPA/SRC

7/10/2018

Some History

1950s and 1960s: “Perceptron” (single-layer, 1 output node)

(Riding on convex optimization theory, generalization theory, expressivity theory of kernels, etc.)

Theory??

7/10/2018 Theoretically understanding deep learning

Deep Learning: Terminology

𝛉: parameters of deep net

Training data: points x with labels y, drawn from distribution D

Loss ℓ(𝛉; x, y): how badly the net with parameters 𝛉 predicts label y on point x. Can be ℓ2, cross-entropy, ….

Objective: minimize average loss on training data, via Gradient Descent.

Optimization concepts already shape deep learning

(together with GPUs, large datasets)

e.g. wavenets, batch-normalization, …

Many of these intuitions came from the convex world; deep learning is now stress-testing those intuitions, leading to new insights and concepts.

Talk overview

• Optimization: When/how can it find decent solutions? Highly nonconvex.
• Overparametrization/Generalization: # parameters ≫ # training samples. Does it help? Why do nets with low training error generalize (predict well on unseen data, i.e. have low test error)?
• Role of depth?
• Unsupervised learning/GANs.
• Simpler methods to replace deep learning? (Examples of Linearization from NLP, RL…)

Part 1: Optimization in deep learning

Hurdle: Most optimization problems in deep learning are nonconvex, and on worst-case instances are NP-hard.

Basic concepts

Note: ∇f ≠ 0 ⇒ ∃ descent direction.

Find “local optimum”, ie bottom of some valley (∇f = 0, ∇²f is psd: ∇²f ≽ 0).

Find global optimum 𝛉*? Convergence from all starting points 𝛉? Random initial point? Special initial points?

Naive search takes exp(d/𝜀) time. Recall: d > 10⁶.


Curse of dimensionality

In d dimensions there are exp(d) directions whose pairwise angle is > 60 degrees, and exp(d) directions are needed before every direction has angle at most 𝜀 with one of them (“𝜀-net”, “𝜀-cover”). So naive search amounts to “time to explore d-dimensional parameter space.” (INFEASIBLE)

Black box analysis for deep learning

Why: Don’t know the landscape, really; must settle for weaker solution concepts. Black box access: query a point 𝛉, receive f(𝛉) and ∇f|𝛉, where f = training error. [Non-black-box approaches, e.g. from statistical physics: not much success so far..]


Gradient descent in unknown landscape.

𝛉t+1 = 𝛉t − 𝜂 ∇f(𝛉t)

□ ∇f ≠ 0 ⇒ ∃ descent direction, but the gradient can fluctuate a lot! Step size 𝜂 is determined by smoothness: ∇²f(𝛉) ≼ 𝛽I. (If f is not smooth, can apply gaussian smoothening of f.)


Gradient descent in unknown landscape (contd.)

𝛉t+1 = 𝛉t − 𝜂 ∇f(𝛉t)

□ Smoothness: −𝛽I ≼ ∇²f(𝛉) ≼ 𝛽I. Then GD finds a point with |∇f| ≤ 𝜀 in a number of steps proportional to 𝛽/𝜀².

Pf: by smoothness,

f(𝛉t) − f(𝛉t+1) ≥ −⟨∇f(𝛉t), 𝛉t+1 − 𝛉t⟩ − (𝛽/2) |𝛉t+1 − 𝛉t|²
  = 𝜂 |∇f(𝛉t)|² − (𝛽/2) 𝜂² |∇f(𝛉t)|²
  = (1/2𝛽) |∇f(𝛉t)|²   (choosing 𝜂 = 1/𝛽)

➜ while |∇f(𝛉t)| ≥ 𝜀, each update reduces the function value by 𝜀²/2𝛽. ∎

But |∇f| ≤ 𝜀 is a weak solution concept.
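The descent lemma above can be checked numerically on a toy 𝛽-smooth objective (a quadratic whose Hessian eigenvalues are assumed to lie in [0, 𝛽]; all sizes and constants below are illustrative, not from the talk):

```python
import numpy as np

# Toy beta-smooth objective: f(t) = 0.5 * t'At with eigenvalues in [0, beta],
# so the smoothness bound Hessian <= beta*I holds. Constants are illustrative.
rng = np.random.default_rng(0)
d, beta = 20, 5.0
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
A = Q @ np.diag(np.linspace(0.1, beta, d)) @ Q.T

f = lambda t: 0.5 * t @ A @ t
grad = lambda t: A @ t

theta = rng.standard_normal(d)
eta = 1.0 / beta                 # the step size from the proof
for _ in range(100):
    g = grad(theta)
    theta_next = theta - eta * g
    # Descent lemma: each step reduces f by at least |grad|^2 / (2*beta).
    assert f(theta) - f(theta_next) >= g @ g / (2 * beta) - 1e-9
    theta = theta_next

print(np.linalg.norm(grad(theta)))  # gradient norm driven toward 0
```

Every iteration satisfies the per-step progress guarantee, and the gradient norm drops below any fixed 𝜀 after O(𝛽/𝜀²) steps.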


Evading saddle points..

[Ge, Huang, Jin, Yuan’15] Add noise to gradient (“perturbed GD”). (Runtime improvements: [Jin et al’17].) Analysis of random walk: within poly(d/𝜀) time, escape all saddle points (min in n−1 dimensions, max in one) and achieve an “approx 2nd order minimum”:

|∇f| ≤ 𝜖,  ∇²f ≽ −√𝜖 I

(≈ “Langevin dynamics” in stat. physics)
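A minimal sketch of why gradient noise helps, on the classic saddle f(x, y) = x² − y² (this toy function and all constants are illustrative, not the analysis of [Ge et al.’15]):

```python
import numpy as np

# Toy saddle: f(x, y) = x^2 - y^2 has a saddle at the origin
# (min along x, max along y), as on the slide.
f = lambda p: p[0] ** 2 - p[1] ** 2
grad = lambda p: np.array([2.0 * p[0], -2.0 * p[1]])

eta, T, sigma = 0.05, 200, 0.01
rng = np.random.default_rng(0)

# Plain GD started exactly at the saddle: gradient is 0, so it never moves.
p_plain = np.zeros(2)
for _ in range(T):
    p_plain = p_plain - eta * grad(p_plain)

# Perturbed GD in the spirit of [Ge et al.'15]: Gaussian noise added to
# every gradient step lets the iterate drift off the saddle and descend.
p_pgd = np.zeros(2)
for _ in range(T):
    p_pgd = p_pgd - eta * (grad(p_pgd) + sigma * rng.standard_normal(2))

print(f(p_plain), f(p_pgd))  # plain GD stuck at 0; perturbed GD reaches f < 0
```

The noise supplies a component along the negative-curvature direction, which GD then amplifies exponentially; the formal analysis bounds the time this takes in terms of poly(d/𝜀).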

Theoretically understanding deep learning

2nd order optimization for deep learning? (Newton’s method?)

xt+1 = xt − 𝜂 [∇²f(xt)]⁻¹ ∇f(xt)

Backprop can compute the Hessian-vector product (∇²f)v in linear time [Pearlmutter’94] (backprop: [Werbos’74]). Can do approximate 2nd order optimization asymptotically faster than 1st order! (empirically, slightly slower) [Agarwal et al’17, Carmon et al’17]

Idea: (∇²f)⁻¹ = Σ_{i=0}^{∞} (I − ∇²f)^i (but use finite truncation). Not widely adopted in practice so far.
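The two ingredients above, Hessian-vector products and the truncated Neumann series, can be sketched together on a stand-in Hessian (a fixed matrix with eigenvalues in (0, 2) so the series converges; everything here is illustrative):

```python
import numpy as np

# Stand-in "Hessian" with eigenvalues in (0, 2) so that the Neumann series
# H^{-1} = sum_{i=0}^inf (I - H)^i converges. Purely illustrative.
rng = np.random.default_rng(0)
d = 10
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
H = Q @ np.diag(np.linspace(0.2, 1.8, d)) @ Q.T

def hvp(v):
    # Hessian-vector product. For a real net this costs one extra
    # backprop-like pass [Pearlmutter'94]; here it is just a matmul.
    return H @ v

def neumann_solve(g, k=200):
    # Truncated series sum_{i=0}^{k} (I - H)^i g, built from HVPs only:
    # never forms or inverts H explicitly.
    out, term = g.copy(), g.copy()
    for _ in range(k):
        term = term - hvp(term)   # term <- (I - H) @ term
        out = out + term
    return out

g = rng.standard_normal(d)
x = neumann_solve(g)
print(np.linalg.norm(H @ x - g))  # tiny residual: x ~= H^{-1} g
```

This is the core trick behind the cited fast approximate second-order methods: a Newton step computed from Hessian-vector products alone.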


Non-black box analyses

Various ML problems that’re subcases of learning depth-2 nets (i.e., one hidden layer between input and output). Here the landscape is mathematically known, and the problems are often solved by methods other than GD/SGD (alternating minimization, convex optimization …).

Topic modeling [A., Ge, Moitra’12] [A. et al’13]

Sparse coding [A., Ge, Ma, Moitra’14, ‘15] [Anandkumar et al’14]

Phase retrieval [Candes et al’15]

Matrix completion [Many papers, eg Jain et al.’13]

Matrix sensing

Learning Noisy Or nets [A., Ge, Ma’17]


Convergence to global optimum from arbitrary initial point (via understanding the landscape)

Matrix completion: given O(nr) random entries of an n×n matrix M of rank r, where M = U·Vᵀ, predict the missing entries.

Subcase of learning depth-2 linear nets: input = indicator vector of an entry (e.g. 0000001000), output at one random output node. Learn the net!

[Ge, Lee, Ma’17] All local minima are global minima, so perturbed GD finds a global minimum from an arbitrary initial point. (Proof is fairly nontrivial.)
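A toy instance of this setting, with plain GD run on the factored objective from a random initial point (matrix sizes, the 40% observation rate, and the step size are illustrative choices, not the paper's experiment):

```python
import numpy as np

# Rank-r ground truth M = U V^T, with a random subset of entries observed.
rng = np.random.default_rng(0)
n, r = 30, 2
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
mask = rng.random((n, n)) < 0.4          # observed entries

# GD on the factored objective from a small random init; by
# [Ge, Lee, Ma'17] this landscape has no bad local minima.
U = 0.1 * rng.standard_normal((n, r))
V = 0.1 * rng.standard_normal((n, r))
eta = 0.01
for _ in range(5000):
    R = (U @ V.T - M) * mask             # residual on observed entries only
    U, V = U - eta * (R @ V), V - eta * (R.T @ U)

# Relative error on the *unobserved* entries:
err = np.linalg.norm((U @ V.T - M) * ~mask) / np.linalg.norm(M * ~mask)
print(err)  # small: the missing entries were predicted
```

Only observed entries enter the loss, yet the unobserved entries come out right; this is the global-optimality phenomenon the theorem explains.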


Any theorems about learning multilayer nets?

Yes, but usually only for linear nets (i.e., hidden nodes compute f(x) = x), so the whole net computes a single linear transformation.

[Kawaguchi’16]; [Hardt and Ma’16] (landscape of linear resnets); [A., Cohen, Hazan’18]

Some other optimization ideas I did not cover

(some are only semi-theorems)

• Budding connections with “physics” ideas: natural gradient, Lagrangian method, ..
• Optimization in generative models, reinforcement learning.
• Information-theoretic ideas, eg “information bottleneck”.

Part 2: Overparametrization and Generalization theory

e.g., Why is it a good idea to train VGG19 (20M parameters) on CIFAR10 (50K samples)? No overfitting?

Overparametrization may help optimization: folklore experiment, e.g. [Livni et al’14]. Generate labeled data by feeding random input vectors into a depth-2 net with hidden layer of size n. It is difficult to train a new depth-2 net with the same # of hidden nodes using this labeled data, but training becomes easy with a bigger hidden layer! (No good theory explaining this…)

But textbooks warn us: Large models can “Overfit”

(Figure: generalization error of the net vs. model size.)


But, excess capacity is still there!

Hope: understanding why excess capacity does not hurt will also identify intrinsic structure of a “well-trained” net. [Understanding deep learning requires rethinking generalization, Zhang et al.’17]


BTW: Excess capacity phenomenon exists in linear models!

γ = margin. If the data has margin γ, then it is possible to learn with < O(log d)/γ² datapoints (low effective capacity!), yet the same linear models can fit randomly labeled datapoints as well. See also [Understanding deep learning requires understanding kernel learning, Belkin et al’18].

How to quantify effective capacity of deep nets??

Effective capacity: roughly, log(# distinct a priori models). (If a machine exhibits 2^k distinct states, we say it has k bits of memory.)

Generalization Theory (Main Thm)

Generalization error ≲ √(N/m), where m = # training samples and N = effective capacity (VC dimension, Rademacher complexity, PAC-Bayes, …). For deep nets, all of these are about the same (usually vacuous).

Proof Sketch

• Fix a deep net 𝛉 and its parameters.
• Err𝛉 = test error; Err𝛉,S = avg error on training sample S = training error.
• Concentration: PrS[diff. between them ≤ 𝝴] > 1 − exp(−𝝴²m).
• If # (possible 𝛉) = W, a union bound shows it suffices to let m > (log W)/𝝴². “Effective capacity” N = log W.
• Complication: the trained net depends upon the training sample S.
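The union-bound step can be simulated on a hypothetical finite model class (model error rates and class size below are invented for illustration):

```python
import numpy as np

# Hypothetical finite class of W models; model j errs on a fresh example
# independently with its own true error rate p_j ("Err_theta").
rng = np.random.default_rng(0)
W, eps = 1000, 0.1
m = 4 * int(np.log(W) / eps ** 2)       # m > (log W)/eps^2, with slack

p_true = rng.uniform(0.1, 0.5, size=W)           # test errors
train_err = rng.binomial(m, p_true) / m          # training errors on S

gap = np.abs(train_err - p_true)
print(gap.max())  # below eps simultaneously for ALL W models (union bound)
```

With m a modest multiple of (log W)/𝜀², even the worst model in the class has training error within 𝜀 of its test error, which is exactly what the sketch needs before the "net depends on S" complication enters.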


Old notion: Flat Minima

[Hinton-Camp’93] [Hochreiter, Schmidhuber’95]

Multiple arguments ➜ “Noise in SGD favors flat minima”, and flat minima seem to generalize better empirically [Keskar et al’16]. (Figure: flat vs. sharp minimum.)

Intuition: a flat minimum has a short description, and there are fewer # of “possible models” with such small descriptions. (And false for some defns of “sharp” [Dinh et al’17].)


Current status of generalization theory: postmortem analysis…

Given TRAINED NET 𝛉, show it has Property 𝚽 shared by very few nets… ⇒ small effective capacity ⇒ generalization. (Proving that training actually finds such nets, and hence bounding effective capacity a priori, is much harder!)

A nonvacuous estimate of “true capacity” has proved elusive (starting point [Langford-Caruana’02]). [Dziugaite-Roy’17] have nonvacuous bounds for MNIST but not an asymptotic “complexity measure”.

A nonvacuous bound on “true parameters” has proved elusive..

(Figure: capacity bounds for VGG19 (19 layers): [Bartlett-Mendelson’02], [Neyshabur et al’17], [Bartlett et al’17], [A., Ge, Neyshabur, Zhang’18].)

Main idea: use noise stability to bound effective capacity. [A., Ge, Neyshabur, Zhang’18]

([Dziugaite-Roy’17] have more nonvacuous bounds for MNIST but don’t get any “complexity measure”.)


Noise stability for deep nets [A., Ge, Neyshabur, Zhang ICML’18]

(Can be seen as a “margin” notion for deep nets.) Inject Gaussian noise 𝜂 at a layer, scaled so |𝜂| = |x|; if the net’s output changes little, the net is noise stable.

Noise stability of VGG19 [A., Ge, Neyshabur, Zhang ICML’18]: injected noise is attenuated as it passes through to higher layers. (Higher layers reject noise introduced at lower layers!) Related phenomenon discovered in experiments of [Morcos et al ICLR’18].


Von Neumann, J. (1956). Probabilistic logics and the synthesis of reliable organisms from unreliable components. / Shannon, C. E. (1958). Von Neumann’s contributions to automata theory:

“We have, in human and animal brains, examples of very large and relatively reliable systems constructed from individual components, the neurons, which would appear to be anything but reliable. … In communication theory this can be done by properly introduced redundancy.”


Understanding noise stability for one layer (no nonlinearity)

𝜂: Gaussian noise. x ↦ Mx and x + 𝜂 ↦ M(x + 𝜂). Noise stability means |Mx|/|x| ≫ |M𝜂|/|𝜂|; the “layer cushion” is (roughly speaking) this ratio. Empirically, e.g. for a filter of layer 10 of VGG19, a few large singular directions carry the signal. Such matrices are compressible…
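A numerical illustration of the single-layer picture, using an invented matrix with a few large singular values as a stand-in for a trained filter (all sizes and spectra here are assumptions, not VGG19 measurements):

```python
import numpy as np

# Hypothetical "trained" layer matrix: 5 large singular values, rest tiny.
rng = np.random.default_rng(0)
n = 200
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = np.concatenate([np.full(5, 10.0), np.full(n - 5, 0.1)])
M = U @ np.diag(s) @ V.T

# Signal x aligned with the top right-singular directions; eta = Gaussian
# noise scaled so |eta| = |x|, as on the slide.
x = V[:, :5] @ rng.standard_normal(5)
eta = rng.standard_normal(n)
eta *= np.linalg.norm(x) / np.linalg.norm(eta)

signal_gain = np.linalg.norm(M @ x) / np.linalg.norm(x)
noise_gain = np.linalg.norm(M @ eta) / np.linalg.norm(eta)
print(signal_gain, noise_gain)  # |Mx|/|x| >> |M eta|/|eta|
```

The signal rides the few large singular directions while isotropic noise spreads its energy over all of them, so the layer passes the signal and attenuates the noise: a large "layer cushion."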


Overview of compression-based method for generalization bounds [A., Ge, Neyshabur, Zhang’18]; user-friendly version of PAC-Bayes

Compress the trained net (# parameters ≫ # datapoints) to a much smaller net (# parameters ≪ # datapoints) with similar training error; standard theory then bounds the small net’s generalization. Important: compression method allowed to use any number of new random bits, provided they don’t depend on data.


Proof sketch: Noise stability ⇒ deep net compressible with minimal change to training error

Idea 1: Compress a layer matrix A (randomized; errors introduced are “Gaussian like”, hence attenuated by the rest of the network, as noted earlier).

Compression:
(1) Generate k random sign matrices M1, …, Mk (impt: picked before seeing data).
(2) Â = (1/k) Σ_{t=1}^{k} ⟨A, Mt⟩ Mt.

(Nontrivial extension to convolutional nets.)
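The random-sign-matrix estimator can be sketched directly (matrix size and the values of k are illustrative; this shows only the estimator itself, not the full bound):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 20))     # a stand-in layer matrix

def compress(A, k):
    # Slide's scheme: draw data-independent random sign matrices M_1..M_k,
    # return A_hat = (1/k) * sum_t <A, M_t> M_t.
    A_hat = np.zeros_like(A)
    for _ in range(k):
        M = rng.choice([-1.0, 1.0], size=A.shape)
        A_hat += np.sum(A * M) * M    # np.sum(A * M) is <A, M>
    return A_hat / k

# The estimator is unbiased, and its relative error shrinks roughly like
# sqrt(h*w / k) as the number of sign matrices k grows.
errs = {k: np.linalg.norm(compress(A, k) - A) / np.linalg.norm(A)
        for k in (100, 10000)}
print(errs)
```

Each ⟨A, Mt⟩Mt is an unbiased but noisy copy of A; averaging k of them trades parameter count (k numbers instead of h·w) against "Gaussian-like" error, which noise stability lets the rest of the net absorb.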

The Quantitative Bound

capacity ≈ ( depth × activation contraction / (layer cushion × interlayer cushion) )²

(Figure: for VGG19 (19 layers), this capacity estimate vs. #param and earlier bounds [Bartlett-Mendelson’02], [Neyshabur et al’17], [Bartlett et al’17], [A., Ge, Neyshabur, Zhang’18]. A nonvacuous bound on “true parameters” has proved elusive.. [Dziugaite-Roy’17] have more nonvacuous bounds for MNIST but don’t get any “complexity measure”.)

Correlation with Generalization (qualitative check)

The cushions, and hence the bound, are noticeably better when the net is trained on normal data than on corrupted data, matching the observed generalization gap.

Concluding thoughts on generalization

A net with 20M parameters generalizes with 50k training datapoints; any explanation must involve the algorithm and/or the data distribution. ([Gunasekar et al’18] quantify the “implicit bias” of gradient descent towards “low capacity” models in simple settings.)

COMING UP: ROLE OF DEPTH; THEORY FOR UNSUPERVISED LEARNING (GANs, TEXT EMBEDDINGS, ..)

Part 3: Role of depth

Ultimate hope: theory informs what architectures are “best” for a given task.

An old question: Role of depth?

Ideal result: exhibit a natural learning problem which cannot be done using depth d but can be done with depth d+1. Hurdle: no good formalization of “natural” learning problem… Known results instead (i) exhibit functions with many oscillations not expressible by a depth-d net of some size, and (ii) show a depth-(d+1) net can compute functions with more oscillations.

So does more depth help or hurt in deep learning? (In practice, greater depth can make training harder, unless special architectures like resnets are used..) A different angle: increasing depth can “accelerate” the optimization (including for classic convex problems…).

Acceleration by increasing depth: Simple example [A., Cohen, Hazan ICML’18]

lp regression. Overparametrize: replace the weight vector w by a product w1·w2 (a depth-2 linear circuit). GD on (w1, w2) now amounts to an adaptive, momentum-like update on the original w.

Overparametrization acceleration effect

• Experiment: lp regression, p=4; convergence speeds up after replacing the fully connected layer by two layers.
• Some theoretical analysis for multilayer linear nets.
• Proof that the acceleration effect due to increase of depth is not obtainable via any regularizer on the original architecture.
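A minimal version of the experiment, under invented sizes and step sizes (not the paper's exact setup): GD on lp regression (p = 4), directly on w vs. on the depth-2 reparametrization w = w1·w2 with scalar w2.

```python
import numpy as np

# l_p regression (p = 4) on synthetic data; all constants are illustrative.
rng = np.random.default_rng(0)
n, d, p = 100, 5, 4
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y = X @ w_star

loss = lambda w: np.mean((X @ w - y) ** p)
def grad_w(w):
    r = X @ w - y
    return X.T @ (p * r ** (p - 1)) / n

eta, T = 1e-3, 5000

# Plain GD directly on w.
w = np.zeros(d)
for _ in range(T):
    w -= eta * grad_w(w)
plain = loss(w)

# Overparametrize: w = w1 * w2 with scalar w2 (a depth-2 linear circuit),
# and run GD on (w1, w2) instead.
w1, w2 = np.zeros(d), 1.0
for _ in range(T):
    g = grad_w(w1 * w2)
    w1, w2 = w1 - eta * g * w2, w2 - eta * (g @ w1)
deep = loss(w1 * w2)

print(plain, deep)  # compare end-to-end losses after the same # of steps
```

The chain rule makes the effective step on w scale with w2² plus a term aligned with w1, which is the adaptive/momentum-like behavior the analysis identifies.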


Part 4: Theory for Generative Models and Generative Adversarial Nets (GANs)

Unsupervised learning motivation: “Manifold assumption”

X: Image, lying on/near a low-dimensional manifold; Z: its ”code” on the manifold. Hope: learn the Image ➔ Code mapping. Typically modeled as learning the joint prob. density p(X, Z). (Code of X = sample from Z|X.)

Unsupervised learning motivation: “Manifold assumption” (contd)

Hope: code Z is a good substitute for Image X in downstream classification tasks (need far fewer labeled samples if we use Z instead of X).

Deep generative models (e.g., Variational AutoEncoders)

A generator net maps code Z ~ N(0, I) to image X, aiming to match Dreal. Usual training: max Ex[log p(x)] (“log likelihood”).

Generative Adversarial Nets (GANs) [Goodfellow et al. 2014]

Motivation: log-likelihood training favors outputting fuzzy images; GANs leverage discriminative deep learning to improve the generative model.

• Discriminator Dv (v = trainable parameters of discriminator net): trained to output Real (1) on real inputs Dreal, and Fake (0) on synthetic inputs.
• Generator Gu: trained to produce synthetic outputs that make the discriminator output high values.

([Arjovsky et al’17] Wasserstein GAN: objective is the “difference in expected discriminator output on real vs synthetic images”.)

Generative Adversarial Nets (GANs) [Goodfellow et al. 2014] (contd)

u = trainable parameters of generator Gu; v = trainable parameters of discriminator Dv. Training alternates between the two; ideally it ends when the synthetic distribution Dsynth fools the discriminator and further training of the discriminator doesn’t help. (“Equilibrium.”)

What spoils a GAN trainer’s day: Mode Collapse

Training is unable to teach the generator to produce a distribution Dsynth with sufficiently large diversity.

New insight from theory: the problem is not the # of training samples, but the size (capacity) of the discriminator!

Thm [A., Ge, Liang, Ma, Zhang ICML’17]: If discriminator size = N, then ∃ a generator that generates a distribution supported on O(N log N) images, and still wins against all possible discriminators. (Tweaking objectives or increasing the training set doesn’t help..)

Proof idea: let the generator memorize O(N log N) random real images. Consider “all possible discriminators of size N” (suffices to consider an “𝛆-net”). Use concentration bounds to argue that none of them can distinguish Dreal from this low-support distribution.

How to check the support size of the generator’s distribution, i.e. detect whether training managed to avoid mode collapse?

Empirically detecting mode collapse: the Birthday Paradox Test [A., Risteski, Zhang ICLR’18]

Birthday paradox: among 23 random people, Pr > 1/2 that two of them share a birthday. More generally, if a distribution has support N, then Pr[sample of size ≈ √N has a duplicate image] > ½.

Birthday paradox test [A., Risteski, Zhang]: if a sample of size s has near-duplicate images with prob. > 1/2, then the distribution has only ≈ s² distinct images. (Use a heuristic similarity measure to propose near-duplicates; rely on a human in the loop to verify duplicates.)
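A quick simulation of the test on a toy "generator" with a known support size (the support size N and batch sizes below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def duplicate_prob(support_size, s, trials=2000):
    # Pr[a sample of size s from a uniform distribution over
    # `support_size` items contains a duplicate], by simulation.
    hits = 0
    for _ in range(trials):
        sample = rng.integers(0, support_size, size=s)
        hits += len(np.unique(sample)) < s
    return hits / trials

# A toy "generator" secretly supported on N = 10_000 images: batches of
# size on the order of sqrt(N) already collide with probability > 1/2,
# so seeing duplicates at batch size s suggests support ~ s^2.
N = 10_000
print(duplicate_prob(N, 150))  # > 1/2
print(duplicate_prob(N, 20))   # close to 0
```

The batch size at which duplicates start appearing with probability above ½ gives the √(support) estimate the test relies on.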

Estimated support size from well-known GANs, on CelebA (faces; 200k training images): duplicates appear within 500 samples for a DCGAN ⇒ support size ≈ (500)² = 250K; within 1000 samples for BiGAN and ALI [Dumoulin et al’17] ⇒ support size ≈ (1000)² = 1M. (Similar results on CIFAR10.)

(Figure: duplicate pairs found in a batch of 640 generated face samples from a DCGAN.)

Testing diversity of GANs on natural images (CIFAR-10) is tricky since Euclidean distance is not informative; instead adopt a pretrained classifying CNN and use its top-layer embeddings for the similarity test. The test result is also affected by sample quality: for a DCGAN (even the best variant), the most similar samples returned are mostly blurry blobs.

Followup: [Santurkar et al’17] Different test of diversity; confirms lack of diversity.

Part 4.1 (brief): Need to rethink unsupervised learning. AKA “Representation Learning.”

Unsupervised learning motivation: “Manifold assumption” (contd)

Code Z (on the manifold) is hoped to be a good substitute for Image X in downstream classification tasks (“semisupervised learning??”; need far fewer labeled samples if we have the code). Typically modeled as learning the joint prob. density p(X, Z). (Code of X = sample from Z|X.)

Caveat: Possible hole in this whole story that I’m unable to resolve

For Z to be a good substitute for image X in downstream classification, the joint density p(X, Z) needs to be learnt to very high numerical accuracy. [See calculation in blog post by A. + Risteski’17, www.offconvex.org.] The code may still be useful in practice, but the usual story doesn’t justify it.


Food for thought…

Maximizing log likelihood (presumably approximately) may lead to little usable insight into the data. Alternative: a “utility” approach. What downstream tasks are we interested in, and what info do they need about X? (e.g., What would a “representation learning competition” look like?)

Part 5: Deep Learning-free text embeddings (illustration of above principles)

“Hand-crafted with love and care….” vs. a black box that’s end-to-end differentiable…


Two sentences that humans find quite similar: how to capture similarity and other properties of pieces of text?

Obvious inspiration: word embeddings (word2vec, GloVe etc.). Usual method: recurrent neural net, LSTM (Long Short Term Memory), etc., sometimes learnt with supervision (eg InferSent [Conneau et al’17]).

Linearization principle: before reaching for a fancy learnt model, first figure out what the linear methods can do….

Cottage industry of text embeddings that’re linear: “word embeddings + linear algebra.”

SIF embeddings (inspired by word2vec CBOW): a weighted average of the word vectors, with weights estimated from the dataset (“semi-supervised”), plus denoising using the top singular vector….

Performance (similarity/entailment tasks): SIF is competitive with LSTM-based embeddings much of the time! Important: the downstream classifier should be simple (eg linear!). Perhaps the embedding must capture all/most of the information in the text? (e.g., Bag-of-Words info)

Word extraction out of linear embeddings ⇔ sparse recovery

Recall the sum-of-words embedding: the text’s vector is Ax, where the columns of A are the word embeddings vw and x is the (sparse) bag-of-words vector. Recovering sparse x given Ax ≍ “Compressed Sensing” (aka “Sparse recovery”) [Donoho06, Candes-Romberg-Tao06]. Do-able if A satisfies “RIP”/“Random”/“Incoherence”. (Unfortunately none are satisfied by the matrix of GloVe embeddings..)

Empirically, the bag-of-words is still recovered well from the embedding (recovery algorithm: Basis Pursuit). Exact recovery by itself doesn’t imply the embedding is good for classification via a linear classifier…. But compressed-sensing theory also suggests a linear classifier on the compressed vector Ax is essentially as good as one on x, so such embeddings do well on downstream tasks! (Even provably so, under some compressed sensing type conditions.) [Some inspiration from [Plate 95] [Kanerva’09]]
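A minimal sketch of word extraction by sparse recovery. Since (per the slide) GloVe vectors don't provably satisfy RIP, this uses random stand-in "word embeddings", and Orthogonal Matching Pursuit as a simpler substitute for Basis Pursuit; vocabulary size, dimension, and document length are all invented:

```python
import numpy as np

# Random stand-ins for word embeddings: columns of A.
rng = np.random.default_rng(0)
vocab, dim = 1000, 300
A = rng.standard_normal((dim, vocab)) / np.sqrt(dim)

# A 10-word "document": sparse bag-of-words x, text embedding v = A x.
words = rng.choice(vocab, size=10, replace=False)
x = np.zeros(vocab)
x[words] = 1.0
v = A @ x

# Orthogonal Matching Pursuit: greedily add the word whose vector best
# correlates with the residual, then least-squares refit on the support.
support, residual = [], v.copy()
for _ in range(10):
    support.append(int(np.argmax(np.abs(A.T @ residual))))
    coef, *_ = np.linalg.lstsq(A[:, support], v, rcond=None)
    residual = v - A[:, support] @ coef

print(sorted(support))  # recovers the planted word indices w.h.p.
```

With dim = 300 and only 10 active words, the random-matrix regime is easy for greedy recovery; the interesting empirical finding on the slide is that recovery still works with real GloVe columns.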

More powerful linear embeddings: Distributed Co-occurrence (DisC) Embeddings [A, Khodak, Saunshi, Vodrahalli ICLR’18]

Embed n-grams as well as words (bi-gram = word pairs, tri-gram = word triples..): vector for bigram (w, w’) = vw ⊙ vw’ (entrywise product).

[Khodak, Saunshi, Liang, Ma, Stewart, A. ACL’18]: induce embeddings for new words/features using linear regression. (Induced embeddings useful in other NLP tasks too!)
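The entrywise-product bigram construction in code. The vectors here are random placeholders for pretrained embeddings, and the concatenated unigram+bigram document vector is one plausible way to combine them, not the exact DisC pipeline (which also involves weighting/normalization):

```python
import numpy as np

# Hypothetical pretrained word vectors (stand-ins for GloVe vectors).
rng = np.random.default_rng(0)
dim = 300
v = {w: rng.standard_normal(dim) for w in ["deep", "learning", "theory"]}

def bigram_embedding(w1, w2):
    # DisC-style bigram vector: entrywise product of the two word vectors.
    return v[w1] * v[w2]

# One simple document embedding: concatenate the sum of word vectors
# with the sum of bigram vectors.
doc = ["deep", "learning", "theory"]
uni = sum(v[w] for w in doc)
bi = sum(bigram_embedding(a, b) for a, b in zip(doc, doc[1:]))
emb = np.concatenate([uni, bi])
print(emb.shape)  # (600,)
```

Everything is linear algebra on fixed word vectors: no training of an encoder is needed to add n-gram information.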

Comparison with state-of-the-art text embeddings on a suite of downstream classification tasks (baselines include Bag of n-grams).

Aside: another illustration of the linearization principle, in RL [Mania, Guy, Recht’18]: linear policies plus simple Random Search beat state-of-the-art deep RL on some standard RL tasks.

Wrapping up…

What to work on (suggestions for theorists):

1. Use Physics/PDE insights, such as calculus of variations (Lagrangians, Hamiltonians, etc.).
2. Look at unsupervised learning. (Yes, everything is NP-hard and new, but that’s how theory grows.) (Currently very little.)
3. Interactive learning (of language, skills, etc.). Both theory and applied work here seem to be missing some basic idea. (Theory focuses on simple settings like linear classifiers/clustering.)

(“The revolution will not be supervised.”)


Concluding thoughts

• Best theory will emerge from engaging with real data and real deep net training. (Nonconvexity and attendant complexity seem to make armchair theory less fruitful.) Insights found this way can later be simplified.

“…es gibt kein Ignorabimus.” — D. Hilbert (“there is no ‘we shall not know’”)

Advertisements

• Deep learning theory tutorial: http://unsupervised.cs.princeton.edu/deeplearningtutorial.html
• Special program at the IAS, 2019-20: http://www.math.ias.edu/sp/
• Resources: www.offconvex.org
• Course: http://www.cs.princeton.edu/courses/archive/fall17/cos597A/

Extra slides..

Convergence to global optimum: Measure of progress

Update: zs+1 = zs − 𝜂 gs, where gs need NOT be a gradient! (Also, the starting point may be special.) Show: the direction −gs is substantially correlated with the best direction one could take, namely z* − zs, i.e. ⟨gs, zs − z*⟩ ≥ α |zs − z*|². (A simple proof shows this implies quick convergence.)
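A numerical sketch of this principle, with an update direction that is deliberately not a gradient: it is built to satisfy the correlation condition exactly (the dimension, α, and step size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10
z_star = rng.standard_normal(d)   # the (unknown) global optimum
z = rng.standard_normal(d)

eta, alpha = 0.1, 0.5
for _ in range(200):
    e = z - z_star
    # g is NOT a gradient: alpha*(z - z*) plus a bounded component
    # orthogonal to it, so <g, z - z*> >= alpha * |z - z*|^2 holds
    # exactly (the correlation condition on the slide).
    noise = rng.standard_normal(d)
    noise -= (noise @ e) / (e @ e) * e
    g = alpha * e + 0.5 * np.linalg.norm(e) * noise / np.linalg.norm(noise)
    z = z - eta * g

print(np.linalg.norm(z - z_star))  # shrinks geometrically toward 0
```

Expanding |z − 𝜂g − z*|² shows the distance contracts by the factor 1 − 2𝜂α + 𝜂²|g|²/|e|² each step, so any small enough 𝜂 gives geometric convergence regardless of where g came from.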
