Gentoo Forums
Deep Learning Help
Bigun
Advocate

Joined: 21 Sep 2003
Posts: 2147

Posted: Tue Jun 12, 2018 5:57 pm    Post subject: Deep Learning Help

I'm unsure how deep this will go, but here I go.

I'm a seasoned programmer in several languages. I now have a need to code something that will recognize types/breeds of animals.

Now I'm aware there are some APIs available, but I wouldn't learn much beyond, "Hello API - here are 1,000 samples of what I want you to learn. Now, can you tell me if this image matches?"

I'd like to code this kind of thing myself; the only requirement is that the language be Linux compatible.

That said, any tutorials I can go over to learn this? Or would this kind of thing be too much for one person to handle and I will need some prewritten AI? If so, any (hopefully free) suggestions?
_________________
"It's ok, they might have guns but we have flowers." - Perpetual Victim

krinn
Watchman

Joined: 02 May 2003
Posts: 6816

Posted: Tue Jun 12, 2018 7:17 pm

You have all the languages that exist on Linux (maybe not quite all, but I'll safely assume all, because I don't care about the odd obscure language).
I wouldn't worry about which languages Linux can handle.

Bigun
Advocate

Joined: 21 Sep 2003
Posts: 2147

Posted: Tue Jun 12, 2018 7:27 pm

krinn wrote:
You have all the languages that exist on Linux (maybe not quite all, but I'll safely assume all, because I don't care about the odd obscure language).
I wouldn't worry about which languages Linux can handle.


I was more referring to Dot NET. :wink:
_________________
"It's ok, they might have guns but we have flowers." - Perpetual Victim

John R. Graham
Administrator

Joined: 08 Mar 2005
Posts: 10091
Location: Somewhere over Atlanta, Georgia

Posted: Tue Jun 12, 2018 8:58 pm

With the understanding that I am mostly "Google Knowledgeable" on this topic, check out sci-libs/tensorflow. There's an O'Reilly book, too. Key phrases for your future searches are "Neural Network" and "Machine Learning".

- John
_________________
I can confirm that I have received between 0 and 499 National Security Letters.

Naib
Watchman

Joined: 21 May 2004
Posts: 5434
Location: Removed by Neddy

Posted: Tue Jun 12, 2018 9:03 pm

Bigun wrote:
krinn wrote:
You have all the languages that exist on Linux (maybe not quite all, but I'll safely assume all, because I don't care about the odd obscure language).
I wouldn't worry about which languages Linux can handle.


I was more referring to Dot NET. :wink:
or matlab-script :) They have a really good deep learning toolbox, but the source is not available, just the API and help :)


Look into TensorFlow
_________________
The best argument against democracy is a five-minute conversation with the average voter
Great Britain is a republic, with a hereditary president, while the United States is a monarchy with an elective king

szatox
Veteran

Joined: 27 Aug 2013
Posts: 1698

Posted: Tue Jun 12, 2018 9:30 pm

That's an interesting topic. Looks like a good task for an ANN, which in turn calls for GPU's horsepower.
Quote:
That said, any tutorials I can go over to learn this? Or would this kind of thing be too much for one person to handle and I will need some prewritten AI? If so, any (hopefully free) suggestions?
There are quite a lot of papers online.
They point out that it's very important to pick the right type of network for the task.
They point out that it's extremely important to pick a "good" data representation - something simple enough you can work on it.
They point out that training an ANN takes a lot of time and a lot of data, and it only gets worse as you make bigger and bigger networks (also, an oversized ANN will memorize the training material instead of learning patterns, so bigger networks need more samples before training becomes effective). Unfortunately, I haven't found any information on sizing your network, besides that it should be "just enough to do the job". Not a single hint on how to estimate what is "enough" in any particular case.

An interesting fact is that a neural network does not necessarily have to be of a single type. You can use many types together, so they can e.g. process data at different stages.
Bonus: what sort of data do you have? Is it normalized in some way? Do you have a separate set for testing results after training?
There was a pretty famous failure when some army tried to teach an ANN to distinguish real tanks from decoys. It worked perfectly fine until they tested it in the field. It turned out that the photos of tanks in their training set were taken on a different day than the photos of decoys, and their neural network had learnt to recognize weather conditions.
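
On the "separate set for testing" question, a minimal sketch of holding back part of the data before training, so that results are measured on samples the network has never seen. The file names, labels and the 80/20 ratio are made up for illustration.

Code:
```python
import random

def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of samples and split it into (train, test)."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical labelled samples: (file name, label)
samples = [("img_%03d.jpg" % i, i % 2) for i in range(100)]
train, test = train_test_split(samples)
print(len(train), len(test))
```

Shuffling before splitting matters for exactly the reason in the tank story: if the split follows the order the photos were taken in, the test set can differ from the training set in ways that have nothing to do with the subject.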

Akkara
Administrator

Joined: 28 Mar 2006
Posts: 6512
Location: &akkara

Posted: Tue Jun 12, 2018 10:51 pm

A lot of this stuff is all open-source.

Good toolkits are TensorFlow + Python, as is the Caffe stuff out of UC Berkeley. I compiled the latter from source and it works well. It needs CUDA + Nvidia graphics card(s) if you have a lot of data to run through, or need to do more than a trivial amount of training. (For reference, you need on the order of a billion arithmetic ops per recognition, and about 3x that per image per 'epoch' if you're training, for ~256px-square images.) There's also OpenCV, which is a toolkit of older but still useful computer-vision algorithms as well as the learning stuff.
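
To put those rough figures into numbers, a back-of-the-envelope sketch. The dataset size, epoch count and the sustained device throughput are assumptions picked only for illustration.

Code:
```python
OPS_PER_INFERENCE = 1e9     # "on the order of a billion" ops per recognition
TRAIN_FACTOR = 3            # about 3x that per image per epoch when training

def training_ops(num_images, epochs):
    """Rough total arithmetic ops for one training run."""
    return num_images * epochs * TRAIN_FACTOR * OPS_PER_INFERENCE

# Assumed: 2,000 images, 50 epochs, a device sustaining 1e12 ops per second
total = training_ops(2000, 50)
seconds = total / 1e12
print("%.1e ops, about %.0f s at 1e12 ops/s" % (total, seconds))
```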

If you need stand-alone, a number of manufacturers make kits and evaluation boards geared to that. Nvidia's "Jetson" board comes to mind, which runs a Linux distribution. If your ultimate goal mostly involves inference, there are other options, including some very low-powered ones, such as the Movidius stuff (I think they are owned by Intel now, if I recall). You can get cameras and stuff for a Raspberry Pi too. Google's "AIY" is an add-on accelerator board for the Raspberry that can do visual recognition.

To get up to speed in the field, search for Geoffrey Hinton of U.Toronto. He is one of the pioneers and has placed some good introductory material on youtube. I highly recommend to become at least somewhat familiar with the fundamentals at the lowest levels even if you end up working at a higher level. Doing so will give a better idea of the kinds of things that can go wrong in training and how to avoid them, as well as a better idea of how to set up your networks.

Generally the way one starts is with a pre-trained network, which you can then train (typically just the last 2-3 stages) to re-target it to your application. Check out the "ImageNet" competition. Lots of publicly available networks and coefficients to get you started. Something like GoogLeNet or later from that competition (search for the papers) is an excellent starting point.
_________________
Humility means having to eat less crow when you are shown to be wrong.

Bigun
Advocate

Joined: 21 Sep 2003
Posts: 2147

Posted: Fri Jun 15, 2018 1:56 pm

szatox wrote:
...
Bonus: what sort of data do you have? Is it normalized in some way? Do you have a separate set for testing results after training?
...


Odd scenario, but geese... specifically, Canadian Geese.

Going to build a prototype device to detect and track these geese on a webcam, then shoo them off with a waterhose or something.
_________________
"It's ok, they might have guns but we have flowers." - Perpetual Victim

szatox
Veteran

Joined: 27 Aug 2013
Posts: 1698

Posted: Fri Jun 15, 2018 6:29 pm

That's really tough now. You'll have to detect some area with motion (this takes time), probably crop and scale it into a "standard" size input image, then run it through whatever you use for recognition (which takes time), and then trigger your water hose (which also needs some time to fire).
Even if you aim in advance and wait for your target to get into your showering zone rather than following it, you can expect the whole process to give you a few seconds' delay. Doing it in real time is... Well, hard.
I'd skip all the coding and go with a good, old-fashioned laser beam trigger instead (or a tripwire... Or anything "physical" in nature actually). It would surely flood the area with false positives, but this doesn't seem like a big sacrifice.

Training your own ANN is still a really cool idea. I just don't think it's going to do the job for you. Maybe motion alone would provide you with sufficient information. Maybe it would already be too laggy.

n17r4m
n00b

Joined: 22 Jan 2018
Posts: 19

Posted: Fri Jun 15, 2018 9:38 pm

Machine learning researcher here;

What you are trying to do is actually considered fairly trivial nowadays, but requires at least half a decade of experience such that it becomes trivial. Sorry about that, but it is a fun journey, especially if you really like math/calculus. Anyways, if I were approaching this problem myself, I would probably take a nice juicy pre-trained model, maybe something like resnet-50 or vgg-19 trained on ImageNet, or whatever, and fine-tune the last layer or two with your 1,000 sample points + 1,000 negative samples: a simple yes-goose no-goose classification task. You should pretty quickly get up to ~95%+ accuracy. Guesstimating here... you'll probably need 5-10,000+ samples to get up to 98-99% accuracy. If you are trying to find out where in a frame the goose is, you can use more advanced ANNs such as Faster-RCNN: https://arxiv.org/abs/1506.01497
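
The idea of fine-tuning only the last layer can be sketched without any framework. This is a toy stand-in, not resnet-50: frozen_features is a hypothetical placeholder for the pre-trained network body, the 2-D points and labels are made up, and only the small logistic layer on top is trained.

Code:
```python
import math
import random

def frozen_features(x):
    """Stand-in for a pre-trained network body; it is never updated."""
    s = x[0] + x[1]
    return [max(0.0, s), max(0.0, -s)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

rng = random.Random(0)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]
labels = [1.0 if p[0] + p[1] > 0 else 0.0 for p in points]   # made-up yes/no labels
feats = [frozen_features(p) for p in points]                 # computed once, then reused

w, b, lr = [0.0, 0.0], 0.0, 0.5      # the only trainable parameters: the new last layer
losses = []
for epoch in range(200):
    grad_w, grad_b, loss = [0.0, 0.0], 0.0, 0.0
    for f, y in zip(feats, labels):
        p = sigmoid(w[0] * f[0] + w[1] * f[1] + b)
        p = min(max(p, 1e-10), 1 - 1e-10)                    # avoid log(0)
        loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
        err = p - y                                          # loss gradient at the pre-sigmoid value
        grad_w[0] += err * f[0]
        grad_w[1] += err * f[1]
        grad_b += err
    n = len(feats)
    losses.append(loss / n)
    w = [w[0] - lr * grad_w[0] / n, w[1] - lr * grad_w[1] / n]
    b -= lr * grad_b / n

print("loss: %.3f -> %.3f" % (losses[0], losses[-1]))
```

Because the frozen part never changes, its outputs can be computed once and cached, which is a large part of why this style of training is so much cheaper than training a whole network.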

The caveat, however, is that you will need a fairly high-power GPU (200W+ system power, which might get expensive); otherwise, with CPU inferencing, it may take several seconds or more to process a frame, which may be too much delay, as szatox mentioned. Not to mention that training will take days/weeks instead of minutes/hours.

If you really want to go down the ANN rabbit hole though, HBO actually built a fairly similar system to what I think you are looking for, in the form of a fun HotDog/Not-HotDog app (If you haven't watched Silicon Valley, maybe consider it, it's great), but the really cool thing is they laid out how they built it using tensorflow, keras, etc. Check out:

https://medium.com/@timanglade/how-hbos-silicon-valley-built-not-hotdog-with-mobile-tensorflow-keras-react-native-ef03260747f3

Edit: I just saw another post that briefly explains the steps on solving Where's Waldo using a Mask-rcnn that might help get some gears turning:
http://lifepluslinux.blogspot.com/2018/06/finding-wheres-waldo-using-mask-r-cnn.html

Bigun
Advocate

Joined: 21 Sep 2003
Posts: 2147

Posted: Mon Jun 18, 2018 7:26 pm

n17r4m wrote:
Machine learning researcher here;
...


I may be asking permission to talk with you at some point, as I have watched several hours of tutorials, and began to read many articles (most of which went over my head).

From what I've gathered, it seems most of the learning will happen by doing, then maybe the terms used in machine learning will start to become commonplace. Also, it seems python is the language of choice when it comes to machine learning. Are both of these correct assumptions?
_________________
"It's ok, they might have guns but we have flowers." - Perpetual Victim

n17r4m
n00b

Joined: 22 Jan 2018
Posts: 19

Posted: Mon Jun 18, 2018 9:35 pm

No problemo.

As for your questions, yeah, learning by doing goes a long way, and python is super common in ML (Lua, C++, and MATLAB are also somewhat common). I was fortunate enough to have started off by taking a class where we developed various neural network architectures and layers (i.e., affine transforms, ReLU, convolutions, max pool, drop-out, batch normalization) from scratch using numpy, and trained them using various toy datasets (MNIST, CIFAR-10). It really helped cement a lot of the basic concepts (back propagation, optimizers, regularization).

I'm not necessarily saying anyone needs to follow this same path to become proficient at building ANNs, and surely jumping right into keras or pytorch, one can get some pretty amazing results very quickly. However these abstractions are a bit leaky, and it's sometimes hard to get a feel for why/why-not something is happening. For example, it's generally important to rescale/transform/whiten/normalize your input feature vectors to a 0-1 range, but the NN will gladly accept, say, raw image 0-255 ranged input and the only indicator that something is wrong is the opaque fact that training loss is slow to lower and validation accuracy is poor, which could be caused by 100+ other things.
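
A minimal sketch of the input rescaling mentioned above, in plain Python. The pixel values are made up, and standardize is just one common alternative to simple 0-1 scaling, not the only option.

Code:
```python
def scale_pixels(pixels):
    """Map raw 0-255 pixel values to the 0-1 range."""
    return [p / 255.0 for p in pixels]

def standardize(values):
    """Shift to zero mean and scale to unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    std = var ** 0.5 or 1.0          # guard against a constant input
    return [(v - mean) / std for v in values]

raw = [0, 64, 128, 255]              # made-up pixel values
scaled = scale_pixels(raw)
print(scaled)
print(standardize(raw))
```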

Anyways, if you have any particular questions or things you would like a plainly worded explanation for, please feel free to hit me up either by a reply here, PM, or email.

Edit: found a really good intro slide deck, it's for caffe, but at a glance, all the topics are framework agnostic.
https://docs.google.com/presentation/d/1HxGdeq8MPktHaPb-rlmYYQ723iWzq9ur6Gjo71YiG0Y

Bigun
Advocate

Joined: 21 Sep 2003
Posts: 2147

Posted: Wed Jun 20, 2018 5:14 pm

Finding out I'm missing some basic calculus math skills in order to learn this stuff. Going over geometric and arithmetic summations before going any further.

Crazy part is, I've been doing this kind of stuff for years and didn't even know it!
_________________
"It's ok, they might have guns but we have flowers." - Perpetual Victim

Bigun
Advocate

Joined: 21 Sep 2003
Posts: 2147

Posted: Wed Jun 20, 2018 6:39 pm

So, I leave it to you if we make this private or keep it here in case it helps someone else, but I'm having some issues executing some code from a tutorial page (section 2.1): http://adventuresinmachinelearning.com/python-tensorflow-tutorial/

I run the code and I get an error:

Code:
bigun@meeseeks ~/dl/first $ python2.7 first.py
Traceback (most recent call last):
  File "first.py", line 25, in <module>
    a_out = sess.run(a, feed_dict={b: np.arange(0, 10)[:, np.newaxis]})
NameError: name 'np' is not defined


I've read through the tutorial thoroughly, and I don't believe I've missed anything. I understand what the code is trying to do, I just think the tutorial (or maybe me) missed something.

*edit*

The tutorial did miss something. Apparently there's a numpy library that needed to be imported. Source
_________________
"It's ok, they might have guns but we have flowers." - Perpetual Victim


Last edited by Bigun on Wed Jun 20, 2018 7:06 pm; edited 2 times in total

bec
Apprentice

Joined: 30 Sep 2004
Posts: 212
Location: Cali - Colombia

Posted: Wed Jun 20, 2018 7:05 pm

add:

Code:
import numpy as np

_________________
abe

szatox
Veteran

Joined: 27 Aug 2013
Posts: 1698

Posted: Wed Jun 20, 2018 7:31 pm

Bigun, keep going here. It's an interesting topic and I expect it to become a solid base of links, tips, documents etc. Would be really great to have it all in one place.

Bigun
Advocate

Joined: 21 Sep 2003
Posts: 2147

Posted: Thu Jun 21, 2018 4:25 pm

@everyone
There appears to be some legitimate interest in this subject in the forum, so I guess we'll keep the topic in the thread.

@n17r4m
I've run through the entire tutorial and have a functional ANN running in python, but to be honest I got lost about halfway through and just began to read "words" without fully understanding what the article meant. I continued anyway, with the purpose of just making sure the ANN and the tensorflow software were working on the machine - and they are. So mission accomplished there.

In the spirit of actually understanding what the frack I was doing, I went to an article that the tutorial pointed out as sort of a prerequisite before tackling the python example. I understood the first bit until I hit the term activation function.

Now before you bother explaining this function to me, I wanted to state I have no formal training in calculus. I've only dealt with algebra, geometry, proofs, and (of course) extensive amounts of programming (PHP, C#, HTML, SQL, etc, etc). I started digging into what sigma symbols meant, as I had seen them floating around countless examples - and after checking them out, I find I've been doing them (kind of) in "for loops" in programming. So basically, after filling in that knowledge gap, I had a much better grasp on how to calculate the formulas manually - but that is as far as my math training goes.

That said, I've never heard of an activation function before (section 2.1), and the tutorial talks as if it needed to be second nature and never explains the "why" behind the use of the formula. Is this a case of, "trust us, this is used all the time - move on" or "yeah, you've seen this kind of problem before, and you know to use this function, so we won't explain it further"? Do I need to actually go through an online course for calculus or something else before continuing? Also, running the code in section 2.1 gives me an error that it can't find the matplotlib.pylab module.
_________________
"It's ok, they might have guns but we have flowers." - Perpetual Victim

n17r4m
n00b

Joined: 22 Jan 2018
Posts: 19

Posted: Thu Jun 21, 2018 6:56 pm

Indeed, the sigma symbol (capital Greek S) is very common in mathematics, and it's simply the "sum" operator.

So, suppose v is the array (or vector, or matrix, or tensor) = [1,2,3,4]

Then Σ v == 10 (1 + 2 + 3 + 4)

You may also see the product operator (capital greek P) ∏ around, which is like the sigma, but for multiplication.

So ∏ v == 24 ( 1 * 2 * 3 * 4 )

Sometimes the Σ or ∏ symbols have numbers or variables above/below them. The bottom initializes variables, and the top declares stopping conditions.
See: https://en.wikipedia.org/wiki/Summation
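
Since Σ and ∏ map directly onto for loops, the two examples above can be written out like this (a small sketch; the bounded sum at the end is an extra illustration, not from the tutorial):

Code:
```python
v = [1, 2, 3, 4]

# Sigma: add up every element
total = 0
for x in v:
    total += x

# Pi: multiply every element together
product = 1
for x in v:
    product *= x

print(total, product)          # 10 24

# With bounds: bottom says i starts at 1, top says stop at 4
squares = sum(i * i for i in range(1, 5))
print(squares)                 # 30
```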

Now, as for activation functions. These are the bad boys that you put between the weight+bias layers to introduce non-linearity in your model. Common choices include

ReLU (Rectified linear unit) - basically: if x > 0 return x else return 0 for each element in some vector
Leaky ReLU : if x > 0 return x else return 0.1*x

Many others exist as well, including but not limited to: tanh, elu, selu, sigmoid, softmax, softplus, etc..
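
Written out as plain Python, acting element by element on a list, a few of these might look as follows (a sketch; the 0.1 slope is the one used above, and other values are common too):

Code:
```python
import math

def relu(xs):
    return [x if x > 0 else 0.0 for x in xs]

def leaky_relu(xs, slope=0.1):
    return [x if x > 0 else slope * x for x in xs]

def sigmoid(xs):
    return [1.0 / (1.0 + math.exp(-x)) for x in xs]

def tanh(xs):
    return [math.tanh(x) for x in xs]

v = [-2.0, -0.5, 0.0, 1.5]
print(relu(v))          # [0.0, 0.0, 0.0, 1.5]
print(leaky_relu(v))    # [-0.2, -0.05, 0.0, 1.5]
```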

So, a toy Perceptron NN classifier pipeline with two hidden layers may look like:

Input -> DenseLayer -> ReLU -> DenseLayer -> ReLU -> DenseLayer -> SoftMax -> Output

Where each DenseLayer is super vanilla: just a weight matrix and a bias vector.

A = Input * DenseLayer1_W + DenseLayer1_B
B = ReLU(A)
C = B * DenseLayer2_W + DenseLayer2_B
D = ReLU(C)
E = D * DenseLayer3_W + DenseLayer3_B
Output = SoftMax(E)
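
The same pipeline as a runnable sketch in plain Python. The layer sizes (4 -> 5 -> 5 -> 3), the random untrained weights and the input vector are all made up for illustration, so the output is meaningless apart from its shape.

Code:
```python
import math
import random

def dense(x, W, b):
    """x * W + b for a row vector x, a weight matrix W with len(x) rows, and a bias b."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]

def relu(v):
    return [max(0.0, a) for a in v]

def softmax(v):
    m = max(v)                                  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in v]
    s = sum(exps)
    return [e / s for e in exps]

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (untrained)."""
    W = [[rng.gauss(0.0, 0.03) for _ in range(n_out)] for _ in range(n_in)]
    return W, [0.0] * n_out

rng = random.Random(0)
W1, B1 = init_layer(4, 5, rng)
W2, B2 = init_layer(5, 5, rng)
W3, B3 = init_layer(5, 3, rng)

Input = [0.2, 0.7, 0.1, 0.9]        # made-up feature vector
A = dense(Input, W1, B1)
B = relu(A)
C = dense(B, W2, B2)
D = relu(C)
E = dense(D, W3, B3)
Output = softmax(E)
print(Output)
```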


This stack overflow answer is a fairly gentle introduction,
https://stackoverflow.com/questions/9782071/why-must-a-nonlinear-activation-function-be-used-in-a-backpropagation-neural-net
Quote:
without a non-linear activation function in the network, a NN, no matter how many layers it had, would behave just like a single-layer perceptron, because summing these layers would give you just another linear function


The gist is that without the activation function, the network simply becomes a series of linear matrix multiplications, which by definition is also linear. The choice of activation function is pretty arbitrary actually, and its only purpose is to make the transition of data through layers non-linear. Seriously, people have tried all sorts of wacky stuff, including really weird stuff like sine, sqrt, 1/x, etc. In practice ReLU or Leaky ReLU works pretty well and is very fast for both forward and back passes. If you need output that includes negative values, consider tanh. Generally you want to avoid sigmoid, despite it being fairly prevalent in early ANN literature. The real magic, though, is that regardless of which specific activation function you choose, simply adding in non-linear transitions between the layers allows ANNs to tease out patterns and solve highly non-linear problems (e.g., taking image input and mapping it to a category).

In case you're wondering, linear here means that the operation can be expressed in the form y = Wx + b, where y, x, and b are vectors, and W is a matrix. (Strictly speaking, the b term makes this an affine map, but "linear" is the usual shorthand.) This is an extension of the equation of a line, y = m*x + b, to n dimensions. Non-linear means it cannot be expressed in such a form.
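
The collapse of stacked linear layers can be checked by hand. In this sketch the weights, biases and input are arbitrary example numbers: two layers with nothing in between give exactly the same result as one layer with W = W2*W1 and b = W2*b1 + b2, and putting a ReLU in between breaks that equivalence.

Code:
```python
def matvec(W, x):
    return [sum(w * a for w, a in zip(row, x)) for row in W]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def relu(v):
    return [max(0.0, a) for a in v]

# Arbitrary example numbers
W1, b1 = [[1.0, 2.0], [0.0, -1.0]], [0.5, 1.0]
W2, b2 = [[2.0, 1.0], [1.0, 3.0]], [-1.0, 0.0]
x = [3.0, -2.0]

# Two layers applied one after the other, no activation in between
two_layers = add(matvec(W2, add(matvec(W1, x), b1)), b2)

# The same thing as a single layer
W = matmul(W2, W1)
b = add(matvec(W2, b1), b2)
one_layer = add(matvec(W, x), b)
print(two_layers, one_layer)        # [1.0, 8.5] [1.0, 8.5]

# With a ReLU in between, the single layer no longer matches
with_relu = add(matvec(W2, relu(add(matvec(W1, x), b1))), b2)
print(with_relu)                    # [2.0, 9.0]
```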

I'll also just tack on here that to make the network actually do something useful, it needs to be trained. Simply feed in inputs, and then find out how 'wrong' the network is at the task. (This is the "Loss" function, of which there are also many. RMSE or L2 are pretty common.) The goal is to "back propagate" values through the network so that all the numbers in the weights and biases become a bit less wrong for that given input. This is the core of "stochastic gradient descent": the art of just showing a network a helluva lot of examples of the mapping you are trying to get it to learn, and making the net perform a little less badly on each example every time.
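
Stripped down to a "network" with a single weight, that training loop can be sketched as below. The data (the mapping y = 2 * x), the learning rate and the epoch count are made up; a real network does the same thing for every weight and bias at once, with back propagation supplying the gradients.

Code:
```python
import random

# Made-up training data for the mapping y = 2 * x
data = [(x, 2.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]

w = 0.0                 # single weight, starts out "wrong"
learning_rate = 0.1
rng = random.Random(0)

for epoch in range(50):
    rng.shuffle(data)                       # "stochastic": one example at a time, random order
    for x, target in data:
        prediction = w * x
        error = prediction - target
        gradient = 2.0 * error * x          # d/dw of (w*x - target)**2
        w -= learning_rate * gradient       # small step against the gradient

print(round(w, 4))      # close to 2.0
```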

A last bit of terminology: the "optimizer" handles controlling the magnitude of the update to the weights and biases. Often it may implement "momentum": that is, if many updates during training are all making the weights and biases move in a certain direction, acceleration towards that direction may occur.
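
A sketch of what momentum means in code, using the classical formulation (one of several variants). The learning rate, momentum factor and the constant gradient of 1.0 are assumptions for illustration.

Code:
```python
def sgd_momentum_step(w, velocity, gradient, learning_rate=0.1, momentum=0.9):
    """One classical-momentum update; returns the new (w, velocity)."""
    velocity = momentum * velocity - learning_rate * gradient
    return w + velocity, velocity

# The same gradient several times in a row: each step is longer than the last
w, velocity = 0.0, 0.0
steps = []
for _ in range(5):
    new_w, velocity = sgd_momentum_step(w, velocity, gradient=1.0)
    steps.append(abs(new_w - w))
    w = new_w

print(steps)
```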

Yikes!, starting to touch upon a few advanced ideas here. If anyone needs clarification on any of the above, please don't hesitate to ask. It's possible to get working ANN's without calculus knowledge, but it really does help. Especially if you are trying to build one from scratch. Fortunately, frameworks such as PyTorch and TensorFlow perform automatic differentiation, so most of the calculus is hidden from you. Some knowledge of linear algebra may be required though.

As for that matplotlib.pylab error. Do you have matplotlib installed? It should be available via dev-python/matplotlib or pip.


Last edited by n17r4m on Sat Jun 23, 2018 2:11 am; edited 3 times in total

Bigun
Advocate

Joined: 21 Sep 2003
Posts: 2147

Posted: Fri Jun 22, 2018 3:36 pm

@everyone

I highly suggest watching 3b1b's video series on machine learning (starting here). With that under my belt, I understand a *lot* more of the code and why it's done the way it's done.

@n17r4m

I've cemented quite a few more terms in machine learning: cost, weight, bias, activation

By association, I understand a lot more of what the code is doing. Now for some clarification.

Using the code from the previous example again:

Section 3.1:

Q1)
Code:
W1 = tf.Variable(tf.random_normal([784, 300], stddev=0.03), name='W1')


Is this setting up the initial weights at random before machine learning? Is the stddev argument used to declare a range from -0.03 to 0.03?

Q2)
Code:
hidden_out = tf.nn.relu(hidden_out)


Just to clarify - the ReLU function was decided on based on the type of output desired, correct? Also ReLU is a type of sigmoid function? Same question - Softmax?

Q3)
Code:
y_clipped = tf.clip_by_value(y_, 1e-10, 0.9999999)


This part I get, (log(0) = bad things) so convert to extremely low number. However:

Code:
cross_entropy = -tf.reduce_mean(tf.reduce_sum(y * tf.log(y_clipped) + (1 - y) * tf.log(1 - y_clipped), axis=1))


I'm lost, what is cross entropy and how does it apply to machine learning? What's happening here? If it was explained in one of 3b1b's videos, I missed it.

Overall:

Q1)
So I get this example is creating a brand new, fresh ANN, training it, and showing the resulting cost. The overall value of the code only seems to serve the purpose of showing the end user how the code works. But in practical application, I would imagine the weights and biases would be the important part of the learning process, at least until your cost is as low as you can get it. Then what? The code never shows you how to actually save and re-use these weights and biases for predictions in the future. How would you save these for future use?

Q2)
One of the use-cases that was mentioned (and actually has current relevance in my life) is property values. This example has 10 outputs between 0 and 1. If I wanted 1 output with values between 0 and n, I assume I would use ReLU instead of Softmax?
_________________
"It's ok, they might have guns but we have flowers." - Perpetual Victim

n17r4m
n00b

Joined: 22 Jan 2018
Posts: 19

Posted: Fri Jun 22, 2018 8:03 pm

Bigun wrote:

Code:
W1 = tf.Variable(tf.random_normal([784, 300], stddev=0.03), name='W1')

Is this setting up the initial weights at random before machine learning? Is the stddev argument used to declare a range from -0.03 to 0.03?


That's almost exactly it, although it isn't quite "a range from -0.03 to 0.03". It randomly initializes the initial weights using a normal, a.k.a. gaussian distribution, https://en.wikipedia.org/wiki/Normal_distribution so there will be some values lying outside of +/- 0.03, with about 68% of values drawn lying between -0.03 and 0.03, and about 95% being between -0.06 and 0.06.
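
Those percentages are easy to check empirically. This sketch uses Python's random.gauss as a stand-in for tf.random_normal; the sample count and seed are arbitrary.

Code:
```python
import random

rng = random.Random(0)
stddev = 0.03
weights = [rng.gauss(0.0, stddev) for _ in range(100000)]

within_1 = sum(abs(w) <= stddev for w in weights) / len(weights)
within_2 = sum(abs(w) <= 2 * stddev for w in weights) / len(weights)
print("within 1 stddev: %.3f, within 2 stddev: %.3f" % (within_1, within_2))
```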


Bigun wrote:

Code:
hidden_out = tf.nn.relu(hidden_out)

Just to clarify - the ReLU function was decided on based on the type of output desired, correct? Also ReLU is a type of sigmoid function? Same question - Softmax?


ReLU (Rectified linear unit) is defined as taking a vector x (hidden_out in your case), and returning a new vector where any negative values are clipped to 0.

Sigmoid is a squashing function, that will squeeze an input vector such that all returned elements are between 0 and 1.

Softmax is a function that normalizes the elements of an input vector such that the sum of the elements equals 1. Useful as the final activation layer when performing classification, as it roughly corresponds to category confidence as a percent. E.g., if you have an image classifier and categories: [Cat, Dog, Bird] and the penultimate layer outputs: [3,4,1], softmax maps this to roughly [0.26, 0.71, 0.04], or a 71% chance the image is a Dog, according to the net.
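
The [Cat, Dog, Bird] example can be reproduced in a few lines (a sketch of the standard softmax formula, exp of each score divided by the sum of the exps):

Code:
```python
import math

def softmax(scores):
    m = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([3, 4, 1])                   # scores for [Cat, Dog, Bird]
rounded = [round(p, 2) for p in probs]
print(rounded)                               # [0.26, 0.71, 0.04]
```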

All three of these are "activation functions" that can introduce non-linearity to your network. Another notable one is ELU and its cousin, SELU, which are "Exponential linear units."

Bigun wrote:

Code:
y_clipped = tf.clip_by_value(y_, 1e-10, 0.9999999)


This part I get, (log(0) = bad things) so convert to extremely low number. However:

Code:
cross_entropy = -tf.reduce_mean(tf.reduce_sum(y * tf.log(y_clipped) + (1 - y) * tf.log(1 - y_clipped), axis=1))


I'm lost, what is cross entropy and how does it apply to machine learning? What's happening here? If it was explained in one of 3b1b's videos, I missed it.


You're right about the y_clipped bit. This trick is also useful when doing division and you suspect there may be 0's hiding in the denominator. Note that your example also clips values above 0.9999999 down to 0.9999999, so that log(1 - y_clipped) never becomes log(0) either.

As for cross-entropy, there are two common variants. Categorical cross-entropy (n categories) and binary cross-entropy (2 categories). Both are generally used as a type of loss function. Loss functions are in general, the method in a neural network that estimates how "wrong" it currently is for the current input while training. Cross-entropy is a statistics driven measure of how different two distributions are. So essentially, it's for taking the results from the network, and comparing against what the results should actually be, and quantifying the error, or "loss".
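
A plain-Python sketch of the same formula as the tf.reduce_mean / tf.reduce_sum line quoted above: sum the per-output terms for each sample, then average over samples. The target and the two predictions are made-up numbers, and eps is an illustrative clipping value.

Code:
```python
import math

def binary_cross_entropy(targets, predictions, eps=1e-10):
    """Mean over samples of the summed per-output binary cross-entropy."""
    total = 0.0
    for y_row, p_row in zip(targets, predictions):
        for y, p in zip(y_row, p_row):
            p = min(max(p, eps), 1.0 - eps)          # same idea as clip_by_value
            total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(targets)

target = [[0, 1, 0]]                  # one-hot: the true class is the middle one
good = [[0.05, 0.90, 0.05]]           # confident and right
bad = [[0.70, 0.20, 0.10]]            # confident and wrong
loss_good = binary_cross_entropy(target, good)
loss_bad = binary_cross_entropy(target, bad)
print(loss_good, loss_bad)
```

The better the prediction matches the target, the closer the loss gets to zero, which is what makes it usable as the "how wrong am I" number during training.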

Loss functions need to be differentiable so that the gradient (the direction in which all the weights need to be changed such that the error gets reduced) can be back-propagated through the network. Other common loss functions, depending on the task, include mean square error (great for regression problems) and Hinge, which is an alternative to cross-entropy in classification problems. Keras has a pretty good list: https://keras.io/losses/

Bigun wrote:

So I get this example is creating a brand new, fresh ANN, training it, and showing the resulting cost. The overall value of the code only seems to serve the purpose of showing the end user how the code works. But in practical application, I would imagine the weights and biases would be the important part of the learning process, at least until your cost is as low as you can get it. Then what? The code never shows you how to actually save and re-use these weights and biases for predictions in the future. How would you save these for future use?


Again, you're spot on. The example is just showing training stages, not inference. In practice, while training you may want to save checkpoints, and after training, save the trained model. Then, in your production software, you would load the model, and use it to perform inference (i.e., actually run classification on new data). Different frameworks have different methods to do this. See: http://cv-tricks.com/tensorflow-tutorial/save-restore-tensorflow-models-quick-complete-tutorial/ and https://www.tensorflow.org/programmers_guide/saved_model for an intro for tensorflow. Keras, which runs on top of tensorflow (and I totally recommend it, after you know a bit more about tensorflow, so you'll have a feel for what's under the hood), makes it super simple. The following will save/load the model architecture, the optimizer state, layers, biases, and weights. Other frameworks have similar mechanisms.
Code:

from keras.models import load_model

# ... build and train model...
model.save("my_model.h5")  # whole model; save_weights() would store the weights only
# later...
model = load_model("my_model.h5")
model.predict(new_data)


So, raw tensorflow takes a tiny bit of work to save/restore your model and weights, but it is mostly just boilerplate. Once you do have the saved model though, you can endlessly re-use it, or even convert it to a different framework, like caffe or torch, or tensorflow-lite which runs on phones. :)


Bigun wrote:

One of the use-cases that was mentioned (and actually has current relevance in my life) is property values. This example has 10 outputs between 0 and 1. If I wanted 1 output with values between 0 and n, I assume I would use ReLU instead of Softmax?


Catching on quick, eh!? You got it exactly right. To have it output a single value, 0-n, you would bottleneck down to a layer that has only one dimension, drop ReLU on it (and maybe use a different loss function, such as MSE, but that depends on the task), and start training it up.

I hope I've answered everything clearly enough. Let me know if something was vague or if you have any more questions. Cheers!


Last edited by n17r4m on Sat Jun 23, 2018 2:22 am; edited 3 times in total
Bigun
Advocate


Joined: 21 Sep 2003
Posts: 2147

PostPosted: Fri Jun 22, 2018 10:30 pm    Post subject: Reply with quote

Ok, in the world of statistics (and by inference, deep learning), there is a difference between mean and average, correct? Because that was throwing me off.

*edit*

OK, wow. Keras makes tensorflow look simple!

*edit*

I'd like to try it, is there an overlay you use for it?
szatox
Veteran


Joined: 27 Aug 2013
Posts: 1698

PostPosted: Fri Jun 22, 2018 10:59 pm    Post subject: Reply with quote

Quote:
in the world of statistics (and by inference, deep learning), there is a difference between mean and average, correct?
There is, as soon as you get an asymmetrical distribution. A simple real life example: wages.
There are very few people who earn enormous amounts of money. The vast majority do not. The vast majority contributes to both mean and average. Those few very rich drive the average value up (because by being much richer than the rest they own a significant amount of money), but their impact on the mean is negligible because they only contribute with their tiny headcount. "Mean" means "split population count in half". Average is "sum them up and divide by head count". Mean gives you a "typical" value.
So, if you want to know how much you can earn in one position vs the other, always compare mean values (well, you may figure out a better way once you have enough data, but this allows you to make a reasonable decision with very limited knowledge in no time).
n17r4m
n00b


Joined: 22 Jan 2018
Posts: 19

PostPosted: Fri Jun 22, 2018 11:17 pm    Post subject: Reply with quote

Bigun wrote:
Ok, in the world of statistics (and by inference, deep learning), there is a difference between mean and average, correct? Because that was throwing me off.


Just to follow up on szatox, I think there is a mistake here: ""Mean" means "split population count in half". Average is "sum them up and divide by head count""

Mean is "sum the values up and divide by count".
Median is "Sort the values, and pick the value located at the center" (or, "split population count in half, and pick middle value")
Mode is "Choose the value that occurs most frequently"

Average can refer to any of these terms, or even more exotic strategies (e.g. the various types of running average). It usually refers to "mean" in colloquial conversation, but it is still considered ambiguous. For example, "The average person gets the ham & cheese sandwich" is actually referring to the "mode" of a sandwich buying distribution. Another is, "There are 10x or even 100x developers that are more productive than your average programmer" - This one is going to be referring to the "median", since it implies we don't want outliers shifting the mean. Finally, "Canadians walk more distance a day on average than Americans" - This one would probably be referring to a comparison between two "mean"s.
[edit: added examples]
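
To make the three definitions concrete, here is a small sketch using Python's standard statistics module on szatox's wage scenario. The wage figures are invented; the single very large value plays the role of the few very rich earners.

```python
import statistics

# Made-up wages: most people earn similar amounts, one earns far more.
wages = [30, 30, 30, 35, 40, 45, 1000]

mean = statistics.mean(wages)      # sum the values up and divide by count
median = statistics.median(wages)  # sort the values and pick the middle one
mode = statistics.mode(wages)      # the value that occurs most frequently

print(mean, median, mode)
```

With a skewed list like this, the outlier drags the mean far above what most people earn, while the median and mode stay close to the "typical" wage.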

Bigun wrote:
OK, wow. Keras makes tensorflow look simple!

I'd like to try it, is there an overlay you use for it?


If you have a recent tensorflow installed, you already have it!
https://www.tensorflow.org/api_docs/python/tf/keras

Code:
from tensorflow import keras


The API is the same as from http://keras.io, so all those docs are relevant, although you may have to modify the imports.

Code:
# instead of:
from keras.layers import Input, Dense
from keras.models import Model
# do this instead:
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model


Easy Peasy!
Bigun
Advocate


Joined: 21 Sep 2003
Posts: 2147

PostPosted: Sat Jun 23, 2018 4:06 pm    Post subject: Reply with quote

n17r4m wrote:
...
As for cross-entropy, there are two common variants. Categorical cross-entropy (n categories) and binary cross-entropy (2 categories). Both are generally used as a type of loss function. Loss functions are in general, the method in a neural network that estimates how "wrong" it currently is for the current input while training. Cross-entropy is a statistics driven measure of how different two distributions are. So essentially, it's for taking the results from the network, and comparing against what the results should actually be, and quantifying the error, or "loss".
...


So was the example above binary cross entropy?
n17r4m
n00b


Joined: 22 Jan 2018
Posts: 19

PostPosted: Sat Jun 23, 2018 4:42 pm    Post subject: Reply with quote

Bigun wrote:
So was the example above binary cross entropy?


Categorical cross-entropy, since the example was a classification problem with 10 output classes. Binary cross-entropy works for binary data, such as segmentation masks, or yes/no analyzers. Categorical cross-entropy is for pretty much anything else where there are more than two classes, such as determining what category an image belongs to.

To clarify, binary cross-entropy is just a special case of categorical cross-entropy that is more computationally efficient when the number of classes is 2. Basically, the equations get simplified.
Check out: http://ml-cheatsheet.readthedocs.io/en/latest/loss_functions.html#cross-entropy
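
A quick way to see the "special case" claim is to compute both losses by hand. This is a plain Python sketch, assuming one-hot targets and a single predicted probability p for the positive class; the function names are mine, not from any framework.

```python
import math


def categorical_cross_entropy(y_true, y_pred):
    """-sum over classes of t * log(p); y_true is a one-hot list."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))


def binary_cross_entropy(y, p):
    """Two-class shortcut: only one probability p is needed."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


p = 0.8  # predicted probability of the positive class (made-up value)

# Positive example: the label 1 corresponds to the one-hot target [1, 0].
bce = binary_cross_entropy(1, p)
cce = categorical_cross_entropy([1, 0], [p, 1 - p])

# Negative example: the label 0 corresponds to the one-hot target [0, 1].
bce_neg = binary_cross_entropy(0, p)
cce_neg = categorical_cross_entropy([0, 1], [p, 1 - p])

print(bce, cce, bce_neg, cce_neg)
```

Treating the two classes as the distribution [p, 1 - p] gives the same number either way; the binary form just skips carrying the second probability around.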
All times are GMT