Deep Learning Help
Bigun

Joined: 21 Sep 2003
Posts: 2153

Posted: Sat Jun 23, 2018 5:07 pm    Post subject:

1) Ok, binary cross entropy is used for the output layer when values are expected to be a binary 0 or 1? If I still don't get it, maybe a use-case so I can wrap my head around it.

2) I saw Jupyter in a video, it looks nice, opinions?
_________________
"It's ok, they might have guns but we have flowers." - Perpetual Victim
n17r4m
n00b

Joined: 22 Jan 2018
Posts: 19

Posted: Sat Jun 23, 2018 6:41 pm    Post subject:

 Bigun wrote: 1) Ok, binary cross entropy is used for the output layer when values are expected to be a binary 0 or 1? If I still don't get it, maybe a use-case so I can wrap my head around it.

Sorta. Bear in mind that we are not necessarily talking about a single 0 or 1; it could be a vector, matrix, or tensor of them. As an example, something I'm working on involves analyzing images to segment the foreground from the background. To do so, I'm going to develop an ANN that outputs a binary image the same size as the input frame, with 1's for the foreground and 0's for the background. Binary cross-entropy will likely be used as the loss function.
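A minimal NumPy sketch of what pixel-wise binary cross-entropy computes for a segmentation mask like that. The function name and the tiny 2x2 "frames" are made up for illustration; in Keras you would normally just pass loss='binary_crossentropy' to model.compile and let it do this for you.

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean pixel-wise binary cross-entropy (natural log).

    y_true: array of 0s (background) and 1s (foreground)
    y_pred: predicted foreground probabilities, same shape
    """
    # clip so we never take log(0)
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    per_pixel = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return per_pixel.mean()

mask = np.array([[0.0, 1.0],
                 [1.0, 0.0]])           # target: foreground on the anti-diagonal
good = np.array([[0.1, 0.9],
                 [0.8, 0.2]])           # mostly right
bad = np.array([[0.9, 0.1],
                [0.2, 0.8]])            # every pixel wrong

print(binary_cross_entropy(mask, good))  # small loss
print(binary_cross_entropy(mask, bad))   # much larger loss
```

Confidently wrong pixels dominate the loss, which is where the large gradients come from.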

Now, suppose I also want to perform classification end-to-end alongside the segmentation task, and I have 5 classes of interest. For my real-life project, I want a separate ANN that performs classification, since I want to formulate it as a joint auto-encoder/classifier: that way I can store the latent feature vector to recreate the object image and do other analysis, such as finding other objects located nearby in latent space. But anyway, let's pretend I want my segmentation system to also do classification. Instead of outputting a single B/W image representing the segmentation, I would have my ANN output an image with 6 binary channels: one for each category, and one for the background. I'd use categorical cross-entropy during training in this case, as each pixel is being categorized into one of 6 buckets. Not to open a can of worms, but another option would be to use the Dice coefficient or KL-divergence as a loss function.
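For the 6-channel version, a similar sketch of per-pixel categorical cross-entropy: a softmax over the channel axis so each pixel's 6 values sum to 1, then the average of -log(probability of the correct class) over all pixels. The random labels and logits only stand in for real data and network output; in Keras this corresponds roughly to a softmax output layer with loss='categorical_crossentropy'.

```python
import numpy as np

def softmax(logits, axis=-1):
    # subtract the max for numerical stability
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def categorical_cross_entropy(y_true_onehot, y_pred, eps=1e-7):
    """Mean over pixels of -sum_c y_true[c] * log(y_pred[c]).

    Both arrays have shape (height, width, n_classes); the last axis is the
    channel axis (5 object classes + 1 background = 6 channels here).
    """
    y_pred = np.clip(y_pred, eps, 1.0)
    per_pixel = -(y_true_onehot * np.log(y_pred)).sum(axis=-1)
    return per_pixel.mean()

n_classes = 6
rng = np.random.default_rng(0)
labels = rng.integers(0, n_classes, size=(4, 4))   # class index per pixel
y_true = np.eye(n_classes)[labels]                 # one-hot, shape (4, 4, 6)
logits = rng.normal(size=(4, 4, n_classes))        # stand-in for network output
y_pred = softmax(logits)                           # each pixel sums to 1

print(categorical_cross_entropy(y_true, y_pred))
```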

Binary cross-entropy is just the special case where you have only two classes; categorical cross-entropy handles any number of classes and is typically used when you have three or more. Regardless, it's maybe best to think of both of them as formulations tuned for performing logistic regression. Fundamentally, cross-entropy measures how many bits are required to describe one distribution using a code built for another, so a mismatch between the predicted and target distributions shows up as extra bits. In my segmentation case, it's roughly like counting how many pixels are wrong, weighted by how confidently wrong they are. If many pixels (bits) are incorrect, the magnitude of the subsequent back-propagation will be large; if only a few pixels are incorrect, it will be very small. Some complexity I'm leaving out: convolutional layers are involved, so it's fuzzier than simple pixel-wise calculations, and you'll need a lot more training data to get good results, but I'll save that stuff for a different reply. When categorizing a whole image, you would just have a small vector where each element represents a class, but as in my example, you can also generalize this to classify at the pixel level, and the math is (mostly) the same.
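To make the "special case" and "bits" remarks a little more concrete, here is a small NumPy check using log base 2, so the units are bits. It shows that binary cross-entropy on one predicted probability is the same number as categorical cross-entropy on the two-element distribution [1 - p, p], and that cross-entropy H(p, q) splits into the entropy of the target, H(p), plus the KL divergence KL(p || q). Strictly speaking, the KL term is the part that measures the difference between the two distributions. The helper names are made up for this example.

```python
import numpy as np

def entropy_bits(p):
    p = np.asarray(p, dtype=float)
    nz = p > 0                         # 0 * log(0) is taken as 0
    return -(p[nz] * np.log2(p[nz])).sum()

def cross_entropy_bits(p, q):
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    nz = p > 0
    return -(p[nz] * np.log2(q[nz])).sum()

def kl_bits(p, q):
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    nz = p > 0
    return (p[nz] * np.log2(p[nz] / q[nz])).sum()

# Binary case: target y, predicted probability p_hat of class 1
y, p_hat = 1.0, 0.8
bce = -(y * np.log2(p_hat) + (1 - y) * np.log2(1 - p_hat))
cce = cross_entropy_bits([1 - y, y], [1 - p_hat, p_hat])
print(bce, cce)                        # the same number

# Cross-entropy = entropy of the target + KL divergence
p = np.array([0.5, 0.25, 0.25])        # "true" distribution
q = np.array([0.25, 0.25, 0.5])        # predicted distribution
print(cross_entropy_bits(p, q), entropy_bits(p) + kl_bits(p, q))   # 1.75 1.75
```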

All that said, the best option is to explore, learn, and try to build up a mathematical intuition of what works where and why it works.

 Bigun wrote: 2) I saw Jupyter in a video, it looks nice, opinions?

It is nice. A lot of research gets published using it as a format, and GitHub has first-class support for it, e.g.: https://github.com/pedroprates/gcloud-cs231n/blob/master/TensorFlow.ipynb. It's a nice tool for quick iteration and prototyping. Personally, I like good old-fashioned IDEs, .py files, and good code structure, especially since you can't import from a Jupyter notebook (though you can import into one), but for one-off experiments Jupyter is very handy. Many of my colleagues do quite a bit of their work in it. So: it's absolutely great for prototyping, sharing documented snippets, and displaying various visualizations, but it isn't necessarily useful for building production software. At least, that's my two cents on the topic.
peakeyed
n00b

Joined: 18 Nov 2004
Posts: 58

Posted: Sun Jun 24, 2018 8:57 pm    Post subject:

Thanks!
n17r4m wrote:

Code:
```python
# instead of:
from keras.layers import Input, Dense
from keras.models import Model
# do this instead:
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
```

Easy Peasy!
Mod edit: Corrected quote attribution. — JRG
Bigun

Joined: 21 Sep 2003
Posts: 2153

Posted: Mon Jun 25, 2018 11:45 pm    Post subject:

@everyone On Keras: they have some sample Keras code here. It's very helpful when combined with the tutorial code found here. A few of the commands in the tutorial are out of date, but it's still pretty easy to follow.
Bigun

Joined: 21 Sep 2003
Posts: 2153

Posted: Thu Jun 28, 2018 2:24 pm    Post subject:

So, on the topic of how long the learning process takes: it's starting to impede the learning. I came across this; it says it "requires" Ubuntu. Just wondering if that's sales talk for "Linux" or what. The specs say it supports TensorFlow.
Bigun

Joined: 21 Sep 2003
Posts: 2153

Posted: Thu Jun 28, 2018 11:28 pm    Post subject:

Ok, the "property values" idea is out. I started checking into it and, it turns out, tons of companies have invested a lot more time and effort into this.

So, my next idea was to write an LSTM RNN that downloads all the tweets from a user, trains on their tweets (until something resembling English comes out), then occasionally spits out new tweets. I got the idea from here and here. I've read over the article, but compared to the ANNs I've studied so far, RNNs seem a lot more complicated. To make matters worse, the example code I want to follow is written in Torch. Do you have any tutorials I can follow to write similar code? I'm currently reading over this tutorial, but I'm unsure whether that's what I need to follow.
n17r4m
n00b

Joined: 22 Jan 2018
Posts: 19

Posted: Fri Jun 29, 2018 9:14 am    Post subject:

RNNs explained (LSTM and GRU included): for some iterative process, simply pass along some previous state.

Oh. So, I have no opinion at all on that hardware. But, I'm absolutely certain that if there is Ubuntu support there is a method to get it working using Gentoo.
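That one-liner, spelled out as a minimal NumPy sketch of a plain (Elman-style) RNN: the same weights are applied at every time step, and the hidden state from the previous step is fed back in alongside the current input. The sizes and random weights are arbitrary; LSTM and GRU cells keep this same loop but compute the new state through learned gates instead of a single tanh.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 3, 5

# one set of weights, shared across every time step
W_xh = rng.normal(scale=0.1, size=(n_in, n_hidden))       # input  -> hidden
W_hh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))   # hidden -> hidden
b_h = np.zeros(n_hidden)

def rnn_forward(xs, h0=None):
    """Run a simple RNN over a sequence xs of shape (time, n_in).

    Returns the hidden state after every step, shape (time, n_hidden).
    """
    h = np.zeros(n_hidden) if h0 is None else h0
    states = []
    for x_t in xs:
        # the new state depends on the current input AND the previous state
        h = np.tanh(x_t @ W_xh + h @ W_hh + b_h)
        states.append(h)
    return np.stack(states)

seq = rng.normal(size=(4, n_in))
hs = rnn_forward(seq)
print(hs.shape)   # (4, 5)
```

Because the state is carried forward, changing the very first input also changes the final state.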
Bigun

Joined: 21 Sep 2003
Posts: 2153

Posted: Fri Jun 29, 2018 11:17 am    Post subject:

 n17r4m wrote: ...But, I'm absolutely certain that if there is ubuntu support there is a method to get it working using gentoo.

That's what I was thinking.
Bigun

Joined: 21 Sep 2003
Posts: 2153

Posted: Fri Jul 06, 2018 5:29 pm    Post subject:

So, I got an LSTM RNN tutorial in Keras. For this project, the code will work fine for a one-time training exercise, but I want to keep training on newer data. In your opinion, would it be better to completely retrain the RNN every time I download new data, or to load the weights and biases from previous runs, download new data, and improve on the previous training?

*edit* Nope, re-loading the weights and continuing training is an impossibility. The moment a new character gets introduced to the character set, it breaks the weights in the model. It's going to have to re-train every time.
n17r4m
n00b

Joined: 22 Jan 2018
Posts: 19

Posted: Fri Jul 06, 2018 6:57 pm    Post subject:

 Bigun wrote: Nope, re-loading the weights and continuing training is an impossibility. The moment a new character gets introduced to the character set, it breaks the weights in the model. It's going to have to re-train every time.

Could you elaborate on this? It should be possible to fine tune an existing model.
Bigun

Joined: 21 Sep 2003
Posts: 2153

Posted: Fri Jul 06, 2018 8:13 pm    Post subject:

n17r4m wrote:
 Bigun wrote: Nope, re-loading the weights and continuing training is an impossibility. The moment a new character gets introduced to the character set, it breaks the weights in the model. It's going to have to re-train every time.

Could you elaborate on this? It should be possible to fine tune an existing model.

It may be possible, but the way I understand things, I'm unsure how to get it to work with my limited knowledge.

The original training script stripped out the uppercase letters and ended up with an "n_vocab" value of 100. "n_vocab" is determined below:

Code:
```python
# load ascii text and convert to lowercase
filename = "somefile.txt"
with open(filename) as f:
    raw_text = f.read()
raw_text = raw_text.lower()
# create mapping of unique chars to integers, and a reverse mapping
chars = sorted(list(set(raw_text)))
char_to_int = dict((c, i) for i, c in enumerate(chars))
int_to_char = dict((i, c) for i, c in enumerate(chars))
# summarize the loaded data
n_chars = len(raw_text)
n_vocab = len(chars)
```

From what I can tell, the input shape is determined by the "n_vocab" variable:

Code:
```python
# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (n_patterns, seq_length, 1))
# normalize
X = X / float(n_vocab)
```

I'm not quite sure what the above code is doing, but it definitely has an impact on the input shape later:

Code:
```python
# define the LSTM model
model = Sequential()
model.add(LSTM(256, input_shape=(X.shape[1], X.shape[2]), return_sequences=True))
```
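A rough NumPy reconstruction of the data preparation, to see where n_vocab actually ends up. This assumes the script builds dataX/dataY with a sliding window of seq_length characters and one-hot encodes the targets (the traceback later in the thread shows np_utils.to_categorical(dataY)); np.eye stands in for to_categorical here, and the prepare helper is hypothetical. Under those assumptions, X always has one feature per time step (the character index scaled by 1/n_vocab), so X.shape[1:] is (seq_length, 1) no matter how big the vocabulary is. It's the width of the one-hot targets, and therefore the size of the output layer, that follows n_vocab.

```python
import numpy as np

def prepare(raw_text, seq_length=5):
    chars = sorted(set(raw_text))
    char_to_int = {c: i for i, c in enumerate(chars)}
    n_vocab = len(chars)

    # sliding window: seq_length input characters -> the next character
    dataX, dataY = [], []
    for i in range(len(raw_text) - seq_length):
        dataX.append([char_to_int[c] for c in raw_text[i:i + seq_length]])
        dataY.append(char_to_int[raw_text[i + seq_length]])
    n_patterns = len(dataX)

    # [samples, time steps, features]: ONE feature per step (the scaled index)
    X = np.reshape(dataX, (n_patterns, seq_length, 1)) / float(n_vocab)
    # one-hot targets: one column per vocabulary entry
    y = np.eye(n_vocab)[dataY]
    return X, y, n_vocab

X1, y1, v1 = prepare("hello world, hello lstm")
X2, y2, v2 = prepare("Hello World, Hello LSTM")
print(X1.shape[1:], y1.shape[1], v1)   # (5, 1) 12 12
print(X2.shape[1:], y2.shape[1], v2)   # (5, 1) 13 13
```

If the model ends with a Dense(y.shape[1], activation='softmax') layer, as the usual version of this tutorial appears to, that output layer is where a 100 vs 133 mismatch would surface when loading old weights.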

As it trains, the weights are saved:
Code:
```python
# define the checkpoint
filepath = "weights-improvement-{epoch:02d}-{loss:.4f}-bigger.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=1, save_best_only=True, mode='min')
callbacks_list = [checkpoint]
```

So, I let it train, and I didn't like the output.

Next, I pulled out the "lower()" call and allowed uppercase letters. I then tried to generate some text using the previous weights (from the 100-character vocabulary) to see what would come out:

(forgive the large code block)

Keeping uppercase letters changed the "n_vocab" to 133.

I loaded the weights file with the lowest loss I had and went to generate output, and (I don't have the exact error output, but) it complained that the input shape was 133 and there were only 100 weights to be used.

The code doesn't have an example of loading weights to continue training, but I'm assuming that if the input shape changes (and it can, if the data is organic and generated by humans), it will break the weights, because the model has essentially changed since the weights were generated.

Do I have it wrong? Or could the model be written better to bypass this issue?
n17r4m
n00b

Joined: 22 Jan 2018
Posts: 19

Posted: Fri Jul 06, 2018 8:30 pm    Post subject:

Ok, so unless I'm mistaken, you are trying to create a per-character LSTM sequence generator? I'd suggest you use a fixed vocabulary size (e.g. 26 letters, 10 digits, and a few special symbols such as punctuation) and replace all other "rare" symbols with some catch-all character, such as ~. This is just to ensure your input_size is reasonably sized and consistent. You'll lose some information, e.g. if some text commonly uses » and « symbols, those would coalesce into your catch-all symbol, but in general you'll get pretty decent results.

Alternatively, just shoot for the moon and have your input_shape accept any extended-ASCII character (size: 256), or even UTF-16 (size: 1,112,064; in terms of data size, this is about the same as a 1000x1000px image, so not totally unreasonable...).

Edit: Just to be clear:
1) You're correct that the input_shape needs to be consistent between datasets.
2) Smaller input_shapes are usually easier to train.
3) I'm not actually suggesting you support all of UTF-16 as your input. It is theoretically possible, but it would be more difficult to train and would use a lot more compute to do so. Some fixed subset of ASCII should definitely suffice if you're just experimenting.
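A minimal sketch of the fixed-vocabulary-plus-catch-all idea. The exact character set, the choice of ~ as the catch-all, and the encode/decode helper names are all illustrative assumptions; the point is that len(VOCAB), and so the input size, stays the same for every dataset.

```python
import string

# A fixed vocabulary: lowercase letters, digits, some punctuation, space,
# newline, plus one catch-all symbol for anything else.
CATCH_ALL = "~"
VOCAB = string.ascii_lowercase + string.digits + " .,;:!?'\"-()\n" + CATCH_ALL
char_to_int = {c: i for i, c in enumerate(VOCAB)}
int_to_char = {i: c for c, i in char_to_int.items()}

def encode(text):
    """Map text to integer ids; rare/unknown characters become CATCH_ALL."""
    unknown = char_to_int[CATCH_ALL]
    return [char_to_int.get(c, unknown) for c in text.lower()]

def decode(ids):
    return "".join(int_to_char[i] for i in ids)

print(len(VOCAB))                      # 50, for every dataset
print(decode(encode("Hi «there»!")))   # hi ~there~!
```

Lowercasing is optional; dropping the .lower() call and adding string.ascii_uppercase to VOCAB would keep case at the cost of a slightly larger input.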
Bigun

Joined: 21 Sep 2003
Posts: 2153

Posted: Fri Jul 06, 2018 9:48 pm    Post subject:

Expanding the input range to the full UTF-8 character set is something I'd like to try.

So I manually set the input size to the full 1,112,069 characters. I don't see the training taking any longer, but the loss made a gigantic leap to nearly 5 when the training started; it is going down quickly, though.

That said, I need to know what this is doing:

Code:
```python
# create mapping of unique chars to integers
chars = sorted(list(set(raw_text)))
char_to_int = dict((c, i) for i, c in enumerate(chars))
```

I understand it converts characters to integers; however, will introducing a new character change the integers assigned to other characters? I believe that would break the outputs when they get used.

If so, is there somewhere I can download (or generate) the full UTF-8 character set and create the list that way? Or is there some other way to tackle this, so that any new, unseen UTF-8 character gets introduced to the set without breaking everything else?

*edit*

Yeah, there are some circumstances where this would break. I need some way to convert a UTF-8 character to an integer.
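A quick check of that suspicion with the same sorted-set approach the tutorial uses (build_mapping is a made-up helper name). Adding a single character that sorts earlier renumbers everything after it, so both the one-hot positions and any saved weights stop lining up.

```python
def build_mapping(text):
    # same idea as the tutorial: sorted unique characters -> 0, 1, 2, ...
    chars = sorted(set(text))
    return {c: i for i, c in enumerate(chars)}

old = build_mapping("cab")          # {'a': 0, 'b': 1, 'c': 2}
new = build_mapping("cab" + "B")    # 'B' sorts before the lowercase letters
print(old["a"], new["a"])           # 0 1  -> 'a' got a different id
```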
n17r4m
n00b

Joined: 22 Jan 2018
Posts: 19

Posted: Fri Jul 06, 2018 10:30 pm    Post subject:

 Bigun wrote: Expanding the input range to the full UTF-8 character set is something I'd like to try.

Watch out for multi-byte characters!

 Bigun wrote: So I manually set the input size to the full 1,112,069 characters. I don't see the training taking any longer, but the loss made a gigantic leap to nearly 5 when the training started; it is going down quickly, though.

Awesome! TBH, though, I'd really recommend starting with just 128 (standard ASCII) or 256 (extended ASCII).

Bigun wrote:
That said, I need to know what this is doing:

Code:
```python
# create mapping of unique chars to integers
chars = sorted(list(set(raw_text)))
char_to_int = dict((c, i) for i, c in enumerate(chars))
```

Ok, breaking it down into the operations that snippet performs...

 Code: a = set(raw_text)

Calling set() on raw_text implicitly iterates over it character by character, and because a set only holds unique entries, the result contains each distinct character exactly once.

 Code: b = list(a)

Turns it from a set data type back into a list data type.

 Code: chars = sorted(b)

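Sorts the unique characters by their Unicode code point (so, for example, space and digits come before uppercase letters, which come before lowercase letters).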

 Code: char_to_int = dict((c, i) for i, c in enumerate(chars))

Creates a dict that assigns an integer (starting at 0) to each distinct character. E.g., if your raw text had only lowercase letters and spaces:
char_to_int = {" ": 0, "a": 1, "b": 2, ...}

So, what you want to do is create a char_to_int dictionary that simply corresponds to ASCII codes,
e.g. {..., "a": 97, "b": 98, "c": 99, ...}
See https://commons.wikimedia.org/wiki/File:Ascii-codes-table.png for the list. This dictionary can be programmatically generated in just a line or two, though:
 Code: char_to_int = {chr(i): i for i in range(128)} # for example
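Extending that one-liner into a small sketch: ids come straight from the ASCII code, so they never depend on which characters happen to be in the training text, and anything outside the 128 ASCII codes gets mapped to a catch-all id. Using ~ (code 126) as the catch-all is an arbitrary choice for this example.

```python
# Fixed mapping based on ASCII code points (0-127): a character's id never
# depends on which other characters happen to be in the training text.
char_to_int = {chr(i): i for i in range(128)}
int_to_char = {i: c for c, i in char_to_int.items()}
n_vocab = len(char_to_int)          # always 128

UNKNOWN = ord("~")                  # arbitrary catch-all for non-ASCII input

def encode(text):
    return [char_to_int.get(c, UNKNOWN) for c in text]

ids = encode("Héllo")
print(ids)                                    # [72, 126, 108, 108, 111]
print("".join(int_to_char[i] for i in ids))   # H~llo
```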

Hope this helps!
Bigun

Joined: 21 Sep 2003
Posts: 2153

Posted: Fri Jul 06, 2018 11:32 pm    Post subject:

I can't let this go:

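Code:
```python
# intended to echo the full range of UTF-8 characters
# (note: chr() below only sees 0..254, so this writes every three-character
# combination of U+0000..U+00FE rather than the full UTF-8 range)
filename = "utf-8_charset.txt"
# open file
f = open(filename, 'w', encoding="utf-8")
for i in range(255):
    for j in range(255):
        for k in range(255):
            # encode each code point as UTF-8 bytes
            i_hex = str.encode(chr(i))
            j_hex = str.encode(chr(j))
            k_hex = str.encode(chr(k))
            # join the three byte strings and decode them back to text
            character = b"" + i_hex + j_hex + k_hex
            character = character.decode('utf-8')
            # write the three characters to the file
            f.write(character)
# close file (f.close without parentheses never actually calls it)
f.close()
```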

I'm using the generated file as a master key for the UTF-8 character set.

Overkill?
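For comparison, a sketch of enumerating every character UTF-8 can represent. As far as I can tell, the nested loop above only ever calls chr() on 0 to 254, so it writes three-character combinations of U+0000 to U+00FE and never gets past that range. UTF-8 can encode every Unicode scalar value, i.e. U+0000 to U+10FFFF minus the surrogate range U+D800 to U+DFFF, which is 1,112,064 characters. Plenty of those code points are unassigned; this sketch makes no attempt to filter them.

```python
def all_unicode_scalars():
    """Yield every code point that UTF-8 can encode, in order.

    Surrogates (U+D800..U+DFFF) are reserved for UTF-16 and cannot be
    encoded on their own, so they are skipped.
    """
    for cp in range(0x110000):          # U+0000 .. U+10FFFF
        if 0xD800 <= cp <= 0xDFFF:
            continue
        yield chr(cp)

chars = list(all_unicode_scalars())
print(len(chars))                       # 1112064

# each of them round-trips through UTF-8 (1 to 4 bytes per character)
sample = chars[::1000]
print(all(c.encode("utf-8").decode("utf-8") == c for c in sample))   # True
```

Writing "".join(chars) to a file opened with encoding="utf-8" would give a master key file containing each character exactly once.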
n17r4m
n00b

Joined: 22 Jan 2018
Posts: 19

Posted: Fri Jul 06, 2018 11:49 pm    Post subject:

Madness. Feel free to do as you please, but please let me know how training goes! I'm very curious how performance will be when less than 1% of the input vector is regularly used. I might have to do some experiments myself. In the vein of what you're trying to accomplish (all UTF-8 code points), you may want to cross-compare your work with https://github.com/bits/UTF-8-Unicode-Test-Documents
Bigun

Joined: 21 Sep 2003
Posts: 2153

Posted: Sat Jul 07, 2018 12:23 am    Post subject:

Ehh:

Code:
```text
$ python3.5 train_lstm.py
Using TensorFlow backend.
Total Characters:  148645
Total Vocab:  1111997
Total Patterns:  148505
Traceback (most recent call last):
  File "train_lstm.py", line 41, in <module>
    y = np_utils.to_categorical(dataY)
  File "/usr/local/lib/python3.5/dist-packages/keras/utils/np_utils.py", line 31, in to_categorical
    categorical = np.zeros((n, num_classes), dtype=np.float32)
MemoryError
```

That page you gave me was really helpful; I used this character set. I'm going to try one of the printable ones and see if it doesn't puke.
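The MemoryError is roughly what you'd expect from the arithmetic: a dense one-hot target array needs samples × classes × 4 bytes in float32. A quick calculation with the numbers from the log, assuming num_classes ends up close to the reported vocabulary size (the helper name is made up):

```python
def one_hot_bytes(n_samples, n_classes, itemsize=4):
    """Bytes needed for a dense (n_samples, n_classes) one-hot array.

    itemsize=4 matches the float32 dtype shown in the traceback.
    """
    return n_samples * n_classes * itemsize

# Numbers from the log above
full = one_hot_bytes(148_505, 1_111_997)
print(f"{full / 1e9:.0f} GB")          # 661 GB

# For comparison, 256 classes (e.g. one per byte value)
small = one_hot_bytes(148_505, 256)
print(f"{small / 1e6:.0f} MB")         # 152 MB
```

One way around the one-hot array in Keras is to keep dataY as integer ids and compile with loss='sparse_categorical_crossentropy'. That doesn't shrink the output layer, though: a Dense layer with 1,111,997 outputs on top of 256 LSTM units would still have about 285 million weights.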
Bigun

Joined: 21 Sep 2003
Posts: 2153

Posted: Sat Jul 07, 2018 12:35 am    Post subject:

I went with a printable one and got a different error:

Code:
```text
$ python3.5 train_lstm.py
Using TensorFlow backend.
Total Characters:  148645
Total Vocab:  235128
Traceback (most recent call last):
  File "train_lstm.py", line 34, in <module>
    dataY.append(char_to_int[seq_out])
KeyError: '????'
```

The '????' is the bed character
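A KeyError like that means the text contains a character that isn't in the chosen set. A small sketch for listing those characters before building the patterns; the stand-in charset and the use of U+1F6CF (BED) as the example character are assumptions for illustration, and missing_chars is a made-up helper.

```python
def missing_chars(text, charset):
    """Characters that occur in text but have no entry in charset."""
    return sorted(set(text) - set(charset))

charset = "abcdefghijklmnopqrstuvwxyz "    # stand-in for the printable set
text = "good night \U0001F6CF"             # U+1F6CF is the BED character
print([f"U+{ord(c):04X}" for c in missing_chars(text, charset)])   # ['U+1F6CF']
```

Anything this reports could then be added to the set, or mapped to a catch-all id, before char_to_int is used.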
Bigun

Joined: 21 Sep 2003
Posts: 2153

Posted: Sat Jul 07, 2018 12:44 am    Post subject:

Oh, now this works. I encode the entire thing with str.encode("utf-8") then train on it. This limits the input to 215 neurons, and it learns the UTF-8 charset as well. I then just decode back.
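A minimal sketch of the byte-level approach: after str.encode("utf-8") every value is in range(256), whatever characters appear in the text, so the vocabulary size is fixed (only the byte values that actually occur get used). One caveat, assumed rather than observed in this thread: sampled byte sequences aren't guaranteed to be valid UTF-8, so decoding with errors="replace" avoids a crash.

```python
text = "na\u00efve caf\u00e9 \U0001F642"   # "naïve café" plus an emoji

# Encode to UTF-8 bytes: every value is in range(256).
ids = list(text.encode("utf-8"))
print(len(text), len(ids))                 # 12 17

# Decoding the bytes back gives the original string.
print(bytes(ids).decode("utf-8") == text)  # True

# A generated sequence may cut a multi-byte character in half;
# errors="replace" substitutes U+FFFD instead of raising.
broken = bytes(ids[:-1]).decode("utf-8", errors="replace")
print(broken)
```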
Bigun

Joined: 21 Sep 2003
Posts: 2153

Posted: Sun Jul 08, 2018 1:59 am    Post subject:

I can't stand it anymore. I need a GPU to assist. Any suggestions that won't break the bank too badly?
Akkara

Joined: 28 Mar 2006
Posts: 6586
Location: &akkara

Posted: Tue Jul 10, 2018 12:11 am    Post subject:

 Bigun wrote: Oh, now this works. I encode the entire thing with str.encode("utf-8") then train on it. This limits the input to 215 neurons, and it learns the UTF-8 charset as well. I then just decode back.

That is a very good idea. Go with that.

But since this thread has become a nice summary of machine-learning stuff, I'll add what I was going to say anyway, just to have it all in one place:

The problem with a huge input set where most of the inputs never occur in your training data is that learning can't happen on the corresponding weights. If an input is always 0, its weight gradients are always 0, and the corresponding weights can't change from their initial randomized values. The net just spends a lot of cycles calculating the irrelevant. Or worse: if you use some sort of weight-decay regularization, those weights will gradually decay to zero. Then, if after billions of runs one of those inputs does show up, its weight is now zero, so that input has no effect on the output and the net can't learn from it. Essentially, the net has "learned" that these inputs don't matter, and eventually loses the ability to respond to them.
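A small NumPy sketch of both effects, with a single linear layer and a squared-error loss standing in for the first layer of a real net. Inputs 4 and 5 are always zero, so the gradient rows for their weights are exactly zero, and with L2 weight decay those rows just shrink by a factor of (1 - lr * decay) per step. All sizes and hyperparameters here are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_outputs, n_samples = 6, 3, 32

W = rng.normal(size=(n_inputs, n_outputs))
X = rng.normal(size=(n_samples, n_inputs))
X[:, 4:] = 0.0                      # inputs 4 and 5 never occur in the data
T = rng.normal(size=(n_samples, n_outputs))

# squared-error loss L = 0.5 * sum((X @ W - T)**2); its gradient w.r.t. W:
grad = X.T @ (X @ W - T)
print(np.abs(grad[4:]).max())       # 0.0: no learning signal for rows 4, 5

# plain gradient descent with L2 weight decay
lr, decay = 0.01, 0.1
W_start = W.copy()
for _ in range(1000):
    grad = X.T @ (X @ W - T)
    W -= lr * (grad + decay * W)

# the unused rows only ever shrink: multiplied by (1 - lr*decay) each step
print(np.allclose(W[4:], W_start[4:] * (1 - lr * decay) ** 1000))   # True
```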

A better approach is to train on the character-set that actually appears in your data. Then when a new character shows up, append it to the end of your input vector. And append a corresponding randomly-initialized row to the matrix that vector multiplies. And now you have a pre-trained net that can process the larger input set. You'll need a translation array to map from input codepoints to positions where they end up in your vectors. But that's a trivial amount of extra calculation compared to the billions of ops that these sorts of net require.
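A sketch of that growing step on the weight matrix itself, using NumPy only. The helper name, the initialization scale, and the dict used as the translation array are illustrative assumptions; W here is the matrix a one-hot input vector multiplies, with one row per known character.

```python
import numpy as np

def grow_input_weights(W, char_index, new_chars, rng, scale=0.05):
    """Append one randomly initialised row per previously unseen character.

    W          : (n_known_chars, n_units) weights the one-hot input multiplies
    char_index : dict mapping character -> row of W (the translation array)
    new_chars  : characters found in the new data
    Returns the enlarged matrix and an updated copy of char_index.
    """
    char_index = dict(char_index)
    new_rows = []
    for c in new_chars:
        if c in char_index:
            continue                                  # already known
        char_index[c] = W.shape[0] + len(new_rows)    # goes at the end
        new_rows.append(rng.normal(scale=scale, size=W.shape[1]))
    if new_rows:
        W = np.vstack([W, np.array(new_rows)])
    return W, char_index

rng = np.random.default_rng(0)
index = {"a": 0, "b": 1, "c": 2}
W = rng.normal(size=(3, 4))                  # pretend these are trained

W2, index2 = grow_input_weights(W, index, ["B", "a"], rng)
print(W2.shape, index2["B"])                 # (4, 4) 3
print(np.array_equal(W2[:3], W))             # True: old rows untouched
```

In Keras, one way to apply this would be to read the old arrays with layer.get_weights(), build a model with the larger size, and load the enlarged arrays with layer.set_weights(). For an output Dense layer, the new entries would be extra kernel columns plus extra bias entries rather than rows. This hasn't been tested against the tutorial's model.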

Having said that, I believe (but don't have the data to back it up) that training directly on UTF-8 will give even better results. The reason for my hunch is that the initial byte gives hints about which code block the character is in. This jump-starts the categorization process, and it is easier and faster to learn from that than from a very sparse encoding.
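To illustrate the hunch about the initial byte: in UTF-8, the lead byte alone tells you how many bytes the character takes and narrows down its code-point range. For example, the basic Russian alphabet (U+0410 to U+044F) always starts with 0xD0 or 0xD1. A small sketch (utf8_lead_info is a made-up helper):

```python
def utf8_lead_info(ch):
    """Return (lead byte, length implied by the lead byte, actual length)."""
    b = ch.encode("utf-8")
    lead = b[0]
    if lead < 0x80:
        n = 1                      # 0xxxxxxx: plain ASCII
    elif lead >> 5 == 0b110:
        n = 2                      # 110xxxxx
    elif lead >> 4 == 0b1110:
        n = 3                      # 1110xxxx
    else:
        n = 4                      # 11110xxx
    return lead, n, len(b)

for ch in ["a", "\u00e9", "\u0436", "\u4e2d", "\U0001F642"]:
    lead, n, actual = utf8_lead_info(ch)
    print(f"U+{ord(ch):04X} lead=0x{lead:02X} length={n}")
```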
_________________
Isn't it odd that no job openings ever list being potty-trained as one of the requirements?
Bigun

Joined: 21 Sep 2003
Posts: 2153

Posted: Wed Jul 11, 2018 1:21 pm    Post subject:

 Akkara wrote: ...Then when a new character shows up, append it to the end of your input vector. And append a corresponding randomly-initialized row to the matrix that vector multiplies. And now you have a pre-trained net that can process the larger input set. You'll need a translation array to map from input codepoints to positions where they end up in your vectors. But that's a trivial amount of extra calculation compared to the billions of ops that these sorts of net require.

1) How is this done in Keras? If I load the previous weights with the new characters, it gives an error. I'm unsure how to load the old weights and use random weights for the unassigned ones.

2) Is there some way to run the model.fit method and specify a target loss instead of a number of epochs?

3) How do you decide how many neurons to use on the hidden layers? Or how many hidden layers to use?
Bigun

Joined: 21 Sep 2003
Posts: 2153

Posted: Fri Jul 13, 2018 6:51 pm    Post subject:

Also, in trying to get things set up, I've purchased a card that only supports CUDA compute capability 3.0, and now I've been bitten because TensorFlow 1.9.0 only supports compute capability 3.5 or higher, and 1.8.0 isn't compiling. Is there a way to force 1.9.0 to use CUDA 3, or can someone assist with getting 1.8.0 to compile?
szatox
Veteran

Joined: 27 Aug 2013
Posts: 1712

Posted: Fri Jul 13, 2018 6:59 pm    Post subject:

 Quote: Is there a way to force 1.9.0 to use CUDA 3 or can someone assist with getting 1.8.0 to compile
Create another thread and post your build information. A few people here have really good eyes for those things.
John R. Graham

Joined: 08 Mar 2005
Posts: 10156
Location: Somewhere over Atlanta, Georgia

Posted: Fri Jul 13, 2018 9:28 pm    Post subject:

szatox wrote:
 Quote: Is there a way to force 1.9.0 to use CUDA 3 or can someone assist with getting 1.8.0 to compile
Create another thread and post your build information. A few people here have really good eyes for those things.
++

FYI, builds successfully here: amd64 stable (with package.keywords for tensorflow and its ~amd64 dependencies).

- John
_________________
I can confirm that I have received between 0 and 499 National Security Letters.
All times are GMT. Page 2 of 3