Character Recurrent Neural Network

  • mimicking Shakespeare's writing style
  • LSTM
In [1]:
!rm -r data
import os

# Create the data directory if it does not already exist
os.makedirs("./data", exist_ok=True)

!wget https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/tinyshakespeare/input.txt -P ./data
--2019-06-03 09:34:17--  https://raw.githubusercontent.com/dmlc/web-data/master/mxnet/tinyshakespeare/input.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.0.133, 151.101.64.133, 151.101.128.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.0.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1115394 (1.1M) [text/plain]
Saving to: './data/input.txt'

input.txt           100%[===================>]   1.06M  --.-KB/s    in 0.07s   

2019-06-03 09:34:17 (14.8 MB/s) - './data/input.txt' saved [1115394/1115394]

1. Settings

1) Import required libraries

In [0]:
import torch
import torch.nn as nn
In [0]:
import unidecode
import string
import random
import re
import time, math

2) Hyperparameters

In [0]:
num_epochs = 2000     # training iterations (one random chunk each)
print_every = 100     # print loss and a generated sample every N iterations
plot_every = 10       # record loss every N iterations (for plotting)
chunk_len = 200       # characters per training chunk
hidden_size = 100     # LSTM hidden state size
batch_size = 1        # chunks processed at once
num_layers = 1        # stacked LSTM layers
embedding_size = 70   # character embedding dimension
lr = 0.002            # Adam learning rate

2. Data

1) Prepare characters

In [5]:
all_characters = string.printable
n_characters = len(all_characters)
print(all_characters)
print('num_chars = ', n_characters)
0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ 	

num_chars =  100

2) Get text data

In [6]:
# Read the corpus, transliterating any non-ASCII characters to ASCII
with open('./data/input.txt') as f:
    file = unidecode.unidecode(f.read())
file_len = len(file)
print('file_len =', file_len)
file_len = 1115394

3. Functions for text processing

1) Random Chunk

In [7]:
def random_chunk():
    # Pick a random span of chunk_len + 1 characters:
    # chunk_len inputs plus one extra character for the final target
    start_index = random.randint(0, file_len - chunk_len - 1)
    end_index = start_index + chunk_len + 1
    return file[start_index:end_index]

print(random_chunk())
:
Depress'd he is already, and deposed
'Tis doubt he will be: letters came last night
To a dear friend of the good Duke of York's,
That tell black tidings.

QUEEN:
O, I am press'd to death through want

2) Character to tensor

In [8]:
def char_tensor(text):
    # Map each character to its index in all_characters
    # (parameter renamed from "string" to avoid shadowing the string module)
    tensor = torch.zeros(len(text)).long()
    for c in range(len(text)):
        tensor[c] = all_characters.index(text[c])
    return tensor

print(char_tensor('ABCdef'))
tensor([36, 37, 38, 13, 14, 15])
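
Going the other way, decoding a tensor of indices back into text just indexes into all_characters. A small helper sketch (the name tensor_to_string is an addition, not from the notebook), used later when inspecting training pairs:

def tensor_to_string(tensor):
    # Inverse of char_tensor: look each index up in all_characters
    return ''.join(all_characters[int(i)] for i in tensor)

print(tensor_to_string(char_tensor('ABCdef')))  # -> ABCdef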

3) Chunk into input & label

In [0]:
def random_training_set():
    # Input and target are the same chunk shifted by one character,
    # so the model learns to predict the next character at every step
    chunk = random_chunk()
    inp = char_tensor(chunk[:-1])
    target = char_tensor(chunk[1:])
    return inp, target
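
A quick sanity check makes the one-character shift visible; this uses the tensor_to_string helper sketched above and is not part of the original notebook:

inp, target = random_training_set()
print(tensor_to_string(inp)[:20])     # e.g. "Depress'd he is alre"
print(tensor_to_string(target)[:20])  # the same text shifted left by one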

4. Model & Optimizer

1) Model

In [0]:
class RNN(nn.Module):
    def __init__(self, input_size, embedding_size, hidden_size, output_size, num_layers=1):
        super(RNN, self).__init__()
        self.input_size = input_size
        self.embedding_size = embedding_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.num_layers = num_layers

        # character index -> embedding -> LSTM -> logits over characters
        self.encoder = nn.Embedding(self.input_size, self.embedding_size)
        self.rnn = nn.LSTM(self.embedding_size, self.hidden_size, self.num_layers)
        self.decoder = nn.Linear(self.hidden_size, self.output_size)

    def forward(self, input, hidden, cell):
        out = self.encoder(input.view(1, -1))                # (1, batch, embedding_size)
        out, (hidden, cell) = self.rnn(out, (hidden, cell))  # (1, batch, hidden_size)
        out = self.decoder(out.view(batch_size, -1))         # (batch, output_size) logits
        return out, hidden, cell

    def init_hidden(self):
        # Zero-initialized hidden and cell states for the LSTM
        hidden = torch.zeros(self.num_layers, batch_size, self.hidden_size)
        cell = torch.zeros(self.num_layers, batch_size, self.hidden_size)
        return hidden, cell


model = RNN(n_characters, embedding_size, hidden_size, n_characters, num_layers)
In [11]:
inp = char_tensor("A")
print(inp)
hidden,cell = model.init_hidden()
print(hidden.size())

out,hidden,cell = model(inp,hidden,cell)
print(out.size())
tensor([36])
torch.Size([1, 1, 100])
torch.Size([1, 100])

2) Loss & Optimizer

In [0]:
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
loss_func = nn.CrossEntropyLoss()
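
Note that nn.CrossEntropyLoss applies log-softmax internally, which is why the decoder outputs raw logits and the sampling code below exponentiates them itself. A minimal shape check with made-up values (not in the original notebook):

dummy_logits = torch.randn(1, n_characters)   # (batch, classes) raw logits
dummy_target = torch.tensor([42])             # (batch,) class indices
print(loss_func(dummy_logits, dummy_target))  # scalar loss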

3) Test function

In [0]:
def test():
    # Generate 200 characters, feeding each sampled character back in
    start_str = "b"
    inp = char_tensor(start_str)
    hidden, cell = model.init_hidden()
    x = inp

    print(start_str, end="")
    for i in range(200):
        output, hidden, cell = model(x, hidden, cell)

        # Temperature sampling: scale the logits by 1/0.8 and exponentiate,
        # then draw from the resulting (unnormalized) distribution
        output_dist = output.data.view(-1).div(0.8).exp()
        top_i = torch.multinomial(output_dist, 1)[0]
        predicted_char = all_characters[top_i]

        print(predicted_char, end="")

        x = char_tensor(predicted_char)
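
The 0.8 divisor acts as a sampling temperature: values below 1 sharpen the distribution toward the most likely characters, values above 1 flatten it toward uniform. A parameterized variant as a sketch (the temperature argument is an addition, not part of the original notebook):

def sample_char(logits, temperature=0.8):
    # Lower temperature -> greedier, more repetitive text;
    # higher temperature -> more diverse but noisier text
    dist = logits.view(-1).div(temperature).exp()
    return all_characters[int(torch.multinomial(dist, 1)[0])]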

5. Train

In [14]:
for i in range(num_epochs):
    inp, label = random_training_set()
    hidden, cell = model.init_hidden()

    loss = torch.zeros(1)
    optimizer.zero_grad()
    # Unroll through the chunk one character at a time,
    # accumulating the loss at every step
    for j in range(chunk_len):
        x = inp[j]
        y_ = label[j].unsqueeze(0)
        y, hidden, cell = model(x, hidden, cell)
        loss += loss_func(y, y_)

    loss.backward()
    optimizer.step()

    if i % print_every == 0:
        print("\n", loss/chunk_len, "\n")
        test()
        print("\n\n")
 tensor([4.5800], grad_fn=<DivBackward0>) 

\.o:w$(OkE:]bk><oj"&H{-0wCUWD.^J-7W<MB/8}y5rgZkcp{s lkc6HY\v-zc(Ry%9bS~M.u1 81M}2=VP <,|$>,NRc?}Ba?]mb?e^ S6k1Z=KW/C~hua



 tensor([2.5637], grad_fn=<DivBackward0>) 

b5-,ainthht hakof ts the so, wese macel the wiuor Eaod sd.st es dfed hothiros torp thind ish  yov;

Wheny thenbs wea illso oruse I thilild.

Soufntha,
I
:

b loun, tor be leg teth mafr ed

Foll tomvs w



 tensor([2.5285], grad_fn=<DivBackward0>) 

besy my wipike saur I tho wor urder the tod and thar mBr in there lod ding tar sondt pool that st hean ospessoy les nith ak, sout shace fyousd ice
EHhers cir annd beath the tho swer at thler dy ous?
ei



 tensor([2.3063], grad_fn=<DivBackward0>) 

bre, dstiner suld tho west the I colur, calld mor the,
The ond thas thied ang all dor de thy fone nonede.
pot int, in the sonncende rise co but the for oust be hy-udd an as, pof ton thech cor bumeras w



 tensor([2.3465], grad_fn=<DivBackward0>) 

bn hof's nange to in sing brath beang high mais sof muimes I anterk or to funcour houry me klof ife thay dith ols and hace.

ANCAnESRIOL:
Le coed fus afis
Now grais, was thers four lome of mor mims wha



 tensor([2.2852], grad_fn=<DivBackward0>) 

bveancus hakant Dhand of beg lofer,
That I cowe or muse lor to the he pat, fill cand what my, coff ice so sou wom late wats mat theer!
I thow armdcer!
The wit lor sof art yy sut the yo with no cot yind



 tensor([2.1903], grad_fn=<DivBackward0>) 

blake nod so and deed Wave yout, fortered, to me corstsore,
The were and blow, hell
Got my cound see, wienpes c and cor he hand are and frolis you moneces
Do myse salmes sthat Thay are foncers beavern.



 tensor([2.1496], grad_fn=<DivBackward0>) 

bur's fintres for this ion it and you gerse
Is disg the theaven,
The the the the hald my quelfor gourse, hee the frith, the will hest the gore that hear and mis that in wire hell wime of mat beasse.

D



 tensor([2.1944], grad_fn=<DivBackward0>) 

bard weak ir me shice spatail meary wall delin the have bother beres the may to for the hat prat mold hand the enorger'd the ondas the dosions, surch ar ald, with your me tommish as anst he to loul!
An



 tensor([2.0629], grad_fn=<DivBackward0>) 

bor your tor and her with wiand.

Thind to hear my good and and pemave am I hou himing arose of or sha! sill hill, shais the but and be thour but mee
Nord, with me that hing dis brenth at as the lesh g



 tensor([1.8610], grad_fn=<DivBackward0>) 

bramce, os drearm: houghe.

UERCING IINE:
Thou dout, urle the lord the man on,
And I ofther conith, forsentorened and his ho'st gelllound you hel.
 I thlat hand you land, sher her this, for there of to



 tensor([2.1274], grad_fn=<DivBackward0>) 

be be mad ouf to coundizforefer, the with
Froke akent, shous lile blesh is the sep mand, for
ble vowl: sruucy friin hattrer do hing, madane?

PUCHESLAD:
And I pring: will dir and to a Eof.
Ally in a cr



 tensor([2.0586], grad_fn=<DivBackward0>) 

ble hin 'as my me me here.

DERD ERY INIO:
To wrord you mey of elf the theak my and me weress angour prome of the chash fich a mast dickion:
And grediness the arcecn's a hatus soppade the prace fard.





 tensor([1.9983], grad_fn=<DivBackward0>) 

be sims a a do Lirsiman of he,
Ou werde the from wet you conith, be romige the cae trise:
ThA leing be weld,
And be hous the suld bot thit of gent the morter and mose arther'd of the preabes;
And Vor i



 tensor([1.9271], grad_fn=<DivBackward0>) 

beourfup'tyer's wirst,
I tathe toomes ith cearl susk noouss loulmess.

Andon:
He s us ound sound the bughine
To sto lado'l shee k wariton,
And in his your stand lands.

LOUCET:
So reition what the eart



 tensor([1.8590], grad_fn=<DivBackward0>) 

be to tat benous four for thou firch all decher man.

MARINIA:
He dearp withned on my is to dife,
Wain the diit pearn the the rowbly gaugen?

MERWARD III:
Frain entook lithesjust welt.

GLOUCESTELNER:




 tensor([1.8758], grad_fn=<DivBackward0>) 

bser and coulder was adem
To kn of in withent of the worter thou my's fur have rist there in that not to good a speaitting her to drowth of touther-ser do? Crume indess,
And mefore hif gare for is so t



 tensor([2.0135], grad_fn=<DivBackward0>) 

be your should to doch aidy. aul the igow
As you going my hath so weinces, swas doter you becing all deocs'r fift
To cose of uet, came awl atter she veed.

SICINIUS:
What contures hees siend to Beat it



 tensor([1.8559], grad_fn=<DivBackward0>) 

but then look, beanding longh,
Marly vicending to comen your hem's aclavest thats
Norsed have Coru, him promwill the saccely.

TRLAURSENR:
What then rear condoind your love your your and to your poth
W



 tensor([1.9378], grad_fn=<DivBackward0>) 

bis sorm:
Thall the orst, son my by the fright with apprie.

BELIONCELO:
Thay our in thim so my, this to indion for gray frover
As the corily theear thoust mank and passour word shall matard,
For hosk 


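A natural extension, sketched here rather than taken from the original notebook: prime the hidden state with a longer string before sampling, so generation starts from real context instead of a single character (the generate name and its arguments are additions):

def generate(prime_str="ROMEO:", length=200, temperature=0.8):
    hidden, cell = model.init_hidden()
    prime = char_tensor(prime_str)

    # Feed the prime string through the model to warm up the state
    for i in range(len(prime) - 1):
        _, hidden, cell = model(prime[i], hidden, cell)

    x = prime[-1]
    out_str = prime_str
    for _ in range(length):
        output, hidden, cell = model(x, hidden, cell)
        dist = output.data.view(-1).div(temperature).exp()
        top_i = int(torch.multinomial(dist, 1)[0])
        out_str += all_characters[top_i]
        x = char_tensor(all_characters[top_i])
    return out_str

print(generate())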