
Generating English Names with LSTM

Today, artificial intelligence offers not only models that predict well, but also models that are successful generators. Generative models that draw pictures or write poetry are examples.

[Image: name_generator_main.png]

This project is a simple example of the generative capability of artificial intelligence. Within the scope of the project, the model was trained on a dataset of 6,782 English first names, with the aim of generating new English names by learning the word-structure patterns of existing ones. Since the order and position of the letters in a name are important, a bidirectional LSTM (Bi-LSTM) model was used.

MAIN STEPS OF THE PROJECT:

  1. Preprocessing:

  • Names that contain characters outside the supported character set are dropped from the dataset.

  • All letters of the names are converted to lowercase.

  • A stop character (‘_’) is appended to the end of every name. (The model uses this character to understand where name generation should end, so the generated name can easily be extracted.) A minimal sketch of these preprocessing steps is shown below.
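Below is a minimal Python sketch of these preprocessing steps. The input file name and the allowed character set are illustrative assumptions, not details taken from the project code.

    # Sketch only: file name and allowed character set are assumptions.
    ALLOWED = set("abcdefghijklmnopqrstuvwxyz")

    with open("english_names.txt") as f:           # hypothetical input file
        raw_names = [line.strip() for line in f if line.strip()]

    names = []
    for name in raw_names:
        name = name.lower()                        # lowercase all letters
        if set(name) <= ALLOWED:                   # drop names with unsupported characters
            names.append(name + "_")               # append the stop character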

  2. Encoding:

  • Each letter of a name is encoded as its index in a list containing all lowercase letters (plus the stop character), as sketched below.
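A minimal sketch of this encoding, continuing the snippet above. The 27-symbol vocabulary (26 lowercase letters plus the stop character) is an assumption consistent with the preprocessing step.

    vocab = list("abcdefghijklmnopqrstuvwxyz_")    # 26 letters + stop character
    char2idx = {ch: i for i, ch in enumerate(vocab)}

    encoded = [[char2idx[ch] for ch in name] for name in names]
    # e.g. "jon_" -> [9, 14, 13, 26]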

  3. Creating Dataset:

  • Encoded names are split into their characters. Datasets x and y were created by pairing each letter in a name (input) with the letter that follows it (target).

  • The structured data is split into train and test sets, converted into tensor dataset format, and then batched according to the batch size, as sketched below.
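A PyTorch sketch of this step. Padding every name to a fixed length with the stop character, the 80/20 train/test split, and the batch size of 64 are all assumptions; the project may handle variable-length names differently.

    import torch
    from torch.utils.data import TensorDataset, DataLoader, random_split

    max_len = max(len(seq) for seq in encoded)
    pad = char2idx["_"]                            # reuse the stop character as padding
    padded = [seq + [pad] * (max_len - len(seq)) for seq in encoded]

    data = torch.tensor(padded)
    x, y = data[:, :-1], data[:, 1:]               # each letter is the input for the next letter

    dataset = TensorDataset(x, y)
    train_len = int(0.8 * len(dataset))            # 80/20 split is an assumption
    train_set, test_set = random_split(dataset, [train_len, len(dataset) - train_len])

    train_loader = DataLoader(train_set, batch_size=64, shuffle=True)   # batch size assumed
    test_loader = DataLoader(test_set, batch_size=64)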

  4. Creating LSTM Model:

  • Since the order and position of the letters carry the actual information to be extracted from a name, the Bi-LSTM model, which is well suited to sequential data, was used.

  • The Bi-LSTM model is configured (a sketch of such a model follows this list). The components of the model decided in this configuration are:

      - type of layers (embedding, Bi-LSTM, linear, etc.)

      - number of layers (1 embedding layer, 2 linear layers, 1 Bi-LSTM layer)

      - dimension of layers (e.g., how many neurons the embedding layer has)

      - type of activation functions (the output layer produces one score per character of the alphabet; the cross-entropy loss applies softmax to these scores during training)

  • Hyperparameters are determined. These are:

      - Learning rate (0.01)

      - Epochs (10)

      - Optimizer function (Adam Optimizer)

      - Loss function (Cross Entropy Loss)
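A minimal PyTorch sketch of a model with these components. Only the layer types and counts come from the list above; the embedding and hidden dimensions are illustrative assumptions.

    import torch
    import torch.nn as nn

    class NameGenerator(nn.Module):
        """Embedding -> Bi-LSTM -> two linear layers, one score per character."""

        def __init__(self, vocab_size=27, embed_dim=32, hidden_dim=64):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim,
                                batch_first=True, bidirectional=True)
            self.fc1 = nn.Linear(2 * hidden_dim, hidden_dim)   # 2x: forward + backward states
            self.fc2 = nn.Linear(hidden_dim, vocab_size)       # one score per character

        def forward(self, x):
            out, _ = self.lstm(self.embed(x))                  # (batch, seq_len, 2*hidden_dim)
            return self.fc2(torch.relu(self.fc1(out)))         # raw logits for CrossEntropyLoss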

  5. Training and Evaluation:

  • During the weight-update process (training), we can clearly see that both the validation and training loss decrease in each epoch and that the gap between the two losses does not grow. So we can say that, most probably, our model has no overfitting problem.

[Image: ss2.png (training and validation loss curves)]
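For reference, a sketch of a training loop using the stated hyperparameters (learning rate 0.01, 10 epochs, Adam optimizer, cross-entropy loss). The validation pass is omitted for brevity.

    import torch
    import torch.nn as nn

    model = NameGenerator()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    criterion = nn.CrossEntropyLoss()               # applies softmax internally

    for epoch in range(10):
        model.train()
        for xb, yb in train_loader:
            optimizer.zero_grad()
            logits = model(xb)                      # (batch, seq_len, vocab)
            loss = criterion(logits.reshape(-1, logits.size(-1)),
                             yb.reshape(-1))        # flatten for cross-entropy
            loss.backward()
            optimizer.step()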

  BONUS:

Let’s give a random letter to our trained model and see what happens:

[Image: ss3.png (generated output for input ‘j’)]

When we give ‘j’ as input to our pretrained model, it generates ‘jon’.

[Image: ss4.png (generated output for input ‘d’)]

When we give ‘d’ as input to our pretrained model, it generates ‘daria’, not bad 😊

[Image: ss5.png (generated output for input ‘b’)]

Lastly, when we give ‘b’ as input to our pretrained model, it generates ‘brin’.
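A sketch of how such a generation loop might look. The greedy argmax decoding and the maximum length are assumptions about the project’s sampling strategy.

    import torch

    def generate(model, seed, max_len=20):
        """Feed the sequence back in each step and take the most likely next character."""
        model.eval()
        indices = [char2idx[seed]]
        with torch.no_grad():
            for _ in range(max_len):
                logits = model(torch.tensor([indices]))    # (1, len, vocab)
                next_idx = int(logits[0, -1].argmax())     # most likely next character
                if vocab[next_idx] == "_":                 # stop character reached
                    break
                indices.append(next_idx)
        return "".join(vocab[i] for i in indices)

    print(generate(model, "j"))   # e.g. 'jon'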
