NLP Machine Translation with RNNs, LSTMs, and Transformers

PYTORCH

SCIKIT-LEARN

TORCHTEXT

Natural Language Processing (NLP) has seen remarkable progress in recent years, particularly in the field of machine translation. This project tackles the challenge of translating German sentences into fluent English using two of the most influential deep learning architectures in NLP: Sequence-to-Sequence (Seq2Seq) models and Transformers.

The project began with implementing traditional recurrent models—namely RNNs and LSTMs—to capture the temporal dependencies in language. These models were structured in an encoder-decoder framework: the encoder processed German input sequences into dense latent vectors, and the decoder generated English translations one token at a time. To improve performance and enable the model to focus on relevant parts of the input sequence during decoding, I integrated a custom attention mechanism based on cosine similarity.
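As a rough sketch of the idea (not the project's exact code; the module name, tensor shapes, and hidden size here are illustrative), a cosine-similarity attention step over the encoder outputs can be written in PyTorch like this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineAttention(nn.Module):
    """Illustrative cosine-similarity attention over encoder states."""

    def forward(self, decoder_hidden, encoder_outputs):
        # decoder_hidden:  (batch, hidden_dim)          current decoder state
        # encoder_outputs: (batch, src_len, hidden_dim) all encoder states
        # Score each source position by cosine similarity with the decoder state.
        scores = F.cosine_similarity(
            decoder_hidden.unsqueeze(1),   # (batch, 1, hidden_dim)
            encoder_outputs,               # (batch, src_len, hidden_dim)
            dim=-1,
        )                                  # (batch, src_len)
        weights = F.softmax(scores, dim=-1)
        # Weighted sum of encoder states -> context vector (batch, hidden_dim)
        context = torch.bmm(weights.unsqueeze(1), encoder_outputs).squeeze(1)
        return context, weights
```

The context vector is then concatenated with the decoder state before predicting the next English token, which lets the decoder emphasize the most relevant German words at each step.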

After optimizing the Seq2Seq model through careful tuning of hidden layer sizes and dropout rates, the best-performing version used an LSTM-based encoder-decoder with attention, achieving significantly reduced perplexity scores on both training and validation data.
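Perplexity here is simply the exponential of the average per-token cross-entropy, so it can be computed along these lines (the PAD_IDX value and tensor shapes are assumptions for illustration, not the project's actual setup):

```python
import math
import torch.nn.functional as F

PAD_IDX = 0  # assumed padding token id; excluded from the average

def perplexity(logits, targets):
    # logits:  (batch, seq_len, vocab_size) raw model outputs
    # targets: (batch, seq_len)             gold token ids
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        ignore_index=PAD_IDX,
    )
    return math.exp(loss.item())
```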

In the second phase, I implemented the Transformer architecture, first as a custom one-layer encoder, then as a full encoder-decoder model using PyTorch's nn.Transformer. These models employed multi-head self-attention, learned positional encodings, and deep feedforward layers to capture complex dependencies in the data. A key aspect of the implementation was handling autoregressive decoding with masked (causal) attention, so that each target position could only attend to earlier tokens and no information from future words leaked into the predictions during training.
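A condensed sketch of this setup, assuming learned positional embeddings and illustrative layer sizes rather than the project's exact configuration, might look like:

```python
import torch
import torch.nn as nn

class TranslationTransformer(nn.Module):
    """Illustrative encoder-decoder translation model built on nn.Transformer."""

    def __init__(self, src_vocab, tgt_vocab, d_model=256, nhead=8, max_len=128):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, d_model)
        self.tgt_embed = nn.Embedding(tgt_vocab, d_model)
        self.pos_embed = nn.Embedding(max_len, d_model)  # learned positions
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=3, num_decoder_layers=3,
            batch_first=True,
        )
        self.out = nn.Linear(d_model, tgt_vocab)

    def add_pos(self, x):
        positions = torch.arange(x.size(1), device=x.device)
        return x + self.pos_embed(positions)

    def forward(self, src, tgt):
        # Causal mask: each target position may attend only to earlier positions.
        tgt_mask = self.transformer.generate_square_subsequent_mask(
            tgt.size(1)
        ).to(src.device)
        src_emb = self.add_pos(self.src_embed(src))
        tgt_emb = self.add_pos(self.tgt_embed(tgt))
        hidden = self.transformer(src_emb, tgt_emb, tgt_mask=tgt_mask)
        return self.out(hidden)  # (batch, tgt_len, tgt_vocab)
```

Because the target sequence is masked rather than fed back one token at a time, training can use teacher forcing over whole sentences in parallel while still behaving autoregressively at inference.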

Quantitative and qualitative evaluations demonstrated that Transformer-based models outperformed the Seq2Seq architecture, producing more fluent translations and converging faster during training. Hyperparameter tuning further improved the results, with the full Transformer model emerging as the top performer.

This project not only deepened my understanding of neural translation systems but also showcased the transition from traditional recurrent architectures to modern attention-based models—reinforcing how design innovations can drive meaningful gains in language understanding.

Published April 2025
