Part 2: continuing from Part 1

Previously, I showed a visualization of an LSTM by drawing with your touch; the LSTM predicted the future segments of the curve by remembering portions of what you drew in the past.

In this post, I will explore the internals of an LSTM via a simple 1-unit LSTM that detects a 2-bit pattern.

To recap, an LSTM (Long Short-Term Memory) is a Recurrent Neural Network that can be trained via supervised learning to remember and forget patterns. You have likely come across LSTMs in the context of text predictions, speech recognition and so on.
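To make the recap concrete, here is a minimal sketch of a single step of a 1-unit LSTM in plain Python. The weight names in `w` are purely illustrative (not taken from the demo's trained values); the gate structure is the standard forget/input/candidate/output arrangement.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, w):
    # One step of a 1-unit LSTM on a scalar input bit x.
    # w holds illustrative weights: each gate has an input weight,
    # a recurrent weight, and a bias (names are mine, hypothetical).
    f = sigmoid(w["wf_x"] * x + w["wf_h"] * h_prev + w["bf"])    # forget gate
    i = sigmoid(w["wi_x"] * x + w["wi_h"] * h_prev + w["bi"])    # input gate
    g = math.tanh(w["wg_x"] * x + w["wg_h"] * h_prev + w["bg"])  # candidate cell value
    o = sigmoid(w["wo_x"] * x + w["wo_h"] * h_prev + w["bo"])    # output gate
    c = f * c_prev + i * g   # new cell state (the long-term memory)
    h = o * math.tanh(c)     # new hidden state (the short-term output)
    return h, c
```

Both h and c are then fed back in on the next step, which is the recurrence the demo below visualizes.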

More background material on LSTMs

For reference, I recommend these excellent LSTM deep dives from experts in the field: Colah’s Blog and Schmidhuber’s slides. I am exploring from the POV of an engineer with no background in ML; this was useful to me as I learn and explore practical RL with Puffer.

Demo

Below is a playable demo of the 1-unit LSTM. It is trained to detect a 2-bit pattern fed one bit at a time. During inference, the LSTM remembers the current and previous bits and verifies whether the trained target pattern has been fed sequentially. At the end of this blog post, I will also show a 4-bit LSTM with long-term/short-term properties.

First set the Target pattern (by clicking on the 0’s to flip to 1’s as needed) and click Train. Next, set any inference pattern under the Test Input Sequence and click Run inference (step 1).

You can see how the h_t and c_t states flow through the gates. Next, click Run inference (step 2) and you will see the final ‘pattern recognized’ result if the input matches the pattern fed through during training (a green 1 or a red 0 depending on success/fail). NOTE: I am directly interpreting the final output probability as a 0 (fail) or 1 (success).


This was cool for me to visualize, as it shows how the h/c states are fed back into the neural network in a recurrent fashion. In particular, we can see how backprop/gradient descent has trained the sigmoid and tanh non-linear activations inside the LSTM to mimic a pseudo-Karnaugh map. You can try different combinations to watch the flow of information through the LSTM.
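As a small aside on why trained activations can mimic boolean logic: a sigmoid with large enough weights saturates toward 0 or 1, so it can approximate a logic gate. Here is a sketch of a sigmoid approximating AND; the `soft_and` name and `sharpness` parameter are mine, purely for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_and(x1, x2, sharpness=10.0):
    # With a large sharpness, the sigmoid saturates toward 0/1,
    # approximating logical AND: output is high only when both inputs are 1.
    # The -1.5 bias places the decision boundary between (1,0) and (1,1).
    return sigmoid(sharpness * (x1 + x2 - 1.5))
```

Gradient descent can push gate weights into this saturated regime, which is (roughly) how the demo's gates end up behaving like crisp pattern detectors.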

4-bit LSTM

We can zoom out a bit and see how this applies to a slightly larger network that remembers the last four bits in its h/c states. Here is the same concept applied to a 4-bit pattern recognizer, fed bit by bit in a sequential manner.
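The bit-by-bit feeding described above can be sketched as a self-contained toy in plain Python. The weights would normally come from training (the names below are hypothetical), and the final output is thresholded into a hard 0/1, matching how the demo interprets the output probability.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h, c, w):
    # Minimal 1-unit LSTM step (same gate structure the demo visualizes).
    f = sigmoid(w["wf"] * x + w["uf"] * h + w["bf"])    # forget gate
    i = sigmoid(w["wi"] * x + w["ui"] * h + w["bi"])    # input gate
    g = math.tanh(w["wg"] * x + w["ug"] * h + w["bg"])  # candidate cell value
    o = sigmoid(w["wo"] * x + w["uo"] * h + w["bo"])    # output gate
    c = f * c + i * g
    return o * math.tanh(c), c

def recognize(bits, w, threshold=0.5):
    # Feed the sequence one bit at a time, carrying h and c forward
    # recurrently, then read the final output as a hard 0 or 1.
    h, c = 0.0, 0.0
    for x in bits:
        h, c = lstm_step(float(x), h, c, w)
    return 1 if h > threshold else 0
```

With trained weights, `recognize` would return 1 only when the full target pattern has been seen in order; with arbitrary weights it just demonstrates the recurrent data flow.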



Side note on LLM usage

Given that this is a throwaway toy/demo, I used LLMs (Opus 4.6 mainly, GPT5.3 Codex for some parts) to vibe-code it, for a few reasons:
  • Well, it’s a throwaway. It’s a toy demo.
  • I am really bad at UI, and LLMs are much, much (infinitely?) better at it than I am. They are really good at manipulating swathes of HTML/TS/CSS (and SVG!) without breaking a sweat.
  • LLMs are really good at helping you learn new things that are already part of their corpus.

LLMs are really good in certain contexts and really bad in others. Here is my general usage pattern:

  • Aid learning, curiosity and exploration
  • Build quick demos, long-lasting scripts and prototypes
  • Refactor and understand large-ish code bases

In future posts, I will explore (and exploit) reinforcement learning by building a (toy-ish) pixel platformer using raylib and Puffer. Follow me on X for updates.