Finetuning an LLM on trippy mathematical reasoning (with a couple of RTX 2080 cards)
SUMMARY
What happens when you teach language models to solve math problems through surreal, psychedelic reasoning that somehow still gets the right answer?
This is the first post in a series about the project. I started this journey not just for fun, but also as a way to teach myself, explore, and maybe discover something interesting along the way.
The experiment: Instead of boring step-by-step logic, I am training models to generate trippy, stream-of-consciousness rationales wrapped in custom tags.
The recipe needs three key ingredients: (1) A dataset of trippy answers to questions, (2) Supervised finetuning with parameter-efficient methods, and (3) Preference tuning with lightweight RL to enforce “trippy” thoughts.
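To make ingredient (1) concrete, here is a minimal sketch of what a single training example might look like. The tag name `trippy` and the prompt layout are assumptions for illustration; the actual custom tags and dataset schema are defined later in the series.

```python
def format_example(question: str, rationale: str, answer: str, tag: str = "trippy") -> str:
    """Wrap a stream-of-consciousness rationale in custom tags between
    the question and the final answer. Tag name and layout are assumed,
    not the project's confirmed format."""
    return (
        f"Question: {question}\n"
        f"<{tag}>{rationale}</{tag}>\n"
        f"Answer: {answer}"
    )

sample = format_example(
    "What is 7 * 8?",
    "sevens spiral outward, each one sprouting an eight, the lattice hums and collapses into a single bright number",
    "56",
)
print(sample)
```

During supervised finetuning (ingredient 2), strings like this become the targets the model learns to reproduce, and the preference-tuning stage (ingredient 3) can then reward rationales inside the tags for being suitably trippy while still ending on the correct answer.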