Finetuning an LLM on trippy mathematical reasoning (with a couple of RTX 2080 cards)
SUMMARY
What happens when you teach language models to solve math problems through surreal, psychedelic reasoning that somehow still gets the right answer?
This is the first post in a series about the project. I started this journey not just for fun, but also as a way to teach myself, explore, and maybe discover something interesting along the way.
The experiment: Instead of boring step-by-step logic, I am training models to generate trippy, stream-of-consciousness rationales wrapped in custom tags.
The recipe needs three key ingredients: (1) A dataset of trippy answers to questions, (2) Supervised finetuning with parameter-efficient methods, and (3) Preference tuning with lightweight RL to enforce “trippy” thoughts.
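To make ingredient (1) concrete, here is a minimal sketch of what a single training example might look like. The tag name `trippy` and the prompt layout are assumptions for illustration; the actual custom tags and dataset schema are defined later in the series.

```python
def format_example(question: str, rationale: str, answer: str, tag: str = "trippy") -> str:
    """Wrap a stream-of-consciousness rationale in custom tags between
    the question and the final answer. Tag name and layout are assumed,
    not the project's confirmed format."""
    return (
        f"Question: {question}\n"
        f"<{tag}>{rationale}</{tag}>\n"
        f"Answer: {answer}"
    )

sample = format_example(
    "What is 7 * 8?",
    "sevens spiral outward, each one sprouting an eight, the lattice hums and collapses into a single bright number",
    "56",
)
print(sample)
```

During supervised finetuning (ingredient 2), strings like this become the targets the model learns to reproduce, and the preference-tuning stage (ingredient 3) can then reward rationales inside the tags for being suitably trippy while still ending on the correct answer.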