>>4858 (OP)
It seems like nothing is happening over Easter, in the interregnum between Deepseek R1 and the soon-to-be-released Qwen 3, so I am recapping my personal all-time favorite models:
Historical
> GPT-J, OPT Old but gold. Waiting for the next token to be shown, sentences
pieced together part by part… The curiosity of having semi-coherent discussions with my graphics card is something I will never forget.
> Pygmalion Makes me pretty nostalgic, the novelty of it all back then was something to remember.
> LLAMA 1 Really started the whole finetuning scene. Alpaca made conversations much more coherent.
> LLAMA 2 Quite a lot better than Llama 1 but most importantly, spawned a wide variety of finetunes including all time favorites like Mythomax.
> Mistral v0.1 Solid base model with a lot of finetunes like OpenHermes2 or Neuralchat.
> Mixtral First time open source ever came close to proprietary performance. Great model for its time. The team behind it remains relevant, albeit not in the top tier.
Also, for me, it was the first local model to be actually useful for some coding assistance.
> LLAMA 3 Lame base model but the upgraded versions were better. Spawned a number of okayish finetunes or something.
> Mistral Nemo Good model for RP, retarded for other purposes. I use it to drive Skyrim NPCs in the form of Nemo uncensored.
> Gemma 2 Decent model for coding and general knowledge but superseded by Gemma 3 and Qwen 2.5 x R1 / QwQ
> Qwen 2.5 R1 Distilled / QwQ My current daily drivers for coding assistance. The sweet spot between size and performance.
> Deepseek R1 The big one, the king, nothing else to add currently. It really shines in every category, but all "intelligent" systems have their limits and so do current frontier models.
I mostly use it to discuss more complex engineering processes where I really need the breadth and depth of knowledge in that large network.
> LLAMA 4 Its future shines as bright as 4cucks.
Some chuds also appreciated "Frankenmerges" although I have no personal experience with them.
Along the way we also got many more or less important upgrades regarding samplers, optimization, training procedures, data curation, GUIs, …
and a whole bunch of inference frameworks (llama.cpp, transformers got plenty of upgrades, ktransformers, vllm, TensorRT, …)!
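For anyone curious what those sampler upgrades actually do, here is a minimal sketch of classic temperature + top-p (nucleus) sampling in plain Python. This is an illustrative toy, not how llama.cpp or vllm implement it (they work on tensors in optimized C++/CUDA); function and variable names are my own.

```python
import math
import random

def sample(logits, temperature=0.8, top_p=0.9):
    """Pick a token index from raw logits using temperature + nucleus sampling."""
    # Temperature scaling: lower temperature sharpens the distribution.
    scaled = [l / temperature for l in logits]
    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Nucleus (top-p): keep the smallest set of highest-probability
    # tokens whose cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Draw from the truncated, renormalized distribution.
    mass = sum(probs[i] for i in kept)
    r = random.random() * mass
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]
```

With a very dominant logit and a small top_p the nucleus collapses to a single token, which is why low temperature + low top-p feels nearly deterministic.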
Qwen 3, it's your turn now