[ home / overboard ] [ soy / qa / mtv / dem ] [ int / pol ] [ a / asp / biz / fit / k / r9k / sude / tech / tv / v / x ] [ q / news / chive / rules / pass / bans ] [ wiki / booru / irc ]

/tech/ - Soyence and Technology


File: gg.jpg (987.4 KB, 1080x1080)

 4858[Quote]

/lmg/ - a general dedicated to the discussion and development of local language models.

SoyGenesis Edition

Previous Threads: Too Cucked 4 mentioning.

►News
>(04/16) Microsoft releases Bitnet B1.58 https://hf.co/microsoft/bitnet-b1.58-2B-4T
>(04/14) GLM-4-0414 and GLM-Z1 released: https://hf.co/collections/THUDM/glm-4-0414-67f3cbcb34dd9d252707cb2e
>(04/14) Nemotron-H hybrid models released: https://hf.co/collections/nvidia/nemotron-h-67fd3d7ca332cdf1eb5a24bb
>(04/10) Ultra long context Llama-3.1-8B: https://hf.co/collections/nvidia/ultralong-67c773cfe53a9a518841fbbe

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

 7201[Quote]

what's a raisin?

 7212[Quote]

File: Screenshot 2025-04-20 at 0….png (2.8 MB, 3814x1930)

>>7151
>first image i see,
Legitimately how? I had to look for that post for like 2 straight minutes because it's from 8 YEARS AGO.

It's so far down I can't even see it in the catalog at 4k resolution. How the fuck was that the first thing you saw, rather than lmg or the AI art thread, which are at the top of the board?

>>7201
It's the gay sharty wordfilter for SHlT

 7246[Quote]

>>7212
>how
change your list order

 7309[Quote]

>>7132
promote your tranny board somewhere else pedo

 7325[Quote]

>>7132
>8kun pedo shithole

 7400[Quote]

File: Qwen2.5 Omni.png (706.39 KB, 2180x1214)

Given that threads here last much longer, some news updates:

- TensorRT-LLM: Llama 4 and Phi-4-MM are now supported
- Transformers: adds support for Qwen2.5 Omni (a model with speech, text, image, and video support)
- DeepSeek is about to release their internal inference engine for DeepSeek R1 and DeepSeek V3

 7418[Quote]

>>7246
To friggin what? If I sort by bump, creation, or reply count it's still at the bottom because, again, it's a dead thread from 8 years ago.
>>7309
>>7325
You're thinking of 8chan.moe, samefag. 8kun banned loli which was the impetus for creating .moe, and why .moe is full of degen boards like /abdl/ and raisin.

But by all means stay here where board culture is spamming gifs of obese men's assholes - stay in your element.

 7491[Quote]

>>7400
>qwen
coal

 7508[Quote]

>>5465
Holy Raisin, look at that coding jump. This has got to be benchmaxxed, right? It can't be THAT good.

 7525[Quote]

>>6249
At this rate, we might get R2 first

>>6252
>deepseek becomes the new 'p

 7671[Quote]

>>7491
Gem for coding assistance

 7752[Quote]

Where is Petra, wasn't that her homeboard?

 7755[Quote]

>There's some schizo talking to himself about someone nobody's ever heard of in two different /lmg/ threads on completely different sites
I don't remember us having a resident schizo other than the blacked miku guy, the fuck is this?

 7783[Quote]

>>7755
This is the 4th /lmg/ thread.

 7788[Quote]

>>5465
<unironically believing those fake benchmarks
the people who create these problem banks don't know any math and bloat them with high-school olympiad trash. That's why a random problem from a Springer Graduate Texts in Mathematics book makes even o3 raisin its pants. Try it out yourself on LLM Arena

 7906[Quote]

>>7788
><unironically believing those fake benchmarks
>the people who create these problem banks don't know any math and bloat them with high-school olympiad trash. That's why a random problem from a Springer Graduate Texts in Mathematics book makes even o3 raisin its pants. Try it out yourself on LLM Arena

This is pretty consistent with my own observations.
Even asking any LLM to transform some non-trivial Nondeterministic Finite Automaton (one that isn't already deterministic, of course!) into a Deterministic Finite Automaton, or vice versa, or to build a regex out of it and give some examples of accepted words, is beyond the capability of current AI, including OpenAI's latest slop - tested it
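For anyone curious, the NFA-to-DFA conversion being used as a test here is the textbook subset construction. A minimal Python sketch (my own illustration, not from the thread; encoding the NFA as a dict mapping (state, symbol) to a set of next states is an assumption made for brevity):

```python
from collections import deque

def nfa_to_dfa(nfa, start, accepts):
    """Subset construction: nfa maps (state, symbol) -> set of next states."""
    symbols = {sym for (_, sym) in nfa}
    start_set = frozenset([start])
    dfa = {}                 # maps (frozenset-of-states, symbol) -> frozenset
    dfa_accepts = set()
    queue = deque([start_set])
    seen = {start_set}
    while queue:
        current = queue.popleft()
        if current & accepts:          # a DFA state accepts if any NFA state in it does
            dfa_accepts.add(current)
        for sym in symbols:
            nxt = frozenset(s2 for s in current for s2 in nfa.get((s, sym), ()))
            dfa[(current, sym)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return dfa, start_set, dfa_accepts

def dfa_accepts_word(dfa, start_set, dfa_accepts, word):
    state = start_set
    for ch in word:
        state = dfa.get((state, ch), frozenset())
    return state in dfa_accepts
```

For the classic "strings ending in ab" NFA this yields a three-state DFA. It's a purely mechanical algorithm, which is exactly what makes it a fair benchmark question.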

 7913[Quote]

>>7906
Have you ever had them try to prove something? I just tried o3 and o4-mini, and they both insisted that the product of two separable topological spaces need not be separable, which is retarded. Only after I proved to them that the product is always separable did they stop insisting on it; just telling them it's wrong didn't help.
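For reference, the proof the models had to be walked through is two lines; sketched in LaTeX (the standard textbook argument, not quoted from the thread):

```latex
Let $D_X \subseteq X$ and $D_Y \subseteq Y$ be countable dense subsets.
Then $D_X \times D_Y$ is countable, and it is dense in $X \times Y$:
every nonempty basic open set has the form $U \times V$ with
$U \subseteq X$, $V \subseteq Y$ nonempty open, so there exist
$d_1 \in U \cap D_X$ and $d_2 \in V \cap D_Y$, giving
$(d_1, d_2) \in (U \times V) \cap (D_X \times D_Y)$.
Hence the product of two separable spaces is always separable. \qed
```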

 8005[Quote]

>>7913
>>7906
>>7788
I have a feeling we're starting to hit a hard ceiling for LLMs and have to look elsewhere for reasoning capabilities

 8076[Quote]

>>7913
I am not surprised, because proofs like this are part of my standard set of questions for LLMs, starting with relatively simple stuff. There is a simple proof that for n ∈ ℤ, n² even implies n is even. Most LLMs still insist on a direct proof (which leads to a circular argument in this case) when you have to use a proof by contraposition.

Some LLMs start to cope and argue, some say sorry and thank me for the hint only to get it wrong another time, and some… actually get it right =)
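The proof in question, for reference (the contrapositive makes it a one-liner, while a direct start from $n^2 = 2m$ has no easy way forward, which is where models go in circles):

```latex
\textbf{Claim.} For $n \in \mathbb{Z}$: $n^2$ even $\implies$ $n$ even.

\textbf{Proof (by contraposition).} Suppose $n$ is odd, i.e.\ $n = 2k + 1$
for some $k \in \mathbb{Z}$. Then
\[
  n^2 = (2k+1)^2 = 4k^2 + 4k + 1 = 2(2k^2 + 2k) + 1,
\]
so $n^2$ is odd. Hence if $n^2$ is even, $n$ must be even. \qed
```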

 8090[Quote]

>>8076
>Some LLMs start to cope and argue, some say sorry and thank me for giving the hint only to make it wrong another time and some… actually get it right =)
That happens a lot to me too. The best results I've had so far were when I gave Manus a giant textbook plus other supplementary material and asked it to solve the problems in a specific section.

 8171[Quote]

>>8090
Manus? Not open source. How does it work? Are they vectorizing that textbook input to use RAG? Who knows…

MCTS/R*/ensemble voting and the like do improve performance quite a bit as well, at the cost of high computational expense.

I would actually love to test LLaDA's capability in mathematical domains soonish. I can imagine that that type of NN can perform certain "planning" tasks like proofs a bit better… possibly
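Whatever Manus actually does internally, the "vectorize the textbook and retrieve" idea is easy to sketch. A toy retrieval step in pure Python, where a bag-of-words cosine stands in for a real embedding model (everything here is illustrative, not Manus's actual pipeline):

```python
import math
from collections import Counter

def embed(text):
    # toy bag-of-words vector; real systems use a neural embedding model
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(chunks, query, k=2):
    """Rank textbook chunks by similarity to the query, return the top k."""
    vecs = [(c, embed(c)) for c in chunks]
    q = embed(query)
    ranked = sorted(vecs, key=lambda cv: cosine(cv[1], q), reverse=True)
    return [c for c, _ in ranked[:k]]
```

A real setup would chunk the textbook, embed the chunks with a neural encoder, and stuff the top-k hits into the model's context ahead of the question.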

 8322[Quote]

>>5508
what realistically stops them from distilling o3 and o4

 8472[Quote]

Test 1 2

 8884[Quote]

>>8322
Does OpenAI even expose the <think> tags now?

 8911[Quote]

File: 1745195071367w.png (14.86 KB, 883x902)

soytan card status?

 9013[Quote]

>>8911
Any troon bluesky bio would work I guess

 9068[Quote]

File: ?=333nXt5%.png (31.84 KB, 465x419)

>>4858 (OP)
It seems like nothinghappens is happening over Easter and the interregnum between DeepSeek R1 and the soon-to-be-released Qwen 3, so I am recapping my personal all-time favorite models:

Historical
> GPT-J, OPT
Old but gold. Waiting for the next token to be shown, sentences to be pieced together part by part… The curiosity of having semi-coherent discussions with my graphics card is something I will never forget.

> Pygmalion

Makes me pretty nostalgic; the novelty of it all back then was something to remember.

> LLAMA 1

Really started the whole finetuning scene. Alpaca made conversations much more coherent.

> LLAMA 2

Quite a lot better than Llama 1 but most importantly, spawned a wide variety of finetunes including all time favorites like Mythomax.

> Mistral v0.1

Solid base model with a lot of finetunes like OpenHermes2 or Neuralchat.

> Mixtral

First time open source ever came close to proprietary performance. A great model for its time. The team behind it remains relevant, albeit not top tier.
Also, for me, it was the first local model to be actually useful for some coding assistance.

> LLAMA 3

Lame base model but the upgraded versions were better. Spawned a number of okayish finetunes or something.

> Mistral Nemo

Good model for RP, retarded for other purposes. I use it to drive Skyrim NPCs in the form of Nemo uncensored.

> Gemma 2

Decent model for coding and general knowledge, but superseded by Gemma 3 and Qwen 2.5 x R1 / QwQ

> Qwen 2.5 R1 Distilled / QwQ

My current daily drivers for coding assistance. The sweetspot between size and performance.

> Deepseek R1

The big one, the king, nothing else to add currently. It really shines in every category, but all "intelligent" systems have their limits and so do current frontier models.
I mostly use it to discuss more complex engineering processes where I really need the breadth and depth of knowledge in that large network.

> LLAMA 4

Its future shines as bright as 4cucks.

Some chuds also appreciated "Frankenmerges", although I have no personal experience with them.
Along the way we also got many more or less important upgrades to samplers, optimization, training procedures, data curation, GUIs, …
and a whole bunch of inference frameworks (llama.cpp, transformers with plenty of upgrades, ktransformers, vllm, TensorRT, …)!
Qwen 3, it's your turn now

 9269[Quote]

>>9068
I think deepseek-v3-0324 is better than R1 for anything non-technical.

And yeah, the staying power of Nemo is unreal. It will always be a gem, even if its days are likely numbered with the approach of whatever succeeds QwQ/Gemma 3.

 9510[Quote]

I'm new to LLMs, and someone told me about huggingface to get GUFF models from, which seem to work on a base level. But whenever I ask any model about the harms caused by letting niggers and jews live, it refuses. Are there better places to get models from that aren't touched by Mossad?

 9513[Quote]

>>9510
Use an uncensored model and a good system prompt

 9514[Quote]


 9516[Quote]

>>9510
https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard

Take a look at the UGI (Uncensored General Intelligence) Score and choose.
Note that base models (pure text completion) are most commonly completely uncensored. The censorship is applied during instruct finetuning.
 9719[Quote]

What's the latest meta on GUIs? I'm personally using mostly the llama.cpp CLI, but I would love to have something more immersive. I used to use text-generation-webui, but the dev is retarded to a degree, and SillyTavern is written in JavaScript, which I have no experience with

 9910[Quote]

llama.cpp has a web UI, but for RP, or if you're using an API, SillyTavern is the best

 10016[Quote]

File: ClipboardImage.png 📥︎ (1.09 MB, 2446x1890) ImgOps

bitnet seems good for RAG

 10026[Quote]

>>10016
Call me when it's done as a model with an actually useful parameter count.
Until they attempt BitNet at around 12B+ parameters, these are really just little research toys.

 10142[Quote]

What's a good LLM for 24gb VRAM? Something smart and good at following the character.

 10598[Quote]

Qwen never ever

 10606[Quote]

>>10026
qwen promised bitnet models

 10620[Quote]

>>10598
Did they actually promise it this month or did we just speculate that it would be this month?

 10762[Quote]

>>10142
Try QwQ or Gemma 3 27B

 10869[Quote]

>>10620
Commits were being made as early as March that, in the past, would have indicated a launch <2 weeks away. The current narrative is that it's taking so long because they're releasing a large number of different model sizes.

Honestly I'd rather see a gigantic QwQ.

 11094[Quote]

I loaded GLM and a few other recent models, trying to continue the ERP with the fetish of my choice. Rerolled 10-20 times for each model, and each time I was fucking disgusted with the output. Most LLMs were basically saying the same thing, just rewording it slightly. Then I loaded a hentai game that isn't the fetish of my choice but slightly adjacent, made an LLM translate it, and I came buckets. This hobby is so fucking depressing. And the censorship is the work of the devil. I refuse to believe current models would be unable to generalize fucking ERP. They get intentionally gimped to be worthless, while still spitting out some disgusting simulacrum of what smut should be that conditions you to stop thinking of AI as an alternative to biological whores. I FUCKING HATE THIS CLOWN WORLD

 11255[Quote]

Support for multimodal input + output model Janus has been merged into transformers
https://github.com/huggingface/transformers/releases/tag/v4.51.3-Janus-preview

 11310[Quote]

>>10762
Thanks.

 11326[Quote]

The latest Deepseek v3 has ruined me for any other model.

 11332[Quote]

I've been using LLMs since December 2022. I am increasingly concerned that LLMs are not AI, but just glorified calculators.

 11406[Quote]

>>11332
>I've been using LLMs since December 2022. I am increasingly concerned that LLMs are not AI, but just glorified calculators.

Calculators don't really talk to me or give me useful suggestions for console commands/API calls

>>11326
Idk man, I have been using R1 because it's an all-rounder; is V3 really that much better? Don't want another massive model on my drive. Not like inference would be much faster (okay, there are no think tags for V3)

 11413[Quote]

>>11406
>Calculators don't really talk to me or give me useful suggestions for console commands/API calls
That's why I said glorified calculators.

 11416[Quote]

>>11413
Guess you could argue that they are (probabilistic, if the seed is chosen at random) linear-bounded Turing machines.
One of the main issues remains their inability to learn in real time. At best, an LLM has in-context learning
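In-context learning, concretely: the "learning" is nothing but examples prepended to the prompt, and the weights never change. A trivial sketch (the Input/Output template is just an illustrative convention, not any particular model's format):

```python
def build_few_shot_prompt(examples, query):
    """Build a few-shot prompt: the pattern lives only in the prompt, not in weights."""
    lines = [f"Input: {x}\nOutput: {y}" for x, y in examples]
    lines.append(f"Input: {query}\nOutput:")  # model completes from here
    return "\n\n".join(lines)
```

The resulting string would be fed to a local completion endpoint; drop the examples and the pattern is gone, which is exactly the limitation being pointed at.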


