/tech/ - /lmg/ - a general dedicated to the discussion and development of local language models.SoyGenesis Edition Previous Threads: Too Cucked 4 mentioning. ►News>(04/16) Microsoft releases Bitnet B1.58 https://hf.co/microsoft/bitnet-b1.58-2B-4T>(0

Email
Subject
Comment
File
Embed
Voice
Poll
Password	(For file deletion.)

File: gg.jpg 📥︎ (987.4 KB, 1080x1080) ImgOps

/lmg/ - Local Models General Chud 04/16/25 (Wed) 17:25:19 №4858 [Quote]

/lmg/ - a general dedicated to the discussion and development of local language models.

SoyGenesis Edition

Previous Threads: Too Cucked 4 mentioning.

►News
>(04/16) Microsoft releases Bitnet B1.58 https://hf.co/microsoft/bitnet-b1.58-2B-4T
>(04/14) GLM-4-0414 and GLM-Z1 released: https://hf.co/collections/THUDM/glm-4-0414-67f3cbcb34dd9d252707cb2e
>(04/14) Nemotron-H hybrid models released: https://hf.co/collections/nvidia/nemotron-h-67fd3d7ca332cdf1eb5a24bb
>(04/10) Ultra long context Llama-3.1-8B: https://hf.co/collections/nvidia/ultralong-67c773cfe53a9a518841fbbe

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Chud 04/16/25 (Wed) 17:27:53 №4861 [Quote]

'risu on the 'log

Chud 04/16/25 (Wed) 17:40:03 №4873 [Quote]

Fuck off.

Chud 04/16/25 (Wed) 17:42:38 №4875 [Quote]

nu-oai model are raisin

still waiting for china to make agi

Chud 04/16/25 (Wed) 17:57:03 №4893 [Quote]

>>4858 (OP)
fuck off with your tranime

Chud 04/16/25 (Wed) 17:58:07 №4894 [Quote]

>>4875
>agi
not happening

Chud 04/16/25 (Wed) 18:06:28 №4902 [Quote]

>>4893
never forget where you came from.

Chud 04/16/25 (Wed) 18:09:52 №4903 [Quote]

File: ohh_noooooo_its_armageddon.jpg 📥︎ (65.96 KB, 632x500) ImgOps

>>4893
Missed opportunity

Chud 04/16/25 (Wed) 18:14:08 №4908 [Quote]

we're so back!!!

Chud 04/16/25 (Wed) 18:32:07 №4924 [Quote]

home is back

Chud 04/16/25 (Wed) 19:48:17 №4970 [Quote]

>>4858 (OP)
WARNING: PEDO GENERAL

Chud 04/17/25 (Thu) 00:50:53 №5151 [Quote]

>>4858 (OP)
Holy raisin, didn't expect /lmg/ to come back before 4chan did.
Now I can get back to my regular schedule.

Qwen3 when?

Chud 04/17/25 (Thu) 01:07:11 №5156 [Quote]

File: 2019-11-06_13-04-11_Pentax….jpg 📥︎ (13 MB, 6000x4000) ImgOps

>>4858 (OP)
Thanks for making the general.

>>5151
As long as they fare well against the new mini OpenAI models, probably as soon as possible.

Chud 04/17/25 (Thu) 04:54:45 №5297 [Quote]

>>4858 (OP)
>Microsoft releases Bitnet B1.58
Use case?

Chud 04/17/25 (Thu) 08:00:58 №5380 [Quote]

Good to see this thread made it over here. I don't visit often but what's the current meta for LLMs?

Any way to distribute a model across GPUs with various vram capacities? I've got two Tesla M40s, one is 24gb and one 12gb.

Chud 04/17/25 (Thu) 08:08:40 №5383 [Quote]

>>5297
There isn't really one at present, bitnet is still decidedly in the research and proof-of-concept stage.

>>5380
Really depends on usecase and what type of model you're aiming to run - People are generally using llama.cpp or koboldcpp as their backends, but I'm using tabbyAPI because it runs on ExLlama which is way faster for things that live entirely in GPU memory
>Any way to distribute a model across GPUs with various vram capacities?
Pretty much all current backends have some form of multi-GPU support, but if I recall correctly Tesla M40's are too old an architecture to support tensor parallelism or NVLINK, so they have some noticeable slowdown and overhead when being pooled.

Chud 04/17/25 (Thu) 08:12:11 №5386 [Quote]

>>4858 (OP)
kept the THREAD ALIVE ARYAN BABY

Chud 04/17/25 (Thu) 08:13:32 №5387 [Quote]

>>4858 (OP)
here, samefagging doe.

do you know about a vidya called MyRobot? is able to use LLM like the nvidia GPU and other stuff also support AI voice generation.

Chud 04/17/25 (Thu) 08:55:31 №5416 [Quote]

>>5387
>MyRobot
It seems pretty cool but Skyrim (VR) kind of already have the whole framework for interacting with LLM NPCs setup
> VR support
> Massive pool of mods
> Voice Generation tools available
> Integration of world environment (including the game's scripting system) with an LLM: Mandella or CHIM

Chud 04/17/25 (Thu) 10:11:49 №5436 [Quote]

>>5416
The game does something like that too but in a different spin, the robots can be personalized with many combinations and the main part is being able to have stable conversations without nonsensical stuff.
Also it allows using already existing AI like deepseek which their logical model is very advanced.

Chud 04/17/25 (Thu) 10:48:22 №5447 [Quote]

>>4858 (OP)
Sams o3 and o4 mini are amazing. Local lost.

Chud 04/17/25 (Thu) 10:55:21 №5451 [Quote]

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

NIGGER SLAVE GENERAL

raisin EATING PAJEET GENERAL

Chud 04/17/25 (Thu) 10:59:42 №5454 [Quote]

>>5451
sir please be wholesome bloody bastard dalit

Chud 04/17/25 (Thu) 11:11:24 №5459 [Quote]

File: 1744645336685p.gif 📥︎ (6.64 MB, 720x928) ImgOps

You gooners made it here geg
Remember you are guests do not raisin up my 'site

Chud 04/17/25 (Thu) 11:13:12 №5461 [Quote]

File: Average Localcucks.png 📥︎ (673.89 KB, 1792x1024) ImgOps

new 'toss

Chud 04/17/25 (Thu) 11:16:19 №5462 [Quote]

>>5461
I don't cry like this and my ram isn't rgb

Chud 04/17/25 (Thu) 11:22:05 №5465 [Quote]

File: 0948372635121.png 📥︎ (200.4 KB, 2241x1030) ImgOps

>b-but R1
It's over

Chud 04/17/25 (Thu) 12:26:52 №5508 [Quote]

>>5465
r2 in 2 more weeks, trust the plan

Chud 04/17/25 (Thu) 12:44:02 №5521 [Quote]

https://x.com/9to5mac/status/1912658286383411360
METAKEKS GEEEG

Chud 04/17/25 (Thu) 13:56:49 №5575 [Quote]

>>4858 (OP)
stfu AI pajeet

Chud 04/17/25 (Thu) 14:04:51 №5583 [Quote]

>>5465
DeepSikh was always coal.

Chud 04/17/25 (Thu) 15:52:09 №5656 [Quote]

>>>5416 (You)
>The game does something like that too but in a different spin, the robots can be personalized with many combinations and the main part is being able to have stable conversations without nonsensical stuff.
>Also it allows using already existing AI like deepseek which their logical model is very advanced.

I would expect from any game which is serious about AI integration (and I am of course not referring to generic neural network tricks like superresolution) the option to call upon any local LLM

>>5465
It's pretty tied with R1 on my internal benchmark. as far as I am aware OAI uses Q# internally, if the results are currently tied, R1 + R* would beat o4 high, no reason for me to switch ;P

Chud 04/17/25 (Thu) 17:01:31 №5721 [Quote]

File: 1707955277078.gif 📥︎ (2.24 MB, 227x255) ImgOps

OpenAi won btw

Chud 04/17/25 (Thu) 17:14:22 №5730 [Quote]

>>4894
China will manage. US AI companies are ngmi, because they know what they are doing isn't the way to ever achieve AGI. Its one big lie to keep the money rolling it.
Even with the GPU handicap they dealt to China, they will still win. Because China doesn't give a fuck.

Chud 04/17/25 (Thu) 19:33:18 №5854 [Quote]

File: land_of_le_free.png 📥︎ (447.22 KB, 665x711) ImgOps

We must ban Deepseek saaaars
> … Three people familiar with the discussions told The New York Times that the Trump administration is considering penalties to block DeepSeek’s access to U.S. technology and possibly banning access to its services for Americans.

Always download your favorite models, you never know when they might just try to take them away from the public domain

Chud 04/17/25 (Thu) 20:17:22 №5880 [Quote]

>>5854
>The US starts building a great firewall around China from the Outside.
Kek, it's amazing just how butthurt deepseek has made saltman and his fellow kikes in the us gov.

Chud 04/17/25 (Thu) 21:21:17 №5926 [Quote]

File: peterson.png 📥︎ (71.57 KB, 255x207) ImgOps

>>5854
>blocking open source tech because you cant get your cut or soemthing
ameriniggers always trying to slow down progress because of their selfishness

Chud 04/17/25 (Thu) 21:47:56 №5948 [Quote]

File: 1711743149875387.jpg 📥︎ (244.35 KB, 1024x1024) ImgOps

/lmg/ shall never die

Chud 04/17/25 (Thu) 21:50:51 №5949 [Quote]

>>5854
>>5880
>>5926
Much ado about nothing. The US lacks the legal and technical framework to prevent determined end users from using Deepseek. The best they might be able to do is enforce non-use on corpos.

Chud 04/17/25 (Thu) 22:12:06 №5966 [Quote]

>>5465
>mememarks

Chud 04/17/25 (Thu) 22:54:25 №5990 [Quote]

File: 1744928766174p.png 📥︎ (420.79 KB, 3007x1021) ImgOps

Interesting…

Chud 04/18/25 (Fri) 00:15:07 №6027 [Quote]

>>5949
I'm not concerned about them actually stopping me from using Deepseek - I'm not even american.
But the precedent it sets is concerning, especially since huggingface is based in the US and that's basically where local LLM/ML lives in its entirety - and them getting all chinese stuff or whatever upsets corpos nuked would suck hardcore and set us all back.

Chud 04/18/25 (Fri) 00:26:13 №6045 [Quote]

Seriously though, why is this thread here instead of 8kun? We're moments away from getting spammed by jaks and nikado asses on here.

Chud 04/18/25 (Fri) 00:44:15 №6052 [Quote]

>>6045
Last time I went to that place it was deader than my boner after trying to ERP with Gemma

Chud 04/18/25 (Fri) 01:05:46 №6058 [Quote]

>>6052
Currently it has about the same posts per hour as this site's /pol/ only 0% of its posts are spammed jaks and cobs with continually indented quotes.
/sp/ appears to have moved a bunch of their generals to 8kun's /comfy/, and a few others are setting up in /random/ and /pol/ because they're more active boards.
I think roughly half the userbase is 4chan refugees right now, and the other half are just quietly ignoring us and sitting in /qresearch/

Chud 04/18/25 (Fri) 08:54:16 №6248 [Quote]

bump

Chud 04/18/25 (Fri) 08:54:48 №6249 [Quote]

qwen3 when?

Chud 04/18/25 (Fri) 09:05:18 №6252 [Quote]

>>6249
Integration of Qwen3 into vllm is already being worked on, should release soon
https://github.com/vllm-project/vllm/pull/15289

>>5949
There are absolute scumlords like Josh Hawley who want you to go to jail for 20 years for downloading Deepseek R1
https://www.fox29.com/news/deepseek-ban-senate-bill

So >some< are trying to make it legal to hunt down… free people

Chud 04/18/25 (Fri) 09:10:53 №6253 [Quote]

>>6249
2 more weeks.
The real question is whether it'll come out before 4chan is back.

>>6252
>implementation of the Qwen3 and Qwen3MoE model.
Oh neat, so we're getting dense and MoE. both GPUmaxxers and Rammaxxers will have something to chew on.

Chud 04/18/25 (Fri) 09:53:46 №6268 [Quote]

>>6253
>2 more weeks.
>The real question is whether it'll come out before 4chan is back.
4cuck is finished unless they use vichan

Chud 04/18/25 (Fri) 14:31:58 №6423 [Quote]

>>6253
> Oh neat, so we're getting dense and MoE. both GPUmaxxers and Rammaxxers will have something to chew on.

Rammaxxer here, I hope they give you the option between multiple sizes 7B/32B/700B something like that but I expect there will only be distills from the biggest model

Chud 04/18/25 (Fri) 15:51:21 №6486 [Quote]

>>6423
One of the Qwen devs mentioned the reason this particular release was taking so long was the wide variety of model sizes, and wondering if they should ditch them for the next releases, so you may be in luck here, unlike with llama4's dograisin which would have been a problem even if the models didn't suck.

Chud 04/19/25 (Sat) 00:04:59 №6866 [Quote]

File: 574846.jpeg 📥︎ (157.64 KB, 2048x1131) ImgOps

local 32b o3 mini when? Surely Zuck has something planned. Maybe Altman or Elon will open source it.

Chud 04/19/25 (Sat) 02:12:34 №6923 [Quote]

>>4858 (OP)
this place won't beat 4chan, it's raisintier.

Chud 04/19/25 (Sat) 03:14:23 №6944 [Quote]

~~>>6807~~
Let's fucking gooooo

Chud 04/19/25 (Sat) 07:20:10 №7069 [Quote]

>>6866
Altcuck promised to release an open source model, still waiting on that.
I bet he was busy sucking off the Trump administration's dick, he really want the competition dead

Chud 04/19/25 (Sat) 09:59:54 №7131 [Quote]

>>5151
this
/lmg/ and webm with sound. this is new /g/ now.

Chud 04/19/25 (Sat) 11:22:35 №7151 [Quote]

File: 72a1cfd442258965373386b7f6….jpg 📥︎ (11.87 KB, 255x191) ImgOps

~~>>7132~~
>Click on link, first image i see, tranny socks… pass

Chud 04/19/25 (Sat) 13:56:28 №7201 [Quote]

what's a raisin?

Chud 04/19/25 (Sat) 14:06:31 №7212 [Quote]

File: Screenshot 2025-04-20 at 0….png 📥︎ (2.8 MB, 3814x1930) ImgOps

>>7151
>first image i see,
Legitimately how? I had to look for that post for like 2 straight minutes because it's from 8 YEARS AGO.

It's so far down I can't even see it in the catalog a 4k resolution, how the fuck was that the first thing you saw rather than lmg or the ai art thread which are at the top of the board?

>>7201
It's the gay sharty wordfilter for SHlT

Chud 04/19/25 (Sat) 15:03:40 №7246 [Quote]

>>7212
>how
change your list order

Chud 04/19/25 (Sat) 16:24:55 №7309 [Quote]

~~>>7132~~
promote your tranny board somewhere else pedo

Chud 04/19/25 (Sat) 17:05:49 №7325 [Quote]

~~>>7132~~
>8kun pedo shithole

Chud 04/19/25 (Sat) 19:37:38 №7400 [Quote]

File: Qwen2.5 Omni.png 📥︎ (706.39 KB, 2180x1214) ImgOps

Given that threads here last much longer, some news updates:

- TensorRT-LLM: LLAMA 4, Phi‑4‑MM are now supported
- Transformers: Adds support for Qwen2.5 Omni (a model with speech, text, image and video support)
- Deepseek about to release their internal inference engine for Deepseek R1 and Deepseek V3

Chud 04/19/25 (Sat) 20:14:57 №7418 [Quote]

>>7246
To friggin what? If I list by bump, creation, or reply count it's still at the bottom, because again, it's a dead thread from 8 years ago.
>>7309
>>7325
You're thinking of 8chan.moe, samefag. 8kun banned loli which was the impetus for creating .moe, and why .moe is full of degen boards like /abdl/ and raisin.

But by all means stay here where board culture is spamming gifs of obese men's assholes - stay in your element.

Chud 04/19/25 (Sat) 22:18:00 №7491 [Quote]

>>7400
>qwen
coal

Chud 04/19/25 (Sat) 23:06:58 №7508 [Quote]

>>5465
Holy Raisin, look at that coding jump. This has got to be benchmaxxed, right? It can't be THAT good.

Chud 04/20/25 (Sun) 00:05:40 №7525 [Quote]

>>6249
At this rate, we might get R2 first

>>6252
>deepseek becomes the new 'p

Chud 04/20/25 (Sun) 06:31:12 №7671 [Quote]

>>7491
Gem for coding assistance

Chud 04/20/25 (Sun) 09:59:21 №7752 [Quote]

Where is Petra, wasn't that her homeboard?

Chud 04/20/25 (Sun) 10:07:29 №7755 [Quote]

>There's some schizo talking to himself about someone nobody's ever heard of in two different /lmg/ threads on completely different sites
I don't remember us having a resident schizo other than the blacked miku guy, the fuck is this?

Chud 04/20/25 (Sun) 11:08:16 №7783 [Quote]

>>7755
This is 4th LMG thread.

Chud 04/20/25 (Sun) 11:12:26 №7788 [Quote]

>>5465
<unironically believing those fake benchmarks
the people who create these problem banks don't know any math and bloat it with highschool olympiad trash. That's why a random problem from a Springer Graduate Text math book makes even o3 raisin its pants. Try it out yourself on arena llm

Chud 04/20/25 (Sun) 13:43:23 №7906 [Quote]

>>>5465
><unironically believing those fake benchmarks
>the people who create these problem banks don't know any math and bloat it with highschool olympiad trash. That's why a random problem from a Springer Graduate Text math book makes even o3 raisin its pants. Try it out yourself on arena llm

This is pretty consistent with my own observations.
Even asking any LLM to transform some non trivial Nondeterministic Finite Automata (one that isn't already deterministic ofc!) into an Deterministic Finite Automata or vice versa or making a regex out of it and giving some examples for accepted words is beyond the capability of current AI, including OpenAIs latest slop - tested it

Chud 04/20/25 (Sun) 13:49:02 №7913 [Quote]

>>7906
Have you ever had them try to prove something? I just tried o3 and o4 mini, and they both insisted that the product of two separable topological spaces need not be separable, which is retarded. Only after I proved to them that the product is always separable did it stop insisting that, just telling them it's wrong didn't help.

Chud 04/20/25 (Sun) 15:41:48 №8005 [Quote]

>>7913
>>7906
>>7788
I have a feeling we're starting to hit a hard ceiling for LLMs and have to look elsewhere for reasoning capabilities

Chud 04/20/25 (Sun) 16:46:15 №8076 [Quote]

>>7913
I am not surprised because yeah proofs like this are part of my standard set of questions for LLMs. Starting with relatively simple stuff. there is a simple proof to show that for n e Z, n^2 is even implies n is even. Most LLMs still insist on a direct proof (which leads to a circular arguments in this case) while you got to use a proof by contraposition .

Some LLMs start to cope and argue, some say sorry and thank me for giving the hint only to make it wrong another time and some… actually get it right =)

Chud 04/20/25 (Sun) 16:54:18 №8090 [Quote]

>>7913
>Some LLMs start to cope and argue, some say sorry and thank me for giving the hint only to make it wrong another time and some… actually get it right =)
that happens a lot to me too. The best results I've had so far were when I gave manus a giant textbook and other supplementary material and ask it to solve the problems of a specific section

Chud 04/20/25 (Sun) 17:41:51 №8171 [Quote]

>>8090
Manus? Not open source. How does it work, are they vectorizing that Textbook input to use RAG? Who knows…

MCTS/R*/Ensemble voting and such do improve performance quite a but as well at the cost of high computational cost.

I would actually love to test LLADAs capability in mathematical domains soonish, I can imagine that that type of NN can perform certain "planning" tasks like proofs a bit better… possibly

Chud 04/20/25 (Sun) 19:04:00 №8322 [Quote]

>>5508
what realistically stops them from distilling o3 and o4

Chud 04/20/25 (Sun) 20:57:49 №8472 [Quote]

Test 1 2

Chud 04/21/25 (Mon) 07:44:56 №8884 [Quote]

>>8322
Does OpenAI even expose the <think> tags now?

Chud 04/21/25 (Mon) 08:28:37 №8911 [Quote]

File: 1745195071367w.png 📥︎ (14.86 KB, 883x902) ImgOps

soytan card status?

Chud 04/21/25 (Mon) 12:59:33 №9013 [Quote]

>>8911
Any troon bluesky bio would work I guess

Chud 04/21/25 (Mon) 14:49:51 №9068 [Quote]

File: ?=333nXt5%.png 📥︎ (31.84 KB, 465x419) ImgOps

>>4858 (OP)
>>4858 (OP)
It seems like nothinghappens is happening over easter and the interregnum between Deepseek R1 and the soon to be released Qwen 3, so I am recapping my personal all time favorite models:

Historical
> GPT-J, OPT
Old but gold. Waiting for the next token to be shown, sentences
to be pieced together part by part… The curiosity of having semi coherent discussions with my graphics card is something I will never forget.

> Pygmalion
Makes me pretty nostalgic, the novelity of it all back then was something to remember.

> LLAMA 1
Really started the whole finetuning scene. Alpaca made conversations much more coherent.

> LLAMA 2
Quite a lot better than Llama 1 but most importantly, spawned a wide variety of finetunes including all time favorites like Mythomax.

> Mistral v0.1
Solid base model with a lot of finetunes like OpenHermes2 or Neuralchat.

> Mixtral
First time Open Source ever came close to proprietary performance. Great model for its time. The team behind it continues to remain relevant, albeit not in the top tier.
Also, for me, it was the first local model to be actually useful for some coding assistance.

> LLAMA 3
Lame base model but the upgraded versions were better. Spawned a number of okayish finetunes or something.

> Mistral Nemo
Good model for RP, retarded for other purposes. I use it to drive Skyrim NPCs in the form of Nemo uncensored.

> Gemma 2
Decent model for coding and general knowledge but superseeded by Gemma 3 and Qwen 2.5 x R1 / QwQ

> Qwen 2.5 R1 Distilled / QwQ
My current daily drivers for coding assistance. The sweetspot between size and performance.

> Deepseek R1
The big one, the king, nothing else to add currently. It's really shines in every category but all "intelligent" systems have their limits and so do current frontier models.
I mostly use it to discuss more complex engineering processes where I really need the breath and depth of knowledge in that large network.

> LLAMA 4
Its future shines as bright as 4cucks.

Some chuds also appreciated "Frankenmerges" although I have no personal experience with them.
Along the way we had many more or less important upgrades regarding samplers, optimization, training procedures, data curation, GUIs, …
and a whole bunch of inference frameworks (llama.cpp, transformers got plenty of upgrades, ktransformers, vllm, TensorRT,…) along the way!
Qwen 3, it's your turn now

Chud 04/21/25 (Mon) 20:47:02 №9269 [Quote]

>>9068
I think deepseek-v3-0324 is better than R1 for anything non-technical.

And yeah, the staying power of Nemo in unreal. It will always be a gem, even if its days are likely numbered with the approach of whatever succeeds QWQ/gemma3.

Chud 04/22/25 (Tue) 07:05:21 №9510 [Quote]

I'm new to LLMs, and someone told me about huggingface to get GUFF models from, which seem to work on a base level. But whenever I ask any model about the harms caused by letting niggers and jews live, it refuses. Are there better places to get models from that aren't touched by Mossad?

Chud 04/22/25 (Tue) 07:18:03 №9513 [Quote]

>>9510
Use a uncensored model and good system prompt

Chud 04/22/25 (Tue) 07:19:08 №9514 [Quote]

>>9510
https://huggingface.co/Sao10K/L3-8B-Stheno-v3.2

Chud 04/22/25 (Tue) 07:21:10 №9516 [Quote]

>>9510
https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard

Take a look at the UGI (Uncensored General Intelligence) Score and choose.
Note that base models (pure text completion) are most commonly completely uncensored. The censorship is applied when instruct finetuning

Chud 04/22/25 (Tue) 14:01:14 №9719 [Quote]

Whats the latest meta on GUIs? I am personally using mostly the Llama CLI but I would love to have something more immersive. Used to use WebUI Text but the dev is retarded, to a degree and SillyTavern is written in javascript which I have no experience with

Chud 04/22/25 (Tue) 19:24:01 №9910 [Quote]

llamacpp has web ui but for rp or if your using an api siliytavren is the best

Chud 04/22/25 (Tue) 23:44:04 №10016 [Quote]

File: ClipboardImage.png 📥︎ (1.09 MB, 2446x1890) ImgOps

bitnet seems good for RAG

Chud 04/23/25 (Wed) 00:35:16 №10026 [Quote]

>>10016
Call me when it's done at a model with an actually useful parameter count.
Until they attempt bitnet with a model around 12B+, they're really just little research toys.

Chud 04/23/25 (Wed) 09:03:27 №10142 [Quote]

What's a good LLM for 24gb VRAM? Something smart and good at following the character.

Chud 04/24/25 (Thu) 08:02:22 №10598 [Quote]

Qwen never ever

Chud 04/24/25 (Thu) 08:50:14 №10606 [Quote]

>>10026
qwen promised bitnet models

Chud 04/24/25 (Thu) 09:52:18 №10620 [Quote]

>>10598
Did they actually promise it this month or did we just speculate that it would be this month?

Chud 04/24/25 (Thu) 13:53:34 №10762 [Quote]

>>10142
Try QwQ or Gemma 3 27B