• 0 Posts
  • 4 Comments
Joined 2 months ago
Cake day: November 30th, 2024

  • pcalau12i@lemmygrad.mltoOpen Source@lemmy.mlProton's biased article on Deepseek
    1 upvote · 1 downvote · edited 23 minutes ago

    There is no “fundamentally” here; you are referring to some abstraction that doesn’t exist. The models are modified during the fine-tuning process, and the process trains them to adopt DeepSeek R1’s reasoning technique. You are acting like there is some “essence” underlying the model which is the same between the original Qwen and this model. There isn’t. It is a hybrid and its own thing. There is no such thing as “base capability”; the model is not two separate pieces that can be judged independently. You can only evaluate the model as a whole. Your comment is bizarre to respond to because you are referring to non-existent abstractions and not actually speaking of anything concretely real.

    The model is neither Qwen nor DeepSeek R1; it is DeepSeek R1 Qwen Distill, as the name says. It would be like saying it’s false advertising to call a mule a hybrid of a donkey and a horse because its “base capabilities” are a donkey’s, so it has nothing to do with horses and is really just a donkey at the end of the day. The statement is so bizarre I do not even know how to address it. It is a hybrid: a distinct third thing. The model’s capabilities can only be judged as it exists, and its capabilities differ from both Qwen and the original DeepSeek R1, as actually scored by various metrics.

    Speaking of its “base capabilities” is a meaningless floating abstraction which cannot be empirically measured and doesn’t refer to anything concretely real. It only has its real concrete capabilities, not some hypothetical imagined capabilities.

    You accuse them of “marketing” even though it is literally free. You are not making any coherent sense: you insist that a hybrid model which is objectively different, and which objectively scores and performs differently, should be given the exact same name, for reasons you cannot seem to articulate. It clearly needs a different name, and since it was created by using the DeepSeek R1 model’s distillation process to fine-tune it, it makes sense to call it DeepSeek R1 Qwen Distill. Yet for some reason you insist this is lying and misrepresentation, that it has literally nothing to do with DeepSeek R1, and that it should just be called Qwen, as if it were literally the same model, despite the fact that its training weights are different (you can do a “diff” on the two model files if you don’t believe me!) and it performs differently on the same metrics.
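
    The “diff” point is easy to demonstrate: two checkpoints whose weights differ in even one byte produce different file hashes. A minimal sketch (using small dummy stand-in files here; in practice you would point the paths at the two actual downloaded weight files):

```python
import hashlib
import os
import tempfile

def file_sha256(path):
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Dummy stand-ins for the two weight files; replace with the real
# base-model and distill checkpoint paths to compare them yourself.
with tempfile.TemporaryDirectory() as d:
    base = os.path.join(d, "qwen.bin")
    distill = os.path.join(d, "r1-distill-qwen.bin")
    with open(base, "wb") as f:
        f.write(b"\x00\x01\x02\x03" * 1024)  # dummy "base" weights
    with open(distill, "wb") as f:
        f.write(b"\x00\x01\x02\x04" * 1024)  # dummy "fine-tuned" weights
    print(file_sha256(base) == file_sha256(distill))  # False: the contents differ
```

    If the fine-tuning changed anything at all, the hashes cannot match, so this is a one-line way to settle “it’s literally the same model” empirically.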

    There is simply no rational reason to intentionally mislabel the model as just being Qwen with no relevance to DeepSeek R1. It is clear to me that you and your friends here have some sort of alternative agenda that makes you not want to label it correctly, though I am entirely uncertain what it could possibly be. The current name is perfectly fine, and pretending it is just a Qwen model (or Llama, for the other distilled versions) is straight-up misinformation; anyone who downloads the models and runs them will immediately see that they perform differently.


  • pcalau12i@lemmygrad.mltoOpen Source@lemmy.mlProton's biased article on Deepseek
    5 upvotes · 2 downvotes · edited 21 minutes ago

    The 1.5B/7B/8B/13B/32B/70B models are all officially DeepSeek R1 models, that is what DeepSeek themselves refer to those models as. It is DeepSeek themselves who produced those models and released them to the public and gave them their names. And their names are correct, it is just factually false to say they are not DeepSeek R1 models. They are.

    The “R1” in the name means “reasoning version one” because it does not just spit out an answer but reasons through it with an internal monologue. For example, here is a simple query I asked DeepSeek R1 13B:

    Me: can all the planets in the solar system fit between the earth and the moon?

    DeepSeek: Yes, all eight planets could theoretically be lined up along the line connecting Earth and the Moon without overlapping. The combined length of their diameters (approximately 379,011 km) is slightly less than the average Earth-Moon distance (about 384,400 km), allowing them to fit if placed consecutively with no required spacing.
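
    The arithmetic checks out, give or take the exact diameter figures used. A quick sketch with commonly cited equatorial diameters (Earth excluded, since the other planets are what gets fit between Earth and the Moon; the model's 379,011 km total presumably comes from slightly different source values):

```python
# Commonly cited equatorial diameters in km (values vary slightly by source).
# Earth is excluded: the puzzle fits the *other* planets between it and the Moon.
diameters_km = {
    "Mercury": 4879, "Venus": 12104, "Mars": 6779, "Jupiter": 139820,
    "Saturn": 116460, "Uranus": 50724, "Neptune": 49244,
}
avg_earth_moon_km = 384400  # average centre-to-centre Earth-Moon distance

total = sum(diameters_km.values())
print(total, total < avg_earth_moon_km)  # 380010 True
```

    So with these figures the planets squeeze in with only a few thousand kilometres to spare, which matches the model's conclusion.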

    However, on top of its answer, I can expand an option to see the internal monologue it went through before generating the answer. I’ve linked the internal monologue here because it’s too long to paste.
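
    In the raw output of these models, the monologue is delimited from the final answer (the R1-style models wrap it in <think> tags), so splitting the two apart is straightforward; a minimal sketch, assuming that tag format:

```python
import re

def split_reasoning(raw: str):
    """Separate the <think>...</think> monologue (as emitted by R1-style
    models) from the final answer; returns (reasoning, answer)."""
    m = re.search(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
    if m is None:
        return "", raw.strip()  # no monologue present
    reasoning = m.group(1).strip()
    answer = raw[m.end():].strip()
    return reasoning, answer

sample = "<think>The diameters sum to roughly 380,000 km...</think>Yes, they fit."
thoughts, answer = split_reasoning(sample)
print(answer)  # Yes, they fit.
```

    This is what the “expand an option” UI is doing under the hood: hiding the reasoning span by default and showing only the answer.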

    What makes these consumer-oriented models different is that, rather than being trained on raw data, they are trained on synthetic data from pre-existing models. That’s what the “Qwen” or “Llama” part of the name means. The 7B model is trained on synthetic data produced by Qwen, so it is effectively a compressed version of Qwen. However, neither Qwen nor Llama can “reason”; they do not have an internal monologue.

    This is why it is just incorrect to claim that something like DeepSeek R1 7B Qwen Distill has no relevance to DeepSeek R1 and is just a Qwen model. If it’s supposedly a Qwen model, why can it do something that Qwen cannot do but only DeepSeek R1 can? Because, again, it is a DeepSeek R1 model: the R1 reasoning is added during the distillation process as part of its training. They basically use synthetic data generated by DeepSeek R1 to fine-tune it, readjusting its parameters so it adopts a similar reasoning style. It is objectively a new model, because it performs better on reasoning tasks than a plain Qwen model. It cannot be considered solely a Qwen model nor solely an R1 model, because its parameters contain information from both.
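
    As a toy illustration of the distillation idea (not DeepSeek’s actual training code, just the general mechanism): the student’s parameters are nudged by gradient descent so that its output distribution over next tokens moves toward the teacher’s.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl(p, q):
    """KL(p || q): how far the student distribution q is from the teacher p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical next-token logits over a tiny 3-token vocabulary.
teacher_logits = [2.0, 0.5, -1.0]  # stands in for the teacher (R1) output
student_logits = [0.0, 0.0, 0.0]   # stands in for the untuned base model

p_teacher = softmax(teacher_logits)
for _ in range(5000):
    p_student = softmax(student_logits)
    # The gradient of KL(p_teacher || p_student) w.r.t. the student's
    # logits is simply (p_student - p_teacher).
    student_logits = [s - 0.5 * (ps - pt)
                      for s, ps, pt in zip(student_logits, p_student, p_teacher)]

print(kl(p_teacher, softmax(student_logits)) < 1e-3)  # True: student mimics teacher
```

    Real distillation does this over billions of tokens and updates the network weights rather than raw logits, but the effect is the same: the student’s behaviour is pulled toward the teacher’s, which is why the distill inherits the reasoning style.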


  • As I said, they will likely come to the home in the form of cloud computing, which is how advanced AI comes to the home. You can run some AI models at home, but they’re nowhere near as advanced as cloud-based services and so not as useful. I’m not sure why, if we ever have AGI, it would need to be run at home. It doesn’t need to be. It would be nice if it could be run entirely at home, but that’s no necessity, just a convenience. Maybe your personal AGI robot who does all your chores for you only works when the WiFi is on. That would not prevent people from buying it; those Amazon Fire TVs are selling like hot cakes, and they only work when the WiFi is on. There also already exist AI products that require a constant internet connection.

    It is kind of similar with quantum computing. There actually do exist consumer-end home quantum computers, such as Triangulum, but it only does 3 qubits, so it’s more of a toy than a genuinely useful computer. For useful tasks, it will in all likelihood be cloud-based. The NMR technology Triangulum is based on is not known to be scalable, so the only other possibility for quantum computers making it into the home in a non-cloud-based fashion would be optical quantum computing. There could be a breakthrough there, you can’t rule it out, but I wouldn’t keep my fingers crossed. If quantum computers become useful for regular people in the next few decades, I would bet it will all be through cloud-based services.


  • If quantum computers ever actually make significant progress to the point that they’re useful (a big if), they could definitely have positive benefits for the little guy. It is unlikely you will have a quantum chip in your smartphone (although maybe it could happen if optical quantum chips ever make a significant breakthrough, but that’s even more unlikely), but you will still be able to access them cheaply over the cloud.

    I mean, IBM spends billions on its quantum computers and gives cloud access to anyone who wants to experiment with them, completely free. That’s how I first learned quantum computing: running algorithms on IBM’s cloud-based quantum computers. I’m sure that if demand picks up once they stop being experimental and actually become useful, they’ll probably start charging a fee, but the fact that access is free now makes me suspect the fee will not be very much.

    I think a comparison can be made with LLMs, such as OpenAI’s. It also takes billions to train those giant LLMs, and they can only be trained on extremely expensive computers, yet a single query costs less than a penny, and there are still free versions available. Cloud access will likely always be incredibly cheap; it’s a great way to bring super-expensive hardware to regular people.

    That’s likely what the future of quantum computing will be for regular people, quantum computing through cloud access. Even if you never run software that can benefit from it, you may get benefits indirectly, such as, if someone uses a quantum computer to help improve medicine and you later need that medicine.