markush_ an hour ago

Interesting choice from PyTorch to release yet another DSL: on the positive side it's one more point in the design space; on the other hand, it makes it even more difficult to choose the right technology among Triton, Gluon, CuTe, ThunderKittens and a few others.

bobajeff an hour ago

It's good to see more effort toward making things not device-specific, but I only see benchmarks for NVIDIA B200 and AMD MI350X. Also, what's the experience of using one of these Python DSLs like? Are the tools good enough to make code completion, jump to definition, setting breakpoints, watching variables, copying as an expression, etc. nice?

brap 3 hours ago

Asking as someone who is really out of the loop: how much of ML development these days touches these “lower level” parts of the stack? I’d expect that by now most of the work would be high level, and the infra would be mostly commoditized.

  • embedding-shape 2 hours ago

    > how much of ML development these days touches these “lower level” parts of the stack? I’d expect that by now most of the work would be high level

    Every time the high-level architectures of models change, there are new lower-level optimizations to be done. Even recent releases like GPT-OSS add new areas for improvement, like MXFP4, which requires the lower-level parts to be created and optimized.
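To make the MXFP4 point concrete: it is a block-scaled 4-bit format, where a group of values shares one scale and each value is rounded onto a small FP4 grid. A minimal NumPy sketch of that idea (the grid values and block handling here are a simplification for illustration, not the MXFP4 spec, and `quantize_block` is a hypothetical name):

```python
import numpy as np

# Positive FP4-like grid (E2M1-style magnitudes), mirrored for negatives.
FP4_GRID = np.array([0., .5, 1., 1.5, 2., 3., 4., 6.])
FP4_GRID = np.concatenate([-FP4_GRID[::-1], FP4_GRID])

def quantize_block(x):
    """Quantize one block of values with a single shared scale."""
    amax = np.abs(x).max()
    scale = amax / 6.0 if amax > 0 else 1.0   # 6.0 = largest grid value
    # Round each scaled value to the nearest grid entry.
    idx = np.abs(x[:, None] / scale - FP4_GRID).argmin(axis=1)
    return FP4_GRID[idx] * scale, scale
```

Doing this fast on a GPU (packing two 4-bit codes per byte, dequantizing inside the matmul inner loop) is exactly the kind of new low-level kernel work the comment is describing.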

  • brrrrrm 2 hours ago

    a recent wave of interest in bitwise-equivalent execution has had a lot of kernels at this level pumped out.

    new attention mechanisms also often need new kernels to run at any reasonable rate

    there's definitely a breed of frontend-only ML dev that dominates the space, but a lot of novel exploration needs new kernels

uoaei 22 minutes ago

Tangential question related to the example kernel: in GPU programming, is it idiomatic/standard to initialize the out array as zeros rather than empty? Are the performance savings negligible?
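For context on the question: a zero-filled allocation pays for one extra write pass over the buffer, which is pure overhead if the kernel overwrites every output element anyway. A minimal sketch of the trade-off, using NumPy as a stand-in for `torch.empty`/`torch.zeros` (the kernel body here is illustrative, not the example from the post):

```python
import numpy as np

def vector_add(a, b):
    # np.empty (like torch.empty) only reserves memory; np.zeros (like
    # torch.zeros) would additionally write zeros first. Since np.add
    # writes every element of `out`, the zero-fill would be wasted work.
    out = np.empty_like(a)
    np.add(a, b, out=out)
    return out

a = np.arange(4, dtype=np.float32)
b = np.ones(4, dtype=np.float32)
print(vector_add(a, b))  # [1. 2. 3. 4.]
```

Zero-initializing does matter when some outputs might never be written (masked lanes, partial tiles, atomic accumulation into the output), where reading uninitialized memory would give garbage.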

dachworker 3 hours ago

I'm super excited to give this one a spin. It seems like a neat idea: Triton, but simpler and with automatic autotuning. My head is spinning with options right now. I love how everyone was hyping up CUDA this and CUDA that a couple of years ago, and now CUDA is all but irrelevant. There are now so many different and opinionated takes on how you should write high-performance accelerator cluster code. I love it.

It's also kind of ironic that right now in 2025 we have all this diversity in tooling, but at the same time the ML architecture space has collapsed entirely and everyone is just using transformers.

  • pjmlp 44 minutes ago

    In what alternative reality is that the case?

  • embedding-shape 2 hours ago

    > CUDA that a couple of years ago, and now CUDA is all but irrelevant

    What? CUDA won't be irrelevant for years, even if all the competitors figure out the holy grail; the ecosystem doesn't suddenly migrate overnight. People learning CUDA today will continue to find jobs and opportunities across the sector for the near future without any worries.

    > but at the same time, the ML architecture space has collapsed entirely and everyone is just using transformers.

    That's also not true. The ML space is still growing, and there's lots happening outside of transformers, but it requires you to actually look and pay attention, not just browse the HN and r/localllama front pages.

    Overall, these do not seem to be sentiments coming from someone inside the ML space, but rather from an onlooker's perspective.

  • almostgotcaught 2 hours ago

    > and now CUDA is all but irrelevant.

    Lol this is so wrong it's cringe.

    > There's now so many different and opinionated takes on how you should write high performant accelerator cluster code. I love it.

    There are literally only 2: SIMT (i.e. the same as it always was) and tiles (i.e. Triton). That's it. Helion is just Triton with more auto-tuning (Triton already has auto-tuning).
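The SIMT-vs-tiles distinction above can be sketched in plain Python, with NumPy loops standing in for parallel hardware (this is an illustrative emulation under assumed function names, not CUDA or Triton code):

```python
import numpy as np

def simt_add(a, b):
    # SIMT (CUDA-style): each "thread" owns exactly one element and
    # computes its own index.
    out = np.empty_like(a)
    for i in range(a.size):            # one virtual thread per element
        out[i] = a[i] + b[i]
    return out

def tile_add(a, b, BLOCK=4):
    # Tile style (Triton-style): each "program" owns a whole block and
    # masks the ragged tail; the compiler decides how the block maps
    # onto threads, registers, and shared memory.
    out = np.empty_like(a)
    for start in range(0, a.size, BLOCK):
        idx = start + np.arange(BLOCK)
        m = idx < a.size               # mask out-of-bounds lanes
        out[idx[m]] = a[idx[m]] + b[idx[m]]
    return out
```

Everything else (Helion, Gluon, ThunderKittens, CuTe) is arguably a different point on the spectrum of how much of that block-to-hardware mapping is exposed to the programmer versus left to the compiler/autotuner.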

    • the__alchemist 27 minutes ago

      Even for non-ML things like chem simulations: CUDA (and cuFFT) are more pleasant to use than Vulkan Compute and vkFFT.

      • ozgrakkurt 3 minutes ago

        I just learned the graphics API of Vulkan; can't imagine anything being less pleasant than Vulkan