Well, you still need the right kind of hardware to run it, and my money has been on AMD to deliver the solutions for that. Nvidia has gone full-blown stupid on the shit they are selling, and AMD is all about cost and power efficiency, plus they saw the writing on the wall for Nvidia a long time ago and started down the FPGA path, which I think will ultimately be the sane choice for running this stuff.
-
-
Built a new PC for the first time in a decade last spring. Went full team red for the first time ever. Very happy with that choice so far.
-
I'm way behind on the hardware at this point.
Are you saying that AMD is moving toward an FPGA chip on GPU products?
While I see the appeal - that's going to dramatically increase cost to the end user.
-
I think the idea is that you can optimise it for the model, maybe? (Guessing, mostly.)
-
No.
GPU is good for graphics. That's what it is designed and built for. It just so happens to be good at dealing with programmatic neural network tasks because of parallelism.
FPGA is fully programmable to do whatever you want, and can be reprogrammed on the fly. Pretty perfect for reducing costs if you have a platform that does things like audio processing, then video processing, or deep learning, especially in cloud environments. Instead of spinning up a bunch of expensive single-purpose instances, you can just spin up one FPGA type and reprogram it on the fly to best perform the work at hand when the code starts up. Simple.
AMD bought Xilinx in 2019 when they were still a fledgling company because they realized the benefit of this. They are now selling mass amounts of these chips to data centers everywhere. It's also what the XDNA coprocessors on all the newer Ryzen chips are built on, so home users have access to an FPGA chip right there. It's efficient, cheaper to make than a GPU, and can perform better on lots of non-graphic tasks than GPUs without all the massive power and cooling needs. Nvidia has nothing on the roadmap to even compete, and they're about to find out what a stupid mistake that is.
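To make the "reprogram on the fly" part concrete, here's a rough Python sketch of one reconfigurable FPGA pool serving mixed workloads. The bitstream names and the `load_bitstream` helper are hypothetical placeholders, not any vendor's actual tooling:

```python
# Hypothetical sketch: one reconfigurable FPGA pool serving mixed workloads.
# BITSTREAMS and load_bitstream() are made-up placeholders for whatever the
# real vendor tooling does when it flashes a new accelerator image.

BITSTREAMS = {
    "audio": "audio_dsp.bin",       # assumed pre-built accelerator images
    "video": "video_codec.bin",
    "dl":    "conv_inference.bin",
}

class FpgaWorker:
    def __init__(self):
        self.loaded = None

    def load_bitstream(self, image):
        # Stand-in for the real reconfiguration step; on actual hardware this
        # costs seconds, which is why you batch similar jobs together.
        self.loaded = image

    def run(self, job):
        image = BITSTREAMS[job["kind"]]
        if self.loaded != image:
            self.load_bitstream(image)  # reprogram on the fly
        return f"ran {job['name']} on {image}"

# One instance type covers audio, video, and deep learning work, so the
# scheduler doesn't keep three separate single-purpose pools warm.
worker = FpgaWorker()
for job in [{"kind": "audio", "name": "podcast-denoise"},
            {"kind": "dl", "name": "resnet-batch-42"}]:
    print(worker.run(job))
```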
-
From a "compute" perspective (so not consumer graphics), power... doesn't really matter. There have been decades of research on the topic and it almost always boils down to "Run it at full bore for a shorter period of time" being better (outside of the kinds of corner cases that make for "top tier" thesis work).
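For what it's worth, the "full bore for a shorter period" result is easy to sanity-check with back-of-the-envelope numbers (all wattages and runtimes below are invented for illustration):

```python
# Toy race-to-idle comparison with invented numbers: energy = power x time.
# A part that burns more watts but finishes sooner can still win on total
# energy, because idle power is far below load power.

full_bore_watts = 400     # hypothetical accelerator at full clocks
full_bore_hours = 1.0     # job finishes in an hour
throttled_watts = 250     # same part, clocked down
throttled_hours = 2.2     # ...but now the job takes over twice as long
idle_watts      = 30      # both spend their leftover time idling

window_hours = max(full_bore_hours, throttled_hours)

def energy_wh(load_w, load_h):
    return load_w * load_h + idle_watts * (window_hours - load_h)

print("full bore:", energy_wh(full_bore_watts, full_bore_hours), "Wh")  # ~436 Wh
print("throttled:", energy_wh(throttled_watts, throttled_hours), "Wh")  # ~550 Wh
```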
AMD (and Intel) are very popular for their cost to performance ratios. Jensen is the big dog and he prices accordingly. But... while there is a lot of money in adapting models and middleware to AMD, the problem is still that not ALL models and middleware are ported. So it becomes a question of whether it is worth buying AMD when you'll still want/need nVidia for the latest and greatest. Which is why those orgs tend to be closer to an Azure or AWS where they are selling tiered hardware.
Which... is the same issue for FPGAs. There is a reason that EVERYBODY did their best to vilify and kill OpenCL, and it is not just because most code was thousands of lines of boilerplate and tens of lines of kernels. Which gets back to "Well. I can run this older model cheap but I still want nvidia for the new stuff...."
Which is why I think nvidia's stock dropping is likely more about traders gaming the system than anything else. Because the work to use older models more efficiently and cheaply has already been a thing. And for the new stuff? You still want all the chooch.
-
Your assessment is missing the simple fact that FPGA can do things a GPU cannot faster, and more cost efficiently though. Nvidia is the Ford F-150 of the data center world, sure. It's stupidly huge, ridiculously expensive, and generally not needed unless it's being used at full utilization all the time. That's like the only time it makes sense.
If you want to run your own models that have a specific purpose, say, for scientific work folding proteins, and you might have several custom extensible layers that do different things, Nvidia hardware and software doesn't even support this because of the nature of TensorRT. They JUST announced future support for such things, and it will take quite some time and some vendor lock-in for models to appropriately support it... OR
Just use FPGAs to do the same work faster now for most of those things. The GenAI bullshit bandwagon finally has a wheel off, and it's obvious people don't care about the OpenAI approach to having one model doing everything. Compute work on this is already transitioning to single purpose workloads, which AMD saw coming and is prepared for. Nvidia is still out there selling these F-150s to idiots who just want to piss away money.
-
Your assessment is missing the simple fact that FPGA can do things a GPU cannot faster
Yes, there are corner cases (many of which no longer exist because of software/compiler enhancements but...). But there is always the argument of "Okay. So we run at 40% efficiency but our GPU is 500% faster so..."
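Spelled out with invented numbers, that trade-off looks roughly like this:

```python
# Invented numbers: a GPU that only hits 40% of its peak on some workload,
# but whose peak is 5x the alternative, still ends up ~2x faster overall.
baseline_throughput = 1.0    # whatever the "efficient" part does, normalized
gpu_peak            = 5.0    # "500% faster" at full tilt
gpu_efficiency      = 0.40   # fraction of that peak actually achieved

gpu_effective = gpu_peak * gpu_efficiency
print(gpu_effective / baseline_throughput)  # prints 2.0: still twice the work per unit time
```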
Nvidia is the Ford F-150 of the data center world, sure. It’s stupidly huge, ridiculously expensive, and generally not needed unless it’s being used at full utilization all the time. That’s like the only time it makes sense.
You are thinking of this like a consumer where those thoughts are completely valid (just look at how often I pack my hatchback dangerously full on the way to and from Lowes...). But also... everyone should have that one friend with a pickup truck for when they need to move or take a load of stuff down to the dump or whatever. Owning a truck yourself is stupid but knowing someone who does...
Which gets to the idea of having a fleet of work vehicles versus a personal vehicle. There is a reason so many companies have pickup trucks (maybe not an f150 but something actually practical). Because, yeah, the gas consumption when you are just driving to the office is expensive. But when you don't have to drive back to headquarters to swap out vehicles when you realize you need to go buy some pipe and get all the fun tools? It pays off pretty fast and the question stops becoming "Are we wasting gas money?" and more "Why do we have a car that we just use for giving quotes on jobs once a month?"
Which gets back to the data center issue. The vast majority DO have a good range of cards either due to outright buying AMD/Intel or just having older generations of cards that are still in use. And, as a consumer, you can save a lot of money by using a cheaper node. But... they are going to still need the big chonky boys which means they are still going to be paying for Jensen's new jacket. At which point... how many of the older cards do they REALLY need to keep in service?
Which gets back down to "is it actually cost effective?" when you likely need the big chonky boys anyway.
-
Is XDNA actually an FPGA? My understanding was that it's an ASIC implementation of the Xilinx NPU IP. You can't arbitrarily modify it.
-
I'm thinking of this as someone who works in the space, and has for a long time.
An hour of time for a g4dn instance in AWS is 4x the cost of an FPGA that can do the same work faster in MOST cases. These aren't edge cases, they are MOST cases. Look at SageMaker, AML, GMT pricing for the real cost sinks here as well.
The raw power and cooling costs contribute to that pricing cost. At the end of the day, every company will choose to do it faster and cheaper, and nothing about Nvidia hardware fits into either of those categories unless you're talking about milliseconds of timing, which THEN only fits into a mold of OpenAI's definition.
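As a rough illustration of how that plays out per job (the hourly rates below are placeholders, not quoted AWS prices):

```python
# Illustrative only: placeholder hourly rates, not real AWS pricing.
# If the FPGA instance is ~1/4 the hourly rate AND finishes the job sooner,
# the per-job cost gap is even wider than the 4x rate gap suggests.

gpu_rate_per_hr  = 1.20   # hypothetical g4dn-class on-demand rate
fpga_rate_per_hr = 0.30   # hypothetical FPGA instance at ~1/4 the rate

gpu_job_hours  = 1.0
fpga_job_hours = 0.8      # assume the FPGA build also finishes faster

gpu_cost  = gpu_rate_per_hr * gpu_job_hours
fpga_cost = fpga_rate_per_hr * fpga_job_hours
print(f"GPU job:  ${gpu_cost:.2f}")              # $1.20
print(f"FPGA job: ${fpga_cost:.2f}")             # $0.24
print(f"ratio:    {gpu_cost / fpga_cost:.1f}x")  # 5.0x per job
```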
None of this bullshit will be a web-based service in a few years.
-
And you are basically a single consumer with a personal car relative to those data centers and cloud computing providers.
YOUR workload works well with an FPGA. Good for you, take advantage of that to the best degree you can.
People/companies who want to run newer models that haven't been optimized for/don't support FPGAs? You get back to the case of "Well... I can run a 25% cheaper node for twice as long?". That isn't to say that people shouldn't be running these numbers (most companies WOULD benefit from the cheaper nodes for 24/7 jobs and the like). But your use case is not everyone's use case.
And it, once again, boils down to: If people are going to require the latest and greatest nvidia, what incentive is there in spending significant amounts of money getting it to work on a five-year-old AMD? Which is where smaller businesses and researchers looking for a buyout come into play.
At the end of the day, every company will choose to do it faster and cheaper, and nothing about Nvidia hardware fits into either of those categories unless you’re talking about milliseconds of timing, which THEN only fits into a mold of OpenAI’s definition.
Faster is almost always cheaper. There have been decades of research into this and it almost always boils down to it being cheaper to just run at full speed (if you have the ability to) and then turn it off rather than run it longer but at a lower clock speed or with fewer transistors.
And nVidia wouldn't even let the word "cheaper" see the glory that is Jensen's latest jacket that costs more than my car does. But if you are somehow claiming that "faster" doesn't apply to that company then... you know nothing (... Jon Snow).
unless you’re talking about milliseconds of timing
So... it's not faster unless you are talking about time?
Also, milliseconds really DO matter when you are trying to make something responsive and already dealing with round trip times with a client. And they add up quite a bit when you are trying to lower your overall footprint so that you only need 4 nodes instead of 5.
They don't ALWAYS add up, depending on your use case. But for the data centers that are selling compute by time? Yeah, time matters.
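To put toy numbers on the "4 nodes instead of 5" point (the throughput model and figures are invented):

```python
import math

# Toy capacity math with invented numbers: shaving a few milliseconds off
# each request is the difference between 4 nodes and 5 at peak load.

peak_rps    = 800   # peak requests per second the fleet has to absorb
concurrency = 4     # requests a single node can work on at once

def nodes_needed(latency_ms):
    per_node_rps = concurrency / (latency_ms / 1000.0)
    return math.ceil(peak_rps / per_node_rps)

print(nodes_needed(25))  # 5 nodes at 25 ms per request
print(nodes_needed(20))  # 4 nodes once you claw back 5 ms
```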
So I will just repeat this: Your use case is not everyone's use case.
-
I remember Xilinx from way back in the 90s when I was taking my EE degree, so they were hardly a fledgling in 2019.
Not disputing your overall point, just that detail because it stood out for me since Xilinx is a name I remember well, mostly because it's unusual.
-
They were kind of pioneering the space, but about to collapse. AMD did good by scooping them up.
-
I mean... I can shut this down pretty simply. Nvidia makes GPUs that are currently used as a blunt-force tool, which is dumb, and now that the grift has been blown, OpenAI, Anthropic, Meta, and all the others trying to build a business centered around really simple tooling that is open source are about to be under so much scrutiny for the cost that everyone will figure out that there are cheaper ways to do this.
Pro AMD, con Nvidia. It's really simple.
-
Ah. Apologies for trying to have a technical conversation with you.
-
FPGAs have been a thing for ages.
If I remember it correctly (I learned this stuff 3 decades ago), they were basically an improvement on logic circuits without clocks (think stuff like NAND and XOR gates: digital signals just go in and the result comes out on the other side with no delay beyond that caused by analog elements such as parasitic inductances and capacitances, so without waiting for a clock transition).
The thing is, back then clocking of digital circuits really took off (because it's WAY simpler to have things done one stage at a time, with a clock synchronizing when results are read from one stage and sent to the next, since different gates have different delays and making sure results are only read after the slowest path is done is complicated), so all CPU and GPU architectures nowadays are based on having a clock, with clock transitions dictating things like when each step of processing a CPU/GPU instruction starts.
Circuits without clocks have the capability of being way faster than circuits with clocks if you can manage the problem of different digital elements having different delays in producing results. I think what we're seeing here is a revival of using circuits without clocks (or at least with blocks of logic done between clock transitions which are much longer and more complex than the processing of a single GPU instruction).
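A crude way to picture that difference, with invented gate delays:

```python
# Crude model of the clocked-vs-clockless point above, with invented delays.
# Clockless (combinational) logic is limited only by the actual path delays;
# a clocked design has to budget every stage for the slowest stage's delay.

stage_delays_ns = [0.8, 1.3, 0.6, 2.1, 0.9]  # per-stage propagation delays

# Clockless: the result ripples through as fast as the gates allow.
clockless_ns = sum(stage_delays_ns)                          # 5.7 ns

# Clocked: the period must cover the slowest stage (setup/hold margin ignored),
# and every stage pays that full period before its result moves on.
clock_period_ns    = max(stage_delays_ns)                    # 2.1 ns
clocked_latency_ns = clock_period_ns * len(stage_delays_ns)  # 10.5 ns

print(f"clockless: {clockless_ns:.1f} ns")   # per-result latency
print(f"clocked:   {clocked_latency_ns:.1f} ns")
# (Pipelining buys the clocked design throughput, but per-result latency is
#  what the point above is getting at.)
```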
-
Yes, but I'm not sure what your argument is here.
Least resistance to an outcome (in this case whatever you program it to do) is faster.
Applicable to waterfall flows, FPGA makes absolute sense for the neural networks as they operate now.
I'm confused on your argument against this and why GPU is better. The benchmarks are out in the world, go look them up.
-
I gave you an explanation, and how it is used and perceived. You can ignore that all day long, but the point is still valid.
-
What "point"?
Your "point" was "Well I don't need it" while ignoring that I was referring to the market as a whole. And then you went on some Team Red rant because apparently AMD is YOUR friend or whatever.
-
I'm not making an argument against it, just clarifying where it sits as a technology.
As I see it, it's like electric cars - a technology that was overtaken by something else in the early days of that domain even though it was the first to come out (the first cars were electric and the ICE engine was invented later), and which now has a chance to be successful again because many other things have changed in the meanwhile and we're a lot closer to the limits of the tech that did get widely adopted back then.
It actually makes a lot of sense to improve the speed of what programming can do by making it capable of also working outside the step-by-step instruction execution straitjacket that is the CPU/GPU clock.