Good point, although it's possible that, with the extreme price of GPUs, it costs more to train by buying hardware than it would to rent. For example, it might take two to three years before the GPUs are paid for by customers.
Linux reserved cost of a p3.16xlarge is $146,362.08 annually. On-demand cost is $214,444.80 annually.
I am pretty damn sure I could build an 8-GPU Intel Xeon E5-2686 v4 (Broadwell) server (that's the CPU Amazon uses, and it's $30 to $75 on eBay) for less than that and come out ahead on electricity even at full throttle. RTX 4090s are just under $2,000 each on eBay.
8 GPUs × $2,000 (RTX 4090) + $1,000 (for the rest of the computer) = $17,000
If it pulls 2 kW continuously at 15 cents per kWh, that's 2 kW × 24 h/day × 365 days × $0.15/kWh, or $2,628 a year.
In total the computer will cost $19,628 a year even if you throw it in the dumpster at the end of each calendar year of using it.
If you stack an internet cost of $200 a month on top, that's $2,400 a year, which raises your annual cost to $22,028.
This is still $124,334 cheaper per year than one AWS 8-GPU server if you fully depreciate your own hardware at the end of year 1 to $0.
I could hire an engineer in America to babysit it with the money left over.
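For anyone who wants to check the math, the whole comparison fits in a few lines of Python; every figure is taken from the numbers above, nothing new:

```python
# DIY vs. AWS, using the numbers from this thread.

aws_reserved_annual = 146_362.08   # p3.16xlarge, 1-yr reserved, Linux
aws_ondemand_annual = 214_444.80   # p3.16xlarge, on-demand (= $24.48/hr)

hardware    = 8 * 2_000 + 1_000    # eight 4090s + rest of the box = $17,000
electricity = 2 * 24 * 365 * 0.15  # 2 kW continuous at $0.15/kWh  = $2,628
internet    = 200 * 12             # $200/month                    = $2,400

# $22,028, with the hardware fully written off in year one
diy_annual = hardware + electricity + internet

print(f"DIY year-one total:   ${diy_annual:,.0f}")
print(f"Saved vs. reserved:   ${aws_reserved_annual - diy_annual:,.2f}")
print(f"Saved vs. on-demand:  ${aws_ondemand_annual - diy_annual:,.2f}")
```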
Are consumer-grade RTX 4090 cards going to be suitable for running full tilt 24/7 for a year? Those things are fine to stress on the latest game for a few hours at a time, but would probably develop defects from significant heat stress after just a few days at 100%.
This is inconsequential when you're playing Overwatch for a few hours a night and a frame drops now and again. If you're training an iteratively developed LLM, though, physical defects could propagate into huge deficiencies in the final model.
Yep absolutely, crypto miners have been doing it for years.
I still think it would be impractical at scale because they are so much hotter and more power-hungry than the datacenter cards, and you would be lucky to score one or two if you're on a wait list.
Except you can absolutely obtain 4090s today, while enterprise hardware is (was? I haven't looked at the data recently) the stuff with the long wait lists, which is the exact opposite of the scenario you mentioned.
I'm actually really surprised that you can still buy 4090s for under $2,000 (the cheapest I saw was $1,800 new, and I only took 30 seconds to look), but you can usually sell certain models for quite a bit more. For example, my used 4090 FE is currently worth more than I paid for it.
I've played with AI, and while admittedly I've not done anything super serious, I can tell you that both the 3090 and 4090 are more than capable of performing. Pair them with a power-efficient AMD CPU and you have something that can be (somewhat) competitive with enterprise hardware.
I've seen the pricing of "cloud" offerings and I've toyed with the idea of creating an "AI cloud," because I have access to really fast internet and super cheap electricity, but I haven't executed because I'm most certainly not a salesperson. I do, however, know enough about marketing to know that one should not compete on price, so there is that...
I don't think they'd become a fire hazard, but it is true that one would likely pick something else for this application.
Having said that, switching to something like the Tesla V100-SXM2-16GB wouldn't cost that much more.
TBH, I'm shocked at how many people treat Amazon as the first choice for this stuff. Much of it isn't even what most would consider a "production" workload. You are paying for a lot of enterprise-readiness that you don't need for training.
> TBH, I'm shocked at how many people treat Amazon as the first choice for this stuff
You can thank Amazon's legions of salespeople for that, particularly the end of year junket in Las Vegas where attendees are so pampered that about the only thing they won't do is suck your dick
Oh, yeah, they'll also yell at you on stage if you complain about their UI
Though this comparison is really only relevant for a couple of machines. Beyond that, at this cost, if you pay AWS list prices "at scale" you're doing something very wrong.
Don't get me wrong - I've frequently argued that AWS is price gouging and relying on people's lack of understanding of how the devops costs of running your own work out, but it doesn't take a huge budget before this calculation will look very different (still cheaper to own your own, though).
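As a rough sketch of where that break-even lands: the $150k engineer salary below is purely my placeholder, the per-machine numbers reuse the figures from upthread, and real AWS discounts at scale would shift this.

```python
# Break-even sketch: DIY fleet + one engineer vs. AWS reserved list price.
# The salary is an assumed placeholder; per-machine costs are from upthread.

aws_per_machine = 146_362.08   # 8-GPU p3.16xlarge, 1-yr reserved
diy_per_machine = 22_028       # year-one cost incl. power + internet
engineer_salary = 150_000      # assumption: one person babysits the fleet

for n in (1, 2, 4, 8):
    diy = n * diy_per_machine + engineer_salary
    aws = n * aws_per_machine
    print(f"{n} machine(s): DIY ${diy:>9,.0f}  vs.  AWS ${aws:>11,.0f}")
```

Even with staff costs folded in, DIY pulls ahead by the second machine at list prices.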
You can build an old Xeon-based machine, but it only has 40 PCIe lanes. For training on 8 GPUs, how do you push data fast enough? I'm using a 7000-series Epyc for this to get 128 lanes. Have you built this kind of machine? Do you see good speed with 40 lanes? Curious, because then I could use an old Tyan motherboard that comes in a full case with a good layout for multiple GPUs. With the Epyc build I have to use risers and a custom frame, which is painful.
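Here is the napkin math behind my worry, as a rough sketch; the per-lane figure is the PCIe Gen3 theoretical peak, and the even-split/no-PLX-switch assumption is mine:

```python
# Theoretical per-GPU PCIe bandwidth, one direction, assuming lanes are
# split evenly across 8 GPUs with no PLX switches. Both platforms treated
# as Gen3 (an Epyc 7002 "Rome" is Gen4, which would double its numbers).

GEN3_GBPS_PER_LANE = 0.985  # 8 GT/s with 128b/130b encoding

def slot_width(total_lanes: int, gpus: int) -> int:
    """Largest standard PCIe width (x1/x2/x4/x8/x16) each GPU can get."""
    per_gpu = total_lanes // gpus
    width = 1
    while width * 2 <= per_gpu and width < 16:
        width *= 2
    return width

for name, lanes in (("Xeon E5-2686 v4 (40 lanes)", 40),
                    ("Epyc 7000-series (128 lanes)", 128)):
    w = slot_width(lanes, 8)
    print(f"{name}: x{w} per GPU ≈ {w * GEN3_GBPS_PER_LANE:.1f} GB/s")
```

x4 vs. x16 per card is a 4x gap on host-to-device transfers, which is why the lane count worries me for keeping 8 GPUs fed.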