Building the heap: racking 30 petabytes of hard drives for pretraining

si.inc

138 points by nee1r 3 hours ago


drnick1 - 11 minutes ago

Everyone should give AWS the middle finger and start doing this. Beyond cost, it's a matter of sovereignty over one's computing and data.

trebligdivad - 23 minutes ago

The networking stuff seems....odd.

'Networking was a substantial cost and required experimentation. We did not use DHCP as most enterprise switches don’t support it and we wanted public IPs for the nodes for convenient and performant access from our servers. While this is an area where we would have saved time with a cloud solution, we had our networking up within days and kinks ironed out within ~3 weeks.'

Where does the switch choice come into whether you use DHCP? And why on earth would you want public IPs?
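
To be clear, "not using DHCP" just means doing the address bookkeeping yourself and baking static configs into each node, which has nothing to do with what the switch supports. A rough sketch of that bookkeeping in Python (the subnet, gateway, and hostnames below are made up):

    # Sketch: carve static addresses out of an assumed /26 instead of using DHCP.
    # The subnet is a documentation range, not a real allocation.
    import ipaddress

    subnet = ipaddress.ip_network("203.0.113.0/26")
    hosts = list(subnet.hosts())
    gateway = hosts[0]                                 # assume first usable IP is the router

    nodes = [f"storage{i:02d}" for i in range(1, 11)]  # hypothetical node names

    for node, addr in zip(nodes, hosts[1:]):
        # each line is what you'd translate into the node's netplan/ifcfg/whatever
        print(f"{node}: address={addr}/{subnet.prefixlen} gateway={gateway}")

With a few dozen nodes the win is that nothing on the network has to cooperate; the cost is that this table is now something you keep correct by hand.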

jonas21 - 2 hours ago

Nice writeup. All of the technical detail is great!

I'm curious about the process of getting colo space. Did you use a broker? Did you negotiate, and if so, how large was the difference in price between what you initially were quoted and what you ended up paying?

jimmytucson - an hour ago

Just wanted to say, thanks for doing this! Now the old rant...

I started my career when on-prem was the norm and remember so much trouble. When you have long-lived hardware, eventually, no matter how hard you try, you just start to treat it as a pet and state naturally accumulates. Then, as the hardware starts to be not good enough, you need to upgrade. There's an internal team that presents the "commodity" interface, so you have to pick out your new hardware from their list and get the cost approved (it's a lot harder to just spend a little more and get a little more). Then your projects are delayed by them racking the new hardware and you properly "un-petting" your pets so they can respawn on the new devices, etc.

Anyways, when cloud came along, I was like, yeah we're switching and never going back. Buuut, come to find out that's part of the master plan: it's a no-brainer good deal until you and everyone in your org/company/industry forgets HTF to rack their own hardware, and then it starts to go from no-brainer to brainer. And basically unless you start to pull back and rebuild that muscle, it will go from brainer to no-brainer bad deal. So thanks for building this muscle!

archmaster - 2 hours ago

Had the pleasure of helping rack drives! Nothing more fun than an insane amount of data :P

RagnarD - 2 hours ago

I love this story. This is true hacking and startup cost awareness.

pronoiac - an hour ago

I wonder if they'll go with "toploaders" - like Backblaze Storage Pods - later. They have better density and faster setup, since you don't have to screw in every drive.

They got used drives. I wonder if they did any testing? I've gotten used drives that were DOA, which showed up in tests - SMART tests, short and long, then writing pseudorandom data to verify capacity.
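
The write/verify part of that looks roughly like this in Python - device path, chunk size, and seed are placeholders, and it's destructive, so it goes after `smartctl -t short` / `-t long` and before any real data lands on the drive:

    # Sketch: destructive write/verify pass to confirm a used drive's full capacity.
    # DEVICE is a placeholder; running this wipes the drive.
    import os, random

    DEVICE = "/dev/sdX"          # placeholder, change before running
    CHUNK = 4 * 1024 * 1024      # 4 MiB per I/O
    SEED = 0xC0FFEE              # reproducible pseudorandom pattern

    def device_size(dev):
        with open(dev, "rb") as f:
            return f.seek(0, os.SEEK_END)

    def pattern(total):
        rng = random.Random(SEED)
        for offset in range(0, total, CHUNK):
            yield offset, rng.randbytes(min(CHUNK, total - offset))

    def write_pass(dev, total):
        with open(dev, "r+b") as f:          # r+b: write in place, no truncate
            for _, chunk in pattern(total):
                f.write(chunk)

    def verify_pass(dev, total):
        with open(dev, "rb") as f:
            for offset, want in pattern(total):
                if f.read(len(want)) != want:
                    return offset            # first mismatching byte offset
        return None

    if __name__ == "__main__":
        size = device_size(DEVICE)
        write_pass(DEVICE, size)
        bad = verify_pass(DEVICE, size)
        print("capacity verified" if bad is None else f"mismatch near byte {bad}")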

g413n - 3 hours ago

No mention of disk failure rates? Curious how it's holding up after a few months.

alchemist1e9 - 8 minutes ago

Would have been much easier and probably cheaper to buy gear from 45drives.

tarasglek - 23 minutes ago

I'm still confused about what their software stack is. They don't use Ceph but bought NetApp shelves, so are they just using NFS?

boulos - 2 hours ago

It's quite cheap to just store data at rest, but I'm pretty confused by the training and networking setup here. It sounds from other comments like you're not going to put the GPUs in the same location, so you'll be doing all training over X 100 Gbps lines between sites? Aren't you going to end up totally bottlenecked during pretraining?
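
Back-of-the-envelope on that, with assumed numbers since the post doesn't give them:

    # Rough feasibility check: one full pass over the corpus across inter-site links.
    # All numbers below are assumptions, not from the post.
    CORPUS_PB = 30
    LINKS = 4                 # assumed count of 100 Gbps lines
    UTIL = 0.7                # assumed utilization after protocol overhead

    corpus_bits = CORPUS_PB * 1e15 * 8
    throughput_bps = LINKS * 100e9 * UTIL

    days = corpus_bits / throughput_bps / 86400
    print(f"{days:.1f} days per full pass over the data")   # ~9.9 with these numbers

Whether that's a bottleneck mostly comes down to how many passes over the 30 PB the pretraining run actually makes.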

fragmede - 24 minutes ago

My question isn't why do it yourself - a quick back-of-the-envelope shows AWS being much more expensive. My question is why San Francisco? It's one of the most expensive real estate markets in the US (#2 residential, #1 commercial), and electricity is expensive - $0.71/kWh peak residential rate! A jaunt down 280 to San Jose is going to be cheaper, at the expense of having to take that drive to get hands-on. But I'm sure you can find someone capable of running a DC who lives in San Jose and needs a job, so the SF team doesn't have to commute down to the South Bay. Obviously there's something to be said for having the rack in the office - I know of at least two (three, now) in San Francisco - but it seems like a weird decision if you're already worrying about money to the point of not using AWS.
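
To put a rough number on the electricity piece (drive count, power draw, overhead, and rates below are all assumptions, and colo power isn't billed at residential rates anyway):

    # Very rough monthly power cost of the drive fleet at two assumed $/kWh rates.
    DRIVES = 2400            # assumed: ~30 PB of used 12-16 TB drives
    WATTS_PER_DRIVE = 8
    OVERHEAD = 2.0           # shelves, head nodes, networking, cooling
    HOURS_PER_MONTH = 730

    kwh_month = DRIVES * WATTS_PER_DRIVE * OVERHEAD / 1000 * HOURS_PER_MONTH
    for label, rate in [("SF-ish rate", 0.30), ("cheaper market", 0.15)]:
        print(f"{label}: ${kwh_month * rate:,.0f}/month")

With these assumptions the delta between the two rates is a few thousand dollars a month.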

OliverGuy - 22 minutes ago

Aren't those NetApp shelves pretty old at this point? I see a lot of people recommending against them even for homelab-type uses. You can get those 60-drive SuperMicro JBODs pretty cheap now, and they aren't that old - it would have been my choice.

Plus, the TCO is already way under the cloud equivalent, so you might as well spend a little more to get something much newer and more reliable.

nharada - 3 hours ago

So how do they get this data to the GPUs now...? Just run it over the public internet to the datacenter?

ClaireBookworm - 3 hours ago

great write up, really appreciate the explanations / showing the process

mschuster91 - 3 hours ago

Shows how crazy cheap on prem can be. tips hat

synack - 39 minutes ago

IPMI is great and all, but I still prefer serial ports and remote PDUs. Never met a BMC I could trust.

ttfvjktesd - 2 hours ago

The biggest part that's always missing from these comparisons is employee salaries. In the calculation they give $354k/year in total cost. But now add the cost of staff in SF to operate that thing.
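
Rough version of that adjustment, with an assumed fully-loaded salary and an assumed fraction of one person's time:

    # How much headcount moves the comparison. Salary and time share are assumptions;
    # only the $354k/yr figure comes from the post.
    HARDWARE_COST_PER_YEAR = 354_000
    FULLY_LOADED_SALARY = 250_000      # assumed SF infra engineer, fully loaded
    TIME_FRACTION = 0.25               # assumed share of one person's time

    total = HARDWARE_COST_PER_YEAR + FULLY_LOADED_SALARY * TIME_FRACTION
    print(f"${total:,.0f}/year all-in")   # $416,500 with these assumptions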

not--felix - 3 hours ago

But where do you get 90 million hours worth of video data?

miltonlost - 2 hours ago

And how much did the training data cost?

huxley_marvit - 2 hours ago

Damn, this is cool as hell. Any estimate of the maintenance cost in person-hours/month?

g413n - 3 hours ago

the doodles are great

OutOfHere - 2 hours ago

Is it correct that you have zero data redundancy? This may work for you if you're just hoarding videos from YouTube, but not for most people who require an assurance that their data is safe. Even for you, it may hurt proper benchmarking, reproducibility, and multi-iteration training if the parent source disappears.

leejaeho - 3 hours ago

how long do you think it'll be before you fill all of it and have to build another cluster LOL

miniman1337 - 3 hours ago

Used disks, no DR - not exactly a real shoot-out.

zparky - 2 hours ago

$125/disk and $12k/mo depreciation cost, which I assume means disk failures, so ~100 disks/mo or ~1,200/yr - that's half of their disks a year, which seems like a lot.
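
Running it both ways, since "depreciation" could also just mean amortizing the purchase rather than replacing failures (fleet size below is an assumption):

    # Two readings of "$12k/mo depreciation". Disk price is from the comment;
    # fleet size is an assumption (~30 PB of used drives).
    MONTHLY_DEPRECIATION = 12_000
    PRICE_PER_DISK = 125
    FLEET_SIZE = 2400

    # Reading 1: depreciation == failed disks being replaced
    disks_per_month = MONTHLY_DEPRECIATION / PRICE_PER_DISK
    print(f"{disks_per_month:.0f} disks/mo, {12 * disks_per_month / FLEET_SIZE:.0%} of the fleet per year")

    # Reading 2: depreciation == straight-line write-off of the purchase
    months = FLEET_SIZE * PRICE_PER_DISK / MONTHLY_DEPRECIATION
    print(f"or a {months:.0f}-month write-off of the whole fleet")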