Hacker News | dylan522p's comments

The article does answer the question, in detail. What you're reading is just the first half, which is freely available. The second half is on the same web page, for subscribers.


They use some of this too, btw. Also, wavelength-level routing happens with breakout cables from the ToR to compute.


OPC? It'd be great to talk about your area, because I bet most people don't know what you do, and maybe I can make it an entertaining read.


I don't deal with OPC; that's handled after my stage. I mainly do floorplanning, place and route, and static timing analysis, and I write tons of scripts in Tcl for our CAD flow. Here's a quick overview:

https://en.wikipedia.org/wiki/Physical_design_(electronics)
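The longest-path computation at the heart of static timing analysis can be sketched in a few lines. Everything below (the netlist, gate names, and delays) is invented purely for illustration; real STA runs inside commercial CAD tools, typically driven by Tcl scripts like the ones mentioned above.

```python
from functools import lru_cache

# Toy combinational netlist: gate -> (delay in ns, fanin gates).
# All names and delays here are made up for illustration.
NETLIST = {
    "in_a": (0.0, []),
    "in_b": (0.0, []),
    "and1": (0.3, ["in_a", "in_b"]),
    "inv1": (0.1, ["and1"]),
    "out":  (0.0, ["inv1", "and1"]),
}

@lru_cache(maxsize=None)
def arrival_time(gate: str) -> float:
    """Worst-case (longest-path) arrival time at a gate's output."""
    delay, fanins = NETLIST[gate]
    return delay + max((arrival_time(f) for f in fanins), default=0.0)

clock_period_ns = 1.0
critical = arrival_time("out")
print(f"critical path {critical:.2f} ns, slack {clock_period_ns - critical:.2f} ns")
# -> critical path 0.40 ns, slack 0.60 ns
```

The actual tools do this over millions of cells with separate rise/fall arcs, setup/hold checks, and process corners, but the core is the same longest-path traversal.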


No, TPUs are getting better architecturally.


Your math is completely wrong, dude.

It's 2000 ms per token, not for the whole query.

Hardware utilization rates and MFU are not the same thing; you forgot the latter.

You're also pretending it's perfectly parallelized on 1 GPU. I use 8x GPU box throughput.
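The MFU side of that distinction is just achieved FLOPs over peak FLOPs; a minimal sketch follows, where every number is an illustrative assumption of mine, not a figure from the article or this thread.

```python
# MFU (model FLOPs utilization) vs. hardware utilization, sketched.
# All numbers below are illustrative assumptions, not the article's figures.
peak_flops_per_gpu = 312e12      # A100 dense BF16 peak FLOP/s
params = 175e9                   # GPT-3-class model size
flops_per_token = 2 * params     # rough rule of thumb for a forward pass

tokens_per_sec_per_gpu = 200     # assumed batched generation throughput

mfu = tokens_per_sec_per_gpu * flops_per_token / peak_flops_per_gpu
print(f"MFU = {mfu:.1%}")
# -> MFU = 22.4%
# A GPU can report ~100% "utilization" (busy time) while sustaining only
# this fraction of peak FLOPs, which is why the two metrics differ.
```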


> This article is based on wild speculations of how much things cost

Huh? It literally uses real throughput figures.

> doesn’t account for ways to make things cheaper over time

It does, in the subscriber section, and the free section explicitly says so.

> We already have papers that suggest most big models are undertrained and smaller models can get the same accuracy.

Why assume a 2023 model is the same as the 2020 GPT-3 with 175B parameters?
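The "undertrained" point in the quote is presumably the Chinchilla-style scaling-law result. A back-of-envelope sketch, using GPT-3's published figures and the commonly cited ~20 tokens-per-parameter heuristic (an approximation, not an exact law):

```python
# Rough sketch of the "undertrained" argument (Chinchilla-style scaling).
gpt3_params = 175e9
gpt3_tokens = 300e9                      # GPT-3's reported training tokens

tokens_per_param = gpt3_tokens / gpt3_params
print(f"GPT-3: ~{tokens_per_param:.1f} tokens/param (Chinchilla-optimal is ~20)")

# Same training compute (C ~ 6 * N * D), rebalanced to ~20 tokens/param:
compute = 6 * gpt3_params * gpt3_tokens
optimal_params = (compute / (6 * 20)) ** 0.5
print(f"compute-optimal at the same budget: ~{optimal_params / 1e9:.0f}B params")
```

That is, at GPT-3's compute budget a much smaller model trained on far more tokens should match it, which is exactly why a 2023 model need not cost what 2020 GPT-3 did.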


- Google QPS is closer to 100k than 320k [1]

That number is wrong. I have a number from a Googler, not from livestats, which cannot have Google-internal data.

- Not every query has to run on LLM. Probably only 10% would benefit from it

Agree. I have something different coming up that looks into this more; 10% may be too low. I know I used 100%, which isn't right, and I explicitly say that.

- This means 10,000 queries per second, each needing 5 A100s to run, so 50,000 A100s are sufficient. Cost for that is $500MM, quadruple that to $2B with CPU/RAM/storage/network. That is peanuts for Google.

50k A100s plus networking and ramp cost way more than $2B, and that's before accounting for hardware utilization rates.
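For reference, the arithmetic inside the quoted estimate works out as follows; the ~$10k-per-A100 unit price is implied by the quote's own $500MM total, and the reply disputes these totals.

```python
# Reproducing the arithmetic of the quoted estimate (its numbers, not mine).
qps = 10_000                 # queries per second routed to the LLM
a100s_per_query = 5
a100s_needed = qps * a100s_per_query     # 50,000 A100s
gpu_cost = a100s_needed * 10_000         # implies ~$10k per A100
system_cost = 4 * gpu_cost               # CPU/RAM/storage/network markup
print(f"{a100s_needed:,} A100s, ${gpu_cost / 1e6:.0f}M in GPUs, "
      f"${system_cost / 1e9:.0f}B all-in")
# -> 50,000 A100s, $500M in GPUs, $2B all-in
```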

- Latency, not cost, is a bigger issue. This should be addressed soon by H100 and newer chips.

That's discussed in the subscriber section. It's both, but yes, latency is the bigger issue. H100 helps but doesn't solve it.


CUTLASS is actually faster in most cases where it supports the same operations.


I presumed they'd fold CUTLASS improvements into cuBLAS, so cuBLAS would be at least as fast. Maybe some compile-time optimizations?


Thanks! Yeah, the archive link is useless; it only captures the free part, and I think I am very generous with what I keep in the free section on the ad-free website.


TSMC 7nm doesn't use EUV and is fantastic.


It uses EUV.

> 7nm FinFET plus (N7+) technology entered full-scale production in 2019 and delivered customer 7nm products to market in high volume. N7+ technology is the first commercially available extreme ultraviolet EUV-enabled foundry manufacturing process

https://www.tsmc.com/english/dedicatedFoundry/technology/log...

It looks like there was a 7nm SRAM process in 2016 that only used DUV, but I'd be surprised if more than, say, 1% of "7nm" mentions over the last few years were intended to refer to the 2016 process rather than the 2019 process.


That is N7+, normal N7 was without EUV

According to Wikipedia (https://en.wikipedia.org/wiki/7_nm_process), mass production with N7+, which uses EUV, ramped up from 2018 to 2019. The Apple A12, as well as some Ryzen chips, used the DUV-based N7 process.

