Explicitly setting GOMAXPROCS is probably the cleanest way to limit CPU among the runtimes out there. For example, if you set requests = 1, limits = 1, and GOMAXPROCS=1, you will never run into the latency-increasing CFS CPU throttling: you would only be throttled if you used more than 1 CPU, and since you can't (modulo forks, of course), it won't happen. There is https://github.com/uber-go/automaxprocs to set this automatically, if you care.
You are right that by default, the logic that sets GOMAXPROCS is unaware of the limits you've set. That means GOMAXPROCS will be something much higher than your cpu limit, and an application that uses all available CPUs will use all of its quota early on in the cfs_period_us interval, and then sleep for the rest of it. This is bad for latency.
Setting GOMAXPROCS explicitly is the best practice in my experience. The runtime latches in a value for runtime.NumCPU() based on the population count of the cpumask at startup. That mask can change if Kubernetes schedules or de-schedules a "guaranteed" pod on your node while the kubelet is using the static CPU management policy, and it will vary from node to node if you have several types of machines. You don't want 100 replicas of your microservice all using different, randomly chosen values of GOMAXPROCS.