That's the part that surprised me as well; it doesn't seem like a field that should be eligible for anything other than an exact match. I am unable to conceive of a use case for pattern matching account IDs.
If for some reason you’re dealing with thousands of accounts that are architecturally indistinguishable, bucketing them by ID prefix isn’t a particularly wild thing to want to do.
AWS assigns these individually, and customers can’t influence the ID that they get. For access control purposes I see no valid use case for wildcards there.
Sharding on account ID might make sense if someone has a large number of them, but that would not necessitate wildcard matching.
But it could seem like a neat and obvious way to reduce policy size (which is limited) and make it arguably more readable, or at least make the intention clearer. (I might assume `2847373847261`, `37385857721`, `5847262671`, ... can be written as `*1` across our accounts, but I might be wrong, or I might forget to add the new one, or fail to automate adding it correctly.)
It sure could. If you're sharding by id and have some per-shard resources, they could definitely get permissions to only accounts 12345*. (I'm not saying it's a good idea, just that once you're in that situation, you would pattern match on partial IDs)
But account IDs are assigned by Amazon and there's no structure within the namespace that's useful to you. If you mean all accounts, you can wildcard with `*`, but there doesn't seem to be any legitimate case for "all account IDs beginning with a 1".
Presumably the permissions language is broadly defined and has something like
filter = property, operator, value
with few constraints on which operators can be used in which situations, to keep parsing the language simple (parsers being notoriously prone to vulnerabilities, after all). In retrospect perhaps that isn't a good trade-off, but it would be tricky to tighten things up now without breaking lots of existing users.
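As a minimal sketch of what that flexibility permits (the bucket name and account-ID prefix here are invented, written with Python/boto3): a `StringLike` condition happily takes a wildcard against `aws:PrincipalAccount`, even though only an exact match really makes sense for that key.

```python
import json
import boto3

BUCKET = "example-shared-data"  # hypothetical bucket name

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowAccountsByIdPrefix",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": f"arn:aws:s3:::{BUCKET}/*",
        # StringLike accepts '*' wildcards, so nothing in the grammar stops
        # you from matching on a partial account ID here.
        "Condition": {"StringLike": {"aws:PrincipalAccount": "123456*"}},
    }],
}

boto3.client("s3").put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))
```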
I think the GP is talking about granting access to a particular bucket to an unbounded number of customer AWS accounts — probably in requester-pays config. (Think: static data used by an Amazon Marketplace virtual appliance.)
They don't need to — in fact, you might be depending on them not following any particular pattern. Think: treating the Account IDs as pre-hashed keys, and then specifying prefix patterns as ways of sharding the hash keys onto a set of buckets, to evenly distribute access (and therefore traffic) by customer.
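As a toy sketch of that idea (shard count and bucket names are made up), the routing side is nothing more than a prefix lookup, and each shard's bucket policy would then pattern-match the corresponding partial account ID:

```python
NUM_SHARDS = 10
SHARD_BUCKETS = [f"shared-data-shard-{d}" for d in range(NUM_SHARDS)]  # invented names

def bucket_for_account(account_id: str) -> str:
    """Route a customer to a shard bucket by the first digit of their
    12-digit account ID, treating the AWS-assigned ID as a pre-hashed key."""
    return SHARD_BUCKETS[int(account_id[0])]

# Shard 6's bucket policy would then only allow accounts matching "6*".
print(bucket_for_account("676363687541"))  # -> shared-data-shard-6
```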
GET requests? Probably not. PUT/DELETE requests? I think so, yes. All updates to the bucket ultimately bottleneck at an update to the meta-version of the bucket-record-object in the object-storage-system's bucket-metadata store (itself probably something like DynamoDB / BigTable / etc.)
Given the way these IaaSes' distributed KV stores all manage writes (i.e. a cluster of transactor nodes across which per-key write-linearization responsibility for parts of the keyspace is sharded, such that each write is routed to the designated transactor node for its key's hash slot), a very large S3 user generating an extremely high level of metadata-update concurrency against a single bucket could very likely write-contend that bucket's metadata, i.e. end up with a "hot" bucket-metadata key; experience poor performance because of it; and solve it by sharding the bucket (swapping one too-hot metadata key for N somewhat-hot metadata keys).
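A toy model of that routing (node count and key names are invented): every update to one bucket's metadata key lands on the same transactor, whereas sharding the bucket spreads the same load across several.

```python
import hashlib

NUM_TRANSACTORS = 16  # invented cluster size

def transactor_for(key: str) -> int:
    """All writes to the same key serialize behind one transactor node,
    chosen by the key's hash slot."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % NUM_TRANSACTORS

hot_key = "bucket-meta/my-giant-bucket"                           # one too-hot key
shards = [f"bucket-meta/my-giant-bucket-{i}" for i in range(8)]   # N somewhat-hot keys

print(transactor_for(hot_key))                          # a single node takes every write
print(sorted({transactor_for(k) for k in shards}))      # usually several distinct nodes
```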
I want to give an intuition-building example here, of an IaaS feature that wouldn't exist / wouldn't be exposed to the user if not for object-storage buckets being metadata-write-contended at scale. I'm not very familiar with the AWS ecosystem, though, so I'm not sure what a good AWS example would be. What I do know is GCP, so here's a GCP example: Google Cloud Dataflow allows you to set a temporary workspace GCS bucket on a per-job basis (gcsTempLocation). And, IIRC, Google's Cloud Architects advise against having a bunch of active Dataflow jobs share the same gcsTempLocation, regardless of whether they use distinct key prefixes to namespace the temp files. Given that each job would be doing a lot of little serial updates to the temp bucket, and that Dataflow jobs can each be highly internally concurrent, you're already potentially putting out O(N^2) ~concurrent updates to that bucket. You really don't want to make it O(N^3).
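In Beam's Python SDK that advice boils down to something like the sketch below (project name and bucket naming scheme are made up); the point is just that each job gets its own temp bucket rather than all of them sharing one.

```python
from apache_beam.options.pipeline_options import PipelineOptions, GoogleCloudOptions

def options_for_job(job_name: str) -> PipelineOptions:
    opts = PipelineOptions()
    gcp = opts.view_as(GoogleCloudOptions)
    gcp.project = "my-project"   # hypothetical project
    gcp.job_name = job_name
    # One temp bucket per job (hypothetical naming scheme), instead of a
    # single shared gs://dataflow-tmp/ that every active job updates at once.
    gcp.temp_location = f"gs://dataflow-tmp-{job_name}/tmp"
    return opts
```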
If you're big enough to think you need it, you're also big enough to have people who can tell you it's a bad idea and there's a better tool for the job.
AWS support is not going to bend over backwards just to let you shoot yourself in the foot; it's more likely they'd grant an exception to one of the IAM quotas.
My general assumption is not that they’re random, but at least that they’re not correlated; in particular that Amazon is not in the habit of handing out, like, account IDs 676363687000 - 676363687999 to a single organization. Even if they did hand out a sequential batch of 1000 account IDs, it would be more likely to be 676363687541 - 676363688540 than a set with a single consistent prefix.
Odds are that an account wildcard match like 676363687* will just match a few hundred entirely random AWS accounts (a nine-digit prefix on a twelve-digit ID leaves 1,000 possible matches, some fraction of which will actually have been assigned).
> in particular that Amazon is not in the habit of handing out, like, account IDs 676363687000 - 676363687999 to a single organization
Honestly, it wouldn't surprise me that much if they were willing to accommodate this for sufficiently large accounts. It'd still be pretty sketchy to design your access control around, but it wouldn't be unrealistic.
I was once involved in creating two (linked) Amazon accounts at the "same" time, and ended up with account IDs whose first 4 digits are identical.
It's irrelevant whether they're "cryptographically" random; all that matters is that account IDs are not controlled by the user and therefore have no logical relation to any access-control policy the user may wish to implement.