That's the part that surprised me as well; it doesn't seem like a field that should be eligible for anything other than an exact match. I am unable to conceive of a use case for pattern matching account IDs.
If for some reason you’re dealing with thousands of accounts that are architecturally indistinguishable, bucketing them by ID prefix isn’t a particularly wild thing to want to do.
AWS assigns these individually, and customers can’t influence the ID that they get. For access control purposes I see no valid use case for wildcards there.
Sharding on account ID might make sense if someone has a large number of them, but that would not necessitate wildcard matching.
But it could seem like a neat and obvious way to reduce policy size (which is limited) and make it arguably more readable, or at least make the intention clearer. (I might assume `2847373847261`, `37385857721`, `5847262671`, ... can be written as `*1` across our accounts, but I might be wrong, or I might forget to add the new one, or fail to automate adding it correctly.)
It sure could. If you're sharding by id and have some per-shard resources, they could definitely get permissions to only accounts 12345*. (I'm not saying it's a good idea, just that once you're in that situation, you would pattern match on partial IDs)
But account IDs are assigned by Amazon and there's no structure within the namespace that's useful to you. If you mean all accounts, you can wildcard with `*`, but there doesn't seem to be any legitimate case for "all account IDs beginning with a 1".
Presumably the permissions language is broadly defined and has something like
filter = property, operator, value
with few constraints on which operators can be used in which situations, to keep parsing the language simple (parsers being notoriously prone to vulnerabilities, after all). In retrospect perhaps that isn't a good trade-off, but it would be tricky to tighten things up now without breaking lots of existing users.
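As a minimal sketch of what that flexibility permits (the bucket name and account-ID prefix here are invented, written with Python/boto3): a `StringLike` condition happily takes a wildcard against `aws:PrincipalAccount`, even though only an exact match really makes sense for that key.

```python
import json
import boto3

BUCKET = "example-shared-data"  # hypothetical bucket name

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "AllowAccountsByIdPrefix",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": f"arn:aws:s3:::{BUCKET}/*",
        # StringLike accepts '*' wildcards, so nothing in the grammar stops
        # you from matching on a partial account ID here.
        "Condition": {"StringLike": {"aws:PrincipalAccount": "123456*"}},
    }],
}

boto3.client("s3").put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))
```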
I think the GP is talking about granting access to a particular bucket to an unbounded number of customer AWS accounts — probably in requester-pays config. (Think: static data used by an Amazon Marketplace virtual appliance.)
They don't need to — in fact, you might be depending on them not following any particular pattern. Think: treating the Account IDs as pre-hashed keys, and then specifying prefix patterns as ways of sharding the hash keys onto a set of buckets, to evenly distribute access (and therefore traffic) by customer.
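As a toy sketch of that idea (shard count and bucket names are made up), the routing side is nothing more than a prefix lookup, and each shard's bucket policy would then pattern-match the corresponding partial account ID:

```python
NUM_SHARDS = 10
SHARD_BUCKETS = [f"shared-data-shard-{d}" for d in range(NUM_SHARDS)]  # invented names

def bucket_for_account(account_id: str) -> str:
    """Route a customer to a shard bucket by the first digit of their
    12-digit account ID, treating the AWS-assigned ID as a pre-hashed key."""
    return SHARD_BUCKETS[int(account_id[0])]

# Shard 6's bucket policy would then only allow accounts matching "6*".
print(bucket_for_account("676363687541"))  # -> shared-data-shard-6
```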
GET requests? Probably not. PUT/DELETE requests? I think so, yes. All updates to the bucket ultimately bottleneck at an update to the meta-version of the bucket-record-object in the object-storage-system's bucket-metadata store (itself probably something like DynamoDB / BigTable / etc.)
Given the way these IaaSes' distributed KV stores all manage writes (i.e. a cluster of transactor nodes across which per-key write-linearization responsibility for parts of the keyspace is sharded, such that each write is routed to the designated transactor node for its key's hash slot), a very large S3 user generating an extremely high level of metadata-update concurrency against a single bucket could very likely write-contend that bucket's metadata, i.e. end up with a "hot" bucket-metadata key; experience poor performance because of it; and solve it by sharding the bucket (swapping one too-hot metadata key for N somewhat-hot metadata keys).
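A toy model of that routing (node count and key names are invented): every update to one bucket's metadata key lands on the same transactor, whereas sharding the bucket spreads the same load across several.

```python
import hashlib

NUM_TRANSACTORS = 16  # invented cluster size

def transactor_for(key: str) -> int:
    """All writes to the same key serialize behind one transactor node,
    chosen by the key's hash slot."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % NUM_TRANSACTORS

hot_key = "bucket-meta/my-giant-bucket"                           # one too-hot key
shards = [f"bucket-meta/my-giant-bucket-{i}" for i in range(8)]   # N somewhat-hot keys

print(transactor_for(hot_key))                          # a single node takes every write
print(sorted({transactor_for(k) for k in shards}))      # usually several distinct nodes
```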
I want to give an intuition-building example here, of an IaaS feature that wouldn't exist / wouldn't be exposed to the user if not for object-storage buckets being metadata-write-contended at scale. I'm not very familiar with the AWS ecosystem, though, so I'm not sure what a good AWS example would be. What I do know is GCP, so here's a GCP example: Google Cloud Dataflow allows you to set a temporary workspace GCS bucket on a per-job basis (gcsTempLocation). And, IIRC, Google's Cloud Architects advise against having a bunch of active Dataflow jobs share the same gcsTempLocation, regardless of whether they use distinct key prefixes to namespace the temp files. Given that each job would be doing a lot of little serial updates to the temp bucket, and that Dataflow jobs can each be highly internally concurrent, you're already potentially putting out O(N^2) ~concurrent updates to that bucket. You really don't want to make it O(N^3).
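In Beam's Python SDK that advice boils down to something like the sketch below (project name and bucket naming scheme are made up); the point is just that each job gets its own temp bucket rather than all of them sharing one.

```python
from apache_beam.options.pipeline_options import PipelineOptions, GoogleCloudOptions

def options_for_job(job_name: str) -> PipelineOptions:
    opts = PipelineOptions()
    gcp = opts.view_as(GoogleCloudOptions)
    gcp.project = "my-project"   # hypothetical project
    gcp.job_name = job_name
    # One temp bucket per job (hypothetical naming scheme), instead of a
    # single shared gs://dataflow-tmp/ that every active job updates at once.
    gcp.temp_location = f"gs://dataflow-tmp-{job_name}/tmp"
    return opts
```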
If you're big enough to think you need it, you're also big enough to have people who can tell you it's a bad idea and there's a better tool for the job.
AWS support is not going to bend over backwards just to let you shoot yourself in the foot; it's more likely they'd grant an exception to one of the IAM quotas.
My general assumption is not that they’re random, but at least that they’re not correlated; in particular that Amazon is not in the habit of handing out, like, account IDs 676363687000 - 676363687999 to a single organization. Even if they did hand out a sequential batch of 1000 account IDs, it would be more likely to be 676363687541 - 676363688540 than a set with a single consistent prefix.
Odds are that an account wildcard match like 676363687* will just match a few hundred entirely random AWS accounts (a nine-digit prefix on a twelve-digit ID leaves 1,000 possible matches, some fraction of which will actually have been assigned).
> in particular that Amazon is not in the habit of handing out, like, account IDs 676363687000 - 676363687999 to a single organization
Honestly, it wouldn't surprise me that much if they were willing to accommodate this for sufficiently large accounts. It'd still be pretty sketchy to design your access control around, but it wouldn't be unrealistic.
I was once involved in creating two (linked) Amazon accounts at the "same" time, and ended up with account IDs whose first 4 digits are identical.
It's irrelevant whether they're "cryptographically" random; all that matters is that account IDs are not controlled by the user and therefore have no logical relation to any access-control policy the user may wish to implement.