Repo maintainer here. ...can someone explain how the repo keeps resurfacing? I h...

boyter · on Nov 16, 2018

Since you are here: thanks for making this. I recently used it to prove to a client that the API I delivered could take any content they cared to throw at it. They were especially impressed considering they were coming from a 35-year-old system that only allowed ASCII.

The BLNS allowed me to prove it and I hooked it into our integration and fuzz tests which managed to shake out a few bugs.
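For readers wondering what "hooking it into tests" might look like, here is a minimal sketch rather than the setup described above. It assumes the list is available as a JSON array of strings (the repo ships a `blns.json` file); `SAMPLE_STRINGS`, `store_and_fetch`, and `find_failures` are hypothetical names made up for illustration, and the tiny inline sample stands in for the real list.

```python
import json
from pathlib import Path

# Tiny inline sample standing in for the real list (illustrative only).
SAMPLE_STRINGS = [
    "",
    "undefined",
    "<script>alert(1)</script>",
    "田中さんにあげて下さい",
    "'; DROP TABLE users; --",
]


def load_strings(path=None):
    """Load strings from a JSON array file if a path is given, else use the sample."""
    if path is None:
        return list(SAMPLE_STRINGS)
    return json.loads(Path(path).read_text(encoding="utf-8"))


def store_and_fetch(value):
    """Hypothetical stand-in for the API under test: a JSON round trip."""
    payload = json.dumps({"content": value})
    return json.loads(payload)["content"]


def find_failures(handler, strings):
    """Return every string that the handler rejects or fails to return unchanged."""
    failures = []
    for s in strings:
        try:
            if handler(s) != s:
                failures.append(s)
        except Exception:
            failures.append(s)
    return failures


if __name__ == "__main__":
    # Pass a path such as "blns.json" to load_strings() to run the full list.
    print(find_failures(store_and_fetch, load_strings()))
```

In a real suite, `store_and_fetch` would be replaced by a call into the system under test, and any non-empty result from `find_failures` would fail the test.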

minimaxir · on Nov 16, 2018

You're very welcome! :D

awalton · on Nov 16, 2018

It was brought up during VMware's internal security conference called "MooseCon" earlier today during a talk on Unicode.

No idea if it's just coincidental resonance though.

minimaxir · on Nov 16, 2018

Yes, a few people from VMware made a PR (which I just merged).

sbr464 · on Nov 16, 2018

Tangentially related to the original project intent:

Is there a place where common things in the dev world like this are accumulated? For example, a list of all countries or a list of the US states, for use with an HTML dropdown. I know there are various repos on GitHub that maintain these types of lists, such as English stop words, profanity word lists, etc., but is there a service that accumulates these in a familiar, structured API?

majewsky · on Nov 16, 2018

Look at Wikipedia's lists of things. For your particular examples:

https://en.wikipedia.org/wiki/List_of_sovereign_states

https://en.wikipedia.org/wiki/U.S._state

Some of them are quite meta, such as https://en.wikipedia.org/wiki/List_of_lists_of_lists

For a more structured source, Wikidata aims to be that, but I cannot comment on its completeness.

JacobDotVI · on Nov 16, 2018

Oftentimes, instead of the list of US states, you actually want the list of US states and territories:

https://en.wikipedia.org/wiki/List_of_states_and_territories...

For example, when the intent is "list of places where the USPS ships" or "list of state-level political jurisdictions where US residents live".

sbr464 · on Nov 16, 2018

Let’s move this tidbit to a structured API of common knowledge! Dewey decimal for data: not just a generic search engine for datasets in different formats (like the recent Google datasets site), but a familiar, go-to resource.

Piskvorrr · on Nov 16, 2018

Um...why not https://www.wikidata.org/ ?

bpchaps · on Nov 17, 2018

Have you ever used wikidata? It's kind of a shitshow.

eli · on Nov 16, 2018

Yes! Surprisingly often I am unable to complete a form because there is no option in the State field for Washington, DC.

sbr464 · on Nov 16, 2018

Structured, maintained API though, not general knowledge. I personally see an issue in that everyone has to accumulate their own stash of structured data for common knowledge (random examples): countries, zip codes, valid HTML5 element names, CSS properties, hex colors, common naming prefixes/suffixes/professional titles, etc. A growing list of work repeated by each dev team/company for really no reason. No complaint about this repo at all, just asking whether a solution exists.

thinkalone · on Nov 16, 2018

There is Corpora: https://github.com/dariusk/corpora/tree/master/data

sedatk · on Nov 16, 2018

> Corpora is a collection of small files. It is not meant to be an exhaustive source of anything: a list of resources should contain somewhere in the vicinity of 1000 items.

sbr464 · on Nov 16, 2018

Thanks, will check it out.

darkerside · on Nov 16, 2018

Here's a platform specific example. https://github.com/SmileyChris/django-countries/

bennofs · on Nov 16, 2018

Wikidata has a SPARQL API though
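As a rough sketch of what using that could look like: the snippet below builds a request against the public Wikidata Query Service endpoint and reads the standard SPARQL JSON results. The function names are hypothetical, the `User-Agent` value is a placeholder, and the class ID in the usage line (`Q3624078`, which I believe is "sovereign state") is an assumption to verify on wikidata.org before relying on it.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(class_id):
    """SPARQL for English labels of items that are an 'instance of' (P31) class_id."""
    return (
        "SELECT ?item ?itemLabel WHERE { "
        f"?item wdt:P31 wd:{class_id} . "
        'SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } '
        "}"
    )


def build_query_url(class_id):
    """URL-encode the query and ask for JSON results."""
    params = urllib.parse.urlencode({"query": build_query(class_id), "format": "json"})
    return f"{ENDPOINT}?{params}"


def fetch_labels(class_id):
    """Run the query over the network and return the labels (not executed on import)."""
    request = urllib.request.Request(
        build_query_url(class_id),
        headers={"User-Agent": "list-lookup-sketch/0.1 (placeholder contact)"},
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return [row["itemLabel"]["value"] for row in data["results"]["bindings"]]


if __name__ == "__main__":
    # Assumed class ID; check it on wikidata.org before relying on it.
    print(sorted(fetch_labels("Q3624078")))
```

How complete the results are depends on how well the items are tagged in Wikidata, so treat the output as a starting point rather than an authoritative list.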

bluntfang · on Nov 16, 2018

I've used faker [0] for stuff like this. I think it was originally a Perl package, and it has similar packages in other languages as well. I've used the Python implementation and enjoy it, along with its localization feature.

It looks like 1.0 was just released as well :D

[0] https://github.com/joke2k/faker/releases

davinic · on Nov 16, 2018

SecLists (https://github.com/danielmiessler/SecLists) contains a wealth of security-related lists of this sort, including a useful section containing the most common passwords.

CSMastermind · on Nov 16, 2018

That's a cool idea. I've seen individual packages for things like US states and HTTP status codes but I don't think I've ever seen them all packaged together.

wastedhours · on Nov 16, 2018

Would it essentially be like a graph with multiple nested nodes with different strands of info?

ioulian · on Nov 16, 2018

Somebody needed to find strings to test his/her app with, saw your repo, found it interesting and posted it here.

About the repo: nice job, I've used it a lot when testing sites/apps I did, good job on providing different formats too so it's easy to automate testing!

systematical · on Nov 16, 2018

I imagine because it's useful and has a fun name. So when someone stumbles across it, they post it. I've seen it on here a number of times and I still upvote it...because it's useful and has a fun name.

jacquesm · on Nov 16, 2018

It's a pretty useful list. I do wonder how many people actually end up having to rebuild their databases after running a test!

mmcclellan · on Nov 16, 2018

It sounds like the VMware explanation is the most likely one, but I thought I would share how I learned of the project just yesterday. There was an HN post yesterday about https://sr.ht/ and in looking at that I noticed the project used a blacklist of usernames that I thought was cool, so when I took a look at that project it had a link to this repo.

layog · on Nov 16, 2018

The current spike might be due to a recent post [1] on the programming subreddit

[1] https://www.reddit.com/r/programming/comments/9xla2j/naughty...

minimaxir · on Nov 16, 2018

That Reddit post was made after this HN submission hit the top.

judah · on Nov 16, 2018

Thanks for this list! And I appreciate that the RTL naughty string contains Hebrew for the first line of Genesis 1. :-)

PhasmaFelis · on Nov 16, 2018

I would guess because it's a useful tool that gets shared whenever people think about these issues. It's a nice reminder that not everything needs to be regularly updated or promoted to be useful. :)

throwaway2048 · on Nov 16, 2018

It's the kind of thing that sticks in your mind when you think of weird things going wrong with string input or Unicode rendering.

croo · on Nov 16, 2018

Because you have created something great and novel.

I stumbled on it for the first time and already saved it for future testing. Thanks :)

andendau · on Nov 16, 2018

It's a great testing resource. Way more concrete than 'well just test all the different strings'

DonHopkins · on Nov 16, 2018

Have you gotten any interesting or offensive pull requests?