
FWIW, I screen scraped rogerebert.com and copied all of his ratings and an excerpt of every review to letterboxd:

https://letterboxd.com/re2/

Just the great movies:

https://letterboxd.com/re2/tag/great-movie/films/by/release-...

You can then filter those by streaming service, but you need a pro account. Looks like 38 movies:

https://letterboxd.com/re2/tag/great-movie/films/on/amazon-p...

https://ibb.co/KFSj9jg

Apparently I missed the Buster Keaton movies:

https://www.rogerebert.com/reviews/great-movie-the-films-of-...

https://letterboxd.com/director/buster-keaton/

Counting that review would make it 39, but even that isn't quite right, since Ebert is saying all of Buster Keaton's films are great.

Anyway, the scraping was easy. The harder problems were parsing the HTML reviews (even with BeautifulSoup, the HTML is a mess) and then matching each review on Ebert's site to the correct movie, which I did via queries to TMDb and a lot of heuristics. There are nearly 8,000 reviews, and many have wrong years, bad titles, etc., on rogerebert.com. It was a fun spare-time project for a couple of weeks.
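The comment doesn't say which heuristics were used. As a hypothetical sketch, two common ones for messy titles and years could look like this: normalize titles before comparing them, and let the year be off by one. The `catalog` entries (dicts with `title` and `year`, e.g. built from TMDb search results) and all function names are assumptions for illustration, not the author's code.

```python
import re
import unicodedata

# Hypothetical heuristic: articles ignored when comparing titles.
LEADING_ARTICLES = ("the ", "a ", "an ")


def normalize_title(title: str) -> str:
    """Lowercase, strip accents and punctuation, and drop a leading article."""
    # Move a trailing article ("Godfather, The") to the front first.
    m = re.match(r"^(.*),\s*(the|a|an)$", title.strip(), flags=re.IGNORECASE)
    if m:
        title = f"{m.group(2)} {m.group(1)}"
    # Decompose accented characters and drop the combining marks (é -> e).
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower().replace("&", " and ")
    # Collapse every run of non-alphanumeric characters into one space.
    text = re.sub(r"[^a-z0-9]+", " ", text).strip()
    for article in LEADING_ARTICLES:
        if text.startswith(article):
            return text[len(article):]
    return text


def candidate_matches(review_title, review_year, catalog, year_slack=1):
    """Return catalog entries with the same normalized title whose year is
    within +/- year_slack of the (possibly wrong) year on the review page."""
    key = normalize_title(review_title)
    return [
        movie
        for movie in catalog
        if normalize_title(movie["title"]) == key
        and abs(movie["year"] - review_year) <= year_slack
    ]
```

A `year_slack` of 1 is a guess meant to absorb festival-vs-release-year differences; a larger value finds more candidates but also more false ones, which then need another signal to break the tie.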



Nice! Yeah, I wish letterboxd were free somehow without ads, and that they'd make their beta API public.

Yeah, I bet there's no great standard for normalizing/correcting titles, or for distinguishing between when a movie was made and when it was released, plus translations and imports.

Good work.


I ended up using the director and cast that are listed for most reviews on Ebert's site for matching the right movie. Even that required some tricks due to spelling errors or differences in how names were listed. I then flagged any matches that weren't unique or where the title wasn't similar enough for me to manually review. I think I only had to double-check about a hundred or so.
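A minimal sketch of that matching step, assuming each review and each candidate is a dict with `title`, `directors`, and `cast`. It uses Python's `difflib.SequenceMatcher` as a stand-in fuzzy matcher for the spelling differences; the thresholds (0.85 for names, 0.8 for titles) and all function names are hypothetical, not what the author used. A candidate is flagged for manual review when no credited person matches, when the best score is tied, or when the titles are too dissimilar.

```python
from difflib import SequenceMatcher


def similar(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def names_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two names as the same person if they are close enough,
    tolerating typos such as 'Hitchock' vs 'Hitchcock'."""
    return similar(a, b) >= threshold


def people_score(review_people, candidate_people) -> int:
    """Count people credited on the review page who approximately
    appear in the candidate's credits."""
    return sum(
        any(names_match(p, c) for c in candidate_people) for p in review_people
    )


def pick_match(review, candidates, title_threshold: float = 0.8):
    """Return (best_candidate, needs_manual_review)."""
    if not candidates:
        return None, True
    people = review["directors"] + review["cast"]
    scored = sorted(
        ((people_score(people, c["directors"] + c["cast"]), c) for c in candidates),
        key=lambda pair: pair[0],
        reverse=True,
    )
    best_score, best = scored[0]
    # Not unique: another candidate has the same number of matching people.
    tie = len(scored) > 1 and scored[1][0] == best_score
    weak_title = similar(review["title"], best["title"]) < title_threshold
    return best, (best_score == 0 or tie or weak_title)
```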

I didn't use the letterboxd API. Instead, I generated CSV files for the letterboxd importer. I then exported a CSV from their site and reconciled it against my files to look for import errors.
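A hypothetical sketch of that import/reconcile loop using Python's `csv` module. The import column names (`tmdbID`, `Title`, `Year`, `Rating`, `Tags`, `Review`) and the export's `Name`/`Year` columns are assumptions about Letterboxd's formats; check their import and export docs before relying on them. The rating is passed through unchanged, on the assumption that it is already on the importer's scale.

```python
import csv
import io

# Assumed Letterboxd import columns; verify against the importer's docs.
IMPORT_FIELDS = ["tmdbID", "Title", "Year", "Rating", "Tags", "Review"]


def write_import_csv(reviews, out):
    """Write one importer row per matched review to a file-like object."""
    writer = csv.DictWriter(out, fieldnames=IMPORT_FIELDS)
    writer.writeheader()
    for r in reviews:
        writer.writerow({
            "tmdbID": r["tmdb_id"],
            "Title": r["title"],
            "Year": r["year"],
            "Rating": r["rating"],  # assumed already on the importer's scale
            "Tags": ", ".join(r["tags"]),
            "Review": r["excerpt"],
        })


def missing_after_import(reviews, exported_csv, title_col="Name", year_col="Year"):
    """Return reviews whose (title, year) pair is absent from the export.
    Column names for the export are assumptions."""
    exported = {
        (row[title_col], row[year_col]) for row in csv.DictReader(exported_csv)
    }
    return [r for r in reviews if (r["title"], str(r["year"])) not in exported]
```

Since the site may store a different canonical title than the one submitted, anything this returns is a candidate for a manual look rather than a confirmed import error.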

Trivia: Ebert reviewed a few adult films which I couldn't import to letterboxd because the site officially doesn't allow those.

BTW, it's only $19/year for an account. I have my own account I pay for which follows the re2 account. That way I can easily see any of the re2 reviews for movies in my own watchlist.


Yep, I caved and got an account last month too.


I've added Buster Keaton's silent films from 1920-1929 since those are the ones Ebert is referencing in his review. That brings the total number of Great Movies available on Prime up to 43.



