This is one of those times where, even if you don’t use a fully functional language, it helps to write as much of your program logic as possible as pure functions.
It also makes it more testable. Instead of putting the delete call right in the loop, split it into four functions.
function getAllVimeoVideos()
function getAllDbVideos()
function getVideosToDelete(vimeo_videos, db_videos)
function deleteVideos(videos_to_delete)
Your core logic lives in getVideosToDelete which is simply a set difference.
Given that there are only a few hundred videos, it is easy to run the getter functions above and quickly verify they are returning what you expect.
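As a rough sketch of what that set difference could look like in JavaScript: the `id` field and the direction of the difference (delete whatever is on Vimeo but has no database record) are assumptions for illustration, not something stated above.

```javascript
// Pure function: no API calls and no database access, just data in and data out.
// Assumption: each video object carries an `id` shared by Vimeo and the database.
function getVideosToDelete(vimeo_videos, db_videos) {
  const dbIds = new Set(db_videos.map((video) => video.id));
  // Keep only the Vimeo videos that have no matching database record.
  return vimeo_videos.filter((video) => !dbIds.has(video.id));
}

// Made-up sample data for a quick sanity check.
const sampleVimeoVideos = [{ id: 1 }, { id: 2 }, { id: 3 }];
const sampleDbVideos = [{ id: 2 }];
const sampleToDelete = getVideosToDelete(sampleVimeoVideos, sampleDbVideos);
console.log(sampleToDelete.map((video) => video.id)); // prints [ 1, 3 ]
```

Because the function never touches Vimeo or the database, you can print its result and eyeball it before anything gets deleted.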
This was going to be my exact recommendation. By “separating the concerns”, you make it easier on pretty much every dimension: testing in unit tests, doing a dry run in production, and reading the code (for you and for code reviewers). In some cases your code will also end up written in a more functional way, which reduces variable scoping issues.
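One way the dry run could look, as a sketch only: `deleteOne` is a hypothetical stand-in for whatever call actually removes a video, and the `dryRun` flag and its default are my own assumptions.

```javascript
// Hypothetical deleteVideos: `deleteOne` stands in for the real delete call,
// which is not shown in this thread. Dry run is the default to stay on the safe side.
function deleteVideos(videos_to_delete, deleteOne, dryRun = true) {
  const deletedIds = [];
  for (const video of videos_to_delete) {
    if (dryRun) {
      // Only report what would happen; nothing is deleted.
      console.log(`[dry run] would delete video ${video.id}`);
      continue;
    }
    deleteOne(video);
    deletedIds.push(video.id);
  }
  return deletedIds;
}

// In a unit test, a fake delete function records calls instead of deleting anything.
const recordedCalls = [];
const fakeDelete = (video) => recordedCalls.push(video.id);
const dryRunResult = deleteVideos([{ id: 1 }, { id: 3 }], fakeDelete); // logs only
const realResult = deleteVideos([{ id: 1 }, { id: 3 }], fakeDelete, false); // calls fakeDelete twice
```

Passing the delete call in as a parameter is what lets the same function run in a test, in a dry run, and for real.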
A function like this is the first time I thought about time complexity in my job.
Say Foo and Bar have fields in common, such that you can say a Foo object "equals" or "matches" a Bar object, for example because their name and dateOfBirth fields are the same (there is nothing like a common ID between the two). Now say there are some other fields too, like amountSpentThisYearOnDogFood, that you know are always accurate for Bars but might be out of date for Foos. How do you get the list of all the Foos to update?
Initially I did the nested for loop solution that's like
List<Foo> getFoosToUpdate(List<Foo> foos, List<Bar> bars)
{
    List<Foo> returnList = new List<Foo>();
    foreach (var foo in foos)
    {
        foreach (var bar in bars)
        {
            // check if "equal" or "matching" based on some criteria
            // if equal, update foo dog food expenditure with bar dog food expenditure, add to returnList, and break
        }
    }
    return returnList;
}
but that's O(n^2) right.
The solution with a Dictionary is obviously better. All you need is a method for each of the Foo and Bar classes that produces the same hash for both whenever they would be considered equal or matching by whatever criteria you are using.
Call them GetHashOfFoo and GetHashOfBar: these two functions will return the same value if those fields are the same. So then you can do something like
List<Foo> getFoosToUpdate(List<Foo> foos, List<Bar> bars)
{
    List<Foo> returnList = new List<Foo>();
    // Index the bars by hash once so each foo lookup is O(1) on average.
    Dictionary<int, Bar> barsByHash = new Dictionary<int, Bar>(bars.Count);
    foreach (var bar in bars)
    {
        int barHash = GetHashOfBar(bar);
        barsByHash[barHash] = bar;
    }
    foreach (var foo in foos)
    {
        int fooHash = GetHashOfFoo(foo);
        // Caveat: two different records whose hashes collide would be treated as a match.
        if (barsByHash.TryGetValue(fooHash, out Bar matchingBar))
        {
            returnList.Add(foo.CopyWith(dogFoodExpenditure: matchingBar.DogFoodExpenditure));
        }
    }
    return returnList;
}
Which is faster because you only have to go through the bars list once instead of once per foo, so it's roughly O(n + m) rather than O(n * m).
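For comparison, here is the same lookup idea sketched in JavaScript. It swaps in a different technique for the key: instead of an int hash, it keys the map on the matching fields themselves, so two different people can never collide. The `matchKey` helper is hypothetical, and the field names `name`, `dateOfBirth`, and `dogFoodExpenditure` are taken from the example above.

```javascript
// Hypothetical key function. Serializing the fields as a JSON array keeps them
// unambiguous (["ab", "c"] and ["a", "bc"] produce different keys).
function matchKey(record) {
  return JSON.stringify([record.name, record.dateOfBirth]);
}

function getFoosToUpdate(foos, bars) {
  // One pass over the bars to build the lookup table.
  const barsByKey = new Map();
  for (const bar of bars) {
    barsByKey.set(matchKey(bar), bar);
  }
  // One pass over the foos, with an average O(1) lookup for each.
  const foosToUpdate = [];
  for (const foo of foos) {
    const bar = barsByKey.get(matchKey(foo));
    if (bar !== undefined) {
      // Return an updated copy rather than mutating the input.
      foosToUpdate.push({ ...foo, dogFoodExpenditure: bar.dogFoodExpenditure });
    }
  }
  return foosToUpdate;
}
```

Like the version above, this returns every Foo that has a matching Bar, whether or not the expenditure actually differs.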
I actually messed up in much the same way as OP, but with undesired additions instead of undesired deletions.
You can think of it as having two endpoints, both expecting a .csv with rows being the things you were updating/changing/deleting.
The problem was that only one of these endpoints had a column to indicate (with a character) whether the row was an edit, an addition, or a deletion. The other endpoint only supported additions, but I unwisely assumed its .csv had the same options as the first one, changes and deletions included. That's how we accidentally put in over 100 additions that should have been changes, and they had to be manually deleted. Luckily I had a list of all the mistaken additions.