First, optimize performance locally, by choosing the algorithm and implementation that give the best results at the data size n you actually need to handle most of the time. The algorithm with the lowest big-O complexity is not necessarily the best; worst cases, and especially pathological cases, also matter.
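A minimal sketch of what "benchmark at your real n" means, using two made-up membership-test strategies: for tiny collections, an O(n) linear scan can beat an asymptotically better structure once setup costs are included. The functions and sizes here are illustrative, not a recommendation.

```python
import timeit
import random

def linear_member(items, x):
    return x in items          # O(n) scan over a plain list, zero setup cost

def set_member(items, x):
    return x in set(items)     # O(1) lookup, but pays an O(n) set build each call

def bench(fn, items, probes, reps=1000):
    # time `reps` rounds of probing; lower is better
    return timeit.timeit(lambda: [fn(items, p) for p in probes], number=reps)

n = 8                          # the size you "need to deal with most"
items = list(range(n))
probes = [random.randrange(n * 2) for _ in range(16)]

print("linear:", bench(linear_member, items, probes))
print("set   :", bench(set_member, items, probes))
```

The point is only that the winner depends on n: rerun the same harness at the n you expect in production before committing to an algorithm.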
Then you need to deal with distribution across a large number of systems over the network... This will likely force you to choose a different algorithm, but at least you'll have something to benchmark it against: the local best-case scenario.
When it comes to graphs, good partitioning is very important for minimizing average-case latency. It would greatly help if you could partition the data so as to minimize the number of nodes that refer to a graph node on a remote system. When walking the graph, instead of immediately sending a query to the target system, it's probably better to batch the queries -- or not, if per-request latency matters more than throughput. Either way, latency will hurt badly: a local access is on the order of 100 ns, while a round trip to a remote system is more like 1,000,000 ns (1 ms).
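The batching idea can be sketched roughly like this: group the frontier's neighbor lookups by the system that owns each node and issue one batched request per system, instead of one round trip per node. `owner_of` and `fetch_batch` are made-up stand-ins for whatever partitioning scheme and RPC layer you actually have.

```python
from collections import defaultdict

def owner_of(node, num_partitions=4):
    # toy hash partitioning; a real system would use a smarter scheme
    return hash(node) % num_partitions

def fetch_batch(partition, nodes, graph):
    # stand-in for a single RPC to `partition` resolving many nodes at once
    return {n: graph.get(n, []) for n in nodes}

def neighbors_batched(frontier, graph):
    # group lookups by owning partition...
    by_partition = defaultdict(list)
    for node in frontier:
        by_partition[owner_of(node)].append(node)
    # ...then pay one round trip per partition, not per node
    result = {}
    for partition, nodes in by_partition.items():
        result.update(fetch_batch(partition, nodes, graph))
    return result

graph = {"a": ["b", "c"], "b": ["c"], "c": []}
print(neighbors_batched(["a", "b"], graph))
```

With a 1 ms round trip, collapsing k per-node queries into one batch per partition is the difference between k milliseconds and a handful -- at the cost of the extra latency of waiting to fill the batch.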
To combat latency, maybe the best way is not to send the request in the first place? You could ensure that nodes referenced from remote systems are duplicated, to some extent, onto those systems. That could help performance -- or hurt it even more, for instance when you need to modify the replicated nodes.
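A toy model of that trade-off, with an invented `ReplicatedStore`: a read can be served from a local replica, but a write has to touch every copy, which is exactly where replication can hurt more than it helps.

```python
class ReplicatedStore:
    def __init__(self, num_systems):
        # one dict per "system"; a replica is just a copy of the node's value
        self.systems = [dict() for _ in range(num_systems)]

    def replicate(self, node, value, targets):
        for s in targets:                      # copy the node onto each target system
            self.systems[s][node] = value

    def read(self, node, local):
        # cheap path: served locally when a replica exists
        if node in self.systems[local]:
            return self.systems[local][node], "local"
        for store in self.systems:             # expensive path: remote round trip
            if node in store:
                return store[node], "remote"
        raise KeyError(node)

    def write(self, node, value):
        # expensive: every replica must be updated (or invalidated)
        touched = 0
        for store in self.systems:
            if node in store:
                store[node] = value
                touched += 1
        return touched

store = ReplicatedStore(3)
store.replicate("n1", {"edges": ["n2"]}, targets=[0, 1])
print(store.read("n1", local=1))       # served from the local replica
print(store.write("n1", {"edges": []}))  # had to touch 2 copies
```

Whether this is a win depends entirely on the read/write ratio of the boundary nodes -- which is the point of the caveat above.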
Maybe you could keep counters on nodes that have connections to nodes on remote systems. If some remote system refers to a node more often than the local system does, migrate the node there instead. It's of course not hard to see how this strategy could blow up. :-)
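The counter idea, as a hypothetical sketch: each node tracks accesses per system and migrates toward its heaviest user. It's also easy to see from the code how it thrashes if access patterns oscillate between systems.

```python
from collections import Counter

class Node:
    def __init__(self, home):
        self.home = home                 # system currently holding the node
        self.hits = Counter()            # accesses observed, per system

    def access(self, from_system):
        self.hits[from_system] += 1
        top, count = self.hits.most_common(1)[0]
        # migrate when some remote system has strictly more accesses
        if top != self.home and count > self.hits[self.home]:
            self.home = top
        return self.home

n = Node(home=0)
for _ in range(3):
    n.access(1)                          # remote system 1 hammers the node
print(n.home)  # -> 1: the node migrated to its heaviest user
```

A real implementation would at least need hysteresis (decay the counters, require a margin before moving) to avoid ping-ponging nodes between two equally interested systems.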
Please don't take the above ideas seriously if you're actually developing a graph processing system. I wrote them to illustrate the biggest basic problem of distributed systems: latency. Distributed systems are very hard.