Dev notes: Finding all paths in a graph using Apache Spark

Tuesday, August 18, 2015

Problem

You have a huge graph and you need to find all paths that start from a given set of vertices.

Apache Spark has a great module, GraphX, for working with big distributed graphs. But in my opinion it is best suited to numeric computations based on graph structure, such as PageRank. When you need to discover paths, GraphX becomes too slow because of the amount of data that has to be shipped between Spark executors.

Idea

Imagine we have our graph as an RDD of pairs (a, b), where each pair is a directed edge from a to b.
We also have an RDD, startVertices, with the vertices whose paths we need to find.
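
To make the data layout concrete, here is a small sketch in plain Python rather than Spark. The sample edges, the start vertices, and the helper name `index_by_source` are all made up for illustration; on a cluster the same pairs would live in an RDD keyed by the source vertex.

```python
from collections import defaultdict


def index_by_source(edges):
    """Group edge targets by source vertex (local analogue of keying an RDD by `a`)."""
    by_source = defaultdict(list)
    for a, b in edges:
        by_source[a].append(b)
    return dict(by_source)


# Hypothetical sample data: each pair (a, b) is a directed edge a -> b.
edges = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]
# Hypothetical vertices whose outgoing paths we want.
start_vertices = [1, 2]

outgoing = index_by_source(edges)
```

Vertex 5 has no outgoing edges, so it does not appear as a key in `outgoing`.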

The idea of the algorithm is simple: we make iterative joins to get all paths in our graph. First, let's join startVertices with the edges to find all paths of length 1.
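
A minimal sketch of this first step, again in plain Python as a stand-in for the Spark join. The function name `init_step` and the choice to store a path as a tuple of vertices are assumptions made for the example.

```python
def init_step(edges, start_vertices):
    """Keep every edge whose source is a start vertex.

    This is the local equivalent of joining start vertices with edges
    on the source vertex. Each result is a path of length 1, stored as
    a tuple (start, next).
    """
    starts = set(start_vertices)
    return [(a, b) for (a, b) in edges if a in starts]


# Hypothetical sample graph and start vertices.
edges = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]
first_paths = init_step(edges, [1, 2])
```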

The RDD initStep now contains all paths of length 1 from the given vertices.
So far so good. As I said before, the algorithm is iterative, which makes it a good fit for Spark. Let's create a recursive function, stepOver, to find all paths.

Here we use cogroup instead of join because we do not need to keep the join key, and join in Spark is just a special case of cogroup. Calling count triggers the computation on the RDD, and its result also tells stepOver when to stop.
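
The recursion can be sketched as follows. This is a plain-Python approximation under stated assumptions, not the Spark code: `cogroup` here is a hand-written local helper that imitates what RDD.cogroup returns, paths are tuples keyed by their last vertex, and `len(next_step) == 0` plays the role of checking the result of count.

```python
from collections import defaultdict


def cogroup(left, right):
    """Local stand-in for RDD.cogroup: key -> (values from left, values from right)."""
    grouped = defaultdict(lambda: ([], []))
    for key, value in left:
        grouped[key][0].append(value)
    for key, value in right:
        grouped[key][1].append(value)
    return grouped


def step_over(edges, prev_step, acc=None):
    """Extend every path in prev_step by one edge until no path can grow.

    Returns a list with one entry per iteration; each entry holds the
    paths found in that iteration. Does not terminate on cyclic graphs.
    """
    acc = (acc or []) + [prev_step]
    # Key each path by its last vertex so it lines up with edge sources.
    keyed_paths = [(path[-1], path) for path in prev_step]
    next_step = [
        path + (target,)
        for paths, targets in cogroup(keyed_paths, edges).values()
        for path in paths
        for target in targets
    ]
    if len(next_step) == 0:  # nothing new was found, so stop
        return acc
    return step_over(edges, next_step, acc)


# Hypothetical acyclic sample graph; prev_step holds the length-1 paths from vertex 1.
edges = [(1, 2), (2, 3), (1, 3)]
steps = step_over(edges, [(1, 2), (1, 3)])
```

Because the key is only needed to line paths up with edges, the helper throws it away after grouping, which mirrors the reason given above for preferring cogroup over join.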

Now we can reduce the results of all iterations into a single collection of all paths from startVertices.
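
As a rough illustration, assuming each iteration produced its own list of paths, the final reduce is just a union of those lists. The name `all_paths` and the sample input are hypothetical; in Spark the same shape would be a reduce over RDDs with union.

```python
from functools import reduce


def all_paths(steps):
    """Merge the per-iteration path lists into one list (like reducing RDDs with union)."""
    return reduce(lambda left, right: left + right, steps, [])


# Hypothetical per-iteration results: paths of length 1, then paths of length 2.
steps = [[(1, 2), (1, 3)], [(1, 2, 3)]]
paths = all_paths(steps)
```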

P.S.

This naive solution has several problems that need to be resolved in real-life applications.

Your graph can have cycles, in which case stepOver will never end. To resolve this issue, you can use a Bloom filter: it accumulates all previously "visited" vertices in stepOver, and the cogroup operation filters them out. I prefer the twitter-algebird library and its BloomFilter implementation (see https://github.com/twitter/algebird).
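
Here is a sketch of the cycle guard in plain Python. It uses an exact Python set where the text suggests a Bloom filter, which is a deliberate simplification: a real Bloom filter uses far less memory but can report false positives, so it may occasionally skip a vertex that was never visited. The function name `step_over_acyclic` is made up for this example.

```python
from collections import defaultdict


def step_over_acyclic(edges, start_vertices):
    """Iteratively extend paths, skipping vertices reached in earlier iterations.

    Returns one list of paths per iteration. Terminates even when the
    graph contains cycles, because each iteration must reach at least
    one new vertex to continue.
    """
    visited = set(start_vertices)  # exact stand-in for the Bloom filter
    current = [(v,) for v in start_vertices]
    steps = []
    while True:
        # Group current paths by their last vertex.
        ends = defaultdict(list)
        for path in current:
            ends[path[-1]].append(path)
        # Extend along edges whose target has not been visited before.
        next_step = [
            path + (b,)
            for a, b in edges
            if b not in visited
            for path in ends.get(a, [])
        ]
        if not next_step:
            return steps
        visited.update(path[-1] for path in next_step)
        steps.append(next_step)
        current = next_step


# Hypothetical graph with a cycle 1 -> 2 -> 3 -> 1.
cyclic_edges = [(1, 2), (2, 3), (3, 1)]
cycle_steps = step_over_acyclic(cyclic_edges, [1])
```

Note the trade-off: a vertex is only ever reached in the first iteration that discovers it, so longer alternative paths to the same vertex are dropped.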
