Large Scale Graph Mining on GPU


Project maintained by hanzhou and srw

By Han Zhou (hanzhou) and Shannon Williams (srw)

Main Page

Progress So Far and Preliminary Results(4/20)

So far, we have found sample graphs at http://snap.stanford.edu/data/index.html, which we have converted into a usable format. We wrote and ran a preliminary, functional program for multiplication on the GPU. Thereafter, we made optimizations to the code. Additional details of the code and results are summarized here

Concerns

At this point, we have solved most of the logistical concerns for this project. We have access to all of the necessary resources (AWS and Latedays) and as well as the test data, which we are getting from the Stanford Large Network Dataset Collection. Beyond this, the only difficult challenge remaining is to finish coding our GPU-optimized code. Since our project is primarily focused on comparative analysis, we will optimize our code as much as possible in the remaining weeks, and then perform analysis using our best results.

It was mentioned before in the proposal, but we have still not addressed how to scale up GPU to large datasets. This remains a huge unknown. However, the primary purpose of our project is, again, analysis. We aim to determine if it is possible to scale up GPU usage to large datasets efficiently. Therefore, if we are not able to reach a sufficiently fast speed-up, we will discuss our approaches and try to analyse why they failed. Hence, even if we cannot solve this hurdle, we should still be able to produce worthwhile results. Of course, we will try our best to produce good results on the GPU.

Rather than resources and code troubles, our biggest difficulties during the first half of this project has been coordination and time management. Both members of the group were unexpectedly busy during the past two weeks, and did not properly prioritize this project, so we fell substantially behind schedule. We still should be able to meet our goals, but in the second half of this project we hope to improve our efficiency. To this end, we will try to meet more frequently, at least once-twice a week in person, to increase accountability.

Goals and Deliverables

PLAN TO ACHIEVE

We previously planned to achieve 3 points:

We are well on our way to achieving the first two points. We have already completed a functional GPU version of multiplication which should easily extend to PageRank and several other graph algorithms. Also, we have begun preparations for analysis of our other two systems, which should not take substantially long to complete.

However, saying this, we are substantially behind the anticipated schedule. We have not even started on the third bullet point. We still have 3 weeks remaining, which should be sufficient for completing the third goal, but it is unlikely that we will have additional times for our "hope to achieve" goals unless the effort we put into the second half substantially exceeds that of the first half.

DEMO

At our demo, we plan to present several graphs comparing the runtime of our 3 algorithms on multiple input graphs for multiple graph mining algorithms. This should clearly show when each algorithm is superior to the others.

We probably won’t show a demo of our programs actually running, because it wouldn’t look very interesting, and it might take a while to run.

Schedule

IN REVIEW

Week 0: (March 30 - April 5)

Week 1: (April 6 - April 12)

Week 2: (April 13 - April 19)

PLANNED SCHEDULE

Week 3: (April 20 - April 27)

Week 4: (April 27 - May 3)

Week 5: (May 3 - May 10)

Having trouble with Pages? Check out the documentation at https://help.github.com/pages or contact support@github.com and we’ll help you sort it out.