For this project we implemented breadth-first search (BFS) to construct a directed graph of Wikipedia pages starting from a seed URL. Since the project was graded partially on the Big-O complexity of our algorithms, we used recursion, BFS, hash maps, and directed graphs to keep the run time of our application low. The goal of this project was to gain experience choosing and implementing the data structures that give the WikiCrawler the best possible run time.
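To make the construction concrete, here is a minimal sketch of a queue-driven BFS that builds the page graph as an adjacency map; `BfsCrawlSketch`, `fetchLinks`, `crawl`, and `maxPages` are illustrative names rather than the project's actual API, and `fetchLinks` is left as a stub where a real crawler would download and parse the page HTML.

```java
import java.util.*;

// Hypothetical sketch of the BFS graph construction; names are
// illustrative, not the actual project API.
public class BfsCrawlSketch {

    // Stub: a real crawler would fetch the page and extract the
    // /wiki/ links found in its HTML.
    static List<String> fetchLinks(String url) {
        return Collections.emptyList();
    }

    // BFS from the seed, recording each page's outgoing links in an
    // adjacency map (the directed graph).
    static Map<String, List<String>> crawl(String seed, int maxPages) {
        Map<String, List<String>> graph = new HashMap<>();
        Set<String> visited = new HashSet<>();    // O(1) membership checks
        Queue<String> frontier = new ArrayDeque<>();

        visited.add(seed);
        frontier.add(seed);

        while (!frontier.isEmpty() && visited.size() <= maxPages) {
            String page = frontier.poll();
            List<String> links = fetchLinks(page);
            graph.put(page, links);               // directed edges page -> link
            for (String link : links) {
                // add() returns false for already-seen pages, so each
                // page is enqueued at most once
                if (visited.size() < maxPages && visited.add(link)) {
                    frontier.add(link);
                }
            }
        }
        return graph;
    }
}
```

Using a `HashSet` for the visited check and a `HashMap` for the adjacency list keeps both operations constant time on average, which is where most of the run-time savings come from.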
We also had to implement influence algorithms to determine which Wikipedia pages were connected to the most other pages in the graph. We implemented three: degree greedy influence, modular greedy influence, and sub-modular greedy influence.
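As an illustration of the simplest of the three heuristics, the sketch below selects the k pages with the largest out-degree; `degreeGreedy` and its parameters are hypothetical names under the adjacency-map representation assumed above. Broadly speaking, the modular and sub-modular variants follow the same greedy shape but rank pages by an estimated influence score and by marginal gain over the already-chosen seed set, respectively.

```java
import java.util.*;

// Hedged sketch of the degree-greedy heuristic: take the k nodes with
// the largest out-degree as the seed set. Names are illustrative.
public class DegreeGreedySketch {

    static List<String> degreeGreedy(Map<String, List<String>> graph, int k) {
        // Sort pages by out-degree, descending, and keep the top k.
        List<String> pages = new ArrayList<>(graph.keySet());
        pages.sort((a, b) ->
            Integer.compare(graph.get(b).size(), graph.get(a).size()));
        return pages.subList(0, Math.min(k, pages.size()));
    }
}
```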
On this project I was the lead developer, and my project partner was in charge of documentation and reporting.