Homework
Assignment 0: Hello World
For this assignment, you will implement "Hello World" in Java. The real purpose, of course, is to set up the environment you will need for this class, and to practice the various steps, such as GitHub commits and Gradescope submissions.
Assignment 1: Static Web Server
The goal of this assignment is to build a simple HTTP server that serves static files from a given directory. An enhanced version of this server will later be used as the frontend of your search engine.
Assignment 2: Dynamic Web Server
In this assignment, you will extend your static web server from HW1 with support for dynamic content and routes. The API will be modeled on that of the Spark Framework.
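To make the idea of a route concrete, here is a minimal sketch of matching a request path against a route pattern with named parameters, in the `:name` style that Spark Framework uses. The class and method names (`RouteMatchSketch`, `match`) are hypothetical and are not part of the assignment's API; the actual assignment may require different matching rules.

```java
import java.util.HashMap;
import java.util.Map;

final class RouteMatchSketch {
    /**
     * Compares a pattern such as "/hello/:name" with a path such as "/hello/world".
     * Returns the extracted named parameters, or null if the path does not match.
     * Assumption: segments are compared one by one, with no wildcards.
     */
    static Map<String, String> match(String pattern, String path) {
        String[] patternParts = pattern.split("/");
        String[] pathParts = path.split("/");
        if (patternParts.length != pathParts.length) {
            return null; // different number of segments, so no match
        }
        Map<String, String> params = new HashMap<>();
        for (int i = 0; i < patternParts.length; i++) {
            if (patternParts[i].startsWith(":")) {
                // Named parameter: remember the corresponding path segment.
                params.put(patternParts[i].substring(1), pathParts[i]);
            } else if (!patternParts[i].equals(pathParts[i])) {
                return null; // literal segment differs
            }
        }
        return params;
    }
}
```

For example, `RouteMatchSketch.match("/hello/:name", "/hello/world")` yields a map in which `name` is bound to `world`, while a path such as `/bye/world` does not match.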
Assignment 3: HTTPS Server on the Cloud
For this homework, you will add support for sessions and HTTPS to your web server from HW2, and you will deploy it on an Amazon EC2 instance.
Assignment 4: In-Memory Key-Value Store
The goal of this homework is to implement a simple distributed key-value store, which will build on your web server from the previous three assignments.
Assignment 5: Key-Value Store with Persistence
For this assignment, you will extend the key-value store from HW4 with persistence, as well as a nice user interface and some additional API functions.
Assignment 6: Analytics Engine
For this assignment, you will build a simple distributed analytics engine called Flame that is loosely based on Apache Spark. Flame will be able to work with large data sets (RDDs) that are spread across several nodes, and it will support some basic operations on these data sets.
Assignment 7: Enhanced Analytics Engine
The goal of this homework is to extend your solution from HW6 with some important operations, such as join, fold, and distinct. These operations will be needed to support certain tasks that you will implement for the project, such as crawling, indexing, and computing PageRank.
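As a rough illustration of what these three operations compute, the following sketch implements them on ordinary in-memory Java collections. It is a single-machine approximation only: the class `LocalOpsSketch` and its method signatures are hypothetical and say nothing about how Flame's distributed versions are declared or implemented.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

final class LocalOpsSketch {
    /** Convenience helper for building a key-value pair. */
    static Map.Entry<String, String> pair(String key, String value) {
        return new AbstractMap.SimpleImmutableEntry<>(key, value);
    }

    /** distinct: drops duplicate elements, keeping first-seen order. */
    static <T> List<T> distinct(List<T> items) {
        return new ArrayList<>(new LinkedHashSet<>(items));
    }

    /** fold: combines all elements into one value, starting from zero. */
    static <T> T fold(List<T> items, T zero, BinaryOperator<T> op) {
        T accumulator = zero;
        for (T item : items) {
            accumulator = op.apply(accumulator, item);
        }
        return accumulator;
    }

    /**
     * join: for every key present on both sides, emits one pair per
     * combination of left and right values, written here as "left,right".
     */
    static List<Map.Entry<String, String>> join(List<Map.Entry<String, String>> left,
                                                List<Map.Entry<String, String>> right) {
        // Index the right side by key so each left pair is a single lookup.
        Map<String, List<String>> rightByKey = new HashMap<>();
        for (Map.Entry<String, String> r : right) {
            rightByKey.computeIfAbsent(r.getKey(), k -> new ArrayList<>()).add(r.getValue());
        }
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (Map.Entry<String, String> l : left) {
            List<String> matches = rightByKey.getOrDefault(l.getKey(), Collections.emptyList());
            for (String rightValue : matches) {
                out.add(pair(l.getKey(), l.getValue() + "," + rightValue));
            }
        }
        return out;
    }
}
```

In a distributed setting the same semantics would have to hold across nodes, which is where most of the actual work lies; this sketch only pins down the expected results.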
Assignment 8: Distributed Web Crawler
The goal of this homework is to build a simple distributed web crawler, based on your Flame engine from HW6+HW7 and the KVS from HW4+HW5. The crawler should be able to follow redirects, and it should implement the Robots Exclusion Protocol.
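The robots.txt check can be sketched roughly as follows. This is a simplified reading of the protocol under several assumptions that the assignment may not share: only plain path prefixes are supported (no wildcards), rules from the `*` group and from the crawler's own group are merged, the longest matching prefix wins, and ties go to `Allow`. The class name `RobotsSketch` and the agent name used in the example are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class RobotsSketch {
    /** One Allow or Disallow rule with its path prefix. */
    private static final class Rule {
        final boolean allow;
        final String prefix;
        Rule(boolean allow, String prefix) {
            this.allow = allow;
            this.prefix = prefix;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Parses robotsTxt and keeps the rules that apply to agent or to "*". */
    RobotsSketch(String robotsTxt, String agent) {
        boolean applies = false;
        boolean previousWasAgent = false;
        for (String raw : robotsTxt.split("\\r?\\n")) {
            int hash = raw.indexOf('#'); // strip trailing comments
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                boolean matches = value.equals("*") || value.equalsIgnoreCase(agent);
                // Consecutive User-agent lines share one group of rules.
                applies = previousWasAgent ? (applies || matches) : matches;
                previousWasAgent = true;
            } else {
                previousWasAgent = false;
                boolean isRule = field.equals("allow") || field.equals("disallow");
                if (applies && isRule && !value.isEmpty()) {
                    rules.add(new Rule(field.equals("allow"), value));
                }
            }
        }
    }

    /** Returns true if the crawler may fetch path under the parsed rules. */
    boolean isAllowed(String path) {
        Rule best = null;
        for (Rule rule : rules) {
            if (!path.startsWith(rule.prefix)) {
                continue;
            }
            boolean longer = best == null || rule.prefix.length() > best.prefix.length();
            boolean tieForAllow = best != null
                    && rule.prefix.length() == best.prefix.length() && rule.allow;
            if (longer || tieForAllow) {
                best = rule;
            }
        }
        return best == null || best.allow; // no matching rule means allowed
    }
}
```

A crawler would typically fetch `/robots.txt` once per host, build one such object, and consult `isAllowed` before requesting each URL on that host.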
Assignment 9: Indexer and PageRank
In this final assignment, you will build a simple indexer and an implementation of the PageRank algorithm, both of which take as input the results of the crawler you built for HW8. These will be the last two building blocks of the project.
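For orientation, here is a minimal single-machine sketch of iterative PageRank on a small in-memory link graph. It assumes the normalized formulation in which each page's rank is (1 - d)/N plus d times the rank it receives from its in-links, with the rank of pages that have no out-links spread evenly over all pages. The damping factor, the iteration count, and the name `PageRankSketch` are illustrative choices; the assignment may specify a different formulation or convergence criterion.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PageRankSketch {
    /**
     * links maps each page to the pages it links to.
     * Assumption: every link target also appears as a key; unknown targets are ignored.
     */
    static Map<String, Double> pageRank(Map<String, List<String>> links,
                                        double damping, int iterations) {
        int n = links.size();
        Map<String, Double> rank = new HashMap<>();
        for (String page : links.keySet()) {
            rank.put(page, 1.0 / n); // start from a uniform distribution
        }
        for (int i = 0; i < iterations; i++) {
            Map<String, Double> next = new HashMap<>();
            for (String page : links.keySet()) {
                next.put(page, 0.0);
            }
            double danglingMass = 0.0;
            for (Map.Entry<String, List<String>> entry : links.entrySet()) {
                double current = rank.get(entry.getKey());
                List<String> outLinks = entry.getValue();
                if (outLinks.isEmpty()) {
                    danglingMass += current; // no out-links: redistribute evenly below
                    continue;
                }
                double share = current / outLinks.size();
                for (String target : outLinks) {
                    if (next.containsKey(target)) {
                        next.merge(target, share, Double::sum);
                    }
                }
            }
            for (String page : links.keySet()) {
                double incoming = next.get(page) + danglingMass / n;
                next.put(page, (1.0 - damping) / n + damping * incoming);
            }
            rank = next;
        }
        return rank;
    }
}
```

With a closed graph (every target is also a key), the ranks sum to 1 after every iteration, which is a convenient sanity check when testing.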
Final Project: Cloud-Based Search Engine
Over the course of the homework assignments, you have developed (and will develop) many of the components that a modern search engine would use: a web server (HW1–3), a key-value store (HW4+5), an analytics engine (HW6+7), a crawler (HW8), and a simple indexer and PageRank (HW9). The goal of the final project is to integrate all of these components, and to extend them with a few additional ones, to build a complete web search engine that runs on the Cloud.
