Key-Value Store with Persistence
For this assignment, you will extend the key-value store from HW4 with a way to store data on disk, as well as a nice user interface and some additional API functions. These will help to get the key-value store ready for the analytics system we will build in the following assignments.

Storing the data on disk makes sense in some situations, but not in others. For instance, if the key-value store is used to store valuable data that cannot be reconstructed if it is lost, the data clearly should be on disk; if it is used as a cache or to store some intermediate result, it probably should not be on disk, since the disk I/O will slow things down. To support both use cases, we will have two kinds of tables: persistent tables and in-memory tables. To distinguish between the two, we will give the former names that start with pt-, for “persistent table”.

A simple way to store data on disk is to keep each Row in a separate file in a certain directory, with the row key as the file name. (There are far more sophisticated ways, but this will do for our purposes.) A small complication is that most operating systems permit only certain characters in a file name, and that this set differs between operating systems. To avoid problems with this, we will use a special encoding that translates a given key to a “safe” file name and back. We will give you a helper class that takes care of this.

As in the earlier assignments, please do use Google, answers from the discussion group, the Java API reference and, if necessary, a good Java book to solve simple problems on your own. If none of these steps solve your problem, please post on the discussion group, and we will be happy to help!
Requirements
Please start by downloading the HW5 package from http://cis5550.net/hw5.zip. This contains a README file with a short questionnaire and an Eclipse project definition, which you can ignore if you are not using Eclipse. Your solution must meet the same requirements as in HW4, with the following additions:
Persistent tables: When the name of a table starts with pt-, the data in this table should be stored entirely on disk; your workers should not maintain any in-memory state for such tables. When data is written to such a table for the first time, the worker should create a subdirectory within its storage directory, using the name of the table as the directory name. This subdirectory should contain a separate file for each row that exists in the table; the name of this file should be the result of tools.KeyEncoder.encode(key), where key is the row key, and the file should contain the output of Row.toByteArray().
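The on-disk layout can be sketched as below. The real KeyEncoder and Row classes come with the HW5 package; encodeStub here is a hypothetical stand-in (it only illustrates the idea and does not match the real encoding), and the serialized bytes stand in for Row.toByteArray().

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class PersistentWrite {
    // Hypothetical stand-in for tools.KeyEncoder.encode(); the real
    // class handles all unsafe characters and is reversible.
    static String encodeStub(String key) {
        StringBuilder sb = new StringBuilder();
        for (char c : key.toCharArray()) {
            if (Character.isLetterOrDigit(c)) sb.append(c);
            else sb.append('_').append(Integer.toHexString(c));
        }
        return sb.toString();
    }

    // Write one row: create the table's subdirectory on first use,
    // then store the serialized row under the encoded key name.
    static void putRowOnDisk(Path storageDir, String table, String key,
                             byte[] serializedRow) throws IOException {
        Path tableDir = storageDir.resolve(table);
        Files.createDirectories(tableDir);  // no-op if it already exists
        Files.write(tableDir.resolve(encodeStub(key)), serializedRow);
    }
}
```

In your worker, the write would typically go into your existing putRow helper, guarded by a check for the pt- prefix.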
Whole-row read: The workers should support a GET route for /data/XXX/YYY, where XXX is a table name and YYY is a row key. If table XXX exists and contains a row with key YYY, the worker should serialize this row using Row.toByteArray() and send it back in the body of the response. If the table does not exist or does not contain a row with the relevant key, it should return a 404 error code.
Streaming read: The workers should support a GET route for /data/XXX, where XXX is a table name. When this route is invoked, the worker should iterate over all the local entries in table XXX, serialize each entry with Row.toByteArray() and then send the entries back, each followed by a LF character (ASCII code 10). After the last entry, there should be another LF character to indicate the end of the stream. If the HTTP request contains query parameters called startRow and/or endRowExclusive, the worker should compare the key of each row to the value(s) of these parameter(s), and send the row only if 1) startRow is either absent or the row key is equal to, or higher than, the value of startRow; and 2) endRowExclusive is either absent or the row key is smaller than the value of endRowExclusive. For example, if the table has rows with keys A, C, M, and Q, and the query parameters are startRow=B and endRowExclusive=Q, you should return the rows with keys C and M. If no table with the name XXX exists on the local node, the request should return a 404 error code.
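The range check for startRow and endRowExclusive is plain lexicographic string comparison; a minimal sketch (class and method names are ours, with null meaning the query parameter was absent):

```java
class RangeFilter {
    // A row is sent only if its key is >= startRow (when present)
    // and < endRowExclusive (when present).
    static boolean inRange(String key, String startRow, String endRowExclusive) {
        if (startRow != null && key.compareTo(startRow) < 0) return false;
        if (endRowExclusive != null && key.compareTo(endRowExclusive) >= 0) return false;
        return true;
    }
}
```

With the example from above (keys A, C, M, Q; startRow=B, endRowExclusive=Q), only C and M pass the filter.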
Rename: The worker should have a PUT route for /rename/XXX. When this route is invoked, the body will contain another name YYY, and the worker should rename table XXX to YYY (along with the corresponding subdirectory on disk, if it is a persistent table). The worker should return a 404 status if table XXX is not found, a 409 status if table YYY already exists, and a 400 status if XXX starts with pt- but YYY does not, or vice versa. If the rename succeeds, the worker should return a 200 status and the word OK.
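The status-code rules can be captured in one small decision helper. This is only a sketch: the class, method, and the boolean parameters are hypothetical, standing in for however your worker checks whether a table exists.

```java
class RenameCheck {
    static int statusFor(String oldName, String newName,
                         boolean oldExists, boolean newExists) {
        if (!oldExists) return 404;                       // source table missing
        if (newExists) return 409;                        // target name already taken
        if (oldName.startsWith("pt-") != newName.startsWith("pt-"))
            return 400;                                   // cannot change persistence
        return 200;                                       // OK to rename
    }
}
```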
Delete: The worker should have a PUT route for /delete/XXX. When this route is invoked and a table with the name XXX exists, the worker should delete it; if the table is persistent, the worker should also delete the corresponding directory, as well as all the files in it. It should then return a 200 status and the word OK. If no such table exists, the worker should return a 404 status.
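Deleting a persistent table's directory means removing the row files before the directory itself; one way to do this with java.nio.file (the class name is ours) is a reverse-sorted walk:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

class TableDelete {
    // Walk the tree deepest-first so every file is deleted before
    // the directory that contains it.
    static void deleteTableDir(Path tableDir) throws IOException {
        try (Stream<Path> paths = Files.walk(tableDir)) {
            paths.sorted(Comparator.reverseOrder())
                 .forEach(p -> p.toFile().delete());
        }
    }
}
```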
List of tables: The worker should have a GET route for /tables. This should return the names of all the tables the worker knows about, including any persistent tables. There should be a LF character after each table name. The content type should be text/plain.
Row count: The worker should support a GET route for /count/XXX, where XXX is a table name. If a table with this name exists, the body of the response should contain the number of rows in that table (as an ASCII string); otherwise, you should send a 404 error code.
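For a persistent table, the row count is just the number of row files in the table's directory, since each row lives in its own file. A sketch (class and method names are ours):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

class RowCount {
    static long countRows(Path tableDir) throws IOException {
        try (Stream<Path> files = Files.list(tableDir)) {
            // Count only regular files, in case subdirectories appear later
            return files.filter(Files::isRegularFile).count();
        }
    }
}
```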
User interface: The worker should support GET routes for / and /view/XXX, where XXX is a table name. Both routes should return HTML pages (content type text/html). The first route should return an HTML table with a row for each data table on the worker; each row should contain a) the name of the table – say, XXX – with a hyperlink to /view/XXX, and b) the number of keys in the table. The second route should return an HTML page that includes the name of the table and 10 rows of data; it should have one HTML row for each row of data, one column for the row key, and one column for each column name that appears at least once in those 10 rows. The cells should contain the values in the relevant rows and columns, or be empty if the row does not contain a column with that name. The rows and columns should be sorted by the row and column key, respectively. If the data table contains more than 10 rows, the route should display the first 10, and there should be a “Next” link at the end of the table that displays another table with the next 10 rows. You may add query parameters to the route for this purpose.
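The pagination behind the “Next” link can be sketched as follows. The class, method names, and the fromRow parameter are our choices for illustration; the handout leaves the query parameter up to you.

```java
import java.util.ArrayList;
import java.util.List;

class Pager {
    static final int PAGE_SIZE = 10;

    // Pick the (up to) 10 sorted keys to display, starting at fromRow
    // when that parameter is present.
    static List<String> page(List<String> sortedKeys, String fromRow) {
        List<String> out = new ArrayList<>();
        for (String k : sortedKeys) {
            if (fromRow != null && k.compareTo(fromRow) < 0) continue;
            if (out.size() == PAGE_SIZE) break;
            out.add(k);
        }
        return out;
    }

    // Key to put into the "Next" link, or null if this is the last page.
    static String nextKey(List<String> sortedKeys, String fromRow) {
        int seen = 0;
        for (String k : sortedKeys) {
            if (fromRow != null && k.compareTo(fromRow) < 0) continue;
            seen++;
            if (seen == PAGE_SIZE + 1) return k;  // first key of the next page
        }
        return null;
    }
}
```

Note that the next page starts at the key itself (not after it), so no row is skipped between pages.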
Compatibility: Your solution must work with the unmodified reference implementation in lib/webserver.jar, which we will use for grading.
Packaging: Your solution should be in a directory called HW5, which should contain 1) the README file from the HW5 package, with all the fields filled in; 2) the file webserver.jar in a subdirectory called lib; and 3) a subdirectory called src with all of your source code, including the files we provided, and in the directory structure Java expects (with subdirectories for packages). Your solution must compile without errors if, from the HW5 folder, you run javac -cp lib/webserver.jar --source-path src src/cis5550/kvs/Coordinator.java and javac -cp lib/webserver.jar --source-path src src/cis5550/kvs/Worker.java. Please do try this before you submit! Submissions that fail this basic check will receive a zero.
Suggested approach
We suggest that you use the steps below to solve this assignment; however, feel free to use a different approach, or to ignore this section entirely. We will give credit for your solution, as long as it meets the requirements above. The HW5 package does not contain a test suite; instead, you should use the small KVS client (cis5550.kvs.KVSClient) from HW4. However, please keep in mind that all the features from Section 2 are required, not just the ones that can be invoked by the client!

Step #1: Copy over your HW4 code. You can simply cut and paste your code from HW4 into the HW5 project. Be sure to include the web server library (from the lib directory) as well!

Step #2: Add the list of tables. The idea of the user interface is to make it easier to see what is going on, both in HW5 and in future assignments that build on it. So this is a good place to start. Add the GET route for / first, and return a little HTML page with the required rows. Now the tablist test case should work.

Step #3: Add the table viewer. Next, add the /view/XXX route, initially without pagination. As in HW4, the table name (here XXX) should be a named parameter. Use KVSClient to insert a couple of rows into the test table, ideally with different columns. Then open http://localhost:8081/, click on the link for the test table (which should take you to http://localhost:8081/view/test), and see whether the rows and columns appear properly. Also, the tabview test case should now work.

Step #4: Add writing to disk. Now add a way to write the data in persistent tables to disk. Don’t worry about reading the data for now; just create a new subdirectory within the storage directory when a persistent table is first written to, and write or update a file there whenever something is added or changed. If you followed our recommendations on the HW4 handout, you should already have a putRow helper function; you can just add a write there if the table is persistent.
Notice that the Row object already contains a toByteArray() method for serialization. To test, use KVSClient’s put command to generate some writes to a table whose name starts with pt-; be sure to both insert new rows and to change existing ones. You can then inspect the files inside the storage directory and check that the data was written correctly. If all goes well, the wdisk test should now work.

Step #5: Add reading from disk. Next, change your getRow function so that, in the case of a persistent table, it reads the actual row from the disk. The rdisk and putget2 test cases should now work.

Step #6: Add whole-row and streaming read. Now, add the GET routes for /data/XXX/YYY and /data/XXX. The former should be fairly straightforward, and you can use the readrow test case to test it, or simply open the URL directly in your browser (say, http://localhost:8001/data/test/somerow); this should produce a readable representation of the data in that row. For the streaming read, use the write() method in the response object. Don’t try to materialize the entire response in memory and then send it as a whole – this would consume a lot of memory and could cause problems later, when you work with a large amount of data. Also, don’t forget to support the startRow and endRowExclusive query parameters. Again, test with the rstream test case and/or by opening the URL (say, http://localhost:8001/data/test) directly in your browser; you should see a response with one row of text for each row of data in the table. If you see only a single giant row, you probably forgot to insert the LFs between the lines, or you are using the HTML content type (try using text/plain).
Step #7: Add the row count. Next, add the GET route for /count/XXX. For the in-memory tables, this should be fairly straightforward; for the persistent tables, you will need to count the files in the corresponding directory. Now the count test case should work.

Step #8: Add renaming and deletion. Now add the PUT routes for /rename/XXX and /delete/XXX. The former shouldn’t be too difficult; for instance, you can use Files.move. For the latter, you’ll first need to delete all of the files within the directory. The rename and delete test cases should now work.

Step #9: Add pagination. Finally, make the following three changes to the table viewer. First, show the rows sorted by key. Second, add a query parameter (say, fromRow) that can be used to specify where the table should start; when that parameter is present, skip all rows whose keys are smaller than the specified value. And finally, once you’ve displayed ten rows of data, check whether there is another key after that; if so, add a “Next” link at the bottom of the page, with the next key as the value of the fromRow parameter. To test, you can use KVSClient to upload a reasonably large amount of data (at least 25-30 rows) and then verify that the viewer can display all of them properly. Make sure that the keys are sorted, and that the overall count is correct; it’s easy to accidentally skip a row between pages. The pages test case should now work as well.

Final step: Double-check the requirements! During initial development, it makes sense to focus on passing the tests; however, please keep in mind that the test cases are not meant to be exhaustive! Once you’ve finished the above steps, we strongly suggest that you read over the requirements from Section 2 again and make sure that you haven’t overlooked anything.
Extra credit
If you like, you can implement the following additional features for some extra credit. If you do, please indicate this in the README file you submit!
Subdirectories (+5 points): Some file systems do not have very good support for directories with a very large number of files – for instance, operations on such directories can become very slow. To support such file systems, change your implementation such that any file with a name of six characters or more (after encoding with KeyEncoder.encode) is stored in a subdirectory called __xx, where xx are the first two letters of the file name. (KeyEncoder.encode will not return consecutive underscores, so you do not have to worry about name clashes.) Thus, a row with key defghi in a table abc would be stored as abc/__de/defghi, but a row with key xyz would be stored as abc/xyz, as before. This will limit the size of each individual subdirectory.
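The path computation for this scheme is small; a sketch (class and method names are ours, and the input is assumed to already be the KeyEncoder-encoded file name):

```java
import java.nio.file.Path;

class SubdirLayout {
    // Names of six or more characters go into a __xx subdirectory
    // named after the first two characters of the encoded name.
    static Path rowPath(Path tableDir, String encodedName) {
        if (encodedName.length() >= 6)
            return tableDir.resolve("__" + encodedName.substring(0, 2))
                           .resolve(encodedName);
        return tableDir.resolve(encodedName);
    }
}
```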
Replication (+5 points): Have the workers download the current list of workers from the coordinator every five seconds. Whenever a worker that is currently responsible for a given key receives a PUT for that key, it should forward the PUT to the two workers with the next-lower IDs (wrapping around if necessary).
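Picking the two forwarding targets is an index computation over the sorted worker list; a sketch (class and method names are ours, and we assume the IDs are already sorted):

```java
import java.util.ArrayList;
import java.util.List;

class ReplicaTargets {
    // The (up to) two workers with the next-lower IDs, wrapping
    // around to the end of the list when we fall off the beginning.
    static List<String> nextLower(List<String> sortedIds, String self) {
        List<String> out = new ArrayList<>();
        int i = sortedIds.indexOf(self);
        int n = sortedIds.size();
        for (int step = 1; step <= 2 && step < n; step++)
            out.add(sortedIds.get(((i - step) % n + n) % n));
        return out;
    }
}
```

The `step < n` bound avoids forwarding to yourself when fewer than three workers are registered.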
Replica maintenance (+5 points): Add two new operations: one that returns the current list of tables, and another that is analogous to the streaming read but returns only the row keys and a hash of each row. Then, every 30 seconds, have each worker invoke the first operation on the two workers with the next-higher IDs (wrapping around if necessary), and, for each table that is returned, invoke the second operation on that table, using the key range for which the other worker would be responsible. Then, whenever the worker finds a row it does not have locally, or a row that has a different hash on the other worker than the corresponding local row, have the worker download that row and add it locally (or replace its local copy with it). That way, if a new worker joins, it will automatically acquire all the data it is supposed to replicate. This EC is only available if the “replication” EC is also implemented.
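One possible per-row hash for the maintenance protocol is a hex digest of the serialized row. The choice of SHA-256 is an assumption on our part; any stable digest of Row.toByteArray() would do, as long as both workers compute it the same way.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class RowHash {
    // Hex-encoded SHA-256 of the serialized row bytes.
    static String hash(byte[] serializedRow) throws NoSuchAlgorithmException {
        byte[] d = MessageDigest.getInstance("SHA-256").digest(serializedRow);
        StringBuilder sb = new StringBuilder();
        for (byte b : d) sb.append(String.format("%02x", b & 0xff));
        return sb.toString();
    }
}
```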
