{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2\n", "import io_backend_v3 as io" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note: this notebook requires `io_backend.py` and `compModel.js` to be in the same directory**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sequential Flooding: An example of better buffer management for joins\n", "------------------------------------------------------------------\n", "\n", "Recall: A _buffer manager_ is a system to handle operations supporting the buffer\n", "* For example, a buffer manager should handle **eviction** of pages from the buffer when space needs to be freed up for reading in new pages.\n", "\n", "In general the OS has a built-in buffer manager; however, most DBMSs implement their own. *Why?*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Key concept:** Often, a DBMS can potentially implement *more efficient* buffer management since it knows what operations it is carrying out.\n", "\n", "## For Example: When Performing a Join!\n", "\n", "We'll observe a phenomenon known as **sequential flooding** that occurs when a buffer manager uses a naive default eviction policy while looping over data, e.g. in conditions that would occur when a DBMS was performing a join!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 1: NLJ in Python\n", "\n", "Just to get warmed up, write a nested loop join in python, that takes as inputs **two lists of dictionaries**, and joins them on the attribute `A`, returning tuples which have attribute `A` and the union of the other attributes:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# SAMPLE INPUT\n", "R = [{'A':0,'B':1},{'A':1,'B':1},{'A':2,'B':1}]\n", "S = [{'A':0,'C':3},{'A':3,'C':0},{'A':0,'C':5}]\n", "\n", "# Your join algorithm here\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 2: LRU eviction & sequential flooding\n", "\n", "In general, most OS buffer managers will have a **least recently used (LRU)** eviction policy, which means that the least recently used page\\* will be evicted if more space is needed in the buffer.\n", "\n", "\\* *[Note: here we define \"used\" to mean any sort of action involving the page, for simplicty!]*\n", "\n", "Let's see what happens under this policy when we have a buffer of size $B$ and want to loop over $B+1$ pages $M$ times:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# Create pages & flush to disk\n", "# NOTE: set display mark so that we don't bother displaying this\n", "def init_pages(b, n_pages):\n", " file_id = b.new_file()\n", " for i in range(n_pages):\n", " page = b.new_page(file_id)\n", " b.flush(page)\n", " b.display_set_mark()\n", " return file_id\n", "\n", "# Loop through one iteration of the file, highlighting the LRU/MRU page\n", "def get_next_page(b, fid, pid, eviction_method='LRU'):\n", " try:\n", " page = b.read(fid, pid)\n", " except io.BufferMemoryException:\n", " old_page, old_buffer_idx = b.get_buffer_page(eviction_method)\n", " b.release(old_page)\n", " page = b.read(fid, pid)\n", " return page" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will do the following $M$ times:\n", "* For $i$ in range $B+1$:\n", " * If page $i$ is already in buffer, read from buffer (*fast!*)\n", " * Else:\n", " * If buffer **is full**:\n", " * **Evict**- i.e. flush to disk- the LRU page (*slow!*)\n", " * Read page $i$ from disk -> buffer (*slow!*), then read from buffer" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "EVICTION_METHOD = 'LRU'\n", "BUFFER_SIZE = 3\n", "N_PAGES = BUFFER_SIZE + 1\n", "M = 3\n", "\n", "# Initialize buffer\n", "b = io.Buffer(buffer_size=BUFFER_SIZE, page_size=1, buffer_queue_indicator=EVICTION_METHOD)\n", "file_id = init_pages(b, N_PAGES)\n", "\n", "# Do M ordered passes over the N pages\n", "for i in range(M):\n", " for pid in range(N_PAGES):\n", " page = get_next_page(b, file_id, pid, eviction_method=EVICTION_METHOD)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Can you see what happens?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Visualize what we just did\n", "b.display(speed=1000)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For general $M$ and $N>B$, what is the IO cost here? And why is this the case?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 3: A better replacement policy?\n", "\n", "Can you think of a better replacement policy? It should be comparabley simple, but lead to better IO cost for our scenario..." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 4: A better IO Cost...\n", "\n", "With this new replacement policy, what is the IO cost for $M$ and $N>B$ now?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "EVICTION_METHOD = # YOUR BETTER POLICY NAME HERE?\n", "BUFFER_SIZE = 3\n", "N_PAGES = BUFFER_SIZE + 1\n", "M = 3\n", "\n", "# Initialize buffer\n", "b = io.Buffer(buffer_size=BUFFER_SIZE, page_size=1, buffer_queue_indicator=EVICTION_METHOD)\n", "file_id = init_pages(b, N_PAGES)\n", "\n", "# Do M ordered passes over the N pages\n", "for i in range(M):\n", " for pid in range(N_PAGES):\n", " page = get_next_page(b, file_id, pid, eviction_method=EVICTION_METHOD)\n", " \n", "# Visualize what we just did- note that a different buffer_num must be specified so it displays in own pane!\n", "b.display(speed=1000, buffer_num=1)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 1 }