{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false, "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/Users/tarabalakrishnan/.local/lib/python2.7/site-packages/IPython/config.py:13: ShimWarning: The `IPython.config` package has been deprecated. You should import from traitlets.config instead.\n", " \"You should import from traitlets.config instead.\", ShimWarning)\n", "/Users/tarabalakrishnan/.local/lib/python2.7/site-packages/IPython/utils/traitlets.py:5: UserWarning: IPython.utils.traitlets has moved to a top-level traitlets package.\n", " warn(\"IPython.utils.traitlets has moved to a top-level traitlets package.\")\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Done.\n", "Done.\n" ] }, { "data": { "text/plain": [ "[]" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%load_ext sql\n", "%sql sqlite:///complaint.db\n", "import time\n", "%sql drop index if exists helpful_index;\n", "%sql analyze" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Activity 12: Indexes\n", "===========\n", "\n", "Let's play with the [consumer complaint database](https://catalog.data.gov/dataset/consumer-complaint-database) from data.gov" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false, "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", "
count(*)
79468
" ], "text/plain": [ "[(79468,)]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%sql select count(*) from complaints;" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false, "slideshow": { "slide_type": "slide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Done.\n" ] }, { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Date_receivedProductSubproductIssueSubissueConsumer_narrativeCompany_public_responseCompanyStateZIP_codeSubmitted_viaDate_sent_to_companyCompany_responseTimely_responseConsumer_disputedComplaint_ID
12/30/2015MortgageOther mortgageLoan servicing, payments, escrow accountCompany chooses not to provide a public responseU.S. BancorpTN38138N/AReferral01/05/2016Closed with explanationYes
12/12/2015MortgageOther mortgageLoan modification,collection,foreclosureCompany chooses not to provide a public responseCitibankNY13021N/AReferral12/23/2015Closed with explanationYes
12/02/2015MortgageOther mortgageLoan modification,collection,foreclosureNationstar MortgageMI49102N/AReferral12/17/2015Closed with explanationYes
12/02/2015Bank account or serviceOther bank product/serviceAccount opening, closing, or managementCompany chooses not to provide a public responseWells Fargo & CompanyN/AReferral12/07/2015Closed with explanationYes
12/22/2015MortgageConventional fixed mortgageLoan servicing, payments, escrow accountNationstar MortgageFL33484Older AmericanConsent not providedWeb01/06/2016Closed with explanationYes
" ], "text/plain": [ "[(u'12/30/2015', u'Mortgage', u'Other mortgage', u'Loan servicing, payments, escrow account', u'', u'', u'Company chooses not to provide a public response', u'U.S. Bancorp', u'TN', u'38138', u'', u'N/A', u'Referral', u'01/05/2016', u'Closed with explanation', u'Yes'),\n", " (u'12/12/2015', u'Mortgage', u'Other mortgage', u'Loan modification,collection,foreclosure', u'', u'', u'Company chooses not to provide a public response', u'Citibank', u'NY', u'13021', u'', u'N/A', u'Referral', u'12/23/2015', u'Closed with explanation', u'Yes'),\n", " (u'12/02/2015', u'Mortgage', u'Other mortgage', u'Loan modification,collection,foreclosure', u'', u'', u'', u'Nationstar Mortgage', u'MI', u'49102', u'', u'N/A', u'Referral', u'12/17/2015', u'Closed with explanation', u'Yes'),\n", " (u'12/02/2015', u'Bank account or service', u'Other bank product/service', u'Account opening, closing, or management', u'', u'', u'Company chooses not to provide a public response', u'Wells Fargo & Company', u'', u'', u'', u'N/A', u'Referral', u'12/07/2015', u'Closed with explanation', u'Yes'),\n", " (u'12/22/2015', u'Mortgage', u'Conventional fixed mortgage', u'Loan servicing, payments, escrow account', u'', u'', u'', u'Nationstar Mortgage', u'FL', u'33484', u'Older American', u'Consent not provided', u'Web', u'01/06/2016', u'Closed with explanation', u'Yes')]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%sql select * from complaints limit 5;" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 1: Query without an index\n", "\n", "First, let's start off by writing a query to find the **counts of the top 5 Product, State pairs** in the complaints database (return the product and state as well as the count). Use the single-line syntax for simple timing so we can see how long the query takes:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%time %sql # YOUR QUERY HERE" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exerise 2: Single search key index\n", "\n", "Now create a _single-key_ index such that the above query is faster! The syntax to create an index in SQL is:\n", "> DROP INDEX IF EXISTS index_name;\n", "> CREATE INDEX index_name ON table(attributes);" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%%sql\n", "# CREATE YOUR INDEX HERE" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%time %sql # TIME YOUR SAME QUERY HERE" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 3\n", "\n", "Now, create a _covering_ index for the query and then see how long it takes to run!" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%%sql\n", "# CREATE YOUR INDEX HERE" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%time %sql # TIME YOUR SAME QUERY HERE" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exercise 3(b):\n", "\n", "Use EXPLAIN to see if sqlite used/recognized your covering index. EXPLAIN is an operator that tells SQL to explain its query plan... we'll look into this in more depth later. For now, the syntax is:\n", "> EXPLAIN QUERY PLAN your_query_here;" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%%sql\n", "# EXPLAIN YOUR QUERY HERE" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.10" } }, "nbformat": 4, "nbformat_minor": 0 }