{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/tarabalakrishnan/.local/lib/python2.7/site-packages/IPython/config.py:13: ShimWarning: The `IPython.config` package has been deprecated. You should import from traitlets.config instead.\n",
" \"You should import from traitlets.config instead.\", ShimWarning)\n",
"/Users/tarabalakrishnan/.local/lib/python2.7/site-packages/IPython/utils/traitlets.py:5: UserWarning: IPython.utils.traitlets has moved to a top-level traitlets package.\n",
" warn(\"IPython.utils.traitlets has moved to a top-level traitlets package.\")\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Done.\n",
"Done.\n"
]
},
{
"data": {
"text/plain": [
"[]"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%load_ext sql\n",
"%sql sqlite:///complaint.db\n",
"import time\n",
"%sql drop index if exists helpful_index;\n",
"%sql analyze"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"Activity 12: Indexes\n",
"===========\n",
"\n",
"Let's play with the [consumer complaint database](https://catalog.data.gov/dataset/consumer-complaint-database) from data.gov"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Done.\n"
]
},
{
"data": {
"text/html": [
"
\n",
" \n",
" count(*) | \n",
"
\n",
" \n",
" 79468 | \n",
"
\n",
"
"
],
"text/plain": [
"[(79468,)]"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%sql select count(*) from complaints;"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Done.\n"
]
},
{
"data": {
"text/html": [
"\n",
" \n",
" Date_received | \n",
" Product | \n",
" Subproduct | \n",
" Issue | \n",
" Subissue | \n",
" Consumer_narrative | \n",
" Company_public_response | \n",
" Company | \n",
" State | \n",
" ZIP_code | \n",
" Submitted_via | \n",
" Date_sent_to_company | \n",
" Company_response | \n",
" Timely_response | \n",
" Consumer_disputed | \n",
" Complaint_ID | \n",
"
\n",
" \n",
" 12/30/2015 | \n",
" Mortgage | \n",
" Other mortgage | \n",
" Loan servicing, payments, escrow account | \n",
" | \n",
" | \n",
" Company chooses not to provide a public response | \n",
" U.S. Bancorp | \n",
" TN | \n",
" 38138 | \n",
" | \n",
" N/A | \n",
" Referral | \n",
" 01/05/2016 | \n",
" Closed with explanation | \n",
" Yes | \n",
"
\n",
" \n",
" 12/12/2015 | \n",
" Mortgage | \n",
" Other mortgage | \n",
" Loan modification,collection,foreclosure | \n",
" | \n",
" | \n",
" Company chooses not to provide a public response | \n",
" Citibank | \n",
" NY | \n",
" 13021 | \n",
" | \n",
" N/A | \n",
" Referral | \n",
" 12/23/2015 | \n",
" Closed with explanation | \n",
" Yes | \n",
"
\n",
" \n",
" 12/02/2015 | \n",
" Mortgage | \n",
" Other mortgage | \n",
" Loan modification,collection,foreclosure | \n",
" | \n",
" | \n",
" | \n",
" Nationstar Mortgage | \n",
" MI | \n",
" 49102 | \n",
" | \n",
" N/A | \n",
" Referral | \n",
" 12/17/2015 | \n",
" Closed with explanation | \n",
" Yes | \n",
"
\n",
" \n",
" 12/02/2015 | \n",
" Bank account or service | \n",
" Other bank product/service | \n",
" Account opening, closing, or management | \n",
" | \n",
" | \n",
" Company chooses not to provide a public response | \n",
" Wells Fargo & Company | \n",
" | \n",
" | \n",
" | \n",
" N/A | \n",
" Referral | \n",
" 12/07/2015 | \n",
" Closed with explanation | \n",
" Yes | \n",
"
\n",
" \n",
" 12/22/2015 | \n",
" Mortgage | \n",
" Conventional fixed mortgage | \n",
" Loan servicing, payments, escrow account | \n",
" | \n",
" | \n",
" | \n",
" Nationstar Mortgage | \n",
" FL | \n",
" 33484 | \n",
" Older American | \n",
" Consent not provided | \n",
" Web | \n",
" 01/06/2016 | \n",
" Closed with explanation | \n",
" Yes | \n",
"
\n",
"
"
],
"text/plain": [
"[(u'12/30/2015', u'Mortgage', u'Other mortgage', u'Loan servicing, payments, escrow account', u'', u'', u'Company chooses not to provide a public response', u'U.S. Bancorp', u'TN', u'38138', u'', u'N/A', u'Referral', u'01/05/2016', u'Closed with explanation', u'Yes'),\n",
" (u'12/12/2015', u'Mortgage', u'Other mortgage', u'Loan modification,collection,foreclosure', u'', u'', u'Company chooses not to provide a public response', u'Citibank', u'NY', u'13021', u'', u'N/A', u'Referral', u'12/23/2015', u'Closed with explanation', u'Yes'),\n",
" (u'12/02/2015', u'Mortgage', u'Other mortgage', u'Loan modification,collection,foreclosure', u'', u'', u'', u'Nationstar Mortgage', u'MI', u'49102', u'', u'N/A', u'Referral', u'12/17/2015', u'Closed with explanation', u'Yes'),\n",
" (u'12/02/2015', u'Bank account or service', u'Other bank product/service', u'Account opening, closing, or management', u'', u'', u'Company chooses not to provide a public response', u'Wells Fargo & Company', u'', u'', u'', u'N/A', u'Referral', u'12/07/2015', u'Closed with explanation', u'Yes'),\n",
" (u'12/22/2015', u'Mortgage', u'Conventional fixed mortgage', u'Loan servicing, payments, escrow account', u'', u'', u'', u'Nationstar Mortgage', u'FL', u'33484', u'Older American', u'Consent not provided', u'Web', u'01/06/2016', u'Closed with explanation', u'Yes')]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%sql select * from complaints limit 5;"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Exercise 1: Query without an index\n",
"\n",
"First, let's start off by writing a query to find the **counts of the top 5 Product, State pairs** in the complaints database (return the product and state as well as the count). Use the single-line syntax for simple timing so we can see how long the query takes:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%time %sql # YOUR QUERY HERE"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Exerise 2: Single search key index\n",
"\n",
"Now create a _single-key_ index such that the above query is faster! The syntax to create an index in SQL is:\n",
"> DROP INDEX IF EXISTS index_name;\n",
"> CREATE INDEX index_name ON table(attributes);"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%sql\n",
"# CREATE YOUR INDEX HERE"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%time %sql # TIME YOUR SAME QUERY HERE"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Exercise 3\n",
"\n",
"Now, create a _covering_ index for the query and then see how long it takes to run!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%sql\n",
"# CREATE YOUR INDEX HERE"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%time %sql # TIME YOUR SAME QUERY HERE"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Exercise 3(b):\n",
"\n",
"Use EXPLAIN to see if sqlite used/recognized your covering index. EXPLAIN is an operator that tells SQL to explain its query plan... we'll look into this in more depth later. For now, the syntax is:\n",
"> EXPLAIN QUERY PLAN your_query_here;"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%sql\n",
"# EXPLAIN YOUR QUERY HERE"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.10"
}
},
"nbformat": 4,
"nbformat_minor": 0
}