{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "source": [
    "NOTE:\n",
    "-----\n",
    "\n",
    "진행하기 전에 다음의 셀을 실행시키고 가자.\n",
    "\n",
    "Please run the below cells first before proceeding- you'll need them soon!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false,
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Connected: @dataset_1.db'"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%load_ext sql\n",
    "%sql sqlite:///dataset_1.db"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "활동 3b\n",
    "------------\n",
    "집계 연산자, GROUP BY\n",
    "\n",
    "Aggregation operators, GROUP BY"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "문제 #1\n",
    "-----------\n",
    "\n",
    "베이글 스타트업 산업에 사용된 다음의 테이블들을 살펴보자. 먼저 두 개의 테이블을 먼저 살펴 볼 텐데, `bagel`은 베이글 회사가 생산하는 베이글의 종류를 나타낸다:\n",
    "\n",
    "Consider a set of tables that describe the up-and-coming bagel startup industry; for now let's just look at two tables here, `bagel`, which describes types of bagels made by the different bagel companies:\n",
    "\n",
    "\n",
    "> * name STRING\n",
    "> * price FLOAT\n",
    "> * made_by STRING\n",
    "\n",
    "그리고 `purchase`:\n",
    "\n",
    "And `purchase`:\n",
    "\n",
    "> * bagel_name STRING\n",
    "> * franchise STRING\n",
    "> * date INT\n",
    "> * quantity INT\n",
    "> * purchaser_age INT\n",
    "\n",
    "`purchase.bagel_name` 는  `bagel.name`을 참조하고 `purchase.franchise` 는 `bagel.made_by` 을 참조한다:\n",
    "\n",
    "Where `purchase.bagel_name` references `bagel.name` and `purchase.franchise` references `bagel.made_by`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " * sqlite:///dataset_1.db\n",
      "Done.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table>\n",
       "    <tr>\n",
       "        <th>name</th>\n",
       "        <th>price</th>\n",
       "        <th>made_by</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>Plain with shmear</td>\n",
       "        <td>1.99</td>\n",
       "        <td>Bobs Bagels</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>Egg with shmear</td>\n",
       "        <td>2.39</td>\n",
       "        <td>Bobs Bagels</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>eBagel Drinkable Bagel</td>\n",
       "        <td>27.99</td>\n",
       "        <td>eBagel</td>\n",
       "    </tr>\n",
       "</table>"
      ],
      "text/plain": [
       "[('Plain with shmear', 1.99, 'Bobs Bagels'),\n",
       " ('Egg with shmear', 2.39, 'Bobs Bagels'),\n",
       " ('eBagel Drinkable Bagel', 27.99, 'eBagel')]"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%sql SELECT * FROM bagel LIMIT 3;"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " * sqlite:///dataset_1.db\n",
      "Done.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table>\n",
       "    <tr>\n",
       "        <th>bagel_name</th>\n",
       "        <th>franchise</th>\n",
       "        <th>date</th>\n",
       "        <th>quantity</th>\n",
       "        <th>purchaser_age</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>Plain with shmear</td>\n",
       "        <td>Bobs Bagels</td>\n",
       "        <td>1</td>\n",
       "        <td>12</td>\n",
       "        <td>28</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>Egg with shmear</td>\n",
       "        <td>Bobs Bagels</td>\n",
       "        <td>2</td>\n",
       "        <td>6</td>\n",
       "        <td>47</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>Plain with shmear</td>\n",
       "        <td>BAGEL CORP</td>\n",
       "        <td>2</td>\n",
       "        <td>12</td>\n",
       "        <td>24</td>\n",
       "    </tr>\n",
       "</table>"
      ],
      "text/plain": [
       "[('Plain with shmear', 'Bobs Bagels', 1, 12, 28),\n",
       " ('Egg with shmear', 'Bobs Bagels', 2, 6, 47),\n",
       " ('Plain with shmear', 'BAGEL CORP', 2, 12, 24)]"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%sql SELECT * FROM purchase LIMIT 3;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**구매자의 평균 나이가 18세 이상인** 베이글의 종류에 대한 _총수입_을 계산하는 질의문을 작성해보자. 질의문을 아래에 작성해보자:\n",
    "\n",
    "Can you write a query to get the _total revenue_ for each bagel type **which had an average purchaser age over 18**?  Type your query below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " * sqlite:///dataset_1.db\n",
      "Done.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table>\n",
       "    <tr>\n",
       "        <th>name</th>\n",
       "        <th>Revenue</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>Egg with shmear</td>\n",
       "        <td>14.34</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>Plain with shmear</td>\n",
       "        <td>84.50999999999999</td>\n",
       "    </tr>\n",
       "</table>"
      ],
      "text/plain": [
       "[('Egg with shmear', 14.34), ('Plain with shmear', 84.50999999999999)]"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\"\"\"\n",
    "예상 되는 결과는 다음과 같다.\n",
    "\n",
    "이 셀을 다시 실행하지 말자. 다시 실행하면 결과는 사라진다.\n",
    "\"\"\"\n",
    "\n",
    "\"\"\"\n",
    "Expected output below\n",
    "\n",
    "Don't re-execute this cell!\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "문제 #2\n",
    "-----------\n",
    "\n",
    "이번에는 간소화된 버전의 `precipitation_full` 테이블을 사용할 것이다. 스키마는 다음과 같고, _CA_ 지역의 하루 강우량만 기록하고 있다:\n",
    "\n",
    "Here we'll use a simplified version of the `precipitation_full` table, which just has _daily_ rainfall _in CA only_, and has the following schema:\n",
    "\n",
    "\n",
    "> * station_id\n",
    "> * day\n",
    "> * precipitation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      " * sqlite:///dataset_1.db\n",
      "Done.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table>\n",
       "    <tr>\n",
       "        <th>station_id</th>\n",
       "        <th>day</th>\n",
       "        <th>precipitation</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>16102</td>\n",
       "        <td>1</td>\n",
       "        <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>16102</td>\n",
       "        <td>4</td>\n",
       "        <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>16102</td>\n",
       "        <td>24</td>\n",
       "        <td>30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>21201</td>\n",
       "        <td>1</td>\n",
       "        <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>21201</td>\n",
       "        <td>20</td>\n",
       "        <td>10</td>\n",
       "    </tr>\n",
       "</table>"
      ],
      "text/plain": [
       "[(16102, 1, 10),\n",
       " (16102, 4, 10),\n",
       " (16102, 24, 30),\n",
       " (21201, 1, 0),\n",
       " (21201, 20, 10)]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%sql SELECT * FROM precipitation LIMIT 5;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "평균 침전물이 75 이상인 station_ids 를 구하려고 한다. 먼저, 중첩된 질의문으로 작성해보자:\n",
    "\n",
    "We want to get station_ids which have average precipitations > 75.  Try doing this first as a nested query:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%sql\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Done.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table>\n",
       "    <tr>\n",
       "        <th>station_id</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>88302</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>250002</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>335701</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>357302</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>488301</td>\n",
       "    </tr>\n",
       "</table>"
      ],
      "text/plain": [
       "[(88302,), (250002,), (335701,), (357302,), (488301,)]"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\"\"\"\n",
    "예상 되는 결과는 다음과 같다.\n",
    "\n",
    "이 셀을 다시 실행하지 말자. 다시 실행하면 결과는 사라진다.\n",
    "\"\"\"\n",
    "\n",
    "\"\"\"\n",
    "Expected output below\n",
    "\n",
    "Don't re-execute this cell!\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "이 번에는 GROUP BY를 사용하여 작성해보자:\n",
    "\n",
    "Now, try re-writing as a GROUP BY:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "%%sql\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Done.\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table>\n",
       "    <tr>\n",
       "        <th>station_id</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>88302</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>250002</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>335701</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>357302</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>488301</td>\n",
       "    </tr>\n",
       "</table>"
      ],
      "text/plain": [
       "[(88302,), (250002,), (335701,), (357302,), (488301,)]"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\"\"\"\n",
    "예상 되는 결과는 다음과 같다.\n",
    "\n",
    "이 셀을 다시 실행하지 말자. 다시 실행하면 결과는 사라진다.\n",
    "\"\"\"\n",
    "\n",
    "\"\"\"\n",
    "Expected output below\n",
    "\n",
    "Don't re-execute this cell!\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "실행 결과 계산 시간을 `%time`으로 측정해보자. `%time` 명령 다음에 질의문을 한 줄로 표현하여 실행하여 보자. 갑갑해 보이지만, 그래도 동작한다. 그리고 실행 결과들을 비교해보자!\n",
    "\n",
    "Now time it by using `%time` followed by single-line versions of your queries above (clunky, but will work) to see how they compare!\n",
    "\n",
    "\n",
    "**Note:**   아래는 임의의 질의문의 측정 결과이다. 앞에서 작성을 아래 문장과 치환하여 성능을 비교해보자.\n",
    "\n",
    "**Note:** Yes, currently the answers are filled in below for convenience... but you should still try getting them on your own above!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Done.\n",
      "CPU times: user 41.9 ms, sys: 1.39 ms, total: 43.3 ms\n",
      "Wall time: 42.5 ms\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table>\n",
       "    <tr>\n",
       "        <th>station_id</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>88302</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>250002</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>335701</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>357302</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>488301</td>\n",
       "    </tr>\n",
       "</table>"
      ],
      "text/plain": [
       "[(88302,), (250002,), (335701,), (357302,), (488301,)]"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "-- 중첩 질의문의 사용\n",
    "%time %sql SELECT DISTINCT p.station_id FROM precipitation p ;"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Done.\n",
      "CPU times: user 3.06 ms, sys: 1.43 ms, total: 4.49 ms\n",
      "Wall time: 3.11 ms\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<table>\n",
       "    <tr>\n",
       "        <th>station_id</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>88302</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>250002</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>335701</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>357302</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "        <td>488301</td>\n",
       "    </tr>\n",
       "</table>"
      ],
      "text/plain": [
       "[(88302,), (250002,), (335701,), (357302,), (488301,)]"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "-- 집계 함수와 조건절 사용\n",
    "%time %sql SELECT p.station_id FROM precipitation p ;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**몇 배 정도의 시간 차이가 있나?**\n",
    "\n",
    "**An ~ 10-20x difference in execution time!!**"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}