Tutorials


Basics of Python Programming

a. Lists, Tuples, Dictionaries, Conditionals, Loops, etc…

https://lnkd.in/gWRbc3J

b. Data Structures & Algorithms

https://lnkd.in/gYKnJWN

d. NumPy Arrays: https://lnkd.in/geeFePh

c. Regex: https://lnkd.in/gzUahNV

Practice Coding Challenges

a. Hacker Rank: https://lnkd.in/gEufBUu

b. Codeacademy: https://lnkd.in/gGQ7cuv

c. LeetCode: https://leetcode.com/

Data Manipulation 

a. Pandas: https://lnkd.in/gxSgfuQ

b. Pandas Cheatsheet: https://lnkd.in/gfAdcpw

c. SQLAlchemy: https://lnkd.in/gjvbm7h

Data Visualization 

a. Matplotlib: https://lnkd.in/g_3fx_6

b. Seaborn: https://lnkd.in/gih7hqz

c. Plotly: https://lnkd.in/gBYBMXc

d. Python Graph Gallery: https://lnkd.in/gdGe-ef

Machine Learning / Deep Learning

a. Skcikit-Learn Tutorial: https://lnkd.in/gT5nNwS

b. Deep Learning Tutorial: https://lnkd.in/gHKWM5m

c. Kaggle Kernels: https://lnkd.in/e_VcNpk

d. Kaggle Competitions: https://lnkd.in/epb9c8N

Top Categories

 

 

 

SQL Interview Questions

Our schema

We’ll be going through six questions covering topics like query performance, joins, and SQL injection. They’ll refer to the same database for cakes, customers, and orders at a bakery. Here’s the schema:

CREATE TABLE cakes (
    cake_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    flavor VARCHAR(100) NOT NULL
);

CREATE TABLE customers (
    customer_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    first_name VARCHAR(100) NOT NULL,
    last_name VARCHAR(100) NOT NULL,
    phone VARCHAR(15),
    street_address VARCHAR(255),
    city VARCHAR(255),
    zip_code VARCHAR(5),
    referrer_id INT,
    FOREIGN KEY (referrer_id) REFERENCES customers (customer_id)
);

CREATE TABLE orders (
    order_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    cake_id INT NOT NULL,
    customer_id INT,
    pickup_date DATE NOT NULL,
    FOREIGN KEY (cake_id) REFERENCES cakes (cake_id),
    FOREIGN KEY (customer_id) REFERENCES customers (customer_id)
);

SQL

We’ll be using MySQL for consistency.

Want to get set up with sample data? Here’s how to get four cakes, a million customers, and a million orders:

In your terminal, download our script and start up MySQL:

$ curl -O https://static.interviewcake.com/bakery_schema_and_data.sql && mysql.server start && mysql -u root

Then run our script to set up the BAKERY database and insert data:

> source bakery_schema_and_data.sql;

If you want to come back to the data again and you already downloaded and ran the script:

  1. In terminal, start up MySQL:  mysql.server start && mysql -u root
  2. In the MySQL shell, use the bakery database:  USE BAKERY;

Alright, let’s get started.

How can we make this query faster?

We want the order ID of every order in January and February, 2017. This is the query we’ve got so far:

SELECT order_id FROM orders WHERE DATEDIFF(orders.pickup_date, '2017-03-01') < 0;
-- 161314 rows in set (0.25 sec)

SQL

How can we get the nth highest number of orders a single customer has?

We made a view  of the number of orders each customer has:

CREATE VIEW customer_order_counts AS
SELECT customers.customer_id, first_name, count(orders.customer_id) AS order_count
FROM customers LEFT OUTER JOIN orders
ON (customers.customer_id = orders.customer_id)
GROUP BY customers.customer_id;

SQL

So for example, Nancy has 3 orders:

SELECT * FROM customer_order_counts ORDER BY RAND() LIMIT 1;

/*
    +-------------+------------+-------------+
    | customer_id | first_name | order_count |
    +-------------+------------+-------------+
    |        9118 | Nancy      |           3 |
    +-------------+------------+-------------+
*/

SQL

What ways can we use wildcard characters in LIKEclauses?

Now how can we make this query faster?

We’re mailing a promotion to all our customers named Sam who live in Dover. Since some customers go by names that aren’t exactly Sam, like Samuel or Sammy, we use names like Sam. Here’s how we find them:

SELECT first_name, last_name, street_address, city, zip_code FROM customers
WHERE first_name LIKE '%sam%' AND city = 'Dover';
-- 1072 rows in set (0.42 sec)

SQL

That’s pretty slow. How can we speed it up?

Answer

First, do we need to get the city and zip code for every customer? The search is constructed using the city Dover, so we know the city. And we know that the zip code for Dover is 33220. If we can complete the addresses efficiently somewhere else in our code, there’s no reason to get that information from the database for every result.

SELECT first_name, last_name, street_address FROM customers
WHERE first_name LIKE '%sam%' AND city = 'Dover';
-- 1072 rows in set (0.40 sec)

SQL

A little better, but only a little.

Let’s look at that wildcard % before “sam.” Wildcards at the beginning of comparisons can slow down performance because instead of just looking at names that start with “sam” the query has to look at every character in every first name.

Do we really need the wildcard % before sam? Should our customers with “sam” in their name but not at the start of their name, like Rosamond, be included in a Sam Promotion?

Probably not. Let’s just try removing the % at the beginning and adding an index on first_name:

ALTER TABLE customers ADD INDEX (first_name);

SELECT first_name, last_name, street_address FROM customers
WHERE first_name LIKE 'sam%' AND city = 'Dover';
-- 1065 rows in set (0.02 sec)

SQL

0.42 seconds down to 0.02 seconds!

This is a huge improvement. But—these changes are a big deal because we’re changingfunctionality. This isn’t just faster, it’s different. Some customers won’t be getting a promotion in the mail now. The decision of who’s included in the promotion would probably be made independent of database performance. But it’s always a good idea to look out for wildcard characters at the beginning of a pattern.

What are all the SQL joins?

Answer

First, let’s talk about the keywords “inner” and “outer.” They’re optional—INNER JOIN is the same as JOIN, and LEFT OUTER JOIN is the same as LEFT JOIN. These keywords are added for clarity because they make the joins easier to understand conceptually. Some developers leave them out, arguing there’s no reason to have extra nonfunctional words in a database query. The most important thing is to be consistent. We’ll use them.

Inner joins give only the rows where all the joined tables have related data. If we inner join our customers and orders, we’ll get all the related customers and orders. We won’t get any customers without orders or any orders without customers.

SELECT first_name, phone, orders.cake_id, pickup_date
FROM customers INNER JOIN orders
ON customers.customer_id = orders.customer_id;

/*
    +------------+------------+---------+-------------+
    | first_name | phone      | cake_id | pickup_date |
    +------------+------------+---------+-------------+
    | Linda      | 8095550114 |       4 | 2017-10-12  |
    | May        | 8015550130 |       4 | 2017-02-03  |
    | Frances    | 8345550120 |       1 | 2017-09-16  |
    | Matthew    | 8095550122 |       3 | 2017-07-20  |
    | Barbara    | 8015550157 |       2 | 2017-07-07  |
    ...
*/

SQL

If we wanted the cake flavor, not just the cake ID, we could also join the cake table:

SELECT first_name, phone, cakes.flavor, pickup_date
FROM customers
INNER JOIN orders ON customers.customer_id = orders.customer_id
INNER JOIN cakes ON orders.cake_id = cakes.cake_id;

/*
    +------------+------------+-----------+-------------+
    | first_name | phone      | flavor    | pickup_date |
    +------------+------------+-----------+-------------+
    | Frances    | 8345550120 | Chocolate | 2017-09-16  |
    | Theodore   | 8015550175 | Chocolate | 2017-08-13  |
    | James      | 8015550165 | Chocolate | 2017-10-12  |
    | Kathleen   | 8095550157 | Chocolate | 2017-09-24  |
    | Jennifer   | 8015550153 | Chocolate | 2017-06-22  |
    ...
*/

SQL

Left outer joins give all the rows from the first table, but only related rows in the next table. So if we run a left outer join on customers and orders, we’ll get all the customers, and their orders ifthey have any.

SELECT cake_id, pickup_date, customers.customer_id, first_name
FROM orders LEFT OUTER JOIN customers
ON orders.customer_id = customers.customer_id
ORDER BY pickup_date;

/*
    +---------+-------------+-------------+------------+
    | cake_id | pickup_date | customer_id | first_name |
    +---------+-------------+-------------+------------+
    |       2 | 2017-01-01  |        NULL | NULL       |
    |       3 | 2017-01-01  |      108548 | Eve        |
    |       1 | 2017-01-01  |      857831 | Neil       |
    |       4 | 2017-01-01  |        NULL | NULL       |
    |       3 | 2017-01-01  |      168516 | Maria      |
    ...
*/

SQL

Right outer joins include any related rows in the first table, and all the rows in the next table. Right outer joining our customers and orders would give the customer if there is one, and then every order.

In our schema, customer_id isn’t NOT NULL on orders. This may be seem unintuitive, but maybe we don’t require customers to register with us to place an order, or orders can be associated with other models like restaurant or vendor. In any case, with our schema, we can have orders without customers.

SELECT customers.customer_id, first_name, pickup_date
FROM customers RIGHT OUTER JOIN orders
ON customers.customer_id = orders.customer_id
ORDER BY pickup_date;

/*
    +-------------+------------+-------------+
    | customer_id | first_name | pickup_date |
    +-------------+------------+-------------+
    |        NULL | NULL       | 2017-01-01  |
    |      108548 | Eve        | 2017-01-01  |
    |      857831 | Neil       | 2017-01-01  |
    |        NULL | NULL       | 2017-01-01  |
    |        NULL | NULL       | 2017-01-01  |
    ...
*/

SQL

Right outer joins give the same result as left outer joins with the order of the tables switched:

SELECT customers.customer_id, first_name, pickup_date
FROM orders LEFT OUTER JOIN customers
ON customers.customer_id = orders.customer_id
ORDER BY pickup_date;

/*
    same results as right outer join we just did!

    +-------------+------------+-------------+
    | customer_id | first_name | pickup_date |
    +-------------+------------+-------------+
    |        NULL | NULL       | 2017-01-01  |
    |      108548 | Eve        | 2017-01-01  |
    |      857831 | Neil       | 2017-01-01  |
    |        NULL | NULL       | 2017-01-01  |
    |        NULL | NULL       | 2017-01-01  |
    ...
*/

SQL

Full outer joins take all the records from every table. Related data are combined like the other joins, but no rows from any table are left out. For customers and orders, we’ll get all the related customers and orders, and all the customers without orders, and all the orders without customers.

The standard SQL syntax is:

SELECT order_id, pickup_date, customers.customer_id, first_name
FROM orders FULL OUTER JOIN customers
ON orders.customer_id = customers.customer_id

SQL

But MySQL doesn’t support full outer joins! No problem, we can get the same result with a UNIONof left and right outer joins:

SELECT order_id, pickup_date, customers.customer_id, first_name
FROM orders LEFT OUTER JOIN customers
ON orders.customer_id = customers.customer_id

UNION

SELECT order_id, pickup_date, customers.customer_id, first_name
FROM orders RIGHT OUTER JOIN customers
ON orders.customer_id = customers.customer_id;


/*
    +----------+-------------+-------------+------------+
    | order_id | pickup_date | customer_id | first_name |
    +----------+-------------+-------------+------------+
    |   900075 | 2017-05-17  |        NULL | NULL       |
    |   900079 | 2017-12-26  |      487996 | Frances    |
    |   900057 | 2017-10-25  |      498546 | Loretta    |
    |     NULL | NULL        |      640804 | Whitney    |
    |     NULL | NULL        |       58405 | Zoe        |
    ...
*/

SQL

Using UNION or UNION ALL with this strategy generally emulates a full outer join. But things get complicated for some schemas, like if a column in the ON clause isn’t NOT NULL.

Cross joins give every row from the first table paired with every row in the next table, ignoring any relationship. With customers and orders, we’d get every customer paired with every order. Cross joins are sometimes called Cartesian joins because they return the cartesian product of data sets—every combination of elements in every set.

This isn’t used often because the results aren’t usually useful. But sometimes you might actually need every combination of the rows in your tables, or you might need a large table for performance testing. If you cross join 2 tables with 10,000 rows each, you get a table with 100,000,000 rows!

Self joins refer to any join that joins data in the same table. For example, some of our customers were referred to our bakery by other customers. We could do a left outer join to get every customer and their referrer if they have one:

SELECT customer.first_name, referrer.first_name
FROM customers AS customer LEFT OUTER JOIN customers AS referrer
ON customer.referrer_id = referrer.customer_id;

/*
    +------------+------------+
    | first_name | first_name |
    +------------+------------+
    | Tim        | NULL       |
    | Mattie     | Wendy      |
    | Kurtis     | NULL       |
    | Jared      | NULL       |
    | Lucille    | Tim        |
    ...
*/

SQL

What’s an example of SQL injection and how can we prevent it?

Answer

SQL injection is when a hacker gains access to our database because we used their malicious user input to build a dynamic SQL query.

Let’s say we have an input field that takes a phone number, and we use it to build this SQL query:

sql_text = "SELECT * FROM customers WHERE phone = '%s'" % phone_input
C
C#
C++
Java
JavaScript
Objective-C
PHP
Python
Ruby
Swift

We’re expecting something like “8015550198” which would neatly build:

SELECT * FROM customers WHERE phone = '8015550198';

SQL

But what if a user enters “1′ OR 1=1;–“?

Then we’d have:

SELECT * FROM customers WHERE phone = '1' OR 1=1;--';

SQL

Which will return the data for every customer because the WHERE clause will always evaluate to true! (1 always equals 1, and the “–” comments out the last single quotation mark.)

With the right input and queries, SQL injection can let hackers create, read, update and destroy data.

So to prevent SQL injection, we’ll need to look at how we build and run our SQL queries. And we can think about some smart technical design in our application.

Here are five ways to protect ourselves:

1. Use stored procedures or prepared SQL statements. So do not build dynamic SQL. This is the most effective way to prevent SQL injection.

For example, we could build a prepared statement:

from mysql import connector

# Connect to the database and instantiate a cursor
cnx = connector.connect(database='bakery')
cursor = cnx.cursor(prepared=True)

statement = "SELECT * FROM customers WHERE phone = ?"
cursor.execute(statement, (phone_input,))
C
C#
C++
Java
JavaScript
Objective-C
PHP
Python
Ruby
Swift

Or we could build a stored procedure get_customer_from_phone() with a string parameter input_phone:

DELIMITER //
CREATE PROCEDURE get_customer_from_phone
(IN input_phone VARCHAR(15))
BEGIN
    SELECT * FROM customers
    WHERE phone = input_phone;
END //
DELIMITER ;

SQL

which we could call like this:

from mysql import connector

# Connect to the database and instantiate a cursor
cnx = connector.connect(database='bakery')
cursor = cnx.cursor()

cursor.callproc('get_customer_from_phone', args=(phone_input,))
C
C#
C++
Java
JavaScript
Objective-C
PHP
Python
Ruby
Swift

2. Validate the type and pattern of input. If you know you’re looking for specific data—like an ID, name, or email address—validate any user input based on type, length, or other attributes.

For example, here’s one way we could validate a phone number:

import re

def is_valid_phone(phone_number):
    # Check for None or empty string
    if phone_number is None or not len(phone_number):
        return False

    # Contains only valid phone characters
    # Has exactly 10 digits
    phone_format = re.compile(r'^\(?([0-9]{3})\)?[-. ]?([0-9]{3})[-. ]?([0-9]{4})$')
    return phone_format.match(phone_number) is not None
C
C#
C++
Java
JavaScript
Objective-C
PHP
Python
Ruby
Swift

3. Escape special characters like quotes. This approach is a quick and easy way to reduce the chances of SQL injection, but it’s not fully effective.

For example, let’s say we want to escape backslashes, single and double quotes, new lines (\n and \r), and null (\0):

def escape_input(input):
    char_replacements = [
        # We have to escape backslashes first so we don't escape the
        # backslashes we'll add to escape other special characters
        ('\\', '\\\\'),
        ('\0', '\\0'),
        ('\n', '\\n'),
        ('\r', '\\r'),
        ('\'', '\\\''),
        ('"', '\\"'),
    ]

    for char_to_replace, replacement in char_replacements:
        if char_to_replace not in input:
            continue
        input = input.replace(char_to_replace, replacement)

    return input
C
C#
C++
Java
JavaScript
Objective-C
PHP
Python
Ruby
Swift

(See table 10.1 in the MySQL String Literals docs for a full list of special character escape sequences.)

In Python, you could also use the _mysql module’s escape_string() function (which additionally escapes the SQL wildcard characters “%” and “_”):

import _mysql

phone_param = _mysql.escape_string(phone_input)

Python

When we escape our input, now our query will be:

SELECT * FROM customers WHERE phone = '1\' OR 1=1;--';

which isn’t a valid query.

4. Limit database privileges. Application accounts that connect to the database should have as few privileges as possible. It’s unlikely, for example, that your application will ever have to delete a table. So don’t allow it.

5. Don’t display database error messages to users. Error messages contain information that could tell hackers a lot of information about your data. Best practice is to give generic database error messages to users, and log detailed errors where developers can access them. Even better, send an alert to the dev team when there’s an error.