Storage Systems I

Info 253: Web Architecture
Kay Ashaolu

Why data storage?

  • When we make a web request, where do we get the data from?
  • When we create data, where do we put it?
  • Where do "resources" live?

Example: bit.ly

  • Lots of data to store
    • Shortcut to url mapping
    • Statistics about links
    • Information about users

Example: bit.ly

long url http://news.google.com
short url http://bit.ly/awekl
hit count 482240

long url http://facebook.com/user/profile
short url http://bit.ly/czasw
hit count 11023

long url http://msnbc.com/news/article/
short url http://bit.ly/olkjpl
hit count 1232
					

Data Storage Design

  • What is the storage format?
  • How do we lay out data?
  • How do we access data?

Why use a file?

  • http://news.google.com, http://bit.ly/awekl, 482240
  • http://facebook.com/user/profile, http://bit.ly/czasw, 11023
  • http://msnbc.com/news/article, http://bit.ly/olkjpl, 1232
  • What are the pros and cons?

Problems with Files

  • What if we want to add another field?
  • What if we want to query different parts of data? How efficient is this?
  • What if we have concurrent accesses?
  • What data structures should we use?

Data Independence

  • Databases: apps shouldn’t have to worry about these problems!
  • Underlying storage format independent of application-level logic

Relational Data Stores

  • RDBMS: Relational Database Management System
  • Invented in the 1970s
  • e.g., Oracle, MySQL, Postgres, IBM DB2, Microsoft SQL Server

Relational Model

  • Reason about sets of facts, or "tables"
  • Each fact is a "row"
  • Attributes are "columns" of row

For Example

For Example

SQL Query Language

  • High-level query language over tables
  • Declarative: say "what" you want computed, not "how"
  • Why is this good?

Select Example

SELECT * FROM links;
					
long_url short_url hit_count created
http://www.google.com qwelmw 2 2016-4-5
http://www.facebook.com adfer 45 2016-8-5
2 rows in set (0.00 sec)
					

Select Example

SELECT * FROM links WHERE hit_count < 20;
					
long_url short_url hit_count created
http://www.google.com qwelmw 2 2016-4-5
1 row in set (0.00 sec)
					

INSERT Example

INSERT INTO links VALUES ("http://www.twitter.com", "eovle", 0, CURDATE());
Query OK, 1 row affected (0.00 sec)

					
SELECT * FROM links;
					
long_url short_url hit_count created
http://www.google.com qwelmw 2 2016-4-5
http://www.facebook.com adfer 45 2016-8-5
http://www.twitter.com eovle 0 2016-10-28
3 rows in set (0.00 sec)
					

UPDATE example

UPDATE links SET hit_count = '0' WHERE created < '2016-10-22';
Query OK, 2 rows affected (0.00 sec)
Rows matched: 2  Changed: 2  Warnings: 0
					
SELECT * FROM links;
					
long_url short_url hit_count created
http://www.google.com qwelmw 0 2016-4-5
http://www.facebook.com adfer 0 2016-8-5
http://www.twitter.com eovle 9 2016-10-28
3 rows in set (0.00 sec)
					

Useful Properties

  • Atomicity: all updates happen or none do
  • Consistency: easy to reason about database
  • Isolation: operations are separated from each other
  • Durability: updates won’t disappear

RDBMS Pros and Cons

Pros
  • High-level query language
  • High Data integrity
  • Data independence
Cons
  • Have to define schema at start
  • Scaling can be complicated
  • Can become slow

Questions