Storage Systems II

Info 253: Web Architecture
Kay Ashaolu

Typical Web Architecture

Typical Web Architecture

Caching

  • Reduce load on database by placing cheap copies in front of DB
  • Problem?
  • Have to keep cache(s) up to date

Horizontal Scale-Out

Statelessness

  • Front-end, mid-tier are often stateless--why?
  • Simplifies programming, reasoning about services
  • DBs can manage complexity of state management
  • Promote reuse of complex code

Scaling

  • What if data can’t fit on a single server?
  • What if a server goes down?
  • What if a machine fails completely?

Replication

  • Provides durability: don’t lose data
  • Provides capacity: multiple servers
  • Leads to many interesting challenges

Typical Replication

Data Placement

  • Which server gets data?
  • Assign students to server based on age

Data Placement

  • Which servers get what data?
  • Range vs. Hash vs. ?
  • How many copies of the data?
    • Durability: how many failures?
    • Capacity: how many requests?

Consistency

  • Need to keep replicas up to date
  • May be slow or impossible!
  • Very expensive if servers are located around the world!

NoSQL

  • Different approach to data storage
  • Simple but predictable data models
  • Often have to build own features
  • Designed for massive scale-out

Key-Value Store

put(key, value) get(value)
					
Pros
  • Simple API
  • Easy to understand performance
  • Easy to scale and use
Cons
  • Simple API
  • Must handle own schema management
  • May need to manually implement search features

Document Store

{
	"long_url": "http://www.google.com",
	"short_url": "qwelmw",
	"hit_count": 2
}
					
  • No predefined schema
  • Store handles layout of arbitrary fields
  • Examples: MongoDB, CouchDB, Cassandra, Redis

Summary

  • Databases designed to solve many common data storage problems
  • Storage comes in many flavors; right choice is often specific to use case
  • When in doubt, start simple!
  • My opinion: start with a RDBMS and learn about your data, move to a DB that better suites your use case afterwards