How to store hundreds of millions of simple key-value pairs in Redis

This is about using Redis to store a huge amount of data. Original article: http://instagram-engineering.tumblr.com/post/12202313862/storing-hundreds-of-millions-of-simple-key-value.

When transitioning systems, sometimes you have to build a little scaffolding. At Instagram, we recently had to do just that: for legacy reasons, we need to keep around a mapping of about 300 million photos back to the user ID that created them, in order to know which shard to query (see more info about our sharding setup). While eventually all clients and API applications will have been updated to pass us the full information, there are still plenty who have old information cached. We needed a solution that would:

  1. Look up keys and return values very quickly
  2. Fit the data in memory, and ideally within one of the EC2 high-memory types (the 17GB or 34GB, rather than the 68GB instance type)
  3. Fit well into our existing infrastructure
  4. Be persistent, so that we wouldn’t have to re-populate it if a server died
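
As a rough illustration of the lookup described above, here is a minimal sketch of a photo-ID-to-user-ID mapping. It uses a plain Python dict as a stand-in for the key-value store, so it says nothing about memory footprint or persistence; the function names and the sample IDs are hypothetical and do not come from the original article.

```python
from typing import Optional

# In-memory stand-in for the key-value store (assumption: a real
# deployment would keep this mapping in Redis, not in a Python dict).
photo_to_user: dict[int, int] = {}


def remember_owner(photo_id: int, user_id: int) -> None:
    """Record which user created a given photo."""
    photo_to_user[photo_id] = user_id


def owner_of(photo_id: int) -> Optional[int]:
    """Return the creating user's ID, or None if the photo is unknown."""
    return photo_to_user.get(photo_id)


# Hypothetical sample IDs, for illustration only.
remember_owner(1155315, 939)
print(owner_of(1155315))
```

Once the user ID is known, the caller can work out which shard to query for the photo itself.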


Sharding & IDs at Instagram

This post is about sharding and ID generation at Instagram. Original article: http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram

With more than 25 photos & 90 likes every second, Instagram stores a lot of data. To make sure all of this data fits into memory and is available quickly for users, we’ve begun to shard our data: in other words, to place it in many smaller buckets, each holding a part of the data.
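
To make the idea of "many smaller buckets" concrete, here is a minimal sketch of sharding by user ID. The shard count, the modulo rule, and all names are assumptions chosen for illustration; the excerpt does not say how Instagram actually assigns data to shards.

```python
NUM_SHARDS = 4  # hypothetical shard count, for illustration only

# Each shard is modelled as a dict of user_id -> list of photo IDs.
shards: list[dict[int, list[int]]] = [{} for _ in range(NUM_SHARDS)]


def shard_for(user_id: int) -> int:
    """Pick a bucket for a user (assumption: simple modulo on the user ID)."""
    return user_id % NUM_SHARDS


def store_photo(user_id: int, photo_id: int) -> None:
    """Place a photo in the bucket that holds all of its owner's data."""
    shards[shard_for(user_id)].setdefault(user_id, []).append(photo_id)


def photos_of(user_id: int) -> list[int]:
    """Read a user's photos back from the single shard that can hold them."""
    return shards[shard_for(user_id)].get(user_id, [])


store_photo(31341, 1001)
store_photo(31341, 1002)
print(shard_for(31341), photos_of(31341))
```

Because every read for a given user goes to one bucket, each bucket only has to hold and serve its own part of the data.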

Instagram’s application servers run Django with PostgreSQL as our back-end database. Our first question after deciding to shard out our data was whether PostgreSQL should remain our primary data-store, or whether we should switch to something else. We evaluated a few different NoSQL solutions, but ultimately decided that the solution that best suited our needs would be to shard our data across a set of PostgreSQL servers.