
Timeline Architectures in Twitter

Twitter serves two primary timelines, each with distinct design challenges and optimizations.

1. User Timeline

The User Timeline represents the tweets and retweets made by a specific user, presented in chronological order.

  • Data Retrieval: Fetching the User Timeline first looks up the user in the User table, then retrieves that user's tweets from the Tweets table.
  • Caching Layer: To optimize this retrieval process, Twitter employs a caching layer using Redis. Since fetching data from Redis is faster than querying the database, this reduces latency significantly.
  • Metrics: With Twitter handling around 500 million tweets per day, the caching strategy is essential for maintaining efficient operations.
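The read path above can be sketched as a cache-aside lookup. This is a minimal illustration, not Twitter's actual code: plain Python dicts and lists stand in for Redis and the Tweets table, and the names `Tweet`, `UserTimelineService`, and `db_reads` are hypothetical. It also assumes newest-first ordering and invalidation on write, neither of which the text specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    tweet_id: int
    user_id: int
    text: str
    created_at: int  # Unix timestamp in seconds


class UserTimelineService:
    """Cache-aside reads for a user timeline (hypothetical sketch)."""

    def __init__(self, tweets_table):
        self.tweets_table = tweets_table  # stand-in for the Tweets table
        self.cache = {}                   # stand-in for Redis: user_id -> timeline
        self.db_reads = 0                 # counts how often the "database" is hit

    def _query_db(self, user_id):
        # Slow path: scan the table and sort newest first (assumed ordering).
        self.db_reads += 1
        rows = [t for t in self.tweets_table if t.user_id == user_id]
        return sorted(rows, key=lambda t: t.created_at, reverse=True)

    def get_user_timeline(self, user_id, limit=20):
        timeline = self.cache.get(user_id)
        if timeline is None:
            # Cache miss: read from the database, then populate the cache.
            timeline = self._query_db(user_id)
            self.cache[user_id] = timeline
        return timeline[:limit]

    def post_tweet(self, tweet):
        self.tweets_table.append(tweet)
        # Invalidate so the next read rebuilds the timeline (assumed policy).
        self.cache.pop(tweet.user_id, None)
```

Repeated reads for the same user are served from the cache; only the first read after a new tweet goes back to the table.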

2. Home Timeline

The Home Timeline displays content from people that the user follows. This requires a more sophisticated approach, as it involves aggregating data from multiple sources.

  • Fanout Caching Approach: Instead of fetching the tweets of every account a user follows and merging them at read time, Twitter uses a fanout caching approach. When a user tweets, the tweet is sent through a load balancer to the servers, saved in the database, and cached in Redis. The server then looks up the tweeter's followers and injects the tweet into each follower's in-memory home timeline.
  • Users with More Than a Million Followers (Case Study):
    • Problem: Fanning out a tweet from a user with a very large following (e.g., a celebrity) means writing to millions of home timelines for a single tweet, which can overwhelm the system.
    • Solution: A hybrid approach combines precomputed home timelines with synchronous calls. First, each home timeline is precomputed from all tweets except those of heavily followed users. Then, a list of the heavily followed users that the user follows is kept in the user's cache, so their tweets can be fetched at runtime and merged in when the timeline is requested.
    • Trade-off: This strategy limits the write load caused by tweets from users with millions of followers, at the cost of some extra work each time a timeline is read.
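The fanout and hybrid steps above can be sketched together. This is an illustration under stated assumptions rather than Twitter's implementation: in-memory dicts replace Redis and the database, `HomeTimelines` and its methods are hypothetical names, and the set of heavily followed accounts is derived at read time instead of being stored in each user's cache. The one-million default follows the case study; the threshold is a parameter so it can be lowered for experiments.

```python
from collections import defaultdict


class HomeTimelines:
    """Fanout-on-write with a read-time merge for heavily followed authors."""

    def __init__(self, celebrity_threshold=1_000_000):
        self.celebrity_threshold = celebrity_threshold
        self.followers = defaultdict(set)          # author_id -> follower ids
        self.following = defaultdict(set)          # user_id -> followed author ids
        self.tweets_by_author = defaultdict(list)  # author_id -> [(created_at, tweet_id)]
        self.precomputed = defaultdict(list)       # user_id -> [(created_at, tweet_id)]

    def follow(self, follower_id, author_id):
        self.followers[author_id].add(follower_id)
        self.following[follower_id].add(author_id)

    def is_celebrity(self, author_id):
        return len(self.followers[author_id]) > self.celebrity_threshold

    def post_tweet(self, author_id, tweet_id, created_at):
        entry = (created_at, tweet_id)
        self.tweets_by_author[author_id].append(entry)
        if self.is_celebrity(author_id):
            return  # no fanout; followers pull these tweets at read time
        # Fanout on write: push the tweet into every follower's timeline.
        for follower_id in self.followers[author_id]:
            self.precomputed[follower_id].append(entry)

    def home_timeline(self, user_id, limit=20):
        # Start from the precomputed timeline; a set removes duplicates for
        # authors who crossed the threshold after earlier tweets were pushed.
        entries = set(self.precomputed[user_id])
        for author_id in self.following[user_id]:
            if self.is_celebrity(author_id):
                entries.update(self.tweets_by_author[author_id])
        newest_first = sorted(entries, reverse=True)
        return [tweet_id for _, tweet_id in newest_first[:limit]]
```

With a low threshold such as `HomeTimelines(celebrity_threshold=2)`, an author with three followers is treated as heavily followed: their tweets never enter `precomputed` but still appear in each follower's `home_timeline` result.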

Optimization and Scalability Considerations

  • Cache Management: Not every timeline needs to live in the cache. For instance, home timelines for inactive users are not precalculated and stored, which saves memory and avoids fanout writes that nobody will read.
  • Load Balancing: Distributing requests efficiently across servers ensures that the system can handle the vast volume of queries and updates.
  • High Availability: Redundancy and failover mechanisms ensure that the system remains operational even in the face of individual component failures.
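The cache management point can be sketched as a fanout that skips inactive users and rebuilds their timelines lazily on the next read. Everything here is an assumption made for illustration: the class name `LazyHomeTimelineCache`, the 30-day activity window, and the choice to evict a stale timeline rather than leave it incomplete are not taken from the text.

```python
from collections import defaultdict


class LazyHomeTimelineCache:
    """Precompute home timelines only for recently active users (sketch)."""

    def __init__(self, active_window=30 * 24 * 3600):
        self.active_window = active_window         # seconds; assumed cutoff
        self.followers = defaultdict(set)          # author_id -> follower ids
        self.following = defaultdict(set)          # user_id -> followed author ids
        self.tweets_by_author = defaultdict(list)  # author_id -> [(created_at, tweet_id)]
        self.last_seen = {}                        # user_id -> time of last read
        self.cache = {}                            # user_id -> [(created_at, tweet_id)]

    def follow(self, follower_id, author_id):
        self.followers[author_id].add(follower_id)
        self.following[follower_id].add(author_id)

    def is_active(self, user_id, now):
        seen = self.last_seen.get(user_id)
        return seen is not None and now - seen <= self.active_window

    def post_tweet(self, author_id, tweet_id, now):
        entry = (now, tweet_id)
        self.tweets_by_author[author_id].append(entry)
        for follower_id in self.followers[author_id]:
            if follower_id in self.cache and self.is_active(follower_id, now):
                self.cache[follower_id].append(entry)
            else:
                # Inactive or never cached: drop any stale timeline so the
                # next read rebuilds it instead of serving incomplete data.
                self.cache.pop(follower_id, None)

    def read(self, user_id, now, limit=20):
        if user_id not in self.cache:
            # Lazy rebuild from the tweets of every followed author.
            self.cache[user_id] = [
                entry
                for author_id in self.following[user_id]
                for entry in self.tweets_by_author[author_id]
            ]
        self.last_seen[user_id] = now
        newest_first = sorted(self.cache[user_id], reverse=True)
        return [tweet_id for _, tweet_id in newest_first[:limit]]
```

The cost moves from write time to the first read after a period of inactivity, which is usually acceptable because inactive users, by definition, read rarely.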
