EP31: Super High-performance NoSQL and MQ

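This week’s system design refresher:
- ByteByteGo talent collective
- Data Pipeline by Semantix
- What does an API gateway do?
- Super high-performance NoSQL and MQ
- How to scale from 0 to millions of users - spooky edition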


ByteByteGo talent collective

This week has been rough, with lots of layoffs and hiring freezes everywhere. My heart goes out to everyone who is going through this right now.

We are compiling a list of interview resources and will share it soon.

We also have ByteByteGo's talent collective (dozens of companies are hiring), and we hope it can be helpful.



Data Pipeline by Semantix

This is a very nice illustration of the data pipeline by Semantix, and it offers some insight into how data pipelines work.


A data platform ingests, processes, analyzes, and presents data generated by different sources. In other words, it manages every piece of the data puzzle, from collection to consumption.

Modern data platforms offer a number of benefits, including centralized access to data across an organization, which eliminates silos and provides actionable insights.
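To make the four stages concrete, here is a minimal Python sketch that chains ingest, process, analyze, and present as plain functions. The event data and function names are hypothetical; a real data platform would typically use dedicated tools for each stage.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Hypothetical raw events from two different sources (illustrative data only).
WEB_EVENTS = ["click:home", "click:cart", "view:home"]
MOBILE_EVENTS = ["click:home", "VIEW:cart", "  click:cart "]


def ingest(*sources: Iterable[str]) -> Iterator[str]:
    """Ingest: merge records from several sources into one stream."""
    for source in sources:
        yield from source


def process(records: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Process: clean and normalize each raw record into (action, page)."""
    for record in records:
        action, page = record.strip().lower().split(":", 1)
        yield action, page


def analyze(events: Iterable[Tuple[str, str]]) -> Counter:
    """Analyze: count how many events touched each page."""
    return Counter(page for _, page in events)


def present(counts: Counter) -> str:
    """Present: render a simple text report, most active page first."""
    return "\n".join(f"{page}: {n}" for page, n in counts.most_common())


if __name__ == "__main__":
    print(present(analyze(process(ingest(WEB_EVENTS, MOBILE_EVENTS)))))
```

Because each stage only consumes the previous stage's output, any one of them can be swapped out (for example, a different source or a different report) without touching the others.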

Thanks for reading ByteByteGo Newsletter! Subscribe for free to receive new posts and support my work.


What does an API gateway do?


Super high-performance NoSQL and MQ

Is it possible to achieve at least a 10x performance boost over the original Kafka and Cassandra? If so, how, and what are the trade-offs?

There is an exciting class of storage software like Redpanda and ScyllaDB that boasts at least an order of magnitude improvement in performance.

Redpanda and ScyllaDB are used as examples in the diagram below. Redpanda is comparable to Kafka, while ScyllaDB is comparable to the NoSQL database Cassandra.

It’s been a decade since Apache Kafka and Apache Cassandra revolutionized how the software industry handles huge amounts of data.

Since then, the server CPU core count has grown 10x. Memory has grown from 64GB to half a TB. NVMe SSD drives are about 100 times faster than spinning disks from a decade ago. Network bandwidth at 25Gbps is commonplace.

A new class of software has come to market to capitalize on this trend, and we wrote this post to raise awareness of it.
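How do these systems put the extra cores to work? Both Redpanda and ScyllaDB are built on the Seastar C++ framework, which runs one thread per CPU core and gives each core its own slice of the data (a shared-nothing design), so cores exchange messages instead of contending for locks. The Python sketch below illustrates only that routing idea, under our own assumptions: `ShardedStore` is a hypothetical name, and this is not how either product is implemented. Python threads share the GIL, so the sketch shows the structure, not the speedup.

```python
import queue
import threading
import zlib
from typing import Optional


class ShardedStore:
    """Hypothetical shared-nothing key-value store with one worker thread per shard.

    Each shard's dict is read and written only by its own worker thread, so the
    data needs no locks; callers reach a shard by sending it a message.
    """

    def __init__(self, num_shards: int = 4) -> None:
        self.num_shards = num_shards
        self._inboxes = [queue.Queue() for _ in range(num_shards)]
        self._workers = [
            threading.Thread(target=self._run, args=(i,), daemon=True)
            for i in range(num_shards)
        ]
        for worker in self._workers:
            worker.start()

    def shard_for(self, key: str) -> int:
        # Stable hash (CRC32), so a given key always routes to the same shard.
        return zlib.crc32(key.encode()) % self.num_shards

    def _run(self, shard_id: int) -> None:
        data: dict = {}  # owned exclusively by this worker thread
        inbox = self._inboxes[shard_id]
        while True:
            op, key, value, reply = inbox.get()
            if op == "put":
                data[key] = value
                reply.put(None)
            elif op == "get":
                reply.put(data.get(key))
            else:  # "stop": acknowledge, then let the thread exit
                reply.put(None)
                return

    def _call(self, shard_id: int, op: str, key: str = "", value=None):
        reply: queue.Queue = queue.Queue(maxsize=1)
        self._inboxes[shard_id].put((op, key, value, reply))
        return reply.get()  # block until the owning shard answers

    def put(self, key: str, value: str) -> None:
        self._call(self.shard_for(key), "put", key, value)

    def get(self, key: str) -> Optional[str]:
        return self._call(self.shard_for(key), "get", key)

    def close(self) -> None:
        # Stop every shard's worker and wait for the threads to finish.
        for shard_id in range(self.num_shards):
            self._call(shard_id, "stop")
        for worker in self._workers:
            worker.join()
```

Usage: `store = ShardedStore(num_shards=4)`, then `store.put("user:1", "alice")` and `store.get("user:1")`. The trade-off is that any operation spanning several keys on different shards must now coordinate through messages.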


How to scale from 0 to millions of users - spooky edition

Designing a system that supports millions of users is challenging; it is a journey that requires continuous refinement and endless improvement. Let’s take a quick look at some of the key components that power such a system.

Load balancer
A load balancer evenly distributes incoming traffic among web servers that are defined in a load-balanced set.
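The post does not say which algorithm the load balancer uses; round robin is one common way to spread requests evenly. A minimal sketch, with hypothetical server names and a hypothetical `RoundRobinBalancer` class:

```python
import itertools
from typing import Sequence


class RoundRobinBalancer:
    """Hand out servers from the load-balanced set in rotating order."""

    def __init__(self, servers: Sequence[str]) -> None:
        if not servers:
            raise ValueError("the load-balanced set must not be empty")
        self._cycle = itertools.cycle(servers)

    def next_server(self) -> str:
        # Each call returns the next server, wrapping around at the end.
        return next(self._cycle)


if __name__ == "__main__":
    lb = RoundRobinBalancer(["web-1", "web-2", "web-3"])  # hypothetical servers
    print([lb.next_server() for _ in range(6)])
    # ['web-1', 'web-2', 'web-3', 'web-1', 'web-2', 'web-3']
```

Round robin assumes every server is healthy and equally capable; real load balancers also run health checks and can weight servers differently.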

Web servers
Web servers return HTML pages or JSON responses for rendering.

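Databases: vertical scaling and horizontal scaling
Vertical scaling (scaling up) adds more power, such as CPU and RAM, to an existing database server. Horizontal scaling (scaling out) adds more servers, for example by sharding data across multiple databases.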
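Horizontal scaling often means sharding: each row lives on one of several databases, chosen by a key. A minimal sketch, assuming rows are sharded by user ID with a simple modulo; the shard names and `shard_for_user` are hypothetical.

```python
# Hypothetical shard names; in practice each would be a separate database server.
SHARDS = ("users_db_0", "users_db_1", "users_db_2", "users_db_3")


def shard_for_user(user_id: int, shards: tuple = SHARDS) -> str:
    """Route a user's rows to one shard using user_id modulo the shard count."""
    return shards[user_id % len(shards)]


if __name__ == "__main__":
    for uid in (1, 2, 5, 8):
        print(uid, shard_for_user(uid))
    # 1 users_db_1
    # 2 users_db_2
    # 5 users_db_1
    # 8 users_db_0
```

Modulo routing is easy to reason about, but adding or removing a shard remaps most keys; consistent hashing is a common way to limit that data movement.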

Cache
A cache is a temporary storage area that stores the result of expensive responses or frequently accessed data in memory so that subsequent requests are served more quickly. 
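The post does not specify how cached entries expire; a time-to-live (TTL) is one common choice. Below is a minimal cache-aside sketch: `TTLCache` and `get_or_load` are hypothetical names, and the loader function stands in for an expensive database query.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """In-memory cache whose entries expire ttl_seconds after being stored."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        """Cache-aside read: return a fresh cached value, else call loader and cache it."""
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: skip the expensive call
        value = loader()  # cache miss or expired entry: do the expensive work
        self._store[key] = (now + self.ttl, value)
        return value


if __name__ == "__main__":
    calls = []

    def slow_query():
        calls.append(1)  # hypothetical stand-in for a database round trip
        return {"id": 42, "name": "alice"}

    cache = TTLCache(ttl_seconds=60)
    cache.get_or_load("user:42", slow_query)
    cache.get_or_load("user:42", slow_query)
    print(len(calls))  # 1
```

The TTL bounds how stale a cached value can get; choosing it is a trade-off between freshness and how often the database is hit.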

CDN
A CDN is a network of geographically dispersed servers used to deliver static content. CDN servers cache static content like images, videos, CSS, JavaScript files, etc.

Message queue
A message queue is a durable component that buffers messages and supports asynchronous communication: producers publish messages without waiting, and consumers process them at their own pace.
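A minimal sketch of asynchronous communication using Python's in-memory `queue.Queue` and a single consumer thread. Unlike a real message broker, `queue.Queue` is not durable (messages are lost if the process exits), and the photo-resizing job is a hypothetical example.

```python
import queue
import threading
from typing import List, Optional

STOP = None  # sentinel that tells the consumer to exit


def consumer(inbox: "queue.Queue[Optional[str]]", processed: List[str]) -> None:
    """Take messages off the queue and handle them at the consumer's own pace."""
    while True:
        msg = inbox.get()
        if msg is STOP:
            break
        processed.append(f"resized {msg}")  # stand-in for slow work


def main() -> List[str]:
    inbox: "queue.Queue[Optional[str]]" = queue.Queue()
    processed: List[str] = []
    worker = threading.Thread(target=consumer, args=(inbox, processed))
    worker.start()

    # The producer (e.g. a web server) publishes and moves on immediately,
    # without waiting for the work to finish.
    for photo in ["a.jpg", "b.jpg", "c.jpg"]:
        inbox.put(photo)
    inbox.put(STOP)

    worker.join()  # only so the demo can print the results
    return processed


if __name__ == "__main__":
    print(main())  # ['resized a.jpg', 'resized b.jpg', 'resized c.jpg']
```

Because producers and consumers only share the queue, each side can be scaled independently: add consumers when the backlog grows, without changing the producers.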

Logging, metrics, automation
When working with a small website that runs on a few servers, logging, metrics, and automation support are good practices but not a necessity. However, now that your site has grown to serve a large business, investing in those tools is essential.
