How Uber Built Real-Time Chat to Handle 3 Million Tickets Per Week



Uber has a diverse customer base consisting of riders, drivers, eaters, couriers, and merchants. 

Each user persona has different support requirements when reaching out to Uber’s customer agents through various live and non-live channels. The live channels are chat and phone, while the non-live channel is Uber’s in-app messaging.

For users, the timely resolution of issues takes center stage. Uber, meanwhile, is concerned with both customer satisfaction and the cost of resolving tickets. To keep costs under control, Uber needs to maintain a low CPC (cost-per-contact) alongside a good customer satisfaction rating.

Based on their analysis, Uber found that the live chat channel offers the most value when compared to other channels.

However, from 2019 to 2024, only 1% of all support interactions (also known as contacts) were served via the live chat channel because the chat infrastructure at Uber wasn’t capable of meeting the demand.

In this post, we look at how Uber built their real-time chat channel to work at the required scale.

The Legacy Chat Architecture

The legacy architecture for live chat at Uber was built on the WAMP protocol. WAMP (Web Application Messaging Protocol) is a WebSocket subprotocol used to exchange messages between application components.

It was primarily used for message passing and PubSub over WebSockets to relay contact information to the agent’s machine.
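
For illustration, a WAMP-style relay can be sketched with the autobahn-js client library. This is not Uber’s actual code; the router URL, realm, and topic name below are assumptions:

```typescript
// Illustrative sketch of the legacy WAMP-style relay using the autobahn-js
// client. The router URL, realm, and topic name are assumptions, not Uber's.
import autobahn from "autobahn";

const connection = new autobahn.Connection({
  url: "ws://wamp-router.internal:8080/ws", // WAMP router endpoint (assumed)
  realm: "agent-desktop",                   // logical routing namespace (assumed)
});

connection.onopen = (session) => {
  // The agent's browser subscribes to a topic; the backend publishes
  // contact events to the same topic to relay them to the agent machine.
  session.subscribe("contacts.assigned", (args) => {
    const contact = args?.[0];
    console.log("New contact relayed over WAMP:", contact);
  });
};

connection.open();
```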

The below diagram shows a high-level flow of the chat contact from being created to being routed to an agent on the front end.

This architecture had some core issues as follows:

1 - Reliability

Once the traffic scaled to 1.5X, the system started to face reliability issues.

Almost 46% of the events from the backend were not getting delivered to the agent’s browser, adding to the customer’s wait time to speak to an agent. It also created delays for the agents, wasting their bandwidth.

2 - Scale

Once requests per second crossed 10, the system’s performance deteriorated due to high memory usage and file descriptor leaks.

Also, it wasn’t possible to horizontally scale the system due to limitations with older versions of the WAMP library.

3 - Observability and Debugging

The system lacked end-to-end observability over the chat flow, which made message delivery failures difficult to trace and debug.

4 - Stateful

The services in the architecture were stateful, resulting in maintenance and restart complications.

This caused frequent spikes in message delivery times, as well as message losses.




Goals of the New Chat Architecture

Due to these challenges, Uber decided to build a new real-time chat infrastructure with the following goals:

  1. Scale up the chat traffic from 1% to 80% of the overall contact volume by the end of 2023. This came to around 3 million tickets per week.

  2. The process of connecting a customer to an agent after identification should have a greater than 99.5% success rate on the first attempt.

  3. Build end-to-end observability and debuggability over the entire chat flow.

  4. Build stateless services that can be scaled horizontally.

The New Live Chat Architecture

Keeping the new architecture simple was important for improving transparency and scalability.

The team at Uber decided to go with the Push Pipeline: a simple WebSocket server that agent machines connect to, sending and receiving messages through one generic socket channel.
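
A minimal sketch of such a server, assuming Node.js and the ws package (the port, message shapes, and agent registry below are illustrative, not Uber’s implementation):

```typescript
// Minimal sketch of a generic WebSocket push server, assuming Node.js and
// the "ws" package. The message shapes and port are illustrative.
import { WebSocketServer, WebSocket } from "ws";

const wss = new WebSocketServer({ port: 8080 });

// Track connected agent sockets by a hypothetical agent ID.
const agents = new Map<string, WebSocket>();

wss.on("connection", (socket) => {
  socket.on("message", (raw) => {
    const msg = JSON.parse(raw.toString());
    // The first message identifies the agent; later messages flow both
    // ways over the same generic channel.
    if (msg.type === "hello") {
      agents.set(msg.agentId, socket);
    }
  });
  socket.on("close", () => {
    for (const [id, s] of agents) if (s === socket) agents.delete(id);
  });
});

// Backend side: push a reserved contact to the matched agent's socket.
export function pushToAgent(agentId: string, payload: unknown): boolean {
  const socket = agents.get(agentId);
  if (!socket || socket.readyState !== WebSocket.OPEN) return false;
  socket.send(JSON.stringify({ type: "contact.reserved", payload }));
  return true;
}
```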

The below diagram shows the new architecture.

Below are the details of the various components:

Front End UI

This is what agents use to interact with customers.

Widgets and different actions are made available so that agents can take appropriate steps for the customer.

Contact Reservation

The router is the service that finds the most appropriate match between the agent and contact depending on the contact details. 

An agent is selected based on factors such as the concurrency settings in the agent’s profile, for example, the number of chats an agent can handle simultaneously.

On finding the match, the contact is pushed into a reserved state for the agent.
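
A simplified sketch of the matching step might look like the following; the types and selection criteria are assumptions based on the description above, not Uber’s actual router logic:

```typescript
// Simplified, hypothetical sketch of the router's matching step: pick an
// online agent with spare concurrency and reserve the contact for them.
interface Agent {
  id: string;
  online: boolean;
  maxConcurrentChats: number; // from the agent's profile
  activeChats: number;
}

interface Contact {
  id: string;
  state: "queued" | "reserved" | "assigned";
  reservedFor?: string;
}

function reserveContact(contact: Contact, agents: Agent[]): Contact | null {
  // Consider only online agents with spare capacity, preferring the one
  // with the most headroom (a stand-in for Uber's real matching criteria).
  const candidate = agents
    .filter((a) => a.online && a.activeChats < a.maxConcurrentChats)
    .sort(
      (a, b) =>
        (b.maxConcurrentChats - b.activeChats) -
        (a.maxConcurrentChats - a.activeChats)
    )[0];

  if (!candidate) return null; // no capacity: contact stays queued

  candidate.activeChats += 1;
  return { ...contact, state: "reserved", reservedFor: candidate.id };
}
```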

Push Pipeline

When the contact is reserved for an agent, the information is published to Kafka and is received by the GQL Subscription Service.

The Front End receives this information over the socket via a GraphQL subscription and loads the contact for the agent, along with all the necessary widgets and actions.
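
As a hedged sketch of this hand-off, the snippet below consumes reservation events with the kafkajs client and republishes them to an in-process pub/sub that can back GraphQL subscriptions. The topic name and event shape are assumptions:

```typescript
// Hypothetical sketch of the push pipeline hand-off: the GQL Subscription
// Service consumes reservation events from Kafka (kafkajs v2 API) and
// republishes them to the in-process pub/sub that backs GraphQL
// subscriptions. The topic name and event shape are assumptions.
import { Kafka } from "kafkajs";
import { PubSub } from "graphql-subscriptions";

export const pubsub = new PubSub();

const kafka = new Kafka({
  clientId: "gql-subscription-service",
  brokers: ["kafka:9092"],
});
const consumer = kafka.consumer({ groupId: "gql-subscription-service" });

export async function startConsuming(): Promise<void> {
  await consumer.connect();
  await consumer.subscribe({ topics: ["contact-reserved"] });
  await consumer.run({
    eachMessage: async ({ message }) => {
      const event = JSON.parse(message.value?.toString() ?? "{}");
      // Fan the event out to the matched agent's open GraphQL subscription.
      await pubsub.publish(`CONTACT_RESERVED.${event.agentId}`, {
        contactReserved: event,
      });
    },
  });
}
```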

Agent State

When agents start working, they go online via a toggle on the Front End.

This updates the Agent State service, allowing the agent to be mapped to a contact.
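
As a rough sketch, the toggle could map to a GraphQL mutation along these lines; the schema, endpoint, and field names are hypothetical:

```typescript
// Hypothetical sketch of the front-end toggle updating the Agent State
// service via a GraphQL mutation. The endpoint and field names are assumed.
const SET_AGENT_STATUS = /* GraphQL */ `
  mutation SetAgentStatus($agentId: ID!, $online: Boolean!) {
    setAgentStatus(agentId: $agentId, online: $online) {
      agentId
      online
    }
  }
`;

// Once the Agent State service records the agent as online, the router
// can start considering them when matching contacts.
async function goOnline(agentId: string): Promise<void> {
  await fetch("https://agent-state.internal/graphql", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({
      query: SET_AGENT_STATUS,
      variables: { agentId, online: true },
    }),
  });
}
```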

GQL Subscription Service

The front-end team was already using GraphQL for HTTP calls to the services. Due to this familiarity, the team selected GraphQL subscriptions for pushing data from the server to the client. 

The below diagram shows how GraphQL subscriptions work on a high level.

In GraphQL subscriptions, the client registers its interest by sending a subscription request to the server. The server matches published events against these queries and pushes messages back to the client machines, which in this case are the agent machines.
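
For example, a subscription for reserved contacts might be modeled as follows; the schema and trigger names are illustrative, not Uber’s actual definitions:

```typescript
// Illustrative subscription schema and resolver map; all names are assumed,
// not Uber's actual definitions. Uses the graphql-subscriptions v2 PubSub.
import { PubSub } from "graphql-subscriptions";

const pubsub = new PubSub();

export const typeDefs = /* GraphQL */ `
  type Contact {
    id: ID!
    agentId: ID!
  }
  type Query {
    _noop: Boolean
  }
  type Subscription {
    contactReserved(agentId: ID!): Contact!
  }
`;

export const resolvers = {
  Subscription: {
    contactReserved: {
      // Each agent machine listens on its own trigger; the backend publishes
      // to that trigger whenever the router reserves a contact for the agent.
      subscribe: (_: unknown, args: { agentId: string }) =>
        pubsub.asyncIterator(`CONTACT_RESERVED.${args.agentId}`),
    },
  },
};
```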

Uber’s engineering team used GraphQL over WebSockets by leveraging the graphql-ws library. The library had almost 2.3 million weekly downloads and was also recommended by Apollo.
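
On the agent machine, wiring a subscription up with graphql-ws looks roughly like this (the endpoint URL and subscription document are assumptions):

```typescript
// Sketch of the agent-side client using the graphql-ws package the article
// mentions. The endpoint URL and subscription document are illustrative.
import { createClient } from "graphql-ws";

const client = createClient({
  url: "wss://push-pipeline.internal/graphql",
});

// Subscribe once; "next" fires every time the server pushes an event.
const unsubscribe = client.subscribe(
  {
    query: `subscription ($agentId: ID!) {
      contactReserved(agentId: $agentId) { id agentId }
    }`,
    variables: { agentId: "agent-123" },
  },
  {
    next: (data) => console.log("Contact pushed to agent:", data),
    error: (err) => console.error("Subscription error:", err),
    complete: () => console.log("Subscription closed"),
  }
);
```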

The team also layered a few additional techniques on top of the library to improve availability.

Test Results from the New Chat Architecture

Uber performed functional and non-functional tests to ensure that both customers and agents received the best possible experience with the new architecture.


