Seriously, can AWS Lambda take streaming data?

BoilingData
9 min read · Apr 1, 2024


This tiny little AWS Lambda can do it! 🚀

Streaming Data to S3 at Scale

TL;DR: Use HTTP POST to send newline delimited JSON to a URL (a Data Tap), and land your data on S3 as optimised, compressed Parquet files, more cost efficiently and at higher scale than instance, cluster, or serverless based counterparts.
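
To make it concrete, here is a minimal sender sketch in TypeScript (Node 18+, where fetch is built in). The URL and token are placeholders for illustration only:

```typescript
// Send a small batch of events as newline delimited JSON to a Data Tap URL.
// The URL and the token below are placeholders, not real endpoints.
const events = [
  { ts: Date.now(), level: "info", msg: "user signed in" },
  { ts: Date.now(), level: "warn", msg: "slow query", durationMs: 412 },
];

// Newline delimited JSON: one JSON object per line.
const body = events.map((e) => JSON.stringify(e)).join("\n");

const resp = await fetch("https://<your-tap-id>.lambda-url.eu-west-1.on.aws/", {
  method: "POST",
  headers: {
    "content-type": "application/x-ndjson",
    authorization: "<data-tap-jwt-token>",
  },
  body,
});

console.log(resp.status); // 200 once the Tap has accepted and buffered the batch
```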

What’s the simplest, most scalable, and most cost efficient solution for streaming data into S3? One that handles any scale, from hundreds to thousands of data sources? That is production ready as a first class cloud native building block? Highly Available (HA), reacting efficiently to varying peaks and lows, and globally available in the region nearest to the data source?

Surprisingly, it is AWS Lambda accompanied by an AWS Lambda Function URL. But what kind of Lambda function?

You can set up Kafka or any of a multitude of other streaming ingestion stacks, or, if you have known data sources, an ELT/ETL combo that extracts data from them and either transforms/loads or loads/transforms it into a destination like S3. But most, if not all, of these solutions require VPC peering or vendor specific (or your own in-house) deployments into your VPC, while also leaving the scalability aspects for you to worry about.

With high requests per second (RPS), the CPU is kept busy. The bigger the scale, the more memory and CPU you need to handle connections, SSL/TLS decryption, parsing, and processing. Furthermore, with continuous load it sounds like AWS Lambda is, at the very least, not the best compute alternative, right?

The engineer inside says that AWS Lambda as a compute is expensive at scale — but in this case the intuition is wrong!

I started streaming data into S3 with an AWS Lambda Function URL, NodeJS, and an embedded DuckDB database engine for in-transit SQL transformations: like ETL where the E is an open URL you can HTTP POST data to, e.g. PostgreSQL CDC, logs, events, metrics, whatever you want to collect in newline delimited JSON format. This was already great, and I could even push realtime metrics directly from the AWS Lambda to embedded Apache ECharts and Plotly.js charts on a web page with WebSockets (iframes that communicate through the main page’s JS “router” for session management). But intrigued by the AWS LLRT runtime, my hunger grew.

AWS Lambda Function URLs

It struck me how perfect the Lambda Function URL is for streaming ingestion!

Lambda Function URLs are AWS managed public HTTP endpoints for AWS Lambda functions. AWS manages the infrastructure that takes care of TCP connections, SSL/TLS offloading, throttling, and buffering. What is left for your Lambda function is processing the HTTP payload blob, handed to you nicely in packets of at most 6 MB. You’re given horizontal scaling, dedicated vCPUs, and memory. Now your job is to process the hot potato as fast as possible! How do you process the data efficiently, then? And how fast can it be done?
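
The Data Taps handler itself is C++ (more on that below), but the contract a Function URL hands you is easy to illustrate with a Node-style handler sketch. The field names follow the Function URL payload format; everything else here is illustrative:

```typescript
// Sketch of a Function URL handler receiving the HTTP payload blob.
// The real Data Taps handler is C++; this Node version only shows the contract.
import { Buffer } from "node:buffer";

export const handler = async (event: {
  body?: string;
  isBase64Encoded: boolean;
  headers: Record<string, string>;
}) => {
  // The whole HTTP payload (up to ~6 MB) arrives as a single blob in event.body.
  const raw = event.isBase64Encoded
    ? Buffer.from(event.body ?? "", "base64").toString("utf8")
    : event.body ?? "";

  // Split into newline delimited JSON records and hand them off for buffering.
  const lines = raw.split("\n").filter((line) => line.trim().length > 0);
  // ...append the lines to a local buffer file here, then return quickly...

  return { statusCode: 200, body: JSON.stringify({ received: lines.length }) };
};
```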

I ended up getting to the bottom of it, implementing a custom AWS Lambda C/C++ runtime and handler, accompanied by a Lambda extension. This is the best that can be done, unless you start inlining assembly 😅. Of course the code can be optimised further and further 😁, and there are various things you can do to keep the latency very low and steady for the clients.

DuckDB itself is C++ code, so it has a C/C++ API as well — and it embeds easily because you can get both dynamic and static libraries with a low number of dependencies. It is also reasonably small and fits nicely in Lambda.

Custom AWS Lambda C++ Runtime, Extension, and Handler

Here we describe how we built Data Taps.

The results are staggering and unbelievable. The smallest arm64 AWS Lambda, with 128 MB of memory and arguably about 2 vCPUs, can handle the load smoothly and with steady latency.

AWS Lambda functions run in Firecracker microVMs: they are created very fast, and once warm they keep the memory and disk contents intact between subsequent invocations — at least for some time — and you pay only for the milliseconds the code spends processing an invocation. While a warm Lambda container holds your code and data in memory and on the local SSD disk, you don’t pay for it.

You can have an AWS Lambda extension that hooks into the container lifecycle events, the SHUTDOWN event in our case. So you have all the building blocks for using AWS Lambda as a stateful, independent entity for data processing — without losing data. This is how logging extension libraries work too.
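
The real extension is C++, but the Lambda Extensions API itself is plain HTTP against the runtime API endpoint. A rough Node sketch of registering for SHUTDOWN and flushing on it could look like this (flushBufferToS3 is a hypothetical helper):

```typescript
// Sketch of a Lambda extension that subscribes to SHUTDOWN and flushes the
// buffered data before the container goes away. The real extension is C++.
const API = `http://${process.env.AWS_LAMBDA_RUNTIME_API}/2020-01-01/extension`;

async function flushBufferToS3(): Promise<void> {
  // Hypothetical helper: run the DuckDB COPY of the leftover buffer to S3.
}

async function main(): Promise<void> {
  // 1. Register the extension and subscribe to SHUTDOWN events.
  const reg = await fetch(`${API}/register`, {
    method: "POST",
    headers: { "Lambda-Extension-Name": "datatap-flush" },
    body: JSON.stringify({ events: ["SHUTDOWN"] }),
  });
  const extensionId = reg.headers.get("lambda-extension-identifier")!;

  // 2. Poll for lifecycle events; the call blocks until the next event arrives.
  for (;;) {
    const res = await fetch(`${API}/event/next`, {
      headers: { "Lambda-Extension-Identifier": extensionId },
    });
    const event = await res.json();
    if (event.eventType === "SHUTDOWN") {
      await flushBufferToS3(); // roughly a 2 s budget to push leftovers to S3
      break;
    }
  }
}

main();
```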

At the C/C++ level, an atomic append to a local filesystem file is very fast (one of the building blocks of streaming systems like Kafka). The warm Lambda then gets called again and again — in our case, to process the incoming data. Append the data to a file and report back: fast and efficient.
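
As a minimal sketch of that buffering step (the path and the threshold handling are assumptions for illustration, not the actual Data Taps code):

```typescript
// Append incoming NDJSON lines to a buffer file on the Lambda's local /tmp disk.
// The 'a' flag opens the file with O_APPEND, so each write lands at the current
// end of file, across repeated invocations of the same warm container.
import { appendFileSync, statSync } from "node:fs";

const BUFFER_FILE = "/tmp/datatap-buffer.ndjson"; // hypothetical path

export function bufferLines(ndjsonChunk: string): number {
  const chunk = ndjsonChunk.endsWith("\n") ? ndjsonChunk : ndjsonChunk + "\n";
  appendFileSync(BUFFER_FILE, chunk);
  return statSync(BUFFER_FILE).size; // caller flushes to S3 once a threshold is reached
}
```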

At this point, even this tiniest arm64 128 MB AWS Lambda function’s processing time is well below 2 ms with small packet sizes.

Enter DuckDB. DuckDB has become a very resilient data processor — i.e. suitable for data engineering problems in addition to analytics. Its fast, multi-threaded, vectorised C++ engine is top class.

DuckDB has a fairly small binary size and fits nicely into AWS Lambda!

So, you need the bootstrap binary (the runtime), the extension for hooking into the lifecycle events, and of course the handler code itself. Together they achieve lossless data ingestion to S3: the leftover data that has accumulated since the last sync is flushed to S3 in the shutdown hook or in the next sync (assuming that our code or DuckDB does not crash, and that S3 is available and the small upload fits within the 2 s shutdown window).

This buffering into AWS Lambda memory and disk, together with the scale-out capability of AWS Lambda itself, makes it a monster at ingesting data at scale! You end up with tens or hundreds of concurrent vCPUs crunching your data efficiently.

One of the remaining things is to get the data out of AWS Lambda to S3.

DuckDB can stream from source to destination with a single line of SQL while doing any transformations or aggregations, compression, and data format changes — from newline delimited JSON to ZSTD compressed Parquet files. The SQL clause is taken from a Lambda environment variable.
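
For illustration, the kind of single-statement flush this enables could look roughly like the following, here driven from Node with the duckdb package. The bucket, key, and buffer path are placeholders, and the real SQL clause comes from the environment variable:

```typescript
// Flush the local NDJSON buffer to S3 as a ZSTD compressed Parquet file with one
// DuckDB COPY statement. Bucket, key, and file path are placeholders; S3
// credentials are assumed to be configured for the httpfs extension.
import duckdb from "duckdb";

const db = new duckdb.Database(":memory:");

const sql = `
  INSTALL httpfs; LOAD httpfs;  -- needed for s3:// targets
  COPY (
    SELECT *
    FROM read_ndjson_auto('/tmp/datatap-buffer.ndjson', ignore_errors = true)
  ) TO 's3://my-bucket/ingest/part-0001.parquet' (FORMAT PARQUET, COMPRESSION ZSTD);
`;

db.exec(sql, (err) => {
  if (err) throw err;
  console.log("buffer flushed to S3 as ZSTD Parquet");
});
```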

Converting the data to a columnar format together with compression reduces the data size considerably, which not only saves on S3 costs but also on further data processing.

Newline delimited JSON also brings error resiliency, as DuckDB can skip erroneous JSON lines without dropping the rest of the data or throwing exceptions — no poison pills.

Simple Load Test

I started the AWS Load Tester and pointed it at a Data Tap Function URL (i.e. at the Lambda function described above). It’s a bit unclear to me at what frequency a single “user” sends these packets, but the test used 10 containers with 20 users each, a 2 min ramp-up period, and a 2 min hold period.

The live scatter plot ramped steadily from 0 to 200 users and hovered around there, peaking close to 300. One could run many more of these tests, especially with a steeper ramp-up period to see how fast AWS Lambda really responds, and of course with many more users.

The Lambda that holds the Function URL is multi-AZ and runs inside a VPC. The per-request payload size was about 244 kB.

AWS Load Tester sample run with 2 min ramp-up and 2 min hold time, 244kB request size, 10 containers, 20 users per container

Here is an example snapshot of Data Taps realtime data metrics.

Stable and steady SLA!

The solution is stable, with excellent SLA and scale. You don’t have to over-provision an auto-scaling cluster to serve steep traffic peaks just because the container environment scales in and out a bit slowly. There are no clusters to worry about, no alarms to create, no DevOps people to keep around 24/7. It’s a single, tiniest AWS Lambda function that outperforms its big old brothers, and you can see its metrics on the AWS Console dashboard if you like.

You can also control the scale-out by using the reserved concurrency configuration option, if you like. Keep in mind that if you don’t set reserved concurrency, the stream intake Lambda will eat into the shared concurrency pool of the AWS account and region. By default this soft limit is 1,000 concurrent executions.
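
For example, with the AWS SDK v3 you can cap a function’s scale-out like this (the function name and the limit of 100 are example values only):

```typescript
// Cap the Tap's scale-out with reserved concurrency via the AWS SDK v3.
import { LambdaClient, PutFunctionConcurrencyCommand } from "@aws-sdk/client-lambda";

const lambda = new LambdaClient({});

await lambda.send(
  new PutFunctionConcurrencyCommand({
    FunctionName: "my-data-tap",        // placeholder function name
    ReservedConcurrentExecutions: 100,  // hard cap; also carved out of the account pool
  })
);
```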

Cost Calculations

The most costly part is uploading the data, as it takes more time the smaller the Lambda is. This is also the tricky part: you need to tune the maximum payload size (the flush threshold) to send to S3 against the size of the AWS Lambda function, as the memory size also determines the available network bandwidth. Within the 2 s budget the extension has, you still have plenty of room to process and flush the data.

The recipe for calculating costs is, for example, to use awslogs to download all the Data Tap AWS Lambda CloudWatch Logs and sum all the billedDurationMs entries (e.g. with DuckDB) to see how much cost the Lambda actually incurred. And don’t forget, especially in this case, the number of AWS Lambda invocations made ($0.20 per million requests).
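
As a back-of-envelope sketch of that calculation, using approximate public arm64 Lambda pricing at the time of writing (about $0.0000133334 per GB-second plus $0.20 per million requests; check the current price list before relying on the numbers):

```typescript
// Rough Lambda cost estimate from summed billedDurationMs and invocation count.
// Prices are approximate arm64 list prices and used for illustration only.
const billedMsTotal = 3_600_000;   // e.g. SUM(billedDurationMs) from CW Logs
const invocations = 1_000_000;     // number of Lambda invocations in the same period
const memoryGb = 128 / 1024;       // 128 MB function

const computeCost = (billedMsTotal / 1000) * memoryGb * 0.0000133334;
const requestCost = (invocations / 1_000_000) * 0.2;

console.log({ computeCost, requestCost, total: computeCost + requestCost });
// => roughly $0.006 of compute plus $0.20 of request charges for 3,600 s of billed time
```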

Finally, also add all the S3 API calls (e.g. with S3 Express) and the stored bytes of the CloudWatch Logs themselves (remember to set CW Logs retention).

You can deploy Data Taps yourself, run tests, and then calculate the total costs. You may not believe your eyes 👀. All the CW Logs are in JSON format and easy to process (e.g. with DuckDB). The Lambda function binary is included in the GitHub repository, along with the extension.

Achieving a more than 10x cost efficiency improvement over the already cost efficient AWS Firehose is quite incredible! Not to speak of the simplicity of a single AWS Lambda versus the complexity of setting up Firehose to land data as Parquet on S3, let alone doing that cross-account.

With Data Taps you can share the data ingestion rights with any other BoilingData user.

How about Security?

Data Taps is a BoilingData product. It implements and uses our C++ AWS Lambda runtime and handler, which use AWS Lambda Function URLs together with de facto standard JWT token based authentication and ACL (Access Control List) based access control.

Every incoming packet must contain an authorization header that identifies and authenticates the user. An environment variable holds the list of Data Taps users that are allowed to send data to the Tap. By default, the owner of the Tap is the only user allowed to ingest data.

Other Data Taps users in other AWS regions or AWS accounts can send data to you if you put their BoilingData account username (email address) into the ACL. It’s an easy way to stream data across borders, and the senders do not need to have an AWS account or know anything about AWS.
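
Conceptually, the check is small. Here is a hedged sketch of the idea; the real check lives in the C++ handler, and the environment variable name and the email claim are assumptions for illustration:

```typescript
// Conceptual ACL check: verify the JWT from the authorization header and make
// sure the sender's email is on the Tap's allow list. Illustrative only.
import jwt from "jsonwebtoken";

export function isAllowedSender(authorizationHeader: string, issuerPublicKey: string): boolean {
  const token = authorizationHeader.replace(/^Bearer\s+/i, "");

  // Verifies the signature and expiry; throws if the token is not valid.
  const claims = jwt.verify(token, issuerPublicKey) as { email?: string };

  // Hypothetical env var: comma separated list of allowed BoilingData usernames.
  const acl = (process.env.DATATAP_ALLOWED_USERS ?? "")
    .split(",")
    .map((s) => s.trim())
    .filter(Boolean);

  // By default only the Tap owner is on the list; sharing means adding more emails.
  return claims.email !== undefined && acl.includes(claims.email);
}
```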

Conclusion

Data Taps, our tailored C++ AWS Lambda runtime, handler, and extension accompanied by DuckDB, provides a simple and very powerful construct for data ingestion to S3. The implementation pattern and C++ code are so efficient that you don’t have to worry about the cost aspects.

AWS Lambda Function URL characteristics like TCP connection pooling and throttling, SSL/TLS offloading, and buffering make it possible to achieve stable, very low latency, and highly cost efficient streaming data processing with AWS Lambda.

AWS Lambda scalability is unparalleled, and together with Function URLs you can also control the maximum scale-out limits.

A single function can burst to 1,000 concurrent executions instantly. It can then scale by a further 1,000 concurrent executions every 10 seconds.

Data Taps is much simpler and much more cost efficient than, for example, its counterpart AWS Firehose, while also allowing you to define data transformations and filtering with SQL.

You can Deploy Data Taps into your own AWS Account!

Data Taps is BYOC (Bring Your Own (AWS) Cloud) ready. You can deploy it into your AWS account, and the data plane stays with you. You don’t need to worry about VPC peering or data plane routing across AZs, not to speak of load balancers.

We have an AWS SAM template which you can use to deploy Data Taps to your AWS account and start ingesting data to S3 right away.

For more details, please see Data Taps homepage: https://www.taps.boilingdata.com/

As a follow-up, check out bdcli, our command line tool that utilises a write-through cache on deployments into AWS to achieve deployment speeds you’re not used to! It can deploy Data Taps in a couple of seconds, so you don’t have to wait for minutes or half an hour while your deployment stack gets stuck for some reason and starts snoring.

You can use bdcli to fetch Data Tap tokens when you want to send data to Taps, either your own or other Taps shared with you (i.e. the Data Tap ACL includes your account email address).

You can use our JS SDK in the browser or on Node to acquire authorization tokens for sending data to your own and others’ Data Taps.
