Encrypt, Drop and Route to S3 Workshop

In this workshop, you will collect incoming financial transactions from edge devices, encrypt them in motion, filter and reduce them, and pass the information downstream to S3. This is a common scenario in retail, banking and other industries, with many complexities to consider. A simplified version is presented here to illustrate the fundamentals of the Mezmo Telemetry Pipeline (Tech Preview).

Time to complete: 15 minutes

1 - Getting Started

Overview

In this workshop, we will manage signals coming in from many simulated edge devices, scrubbing PII and cleaning the events for downstream use.

To accomplish this, we will:

  • Create a new Pipeline
  • Configure a Source to receive the data
  • Drop useless information from events
  • Filter unnecessary events
  • Encrypt PII in motion
  • Store required data in S3 by type

Final Product

In the end, you are going to build a Pipeline that looks like this:

Final Pipeline

This pipeline will drop unnecessary information from events, encrypt a filtered subset, and pass that subset on to S3 for data engineers. By allowing easy, granular control, you can ensure the right data ends up where it belongs.

2 - Create a Pipeline with Financial Events

Step 1: Create a new Pipeline

Once you have Pipeline enabled, go ahead and Create a new Pipeline. You will be prompted to name your Pipeline; call it what you will, but we will go with Edge Transaction to S3. After creation, you will be taken to the following blank canvas:

Blank Pipeline

Step 2: Add the Demo Financial Transaction Source

We are going to connect up some Demo Logs that simulate financial transactions from edge devices. You can think of this as data from Point-of-Sale systems, payment processing devices, etc. While you could connect many Sources, we have made it easy by simulating multiple edge devices streaming through one Source endpoint.

Doing so is easy: click Add Source

Add Source

And from there

  • Select Demo Logs
  • Give it a Title like Edge Devices
  • Set Format to financial
  • Click Save

Added Source

3 - Tapping: Understand Your Data

Why It’s Important

Fundamental to any task on flowing data is knowing its structure. While you can see the general format from the edge (i.e., terminal output from the device) or by digging into the code or databases, your team can now explore the structure of disparate events at scale in Mezmo.

This is enabled in a Deployed pipeline via Tapping. To take advantage of this, we simply deploy and then tap in the Monitored Pipeline view.

Let’s do that now.

Step 1: Deploy the Pipeline

To make this Pipeline live for tapping, we need to Deploy it. In the top right corner of the Pipeline view, select Deploy pipeline and accept the popup by selecting Deploy.

Deploy Pipeline

Accept the warning about unconnected nodes and it will take you to the Pipeline Monitoring view, where you can see high-level statistics and information on the data passing through. Note that it will take a couple of minutes to update the information, so at first it will look empty. But eventually, it should look something like this:

Monitoring View

Step 2: Tap the Pipeline

To tap any node, we simply hover over the right-hand side of the node and click the blue column that overlays it. This can only be done on a Deployed pipeline in the Monitoring view.

Insert Data Tap

A sidebar will slide out where you can select the number of events to be grabbed. Leave it at 10 and select the blue Start Tap button to the right. You should begin to see events piling up like below.

Tap Play Button

You can expand and explore any event’s structure by clicking on the triangle to the left of the event. As you can see, we have a couple of types of logs flowing through from different devices. For this workshop, the ones we care about contain financial transaction information (fear not, these aren’t real credit card numbers) and are of the following form:

Tap Structure Exploration

The other events also contain datetime, device, event and buffer, but transaction is replaced by other unique details. We won’t bother with those for this workshop.
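
To make that concrete, here is a rough sketch of one transaction event. The field names are the ones used throughout this workshop; the values are invented for illustration, so your tap will differ in the details.

```python
# Hypothetical shape of one simulated transaction event (values are made up).
sample_event = {
    "datetime": "2023-06-01T12:34:56Z",   # assumed timestamp format
    "device": "pos-terminal-042",         # assumed device identifier
    "event": "transaction",               # assumed event label
    "buffer": "0f8a33c1",                 # filler we will drop in the next section
    "transaction": {
        "result": True,                   # matched as true/false by the Route processor later
        "total_price": 42.50,
        "cc": {                           # simulated card data, not real
            "cc_number": "4111 1111 1111 1111",
            "cc_exp": "12/27",
            "cc_cvv": "123",
            "cc_zip": "10001",
            "cc_name": "Jane Doe",
        },
    },
}
```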

4 - Encrypt and Filter

Overview

There is a lot we may want to do with this data, as you may have seen while looking at the live events.

For this workshop, we are going to encrypt PII, drop useless information from events, and then route the financial transaction data to a specific S3 bucket while sending that and everything else to the team’s general S3 bucket.

While we are going with S3 for this workshop, we have many other Destinations available today, and others, like Mezmo Log Analysis, that are experimental. If you are interested in access to experimental features, reach out to your account representative or support@mezmo.com.

But, let’s take this one step at a time.

Step 1: Drop the Unnecessary Buffer

We don’t need the buffer, so let’s drop it. First select Add Processor, which will pull up a dialog like this:

Add Processor List

  • Select Remove Fields from the list (docs)
  • Give it a title like Drop buffer
  • Enter the field .buffer to drop it
  • Click Save.

Remove Fields Processor

Then connect this Processor to the Source by hovering over the Source until you see a gray half circle. Click and drag to the right edge of the Drop buffer node, then release the mouse and the two are linked up. Data will now flow left to right, from the Source to the Processor. Also note that the nodes will rearrange themselves as you go.

Drop Processor Connection

You may have noticed we are referencing the buffer key via .buffer. This syntax is slightly different than what you may be used to, but it’s very straightforward. To learn more, check out our docs here.
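
If it helps to think of it in code, the Remove Fields processor conceptually just deletes the referenced path from every event that passes through. A minimal Python sketch of the idea (an illustration only, not Mezmo’s implementation):

```python
def drop_buffer(event: dict) -> dict:
    """Conceptual equivalent of a Remove Fields processor configured with `.buffer`."""
    event.pop("buffer", None)  # remove the top-level buffer key if it exists
    return event
```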

Step 2: Route Transaction Data

We want to send only the transaction events to S3; to do this, we can use a Route processor (docs). Go ahead and add one with the Title Transactions.

We could group successful and failed transactions (.transaction.result) together, but let’s separate the routes. To do this, we will create two Outputs.

For the first route:

  • Give it the name Transaction Success
  • Select an IF and enter .transaction.result equals true
  • To weed out any anomalies for later analysis, let’s also ensure .transaction.total_price is greater_or_equal to 0 via Add Expression.

Route: Success

Similarly, for the second output:

  • Select Add route and enter the name Transaction Fail
  • Configure the IF with .transaction.result equals false
  • Eliminate anomalies with the .transaction.total_price expression from above.

Click Save.

Route: Failed
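
To make the routing we just configured concrete, here is a small Python sketch of the same conditions (illustrative only; the real evaluation happens inside the Route processor, and non-matching events fall through to Unmatched):

```python
def route(event: dict) -> str:
    """Mirror of the two Route Outputs plus the Unmatched fallthrough (illustration only)."""
    txn = event.get("transaction")
    if txn is not None and txn.get("total_price", -1) >= 0:  # weed out anomalies
        if txn.get("result") is True:
            return "Transaction Success"
        if txn.get("result") is False:
            return "Transaction Fail"
    return "Unmatched"  # everything else, left untouched in this workshop
```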

Connect the Drop Buffer processor from Step 1 to the Route processor you just created.

Route: Connected

Note that we will leave the Unmatched route untouched for this workshop. But there are many things that could be done with this data: send to Log Analytics, send to a SIEM, etc.

Step 3: Encrypt the Credit Card Information

Now, let’s encrypt each of the credit card fields individually to ensure security and compliance. The fields we want to encrypt are

  • .transaction.cc.cc_number
  • .transaction.cc.cc_exp
  • .transaction.cc.cc_cvv
  • .transaction.cc.cc_zip
  • .transaction.cc.cc_name

Since each field is unique, order doesn’t matter much here. For each:

  • Add an Encrypt Field processor (docs)
  • Choose the AES-256-CFB algorithm with a 32-character Encryption Key (check out AllKeysGenerator.com to generate each key)
  • Add an Initialization Vector and name it whatever you like. Note that every encryption processor adds a field like this to the event itself so the value can be decrypted down the road.
  • Click Save.

Encrypt CC Number Dialog
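
If you are curious what AES-256-CFB with a per-event Initialization Vector looks like in practice, here is a rough Python sketch using the cryptography package. This is not Mezmo’s code, and the IV field name is just an example; it only illustrates why a 32-character key and a stored IV are needed for later decryption.

```python
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def encrypt_cc_number(event: dict, key: bytes) -> dict:
    """Illustration of one Encrypt Field processor: AES-256-CFB with a random IV."""
    assert len(key) == 32                       # a 32-character key gives AES-256
    iv = os.urandom(16)                         # fresh Initialization Vector per event
    encryptor = Cipher(algorithms.AES(key), modes.CFB(iv)).encryptor()
    plaintext = event["transaction"]["cc"]["cc_number"].encode()
    ciphertext = encryptor.update(plaintext) + encryptor.finalize()
    event["transaction"]["cc"]["cc_number"] = ciphertext.hex()
    event["cc_number_iv"] = iv.hex()            # example IV field kept on the event for decryption
    return event
```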

Once you do this for each of the above fields (or don’t, it’s just a demo pipeline after all), you should have 5 floating processors like so

Encrypt CC: Unconnected

Now, connect each one sequentially, and then link the fail and success routes in parallel to the first processor in this group. The Pipeline should now look similar to this:

Encrypt CC: Connected

Now that the transformations have been defined, it’s time to hook this all up to S3 and start gathering data.

5 - Connect to S3 and Deploy

Step 1: Add S3 Financial Destination

With our data cleaned, we could get fancy with how we route the financial transactions here (see the workshop on S3 to Snowflake to learn more), but we will keep it simple for now. Let’s dump all this data into a single S3 bucket for our data engineering teams.

You will need the following information from your AWS account:

  • AWS Access Key ID
  • AWS Secret Access Key
  • AWS Bucket Name
  • AWS Region

With those in hand, add a new Destination and select AWS S3 (docs)

  • Give the title S3 Fin Transactions
  • Enter your Access Key ID
  • Enter your Secret Access Key
  • Enter your Bucket name (we will go with mezmo-pipeline-financial-transactions)
  • Select JSON for the Encoding
  • Enter your Region (we will go with us-east-1)
  • Click Save when yours looks similar to the image below.

S3 Financial Destination Definition

Then, connect up the last Encryption Processor for the CC data to this destination like so

S3 Financial Destination Connected

Step 2: Add S3 General Destination

The last step before deploying is to funnel the cleaned data, and those events that were unmatched, to the team’s general S3 bucket.

Follow a similar procedure to step one, but this time create it using a new bucket (say mezmo-pipeline-financial-all) and a new name of S3 General.

Once done, connect up that Destination to the same final Encryption Processor from Step 1 as well as the Unmatched Route from earlier. You should end up with something like this

S3 General Destination Connected

Step 3: Deploy

Now, simply Deploy pipeline in the top right. Afterwards, the Pipeline should no longer be a draft (if you hadn’t deployed it earlier) and should look like this

Final Pipeline Deployed

Watch as data comes into both S3 buckets. Looking at the bucket connected to S3 Fin Transactions, you should begin seeing files like so

S3 Data in AWS

Note that it will take up to 5 minutes to first see data flowing into S3. This is due to batching and our durable queues; no data will be dropped.
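
If you prefer to verify from code rather than the AWS console, a quick boto3 check like the one below will list what has landed so far. It assumes your AWS credentials are already configured locally and uses the bucket name and region chosen earlier in this workshop.

```python
import boto3

# Assumes AWS credentials are configured (environment variables, ~/.aws, etc.).
s3 = boto3.client("s3", region_name="us-east-1")
response = s3.list_objects_v2(Bucket="mezmo-pipeline-financial-transactions")

for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])
```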

6 - Next Steps

Recap

We have successfully connected a fleet of simulated devices to a Mezmo Pipeline to clean, encrypt and route an important subset of events to S3 for later analysis. You should have a Pipeline that looks like this:

Final Pipeline

Learn More

So we have the data encrypted and the relevant events separated in S3. But now what?

We always recommend peeking at the docs, but if you feel like exploring more through workshops, check out our Dynamic S3 to Snowflake Ingestion workshop to learn how to organize data dynamically and get the transactions into a data warehouse for further analysis. Or take a peek at our Mezmo Platform workshop to utilize OpenTelemetry and find other ways you can take advantage of events holistically on the Mezmo Platform through our Log Analysis add-on.