Encrypt, Drop and Route to S3 Workshop

In this workshop, you will collect incoming financial transactions from edge devices, encrypt them in motion, filter and reduce them, and pass the information downstream to S3. This is a common scenario in retail, banking and other industries, with many complexities to consider. A simplified version is presented here to illustrate the fundamentals of the Mezmo Telemetry Pipeline (Tech Preview).

Time to complete: 15 minutes

1 - Getting Started

Overview

In this workshop, we will manage signals coming in from many simulated edge devices, scrubbing PII and cleaning the events for downstream use.

To accomplish this, we will:

  • Create a new Pipeline
  • Configure a Source to receive the data
  • Drop useless information from events
  • Filter unnecessary events
  • Encrypt PII in motion
  • Store required data in S3 by type

Final Product

In the end, you are going to build a Pipeline that looks like this:

Final Pipeline

This pipeline will drop unnecessary information from events, encrypt a filtered subset, and pass that subset on to S3 for data engineers. By allowing easy, granular control, you can ensure the right data ends up where it belongs.

2 - Create a Pipeline with Financial Events

Step 1: Create a new Pipeline

Once you have Pipeline enabled, go ahead and Create a new Pipeline. You will be prompted to name your Pipeline; call it what you will, but we will go with Edge Transaction to S3. After creation, you will be taken to the following blank canvas:

Blank Pipeline

Step 2: Add the Demo Financial Transaction Source

We are going to connect up some Demo Logs that simulate financial transactions from edge devices. You can think of this as data from Point-of-Sale systems, payment processing devices, etc. While you could connect many Sources, we have made it easy by simulating multiple edge devices streaming through one Source endpoint.

Doing so is easy: click Add Source

Add Source

And from there

  • Select Demo Logs
  • Give it a Title like Edge Devices
  • Set Format to financial
  • Click Save

Added Source

3 - Tapping: Understand Your Data

Why It’s Important

Fundamental to any task on flowing data is knowing its structure. While you can see the general format from the edge (i.e., terminal output from the device) or by digging into the code or databases, your team can now explore the structure of disparate events at scale in Mezmo.

This is enabled in a Deployed pipeline via Tapping. To take advantage of this, we simply deploy and then tap in the Monitored Pipeline view.

Let’s do that now.

Step 1: Deploy the Pipeline

To make this Pipeline live for tapping, we need to Deploy it. In the top right corner of the Pipeline view, select Deploy pipeline and accept the popup by selecting Deploy.

Deploy Pipeline

Accept the warning about unconnected nodes and it will take you to the Pipeline Monitoring view, where you can see high-level statistics and information on the data passing through. Note that it will take a couple of minutes to update the information, so at first it will look empty. But eventually, it should look something like this:

Monitoring View

Step 2: Tap the Pipeline

To tap any node, we simply hover over the right-hand side of the node and click the blue column that overlays it. This can only be done on a Deployed pipeline in the Monitoring view.

Insert Data Tap

A sidebar will slide out where you can select the number of events to be grabbed. Leave it at 10 and select the blue Start Tap button to the right. You should begin to see events piling up like below.

Tap Play Button

You can expand and explore any event’s structure by clicking on the triangle to the left of the event. As you can see, we have a couple of types of logs flowing through from different devices. For this workshop, the ones we care about contain financial transaction information (fear not, these aren’t real credit card numbers) and are of the following form:

Tap Structure Exploration

The other events also contain datetime, device, event and buffer, but transaction is replaced by other unique details. We won’t bother with those for this workshop.
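
To make that concrete, here is a rough sketch of one transaction event. The field names are the ones used throughout this workshop; the values are invented for illustration, so your tap will differ in the details.

```python
# Hypothetical shape of one simulated transaction event (values are made up).
sample_event = {
    "datetime": "2023-06-01T12:34:56Z",   # assumed timestamp format
    "device": "pos-terminal-042",         # assumed device identifier
    "event": "transaction",               # assumed event label
    "buffer": "0f8a33c1",                 # filler we will drop in the next section
    "transaction": {
        "result": True,                   # matched as true/false by the Route processor later
        "total_price": 42.50,
        "cc": {                           # simulated card data, not real
            "cc_number": "4111 1111 1111 1111",
            "cc_exp": "12/27",
            "cc_cvv": "123",
            "cc_zip": "10001",
            "cc_name": "Jane Doe",
        },
    },
}
```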

4 - Encrypt and Filter

Overview

There is a lot we may want to do with this data, as you may have seen while looking at the live events.

For this workshop, we are going to encrypt PII, drop useless information from events, and then route the financial transaction data to a specific S3 bucket while sending that and everything else to the team’s general S3 bucket.

While we are going with S3 for this workshop, we have many other Destinations available today, and others, like Mezmo Log Analysis, that are experimental. If you are interested in access to experimental features, reach out to your account representative or support@mezmo.com.

But, let’s take this one step at a time.

Step 1: Drop the Unnecessary Buffer

We don’t need the buffer, so let’s drop it. First select Add Processor, which will pull up a dialog like this:

Add Processor List

  • Select Remove Fields from the list (docs)
  • Give it a title like Drop buffer
  • Enter the field .buffer to drop it
  • Click Save.

Remove Fields Processor

Then connect this Processor to the Source by hovering over the Source until you see a gray half circle. Click and drag to the right edge of the Drop buffer node, then release the mouse and the two are linked up. Data will now flow left to right, from the Source to the Processor. Also note that the nodes will rearrange themselves as you go.

Drop Processor Connection

You may have noticed we are referencing the buffer key via .buffer. This syntax is slightly different than what you may be used to, but it’s very straightforward. To learn more, check out our docs here.
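
If it helps to think of it in code, the Remove Fields processor conceptually just deletes the referenced path from every event that passes through. A minimal Python sketch of the idea (an illustration only, not Mezmo’s implementation):

```python
def drop_buffer(event: dict) -> dict:
    """Conceptual equivalent of a Remove Fields processor configured with `.buffer`."""
    event.pop("buffer", None)  # remove the top-level buffer key if it exists
    return event
```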

Step 2: Route Transaction Data

We want to send only the transaction events to S3; to do this, we can use a Route processor (docs). Go ahead and add one with the Title Transactions.

We could group successful and failed transactions (.transaction.result) together, but let’s separate the routes. To do this, we will create two Outputs.

For the first route:

  • Give it the name Transaction Success
  • Select an IF and enter .transaction.result equals true
  • To weed out any anomalies for later analysis, let’s also ensure .transaction.total_price is greater_or_equal to 0 via Add Expression.

Route: Success

Similarly, for the second output:

  • Select Add route and enter the name Transaction Fail
  • Configure the IF with .transaction.result equals false
  • Eliminate anomalies with the .transaction.total_price expression from above.

Click Save.

Route: Failed
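
To make the routing we just configured concrete, here is a small Python sketch of the same conditions (illustrative only; the real evaluation happens inside the Route processor, and non-matching events fall through to Unmatched):

```python
def route(event: dict) -> str:
    """Mirror of the two Route Outputs plus the Unmatched fallthrough (illustration only)."""
    txn = event.get("transaction")
    if txn is not None and txn.get("total_price", -1) >= 0:  # weed out anomalies
        if txn.get("result") is True:
            return "Transaction Success"
        if txn.get("result") is False:
            return "Transaction Fail"
    return "Unmatched"  # everything else, left untouched in this workshop
```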

Connect the Drop Buffer processor from Step 1 to the Route processor you just created.

Route: Connected

Note that we will leave the Unmatched route untouched for this workshop. But there are many things that could be done with this data: send to Log Analytics, send to a SIEM, etc.

Step 3: Encrypt the Credit Card Information

Now, let’s encrypt each of the credit card fields individually to ensure security and compliance. The fields we want to encrypt are

  • .transaction.cc.cc_number
  • .transaction.cc.cc_exp
  • .transaction.cc.cc_cvv
  • .transaction.cc.cc_zip
  • .transaction.cc.cc_name

Since each field is unique, order doesn’t matter much here. For each:

  • Add an Encrypt Field processor (docs)
  • Choose the AES-256-CFB algorithm with a 32-character Encryption Key (check out AllKeysGenerator.com to generate each key)
  • Add an Initialization Vector and name it whatever you like. Note that every encryption processor adds a field like this to the event itself so the value can be decrypted down the road.
  • Click Save.

Encrypt CC Number Dialog
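
If you are curious what AES-256-CFB with a per-event Initialization Vector looks like in practice, here is a rough Python sketch using the cryptography package. This is not Mezmo’s code, and the IV field name is just an example; it only illustrates why a 32-character key and a stored IV are needed for later decryption.

```python
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

def encrypt_cc_number(event: dict, key: bytes) -> dict:
    """Illustration of one Encrypt Field processor: AES-256-CFB with a random IV."""
    assert len(key) == 32                       # a 32-character key gives AES-256
    iv = os.urandom(16)                         # fresh Initialization Vector per event
    encryptor = Cipher(algorithms.AES(key), modes.CFB(iv)).encryptor()
    plaintext = event["transaction"]["cc"]["cc_number"].encode()
    ciphertext = encryptor.update(plaintext) + encryptor.finalize()
    event["transaction"]["cc"]["cc_number"] = ciphertext.hex()
    event["cc_number_iv"] = iv.hex()            # example IV field kept on the event for decryption
    return event
```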

Once you do this for each of the above fields (or don’t, it’s just a demo pipeline after all), you should have 5 floating processors like so

Encrypt CC: Unconnected

Now, connect each one sequentially, and then link the fail and success routes in parallel to the first processor in this group. The Pipeline should now look similar to this:

Encrypt CC: Connected

Now that the transformations have been defined, it’s time to hook this all up to S3 and start gathering data.

5 - Connect to S3 and Deploy

Step 1: Add S3 Financial Destination

With our data cleaned, we could get fancy with how we route the financial transactions here (see the workshop on S3 to Snowflake to learn more), but we will keep it simple for now. Let’s dump all this data into a single S3 bucket for our data engineering teams.

You will need the following information from your AWS account:

  • AWS Access Key ID
  • AWS Secret Access Key
  • AWS Bucket Name
  • AWS Region

With those in hand, add a new Destination and select AWS S3 (docs)

  • Give the title S3 Fin Transactions
  • Enter your Access Key ID
  • Enter your Secret Access Key
  • Enter your Bucket name (we will go with mezmo-pipeline-financial-transactions)
  • Select JSON for the Encoding
  • Enter your Region (we will go with us-east-1)
  • Click Save when yours looks similar to the image below.

S3 Financial Destination Definition

Then, connect up the last Encryption Processor for the CC data to this destination like so

S3 Financial Destination Connected

Step 2: Add S3 General Destination

The last step before deploying is to funnel the cleaned data, and those events that were unmatched, to the team’s general S3 bucket.

Follow a similar procedure to step one, but this time create it using a new bucket (say mezmo-pipeline-financial-all) and a new name of S3 General.

Once done, connect up that Destination to the same final Encryption Processor from Step 1 as well as the Unmatched Route from earlier. You should end up with something like this

S3 General Destination Connected

Step 3: Deploy

Now, simply Deploy pipeline in the top right. Afterwards, the Pipeline should no longer be a draft (if you hadn’t deployed it earlier) and should look like this

Final Pipeline Deployed

Watch as data comes into both S3 buckets. Looking at the bucket connected to S3 Fin Transactions, you should begin seeing files like so

S3 Data in AWS

Note that it will take up to 5 minutes to first see data flowing into S3. This is due to batching and our durable queues; no data will be dropped.
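
If you prefer to verify from code rather than the AWS console, a quick boto3 check like the one below will list what has landed so far. It assumes your AWS credentials are already configured locally and uses the bucket name and region chosen earlier in this workshop.

```python
import boto3

# Assumes AWS credentials are configured (environment variables, ~/.aws, etc.).
s3 = boto3.client("s3", region_name="us-east-1")
response = s3.list_objects_v2(Bucket="mezmo-pipeline-financial-transactions")

for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])
```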

6 - Next Steps

Recap

We have successfully connected a fleet of simulated devices to a Mezmo Pipeline to clean, encrypt and route an important subset of events to S3 for later analysis. You should have a Pipeline that looks like this:

Final Pipeline

Learn More

So we have the data encrypted and the relevant events separated in S3. But now what?

We always recommend peeking at the docs, but if you feel like exploring more through workshops, check out our Dynamic S3 to Snowflake Ingestion workshop to learn how to organize data dynamically and get the transactions into a data warehouse for further analysis. Or take a peek at our Mezmo Platform workshop to utilize OpenTelemetry and find other ways you can take advantage of events holistically on the Mezmo Platform through our Log Analysis add-on.