Time to complete: 15 minutes
Prerequisites

Before beginning, you will need the following:

- A Mezmo account with Pipeline enabled (see the note in the next section if you don't have one)
In this workshop, we will be managing signals coming in from many simulated edge devices to scrub PII and clean them for downstream use.
To accomplish this we will:

1. Create a Pipeline fed by a simulated financial transaction Source
2. Tap the Pipeline to understand the structure of the data
3. Encrypt the PII and filter out unneeded fields
4. Route the cleaned data to S3 and deploy
In the end you are going to build a Pipeline that drops unnecessary information from events and encrypts a filtered subset to pass on to S3 for data engineers. By allowing for easy, granular control, you can ensure the right data ends up where it belongs.

2 - Create a Pipeline with Financial Events
Make sure you have a Mezmo account. As Pipeline is in technical preview, it may not be enabled off the bat. If you do not have an account, you can sign up for a free trial here, and if you don't have Pipeline enabled you can reach out to us at support@mezmo.com to get it set up.

Step 1: Create a new Pipeline

Once you have Pipeline enabled, go ahead and Create a new Pipeline. You will be prompted to name your Pipeline; call it what you will, but we will go with Edge Transaction to S3. After creation, you will be taken to a blank canvas.

Step 2: Add the Demo Financial Transaction Source

We are going to connect some Demo Logs that simulate financial transactions from edge devices. You can think of this as data from point-of-sale systems, payment processing devices, and the like. While you can connect many sources, we made it easy and simulate multiple edge devices streaming through one Source endpoint. Doing so is easy: click Add Source, and from there:

1. Select Demo Logs from the list
2. Give it the Title Edge Devices
3. Choose the financial demo log type
4. Click Save
3 - Tapping: Understand Your Data

Why It's Important

Fundamental to any task on flowing data is knowing its structure. While you can see the general format from the edge (i.e. terminal output from the device) or by digging into the code and databases, your team can now explore the structure of disparate events at scale in Mezmo. This is enabled in a Deployed pipeline via Tapping. To take advantage of this, we simply deploy and then tap in the Pipeline Monitoring view.

Step 1: Deploy the Pipeline

Let's do that now. To make this Pipeline live for tapping, we need to Deploy it. In the top right corner of the Pipeline view, select Deploy pipeline and accept the popup by selecting Deploy. Accept the warning about unconnected nodes and you will be taken to the Pipeline Monitoring view, where you can see high-level statistics and information on the data passing through. Note that it takes a couple of minutes for this information to update, so at first the view will look empty.

Step 2: Tap the Pipeline

To tap any node, simply hover over the right-hand side of the node and click the blue column that overlays it. This can only be done on a Deployed pipeline in the Monitoring view. A sidebar will slide out where you can select the number of events to grab. Leave it at 10 and select the blue Start Tap button to the right. You should begin to see events piling up.

You can expand and explore any event's structure by clicking on the triangle to the left of the event. As you can see, we have a couple of types of logs flowing through via different devices, but for this workshop the ones we care about contain financial transaction information (fear not, these aren't real CC numbers) and are of the form sketched below. The other events also contain datetime, device, event and buffer keys, but transaction is replaced by other unique details. We won't bother with those for this workshop.
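To make that shape concrete, here is a hedged sketch of what one financial transaction event might look like in the tap output. The nested field names match the paths used in the Encrypt and Filter section below; the values are invented for illustration:

```json
{
  "datetime": "2023-01-01T12:00:00Z",
  "device": "pos-terminal-0042",
  "event": "transaction",
  "buffer": "...",
  "transaction": {
    "result": true,
    "total_price": 42.99,
    "cc": {
      "cc_number": "4111111111111111",
      "cc_exp": "01/27",
      "cc_cvv": "123",
      "cc_zip": "90210",
      "cc_name": "JANE DOE"
    }
  }
}
```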
4 - Encrypt and Filter

Overview

There is a lot we may want to do with the data, as you may have seen while looking at live data. For this workshop we are going to encrypt PII, drop useless information from events, and then route the financial transaction data to a specific S3 bucket while sending that and everything else to the team's general S3 bucket. While we are going with S3 for this workshop, we have many other Destinations available today and others, like Mezmo Log Analysis, that are experimental. If you are interested in access to experimental features, reach out to your account representative or support@mezmo.com. But let's take this one step at a time.

Note on Editing a Deployed Pipeline: If you Deployed your pipeline in Tapping: Understand Your Data, then you need to go into Edit mode on your Pipeline. You can do this by selecting Edit pipeline in the top right corner of the Pipeline view.

Step 1: Drop the Unnecessary Buffer
We don't need the buffer, so let's drop it. First select Add Processor, which will pull up a dialog. Then:

1. Select Remove Fields from the list (docs)
2. Give it the Title Drop buffer
3. Add the field .buffer to drop it
4. Click Save

Step 2: Connect the Processor

Then connect this to the Source by hovering over the Source until you see a gray half circle. Click and drag to the right edge of the Drop buffer node. Release the mouse and the two are linked up. Data will now flow left to right, from the Source to the Processor. Also note that nodes will rearrange themselves as you go.

You may have noticed we are referencing the buffer key via .buffer. This syntax is slightly different than you may be used to, but it is very straightforward. To learn more, check out our docs here.
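For intuition only (this is not Mezmo's implementation), the Remove Fields processor configured above behaves like deleting the top-level buffer key from each event. A minimal Python sketch, assuming events are plain dicts:

```python
def drop_buffer(event: dict) -> dict:
    """Mimic a Remove Fields processor configured with the path .buffer."""
    event.pop("buffer", None)  # delete the key if present; ignore if absent
    return event

# Example: the buffer key is gone, everything else passes through untouched.
cleaned = drop_buffer({"device": "pos-0042", "buffer": "xxxx", "event": "transaction"})
assert "buffer" not in cleaned
```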
Step 3: Route Transaction Data

We want to send only the transaction events to S3. To do this we can use a Route processor (docs). Go ahead and add one with the Title Transactions.

We could group successful and failed transactions on .transaction.result alone, but let's separate the routes. To do this, we will create two Outputs.

For the first route:

1. Name it Transaction Success
2. Add the condition .transaction.result equals true
3. Add the condition .transaction.total_price is greater_or_equal to 0 via Add Expression

Similarly, for the second output:

1. Name it Transaction Fail
2. Add the condition .transaction.result equals false
3. Keep the .transaction.total_price expression from above

Click Save. Connect the Drop buffer processor from Step 1 to the Route processor you just created.

Note that we will leave the Unmatched route untouched for this workshop, but there are many things that could be done with this data: send it to Log Analysis, send it to a SIEM, etc.
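Conceptually, the Route processor checks each Output's conditions and sends a matching event down that edge; anything that matches no Output falls through to Unmatched. A hedged Python sketch of the two Outputs defined above:

```python
def route_transactions(event: dict) -> str:
    """Approximate the Transactions route processor from this step."""
    txn = event.get("transaction") or {}
    result = txn.get("result")
    total = txn.get("total_price")

    valid_total = isinstance(total, (int, float)) and total >= 0
    if result is True and valid_total:
        return "Transaction Success"
    if result is False and valid_total:
        return "Transaction Fail"
    return "Unmatched"  # e.g. non-transaction events from other devices

assert route_transactions({"transaction": {"result": True, "total_price": 9.99}}) == "Transaction Success"
assert route_transactions({"device": "sensor-7"}) == "Unmatched"
```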
Step 4: Encrypt the Credit Card Information

Now, let's encrypt each of the credit card fields individually to ensure security and compliance. The fields we want to encrypt are:

- .transaction.cc.cc_number
- .transaction.cc.cc_exp
- .transaction.cc.cc_cvv
- .transaction.cc.cc_zip
- .transaction.cc.cc_name

Since each is unique, order doesn't matter much here. For each field:

1. Add an Encrypt Field processor (docs)
2. Use the AES-256-CFB algorithm with a 32 character Encryption Key (check out AllKeysGenerator.com to generate each key)
3. Enable the Initialization Vector and name it whatever you like. Note that every encryption processor needs to add a key like this to the event itself for decryption down the road
4. Click Save

Once you do this for each of the above fields (or don't, it's just a demo pipeline after all), you should have 5 floating processors. Now, connect each one sequentially and then link the Transaction Fail and Transaction Success routes to the first processor in this group in parallel. Now that the transformations have been defined, it's time to sync this all up to S3 and start gathering data.
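To build intuition for what each Encrypt Field processor is doing, here is a minimal Python sketch of AES-256-CFB with a 32-character key and a per-event initialization vector, using the cryptography package. This illustrates the algorithm only, not Mezmo's implementation; the hard-coded .transaction.cc path is from this step, and the cc_number_iv field name is a hypothetical choice for the IV:

```python
# pip install cryptography  -- illustrative only, not Mezmo's implementation
import base64
import os

from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

KEY = b"0123456789abcdef0123456789abcdef"  # 32 characters -> AES-256 (demo key only)

def encrypt_cc_field(event: dict, field: str, iv_field: str) -> dict:
    """Encrypt one field under .transaction.cc, storing the IV on the event
    so the value can be decrypted downstream (as noted in this step)."""
    cc = event["transaction"]["cc"]
    iv = os.urandom(16)  # fresh random IV per event
    encryptor = Cipher(algorithms.AES(KEY), modes.CFB(iv)).encryptor()
    ciphertext = encryptor.update(cc[field].encode()) + encryptor.finalize()
    cc[field] = base64.b64encode(ciphertext).decode()
    cc[iv_field] = base64.b64encode(iv).decode()  # hypothetical IV field name
    return event

event = {"transaction": {"cc": {"cc_number": "4111111111111111"}}}
encrypt_cc_field(event, "cc_number", "cc_number_iv")
print(event["transaction"]["cc"])  # cc_number is now ciphertext, cc_number_iv added
```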
5 - Connect to S3 and Deploy

Step 1: Add S3 Financial Destination

With our data cleaned, we could get fancy with how we route the financial transactions here (see the workshop on S3 to Snowflake to learn more), but we will keep it simple for now. Let's dump all this data into a single S3 bucket for our data engineering teams. You will need the following information from your AWS account:

- AWS Access Key ID
- AWS Secret Access Key
- AWS Bucket Name
- AWS Region

Note: The IAM User associated with that Access Key ID needs to have at least the PutObject privilege in AWS, and you must create the Bucket with that Name in AWS yourself. For more details, check out the IAM and S3 section of the S3 to Snowflake workshop.
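As a hedged example, a minimal IAM policy granting just that privilege on the workshop bucket might look like the following (the bucket name is the one chosen below; adjust to yours):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::mezmo-pipeline-financial-transactions/*"
    }
  ]
}
```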
With those in hand, add a new Destination and select AWS S3 (docs):

1. Title it S3 Fin Transactions
2. Enter your Access Key ID and Secret Access Key
3. Enter the Bucket name (we will go with mezmo-pipeline-financial-transactions)
4. Select JSON for the Encoding
5. Select your Region (we will go with us-east-1)
6. Click Save

Then, connect the last Encryption Processor for the CC data to this destination.
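If you want to double-check the credentials and bucket before wiring up the Destination, this short hedged boto3 snippet (a convenience from your side, not part of the workshop UI flow) exercises the same PutObject privilege the Pipeline needs:

```python
# pip install boto3
import boto3

s3 = boto3.client(
    "s3",
    aws_access_key_id="YOUR_ACCESS_KEY_ID",        # placeholders: use your values
    aws_secret_access_key="YOUR_SECRET_ACCESS_KEY",
    region_name="us-east-1",
)

# PutObject is the one privilege the destination needs; try a tiny test write.
s3.put_object(
    Bucket="mezmo-pipeline-financial-transactions",
    Key="mezmo-smoke-test.txt",
    Body=b"pipeline destination smoke test",
)
print("PutObject succeeded; the Pipeline destination should work too.")
```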
Step 2: Add S3 General Destination

The last step before deploying is to funnel both the cleaned data and the events that were unmatched to the team's general S3 bucket. Follow a similar procedure to Step 1, but this time create the Destination with a new bucket (say mezmo-pipeline-financial-all) and a new name of S3 General.

Once done, connect that Destination to the same final Encryption Processor from Step 1 as well as the Unmatched route from earlier.

Step 3: Deploy
Now, simply select Deploy pipeline in the top right. Afterwards, the Pipeline should no longer be a draft (if you hadn't deployed earlier). Watch as data comes into both S3 buckets: looking at the bucket connected to S3 Fin Transactions, you should begin seeing files arrive. Note that it can take up to 5 minutes to first see data flowing into S3. This is due to batching and our durable queues; no data will be dropped.

6 - Next Steps
Recap

We have successfully connected a fleet of simulated devices to a Mezmo Pipeline to clean, encrypt, and route an important subset of events to S3 for later analysis.
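As a hedged illustration of what the data engineers might do downstream, here is a sketch that reads one delivered JSON object back out of S3 and decrypts a card number. It assumes newline-delimited JSON events, the hypothetical object key shown, the 32-character key you configured, and the cc_number_iv field name from the encryption sketch earlier:

```python
# pip install boto3 cryptography  -- downstream consumer sketch, not part of the Pipeline
import base64
import json

import boto3
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

KEY = b"0123456789abcdef0123456789abcdef"  # the 32-character Encryption Key you configured

s3 = boto3.client("s3")
obj = s3.get_object(
    Bucket="mezmo-pipeline-financial-transactions",
    Key="some/batch/file.json",  # hypothetical key; list the bucket for real names
)

for line in obj["Body"].iter_lines():  # assumes one JSON event per line
    event = json.loads(line)
    cc = event["transaction"]["cc"]
    iv = base64.b64decode(cc["cc_number_iv"])  # IV stored on the event by the processor
    decryptor = Cipher(algorithms.AES(KEY), modes.CFB(iv)).decryptor()
    number = decryptor.update(base64.b64decode(cc["cc_number"])) + decryptor.finalize()
    print(number.decode())
```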
Learn More

So we have the data encrypted and the relevant events separated in our S3 buckets. But now what? We always recommend peeking at the docs, but if you feel like exploring more through workshops, check out our Dynamic S3 to Snowflake Ingestion workshop to learn how to organize data dynamically and get the transactions into a data warehouse for further analysis. Or take a peek at our Mezmo Platform workshop to utilize OpenTelemetry and find other ways you can take advantage of events holistically on the Mezmo Platform through our Log Analysis add-on.