We've launched Data Streams!

Today we are happy to announce Data Streams: with SegmentStream, you can now collect all types of real-time events into your own Google BigQuery data warehouse!

What is a Data Stream

A Data Stream is a universal way to collect any real-time data directly into your Google BigQuery data warehouse. This includes both client-side and server-side event data.

A few sample usages:

  • Automatically intercept all Google Analytics hits and duplicate them to SegmentStream, so you can collect raw hit-level data in Google BigQuery without limits or sampling.
  • Track and analyse AMP pages using non-script tracking pixels.
  • Track views of your media ads by embedding a tracking pixel that collects data about ad views directly into SegmentStream and Google BigQuery.
  • Collect CRM/ERP/ESP webhook data from third-party systems.
  • Send real-time server-side events into Google BigQuery.

How Data Streams work

In a nutshell, a Data Stream is a unique SegmentStream endpoint to which you can send any data. All endpoints have the following format:

https://track.segmentstream.com/ds/<DATA_STREAM_ID>

Here <DATA_STREAM_ID> is the unique ID of your Data Stream, which is connected to a specific BigQuery table.

You can send data into this endpoint in any format using:

  1. Simple HTTP request:
    curl -d "event=Lead&value=3500&currency=USD" \
      "https://track.segmentstream.com/ds/<DATA_STREAM_ID>"
    
  2. HTTP request with JSON body:

    curl -X POST -H "Content-Type: application/json" \
      -d '{"event":"Lead","value":3500,"currency":"USD"}' \
      "https://track.segmentstream.com/ds/<DATA_STREAM_ID>"
    
  3. Tracking pixel (for AMP pages or ad impressions tracking):

    <img height="1" width="1" style="display:none;" src="https://track.segmentstream.com/ds/<DATA_STREAM_ID>?event=ad_view&placement=DV360&type=banner" />
    
  4. JavaScript:

    jQuery.post("https://track.segmentstream.com/ds/<DATA_STREAM_ID>", { 
      event: "Lead",
      value: 3500,
      currency: "USD"
    });
    

Regardless of which method you use to send data to the Data Stream endpoint, SegmentStream guarantees that the data will be recorded in the Google BigQuery table specified for that Data Stream.
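
If jQuery is not available, the same requests can be put together with standard JavaScript. The following is a minimal sketch, not SegmentStream's own client code: the helper names (endpointFor, pixelUrl, jsonRequest) and the stream ID "my-stream-id" are made up for illustration, and only the base URL comes from the endpoint format above.

```javascript
// Base URL taken from the endpoint format in this article.
const BASE_URL = "https://track.segmentstream.com/ds/";

// Hypothetical helper: endpoint URL for a given Data Stream ID.
function endpointFor(streamId) {
  return BASE_URL + encodeURIComponent(streamId);
}

// Hypothetical helper: URL to use as the src of a tracking pixel.
// URLSearchParams percent-encodes keys and values where needed.
function pixelUrl(streamId, params) {
  const query = new URLSearchParams(params).toString();
  return endpointFor(streamId) + "?" + query;
}

// Hypothetical helper: the two arguments for fetch() that POST an event as JSON.
function jsonRequest(streamId, event) {
  return [
    endpointFor(streamId),
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(event),
    },
  ];
}

// Usage (needs network access, so it is left commented out):
// fetch(...jsonRequest("my-stream-id", { event: "Lead", value: 3500, currency: "USD" }));
```

Keeping the request construction separate from the actual network call makes it easy to check what will be sent before any data leaves the page or server.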

Creating a new Data Stream

  1. In your SegmentStream account, create a new Data Stream. This generates a unique API endpoint to which you can send your data.
  2. Depending on the Data Stream type, you will get additional instructions on how to set up the tracking code, tracking pixel, or server-side API request.
  3. Add the provided snippet to your website, mobile app, or server, depending on your requirements (the screenshot shows a Google Analytics custom task).
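
Step 3 mentions a Google Analytics custom task. As a rough illustration of how hits can be duplicated with the analytics.js (Universal Analytics) customTask hook, here is a sketch. It is not the snippet SegmentStream generates: makeDuplicatingCustomTask and the send callback are hypothetical names, and the way the endpoint expects the hit payload to arrive is an assumption.

```javascript
// Sketch only: returns a customTask function that copies every hit
// to a second destination after it has been sent to Google Analytics.
// `endpoint` and `send` are supplied by the caller (hypothetical parameters).
function makeDuplicatingCustomTask(endpoint, send) {
  return function customTask(model) {
    // Keep a reference to the task that sends the hit to Google Analytics.
    const originalSendHitTask = model.get("sendHitTask");

    model.set("sendHitTask", function (sendModel) {
      // Send the hit to Google Analytics as usual.
      originalSendHitTask(sendModel);
      // hitPayload is the URL-encoded hit string built by analytics.js.
      send(endpoint, sendModel.get("hitPayload"));
    });
  };
}

// Browser usage (assumes analytics.js is loaded and a tracker exists):
// ga("set", "customTask", makeDuplicatingCustomTask(
//   "https://track.segmentstream.com/ds/<DATA_STREAM_ID>",
//   (url, payload) => navigator.sendBeacon(url, payload)
// ));
```

Passing the send function in as a parameter keeps the sketch independent of any particular transport; in a browser, navigator.sendBeacon is one option.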

You can find more information about Data Streams setup in the documentation. Contact our support team if you need any help with setting this up!

What is so cool about Data Streams

The main advantage of Data Streams is that they let you move from the ETL approach to the ELT approach. What is the difference?

  • With ETL (Extract-Transform-Load) you have to design your data schema in advance, before data collection starts. Data is transformed "on the fly" using predefined logic. If data arrives in an unexpected format, the whole ETL process fails until you fix the data collection. This makes ETL an inflexible method of data collection and makes it hard to migrate from one schema to another.
  • With ELT (Extract-Load-Transform) no data transformations happen on the server. All data goes directly into the data warehouse in raw format and is processed only afterwards.

An example of the raw data:

hit_id | payload | datetime | ip
31a94989-87e0-44fe-873d-9537e9665715 | {"event":"lead","value":3500,"currency":"USD"} | 2020-08-17 22:00:00 UTC | 216.27.61.137
a12a64b0-70a2-450e-bb20-29dc2b841674 | {"event":"pageview","url":"https://segmentstream.com/blog/article"} | 2020-08-17 22:01:00 UTC | 203.21.61.100
a12a64b0-70a2-450e-bb20-29dc2b841674 | {"event":"ad_impression","url":"https://publisher.com"} | 2020-08-17 21:13:00 UTC | 216.27.61.137

An example of how the processed table might look in Google BigQuery:

hit_id | event | value | currency | url | datetime | ip
31a94989-87e0-44fe-873d-9537e9665715 | lead | 3500 | USD | NULL | 2020-08-17 22:00:00 UTC | 216.27.61.137
a12a64b0-70a2-450e-bb20-29dc2b841674 | pageview | NULL | NULL | https://segmentstream.com/blog/article | 2020-08-17 22:01:00 UTC | 203.21.61.100
a12a64b0-70a2-450e-bb20-29dc2b841674 | ad_impression | NULL | NULL | https://publisher.com | 2020-08-17 21:13:00 UTC | 216.27.61.137

This makes the ELT approach far more flexible and convenient, and it also makes it much easier to change the data extraction provider or data source you currently use.
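
To make the "transform" step of ELT concrete, here is a small sketch of the flattening logic that turns a raw row into a processed row. It is illustrative only: in practice this step would more likely run as a query inside Google BigQuery, and flattenRow and PAYLOAD_COLUMNS are hypothetical names. It also assumes each payload is stored as a valid JSON string.

```javascript
// Payload fields to promote to columns, in the order used in the example table.
const PAYLOAD_COLUMNS = ["event", "value", "currency", "url"];

// Hypothetical transform: flattens one raw row into a processed row.
// Fields that are missing from the payload become null (NULL in BigQuery).
function flattenRow(raw) {
  const payload = JSON.parse(raw.payload);
  const row = { hit_id: raw.hit_id };
  for (const column of PAYLOAD_COLUMNS) {
    row[column] = column in payload ? payload[column] : null;
  }
  row.datetime = raw.datetime;
  row.ip = raw.ip;
  return row;
}

// A raw row shaped like the first one in the example above.
const rawLead = {
  hit_id: "31a94989-87e0-44fe-873d-9537e9665715",
  payload: '{"event":"lead","value":3500,"currency":"USD"}',
  datetime: "2020-08-17 22:00:00 UTC",
  ip: "216.27.61.137",
};

console.log(flattenRow(rawLead));
```

Because the raw payload is kept untouched, adding a new column later only means extending PAYLOAD_COLUMNS and re-running the transform over the stored rows.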

Of course, there are some cases where ETL might be a better choice than ELT. But the most common reason many companies still use ETL is simply a matter of timing and how the technology developed.

ETL was once a popular solution because of the high costs of on-premises storage and computation. With the fast growth of cloud-based solutions and the very affordable cost of cloud-based computation and storage, there are very few reasons to continue using ETL over ELT.

We see more and more companies moving from ETL to the modern ELT data collection process, and we will be happy to support you in making the same move.

What is the difference between SegmentStream Data Sources and Data Streams

  • Data Sources are all about batched data import, which usually happens once a day. For example: connecting to the Facebook API, importing the advertising cost data for the entire previous day, and loading it into Google BigQuery.
  • Data Streams are all about real-time data collection. With Data Streams, each event becomes available in your Google BigQuery as soon as it happens.

What is the future of Data Streams

Besides adding more Data Stream types to the SegmentStream platform, we want to make sure that data is not only easy to collect but also easy to use.

To achieve this, we will prepare a collection of turn-key data transformations that convert the collected hit-level data into different schemas inside your Google BigQuery.

We hope you’ll find Data Streams very useful for your business. Don’t hesitate to contact us if you have any questions, comments, or ideas for further improvement of this feature!
