How to retrieve more than 2500 records from a Data Extension with WSProxy in server-side JavaScript

This article explains how to retrieve more than 2500 records from a Data Extension with WSProxy object using server-side Javascript in Salesforce Marketing Cloud.

What’s the point?

Retrieving records from a Data Extension is something that every Marketing Cloud developer is accustomed to.

Here is an example of how to do it with a Core function:

<script runat="server">
	Platform.Load("core", "1.1.1");
	var DE = DataExtension.Init("0000-1111-2222-4444-6666");
	var records = DE.Rows.Retrieve();	
</script>

But when it comes to retrieval requests, there is a limitation of 2500 records per request which regular Core functions cannot bypass.

The question is: how do we retrieve all the records from a Data Extension that has around 10.000 records?

In most situations, an SQL query in an Automation activity is the way to go, but what if it’s not enough?

What if our goal is to perform a series complex operations on a huge amount of data retrieved from a Data Extension?

And what if we want it super mega fast?

That’s where WSProxy comes in!

Before we start

At first, we need a Data Extension with more than 2500 valid records.

Please consider using a Data Extension with fake random records for the purpose of this exercise.

The field names don’t matter, but let’s say it has at least SubscriberKey and EmailAddress as two main fields and one of them is a Primary key.

Retrieve records with WSProxy

In order to use the WSProxy object for data retrieval, we need the name of the Data Extension and the names of the fields we want to retrieve.

<script runat="server">
	Platform.Load("core", "1.1.1");
	var config = {
		name: "MyDataExtension",
		cols: ["SubscriberKey", "EmailAddress"]
	}
	var prox = new Script.Util.WSProxy();
	var data = prox.retrieve("DataExtensionObject[" + config.name + "]", config.cols);
</script>

This method retrieves the first page (in other words, the first 2500 results) from the Data Extension. So far, we could have used the Core function instead.

Retrieve filtered records

As in the Core function, we can apply a filter to our request, but there are 2 crucial differences:

WSProxy is waaaaaaay faster than a Core function.
The syntax for WSProxy is similar, but not the same.

For exemple, when in a Core function, this filter works perfectly:

Property: "SubscriberKey",
SimpleOperator: "isNotNull"

It will not work at all for the WSProxy object. This is the syntax to use:

Property: "SubscriberKey",
SimpleOperator: "isNotNull",
Value: " "

Why is it this way is beyond me, but let’s not get carried away and build our simple filtered request:

<script runat="server">
	Platform.Load("core", "1.1.1");
	var config = {
		name: "MyDataExtension",
		cols: ["SubscriberKey", "EmailAddress"],
    	filter: {
        	Property: "SubscriberKey",
        	SimpleOperator: "isNotNull",
        	Value: " "
    	}
	}
	var prox = new Script.Util.WSProxy();
	var data = prox.retrieve("DataExtensionObject[" + config.name + "]", config.cols, config.filter);
</script>

Again, nothing fancy, same result can be achieved with a Core function so far.

Format the results

If everything went according to plan, the WSProxy returned an array of objects devided into the Key/Value pairs. This is not an easy object to work with.

Let’s build a function that compiles all the object into one single simple object for future use:

function formatResult(data) {
    var results = [];
    for (var i = 0; i < data.Results.length; i++) {
      var result_list = data.Results[i].Properties;
      var obj = {};
      for (k in result_list) {
        var key = result_list[k].Name;
        var val = result_list[k].Value
        if (key.indexOf("_") != 0) obj[key] = val;
      }
      results.push(obj);
    }
   return results;
}

Note that the function filters out all the results from fields that begin with an underscore, which are not valid fields.

Now let’s compile everything we’ve done so far in a single bit of code and use function to make the code more generic:

<script runat="server">

Platform.Load("core", "1.1.1");

var config = {
    name: "MyDataExtension",
    cols: ["SubscriberKey", "EmailAddress"],
    filter: {
        Property: "SubscriberKey",
        SimpleOperator: "isNotNull",
        Value: " "
    }
}

var records = retrieveRecords(config);

Write(Stringify(records));

function retrieveRecords(config) {

    var prox = new Script.Util.WSProxy();

    var data = prox.retrieve("DataExtensionObject[" + config.name + "]", config.cols, config.filter);
  
    var records = formatResult(data);
  
    return records;
}
	
function formatResult(data) {
    var results = [];
    for (var i = 0; i < data.Results.length; i++) {
      var result_list = data.Results[i].Properties;
      var obj = {};
      for (k in result_list) {
        var key = result_list[k].Name;
        var val = result_list[k].Value
        if (key.indexOf("_") != 0) obj[key] = val;
      }
      results.push(obj);
    }
   return results;
}  
</script>

Retrieve more records

If we inspect the result of the WSProxy request, there are 2 parameters that attract attention: RequestID and HasMoreRows.

{
   "Status":"OK",
   "RequestID":"22222222-3333-4444-5555-666666666666",
   "Results":[...],
   "HasMoreRows":true
}

While RequestID is self-explanatory, HasMoreRows indicates if there are still more records left in the Data Extension after our request.

Using this data, we can use another WSProxy function called getNextBatch(), which performs a retrieve request based on the last API request.

It works like a pagination mechanism and all we need to do is provide it with the object name (DataExtensionObject[MyDataExtension]) and the RequestID from the last retrieve.

Here is how it looks when we implement it within a While loop and add the formatting method to simplify the result object:

<script runat="server">

Platform.Load("core", "1.1.1");

var config = {
    name: "MyDataExtension",
    cols: ["SubscriberKey", "EmailAddress"],
    filter: {
        Property: "SubscriberKey",
        SimpleOperator: "isNotNull",
        Value: " "
    }
}

var records = retrieveRecords(config);

Write(Stringify(records));

function retrieveRecords(config) {

    var prox = new Script.Util.WSProxy();

    var records = [],
        moreData = true,
        reqID = data = null;

    while (moreData) {

        moreData = false;

        if (reqID == null) {
            data = prox.retrieve("DataExtensionObject[" + config.name + "]", config.cols, config.filter);
        } else {
            data = prox.getNextBatch("DataExtensionObject[" + config.name + "]", reqID);
        }

        if (data != null) {
            moreData = data.HasMoreRows;
            reqID = data.RequestID;
            for (var i = 0; i < data.Results.length; i++) {
                var result_list = data.Results[i].Properties;
                var obj = {};
                for (k in result_list) {
                    var key = result_list[k].Name;
                    var val = result_list[k].Value
                    if (key.indexOf("_") != 0) obj[key] = val;
                }
                records.push(obj);
            }
        }
    }
    return records;
}
</script>

From now on, there is no longer a 2500 records per request limitation, since a While loop continues to make retrieval operation as long as the HasMoreRows parameter of the request returns true.

Retrieve all available fields

Although the WSProxy method works as expected, there is still one flaw that we need to consider: it requires the names of the fields (column names or “cols”) from the Data Extension.

In order to bypass this flaw, please consider the following code:

function retrieveFieldNames(de_name) {

    var attr = DataExtension.Retrieve({ Property: "Name", SimpleOperator: "equals", Value: de_name });
    var de = DataExtension.Init(attr[0].CustomerKey);
    var fields = de.Fields.Retrieve();

    fields.sort(function (a, b) { return (a.Ordinal > b.Ordinal) ? 1 : -1 });

    var out = [];

    for (k in fields) {
        out = out.concat(fields[k].Name);
    }

    return out;
}

This function returns all the field names from a Data Extension in the order they were created and in a shape of a nice and simple Array.

That’s it! Let’s apply everything we learned and wrap it in a nice single function.

Full code

<script runat="server">

Platform.Load("core", "1.1.1");

var records = retrieveAllRecords("MyDataExtension");

Write(Stringify(records));

function retrieveFieldNames(name) {

    var attr = DataExtension.Retrieve({ Property: "Name", SimpleOperator: "equals", Value: name });

    var de = DataExtension.Init(attr[0].CustomerKey);

    var fields = de.Fields.Retrieve();

    fields.sort(function (a, b) { return (a.Ordinal > b.Ordinal) ? 1 : -1 });

    var out = [];

    for (k in fields) {
        out = out.concat(fields[k].Name);
    }

    return out;

}

function retrieveAllRecords(name) {

    var prox = new Script.Util.WSProxy();

    var cols = retrieveFieldNames(name);

    var config = {
        name: name,
        cols: cols
    }

    var records = [],
        moreData = true,
        reqID = data = null;

    while (moreData) {

        moreData = false;

        if (reqID == null) {
            data = prox.retrieve("DataExtensionObject[" + config.name + "]", config.cols);
        } else {
            data = prox.getNextBatch("DataExtensionObject[" + config.name + "]", reqID);
        }

        if (data != null) {
            moreData = data.HasMoreRows;
            reqID = data.RequestID;
            for (var i = 0; i < data.Results.length; i++) {
                var result_list = data.Results[i].Properties;
                var obj = {};
                for (k in result_list) {
                    var key = result_list[k].Name;
                    var val = result_list[k].Value
                    if (key.indexOf("_") != 0) obj[key] = val;
                }
                records.push(obj);
            }
        }
    }
    return records;
}
</script>

Have I missed anything?

Please poke me with a sharp comment below or use the contact form.