This article explains how retrieve and process Data Extension records in Script Activity, using server-side JavaScript.
Use case
We are all familiar with the Query Activities in the Marketing Cloud Automations: they are perfectly optimized for retrieving and processing large amounts of data, coming from a Data Extension.
But sometimes, they are simply not enough!
As a matter of fact, we often find ourselves restricted by the limitations of the SQL language variation used in Marketing Cloud.
What if we wanted to process the data using an API call or an advanced regular expression?
That’s where the Script Activities come into play!
They are certainly more complex and slower to run, but they can work wonders when it comes to advanced data processing, thanks to the broad functionality of the JavaScript language.
Before we start
Before we can start, there are a couple of things everybody should know about the Script Activities in the Automation Studio.
- If a Script Activity runs for more than 30 minutes, the Automation is stopped and returns an error.
- When a Script Activity fails, there is no error log available to us out-of-the-box.
Now that we are aware of these issues, we can address them later on in the code.
How does it work
In order to process the records in a Data Extension, we’ll need to perform the following actions:
- Retrieve all the records from a Data Extension.
- Process the records.
- Update all the processed records in the same Data Extension.
Simple, isn’t it? But nearly not enough!
Here is what the full list of actions looks like, if we are going to build a successful processing script:
- Retrieve the Data Extension attributes, including the data types of different fields.
- Retrieve all the records from a Data Extension (filtered or not).
- Format the values of the records according to their respective data types.
- Process the records.
- Create batches from the processed records for an optimized update action.
- Perform a batch update of all the processed records in the same Data Extension.
- Capture the errors in a dedicated Data Extension and stop the Automation.
Creating the process
In order to understand how the script is structured, please have a look at the following code:
<script runat="server">
Platform.Load("core", "1");
var api = new Script.Util.WSProxy();
//////////////////////////////
//// CONFIGURATION OBJECT
//////////////////////////////
try {
//////////////////////////////
//// PERFORM ACTIONS
//////////////////////////////
} catch(err) {
//////////////////////////////
//// WRITE IN ERROR LOG
//////////////////////////////
}
//////////////////////////////
//// FUNCTIONS & HELPERS
//////////////////////////////
</script>
Setup: Configuration object.
The configuration object is the heart of our script! It holds the names of the Data Extensions, the filter object for the data retrieval, as well as the function that performs the data processing.
var config = {
de: "Customers",
filter: {
Property: "PostalCode",
SimpleOperator: "equals",
Value: "1000"
},
error: {
de: "Automation ErrorLog",
source: "Script Activity #1"
},
process: function(records) {
if(records.length === 0) throw "No records to process.";
for(var k in records) {
var record = records[k];
//////////////////////////////
//// YOUR LOGIC
//////////////////////////////
record.Purchases = 1; // for example
record.Active = false; // for example
}
return records;
}
}
In this example, we are going to retrieve the records from a Data Extension named Customers, where the field PostalCode is equal to 1000.
All the errors will be written in a Data Extension named Automation ErrorLog, tagged as Script Activity #1.
The data process function will hold whatever logic we need to implement: an API call, data refactoring, etc. But in this case, let’s do something simple, like assigning a value to 2 Data Extension fields: Purchases and Active.
Step #1: Retrieve the Data Extension attributes.
The biggest question of the first step is: why do we bother?
The answer is simple: because Marketing Cloud retrieves all the data as a text value and in order to perform a proper data processing, we need the data in the correct format and as a correct data type.
function retrieveAttributes(config) {
var req = DataExtension.Retrieve(
{
Property: "Name",
SimpleOperator: "equals",
Value: config.de
}
);
if(req[0].CustomerKey != null) {
var customerKey = req[0].CustomerKey;
} else {
throw "The DataExtension \"" + config.de + "\" could not be found!";
}
var de = DataExtension.Init(customerKey);
var fields = de.Fields.Retrieve();
fields.sort(function (a, b) {
return (a.Ordinal > b.Ordinal) ? 1 : -1
});
var columns = [],
dates = [],
bools = [],
nums = [],
decs = [],
prims = [];
for (var k in fields) {
var field = fields[k];
columns.push(field.Name);
if(field.FieldType == "Date") dates.push(field.Name);
if(field.FieldType == "Number") nums.push(field.Name);
if(field.FieldType == "Boolean") bools.push(field.Name);
if(field.FieldType == "Decimal") decs.push(field.Name);
if(field.IsPrimaryKey == true) prims.push(field.Name);
}
return {
name: config.de,
customerKey: customerKey,
fields: fields,
columns: {
all: columns,
dates: dates,
bools: bools,
nums: nums,
decs: decs,
prims: prims
}
};
}
The function returns an object with the Name and CustomerKey of the Data Extension, as well as several lists of the Data Extension fields, broken down in data types (dates, booleans, numbers, decimals) and other parameters (all, primary key).
Please note that when retrieved, the fields are out of order. We can fix that by sorting the fields according to the Ordinal property.
Step #2: Retrieve all the records from a Data Extension.
By default, every retrieval request in AMPscript and server-side JavaScript is restricted to 2500 records.
The good news: we can lift that limitation by using WSProxy and a while loop!
function retrieveRecords(key, cols, filter) {
var now = new Date(),
start = now.getTime(),
timeOut = (60000 * 15); // 15 minutes
var records = [],
moreData = true,
reqID = data = null;
while (moreData && (new Date().getTime() - start) < timeOut) {
moreData = false;
if (reqID == null) {
data = api.retrieve("DataExtensionObject[" + key + "]", cols, filter);
} else {
data = api.getNextBatch("DataExtensionObject[" + key + "]", reqID);
}
if (data != null) {
moreData = data.HasMoreRows;
reqID = data.RequestID;
for (var i = 0; i < data.Results.length; i++) {
var props = data.Results[i].Properties;
records.push(wsUnpack(props));
}
}
}
if(records.length === 0) throw "No records could be found!";
return records;
}
The only problem is, if the Data Extension holds more than 100.000 records, it is highly likely that the Script Activity times out after 30 minutes of processing.
Again, this is something we can bypass, by adding a time limit of 15 minutes for this particular retrieval process.
Why 15? Well, if it takes 15 minutes to retrieve the data, it’s very likely that it will also take 15 minutes to update it as well, making it 30 minutes in total.
If your processing function takes time, feel free to decrease this 15 minutes limit in order to get yourself more room for action.
Please note that in this piece of code, we are using a function called wsUnpack, which is a custom-made function that compiles the Name/Value list pairing rendered by WSProxy into a regular JavaScript object.
function wsUnpack(props) {
var out = {};
for(var k in props) {
var item = props[k];
out[item.Name] = item.Value;
}
return out;
}
Step #3: Format the values of the records according to their respective data types.
Assigning the correct format and data type is key for a successful data processing.
function formatRecords(records, columns) {
for(var k in records) {
var record = records[k];
for(var name in record) {
if(record[name].length > 0) {
if(inArray(columns.dates, name)) record[name] = new Date(record[name]);
if(inArray(columns.nums, name)) record[name] = Number(record[name]);
if(inArray(columns.decs, name)) record[name] = Number(record[name]);
if(inArray(columns.bools, name)) record[name] = (record[name].toLowerCase() == "true") ? true : false;
}
}
}
return records;
}
Please note, that the code uses inArray function to check if a particular value is present in a particular list.
function inArray(arr, k) {
var out = false;
for (var i in arr) {
if (arr[i] == k) out = true;
}
return out;
}
This function is only needed because the Array.includes() prototype is not supported by the current version of SSJS.
Step #4: Process the data.
As discussed previously, the processing function is located in the configuration object. Feel free to add your own logic.
process: function(records) {
if(records.length === 0) throw "No records to process.";
for(var k in records) {
var record = records[k];
//////////////////////////////
//// YOUR LOGIC
//////////////////////////////
record.Purchases = 1; // for example
record.Active = false; // for example
}
return records;
}
Step #5: Create batches from the processed records.
In order to update all the records in a Data Extension, we have 2 options: either we create a loop and perform 1 update request per record, or we perform 1 single request, but we organize all the records in 1 big object.
The second option is obviously better, especially when considering the performance of processing thousands of records.
function createBatches(key, records) {
var batches = [];
for (var k in records) {
batches.push({
CustomerKey: key,
Properties: wsPack(records[k])
});
}
return batches;
}
Note that we use a new function called wsPack. This function splits the object in Name/Value pairs used by WSProxy to write or read data.
function wsPack(obj) {
var out = [];
for (k in obj) {
out.push({
Name: k,
Value: obj[k]
});
}
return out;
}
Step #6: Perform a batch update of all the processed records in the same Data Extension.
Nothing fancy, we already have the batches in a big JavaScript object.
function updateRecords(batches) {
var options = {
SaveOptions: [
{
PropertyName: "*",
SaveAction: "UpdateAdd"
}
]
};
return api.updateBatch("DataExtensionObject", batches, options);
}
Step #7: Capture the errors in a dedicated Data Extension and stop the Automation.
As previously stated, Script Activities don’t have an extended error log. Therefore, in order to be able to debug our script, we’ll need to write the errors in a dedicated Data Extension and before stopping the Automation.
Creating a new record is simple enough, but how do we stop an Automation?
Simple!!! All we have to do is to trigger the Error 500 by calling an unexisting function called createAutomationError in the catch portion of the try…catch function.
function logError(err, config) {
var props = {
Id: GUID(),
Message: Stringify(err.message),
Description: Stringify(err.description),
Source: config.error.source,
CreatedDate: DateTime.SystemDateToLocalDate(Now())
}
err.log = api.createItem("DataExtensionObject", {
Name: config.error.de,
Properties: wsPack(props)
});
createAutomationError();
}
Conclusion and full code
Now that we had a look at every piece of the puzzle, let’s have a look at the whole picture.
The Script Activity here below will retrieve all records from a Data Extension named Customers where the field PostalCode equals 1000.
It will then process every record and set the field Purchases to 1 and Active to false.
Then it will update all the records in the Data Extension.
If the Data Extension holds more than 100.000 records, several runs of this Script Activity will be needed to process all the data, because of the 30 minute time-out limitation set by Marketing Cloud.
In case of errors, the Automation will be stopped and the error will be written in the Data Extension called Automation ErrorLog. According to the notification settings, an email about the error will be sent to person responsible for the Automation.
<script runat="server">
Platform.Load("core", "1");
var api = new Script.Util.WSProxy();
var config = {
de: "Customers",
filter: {
Property: "PostalCode",
SimpleOperator: "equals",
Value: "1000"
},
error: {
de: "Automation ErrorLog",
source: "Script Activity #1"
},
process: function(records) {
if(records.length === 0) throw "No records to process.";
for(var k in records) {
var record = records[k];
record.Purchases = 1;
record.Active = false;
}
return records;
}
}
try {
var result = retrieveAttributes(config);
result.records = retrieveRecords(result.customerKey, result.columns.all, config.filter);
result.records = formatRecords(result.records, result.columns);
result.processed = config.process(result.records);
result.batches = createBatches(result.customerKey, result.processed);
result.update = updateRecords(result.batches);
if(result.update.Status == "OK") {
Write(Stringify(result));
} else {
throw result.update;
}
} catch(err) {
logError(err, config);
}
function logError(err, config) {
var props = {
Id: GUID(),
Message: Stringify(err.message),
Description: Stringify(err.description),
Source: config.error.source,
CreatedDate: DateTime.SystemDateToLocalDate(Now())
}
err.log = api.createItem("DataExtensionObject", {
Name: config.error.de,
Properties: wsPack(props)
});
createAutomationError();
}
function updateRecords(batches) {
var options = {
SaveOptions: [
{
PropertyName: "*",
SaveAction: "UpdateAdd"
}
]
};
return api.updateBatch("DataExtensionObject", batches, options);
}
function createBatches(key, records) {
var batches = [];
for (var k in records) {
batches.push({
CustomerKey: key,
Properties: wsPack(records[k])
});
}
return batches;
}
function formatRecords(records, columns) {
for(var k in records) {
var record = records[k];
for(var name in record) {
if(record[name].length > 0) {
if(inArray(columns.dates, name)) record[name] = new Date(record[name]);
if(inArray(columns.nums, name)) record[name] = Number(record[name]);
if(inArray(columns.decs, name)) record[name] = Number(record[name]);
if(inArray(columns.bools, name)) record[name] = (record[name].toLowerCase() == "true") ? true : false;
}
}
}
return records;
}
function retrieveRecords(key, cols, filter) {
var now = new Date(),
start = now.getTime(),
timeOut = (60000 * 15); // 15 minutes
var records = [],
moreData = true,
reqID = data = null;
while (moreData && (new Date().getTime() - start) < timeOut) {
moreData = false;
if (reqID == null) {
data = api.retrieve("DataExtensionObject[" + key + "]", cols, filter);
} else {
data = api.getNextBatch("DataExtensionObject[" + key + "]", reqID);
}
if (data != null) {
moreData = data.HasMoreRows;
reqID = data.RequestID;
for (var i = 0; i < data.Results.length; i++) {
var props = data.Results[i].Properties;
records.push(wsUnpack(props));
}
}
}
if(records.length === 0) throw "No records could be found!";
return records;
}
function retrieveAttributes(config) {
var req = DataExtension.Retrieve({
Property: "Name",
SimpleOperator: "equals",
Value: config.de
});
if(req[0].CustomerKey != null) {
var customerKey = req[0].CustomerKey;
} else {
throw "The DataExtension \"" + config.de + "\" could not be found!";
}
var de = DataExtension.Init(customerKey);
var fields = de.Fields.Retrieve();
fields.sort(function (a, b) {
return (a.Ordinal > b.Ordinal) ? 1 : -1
});
var columns = [],
dates = [],
bools = [],
nums = [],
decs = [],
prims = [];
for (var k in fields) {
var field = fields[k];
columns.push(field.Name);
if(field.FieldType == "Date") dates.push(field.Name);
if(field.FieldType == "Number") nums.push(field.Name);
if(field.FieldType == "Boolean") bools.push(field.Name);
if(field.FieldType == "Decimal") decs.push(field.Name);
if(field.IsPrimaryKey == true) prims.push(field.Name);
}
return {
name: config.de,
customerKey: customerKey,
fields: fields,
columns: {
all: columns,
dates: dates,
bools: bools,
nums: nums,
decs: decs,
prims: prims
}
};
}
function inArray(arr, k) {
var out = false;
for (var i in arr) {
if (arr[i] == k) out = true;
}
return out;
}
function wsPack(obj) {
var out = [];
for (k in obj) {
out.push({
Name: k,
Value: obj[k]
});
}
return out;
}
function wsUnpack(props) {
var out = {};
for(var k in props) {
var item = props[k];
out[item.Name] = item.Value;
}
return out;
}
</script>
Have I missed anything?
Please poke me with a sharp comment below or use the contact form.
Hello Ivan!
Nice article!
I’m trying to use here, but something went wrong. When I run without this “IF”, the automation run without error, but it didn’t update the DE
if(result.update.Status == “OK”) {
Write(Stringify(result));
} else {
throw “result.update”;
}
This is my logical treatment:
process: function(records) {
if(records.length === 0) throw “No records to process.”;
for(var k in records) {
var record = records[k];
var vcpf = base64decode(record.id);
record.flg = false;
record.cpf = vcpf;
}
return records;
}
Tks!
Hi Dennis,
If you want to know what happens, run the code from a Cloud page and remove quotes around
throw “result.update”
Hi Ivan,
Actually I did without quotes (I don’t know why I posted with quotes…) and it still doesn’t work. I tried debugging in Cloud Pages and returns an error.
Tks!
Hi Ivan,
Thanks for sharing.
I was wondering if you would limit the number of records for updateBatch function when handling a large number of records in the Data Extension.
Regards,
Ash
I usually limit to 100000, as a Script Activity can only run for 30 min. In my tests I can only process from 100-120k records in 30 min.
Hi Ivan,
It seems that the retrieve(“DataExtensionObject[xxxx) – in the retrieveRecords function) is limited to the BU of execution of the code (because SFMC suffixes the “key” with the MID. Would you have a workaround or fix to process a DE that is in the shared items (parent-level BU)?
Hey! Have you tried prefixing the name of your DE with « ENT. »?