Jump to content

How to use expression function over subset of data ?


Manon Carvalho 2
Go to solution Solved by Manon Carvalho 2,

Recommended Posts

Hello,

I have a dataset with several Groups, several Parameters. Each row has a Value which is associated to a Time. I would like to normalize this dataset using the values of one of the parameters (A in the below example). However I do not have exact match in the time, so I would like to normalize each value using the value of the closest time inside of each group.

image.png.07b0d606440e8dffc92d48bd34c79f7e.png

As I could not find how to do this with calculated columns using regular Spotfire custom expressions, I tried using a custom expression function with TERR.

As a first step I created a calculated column Time from A:

If([Parameter]="A",[Time]) 

Then I created an expression function:

Closest = function(col_to_cut,col_with_cuts){
	col_with_cuts <- sort(col_with_cuts)
	cuts <- c(-Inf, col_with_cuts[-1]-diff(col_with_cuts)/2, Inf)
	cuts <- sort(cuts[!is.na(cuts)])
	index = findInterval(col_to_cut, cuts)
	return(col_with_cuts[index])
}
output <- Closest(input1,input2)

And I inserted a calculated column calling this function:

Closest([Time],[Time from A])

But this is only working for the whole dataset (Group 1 and 2 together, see Wrong intermediate result in the image above).

Is it possible to apply an expression function over groups of data using something like Closest([Time],[Time from A]) OVER ([Group]) ?

Thank you

Link to comment
Share on other sites

Hi David,

Yes I tried the expression Closest([Time],[Time from A]) OVER ([Group]) but "OVER expressions can only be used with aggregation methods". And if I use an aggregation method then I will have the same value everywhere within each group, which is not what I want. For example with Avg(Closest([Time],[Time from A])) OVER ([Group]) I have 11 for all rows of Group 1 and 8 for all rows of Group 2.

Attached is the dxp file containing the data table with calculations (which are wrong) and the data table with expected results. 

Thank you

Normalization Expression Function.dxp

Link to comment
Share on other sites

Hi Manon,

Yes, indeed. The OVER requires an aggregation method.
I'm not sure how big your actual data set would be, but a workaround could be to calculate for each occurrence of Group/Parameter A, a calculated column using CASE WHEN THEN statements. In my screenshot, these are highlighted (Case  when ([Group]=1) AND ([Parameter]="A") THEN [Time] END):
image.thumb.png.ca186b1ecc84fbccced6f5d8b024cf07.png

Once you have that, you can calculate the 'Closest time..' columns for each of the 'Time from..' columns: Closest([Time],[Time from 1A]).
And finally, the Result calculated column brings it all together with another CASE statement:

CASE  WHEN [Group]=1 THEN [Closest time from 1A]
WHEN [Group]=2 THEN [Closest time from 2A]
END

That will bring you the requested values in the Result column.
The disadvantage is that you may end up with quite some calculated columns and a large CASE statement.

I'll try to find a better, more flexible solution and ask some of my colleagues as well.

Kind regards,

David

Link to comment
Share on other sites

Hi Manon,

The below calculated column eliminates the need for the two 'Closest time from..' columns.

For your result column, you can also use this expression:
CASE  WHEN [Group]=1 THEN Closest([Time],[Time from 1A])
WHEN [Group]=2 THEN Closest([Time],[Time from 2A]) END

That makes it already a bit more flexible to use for larger data sets.

Kind regards,

David

Link to comment
Share on other sites

Hi David, 

Sadly this is not a solution for me because my actual dataset is much bigger than this example one, and above all, it is loaded on-demand and so dynamic (as well as the parameter which is used as reference, but for this I will use a document property). So I have a dataset for which I don't know in advance how many groups I will get, and so how many calculated columns I would need.

Thank you for checking

Link to comment
Share on other sites

Hi Manon,

You may use a different approach to calculate the nearest (closest) time value.
Currently, you've used a custom expression function to calculate the closest time. But what if you use a 'normal' (Python) data function to calculate this? And store the results in a data table (output parameter)?

Would that be an option to consider? Or are you able to alter your current custom expression function in such a way, that it takes into account the Group and Parameter columns?

Kind regards,

David

Edited by David Boot-Olazabal
Link to comment
Share on other sites

  • Solution

Hi David, 

The second option is the one which fits the best with my needs. I modified the custom expression function to take groups into account:

Closest = function(col_to_cut,col_with_cuts){
    col_with_cuts <- sort(col_with_cuts)
    cuts <- c(-Inf, col_with_cuts[-1]-diff(col_with_cuts)/2, Inf)
    cuts <- sort(cuts[!is.na(cuts)])
    index = findInterval(col_to_cut, cuts)
    return(col_with_cuts[index])
}

ClosestInGroup = function(col_to_cut,col_with_cuts,col_to_group){
    indices <- c()
    result <- c()
    for(group in unique(col_to_group)) {
        indexes <- which(col_to_group %in% group)
        closest <- Closest(col_to_cut[indexes],col_with_cuts[indexes])
        indices <- append(indices,indexes)
        result <- append(result, closest)
    }
    df <- data.frame(result, indices)
    df <- df[with(df, order(indices)),]
    return(df$result)
}
output<-ClosestInGroup(input1,input2,input3)

Then I have a first calculated column Closest Time from A:

ClosestInGroup([Time],If([Parameter]="A",[Time]),[Group])

A second calculated column Value of the Closest Time from A:

Avg(If([Parameter]="A",[Value])) over ([Closest Time from A],[Group])

And a final calculated column Normalized Value:

[Value] / [Value of the Closest Time from A]

This works as expected.

Thank you for your support.

  • Like 1
Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...