  • Variable Selection Specification in Statistica Data Function


    This article shows how to define input parameters in Spotfire so that they can be used for variable selection in a Statistica data function.

    If you are not familiar with data functions and how to use Spotfire Statistica® (or simply Statistica) for them, please view this article. In short, a visual workflow built in Statistica serves as a parameterizable data function that is used by the Spotfire application.

    Let us describe in more detail one specific but extremely important parameter of this concept: the one used to transfer the variable selection.

     

    Statistica side

    We will take Feature Selection functionality as an example. We can have a simple workflow as below:
    [Image: workspace.jpg]

    In the Statistica data function that uses this workflow, we want to define in Spotfire not only the input (the file to be investigated) and the results (the output file with important predictors, retrieved after the data function runs). We also want to parameterize the Feature Selection node, and more precisely the Variable Selection parameter inside this node. This lets us define which variables will be investigated directly from the Spotfire application. Here is the dialog (Statistica side) that we would like to parameterize:

    [Image: feature_selection.jpg]

    The Statistica dialog shows 4 variable lists for this example (other methods may have 1, 2, 3, or 4 variable lists). If we look at the User View of this workspace after choosing these variables, the 4 variable lists are represented as |2|7 15|11-12| (simply 4 lists separated by pipe symbols; the first list is empty in this use case). This represents the whole variable selection as a single string.
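    To make the string format concrete, here is a minimal Python sketch that splits such a selection string back into its lists. The function name parse_selection is hypothetical, and reading 11-12 as an inclusive range of variable numbers is an assumption based on the example in this article, not a documented Statistica rule.

```python
def parse_selection(selection):
    """Split a pipe-delimited selection string into lists of variable numbers."""
    # Drop only the trailing pipe; a leading pipe marks an empty first list.
    body = selection[:-1] if selection.endswith("|") else selection
    lists = []
    for part in body.split("|"):
        numbers = []
        for token in part.split():
            if "-" in token:
                # Assumption: "11-12" means variables 11 through 12 inclusive.
                start, end = token.split("-")
                numbers.extend(range(int(start), int(end) + 1))
            else:
                numbers.append(int(token))
        lists.append(numbers)
    return lists


print(parse_selection("|2|7 15|11-12|"))
# prints [[], [2], [7, 15], [11, 12]]
```

    The empty first list is preserved, which matters because the position of each list in the string determines which role the variables play.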

    In Statistica, you can reference variables by their numbers (as in the string above) or by their names separated by spaces (if a variable name contains a space, it must be enclosed in quotes). So for our example, the name-based variable selection string equivalent to |2|7 15|11-12| is |"Credit Rating"|"Amount of Credit" "Age"|"Marital Status" "Gender"|.

    In addition, Statistica supports wildcard selection, e.g. s* means all variables whose names start with s (this is not applicable to this example).
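    The name-based form can be sketched in Python as follows. The function name build_selection is hypothetical; the sketch quotes every name, as the example string in this article does, although quotes are only required for names that contain a space.

```python
def build_selection(lists):
    """Join lists of variable names into one pipe-delimited selection string."""
    # Quote every name and separate names within one list by spaces.
    parts = [" ".join(f'"{name}"' for name in names) for names in lists]
    # Lists are separated by pipes and the string ends with a pipe.
    return "|".join(parts) + "|"


selection = build_selection([
    [],                              # first list is empty in this use case
    ["Credit Rating"],               # categorical dependent
    ["Amount of Credit", "Age"],     # continuous predictors
    ["Marital Status", "Gender"],    # categorical predictors
])
print(selection)
# prints |"Credit Rating"|"Amount of Credit" "Age"|"Marital Status" "Gender"|
```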

    Spotfire side

    So we need to define a custom expression in Spotfire that creates a string in exactly the required form. But first, we need 3 variable selections on the Spotfire side, where we can interactively choose the variables for each of the 3 lists in question. For that, we have 3 parameters as separate variable lists: a categorical dependent, continuous predictors, and categorical predictors. In the final application it looks like this:

    [Image: variable_selection_spotfire.jpg]

    Under the hood, we assign the variable lists (or a single variable, in the case of the dependent variable) to Spotfire document properties. In addition, these multiple selections can be limited by conditions, so that, for example, only categorical variables are offered in the categorical predictors list. Below is a screenshot of the Categorical predictors list, with the limiting formula visible in the Selectable Columns section:

    [Image: document_property_setting.jpg]

    The last step is to construct the custom expression. This custom expression creates the string |"Credit Rating"|"Amount of Credit" "Age"|"Marital Status" "Gender"| from the values of 3 document properties: CatDependent2 with the value Credit Rating, ContinuousPredictors with the value Amount of Credit, Age, and CategoricalPredictors with the value Marital Status, Gender. In the variable lists, the variable names are separated by commas.

    The custom expression in formula form looks like this:

    Concatenate('|"',"${CatDependent2}",'"|"',Substitute("${ContinuousPredictors}",",",'" "'), '"|"',Substitute("${CategoricalPredictors}",",",'" "'), '"|')

    This expression evaluates to exactly the parameter value that needs to be transferred to Statistica when the data function runs.

    The parameter is assigned in the data function's Edit Parameters dialog (Spotfire side), as shown below:
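    The logic of the custom expression above can be checked outside Spotfire with a small Python sketch. The function name selection_from_properties is hypothetical, and the sketch is not Spotfire code. One deliberate difference: it strips whitespace around each name, on the assumption that a list value such as Amount of Credit, Age might contain a space after the comma; the article does not state how the property value is actually delimited.

```python
def selection_from_properties(cat_dependent, continuous, categorical):
    """Mimic the Concatenate/Substitute expression with plain string operations."""

    def requote(comma_separated):
        # Equivalent of Substitute(value, ",", '" "'), plus whitespace trimming
        # (the trimming is an assumption, not part of the original expression).
        names = [name.strip() for name in comma_separated.split(",")]
        return '" "'.join(names)

    return (
        '|"' + cat_dependent
        + '"|"' + requote(continuous)
        + '"|"' + requote(categorical)
        + '"|'
    )


result = selection_from_properties(
    "Credit Rating",
    "Amount of Credit, Age",
    "Marital Status, Gender",
)
print(result)
# prints |"Credit Rating"|"Amount of Credit" "Age"|"Marital Status" "Gender"|
```

    If the property values turn out to contain no space after the comma, the trimming step simply has no effect.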

    [Image: custom_expression.jpg]

    Output

    For this particular example, the output is a table of variable importance. More precisely, in the workflow shown at the beginning, it is the Subset spreadsheet that we would like to bring back to Spotfire. Here is its definition in the Spotfire data function settings:

    [Image: output.jpg]

    After the data function runs, the following result of the Subset node is brought back to Spotfire (the screenshot below shows the resulting table in Statistica):

    [Image: statistica_output.jpg]

    Based on this new Spotfire table (Predictor Importance), users can create various visualizations, which update whenever the data function is run again with different settings.

