Peter Shaw
Posted June 5, 2019

I would like to run a PCA on my data. Column 1 contains a class variable; I would like to use columns 2 through 14 in the PCA. Once the PCA is finished, I'd like to see the resulting centered and rotated data, attached to the class variable (column 1), for further analysis.

Using the PCA tool in Team Studio, all I see is the rotation matrix. It appears the PCA tool can generate some output that is placed into my collection of data, but the default name of this new table ("alp@user_id_@flow_id_pca_0_1") is not intuitive. Temporary data tables like these seem to become permanent fixtures that mingle with actual data in my data sources collection, and it can become difficult to remember where they came from or whether they are important.

Is there some way I can simply continue my analysis in Team Studio by connecting an output from the PCA tool to the downstream analysis?
Chia-Yui Lee
Posted July 2, 2019

Hi Peter,

You can add a "Predictor" operator after the PCA operator to transform your original data. Note that there are two incoming links to the Predictor operator: one from your original dataset, the other from the PCA operator. The Predictor operator will calculate the transformation onto the principal component axes. In my example, it produced the 3 transformed columns (y_0_PCA, ...) and appended them to the original data. In your case, the class variable in the original dataset will be passed along to the Predictor operator.

Chia-Yui LEE
TIBCO Data Science
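For readers who want to see what "centered and rotated data, attached to the class variable" amounts to numerically, here is a minimal NumPy sketch. It is not Team Studio's implementation and does not use any Team Studio API; the function name `pca_scores`, the random data, and the choice of 3 components are all assumptions made for illustration.

```python
import numpy as np


def pca_scores(features, n_components):
    """Center the features and rotate them onto the top principal axes."""
    centered = features - features.mean(axis=0)
    # Covariance matrix of the centered features (columns are variables).
    cov = np.cov(centered, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Keep the n_components axes with the largest variance.
    order = np.argsort(eigenvalues)[::-1][:n_components]
    rotation = eigenvectors[:, order]
    return centered @ rotation, rotation, eigenvalues[order]


# Hypothetical data: one class column plus 13 numeric input columns.
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=50)
features = rng.normal(size=(50, 13))

scores, rotation, variances = pca_scores(features, n_components=3)

# Attach the class variable to the transformed columns for downstream analysis.
result = np.column_stack([labels, scores])
```

Here `result` holds the class variable in its first column followed by the three principal component scores, which is roughly the shape of table the original question asks for.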
Peter Shaw (Author)
Posted July 11, 2019

Sorry if I'm being dense, but I am not able to attach an output from the PCA. The outbound arrow appears, but it does not attach to the Predictor node. Is there some setting I need to address in the configuration of the PCA node? The PCA is happily computing some output, so that is not the issue. I've tried adjusting the carryover columns, with no luck so far.
Chia-Yui Lee
Posted July 23, 2019

Peter,

We found the reason for this, and I'll document it here for completeness. You couldn't join the PCA operator to the Predictor because the workflow uses a database data source. We were able to join the operators after switching to a Hadoop data source. In general, the Hadoop operators offer richer functionality.

For the database PCA operator, the product documentation has this passage:

"Output to Succeeding Operator (Database): Stored database tables that can be accessed by other Operators.

The PCA Operator for the database is technically a "terminal" operator, meaning that no other Operator directly follows it in the workflow. However, the PCA Operator stores its Principal Component Results (and Eigenvalue Output details) in two database tables that can then be accessed as the data source for a new workflow, if applicable.

The following example shows the results of the database PCA Operator being saved as pcaOperatorResultsIris and pcaOperatoreEigenOutputIris. The tables can be brought into the workflow and the derived Principal Components can be fed into an Alpine Forest Operator, for example, and the classification results analyzed in the Confusion Matrix in order to understand if the reduced set of Variables created by the PCA Operator provides an accurate enough model."

You can try this approach if your data has to stay in a database. With Hadoop data sources, you won't need to access the intermediate tables yourself, as the workflow knows where to pick them up.

Chia-Yui LEE
TIBCO Data Science