veni, posted July 6:

Hi! I want to split a data table into training and validation datasets. I am thinking of an 80% (training) / 20% (validation) split, using a stratified split if possible.

The other part of my question is that I also want to run a Random Forest prediction on a categorical variable, but as far as I know Spotfire doesn't offer one, only Logistic regression and Decision Tree. I have searched on Google for a while and am still having some difficulty getting these two things done. I would appreciate any help.
David Boot-Olazabal, posted July 8 (edited July 8):

Hi veni,

Have you already had a chance to look at our data functions library? https://community.spotfire.com/files/category/8-data-functions/ You may find a couple of data functions there that could help you out. And here is a direct link to a random forest data function:

The data splitting part is a bit different. What do you expect the code in Spotfire to do for you: just the splitting, or the whole pipeline? If you could elaborate a bit more on that, it would give us a better idea of what you're trying to achieve.

Kind regards,
David
Vincent Thuilot, posted July 10:

Hi Veni,

In your place, I would consider writing everything directly in Python from the start: you would have better control over all aspects of your algorithm, and you would be able to perform a stratified split. Through data functions, you could use scikit-learn (make sure it is installed and imported first) and use the train_test_split function as usual.

Something like this:
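A minimal sketch of that suggestion, assuming the data arrive in the data function as a pandas DataFrame. The "Glucose" and "BMI" columns and the generated values are hypothetical stand-ins; "Diabetes" is the 0/1 target veni describes later in the thread. Passing stratify= to train_test_split keeps the 0/1 ratio the same in the training and validation parts, and test_size=0.2 gives the 80/20 split.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the Spotfire input table: 20 rows,
# 15 with Diabetes = 0 and 5 with Diabetes = 1 (an imbalanced target).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Glucose": rng.normal(110, 25, size=20).round(1),
    "BMI": rng.normal(28, 5, size=20).round(1),
    "Diabetes": [0] * 15 + [1] * 5,
})

# stratify= keeps the class ratio of "Diabetes" equal in both parts;
# test_size=0.2 puts 20% of the rows in the validation set.
train_df, valid_df = train_test_split(
    df,
    test_size=0.2,
    stratify=df["Diabetes"],
    random_state=42,  # fixed seed so the split is reproducible
)

# Class counts per part (class 0 first, then class 1).
print(train_df["Diabetes"].value_counts().sort_index().tolist())  # [12, 4]
print(valid_df["Diabetes"].value_counts().sort_index().tolist())  # [3, 1]
```

Both parts keep the original 3:1 ratio of 0s to 1s. Without stratify=, a small minority class can end up under-represented, or missing entirely, in the validation set.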
Gaia Paolini, posted July 10:

If you want to go down the Python route, there is a Python package called Spotfire DSML that can be downloaded and used within Spotfire. One of its modules is ml_modelling, which offers an end-to-end modelling process.

Once you split your data, you need to do something with the training and test data. If you have missing data or categorical variables, you need to handle them consistently in the training and test data. Once you have a model, you need to validate it on the test data with the usual performance indicators. So it is advisable to be able to handle the entire process.

Useful links:
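A minimal sketch of that end-to-end idea in plain scikit-learn. This does not use the Spotfire DSML ml_modelling API, and it also covers the random forest part of the original question. The columns "Glucose", "BMI" and "Smoker" and the rule used to build the toy "Diabetes" target are hypothetical. Wrapping imputation and one-hot encoding in a Pipeline means they are learned from the training data only and then applied unchanged to the test data. That is the "consistent handling" mentioned above.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy table: two numeric predictors, one categorical predictor,
# and a binary "Diabetes" target derived from Glucose purely for illustration.
rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "Glucose": rng.normal(110, 25, n),
    "BMI": rng.normal(28, 5, n),
    "Smoker": pd.Series(rng.choice(["yes", "no"], n), dtype=object),
})
df["Diabetes"] = (df["Glucose"] > 125).astype(int)
# Inject some missing values so the imputers have something to do.
df.loc[rng.choice(n, 15, replace=False), "BMI"] = np.nan
df.loc[rng.choice(n, 10, replace=False), "Smoker"] = np.nan

X = df.drop(columns="Diabetes")
y = df["Diabetes"]
# Stratified 80/20 split, as in the question.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Numeric columns: fill gaps with the training median.
# Categorical column: fill gaps with the most frequent value, then one-hot encode.
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), ["Glucose", "BMI"]),
    ("cat", make_pipeline(
        SimpleImputer(strategy="most_frequent"),
        OneHotEncoder(handle_unknown="ignore"),
    ), ["Smoker"]),
])

model = Pipeline([
    ("prep", preprocess),
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
])
model.fit(X_train, y_train)  # preprocessing is fitted on training rows only

# Validate on the held-out rows with the usual indicators.
pred = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred))
print(confusion_matrix(y_test, pred))
```

In a Spotfire data function, df would be the input table rather than generated data, and the predictions or metrics would be mapped to outputs. That wiring is not shown here.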
veni (author), posted July 14, in reply to David Boot-Olazabal's post of 7/8/2024 at 2:11 PM:

Hi David,

My dataset has a categorical variable, "Diabetes", which records whether a person has diabetes, coded as "1" or "0". What I want is a stratified split of my dataset into training and validation datasets.

Regards,
Veni
veni (author), posted July 14:

I also have some calculated columns in Spotfire, so I would like the split to retain those calculated columns.
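One way to keep the calculated columns, sketched here as an assumption rather than a tested Spotfire recipe: instead of returning two new tables, have the data function return a single column that labels each row as training or validation, and add it to the original table as a new column. The original table, calculated columns included, stays as it is, and the two sets can be separated with a filter. New output tables would likely carry calculated columns only as static values, not as recalculating expressions. The helper name stratified_split_labels and the column name "Split" are hypothetical. The stratification here is done with pandas' groupby(...).sample, sampling the same fraction within each "Diabetes" class.

```python
import numpy as np
import pandas as pd


def stratified_split_labels(df: pd.DataFrame, target: str,
                            train_frac: float = 0.8, seed: int = 42) -> pd.Series:
    """Hypothetical helper: label each row 'Training' or 'Validation',
    sampling train_frac of the rows within each class of `target`."""
    # Sampling the same fraction from every class keeps each class's
    # share of rows equal in both parts (stratification).
    train_idx = (
        df.groupby(target)
          .sample(frac=train_frac, random_state=seed)
          .index
    )
    labels = np.where(df.index.isin(train_idx), "Training", "Validation")
    return pd.Series(labels, index=df.index, name="Split")


# Toy example: 15 rows with Diabetes = 0 and 5 rows with Diabetes = 1.
df = pd.DataFrame({"Diabetes": [0] * 15 + [1] * 5})
df["Split"] = stratified_split_labels(df, "Diabetes")
print(pd.crosstab(df["Diabetes"], df["Split"]))
```

Two design notes. The fixed seed keeps the labels stable when Spotfire re-runs the data function. Also, because the fraction is rounded per class, the overall split can differ slightly from exactly 80/20 when class sizes are not multiples of 5.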
Gaia Paolini, posted July 15:

You said you wanted to do modelling after splitting. Could you elaborate on what your goals are? Maybe post a sample dataset?