Charles Randall 2 Posted June 22, 2022 Share Posted June 22, 2022 Hello, When importing data from a text file using IronPython using code such as this: readerSettings = TextDataReaderSettings() readerSettings.Separator = "t" # Our text file is tab-delimited readerSettings.AddColumnNameRow(0) readerSettings.AddIgnoreRow(1) # Ignore the bad data type row completely # create a data source to read in the file textDataSource = TextFileDataSource(dataPath, readerSettings) newTable = Document.Data.Tables.Add("My Table", textDataSource) tableSettings = DataTableSaveSettings(newTable, True, False) # UseLinkedData = True; = False tableSettings.UseLinkedData = True # To be safe Document.Data.SaveSettings.DataTableSettings.Add(tableSettings)is there a way to make Spotfire automatically "guess" the data types of columns (like when you open the "Import settings" window for a data source, then click "Refresh," and all the types get set automatically) Currently, every single column gets imported as a String, regardless of whether it is solely numeric or not. I know that specifying a TypeRow is the preferred method for setting DataTypes automatically, but the data I am working with unfortunately has unhelpful type rows (which, for example, list some - but not all - numeric columns as being of type "STRING"), and they are out of my control. Thank you very much, Charles Link to comment Share on other sites More sharing options...
Gaia Paolini Posted June 27, 2022 Share Posted June 27, 2022 I cannot find a way to do it on import, however it is possible to change the data types on a table after it's been created. See script below. This snippet checks for floating point. It is pretty basic, I hope you can adapt it to your needs and maybe not check all rows but a subset. --------------------------------------- from Spotfire.Dxp.Data.Transformations import ExpressionTransformation,ColumnSelection from Spotfire.Dxp.Data import * table = Document.Data.Tables['MyNewTableWithWrongDataTypes'] rowsToInclude = IndexSet(table.RowCount,True) t = ExpressionTransformation() #get the list of columns columns = table.Columns #iterate on columns, collect values, try to turn them into floating point. If they are not actually floating point, do nothing, otherwise change the column type. for cc in columns: ccname=cc.Name cursor = DataValueCursor.CreateFormatted(table.Columns[ccname]) values=[] for row in table.GetRows(rowsToInclude,cursor): values.append(cursor.CurrentValue) try: values = [float(x) for x in values ] t.ColumnReplacements.Add( ccname,'real(['+ccname+'])',ColumnSelection(ccname) ) except: print ('was not real') pass table.AddTransformation(t) Link to comment Share on other sites More sharing options...
Sundaram Sridharan Posted July 31, 2023 Share Posted July 31, 2023 this approach works great in my case - was able to even nest a try: / except: to check for Integer first and then for real. However whenever it encounters a column name that has spaces it sometimes throws an errorValueError: The column 'my column with spaces' does not exist in the collection. it only happens to one column in a file that has a bunch of other columns with spaces in their namesthe line of code which throws this error iscursor = DataValueCursor.CreateFormatted(table.Columns[ccname]) Link to comment Share on other sites More sharing options...
Gaia Paolini Posted August 1, 2023 Share Posted August 1, 2023 is there a way you can provide the dxp you are trying, with an anonymized dataset? If it does not happen with other columns with spaces in their names, it might be something else. Link to comment Share on other sites More sharing options...
Sundaram Sridharan Posted August 1, 2023 Share Posted August 1, 2023 it only happens to columns that have another related column - let's say you have two columns "Well" and "Well Type" - then it is able to process "Well" fine but complains that "Well Type" cannot be found in the collection - if you go around it and complete the code, it brings in a new column in table called "Well(2)" that has the same content as the "Well" column and "Well Type" is completely gone. If I comment out this piece of code, then the file imports fine with all columns intact. If it would help to share DXP then I can try to recreate a smaller version Link to comment Share on other sites More sharing options...
Gaia Paolini Posted August 1, 2023 Share Posted August 1, 2023 this is useful information. Can you send the code you are using now and I can have a look? Link to comment Share on other sites More sharing options...
Sundaram Sridharan Posted August 1, 2023 Share Posted August 1, 2023 here is part of code that imports files and try to change the data types#load the first file into the desired output table file0 = fileList[0] f_name0,f_ext = os.path.splitext(os.path.basename(file0)) ds = TextFileDataSource(file0, readerSettings) table_name="Imported Data" try: #if it exists, replace it table = Document.Data.Tables[table_name] table.ReplaceData(ds) except: #if not, create a new one table=Document.Data.Tables.Add(table_name, ds)#load the second file, and now specity an origin column, the new value, and the previous value if len(fileList)>1: file1 = fileList[1] f_name1,f_ext = os.path.splitext(os.path.basename(file1)) ds=TextFileDataSource(file1, readerSettings) settings = AddRowsSettings(table, ds, "File Name", f_name1,f_name0) table.AddRows(ds, settings)#load third etc. files if len(fileList)>2: for i in range(2,len(fileList)): file = fileList f_name,f_ext = os.path.splitext(os.path.basename(file)) ds=TextFileDataSource(file, readerSettings)#specify only the new value for the origin column settings = AddRowsSettings(table,ds,"File Name",f_name) table.AddRows(ds,settings)#change column types rowsToInclude = IndexSet(table.RowCount,True) t = ExpressionTransformation() #get the list of columns columns = table.Columns #iterate on columns, collect values, try to turn them into floating point. If they are not actually floating point, do nothing, otherwise change the column type. for cc in columns: ccname=cc.Name# print ccname try: cursor = DataValueCursor.CreateFormatted(table.Columns[ccname]) except: continue values=[] for row in table.GetRows(rowsToInclude,cursor): if row.Index < 100: values.append(cursor.CurrentValue) try: values = [int(x) for x in values ] t.ColumnReplacements.Add(ccname,'Integer(['+ccname+'])',ColumnSelection(ccname)) except: print ('was not Integer') try: values = [float(x) for x in values ] t.ColumnReplacements.Add(ccname,'Real(['+ccname+'])',ColumnSelection(ccname)) except: print ('was not Real') pass table.AddTransformation(t) Link to comment Share on other sites More sharing options...
Gaia Paolini Posted August 1, 2023 Share Posted August 1, 2023 Thanks, but the indentation is all gone. At the bottom of your editor window you see a symbol like </> which you can click on and enter the code there so it is not distorted by the editor. Please also add the library imports. Link to comment Share on other sites More sharing options...
Sundaram Sridharan Posted August 1, 2023 Share Posted August 1, 2023 import clr import sys import re import os #import chardet import io clr.AddReference("System.Windows.Forms") clr.AddReference('System') from System.IO import FileStream,FileMode, File,MemoryStream,SeekOrigin,StreamWriter,StringReader,StreamReader from System.Windows.Forms import OpenFileDialog from Spotfire.Dxp.Data.DataOperations import DataOperation import System.String from Spotfire.Dxp.Data import * from Spotfire.Dxp.Data.Import import * from Spotfire.Dxp.Framework.ApplicationModel import NotificationService from Spotfire.Dxp.Application.Visuals import VisualContent from System.Collections.Generic import List from Spotfire.Dxp.Data.Transformations import ExpressionTransformation,ColumnSelection #load the first file into the desired output table file0 = fileList[0] f_name0,f_ext = os.path.splitext(os.path.basename(file0)) ds = TextFileDataSource(file0, readerSettings) table_name="Imported Data" try: #if it exists, replace it table = Document.Data.Tables[table_name] table.ReplaceData(ds) except: #if not, create a new one table=Document.Data.Tables.Add(table_name, ds) #load the second file, and now specity an origin column, the new value, and the previous value if len(fileList)>1: file1 = fileList[1] f_name1,f_ext = os.path.splitext(os.path.basename(file1)) ds=TextFileDataSource(file1, readerSettings) settings = AddRowsSettings(table, ds, "File Name", f_name1,f_name0) table.AddRows(ds, settings) #load third etc. files if len(fileList)>2: for i in range(2,len(fileList)): file = fileList[i] f_name,f_ext = os.path.splitext(os.path.basename(file)) ds=TextFileDataSource(file, readerSettings) #specify only the new value for the origin column settings = AddRowsSettings(table,ds,"File Name",f_name) table.AddRows(ds,settings) #change column types rowsToInclude = IndexSet(table.RowCount,True) t = ExpressionTransformation() #get the list of columns columns = table.Columns #iterate on columns, collect values, try to turn them into floating point. If they are not actually floating point, do nothing, otherwise change the column type. for cc in columns: ccname=cc.Name # print ccname try: cursor = DataValueCursor.CreateFormatted(table.Columns[ccname]) except: continue values=[] for row in table.GetRows(rowsToInclude,cursor): if row.Index < 100: values.append(cursor.CurrentValue) try: values = [int(x) for x in values ] t.ColumnReplacements.Add(ccname,'Integer(['+ccname+'])',ColumnSelection(ccname)) except: print ('was not Integer') try: values = [float(x) for x in values ] t.ColumnReplacements.Add(ccname,'Real(['+ccname+'])',ColumnSelection(ccname)) except: print ('was not Real') pass table.AddTransformation(t) Link to comment Share on other sites More sharing options...
Gaia Paolini Posted August 1, 2023 Share Posted August 1, 2023 Thanks, I could reproduce the problem, but I haven't found a solution yet Link to comment Share on other sites More sharing options...
Sundaram Sridharan Posted August 1, 2023 Share Posted August 1, 2023 thanks for verifying this - is there any other way to fix data types during /after import? Link to comment Share on other sites More sharing options...
Gaia Paolini Posted August 2, 2023 Share Posted August 2, 2023 I found a workaround to the workaround, which loops us back to the original question...the script below should add a table and specify the data types all in one go. Disclaimer: this is testing the boundaries of my Spotfire API prowess! I am not sure whether there is a better way, but this one worked for me. You need to supply valid values for dataPath and new_table_name. from Spotfire.Dxp.Data.Import import * from Spotfire.Dxp.Data import * dataPath='C:/blahblah/mydata.csv' new_table_name="My Table" readerSettings = TextDataReaderSettings() # Initial reader settings readerSettings.Separator = "," # Our text file is tab-delimited readerSettings.AddColumnNameRow(0) readerSettings.AddIgnoreRow(1) # Ignore the bad data type row completely readerSettings.ClearDataTypes(True) # To be sure, clear all defined data types (all loaded into strings) # Create a data source to read in the file textDataSource = TextFileDataSource(dataPath, readerSettings) # Create the new table try: newTable = Document.Data.Tables.Add(new_table_name, textDataSource) except: newTable = Document.Data.Tables[new_table_name] # Loop through the columns and inspect the values, try to set them to Real new_columns = newTable.Columns new_columns_list=list(new_columns) for icol in range(len(new_columns_list)): col=new_columns_list[icol] #print (col.Name) enum=col.RowValues.GetEnumerator() values = [tt.ValidValue for tt in enum] try: values = [float(x) for x in values ] readerSettings.SetDataType(icol, DataType.Real) except: print ('was not real',col.Name) pass #Since we updated the reader settings, we need to reload the new data table from file textDataSource = TextFileDataSource(dataPath, readerSettings) Document.Data.Tables.Remove(new_table_name) newTable = Document.Data.Tables.Add(new_table_name, textDataSource) tableSettings = DataTableSaveSettings(newTable, True, False) tableSettings.UseLinkedData = True # To be safe Document.Data.SaveSettings.DataTableSettings.Add(tableSettings) Link to comment Share on other sites More sharing options...
Sundaram Sridharan Posted August 2, 2023 Share Posted August 2, 2023 Thanks a lot - this works perfectly for me I had some trouble with the list() function and hence changed things slightly #Change Column Data Types columns = table.Columns for icol,col in enumerate (columns): enum=col.RowValues.GetEnumerator() values = [tt.ValidValue for tt in enum] numV = len(values) if numV > 100: numV = 100 try: values = [int(x) for x in values[:numV]] readerSettings.SetDataType(icol, DataType.Integer) print ('was Int',col.Name) except: try: values = [float(x) for x in values[:numV]] readerSettings.SetDataType(icol, DataType.Real) print ('was Real',col.Name) except: pass #Since we updated the reader settings, we need to reload the new data table from file ds = TextFileDataSource(file0, readerSettings) table.ReplaceData(ds) Link to comment Share on other sites More sharing options...
Recommended Posts
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now