S H Posted January 21, 2019 Share Posted January 21, 2019 Hello, I have a data set with a country column that contains several comma separated lists. I want to do a uniqueconcatenate across this column, but because the column can already contain multiple values (i.e. a single record might have "Canada, United States") I am not getting the output I want. My uniqueconcatenation comes out as: "Canada, Canada, India, Canada, Israel, Canada, United States, India, United States, United States" I know that an RXReplace could help me here, but I am beyond a beginner at these. Cribbing from the internet, the best I got was: RXReplace(UniqueConcatenate([COUNTRY]),"([^,]+)(,[ ]*\1)+","\1","") Which only gets me halfway there with an output of "Canada, India, Canada, Israel, Canada, United States, India, United States, United States". Any thoughts on how this could be made to work (And at the same time, any favourite resources for learning regular expressions) Thank you in advance! Link to comment Share on other sites More sharing options...
Gaia Paolini Posted January 22, 2019 Share Posted January 22, 2019 Hi I am not sure I understand the desired result, it would be great if you couldclarify with a sample dataset showing the desired result. I am interpreting it this way: - you have data in which each row is a different concatenation of countries. Example: COUNTRY A,A,B B,B,C,A D,A,D,A C,B,B,B,B -you want a new column, that for each row contains the sameunique concatenation of all the countries in all rows. You could achieve it by creating a calculated column like this: (type all in one row) TERR_String("output Link to comment Share on other sites More sharing options...
S H Posted January 22, 2019 Author Share Posted January 22, 2019 Hello,Yes, you have the right idea. In this case I need this in a calculated value, but the expression would be the same as doing it in a calculated column I believe. The only difference between your sample data and the real data is that for a given record a country will only appear once. So if the data set is:COUNTRY A, B A B, C B, D, E D D, EMy desired output of the calculated value is A, B, C, D, E. Simply doing a uniqueconcatenate doesn't work since it looks at the whole string and not subsets.Ideally I would prefer to use regular expressions if possible, as our company does not have a license that permits TERR to be used in webplayer.Appreciate your help! Link to comment Share on other sites More sharing options...
Gaia Paolini Posted January 22, 2019 Share Posted January 22, 2019 Hi I think you googled what I have now googled. From what I see, the problems are 1) the uniqueConcatenate adds a space which is non-avoidable, 2) the proposed solution does not work if the repeated countries are not adjacent. Here is something that worked for the example, although it is a bit of a pastiche. Unfortunately I do not have the resources to clean it or rationalise it further. RXReplace(Trim(RXReplace(Concatenate(",",RXReplace(UniqueConcatenate([COUNTRY]),"\s+","","g")),"(.)(=.*\1)"," ","g")),"\s+",",","g) The inner coreRXReplace(UniqueConcatenate([COUNTRY]),"\s+","","g) does the concatenation, then removes every inner space. The basic idea is to apply a positive lookahead (.)(=.*\1) The rest is garnish to replace spurious spaces and commas Gaia Link to comment Share on other sites More sharing options...
S H Posted January 22, 2019 Author Share Posted January 22, 2019 I expect you are correct and we landed on the same reference page. And yes, the issue with that solution is that the concatenated list is not sorted and so the solution only removes adjacent duplicate values.Unfortunately when I entered the below solution, the output I get is:Input: RXReplace(Trim(RXReplace(Concatenate(",",RXReplace(UniqueConcatenate([COUNTRY]),"\s+","","g")),"(.)(=.*\1)"," ","g")),"\s+",",","g") Output: "r,l,C,I,,Uni,dS,ates"In the above, UniqueConcatenate([COUNTRY]) returns the string "Canada, Canada, India, Canada, Israel, Canada, United States, India, United States, United States".Did the above work for you Could SF versions be at play here (I am running v7.7)Thank you again for your help! Link to comment Share on other sites More sharing options...
Gaia Paolini Posted January 22, 2019 Share Posted January 22, 2019 hi It is the sample data I am using, with more than one character per word. Without the right data, it is difficult to provide a solution. Gaia Link to comment Share on other sites More sharing options...
S H Posted January 22, 2019 Author Share Posted January 22, 2019 Apologies if I wasn't clear - that list of countries is the data set in question. Any function that works on that string in place of UniqueConcatenate([COUNTRY]) would solve the general problem. Link to comment Share on other sites More sharing options...
Gaia Paolini Posted January 22, 2019 Share Posted January 22, 2019 Hi this might work - it does not add commas but spaces. RXReplace(RXReplace(UniqueConcatenate([COUNTRY]),","," ",g"),"([A-Za-z]+)\b\s+(=.*\b\1)","","g") Gaia Link to comment Share on other sites More sharing options...
S H Posted January 22, 2019 Author Share Posted January 22, 2019 This may just do the trick. Thank you for all your help with this! Link to comment Share on other sites More sharing options...
Recommended Posts
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now