#1
Comparing files
on Monday 28 July 2008 14:42 GS wrote:
> Hi all, I have two files like this:
>
> file A:
> line1
> line2
> line3
> line4
>
> file B:
> line5
> line3
> line2
>
> I want to get only lines from file A that do not appear in file B:
> line1
> line4
>
> How can I accomplish this without looping through the lines of both files? Is there a unix command to do it quickly? I tried "comm" and "uniq" but I cannot get what I want.
>
> Thanks
> Guido

sort fileA fileB fileB | uniq -u

uniq -u prints only unique lines. By including file B twice any lines unique to file B will end up being duplicated.

Andrew
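As a quick check of the approach above, here is a minimal sketch that recreates the two sample files from the question in a scratch directory and runs the pipeline on them. The `tmp` directory and the `result` variable are illustrative names, not something from the thread.

```shell
#!/bin/sh
# Recreate the sample files from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' line1 line2 line3 line4 > "$tmp/fileA"
printf '%s\n' line5 line3 line2 > "$tmp/fileB"

# fileB is listed twice, so each of its lines occurs at least twice in the
# sorted stream and uniq -u drops it; lines found only in fileA survive.
result=$(sort "$tmp/fileA" "$tmp/fileB" "$tmp/fileB" | uniq -u)
printf '%s\n' "$result"   # prints line1 and line4

rm -rf "$tmp"
```

Note that the output comes back in sorted order, not in the original order of fileA.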
#2
Wed, 30 Jul 2008 08:18:12 +0100, Andrew McDermott did *:
> sort fileA fileB fileB | uniq -u
>
> uniq -u prints only unique lines. By including file B twice any lines unique to file B will end up being duplicated.
>
> Andrew

Excellent! This form *may* be less of a strain with huge files:

$ sort fileA <(sort fileB fileB) | uniq -u

(though it'll be a bit slower because of the two steps)
#3
7/30/2008 2:18 AM, Andrew McDermott wrote:
> sort fileA fileB fileB | uniq -u
>
> uniq -u prints only unique lines. By including file B twice any lines unique to file B will end up being duplicated.

But any lines that appear multiple times in fileA will be discarded even if they don't appear in fileB.

Ed.
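The caveat is easy to reproduce. In the sketch below, fileA is a hypothetical variant of the sample data in which line1 occurs twice; fileB is the sample from the question. The scratch directory and variable names are illustrative.

```shell
#!/bin/sh
# Hypothetical variant of the sample: line1 appears twice in fileA.
tmp=$(mktemp -d)
printf '%s\n' line1 line1 line4 > "$tmp/fileA"
printf '%s\n' line5 line3 line2 > "$tmp/fileB"

# line1 is absent from fileB, but its two copies in fileA make it
# "not unique", so uniq -u discards it along with fileB's lines.
result=$(sort "$tmp/fileA" "$tmp/fileB" "$tmp/fileB" | uniq -u)
printf '%s\n' "$result"   # prints only line4

rm -rf "$tmp"
```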
#4
> But any lines that appear multiple times in fileA will be discarded even if they
> don't appear in fileB.
>
> Ed.

Indeed! It seemed too easy. I should think twice.
#5
Wed, 30 Jul 2008 17:24:59 -0500, Ed Morton did *:
> But any lines that appear multiple times in fileA will be discarded even if they don't appear in fileB.
>
> Ed.

Then this should cure it:

$ sort <(sort FileA | uniq) <(sort FileB FileB) | uniq -u
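For shells without process substitution, the same idea can be sketched with a command group. This is an assumed equivalent and not the command posted above: it deduplicates fileA first, appends fileB twice, then sorts and filters. The test data is hypothetical (line1 repeated in fileA).

```shell
#!/bin/sh
# Hypothetical data: fileA repeats line1; fileB is the sample from the question.
tmp=$(mktemp -d)
printf '%s\n' line1 line1 line2 line4 > "$tmp/fileA"
printf '%s\n' line5 line3 line2 > "$tmp/fileB"

# Collapse fileA to one copy per line, add fileB twice, then keep only the
# lines that end up occurring exactly once in the combined sorted stream.
result=$( { sort -u "$tmp/fileA"; cat "$tmp/fileB" "$tmp/fileB"; } | sort | uniq -u )
printf '%s\n' "$result"   # prints line1 and line4

rm -rf "$tmp"
```

Each surviving line is printed once, however many times it occurred in fileA.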
#6
7/31/2008 3:58 AM, Loki Harfagr wrote:
> then this should cure it:
> $ sort <(sort FileA | uniq ) <(sort FileB FileB ) | uniq -u

You could just use "sort -u FileA" instead of "sort FileA | uniq", but the OP probably doesn't want to get rid of duplicate lines from FileA anyway.

Ed.
#7
Thu, 31 Jul 2008 08:18:08 -0500, Ed Morton did *:
> You could just use "sort -u FileA" instead of "sort FileA | uniq",

That's right, I use the counting form '( sort - | uniq -c )' so frequently that I easily forget about the "recent" extensions ;-)

> but the OP probably doesn't want to get rid of duplicate lines from FileA anyway.

Well, in this case I don't see an easier direct toolbox solution than

$ comm -23 <(sort fileA) <(sort <(sort fileB) <(sort fileB))

but that's not really an "easy" one :-) so I'd use an awk script (OK, or another scripting language that has hashes and/or sorters).

But as the OP's sample was too small to determine whether fileA's data could be unique and/or pre-sorted, I'll rest my case ;D)

(if the files are not too big, Stephane's "grep -Fxvf FileB FileA" is certainly a good way to go)
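The three alternatives mentioned above can be compared side by side. This is a minimal sketch under stated assumptions: the input is hypothetical (an unsorted fileA that repeats line1), the comm call uses plain sorted temp files and a single sort of fileB instead of the process substitutions in the post, and the awk program is one possible hash-based script, not one given in the thread.

```shell
#!/bin/sh
# Hypothetical input: fileA is unsorted and repeats line1; fileB is the sample.
tmp=$(mktemp -d)
printf '%s\n' line4 line1 line1 line2 > "$tmp/fileA"
printf '%s\n' line5 line3 line2 > "$tmp/fileB"

# comm needs sorted input; -23 suppresses columns 2 and 3, leaving only
# the lines found in the first file and not in the second.
sort "$tmp/fileA" > "$tmp/A.sorted"
sort "$tmp/fileB" > "$tmp/B.sorted"
comm_out=$(comm -23 "$tmp/A.sorted" "$tmp/B.sorted")

# grep: -F fixed strings, -x whole-line match, -v invert, -f patterns from file.
grep_out=$(grep -Fxvf "$tmp/fileB" "$tmp/fileA")

# awk: load fileB into a hash, then print each fileA line that is not a key.
awk_out=$(awk -v b="$tmp/fileB" '
    BEGIN { while ((getline line < b) > 0) seen[line] = 1; close(b) }
    !($0 in seen)
' "$tmp/fileA")

printf 'comm:\n%s\ngrep:\n%s\nawk:\n%s\n' "$comm_out" "$grep_out" "$awk_out"
rm -rf "$tmp"
```

All three keep both copies of line1. The comm output is in sorted order, while grep and awk preserve the original order of fileA.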