#1

trying to wrestle sort and/or awk (not successfully anyway ;-))
Albretch Mueller wrote:
> The reason why I need this is because I need to sort some data (directory structures) first on the directory depth (a numeric value) and then, alphabetically, using the actual directory path
>
> I am using find in order to get the initial data
>
> find . -type f -printf '%T@ %A@ %C@ %M %n %u %g %s %d %h %f ' | awk '{ \
>   print("\042"$12"\042" \
>   "\054"$1 \
>   "\054"$2 \
>   "\054"$3 \
>   "\054""\042"$4"\042" \
>   "\054"$5 \
>   "\054""\042"$6"\042" \
>   "\054""\042"$7"\042" \
>   "\054"$8 \
>   "\054"$9 \
>   "\054""\042"$10"\042" \
>   "\054""\042"$11"\042" \
>   "\054""\042"$13"\042"); }'
>
> but then sort does not sort on one field as numeric and the other alphabetically
>
> And/or I am not getting it right/I am missing something fundamental here

First, you should at least put \n at the end of find's printf format string, or you'll end up with a single line of input. Then, assuming your filenames do not contain newlines, you can do

    find . -type f -printf '%T@ %A@ %C@ %M %n %u %g %s %d %h %f\n' | LC_ALL=C sort -k 9,9n -k 10

if by "directory path" you mean from the 10th field to the end of the line. If you want to sort only on the directory path (10th field), then use

    LC_ALL=C sort -k 9,9n -k 10,10

but beware that there might be spaces in the names, so the 10th field may contain only part of the directory name.

--
echo 0|sed 's909=#3u)o19;s0#0ooo)];s()(0bu}=(;s#}#.1m"?0^2{#;
s)")9v2@3%"9$);so%op]t(p$e#!o;sz(z^+.z;su+ur!z"au;sxzxd?_{h)cx;:b;
s/\(\(.\).\)\(\(\)*\)\(\(.\).\)\(\(\)\6.*\2.*\)/\5\3\1\7/;
tb'|awk '{while((i+=2)<=length($1)-18)a=a substr($1,i,1);print a}'

#2

Dave B wrote:
> but beware that there might be spaces in the names, so the 10th field may contain only part of the directory name.

Well, this is why (wrongly or not) I was using awk. I thought that if I put the last field in parentheses, and since parentheses are not allowed in directory paths anyway, awk would process everything between the parentheses, meaning the whole path.

Am I right on that one?

Thanks
lbrtchx

#3

Albretch Mueller wrote:
> Dave B wrote:
>> but beware that there might be spaces in the names, so the 10th field may contain only part of the directory name.
>
> Well, this is why (wrongly or not) I was using awk. I thought if I have the last field under parentheses, and since parentheses are not allowed in directory paths anyway,

Parentheses are allowed.

> awk would process everything between the parentheses, that means the whole path

In find's printf, why not use %p instead of %h+%f? This way, you just sort numerically on the 9th field, alphabetically on the 10th field, and you are done.

#4

Dave B wrote:
> sort numerically on the 9th field, alphabetically on the 10th field, and you

That should be "alphabetically from the 10th field to the end".
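
The difference between "the 10th field" and "from the 10th field to the end" can be seen on a small scale. The following is only a sketch: the three-column sample data (an id, a depth, and a path containing a space) is made up for illustration, so the key numbers are 2 and 3 rather than 9 and 10.

```shell
# Hypothetical sample: an id, a depth, and a path that contains a space.
data='9 1 New Folder/a
3 1 New Folder/b'

# -k 3 means "from field 3 to the end of the line": the whole path is the key.
to_end=$(printf '%s\n' "$data" | LC_ALL=C sort -k 2,2n -k 3)

# -k 3,3 stops at the end of field 3, so only "New" is compared; the tie is
# then broken by sort's last-resort comparison of the entire line.
one_field=$(printf '%s\n' "$data" | LC_ALL=C sort -k 2,2n -k 3,3)

printf 'to end of line:\n%s\n\nfield 3 only:\n%s\n' "$to_end" "$one_field"
```

With the open-ended key the rows come out in path order; with the closed key the order falls back to whatever happens to be at the start of the line.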

#5

Dave B wrote:
>>> but beware that there might be spaces in the names, so the 10th field may contain only part of the directory name.
>>
>> Well, this is why (wrongly or not) I was using awk. I thought if I have the last field under parentheses, and since parentheses are not allowed in directory paths anyway,
>
> Parentheses are allowed.

Do you mean in directory paths/file names? Which FS allows them?

Or do you mean in sort, in the conventional way in which all characters from the start to the end of the parentheses are taken into account?

>> awk would process everything between the parentheses, that means the whole path
>
> In the find's printf, why not use %p instead of %h+%f? This way, you just sort numerically on the 9th field, alphabetically on the 10th field, and you are done.

Actually I am using %p to do the sorting, as you suggested, but then I crop that field because I don't really need it if I have %h + %f.

Also, I need %h and %f separate because I will then get all directories, sort and index them, and then use the indexes to substitute the path names in the file that contains the 'found' files.

Let me test/polish my silly script a little more for you guys to take a look at it.

Thanks
lbrtchx

#6

Maxwell Lol wrote:
> mkdir 'a()'

Well, you are right. The thing is that I would never have file names like that, but of course it isn't really about silly me ;-)

    sh-3.1# ls -l
    total 7980
    drwxr-xr-x 2 root root 4096 Jun 25 09:34 "a"
    drwxr-xr-x 2 root root 4096 Jun 25 09:34 a()
    . .

Is there a way to safely use find that gives you all these, I would say, weird cases?

lbrtchx
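
One common way to hand such names from find to another program is to separate them with NUL bytes instead of newlines or spaces. The sketch below assumes a find that supports -print0 and an xargs that supports -0 (both GNU tools do); the scratch directory and the file names in it are invented for the example.

```shell
# Scratch directory with deliberately awkward names (hypothetical examples).
tmpdir=$(mktemp -d)
mkdir "$tmpdir/a()" "$tmpdir/New Folder & %"
touch "$tmpdir/a()/one file" "$tmpdir/New Folder & %/two"

# -print0 ends each path with a NUL byte, the one byte that cannot occur in a
# path; xargs -0 splits on NUL only, so spaces and parentheses pass untouched.
count=$(find "$tmpdir" -type f -print0 | xargs -0 -n1 md5sum | wc -l)
echo "files hashed: $count"

rm -rf "$tmpdir"
```

Because nothing in the pipeline splits on whitespace, each file reaches md5sum as exactly one argument.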

#7

Albretch Mueller wrote:
> Yes, it does work! I was just messing with some file names
>
> Thanks

Note that if you have filenames with commas, you will sort only on the first part of the name before the first comma (no, double quotes do not protect against that). You had better use -k 5 to use all the fields from the 5th to the end of the line. If, on the other hand, you do not have filenames with commas, you can safely drop the double quotes, since commas will already separate the fields. Bottom line: in any case, you don't need double quotes.

#8

Dave B wrote:
> Albretch Mueller wrote:
>
> find . -type f -printf '%T@ %A@ %C@ %M %n %u %g %s %d %h %f\n' | LC_ALL=C sort -k 9,9n -k 10
>
> if by "directory path" you mean from the 10th to the end of the line. If you want to sort only on the directory path (10th field), then use
>
> LC_ALL=C sort -k 9,9n -k 10,10
>
> but beware that there might be spaces in the names, so the 10th field may contain only part of the directory name.

I am still not getting it right somehow.

sort/your script:

    sort -t, -k 4,4n -k 5,5 <file_name>

doesn't sort the 4th column as numeric and the 5th as text:

    "drwxr-xr-x",16,"root",0,""
    "drwx",3,"root",1,".thumbnails"
    "drwx",2,"root",2,".thumbnails/normal"
    "drwxr-xr-x",2,"root",1,".mcop"
    "drwxr-xr-x",2,"root",1,"Desktop"
    "drwxr-xr-x",7,"root",1,".kde"
    "drwx",4,"root",2,".kde/cache-Knoppix"
    "drwx",2,"root",3,".kde/cache-Knoppix/favicons"
    "drwx",2,"root",3,".kde/cache-Knoppix/background"
    "drwx",2,"root",2,".kde/tmp-Knoppix"
    "drwx",2,"root",2,".kde/socket-Knoppix"
    "drwxr-xr-x",11,"root",2,".kde/share"
    "drwx",2,"root",3,".kde/share/servicetypes"
    "drwxr-xr-x",2,"root",3,".kde/share/services"
    "drwxr-xr-x",5,"root",3,".kde/share/mimelnk"
    "drwxr-xr-x",2,"root",4,".kde/share/mimelnk/video"
    "drwxr-xr-x",2,"root",4,".kde/share/mimelnk/audio"
    "drwxr-xr-x",2,"root",4,".kde/share/mimelnk/application"
    "drwxr-xr-x",3,"root",3,".kde/share/icons"
    "drwxr-xr-x",2,"root",4,".kde/share/icons/favicons"
    "drwxr-xr-x",5,"root",3,".kde/share/fonts"
    "drwxr-xr-x",2,"root",4,".kde/share/fonts/override"
    "drwxr-xr-x",4,"root",3,".kde/share/config"
    "drwxr-xr-x",2,"root",4,".kde/share/config/session"
    "drwxr-xr-x",2,"root",4,".kde/share/config/colors"
    "drwxr-xr-x",4,"root",3,".kde/share/cache"
    "drwxr-xr-x",15,"root",4,".kde/share/cache/http"
    "drwxr-xr-x",2,"root",5,".kde/share/cache/http/t"
    "drwxr-xr-x",2,"root",5,".kde/share/cache/http/s"
    "drwxr-xr-x",2,"root",5,".kde/share/cache/http/p"
    "drwxr-xr-x",2,"root",5,".kde/share/cache/http/a"
    "drwxr-xr-x",2,"root",4,".kde/share/cache/favicons"
    "drwxr-xr-x",18,"root",3,".kde/share/apps"
    "drwx",2,"root",4,".kde/share/apps/konsole"
    "drwxr-xr-x",2,"root",4,""
    "drwxr-xr-x",3,"root",3,".kde/share/applnk"
    "drwxr-xr-x",2,"root",4,".kde/share/applnk/.hidden"
    "drwxr-xr-x",2,"root",2,".kde/Autostart"
    "drwxr-xr-x",4,"root",1,".mozilla"
    "drwxr-xr-x",3,"root",2,".mozilla/knoppix"
    "drwxr-xr-x",2,"root",3,".mozilla/knoppix/ujixazk6.slt"
    "drwxr-xr-x",4,"root",2,".mozilla/firefox"
    "drwx",6,"root",3,""
    "drwxr-xr-x",2,"root",4,""
    "drwxr-xr-x",2,"root",3,""
    "drwxr-xr-x",2,"root",1,".gnome_private"
    "drwxr-xr-x",3,"root",1,".gnome"
    "drwxr-xr-x",2,"root",2,".gnome/accels"
    "drwxr-xr-x",2,"root",1,"tmp"
    "drwxr-xr-x",2,"root",1,".xmms"
    "drwxr-xr-x",2,"root",1,".xine"
    "drwxr-xr-x",2,"root",1,".qt"
    "drwxr-xr-x",3,"root",1,".local"
    "drwxr-xr-x",4,"root",2,".local/share"
    "drwx",4,"root",3,".local/share/Trash"
    "drwx",2,"root",4,".local/share/Trash/files"
    "drwx",2,"root",4,".local/share/Trash/info"
    "drwxr-xr-x",2,"root",3,".local/share/applications"
    "drwxr-xr-x",2,"root",1,".links"
    "drwxr-xr-x",21,"root",1,".gimp-2.2"
    "drwxr-xr-x",2,"root",2,".gimp-2.2/tool-options"
    "drwxr-xr-x",2,"root",2,".gimp-2.2/tmp"
    "drwxr-xr-x",2,"root",2,".gimp-2.2/curves"
    "drwxr-xr-x",2,"root",2,".gimp-2.2/brushes"
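
For reference, here is a small sketch of the comma-separated sort applied to four rows taken from the listing above, deliberately shuffled. It assumes no commas inside the names and uses LC_ALL=C so that the text key is compared byte by byte; the variable names are only for illustration.

```shell
# A few rows copied from the listing above, deliberately out of order.
rows='"drwxr-xr-x",2,"root",2,".kde/Autostart"
"drwxr-xr-x",4,"root",1,".mozilla"
"drwxr-xr-x",2,"root",1,"Desktop"
"drwxr-xr-x",16,"root",0,""'

# -t, makes the comma the field separator; 4,4n compares only field 4 as a
# number; -k 5 compares everything from field 5 onward as plain bytes.
sorted=$(printf '%s\n' "$rows" | LC_ALL=C sort -t, -k 4,4n -k 5)
printf '%s\n' "$sorted"
```

The depth 0 row comes first, the two depth 1 rows follow in byte order of their paths (".mozilla" before "Desktop", since "." sorts before "D" in the C locale), and the depth 2 row comes last.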

#9

Dave B wrote:
> Note that if you have filenames . . .

OK, what do you do in order to avoid all those kinds of nuances that can happen with file path names, which conflict with other utilities?

I think the sorting part can be safely managed by somehow including a temporary column with a hexadecimal representation of the string, but of course you cannot feed the exec part of a find statement with that.

Feeding "find" a directory path that contains spaces works fine if you do it right on the command line:

    sh-3.1# find "/home/root/New Folder () & %% ^/New Folder" -type f -exec md5sum {} \;
    /home/root/New Folder () & %% ^/New Folder/Text File~
    /home/root/New Folder () & %% ^/New Folder/Text File

But if you (actually -I- couldn't do it anyway) try crafting that same statement as a script:

    #!/bin/bash
    START_DIR="/home/root/New Folder () & %% ^/New Folder"
    START_DIR="\"/home/root/New Folder () & %% ^/New Folder\""
    START_DIR="\'/home/root/New Folder () & %% ^/New Folder\'"
    find ${START_DIR} -type f -exec md5sum {} \;

I expectedly got:

    sh-3.1# sh ./script00.sh
    find: invalid predicate `()'

What would you do to make sure that find does not stumble on such cases?

thanks
lbrtchx
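
A likely cause of the "invalid predicate" error is that ${START_DIR} is expanded without double quotes, so the shell splits the value at each space and find receives "()" as a separate argument. Quote characters stored inside the value do not help, because they are treated as ordinary characters after expansion. The following sketch quotes the expansion instead; the scratch directory stands in for /home/root and is an assumption of the example.

```shell
# Hypothetical scratch tree mirroring the awkward directory name above.
base=$(mktemp -d)
START_DIR="$base/New Folder () & %% ^/New Folder"
mkdir -p "$START_DIR"
printf 'hello\n' > "$START_DIR/Text File"

# Quote the expansion, not the value: "$START_DIR" reaches find as one
# argument, whereas unquoted ${START_DIR} is split at every space.
found=$(find "$START_DIR" -type f -exec md5sum {} \; | wc -l)
echo "files hashed: $found"

rm -rf "$base"
```

The assignment itself needs only the one pair of double quotes shown; the important part is the quoting at the point of use.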

#10

It seems to be working just fine for this basic script.

But when I used some formatting via awk and stuff, it did not seem to like it.

More to come
lbrtchx

#11

Monday 30 June 2008 03:06, Albretch Mueller wrote:
> Basically what is happening, as I see it, is that when file names contain spaces in a statement containing some -exec and/or awk processing, the processing part is not being fed the actual name of the file

Let's try to make things easy. If you have NO spaces in your names, then just use find -printf and awk will see all the correct fields:

    $ echo "field1 field2 field3" | awk '{for(i=1;i<=NF;i++) print $i}'
    field1
    field2
    field3

If you DO have spaces, then just use a different separator, something that does not appear elsewhere in the input (e.g., a comma), and tell awk what that separator is:

    $ echo "field with space,field2,field3 space" | \
      awk -F, '{for(i=1;i<=NF;i++) print $i}'
    field with space
    field2
    field3 space

How do you produce a comma-separated list with find's printf? Just do

    -printf '%T@,%A@,%C@,%M,%n,%u,%g,%s,%d,%h,%f\n' | awk -F,

(the \n at the end of the printf format string is important, and I see it's missing in one place in your post).

--
All the commands are tested with bash and GNU tools, so they may use nonstandard features. I try to mention when something is nonstandard (if I'm aware of that), but I may miss something. Corrections are welcome.
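
Putting the two halves together, a minimal end-to-end sketch might look like the following. It assumes GNU find (for -printf) and names without commas or newlines; the scratch tree and the reduced three-field format are made up for the example.

```shell
# Hypothetical scratch tree with a space in a directory and a file name.
top=$(mktemp -d)
mkdir "$top/sub dir"
touch "$top/sub dir/my file.txt"

# Comma-separated depth, directory and file name, one record per line
# (-printf is a GNU find extension, and the trailing \n matters).
out=$(cd "$top" && find . -type f -printf '%d,%h,%f\n' |
      awk -F, '{ print "depth=" $1 " dir=" $2 " file=" $3 }')
echo "$out"

rm -rf "$top"
```

Since awk splits only on commas here, the spaces inside the directory and file names stay within their fields.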

#12

Kenny McCormack wrote:
> I think it means that it got all huffy and threatened to hold its breath until it got its way. When that didn't work, it probably took its toys and went home.

I don't get what the deal is about getting huffy with toys, or about so seriously cobbling together some script, but here is what I came up with, which totally suits my needs, in case someone else is looking for something similar. In short, pk oversimplified my intentions and showed me something that worked, but it was not what I was looking to do.

    #!/bin/bash

    # __
    _BRX_DIR="/ramdisk/home/root"
    _BRX_DIR="/media/sda1"

    # __
    _DATE=`date +%Y%m%d%H%M%S`

    # __
    _FLS_DATA=${_DATE}".fs.data.txt";

    # __
    _FLS_SIGN=${_DATE}".fs.sign.txt";

    # __
    UT_DIRS=${_DATE}".dirs.txt";

    # __
    echo "Starting Directory Branch: "${_BRX_DIR}
    echo "Files data: "${_FLS_DATA}
    echo "Files signatures: "${_FLS_SIGN}
    echo "Directories: "${UT_DIRS}

    # ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ SNAPSHOT OF FILES WITHOUT MD5SUM
    # __ getting files formatted as csv
    find "${_BRX_DIR}" -type f -printf '%T@,%A@,%C@,"%F","%M",%n,"%u","%g",%s,%d,"%h","%f","%P"\n' > "${_FLS_DATA}"

    # ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ MD5SUMs
    find "${_BRX_DIR}" -type f -print0 | xargs -0 -n1 md5sum -b > "${_FLS_SIGN}"

    # ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ SNAPSHOT OF FILES' DIRECTORIES
    find "${_BRX_DIR}" -type d -printf '%T@,%A@,%C@,"%M",%n,"%u","%g",%d,"%P"\n' > "${UT_DIRS}.2sort.tmp"

    # __ sorting on depth (numeric) and then on directory path (alpha)
    sort -t, -k 8,8n -k 9,9 "${UT_DIRS}.2sort.tmp" > "${UT_DIRS}"

    # __
    rm "${UT_DIRS}.2sort.tmp"