Python
 
Posted by alivip, May 18th, 2008, 11:55 AM
My code tries to count word pairs across multiple files but gives an error, please help

How can I get every Token (word) and its PreviousToken (previous word) from multiple files, along with the frequency of each two-word pair?

My code tries to collect all single words and all double words (every Token together with its PreviousToken) from multiple files and count the frequency of both. It works for single words, but for double words it gives this error:

line 50, in most_frequant_word
word1+= ' ' + word_list[ix+1]
IndexError: list index out of range


Code:
import __future__
import Tkinter as tk
import os, glob
import sys
import string
import re
import tkFileDialog

def most_frequant_word():
    browser = tkFileDialog.askdirectory()
    word_freq = {}    # single word -> count
    word_freq1 = {}   # "word nextword" -> count
    count11 = 0       # running total of tokens over all files
    for root, dirs, files in os.walk(browser):
        text1.insert(tk.INSERT, 'Found %d dirs and %d files' % (len(dirs), len(files)))
        text1.insert(tk.INSERT, "\n")
        for idx, file in enumerate(files):
            ff = open(os.path.join(root, file), "r")
            text = ff.read()
            ff.close()
            word_list = text.split()
            my_list = text.split()
            count11 = len(word_list) + count11
            text1.insert(tk.INSERT, "total number of tokens %s" % count11)
            text1.insert(tk.INSERT, "\n")
            for ix, word in enumerate(word_list):
                word = word.lower()
                word = word.rstrip('.,/"\ -_;\[](){} ')
                # build the dictionary
                word1 = word
                # this is the line reported in the traceback (IndexError)
                word1 += ' ' + word_list[ix+1]
                count = word_freq.get(word, 0)
                word_freq[word] = count + 1
                count1 = word_freq1.get(word1, 0)
                word_freq1[word1] = count1 + 1
                # create a list of (freq, word) tuples
                freq_list = [(freq, word) for word, freq in word_freq.items()]
                freq_list1 = [(freq1, word1) for word1, freq1 in word_freq1.items()]
                # sort the list by the first element in each tuple (default)
                freq_list.sort(reverse=True)
                freq_list1.sort(reverse=True)
            for n, tup in enumerate(freq_list1):
                text1.insert(tk.INSERT, "%s times: %s" % tup)
                text1.insert(tk.INSERT, "\n")

root = tk.Tk(className = " most_frequant_word")
# text entry field, width=width chars, height=lines text
v1 = tk.StringVar()
text1 = tk.Text(root, width=50, height=50, bg='green')
text1.pack()
# function listed in command will be executed on button click
button1 = tk.Button(root, text='Browse', command=most_frequant_word)
button1.pack(pady=5)
text1.focus()
root.mainloop()
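
The reported IndexError is consistent with `word_list[ix+1]` being evaluated when `ix` is the index of the last word, where no next word exists. One possible way around it is to pair each word with its successor using `zip`, which stops at the end of the shorter sequence. The sketch below is written for Python 3 and assumes the words have already been lower-cased and stripped; the helper name `count_pairs` is made up for illustration and does not appear in the posted code.

```python
from collections import Counter

def count_pairs(words):
    """Count adjacent word pairs without ever indexing past the end of the list."""
    # zip(words, words[1:]) yields (word, next_word) and stops after the
    # second-to-last word, so the last word is never asked for a successor.
    return Counter(prev + " " + cur for prev, cur in zip(words, words[1:]))

pairs = count_pairs("every man has a price".split())
for pair, freq in pairs.most_common():
    print("%s times: %s" % (freq, pair))
```

An empty list or a single-word list simply produces an empty counter, so no special case is needed for very short files.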


This is what the code is supposed to do.
For example, if the text file content is
"Every man has a price. Every woman has a price."

First Token (word) is "Every", PreviousToken (previous word) is none (no previous)
Second Token (word) is "man", PreviousToken (previous word) is "Every"
Third Token (word) is "has", PreviousToken (previous word) is "man"
Fourth Token (word) is "a", PreviousToken (previous word) is "has"
Fifth Token (word) is "price", PreviousToken (previous word) is "a"

Sixth Token (word) is "Every", PreviousToken (previous word) is none (no previous)
Seventh Token (word) is "woman", PreviousToken (previous word) is "Every"
Eighth Token (word) is "has", PreviousToken (previous word) is "woman"
Ninth Token (word) is "a", PreviousToken (previous word) is "has"
Tenth Token (word) is "price", PreviousToken (previous word) is "a"


Frequency of "has a" is 2 (it occurs once in the first sentence and once in the second)
Frequency of "a price" is 2 (it occurs once in the first sentence and once in the second)
Frequency of "Every man" is 1 (occurs only once)
Frequency of "man has" is 1 (occurs only once)
Frequency of "Every woman" is 1 (occurs only once)
Frequency of "woman has" is 1 (occurs only once)

Please, I need help.
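
The expected behaviour described in the example, where the first word of each sentence has no previous token, could be sketched roughly as follows. This is a Python 3 illustration under stated assumptions, not the posted program: the names `sentence_bigrams` and `STRIP_CHARS` are invented here, sentences are assumed to end at `.`, `!` or `?` (which would mis-split abbreviations and decimal numbers), and tokens are lower-cased as in the posted code, so the keys come out as "every man" rather than "Every man".

```python
import re
from collections import Counter

# Assumption: roughly the punctuation the posted code strips from each word.
STRIP_CHARS = '.,/"\\ -_;[](){}'

def sentence_bigrams(text):
    """Return (single-word counts, word-pair counts), resetting the previous token per sentence."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in re.split(r"[.!?]+", text):
        previous = None  # the first token of a sentence has no previous token
        for raw in sentence.split():
            token = raw.strip(STRIP_CHARS).lower()
            if not token:
                continue  # skip tokens that were punctuation only
            unigrams[token] += 1
            if previous is not None:
                bigrams[previous + " " + token] += 1
            previous = token
    return unigrams, bigrams

unigrams, bigrams = sentence_bigrams("Every man has a price. Every woman has a price.")
for pair, freq in bigrams.most_common():
    print('Frequency of "%s" is %d' % (pair, freq))
```

To cover several files, the two counters returned for each file's text could be added into running totals with `Counter.update`, since pairs never span a file boundary in this scheme.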

© 2003-2008 by Developer Shed. All rights reserved. DS Cluster 6 hosted by Hostway