Friday, 9 August 2013

python: a urlopen problem while trying to download a gzip file

Greetings to all.
I want to use the Wiktionary dump for POS tagging.
However, the script already fails at the download step. Here is my code:
import nltk
from urllib import urlopen
from collections import Counter
import gzip
url = 'http://dumps.wikimedia.org/enwiktionary/latest/enwiktionary-latest-all-titles-in-ns0.gz'
fStream = gzip.open(urlopen(url).read(), 'rb')
dictFile = fStream.read()
fStream.close()
text = nltk.Text(word.lower() for word in dictFile())
tokens = nltk.word_tokenize(text)
Here is the error I get:
Traceback (most recent call last):
  File "~/dir1/dir1/wikt.py", line 15, in <module>
    fStream = gzip.open(urlopen(url).read(), 'rb')
  File "/usr/lib/python2.7/gzip.py", line 34, in open
    return GzipFile(filename, mode, compresslevel)
  File "/usr/lib/python2.7/gzip.py", line 89, in __init__
    fileobj = self.myfileobj = __builtin__.open(filename, mode or 'rb')
TypeError: file() argument 1 must be encoded string without NULL bytes, not str
Process finished with exit code 1
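
The traceback shows where this goes wrong: gzip.open() treats its first argument as a file name and passes it to the built-in open(), but the code hands it the downloaded bytes instead. One possible way around this, as a minimal sketch rather than a definitive fix, is to wrap the bytes in an in-memory buffer and pass that to gzip.GzipFile through its fileobj parameter. The helper names gunzip_bytes and fetch_titles are made up for illustration, and the UTF-8 decoding is an assumption about the dump's encoding.

```python
import gzip
import io

try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib import urlopen  # Python 2, as used in the post


def gunzip_bytes(data):
    """Decompress gzip-compressed bytes that are already in memory.

    Hypothetical helper: GzipFile accepts a file-like object via fileobj=,
    so no file name on disk is needed.
    """
    with gzip.GzipFile(fileobj=io.BytesIO(data), mode='rb') as stream:
        return stream.read()


def fetch_titles(url):
    """Download a gzipped title list and return it as a list of strings.

    Hypothetical helper: assumes one title per line, encoded as UTF-8.
    """
    compressed = urlopen(url).read()
    text = gunzip_bytes(compressed).decode('utf-8')
    return text.splitlines()
```

Calling fetch_titles(url) with the URL from the post would then give a list of titles to iterate over. Note that once the download works, dictFile() in the original code would fail as well, because dictFile is a string and cannot be called.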
