Run MRJob from IPython notebook
Question
I'm trying to run the mrjob example from an IPython notebook:
from mrjob.job import MRJob

class MRWordFrequencyCount(MRJob):
    def mapper(self, _, line):
        yield "chars", len(line)
        yield "words", len(line.split())
        yield "lines", 1

    def reducer(self, key, values):
        yield key, sum(values)
Then I run it with this code:
mr_job = MRWordFrequencyCount(args=["testfile.txt"])
with mr_job.make_runner() as runner:
    runner.run()
    for line in runner.stream_output():
        key, value = mr_job.parse_output_line(line)
        print key, value
and get this error:
TypeError: <module '__main__' (built-in)> is a built-in class
Is there a way to run mrjob from an IPython notebook?
Recommended answer
I haven't found the "perfect way" yet, but one thing you can do is create a notebook cell that uses the %%file magic to write the cell contents to a file:
%%file wordcount.py
from mrjob.job import MRJob

class MRWordFrequencyCount(MRJob):
    def mapper(self, _, line):
        yield "chars", len(line)
        yield "words", len(line.split())
        yield "lines", 1

    def reducer(self, key, values):
        yield key, sum(values)
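As background on why writing the class to a file helps: the error in the question has the form of the message that the standard library's inspect.getfile raises for a class whose module has no __file__ attribute, which is the case for a class typed directly into a notebook cell. That mrjob needs the path of the file defining the job is an assumption about its internals, not something confirmed here. The sketch below reproduces the failure with the standard library only; fake_notebook_module, JobDefinedInACell and source_file_or_error are hypothetical names used for illustration.

```python
import inspect
import json
import sys
import types

# Hypothetical stand-in for a module that has no backing file on disk,
# similar to the __main__ module of a notebook session.
fake_module = types.ModuleType("fake_notebook_module")
sys.modules[fake_module.__name__] = fake_module


class JobDefinedInACell(object):
    """Stand-in for a job class typed into a notebook cell."""


# Pretend the class lives in the file-less module.
JobDefinedInACell.__module__ = fake_module.__name__


def source_file_or_error(cls):
    """Return the file that defines cls, or the error text if there is none."""
    try:
        return inspect.getfile(cls)
    except (TypeError, OSError) as exc:
        return "error: {}".format(exc)


# A class without a source file cannot be located ...
print(source_file_or_error(JobDefinedInACell))
# ... while a class imported from a real .py file can.
print(source_file_or_error(json.JSONDecoder))
```

A class imported from wordcount.py falls into the second case, which is consistent with the workaround of writing the cell to a file first.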
Then have mrjob run that file in a later cell:
import wordcount
reload(wordcount)

mr_job = wordcount.MRWordFrequencyCount(args=['example.txt'])
with mr_job.make_runner() as runner:
    runner.run()
    for line in runner.stream_output():
        key, value = mr_job.parse_output_line(line)
        print key, value
Notice that I named my file wordcount.py and that I import the class MRWordFrequencyCount from the wordcount module; the filename and the module name have to match. Also, Python caches imported modules, so when you change wordcount.py, IPython will not reload the module but will keep using the old, cached one. That's why I put the reload() call in there.
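The built-in reload() used above exists only in Python 2; in Python 3 the equivalent is importlib.reload(). The following sketch shows the caching behaviour described above using a throwaway module; reload_demo_module and its VERSION variable are hypothetical names, not part of mrjob or of the example.

```python
import importlib
import os
import sys
import tempfile

# Write a first version of a hypothetical module to a temporary directory.
module_dir = tempfile.mkdtemp()
module_path = os.path.join(module_dir, "reload_demo_module.py")
with open(module_path, "w") as handle:
    handle.write('VERSION = "first"\n')

sys.path.insert(0, module_dir)
importlib.invalidate_caches()
import reload_demo_module

# Edit the file on disk, as re-running a %%file cell would.
# The new text has a different length so stale bytecode is never reused.
with open(module_path, "w") as handle:
    handle.write('VERSION = "second, edited"\n')

import reload_demo_module  # no effect: served from the sys.modules cache
stale = reload_demo_module.VERSION

importlib.reload(reload_demo_module)  # re-executes the file on disk
fresh = reload_demo_module.VERSION

print(stale, "->", fresh)
```

The second import statement leaves the old value in place; only the explicit reload picks up the edited file.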
Reference: https://groups.google.com/d/msg/mrjob/CfdAgcEaC-I/8XfJPXCjTvQJ
Update (shorter)
For a shorter second notebook cell, you can run the mrjob by invoking the shell from within the notebook:
! python wordcount.py shakespeare.txt
Reference: http://jupyter.cs.brynmawr.edu/hub/dblank/public/Jupyter%20Magics.ipynb
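The shell call runs the file as a script, so the job only starts if the file ends with mrjob's usual entry point (if __name__ == '__main__': MRWordFrequencyCount.run()). The file should also not be called mrjob.py itself, since a file of that name in the working directory would shadow the mrjob package on import. If you would rather capture the output in a Python variable than have ! print it, a minimal sketch with subprocess could look like the following. The helper run_script and the demo file names are hypothetical, and a plain stand-in script is used so that the example runs without mrjob installed.

```python
import os
import subprocess
import sys
import tempfile


def run_script(script_path, *script_args):
    """Run a Python script in a child process and return what it printed."""
    command = [sys.executable, script_path] + list(script_args)
    return subprocess.check_output(command, universal_newlines=True)


# Stand-in script so the sketch runs without mrjob. A real job file
# would instead end with:
#     if __name__ == '__main__':
#         MRWordFrequencyCount.run()
work_dir = tempfile.mkdtemp()
script_path = os.path.join(work_dir, "line_count_demo.py")
with open(script_path, "w") as handle:
    handle.write(
        "import sys\n"
        "if __name__ == '__main__':\n"
        "    with open(sys.argv[1]) as text:\n"
        "        print(sum(1 for _ in text))\n"
    )

# Small input file for the stand-in script to count.
input_path = os.path.join(work_dir, "input_demo.txt")
with open(input_path, "w") as handle:
    handle.write("to be\nor not to be\n")

output = run_script(script_path, input_path)
print(output.strip())
```

Using sys.executable keeps the child process on the same interpreter as the notebook kernel, which a bare python on the shell path does not guarantee.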