Friday, October 14, 2011

Get torrent info like seeds/peers/completed from tracker (UDP) aka scraping torrent

The previous script I made adds trackers to a .torrent file. After writing it, I thought it would be nice if I could remove all the dead torrents by checking how many seeds/peers a particular tracker reports. So, the following script fetches the seeds/peers information given a tracker URL and torrent hashes. It also looks up the torrent name from the torrent hash using torrentz.me.
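The script takes the tracker host and port as separate values and the hashes as 40-character hex strings. As a minimal sketch (Python 3, unlike the Python 2 script in this post), the two inputs could be prepared like this. The helper names `parse_udp_tracker` and `infohash_to_bytes` and the example URL are hypothetical and not part of the original script.

```python
import re
from urllib.parse import urlsplit


def parse_udp_tracker(url):
    """Split a udp:// tracker URL into a (host, port) tuple."""
    parts = urlsplit(url)
    if parts.scheme != "udp":
        raise ValueError("not a UDP tracker URL: %r" % url)
    if parts.hostname is None or parts.port is None:
        raise ValueError("tracker URL needs a host and a port: %r" % url)
    return parts.hostname, parts.port


def infohash_to_bytes(infohash):
    """Convert a 40-character hex info-hash into its 20 raw bytes."""
    if not re.fullmatch(r"[0-9a-fA-F]{40}", infohash):
        raise ValueError("expected 40 hex characters, got %r" % infohash)
    return bytes.fromhex(infohash)


if __name__ == "__main__":
    # Placeholder URL, used only for illustration.
    print(parse_udp_tracker("udp://tracker.example.org:80/announce"))
    print(len(infohash_to_bytes("3ebde329f208b9e2e81c8e0f80d14384d5f416e4")))
```

Validating the hash length up front matters because the scrape packet is just the raw hashes laid end to end, so one malformed hash would shift every hash after it.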

You can read more about the protocol here:
http://bittorrent.org/beps/bep_0015.html#udp-tracker-protocol
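The first step of that protocol is a connect handshake: the client sends a fixed magic number, the action code 0 and a transaction ID of its choosing, and the tracker answers with the same transaction ID plus a connection ID for later requests. Here is a minimal Python 3 sketch of just the packet handling, with no networking. The function names are my own illustration, and the transaction ID check is something this sketch adds.

```python
import struct

PROTOCOL_ID = 0x41727101980  # magic constant defined by BEP 15
ACTION_CONNECT = 0


def build_connect_request(transaction_id):
    """Pack the 16-byte connect request: protocol id, action, transaction id."""
    return struct.pack(">QLL", PROTOCOL_ID, ACTION_CONNECT, transaction_id)


def parse_connect_response(data, expected_transaction_id):
    """Return the connection_id from a connect response, or raise ValueError."""
    if len(data) < 16:
        raise ValueError("connect response too short")
    action, transaction_id, connection_id = struct.unpack(">LLQ", data[:16])
    if action != ACTION_CONNECT:
        raise ValueError("unexpected action %d" % action)
    if transaction_id != expected_transaction_id:
        raise ValueError("transaction id mismatch")
    return connection_id
```

Because UDP is connectionless, comparing the echoed transaction ID is the only way to tell that a reply belongs to the request that was just sent.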


Code:
"""
Author: shadyabhi abhijeet.1989@gmail.com
For the protocol description (not mine), check http://bittorrent.org/beps/bep_0015.html#udp-tracker-protocol
"""

import socket
import struct
from random import randrange  # to generate a random transaction_id
from urllib import urlopen
import re

tracker = "tracker.istole.it"
port = 80
torrent_hash = ["3ebde329f208b9e2e81c8e0f80d14384d5f416e4", "3ac9002ce1a7d5dde2c02b7cf9dc9e0f15eda7cb", "00e058f6629a19b42458af4dea5f6b9e2ebe8e25"]
torrent_details = {}

def get_torrent_name(infohash):
    url = "http://torrentz.me/" + infohash
    p = urlopen(url)
    page = p.read()
    c = re.compile(r'<h2><span>(.*?)</span>')
    return c.search(page).group(1)

def pretty_show(infohash):
    print "Torrent Hash: ", infohash
    try:
        print "Torrent Name (from torrentz): ", get_torrent_name(infohash)
    except Exception:
        print "Couldn't find torrent name"
    print "Seeds, Leechers, Completed", torrent_details[infohash]
    print

#Create the socket
clisocket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
clisocket.connect((tracker, port))

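#The protocol defines this magic constant as the initial connection_id
connection_id=0x41727101980
#The tracker should echo the same transaction_id in its response
transaction_id = randrange(1,65535)

#Connect request: connection_id (64 bit), action 0 (connect), transaction_id
packet=struct.pack(">QLL",connection_id, 0,transaction_id)
clisocket.send(packet)
res = clisocket.recv(16)
#Connect response: action, transaction_id, and the connection_id for later requests
action,transaction_id,connection_id=struct.unpack(">LLQ",res)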

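#Each info-hash is sent as 20 raw bytes, one after another
packet_hashes = ""
for infohash in torrent_hash:
    packet_hashes = packet_hashes + infohash.decode('hex')

#Scrape request: connection_id, action 2 (scrape), transaction_id, then the hashes
packet = struct.pack(">QLL", connection_id, 2, transaction_id) + packet_hashes

clisocket.send(packet)
#Scrape response: 8-byte header (action, transaction_id) + 12 bytes per torrent
res = clisocket.recv(8 + 12*len(torrent_hash))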

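#Skip the 8-byte header; results come back in the same order as the hashes were sent
index = 8
for infohash in torrent_hash:
    #The wire order is seeders, completed, leechers
    seeders, completed, leechers = struct.unpack(">LLL", res[index:index+12])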
    torrent_details[infohash] = (seeders, leechers, completed)
    pretty_show(infohash)
    index = index + 12 
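The scrape half of the script can also be written as two pure functions, which makes it easier to test without a live tracker. This is a Python 3 sketch under my own assumptions: the function names are hypothetical, and the checks for the transaction ID, the error action (3 in BEP 15) and the response length are additions that the script above does not perform.

```python
import struct

ACTION_SCRAPE = 2
ACTION_ERROR = 3


def build_scrape_request(connection_id, transaction_id, infohashes):
    """Pack the scrape header followed by each info-hash as 20 raw bytes."""
    header = struct.pack(">QLL", connection_id, ACTION_SCRAPE, transaction_id)
    return header + b"".join(bytes.fromhex(h) for h in infohashes)


def parse_scrape_response(data, expected_transaction_id, infohashes):
    """Return {infohash: (seeders, leechers, completed)} from a scrape response."""
    if len(data) < 8:
        raise ValueError("scrape response too short")
    action, transaction_id = struct.unpack(">LL", data[:8])
    if transaction_id != expected_transaction_id:
        raise ValueError("transaction id mismatch")
    if action == ACTION_ERROR:
        # An error response carries a human-readable message after the header.
        raise RuntimeError(data[8:].decode("utf-8", "replace"))
    if action != ACTION_SCRAPE:
        raise ValueError("unexpected action %d" % action)
    if len(data) < 8 + 12 * len(infohashes):
        raise ValueError("scrape response is missing entries")
    details = {}
    for i, infohash in enumerate(infohashes):
        # Wire order is seeders, completed, leechers.
        seeders, completed, leechers = struct.unpack_from(">LLL", data, 8 + 12 * i)
        details[infohash] = (seeders, leechers, completed)
    return details
```

The returned tuples use the same (seeders, leechers, completed) order as `torrent_details` in the script, so the two are interchangeable.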

Usage (the script above has 3 hashes for demonstration; you can change them):
shadyabhi@archlinux ~ $ python2 check_trackers.py
Torrent Hash:  3ebde329f208b9e2e81c8e0f80d14384d5f416e4
Torrent Name (from torrentz):  House.S08E02.HDTV.XviD-LOL.avi
Seeds, Leechers, Completed (10297, 1051, 172274)

Torrent Hash:  3ac9002ce1a7d5dde2c02b7cf9dc9e0f15eda7cb
Torrent Name (from torrentz):  Dexter.S06E02.Once.Upon.a.Time.HDTV.XviD-FQM.avi
Seeds, Leechers, Completed (10962, 1328, 248032)

Torrent Hash:  00e058f6629a19b42458af4dea5f6b9e2ebe8e25
Torrent Name (from torrentz):  Breaking.Bad.S04E13.Face.Off.HDTV.XviD-FQM.avi
Seeds, Leechers, Completed (7751, 495, 183809)

shadyabhi@archlinux ~ $ 
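With numbers like these in hand, picking out the dead torrents mentioned at the top of the post is a simple filter. The sketch below (Python 3) assumes a dictionary shaped like `torrent_details` in the script, mapping each hash to (seeders, leechers, completed). The function name, the `min_seeders` threshold and the second sample entry are hypothetical, and what counts as "dead" is a choice for you to make.

```python
def find_dead_torrents(torrent_details, min_seeders=1):
    """Return the info-hashes whose seeder count is below min_seeders."""
    return [
        infohash
        for infohash, (seeders, leechers, completed) in torrent_details.items()
        if seeders < min_seeders
    ]


if __name__ == "__main__":
    sample = {
        # Taken from the usage output above.
        "3ebde329f208b9e2e81c8e0f80d14384d5f416e4": (10297, 1051, 172274),
        # Hypothetical entry with no seeders, for illustration only.
        "0000000000000000000000000000000000000000": (0, 2, 50),
    }
    print(find_dead_torrents(sample))
```

Keep in mind that a count of zero only reflects what one tracker reports, so a torrent may still be alive on another tracker.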


5 comments:

  1. Shankar.shankar T, July 5, 2012 at 10:30 AM

    After a careful read, I found this really enlightening.
    I appreciate you taking the time and effort to put this blog post
    together.
    Once again I find myself spending way too much time both reading and
    leaving comments.

  2. Could you try to make a plugin for deluge with this? Pleease? :)

  3. also it tells me "invalid syntax" :| something with the " at line 25, but i have no idea of python...i used python 3.3 under windows xp...

  4. You should use Python < 3 for this (e.g. the latest version of Python 2.7). The "print" syntax was changed in Python 3.

  5. Well, doesn't it show total peers/seeds present already? I haven't used Deluge, qbittorrent is more than sufficient for me.
