Website Monitoring with Distributed
Messages/Tasks Processing (AMQP &
       RabbitMQ) on Django
About me
●   Rahmat Ramadhan Irianto
●   Software Developer at Void-Labs & Defpy-Labs,
    an open source software developer team
●   Student at STMIK Dipanegara Makassar,
    an Indonesian university (2010)
●   Lives in Makassar, Indonesia
●   Writes Python apps every day
What is Website Monitoring?

●   Website monitoring provides page-change monitoring
    and notification services to internet users worldwide.
    It keeps a change log for each monitored page and
    alerts the user by email when it detects a change
    in the page text.
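
The idea of detecting a change in the page text and keeping a change log can be sketched with the standard library alone. This is a minimal illustration, not the code used in the project: the function names `text_fingerprint`, `has_changed` and `change_log` are hypothetical, and comparing hashes of whitespace-normalized text is an assumption about how a change could be detected.

```python
import difflib
import hashlib


def text_fingerprint(page_text):
    """Return a SHA-256 hex digest of the text, ignoring whitespace differences."""
    normalized = " ".join(page_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(previous_text, current_text):
    """True when the two snapshots differ in anything other than whitespace."""
    return text_fingerprint(previous_text) != text_fingerprint(current_text)


def change_log(previous_text, current_text):
    """Return unified-diff lines describing what changed between two snapshots."""
    return list(difflib.unified_diff(
        previous_text.splitlines(),
        current_text.splitlines(),
        fromfile="previous",
        tofile="current",
        lineterm="",
    ))
```

Comparing fingerprints is cheap enough to run on every check; the diff only needs to be computed when the fingerprints differ.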
What is it useful for?
●   Website monitoring can watch almost any page on the internet and alert
    you by email when it detects a change.
●   Website monitoring can be a good part of a business intelligence
    strategy. Track your competitors and get timely alerts when they change
    their websites, or watch for developments on your customers'
    websites.
●   Monitor the press release pages of companies you are invested in. Keep
    track of their current executives. Be alerted to changes on their home pages.
●   Companies on the web change their privacy policies or terms and conditions
    without notice. Website monitoring can alert you to
    these changes.
●   Monitor the job listing pages of companies where you would like to
    work. When they post a new listing, we will email you.
●   Keep your news up to date. Monitor the news pages of your favorite sites. When
    they update, you'll get an email alert.
●   And much more

Inspired by changedetection: http://www.changedetection.com
What powers Website Monitoring?

http://goo.gl/hCf34
Python!

                                     http://goo.gl/sSqHh

( Powerful, efficient, flexible, an ideal language, effective for
      OOP, elegant syntax, rich in libraries, etc. )
                     www.python.org
http://goo.gl/YXnA9

            Django!
( Django is a high-level Python Web framework that
encourages rapid development and clean, pragmatic
                    design, etc. )
        https://www.djangoproject.com/
MongoDB
  ( Flexible, powerful, fast,
        and easy to use )

http://www.mongodb.org

                                    http://goo.gl/NZQ18
RabbitMQ
  ( Powerful, fast, reliable and highly available
 message queuing system. An open source
  queueing option that is great for building and
      managing scalable applications )

http://www.rabbitmq.com
                                      http://goo.gl/Pvd9Q
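Website Monitoring Workflow

[Workflow diagram. A request arrives either as an Ajax post or as a post to the API.
 If Ajax post: Myview saves the data and publishes a task.
 If API post: the REST API saves the data.
 Published tasks go to the message queue; a worker is created and processes the task:
 scrape the page, save the result and, if the page changed, send an alert email and
 report the diff. Data is stored in MongoDB.]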
Let's Talk About

          http://goo.gl/m8QUH
Why MongoDB?
●   Combines great features of document databases,
    key-value stores, and relational databases.
●   How great?
●     Fast
●     Smart
●     Scalable
●     Schema-less
●     Dynamic queries
●     Easy to use, etc.
What are we going to need?

     Python   +   MongoDB   = PyMongo
http://pypi.python.org/pypi/pymongo/
How to?
# Legacy PyMongo API: Connection was later replaced by MongoClient.
from datetime import datetime

from django.utils.encoding import smart_str
from pymongo import Connection

# Open one connection and reuse it for every collection.
db = Connection().website_monitor
collection_user = db.user
collection_monitor = db.monitor
collection_task = db.task

INSERT
# Build the monitor document inside a Django view and store it.
monitor = {'username': smart_str(request.user),
           'user_id': request.user.id,
           'url': url,
           'datetime': datetime.utcnow(),
           'status': status,
           'hit': 0,
           'fail_hit': 0,
           'period': int(request.POST.get('period')),
           'email': collection_user.find_one({'name': str(request.user)})['email'],
           'pk': pk,
           'last_checking': None,
           'task_id': task_id,
}
collection_monitor.insert(monitor)
UPDATE
# Refresh the stored profile and session details of a user.
collection_user.update(
    {'name': data_user['id']},
    {'$set': {
        'email': data_user['email'],
        'firstname': smart_str(data_user['first_name']),
        'lastname': smart_str(data_user['last_name']),
        'ip': request.META.get('REMOTE_ADDR', 'unknown'),
        'login': datetime.now(),
        'user_agent': request.META.get('HTTP_USER_AGENT', 'unknown'),
        'session': request.META.get('XDG_SESSION_COOKIE', 'unknown'),
        'session_fb': session_key,
        'ts': datetime.now(),
        'authkey': authkey,
    }}
)



 REMOVE
 if collection_content.find({'url':i['url']}).count() == 3:
     collection_content.remove({'url':i['url'][0]})
Why must we use distributed computing?

       Distributed Computing
A method of solving a computational
problem by dividing it into many
tasks that run simultaneously on
many hardware or software systems
             (Wikipedia)
What is a message queue?
Message queues are:
  o Communication buffers
  o Between independent sender and receiver processes
  o Asynchronous
  • The time of sending is not necessarily the same as the time of receiving
  • In the context of web applications:
     o Sender: web application servers
     o Receiver: background worker processes
     o Queue items: tasks that the web server doesn't
       have the time or resources to do
How does it work?
Say a web application server has a task it
doesn't have time to do
• It puts the task in the message queue
• Other web servers can access the same queue(s)
  and put tasks there
• Workers are greedy and they all watch the
  queues for tasks
• Workers asynchronously pick up the first
  available task on the queue when they are ready
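
The steps above can be sketched in a single process with Python's standard `queue` and `threading` modules. This is only an in-memory illustration of greedy workers sharing one queue, not a replacement for a real broker; the function name `run_workers` and the use of `None` as a stop signal are assumptions made for the example.

```python
import queue
import threading


def run_workers(tasks, handler, worker_count=3):
    """Put tasks on a shared queue and let several workers drain it."""
    task_queue = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            task = task_queue.get()      # blocks until a task is available
            if task is None:             # stop signal: no more work
                task_queue.task_done()
                break
            try:
                outcome = handler(task)
            except Exception as err:     # keep the worker alive on failure
                outcome = err
            with results_lock:
                results.append(outcome)
            task_queue.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for task in tasks:                   # the "web server" side: enqueue work
        task_queue.put(task)
    for _ in threads:                    # one stop signal per worker
        task_queue.put(None)
    task_queue.join()
    for thread in threads:
        thread.join()
    return results
```

Results arrive in whatever order the workers finish, which is the asynchronous behavior the slide describes: the sender does not wait for any particular task.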
What is it useful for?

• Message queues are useful in certain
  situations
• General guidelines:
  o Does your web application take more than
    a few seconds to generate a response?
  o Are you using a lot of cron jobs to process
    data in the background?
  o Do you wish you could distribute the
    processing of the data generated by your
    application among many servers?
What do we need to build a message queue?
AMQP & RabbitMQ
Why choose AMQP & RabbitMQ?
1. RabbitMQ is free to use
2. The documentation is decent
3. There is decent clustering support, even though we
   never needed clustering
4. We didn't want to lose queues or messages upon
   broker crash or restart
5. We develop applications using Python/Django, and
   setting up an AMQP backend using carrot was
   easy
Now let's talk about RabbitMQ
RabbitMQ?

 RabbitMQ is an Erlang-based open source
application that serves as a message broker, or
message-oriented middleware.
 RabbitMQ implements an application layer
protocol, the Advanced Message Queuing
Protocol (AMQP).
 AMQP provides a standard protocol that is
interoperable between vendors and governs the
exchange of messages in enterprise-scale
systems.
Why use RabbitMQ?
● We need it for...
●  Running tasks / processes in the
  background
●  Asynchronous task processing
●  Scheduling systems, etc.
So... what keeps the rabbit focused?
Carrot!
           Carrot is an AMQP messaging
           queue framework. AMQP is the
           Advanced Message Queuing
           Protocol, an open standard
           protocol for message orientation,
           queuing, routing, reliability and
           security.

             An easy way to connect to
           RabbitMQ.
             An easy way to pull stuff out of the
           queue.
             An easy way to throw stuff into the
           queue.


 https://github.com/ask/carrot/
Concepts
●   Publishers (Publishers send messages to an exchange.)
●   Exchanges (Messages are sent to exchanges. Exchanges are named and can be
    configured to use one of several routing algorithms. The exchange routes the
    messages to consumers by matching the routing key in the message with the routing
    key the consumer provides when binding to the exchange.)
●   Consumers (Consumers declare a queue, bind it to an exchange and receive
    messages from it.)
●   Queues (Queues receive messages sent to exchanges. The queues are declared by
    consumers.)
●   Routing keys (Every message has a routing key. The interpretation of the routing
    key depends on the exchange type. There are four default exchange types defined by
    the AMQP standard, and vendors can define custom types, so see your vendor's
    manual for details.)
●   Exchange types defined by AMQP/0.8:
●     Direct exchange (Matches if the routing key property of the message and the
    routing_key attribute of the consumer are identical.)
●     Fan-out exchange (Always matches, even if the binding does not have a routing
    key.)
●     Topic exchange (Matches the routing key property of the message by a primitive
    pattern matching scheme.)
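
The three exchange types listed above can be illustrated with a small pure-Python model. This is a sketch of the matching rules only, not how RabbitMQ or carrot implement routing; `route` and `topic_matches` are hypothetical names. For topic exchanges it uses the AMQP convention that `*` stands for exactly one dot-separated word and `#` for zero or more words.

```python
def topic_matches(pattern, routing_key):
    """AMQP-style topic match: '*' is one word, '#' is zero or more words."""
    def match(p, k):
        if not p:
            return not k
        if p[0] == "#":
            # '#' may absorb any number of leading words, including none
            return any(match(p[1:], k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        if p[0] == "*" or p[0] == k[0]:
            return match(p[1:], k[1:])
        return False

    return match(pattern.split("."), routing_key.split("."))


def route(exchange_type, bindings, routing_key):
    """Return the queues that receive a message.

    bindings is a list of (queue_name, binding_key) pairs.
    """
    if exchange_type == "fanout":
        return [name for name, _ in bindings]
    if exchange_type == "direct":
        return [name for name, key in bindings if key == routing_key]
    if exchange_type == "topic":
        return [name for name, key in bindings if topic_matches(key, routing_key)]
    raise ValueError("unsupported exchange type: %r" % exchange_type)
```

With a direct exchange and a single queue, as used later in this deck, every message whose routing key equals the binding key ends up in that one queue.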
Creating a Connection in Django

settings.py
RABBITMQ_HOST = 'localhost'
RABBITMQ_PORT = 5672
RABBITMQ_USER = 'guest'
RABBITMQ_PASS = 'guest'
RABBITMQ_VHOST = '/'

views.py
from carrot.messaging import Publisher, Consumer
from carrot.connection import AMQPConnection
from django.conf import settings

# One broker connection, configured from the Django settings above.
conn_for_carrot = AMQPConnection(hostname=settings.RABBITMQ_HOST,
                                 port=settings.RABBITMQ_PORT,
                                 userid=settings.RABBITMQ_USER,
                                 password=settings.RABBITMQ_PASS,
                                 vhost=settings.RABBITMQ_VHOST)
Publisher
import hashlib

# Publish a "check" task for an existing task id.
publisher = Publisher(connection=conn_for_carrot,
                      exchange='website_monitoring_exchange',
                      exchange_type='direct')
publisher.send({'msg': {'do': 'check',
                        'task_id': task_id,
                        }
                })

# Variant that derives the task id from an MD5 hash of the
# task id and the submitted URL.
publisher = Publisher(connection=conn_for_carrot,
                      exchange='website_monitoring_exchange',
                      exchange_type='direct')
publisher.send({'msg': {'do': 'check',
                        'task_id': hashlib.md5(str(task_id)
                                               + request.PUT.get('url')).hexdigest(),
                        }
                })
Consumer
import subprocess

def monitoring_check():
    def call(message_data, message):
        if message_data['msg']['do'] == 'check':
            print '[+] receiving message'
            message.ack()
            task_id = message_data['msg']['task_id']
            # Run the scraper for this task in a separate process.
            get_pid = subprocess.Popen(['python', 'scraper.py', task_id])
            pid = get_pid.pid
            collection_task.update({'task_id': task_id},
                                   {'$set': {'status': 'RUNNING', 'pid': pid}})
            print '[Starting PID:%s]' % pid
            get_pid.wait()
        else:
            # Acknowledge unknown commands so they leave the queue.
            message.ack()

    queuename = 'website_monitoring_checker'
    consumer = Consumer(connection=conn_for_carrot, queue=queuename,
                        exchange='website_monitoring_exchange',
                        exchange_type='direct')
    consumer.register_callback(call)
    try:
        print '[queue:%s]consume..' % queuename
        consumer.wait()
    except Exception, err:
        print err
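
The callback logic can be exercised without a broker. The sketch below separates the decision (acknowledge, then start a task only for `check` commands) from the side effects, so it can be tested in isolation. `FakeMessage`, `handle_message` and the `start_task` parameter are hypothetical names introduced for this illustration; they are not part of carrot.

```python
class FakeMessage:
    """Stand-in for a broker message; it only records whether ack() was called."""

    def __init__(self):
        self.acked = False

    def ack(self):
        self.acked = True


def handle_message(message_data, message, start_task):
    """Acknowledge every message and start a task only for 'check' commands."""
    message.ack()
    body = message_data.get("msg", {})
    if body.get("do") == "check":
        return start_task(body["task_id"])
    return None
```

In the real consumer, `start_task` would be the part that launches `scraper.py` and updates the task status.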
Cooking soup with BeautifulSoup

import re
from BeautifulSoup import BeautifulSoup

def visible(element):
    # Skip text that sits inside tags the browser does not render.
    if element.parent.name in ['style', 'script', '[document]', 'head', 'title']:
        return False
    # Skip HTML comments and non-breaking-space entities.
    if re.search('<!--|-->|&nbsp;', str(element)):
        return False
    return True

monitor = collection_monitor.find_one({'pk': pk})

# Two stored snapshots of the monitored URL.
contents = [collection_content.find({'url': str(monitor['url'])})[1],
            collection_content.find({'url': str(monitor['url'])})[0]]

for i in contents:  # loop assumed; the slide shows only its body
    texts = BeautifulSoup(BeautifulSoup(i['content']).prettify()).findAll(text=True)
    data = {'content': ' '.join(filter(visible, texts)),
            'datetime': i['datetime'],
            }
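
The same filtering idea can be tried without BeautifulSoup by using the standard library's `html.parser`. This is a rough equivalent under stated assumptions, not the code from the slides: `VisibleTextParser` and `visible_text` are hypothetical names, and the set of hidden tags mirrors the list used in `visible` above.

```python
from html.parser import HTMLParser

# Tags whose text content is not shown to the reader.
HIDDEN_TAGS = {"style", "script", "head", "title"}


class VisibleTextParser(HTMLParser):
    """Collect text nodes that are not inside a hidden tag."""

    def __init__(self):
        super().__init__()
        self._hidden_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in HIDDEN_TAGS:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in HIDDEN_TAGS and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._hidden_depth == 0:
            self.chunks.append(text)

    # handle_comment is not overridden, so HTML comments are dropped.


def visible_text(html):
    """Return the visible text of an HTML document as one space-joined string."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Comparing the visible text of two snapshots, rather than the raw HTML, avoids alerts caused by changes in scripts or styles.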
Alert by email!

import smtplib

def sending_email(to, sub, msg):
    try:
        gmail_user = 'romanticdevil.jimmy@gmail.com'
        gmail_pwd = '***************'
        smtpserver = smtplib.SMTP("smtp.gmail.com", 587)
        smtpserver.ehlo()
        smtpserver.starttls()
        smtpserver.ehlo()  # say hello again over the encrypted connection
        smtpserver.login(gmail_user, gmail_pwd)
        # Headers are separated by newlines; a blank line ends the header block.
        header = ('To:' + to + '\n'
                  + 'From: Website-Monitoring <' + gmail_user + '>\n'
                  + 'Subject: %s\n' % sub)
        msg = header + '\n' + msg
        smtpserver.sendmail(gmail_user, to, msg)
        smtpserver.close()
    except Exception, err:
        print err
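
Building the headers by string concatenation is easy to get wrong, so here is a hedged alternative sketch that uses the standard library's `email.message.EmailMessage` to assemble the alert. It only builds the message and does not send it; `build_alert` is a hypothetical helper name, and the addresses in any call are placeholders.

```python
from email.message import EmailMessage


def build_alert(sender, recipient, subject, body):
    """Assemble an alert email with properly formatted headers."""
    message = EmailMessage()
    message["From"] = "Website-Monitoring <%s>" % sender
    message["To"] = recipient
    message["Subject"] = subject
    message.set_content(body)
    return message
```

A message built this way can be handed to `smtplib.SMTP.send_message()` after the same `starttls()` and `login()` steps shown above.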
Task scheduling and checking
import sys
import time

task_id = sys.argv[1]
print task_id
raw_delay = collection_task.find_one({'task_id':task_id})['schedule']
print raw_delay
if raw_delay == "1":
   delay = 60*60
elif raw_delay =="12":
   delay = 720*60
else:
   delay = 1440*60
while True:
    try:
       print '[+] Starting task: %s' %sys.argv[1]
       log(task_id, 'INFO', 'starting session')
       main()
    except Exception, err:
       log(task_id, 'exception', err)
       print err
       collection_task.update({'task_id':task_id}, {'$set': {'status':'STOPPED', 'pid':None}})
       log(task_id, 'INFO', 'updating database [status:STOPPED]')
    else:
       collection_task.update({'task_id':task_id}, {'$set': {'status':'SLEEP', 'pid':None}})
       log(task_id, 'INFO', 'updating database [status:SLEEP] for %s sec' %delay)
       time.sleep(delay)
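
The mapping from the stored schedule value to a delay can be written as a lookup table, which keeps the numbers in one place. This is a small sketch, not the original script: `PERIOD_SECONDS`, `DEFAULT_PERIOD_SECONDS` and `delay_for` are hypothetical names, and accepting integers as well as strings is an assumption that goes beyond the original string comparison.

```python
# Schedule values as stored in the task document, mapped to seconds.
PERIOD_SECONDS = {
    "1": 60 * 60,        # every hour
    "12": 12 * 60 * 60,  # every 12 hours
}
DEFAULT_PERIOD_SECONDS = 24 * 60 * 60  # anything else: once a day


def delay_for(raw_period):
    """Return the sleep time in seconds for a schedule value such as '1' or '12'."""
    return PERIOD_SECONDS.get(str(raw_period), DEFAULT_PERIOD_SECONDS)
```

The values match the original branches: 60*60, 720*60 and 1440*60 seconds.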
Django-Piston
    ( A mini-framework for Django, yet powerful for creating RESTful APIs )
               https://bitbucket.org/jespern/django-piston/wiki/Home



●    Ties into Django's internal mechanisms.
●    Supports OAuth out of the box (as well as Basic/Digest or custom auth.)
●    Doesn't require tying to models, allowing arbitrary resources.
●    Speaks JSON, YAML, Python Pickle & XML (and HATEOAS.)
●    Ships with a convenient reusable library in Python
●    Respects and encourages proper use of HTTP (status codes, ...)
●    Has built in (optional) form validation (via Django), throttling, etc.
●    Supports streaming, with a small memory footprint.
●    Stays out of your way.
How to?
Add to urls.py:
url(r'^api/', include('api.urls')),

Add to settings.py:

INSTALLED_APPS = (
  ….......
  'api',
)

Create a folder named api/ in the project
directory with these files:
api/
    handlers.py
    __init__.py
    urls.py
REST API's urls.py

from django.conf.urls.defaults import *
from piston.resource import Resource
from piston.authentication import HttpBasicAuthentication
from api.handlers import *

auth = HttpBasicAuthentication(realm="website-monitoring")
ad = { 'authentication': auth }

main = Resource(handler=Main, **ad)
monitor = Resource(handler=Monitor, **ad)

urlpatterns = patterns('',
  url(r'^(?P<obj_id>[^/]+)/$', main),
  url(r'^monitor/(?P<obj_id>[^/]+)/$', monitor),
)
REST API's handlers.py
from piston.handler import BaseHandler
from piston.utils import rc

class Main(BaseHandler):
    allowed_methods = ('GET',)  # trailing comma makes this a tuple

    def read(self, request, obj_id):
        # Look the id up among users first, then among monitors.
        data = collection_user.find_one({'pk': obj_id})
        if data:
            return data
        data = collection_monitor.find_one({'pk': obj_id})
        if data:
            return data

class Monitor(BaseHandler):
    allowed_methods = ('GET', 'PUT', 'DELETE')
    fields = ('url', 'status', 'hit', 'fail_hit', 'year', 'month', 'day',
              'hour', 'email', 'period', 'diff')

    def read(self, request, obj_id):
        try:
            if obj_id == 'all':
                data = list(collection_monitor.find({'username': str(request.user)}))
            elif obj_id == "status_running":
                data = list(collection_monitor.find({'status': 'running'}))
            ….........
        except Exception, err:
            return rc.BAD_REQUEST
        return data

    def update(self, request, obj_id):
        try:
            if obj_id == 'create':
                # Collect the URLs this user already monitors.
                url_list = []
                for i in collection_monitor.find({'username': str(request.user)}):
                    url_list.append(i['url'])
                if request.PUT.get('url') in url_list:
                    print '[+] URL already exists'
                    print '[+] Data will be updated'
            else:
                raise Exception
        except Exception, err:
            print err
            return rc.BAD_REQUEST
        …......................

    def delete(self, request, obj_id):
        try:
            if obj_id == 'all':
                # A single remove() deletes every monitor of this user.
                collection_monitor.remove({'username': str(request.user)})
            else:
                if collection_monitor.find_one({'pk': obj_id}):
                    collection_monitor.remove({'pk': obj_id})
        except Exception, err:
            print err
            return rc.FORBIDDEN
        else:
            print 'deleted'
            return rc.DELETED
Facebook Integration?
●   Just for lazy people
●   You don't have to fill in the registration form: just log
    in to your Facebook account, then click, click and click.
●   Good for business marketing
●   Easy to integrate, etc.
●   Download:
●    git clone
    http://github.com/dickeytk/django_facebook_oauth.git
Questions?
●   Twitter: @jimmyromanticde
●   Facebook: https://www.facebook.com/jimmy.romantic.devil
●   Email: romanticdevil.jimmy@gmail.com
●   Bitbucket:
    https://bitbucket.org/jimmyromanticdevil/
●   Blog: http://jimmyromanticdevil.wordpress.com
References
               http://www.python.org
          https://www.djangoproject.com
              http://www.mongodb.org
             http://www.rabbitmq.com
        http://pypi.python.org/pypi/pymongo

           https://github.com/ask/carrot/

https://bitbucket.org/jespern/django-piston/wiki/Home

http://github.com/dickeytk/django_facebook_oauth.git

         "Life in a Queue", Tareque Hossain
             Google: "Message Queue"
Thank You ! :)

More Related Content

Viewers also liked (8)

PDF
Resftul API Web Development with Django Rest Framework & Celery
Ridwan Fadjar
 
PDF
Practical Celery
Cameron Maske
 
PDF
Building Distributed System with Celery on Docker Swarm - PyCon JP 2016
Wei Lin
 
PDF
Life in a Queue - Using Message Queue with django
Tareque Hossain
 
PDF
Distributed Task Processing with Celery - PyZH
Cesar Cardenas Desales
 
ODP
Europython 2011 - Playing tasks with Django & Celery
Mauro Rocco
 
PDF
An Introduction to Celery
Idan Gazit
 
PDF
Queue Everything and Please Everyone
Vaidik Kapoor
 
Resftul API Web Development with Django Rest Framework & Celery
Ridwan Fadjar
 
Practical Celery
Cameron Maske
 
Building Distributed System with Celery on Docker Swarm - PyCon JP 2016
Wei Lin
 
Life in a Queue - Using Message Queue with django
Tareque Hossain
 
Distributed Task Processing with Celery - PyZH
Cesar Cardenas Desales
 
Europython 2011 - Playing tasks with Django & Celery
Mauro Rocco
 
An Introduction to Celery
Idan Gazit
 
Queue Everything and Please Everyone
Vaidik Kapoor
 

Similar to Website Monitoring with Distributed Messages/Tasks Processing (AMQP & RabbitMQ) on Django (20)

PDF
Django at Scale
bretthoerner
 
PDF
Evented applications with RabbitMQ and CakePHP
markstory
 
PDF
Python & Django TTT
kevinvw
 
ODP
The Art of Message Queues - TEKX
Mike Willbanks
 
PDF
On Rabbits and Elephants
Gavin Roy
 
PDF
PyCon 2011 Scaling Disqus
zeeg
 
PDF
Celery: The Distributed Task Queue
Richard Leland
 
PDF
Fixing twitter
Roger Xia
 
PDF
Fixing_Twitter
liujianrong
 
PDF
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
smallerror
 
PDF
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
xlight
 
PDF
Pinkoi Platform
mikeleeme
 
KEY
Cooking a rabbit pie
Tomas Doran
 
PDF
RabbitMQ with python and ruby RuPy 2009
Paolo Negri
 
PDF
MongoDB as Message Queue
MongoDB
 
PDF
Python RESTful webservices with Python: Flask and Django solutions
Solution4Future
 
PPT
Rabbit MQ introduction
Sitg Yao
 
PDF
RabbitMQ
Lenz Gschwendtner
 
PDF
App engine devfest_mexico_10
Chris Schalk
 
KEY
Real time system_performance_mon
Tomas Doran
 
Django at Scale
bretthoerner
 
Evented applications with RabbitMQ and CakePHP
markstory
 
Python & Django TTT
kevinvw
 
The Art of Message Queues - TEKX
Mike Willbanks
 
On Rabbits and Elephants
Gavin Roy
 
PyCon 2011 Scaling Disqus
zeeg
 
Celery: The Distributed Task Queue
Richard Leland
 
Fixing twitter
Roger Xia
 
Fixing_Twitter
liujianrong
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
smallerror
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
xlight
 
Pinkoi Platform
mikeleeme
 
Cooking a rabbit pie
Tomas Doran
 
RabbitMQ with python and ruby RuPy 2009
Paolo Negri
 
MongoDB as Message Queue
MongoDB
 
Python RESTful webservices with Python: Flask and Django solutions
Solution4Future
 
Rabbit MQ introduction
Sitg Yao
 
App engine devfest_mexico_10
Chris Schalk
 
Real time system_performance_mon
Tomas Doran
 
Ad

Recently uploaded (20)

PPTX
Agentforce World Tour Toronto '25 - Supercharge MuleSoft Development with Mod...
Alexandra N. Martinez
 
PDF
Transcript: Book industry state of the nation 2025 - Tech Forum 2025
BookNet Canada
 
PDF
99 Bottles of Trust on the Wall — Operational Principles for Trust in Cyber C...
treyka
 
PDF
Hello I'm "AI" Your New _________________
Dr. Tathagat Varma
 
PPTX
MARTSIA: A Tool for Confidential Data Exchange via Public Blockchain - Poster...
Michele Kryston
 
PPTX
Securing Model Context Protocol with Keycloak: AuthN/AuthZ for MCP Servers
Hitachi, Ltd. OSS Solution Center.
 
PDF
Optimizing the trajectory of a wheel loader working in short loading cycles
Reno Filla
 
PDF
Understanding The True Cost of DynamoDB Webinar
ScyllaDB
 
PDF
Draugnet: Anonymous Threat Reporting for a World on Fire
treyka
 
PDF
Introducing and Operating FME Flow for Kubernetes in a Large Enterprise: Expe...
Safe Software
 
PDF
Peak of Data & AI Encore AI-Enhanced Workflows for the Real World
Safe Software
 
PPTX
MARTSIA: A Tool for Confidential Data Exchange via Public Blockchain - Pitch ...
Michele Kryston
 
PDF
TrustArc Webinar - Navigating APAC Data Privacy Laws: Compliance & Challenges
TrustArc
 
PDF
Kubernetes - Architecture & Components.pdf
geethak285
 
PDF
Sound the Alarm: Detection and Response
VICTOR MAESTRE RAMIREZ
 
PDF
UiPath DevConnect 2025: Agentic Automation Community User Group Meeting
DianaGray10
 
PPTX
Wondershare Filmora Crack Free Download 2025
josanj305
 
PPTX
01_Approach Cyber- DORA Incident Management.pptx
FinTech Belgium
 
PDF
Darley - FIRST Copenhagen Lightning Talk (2025-06-26) Epochalypse 2038 - Time...
treyka
 
PDF
How to Comply With Saudi Arabia’s National Cybersecurity Regulations.pdf
Bluechip Advanced Technologies
 
Agentforce World Tour Toronto '25 - Supercharge MuleSoft Development with Mod...
Alexandra N. Martinez
 
Transcript: Book industry state of the nation 2025 - Tech Forum 2025
BookNet Canada
 
99 Bottles of Trust on the Wall — Operational Principles for Trust in Cyber C...
treyka
 
Hello I'm "AI" Your New _________________
Dr. Tathagat Varma
 
MARTSIA: A Tool for Confidential Data Exchange via Public Blockchain - Poster...
Michele Kryston
 
Securing Model Context Protocol with Keycloak: AuthN/AuthZ for MCP Servers
Hitachi, Ltd. OSS Solution Center.
 
Optimizing the trajectory of a wheel loader working in short loading cycles
Reno Filla
 
Understanding The True Cost of DynamoDB Webinar
ScyllaDB
 
Draugnet: Anonymous Threat Reporting for a World on Fire
treyka
 
Introducing and Operating FME Flow for Kubernetes in a Large Enterprise: Expe...
Safe Software
 
Peak of Data & AI Encore AI-Enhanced Workflows for the Real World
Safe Software
 
MARTSIA: A Tool for Confidential Data Exchange via Public Blockchain - Pitch ...
Michele Kryston
 
TrustArc Webinar - Navigating APAC Data Privacy Laws: Compliance & Challenges
TrustArc
 
Kubernetes - Architecture & Components.pdf
geethak285
 
Sound the Alarm: Detection and Response
VICTOR MAESTRE RAMIREZ
 
UiPath DevConnect 2025: Agentic Automation Community User Group Meeting
DianaGray10
 
Wondershare Filmora Crack Free Download 2025
josanj305
 
01_Approach Cyber- DORA Incident Management.pptx
FinTech Belgium
 
Darley - FIRST Copenhagen Lightning Talk (2025-06-26) Epochalypse 2038 - Time...
treyka
 
How to Comply With Saudi Arabia’s National Cybersecurity Regulations.pdf
Bluechip Advanced Technologies
 
Ad

Website Monitoring with Distributed Messages/Tasks Processing (AMQP & RabbitMQ) on Django

  • 1. Website Monitoring with Distributed Messages/Tasks Processing (AMQP & RabbitMQ) on Django
  • 2. About me? ● Rahmat Ramadhan Irianto ● Software Developer at Void-Labs & Defpy-Labs ● is a Open Source Software Developer Team ● A Student from Indonesian University STMIK Dipanegara 2010 Makassar ● Lives in Indonesian, Makassar ● Write Python Apps every day
  • 3. What is Website-Monitoring ? ● Website monitoring provides page change monitoring and notification services to internet users worldwide. Website monitoring will create a change log for the page and alert user by email when it detects a change in the page text.
  • 4. What Useful For ? ● Website monitoring can monitor almost any page on the internet and when it detect page changes then it will alert you by email. ● Website Monitoring can be your good choice for business intelligence strategy. Track your competition and get timely alerts when a they changes their website. or You can Watch for developments at your customer's websites. ● Monitor the press release page of companies you are invested in. Keep track of their current executives. Be alerted to changes on their home page. ● Monitoring page privacy policies or terms and conditions without notice companies on the web , Now you can use website monitoring for alert you to these changes. ● Monitor the new job listings pages at companies where you would like to work. When they post a new listing, we will email you. ● Keep your up to date news. Monitor news page of your top site news. When they update it, you'll get an email alert. Inspirate from changedetection ● And much more http://www.changedetection.com
  • 5. What Power build Website- monitoring? http://goo.gl/hCf34
  • 6. Python ! http://goo.gl/sSqHh ( Powerfull,Efficient,flexibility,ideal language,Effective for OOP,Elegant syntax,Rich of library & etc ) www.python.org
  • 7. http://goo.gl/YXnA9 Django ! ( Django is a high-level Python Web framework that encourages rapid development and clean, pragmatic design & Etc) https://www.djangoproject.com/
  • 8. Mongodb ( flexibility, powerfull, Fast, and ease of use ) http://www.mongodb.org http://goo.gl/NZQ18
  • 9. RabbitMQ ( Powerfull,fast, reliable & high availability for message queuing system. open source queueing option & Greats for building and managing scalable applications) http://www.rabbitmq.com http://goo.gl/Pvd9Q
  • 11. Ajax Post Post Api request If Post Api Rest Api Save data If ajax post Procces task Scrape page Message queue Create worker worker Myview Publish task Save result If changepage Save data Alert Email Report Diff Mongodb
  • 12. Lets Talk About http://goo.gl/m8QUH
  • 13. Why Mongodb ? ● Greats features of document databases,key- value stores, and relational databases. ● How greats ? ● Fast ● Smart ● Scalable ● Schema-less ● Dynamic Query ● Easy use & etc..
  • 14. What we gonna Need ? + = Pymongo http://pypi.python.org/pypi/pymongo/
  • 15. How to ? import pymongo from pymongo import Connection collection_user = pymongo.Connection().website_monitor.user collection_monitor = pymongo.Connection().website_monitor.monitor collection_task = pymongo.Connection().website_monitor.task INSERT monitor = {'username':smart_str(request.user), 'user_id':request.user.id, 'url':url, 'datetime':datetime.utcnow(), 'status':status, 'hit':0, 'fail_hit':0, 'period':int(request.POST.get('period')), 'email':collection_user.find_one({'name':str(request.user)})['email'], 'pk':pk, 'last_checking':None, 'task_id':task_id, } collection_monitor.insert(monitor)
  • 16. UPDATE collection_user.update({'name':data_user['id']},{'$set': {'email':data_user['email'], 'firstname':smart_str(data_user['first_name']), 'lastname':smart_str(data_user['last_name']), 'ip': request.META.get('REMOTE_ADDR','unknown'), 'login':datetime.now(), 'user_agent': request.META.get('HTTP_USER_AGENT','unknown'), 'session': request.META.get('XDG_SESSION_COOKIE','unknown'), 'session_fb':session_key, 'ts':datetime.now(), 'authkey':authkey, } } ) REMOVE if collection_content.find({'url':i['url']}).count() == 3: collection_content.remove({'url':i['url'][0]})
  • 17. Why we must use Distributed Computing Distributed Computing Is a method of solving computational problem by dividing the problem into many tasks run simultaneously on many hardware or software systems (Wikipedia)
  • 18. What is Message queue ? Message Queues are: 0->Communication Buffers 0->Between independent sender & receiver processes 0->Asynchronous • Time of sending not necessarily same as receiving • In context of Web Applications: o Sender: Web Application Servers o Receiver: Background worker processes o Queue items: Tasks that the web server doesn’t have time/resources to do
  • 19. How it work ? Say a web application server has a task it doesn’t have time to do • It puts the task in the message queue • Other web servers can access the same queue(s) and put tasks there • Workers are greedy and they all watch the queues for tasks • Workers asynchronously pick up the first available task on the queue when they are ready
  • 20. What usefull for ? • Message Queues are useful in certain situations • General guidelines: 0->Does your web applications take more than a few seconds to generate a response? o->Are you using a lot of cron jobs to process data in the background? o->Do you wish you could distribute the processing of the data generated by your application among many servers?
  • 21. What We Need To Make Message Queue ?
  • 23. Why Choice AMQP & RabbitMQ ? 1.RabbitMQ is free to use 2.The documentation is decent 3.There is decent clustering support, even though we never needed clustering 4.We didn’t want to lose queues or messages upon broker crash/ restart 5. We develop applications using Python/django and setting up an AMQP backend using carrot was easy
  • 24. Now Lets Talk about RabbitMQ
  • 25. RabbitMQ ? RabbitMQ is Erlang-based open source application that serves as a message broker or message-oriented middleware. RabbitMQ implementation refers to the application layer protocol that is the Advanced Message Queuing Protocol(AMQP). AMQP provide an interoperable standard protocol between the vendor to regulate the exchange of messages on enterprise-scale systems.
  • 26. Why Use RabbitMQ ? ● We need For... ● Running Task / Procces in the backround ● Asynchronous tasking process ● Scheduling system & Etc
  • 27. So .. What make Rabbit Focus ?
  • 28. Carrot ! Carrot is an AMQP messaging queue framework. AMQP is the Advanced Message Queuing Protocol, an open standard protocol for message orientation, queuing, routing, reliability and security. Easy way to connect to RabbitMQ. Easy way to pull stuff out of the queue. Easy way to throw stuff into the queue. https://github.com/ask/carrot/
  • 29. Concepts ● Publishers (Publishers send messages to an exchange.) ● Exchanges (Messages are sent to exchanges. Exchanges are named and can be configured to use one of several routing algorithms. The exchange routes the messages to consumers by matching the routing key in the message with the routing key the consumer provides when binding to the exchange.) ● Consumers (Consumers declare a queue, bind it to an exchange and receive messages from it.) ● Queues (Queues receive messages sent to exchanges. The queues are declared by consumers.) ● Routing keys (Every message has a routing key. The interpretation of the routing key depends on the exchange type. There are four default exchange types defined by the AMQP standard, and vendors can define custom types, so see your vendor's manual for details.) ● Exchange types defined by AMQP/0.8: ● Direct exchange (Matches if the routing key property of the message and the routing_key attribute of the consumer are identical.) ● Fan-out exchange (Always matches, even if the binding does not have a routing key.) ● Topic exchange (Matches the routing key property of the message by a primitive pattern matching scheme.)
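To make the three exchange types concrete, here is a small broker-free sketch of how a message's routing key could be matched against bindings. It is a toy model, not RabbitMQ's or carrot's implementation: `route` and `topic_matches` are hypothetical helper names, and the topic rules assumed are the usual AMQP ones (`*` stands for exactly one dot-separated word, `#` for zero or more words).

```python
def topic_matches(pattern, routing_key):
    """Topic match: '*' is exactly one word, '#' is zero or more words."""
    def match(p, k):
        if not p:
            return not k                   # pattern used up: key must be too
        head, rest = p[0], p[1:]
        if head == "#":
            # '#' may absorb zero or more words of the key
            return any(match(rest, k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        if head == "*" or head == k[0]:
            return match(rest, k[1:])
        return False

    return match(pattern.split("."), routing_key.split("."))


def route(exchange_type, bindings, routing_key):
    """Return the queue names a message would reach.

    bindings is a list of (queue_name, binding_key) pairs.
    """
    if exchange_type == "fanout":
        return [q for q, _ in bindings]    # always matches
    if exchange_type == "direct":
        return [q for q, key in bindings if key == routing_key]
    if exchange_type == "topic":
        return [q for q, key in bindings if topic_matches(key, routing_key)]
    raise ValueError("unsupported exchange type: %r" % exchange_type)


if __name__ == "__main__":
    bindings = [("checker", "site.check"), ("logger", "site.#"), ("audit", "*.check")]
    for kind in ("direct", "fanout", "topic"):
        print(kind, route(kind, bindings, "site.check"))
```

A direct exchange is the natural fit when each task should land in exactly one named queue, which is the type used in the publisher and consumer slides.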
  • 30. Creating a Connection in Django
      settings.py
          RABBITMQ_HOST = 'localhost'
          RABBITMQ_PORT = 5672
          RABBITMQ_USER = 'guest'
          RABBITMQ_PASS = 'guest'
          RABBITMQ_VHOST = '/'
      views.py
          from carrot.messaging import Publisher, Consumer
          from carrot.connection import AMQPConnection
          from django.conf import settings

          conn_for_carrot = AMQPConnection(hostname=settings.RABBITMQ_HOST,
                                           port=settings.RABBITMQ_PORT,
                                           userid=settings.RABBITMQ_USER,
                                           password=settings.RABBITMQ_PASS,
                                           vhost=settings.RABBITMQ_VHOST)
  • 31. Publisher
      import hashlib

      # Basic form: ask the workers to check the task with the given id
      publisher = Publisher(connection=conn_for_carrot,
                            exchange='website_monitoring_exchange',
                            exchange_type='direct')
      publisher.send({'msg': {'do': 'check', 'task_id': task_id}})

      # Variant: derive the task id from the task id plus the monitored URL
      publisher = Publisher(connection=conn_for_carrot,
                            exchange='website_monitoring_exchange',
                            exchange_type='direct')
      publisher.send({'msg': {'do': 'check',
                              'task_id': hashlib.md5(str(task_id) +
                                                     request.PUT.get('url')).hexdigest()}})
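The second publisher call derives the task id by hashing the task id together with the URL. A standalone sketch of that derivation is below. It is written for Python 3, where `hashlib.md5` needs bytes, so the string is encoded first; `make_task_id` and `build_check_message` are hypothetical helper names, and the UTF-8 encoding is an assumption. MD5 serves only as a compact identifier here, not as a security measure.

```python
import hashlib


def make_task_id(task_id, url):
    """Derive a stable 32-character id from a task id and the monitored URL."""
    digest_input = (str(task_id) + url).encode("utf-8")  # md5 needs bytes in Python 3
    return hashlib.md5(digest_input).hexdigest()


def build_check_message(task_id, url):
    """Build the payload in the same shape the slide passes to publisher.send()."""
    return {"msg": {"do": "check", "task_id": make_task_id(task_id, url)}}


if __name__ == "__main__":
    print(build_check_message(7, "http://example.com/"))
```

Because the id depends only on its inputs, publishing the same task and URL twice yields the same id, which makes duplicates easy to spot on the consumer side.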
  • 32. Consumer
      import subprocess

      def monitoring_check():
          def call(message_data, message):
              if message_data['msg']['do'] == 'check':
                  print '[+] receiving message'
                  message.ack()
                  task_id = message_data['msg']['task_id']
                  # Run the scraper in a separate process and record its PID
                  get_pid = subprocess.Popen(['python', 'scraper.py', task_id])
                  pid = get_pid.pid
                  collection_task.update({'task_id': task_id},
                                         {'$set': {'status': 'RUNNING', 'pid': pid}})
                  print '[Starting PID:%s]' % pid
                  get_pid.wait()
              else:
                  # Acknowledge unknown messages so they leave the queue
                  message.ack()

          queuename = 'website_monitoring_checker'
          consumer = Consumer(connection=conn_for_carrot,
                              queue=queuename,
                              exchange='website_monitoring_exchange',
                              exchange_type='direct')
          consumer.register_callback(call)
          try:
              print '[queue:%s]consume..' % queuename
              consumer.wait()
          except Exception, err:
              print err
  • 33. Cooking soup with BeautifulSoup?
      import re
      from BeautifulSoup import BeautifulSoup

      def visible(element):
          # Skip text that sits inside tags the browser does not render
          if element.parent.name in ['style', 'script', '[document]', 'head', 'title']:
              return False
          # Skip HTML comments and non-breaking-space entities
          if re.search('<!--', str(element)) or re.search('-->', str(element)) \
                  or re.search('&nbsp;', str(element)):
              return False
          return True

      monitor = collection_monitor.find_one({'pk': pk})
      # The two stored snapshots of the monitored URL
      contents = [collection_content.find({'url': str(monitor['url'])})[1],
                  collection_content.find({'url': str(monitor['url'])})[0]]
      # For each snapshot i in contents (the loop itself is not shown on the slide):
      texts = BeautifulSoup(BeautifulSoup(i['content']).prettify()).findAll(text=True)
      data = {'content': ' '.join(filter(visible, texts)),
              'datetime': i['datetime']}
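The same idea (keep only the text a visitor would actually see, then compare snapshots) can be tried without BeautifulSoup. The sketch below swaps in Python 3's standard `html.parser` module, so it is not the code from the talk. `VisibleTextParser`, `visible_text`, and `HIDDEN_TAGS` are hypothetical names, and the set of hidden tags simply mirrors the list used on the slide.

```python
from html.parser import HTMLParser

# Tags whose text content is not shown to the visitor
HIDDEN_TAGS = {"style", "script", "head", "title"}


class VisibleTextParser(HTMLParser):
    """Collect text nodes that are not nested inside a hidden tag."""

    def __init__(self):
        super().__init__()
        self._hidden_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in HIDDEN_TAGS:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in HIDDEN_TAGS and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._hidden_depth == 0:
            self.chunks.append(text)
    # HTML comments arrive via handle_comment, which does nothing by default


def visible_text(html):
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    old = "<html><head><title>T</title></head><body><p>Price: 10</p></body></html>"
    new = "<html><head><title>T2</title></head><body><p>Price: 12</p></body></html>"
    print(visible_text(old) != visible_text(new))
```

Comparing the extracted text rather than the raw HTML keeps changes in scripts, styles, or comments from triggering an alert.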
  • 34. Alert by email!
      import smtplib

      def sending_email(to, sub, msg):
          try:
              gmail_user = '[email protected]'
              gmail_pwd = '***************'
              smtpserver = smtplib.SMTP("smtp.gmail.com", 587)
              smtpserver.ehlo()
              smtpserver.starttls()
              smtpserver.ehlo()  # identify again over the encrypted connection
              smtpserver.login(gmail_user, gmail_pwd)
              header = ('To:' + to + '\n' +
                        'From: Website-Monitoring <' + gmail_user + '>\n' +
                        'Subject: %s\n' % sub)
              # A blank line must separate the headers from the body
              msg = header + '\n' + msg
              smtpserver.sendmail(gmail_user, to, msg)
              smtpserver.close()
          except Exception, err:
              print err
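Concatenating header strings by hand is easy to get wrong, so here is an alternative sketch that builds the alert with Python 3's standard `email.message.EmailMessage`. It only constructs the message and does not contact any mail server. `build_alert` and the example.com addresses are hypothetical, chosen for illustration.

```python
from email.message import EmailMessage


def build_alert(sender, to, subject, body):
    """Build a change-alert email; headers and body are kept separate."""
    msg = EmailMessage()
    msg["From"] = "Website-Monitoring <%s>" % sender
    msg["To"] = to
    msg["Subject"] = subject
    msg.set_content(body)              # plain-text body
    return msg


if __name__ == "__main__":
    alert = build_alert("monitor@example.com", "user@example.com",
                        "Page changed", "http://example.com/ has changed.")
    print(alert["Subject"])
```

A message built this way can be handed to `smtplib.SMTP.send_message()` after the same `starttls()` and `login()` steps shown on the slide.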
  • 35. Task / Scheduling Checking?
      import sys
      import time

      task_id = sys.argv[1]
      print task_id
      raw_delay = collection_task.find_one({'task_id': task_id})['schedule']
      print raw_delay
      # The schedule is stored in hours; convert it to seconds (default: 24 hours)
      if raw_delay == "1":
          delay = 60*60
      elif raw_delay == "12":
          delay = 720*60
      else:
          delay = 1440*60

      while True:
          try:
              print '[+] Starting task: %s' % sys.argv[1]
              log(task_id, 'INFO', 'starting session')
              main()
          except Exception, err:
              log(task_id, 'exception', err)
              print err
              collection_task.update({'task_id': task_id},
                                     {'$set': {'status': 'STOPPED', 'pid': None}})
              log(task_id, 'INFO', 'updating database [status:STOPPED]')
          else:
              collection_task.update({'task_id': task_id},
                                     {'$set': {'status': 'SLEEP', 'pid': None}})
              log(task_id, 'INFO', 'updating database [status:SLEEP] for %s sec' % delay)
              time.sleep(delay)
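The schedule-to-delay arithmetic on the slide (1 hour, 12 hours, otherwise 24 hours, all expressed in seconds) can be isolated in a small function, which makes it easy to check on its own. This is a sketch in Python 3; `schedule_to_delay` is a hypothetical name, and the 24-hour fallback mirrors the `else` branch above.

```python
def schedule_to_delay(raw_delay):
    """Map the stored schedule string (in hours) to a delay in seconds."""
    hours = {"1": 1, "12": 12}.get(raw_delay, 24)   # anything else means daily
    return hours * 60 * 60


if __name__ == "__main__":
    for value in ("1", "12", "24"):
        print(value, schedule_to_delay(value))
```

Here 1 hour is 60 × 60 = 3600 seconds, 12 hours is 720 × 60 = 43200 seconds, and 24 hours is 1440 × 60 = 86400 seconds, matching the constants on the slide.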
  • 36. Django-Piston (a mini-framework for Django, but powerful for creating RESTful APIs) https://bitbucket.org/jespern/django-piston/wiki/Home ● Ties into Django's internal mechanisms. ● Supports OAuth out of the box (as well as Basic/Digest or custom auth.) ● Doesn't require tying to models, allowing arbitrary resources. ● Speaks JSON, YAML, Python Pickle & XML (and HATEOAS.) ● Ships with a convenient reusable library in Python ● Respects and encourages proper use of HTTP (status codes, ...) ● Has built in (optional) form validation (via Django), throttling, etc. ● Supports streaming, with a small memory footprint. ● Stays out of your way.
  • 37. How to?
      Include in urls.py:
          url(r'^api/', include('api.urls')),
      Include in settings.py:
          INSTALLED_APPS = (
              ….......
              'api',
      Create a folder named api/ in the project directory with these files:
          api/
              handlers.py
              __init__.py
              urls.py
  • 38. REST API: urls.py
      from django.conf.urls.defaults import *
      from piston.resource import Resource
      from piston.authentication import HttpBasicAuthentication
      from api.handlers import *

      auth = HttpBasicAuthentication(realm="website-monitoring")
      ad = {'authentication': auth}

      main = Resource(handler=Main, **ad)
      monitor = Resource(handler=Monitor, **ad)

      urlpatterns = patterns('',
          url(r'^(?P<obj_id>[^/]+)/$', main),
          url(r'^monitor/(?P<obj_id>[^/]+)/$', monitor),
      )
  • 39. REST API: handlers.py
      from piston.handler import BaseHandler
      from piston.utils import rc

      class Main(BaseHandler):
          allowed_methods = ('GET',)  # trailing comma makes this a tuple

          def read(self, request, obj_id):
              # Look the id up among users first, then among monitors
              data = collection_user.find_one({'pk': obj_id})
              if data:
                  return data
              data = collection_monitor.find_one({'pk': obj_id})
              if data:
                  return data
  • 40.
      class Monitor(BaseHandler):
          allowed_methods = ('GET', 'PUT', 'DELETE')
          fields = ('url', 'status', 'hit', 'fail_hit', 'year', 'month', 'day',
                    'hour', 'email', 'period', 'diff')

          def read(self, request, obj_id):
              try:
                  if obj_id == 'all':
                      data = list(collection_monitor.find({'username': str(request.user)}))
                  elif obj_id == "status_running":
                      data = list(collection_monitor.find({'status': 'running'}))
                  ….........
              except Exception, err:
                  return rc.BAD_REQUEST
              return data

          def update(self, request, obj_id):
              try:
                  if obj_id == 'create':
                      # Collect the URLs this user already monitors
                      url_list = []
                      for i in collection_monitor.find({'username': str(request.user)}):
                          url_list.append(i['url'])
                      if request.PUT.get('url') in url_list:
                          print '[+] URL already exists'
                          print '[+] Data will be updated'
                  else:
                      raise Exception
              except Exception, err:
                  print err
                  return rc.BAD_REQUEST
              …......................
  • 41.
          def delete(self, request, obj_id):
              try:
                  if obj_id == 'all':
                      # A single remove() call deletes every monitor of this user
                      collection_monitor.remove({'username': str(request.user)})
                  else:
                      if collection_monitor.find_one({'pk': obj_id}):
                          collection_monitor.remove({'pk': obj_id})
              except Exception, err:
                  print err
                  return rc.FORBIDDEN
              else:
                  print 'deleted'
                  return rc.DELETED
  • 42. Facebook Integration? ● Just for lazy people ● You don't have to fill in the registration form; just log in to your Facebook account, then click, click, and click. ● Good for business marketing ● Easy to integrate, etc. ● Download: ● git clone http://github.com/dickeytk/django_facebook_oauth.git
  • 43. Questions? ● Twitter: @jimmyromanticde ● Facebook: https://www.facebook.com/jimmy.romantic.devil ● Email: [email protected] ● Bitbucket: https://bitbucket.org/jimmyromanticdevil/ ● Blog: http://jimmyromanticdevil.wordpress.com
  • 44. References ● http://www.python.org ● https://www.djangoproject.com ● http://www.mongodb.org ● http://www.rabbitmq.com ● http://pypi.python.org/pypi/pymongo ● https://github.com/ask/carrot/ ● https://bitbucket.org/jespern/django-piston/wiki/Home ● http://github.com/dickeytk/django_facebook_oauth.git ● “Life in a Queue” by Tareque Hossain ● Google “Message Queue”