Saturday, May 9, 2015

Django Signals Performance Implications

Django signals are cool and very useful sometimes. They allow certain receivers get notified when certain actions occur elsewhere in the framework. For example, if we have registered a receiver function save_contribution() to listen on the post_save signal, when an article is saved, the save_contribution() will be notified and executed without explicitly calling it in the articles module. Cool, right?

However, the performance implications regarding the Django signals can be easily overlooked. An optimization I did recently reduced the number of unnecessary signals being fired from 3525 to 36 in one of our applications, which significantly speed up the sites.

Take a look at the post_init signal below:

def your_awesome_function(...):
    ...

post_init.connect(your_awesome_function)    # bad

This is bad unless you really want your_awesome_function to be called every time an object is created. What's the problem here? Well, this signal doesn't specify the Sender. As a result,  if you have a large application as we do, say, your app has 10000 objects, the function will be called 10000 times, faithfully!

post_init.connect(your_awesome_function, sender=YourModel)  # better

Now, it's better. It only gets fired when an instance of YourModel is created.

There are other cases that signals can be avoided altogether. For instance, I found a couple of unnecessary post_init signals in our applications that can be easily replaced with the functions in the base class because they really mean only for the subclasses of that base class.  

Keep in mind that signals are asynchronous, or blocking. For the performance consideration, the rule of thumb is:

1) Specify the Sender if possible.

2) Eliminate unnecessary signals.

Saturday, January 24, 2015

Using django-pyodbc to Connect to MS SQL Server Database on Ubuntu


Django doesn't officially support Microsoft SQL Server. To connect to SQL Server in your Django projects, you can use django-pyodbc as the DB backend.

1. Install required packages on your Ubuntu server:

sudo apt-get install unixodbc unixodbc-dev freetds-dev tdsodbc

In the virtualenv of your django project, install pyodbc and django-pyodbc:

pip install --allow-external pyodbc --allow-unverified pyodbc pyodbc
pip install django-pyodbc 
 
2. Edit /etc/freetds/freetds.conf to update the server for SQL Server .

# A typical Microsoft server
[sqlserver]
        host = <server ip>
        port = 1433
        tds version = 7.0


3. Set up FreeTDS driver in etc/odbcinst.ini (Run odbcinst -j to check where the configuration files are located and make sure the paths for the Driver and Setup are valid):

[FreeTDS]
Description = TDS driver (Sybase/MS SQL)
Driver = /usr/lib/x86_64-linux-gnu/odbc/libtdsodbc.so
Setup = /usr/lib/x86_64-linux-gnu/odbc/libtdsS.so
CPTimeout =
CPReuse =
FileUsage = 1

4. Update the DATABASES setting for your django project:

DATABASES = { 
     'default': {
         'ENGINE': "django_pyodbc", 
         'HOST': "sqlserver", 
         'USER': "mssql_user", 
         'PASSWORD': "mssql_password", 
         'NAME': "database_name", 
         'OPTIONS': {
             'host_is_server': False, 
             'driver': 'FreeTDS', 
         }, 
     } 
}

5. Test your connection and celebrate.

Friday, June 22, 2012

The Expensive Dots - A Case of Django Performance Optimization

Django's ORM is handy and convenient in a way that developers don't have to focus on the implementation under the hood. You can use the little dots to retrieve all related objects, such as foo.bar.tar.hello.kittie.. However, understanding what's going on under the scene helps you build efficient applications.

I recently had a chance to optimize an event report using Django debug toolbar. I dumped a production database to my local for testing. This is the before-after result:

# of Queries Page Load Time(ms) CPU Time(ms)
Before Optimization 5698 3444 51156
After Optimization 16 58 958

Probably nobody has ever realized that a report with only159 entries can hit database 5698 times. That's crazy talk! Yes, blame the little dots (or not paying attention to what a little dot can do, to be precise).

For the page load time and CPU time, of course, this is measured on my local. The production server can be much faster.

If you calculate the percentage decrease, the # of queries, page load time and CPU are 99.72%, 98.32% and 98.13%, respectively. 

So, how did I optimize it?

My goal was to reduce both # of database hits and CPU time. So basically, all the effort was to prevent the "expensive dots":
  • Eliminated the unnecessary queries.
  • Used aggregate functions instead of looping through all objects to calculate the totals.  
  • Used select_related with care. 
  • Selected only the fields needed with values_list. 

1) Eliminated the unnecessary queries.

Only small percentage of our events has the addons. But when looping through the registrants, each registrant queries the database for registered addons regardless of whether or not the event even has any addons. So this was taken care of by adding the if statement to check if an event has any addons. Notice that we didn't call event.has_addons. Instead, we stored the event.has_addons to event_has_addons because has_addons is a function and every time it gets called, a database query will be issued.

  1. {% if event_has_addons %}
  2. {% with registrant.registration.regaddon_set.all as regaddons %}
  3. {% if regaddons %}
  4. .....
  5. {% endif %}
  6. {% endwith %}
  7. {% endif %}

Now, look at two lines below to see the difference.

  1. {% if registrant.user %}

  1. {% if registrant.user_id %}

They look similar, right? But the first one can be expensive - It needs to query the database to get the user object. While the second registrant already has the user_id available. If all you need is the user id, then use registrant.user_id instead of registrant.user or registrant.user.id.  

2) Used aggregate functions instead of looping through all objects to calculate the totals.

Originally, the total and balance was calculated by looping through each registration like this:

  1. # get total and balance (sum)
  2. for reg8n in registrations:
  3.    if not reg8n.canceled: # not cancelled
  4.        if reg8n.invoice != None:
  5.            if roster_view != 'paid':
  6.               total_sum += float(reg8n.invoice.total)
  7.               balance_sum += float(reg8n.invoice.balance)

Notice reg8n.canceled, the canceled is not a field of reg8n. In order to get the value of  reg8n.canceled, it goes through each registrant to check if they have the cancel_dt. Very inefficient. And each reg8n.invoice needs to query the database for invoice as well. 
 
  1.   @property
  2.     def canceled(self):
  3.         """
  4.         Return True if all registrants are canceled. Otherwise False.
  5.         """
  6.         registrants = self.registrant_set.all()
  7.         for registrant in registrants:
  8.             if not registrant.cancel_dt:
  9.                 return False
  10.         return True

So, rather than a canceled function, a canceled field was added to the reg8n. With the canceled field, I was able to use the aggregate function with only one query.

     
  1.    # Get the total_sum and balance_sum.
  2.    totals_d = Registration.objects.filter(
  3.                         event=event,
  4.                         canceled=False
  5.                       ).aggregate(
  6.                          total_sum=Sum('invoice__total'),
  7.                          balance_sum=Sum('invoice__balance') )
  8.     total_sum = totals_d['total_sum']
  9.    balance_sum = totals_d['balance_sum']

3) Used select_related with care.

As you know, you can use select_related to reduce number of database queries. However, reducing the number of database queries doesn't necessarily improve the performance. It would do the opposite sometimes when the tables doing the joins are hugely big.  For that reason, select_related was used selectively.

4) Selected only the fields needed with values_list. 

The values_list was used heavily when optimizing the report. It helped retrieving the field values of the custom registration form for each registrant with only one query and very little CPU time. Same with the pricing.


After done all of those, although very few probably would care about a faster report , I know, our servers will thank me :-).



Monday, January 2, 2012

Using jQuery Auto Complete in Django

I recently implemented jquery auto complete in django. I thought it might be helpful to write down the entire process. At least, it can serve as a note for me in the future.

Assuming you have a project module. One of the field champion is the foreign key to the auth user. And you want the champion field to have the auto complete feature as the editor types a name in the text box.

models.py:
  1. from django.db import models
  2. class Project(models.Model):
  3.      title = models.CharField(max_length=200)
  4.      champion = models.ForeignKey(User)


In the forms.py below, as you noticed, an extra field champion_display is added, and champion is turned to a hidden field. This is because we want the auto complete text box to show the name of the selected user not the id. The id will be assigned to the hidden field champion.  

forms.py:
  1. from django import forms

  2. class ProjectForm(forms.ModelForm):
  3.      champion_display = forms.CharField(max_length=100, help_text='type name or username or email')
  4.      class_Meta:
  5.           model = Project
  6.           fields=('title', 'champion_display', 'champion',)

  7.      def __init__(self, *args, **kwargs):
  8.           super(ProjectForm, self).__init__(*args, **kwargs)
  9.           self.field['champion_display'].label = "Add a Champion"
  10.           self.fields['champion'].widget = forms.HiddenInput()

In your project urls.py, add the champion auto complete url.
         
urls.py:       
  1. from django.conf.urls.defaults import patterns, url

  2. urlpatterns = patterns('projects.views',
  3.      url(r'^add/$', 'add'),
  4.      url(r'^champion_auto_complete/$', 'champion_auto_complete', name='champion_auto_complete'),
  5. )

views.py:
  1. from django.http import HttpResponse
  2. from django.contrib.auth.models import User
  3. from django.utils import simplejson

  4. def champion_auto_complete(request):
  5.      q = request.REQUEST('term']
  6.      users = User.objects.filter(is_active=True)
  7.      users_list = []

  8.      for u in users:
  9.           value = '%s, %s (%s) - %s' % (u.last_name, u.first_name, u.username, u.email)
  10.           u_dict = {'id': u.id, 'label': value, 'value': value}
  11.           users_list.append[u_dict]

  12.      return HttpResponse(simplejson.dumps(users_list),mimetype='application/json')


In the Project template, you need to include: jqueryui css, jquery and jqueryui.

templates/projects/add.html:
  1. ......
    <link rel="stylesheet" href="http://ajax.googleapis.com/ajax/libs/jqueryui/1.8.16/themes/base/jquery-ui.css" type="text/css" media="all" />

  2. ......
  3. <form action="" method="post">{% csrf_token %}
  4. <div class="forms">
  5.     {{ form }}
  6.     <input type="submit" name="submit" value="Save" />
  7. </div>
  8. </form>
  9. ......

  10. <script src="http://ajax.googleapis.com/ajax/libs/jquery/1.6.2/jquery.min.js" type="text/javascript"></script>
  11. <script src="http://ajax.googleapis.com/ajax/libs/jqueryui/1.8.16/jquery-ui.min.js" type="text/javascript"></script>

  12.  <script type="text/javascript">
  13.    $(document).ready(function(){  
  14.     $( "#id_champion_display" ).autocomplete({
  15.       source: "{% url champion_auto_complete %}",
  16.       selectFirst:true,
  17.       minLength:2,
  18.       select:function(event,ui) {
  19.         $("#id_champion").val(ui.item.id)
  20.       }
  21.         });
  22.        
  23.    });