Speeding up Dlib’s Facial Landmark Detector

By Satya Mallick | May 24, 2016

In this tutorial I will explore a few ways to speed up Dlib’s Facial Landmark Detector.

Dlib’s Facial Landmark Detector

Dlib has a very good implementation of a very fast facial landmark detector. I reviewed it in my post titled Facial Landmark Detection.

Subsequently, I wrote a series of posts that utilize Dlib’s facial landmark detector.

  1. Facial Landmark Detection
  2. Delaunay Triangulation
  3. Face Morphing
  4. Face Swap
  5. Average Face

There are two example files in Dlib that deal with facial landmark detection:

  1. For Images : dlib/examples/face_landmark_detection_ex.cpp
  2. For Videos : dlib/examples/webcam_face_pose_ex.cpp

The tricks used in this post are included in my version of Dlib in the following files:

  1. For Images : dlib/examples/face_landmark_detection_to_file.cpp
  2. For Videos : dlib/examples/webcam_face_pose_fast.cpp

This post fully explains all the tricks and provides snippets of code. To get access to the above files, and code and images used in all other posts please subscribe to our newsletter.

About the only complaint I have heard from readers of this blog about Dlib’s facial landmark detector is that it is slow. Is it really slow? Yes and no. Out of the box it appears to be slow, but that is not because of a bad implementation of the facial landmark detector. Let’s find the bottlenecks and see how to improve the speed.

How to make Dlib’s Facial Landmark Detector faster ?

Dlib’s facial landmark detector implements a paper that can detect landmarks in just 1 millisecond! That is 1000 frames a second. You will never get 1000 fps, because you first need to detect the face before doing landmark detection, and that takes a few tens of milliseconds. But you can easily do 30 fps with the optimizations listed below.

Compile Dlib in Release Mode with Optimizations turned on

As mentioned in Dlib’s documentation, it is critical to compile Dlib in release mode with the appropriate compiler instructions turned on.


cd dlib/examples
mkdir build
cd build

# Enable compiler instructions.
# In the example below I have enabled SSE4
# Use the one that is appropriate for you

# SSE2 works for most Intel or AMD chips.
# cmake .. -DUSE_SSE2_INSTRUCTIONS=ON

# SSE4 works for most current machines
cmake .. -DUSE_SSE4_INSTRUCTIONS=ON

# AVX works on processors released after 2011.
# cmake .. -DUSE_AVX_INSTRUCTIONS=ON

# Compile in release mode
cmake --build . --config Release

If you are using an Intel or AMD chip, enable at least SSE2 instructions. AVX is the fastest but requires a CPU from at least 2011. SSE4 is the next fastest and is supported by most current machines.
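A quick way to confirm that these settings actually reached your own code is to print what the compiler saw. The sketch below is not part of Dlib: `simd_level`, `build_mode` and `print_build_info` are hypothetical helper names. It assumes the `__AVX__`, `__SSE4_1__` and `__SSE2__` macros predefined by GCC and Clang (MSVC defines `__AVX__` but signals SSE2 through `_M_X64` / `_M_IX86_FP`), and it treats `NDEBUG` as a sign of a release-like build. It only reports the flags of the file it is compiled in.

```cpp
#include <iostream>
#include <string>

// Hypothetical helper: highest SIMD instruction set the compiler was allowed
// to use for this translation unit, judged from predefined macros.
std::string simd_level()
{
#if defined(__AVX__)
    return "AVX";
#elif defined(__SSE4_1__)
    return "SSE4";
#elif defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    return "SSE2";
#else
    return "none";
#endif
}

// Hypothetical helper: NDEBUG is normally defined by release configurations
// and left undefined by debug configurations.
std::string build_mode()
{
#ifdef NDEBUG
    return "release-like";
#else
    return "debug-like";
#endif
}

// Print both settings, e.g. at program start-up.
void print_build_info()
{
    std::cout << "Build mode: " << build_mode()
              << ", SIMD: " << simd_level() << std::endl;
}
```

If this prints "debug-like" or "none" for the program that runs the detector, the slow speed is most likely a build configuration problem rather than a problem with the detector.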

Using Visual Studio

People often make this mistake while using Visual Studio because by default they are working in debug mode. You can see a detailed explanation and how to fix it here.

Visual Studio Release Mode

Using Qt

Similarly, while using Qt you need to turn on Release mode as shown below.

Qt Release Mode

Speed Up Face Detection

The following steps will help speed up face detection with a small ( probably negligible ) loss in accuracy.

Resize Frame

Facial landmark detection algorithms usually require the user to provide a bounding box containing a face. The algorithm takes this box as input and returns the landmarks. The time reported for these algorithms covers only landmark detection, not face detection. Landmark detection can run in less than 5 milliseconds, but face detection can take much longer ( 30 milliseconds ). The speed of face detection depends on the resolution of the image, because with a lower resolution image you search over a smaller range of face sizes. The downside is that you will miss smaller faces, but in most of the applications I have listed above we have one person looking at the webcam from arm’s length.

An easy way to speed up face detection is to resize the frame. My webcam records video at 720p ( i.e. 1280×720 ) resolution and I resize the image to a quarter of that for face detection. The bounding box obtained on the small image must then be scaled back to the original frame by dividing its coordinates by the scale used for resizing ( equivalently, multiplying them by the downsample ratio ). This allows us to do facial landmark detection at full resolution.
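The scaling step can be sketched without any library dependencies. In the sketch below, `Rect` is a hypothetical stand-in for `dlib::rectangle`, `scale_to_full_resolution` is an illustrative name, and the ratio of 4 per side is an assumed value, not something the detector requires.

```cpp
// Hypothetical stand-in for dlib::rectangle, to keep the sketch dependency-free.
struct Rect
{
    long left, top, right, bottom;
};

// Map a rectangle found on the downsampled image back to full-resolution
// coordinates. ratio is the per-side downsample factor (assumed to be 4 in
// the usage note below).
Rect scale_to_full_resolution(const Rect& small, long ratio)
{
    return Rect{ small.left * ratio, small.top * ratio,
                 small.right * ratio, small.bottom * ratio };
}
```

For example, with a ratio of 4 a face found at (80, 45)-(160, 125) on the 320×180 image maps to (320, 180)-(640, 500) on the 1280×720 frame, and the landmark detector is then run on that full-resolution box.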

Skip frame

Typically webcams record video at 30 fps. In a typical application you are sitting right in front of the webcam and not moving much, so there is no need to detect the face in every frame. We can simply do facial landmark detection based on the facial bounding box obtained a few frames earlier. If you do face detection only every 3 frames, you have sped up the whole pipeline by almost three times, because face detection takes most of the time.

Is it possible to do better than using the previous location of the face? Yes, we can use Kalman filtering to predict the location of the face in frames where detection is not done, but in a webcam application it is overkill.
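The frame-skipping idea and its effect on per-frame cost can be sketched as follows. `DetectionScheduler` and `average_frame_ms` are hypothetical names used only for illustration. The cost model assumes detection runs once every `interval` frames while landmark fitting runs on every frame.

```cpp
// Decides on which frames the (expensive) face detector should run.
class DetectionScheduler
{
public:
    // interval must be >= 1; interval == 1 means "detect on every frame".
    explicit DetectionScheduler(int interval) : interval_(interval), count_(0) {}

    // Call once per frame. Returns true on frames 0, interval, 2*interval, ...
    bool should_detect()
    {
        const bool run = (count_ == 0);
        count_ = (count_ + 1) % interval_;   // wrap so the counter never overflows
        return run;
    }

private:
    int interval_;
    int count_;
};

// Average per-frame cost in milliseconds when detection runs once every
// `interval` frames and landmark fitting runs on every frame.
double average_frame_ms(double detect_ms, double landmark_ms, int interval)
{
    return detect_ms / interval + landmark_ms;
}
```

Plugging in the rough figures quoted in this post ( 30 ms for face detection, 1 ms for landmarks ), detecting on every frame costs 31 ms per frame, while detecting every third frame costs 11 ms on average. That is a speed-up of about 2.8 times, in line with the "almost three times" above.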

The snippet of code for the above optimizations is shown below.


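#include <dlib/opencv.h>
#include <dlib/image_processing.h>
#include <dlib/image_processing/frontal_face_detector.h>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>   // cv::resize

using namespace dlib;

#define FACE_DOWNSAMPLE_RATIO 4
#define SKIP_FRAMES 2

// Custom renderer, defined in the Custom Face Renderer section
void render_face(cv::Mat &img, const dlib::full_object_detection& d);

int main()
{
    cv::VideoCapture cap(0);
    cv::Mat im;
    cv::Mat im_small, im_display;

    frontal_face_detector detector = get_frontal_face_detector();
    shape_predictor pose_model;
    deserialize("shape_predictor_68_face_landmarks.dat") >> pose_model;

    int count = 0;
    std::vector<rectangle> faces;

    while (true)
    {
        // Grab a frame
        cap >> im;
        if (im.empty()) break;

        // Resize image for face detection
        cv::resize(im, im_small, cv::Size(), 1.0/FACE_DOWNSAMPLE_RATIO, 1.0/FACE_DOWNSAMPLE_RATIO);

        // Change to dlib's image format. No memory is copied.
        cv_image<bgr_pixel> cimg_small(im_small);
        cv_image<bgr_pixel> cimg(im);

        // Detect faces on the resized image, but only every SKIP_FRAMES frames
        if ( count % SKIP_FRAMES == 0 )
        {
            faces = detector(cimg_small);
        }

        // Find the pose of each face.
        std::vector<full_object_detection> shapes;
        for (unsigned long i = 0; i < faces.size(); ++i)
        {
            // Resize obtained rectangle for full resolution image.
            rectangle r(
                (long)(faces[i].left() * FACE_DOWNSAMPLE_RATIO),
                (long)(faces[i].top() * FACE_DOWNSAMPLE_RATIO),
                (long)(faces[i].right() * FACE_DOWNSAMPLE_RATIO),
                (long)(faces[i].bottom() * FACE_DOWNSAMPLE_RATIO)
            );

            // Landmark detection on full sized image
            full_object_detection shape = pose_model(cimg, r);
            shapes.push_back(shape);

            // Custom Face Render
            render_face(im, shape);
        }

        // Display at half resolution (see Optimizing Display); press q to quit
        cv::resize(im, im_display, cv::Size(), 0.5, 0.5);
        cv::imshow("Fast Facial Landmark Detector", im_display);
        if (cv::waitKey(1) == 'q') break;

        count++;
    }
    return 0;
}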

Optimizing Display

When I first tried speeding up the facial landmark detector, I was surprised to find that a third of the time was spent drawing the landmarks and displaying the frame. I made two optimizations that helped speed things up.

Resize Frame

I resized the image to half resolution for display. This makes a huge difference because when the resolution is changed from 720p to 360p, the actual number of pixels that need to be displayed goes down by a factor of 4.
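The factor of 4 follows directly from the pixel counts. As a small compile-time check ( `pixels_after_downscale` is a hypothetical helper name, not an OpenCV or Dlib function ):

```cpp
// Number of pixels in a frame after shrinking each side by `factor`.
constexpr long pixels_after_downscale(long width, long height, long factor)
{
    return (width / factor) * (height / factor);
}

// 720p: 1280 x 720 = 921,600 pixels
static_assert(pixels_after_downscale(1280, 720, 1) == 921600, "720p pixel count");
// 360p: 640 x 360 = 230,400 pixels
static_assert(pixels_after_downscale(1280, 720, 2) == 230400, "360p pixel count");
// Halving each side leaves a quarter of the pixels to draw.
static_assert(pixels_after_downscale(1280, 720, 1) ==
              4 * pixels_after_downscale(1280, 720, 2), "4x fewer pixels");
```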

Custom Face Renderer

Dlib’s face renderer didn’t work very well for me; the frames did not render smoothly. So I wrote my own using OpenCV’s polylines. The code is shown below.


#include <dlib/image_processing/frontal_face_detector.h>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>   // cv::polylines

// Draw landmarks start..end (inclusive) as one open or closed polyline.
void draw_polyline(cv::Mat &img, const dlib::full_object_detection& d, const int start, const int end, bool isClosed = false)
{
    std::vector <cv::Point> points;
    for (int i = start; i <= end; ++i)
    {
        points.push_back(cv::Point(d.part(i).x(), d.part(i).y()));
    }
    // Blue in BGR, thickness 2, line type 16 (anti-aliased)
    cv::polylines(img, points, isClosed, cv::Scalar(255,0,0), 2, 16);
}

void render_face (cv::Mat &img, const dlib::full_object_detection& d)
{
    // This renderer only understands the 68 point landmark model.
    DLIB_CASSERT
    (
     d.num_parts() == 68,
     "\n\t Invalid inputs were given to this function. "
     << "\n\t d.num_parts():  " << d.num_parts()
     );

    draw_polyline(img, d, 0, 16);           // Jaw line
    draw_polyline(img, d, 17, 21);          // Left eyebrow
    draw_polyline(img, d, 22, 26);          // Right eyebrow
    draw_polyline(img, d, 27, 30);          // Nose bridge
    draw_polyline(img, d, 30, 35, true);    // Lower nose
    draw_polyline(img, d, 36, 41, true);    // Left eye
    draw_polyline(img, d, 42, 47, true);    // Right Eye
    draw_polyline(img, d, 48, 59, true);    // Outer lip
    draw_polyline(img, d, 60, 67, true);    // Inner lip
}


I also tried rendering all the points using a single polyline hoping to see some improvement in speed, but there was no difference in speed at all.
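If you want to reuse the landmark index ranges from render_face elsewhere ( for example to pick out only the mouth or the eyes ), one option is to keep them in a small table. This is only a sketch: `LandmarkGroup`, `landmark_groups` and `covers_all_landmarks` are hypothetical names, and the ranges are copied from the calls in render_face.

```cpp
#include <vector>

struct LandmarkGroup
{
    const char* name;
    int start;      // first landmark index (inclusive)
    int end;        // last landmark index (inclusive)
    bool closed;    // whether the polyline is drawn closed
};

// Same index ranges and open/closed flags as the draw_polyline calls in render_face.
const std::vector<LandmarkGroup>& landmark_groups()
{
    static const std::vector<LandmarkGroup> groups = {
        {"Jaw line",       0, 16, false},
        {"Left eyebrow",  17, 21, false},
        {"Right eyebrow", 22, 26, false},
        {"Nose bridge",   27, 30, false},
        {"Lower nose",    30, 35, true},
        {"Left eye",      36, 41, true},
        {"Right eye",     42, 47, true},
        {"Outer lip",     48, 59, true},
        {"Inner lip",     60, 67, true},
    };
    return groups;
}

// Returns true if every index in 0..num_parts-1 belongs to at least one group.
bool covers_all_landmarks(const std::vector<LandmarkGroup>& groups, int num_parts)
{
    std::vector<bool> seen(num_parts, false);
    for (const LandmarkGroup& g : groups)
    {
        for (int i = g.start; i <= g.end; ++i)
        {
            if (i >= 0 && i < num_parts) seen[i] = true;
        }
    }
    for (bool s : seen)
    {
        if (!s) return false;
    }
    return true;
}
```

A renderer could then loop over the table and call draw_polyline(img, d, g.start, g.end, g.closed) for each entry. Note that index 30 appears in both nose groups, so the groups overlap by one point while still covering all 68 landmarks.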


Using the above optimizations I am able to get a speed of 70 fps on videos recorded at 120 fps. On my webcam I get 27-30 fps because we are limited by the recording speed of the webcam. The reported numbers include the time needed to read the frame from camera or video file, face detection, facial landmark detection and display at half resolution.
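To measure numbers like these on your own machine, a simple frame counter built on std::chrono is enough. The sketch below is an assumption about how one might measure end-to-end speed, not necessarily how the figures above were obtained; `FpsCounter` is a hypothetical helper name.

```cpp
#include <chrono>

// Hypothetical helper: average frames per second over all frames seen so far.
class FpsCounter
{
public:
    FpsCounter() : start_(std::chrono::steady_clock::now()), frames_(0) {}

    // Call once per loop iteration, after the frame has been displayed, so the
    // measurement includes capture, detection, landmarks and display.
    void tick() { ++frames_; }

    long frames() const { return frames_; }

    // Average fps since construction; 0 if no measurable time has elapsed.
    double fps() const
    {
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start_;
        return elapsed.count() > 0.0 ? frames_ / elapsed.count() : 0.0;
    }

private:
    std::chrono::steady_clock::time_point start_;
    long frames_;
};
```

Create the counter just before the capture loop, call tick() at the end of each iteration, and print fps() every hundred frames or so. If the result stays near your webcam's frame rate, the camera rather than the code is the limit.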

Subscribe & Download Code

If you liked this article and would like to have access to my version of Dlib and all code / images used in this blog, please subscribe to our newsletter. You will also receive a free Computer Vision Resource guide. In our newsletter we share OpenCV tutorials and examples written in C++/Python, and Computer Vision and Machine Learning algorithms and news.



About Satya Mallick

I am an entrepreneur with a love for Computer Vision and Machine Learning with a dozen years of experience (and a Ph.D.) in the field. In 2007, right after finishing my Ph.D., I co-founded TAAZ Inc. with my advisor Dr. David Kriegman and Kevin Barnes. The scalability, and robustness of our computer vision and machine learning algorithms have been put to rigorous test by more than 100M users who have tried our products.

  • Ahlem AH

    thank you Mr SATYA MALLICK :like:

  • Salah Eddine Bekhouche

    You can use the opencv face detector (LBP cascade) to boost the algorithm more

  • Logic

    Another very effective way to speed things up is to decrease the pyramid_down size in “frontal_face_detector.h”:

    typedef object_detector<scan_fhog_pyramid<pyramid_down<6> > > frontal_face_detector;

    The number 6 in the code above is a little large, resulting in slow face detection, which costs most of the time, although it can detect smaller faces.
    However, if you do not need to detect such small faces, you can set the number smaller; the minimum is 2, and the speed can be increased 2-3 times.

    • Thanks. I had missed that one.

    • Changing the pyramid_down argument, I didn’t see any significant change in speed. Should I only change it in the “frontal_face_detector.h” or there are some other places as well?

  • random002 wang

    dlib is very perfect job!

  • Miguel Bordallo

    Detaching webcam reading, face detection and display, using separate threads for each might also speed up the system even further. You can use one thread for continuously updating the landmarks and displaying them (main thread) while in the background, the other threads are capturing images and recalculating face detection.

  • Virut Sontisuk

    How to landmark whole face?
    Thank you.

  • 林季伯

    Hello Mr Satya Mallick,
    I’m trying to find efficient and correct way to capture mouth pose.
    And your face landmark to capture facial detail is super and realtime!!
    I have some question that is there any existed dataset for only track mouth movement in real time?
    (If the camera is always set in front of mouth)
    Or should I have to train my own dataset?
    (If giving some document link that would be very helpful!!)

    Thank you!!

  • Luc Michalski

    Maybe processing a cropped version (ROI) from the camera matrix would be good too. It does not need to go through the 10%/20% around the border of the matrix. You just need to dynamically recalculate the ROI and the coordinates of the points.

    • thinh

      I do not actually get your idea. Could you elaborate it?

  • anhhoangta

    Hi, thanks for your great work!

    How can I get coordinates of 68 points after using facial landmark detector? I need to know this for detecting mouth closing or opening, detecting smile… Is it possible if I have the coordinates?

    Thanks in advance!

    • Abhishek Batra

      Did you find the solution ? I am also trying to achieve the same thing

    • anhhoangta

      I tried in 2 ways: make a classifier by SVM and use facial landmark detector. Classifier is quite good especially if you want to build application in mobile. I’m reading this paper for using facial landmark detector: http://vision.fe.uni-lj.si/cvww2016/proceedings/papers/05.pdf

  • abdelaziz elh

    thanks for this demo but i have a qustion what about exporting the data or file .FBX or …
    for face animation in 3D Max

  • Infa tum

    when i’m trying to compile this examples i get this error from compiler:
    [ 99%] Building CXX object CMakeFiles/webcam_face_pose_ex.dir/webcam_face_pose_ex.cpp.o
    /home/infatum/Projects/dlib/examples/webcam_face_pose_ex.cpp: In function ‘int main()’:
    /home/infatum/Projects/dlib/examples/webcam_face_pose_ex.cpp:75:13: error: ‘resize’ is not a member of ‘cv’
    cv::resize(im, im_small, cv::Size(), 1.0/FACE_DOWNSAMPLE_RATIO, 1.0/FACE_DOWNSAMPLE_RATIO);
    make[2]: *** [CMakeFiles/webcam_face_pose_ex.dir/webcam_face_pose_ex.cpp.o] Error 1
    make[1]: *** [CMakeFiles/webcam_face_pose_ex.dir/all] Error 2
    make: *** [all] Error 2
    What am i doing wrong? why id doesn’t recognize resize function?

    • Sam Zheng


  • Sergey Chilingaryan

    I cannot download the code 🙁

  • abhishek gupta

    https://uploads.disquscdn.com/images/07df99fefb54867e0c4f139777c53d5da85421ca7864b4638519702d670682e2.png https://uploads.disquscdn.com/images/7990050bb232c246f4ce5ca86eb969bf5ba712f6b28ad07e961db74a3822e81f.png

    HI Mr. Mallick

    It may be a very basic question for you. I have been working on different computer vision techniques since two years but using MATLAB only. Now I was trying to follow your blog for facial landmark detection using dlib in codeblock. But I am not able to compile the library. Here are steps I followed:

    1. I downloaded and unzipped the dlib
    2. use an empty console in codeblock and pasted the example code of facial landmark detection
    3. in compiler settings, I added the library as shown in figure 1 and 2 attached.
    4. When I build it, it shows the error that don’ put the dlib folder in the include search path. I don’t get the solution of it. What is include search path.

  • Anonymous

    Hello sir, i downloaded your code and run the webcam face pose fast program. But am still not getting 30fps. It is still in range of 15-17 fps only. Am a missing something? Please help.
    Also, i want to use dlib only for the purpose of eye corners detection. Is there anyway i can modify code and make it more efficient??

    • Are you sure your webcam is fast enough? Check its specification.

      Can you do profiling of your code to see where exactly most of the time is being spent ?

  • sunil shenoy

    Hello Satya Mallick. I have been following your tutorial for both OpenCV and Dlib libraries. It is really very helpful for freshers like me to work on computer vision. My aim was to detect and extract face landmarks in mobile device camera preview at real time in both Android and iOS platforms. I am happy to that it works in both the platforms. Compared Android, iOS provides much more better performance due to few compiler optimization and flexible support of iOS platform. Even i have tried real time pose model estimation in iOS it works well. I am trying to improve the performance in Android. Dlib Detection is taking much time(I have tried to resize and skip the frame) among all other processes. I have implemented resizing frame technique it worked pretty well. But skipping frames will cause for flicking problem. I tried to reduce the resolution of camera frame though it improves the performance detection distance will be less. i want to work with at least 720p resolution. I need guidance from you. Please do suggest me few more optimization techniques in mobile application development area. Thank you so much.

    • have you tried using Android’s face detector instead of dlib ?

      • sunil shenoy

        Thanks for the reply Satya Mallick.. No I haven’t tried that one.. I have tried opencv face detection and converted resulted rectangle to dlib::rectangle and used it for pose model implementation..

        • Most of the processing time is consumed by the face detector and the most effective way to speed up things on mobile is to use the face detector provided by Android / iOS. Face detection can take 30 ms or more and after that landmark detection takes just 1 ms.

          • sunil shenoy

            That’s is really good idea.. I’ll try to implement that.. Thank you for the guidance..

  • Sam Zheng

    I have subsricbed for long time ago but now I cant get the code?

  • 楊芷瑀

    Thanks a lot for the article but I have some problem:
    if I use VS2012, OpenCV2.4.11 and dlib with facial landmarks C++ code, and I use combination of OpenCV face detection and dlib shape prediction, it’s worked, but the speed is very slow ! Is it the VS or OpenCV version too old so the execution time is high? Thank you !