Deep Learning based Object Detection using YOLOv3 with OpenCV ( Python / C++ )

Sunita Nayak

August 20, 2018 · 81 Comments

In this post, we will learn how to use YOLOv3 — a state-of-the-art object detector — with OpenCV.

YOLOv3 is the latest variant of YOLO (You Only Look Once), a popular object detection algorithm. The published model recognizes 80 different objects in images and videos. Most importantly, it is super fast and nearly as accurate as the Single Shot MultiBox Detector (SSD).

Starting with OpenCV 3.4.2, you can easily use YOLOv3 models in your own OpenCV application.
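
A quick way to verify that your OpenCV build is new enough is to print its version (a minimal check, assuming the cv2 Python bindings are installed):

import cv2 as cv
# YOLOv3 support in the DNN module requires OpenCV 3.4.2 or later.
print(cv.__version__)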

This post mainly focuses on inference, but if you want to train your own YOLOv3 model on your dataset, you will find a tutorial in this follow-up post.

How does YOLO work ?

We can think of an object detector as a combination of an object locator and an object recognizer.

In traditional computer vision approaches, a sliding window was used to look for objects at different locations and scales. Because this was such an expensive operation, the aspect ratio of the object was usually assumed to be fixed.

Early Deep Learning based object detection algorithms like the R-CNN and Fast R-CNN used a method called Selective Search to narrow down the number of bounding boxes that the algorithm had to test.

Another approach, called OverFeat, involved scanning the image at multiple scales using a sliding-window-like mechanism implemented convolutionally.

This was followed by Faster R-CNN, which used a Region Proposal Network (RPN) to identify bounding boxes that needed to be tested. By clever design, the features extracted for recognizing objects were also used by the RPN to propose potential bounding boxes, thus saving a lot of computation.

YOLO, on the other hand, approaches the object detection problem in a completely different way: it forwards the whole image through the network only once. SSD is another object detection algorithm that forwards the image once through a deep learning network, but YOLOv3 is much faster than SSD while achieving comparable accuracy. YOLOv3 gives faster-than-realtime results on an M40, TitanX or 1080 Ti GPU.

Let's see how YOLO detects objects in a given image.

First, it divides the image into a 13×13 grid of cells. The size of these 169 cells varies depending on the size of the input. For the 416×416 input size we used in our experiments, the cell size was 32×32. Each cell is then responsible for predicting a number of boxes in the image.

For each bounding box, the network also predicts the confidence that the bounding box actually encloses an object, and the probability of the enclosed object being a particular class.

Most of these bounding boxes are eliminated because their confidence is low or because they enclose the same object as another bounding box with a higher confidence score. This technique is called Non-Maximum Suppression (NMS).

The authors of YOLOv3, Joseph Redmon and Ali Farhadi, have made YOLOv3 faster and more accurate than their previous work YOLOv2. YOLOv3 handles multiple scales better. They have also improved the network by making it bigger and taking it towards residual networks by adding shortcut connections.
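
As a back-of-the-envelope illustration of this multi-scale design (not part of the tutorial code), here is how the number of candidate boxes works out for a 416×416 input, using YOLOv3's published strides and three boxes per grid cell:

# YOLOv3 predicts at three scales; the 13x13 grid discussed above is the coarsest.
inp = 416
strides = [32, 16, 8]  # grids of 13x13, 26x26 and 52x52 cells
boxes_per_cell = 3
total = sum((inp // s) ** 2 * boxes_per_cell for s in strides)
print(total)  # 10647 candidate boxes before thresholding and NMS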

Why use OpenCV for YOLO ?

Here are a few reasons why you may want to use OpenCV for YOLO:

  1. Easy integration with an OpenCV application: If your application already uses OpenCV and you simply want to use YOLOv3, you don’t have to worry about compiling and building the extra Darknet code.
  2. OpenCV CPU version is 9x faster: OpenCV’s CPU implementation of the DNN module is astonishingly fast. For example, Darknet when used with OpenMP takes about 2 seconds on a CPU for inference on a single image. In contrast, OpenCV’s implementation runs in a mere 0.22 seconds! Check out the table below.
  3. Python support: Darknet is written in C, and it does not officially support Python. In contrast, OpenCV does. There are Python ports available for Darknet, though.

Speed Test for YOLOv3 on Darknet and OpenCV

The following table shows the performance of YOLOv3 on Darknet vs. OpenCV. The input size in all cases is 416×416. It is not surprising that the GPU version of Darknet outperforms everything else. It is also not surprising that Darknet with OpenMP works much better than Darknet without OpenMP, because OpenMP enables the use of multiple processors.

What is indeed surprising is that OpenCV’s CPU implementation of DNN is 9x faster than Darknet with OpenMP.

OS          | Framework        | CPU/GPU                               | Time (ms)/Frame
------------|------------------|---------------------------------------|----------------
Linux 16.04 | Darknet          | 12x Intel Core i7-6850K CPU @ 3.60GHz | 9370
Linux 16.04 | Darknet + OpenMP | 12x Intel Core i7-6850K CPU @ 3.60GHz | 1942
Linux 16.04 | OpenCV [CPU]     | 12x Intel Core i7-6850K CPU @ 3.60GHz | 220
Linux 16.04 | Darknet          | NVIDIA GeForce 1080 Ti GPU            | 23
macOS       | Darknet          | 2.5 GHz Intel Core i7 CPU             | 7260
macOS       | OpenCV [CPU]     | 2.5 GHz Intel Core i7 CPU             | 400

Table 1: Speed Test of YOLOv3 on Darknet vs OpenCV

Note: We ran into problems using OpenCV’s GPU implementation of the DNN module. The documentation indicates that it is tested only with Intel’s GPUs, so the code switches back to the CPU if you do not have an Intel GPU.

Object Detection using YOLOv3 in C++/Python

Let us now see how to use YOLOv3 in OpenCV to perform object detection.

Download Code
To easily follow along with this tutorial, please download the code. It’s FREE!


Step 1 : Download the models

We will start by downloading the models using the script file getModels.sh from the command line:

sudo chmod a+x getModels.sh
./getModels.sh

This will download the yolov3.weights file (containing the pre-trained network’s weights), the yolov3.cfg file (containing the network configuration) and the coco.names file, which contains the 80 different class names used in the COCO dataset.
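
If you prefer to fetch the files manually, the script essentially boils down to the following (a sketch; the exact script in the download may differ):

# Roughly what getModels.sh does: fetch the weights, config and class names.
wget https://pjreddie.com/media/files/yolov3.weights
wget https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov3.cfg
wget https://raw.githubusercontent.com/pjreddie/darknet/master/data/coco.names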

Step 2 : Initialize the parameters

The YOLOv3 algorithm generates bounding boxes as the predicted detection outputs. Every predicted box is associated with a confidence score. In the first stage, all the boxes below the confidence threshold parameter are ignored for further processing.

The rest of the boxes undergo non-maximum suppression which removes redundant overlapping bounding boxes. Non-maximum suppression is controlled by a parameter nmsThreshold. You can try to change these values and see how the number of output predicted boxes changes.

Next, the default values for the input width (inpWidth) and height (inpHeight) of the network’s input image are set. We set each of them to 416, so that we can compare our runs to Darknet’s C code provided by YOLOv3’s authors. You can also change both of them to 320 to get faster results or to 608 to get more accurate results.

Python
# Initialize the parameters
confThreshold = 0.5  #Confidence threshold
nmsThreshold = 0.4   #Non-maximum suppression threshold
inpWidth = 416       #Width of network's input image
inpHeight = 416      #Height of network's input image
C++
// Initialize the parameters
float confThreshold = 0.5; // Confidence threshold
float nmsThreshold = 0.4;  // Non-maximum suppression threshold
int inpWidth = 416;        // Width of network's input image
int inpHeight = 416;       // Height of network's input image

Step 3 : Load the model and classes

The file coco.names contains the names of all the objects the model was trained on. We read the class names first.

Next, we load the network, which has two parts:

  1. yolov3.weights : The pre-trained weights.
  2. yolov3.cfg : The configuration file.

We set the DNN backend to OpenCV here and the target to CPU. You could try setting the preferable target to cv.dnn.DNN_TARGET_OPENCL to run it on a GPU. But keep in mind that the current OpenCV version is tested only with Intel’s GPUs; it automatically switches to the CPU if you do not have an Intel GPU.

Python
# Load names of classes
classesFile = "coco.names"
classes = None
with open(classesFile, 'rt') as f:
    classes = f.read().rstrip('\n').split('\n')

# Give the configuration and weight files for the model and load the network using them.
modelConfiguration = "yolov3.cfg"
modelWeights = "yolov3.weights"

net = cv.dnn.readNetFromDarknet(modelConfiguration, modelWeights)
net.setPreferableBackend(cv.dnn.DNN_BACKEND_OPENCV)
net.setPreferableTarget(cv.dnn.DNN_TARGET_CPU)
C++
    // Load names of classes
    string classesFile = "coco.names";
    ifstream ifs(classesFile.c_str());
    string line;
    while (getline(ifs, line)) classes.push_back(line);
    
    // Give the configuration and weight files for the model
    String modelConfiguration = "yolov3.cfg";
    String modelWeights = "yolov3.weights";

    // Load the network
    Net net = readNetFromDarknet(modelConfiguration, modelWeights);
    net.setPreferableBackend(DNN_BACKEND_OPENCV);
    net.setPreferableTarget(DNN_TARGET_CPU);

Step 4 : Read the input

In this step, we read the image, video stream or webcam. In addition, we open the video writer to save the frames with the detected output bounding boxes.

Python
outputFile = "yolo_out_py.avi"
if (args.image):
    # Open the image file
    if not os.path.isfile(args.image):
        print("Input image file ", args.image, " doesn't exist")
        sys.exit(1)
    cap = cv.VideoCapture(args.image)
    outputFile = args.image[:-4]+'_yolo_out_py.jpg'
elif (args.video):
    # Open the video file
    if not os.path.isfile(args.video):
        print("Input video file ", args.video, " doesn't exist")
        sys.exit(1)
    cap = cv.VideoCapture(args.video)
    outputFile = args.video[:-4]+'_yolo_out_py.avi'
else:
    # Webcam input
    cap = cv.VideoCapture(0)

# Get the video writer initialized to save the output video
if (not args.image):
    vid_writer = cv.VideoWriter(outputFile, cv.VideoWriter_fourcc('M','J','P','G'), 30, (round(cap.get(cv.CAP_PROP_FRAME_WIDTH)),round(cap.get(cv.CAP_PROP_FRAME_HEIGHT))))

C++
        outputFile = "yolo_out_cpp.avi";
        if (parser.has("image"))
        {
            // Open the image file
            str = parser.get<String>("image");
            ifstream ifile(str);
            if (!ifile) throw("error");
            cap.open(str);
            str.replace(str.end()-4, str.end(), "_yolo_out.jpg");
            outputFile = str;
        }
        else if (parser.has("video"))
        {
            // Open the video file
            str = parser.get<String>("video");
            ifstream ifile(str);
            if (!ifile) throw("error");
            cap.open(str);
            str.replace(str.end()-4, str.end(), "_yolo_out.avi");
            outputFile = str;
        }
        // Open the webcam
        else cap.open(parser.get<int>("device"));

        // Get the video writer initialized to save the output video
        if (!parser.has("image")) {
           video.open(outputFile, VideoWriter::fourcc('M','J','P','G'), 28, Size(cap.get(CAP_PROP_FRAME_WIDTH), cap.get(CAP_PROP_FRAME_HEIGHT)));
        }
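
For reference, assuming the argument parsing in the downloaded scripts, typical invocations look like this (the file names are just examples):

python3 object_detection_yolo.py --image=bird.jpg
python3 object_detection_yolo.py --video=run.mp4

# C++, after compiling the code to object_detection_yolo.out:
./object_detection_yolo.out --image=bird.jpg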

Step 5 : Process each frame

The input image to a neural network needs to be in a certain format called a blob.

After a frame is read from the input image or video stream, it is passed through the blobFromImage function to convert it to an input blob for the neural network. In this process, it scales the image pixel values to a target range of 0 to 1 using a scale factor of 1/255. It also resizes the image to the given size of (416, 416) without cropping. Note that we do not perform any mean subtraction here; hence we pass [0,0,0] as the mean parameter of the function and keep the swapRB parameter at its default value of 1.

The blob is then passed into the network as its input, and a forward pass is run to get a list of predicted bounding boxes as the network’s output. These boxes go through a post-processing step to filter out the ones with low confidence scores. We will go through the post-processing step in more detail in the next section. We print the inference time for each frame at the top left. The image with the final bounding boxes is then saved to disk, either as an image for an image input or using the video writer for a video stream input.

Python
while cv.waitKey(1) < 0:
    
    # get frame from the video
    hasFrame, frame = cap.read()
    
    # Stop the program if reached end of video
    if not hasFrame:
        print("Done processing !!!")
        print("Output file is stored as ", outputFile)
        cv.waitKey(3000)
        break

    # Create a 4D blob from a frame.
    blob = cv.dnn.blobFromImage(frame, 1.0/255, (inpWidth, inpHeight), [0,0,0], 1, crop=False)

    # Sets the input to the network
    net.setInput(blob)

    # Runs the forward pass to get output of the output layers
    outs = net.forward(getOutputsNames(net))

    # Remove the bounding boxes with low confidence
    postprocess(frame, outs)

    # Put efficiency information. The function getPerfProfile returns the
    # overall time for inference (t) and the timings for each of the layers (in layersTimes)
    t, _ = net.getPerfProfile()
    label = 'Inference time: %.2f ms' % (t * 1000.0 / cv.getTickFrequency())
    cv.putText(frame, label, (0, 15), cv.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255))

    # Write the frame with the detection boxes
    if (args.image):
        cv.imwrite(outputFile, frame.astype(np.uint8))
    else:
        vid_writer.write(frame.astype(np.uint8))

C++
    // Process frames.
    while (waitKey(1) < 0)
    {
        // get frame from the video
        cap >> frame;

        // Stop the program if reached end of video
        if (frame.empty()) {
            cout << "Done processing !!!" << endl;
            cout << "Output file is stored as " << outputFile << endl;
            waitKey(3000);
            break;
        }
        // Create a 4D blob from a frame.
        blobFromImage(frame, blob, 1/255.0, Size(inpWidth, inpHeight), Scalar(0,0,0), true, false);
        
        //Sets the input to the network
        net.setInput(blob);
        
        // Runs the forward pass to get output of the output layers
        vector<Mat> outs;
        net.forward(outs, getOutputsNames(net));
        
        // Remove the bounding boxes with low confidence
        postprocess(frame, outs);
        
        // Put efficiency information. The function getPerfProfile returns the
        // overall time for inference (t) and the timings for each of the layers (in layersTimes)
        vector<double> layersTimes;
        double freq = getTickFrequency() / 1000;
        double t = net.getPerfProfile(layersTimes) / freq;
        string label = format("Inference time for a frame : %.2f ms", t);
        putText(frame, label, Point(0, 15), FONT_HERSHEY_SIMPLEX, 0.5, Scalar(0, 0, 255));
        
        // Write the frame with the detection boxes
        Mat detectedFrame;
        frame.convertTo(detectedFrame, CV_8U);
        if (parser.has("image")) imwrite(outputFile, detectedFrame);
        else video.write(detectedFrame);
        
    }

Now let’s go into the details of some of the function calls used above.

Step 5a : Getting the names of output layers

The forward function in OpenCV’s Net class needs the ending layer up to which it should run in the network. Since we want to run through the whole network, we need to identify its last layer. We do that using the function getUnconnectedOutLayers(), which gives the indices of the unconnected output layers, which are essentially the last layers of the network. Then we run the forward pass of the network to get output from those layers, as in the previous code snippet (net.forward(getOutputsNames(net))).

Python
# Get the names of the output layers
def getOutputsNames(net):
    # Get the names of all the layers in the network
    layersNames = net.getLayerNames()
    # Get the names of the output layers, i.e. the layers with unconnected outputs
    return [layersNames[i[0] - 1] for i in net.getUnconnectedOutLayers()]
C++
// Get the names of the output layers
vector<String> getOutputsNames(const Net& net)
{
    static vector<String> names;
    if (names.empty())
    {
        //Get the indices of the output layers, i.e. the layers with unconnected outputs
        vector<int> outLayers = net.getUnconnectedOutLayers();
        
        //get the names of all the layers in the network
        vector<String> layersNames = net.getLayerNames();
        
        // Get the names of the output layers in names
        names.resize(outLayers.size());
        for (size_t i = 0; i < outLayers.size(); ++i)
        names[i] = layersNames[outLayers[i] - 1];
    }
    return names;
}

Step 5b : Post-processing the network’s output

Each bounding box output by the network is represented by a vector of (number of classes + 5) elements.

The first 4 elements represent the center_x, center_y, width and height. The fifth element represents the confidence that the bounding box encloses an object.

The rest of the elements are the confidence associated with each class (i.e. object type). The box is assigned to the class corresponding to the highest score for the box.

The highest score for a box is also called its confidence. If the confidence of a box is less than the given threshold, the bounding box is dropped and not considered for further processing.

The boxes with their confidence equal to or greater than the confidence threshold are then subjected to Non-Maximum Suppression. This reduces the number of overlapping boxes.

Python
# Remove the bounding boxes with low confidence using non-maxima suppression
def postprocess(frame, outs):
    frameHeight = frame.shape[0]
    frameWidth = frame.shape[1]

    # Scan through all the bounding boxes output from the network and keep only the
    # ones with high confidence scores. Assign the box's class label as the class with the highest score.
    classIds = []
    confidences = []
    boxes = []
    for out in outs:
        for detection in out:
            scores = detection[5:]
            classId = np.argmax(scores)
            confidence = scores[classId]
            if confidence > confThreshold:
                center_x = int(detection[0] * frameWidth)
                center_y = int(detection[1] * frameHeight)
                width = int(detection[2] * frameWidth)
                height = int(detection[3] * frameHeight)
                left = int(center_x - width / 2)
                top = int(center_y - height / 2)
                classIds.append(classId)
                confidences.append(float(confidence))
                boxes.append([left, top, width, height])

    # Perform non maximum suppression to eliminate redundant overlapping boxes with
    # lower confidences.
    indices = cv.dnn.NMSBoxes(boxes, confidences, confThreshold, nmsThreshold)
    for i in indices:
        i = i[0]
        box = boxes[i]
        left = box[0]
        top = box[1]
        width = box[2]
        height = box[3]
        drawPred(classIds[i], confidences[i], left, top, left + width, top + height)
C++
// Remove the bounding boxes with low confidence using non-maxima suppression
void postprocess(Mat& frame, const vector<Mat>& outs)
{
    vector<int> classIds;
    vector<float> confidences;
    vector<Rect> boxes;
    
    for (size_t i = 0; i < outs.size(); ++i)
    {
        // Scan through all the bounding boxes output from the network and keep only the
        // ones with high confidence scores. Assign the box's class label as the class
        // with the highest score for the box.
        float* data = (float*)outs[i].data;
        for (int j = 0; j < outs[i].rows; ++j, data += outs[i].cols)
        {
            Mat scores = outs[i].row(j).colRange(5, outs[i].cols);
            Point classIdPoint;
            double confidence;
            // Get the value and location of the maximum score
            minMaxLoc(scores, 0, &confidence, 0, &classIdPoint);
            if (confidence > confThreshold)
            {
                int centerX = (int)(data[0] * frame.cols);
                int centerY = (int)(data[1] * frame.rows);
                int width = (int)(data[2] * frame.cols);
                int height = (int)(data[3] * frame.rows);
                int left = centerX - width / 2;
                int top = centerY - height / 2;
                
                classIds.push_back(classIdPoint.x);
                confidences.push_back((float)confidence);
                boxes.push_back(Rect(left, top, width, height));
            }
        }
    }
    
    // Perform non maximum suppression to eliminate redundant overlapping boxes with
    // lower confidences
    vector<int> indices;
    NMSBoxes(boxes, confidences, confThreshold, nmsThreshold, indices);
    for (size_t i = 0; i < indices.size(); ++i)
    {
        int idx = indices[i];
        Rect box = boxes[idx];
        drawPred(classIds[idx], confidences[idx], box.x, box.y,
                 box.x + box.width, box.y + box.height, frame);
    }
}

The Non-Maximum Suppression is controlled by the nmsThreshold parameter. If nmsThreshold is set too low, e.g. 0.1, we might not detect overlapping objects of the same or different classes. But if it is set too high, e.g. 1, then we get multiple boxes for the same object. So we used an intermediate value of 0.4 in our code above. The GIF below shows the effect of varying the NMS threshold.

[GIF: Effect of varying the non-maximum suppression threshold on the detected boxes]
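
To get a feel for the parameter, here is a tiny, self-contained experiment with cv.dnn.NMSBoxes (illustrative box values, not part of the tutorial code):

import cv2 as cv

# Two heavily overlapping boxes [left, top, width, height] for the same object;
# their IoU is about 0.68.
boxes = [[100, 100, 50, 50], [105, 105, 50, 50]]
confidences = [0.9, 0.8]

# With nmsThreshold = 0.4, the lower-scoring duplicate is suppressed.
print(cv.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4))  # keeps only box 0
# With a very permissive nmsThreshold like 0.9, both boxes survive.
print(cv.dnn.NMSBoxes(boxes, confidences, 0.5, 0.9))  # keeps both boxes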

Step 5c : Draw the predicted boxes

Finally, on the input frame, we draw the boxes that survived non-maximum suppression, along with their assigned class labels and confidence scores.

Python
# Draw the predicted bounding box
def drawPred(classId, conf, left, top, right, bottom):
    # Draw a bounding box.
    cv.rectangle(frame, (left, top), (right, bottom), (0, 0, 255))
    
    label = '%.2f' % conf
        
    # Get the label for the class name and its confidence
    if classes:
        assert(classId < len(classes))
        label = '%s:%s' % (classes[classId], label)

    #Display the label at the top of the bounding box
    labelSize, baseLine = cv.getTextSize(label, cv.FONT_HERSHEY_SIMPLEX, 0.5, 1)
    top = max(top, labelSize[1])
    cv.putText(frame, label, (left, top), cv.FONT_HERSHEY_SIMPLEX, 0.5, (255,255,255))
C++
// Draw the predicted bounding box
void drawPred(int classId, float conf, int left, int top, int right, int bottom, Mat& frame)
{
    //Draw a rectangle displaying the bounding box
    rectangle(frame, Point(left, top), Point(right, bottom), Scalar(0, 0, 255));
    
    //Get the label for the class name and its confidence
    string label = format("%.2f", conf);
    if (!classes.empty())
    {
        CV_Assert(classId < (int)classes.size());
        label = classes[classId] + ":" + label;
    }
    
    //Display the label at the top of the bounding box
    int baseLine;
    Size labelSize = getTextSize(label, FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);
    top = max(top, labelSize.height);
    putText(frame, label, Point(left, top), FONT_HERSHEY_SIMPLEX, 0.5, Scalar(255,255,255));
}

Subscribe & Download Code

If you liked this article and would like to download code (C++ and Python) and example images used in this post, please subscribe to our newsletter. You will also receive a free Computer Vision Resource Guide. In our newsletter, we share OpenCV tutorials and examples written in C++/Python, and Computer Vision and Machine Learning algorithms and news.

Subscribe Now

References:

YOLOv3 Tech Report

We used video clips from the following sources:
Pixabay: [1], [2], [3], [4], [5], [6]
Pexels: [2]


Comments

  1. Muhammad Usman says

    August 20, 2018 at 7:44 am

Hi Satya Mallick,
    Your blog always comes with new ideas. Great work. Keep it up.

    Reply
    • Satya Mallick says

      August 21, 2018 at 10:39 am

      Muhammad, thanks. It’s not just me, now we are a small team :). For example, this post was written by Dr. Sunita Nayak.

      Reply
      • Muhammad Usman says

        August 21, 2018 at 5:20 pm

        Your Team work is highly appreciable.

        Reply
  2. Shiva Shankar N says

    August 20, 2018 at 9:05 am

    Satya, thanks a lot for the post. How do you calculate mean average precision (mAP) on the detected objects during both training and testing? I’m using the Keras implementation of YOLOv3.

    Reply
  3. Kevin SD says

    August 20, 2018 at 9:16 am

    Hi Satya, thanks a lot for the post. How do you calculate mean average precision (mAP) in YOLOv3, or any object detector, for both training and testing data? I’m using the Keras implementation of YOLOv3.

    Reply
    • Vikas Gupta says

      August 21, 2018 at 3:37 am

      You can use the COCO API. For theory, you can look into the following links:

      medium
      COCO dataset page

      Reply
      • Kevin SD says

        August 22, 2018 at 6:29 am

        Thanks Vikas for sharing this link. Can I use this API to calculate mAP for my own custom dataset?

        Reply
  4. Fábio Uechi says

    August 20, 2018 at 10:35 am

    Hi Sunita,

    Do you know if the OpenCV dnn module is also compatible with the latest YOLOv3-spp (spatial pyramid pooling) configuration (https://github.com/pjreddie/darknet/blob/master/cfg/yolov3-spp.cfg) and weights (https://pjreddie.com/media/files/yolov3-spp.weights)?

    And, of course, thanks a lot for the post!

    Reply
    • Sunita Nayak says

      August 21, 2018 at 4:00 pm

      Hi Fabio,

      Yes, the above code runs fine with the yolov3-spp.cfg and yolov3-spp.weights files too. Thanks for reading!

      Reply
  5. KIM JIN-SOL says

    August 20, 2018 at 4:38 pm

    Hello Satya, thanks for sharing this good information. I want to compare the Darknet version and the OpenCV version.

    Reply
    • Satya Mallick says

      August 21, 2018 at 10:38 am

      Thanks, Kim. We have shared the OpenCV version. For darknet, you can directly get it from https://pjreddie.com/darknet/

      Reply
  6. 朱铭德 says

    August 20, 2018 at 6:54 pm

    Hi Satya, what version of OpenCV did you use in this blog?

    Reply
  7. Ashish Kejriwal says

    August 21, 2018 at 9:36 pm

    I am getting the following error

    OpenCV(3.4.1) Error: Parsing error (Unknown layer type: shortcut) in ReadDarknetFromCfgFile, file opencv/modules/dnn/src/darknet/darknet_io.cpp, line 503
    Traceback (most recent call last):
    File “object_detection_yolo.py”, line 33, in
    net = cv.dnn.readNetFromDarknet(modelConfiguration, modelWeights)
    cv2.error: OpenCV(3.4.1) opencv/modules/dnn/src/darknet/darknet_io.cpp:503: error: (-212) Unknown layer type: shortcut in function ReadDarknetFromCfgFile

    Reply
    • Ashish Kejriwal says

      August 21, 2018 at 10:18 pm

      My mistake, I didn’t check the OpenCV version requirement (3.4.2).

      Reply
      • Satya Mallick says

        August 22, 2018 at 7:56 am

        Cool. Did you get it working?

        Reply
        • Ashish Kejriwal says

          August 22, 2018 at 7:43 pm

          Yup, it’s working after upgrading to 3.4.2. Thanks for the post, Sunita & Satya.

          Reply
        • ujang says

          September 12, 2018 at 8:08 pm

          Hello Satya. Thanks for a very good post.

          Btw, I have made a project using VS 2017 and used this code. After running the code, I got this result: http://prntscr.com/kts9x0. Do you have any suggestion why it is not showing anything?

          I used OpenCV 3.4.3 and yolov3.cfg, yolov3.weights, coco.names, and my own image.

          I have edited the width and height parameters in yolov3.cfg according to the size of my image.

          Reply
    • Edgar Policarpio says

      August 23, 2018 at 8:26 am

      My OpenCV version is 3.4.2 and yet, I still get the error. Mine reads
      OpenCV Error: Parsing error (Unknown layer type: shortcut) in ReadDarknetFromCfgFile, file /home/epolicar/Applications/opencv/modules/dnn/src/darknet/darknet_io.cpp, line 503
      Traceback (most recent call last):
      File “object_detection_yolo.py”, line 33, in
      net = cv.dnn.readNetFromDarknet(modelConfiguration, modelWeights)
      cv2.error: /home/epolicar/Applications/opencv/modules/dnn/src/darknet/darknet_io.cpp:503: error: (-212) Unknown layer type: shortcut in function ReadDarknetFromCfgFile

      Any ideas?

      Reply
  8. tham says

    August 21, 2018 at 10:16 pm

    Could I use a GPU to run YOLOv3 with OpenCV?

    Reply
    • Sunita Nayak says

      August 22, 2018 at 8:40 am

      You could target it to OPENCL and try that. But OpenCV mentions that they have tested it only on Intel GPUs. So if you don’t have an Intel GPU, they would revert the run back to CPU.

      Reply
      • tham says

        August 23, 2018 at 11:11 am

        Thanks for the confirmation. I found it hard to believe OpenCV DNN still does not support Nvidia GPUs. It is nice to know YOLOv3 can run with the OpenCV DNN module, but it is almost useless when you need to process multiple streams; this solution eats too much CPU resource. GPUs should be the solution for deep learning, so I am not sure why the OpenCV community does not support Nvidia or AMD first (especially Nvidia). The only reason I can think of is a business issue.

        Reply
        • Utku Tekalmaz says

          September 12, 2018 at 5:53 am

          You can use a wrapper like this:
          https://github.com/TommyX12/darknet-cpp-wrapper

          Reply
      • tham says

        September 27, 2018 at 6:12 pm

        I found out MXNet supports a C++ API (I have built it and run YOLOv3 on a GPU). For those who want to develop a stand-alone application with a non-commercial library, maybe MXNet can ease your pain. OpenCV DNN has an easy-to-use API and aggressive optimization on CPU, but its GPU support is bad; dlib has a decent API and supports GPUs, but there are too few pre-trained models and built-in layers; MXNet has decent support on CPU and Nvidia GPU, but its DLLs are huge. There are no perfect choices; we have to choose the one that suits our needs most.

        Reply
  9. Rishabh Shukla says

    August 22, 2018 at 4:32 am

    When I tried running the code on my local machine with the YOLOv3 cfg and weights, I got the following error:
    Traceback (most recent call last):
    File “yolov3.py”, line 56, in
    net = cv.dnn.readNet(args.model, args.config, args.framework)
    cv2.error: OpenCV(3.4.2) /Users/travis/build/skvark/opencv-python/opencv/modules/dnn/src/darknet/darknet_io.cpp:511: error: (-215:Assertion failed) separator_index < line.size() in function 'ReadDarknetFromCfgFile'

    When I tried running the same code with YOLOv2 weights, I didn't get this error, but I didn't get any predictions either. The output was the same file as the input in all cases.
    Because the code ran without any errors for the YOLOv2 file, I don't think the opencv-contrib folder is the issue, as you suggested in our email conversation.
    I have also read a few other forums where people are facing the same issue without any solution.
    What should I do to remedy this issue?

    Reply
    • Satya Mallick says

      August 22, 2018 at 10:38 am

      Hi Rishabh,

      Did you try the code we had shared? It is named object_detection_yolo.py.

      Reply
      • Rishabh Shukla says

        August 22, 2018 at 7:28 pm

        Yes, I ran the exact same code that you have shared, and I continue to face this error.

        Reply
        • Jian Kang says

          November 5, 2018 at 4:55 pm

          I had exactly the same issue.

          It turns out the way I downloaded the config file was incorrect. I used ‘right-click’ -> ‘save link as’, which resulted in an HTML file. That’s not the correct format. Later I just git-cloned the entire cfg folder and got the file in the correct format from there. It’s not HTML; the first few lines look like this:

          [net]
          # Testing
          # batch=1
          # subdivisions=1
          # Training
          batch=64
          subdivisions=16

          Reply
  10. Haider Ali says

    August 23, 2018 at 12:14 am

    Is there a way to build our own weights to use with YOLOv3? Actually, I want to use this algorithm to detect only terrains (grass, floor, gravel, stairs, mud, etc.).
    Thanks.

    Reply
    • Satya Mallick says

      August 23, 2018 at 12:45 pm

      Yes, you will need to train it. This post may help

      https://medium.com/@manivannan_data/how-to-train-yolov3-to-detect-custom-objects-ccbcafeb13d2

      Reply
      • Haider Ali says

        August 24, 2018 at 9:28 am

        Thanks Sir.

        Reply
  11. Shrutheesh Raman says

    August 23, 2018 at 11:53 am

    Hey Satya and Sunita, amazing content. Is there a way to remove unwanted classes and make it faster for only the two or three relevant classes that I need? If so, can you please guide me on how to go about it?

    Reply
    • Sunita Nayak says

      September 5, 2018 at 11:33 am

      You will find some useful info along those lines in the discussions at https://github.com/pjreddie/darknet/issues/142

      It looks like it will improve the accuracy if you do that but still use the big dataset to retrain.

      Reply
  12. Naz says

    August 24, 2018 at 2:29 pm

    Hey guys, did anyone get this working with Python 2? It works with Python 3, but I get no output running it with Python 2.

    Reply
  13. jimmy says

    August 27, 2018 at 1:02 am

    Dear Miss Sunita,

    First, thank you for this awesome article!
    Secondly, I would like to ask some questions regarding the use of YOLOv3 with ROS.
    I have a first node which publishes an image and a second node which subscribes to and processes the image. The processing is done during the callback when the subscriber receives a message.
    But the processing takes so much time, about 2s per frame, and I think there is a problem somewhere…

    Have you ever experienced this kind of error when using ROS?

    Thank you

    Reply
    • Sunita Nayak says

      September 4, 2018 at 1:48 pm

      We have not tested it on ROS yet, but it would be very interesting. Will update the post when we get a chance to work with ROS.

      Reply
  14. Armstrong Wang says

    August 28, 2018 at 3:01 am

    Hi, Sunita,

    Does it run fast on an Android device? I think CPU-specific instructions should have been used to improve the performance of the CPU version of YOLOv3.

    Thanks a lot for the post.

    Reply
    • Sunita Nayak says

      September 4, 2018 at 1:43 pm

      We have not yet tested it on an Android device. Thanks for reading!

      Reply
  15. Pro ject says

    August 28, 2018 at 3:05 am

    Thanks for this beginner-friendly post.

    How can I use an Nvidia CUDA GPU?

    Reply
    • Satya Mallick says

      September 4, 2018 at 1:33 pm

      With OpenCV, GPU support for DNN is flaky. If you use YOLOv3 directly, you can change the Makefile so it uses the GPU. Check this out:

      https://github.com/pjreddie/darknet/blob/master/Makefile

      Set GPU and CUDNN to 1

      Reply
  16. George Brunemann says

    August 28, 2018 at 10:33 am

    Is it possible to only look for one type of object, say people? Is there a way to speed up execution by limiting to just one type of object?

    Reply
    • Sunita Nayak says

      September 5, 2018 at 11:35 am

      You will find some useful info along those lines in the discussions at https://github.com/pjreddie/darknet/issues/142
      It looks like it will improve the accuracy if you do that but still use the big dataset to retrain.

      Reply
  17. Romain says

    August 28, 2018 at 11:28 am

    Hi Sunita and Satya,
    Thanks for this great solution with OpenCV. I tried your source code and it works great.
    I was wondering how you got such good results with the GPU and Darknet (23 ms)?
    I am using the original code of YOLOv3 (https://pjreddie.com/darknet/yolo/) with a GPU, and the best I get is 177 ms. I use a Tesla P100, which is better than a GTX 1080 Ti. How did you get the result so fast? Did you change parameters in the YOLO configuration?
    Thanks

    Reply
    • Satya Mallick says

      September 4, 2018 at 1:31 pm

      Sorry for the late reply. You will need to change the makefile

      https://github.com/pjreddie/darknet/blob/master/Makefile

      Set GPU and CUDNN to 1. You might as well set OPENMP to 1 (though it should not matter when it is using the GPU).

      Reply
  18. Mark Caruana says

    August 29, 2018 at 2:16 am

    Last time, I could not manage to build OpenCV 4 even though I followed the instructions correctly. Any help would be valuable. I built it on Ubuntu Linux.

    Reply
    • vishwesh shrimali says

      August 31, 2018 at 9:18 pm

      Hi! We are planning to release a post very soon to help out with OpenCV-4 installation. Stay tuned!

      Reply
  19. VIVEK TIWARI says

    August 29, 2018 at 5:15 pm

    Hi,

    Can I train it on my own data?

    Thanks for the post!

    Reply
    • Satya Mallick says

      September 4, 2018 at 1:25 pm

      Yes, you can. Search for training YOLO 3 using your own data. You can then bring that model into an OpenCV application.

      Reply
  20. bag says

    August 30, 2018 at 1:32 am

    Hi Sunita and Satya, many thanks for the post.

    I see you are not using (or passing to any function) the detection confidence that each of the 13×13 output cells assigns to containing a bounding box; that is the value “detection[4]” in line 15 of the function ‘postprocess()’ in the Python code.

    Shouldn’t the first bounding box screening be based on it, instead of the confidences of each of the classes for each detection?

    Thank you 🙂

    Reply
    • Sunita Nayak says

      September 6, 2018 at 11:34 am

      Ideally, detection[4] should be compared instead of scores[classId]. But the output you get from OpenCV’s forward function satisfies detection[4] > confThreshold multiple times even if all the class scores are zero. You can see it for yourself if you insert the following print code before line 19 (if confidence > confThreshold:):
      if detection[4] > confThreshold:
          print(detection[4], " - ", scores[classId], " - th : ", confThreshold)
          print(detection)
      Also, if you look at the class scores, not all of them add up to 1 in the print output either; this is something internal to the function call.

      Regardless, even if you compare the confidence with detection[4], the results would be similar, as those boxes with zero class scores will be eliminated by Non-Maximum Suppression.

      Reply
  21. Ron Yadgar says

    August 30, 2018 at 4:53 am

    Can it also work on a Raspberry Pi?

    Reply
    • Satya Mallick says

      September 4, 2018 at 1:26 pm

      I have not tried it, but my guess is that it will be very slow.

      Reply
  22. Andrew Maule says

    September 4, 2018 at 12:46 pm

    Is there a way to segment the classified object using this approach, other than a ‘bounding box’? If not, what methods do you recommend for this?

    Reply
    • Satya Mallick says

      September 4, 2018 at 1:26 pm

      To segment the objects you should search for 1) Semantic Segmentation 2) Mask R-CNN.

      Reply
      • Andrew Maule says

        September 4, 2018 at 1:40 pm

        Thank you.

        Reply
  23. Sarunas Lapenas says

    September 5, 2018 at 8:42 am

    Could someone explain how to update Anaconda’s OpenCV version? Currently it is 3.3.1. I downloaded the OpenCV 3.4.2 source code from https://github.com/opencv/opencv/releases, but I am not sure how to install it into Anaconda. I’m using Windows.

    Reply
  24. xram says

    September 6, 2018 at 7:49 pm

    Hi, I am getting an error running the source code from GitHub: File "object_detection_yolo.py", line 33, in
    net = cv.dnn.readNetFromDarknet(modelConfiguration, modelWeights)
    cv2.error: OpenCV(3.4.1) C:\bld\opencv_1520732670222\work\opencv-3.4.1\modules\dnn\src\darknet\darknet_io.cpp:503: error: (-212) Unknown layer type: shortcut in function cv::dnn::darknet::ReadDarknetFromCfgFile

    Kindly suggest how to get past this issue?

    Reply
    • Satya Mallick says

      September 6, 2018 at 8:40 pm

      You need to upgrade to OpenCV 3.4.2 with opencv_contrib

      Reply
    • xram says

      September 7, 2018 at 7:30 am

      Thank you for the quick response, appreciate.

      Reply
  25. Utku Tekalmaz says

    September 10, 2018 at 1:06 am

    Hi Sunita, great post! I am both enlightened and enjoyed while reading it 🙂

    I’ve got a question for my project, I just need to detect 2 objects, a box and hand; do you think traning my own model increases the speed? I am now making a color based recognition which is pretty inaccurate and short-termed before i can find a good detection algorithm. For example right now, i ve created a box object model, which i can set, get ID and locations. Each frame, i search for the closest old found object and match them with new ones to pass ID’s of objects for tracking. But color based is too weak about lightening changes, minor-overlappings etc. I’ve got a low spec computer so I have to consider speed, how much speed do you think i can obtain with training just 2 models and using them?

    Thanks for your the amazing post again 🙂

    Reply
    • Sunita Nayak says

      September 24, 2018 at 12:29 am

      Hi Utku, thanks for reading !

      We will write a future post about the performance of YOLOv3 with fewer classes. In the meantime, you will find some useful info along those lines in the discussions at

      https://github.com/pjreddie/darknet/issues/142

      It looks like it will improve the accuracy if you do that but still use the big dataset to retrain.

      Reply
  26. JasOlean says

    September 11, 2018 at 4:16 am

    bash: ./object_detection_yolo.out: No such file or directory

    I got this when I ran the C++ file. How do I run the C++ file?

    Reply
    • Sunita Nayak says

      September 24, 2018 at 12:20 am

      You need to compile the code first and make sure it successfully created the object_detection_yolo.out file in your current directory.

      Reply
  27. xram says

    September 23, 2018 at 10:07 pm

    Hi, thanks for the wonderful video, it really helped. Is there a tutorial on using YOLOv3 to train on custom data?

    Reply
    • Sunita Nayak says

      September 24, 2018 at 12:32 am

      Thanks for reading! We will write a post on that in the future, but in the meantime you can find some guidelines at

      https://medium.com/@manivannan_data/how-to-train-yolov3-to-detect-custom-objects-ccbcafeb13d2

      Reply
  28. JasOlean says

    September 24, 2018 at 3:46 am

    Hello,

    For a custom dataset, to get the weights file, do I need to first train using darknet53.conv.74? So this code is only for testing?

    Reply
  29. Shendry Rosero says

    September 26, 2018 at 10:59 am

    Please help me. I got: OpenCV Error: Parsing error (Unknown layer type: shortcut) in cv::dnn::darknet::ReadDarknetFromCfgFile, file C:\projects\opencv-python\opencv\modules\dnn\src\darknet\darknet_io.cpp, line 503
    Traceback (most recent call last). I’m working with PyCharm.

    Reply
  30. vineeth sukumaran says

    October 3, 2018 at 12:59 am

    Hi Sunita,
    I ran into this error while running this code.

    OpenCV(3.4.1) Error: Parsing error (Unknown layer type: shortcut) in ReadDarknetFromCfgFile, file /home/vineeth/installations/opencv/modules/dnn/src/darknet/darknet_io.cpp, line 503
    Traceback (most recent call last):
    File “objectdetection.py”, line 33, in
    net = cv.dnn.readNetFromDarknet(modelConfiguration, modelWeights)
    cv2.error: OpenCV(3.4.1) /home/vineeth/installations/opencv/modules/dnn/src/darknet/darknet_io.cpp:503: error: (-212) Unknown layer type: shortcut in function ReadDarknetFromCfgFile

    Can you help me to sort this thing out?

    Reply
  31. JasOlean says

    October 3, 2018 at 2:26 am

    How do I calculate the accuracy (mAP) for a video or images?

    Reply
  32. Alec says

    October 7, 2018 at 8:47 pm

    Sunita,

    Unless I missed something: I am using macOS and looking to process hundreds of videos, trying to identify objects and then match those objects against a database. I have been trying to follow your directions but am not able to make it work. Per your article, every video saves frames and then draws boxes around the objects. But there are a few issues with that… 1) When the video is being processed, I am not able to see the object identification boxes… 2) Once every frame is saved and boxes are drawn, one object could be spread across thousands of frames, and the accuracy of object identification varies from frame to frame for the same object.
    I am new to YOLOv3 and would appreciate any insight.

    Thank you
    Alec

    Reply
    • kursad says

      October 21, 2018 at 7:21 am

      Sir, I am a Mac user and really new to this topic. I downloaded this code and ran it in Xcode, but I got this error: “/Users/kursadlacin/Documents/opencv-3.4.2/modules/dnn/src/darknet/darknet_io.cpp:784: error: (-212:Parsing error) Failed to parse NetParameter file: yolov3.cfg in function ‘ReadNetParamsFromCfgFileOrDie’”. Is this error the same as what you are describing, or different? Please help me.

      Reply
  33. KB Wong says

    October 7, 2018 at 10:56 pm

    Hi Sunita,

    I’d just like to say thank you for your post and sample code; I downloaded and tested it, and it worked like a charm. Thank you again.

    Reply
  34. Nasir Rahim says

    October 11, 2018 at 12:37 am

    Hello,
    Thanks for such a great tutorial.
    I have tested YOLOv3 using OpenCV C++ by following this tutorial. My problem is that when I run this code, it takes very long to generate output, sometimes 7 or 8 seconds per image. Does this code work with NVIDIA GPUs? If possible, could you guide me on how I can use the GPU to generate output? I have an NVIDIA GTX 1060. Thanks.

    Reply
  35. Deddy Welsan says

    October 15, 2018 at 10:08 pm

    How do I run the object detection using a webcam?

    Reply
  36. disqus_DkxqLkXFIO says

    October 17, 2018 at 1:59 am

    Is it possible to get higher FPS using the following repo: https://github.com/shizukachan/darknet-nnpack

    If yes, can you make another article about it?

    Reply
  37. Faizan Ahmad says

    October 20, 2018 at 12:02 am

    parser.add_argument('--image', help='Path to image file.')
    I am facing an issue regarding how to pass an argument to the above command.
    I am using Windows 10 with Jupyter Notebook…

    Reply
  38. Jovann Pérez says

    October 20, 2018 at 7:27 am

    Hi Sunita. Nice tutorial. How can I change the code and yolo configuration to detect only certain object classes (like cars and people)? Could this improve execution time?

    Reply
  39. kursad says

    October 21, 2018 at 3:23 am

    Hello, I have a problem:
    libc++abi.dylib: terminating with uncaught exception of type cv::Exception: OpenCV(3.4.2) /Users/kursadlacin/Documents/opencv-3.4.2/modules/dnn/src/darknet/darknet_io.cpp:784: error: (-212:Parsing error) Failed to parse NetParameter file: yolov3.cfg in function ‘ReadNetParamsFromCfgFileOrDie’. I am using macOS. What is my problem? Please help.

    Reply
  40. Faizan Ahmad says

    October 24, 2018 at 1:06 am

    indices = cv.dnn.NMSBoxes(boxes, confidences, confThreshold, nmsThreshold)
    Please help me out, as I am getting this error:
    Traceback (most recent call last):
    File “E:/Image Classifier/OpenCv-Python-Yolov3/OpenCv-YoloV3.py”, line 92, in
    indices = cv2.dnn.NMSBoxes(boxes)
    NameError: name ‘boxes’ is not defined

    Reply
  41. Sohib says

    October 25, 2018 at 7:13 pm

    Do you have any plans on making a course on deep learning like you did on “Computer Vision for Faces”? If yes, please let us know the expected date and price maybe? Thanks for your awesome projects btw!

    Reply
  42. Sohib says

    October 28, 2018 at 7:57 pm

    After processing is done, I have a file named “yolo_out_py.avi” in the working directory, but it has zero size so it can’t be played. What is the reason for this? Thank you!

    Reply
  43. Isaac Tan says

    December 6, 2018 at 11:44 am

    Good afternoon Sunita and Satya,

    Currently, I am running the OpenCV implementation of YOLOv3 on an Intel NUC device with an i5 CPU, and the processing speed is about 2 frames per second. As such, I am exploring adding an Intel GPU card to the Intel NUC so that I can speed up performance to at least 10 frames per second. I would like to seek your recommendation on an Intel GPU card that I should purchase so that it can be added to the Intel NUC and the YOLOv3 DNN model can then be executed on the GPU instead of on the i5 or i7 CPU.

    Thank you very much for your advice in this matter.

    Reply
