Doorbell with Facial Recognition

by Mangesh Pise  ·  Feb 4, 2017

Biometric screening using fingerprint or retina scans is not new. With the advancement of high-resolution cameras and the development of 3D facial recognition algorithms over the past few years, facial recognition has become a popular means of biometrics. My first encounter with this technology was in the 2012-13 timeframe, when Google first released Face Unlock, a feature in one of its Android operating systems that could unlock the phone by recognizing the owner’s face.

Thus, facial recognition is not new. What is new is developers’ access to some of the most sophisticated facial recognition algorithms through open source libraries and AWS AI services (such as Rekognition). We are going to talk about some of these libraries in this blog.

The Doorbell Project

In my previous, rudimentary attempts at a doorbell, I had used an Arduino with motion and range sensors: the motion sensor would trigger the range sensor, which measured the distance every few milliseconds to determine whether the object (person) at my door was approaching the door or just passing by, thus concluding whether someone was really at the door.

What if the doorbell could tell me not only when someone was at the door, but also who he or she was, by name, if I already knew the person? I would also love a picture of my visitors sent to my phone so I would be notified even when I was not at home. Finally, in the ideal state, the visitor would simply walk up to the door … no buttons to press!

Components

To conceptualize such a doorbell, I would need a (1) camera connected to a (2) computer that would constantly look for and (3) identify objects, or visitors in this case. Upon detecting a body-like feature (or a face), the computer would capture the image and (4) compare it with known faces; if a match was found, it would (6) notify me (on my phone) about who was at the door. Obviously, if it was a new face, I would like my computer to (4) learn the face and (5) remember it for next time. For kicks, I also thought it would be cool for my computer to (7) greet the visitor if it was a known face.

This was going to be a lot of moving parts! They mapped out as follows:

(1) A USB webcam

(2) Raspberry Pi 2 Model B as my portable computer with Python 3 scripting

(3) OpenCV. OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library. [opencv.org]

(4) AWS Rekognition. Amazon Rekognition is an Amazon AI service that makes it easy to detect, index and search faces in images. [AWS Rekognition]

(5) AWS DynamoDB. Amazon DynamoDB is a fast and flexible NoSQL database service. [AWS DynamoDB]

(6) AWS Simple Notification Service. Amazon Simple Notification Service (Amazon SNS) is a fast, flexible, fully managed push notification service that lets you send messages to mobile device users and email recipients, or even send text messages. [AWS SNS]

(7) AWS Polly. Amazon Polly turns text into lifelike speech. It is an Amazon AI service that synthesizes speech that sounds like a human voice. [AWS Polly]

(8) AWS Lambda. AWS Lambda runs code without requiring you to provision or manage servers (i.e. no EC2 instances). In this case, Lambda was used for the overall orchestration of all the AWS services listed above.

The Blueprint

With the above components mapped to the high-level vision, the execution went as described below.

Please Note: The purpose of this blog is primarily to discuss the concept and idea, hence, as much as possible, I am going to stay away from code and/or installation instructions. At the same time, I will try my best to provide all my references so you can refer to them too.

Blueprint

Let’s walk through some of the key components / modules of this project.

1: Face tracking

The Python script on the Raspberry Pi uses the OpenCV library and a Haar feature-based cascade classifier to constantly track faces. Read more about Haar cascade classifiers on opencv.org.

import cv2

...
HAAR_CASCADE = "haarcascade_frontalface_alt.xml"
face_cascade = cv2.CascadeClassifier(HAAR_CASCADE)
cam = cv2.VideoCapture(0)
...
while True:
    ret, img = cam.read()  # grab a frame from the webcam
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # the classifier works on grayscale
    faces = face_cascade.detectMultiScale(gray, 1.6, 6)
    # now see if there are faces
    if len(faces) != 0:
        print("Psst! " + str(len(faces)) + " peep(s) at the door!")
        for (x, y, w, h) in faces:
            cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)  # blue (BGR) box around the face
            # ... do your stuff, e.g. image sharpening, cropping, etc.
            # ... I used AWS CLI to upload these images to S3
            # ... Finally, call AWS Lambda function to process them (Step 2)
    ...
cam.release()
cv2.destroyAllWindows()

Face Tracking
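
The comments in the snippet above gloss over the S3 upload and the Lambda hand-off. Here is a minimal sketch of that step, using the AWS CLI from the same Python script; the bucket name mys3bucket comes from the Lambda code below, while the function name doorbell-visitor and the file names are my own placeholders:

import json
import subprocess

# Save the frame captured in the loop above, push it to S3 via the
# AWS CLI, then invoke the Lambda function that runs the Rekognition
# search (Step 2). "doorbell-visitor" is a hypothetical function name.
photoname = "visitor.jpg"
cv2.imwrite(photoname, img)
subprocess.run(["aws", "s3", "cp", photoname, "s3://mys3bucket/" + photoname])
subprocess.run([
    "aws", "lambda", "invoke",
    "--function-name", "doorbell-visitor",
    "--payload", json.dumps({"photoname": photoname}),
    "lambda-out.json",
])

The payload key photoname lines up with the event.photoname the Lambda reads in the next step.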

2: Face recognition

After detecting the face(s), the Python script uses the AWS CLI to upload the images to S3. A Lambda function is then invoked that uses the AWS Rekognition service to search a collection of indexed faces. If a match is not found, we then index the image as a new face.

AWS’s facial identification algorithm stores faces in a collection as feature vectors. This makes it possible both to search the indexed collection for a face from any image and to save any face as a feature vector. An example of an image search, as well as the result of indexing an image, is shown below.

var AWS = require('aws-sdk');
var rekognition = new AWS.Rekognition();
exports.handler = (event, context, callback) => {
  ...
  var params = {
    CollectionId: "visitors",
    Image: {
      S3Object: {
        Bucket: "mys3bucket",
        Name: event.photoname
      }
    },
    FaceMatchThreshold: 95.0,
    MaxFaces: 1
  };
  rekognition.searchFacesByImage(params, function (err, data) {
  /*
    data = {
    FaceMatches: [
       {
      Face: {
       BoundingBox: {
        Height: 0.3234420120716095, 
        Left: 0.3233329951763153, 
        Top: 0.5, 
        Width: 0.24222199618816376
       }, 
       Confidence: 99.99829864501953, 
       FaceId: "38271d79-7bc2-5efb-b752-398a8d575b85", 
       ImageId: "d5631190-d039-54e4-b267-abd22c8647c5"
      }, 
      Similarity: 99.97036743164062
     }
    ], 
    SearchedFaceBoundingBox: {
     Height: 0.33481481671333313, 
     Left: 0.31888890266418457, 
     Top: 0.4933333396911621, 
     Width: 0.25
    }, 
    SearchedFaceConfidence: 99.9991226196289
   } 
  */
  
  // -- If not found in collection, we index this new image
  var paramsIdx = {
    CollectionId: "visitors",
    Image: {
      S3Object: {
        Bucket: "mys3bucket",
        Name: event.photoname
      }
    }
  };
  rekognition.indexFaces(paramsIdx, function (err, data) {
  /*
    data = {
    FaceRecords: [
       {
      Face: {
       BoundingBox: {
        Height: 0.33481481671333313, 
        Left: 0.31888890266418457, 
        Top: 0.4933333396911621, 
        Width: 0.25
       }, 
       Confidence: 99.9991226196289, 
       FaceId: "ff43d742-0c13-5d16-a3e8-03d3f58e980b", 
       ImageId: "465f4e93-763e-51d0-b030-b9667a2d94b1"
      }, 
      FaceDetail: {
       BoundingBox: {
        Height: 0.33481481671333313, 
        Left: 0.31888890266418457, 
        Top: 0.4933333396911621, 
        Width: 0.25
       }, 
       Confidence: 99.9991226196289, 
       Landmarks: [
          {
         Type: "EYE_LEFT", 
         X: 0.3976764678955078, 
         Y: 0.6248345971107483
        }, 
          {
         Type: "EYE_RIGHT", 
         X: 0.4810936450958252, 
         Y: 0.6317117214202881
        }, 
          {
         Type: "NOSE_LEFT", 
         X: 0.41986238956451416, 
         Y: 0.7111940383911133
        }, 
          {
         Type: "MOUTH_DOWN", 
         X: 0.40525302290916443, 
         Y: 0.7497701048851013
        }, 
          {
         Type: "MOUTH_UP", 
         X: 0.4753248989582062, 
         Y: 0.7558549642562866
        }
       ], 
       Pose: {
        Pitch: -9.713645935058594, 
        Roll: 4.707281112670898, 
        Yaw: -24.438663482666016
       }, 
       Quality: {
        Brightness: 29.23358917236328, 
        Sharpness: 80
       }
      }
     }]
    }
  */
  });
  });
};

I have intentionally omitted a step in the code snippet above that involves reading from and writing to the database. The database is a simple table that stores a mapping of FaceId to a name, so I can return the visitor’s name for any matched FaceId obtained from the rekognition.searchFacesByImage method above.

An example of the database table is shown below:

FaceId                                 Name
ff43d742-0c13-5d16-a3e8-03d3f58e980b   Mangesh
ff499e32-0c13-3d26-a6e5-0553e5fd9e0e   John
ff493332-0c13-34d6-a6f5-0f53ehf494ee   Someone
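
The Lambda above is written in Node.js, but the lookup itself boils down to a single DynamoDB GetItem keyed on the FaceId (plus a PutItem when a new face is indexed). Here is a minimal sketch of the idea, shown in Python with boto3 for brevity; the table name visitor_names is my own placeholder:

import boto3

dynamodb = boto3.client("dynamodb")
TABLE = "visitor_names"  # hypothetical table name

def name_for_face(face_id):
    # Return the stored name for a FaceId, or None if we have not seen it yet.
    resp = dynamodb.get_item(TableName=TABLE, Key={"FaceId": {"S": face_id}})
    item = resp.get("Item")
    return item["Name"]["S"] if item else None

def remember_face(face_id, name="Someone"):
    # Store a newly indexed FaceId so we can greet this visitor next time.
    dynamodb.put_item(TableName=TABLE,
                      Item={"FaceId": {"S": face_id}, "Name": {"S": name}})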

3: Notification

Within the same Lambda function above, when a matching face is found and the name of the visitor is retrieved from the database table, I used SNS to send an SMS notification. Please note there is an assumption here that an SNS topic has been set up and that the required phone numbers are subscribed to that topic.

...
var sns = new AWS.SNS();
sns.publish({
  Message: db.visitors.join(", ") + ' ' + (db.visitors.length > 1 ? 'are' : 'is') + ' at the door. ' + fullimage,
  TopicArn: 'arn:aws:sns:us-east-1:949700099995:topic'
}, function (err, data) {
  callback(err, db.visitors.join(",")); // exit out of Lambda
});
...

Text Notification

4: Talk back the visitor’s name

Essentially, we want to greet the visitor by name. This is by far the simplest of all the steps, as we only need our greeting text converted to audio. Just to set expectations: there are multiple services that could have been used for this text-to-speech conversion, but I decided to stick with AWS, because 1. it is easier to call AWS services from Lambda (same ecosystem), and 2. Polly is a new offering that claims to be more realistic (and it does indeed feel that way!).

In this case, we again use the AWS CLI from the Raspberry Pi Python script. The AWS Polly service returns an mp3 file in response, which can then be played using any available audio player on the Raspberry Pi itself (make sure a speaker is connected via the HDMI or 3.5mm port).

Here is an example of the kind of greeting it produces (the voice I used is called ‘Ivy’).
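
A minimal sketch of that Pi-side call follows; the greeting text and the choice of audio player are assumptions on my part:

import subprocess

# Synthesize the greeting with Polly's 'Ivy' voice via the AWS CLI,
# then play the resulting mp3 on the Pi.
greeting = "Welcome, Mangesh!"  # example text; build this from the matched name
subprocess.run([
    "aws", "polly", "synthesize-speech",
    "--output-format", "mp3",
    "--voice-id", "Ivy",
    "--text", greeting,
    "greeting.mp3",
])
subprocess.run(["omxplayer", "greeting.mp3"])  # or any other audio player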

Conclusion

Artificial Intelligence and Machine Learning don’t feel out of reach now that open source libraries and services like those from Amazon Web Services are accessible to developers. One can only imagine the breadth and depth of applications that can be developed using these ready-to-use machine learning algorithms (services). The doorbell is just one example, but I cannot stop imagining applications of AI in a variety of domains: social networking, security, traffic, medicine, education, etc.

There is still uncharted territory in this space beyond facial recognition, such as assessing facial emotions, recognizing moving objects and vehicles, color recognition, etc., which I will continue to pursue, and I will keep sharing updates. If you have any experience to share in this space, I’m sure we’re all ears!


artificial intelligence, ai, raspberry pi, facial recognition, face tracking, opencv, python, aws, amazon web services, rekognition, lambda, serverless, and api gateway
