Video Doorbell Scoring

Scoring system version: 1.0

To provide a data-driven, objective basis for comparative video doorbell performance in our reviews, the following set of measurable, repeatable testing procedures have been designed. These tests are based on the key factors that are important for a video doorbell to fulfill the roles that owners would expect of them in areas such as surveillance, notifications, package monitoring, and reliability.

The measurements are based on real world usage scenarios, tested while installed in a real home setting. All of our usual editorial review guidelines still apply.

The scores derived from these tests are presented in product review, comparison, and buyers guide articles with color coded indicators as follows:

Video Doorbell Performance Categories

Video doorbells are evaluated across 7 key performance categories, with an optional category for HomeKit specifically where that is supported. Each of these is made up of several relevant measurements with specifically defined scoring criteria. The overarching categories are:

Camera performance
Audio performance
Notification performance
Motion detection performance
Smart detection performance
Battery performance (if applicable)
App Experience
HomeKit Secure Video (optional)

Camera Performance

Video Quality

This test provides an objective comparison score which factors in camera resolution, lens and sensor pairing, viewing angle, and video encoding quality as they pertain to the ability to distinguish fine detail of a given object at a given distance.

The Landolt C vision test chart is used, in full daylight, as this chart style provides an unambiguous clarity determination when the gaps in the ‘C’ shapes are visible. Due to encoding artifacts, these are visible or not, so there is no ambiguity.

The measurement is taken as the maximum distance that gaps in both samples of the top row of the chart are clear in the live view at maximum zoon. To account for encoding artifacts, the chart is held stationary for several seconds to allow for key frame updates to stabilize the image, and then further fine adjustments forwards or backwards are made until the maximum distance is found. Maximum possible score is achieved at 8m.

Scoring is the measured distance as a percentage of the maximum score, expressed out of 10.

Night vision Performance

This test determines the IR illumination effectiveness on video quality. The result is derived from the maximum stabilized distance from the camera where the gaps of both samples in the top row on the Landolt C vision test chart can be clearly distinguished in the live view of the doorbell app at maximum zoom.

The test is conducted at default or ‘recommended’ video quality settings where available as this best represents typical real-world usage. To account for video encoding artifacts the chart is held stationary for several seconds to allow for key frame updates to stabilize the image and then adjusted forwards or backwards (with further stabilization holds) until the maximum range is determined. Maximum possible score is achieved at 7m to account for reduced illumination compared to daylight conditions.

Scoring is non-linear for this test to reward longer range night sight and placing typical cameras at approximately the 50th percentile. This is calculated by doubling the measured distance and dividing by the maximum scoring distance, expressed as a percentage of 10.

Dynamic Range

This test determines the ability of the doorbell to make out faces and other details with strong contrasting back light conditions. Using a ISO14525 OECF 36 test chart at 1m, the chart is held in front of the camera in shade with a sunlit backdrop. Only clear sky midday conditions are used for this test. The recorded video is paused with the test chart visible, and the number of clearly defined test patches are counted.

To eliminate subjective interpretation, a zoomed screenshot of the chart is taken from the camera and imported into a photo editor. The color selection tool is then used to attempt to define each square. Where the tool can select a square shape on a given swatch a full point is awarded. A poorly defined square is awarded half a point.

Scoring is the resulting number of points as a percentage of 12, expressed out of 10.

OECF 36 dynamic range test chart — Imatest OECF 36 Test Chart

Audio Performance

Two-Way Talk Quality

This test is used to evaluate the audio pickup quality and two-way talk effectiveness of the doorbell. The test is conducted with one person indoors on the doorbell app with two-way talk active and another outside. Starting at a 10m maximum test distance, 10 of the ‘Harvard Phrases’ are spoken at normal speaking volume by each person to gauge the maximum audible distances for parties. The distance for each party is found by repeating the test at decreasing distances until each party can hear and understand all the words in the test phrase.

Measurements are recorded for each party separately, one for outdoor audibility, and one for in-app recorded audibility. The test is performed only while no background sound is present and during low wind conditions at default doorbell volume. Maximum possible score for each party is achieved at 10m. Scoring is a combination of the maximum recorded distance as a percentage of the maximum scoring distance, expressed out of 10.

Audio clarity is measured by noting the presence of corruption (recurring audio skips, corruption, or hitches) in the audio for each party as follows:

4 points - no corruption.
1 point - instances of corruption more than 2 second apart.
0 points - corruption less than 2 seconds apart.

Encoding quality is factored in by assessing the dynamic range of the audio as heard from inside and outside. Distortion and low frequency response is scored as follows:

4 points - good voice recreation.
1 point - some distortion of loss of range (flat/tinny frequency response).
0 points - significant distortion.

Weighting:

25% - Outdoor audibility.
25% - In-app audibility.
15% - In-app clarity.
10% - Outdoor clarity.
15% - In-app encoding.
10% - Outdoor encoding.

Recorded Audio Quality

This test assesses the quality of audio captured within recorded video clips. This can vary from live video, so is measured separately. Starting from a base of 9 points, 1 point is deducted for each of the following. Scoring is the final score as a percentage of the maximum, expressed out of 10. These factors are observed during the daily motion detection test samples.

Poor frequency response (flat/tinny reproduction).
Normal human speech cannot be understood more than 1m away.
Normal human speech cannot be understood more than 3 m away.
Audio skips/breaks more than 2 seconds apart.
Audio skips/breaks less than 2 seconds apart.
Obvious presence of compression artifacts.
Minor distortion.
Significant distortion.
Significant static or noise present.

Notification Performance

Notification delay

While notifications on their own are important, the vast improvement in situational awareness and convenience gained from an effective thumbnail of the event makes this features a fundamental part of what constitutes good notification performance. As such, this test is a combined score of delay with and without thumbnails (if optional), weighted in favor of thumbnails. A 3 second delay is considered perfect, maximum allowed notification delay for any points is 30 seconds.

To ensure consistent triggering for timing, the event will be triggered by exiting the door next to the doorbell and walking directly away. Average of 5 attempts for text-only and 10 attempts with thumbnails, at 1 attempt per day. Only successful motion detection events are included.

An additional 5 samples will be taken on different days for doorbell ring notifications to verify a consistent response time. These will typically be much faster given the minimal processing required (no camera involvement).

Scoring is the average delay subtracted from the maximum of 30 seconds, normalized for a 3-20 second range, expressed as a score out of 10.

Weighting:

50% - Rich notification delay (thumbnails on).
25% - Text-only delay (thumbnails off or not available).
25% - Doorbell rings

Limitations

This test is performance under ideal conditions using a close range Wi-Fi access point, close range positioning of any required base station, a high end smart phone, and a fibre-optic internet connection. This setup eliinates network related variables as much as possible to provide fair comparison between models. Your results may vary based on your own network performance and hardware.

Thumbnail effectiveness

Effective capture means the subject of the motion event is present in the thumbnail in both lateral and approach detection tests. This is more likely to be missed in lateral motion tests given the typically shorter time in frame (depends on the horizontal field of view).

Average of the 20 attempts of each type at 1 attempt per day. Test will be conducted equally between forward lit, back lit, and night conditions. Only successful motion detection events are included, and only when rich notifications are being tested.

Scoring is the percentage of effective thumbnails captured expressed out of 10.

Motion Detection Performance

Missed Events

Percentage of missed motion events. A missed event occurs when a test attempt does not result in a notification or recorded event. This includes timing, lateral and forward approach tests, and smart detection tests and is repeated until a successful test is recorded in each of those categories. The number of attempts is noted.

If an event is detected and recorded, but no notification is sent when it should have been (depending on settings), this is considered a partial failure and will be counted as 0.5 misses.

Score is the percentage of captured events as a percentage of total attempts, expressed out of 10.

Camera Wake Delay

This test determines how quickly the camera can respond to a motion event. This factors in the maximum motion detection angle, motion detection delay, wake up time, and stream initiation. In order to account for differences in motion detection range and fluctuations in walking speed during the test, the measurement is distance covered walking from one side of the field of view to the other across the front of the camera. By doing this at close range, motion detection range limitations are eliminated, and we get a better measurement for the wake-up speed.

To account for network fluctuations and other uncontrolled variables 15 attempts will be recorded, 1 per day at different times of day: 5 in morning light, 5 in afternoon light, and 5 at night.

Scoring is the horizontal percentage of the recorded field of view remaining when recording commenced, expressed out of 10 with larger being better.

Event Capture

This test determines how effectively the camera captures general motion events at varying distances. This factors in the maximum motion detection range, motion detection delay, wake up time, and stream initiation. In order to account for differences in motion detection range and fluctuations in walking speed during the test, the measurement is distance covered walking towards the camera from a fixed starting point at 10m.

Maximum motion detection range is first determined using a zig-zag test. The subject approaches the camera slowly in a zig-zag pattern, advancing only 1 foot (~30cm) on each crossing and pausing for 3 seconds at each point. The fastest notification setting will be configured on the doorbell (no thumbnails), and the sensitivity set to the highest setting. The test is continued until a motion notification is triggered. Final distance is adjusted for the motion notification delay as determined in the notification performance tests.

The motion capture test is then conducted over multiple test sessions. To account for network fluctuations and other uncontrolled variables 15 attempts will be recorded, 1 per day at different times of day: 5 when the subject is front lit, 5 when back lit, and 5 at night.

A casual walking pace will be maintained towards the camera starting from behind solid cover, so the start of the motion event is at a consistent distance.

While the maximum detection range is used for context, it does not factor into the score as the actual event capture distances are what counts. The scoring is the average distance from the camera that recording commences, nomalized on a logistic function curve where x0 = 3.5, L = 10, and k = 0.7. After much testing these values give scores that are correctly indicative of comparative performance between models and fit sensibly on our color-coded scale.

Weighting:

20% - Maximum detection range.
80% - Average percentage of maximum range remaining when recording commenced.

Smart Detection Performance

Package Monitoring

Combined score of package area visibility, package detection accuracy, and additional package monitoring features.

Detection is tested during the approach detection test while carrying a typical mid-size package. Two different packages are used to provide low-contrast and high-contrast detection. The package is placed within the package detection zone at the end of the approach run. 10 attempts of each type at 1 attempt per day. Different positions in the zone will be used each day. Scoring is the percentage of packages detected expressed out of 5.

1 point each for:

Visibility of the test package directly below doorbell.
Visibility of porch area in front of the doorbell.
More than 30 degrees off center visibility to the side.
Presence of active package alerting feature.
Presence of additional package alerts (i.e. pickup reminders, tamper alerts, etc.).

Weighting:

50% - Detection success.
50% - Package monitoring features.

Smart Detection Features

This test accounts for the smart detection options available, and how well they actually identify objects. This score combines the tally of smart detection features and object detection accuracy as observed in each of the motion detection tests.

Smart features add 1 point each for:

Custom motion zones (custom means zones of an arbitrary shape, not just rectangles).
Person detection.
Animal Detection.
Vehicle Detection.
Facial Recognition.

Object detection accuracy is measured during each motion detection test. Scoring is the percentage of correctly identified objects (within those supported) expressed out of 5.

Weighting:

50% - Detection accuracy.
50% - Smart features.

Battery performance

This test provides an indicative comparison of battery usage. Actual usage will vary greatly based on installed environment, climatic factors, and configured settings for video quality, motion detection sensitivity and the use of smart features. By testing battery consumption in a consistent setting, over a fixed duration, and with a consistent level of usage (the tests) we can get a basis for comparison between models.

Starting at 100%, rating incorporates typical domestic usage and the above testing over a period of 30 days. HDR and smart detection features will be enabled (where available) and motion detection sensitivity will be at maximum (necessary for accurate missed event measurements) and recording quality will use the default or recommended settings.

The final battery reading is normalized between 10 and 90 percent and converted to a value out of 10. This scaling takes into account the heavier than normal usage incurred during testing. Under this load a projected 3 months of battery life is considered best-in-class.

Limitations

Battery life is not just impacted by usage, but by environmental factors. Testing is conducted in a temperate climate with typically no extreme cold events. Usage in freezing weather will significantly reduce expected battery life, potentially by up to 70%. Nonetheless, this result provides a controlled comparison between models which will translate proportionally to other conditions.

Time To Dead

After the 30-day test cycle, doorbells are moved to the ‘run down rack’ where they will be exposed to an average of 20 motion events per day until they cease functioning. This is known as ‘time to dead’ and is more practical than a ‘time to zero’ measurement as doorbells may stop functioning above 0% battery state of charge.

This measurement provides an additional indication of the battery life you can expect from a doorbell model but is not used in the scoring directly.

A photo of the run down rack used to test the total battery life of video doorbells. — The ‘run down rack’

App Experience

Live Response

This test measures the time to stream for live view, which is important for responding to events in real time after a notification is received. The intention is to measure the effectiveness of the streaming platform and infrastructure used by the doorbell. As such, testing is done on the local network only for consistency as cellular or public Wi-Fi connections introduce a high level of uncontrolled latency. Maximum acceptable time to start for any points is 20 seconds.

Live view is also used for responding to doorbell ring events. As such, an additional 5 daily samples will be taken to measurement the response time to live view from the notification screen. As these will have an additional time lage due to having to open the app first, they will constitute a smaller portion of the score.

Scoring is the maximum allowed time minus the average time to start, expressed out of 10. 17 attempts on 12 different days (12 normal, 5 ring).

Weighting:

70% - Time to start live from in the app.
30% - Time to start live from the ring notification.

Limitations

Performance over public or cellular connections will be vastly different to this result, depending on specific circumstances in each case. As there are too many variables for this to provide any meaningful comparison, it is noted that this is not indiciative of what can be expected when out of the home.

Privacy and Security

These factors represent the essential components of a viable security model for a smart app, with higher points awarded for design that promotes security in an accessible way for typical users. Points are awarded for each key factor, converted to a percentage of the maximum possible score of 16, expressed out of 10.

Authentication (choose one)

4 points - Device-based auth token with long life.
3 point - Periodic Biometric auth on open.
2 points - Periodic manual password re-entry on open.
1 point - Password re-entry on every open.
0 points - No authentication.

Access Management (cumulative)

1 point - Device sharing between accounts via invite.
1 point- Authenticated device/user list.
1 point - Ability to revoke access to device/user.
1 point - Two-factor authentication is available.
1 point - Two-factor authentication is mandatory.

Firmware Updates (choose one)

1 point - firmware updates are clearly notified in app.
2 point - firmware updates are advertised by push notification.
3 points - firmware updates are automatically installed without intervention.

Privacy Controls (cumulative)

1 point - Privacy zones can be created.
1 point - Audio recording can be disabled.
1 point - Live view can be easily disabled.
1 point - Recording can be easily disabled.

App Usability

App usability is typically a more subjective area, as such the following factors have been identified as aspects that objectively promote a usable and functional user experience. Other design choices will appeal to different users, so cannot be scored, but will be noted in the review content. Points are awarded for each key factor, converted to a percentage of the maximum possible score of 14, expressed out of 10.

Feature accessibility

2 points - Device features are logically organized and easy to find.
1 point - Non-standard/new features have explanatory text.
1 point - Non-standard/new features have additional explanatory diagrams/animations.
1 point - New features are announced clearly in the interface.
1 point - Support documentation is contextually linked where significant explanation may be required.
1 point - Support documentation is easily located.

Quality Issues

Lose 1 point - minor bugs observed that cause cosmetic or slight nuisance that doesn’t impact usage.
Lose 2 points - significant bugs observed that obstruct usage and cause frustration.
Lose 4 points - major bugs observed that cannot be worked around.

Recording retrieval

1 point - Event type is clearly identified.
1 point - Events can be filtered by type.
1 point - Events can be filtered by date/time.
1 point - Recent events are retrieved and listed within 1 second.
1 point - Recent events are retrieved and listed within 3 seconds.
1 point - Events are retrieved reliably every time the device is opened.
1 point - Events can be scrubbed forwards and backwards smoothly and intuitively.

HomeKit Secure Video

For doorbells that support HomeKit Secure Video this additional category will be included. The reason for this is that HKSV largely replaces the functions of the native app and the camera itself and offloads video processing to the local home hub. This means that scores for the categories of notifications, motion detection, smart features, and the live response metric can be significantly different than those produced using the doorbell’s native app.

Each of the affected metrics will be tested separately. HomeKit specific results will then be noted alongside each metric in their respective review categories, and the average of these will constitute the overall HomeKit Secure Video performance score. This score will not be included as part of the overall score for the video doorbell being tested, but will be listed separately in the review for and used in comparisons and buyers guides where HomeKit is the focus.