When someone starts looking into optimizing the performance of their web application, they immediately come across a tool from Google called Lighthouse.
Google Lighthouse is an awesome tool for finding the performance issues in your web application and listing all the action items. This list helps you fix the issues and see a green performance score on your Google Lighthouse report.
Over time, Google Lighthouse has become a de facto standard for web performance measurement. Google is pushing it everywhere, from Chrome DevTools to browser extensions, from PageSpeed Insights to web.dev, and even the Search Console for webmasters. Wherever performance is discussed, you will see the Google Lighthouse auditing tool.
This article covers the usage of Google Lighthouse, its strengths, and its weaknesses: where to trust it and where not to. Google has advertised all the benefits of the tool and integrated it into its other major tools, such as Search Console, PageSpeed Insights, and web.dev. This pushes people to improve their score, sometimes at the cost of something important.
Many teams do weird things to see green ticks in their Google Lighthouse report without knowing the exact impact on their conversion and usability.
Google Lighthouse has made it very easy to generate a performance report for your site. Open your site, go to DevTools, click the Audit tab, and run the test. Boom, you have the results. But wait, can you trust the score you got? The answer is a big no.
Your results vary a lot between a high-end machine and a low-end machine, because a different number of CPU cycles is available to the Google Lighthouse process. You can check the CPU/memory power that was available to the Google Lighthouse process during the test at the bottom of the report.
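The CPU/memory power figure is also stored in the JSON version of the report. Here is a minimal sketch, assuming the parsed report exposes it as `environment.benchmarkIndex` (as recent Lighthouse versions do). The helper names, the 20% tolerance, and the stub numbers are ours, purely for illustration.

```javascript
// `report` is a parsed Lighthouse JSON result.
// Assumption: the CPU/memory power figure lives at report.environment.benchmarkIndex.
function benchmarkIndex(report) {
  return report.environment ? report.environment.benchmarkIndex : undefined;
}

// Treat two runs as comparable only if their benchmark indexes are close.
// The 20% tolerance is an arbitrary, hypothetical threshold.
function areRunsComparable(reportA, reportB, tolerance = 0.2) {
  const a = benchmarkIndex(reportA);
  const b = benchmarkIndex(reportB);
  if (typeof a !== 'number' || typeof b !== 'number') return false;
  return Math.abs(a - b) / Math.max(a, b) <= tolerance;
}

// Stub reports with made-up numbers.
const laptopRun = { environment: { benchmarkIndex: 1800 } };
const ciRun = { environment: { benchmarkIndex: 700 } };

console.log(areRunsComparable(laptopRun, ciRun)); // false
console.log(areRunsComparable(laptopRun, laptopRun)); // true
```

A check like this will not make scores consistent, but it can stop you from comparing a score from a powerful laptop with one from a weak CI box.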
The Google Lighthouse team has done a great job of throttling the CPU to bring computation cycles down to the average of the most used devices, like the Moto G4 or Nexus 5X. But on a very high-end machine, like a new fancy MacBook Pro, throttling does not bring CPU cycles down to the desired level.
Suppose a high-end processor like an Intel i7 can execute 1200 instructions per second. With 4x throttling, only 300 instructions get executed.
Similarly, suppose a low-end processor like an Intel i3 can execute only 400 instructions per second. With 4x throttling, only 100 instructions get executed.
It means everything on an Intel i7 or any other higher-end processor will still be executed faster and will result in much better scores.
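The arithmetic can be written out as a tiny sketch. The instruction counts are the illustrative numbers from the text, not real CPU measurements, and `throttledThroughput` is a name made up for this example.

```javascript
// Illustrative numbers from the text, not real CPU measurements.
const THROTTLE_FACTOR = 4;

// Effective throughput after slowing the CPU down by a constant factor.
function throttledThroughput(instructionsPerSecond, factor = THROTTLE_FACTOR) {
  return instructionsPerSecond / factor;
}

const highEnd = throttledThroughput(1200);
const lowEnd = throttledThroughput(400);

// Throttling by the same factor keeps the gap between the two machines intact.
console.log(highEnd, lowEnd, highEnd / lowEnd); // 300 100 3
```

The high-end machine is three times faster before throttling and still three times faster after it, which is why the same multiplier cannot bring both machines to the same baseline.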
One of the critical metrics in Google Lighthouse is TBT (Total Blocking Time), which depends on CPU availability. High CPU availability means fewer long tasks (tasks that take more than 50 ms). The fewer the long tasks, the lower the TBT value and the higher the performance score.
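To see why CPU speed moves TBT so much, here is a rough sketch of how the metric adds up: each task contributes only the part of its duration above 50 ms. The task durations are made up, and the sketch ignores the time window Lighthouse measures TBT over.

```javascript
const LONG_TASK_THRESHOLD_MS = 50;

// Sum the blocking portion (time above 50 ms) of every task.
function totalBlockingTime(taskDurationsMs) {
  return taskDurationsMs
    .map((duration) => Math.max(0, duration - LONG_TASK_THRESHOLD_MS))
    .reduce((sum, blocking) => sum + blocking, 0);
}

// Hypothetical main-thread tasks on a fast machine, in milliseconds.
const fastMachine = [30, 70, 250, 50, 120];
// The same tasks on a machine that is half as fast.
const slowMachine = fastMachine.map((duration) => duration * 2);

console.log(totalBlockingTime(fastMachine)); // 290
console.log(totalBlockingTime(slowMachine)); // 790
```

Doubling the task durations more than doubles the TBT here, because tasks that were under the threshold on the fast machine become long tasks on the slow one.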
This is not the only problem. Google Lighthouse scores can also differ between multiple executions on the same machine. This is because Google Lighthouse, or in fact any application, cannot control CPU cycles; that is the job of the operating system. The operating system decides which process gets how many computation cycles. It can reduce or increase CPU availability based on many factors, such as CPU temperature and other high-priority tasks.
Below are the Google Lighthouse scores for housing.com from the same machine, with the test executed 5 times, once serially and once in parallel. The serial results are completely different from the parallel ones.
When running in parallel, the operating system distributes CPU cycles among the 5 processes. During serial execution, all available CPU cycles are used by a single process.
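// Serial run: each test starts only after the previous one has finished.
// Assumes launchChromeAndRunLighthouse(url, opts) and opts are defined elsewhere
// and that the helper resolves to the Lighthouse result object.
const numberOfTests = 5;
const url = 'https://housing.com';
const resultsArray = [];

// Plain numeric median (an assumption about the median helper used here).
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

(async function tests() {
  for (let i = 1; i <= numberOfTests; i++) {
    const results = await launchChromeAndRunLighthouse(url, opts);
    // Lighthouse reports the score in the 0 to 1 range.
    resultsArray.push(Math.round(results.categories.performance.score * 100));
  }
  console.log(median(resultsArray));
  console.log(resultsArray);
}());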
Median - 84
[ 83, 83, 84, 84, 85]
Results are pretty much consistent.
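// Parallel run: all five Lighthouse processes start at once and compete for CPU.
const { exec } = require('child_process');
const lighthouseCli = require.resolve('lighthouse/lighthouse-cli');

// Plain numeric median of the scores. Lighthouse's computeMedianRun expects
// full result objects rather than numbers, so it is not used here.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

const numberOfTests = 5;
const results = [];
let finished = 0;
for (let i = 0; i < numberOfTests; i++) {
  // Keep the command on one line: a newline would split it into separate shell commands.
  const command = `node ${lighthouseCli} https://housing.com --output=json`;
  // The JSON report can be large, so raise the stdout buffer limit.
  exec(command, { maxBuffer: 64 * 1024 * 1024 }, (error, stdout) => {
    finished++;
    if (error) {
      console.error(error.message);
    } else {
      results.push(Math.round(JSON.parse(stdout).categories.performance.score * 100));
    }
    if (finished === numberOfTests) {
      console.log(median(results));
      console.log(results);
    }
  });
}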
Median - 26
[ 22, 25, 26, 36, 36 ]
You can clearly see the difference in scores between the two approaches.
This is the most complex issue I see with Google Lighthouse reporting. Every application is different and spends the available resources where it sees the best fit.
Gmail is the best example of this. It prioritizes emails over everything else, and mails become interactive as soon as the application loads in the browser. Other applications like Calendar, Peak, Chat, and Tasks keep loading in the background.
If you open DevTools while Gmail is loading, you might get a heart attack seeing the number of requests it makes to its servers. Calendar, Chat, Peak, etc. add a lot to the application payload, but Gmail’s entire focus is on emails. Google Lighthouse fails to understand that and gives the Gmail application a very poor score.
There are many similar applications, like Twitter and the revamped version of Facebook. Performance is a core metric for these websites, but they all fail to impress Google Lighthouse.
All these companies have some of the best brains, who understand the limitations of the tool very well. They know what to fix and which Google Lighthouse suggestions to ignore. The problem is with organizations that do not have the resources and time to explore and understand these limitations.
Search Google for “perfect lighthouse score” and you will find a hundred articles explaining how to achieve 100 on Google Lighthouse. Most of them have never checked other critical metrics like conversion or bounce rate.
The only solution to this issue is to measure more and regularly. Define core metrics your organization is concerned about and prioritize them properly. Performance has no meaning if it is at the cost of your core metrics like conversion.
Inconsistency in Google Lighthouse scores cannot be eliminated completely, but it can be controlled to a great extent.
Cloud services are again an awesome way to test your site quickly and get a basic idea of its performance. Some of the Google implementations, like PageSpeed Insights, try to limit the inconsistency by including both Google Lighthouse lab data and field data (Google tracks the performance of all the sites you visit if you allow Google to sync your history). WebPageTest queues test requests to control CPU cycles.
But again they also have their own limitations.
You will be amazed by the delta between the smallest and largest scores of ten test runs of a single page on web.dev. Prefer to take the median of all results, or remove the outliers and take the average of the remaining tests.
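Both ways of summarizing a set of runs fit in a few lines. This is a minimal sketch: the ten scores are made up, and dropping the single lowest and highest value is just one simple way to remove outliers.

```javascript
// Median of a list of numbers.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Average after dropping the `trimCount` lowest and highest values.
function trimmedMean(values, trimCount = 1) {
  if (values.length <= trimCount * 2) {
    throw new Error('Not enough values left after trimming');
  }
  const sorted = [...values].sort((a, b) => a - b);
  const kept = sorted.slice(trimCount, sorted.length - trimCount);
  return kept.reduce((sum, value) => sum + value, 0) / kept.length;
}

// Hypothetical scores from ten runs of the same page.
const scores = [71, 84, 83, 85, 62, 84, 86, 83, 85, 84];

console.log(median(scores)); // 84
console.log(trimmedMean(scores)); // 82.375
```

The median ignores the two bad runs (62 and 71) entirely, while the trimmed mean is still pulled down a little by the 71.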
The Google Lighthouse team has again done a great job here by providing a CI layer for self-hosting. The product is Lighthouse CI.
This is an amazing tool that can be integrated with your CI provider (GitHub Actions, Jenkins, Travis, etc.) and configured as per your needs. You can check the performance diff between two commits or trigger a Google Lighthouse test on every new pull request. You can also run it in a Docker container, which lets you control CPU availability to some extent and get consistent results. We are doing this at housing.com and are pretty happy with the consistency of the results.
The only problem I see with this approach at present is that it is too complex to set up. We spent weeks understanding what exactly is going on. The documentation needs a lot of improvement, and the integration process should be simplified.
Web Vitals are core performance metrics provided by the Chrome performance APIs, and they map clearly to Google Lighthouse metrics. They are used to track field data. Send the tracked data to GA or any other tool you use for that purpose. We are using perfume.js, as it provides more of the metrics we are interested in along with all the metrics supported by Web Vitals.
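Reporting field data mostly comes down to turning each metric into an analytics event. Here is a minimal sketch, assuming metric objects shaped like the ones the `web-vitals` library passes to its callbacks (`name`, `value`, `id`). The event field names, the `/analytics` endpoint, and the sample values are hypothetical, and this is not our perfume.js setup.

```javascript
// Convert a metric object ({ name, value, id }) into a flat analytics event.
function toAnalyticsEvent(metric) {
  return {
    eventCategory: 'Web Vitals',
    eventAction: metric.name,
    eventLabel: metric.id,
    // CLS is a small unitless number, so scale it up before rounding to an integer.
    eventValue: Math.round(metric.name === 'CLS' ? metric.value * 1000 : metric.value),
  };
}

// Build a callback that forwards every metric through `send`.
function createReporter(send) {
  return (metric) => send(toAnalyticsEvent(metric));
}

// Browser wiring (assumption: the web-vitals package is installed; older
// releases name these functions getLCP/getCLS instead of onLCP/onCLS):
//   import { onLCP, onCLS } from 'web-vitals';
//   const report = createReporter((event) =>
//     navigator.sendBeacon('/analytics', JSON.stringify(event)));
//   onLCP(report);
//   onCLS(report);

// Local demonstration with made-up metric values.
const sent = [];
const report = createReporter((event) => sent.push(event));
report({ name: 'LCP', value: 2412.6, id: 'v1-abc' });
report({ name: 'CLS', value: 0.0831, id: 'v1-def' });
console.log(sent.map((event) => `${event.eventAction}=${event.eventValue}`)); // [ 'LCP=2413', 'CLS=83' ]
```

Keeping the conversion separate from the transport makes it easy to swap GA for any other tool without touching the measurement code.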
This is the most consistent and reliable of all the approaches, as it reflects the average performance across your entire user base. We can make huge progress in optimizing our application by validating this data.
We worked on improving our Total Blocking Time (TBT) and Largest Contentful Paint (LCP) after identifying problem areas. We improved TBT by at least 60% and LCP by 20%.
TBT improvements Graph
LCP improvements Graph
The above improvements were only possible because we were measuring things. Measuring your critical metrics is the only way to maintain the right balance between performance, conversion, etc. Measuring will help you know when performance improvement is helping your business and when it is creating problems.
Developers apply all sorts of tricks to improve their Google Lighthouse scores, from lazy loading offscreen content to delaying critical third-party scripts. In most cases, developers do not measure the impact of their change on user experience or the users lost by the marketing team.
Lighthouse performance scores depend upon the three parameters
To improve your performance score, the Google Lighthouse report provides tons of suggestions. You need to understand the suggestions and check how feasible they are and how much value they will bring to your website.
Let us take a few suggestions from each category of the Google Lighthouse report and see the hidden costs of implementing them.
Google Lighthouse suggests optimizing images by using modern image formats such as WebP or AVIF and by resizing them to the dimensions of the image container. This is a very cool optimization and can have a huge impact on your LCP score. You can enhance it further by preloading first-fold images.
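In markup terms, the suggestion boils down to a `<picture>` element with modern-format sources plus a preload hint for the first-fold image. Here is a small sketch that generates both. It assumes the same image is already available as `.avif`, `.webp`, and `.jpg` under one base path; the helper names and paths are hypothetical.

```javascript
// Build <picture> markup that offers AVIF first, then WebP, then a JPEG fallback.
// Inputs are trusted here; escape them if they come from users.
function pictureMarkup({ basePath, alt, width, height }) {
  return [
    '<picture>',
    `  <source srcset="${basePath}.avif" type="image/avif">`,
    `  <source srcset="${basePath}.webp" type="image/webp">`,
    // Explicit width and height let the browser reserve space before the image loads.
    `  <img src="${basePath}.jpg" alt="${alt}" width="${width}" height="${height}">`,
    '</picture>',
  ].join('\n');
}

// Build a preload hint for a first-fold image.
function preloadTag(href, type) {
  return `<link rel="preload" as="image" href="${href}" type="${type}">`;
}

const hero = { basePath: '/images/hero', alt: 'Hero banner', width: 1200, height: 600 };

// Preload the first-choice format; browsers that do not support the type skip the hint.
console.log(preloadTag(`${hero.basePath}.avif`, 'image/avif'));
console.log(pictureMarkup(hero));
```

The markup is the cheap part. Producing and serving every image in three formats and several sizes is where the cost lies.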
Building a system where images are resized on the fly, or pre-resized to many possible dimensions on upload, is a tedious task. Either way, depending upon your scale, you might have to take on a huge infrastructure burden that needs both maintenance and investment.
A better approach is to implement it on a single page for a limited set of images and track your most critical metrics like conversion, bounce rate, etc. If you are really happy with the ROI, take it live for all of your images.
Google Lighthouse recommends reducing your JavaScript and CSS size as much as possible. JavaScript or CSS execution can choke the main thread, leaving the CPU unavailable for more important stuff like handling user interaction. This is a fair idea, and most people understand the limitation of JS being single-threaded.
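Size is only one way to choke the main thread; the other is running too much work in one go. A common mitigation, which is not something the Lighthouse report prescribes, is to split long work into small chunks and yield to the event loop between them. Here is a minimal sketch with hypothetical helper names:

```javascript
// Give the event loop a chance to handle input and rendering between chunks.
function yieldToEventLoop() {
  return new Promise((resolve) => setTimeout(resolve, 0));
}

// Apply `handler` to every item, pausing after each chunk of `chunkSize` items.
async function processInChunks(items, handler, chunkSize = 100) {
  const results = [];
  for (let start = 0; start < items.length; start += chunkSize) {
    for (const item of items.slice(start, start + chunkSize)) {
      results.push(handler(item));
    }
    await yieldToEventLoop();
  }
  return results;
}

// Usage: double five numbers, two at a time.
processInChunks([1, 2, 3, 4, 5], (n) => n * 2, 2).then((doubled) => {
  console.log(doubled); // [ 2, 4, 6, 8, 10 ]
});
```

The total amount of work stays the same, but each task gets shorter, so user input can be handled between chunks.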
But Google took the wrong path here. In an upcoming version, Google Lighthouse will start suggesting the replacement of larger libraries with their smaller counterparts. There are multiple problems with this approach.
Most libraries get larger because they solve more corner cases and feature requests. Why do people say webpack is tough? Because it handles so many edge cases that no other bundler handles. Imagine if webpack did not exist: half of us would be stuck trying to understand the different kinds of module systems JS supports.
Similarly, the popular frontend frameworks are large because they handle so many things, from backward compatibility to fixes for more bugs. Jumping to a new library may cause issues like weak documentation, bugs, etc. So if you plan to pick this item, get ready to have an expert developer team.
It is highly unlikely that Google will recommend Preact over React, because of the emotional attachment the community has to the React framework. Doing this is unprincipled and unfair to the maintainers of projects whose community is not aggressive in nature.
Google itself does not follow the rules it created. Most Google products load way too much JavaScript.
A company that has the best resources in the world has never focused on its own Lighthouse score but wants the entire world to take it seriously. There seems to be some hidden agenda behind this on Google's part, such as: the faster the web, the better its ad revenue.
Google should learn from this famous quote
Before taking any step to reduce JavaScript on your page, like lazy loading off-screen components, please calculate its impact on your primary metrics like conversion, user experience, etc.
Every website must try to avoid any kind of layout shift that may cause issues in user experience. But there will be cases where you do not have many options to avoid CLS.
Suppose a website wants to promote app downloads to users who have not already installed the app. Chrome has added support for detecting whether your app is already installed on the device (using the getInstalledRelatedApps API), but this information is not available to the server on the first request.
What the server can do is make a guess and decide whether to include the app download banner on the page. If the server includes it and the app is already present on the device, the download banner needs to be removed from the page. Similarly, if the server does not include the banner and the app is not installed on the device, the banner will be appended to the DOM on the client. Either correction shifts the layout and contributes to Cumulative Layout Shift (CLS).
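The client-side half of this scenario can be sketched as follows. `navigator.getInstalledRelatedApps()` is the real API mentioned above; the helper names, the `'remove'`/`'append'`/`'keep'` outcomes, and the stub objects are hypothetical.

```javascript
// Resolves to true or false when the browser can tell, or null when the API is missing.
async function isRelatedAppInstalled(nav) {
  if (!nav || typeof nav.getInstalledRelatedApps !== 'function') {
    return null;
  }
  const apps = await nav.getInstalledRelatedApps();
  return apps.length > 0;
}

// Decide how the client must correct the server's guess.
// Both 'remove' and 'append' change the page after first paint and can shift the layout.
function bannerAction(serverRenderedBanner, appInstalled) {
  if (appInstalled === null) return 'keep'; // cannot tell, so trust the server's guess
  if (serverRenderedBanner && appInstalled) return 'remove';
  if (!serverRenderedBanner && !appInstalled) return 'append';
  return 'keep';
}

// Demonstration with a stub standing in for the browser's navigator object.
const stubNavigator = { getInstalledRelatedApps: async () => [] };
isRelatedAppInstalled(stubNavigator).then((installed) => {
  console.log(bannerAction(false, installed)); // append
});
```

In a real page you would call `isRelatedAppInstalled(navigator)`. Only the two cases where the server guessed right return `'keep'` and leave the layout untouched.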
To avoid CLS, you can remove the banner from the main layer of the page and show it as a modal or floating element, or find some other way to show it. But what if you get the most downloads when the banner is part of your page? Where will you compromise?
On a funny note, most people have already experienced CLS on the Google search results page.