New Schema.org support for retailer shipping data

Tuesday, September 22, 2020

Quick summary: Starting today, we support shippingDetails schema.org markup as an alternative way for retailers to be eligible for shipping details in Google Search results.

Since June 2020, retailers have been able to list their products across different Google surfaces for free, including on Google Search. We are committed to helping the ecosystem better connect with users who come to Google to look for the best products, brands, and retailers, by investing in both more robust tooling in Google Merchant Center and new kinds of schema.org options.

Shipping details, including cost and expected delivery times, are often a key consideration for users making purchase decisions. In our own studies, we’ve heard that users abandon shopping checkouts because of unforeseen or uncertain shipping costs. This is why we will often show shipping cost information in certain result types, including on free listings on Google Search (currently in the US, in English only).

Retailers have always been able to configure shipping settings in Google Merchant Center in order to display this information in listings. Starting today, we also support the shippingDetails schema.org markup type for retailers who don't have active Merchant Center accounts with product feeds.

If you're a retailer interested in this new markup, check out our documentation to get started.
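To give a rough idea of what such markup can look like, here is a minimal sketch that builds a JSON-LD snippet in Python. The type and property names (OfferShippingDetails, shippingRate, shippingDestination, deliveryTime) follow the schema.org vocabulary as we understand it. The product, prices, and delivery times are made-up values, and the helper function names are hypothetical. Please treat the documentation as the authority on which fields are required.

```python
import json


def build_product_with_shipping() -> dict:
    """Build a schema.org Product whose Offer carries shippingDetails.

    All values below (name, prices, delivery windows) are illustrative only.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Product",
        "name": "Example ceramic mug",  # hypothetical product
        "offers": {
            "@type": "Offer",
            "price": "12.99",
            "priceCurrency": "USD",
            "shippingDetails": {
                "@type": "OfferShippingDetails",
                # Cost of shipping for this offer.
                "shippingRate": {
                    "@type": "MonetaryAmount",
                    "value": "3.49",
                    "currency": "USD",
                },
                # Region the rate applies to.
                "shippingDestination": {
                    "@type": "DefinedRegion",
                    "addressCountry": "US",
                },
                # Expected handling and transit times, in days.
                "deliveryTime": {
                    "@type": "ShippingDeliveryTime",
                    "handlingTime": {
                        "@type": "QuantitativeValue",
                        "minValue": 0,
                        "maxValue": 1,
                        "unitCode": "DAY",
                    },
                    "transitTime": {
                        "@type": "QuantitativeValue",
                        "minValue": 1,
                        "maxValue": 5,
                        "unitCode": "DAY",
                    },
                },
            },
        },
    }


def to_json_ld_script(data: dict) -> str:
    """Wrap the structured data in the script tag used for JSON-LD."""
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    print(to_json_ld_script(build_product_with_shipping()))
```

The printed script element would go into the HTML of the product page it describes.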

Posted by Kyle Kelly, Shopping Product Manager

New open source robots.txt projects

Monday, September 21, 2020

Last year we released the robots.txt parser and matcher that we use in our production systems to the open source world. Since then, we've seen people build new tools with it, contribute to the open source library (effectively improving our production systems, thanks!), and release ports to other languages like Go and Rust, which make it easier for developers to build new tools.

With the intern season ending here at Google, we wanted to highlight two new releases related to robots.txt that were made possible by two interns working on the Search Open Sourcing team, Andreea Dutulescu and Ian Dolzhanskii.

Robots.txt Specification Test

First, we are releasing a testing framework for robots.txt parser developers, created by Andreea. The project provides a testing tool that can validate whether a robots.txt parser follows the Robots Exclusion Protocol, or to what extent. Currently there is no official and thorough way to assess the correctness of a parser, so Andreea built a tool that can be used to create robots.txt parsers that follow the protocol.
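To illustrate the general idea of testing a parser against expected outcomes, here is a small sketch in Python. It is not the released testing framework and does not reflect how that project works internally. The Case structure, the run_cases helper, and the handful of cases are all hypothetical, and the parser under test is Python's standard library urllib.robotparser rather than Google's parser.

```python
from dataclasses import dataclass
from typing import Callable, List
from urllib.robotparser import RobotFileParser


@dataclass(frozen=True)
class Case:
    """One expectation: may `user_agent` fetch `url` under `robots_txt`?"""
    robots_txt: str
    user_agent: str
    url: str
    expected_allowed: bool


# A matcher takes (robots_txt, user_agent, url) and returns True if allowed.
Matcher = Callable[[str, str, str], bool]


def stdlib_matcher(robots_txt: str, user_agent: str, url: str) -> bool:
    """Adapter around Python's built-in robots.txt parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


TWO_GROUPS = "User-agent: FooBot\nDisallow: /\n\nUser-agent: *\nDisallow:\n"

# A deliberately tiny set of uncontroversial cases (hypothetical examples).
CASES: List[Case] = [
    Case("User-agent: *\nDisallow: /private/\n", "FooBot",
         "http://example.com/private/page", False),
    Case("User-agent: *\nDisallow: /private/\n", "FooBot",
         "http://example.com/public/page", True),
    Case(TWO_GROUPS, "FooBot", "http://example.com/", False),
    Case(TWO_GROUPS, "BarBot", "http://example.com/", True),
    Case("", "FooBot", "http://example.com/anything", True),
]


def run_cases(matcher: Matcher, cases: List[Case]) -> List[Case]:
    """Return the cases where the matcher disagrees with the expectation."""
    return [
        case for case in cases
        if matcher(case.robots_txt, case.user_agent, case.url)
        != case.expected_allowed
    ]


if __name__ == "__main__":
    failures = run_cases(stdlib_matcher, CASES)
    print(f"{len(CASES) - len(failures)} of {len(CASES)} cases passed")
```

Any parser can be plugged in by writing an adapter with the same three-argument signature, which is what makes this kind of harness useful for comparing implementations.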

Java robots.txt parser and matcher

Second, we are releasing an official Java port of the C++ robots.txt parser, created by Ian. Java is the 3rd most popular programming language on GitHub and it's extensively used at Google as well, so no wonder it's been the most requested language port. The parser is a 1-to-1 translation of the C++ parser in terms of functions and behavior, and it's been thoroughly tested for parity against a large corpus of robots.txt rules. Teams are already planning to use the Java robots.txt parser in Google production systems, and we hope that you'll find it useful, too.

As usual, we welcome your contributions to these projects. If you built something with the C++ robots.txt parser or with these new releases, let us know so we can potentially help you spread the word! If you found a bug, help us fix it by opening an issue on GitHub or directly contributing with a pull request. If you have questions or comments about these projects, catch us on Twitter!

It was our genuine pleasure to host Andreea and Ian, and we're sad that their internship is ending. Their contributions help make the Internet a better place and we hope that we can welcome them back to Google in the future.

Posted by Edu Pereda and Gary, Google Search Open Sourcing team

Googlebot will soon speak HTTP/2

Thursday, September 17, 2020

Quick summary: Starting in November 2020, Googlebot will begin crawling some sites over HTTP/2.

Ever since mainstream browsers started supporting the next major revision of HTTP, HTTP/2 or h2 for short, web professionals have asked us whether Googlebot can crawl over the upgraded, more modern version of the protocol.

Today we're announcing that starting mid-November 2020, Googlebot will support crawling over HTTP/2 for select sites.

What is HTTP/2

As we said, it's the next major version of HTTP, the protocol the internet primarily uses for transferring data. HTTP/2 is more robust, more efficient, and faster than its predecessor, due to its architecture and the features it implements for clients (for example, your browser) and servers. If you want to read more about it, we have a long article on the HTTP/2 topic on developers.google.com.

Why we're making this change

In general, we expect this change to make crawling more efficient in terms of server resource usage. With h2, Googlebot is able to open a single TCP connection to the server and efficiently transfer multiple files over it in parallel, instead of requiring multiple connections. The fewer connections open, the fewer resources the server and Googlebot have to spend on crawling.

How it works

In the first phase, we'll crawl a small number of sites over h2, and we'll ramp up gradually to more sites that may benefit from the initially supported features, like request multiplexing.

Googlebot decides which sites to crawl over h2 based on whether the site supports h2, and whether the site and Googlebot would benefit from crawling over HTTP/2. If your server supports h2 and Googlebot already crawls a lot from your site, you may already be eligible for the connection upgrade, and you don't have to do anything.

If your server still only talks HTTP/1.1, that's also fine. There's no explicit drawback for crawling over this protocol; crawling will remain the same in both quality and quantity.

How to opt out

Our preliminary tests showed no issues or negative impact on indexing, but we understand that, for various reasons, you may want to opt your site out of crawling over HTTP/2. You can do that by instructing the server to respond with a 421 HTTP status code when Googlebot attempts to crawl your site over h2. If that's not feasible at the moment, you can send a message to the Googlebot team (however, this solution is temporary).
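The decision described above can be sketched as a small function. This is only an illustration of the logic, assuming your server or framework exposes the negotiated protocol and the request's user agent. The function name, the accepted protocol strings, and the user-agent substring check are all assumptions on our part, and in practice this rule would usually live in your web server's configuration rather than in application code.

```python
from http import HTTPStatus
from typing import Optional


def opt_out_status(negotiated_protocol: str, user_agent: str) -> Optional[HTTPStatus]:
    """Return 421 when Googlebot arrives over h2, or None to serve normally.

    `negotiated_protocol` is whatever label your server gives the connection
    (assumed here to be one of "h2", "HTTP/2", or "HTTP/2.0").
    """
    is_h2 = negotiated_protocol.lower() in {"h2", "http/2", "http/2.0"}
    # Assumption: a simple substring check is enough to spot Googlebot here.
    is_googlebot = "googlebot" in user_agent.lower()
    if is_h2 and is_googlebot:
        return HTTPStatus.MISDIRECTED_REQUEST  # HTTP 421
    return None


if __name__ == "__main__":
    googlebot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
    print(opt_out_status("h2", googlebot_ua))
    print(opt_out_status("HTTP/1.1", googlebot_ua))
```

Requests from other clients, and Googlebot requests over HTTP/1.1, are left untouched, so regular visitors can keep using h2.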

If you have more questions about Googlebot and HTTP/2, check the questions we thought you might ask. If you can't find your question, write to us on Twitter and in the help forums.

Posted by Jin Liang and Gary

Questions that we thought you might ask

Why are you upgrading Googlebot now?

The software we use to enable Googlebot to crawl over h2 has matured enough that it can be used in production.

Do I need to upgrade my server ASAP?

It's really up to you. However, we will only switch to crawling over h2 for sites that support it and will clearly benefit from it. If there's no clear benefit to crawling over h2, Googlebot will continue to crawl over h1.

How do I test if my site supports h2?

Cloudflare has a blog post with a plethora of different methods to test whether a site supports h2. Check it out!
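As one more option, here is a minimal sketch in Python that opens a TLS connection and asks the server, via ALPN, which protocol it prefers. It uses only the standard library ssl and socket modules. The function names are our own, and the check needs network access to the host you pass in, so treat it as a starting point rather than a definitive test.

```python
import socket
import ssl
from typing import Optional


def build_alpn_context() -> ssl.SSLContext:
    """Create a TLS context that offers h2 first, then HTTP/1.1."""
    context = ssl.create_default_context()
    context.set_alpn_protocols(["h2", "http/1.1"])
    return context


def is_h2(protocol: Optional[str]) -> bool:
    """True if the negotiated ALPN protocol is HTTP/2."""
    return protocol == "h2"


def negotiated_protocol(host: str, port: int = 443, timeout: float = 5.0) -> Optional[str]:
    """Connect to the host and return the ALPN protocol the server selected.

    Returns None if the server did not select any offered protocol.
    """
    context = build_alpn_context()
    with socket.create_connection((host, port), timeout=timeout) as raw_sock:
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            return tls_sock.selected_alpn_protocol()


def supports_h2(host: str) -> bool:
    """Convenience wrapper: does the host negotiate h2 over TLS?"""
    return is_h2(negotiated_protocol(host))
```

Calling supports_h2 with your own hostname tells you whether the server picked h2 during the TLS handshake.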

How do I upgrade my site to h2?

This really depends on your server. We recommend talking to your server administrator or hosting provider.

How do I convince Googlebot to talk h2 with my site?

You can't. If the site supports h2, it is eligible for being crawled over h2, but only if that would be beneficial for the site and Googlebot. If, for example, crawling over h2 would not result in noticeable resource savings, we would simply continue to crawl the site over HTTP/1.1.

Why are you not crawling every h2-enabled site over h2?

In our evaluations we found little to no benefit for certain sites (for example, those with very low QPS, that is, queries per second) when crawling over h2. Therefore we have decided to switch crawling to h2 only when there's a clear benefit for the site. We'll continue to evaluate the performance gains and may change our criteria for switching in the future.

How do I know if my site is crawled over h2?

When a site becomes eligible for crawling over h2, the owners of that site registered in Search Console will get a message saying that some of the crawling traffic may be over h2 going forward. You can also check in your server logs (for example, in the access.log file if your site runs on Apache).
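If you'd like to check your logs with a script, here is a minimal sketch in Python. It assumes an Apache-style combined log format in which the request line is quoted and ends with the protocol (for example "GET / HTTP/2.0"); your server's format may differ. The function name and sample lines are made up for illustration, and the user-agent substring check does not verify that a request really came from Googlebot.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the quoted request line, e.g. "GET /page HTTP/2.0",
# and captures the protocol part.
REQUEST_RE = re.compile(r'"[A-Z]+ \S+ (HTTP/[\d.]+)"')


def googlebot_protocol_counts(lines: Iterable[str]) -> Counter:
    """Count requests per HTTP protocol for lines mentioning Googlebot."""
    counts: Counter = Counter()
    for line in lines:
        if "Googlebot" not in line:
            continue  # skip other clients
        match = REQUEST_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample lines in combined log format.
SAMPLE_LINES = [
    '66.249.66.1 - - [17/Nov/2020:10:00:00 +0000] "GET / HTTP/2.0" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [17/Nov/2020:10:00:01 +0000] "GET /about HTTP/1.1" 200 2048 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [17/Nov/2020:10:00:02 +0000] "GET / HTTP/2.0" 200 5120 '
    '"-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

if __name__ == "__main__":
    for protocol, count in googlebot_protocol_counts(SAMPLE_LINES).items():
        print(protocol, count)
```

To run it against a real log, pass an open file object instead of the sample list, since the function accepts any iterable of lines.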

Which h2 features are supported by Googlebot?

Googlebot supports most of the features introduced by h2. Some features like server push, which may be beneficial for rendering, are still being evaluated.

Does Googlebot support plaintext HTTP/2 (h2c)?

No. Your website must use HTTPS and support HTTP/2 in order to be eligible for crawling over HTTP/2. This is equivalent to how modern browsers handle it.

Is Googlebot going to use the ALPN extension to decide which protocol version to use for crawling?

Application-layer protocol negotiation (ALPN) will only be used for sites that are opted in to crawling over h2, and the only accepted protocol for responses will be h2. If the server responds during the TLS handshake with a protocol version other than h2, Googlebot will back off and come back later on HTTP/1.1.

How will different h2 features help with crawling?

The most prominent of the many benefits of h2 include:

Multiplexing and concurrency: Fewer TCP connections open means fewer resources spent.
Header compression: Drastically reduced HTTP header sizes will save resources.
Server push: This feature is not yet enabled; it's still in the evaluation phase. It may be beneficial for rendering, but we don't have anything specific to say about it at this point.

If you want to know more about specific h2 features and their relation to crawling, ask us on Twitter.

Will Googlebot crawl more or faster over h2?

The primary benefit of h2 is resource savings, both on the server side and on Googlebot's side. Whether we crawl using h1 or h2 does not affect how your site is indexed, and hence it does not affect how much we plan to crawl from your site.

Is there any ranking benefit for a site in being crawled over h2?

No.

Sharing what we learned on the first Virtual Webmaster Unconference

Tuesday, September 15, 2020

The first Virtual Webmaster Unconference successfully took place on August 26th and, as promised, we’d like to share the main findings and conclusions here.

How did the event go?

As communicated before, this event was a pilot, in which we wanted to test a) whether there was an appetite for a very different type of event, and b) whether the community would actively engage in the discussions.

To the first question, we were overwhelmed by the interest in participating; it definitely exceeded our expectations and it gives us fuel to try out future iterations. Despite the frustration of the many who did not receive an invitation, we purposefully kept the event small. This brings us to our second point: it is by creating smaller venues that discussions can happen comfortably. Larger audiences are perfect for more conventional conferences, with keynotes and panels. The Virtual Webmaster Unconference, however, was created to hear the attendees’ voices. And we did.

What did we learn in the sessions?

In total, there were 17 sessions. We divided them into two blocks: half of them ran simultaneously in block 1, the other half in block 2. There were many good discussions and, while some teams took on a few suggestions from the community to improve their products and features, others used the session to bounce ideas around and share knowledge.

What were the biggest realizations for our internal teams?

Core Web Vitals came up several times during the sessions. The teams realized that the metrics still feel rather new to users, and that people are still getting used to them. Although Google has provided resources on them, many users still find them hard to understand and would like additional Google help docs aimed at less technical users. Separately, the Discover session shared its most recent documentation update.

The topic of findability of Google help docs was also a concern. Attendees mentioned that it should be easier for people to find the official search docs, in a more centralized way, especially for beginner users who aren't always sure what to search for.

Great feedback came out of the Search Console brainstorming session, around which features work very well (like the monthly performance emails) and which don’t work as well for Search Console users (such as messaging cadence).

The Site Kit for WordPress session showed that users were confused about data discrepancies they see between Analytics and Search Console. The Structured Data team realized that they still have to focus on clarifying some confusion between the Rich Results Test and the Structured Data Testing Tool.

The e-commerce session concluded that there is a lot of concern around the heavy competition that small businesses face in the online retail space. To get an edge over large retailers and marketplaces, e-commerce stores could try to focus their efforts on a single niche, thus driving all their ranking signals towards that specific topic. Additionally, small shops have the opportunity to add unique value by providing expertise, for example, by creating informative content on product-related topics and thus increasing relevance and trustworthiness for both their audience and Google.

What are the main technical findings for attendees?

The JavaScript Issues session concluded that third-party script creep is an issue for developers. Also, during the session Fun with Scripts!, attendees saw how scripts can take data sets and turn them into actionable insights. Some of the resources shared were: Codelabs, the best place to learn something quickly; Data Studio, if you’re interested in Apps Script or building your own connector; and a starting point to get inspired: https://developers.google.com/apps-script/guides/videos

Some myths were also busted...

There were sessions that busted some popular beliefs. For example, there is no inherent ranking advantage from mobile-first indexing, and making a site technically better doesn't mean that it's actually better, since the content is key.

The Ads and SEO Mythbusting session was able to bust the following false statements:

1) Ads that run on Google Ads rank higher & Sites that run Google Ads rank better (False)

2) Ads from other companies causing low dwell time/high bounce lower your ranking (False)

3) Ads vs no ads on site affects SEO (False)

For the community, with the community

As this event was interactive, we are extremely happy to see how friendly, constructive and productive the conversations were. We'd like to also use the opportunity to thank our Product Experts who facilitated sessions, namely (in alphabetical order) Ashley Berman-Hale, Dave Smart, Kenichi Suzuki and Mihai Aperghis along with the many facilitating Googlers.

What can you expect in the future?

As we mentioned previously, the event was met with overwhelmingly positive responses from the community. We see that there is both a need for meaningful conversations between Googlers and the community and a format that makes them happen, so we're happy to say: we will repeat this in the future!

Based on the feedback we got from you all, we are currently exploring how we will run future events with regard to time zones, languages, and frequency. We've learned a lot from the pilot event and we're using these learnings to make future Virtual Webmaster Unconferences even more accessible and enjoyable.

On top of working on the next editions of this event format, we heard your voice and we will have more information about an online Webmaster Conference (the usual format) very soon, as well as other topics. To stay informed, make sure you follow us on Twitter, YouTube, and this blog so that you don’t miss any updates on future events or other news.

Thanks again for your fantastic support!

Posted by Aurora Morales & Martin Splitt, your Virtual Webmaster Unconference team

Webmaster Central Blog