Google Designs Its Own Custom Hardware Security Chips to Securely Identify and Authenticate Legitimate Google Devices at the Hardware Level (Image Credit: Google)
Google recently shared details of the security infrastructure protecting the data centers that house both its existing services and its growing Google Cloud Platform (GCP).
While many organizations are traditionally wary of giving out such information for fear of handing attackers an advantage, Google is not. There are two reasons: the first is to show potential GCP customers the extent of its data center security; the second is that Google is confident in that security.
In a recently published paper, Google describes the infrastructure in six layers, from the hardware infrastructure (including physical premises security), through service deployment, user identity, storage services, and internet communication, to operational security.
The attention to detail is immediately apparent. For the most part Google builds and owns its own data centers, and incorporates 'biometric identification, metal detection, cameras, vehicle barriers, and laser-based intrusion detection systems.' Where it hosts some servers in third-party data centers, it adds its own security -- such as Google-controlled independent biometric identification systems, cameras, and metal detectors -- to that provided by the data center operator.
The thousands of servers in a Google data center have server boards custom-designed by Google. "We also design custom chips, including a hardware security chip that is currently being deployed on both servers and peripherals. These chips allow us to securely identify and authenticate legitimate Google devices at the hardware level."
There is no assumed trust between any of the different Google services housed within or between the data centers. Any necessary inter-service communication is controlled by cryptographic authentication and authorization at the application layer. This applies to both Google services and user-supplied code for products like Google App Engine or Google Compute Engine. Particularly sensitive services such as cluster orchestration and some key management services are run on dedicated servers.
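The idea of authenticating and authorizing every inter-service call at the application layer can be illustrated with a minimal sketch. It is not Google's implementation: the service names, the shared HMAC keys, and the allow list below are all hypothetical stand-ins, and a real deployment would use infrastructure-issued credentials rather than shared secrets.

```python
import hashlib
import hmac

# Hypothetical per-service credentials (assumption for this sketch only).
SERVICE_KEYS = {
    "gmail": b"gmail-demo-key",
    "contacts": b"contacts-demo-key",
    "photos": b"photos-demo-key",
}

# Hypothetical allow list: which callers each target service accepts.
ALLOWED_CALLERS = {
    "contacts": {"gmail"},
}


def _message(caller: str, target: str, payload: bytes) -> bytes:
    """Bind caller, target, and payload together so none can be swapped."""
    return caller.encode() + b"|" + target.encode() + b"|" + payload


def sign_request(caller: str, target: str, payload: bytes) -> bytes:
    """The caller proves its identity by MACing the request with its key."""
    key = SERVICE_KEYS[caller]
    return hmac.new(key, _message(caller, target, payload), hashlib.sha256).digest()


def accept_request(caller: str, target: str, payload: bytes, tag: bytes) -> bool:
    """Accept only requests that are both authenticated and authorized."""
    key = SERVICE_KEYS.get(caller)
    if key is None:
        return False
    expected = hmac.new(key, _message(caller, target, payload), hashlib.sha256).digest()
    authenticated = hmac.compare_digest(expected, tag)
    authorized = caller in ALLOWED_CALLERS.get(target, set())
    return authenticated and authorized
```

In this sketch a correctly signed request from a service that is not on the target's allow list is still rejected, which is the point of having no assumed trust between services.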
Where inter-service communication is required and authorized, the infrastructure provides cryptographic privacy and integrity for RPC data on the network. All WAN traffic -- that is, from one data center to another -- is automatically encrypted. This automated encryption is now being extended to all internal traffic with the deployment of hardware cryptographic accelerators within the data centers.
Where inter-service communication is performed on behalf of an end user (for example, when Gmail needs to interact with Contacts), a short-lived permission ticket is generated to ensure that Gmail can only access the contacts belonging to that user.
For stored data the infrastructure uses a central key management service. Storage services can be configured to encrypt data with keys from this key management service before it is written to physical storage. Encryption at the application layer allows the infrastructure to isolate itself from potential threats at the lower levels of storage, such as malicious disk firmware.
Services that need to be available on the internet are exposed through the Google Front End (GFE). This ensures that all TLS connections use the correct certificates and support perfect forward secrecy. "In effect," explains Google, "any internal service which chooses to publish itself externally uses the GFE as a smart reverse-proxy front end. This front end provides public IP hosting of its public DNS name, Denial of Service (DoS) protection, and TLS termination."
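A short-lived, user-scoped ticket of this kind might look roughly like the following sketch. The ticket format, the signing key, and the function names are assumptions made for illustration; the article does not describe how Google encodes or signs its tickets.

```python
import hashlib
import hmac

# Hypothetical key held by a central identity service (assumption).
IDENTITY_KEY = b"demo-identity-service-key"


def _sign(user: str, expires: float) -> str:
    """Sign the user name together with the expiry time."""
    body = f"{user}|{expires}".encode()
    return hmac.new(IDENTITY_KEY, body, hashlib.sha256).hexdigest()


def issue_ticket(user: str, issued_at: float, ttl: float = 60.0) -> dict:
    """Issue a ticket that names one user and expires after ttl seconds."""
    expires = issued_at + ttl
    return {"user": user, "expires": expires, "sig": _sign(user, expires)}


def may_read_contacts(ticket: dict, owner: str, now: float) -> bool:
    """Allow access only with a valid, unexpired ticket for the data owner."""
    expected = _sign(ticket["user"], ticket["expires"])
    if not hmac.compare_digest(expected, ticket["sig"]):
        return False  # forged or altered ticket
    if now >= ticket["expires"]:
        return False  # ticket has expired
    return ticket["user"] == owner
```

Because the user name is part of the signed body, a ticket issued for one user cannot be edited to grant access to another user's contacts, and the expiry limits how long a leaked ticket remains useful.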
DoS protection comes via several layers of hardware and software load balancers that can report to a central DoS service. If this detects a DoS attack, it can instruct the load balancers to throttle or drop associated traffic. A similar process occurs at the GFE, which has application layer information not seen by the load balancers.
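The reporting loop between load balancers and a central DoS service can be sketched as follows. The class name, the per-source request counts, and the fixed threshold are hypothetical simplifications; the article does not say what signals Google's service actually uses.

```python
from collections import Counter


class CentralDosService:
    """Aggregates traffic reports from many load balancers (sketch only)."""

    def __init__(self, threshold: int):
        # Hypothetical limit on requests per source within one window.
        self.threshold = threshold
        self.totals = Counter()

    def report(self, counts: dict) -> None:
        """A load balancer reports the request counts it saw per source."""
        self.totals.update(counts)

    def sources_to_throttle(self) -> set:
        """Sources whose combined traffic exceeds the threshold."""
        return {src for src, n in self.totals.items() if n > self.threshold}


service = CentralDosService(threshold=1000)
# Two balancers each see moderate traffic from the same source address.
service.report({"198.51.100.7": 600, "203.0.113.9": 20})
service.report({"198.51.100.7": 600, "203.0.113.9": 15})
blocked = service.sources_to_throttle()
```

Neither balancer sees enough traffic on its own to cross the threshold here; only the aggregated view flags the source, which is one reason to centralize the decision.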
User authentication goes beyond username and password with the addition of challenges for additional information based on a range of risk factors. 2FA is also offered, and Google worked with the FIDO Alliance to develop the Universal 2nd Factor (U2F) open standard.
To ensure its own safe software development, Google uses libraries and frameworks to eliminate XSS vulnerabilities, automated tools to detect bugs, and manual security reviews. It backs this with a public bug bounty program that has paid out several million dollars in rewards. It also prides itself on the effort it puts into, and the success it has had in, locating vulnerabilities in the open source software it uses. "For example, the OpenSSL Heartbleed bug was found at Google and we are the largest submitter of CVEs and security bug fixes for the Linux KVM hypervisor."
One of its primary methods for eliminating insider risks is to "aggressively limit and actively monitor" admin access. Methods include automation of some routine admin activity, the requirement for two-party approvals for some actions, and limited APIs that allow debugging without exposing sensitive information.
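A two-party approval rule can be expressed in a few lines. The sketch below is illustrative only: the class, the engineer names, and the rule that a single second engineer is sufficient are assumptions, not details taken from Google's paper.

```python
class SensitiveAction:
    """An admin action that needs approval from someone other than the requester."""

    def __init__(self, name: str, requester: str):
        self.name = name
        self.requester = requester
        self.approvers = set()

    def approve(self, engineer: str) -> None:
        """Record an approval; requesters cannot approve their own actions."""
        if engineer == self.requester:
            raise PermissionError("requester cannot approve their own action")
        self.approvers.add(engineer)

    def can_execute(self) -> bool:
        """The action may run once at least one other engineer has approved."""
        return len(self.approvers) >= 1


action = SensitiveAction("rotate-storage-keys", requester="alice")
ready_before = action.can_execute()
action.approve("bob")
ready_after = action.can_execute()
```

The design choice that matters is that the requester's own approval is rejected outright, so no single insider can both propose and authorize a sensitive change.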
Intrusion detection and response increasingly uses machine learning. "Rules and machine intelligence... give operational security engineers warnings of possible incidents." Red Team exercises are used to measure and improve the effectiveness of the rules.
The final section of the paper, a discussion of the Google Cloud Platform, is probably its main purpose. Built on the same infrastructure, GCP benefits from the same security processes but with additional service-specific improvements. The paper uses the Google Compute Engine (GCE) cloud service as an example.
GCE exposes its external API via the GFE -- and thus gains the same security features as all other services, including DoS protection and centrally managed SSL/TLS support. End user authentication is done via Google's centralized identity system, which provides additional features such as hijacking detection.
The creation of virtual machines is handled by the GCE management control plane. Any traffic from one data center to another is automatically encrypted, and traffic from one VM to another within the same data center is progressively being encrypted by the same hardware accelerators used for the rest of the infrastructure. VM isolation is based on the open source KVM stack, further hardened by Google.
The operational security controls described earlier are used to ensure that Google makes good on its cloud policy, "namely that Google will not access or use customer data, except as necessary to provide services to customers."
2016 saw Google's cloud business begin to make a meaningful impact on parent Alphabet's overall revenue. Together with the sale of digital content via its Play store, it accounted for $2.4 billion, or about 10%, of Google's $22.3 billion revenue in Q3 2016.