In a recent policy update, tech giant Google said it will collect data from publicly available online sources to train its AI models, including Bard.
Under the new rules, Google will be able to collect data from a variety of public sources, including social media posts, government records, and websites. This data will be used to train artificial intelligence models for various purposes, such as spam filtering, fraud detection, and language translation.
Google states that the use of public data is necessary to train accurate and effective AI models. The company also said it will take steps to protect user privacy, such as de-identifying data before it is used to train models.
Google’s policy page states: “We may share non-identifying data publicly and with our partners — such as publishers, advertisers, developers or rights holders. For example, we share public information to show trends in the general use of our services.”
What is Google’s new policy?
Google’s public data collection policy is not very transparent, so users must read this policy carefully to understand what information Google collects.
The updated policy reads: “Google uses the information to improve our services and develop new products, features, and technologies that will benefit our users and the public.
“For example, we may collect publicly available information online or from other publicly available sources to help train Google’s AI models and build products and features like Google Translate, Bard and Cloud AI capabilities. Or, if your business information appears on a website, we may index it and display it across Google services,” it added.
In the past, the company has used this information to update and train language models to improve its existing products, such as Google Translate. Now, the company has explicitly mentioned that all public data will be used to update its AI products.
The image above is from Google’s policy archive where green represents newly added information.
The dangers of data scraping
This new policy update may raise serious privacy and data theft concerns. While companies typically keep user data confidential for future use and new product development, Google’s new policy allows the company to use any public information to train its AI models.
This means that Google can access and process any type of data available on the Internet, including personal data. The company says it de-identifies the data, but problems could remain.
First, it may violate the privacy of individuals. When data is scraped without permission, individuals may not know that their data is being collected or how it is being used. This can lead to a number of problems, such as identity theft and financial fraud.
Second, scraped data can be used to create biased AI models. If AI models are trained on data pulled from the Internet, those models may reflect biases already in the data. This can lead to artificial intelligence models that discriminate against certain groups of people.
Finally, data scraping can disrupt the Internet. When data is scraped from websites in bulk, it can slow those websites down and make them difficult to use. The most recent example of this is the Twitter outage.
Elon Musk has expressed concern about data scraping and has decided to limit the number of tweets people can read per day. Twitter is also continuously working to make the platform more secure by monetizing various services.
In short, the new policy can certainly help Google create powerful AI, but it also poses a security risk. The policy could also lead to increased data theft and privacy violations. It is important to carefully monitor how Google enforces it.