Google's new cloud service eases data preparation for machine learning
One of the challenges that data scientists face when running machine learning workloads is processing information before it’s ready for use. Google unveiled a new cloud service Thursday aimed at easing that pain.
Google Cloud Dataprep will automatically detect data schemas, joins, and anomalies like missing or duplicate values, without requiring coding. After that, it will help users build a set of rules for processing the information. Those rules are then built in Apache Beam format and can be run in products like Google’s Cloud Dataflow, which processes information as it’s imported into services like the BigQuery data warehouse.
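To make the idea of anomaly detection concrete, here is a minimal sketch in plain Python of the kind of check described above: flagging rows with missing values and rows that are exact duplicates. It is only an illustration and not how Cloud Dataprep works internally; the function name `profile_rows`, the field names, and the sample data are all hypothetical.

```python
from collections import Counter


def profile_rows(rows, required_fields):
    """Report rows with missing values and rows that appear more than once.

    Hypothetical helper for illustration only; not part of any Google API.
    """
    missing = []
    for index, row in enumerate(rows):
        # Treat None and the empty string as "missing".
        empty = [f for f in required_fields if row.get(f) in (None, "")]
        if empty:
            missing.append((index, empty))

    # Count each row by its values in the required fields.
    counts = Counter(tuple(row.get(f) for f in required_fields) for row in rows)
    duplicates = [key for key, n in counts.items() if n > 1]
    return {"missing": missing, "duplicates": duplicates}


# Made-up sample data: row 1 has an empty city, rows 0 and 2 are identical.
sample = [
    {"id": 1, "city": "Oslo"},
    {"id": 2, "city": ""},
    {"id": 1, "city": "Oslo"},
]
report = profile_rows(sample, ["id", "city"])
```

A rule built from such a report might drop the duplicate row and fill or reject the row with the empty field before the data is loaded into a warehouse.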
While Cloud Dataprep is built to prepare data for machine learning, the system also uses machine learning itself to try to determine which rules will be most useful for customers. As of Thursday, it’s available in private beta.
BigQuery is receiving a number of enhancements as well, including a new Commercial Datasets program that’s now available in public beta. It will let users take information from AccuWeather, Dow Jones, Xignite, HouseCanary, and Remine and directly feed it into BigQuery for further processing.
BigQuery can also now query data stored in Cloud Bigtable, Google’s managed NoSQL database offering for low-latency data. That means users can write a single SQL query that taps into information from both Bigtable and BigQuery. Previously, they had to write a separate program to search Bigtable.
Advertising customers will be able to send data from Google AdWords, DoubleClick Campaign Manager, DoubleClick for Publishers, and YouTube to BigQuery for further use in analytics and other big data applications. That feature may help encourage the company’s fleet of advertising customers to try Google’s cloud as it faces down Amazon and Microsoft.
Speaking of database news, the company announced that its Cloud SQL managed database service now has beta support for PostgreSQL in addition to MySQL.
All of the news was announced as part of Google Cloud Next, the company’s user conference for businesses and enterprises taking place in San Francisco. The announcements come alongside other news about the company’s cloud platform, including changes to pricing and support for custom runtimes in App Engine.
Source: InfoWorld Big Data