Approaching Realtime: Ingestion to Consumption with BigQuery and Looker
Mar 27, 2023
Introduction
Intuitive was selected to design and orchestrate the migration of the customer's existing SQL Server reporting environment to Google Cloud Platform (GCP), with integrations between BigQuery and Looker for advanced analytics and visualization.
Challenges
The customer required a solution that could handle larger data sets with better performance and scalability than their existing Looker integration. It also had to support future demand for more complex dashboards while ensuring that every selected service is FedRAMP High compliant.
The company faced several challenges during the technology integration process, including:
- Integration with Existing Systems
- Massive Data Sets and Streaming Data
- Complex Transformations on In-Flight Data
- Security and Compliance
Technology Solutions
Intuitive designed and built a highly available and scalable environment on Google Cloud Platform, leveraging Google Cloud Storage, BigQuery, Pub/Sub, Dataflow, and Cloud Monitoring to ingest, store, and transform data from SQL sources. A BigQuery and Looker integration is used to generate complex dashboards.
Intuitive successfully integrated the following innovative technologies:
- Automated Cloud Infrastructure as Code
- Hybrid Ingestion Framework (Batch and Streaming Data Paths)
- BigQuery Ingestion, Curation, Governance, Catalog and Self-Service Layers
- Source to Destination Data Orchestration pipelines
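One way to picture the hybrid ingestion framework listed above is a single transformation function shared by the batch and streaming paths, so both produce identical rows. The following is a minimal sketch under that assumption, not the customer's actual pipeline: the field names (`order_id`, `status`, `updated_at`) and the JSON record format are hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict


def to_row(record: Dict[str, Any]) -> Dict[str, Any]:
    """Normalize one source record into the row shape loaded into the warehouse.

    Assumes updated_at is an ISO 8601 timestamp that carries a UTC offset.
    """
    return {
        "order_id": int(record["order_id"]),
        "status": str(record["status"]).strip().upper(),
        "updated_at": datetime.fromisoformat(record["updated_at"])
        .astimezone(timezone.utc)
        .isoformat(),
    }


def from_stream_message(payload: bytes) -> Dict[str, Any]:
    """Streaming path: decode a JSON message body, then apply the shared transform."""
    return to_row(json.loads(payload.decode("utf-8")))


def from_batch_line(line: str) -> Dict[str, Any]:
    """Batch path: each line of an exported file is assumed to be one JSON record."""
    return to_row(json.loads(line))
```

In a Dataflow pipeline, functions like these would typically be applied per element (for example through Apache Beam's `beam.Map`), with only the source differing between the two paths.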
Implementation Strategy
Our customer needed to move its current infrastructure and application environments to Google Cloud Platform (GCP). The ingestion-to-consumption platform provides a highly available and scalable environment for the applications' ML/AI and advanced analytics workloads, with data delivered at the moment of need, and allows elastic expansion to meet consumption demand.
Intuitive employed a step-by-step approach to overcome the challenges faced during the integration process:
- Ideation and Business requirements mapping
- Custom Data Mesh Design Architecture supporting batch and real-time data in-flight
- Security and Compliance Assessment
- ROI Measurement and Monitoring
- Benchmarking and Analysis
Results and Impact
We completed a full reload of 94 tables into BigQuery, averaging about 2 minutes per table. The generated column feature of Microsoft SQL Server tables (which SQL Server calls computed columns) is not available in BigQuery; however, we were able to migrate the generated-column logic from Microsoft SQL Server into new BigQuery views to solve this issue. Additionally, load times for complex Looker dashboards dropped by 80-90% compared with Microsoft SQL Server.
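The generated-column migration described above can be sketched as a small helper that emits a BigQuery view definition exposing each computed value as a view column. This is an illustrative sketch only: the helper `build_view_ddl`, the dataset, the table, and the column names are hypothetical, and each expression is assumed to have already been translated by hand from the SQL Server definition into BigQuery SQL.

```python
from typing import Mapping, Sequence


def build_view_ddl(
    view_name: str,
    table_name: str,
    base_columns: Sequence[str],
    computed_columns: Mapping[str, str],
) -> str:
    """Return a CREATE OR REPLACE VIEW statement that re-creates computed columns.

    computed_columns maps each output column name to a BigQuery SQL expression.
    """
    # Pass the stored columns through unchanged.
    select_items = [f"  {col}" for col in base_columns]
    # Re-create each former generated column as an aliased expression.
    select_items += [f"  {expr} AS {name}" for name, expr in computed_columns.items()]
    select_list = ",\n".join(select_items)
    return (
        f"CREATE OR REPLACE VIEW `{view_name}` AS\n"
        f"SELECT\n{select_list}\n"
        f"FROM `{table_name}`;"
    )


# Hypothetical example: SQL Server defined line_total AS (quantity * unit_price).
ddl = build_view_ddl(
    view_name="reporting.order_lines_v",
    table_name="reporting.order_lines",
    base_columns=["order_id", "quantity", "unit_price"],
    computed_columns={"line_total": "quantity * unit_price"},
)
print(ddl)
```

Pointing Looker at the view rather than the base table keeps the derived values available to dashboards without storing them in the table itself.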
Lessons Learned
BigQuery and Looker are powerful tools for data storage, processing, and analysis. Here are five lessons learned from implementing them:
- Data preparation is key: To make the most of BigQuery and Looker, it's essential to prepare your data beforehand. This involves cleaning, transforming, and structuring the data to ensure that it's accurate, consistent, and easy to analyze.
- Invest in training and education: Both BigQuery and Looker are sophisticated tools that require some level of technical knowledge and expertise to use effectively. Investing in training and education helps your team get the most out of these tools and use them efficiently.
- Collaboration is crucial: BigQuery and Looker are collaborative tools that allow multiple users to access and analyze data simultaneously. Encouraging collaboration among your team can help to generate new insights and ideas, and to identify potential issues or errors.
- Continuous improvement is necessary: As with any tool, BigQuery and Looker require ongoing maintenance and improvement to ensure that they remain effective and efficient over time. This includes monitoring performance, identifying areas for improvement, and implementing changes and updates as needed.
- Clear communication and documentation are essential: To ensure that everyone is on the same page when using BigQuery and Looker, it's essential to establish clear communication channels and documentation. This includes documenting data definitions, data models, and business rules, as well as communicating changes and updates to the team in a clear and timely manner.
Conclusion
In conclusion, this customer's implementation of a real-time ingestion-to-consumption architecture built on BigQuery and Looker shows the proven value of cloud-native, tightly coupled technologies. By adopting a domain-driven approach and empowering individual teams to take ownership of their data, organizations can improve the quality and accessibility of their data, as well as generate new insights and innovations. However, successful implementation of ML and advanced analytics requires clear communication, collaboration, and ongoing learning and improvement. By following these principles, organizations can achieve greater agility, efficiency, and value from their data, as our customer's success demonstrates.