We added a summarised version below for those who prefer the written word, made easy for you to skim and record top insights! 📝
Additional note from community moderators: We’re presenting the insights as-is and do not promote any specific tool, platform, or brand. This is to simply share raw experiences and opinions from actual voices in the analytics space to further discussions.
Prefer watching over listening? Watch the Full Episode here ⚡️
Introducing Sitian Gao | Our Analytics Hero at Your Service 🦸🏻♂️
Previously an Analytics Project Lead for Global Analytics Tech at Wayfair, Sitian recently transitioned into the role of Senior Product Manager at Zalando, a move that underlines how closely the skills of the analytics and product domains overlap and makes her story even more relevant for the data product wave.
With experience in delivering impactful projects, Sitian focuses on driving strategic decisions through analytics and data-driven storytelling to enable orgs to create value in customer experiences and business operations.
Sitian focuses on the use of analytics, especially the integration of advanced analytics into retail and supply chain strategies, and on innovative analytics frameworks that enhance decision-making processes in e-commerce and logistics.
We highly appreciate her joining the MD101 initiative and sharing her much-valued insights with us!
We’ve covered a RANGE of topics with Sitian. Dive in! 🤿
TOC
Data Layers for Supply Chain: An Overview
Decentralising Maintenance for Data Products
Go-to Analytics Stack
Key Governance Factor: Identifying the Single Source of Truth
Most Effective Way to Deal with Multiple Sources
Growth Map for Analytics Engineers: Empathy for Stakeholders
Leading Analytics Projects: Vision, Hands-On Collaboration, and Stakeholder Management
Operational & Internal Tools
Onboarding New Tools & Platforms: How Decisions Are Made
Experience with DBT
Does a Background in Statistics Help an Analytics Lead?
Furthering Business Growth with Data Modeling
Missing Pieces of the Current Analytics Stack: Enterprises vs. Small Businesses
Semantic Layer in Practice
Extent of Adoption of the Semantic Layer
Impacting Key Business KPIs as an Analytics Vertical
Are Data Catalogs Making it Easy for Analytics Leads?
All About Data Products: POV, Types, Goals & Requirements
A Good Data Product: Threshold for Coverage Acceptance, Domain Alignment, Apt Semantics, & More!
Evolution of the “Data Product Manager” in Supply Chain Analytics
Go-to Resources
Meeting the Real Sitian
Data Layers for Supply Chain: An Overview
The most recent project I worked on involved building data layers and data warehousing for the supply chain space. It required connecting different systems across the entire supply chain, which is a challenge because the data exists at varying levels of granularity. A key part of the work was figuring out how to make the data layer meaningful for end-users, ensuring the system could handle data requests from multiple sources. It’s highly technical but also requires a strong understanding of the final product you're aiming to deliver and aligning the goals with stakeholders to make the execution successful.
Decentralising Maintenance for Data Products
Managing six data products for Wayfair's global supply chain involved a decentralized approach. We had a team for each data product, but the tickets and projects were centrally managed within the same vertical. We used two-week sprints to finish tasks, similar to software development cycles, and project leads were responsible for communicating progress to stakeholders.
We used intake forms to manage stakeholder requests and maintenance tickets, focusing on both new projects and maintaining the live pipeline. Our monitoring system was integrated into Slack for real-time alerts, helping us address issues promptly. Maintenance typically took up 10-20% of our time, with the majority of our focus on developing new features.
Go-to Analytics Stack
For our backend infrastructure, we used SQL scripts for queries and Python with Airflow to build pipelines. It’s a standard approach and it worked well for us. To keep the backend robust, we also used DBT (Data Build Tool), which helps us map out how data flows between tables. That is useful both for the backend team, who can visualize the connections, and for end users, who can understand the data flow from the front end.
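To make the idea of mapping data flow between tables concrete, here is a minimal sketch in plain Python. It is not DBT's or Airflow's API, only an illustration of the underlying dependency graph. The table names and the `LINEAGE` mapping are hypothetical and do not come from the interview.

```python
from graphlib import TopologicalSorter

# Hypothetical lineage: each table maps to the set of tables it reads from.
LINEAGE = {
    "stg_orders": {"raw_orders"},
    "stg_shipments": {"raw_shipments"},
    "fct_deliveries": {"stg_orders", "stg_shipments"},
    "rpt_on_time_rate": {"fct_deliveries"},
}


def build_order(lineage):
    """Return tables in an order where every upstream table comes first."""
    return list(TopologicalSorter(lineage).static_order())


def upstream_of(table, lineage):
    """Collect every table that feeds into `table`, directly or indirectly."""
    seen = set()
    stack = [table]
    while stack:
        current = stack.pop()
        for parent in lineage.get(current, set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


order = build_order(LINEAGE)
report_inputs = upstream_of("rpt_on_time_rate", LINEAGE)
```

The build order is the kind of information a backend team would use to schedule runs, while the upstream set is what an end user would want to see when asking where a report's numbers come from.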
Key Governance Factor: Identifying the Single Source of Truth
Data governance is essential, especially with tools like DBT. To ensure effective governance, we start by defining the single source of truth with our stakeholders. It’s crucial to have a clear agreement, as different parties might have different ideas of what constitutes the “true” data. We hold engineering stand-ups to confirm the best sources and keep the feedback loop open with stakeholders. Data governance isn't just about setting up frameworks but about ongoing communication with upstream and downstream teams to ensure everyone is on the same page.
Most Effective Way to Deal with Multiple Sources
When dealing with multiple data sources, it’s important to prioritize the most reliable ones and use others as backup. I structure my pipelines with a main data source and use smaller scripts for backup solutions. It’s also crucial to communicate with stakeholders to prioritize which sources are most important. Setting up priority rankings for data sources helps me manage the flow of data more efficiently, ensuring that we focus on the most critical information and streamline backend processes.
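One way to picture a main source with ranked backups is the sketch below. It is an assumption about how such a ranking could be coded, not the pipeline described in the interview. The loader functions, source names, and priority numbers are hypothetical.

```python
def load_with_fallback(sources):
    """Try sources from highest priority (lowest number) to lowest.

    `sources` is a list of (priority, name, loader) tuples, where `loader`
    is a zero-argument function returning a list of rows.
    Returns (name, rows) for the first source that yields data.
    """
    errors = {}
    for _priority, name, loader in sorted(sources, key=lambda s: s[0]):
        try:
            rows = loader()
        except Exception as exc:  # a failing source should not stop the run
            errors[name] = str(exc)
            continue
        if rows:
            return name, rows
        errors[name] = "returned no rows"
    raise RuntimeError(f"all sources failed: {errors}")


# Hypothetical loaders standing in for real extracts.
def primary_feed():
    raise ConnectionError("feed unavailable")


def backup_export():
    return [{"shipment_id": 1}]


SOURCES = [
    (2, "backup_export", backup_export),
    (1, "primary_feed", primary_feed),
]

used_source, rows = load_with_fallback(SOURCES)
```

Keeping the ranking in one list means the priority order agreed with stakeholders lives in a single place, and the name of the source that was actually used can be logged alongside the data.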
Growth Map for Analytics Engineers: Empathy for Stakeholders
As an analytics engineer, you need to bridge the technical and business sides. It’s essential to understand both the engineering aspects and the product side to deliver value. In this role, you must act as both a technical expert and a product manager, knowing how to make the technical aspects work for the business.
The role blends art and science, combining analytical skills with a deep understanding of how data can help solve business challenges. My experience in strategy, operations, business analytics, and product analytics has helped me understand both the technical and business needs, making me better equipped to bridge the two worlds effectively.
Leading Analytics Projects: Vision, Hands-On Collaboration, and Stakeholder Management
A project lead must have strong ownership, ensuring projects are delivered on time with quality. It's essential to have a vision for the product, aligned with business goals, and leverage business resources to maximize value. Technical expertise is crucial—being able to code and understand backend systems is necessary. The role also involves hands-on collaboration, coding, and maintaining relationships with stakeholders for future project opportunities. It's about managing the bigger picture while identifying opportunities for growth within the team.
Operational & Internal Tools
Tools like JIRA help centralize stakeholder communication and manage sprints. Internal tools developed by the team also play a significant role in streamlining operations. Communication tools and clear context sharing are vital for better collaboration across the project cycle.
Onboarding New Tools & Platforms: How Decisions Are Made
When exploring new tools, it's important to evaluate functionality and cost. If an existing tool meets all needs, there’s no reason to switch. Tools are assessed through a collaborative process with the procurement team, who gathers feedback and evaluates the tool’s potential cost-effectiveness. Larger company decisions often guide the evaluation, with input from all levels, ensuring the right tools align with business goals.
Experience with DBT
DBT is quite useful, but the major challenge lies in pricing and the long-term commitment required once adopted. Once integrated into a project, it becomes difficult to migrate to other tools, creating dependency. Despite the challenges, if it’s helpful and integrated well, sticking with DBT can be advantageous.
Does a Background in Statistics Help an Analytics Lead?
Statistics helps in evaluating data quality and ensuring metrics are accurate. While not everything in analytics engineering is driven by statistics, the foundational knowledge ensures you can assess if metrics are calculated properly and provide meaningful insights.
Furthering Business Growth with Data Modeling
Machine learning and statistics can be used to understand target customers, define customer groups, and shape marketing and product strategies. These techniques haven’t been employed in recent projects yet, but they could be integrated into future initiatives.
Missing Pieces of the Current Analytics Stack: Enterprises vs. Small Businesses
AI tools like GitHub Copilot make coding more efficient, and drag-and-drop pipeline solutions could simplify data engineering for small businesses. While larger organizations may not rely on such products, they can be highly beneficial for startups and small businesses, enabling them to maintain better data quality without a dedicated engineering team.
Semantic Layer in Practice
The semantic layer works well, especially when combined with DBT for organizing and structuring data from various sources like Google Cloud. It allows for customized tables and metrics tailored to stakeholders’ needs, although it's never a perfect solution. The key is ensuring maximum leverage of the data by making it easy to consume.
Extent of Adoption of the Semantic Layer
The adoption of the semantic layer depends on each team's scope and needs. Not every team is on board yet, but pilot programs are being used to persuade stakeholders and gradually transition to the new approach. It’s a process, but it’s starting to work well for everyone involved.
Impacting Key Business KPIs as an Analytics Vertical
The data engineering team has a clear structure in place, and the metrics they support are working well. This has positively impacted the organization’s key business KPIs, allowing the team to operate confidently and effectively.
Are Data Catalogs Making it Easy for Analytics Leads?
Data catalogs help streamline data access and resolve requests effectively, but their usefulness depends on how well they are built and maintained. When a catalog is well-maintained, it significantly aids in organizing and reflecting the most up-to-date data, making the process easier for analytics teams.
All About Data Products: POV, Types, Goals & Requirements
Data Products in Use: Everything today can be considered a product, including data products. A data product could be anything from a table or a dashboard to a data warehouse, with various types and uses based on what the team needs to create for the end user.
Role of Analytics Engineers: Analytics engineers create solutions for data consumers, similar to product engineers working on consumer products. They ensure quality, smooth backend operation, and accurate data.
Goal of Data Products: The end goal is to empower users to make better decisions or work more effectively with the data provided, ensuring that data products serve their intended purpose.
A Good Data Product: Threshold for Coverage Acceptance, Domain Alignment, Apt Semantics, & More!
Above 99% coverage is expected, but real-world scenarios (e.g., supply chain data from multiple vendors) may result in less than perfect coverage. Distribution must align with expectations; for instance, larger markets should dominate the data. Columns need clear labelling, accurate sources, and appropriate domain-specific requirements.
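The coverage and distribution checks described above could look roughly like this. It is a minimal sketch on made-up rows: the `carrier` and `market` column names and the sample data are hypothetical, and the 0.99 threshold simply mirrors the figure mentioned in the text.

```python
COVERAGE_THRESHOLD = 0.99  # mirrors the "above 99%" expectation


def coverage(rows, column):
    """Share of rows where `column` is present and not None."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(column) is not None)
    return filled / len(rows)


def market_shares(rows, column="market"):
    """Fraction of rows belonging to each market."""
    counts = {}
    for row in rows:
        key = row.get(column)
        counts[key] = counts.get(key, 0) + 1
    total = len(rows)
    return {key: count / total for key, count in counts.items()}


# Hypothetical sample: 200 rows, one of them missing its carrier.
rows = (
    [{"market": "US", "carrier": "A"}] * 120
    + [{"market": "DE", "carrier": "B"}] * 60
    + [{"market": "FR", "carrier": "C"}] * 19
    + [{"market": "FR", "carrier": None}]
)

carrier_coverage = coverage(rows, "carrier")
passes_threshold = carrier_coverage >= COVERAGE_THRESHOLD
shares = market_shares(rows)
largest_market = max(shares, key=shares.get)
```

Here the distribution check is reduced to confirming which market dominates. A fuller version would compare each share against an expected range agreed with the domain team.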
Sensitive data may need to be masked, and domain-specific rules (e.g., time zones, measurements) should be followed. Ensure time stamps are consistent and correctly labelled for the specific event, considering the time zone.
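As an illustration of consistent, clearly labelled time stamps and masked sensitive fields, the sketch below converts a local event time to UTC and replaces an email address with a salted hash. The field names, the fixed UTC offset, and the salt are all hypothetical. A real pipeline would more likely use named time zones to handle daylight saving, and salted hashing is only one of several masking approaches.

```python
import hashlib
from datetime import datetime, timedelta, timezone


def to_utc(local_timestamp, utc_offset_hours):
    """Interpret an ISO timestamp at a fixed UTC offset and convert it to UTC."""
    naive = datetime.fromisoformat(local_timestamp)
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    return naive.replace(tzinfo=local_zone).astimezone(timezone.utc)


def mask(value, salt="demo-salt"):
    """Replace a sensitive value with a short, repeatable salted hash."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:12]


# Hypothetical raw event as it might arrive from a warehouse system.
raw_event = {
    "event": "delivered",
    "local_time": "2024-03-01T09:30:00",
    "utc_offset_hours": 1,
    "customer_email": "a@example.com",
}

# The output column name states both the event and the time zone.
cleaned_event = {
    "event": raw_event["event"],
    "delivered_at_utc": to_utc(
        raw_event["local_time"], raw_event["utc_offset_hours"]
    ).isoformat(),
    "customer_ref": mask(raw_event["customer_email"]),
}
```

Naming the column `delivered_at_utc` keeps the event and the time zone visible to anyone reading the table, which is the labelling point made above.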
Evolution of the “Data Product Manager” in Supply Chain Analytics
Data product managers help logistics companies better understand their data and improve customer experiences by managing data products like TMS (Transportation Management Systems) and WMS (Warehouse Management Systems).
Data product managers work with product managers to ensure customers get accurate and frequently updated data. They also help leverage data for competitive advantage. They must keep up with evolving tech, including AI, and stay updated on industry developments to bridge gaps between customer needs, data management, and the tools provided.
Go-to Resources
To stay updated on analytics tools and technologies, newsletters, release notes from big companies, YouTube channels, podcasts, and hands-on experimentation are recommended.
Meeting the Real Sitian!
Work-life balance is subjective. Everyone has their own goals, and the choice between focusing on career or personal life depends on one’s life stage and priorities.
Traveling and working out are key sources of motivation. Traveling offers new experiences and a change of environment, while working out improves both physical and mental well-being.
Sitian’s current goal is to maintain a consistent workout routine, and she aspires to increase her workout frequency despite the challenges of the changing seasons.
And yeah, I really enjoyed this conversation and hope what I shared helps people looking to transition into data roles!
📝 Note from Editor
The above insights are summarised versions of Sitian Gao’s actual dialogue. Feel free to refer to the transcript or play the audio/video to capture the true essence and details of her as-is insights. There’s also a lot more information and hidden bytes of wonder in the interview. Listen in for a treat!
Guest Connect 🤝🏻
Connect with me on LinkedIn 🙌🏻