
A tool designed to empower customers by helping them proactively analyse, predict, and resolve cluster capacity growth and expansion issues. Key features include the analysis of historical capacity and performance trends, prediction of future resource utilisation trends, comparison of workload metrics at different time points, simulation of cluster configuration changes, and the ability to subscribe to alerts for current or future undersized clusters.

I designed the UX flows for this feature and created a MoGraph video—scripted, animated, and produced by me—to get more eyes on it and help share the solution with marketing, product, business, customers and sales teams in a company-wide event. 

Context

Cohesity is a data security and management company that helps enterprises protect (Backup), manage, and derive value from their data across cloud-native, virtual, physical, and SaaS environments. Their platform unifies backup and recovery, disaster recovery, file and object services, data security, governance, and analytics under a single architecture.

Cohesity does not traditionally provide primary storage; instead, it focuses on secondary data workloads: backup and restore copies of production data optimized for cost, security, and scale.

Cohesity operates globally in the enterprise, government, healthcare, education, and financial sectors. Its platform scales to petabytes of data across multi-cloud and hybrid environments.

As of December 10, 2024, after combining with Veritas’s enterprise data protection business, Cohesity serves over 12,000 enterprise customers, including public-facing customers such as Nasdaq, Delta, Broadcom, Nationwide, Salesforce, Cisco, Novartis, and Siemens, illustrating adoption across finance, tech, insurance, aerospace, and pharma. Its customer base includes more than 85 of the Fortune 100, three of the top‑five U.S. banks, and three of the top‑five U.S. health insurers.


IT Teams initiate setup and manage day‑to‑day backup infrastructure operations.

 

CISOs/CIOs define policies and ensure Cohesity aligns with organisational data protection goals.

 

Developers/DevOps integrate Cohesity via APIs to automate workflows (such as backup scheduling, restore operations, and cloud sync).

 

Data Analysts & Governance Teams leverage Cohesity’s metadata capabilities for insights such as compliance checks, data classification, and auditing.

The Problem

As Cohesity grew, so did its customer base—ranging from small teams managing just a few clusters to large enterprises handling thousands of clusters across their infrastructure with very different needs. With this kind of scale, new, diverse challenges began to surface and solving problems at this larger scale was crucial to the company's success.

To understand these challenges, we actively listened to customers in several ways:

  • User Requests: We received emails directly from customers about their requirements.

  • Sales / Field Team Feedback: We gathered insights from our sales and field teams, who talk to customers every day.

  • Support Team Feedback: We reviewed user reports and support tickets to find recurring issues.

All of these requests and feedback from customers are converted into Jira tickets so that each issue can be tracked and resolved.


From this data, clear patterns emerged across different parts of our product. One critical issue stood out above the rest: many customers were struggling with cluster sizing and infrastructure planning. This issue wasn’t just technical; it had a real business impact. In other words, customers needed better guidance and tools for planning their cluster resources. It became a top priority for us because it directly affected our largest customers. In many cases, the customers managing the most clusters are also our highest-paying clients. If we didn’t act quickly, we risked losing them.

This became a key focus of my design journey: solving the cluster sizing and infrastructure planning challenge. By improving this experience, we aimed to help all customers, especially those with large-scale deployments, manage their clusters more effectively and avoid frustration or resource issues.

User Persona

Who raises these concerns or requests?

 

As we identified the source of these concerns, they were traced back to two distinct user personas.

Storage / Deployment Architects lead the technical design and strategic deployment of Cohesity systems at scale across hybrid data environments. They develop deployment blueprints, including architecture, sizing, integration, and automation strategies, with Cohesity’s DataProtect and secondary storage solutions. Additionally, the backup infras

Backup / Infrastructure Administrators are operational custodians who manage daily operations such as backup policies (scheduling, retention, replication, and compliance), error troubleshooting, and cluster health monitoring. They oversee cluster administration: node addition, upgrades, and configuration.

They conduct capacity planning and trend analysis, generate usage reports, and provide vendor support. They also maintain infrastructure consistency using ITSM tools (e.g. ServiceNow), ensure 24x7 availability, and coordinate audits and compliance checks for seamless backup operations.

What is the relationship between these two end-user types?

 

These two roles are complementary: Architects set up scalable, repeatable frameworks; Administrators keep them healthy, compliant, and efficient.

Timeline / Process

What goes into creating a design experience? 

 

My design process moves from understanding the problem, through defining needs and design intent, exploring ideas, and creating, testing, and delivering solutions, to refining the final experience.

Timeline

Research

"This brief research focuses on understanding the end-to-end user experience of managing data backup infrastructure across cluster environments, from planning and setup to monitoring, securing, and maintaining the reliability and consistency of backup infrastructure at scale.
The goal is to identify key challenges, pain points, and security concerns faced by administrators and operators, while uncovering insights and opportunities to design a more intuitive, reliable, and user-centric backup infrastructure management experience."

 Research Sample & Methodology 

Qualitative primary research: 1:1 remote, in-depth, moderated interviews and discussion sessions over Zoom about participants' daily operations, observations, and needs. The data was collected for analysis and synthesis of observed behaviours, pain points, and insights.

Image by Plufow Le Studio

🥷 Senior Support Agent : 2 Units 

Internal : Cohesity Employee

Criteria : A Support Agent who is specialised in the Internal Live Sizer Tool and predominantly assigned to solve the sizing-related queries. Collaborates with developers if required to aid with workload analysis.

Experience : 5+ years of experience in the domain, and 3+ years of experience at Cohesity. Handled customer accounts that manage more than 1,000 clusters.

Location : Anywhere


🥷 Senior Storage Architect : 2 Units  

External : Cohesity Customer

Criteria : Storage Architect who directly collaborates with Cohesity's Support, Sales and Field Agents for forward and reverse sizing of the cluster workloads, annual growth analysis, and planning cluster configurations, licenses, and infrastructure expansion plans.

Experience : 5+ years of experience in the domain, and 3+ years of experience collaborating with Cohesity agents and expanding backup infrastructure.

Location : Anywhere (JPMorgan Chase, Salesforce)


🥷 Senior Sales Agent : 2 Units 

Internal : Cohesity Employee

Criteria : A Sales Agent who is specialised in the Internal Live Sizer Tool and assigned to evaluate customer needs based on workload requirements, performance reports, policies, and projected growth, and to estimate the cost of Cohesity systems, licenses, and cluster configurations.

Experience : 5+ years of experience in the domain, and 3+ years of experience at Cohesity. 

Location : Anywhere


🥷 Senior Backup Admin : 2 Units 

External : Cohesity Customer

Criteria : Backup Admin who directly collaborates with Cohesity's Support and Field Agents for forward and reverse sizing of the cluster workloads, annual growth analysis, workload planning, policy configurations, and creating the infrastructure expansion requirements sheet.

Experience : 5+ years of experience in the domain, and 3+ years of experience collaborating with Cohesity agents and helping expand backup infrastructure.

Location : Anywhere (JPMorgan Chase, Salesforce)

This is a WORK IN PROGRESS, please check back again in a few days for full update of this project. It will be worth your time :)

User Journey

BEHAVIOURS

  • Support agents use a tool built with Salesforce for sizer calculations.

  • Many requests require them to connect with developers to retrieve the workload data and the relevant policies.

  • Customers reach out to them only after the cluster has reached its full capacity, not earlier.

  • Customers request workload growth data for hundreds of clusters at once, and the reports take a long time to produce.

BEHAVIOURS

  • The report generated by Cohesity acts as an anchor point in driving investment decisions; this planning is mostly a bi-annual activity.

  • Most purchase decisions are long-term investments, and any mistake in the data can lead to a huge loss, so customers spend a lot of time analysing whether these reports were generated accurately, without any missing data.

  • Customers use other factors such as cloning, scaling, and multiplication of the workload growth when determining five-year purchase decisions.

  • As this work is labour-intensive and time-consuming, customers expect a quick turnaround time for the analysis reports.

PAINPOINTS

  • It is a time-consuming process because more stakeholders have to align and retrieve the data; sometimes errors delay it even further.

  • Customers reach out very late, and their requirements cannot be solved through the internal tool alone.

  • Analysis involves missing data, which directly impacts the report; customers often mistake this for sales representatives influencing the budget for their own profit.

PAINPOINTS

  • Cohesity’s lack of guidance on optimising current workloads puts pressure on customers, who have to devise manual, labour-intensive procedures to simulate clusters based on workload analysis reports and plan accordingly using other factors such as cloning and scaling.

  • Simulating these workloads is labour-intensive, as there are no market-ready tools for it.

  • The calculations are mostly performed in Excel sheets, and the final investment decisions are based on approximations rather than accurate figures.

Precedents

Competitors

  • Pure1 Storage Solutions - Analytics

  • Rubrik - One Storage

Methodology

  • Mapping the user journey and product workflows

  • Understanding their strengths and weaknesses

  • Insights and learnings

 

Persona

  • Understanding the target users of the competitors' products and their use cases

Insights

Design Brief

A tool designed to empower customers by enabling proactive analysis, prediction, and optimization of their infrastructure. It simulates workloads to plan for efficient performance, resolves cluster capacity growth challenges, and streamlines expansion efforts for seamless scalability.

 

Key features include the analysis of historical capacity and performance trends, prediction of future resource utilization trends, comparison of workload metrics at different time points, simulation of cluster configuration changes for optimization & efficiency, and the ability to subscribe to alerts for current or future undersized clusters.
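As a rough illustration of the prediction and alerting idea above, the sketch below fits a straight-line trend to daily usage samples and estimates how many days remain before a cluster crosses a capacity limit. It is a minimal sketch under stated assumptions, not how the shipped feature works: the function names, the linear model, the TiB units, the default 90% limit, and the time buckets are all hypothetical choices made for this example.

```python
from typing import Optional, Sequence


def daily_growth(used_tib: Sequence[float]) -> float:
    """Least-squares slope (TiB per day) over equally spaced daily samples."""
    n = len(used_tib)
    if n < 2:
        raise ValueError("need at least two daily samples")
    mean_x = (n - 1) / 2
    mean_y = sum(used_tib) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(used_tib))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var


def days_until_limit(used_tib: Sequence[float], capacity_tib: float,
                     limit_ratio: float = 0.9) -> Optional[float]:
    """Days until usage reaches limit_ratio of capacity.

    Returns 0.0 if the latest sample is already at or over the limit,
    and None if the fitted trend is flat or shrinking.
    """
    limit = capacity_tib * limit_ratio  # assumed "exceed limit" threshold
    if used_tib[-1] >= limit:
        return 0.0
    slope = daily_growth(used_tib)
    if slope <= 0:
        return None
    return (limit - used_tib[-1]) / slope


def forecast_bucket(days: Optional[float]) -> str:
    """Map a day count to an illustrative (hypothetical) alert bucket."""
    if days is None:
        return "no growth detected"
    if days == 0:
        return "exceeded"
    if days <= 182:
        return "within 6 months"
    if days <= 365:
        return "6-12 months"
    if days <= 730:
        return "1-2 years"
    return "beyond 2 years"


# Example: 30 days of history growing by 0.1 TiB per day on a 100 TiB cluster.
history = [50.0 + 0.1 * day for day in range(30)]
days = days_until_limit(history, capacity_tib=100.0)
print(round(days), forecast_bucket(days))
```

With this sample history the estimate comes out at roughly 371 days, which lands in the "1-2 years" bucket. A real forecast would likely need to handle seasonality, missing samples, and clusters with very short histories, none of which this sketch attempts.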

The process of analysing, predicting, and planning cluster capacity growth using manual labour and legacy tools is time-consuming, inefficient, and prone to human error, leading to poor infrastructure planning, poor user experience, and mistrust among customers, all at a high operational cost for the company.

ANALYSIS PREDICTION SIMULATION OPTIMIZATION

Proposed UX Flow

Explorations

Prototype

User Testing

Customer Feedback

Confidence Score

Detailed analysis of daily collected data as tables

Actions for the user to analyse missing-data errors: get help, documentation links, support ticket, etc.

 

What happens for clusters less than 6 months old (Graph & Forecast Data, Workload Change Analysis)?

PM OPEN HOUSE

 

Recommendations to reduce backup storage sizes

 

Add Licensing Information to the recommendation page

 

Alert on High DCR or Auto-Detection of anomaly Change

INTERNAL FEEDBACK

 

The storage data shown on the list of clusters should be used and not the consumed storage - Yet to be decided

 

90% of storage or load should be the exceed limit

 

The exceed limit estimation should also include the categories of 6-12 months and 1-2 year terms.

 

Can customers customise their own threshold ranges, for example by setting what percentage counts as a warning and what counts as the exceed limit? Also, what months or time period should define the warning or exceed limit period?

 

The recommendation configuration forecast can be represented as a graph for easier consumption by top executives at the customer end

 

A note that mentions “they can be in touch with a sales executive to get a personalised license quote for the type of node configuration”

 

Need multiple states of recommendation pages - when cluster is exceeded, less than 6 months, more than 6 months

 

Should the exceed limit (date) within the next 6 months or weeks be shown here?

 

Search and filter options for workload change reports are needed: what if the user wants to check the VMs under only one source type?

 

Can we indicate if the assigned policy itself got switched to another policy?

 

The policy change tooltip is hard to consume; the text information has to be improved

 

The views should be switchable between hierarchy, object, and job levels

 

The percentage change can be accompanied by an upwards or downwards arrow to indicate the direction more clearly

 

How useful is the recommendation? A star rating as feedback from users.

 

Error points

 

Can users dismiss the error?

 

What will be the action after clicking on "How to solve?"

 

What will happen once the user fixes the error - a. Will the error get removed from the graph? b. Will the missing data be retrieved?

 

Can there be an error even if the data is available?

Next Steps
