Protege - Solutions Engineer (Media)
Requirements
• 4-7 years of experience in data science, media analytics, technical curation, or similarly hands-on data roles.
• Strong SQL proficiency and comfort querying large, messy datasets to generate insight and action.
• Experience working with media metadata, embeddings, or unstructured content.
• Ability to translate nuanced customer or model requirements into concrete dataset specifications.
• High standards for data quality, operational rigor, and usability of delivered outputs.
• Clear communicator who can move between technical depth and customer-friendly clarity.
• Thrives in ambiguous, fast-moving environments and treats teammates with kindness.
Bonus if you also have:
• Familiarity with video/audio processing, embeddings, or multimodal AI workflows.
• Prior experience curating or packaging datasets for machine learning.
• Background in content analysis, recommendation systems, or information retrieval.
Working with Protege
• We move fast, but thoughtfully. Speed matters in what we're building, but so does intention. We're biased toward action and always learning.
• We're a lean, high-trust team. Everyone has real ownership. Clarity and autonomy drive our best work.
• We take our work seriously, not ourselves. We solve hard problems with humility and celebrate wins, big and small.
• We're kind, direct, and inclusive. We give feedback early and often, with the goal of helping one another grow.
• We're builders at heart. Every person at Protege is hands-on, resourceful, and focused on creating momentum.
• We grow fast, together. You'll be surrounded by people who care about impact, who challenge you to think bigger, and who are genuinely excited about what comes next.
Responsibilities
Own data quality and curate media datasets
• Partner with Sales and Solutions to translate customer requirements into curation strategies
• Work with imperfect partner data, including mismatched metadata, schema differences, and incomplete labeling
• Normalize and standardize datasets for reliable downstream use
• Query and analyze Protege’s media catalog using SQL, internal APIs, and metadata tools to identify relevant content
• Build validation checks and workflows to ensure dataset integrity before delivery
• Identify, debug, and resolve data quality issues across file structures, metadata, and content alignment
• Use AI tools and transcoded embeddings to surface and refine clip-level content
• Turn messy, real-world data into structured datasets that meet customer and model requirements
• Run iterative sample reviews with customers, incorporate feedback, refine selections, and ensure final packages meet spec
Be the catalog expert
• Build deep expertise in Protege’s media catalog structure, metadata, and growth patterns
• Track content coverage, diversity, and modality mix, and identify gaps relative to customer demand
• Partner with Product and Partnerships to share catalog insights that inform sourcing priorities
Operate across product, data, and customer functions
• Work cross-functionally to ensure content packaging meets technical, ethical, and licensing requirements
• Develop methods, scripts, and internal tools that improve curation efficiency and scale
• Help shape Protege’s delivery platform, including how internal users and customers search, sample, and export data
Drive human-in-the-loop media search and curation
• Work closely with embedding-based systems to iterate between algorithmic selection and human review
• Define best practices for embedding queries, relevance evaluation, and content diversity
• Maintain a high bar for operational excellence and quality assurance throughout the process
What Success Looks Like
30 days: Learn and get operational
• Build a working understanding of the media catalog, delivery lifecycle, and core tools.
• Establish strong cross-functional relationships and shadow live curation workflows.
60 days: Deliver and improve
• Lead dataset sampling and curation for active use cases, and document reusable workflows.
• Surface early insights on catalog coverage, metadata quality, and process improvements.
90 days: Scale and influence
• Create repeatable QA and delivery workflows that increase consistency and speed.
• Provide actionable feedback that shapes platform, sourcing, and catalog roadmap decisions.