Marvel is seeking an experienced AWS Cloud Data Engineer with significant DevOps and SysOps experience. The company is headquartered in NYC, but this role is envisioned as a remote position and will focus on providing engineering support for a variety of analytics, data architecture and data science endeavors.
In this role you will primarily lead the design and evolution of the BI data lake, while working closely with the rest of the BI team to build a flexible and robust data infrastructure.
You will co-develop and maintain an evolving number of data pipelines to enable advanced reporting capabilities across our Subscription, Publishing, Digital, Consumer Products, Corporate and Marketing teams. This role will function as Marvel's primary subject matter expert on creating and maintaining efficient data pipelines to clean, conform and deliver large datasets, both structured and unstructured, to downstream destinations.
As the team's ETL expert, you will have ample opportunities to conduct complex data wrangling/munging operations using a variety of AWS and other industry-standard tools, including Python, Pandas, PySpark, Glue Studio, Lambda, advanced SQL, etc. You will also work with APIs, create complex materialized views and scheduled queries, and develop error logging routines and associated alerts. You can expect to work closely with the Enterprise Architect and Data Architect to design and deploy best-practice data governance operations as well.
Our team is small but capable, and we are looking for an intellectually curious engineer with excellent communication skills who is interested in working very closely with teammates to co-invent solutions to complex problems. Our department also encourages constant learning and cross-functional growth, so you can expect to explore and expand your knowledge in areas like supervised and unsupervised machine learning, graph databases, AI and applied statistics/mathematics.
• Develop batch and real-time data pipelines, and lead the integration of Marvel's many 1st, 2nd, and 3rd party data sources, while working closely with other engineering services such as the personalization and testing teams.
• Create data catalogs and validation routines to ensure quality and correctness of key operational datasets and metrics in real time.
• Build integrations with organic and paid media platforms to effectively deliver data to support the optimization of various KPIs.
• Partner with the Data Architect to build data infrastructure that enables activation, attribution and segmentation capabilities across growth, retention and marketing objectives.
• Collaborate with lifecycle and product marketing teams to democratize insights that will drive subscriber engagement using data-driven solutions.
• Coach other engineers and BI team members on best practices and technical concepts of building large-scale, robust and well-governed data platforms.
• Excellent communicator and collaborator, able to apply technical acumen to drive business outcomes.
• A natural team player with a willingness and desire to engage in cross-training in a small team environment.
• 4+ years in big data and/or data-intensive projects in industry or academic/research settings.
• 4+ years of deep experience developing in Python.
• Expert-level SQL developer, with an emphasis on Redshift but capable across multiple other databases such as PostgreSQL, MySQL, MongoDB, etc.
• Significant experience developing with PySpark.
• Experience engineering big-data solutions using technologies like Redshift, Spark, S3, DynamoDB, Glue Studio, AWS SageMaker, SageMaker Data Wrangler, etc.
• Demonstrated understanding of data engineering tools and practices, including platforms like Airflow, Databricks, Snowflake, and Jenkins.
• Experience deploying and running AWS-based data solutions, and comfort with tools such as CloudFormation, AWS Glue, Kinesis, DynamoDB, Athena and Lambda.
• Demonstrated experience applying Master Data Management, including metadata management, data lineage, and the principles of data governance.
• Ability to deliver technical solutions in the face of challenging data conditions.
• Experience implementing marketing technology stacks including real-time messaging and attribution pipelines.
• Experience using DMPs (BlueKai) and CDPs (mParticle, Segment) to create deterministic user profiles that can be leveraged across a variety of applications.
• Experience integrating with ML platforms and experimentation frameworks.
• Familiarity with front-end development frameworks and experience in full stack development a plus.
• An experimental and autodidactic mindset towards expanding your own capabilities.
• Familiarity with binary data serialization formats such as Parquet, Avro, and Thrift.
BA / BS Degree or relevant work experience.
This position is with Marvel Entertainment, LLC, which is part of a business segment we call Marvel Entertainment.
Jobcode: Reference SBJ-rjq1pe-44-192-54-67-42 in your application.