Remote
Full-time
Posted 2024-10-15
Data Scientist
$1,500 - $3,000
Trusta Labs
Exchange · 50-200 employees
big data · ETL · Spark
Job Description
1. ETL process design and development
Design, develop, and optimize big data ETL pipelines to ensure data accuracy, completeness, and timeliness. Understand business requirements, participate in data warehouse architecture design, and devise sound ETL solutions that meet the data processing needs of different business scenarios.
2. Spark application development
Use Spark for large-scale data processing and analysis; develop Spark applications that clean, transform, and load data. Tune Spark jobs to improve data processing efficiency and reduce resource consumption.
3. Python programming and script development
Write Python scripts and tools for data collection, preprocessing, and monitoring.
Collaborate with other teams to integrate Python code with Spark applications to build more complex data processing pipelines.
4. PySpark integration and development
Develop in PySpark, leveraging the strengths of both Python and Spark for efficient data processing and analysis. Resolve technical issues that arise during PySpark development, such as data type conversion, performance tuning, and memory management.
5. Data quality assurance
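The duties above center on cleaning, transforming, and loading data with validity checks. As a minimal illustration of that extract-transform-load shape, here is a plain-Python sketch (no Spark dependency; all record fields and names are hypothetical, not from the posting):

```python
# Hypothetical raw records standing in for an extracted dataset.
RAW_RECORDS = [
    {"user": "alice", "amount": "120.5"},
    {"user": "", "amount": "30"},               # dropped: empty user
    {"user": "bob", "amount": "not_a_number"},  # dropped: non-numeric amount
    {"user": "carol", "amount": "75"},
]

def transform(records):
    """Clean and normalize raw records, dropping rows that fail validation."""
    cleaned = []
    for rec in records:
        user = rec.get("user", "").strip()
        if not user:
            continue  # completeness check: user is required
        try:
            amount = float(rec["amount"])  # type normalization
        except (KeyError, ValueError):
            continue  # accuracy check: amount must be numeric
        cleaned.append({"user": user, "amount": amount})
    return cleaned

def load(rows, warehouse):
    """Append cleaned rows to the target table; return the row count loaded."""
    warehouse.extend(rows)
    return len(rows)

warehouse = []
loaded = load(transform(RAW_RECORDS), warehouse)
print(loaded)  # 2 valid rows survive cleaning
```

In a real Spark job the same clean/transform/load steps would run over distributed DataFrames rather than an in-memory list; this sketch only shows the validation logic the role describes.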
Job Requirements
Benefits
Please email your resume to romola.wang@trustalabs.com