Show HN: Data Engineering Project for Beginners (startdataengineering.com)
12 points by jkm2155 on June 7, 2020 | 3 comments


A simple project to help beginners get started with data engineering. This is my first post on HN; any feedback would be greatly appreciated.


Thanks for the illuminating post. I like how Apache Airflow is used to move the PySpark script to an S3 location so that it can be read by the EMR step. I remember working on a project where we wanted to automate a data pipeline using Airflow and ran into the problem of how to get our pipeline scripts to the right locations.


@gifflar Glad you found it illuminating :). Yeah, moving the Spark script to S3 using an Airflow task is usually the easiest approach.
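
For anyone reading along, here is a minimal sketch of the pattern discussed above. It is not the code from the linked post. The function names `upload_script` and `spark_step`, and the bucket and key values in the usage note, are hypothetical. It assumes `s3_client` behaves like a boto3 S3 client (only its `upload_file` method is used), and that the EMR cluster runs Spark steps through `command-runner.jar`.

```python
from typing import Any, Dict


def upload_script(s3_client: Any, local_path: str, bucket: str, key: str) -> str:
    """Copy a local PySpark script to S3 and return its s3:// URI.

    `s3_client` is assumed to expose upload_file(Filename, Bucket, Key),
    as a boto3 S3 client does.
    """
    s3_client.upload_file(Filename=local_path, Bucket=bucket, Key=key)
    return f"s3://{bucket}/{key}"


def spark_step(name: str, script_uri: str) -> Dict[str, Any]:
    """Build an EMR step definition that runs the uploaded script.

    The step asks EMR to run spark-submit against the script's S3 URI.
    """
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri],
        },
    }
```

In an Airflow DAG, `upload_script` could be the callable of one Python task (with `s3_client = boto3.client("s3")`), and the dict from `spark_step` could be handed to a downstream task that submits steps to the cluster, for example via the boto3 EMR client's `add_job_flow_steps`. Ordering the upload task before the step task is what guarantees the script is in place when EMR looks for it.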



