Olim In Tech

Sep 2, 2019 · 2 min read

Ideas for tackling the DataHack 2019 Challenges

DataHack is the first data-driven hackathon in Israel, designed to bring together the Israeli data community. This year I'll be mentoring, and I wanted to share some preliminary ideas for tackling some of this year's challenges.

Challenge #1: Faceless-recognition

The Challenge: Participants receive low resolution and occluded images of famous actors and will be asked to identify them.

Evaluation: This is a leader-board type competition. Participants will be ranked by model performance. Exact details soon.

https://github.com/orcam-public/datahack2019

Ideas

  1. Sample a few frames, pass them through a pretrained image super-resolution model, and then pass the result through a cognitive service trained for celebrity face recognition.

  2. Take a large-scale celebrity faces dataset, downsample the images, and try to predict the original faces by combining an image super-resolution network with an image classifier.

  3. Try to recover the original video clips from YouTube by pulling and sampling the first page of results for top celebrities and seeing if you can find a match.

  4. Run video clips through the free Video Indexer Service demo to get a baseline.
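
To make idea 2 a bit more concrete, here is a minimal sketch of how you might generate (degraded, original) training pairs from clean face images. It is only an illustration: I'm assuming grayscale images stored as plain lists of pixel rows, and the function names, the block-averaging downsampler, and the square occlusion patch are all my own placeholders rather than anything taken from the challenge repo. In practice you would probably do the same thing with an image library.

```python
import random

def downsample(image, factor):
    """Shrink a grayscale image (list of rows of ints) by averaging factor x factor blocks."""
    height, width = len(image), len(image[0])
    out = []
    for top in range(0, height - factor + 1, factor):
        row = []
        for left in range(0, width - factor + 1, factor):
            block = [image[y][x]
                     for y in range(top, top + factor)
                     for x in range(left, left + factor)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out

def occlude(image, top, left, size, fill=0):
    """Return a copy of the image with a size x size square at (top, left) set to fill."""
    result = [row[:] for row in image]
    for y in range(top, min(top + size, len(result))):
        for x in range(left, min(left + size, len(result[0]))):
            result[y][x] = fill
    return result

def make_training_pair(image, factor=4, patch=2, rng=None):
    """Build one (degraded, original) pair: downsample, then hide a random patch."""
    rng = rng or random.Random(0)  # fixed seed by default so runs are repeatable
    small = downsample(image, factor)
    top = rng.randrange(0, max(1, len(small) - patch + 1))
    left = rng.randrange(0, max(1, len(small[0]) - patch + 1))
    return occlude(small, top, left, patch), image
```

The degraded image becomes the model input and the original becomes the target, so the super-resolution network learns to undo the same kind of damage you expect in the challenge data. The downsampling factor and patch size are guesses you would want to tune against the real samples.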

Resources

Challenge #2: Devices Gone Rogue

The Challenge: Given access to traffic data of ~1M devices, find the devices that misbehave.

Evaluation: As this is an unsupervised challenge, the evaluation process will be a mix of classical leaderboard evaluation and in-person review of the models used.

https://github.com/armis-security/DataHack2019

Ideas

  1. Use a trial cloud service to make a quick and dirty anomaly detection baseline.

  2. Use Facebook's Prophet library to get an open source baseline, for example by flagging points that fall outside its forecast uncertainty interval.

  3. Use TagAnomaly, the tool developed by Omri Mendels, to annotate some of the anomalies and train a semi-supervised model.

  4. Check out Twitter's AnomalyDetection package, even though it isn't maintained.
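
If you want something even simpler than the tools above before the data lands, a robust z-score is a reasonable first sanity check. The sketch below is not Prophet or any of the services mentioned; it is a swapped-in statistical baseline, and I'm assuming each device can be summarized by a single number (say, total bytes sent). The device names and the 3.5 threshold are illustrative choices, not part of the challenge.

```python
from statistics import median

def robust_z_scores(values):
    """Score each value by its distance from the median, scaled by the
    median absolute deviation (MAD) so a few extreme devices do not skew the scale."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # At least half of the values equal the median, so there is no spread to scale by.
        return [0.0 for _ in values]
    # 0.6745 makes the score comparable to a standard z-score for normal data.
    return [0.6745 * (v - med) / mad for v in values]

def flag_rogue_devices(traffic_by_device, threshold=3.5):
    """Return the sorted ids of devices whose robust z-score exceeds the threshold."""
    devices = list(traffic_by_device)
    scores = robust_z_scores([traffic_by_device[d] for d in devices])
    return sorted(d for d, s in zip(devices, scores) if abs(s) > threshold)

# Hypothetical per-device traffic totals.
traffic = {"cam-1": 100, "cam-2": 110, "cam-3": 95, "cam-4": 105, "cam-5": 5000}
rogue = flag_rogue_devices(traffic)
```

Here only "cam-5" is flagged. A baseline like this also gives you a shortlist of devices worth labeling by hand if you go the semi-supervised route.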

Resources

Challenge #3: Lead Scoring

The Challenge: In our lead-scoring challenge you'll get 6 months of anonymous user events (over 4B records) and help us find the crème de la crème, the pick of the litter, the best of the best customers that could use our extra attention and get the most out of monday.com!

Evaluation: This is a leader-board type competition. Participants will be ranked by model performance.

Ideas:

  1. It's hard to know yet since they haven't released the data, but step one would be to build a baseline model using AutoML. You can get a free trial for the DataHack here.
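
Whatever AutoML tool you end up using, it will most likely want one row per user rather than raw events. Since the data isn't out yet, the sketch below is purely speculative: I'm assuming each record boils down to a (user_id, event_type) pair, and both the field names and the event types are made up for illustration.

```python
from collections import Counter, defaultdict

def build_user_features(events, event_types):
    """Collapse a stream of (user_id, event_type) records into one feature row per user.

    Each row holds the count of every event type in event_types, followed by
    the user's total number of events.
    """
    counts = defaultdict(Counter)
    for user_id, event_type in events:  # single pass, so events can be a generator
        counts[user_id][event_type] += 1
    return {
        user_id: [counter[t] for t in event_types] + [sum(counter.values())]
        for user_id, counter in counts.items()
    }

# Hypothetical events; the real schema is not known yet.
events = [
    ("u1", "login"),
    ("u1", "create_board"),
    ("u2", "login"),
    ("u1", "login"),
]
features = build_user_features(events, ["login", "create_board"])
```

With over 4B records you won't be able to hold the raw events in memory, which is why the function makes a single pass over an iterable. At that scale you would likely run the same aggregation in a distributed engine, but the resulting table has the same shape.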

Resources:

Challenge #4: Data Driven Art

The Challenge: Create art with Data and AI.

Evaluation: This is an open-ended vertical competition. Participants will be judged by a panel from Lightricks.

Ideas:

  • Generative art with NVIDIA GANs

  • Image colorization with DeOldify

  • TensorFlow.js art

Resources:

Hope you enjoyed these tips and that they help you with your DataHack planning. If you have any questions, comments, or topics you would like me to discuss, feel free to follow me on Twitter or reach out to me on Wednesday.

About the Author

Aaron (Ari) Bornstein is an avid AI enthusiast with a passion for history, engaging with new technologies, and computational medicine. As an Open Source Engineer on Microsoft's Cloud Developer Advocacy team, he collaborates with the Israeli Hi-Tech Community to solve real-world problems with game-changing technologies that are then documented, open sourced, and shared with the rest of the world.
