
A humble brag?


goldendesign


1 hour ago, goldendesign said:

This might be a bit technical, but I'll try to give a 10,000-foot view of a cool project I just finished. I am quite proud of it and had to share.

I deployed a new machine learning process to our company's network. Each week, radio station surveys are sent to all the panelists (the people whose listening habits we track) and to the radio stations themselves, to track their websites and website names. This isn't so difficult to organize; there are only about 48,000 radio stations in the US. The tricky part was a process required by our oversight company: being FCC regulated means we have to track a lot of things that are there just to see whether we track them properly.

They wanted the URLs filtered, adjusted for any user misspellings or erroneous characters, and then spaced out word by word to allow for easier review. So take this one: squarewheelscycling.com/index.php?/forum/3-the-cafe

It would need to become square wheels cycling . com / index . php ? / forum / 3 - the - cafe
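To make that transformation concrete, here is a minimal sketch of the spacing step only. It is not the CNN described in this thread: it swaps in a plain dictionary lookup with dynamic programming, and the tiny `WORD_BAG` set and the `segment` and `space_out` names are made up for illustration.

```python
import re

# Hypothetical miniature word bag; the real one reportedly has about 1.8 million rows.
WORD_BAG = {"square", "wheels", "cycling", "index", "forum", "the", "cafe", "com", "php"}


def segment(run, vocab):
    """Split a run of letters into the fewest vocabulary words, or return it whole."""
    n = len(run)
    # best[i] holds the shortest word list covering run[:i], or None if unreachable.
    best = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            piece = run[start:end]
            if best[start] is not None and piece in vocab:
                candidate = best[start] + [piece]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[n] if best[n] is not None else [run]


def space_out(url, vocab=WORD_BAG):
    """Put a space between every word, digit run, and punctuation mark in a URL."""
    tokens = re.findall(r"[a-z]+|[0-9]+|[^a-z0-9\s]", url.lower())
    out = []
    for tok in tokens:
        out.extend(segment(tok, vocab) if tok.isalpha() else [tok])
    return " ".join(out)


print(space_out("squarewheelscycling.com/index.php?/forum/3-the-cafe"))
```

A letter run that cannot be fully covered by the vocabulary is left in one piece, which is one simple way to avoid mangling names the word bag has never seen.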

Again, for a single website, not too difficult. The real trick was applying logic across such a vast range of words, subwords, abbreviations, station call letters, misspellings, and slang. I built a "word bag" of possible variations/combinations to the tune of 1.8 million rows, then wrote a convolutional neural network (CNN) that checks essentially all possible variations of the word bag against each URL.
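The CNN itself isn't shown anywhere in this thread. As a much simpler stand-in for the misspelling part, the sketch below does fuzzy matching against a small word bag using `difflib` from the Python standard library. The `WORD_BAG` list, the `correct` helper, and the 0.8 cutoff are assumptions for illustration, not the actual setup.

```python
import difflib

# Hypothetical miniature word bag, for illustration only.
WORD_BAG = ["square", "wheels", "cycling", "forum", "cafe"]


def correct(token, vocab=WORD_BAG, cutoff=0.8):
    """Return the closest vocabulary word, or the token unchanged if none is close."""
    matches = difflib.get_close_matches(token, vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else token


print(correct("cyclng"))  # close to "cycling", so it gets corrected
print(correct("wkrp"))    # nothing similar in the bag, so it is left alone
```

The cutoff is the design choice that matters here: tokens with no near match, such as station call letters, pass through untouched rather than being forced onto the nearest dictionary word.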

The entire thing took about 4 months to complete, but it saves about 100 hours a month of work across two different departments, and the oversight company can pull ad-hoc requests through a web portal.

 

Fun for the most part, but damn, coming up with the word bag was a mind-numbing task.

Sometimes you look at a problem, see it differently from others, and find a solution that no one else thought of. That eureka moment makes the drudgery all seem worthwhile.

Those were the fun days.


23 minutes ago, jsharr said:

QuickBooks

Oof. Both QB and PP use REST APIs with Python frameworks. The trick is you have to pay for the queries from both to get meaningful transactions.

QB is about 2k a year for a reasonable refresh rate per day, while PP limits most of its API frameworks to payment purposes only. But if that's what you're after, it can be done.

I have a fellow DE friend that works in TX, I can send him your way if ya need.


I'm not technical enough to understand the details, but I do understand that you had an absurdly difficult task that you were able to accomplish with creativity and hard work. That must be very satisfying. Congratulations!


6 hours ago, goldendesign said:

Might be a bit technical

a new machine learning process.

only about 48,000 radio stations in the US.

They wanted the URLs filtered, adjusted

applying logic across such a vast area of words, sub words, abbreviations, station call letters, misspellings, and slang

a Convolutional neural network

a web portal.

:scratchhead:    :dontknow: 

I haven't a clue. I even ran all that through my Starfleet universal translator, and the dang thing just melted into a heap of slag. Now all I can do is sell it on eBay as a slightly used Parody Meter.

 

7 hours ago, goldendesign said:

coming up with the word bag

At last!  The one part of the post I was able to (somewhat) figure out!

[image]

 

I bow to you, Professor Goldendesign!  Programming like that rises above mere Art - it's sheer Wizardry!


7 hours ago, goldendesign said:

adjusted for any user misspellings or erroneous characters

What if the misspellings and the characters are not erroneous??? 

2 hours ago, Kirby said:

I'm not technical enough to understand the details, but I do understand that you had an absurdly difficult task that you were able to accomplish with creativity and hard work. That must be very satisfying. Congratulations!

This ^^^

I always thought CNN was a cable news network. 


Just now, Bikeguy said:

What if the misspellings and the characters are not erroneous??? 

That's where lots and lots of data comes in: you train a neural net to figure it out. Since they updated the website list monthly, I had a pretty good base to learn from.

There was on average a 7% change in the list from month to month. Couple that with 4 years of historical surveys, and that's how you find the seed, the word bag, of 1.8 million words.

Look up n-gram recursion if you want to see a simplified version.

Mine was small. Google, for its "OK Google", uses a word bag that's 3 terabytes large. Mine, for all the words, is still 300 MB.
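For anyone curious about the n-gram idea, here is a minimal sketch of one common starting point: counting character n-grams across a list of known words. This is a guess at the simplified version being pointed to, not the actual pipeline, and the `char_ngrams` and `ngram_counts` names and the sample words are made up for illustration.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return every run of n consecutive characters in text, left to right."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_counts(words, n):
    """Count how often each character n-gram appears across a list of words."""
    counts = Counter()
    for word in words:
        counts.update(char_ngrams(word, n))
    return counts


counts = ngram_counts(["cycling", "recycling", "cyclist"], 3)
print(counts["cyc"], counts["ing"])
```

Fragments that show up often across the historical lists are the kind of thing that could seed a word bag, while rare ones are more likely to be typos or one-off names.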


1 minute ago, goldendesign said:

That's where lots and lots of data comes in: you train a neural net to figure it out. Since they updated the website list monthly, I had a pretty good base to learn from.

There was on average a 7% change in the list from month to month. Couple that with 4 years of historical surveys, and that's how you find the seed, the word bag, of 1.8 million words.

Look up n-gram recursion if you want to see a simplified version.

Mine was small. Google, for its "OK Google", uses a word bag that's 3 terabytes large. Mine, for all the words, is still 300 MB.

OMG... that's way above my pay grade.. 

So how soon before Skynet is online?
