Yes, absolutely right. The core tech stack for both crypto-valuations and tech-valuations stayed the same. It was basically Python + spaCy (plus a lot of different supporting libraries).
Some surprising aspects:
For tech-analysis, you can accomplish a lot with rule-based parsing because the source texts (e.g. patents) share the same sentence structure. In fact, several studies have shown that patent texts follow a certain structure (e.g. SAO, i.e. Subject-Action-Object).
For crypto, this was far more difficult because the text structures were all over the place.
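To make the SAO idea concrete, here is a toy sketch of rule-based extraction. The verb list and example sentence are made up, and a real pipeline would use a dependency parser (e.g. spaCy's) rather than token matching, but the underlying bet is the same: patent sentences are regular enough that simple subject-verb-object patterns already recover useful triples.

```python
# Toy rule-based SAO (Subject-Action-Object) extraction.
# ACTION_VERBS and the test sentence are invented for illustration;
# a production version would use spaCy's dependency parse instead.

ACTION_VERBS = {"comprises", "includes", "transmits", "receives", "controls"}

def extract_sao(sentence: str):
    """Return (subject, action, object) if a known action verb is found."""
    tokens = sentence.rstrip(".").split()
    for i, tok in enumerate(tokens):
        if tok.lower() in ACTION_VERBS:
            subject = " ".join(tokens[:i])
            obj = " ".join(tokens[i + 1:])
            return (subject, tok, obj)
    return None

print(extract_sao("The processor transmits a control signal"))
# -> ('The processor', 'transmits', 'a control signal')
```

This falls apart the moment sentences stop being regular, which is exactly why the same approach did not carry over to crypto texts.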
Also, crypto-analysis ("back then") was very messy because it was difficult to find a trustworthy data set. With technologies, you can confine yourself to Wikipedia, patents, and scientific papers. There is still a lot to analyze, but at least you have a somewhat official data set.
Also, with crypto you have far fewer data points per company/token/coin, which makes it hard for a model not to disregard them as noise.
Conversely, with tech-evaluation it seems that, because you get so much data from one document (e.g. one patent), you can often disregard a large portion of it and still end up with good results.
Additionally, crypto-analysis turned out to be far more numbers-heavy (how much funding etc.), so the tolerance for error was small. E.g. if you miss one funding round (out of three), the company's valuation can be off by up to 50%. This happened to me basically all the time, which was super frustrating.
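To put a number on that 50% claim, here is a tiny made-up example (the funding amounts are invented, and this assumes a naive valuation proxy that just sums disclosed rounds):

```python
# Hypothetical illustration: missing one funding round out of three
# can shift a naive sum-of-funding estimate by a large margin.
funding_rounds = [2_000_000, 3_000_000, 5_000_000]  # made-up amounts

total = sum(funding_rounds)              # 10,000,000
missed_one = sum(funding_rounds[:-1])    # 5,000,000 if the big round is missed
error = 1 - missed_one / total
print(f"relative error from one missed round: {error:.0%}")  # -> 50%
```

With only a handful of data points per coin/company, a single scraping miss dominates the estimate, which is why the errors were so painful.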
The last surprising fact was how difficult and complicated keyword extraction is. For crypto evaluation I just went with relative word frequency (the more often a word appears in a text, the more important it becomes, assuming it does not appear in all the documents). However, as I have learnt with tech-evaluation, there are maybe four or five strategies for keyword extraction, and this is still an area where I have not found a solid solution for my NLP use case.
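For what it's worth, the "relative word frequency, discounted by how many documents contain the word" idea is essentially TF-IDF. A minimal sketch, with an invented three-document corpus:

```python
import math

# Minimal TF-IDF sketch of the relative-word-frequency idea:
# a term scores high if it is frequent in one document but does
# not appear across (almost) all documents. Corpus is made up.

docs = [
    "token sale smart contract token",
    "patent claims a smart sensor",
    "token staking rewards token holders",
]

def tf_idf(term: str, doc: str, corpus: list) -> float:
    words = doc.split()
    tf = words.count(term) / len(words)          # term frequency in this doc
    df = sum(1 for d in corpus if term in d.split())  # document frequency
    idf = math.log(len(corpus) / df) if df else 0.0
    return tf * idf

print(round(tf_idf("token", docs[0], docs), 3))  # -> 0.162
print(round(tf_idf("smart", docs[0], docs), 3))  # -> 0.081
```

"token" outscores "smart" in the first document because it appears twice there, even though both occur in two of the three documents. Other strategies (TextRank, RAKE, embedding-based methods) weight things very differently, which is part of why picking one is hard.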
Finally, after all the reading that went into building researchly (which is relatively little), I have realized that I know significantly less about NLP than I initially thought. It still fascinates me what kinds of strategies/algorithms people come up with.
Thanks for the depth of your replies. I love mapping information, finding patterns, "seeing the future"... though more often than not, even if you knew some aspect of the future for sure, it would still be very hard to make use of it. I mainly enjoy the topic as an info geek more than anything else.