    Tom Hazell
    Hi Guys,
    I'm doing some profiling on the HTML parsing, as we have noticed that it is sometimes very slow (we think on documents that are invalid or otherwise not quite right). HtmlDomBuilder#HeisenbergAlgorithm keeps coming up, and googling HeisenbergAlgorithm does not bring up anything related. Would someone be able to point me to a place that explains what it's doing?
    While trying to understand what it's doing, I looked through Chrome's Blink source code for the place where HeisenbergAlgorithm would have been called, and the logic they employ seems much simpler and quicker. The equivalent of what they are doing seems to be:
    1. Find the A tag in _formattingElements, if there is one.
    2. Call ProcessFakeEndTag, which, when in the body, just calls InBodyEndTag on a new end-tag token with tag name a.
    3. Remove the active A tag from _formattingElements.
    4. If the open elements contain the A tag, remove it from there as well.
      Is there some difference between the two parsers that I'm missing?
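The four steps listed above can be modelled roughly as follows. This is a simplified F# sketch, not Blink's or AngleSharp's actual code: the `Element` record is a hypothetical stand-in for real DOM nodes (identity comes from `Id`), markers in the list of active formatting elements are ignored, and `processFakeEndTag` is a hypothetical callback standing in for ProcessFakeEndTag/InBodyEndTag. Note that in a parser following the HTML spec, an `a` end tag in the body is handled by the adoption agency algorithm, so most of the real work would happen inside that callback.

```fsharp
/// Hypothetical element model: identity comes from Id, not from the tag name.
type Element = { Id: int; Tag: string }

/// Simplified model of the steps above for an <a> start tag that arrives
/// while an earlier <a> is still in the list of active formatting elements.
/// Markers in that list are ignored here for simplicity.
let closeActiveAnchor
        (processFakeEndTag: string -> unit)   // hypothetical stand-in for ProcessFakeEndTag
        (formattingElements: ResizeArray<Element>)
        (openElements: ResizeArray<Element>) =
    // 1. Find the most recent <a> in the list of active formatting elements, if any.
    match Seq.tryFindBack (fun e -> e.Tag = "a") formattingElements with
    | None -> ()
    | Some anchor ->
        // 2. Act as if </a> had been seen (in body mode: the normal end-tag handling).
        processFakeEndTag "a"
        // 3. Remove that specific <a> from the active formatting elements.
        formattingElements.Remove anchor |> ignore
        // 4. If it is still on the stack of open elements, remove it from there too.
        openElements.Remove anchor |> ignore

// Usage: a no-op callback, so only steps 1, 3 and 4 have a visible effect.
let body = { Id = 0; Tag = "body" }
let a = { Id = 1; Tag = "a" }
let b = { Id = 2; Tag = "b" }
let formatting = ResizeArray [ a; b ]
let stack = ResizeArray [ body; a; b ]
closeActiveAnchor ignore formatting stack
// formatting now holds [b]; stack holds [body; b]
```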
    Ryan Cleven
    I was wondering if anyone had an example of how to efficiently substitute one HTML tree for an element in another using AngleSharp. In other words, take all the tags inside the body of one HTML tree and place them underneath one of the elements in the other tree.
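One simple way to do this with AngleSharp, sketched in F# (the language used elsewhere in this room): take the source document's `Body.InnerHtml` and assign it to the target element's `InnerHtml`. It is not necessarily the most efficient route, since the markup is serialized and re-parsed, but it only relies on standard DOM-style members. The sample markup and the `#slot` id are made up for illustration.

```fsharp
#r "nuget: AngleSharp"

open AngleSharp.Html.Parser

let parser = HtmlParser()

// Two independent documents (sample markup for illustration only).
let source = parser.ParseDocument("<body><p>One</p><p>Two</p></body>")
let target = parser.ParseDocument("<body><div id='slot'></div></body>")

// The element in the target tree that should receive the source body's content.
let slot = target.QuerySelector("#slot")

// Serialize everything inside the source <body> and re-parse it as the
// children of the slot element; the source document is left untouched.
slot.InnerHtml <- source.Body.InnerHtml

printfn "%s" target.Body.OuterHtml
```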

    Hi everyone. Great to see this tool; I'm evaluating it with the purpose of improving our web UI test automation.

    We are considering adopting the ASP.NET Core TestServer (https://docs.microsoft.com/en-us/aspnet/core/test/integration-tests?view=aspnetcore-3.1), and I'm curious what the best way is to integrate AngleSharp into the TestServer pipeline. I can see that the MS example first gets the full response from the HttpClient created by WebApplicationFactory, which I imagine only works for the initial page load.

    We would like to include JavaScript and subsequent Ajax calls in the testing (it's an ASP.NET Core Angular SPA website), which I assume needs a more sophisticated integration with AngleSharp (via the extensibility points, so that it internally always uses the HttpClient provided by the TestServer).

    I had a look at the extensibility points, but based only on the docs at https://anglesharp.github.io/docs/API.html I don't know if that's the way to go or where to start.

    Any ideas appreciated => AngleSharp testing Asp Net Core Angular Spa

    Eric Vander Wal
    I am trying to do an auto scroll on Twitter for scraping. Can anyone point me in the right direction, or to an example of how to do the scroll?
    (Twitter content is loaded dynamically.) I am a little light on JS skills.
    Rune Jacobsen

    Maybe something like this would be a good starting point?


    Find the bottom of the list, last element, something like that, and scroll it into view.
    Eric Vander Wal
    @havremunken, thanks, I'll look into it.
    Sebastian Loncar
    Is it currently possible to get the calculated position and size of DOM elements, respecting CSS styles? Or is AngleSharp just a parser with lots of interfaces without implementation? :-)
    Sebastian Loncar
    OK, it seems getCalculatedStyle works, but there's no ClientWidth, OffsetWidth, or GetBoundingClientRect :-(
    Greg Bushnell
    Is it possible to open a local HTML file rather than an online one?
    System.AppDomain.CurrentDomain.BaseDirectory + "battery-report.html" is what I'm trying to open in var address.
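One option that avoids the loader entirely is to read the file yourself and hand the text to AngleSharp's `HtmlParser`. A minimal F# sketch, assuming the file sits next to the executable as in the message above; `loadLocalDocument` is a hypothetical helper name. `Path.Combine` is used instead of `+` so the directory separator is handled for you. Passing the text to `OpenAsync(fun req -> req.Content(html) |> ignore)`, as in the F# snippet further down this chat, should work as well if you need a full browsing context.

```fsharp
#r "nuget: AngleSharp"

open System
open System.IO
open AngleSharp.Html.Parser

/// Hypothetical helper: parse an HTML file from disk instead of fetching a URL.
let loadLocalDocument (path: string) =
    let html = File.ReadAllText path
    HtmlParser().ParseDocument(html)

// Same location as in the question: next to the running executable.
let reportPath =
    Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "battery-report.html")

if File.Exists reportPath then
    let doc = loadLocalDocument reportPath
    printfn "Title: %s" doc.Title
```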
    Richard Thompson

    Hi. I'm trying to write some code to find an h1 on a page and get its computed style. The code I've got works fine for getting styles that are inline in the HTML document, but the styles that I have in an external style sheet don't seem to be applied.

    Are external style sheets supported, and is there something I need to do to enable them, perhaps in the config? I have tried Configuration.Default.WithDefaultLoader(new LoaderOptions { IsResourceLoadingEnabled = true }).WithCss()

    Aleksandr Asanov
    Hi, I am trying to scrape the Taobao and Tmall sites.
    But after using the AngleSharp package, I can't get the whole documents, because they use JavaScript to build the DOM.
    How can I resolve this issue? Please give me some hints, thanks guys!

    Hi, I'm trying out AngleSharp in F# and am seeing very strange behaviour: QuerySelectorAll basically works only for the top-level body element, but not for any elements inside it. Here's an example:

    > doc.QuerySelectorAll("body") |> Seq.tryHead |> Option.map (fun n -> n.TagName);;
    val it : string option = Some "BODY"
    > doc.QuerySelectorAll("div") |> Seq.tryHead |> Option.map (fun n -> n.TagName);;
    val it : string option = None

    This is for the document loaded from this URL.

    Oddly enough, the HtmlBodyElement has the correct BaseUrl, but the value of OuterHtml is a mere "<body></body>". Hmm, that might be it.
    Or not, as I see the same behaviour with other URLs.
    @FlorianRappl: I followed the README example on GitHub to load the Wikipedia URL.

    But when I just tried with local content, it works:

    open AngleSharp

    let getDoc (htmlContent: string) =
        let cfg = Configuration.Default.WithDefaultLoader()
        let ctx = BrowsingContext.New(cfg)
        // Serve the given HTML from memory instead of requesting a URL
        ctx.OpenAsync(fun req -> req.Content(htmlContent) |> ignore)
        |> Async.AwaitTask
        |> Async.RunSynchronously

    [<EntryPoint>]
    let main argv =
        let doc = getDoc "<body><div>Hello</div></body>"
        let cells = doc.QuerySelectorAll("div")
        let titles =
            query {
                for cell in cells do
                    select cell.TextContent
            }
        printfn $"Printing {Seq.length titles} titles"
        for title in titles do
            printfn $"Title: {title}"
        0 // return an integer exit code

    ^ This works, which is good enough I guess, as I don't plan to load remote content anyway.

    By the way, as a new F# dev (completely new to the .NET ecosystem, but familiar with FP), the main reason I'm exploring AngleSharp is to figure out how useful it would be for writing my own HTML templating language (based on XML-ish syntax, so things like partials, layouts, and variables are defined the XML way).

    Is it possible to get the contents of a <template> tag (so as to query on it)?

    In JS, you would do it using the .content property.

    E.g., to query the td inside the <template> tag of https://developer.mozilla.org/en-US/docs/Web/HTML/Element/template#examples
    (or even get the whole contents of the template as its own document).
    Florian Rappl
    Yes definitely.
    I tried let tmpl = doc.QuerySelectorAll("template"), which returns the template tag, but I can't drill down further than that.
    Florian Rappl
    Why? It's an HTMLTemplateElement, so just cast the result to the right type.
    Sorry, but C# / F# are not dynamic languages like JS, so you need to get the types right ;)
    fwiw, here's my current code:
        let templates = doc.QuerySelectorAll("template")
        let titles =
            query {
                for tmpl in templates do
                    for p in tmpl.QuerySelectorAll("p") do
                        select p.OuterHtml
            }
    titles is an empty seq, so the "p" querying didn't work, which isn't surprising I guess, because I'm supposed to get the fragment inside tmpl and then query on it.
    Florian Rappl
    Yes, you need to query the fragment. Your current query works against the children of the template, not the content in the template.
    My assumption is that I need a filler here: tmpl.GetContentFragment().QuerySelectorAll("p"), but what would GetContentFragment be?
    I don't see anything relevant in the auto-completion list for tmpl.
    Not sure what GetContentFragment is / should be. I guess you want Content?
    Right! That must be it; now I gotta figure out casting in F# ...
    QuerySelectorAll returns a collection of IElement. I should cast that to IHtmlTemplateElement.
    Wait, you can't downcast it. Looking for polymorphic querying... (is that an anti-pattern in .NET?)
    Florian Rappl
    1. Either use QuerySelector or iterate over all results (not sure if you're interested in a single one or all results).
    2. Check the type before casting it.
    QuerySelector looks interesting; let me see how I can use it to pull multiple <template> tags.
    Wait, I thought that was more polymorphic than QuerySelectorAll, but the difference is only in arity.

    I was trying something like the following, to avoid having to cast things later:

        let templates : IHtmlCollection<IHtmlTemplateElement> =

    But this just throws a

    error FS0001: This expression was expected to have type
        'IHtmlCollection<IHtmlTemplateElement>'
    but here has type
        'Collections.Generic.IEnumerable<IHtmlTemplateElement>'
    Oh wait, wrong collection type
    Ah, now it worked! Using System.Collections.Generic. Polymorphism for the win.
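Putting the advice above together, a minimal F# sketch: query the `<template>` elements, keep only those that really are `IHtmlTemplateElement` (a type test rather than a blind cast), and then query each template's `Content` fragment. The markup is a cut-down, made-up version of the MDN table-row example; the generic `QuerySelectorAll` overload found above, which returns an `IEnumerable`, is an alternative to the explicit type test.

```fsharp
#r "nuget: AngleSharp"

open AngleSharp.Html.Dom
open AngleSharp.Html.Parser

// Sample markup (simplified from the MDN <template> example).
let html = """<table id="producttable"><tbody></tbody></table>
<template id="productrow">
  <tr><td class="record"></td><td></td></tr>
</template>"""

let doc = HtmlParser().ParseDocument(html)

// Querying the document directly does not see inside <template>:
// the template's children live in a separate document fragment.
let directCells = doc.QuerySelectorAll("td").Length

// Type-test each match, then query the template's Content fragment.
let templateCells =
    doc.QuerySelectorAll("template")
    |> Seq.choose (fun el ->
        match el with
        | :? IHtmlTemplateElement as tmpl -> Some tmpl
        | _ -> None)
    |> Seq.collect (fun tmpl -> tmpl.Content.QuerySelectorAll("td"))
    |> Seq.toList

printfn "td outside templates: %d, td inside templates: %d" directCells templateCells.Length
```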

    As an aside, it seems that I'm reinventing XSLT, which appears to provide the 'templating' feature I'm trying to build from scratch: https://developer.mozilla.org/en-US/docs/Web/API/XSLTProcessor/Basic_Example

    (My idea uses JSON as data input; but XSLT uses XML as data input, which might be more interesting when used with AngleSharp)

    Hello guys. I have the following code. I would like to click the button programmatically, which should redirect me to the main GitHub page, but for some reason it doesn't redirect me at all. Maybe you can help me?